Data Assets

Research-Relevant Data

The PopHealth DataShare maintains an extensive collection of secure, research-ready health care data, including IBM® MarketScan®, NC Medicaid, and Medicare claims data repositories. These holdings include nationally representative and geographic samples, disease-specific cohorts, and Medicare data linked to clinical data registries and EHR data. Access for Duke researchers is provided with permission from the Centers for Medicare & Medicaid Services (CMS) and in collaboration with the Research Data Assistance Center (ResDAC).

DataShare team members can work with Duke researchers to identify the data assets best suited to specific research questions, and can provide expertise on data use agreement (DUA) reuse and data governance. If a tailored approach is required, we can help build custom data assets, from the DUA stage through curation and loading. Data access and use for qualifying Duke researchers are governed by DUAs with various entities, including CMS, the Agency for Healthcare Research and Quality (AHRQ), and the American Heart Association (AHA). Data assets from additional sources, including Medicaid and commercial claims, are coming soon.

Medicare

Medicare is a federal health insurance program for people aged 65 and older, people with certain disabilities, and people with end-stage renal disease (ESRD). Medicare claims record payments from CMS to health care providers (hospitals, outpatient clinics, physicians) for services rendered, including institutional costs covered under Medicare Parts A and B and physician services covered under Medicare Part B. The accompanying denominator files include beneficiary demographic characteristics, dates of death, and program eligibility and enrollment information. These data can be used to study disease, health care utilization, costs, and longitudinal outcomes among Medicare-eligible beneficiaries.

Medicaid

Medicaid is a health insurance program for low-income people, jointly funded by the federal and state governments. Each state administers its own program, so coverage and claims vary considerably from state to state. Claims data are collected from CMS Medicaid Analytic Extract (MAX) files or directly from states.

Commercial Claims

Commercial claims come from private insurance plans, which generally provide health care coverage for employer-sponsored or self-employed populations. These claims represent patient populations that are generally younger than 65 (not yet Medicare eligible) and above the poverty line (not Medicaid eligible).

HCUP

Healthcare Cost and Utilization Project (HCUP) data, sponsored by AHRQ, include encounters covered by both government and private insurance plans.

Supplemental Data

The PopHealth DataShare includes a robust collection of curated, publicly available reference data sets that Duke researchers can use in their analyses.