Duke PopHealth DataShare

Duke PopHealth DataShare™ is an extensive, shared collection of research-ready, secure electronic health data maintained by a dedicated staff.

About the PopHealth DataShare Collection

PopHealth DataShare’s collection includes IBM® MarketScan®, NC Medicaid and Medicare claims data repositories with nationally representative and geographic samples, disease-specific cohorts, and Medicare linked to clinical data registries and EHR data. Additionally, Duke PopHealth is a Jackson Heart Study Vanguard Center and has multiple years of HCUP data.

The Duke PopHealth DataShare Staff is Here to Help

DataShare is run by a team of experts who help qualified Duke scientists access and use these data to generate new insights into health and health care, saving time and money by relieving the burden of managing new, complex data sources. We provide researchers with the regulatory, project management, and analytics support needed to move a project efficiently from conception to completion.

Frequently Asked Questions

Who can use Duke PopHealth DataShare?

Duke faculty, staff, post-docs, and students may request DataShare services and access to data assets for their research projects.

How do I access Duke PopHealth DataShare?

To begin a collaboration, or to find out if our services meet your needs, complete our resource request form. After we receive the form, we will set up an initial meeting to discuss your project. Access for Duke researchers is provided with permission from the Centers for Medicare & Medicaid Services (CMS) and in collaboration with the Research Data Assistance Center (ResDAC).

Access to data is dependent on compliance with regulations from data providers (e.g., data use agreements) and requirements from DataShare (e.g., completion of training modules, IRB approval, payment of fees).

Can you explain some of the data terms like Medicare, Medicaid, HCUP and Supplemental Data?

Medicare is a federal health insurance program for people 65 and older, people with certain disabilities, and people with end-stage renal disease (ESRD). Medicare claims record payments from CMS to health care providers (hospitals, outpatient clinics, physicians) for services rendered, including institutional costs covered under Medicare Parts A and B and physician services covered under Medicare Part B. The denominator files include beneficiary demographic characteristics, dates of death, and program eligibility and enrollment information. These data can be used to study disease, health care utilization, costs, and longitudinal outcomes of Medicare-eligible beneficiaries.

Medicaid is a health insurance program for low-income people that is co-funded by federal and state governments. States administer their own programs, which produces considerable variation in coverage and claims across states. Claims data are collected from CMS Medicaid Analytic eXtract (MAX) files or directly from states.

Commercial claims come from private insurance plans, which generally provide health care coverage for company-employed or self-employed populations. These claims represent patient populations that are generally younger than 65 (pre-Medicare) and above the poverty line (not Medicaid eligible).

Healthcare Cost and Utilization Project (HCUP) data come from both government and private insurance plans.

Note that PopHealth DataShare also includes a robust collection of curated, publicly available reference data sets that Duke researchers can use in their analyses.

You can find more information about all of these data sets here.

Can you provide more detail about what is in the collection and the storage options that Duke PopHealth DataShare offers?

Based on your needs, the Duke PopHealth DataShare team can provide the following:

  • Access to DataShare stock data assets
    • Medicare 100% inpatient research identifiable files (RIF)
    • Medicare 100% NC/SC research identifiable files (RIF)
    • Medicare 5% research identifiable files (RIF)
    • Medicare 100% inpatient limited data set (LDS) files
    • Medicare 5% limited data set (LDS) files
    • NC Medicaid data
    • MarketScan data
    • Jackson Heart Study
    • HCUP
  • Access to the highly secure DataShare analytics environment
    • Encrypted, FISMA-moderate PACE environment
    • Powerful multi-user Linux analytics server provisioned with SAS, R and Stata
    • Oracle database
    • PACE Windows desktop pre-installed with SAS, SAS Studio, R, RStudio, Stat/Transfer, MobaXterm, DbVisualizer, PL/SQL Developer, Microsoft Office 2016 suite, Microsoft Edge, Google Chrome, Mozilla Firefox, Notepad++, Acrobat Reader, Access 2016, PowerPoint 2016, 7-Zip
    • Training for up to 3 users, which includes a 1:1 session with a data asset subject matter expert (SME) as well as LMS modules on policies for appropriate use of data
    • A DataShare liaison who facilitates IT troubleshooting and ongoing infrastructure maintenance and provisioning
  • Access to DataShare tools and resources
    • Curated, research-ready comorbidity, eligibility, vital and diagnosis tables
    • Reference library containing several terminologies that are helpful for working with electronic health data (ICD, CPT, LOINC, RxNorm, etc.)
    • SAS macros including GEMs ICD-9 to ICD-10 crosswalk, RUCA rural/urban coding, consumer price index adjustment, etc.
    • ICD-9, ICD-10 and CPT code lists for comorbidity, outcome and procedure algorithms
  • Storage of other PHI/sensitive data
    • If your data provider or sponsor/funder requires a more secure storage set-up, you may request to store your data in the DataShare FISMA-moderate environment.

Can you tell me more about the support the DataShare staff provides researchers?

The PopHealth DataShare team has a wealth of experience using electronic health data for research. DataShare team members work with Duke researchers to find the best data assets for answering specific research questions, offer expert consultation to help researchers achieve their objectives, and advise on data use agreement (DUA) reuse and data governance. If a tailored approach is required, we can help build custom data assets, from DUA through curation and loading. Specifically, we offer the following:

Analytics Support

  • Medicare, Medicaid, MarketScan and electronic health data expertise
  • Analysis file creation
  • Data management
  • Cohort feasibility assessments
  • Observational study design and data analysis expertise
  • Data processing and curation
  • SAS programming and sample SAS code
  • Coding algorithms

Project Management and Regulatory Support

  • Proposal development
  • Project management
  • Recruitment and patient engagement
  • Regulatory and governance support

Note, data access and use for qualifying Duke researchers are governed by data use agreements with various entities, including CMS, the Agency for Healthcare Research and Quality (AHRQ), and the American Heart Association (AHA). Data assets from additional sources including Medicaid and commercial claims are coming soon.

What are the fees?

Fees depend on the services and data being requested. For studies that are not yet externally funded, researchers may apply for funds through Duke’s Core Facility Voucher Program.

Access also requires completion of training. We maintain data asset-specific training modules that are hosted in the Duke LMS. Offered and required training modules include PACE, Medicare, Medicaid, MarketScan, and HCUP.

My question wasn’t answered. Do you have a past information session I can watch for more information?

Yes! You can listen to a presentation about using the PopHealth DataShare, our data sets, and our pricing model.