
Duke PopHealth DataShare

Duke PopHealth DataShare™ is an extensive, shared collection of research-ready, secure electronic health data maintained by a dedicated staff.

About the PopHealth DataShare Collection

PopHealth DataShare’s collection includes NC Medicaid and Medicare claims data repositories with nationally representative and geographic samples, disease-specific cohorts, and Medicare linked to clinical data registries and EHR data. Additionally, Duke PopHealth is a Jackson Heart Study Vanguard Center and has multiple years of HCUP data.

The Duke PopHealth DataShare Staff is Here to Help

DataShare is run by a team of experts that help qualified Duke scientists access and use these data to generate new insights into health and health care, saving time and money by relieving the burden of managing new, complex data sources. We provide researchers with the regulatory, project management and analytics support needed to efficiently move a project from conception to completion.

Frequently Asked Questions

Duke faculty, staff, post-docs, trainees, and students may request DataShare services and access to data assets for their research projects.

To begin a collaboration, or to find out if our services meet your needs, complete our resource request form. After we receive the completed form, we will review your responses and contact you via email to further discuss your project and/or provide a budget for requested services.

Access to data is dependent on compliance with regulations from data providers (e.g., data use agreements) and requirements from DataShare (e.g., completion of training modules, IRB approval, payment of fees).

Medicare is a federal health insurance program for people who are 65 or older, have certain disabilities, or have end-stage renal disease (ESRD). Medicare claims are payments from CMS to health care providers (hospitals, outpatient clinics, physicians) for services rendered, including institutional costs covered under Medicare Parts A and B and physician services covered under Medicare Part B. The denominator files include beneficiary demographic characteristics, dates of death, and program eligibility and enrollment information. These data can be used to study disease, health care utilization, costs, and longitudinal outcomes of Medicare-eligible beneficiaries.

Medicaid is a health insurance program for low-income people that is co-funded by federal and state governments. States administer their own programs, which produces considerable variation in coverage and claims across states. Claims data are collected from CMS Medicaid Analytic Extract (MAX) files or directly from states.

Commercial claims are from private insurance plans, which generally provide health care coverage for company or self-employed populations. These claims represent patient populations that are generally younger than 65 (pre-Medicare) and above the poverty line (not Medicaid eligible).

Healthcare Cost and Utilization Project (HCUP) data come from both government and private insurance plans.

The American Hospital Association survey database includes information on hospital demographics, utilization, service lines, physician arrangements, etc., from more than 6,000 hospitals and healthcare systems.

Note, PopHealth DataShare also includes a robust collection of curated, publicly available reference data sets that Duke researchers can leverage in analysis.

You can find more information about all of these data sets here.

Based on your needs, the Duke PopHealth DataShare team can provide the following:

  • Access to DataShare stock data assets
    • Medicare 100% inpatient research identifiable files (RIF)*
    • Medicare 100% NC/SC research identifiable files (RIF)*
    • Medicare 5% research identifiable files (RIF)*
    • Medicare 100% inpatient limited data set (LDS) files*
    • Medicare 5% limited data set (LDS) files*
    • NC Medicaid data
    • Jackson Heart Study
    • HCUP
    • American Hospital Association survey database

*A detailed review of the differences between the CMS RIFs and the CMS LDS files is available from ResDAC (https://resdac.org/articles/differences-between-rif-lds-and-puf-data-files). If you would like to further discuss the implications of these differences, you can contact us at pophealthdatashare@dm.duke.edu.

  • Access to the highly secure DataShare analytics environment
    • Powerful multi-user Linux analytics server provisioned with SAS, R and Stata**
    • Oracle database
    • PACE Windows desktop currently (as of June 2022) pre-installed with SAS, SAS Studio, R, RStudio, StatTransfer, MobaXterm, DbVisualizer, PL/SQL Developer, Microsoft Office 2016 suite, Microsoft Edge, Google Chrome, Mozilla Firefox, Notepad++, Acrobat Reader, Access 2016, PowerPoint 2016, 7-zip
    • Access and a one-on-one technical training session for up to 3 individual users, including a SAS technology setup session and DPHS LMS modules on policies for appropriate use of data.
    • DataShare liaison facilitates IT troubleshooting and ongoing infrastructure maintenance and provisioning

** While R, RStudio, and Stata are included in the DPHS analytics environment, DataShare does not provide training or technical assistance for these tools.

  • Access to DataShare tools and resources
    • Curated, research-ready comorbidity, eligibility, vital, and diagnosis tables   
    • Reference library containing several terminologies that are helpful for working with electronic health data (ICD, CPT, LOINC, RxNorm, etc.)
    • SAS macros including GEMS ICD-9 to ICD-10 crosswalk, RUCA rural/urban coding, consumer price index adjustment, etc.
    • ICD-9, ICD-10, and CPT code lists for comorbidity, outcome, and procedure algorithms
  • Storage of PHI/sensitive data
    • If your data provider or sponsor/funder requires a more secure storage set-up, you may request to store your data in the DataShare FISMA-moderate environment.
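
As an illustration of the consumer price index adjustment listed among the SAS macros above, the following is a minimal Python sketch of the general arithmetic: a cost is rescaled by the ratio of the target year's index to the source year's index. It is not the DataShare macro. The function name `adjust_cost`, the table `EXAMPLE_INDEX`, and every index value in it are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical index values for illustration only; these are NOT real CPI figures.
EXAMPLE_INDEX = {2018: 100.0, 2019: 102.0, 2020: 105.0}


def adjust_cost(amount, from_year, to_year, index=EXAMPLE_INDEX):
    """Express `amount`, stated in `from_year` dollars, in `to_year` dollars."""
    if from_year not in index or to_year not in index:
        raise KeyError("no index value for one of the requested years")
    # Scale by the ratio of the target-year index to the source-year index.
    return amount * index[to_year] / index[from_year]


# A cost of 1000.00 in 2018 dollars, restated in 2020 dollars
# using the hypothetical index table above.
adjusted = adjust_cost(1000.0, 2018, 2020)
```

In practice the index table would come from a published price index series, and the choice of series (for example, a general or a medical-care index) is a study design decision that this sketch does not address.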

The PopHealth DataShare team has a wealth of experience using electronic health data for research. DataShare team members work with Duke researchers to find the best data assets for answering specific research questions, provide expert consultation to help researchers achieve their objectives, and offer expertise on DUA reuse and data governance. If a tailored approach is required, we can help build custom data assets, from DUA through curation and loading. Specifically, we offer the following:

DataShare Biostatistics support 

The DataShare Biostatistics team can provide full research study collaboration support or consulting in all phases of a research study: grant/protocol development or review, study design, development of statistical analysis plans, descriptive and inferential analyses, verification of analyses and results, manuscript development and review, and budgeting for statistical collaboration. Areas of expertise include biostatistics, epidemiology, psychology, observational and comparative effectiveness research using large-scale health insurance claims, electronic health records, clinical registries and cohort study data, prospective implementation science, health system randomized controlled trials (RCTs) and survey research, SAS and R statistical programming, and data science. It’s strongly recommended that new data users include consultation with the DataShare Biostats team in their research program.

DataShare Programming & Informatics support  

DataShare analyst programmers design, code, test, execute and document SQL and SAS programs for data analytics; conduct data characterization; perform data validation and quality assessments; create analytic data sets; perform data analyses; and are responsible for data conversion/transfer, graphic production, and project reporting.

DataShare Operations/Regulatory support

DataShare Operations staff provide operational, regulatory, and financial oversight to facilitate the completion of clinical research studies on time and within budget. Specific support activities include the development of study timelines and budgets, serving as a liaison between the study staff and investigators, preparing and submitting data use agreement applications, and preparing and submitting IRB protocols, amendments, and closures. DataShare Operations staff are also experienced in participant recruitment and retention for RCTs, consent form development, and manuscript development. It’s strongly recommended that researchers include DataShare Operations staff (project management and/or regulatory support) in their study.

Note, data access and use for qualifying Duke researchers are governed by data use agreements with various entities, including CMS, the NC Department of Health and Human Services, and the American Heart Association (AHA). 

Fees depend on the services and data being requested. DataShare fees are approved by the School of Medicine and go into effect on July 1 each year. This means all services provided from July 1 through June 30 are billed under that fiscal year’s fee schedule. Fees are calculated based on staff salaries, infrastructure costs, data costs, etc. From July 1, 2019 through June 30, 2022, DataShare received financial support (i.e., subvention funds) from the School of Medicine. During that time, DataShare was able to grant data access and research staff support at reduced pricing.
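
The July 1 to June 30 billing rule described above can be sketched in a few lines of Python. This is only an illustration of the date logic, not a DataShare tool. The function name `fiscal_year_start` is hypothetical, and it returns the calendar year in which the applicable fiscal year begins, so it makes no assumption about how fiscal years are labeled.

```python
from datetime import date


def fiscal_year_start(service_date: date) -> int:
    """Return the calendar year in which the July 1 - June 30 fiscal year
    containing `service_date` begins."""
    # Services dated July through December fall in the fiscal year that
    # started that same calendar year; January through June fall in the
    # fiscal year that started the previous July 1.
    if service_date.month >= 7:
        return service_date.year
    return service_date.year - 1


# A service on June 30, 2022 and one on July 1, 2022 fall under
# different fee schedules.
before = fiscal_year_start(date(2022, 6, 30))
after = fiscal_year_start(date(2022, 7, 1))
```

Here `before` and `after` differ by one, which reflects that a project spanning July 1 would be billed under two fee schedules.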

PACE charges (e.g., transfer agent, honest broker, PACE folder) are not included. Researchers should contact the PACE Team (https://pace.ori.duke.edu/) to determine the PACE fees associated with their project.

For more information about fees, you may complete our resource request form or email pophealthdatashare@dm.duke.edu.

Yes! You can listen to a presentation about using the PopHealth DataShare, our data sets, and our pricing model.