About

The California Department of Health Care Access and Information (HCAI) provides confidential patient-level data sets to researchers eligible through the Information Practices Act (CA Civil Code Section 1798 et seq.), which permits nonprofit educational institutions (such as the University of California) and state agencies to request data for research purposes and for performing legally mandated activities. The contents of these IPA files are described in the Master Variable Grid and the Nonpublic Data Documentation.

2023 is the most recent year available for Patient Discharge, Emergency Department, and Ambulatory Surgery Data.

Research Data Set Request Forms

Below are the HCAI forms for requesting research data referenced here.  Please use the checklist and instructions while completing the request. Information is also provided in the instructions about other forms not maintained or provided by HCAI that may be required to process a request.

Justification Grids

This message serves as notice that HCAI will no longer offer the Clinical Classifications Software/Refined (CCS/R) data elements. If you have further questions, please contact DataandReports@hcai.ca.gov.

Available Data Files

Patient Discharge Data (PDD)

The Patient Discharge Dataset consists of a record for each inpatient discharge from a California-licensed hospital. Licensed hospitals include general acute care, acute psychiatric, chemical dependency recovery, and psychiatric health facilities. For more information on the data and reporting requirements, see the California Inpatient Data Reporting Manual. These datasets are available starting in 1983.

For detailed information about the data elements within the PDD, view the data dictionaries here.

Emergency Department Data (ED)

The emergency department data set includes demographic, clinical, payer, and facility information from hospitals licensed to provide emergency medical services. ED encounters are reported only for patients who had face-to-face contact with a provider. A patient who left without being seen had no face-to-face encounter with a provider, so that ED encounter is not reported. A provider is defined as the person who has primary responsibility for assessing and treating the condition of a patient at a given contact and exercises independent judgment in the care of the patient. Providers include medical doctors, doctors of osteopathy, doctors of dental surgery, and doctors of podiatric medicine. If the ED encounter resulted in a same-hospital admission, the ED encounter is combined with the inpatient record and no separate ED record is reported. When analyzing ED records, you may want to include the records identified in the inpatient database as having the hospital’s own ED as the source of admission.
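As a rough illustration of the last point, the sketch below counts ED visits by adding inpatient records admitted through the hospital's own ED to the records in the ED file. The field name `admission_source` and the code `OWN_ED` are hypothetical placeholders, not actual HCAI variable names or values; check the data dictionaries for the real ones.

```python
# Hypothetical field name and code; the real ones are in the HCAI data dictionaries.
SOURCE_FIELD = "admission_source"
OWN_ED_CODE = "OWN_ED"


def count_ed_visits(ed_records, inpatient_records):
    """Count ED visits, including those that ended in a same-hospital admission."""
    # Rows in the ED file are encounters that did not become a same-hospital admission.
    in_ed_file = len(ed_records)
    # Inpatient rows whose source of admission is the hospital's own ED
    # represent ED visits that were folded into the inpatient record.
    admitted = sum(
        1 for rec in inpatient_records if rec.get(SOURCE_FIELD) == OWN_ED_CODE
    )
    return {
        "ed_file": in_ed_file,
        "admitted_via_own_ed": admitted,
        "total": in_ed_file + admitted,
    }
```

Counting only the ED file would understate total ED volume by the number of visits that ended in admission at the same hospital.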

For more information on the data and reporting requirements, see the California Emergency Department and Ambulatory Surgery Data Reporting Manual. These data sets are available beginning January 2005.

For detailed information about the data elements within the ED, view the data dictionaries here.

Ambulatory Surgery Center Data (AS)

The ambulatory surgery dataset includes encounters from general acute care hospitals and licensed freestanding ambulatory surgery clinics during which at least one ambulatory surgery procedure is performed. A freestanding ambulatory surgery clinic is defined as a surgical clinic licensed by the California Department of Public Health (CDPH). Many facilities that are called ambulatory surgery centers are not required to be licensed as surgical clinics and do not report data to HCAI. An ambulatory surgery procedure is defined as a procedure performed on an outpatient basis in the general operating rooms, ambulatory surgery rooms, endoscopy units, or cardiac catheterization laboratories of a hospital or a freestanding ambulatory surgery clinic. If a procedure was done elsewhere (such as in a radiology unit), no ambulatory surgery record is required to be filed. If a hospital-based AS encounter resulted in a same-hospital admission, the AS encounter is combined with the inpatient record and no separate AS record is reported. When analyzing hospital-based AS records, you may want to include AS direct admissions, which are identified in the hospital’s inpatient data as having Ambulatory Surgery at the same hospital as the source of admission.

For more information on the data and reporting requirements, see the California Emergency Department and Ambulatory Surgery Data Reporting Manual. These data sets are available beginning January 2005.

For detailed information about the data elements within the AS, view the data dictionaries here.

Linked Files – Birth and Death Data

The HCAI patient record-level data is linked with the Vital Statistics Birth Statistical Master File, Birth Cohort File, Death Statistical Master Files, and the California Comprehensive Death File. The vital statistics files themselves are available from the California Department of Public Health at the Vital Statistics Data Web site. The linked data files are available to qualified researchers through requests submitted to HCAI.

Linked Birth Files

The Linked Birth File is a research database created for the purpose of studying delivery and birth outcomes. This linkage utilizes information from the following data sets:

  • California Patient Discharge Data
  • Vital Statistics Birth Certificate Data
  • Vital Statistics Death Certificate Data
  • Vital Statistics Fetal Death File
  • Vital Statistics Birth Cohort File

It includes maternal antepartum and postpartum hospital records for the nine months prior to delivery and one year after delivery. In addition, the linked file includes birth records and all infant readmissions occurring within the first year of life. The linked pairs of birth/delivery records include information associated with a mother/baby pair from the baby’s discharge data record, the mother’s discharge data record, and the birth certificate data. All associated records (prenatal, postnatal, transfers, and infant readmissions) are identified by the variable _BRTHID and are sorted in admission date order.
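To make the record structure concrete, here is a minimal sketch of collecting all records that share a _BRTHID and keeping them in admission date order. Only `_BRTHID` comes from the description above; the `admit_date` and `kind` fields and the sample values are made up for illustration.

```python
from collections import defaultdict
from datetime import date


def group_by_birth_id(records, id_field="_BRTHID", date_field="admit_date"):
    """Group records by birth ID and order each group by admission date."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[id_field]].append(rec)
    for recs in groups.values():
        # Re-sort defensively in case the extract was reordered after delivery.
        recs.sort(key=lambda r: r[date_field])
    return dict(groups)


# Illustrative records only; "admit_date" and "kind" are hypothetical fields.
sample = [
    {"_BRTHID": "B1", "admit_date": date(2020, 5, 1), "kind": "delivery"},
    {"_BRTHID": "B2", "admit_date": date(2020, 6, 9), "kind": "delivery"},
    {"_BRTHID": "B1", "admit_date": date(2020, 2, 1), "kind": "prenatal"},
    {"_BRTHID": "B1", "admit_date": date(2020, 8, 15), "kind": "infant readmission"},
]
episodes = group_by_birth_id(sample)
```

Each value in `episodes` is then the sequence of hospital contacts for one mother/baby pair, from the earliest admission to the latest.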

The file contains all infants who were born in a given year, including births that occurred in a California hospital that reports to HCAI, births that occurred in a California hospital that did not report to HCAI, and births that occurred outside California. It includes all infants and mothers, whether or not they were linked to a birth record. Linked Birth files are available to qualified researchers beginning with the 1991 calendar year reporting period. See the Master Variable Grid for available years. Note: The most recent year may not be a full cohort file, depending on the availability of the input cohort file.

Linked Death Files 

HCAI has developed validated research data sets linking patient data with the state death statistical master file. These data sets allow researchers to track mortality outcomes within and outside of the hospital.

Probabilistic Linked Death File (CCDF)

The Department of Health Care Access and Information (HCAI) is happy to announce the release of two new Linked Death products. The first product links 2014-2019 Patient Discharge Data (PDD) to the California Comprehensive Death File (CCDF) (CMIPDeath2014-2019) and the second links 2014-2019 Emergency Department (ED) Data to the CCDF (CMEDDeath2014-2019). These products are produced using a probabilistic linkage model developed jointly by ChoiceMaker, LLC and HCAI utilizing machine learning to mimic human intuition on record matching.

A brief comparison of the newly released linkage data products with the previously released products is available for download below. Full Data Dictionaries for this product can be found on the Data Documentation page. 

Probabilistic Linked Death File (VS_DSMF)

This data file provides a unique best match of a single death record to a patient’s last identifiable record in the Patient Discharge Data. Two versions of the probabilistic Linked Death file are available from 1990-2013.

  • Version A – Death records are linked to the last PDD discharge record, regardless of type of care.
  • Version B – Death records are linked to the last PDD discharge record for acute type of care.
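The difference between the two versions comes down to which PDD record is treated as a patient's "last" record. The sketch below illustrates only that selection step, not the probabilistic matching itself. The field names `patient_id`, `discharge_date`, and `type_of_care`, and the value `"acute"`, are hypothetical; dates are assumed to be ISO-formatted strings so that string comparison matches date order.

```python
def last_discharge(records, acute_only=False):
    """Return each patient's most recent discharge record.

    acute_only=False mirrors the Version A idea (any type of care);
    acute_only=True mirrors the Version B idea (acute care only).
    """
    latest = {}
    for rec in records:
        if acute_only and rec["type_of_care"] != "acute":
            continue
        pid = rec["patient_id"]
        # ISO date strings (YYYY-MM-DD) compare correctly as plain strings.
        if pid not in latest or rec["discharge_date"] > latest[pid]["discharge_date"]:
            latest[pid] = rec
    return latest
```

For a patient whose final stay was not acute care, the two versions would point to different discharge records, which matters when measuring time from discharge to death.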

Deterministic Linked Death File

The deterministic linkage, available beginning with calendar year 2005, links the state death statistical master file to the PDD, ED, and AS files. Requesting all three files can provide mortality outcomes for any inpatient, emergency department, or licensed ambulatory surgery care setting.

Coronary Artery Bypass Graft (CABG) Data

The Coronary Artery Bypass Graft (CABG) File is a research database created for the purpose of studying outcomes related to the most common surgical procedure for treating coronary artery disease. In this surgery, a vein or artery from another part of the body is used to create a new path for blood to flow to the heart, bypassing the blocked artery. Coronary artery disease is the leading cause of all adult non-maternal admissions to California hospitals, representing nearly 9% of all admissions. CABG data is collected from California-licensed hospitals where surgeons performed isolated CABG surgery, via the California Coronary Artery Bypass Graft (CABG) Outcomes Reporting Program (CCORP). The data is analyzed for quality of care reporting purposes, in compliance with California Health and Safety Code Sections 128745-128750. CCORP data is available for research purposes, subject to review and approval by HCAI. CABG files are available to qualified researchers beginning with the 2006 calendar year. See the Master Variable Grid for available years.

Frequently Asked Questions

How long does the process take?

Approximately 6-9 months, depending on the corrections needed.

How much does the data cost?

Please see the HCAI Data Pricing Policy webpage for detailed information. 

How will I receive my data?

We send data via an internal Secure File Transfer system. Find instructions here.

Who is eligible for research data requests?

Researchers from nonprofit degree-granting research institutions (i.e., universities) can apply for Research Datasets (formerly known as IPA). These researchers are eligible through the Information Practices Act (or “IPA,” CA Civil Code Section 1798 et seq.), which permits nonprofit educational institutions to request data for research purposes. These data sets are restricted to the “minimum variables necessary” and require approval through CPHS.

  • Research protocols must be approved by the Committee for the Protection of Human Subjects (CPHS). The CPHS protocol must be renewed annually and kept current while the researcher has the data. CPHS is the state Institutional Review Board for the California Health and Human Services Agency. CPHS approves requests at committee meetings, held six times per year.
  • Requests for Linked Birth and Linked Death data must also be processed by the Vital Statistics Advisory Committee at the California Department of Public Health.

State agencies are also eligible to obtain IPA files, based on the need to perform their constitutional or statutory duties. The use of the data must be compatible with the purpose for which the data was collected. The requesting agency must attest that its use of HCAI data supports a mandated activity, and the application must include the legal citation for that activity.

The contents of these IPA files are described in the Master Variable Grid and the Nonpublic Data Documentation. Additionally, detailed submission guidelines for the patient data are available for the current year.

What am I allowed to do with the data?

You may use the data only for the project that HCAI and CPHS approved. Within that project, you may analyze the data, prepare reports and articles, and publish your findings.

YOU MAY NOT

  • use the data for a different project
  • use someone else’s approved data for your own project
  • share the data with anyone not explicitly listed in the HCAI request form and Data Use Agreement
  • publish patient-level data or cell counts smaller than 11
  • change your scope of work in your protocol or your HCAI request form without proper approvals
  • keep patient-level data in your system after the project has ended
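The small cell rule in the list above can be applied to a table of counts before publication. The following is a minimal sketch, not an official HCAI procedure: it masks every count below 11, including zero, which is an assumption, and it does not handle complementary suppression (masking additional cells so that a hidden value cannot be derived from row or column totals).

```python
SMALL_CELL_THRESHOLD = 11  # counts below this value must not be published


def suppress_small_cells(counts, threshold=SMALL_CELL_THRESHOLD):
    """Replace counts below the threshold with a marker such as '<11'."""
    marker = f"<{threshold}"
    return {
        category: (n if n >= threshold else marker)
        for category, n in counts.items()
    }
```

For example, `suppress_small_cells({"County A": 25, "County B": 10})` keeps the count of 25 and replaces the count of 10 with the string `"<11"`.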

When is the data available? 

Patient-level data (PDD, ED, and AS) is generally available to eligible requestors annually by mid-July. Linked Birth data is current through 2012. Linked Death data is current through 2013.

Which should I submit first, my CPHS application or my HCAI application?

We recommend that you work on both applications in tandem. You will need to include a draft of your CPHS protocol in your initial request packet to HCAI. HCAI needs a completed request packet in order to perform a preliminary review and supply a pre-CPHS letter. You will then submit the pre-CPHS letter to CPHS as the departmental letter of support, which is required for CPHS approval. Once you receive CPHS approval, notify HCAI so that your request can move forward.

What documents does HCAI need with my request? 

You can find all supplemental documents on the HCAI website here. The Request Form Checklist outlines all of the documents HCAI may need. 

Why does it take longer to get PDD/Linked Birth or Death data?

The California Department of Public Health is required to review all requests that contain Birth and Death Certificate data, which includes our HCAI Patient data linked to Birth or Death Certificate data. We are not allowed to release PDD/Linked Birth data until it is approved by the CDPH Vital Statistics Advisory Committee. This review is in addition to the approval by HCAI and CPHS.

I want to link HCAI data to data from another source.  Is this possible?

It depends on what is being linked and how the data are being used. This is one of the factors we look at when reviewing your request. HCAI data cannot be used to re-identify actual patients. When a linkage of this nature is needed for a specific research project, details of how the linkage can occur without the identifiers being released to the researcher need to be discussed.

I have requested data before and have been approved. I would like to request the most recent year of data available. What should I do? 

If you have a previously approved request and want to request more recent years of data, you can submit a Supplemental request. You will need to select your original request from a drop-down menu in order to access the Supplemental request form. If your original request was submitted via the paper process, email DataandReports@hcai.ca.gov for instructions. The information from your previous request will be visible for reference while you fill out the Supplemental request, but it will be locked. You will only be able to edit the Supplemental sections of the form.

I am doing research at a medical center associated with a non-profit University. Can I request HCAI Data? 

Only nonprofit universities qualify to receive HCAI data. You will need to submit the request through the university.

My university is for profit. Do we still qualify for data? 

Unfortunately, no. Only nonprofit universities qualify to receive HCAI data under the Information Practices Act (CA Civil Code Section 1798 et seq.), which permits nonprofit educational institutions (such as the University of California) and state agencies to request data for research purposes and for performing legally mandated activities.

If an article is published from the project I did, am I required to give HCAI a copy?

You are not required to submit a copy of published materials or papers to us, but we like to know how our data is being used.

Do you publish my information on your website?

We currently list all approved IPA requests on the CHHS Open Data Portal, after the data has been released.