About
Assembly Bill 80 (Chapter 12, Statutes of 2020) gives the Department of Health Care Access and Information (HCAI) the authority to establish the Health Care Payments Database (HPD) Program. The HPD is California’s All Payer Claims Database (APCD): a research database composed of health care administrative data, namely the claims and encounters generated by transactions among payers and providers on behalf of insured individuals. The HPD collects claim and encounter data as submitted by California payers.
Information from the HPD System is intended to support greater health care cost transparency, to inform policy decisions on the provision of quality health care, and to reduce health care costs and disparities. It is also intended to support the development of innovative approaches, services, and programs that could deliver health care that is both cost-effective and responsive to the needs of all Californians.
To maximize its utility and value for California policymakers, researchers, and others interested in improving California’s healthcare system, HCAI intends for the HPD to be as comprehensive and complete as possible by increasing the quality, volume, and variety of data collected over time.
Data Release Program
HCAI is required to develop a comprehensive data access and release program and convene a Data Release Committee (DRC) to advise HCAI and review requests for access to nonpublic data.
HPD public reports, such as HPD Snapshot and HPD Measures, give potential applicants examples of the types of analyses that may be possible with access to non-public HPD data.
As California’s APCD, the HPD contains data never before made available for research and analysis, providing an all-payer, all-setting, statewide view of the California healthcare system. Like other APCDs, the HPD contains common information used for billing, such as a patient’s diagnosis, the procedure performed, and the amount paid for a claim.
APCDs have improved price transparency and have been used to influence policy, improve understanding of population health, reduce costs, and improve the provision and quality of health care. Carman, Katherine Grace, et al. “The History, Promise and Challenges of State All Payer Claims Databases: Background Memo for the State All Payer Claims Database Advisory Committee to the Department of Labor.” Report PR-A1396-1, U.S. Department of Health and Human Services, Office of the Assistant Secretary for Planning and Evaluation, 2 June 2021, aspe.hhs.gov/sites/default/files/private/pdf/265666/apcd-background-report.pdf.
Rhode Island used its APCD to identify $90 million in potential savings from reducing avoidable emergency room visits, and researchers used Virginia’s APCD to identify $586 million in unnecessary spending on low-value services. Colorado analysts used the state’s APCD to track EpiPen prescription costs, finding an increase of $400 per prescription between 2009 and 2016. Several states have used APCD data to track chronic disease rates and better understand the costs of treating those conditions. Porter, Josephine, and Denise Love. “The ABCs of APCDs.” California Health Care Foundation, 8 Oct. 2018, www.chcf.org/publication/the-abcs-of-apcds.
Available Data Files
The following data files are available for request for service years 2018-2023. The HPD uses the APCD Common Data Layout for its file formats; the layout describes all of the data elements available in the database. Potential applicants should review the data layout before submitting an application.
Medical Claims
This data file contains medical claims and encounters data for primary and specialist care, outpatient surgeries, inpatient stays, and laboratory testing, as well as home health care and nursing home care. It also includes diagnoses, procedures, charged amounts, and paid amounts (plan- and consumer-paid).
Pharmacy Claims
This data file contains pharmacy data, such as drug names, quantity dispensed, charged amounts, and paid amounts (plan- and consumer-paid). It also includes provider and pharmacy information, including name, specialty, location, and mail-order data.
Member Eligibility
This data file contains member identifiers, age, gender, and location data. It also includes payer/health plan information, the assigned primary care provider (PCP), and enrollment in a person-centered medical home (PCMH).
Provider Information
This data file contains provider identifiers, taxonomy/specialty, and location data.
Available Datasets
Data applicants can request the following types of datasets, each tailored to different levels of access, detail, and confidentiality. The HPD statute contemplates that users of HPD data should only have access to the minimum amount of confidential data necessary for an approved project or access to a dataset designed for an approved purpose. Read more about how HCAI data is protected.
1. Standard Limited: This record-level dataset includes enrollment, medical, pharmacy, and provider data for commercial health plans (including Medicare Advantage), or both commercial health plans and Medi-Cal. It is pre-built to be suitable for a variety of analyses, with new plan years of data added annually. It is called a “limited” dataset because all direct identifiers, and specified indirect identifiers, for patients, providers, and health plans have been removed. The Standard Limited dataset can be requested for access via the secure data enclave or can be directly transmitted to the user’s system.
2. Standard Limited Plus: This dataset is identical to the Standard Limited dataset, except it includes direct identifiers for providers and health plans. This dataset can be requested for access via the secure data enclave.
3. Custom Limited: These datasets are subsets of the Standard Limited or Standard Limited Plus datasets, designed during the collaborative request review process to best meet an applicant’s needs. They can be requested for access via the secure data enclave or direct transmission to the user’s system.
4. Research Identifiable: These record-level datasets include enrollment, medical, pharmacy, and provider data for commercial health plans (including Medicare Advantage), or both commercial health plans and Medi-Cal. They contain direct patient identifiers, which are typically used to link to other datasets. These datasets can be requested for access via the secure data enclave or direct transmission to the user’s system. Requests for Research Identifiable data will be fulfilled only after the HPD Data Release Committee recommends approval.
Standard Limited and Standard Limited Plus Dataset Request Information
Refer to the Data Dictionary for data element definitions and use the instructions to help develop your data request.
- Standard Limited and Standard Limited Plus Dataset Request Instructions
- Standard Limited and Standard Limited Plus Data Dictionary
Custom Limited and Research Identifiable Dataset Request Information
Use the Justification Grid to select each variable and provide justification for why they are needed to satisfy your project. You will upload this document to your data request through the Data Request Portal as part of your application. The Justification Grid also includes the data dictionary information for the Custom Limited and Research Identifiable datasets.
Methods of Access
There are two ways data applicants can access non-public data: (1) the HCAI Secure Data Enclave or (2) Direct Transmission. The HPD statute contemplates that the secure data enclave is the preferred way to safely access HPD data and that access outside the enclave should be limited, for example to cases where the intended use could not reasonably be achieved through the enclave.
Secure Data Enclave
The Secure Data Enclave is a centralized service for 24/7 remote access to sensitive data. It houses the data in a secure environment, protected in accordance with state and federal security and privacy rules, where it can be accessed via a virtual machine launched from the user’s computer. The Enclave comes equipped with a suite of statistical tools and provides scalable computing power to perform analysis on a high volume of data. Datasets can only be downloaded from the Enclave after they have been properly de-identified and after HCAI administrator review and approval is granted.
Direct Transmission
Direct Transmission is a method in which HCAI sends copies of program data, outside the secure data enclave, directly to an individual or organization. Requests for Direct Transmission will be fulfilled only after the HPD Data Release Committee recommends approval.
Pricing Policy
All applicants, except for state agencies, are required to pay the $100 application fee for new and supplemental requests. If the data request is approved, the application fee will be applied to the cost of the data. Otherwise, the fee is non-refundable. Pricing for HPD data requests varies based on access method. Secure Data Enclave requests will be charged by request type, number of users, and the project space. Direct Transmission requests will be charged per data product and volume.
Secure Data Enclave Requests
Secure Data Enclave requests are charged annually based on user access and request type. For the first year of the HPD Data Request Process, budget permitting, HCAI will provide a 20% reduction in the base price for Secure Data Enclave access requests:
Component (project space + seats + custom dataset, if applicable) | Reduced Price (year one only) |
Project space (required) | $4,000 |
+ Analyst, per seat | $4,000 |
+ Researcher, per seat | $4,800 |
+ if Custom Dataset (not the Standard Limited Dataset) | $7,200 |
The Project Space price will be charged for all requests and all dataset types (Standard Limited, Standard Limited Plus, Custom Limited, and Research Identifiable). The additional Custom Dataset price will be charged for requests that use Custom Limited or Research Identifiable datasets.
There are two types of seats for accessing the secure data enclave: Analyst and Researcher. Both access the enclave’s shared project folders and software query tools¹ through a virtual Windows desktop, where users can create custom reports and data products. The two seat types differ in the storage and compute power available to them:
1. Analyst
- Amazon Web Services (AWS) Performance Workspace with 2 virtual Central Processing Units (CPUs) and 8 Gigabytes (GB) of memory
- 5 Terabytes (TB) of data querying workspace/month
2. Researcher
- AWS Power Workspace with 4 virtual CPUs and 16 GB of memory
- 10 TB of data querying workspace/month
¹ Software query tools: Anaconda (for querying with Python), DataGrip (for querying with SQL), Microsoft Office (Excel, PowerPoint, Word), and RStudio (for querying with R).
Enclave users can also add software tools, storage, and compute to their workspace. These options are charged annually, per user:
Additional Options | Reduced Price (year one only) |
SAS Viya License, per user | $4,000 |
Stata/MP License, per user | $880 |
Tableau Creator License, per user | $744 |
Increase Redshift database storage/500GB/year | $250 |
Increase Amazon S3 file storage/500GB/year | $250 |
Increase data querying space 1TB/user/month | $10 |
To estimate the annual cost of your project, please use the formulas below:
Standard Limited and Standard Limited Plus
Project space ($4,000) + # of Analyst seats ($4,000 each) + # of Researcher seats ($4,800 each) + any additional software tools, storage, and compute options
Custom Limited and Research Identifiable
Project space ($4,000) + # of Analyst seats ($4,000 each) + # of Researcher seats ($4,800 each) + Custom dataset cost ($7,200) + any additional software tools, storage, and compute options
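The two enclave formulas above can be combined into one small calculator. This is a minimal sketch, not an HCAI tool: the function name and add-on labels are hypothetical, it assumes the year-one reduced prices listed above, and it leaves out the "increase data querying space" option because its billing unit (per TB, per user, per month, charged annually) is ambiguous. The $100 application fee is not added, since it is applied to the data cost if the request is approved.

```python
from typing import Dict, Optional

# Year-one reduced prices (USD/year) from the Secure Data Enclave tables above.
PROJECT_SPACE = 4_000      # required for every request
ANALYST_SEAT = 4_000       # per Analyst seat
RESEARCHER_SEAT = 4_800    # per Researcher seat
CUSTOM_DATASET = 7_200     # Custom Limited and Research Identifiable only

# Optional add-ons; the keys are hypothetical labels for the table rows above.
ADD_ON_PRICES = {
    "sas_viya": 4_000,         # per user
    "stata_mp": 880,           # per user
    "tableau_creator": 744,    # per user
    "redshift_500gb": 250,     # per 500 GB of Redshift storage
    "s3_500gb": 250,           # per 500 GB of S3 storage
}

STANDARD_TYPES = {"standard_limited", "standard_limited_plus"}
CUSTOM_TYPES = {"custom_limited", "research_identifiable"}


def enclave_annual_cost(
    dataset_type: str,
    analysts: int = 0,
    researchers: int = 0,
    add_ons: Optional[Dict[str, int]] = None,
) -> int:
    """Estimate one year of Secure Data Enclave charges, in USD (hypothetical helper)."""
    if dataset_type not in STANDARD_TYPES | CUSTOM_TYPES:
        raise ValueError(f"unknown dataset type: {dataset_type!r}")
    total = PROJECT_SPACE
    total += analysts * ANALYST_SEAT + researchers * RESEARCHER_SEAT
    if dataset_type in CUSTOM_TYPES:
        total += CUSTOM_DATASET
    # add_ons maps an add-on label to a quantity (licenses or 500 GB blocks).
    for label, quantity in (add_ons or {}).items():
        total += ADD_ON_PRICES[label] * quantity
    return total


print(enclave_annual_cost("standard_limited", analysts=1, researchers=1))  # 12800
print(enclave_annual_cost("custom_limited", analysts=2, add_ons={"sas_viya": 1}))  # 23200
```

The second call mirrors the Custom Limited formula: $4,000 project space + 2 × $4,000 Analyst seats + $7,200 custom dataset + one $4,000 SAS Viya license.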
Direct Transmission Requests
Direct Transmission costs are per dataset per year of data. Pricing varies depending on request type:
Standard Limited pricing
Data Table | Price/Year of Data |
Commercial Data | $20,000 |
Commercial + Medi-Cal Data | $22,000 |
Custom Limited and Research Identifiable Pricing
Data Table | Price |
Custom Dataset (charged once per request) | $7,200 |
Plus either all tables: | |
All Tables | $22,500 per year of data |
Or individual tables: | |
Medical Claims | $8,500 per year of data |
Pharmacy Claims | $7,000 per year of data |
Member Enrollment | $4,000 per year of data |
Provider Information | $3,000 per year of data |
The additional Custom Dataset price ($7,200) will be charged for requests that use Custom Limited or Research Identifiable datasets.
To estimate the price of your project, please use the formulas below:
Standard Limited Commercial Data Only
[Commercial Data ($20,000)] x number of years of data requested
Standard Limited Commercial + Medi-Cal Data
[Commercial Data + Medi-Cal Data ($22,000)] x number of years of data requested
Custom Limited and Research Identifiable
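[Custom Dataset ($7,200)] +
[Medical Claims ($8,500) x number of years of data requested] +
[Pharmacy Claims ($7,000) x number of years of data requested] +
[Member Enrollment ($4,000) x number of years of data requested] +
[Provider Information ($3,000) x number of years of data requested]
Omit any table you are not requesting. Requesting all four tables ($8,500 + $7,000 + $4,000 + $3,000 = $22,500 per year of data) is the same as the All Tables price.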
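The Direct Transmission formulas can likewise be sketched as two small functions. This is an illustrative calculator under stated assumptions, not an HCAI tool: the function names and dictionary keys are hypothetical, prices come from the tables above, and the custom dataset fee is added once per request, as in the formula.

```python
from typing import Iterable

# Direct Transmission prices (USD) from the tables above.
STANDARD_LIMITED_PER_YEAR = {
    "commercial": 20_000,
    "commercial_plus_medi_cal": 22_000,
}
CUSTOM_DATASET_FEE = 7_200  # added once per request, per the formula above
TABLE_PER_YEAR = {
    "medical_claims": 8_500,
    "pharmacy_claims": 7_000,
    "member_enrollment": 4_000,
    "provider_information": 3_000,
}


def standard_limited_price(coverage: str, years: int) -> int:
    """Standard Limited: flat price per year of data (hypothetical helper)."""
    return STANDARD_LIMITED_PER_YEAR[coverage] * years


def custom_price(tables: Iterable[str], years: int) -> int:
    """Custom Limited / Research Identifiable: one custom dataset fee
    plus the per-year price of each requested table (hypothetical helper)."""
    # set() avoids charging a table twice if it is listed twice.
    per_year = sum(TABLE_PER_YEAR[t] for t in set(tables))
    return CUSTOM_DATASET_FEE + per_year * years


print(standard_limited_price("commercial", 3))                  # 60000
print(custom_price(TABLE_PER_YEAR, 2))                          # 52200
print(custom_price(["medical_claims", "member_enrollment"], 3))  # 44700
```

Passing all four table names reproduces the All Tables option, because the four per-table prices sum to $22,500 per year of data.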
Invoicing
Once your request has been submitted, your assigned analyst will provide you with a preliminary invoice. The preliminary invoice is an estimated data price based on selected data products, data years, and access method. Changes made in those areas during the data request review process will affect the final price. The final data price will be provided when the request has been fully approved.
Price Reductions
HCAI will consider price reductions for applicants that show good cause for a reduction. For example, with sufficient justification, price reductions may be available for entities that:
1. Are consumer organizations, students or academic fellows, government organizations, or data submitters to the HPD, and
2. Are working on behalf of a non-profit organization that has a demonstrable financial hardship, and
3. Are working on projects in high-priority areas, or on projects that will lead to innovations benefiting the public at large, including projects that:
- Address health equity, or
- Address the health workforce, or
- Address affordability, or
- Produce open-source code that can be shared with HCAI, or
- Produce results that help evaluate or improve HPD data quality.
If an applicant requests a price reduction, HCAI will perform a pre-screening once it receives a complete application and the application fee, and may conditionally approve the reduction so that the applicant can seek any needed funding. Conditional approvals are contingent on approval of the data request and on the availability of funds for price reductions at the time the request is approved. Approvals are granted per project.
Funding Opportunities
The California Health Care Foundation notified HCAI that it is establishing a grant program to help offset the cost of HPD data for selected applicants. As a service to the HPD community, HCAI is providing a link for interested parties: https://www.chcf.org/rfp/request-for-proposals-california-healthcare-payments-database-affordability-research-fund/
HCAI is not affiliated with the California Health Care Foundation or its HPD grant program. Submitting a grant application to the California Health Care Foundation has no bearing on an HPD data request submitted to HCAI.
Uses of HPD Data
Can HPD data be used to answer your policy or research question?
The HPD can be used to answer questions about health care delivered to insured California patients, on topics such as:
- Systematic trends, such as comparing payers or counties
- Operational elements of the healthcare system, such as the utilization or cost of conditions and procedures
- Cost analyses based on claims data
The HPD cannot be used for:
- Research about individual patients, the uninsured population, people who self-pay for health services, or about populations residing outside of California
- Research requiring clinical information such as medical chart notes or lab results
- Cost analyses based on non-claims data. Read more about HCAI’s efforts to add non-claims payment information to the HPD.
HPD Data: What Is Included?
The table below outlines what is and what is not included in the HPD.
Category | Included | Not Included |
Geography | California residents | Non-California residents |
Populations | Insured populations | The uninsured population and any self-pay services |
Health Coverage Types | 1. Commercial health plans (fully insured) 2. Medicare Advantage plans 3. Medi-Cal, including the Genetically Handicapped Persons Program; Family Planning, Access, Care and Treatment; and California Children’s Services 4. Self-insured public payers with more than 40,000 self-funded lives 5. Medicare fee-for-service (available to state agencies only) | 1. Self-insured employers – Read more about HCAI’s efforts to encourage voluntary submission. 2. Military health plans 3. Fully insured plans under 40,000 covered lives 4. Supplemental insurance (including Medicare supplemental) 5. Stop-loss plans 6. Student health insurance 7. Chiropractic-only, discount, and vision-only insurance 8. Accident, disability, hospital indemnity, liability, long-term care insurance, specific disease policies, and workers’ compensation. |
Data Files | 1. Member Eligibility 2. Medical Claims 3. Pharmacy Claims 4. Dental Claims (not yet available for release) 5. Provider Information | 1. Clinical information such as medical chart notes or lab results 2. Non-claims payment information – Read more about HCAI’s efforts to add non-claims payment information to the HPD. |
Dates | Claims and encounters for services rendered from 2018-2023. Claims for later years are added on a two-year delay; for example, 2024 data will be available in 2026. | Claims and encounters for services rendered prior to 2018 |
Informational Webinar
Join us on January 15, 2025, for an informational webinar for researchers, state agencies, and other qualified applicants to hear from HCAI staff about the newly released HPD data request process, including how to submit data requests, the available dataset types, and application review timelines. HCAI staff will give a brief presentation and then respond to questions submitted in advance by attendees.
HCAI encourages attendees to submit questions in advance, preferably through the webinar’s registration form or by email at DataAndReports@hcai.ca.gov.
Resources
- HPD Data Use, Access, and Release Regulations
- HCAI DRC Manual
- HCAI DRC Public Meetings
- Standard Limited and Standard Limited Plus Data Dictionary
- Custom Limited and Research Identifiable Justification Grid
HCAI sends notifications and newsletters about program updates, proposed regulations, and HCAI initiatives; subscribe to receive them.
FAQs
Who is eligible to request HPD Data?
Any individual or organization is eligible to request HPD Data, including but not limited to state agencies, non-profit research institutions, non-profit educational institutions, hospitals, physician organizations, labor unions, self-insured employer plans, and consumer organizations.
How can I apply to access HPD data?
To start your application to access HPD data, please register on the HCAI Data Request Portal. Then, please submit a detailed application, answering the questions to the best of your ability and attaching necessary documentation. HCAI will then contact you for next steps.
How will I receive my data?
Depending on the nature of the request, you will receive the data through the Secure Data Enclave or Direct Transmission.
What is the HCAI Secure Data Enclave?
The HCAI Secure Data Enclave is a centralized service to remotely access sensitive data. It houses data in a secure environment and protects that data in accordance with state and federal security and privacy rules. The Enclave provides a scalable environment, and software tools, so a user can analyze a large volume of data regardless of their personal computer and storage resources. Data in the Enclave is accessed via a virtual machine operating on a remote server and launched from the user’s computer. The virtual machine is controlled by HCAI so that data products created on it can only be downloaded to the user’s computer with HCAI’s permission. Users cannot copy/paste, email out, or otherwise remove data from the Enclave. Only after HCAI verifies that the data products align with the user’s approved request and with CalHHS Data De-Identification Guidelines can they be downloaded to the user’s computer.
A user’s experience in the Enclave is tailored to their data request: HCAI’s Enclave makes approved data available to approved users for approved projects, and each project’s data is segregated from other projects’ data. Authorized users can upload their own datasets into the Enclave and link them to the HPD dataset. Enclave users have access to interactive, query-based tools and statistical tools (e.g., SQL, SAS, Stata, R, Python).
Standard Limited Datasets, Standard Limited Plus Datasets, Custom Limited Datasets, and Research Identifiable Data are available through the Enclave.
What is Direct Transmission?
Applicants approved for Direct Transmission will receive copies of HPD datasets outside of the Enclave, via a secure file transfer program.
To receive Direct Transmission, the applicant must describe why the HCAI Secure Data Enclave does not meet their needs.
Standard Limited Datasets, Custom Limited Datasets, and Research Identifiable Data are available through Direct Transmission. Standard Limited Plus Datasets are not available through Direct Transmission.
What am I allowed to do with the data?
You may use the data only for the purposes described in the approved data request and signed Data Use Agreement.
YOU MAY NOT:
- use the data for a different project.
- use someone else’s approved data for your own project.
- share the data with anyone not explicitly listed in the HCAI request form and Data Use Agreement.
- publish aggregated HPD data that includes cell sizes (counts) of less than 11.
- change your scope of work in your protocol or your HCAI request form without proper approvals.
- keep HPD data in your system after the project has ended.
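Before publishing aggregated results, one way to honor the minimum cell size rule is to mask small counts programmatically. The sketch below is hypothetical and covers only primary suppression: it replaces any count below 11 (including zero, following the rule as worded above) with a "<11" marker. It assumes counts are held in a plain dictionary; the CalHHS Data De-Identification Guidelines may require more, such as complementary suppression so that a masked cell cannot be recovered from published totals.

```python
from typing import Dict, Union

MIN_CELL_SIZE = 11  # counts below this value may not be published


def suppress_small_cells(
    counts: Dict[str, int], threshold: int = MIN_CELL_SIZE
) -> Dict[str, Union[int, str]]:
    """Replace every count below the threshold with a '<threshold' marker
    (primary suppression only; hypothetical helper)."""
    return {key: (n if n >= threshold else f"<{threshold}") for key, n in counts.items()}


table = {"County A": 250, "County B": 7, "County C": 11}
print(suppress_small_cells(table))
# {'County A': 250, 'County B': '<11', 'County C': 11}
```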
Why are there additional restrictions on research identifiable datasets, direct transmission access, and access to Medi-Cal data?
Additional restrictions are placed on research identifiable datasets because they contain sensitive information. Additional restrictions are placed on direct transmission because data use is more difficult to control outside of HCAI’s Secure Data Enclave. Additional restrictions are placed on Medi-Cal data because use of that data must be approved by DHCS (Department of Health Care Services) and meet the requirements of the Medi-Cal program.
How long does it take to get approved for HPD data access?
The approval timeline for HPD applications varies by request type. HCAI’s goal is to process and approve data requests within 120 days; however, timing depends on several factors, including the requestor’s responsiveness, the level of detail in the application, the type of data requested, and the data access method. Requests for commercial-only Standard Limited Datasets accessed via the secure data enclave will have the quickest turnaround.
What does the application review process look like?
All applications will be reviewed by HCAI, with additional levels of review for more complex requests. Applications requesting Medi-Cal data, research identifiable data, or direct transmission access will require further review from the Department of Health Care Services, the HPD Data Release Committee, and/or the Committee for the Protection of Human Subjects, as appropriate.
What is the role of the HPD Data Release Committee?
The HPD Data Release Committee was established by state law to advise HCAI on issues of data privacy and security and to review specified applications for non-public data. The Data Release Committee is made up of health care payers, providers, purchasers, researchers, consumers, and labor with knowledge and experience with health care data, privacy, and security.
The Data Release Committee has adopted an HPD DRC Manual, which includes the DRC Considerations for Application Review that committee members apply when reviewing data requests. Applicants may find the manual useful when developing their data requests.
Do I need approval from CPHS for my project?
For Research Identifiable datasets, the HPD requires that the applicant’s project be approved by the Committee for the Protection of Human Subjects (CPHS). CPHS is a state entity outside of HCAI and is currently administered by the California Health and Human Services Agency’s Center for Data Insights and Innovation.
Which should I submit first, my CPHS application or my HCAI application?
For requests that require CPHS approval, we recommend that you do both processes in tandem. You will submit to HCAI a draft of your CPHS protocol as you go through HCAI’s preliminary review of your request, after which HCAI will provide you a pre-CPHS Letter of Acknowledgement. You will need to submit the pre-CPHS letter to CPHS as the departmental letter of support, which is required for CPHS approval. Once you receive CPHS approval, you will need to notify HCAI and upload the CPHS Approval Letter before HCAI’s final approval of your request.
To avoid unnecessary delays in the HPD data request process, please make sure the information you submit to CPHS is consistent with the information you submit to HCAI.
Why is there a cost for accessing HPD data?
The HPD statute contemplates data user fees as a component of HPD program funding. HCAI uses money received from data requestors to cover the associated costs to administer and operate the HPD program and to provide data to users. To best serve those interested in the HPD program, HCAI will need to develop a long-term funding mechanism to sustain the program. In March 2023, HCAI submitted a report to the Legislature on recommendations for funding options for the program. Read the funding options report.
When will dental claims and non-claims payments data be collected and available?
Dental claims data will be collected beginning in 2024, while non-claims payments data will be collected beginning in 2025.
What are tips for a successful application?
- The top of the application will allow you to select the type of data you are requesting (Limited, Custom or Research) and the requested data access method (Enclave or Direct Transmission). These selections will change the content of the form appropriately.
- Be sure to fill out all fields in the application form with complete details.
- You will be required to indicate which HPD goals your application aligns with or offers significant opportunities to achieve. The goals of the program can be found here.
- Indicate if your organization submits data to the HPD program under Organization Information.
- If you are requesting Research Identifiable Data or Custom Limited Data, complete the Justification Grid beforehand so you can attach it to the application. For more information on the Justification Grid, see Custom Limited and Research Identifiable Dataset Request Information above.
- If appropriate for your request, complete and upload a draft of the CPHS Protocol and attach to your data request form in the HCAI data request portal.
- All individuals who will have access to the HPD data must be listed in the Data Access table.
- Requests for secure data enclave access must complete the Data Users table, for the setup of each data user’s account within HCAI’s Secure Data Enclave environment.
- If you plan to utilize external datasets with the requested HPD data, include information about those datasets in the Linkage section of the application form.
- To make sure your data security practices align with HCAI’s requirements, review HCAI’s Security Guidelines.
- Once you complete your application and pay the application fee, you will be brought to a page that will show your application status and state “CSXXXXXXX Created”. This is your case number. Please save this for future reference.
- Familiarize yourself with the HPD DRC Manual for the considerations that committee members will apply in reviewing data requests.
- To reduce turnaround time for the data request, please be responsive to any questions or comments from HCAI.
How do I contact HCAI if I am having problems registering in the HCAI Data Request Portal, or I have questions before starting my application?
If you are having problems registering in the data request portal or have questions before starting your application, please email dataandreports@hcai.ca.gov.
Feedback
HCAI will continue to improve the accessibility and usefulness of the HPD as the database becomes more comprehensive and HCAI builds its capacity over time.
HCAI wants your feedback on the published HPD public reports, current or future potential uses of HPD data, and ideas for how to evolve the HPD program. Share your feedback with HCAI staff.