Name
#175 Data Harmonization in the Age of AI: A 3 Part Framework for Success Across the Department of Health and Human Services
Speakers
Content Presented On Behalf Of:
USPHS
Session Type
Poster
Date
Tuesday, March 3, 2026
Start Time
5:00 PM
End Time
7:00 PM
Location
Prince Georges Expo Hall E
Focus Areas/Topics
Technology
Learning Outcomes
Following this session, the attendee will be able to: (1) Understand how disparate data affects the effective use of AI across the HHS; (2) Describe three elements of the framework that facilitate more effective and efficient AI use; (3) Summarize AI's strengths and limitations and the importance of good data.
Description
The core mission of the U.S. Department of Health and Human Services (HHS) is to enhance the health and well-being of all Americans. The HHS has a comprehensive strategic plan for the integration of Artificial Intelligence (AI) that guides its initiatives across research, regulatory oversight, and administrative efficiency. However, data harmonization is critical for unlocking the full potential of AI across the HHS. The HHS operates through 13 agencies, known as Operating Divisions (OpDivs), that collect vast amounts of data across various domains, such as clinical trials, electronic health records, public health surveillance, and claims processing. This data is often stored in siloed systems with heterogeneous formats, inconsistent ontologies, and varying data quality standards. This fragmentation presents a major impediment to developing robust, generalizable, and equitable AI models, which require large, diverse, and clean datasets for training and validation. This work explores the challenges, strategies, and benefits of achieving comprehensive data harmonization to support HHS's AI initiatives, ultimately aiming to improve public health outcomes, enhance operational efficiency, and accelerate biomedical discovery. Our three-part framework consists of (1) Standardization and Interoperability: Adopting common data models (e.g., OMOP, FHIR) and standardized terminologies (e.g., LOINC, SNOMED CT) by establishing clear, shared definitions and mapping existing data to these agreed-upon standards; (2) Infrastructure Development: Building secure, scalable, and federated data platforms to promote data sharing and distributed AI model training (e.g., federated learning) while maintaining privacy and security compliance (e.g., HIPAA), ideally within government enclaves; and (3) Governance and Policy: Establishing a strong data governance framework within HHS with buy-in from all OpDivs.
Governance also necessitates clear policies for data ownership, access, quality assessment, and ethical use to ensure consistency and trust. It is hoped that this framework will enhance operational efficiency by using AI to automate administrative tasks, detect fraud and abuse in healthcare programs, and optimize resource allocation across departments. In addition, data harmonization enables the creation of datasets that accurately reflect diverse populations, which helps ensure that AI tools are effective and fair for all Americans and helps address health disparities. Lastly, harmonized and well-curated data significantly enhances the efficacy and scope of AI applications within HHS by supporting more accurate and less biased AI models for predicting disease outbreaks, identifying at-risk populations, and personalizing treatment protocols, which in turn could help accelerate research and discovery. Specifically, combining diverse datasets facilitates large-scale meta-analyses and deeper biological insights, accelerating drug discovery, clinical trial optimization, and the understanding of complex diseases. Achieving robust data harmonization is a prerequisite for realizing the transformative potential of AI across the HHS mission. It requires a sustained, collaborative effort across all agencies, a commitment to standardized practices, and investment in modern data infrastructure and its governance. This foundational work will empower the HHS to leverage data-driven intelligence to improve the health and well-being of the nation effectively and efficiently.