Name
#158 A Machine Learning-Directed Knowledge Discovery Case Study Characterizing Post-COVID-19 Conditions Using Military Health System Data
Content Presented On Behalf Of:
Uniformed Services University
Session Type
Poster
Date
Tuesday, March 3, 2026
Start Time
5:00 PM
End Time
7:00 PM
Location
Prince Georges Expo Hall E
Focus Areas/Topics
Technology
Learning Outcomes
1.) Assess the potential of machine learning pathways as a method for knowledge discovery and outcomes research for complex diseases within the Military Health System.

2.) Refine a comprehensive set of ICD-10 codes into a concise list of candidate diagnoses using clinical classification software and a multi-step machine learning approach.

3.) Interpret a consensus directed acyclic graph generated by a Bayesian Network to explore theoretical relationships between different diagnoses.
Description
Background: The Military Health System Data Repository (MDR) is a rich source of population-level data, but the structure of these data makes them challenging to use for knowledge discovery and clinically relevant outcomes research. Post-COVID-19 conditions (PCC) present a significant public health challenge because they span a wide array of new or persistent symptoms. The complex, multi-systemic nature of PCC makes it difficult to differentiate from other medical conditions. To advance our understanding and inform potential clinical interventions, this methodological study combined feature selection with Bayesian Networks to identify novel manifestations of PCC and model the relationships between PCC-related diagnoses.

Methods: We conducted a retrospective cohort study using MDR records from July 2018 to June 2023. The study identified 269,900 active duty service members aged 18-64 with COVID-19, identified by either laboratory results or an ICD-10 diagnosis code, and matched them 1:1 with uninfected controls on age, sex, and beneficiary status. All encounter data, including ICD-10 diagnoses, were collected for one year of follow-up. To restrict the analysis to incident PCC, any condition recorded for an individual in the year before COVID-19 infection was removed. The remaining ICD-10 codes were mapped to 543 clinically relevant categories using the Healthcare Cost and Utilization Project's Clinical Classifications Software (HCUP CCS). Candidate PCCs were identified using a Lasso regression model, with 10-fold cross-validation used to stabilize the choice of regularization strength. Separately, the maximum number of iterations was tuned using a Tree-structured Parzen Estimator (TPE) algorithm. The resulting regularization strength and maximum iteration values defined the final stable model used for feature extraction. The selected HCUP CCS clinical groups were used to construct a single consensus Bayesian Network structure.
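The incident-condition step in the Methods (dropping any condition already recorded in the year before infection) can be illustrated with a minimal sketch. This is not the study's code: the function name, the 365-day windows, the data layout, and the category labels are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumption: "one year" is treated as 365 days for both windows.
LOOKBACK = timedelta(days=365)
FOLLOW_UP = timedelta(days=365)


def incident_categories(encounters, index_date):
    """Return categories seen during follow-up but not in the look-back year.

    encounters: iterable of (encounter_date, category) pairs for one person
                (hypothetical layout, one row per coded diagnosis).
    index_date: date of COVID-19 identification, or the matched index date
                for an uninfected control.
    """
    prior = {cat for d, cat in encounters
             if index_date - LOOKBACK <= d < index_date}
    follow_up = {cat for d, cat in encounters
                 if index_date <= d <= index_date + FOLLOW_UP}
    # A category counts as incident only if it is new after the index date.
    return follow_up - prior


# Illustrative records; the labels are placeholders, not official HCUP CCS names.
example = [
    (date(2019, 1, 1), "Tinnitus"),              # older than the look-back window
    (date(2020, 3, 1), "Headache"),              # inside the look-back window
    (date(2021, 2, 1), "Headache"),              # recurs in follow-up, so excluded
    (date(2021, 3, 15), "Malaise and fatigue"),
    (date(2021, 6, 1), "Tinnitus"),
]
incident = incident_categories(example, index_date=date(2021, 1, 10))
```

In this toy record, the headache is excluded because it was already present in the look-back year, while tinnitus is retained because its earlier occurrence falls outside that window.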
We generated 1,000 bootstrap samples from the data and fitted a Bayesian Network structure to each sample using the hill-climbing structure learning algorithm, scored with the Bayesian Dirichlet equivalent (BDe) score. The individual network structures were averaged to form a single consensus directed acyclic graph (DAG), with arc strengths weighted by the likelihood of each bootstrapped model.

Results: Feature selection identified 8 candidate clinical groups with positive Lasso coefficients, a 98.5% reduction from the original 543 categories. Most were conditions already recognized as PCC, such as respiratory symptoms, malaise, headaches, and joint pain. Feature selection also surfaced potential candidate relationships with personality disorders and disorders of the ear (tinnitus).

Conclusions: Feature selection followed by a Bayesian Network-generated DAG allowed an efficient evaluation of the relationships between candidate PCC diagnoses. This machine learning-based analytical pathway is a generalizable method for efficiently investigating complex sequelae and novel disease associations, and for generating hypotheses for targeted clinical investigations and disease management. Discovered conditions can then be validated using traditional epidemiological and statistical techniques to further elucidate associations.
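The bootstrap-averaging step described in the Methods can be sketched as weighted arc-frequency averaging. The sketch below is illustrative only and is not the authors' implementation: the function name, the 0.5 inclusion threshold, the equal default weights, and the toy arcs are assumptions, and it takes the per-bootstrap structures as given rather than learning them. It also does not enforce acyclicity, which a real consensus DAG would require.

```python
from collections import defaultdict


def consensus_arcs(bootstrap_dags, weights=None, threshold=0.5):
    """Average bootstrap structures into arc strengths and keep strong arcs.

    bootstrap_dags: list of sets of (parent, child) arcs, one per bootstrap fit.
    weights: optional per-model weights (for example, normalized likelihoods);
             equal weights are assumed when omitted.
    threshold: assumed cut-off for keeping an arc; the abstract states none.
    """
    if weights is None:
        weights = [1.0] * len(bootstrap_dags)
    total = sum(weights)
    strength = defaultdict(float)
    for arcs, w in zip(bootstrap_dags, weights):
        for arc in arcs:
            # Each model contributes its share of the total weight.
            strength[arc] += w / total
    kept = {arc: s for arc, s in strength.items() if s >= threshold}
    return dict(strength), kept


# Four toy bootstrap structures over placeholder diagnosis groups.
dags = [
    {("Respiratory", "Malaise")},
    {("Respiratory", "Malaise"), ("Headache", "Malaise")},
    {("Respiratory", "Malaise")},
    {("Malaise", "Respiratory")},
]
strength, kept = consensus_arcs(dags)
```

With equal weights, an arc's strength is simply the share of bootstrap structures that contain it, so here only the arc from the respiratory group to malaise (present in three of four structures) survives the assumed threshold.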