An artificial intelligence–enabled electronic health record phenotyping approach identified 74,560 of 457,950 adults with COVID-19 as meeting criteria for postacute sequelae of SARS-CoV-2 infection (PASC), according to a retrospective cohort study published in JAMA Network Open. The patients were treated across 58 hospitals and affiliated clinics in 4 US regions.
The study included adults aged 18 years or older with laboratory-confirmed SARS-CoV-2 infection or a COVID-19 diagnosis code in New England, Southeast Texas, Southern California, and Western Pennsylvania. Researchers analyzed electronic health record data from 2017 to 2025 and assessed temporal trends from 2020 to 2024.
The Precision Phenotyping for Research Cohorts (P2RC) algorithm operationalized the World Health Organization case definition for PASC by identifying symptom patterns occurring at least 3 months following infection and persisting for at least 2 months. The algorithm also attempted to account for alternative diagnostic explanations and exclude sequelae explained by preexisting conditions. In prior validation work cited by the researchers, the algorithm achieved 79.9% precision. The researchers did not conduct electronic health record review validation in Southeast Texas, Southern California, or Western Pennsylvania, leaving site-specific precision and recall in those regions unquantified. They instead pointed to the algorithm’s consistent performance across 4 demographically distinct populations, ranging from a single academic medical center to a 40-hospital system, as evidence of robustness.
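The timing portion of that case definition can be illustrated with a minimal sketch. This is not the P2RC algorithm, which the article does not specify in detail. The function name, the 90-day and 60-day approximations of "3 months" and "2 months," and the use of first and last documented symptom dates as a proxy for persistence are all assumptions made for illustration. The sketch also ignores the exclusion of alternative diagnoses and preexisting conditions.

```python
from datetime import date, timedelta

# Assumptions for illustration only: the article gives months, not days.
ONSET_DAYS = 90    # "at least 3 months following infection"
PERSIST_DAYS = 60  # "persisting for at least 2 months"


def meets_temporal_criteria(infection_date, symptom_dates):
    """Return True if symptom records fit the timing window described above.

    Hypothetical helper: keeps only symptom records dated on or after
    infection + ONSET_DAYS, then checks that the first and last of those
    records are at least PERSIST_DAYS apart.
    """
    window_start = infection_date + timedelta(days=ONSET_DAYS)
    late = sorted(d for d in symptom_dates if d >= window_start)
    if len(late) < 2:
        # A single late record cannot show persistence in this sketch.
        return False
    return (late[-1] - late[0]).days >= PERSIST_DAYS


infection = date(2021, 1, 1)
persistent = meets_temporal_criteria(infection, [date(2021, 4, 15), date(2021, 7, 1)])
too_early = meets_temporal_criteria(infection, [date(2021, 2, 1), date(2021, 4, 10)])
```

In this toy example, the first patient has two symptom records after the 90-day mark that are more than 60 days apart, so the check passes. The second has only one record after the 90-day mark, so it fails.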
Overall, the algorithm identified PASC in 16% of patients with COVID-19. Regional prevalence was 19% in New England, 20% in Southeast Texas, 23% in Southern California, and 14% in Western Pennsylvania. Patients identified with PASC were older, had higher comorbidity burdens, and were more likely to be female than the overall COVID-19 cohort.
The researchers framed these findings against existing surveillance benchmarks, noting that prior research has shown the U09.9 diagnostic code captures fewer than 1% of COVID-19 survivors, while broader code-based approaches identify roughly 7%. The study did not directly compare its phenotyping-derived prevalence with site-specific U09.9 counts, relying instead on previously published sensitivity estimates. By that comparison, the 16% prevalence identified by the phenotyping approach was more than twice the roughly 7% captured by broader code-based approaches.
“Current diagnostic coding captured only a fraction of affected individuals, leaving the majority invisible to surveillance systems,” wrote lead study author Jiazi Tian, MSc, of the Department of Medicine at Massachusetts General Hospital in Boston, and colleagues.
Among 883 International Statistical Classification of Diseases, Tenth Revision, Clinical Modification codes associated with PASC manifestations, 594 codes, or 67%, were classified as chronic or potentially chronic conditions. Only 36 codes, or 4%, were classified as acute, self-limited conditions.
Overall, 66,587 patients with PASC, or 89%, had at least 1 chronic condition requiring ongoing clinical management. This represented 15% of all patients with COVID-19 in the cohort.
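The reported percentages can be reproduced from the counts given in the article. The short check below uses only those counts. The helper name `pct` and rounding to the nearest whole percent are assumptions about how the figures were presented.

```python
def pct(numerator, denominator):
    """Percentage rounded to the nearest whole number (assumed presentation)."""
    return round(100 * numerator / denominator)


# Counts as reported in the article.
covid_cohort = 457_950
pasc_patients = 74_560
pasc_with_chronic = 66_587
pasc_codes = 883
chronic_codes = 594
acute_codes = 36

overall_prevalence = pct(pasc_patients, covid_cohort)      # share of cohort with PASC
chronic_among_pasc = pct(pasc_with_chronic, pasc_patients)  # share of PASC with a chronic condition
chronic_among_all = pct(pasc_with_chronic, covid_cohort)    # share of cohort with chronic PASC
chronic_code_share = pct(chronic_codes, pasc_codes)
acute_code_share = pct(acute_codes, pasc_codes)

print(overall_prevalence, chronic_among_pasc, chronic_among_all,
      chronic_code_share, acute_code_share)
# prints: 16 89 15 67 4
```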
Systemic manifestations were the most common PASC manifestations across regions, accounting for approximately 23% to 25% of cases, followed by respiratory manifestations at 14% to 19% and gastrointestinal manifestations at 13% to 17%.
The researchers also reported statistically significant organ system heterogeneity across regions. Endocrine findings varied by site: thyroid manifestations were overrepresented in New England, whereas Southeast Texas, Southern California, and Western Pennsylvania showed more metabolically dominant endocrine patterns. The researchers cautioned that these regional differences could reflect local coding practices or population-level comorbidity patterns rather than biological differences.
Between the second quarter of 2020 and the second quarter of 2024, cumulative PASC prevalence increased slightly across all 4 regions. Negative binomial regression showed statistically significant quarterly increases in New England, Southern California, and Western Pennsylvania, while Southeast Texas showed a similar but nonsignificant trend.
The researchers noted that cumulative prevalence should be interpreted as the proportion of patients with COVID-19 who had ever been identified as having PASC, not as an estimate of current active cases, because resolution of conditions could not be reliably determined from retrospective electronic health record data.
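The distinction between "ever identified" and "currently active" can be sketched as follows. The article does not give the researchers' exact formula, so the denominator used here (all patients with COVID-19 up to each quarter) is an assumption, as is the function name. The quarterly counts are made-up toy numbers, not study data.

```python
from itertools import accumulate


def cumulative_prevalence(new_covid, new_pasc):
    """Per period, the share of all COVID-19 patients so far ever flagged with PASC.

    Hypothetical helper: patients are never removed from the numerator,
    so resolved cases still count toward the cumulative figure.
    """
    cum_covid = list(accumulate(new_covid))
    cum_pasc = list(accumulate(new_pasc))
    return [p / c for p, c in zip(cum_pasc, cum_covid)]


# Made-up toy counts per quarter, for illustration only.
new_covid = [100, 100, 200]
new_pasc = [10, 20, 30]
trend = cumulative_prevalence(new_covid, new_pasc)
```

Because the numerator only accumulates, a figure computed this way says nothing about how many of those patients still have symptoms, which is the caveat the researchers raised.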
The study had several limitations. The algorithm depended on electronic health record documentation and may have underestimated PASC among patients with limited or fragmented health care engagement. Electronic health record review validation was not conducted in Southeast Texas, Southern California, or Western Pennsylvania. The study also lacked a COVID-19–negative comparator group, limiting the ability to quantify excess incidence above background rates. Temporal association did not establish causation, and coincidental incident chronic disease could not be fully excluded.
“These findings suggest that approximately 1 in 6 patients with COVID-19 develops postacute sequelae, predominantly chronic conditions currently invisible to surveillance systems,” the researchers concluded.
The National Institutes of Health funded the study through awards from the National Institute of Allergy and Infectious Diseases and the National Center for Advancing Translational Sciences. Co–study researcher Jonas Hügel, PhD, reported grant support from the German Academic Exchange Service and the German Research Foundation during the study period. No other disclosures were reported.
Source: JAMA Network Open