“Our goal was to develop a model that ranks variants by disease severity — providing a prioritized, clinically meaningful view of a person’s genome,” said co-senior author Debora Marks, professor of systems biology in the Blavatnik Institute at HMS.
The team hopes that popEVE can help clinicians diagnose single-variant genetic diseases — especially rare diseases — more quickly and accurately. The model could also be used to identify new drug targets for genetic conditions.
The tool complements efforts across the HMS community to conduct research, build AI tools, and engage in nationwide collaborations to improve the diagnosis and treatment of rare diseases.
Turning EVE into popEVE
As genomic sequencing has become more accessible, physicians have had access to an increasing amount of information about their patients’ genetic variants.
However, when a variant’s link to disease is poorly understood, determining whether it is responsible for a patient’s condition tends to be time-consuming, inefficient, and sometimes fruitless. As a result, many patients with rare or unique genetic diseases remain undiagnosed for years.
Several years ago, the Marks Lab developed a generative AI model called EVE that uses deep evolutionary information from different species to learn patterns of mutations that are highly conserved in biology. EVE can then make predictions about how variants in human genes affect protein function.
But EVE couldn’t easily compare variants on different human genes to determine which might be the most problematic for health. The same is true of other variant prediction models that have emerged in recent years, the researchers said.
The team believed that finding a better way to compare variants across genes might help clinicians choose which variants to prioritize in their research when trying to diagnose and care for patients, said Rose Orenbuch, a research fellow in the Marks Lab and lead author on the new paper.
To create popEVE, the researchers added two components to EVE: a protein language model, which learns from the amino acid sequences that make up proteins, and human population data that captures natural genetic variation. In doing so, they were able to calibrate the model so that the score it produces for each variant can be compared across genes.
Because popEVE combines cross-species and within-species information, it reveals how much a variant affects protein function as well as the importance of that variant for human physiology, Marks explained.
Putting popEVE through its paces
When the researchers tested popEVE on documented variants and case studies, they found that it successfully:
- Distinguished between pathogenic and benign variants.
- Discerned healthy controls from patients with severe developmental disorders.
- Determined whether a variant was likely to cause death in childhood or adulthood.
- Assessed whether an alteration was inherited or occurred randomly, even without having parental genetic information.
Importantly, the model did not show ancestry bias: it did not perform worse for people from underrepresented genetic backgrounds, and it did not overpredict the prevalence of pathogenic variants.
The researchers then applied popEVE to a cohort of around 30,000 patients with severe developmental disorders who had not yet received a diagnosis.
“These are diseases that we assumed were genetic and caused by a single variant based on their severity, but the variant hadn’t been found,” said Orenbuch.
The analysis led to a diagnosis in about one-third of cases.
Perhaps most notably, the model identified variants in 123 genes that had not previously been linked to developmental disorders, essentially finding the likely genetic causes of those disorders. In fact, 25 of these genes have since been independently confirmed by research in other labs to cause the disorders.
Moving popEVE into the clinic
Marks and colleagues are now working on making popEVE available to clinicians and researchers to use and validate in the real world.
Scientists can access popEVE via an online portal.
The team is also collaborating with organizations including the Children’s Rare Disease Collaborative at Boston Children’s Hospital, the Division of Human Genetics at the Children’s Hospital of Philadelphia, and Genomics England in partnership with the Wellcome Sanger Institute.
Marks reports that a clinician-researcher at Centro Nacional de Análisis Genómico in Barcelona, Spain, has been using popEVE to interpret variants in his patients — information that has helped him make several rare-disease diagnoses.
“I feel like we are a step closer to popEVE being useful in the day-to-day pipeline of trying to diagnose genetic diseases faster,” Orenbuch said.
She added that she is especially excited about the model’s potential for patients who have been unable to receive a diagnosis through standard methods.
“These are the cases where we have to look outside of the known disease genes, and popEVE has already found a lot of gene candidates,” she said.
The team noted that while popEVE will need to be further verified to ensure its safety and accuracy before it is widely adopted in the clinic, they hope it can eventually increase clinicians’ confidence in using computational models for genetic diagnoses.
The researchers are also integrating popEVE scores into existing variant and protein databases such as ProtVar and UniProt, which will allow scientists worldwide to use the model to compare variants across genes.
By pinpointing the genetic origins of rare or complex diseases, the researchers noted, popEVE may also identify new targets and avenues for drug development.
“We think prioritizing variants based on predicted disease severity will improve the odds of diagnosis and ultimately pave the way for better treatment and drug discovery,” Marks said.
