AI-MARRVEL: A leap forward in diagnosing genetic diseases with over 98% precision


In a recent study published in NEJM AI, researchers developed an artificial intelligence (AI)-based Model Organism Aggregated Resources for Rare Variant ExpLoration (MARRVEL) model to select causal genes and their variants for Mendelian disorders based on clinical characteristics and genetic sequences.

Study: AI-MARRVEL: A Knowledge-Driven AI System for Diagnosing Mendelian Disorders. Image Credit: Antiv/


Millions of people worldwide are born with genetic disorders, typically Mendelian disorders caused by single-gene mutations. Identifying these mutations takes effort and requires significant expertise.

Comprehensive, systematic, and efficient procedures could increase diagnostic speed and accuracy. AI has shown potential but has had only modest success in primary diagnosis.

Bioinformatics-based reanalysis is less costly but has limited accuracy, makes prioritizing non-coding variants tedious, and requires the use of simulated data.

About the study

In the present study, researchers introduce AI-MARRVEL (AIM), a knowledge-driven AI model for diagnosing Mendelian disorders.

AIM is a machine-learning classifier that combines over 3.5 million variants from thousands of diagnosed cases with expert-engineered features to improve molecular diagnosis. The team benchmarked AIM on patients from three cohorts and developed a confidence score to find diagnosable cases in pools of unsolved cases.

They trained AIM on high-quality samples and expert-engineered features. They tested the model on three patient datasets for various applications, such as dominant, recessive, and trio diagnosis, novel disease gene discovery, and large-scale reanalysis.

Researchers collected Human Phenotype Ontology (HPO) terms and exome sequences from three patient cohorts: DiagLab, the Undiagnosed Diseases Network (UDN), and the Deciphering Developmental Disorders (DDD) project. They split the DiagLab data into training and testing datasets and tested DDD and UDN separately.

They guided AIM with knowledge-driven feature engineering, which used clinical expertise and genetic principles to select 56 raw features such as minor allele frequency, disease database, evolutionary conservation, variant impact, phenotype matching, inheritance pattern, variant pathogenicity prediction scores, gene constraint, sequencing quality, and splicing prediction.

The team created six modules for genetic diagnostic decision-making, resulting in 47 additional features. They used random forest classifiers as the primary AI algorithm and consulted benchmarking publications and top-performing tools.

They used features such as SpliceAI to prioritize splicing variants. They developed the AIM-without-VarDB model to analyze the effect of erroneous phenotypic data.

They used the "feature climbing" approach to evaluate the contribution of each feature and to categorize all features according to their biological significance.

The researchers developed a cross-sample score to estimate the probability that a diagnostic variant can be successfully identified in a patient using AIM.

They divided patients into two groups based on their confidence level: those with high confidence underwent manual review, while those with low confidence underwent reanalysis.

They constructed four confidence tiers, applied them to UDN and DDD samples, and evaluated them by distinguishing positive patients from negative ones and from unaffected relatives of de novo patients.
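The triage logic described above can be sketched in a few lines of Python. This is only an illustration: the article does not report the cut-offs or tier labels used in the study, so `TIER_CUTOFFS`, `TIER_NAMES`, and the assumption that the two upper tiers count as "high confidence" are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical cut-offs on a 0-1 confidence score. The study's actual
# thresholds and tier labels are not given in this article.
TIER_CUTOFFS = [0.25, 0.50, 0.75]
TIER_NAMES = ["low", "medium", "high", "very high"]


def confidence_tier(score: float) -> str:
    """Map a confidence score in [0, 1] to one of four tiers."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    # bisect_right counts how many cut-offs are <= score, which is the tier index.
    return TIER_NAMES[bisect_right(TIER_CUTOFFS, score)]


def triage(score: float) -> str:
    """Send high-confidence cases to manual review and the rest to reanalysis."""
    # Assumption: the two upper tiers are treated as "high confidence".
    tier = confidence_tier(score)
    return "manual review" if tier in ("high", "very high") else "reanalysis"
```

Under these assumed cut-offs, a case scoring 0.9 would be queued for manual review, while a case scoring 0.3 would be held for later reanalysis.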


AIM dramatically increased genetic diagnostic accuracy, doubling the number of solved cases relative to benchmarked approaches in three real-world cohorts. AIM attained a 98% precision rate and identified 57% of diagnosable cases in a collection of 871 cases.
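For readers less familiar with these two metrics, the short sketch below shows how precision and the share of diagnosable cases recovered (recall) are typically computed. The counts in the example are invented for illustration and are not the study's confusion matrix.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of cases flagged as diagnosable that really are diagnosable."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no cases were flagged")
    return true_positives / flagged


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of all diagnosable cases that were flagged."""
    diagnosable = true_positives + false_negatives
    if diagnosable == 0:
        raise ValueError("no diagnosable cases")
    return true_positives / diagnosable


# Invented counts, for illustration only (not taken from the study).
example_precision = precision(true_positives=45, false_positives=5)
example_recall = recall(true_positives=45, false_negatives=30)
```

A tool can score very high on precision while still missing many diagnosable cases, which is why the two figures are reported side by side.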

It also showed promise in novel disease gene discovery by correctly predicting two newly reported genes from the Undiagnosed Diseases Network. AIM outperformed existing methods on three separate datasets, including Genomiser in the UDN and DiagLab cohorts.

AIM successfully distinguished between non-diagnostic and diagnostic pathogenic variants in ClinVar. AIM-without-VarDB showed a small performance drop but still outperformed the other benchmarked methods.

Expert feature engineering increased the AIM model's accuracy while delaying training saturation. Using 20% of the training data, AIM maintained a top-1 diagnostic accuracy of 54%. With more training samples, the model trained on the engineered features reached 66% accuracy, whereas the model without engineered features reached 58%.

When phenotypic information was uninformative, the researchers observed an 11% drop in top-1 diagnostic accuracy, showing that precise phenotypic annotation is critical. Even then, AIM obtained 78% top-5 diagnostic accuracy, highlighting the value of molecular evidence.
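Top-1 and top-5 diagnostic accuracy measure how often the true causal gene appears first, or among the first five, in a ranked candidate list. A minimal sketch of that calculation follows; the patient IDs and gene names are placeholders, and the function is an illustration rather than the study's evaluation code.

```python
def top_k_accuracy(rankings: dict[str, list[str]],
                   truth: dict[str, str],
                   k: int) -> float:
    """Share of patients whose true gene is among the top k ranked candidates."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not truth:
        raise ValueError("truth must contain at least one patient")
    hits = sum(
        1
        for patient, gene in truth.items()
        if gene in rankings.get(patient, [])[:k]
    )
    return hits / len(truth)


# Placeholder data: each list is a candidate ranking, best candidate first.
rankings = {
    "p1": ["GENE_A", "GENE_B", "GENE_C"],
    "p2": ["GENE_D", "GENE_E", "GENE_F"],
    "p3": ["GENE_G", "GENE_H", "GENE_I"],
    "p4": ["GENE_J", "GENE_K"],
}
truth = {"p1": "GENE_A", "p2": "GENE_E", "p3": "GENE_Z", "p4": "GENE_J"}

top1 = top_k_accuracy(rankings, truth, k=1)  # p1 and p4 are hits
top3 = top_k_accuracy(rankings, truth, k=3)  # p2 is also recovered
```

Widening k can only keep or raise the score, which is why a top-5 figure can stay high even when the top-1 figure falls.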

An increase in the OMIM-based phenotypic similarity score from zero to 0.25 raised prediction results by 60.0% to 90.0%. However, further increases above 0.3 produced only a slight rise, indicating that a precise match to OMIM phenotypes is not required.
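To give a feel for what a phenotypic similarity score between 0 and 1 expresses, the sketch below computes a simple Jaccard overlap between a patient's HPO terms and the terms annotated to a disease. This is a deliberately simplified stand-in: the article does not describe how the OMIM-based score is calculated, and the term sets here are placeholders.

```python
def jaccard_similarity(patient_terms: set[str], disease_terms: set[str]) -> float:
    """Overlap between two HPO term sets: |intersection| / |union|."""
    union = patient_terms | disease_terms
    if not union:
        return 0.0  # two empty sets share nothing to compare
    return len(patient_terms & disease_terms) / len(union)


# Placeholder HPO-style identifiers, for illustration only.
patient = {"HP:0001250", "HP:0001263", "HP:0000252"}
disease = {"HP:0001250", "HP:0001263", "HP:0001290", "HP:0000486"}

similarity = jaccard_similarity(patient, disease)  # 2 shared terms out of 5 distinct
```

Even a partial overlap like this one gives a score well above zero, in line with the observation that an exact phenotype match is not needed for a useful signal.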

The trio classifier (AIM-Trio) outperformed the Exomiser and Genomiser trio models and marginally outperformed the proband-only model (AIM). The AIM-NDG model removed features linked to known disease databases.

Based on the study findings, AIM is a machine-learning genetic diagnostic tool capable of identifying novel disease genes and analyzing thousands of samples in days. It is highly accurate and useful for initial diagnosis, reanalysis of unsolved cases, and novel disease gene discovery.

AIM analyzes about 3.5 million variant data points from thousands of diagnosed cases and provides a web interface for users to submit cases and analyze findings.

However, limitations include not assessing structural or copy-number variants and focusing on cases with coding mutations. Large language models, such as PhenoBCBERT and PhenoGPT, have demonstrated higher performance.

Journal reference:

  • Dongxue Mao, Ph.D., et al. (2024). AI-MARRVEL: A Knowledge-Driven AI System for Diagnosing Mendelian Disorders. NEJM AI. doi: 10.1056/AIoa2300009.