Majority of AI clinical trials report positive outcomes, yet concerns over generalizability persist


In a recent study published in The Lancet Digital Health, researchers examined the state of randomized controlled trials (RCTs) for artificial intelligence (AI) algorithms in clinical practice.

Study: Randomised controlled trials evaluating artificial intelligence in clinical practice: a scoping review. Image Credit: metamorworks/Shutterstock.com


The use of AI in healthcare has surged remarkably in the past five years, with some studies indicating that AI models could perform on par with or even better than clinicians. However, many models have been evaluated only retrospectively and not in real-world settings.

Of about 300 AI-enabled medical devices, only some have been assessed in prospective RCTs. This scarcity contributes to uncertainty regarding the possible risk to clinicians and patients. Further, AI systems can perform poorly when prospectively deployed.

About the study

In the present study, researchers analyzed the current state of AI in clinical practice. They searched for relevant studies on the International Clinical Trials Registry and the PubMed, CENTRAL, and SCOPUS databases between January 1, 2018, and November 14, 2023. References from studies were also screened to identify additional articles.

RCTs that implemented a significant AI component as an intervention in clinical practice were eligible for inclusion. Eligible interventions included non-linear computational models such as neural networks and decision trees.

Secondary studies, studies evaluating linear risk scores (logistic regression), and those not integrating the intervention into clinical practice were excluded. Abstracts/titles were screened, and full texts were reviewed.

Relevant data from eligible studies were extracted. These included participant characteristics, primary endpoint, clinical task(s), time efficiency endpoint, study location, comparator, AI type/origin, and results.

Studies were stratified by primary endpoint group, clinical specialty, and AI data modality. Meta-analyses were not performed due to the heterogeneity in endpoints and tasks. Instead, an overview of trial features was presented.


The researchers identified 6,219 studies and 4,299 trial registrations. Following title/abstract screening, the full texts of 133 studies were reviewed, of which 60 were excluded.

Reference screening identified 13 additional studies. Overall, 86 unique RCTs were included; 43%, 13%, 6%, and 5% of trials were related to gastroenterology, radiology, surgery, and cardiology, respectively.
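The screening counts above can be checked with simple arithmetic. The following is a minimal sketch; the variable names are illustrative, and it assumes the 13 studies found through reference screening were added to those retained after full-text review.

```python
# Counts as reported in the article; variable names are illustrative only.
full_texts_reviewed = 133
excluded_after_full_text = 60
added_from_reference_screening = 13
reported_included = 86

# Studies retained after full-text review.
retained = full_texts_reviewed - excluded_after_full_text

# Assumption: reference-screened studies are added on top of the retained set.
included = retained + added_from_reference_screening

print(f"Retained after full-text review: {retained}")
print(f"Included after reference screening: {included}")
print(f"Matches reported total: {included == reported_included}")
```

Under that assumption, the computed total agrees with the 86 unique RCTs reported.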

Gastroenterology RCTs were notable for their uniformity, as all trials tested video-based algorithms assisting clinicians. Further, only four groups (Fujifilm, Medtronic, Wuhan University, and Wision AI) conducted most (65%) of the gastroenterology trials.

In addition, 92% of RCTs were single-country trials undertaken mainly in the United States or China; conversely, six of the seven multi-country trials were conducted in European countries.

The median participant age was 57.3 years; 48.9% of subjects were male. Twenty-two RCTs reported race/ethnicity; the median proportion of White participants was 70.5%.

The primary endpoints in 46 trials were related to diagnostic performance or yield, such as mean absolute error and detection rate. Eighteen trials examined the effects of AI on care management. Fifteen AI algorithms evaluated patient symptoms and behavior.

Seven RCTs examined AI in clinical decision-making. Fifty-nine trials assessed deep learning models for medical imaging, predominantly video-based rather than image-based. Others relied on structured data, such as health records, free text, and waveform data.

Most imaging-related AI systems were implemented in an assistive setup, whereas those based on structured data were compared with routine care.

Most models (55%) were developed in industry, followed by academia (41%). Eighty-one trials aimed to show improvement, 80% of which reported significant improvements in their primary endpoint.

Specifically, 46 trials observed improvements for clinicians assisted by AI systems compared to unassisted clinicians. Notably, three RCTs found that standalone AI systems performed better than clinicians. Five trials implemented non-inferiority designs.

Two trials examined non-inferiority between assisted and unassisted clinicians, and three assessed it between clinicians and standalone AI systems.

Overall, 70 trials reported favorable results for their primary endpoint. Sixteen RCTs had negative results, i.e., they found no improvement for assisted clinicians relative to unassisted clinicians, for AI systems compared to routine care, or for standalone AI models over clinicians.
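As a rough consistency check, the reported trial tallies can be reconciled in a few lines. This is a sketch under stated assumptions: the variable names are illustrative, the 80% figure is treated as a rounded value, and the final comparison assumes all five non-inferiority trials counted as favorable, which the article does not state explicitly.

```python
# Tallies as reported in the article; variable names are illustrative only.
total_rcts = 86
improvement_trials = 81      # trials that aimed to show improvement
non_inferiority_trials = 5   # trials with non-inferiority designs
favorable = 70
negative = 16

# Both breakdowns should account for every included RCT.
designs_sum = improvement_trials + non_inferiority_trials
outcomes_sum = favorable + negative

# Share of all included RCTs with a favorable primary endpoint.
favorable_share = favorable / total_rcts

# "80% of 81" is a rounded figure, so this count is approximate.
approx_significant = round(0.80 * improvement_trials)

# Assumption: all five non-inferiority trials were counted as favorable.
implied_favorable = approx_significant + non_inferiority_trials

print(f"Designs sum to total: {designs_sum == total_rcts}")
print(f"Outcomes sum to total: {outcomes_sum == total_rcts}")
print(f"Favorable share of all RCTs: {favorable_share:.1%}")
print(f"Approximate significant improvement trials: {approx_significant}")
print(f"Implied favorable trials under the assumption: {implied_favorable}")
```

The two breakdowns each sum to the 86 included RCTs, and the implied favorable count is consistent with the 70 reported, although the underlying per-trial classification would need to be confirmed against the review itself.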


Taken together, the findings reveal a growing interest in the utility of AI across clinical specialties and regions.

Most trials had favorable outcomes, underscoring the potential of AI systems in improving clinical decision-making, patient symptoms and behavior, and care management.

Notably, the success of AI ultimately depends on its generalizability to target populations and settings. Continued research is essential to deepen the understanding of AI's actual effects and limitations.

Journal reference:

  • Han R, Acosta JN, Shakeri Z, Ioannidis JPA, Topol EJ, Rajpurkar P. (2024) Randomised controlled trials evaluating artificial intelligence in clinical practice: a scoping review. The Lancet Digital Health. doi: 10.1016/S2589-7500(24)00047-5.