
The Human Touch Still Wins in Radiology

Artificial intelligence (AI) has been posited as a particular threat to doctors working in non-patient-facing specialties like radiology and pathology, with various articles in both medical and lay media almost gleefully reeling off lists of studies in which machine learning produced algorithms that outperformed doctors. However, all is not lost for those working in these fields: a new British study has directly challenged an AI 'candidate' with the Fellowship of the Royal College of Radiologists (FRCR) examination, which UK trainees must pass to qualify as radiology consultants.

In their prospective multi-reader, multi-centre comparative diagnostic accuracy study, published in the Christmas issue of the BMJ, the researchers pitted one AI contender (Smarturgences, Milvue) – already in use in more than 10 institutions across Europe, although not currently in the UK – against 26 radiologists who had passed the FRCR examination in the preceding 12 months.

The authors, from Great Ormond Street Hospital for Children and St George's Hospital in London, University Hospitals of Morecambe Bay NHS Trust, the University of Cambridge and the Royal Papworth Hospital in Cambridge, along with their FRCR-AI study collaborators, noted that the final FRCR examination has three components, and candidates need a pass mark in all of them to pass the examination overall. "For artificial intelligence to replace radiologists, ensuring that it too can pass the same examination would seem prudent," the team stressed.

Exam Component 'Stress Tests' Candidates

One component of the exam is a "rapid reporting" session deliberately designed to "stress test" candidates for speed and accuracy, requiring them to interpret 30 radiographs within 35 minutes, with at least 27 (90%) correctly reported for a passing grade. This seemed to cover the 'skills' in which AI has been purported to excel, so the team felt it should be "an ideal test setting in which to evaluate its prowess".
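The pass rule for this component is simple arithmetic. As a minimal sketch (the function and parameter names are illustrative assumptions, not taken from the study or from the FRCR itself), it could be expressed as:

```python
def passes_rapid_reporting(correct: int, total: int = 30, required: int = 27) -> bool:
    """Return True if a candidate meets the pass mark described above.

    The defaults mirror the figures quoted in the article: 30 radiographs
    per session, with at least 27 (90%) reported correctly.
    All names here are hypothetical and used for illustration only.
    """
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return correct >= required


def pass_fraction(total: int = 30, required: int = 27) -> float:
    """Pass mark expressed as a fraction of the radiographs in one session."""
    return required / total
```

Under this rule a candidate with 27 correct reports passes and one with 26 does not, which is why a handful of misses is enough to fail a session.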

They used 10 FRCR mock rapid reporting examinations for the study analysis. The radiographs used provided "a mixture of challenging normal and abnormal cases typically referred by general practice and the emergency department for radiological interpretation in clinical practice", they explained.

"The radiographs were selected to reflect the same or a higher level of difficulty and breadth of knowledge expected for the real FRCR examination." They covered all body parts, with a mixture of images from adult and paediatric patients. Some 'cases' had multiple projections/views, with approximately half containing no abnormalities and the rest containing only one pathology (multiple lung nodules counted as the same single pathology). Clinical information is not provided to candidates in the rapid reporting component of the exam.

AI Candidate Trained on Over 600,000 Radiographs

The AI tool had been trained on a multicentric dataset of more than 600,000 chest and musculoskeletal radiographs to detect seven key pathologies (fracture, pleural effusion, lung opacification, joint effusion, lung nodules, pneumothorax, and joint dislocation) and offer its opinion as a binary certainty score (that is, certain/positive or uncertain/doubtful).

The 26 radiologists - 62% female and 73% aged between 31 and 40 - successfully completed all 10 mock FRCR rapid reporting examinations. Pooled results yielded an average accuracy of 84.8% (range 76.1% - 91.9%) with an estimated sensitivity of 84.1% (95% confidence interval 81.0% - 87.0%) and specificity of 87.3% (85.0% - 89.3%).

In comparison, the AI candidate, when graded only on the images it had been trained to interpret (which excluded radiographs of the axial skeleton and abdomen), had an overall accuracy of 79.5% (74.1% - 84.3%), with sensitivity 83.6% (95% confidence interval 76.2% - 89.4%) and specificity 75.2% (66.7% - 82.5%).
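For readers less familiar with these measures, accuracy, sensitivity, and specificity all derive from the same four counts of correct and incorrect reads. The sketch below uses the standard definitions; the class name, field names, and example counts are hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadCounts:
    """Tally of reads for one candidate (hypothetical structure)."""
    true_pos: int   # abnormal radiographs correctly reported as abnormal
    false_neg: int  # abnormal radiographs reported as normal
    true_neg: int   # normal radiographs correctly reported as normal
    false_pos: int  # normal radiographs reported as abnormal

    def sensitivity(self) -> float:
        # Share of abnormal radiographs that were picked up.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Share of normal radiographs correctly left alone.
        return self.true_neg / (self.true_neg + self.false_pos)

    def accuracy(self) -> float:
        # Share of all radiographs reported correctly.
        total = self.true_pos + self.false_neg + self.true_neg + self.false_pos
        return (self.true_pos + self.true_neg) / total


# Made-up counts for a single 30-radiograph session, for illustration only.
example = ReadCounts(true_pos=12, false_neg=3, true_neg=13, false_pos=2)
```

The sketch assumes each tally contains at least one abnormal and one normal radiograph, otherwise the divisions are undefined. A lower specificity with similar sensitivity, as reported for the AI candidate, corresponds to more normal radiographs being flagged as abnormal.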

With this examiner's 'dispensation' to take account of the AI candidate's lack of experience in interpreting skull, spine, and abdominal radiographs, the AI entity passed two of the 10 mock FRCR exams, while the average radiologist passed four. Although the AI candidate was ranked as the highest performing candidate in one mock examination, it came second to last overall across all interpretable images (rank 26/27).

AI Didn't Pass but Accuracy 'Relatively High'

Furthermore, when scored by the same strict criteria as the radiologists, without the dispensation, "the AI candidate was unable to pass any of the 10 mock examinations", the researchers reported. However, they noted: "Although the artificial intelligence candidate did not outperform most of the radiologists, its accuracy was relatively high considering the case mix and complexity."

They concluded: "The artificial intelligence candidate would still need further training to achieve the same level of performance and skill of an average recently FRCR qualified radiologist, particularly in the identification of subtle musculoskeletal abnormalities (which made up the majority of the artificial intelligence imaging pitfalls), and also in interpretation of abdominal, skull, and spine radiographs, which it has no training or experience in analysing.

"The artificial intelligence candidate was, however, correct in its diagnosis in half of the cases that most radiologists failed, particularly when these involved hands and feet. These radiographs probably contain more bones and joints for evaluation, which humans may find time-consuming and tedious but an artificial intelligence would not."

AI's Performance Could Improve With More Training

They added: "The promise of artificial intelligence as a diagnostic adjunct in clinical practice remains high." The AI "came close to radiologist level performance" when only the cases it could interpret were scored. Whilst the AI candidate could not outperform radiologists, "further training may improve results".

AI could be useful where routine immediate radiographic reporting is not available, and could help reporting radiologists to improve their accuracy and sensitivity and to reduce reporting time.

"In a future scenario, in which the performance of artificial intelligence reaches that of humans and artificial intelligence is widely adopted in clinical practice for radiographic interpretation, radiologists' training may place a greater focus on the evaluation of radiographs for which artificial intelligence yields inaccurate or uninterpretable results."

Study Adds 'a Dose of Realism'

In an accompanying editorial, Vanessa Rampton, ETH Zürich Chair for Philosophy II in Switzerland, and Athena Ko, a psychiatry trainee at the University of Ottawa in Canada, said that the study added "a dose of realism to the hype surrounding the outsourcing of radiology to artificial intelligence".

AI may facilitate workflows, but human input is still crucial, they said.

They acknowledged that using artificial intelligence "has untapped potential to further facilitate efficiency and diagnostic accuracy to meet an array of healthcare demands" but said that doing so appropriately "implies educating physicians and the public better about the limitations of artificial intelligence and making these more transparent."

Real Life Radiologists 'Essential and Irreplaceable'

Asked to comment on the study by Medscape News UK, Dr Katharine Halliday, president of the Royal College of Radiologists, said: "AI is incredibly useful, adding huge amounts of intelligence and data to healthcare and replacing some processes and procedures, including within radiology and radiography.

"Clinical radiologists interpret complex scans and guide treatment or surgery; there is no question that real life clinical radiologists are essential and irreplaceable. However, a clinical radiologist with the data, insight and accuracy of AI is, and will increasingly be, a formidable force in patient care. This paper demonstrates what we all know, while AI shows great promise, it cannot replace highly trained and skilled professionals."

SCS is funded by a National Institute for Health Research (NIHR) advanced fellowship award (NIHR-301332). JWM is supported by the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014). This article presents independent funded research. The views expressed are those of the authors and not necessarily those of the National Health Service (NHS), NIHR, or Department of Health.

All authors have completed the ICMJE uniform disclosure form and declare: support from the National Institute for Health Research; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; SS is the organiser of a radiology revision course mentioned in this study and helped to recruit radiologist readers to the project, but this relationship had no influence on the reported results of the work and no financial incentive was provided; no other relationships or activities that could appear to have influenced the submitted work.