Sperling Prostate Center

Artificial Intelligence in Medicine: Detecting Errors in Radiology Reports

Artificial Intelligence (AI) and its subsets (Machine Learning, Deep Learning) are gaining ground in identifying disease conditions in medical imaging such as MRI and CT scans. However, whether the images are interpreted by humans alone or with the assistance of AI, the findings must be communicated in the form of a written report delivered to the referring physician, who then shares them with the patient.

If a report contains errors such as confusing wording or misspelled words, its diagnostic findings may be misinterpreted. As one paper states, “Interpretation differences between radiologists and diagnostic errors are significant issues in daily radiology practice.”[i] It identifies three types of error:

  1. Diagnosis that is missed, delayed, or inaccurately interpreted;
  2. Findings that are overestimated and result in a prolonged hospital stay and more tests;
  3. Findings that are underestimated but in fact are significant or life-threatening.

Thus, reviewing written reports is of key value, but the task consumes valuable radiologist time.

Since it’s not reasonable to expect perfection from the radiologists who write the reports, there is a need to improve error detection so that inaccuracies are not passed along. “Errors and discrepancies in radiology practice are uncomfortably common, with an estimated day-to-day rate of 3–5% of studies reported, and much higher rates reported in many targeted studies,” writes radiologist Adrian Brady, whose 2017 paper called for research into possible strategies to minimize error.[ii] It now appears that AI may offer new solutions.

AI to the rescue

The majority of our blogs on AI in medicine have focused on the ability of software programs to identify disease in medical imaging. Since reports are expressed in words, however, the type of program needed to detect errors must be able to process language instead of pictures. As described on DataCamp.com, there is an AI program called GPT-4 (the fourth version of the Generative Pre-trained Transformer developed by OpenAI), which is “a type of Deep Learning model used for natural language processing and text generation. It marks a significant milestone in the field of artificial intelligence, particularly natural language processing.” It can recognize and create language, and it has been shown to perform competitively on standardized tests such as the Scholastic Aptitude Test and the bar exam for lawyers.

It’s not surprising, then, that GPT-4 would be put to the test of detecting errors in written radiology reports. A German research group from the University of Cologne “compared the performance of GPT-4 (OpenAI) to the performance of six radiologists of varying experience to detect errors (ranging from inappropriate wording and spelling mistakes to side confusion) in 200 radiology reports. The study authors noted that 150 errors were deliberately added to 100 of the reports being reviewed.”[iii]

The study authors found that GPT-4 caught report errors with performance comparable to the radiology readers (a bit better than less experienced readers, a bit worse than highly experienced readers) but with greater speed and efficiency. This has implications for streamlining report reviews. However, a word of caution: GPT-4 itself is not perfect. Though the fourth version has less room for error than previous versions, it is still a work in progress. It is also not clear if human reviewers, knowing that GPT-4 has already reviewed a report, will slack off and take for granted that GPT-4 identified all the errors. While AI may come to the rescue of readers with a heavy load of reports to review, it may not be exempt from the law of unintended consequences.
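To give a sense of what “detecting errors” means in practice, one of the error types in the study, side confusion (mixing up left and right), can in principle be flagged even by a simple rule-based check. The sketch below is a toy illustration in Python, not the study’s method (which relied on GPT-4), and the report sentences in it are invented for the example:

```python
import re

def check_side_consistency(findings: str, impression: str):
    """Toy check for side confusion: flag laterality terms that appear
    in the impression section but never in the findings section."""
    def sides(text):
        # Collect every mention of 'left' or 'right' in the text.
        return set(re.findall(r"\b(left|right)\b", text.lower()))
    return sorted(sides(impression) - sides(findings))

# Example: the findings describe the left side, but the impression says right.
errors = check_side_consistency(
    "Lesion in the left peripheral zone of the prostate.",
    "Suspicious lesion in the right peripheral zone.",
)
# errors → ['right'], i.e., a possible side confusion to review
```

A real review tool would of course need to handle far more than laterality, which is why a language model that can weigh wording, spelling, and context is attractive for this task.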

Final note: Many people use GPT-4 to create text such as website blogs. However, this blog was neither generated nor reviewed by GPT-4. Therefore, any errors in it are solely the product of the human who wrote it.

NOTE: This content is solely for purposes of information and does not substitute for diagnostic or medical advice. Talk to your doctor if you are experiencing pelvic pain, or have any other health concerns or questions of a personal medical nature.

References

[i] Onder O, Yarasir Y, Azizova A et al. Errors, discrepancies and underlying bias in radiology with case examples: a pictorial review. Insights Imaging 12, 51 (2021).
[ii] Brady AP. Error and discrepancy in radiology: inevitable or avoidable? Insights Imaging. 2017 Feb;8(1):171-182.
[iii] Jeff Hall. “Can GPT-4 Improve Accuracy in Radiology Reports?” Diagnostic Imaging, Apr. 16, 2024. https://www.diagnosticimaging.com/view/can-gpt-4-improve-accuracy-in-radiology-reports-

 

About Dr. Dan Sperling

Dan Sperling, MD, DABR, is a board-certified radiologist who is globally recognized as a leader in multiparametric MRI for the detection and diagnosis of a range of disease conditions. As Medical Director of the Sperling Prostate Center, Sperling Medical Group and Sperling Neurosurgery Associates, he and his team are on the leading edge of significant change in medical practice. He is the co-author of the patient book Redefining Prostate Cancer and a contributing author on over 25 published studies. For more information, contact the Sperling Prostate Center.
