A Machine Gets High Marks for Diagnosing Sick Children
Average wait times in U.S. emergency rooms top two hours, leaving both clinicians and patients to feel the pain of an overburdened system. Many a parent has endured those hours with a distressed child who was triaged as nonurgent, only to be sent home with unneeded antibiotics for a garden-variety viral infection.
With the money and time that visits to the ER and urgent care soak up, the chance to revisit old-fashioned physician house calls holds a strong appeal. What if the visit came from an intelligent machine? AI systems are already adept at recognizing patterns in medical imaging to aid in diagnosis. New findings published February 11 in Nature Medicine show similar training can work for deriving a diagnosis from the raw data in a child’s medical chart.
For this study at Guangzhou Women and Children’s Medical Center in southern China, a team of physicians distilled information from thousands of health records into key words linked to different diagnoses. Investigators then taught these key words to the AI system so it could detect the terms in real medical charts. Once trained, the system combed the electronic health records (EHRs) of 567,498 children, parsing the real-world physician notes and highlighting important information.
It drilled down from broad categories to specific diagnoses among 55 possibilities. So how did the robo-doc do? “I think it’s pretty good,” says Mustafa Bashir, an associate professor of radiology at Duke University Medical Center who was not involved in the work. “Conceptually, it’s not that original, but the size of the data set and successful execution are important.” The data processing, Bashir says, follows the typical steps of taking a “big giant messy data set,” putting it through an algorithm and yielding order from the chaos. In that sense, he says, the work is not especially novel, but “that said, their system does appear to perform well.”
The practice of medicine is both an art and a science. Skeptics might argue that a computer that has processed a lot of patient data cannot furnish the type of qualitative judgment made by a general practitioner to diagnose a human from a distance. In this case, though, a lot of human expertise was brought to bear before the machine training began. “This was a massive project that we started about four years ago,” says study author Kang Zhang, a professor of ophthalmology and chief of ophthalmic genetics at the University of California, San Diego. He and his colleagues began with a team of physicians reviewing 6,183 medical charts to glean key words flagging disease-related symptoms or signs, such as “fever.” The AI system then went through training on these key terms and their association with 55 internationally used diagnostic codes for specific conditions such as an acute sinus infection. In parsing a chart for relevant terms, the system stepped through a series of “present/absent” options for specific phrases to arrive at a final diagnostic decision.
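To make the idea concrete, here is a minimal sketch, in Python, of how present/absent key-word parsing might feed a top-down diagnosis. The key words, category names and simple rule-based scoring are invented for illustration; the study’s actual system was a trained deep-learning model built from physician-annotated charts, not a hand-written rule set.

```python
# Illustrative sketch only: key words, categories and scoring are made up.
# Broad organ-system categories map to specific diagnoses, each flagged by
# physician-chosen key words (a tiny stand-in for the 55 diagnostic codes).
KEYWORD_HIERARCHY = {
    "respiratory system": {
        "acute upper respiratory infection": ["fever", "cough", "sore throat"],
        "acute sinusitis": ["fever", "facial pain", "nasal discharge"],
    },
    "neuropsychiatric": {
        "tic disorder": ["involuntary movement", "vocal tic"],
    },
}

def extract_features(chart_text: str) -> set:
    """Mark each key word as present or absent in the free-text note."""
    text = chart_text.lower()
    present = set()
    for diagnoses in KEYWORD_HIERARCHY.values():
        for keywords in diagnoses.values():
            present.update(kw for kw in keywords if kw in text)
    return present

def diagnose(chart_text: str):
    """Work top-down: broad category first, then the best-matching diagnosis."""
    present = extract_features(chart_text)
    best = ("unknown", "unknown", 0.0)
    for category, diagnoses in KEYWORD_HIERARCHY.items():
        for diagnosis, keywords in diagnoses.items():
            score = sum(kw in present for kw in keywords) / len(keywords)
            if score > best[2]:
                best = (category, diagnosis, score)
    return best[0], best[1]

note = "3-year-old with fever, persistent cough and sore throat for two days"
print(diagnose(note))  # ('respiratory system', 'acute upper respiratory infection')
```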
To check the system’s accuracy, Zhang and his colleagues also employed old-fashioned “technology”—human diagnosticians. They compared the machine’s conclusions with those in the original records—and they had another team of clinicians make diagnoses using the same data as the AI system.
The machine received good grades, agreeing with the humans about 90 percent of the time. It was especially effective at identifying neuropsychiatric conditions and upper respiratory diseases. For acute upper-respiratory infection, the most common diagnosis in the huge patient group, the AI system got it right 95 percent of the time. Would 95 percent be good enough? One of the next questions that needs to be researched, Zhang says, is whether the system will miss something dire. The benchmark, he says, should be how senior physicians perform, which is also not 100 percent.
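As a back-of-the-envelope illustration of how such agreement figures are computed, here is a short sketch comparing hypothetical machine and physician labels; the data are invented, not drawn from the study.

```python
from collections import defaultdict

# Hypothetical labels for five charts; not the study's data.
machine   = ["acute URI", "acute URI", "sinusitis", "acute URI", "gastroenteritis"]
physician = ["acute URI", "acute URI", "sinusitis", "bronchitis", "gastroenteritis"]

# Overall agreement: fraction of charts where machine and physician match.
overall = sum(m == p for m, p in zip(machine, physician)) / len(machine)

# Per-diagnosis agreement, keyed by the physician's (reference) label.
per_diagnosis = defaultdict(lambda: [0, 0])   # diagnosis -> [matches, total]
for m, p in zip(machine, physician):
    per_diagnosis[p][1] += 1
    per_diagnosis[p][0] += int(m == p)

print(f"overall agreement: {overall:.0%}")
for dx, (hit, total) in per_diagnosis.items():
    print(f"{dx}: {hit}/{total} correct")
```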
A human clinician would serve as a quality-control backup for the AI system. In fact, human and machine would probably follow a similar series of steps. Just like a doctor, the machine starts with a broad category, such as “respiratory system,” and works from the top down to arrive at a diagnosis. “It mimics the human physician’s decision process,” says Dongxiao Zhu, an associate professor of computer science at Wayne State University who did not take part in the study.
But Zhu sees this as “augmented intelligence” rather than “artificial intelligence” because the system handled only 55 diagnostic options, not the thousands of possibilities in the real world. The machine cannot yet delve into the more complex aspects of a diagnosis such as accompanying conditions or disease stage, he says. How well this system could translate outside of its Chinese setting remains unclear. Bashir says although applying AI to patient information would be difficult anywhere, these authors have proved it is achievable.
Zhu expresses further skepticism. Pulling diagnostic key words from text notes in an EHR will be “radically different” in a language like English rather than Chinese, he says. He also points to all the work required for only 55 diagnoses, including the effort of 20 pediatricians who graded 11,926 records so their conclusions could be compared with the machine’s diagnoses. Given the four years the overall process required, parents likely have a long wait ahead before a computerized clinician can spare them that visit to the ER.