For many physicians, alternative/complementary medicine is a sore point: magical thinking on the part of patients and unproven or false claims by practitioners lead to significant harm, with people dying or being severely injured. When it comes to AI, physicians themselves risk succumbing to magical thinking, and could benefit from some “critical thinking” inoculation.
The fields of alternative and complementary therapies are an irritant for many healthcare professionals. Their success relies on the human proclivity for “magical thinking”: the projection of our hopes and desires onto a person, practice, or device when we seek remediation for some malady. In the last few years we have seen a similar rush to magical thinking around Artificial Intelligence. Indeed, one is tempted to quip that there is an emerging field of “Alternative Intelligence”, which will magically diagnose every disease and cure all the world’s ills. This year’s HIMSS show may have been peak hype for AI: everything was better with “Brand X’s AI-powered products”, to paraphrase numerous sales pitches. The best antidote for magical thinking is evidence-based thinking. With that in mind, let us examine the strengths and weaknesses of a few applications of artificial intelligence.
Many advances in AI have been spurred by image recognition. There are public image data sets on which researchers demonstrate their latest achievements in image classification accuracy. One classic in the field is the recognition of handwritten digits, with a practical application in Google Street View’s finding and reading of house numbers in street images. In some recent papers, researchers examined state-of-the-art image recognition neural nets and found significant shortcomings. The nets performed poorly when presented with silhouettes of common objects, or with objects that were rotated or mildly distorted, something humans have no trouble with. Indeed, it seems that the best nets focus on the textures of objects rather than on their outlines, as humans do. An elephant shape filled with zebra stripes looks like a stripy elephant to humans and like a zebra to these nets. This should give one pause about the use of neural nets in radiology. Neural nets trained on hundreds of thousands of diagnostic images may not be paying attention to the same things a (human) radiologist pays attention to. If the image generation techniques change (think contrast media, signal mixes, post-processing), there may be subtle changes to the image which do not affect human cognition but cause a significant degradation of neural net performance. Worse, this would only come to light after enough diagnostic and treatment errors have occurred to get humans to take notice and intervene.
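The texture-versus-shape failure is easy to reproduce in miniature. The toy classifier below is my own illustration, not taken from any of the papers: it labels an image “zebra” purely from stripe density and never looks at the outline, so an elephant silhouette filled with stripes fools it in exactly the way described above.

```python
import numpy as np

def stripe_model(im):
    # Toy "texture classifier": predicts "zebra" (1) when the image has
    # many horizontal intensity transitions, i.e. lots of stripes.
    # The outline of the object plays no role in the decision.
    return int(np.abs(np.diff(im, axis=1)).mean() > 0.1)

stripes = np.indices((32, 32))[1] % 2   # a field of vertical stripes
blob = np.zeros((32, 32))
blob[8:24, 8:24] = 1                    # a solid silhouette standing in for an "elephant"
stripy_blob = stripes * blob            # elephant shape, zebra texture

print(stripe_model(stripes))      # 1: "zebra"
print(stripe_model(blob))         # 0: not a "zebra"
print(stripe_model(stripy_blob))  # 1: texture wins over shape
```

Real convolutional nets are vastly more sophisticated, but the published silhouette experiments suggest the same underlying bias.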
Another application of techniques from machine learning is speech recognition. When humans listen to speech they perform several tricks simultaneously: (a) transcription of audio into words; (b) error correction; (c) inference of intent. When computers listen to speech they do mostly (a) and very little of (b) and (c). Technologists who work in this area quip that “speech recognition is often confused with mind reading”. Humans have mirror neurons with which they infer the state of mind of the speaker; computers don’t, and consequently it is hard for them to transcribe meaning as accurately as humans can. Even at the basic task of recognizing individual words, neural nets are still noticeably behind human hearing, and it is not hard to see why. Consider the leading academic research toolkit for speech recognition, Kaldi, the brainchild of Dan Povey and colleagues at Johns Hopkins and other universities, which has become a benchmark against which recognizers compare themselves. When Kaldi processes audio it throws away half the information in the audio signal. If you have ever seen a spectrogram of human speech, you have seen a picture of the amount of energy at different frequencies and how it changes over time. Kaldi uses these spectrograms. Yet a spectrogram holds only half the information in the audio signal – it throws away the “phase” information. Why would anyone care? One reason is that when you listen to a sound there is a slight timing difference as the wave front hits your left and right ears, i.e., a difference in phase. In principle, you can use that to tell the direction of the sound’s source, and if there are two speakers you can separate the two voices according to the directions they are coming from. [By the way, human hearing can also separate voices coming from the same direction by the differences in how they sound, a trick that computers are still bad at.]
The second reason you might care is that phase represents half the available information. For years it was believed that human hearing doesn’t use phase information (outside of directional cues). After all, if I replay the recording of a musical note, you cannot tell the difference between the replays if I shift the phase by a fixed amount. However, a simple experiment shows that human hearing does process phase. Take a recording of someone’s voice and alter it in two ways. In the first, remove all the phase information by setting the phase of the signal in each frequency band to a constant value, but allow the amplitude to vary over time (this is a spectrogram). In the second, remove the amplitude information by setting the amplitude in each frequency band to a constant, but retain all the phase variation (call this a phasegram). When you listen to the first audio, it sounds tinny or robotic, and you can understand what is being said, though it is nowhere near as clear as the original speech. When you listen to the second audio, it too is understandable, which tells you that phase carries information human hearing can use to recognize speech. This raises a question: wouldn’t it be better to use all the information in the audio rather than half? The fact that the best academic toolkit does not (yet) use phase information to improve accuracy suggests that there are still opportunities for advancement.
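For the curious, the experiment is easy to sketch with a short-time Fourier transform. The snippet below is a minimal illustration using numpy, with a synthetic amplitude-modulated tone standing in for a voice recording; it builds both variants, the magnitude-only audio with the phase zeroed, and the phase-only audio with the magnitude held constant.

```python
import numpy as np

def stft(x, win=256, hop=128):
    # Short-time Fourier transform with a Hann window
    w = np.hanning(win)
    return np.array([np.fft.rfft(x[i:i + win] * w)
                     for i in range(0, len(x) - win, hop)])

def istft(S, win=256, hop=128):
    # Overlap-add resynthesis back to a waveform
    n = hop * (len(S) - 1) + win
    w = np.hanning(win)
    out, norm = np.zeros(n), np.zeros(n)
    for k, frame in enumerate(S):
        out[k * hop:k * hop + win] += np.fft.irfft(frame, win) * w
        norm[k * hop:k * hop + win] += w ** 2
    return out / np.maximum(norm, 1e-8)

# A synthetic "voice": a 220 Hz tone whose loudness varies over time
t = np.arange(16000) / 16000.0
x = np.sin(2 * np.pi * 220 * t) * (1 + np.sin(2 * np.pi * 3 * t))

S = stft(x)
mag, phase = np.abs(S), np.angle(S)

spec_only = istft(mag.astype(complex))               # amplitude kept, phase zeroed
phase_only = istft(mag.mean() * np.exp(1j * phase))  # phase kept, amplitude flattened
```

Played back on real speech, the first variant is the “tinny or robotic” audio described above, and the second is the phasegram.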
Returning to directional information in audio: this is something various researchers have been looking into for ambient speech recognition, where there are two different speakers in the same room, such as a doctor and a patient. The state of the art in separating the audio of the two speakers is not very good. Researchers at Microsoft recently published a paper in which they used a cluster of twelve microphones and a bi-directional neural net (something of a sledgehammer in the world of neural nets). Their results were a new “best” in terms of published findings, but pretty poor compared with what humans do with ease. Researchers at Google’s DeepMind have also tried ambient speech recognition in doctors’ offices in the UK. Their pithy summary: a much harder problem than they had thought.
Speech recognition also intersects with privacy concerns. A voice print is a signature derived from a recording that identifies a speaker. In the world of customer service, a lot of fraud is committed by criminals who use stolen identities to buy goods and services. Since they are repeat offenders, it is possible to create a database of their voice prints and screen incoming callers against it for matches. Now let’s follow the voice print into the healthcare setting. A hospital deploys Amazon or Google smart speakers so that patients can control their TVs and do other simple tasks without troubling a nurse. Patient and nurse satisfaction both increase. What’s not to like? In this (admittedly dark) scenario, Amazon and Google have heard the patient’s voice and made a voice print, which they match against voice prints from mobile phones or speakers deployed at home. The patient is discharged and is now bombarded with ads for healthcare products and services targeted to the condition(s) they were treated for. The temptation for Amazon and Google to monetize the voice information is great. If patients/consumers don’t want this, who is liable for the disclosure of the patient’s identifying information and of the fact that they had a recent hospital stay? How many people realize that a patient’s voice, once captured by computer, is “PII” and subject to privacy laws?
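The screening step described above can be sketched as a nearest-neighbor lookup over speaker embeddings. Everything in the snippet below is made up for illustration – the random embeddings, the caller IDs, and the similarity threshold – whereas real systems derive the embeddings from trained speaker-recognition models.

```python
import numpy as np

def screen_caller(query, known_fraud, threshold=0.85):
    # Compare a caller's voice-print embedding against a database of
    # known-fraudster embeddings using cosine similarity.
    q = query / np.linalg.norm(query)
    best_id, best_sim = None, -1.0
    for caller_id, emb in known_fraud.items():
        sim = float(q @ (emb / np.linalg.norm(emb)))
        if sim > best_sim:
            best_id, best_sim = caller_id, sim
    # Flag the caller only when the best match clears the threshold
    return (best_id, best_sim) if best_sim >= threshold else (None, best_sim)

rng = np.random.default_rng(0)
db = {"fraudster_17": rng.normal(size=64), "fraudster_42": rng.normal(size=64)}

repeat_offender = db["fraudster_42"] + 0.05 * rng.normal(size=64)  # near-match
stranger = rng.normal(size=64)                                     # unrelated voice

print(screen_caller(repeat_offender, db)[0])  # fraudster_42
print(screen_caller(stranger, db)[0])         # None
```

The same matching machinery, pointed at a database built from hospital-room recordings, is what makes the dark scenario above technically trivial.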
AI is often portrayed as being ahead of human capabilities. IBM’s Deep Blue beat the world champion at chess, and IBM’s Watson rose to fame by beating human champions at Jeopardy. Now for a reality check. The rules of chess and Go are static. A net that plays many games against itself will become very good. Similarly, a net which ingests Wikipedia will know more trivia than most humans. IBM thought that Watson could be applied to oncology. Unlike games, oncology is in flux. Humans attend conferences, read papers, and talk to each other; neural nets don’t. The result? The neural net always lags behind the humans. As IBM found, to its chagrin, Watson was a ho-hum and out-of-date oncologist.
A final example illustrates that Artificial Intelligence still has a ways to go to catch up with Natural Stupidity. Neural nets are powered by Graphics Processing Unit (GPU) cards the size of VHS cassettes. Let us compare an AI-powered driverless car with the humble bee, which has a brain the size of a pinhead. The bee learns from other bees where there are good sources of nectar, and is able to navigate to those sources while avoiding collisions with obstacles such as pedestrians, cars, trucks, road works, lamp posts, trees, and so on. It maneuvers into parking spots on flowers of all different shapes, sizes, colors, and orientations. In short, the bee is way ahead of the best image recognition and driverless car technology. The sarcastic observer might note that precisely zero pedestrians have been knocked over and killed by bees in transit. The cautious observer, for reasons noted above, will not wear fashionably patterned clothing when crossing in front of the cleverest driverless car…
Let me end with a plug for Dr. Jonathan Howard’s 500+ page book, “Cognitive Errors and Diagnostic Mistakes”, a well-written and engaging compendium of human errors of perception, judgment, and reasoning in medicine. In a few years’ time it will likely have a companion, “Cognitive Errors and Diagnostic Mistakes Made by Machine Intelligence”, whose case histories and errors will seem every bit as straightforward as the human ones. I can hear the plaintiff’s attorney now: “It should have been obvious that…”