Voice-rec software was supposed to improve over time, but has it actually learned anything?
Way back, before college, one of my classmates had a pet peeve about repeating herself. The idea was that, if she was taking the time and effort to communicate with you, you should be paying enough attention that you “got it” the first time she said something.
I think even she recognized at some level that this was not a 100% reasonable expectation to have in all circumstances. There are other stimuli in the world, after all, and they can intrude no matter how good your intentions. There’s also the simple fact that mere mortals relying on verbal communication don’t always speak with sufficient volume or clarity, or even choose the best words for the intended effect.
Many such excuses are nullified when one is dealing with voice-recognition software, which most radiologists now use for generating reports of their interpretations. The software, presumably, gives rapt attention to audio input from its user. And it’s not burdened with any need to comprehend the meaning of what the user says. If I dictate a line of words that are all in its vocabulary yet make no grammatical sense as a sentence, the software should dutifully transcribe the word salad, whereas a human audience might respond with a “Huh?”
So, yes, a lot of us rads do get vexed when this software we’ve been given (usually without a say as to which product) fails to do what it’s supposed to, time and time again, forcing us to repeat ourselves, whether because we’ve been told that the software will learn to perform better if we dutifully correct it, or just because we don’t want to concede defeat and reach for the keyboard to type things in ourselves.
Like the erstwhile classmate above, we suffer diminishing patience as the same software commits the same errors over and over (throw in a few dozen more “and overs” if you like). That classmate, likely as not, knew which people in her life were the more frequent offenders, and probably had a more global sense of how often humanity as a whole had made her repeat herself. Each instance of self-repetition chipped away at her future tolerance for doing so.
I’ve written a couple of pieces in the past about my dissatisfactions with the voice-rec software we rads are given. One of my theories is that, since we’re not the ones paying for it, our satisfaction is a low priority for the software developers. Also, relatively speaking, we rads, even if lumped in with all other physicians using such software, are a rather small audience. And a stable one, at that: if we change voice-rec software at all, it’s unlikely to happen more than once every few years, if even in the same decade.
By contrast, the massively larger general population has now stepped decisively into the voice-recognition world, dictating text messages, emails, even commands to their various Google and Amazon devices. And guess what? They vote with their wallets. If something gives crummy service, it’s going to be yesterday’s news in a hurry.
So the industry clearly takes them seriously. And I’ve noticed the voice-rec on my phone adapts and learns in all sorts of ways that my radiology voice-rec software never has. Which is kind of amazing, since the sorts of things I dictate into my cellphone are far more diverse than what goes into my rad-workstation. That’s a much bigger, more amorphous mess of verbiage to wrangle.
For instance, the phenomenon that inspired this column: I’ve noticed that, when dictating stuff into my cellphone, it works far better if it has internet connectivity than if it does not. I have caught it transcribing something erroneously and then, as if it had taken a moment to think things over (do these words make sense together? Is this a phrase my user has employed before?), replacing the words with the correct dictation. This only happens when I’m on WiFi or have a strong signal. To me, this says it’s communicating with a remote “brain,” some processing center with wherewithal far greater than my dinky little Android’s.
How is it that none of the radiology workstations I’ve ever seen has used this bit of technological wizardry? And is it any wonder that, no matter how often we’re told our dictation software will learn as we correct it, it doesn’t? The “brain” it has to work with is infinitely smaller than Google’s apparatus, which by now probably rivals Skynet’s.
Perhaps we in the healthcare biz will eventually be thrown a crumb, once the voice-rec biz servicing the bulk of the population grows to the point that we enjoy some spillover. For instance, maybe Dragon will start paying Google to make use of its “brain” so Dragon’s own dictations can start showing some evidence of learning comparable to what an average cellphone-user now enjoys.
Otherwise, sooner or later, Google might notice that there’s a woefully underserved niche and start marketing directly to healthcare facilities, threatening the current voice-rec vendors. If those vendors want to remain self-contained and viable, they’d better figure out a way to live up to some of the promises they’ve been making us for the past couple of decades.
One wonders if they will “learn” faster than their products have.