Wake-Up Words and Viral Proteins: More Similar Than You Would Think

Charles Corfield, President & CEO, nVoq

In this week’s pandemic blog, nVoq’s CEO, Charles Corfield, illustrates how the concept of wake-up words and phrases, usually associated with speech recognition devices, might be tied to viral proteins, and how that helps frame the puzzle of why, in some people, a mild upper respiratory illness can turn into a raging inferno.

nVoq knows a few things about speech recognition, but other than making sure that we have pandemic-related medical vocabulary and phraseology covered, what does COVID-19 have to do with our daily bread? It turns out that, while we may not be microbiologists, there are cross-domain similarities which we can take advantage of as we digest news from the research front.

You may recall sitting in a noisy environment, such as a restaurant, where you were able to focus on just the conversation at your table, when suddenly you heard your name spoken by someone at another table, which momentarily distracted you. What humans do naturally, we train computer systems to do: recognize “wake-up” words or phrases. As with medical tests, we want both high sensitivity (if you say the wake-up word/phrase, the computer recognizes it) and high specificity (the computer doesn’t mistake something else for the wake-up word/phrase). The human body is much like a noisy restaurant, with the hubbub of conversations played by the proteins and other chemicals floating around the body. There are receptors which are tuned to particular words/phrases, and they will wake up when they hear them.
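For readers who like to see the arithmetic, here is a minimal Python sketch of the two measures mentioned above, sensitivity and specificity. The function names and the counts are hypothetical, invented purely for illustration; they are not measurements from any real wake-word system.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of genuine wake-up utterances that the detector caught."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Share of everything else that the detector correctly ignored."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical test session (made-up numbers):
# the wake-up phrase was spoken 100 times: 95 caught, 5 missed;
# 1000 other snippets of restaurant chatter: 980 ignored, 20 false alarms.
print(f"sensitivity = {sensitivity(95, 5):.2f}")    # sensitivity = 0.95
print(f"specificity = {specificity(980, 20):.2f}")  # specificity = 0.98
```

A detector can score well on one measure and poorly on the other, which is why both are quoted, just as with medical tests.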

In speech recognition, words and phrases are sequences of letters or phonemes, while proteins are sequences drawn from 20 amino acids (and yes, microbiologists do indeed “spell” proteins using 20 letters of the alphabet). In speech, we have the idea that words may sound similar or the same, and we use the term homophone. Proteins (and RNA and DNA) can also be similar, in that they differ by only a few amino acids (or, for RNA and DNA, a few bases). Microbiologists call such similar sequences homologs. Since all organisms are subject to random mutations, there is a constant generation of new homologs and a winnowing process: those which lead to the same or better results persist, and those which lead to worse results are eliminated.

What happens if a virus comes along and starts injecting its own words and phrases into the general din? If the words are in a foreign language, they will likely go unrecognized. For example, if the virus shouts “Für! Für!”, no one pays attention. However, if a mutation happens and the virus starts shouting “Fire! Fire!”, the world changes, because the fire brigade and police force will wake up.

When viruses start producing words and phrases which activate receptors, it is not surprising that this can lead to bad results. Viral proteins containing wake-up sequences that do bad things are called pathogenic. In storytelling terms, they are like the fables of the boy who cried wolf or of the sky falling.

If you give me a sample of speech and ask me to find all the wake-up words in it (from a supplied list), this is not a difficult problem, even when the task calls for approximate matches, such as fire and phire, or bomb and bonne. In a similar vein, there are catalogs of proteins (and similar compounds) which are known wake-up words for various receptors. We can examine the RNA of a virus to see what proteins it produces, and then match subsequences of these proteins against the catalogs to see which ones could be dangerous wake-up words/phrases.

In the case of COVID-19, we know in general terms that something goes screwy with the body’s inflammatory/immune response, which suggests focusing on things which trigger it. One of the most abundant viral proteins in patient sera is not so much the oft-mentioned “spike” protein (known as “S”), which is the crowbar the virus uses to gain initial entry into cells, as the “nucleocapsid” protein (known as “N”), which makes up the inner capsule of the virus and which (to date) has not enjoyed the same level of fame or notoriety.
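The catalog matching described earlier (finding approximate wake-up words inside a longer sequence) can be sketched in a few lines of Python. This is a toy under stated assumptions, not how real bioinformatics tools work: it slides a window along the sequence and counts mismatched letters, it ignores insertions and deletions, and the motif "FIRE", the catalog name `toy_alarm`, and the sequence are all invented for illustration (F, I, R and E happen to be valid one-letter amino acid codes, but the motif has no biological meaning).

```python
from typing import List, Mapping, Tuple


def mismatches(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def scan_for_motifs(
    sequence: str,
    catalog: Mapping[str, str],
    max_mismatches: int = 1,
) -> List[Tuple[str, int, str, int]]:
    """Return (motif name, start index, matched window, mismatch count)
    for every window within max_mismatches of a catalog motif."""
    hits = []
    for name, motif in catalog.items():
        width = len(motif)
        for start in range(len(sequence) - width + 1):
            window = sequence[start:start + width]
            distance = mismatches(window, motif)
            if distance <= max_mismatches:
                hits.append((name, start, window, distance))
    return hits


# Hypothetical catalog and sequence, made up for this example.
catalog = {"toy_alarm": "FIRE"}
protein = "MKFIREAGWFLRE"

print(scan_for_motifs(protein, catalog, max_mismatches=1))
# [('toy_alarm', 2, 'FIRE', 0), ('toy_alarm', 9, 'FLRE', 1)]
```

The exact match at position 2 is the virus shouting “Fire!”, while the one-letter variant at position 9 is the kind of near miss that a tolerant receptor might still respond to. How much mismatch to tolerate is the same sensitivity versus specificity trade-off we face with speech.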

It turns out that there is a region on this protein which can trigger a particularly destructive chain of events in the innate immune system. That is consistent with the debris from the innate immune system seen in lung biopsies, and with why the response to COVID-19 looks so much like sepsis. Whether “N” is the smoking gun or just one of several hand grenades that go off, time will tell. However, N is also to be found at the crime scene for the two other serious coronaviruses (the original SARS-CoV and MERS). My earlier update about the possibility that nicotine has some protective effect for smokers has some relation to this, because nicotine has a down-regulating effect on inflammation, and thus it may counter the activation from N in mild cases; but presumably, if there is enough N protein floating around, the nicotine will be outgunned. If it turns out that N really is a leading villain, then it will be interesting to test people for antibodies to both S and N. Antibodies to S would prevent the virus from entering cells in the first place, while antibodies to N would stop the subsequent mayhem. It may turn out that some of the difference in disease progression has to do with how quickly patients produce antibodies to these two proteins.
