January 25-31, 2021
Predicting the future of vaccines using human language structures
The ability of viruses to mutate and evade the immune system is a major obstacle to the development of universal vaccines, as well as to certain therapies. The mutations that enable this evasion can change the "appearance" of viral surface proteins without altering their structure or function, and so hide the virus from the immune system. Mutations can be single (affecting one domain of a protein) or multiple (spread over several domains). But is it possible to predict which mutations will allow a virus to escape? That would be extremely useful, but the sheer number of possible mutations makes testing them all in the laboratory impossible. The issue raises a very important question: how long will COVID-19 vaccines remain effective?
Researchers at MIT have developed a computer algorithm that predicts in which protein domains escape mutations will occur, based only on the protein's sequence (the particular chain of amino acids that makes it up). The model is inspired by natural language processing, the technology behind voice assistants and speech recognition: words can be assembled in different ways, following precise rules (grammar), to form sentences with different meanings (semantics). Transposed to virology, amino acids play the role of words. They can be combined in different ways, but must follow precise rules (the grammar) to preserve the virus's ability to replicate and infect, its "viral fitness", while producing proteins that are able to evade antibodies (sentences with a new meaning). The model was applied to three viral surface proteins that govern binding to cells and are therefore the targets of neutralizing antibodies: the hemagglutinin (HA) of influenza A virus (flu), the envelope protein (Env) of HIV-1 (AIDS), and the spike protein (S) of SARS-CoV-2 (COVID-19). It was calibrated using over 100,000 different strains of these viruses, then refined by adjusting its parameters or adding new ones drawn from structural, pharmacological and immunological data in the published literature.
What were the results? First, the language-based algorithm was more accurate than existing virological models. Moreover, it needed only the protein's sequence, which is much easier to obtain than its 3D structure.
The model predicted that viral fitness decreases as the number of mutations increases. This makes sense biologically (each additional mutation raises the risk of disrupting the virus's essential functions) and linguistically (the more words of a sentence we change, the more it loses its meaning). The model correctly identified the regions of the three viral proteins in which mutations lead to immune evasion: the "head" domain of HA, the V1/V2 region of Env, and the receptor-binding domain (RBD) of the spike. It also correctly identified the conserved regions, which remain unchanged because they are essential to the protein's function and which therefore make good targets for vaccines or therapies.
This model could prove useful for vaccine development. Because it rests on a general principle, namely that proteins evolve through mutations under selection pressure, which is the very mechanism of evolution, it could also be applied to other biomedical fields, such as drug resistance.