Liang Huang

Efficient Algorithms for mRNA Vaccine Design

2020 Responsible Machine Learning Summit: AI and COVID-19


Liang Huang, Distinguished Scientist, Baidu Research USA, and Associate Professor of Computer Science, Oregon State University

Abstract: To defeat the current COVID-19 pandemic, a messenger RNA (mRNA) vaccine has emerged as a promising approach thanks to its rapid and scalable production and its non-infectious and non-integrating properties. However, designing an mRNA sequence that achieves high stability and protein yield remains a challenging problem because the search space is exponentially large (e.g., there are 2.4 × 10^632 possible mRNA sequence candidates for the spike protein of SARS-CoV-2). We describe two ongoing efforts on this problem, both using linear-time algorithms inspired by my earlier work in natural language parsing. On one hand, the Eterna OpenVaccine project from Stanford Medical School takes a crowd-sourcing approach, letting game players all over the world design stable sequences. To evaluate sequence stability (in terms of free energy), they use LinearFold from my group (2019), since it’s the only linear-time RNA folding algorithm available (which makes it the only one fast enough for COVID-scale genomes). On the other hand, we take a computational approach and directly search for the optimal sequence in this exponentially large space via dynamic programming. It turns out that this problem can be reduced to a classical problem in formal language theory and computational linguistics (the intersection of a CFG and a DFA), which can be solved in O(n^3) time, just like lattice parsing for speech. In the end, we can design the optimal mRNA vaccine candidate for the SARS-CoV-2 spike protein in just about 20 minutes.
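To make the size of the search space concrete, here is a minimal Python sketch (not code from the talk) of why the number of candidates grows exponentially. Each amino acid can be encoded by one to six synonymous codons in the standard genetic code, so the number of mRNA sequences encoding a protein is the product of those counts. The function names and the toy peptide below are illustrative assumptions. The sketch only counts and enumerates candidates; it does not implement the dynamic-programming search described in the abstract.

```python
import math
from collections import defaultdict
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G codon order.
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (one-letter code) to its list of synonymous codons.
_codons = defaultdict(list)
for (b1, b2, b3), aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    _codons[aa].append(b1 + b2 + b3)
CODONS = dict(_codons)  # plain dict, so unknown letters raise KeyError


def count_candidates(protein):
    """Number of distinct mRNA coding sequences that encode `protein`."""
    return math.prod(len(CODONS[aa]) for aa in protein)


def log10_candidates(protein):
    """log10 of the candidate count, convenient for long proteins."""
    return sum(math.log10(len(CODONS[aa])) for aa in protein)


def enumerate_candidates(protein):
    """Yield every candidate; only feasible for very short peptides."""
    for choice in product(*(CODONS[aa] for aa in protein)):
        yield "".join(choice)


# Toy peptide (hypothetical example): Met-Leu-Ser has 1 * 6 * 6 candidates.
toy = "MLS"
print(count_candidates(toy))
print(sum(1 for _ in enumerate_candidates(toy)))
```

Brute-force enumeration like `enumerate_candidates` is hopeless for a protein with more than a thousand residues, which is why the abstract turns to dynamic programming over a compact representation of all codon choices.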

Biography: Liang Huang (PhD, Penn, 2008) is an Associate Professor of Computer Science at Oregon State University and a Distinguished Scientist at Baidu Research USA. He is a leading theoretical computational linguist and was recognized at ACL 2008 (Best Paper Award) and ACL 2019 (Keynote Speech), but in recent years he has been more interested in applying his expertise in parsing, translation, and grammar formalisms to biology problems such as RNA folding and RNA design. Since the outbreak of COVID-19, he has shifted his attention to the fight against the virus, which has resulted in efficient algorithms for stable mRNA vaccine design, again adapted from mathematical linguistics.