?:abstract
|
-
Many functional RNA structures are conserved across evolution, and such conserved structures provide critical targets for diagnostics and treatment. TurboFold II is a state-of-the-art software that can predict conserved structures and alignments given homologous sequences, but its cubic runtime and quadratic memory usage with sequence length prevent it from being applied to most full-length viral genomes. As the COVID-19 outbreak spreads, there is a growing need to have a fast and accurate tool to identify conserved regions of SARS-CoV-2. To address this issue, we present LinearTurboFold, which successfully accelerates TurboFold II without sacrificing accuracy on secondary structure and multiple sequence alignment prediction. LinearTurboFold is orders of magnitude faster than Turbo-Fold II, e.g., 372× faster (12 minutes vs. 3.1 days) on a group of five HIV-1 homologs with average length 9,686 nt. LinearTurboFold is able to scale up to the full sequence of SARS-CoV-2, and identifies conserved structures that have been supported by previous studies. Additionally, LinearTurboFold finds a list of novel conserved regions, including long-range base pairs, which may be useful for better understanding the virus.
|