Petabase-scale sequence alignment catalyses viral discovery. Edgar et al., Nature, January 2022.
Public sequence databases (NCBI, EBI) contain over 20 petabases and are growing exponentially. Peta is 10^15; for scale, 1 petasecond is about 31.7 million years, and 1 light year is 9.461 petametres. The sequences are deposited as evidence supporting publications and as outputs from large-scale projects. Because of their sheer volume, they are hard to search efficiently. This paper presents open-source tools that perform sequence alignment using cloud computing. The authors looked for evidence of previously unidentified RNA viruses and identified over 100,000 novel viruses, roughly 10 times the number already known. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics. Old-school methods for finding viruses usually involved growing them in a suitable host, such as chick embryos or cell culture, or in some cases examining infected material by electron microscopy. Recent methods have switched...