[Important Editor Note: I have since communicated with the authors of the article and other experts in the field, and I now retract this post because there are major problems with the methodology of this study. First, the bat virus chosen to anchor the tree is too distant to serve as an anchor. You can get whatever tree you want! Second, there are major issues with forcing a most parsimonious tree onto a data set with a lot of missing data, perhaps systematically biased missing data. The data set is mostly from China, and even there covers only a subset. Whatever tree is built is the best tree for that data, and may be completely off from reality. The study is still interesting in that it does not assume a place of origin; it tries to figure one out. But the study, at least in its current form, has failed. If I get time, I will write a follow-up post explaining this in more detail. Thanks.]
I have been reading an article in PNAS titled Phylogenetic network analysis of SARS-CoV-2 genomes by Peter Forster, Lucy Forster, Colin Renfrew, and Michael Forster. The study analyzes 160 SARS-CoV-2 genomes sampled from COVID-19 patients around the world. As readers may know, coronaviruses are RNA viruses that regularly undergo mutations. One can study these mutations and, almost like clockwork, trace their evolution, i.e. their lineage and migration pattern.
The authors specifically employed a methodology known as “character-based phylogenetic networks”. The technique has been used as the “method of choice” to reconstruct prehistoric human population movements, language evolution, various ecological studies, and some 10,000 phylogenetic studies of diverse organisms – and now virology.
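To give a rough feel for what "character-based" means, here is a toy sketch. It is not the authors' software or data: the short sequences and the function names are made up for illustration. The assumption is that each position in a set of aligned genomes is treated as one character, and that two samples are compared by counting the positions where they differ.

```python
def count_differences(seq_a, seq_b):
    """Count the positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions at which the two sequences agree."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    diffs = count_differences(seq_a, seq_b)
    return 100.0 * (len(seq_a) - diffs) / len(seq_a)


# Hypothetical toy sequences, NOT real viral data.
reference = "ACGTACGTAC"
sample_1 = "ACGTACGTAT"  # one change relative to the reference
sample_2 = "ACCTACGTAT"  # shares that change and carries one more

for name, seq in [("sample_1", sample_1), ("sample_2", sample_2)]:
    print(name, count_differences(reference, seq), percent_identity(reference, seq))
```

In this toy case sample_2 shares sample_1's change and adds one of its own, which is the kind of pattern that lets samples be ordered into a lineage. Real analyses work on full genomes and must also deal with missing data and the choice of root.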
This is an early study – the sample size is only 160 humans – with 100 types. However, the results are stunning. Among the key conclusions:
- There are three major types of the virus, A, B, and C, with Type A being the ancestor of SARS-CoV-2 in humans and showing 96.2% similarity to a particular strain of virus in bats.
- Most of the viruses in China and Wuhan are of Type B, while most of the viruses found in America, Europe and Australia are of Types A and C. Type C is not found in Mainland China but is found in significant numbers in Hong Kong, S. Korea, and Taiwan.
- While Type B is found in large numbers across Mainland China (including Wuhan), it is not found in significant numbers around the rest of the world.
- The methodology was used successfully to trace several clinically verified cases of virus travel from Wuhan out to various nations, including Brazil and Italy. As such, the authors conclude that the “character-based phylogenetic networks” methodology was useful and appropriate for studying the spread and evolution of the coronavirus.
- Yet, according to the methodology, the earliest virus sample studied, collected on December 24, 2019 in Wuhan, WAS NOT close to being the ancestor of SARS-CoV-2.
Let all the above sink in for a minute.
Some observations…
First, most of the viruses in the West do not seem to have arisen from China. The authors identified Type B as the main virus type found in Mainland China, with that type mostly confined to Mainland China and Types A and C predominant outside China, including the U.S., Europe and Australia.
The authors give two possible reasons why Type B variants (“China’s virus,” if you must) did not expand much beyond Mainland China: one is a “complex founder scenario,” and the other is that “the ancestral Wuhan B-type virus is immunologically or environmentally adapted to a large section of the East Asian population, and may need to mutate to overcome resistance outside East Asia.”
Since I have yet to see any reputable study showing that any strain of the coronavirus has an affinity for, or an aversion to, any ethnicity of people [1], let’s focus on the “complex founder scenario” and “environmental” resistance.
The authors have noted many perplexities in the study. But if we consider the possibility that the coronavirus did not originate in Wuhan, those perplexities all go away. More specifically, let’s presume a scenario where the virus was already circulating under the radar in the West and was carried to Wuhan in December or some time before, where it then spread locally within China.
Consider that the authors noted that Type B variants outside China did not show the “one-month” variations that would have been expected had Type B variants and their descendants traveled out of China to infect the rest of the world.
But if Type B variants, including Type C “descendants,” were already established and being transmitted in communities outside China, then this paradox easily goes away.
Assuming the virus was brought to Wuhan instead of originating in Wuhan would also constitute a “complex founder scenario” of the kind the authors hypothesized could solve the riddle.
This assumption also provides an explanation for the “environmental resistance” the authors hypothesized. If the virus arrived in Wuhan and the Chinese authorities closed down the city soon afterwards, the virus would not have had a chance to spread to the rest of the world. The Chinese government’s shutting down of Wuhan in January could easily form the “environmental resistance” the authors hypothesized for the Type B virus.
Finally, it is important to note that of the 160 samples in this study, most are from patients in China and only a few are from outside Asia. The authors tentatively labelled Type C as a descendant of the Type B found in China. But while Type B is found mostly in Wuhan, Type C is found in significant numbers outside China. As more data from outside China comes online (one hopes soon), the same methodology will probably reveal that the predecessor to Type B and Type C arose outside, not inside, China. Type A and Type C would thus both have arisen outside China and independently of China, with B going on to Wuhan and the rest of China.
While the current study is China-centric (most data are from China), it already has established that the virus probably did not arise in Wuhan. The authors noted importantly in the data supplement “that the oldest isolate from 24 December 2019 (brown node, week 0) lies diagonally opposite to the bat virus outgroup root.”
As we get more data, studies such as this will shed a lot of light on the origins of the coronavirus. It is a real shame that the U.S. and Europe have missed such a critical window for testing and tracking the spread of the virus. It is truly shameful that U.S. officials are attempting to blame China for the virus. Communities that succumb to the virus are victims, not perpetrators, of this disease! Luckily, even with limited data, we are already able to draw some shocking conclusions that pierce some of these lies!
Notes:
Charles Liu says
Here’s another article on the origin question, thru genome analysis: https://www.nature.com/articles/s41591-020-0820-9
Mr. Allen says
@Charles Liu
Important parts of the Nature article:
A couple of notes on the idea that the coronavirus came from bats: the closest known bat coronavirus genome diverges from SARS-CoV-2 in the all-important binding region.
From a separate analysis I read, the evolutionary distance between RaTG13 and SARS-CoV-2 is something like 50 years. That’s a long time. If we don’t find any bat viruses closer than RaTG13, bats are the wrong tree to be barking up.