This story has been updated with news of WHO’s classification of the variant.
But thanks to a paper published in Nature Microbiology in July 2020, the variant already has its official name: B.1.1.529. This string of letters and numbers doesn't roll off the tongue, and misplacing or omitting a single digit changes its meaning entirely. But the sequence makes sense to biologists: at a glance, it conveys the evolutionary history of the variant. And like the Greek-letter names, the official name avoids the unfair stigma, and the xenophobia that follows, that come with naming a virus after the place where it was first discovered.
By June 2020, at least 35,000 sequences of the novel coronavirus's genetic code were available, wrote the biologists from the universities of Edinburgh, Oxford, and Cambridge who authored the paper. That figure was testimony not only to how closely the virus was being studied the world over but also to how easy genetic sequencing has become over the past decade. As variants emerged, they needed names, "before the scientific literature and communication become further confused," the scientists wrote.
The system they came up with is known as Pango, from the Latin for "I record" or "I set." The first letter of a name denotes the virus's root lineage: its genetic resemblance to the closest-known "ancestor" virus in the wild. The earliest lineage A viruses, for instance, carry the same nucleotides, the building blocks of the genetic code, at two key positions as two bat viruses do. These lineage A viruses were sampled on Jan. 5, 2020, in Wuhan. The first lineage B viruses were sampled earlier, on Dec. 24, 2019, also in Wuhan, but because they carry different nucleotides at those two positions, and so resemble the bat viruses less closely, they are classed as a subsequent lineage.
Viruses descending from these lineages are sorted into sublineages such as A.1 or B.1. B.1, for instance, refers to a lineage traced back to the covid-19 outbreak in northern Italy in early 2020. For a variant to be recognized as a legitimate branch of an existing lineage, the scientists set down some rules:
- The variant should be transmitted from its place of first discovery into another “geographically distinct population”—another country, say, or another province of a large and populous country.
- It should have at least one key difference in nucleotides from its ancestor.
- It should be represented by at least five sequenced genomes, from five different samples, each covering at least 95% of its genetic code.
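The three rules amount to a simple checklist. The Python sketch below is a hypothetical illustration, not the Pango team's software: the `Genome` and `CandidateLineage` types, the `meets_criteria` function, and the reading of "another geographically distinct population" as "found in at least two regions" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Genome:
    sample_id: str   # hypothetical identifier for the sample it came from
    coverage: float  # fraction of the genetic code sequenced, 0.0 to 1.0


@dataclass
class CandidateLineage:
    genomes: list[Genome]
    regions: set[str]             # geographically distinct populations where it was found
    differences_from_parent: int  # key nucleotide differences from its ancestor


def meets_criteria(c: CandidateLineage, min_genomes: int = 5, min_coverage: float = 0.95) -> bool:
    """Check a candidate against the three rules as paraphrased above (an assumption, not Pango code)."""
    # Rule 1: spread beyond the place of first discovery, i.e. seen in two or more regions.
    spread = len(c.regions) >= 2
    # Rule 2: at least one key nucleotide difference from the ancestor.
    distinct = c.differences_from_parent >= 1
    # Rule 3: enough well-covered genomes, counted once per distinct sample.
    well_covered = {g.sample_id for g in c.genomes if g.coverage >= min_coverage}
    sampled = len(well_covered) >= min_genomes
    return spread and distinct and sampled
```

A candidate seen in only one region, or backed by four good genomes instead of five, fails the check; lowering `min_genomes` or `min_coverage` would loosen the rules.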
The sequence is extendable: the B.1.1 lineage, for instance, descends from its B.1 ancestor under the same rules. But after three levels of digits, the Pango classification recommends starting a new string with a fresh letter. So, for instance, a fourth-level lineage that would otherwise have been named B.1.1.1.1 was traced back to a variant first sampled in South Africa in March 2020. Its Pango name, though, is C.1.
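The alias rule can be sketched the same way. In this hypothetical Python example, the alias table holds a single entry, C standing for B.1.1.1, which is what the C.1 example above implies; the `expand` and `compress` functions are illustrative names, and the real system keeps a much longer list of aliases and decides when new letters are coined.

```python
# Hypothetical alias table: each letter stands in for a full lineage prefix.
# "C" -> "B.1.1.1" is consistent with C.1 replacing B.1.1.1.1.
ALIASES = {"C": "B.1.1.1"}


def expand(name: str, aliases: dict[str, str] = ALIASES) -> str:
    """Turn an aliased name such as C.1 back into its full form."""
    head, _, rest = name.partition(".")
    if head not in aliases:
        return name
    full = aliases[head]
    return f"{full}.{rest}" if rest else full


def compress(name: str, aliases: dict[str, str] = ALIASES, max_levels: int = 3) -> str:
    """Swap a long prefix for its alias letter once a name passes max_levels digits."""
    if len(name.split(".")) - 1 <= max_levels:
        return name  # short enough to keep, e.g. B.1.1.529
    # Use the longest known alias prefix that the name starts with.
    matches = [(letter, full) for letter, full in aliases.items() if name.startswith(full + ".")]
    if not matches:
        return name
    letter, full = max(matches, key=lambda m: len(m[1]))
    return f"{letter}.{name[len(full) + 1:]}"
```

With this table, `compress("B.1.1.1.1")` gives `"C.1"`, `expand("C.1")` gives back `"B.1.1.1.1"`, and `B.1.1.529`, with only three levels of digits, is left alone.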
For the average reader, the string may be difficult to remember; South Africa's health minister once confused the B.1.315 and B.1.351 variants. But a scientist glancing at the name B.1.1.529 can tell immediately that its lineage goes back to the ancestor first sampled in Wuhan in Dec. 2019, and that its genetic sequence makes it yet another member of the particularly diverse B.1.1 clan. The utility of these letters and numbers lies in how they capture, at a stroke, the history and spread of a variant's family tree.
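Reading a name's history amounts to peeling off one level at a time. This hypothetical Python sketch assumes the name is already written out in full (an aliased name such as C.1 would first have to be expanded) and uses two invented helpers, `ancestors` and `in_clan`:

```python
def ancestors(name: str) -> list[str]:
    """The lineage itself and every lineage it descends through, root first."""
    parts = name.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def in_clan(name: str, clan: str) -> bool:
    """True if `name` is `clan` itself or one of its descendants."""
    # Compare whole components so that B.1.15 does not count as part of B.1.1.
    return name == clan or name.startswith(clan + ".")


print(ancestors("B.1.1.529"))         # ['B', 'B.1', 'B.1.1', 'B.1.1.529']
print(in_clan("B.1.1.529", "B.1.1"))  # True
```

Matching on whole dot-separated components, rather than raw string prefixes, is what keeps a lineage such as B.1.15 out of the B.1.1 clan.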