In Ghana, where English is the official language for education and commerce, four in five people speak Twi as either a first or second language. Different variations have extended its usage to other West African towns in Benin and Côte d’Ivoire, bringing the estimated number of speakers to 18 million.
This number could soon increase.
Mozilla, the web browser company, says it has added Twi to Common Voice, its open source database of languages crowdsourced from real people who speak them. The aim is to feed the data into speech recognition software and increase the availability of diverse local languages on the internet, deconstructing a world where European-colonial languages are the default (in some cases, only) mediums of online communication.
African linguists have complained for years that the internet is eliminating entire histories, since “if a machine doesn’t understand your language it will be like it never existed,” Vukosi Marivate, chief of data science at the University of Pretoria in South Africa, said last year. That is mostly due to a lack of effort by internet-based platforms that fail to provide the necessary support features: most do not provide first-language interface support for more than 90% of all Africans, an Oxford Internet Institute-backed survey found.
Even Google and Wikipedia, with relatively good support features, “exclude almost half the African population on the basis of primary language.”
Twi is the 100th language to be added to Mozilla’s Common Voice. It joins other African languages Kiswahili, Luganda, Hausa, Tigrinya, Tigre, Igbo, and Kinyarwanda on the project, though each is at a different level of validation based on the volume of voice samples users have contributed.
Adding these languages will enable the communities that speak them to “tap into the possibilities of speech technology—creating a healthier and more open AI ecosystem,” EM Lewis-Jong, product lead for Common Voice, said in a statement.
Up to 2,000 languages are spoken in Africa, with 75 having at least 1 million speakers. While it may not necessarily be possible to have all of them online, Common Voice’s collection of just seven so far shows there’s some way to go for African languages to gain a mature presence on the internet.
That said, Twi’s addition joins other recent efforts toward inclusion: last year, the language learning app Duolingo began planning to offer Zulu and Xhosa, two of South Africa’s most popular languages.
And as to the challenge sometimes cited to justify the absence of African languages on the internet, namely that most are oral with little written corpus, more universities are offering coursework on African languages to expand expertise. Twi, for example, is taught at Rutgers in New Jersey. Last year, Oxford University’s modern languages department formally began teaching Igbo (spoken by around 25 million people in southeast Nigeria) as a course with an instruction manual and an approved lecturer.