MAKING USEFUL TOOLS FOR LANGUAGE COMMUNITIES WITHOUT BIG CORPORA: A VERSATILE MORPHOLOGICAL ANALYSER AND SPELLCHECKER FOR SONGHAY BASED ON FREE SOFTWARE
Keywords:
free software - spellchecker - morphological tagging - morphology - Songhay language - under-resourced languages
Abstract
The current excitement over data-driven artificial intelligence makes headlines across the world, including in Africa. AI is expected to enable exponential growth in digital resources and to level the playing field for less-resourced languages. However, this enthusiasm largely overlooks how closely AI capacity depends on the availability of massive, well-curated datasets. In this regard, most African languages do not meet the basic requirements to participate in the new “revolution”. The Songhay language is a case in point: it has taken two decades of slow, continuous work to develop even a basic tool like a spellchecker.
Still, even limited experience with localization and corpus building helps in developing more advanced tools and content over time.