A reflection on how machine learning is helping translate Kinyarwanda to English
It has been a few weeks since we ran the Kinyarwanda Text Cleaning and Augmentation Hackathon and Competition in partnership with GIZ and the Digital Transformation Center Rwanda (DTC).
The objective of the competition was to create a text-cleaning script for Kinyarwanda. The participants were tasked with uncovering and remediating issues in the provided parallel corpus of English-Kinyarwanda sentence pairs, and with finding additional data to clean through other means such as data augmentation or web scraping. The skills needed to solve the challenge were good coding habits, (computational) linguistics, or natural language processing.
The Digital Transformation Center is a Rwandan-German initiative to develop impact-driven digital solutions in Africa. One of its focus areas is Artificial Intelligence. Against this background, the AI Hub Rwanda was founded, bundling all AI initiatives implemented by GIZ in Rwanda. It comprises two projects: the global FAIR Forward program and the DSSD program, an adjacent component that aims to improve the preconditions for using machine translation in Rwanda.
Let’s dive into the hackathon, the winners, the solutions they provided, and the deployment of those solutions.
The physical Hackathon in Kigali
The hackathon was a two-day event held on July 9 and 10, 2022 at the Digital Transformation Center in Kigali, Rwanda, with an impressive turnout of 40 participants in person and 15 participants online.
On the first day, the participants formed teams to better collaborate and tackle the challenge: 10 teams were formed, alongside 5 individual participants. After intensive brainstorming and a range of different approaches, the judges determined the winners. First place went to Team KF2R, second place to Team Underscore, and third place to Team Nigh-Omni.
Describing the approach behind their winning solution, Rose Mary of Team KF2R says, “We attempted to remove English words that were not proper nouns and that were not Kinyarwanda. There were some mistranslations where English had been included. Our use of functions and elegant code set us above the rest. We also did our best to make sure that we didn’t lose any useful data while cleaning. In addition, we collaborated as a team and made our solution run in a short time.”
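The idea of dropping English words from the Kinyarwanda side while keeping proper nouns can be illustrated with a minimal Python sketch. This is not Team KF2R's actual code: the word lists, the function names, and the "capitalized and not sentence-initial" proper-noun heuristic are all assumptions made for illustration.

```python
import string


def looks_like_proper_noun(token, position):
    # Crude heuristic (an assumption): a capitalized word that is not the
    # first word of the sentence is treated as a proper noun.
    return position > 0 and token[:1].isupper()


def strip_english_tokens(sentence, english_words, kinyarwanda_words=frozenset()):
    """Drop tokens that appear in an English word list, unless they are also
    valid Kinyarwanda words or look like proper nouns."""
    kept = []
    for position, token in enumerate(sentence.split()):
        core = token.strip(string.punctuation)  # compare without punctuation
        key = core.lower()
        is_english = key in english_words and key not in kinyarwanda_words
        if is_english and not looks_like_proper_noun(core, position):
            continue  # stray English word: remove it
        kept.append(token)
    return " ".join(kept)


# Hypothetical word list, far too small for real use.
example_english = {"school", "the", "grace"}
print(strip_english_tokens("Umwana yagiye ku school i Kigali", example_english))
```

In practice the English and Kinyarwanda word lists would need to be much larger, and removed tokens should probably be logged so that useful data is not lost silently.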
Additionally, Kefas said, “I think what really set our winning solution apart was making sure we strictly followed the steps involved in NLP data preprocessing, such as data exploration using different tools such as Microsoft Excel and Jupyter Notebook to assess the quality of the dataset, normalizing the dataset to have uniform casing, and checking for and removing unwanted characters, stopwords, and emails, among other techniques, to have a cleaned dataset.”
The physical hackathon allowed Rwandese data scientists to meet and interact with the AI community in person. Working on their own language increased their sense of belonging, and the event gave them a chance to meet with GIZ.
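The preprocessing steps mentioned in this quote (uniform casing, removing emails, unwanted characters, and stopwords) could look roughly like the Python sketch below. It is only an illustration under stated assumptions, not the winning script: the regular expressions, the function names, the choice of which characters count as "unwanted", and the duplicate-pair filter are all hypothetical. Note that lowercasing everything also erases the capitalization that marks proper nouns, so a real pipeline might apply it selectively.

```python
import re
import unicodedata

# Assumed patterns: a simple email matcher and a whitelist of kept characters
# (letters, digits, whitespace, apostrophes, basic punctuation).
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
UNWANTED_RE = re.compile(r"[^\w\s'.,?!-]")


def clean_sentence(text, stopwords=frozenset()):
    """Normalize one sentence: Unicode form, casing, emails, stray symbols."""
    text = unicodedata.normalize("NFKC", text)
    text = text.lower()                 # uniform casing
    text = EMAIL_RE.sub(" ", text)      # drop email addresses
    text = UNWANTED_RE.sub(" ", text)   # drop unwanted characters
    tokens = [t for t in text.split() if t not in stopwords]
    return " ".join(tokens)             # also collapses repeated whitespace


def clean_pairs(pairs, en_stopwords=frozenset()):
    """Clean (English, Kinyarwanda) pairs; skip empty and duplicate pairs.
    The duplicate filter is an extra assumption, not taken from the quote."""
    seen = set()
    cleaned = []
    for english, kinyarwanda in pairs:
        pair = (clean_sentence(english, en_stopwords), clean_sentence(kinyarwanda))
        if not pair[0] or not pair[1] or pair in seen:
            continue
        seen.add(pair)
        cleaned.append(pair)
    return cleaned


example_pairs = [
    ("Hello, World!", "Muraho, Isi!"),
    ("hello, world!", "muraho, isi!"),  # duplicate after normalization
    ("@@@", "Amakuru"),                 # English side is empty after cleaning
]
print(clean_pairs(example_pairs))
```

Stopword removal is left optional here because, for machine translation data, removing function words from only one side of a pair can make the pair less faithful rather than cleaner.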
The hackathon was then opened to other nationalities online.
The online competition – GIZ Kinyarwanda Text Cleaning and Augmentation Competition
The online competition started on 25th July 2022 and closed on 7th August 2022. It was open to everyone and ran for two weeks, with over 100 participants. It was interesting to see participants form teams online to tackle the challenge. We received over 9 submissions, and Arnauld Kayonga was declared the winner.
“Collaboration with GIZ and DTC has been very instrumental to Zindi. We believe in creating solutions that bring change to the community. The hackathon showed us that African data scientists could solve our own challenges and the future is bright for us,” says Delilah Rose, Community Coordinator at Zindi.
The solutions were submitted to DTC and we are looking forward to having a machine translation model in the near future.
In an effort to drive forward the natural language processing field in Rwanda, the digicenter, together with different partners, has launched the NLP fellowship with a focus on conversational AI and machine translation. The participants will learn and acquire practical experience from leading interactions with computer implications.