KenCORPUS: Kenyan Languages Corpus Lacuna Project Performance Review Workshop

KenCORPUS Updates

About KenCORPUS PROJECT

A research team from Maseno University’s Departments of Computer Science and of Kiswahili and Other African Languages, including the Principal Investigator, was awarded research funds by the Lacuna Fund for a project titled KenCorpus: Kenyan Languages Corpus. The project is a collaboration with investigators from the University of Nairobi and Africa Nazarene University. The researchers will create datasets for Kiswahili, Dholuo, and selected Luluhyia dialects and later annotate them, so that these languages, which are classified as underserved, become better represented among the languages that can be used in machine learning.

This project will have a significant impact on the methodologies used for the rapid assembly of corpora for under-resourced languages, and will shed light on how to prepare and annotate speech and text for use in multilingual communities. Emerging technology firms interested in human language technology solutions will also benefit, because they will see pioneer prototypes that could inspire commercial systems, motivating co-funding and possible cooperation in future projects.


The aims of this research project include: collecting naturally occurring language texts in Kiswahili, Dholuo and Luhyia; collecting speech data for the Kiswahili, Dholuo and Luhyia languages; translating the Dholuo and Luhyia texts into Kiswahili for machine translation; collecting question and answer pairs for the Kiswahili texts for machine comprehension; and annotating the Kiswahili, Dholuo and Luhyia texts with part-of-speech tags.

Project Performance Review Workshop Activities

 Data cleaning

a.       Digitizing handwritten text

b.      Proof reading the digitized text

  Translation

a.       Dholuo to Kiswahili

b.      Luhya (Lumarachi, Luloogoli, Lubukusu) to Kiswahili

Transcription (Kiswahili)

a.       Reading out sentences

b.      Listening to voice data and typing out

 Part of Speech (POS) Tagging

a.       Dholuo Text

b.      Luhya (Lumarachi, Luloogoli, Lubukusu)

PERFORMANCE REVIEW WORKSHOP


Speech Transcription team and Speech to Text Proof of Concept

Marachi POS annotation and translation team

Question Answering Team


Website Development Team

Workshop

Activities

Workshops

Presentations

Welcome Remarks

Dr. Lilian Wanzare, the Principal Investigator of the KenCORPUS Project, welcomed participants to the Performance Review Workshop and highlighted the objectives of the project and the deliverables expected of the workshop.

Team Bonding

Visit the KenCORPUS website: https://kencorpus.co.ke/

 

The KenCORPUS Project

Principal Investigators

________________________________________

GROUPS IN ACTION

PRESENTATIONS BY TEAM LEADERS

Logoli Team Leader

Marachi Team Leader

Dholuo Team Leader

Bukusu Team Leader

Question & Answer Team Leader

Speech Transcription Team Leader

Data Cleaning Team Leader

Principal Investigators Team
