GandaBERT: Transfer Learning with mBERT for Luganda News Classification
dc.contributor.author | Seth Mbasha
dc.date.accessioned | 2025-06-12T12:37:52Z
dc.date.available | 2025-06-12T12:37:52Z
dc.date.issued | 2025-05
dc.description.abstract | Luganda, spoken by over 21 million Ugandans, is significantly under-resourced in Natural Language Processing (NLP) and lacks basic tools such as news classifiers. This gap hinders access to digital information and widens the digital language divide. This research project addressed the challenge by developing GandaBERT, a model for Luganda news classification. The methodology involved fine-tuning the multilingual BERT (mBERT) model on a novel multi-source dataset of 2,609 native, translated, and synthetic Luganda news articles spanning five categories (Politics, Business, Sports, Health, Religion). Evaluation on a held-out test set showed that GandaBERT achieved an overall accuracy of 85.7%. Performance was strong in some categories, such as Politics, but varied across topics, partly because of overfitting during training. The study confirms the viability of transfer learning with mBERT for practical Luganda NLP tasks, provides a working classification tool, and contributes to the digital resources available for this low-resource language.
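For readers unfamiliar with the methodology the abstract describes, the following is a minimal sketch of what fine-tuning mBERT for five-way news classification could look like with the Hugging Face Transformers and Datasets libraries. It is illustrative only, not the thesis's actual training script: the checkpoint name, the CSV file names and column layout, and all hyperparameters are assumptions.

```python
# Minimal sketch (assumptions, not the thesis's code) of fine-tuning mBERT
# for 5-class Luganda news classification with Hugging Face Transformers.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

LABELS = ["Politics", "Business", "Sports", "Health", "Religion"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased",
    num_labels=len(LABELS),
    id2label=dict(enumerate(LABELS)),
    label2id={label: i for i, label in enumerate(LABELS)},
)

# Hypothetical CSV files with a "text" column and an integer "label"
# column (0-4); the real dataset layout is not described in the record.
dataset = load_dataset(
    "csv", data_files={"train": "train.csv", "test": "test.csv"}
)

def tokenize(batch):
    # Truncate/pad articles to the model's 512-token limit so the
    # default data collator can batch them directly.
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="gandabert",
    num_train_epochs=3,              # a small epoch count and weight decay
    learning_rate=2e-5,              # are standard ways to curb the kind of
    per_device_train_batch_size=16,  # overfitting the abstract mentions
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
print(trainer.evaluate())  # reports loss (and any configured metrics)
```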
dc.identifier.uri | https://hdl.handle.net/20.500.12311/2704
dc.language.iso | en
dc.publisher | Uganda Christian University
dc.title | GandaBERT: Transfer Learning with mBERT for Luganda News Classification
dc.type | Thesis