GandaBERT: Transfer Learning with mBERT for Luganda News Classification
dc.contributor.author | Seth Mbasha
dc.date.accessioned | 2025-06-12T12:37:52Z
dc.date.available | 2025-06-12T12:37:52Z
dc.date.issued | 2025-05
dc.description.abstract | Luganda, spoken by over 21 million Ugandans, is significantly under-resourced in Natural Language Processing (NLP) and lacks basic tools such as news classifiers. This gap hinders access to digital information and widens the digital language divide. This research project addressed the challenge by developing GandaBERT, a model for Luganda news classification. The methodology involved fine-tuning the multilingual BERT (mBERT) model on a novel multi-source dataset of 2,609 native, translated, and synthetic Luganda news articles spanning five categories (Politics, Business, Sports, Health, Religion). Evaluation on a held-out test set showed that GandaBERT achieved an overall accuracy of 85.7%. Performance was strong in some categories, such as Politics, but varied across topics, partly because of overfitting during training. The study confirms the viability of transfer learning with mBERT for practical Luganda NLP tasks, provides a working classification tool, and contributes to the digital resources available for this low-resource language.
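For readers unfamiliar with the methodology the abstract describes, the following is a minimal sketch of what fine-tuning mBERT for five-way news classification could look like with the Hugging Face Transformers and Datasets libraries. It is illustrative only, not the thesis's actual training script: the checkpoint name, the CSV file names and column layout, and all hyperparameters are assumptions.

```python
# Minimal sketch (assumptions, not the thesis's code) of fine-tuning mBERT
# for 5-class Luganda news classification with Hugging Face Transformers.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

LABELS = ["Politics", "Business", "Sports", "Health", "Religion"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased",
    num_labels=len(LABELS),
    id2label=dict(enumerate(LABELS)),
    label2id={label: i for i, label in enumerate(LABELS)},
)

# Hypothetical CSV files with a "text" column and an integer "label"
# column (0-4); the real dataset layout is not described in the record.
dataset = load_dataset(
    "csv", data_files={"train": "train.csv", "test": "test.csv"}
)

def tokenize(batch):
    # Truncate/pad articles to the model's 512-token limit so the
    # default data collator can batch them directly.
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="gandabert",
    num_train_epochs=3,              # a small epoch count and weight decay
    learning_rate=2e-5,              # are standard ways to curb the kind of
    per_device_train_batch_size=16,  # overfitting the abstract mentions
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
print(trainer.evaluate())  # reports loss (and any configured metrics)
```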
dc.identifier.uri | https://hdl.handle.net/20.500.12311/2704
dc.language.iso | en
dc.publisher | Uganda Christian University
dc.title | GandaBERT: Transfer Learning with mBERT for Luganda News Classification
dc.type | Thesis