GandaBERT: Transfer Learning with mBERT for Luganda News Classification

dc.contributor.author: Seth Mbasha
dc.date.accessioned: 2025-06-12T12:37:52Z
dc.date.available: 2025-06-12T12:37:52Z
dc.date.issued: 2025-05
dc.description.abstract: Luganda, spoken by over 21 million Ugandans, is significantly under-resourced in Natural Language Processing (NLP), lacking effective tools such as news classifiers. This gap hinders digital information access and contributes to the digital language divide. This research project addressed the challenge by developing GandaBERT, a model for Luganda news classification. The methodology involved fine-tuning the multilingual BERT (mBERT) model on a novel multi-source dataset comprising 2,609 native, translated, and synthetic Luganda news articles across five categories (Politics, Business, Sports, Health, Religion). Evaluation on a held-out test set showed that GandaBERT achieved an overall accuracy of 85.7%. While the model performed strongly in certain categories such as Politics, performance varied across topics, partly due to overfitting during training. This study confirms the viability of applying transfer learning with mBERT to practical Luganda NLP tasks, provides a useful classification tool, and contributes to enhancing digital resources for this low-resource language.
dc.identifier.uri: https://hdl.handle.net/20.500.12311/2704
dc.language.iso: en
dc.publisher: Uganda Christian University
dc.title: GandaBERT: Transfer Learning with mBERT for Luganda News Classification
dc.type: Thesis
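
The abstract describes fine-tuning mBERT for five-way Luganda news classification. For readers who want to reproduce a comparable setup, a minimal sketch using the Hugging Face Transformers library follows. It is not the thesis's actual pipeline: the file name "luganda_news.csv", the "text" and "category" column names, and the hyperparameters are all assumptions, since the 2,609-article corpus and training configuration are only available in the full PDF.

    # Hypothetical fine-tuning sketch for a five-way Luganda news
    # classifier with mBERT. Data file and column names are assumed.
    from datasets import load_dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        DataCollatorWithPadding,
        Trainer,
        TrainingArguments,
    )

    # The five categories named in the abstract.
    LABELS = ["Politics", "Business", "Sports", "Health", "Religion"]
    label2id = {name: i for i, name in enumerate(LABELS)}

    tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-multilingual-cased",
        num_labels=len(LABELS),
        id2label={i: name for name, i in label2id.items()},
        label2id=label2id,
    )

    def preprocess(batch):
        # Tokenize article text; mBERT's subword vocabulary covers
        # Luganda without any vocabulary extension.
        enc = tokenizer(batch["text"], truncation=True, max_length=512)
        enc["labels"] = [label2id[name] for name in batch["category"]]
        return enc

    # "luganda_news.csv" is a placeholder for the (non-public) corpus.
    dataset = load_dataset("csv", data_files="luganda_news.csv")["train"]
    dataset = dataset.map(preprocess, batched=True)
    splits = dataset.train_test_split(test_size=0.15, seed=42)

    args = TrainingArguments(
        output_dir="gandabert",
        num_train_epochs=3,   # keeping epochs low is one common way to
        learning_rate=2e-5,   # limit the overfitting noted in the abstract
        per_device_train_batch_size=16,
        weight_decay=0.01,
    )

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=splits["train"],
        eval_dataset=splits["test"],
        data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    )
    trainer.train()
    print(trainer.evaluate())  # reports eval_loss on the held-out split

Note that Trainer.evaluate without a compute_metrics function reports only the evaluation loss; reproducing the 85.7% accuracy figure would additionally require an accuracy metric and the original held-out test set.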

Files

Original bundle
Name: Mbasha S_BSCS_2025.pdf
Size: 1.92 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission