Automatic creation of Wikipedia articles about Zambia utilizing Retrieval-Augmented Generation techniques and fact-based vector databases
Abstract
This study investigates the automatic creation of Wikipedia articles about Zambia using Retrieval-Augmented Generation (RAG) techniques integrated with fact-based vector databases. Although Wikipedia is a vital open-access knowledge platform, its coverage of Zambia remains inadequate, with many topics underrepresented or missing. Generative AI, particularly Large Language Models (LLMs), offers a way to close these gaps, but it is hindered by issues such as factual hallucination and reliance on low-quality, machine-translated web data. To address these challenges, this research proposes a RAG-based approach that grounds content generation in curated, reliable datasets in order to improve accuracy, contextual relevance, and editorial usability. The study employs a mixed-methods design comprising controlled experiments with Zambian university students, implementation of a RAG prototype system, and evaluation of editor acceptance of AI-generated drafts. Key objectives include assessing whether factual resources increase willingness to contribute, evaluating the effectiveness of RAG in producing reliable Wikipedia content, and exploring editor perceptions of AI assistance. By combining technical development with empirical evaluation, this research contributes both to the advancement of trustworthy AI content generation and to the promotion of equitable digital knowledge representation for underrepresented regions such as Zambia.
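The grounding step described in the abstract can be illustrated with a minimal sketch. It is not the prototype evaluated in the study: the fact store, the bag-of-words `embed` function (a stand-in for a learned embedding model and a real vector database), and the names `retrieve` and `build_prompt` are all assumptions made for illustration. The sketch only shows how retrieved facts might be placed into a prompt so that a language model drafts text from curated statements rather than from its own recall.

```python
import math
import re
from collections import Counter

# Hypothetical, hand-written fact store (assumption); a real system would
# index a curated dataset in a vector database instead.
FACTS = [
    "Lusaka is the capital city of Zambia.",
    "Zambia became independent from the United Kingdom on 24 October 1964.",
    "Victoria Falls lies on the Zambezi River on the border between Zambia and Zimbabwe.",
]


def embed(text):
    """Toy bag-of-words vector; stands in for a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, k=2):
    """Return the k stored facts most similar to the query."""
    q = embed(query)
    ranked = sorted(FACTS, key=lambda fact: cosine(q, embed(fact)), reverse=True)
    return ranked[:k]


def build_prompt(topic, k=2):
    """Assemble a prompt that restricts the model to the retrieved facts."""
    bullets = "\n".join(f"- {fact}" for fact in retrieve(topic, k))
    return (
        "Write a short encyclopedic paragraph using only the facts below.\n"
        f"Topic: {topic}\n"
        f"Facts:\n{bullets}\n"
    )


if __name__ == "__main__":
    print(build_prompt("capital city of Zambia", k=1))
```

The resulting prompt would then be passed to an LLM of choice; that call is omitted here because the abstract does not name a specific model or API.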