Automation of Topic Generation in Government Information Requests in Mexico

In Mexico, legislation guarantees public access to information, empowering citizens to request data from the government. This research examines the National Transparency Platform's extensive archive, which includes over 2 million requests for information, with the goal of identifying citizens' primary interests in government actions from 2003 to 2020. In an analysis of 2,518,875 requests, genetic algorithms were used to tune three key hyperparameters of the Latent Dirichlet Allocation (LDA) model: alpha, beta, and the number of topics.
This optimization aimed to improve the model's accuracy in topic identification, as measured by the coherence of the topics it identified.
Additionally, Generative Pre-trained Transformer (GPT) technology was used to automatically generate titles and descriptions for these topics. The investigation revealed 4,131 topics of public interest throughout the Mexican Republic, with significant emphasis on environmental management, public policies, the response to the COVID-19 health crisis, labor issues, and education in 2020. These findings underscore the critical role of proactive transparency and open data provision in enabling the analysis of large volumes of government data.
This study paves the way for future research on data-driven decision-making and policy development.
It highlights the influence of sophisticated data analysis in promoting government transparency and stimulating citizen engagement.
By using genetic algorithms to refine the LDA model and large language model technology to generate topic titles and descriptions, this study offers a new approach to analyzing public information requests and contributes to improving governmental transparency.
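
The hyperparameter search described above can be illustrated with a minimal sketch of a genetic algorithm over (alpha, beta, number of topics). This is not the authors' implementation: the search ranges, population size, mutation settings, and operator choices (tournament selection, uniform crossover, elitism) are all assumptions, and `toy_coherence` is a hypothetical stand-in for the real fitness, which would train an LDA model for each candidate and return its topic coherence.

```python
import random

# Hypothetical search ranges; the source does not state the bounds it used.
ALPHA_RANGE = (0.01, 1.0)
BETA_RANGE = (0.01, 1.0)
TOPIC_RANGE = (2, 100)


def random_candidate(rng):
    """Draw one (alpha, beta, num_topics) candidate uniformly from the ranges."""
    return (
        rng.uniform(*ALPHA_RANGE),
        rng.uniform(*BETA_RANGE),
        rng.randint(*TOPIC_RANGE),
    )


def clamp(value, bounds):
    """Keep a mutated gene inside its allowed range."""
    low, high = bounds
    return max(low, min(high, value))


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return tuple(rng.choice(pair) for pair in zip(parent_a, parent_b))


def mutate(candidate, rng, rate=0.2):
    """Perturb each gene independently with probability `rate`."""
    alpha, beta, num_topics = candidate
    if rng.random() < rate:
        alpha = clamp(alpha + rng.gauss(0, 0.1), ALPHA_RANGE)
    if rng.random() < rate:
        beta = clamp(beta + rng.gauss(0, 0.1), BETA_RANGE)
    if rng.random() < rate:
        num_topics = int(clamp(num_topics + rng.randint(-5, 5), TOPIC_RANGE))
    return (alpha, beta, num_topics)


def tournament(population, scores, rng, size=3):
    """Pick the fittest of `size` randomly sampled candidates."""
    contenders = rng.sample(range(len(population)), size)
    return population[max(contenders, key=lambda i: scores[i])]


def genetic_search(fitness, pop_size=20, generations=30, seed=0):
    """Maximize `fitness` over (alpha, beta, num_topics) candidates."""
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        # Elitism: carry the best candidate over unchanged.
        elite = population[max(range(pop_size), key=lambda i: scores[i])]
        children = [elite]
        while len(children) < pop_size:
            parent_a = tournament(population, scores, rng)
            parent_b = tournament(population, scores, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = children
    return max(population, key=fitness)


def toy_coherence(candidate):
    """Stand-in fitness, NOT a real coherence score.

    It peaks at an arbitrary point (alpha=0.1, beta=0.05, 40 topics) so the
    search has something to find without training any model.
    """
    alpha, beta, num_topics = candidate
    return -((alpha - 0.1) ** 2 + (beta - 0.05) ** 2
             + ((num_topics - 40) / 100) ** 2)


best = genetic_search(toy_coherence)
print("best (alpha, beta, num_topics):", best)
```

In a real run, `toy_coherence` would be replaced by a function that fits LDA with the candidate's values and scores the resulting topics. One possible choice, not necessarily the one used in this study, is gensim, whose `LdaModel` accepts `alpha`, `eta` (the parameter this abstract calls beta), and `num_topics`, and whose `CoherenceModel` computes topic coherence. Because each fitness evaluation then means training a model, the population size and number of generations would need to be chosen with that cost in mind.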


Keywords: Genetic Algorithm, Latent Dirichlet Allocation, Governmental Transparency, Open Data, Data Processing.
