The rapid proliferation of the World Wide Web has led to an enormous increase in the availability of textual corpora. In this paper, the problem of topic detection and tracking is considered, with application to news items. The proposed approach explores two algorithms, Non-Negative Matrix Factorization (NMF) and a dynamic version of Latent Dirichlet Allocation (DLDA), over discrete time steps. This makes it possible to identify topics within storylines as they appear and to track them through time. Moreover, emphasis is placed on visualizing and interacting with the results through a graphical tool that works regardless of the approach used. Experimental analysis on the Reuters RCV1 corpus and the Reuters 2015 archive shows that the explored approaches can be used effectively to identify topic appearances and their evolution, while also allowing for efficient visualization.
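The abstract does not detail how topics are linked across time steps. The following is only an illustrative sketch of the NMF branch, not the authors' implementation: each time step's document-term matrix is factorized separately, and each new topic is linked to its most similar predecessor. The Lee and Seung multiplicative-update solver, the cosine-similarity linking rule, the 0.5 threshold, and the function names `nmf` and `match_topics` are all assumptions made for illustration. DLDA is not covered here.

```python
import numpy as np


def nmf(V, k, iters=200, seed=0, eps=1e-10):
    """Factorize a non-negative docs x terms matrix V as W @ H.

    Uses Lee and Seung multiplicative updates for the Frobenius loss
    (an assumed solver; the paper may use a different one).
    Rows of H are topic-term weight vectors.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def match_topics(H_prev, H_curr, threshold=0.5):
    """Link each topic at step t to a topic at step t-1.

    Hypothetical rule: pick the predecessor with the highest cosine
    similarity of topic-term vectors; if that similarity is below
    `threshold`, return None to mark a newly emerging topic.
    """
    A = H_prev / (np.linalg.norm(H_prev, axis=1, keepdims=True) + 1e-12)
    B = H_curr / (np.linalg.norm(H_curr, axis=1, keepdims=True) + 1e-12)
    sim = B @ A.T  # shape (k_curr, k_prev)
    links = []
    for i in range(sim.shape[0]):
        j = int(np.argmax(sim[i]))
        links.append(j if sim[i, j] >= threshold else None)
    return links


if __name__ == "__main__":
    # Toy vocabulary: terms 0-2 ~ "election", 3-5 ~ "football", 6-8 ~ "flood".
    # Step 1 covers election and football; step 2 covers election and flood.
    V1 = np.array([[3, 2, 2, 0, 0, 0, 0, 0, 0],
                   [2, 3, 1, 0, 0, 0, 0, 0, 0],
                   [0, 0, 0, 4, 2, 1, 0, 0, 0],
                   [0, 0, 0, 2, 3, 3, 0, 0, 0]], dtype=float)
    V2 = np.array([[2, 3, 2, 0, 0, 0, 0, 0, 0],
                   [3, 2, 2, 0, 0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 0, 3, 2, 4],
                   [0, 0, 0, 0, 0, 0, 2, 4, 3]], dtype=float)
    _, H1 = nmf(V1, k=2)
    _, H2 = nmf(V2, k=2)
    # For each step-2 topic: index of its step-1 predecessor, or None if new.
    print(match_topics(H1, H2))
```

Factorizing each time step independently keeps the sketch simple. The price is that topic identity must be recovered afterwards by the similarity-based linking step, which is where a threshold choice enters.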
| Field | Value |
|---|---|
| Title of host publication | Future and Emerging Trends in Language Technology: Machine Learning and Big Data: Second International Workshop, FETLT 2016, Seville, Spain, November 30–December 2, 2016, Revised Selected Papers |
| Editors | José F Quesada, Francisco-Jesús Martín Mateos, Teresa López Soto |
| Place of publication | Cham |
| Number of pages | 15 |
| Publication status | Published, Nov 2016 |
Brüggermann, D., Hermey, Y., Orth, C., Schneider, D., Selzer, S., & Spanakis, G. (2016). Towards a Topic Discovery and Tracking System with Application to News Items. In J. F. Quesada, F.-J. Martín Mateos, & T. López Soto (Eds.), Future and Emerging Trends in Language Technology: Machine Learning and Big Data: Second International Workshop, FETLT 2016, Seville, Spain, November 30–December 2, 2016, Revised Selected Papers (pp. 183–197). Springer. https://doi.org/10.1007/978-3-319-69365-1_15