Towards a Topic Discovery and Tracking System with Application to News Items

Daniel Brüggermann, Yannik Hermey, Carsten Orth, Darius Schneider, Stefan Selzer, Gerasimos Spanakis

Research output: Chapter in Book/Report/Conference proceeding > Conference article in proceeding > Academic > peer-review

Abstract

The rapid proliferation of the World Wide Web has led to an enormous increase in the availability of textual corpora. This paper considers the problem of topic detection and tracking, with application to news items. The proposed approach explores two algorithms, Non-Negative Matrix Factorization (NMF) and a dynamic version of Latent Dirichlet Allocation (DLDA), over discrete time steps, making it possible to identify topics within storylines as they appear and to track them through time. Emphasis is also given to visualization of and interaction with the results through a graphical tool that works regardless of the approach used. Experimental analysis on the Reuters RCV1 corpus and the Reuters 2015 archive reveals that the explored approaches can be used effectively as tools for identifying topic appearances and their evolution, while at the same time allowing for an efficient visualization.
Original language: English
Title of host publication: Future and Emerging Trends in Language Technology: Machine Learning and Big Data: Second International Workshop, FETLT 2016, Seville, Spain, November 30 - December 2, 2016, Revised Selected Papers
Editors: José F Quesada, Francisco-Jesús Martín Mateos, Teresa López Soto
Place of Publication: Cham
Publisher: Springer
Pages: 183-197
Number of pages: 15
ISBN (Print): 978-3-319-69365-1
DOIs: https://doi.org/10.1007/978-3-319-69365-1_15
Publication status: Published - Nov 2016

Cite this

Brüggermann, D., Hermey, Y., Orth, C., Schneider, D., Selzer, S., & Spanakis, G. (2016). Towards a Topic Discovery and Tracking System with Application to News Items. In J. F. Quesada, F-J. Martín Mateos, & T. López Soto (Eds.), Future and Emerging Trends in Language Technology: Machine Learning and Big Data: Second International Workshop, FETLT 2016, Seville, Spain, November 30 - December 2, 2016, Revised Selected Papers (pp. 183-197). Springer. https://doi.org/10.1007/978-3-319-69365-1_15