Towards a Topic Discovery and Tracking System with Application to News Items

Daniel Brüggermann, Yannik Hermey, Carsten Orth, Darius Schneider, Stefan Selzer, Gerasimos Spanakis

Research output: Chapter in Book/Report/Conference proceeding > Conference article in proceeding > Academic > peer-review

Abstract

The rapid proliferation of the World Wide Web has led to an enormous increase in the availability of textual corpora. This paper considers the problem of topic detection and tracking, with application to news items. The proposed approach explores two algorithms, Non-Negative Matrix Factorization (NMF) and a dynamic version of Latent Dirichlet Allocation (DLDA), over discrete time steps, making it possible to identify topics within storylines as they appear and to track them through time. Emphasis is also given to visualization of, and interaction with, the results through a graphical tool that works regardless of the approach used. Experimental analysis on the Reuters RCV1 corpus and the Reuters 2015 archive shows that the explored approaches can be used effectively to identify topic appearances and their evolution, while also allowing for efficient visualization.
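
As a rough illustration of what applying Non-Negative Matrix Factorization over discrete time steps can look like, the following minimal sketch factors a term-count matrix separately for each time window and lists the highest-weighted terms per topic. It is not the authors' implementation: the whitespace tokenization, the multiplicative-update solver, the helper names (`matmul`, `transpose`, `nmf`, `count_matrix`, `topics_per_window`) and the toy headlines are all assumptions made for illustration, and the linking of topics across windows is not shown.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, k, iters=200, seed=0):
    """Approximate non-negative V (docs x terms) as W @ H with k topics,
    using multiplicative updates for the squared (Frobenius) error."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    eps = 1e-9  # guards against division by zero
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(Wt, matmul(W, H))
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(matmul(W, H), Ht)
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H


def count_matrix(docs):
    """Build a docs x vocabulary matrix of raw term counts (assumed tokenizer:
    lowercase and split on whitespace)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    V = [[0.0] * len(vocab) for _ in docs]
    for i, d in enumerate(docs):
        for w in d.lower().split():
            V[i][index[w]] += 1.0
    return V, vocab


def topics_per_window(windows, k=2, top_n=3):
    """Run NMF independently in each time window; return top terms per topic."""
    result = {}
    for label, docs in windows.items():
        V, vocab = count_matrix(docs)
        _, H = nmf(V, k)
        result[label] = [
            [vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -H[a][j])[:top_n]]
            for a in range(k)
        ]
    return result


# Invented toy headlines, grouped by a hypothetical weekly time step.
windows = {
    "week 1": ["oil price rises", "oil price falls",
               "election vote count", "election vote result"],
    "week 2": ["oil supply cut", "oil supply deal",
               "election result dispute", "election result recount"],
}
topics = topics_per_window(windows)
for label, topic_terms in topics.items():
    print(label, topic_terms)
```

Running NMF independently per window keeps each step simple, but it leaves open how a topic in one window is matched to its continuation in the next; that matching is part of the tracking problem the paper addresses.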
Original language: English
Title of host publication: Future and Emerging Trends in Language Technology: Machine Learning and Big Data, FETLT 2016
Editors: José F Quesada, Francisco-Jesús Martín Mateos, Teresa López Soto
Place of Publication: Cham
Publisher: Springer, Cham
Pages: 183-197
Number of pages: 15
ISBN (Electronic): 978-3-319-69365-1
ISBN (Print): 978-3-319-69364-4
Publication status: Published - Nov 2016

Publication series

Series: Lecture Notes in Computer Science
Volume: 10341
ISSN: 0302-9743
