Social media as a data source for official statistics; the Dutch Consumer Confidence Index

Jan van den Brakel*, E. Söhler, P. Daas, B. Buelens

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

This paper addresses the question of how alternative data sources, such as administrative and social media data, can be used in the production of official statistics. Since most surveys at national statistical institutes are conducted repeatedly over time, a multivariate structural time series modelling approach is proposed to model the series observed by a repeated survey together with related series obtained from such alternative data sources. Generally, this improves the precision of the direct survey estimates by using sample information observed in preceding periods and information from related auxiliary series. The model also makes it possible to exploit the higher frequency of social media data to produce more precise estimates for the sample survey in real time, at the moment that statistics for the social media become available but the sample data are not yet available. The concept of cointegration is applied to address the question of to what extent the alternative series represent the same phenomena as the series observed with the repeated survey. The methodology is applied to the Dutch Consumer Confidence Survey and a sentiment index derived from social media.
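The nowcasting idea in the abstract can be illustrated with a minimal sketch. It is not the model from the paper: it assumes a much simpler bivariate local level model (two random-walk levels with correlated disturbances), simulated data, and made-up variances and correlation (`rho`, `Q`, `H` are illustrative assumptions). The Kalman filter below treats missing values as unobserved, so the survey figure for the last period can be nowcast from the auxiliary series alone. Setting `rho` to 1 would make the two levels share a single common disturbance, which is the cointegrated case in this simplified setting.

```python
import numpy as np

def kalman_filter(y, Q, H, a0=None, P0=None):
    """Filter a multivariate local level model; NaN entries in y are missing."""
    n, k = y.shape
    a = np.zeros(k) if a0 is None else a0
    P = np.eye(k) * 1e6 if P0 is None else P0  # vague prior on the levels
    filt_a = np.empty((n, k))
    filt_P = np.empty((n, k, k))
    for t in range(n):
        if t > 0:
            P = P + Q  # prediction step: random-walk levels
        obs = ~np.isnan(y[t])
        if obs.any():
            Z = np.eye(k)[obs]                      # select observed series
            v = y[t, obs] - Z @ a                   # one-step prediction error
            F = Z @ P @ Z.T + H[np.ix_(obs, obs)]   # prediction error variance
            K = P @ Z.T @ np.linalg.inv(F)          # Kalman gain
            a = a + K @ v
            P = P - K @ Z @ P
        filt_a[t], filt_P[t] = a, P
    return filt_a, filt_P

# Assumed (illustrative) hyperparameters: series 0 plays the role of the
# survey estimate, series 1 the role of the social media index.
rho = 0.9
Q = np.array([[1.0, rho], [rho, 1.0]])  # level disturbance covariance
H = np.diag([4.0, 1.0])                 # measurement error variances

# Simulate two related series.
rng = np.random.default_rng(0)
n = 60
levels = np.cumsum(rng.multivariate_normal(np.zeros(2), Q, size=n), axis=0)
y = levels + rng.multivariate_normal(np.zeros(2), H, size=n)

# In the last period the survey figure is not yet available.
y[-1, 0] = np.nan

# Nowcast with the auxiliary series versus with the survey history only.
a_with, P_with = kalman_filter(y, Q, H)
y_survey_only = y.copy()
y_survey_only[:, 1] = np.nan
a_without, P_without = kalman_filter(y_survey_only, Q, H)

var_with = P_with[-1, 0, 0]        # nowcast variance using both series
var_without = P_without[-1, 0, 0]  # nowcast variance using the survey only
print(var_with, var_without)
```

Under these assumptions the nowcast variance of the survey level is smaller when the auxiliary series is included, because the correlated disturbances let the timely observation carry information about the survey level that has not yet been measured.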
Original language: English
Pages (from-to): 183-210
Number of pages: 28
Journal: Survey Methodology
Volume: 43
Issue number: 2
Publication status: Published - 1 Dec 2017

Keywords

  • Big data
  • Design-based inference
  • Model-based inference
  • Nowcasting
  • Structural time series modelling
  • Cointegration
  • Rotating panel design
  • Small-area estimation
  • Time-series methods
  • Model
  • Unemployment
  • Selection
  • Errors
