Abstract
This paper addresses how alternative data sources, such as administrative and social media data, can be used in the production of official statistics. Since most surveys at national statistical institutes are conducted repeatedly over time, a multivariate structural time series modelling approach is proposed to model the series observed by a repeated survey together with related series obtained from such alternative data sources. Generally, this improves the precision of the direct survey estimates by using sample information observed in preceding periods and information from related auxiliary series. The model also makes it possible to exploit the higher frequency of the social media data to produce more precise estimates for the sample survey in real time, at the moment that statistics for the social media have become available but the sample data have not. The concept of cointegration is applied to assess to what extent the alternative series represent the same phenomena as the series observed with the repeated survey. The methodology is applied to the Dutch Consumer Confidence Survey and a sentiment index derived from social media.
Original language | English |
---|---|
Pages (from-to) | 183-210 |
Number of pages | 28 |
Journal | Survey Methodology |
Volume | 43 |
Issue number | 2 |
Publication status | Published - 1 Dec 2017 |
Keywords
- Big data
- Design-based inference
- Model-based inference
- Nowcasting
- Structural time series modelling
- Cointegration
- ROTATING PANEL DESIGN
- SMALL-AREA ESTIMATION
- TIME-SERIES METHODS
- MODEL
- UNEMPLOYMENT
- SELECTION
- ERRORS