Estimating random-intercept models on data streams

L Ippel*, Maurits Kaptein*, Jeroen K Vermunt*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Multilevel models are often used for the analysis of grouped data. Grouped data occur, for instance, when estimating the performance of pupils nested within schools or analyzing multiple observations nested within individuals. Currently, multilevel models are mostly fit to static datasets. However, recent technological advances in the measurement of social phenomena have led to data arriving in a continuous fashion (i.e., data streams). In these situations the data collection is never "finished". Traditional methods of fitting multilevel models are ill-suited for the analysis of data streams because of their computational complexity. A novel algorithm for estimating random-intercept models is introduced. The Streaming EM Approximation (SEMA) algorithm is a fully online (row-by-row) method enabling computationally efficient estimation of random-intercept models. SEMA is tested in two simulation studies, and applied to longitudinal data regarding individuals' happiness collected continuously using smartphones. SEMA shows statistical performance competitive with existing static approaches, with large computational benefits. The introduction of this method allows researchers to broaden the scope of their research by using data streams. © 2016 Elsevier B.V. All rights reserved.
Original language: English
Pages (from-to): 169-182
Number of pages: 14
Journal: Computational Statistics & Data Analysis
Volume: 104
Publication status: Published - 2016
Externally published: Yes
