Online estimation of individual-level effects using streaming shrinkage factors

L. Ippel*, M. C. Kaptein, J. K. Vermunt

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

It has become increasingly easy to collect data from individuals over long periods of time. Examples include smartphone applications used to track movements with GPS, web-log data tracking individuals' browsing behavior, and longitudinal (cohort) studies where many individuals are monitored over an extensive period of time. All these datasets cover a large number of individuals and collect data on the same individuals repeatedly, causing a nested structure in the data. Moreover, the data collection is never 'finished' as new data keep streaming in. It is well known that predictions combining the data of the individual whose individual-level effect is being predicted with the data of all the other individuals are better in terms of squared error than predictions that use only the individual mean. However, when data are both nested and streaming, and the outcome variable is binary, computing these individual-level predictions can be computationally challenging. Five computationally efficient estimation methods are developed and evaluated; they do not revisit "old" data but do account for the nested data structure. The methods are based on existing shrinkage factors. A shrinkage factor is used to predict an individual-level effect (i.e., the probability of scoring a 1) by weighting the individual mean and the mean over all data points. The performance of the existing and newly developed shrinkage factors is compared in a simulation study. While the existing methods differ in their prediction accuracy, the differences in accuracy between the novel shrinkage factors and the existing methods are extremely small. The novel methods are, however, computationally much more appealing. © 2019 Elsevier B.V. All rights reserved.
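
As an illustration of the weighting idea described in the abstract, the following is a minimal sketch of a streaming shrinkage prediction for binary outcomes. It is not one of the paper's five methods: the shrinkage factor n_j / (n_j + prior_weight), the class name StreamingShrinkage, and the constant prior_weight are all assumptions made here for illustration. The sketch only shows how a prediction can be updated from running counts, so that no earlier data points need to be stored or revisited.

```python
class StreamingShrinkage:
    """Online shrinkage predictions for binary outcomes nested in individuals.

    Hypothetical illustration: the shrinkage factor used here,
    n_j / (n_j + prior_weight), is an assumed simple choice and is not
    taken from the paper.
    """

    def __init__(self, prior_weight=5.0):
        self.prior_weight = prior_weight  # assumed tuning constant
        self.n_total = 0                  # number of data points seen so far
        self.sum_total = 0                # number of 1s seen so far
        self.n = {}                       # data points per individual
        self.s = {}                       # number of 1s per individual

    def update(self, individual, y):
        """Absorb one new observation y (0 or 1) and discard it afterwards."""
        self.n_total += 1
        self.sum_total += y
        self.n[individual] = self.n.get(individual, 0) + 1
        self.s[individual] = self.s.get(individual, 0) + y

    def predict(self, individual):
        """Predict the individual's probability of scoring a 1."""
        if self.n_total == 0:
            return 0.5  # no data at all: arbitrary neutral default
        grand_mean = self.sum_total / self.n_total
        n_j = self.n.get(individual, 0)
        if n_j == 0:
            return grand_mean  # unseen individual: fall back to the overall mean
        individual_mean = self.s[individual] / n_j
        # More data on the individual means more weight on their own mean.
        b = n_j / (n_j + self.prior_weight)
        return b * individual_mean + (1 - b) * grand_mean


if __name__ == "__main__":
    model = StreamingShrinkage(prior_weight=2.0)
    stream = [("a", 1), ("a", 1), ("b", 0), ("b", 0), ("b", 0), ("b", 1)]
    for person, outcome in stream:
        model.update(person, outcome)
    print(model.predict("a"), model.predict("b"), model.predict("c"))
```

Each update touches only four running totals, so the cost per data point is constant regardless of how long the stream has been running. In this sketch an individual with few observations is pulled strongly toward the overall mean, and that pull weakens as more of their data arrive.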

Original language: English
Pages (from-to): 16-32
Number of pages: 17
Journal: Computational Statistics & Data Analysis
Volume: 137
Publication status: Published - Sept 2019

Keywords

  • Data streams
  • Shrinkage factors
  • James-Stein estimator
  • Online learning
  • Nested data
  • Concept drift
  • Multilevel
  • Inference
