Optimal two-stage sampling for mean estimation in multilevel populations when cluster size is informative

Francesco Innocenti*, Math Candel, Frans Tan, Gerard J.P. van Breukelen

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

To estimate the mean of a quantitative variable in a hierarchical population, it is logistically convenient to sample in two stages (two-stage sampling), i.e. selecting first clusters, and then individuals from the sampled clusters. Allowing cluster size to vary in the population and to be related to the mean of the outcome variable of interest (informative cluster size), the following competing sampling designs are considered: sampling clusters with probability proportional to cluster size, and then the same number of individuals per cluster; drawing clusters with equal probability, and then the same percentage of individuals per cluster; and selecting clusters with equal probability, and then the same number of individuals per cluster. For each design, optimal sample sizes are derived under a budget constraint. The three optimal two-stage sampling designs are compared, in terms of efficiency, with each other and with simple random sampling of individuals. Sampling clusters with probability proportional to size is recommended. To overcome the dependency of the optimal design on unknown nuisance parameters, maximin designs are derived. The results are illustrated, assuming probability proportional to size sampling of clusters, with the planning of a hypothetical survey to compare adolescent alcohol consumption between France and Italy.
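
The three competing designs in the abstract can be sketched as a small sampling-plan generator. This is only an illustrative sketch and not the authors' implementation: the function name `two_stage_plan`, its parameters, and the choice to draw clusters with replacement under probability-proportional-to-size sampling are all assumptions made here. It does not compute the optimal sample sizes or maximin designs, which the article derives under a budget constraint.

```python
import random


def two_stage_plan(cluster_sizes, k, design, n=None, fraction=None, seed=0):
    """Return a list of (cluster_index, individuals_to_sample) pairs.

    Hypothetical helper; design is one of:
      "pps_fixed_n"   clusters drawn with probability proportional to size
                      (with replacement, an assumption), then n per cluster
      "srs_fixed_pct" clusters drawn with equal probability, then the same
                      fraction of individuals per cluster
      "srs_fixed_n"   clusters drawn with equal probability, then n per cluster
    """
    rng = random.Random(seed)
    indices = range(len(cluster_sizes))

    if design == "pps_fixed_n":
        # Stage 1: larger clusters are more likely to be selected.
        chosen = rng.choices(indices, weights=cluster_sizes, k=k)
        # Stage 2: same number of individuals in every sampled cluster.
        return [(j, min(n, cluster_sizes[j])) for j in chosen]

    # Stage 1 for both remaining designs: equal-probability draw of
    # k distinct clusters.
    chosen = rng.sample(indices, k)

    if design == "srs_fixed_pct":
        # Stage 2: same percentage of individuals per cluster (at least one).
        return [(j, max(1, round(fraction * cluster_sizes[j]))) for j in chosen]

    if design == "srs_fixed_n":
        # Stage 2: same number of individuals per cluster.
        return [(j, min(n, cluster_sizes[j])) for j in chosen]

    raise ValueError(f"unknown design: {design!r}")


sizes = [20, 50, 100, 200]  # made-up cluster sizes for illustration
pps_plan = two_stage_plan(sizes, k=3, design="pps_fixed_n", n=5)
pct_plan = two_stage_plan(sizes, k=3, design="srs_fixed_pct", fraction=0.1)
fixed_plan = two_stage_plan(sizes, k=3, design="srs_fixed_n", n=5)
```

Each plan lists which clusters to visit and how many individuals to sample in each. The per-cluster count is constant under the two fixed-number designs and scales with cluster size under the fixed-percentage design.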

Original language: English
Pages (from-to): 357-375
Number of pages: 19
Journal: Statistical Methods in Medical Research
Volume: 30
Issue number: 2
Publication status: Published - Feb 2021

Keywords

  • Cross-national comparisons, informative cluster size, maximin design, optimal design, sample size calculation, two-stage sampling
  • D-OPTIMAL DESIGNS
  • RANDOMIZED-TRIALS
  • SCHOOL CONNECTEDNESS
  • MAXIMIN
  • PATTERNS
  • ROBUST
  • INTRACLASS CORRELATION VALUES
