Optimal two-stage sampling for mean estimation in multilevel populations when cluster size is informative

Francesco Innocenti; Math Candel; Frans Tan; Gerard J.P. van Breukelen

doi:10.1177/0962280220952833

Francesco Innocenti^*, Math Candel, Frans Tan, Gerard J.P. van Breukelen

^*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

To estimate the mean of a quantitative variable in a hierarchical population, it is logistically convenient to sample in two stages (two-stage sampling), i.e. selecting first clusters, and then individuals from the sampled clusters. Allowing cluster size to vary in the population and to be related to the mean of the outcome variable of interest (informative cluster size), the following competing sampling designs are considered: sampling clusters with probability proportional to cluster size, and then the same number of individuals per cluster; drawing clusters with equal probability, and then the same percentage of individuals per cluster; and selecting clusters with equal probability, and then the same number of individuals per cluster. For each design, optimal sample sizes are derived under a budget constraint. The three optimal two-stage sampling designs are compared, in terms of efficiency, with each other and with simple random sampling of individuals. Sampling clusters with probability proportional to size is recommended. To overcome the dependency of the optimal design on unknown nuisance parameters, maximin designs are derived. The results are illustrated, assuming probability proportional to size sampling of clusters, with the planning of a hypothetical survey to compare adolescent alcohol consumption between France and Italy.
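The effect of informative cluster size on mean estimation can be illustrated with a small simulation. The sketch below is not the authors' derivation: the population model (cluster means rising linearly with cluster size), the size range, and all function names are assumptions made for illustration only. It contrasts sampling clusters with probability proportional to size (with replacement) followed by the same number of individuals per cluster, against equal-probability cluster sampling with a deliberately unweighted estimator.

```python
import random


def make_population(num_clusters=200, seed=0):
    """Hypothetical population: cluster means increase with cluster size."""
    rng = random.Random(seed)
    population = []
    for _ in range(num_clusters):
        size = rng.randint(20, 200)
        # Assumed form of informativeness: the cluster mean depends on size.
        cluster_mean = 10.0 + 0.05 * size + rng.gauss(0.0, 1.0)
        population.append(
            [cluster_mean + rng.gauss(0.0, 2.0) for _ in range(size)]
        )
    return population


def population_mean(population):
    """Mean over all individuals in all clusters."""
    total = sum(sum(cluster) for cluster in population)
    count = sum(len(cluster) for cluster in population)
    return total / count


def pps_equal_n(population, k, n, rng):
    """Draw k clusters with probability proportional to size (with
    replacement), then n individuals per cluster; return the unweighted
    mean of the k cluster sample means."""
    sizes = [len(cluster) for cluster in population]
    chosen = rng.choices(population, weights=sizes, k=k)
    return sum(sum(rng.sample(cluster, n)) / n for cluster in chosen) / k


def equal_prob_equal_n_unweighted(population, k, n, rng):
    """Draw k clusters with equal probability, then n individuals per
    cluster; return the unweighted mean of cluster sample means. The
    missing size weights are intentional, to expose the bias."""
    chosen = rng.sample(population, k)
    return sum(sum(rng.sample(cluster, n)) / n for cluster in chosen) / k


def average_estimate(estimator, population, k, n, reps, seed):
    """Average an estimator over repeated samples to approximate its
    expectation."""
    rng = random.Random(seed)
    return sum(estimator(population, k, n, rng) for _ in range(reps)) / reps


if __name__ == "__main__":
    pop = make_population()
    print("population mean:", population_mean(pop))
    print("PPS, equal n:", average_estimate(pps_equal_n, pop, 20, 10, 2000, 1))
    print("equal prob., unweighted:",
          average_estimate(equal_prob_equal_n_unweighted, pop, 20, 10, 2000, 1))
```

Under these assumptions the probability-proportional-to-size design is self-weighting, so its unweighted estimator centers on the population mean, whereas the equal-probability design needs size weights to avoid underrepresenting individuals from large clusters. The sketch does not address the budget-constrained optimal sample sizes or the maximin designs described in the abstract.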

Original language	English
Pages (from-to)	357-375
Number of pages	19
Journal	Statistical Methods in Medical Research
Volume	30
Issue number	2
DOIs	https://doi.org/10.1177/0962280220952833
Publication status	Published - Feb 2021

Keywords

Cross-national comparisons, informative cluster size, maximin design, optimal design, sample size calculation, two-stage sampling
D-OPTIMAL DESIGNS
RANDOMIZED-TRIALS
SCHOOL CONNECTEDNESS
MAXIMIN
PATTERNS
ROBUST
INTRACLASS CORRELATION VALUES

Access to Document

10.1177/0962280220952833
Licence: CC BY-NC

Cite this

@article{400f2965386648c297f4312dda464e51,

title = "Optimal two-stage sampling for mean estimation in multilevel populations when cluster size is informative",

abstract = "To estimate the mean of a quantitative variable in a hierarchical population, it is logistically convenient to sample in two stages (two-stage sampling), i.e. selecting first clusters, and then individuals from the sampled clusters. Allowing cluster size to vary in the population and to be related to the mean of the outcome variable of interest (informative cluster size), the following competing sampling designs are considered: sampling clusters with probability proportional to cluster size, and then the same number of individuals per cluster; drawing clusters with equal probability, and then the same percentage of individuals per cluster; and selecting clusters with equal probability, and then the same number of individuals per cluster. For each design, optimal sample sizes are derived under a budget constraint. The three optimal two-stage sampling designs are compared, in terms of efficiency, with each other and with simple random sampling of individuals. Sampling clusters with probability proportional to size is recommended. To overcome the dependency of the optimal design on unknown nuisance parameters, maximin designs are derived. The results are illustrated, assuming probability proportional to size sampling of clusters, with the planning of a hypothetical survey to compare adolescent alcohol consumption between France and Italy.",

keywords = "Cross-national comparisons, informative cluster size, maximin design, optimal design, sample size calculation, two-stage sampling, D-OPTIMAL DESIGNS, RANDOMIZED-TRIALS, SCHOOL CONNECTEDNESS, MAXIMIN, PATTERNS, ROBUST, INTRACLASS CORRELATION VALUES",

author = "Francesco Innocenti and Math Candel and Frans Tan and {van Breukelen}, {Gerard J.P.}",

year = "2021",

month = feb,

doi = "10.1177/0962280220952833",

language = "English",

volume = "30",

pages = "357--375",

journal = "Statistical Methods in Medical Research",

issn = "0962-2802",

publisher = "SAGE Publications Ltd",

number = "2",

}

TY - JOUR

T1 - Optimal two-stage sampling for mean estimation in multilevel populations when cluster size is informative

AU - Innocenti, Francesco

AU - Candel, Math

AU - Tan, Frans

AU - van Breukelen, Gerard J.P.

PY - 2021/2

Y1 - 2021/2

AB - To estimate the mean of a quantitative variable in a hierarchical population, it is logistically convenient to sample in two stages (two-stage sampling), i.e. selecting first clusters, and then individuals from the sampled clusters. Allowing cluster size to vary in the population and to be related to the mean of the outcome variable of interest (informative cluster size), the following competing sampling designs are considered: sampling clusters with probability proportional to cluster size, and then the same number of individuals per cluster; drawing clusters with equal probability, and then the same percentage of individuals per cluster; and selecting clusters with equal probability, and then the same number of individuals per cluster. For each design, optimal sample sizes are derived under a budget constraint. The three optimal two-stage sampling designs are compared, in terms of efficiency, with each other and with simple random sampling of individuals. Sampling clusters with probability proportional to size is recommended. To overcome the dependency of the optimal design on unknown nuisance parameters, maximin designs are derived. The results are illustrated, assuming probability proportional to size sampling of clusters, with the planning of a hypothetical survey to compare adolescent alcohol consumption between France and Italy.

KW - Cross-national comparisons

KW - informative cluster size

KW - maximin design

KW - optimal design

KW - sample size calculation

KW - two-stage sampling

KW - D-OPTIMAL DESIGNS

KW - RANDOMIZED-TRIALS

KW - SCHOOL CONNECTEDNESS

KW - MAXIMIN

KW - PATTERNS

KW - ROBUST

KW - INTRACLASS CORRELATION VALUES

U2 - 10.1177/0962280220952833

DO - 10.1177/0962280220952833

M3 - Article

C2 - 32940135

SN - 0962-2802

VL - 30

SP - 357

EP - 375

JO - Statistical Methods in Medical Research

JF - Statistical Methods in Medical Research

IS - 2

ER -