Abstract
To estimate the mean of a quantitative variable in a hierarchical population, it is logistically convenient to sample in two stages (two-stage sampling), i.e. selecting first clusters, and then individuals from the sampled clusters. Allowing cluster size to vary in the population and to be related to the mean of the outcome variable of interest (informative cluster size), the following competing sampling designs are considered: sampling clusters with probability proportional to cluster size, and then the same number of individuals per cluster; drawing clusters with equal probability, and then the same percentage of individuals per cluster; and selecting clusters with equal probability, and then the same number of individuals per cluster. For each design, optimal sample sizes are derived under a budget constraint. The three optimal two-stage sampling designs are compared, in terms of efficiency, with each other and with simple random sampling of individuals. Sampling clusters with probability proportional to size is recommended. To overcome the dependency of the optimal design on unknown nuisance parameters, maximin designs are derived. The results are illustrated, assuming probability proportional to size sampling of clusters, with the planning of a hypothetical survey to compare adolescent alcohol consumption between France and Italy.
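The three competing designs can be made concrete with a small simulation. The following is a minimal sketch, not the authors' implementation: the population, the sample sizes, and the estimator paired with each design (plain mean of cluster means under probability proportional to size sampling, pooled mean under a fixed sampling fraction, size-weighted mean of cluster means under a fixed number per cluster) are all illustrative assumptions, and the target is assumed to be the mean over all individuals in the population.

```python
import random

rng = random.Random(2021)  # fixed seed so the sketch is reproducible

# Hypothetical population: 200 clusters whose size is related to the cluster
# mean (informative cluster size). All numbers are illustrative assumptions.
population = []
for _ in range(200):
    size = rng.randint(20, 200)
    cluster_mean = 10 + 0.02 * size + rng.gauss(0, 1)
    population.append([cluster_mean + rng.gauss(0, 2) for _ in range(size)])


def mean(values):
    return sum(values) / len(values)


# Assumed target: the mean over all individuals in the population.
true_mean = mean([y for cluster in population for y in cluster])


def pps_fixed_n(clusters, k, n):
    """Clusters drawn with probability proportional to size (with
    replacement, for simplicity), then n individuals per cluster."""
    sizes = [len(c) for c in clusters]
    drawn = rng.choices(clusters, weights=sizes, k=k)
    # Assumed estimator: unweighted mean of the cluster sample means.
    return mean([mean(rng.sample(c, n)) for c in drawn])


def equal_prob_fixed_fraction(clusters, k, fraction):
    """Clusters drawn with equal probability, then the same percentage of
    individuals per cluster."""
    drawn = rng.sample(clusters, k)
    sampled = []
    for c in drawn:
        m = max(1, round(fraction * len(c)))
        sampled.extend(rng.sample(c, m))
    # Assumed estimator: pooled mean of all sampled individuals.
    return mean(sampled)


def equal_prob_fixed_n(clusters, k, n):
    """Clusters drawn with equal probability, then n individuals per cluster."""
    drawn = rng.sample(clusters, k)
    total_size = sum(len(c) for c in drawn)
    # Assumed estimator: cluster sample means weighted by cluster size.
    return sum(len(c) * mean(rng.sample(c, n)) for c in drawn) / total_size


est_pps = pps_fixed_n(population, k=30, n=10)
est_frac = equal_prob_fixed_fraction(population, k=30, fraction=0.1)
est_equal = equal_prob_fixed_n(population, k=30, n=10)

print(f"population mean:            {true_mean:.3f}")
print(f"PPS, fixed n:               {est_pps:.3f}")
print(f"equal prob, fixed fraction: {est_frac:.3f}")
print(f"equal prob, fixed n:        {est_equal:.3f}")
```

A single run says little about efficiency. Comparing the designs in the sense of the abstract would require repeating each draw many times and comparing the variances of the estimates at equal cost, which this sketch leaves out.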
Original language | English |
---|---|
Pages (from-to) | 357-375 |
Number of pages | 19 |
Journal | Statistical Methods in Medical Research |
Volume | 30 |
Issue number | 2 |
Publication status | Published - Feb 2021 |
Keywords
- D-OPTIMAL DESIGNS
- maximin design
- Cross-national comparisons
- RANDOMIZED-TRIALS
- SCHOOL CONNECTEDNESS
- MAXIMIN
- PATTERNS
- optimal design
- sample size calculation
- ROBUST
- informative cluster size
- two-stage sampling
- INTRACLASS CORRELATION VALUES