Statistical inference for agreement between multiple raters on a binary scale

Sophie Vanbelle*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Agreement studies often involve more than two raters or repeated measurements. With two raters, the proportion of agreement and the proportion of positive agreement are simple and popular agreement measures for binary scales. These measures have been generalized to agreement studies involving more than two raters, but the accompanying statistical inference procedures were proposed on an empirical basis. We present two alternatives. The first is a Wald confidence interval whose standard errors are obtained by the delta method. The second relies on Bayesian inference and does not require any specific Bayesian software. Both new procedures show better statistical behaviour than the confidence intervals initially proposed. In addition, we provide analytical formulas to determine the minimum number of persons required for a given number of raters when planning an agreement study. All methods are implemented in the R package simpleagree and the Shiny app simpleagree.
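To make the central quantity concrete, the sketch below computes the overall proportion of agreement for m raters on a binary scale, with a simple Wald-type interval. It is a minimal illustration under stated assumptions, not the simpleagree API or the paper's exact derivation: each subject's agreement is taken as the fraction of concordant rater pairs, and the standard error is the empirical one rather than the paper's delta-method expression. The function name and interface are hypothetical.

from math import comb
from statistics import NormalDist

import numpy as np


def agreement_wald_ci(ratings, alpha=0.05):
    """Overall proportion of agreement among m raters on a binary scale,
    with a Wald-type confidence interval.

    ratings: (n subjects) x (m raters) array of 0/1 codes.
    Returns (estimate, lower, upper).

    Illustrative sketch only; not the simpleagree implementation.
    """
    ratings = np.asarray(ratings)
    n, m = ratings.shape
    pairs = comb(m, 2)
    # For each subject, count the rater pairs that agree:
    # pairs of positive ratings plus pairs of negative ratings.
    x = ratings.sum(axis=1)
    a = np.array([(comb(int(k), 2) + comb(m - int(k), 2)) / pairs for k in x])
    p_o = a.mean()
    se = a.std(ddof=1) / np.sqrt(n)          # empirical standard error
    z = NormalDist().inv_cdf(1 - alpha / 2)  # normal quantile for the CI
    return p_o, max(0.0, p_o - z * se), min(1.0, p_o + z * se)


# Example: 5 persons rated by 3 raters.
ratings = [[1, 1, 1], [0, 0, 1], [1, 1, 0], [0, 0, 0], [1, 0, 1]]
print(agreement_wald_ci(ratings))

With perfect agreement on a subject, all comb(m, 2) rater pairs concur and that subject contributes 1; the Wald interval is then built from the across-subject variability of these per-subject agreements.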
Original language: English
Pages (from-to): 245-260
Number of pages: 16
Journal: British Journal of Mathematical & Statistical Psychology
Volume: 77
Issue number: 2
Early online date: 1 Jan 2024
Publication status: E-pub ahead of print, 1 Jan 2024

Keywords

  • confidence interval
  • credibility interval
  • dichotomous
  • raters
  • sample size
  • kappa
  • reliability
  • association
  • model
