Reliability measures in item response theory: Manifest versus latent correlation functions

E. Milanzi, G. Molenberghs*, A. Alonso, G. Verbeke, P. de Boeck

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review


For item response theory (IRT) models, which belong to the class of generalized linear or non-linear mixed models, reliability on the scale of the observed scores (i.e., based on manifest correlation) is usually of greater scientific interest than reliability based on latent correlation, but it is more difficult to calculate, not least because it cannot be calculated explicitly when the logit link is used in conjunction with normal random effects. As a result, approximations such as Fisher's information coefficient, Cronbach's α, or the latent correlation are calculated, allegedly because it is easy to do so. Cronbach's α has well-known and serious drawbacks, Fisher's information is not meaningful under certain circumstances, and there is an important but often overlooked difference between latent and manifest correlations. Here, manifest correlation refers to the correlation between observed scores, while latent correlation refers to the correlation between scores on the latent (e.g., logit or probit) scale. Using one in place of the other can therefore lead to erroneous conclusions. Taylor-series-based reliability measures, which rest on manifest correlation functions, are derived, and a careful comparison is carried out between these measures, reliability measures based on latent correlations, Fisher's information, and exact reliability. The latent correlations are virtually always considerably higher than their manifest counterparts, Fisher's information measure shows no coherent behaviour (it is even negative in some cases), while the newly introduced Taylor-series-based approximations reflect the exact reliability very closely. Comparisons among the various types of correlations, for various IRT models, are made using algebraic expressions, Monte Carlo simulations, and data analysis. Given their light computational burden and their performance, the use of Taylor-series-based reliability measures is recommended.
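To make the latent-versus-manifest distinction concrete, the Python sketch below uses a simple two-item random-intercept (Rasch-type) model, assumed here for illustration: P(Y_j = 1 | b) = F(b + β_j) with b ~ N(0, σ²) and items independent given b, where F is the inverse logit or inverse probit link. The latent correlation uses the standard latent-variable formulation, σ²/(σ² + π²/3) for the logit link and σ²/(σ² + 1) for the probit link. Because the manifest correlation has no explicit form under the logit link, it is obtained here by one-dimensional numerical integration (Simpson's rule). This is not the authors' Taylor series method; all function names and the β_j "easiness" parameterization are hypothetical choices for this sketch.

```python
import math
from typing import Callable

# Hypothetical illustration (not the paper's method): two binary items with
#   P(Y_j = 1 | b) = F(b + beta_j),  b ~ N(0, sigma^2),  items independent given b.


def inv_logit(x: float) -> float:
    """Inverse logit link, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def inv_probit(x: float) -> float:
    """Inverse probit link: the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_expectation(f: Callable[[float], float], sigma: float,
                       n: int = 4000, width: float = 10.0) -> float:
    """E[f(b)] for b ~ N(0, sigma^2), by composite Simpson's rule on [-width*sigma, width*sigma]."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    lo = -width * sigma
    h = 2.0 * width * sigma / n
    total = 0.0
    for k in range(n + 1):
        b = lo + k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        density = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += weight * f(b) * density
    return total * h / 3.0


def manifest_corr(beta1: float, beta2: float, sigma: float,
                  link: Callable[[float], float]) -> float:
    """Correlation between the observed binary scores Y_1 and Y_2 (manifest scale)."""
    def p1(b: float) -> float:
        return link(b + beta1)

    def p2(b: float) -> float:
        return link(b + beta2)

    m1 = normal_expectation(p1, sigma)                         # P(Y_1 = 1)
    m2 = normal_expectation(p2, sigma)                         # P(Y_2 = 1)
    m12 = normal_expectation(lambda b: p1(b) * p2(b), sigma)   # P(Y_1 = 1, Y_2 = 1)
    cov = m12 - m1 * m2
    return cov / math.sqrt(m1 * (1.0 - m1) * m2 * (1.0 - m2))


def latent_corr(sigma: float, link_name: str) -> float:
    """Correlation on the latent scale Y*_j = b + beta_j + e_j.

    The residual e_j has variance pi^2/3 (standard logistic) under the logit
    link and 1 (standard normal) under the probit link.
    """
    resid_var = math.pi ** 2 / 3.0 if link_name == "logit" else 1.0
    return sigma ** 2 / (sigma ** 2 + resid_var)


def probit_manifest_closed_form(sigma: float) -> float:
    """Exact manifest correlation under the probit link when beta1 = beta2 = 0.

    P(Y_1 = 1, Y_2 = 1) is a bivariate normal orthant probability,
    1/4 + asin(rho) / (2*pi), with rho = sigma^2 / (1 + sigma^2); each
    marginal variance is 1/4.
    """
    rho = sigma ** 2 / (1.0 + sigma ** 2)
    return (2.0 / math.pi) * math.asin(rho)


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 2.0):
        print(f"sigma={sigma}: "
              f"logit latent={latent_corr(sigma, 'logit'):.3f}, "
              f"logit manifest={manifest_corr(0.0, 0.0, sigma, inv_logit):.3f}, "
              f"probit latent={latent_corr(sigma, 'probit'):.3f}, "
              f"probit manifest={manifest_corr(0.0, 0.0, sigma, inv_probit):.3f}")
```

Under the probit link with σ = 1 and β_1 = β_2 = 0, the quadrature should agree with the closed-form value 2·arcsin(1/2)/π = 1/3, whereas the latent correlation is 1/2. Under the logit link only the numerical route is available in this sketch, and it shows the same kind of gap between latent and manifest correlations.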

Original language: English
Pages (from-to): 43-64
Number of pages: 22
Journal: British Journal of Mathematical & Statistical Psychology
Issue number: 1
Early online date: 3 Feb 2014
Publication status: Published - Feb 2015


  • one-parameter logistic model
  • two-parameter logistic model
  • logit link
  • probit link
  • Rasch model
