Ranking Accuracy for Logistic-GEE Models

Nasser Davarzani*, Ralf Peeters, Evgueni Smirnov, Joël Karel, Hans-Peter Brunner-La Rocca

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review


Logistic generalized estimating equations (logistic-GEE) models have been used extensively for analyzing clustered binary data. However, assessing the goodness-of-fit and predictability of these models is problematic because no likelihood is available and observations within a cluster can be correlated. In this paper we propose a new measure for estimating the generalization performance of logistic-GEE models, namely ranking accuracy for models based on clustered data (RAMCD). We define RAMCD as the probability that a randomly selected positive observation is ranked higher than a randomly selected negative observation from another cluster. We propose a computationally efficient algorithm for computing RAMCD. The algorithm can be applied in two cases: (1) when RAMCD is estimated as a goodness-of-fit criterion and (2) when RAMCD is estimated as a predictability criterion. This is shown experimentally on clustered data from a simulation study and a biomarker study.
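To make the definition in the abstract concrete, the following is a minimal brute-force sketch of the quantity it describes, not the computationally efficient algorithm proposed in the paper. The function name `ramcd_naive`, its argument names, and the choice to count a tie as "not ranked higher" are all assumptions made for illustration; the abstract does not specify how ties are handled.

```python
from itertools import product


def ramcd_naive(scores, labels, clusters):
    """Brute-force estimate of ranking accuracy across clusters.

    Hypothetical helper, not the paper's algorithm: it enumerates every
    (positive, negative) pair whose members come from different clusters
    and returns the fraction in which the positive observation has the
    strictly higher score. Ties count as not ranked higher (an assumption).
    """
    rows = list(zip(scores, labels, clusters))
    positives = [(s, c) for s, y, c in rows if y == 1]
    negatives = [(s, c) for s, y, c in rows if y == 0]

    concordant = 0
    total = 0
    for (s_pos, c_pos), (s_neg, c_neg) in product(positives, negatives):
        if c_pos == c_neg:
            continue  # pairs within the same cluster are excluded
        total += 1
        if s_pos > s_neg:
            concordant += 1

    if total == 0:
        raise ValueError("no cross-cluster positive/negative pairs")
    return concordant / total


# Toy data: two clusters with one positive and one negative each.
scores = [0.3, 0.2, 0.6, 0.4]
labels = [1, 0, 1, 0]
clusters = ["a", "a", "b", "b"]
print(ramcd_naive(scores, labels, clusters))  # prints 0.5
```

This enumeration takes time proportional to the number of positives times the number of negatives, so it serves only as a reference for small data sets.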
Original language: English
Title of host publication: IDA 2016: Advances in Intelligent Data Analysis XV
Editors: H Boström, A Knobbe, C Soares, P Papapetrou
ISBN (Electronic): 978-3-319-46349-0
ISBN (Print): 978-3-319-46348-3
Publication status: Published - 21 Sep 2016

Publication series

Series: Lecture Notes in Computer Science
