### Abstract

Original language | English |
---|---|
Title of host publication | Lecture Notes in Computer Science |
Editors | H Boström, A Knobbe, C Soares, P Papapetrou |
Publisher | Springer |
Chapter | 2 |
Pages | 14-25 |
Volume | 9897 |
ISBN (Electronic) | 978-3-319-46349-0 |
ISBN (Print) | 978-3-319-46348-3 |
DOIs | https://doi.org/10.1007/978-3-319-46349-0_2 |
Publication status | Published - 21 Sep 2016 |

### Publication series

Series | Advances in Intelligent Data Analysis XV |
---|---|
Volume | 9897 |
ISSN | 0302-9743 |

### Cite this

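Davarzani, N., Peeters, R., Smirnov, E., Karel, J., & Brunner-la Rocca, H. (2016). Ranking Accuracy for Logistic-GEE Models. In H. Boström, A. Knobbe, C. Soares, & P. Papapetrou (Eds.), *Lecture Notes in Computer Science* (Vol. 9897, pp. 14-25). Springer. Advances in Intelligent Data Analysis XV, Vol. 9897. https://doi.org/10.1007/978-3-319-46349-0_2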

**Ranking Accuracy for Logistic-GEE Models.** / Davarzani, Nasser; Peeters, Ralf; Smirnov, Evgueni; Karel, Joël; Brunner-la Rocca, Hans-peter.

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

TY - GEN

T1 - Ranking Accuracy for Logistic-GEE Models

AU - Davarzani, Nasser

AU - Peeters, Ralf

AU - Smirnov, Evgueni

AU - Karel, Joël

AU - Brunner-la Rocca, Hans-peter

PY - 2016/9/21

Y1 - 2016/9/21

N2 - Logistic generalized estimating equations (logistic-GEE) models have been used extensively for analyzing clustered binary data. However, assessing the goodness-of-fit and predictability of these models is problematic because no likelihood is available and observations can be correlated within a cluster. In this paper we propose a new measure for estimating the generalization performance of logistic-GEE models, namely ranking accuracy for models based on clustered data (RAMCD). We define RAMCD as the probability that a randomly selected positive observation is ranked higher than a randomly selected negative observation from another cluster. We propose a computationally efficient algorithm for RAMCD. The algorithm can be applied in two cases: (1) when RAMCD is estimated as a goodness-of-fit criterion and (2) when RAMCD is estimated as a predictability criterion. This is shown experimentally on clustered data from a simulation study and a biomarker study.

U2 - 10.1007/978-3-319-46349-0_2

DO - 10.1007/978-3-319-46349-0_2

M3 - Conference article in proceeding

SN - 978-3-319-46348-3

VL - 9897

T3 - Advances in Intelligent Data Analysis XV

SP - 14

EP - 25

BT - Lecture Notes in Computer Science

A2 - Boström, H

A2 - Knobbe, A

A2 - Soares, C

A2 - Papapetrou, P

PB - Springer

ER -
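
The definition in the abstract (the probability that a randomly selected positive observation is ranked higher than a randomly selected negative observation from another cluster) can be illustrated with a brute-force estimate. The sketch below is not the computationally efficient algorithm proposed in the paper, which the abstract does not describe. It simply enumerates every positive/negative pair whose members come from different clusters. The function name `ramcd_naive`, the equal weighting of all eligible pairs, and the handling of ties (counted as half a correct ranking, as is common for AUC-style measures) are assumptions made for illustration.

```python
from typing import Sequence


def ramcd_naive(scores: Sequence[float],
                labels: Sequence[int],
                clusters: Sequence[object]) -> float:
    """Fraction of (positive, negative) pairs from different clusters
    in which the positive observation receives the higher score.

    Assumption: ties count as half a correct ranking; the abstract
    does not say how ties are treated.
    """
    if not (len(scores) == len(labels) == len(clusters)):
        raise ValueError("inputs must have equal length")

    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]

    correct = 0.0
    n_pairs = 0
    for i in positives:
        for j in negatives:
            if clusters[i] == clusters[j]:
                continue  # only compare observations from different clusters
            n_pairs += 1
            if scores[i] > scores[j]:
                correct += 1.0
            elif scores[i] == scores[j]:
                correct += 0.5
    if n_pairs == 0:
        raise ValueError("no positive/negative pair spans two clusters")
    return correct / n_pairs


# Toy data (made up): three clusters with two observations each.
scores = [0.9, 0.4, 0.8, 0.3, 0.6, 0.2]
labels = [1, 0, 1, 0, 0, 1]
clusters = ["a", "a", "b", "b", "c", "c"]
print(ramcd_naive(scores, labels, clusters))
```

In the two cases named in the abstract, `scores` would be the fitted probabilities on the training data (goodness-of-fit) or the predicted probabilities on held-out clusters (predictability). The double loop costs time proportional to the number of positives times the number of negatives, so it is only practical for small data sets.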