From a single dataset, one can construct many different machine learning (ML) models with different parameters and/or inductive biases. Although these models achieve similar predictive performance when tested on currently available data, they may not generalise equally well to unseen data. The existence of multiple equally performing models indicates underspecification of the ML pipeline used to produce them. In this work, we propose identifying underspecification using feature attribution algorithms developed in Explainable AI. Our hypothesis is that, by studying the range of explanations produced by ML models, one can identify underspecification. We validate this by computing explanations with the Shapley additive explainer (SHAP) and then measuring statistical correlations between them. We evaluate our approach on multiple datasets drawn from the literature and on a COVID-19 virus transmission case study.
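The methodology described above can be illustrated with a short sketch. The following is a minimal, hypothetical example assuming the `shap` Python library and scikit-learn; the pair of gradient-boosting models, the mean-|SHAP| global attribution profiles, and the choice of Spearman rank correlation are illustrative assumptions, not necessarily the paper's exact setup.

```python
# Hypothetical sketch: compare SHAP explanations from two similarly
# performing models to probe for underspecification.
import numpy as np
import shap
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two models from the same pipeline with different hyperparameters --
# one simple way to obtain multiple plausible, similarly accurate models.
models = [
    GradientBoostingClassifier(max_depth=2, random_state=0).fit(X_train, y_train),
    GradientBoostingClassifier(max_depth=5, random_state=1).fit(X_train, y_train),
]

# Per-model global attributions: mean |SHAP value| for each feature.
attributions = []
for model in models:
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X_test)
    attributions.append(np.abs(shap_values).mean(axis=0))

# Statistical correlation between the two attribution profiles
# (Spearman rank correlation chosen here as an illustrative metric).
rho, p = spearmanr(attributions[0], attributions[1])
print(f"Spearman rho = {rho:.3f} (p = {p:.3g})")
```

A low rank correlation between the attribution profiles would suggest that the two models rely on different features despite comparable accuracy, which is the kind of signal the abstract associates with an underspecified pipeline.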
| Series | Lecture Notes in Computer Science |
| Conference | 18th Pacific Rim International Conference on Artificial Intelligence |
| Abbreviated title | PRICAI 2021 |
| Period | 8/11/21 → 12/11/21 |