The smarty4covid dataset and knowledge base as a framework for interpretable physiological audio data analysis

Konstantia Zarkogianni; Edmund Dervakos; George Filandrianos; Theofanis Ganitidis; Vasiliki Gkatzou; Aikaterini Sakagianni; Raghu Raghavendra; C. L. Max Nikias; Giorgos Stamou; Konstantina S. Nikita

doi:10.1038/s41597-023-02646-6

Konstantia Zarkogianni^*, Edmund Dervakos, George Filandrianos, Theofanis Ganitidis, Vasiliki Gkatzou, Aikaterini Sakagianni, Raghu Raghavendra, C. L. Max Nikias, Giorgos Stamou, Konstantina S. Nikita

^*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Harnessing Artificial Intelligence (AI) and m-health to detect new biomarkers indicative of the onset and progression of respiratory abnormalities and conditions has attracted considerable scientific interest, especially during the COVID-19 pandemic. The smarty4covid dataset contains audio signals of cough (4,676), regular breathing (4,665), deep breathing (4,695) and voice (4,291), recorded on mobile devices through a crowd-sourcing approach. Other self-reported information is also included (e.g. COVID-19 virus tests), providing a comprehensive dataset for the development of COVID-19 risk detection models. The smarty4covid dataset is released as a Web Ontology Language (OWL) knowledge base, enabling data consolidation from other relevant datasets, complex queries and reasoning. It has been used to develop models able to: (i) extract clinically informative respiratory indicators from regular breathing records, and (ii) identify cough, breath and voice segments in crowd-sourced audio recordings. A new framework that uses the smarty4covid OWL knowledge base to generate counterfactual explanations for opaque AI-based COVID-19 risk detection models is proposed and validated.

Original language	English
Article number	770
Journal	Scientific Data
Volume	10
Issue number	1
DOIs	https://doi.org/10.1038/s41597-023-02646-6
Publication status	Published - 1 Dec 2023

Access to Document

10.1038/s41597-023-02646-6 (Licence: CC BY)

Cite this

@article{a5267e371fc2492cafc837aa34382726,

title = "The smarty4covid dataset and knowledge base as a framework for interpretable physiological audio data analysis",

abstract = "Harnessing the power of Artificial Intelligence (AI) and m-health towards detecting new bio-markers indicative of the onset and progress of respiratory abnormalities/conditions has greatly attracted the scientific and research interest especially during COVID-19 pandemic. The smarty4covid dataset contains audio signals of cough (4,676), regular breathing (4,665), deep breathing (4,695) and voice (4,291) as recorded by means of mobile devices following a crowd-sourcing approach. Other self reported information is also included (e.g. COVID-19 virus tests), thus providing a comprehensive dataset for the development of COVID-19 risk detection models. The smarty4covid dataset is released in the form of a web-ontology language (OWL) knowledge base enabling data consolidation from other relevant datasets, complex queries and reasoning. It has been utilized towards the development of models able to: (i) extract clinically informative respiratory indicators from regular breathing records, and (ii) identify cough, breath and voice segments in crowd-sourced audio recordings. A new framework utilizing the smarty4covid OWL knowledge base towards generating counterfactual explanations in opaque AI-based COVID-19 risk detection models is proposed and validated.",

author = "Konstantia Zarkogianni and Edmund Dervakos and George Filandrianos and Theofanis Ganitidis and Vasiliki Gkatzou and Aikaterini Sakagianni and Raghu Raghavendra and {Max Nikias}, {C. L.} and Giorgos Stamou and Nikita, {Konstantina S.}",

note = "Funding Information: This research was funded by the Hellenic Foundation for Research and Innovation-H.F.R.I within the framework of the H.F.R.I Science \& Society “Interventions to address the economic and social consequences of the COVID-19 pandemic” call. Grant number: 05020. Publisher Copyright: {\textcopyright} 2023, The Author(s).",

year = "2023",

month = dec,

day = "1",

doi = "10.1038/s41597-023-02646-6",

language = "English",

volume = "10",

journal = "Scientific Data",

issn = "2052-4463",

publisher = "Nature Publishing Group",

number = "1",

}

TY - JOUR

T1 - The smarty4covid dataset and knowledge base as a framework for interpretable physiological audio data analysis

AU - Zarkogianni, Konstantia

AU - Dervakos, Edmund

AU - Filandrianos, George

AU - Ganitidis, Theofanis

AU - Gkatzou, Vasiliki

AU - Sakagianni, Aikaterini

AU - Raghavendra, Raghu

AU - Max Nikias, C. L.

AU - Stamou, Giorgos

AU - Nikita, Konstantina S.

N1 - Funding Information: This research was funded by the Hellenic Foundation for Research and Innovation-H.F.R.I within the framework of the H.F.R.I Science & Society “Interventions to address the economic and social consequences of the COVID-19 pandemic” call. Grant number: 05020. Publisher Copyright: © 2023, The Author(s).

PY - 2023/12/1

Y1 - 2023/12/1

N2 - Harnessing the power of Artificial Intelligence (AI) and m-health towards detecting new bio-markers indicative of the onset and progress of respiratory abnormalities/conditions has greatly attracted the scientific and research interest especially during COVID-19 pandemic. The smarty4covid dataset contains audio signals of cough (4,676), regular breathing (4,665), deep breathing (4,695) and voice (4,291) as recorded by means of mobile devices following a crowd-sourcing approach. Other self reported information is also included (e.g. COVID-19 virus tests), thus providing a comprehensive dataset for the development of COVID-19 risk detection models. The smarty4covid dataset is released in the form of a web-ontology language (OWL) knowledge base enabling data consolidation from other relevant datasets, complex queries and reasoning. It has been utilized towards the development of models able to: (i) extract clinically informative respiratory indicators from regular breathing records, and (ii) identify cough, breath and voice segments in crowd-sourced audio recordings. A new framework utilizing the smarty4covid OWL knowledge base towards generating counterfactual explanations in opaque AI-based COVID-19 risk detection models is proposed and validated.

AB - Harnessing the power of Artificial Intelligence (AI) and m-health towards detecting new bio-markers indicative of the onset and progress of respiratory abnormalities/conditions has greatly attracted the scientific and research interest especially during COVID-19 pandemic. The smarty4covid dataset contains audio signals of cough (4,676), regular breathing (4,665), deep breathing (4,695) and voice (4,291) as recorded by means of mobile devices following a crowd-sourcing approach. Other self reported information is also included (e.g. COVID-19 virus tests), thus providing a comprehensive dataset for the development of COVID-19 risk detection models. The smarty4covid dataset is released in the form of a web-ontology language (OWL) knowledge base enabling data consolidation from other relevant datasets, complex queries and reasoning. It has been utilized towards the development of models able to: (i) extract clinically informative respiratory indicators from regular breathing records, and (ii) identify cough, breath and voice segments in crowd-sourced audio recordings. A new framework utilizing the smarty4covid OWL knowledge base towards generating counterfactual explanations in opaque AI-based COVID-19 risk detection models is proposed and validated.

U2 - 10.1038/s41597-023-02646-6

DO - 10.1038/s41597-023-02646-6

M3 - Article

SN - 2052-4463

VL - 10

JO - Scientific Data

JF - Scientific Data

IS - 1

M1 - 770

ER -