A data reusability assessment in the nanosafety domain based on the NSDRA framework followed by an exploratory quantitative structure activity relationships (QSAR) modeling targeting cellular viability

Irini Furxhi; Egon Willighagen; Chris Evelo; Anna Costa; Davide Gardini; Ammar Ammar

doi:10.1016/j.impact.2023.100475

Corresponding author: Ammar Ammar

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Introduction
The current drive towards digital transformation across multiple scientific domains requires data that are Findable, Accessible, Interoperable and Reusable (FAIR). Beyond FAIR data, applying computational tools such as Quantitative Structure Activity Relationship (QSAR) models also requires a sufficient volume of data and the ability to merge sources into homogeneous digital assets. In the nanosafety domain, however, there is a lack of FAIR metadata.

Methodology
To address this challenge, we applied the NanoSafety Data Reusability Assessment (NSDRA) framework to 34 datasets from the nanosafety domain, which allowed us to annotate the datasets and assess their reusability. Based on the results of the framework, eight datasets targeting the same endpoint (i.e., numerical cellular viability) were selected, processed and merged to test several hypotheses, including universal versus nanogroup-specific QSAR models (metal oxides and nanotubes) and regression versus classification Machine Learning (ML) algorithms.
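The selection-and-merge step can be pictured with the following minimal Python sketch. It assumes each dataset has already been annotated as a list of records with harmonized column names. The field names (`material`, `core_size_nm`, `viability_pct`), the material-to-nanogroup lookup and the 50% viability cut-off used to derive a class label are all hypothetical illustrations, not the schema or thresholds used in the study.

```python
from typing import Dict, List

Record = Dict[str, object]

# Hypothetical material -> nanogroup lookup. The study compares metal oxides
# and nanotubes, but these specific entries are illustrative only.
NANOGROUPS = {
    "TiO2": "metal oxide",
    "ZnO": "metal oxide",
    "CuO": "metal oxide",
    "MWCNT": "nanotubes",
}

# Hypothetical cut-off (% viability) used to derive a binary class label.
VIABILITY_THRESHOLD = 50.0


def select_and_merge(datasets: Dict[str, List[Record]], endpoint: str) -> List[Record]:
    """Keep records that report `endpoint`, tag provenance and nanogroup, concatenate."""
    merged: List[Record] = []
    for name, records in datasets.items():
        for rec in records:
            value = rec.get(endpoint)
            if value is None:
                continue  # record does not report the shared endpoint
            row = dict(rec)
            row["source"] = name  # provenance, reused for per-dataset validation
            row["nanogroup"] = NANOGROUPS.get(str(rec.get("material")), "other")
            # Numeric endpoint stays for regression; boolean label feeds classification.
            row["toxic"] = float(value) < VIABILITY_THRESHOLD
            merged.append(row)
    return merged


def split_by_nanogroup(rows: List[Record]) -> Dict[str, List[Record]]:
    """Partition merged rows for nanogroup-specific (rather than universal) models."""
    groups: Dict[str, List[Record]] = {}
    for row in rows:
        groups.setdefault(str(row["nanogroup"]), []).append(row)
    return groups


# Toy input with hypothetical column names.
datasets = {
    "dataset_A": [{"material": "ZnO", "core_size_nm": 20, "viability_pct": 35.0}],
    "dataset_B": [
        {"material": "MWCNT", "core_size_nm": 10, "viability_pct": 80.0},
        {"material": "TiO2", "core_size_nm": 25},  # no viability value: dropped
    ],
}
merged = select_and_merge(datasets, "viability_pct")  # universal-model training pool
groups = split_by_nanogroup(merged)                   # nanogroup-specific pools
```

Here `merged` (two rows) would feed a universal model, while `groups` splits the same rows into metal oxide and nanotube pools. Keeping the numeric value alongside the derived label lets the same merged table serve both the regression and the classification comparison.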

Results
Universal regression and classification QSARs reached an R² of 0.86 and an accuracy of 0.92, respectively, on the test set. Nanogroup-specific regression models reached an R² of 0.88 on the nanotube test set and 0.78 on the metal oxide test set. Nanogroup-specific classification models reached an accuracy of 0.99 on the nanotube test set and 0.91 on the metal oxide test set. Feature importance revealed different patterns depending on the dataset, with core size, exposure conditions and toxicological assay among the commonly influential features.
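For readers less familiar with the reported scores, the sketch below shows how R² (regression) and accuracy (classification) are conventionally computed, together with permutation feature importance as one common way of ranking features. Permutation importance is a generic technique chosen here for illustration; the abstract does not state which importance method the study used, and `toy_model` is a hypothetical stand-in for a trained QSAR model.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(y_true: Sequence[object], y_pred: Sequence[object]) -> float:
    """Fraction of predicted class labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Row], float],
    X: List[Row],
    y: Sequence[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in R² when one feature column is shuffled (larger = more important)."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - r2_score(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row: Row) -> float:
    """Hypothetical fitted model: viability falls linearly with feature 0 only."""
    return 100.0 - 2.0 * row[0]


# Toy data: feature 0 drives viability, feature 1 is irrelevant noise.
X = [[float(i), float(i % 3)] for i in range(30)]
y = [toy_model(row) for row in X]
imp = permutation_importance(toy_model, X, y)
```

Shuffling the irrelevant feature leaves predictions unchanged, so its importance is exactly zero, while shuffling the driving feature lowers R². The same loop works with accuracy as the score for a classifier.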

Even when the available experimental knowledge was merged, the models still failed to correctly predict the outputs of an unseen dataset, exposing the persistent challenge of scientific reproducibility in realistic applications of QSAR for nanosafety. To harness the full potential of computational tools and ensure their long-term application, embracing FAIR data practices is imperative for developing responsible QSAR models.
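Predicting an unseen dataset corresponds to a validation scheme in which an entire source dataset is held out, rather than a random split of the merged records. A minimal leave-one-dataset-out loop is sketched below. The k-nearest-neighbour regressor is a deliberately simple, hypothetical stand-in for the study's ML algorithms, and the toy data (three "labs" sharing a dose trend, one with an offset baseline) are invented to show how a between-source shift inflates the held-out error.

```python
import math
from typing import Dict, List, Tuple

Sample = Tuple[List[float], float]  # (feature vector, viability)


def knn_predict(train: List[Sample], x: List[float], k: int = 3) -> float:
    """Average target of the k training samples closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def mean_absolute_error(y_true: List[float], y_pred: List[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def leave_one_dataset_out(data: Dict[str, List[Sample]], k: int = 3) -> Dict[str, float]:
    """For each source: train on all other sources, score on the held-out one."""
    scores = {}
    for held_out, test in data.items():
        train = [s for name, samples in data.items() if name != held_out for s in samples]
        preds = [knn_predict(train, x, k) for x, _ in test]
        scores[held_out] = mean_absolute_error([y for _, y in test], preds)
    return scores


# Hypothetical sources: same dose-response trend, but lab_C has an offset baseline
# (e.g. a different assay), which the other sources cannot teach the model.
data = {
    "lab_A": [([10.0], 90.0), ([20.0], 80.0), ([30.0], 70.0)],
    "lab_B": [([10.0], 91.0), ([20.0], 79.0), ([30.0], 71.0)],
    "lab_C": [([10.0], 60.0), ([20.0], 50.0), ([30.0], 40.0)],
}
scores = leave_one_dataset_out(data, k=2)
```

With this toy data, holding out `lab_C` yields the largest mean absolute error (about 30 viability points), even though every source follows the same trend. A random split of the pooled records would hide this, because `lab_C` samples would appear in both training and test sets.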

Conclusions
This study shows that reproducible digitalization of nanosafety knowledge still has a long way to go before it can be implemented successfully in practice. The workflow carried out in the study offers a promising approach to increasing FAIRness across all elements of a computational study, from dataset annotation, selection and merging to FAIR reporting of the models. This has significant implications for future research, as it provides an example of how to use and report the different tools available in the nanosafety knowledge system while increasing the transparency of the results. A main benefit of the workflow is that, by making data and metadata FAIR-compliant, it promotes data sharing and reuse, which is essential for advancing scientific knowledge. In addition, the increased transparency and reproducibility of the results can enhance the trustworthiness of the computational findings.

Original language	English
Article number	100475
Number of pages	16
Journal	NanoImpact
Volume	31
Issue number	1
DOIs	https://doi.org/10.1016/j.impact.2023.100475
Publication status	Published - 1 Jul 2023

Access to Document

10.1016/j.impact.2023.100475
Licence: CC BY

Cite this

@article{0455b46269324c8e85f357226a9cb3e5,

title = "A data reusability assessment in the nanosafety domain based on the NSDRA framework followed by an exploratory quantitative structure activity relationships (QSAR) modeling targeting cellular viability",

author = "Irini Furxhi and Egon Willighagen and Chris Evelo and Anna Costa and Davide Gardini and Ammar Ammar",

year = "2023",

month = jul,

day = "1",

doi = "10.1016/j.impact.2023.100475",

language = "English",

volume = "31",

journal = "NanoImpact",

issn = "2452-0748",

publisher = "Elsevier BV",

number = "1",

}

A data reusability assessment in the nanosafety domain based on the NSDRA framework followed by an exploratory quantitative structure activity relationships (QSAR) modeling targeting cellular viability. / Furxhi, Irini; Willighagen, Egon; Evelo, Chris et al.
In: NanoImpact, Vol. 31, No. 1, 100475, 01.07.2023.

TY  - JOUR
T1  - A data reusability assessment in the nanosafety domain based on the NSDRA framework followed by an exploratory quantitative structure activity relationships (QSAR) modeling targeting cellular viability
AU  - Furxhi, Irini
AU  - Willighagen, Egon
AU  - Evelo, Chris
AU  - Costa, Anna
AU  - Gardini, Davide
AU  - Ammar, Ammar
PY  - 2023/7/1
Y1  - 2023/7/1
U2  - 10.1016/j.impact.2023.100475
DO  - 10.1016/j.impact.2023.100475
M3  - Article
C2  - 37423508
SN  - 2452-0748
VL  - 31
JO  - NanoImpact
JF  - NanoImpact
IS  - 1
M1  - 100475
ER  - 