Purpose: Radiomics is the automated extraction of tumor features from medical images. It has shown potential for quantifying the tumor phenotype and predicting treatment response. Three major challenges hinder radiomics research and clinical adoption: (a) the lack of a standardized methodology for radiomics analyses, (b) the lack of a universal lexicon to denote features that are semantically equivalent, and (c) the fact that lists of feature values alone do not capture the details of feature extraction that can nonetheless strongly affect those values (e.g., image normalization or interpolation parameters). These barriers hamper multicenter validation studies that apply subtly different imaging protocols, preprocessing steps and radiomics software. We propose an open-source ontology-guided radiomics analysis workflow (O-RAW) that addresses these challenges by: (a) distributing a free and open-source software package for radiomics analysis, (b) deploying a standard lexicon to uniquely describe features in common usage, and (c) providing methods to publish radiomic features as a semantically interoperable data graph object complying with the FAIR (findable, accessible, interoperable, reusable) data principles.

Methods: O-RAW was developed in Python and has three major modules using open-source component libraries (PyRadiomics Extension and PyRadiomics). First, PyRadiomics Extension takes standard DICOM-RT (Radiotherapy) input objects (i.e., a DICOM series and an RTSTRUCT file) and parses them into arrays of voxel intensities and a binary mask corresponding to a volume of interest (VOI). Next, these arrays are passed to PyRadiomics, which performs the feature extraction and returns a Python dictionary object.
Lastly, PyRadiomics Extension parses this dictionary as a W3C-compliant Semantic Web "triple store" (i.e., a list of subject-predicate-object statements) with relevant semantic meta-labels drawn from the radiation oncology ontology and the radiomics ontology. The output can be published on a SPARQL endpoint and examined remotely via SPARQL queries, or exported to a comma-separated file for further analysis.

Results: We showed that O-RAW executed efficiently on four datasets with different modalities: RIDER (CT), MMD (CT), CROSS (PET) and THUNDER (MR). The test was performed on an HP laptop running Windows 7 with 8 GB of RAM. The recorded execution time included matching the DICOM images to the associated RTSTRUCT, binary mask conversion of a single VOI, batch processing of feature extraction (105 basic features in PyRadiomics), and conversion to a Resource Description Framework (RDF) object. The execution times for a single VOI were 407.3 s (RIDER), 123.5 s (MMD), 513.2 s (CROSS) and 128.9 s (THUNDER). In addition, we demonstrated a use case, taking images from a public repository and publishing the radiomics results as FAIR data in this study on . Finally, we provided a practical example showing how a user could query radiomic features and trace the calculation details in the RDF graph object created by O-RAW via a simple SPARQL query.

Conclusions: We implemented O-RAW for FAIR radiomics analysis and successfully published radiomic features from DICOM-RT objects as Semantic Web triples. Its practicability and flexibility can accelerate the development of radiomics research and ease its transfer to clinical practice.
- FAIR data
- semantic web
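
The final Methods step, in which a feature dictionary is turned into subject-predicate-object statements or a comma-separated file, can be illustrated with a minimal sketch. This is not the O-RAW implementation: the `http://example.org/radiomics/` namespace, the predicate names (`hasFeature`, `hasLabel`, `hasValue`), the helper functions and the feature values are all hypothetical placeholders, whereas the real workflow draws its terms from the radiomics ontology and the radiation oncology ontology. Only the Python standard library is used, and the triples are written in N-Triples syntax.

```python
import csv
import io

# Hypothetical namespace (assumption); O-RAW uses IRIs from the radiomics
# ontology and the radiation oncology ontology, which are not reproduced here.
EX = "http://example.org/radiomics/"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def features_to_triples(voi_id, features):
    """Turn a {feature_name: value} dict into (subject, predicate, object) triples."""
    voi = f"<{EX}voi/{voi_id}>"
    triples = []
    for name, value in sorted(features.items()):
        feature = f"<{EX}voi/{voi_id}/feature/{name}>"
        # Link the VOI to the feature node, then attach a label and a typed value.
        triples.append((voi, f"<{EX}hasFeature>", feature))
        triples.append((feature, f"<{EX}hasLabel>", f'"{name}"'))
        triples.append((feature, f"<{EX}hasValue>", f'"{float(value)}"^^<{XSD_DOUBLE}>'))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def to_csv(voi_id, features):
    """Alternative export: one row per feature in a comma-separated table."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["voi", "feature", "value"])
    for name, value in sorted(features.items()):
        writer.writerow([voi_id, name, float(value)])
    return buffer.getvalue()


# Made-up feature values, for illustration only.
example = {"original_firstorder_Mean": 42.5, "original_shape_Sphericity": 0.5}
triples = features_to_triples("VOI-1", example)
print(to_ntriples(triples))
print(to_csv("VOI-1", example))
```

Giving each feature its own node, rather than attaching the value directly to the VOI, leaves room to hang further statements on it (for example, preprocessing parameters), which is the kind of calculation detail the abstract says a plain list of feature values cannot carry.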