Pathway Enrichment Based on Text Mining and Its Validation on Carotenoid and Vitamin A Metabolism

A.S. Waagmeester*, P. Pezik, S. Coort, F. Tourniaire, C. Evelo, D. Rebholz Schuhmann

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review


Abstract

Carotenoid metabolism is relevant to the prevention of various diseases. Although the main actors in this metabolic pathway are known, our understanding of the pathway is still incomplete, and information on carotenoids is scattered across a large and growing body of scientific literature. We designed a text-mining workflow to enrich existing pathways and validated it on the vitamin A pathway, a well-studied part of carotenoid metabolism. We used the vitamin A metabolism pathway as described by an expert team on carotenoid metabolism from the European network of excellence in Nutrigenomics (NuGO). The workflow starts from an initial set of publications cited in a review paper (1,191 publications), enlarges this corpus with Medline abstracts (13,579 documents), and then extracts the key terminology from all relevant publications. Domain experts validated the intermediate and final results of the workflow. With this approach we were able to enrich the pathway representing vitamin A metabolism. From a total of 89,086 terms, we found 37 new and relevant terms that qualified for inclusion in the analyzed pathway. These 37 terms were assessed manually: 13 were added to the pathway as new entities, 14 belonged to other pathways and could form links between those pathways and the vitamin A pathway, and the remaining 10 were classified as biomarkers or nutrients. Automatic literature analysis improves the enrichment of pathways with entities already described in the scientific literature.
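The core filtering idea in the abstract (extract terms from a corpus, then keep only those not yet in the pathway) can be illustrated with a minimal sketch. This is not the authors' workflow: the function name `candidate_terms`, the `min_docs` threshold, the regex tokenizer, and the three toy sentences are all assumptions made for illustration. A real pipeline would use biomedical terminology resources and expert review, as the abstract describes.

```python
import re
from collections import Counter


def candidate_terms(corpus, known_entities, min_docs=2):
    """Return terms found in at least `min_docs` documents that are
    not already pathway entities (hypothetical helper, illustration only)."""
    known = {e.lower() for e in known_entities}
    doc_freq = Counter()
    for text in corpus:
        # Naive tokenizer (assumption): lowercase words of 2+ characters,
        # allowing digits and hyphens after the first letter.
        tokens = set(re.findall(r"[a-z][a-z0-9-]+", text.lower()))
        doc_freq.update(tokens)  # count each term once per document
    return sorted(
        term for term, n in doc_freq.items()
        if n >= min_docs and term not in known
    )


# Toy corpus (made up for this sketch, not data from the study).
corpus = [
    "Retinol is oxidized to retinal by RDH10.",
    "RDH10 and BCO1 act on retinal precursors.",
    "BCO1 cleaves beta-carotene.",
]
known = ["retinol", "retinal"]  # entities already in the pathway

print(candidate_terms(corpus, known))  # ['bco1', 'rdh10']
```

The document-frequency threshold stands in for whatever relevance ranking the study actually used; the surviving terms would still need manual assessment by domain experts before being added to a pathway.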
Original language: English
Pages (from-to): 367-379
Journal: OMICS-a journal of Integrative Biology
Issue number: 5
Publication status: Published - 1 Jan 2009
