PEMapper and PECaller provide a simplified approach to whole-genome sequencing

H. Richard Johnston, Pankaj Chopra, Thomas S. Wingo, Viren Patel, Michael P. Epstein, Jennifer G. Mulle, Stephen T. Warren*, Michael E. Zwick*, Inter Consortium Brain Behav 22q11, Thérèse van Amelsvoort, David J. Cutler*

*Corresponding author for this work

Research output: Contribution to journal, Article, Academic, peer-review

Abstract

The analysis of human whole-genome sequencing data presents significant computational challenges. The sheer size of datasets places an enormous burden on computational, disk array, and network resources. Here, we present an integrated computational package, PEMapper/PECaller, that was designed specifically to minimize the burden on networks and disk arrays, create output files that are minimal in size, and run in a highly computationally efficient way, with the single goal of enabling whole-genome sequencing at scale. In addition to improved computational efficiency, we implement a statistical framework that allows for a base-by-base error model, allowing this package to perform as well as or better than the widely used Genome Analysis Toolkit (GATK) in all key measures of performance on human whole-genome sequences.

Original language: English
Pages (from-to): E1923-E1932
Number of pages: 10
Journal: Proceedings of the National Academy of Sciences of the United States of America
Volume: 114
Issue number: 10
Publication status: Published - 7 Mar 2017

Keywords

  • genome sequencing
  • GATK
  • sequence mapping
  • SNP calling
  • software
  • 22Q11.2 DELETION SYNDROME
  • EXOME VARIANTS
  • GENERATION
  • SCHIZOPHRENIA
  • FRAMEWORK
  • ALIGNMENT
  • HUMANS
  • NUMBER
  • TOOLS
  • RISK
