ECHO: A de novo error correction algorithm for high-throughput short-read sequencing technologies
Wei-Chun Kao*, Yun S Song
*Corresponding author: Wei-Chun Kao
Computer Science Division, University of California, Berkeley, CA, USA
F1000Posters 2010, 1: 301 (poster) [ENGLISH]
Presented at the Society for Molecular Biology and Evolution 2010 meeting, 4-8 Jul 2010, P-S23-06
Next-generation sequencing technologies have provided an enormous amount of sequence data, making large-scale genetic and disease association studies possible. These technologies generate ultra-high-throughput DNA sequence data at a very low per-base cost. However, these benefits come at a price: shorter read lengths and higher error rates. These drawbacks create many challenges for downstream sequence analysis, especially when reference sequences are not available.
There are two main approaches to addressing these difficulties. One is to improve the signal processing and base-calling algorithms for the sequencing platform. The other is to exploit the lower cost and higher coverage to correct the errors after base calling. In this poster, we adopt the second approach and focus on sequence data generated by the Illumina Genome Analyzer.
We assume that no reference sequence is available, so that our algorithm is applicable to de novo sequencing projects. The algorithm automatically finds reads that cover the same region of the sample and estimates the error characteristics of the underlying sequencing platform, all without using reference sequences. These estimated parameters are then used to formulate hypothesis-testing problems for sequence error correction. Preliminary experiments show a reduction of more than 65% in the error rate over the entire sample of chromosome arm 2L of the RAL-399 strain from the Drosophila Population Genomics Project.
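The three steps described above (finding reads that cover the same region, estimating error characteristics, and testing hypotheses about each base) can be illustrated with a minimal Python sketch. This is not the ECHO implementation; everything in it is an assumption made for illustration: the function names, the shared k-mer heuristic for finding overlapping reads, the single uniform substitution rate `eps`, and the log-likelihood-ratio threshold. It also ignores reverse-complement reads, quality scores, and position-specific error rates.

```python
import math
from collections import Counter, defaultdict


def find_overlaps(reads, k=5, min_overlap=6, max_mismatch_frac=0.25):
    """Return {i: {(j, shift)}}; base q of read j aligns to base q + shift of read i."""
    # Index every k-mer by the reads and offsets where it occurs.
    index = defaultdict(list)
    for rid, read in enumerate(reads):
        for pos in range(len(read) - k + 1):
            index[read[pos:pos + k]].append((rid, pos))

    overlaps = defaultdict(set)
    for hits in index.values():
        for i, pi in hits:
            for j, pj in hits:
                if i == j:
                    continue
                shift = pi - pj
                # Verify the candidate overlap by counting mismatches in it.
                pairs = [(reads[i][q + shift], b) for q, b in enumerate(reads[j])
                         if 0 <= q + shift < len(reads[i])]
                mismatches = sum(a != b for a, b in pairs)
                if len(pairs) >= min_overlap and mismatches <= max_mismatch_frac * len(pairs):
                    overlaps[i].add((j, shift))
    return overlaps


def pileup(reads, overlaps, i):
    """Count the bases observed at each position of read i, including its own base."""
    cols = [Counter(b) for b in reads[i]]
    for j, shift in overlaps.get(i, ()):
        for q, b in enumerate(reads[j]):
            if 0 <= q + shift < len(reads[i]):
                cols[q + shift][b] += 1
    return cols


def estimate_error_rate(reads, overlaps, floor=1e-3, ceil=0.25):
    """Estimate a single substitution rate as the fraction of non-majority bases."""
    total = wrong = 0
    for i in range(len(reads)):
        for col in pileup(reads, overlaps, i):
            depth = sum(col.values())
            total += depth
            wrong += depth - col.most_common(1)[0][1]
    rate = wrong / total if total else floor
    return min(max(rate, floor), ceil)  # clamp so the logarithms stay finite


def correct_reads(reads, k=5, threshold=2.0):
    """Replace a base when the likelihood-ratio test favours the column's top base."""
    overlaps = find_overlaps(reads, k=k)
    eps = estimate_error_rate(reads, overlaps)
    # Per-observation log-likelihood gain of "matches true base" over "is an error".
    gain = math.log(1 - eps) - math.log(eps / 3)
    corrected = []
    for i, read in enumerate(reads):
        bases = list(read)
        for p, col in enumerate(pileup(reads, overlaps, i)):
            best, n_best = col.most_common(1)[0]
            llr = (n_best - col[read[p]]) * gain
            if llr > threshold:
                bases[p] = best
        corrected.append("".join(bases))
    return corrected
```

In this sketch the null hypothesis at each position is that the read's own base is correct, and the alternative is that the most frequent base in the column is correct. Corrections are computed from the original reads in a single pass; an iterative scheme or a richer error model would be natural extensions.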
No relevant conflicts of interest declared.
This poster is open access, subject to the Creative Commons CC BY-NC 3.0 License.

