[Biopython] consensus for forward and reverse reads from a sequencing run

Fields, Christopher J cjfields at illinois.edu
Tue Feb 25 15:40:43 UTC 2014


Torsten Seemann blogged about this and listed a number of tools, including a Python-based approach:

    http://thegenomefactory.blogspot.com/2012/11/tools-to-merge-overlapping-paired-end.html

He also mentioned the one we have been using internally for MiSeq data (PEAR), which we have found works much better than PandaSeq in many circumstances (complete or overextended overlaps):

    http://bioinformatics.oxfordjournals.org/content/early/2013/11/10/bioinformatics.btt593.full

chris

On Feb 25, 2014, at 5:22 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> I agree that for this specific task (merging overlapped paired
> FASTQ reads) an existing dedicated tool/script is a very
> sensible choice. There are plenty to choose from.
> 
> What Biopython might benefit from is either sample code
> on the Cookbook wiki for how to do this, or perhaps a new
> function in Bio.SeqUtils? i.e. bits to help you do something
> new or different, if you need to customise a bespoke
> analysis.
> 
> Peter
> 
> On Tue, Feb 25, 2014 at 3:34 AM, Ivan Gregoretti <ivangreg at gmail.com> wrote:
>> Hello Leo,
>> 
>> Besides pandaseq, also consider FLASH from the Salzberg lab.
>> http://ccb.jhu.edu/software/FLASH/
>> 
>> I've been using it for over a year without problems. I wish there was
>> a Biopython tool though.
>> 
>> Cheers,
>> 
>> Ivan
>> 
>> 
>> 
>> Ivan Gregoretti, PhD
>> Bioinformatics
>> 
>> 
>> 
>> On Mon, Feb 24, 2014 at 9:21 PM, Willis, Jordan R
>> <jordan.r.willis at vanderbilt.edu> wrote:
>>> Hi Leo,
>>> 
>>> I know this is not what you asked and I'm not sure if Biopython has a module, but I would really recommend pandaseq (https://github.com/neufeld/pandaseq). It's written in C, so it's much faster than Python and really could not be any simpler to use. I typically use this for HiSeq and MiSeq runs; it just requires the forward and reverse paired-end reads and spits out a consensus (with PHRED scores if you want).
>>> 
>>> Jordan
>>> 
>>> On Feb 24, 2014, at 7:59 PM, Leo Alexander Hansmann <leo2 at stanford.edu> wrote:
>>> 
>>> Hi,
>>> I'm getting two FASTA files from an Illumina MiSeq run. One contains the forward reads, the other the reverse reads. The records in both files correspond, meaning the first sequence in the forward read file pairs with the first sequence in the reverse read file. The two sequences overlap in the middle by a varying number of nucleotides. How can I get Python or Biopython to generate a file with the consensus sequence of each read pair? For example:
>>> sequence in the forward read file: AATCGTCGGTTACTCTG
>>> corresponding line in the reverse read file: CTCTGAGGGAGAGATC
>>> I want: AATCGTCGGTTACTCTGAGGGAGAGATC
>>> Thank you so much!
>>> Leo
>>> 
> 
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
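
For anyone who does want to customise this in Python, the merge in Leo's original example can be sketched with the standard library alone. This is a minimal illustration and not something posted in the thread: the function names `merge_overlap` and `reverse_complement` and the `min_overlap` parameter are hypothetical, and it assumes an exact suffix/prefix match with no sequencing errors and no quality scores.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def merge_overlap(forward, reverse, min_overlap=3):
    """Join two reads on their longest exact suffix/prefix overlap.

    Assumes both reads are already in the same orientation, as in the
    example from the question. Returns the merged sequence, or None if
    no exact overlap of at least min_overlap bases exists.
    """
    longest = min(len(forward), len(reverse))
    # Try the longest candidate overlap first and shrink until one matches.
    for size in range(longest, min_overlap - 1, -1):
        if forward.endswith(reverse[:size]):
            return forward + reverse[size:]
    return None


# The two reads from the original question.
forward = "AATCGTCGGTTACTCTG"
reverse = "CTCTGAGGGAGAGATC"
print(merge_overlap(forward, reverse))  # AATCGTCGGTTACTCTGAGGGAGAGATC
```

Real MiSeq reverse reads normally come off the opposite strand, so they would probably need to go through `reverse_complement` before merging. Exact matching also breaks on a single miscalled base, which is one reason the dedicated tools mentioned in this thread (PEAR, FLASH, pandaseq) are the better choice for production data.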




