[Biopython] consensus for forward and reverse reads from a sequencing run

Peter Cock p.j.a.cock at googlemail.com
Tue Feb 25 11:22:09 UTC 2014


I agree that for this specific task (merging overlapped paired
FASTQ reads) an existing dedicated tool/script is a very
sensible choice. There are plenty to choose from.

What Biopython might benefit from is either sample code
on the Cookbook wiki for how to do this, or perhaps a new
function in Bio.SeqUtils, i.e. building blocks to help you
do something new or different if you need to customise a
bespoke analysis.
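
As a rough sketch of what such a Cookbook entry might look like
(the function names merge_overlapping and merge_fasta_pairs are
made up for illustration and are not part of Biopython), here is
an exact-match merge. It assumes both reads are already in the
same orientation, as in Leo's example. Real MiSeq reverse reads
would normally need reverse complementing first, and the sketch
ignores sequencing errors and quality scores, which the dedicated
tools handle properly.

```python
def merge_overlapping(forward, reverse, min_overlap=3):
    """Join two strings where a suffix of `forward` equals a prefix of `reverse`.

    Tries the longest possible overlap first and returns None when no exact
    overlap of at least `min_overlap` letters is found.
    """
    longest = min(len(forward), len(reverse))
    for size in range(longest, min_overlap - 1, -1):
        if forward.endswith(reverse[:size]):
            return forward + reverse[size:]
    return None


def merge_fasta_pairs(forward_path, reverse_path, out_path, min_overlap=3):
    """Merge two FASTA files record by record (assumes identical ordering).

    Hypothetical helper: returns the number of merged records written.
    """
    # Imported here so merge_overlapping itself works without Biopython.
    from Bio import SeqIO
    from Bio.Seq import Seq
    from Bio.SeqRecord import SeqRecord

    def merged_records():
        pairs = zip(SeqIO.parse(forward_path, "fasta"),
                    SeqIO.parse(reverse_path, "fasta"))
        for fwd, rev in pairs:
            merged = merge_overlapping(str(fwd.seq), str(rev.seq), min_overlap)
            if merged is not None:  # pairs without an exact overlap are skipped
                yield SeqRecord(Seq(merged), id=fwd.id, description="")

    return SeqIO.write(merged_records(), out_path, "fasta")


# Leo's example pair from the original question.
print(merge_overlapping("AATCGTCGGTTACTCTG", "CTCTGAGGGAGAGATC"))
```

On Leo's example pair this prints AATCGTCGGTTACTCTGAGGGAGAGATC.
Taking the longest overlap first is a design choice. With short
or repetitive overlaps it can pick the wrong join, so a sensible
min_overlap matters.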

Peter

On Tue, Feb 25, 2014 at 3:34 AM, Ivan Gregoretti <ivangreg at gmail.com> wrote:
> Hello Leo,
>
> Besides pandaseq, also consider FLASH from the Salzberg lab.
> http://ccb.jhu.edu/software/FLASH/
>
> I've been using it for over a year without problems. I wish there was
> a Biopython tool though.
>
> Cheers,
>
> Ivan
>
>
>
> Ivan Gregoretti, PhD
> Bioinformatics
>
>
>
> On Mon, Feb 24, 2014 at 9:21 PM, Willis, Jordan R
> <jordan.r.willis at vanderbilt.edu> wrote:
>> Hi Leo,
>>
>> I know this is not what you asked and I'm not sure if Biopython has a module, but I would really recommend pandaseq (https://github.com/neufeld/pandaseq). It's written in C, so it's much faster than Python and really could not be simpler to use. I typically use this for HiSeq and MiSeq runs; it just requires the forward and reverse paired-end reads and spits out a consensus (with PHRED scores if you want).
>>
>> Jordan
>>
>> On Feb 24, 2014, at 7:59 PM, Leo Alexander Hansmann <leo2 at stanford.edu> wrote:
>>
>> Hi,
>> I'm getting two FASTA files from an Illumina MiSeq run. One contains forward reads, the other reverse reads. The lines in the two files correspond, meaning the first sequence in the forward read file should pair with the first sequence in the reverse read file. The two sequences overlap in the middle by a varying number of nucleotides. How can I get Python or Biopython to generate a file with the consensus sequence of each read pair? For example:
>> sequence in the forward read file: AATCGTCGGTTACTCTG
>> corresponding line in the reverse read file: CTCTGAGGGAGAGATC
>> I want: AATCGTCGGTTACTCTGAGGGAGAGATC
>> Thank you so much!
>> Leo
>>



