[Biopython] fastq manipulations speed

Peter Cock p.j.a.cock at googlemail.com
Tue Mar 19 09:24:39 UTC 2013


Excellent :)

P.S. Try to include the mailing list in your replies

On Tuesday, March 19, 2013, natassa wrote:

> Hello,
> Just to let you know that the script adapted with your suggestions
> completed very fast, i.e. within a few hours only. Thank you!
> Natassa
>
>
>   ------------------------------
> *From:* Peter Cock <p.j.a.cock at googlemail.com>
> *To:* natassa <natassa_g_2000 at yahoo.com>
> *Cc:* Biopython Mailing List <biopython at lists.open-bio.org>
>
> *Sent:* Sunday, March 17, 2013 5:33 PM
> *Subject:* Re: [Biopython] fastq manipulations speed
>
> On Mon, Mar 18, 2013 at 12:21 AM, natassa <natassa_g_2000 at yahoo.com>
> wrote:
> > Thanks, the length-1 was an error, it was supposed to be 0:length to get the
> > qualities of the associated trimmed files. The script seems to be running
> > much faster! But what would be your other suggestions?
> > Natassa
>
> You should be able to refactor the code to make a single call to
> SeqIO.write by giving it a generator which constructs all the
> trimmed records. That would require a bit of thought and
> experience with iterators, generator functions and/or generator
> expressions - but can be a really powerful way to think about
> things. I'm expecting this to be faster, but the second idea
> below will definitely be faster, perhaps five times as fast...
>
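A rough sketch of the single-call idea, in case it helps. The names trimmed, trim_fastq, the file paths and the length parameter are all made up for illustration and are not taken from the original script; only SeqIO.parse, SeqIO.write and SeqRecord slicing are real Biopython features.

```python
def trimmed(records, length):
    """Yield each record cut down to its first `length` letters."""
    for record in records:
        # Slicing a SeqRecord also slices its per-letter qualities.
        yield record[:length]


def trim_fastq(in_path, out_path, length):
    """Write trimmed copies of all reads with one SeqIO.write call.

    Hypothetical helper: paths and length are placeholders.
    """
    # Imported here so trimmed() can be tried without Biopython installed.
    from Bio import SeqIO

    records = SeqIO.parse(in_path, "fastq")
    # SeqIO.write pulls records from the generator one at a time and
    # returns the number of records written.
    return SeqIO.write(trimmed(records, length), out_path, "fastq")
```

Something like trim_fastq("reads.fastq", "reads_trimmed.fastq", 50) would then stream through the file without holding all the records in memory.
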
> More straightforwardly, you don't need to use SeqRecord
> objects for this task - they make slicing the sequence and
> quality easier, but come with a performance cost. See:
> http://news.open-bio.org/news/2009/09/biopython-fast-fastq/
>
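A minimal sketch of the tuple-based approach described in that blog post, assuming the reads only need cutting to a fixed length. FastqGeneralIterator is the real Biopython function; trim_tuples, format_fastq, trim_fastq_fast and the length parameter are hypothetical names used here for illustration.

```python
def trim_tuples(reads, length):
    """Trim (title, sequence, quality) string tuples to `length` letters."""
    for title, seq, qual in reads:
        # Plain string slicing, no SeqRecord objects are created.
        yield title, seq[:length], qual[:length]


def format_fastq(title, seq, qual):
    """Return one four-line FASTQ entry as a string."""
    return "@%s\n%s\n+\n%s\n" % (title, seq, qual)


def trim_fastq_fast(in_path, out_path, length):
    """Hypothetical helper: trim a FASTQ file using plain strings."""
    # Imported here so the helpers above work without Biopython installed.
    from Bio.SeqIO.QualityIO import FastqGeneralIterator

    with open(in_path) as in_handle, open(out_path, "w") as out_handle:
        # FastqGeneralIterator yields (title, sequence, quality) strings,
        # where the title has no leading "@".
        reads = FastqGeneralIterator(in_handle)
        for title, seq, qual in trim_tuples(reads, length):
            out_handle.write(format_fastq(title, seq, qual))
```

The quality string is copied through as text, so this only suits cases where the quality encoding does not need converting.
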
> In addition, consider doing the same for the FASTA file with:
> from Bio.SeqIO.FastaIO import SimpleFastaParser
> (requires Biopython 1.61 or later - looks like that wasn't
> highlighted in the release notes which was an oversight).
>
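The FASTA side could look roughly like this, again only as a sketch. SimpleFastaParser is the real Biopython function (1.61 or later); trim_fasta_tuples, trim_fasta_fast and the length parameter are hypothetical names, and the output here puts each sequence on a single line rather than wrapping it.

```python
def trim_fasta_tuples(entries, length):
    """Trim (title, sequence) string tuples to `length` letters."""
    for title, seq in entries:
        yield title, seq[:length]


def trim_fasta_fast(in_path, out_path, length):
    """Hypothetical helper: trim a FASTA file using plain strings."""
    # Imported here so trim_fasta_tuples() works without Biopython installed.
    from Bio.SeqIO.FastaIO import SimpleFastaParser

    with open(in_path) as in_handle, open(out_path, "w") as out_handle:
        # SimpleFastaParser yields (title, sequence) strings, where the
        # title is the header line without the leading ">".
        entries = SimpleFastaParser(in_handle)
        for title, seq in trim_fasta_tuples(entries, length):
            # Unwrapped output: one header line, one sequence line.
            out_handle.write(">%s\n%s\n" % (title, seq))
```
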
> Good night,
>
> Peter
>
>
>