[Biopython] extremely long execution time

Peter Cock p.j.a.cock at googlemail.com
Thu Jun 21 16:21:07 UTC 2012


On Thu, Jun 21, 2012 at 5:08 PM, Anita Norman <Anita.Norman at slu.se> wrote:
> Hello Biopythoners!
>
> ...
>
> Here is my code with the fastqGeneralIterator:
>
> from time import time
> from Bio.SeqIO.QualityIO import FastqGeneralIterator
>
> start = time()
>
> recids = open(recidfile, 'r')
> for item in recids: recidlist.append(item[0:-2])

You must be leaving out a line to define recidlist - but I'll assume
it was just:

recidlist = []

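Notice that the "Filtering a sequence file" example in the tutorial
(http://biopython.org/DIST/docs/tutorial/Tutorial.html) uses a set,
and says "Note that we use a Python set rather than a list, this makes
testing membership faster." Checking "x in some_list" scans the list
item by item, while a set lookup uses hashing, so with a long list of
IDs that cost is paid again for every record in the FASTQ file.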
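
To get a feel for the difference, here is a rough timing sketch using only
the standard library. The read IDs and sizes are made up for illustration,
and the numbers will vary by machine, so treat it as a demonstration of the
idea rather than a benchmark of your script:

```python
import timeit

# Hypothetical read IDs, invented purely for this illustration.
ids = ["read_%d" % i for i in range(20000)]
id_list = list(ids)
id_set = set(ids)

# Look up IDs near the end of the list, the slow case for a linear scan.
wanted = ids[-200:]


def count_hits(container):
    """Count how many wanted IDs are found in the container."""
    return sum(1 for name in wanted if name in container)


# Time the same membership tests against the list and against the set.
list_time = timeit.timeit(lambda: count_hits(id_list), number=5)
set_time = timeit.timeit(lambda: count_hits(id_set), number=5)

print("list: %.4f s, set: %.4f s" % (list_time, set_time))
```

Both containers give the same answers; only the time spent on each
"in" test differs.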

So try this:

recidset = set()
for item in recids:
    recidset.add(item[0:-2])

(and later use recidset instead of recidlist)
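
As an aside, slicing with item[0:-2] drops the last two characters of each
line, which I am guessing is there to remove a "\r\n" line ending. If the
file might have plain "\n" endings, or no newline on the last line, a
sketch along these lines may be safer. The io.StringIO object is just a
stand-in for your open(recidfile) handle, and the IDs are invented:

```python
import io

# Stand-in for open(recidfile, 'r'); the IDs below are made up.
recids = io.StringIO("read_1\r\nread_2\nread_3")

# rstrip() removes trailing newline and carriage return characters,
# whichever are present, so no fixed-length slice is needed.
recidset = set(line.rstrip() for line in recids)

# Ignore any blank lines in the ID file.
recidset.discard("")

print(sorted(recidset))
```

Note that rstrip() also removes trailing spaces and tabs, which is
usually what you want for a file of identifiers.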

Peter


