[Biopython] extremely long execution time
Peter Cock
p.j.a.cock at googlemail.com
Thu Jun 21 16:21:07 UTC 2012
On Thu, Jun 21, 2012 at 5:08 PM, Anita Norman <Anita.Norman at slu.se> wrote:
> Hello Biopythoners!
>
> ...
>
> Here is my code with the fastqGeneralIterator:
>
> from time import time
> from Bio.SeqIO.QualityIO import FastqGeneralIterator
>
> start = time()
>
> recids = open(recidfile, 'r')
> for item in recids: recidlist.append(item[0:-2])
You must be leaving out a line to define recidlist - but I'll assume
it was just:
recidlist = []
Notice that the "Filtering a sequence file" example in the tutorial
(http://biopython.org/DIST/docs/tutorial/Tutorial.html) uses a set, and says
"Note that we use a Python set rather than a list, this makes
testing membership faster."
So try this:
recidset = set()
for item in recids: recidset.add(item[0:-2])
(and later use recidset instead of recidlist)
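To illustrate why this matters: testing "x in some_list" scans the list element by element, while "x in some_set" is a hash lookup, so the cost per FASTQ record no longer grows with the number of wanted IDs. Below is a minimal sketch of the filtering step, not your actual script. The names load_ids and filter_records and the sample data are made up for illustration, the records are plain (title, seq, qual) tuples of the kind FastqGeneralIterator yields, and taking the ID to be the first word of the title line is an assumption about your data.

```python
import io


def load_ids(handle):
    """Read one record ID per line into a set, skipping blank lines."""
    # rstrip() drops "\n" or "\r\n", whichever the file uses, so it is
    # safer than the fixed slice item[0:-2].
    return {line.rstrip() for line in handle if line.strip()}


def filter_records(records, wanted):
    """Yield the (title, seq, qual) tuples whose ID is in the wanted set."""
    for title, seq, qual in records:
        # Assumption: the record ID is the first word of the title line.
        if title.split(None, 1)[0] in wanted:
            yield title, seq, qual


# Hypothetical data standing in for recidfile and the FASTQ input.
ids = load_ids(io.StringIO("read2\r\nread3\r\n"))
records = [
    ("read1 extra", "ACGT", "IIII"),
    ("read2", "GGCC", "IIII"),
    ("read3 desc", "TTAA", "IIII"),
]
kept = list(filter_records(records, ids))
```

In a real script, records would be FastqGeneralIterator(handle) on your open FASTQ file, and ids would be built from recidfile.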
Peter