[Biopython] extremely long execution time

Peter Cock p.j.a.cock at googlemail.com
Fri Jun 22 21:29:43 UTC 2012


Hi Anita,

Thank you for letting us know how you got on - I'm impressed
by just how much of a difference it made :)

Peter

On Fri, Jun 22, 2012 at 7:56 PM, Anita Norman <Anita.Norman at slu.se> wrote:
> Hi Peter,
>
> Thanks so much for your quick and helpful response. What a difference it
> makes. Now one entire file runs in less than one minute.
>
> All the best and happy midsummer!
>
> Anita
>
>
>
> On 21/06/2012 18:21, "Peter Cock" <p.j.a.cock at googlemail.com> wrote:
>
>>On Thu, Jun 21, 2012 at 5:08 PM, Anita Norman <Anita.Norman at slu.se> wrote:
>>> Hello Biopythoners!
>>>
>>> ...
>>>
>>> Here is my code with the fastqGeneralIterator:
>>>
>>> from time import time
>>> from Bio.SeqIO.QualityIO import FastqGeneralIterator
>>>
>>> start = time()
>>>
>>> recids = open(recidfile, 'r')
>>> for item in recids: recidlist.append(item[0:-2])
>>
>>You must be leaving out a line to define recidlist - but I'll assume
>>it was just:
>>
>>recidlist = []
>>
>>Notice that the "Filtering a sequence file" example in the tutorial uses a set,
>>http://biopython.org/DIST/docs/tutorial/Tutorial.html and says
>>"Note that we use a Python set rather than a list, this makes
>>testing membership faster."
>>
>>So try this:
>>
>>recidset = set()
>>for item in recids: recidset.add(item[0:-2])
>>
>>(and later use recidset instead of recidlist)
>>
>>Peter
>
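For reference, the set-based filtering suggested in the quoted reply can be sketched roughly as follows. This is a minimal Python 3 illustration, not the original poster's script: the helper names load_ids and filter_records and the in-memory sample data are hypothetical. It assumes the record ID is the first word of each FASTQ title line, and it uses strip() on the assumption that item[0:-2] was there to remove a two-character line ending. The records are plain (title, sequence, quality) tuples, which is the shape FastqGeneralIterator yields, so the block runs without Biopython installed.

```python
import io


def load_ids(handle):
    """Read one record ID per line into a set, dropping line endings and blank lines."""
    return {line.strip() for line in handle if line.strip()}


def filter_records(records, wanted_ids):
    """Yield the (title, seq, qual) tuples whose ID is in wanted_ids.

    `records` can be any iterable of (title, seq, qual) string tuples, such as
    the ones produced by Bio.SeqIO.QualityIO.FastqGeneralIterator(handle).
    """
    for title, seq, qual in records:
        # Assumption: the record ID is the first word of the title line.
        record_id = title.split(None, 1)[0]
        # Membership tests on a set take roughly constant time on average,
        # whereas a list is scanned element by element on every lookup.
        if record_id in wanted_ids:
            yield title, seq, qual


# Hypothetical in-memory data standing in for the real ID file and FASTQ file.
id_file = io.StringIO("read2\r\nread3\r\n")
records = [
    ("read1 extra", "ACGT", "IIII"),
    ("read2", "GGCC", "IIII"),
    ("read3 desc", "TTAA", "IIII"),
]

wanted = load_ids(id_file)
kept = list(filter_records(records, wanted))
print([title for title, _, _ in kept])
```

With a real file, the records list would be replaced by FastqGeneralIterator(handle) on an open FASTQ handle. The speed-up reported in this thread is consistent with the lookup cost no longer growing with the number of IDs.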
