[Biopython] fastq manipulations speed

Peter Cock p.j.a.cock at googlemail.com
Mon Mar 18 00:07:26 UTC 2013


On Sun, Mar 17, 2013 at 11:02 PM, natassa <natassa_g_2000 at yahoo.com> wrote:
> Thanks Peter,
> I am using python quite often, but I was missing the fact that I don't need
> the keys method (!), and I always used dictionaries instead of sets. I am
> not very clear though about this point, I mean, is the use of set faster or
> not in general?

For checking membership, "key in my_dict" and "key in my_set"
should take about the same time - and both will be much faster
than "key in my_list" or "key in my_tuple" when you have a lot
of things to check.
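
If you want to see the difference for yourself, here is a rough
timing sketch using only the standard library. The container
size, the probe value and the helper name time_lookup are
arbitrary choices of mine for illustration, and the absolute
numbers will vary from machine to machine:

```python
import timeit

# Build the same 50,000 made-up identifiers in three containers.
n = 50000
items = ["read_%d" % i for i in range(n)]
as_list = items
as_set = set(items)
as_dict = dict.fromkeys(items)  # values are all None

# Look up the last identifier, the worst case for a list scan.
probe = items[-1]


def time_lookup(container, repeats=200):
    """Return total seconds for `repeats` membership checks."""
    return timeit.timeit(lambda: probe in container, number=repeats)


t_list = time_lookup(as_list)
t_set = time_lookup(as_set)
t_dict = time_lookup(as_dict)

print("list: %.6f s" % t_list)
print("set:  %.6f s" % t_set)
print("dict: %.6f s" % t_dict)
```

The list check has to scan element by element, while the set
and dict checks use a hash lookup, so the gap grows with the
number of items.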

If all you want the data structure for is checking membership,
then use a set. If you need to associate a value with the key,
then use a dictionary. Because they don't store a separate
value for each key, sets can use less memory than dicts,
although the exact saving depends on the Python version.
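
As a small sketch of the two cases (the read names and lengths
here are invented for illustration, they are not taken from
your data):

```python
# Membership only: a set of the identifiers we want to keep.
wanted_ids = set(["read1", "read3"])

# Key plus associated value: a dict mapping identifier to length.
read_lengths = {"read1": 100, "read2": 76, "read3": 51}

all_ids = ["read1", "read2", "read3", "read4"]

# "in" works directly on the set, no list conversion needed.
kept = [rid for rid in all_ids if rid in wanted_ids]

# "in" also works directly on the dict, no .keys() call needed.
known = [(rid, read_lengths[rid]) for rid in all_ids if rid in read_lengths]

print(kept)
print(known)
```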

Note that sets only became a built-in type in Python 2.4,
so many books and guides written before then will often
use a dict instead.

Also note that neither sets nor dicts preserve the order of the
elements, which is sometimes an important reason to use a
list or tuple instead.

Hopefully the updated script is working better for you - I
can think of at least a few more suggestions worth trying.
However, before making it faster - is it doing what you
wanted? Are you sure you should be using length-1 when
you trim the FASTQ records?
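
To illustrate why I ask, here is what Python slicing does with
a plain string. This is only a sketch: the sequence, the quality
string and the value of length are made up, and I am assuming
that length is meant to be the number of bases to keep:

```python
# Made-up 10 base read with a matching quality string.
seq = "ACGTACGTAC"
qual = "IIIIIIIIII"

# Assumed meaning: the number of bases we want to keep.
length = 6

# Slicing with [:length] keeps exactly `length` characters.
trimmed = seq[:length]

# Slicing with [:length - 1] keeps one character fewer.
trimmed_short = seq[:length - 1]

print(len(trimmed), trimmed)
print(len(trimmed_short), trimmed_short)

# Whichever slice is used, apply the same one to the qualities
# so the sequence and quality strings stay the same length.
trimmed_qual = qual[:length]
```

If the shorter result is what you intended then length-1 is
fine, otherwise it is an off-by-one error.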

Regards,

Peter


