[Biopython-dev] [Bug 2778] Efficiency improvement in function Bio.SeqUtils.GC()

bugzilla-daemon at portal.open-bio.org
Thu Mar 5 15:56:38 UTC 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2778





------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-05 10:56 EST -------
(In reply to comment #3)
> I think that it is clearer to check that the sequence length is
> not zero rather than assuming that if the sum is zero then the
> sequence length is also zero. 

I agree, but had chosen to keep the old code.

> def GC(seq):
>     """Calculates G+C content, ..."""
>     gc = sum(map(seq.count,['G','C','g','c','S','s']))
>     if len(seq) > 0:
>         return gc*100.0/len(seq)
>     else:
>         return 0
> 

Your length test isn't very elegant. I think this is nicer and more Pythonic:

    if seq:
        gc = sum(map(seq.count,['G','C','g','c','S','s']))
        return gc*100.0/len(seq)
    else:
        return 0
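
The `if seq` form relies on Python treating empty sequences as false. A minimal standalone sketch with plain strings follows; the name `gc_truthiness` is hypothetical and used only for illustration, it is not the function in CVS:

```python
def gc_truthiness(seq):
    """Return the G+C percentage, guarding with a truthiness test (illustrative)."""
    if seq:  # an empty str (or any object whose __len__ returns 0) is false
        gc = sum(map(seq.count, ['G', 'C', 'g', 'c', 'S', 's']))
        return gc * 100.0 / len(seq)
    else:
        return 0


print(gc_truthiness("ACGT"))  # 2 of the 4 letters are G or C
print(gc_truthiness(""))      # empty input takes the else branch
```

Note that this only behaves as intended for objects that define `__len__` (or an equivalent truth test); an object without one is always true, even if it holds no letters.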

However, since the sequence will usually not be empty, this should be faster,
because the common case skips the test and only an empty sequence pays for the
exception:

    try:
        gc = sum(map(seq.count,['G','C','g','c','S','s']))
        return gc*100.0/len(seq)
    except ZeroDivisionError:
        return 0
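
The speed claim can be checked with a rough sketch like the one below, which wraps both guards as complete functions and times them with the standard `timeit` module. The names `gc_if` and `gc_try` and the test sequence are assumptions made for illustration, and no timings are quoted here because they depend on the Python version and the sequence length; the six `count` calls probably dominate either way.

```python
import timeit


def gc_if(seq):
    """Test for emptiness first, then divide (hypothetical name)."""
    if seq:
        gc = sum(map(seq.count, ['G', 'C', 'g', 'c', 'S', 's']))
        return gc * 100.0 / len(seq)
    return 0


def gc_try(seq):
    """Divide first and handle the empty case only if it occurs (hypothetical name)."""
    try:
        gc = sum(map(seq.count, ['G', 'C', 'g', 'c', 'S', 's']))
        return gc * 100.0 / len(seq)
    except ZeroDivisionError:
        # Raised only when len(seq) == 0
        return 0


if __name__ == "__main__":
    seq = "ACGTSacgts" * 100  # arbitrary non-empty test sequence
    for func in (gc_if, gc_try):
        elapsed = timeit.timeit(lambda: func(seq), number=10000)
        print("%s: %.4f s" % (func.__name__, elapsed))
```

Both functions return the same values for empty and non-empty strings, so the choice between them is one of style and speed, not behaviour.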

CVS updated.

