[Biopython] SeqIO feature.location.start and end for genes spanning origin

Richard Llewellyn llewelr at gmail.com
Thu May 8 16:02:34 UTC 2014


I was surprised to find that, for a gene spanning the origin, the SeqIO
record's feature.location.start is given as 0 and feature.location.end as
the length of the circular chromosome (see below).

I know it is difficult to deal with features spanning the origin, and
imagine that there are issues if the start location is given as greater
than the end.

I wonder if you have a suggested workaround.

Off the top of my head, I could test whether the feature.location is of
type CompoundLocation and, if so, determine whether it spans the origin
(for instance, by testing whether the end of one part is the chromosome
length and the start of another part is 0), and then take the start of the
former as the true start and the end of the latter as the true end.  Since
I am currently working with prokaryotic sequence, this would just add the
type test to each parse, a relatively small overhead.
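
A minimal sketch of that check, as I imagine it, follows.  The function name
origin_aware_bounds is my own invention, not a Biopython API, and the sketch
assumes only that a compound location exposes .parts and that every part has
.start and .end in 0-based, end-exclusive coordinates.  It is a heuristic: a
join that happens to touch both ends of the sequence without wrapping would
be misread.  The namedtuples are stand-ins for real location objects so the
example runs without a GenBank file.

```python
from collections import namedtuple


def origin_aware_bounds(location, seq_len):
    """Return (start, end) in 0-based, end-exclusive coordinates.

    If start > end in the result, the feature wraps around the origin.
    """
    # Compound locations expose .parts; otherwise use the location itself.
    parts = getattr(location, "parts", None) or [location]
    if len(parts) > 1:
        # Parts that touch the last base and the first base, respectively.
        at_end = [p for p in parts if int(p.end) == seq_len]
        at_start = [p for p in parts if int(p.start) == 0]
        if at_end and at_start:
            return (min(int(p.start) for p in at_end),
                    max(int(p.end) for p in at_start))
    return int(location.start), int(location.end)


# Stand-ins for location objects (assumption: real ones have these attributes).
Part = namedtuple("Part", "start end")
Compound = namedtuple("Compound", "parts start end")

# complement(join(490883..490885,1..879)) in 0-based coordinates
neq001 = Compound([Part(0, 879), Part(490882, 490885)], 0, 490885)
print(origin_aware_bounds(neq001, 490885))  # (490882, 879)
```

With a parsed record, the call would presumably be
origin_aware_bounds(feature.location, len(record.seq)).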

Thanks for the great work.

#####################################


I ran into this problem with Nanoarchaeum equitans Kin4-M,
http://www.ncbi.nlm.nih.gov/nuccore/38349555,

where parsing the first CDS, location.start is 0 and location.end is
490885.


FEATURES             Location/Qualifiers
     source          1..490885
                     /organism="Nanoarchaeum equitans Kin4-M"
                     /mol_type="genomic DNA"

...


     gene            complement(join(490883..490885,1..879))
                     /locus_tag="NEQ001"
                     /db_xref="GeneID:2732620"
     CDS             complement(join(490883..490885,1..879))


