[Biopython] Handling records referencing other records

Peter Cock p.j.a.cock at googlemail.com
Fri Sep 18 16:16:11 UTC 2015


On Fri, Sep 18, 2015 at 5:01 PM, Athey, John * <John.Athey at fda.hhs.gov> wrote:
> Ivan, Peter,
>
> I'll look into excepting it with a try/except. Thanks.
>
>
> Peter,
>
> I don't see what you're seeing re: the different tags on the CDS features. I see:
> CDS             <1..>587  (DQ100169)
> CDS             <1..>3167 (DQ100170)
>
> The only time I see the order tag is under the gene feature:
> gene            order(<1..587,DQ100170.1:1..>3167)
> gene            order(DQ100169.1:<1..587,1..>3167)

I appear to have looked at the wrong lines. I'm a little confused now.

However, it does actually matter how you download the files. In this
case, whether I use Entrez or the "download" option in the web
interface, I get a different location string for "genbank" than for
"gbwithparts" (aka "GenBank (full)"). I wonder if this is a variant of
this old bug:

http://blastedbio.blogspot.co.uk/2012/03/missing-external-exons-in-genbank-with.html
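
One way to compare the two flavours side by side is to request the same
accession with both return types and diff the feature tables. The sketch
below only builds the NCBI E-utilities efetch URLs and does not fetch
anything. It assumes "gb" and "gbwithparts" are the rettype values that
correspond to the two web interface options, and uses DQ100169 from the
quoted message as the example accession; efetch_url is a made-up helper
name.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype):
    """Build an efetch URL for one nucleotide record (hypothetical helper)."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": rettype,  # assumed: "gb" or "gbwithparts"
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


# One URL per flavour, so the two downloads can be saved and diffed.
urls = {rt: efetch_url("DQ100169", rt) for rt in ("gb", "gbwithparts")}
for rettype, url in urls.items():
    print(rettype, url)
```

Saving both responses and diffing the location lines of the gene and CDS
features should show whether the two flavours really disagree.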

> Is the usage of the order tag actually much different from join? I've
> never actually seen the order tag before today, so I have no idea if
> it's commonly used or what the functional distinctions are.

Surprisingly, "order" is like "join", except that it only says the pieces
occur in the given order; nothing is implied about whether joining them
is reasonable. If used for a CDS, that makes me very distrustful of the
annotation.

See http://www.insdc.org/files/feature_table.html
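
To tell the two operators apart in raw location strings, and to spot the
parts that point at another record, something like the following might
do. It is a minimal sketch working on plain strings, not on Biopython
objects; the function names and regular expressions are my own
assumptions and only cover simple cases like the gene locations quoted
above.

```python
import re


def location_operator(location):
    """Return "order" or "join" for a compound location, else None."""
    # Allow an optional complement( wrapper in front of the operator.
    match = re.match(r"^(?:complement\()?(order|join)\(", location.strip())
    return match.group(1) if match else None


def external_accessions(location):
    """List accessions referenced as ACCESSION.VERSION:start..end parts."""
    # Assumed accession shape: letters/underscores, digits, optional version.
    return re.findall(r"([A-Za-z_]+\d+(?:\.\d+)?):", location)


for loc in ("<1..>587",
            "order(<1..587,DQ100170.1:1..>3167)",
            "order(DQ100169.1:<1..587,1..>3167)"):
    print(loc, location_operator(loc), external_accessions(loc))
```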

> For this particular project, I am looking specifically at the nucleotide
> sequence, so I can't just rely on the translation. Excluding these
> genes is problematic only in that it affects the standardization of
> where my data is coming from, and excluding all the partial genes
> is too great a loss of data, but if it can't be helped then it can't be
> helped. Thanks for your suggestions.

I would probably exclude any feature using location type "order".
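
A rough sketch of that filter follows. It relies on Biopython's
CompoundLocation carrying an "operator" attribute ("join" or "order"),
and assumes that simple one-span locations have no such attribute, hence
the getattr default. The stand-in objects are only there so the example
runs without Biopython installed; in real use you would pass
record.features from a parsed SeqRecord. The function names are
hypothetical.

```python
from types import SimpleNamespace


def uses_order(feature):
    """True if the feature's location is a compound "order" location."""
    # Simple locations are assumed to lack an operator attribute.
    return getattr(feature.location, "operator", None) == "order"


def drop_order_features(features):
    """Return the features whose location does not use "order"."""
    return [f for f in features if not uses_order(f)]


# Stand-ins mimicking SeqFeature objects (type plus location).
demo = [
    SimpleNamespace(type="CDS", location=SimpleNamespace()),
    SimpleNamespace(type="gene", location=SimpleNamespace(operator="order")),
    SimpleNamespace(type="CDS", location=SimpleNamespace(operator="join")),
]

kept = drop_order_features(demo)
print([f.type for f in kept])
```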

Peter

