[Biopython] Handling records referencing other records

Athey, John (John.Athey at fda.hhs.gov)
Fri Sep 18 14:30:05 UTC 2015


Hello all,

I'm looking for advice on how to handle GenBank records whose feature locations reference other records. My program iterates through large GenBank-formatted files with SeqIO.parse and extracts each CDS for subsequent analysis using feat.extract(). However, when it hits a record where a feature location references another record, the extraction fails only some of the time. For example, http://www.ncbi.nlm.nih.gov/nuccore/DQ100169 seems to be handled correctly, while http://www.ncbi.nlm.nih.gov/nuccore/DQ100170 raises "ValueError: Feature references another sequence." Curiously, in both cases the CDS feature itself doesn't specify another record; only the parent gene feature does.

My questions about this are:

1) Why does the extraction fail on some records but not on all of them?

2) Is there a way to extract the data I'm looking for without causing this error?

3) If the answer to (2) is no, is there some other way to check whether the sequence will cause this error, skip extracting that sequence, and exclude that record from the analysis?
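
To make question (3) concrete, the kind of check being asked about could be sketched roughly as follows. This is only an illustration under stated assumptions, not a confirmed Biopython recipe: the helper names references_other_record and iter_extractable_cds are hypothetical, the pre-check assumes each part of a feature location exposes a .ref attribute that is None for local coordinates, and the fallback assumes extract() raises the ValueError quoted above.

```python
def references_other_record(feature):
    """Return True if any part of the feature's location names another record.

    Assumption: each location part has a .ref attribute that is None when
    the coordinates refer to the record the feature belongs to.
    """
    # A missing location or a location without .parts is treated as "no
    # reference found"; the try/except in the caller handles those cases.
    parts = getattr(feature.location, "parts", ())
    return any(getattr(part, "ref", None) is not None for part in parts)


def iter_extractable_cds(records):
    """Yield (record, feature, sequence) for each CDS that can be extracted.

    `records` is any iterable of SeqRecord-like objects, for example the
    iterator returned by SeqIO.parse(handle, "genbank").
    """
    for record in records:
        for feature in record.features:
            if feature.type != "CDS":
                continue
            if references_other_record(feature):
                # Pre-check: skip features that point at another record.
                continue
            try:
                sequence = feature.extract(record.seq)
            except ValueError:
                # Safety net for anything the pre-check did not catch.
                continue
            yield record, feature, sequence


# Hypothetical usage ("example.gb" is a placeholder file name):
#
#     from Bio import SeqIO
#     for record, feature, sequence in iter_extractable_cds(
#             SeqIO.parse("example.gb", "genbank")):
#         print(record.id, len(sequence))
```

The pre-check and the try/except overlap on purpose: the first avoids raising an exception for the obvious cases, and the second keeps the loop running if a location is malformed in some other way.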

Thanks for any help you can provide!