[Biopython] Handling records referencing other records

Athey, John * John.Athey at fda.hhs.gov
Fri Sep 18 16:01:13 UTC 2015


Ivan, Peter,

I'll look into catching the error with a try/except block. Thanks.
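
A minimal sketch of wrapping the extraction in try/except, assuming the failure surfaces as the ValueError quoted in the original message. The names iter_cds_sequences and main are hypothetical, and catching ValueError this broadly would also hide unrelated ValueErrors raised during extraction.

```python
def iter_cds_sequences(records):
    """Yield (record_id, cds_sequence) pairs, skipping CDS features that fail to extract."""
    for record in records:
        for feat in record.features:
            if feat.type != "CDS":
                continue
            try:
                cds_seq = feat.extract(record.seq)
            except ValueError as err:
                # e.g. "Feature references another sequence."
                print(f"Skipping CDS in {record.id}: {err}")
                continue
            yield record.id, cds_seq


def main(path):
    # Imported here so the helper above does not itself depend on Biopython.
    from Bio import SeqIO

    for record_id, cds_seq in iter_cds_sequences(SeqIO.parse(path, "genbank")):
        print(record_id, len(cds_seq))
```

Calling main with the path to a GenBank file would print the ID and length of each CDS that could be extracted, plus one line for each skipped feature.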


Peter,

I don't see what you're seeing re: the different tags on the CDS features. I see:
CDS             <1..>587  (DQ100169)
CDS             <1..>3167 (DQ100170) 

The only time I see the order tag is under the gene feature:
gene            order(<1..587,DQ100170.1:1..>3167)
gene            order(DQ100169.1:<1..587,1..>3167)

Is the order tag actually much different from join? I had never seen the order tag before today, so I have no idea whether it's commonly used or what the functional distinctions are.

For this particular project, I am looking specifically at the nucleotide sequence, so I can't just rely on the translation. Excluding these genes is problematic only in that it affects how consistently my data is sourced, and excluding all the partial genes would be too great a loss of data. But if it can't be helped, then it can't be helped. Thanks for your suggestions.
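
For the fallback of excluding such features up front (question 3 in the original message below), one possible sketch follows. It assumes that Biopython location objects expose a `.parts` list and that each part carries a `.ref` attribute, which is None when the part lies on the current record; that should be checked against the installed Biopython version. The helper names are hypothetical.

```python
def references_other_record(feature):
    """Return True if any part of the feature's location names another record."""
    location = feature.location
    # Assumption: simple and compound locations both expose `.parts`, and
    # each part carries `.ref` (None when the part lies on this record).
    parts = getattr(location, "parts", [location])
    return any(getattr(part, "ref", None) is not None for part in parts)


def local_cds_features(record):
    """Return the CDS features of a record whose locations are fully local."""
    return [
        feat
        for feat in record.features
        if feat.type == "CDS" and not references_other_record(feat)
    ]
```

This only filters on the location of the CDS feature itself, so it would not explain a failure on a record whose CDS location is entirely local.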

-----Original Message-----
From: Ivan Gregoretti [mailto:ivangreg at gmail.com] 
Sent: Friday, September 18, 2015 11:27 AM
To: Athey, John *
Cc: biopython at mailman.open-bio.org
Subject: Re: [Biopython] Handling records referencing other records

Hi John.

Here Python itself is designed to help you.
Take a look at the try...except statement:

https://docs.python.org/2/tutorial/errors.html

Cheers,

Ivan




Ivan Gregoretti, PhD
Bioinformatics



On Fri, Sep 18, 2015 at 10:30 AM, Athey, John * <John.Athey at fda.hhs.gov> wrote:
> Hello all,
>
>
>
> I’m looking for advice on how to handle Genbank records that reference 
> other records as part of their location. My program iterates through 
> large Genbank-formatted files with SeqIO.parse and extracts the CDS 
> for subsequent analysis, using feat.extract(). However, upon hitting a 
> record where the feature location references another record, it 
> SOMETIMES fails. For example,
> http://www.ncbi.nlm.nih.gov/nuccore/DQ100169 seems to be handled 
> correctly, while http://www.ncbi.nlm.nih.gov/nuccore/DQ100170 gives a “ValueError:
> Feature references another sequence.” Curiously, in both cases the CDS 
> feature itself doesn’t specify another record, only the parent gene does.
>
>
>
> My questions about this are:
>
> 1)      Why does the extraction fail on some records but not on all of them?
>
> 2)      Is there a way to extract the data I’m looking for without causing
> this error?
>
> 3)      If the answer to (2) is no, is there some other way to check whether
> the sequence will cause this error, skip extracting that sequence, and 
> exclude that record from the analysis?
>
>
>
> Thanks for any help you can provide!
>
>


