[Biopython] EMBL DNA records with locations referencing other sequences

Mon Sep 30 12:21:06 UTC 2019

Peter writes:

> You have found the relevant issue,
> https://github.com/biopython/biopython/issues/808 - which has the
> outlines of a strategy. I would first download the referenced
> sequences, and store them in a dictionary keyed by the accession.
> Then, you need a new "extract" function (which might ultimately
> replace the current extract method) taking as arguments the complex
> location, main sequence, and this dictionary of external sequences.
> This code needs to special case the extract for any FeatureLocation
> with .ref set, see e.g.
>
> https://github.com/biopython/biopython/issues/808#issuecomment-209364333
>
> Does this all look too horrible/challenging?

Makes sense to me - I was just hoping somebody already had a working
solution :-)
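
For the archive, here is a minimal sketch of the strategy as I read it. It is
not Biopython code. Part and Compound are hypothetical stand-ins that only
mirror the attributes I would expect to use on Biopython's location objects
(.parts, .start, .end, .strand, .ref), and extract_with_refs is a made-up
name. It assumes the external sequences have already been downloaded into a
dictionary keyed by exactly the string found in .ref, and that the parts are
listed in the order they should be concatenated.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the location attributes used below;
# they are not Biopython classes.
Part = namedtuple("Part", "start end strand ref")
Compound = namedtuple("Compound", "parts")

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a plain DNA string (unambiguous bases only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_with_refs(location, main_seq, external):
    """Extract a (possibly compound) location, following external references.

    location -- object with .parts, or a single part-like object
    main_seq -- the sequence of the record the feature belongs to
    external -- dict mapping accession (as found in .ref) to its sequence
    """
    parts = getattr(location, "parts", [location])
    pieces = []
    for part in parts:
        ref = getattr(part, "ref", None)
        if ref is None:
            source = main_seq
        elif ref in external:
            source = external[ref]
        else:
            raise KeyError("no sequence supplied for external reference %r" % ref)
        piece = str(source[int(part.start):int(part.end)])
        if part.strand == -1:
            piece = reverse_complement(piece)
        pieces.append(piece)
    return "".join(pieces)


main = "AAACCCGGGTTT"
others = {"X00001.1": "TTTTGGGGCCCC"}  # made-up accession and sequence
loc = Compound([
    Part(0, 3, 1, None),         # local part, forward strand
    Part(4, 8, 1, "X00001.1"),   # part located on another record
    Part(9, 12, -1, None),       # local part, reverse strand
])
print(extract_with_refs(loc, main, others))  # AAAGGGGAAA
```

The missing-accession case raises an error rather than silently skipping the
part, since a partial sequence would translate to the wrong protein.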

> I've not had to deal with one of these sequences (or indeed seen one)
> in a long time.

(Yeah, first time I have encountered one, but apparently the format has
allowed it since at least 2012.)

> It may be easier to use the feature's accession to look up the gene or
> protein sequence directly online - rather than trying to reconstruct
> it yourself?

Yeah, well, the code I was looking at is trying to make sure that the
DNA for a protein record is available, so from the protein record it
finds the relevant DNA record+feature and then translates to compare to
the original protein record, to make sure the DNA record matches.

It turned out that the one code path I was chasing could probably just
use the "/translation" qualifier on the DNA record.
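
Roughly like this, as a sketch only: matches_translation is a made-up helper,
and the feature below is a stand-in object carrying a .qualifiers dictionary
whose values are lists of strings, which is the shape I assume the parsed
feature has. Trailing stop symbols are ignored on both sides.

```python
from types import SimpleNamespace


def matches_translation(feature, protein_seq):
    """Compare a feature's /translation qualifier with a protein sequence.

    Returns True or False, or None when the feature has no /translation
    qualifier and the check cannot be made this way.
    """
    values = feature.qualifiers.get("translation", [])
    if not values:
        return None
    return str(values[0]).rstrip("*") == str(protein_seq).rstrip("*")


# Stand-in feature with made-up data, for illustration only.
cds = SimpleNamespace(qualifiers={"translation": ["MKTAYIAK"]})
print(matches_translation(cds, "MKTAYIAK"))
```

When the qualifier is absent, the check has to fall back to extracting and
translating the DNA itself.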

But for other uses, I think I need the "full solution" you outlined
above.

If only days were 28 hours long...

  Thanks!

   Adam

-- 
 "Clear? Huh! Why a four-year-old child could                    Adam Sjøgren
  understand this report. Run out and find me a four-year-  asjo at koldfront.dk
  old child. I can't make head or tail out of it."