[Biopython] EMBL DNA records with locations referencing other sequences
Adam Sjøgren
asjo at koldfront.dk
Mon Sep 30 12:21:06 UTC 2019
Peter writes:
> You have found the relevant issue,
> https://github.com/biopython/biopython/issues/808 - which has the
> outlines of a strategy. I would first download the referenced
> sequences, and store them in a dictionary keyed by the accession.
> Then, you need a new "extract" function (which might ultimately
> replace the current extract method) taking as arguments the complex
> location, main sequence, and this dictionary of external sequences.
> This code needs to special case the extract for any FeatureLocation
> with .ref set, see e.g.
>
> https://github.com/biopython/biopython/issues/808#issuecomment-209364333
>
> Does this all look too horrible/challenging?
Makes sense to me - I was just hoping somebody had already got a working
solution :-)
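A minimal sketch of the "new extract function" from the quoted reply, assuming Biopython-style locations: `location.parts` lists the parts (true for both `FeatureLocation` and `CompoundLocation`), and each part has `start`, `end`, `strand` and `ref`, where `ref` holds the other record's accession or is None for a local part. The name `extract_with_refs` is hypothetical, not a Biopython API. The `Part`/`Compound` stand-ins, the accession and the sequences are made up so the sketch runs without Biopython installed. Sequences are handled as plain strings.

```python
from collections import namedtuple

# Complement table for DNA, including IUPAC ambiguity codes (S, W map to themselves).
_COMPLEMENT = str.maketrans("ACGTRYKMBVDHNacgtrykmbvdhn",
                            "TGCAYRMKVBHDNtgcayrmkvbhdn")


def reverse_complement(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_with_refs(location, main_seq, references):
    """Hypothetical helper: join the sequence of every part of `location`.

    Parts with `ref` set are cut from `references[ref]` (the downloaded
    sequences, keyed by accession); all other parts come from `main_seq`.
    Parts are taken in the order given, each on its own strand.
    """
    pieces = []
    for part in location.parts:
        if part.ref:
            try:
                source = references[part.ref]
            except KeyError:
                raise ValueError(f"no sequence downloaded for {part.ref!r}") from None
        else:
            source = main_seq
        # int() also accepts Biopython's fuzzy positions, which subclass int.
        piece = str(source)[int(part.start):int(part.end)]
        if part.strand == -1:
            piece = reverse_complement(piece)
        pieces.append(piece)
    return "".join(pieces)


# Stand-ins mimicking the attributes of Biopython location objects.
Part = namedtuple("Part", "start end strand ref")
Compound = namedtuple("Compound", "parts")

record_seq = "AAAACCCCGGGG"
refs = {"X00001.1": "TTTTGGGGAAAA"}  # made-up accession and sequence
loc = Compound([Part(0, 4, 1, None), Part(4, 8, 1, "X00001.1")])
print(extract_with_refs(loc, record_seq, refs))  # AAAAGGGG
```

With real records you would pass `feature.location`, the parsed record's `.seq`, and a dict built from the separately downloaded referenced records.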
> I've not had to deal with one of these sequences (or indeed seen one)
> in a long time.
(Yeah, first time I have encountered one, but apparently the format has
allowed it since at least 2012.)
> It may be easier to use the feature's accession to look up the gene or
> protein sequence directly online - rather than trying to reconstruct
> it yourself?
Yeah, well, the code I was looking at is trying to make sure that the
DNA for a protein record is available, so starting from the protein
record it finds the relevant DNA record and feature, translates the
feature, and compares the result with the original protein record to
make sure the DNA record matches.
It turned out that the one code path I was chasing could probably just
use the "/translation" qualifier on the DNA record.
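A minimal sketch of that check, assuming qualifiers are stored as a dict of lists of strings (as in Biopython's `SeqFeature.qualifiers`): prefer the `/translation` qualifier, and only translate the CDS when it is missing. The helper names are hypothetical. With Biopython the CDS would typically come from `feature.extract(record.seq)` and be translated with `Seq.translate(cds=True)`. A hand-rolled standard genetic code is used here only to keep the sketch dependency-free, and it ignores `/transl_table` and alternative start codons.

```python
# Standard genetic code; codons enumerated in T, C, A, G order (TTT, TTC, TTA, ...).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(cds):
    """Translate a CDS with the standard code, dropping one final stop codon."""
    cds = cds.upper()
    protein = "".join(CODON_TABLE.get(cds[i:i + 3], "X")  # X for unknown codons
                      for i in range(0, len(cds) - len(cds) % 3, 3))
    return protein[:-1] if protein.endswith("*") else protein


def dna_matches_protein(qualifiers, cds_seq, protein_seq):
    """Hypothetical check: does a DNA CDS feature encode `protein_seq`?"""
    if "translation" in qualifiers:
        predicted = qualifiers["translation"][0]  # trust the record's own /translation
    else:
        predicted = translate(str(cds_seq))
    return predicted == str(protein_seq)


cds = "ATGGCTTGGTAA"  # made-up CDS: Met Ala Trp stop
print(translate(cds))                                          # MAW
print(dna_matches_protein({}, cds, "MAW"))                     # True
print(dna_matches_protein({"translation": ["MAX"]}, cds, "MAW"))  # False
```

Using the qualifier avoids reconstructing the CDS at all, which is why it sidesteps the remote-location problem for this code path.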
But for other uses, I think I need the "full solution" you outlined
above.
If only days were 28 hours long...
Thanks!
Adam
--
"Clear? Huh! Why a four-year-old child could understand this report.
Run out and find me a four-year-old child. I can't make head or tail
out of it."
Adam Sjøgren
asjo at koldfront.dk