[Biopython] iterating over FeatureLocation

Peter Cock p.j.a.cock at googlemail.com
Tue Jan 21 16:52:35 UTC 2014


On Tue, Jan 21, 2014 at 4:39 PM, Michael Thon <mike.thon at gmail.com> wrote:

> Here’s another question.  I have this GenBank formatted feature:
>
>      CDS             order(complement(3448..3635),complement(2617..3256))
>                      /Source="maker"
>                      /codon_start=1
>                      /ID="CFIO01_05457-RA:cds"
>                      /label="CDS"
>
> When I extract the sequence I get this:
>
> (Pdb) str(feat.extract(seq).seq)
> ...
>
> This is supposed to be a CDS which can be translated to a protein coding
> sequence starting with M and ending with a stop codon.  The above sequence
> isn’t correct - the exons are in the wrong order.  When I reverse the order
> of the exons I get the correct order and get a CDS sequence that can be
> translated:
>
> (Pdb) feat.location.parts.reverse()
> (Pdb) str(feat.extract(seq).seq)
> ...
> (Pdb) str(feat.extract(seq).seq.translate())
>
> 'MSHEHSHDGPHGHAHSHEGGFNAQEHGHSHEILDGPGSYLGREMPIVEGRNWSDRAFTIGIGGPVGSGKTALMLALCLALREKYSIAAVTNDIFTREDAEFLTRHKALPAPRIRAIETGGCPHAAVREDISANLAALEDLHREFDADLLLIESGGDNLAANYSRELADYIIYVIDVSGGDKIPRKGGPGITQSDLLVVNKTDLAEIVGADLGVMERDARKMREGGPTVFAQVKKNVAVDHIVNLMLSAWKASGAEENRRAAGGPRPTEGLDSLKA*'
>
> So my question is, is there something wrong with the file I’m parsing?
>

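Possibly. The 'order' operator only says the parts are found in that order
along the sequence; it does not imply that they should be joined into one
continuous sequence. If the parts are meant to be joined, it should be 'join'
instead: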

join(complement(3448..3635),complement(2617..3256))
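
To make the expected result concrete, here is a minimal pure-Python sketch of
what extracting such a location means: each part is sliced from the parent
sequence, reverse-complemented if it is wrapped in complement(...), and the
pieces are concatenated in the order they are listed. It does not use
Biopython, the helper names (parse_location, extract, reverse_complement) are
hypothetical, and it only handles simple start..end parts, so treat it as an
illustration rather than a description of what feat.extract() does internally.

```python
import re

# Hypothetical helpers for illustration only; these are not Biopython functions.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
_PART = re.compile(r"(complement\()?(\d+)\.\.(\d+)(\))?")


def reverse_complement(dna):
    """Reverse-complement a plain DNA string (unambiguous bases only)."""
    return dna.translate(_COMPLEMENT)[::-1]


def parse_location(text):
    """Parse 'join(...)' or 'order(...)' into (operator, parts).

    Each part is (start, end, strand) using the 1-based inclusive
    coordinates written in the GenBank file; strand is +1 or -1.
    """
    match = re.fullmatch(r"(join|order)\((.*)\)", text.strip())
    if match is None:
        raise ValueError("unsupported location: %r" % text)
    operator, body = match.groups()
    parts = []
    for piece in body.split(","):
        part = _PART.fullmatch(piece.strip())
        # Reject pieces with an unbalanced complement( ... ) wrapper.
        if part is None or bool(part.group(1)) != bool(part.group(4)):
            raise ValueError("unsupported part: %r" % piece)
        strand = -1 if part.group(1) else 1
        parts.append((int(part.group(2)), int(part.group(3)), strand))
    return operator, parts


def extract(dna, parts):
    """Concatenate the parts in the order they are listed."""
    chunks = []
    for start, end, strand in parts:
        chunk = dna[start - 1:end]  # 1-based inclusive -> Python slice
        chunks.append(reverse_complement(chunk) if strand == -1 else chunk)
    return "".join(chunks)


# Toy reverse-strand example: two 3 bp "exons" on a 12 bp sequence.
toy = "TTACCCAAACAT"
operator, parts = parse_location("join(complement(10..12),complement(1..3))")
print(operator, extract(toy, parts))  # join ATGTAA
```

The operator is returned separately so that calling code can decide how to
treat an 'order' location, for example by warning before concatenating.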

What's the accession/URL for the full file this example came from?

Peter



