[Biopython] iterating over FeatureLocation

Peter Cock p.j.a.cock at googlemail.com
Mon Jan 13 16:18:01 UTC 2014


On Mon, Jan 13, 2014 at 4:07 PM, Michael Thon <mike.thon at gmail.com> wrote:
> Here are two examples from the GenBank format file (not from GenBank though)
>
>
>      CDS             order(6621..6658,6739..6985)
>                      /Source="maker"
>                      /codon_start=1
>                      /ID="CFIO01_14847-RA:cds"
>                      /label="CDS"
>
>      CDS             419..2374
>                      /Source="maker"
>                      /codon_start=1
>                      /ID="CFIO01_05899-RA:cds"
>                      /label="CDS"
>
> If the feature is a simple feature, then I just need to access its start and end.
> If it's a compound feature, then I need to iterate over each segment, accessing the start and end.
>
> What I am doing at the moment is this:
>
> if feat._sub_features:
>         for sf in feat.sub_features:
>                 start = sf.location.start
> else:
>         start = feat.location.start
>
> it works, I think.  Is there a better way?

Don't do that :) Python variables/methods/etc starting with a single
underscore are by convention private and should not generally be
used. In this case, ._sub_features is an internal detail that provides
behind-the-scenes backwards compatibility for the now deprecated
property .sub_features (don't use that either).

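Instead, use the location object itself directly. For a multi-part
feature it is now a CompoundLocation object, which holds the
sub-location information. See the .parts attribute, which gives a list
of simple locations. A simple FeatureLocation also has .parts (a list
containing just itself), so the same loop handles both cases.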

e.g.

for part in feat.location.parts:
    start = part.start
    ...
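
As a rough sketch of the loop above, here are the two example features
built by hand rather than parsed from a file. The coordinates are my
conversion of the GenBank positions to Biopython's 0-based,
end-exclusive convention, and `segment_bounds` is just an illustrative
helper name, not part of Biopython:

```python
from Bio.SeqFeature import SeqFeature, FeatureLocation, CompoundLocation

# GenBank "419..2374" is 1-based and inclusive; Biopython stores it as
# start=418, end=2374 (0-based, end-exclusive).
simple = SeqFeature(FeatureLocation(418, 2374, strand=+1), type="CDS")

# GenBank "order(6621..6658,6739..6985)" as a two-part CompoundLocation.
compound = SeqFeature(
    CompoundLocation(
        [
            FeatureLocation(6620, 6658, strand=+1),
            FeatureLocation(6738, 6985, strand=+1),
        ],
        operator="order",
    ),
    type="CDS",
)


def segment_bounds(feat):
    """Return (start, end) for every part of a feature's location.

    Hypothetical helper: works for simple and compound locations alike,
    because both expose .parts.
    """
    return [(int(part.start), int(part.end)) for part in feat.location.parts]


simple_bounds = segment_bounds(simple)
compound_bounds = segment_bounds(compound)
print(simple_bounds)
print(compound_bounds)
```

No special-casing of simple versus compound features is needed, which
is the main point of using .parts here.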

>
> Also, is there an easy way to get the sequence represented by the seqfeature,
> if it is made up of CompoundLocations?  These features are CDSs where each
> sub-feature is an exon.  I need to splice them all together and get the translation.
>

Yes, where `feat` is a SeqFeature object, use `feat.extract(the_parent_sequence)`
to get the spliced sequence, which you can then translate. See the section
"Sequence described by a feature or location" in the Tutorial:

http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
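
A minimal sketch of the extract-then-translate step, using a made-up
24 bp parent sequence with two "exons" (this toy data is purely an
assumption for illustration, it is not from your file, and it uses a
join rather than an order location):

```python
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation, CompoundLocation

# Toy parent sequence (hypothetical):
#   CCC | ATGGCC | TTTTTT | AAATAG | CCC
#         exon 1   intron   exon 2
parent = Seq("CCCATGGCCTTTTTTAAATAGCCC")

# Two-exon CDS on the forward strand, 0-based end-exclusive coordinates.
cds = SeqFeature(
    CompoundLocation(
        [
            FeatureLocation(3, 9, strand=+1),
            FeatureLocation(15, 21, strand=+1),
        ]
    ),
    type="CDS",
)

# extract() pulls out each part from the parent and splices them together.
spliced = cds.extract(parent)

# Translate the spliced CDS, stopping at the first stop codon.
protein = spliced.translate(to_stop=True)

print(spliced)
print(protein)
```

With a real record you would pass the parent record's sequence (for
example `record.seq`) instead of the toy `parent` used here.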

On reflection, the Tutorial could do with a bit more detail on how to use
a CompoundLocation, but I did try to cover this in the docstrings.

Regards,

Peter



