[Biopython] instances

Peter Cock p.j.a.cock at googlemail.com
Tue Jun 30 07:32:47 UTC 2009


On Mon, Jun 29, 2009 at 3:55 PM, Liam Thompson<dejmail at gmail.com> wrote:
> Hi Peter
>
> Thanks for the reply. I certainly didn't write my own parser, I just
> made use of the Genbank one in biopython (I'm using 1.49) and I
> started with the Genbank parser as it was one of the example Brad
> posted some years ago, so I just adapted it (some things didn't work,
> but some tweaking and it worked fine).

OK

> I have referred to the examples on the tutorial cookbook, it has been
> very helpful as well, but I am very new to this so am still trying to
> figure where and why everything goes. Would you suggest I recode the
> py file to take advantage of SeqIO (I'm sure it wouldn't be that
> difficult) ? I would be most willing if it would help with this
> problem.

It sounds like you are using Bio.GenBank to get SeqRecord objects
(containing SeqFeature objects with FeatureLocation objects etc).
If you used Bio.SeqIO instead (with the format="genbank"), you
would get exactly the same objects - but via the standardised API.
i.e. It won't actually make any real difference to you.
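
As a rough illustration of the Bio.SeqIO route, here is a minimal sketch.
It assumes a reasonably recent Biopython (newer than the 1.49 discussed in
this thread); the toy record, its "DEMO0001" identifier and the feature
coordinates are invented for the example. The record is written to an
in-memory GenBank file first so that nothing has to exist on disk.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Build a toy record (hypothetical data) and serialise it as GenBank text.
toy = SeqRecord(Seq("ATG" * 20), id="DEMO0001", name="DEMO0001",
                description="toy record")
toy.annotations["molecule_type"] = "DNA"  # recent GenBank writers expect this
toy.features.append(SeqFeature(FeatureLocation(9, 48), type="CDS"))

handle = StringIO()
SeqIO.write(toy, handle, "genbank")
handle.seek(0)

# Parsing with format="genbank" gives back a SeqRecord whose features are
# SeqFeature objects, each carrying a location object.
parsed = SeqIO.read(handle, "genbank")
first_feature = parsed.features[0]
print(type(parsed).__name__, first_feature.type, first_feature.location)
```

With a real file you would pass its filename (or an open handle) to
SeqIO.read or SeqIO.parse instead of the StringIO handle used here.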

Right now, I would only recommend using Bio.GenBank if you
don't want SeqRecord objects, but instead the Bio.GenBank.Record
objects which are a simpler representation of the raw file. This
won't parse the feature locations for example.

> I tried your suggestion and got the following error
>
> Traceback (most recent call last):
>  File "/media/RESCUE/HBx_Bioinformatics/reannotate.py", line 166, in <module>
>    corestart = corecur_seq[0].position
>  File "/var/lib/python-support/python2.6/Bio/SeqFeature.py", line
> 265, in __getattr__
>    raise AttributeError("Cannot evaluate attribute %s." % attr)
> AttributeError: Cannot evaluate attribute position.
>
> So I guess it doesn't have that position option, pressing tab gives me
> __doc__, __getattr__, __init__, __module__, __repr__, _str__, _start,
> _end

From the information above, I'm not 100% sure which object you are
looking at. There is a hierarchy (which I hope the latest version of the
tutorial explains quite well):

* One GenBank record becomes a SeqRecord
* Each GenBank feature table entry becomes a SeqFeature
(accessed from the parent SeqRecord via the "features" list).
* Each SeqFeature has a FeatureLocation object to say where it is on
the parent SeqRecord (accessed as the "location" property).
* Each FeatureLocation has start and end positions.

Once you have found the relevant FeatureLocation object, the "start"
and "end" properties give you a complex object representing the
position (which may be a fuzzy location). You can get the position
as a simple integer from this Position object. However, the simplest
route is to use the nofuzzy_start and nofuzzy_end properties, which
just give an integer. In older versions of Biopython these rather
important properties don't actually show up via dir (and thus the tab
autocomplete). They were at least documented. This has been
fixed since Biopython 1.49 (probably in 1.51, but I'd have to
double check).
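
The hierarchy above can be walked roughly as follows. This is only a
sketch under assumptions: the record, its "DEMO0002" identifier and the
coordinates are made up, and it targets a recent Biopython rather than
1.49. It uses int() on the start and end positions, which as far as I
know works in current releases; the nofuzzy_start and nofuzzy_end
properties mentioned above belong to the older API and may not be
available in newer versions.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import BeforePosition, FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# One record (hypothetical data) with two features in its "features" list.
record = SeqRecord(Seq("ACGT" * 25), id="DEMO0002")
record.features.append(SeqFeature(FeatureLocation(20, 80), type="gene"))
# A fuzzy start ("<6" in GenBank notation) to show it still maps to an integer.
record.features.append(
    SeqFeature(FeatureLocation(BeforePosition(5), 30), type="misc_feature")
)

# SeqRecord -> SeqFeature -> FeatureLocation -> start/end positions.
feature = record.features[0]
location = feature.location
start = int(location.start)
end = int(location.end)
length = end - start

# The fuzzy position prints with its "<" marker but converts to a plain int.
fuzzy_start = int(record.features[1].location.start)

print(start, end, length, fuzzy_start)
```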

I had been thinking that corecur_seq[0] in your code was a position
object. Clearly from the error this was not the case, but as I said, it
was difficult to be sure without seeing more of your code.

I now guess that you are looking at a FeatureLocation object. So,
try corecur_seq[0].nofuzzy_start and corecur_seq[0].nofuzzy_end
to get simple integers.

Peter
