[Biopython] instances

Peter Cock p.j.a.cock at googlemail.com
Mon Jun 29 14:25:56 UTC 2009


On 6/29/09, Liam Thompson <dejmail at gmail.com> wrote:
> Hi everyone
>
> Ok, so I managed to write a parser for Genbank files ( I will post to
> script central once completed, it works well with single genes from
> genomic sequences) which can search for a gene from a genomic
> sequence and copy it out as a FASTA.

I hope you didn't spend time writing a whole new GenBank
parser. Biopython already has one which works pretty well ;)
From the rest of your email it sounds like you are actually
using this (the Bio.GenBank module, which is also used
internally by Bio.SeqIO).

> ...
> I then attempt to print the sequence at the given coordinates
>
>  if corecur_seq > 0:
>             print "core sequence only \n"
>             corestart = corecur_seq[0]._start
>             coreend = corecur_seq[0]._end
>             coreseq = corecur_seq[1]
>             print coreseq[corestart:coreend]
>
> getting the following error message
>
> Traceback (most recent call last):
>   File "/media/RESCUE/HBx_Bioinformatics/reannotate.py", line 171, in
> <module>
>     print coreseq[corestart:coreend]
>   File "/var/lib/python-support/python2.6/Bio/Seq.py", line 132, in
> __getitem__
>     return Seq(self._data[index], self.alphabet)
> TypeError: object cannot be interpreted as an index

I would guess that corestart and coreend are NOT integers. To
do slicing, you will need integers. Based on the later bits of your
email you discovered they are Biopython position objects (not
integers):

> I think the error is (although I don't know, I am pretty new to python
> and programming in biopython) with the variable type of
> corestart and coreend, both defined as <type 'instance'> and when I
> print them on the shell I get
>
> Bio.SeqFeature.ExactPosition(1900)
>
> Bio.SeqFeature.ExactPosition(2452)
>
> as an example, do I need to convert these to integers ? I have tried,
> but I think I would need to replace or copy out the number
> into a different variable ?

A position object has a position attribute you should be using
if you just need an integer. I think (without knowing exactly
what your code is doing) that this would work:

corestart = corecur_seq[0]._start.position
coreend = corecur_seq[0]._end.position
print current_entry.seq[corestart:coreend]
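
To show why the slice fails and why the integer attribute fixes it,
here is a rough, self-contained sketch. It does not import Biopython:
FakePosition is a made-up stand-in for a position object, and the
sequence is a plain string, so treat it as an illustration of the
idea only.

```python
class FakePosition(object):
    """Hypothetical stand-in for a position object (not Biopython's class)."""

    def __init__(self, position):
        # The plain integer coordinate, which is what slicing needs.
        self.position = position


sequence = "ACGTACGTACGT"  # made-up example data
start = FakePosition(2)
end = FakePosition(6)

# Slicing with the objects themselves fails, because they are not integers.
try:
    sequence[start:end]
    slicing_failed = False
except TypeError:
    slicing_failed = True

# Slicing with the integer attribute works.
core = sequence[start.position:end.position]
print(core)  # GTAC
```

The same pattern applies to the real objects: pull the integer out of
each position first, then slice the sequence with those integers.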

> Specific thanks to Peter, Andrew Dalke and Brad who posted
> numerous examples on their pages and on the mailing lists
> which have helped me tremendously.
>
> I would appreciate any comments.

Be careful as lots of Andrew's examples may be out of date
now.

What version of Biopython are you using, and have you been
looking at a recent version of the tutorial? We currently
recommend using Bio.SeqIO to parse GenBank files, although
it does internally use Bio.GenBank.

http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

The latest version of the tutorial (included with Biopython 1.51b)
discusses the SeqRecord and SeqFeature objects and their
locations more prominently (they get a whole chapter now).
Most of that chapter would still apply directly to older versions
of Biopython.

Peter


