[Biopython] GI number

Peter Cock p.j.a.cock at googlemail.com
Mon Jan 25 23:41:46 UTC 2010


On Mon, Jan 25, 2010 at 9:38 PM, x y <rafal.b.pawlak at gmail.com> wrote:
> hello,
> how extract GI number in this program?
>
> from Bio import SeqIO
> handle = open("xyz.fasta")
> for seq_record in SeqIO.parse(handle, "fasta"):
>    print seq_record.description
> handle.close()
>
> ex.
> Osa_SPT6 gi|222632083|gb|EEE64215.1| hypothetical protein Os05g41510.1_ORYZA
> [Oryza sativa Japonica Group]
>
> rafal pawlak

I would just use the Python string split method on this string, assuming
all your records use the same layout, e.g. something like this:

gi = seq_record.description.split()[1].split("|")[1]

There are related examples in the tutorial (search for "get_accession")
which are a bit more robust because they check that the string follows
the expected format. Alternatively, you could use a regular expression.
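
To make that concrete, here is a rough sketch of both ideas: a split
that checks the layout first, and a regular expression. The function
names get_gi_by_split and get_gi_by_regex are made up for illustration
(they are not from Biopython or the tutorial), and the sketch assumes
the GI always appears as "gi|<digits>|" as in your example.

```python
import re

# Assumed layout: "gi|<digits>|" at the start of a whitespace-separated word.
GI_PATTERN = re.compile(r"(?:^|\s)gi\|(\d+)\|")


def get_gi_by_split(description):
    """Return the GI number via str.split, checking the layout first.

    Hypothetical helper: assumes the second word looks like "gi|12345|...".
    """
    words = description.split()
    if len(words) < 2:
        raise ValueError("No second word in: %r" % description)
    fields = words[1].split("|")
    if len(fields) < 2 or fields[0] != "gi" or not fields[1].isdigit():
        raise ValueError("Unexpected layout: %r" % description)
    return fields[1]


def get_gi_by_regex(description):
    """Return the GI number found anywhere in the line, or None if absent."""
    match = GI_PATTERN.search(description)
    return match.group(1) if match else None


description = ("Osa_SPT6 gi|222632083|gb|EEE64215.1| hypothetical protein "
               "Os05g41510.1_ORYZA [Oryza sativa Japonica Group]")
print(get_gi_by_split(description))  # 222632083
print(get_gi_by_regex(description))  # 222632083
```

Inside your loop you would pass seq_record.description to either
function. The regular expression does not care which word the GI is in,
which may help if some records lack the leading "Osa_SPT6" style name.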

Peter



