[Bioperl-l] Species retrieval from NCBI nr protein database.

Brian Osborne brian_osborne@cognia.com
Mon, 29 Jul 2002 10:50:23 -0400


Navdeep,

The key documentation for Bioperl, in my opinion, is in the Seq.pm module;
take a look at it. $seq_obj->species returns a Bio::Species object, not a
string, so the code should look like:

$species_obj = $seq_obj->species;
$species = $species_obj->binomial;

And so on. See Bio::Species.pm for all the methods that you can use once
you have the Bio::Species object. Mind you, the common_name method may not
always return the common name, since the common name isn't provided in all
public databases. Use binomial() instead.
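
One caveat, going only by the output you posted: the species slot came back
empty for every record read from the FASTA file, so there may be no
Bio::Species object to call binomial() on. Below is a minimal sketch of a
workaround, assuming the organism really is the last [bracketed] element of
the description, as you describe. species_from_desc is a hypothetical helper,
not a Bioperl method, and the second sample string is made up to show the
no-match case.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): return the text inside the
# [...] pair that ends the description, or undef if there is none.
sub species_from_desc {
    my ($desc) = @_;
    return undef unless defined $desc;
    return $desc =~ /\[([^\[\]]+)\]\s*$/ ? $1 : undef;
}

my @descs = (
    '(X60065) beta-2-glycoprotein  I [Bos taurus]',   # from the posted output
    'made-up description without an organism',        # illustrative only
);

for my $desc (@descs) {
    my $species = species_from_desc($desc);
    print 'SPECIES: ', (defined $species ? $species : '(not found)'), "\n";
}
```

In your loop you would pass $seq->desc() to the helper. For nr records that
merge several entries into one header, this only picks up the organism of
the last entry.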

Brian O.



-----Original Message-----
From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org] On
Behalf Of Navdeep Jaitly
Sent: Monday, July 29, 2002 10:23 AM
To: bioperl-l@bioperl.org
Subject: [Bioperl-l] Species retrieval from NCBI nr protein database.

Hi!
I was using SeqIO to get proteins from the NCBI nr database. Unfortunately it
seems that the parsing of the species field is not quite working, and it
gets lumped in with the description field (usually the species is the last
element in the header of the nr database and is surrounded by []). Is this
to be expected, or am I doing something wrong? Can the parsing of the fields
be specified when declaring a SeqIO instance?
Thanks!
Deep

PS: Code and results attached.



use Bio::SeqIO;
use strict ;
# "my" is required here: under "use strict" an undeclared $in will not compile.
my $in = Bio::SeqIO->new('-file' => "c:\\Databases\\nr.fas",
                         '-format' => 'Fasta');
my $TO_PRINT = 3 ;
my $numProteins = 0 ;
my $seq ;
while ( ($seq = $in->next_seq()) && $numProteins < $TO_PRINT)
{
        my $sequence = $seq->seq() ;
        my $name = $seq->display_id() ;
        my $species = $seq->species() ;
        my $description = $seq->desc() ;
        print "NAME: $name\n" ;
        print "SPECIES: $species\n" ;
        print "DESCRIPTION: $description\n" ;
        print "SEQUENCE: $sequence\n\n" ;
        $numProteins++ ;
}


NAME: gi|6|emb|CAA42669.1|
SPECIES:
DESCRIPTION: (X60065) beta-2-glycoprotein  I [Bos taurus]
SEQUENCE:
PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQIVFSCQPGYVSRGGIRRFTCPLTGLWPINTLKC
MPRVCPFAGILENGTVRYTTFEYPNTISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSV
YKPLAGNNSFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPANPVLYYKDTAT
FGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGERVAIQNKFKNGMLHGQKVSFFCKHKEKKC
SYTEDAQCIDGTIEIPKCFKEHSSLAFWKTDASDVKPC

NAME: gi|129249|sp|P02820|OSTC_BOVIN
SPECIES:
DESCRIPTION: OSTEOCALCIN PRECURSOR (GAMMA-CARBOXYGLUTAMIC ACID-CONTAINING
PROTEIN) (BONE GLA-PROTEIN) (BGP)gi|538590|pir||GEBO osteocalcin precursor
- bovinegi|8|emb|CAA35997.1| (X51700) bone Gla precursor (100 AA) [Bos
taurus]gi|720|emb|CAA37737.1| (X53699) Gla protein precusor [Bos taurus]
SEQUENCE:
MRTPMLLALLALATLCLAGRADAKPGDAESGKGAAFVSKQEGSEVVKRLRRYLDHWLGAPAPYPDPLEPKREVCEL
NPDCDELADHIGFQEAYRRFYGPV

NAME: gi|231734|sp|P30274|CGA2_BOVIN
SPECIES:
DESCRIPTION: CYCLIN A2 (CYCLIN A)gi|284597|pir||S24788 cyclin A -
bovinegi|10|emb|CAA48398.1| (X68321) Cyclin A-3 [Bos taurus]
SEQUENCE:
EFQEDQENVNPEKAAPAQQPRTRAGLAVLRAGNSRGPAPQRPKTRRVAPLKDLPINDEYVPVPPWKANNKQPAFTI
HVDEAEEIQKRPTESKKSESEDVLAFNSAVTLPGPRKPLAPLDYPMDGSFESPHTMEMSVVLEDEKPVSVNEVPDY
HEDIHTYLREMEVKCKPKVGYMKKQPDITNSMRAILVDWLVEVGEEYKLQNETLHLAVNYIDRFLSSMSVLRGKLQ
LVGTAAMLLASKFEEIYPPEVAEFVYITDDTYTKKQVLRMEHLVLKVLAFDLAAPTINQFLTQYFLHQQPANCKVE
SLAMFLGELSLIDADPYLKYLPSVIAAAAFHLALYTVTGQSWPESLVQKTGYTLETLKPCLLDLHQTYLRAPQHAQ
QSIREKYKNSKYHGVSLLNPPETLNV


