[Bioperl-l] Bio::SeqIO; seq->desc() gives back too (!!!) full header

Benjamin Breu breu@proteosys.com
Thu, 8 Aug 2002 10:01:10 +0200


Hi,

thx Jason for your help.

The desc() function prints the header, but there is too much in it. I expected it to return only the description, but when one protein has multiple gi numbers (I'm using NCBI-Fasta (nr)), it returns the description followed by the other gi, pir, etc. numbers and their descriptions. See below.

use Bio::SeqIO;
# 'filename' stands for my actual FASTA file
my $in = Bio::SeqIO->new(-format => 'fasta', -file => 'filename');
# print ID, description and sequence for every record
while ( my $seq = $in->next_seq ) {
    print $seq->display_id(), "\n", $seq->desc(), "\n", $seq->seq(), "\n\n";
}

The output format is as follows:

ID
description
sequence

gi|15233744|ref|NP_194152.1|
(NM_118554) putative protein [Arabidopsis thaliana]gi|7487330|pir||T09884 hypothetical protein T22A6.40 - Arabidopsis thalianagi|5051763|emb|CAB45056.1| (AL078637) putative protein [Arabidopsis thaliana]gi|7269271|emb|CAB79331.1| (AL161561) putative protein [Arabidopsis thaliana]
MKRSTTDSDLAGDAHNETNKKMKSTEEEEIGFSNLDENLVYEVLKHVDAKTLAMSSCVSKIWHKTAQDERLWELICTRHWTNIGCGQNQLRSVVLALGGFRRLHSLYLWPLSKPNPRARFGKDELKLTLSLLSIRYYEKMSFTKRPLPESK

Is there a problem with the parser, or does it need some option so that it returns all the gi, pir, etc. numbers when I ask for an ID? The result could be a hash with key = database (e.g. dbj, pir) and values = @arrayofnumbers. Is there such a smart little parser, or do I have to spend (a lot of) hours writing this myself?
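
To make the idea concrete, here is a rough sketch of the kind of hash described above. It is not part of BioPerl: parse_nr_header is a hypothetical helper name, and the regular expression assumes every identifier in the header has the shape gi|number|db|accession|locus seen in the example output (with the locus used when the accession field is empty, as in the pir entry). Other identifier layouts would need additional patterns.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): collect all identifiers
# found in an NCBI nr-style header into a hash of database => [ids].
# Assumes each identifier looks like gi|<number>|<db>|<accession>|<locus>.
sub parse_nr_header {
    my ($header) = @_;
    my %ids;
    while ( $header =~ /gi\|(\d+)\|(\w+)\|([^|\s\x01]*)\|([^|\s\x01]*)/g ) {
        my ( $gi, $db, $acc, $locus ) = ( $1, $2, $3, $4 );
        push @{ $ids{gi} }, $gi;
        # pir entries leave the accession empty and use the next field
        my $id = length $acc ? $acc : $locus;
        push @{ $ids{$db} }, $id if length $id;
    }
    return \%ids;
}

# Example with the header from this message (ID and description joined)
my $header = 'gi|15233744|ref|NP_194152.1| (NM_118554) putative protein '
    . '[Arabidopsis thaliana]gi|7487330|pir||T09884 hypothetical protein '
    . 'T22A6.40 - Arabidopsis thalianagi|5051763|emb|CAB45056.1| (AL078637) '
    . 'putative protein [Arabidopsis thaliana]gi|7269271|emb|CAB79331.1| '
    . '(AL161561) putative protein [Arabidopsis thaliana]';

my $ids = parse_nr_header($header);
for my $db ( sort keys %$ids ) {
    print "$db: ", join( ', ', @{ $ids->{$db} } ), "\n";
}
```

Inside the Bio::SeqIO loop, the same helper could be called as parse_nr_header( join ' ', $seq->display_id(), $seq->desc() ), so that the first identifier (which ends up in display_id) is included as well.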

Thx 

Ben