[Bioperl-l] $hit->accession

Wiepert, Mathieu Wiepert.Mathieu@mayo.edu
Fri, 1 Nov 2002 10:25:11 -0600


Hi,

I am not sure of the answer, but I do see a problem.  I ran a BLAST search at NCBI and saved the output both as XML and as text.  I parsed both files and pulled out the accession numbers, and I get different accnums depending on the file.  Using the examples below, with text I get the accnum as GLK5_HUMAN, while for XML I get Q16478.

So, unless or until the text parser is fixed, perhaps get your output as XML and parse that?  Eventually the two parsing modules should be brought into sync, though it looks like it might be slightly tough.  There seem to be different ID layouts for different db types, so a bit of logic is needed to find the accnum that would match the XML output.  I am not sure whether the regex suhoiy provided matches all the cases.
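
A rough sketch of what that bit of logic might look like, assuming the IDs follow the '|'-delimited layout seen in the sample hit below. The name guess_accession is hypothetical and is not part of BioPerl; this only illustrates the idea and has been checked against nothing beyond the IDs in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): guess the accession
# from a '|'-delimited hit ID such as "gi|3287849|sp|Q16478|GLK5_HUMAN".
sub guess_accession {
    my ($id) = @_;
    my @f = split /\|/, $id;

    # Drop a leading "gi|<number>" pair if present.
    splice(@f, 0, 2) if @f >= 2 && $f[0] eq 'gi' && $f[1] =~ /^\d+$/;

    return $id   unless @f;      # nothing left after the gi pair
    return $f[0] if @f == 1;     # bare identifier, no database tag

    # Remaining layout is assumed to be: db tag, accession, name/locus.
    my (undef, $acc, $name) = @f;

    # Some IDs leave the accession field empty (e.g. "pir||I57936"),
    # so fall back to the following field in that case.
    return $acc  if defined $acc  && length $acc;
    return $name if defined $name && length $name;
    return $id;
}

for my $id ('gi|3287849|sp|Q16478|GLK5_HUMAN',
            'gi|2119548|pir||I57936',
            'gi|251840|gb|AAB22591.1|') {
    printf "%-35s => %s\n", $id, guess_accession($id);
}
```

For the sp hit this returns Q16478, the same value as the XML output. Whether a version suffix such as ".1" should be kept or stripped to match the XML output for other db types is not checked here.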


Sample text hit:

>gi|3287849|sp|Q16478|GLK5_HUMAN Glutamate receptor, ionotropic kainate 5 precursor (Glutamate
           receptor KA-2) (KA2) (Excitatory amino acid receptor 2)
           (EAA2)
 gi|2119548|pir||I57936 glutamate receptor subunit - human
 gi|251840|gb|AAB22591.1| (S40369) glutamate receptor subunit; EAA2; excitatory amino acid
           receptor 2 [Homo sapiens]
          Length = 980

 Score = 20.6 bits (41), Expect =   204
 Identities = 5/5 (100%), Positives = 5/5 (100%)

Query: 1   LMFDA 5
           LMFDA
Sbjct: 306 LMFDA 310

Corresponding XML hit:

        <Hit>
          <Hit_num>7</Hit_num>
          <Hit_id>gi|3287849|sp|Q16478|GLK5_HUMAN</Hit_id>
          <Hit_def>Glutamate receptor, ionotropic kainate 5 precursor (Glutamate receptor KA-2) (KA2) (Excitatory amino acid receptor 2) (EAA2) &gt;gi|2119548|pir||I57936 glutamate receptor subunit - human &gt;gi|251840|gb|AAB22591.1| (S40369) glutamate receptor subunit; EAA2; excitatory amino acid receptor 2 [Homo sapiens]</Hit_def>
          <Hit_accession>Q16478</Hit_accession>
          <Hit_len>980</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>20.5747</Hsp_bit-score>
              <Hsp_score>41</Hsp_score>
              <Hsp_evalue>204.268</Hsp_evalue>
              <Hsp_query-from>1</Hsp_query-from>
              <Hsp_query-to>5</Hsp_query-to>
              <Hsp_hit-from>306</Hsp_hit-from>
              <Hsp_hit-to>310</Hsp_hit-to>
              <Hsp_query-frame>1</Hsp_query-frame>
              <Hsp_hit-frame>1</Hsp_hit-frame>
              <Hsp_identity>5</Hsp_identity>
              <Hsp_positive>5</Hsp_positive>
              <Hsp_align-len>5</Hsp_align-len>
              <Hsp_qseq>LMFDA</Hsp_qseq>
              <Hsp_hseq>LMFDA</Hsp_hseq>
              <Hsp_midline>LMFDA</Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>



Mathieu Wiepert
Medical Informatics Research
Mayo Foundation
(507) 266-2317 Fax (507)-284-0360
wiepert.mathieu@mayo.edu

-----Original Message-----
From: suhoiy [mailto:suhoiy@21cn.com]
Sent: Friday, November 01, 2002 5:17 AM
To: bioperl-l@bioperl.org
Subject: [Bioperl-l] $hit->accession


Hello all,

In the Bio::SearchIO::blast module, the accession number is obtained by:
        my @pieces = split(/\|/,$id);
        my $acc = pop @pieces;

but some hits are different, for example:
        gi|3024862|sp|P76372|WZZB_ECOLI Chain length determinant protein...   551   e-156
        gi|18266411|gb|AAL67565.1|AF461121_16 (AF461121) O-antigen chain...   538   e-152
        gi|584950|sp|P37792|WZZB_SHIFL Chain length determinant protein ...   535   e-151

so the returned $hit->accession values are "WZZB_ECOLI" and so on, while the true
accessions are "P76372" and so on.

Is it a bug? Maybe we can get the accessions with /gi\|\d+\|\w+\|([\w\.]*)/ ?
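
One way to compare the two approaches is to run both over a few IDs. The sketch below uses the three IDs listed above plus the pir ID from the sample text hit earlier in this thread. The sub names last_field and regex_accession are made up for illustration; last_field mirrors the split/pop lines quoted above rather than calling the module itself.

```perl
use strict;
use warnings;

# Mirrors the quoted split/pop logic: take the last '|'-separated field.
sub last_field {
    my ($id) = @_;
    my @pieces = split(/\|/, $id);
    return pop @pieces;
}

# The regex proposed above: take the field right after the database tag.
# Returns undef when the pattern does not match at all.
sub regex_accession {
    my ($id) = @_;
    my ($acc) = $id =~ /gi\|\d+\|\w+\|([\w\.]*)/;
    return $acc;
}

my @ids = (
    'gi|3024862|sp|P76372|WZZB_ECOLI',
    'gi|18266411|gb|AAL67565.1|AF461121_16',
    'gi|584950|sp|P37792|WZZB_SHIFL',
    'gi|2119548|pir||I57936',
);

for my $id (@ids) {
    my $acc = regex_accession($id);
    $acc = '(no match)' unless defined $acc;
    printf "%-40s last=%-12s regex='%s'\n", $id, last_field($id), $acc;
}
```

On the sp and gb IDs the regex yields P76372, AAL67565.1 and P37792. On the pir ID it matches but captures an empty string, because the field after the database tag is empty, so that layout would need separate handling.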

Thanks.

suhoiy
