[Bioperl-l] blast report parsing addition(s)

Wiepert, Mathieu Wiepert.Mathieu@mayo.edu
Mon, 11 Nov 2002 09:01:08 -0600


Hi,

I changed the way accession numbers are parsed from BLAST reports.  In some cases the parser was actually grabbing the locus rather than the accession number.  I also added a locus method to HitI, in case anyone wants the locus.
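
To illustrate the accession/locus distinction, here is a minimal sketch in plain Perl. It is not the code that went into BioPerl; split_ncbi_id is a hypothetical helper name, and it only assumes the usual NCBI layout of db|accession|locus for an identifier such as gb|AAG09731.1|AF226730_1.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split an NCBI-style identifier
# into its database tag, accession and locus fields.
sub split_ncbi_id {
    my ($id) = @_;
    my ($db, $acc, $locus) = split /\|/, $id;
    # split drops trailing empty fields, so default any missing field to ''.
    for ($db, $acc, $locus) { $_ = '' unless defined $_ }
    return ($db, $acc, $locus);
}

my ($db, $acc, $locus) = split_ncbi_id('gb|AAG09731.1|AF226730_1');
print "db=$db accession=$acc locus=$locus\n";
```

Note that for an identifier like pir||T14789 the second field is empty, so a real parser has to decide which field to treat as the accession; this sketch simply returns the fields as they are.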

I also added a method called each_accession_number to HitI.  It returns all the accession numbers found in the hit description.  I was finding that I needed all of them to help me categorize or organize my hits.  It parses hits that look like:

>ref|NP_065733.1| (NM_020682) Cyt19 protein; likely ortholog of rat methyltransferase
           Cyt19; S-adenosylmethionine:arsenic (III)
           methyltransferase [Homo sapiens]
 pir||T14789 hypothetical protein DKFZp586L0724.1 - human
 emb|CAB53709.1| (AL110271) hypothetical protein [Homo sapiens]
 gb|AAG09731.1|AF226730_1 (AF226730) Cyt19 [Homo sapiens]
 gb|AAH01726.1|AAH01726 (BC001726) Similar to DKFZP586L0724 protein [Homo sapiens]

Both methods are implemented in GenericHit.


Usage looks something like:
my $locus = $hit->locus;
my @accnums = $hit->each_accession_number;
# $acc rather than $a, since $a and $b are reserved for sort
foreach my $acc (@accnums) {
  print "\tHit Accnums: ", $acc, "\n";
}

For each_accession_number, the first one in the list is always the accession number returned by $hit->accession().
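
For anyone curious how this kind of extraction can work, the following is a rough standalone sketch, not the GenericHit implementation. The function name accessions_from_description and the regex are hypothetical. It assumes identifiers of the form db|accession|locus, falls back to the third field when the second is empty (as in pir||T14789), keeps version suffixes, and ignores the parenthesized nucleotide accessions such as (NM_020682).

```perl
use strict;
use warnings;

# Hypothetical helper (not the GenericHit code): collect the accession from
# every db|accession|locus identifier in a hit's full description.
sub accessions_from_description {
    my ($desc) = @_;
    my (@accs, %seen);
    # A lowercase database tag at the start, after '>' or after whitespace,
    # followed by two pipe-separated fields.
    while ($desc =~ /(?:^|[\s>])([a-z]{2,5})\|([^|\s]*)\|(\S*)/g) {
        my ($acc, $third) = ($2, $3);
        # Assumption: use the third field when the accession field is empty.
        my $id = length($acc) ? $acc : $third;
        next unless length($id);
        # Keep first-seen order and skip duplicates.
        push @accs, $id unless $seen{$id}++;
    }
    return @accs;
}

my $desc = '>ref|NP_065733.1| (NM_020682) Cyt19 protein [Homo sapiens]'
         . ' pir||T14789 hypothetical protein DKFZp586L0724.1 - human'
         . ' emb|CAB53709.1| (AL110271) hypothetical protein [Homo sapiens]'
         . ' gb|AAG09731.1|AF226730_1 (AF226730) Cyt19 [Homo sapiens]';
print "$_\n" for accessions_from_description($desc);
```

Because matches are collected in the order they appear, the first element is the identifier from the leading entry of the description, which is in the same spirit as the ordering described above.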

-Mat