[Bioperl-l] BioPerl help with 2D arrays

Chris Fields cjfields at uiuc.edu
Tue Jun 24 15:39:47 UTC 2008


On Jun 23, 2008, at 2:32 PM, Nathan Haseley (RIT Student) wrote:

> Hello,
>   I am writing a script that returns an array of array references  
> to  Bio::Search::HSP::GenericHSP objects (2D array of  
> Bio::Search::HSP::GenericHSP objects).  Whenever I try to call  
> functions such as ->num_identical I get an error message:
> Can't call method "num_identical" without a package or object  
> reference.
>
> Below are the segments of the code that are giving me problems to  
> give you a general idea of what I'm doing.  Is there a way around  
> this?  What am I doing wrong?  Thanks!
> Sincerely,
> Nathan
>
> my $in = new Bio::SearchIO( -format => 'blast',
> 				    -file => $file );
> my $ j = -1;
> while( $result = $in->next_result and ref($result)) {

$result should always be assigned an object ref or undef, the latter  
which kills the loop, so '... and ref($result)' isn't needed.

>   ++$j;
>   while( $has_species == 0) {
>      if( my $hit = $result->next_hit) {
>   	if( $hit -> description =~ /$species_names[$i]/i) {
>       		$has_species = 1;
>   		$temp[$i] = $hit->next_hsp;
>       		$result->rewind;
>        }
>      }
>     }

I think you want to incorporate a simple hash construct for your  
species names.  However...

You have left out a significant bit of code, so I am finding it hard  
to determine what you are trying to accomplish.  For instance, from  
the above it seems you always want to always match to the same species  
name, as $i never changes (so $species_name[$i] also never changes).

The while loop as represented above is pretty dangerous, as you can  
feasibly enter an infinite loop unless you get both a hit AND the hit  
desc matches the regex.  Also, you're rewinding the main $result  
iterator ($result->rewind) while inside the $result iterator loop; not  
sure why you are doing that.

Note that '$hit->next_hsp;' can also be dangerous under some  
circumstances, as this can give you undef in cases where no HSP  
alignment is returned (e.g. the hit data is only in the hit table).   
May be one source of your problems.

>  $homologs[$j] = [@temp];

...

> return $homologs

Shouldn't that be @homologs?  You use '$homologs[$j]' above, which is  
a scalar in the array @homologs.

If you aren't 'use strict/warnings', this creates a local scalar  
$homologs then returns the value of $homologs (undef), which could be  
the problem.  Using 'strict/warnings' should catch that.

> .
> .
> . (eventually in a separate fucntion)
> $temp = $homologs[$i]->[$j];
> $temp->num_identical;

I would go about this by creating a lookup table (hash or array) of  
species names, then iterate through each BLAST report to either

(1) generically grab the species name generically and check it against  
the lookup table using 'exists',

%lookup = map {$_ => 1} ('species1', 'species2');
# if species is in brackets, match inside the brackets
if ($hit->description =~ /\[([^\]]+)\]/) {
    $sp = $1;
    if (exists $lookup{$sp}) {
       # do something
    } else { # do something on failure }
}

or (2) check each name (the key in the lookup table) against the  
description using a 'grep', such as:

%lookup = map {$_ => 1} ('species1', 'species2');
if (grep {$desc =~ /$_/} keys %lookup) {...} else {# do something on  
failure}
# could use array of names as a lookup as well

Depending on how the varied the descriptions are; it may only be  
feasible to try the latter one.

 From there you could store the HSP data in a hash, using the species  
name:

$temp{$species} = $hit->next_hsp || warn 'No HSP returned';

and store that by the report # or description:

# array of reports
push @reports, \%temp;

# hash of reports, by query name
$report{$result->query_name} = \%temp;

chris




More information about the Bioperl-l mailing list