[Bioperl-l] SearchIO

Frank Schwach fs5 at sanger.ac.uk
Fri Mar 30 11:10:13 UTC 2012


You are on the right track.
Yes, you will need to first store the hits' data in a data structure and 
then you will need another loop after parsing the BLAST results that 
traverses that data structure in order of chromosomes to print your results.
You can use a hash (associative array) where your key is the chromosome 
and the value is an array of HSP data for that chromosome, so you will 
need to investigate how to build and traverse a hash of arrays.
Take a look at this, for example: 
http://www.perl.com/doc/FMTEYEWTK/pdsc/pdsc-2.html
To learn how to do this, I would first write a little separate script 
that builds some hash of arrays and then try to traverse it in sorted 
order, i.e. you need to look up how to access keys of a hash in sorted 
order.
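
As a rough starting point for that practice script, here is a minimal sketch. The records in it are made-up placeholders, not real BLAST output, and the names %hsps_by_chr and by_chromosome are only illustrative. In your real script the push would happen inside your next_hsp loop, and you would have to work out how to get the chromosome from each hit yourself. The custom sort is one possible choice, so that chromosome 2 comes before 11; a plain "sort keys" would order the names as strings instead.

```perl
use strict;
use warnings;

# Placeholder records: [chromosome, HSP summary]. These are invented
# values standing in for whatever you collect while parsing.
my @records = (
    [ '11', 'hsp_a' ],
    [ '2',  'hsp_b' ],
    [ 'X',  'hsp_c' ],
    [ '2',  'hsp_d' ],
);

# Build the hash of arrays: key = chromosome, value = array of HSP data.
my %hsps_by_chr;
for my $rec (@records) {
    my ($chr, $hsp) = @$rec;
    push @{ $hsps_by_chr{$chr} }, $hsp;   # array is created on first push
}

# Numeric chromosome names sort numerically and come first;
# anything else (X, Y, ...) follows in alphabetical order.
sub by_chromosome {
    my $a_is_num = $a =~ /^\d+$/;
    my $b_is_num = $b =~ /^\d+$/;
    return $a <=> $b if $a_is_num && $b_is_num;
    return -1        if $a_is_num;
    return 1         if $b_is_num;
    return $a cmp $b;
}

# Second loop: traverse the structure in chromosome order and
# produce one tab-separated line per stored HSP.
my @lines;
for my $chr (sort by_chromosome keys %hsps_by_chr) {
    push @lines, join("\t", $chr, $_) for @{ $hsps_by_chr{$chr} };
}
print "$_\n" for @lines;
```

To write a table file instead of printing to the screen, you could open a filehandle for writing and print the same lines to it.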
I hope this will help to get you going again.

Good luck!


Frank


On 28/03/12 05:23, Detrix wrote:
> Hi,
>
> Im new to perl/bioperl and I need to write a script for an assignment. The
> background is that we BLAST searched a sequence on NCBI and came up with the
> hits. What I have to do is write a script that only extracts the HSPs for
> Mus musculus and mouse, but extract it and match it to each chromosome and
> write it to a table outfile.
>
> So far I have:
>
>
> use strict;
> use lib "C:/Program Files (x86)/BioPerl";
>
>
> use Bio::SearchIO;
> my $parser = new Bio::SearchIO(-format =>  'blast',
>                             -file   =>  'nucleotide.pl');
>
> while (my $result = $parser->next_result) {
> 	
> 	while (my $hit = $result->next_hit) {
>
> 		if ($hit->description =~ /(Mus musculus)|(Mouse)/i) {
> 		
> 			while (my $hsp = $hit->next_hsp) {
>     				
> 	
>          		print
>                	" Hit=", $hit->description, "\n";
>               	print
>               	" HSPs=", $hit->num_hsps, "\n";
>
> 			}
> 		}
> 	}
> }
>
>
>
> What this gets me is the list of all the descriptions of the hits (mouse and
> mus musculus), and the HSPs for them. What I need now is to sort all the
> HSPs for each particular chromosome, and write it to a table outfile. I
> think what I have to do is sort it into an associative array, but all
> attempts at it I have failed. Im lost, so any help would be greatly
> appreciated!
>
> Thanks
>
>




