[Bioperl-l] SearchIO
Frank Schwach
fs5 at sanger.ac.uk
Fri Mar 30 11:10:13 UTC 2012
You are on the right track.
Yes, you will need to first store the hits' data in a data-structure and
then you will need another loop after parsing the BLAST results that
traverses that data structure in order of chromosomes to print your results.
You can use a hash (associative array) where your key is the chromosome
and the value is an array of HSP data for that chromosome, so you will
need to investigate how to build and traverse a hash of arrays.
Take a look at this, for example:
http://www.perl.com/doc/FMTEYEWTK/pdsc/pdsc-2.html
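
As a rough illustration of the hash-of-arrays idea, here is a minimal sketch in plain Perl. The @records list and its chr/start/end fields are made up for the example; in your script those values would come from the $hit and $hsp objects, and how you work out the chromosome for each hit is up to you.

```perl
use strict;
use warnings;

# Hypothetical records standing in for parsed BLAST HSPs; in the real
# script these values would come from the $hit and $hsp objects.
my @records = (
    { chr => '2',  start => 1500, end => 1820 },
    { chr => 'X',  start => 300,  end => 610  },
    { chr => '2',  start => 90,   end => 400  },
    { chr => '11', start => 7000, end => 7350 },
);

# Hash of arrays: key = chromosome, value = reference to an array of records.
my %hsps_by_chr;
for my $rec (@records) {
    # Perl creates the array for a new key automatically (autovivification),
    # so there is no need to check whether the key exists first.
    push @{ $hsps_by_chr{ $rec->{chr} } }, $rec;
}

# Each value is an array reference, so dereference it with @{ ... }.
# Note that keys() returns the chromosomes in no particular order.
for my $chr (keys %hsps_by_chr) {
    my $count = scalar @{ $hsps_by_chr{$chr} };
    print "chromosome $chr: $count HSP(s)\n";
}
```

The push line is the whole trick: everything pushed under the same key ends up in the same array.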
To learn how to do this, I would first write a small separate script
that builds a hash of arrays and then traverses it in sorted order,
i.e. you need to look up how to access the keys of a hash in sorted
order (see "sort" and "keys" in perlfunc).
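
A practice script for the sorted traversal could look something like the sketch below. The data and the by_chromosome helper are my own inventions for the example, and the ordering rule (numbered chromosomes first, in numeric order, then named ones such as X alphabetically) is only one reasonable choice. A plain "sort keys %hash" compares strings, so it would put 10 before 2.

```perl
use strict;
use warnings;

# Hypothetical hash of arrays: chromosome name => list of "start-end" strings.
my %hsps_by_chr = (
    '10' => [ '500-900' ],
    '2'  => [ '1500-1820', '90-400' ],
    'X'  => [ '300-610' ],
    '1'  => [ '42-310' ],
);

# Compare two chromosome names: purely numeric names sort by number and
# come first; everything else (X, Y, MT, ...) sorts alphabetically after.
sub by_chromosome {
    my ($x, $y) = @_;
    my $x_num = $x =~ /^\d+$/;
    my $y_num = $y =~ /^\d+$/;
    return $x <=> $y if $x_num && $y_num;
    return -1        if $x_num;
    return 1         if $y_num;
    return $x cmp $y;
}

# Sorted list of keys, then an inner loop over each chromosome's array.
my @ordered = sort { by_chromosome($a, $b) } keys %hsps_by_chr;

for my $chr (@ordered) {
    for my $hsp (@{ $hsps_by_chr{$chr} }) {
        print join("\t", $chr, $hsp), "\n";   # one tab-separated row per HSP
    }
}
```

To write the table to a file rather than the screen, open a filehandle for writing and print to that instead.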
I hope this will help to get you going again.
Good luck!
Frank
On 28/03/12 05:23, Detrix wrote:
> Hi,
>
> I'm new to Perl/BioPerl and I need to write a script for an assignment. The
> background is that we BLAST searched a sequence on NCBI and came up with the
> hits. What I have to do is write a script that only extracts the HSPs for
> Mus musculus and mouse, but extract it and match it to each chromosome and
> write it to a table outfile.
>
> So far I have:
>
>
> use strict;
> use lib "C:/Program Files (x86)/BioPerl";
>
>
> use Bio::SearchIO;
> my $parser = new Bio::SearchIO(-format => 'blast',
> -file => 'nucleotide.pl');
>
> while (my $result = $parser->next_result) {
>
> while (my $hit = $result->next_hit) {
>
> if ($hit->description =~ /(Mus musculus)|(Mouse)/i) {
>
> while (my $hsp = $hit->next_hsp) {
>
>
> print
> " Hit=", $hit->description, "\n";
> print
> " HSPs=", $hit->num_hsps, "\n";
>
> }
> }
> }
> }
>
>
>
> What this gets me is the list of all the descriptions of the hits (mouse and
> mus musculus), and the HSPs for them. What I need now is to sort all the
> HSPs for each particular chromosome, and write it to a table outfile. I
> think what I have to do is sort it into an associative array, but all
> my attempts at it have failed. I'm lost, so any help would be greatly
> appreciated!
>
> Thanks
>
>
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.