[Bioperl-l] SearchIO - Stop throwing away data
simon andrews (BI)
simon.andrews at bbsrc.ac.uk
Mon Jul 24 07:14:08 EDT 2006
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Jayne Vallance
> Sent: 24 July 2006 10:46
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] SearchIO - Stop throwing away data
>
> Hi
>
> I am developing someone
> else's work. I wondered whether anyone could identify the
> mistake that the previous coder made?
> I am not very familiar with SearchIO yet.
>
> They are trying to extract filenames from an output report.
I'm not sure what you mean by filenames here. The value which is being
collected in your code snippet is the name of the original query
sequence.
> This is their code:
> while ( my $result = $searchio->next_result() ) {
> # get the hits and their associated name
> # do not want to include these in the clustering step
> while( my $hit = $result->next_hit ) {
> # store the names of these hits into an array
> # these filenames will not be copied over
> $query_name = $result->query_name();
> #print "\nQuery $query_name\n";
> push(@mito_hits, $query_name);
OK, this bit is odd. You're collecting the name of the query sequence
but you're doing it as you're looping through the hits. Since all the
hits come from the same result you're just going to get the same query
name put into your array multiple times (once for each hit). This
almost certainly isn't what you want.
If you just want the name of the query sequence you can miss out the
inner loop (the $result->next_hit() loop). If you actually want to
collect the names of the sequences which were hit then you need to
collect $hit->name() rather than $result->query_name().
> }
> }
>
> I think they have based it on the code at
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Authors
> $searchio->attach_EventHandler(Bio::SearchIO::FastHitEventBuilder->new);
> while( my $r = $searchio->next_result ) {
>   while( my $h = $r->next_hit ) {
>     # Hits will NOT have HSPs
>     print $h->significance,"\n";
>   }
> }
>
> which "throws away data you don't want"???
Indeed, but probably not in the way you're thinking. The data it throws
away is the details of each individual HSP (mostly the alignment data).
You're not using hsp data in your code so it will have no effect (other
than making it a bit quicker). It doesn't throw away whole hits or
anything like that.
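
To illustrate that point, here is a rough sketch of a report being read with the fast event builder attached. The helper names open_fast_report and print_hit_summary, and the file name in the usage comment, are made up for this example. The Bio::SearchIO calls are the same ones used in the HOWTO snippet. Every hit is still visited, and hit-level values such as the name and significance are still there.

```perl
use strict;
use warnings;

# Hypothetical helper: open a report with the lightweight event builder.
# BioPerl is loaded lazily, so this file still compiles where it is absent.
sub open_fast_report {
    my ( $file, $format ) = @_;
    require Bio::SearchIO;
    require Bio::SearchIO::FastHitEventBuilder;

    my $searchio = Bio::SearchIO->new( -format => $format, -file => $file );

    # Results and hits are still built; only the per-HSP detail is skipped.
    $searchio->attach_EventHandler( Bio::SearchIO::FastHitEventBuilder->new );
    return $searchio;
}

# Hypothetical helper: print one tab-separated line per hit and
# return the number of hits seen.
sub print_hit_summary {
    my ($searchio) = @_;
    my $count = 0;
    while ( my $result = $searchio->next_result ) {
        while ( my $hit = $result->next_hit ) {
            print join( "\t",
                $result->query_name, $hit->name, $hit->significance ), "\n";
            $count++;
        }
    }
    return $count;
}

# Usage (placeholder file name, assumed to be a BLAST report):
#   print_hit_summary( open_fast_report( 'report.bls', 'blast' ) );
```

If the count returned here matches the number of hits in the report, the event builder is not the cause of the missing names.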
> I am finding that our code is finding the last file name in
> the output report, rather than each and every one. I suspect
> it is overwriting (or throwing away the data).
I suspect then that you should be collecting the hit names rather than
the query names?
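
Something along these lines might be closer to what you want. It is only a sketch: the sub name collect_names and the file name in the usage comment are invented for the example, and I'm assuming your report is in a format SearchIO can parse. It records the query name once per result and the hit name once per hit, so the two lists are kept separate.

```perl
use strict;
use warnings;

# Hypothetical helper: walk every result in a report and gather two lists,
# one query name per result and one hit name per hit.
# $searchio is assumed to behave like a Bio::SearchIO object.
sub collect_names {
    my ($searchio) = @_;
    my ( @query_names, @hit_names );

    while ( my $result = $searchio->next_result ) {
        # The query name belongs to the result, so record it once here,
        # outside the hit loop.
        push @query_names, $result->query_name;

        while ( my $hit = $result->next_hit ) {
            # The hit name is different on every iteration.
            push @hit_names, $hit->name;
        }
    }
    return ( \@query_names, \@hit_names );
}

# Usage (requires BioPerl; 'report.bls' is a placeholder file name):
#   use Bio::SearchIO;
#   my $searchio = Bio::SearchIO->new( -format => 'blast',
#                                      -file   => 'report.bls' );
#   my ( $queries, $hits ) = collect_names($searchio);
#   print "$_\n" for @$hits;
```

Declaring the arrays with "my" inside the sub also means nothing is left over from an earlier run, which is worth checking if you are only ever seeing the last name.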
Simon.