[Bioperl-l] Getting 'features' from SearchIO?

Jason Stajich jason at bioperl.org
Mon May 11 11:38:14 EDT 2009


Dan -

There is nice documentation on the gmod website covering the gbrowse  
tutorial on the expected format of alignment features.   That is what  
you should probably be generating and loading with the  
bp_seqfeature_load script -- otherwise you need to be converting the  
HSPs into seqfeatures with the right associated information (i.e. the  
tag/value pairs that are in the 9th column) in order to have well  
structured data in the database.

There are boilerplate examples of how to visualize alignments on the  
Gbrowse tutorial website as well so I commend that as great starting  
place for GFF, data, conf files, and what kind of visualization you  
can obtain with the browser.

There is also some helper scripts, that do this for you like  
bp_search2gff. Just dumping the feature will take the query ( i  
believe) of the feature pair that is the HSP by default, so you will  
need to make some choices about what information you want.  You can  
get the individual features from the feature pair with $hsp->query  or  
$hsp->hit  which can also be passed to a GFF writer (or call $hsp->hit- 
 >gff_string).   Note that since the data storage is not structured in  
a GFF3 like-way this won't immediately produce well formed GFF3 for  
the 9th column.

Here's a script I use for some DNA to genome alignments, from FASTA  
output for example - it assumes 1 HSP per Hit as per what you get from  
SSEARCH but is a reasonable jumping off place.
  http://bit.ly/fasta2gff

There is also a wublast to gff converting script in that repository as  
well.

-jason
On May 11, 2009, at 8:07 AM, Dan Bolser wrote:

> 2009/5/11 Dan Bolser <dan.bolser at gmail.com>:
>> Hi,
>>
>> I am parsing a blasttable and extracting Bio::Search::HSP::GenericHSP
>> objects as a result. I read somewhere that HSP objects inherit  
>> Feature
>> objects... How can I get a 'standard' representation of the HSP as a
>> feature? Basically I'd like to simply load the blast results into a
>> feature database...
>>
>> When I call feature methods on the HSP objects I just get blank or
>> undef results... I think this is because I'm trying to get at the
>> sequences existing (non existent) features, rather than get the HSP
>> object as a feature... If that makes sense... How can I confirm  
>> that I
>> have a feature object containing the details of the HSP?
>>
>> I thought of trying to just pass the HSP object to the
>> Bio::DB::SeqFeature::Store, but I need to get that up and running
>> first (I'm looking into it). In the mean time I thought I'd ask if
>> this sounds like the right thing to do.
>
> Well it works... I am seeing things fill into the database as I call
>
> $db->store($p)
> 	or die "Couldn't store!";
>
> (I needed to upgrade bioperl to get Bio::DB::SeqFeature working).
>
>
> Here is my code;
>
> while(my $r = $s->next_result ){
>  print $r->query_name, "\n";
>  while(my $h = $r->next_hit){
>    print "\t", $h->name, "\n";
>    while(my $p = $h->next_hsp){
>      $db->store($p)
> 	or die "Couldn't store!";
>    }
>  }
> }
>
>
> How can I visualize the resulting set of HSPs? i.e.  If I point
> gbrowse at this location, will it automatically pick up the entry
> points and their features from the database? Or how much manual
> configuration will I need? Is there some boilerplate config I can use
> to visualize this?
>
> Cheers,
> Dan.
>
>
>> More generally I want to have features attached to sequences that are
>> themselves annotations of larger sequences (but with unknown
>> position). Is Bio::DB::SeqFeature::Store a way to go? I need to  
>> manage
>> various different bits of information coming from a sequencing
>> project, and I need a solution to the whole 'assembly life cycle
>> management' problem.
>>
>> Thanks for any help,
>> Dan.
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
jason at bioperl.org






More information about the Bioperl-l mailing list