[Bioperl-l] my bioperl-db hacks

T.D. Houfek tdhoufek at unity.ncsu.edu
Tue Dec 30 12:00:47 EST 2003


I'm monkeying around with bioperl-db 0.1, trying to see what I can get
it to do.  I set about following some instructions that tell
you how to use the "load_seqdatabase.pl" script to fill your bioperl
database with sequences from a swissprot release file.  (I am using
sprot42.dat.)  This did not work for me initially, but I made some
vicious hacks to the code and now the script seems to work, more or
less.  It's this "more or less" I'd like comments on.  I suspect other
things may have broken because of what I have done, and that someone who
knows the code can help me find a more stable solution.

I think the problem arises when, in parsing the sprot42.dat file,
Bioperl encounters a record with a feature whose location must be
expressed as a Bio::Location::Fuzzy object.  The inline documentation of
biosqldb-mysql indicates that Fuzzy objects are not supported yet
(but gives you an idea of where you could start if you wished to add
that support).

Anyway, I first encountered an exception around line 169 of
Bio/DB/SQL/SeqLocationAdaptor.pm, where a check is made to see whether
$location isa the right kind of object.

I just added the Fuzzy objects to the list of invited guests:
                                                                                                                                                                                                  
# --start snippet ---------------------
        if( $location->isa('Bio::Location::SplitLocationI') ) {
            # split location: store each sub-location with its own rank
            my $rank = 1;
            foreach my $sub ( $location->sub_Location ) {
                $self->_store_component($sub,$seqfeature_id,$rank);
                $rank++;
            }
        } elsif( $location->isa('Bio::Location::Simple') ) {
            $self->_store_component($location,$seqfeature_id,1);
        } elsif( $location->isa('Bio::Location::Fuzzy') ) {
            # ADDED: let Fuzzy locations through as a single component
            $self->_store_component($location,$seqfeature_id,1);
        } else {
            # a method call does not interpolate inside a string, so
            # report the class with ref() and concatenate it instead
            $self->throw("Not a simple location nor a split nor a fuzzy. " .
                         "Says it's a " . ref($location) . ".  Yikes");
        }
# -- end snippet ----------------------
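
The Simple and Fuzzy branches in the snippet above do exactly the same
thing, so the check could also be written as one test against a list of
accepted classes.  The following is only a minimal sketch of that idea:
the Demo::Location::* packages are stand-ins so that it runs without
Bioperl installed, and classify_location is a hypothetical helper, not
part of bioperl-db.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Stand-in classes (hypothetical) so the sketch runs without Bioperl.
# In SeqLocationAdaptor these would be the Bio::Location::* classes.
package Demo::Location::Simple { sub new { return bless {}, shift } }
package Demo::Location::Fuzzy  { sub new { return bless {}, shift } }
package Demo::Location::Split  { sub new { return bless {}, shift } }
package Demo::Location::Other  { sub new { return bless {}, shift } }

# Classes that are stored as a single component with rank 1.
my @SINGLE_COMPONENT = ('Demo::Location::Simple', 'Demo::Location::Fuzzy');

# Decide how a location object would be stored:
# 'split', 'single', or 'unsupported'.
sub classify_location {
    my ($location) = @_;
    return 'unsupported' unless blessed($location);
    return 'split' if $location->isa('Demo::Location::Split');
    for my $class (@SINGLE_COMPONENT) {
        return 'single' if $location->isa($class);
    }
    return 'unsupported';
}
```

Adding another single-component class then means adding one entry to
the list rather than another elsif branch.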
                                                                                                                    
                                                                                                                   
Once I fixed this, the only thing that broke was around line 208.
Some locations passing through this section of code were missing either
a start or an end.  This is probably normal behavior for Fuzzy locations,
but I mention it in case it is not.  The $start and $end variables were
set to the null string, and the SQL insert statement they were
interpolated into failed.  A failure in depositing one entry
would terminate the script (but did not undo prior inserts).

With a two-line hack circa line 208 I sidestepped outright failures.  I just
forced uninitialized endpoints to zero:

	# -- start snippet -------

	unless ($end) { $end=0; }       ## ADDED THESE TWO
	unless ($start) { $start=0; }   ## LINES HERE

   	my $sth = $self->prepare("insert into seqfeature_location
(seqfeature_location_id,seqfeature_id,seq_start,seq_end,seq_strand,location_rank) VALUES (NULL,$seqfeature_id,$start,$end,$strand,$rank)");
                                      
	# -- end snippet ---------
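
Storing 0 makes a missing endpoint indistinguishable from a real
coordinate of zero.  One possible alternative, sketched here, is to pass
undef through bind placeholders, which DBI stores as SQL NULL.  This
assumes the seq_start and seq_end columns accept NULL, which has not been
checked against the bioperl-db schema; endpoint_or_null and
location_insert are hypothetical helper names, and no database is
touched in the sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: map a missing endpoint (undef or '') to undef
# and leave real coordinates, including 0, untouched.
sub endpoint_or_null {
    my ($value) = @_;
    return undef unless defined $value && length $value;
    return $value;
}

# Build the placeholder SQL and the bind list for one location row.
sub location_insert {
    my ($seqfeature_id, $start, $end, $strand, $rank) = @_;
    my $sql = 'INSERT INTO seqfeature_location '
            . '(seqfeature_location_id, seqfeature_id, seq_start, '
            . 'seq_end, seq_strand, location_rank) '
            . 'VALUES (NULL, ?, ?, ?, ?, ?)';
    my @bind = ($seqfeature_id,
                endpoint_or_null($start),
                endpoint_or_null($end),
                $strand, $rank);
    return ($sql, @bind);
}

# Example: a location whose start is missing (illustrative values only).
my ($sql, @bind) = location_insert(42, '', 118, 1, 1);

# With a DBI handle this would be run as:
#   $dbh->prepare($sql)->execute(@bind);
```

Placeholders also avoid interpolating values directly into the SQL
string, so an empty endpoint can no longer produce a malformed statement.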
                
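Of course, all I have really done is provide a completely buggy
persistence of Fuzzy objects: a missing endpoint is now stored as 0, and
nothing in the stored row records that the location was fuzzy.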

My guess is that SeqLocationAdaptor needs to be upgraded to handle the
Fuzzy locations that Bioperl wants to make out of the Swissprot input.
Is anyone already undertaking this?  Does anyone have any insight about what
problems this hack of mine will cause downstream?


-------------------------------
T.D. Houfek
(email sound-alike: tdhoufek-AT-unity-DOT-ncsu-DOT-edu)
bioinformatics development lead
Tobacco Genome Initiative
North Carolina State University
-------------------------------


