[Bioperl-l] load_seqdatabase.pl does not like fasta format

Andy Hammer facemann at yahoo.com
Mon Jun 14 19:03:40 EDT 2004


Thanks all, 

Marc, your one line of code:
$seq->accession_number($display_id);
was all that it took to make the script work.  But now
I am a bit more aware of how biosql is storing the
information.

Thank you for the lesson!

Andy

--- Marc Logghe <Marc.Logghe at devgen.com> wrote:
> Hi Andy,
> 
> > your fasta sequence was 'unknown'. Since the triple of
> > (accession,version,namespace) is constrained by and used as a unique
> > key, and given that fasta doesn't provide version numbers, your
> > sequences will all be considered identical if the accession is
> > 'unknown' for all of them. I.e., after the first one is inserted, the
> > second one and all others will fail to insert.
> That is because when you load from fasta, the seqID
> goes into the bioperl display_name slot and finally
> into the biosql name field.
> The accession number (bioperl accession_number slot)
> is empty and set to 'unknown' by default. As this slot
> ends up in the accession field in the biosql schema,
> you run into trouble because EVERY accession
> will be 'unknown'.
> I solved this by adding a --pipeline argument (e.g.
> Bio::SeqProcessor::Accession) with a really simple
> SeqProcessor that copies the display_name into the
> accession_number slot:
> 
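> package Bio::SeqProcessor::Accession;
> use strict;
> use vars qw(@ISA);
> use Bio::Seq::BaseSeqProcessor;
> # Inherit the stream-handling plumbing from BaseSeqProcessor.
> @ISA = qw(Bio::Seq::BaseSeqProcessor);
>
> # Called once per sequence: copy the fasta ID (display_id) into
> # accession_number so each record gets a distinct accession.
> sub process_seq {
>     my ($self, $seq) = @_;
>     my $display_id = $seq->display_id;
>     $seq->accession_number($display_id);
>     return ($seq);
> }
>
> 1;  # a Perl module must return a true value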
> 
> 
> HTH,
> Marc
> 
> 
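To make the collision described above concrete, here is a minimal Perl sketch that needs neither BioPerl nor a database. It only simulates the (accession, version, namespace) unique key with a hash. The function `load_records`, the record fields, the namespace 'test', and the version value of 0 are all hypothetical stand-ins. None of them are BioSQL's or load_seqdatabase.pl's actual code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical simulation of a unique key on (accession, version, namespace).
# This is NOT the real BioSQL schema or loader, just an illustration.
sub load_records {
    my ($records, $copy_display_id) = @_;
    my %seen;        # stands in for the database's unique index
    my @inserted;    # display IDs that "made it into" the table
    for my $rec (@$records) {
        my %r = %$rec;    # copy, so the caller's data is untouched
        # Without the fix every fasta record gets the default 'unknown';
        # with it, the display ID is copied into the accession, as the
        # SeqProcessor above does.
        $r{accession} = $copy_display_id ? $r{display_id} : 'unknown';
        my $key = join "\t", @r{qw(accession version namespace)};
        next if $seen{$key}++;    # duplicate key: the insert "fails"
        push @inserted, $r{display_id};
    }
    return @inserted;
}

# Fasta gives no version; 0 is an assumed placeholder value here.
my @fasta = map { +{ display_id => $_, version => 0, namespace => 'test' } }
            qw(seq1 seq2 seq3);

my @without = load_records(\@fasta, 0);
my @with    = load_records(\@fasta, 1);
printf "without fix: %d inserted\n", scalar @without;
printf "with fix:    %d inserted\n", scalar @with;
```

Run as-is, only the first record survives without the copy step, while all three are kept once each accession is distinct. That matches the "first one inserted, all others fail" behaviour described in the thread.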