NCBI fasta format [was: Re: [Bioperl-l] loading data into bioperl-db]
Hilmar Lapp
hlapp at gnf.org
Fri Jun  6 15:33:46 EDT 2003

> -----Original Message-----
> From: Aaron J Mackey [mailto:ajm6q at virginia.edu] 
> Sent: Friday, June 06, 2003 1:07 PM
> To: Bioperl
> Subject: NCBI fasta format [was: Re: [Bioperl-l] loading data into bioperl-db]
> 
[...]
> 
> It should make loading up biosql databases from flatfiles a 
> bit easier, too.
> 
> Any lurkers want to write Bio::SeqIO::fasta_ncbi.pm (inheriting from
> Bio::SeqIO::fasta) ??  I guess we'd have to agree on where 
> the "db" and any secondary accession/names would be stored in 
> which Seq model ...
> 
Or, as I pointed out earlier, you could write a module derived from
Bio::Seq::BaseSeqProcessor:
package MySeqProcessor;
use vars qw(@ISA);
use strict;
use Bio::Seq::BaseSeqProcessor;
@ISA = qw(Bio::Seq::BaseSeqProcessor);
# this is the only method you need to override
sub process_seq{
	my $self = shift; my $seq = shift;
	# split the pipe-delimited display ID into its fields
	my @idflds = split(/\|/,$seq->display_id);
	if(@idflds > 1) {
		# the second field is stored as the namespace
		$seq->namespace($idflds[1]);
		# the last field is the accession, possibly with a .version suffix
		my ($acc,$v) = ($idflds[-1]);
		if($acc =~ /^(.*)\.(\d{1,2})$/) {$acc = $1; $v = $2;}
		$seq->accession_number($acc);
		$seq->version($v);
	}
	# I could massage many more things here
	# when done, return it
	return $seq;
}
1;
__END__
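If you want to check the accession/version parsing from process_seq without a BioPerl install, the same logic can be pulled out into a plain function. This is a minimal sketch; split_acc_version is a hypothetical helper name, and the ID used is made up:

```perl
use strict;
use warnings;

# Hypothetical helper that mirrors the accession/version logic in
# process_seq above: take the last pipe-delimited field of an ID and
# split off a trailing ".N" or ".NN" version suffix, if present.
sub split_acc_version {
    my ($id) = @_;
    # split drops trailing empty fields, so an ID ending in "|"
    # still has the accession as its last element
    my @idflds = split(/\|/, $id);
    my ($acc, $v) = ($idflds[-1]);
    if ($acc =~ /^(.*)\.(\d{1,2})$/) {
        ($acc, $v) = ($1, $2);
    }
    # $v stays undef when there is no version suffix
    return ($acc, $v);
}

# made-up ID in NCBI-style pipe-delimited form
my ($acc, $v) = split_acc_version("gi|12345|ref|NP_999999.2|");
print "$acc $v\n";    # prints "NP_999999 2"
```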
Then, to use it, you'd do:
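	use Bio::SeqIO;
	use MySeqProcessor;
	# open your SeqIO stream as you otherwise would; file name and format are placeholders
	my $seqio = Bio::SeqIO->new(-file => "seqs.fa", -format => "fasta");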
	my $pipe = MySeqProcessor->new(-source_stream => $seqio);
	# treat $pipe as if it were a SeqIO stream
	while(my $seq = $pipe->next_seq()) {
		# whatever
	}
	$pipe->close; # cascades
Or, to load via load_seqdatabase.pl:
	$ load_seqdatabase.pl <your normal options here> \
	                      --pipeline "MySeqProcessor"
The advantage is that you can modify and tweak it at any time and plug
it back in (no make/install, no messing with Perl libraries), and you
can use it with any format, not just fasta.
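Why $pipe can stand in for a SeqIO stream: roughly, the processor wraps its source stream, so each next_seq call pulls a sequence from the source, runs it through process_seq, and hands back the result, and close is passed on to the source. The toy sketch below mimics that pattern in plain Perl so it runs without BioPerl; ToyStream, ToyProcessor and ToyIdSplitter are hypothetical names, and this illustrates the idea rather than reproducing Bio::Seq::BaseSeqProcessor's actual code.

```perl
use strict;
use warnings;

{
    # Hypothetical source stream: hands out hash-based "sequences" from a list.
    package ToyStream;
    sub new      { my ($class, @seqs) = @_; return bless { seqs => [@seqs], closed => 0 }, $class; }
    sub next_seq { my $self = shift; return shift @{ $self->{seqs} }; }
    sub close    { my $self = shift; $self->{closed} = 1; return 1; }
}

{
    # Hypothetical processor base class: wraps a source stream and offers
    # the same next_seq/close interface, so callers can treat it as a stream.
    package ToyProcessor;
    sub new {
        my ($class, %args) = @_;
        return bless { source => $args{-source_stream}, buffer => [] }, $class;
    }
    # Override point: return zero or more processed sequences.
    sub process_seq { my ($self, $seq) = @_; return $seq; }
    sub next_seq {
        my $self = shift;
        # pull from the source until processing yields output or the source runs dry
        while (!@{ $self->{buffer} }) {
            my $seq = $self->{source}->next_seq() or return;
            push @{ $self->{buffer} }, $self->process_seq($seq);
        }
        return shift @{ $self->{buffer} };
    }
    # close cascades to the wrapped source stream
    sub close { my $self = shift; return $self->{source}->close(); }
}

{
    # Hypothetical subclass: like MySeqProcessor above, it only overrides process_seq.
    package ToyIdSplitter;
    our @ISA = ('ToyProcessor');
    sub process_seq {
        my ($self, $seq) = @_;
        my @idflds = split(/\|/, $seq->{id});
        $seq->{accession} = $idflds[-1];
        return $seq;
    }
}

my $source = ToyStream->new({ id => "gi|1|ref|AAA1|" }, { id => "gi|2|ref|BBB2|" });
my $pipe   = ToyIdSplitter->new(-source_stream => $source);
while (my $seq = $pipe->next_seq()) {
    print "$seq->{accession}\n";    # prints AAA1, then BBB2
}
$pipe->close();    # also closes $source
```

Because the toy processor buffers whatever process_seq returns, an override in this sketch could return an empty list to drop a sequence or several sequences to split one.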
	-hilmar
> -Aaron