NCBI fasta format [was: Re: [Bioperl-l] loading data into bioperl-db]

Hilmar Lapp hlapp at gnf.org
Fri Jun 6 15:33:46 EDT 2003



> -----Original Message-----
> From: Aaron J Mackey [mailto:ajm6q at virginia.edu] 
> Sent: Friday, June 06, 2003 1:07 PM
> To: Bioperl
> Subject: NCBI fasta format [was: Re: [Bioperl-l] loading data 
> into bioperl-db]
> 
[...]
> 
> It should make loading up biosql databases from flatfiles a 
> bit easier, too.
> 
> Any lurkers want to write Bio::SeqIO::fasta_ncbi.pm (inheriting from
> Bio::SeqIO::fasta) ??  I guess we'd have to agree on where 
> the "db" and any secondary accession/names would be stored in 
> which Seq model ...
> 

Or, as I pointed out earlier, you'd write a Bio::Seq::BaseSeqProcessor-derived
module:

package MySeqProcessor;
use strict;
use vars qw(@ISA);   # declare @ISA so the assignment below passes 'use strict'
use Bio::Seq::BaseSeqProcessor;
@ISA = qw(Bio::Seq::BaseSeqProcessor);

# this is the only method you need to override
sub process_seq{
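	my ($self, $seq) = @_;
	# split the '|'-delimited display id into its fields
	my @idflds = split(/\|/, $seq->display_id);
	if (@idflds > 1) {
		# the second field is taken as the namespace
		$seq->namespace($idflds[1]);
		# the last field is the accession, optionally with a .version suffix
		my ($acc, $v) = ($idflds[-1]);
		if ($acc =~ /^(.*)\.(\d{1,2})$/) { $acc = $1; $v = $2; }
		$seq->accession_number($acc);
		$seq->version($v);
	}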
	# I could massage many more things here 
	# when done, return it
	return $seq;
}
1;
__END__
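To see what the field splitting in process_seq() actually extracts, here is a minimal pure-Perl sketch that applies the same logic to a bare id string, with no BioPerl objects involved. The helper name parse_id and the sample id (an NCBI-style `gi|5524211|gb|AAD44166.1|` definition-line id) are assumptions for illustration only. Note that for this gi-prefixed form the second field is the GI number rather than the database tag, so depending on which id layout your files use you may want `$idflds[2]` for the namespace instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: mirrors the field logic of process_seq() above,
# but works on a plain string so the result can be inspected directly.
sub parse_id {
    my ($display_id) = @_;
    my @idflds = split(/\|/, $display_id);   # trailing empty fields are dropped
    return () unless @idflds > 1;             # not a '|'-delimited id
    my $namespace = $idflds[1];               # second field, as in process_seq()
    my ($acc, $v) = ($idflds[-1]);            # last field holds the accession
    if ($acc =~ /^(.*)\.(\d{1,2})$/) { $acc = $1; $v = $2; }
    return ($namespace, $acc, $v);
}

my ($ns, $acc, $v) = parse_id('gi|5524211|gb|AAD44166.1|');
print "namespace=$ns accession=$acc version=$v\n";
# prints: namespace=5524211 accession=AAD44166 version=1
```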

And then you'd do

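	use Bio::SeqIO;
	use MySeqProcessor;   # assumes MySeqProcessor.pm is on @INC
	# open the SeqIO stream as you otherwise would; "seqs.fa" is just an example
	my $seqio = Bio::SeqIO->new(-file => "seqs.fa", -format => 'fasta');
	my $pipe = MySeqProcessor->new(-source_stream => $seqio);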
	# treat $pipe as if it were a SeqIO stream
	while(my $seq = $pipe->next_seq()) {
		# whatever
	}
	$pipe->close; # cascades
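The wrapping is a simple delegation pattern: the processor holds the source stream, pulls from it in next_seq(), runs each item through process_seq(), and forwards close() to the source. The sketch below illustrates that idea in plain Perl with two hypothetical classes (ListStream standing in for a SeqIO stream, Processor standing in for a BaseSeqProcessor subclass) operating on strings. It is a simplified illustration of the pattern, not BioPerl's actual implementation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a SeqIO stream: hands out items from a list.
package ListStream;

sub new {
    my ($class, @items) = @_;
    return bless { items => [@items], closed => 0 }, $class;
}

# Returns the next item, or undef once the list is exhausted.
sub next_seq {
    my $self = shift;
    return shift @{ $self->{items} };
}

sub close {
    my $self = shift;
    $self->{closed} = 1;
}

# Hypothetical processor: wraps a source stream behind the same interface.
package Processor;

sub new {
    my ($class, %args) = @_;
    return bless { source => $args{-source_stream}, buffer => [] }, $class;
}

# The hook a subclass would override; here it just upper-cases the string.
# It may return zero, one or more items.
sub process_seq {
    my ($self, $seq) = @_;
    return (uc $seq);
}

sub next_seq {
    my $self = shift;
    # Refill the buffer until process_seq() yields something or the source runs dry.
    until (@{ $self->{buffer} }) {
        my $seq = $self->{source}->next_seq();
        return undef unless defined $seq;
        push @{ $self->{buffer} }, $self->process_seq($seq);
    }
    return shift @{ $self->{buffer} };
}

# Closing the processor cascades to the stream it wraps.
sub close {
    my $self = shift;
    $self->{source}->close();
}

package main;

my $src  = ListStream->new(qw(acgt ggcc));
my $pipe = Processor->new(-source_stream => $src);
while (my $seq = $pipe->next_seq()) {
    print "$seq\n";    # prints ACGT, then GGCC
}
$pipe->close();        # also marks $src as closed
```

In this sketch, because Processor exposes the same next_seq()/close() interface as ListStream, one Processor could itself be passed as another's -source_stream.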

Or, to load via load_seqdatabase.pl:

	$ load_seqdatabase.pl <your normal options here> \
	                      --pipeline "MySeqProcessor"

The advantage is that you can modify and tweak it at any time and plug it
back in (no make/install, no messing with Perl libraries), and that you
can use it with any input format, not just fasta.

	-hilmar


> -Aaron
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org 
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 


