[Bioperl-l] Bio::SeqIO, genbank -> fasta, protein only?

Jay Hannah jay at jays.net
Fri Oct 13 16:24:48 UTC 2006


So I'm doing the following:

1) Using Bio::SeqIO to read in a genbank file and kick out fasta.
2) Reading that fasta file w/ command line formatdb.
3) Using that output for command line blastall.
4) Using Bio::SearchIO to read the blast results.

(If there's a better way, do tell. -grin-)

This sequence is working great for nucleotide BLASTing, but I'm stuck on step 1 when trying protein BLAST. 

my $seq_in  = Bio::SeqIO->new(
   -file => "<Organism1.genbank", 
   -format => "genbank", 
   -alphabet => "protein"
);
my $seq_out_protein = Bio::SeqIO->new(
   -file => ">out",
   -format => 'fasta',
   -alphabet => 'protein'
);
while (my $inseq = $seq_in->next_seq) {
   $inseq->molecule("protein");
   $seq_out_protein->write_seq($inseq);
}

This creates a nucleotide file "out". Setting -alphabet doesn't seem to do anything. Setting molecule("protein") doesn't seem to do anything either.

I was expecting that it would just pull all the CDS strings out of the genbank file and dump those into fasta format?

Am I missing something obvious?

Thanks,

j




More information about the Bioperl-l mailing list