[Bioperl-l] GenBank Parser

Hilmar Lapp hlapp@gnf.org
Tue, 31 Dec 2002 01:47:56 -0800


The GenBank parser in BioPerl does all of this, including the nested 
join() and complement() locations. Code snippets for exactly your 
problem have been posted on the mailing list a number of times, so you 
may want to search the archives. Roughly, it goes along the following 
lines.

use strict;
use warnings;
use Bio::SeqIO;

# read GenBank-formatted entries from standard input
my $in = Bio::SeqIO->new(-format => 'genbank', -fh => \*STDIN);
while (my $seq = $in->next_seq()) {
	foreach my $feat ($seq->get_SeqFeatures()) {
		# skip everything that is not a CDS feature
		next unless $feat->primary_tag eq 'CDS';
		# nucleotide sequence of the transcript as a Bio::Seq object
		# (sub-locations joined, complement taken where needed)
		my $mrna = $feat->spliced_seq();
		# protein sequence as a Bio::Seq object
		my $prot = $mrna->translate();
		# do whatever you like with those Seq objects
	}
}
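
As one example of "whatever you like", here is a rough sketch that 
writes each translated CDS out as FASTA and puts the feature's location 
string into the description line. The sub name write_cds_proteins and 
the "<entry id>_cds<n>" naming scheme are made up for illustration; 
treat it as a starting point, not as the one right way to do it.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Write the translation of every CDS feature found in $in to $out.
# Returns the number of protein records written.
# (write_cds_proteins is a hypothetical helper, not part of BioPerl.)
sub write_cds_proteins {
	my ($in, $out) = @_;
	my $written = 0;
	while (my $seq = $in->next_seq()) {
		my $n = 0;
		foreach my $feat ($seq->get_SeqFeatures()) {
			next unless $feat->primary_tag eq 'CDS';
			my $prot = $feat->spliced_seq()->translate();
			# made-up ID scheme: <entry id>_cds<counter>
			$prot->display_id(sprintf("%s_cds%d", $seq->display_id, ++$n));
			# location in feature-table syntax, e.g. join(...)
			$prot->desc($feat->location->to_FTstring());
			$out->write_seq($prot);
			$written++;
		}
	}
	return $written;
}

# GenBank on STDIN, FASTA on STDOUT
write_cds_proteins(
	Bio::SeqIO->new(-format => 'genbank', -fh => \*STDIN),
	Bio::SeqIO->new(-format => 'fasta',   -fh => \*STDOUT),
);
```

Saved as, say, cds2fasta.pl, you would run it as 
"perl cds2fasta.pl < entry.gb > proteins.fa" (both file names are 
placeholders).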

The respective PODs you want to read are Bio::SeqIO, Bio::Seq, and 
Bio::SeqFeatureI.
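
If you only have the location string itself and want to take it apart 
without reading a whole GenBank entry, the location parser can also be 
called directly. The following is a minimal sketch, assuming your 
BioPerl installation ships Bio::Factory::FTLocationFactory (check your 
version); describe_location is a helper name invented for this example. 
Don't rely on the order in which minus-strand sub-locations come back, 
as I am not certain that is the same across versions.

```perl
use strict;
use warnings;
use Bio::Factory::FTLocationFactory;

# Parse a feature-table location string and return one hash ref
# (start, end, strand) per segment of the location.
# (describe_location is a hypothetical helper, not part of BioPerl.)
sub describe_location {
	my ($locstr) = @_;
	my $loc = Bio::Factory::FTLocationFactory->new->from_string($locstr);
	# split locations (join/order) expose their parts via sub_Location();
	# a simple location is treated as a single segment
	my @parts = $loc->isa('Bio::Location::SplitLocationI')
		? $loc->sub_Location()
		: ($loc);
	return map {
		+{ start => $_->start, end => $_->end, strand => $_->strand }
	} @parts;
}

# one of the location strings from the question
my $example = 'join(complement(<1..799),complement(5080..5120))';
foreach my $part (describe_location($example)) {
	printf "%d..%d on strand %d\n",
		$part->{start}, $part->{end}, $part->{strand};
}
```

The objects returned by from_string() are the same kind of location 
objects you get from $feat->location() in the loop above, so whatever 
you learn from poking at them here carries over.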

	-hilmar

On Monday, December 30, 2002, at 06:02  PM, Drew Stewart wrote:

> Hi Everybody,
> I have just joined the mailing list and am kind of new to
> Bioperl.
> Here is my problem,
> I am trying to write a parser for the GenBank file
> obtained from the NCBI website, specifically for the
> "CDS" feature in the GenBank file.
> I am trying to parse the range given in the "CDS"
> feature to get the nucleotide subsequence from the
> whole genome for a specific protein.
> For example here is a portion of the genbank file I am
> interested in parsing.
>
>   CDS   complement(join(12..78,54..1043))
>
> or
>
> CDS   join(complement(<1..799),complement(5080..5120))
>
> I want to parse this and other possible formats
> "join(complement(<1..799),complement(5080..5120))"
>
> I have seen in the GenBank readme file that there are
> many other possible formats for this CDS feature line
> and so I was wondering if somebody has already written
> a parser for this.
>
> Can anyone please suggest some things I can use?
> It would be a great help. Thank you.
>
> Sincerely,
> Dhruv Bhatt.
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------