[Bioperl-l] extracting CDS portion of RefSeqs

Wed Dec 14 11:18:01 EST 2005

Sorry, I hit send before I finished my email.

Anyway, I want to extract the CDS portion of the human RefSeqs. I
downloaded the most recent RefSeq release in GenBank format, and I have
been extracting the CDS portion this way:

    foreach my $feat ( $seq->get_SeqFeatures() ) {
        if ( $feat->primary_tag eq 'CDS' ) {
            # Take the CDS coordinates from the feature
            my $start = $feat->start;
            my $end   = $feat->end;

            # Cut that span out of the parent sequence
            my $seqstr    = $seq->subseq($start, $end);
            my $displayid = $seq->display_name;
            my $seqobj    = Bio::Seq->new( -display_id => "$displayid:$start..$end",
                                           -seq        => $seqstr );

            # Write the CDS out in FASTA format
            my $out = Bio::SeqIO->new( -format => 'Fasta' );
            $out->write_seq($seqobj);
        }
    }

But this is quite slow, since the RefSeq GenBank file is very large.
Is there any way to download just the CDS portion of RefSeq from NCBI?
Is there a quicker BioPerl solution than the one I have?

Thanks for your help.

Amit

--
Amit Indap
http://www.bscb.cornell.edu/Homepages/Amit_Indap/