[Bioperl-l] Parsing Genbank

Brandi Cantarel bcantarel at som.umaryland.edu
Wed Dec 2 19:29:56 UTC 2009


Here is some of my code; the real code actually enters the data into a database.


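use Bio::SeqIO;

# $gbkfile holds the path to the GenBank file (set elsewhere in the real code)
my $in = Bio::SeqIO->new(-file   => $gbkfile,
                         -format => 'genbank');

# Walk every record in the file, then every top-level feature of each record
W1: while (my $seq = $in->next_seq()) {
  my @feats = $seq->get_all_SeqFeatures();
  my $j = 0;
  F1: foreach my $cds (@feats) {
    # skip everything that is not a CDS feature
    next F1 unless ($cds->primary_tag() eq 'CDS');
    # do something with the cds start and cds end
  }
}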
	 

LOCUS       subjpool12_contig3          974 bp    DNA     linear   UNK 19-Nov-2009
ACCESSION   subjpool12_contig3
KEYWORDS    .
SOURCE      human metagenome
  ORGANISM  human metagenome
            unclassified sequences; organismal metagenomes,metagenomes.
FEATURES             Location/Qualifiers
     source          1..974
                     /mol_type="genomic DNA"
                     /isolation_source="Homo sapiens"
                     /organism="human metagenome"
                     /collection_date="19-Nov-2009"
     CDS             complement(911..974)
                     /locus_tag="subjpool12_contig3|metagene|gene_2"
                     /translation="IRIMTVELINPYIRHVEHST"
                     /score="2.52804"
                     /product="hypothetical protein"
                     /note="score=2.52804"
                     /note="score=2.52804"
                     /note="frame=1"
ORIGIN
#some sequence….




From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64.
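
If it does come down to doing the math, the conversion is a mirror around the sequence length. Below is a minimal sketch in plain Perl, assuming the reported 1..64 is the same interval counted from the other end of the 974 nt sequence. The helper name flip_coords is made up for illustration and is not a BioPerl call.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): mirror a 1-based, inclusive
# interval around a sequence of length $len. Assumes the input interval
# was counted from the opposite end of the sequence.
sub flip_coords {
    my ($start, $end, $len) = @_;
    return ($len - $end + 1, $len - $start + 1);
}

# Values from the example record: reported 1..64 on a 974 nt sequence
my ($start, $end) = flip_coords(1, 64, 974);
print "$start..$end\n";   # prints 911..974
```

Applying the same function twice returns the original interval, so it works in either direction.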



~~~~~~~~~~~~~~~~~~~~
Brandi Cantarel, PhD
Bioinformatics Analyst
Institute for Genome Sciences
School of Medicine
University of Maryland, Baltimore

On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote:

> Hi Brandi-
> If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an ordinary Bio::Seq, that's normal.
> Can you elaborate by posting your code?
> cheers,
> MAJ
> ----- Original Message ----- From: "Brandi Cantarel" <bcantarel at som.umaryland.edu>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, December 02, 2009 1:36 PM
> Subject: [Bioperl-l] Parsing Genbank
> 
> 
>> Hi all,
>> I am not sure if this is normal, but when I use Bio::SeqIO to parse GenBank files, it changes the coordinates of features on the minus strand.
>> 
>> 
>> For example, I have a sequence that has a CDS on the minus strand, and it runs from 911 to 974.  The sequence is 974 nt.
>> 
>> x $cds->start
>> 1
>> x $cds->end
>> 64
>> 
>> How can I get the original coordinates?  Is there a command for that or will I have to just do the math?
>> 
>> Feature or Bug?
>> 
>> 
>> ~~~~~~~~~~~~~~~~~~~~
>> Brandi Cantarel, PhD
>> Bioinformatics Analyst
>> Institute for Genome Sciences
>> School of Medicine
>> University of Maryland, Baltimore
>> 
> 




