[Bioperl-l] Parsing Genbank

Chris Fields cjfields at illinois.edu
Wed Dec 2 21:07:58 UTC 2009


One never knows, but I would be very surprised if this slipped past our test suite, particularly since GBrowse uses SeqFeatures extensively (any such change should have turned up along the way).

There's not much we can do unless we have something that reproduces the problem. It might also help to know where the GenBank file itself came from.

chris

On Dec 2, 2009, at 2:52 PM, Mark A. Jensen wrote:

> Yes, 1.006 is 1.6. There is a later release, 1.6.1, but it sounds
> as if there is a bug. If you can provide data that reproduces
> it, as Chris suggests, we can get onto it.
> thanks MAJ
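The version number printed by the one-liner below is a Perl-style decimal version, in which each group of three digits after the decimal point is one component; read that way, 1.006 is 1.6.0, and 1.6.1 would appear as 1.006001 (assuming BioPerl follows this usual Perl convention). A minimal core-Perl sketch of the conversion; `decimal_to_dotted` is a hypothetical helper, not part of BioPerl:

```perl
use strict;
use warnings;

# Convert a Perl-style decimal version (e.g. "1.006001") into dotted form
# ("1.6.1"). Each group of three digits after the decimal point is one
# component; the fraction is right-padded with zeros to whole 3-digit groups.
# Hypothetical helper for illustration; not part of BioPerl.
sub decimal_to_dotted {
    my ($decimal) = @_;
    my ($major, $frac) = split /\./, $decimal, 2;
    $frac = '' unless defined $frac;
    $frac .= '0' while length($frac) % 3;    # pad to whole 3-digit groups
    my @parts = ($major + 0, map { $_ + 0 } $frac =~ /(\d{3})/g);
    push @parts, 0 while @parts < 3;         # always show major.minor.patch
    return join '.', @parts;
}

print decimal_to_dotted('1.006'), "\n";      # 1.6.0
print decimal_to_dotted('1.006001'), "\n";   # 1.6.1
```

The core `version` module applies the same three-digit grouping when it normalizes decimal versions, so it can be used instead of a hand-rolled helper.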
>  ----- Original Message ----- 
>  From: Brandi Cantarel 
>  To: Mark A. Jensen 
>  Sent: Wednesday, December 02, 2009 3:38 PM
>  Subject: Re: [Bioperl-l] Parsing Genbank
> 
> 
>  How can I tell which version I am using? When I use the command from the website:
> 
> 
>  perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION'
> 
> 
>  I get 1.006, but the BioPerl lib was updated in July, so it is probably version 1.6.0, since that was the last stable release.
> 
> 
>  Brandi
> 
>  On Dec 2, 2009, at 2:48 PM, Mark A. Jensen wrote:
> 
> 
>    With fake sequence data and that header, I don't see the problem:
> 
>    DB<2> x $cds->location
>    0  Bio::Location::Simple=HASH(0x37b1df4)
>     '_end' => 974
>     '_location_type' => 'EXACT'
>     '_root_verbose' => 0
>     '_seqid' => 'subjpool12_contig3'
>     '_start' => 911
>     '_strand' => '-1'
> 
>    Are you using the latest BioPerl (1.6.1 or the trunk) ?
>    MAJ
>    ----- Original Message ----- From: "Brandi Cantarel" <bcantarel at som.umaryland.edu>
>    Cc: <bioperl-l at lists.open-bio.org>
>    Sent: Wednesday, December 02, 2009 2:29 PM
>    Subject: Re: [Bioperl-l] Parsing Genbank
> 
> 
>    Here is some of my code; the real code actually enters the data into a database.
> 
> 
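>    use strict;
>    use warnings;
>    use Bio::SeqIO;
> 
>    my $gbkfile = shift @ARGV;    # assumed: GenBank file path given on the command line
>    my $in = Bio::SeqIO->new(-file   => $gbkfile,
>                             -format => 'genbank');
> 
>    W1: while (my $seq = $in->next_seq()) {
>        # flattened list of all features attached to this sequence
>        my @feats = $seq->get_all_SeqFeatures();
>        my $j = 0;
>        F1: foreach my $cds (@feats) {
>            next F1 unless $cds->primary_tag() eq 'CDS';
>            ###>> debugger stops here for above output
> 
>            # do something with the CDS start and end:
>            # $cds->start, $cds->end (and $cds->strand for orientation)
>        }
>    }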
> 
> 
>    LOCUS       subjpool12_contig3          974 bp    DNA     linear   UNK 19-Nov-2009
>    ACCESSION   subjpool12_contig3
>    KEYWORDS    .
>    SOURCE      human metagenome
>    ORGANISM  human metagenome
>              unclassified sequences; organismal metagenomes,metagenomes.
>    FEATURES             Location/Qualifiers
>       source          1..974
>                       /mol_type="genomic DNA"
>                       /isolation_source="Homo sapiens"
>                       /organism="human metagenome"
>                       /collection_date="19-Nov-2009"
>       CDS             complement(911..974)
>                       /locus_tag="subjpool12_contig3|metagene|gene_2"
>                       /translation="IRIMTVELINPYIRHVEHST"
>                       /score="2.52804"
>                       /product="hypothetical protein"
>                       /note="score=2.52804"
>                       /note="score=2.52804"
>                       /note="frame=1"
>    ORIGIN
>    #some sequence….
> 
>      From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64.
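The location line in the record above, `complement(911..974)`, already carries the wanted coordinates: in GenBank feature tables a minus-strand range is still written low..high, with `complement()` marking the strand, which is what Mark's debugger dump shows BioPerl producing (start 911, end 974, strand -1). A minimal core-Perl sketch of that reading, handling only a single exact range; `parse_simple_location` is a hypothetical helper for illustration, not BioPerl's parser, which also handles joins and fuzzy ends:

```perl
use strict;
use warnings;

# Parse a simple GenBank location such as "911..974" or
# "complement(911..974)" into (start, end, strand). Only a single exact
# range is supported; join(), order() and fuzzy (<, >) locations die.
# Hypothetical helper for illustration; not part of BioPerl.
sub parse_simple_location {
    my ($loc) = @_;
    my $strand = 1;
    if ($loc =~ /^complement\((.+)\)$/) {
        $strand = -1;
        $loc    = $1;
    }
    my ($start, $end) = $loc =~ /^(\d+)\.\.(\d+)$/
        or die "unsupported location: $loc\n";
    return ($start + 0, $end + 0, $strand);
}

my ($start, $end, $strand) = parse_simple_location('complement(911..974)');
print "start=$start end=$end strand=$strand\n";   # start=911 end=974 strand=-1
```

Start stays the smaller coordinate on either strand; the strand value, not the order of start and end, records the orientation.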
> 
>    ~~~~~~~~~~~~~~~~~~~~
>    Brandi Cantarel, PhD
>    Bioinformatics Analyst
>    Institute for Genome Sciences
>    School of Medicine
>    University of Maryland, Baltimore
> 
>    On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote:
> 
> 
>      Hi Brandi-
> 
>      If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an ordinary Bio::Seq, that's normal.
> 
>      Can you elaborate by posting your code?
> 
>      cheers,
> 
>      MAJ
> 
>      ----- Original Message ----- From: "Brandi Cantarel" <bcantarel at som.umaryland.edu>
>      To: <bioperl-l at lists.open-bio.org>
>      Sent: Wednesday, December 02, 2009 1:36 PM
>      Subject: [Bioperl-l] Parsing Genbank
> 
>        Hi all,
> 
>        I am not sure if this is normal, but when I use SeqIO to parse GenBank files, it changes the coordinates of features on the minus strand.
> 
>        For example, I have a sequence with a CDS on the minus strand that runs from 911 to 974. The sequence is 974 nt long.
> 
> 
> 
>        x $cds->start
>        1
>        x $cds->end
>        64
> 
> 
> 
>        How can I get the original coordinates?  Is there a command for that or will I have to just do the math?
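If, as Mark suggests, `$cds` holds a sequence object for the feature rather than the feature itself, its coordinates run 1..64 relative to the feature's own sequence, and on the minus strand that sequence is read in reverse. A minimal sketch of the arithmetic back to the parent sequence; `local_to_parent` is a hypothetical helper, not a BioPerl method:

```perl
use strict;
use warnings;

# Map a 1-based position within a feature's own sequence back to the parent
# sequence. On the minus strand the feature sequence is reverse-complemented,
# so local position 1 corresponds to the feature's end coordinate.
# Hypothetical helper for illustration; not a BioPerl method.
sub local_to_parent {
    my ($pos, $feat_start, $feat_end, $strand) = @_;
    return $strand == -1
        ? $feat_end - $pos + 1
        : $feat_start + $pos - 1;
}

# CDS at complement(911..974) on a 974-nt sequence; local range 1..64
my @ends = map { local_to_parent($_, 911, 974, -1) } (1, 64);
my ($start, $end) = sort { $a <=> $b } @ends;
print "$start..$end\n";   # 911..974
```

With a genuine feature object, `$cds->start`, `$cds->end` and `$cds->strand` should already return 911, 974 and -1, as in Mark's debugger output, so this arithmetic is only needed for positions taken from the extracted sequence.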
> 
> 
> 
>        Feature or Bug?
> 
>        ~~~~~~~~~~~~~~~~~~~~
> 
>        Brandi Cantarel, PhD
> 
>        Bioinformatics Analyst
> 
>        Institute for Genome Sciences
> 
>        School of Medicine
> 
>        University of Maryland, Baltimore
> 
>        _______________________________________________
> 
>        Bioperl-l mailing list
> 
>        Bioperl-l at lists.open-bio.org
> 
>        http://lists.open-bio.org/mailman/listinfo/bioperl-l