[Bioperl-l] bp_genbank2gff3.pl

David Breimann david.breimann at gmail.com
Sat Sep 18 10:20:50 UTC 2010


Hi Scott,

Here is a very short GenBank file:
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk

Note that all genes in the GenBank file have locus tags. In the resulting GFF3,
however, only the last gene (EcE24377A_B0005) gets a locus_tag. I have no
idea why it deserves special treatment... :)

P.S. Making this change (i.e., copying locus_tag to the last column of the GFF3
whenever it is available) would really make my life easier.

Thank you,
Dave

On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain <scott at scottcain.net> wrote:

> Hi Dave,
>
> That seems perfectly reasonable.  If you could point out a GenBank
> entry for which that does not happen, I could try to figure out why
> not.
>
> Scott
>
>
> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
> <david.breimann at gmail.com> wrote:
> > Since locus_tag is an essential tag in GenBank, I suggest that locus_tag
> > always be added to the last column of the GFF if it exists in the GenBank
> > file, whether or not it is used as the ID in the GFF.
> >
> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain <scott at scottcain.net> wrote:
> >>
> >> Hi Dave,
> >>
> >> bp_genbank2gff3.pl suffers from the fact that it has to deal with
> >> GenBank files :-)  It was designed initially to work on whole genome
> >> refseqs, and contains several ad hoc rules for trying to make it "do
> >> the right thing."  In practice, it is not unusual for a post-processing
> >> step (either by hand or a quick Perl script) to be required to really
> >> get it right.  I don't recall the specifics (if I ever knew :-) of when
> >> and how the locus tag is used, but I do know that there is a list of
> >> things that it will try to use for the ID, and while the locus tag is on
> >> that list, I don't know where it falls in the order, so it's possible
> >> that other items supersede it.
> >>
> >> Scott
> >>
> >>
> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
> >> <david.breimann at gmail.com> wrote:
> >> > Hello,
> >> >
> >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
> >> > `locus_tag` in the fields and sometimes it doesn't, even though the
> >> > GenBank file has a locus tag.
> >> > Also, is the ID always equivalent to the locus tag?
> >> >
> >> > Thanks,
> >> > Dave
> >> > _______________________________________________
> >> > Bioperl-l mailing list
> >> > Bioperl-l at lists.open-bio.org
> >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >> >
> >>
> >>
> >>
> >> --
> >> ------------------------------------------------------------------------
> >> Scott Cain, Ph. D.                                   scott at scottcain dot net
> >> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> >> Ontario Institute for Cancer Research
> >
> >