[Bioperl-l] bp_genbank2gff3.pl

Scott Cain scott at scottcain.net
Sat Sep 18 13:40:35 UTC 2010


Hi Dave,

Let's keep the discussion on the mailing list so we can make sure that
when this problem is solved, its resolution will be archived.

I don't really understand what is going on either, though it would
probably be a good idea to set your PERL5LIB environment variable so
that when you execute this script from the git repository, it also
uses the BioPerl modules in the git repository instead of the ones
that are installed in your "normal" path.
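
Something along these lines should do it. This is only a sketch: the
checkout path is an assumption taken from the command quoted below,
BIOPERL_LIVE is a variable name made up for illustration, and it
assumes the Bio/ directory sits at the top level of the checkout. The
last command prints which copy of Bio/SeqIO.pm Perl actually loads.

```shell
# Assumed location of the git checkout; adjust to wherever yours lives.
BIOPERL_LIVE="${BIOPERL_LIVE:-$HOME/src/bioperl-live}"

# Put the checkout first on Perl's module search path, keeping any
# entries that were already in PERL5LIB.
PERL5LIB="$BIOPERL_LIVE${PERL5LIB:+:$PERL5LIB}"
export PERL5LIB

# Report which Bio/SeqIO.pm Perl loads; it should be the one in the checkout.
perl -MBio::SeqIO -e 'print $INC{"Bio/SeqIO.pm"}, "\n"' 2>/dev/null \
  || echo "Bio::SeqIO could not be loaded; check the path above"
```

If the path printed is not under the checkout, then the checkout path
is wrong or Bio/ is not where I assumed it would be.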

Also, are you using any command line flags when executing it?  I didn't.

Scott


On Sat, Sep 18, 2010 at 2:14 PM, David Breimann
<david.breimann at gmail.com> wrote:
> Yes, I'm using Ubuntu 10.04.
>
> That is really weird. I tried running the script from the bioperl-live dir
> (which I just pulled using git), and I get the same results as before
> (`Name` instead of `locus_tag`):
>
>  $ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>  $ /home/dave/src/bioperl-live/blib/script/bp_genbank2gff3.pl -y NC_009789.genbank
>
> Attached is the resulting GFF3.
> I also attach a copy of bp_genbank2gff3.pl as found under
> /home/dave/src/bioperl-live/blib/script.
>
> This is a real mystery for me!
>
> On Sat, Sep 18, 2010 at 2:54 PM, Scott Cain <scott at scottcain.net> wrote:
>>
>> Typically I do build and install, but you can run it directly from the
>> git checkout directory.
>>
>> For locating other versions of the script, are you running Linux?  If
>> so, are you familiar with the "locate" command?
>>
>>  locate bp_genbank2gff3.pl
>>
>> If you've never used it before, you may need to update the database
>> the locate command uses as root:
>>
>>  sudo updatedb
>>
>> Scott
>>
>>
>> On Sat, Sep 18, 2010 at 1:46 PM, David Breimann
>> <david.breimann at gmail.com> wrote:
>> > Your GFF seems fine. I get a very similar one, but with `Name=` instead
>> > of `locus_tag=`.
>> >
>> > I don't really know how to check for multiple bioperl installations.
>> > I'm using my personal server, so I don't mind removing and installing
>> > everything from scratch -- but I don't know how to do that.
>> >
>> > Also, what I don't get with git is how the scripts are supposed to be
>> > updated (unless you build and install).
>> >
>> > Thank you!
>> >
>> > On Sat, Sep 18, 2010 at 2:38 PM, Scott Cain <scott at scottcain.net> wrote:
>> >>
>> >> Well, if you aren't getting the same results as me then I'd say you
>> >> aren't using the same version of the script :-)
>> >>
>> >> Unfortunately, the scripts are no longer automatically marked with the
>> >> "internal" version information when committed, so there really isn't
>> >> anything in the script I can tell you to look for.  Check for more
>> >> than one bioperl instance on your computer.
>> >>
>> >> I've attached the GFF3 file I got so you can look at it and tell me if
>> >> it is what you expect.
>> >>
>> >> Scott
>> >>
>> >>
>> >>
>> >> On Sat, Sep 18, 2010 at 12:26 PM, David Breimann
>> >> <david.breimann at gmail.com> wrote:
>> >> > Hi Scott,
>> >> >
>> >> > I just pulled the latest bioperl-live using git.
>> >> > I'm not sure how the scripts are updated, so I built and installed
>> >> > anyway (perhaps exporting the path is supposed to be enough?).
>> >> > Anyway, I still get the same results. No locus_tag.
>> >> > How can I tell if I'm using the latest version of the script?
>> >> >
>> >> > Thanks again.
>> >> >
>> >> > On Sat, Sep 18, 2010 at 1:07 PM, Scott Cain <scott at scottcain.net>
>> >> > wrote:
>> >> >>
>> >> >> Hi Dave,
>> >> >>
>> >> >> A fresh "pull" of the bioperl git repository shows that
>> >> >> bp_genbank2gff3.pl already does this.  It creates a locus_tag for
>> >> >> all features that have a locus_tag, and uses the locus_tag for the
>> >> >> ID when it can (it can't blindly use the locus tag for the ID since
>> >> >> both the gene and the CDS have the same tag).
>> >> >>
>> >> >> Scott
>> >> >>
>> >> >>
>> >> >> On Sat, Sep 18, 2010 at 11:20 AM, David Breimann
>> >> >> <david.breimann at gmail.com> wrote:
>> >> >> > Hi Scott,
>> >> >> >
>> >> >> > Here is a very short genbank:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>> >> >> >
>> >> >> > Note that all genes in the GenBank file have locus tags. In the
>> >> >> > resulting GFF3, however, only the last gene (EcE24377A_B0005) gets
>> >> >> > a locus_tag. I have no idea why it deserves special treatment... :)
>> >> >> >
>> >> >> > p.s. making this change (i.e., copying locus_tag to the GFF3 last
>> >> >> > column whenever available) will really make my life easier.
>> >> >> >
>> >> >> > Thank you,
>> >> >> > Dave
>> >> >> >
>> >> >> > On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain <scott at scottcain.net>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hi Dave,
>> >> >> >>
>> >> >> >> That seems perfectly reasonable.  If you could point out a
>> >> >> >> GenBank entry for which that does not happen, I could try to
>> >> >> >> figure out why not.
>> >> >> >>
>> >> >> >> Scott
>> >> >> >>
>> >> >> >>
>> >> >> >> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
>> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> > Since locus_tag is an essential tag in GenBank, I suggest that
>> >> >> >> > locus_tag always be added to the GFF last column if it exists in
>> >> >> >> > the GenBank file, whether it is used as the ID in the GFF or not.
>> >> >> >> >
>> >> >> >> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain
>> >> >> >> > <scott at scottcain.net>
>> >> >> >> > wrote:
>> >> >> >> >>
>> >> >> >> >> Hi Dave,
>> >> >> >> >>
>> >> >> >> >> bp_genbank2gff3.pl suffers from the fact that it has to deal
>> >> >> >> >> with GenBank files :-)  It was designed initially to work on
>> >> >> >> >> whole genome refseqs, and contains several ad hoc rules for
>> >> >> >> >> trying to make it "do the right thing."  In practice, it is not
>> >> >> >> >> unusual for a post processing step (either by hand or a quickie
>> >> >> >> >> perl script) to be required to really get it right.  I don't
>> >> >> >> >> recall the specifics (if I ever knew :-) for when and how the
>> >> >> >> >> locus tag is used, but I do know that there is a list of things
>> >> >> >> >> that it will try to use for the ID, and while the locus is on
>> >> >> >> >> the list, I don't know where it comes in the list, so it's
>> >> >> >> >> possible that other items might supersede it.
>> >> >> >> >>
>> >> >> >> >> Scott
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
>> >> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> >> > Hello,
>> >> >> >> >> >
>> >> >> >> >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
>> >> >> >> >> > `locus_tag` in the fields and sometimes it doesn't, even though
>> >> >> >> >> > the GenBank file has a locus tag.
>> >> >> >> >> > Also, is the ID always equivalent to the locus tag?
>> >> >> >> >> >
>> >> >> >> >> > Thanks,
>> >> >> >> >> > Dave
>> >> >> >> >> > _______________________________________________
>> >> >> >> >> > Bioperl-l mailing list
>> >> >> >> >> > Bioperl-l at lists.open-bio.org
>> >> >> >> >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >> >> >> >> >
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> --
>> >> >> >> >> ------------------------------------------------------------------------
>> >> >> >> >> Scott Cain, Ph. D.                                   scott at scottcain dot net
>> >> >> >> >> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> >> >> >> >> Ontario Institute for Cancer Research
>> >> >> >> >
>> >> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >
>> >
>>
>>
>>
>
>



-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research