From hlapp at gmx.net Fri Aug 1 00:59:54 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 1 Aug 2008 00:59:54 -0400
Subject: [BioSQL-l] Release 1.0.1 in the making
Message-ID: <4E2B21EA-6604-4863-96B4-8737A33930E6@gmx.net>

I am preparing the release of v1.0.1. This will primarily widen the
too-short column width constraint on dbxref.accession (and in
consequence that of bioentry.accession) to 128 chars. I have added
migration scripts for Pg, MySQL, and Oracle. It'd be great if someone
could help out by providing (and ideally testing) the respective DDL
for one (or more) of the other databases for which we have schema
DDLs: HSQLDB and Derby.

I'm also adding notes about the PostgreSQL v8.3+ incompatibility if
you need the Perl language binding (the 8.3 change should actually
have very little effect for a strongly typed language such as Java),
and have added the scripts by Peter Eisentraut (Pg developer) for
people willing to try them (they are otherwise completely untested,
though in theory they should work).

See http://www.biosql.org/wiki/Releases#BioSQL_release_v1.0.1 for the
release plan. Let me know if you would like to include anything else.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From biopython at maubp.freeserve.co.uk Fri Aug 1 05:24:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 1 Aug 2008 10:24:16 +0100
Subject: [BioSQL-l] Fwd: Release 1.0.1 in the making
In-Reply-To: <320fb6e00808010223w1d6b698dm55dbfc6e98873040@mail.gmail.com>
References: <4E2B21EA-6604-4863-96B4-8737A33930E6@gmx.net> <320fb6e00808010223w1d6b698dm55dbfc6e98873040@mail.gmail.com>
Message-ID: <320fb6e00808010224o171eddf2g26e3c29c1a74e3ad@mail.gmail.com>

Forgot to send to the list (sorry Hilmar, you'll get this twice!)

On Fri, Aug 1, 2008 at 5:59 AM, Hilmar Lapp wrote:
> I am preparing the release of v1.0.1.
> This will primarily change the too
> short column width constraint on dbxref.accession (and in consequence that
> of bioentry.accession) to 128 chars.
>
> ...
>
> See http://www.biosql.org/wiki/Releases#BioSQL_release_v1.0.1 for the
> release plan. Let me know if you would like to include anything else.

Is fixing Bug 2470 too ambitious for your planned release schedule?
http://bugzilla.open-bio.org/show_bug.cgi?id=2470

This would help with some mooted Biopython enhancements to populate
the taxonomy "on demand" as new sequences are added (see
http://bugzilla.open-bio.org/show_bug.cgi?id=2475 although this is a
bit long to read!).

Thanks,

Peter (Biopython)

From jimp at compbio.dundee.ac.uk Fri Aug 1 05:56:56 2008
From: jimp at compbio.dundee.ac.uk (James Procter)
Date: Fri, 01 Aug 2008 10:56:56 +0100
Subject: [BioSQL-l] BioSQL at BOSC08 - Was Re: (no subject)
In-Reply-To: <02C35A12-7F3A-4C2B-9266-B5A863FF328B@gmx.net>
References: <79ceddbc0807171108qe17f5a4g13730eeca90c1f2f@mail.gmail.com> <4884A4AB.3050907@compbio.dundee.ac.uk> <7B81518B-1F70-4382-BAF5-E04B6B062CBC@gmx.net> <4888A0A7.7090001@compbio.dundee.ac.uk> <342088F2-B4DE-4D57-ABE8-6431DA535370@gmx.net> <4888AEC2.8060008@compbio.dundee.ac.uk> <4889EB66.2010700@compbio.dundee.ac.uk> <02C35A12-7F3A-4C2B-9266-B5A863FF328B@gmx.net>
Message-ID: <4892DDE8.4020709@compbio.dundee.ac.uk>

Hilmar Lapp wrote:
> On Jul 25, 2008, at 11:04 AM, James Procter wrote:
>> I'd suggest that a wiki page is set up to describe any ad-hoc
>> 'extensions' that BioSQL users think might be useful to the ...
> actually that page exists already:
>
> http://www.biosql.org/wiki/Extensions

doh - that'll teach me to look before I type :)

> Right now all that's there is the fledgling PhyloDB module that's part
> of the svn repository (though not yet of a release).

Thanks - I may give the PhyloDB module a whirl in the next few months, too.
This is, I suspect, a dumb question, but: is there a multiple sequence
alignment representation within BioSQL? This was going to be the
extension I'd introduce - but if someone has already done this then
I'd be happy to help harden it for production use.

cheers
j.

From hlapp at gmx.net Fri Aug 1 13:18:09 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 1 Aug 2008 13:18:09 -0400
Subject: [BioSQL-l] Fwd: Release 1.0.1 in the making
In-Reply-To: <320fb6e00808010224o171eddf2g26e3c29c1a74e3ad@mail.gmail.com>
References: <4E2B21EA-6604-4863-96B4-8737A33930E6@gmx.net> <320fb6e00808010223w1d6b698dm55dbfc6e98873040@mail.gmail.com> <320fb6e00808010224o171eddf2g26e3c29c1a74e3ad@mail.gmail.com>
Message-ID:

On Aug 1, 2008, at 5:24 AM, Peter wrote:
> [...]
> Is fixing Bug 2470 too ambitious for your planned release schedule?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2470

Actually it would have been had I known what I would be getting myself
into (I need the release tomorrow night at the very latest for a
course we are holding here ... :) After going through the easy steps I
realized that there was a real reason for having the NCBI taxonID
double as the primary key - it is what links the hierarchical
structure of the taxonomy together, and also links the taxon names to
taxon nodes. Of course this could be done as lookups, but looking up
almost 500,000 nodes several times over might slow things down a bit.

So long story short it should be fixed now. There may be some remnant
bugs so any testing would be much appreciated. The changes are
committed to svn, but may need a bit more time to percolate to the
anonymous svn server.
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From biopython at maubp.freeserve.co.uk Fri Aug 1 13:35:14 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 1 Aug 2008 18:35:14 +0100
Subject: [BioSQL-l] Re: Fwd: Release 1.0.1 in the making
In-Reply-To:
References: <4E2B21EA-6604-4863-96B4-8737A33930E6@gmx.net> <320fb6e00808010223w1d6b698dm55dbfc6e98873040@mail.gmail.com> <320fb6e00808010224o171eddf2g26e3c29c1a74e3ad@mail.gmail.com>
Message-ID: <320fb6e00808011035k46c110abg77a876e191ea4102@mail.gmail.com>

>> [...]
>> Is fixing Bug 2470 too ambitious for your planned release schedule?
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2470
>
> Actually it would have been had I known what I would be getting myself into
> (I need the release tomorrow night at the very latest for a course we are
> holding here ... :) After going through the easy steps I realized that there
> was a real reason for having the NCBI taxonID double as the primary key - it
> is what links the hierarchical structure of the taxonomy together, and also
> links the taxon names to taxon nodes. Of course this could be done as
> lookups, but looking up almost 500,000 nodes several times over might
> slow things down a bit.
>
> So long story short it should be fixed now. There may be some remnant bugs
> so any testing would be much appreciated. The changes are committed to svn,
> but may need a bit more time to percolate to the anonymous svn server.

I won't be able to make time to try this until next week at the
earliest (i.e. after your planned release), but when I get back to
using Biopython with BioSQL again in earnest I will check this out.

Thanks!
Peter

From hlapp at gmx.net Fri Aug 1 14:18:40 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 1 Aug 2008 14:18:40 -0400
Subject: [BioSQL-l] Fwd: Release 1.0.1 in the making
In-Reply-To: <320fb6e00808011035k46c110abg77a876e191ea4102@mail.gmail.com>
References: <4E2B21EA-6604-4863-96B4-8737A33930E6@gmx.net> <320fb6e00808010223w1d6b698dm55dbfc6e98873040@mail.gmail.com> <320fb6e00808010224o171eddf2g26e3c29c1a74e3ad@mail.gmail.com> <320fb6e00808011035k46c110abg77a876e191ea4102@mail.gmail.com>
Message-ID:

On Aug 1, 2008, at 1:35 PM, Peter wrote:
>> So long story short it should be fixed now. There may be some
>> remnant bugs
>> so any testing would be much appreciated. The changes are committed
>> to svn,
>> but may need a bit more time to percolate to the anonymous svn
>> server.
>
> I won't be able to make time to try this until next week at the
> earliest (i.e. after your planned release), but when I get back to
> using Biopython with BioSQL again in earnest I will check this out.

By testing I meant primarily this: if people who use platforms other
than mine (PostgreSQL on MacOSX), such as MySQL or Oracle on Linux,
could give this a whirl, that'd be great. That is, load the NCBI
taxonomy into a scratch database (using the script), then load it
again (simulating an update), and see whether there are any error or
warning messages.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From biopython at maubp.freeserve.co.uk Fri Aug 1 16:29:23 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 1 Aug 2008 21:29:23 +0100
Subject: [BioSQL-l] load_ncbi_taxonomy.pl
Message-ID: <320fb6e00808011329t6005ee3bt7948718320aae3c7@mail.gmail.com>

On Fri, Aug 1, 2008 at 7:18 PM, Hilmar Lapp wrote:
>
> On Aug 1, 2008, at 1:35 PM, Peter wrote:
>
>>> So long story short it [load_ncbi_taxonomy.pl] should be fixed now.
There >>> may be some remnant bugs so any testing would be much appreciated. >>> The changes are committed to svn, but may need a bit more time to >>> percolate to the anonymous svn server. >> >> I won't be able to make time to try this until next week at the >> earliest (i.e. after your planned release), but when I get back to >> using Biopython with BioSQL again in earnest I will check this out. > > By testing I meant primarily if people use other platforms that I do > (PostgreSQL on MacOSX), such as MySQL or Oracle on Linux, and can give this > a whirl as in, load the NCBI taxonomy into a scratch database (using the > script), then load it again (simulating an update), and see whether there > are any error or warning messages that'd be great. OK, as a very cursory check I did a quick test on a Linux machine using MySQL. I just grabbed the latest script via the SVN webpage, then using an existing (partly populated) database: $ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql --dbuser root --download true Downloading NCBI taxon database to taxdata Unable to close datastream at ./load_ncbi_taxonomy.pl line 726 This may be a network issue... the taxdata/taxdump.tar.gz file had downloaded OK, so I manually unzipped it, and then: $ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql --dbuser root Loading NCBI taxon database in taxdata: ... retrieving all taxon nodes in the database ... reading in taxon nodes from nodes.dmp ... insert / update / delete taxon nodes ... updating new parent IDs ... (committing nodes) ... rebuilding nested set left/right values ... reading in taxon names from names.dmp ... deleting old taxon names ... inserting new taxon names ... cleaning up Done. 
So no further error messages - however, I have not actually checked to see what exactly this did to my database ;) Peter From biopython at maubp.freeserve.co.uk Fri Aug 1 16:58:14 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 1 Aug 2008 21:58:14 +0100 Subject: [BioSQL-l] load_ncbi_taxonomy.pl In-Reply-To: <320fb6e00808011329t6005ee3bt7948718320aae3c7@mail.gmail.com> References: <320fb6e00808011329t6005ee3bt7948718320aae3c7@mail.gmail.com> Message-ID: <320fb6e00808011358p9b3d5ek599623c770ca17a3@mail.gmail.com> >> By testing I meant primarily if people use other platforms that I do >> (PostgreSQL on MacOSX), such as MySQL or Oracle on Linux, and can give this >> a whirl as in, load the NCBI taxonomy into a scratch database (using the >> script), then load it again (simulating an update), and see whether there >> are any error or warning messages that'd be great. > > OK, as a very cursory check I did a quick test on a Linux machine > using MySQL. I just grabbed the latest script via the SVN webpage, > then using an existing (partly populated) database: > > $ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql > --dbuser root --download true > Downloading NCBI taxon database to taxdata > Unable to close datastream at ./load_ncbi_taxonomy.pl line 726 > > This may be a network issue... the taxdata/taxdump.tar.gz file had > downloaded OK, so I manually unzipped it, and then: > > $ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql > --dbuser root Loading NCBI taxon database in taxdata: > ... retrieving all taxon nodes in the database > ... reading in taxon nodes from nodes.dmp > ... insert / update / delete taxon nodes > ... updating new parent IDs > ... (committing nodes) > ... rebuilding nested set left/right values > ... reading in taxon names from names.dmp > ... deleting old taxon names > ... inserting new taxon names > ... cleaning up > Done. 
> > So no further error messages - however, I have not actually checked to > see what exactly this did to my database ;) I then simulated an update by deleting the downloaded taxdata, and rerunning the script: $ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql --dbuser root --download true Downloading NCBI taxon database to taxdata Unable to close datastream at ./load_ncbi_taxonomy.pl line 726 Loading NCBI taxon database in taxdata: ... retrieving all taxon nodes in the database ... reading in taxon nodes from nodes.dmp ... insert / update / delete taxon nodes ... updating new parent IDs ... (committing nodes) ... rebuilding nested set left/right values ... reading in taxon names from names.dmp ... deleting old taxon names ... inserting new taxon names ... cleaning up Done. [Note that after the "unable to close" message I just left the script running this time, and it continued fine] Again, I haven't checked the database. Peter From hlapp at gmx.net Fri Aug 1 17:04:37 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 1 Aug 2008 17:04:37 -0400 Subject: [BioSQL-l] load_ncbi_taxonomy.pl In-Reply-To: <320fb6e00808011358p9b3d5ek599623c770ca17a3@mail.gmail.com> References: <320fb6e00808011329t6005ee3bt7948718320aae3c7@mail.gmail.com> <320fb6e00808011358p9b3d5ek599623c770ca17a3@mail.gmail.com> Message-ID: <149B3375-0305-4420-AF70-BB6961050376@gmx.net> Sounds like I at least managed to silence all the complaining of the script ;-) How long did it run? Was it similar to what you've seen earlier or outrageously longer? -hilmar On Aug 1, 2008, at 4:58 PM, Peter wrote: >>> By testing I meant primarily if people use other platforms that I do >>> (PostgreSQL on MacOSX), such as MySQL or Oracle on Linux, and can >>> give this >>> a whirl as in, load the NCBI taxonomy into a scratch database >>> (using the >>> script), then load it again (simulating an update), and see >>> whether there >>> are any error or warning messages that'd be great. 
>> >> OK, as a very cursory check I did a quick test on a Linux machine >> using MySQL. I just grabbed the latest script via the SVN webpage, >> then using an existing (partly populated) database: >> >> $ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql >> --dbuser root --download true >> Downloading NCBI taxon database to taxdata >> Unable to close datastream at ./load_ncbi_taxonomy.pl line 726 >> >> This may be a network issue... the taxdata/taxdump.tar.gz file had >> downloaded OK, so I manually unzipped it, and then: >> >> $ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql >> --dbuser root Loading NCBI taxon database in taxdata: >> ... retrieving all taxon nodes in the database >> ... reading in taxon nodes from nodes.dmp >> ... insert / update / delete taxon nodes >> ... updating new parent IDs >> ... (committing nodes) >> ... rebuilding nested set left/right values >> ... reading in taxon names from names.dmp >> ... deleting old taxon names >> ... inserting new taxon names >> ... cleaning up >> Done. >> >> So no further error messages - however, I have not actually checked >> to >> see what exactly this did to my database ;) > > I then simulated an update by deleting the downloaded taxdata, and > rerunning the script: > > $ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql > --dbuser root --download true > Downloading NCBI taxon database to taxdata > Unable to close datastream at ./load_ncbi_taxonomy.pl line 726 > Loading NCBI taxon database in taxdata: > ... retrieving all taxon nodes in the database > ... reading in taxon nodes from nodes.dmp > ... insert / update / delete taxon nodes > ... updating new parent IDs > ... (committing nodes) > ... rebuilding nested set left/right values > ... reading in taxon names from names.dmp > ... deleting old taxon names > ... inserting new taxon names > ... cleaning up > Done. 
> > [Note that after the "unable to close" message I just left the script > running this time, and it continued fine] > > Again, I haven't checked the database. > > Peter -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Fri Aug 1 19:24:49 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 2 Aug 2008 00:24:49 +0100 Subject: [BioSQL-l] *** SPAM *** Re: load_ncbi_taxonomy.pl In-Reply-To: <149B3375-0305-4420-AF70-BB6961050376@gmx.net> References: <320fb6e00808011329t6005ee3bt7948718320aae3c7@mail.gmail.com> <320fb6e00808011358p9b3d5ek599623c770ca17a3@mail.gmail.com> <149B3375-0305-4420-AF70-BB6961050376@gmx.net> Message-ID: <320fb6e00808011624u70af4135l2a20c09ca96f89fd@mail.gmail.com> On Fri, Aug 1, 2008 at 10:04 PM, Hilmar Lapp wrote: > Sounds like I at least managed to silence all the complaining of the script > ;-) How long did it run? Was it similar to what you've seen earlier or > outrageously longer? > I just ran it again (so updating an already complete database): $ time perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql --dbuser root --download true Downloading NCBI taxon database to taxdata Unable to close datastream at ./load_ncbi_taxonomy.pl line 726 Loading NCBI taxon database in taxdata: ... retrieving all taxon nodes in the database ... reading in taxon nodes from nodes.dmp ... insert / update / delete taxon nodes ... updating new parent IDs ... (committing nodes) ... rebuilding nested set left/right values ... reading in taxon names from names.dmp ... deleting old taxon names ... inserting new taxon names ... cleaning up Done. real 18m29.409s user 2m28.149s sys 0m18.025s Some of that is of course the download time, so without that: $ time perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql --dbuser root Loading NCBI taxon database in taxdata: ... 
retrieving all taxon nodes in the database ... reading in taxon nodes from nodes.dmp ... insert / update / delete taxon nodes ... updating new parent IDs ... (committing nodes) ... rebuilding nested set left/right values ... reading in taxon names from names.dmp ... deleting old taxon names ... inserting new taxon names ... cleaning up Done. real 13m18.777s user 2m17.285s sys 0m14.821s This is slow, with plenty of disk activity during the taxon names bit. However, I haven't got the equivalent numbers from the previous script to hand (and its after midnight here so I won't re-run it now). I'd have guessed it used to be about 10 minutes on this machine though, i.e. it is probably taking longer, but it was already longer than I liked. I don't know if that helped, but as I said, I hope to do a more thorough job later on. Peter From hlapp at gmx.net Fri Aug 1 19:54:32 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 1 Aug 2008 19:54:32 -0400 Subject: [BioSQL-l] BioSQL uses In-Reply-To: References: <5C57BAC6-974F-4E75-93E6-36BE2A58E980@gmx.net> Message-ID: <9977D67D-5369-4CE4-9125-70ABD25065AE@gmx.net> Just FYI, I finally got around to creating a page on the wiki: http://www.biosql.org/wiki/Uses There's very little there right now, but people should feel free to add themselves to the list where they see fit. -hilmar On Feb 27, 2008, at 11:14 AM, Cook, Malcolm wrote: > this would made a great topic for a page at http://www.biosql.org/wiki/Main_Page > > > > > Malcolm Cook > Database Applications Manager - Bioinformatics > Stowers Institute for Medical Research - Kansas City, Missouri > > >> -----Original Message----- >> From: biosql-l-bounces at lists.open-bio.org >> [mailto:biosql-l-bounces at lists.open-bio.org] On Behalf Of >> Wiepert, Mathieu >> Sent: Wednesday, February 27, 2008 5:19 AM >> To: BioSQL >> Subject: [BioSQL-l] BioSQL uses >> >> Hi, >> >> It's great to this coming to release 1.0, thanks very much >> for this work. 
I was wondering if I may ask how different >> users take advantage of BioSQL in daily work. We have a >> number of pressing issues, many which need a database of >> sequence for which we can overlay SNP, gene exp., Array CGH, >> etc type data. This seems like it would be a great start >> upon which we can add additional location specific >> information or any other feature. >> >> What do others use it for, and how does BioSQL work for you? >> >> -mat >> >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l >> > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Fri Aug 1 20:15:58 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 1 Aug 2008 20:15:58 -0400 Subject: [BioSQL-l] load_ncbi_taxonomy.pl In-Reply-To: <320fb6e00808011624u70af4135l2a20c09ca96f89fd@mail.gmail.com> References: <320fb6e00808011329t6005ee3bt7948718320aae3c7@mail.gmail.com> <320fb6e00808011358p9b3d5ek599623c770ca17a3@mail.gmail.com> <149B3375-0305-4420-AF70-BB6961050376@gmx.net> <320fb6e00808011624u70af4135l2a20c09ca96f89fd@mail.gmail.com> Message-ID: <27FCD5FB-34CB-4016-927C-23A2E821B159@gmx.net> These sound like reasonable times, depending on your machine configuration. I suspect that PostgreSQL might even be a bit faster, as that's a similar time to what I'm observing on my laptop. BTW if you provide --verbose=2 on the command line you'll get rows/ time statistics. 
The slowest steps (recomputing nested set values, and inserting taxon names) average between 900-1800 rows/s on my laptop, depending on what else is going on (I suspect the spotlight indexer to contend for the disk drive on occasion). The faster steps (e.g. inserting taxon nodes) I observe at up to 2500-4000 rows/s. Thanks for all the testing, it's much appreciated! -hilmar On Aug 1, 2008, at 7:24 PM, Peter wrote: > On Fri, Aug 1, 2008 at 10:04 PM, Hilmar Lapp wrote: >> Sounds like I at least managed to silence all the complaining of >> the script >> ;-) How long did it run? Was it similar to what you've seen earlier >> or >> outrageously longer? >> > > I just ran it again (so updating an already complete database): > > $ time perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql > --dbuser root --download true > Downloading NCBI taxon database to taxdata > Unable to close datastream at ./load_ncbi_taxonomy.pl line 726 > Loading NCBI taxon database in taxdata: > ... retrieving all taxon nodes in the database > ... reading in taxon nodes from nodes.dmp > ... insert / update / delete taxon nodes > ... updating new parent IDs > ... (committing nodes) > ... rebuilding nested set left/right values > ... reading in taxon names from names.dmp > ... deleting old taxon names > ... inserting new taxon names > ... cleaning up > Done. > > real 18m29.409s > user 2m28.149s > sys 0m18.025s > > Some of that is of course the download time, so without that: > > $ time perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql > --dbuser root Loading NCBI taxon database in taxdata: > ... retrieving all taxon nodes in the database > ... reading in taxon nodes from nodes.dmp > ... insert / update / delete taxon nodes > ... updating new parent IDs > ... (committing nodes) > ... rebuilding nested set left/right values > ... reading in taxon names from names.dmp > ... deleting old taxon names > ... inserting new taxon names > ... cleaning up > Done. 
> > real 13m18.777s > user 2m17.285s > sys 0m14.821s > > This is slow, with plenty of disk activity during the taxon names bit. > However, I haven't got the equivalent numbers from the previous > script to hand (and its after midnight here so I won't re-run it now). > I'd have guessed it used to be about 10 minutes on this machine > though, i.e. it is probably taking longer, but it was already longer > than I liked. > > I don't know if that helped, but as I said, I hope to do a more > thorough job later on. > > Peter -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Sat Aug 2 08:30:46 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 2 Aug 2008 13:30:46 +0100 Subject: [BioSQL-l] load_ncbi_taxonomy.pl In-Reply-To: <27FCD5FB-34CB-4016-927C-23A2E821B159@gmx.net> References: <320fb6e00808011329t6005ee3bt7948718320aae3c7@mail.gmail.com> <320fb6e00808011358p9b3d5ek599623c770ca17a3@mail.gmail.com> <149B3375-0305-4420-AF70-BB6961050376@gmx.net> <320fb6e00808011624u70af4135l2a20c09ca96f89fd@mail.gmail.com> <27FCD5FB-34CB-4016-927C-23A2E821B159@gmx.net> Message-ID: <320fb6e00808020530n23d5edd8pf0a3b460441a9bfd@mail.gmail.com> On Sat, Aug 2, 2008 at 1:15 AM, Hilmar Lapp wrote: > These sound like reasonable times, depending on your machine configuration. > I suspect that PostgreSQL might even be a bit faster, as that's a similar > time to what I'm observing on my laptop. > > BTW if you provide --verbose=2 on the command line you'll get rows/time > statistics. The slowest steps (recomputing nested set values, and inserting > taxon names) average between 900-1800 rows/s on my laptop, depending on what > else is going on (I suspect the spotlight indexer to contend for the disk > drive on occasion). The faster steps (e.g. inserting taxon nodes) I observe > at up to 2500-4000 rows/s. 
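For readers unfamiliar with the "nested set left/right values" step
timed above: the sketch below shows one common way such values can be
assigned, by a depth-first walk over parent links. It is an
illustration only and not the code in load_ncbi_taxonomy.pl; the
function name nested_set and the four-node toy tree are made up for
the example.

```python
def nested_set(parent_of):
    """Assign (left, right) nested-set values from a child -> parent map.

    The root is the node whose parent is None. A node's descendants are
    exactly the nodes whose left value lies between its left and right.
    """
    children = {}
    root = None
    for node, parent in parent_of.items():
        if parent is None:
            root = node
        else:
            children.setdefault(parent, []).append(node)

    left = {}
    values = {}
    counter = 1
    # Iterative depth-first walk, so very deep trees do not hit
    # Python's recursion limit.
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if children_done:
            # All descendants have been numbered: close this node.
            values[node] = (left[node], counter)
        else:
            left[node] = counter
            stack.append((node, True))
            for child in sorted(children.get(node, []), reverse=True):
                stack.append((child, False))
        counter += 1
    return values


# Hypothetical toy tree: node 1 is the root, 2 and 3 are its children,
# and 4 is a child of 2.
tree = {1: None, 2: 1, 3: 1, 4: 2}
values = nested_set(tree)
for node in sorted(values):
    print(node, values[node])
```

Every node is visited twice (once on the way down, once on the way
up), which fits with the step scaling with the number of taxon nodes
rather than with the number of names.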
I'm seeing about 900 rows/s on the recomputing of the nested set values, which means my 2 year old desktop is slower than your laptop. This is an AMD Athlon 64 X2 4600+ Socket 939 dual core machine, with a Seagate Barracuda hard drive (7200rpm, 200GB, 8MB Cache, IDE Ultra ATA100), running Ubuntu Dapper Drake (due for an upgrade soon!). $ time perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql --dbuser root --verbose=2 Loading NCBI taxon database in taxdata: ... retrieving all taxon nodes in the database ... reading in taxon nodes from nodes.dmp ... insert / update / delete taxon nodes 20000/448630 done (in 0 secs, 20000.0 rows/s) 40000/448630 done (in 1 secs, 20000.0 rows/s) 60000/448630 done (in 0 secs, 20000.0 rows/s) 80000/448630 done (in 0 secs, 20000.0 rows/s) 100000/448630 done (in 0 secs, 20000.0 rows/s) 120000/448630 done (in 0 secs, 20000.0 rows/s) 140000/448630 done (in 1 secs, 20000.0 rows/s) 160000/448630 done (in 0 secs, 20000.0 rows/s) 180000/448630 done (in 0 secs, 20000.0 rows/s) 200000/448630 done (in 0 secs, 20000.0 rows/s) 220000/448630 done (in 0 secs, 20000.0 rows/s) 240000/448630 done (in 1 secs, 20000.0 rows/s) 260000/448630 done (in 0 secs, 20000.0 rows/s) 280000/448630 done (in 0 secs, 20000.0 rows/s) 300000/448630 done (in 0 secs, 20000.0 rows/s) 320000/448630 done (in 0 secs, 20000.0 rows/s) 340000/448630 done (in 1 secs, 20000.0 rows/s) 360000/448630 done (in 0 secs, 20000.0 rows/s) 380000/448630 done (in 0 secs, 20000.0 rows/s) 400000/448630 done (in 0 secs, 20000.0 rows/s) 420000/448630 done (in 0 secs, 20000.0 rows/s) 440000/448630 done (in 1 secs, 20000.0 rows/s) ... updating new parent IDs ... (committing nodes) ... 
rebuilding nested set left/right values 20000 done (in 22 secs, 909.1 rows/s) 40000 done (in 22 secs, 909.1 rows/s) 60000 done (in 23 secs, 869.6 rows/s) 80000 done (in 22 secs, 909.1 rows/s) 100000 done (in 22 secs, 909.1 rows/s) 120000 done (in 22 secs, 909.1 rows/s) 140000 done (in 22 secs, 909.1 rows/s) 160000 done (in 22 secs, 909.1 rows/s) 180000 done (in 22 secs, 909.1 rows/s) 200000 done (in 21 secs, 952.4 rows/s) 220000 done (in 21 secs, 952.4 rows/s) 240000 done (in 22 secs, 909.1 rows/s) 260000 done (in 22 secs, 909.1 rows/s) 280000 done (in 21 secs, 952.4 rows/s) 300000 done (in 22 secs, 909.1 rows/s) 320000 done (in 21 secs, 952.4 rows/s) 340000 done (in 22 secs, 909.1 rows/s) 360001 done (in 22 secs, 909.1 rows/s) 380001 done (in 22 secs, 909.1 rows/s) 400001 done (in 21 secs, 952.4 rows/s) 420001 done (in 22 secs, 909.1 rows/s) 440001 done (in 21 secs, 952.4 rows/s) ... reading in taxon names from names.dmp ... deleting old taxon names ... inserting new taxon names 20000 done (in 3 secs, 6666.7 rows/s) 40000 done (in 2 secs, 10000.0 rows/s) 60000 done (in 4 secs, 5000.0 rows/s) 80000 done (in 3 secs, 6666.7 rows/s) 100000 done (in 5 secs, 4000.0 rows/s) 120000 done (in 6 secs, 3333.3 rows/s) 140000 done (in 7 secs, 2857.1 rows/s) 160000 done (in 7 secs, 2857.1 rows/s) 180000 done (in 8 secs, 2500.0 rows/s) 200000 done (in 8 secs, 2500.0 rows/s) 220000 done (in 8 secs, 2500.0 rows/s) 240000 done (in 9 secs, 2222.2 rows/s) 260000 done (in 9 secs, 2222.2 rows/s) 280000 done (in 10 secs, 2000.0 rows/s) 300000 done (in 10 secs, 2000.0 rows/s) 320000 done (in 10 secs, 2000.0 rows/s) 340000 done (in 10 secs, 2000.0 rows/s) 360000 done (in 10 secs, 2000.0 rows/s) 380000 done (in 10 secs, 2000.0 rows/s) 400000 done (in 11 secs, 1818.2 rows/s) 420000 done (in 11 secs, 1818.2 rows/s) 440000 done (in 11 secs, 1818.2 rows/s) 460000 done (in 10 secs, 2000.0 rows/s) 480000 done (in 10 secs, 2000.0 rows/s) 500000 done (in 11 secs, 1818.2 rows/s) 520000 done (in 11 
secs, 1818.2 rows/s) 540000 done (in 12 secs, 1666.7 rows/s) 560000 done (in 10 secs, 2000.0 rows/s) 580000 done (in 12 secs, 1666.7 rows/s) 600000 done (in 12 secs, 1666.7 rows/s) 620000 done (in 11 secs, 1818.2 rows/s) ... cleaning up Done. real 13m13.805s user 2m3.548s sys 0m13.781s > > Thanks for all the testing, it's much appreciated! > This is only very cursory, confirming the script runs without showing any error messages, but its better than no testing ;) Peter From hlapp at gmx.net Sat Aug 2 09:41:13 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 2 Aug 2008 09:41:13 -0400 Subject: [BioSQL-l] Announcement: BioSQL v1.0.1 Release Message-ID: BioSQL v1.0.1 Release ===================== I am pleased to announce the release of version 1.0.1 of BioSQL, the second release in the Tokyo release series. The release can be downloaded from the following locations: http://biosql.org/DIST/biosql-1.0.1.tar.gz http://biosql.org/DIST/biosql-1.0.1.tar.bz2 http://biosql.org/DIST/biosql-1.0.1.zip (has Windows-style EOL) The core BioSQL schema is a generic, extensible relational model for sequences, sequence features, their annotation, and ontology terms. It is also designed as the interoperable persistence interface between the Bio* projects. This release contains - the core BioSQL schema as DDL (Data Definition Language) for the following RDBMSs: MySQL, PostgreSQL, Oracle, HSQLDB, and Apache Derby, - migration scripts from v1.0.0 for PostgreSQL, MySQL, and Oracle, - ancillary (but optional) schema files for PostgreSQL, among which are scripts providing experimental support for the Bioperl and possibly other language bindings to BioSQL - documentation and an ERD (Entity-Relationship Diagram), and - a Perl script that can pre-load (and update) a BioSQL instance with the NCBI taxonomy. This version of the schema should be fully backwards compatible with the v1.0.0 schema for nearly all software and queries. 
The only change is relaxing the column width constraint (previously 40 chars, now 128) of bioentry.accession and dbxref.accession. Migration scripts are included for PostgreSQL, MySQL, and Oracle for those who want to simply upgrade their existing database. In addition, the script load_ncbi_taxonomy.pl has been fixed to no longer require the taxon primary key and the NCBI taxon ID to be identical. If you previously relied on this (documented but not guaranteed) behavior, you will need to adjust your respective software. To my knowledge, none of the Bio* language bindings should be affected by this change. The complete change log is listed in the file Changes, and installation instructions for MySQL and PostgreSQL are in the file INSTALL. Additional information regarding BioSQL, including links to language bindings, a roadmap to future releases and enhancements, and possible local optimizations is available from the BioSQL website at http://biosql.org. On behalf of the BioSQL developers, Hilmar Lapp Acknowledgments --------------- BioSQL in general and in particular this point release owes to the community of users and developers who provide feedback, advice, and ideas, and report issues on the BioSQL mailing list (biosql-l{at}lists.open-bio.org). Credit also goes to those who have helped testing, in particular Peter Cock. This project would not exist without their contributions and the support of other developers and users from the Bio* community. The 1.0.x release series is code-named Tokyo in recognition of the role the BioHackathon 2008 played in getting the first of the series (v1.0.0) out the door, and in keeping with an informal tradition held up since the first BioHackathon. Thank you to everyone! License ------- BioSQL is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. Enjoy! 
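
A note on the load_ncbi_taxonomy.pl change above: code that assumed the taxon primary key equals the NCBI taxon ID would now look the row up by ncbi_taxon_id instead. What follows is only a minimal sketch, assuming a cut-down taxon table in an in-memory SQLite database. SQLite is not one of the RDBMSs the release ships DDL for and is used purely to keep the example self-contained. The helper name taxon_pk_for_ncbi_id and the sample rows are made up for illustration.

```python
import sqlite3


def taxon_pk_for_ncbi_id(conn, ncbi_taxon_id):
    """Return the internal taxon primary key for an NCBI taxon ID, or None."""
    cur = conn.cursor()
    cur.execute(
        "SELECT taxon_id FROM taxon WHERE ncbi_taxon_id = ?",
        (ncbi_taxon_id,),
    )
    row = cur.fetchone()
    return row[0] if row else None


# Cut-down stand-in for the BioSQL taxon table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon ("
    " taxon_id INTEGER PRIMARY KEY,"
    " ncbi_taxon_id INTEGER UNIQUE,"
    " parent_taxon_id INTEGER)"
)
# The primary keys deliberately differ from the NCBI taxon IDs.
conn.executemany(
    "INSERT INTO taxon (taxon_id, ncbi_taxon_id, parent_taxon_id)"
    " VALUES (?, ?, ?)",
    [(1, 561, None), (2, 562, 1)],
)

print(taxon_pk_for_ncbi_id(conn, 562))    # the internal key, not 562
print(taxon_pk_for_ncbi_id(conn, 99999))  # an NCBI ID that is not loaded
```

The point of the lookup is that nothing downstream needs to know how the primary keys were assigned.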
From hlapp at gmx.net Sat Aug 2 10:07:17 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 2 Aug 2008 10:07:17 -0400 Subject: [BioSQL-l] Announcement: BioSQL v1.0.1 Release In-Reply-To: References: Message-ID: <07355FA1-10F8-407F-B171-A45B852C6398@gmx.net> On Aug 2, 2008, at 9:41 AM, Hilmar Lapp wrote: > - ancillary (but optional) schema files for PostgreSQL, among which > are scripts providing experimental support for the Bioperl and > possibly other language bindings to BioSQL Of course that's not true. I've fixed this and re-uploaded: - ancillary (but optional) schema files for PostgreSQL, - scripts providing experimental support for the Bioperl and possibly other language bindings to BioSQL with PostgreSQL v8.3+ (v8.2 and earlier are supported fine), Sorry about the goof. I guess to limit confusion for the Google searchers I need to repost the announcement, so delete the previous one from your records ... -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Aug 2 10:08:17 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 2 Aug 2008 10:08:17 -0400 Subject: [BioSQL-l] Announcement: BioSQL v1.0.1 Release (corrected) Message-ID: <51938949-181A-4750-9CEB-26FBE7C9A24E@gmx.net> (the previous announcement contained a small error) BioSQL v1.0.1 Release ===================== I am pleased to announce the release of version 1.0.1 of BioSQL, the second release in the Tokyo release series. The release can be downloaded from the following locations: http://biosql.org/DIST/biosql-1.0.1.tar.gz http://biosql.org/DIST/biosql-1.0.1.tar.bz2 http://biosql.org/DIST/biosql-1.0.1.zip (has Windows-style EOL) The core BioSQL schema is a generic, extensible relational model for sequences, sequence features, their annotation, and ontology terms. 
It is also designed as the interoperable persistence interface between the Bio* projects. This release contains - the core BioSQL schema as DDL (Data Definition Language) for the following RDBMSs: MySQL, PostgreSQL, Oracle, HSQLDB, and Apache Derby, - migration scripts from v1.0.0 for PostgreSQL, MySQL, and Oracle, - ancillary (but optional) schema files for PostgreSQL, - scripts providing experimental support for the Bioperl and possibly other language bindings to BioSQL with PostgreSQL v8.3+ (v8.2 and earlier are supported fine), - documentation and an ERD (Entity-Relationship Diagram), and - a Perl script that can pre-load (and update) a BioSQL instance with the NCBI taxonomy. This version of the schema should be fully backwards compatible with the v1.0.0 schema for nearly all software and queries. The only change is relaxing the column width constraint (previously 40 chars, now 128) of bioentry.accession and dbxref.accession. Migration scripts are included for PostgreSQL, MySQL, and Oracle for those who want to simply upgrade their existing database. In addition, the script load_ncbi_taxonomy.pl has been fixed to no longer require the taxon primary key and the NCBI taxon ID to be identical. If you previously relied on this (documented but not guaranteed) behavior, you will need to adjust your respective software. To my knowledge, none of the Bio* language bindings should be affected by this change. The complete change log is listed in the file Changes, and installation instructions for MySQL and PostgreSQL are in the file INSTALL. Additional information regarding BioSQL, including links to language bindings, a roadmap to future releases and enhancements, and possible local optimizations is available from the BioSQL website at http://biosql.org. 
On behalf of the BioSQL developers, Hilmar Lapp Acknowledgments --------------- BioSQL in general and in particular this point release owes to the community of users and developers who provide feedback, advice, and ideas, and report issues on the BioSQL mailing list (biosql-l{at}lists.open-bio.org). Credit also goes to those who have helped testing, in particular Peter Cock. This project would not exist without their contributions and the support of other developers and users from the Bio* community. The 1.0.x release series is code-named Tokyo in recognition of the role the BioHackathon 2008 played in getting the first of the series (v1.0.0) out the door, and in keeping with an informal tradition held up since the first BioHackathon. Thank you to everyone! License ------- BioSQL is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. From biopython at maubp.freeserve.co.uk Wed Aug 13 07:44:21 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 13 Aug 2008 12:44:21 +0100 Subject: [BioSQL-l] Announcement: BioSQL v1.0.1 Release (corrected) In-Reply-To: <51938949-181A-4750-9CEB-26FBE7C9A24E@gmx.net> References: <51938949-181A-4750-9CEB-26FBE7C9A24E@gmx.net> Message-ID: <320fb6e00808130444g4c8ef5aehc87a6cc01f74092@mail.gmail.com> On Sat, Aug 2, 2008 at 3:08 PM, Hilmar Lapp wrote: > (the previous announcement contained a small error) > > BioSQL v1.0.1 Release > ===================== > > I am pleased to announce the release of version 1.0.1 of BioSQL, the > second release in the Tokyo release series. The release can be > downloaded from the following locations: > > http://biosql.org/DIST/biosql-1.0.1.tar.gz > http://biosql.org/DIST/biosql-1.0.1.tar.bz2 > http://biosql.org/DIST/biosql-1.0.1.zip (has Windows-style EOL) > > ... 
Hilmar, I've put a belated announcement of the BioSQL 1.0.1 release up on the OBF news server, http://news.open-bio.org/news/ http://news.open-bio.org/news/ Did you get Jason's emails about the new news server? If you register an account he can give you admin rights. Peter (Biopython) From hlapp at gmx.net Wed Aug 13 10:08:51 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 13 Aug 2008 10:08:51 -0400 Subject: [BioSQL-l] Announcement: BioSQL v1.0.1 Release (corrected) In-Reply-To: <320fb6e00808130444g4c8ef5aehc87a6cc01f74092@mail.gmail.com> References: <51938949-181A-4750-9CEB-26FBE7C9A24E@gmx.net> <320fb6e00808130444g4c8ef5aehc87a6cc01f74092@mail.gmail.com> Message-ID: <4C0E6537-C645-467B-AE26-4A22688CE8CA@gmx.net> Thanks Peter, that's much appreciated! It was actually on my todo list. -hilmar On Aug 13, 2008, at 7:44 AM, Peter wrote: > On Sat, Aug 2, 2008 at 3:08 PM, Hilmar Lapp wrote: >> (the previous announcement contained a small error) >> >> BioSQL v1.0.1 Release >> ===================== >> >> I am pleased to announce the release of version 1.0.1 of BioSQL, the >> second release in the Tokyo release series. The release can be >> downloaded from the following locations: >> >> http://biosql.org/DIST/biosql-1.0.1.tar.gz >> http://biosql.org/DIST/biosql-1.0.1.tar.bz2 >> http://biosql.org/DIST/biosql-1.0.1.zip (has Windows-style EOL) >> >> ... > > Hilmar, > > I've put a belated announcement of the BioSQL 1.0.1 release up on the > OBF news server, http://news.open-bio.org/news/ > http://news.open-bio.org/news/ > > Did you get Jason's emails about the new news server? If you register > an account he can give you admin rights. 
> > Peter > (Biopython) -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From mrphysh at juno.com Wed Aug 13 16:13:56 2008 From: mrphysh at juno.com (mrphysh at juno.com) Date: Wed, 13 Aug 2008 20:13:56 GMT Subject: [BioSQL-l] installation.....IO/String problem Message-ID: <20080813.141356.8454.0@webmail16.vgs.untd.com> I am having trouble with database retrieval from online databases. This is an install problem.(?)..I am running Linux (Ubuntu)......... I did these, following the documentation. from cpan> install Bundle::CPAN install Module::Build #one of the many help files said to do this install Bundle;;BioPerl force install B/BI/BIRNEY/bioperl-1.4.tar.gz The ftp found the file and went to work. After many minutes, at the end, this what I saw: t/Variation _IO.............................FAILED tests 15,20,25 Failed 3/25 88% okay t/WABA...............................ok t/XEMBL_DB...........................ok t/XEMBL_DB...........................SOAP::lite and/or XML::DOM not installed. this means that Bio::DB::XEMBL module is not usable. 
Skipping test
t/XEMBL_DB...........................ok

Failed Test         Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------
t/BioFetch_DB.t                   27    4     14%  8 20 21 27
t/DB.t                            78    2    2.5%  30 31
t/EMBL_DB.t                       15    3     20%  6 13 14
t/Ontology.t           9  2304    50  100    200%  1-50
t/TreeIO.t                        41    1    2.4%  42
t/Variation_IO.t                  25    3     12%  15 20 25
t/simpleGPparser.t     9  2304    98  196    200%  1-98
18 subtests skipped.
Failed 7/179 test scripts, 96.09% okay. 159/8268 subtests failed, 98% okay.
make: *** [test_dynamic] Error 225
  /usr/bin/make test -- NOT OK
Running make install
Warning: you do not have permission to install into /usr/local/lib/perl/5.8.8 at /usr/share/perl/5.8/ExtUtils/Install.pm line 114
can't open file /usr/local/lib/perl/5.8.8/auto/Bio/.packlist: permission denied at /usr/share/perl/5.8/ExtUtils/Install.pm line 209
writing /usr/local/lib/perl/5.8.8/auto/Bio/.packlist
make: *** [pure_site_install] Error 13
  /usr/bin/make install -- NOT OK
You may have to su to root to install the package
cpan>

#this is all my typing

I have this little script (from a tutorial) and others that are similar:

    use Bio::Perl;
    # this script will only work with an internet connection
    # on the computer it is run on
    $seq_object = get_sequence('swissprot',"ROA1_HUMAN");
    write_sequence(">roa1.fasta",'fasta',$seq_object);

I quit CPAN and type

~~ perl ee_use_bioperl.pl

###### I get

Your system does not have one of LWP, HTTP::Request::Common, IO::String
installed so the DB retrieval method is not available.
Full error message is:
at /usr/local/share/perl/5.8.8/Bio/Perl.pm line 464
Bio::Perl::get_sequence('swissprot','ROA1_HUMAN') called at ee_use_bioperl.pl line 4
john at john-desktop:~/bbs$

#############

I feel that I am making progress but need assistance on this roadblock.

My ideas and questions.

Is this a Perl issue? I am using the perl 5.8.8 that came with Ubuntu.

I am much aware of the permissions aspect of Linux. The documentation says little about this. Is this where I am hanging up?
(As you all know, Ubuntu has no logon as root but uses a sudo permissions system.)

I have reloaded BioPerl many, many times. I do not want to sound 'windowie', but should I uninstall, then install?

The errors always point to IO::String. I can find String.pm files in /usr/share/perl5/debconf/Element but nowhere else. I cannot find a /IO/ (an IO folder) anywhere.

please and thanks
John Brigham

From biopython at maubp.freeserve.co.uk Wed Aug 13 16:54:50 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 13 Aug 2008 21:54:50 +0100
Subject: [BioSQL-l] installation.....IO/String problem
In-Reply-To: <20080813.141356.8454.0@webmail16.vgs.untd.com>
References: <20080813.141356.8454.0@webmail16.vgs.untd.com>
Message-ID: <320fb6e00808131354q511333fdp107a2cacbe481fae@mail.gmail.com>

On Wed, Aug 13, 2008 at 9:13 PM, mrphysh at juno.com wrote:
>
> I am having trouble with database retrieval from online databases.
> This is an install problem.(?)..I am running Linux (Ubuntu).........
> I did these, following the documentation. from cpan>
> install Bundle::CPAN
>
> install Module::Build #one of the many help files said to do this
> install Bundle;;BioPerl
> force install B/BI/BIRNEY/bioperl-1.4.tar.gz
>...
> can't open file /usr/local/lib/perl/5.8.8/auto/Bio/.packlist: permission denied at /usr/share/perl/5.8/ExtUtils?Install.pm line 209
> writing /usr/local/lib/perl/5.8.8/auto/Bio/.packlist
> make: *** [pure_site_install error13
> /usr/bin/make_install --- NOT OKAY
> you may have to u to root to install the package

This is the BioSQL mailing list, not the BioPerl mailing list, so you are asking the wrong people.

However, this looks like a simple permissions problem - have you tried to install this as the root user (e.g.
use "sudo cpan" to start cpan) or have you configured it to install under your home directory where you should have write permissions? Peter From cjfields at illinois.edu Wed Aug 13 17:26:20 2008 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 13 Aug 2008 16:26:20 -0500 Subject: [BioSQL-l] installation.....IO/String problem In-Reply-To: <320fb6e00808131354q511333fdp107a2cacbe481fae@mail.gmail.com> References: <20080813.141356.8454.0@webmail16.vgs.untd.com> <320fb6e00808131354q511333fdp107a2cacbe481fae@mail.gmail.com> Message-ID: <6011F562-F73A-44E8-8F40-4E4A6C4A83A7@illinois.edu> On Aug 13, 2008, at 3:54 PM, Peter wrote: > On Wed, Aug 13, 2008 at 9:13 PM, mrphysh at juno.com > wrote: >> >> I am having trouble with database retrieval from online databases. >> This is an install problem.(?)..I am running Linux (Ubuntu)......... >> I did these, following the documentation. from cpan> >> install Bundle::CPAN >> >> install Module::Build #one of the many help files said to do this >> install Bundle;;BioPerl >> force install B/BI/BIRNEY/bioperl-1.4.tar.gz >> ... >> can't open file /usr/local/lib/perl/5.8.8/auto/Bio/.packlist: >> permission denied at /usr/share/perl/5.8/ExtUtils?Install.pm line 209 >> writing /usr/local/lib/perl/5.8.8/auto/Bio/.packlist >> make: *** [pure_site_install error13 >> /usr/bin/make_install --- NOT OKAY >> you may have to u to root to install the package > > This is the BioSQL mailing list, not the BioPerl mailing list, so you > are asking the wrong people. > > However, this looks like a simple permissions problem - havee you > tried to install this as the root user (e.g. use "sudo cpan" to start > cpan) or have you configured it to install under your home directory > where you should have write permissions? > > Peter I think this is also a bioperl versioning issue. Module::Build is (oddly) calling for the old BioPerl version (1.4) which is way out-of- date. 
You should try installing bioperl 1.5.2 or bioperl-live for this; see here:

http://www.bioperl.org/wiki/Installing_BioPerl
http://www.bioperl.org/wiki/Core_package

chris

From biopython at maubp.freeserve.co.uk Mon Aug 18 11:15:37 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 18 Aug 2008 16:15:37 +0100
Subject: [BioSQL-l] Checking bioperl-db version number
Message-ID: <320fb6e00808180815x3e123c51t73ff4febe5e3c239@mail.gmail.com>

Hi all,

This might be better asked on the main BioPerl mailing list, however, I would like to know how to get the version of bioperl-db (i.e. the part of BioPerl used to import sequence files into BioSQL).

Thanks,

Peter

--

P.S. I've found two equivalent ways to check the version of BioPerl itself:

    require Bio::Perl;
    print "Bio::Perl::VERSION = ";
    print $Bio::Perl::VERSION, "\n";

    require Bio::Root::Version;
    print "Bio::Root::Version::VERSION = ";
    print $Bio::Root::Version::VERSION, "\n";

Example output:

    Bio::Perl::VERSION = 1.005002102
    Bio::Root::Version::VERSION = 1.005002102

From hlapp at gmx.net Mon Aug 18 11:44:08 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 18 Aug 2008 11:44:08 -0400
Subject: [BioSQL-l] Checking bioperl-db version number
In-Reply-To: <320fb6e00808180815x3e123c51t73ff4febe5e3c239@mail.gmail.com>
References: <320fb6e00808180815x3e123c51t73ff4febe5e3c239@mail.gmail.com>
Message-ID:

Peter - can you repost this on the Bioperl list? There are several people who should know this better than I do.

-hilmar

On Aug 18, 2008, at 11:15 AM, Peter wrote:

> Hi all,
>
> This might be better asked on the main BioPerl mailing list, however,
> I would like to know how to get the version of bioperl-db (i.e. the
> part of BioPerl used to import sequence files into BioSQL).
>
> Thanks,
>
> Peter
>
> --
>
> P.S.
I've found two equivalent ways to check the version of BioPerl > itself: > > require Bio::Perl; > print "Bio::Perl::VERSION = "; > print $Bio::Perl::VERSION, "\n"; > > require Bio::Root::Version; > print "Bio::Root::Version::VERSION = "; > print $Bio::Root::Version::VERSION, "\n"; > > Example output: > Bio::Perl::VERSION = 1.005002102 > Bio::Root::Version::VERSION = 1.005002102 > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Mon Aug 18 11:50:26 2008 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Aug 2008 10:50:26 -0500 Subject: [BioSQL-l] Checking bioperl-db version number In-Reply-To: <320fb6e00808180815x3e123c51t73ff4febe5e3c239@mail.gmail.com> References: <320fb6e00808180815x3e123c51t73ff4febe5e3c239@mail.gmail.com> Message-ID: <41928BBC-2665-4D04-A7CA-FE07A8FD63EE@illinois.edu> I don't think bioperl-db has a specific version separate from BioPerl, at least not anymore. As you found you'll get 1.005002102 (i.e. 1.5.2), which corresponds to the bioperl-core version installed. chris On Aug 18, 2008, at 10:15 AM, Peter wrote: > Hi all, > > This might be better asked on the main BioPerl mailing list, however, > I would like to know how to get the version of bioperl-db (i.e. the > part of BioPerl used to import sequence files into BioSQL). > > Thanks, > > Peter > > -- > > P.S. 
I've found two equivalent ways to check the version of BioPerl > itself: > > require Bio::Perl; > print "Bio::Perl::VERSION = "; > print $Bio::Perl::VERSION, "\n"; > > require Bio::Root::Version; > print "Bio::Root::Version::VERSION = "; > print $Bio::Root::Version::VERSION, "\n"; > > Example output: > Bio::Perl::VERSION = 1.005002102 > Bio::Root::Version::VERSION = 1.005002102 > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From biopython at maubp.freeserve.co.uk Mon Aug 18 12:23:38 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 18 Aug 2008 17:23:38 +0100 Subject: [BioSQL-l] Timing importing GenBank files into BioSQL Message-ID: <320fb6e00808180923w43b242b3ue22c48331260f9be@mail.gmail.com> Hi, I've started trying to look at BioPerl and Biopython and how well they agree in writing GenBank files into BioSQL. I've been using the BioPerl load_seqdatabase.pl script to import sample GenBank files, but I was a little surprised how long this takes to run for E. coli K12, NC_000913.gbk (about 10 minutes!). I'm using E coli K12, NC_000913.2 from ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_K12_substr__MG1655/NC_000913.gbk and Nanoarchaeum equitans, NC_005213.1 from ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Nanoarchaeum_equitans/NC_005213.gbk as my example input files. Example timing using BioPerl, after emptying most (all?) 
of my MySQL test database:

$ mysql --user="gbrowse" --pass="biosql" test_biosql -e "truncate table bioentry; truncate table seqfeature; truncate table bioentry_dbxref; truncate table term; truncate table ontology; truncate table reference; truncate table dbxref;"

$ time perl ~/Downloads/Software/bioperl-db-1.5.2_100/scripts/biosql/load_seqdatabase.pl --dbname test_biosql --namespace test --format genbank --dbpass biosql --dbuser gbrowse Nanoarchaeum_equitans/NC_005213.gbk
Loading Nanoarchaeum_equitans/NC_005213.gbk ...

real    0m17.116s
user    0m13.914s
sys     0m2.293s

$ time perl ~/Downloads/Software/bioperl-db-1.5.2_100/scripts/biosql/load_seqdatabase.pl --dbname test_biosql --namespace test --format genbank --dbpass biosql --dbuser gbrowse Escherichia_coli_K12_substr__MG1655/NC_000913.gbk
Loading Escherichia_coli_K12_substr__MG1655/NC_000913.gbk ...

real    10m0.784s
user    6m23.898s
sys     3m26.189s

This does seem a rather unreasonable length of time (and I've repeated this over three times). Is this normal?

I know this may not be a fair comparison, but this is what Biopython takes (code at end of email):

$ mysql --user="gbrowse" --pass="biosql" test_biosql -e "truncate table bioentry; truncate table seqfeature; truncate table bioentry_dbxref; truncate table term; truncate table ontology; truncate table reference; truncate table dbxref;"

$ time python load.py
Importing Nanoarchaeum_equitans/NC_005213.gbk
Loaded 1 records
Took 5.32s including the commit
Importing Escherichia_coli_K12_substr__MG1655/NC_000913.gbk
Loaded 1 records
Took 64.15s including the commit

real    1m10.037s
user    0m31.942s
sys     0m6.913s

I'm wondering if the BioPerl time is typical (I hope not), and if there are any computationally intensive or otherwise slow things it does that BioPython might be skipping (checksums? fetching taxonomy?)
Thanks

Peter

---------------------------------------------------------------------

The contents of my load.py script:

    import time
    from Bio import SeqIO
    from BioSQL import BioSeqDatabase
    server = BioSeqDatabase.open_database(driver="MySQLdb", user="gbrowse",
                                          passwd = "biosql", host = "localhost",
                                          db="test_biosql")
    db = server["test"]

    start = time.time()
    filename = "Nanoarchaeum_equitans/NC_005213.gbk"
    print "Importing %s" % filename
    records = SeqIO.parse(open(filename), "genbank")
    print "Loaded %i records" % db.load(records)
    server.adaptor.commit()
    print "Took %0.2fs including the commit" % (time.time()-start)

    start = time.time()
    filename = "Escherichia_coli_K12_substr__MG1655/NC_000913.gbk"
    print "Importing %s" % filename
    records = SeqIO.parse(open(filename), "genbank")
    print "Loaded %i records" % db.load(records)
    server.adaptor.commit()
    print "Took %0.2fs including the commit" % (time.time()-start)

From biopython at maubp.freeserve.co.uk Mon Aug 18 13:05:37 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 18 Aug 2008 18:05:37 +0100
Subject: [BioSQL-l] Timing importing GenBank files into BioSQL
In-Reply-To: <48A9A44D.4000309@bham.ac.uk>
References: <320fb6e00808180923w43b242b3ue22c48331260f9be@mail.gmail.com> <48A9A44D.4000309@bham.ac.uk>
Message-ID: <320fb6e00808181005p5894c897ia2cea4b7fe20fdac@mail.gmail.com>

On Mon, Aug 18, 2008 at 5:33 PM, Nick Loman wrote:
> Peter wrote:
>
>> I'm wondering if the BioPerl time is typical (I hope not), and if
>> there are any computationally intensive or otherwise slow things it
>> does that BioPython might be skipping (checksums? fetching taxonomy?)
>
> I also found that BioPython was faster than BioPerl at importing the same
> GenBank file.

It is reassuring that you also saw a difference - do you recall how big a difference this was on your setup? The factor of ten I am seeing is rather surprising.

> There are some differences in the handling of certain tables, the dbxref
> table springs to mind.
It is worth doing a dump of the database after
> importing each file using the two different methods and comparing the
> results. The differences may not be significant for you depending on your
> application.

I am hoping to bring Biopython into closer agreement with BioPerl (and thus also BioJava) in its use of BioSQL. If you have already made notes on any observed differences, that could be very useful.

> I suspect the difference is speed you find is related to the number of
> object lookups done in BioPerl which is significantly more than in
> BioPython. You can specify --flatlookup to load_seqdatabase.pl which reduces
> the number of lookups.

Reading the help output from the load_seqdatabase.pl script, --lookup and --flatlookup seem to be related to speeding up updating existing records (whereas in my test, I am trying to start with an empty database each time). I tried it anyway, and it seems to make no difference for this example. But thanks for the suggestions, it's one thing ruled out at least.

> You could enable DBI_TRACE to get a log of SQL statements for BioPerl.

That could help track down some differences, both in what gets written and how it gets written. I am hoping to avoid using too much Perl, otherwise I'm sure profiling load_seqdatabase.pl could be informative too.

> For my purposes, I found both Bioperl and Biopython to be a bit slow devised
> a batch import script which speeds things up quite dramatically by
> eliminating most object lookups, and applying the foreign-key constraints
> post-importing.

This was your "BioSQL BatchLoader" code for PostgreSQL? I remember the impressive speed up you got, at the expense of a much modified setup.
http://portal.open-bio.org/pipermail/biopython-dev/2008-April/003618.html

Peter

From n.j.loman at bham.ac.uk Mon Aug 18 12:33:17 2008
From: n.j.loman at bham.ac.uk (Nick Loman)
Date: Mon, 18 Aug 2008 17:33:17 +0100
Subject: [BioSQL-l] Timing importing GenBank files into BioSQL
In-Reply-To: <320fb6e00808180923w43b242b3ue22c48331260f9be@mail.gmail.com>
References: <320fb6e00808180923w43b242b3ue22c48331260f9be@mail.gmail.com>
Message-ID: <48A9A44D.4000309@bham.ac.uk>

Peter wrote:

> I'm wondering if the BioPerl time is typical (I hope not), and if
> there are any computationally intensive or otherwise slow things it
> does that BioPython might be skipping (checksums? fetching taxonomy?)

I also found that BioPython was faster than BioPerl at importing the same GenBank file.

There are some differences in the handling of certain tables, the dbxref table springs to mind. It is worth doing a dump of the database after importing each file using the two different methods and comparing the results. The differences may not be significant for you depending on your application.

I suspect the difference in speed you find is related to the number of object lookups done in BioPerl, which is significantly more than in BioPython. You can specify --flatlookup to load_seqdatabase.pl which reduces the number of lookups.

You could enable DBI_TRACE to get a log of SQL statements for BioPerl.

For my purposes, I found both Bioperl and Biopython to be a bit slow, so I devised a batch import script which speeds things up quite dramatically by eliminating most object lookups, and applying the foreign-key constraints post-importing.

Regards,

Nick.
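
The dump-and-compare idea above can start with something as coarse as per-table row counts before moving on to a full diff. This is a minimal sketch under stated assumptions: it takes two DB-API connections, one per toolkit load, and here in-memory SQLite databases with single-column toy tables stand in for the real MySQL instances. The function names and the demo data are hypothetical; only the table names come from the BioSQL schema.

```python
import sqlite3

# A few real BioSQL table names; extend the list as needed.
TABLES = ["bioentry", "seqfeature", "dbxref"]


def row_counts(conn, tables=TABLES):
    """Map each table name to its row count, using a plain DB-API cursor."""
    counts = {}
    cur = conn.cursor()
    for table in tables:
        # Table names come from the fixed list above, never from user input.
        cur.execute("SELECT COUNT(*) FROM " + table)
        counts[table] = cur.fetchone()[0]
    return counts


def count_differences(conn_a, conn_b, tables=TABLES):
    """Return {table: (count_a, count_b)} for tables whose counts differ."""
    a = row_counts(conn_a, tables)
    b = row_counts(conn_b, tables)
    return {t: (a[t], b[t]) for t in tables if a[t] != b[t]}


def make_demo_db(n_dbxref):
    """Build a toy database in which only the dbxref row count varies."""
    conn = sqlite3.connect(":memory:")
    for table in TABLES:
        conn.execute("CREATE TABLE " + table + " (id INTEGER PRIMARY KEY)")
    conn.execute("INSERT INTO bioentry DEFAULT VALUES")
    for _ in range(n_dbxref):
        conn.execute("INSERT INTO dbxref DEFAULT VALUES")
    return conn


loaded_by_a = make_demo_db(2)  # made-up counts, not measured results
loaded_by_b = make_demo_db(5)
print(count_differences(loaded_by_a, loaded_by_b))
```

Equal counts do not prove the contents agree, so this only narrows down which tables deserve a row-by-row comparison.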
From biopython at maubp.freeserve.co.uk Mon Aug 18 13:41:58 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 18 Aug 2008 18:41:58 +0100
Subject: [BioSQL-l] Timing importing GenBank files into BioSQL
In-Reply-To: <320fb6e00808181005p5894c897ia2cea4b7fe20fdac@mail.gmail.com>
References: <320fb6e00808180923w43b242b3ue22c48331260f9be@mail.gmail.com> <48A9A44D.4000309@bham.ac.uk> <320fb6e00808181005p5894c897ia2cea4b7fe20fdac@mail.gmail.com>
Message-ID: <320fb6e00808181041h14641ccftef53aa100f758552@mail.gmail.com>

Peter wrote:
>>> I'm wondering if the BioPerl time is typical (I hope not), and if
>>> there are any computationally intensive or otherwise slow things it
>>> does that BioPython might be skipping (checksums? fetching taxonomy?)

Nick wrote:
>> I also found that BioPython was faster than BioPerl at importing the same
>> GenBank file.

If anyone else with at least two of BioPerl, BioJava, BioRuby and Biopython installed could try this example and report their findings, that would be interesting. That is, time importing the small NC_005213.1 and medium-sized NC_000913.2 GenBank files linked to at the start of this thread into an empty BioSQL database.

Thanks,

Peter

From johnsonm at gmail.com Mon Aug 18 16:53:48 2008
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 18 Aug 2008 15:53:48 -0500
Subject: [BioSQL-l] Bio::Annotation issues with BioSQL
Message-ID:

I'm presently refactoring an in-house protein annotation pipeline and converting it to use BioSQL as a data store. I've noticed some slightly screwy behavior with regard to how some of the Bio::Annotation classes are handled:

-Instances of Bio::Annotation::SimpleValue and Bio::Annotation::StructuredValue attached to the annotation collection for a sequence feature (Bio::SeqFeature::Generic) are converted to tags/values on the feature.

-Instances of Bio::Annotation::DBLink with attached comments lose the comment.
I'm storing and retrieving things thusly:

    # --- storing ---
    my $dbadp = Bio::DB::BioDB->new(
        -database => 'biosql',
        -user     => $user,
        -pass     => $pass,
        -dbname   => $ora_instance,
        -driver   => 'Oracle'
    );

    my $adp = $dbadp->get_object_adaptor("Bio::SeqI");

    my $seq = Bio::Seq->new(
        -id               => 'DEBUG001',
        -accession_number => 'DBG001',
        -desc             => 'Debug Sequence',
        -seq              => 'GATTACA',
        -namespace        => 'DEBUG',
    );

    my $feature = Bio::SeqFeature::Generic->new(
        -seq_id       => 'DEBUG001',
        -display_name => 'FEAT0001',
        -primary      => 'debug',
        -source       => 'test',
        -start        => 3,
        -end          => 5,
        -strand       => 1,
    );

    my $dblink = Bio::Annotation::DBLink->new(
        -database   => 'FAKE001',
        -primary_id => 'FK1234567890',
        -comment    => 'This is a fake comment',
    );

    $feature->annotation->add_Annotation('ANNO0001', $dblink);

    $seq->add_SeqFeature($feature);

    my $pseq = $dbadp->create_persistent($seq);
    $pseq->store();
    $adp->commit();

    # --- retrieving ---
    my $dbadp = Bio::DB::BioDB->new( ... );

    my $adp = $dbadp->get_object_adaptor("Bio::SeqI");

    my $query = Bio::DB::Query::BioQuery->new();
    $query->datacollections([ "Bio::PrimarySeqI s", ]);
    $query->where(["s.display_id like DEBUG%'"]);

    my $result = $adp->find_by_query($query);

    while (my $seq = $result->next_object()) {

        my @features = $seq->get_SeqFeatures();

        foreach my $feature (@features) {

            ## Contents of Bio::Annotation::SimpleValue and Bio::Annotation::StructuredValue have
            ## migrated to tag/value pairs on $feature and are missing from $annotation_collection.
            ##
            ## Comments have gone missing from Bio::Annotation::DBLink, but DBLinks are otherwise intact and present.
            my $annotation_collection = $feature->annotation();
            ...
            ...
        }

    }

Is bioperl-db / BioSQL trying to tell me that I shouldn't be using Bio::Annotation::SimpleValue and Bio::Annotation::StructuredValue? Is there even a place in the BioSQL schema for a comment to be attached to a DBLink?
From hlapp at gmx.net Tue Aug 19 13:56:42 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 19 Aug 2008 13:56:42 -0400 Subject: [BioSQL-l] Bio::Annotation issues with BioSQL In-Reply-To: References: Message-ID: <2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net> On Aug 18, 2008, at 4:53 PM, Mark Johnson wrote: > I'm presently refactoring an in-house protein annotation pipeline > and converting it to use BioSQL as a data store. I've noticed some > slightly screwy behavior with regard to how some of the > Bio::Annotation classes are handled: > > -Instances of Bio::Annotation::SimpleValue and > Bio::Annotation::StructuredValue attached to the annotation collection > for a sequence feature (Bio::SeqFeature::Generic) are converted to > tags/values on the feature. > > -Instances of Bio::Annotation::DBLink with attached comments lose > the comment. > [...] > $query->where(["s.display_id like DEBUG%'"]); There's a single quote missing here, but I'm assuming that's a result of copy/paste editing? > [...] > Is bioperl-db / BioSQL trying to tell me that I shouldn't be using > Bio::Annotation::SimpleValue and Bio::Annotation::StructuredValue? Your example code doesn't show where you are getting the B::A::StructuredValue object from. If you didn't create that yourself, it would be good to know what you did to end up with that. Chris Fields has written B::A::Tagtree, which would be the way forward; if you created the object yourself, can you take a look at that and see whether that class wouldn't serve your purpose as well or even better? In order to be stored in BioSQL, structured (hierarchical, nested) annotation is flattened into a string representation, because BioSQL can't store nested annotation collections natively. Right now, if I am not mistaken, upon retrieval this is not converted back into a B::A::Tagtree object but rather left flat. This is being worked on, though; we've just discussed some issues connected with that.
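As a rough illustration of what flattening means here, the following Python sketch turns a small nested annotation into a single string. The bracketed format and the flatten helper are assumptions made up for this example; they are not the representation that bioperl-db or Data::Stag actually writes, and the localization and score values are placeholders rather than real PSORTb output.

```python
def flatten(node):
    """Render a (name, value) pair as one string; value may be a list of pairs."""
    name, value = node
    if isinstance(value, list):
        inner = " ".join(flatten(child) for child in value)
        return "(%s %s)" % (name, inner)
    return "(%s %r)" % (name, value)


# Illustrative values only, not real PSORTb output.
annotation = ("psortb", [("localization", "Cytoplasmic"), ("score", 9.97)])

flat = flatten(annotation)
print(flat)                 # (psortb (localization 'Cytoplasmic') (score 9.97))
print(type(flat).__name__)  # str
```

The point of the sketch is only that what goes into the database is a plain string: unless something parses it again on retrieval, the caller gets text back rather than the original tree.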
I could make B::A::StructuredValue work the same way, but I'm not sure what it provides that B::A::Tagtree doesn't. The latter uses Data::Stag under the hood, which is much cleaner, and more extensible in the future. As for SimpleValue annotation versus tag/value annotation for seqfeatures, yes, right now these are treated interchangeably for the purposes of BioSQL and Bioperl-db. You can do this easily too on your end by using Bio::SeqFeature::AnnotationAdaptor. > Is there even a place in the BioSQL schema for a comment to be > attached to a DBLink? No, there isn't. I thought it was, but it turns out that this isn't yet listed among the desirable extensions to BioSQL from 1.1.x onwards, as documented on the wiki: http://www.biosql.org/wiki/Enhancement_Requests I'll add it (but feel free to do so yourself, especially if you have other enhancements). -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Tue Aug 19 14:17:36 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 19 Aug 2008 14:17:36 -0400 Subject: [BioSQL-l] Timing importing GenBank files into BioSQL In-Reply-To: <320fb6e00808180923w43b242b3ue22c48331260f9be@mail.gmail.com> References: <320fb6e00808180923w43b242b3ue22c48331260f9be@mail.gmail.com> Message-ID: The timings do seem a bit on the long end, but they are also whole genomes. The first interesting bit would be how much of that time is spent in the BioPerl parser, and how much time is spent loading the sequence. For typical GenBank sequences, a rate of 10-20 seqs/sec is in the expected range; depending on your hardware setup (and db configuration) you can get slower or faster speeds. You can get lots of output on what load_seqdatabase.pl is doing by passing --debug.
Under normal operating conditions, the printed lines should be flying past you much faster than you can identify what they are, and should start doing so right after you get the line "Loading " followed by the filename (before that it is opening the database connection). If there is something that stays on the screen long enough that you can read (or copy&paste) it, it is probably a bottleneck. Bioperl-db essentially works like an object-relational mapper, and hence loading data happens one object at a time. There are some speed optimizations; for example, some objects (like dbxrefs) are always looked up first and inserted if not found, whereas others (like seqs or features) are inserted first and updated if that fails. The assumptions that this is based on are for databases that you are updating (which is what one typically does 90% of the time), not for fresh loads into an empty db. Finally, any speed comparisons aren't really particularly useful so long as you don't know how similar (or different) the resulting data content is, so I would start by comparing that. -hilmar On Aug 18, 2008, at 12:23 PM, Peter wrote: > Hi, > > I've started trying to look at BioPerl and Biopython and how well they > agree in writing GenBank files into BioSQL. I've been using the > BioPerl load_seqdatabase.pl script to import sample GenBank files, but > I was a little surprised how long this takes to run for E. coli K12, > NC_000913.gbk (about 10 minutes!). I'm using E coli K12, NC_000913.2 > from ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_K12_substr__MG1655/NC_000913.gbk > and Nanoarchaeum equitans, NC_005213.1 from > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Nanoarchaeum_equitans/NC_005213.gbk > as my example input files. > > Example timing using BioPerl, after emptying most (all?)
of my MySQL > test database: > > $ mysql --user="gbrowse" --pass="biosql" test_biosql -e "truncate > table bioentry; truncate table seqfeature; truncate table > bioentry_dbxref; truncate table term; truncate table ontology; > truncate table reference; truncate table dbxref;" > > $ time perl ~/Downloads/Software/bioperl-db-1.5.2_100/scripts/biosql/ > load_seqdatabase.pl > --dbname test_biosql --namespace test --format genbank --dbpass biosql > --dbuser gbrowse Nanoarchaeum_equitans/NC_005213.gbk > Loading Nanoarchaeum_equitans/NC_005213.gbk ... > > real 0m17.116s > user 0m13.914s > sys 0m2.293s > > $ time perl ~/Downloads/Software/bioperl-db-1.5.2_100/scripts/biosql/ > load_seqdatabase.pl > --dbname test_biosql --namespace test --format genbank --dbpass biosql > --dbuser gbrowse Escherichia_coli_K12_substr__MG1655/NC_000913.gbk > Loading Escherichia_coli_K12_substr__MG1655/NC_000913.gbk ... > > real 10m0.784s > user 6m23.898s > sys 3m26.189s > > This does seem a rather unreasonable length of time (and I've repeated > this over three times). Is this normal? I know this may not be a > fair comparison, but this it what Biopython takes (code at end of > email): > > $ mysql --user="gbrowse" --pass="biosql" test_biosql -e "truncate > table bioentry; truncate table seqfeature; truncate table > bioentry_dbxref; truncate table term; truncate table ontology; > truncate table reference; truncate table dbxref;" > > $ time python load.py > Importing Nanoarchaeum_equitans/NC_005213.gbk > Loaded 1 records > Took 5.32s include the commit > Importing Escherichia_coli_K12_substr__MG1655/NC_000913.gbk > Loaded 1 records > Took 64.15s including the commit > > real 1m10.037s > user 0m31.942s > sys 0m6.913s > > I'm wondering if the BioPerl time is typical (I hope not), and if > there are any computationally intensive or otherwise slow things it > does that BioPython might be skipping (checksums? fetching taxonomy?) 
> > Thanks > > Peter > > --------------------------------------------------------------------- > The contents of my load.py script: > > import time > from Bio import SeqIO > from BioSQL import BioSeqDatabase > server = BioSeqDatabase.open_database(driver="MySQLdb", > user="gbrowse", > passwd = "biosql", host = "localhost", > db="test_biosql") > > db = server["test"] > > start = time.time() > filename = "Nanoarchaeum_equitans/NC_005213.gbk" > print "Importing %s" % filename > records = SeqIO.parse(open(filename), "genbank") > print "Loaded %i records" % db.load(records) > server.adaptor.commit() > print "Took %0.2fs including the commit" % (time.time()-start) > > start = time.time() > filename = "Escherichia_coli_K12_substr__MG1655/NC_000913.gbk" > print "Importing %s" % filename > records = SeqIO.parse(open(filename), "genbank") > print "Loaded %i records" % db.load(records) > server.adaptor.commit() > print "Took %0.2fs including the commit" % (time.time()-start) > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From johnsonm at gmail.com Wed Aug 20 14:43:25 2008 From: johnsonm at gmail.com (Mark Johnson) Date: Wed, 20 Aug 2008 13:43:25 -0500 Subject: [BioSQL-l] Bio::Annotation issues with BioSQL In-Reply-To: <2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net> References: <2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net> Message-ID: On Tue, Aug 19, 2008 at 12:56 PM, Hilmar Lapp wrote: > On Aug 18, 2008, at 4:53 PM, Mark Johnson wrote: > There's a single quote missing here, but I'm assuming that's a result of > copy/paste editing? Yes, I was a bit sloppy with the example. > Your example code doesn't contain an example for where you are getting the > B::A::StructuredValue object from. 
If you didn't create that yourself, it > would be good to know what you did to end up with that. Chris Fields has > written B::A::Tagtree which would be way forward, and if you created the > object yourself, can you take a look at that and see whether that class > wouldn't serve your purpose as well or even better? I created the B::A::StructuredValue myself. I'm using it to store the output from PSORTb, which gives a cellular localization and a score for a protein sequence (gene), which I'm trying to keep paired together, if possible. I'll take a look at B::A::Tagtree, that's probably a better fit. > In order to be stored in BioSQL structured (hierarchical, nested) annotation > is flattened into a string representation, because BioSQL can't store nested > annotation collections natively. Right now if I am not mistaken upon > retrieval this is not converted back into a B::A::Tagtree object but rather > left flat. This is being worked on though, we've just discussed some issues > connected with that. The data I have isn't really deeply nested. I just like to keep related annotation in one object, if possible. > I could make B::A::StructuredValue work the same way, but I'm not sure what > it provides that B::A::Tagtree doesn't. The latter uses Data::Stag under the > hood, which is much cleaner, and more extensible in the future. Perhaps B::A::StructuredValue should be deprecated? > As for SimpleValue annotation versus tag/value annotation for seqfeatures, > yes right now these are treated interchangeably for the purposes of BioSQL > and Bioperl-db. You can do this easily too on your end by using > Bio::SeqFeature::AnnotationAdaptor. I'll check out the AnnotationAdaptor, but I'll probably just end using seqfeature tags/values. They're functionally equivalent to B::A::SimpleValue. >> Is there even a place in the BioSQL schema for a comment to be attached >> to a DBLink? > > No there isn't. 
I thought it is but it turns out that this isn't yet one of > the desirable extensions to BioSQL from 1.1.x onwards, as documented on the > wiki: > > http://www.biosql.org/wiki/Enhancement_Requests > > I'll add it (but feel free to do so yourself, especially if you have other > enhancmenets). I'll take a look at the wiki....I'll file that as a feature request if I get there before you do it. From cjfields at illinois.edu Wed Aug 20 16:25:55 2008 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 20 Aug 2008 15:25:55 -0500 Subject: [BioSQL-l] Bio::Annotation issues with BioSQL In-Reply-To: References: <2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net> Message-ID: <9872D07D-61AB-4F0A-A477-35AA87ABF72E@illinois.edu> On Aug 20, 2008, at 1:43 PM, Mark Johnson wrote: > ... > >> I could make B::A::StructuredValue work the same way, but I'm not >> sure what >> it provides that B::A::Tagtree doesn't. The latter uses Data::Stag >> under the >> hood, which is much cleaner, and more extensible in the future. > > Perhaps B::A::StructuredValue should be deprecated? Probably. The only place it was used in core was SeqIO::swiss (and now that uses Tagtree in bioperl-live). Let me know if you have any problems with Bio::Annotation::Tagtree. I am planning on doing some more work with it soon. chris From awitney at sgul.ac.uk Wed Aug 27 06:28:50 2008 From: awitney at sgul.ac.uk (Adam Witney) Date: Wed, 27 Aug 2008 11:28:50 +0100 Subject: [BioSQL-l] fill taxon_id field in bioentry from Seq object? Message-ID: <73F4F12F-3C10-41B5-B431-104A787E05D7@sgul.ac.uk> is it possible to add a taxon_id to a Seq object such that when i save it to my BioSQL database, it is stored in the bioentry table? thanks for any help adam From biopython at maubp.freeserve.co.uk Wed Aug 27 06:44:30 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Aug 2008 11:44:30 +0100 Subject: [BioSQL-l] fill taxon_id field in bioentry from Seq object? 
In-Reply-To: <73F4F12F-3C10-41B5-B431-104A787E05D7@sgul.ac.uk> References: <73F4F12F-3C10-41B5-B431-104A787E05D7@sgul.ac.uk> Message-ID: <320fb6e00808270344paeaf049p27fa7425053582a1@mail.gmail.com> On Wed, Aug 27, 2008 at 11:28 AM, Adam Witney wrote: > > is it possible to add a taxon_id to a Seq object such that when i save it to > my BioSQL database, it is stored in the bioentry table? > > thanks for any help > > adam Which Bio* binding for BioSQL are you trying to use? BioPerl, Biopython, BioJava etc Peter From awitney at sgul.ac.uk Wed Aug 27 06:49:50 2008 From: awitney at sgul.ac.uk (Adam Witney) Date: Wed, 27 Aug 2008 11:49:50 +0100 Subject: [BioSQL-l] fill taxon_id field in bioentry from Seq object? In-Reply-To: <320fb6e00808270344paeaf049p27fa7425053582a1@mail.gmail.com> References: <73F4F12F-3C10-41B5-B431-104A787E05D7@sgul.ac.uk> <320fb6e00808270344paeaf049p27fa7425053582a1@mail.gmail.com> Message-ID: <5648AC9C-55D4-45A4-B08B-BC50C6BF5FC8@sgul.ac.uk> On 27 Aug 2008, at 11:44, Peter wrote: > On Wed, Aug 27, 2008 at 11:28 AM, Adam Witney > wrote: >> >> is it possible to add a taxon_id to a Seq object such that when i >> save it to >> my BioSQL database, it is stored in the bioentry table? >> >> thanks for any help >> >> adam > > Which Bio* binding for BioSQL are you trying to use? BioPerl, > Biopython, BioJava etc sorry forgot to mention that bit.... 
I am using BioPerl thanks adam From biopython at maubp.freeserve.co.uk Wed Aug 27 09:51:41 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Aug 2008 14:51:41 +0100 Subject: [BioSQL-l] get_dblinks in load_seqdatabase.pl is deprecated Message-ID: <320fb6e00808270651q4c3b81d7x375fbbdd47c9b5f5@mail.gmail.com> Hi, In order to install GBrowse 1.69, I've updated my installation of BioPerl (using gbrowse_netinstall.pl) and then by hand fetched the latest BioPerl/BioSQL load_seqdatabase.pl from SVN, http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-db/trunk/scripts/biosql/load_seqdatabase.pl The new script seems to work, but prints out over a page of deprecation warnings about get_dblinks (see below). Should I file this as a bug on bugzilla? Do you think load_seqdatabase.pl can be updated to work with the latest BioPerl and still be backwards compatible with BioPerl 1.5.2? Peter $ mysql --user="gbrowse" --pass="biosql" test_biosql -e "truncate table bioentry; truncate table seqfeature; truncate table bioentry_dbxref; truncate table term; truncate table ontology; truncate table reference; truncate table dbxref;" $ time perl load_seqdatabase.pl --dbname test_biosql --namespace test --format genbank --dbpass biosql --dbuser gbrowse Nanoarchaeum_equitans/NC_005213.gbk Loading Nanoarchaeum_equitans/NC_005213.gbk ... Use of get_dblinks is deprecated. Note that prior use of this method could return either simple scalar values or Bio::Annotation::DBLink instances; only Bio::Annotation::DBLink is now supported.
Use get_dbxrefs() instead STACK Bio::Ontology::Term::get_dblinks /Library/Perl/5.8.8/Bio/Ontology/Term.pm:437 STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD /Library/Perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:552 STACK Bio::DB::BioSQL::TermAdaptor::store_children /Library/Perl/5.8.8/Bio/DB/BioSQL/TermAdaptor.pm:280 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 STACK Bio::DB::Persistent::PersistentObject::create /Library/Perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:244 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 STACK Bio::DB::Persistent::PersistentObject::store /Library/Perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271 STACK Bio::DB::BioSQL::SeqAdaptor::store_children /Library/Perl/5.8.8/Bio/DB/BioSQL/SeqAdaptor.pm:244 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 STACK Bio::DB::Persistent::PersistentObject::store /Library/Perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271 STACK (eval) load_seqdatabase.pl:630 STACK toplevel load_seqdatabase.pl:612 [deprecation warning and stack repeated another six times] real 0m15.479s user 0m12.315s sys 0m2.263s From cjfields at illinois.edu Wed Aug 27 10:38:45 2008 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 27 Aug 2008 09:38:45 -0500 Subject: [BioSQL-l] get_dblinks in load_seqdatabase.pl is deprecated In-Reply-To: <320fb6e00808270651q4c3b81d7x375fbbdd47c9b5f5@mail.gmail.com> References: <320fb6e00808270651q4c3b81d7x375fbbdd47c9b5f5@mail.gmail.com> Message-ID: <51375A78-B1F3-4F52-BBEC-8FE62257F789@illinois.edu> Go ahead and file a bug for tracking. 
I'll see if I can track this down; I'm wondering if there is something within bioperl-db/bioperl-live still using get_dblinks, though it's called through AUTOLOAD. chris Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From cjfields at illinois.edu Wed Aug 27 11:22:45 2008 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 27 Aug 2008 10:22:45 -0500 Subject: [BioSQL-l] get_dblinks in load_seqdatabase.pl is deprecated In-Reply-To: <51375A78-B1F3-4F52-BBEC-8FE62257F789@illinois.edu> References: <320fb6e00808270651q4c3b81d7x375fbbdd47c9b5f5@mail.gmail.com> <51375A78-B1F3-4F52-BBEC-8FE62257F789@illinois.edu> Message-ID: Peter, Unfortunately I'm unable to reproduce this using bioperl-live and bioperl-db (both from Subversion): cjfields$ time perl load_seqdatabase.pl --dbname nano --namespace test --format genbank --dbpass ***** --dbuser foo NC_005213.gbk Loading NC_005213.gbk ... real 0m35.057s user 0m26.480s sys 0m4.456s This problem is similar to one reported recently: http://article.gmane.org/gmane.comp.lang.perl.bio.general/17360 I think the solution may have been making sure to install bioperl and bioperl-db from Subversion or (if you can't access it) the nightly builds. Use 'sudo ./Build install --uninst 1' to remove old versions which may conflict. The nightly build link: http://bioperl.org/DIST/nightly_builds/ chris Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From biopython at maubp.freeserve.co.uk Wed Aug 27 12:43:06 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Aug 2008 17:43:06 +0100 Subject: [BioSQL-l] get_dblinks in load_seqdatabase.pl is deprecated In-Reply-To: References: <320fb6e00808270651q4c3b81d7x375fbbdd47c9b5f5@mail.gmail.com> <51375A78-B1F3-4F52-BBEC-8FE62257F789@illinois.edu> Message-ID: <320fb6e00808270943t750ce0b3r9ec9d7c2c9744fc5@mail.gmail.com> > Peter, > > Unfortunately I'm unable to reproduce this using bioperl-live and bioperl-db > (both from Subversion): > ... > > This problem is similar to one reported recently: > > http://article.gmane.org/gmane.comp.lang.perl.bio.general/17360 Yes, it does look very similar. > I think the solution may have been making sure to install bioperl and > bioperl-db from Subversion or (if you can't access it) the nightly builds. > Use 'sudo ./Build install --uninst 1' to remove old versions which may > conflict.
The nightly build link: > > http://bioperl.org/DIST/nightly_builds/ You were right - I've installed the nightly builds of bioperl-live and bioperl-db with the switch to remove old versions and the deprecation warning went away. Thanks for your help; I've closed the bug I filed as invalid: http://bugzilla.open-bio.org/show_bug.cgi?id=2572 Peter From biopython at maubp.freeserve.co.uk Wed Aug 27 12:52:27 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Aug 2008 17:52:27 +0100 Subject: [BioSQL-l] fill taxon_id field in bioentry from Seq object? In-Reply-To: <5648AC9C-55D4-45A4-B08B-BC50C6BF5FC8@sgul.ac.uk> References: <73F4F12F-3C10-41B5-B431-104A787E05D7@sgul.ac.uk> <320fb6e00808270344paeaf049p27fa7425053582a1@mail.gmail.com> <5648AC9C-55D4-45A4-B08B-BC50C6BF5FC8@sgul.ac.uk> Message-ID: <320fb6e00808270952j701bda51wac5fe096d640754f@mail.gmail.com> Adam wrote: > is it possible to add a taxon_id to a Seq object such that when i save it > to my BioSQL database, it is stored in the bioentry table? > ... > sorry forgot to mention that bit.... I am using BioPerl I'm afraid I can't help you with BioPerl, sorry. Hopefully a BioPerl expert will reply. All I can suggest is that you try parsing a GenBank file with BioPerl and see where the taxon id is stored in the Seq object's annotation, then try to do the same with your data before asking BioPerl to save it to the BioSQL database. Peter From hlapp at gmx.net Wed Aug 27 15:11:09 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 27 Aug 2008 15:11:09 -0400 Subject: [BioSQL-l] fill taxon_id field in bioentry from Seq object?
In-Reply-To: <5648AC9C-55D4-45A4-B08B-BC50C6BF5FC8@sgul.ac.uk> References: <73F4F12F-3C10-41B5-B431-104A787E05D7@sgul.ac.uk> <320fb6e00808270344paeaf049p27fa7425053582a1@mail.gmail.com> <5648AC9C-55D4-45A4-B08B-BC50C6BF5FC8@sgul.ac.uk> Message-ID: <0D41112C-DB20-43CA-B7F9-CC83DF3F3A89@gmx.net> $seq->species->ncbi_taxon_id() BTW feel free to post this to the BioPerl mailing list bioperl-l at lists.open-bio.org. -hilmar On Aug 27, 2008, at 6:49 AM, Adam Witney wrote: > > On 27 Aug 2008, at 11:44, Peter wrote: > >> On Wed, Aug 27, 2008 at 11:28 AM, Adam Witney >> wrote: >>> >>> is it possible to add a taxon_id to a Seq object such that when i >>> save it to >>> my BioSQL database, it is stored in the bioentry table? >>> >>> thanks for any help >>> >>> adam >> >> Which Bio* binding for BioSQL are you trying to use? BioPerl, >> Biopython, BioJava etc > > sorry forgot to mention that bit.... I am using BioPerl > > thanks > > adam > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From gabrielle_doan at gmx.net Fri Aug 29 08:36:26 2008 From: gabrielle_doan at gmx.net (Gabrielle Doan) Date: Fri, 29 Aug 2008 14:36:26 +0200 Subject: [BioSQL-l] Increasing value of rank in table seqfeature Message-ID: <48B7ED4A.5000008@gmx.net> Hi all, I have a BioSQL database which contains several chromosomes and features. Now I would like to insert chromosome 2 with some miRNAs as new features. I have run into the problem that the rank column of the seqfeature table can only store smallint(5) unsigned values. As far as I know, each rank has to be unique. If you want to store a lot of information, this limit will be exceeded quickly. Wouldn't it be better to increase it?
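On the question of how unique a rank has to be, here is a minimal Python sketch using the standard sqlite3 module. It assumes a cut-down seqfeature table in which uniqueness is enforced on the combination of bioentry, type term, source term and rank, rather than on rank alone; the column names and the add_feature helper are simplifications for illustration and should be checked against the real DDL.

```python
import sqlite3

# Cut-down stand-in for the seqfeature table. The composite UNIQUE constraint
# is the assumption being illustrated; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE seqfeature (
        seqfeature_id  INTEGER PRIMARY KEY,
        bioentry_id    INTEGER NOT NULL,
        type_term_id   INTEGER NOT NULL,
        source_term_id INTEGER NOT NULL,
        rank           INTEGER NOT NULL DEFAULT 0,
        UNIQUE (bioentry_id, type_term_id, source_term_id, rank)
    )
    """
)


def add_feature(bioentry_id, type_term_id, source_term_id, rank):
    """Hypothetical helper: return True if the row was accepted, False if rejected."""
    try:
        conn.execute(
            "INSERT INTO seqfeature "
            "(bioentry_id, type_term_id, source_term_id, rank) "
            "VALUES (?, ?, ?, ?)",
            (bioentry_id, type_term_id, source_term_id, rank),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    add_feature(1, 10, 20, 1),  # first feature of type 10 on bioentry 1
    add_feature(1, 11, 20, 1),  # different type, same rank
    add_feature(1, 10, 20, 1),  # same type, source and rank as the first row
]
print(results)  # [True, True, False]

# Largest value an unsigned 16-bit column can hold.
SMALLINT_UNSIGNED_MAX = 2**16 - 1
print(SMALLINT_UNSIGNED_MAX)  # 65535
```

Under that assumption the ceiling applies per combination of sequence, feature type and source, so software that restarts the rank counter for each combination has more headroom than software that increments one counter across all features of a sequence.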
It would be very nice if someone could comment my suggestion. Thanks a lot. Cheers, Gabrielle From hlapp at gmx.net Fri Aug 29 10:45:26 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 29 Aug 2008 10:45:26 -0400 Subject: [BioSQL-l] Increasing value of rank in table seqfeature In-Reply-To: <48B7ED4A.5000008@gmx.net> References: <48B7ED4A.5000008@gmx.net> Message-ID: <9021A6D3-7B9D-4C82-A4A4-45DC28C587F1@gmx.net> Hi Gabrielle, smallint can take values up to 65535 if unsigned. I can see that this can become a limitation if the bioentry to which the features belong is a whole chromosome. Note that the uniqueness constraint is not on bioentry (sequence) and rank. Instead, it is on the combination of bioentry (sequence), type term, source term, and rank. I.e., at present, with the smallint constraint, you can't have more than 65535 features of the same type and from the same source for a particular sequence. It's possible that the software you are using (Biojava?) increments the rank for every single feature, rather than resetting for each new combination of type and source. Is that what you are seeing? -hilmar On Aug 29, 2008, at 8:36 AM, Gabrielle Doan wrote: > Hi all, > I have a BioSQL database which contains several chromosomes and > features. And now I would like to insert chromosome 2 with some > miRNA as a new feature. I meet the problem that in the table > seqfeature the entry rank just can store smallint(5) unsigned > values. As fare as I know each rank has to be unique. If you want to > store many information this value will be excess quickly. Isn't it > better to increase this value? > > It would be very nice if someone could comment my suggestion. Thanks > a lot. 
> > Cheers, > Gabrielle > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Fri Aug 8 11:26:24 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 08 Aug 2008 15:26:24 -0000 Subject: [BioSQL-l] Biopython documentation in BioSQL SVN Message-ID: <320fb6e00808080826hfa9d2d7rf1adec4d10888574@mail.gmail.com> >> However, there is some older LaTeX based documentation on our webpage, >> http://biopython.org/DIST/docs/biosql/python_biosql_basic.html >> http://biopython.org/DIST/docs/biosql/python_biosql_basic.pdf >> >> These are currently living in the BioSQL repository, >... > >> What I would suggest is just to: >> >> (*) add a disclaimer to the top of python_biosql_basic.tex saying this >> document is depreciated, and giving a link to the wiki page, >> http://biopython.org/wiki/BioSQL > > Just send me a patch of the change you would like to make. Better late than never? Here is a patch against the SVN file python_biosql_basic.tex which puts more emphasis on the wiki page, http://biopython.org/wiki/BioSQL and also uses Bio.SeqIO rather than Bio.GenBank for the record parsing. This also removes the stub section "Python Cookbook Code". Thanks, Peter -------------- next part -------------- A non-text attachment was scrubbed... 
Name: biopython_biosql_doc.patch Type: application/octet-stream Size: 2234 bytes Desc: not available URL: From hlapp at gmx.net Fri Aug 1 03:13:34 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 31 Jul 2008 23:13:34 -0400 Subject: [BioSQL-l] BioSQL at BOSC08 - Was Re: (no subject) In-Reply-To: <4889EB66.2010700@compbio.dundee.ac.uk> References: <79ceddbc0807171108qe17f5a4g13730eeca90c1f2f@mail.gmail.com> <4884A4AB.3050907@compbio.dundee.ac.uk> <7B81518B-1F70-4382-BAF5-E04B6B062CBC@gmx.net> <4888A0A7.7090001@compbio.dundee.ac.uk> <342088F2-B4DE-4D57-ABE8-6431DA535370@gmx.net> <4888AEC2.8060008@compbio.dundee.ac.uk> <4889EB66.2010700@compbio.dundee.ac.uk> Message-ID: <02C35A12-7F3A-4C2B-9266-B5A863FF328B@gmx.net> Hi James, On Jul 25, 2008, at 11:04 AM, James Procter wrote: > I'd suggest that a wiki page is set up to describe any ad-hoc > 'extensions' that BioSQL users think might be useful to the > community. If/When I get round to making any extensions myself then > I'll > add them to that page, too. actually that page exists already: http://www.biosql.org/wiki/Extensions Right now all that's there is the fledgling PhyloDB module that's part of the svn repository (though not yet of a release). -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Fri Aug 1 04:59:54 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 1 Aug 2008 00:59:54 -0400 Subject: [BioSQL-l] Release 1.0.1 in the making Message-ID: <4E2B21EA-6604-4863-96B4-8737A33930E6@gmx.net> I am preparing the release of v1.0.1. This will primarily change the too short column width constraint on dbxref.accession (and in consequence that of bioentry.accession) to 128 chars. I have added migration scripts for Pg, MySQL, and Oracle. 
It'd be great if someone could help out by providing (and ideally testing) the respective DDL for one (or more) of the other database for which we have schema DDLs: HSQLDB, and Derby. I'm also adding notes about the PostgreSQL v8.3+ incompatibility if you need the Perl language binding (the 8.3 change should have actually very little effect for a strongly typed language such as Java), and have added the scripts by Peter Eisentraut (Pg developer) for people willing to try it (it's otherwise completely untested, though in theory it should work). See http://www.biosql.org/wiki/Releases#BioSQL_release_v1.0.1 for the release plan. Let me know if you would like to include anything else. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Fri Aug 1 09:24:16 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 1 Aug 2008 10:24:16 +0100 Subject: [BioSQL-l] Fwd: Release 1.0.1 in the making In-Reply-To: <320fb6e00808010223w1d6b698dm55dbfc6e98873040@mail.gmail.com> References: <4E2B21EA-6604-4863-96B4-8737A33930E6@gmx.net> <320fb6e00808010223w1d6b698dm55dbfc6e98873040@mail.gmail.com> Message-ID: <320fb6e00808010224o171eddf2g26e3c29c1a74e3ad@mail.gmail.com> Forgot to send to the list (sorry Hilmar, you'll get this twice!) On Fri, Aug 1, 2008 at 5:59 AM, Hilmar Lapp wrote: > I am preparing the release of v1.0.1. This will primarily change the too > short column width constraint on dbxref.accession (and in consequence that > of bioentry.accession) to 128 chars. > > ... > > See http://www.biosql.org/wiki/Releases#BioSQL_release_v1.0.1 for the > release plan. Let me know if you would like to include anything else. Is fixing Bug 2470 too ambitious for your planned release schedule? 
http://bugzilla.open-bio.org/show_bug.cgi?id=2470 This would help with some mooted Biopython enhancements to populate the taxonomy "on demand" as new sequences are added (see http://bugzilla.open-bio.org/show_bug.cgi?id=2475 although this is a bit long to read!). Thanks, Peter (Biopython) From jimp at compbio.dundee.ac.uk Fri Aug 1 09:56:56 2008 From: jimp at compbio.dundee.ac.uk (James Procter) Date: Fri, 01 Aug 2008 10:56:56 +0100 Subject: [BioSQL-l] BioSQL at BOSC08 - Was Re: (no subject) In-Reply-To: <02C35A12-7F3A-4C2B-9266-B5A863FF328B@gmx.net> References: <79ceddbc0807171108qe17f5a4g13730eeca90c1f2f@mail.gmail.com> <4884A4AB.3050907@compbio.dundee.ac.uk> <7B81518B-1F70-4382-BAF5-E04B6B062CBC@gmx.net> <4888A0A7.7090001@compbio.dundee.ac.uk> <342088F2-B4DE-4D57-ABE8-6431DA535370@gmx.net> <4888AEC2.8060008@compbio.dundee.ac.uk> <4889EB66.2010700@compbio.dundee.ac.uk> <02C35A12-7F3A-4C2B-9266-B5A863FF328B@gmx.net> Message-ID: <4892DDE8.4020709@compbio.dundee.ac.uk> Hilmar Lapp wrote: > On Jul 25, 2008, at 11:04 AM, James Procter wrote: >> I'd suggest that a wiki page is set up to describe any ad-hoc >> 'extensions' that BioSQL users think might be useful to the ... > actually that page exists already: > > http://www.biosql.org/wiki/Extensions doh - that'll teach me to look before I type :) > Right now all that's there is the fledgling PhyloDB module that's part > of the svn repository (though not yet of a release). Thanks - I may give the PhyloDB module a whirl in the next few months, too. This is, I suspect, a dumb question, but: is there a multiple sequence alignment representation within BioSQL ? This was going to be the extension I'd introduce - but if someone has already done this then I'd be happy to help harden it for production use. cheers j. 
From hlapp at gmx.net Fri Aug 1 17:18:09 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 1 Aug 2008 13:18:09 -0400 Subject: [BioSQL-l] Fwd: Release 1.0.1 in the making In-Reply-To: <320fb6e00808010224o171eddf2g26e3c29c1a74e3ad@mail.gmail.com> References: <4E2B21EA-6604-4863-96B4-8737A33930E6@gmx.net> <320fb6e00808010223w1d6b698dm55dbfc6e98873040@mail.gmail.com> <320fb6e00808010224o171eddf2g26e3c29c1a74e3ad@mail.gmail.com> Message-ID: On Aug 1, 2008, at 5:24 AM, Peter wrote: > [...] > Is fixing Bug 2470 too ambitious for your planned release schedule? > http://bugzilla.open-bio.org/show_bug.cgi?id=2470 Actually it would have been had I known what I would be getting myself into (I need the release tomorrow night at the very latest for a course we are holding here ... :) After going through the easy steps I realized that there was a real reason for the NCBI taxon ID doubling as the primary key - it is what links the hierarchical structure of the taxonomy together, and also links the taxon names to taxon nodes. Of course this could be done as lookups, but looking up almost 500,000 nodes several times over might slow things down a bit. So long story short it should be fixed now. There may be some remnant bugs so any testing would be much appreciated. The changes are committed to svn, but may need a bit more time to percolate to the anonymous svn server.
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Fri Aug 1 17:35:14 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 1 Aug 2008 18:35:14 +0100 Subject: [BioSQL-l] *** SPAM *** Re: Fwd: Release 1.0.1 in the making In-Reply-To: References: <4E2B21EA-6604-4863-96B4-8737A33930E6@gmx.net> <320fb6e00808010223w1d6b698dm55dbfc6e98873040@mail.gmail.com> <320fb6e00808010224o171eddf2g26e3c29c1a74e3ad@mail.gmail.com> Message-ID: <320fb6e00808011035k46c110abg77a876e191ea4102@mail.gmail.com> >> [...] >> Is fixing Bug 2470 too ambitious for your planned release schedule? >> http://bugzilla.open-bio.org/show_bug.cgi?id=2470 > > Actually it would have been had I known what I would be getting myself into > (I need the release tomorrow night at the very latest for a course we are > holding here ... :) After going through the easy steps I realized that there > was a real reason for doubling use of the NCBI taxonID as primary key - it > is what links the hierarchical structure of the taxonomy together, and also > links the taxon names to taxon nodes. Of cource this could be done as > lookups, but with several times over looking up almost 500,000 nodes might > slow things down a bit. > > So long story short it should be fixed now. There may be some remnant bugs > so any testing would be much appreciated. The changes are committed to svn, > but may need a bit more time to percolate to the anonymous svn server. I won't be able to make time to try this until next week at the earliest (i.e. after your planned release), but when I get back to using Biopython with BioSQL again in earnest I will check this out. Thanks! 
Peter From hlapp at gmx.net Fri Aug 1 18:18:40 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 1 Aug 2008 14:18:40 -0400 Subject: [BioSQL-l] Fwd: Release 1.0.1 in the making In-Reply-To: <320fb6e00808011035k46c110abg77a876e191ea4102@mail.gmail.com> References: <4E2B21EA-6604-4863-96B4-8737A33930E6@gmx.net> <320fb6e00808010223w1d6b698dm55dbfc6e98873040@mail.gmail.com> <320fb6e00808010224o171eddf2g26e3c29c1a74e3ad@mail.gmail.com> <320fb6e00808011035k46c110abg77a876e191ea4102@mail.gmail.com> Message-ID: On Aug 1, 2008, at 1:35 PM, Peter wrote: >> So long story short it should be fixed now. There may be some >> remnant bugs >> so any testing would be much appreciated. The changes are committed >> to svn, >> but may need a bit more time to percolate to the anonymous svn >> server. > > I won't be able to make time to try this until next week at the > earliest (i.e. after your planned release), but when I get back to > using Biopython with BioSQL again in earnest I will check this out. By testing I meant primarily if people use other platforms that I do (PostgreSQL on MacOSX), such as MySQL or Oracle on Linux, and can give this a whirl as in, load the NCBI taxonomy into a scratch database (using the script), then load it again (simulating an update), and see whether there are any error or warning messages that'd be great. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Fri Aug 1 20:29:23 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 1 Aug 2008 21:29:23 +0100 Subject: [BioSQL-l] load_ncbi_taxonomy.pl Message-ID: <320fb6e00808011329t6005ee3bt7948718320aae3c7@mail.gmail.com> On Fri, Aug 1, 2008 at 7:18 PM, Hilmar Lapp wrote: > > On Aug 1, 2008, at 1:35 PM, Peter wrote: > >>> So long story short it [load_ncbi_taxonomy.pl] should be fixed now. 
There >>> may be some remnant bugs so any testing would be much appreciated. >>> The changes are committed to svn, but may need a bit more time to >>> percolate to the anonymous svn server. >> >> I won't be able to make time to try this until next week at the >> earliest (i.e. after your planned release), but when I get back to >> using Biopython with BioSQL again in earnest I will check this out. > > By testing I meant primarily if people use other platforms that I do > (PostgreSQL on MacOSX), such as MySQL or Oracle on Linux, and can give this > a whirl as in, load the NCBI taxonomy into a scratch database (using the > script), then load it again (simulating an update), and see whether there > are any error or warning messages that'd be great. OK, as a very cursory check I did a quick test on a Linux machine using MySQL. I just grabbed the latest script via the SVN webpage, then using an existing (partly populated) database: $ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql --dbuser root --download true Downloading NCBI taxon database to taxdata Unable to close datastream at ./load_ncbi_taxonomy.pl line 726 This may be a network issue... the taxdata/taxdump.tar.gz file had downloaded OK, so I manually unzipped it, and then: $ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql --dbuser root Loading NCBI taxon database in taxdata: ... retrieving all taxon nodes in the database ... reading in taxon nodes from nodes.dmp ... insert / update / delete taxon nodes ... updating new parent IDs ... (committing nodes) ... rebuilding nested set left/right values ... reading in taxon names from names.dmp ... deleting old taxon names ... inserting new taxon names ... cleaning up Done. 
So no further error messages - however, I have not actually checked to see what exactly this did to my database ;) Peter From biopython at maubp.freeserve.co.uk Fri Aug 1 20:58:14 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 1 Aug 2008 21:58:14 +0100 Subject: [BioSQL-l] load_ncbi_taxonomy.pl In-Reply-To: <320fb6e00808011329t6005ee3bt7948718320aae3c7@mail.gmail.com> References: <320fb6e00808011329t6005ee3bt7948718320aae3c7@mail.gmail.com> Message-ID: <320fb6e00808011358p9b3d5ek599623c770ca17a3@mail.gmail.com> >> By testing I meant primarily if people use other platforms that I do >> (PostgreSQL on MacOSX), such as MySQL or Oracle on Linux, and can give this >> a whirl as in, load the NCBI taxonomy into a scratch database (using the >> script), then load it again (simulating an update), and see whether there >> are any error or warning messages that'd be great. > > OK, as a very cursory check I did a quick test on a Linux machine > using MySQL. I just grabbed the latest script via the SVN webpage, > then using an existing (partly populated) database: > > $ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql > --dbuser root --download true > Downloading NCBI taxon database to taxdata > Unable to close datastream at ./load_ncbi_taxonomy.pl line 726 > > This may be a network issue... the taxdata/taxdump.tar.gz file had > downloaded OK, so I manually unzipped it, and then: > > $ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql > --dbuser root Loading NCBI taxon database in taxdata: > ... retrieving all taxon nodes in the database > ... reading in taxon nodes from nodes.dmp > ... insert / update / delete taxon nodes > ... updating new parent IDs > ... (committing nodes) > ... rebuilding nested set left/right values > ... reading in taxon names from names.dmp > ... deleting old taxon names > ... inserting new taxon names > ... cleaning up > Done. 
> > So no further error messages - however, I have not actually checked to > see what exactly this did to my database ;) I then simulated an update by deleting the downloaded taxdata, and rerunning the script: $ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql --dbuser root --download true Downloading NCBI taxon database to taxdata Unable to close datastream at ./load_ncbi_taxonomy.pl line 726 Loading NCBI taxon database in taxdata: ... retrieving all taxon nodes in the database ... reading in taxon nodes from nodes.dmp ... insert / update / delete taxon nodes ... updating new parent IDs ... (committing nodes) ... rebuilding nested set left/right values ... reading in taxon names from names.dmp ... deleting old taxon names ... inserting new taxon names ... cleaning up Done. [Note that after the "unable to close" message I just left the script running this time, and it continued fine] Again, I haven't checked the database. Peter From hlapp at gmx.net Fri Aug 1 21:04:37 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 1 Aug 2008 17:04:37 -0400 Subject: [BioSQL-l] load_ncbi_taxonomy.pl In-Reply-To: <320fb6e00808011358p9b3d5ek599623c770ca17a3@mail.gmail.com> References: <320fb6e00808011329t6005ee3bt7948718320aae3c7@mail.gmail.com> <320fb6e00808011358p9b3d5ek599623c770ca17a3@mail.gmail.com> Message-ID: <149B3375-0305-4420-AF70-BB6961050376@gmx.net> Sounds like I at least managed to silence all the complaining of the script ;-) How long did it run? Was it similar to what you've seen earlier or outrageously longer? -hilmar On Aug 1, 2008, at 4:58 PM, Peter wrote: >>> By testing I meant primarily if people use other platforms that I do >>> (PostgreSQL on MacOSX), such as MySQL or Oracle on Linux, and can >>> give this >>> a whirl as in, load the NCBI taxonomy into a scratch database >>> (using the >>> script), then load it again (simulating an update), and see >>> whether there >>> are any error or warning messages that'd be great. 
>> >> OK, as a very cursory check I did a quick test on a Linux machine >> using MySQL. I just grabbed the latest script via the SVN webpage, >> then using an existing (partly populated) database: >> >> $ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql >> --dbuser root --download true >> Downloading NCBI taxon database to taxdata >> Unable to close datastream at ./load_ncbi_taxonomy.pl line 726 >> >> This may be a network issue... the taxdata/taxdump.tar.gz file had >> downloaded OK, so I manually unzipped it, and then: >> >> $ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql >> --dbuser root Loading NCBI taxon database in taxdata: >> ... retrieving all taxon nodes in the database >> ... reading in taxon nodes from nodes.dmp >> ... insert / update / delete taxon nodes >> ... updating new parent IDs >> ... (committing nodes) >> ... rebuilding nested set left/right values >> ... reading in taxon names from names.dmp >> ... deleting old taxon names >> ... inserting new taxon names >> ... cleaning up >> Done. >> >> So no further error messages - however, I have not actually checked >> to >> see what exactly this did to my database ;) > > I then simulated an update by deleting the downloaded taxdata, and > rerunning the script: > > $ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql > --dbuser root --download true > Downloading NCBI taxon database to taxdata > Unable to close datastream at ./load_ncbi_taxonomy.pl line 726 > Loading NCBI taxon database in taxdata: > ... retrieving all taxon nodes in the database > ... reading in taxon nodes from nodes.dmp > ... insert / update / delete taxon nodes > ... updating new parent IDs > ... (committing nodes) > ... rebuilding nested set left/right values > ... reading in taxon names from names.dmp > ... deleting old taxon names > ... inserting new taxon names > ... cleaning up > Done. 
> > [Note that after the "unable to close" message I just left the script > running this time, and it continued fine] > > Again, I haven't checked the database. > > Peter -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Fri Aug 1 23:24:49 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 2 Aug 2008 00:24:49 +0100 Subject: [BioSQL-l] *** SPAM *** Re: load_ncbi_taxonomy.pl In-Reply-To: <149B3375-0305-4420-AF70-BB6961050376@gmx.net> References: <320fb6e00808011329t6005ee3bt7948718320aae3c7@mail.gmail.com> <320fb6e00808011358p9b3d5ek599623c770ca17a3@mail.gmail.com> <149B3375-0305-4420-AF70-BB6961050376@gmx.net> Message-ID: <320fb6e00808011624u70af4135l2a20c09ca96f89fd@mail.gmail.com> On Fri, Aug 1, 2008 at 10:04 PM, Hilmar Lapp wrote: > Sounds like I at least managed to silence all the complaining of the script > ;-) How long did it run? Was it similar to what you've seen earlier or > outrageously longer? > I just ran it again (so updating an already complete database): $ time perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql --dbuser root --download true Downloading NCBI taxon database to taxdata Unable to close datastream at ./load_ncbi_taxonomy.pl line 726 Loading NCBI taxon database in taxdata: ... retrieving all taxon nodes in the database ... reading in taxon nodes from nodes.dmp ... insert / update / delete taxon nodes ... updating new parent IDs ... (committing nodes) ... rebuilding nested set left/right values ... reading in taxon names from names.dmp ... deleting old taxon names ... inserting new taxon names ... cleaning up Done. real 18m29.409s user 2m28.149s sys 0m18.025s Some of that is of course the download time, so without that: $ time perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql --dbuser root Loading NCBI taxon database in taxdata: ... 
retrieving all taxon nodes in the database ... reading in taxon nodes from nodes.dmp ... insert / update / delete taxon nodes ... updating new parent IDs ... (committing nodes) ... rebuilding nested set left/right values ... reading in taxon names from names.dmp ... deleting old taxon names ... inserting new taxon names ... cleaning up Done. real 13m18.777s user 2m17.285s sys 0m14.821s This is slow, with plenty of disk activity during the taxon names bit. However, I haven't got the equivalent numbers from the previous script to hand (and it's after midnight here so I won't re-run it now). I'd have guessed it used to be about 10 minutes on this machine though, i.e. it is probably taking longer, but it was already longer than I liked. I don't know if that helped, but as I said, I hope to do a more thorough job later on. Peter From hlapp at gmx.net Fri Aug 1 23:54:32 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 1 Aug 2008 19:54:32 -0400 Subject: [BioSQL-l] BioSQL uses In-Reply-To: References: <5C57BAC6-974F-4E75-93E6-36BE2A58E980@gmx.net> Message-ID: <9977D67D-5369-4CE4-9125-70ABD25065AE@gmx.net> Just FYI, I finally got around to creating a page on the wiki: http://www.biosql.org/wiki/Uses There's very little there right now, but people should feel free to add themselves to the list where they see fit. -hilmar On Feb 27, 2008, at 11:14 AM, Cook, Malcolm wrote: > this would made a great topic for a page at http://www.biosql.org/wiki/Main_Page > > > > > Malcolm Cook > Database Applications Manager - Bioinformatics > Stowers Institute for Medical Research - Kansas City, Missouri > > >> -----Original Message----- >> From: biosql-l-bounces at lists.open-bio.org >> [mailto:biosql-l-bounces at lists.open-bio.org] On Behalf Of >> Wiepert, Mathieu >> Sent: Wednesday, February 27, 2008 5:19 AM >> To: BioSQL >> Subject: [BioSQL-l] BioSQL uses >> >> Hi, >> >> It's great to this coming to release 1.0, thanks very much >> for this work.
I was wondering if I may ask how different >> users take advantage of BioSQL in daily work. We have a >> number of pressing issues, many which need a database of >> sequence for which we can overlay SNP, gene exp., Array CGH, >> etc type data. This seems like it would be a great start >> upon which we can add additional location specific >> information or any other feature. >> >> What do others use it for, and how does BioSQL work for you? >> >> -mat >> >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l >> > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Aug 2 00:15:58 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 1 Aug 2008 20:15:58 -0400 Subject: [BioSQL-l] load_ncbi_taxonomy.pl In-Reply-To: <320fb6e00808011624u70af4135l2a20c09ca96f89fd@mail.gmail.com> References: <320fb6e00808011329t6005ee3bt7948718320aae3c7@mail.gmail.com> <320fb6e00808011358p9b3d5ek599623c770ca17a3@mail.gmail.com> <149B3375-0305-4420-AF70-BB6961050376@gmx.net> <320fb6e00808011624u70af4135l2a20c09ca96f89fd@mail.gmail.com> Message-ID: <27FCD5FB-34CB-4016-927C-23A2E821B159@gmx.net> These sound like reasonable times, depending on your machine configuration. I suspect that PostgreSQL might even be a bit faster, as that's a similar time to what I'm observing on my laptop. BTW if you provide --verbose=2 on the command line you'll get rows/ time statistics. 
The slowest steps (recomputing nested set values, and inserting taxon names) average between 900-1800 rows/s on my laptop, depending on what else is going on (I suspect the spotlight indexer to contend for the disk drive on occasion). The faster steps (e.g. inserting taxon nodes) I observe at up to 2500-4000 rows/s. Thanks for all the testing, it's much appreciated! -hilmar On Aug 1, 2008, at 7:24 PM, Peter wrote: > On Fri, Aug 1, 2008 at 10:04 PM, Hilmar Lapp wrote: >> Sounds like I at least managed to silence all the complaining of >> the script >> ;-) How long did it run? Was it similar to what you've seen earlier >> or >> outrageously longer? >> > > I just ran it again (so updating an already complete database): > > $ time perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql > --dbuser root --download true > Downloading NCBI taxon database to taxdata > Unable to close datastream at ./load_ncbi_taxonomy.pl line 726 > Loading NCBI taxon database in taxdata: > ... retrieving all taxon nodes in the database > ... reading in taxon nodes from nodes.dmp > ... insert / update / delete taxon nodes > ... updating new parent IDs > ... (committing nodes) > ... rebuilding nested set left/right values > ... reading in taxon names from names.dmp > ... deleting old taxon names > ... inserting new taxon names > ... cleaning up > Done. > > real 18m29.409s > user 2m28.149s > sys 0m18.025s > > Some of that is of course the download time, so without that: > > $ time perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql > --dbuser root Loading NCBI taxon database in taxdata: > ... retrieving all taxon nodes in the database > ... reading in taxon nodes from nodes.dmp > ... insert / update / delete taxon nodes > ... updating new parent IDs > ... (committing nodes) > ... rebuilding nested set left/right values > ... reading in taxon names from names.dmp > ... deleting old taxon names > ... inserting new taxon names > ... cleaning up > Done. 
> > real 13m18.777s > user 2m17.285s > sys 0m14.821s > > This is slow, with plenty of disk activity during the taxon names bit. > However, I haven't got the equivalent numbers from the previous > script to hand (and its after midnight here so I won't re-run it now). > I'd have guessed it used to be about 10 minutes on this machine > though, i.e. it is probably taking longer, but it was already longer > than I liked. > > I don't know if that helped, but as I said, I hope to do a more > thorough job later on. > > Peter -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Sat Aug 2 12:30:46 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 2 Aug 2008 13:30:46 +0100 Subject: [BioSQL-l] load_ncbi_taxonomy.pl In-Reply-To: <27FCD5FB-34CB-4016-927C-23A2E821B159@gmx.net> References: <320fb6e00808011329t6005ee3bt7948718320aae3c7@mail.gmail.com> <320fb6e00808011358p9b3d5ek599623c770ca17a3@mail.gmail.com> <149B3375-0305-4420-AF70-BB6961050376@gmx.net> <320fb6e00808011624u70af4135l2a20c09ca96f89fd@mail.gmail.com> <27FCD5FB-34CB-4016-927C-23A2E821B159@gmx.net> Message-ID: <320fb6e00808020530n23d5edd8pf0a3b460441a9bfd@mail.gmail.com> On Sat, Aug 2, 2008 at 1:15 AM, Hilmar Lapp wrote: > These sound like reasonable times, depending on your machine configuration. > I suspect that PostgreSQL might even be a bit faster, as that's a similar > time to what I'm observing on my laptop. > > BTW if you provide --verbose=2 on the command line you'll get rows/time > statistics. The slowest steps (recomputing nested set values, and inserting > taxon names) average between 900-1800 rows/s on my laptop, depending on what > else is going on (I suspect the spotlight indexer to contend for the disk > drive on occasion). The faster steps (e.g. inserting taxon nodes) I observe > at up to 2500-4000 rows/s. 
I'm seeing about 900 rows/s on the recomputing of the nested set values, which means my 2 year old desktop is slower than your laptop. This is an AMD Athlon 64 X2 4600+ Socket 939 dual core machine, with a Seagate Barracuda hard drive (7200rpm, 200GB, 8MB Cache, IDE Ultra ATA100), running Ubuntu Dapper Drake (due for an upgrade soon!). $ time perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql --dbuser root --verbose=2 Loading NCBI taxon database in taxdata: ... retrieving all taxon nodes in the database ... reading in taxon nodes from nodes.dmp ... insert / update / delete taxon nodes 20000/448630 done (in 0 secs, 20000.0 rows/s) 40000/448630 done (in 1 secs, 20000.0 rows/s) 60000/448630 done (in 0 secs, 20000.0 rows/s) 80000/448630 done (in 0 secs, 20000.0 rows/s) 100000/448630 done (in 0 secs, 20000.0 rows/s) 120000/448630 done (in 0 secs, 20000.0 rows/s) 140000/448630 done (in 1 secs, 20000.0 rows/s) 160000/448630 done (in 0 secs, 20000.0 rows/s) 180000/448630 done (in 0 secs, 20000.0 rows/s) 200000/448630 done (in 0 secs, 20000.0 rows/s) 220000/448630 done (in 0 secs, 20000.0 rows/s) 240000/448630 done (in 1 secs, 20000.0 rows/s) 260000/448630 done (in 0 secs, 20000.0 rows/s) 280000/448630 done (in 0 secs, 20000.0 rows/s) 300000/448630 done (in 0 secs, 20000.0 rows/s) 320000/448630 done (in 0 secs, 20000.0 rows/s) 340000/448630 done (in 1 secs, 20000.0 rows/s) 360000/448630 done (in 0 secs, 20000.0 rows/s) 380000/448630 done (in 0 secs, 20000.0 rows/s) 400000/448630 done (in 0 secs, 20000.0 rows/s) 420000/448630 done (in 0 secs, 20000.0 rows/s) 440000/448630 done (in 1 secs, 20000.0 rows/s) ... updating new parent IDs ... (committing nodes) ... 
rebuilding nested set left/right values 20000 done (in 22 secs, 909.1 rows/s) 40000 done (in 22 secs, 909.1 rows/s) 60000 done (in 23 secs, 869.6 rows/s) 80000 done (in 22 secs, 909.1 rows/s) 100000 done (in 22 secs, 909.1 rows/s) 120000 done (in 22 secs, 909.1 rows/s) 140000 done (in 22 secs, 909.1 rows/s) 160000 done (in 22 secs, 909.1 rows/s) 180000 done (in 22 secs, 909.1 rows/s) 200000 done (in 21 secs, 952.4 rows/s) 220000 done (in 21 secs, 952.4 rows/s) 240000 done (in 22 secs, 909.1 rows/s) 260000 done (in 22 secs, 909.1 rows/s) 280000 done (in 21 secs, 952.4 rows/s) 300000 done (in 22 secs, 909.1 rows/s) 320000 done (in 21 secs, 952.4 rows/s) 340000 done (in 22 secs, 909.1 rows/s) 360001 done (in 22 secs, 909.1 rows/s) 380001 done (in 22 secs, 909.1 rows/s) 400001 done (in 21 secs, 952.4 rows/s) 420001 done (in 22 secs, 909.1 rows/s) 440001 done (in 21 secs, 952.4 rows/s) ... reading in taxon names from names.dmp ... deleting old taxon names ... inserting new taxon names 20000 done (in 3 secs, 6666.7 rows/s) 40000 done (in 2 secs, 10000.0 rows/s) 60000 done (in 4 secs, 5000.0 rows/s) 80000 done (in 3 secs, 6666.7 rows/s) 100000 done (in 5 secs, 4000.0 rows/s) 120000 done (in 6 secs, 3333.3 rows/s) 140000 done (in 7 secs, 2857.1 rows/s) 160000 done (in 7 secs, 2857.1 rows/s) 180000 done (in 8 secs, 2500.0 rows/s) 200000 done (in 8 secs, 2500.0 rows/s) 220000 done (in 8 secs, 2500.0 rows/s) 240000 done (in 9 secs, 2222.2 rows/s) 260000 done (in 9 secs, 2222.2 rows/s) 280000 done (in 10 secs, 2000.0 rows/s) 300000 done (in 10 secs, 2000.0 rows/s) 320000 done (in 10 secs, 2000.0 rows/s) 340000 done (in 10 secs, 2000.0 rows/s) 360000 done (in 10 secs, 2000.0 rows/s) 380000 done (in 10 secs, 2000.0 rows/s) 400000 done (in 11 secs, 1818.2 rows/s) 420000 done (in 11 secs, 1818.2 rows/s) 440000 done (in 11 secs, 1818.2 rows/s) 460000 done (in 10 secs, 2000.0 rows/s) 480000 done (in 10 secs, 2000.0 rows/s) 500000 done (in 11 secs, 1818.2 rows/s) 520000 done (in 11 
secs, 1818.2 rows/s) 540000 done (in 12 secs, 1666.7 rows/s) 560000 done (in 10 secs, 2000.0 rows/s) 580000 done (in 12 secs, 1666.7 rows/s) 600000 done (in 12 secs, 1666.7 rows/s) 620000 done (in 11 secs, 1818.2 rows/s) ... cleaning up Done. real 13m13.805s user 2m3.548s sys 0m13.781s > > Thanks for all the testing, it's much appreciated! > This is only very cursory, confirming the script runs without showing any error messages, but it's better than no testing ;) Peter From hlapp at gmx.net Sat Aug 2 13:41:13 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 2 Aug 2008 09:41:13 -0400 Subject: [BioSQL-l] Announcement: BioSQL v1.0.1 Release Message-ID: BioSQL v1.0.1 Release ===================== I am pleased to announce the release of version 1.0.1 of BioSQL, the second release in the Tokyo release series. The release can be downloaded from the following locations: http://biosql.org/DIST/biosql-1.0.1.tar.gz http://biosql.org/DIST/biosql-1.0.1.tar.bz2 http://biosql.org/DIST/biosql-1.0.1.zip (has Windows-style EOL) The core BioSQL schema is a generic, extensible relational model for sequences, sequence features, their annotation, and ontology terms. It is also designed as the interoperable persistence interface between the Bio* projects. This release contains - the core BioSQL schema as DDL (Data Definition Language) for the following RDBMSs: MySQL, PostgreSQL, Oracle, HSQLDB, and Apache Derby, - migration scripts from v1.0.0 for PostgreSQL, MySQL, and Oracle, - ancillary (but optional) schema files for PostgreSQL, among which are scripts providing experimental support for the Bioperl and possibly other language bindings to BioSQL - documentation and an ERD (Entity-Relationship Diagram), and - a Perl script that can pre-load (and update) a BioSQL instance with the NCBI taxonomy. This version of the schema should be fully backwards compatible with the v1.0.0 schema for nearly all software and queries.
The only change is relaxing the column width constraint (previously 40 chars, now 128) of bioentry.accession and dbxref.accession. Migration scripts are included for PostgreSQL, MySQL, and Oracle for those who want to simply upgrade their existing database. In addition, the script load_ncbi_taxonomy.pl has been fixed to no longer require the taxon primary key and the NCBI taxon ID to be identical. If you previously relied on this (documented but not guaranteed) behavior, you will need to adjust your respective software. To my knowledge, none of the Bio* language bindings should be affected by this change. The complete change log is listed in the file Changes, and installation instructions for MySQL and PostgreSQL are in the file INSTALL. Additional information regarding BioSQL, including links to language bindings, a roadmap to future releases and enhancements, and possible local optimizations is available from the BioSQL website at http://biosql.org. On behalf of the BioSQL developers, Hilmar Lapp Acknowledgments --------------- BioSQL in general and in particular this point release owes to the community of users and developers who provide feedback, advice, and ideas, and report issues on the BioSQL mailing list (biosql-l{at}lists.open-bio.org). Credit also goes to those who have helped testing, in particular Peter Cock. This project would not exist without their contributions and the support of other developers and users from the Bio* community. The 1.0.x release series is code-named Tokyo in recognition of the role the BioHackathon 2008 played in getting the first of the series (v1.0.0) out the door, and in keeping with an informal tradition held up since the first BioHackathon. Thank you to everyone! License ------- BioSQL is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. Enjoy! 
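[Editorial illustration, not part of the original announcement.] The accession width change described above (previously 40 chars, now 128) can be illustrated with a small Python sketch that sorts accessions by which column width they need. The function name and the sample accessions are hypothetical and are not part of the BioSQL distribution; the actual schema change is done by the migration scripts shipped with the release.

```python
OLD_WIDTH = 40   # accession column width in BioSQL v1.0.0 (per the announcement)
NEW_WIDTH = 128  # accession column width in BioSQL v1.0.1 (per the announcement)


def classify_accessions(accessions):
    """Group accessions by the column width they would need.

    Hypothetical helper for illustration only.
    """
    groups = {"fits_old": [], "needs_new": [], "too_long": []}
    for acc in accessions:
        if len(acc) <= OLD_WIDTH:
            groups["fits_old"].append(acc)
        elif len(acc) <= NEW_WIDTH:
            groups["needs_new"].append(acc)
        else:
            groups["too_long"].append(acc)
    return groups


if __name__ == "__main__":
    # Made-up accessions of different lengths.
    sample = ["NC_000913", "X" * 64, "Y" * 200]
    for name, members in classify_accessions(sample).items():
        print(name, len(members))
```

Running something like this over existing identifiers before loading data would show whether any of them only fit after the v1.0.1 migration.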
From hlapp at gmx.net Sat Aug 2 14:07:17 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 2 Aug 2008 10:07:17 -0400 Subject: [BioSQL-l] Announcement: BioSQL v1.0.1 Release In-Reply-To: References: Message-ID: <07355FA1-10F8-407F-B171-A45B852C6398@gmx.net> On Aug 2, 2008, at 9:41 AM, Hilmar Lapp wrote: > - ancillary (but optional) schema files for PostgreSQL, among which > are scripts providing experimental support for the Bioperl and > possibly other language bindings to BioSQL Of course that's not true. I've fixed this and re-uploaded: - ancillary (but optional) schema files for PostgreSQL, - scripts providing experimental support for the Bioperl and possibly other language bindings to BioSQL with PostgreSQL v8.3+ (v8.2 and earlier are supported fine), Sorry about the goof. I guess to limit confusion for the Google searchers I need to repost the announcement, so delete the previous one from your records ... -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Aug 2 14:08:17 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 2 Aug 2008 10:08:17 -0400 Subject: [BioSQL-l] Announcement: BioSQL v1.0.1 Release (corrected) Message-ID: <51938949-181A-4750-9CEB-26FBE7C9A24E@gmx.net> (the previous announcement contained a small error) BioSQL v1.0.1 Release ===================== I am pleased to announce the release of version 1.0.1 of BioSQL, the second release in the Tokyo release series. The release can be downloaded from the following locations: http://biosql.org/DIST/biosql-1.0.1.tar.gz http://biosql.org/DIST/biosql-1.0.1.tar.bz2 http://biosql.org/DIST/biosql-1.0.1.zip (has Windows-style EOL) The core BioSQL schema is a generic, extensible relational model for sequences, sequence features, their annotation, and ontology terms. 
It is also designed as the interoperable persistence interface between the Bio* projects. This release contains - the core BioSQL schema as DDL (Data Definition Language) for the following RDBMSs: MySQL, PostgreSQL, Oracle, HSQLDB, and Apache Derby, - migration scripts from v1.0.0 for PostgreSQL, MySQL, and Oracle, - ancillary (but optional) schema files for PostgreSQL, - scripts providing experimental support for the Bioperl and possibly other language bindings to BioSQL with PostgreSQL v8.3+ (v8.2 and earlier are supported fine), - documentation and an ERD (Entity-Relationship Diagram), and - a Perl script that can pre-load (and update) a BioSQL instance with the NCBI taxonomy. This version of the schema should be fully backwards compatible with the v1.0.0 schema for nearly all software and queries. The only change is relaxing the column width constraint (previously 40 chars, now 128) of bioentry.accession and dbxref.accession. Migration scripts are included for PostgreSQL, MySQL, and Oracle for those who want to simply upgrade their existing database. In addition, the script load_ncbi_taxonomy.pl has been fixed to no longer require the taxon primary key and the NCBI taxon ID to be identical. If you previously relied on this (documented but not guaranteed) behavior, you will need to adjust your respective software. To my knowledge, none of the Bio* language bindings should be affected by this change. The complete change log is listed in the file Changes, and installation instructions for MySQL and PostgreSQL are in the file INSTALL. Additional information regarding BioSQL, including links to language bindings, a roadmap to future releases and enhancements, and possible local optimizations is available from the BioSQL website at http://biosql.org. 
On behalf of the BioSQL developers, Hilmar Lapp Acknowledgments --------------- BioSQL in general and in particular this point release owes to the community of users and developers who provide feedback, advice, and ideas, and report issues on the BioSQL mailing list (biosql-l{at}lists.open-bio.org). Credit also goes to those who have helped testing, in particular Peter Cock. This project would not exist without their contributions and the support of other developers and users from the Bio* community. The 1.0.x release series is code-named Tokyo in recognition of the role the BioHackathon 2008 played in getting the first of the series (v1.0.0) out the door, and in keeping with an informal tradition held up since the first BioHackathon. Thank you to everyone! License ------- BioSQL is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. From biopython at maubp.freeserve.co.uk Wed Aug 13 11:44:21 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 13 Aug 2008 12:44:21 +0100 Subject: [BioSQL-l] Announcement: BioSQL v1.0.1 Release (corrected) In-Reply-To: <51938949-181A-4750-9CEB-26FBE7C9A24E@gmx.net> References: <51938949-181A-4750-9CEB-26FBE7C9A24E@gmx.net> Message-ID: <320fb6e00808130444g4c8ef5aehc87a6cc01f74092@mail.gmail.com> On Sat, Aug 2, 2008 at 3:08 PM, Hilmar Lapp wrote: > (the previous announcement contained a small error) > > BioSQL v1.0.1 Release > ===================== > > I am pleased to announce the release of version 1.0.1 of BioSQL, the > second release in the Tokyo release series. The release can be > downloaded from the following locations: > > http://biosql.org/DIST/biosql-1.0.1.tar.gz > http://biosql.org/DIST/biosql-1.0.1.tar.bz2 > http://biosql.org/DIST/biosql-1.0.1.zip (has Windows-style EOL) > > ... 
Hilmar, I've put a belated announcement of the BioSQL 1.0.1 release up on the OBF news server, http://news.open-bio.org/news/ http://news.open-bio.org/news/ Did you get Jason's emails about the new news server? If you register an account he can give you admin rights. Peter (Biopython) From hlapp at gmx.net Wed Aug 13 14:08:51 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 13 Aug 2008 10:08:51 -0400 Subject: [BioSQL-l] Announcement: BioSQL v1.0.1 Release (corrected) In-Reply-To: <320fb6e00808130444g4c8ef5aehc87a6cc01f74092@mail.gmail.com> References: <51938949-181A-4750-9CEB-26FBE7C9A24E@gmx.net> <320fb6e00808130444g4c8ef5aehc87a6cc01f74092@mail.gmail.com> Message-ID: <4C0E6537-C645-467B-AE26-4A22688CE8CA@gmx.net> Thanks Peter, that's much appreciated! It was actually on my todo list. -hilmar On Aug 13, 2008, at 7:44 AM, Peter wrote: > On Sat, Aug 2, 2008 at 3:08 PM, Hilmar Lapp wrote: >> (the previous announcement contained a small error) >> >> BioSQL v1.0.1 Release >> ===================== >> >> I am pleased to announce the release of version 1.0.1 of BioSQL, the >> second release in the Tokyo release series. The release can be >> downloaded from the following locations: >> >> http://biosql.org/DIST/biosql-1.0.1.tar.gz >> http://biosql.org/DIST/biosql-1.0.1.tar.bz2 >> http://biosql.org/DIST/biosql-1.0.1.zip (has Windows-style EOL) >> >> ... > > Hilmar, > > I've put a belated announcement of the BioSQL 1.0.1 release up on the > OBF news server, http://news.open-bio.org/news/ > http://news.open-bio.org/news/ > > Did you get Jason's emails about the new news server? If you register > an account he can give you admin rights. 
> > Peter > (Biopython) -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From mrphysh at juno.com Wed Aug 13 20:13:56 2008 From: mrphysh at juno.com (mrphysh at juno.com) Date: Wed, 13 Aug 2008 20:13:56 GMT Subject: [BioSQL-l] installation.....IO/String problem Message-ID: <20080813.141356.8454.0@webmail16.vgs.untd.com> I am having trouble with database retrieval from online databases. This is an install problem.(?)..I am running Linux (Ubuntu)......... I did these, following the documentation. from cpan> install Bundle::CPAN install Module::Build #one of the many help files said to do this install Bundle;;BioPerl force install B/BI/BIRNEY/bioperl-1.4.tar.gz The ftp found the file and went to work. After many minutes, at the end, this what I saw: t/Variation _IO.............................FAILED tests 15,20,25 Failed 3/25 88% okay t/WABA...............................ok t/XEMBL_DB...........................ok t/XEMBL_DB...........................SOAP::lite and/or XML::DOM not installed. this means that Bio::DB::XEMBL module is not usable. 
Skipping test t/XEMBL_DB...........................ok failed test stat wstat total fail failed list of failed t/BioFetch_DB.t 27 4 14% 8 20 21 27 t/DB.t 78 2 2.5% 30 31 t/EMBL_DB.t 15 3 20$ 6 13 14 t/Ontology.t 9 2304 50 100 200% 1-50 t/TreeIO.t 41 1 2.4% 42 t/Variation_IO.t 25 3 12% 15 20 25 t/simpleGPparser.t 9 2304 98 196 200% 1-98 18 SUBTESTS SKIPPED fAILED 7/179 TEST SCRIPTS 96.09% 159/8268 SUBTEST FAILED 98% OKAY MAKE: ****[TEST DYNAMIC] ERROR 225 /USR/BIN/MAKE_TEST -- not ok Running make install Warning: you do not have permission to install into /usr/local/lib/perl/5.8.8 at /usr/share/perl/5.8/ExUtils /install.pm line 114 can't open file /usr/local/lib/perl/5.8.8/auto/Bio/.packlist: permission denied at /usr/share/perl/5.8/ExtUtils?Install.pm line 209 writing /usr/local/lib/perl/5.8.8/auto/Bio/.packlist make: *** [pure_site_install error13 /usr/bin/make_install --- NOT OKAY you may have to u to root to install the package cpan> #this is all my typing I have this little script (from a tutorial) and others that are similar use Bio::Perl; # this script will only work with an internet connection # on the computer it is run on $seq_object = get_sequence('swissprot',"ROA1_HUMAN"); write_sequence(">roa1.fasta",'fasta',$seq_object); I quit CPAN and type ~~ perl ee_use_bioperl.pl ######I get you system does not have of LWP, HTTP::Request::Common, IO::String installed so the DB retrieval method is not available. Full Error message is: at /usr/local/hsare/ perl/5.8.8/bio/perl.pm line 464 Bio::perl::Get_sequence('swissprot','ROA!_HUMAN') called at ee_use_bioperl.pl line 4 john at john-desktop:~/bbs$ ############# I feel that I am making progress but need assistance on this roadblock. My ideas and questions. Is this a perl issue. I am using the perl 5.8.8 that came with the Ubuntu I am much aware of the permissions aspect of Linux. The documentation says little about this. Is this where I am hanging up? 
(As you all know, Ubuntu has no logon as root but uses a sudo permissions system) I have reloaded the bioperl many many times. I do not want to sound 'windowie' but should I uninstall, then install? The errors always point to IO::String. I can find String.pm files in the /usr/share/perl5/debconf/Element but nowhere else. I cannot find a /IO/ (an IO folder) anywhere. please and thanks John Brigham From biopython at maubp.freeserve.co.uk Wed Aug 13 20:54:50 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 13 Aug 2008 21:54:50 +0100 Subject: [BioSQL-l] installation.....IO/String problem In-Reply-To: <20080813.141356.8454.0@webmail16.vgs.untd.com> References: <20080813.141356.8454.0@webmail16.vgs.untd.com> Message-ID: <320fb6e00808131354q511333fdp107a2cacbe481fae@mail.gmail.com> On Wed, Aug 13, 2008 at 9:13 PM, mrphysh at juno.com wrote: > > I am having trouble with database retrieval from online databases. > This is an install problem.(?)..I am running Linux (Ubuntu)......... > I did these, following the documentation. from cpan> > install Bundle::CPAN > > install Module::Build #one of the many help files said to do this > install Bundle;;BioPerl > force install B/BI/BIRNEY/bioperl-1.4.tar.gz >... > can't open file /usr/local/lib/perl/5.8.8/auto/Bio/.packlist: permission denied at /usr/share/perl/5.8/ExtUtils?Install.pm line 209 > writing /usr/local/lib/perl/5.8.8/auto/Bio/.packlist > make: *** [pure_site_install error13 > /usr/bin/make_install --- NOT OKAY > you may have to u to root to install the package This is the BioSQL mailing list, not the BioPerl mailing list, so you are asking the wrong people. However, this looks like a simple permissions problem - have you tried to install this as the root user (e.g.
use "sudo cpan" to start cpan) or have you configured it to install under your home directory where you should have write permissions? Peter From cjfields at illinois.edu Wed Aug 13 21:26:20 2008 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 13 Aug 2008 16:26:20 -0500 Subject: [BioSQL-l] installation.....IO/String problem In-Reply-To: <320fb6e00808131354q511333fdp107a2cacbe481fae@mail.gmail.com> References: <20080813.141356.8454.0@webmail16.vgs.untd.com> <320fb6e00808131354q511333fdp107a2cacbe481fae@mail.gmail.com> Message-ID: <6011F562-F73A-44E8-8F40-4E4A6C4A83A7@illinois.edu> On Aug 13, 2008, at 3:54 PM, Peter wrote: > On Wed, Aug 13, 2008 at 9:13 PM, mrphysh at juno.com > wrote: >> >> I am having trouble with database retrieval from online databases. >> This is an install problem.(?)..I am running Linux (Ubuntu)......... >> I did these, following the documentation. from cpan> >> install Bundle::CPAN >> >> install Module::Build #one of the many help files said to do this >> install Bundle;;BioPerl >> force install B/BI/BIRNEY/bioperl-1.4.tar.gz >> ... >> can't open file /usr/local/lib/perl/5.8.8/auto/Bio/.packlist: >> permission denied at /usr/share/perl/5.8/ExtUtils?Install.pm line 209 >> writing /usr/local/lib/perl/5.8.8/auto/Bio/.packlist >> make: *** [pure_site_install error13 >> /usr/bin/make_install --- NOT OKAY >> you may have to u to root to install the package > > This is the BioSQL mailing list, not the BioPerl mailing list, so you > are asking the wrong people. > > However, this looks like a simple permissions problem - have you > tried to install this as the root user (e.g. use "sudo cpan" to start > cpan) or have you configured it to install under your home directory > where you should have write permissions? > > Peter I think this is also a bioperl versioning issue. Module::Build is (oddly) calling for the old BioPerl version (1.4) which is way out-of-date.
You should try installing bioperl 1.5.2 or bioperl-live for this; see here: http://www.bioperl.org/wiki/Installing_BioPerl http://www.bioperl.org/wiki/Core_package chris From biopython at maubp.freeserve.co.uk Mon Aug 18 15:15:37 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 18 Aug 2008 16:15:37 +0100 Subject: [BioSQL-l] Checking bioperl-db version number Message-ID: <320fb6e00808180815x3e123c51t73ff4febe5e3c239@mail.gmail.com> Hi all, This might be better asked on the main BioPerl mailing list, however, I would like to know how to get the version of bioperl-db (i.e. the part of BioPerl used to import sequence files into BioSQL). Thanks, Peter -- P.S. I've found two equivalent ways to check the version of BioPerl itself: require Bio::Perl; print "Bio::Perl::VERSION = "; print $Bio::Perl::VERSION, "\n"; require Bio::Root::Version; print "Bio::Root::Version::VERSION = "; print $Bio::Root::Version::VERSION, "\n"; Example output: Bio::Perl::VERSION = 1.005002102 Bio::Root::Version::VERSION = 1.005002102 From hlapp at gmx.net Mon Aug 18 15:44:08 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 18 Aug 2008 11:44:08 -0400 Subject: [BioSQL-l] Checking bioperl-db version number In-Reply-To: <320fb6e00808180815x3e123c51t73ff4febe5e3c239@mail.gmail.com> References: <320fb6e00808180815x3e123c51t73ff4febe5e3c239@mail.gmail.com> Message-ID: Peter - can you repost this on the Bioperl list? There are several people who should know this better than I do. -hilmar On Aug 18, 2008, at 11:15 AM, Peter wrote: > Hi all, > > This might be better asked on the main BioPerl mailing list, however, > I would like to know how to get the version of bioperl-db (i.e. the > part of BioPerl used to import sequence files into BioSQL). > > Thanks, > > Peter > > -- > > P.S.
I've found two equivalent ways to check the version of BioPerl > itself: > > require Bio::Perl; > print "Bio::Perl::VERSION = "; > print $Bio::Perl::VERSION, "\n"; > > require Bio::Root::Version; > print "Bio::Root::Version::VERSION = "; > print $Bio::Root::Version::VERSION, "\n"; > > Example output: > Bio::Perl::VERSION = 1.005002102 > Bio::Root::Version::VERSION = 1.005002102 > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Mon Aug 18 15:50:26 2008 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Aug 2008 10:50:26 -0500 Subject: [BioSQL-l] Checking bioperl-db version number In-Reply-To: <320fb6e00808180815x3e123c51t73ff4febe5e3c239@mail.gmail.com> References: <320fb6e00808180815x3e123c51t73ff4febe5e3c239@mail.gmail.com> Message-ID: <41928BBC-2665-4D04-A7CA-FE07A8FD63EE@illinois.edu> I don't think bioperl-db has a specific version separate from BioPerl, at least not anymore. As you found you'll get 1.005002102 (i.e. 1.5.2), which corresponds to the bioperl-core version installed. chris On Aug 18, 2008, at 10:15 AM, Peter wrote: > Hi all, > > This might be better asked on the main BioPerl mailing list, however, > I would like to know how to get the version of bioperl-db (i.e. the > part of BioPerl used to import sequence files into BioSQL). > > Thanks, > > Peter > > -- > > P.S. 
I've found two equivalent ways to check the version of BioPerl > itself: > > require Bio::Perl; > print "Bio::Perl::VERSION = "; > print $Bio::Perl::VERSION, "\n"; > > require Bio::Root::Version; > print "Bio::Root::Version::VERSION = "; > print $Bio::Root::Version::VERSION, "\n"; > > Example output: > Bio::Perl::VERSION = 1.005002102 > Bio::Root::Version::VERSION = 1.005002102 > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From biopython at maubp.freeserve.co.uk Mon Aug 18 16:23:38 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 18 Aug 2008 17:23:38 +0100 Subject: [BioSQL-l] Timing importing GenBank files into BioSQL Message-ID: <320fb6e00808180923w43b242b3ue22c48331260f9be@mail.gmail.com> Hi, I've started trying to look at BioPerl and Biopython and how well they agree in writing GenBank files into BioSQL. I've been using the BioPerl load_seqdatabase.pl script to import sample GenBank files, but I was a little surprised how long this takes to run for E. coli K12, NC_000913.gbk (about 10 minutes!). I'm using E coli K12, NC_000913.2 from ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_K12_substr__MG1655/NC_000913.gbk and Nanoarchaeum equitans, NC_005213.1 from ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Nanoarchaeum_equitans/NC_005213.gbk as my example input files. Example timing using BioPerl, after emptying most (all?) 
of my MySQL test database: $ mysql --user="gbrowse" --pass="biosql" test_biosql -e "truncate table bioentry; truncate table seqfeature; truncate table bioentry_dbxref; truncate table term; truncate table ontology; truncate table reference; truncate table dbxref;" $ time perl ~/Downloads/Software/bioperl-db-1.5.2_100/scripts/biosql/load_seqdatabase.pl --dbname test_biosql --namespace test --format genbank --dbpass biosql --dbuser gbrowse Nanoarchaeum_equitans/NC_005213.gbk Loading Nanoarchaeum_equitans/NC_005213.gbk ... real 0m17.116s user 0m13.914s sys 0m2.293s $ time perl ~/Downloads/Software/bioperl-db-1.5.2_100/scripts/biosql/load_seqdatabase.pl --dbname test_biosql --namespace test --format genbank --dbpass biosql --dbuser gbrowse Escherichia_coli_K12_substr__MG1655/NC_000913.gbk Loading Escherichia_coli_K12_substr__MG1655/NC_000913.gbk ... real 10m0.784s user 6m23.898s sys 3m26.189s This does seem a rather unreasonable length of time (and I've repeated this over three times). Is this normal? I know this may not be a fair comparison, but this is what Biopython takes (code at end of email): $ mysql --user="gbrowse" --pass="biosql" test_biosql -e "truncate table bioentry; truncate table seqfeature; truncate table bioentry_dbxref; truncate table term; truncate table ontology; truncate table reference; truncate table dbxref;" $ time python load.py Importing Nanoarchaeum_equitans/NC_005213.gbk Loaded 1 records Took 5.32s include the commit Importing Escherichia_coli_K12_substr__MG1655/NC_000913.gbk Loaded 1 records Took 64.15s including the commit real 1m10.037s user 0m31.942s sys 0m6.913s I'm wondering if the BioPerl time is typical (I hope not), and if there are any computationally intensive or otherwise slow things it does that BioPython might be skipping (checksums? fetching taxonomy?)
Thanks Peter --------------------------------------------------------------------- The contents of my load.py script:
import time
from Bio import SeqIO
from BioSQL import BioSeqDatabase
server = BioSeqDatabase.open_database(driver="MySQLdb", user="gbrowse", passwd = "biosql", host = "localhost", db="test_biosql")
db = server["test"]
start = time.time()
filename = "Nanoarchaeum_equitans/NC_005213.gbk"
print "Importing %s" % filename
records = SeqIO.parse(open(filename), "genbank")
print "Loaded %i records" % db.load(records)
server.adaptor.commit()
print "Took %0.2fs including the commit" % (time.time()-start)
start = time.time()
filename = "Escherichia_coli_K12_substr__MG1655/NC_000913.gbk"
print "Importing %s" % filename
records = SeqIO.parse(open(filename), "genbank")
print "Loaded %i records" % db.load(records)
server.adaptor.commit()
print "Took %0.2fs including the commit" % (time.time()-start)
From biopython at maubp.freeserve.co.uk Mon Aug 18 17:05:37 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 18 Aug 2008 18:05:37 +0100 Subject: [BioSQL-l] Timing importing GenBank files into BioSQL In-Reply-To: <48A9A44D.4000309@bham.ac.uk> References: <320fb6e00808180923w43b242b3ue22c48331260f9be@mail.gmail.com> <48A9A44D.4000309@bham.ac.uk> Message-ID: <320fb6e00808181005p5894c897ia2cea4b7fe20fdac@mail.gmail.com> On Mon, Aug 18, 2008 at 5:33 PM, Nick Loman wrote: > Peter wrote: > >> I'm wondering if the BioPerl time is typical (I hope not), and if >> there are any computationally intensive or otherwise slow things it >> does that BioPython might be skipping (checksums? fetching taxonomy?) > > I also found that BioPython was faster than BioPerl at importing the same > GenBank file. That is reassuring that you also saw a difference - do you recall how big a difference this was on your setup? The factor of ten I am seeing is rather surprising. > There are some differences in the handling of certain tables, the dbxref > table springs to mind.
It is worth doing a dump of the database after > importing each file using the two different methods and comparing the > results. The differences may not be significant for you depending on your > application. I am hoping to bring Biopython into closer agreement with BioPerl (and thus also BioJava) in its use of BioSQL. If you have already made notes on any observed differences, that could be very useful. > I suspect the difference in speed you find is related to the number of > object lookups done in BioPerl which is significantly more than in > BioPython. You can specify --flatlookup to load_seqdatabase.pl which reduces > the number of lookups. Reading the help output from the load_seqdatabase.pl script, --lookup and --flatlookup seem to be related to speeding up updating existing records (whereas in my test, I am trying to start with an empty database each time). I tried it anyway, and it seems to make no difference for this example. But thanks for the suggestions, it's one thing ruled out at least. > You could enable DBI_TRACE to get a log of SQL statements for BioPerl. That could help track down some differences, both in what gets written and how it gets written. I am hoping to avoid using too much Perl, otherwise I'm sure profiling load_seqdatabase.pl could be informative too. > For my purposes, I found both Bioperl and Biopython to be a bit slow, so I devised > a batch import script which speeds things up quite dramatically by > eliminating most object lookups, and applying the foreign-key constraints > post-importing. This was your "BioSQL BatchLoader" code for PostgreSQL? I remember the impressive speed up you got, at the expense of a much modified setup.
http://portal.open-bio.org/pipermail/biopython-dev/2008-April/003618.html Peter From n.j.loman at bham.ac.uk Mon Aug 18 16:33:17 2008 From: n.j.loman at bham.ac.uk (Nick Loman) Date: Mon, 18 Aug 2008 17:33:17 +0100 Subject: [BioSQL-l] Timing importing GenBank files into BioSQL In-Reply-To: <320fb6e00808180923w43b242b3ue22c48331260f9be@mail.gmail.com> References: <320fb6e00808180923w43b242b3ue22c48331260f9be@mail.gmail.com> Message-ID: <48A9A44D.4000309@bham.ac.uk> Peter wrote: > I'm wondering if the BioPerl time is typical (I hope not), and if > there are any computationally intensive or otherwise slow things it > does that BioPython might be skipping (checksums? fetching taxonomy?) I also found that BioPython was faster than BioPerl at importing the same GenBank file. There are some differences in the handling of certain tables, the dbxref table springs to mind. It is worth doing a dump of the database after importing each file using the two different methods and comparing the results. The differences may not be significant for you depending on your application. I suspect the difference in speed you find is related to the number of object lookups done in BioPerl which is significantly more than in BioPython. You can specify --flatlookup to load_seqdatabase.pl which reduces the number of lookups. You could enable DBI_TRACE to get a log of SQL statements for BioPerl. For my purposes, I found both Bioperl and Biopython to be a bit slow, so I devised a batch import script which speeds things up quite dramatically by eliminating most object lookups, and applying the foreign-key constraints post-importing. Regards, Nick.
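[Editorial illustration, not part of the original message.] The suggestion above, dumping the database after each import and comparing the results, could be sketched roughly as follows in Python. This is only a sketch under stated assumptions: rows are assumed to have been fetched already as tuples (for example from the dbxref table), the function name is hypothetical, and the first column is assumed to be a surrogate primary key that is ignored because it can legitimately differ between loaders.

```python
def compare_rows(rows_a, rows_b, skip_columns=1):
    """Compare two table dumps, ignoring the leading key column(s).

    Returns a pair of sorted lists: rows present only in the first dump
    and rows present only in the second. Hypothetical helper, not part
    of BioPerl, Biopython or BioSQL.
    """
    set_a = {tuple(row[skip_columns:]) for row in rows_a}
    set_b = {tuple(row[skip_columns:]) for row in rows_b}
    return sorted(set_a - set_b), sorted(set_b - set_a)


if __name__ == "__main__":
    # Made-up rows shaped like (id, dbname, accession, version).
    dump_one = [(1, "DB1", "ACC1", 0), (2, "DB2", "ACC2", 0)]
    dump_two = [(7, "DB1", "ACC1", 0), (8, "DB3", "ACC3", 0)]
    only_one, only_two = compare_rows(dump_one, dump_two)
    print("Only in first dump:", only_one)
    print("Only in second dump:", only_two)
```

Comparing sets rather than ordered lists keeps the check independent of insertion order, at the cost of hiding duplicate rows.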
From biopython at maubp.freeserve.co.uk Mon Aug 18 17:41:58 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 18 Aug 2008 18:41:58 +0100 Subject: [BioSQL-l] Timing importing GenBank files into BioSQL In-Reply-To: <320fb6e00808181005p5894c897ia2cea4b7fe20fdac@mail.gmail.com> References: <320fb6e00808180923w43b242b3ue22c48331260f9be@mail.gmail.com> <48A9A44D.4000309@bham.ac.uk> <320fb6e00808181005p5894c897ia2cea4b7fe20fdac@mail.gmail.com> Message-ID: <320fb6e00808181041h14641ccftef53aa100f758552@mail.gmail.com> Peter wrote: >>> I'm wondering if the BioPerl time is typical (I hope not), and if >>> there are any computationally intensive or otherwise slow things it >>> does that BioPython might be skipping (checksums? fetching taxonomy?) Nick wrote: >> I also found that BioPython was faster than BioPerl at importing the same >> GenBank file. If anyone else with at least two of BioPerl, BioJava, BioRuby and Biopython installed could try this example and report their findings, that would be interesting, i.e. time importing the small NC_005213.1 and medium sized NC_000913.2 GenBank files linked to at the start of this thread into an empty BioSQL database. Thanks, Peter From johnsonm at gmail.com Mon Aug 18 20:53:48 2008 From: johnsonm at gmail.com (Mark Johnson) Date: Mon, 18 Aug 2008 15:53:48 -0500 Subject: [BioSQL-l] Bio::Annotation issues with BioSQL Message-ID: I'm presently refactoring an in-house protein annotation pipeline and converting it to use BioSQL as a data store. I've noticed some slightly screwy behavior with regard to how some of the Bio::Annotation classes are handled: -Instances of Bio::Annotation::SimpleValue and Bio::Annotation::StructuredValue attached to the annotation collection for a sequence feature (Bio::SeqFeature::Generic) are converted to tags/values on the feature. -Instances of Bio::Annotation::DBLink with attached comments lose the comment.
I'm storing and retrieving things thusly:

    my $dbadp = Bio::DB::BioDB->new(
        -database => 'biosql',
        -user     => $user,
        -pass     => $pass,
        -dbname   => $ora_instance,
        -driver   => 'Oracle'
    );
    my $adp = $dbadp->get_object_adaptor("Bio::SeqI");

    my $seq = Bio::Seq->new(
        -id               => 'DEBUG001',
        -accession_number => 'DBG001',
        -desc             => 'Debug Sequence',
        -seq              => 'GATTACA',
        -namespace        => 'DEBUG',
    );

    my $feature = Bio::SeqFeature::Generic->new(
        -seq_id       => 'DEBUG001',
        -display_name => 'FEAT0001',
        -primary      => 'debug',
        -source       => 'test',
        -start        => 3,
        -end          => 5,
        -strand       => 1,
    );

    my $dblink = Bio::Annotation::DBLink->new(
        -database   => 'FAKE001',
        -primary_id => 'FK1234567890',
        -comment    => 'This is a fake comment',
    );

    $feature->annotation->add_Annotation('ANNO0001', $dblink);
    $seq->add_SeqFeature($feature);

    my $pseq = $dbadp->create_persistent($seq);
    $pseq->store();
    $adp->commit();

    my $dbadp = Bio::DB::BioDB->new( ... );
    my $adp = $dbadp->get_object_adaptor("Bio::SeqI");

    my $query = Bio::DB::Query::BioQuery->new();
    $query->datacollections([ "Bio::PrimarySeqI s", ]);
    $query->where(["s.display_id like DEBUG%'"]);

    my $result = $adp->find_by_query($query);
    while (my $seq = $result->next_object()) {
        my @features = $seq->get_SeqFeatures();
        foreach my $feature (@features) {
            ## Contents of Bio::Annotation::SimpleValue and Bio::Annotation::StructuredValue have
            ## migrated to tag/value pairs on $feature and are missing from $annotation_collection.
            ##
            ## Comments have gone missing from Bio::Annotation::DBLink, but DBLinks are otherwise intact and present.
            my $annotation_collection = $feature->annotation();
            ...
            ...
        }
    }

Is bioperl-db / BioSQL trying to tell me that I shouldn't be using Bio::Annotation::SimpleValue and Bio::Annotation::StructuredValue? Is there even a place in the BioSQL schema for a comment to be attached to a DBLink?
From hlapp at gmx.net Tue Aug 19 17:56:42 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 19 Aug 2008 13:56:42 -0400 Subject: [BioSQL-l] Bio::Annotation issues with BioSQL In-Reply-To: References: Message-ID: <2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net> On Aug 18, 2008, at 4:53 PM, Mark Johnson wrote: > I'm presently refactoring an in-house protein annotation pipeline > and converting it to use BioSQL as a data store. I've noticed some > slightly screwy behavior with regard to how some of the > Bio::Annotation classes are handled: > > -Instances of Bio::Annotation::SimpleValue and > Bio::Annotation::StructuredValue attached to the annotation collection > for a sequence feature (Bio::SeqFeature::Generic) are converted to > tags/values on the feature. > > -Instances of Bio::Annotation::DBLink with attached comments lose > the comment. > [...] > $query->where(["s.display_id like DEBUG%'"]); There's a single quote missing here, but I'm assuming that's a result of copy/paste editing? > [...] > Is bioperl-db / BioSQL trying to tell me that I shouldn't be using > Bio::Annotation::SimpleValue and Bio::Annotation::StructuredValue? Your example code doesn't show where you are getting the B::A::StructuredValue object from. If you didn't create that yourself, it would be good to know what you did to end up with that. Chris Fields has written B::A::Tagtree, which would be the way forward, and if you created the object yourself, can you take a look at that and see whether that class wouldn't serve your purpose as well or even better? In order to be stored in BioSQL, structured (hierarchical, nested) annotation is flattened into a string representation, because BioSQL can't store nested annotation collections natively. Right now, if I am not mistaken, upon retrieval this is not converted back into a B::A::Tagtree object but rather left flat. This is being worked on though; we've just discussed some issues connected with that.
I could make B::A::StructuredValue work the same way, but I'm not sure what it provides that B::A::Tagtree doesn't. The latter uses Data::Stag under the hood, which is much cleaner, and more extensible in the future. As for SimpleValue annotation versus tag/value annotation for seqfeatures, yes, right now these are treated interchangeably for the purposes of BioSQL and Bioperl-db. You can do this easily too on your end by using Bio::SeqFeature::AnnotationAdaptor. > Is there even a place in the BioSQL schema for a comment to be > attached > to a DBLink? No, there isn't. I thought it was, but it turns out that this isn't yet among the desirable extensions to BioSQL from 1.1.x onwards that are documented on the wiki: http://www.biosql.org/wiki/Enhancement_Requests I'll add it (but feel free to do so yourself, especially if you have other enhancements). -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Tue Aug 19 18:17:36 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 19 Aug 2008 14:17:36 -0400 Subject: [BioSQL-l] Timing importing GenBank files into BioSQL In-Reply-To: <320fb6e00808180923w43b242b3ue22c48331260f9be@mail.gmail.com> References: <320fb6e00808180923w43b242b3ue22c48331260f9be@mail.gmail.com> Message-ID: The timings do seem a bit on the long end, but they are also whole genomes. The first interesting bit would be how much of that time is spent in the BioPerl parser, and how much time is spent loading the sequence. For typical genbank sequences, a rate between 10-20 seqs/sec is in the expected range; depending on your hardware setup (and db configuration) you can get slower or faster speeds. You can get lots of output on what it is doing by passing --debug.
Under normal operating conditions, the printed lines should be flying past you much faster than you can identify what they are, and should start doing so right after you get the line "Loading " followed by the filename (before that it is opening the database connection). If there is something that stays on the screen long enough that you can read (or copy&paste) it, it is probably a bottleneck. Bioperl-db essentially works like an object-relational mapper, and hence loading data happens one object at a time. There are some speed optimizations; for example, some objects (like dbxrefs) are always looked up first and inserted if not found, whereas others (like seqs or features) are inserted first and updated if that fails. The assumptions that this is based on are for databases that you are updating (which is what one typically does 90% of the time), not for fresh loads into an empty db. Finally, any speed comparisons aren't really particularly useful so long as you don't know how similar (or different) the resulting data content is, so I would start by comparing that. -hilmar On Aug 18, 2008, at 12:23 PM, Peter wrote: > Hi, > > I've started trying to look at BioPerl and Biopython and how well they > agree in writing GenBank files into BioSQL. I've been using the > BioPerl load_seqdatabase.pl script to import sample GenBank files, but > I was a little surprised how long this takes to run for E. coli K12, > NC_000913.gbk (about 10 minutes!). I'm using E coli K12, NC_000913.2 > from ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_K12_substr__MG1655/NC_000913.gbk > and Nanoarchaeum equitans, NC_005213.1 from > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Nanoarchaeum_equitans/NC_005213.gbk > as my example input files. > > Example timing using BioPerl, after emptying most (all?)
of my MySQL > test database: > > $ mysql --user="gbrowse" --pass="biosql" test_biosql -e "truncate > table bioentry; truncate table seqfeature; truncate table > bioentry_dbxref; truncate table term; truncate table ontology; > truncate table reference; truncate table dbxref;" > > $ time perl ~/Downloads/Software/bioperl-db-1.5.2_100/scripts/biosql/ > load_seqdatabase.pl > --dbname test_biosql --namespace test --format genbank --dbpass biosql > --dbuser gbrowse Nanoarchaeum_equitans/NC_005213.gbk > Loading Nanoarchaeum_equitans/NC_005213.gbk ... > > real 0m17.116s > user 0m13.914s > sys 0m2.293s > > $ time perl ~/Downloads/Software/bioperl-db-1.5.2_100/scripts/biosql/ > load_seqdatabase.pl > --dbname test_biosql --namespace test --format genbank --dbpass biosql > --dbuser gbrowse Escherichia_coli_K12_substr__MG1655/NC_000913.gbk > Loading Escherichia_coli_K12_substr__MG1655/NC_000913.gbk ... > > real 10m0.784s > user 6m23.898s > sys 3m26.189s > > This does seem a rather unreasonable length of time (and I've repeated > this over three times). Is this normal? I know this may not be a > fair comparison, but this it what Biopython takes (code at end of > email): > > $ mysql --user="gbrowse" --pass="biosql" test_biosql -e "truncate > table bioentry; truncate table seqfeature; truncate table > bioentry_dbxref; truncate table term; truncate table ontology; > truncate table reference; truncate table dbxref;" > > $ time python load.py > Importing Nanoarchaeum_equitans/NC_005213.gbk > Loaded 1 records > Took 5.32s include the commit > Importing Escherichia_coli_K12_substr__MG1655/NC_000913.gbk > Loaded 1 records > Took 64.15s including the commit > > real 1m10.037s > user 0m31.942s > sys 0m6.913s > > I'm wondering if the BioPerl time is typical (I hope not), and if > there are any computationally intensive or otherwise slow things it > does that BioPython might be skipping (checksums? fetching taxonomy?) 
> > Thanks > > Peter > > --------------------------------------------------------------------- > The contents of my load.py script: > > import time > from Bio import SeqIO > from BioSQL import BioSeqDatabase > server = BioSeqDatabase.open_database(driver="MySQLdb", > user="gbrowse", > passwd = "biosql", host = "localhost", > db="test_biosql") > > db = server["test"] > > start = time.time() > filename = "Nanoarchaeum_equitans/NC_005213.gbk" > print "Importing %s" % filename > records = SeqIO.parse(open(filename), "genbank") > print "Loaded %i records" % db.load(records) > server.adaptor.commit() > print "Took %0.2fs including the commit" % (time.time()-start) > > start = time.time() > filename = "Escherichia_coli_K12_substr__MG1655/NC_000913.gbk" > print "Importing %s" % filename > records = SeqIO.parse(open(filename), "genbank") > print "Loaded %i records" % db.load(records) > server.adaptor.commit() > print "Took %0.2fs including the commit" % (time.time()-start) > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From johnsonm at gmail.com Wed Aug 20 18:43:25 2008 From: johnsonm at gmail.com (Mark Johnson) Date: Wed, 20 Aug 2008 13:43:25 -0500 Subject: [BioSQL-l] Bio::Annotation issues with BioSQL In-Reply-To: <2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net> References: <2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net> Message-ID: On Tue, Aug 19, 2008 at 12:56 PM, Hilmar Lapp wrote: > On Aug 18, 2008, at 4:53 PM, Mark Johnson wrote: > There's a single quote missing here, but I'm assuming that's a result of > copy/paste editing? Yes, I was a bit sloppy with the example. > Your example code doesn't contain an example for where you are getting the > B::A::StructuredValue object from. 
If you didn't create that yourself, it > would be good to know what you did to end up with that. Chris Fields has > written B::A::Tagtree which would be way forward, and if you created the > object yourself, can you take a look at that and see whether that class > wouldn't serve your purpose as well or even better? I created the B::A::StructuredValue myself. I'm using it to store the output from PSORTb, which gives a cellular localization and a score for a protein sequence (gene), which I'm trying to keep paired together, if possible. I'll take a look at B::A::Tagtree; that's probably a better fit. > In order to be stored in BioSQL structured (hierarchical, nested) annotation > is flattened into a string representation, because BioSQL can't store nested > annotation collections natively. Right now if I am not mistaken upon > retrieval this is not converted back into a B::A::Tagtree object but rather > left flat. This is being worked on though, we've just discussed some issues > connected with that. The data I have isn't really deeply nested. I just like to keep related annotation in one object, if possible. > I could make B::A::StructuredValue work the same way, but I'm not sure what > it provides that B::A::Tagtree doesn't. The latter uses Data::Stag under the > hood, which is much cleaner, and more extensible in the future. Perhaps B::A::StructuredValue should be deprecated? > As for SimpleValue annotation versus tag/value annotation for seqfeatures, > yes right now these are treated interchangeably for the purposes of BioSQL > and Bioperl-db. You can do this easily too on your end by using > Bio::SeqFeature::AnnotationAdaptor. I'll check out the AnnotationAdaptor, but I'll probably just end up using seqfeature tags/values. They're functionally equivalent to B::A::SimpleValue. >> Is there even a place in the BioSQL schema for a comment to be attached >> to a DBLink? > > No there isn't.
I thought it is but it turns out that this isn't yet one of > the desirable extensions to BioSQL from 1.1.x onwards, as documented on the > wiki: > > http://www.biosql.org/wiki/Enhancement_Requests > > I'll add it (but feel free to do so yourself, especially if you have other > enhancmenets). I'll take a look at the wiki....I'll file that as a feature request if I get there before you do it. From cjfields at illinois.edu Wed Aug 20 20:25:55 2008 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 20 Aug 2008 15:25:55 -0500 Subject: [BioSQL-l] Bio::Annotation issues with BioSQL In-Reply-To: References: <2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net> Message-ID: <9872D07D-61AB-4F0A-A477-35AA87ABF72E@illinois.edu> On Aug 20, 2008, at 1:43 PM, Mark Johnson wrote: > ... > >> I could make B::A::StructuredValue work the same way, but I'm not >> sure what >> it provides that B::A::Tagtree doesn't. The latter uses Data::Stag >> under the >> hood, which is much cleaner, and more extensible in the future. > > Perhaps B::A::StructuredValue should be deprecated? Probably. The only place it was used in core was SeqIO::swiss (and now that uses Tagtree in bioperl-live). Let me know if you have any problems with Bio::Annotation::Tagtree. I am planning on doing some more work with it soon. chris From awitney at sgul.ac.uk Wed Aug 27 10:28:50 2008 From: awitney at sgul.ac.uk (Adam Witney) Date: Wed, 27 Aug 2008 11:28:50 +0100 Subject: [BioSQL-l] fill taxon_id field in bioentry from Seq object? Message-ID: <73F4F12F-3C10-41B5-B431-104A787E05D7@sgul.ac.uk> is it possible to add a taxon_id to a Seq object such that when i save it to my BioSQL database, it is stored in the bioentry table? thanks for any help adam From biopython at maubp.freeserve.co.uk Wed Aug 27 10:44:30 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Aug 2008 11:44:30 +0100 Subject: [BioSQL-l] fill taxon_id field in bioentry from Seq object? 
In-Reply-To: <73F4F12F-3C10-41B5-B431-104A787E05D7@sgul.ac.uk> References: <73F4F12F-3C10-41B5-B431-104A787E05D7@sgul.ac.uk> Message-ID: <320fb6e00808270344paeaf049p27fa7425053582a1@mail.gmail.com> On Wed, Aug 27, 2008 at 11:28 AM, Adam Witney wrote: > > is it possible to add a taxon_id to a Seq object such that when i save it to > my BioSQL database, it is stored in the bioentry table? > > thanks for any help > > adam Which Bio* binding for BioSQL are you trying to use? BioPerl, Biopython, BioJava etc Peter From awitney at sgul.ac.uk Wed Aug 27 10:49:50 2008 From: awitney at sgul.ac.uk (Adam Witney) Date: Wed, 27 Aug 2008 11:49:50 +0100 Subject: [BioSQL-l] fill taxon_id field in bioentry from Seq object? In-Reply-To: <320fb6e00808270344paeaf049p27fa7425053582a1@mail.gmail.com> References: <73F4F12F-3C10-41B5-B431-104A787E05D7@sgul.ac.uk> <320fb6e00808270344paeaf049p27fa7425053582a1@mail.gmail.com> Message-ID: <5648AC9C-55D4-45A4-B08B-BC50C6BF5FC8@sgul.ac.uk> On 27 Aug 2008, at 11:44, Peter wrote: > On Wed, Aug 27, 2008 at 11:28 AM, Adam Witney > wrote: >> >> is it possible to add a taxon_id to a Seq object such that when i >> save it to >> my BioSQL database, it is stored in the bioentry table? >> >> thanks for any help >> >> adam > > Which Bio* binding for BioSQL are you trying to use? BioPerl, > Biopython, BioJava etc sorry forgot to mention that bit.... 
I am using BioPerl thanks adam From biopython at maubp.freeserve.co.uk Wed Aug 27 13:51:41 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Aug 2008 14:51:41 +0100 Subject: [BioSQL-l] get_dblinks in load_seqdatabase.pl is deprecated Message-ID: <320fb6e00808270651q4c3b81d7x375fbbdd47c9b5f5@mail.gmail.com> Hi, In order to install GBrowse 1.69, I've updated my installation of BioPerl (using gbrowse_netinstall.pl) and then by hand fetched the latest BioPerl/BioSQL load_seqdatabase.pl from SVN, http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-db/trunk/scripts/biosql/load_seqdatabase.pl The new script seems to work, but prints out over a page of deprecation warnings about get_dblinks (see below). Should I file this as a bug on bugzilla? Do you think load_seqdatabase.pl be updated to work with the latest BioPerl and still be backwards compatible with BioPerl 1.5.2? Peter $ mysql --user="gbrowse" --pass="biosql" test_biosql -e "truncate table bioentry; truncate table seqfeature; truncate table bioentry_dbxref; truncate table term; truncate table ontology; truncate table reference; truncate table dbxref;" $ time perl load_seqdatabase.pl --dbname test_biosql --namespace test --format genbank --dbpass biosql --dbuser gbrowse Nanoarchaeum_equitans/NC_005213.gbk Loading Nanoarchaeum_equitans/NC_005213.gbk ... Use of get_dblinks is deprecated. Note that prior use of this method could return either simple scalar values or Bio::Annotation::DBLink instances; only Bio::Annotation::DBLink is now supported. 
Use get_dbxrefs() instead STACK Bio::Ontology::Term::get_dblinks /Library/Perl/5.8.8/Bio/Ontology/Term.pm:437 STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD /Library/Perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:552 STACK Bio::DB::BioSQL::TermAdaptor::store_children /Library/Perl/5.8.8/Bio/DB/BioSQL/TermAdaptor.pm:280 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 STACK Bio::DB::Persistent::PersistentObject::create /Library/Perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:244 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 STACK Bio::DB::Persistent::PersistentObject::store /Library/Perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271 STACK Bio::DB::BioSQL::SeqAdaptor::store_children /Library/Perl/5.8.8/Bio/DB/BioSQL/SeqAdaptor.pm:244 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 STACK Bio::DB::Persistent::PersistentObject::store /Library/Perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271 STACK (eval) load_seqdatabase.pl:630 STACK toplevel load_seqdatabase.pl:612 [deprecation warning and stack repeated another six times] real 0m15.479s user 0m12.315s sys 0m2.263s From cjfields at illinois.edu Wed Aug 27 14:38:45 2008 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 27 Aug 2008 09:38:45 -0500 Subject: [BioSQL-l] get_dblinks in load_seqdatabase.pl is deprecated In-Reply-To: <320fb6e00808270651q4c3b81d7x375fbbdd47c9b5f5@mail.gmail.com> References: <320fb6e00808270651q4c3b81d7x375fbbdd47c9b5f5@mail.gmail.com> Message-ID: <51375A78-B1F3-4F52-BBEC-8FE62257F789@illinois.edu> Go ahead and file a bug for tracking. 
I'll see if I can track this down; I'm wondering if there is something within bioperl-db/bioperl- live still using get_dblinks, though it's called through AUTOLOAD. chris On Aug 27, 2008, at 8:51 AM, Peter wrote: > Hi, > > In order to install GBrowse 1.69, I've updated my installation of > BioPerl (using gbrowse_netinstall.pl) and then by hand fetched the > latest BioPerl/BioSQL load_seqdatabase.pl from SVN, > > http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-db/trunk/scripts/biosql/load_seqdatabase.pl > > The new script seems to work, but prints out over a page of > deprecation warnings about get_dblinks (see below). Should I file > this as a bug on bugzilla? > > Do you think load_seqdatabase.pl be updated to work with the latest > BioPerl and still be backwards compatible with BioPerl 1.5.2? > > Peter > > $ mysql --user="gbrowse" --pass="biosql" test_biosql -e "truncate > table bioentry; truncate table seqfeature; truncate table > bioentry_dbxref; truncate table term; truncate table ontology; > truncate table reference; truncate table dbxref;" > > $ time perl load_seqdatabase.pl --dbname test_biosql --namespace test > --format genbank --dbpass biosql --dbuser gbrowse > Nanoarchaeum_equitans/NC_005213.gbk > Loading Nanoarchaeum_equitans/NC_005213.gbk ... > Use of get_dblinks is deprecated. Note that prior use > of this method could return either simple scalar values > or Bio::Annotation::DBLink instances; only > Bio::Annotation::DBLink is now supported. 
> Use get_dbxrefs() instead > STACK Bio::Ontology::Term::get_dblinks > /Library/Perl/5.8.8/Bio/Ontology/Term.pm:437 > STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD > /Library/Perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:552 > STACK Bio::DB::BioSQL::TermAdaptor::store_children > /Library/Perl/5.8.8/Bio/DB/BioSQL/TermAdaptor.pm:280 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 > STACK Bio::DB::Persistent::PersistentObject::create > /Library/Perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:244 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 > STACK Bio::DB::Persistent::PersistentObject::store > /Library/Perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271 > STACK Bio::DB::BioSQL::SeqAdaptor::store_children > /Library/Perl/5.8.8/Bio/DB/BioSQL/SeqAdaptor.pm:244 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 > STACK Bio::DB::Persistent::PersistentObject::store > /Library/Perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271 > STACK (eval) load_seqdatabase.pl:630 > STACK toplevel load_seqdatabase.pl:612 > [deprecation warning and stack repeated another six times] > real 0m15.479s > user 0m12.315s > sys 0m2.263s > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From cjfields at illinois.edu Wed Aug 27 15:22:45 2008 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 27 Aug 2008 10:22:45 -0500 Subject: [BioSQL-l] get_dblinks in load_seqdatabase.pl is deprecated In-Reply-To: <51375A78-B1F3-4F52-BBEC-8FE62257F789@illinois.edu> References: <320fb6e00808270651q4c3b81d7x375fbbdd47c9b5f5@mail.gmail.com> <51375A78-B1F3-4F52-BBEC-8FE62257F789@illinois.edu> Message-ID: Peter, Unfortunately I'm unable to reproduce this using bioperl-live and bioperl-db (both from Subversion): cjfields$ time perl load_seqdatabase.pl --dbname nano --namespace test --format genbank --dbpass ***** --dbuser foo NC_005213.gbk Loading NC_005213.gbk ... real 0m35.057s user 0m26.480s sys 0m4.456s This problem is similar to one reported recently: http://article.gmane.org/gmane.comp.lang.perl.bio.general/17360 I think the solution may have been making sure to install bioperl and bioperl-db from Subversion or (if you can't access it) the nightly builds. Use 'sudo ./Build install --uninst 1' to remove old versions which may conflict. The nightly build link: http://bioperl.org/DIST/nightly_builds/ chris On Aug 27, 2008, at 9:38 AM, Chris Fields wrote: > Go ahead and file a bug for tracking. I'll see if I can track this > down; I'm wondering if there is something within bioperl-db/bioperl- > live still using get_dblinks, though it's called through AUTOLOAD. > > chris > > On Aug 27, 2008, at 8:51 AM, Peter wrote: > >> Hi, >> >> In order to install GBrowse 1.69, I've updated my installation of >> BioPerl (using gbrowse_netinstall.pl) and then by hand fetched the >> latest BioPerl/BioSQL load_seqdatabase.pl from SVN, >> >> http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-db/trunk/scripts/biosql/load_seqdatabase.pl >> >> The new script seems to work, but prints out over a page of >> deprecation warnings about get_dblinks (see below). 
Should I file >> this as a bug on bugzilla? >> >> Do you think load_seqdatabase.pl be updated to work with the latest >> BioPerl and still be backwards compatible with BioPerl 1.5.2? >> >> Peter >> >> $ mysql --user="gbrowse" --pass="biosql" test_biosql -e "truncate >> table bioentry; truncate table seqfeature; truncate table >> bioentry_dbxref; truncate table term; truncate table ontology; >> truncate table reference; truncate table dbxref;" >> >> $ time perl load_seqdatabase.pl --dbname test_biosql --namespace test >> --format genbank --dbpass biosql --dbuser gbrowse >> Nanoarchaeum_equitans/NC_005213.gbk >> Loading Nanoarchaeum_equitans/NC_005213.gbk ... >> Use of get_dblinks is deprecated. Note that prior use >> of this method could return either simple scalar values >> or Bio::Annotation::DBLink instances; only >> Bio::Annotation::DBLink is now supported. >> Use get_dbxrefs() instead >> STACK Bio::Ontology::Term::get_dblinks >> /Library/Perl/5.8.8/Bio/Ontology/Term.pm:437 >> STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD >> /Library/Perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:552 >> STACK Bio::DB::BioSQL::TermAdaptor::store_children >> /Library/Perl/5.8.8/Bio/DB/BioSQL/TermAdaptor.pm:280 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create >> /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 >> STACK Bio::DB::Persistent::PersistentObject::create >> /Library/Perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:244 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create >> /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store >> /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 >> STACK Bio::DB::Persistent::PersistentObject::store >> /Library/Perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271 >> STACK Bio::DB::BioSQL::SeqAdaptor::store_children >> /Library/Perl/5.8.8/Bio/DB/BioSQL/SeqAdaptor.pm:244 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create 
>> /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store >> /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 >> STACK Bio::DB::Persistent::PersistentObject::store >> /Library/Perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271 >> STACK (eval) load_seqdatabase.pl:630 >> STACK toplevel load_seqdatabase.pl:612 >> [deprecation warning and stack repeated another six times] >> real 0m15.479s >> user 0m12.315s >> sys 0m2.263s >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Marie-Claude Hofmann > College of Veterinary Medicine > University of Illinois Urbana-Champaign > > > > Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From biopython at maubp.freeserve.co.uk Wed Aug 27 16:43:06 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Aug 2008 17:43:06 +0100 Subject: [BioSQL-l] get_dblinks in load_seqdatabase.pl is deprecated In-Reply-To: References: <320fb6e00808270651q4c3b81d7x375fbbdd47c9b5f5@mail.gmail.com> <51375A78-B1F3-4F52-BBEC-8FE62257F789@illinois.edu> Message-ID: <320fb6e00808270943t750ce0b3r9ec9d7c2c9744fc5@mail.gmail.com> > Peter, > > Unfortunately I'm unable to reproduce this using bioperl-live and bioperl-db > (both from Subversion): > ... > > This problem is similar to one reported recently: > > http://article.gmane.org/gmane.comp.lang.perl.bio.general/17360 Yes, it does look very similar. > I think the solution may have been making sure to install bioperl and > bioperl-db from Subversion or (if you can't access it) the nightly builds. > Use 'sudo ./Build install --uninst 1' to remove old versions which may > conflict. 
The nightly build link: > > http://bioperl.org/DIST/nightly_builds/ You were right - I've installed the nightly builds of bioperl-live and bioperl-db with the switch to remove old versions and the deprecation warning went away. Thanks for your help. I've closed the bug I filed as invalid: http://bugzilla.open-bio.org/show_bug.cgi?id=2572 Peter From biopython at maubp.freeserve.co.uk Wed Aug 27 16:52:27 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Aug 2008 17:52:27 +0100 Subject: [BioSQL-l] fill taxon_id field in bioentry from Seq object? In-Reply-To: <5648AC9C-55D4-45A4-B08B-BC50C6BF5FC8@sgul.ac.uk> References: <73F4F12F-3C10-41B5-B431-104A787E05D7@sgul.ac.uk> <320fb6e00808270344paeaf049p27fa7425053582a1@mail.gmail.com> <5648AC9C-55D4-45A4-B08B-BC50C6BF5FC8@sgul.ac.uk> Message-ID: <320fb6e00808270952j701bda51wac5fe096d640754f@mail.gmail.com> Adam wrote: > is it possible to add a taxon_id to a Seq object such that when i save it > to my BioSQL database, it is stored in the bioentry table? > ... > sorry forgot to mention that bit.... I am using BioPerl I'm afraid I can't help you with BioPerl, sorry. Hopefully a BioPerl expert will reply. All I can suggest is you could try parsing a GenBank file with BioPerl and see where the taxon id is stored in the Seq object's annotation, then try and do the same with your data before asking BioPerl to save it to the BioSQL database. Peter From hlapp at gmx.net Wed Aug 27 19:11:09 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 27 Aug 2008 15:11:09 -0400 Subject: [BioSQL-l] fill taxon_id field in bioentry from Seq object?
In-Reply-To: <5648AC9C-55D4-45A4-B08B-BC50C6BF5FC8@sgul.ac.uk> References: <73F4F12F-3C10-41B5-B431-104A787E05D7@sgul.ac.uk> <320fb6e00808270344paeaf049p27fa7425053582a1@mail.gmail.com> <5648AC9C-55D4-45A4-B08B-BC50C6BF5FC8@sgul.ac.uk> Message-ID: <0D41112C-DB20-43CA-B7F9-CC83DF3F3A89@gmx.net> $seq->species->ncbi_taxon_id() BTW feel free to post this to the BioPerl mailing list bioperl-l at lists.open-bio.org . -hilmar On Aug 27, 2008, at 6:49 AM, Adam Witney wrote: > > On 27 Aug 2008, at 11:44, Peter wrote: > >> On Wed, Aug 27, 2008 at 11:28 AM, Adam Witney >> wrote: >>> >>> is it possible to add a taxon_id to a Seq object such that when i >>> save it to >>> my BioSQL database, it is stored in the bioentry table? >>> >>> thanks for any help >>> >>> adam >> >> Which Bio* binding for BioSQL are you trying to use? BioPerl, >> Biopython, BioJava etc > > sorry forgot to mention that bit.... I am using BioPerl > > thanks > > adam > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From gabrielle_doan at gmx.net Fri Aug 29 12:36:26 2008 From: gabrielle_doan at gmx.net (Gabrielle Doan) Date: Fri, 29 Aug 2008 14:36:26 +0200 Subject: [BioSQL-l] Increasing value of rank in table seqfeature Message-ID: <48B7ED4A.5000008@gmx.net> Hi all, I have a BioSQL database which contains several chromosomes and features. Now I would like to insert chromosome 2 with some miRNA as a new feature. I ran into the problem that in the table seqfeature the rank column can only store smallint(5) unsigned values. As far as I know each rank has to be unique. If you want to store a lot of information this limit will be exceeded quickly. Wouldn't it be better to increase this value?
It would be very nice if someone could comment on my suggestion. Thanks a lot.

Cheers,
Gabrielle

From hlapp at gmx.net Fri Aug 29 14:45:26 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 29 Aug 2008 10:45:26 -0400
Subject: [BioSQL-l] Increasing value of rank in table seqfeature
In-Reply-To: <48B7ED4A.5000008@gmx.net>
References: <48B7ED4A.5000008@gmx.net>
Message-ID: <9021A6D3-7B9D-4C82-A4A4-45DC28C587F1@gmx.net>

Hi Gabrielle,

smallint can take values up to 65535 if unsigned. I can see that this can become a limitation if the bioentry to which the features belong is a whole chromosome.

Note that the uniqueness constraint is not on bioentry (sequence) and rank alone. Instead, it is on the combination of bioentry (sequence), type term, source term, and rank. I.e., at present, with the smallint constraint, you can't have more than 65535 features of the same type and from the same source for a particular sequence.

It's possible that the software you are using (BioJava?) increments the rank for every single feature, rather than resetting it for each new combination of type and source. Is that what you are seeing?

-hilmar

On Aug 29, 2008, at 8:36 AM, Gabrielle Doan wrote:

> Hi all,
> I have a BioSQL database which contains several chromosomes and
> features. And now I would like to insert chromosome 2 with some
> miRNA as a new feature. I meet the problem that in the table
> seqfeature the entry rank just can store smallint(5) unsigned
> values. As far as I know each rank has to be unique. If you want to
> store much information this value will be exceeded quickly. Isn't it
> better to increase this value?
>
> It would be very nice if someone could comment on my suggestion. Thanks
> a lot.
>
> Cheers,
> Gabrielle

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From biopython at maubp.freeserve.co.uk Fri Aug 8 15:26:24 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 08 Aug 2008 15:26:24 -0000
Subject: [BioSQL-l] Biopython documentation in BioSQL SVN
Message-ID: <320fb6e00808080826hfa9d2d7rf1adec4d10888574@mail.gmail.com>

>> However, there is some older LaTeX based documentation on our webpage,
>> http://biopython.org/DIST/docs/biosql/python_biosql_basic.html
>> http://biopython.org/DIST/docs/biosql/python_biosql_basic.pdf
>>
>> These are currently living in the BioSQL repository,
>...
>
>> What I would suggest is just to:
>>
>> (*) add a disclaimer to the top of python_biosql_basic.tex saying this
>> document is deprecated, and giving a link to the wiki page,
>> http://biopython.org/wiki/BioSQL
>
> Just send me a patch of the change you would like to make.

Better late than never? Here is a patch against the SVN file python_biosql_basic.tex which puts more emphasis on the wiki page, http://biopython.org/wiki/BioSQL, and also uses Bio.SeqIO rather than Bio.GenBank for the record parsing. This also removes the stub section "Python Cookbook Code".

Thanks,

Peter
-------------- next part --------------
A non-text attachment was scrubbed...
Name: biopython_biosql_doc.patch
Type: application/octet-stream
Size: 2234 bytes
Desc: not available
URL:
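
On the seqfeature rank question earlier in this thread: the uniqueness rule described there (bioentry, type term, source term, and rank together, rather than rank alone) can be tried out with a small sketch. This uses Python's built-in sqlite3 module and a simplified stand-in table, not the actual BioSQL DDL; the numeric ids are made up for illustration, and SQLite does not enforce the smallint range, so only the key behaviour is shown.

```python
import sqlite3

# Simplified stand-in for seqfeature: only the columns that take part in the
# uniqueness rule from the thread. Assumption: this is NOT the real BioSQL DDL.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE seqfeature (
        seqfeature_id  INTEGER PRIMARY KEY,
        bioentry_id    INTEGER NOT NULL,
        type_term_id   INTEGER NOT NULL,
        source_term_id INTEGER NOT NULL,
        "rank"         INTEGER NOT NULL DEFAULT 0,
        UNIQUE (bioentry_id, type_term_id, source_term_id, "rank")
    )
    """
)

INSERT = (
    'INSERT INTO seqfeature (bioentry_id, type_term_id, source_term_id, "rank") '
    "VALUES (?, ?, ?, ?)"
)


def try_insert(bioentry_id, type_term_id, source_term_id, rank):
    """Return True if the row is accepted, False if it violates the key."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(INSERT, (bioentry_id, type_term_id, source_term_id, rank))
        return True
    except sqlite3.IntegrityError:
        return False


# Hypothetical ids: bioentry 1 = a chromosome, type 10 = gene, type 11 = miRNA,
# source 100 = one annotation source.
results = [
    try_insert(1, 10, 100, 1),  # first gene feature
    try_insert(1, 11, 100, 1),  # first miRNA feature: same rank, different type
    try_insert(1, 10, 100, 1),  # exact duplicate of the first row
]
print(results)
```

If a loader restarts the rank for each (type, source) combination, the second insert succeeds, so the smallint ceiling applies per combination rather than per sequence.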