From Bank.Beszteri at awi.de Tue Apr 1 08:31:49 2008
From: Bank.Beszteri at awi.de (Bánk Beszteri)
Date: Tue, 01 Apr 2008 14:31:49 +0200
Subject: [Bioperl-l] Indexing large databases / BioSQL
Message-ID: <47F22B35.1030502@awi.de>

Dear list,

we have recently started looking for a way to index large sequence databases / flat files for a Java project. Because we ran into problems using biojava, and because both the OBDA and BioSQL approaches seem to be compatible across the Bio* projects, we also started to experiment with bioperl. It looks like this should work fine, but we had a couple of problems here, too. Perhaps some of you can give me a hint as to what we are doing wrong!

The first thing we tried was to use Bio::DB::Flat to index a TrEMBL flat file (~12 GB), but it seems we haven't got a machine with enough memory to handle this. (Perhaps you would use the "bdb" style index in such a case in bioperl, but this apparently doesn't work with biojava, so we had to stick with "flat".)

So next we started to test BioSQL, by trying to load just Swissprot into a MySQL DB first, like:

load_seqdatabase.pl --host mysql.awi.de --dbname biosql2 --dbuser xyz --dbpass abc --driver mysql --namespace uniprot_sprot --format swiss uniprot_sprot.dat

Here we get an error message

###########################################

Loading /biodb/spinkern/uniprot_sprot.dat ...
Could not store Q6DAH5:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: The supplied lineage does not start near 'Erwinia carotovora subsp. atroseptica' (I was supplied 'Erwinia carotovora subsp. | Pectobacterium | Enterobacteriaceae | Enterobacteriales | Gammaproteobacteria | Proteobacteria | Bacteria')
STACK: Error::throw
STACK: Bio::Root::Root::throw /biodb/spinkern/bioperl-1.5/bioperl-1.5.2_102/Bio/Root/Root.pm:359
STACK: Bio::Species::classification /biodb/spinkern/bioperl-1.5/bioperl-1.5.2_102/Bio/Species.pm:174
STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:552
STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SpeciesAdaptor.pm:281
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1305
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:973
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
STACK: Bio::DB::Persistent::PersistentObject::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271
STACK: load_seqdatabase.pl:622
-----------------------------------------------------------

at load_seqdatabase.pl line 635

############################################

or similar, depending on whether we use a pre-loaded ncbi taxonomy or not, and which Swissprot release we are trying to load. It often seems to come from something like, as here, a subsp. or other special addition to the species line, but alternative genus names and other curious things also appear. It looks like Species.pm tries to validate the species name against the lineage info already there in the BioSQL DB, and in several cases it finds inconsistencies. If we start with the ncbi taxonomy already loaded in the database, the first error comes much earlier.

I found a thread on the same problem from about two years ago (http://thread.gmane.org/gmane.comp.lang.perl.bio.general/13766/focus=13788), where the recommended solution was to update bioperl, so I was quite surprised to find the problem with the versions you can see above (1.5.2_102 bioperl core, 1.5.2_100 bioperl_db). Can someone give me any hints as to what is going wrong here?

The only workaround we have found so far was to comment out line 174 in Species.pm:

$self->throw("The supplied lineage does not start near '$name' (I was supplied '".join(" | ", @vals)."')");

After doing so, load_seqdatabase.pl runs for several hours (until it eventually crashes; I haven't found out why yet), but proceeds really slowly. I also found some info on this for Pg and Oracle in the mailing list, but does anyone have approximate numbers for MySQL: how long should a first Swissprot load take?

I would be grateful to hear your ideas / experiences on these issues!

Bank Beszteri
Bioinformatics / Scientific Computing
Alfred Wegener Institute for Polar and Marine Research
Am Handelshafen 12.
27570 Bremerhaven
Germany

From cjfields at uiuc.edu Tue Apr 1 20:45:28 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 1 Apr 2008 19:45:28 -0500
Subject: [Bioperl-l] quick update on bioperl nightly builds
Message-ID: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu>

I'm simplifying the nightly build archive names (removing svn revision # and date) in case anyone needs to update bioperl-live/run/db/network on a regular basis (read: GBrowse installations).
When I have time I'll start working on automated builds, which will require some extra work with Module::Build and Build.PL.

chris

From hiekeen at gmail.com Tue Apr 1 22:14:07 2008
From: hiekeen at gmail.com (Jinyan Huang)
Date: Wed, 2 Apr 2008 10:14:07 +0800
Subject: [Bioperl-l] How to make a network graphic using my genes in pathways?
Message-ID:

I have 20 pathways, and my genes of interest are in these pathways. Some genes overlap between the pathways. How can I draw a network graph from these genes, that is, connect the pathways through the overlapping genes? What kind of software can I use?

Thank you very much in advance.

--
Best regards,
Jinyan Huang (ekeen)
School of Life Sciences and Technology, 1302 Room
Tongji University
Siping Road 1239, Shanghai 200092
P.R. China
Tel :0086-21-65981041
Msn: hiekeen at hotmail.com
eMail: hiekeen at gmail.com

From hlapp at gmx.net Tue Apr 1 22:30:06 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 1 Apr 2008 22:30:06 -0400
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <47F22B35.1030502@awi.de>
References: <47F22B35.1030502@awi.de>
Message-ID:

On Apr 1, 2008, at 8:31 AM, Bánk Beszteri wrote:

> [...] So next we started to test BioSQL, by trying to load just
> Swissprot in a MySQL DB first, like:
>
> load_seqdatabase.pl --host mysql.awi.de --dbname biosql2 --dbuser
> xyz --dbpass abc --driver mysql --namespace uniprot_sprot --format
> swiss uniprot_sprot.dat
>
> Here we get an error message
>
> ###########################################
>
> Loading /biodb/spinkern/uniprot_sprot.dat ...
> Could not store Q6DAH5:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: The supplied lineage does not start near 'Erwinia carotovora subsp. atroseptica' (I was supplied 'Erwinia carotovora subsp. | Pectobacterium | Enterobacteriaceae | Enterobacteriales | Gammaproteobacteria | Proteobacteria | Bacteria')
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /biodb/spinkern/bioperl-1.5/bioperl-1.5.2_102/Bio/Root/Root.pm:359
> STACK: Bio::Species::classification /biodb/spinkern/bioperl-1.5/bioperl-1.5.2_102/Bio/Species.pm:174
> STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:552
> STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SpeciesAdaptor.pm:281
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1305
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:973
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
> STACK: Bio::DB::Persistent::PersistentObject::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:244
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
> STACK: Bio::DB::Persistent::PersistentObject::store /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271
> STACK: load_seqdatabase.pl:622
> -----------------------------------------------------------
>
> at load_seqdatabase.pl line 635
>
> ############################################
>
> or similar, depending on whether we use a pre-loaded ncbi taxonomy
> or not

I recommend always using a pre-loaded NCBI taxonomy unless you know there are only a few organisms that are straightforward (for the parser, that is).

> , and which Swissprot release we are trying to load. It often seems
> to come from sg. like here, subsp. or other special addition to the
> species line; but alternative genus names and other curious things
> also to appear. It looks like Species.pm tries to validate the
> species name against the lineage info already there in the BioSQL
> DB, and in several cases, it finds inconsistencies.

It actually happens upon a successful lookup when the species object is populated from the database.

> [...]
> The only workaround we have found so far was to comment out line
> 174 in Species.pm:
>
> $self->throw("The supplied lineage does not start near '$name' (I
> was supplied '".join(" | ", @vals)."')");

That should be OK if you work with a pre-loaded taxonomy. It's sort of a sanity check that should catch a parser having messed up a species. If you use a pre-loaded NCBI taxonomy the results of the species parsing don't matter in all details so long as the NCBI taxonID is parsed out correctly, and then found in the database.

Note that this is actually a warn() in the main trunk version of BioPerl, so you might want to upgrade to that (or change throw() to warn() in your version). You still get the records flagged with that, but it isn't an exception.

>
> After doing so, load_seqdatabase.pl runs for several hours (until
> it eventually crashes; I haven't found out yet why), but proceeds
> really slowly.

It should certainly *not* crash. Note also that you can supply --safe on the command line, in which case the script will continue with the next record if one fails to load for whatever reason.

You will want to adjust the width constraint of dbxref.accession, for example to 128 chars. This will also be fixed for BioSQL 1.0.1.
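
A minimal sketch of that schema change for MySQL, offered as an illustration rather than the official fix. It assumes the column is currently declared VARCHAR(40) BINARY NOT NULL (an assumption; check with SHOW CREATE TABLE dbxref first, since MODIFY has to restate the full column definition). The file name widen_dbxref_accession.sql is made up for the example; host, user and database name are the ones from the load command quoted above.

```shell
#!/bin/sh
# Sketch only: widen dbxref.accession to 128 characters in a MySQL BioSQL instance.
# Assumption: the column is currently VARCHAR(40) BINARY NOT NULL. Adjust the
# definition below if SHOW CREATE TABLE dbxref says otherwise.
cat > widen_dbxref_accession.sql <<'SQL'
ALTER TABLE dbxref MODIFY accession VARCHAR(128) BINARY NOT NULL;
SQL

# Print the statement for review. Once checked, it could be applied by hand, e.g.:
#   mysql --host mysql.awi.de --user xyz -p biosql2 < widen_dbxref_accession.sql
cat widen_dbxref_accession.sql
```

Writing the statement to a file first keeps the change reviewable and leaves a record of what was run against the database.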
See http://bugzilla.open-bio.org/show_bug.cgi?id=2474

> I also found some info on this for Pg and Oracle in the mailing
> list, but has anyone some approximate numbers for MySQL, how long
> should a first Swissprot load take?

Possibly around 20 hours according to Erik Rijkers: See http://lists.open-bio.org/pipermail/bioperl-l/2008-March/027427.html

You can use the --logchunks N option to have it print out performance statistics every N records.

Hope this helps,

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Tue Apr 1 22:38:12 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 1 Apr 2008 22:38:12 -0400
Subject: [Bioperl-l] Very basic implementation of GenBank XML SeqIO module
In-Reply-To: <47F13C2C.4070909@umdnj.edu>
References: <47F13C2C.4070909@umdnj.edu>
Message-ID:

Ryan - do you not have a committer account?

I do agree with Chris on the test. Modules w/o tests tend to become 'pseudogenized.'

-hilmar

On Mar 31, 2008, at 3:31 PM, Ryan Golhar wrote:

> I have a (very) basic SAX implementation of a SeqIO module to parse
> GenBank XML records. Right now, it only reads in basic information
> regarding the sequence and the sequence itself.
>
> It does not yet parse the features table. Should I submit it to be
> included in bioperl or wait until I implement more for the features
> table? I'm not sure when I'll get around to it though
>
> Ryan
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cain.cshl at gmail.com Tue Apr 1 23:12:04 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 01 Apr 2008 23:12:04 -0400
Subject: [Bioperl-l] quick update on bioperl nightly builds
In-Reply-To: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu>
References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu>
Message-ID: <1207105924.6184.4.camel@frissell>

Hi Chris,

The tarball is currently (Apr 1) being built in a tmp directory, so that the extracted tarball is ./tmp/bioperl-live/. Is that intended?

Thanks,
Scott

On Tue, 2008-04-01 at 19:45 -0500, Chris Fields wrote:
> I'm simplifying the nightly build archive names (removing svn revision
> # and date) in case anyone needs to update bioperl-live/run/db/network
> on a regular basis (read: GBrowse installations). When I have time
> I'll start working on automated builds, which will require some extra
> work with Module::Build and Build.PL.
>
> chris

--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu Tue Apr 1 23:59:30 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 1 Apr 2008 22:59:30 -0500
Subject: [Bioperl-l] quick update on bioperl nightly builds
In-Reply-To: <1207105924.6184.4.camel@frissell>
References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> <1207105924.6184.4.camel@frissell>
Message-ID:

Nope, that isn't intended. I fixed it and reran it manually, so it should be fine now (note I didn't update the log file; the next cron run will catch that).

I may toy around with your recent passthrough flag addition to try getting automated PPM's up and running.

chris

On Apr 1, 2008, at 10:12 PM, Scott Cain wrote:

> Hi Chris,
>
> The tarball is currently (Apr 1) being built in a tmp directory, so
> that
> the extracted tarball is ./tmp/bioperl-live/. Is that intended?
>
> Thanks,
> Scott

From sdavis2 at mail.nih.gov Wed Apr 2 07:33:38 2008
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 2 Apr 2008 07:33:38 -0400
Subject: [Bioperl-l] How to make a network graphic using my genes in pathways?
In-Reply-To:
References:
Message-ID: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com>

On Tue, Apr 1, 2008 at 10:14 PM, Jinyan Huang wrote:
> I have 20 pathways. My interesting genes are in these pathways. There
> are some genes overlaps in these pathways. How can I make a graphic
> network using these genes? It means connecting these pathways through
> these overlap genes. What kind of software can I use?

R/Bioconductor has tools for working with graphs and pathways. Cytoscape is another open-source graphical solution. Ingenuity is, of course, not free. If you are looking for a Perl solution, you can look at the various graph modules and their integration with the Graphviz libraries.

Sean

From cain.cshl at gmail.com Wed Apr 2 08:28:22 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Wed, 02 Apr 2008 08:28:22 -0400
Subject: [Bioperl-l] [Gmod-gbrowse] quick update on bioperl nightly builds
In-Reply-To:
References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> <1207105924.6184.4.camel@frissell>
Message-ID: <1207139302.6507.7.camel@frissell>

Hi Chris,

(trimmed out gbrowse mailing list since this is just bioperl business)

Speaking of the pass through stuff, Sendu mentioned that I stomped on some changes to Build.PL that you and he did when I committed that change, so it should be rolled back. Is there a good (svn) way to do that? Or should I just copy the contents of the old (good) Build.PL into a fresh file in my checkout and commit it?
Thanks,
Scott

On Tue, 2008-04-01 at 22:59 -0500, Chris Fields wrote:
> Nope, that isn't intended. I fixed it and reran it manually, so it
> should be fine now (note I didn't update the log file; the next cron
> run will catch that).
>
> I may toy around with your recent passthrough flag addition to try
> getting automated PPM's up and running.
>
> chris

--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

From robert.citek at gmail.com Wed Apr 2 08:24:06 2008
From: robert.citek at gmail.com (Robert Citek)
Date: Wed, 2 Apr 2008 07:24:06 -0500
Subject: [Bioperl-l] module for pubchem queries
Message-ID: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com>

Hello all,

I have a list of chemical compounds that have some kind of interaction with proteins or genes. The current list contains names or SMILES and I would like to get the CID number for those compounds.

Currently, I'm using perl to query the NCBI's eutils[1], which works great. But I was just curious to know if there was a bioperl module to do something similar. A quick google didn't turn up anything, so I thought I'd ask.

[1] http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html

Regards,
- Robert

From David.Messina at sbc.su.se Wed Apr 2 08:41:45 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 2 Apr 2008 14:41:45 +0200
Subject: [Bioperl-l] How to make a network graphic using my genes in pathways?
In-Reply-To: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com>
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com>
Message-ID: <628aabb70804020541v6cee4584ibd9935290ae7cc0a@mail.gmail.com>

I have no personal experience with it, but a colleague of mine suggested VisANT.
Dave

From cjfields at uiuc.edu Wed Apr 2 11:03:32 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 2 Apr 2008 10:03:32 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] quick update on bioperl nightly builds
In-Reply-To: <1207139302.6507.7.camel@frissell>
References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> <1207105924.6184.4.camel@frissell> <1207139302.6507.7.camel@frissell>
Message-ID: <3B490712-3413-4662-99D7-7B115CECB6E1@uiuc.edu>

The changes I made were related to problems checking MySQL for Bio::DB::SeqFeature::Store tests when connectivity requires username/password. For some reason it tests DB connectivity up front, while Bio::DB::GFF assumes the DB setup is correct (no direct DB check) and then runs its tests on that assumption.

You can view the diffs for your commits here:

http://code.open-bio.org/svnweb/index.cgi/bioperl/diff/bioperl-live/trunk/ModuleBuildBioperl.pm?revs=14604&revs=14548

http://code.open-bio.org/svnweb/index.cgi/bioperl/diff/bioperl-live/trunk/Build.PL?revs=14604&revs=14565

I'll try working on merging them together today; it shouldn't be too hard (the changes were fairly minor in both Build.PL and Module::Build). I'll test to make sure your changes stay in as well. Down the road I believe we need to rethink how we want the Build process to run using Module::Build as it's a bit convoluted, but it works for now.

chris

On Apr 2, 2008, at 7:28 AM, Scott Cain wrote:

> Hi Chris,
>
> (trimmed out gbrowse mailing list since this is just bioperl business)
>
> Speaking of the pass through stuff, Sendu mentioned that I stomped on
> some changes to Build.PL that you and he did when I committed that
> change, so it should be rolled back. Is there a good (svn) way to do
> that? Or should I just copy the contents of the old (good) Build.PL
> into a fresh file in my checkout and commit it?
>
> Thanks,
> Scott

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Wed Apr 2 11:54:05 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 2 Apr 2008 10:54:05 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] quick update on bioperl nightly builds
In-Reply-To: <3B490712-3413-4662-99D7-7B115CECB6E1@uiuc.edu>
References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> <1207105924.6184.4.camel@frissell> <1207139302.6507.7.camel@frissell> <3B490712-3413-4662-99D7-7B115CECB6E1@uiuc.edu>
Message-ID: <71375DA3-A751-4908-8000-D9ACAE39B19C@uiuc.edu>

Okay, committed them. The accept passthrough still appears to work; let me know if anything pops up.

chris

On Apr 2, 2008, at 10:03 AM, Chris Fields wrote:

> ...
> I'll try working on merging them together today; it shouldn't be too
> hard (the changes were fairly minor in both Build.PL and
> Module::Build). I'll test to make sure your changes stay in as
> well. Down the road I believe we need to rethink how we want the
> Build process to run using Module::Build as it's a bit convoluted,
> but it works for now.
>
> chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From zhpan99 at yahoo.com Wed Apr 2 13:52:46 2008
From: zhpan99 at yahoo.com (Pan Zheng)
Date: Wed, 2 Apr 2008 10:52:46 -0700 (PDT)
Subject: [Bioperl-l] installing bioperl-1.5.2 errors:DB_File
Message-ID: <726978.82400.qm@web53105.mail.re2.yahoo.com>

Hi,

I am installing bioperl-1.5.2_102 under cygwin on Windows XP and am getting some errors during the process.

When I ran "perl Build test", one major error was about DB_File. I tried to install DB_File from cpan and rpm without any luck.

++++++++++++++++++++++++
CPAN: File::Temp loaded ok (v0.16)
CPAN: YAML loaded ok (v0.62)
CPAN.pm: Going to build P/PM/PMQS/DB_File-1.817.tar.gz
Parsing config.in...
Looks Good.
Checking if your kit is complete...
Looks good
Note (probably harmless): No library found for -ldb
Writing Makefile for DB_File
cp DB_File.pm blib/lib/DB_File.pm
AutoSplitting blib/lib/DB_File.pm (blib/lib/auto/DB_File)
gcc -c -I/usr/local/BerkeleyDB/include -DPERL_USE_SAFE_PUTENV -fno-strict-aliasing -pipe -Wdeclaration-after-statement -DUSEIMPORTLIB -O3 -DVERSION=\"1.817\" -DXS_VERSION=\"1.817\" "-I/usr/lib/perl5/5.8/cygwin/CORE" -D_NOT_CORE -DmDB_Prefix_t=size_t -DmDB_Hash_t=u_int32_t version.c
version.c:30:16: db.h: No such file or directory
make: *** [version.o] Error 1
PMQS/DB_File-1.817.tar.gz
/usr/bin/make -- NOT OK
Running make test
Can't test without successful make
Running make install
Make had returned bad status, install seems impossible
Failed during this command:
PMQS/DB_File-1.817.tar.gz : make NO
+++++++++++++++++++++++++++++++++++++++++++++++

I don't remember having this kind of error while installing an earlier version.

Could you please help me with the DB_File installation?

Thanks.

Pan

From dr.hogart at gmail.com Thu Apr 3 09:01:03 2008
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Thu, 03 Apr 2008 17:01:03 +0400
Subject: [Bioperl-l] support of clustalw2 in bio::run::tool::alignment
Message-ID:

As far as I understand, clustalw2 is not supported in bioperl v1.5.2.100. In which version will it be supported? Thank you in advance.

From slduncan at iastate.edu Thu Apr 3 14:13:16 2008
From: slduncan at iastate.edu (slduncan at iastate.edu)
Date: Thu, 3 Apr 2008 13:13:16 -0500 (CDT)
Subject: [Bioperl-l] help installing bioperl with cygwin
Message-ID: <161313331084931@webmail.iastate.edu>

I am trying to use cpan to install bioperl and I got an error message saying:
c:\Documents not recognized as and external or internal....
Any ideas here? Also, I am new to the computer world so please be kind.
:) Stacy Duncan Iowa State University Bioinformatics and Computational Biology 1802 University Blvd. VMRI Building 6 Ames, IA 50011-1240 office phone: (515) 294-8385 office fax: (515) 294-1401 home phone: (336) 965-5622 e-mail: slduncan at iastate.edu From cjfields at uiuc.edu Fri Apr 4 16:13:23 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 4 Apr 2008 15:13:23 -0500 Subject: [Bioperl-l] help installing bioperl with cygwin In-Reply-To: <161313331084931@webmail.iastate.edu> References: <161313331084931@webmail.iastate.edu> Message-ID: It's best if you use ActiveState's Perl installation (it's the only one we really support at this moment, unless someone wants to give StrawberryPerl a run). See: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows chris On Apr 3, 2008, at 1:13 PM, slduncan at iastate.edu wrote: > I am trying to use cpan to install bioperl and I had an error > message saying: > c:\Documents not recognized as and external or internal.... > Any ideas here. Also, I am new to the computer world so please be > kind. :) > > Stacy Duncan > Iowa State University > Bioinformatics and Computational Biology > 1802 University Blvd. > VMRI Building 6 > Ames, IA 50011-1240 > office phone: (515) 294-8385 > office fax: (515) 294-1401 > home phone: (336) 965-5622 > e-mail: slduncan at iastate.edu > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Apr 4 16:07:12 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 4 Apr 2008 15:07:12 -0500 Subject: [Bioperl-l] installing bioperl-1.5.2 errors:DB_File In-Reply-To: <726978.82400.qm@web53105.mail.re2.yahoo.com> References: <726978.82400.qm@web53105.mail.re2.yahoo.com> Message-ID: I think you have to use the cygwin installer to install DB_File (it also installs dependencies, such as BDB). According to 'perldoc perlcygwin': .... Optional Libraries for Perl on Cygwin Several Perl functions and modules depend on the existence of some optional libraries. Configure will find them if they are installed in one of the directories listed as being used for library searches. Pre- built packages for most of these are available from the Cygwin installer. .... chris On Apr 2, 2008, at 12:52 PM, Pan Zheng wrote: > Hi, > > I am installing bioperl-1.5.2_102 under cygwin on my Windows XP and > having some errors during the process. > > When I was running "perl Build test", one major error is the error > about DB_File. I tried to install DB_File from cpan and rpm without > any luck. > > ++++++++++++++++++++++++ > CPAN: File::Temp loaded ok (v0.16) > CPAN: YAML loaded ok (v0.62) > CPAN.pm: Going to build P/PM/PMQS/DB_File-1.817.tar.gz > Parsing config.in... > Looks Good. > Checking if your kit is complete... 
> Looks good > Note (probably harmless): No library found for -ldb > Writing Makefile for DB_File > cp DB_File.pm blib/lib/DB_File.pm > AutoSplitting blib/lib/DB_File.pm (blib/lib/auto/DB_File) > gcc -c -I/usr/local/BerkeleyDB/include -DPERL_USE_SAFE_PUTENV -fno- > strict-alias > ing -pipe -Wdeclaration-after-statement -DUSEIMPORTLIB -O3 - > DVERSION=\"1.817\" > -DXS_VERSION=\"1.817\" "-I/usr/lib/perl5/5.8/cygwin/CORE" - > D_NOT_CORE -DmDB_ > Prefix_t=size_t -DmDB_Hash_t=u_int32_t version.c > version.c:30:16: db.h: No such file or directory > make: *** [version.o] Error 1 > PMQS/DB_File-1.817.tar.gz > /usr/bin/make -- NOT OK > Running make test > Can't test without successful make > Running make install > Make had returned bad status, install seems impossible > Failed during this command: > PMQS/DB_File-1.817.tar.gz : make NO > +++++++++++++++++++++++++++++++++++++++++++++++ > > > I can't remember I had this kind error while installing earlier > version. > > Would you please help me on DB_File installation ? > > Thanks. > > Pan > > > --------------------------------- > You rock. That's why Blockbuster's offering you one month of > Blockbuster Total Access, No Cost. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Apr 4 17:25:41 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 4 Apr 2008 16:25:41 -0500 Subject: [Bioperl-l] module for pubchem queries In-Reply-To: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> References: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> Message-ID: <15B44EC6-3660-4925-BA7A-6763D51E6837@uiuc.edu> Do you need something to access eutils via BioPerl, or are you looking for a specific set of classes? 
I wrote an interface to eutils (Bio::DB::EUtilities), you could do something like this:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::EUtilities;

# esearch against the PubChem Substance database, returning up to 1000 IDs
my $eutil = Bio::DB::EUtilities->new(-eutil  => 'esearch',
                                     -term   => 'dihydroorotate',
                                     -db     => 'pcsubstance',
                                     -retmax => 1000);

# print the matching IDs as a comma-separated list
print join(',', $eutil->get_ids)."\n";

chris

On Apr 2, 2008, at 7:24 AM, Robert Citek wrote:

> Hello all,
>
> I have a list of chemical compounds that have some kind of interaction
> with proteins or genes. The current list contains names or SMILES and
> I would like to get the CID number for those compounds. Currently,
> I'm using perl to query the NCBI's eutils[1], which works great. But
> I was just curious to know if there was a bioperl module to do
> something similar. A quick google didn't turn up anything, so I
> thought I'd ask.
>
> [1] http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
>
> Regards,
> - Robert
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From ekeen at mail.tongji.edu.cn Mon Apr 7 02:57:04 2008
From: ekeen at mail.tongji.edu.cn (Jinyan Huang)
Date: Mon, 7 Apr 2008 14:57:04 +0800
Subject: [Bioperl-l] How to analysis the relationship of my interesting KEGG pathways?
Message-ID:

In my research, I got 25 interesting pathways. I want to know the regulatory relationships among these pathways. It would be helpful if there were some software to connect these KEGG pathways.

Thank you very much in advance.
From miguel.pignatelli at uv.es Mon Apr 7 06:12:58 2008
From: miguel.pignatelli at uv.es (Miguel Pignatelli)
Date: Mon, 07 Apr 2008 12:12:58 +0200
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com>
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com>
Message-ID: <47F9F3AA.2090003@uv.es>

Hi all,

Is there any way to obtain the date of creation of individual GenBank entries? I don't mean the "last revision" date that can be found in the first line of a GenBank file.

I can access this creation date by looking at the "revision history" of any GenBank entry (for example, see http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105), but I need a systematic (and local=fast) way to access this information.

Any help would be very appreciated,
Thank you very much in advance,

M;

From Bank.Beszteri at awi.de Mon Apr 7 07:46:43 2008
From: Bank.Beszteri at awi.de (Bánk Beszteri)
Date: Mon, 07 Apr 2008 13:46:43 +0200
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To:
References: <47F22B35.1030502@awi.de>
Message-ID: <47FA09A3.2070004@awi.de>

Hi Hilmar,

it was important to understand that the inconsistency in taxon names is apparently only between the Swissprot entries with "non-standard" names and the contents of the taxonomy tables and that it is best to use a pre-loaded taxonomy, thanks for that! We have now updated to bioperl-live (and bp-db-live, too) and load_seqdatabase.pl seems to have loaded everything OK in ~26 hours (with many of the "The supplied lineage does not start near..." warnings, but no other problems). Our next test is to try to load trembl (will try to do this in parallel in multiple chunks), hope it will work just as nicely!

Thanks for your tips & insights!

Bank

Hilmar Lapp wrote:

>
> On Apr 1, 2008, at 8:31 AM, Bánk Beszteri wrote:
>
>> [...]
So next we started to test BioSQL, by trying to load just >> Swissprot in a MySQL DB first, like: >> >> load_seqdatabase.pl --host mysql.awi.de --dbname biosql2 --dbuser >> xyz --dbpass abc --driver mysql --namespace uniprot_sprot --format >> swiss uniprot_sprot.dat >> >> Here we get an error message >> >> ########################################### >> >> Loading /biodb/spinkern/uniprot_sprot.dat ... >> Could not store Q6DAH5: >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: The supplied lineage does not start near 'Erwinia carotovora >> subsp. atroseptica' (I was supplied 'Erwinia carotovora subsp. | >> Pectobacterium | Enterobacteriaceae | Enterobacteriales | >> Gammaproteobacteria | Proteobacteria | Bacteria') >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /biodb/spinkern/bioperl-1.5/ >> bioperl-1.5.2_102/Bio/Root/Root.pm:359 >> STACK: Bio::Species::classification /biodb/spinkern/bioperl-1.5/ >> bioperl-1.5.2_102/Bio/Species.pm:174 >> STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /biodb/ >> spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm: 552 >> STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /biodb/ >> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SpeciesAdaptor.pm:281 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object / >> biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:1305 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key >> /biodb/ spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:973 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / >> biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:852 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/ >> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:182 >> STACK: Bio::DB::Persistent::PersistentObject::create /biodb/ >> 
spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm: 244 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/ >> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:169 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /biodb/ >> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:251 >> STACK: Bio::DB::Persistent::PersistentObject::store /biodb/spinkern/ >> bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271 >> STACK: load_seqdatabase.pl:622 >> ----------------------------------------------------------- >> >> at load_seqdatabase.pl line 635 >> >> ############################################ >> >> or similar, depending on whether we use a pre-loaded ncbi taxonomy >> or not > > > I recommend to always use a pre-loaded NCBI taxonomy unless you know > there are only a few organisms that are straightforward (for the > parser, that is). > >> , and which Swissprot release we are trying to load. It often seems >> to come from sg. like here, subsp. or other special addition to the >> species line; but alternative genus names and other curious things >> also to appear. It looks like Species.pm tries to validate the >> species name against the lineage info already there in the BioSQL >> DB, and in several cases, it finds inconsistencies. > > > It actually happens upon a successful lookup when the species object > is populated from the database. > >> [...] >> The only workaround we have found so far was to comment out line 174 >> in Species.pm: >> >> $self->throw("The supplied lineage does not start near '$name' (I >> was supplied '".join(" | ", @vals)."')"); > > > That should be OK if you work with a pre-loaded taxonomy. It's sort > of a sanity check that should catch a parser having messed up a > species. 
> If you use a pre-loaded NCBI taxonomy the results of the
> species parsing don't matter in all details so long as the NCBI
> taxonID is parsed out correctly, and then found in the database.
>
> Note that this is actually a warn() in the main trunk version of
> BioPerl, so you might want to upgrade to that (or change throw() to
> warn() in your version). You still get the records flagged with that,
> but it isn't an exception.
>
>>
>> After doing so, load_seqdatabase.pl runs for several hours (until it
>> eventually crashes; I haven't found out yet why), but proceeds really
>> slowly.
>
>
> It should certainly *not* crash. Note also that you can supply --safe
> on the command line, in which case the script will continue with the
> next record if one fails to load for whatever reason.
>
> You will want to adjust the width constraint of dbxref.accession, for
> example to 128 chars. This will also be fixed for BioSQL 1.0.1.
> See http://bugzilla.open-bio.org/show_bug.cgi?id=2474
>
>
>> I also found some info on this for Pg and Oracle in the mailing
>> list, but has anyone some approximate numbers for MySQL, how long
>> should a first Swissprot load take?
>
>
> Possibly around 20 hours according to Erik Rijkers:
> See http://lists.open-bio.org/pipermail/bioperl-l/2008-March/027427.html
>
> You can use the --logchunks N option to have it print out performance
> statistics every N records.
>
> Hope this helps,
>
> -hilmar

From cjfields at uiuc.edu Mon Apr 7 08:32:45 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 7 Apr 2008 07:32:45 -0500
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <47FA09A3.2070004@awi.de>
References: <47F22B35.1030502@awi.de> <47FA09A3.2070004@awi.de>
Message-ID:

The warnings are something that we still need to resolve, but the only fix I can think of likely breaks backward compatibility with older bioperl-db installations (i.e.
storing the given scientific name instead of the binomial name, which is used as a fallback when no taxid is found). There is a full explanation here:

http://bugzilla.open-bio.org/show_bug.cgi?id=2092

Anyway, I think it needs further testing when someone, likely Hilmar or me, has time.

chris

On Apr 7, 2008, at 6:46 AM, Bánk Beszteri wrote:

> Hi Hilmar,
>
> it was important to understand that the inconsistency in taxon names
> is apparently only between the Swissprot entries with "non-standard"
> names and the contents of the taxonomy tables and that it is best to
> use a pre-loaded taxonomy, thanks for that! We have now updated to
> bioperl-live (and bp-db-live, too) and load_seqdatabase.pl seems to
> have loaded everything OK in ~26 hours (with many of the "The
> supplied lineage does not start near..." warnings, but no other
> problems). Our next test is to try to load trembl (will try to do
> this in parallel in multiple chunks), hope it will work just as
> nicely!
>
> Thanks for your tips & insights!
>
> Bank
>
> Hilmar Lapp wrote:
>
>>
>> On Apr 1, 2008, at 8:31 AM, Bánk Beszteri wrote:
>>
>>> [...] So next we started to test BioSQL, by trying to load just
>>> Swissprot in a MySQL DB first, like:
>>>
>>> load_seqdatabase.pl --host mysql.awi.de --dbname biosql2 --dbuser
>>> xyz --dbpass abc --driver mysql --namespace uniprot_sprot
>>> --format swiss uniprot_sprot.dat
>>>
>>> Here we get an error message
>>>
>>> ###########################################
>>>
>>> Loading /biodb/spinkern/uniprot_sprot.dat ...
>>> Could not store Q6DAH5:
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: The supplied lineage does not start near 'Erwinia carotovora
>>> subsp. atroseptica' (I was supplied 'Erwinia carotovora subsp.
| >>> Pectobacterium | Enterobacteriaceae | Enterobacteriales | >>> Gammaproteobacteria | Proteobacteria | Bacteria') >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw /biodb/spinkern/bioperl-1.5/ >>> bioperl-1.5.2_102/Bio/Root/Root.pm:359 >>> STACK: Bio::Species::classification /biodb/spinkern/bioperl-1.5/ >>> bioperl-1.5.2_102/Bio/Species.pm:174 >>> STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /biodb/ >>> spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/ >>> PersistentObject.pm: 552 >>> STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /biodb/ >>> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SpeciesAdaptor.pm:281 >>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object / >>> biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >>> BasePersistenceAdaptor.pm:1305 >>> STACK: >>> Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / >>> biodb/ spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >>> BasePersistenceAdaptor.pm:973 >>> STACK: >>> Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / >>> biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >>> BasePersistenceAdaptor.pm:852 >>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/ >>> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >>> BasePersistenceAdaptor.pm:182 >>> STACK: Bio::DB::Persistent::PersistentObject::create /biodb/ >>> spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/ >>> PersistentObject.pm: 244 >>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/ >>> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >>> BasePersistenceAdaptor.pm:169 >>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /biodb/ >>> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >>> BasePersistenceAdaptor.pm:251 >>> STACK: Bio::DB::Persistent::PersistentObject::store /biodb/ >>> spinkern/ bioperl-db-1.5.2_100/Bio/DB/Persistent/ >>> PersistentObject.pm:271 >>> STACK: load_seqdatabase.pl:622 >>> ----------------------------------------------------------- >>> >>> at 
>>> load_seqdatabase.pl line 635
>>>
>>> ############################################
>>>
>>> or similar, depending on whether we use a pre-loaded ncbi
>>> taxonomy or not
>>
>>
>> I recommend to always use a pre-loaded NCBI taxonomy unless you
>> know there are only a few organisms that are straightforward (for
>> the parser, that is).
>>
>>> , and which Swissprot release we are trying to load. It often
>>> seems to come from sg. like here, subsp. or other special
>>> addition to the species line; but alternative genus names and
>>> other curious things also to appear. It looks like Species.pm
>>> tries to validate the species name against the lineage info
>>> already there in the BioSQL DB, and in several cases, it finds
>>> inconsistencies.
>>
>>
>> It actually happens upon a successful lookup when the species
>> object is populated from the database.
>>
>>> [...]
>>> The only workaround we have found so far was to comment out line
>>> 174 in Species.pm:
>>>
>>> $self->throw("The supplied lineage does not start near '$name' (I
>>> was supplied '".join(" | ", @vals)."')");
>>
>>
>> That should be OK if you work with a pre-loaded taxonomy. It's
>> sort of a sanity check that should catch a parser having messed up
>> a species. If you use a pre-loaded NCBI taxonomy the results of
>> the species parsing don't matter in all details so long as the
>> NCBI taxonID is parsed out correctly, and then found in the
>> database.
>>
>> Note that this is actually a warn() in the main trunk version of
>> BioPerl, so you might want to upgrade to that (or change throw()
>> to warn() in your version). You still get the records flagged with
>> that, but it isn't an exception.
>>
>>>
>>> After doing so, load_seqdatabase.pl runs for several hours (until
>>> it eventually crashes; I haven't found out yet why), but proceeds
>>> really slowly.
>>
>>
>> It should certainly *not* crash.
>> Note also that you can supply
>> --safe on the command line, in which case the script will continue
>> with the next record if one fails to load for whatever reason.
>>
>> You will want to adjust the width constraint of dbxref.accession,
>> for example to 128 chars. This will also be fixed for BioSQL 1.0.1.
>> See http://bugzilla.open-bio.org/show_bug.cgi?id=2474
>>
>>
>>> I also found some info on this for Pg and Oracle in the mailing
>>> list, but has anyone some approximate numbers for MySQL, how long
>>> should a first Swissprot load take?
>>
>>
>> Possibly around 20 hours according to Erik Rijkers:
>> See http://lists.open-bio.org/pipermail/bioperl-l/2008-March/027427.html
>>
>> You can use the --logchunks N option to have it print out
>> performance statistics every N records.
>>
>> Hope this helps,
>>
>> -hilmar
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bix at sendu.me.uk Mon Apr 7 08:34:00 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 07 Apr 2008 13:34:00 +0100
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <47FA09A3.2070004@awi.de>
References: <47F22B35.1030502@awi.de> <47FA09A3.2070004@awi.de>
Message-ID: <47FA14B8.7000500@sendu.me.uk>

Bánk Beszteri wrote:
> Hi Hilmar,
>
> it was important to understand that the inconsistency in taxon names is
> apparently only between the Swissprot entries with "non-standard" names
> and the contents of the taxonomy tables and that it is best to use a
> pre-loaded taxonomy, thanks for that! We have now updated to
> bioperl-live (and bp-db-live, too) and load_seqdatabase.pl seems to have
> loaded everything OK in ~26 hours (with many of the "The supplied
> lineage does not start near..." warnings, but no other problems).

Can you provide some examples of these warnings (of the taxa that cause them)? If there's anything consistent about them perhaps Bio::Species can be improved to accommodate them properly (instead of just issuing the warning and getting the classification wrong).

From heikki at sanbi.ac.za Mon Apr 7 08:48:34 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Mon, 7 Apr 2008 14:48:34 +0200
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To: <47F9F3AA.2090003@uv.es>
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es>
Message-ID: <200804071448.34769.heikki@sanbi.ac.za>

Miguel,

You probably know this but:

- Your entry example below is a GenPept entry, not a GenBank entry
- The NCBI sequence format "genbank" has only the last modified date.
  I do not know about other formats (ASN.1, ...)
- NCBI Entrez is a great tool but it obscures the source database.
- If you really are working on real GenBank entries, you can use the accession
  number to find the corresponding EMBL (and Swiss-Prot) flat file formats that
  have both creation and last modified dates.

Post to the list if you have trouble getting the dates from EMBL/Swiss-Prot formats using bioperl.

Yours,

  -Heikki

On Monday 07 April 2008 12:12:58 Miguel Pignatelli wrote:
> Hi all,
>
> Is there any way to obtain the date of creation of individual GenBank
> entries? I don't mean the "last revision" date that can be found in the
> first line of a GenBank file.
>
> I can access this creation date by looking at the "revision history" of
> any GenBank entry (for example, see
> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105),
> but I need a systematic (and local=fast) way to access this information.
> > Any help would be very appreciated, > Thank you very much in advance, > > M; > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From granjeau at tagc.univ-mrs.fr Mon Apr 7 09:30:10 2008 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/ICIM) Date: Mon, 07 Apr 2008 15:30:10 +0200 Subject: [Bioperl-l] help installing bioperl with cygwin In-Reply-To: References: <161313331084931@webmail.iastate.edu> Message-ID: <47FA21E2.3010602@tagc.univ-mrs.fr> Hi, I'm using BioPerl under Cygwin, because Cygwin allows one to work in a Unix-like environment in a command line point of view. So, I use the CVS version which runs out of the box http://www.bioperl.org/wiki/Using_CVS which has been replaced by SVN at the beginning of the year http://www.bioperl.org/wiki/Using_Subversion So if you really want to work under Cygwin, you can try this quick and dirty way, but you still have to become experienced because BioPerl is not supported under Cygwin. You may try Strawberry, but in my experience in installing wxPerl, wxPerl fails on both flavours of Perl. ActiveState's Perl is still the easiest way to install many packages. Regards, Samuel Chris Fields wrote: > It's best if you use ActiveState's Perl installation (it's the only > one we really support at this moment, unless someone wants to give > StrawberryPerl a run). 
> See:
>
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
>
> chris
>
> On Apr 3, 2008, at 1:13 PM, slduncan at iastate.edu wrote:
>
>> I am trying to use cpan to install bioperl and I had an error message
>> saying:
>> c:\Documents not recognized as and external or internal....
>> Any ideas here. Also, I am new to the computer world so please be
>> kind. :)
>>
>> Stacy Duncan
>> Iowa State University
>> Bioinformatics and Computational Biology
>> 1802 University Blvd.
>> VMRI Building 6
>> Ames, IA 50011-1240
>> office phone: (515) 294-8385
>> office fax: (515) 294-1401
>> home phone: (336) 965-5622
>> e-mail: slduncan at iastate.edu
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Samuel GRANJEAUD granjeau at tagc.univ-mrs.fr
INSERM - ICIM - TAGC Tel: +33 (0)491 82 87 24
http://tagc.univ-mrs.fr Fax: +33 (0)491 82 87 01
http://icim.marseille.inserm.fr/proteomique

From er at xs4all.nl Mon Apr 7 10:36:57 2008
From: er at xs4all.nl (Erik)
Date: Mon, 7 Apr 2008 16:36:57 +0200 (CEST)
Subject: [Bioperl-l] Indexing large databases / BioSQL
Message-ID: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl>

On Mon, April 7, 2008 14:34, Sendu Bala wrote:
> Bánk Beszteri wrote:
>> Hi Hilmar,
>>
>> it was important to understand that the inconsistency in taxon names is
>> apparently only between the Swissprot entries with "non-standard" names
>> and the contents of the taxonomy tables and that it is best to use a
>> pre-loaded taxonomy, thanks for that!
We have now updated to >> bioperl-live (and bp-db-live, too) and load_seqdatabase.pl seems to have >> loaded everything OK in ~26 hours (with many of the "The supplied >> lineage does not start near..." warnings, but no other problems). > > Can you provide some examples of these warnings (of the taxons that > cause them)? If there's anything consistent about them perhaps > Bio::Species can be improved to accommodate them properly (instead of > just issuing the warning and getting the classification wrong). > I did this a little while ago and saved the output (UniProtKB/Swiss-Prot Release 55.1 of 18-Mar-2008, I think). All warnings (and a few errors) for swissprot are here: http://bugzilla.open-bio.org/show_bug.cgi?id=2474 as an attached file I suppose the OP will have encountered similar output - I don't think there is much RDBMS-type-dependency involved. regards, Erik Rijkers From cjfields at uiuc.edu Mon Apr 7 11:46:01 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 7 Apr 2008 10:46:01 -0500 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <200804071448.34769.heikki@sanbi.ac.za> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <200804071448.34769.heikki@sanbi.ac.za> Message-ID: <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> Strangely enough, if you use NCBI's esummary you can get both dates. 
Via Bio::DB::EUtilities in bioperl-live, if you dump out DocSum data (using a debugging method I added a while back):

---------------------------------------

use Bio::DB::EUtilities;

# for multiple IDs use an array ref; also only use GI's (not accessions)
my $factory = Bio::DB::EUtilities->new(
                                       -eutil => 'esummary',
                                       -db    => 'protein',
                                       -id    => 1621261);

$factory->print_DocSums;

---------------------------------------

One gets the following tag/value pairs:

UID: 1621261
Caption :CAB02640
Title :PROBABLE PYRIMIDINE OPERON REGULATORY PROTEIN PYRR [Mycobacterium tuberculosis H37Rv]
Extra :gi|1621261|emb|CAB02640.1|[1621261]
Gi :1621261
CreateDate :2003/11/21
UpdateDate :2006/11/14
Flags :
TaxId :83332
Length :193
Status :live
ReplacedBy :
Comment :

I'll add in a method to grab the data element by tag (in this case, grab the creation date by asking for the 'CreateDate' key). Might come in handy for scripts.

chris

On Apr 7, 2008, at 7:48 AM, Heikki Lehvaslaiho wrote:

> Miguel,
>
> You probably know this but:
>
> - Your entry example below is a GenPept entry, not a GenBank entry
> - The NCBI sequence format "genbank" has only the last modified date.
>   I do not know about other formats (ASN.1, ...)
> - NCBI Entrez is a great tool but it obscures the source database.
> - If you really are working on real GenBank entries, you can use the
>   accession number to find the corresponding EMBL (and Swiss-Prot)
>   flat file formats that have both creation and last modified dates.
>
> Post to the list if you have trouble getting the dates from
> EMBL/Swiss-Prot formats using bioperl.
>
> Yours,
>
> -Heikki
>
> On Monday 07 April 2008 12:12:58 Miguel Pignatelli wrote:
>> Hi all,
>>
>> Is there any way to obtain the date of creation of individual GenBank
>> entries? I don't mean the "last revision" date that can be found in the
>> first line of a GenBank file.
>> >> I can access this creation date by looking at the "revision >> history" of >> any GenBank entry (for example, see >> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi? >> val=74311105), >> but I need a systematic (and local=fast) way to access this >> information. >> >> Any help would be very appreciated, >> Thank you very much in advance, >> >> M; >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From miguel.pignatelli at uv.es Mon Apr 7 12:24:50 2008 From: miguel.pignatelli at uv.es (Miguel Pignatelli) Date: Mon, 07 Apr 2008 18:24:50 +0200 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <200804071448.34769.heikki@sanbi.ac.za> <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> Message-ID: <47FA4AD2.5030206@uv.es> I've noticed that the ASN.1 version of those records has a "creation-date" tag. 
But this is somewhat strange: the creation date you obtained and the one
obtained via the ASN.1 format are both 2003/11/21, but the revision history
of the record:

http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=CAB02640

reports a creation date of "Oct 19 1996 12:28 AM".

I don't know how to get this, because the EMBL version of this gene:

http://www.ebi.ac.uk/cgi-bin/dbfetch?db=emblcds&id=CAB02640&style=raw

doesn't have DT fields at all.

M;

From cjfields at uiuc.edu Mon Apr 7 13:48:45 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 7 Apr 2008 12:48:45 -0500
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To: <47FA4AD2.5030206@uv.es>
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <200804071448.34769.heikki@sanbi.ac.za> <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> <47FA4AD2.5030206@uv.es>
Message-ID:

Note in the example I gave that, during the revision history, the
DBSOURCE changed at the point of the creation date (the original nuc.
record was an M. tuberculosis contig sequence, which later changed to an
updated full M. tuberculosis genome record at the time of the 'create
date').

Couldn't find anything specific in the GenBank docs on this, but it
appears (at least for a protein record) that the creation date reflects
the date on which the sequence was either originally deposited or
originally derived from the nucleotide source record present in the
record. In other words, it may not reflect the original date of
deposition (which could have come from a different record, as in this
case).

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From Bank.Beszteri at awi.de Tue Apr 8 03:35:43 2008
From: Bank.Beszteri at awi.de (=?ISO-8859-1?Q?B=E1nk_Beszteri?=)
Date: Tue, 08 Apr 2008 09:35:43 +0200
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl>
References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl>
Message-ID: <47FB204F.90405@awi.de>

>>Can you provide some examples of these warnings (of the taxons that
>>cause them)?
If there's anything consistent about them perhaps
>>Bio::Species can be improved to accommodate them properly (instead of
>>just issuing the warning and getting the classification wrong).
>
>All warnings (and a few errors) for swissprot are here:
>
>  http://bugzilla.open-bio.org/show_bug.cgi?id=2474
>
>as an attached file
>
>I suppose the OP will have encountered similar output - I don't think there is
>much RDBMS-type-dependency involved.

Hi Erik & Sendu,

yes, the same kind of thing, probably no DBMS-type dependency; in case
it could be useful, I uploaded my output as a second attachment to the
bugzilla report cited above.

Bank

From heikki at sanbi.ac.za Tue Apr 8 04:32:12 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Tue, 8 Apr 2008 10:32:12 +0200
Subject: [Bioperl-l] Blast database sequence retrieval perl script
In-Reply-To: <6BEABCD5CA640A44A848448A42A03B73079E48C9@ilrikeadx1.ILRI.CGIARAD.ORG>
References: <6BEABCD5CA640A44A848448A42A03B73079E48C9@ilrikeadx1.ILRI.CGIARAD.ORG>
Message-ID: <200804081032.12312.heikki@sanbi.ac.za>

Dear Nelson,

I am cc:ing the bioperl mailing list, where all queries of this kind
should go. More people can help you that way.

Since you have your own local data set, you need to create an index that
catalogues your sequences for easy retrieval.

You need to install bioperl-live first. See for example:
http://www.bioperl.org/wiki/Using_Subversion

Then you can follow this HOWTO:
http://www.bioperl.org/wiki/HOWTO:Flat_databases

The other HOWTOs will help you deal with the BioPerl sequence objects
that are retrieved: http://www.bioperl.org/wiki/HOWTOs

Yours,

-Heikki

On Monday 07 April 2008 14:50:23 Ndegwa, Nelson (IITA-Nairobi) wrote:
> Dear Prof. Heikki,
>
> Hi. We met at the Pathogen Bioinformatics Conference held in Nairobi
> Kenya in May 2007 at ICIPE. I recall you are a developer of Bioperl and
> Perl. I have managed to install a local Blast, having just cowpea Contig
> sequences, about 50,000 in total. This runs fine, as I can perform
> various queries and get results. However, any good match/hit on the
> local Blast database is hard to retrieve and the only option seems to go
> back to that database and search manually for the top hit sequence - an
> exceedingly manual task. Might you perhaps be having a Perl script I
> could adapt to my database to help with this task, such that the hits
> have a hyperlink which can be used to retrieve that specific entry? I
> have limited knowledge of Perl. Thank you.
>
> With Kind Regards,
>
> Nelson.

--
     ______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096 FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From David.Messina at sbc.su.se Tue Apr 8 07:29:12 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 8 Apr 2008 13:29:12 +0200
Subject: [Bioperl-l] How to analysis the relationship of my interesting KEGG pathways?
In-Reply-To: <628aabb70804080053g1fd9120ex9d5fd12f65f216f9@mail.gmail.com>
References: <628aabb70804080053g1fd9120ex9d5fd12f65f216f9@mail.gmail.com>
Message-ID: <628aabb70804080429k2aa17a6eu12197709d4cc1af0@mail.gmail.com>

Hi Jinyan,

You asked a similar question last week and received a couple of
suggestions -- did you take a look at those?

I'm not an expert on this topic, but I believe that since regulatory
information is much harder to obtain experimentally and therefore much
less well known, there isn't a lot of it in pathway databases like KEGG.
You may have to look through the literature and start trying to put
together possible regulatory links on your own.
Dave

From hrh at sanger.ac.uk Tue Apr 8 08:48:32 2008
From: hrh at sanger.ac.uk (Hans Rudolf Hotz)
Date: Tue, 8 Apr 2008 13:48:32 +0100 (BST)
Subject: [Bioperl-l] Blast database sequence retrieval perl script
In-Reply-To: <200804081032.12312.heikki@sanbi.ac.za>
References: <6BEABCD5CA640A44A848448A42A03B73079E48C9@ilrikeadx1.ILRI.CGIARAD.ORG> <200804081032.12312.heikki@sanbi.ac.za>
Message-ID:

Nelson

or simply use the BLAST indices for the sequence retrieval as well. All
you need to do is add the "-o" option to the 'formatdb' command for the
BLAST index creation (this will create some extra files). Then you can
use 'fastacmd' (which is also part of the NCBI BLAST package) to
retrieve the sequences.

Hans

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From robert.citek at gmail.com Tue Apr 8 10:09:27 2008
From: robert.citek at gmail.com (Robert Citek)
Date: Tue, 8 Apr 2008 09:09:27 -0500
Subject: [Bioperl-l] module for pubchem queries
In-Reply-To: <15B44EC6-3660-4925-BA7A-6763D51E6837@uiuc.edu>
References: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> <15B44EC6-3660-4925-BA7A-6763D51E6837@uiuc.edu>
Message-ID: <4145b6790804080709l20f1e56erf4b7af04b0a52870@mail.gmail.com>

Wrapping bioperl around eutils will work just fine. Thanks for the pointer.
http://search.cpan.org/~sendu/bioperl-1.5.2_102/Bio/DB/EUtilities.pm

Regards,
- Robert

On Fri, Apr 4, 2008 at 4:25 PM, Chris Fields wrote:
> Do you need something to access eutils via BioPerl, or are you looking for a
> specific set of classes? I wrote an interface to eutils
> (Bio::DB::EUtilities), you could do something like this:
>
> #!/usr/bin/perl -w
>
> use strict;
> use warnings;
> use Bio::DB::EUtilities;
>
> my $eutil = Bio::DB::EUtilities->new(-eutil  => 'esearch',
>                                      -term   => 'dihydroorotate',
>                                      -db     => 'pcsubstance',
>                                      -retmax => 1000);
>
> print join(',',$eutil->get_ids)."\n";
>
> chris

From cjfields at uiuc.edu Tue Apr 8 11:10:26 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 8 Apr 2008 10:10:26 -0500
Subject: [Bioperl-l] module for pubchem queries
In-Reply-To: <4145b6790804080709l20f1e56erf4b7af04b0a52870@mail.gmail.com>
References: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> <15B44EC6-3660-4925-BA7A-6763D51E6837@uiuc.edu> <4145b6790804080709l20f1e56erf4b7af04b0a52870@mail.gmail.com>
Message-ID: <32D210FC-575E-4D95-95DA-FC6F5BE1FC24@uiuc.edu>

Just to note, the API has changed significantly from the interface in
the 1.5.2 release. The up-to-date (supported) interface is in
subversion; there are some example recipes here:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

I'm working on a full HOWTO, just haven't had time to get it up on the
wiki yet.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cuiw at ncbi.nlm.nih.gov Tue Apr 8 16:41:58 2008
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Tue, 8 Apr 2008 16:41:58 -0400
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To: <47F9F3AA.2090003@uv.es>
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es>
Message-ID: <6F230E9769AA8D4EB4BC401DF133EDB7180BE0@NIHCESMLBX15.nih.gov>

Hi, Miguel:

id1_fetch can do it. Detailed instructions can be found at:

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=toolkit.section.ch_demo.id1_fetch.html

Here is an example:

>id1_fetch -lt revisions -flat '12:74311105' -fmt fasta
GI         Loaded      DB    Retrieval No.
--         ------      --    -------------
74311105   12/07/2007  NCBI  19766263
74311105   01/23/2007  NCBI  16325656
74311105   03/30/2006  NCBI  13131204
74311105   03/03/2006  NCBI  12915541
74311105   03/02/2006  NCBI  12885275
74311105   12/03/2005  NCBI  12259793
74311105   09/09/2005  NCBI  11257262
74311105   09/09/2005  NCBI  11242667

Wenwu Cui PhD
NCBI/NLM/NIH

From miguel.pignatelli at uv.es Wed Apr 9 07:32:39 2008
From: miguel.pignatelli at uv.es (Miguel Pignatelli)
Date: Wed, 09 Apr 2008 13:32:39 +0200
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To: <6F230E9769AA8D4EB4BC401DF133EDB7180BE0@NIHCESMLBX15.nih.gov>
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <6F230E9769AA8D4EB4BC401DF133EDB7180BE0@NIHCESMLBX15.nih.gov>
Message-ID: <47FCA957.5040409@uv.es>

Wow, impressive, thanks Wenwu for the information, I have never used
this tool before. The problem is that I need to know all the revision
history (or at least the creation date) for *all* the GIs present in nr
(well, or at least a significant portion of it) and this tool queries
via web.

The existence of this tool confirms to me that this information is
available somewhere. Is it possible to download the data that contains
this information?

Thanks again,

M;

From cuiw at ncbi.nlm.nih.gov Wed Apr 9 09:25:16 2008
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Wed, 9 Apr 2008 09:25:16 -0400
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To: <47FCA957.5040409@uv.es>
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <6F230E9769AA8D4EB4BC401DF133EDB7180BE0@NIHCESMLBX15.nih.gov> <47FCA957.5040409@uv.es>
Message-ID: <6F230E9769AA8D4EB4BC401DF133EDB7180BE1@NIHCESMLBX15.nih.gov>

Hi, Miguel,

I do not know whether the data file is publicly available.
However, you can perform a 'real time' query via id1_fetch:

#### step 1: generate GI file #####
id1_fetch -query 'YOUR-GENBANK-QUERY-STRING' -lt none -db Nucleotide -out qfile

#### step 2: retrieve revisions for GIs stored in qfile #####
id1_fetch -lt revisions -qf qfile -fmt fasta -db Nucleotide

Good luck!

Wenwu Cui

From CALLEY_JOHN_N at LILLY.COM Wed Apr 9 09:45:23 2008
From: CALLEY_JOHN_N at LILLY.COM (John N Calley)
Date: Wed, 9 Apr 2008 09:45:23 -0400
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To: <47FCA957.5040409@uv.es>
Message-ID:

You might want to keep in mind that the creation date is not always
reliable. I am aware of one example where the recorded creation date
precedes the sequencing date by several months (as determined by the
trace file date). NCBI was not able to explain exactly what happened but
(as I recall) hypothesized that some dates had been scrambled in a
database rebuild.
If there is interest I could probably pull up more details.

John Calley

From frederic.romagne at gmail.com Wed Apr 9 16:45:50 2008
From: frederic.romagne at gmail.com (=?ISO-8859-1?Q?Fr=E9d=E9ric_Romagn=E9?=)
Date: Wed, 09 Apr 2008 15:45:50 -0500
Subject: [Bioperl-l] question about clustalw module.
Message-ID: <1207773950.483.13.camel@kiss-laptop>

Hello,

I have a problem when using Bio::Tools::Run::Alignment::Clustalw:

I give it an array ref (the array contains some FASTA sequences) and all
the right parameters, and I write the result via Bio::SeqIO.

The problem is that my result file only contains the accession number in
the header. An example:

the initial stream is:

>NM_052854 Homo sapiens cAMP responsive element binding protein 3-like 1 (CREB3L1), mRNA.
AGAAGACGTGCGGAGGGAGACGCAGAGACAGAGGAGAGGCCGGCAGCCACCCAGTCTCGG
GGGAGCACTTAGCTCCCCCGCCCCGGCTCCCACCCTGTCCGGGGGGCTCCTGAAGCCCTC
AGCCCCAACCCCGGGCTCCCCATGGAAGCCAGCTGTGCCCCAGGAGGAGCAGGAGGAGGT
GGAGTCGGCTGAATGCCCACGGTGCGCCCGGGGCCCCTGAGCCCATCCCGCTCCTAGCCG
CTGCCCTAAGGCCCCCGCGCGCCCCGCGCCCCCCACCCGGGGCCGCGCCGCCTCCGTCCG
CCCCTCCCCCGGGGCTTCGCCCCGGACCTGCCCCCCGCCCGTTTGCCAGCGCTCAGGCAG
GAGCTCTGGACTGGGCGCGCCGCCGCCCTGGAGTGAGGGAAGCCCAGTGGAAGGGGGTCC
CGGGAGCCGGCTGCGATGGACGCCGTCTTGGAACCCTTCCCGGCCGACAGGCTGTTCCCC
GGATCCAGCTTCCTGGACTTGGGGGATCTGAACGAGTCGGACTTCCTCAACAATGCGCAC
...

the result file is:

>NM_052854
---------------------------------------AGAAGACGTGCGGAGGGAGAC
GCAGAGACAGAGGAGAGGCCGGCAGCCACCCAGTCTCGGGGGAGCACTTAGCTCCCCCGC
CCCGGCTCCCACCCTGTCCGGGGGGCTCCTGAAGCCCTCAGCCCCAACCCCGGGCTCCCC
ATGGAAGCCAGCTGTGCCCCAGGAGGAGCAGGAGGAGGTGGAGTCGGCTGAATGCCCACG
GTGCGCCCGGGGCCCCTGAGCCCATCCCGCTCCTAGCCGCTGCCCTAAGGCCCCCGCGCG
CCCCGCGCCCCCCACCCGGGGCCGCGCCGCCTCCGTCCGCCCCTCCCCCGGGGCTTCGCC
CCGGACCTGCCCCCCGCCCGTTTGCCAGCGCTCAGGCAGGAGCTCTGGACTGGGCGCGCC
GCCGCCCTGGAGTGAGGGAAGCCCAGTGGAAGGGGGTCCCGGGAGCCGGCTGCGATGGAC
...

So I lost the other information provided by the header.

Is there any option to keep this information?

Here is part of my code with my options:

my $seq_ref = \@seq;
my @params  = ('ktuple' => 2, 'matrix' => 'BLOSUM', 'quiet' => 1,
               'output' => 'FASTA');
my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
my $aln     = $factory->align($seq_ref);

Thank you.

From jason at bioperl.org Wed Apr 9 16:55:13 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 9 Apr 2008 13:55:13 -0700
Subject: [Bioperl-l] question about clustalw module.
In-Reply-To: <1207773950.483.13.camel@kiss-laptop>
References: <1207773950.483.13.camel@kiss-laptop>
Message-ID:

The clustal alignment format does not allow for the description. If you
want to preserve it you'll have to add it back: make a hash indexed by
sequence ID and store the description, then when you get your alignment
back you can update the description field before writing it out with
AlignIO.

-jason

From lamq at usal.es Thu Apr 10 11:52:24 2008
From: lamq at usal.es (Luis A. M.
Quintales)
Date: Thu, 10 Apr 2008 17:52:24 +0200
Subject: [Bioperl-l] xyplot glyph problem with previous aggregation
Message-ID: <47FE37B8.9090404@usal.es>

I am not able to add xyplot glyphs to one panel because I have some
problems with the aggregations.

Using that GFF file:

##sequence-region chr1 1 5578650
chr1 atfreq atpc 1 50 58.8000 . . atpc 1
chr1 atfreq atpc 51 100 58.4000 . . atpc 1
chr1 atfreq atpc 101 150 57.6000 . . atpc 1
chr1 atfreq atpc 151 200 57.8000 . . atpc 1
. . .

And this source code for preparing the aggregated features necessary for
the xyplot glyph:

my $filin = $ARGV[0];
my $db = Bio::DB::GFF->new( -dsn        => $filin,
                            -adaptor    => 'memory',
                            -aggregator => 'at{atpc:atfreq}'
                          );
my $segment = $db->segment('chr1');
my @features1 = $db->features('atpc');
print "$#features1 \n";
my @features2 = $segment->features('atpc');
print "$#features2 \n";
my @features3 = $db->features('at');
print "$#features3 \n";
my @features4 = $segment->features('at');
print "$#features4 \n";

I obtain:

111572
111572
0
0

What I am doing wrong with the aggregator?

Many thanks.

From lincoln.stein at gmail.com Thu Apr 10 13:55:06 2008
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 10 Apr 2008 13:55:06 -0400
Subject: [Bioperl-l] xyplot glyph problem with previous aggregation
In-Reply-To: <47FE37B8.9090404@usal.es>
References: <47FE37B8.9090404@usal.es>
Message-ID: <6dce9a0b0804101055w65e22abfgaa4f155751fef40f@mail.gmail.com>

Hi Luis,

When you aggregate the atpc 1 features together, you end up with one
feature. Thus @features3 is an array of size 1. The $# operator returns
the index of the last element, which is 0. If @features3 were empty,
$#features3 would return -1.

Lincoln

On Thu, Apr 10, 2008 at 11:52 AM, Luis A. M. Quintales wrote:

> I am not able to add xyplot glyphs to one panel because I have some
> problems with the aggregations.
>
> Using that GFF file:
>
> ##sequence-region chr1 1 5578650
> chr1 atfreq atpc 1 50 58.8000 . . atpc 1
> chr1 atfreq atpc 51 100 58.4000 . . atpc 1
> chr1 atfreq atpc 101 150 57.6000 . . atpc 1
> chr1 atfreq atpc 151 200 57.8000 . . atpc 1
> . . .
> > > And this source code for preparing the aggregated features necessary for > the xyplot glyph: > > my $filin = $ARGV[0]; > my $db = Bio::DB::GFF->new( -dsn => $filin, > -adaptor => 'memory', > -aggregator => 'at{atpc:atfreq}' > ); > my $segment = $db->segment('chr1'); > my @features1 = $db->features('atpc'); > print "$#features1 \n"; > my @features2 = $segment->features('atpc'); > print "$#features2 \n"; > my @features3 = $db->features('at'); > print "$#features3 \n"; > my @features4 = $segment->features('at'); > print "$#features4 \n"; > > I obtain: > > 111572 > 111572 > 0 > 0 > > What I am doing wrong with the aggregator? > > Many thanks. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From adsj at novozymes.com Fri Apr 11 04:53:23 2008 From: adsj at novozymes.com (Adam =?iso-8859-1?Q?Sj=F8gren?=) Date: Fri, 11 Apr 2008 10:53:23 +0200 Subject: [Bioperl-l] Bio::Factory::SequenceFactoryI SYNOPSIS example Message-ID: <87d4owixh8.fsf@topper.koldfront.dk> Hi. I am trying to make Bio::SeqIO return objects of my own type (a small extension of Bio::Seq::RichSeq), by setting -seqfactory. 
I am having a little trouble creating the correct object to pass with
-seqfactory:

Following the example given in SYNOPSIS of
Bio::Factory::SequenceFactoryI, I get this error:

$ perl -e '
> use Bio::Seq::SeqFactory;
> my $seqbuilder = Bio::Seq::SeqFactory->new('type' => 'Bio::PrimarySeq');
>
> my $seq = $seqbuilder->create(-seq => 'ACTGAT',
> -display_id => 'exampleseq');
>
> print "seq is a ", ref($seq), "\n";
> '

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Can't locate type.pm in @INC (@INC contains: /z/bio/biotools/bioinfperlmodules/ /z/bio/adm/modules /etc/perl /usr/local/lib/perl/5.8.8 /usr/local/share/perl/5.8.8 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.8 /usr/share/perl/5.8 /usr/local/lib/site_perl .) at (eval 13) line 3.
: Unrecognized Sequence type for SeqFactory 'type'
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl/5.8/Bio/Root/Root.pm:357
STACK: Bio::Seq::SeqFactory::type /usr/share/perl/5.8/Bio/Seq/SeqFactory.pm:134
STACK: Bio::Seq::SeqFactory::new /usr/share/perl/5.8/Bio/Seq/SeqFactory.pm:93
STACK: -e:3
-----------------------------------------------------------
$

If I go "Bio::Seq::SeqFactory('Bio::PrimarySeq'=>1)" instead, for
instance, it seems to work:

$ perl -e '
> use Bio::Seq::SeqFactory;
> my $seqbuilder = Bio::Seq::SeqFactory->new('Bio::PrimarySeq'=>1);
>
> my $seq = $seqbuilder->create(-seq => 'ACTGAT',
> -display_id => 'exampleseq');
>
> print "seq is a ", ref($seq), "\n";
> '
seq is a Bio::PrimarySeq
$

I was about to write a patch for the pod, when I realized that I'd
better start by asking: Is this a buglet in the pod or the code?
Best regards,

Adam

--
Adam Sjøgren
adsj at novozymes.com

From hlapp at gmx.net Fri Apr 11 11:35:54 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 11 Apr 2008 11:35:54 -0400
Subject: [Bioperl-l] Bio::Factory::SequenceFactoryI SYNOPSIS example
In-Reply-To: <87d4owixh8.fsf@topper.koldfront.dk>
References: <87d4owixh8.fsf@topper.koldfront.dk>
Message-ID: <0037240B-F469-4388-972A-324101B11621@gmx.net>

On Apr 11, 2008, at 4:53 AM, Adam Sjøgren wrote:

> $ perl -e '
>> use Bio::Seq::SeqFactory;
>> my $seqbuilder = Bio::Seq::SeqFactory->new('type' =>
>> 'Bio::PrimarySeq');

You need to prefix the argument with a dash: '-type', not 'type'.
Otherwise, it assumes that the class you want instantiated is
'type.pm'.

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From 1zoujing at 163.com Thu Apr 10 01:08:52 2008
From: 1zoujing at 163.com (zoujing)
Date: Wed, 9 Apr 2008 22:08:52 -0700 (PDT)
Subject: [Bioperl-l] Bio::ASN1::EntrezGene parse so slowly?
Message-ID: <16602210.post@talk.nabble.com>

I want to parse a file "gene_info" from NCBI. The format of Gene in NCBI
is ASN1, right? So I used Bio::ASN1::EntrezGene. But it didn't work
properly/too slow. The file is about 500M. The code is following:

use Bio::ASN1::EntrezGene;
my $parser = Bio::ASN1::EntrezGene->new('file' => $ARGV[0]);
my $i = 0;
while(my $result = $parser->next_seq)
{
    last; #something to do there, here use last for test
}

When it goes to the "while" part, it is processing on and on, it does
not went out, even I used "last" in the "while" part. So I wonder
whether it is too slow or the module is not fit for this job, or I did
something wrong?
Thank you!
--
View this message in context: http://www.nabble.com/Bio%3A%3AASN1%3A%3AEntrezGene-parse-so-slowly--tp16602210p16602210.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
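
As the replies further down this thread point out, gene_info is not
ASN.1 at all but a tab-delimited text file (one line per GeneID, with
the column header as the first line), so it can be read by splitting on
tabs rather than with Bio::ASN1::EntrezGene. A minimal sketch of that
approach follows. The helper name for_each_row, its callback interface
and the sample rows are assumptions made for illustration only; they
are not part of BioPerl, and the column names are not copied from the
real file.

```perl
use strict;
use warnings;

# Read tab-delimited text from a filehandle: the first line is the header,
# every later line is one record. Calls $callback with a hashref per record
# and returns the number of records seen.
# (for_each_row is a hypothetical helper, not part of BioPerl.)
sub for_each_row {
    my ($fh, $callback) = @_;
    my $header = <$fh>;
    return 0 unless defined $header;
    chomp $header;
    $header =~ s/^#\s*//;                  # tolerate a leading '#' on the header
    my @cols = split /\t/, $header;
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;          # skip blank lines
        my @vals = split /\t/, $line, -1;  # -1 keeps trailing empty fields
        my %row;
        @row{@cols} = @vals;               # map column name => value
        $callback->(\%row);
        $count++;
    }
    return $count;
}

# Made-up sample data standing in for the real file.
my $sample = join "\n",
    join("\t", qw(tax_id GeneID Symbol)),
    join("\t", qw(9823 1001 geneA)),
    join("\t", qw(9823 1002 geneB));
$sample .= "\n";

open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
my @symbols;
my $n = for_each_row($fh, sub { push @symbols, $_[0]{Symbol} });
close $fh;

print "$n rows: @symbols\n";
```

For the real download the handle could instead come from something
like open(my $fh, '-|', 'gzip', '-dc', 'gene_info.gz'), assuming gzip
is available on the PATH. Handling one row at a time in the callback
avoids holding a file of roughly 500M in memory.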
From 1zoujing at 163.com Thu Apr 10 02:17:41 2008
From: 1zoujing at 163.com (zoujing)
Date: Wed, 9 Apr 2008 23:17:41 -0700 (PDT)
Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags"
Message-ID: <16602770.post@talk.nabble.com>

I am a green hand in Bioperl. When I run perl with
"parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error
information:
Data Error: none conforming data found on line 1 in Sus_scrofa.ags.

But the Sus_scrofa.ags is downloaded from NCBI, with the format of ASN1,
should be the same as Homo_sapiens in the example. So it should be no
error as the code is the example from Mingyi.
I wonder why this happens, and should I change something about the file?

--
View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16602770.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From 1zoujing at 163.com Thu Apr 10 02:56:52 2008
From: 1zoujing at 163.com (zoujing)
Date: Wed, 9 Apr 2008 23:56:52 -0700 (PDT)
Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags"
In-Reply-To: <16602770.post@talk.nabble.com>
References: <16602770.post@talk.nabble.com>
Message-ID: <16603225.post@talk.nabble.com>

Searched the web and found the answer now, quote the answer as following:
The error was thrown by my Bio::ASN1::EntrezGene module because it
expects a text file, while you fed it with a binary file. To use
gzipped ASN binary file from NCBI, download the NCBI gene2xml
(ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml),
then use this syntax to run my parser on the binary files:

my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i Homo_sapiens.ags.gz -c -x -b | "); # Homo_sapiens.ags.gz is the gzipped binary file directly downloaded from NCBI

Same syntax should be used when you're using SeqIO (thus SeqIO::entrezgene).
Mingyi

zoujing wrote:
>
> I am a green hand in Bioperl.
When I run perl with > "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error > information: > Data Error: none conforming data found on line 1 in Sus_scrofa.ags. > > But the Sus_scrofa.ags is download from NCBI, with the format of ASN1, > should be the same as Homo_sapiens in the example. So it should be no > error as the code is the example from Mingyi. > I wonder why this happen, and should I change something about the file? > > -- View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16603225.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From 1zoujing at 163.com Thu Apr 10 03:10:26 2008 From: 1zoujing at 163.com (zoujing) Date: Thu, 10 Apr 2008 00:10:26 -0700 (PDT) Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags" Message-ID: <16603225.post@talk.nabble.com> Seached the web and found the answer now, quote the answer as following: The error was thrown by my Bio::ASN1::EntrezGene module because it expects a text file, while you fed it with a binary file. To use gzipped ASN binary file from NCBI, download the NCBI gene2xml (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml), then use this syntax to run my parser on the binary files: my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i Homo_sapiens.ags.gz -c -x -b | "); # Homo_sapiens.ags.gz is the gzipped binary file directly downloaded from NCBI Same syntax should be used when you're using SeqIO (thus SeqIO::entrezgene). Mingyi But there is still one thing, I want to parse "gene_info.gz" in Gene of NCBI. ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz. It doesn't work.Is that means "gene_info.gz"( tab-delimited,one line per GeneID, Column header line is the first line in the file) is not the right format for Bio::ASN1::EntrezGene? zoujing wrote: > > I am a geen hand in Bioperl. 
When I run perl with > "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error > information: > Data Error: none conforming data found on line 1 in Sus_scrofa.ags. > > But the Sus_scrofa.ags is download from NCBI, with the format of ASN1, > should be the same as Homo_sapiens in the example. So it should be no > error as the code is the example from Mingyi. > I wonder why this happen, and should I change something about the file? > > -- View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16603225.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From stefan.kirov at bms.com Fri Apr 11 15:59:29 2008 From: stefan.kirov at bms.com (Stefan Kirov) Date: Fri, 11 Apr 2008 15:59:29 -0400 (Eastern Daylight Time) Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags" In-Reply-To: <16602770.post@talk.nabble.com> References: <16602770.post@talk.nabble.com> Message-ID: AGS is a binary ASN.1 format and WILL NOT be parsed! You have to use gene2xml( weird, but this is NCBI) with these flags: -c -x -b -i. This will spit out text ASN which can be parsed. Stefan On Wed, 9 Apr 2008, zoujing wrote: > > I am a geen hand in Bioperl. When I run perl with > "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error > information: > Data Error: none conforming data found on line 1 in Sus_scrofa.ags. > > But the Sus_scrofa.ags is download from NCBI, with the format of ASN1, > should be the same as Homo_sapiens in the example. So it should be no error > as the code is the example from Mingyi. > I wonder why this happen, and should I change something about the file? > > -- > View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16602770.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From stefan.kirov at bms.com Fri Apr 11 16:01:30 2008 From: stefan.kirov at bms.com (Stefan Kirov) Date: Fri, 11 Apr 2008 16:01:30 -0400 (Eastern Daylight Time) Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags" In-Reply-To: <16603225.post@talk.nabble.com> References: <16603225.post@talk.nabble.com> Message-ID: It is not. If you use this file, why would you need a parser for it anyway? Just split on \t or read with OpenOffice or equiv. Stefan On Thu, 10 Apr 2008, zoujing wrote: > > Seached the web and found the answer now, quote the answer as following: > The error was thrown by my Bio::ASN1::EntrezGene module because it > expects a text file, while you fed it with a binary file. To use > gzipped ASN binary file from NCBI, download the NCBI gene2xml > (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml), > then use this syntax to run my parser on the binary files: > > my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i > Homo_sapiens.ags.gz -c -x -b | "); # Homo_sapiens.ags.gz is the gzipped > binary file directly downloaded from NCBI > > Same syntax should be used when you're using SeqIO (thus SeqIO::entrezgene). > Mingyi > > But there still one thing, I want to parse "gene_info.gz" in Gene of > NCBI. It doesn't work.Is that means "gene_info.gz"( tab-delimited,one line > per GeneID, Column header line is the first line in the file > ) is not the right format for Bio::ASN1::EntrezGene? > > > > zoujing wrote: >> >> I am a geen hand in Bioperl. When I run perl with >> "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error >> information: >> Data Error: none conforming data found on line 1 in Sus_scrofa.ags. 
>> >> But the Sus_scrofa.ags is download from NCBI, with the format of ASN1, >> should be the same as Homo_sapiens in the example. So it should be no >> error as the code is the example from Mingyi. >> I wonder why this happen, and should I change something about the file? >> >> > > -- > View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16603225.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From asjo at koldfront.dk Fri Apr 11 15:39:59 2008 From: asjo at koldfront.dk (Adam =?iso-8859-1?Q?Sj=F8gren?=) Date: Fri, 11 Apr 2008 21:39:59 +0200 Subject: [Bioperl-l] Bio::Factory::SequenceFactoryI SYNOPSIS example In-Reply-To: <0037240B-F469-4388-972A-324101B11621@gmx.net> (Hilmar Lapp's message of "Fri, 11 Apr 2008 11:35:54 -0400") References: <87d4owixh8.fsf@topper.koldfront.dk> <0037240B-F469-4388-972A-324101B11621@gmx.net> Message-ID: <877if4i3jk.fsf@topper.koldfront.dk> On Fri, 11 Apr 2008 11:35:54 -0400, Hilmar wrote: > On Apr 11, 2008, at 4:53 AM, Adam Sj?gren wrote: >>> my $seqbuilder = Bio::Seq::SeqFac