From Bank.Beszteri at awi.de Tue Apr 1 08:31:49 2008
From: Bank.Beszteri at awi.de (Bánk Beszteri)
Date: Tue, 01 Apr 2008 14:31:49 +0200
Subject: [Bioperl-l] Indexing large databases / BioSQL
Message-ID: <47F22B35.1030502@awi.de>

Dear list,

we have recently started to look for a solution for indexing large sequence databases / flat files for a java project, and because we ran into problems using biojava, and because both the OBDA and BioSQL ways seem to be compatible across bio~ projects, we also started to experiment with bioperl. It looks like this should work fine, but we had a couple of problems here, too. Perhaps some of you can give me a hint as to what we are doing wrong!

The first thing we tried was to use Bio::DB::Flat for indexing a TrEMBL flat file (~ 12 GB); but it seems we haven't got a machine with enough memory to be able to handle this. (Perhaps you would be using the "bdb" style index in such a case in bioperl, but this apparently doesn't work with biojava, so we had to stick with "flat").

So next we started to test BioSQL, by trying to load just Swissprot in a MySQL DB first, like:

load_seqdatabase.pl --host mysql.awi.de --dbname biosql2 --dbuser xyz --dbpass abc --driver mysql --namespace uniprot_sprot --format swiss uniprot_sprot.dat

Here we get an error message

###########################################

Loading /biodb/spinkern/uniprot_sprot.dat ...
Could not store Q6DAH5:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: The supplied lineage does not start near 'Erwinia carotovora subsp. atroseptica' (I was supplied 'Erwinia carotovora subsp. | Pectobacterium | Enterobacteriaceae | Enterobacteriales | Gammaproteobacteria | Proteobacteria | Bacteria')
STACK: Error::throw
STACK: Bio::Root::Root::throw /biodb/spinkern/bioperl-1.5/bioperl-1.5.2_102/Bio/Root/Root.pm:359
STACK: Bio::Species::classification /biodb/spinkern/bioperl-1.5/bioperl-1.5.2_102/Bio/Species.pm:174
STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:552
STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SpeciesAdaptor.pm:281
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1305
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:973
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
STACK: Bio::DB::Persistent::PersistentObject::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271
STACK: load_seqdatabase.pl:622
-----------------------------------------------------------

 at load_seqdatabase.pl line 635

############################################

or similar, depending on whether we use a pre-loaded ncbi taxonomy or not, and which Swissprot release we are trying to load.
It often seems to come from something like, as here, a subsp. or other special addition to the species line; but alternative genus names and other curious things also appear. It looks like Species.pm tries to validate the species name against the lineage info already there in the BioSQL DB, and in several cases, it finds inconsistencies. If we start with the ncbi taxonomy already loaded in the database, the first error comes much earlier.

I found a thread on the same problem from ~ two years ago (http://thread.gmane.org/gmane.comp.lang.perl.bio.general/13766/focus=13788), where the solution recommended was to update bioperl, so I was quite surprised to find the problem with the version you can see above (1.5.2_102 bioperl core, 1.5.2_100 bioperl_db). Can someone give me any hints as to what is going wrong here?

The only workaround we have found so far was to comment out line 174 in Species.pm:

$self->throw("The supplied lineage does not start near '$name' (I was supplied '".join(" | ", @vals)."')");

After doing so, load_seqdatabase.pl runs for several hours (until it eventually crashes; I haven't found out yet why), but proceeds really slowly. I also found some info on this for Pg and Oracle in the mailing list, but does anyone have approximate numbers for MySQL: how long should a first Swissprot load take?

Would be grateful to hear about your ideas / experiences on these issues!

Bank Beszteri
Bioinformatics / Scientific Computing
Alfred Wegener Institute for Polar and Marine Research
Am Handelshafen 12.
27570 Bremerhaven
Germany

From cjfields at uiuc.edu Tue Apr 1 20:45:28 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 1 Apr 2008 19:45:28 -0500
Subject: [Bioperl-l] quick update on bioperl nightly builds
Message-ID: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu>

I'm simplifying the nightly build archive names (removing svn revision # and date) in case anyone needs to update bioperl-live/run/db/network on a regular basis (read: GBrowse installations).
When I have time I'll start working on automated builds, which will require some extra work with Module::Build and Build.PL.

chris

From hiekeen at gmail.com Tue Apr 1 22:14:07 2008
From: hiekeen at gmail.com (Jinyan Huang)
Date: Wed, 2 Apr 2008 10:14:07 +0800
Subject: [Bioperl-l] How to make a network graphic using my genes in pathways?
Message-ID:

I have 20 pathways. My genes of interest are in these pathways. Some genes overlap between these pathways. How can I make a graphic network using these genes? That is, connecting these pathways through the overlapping genes. What kind of software can I use?

Thank you very much in advance.

--
Best regards,
Jinyan Huang (ekeen)
School of Life Sciences and Technology, 1302 Room
Tongji University
Siping Road 1239, Shanghai 200092
P.R. China
Tel :0086-21-65981041
Msn: hiekeen at hotmail.com
eMail: hiekeen at gmail.com

From hlapp at gmx.net Tue Apr 1 22:30:06 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 1 Apr 2008 22:30:06 -0400
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <47F22B35.1030502@awi.de>
References: <47F22B35.1030502@awi.de>
Message-ID:

On Apr 1, 2008, at 8:31 AM, Bánk Beszteri wrote:

> [...] So next we started to test BioSQL, by trying to load just
> Swissprot in a MySQL DB first, like:
>
> load_seqdatabase.pl --host mysql.awi.de --dbname biosql2 --dbuser
> xyz --dbpass abc --driver mysql --namespace uniprot_sprot --format
> swiss uniprot_sprot.dat
>
> Here we get an error message
>
> ###########################################
>
> Loading /biodb/spinkern/uniprot_sprot.dat ...
> Could not store Q6DAH5:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: The supplied lineage does not start near 'Erwinia carotovora
> subsp. atroseptica' (I was supplied 'Erwinia carotovora subsp. |
> Pectobacterium | Enterobacteriaceae | Enterobacteriales |
> Gammaproteobacteria | Proteobacteria | Bacteria')
> [...]
>
> at load_seqdatabase.pl line 635
>
> ############################################
>
> or similar, depending on whether we use a pre-loaded ncbi taxonomy
> or not

I recommend always using a pre-loaded NCBI taxonomy unless you know there are only a few organisms that are straightforward (for the parser, that is).

> , and which Swissprot release we are trying to load. It often seems
> to come from sg. like here, subsp. or other special addition to the
> species line; but alternative genus names and other curious things
> also to appear. It looks like Species.pm tries to validate the
> species name against the lineage info already there in the BioSQL
> DB, and in several cases, it finds inconsistencies.

It actually happens upon a successful lookup when the species object is populated from the database.

> [...]
> The only workaround we have found so far was to comment out line
> 174 in Species.pm:
>
> $self->throw("The supplied lineage does not start near '$name' (I
> was supplied '".join(" | ", @vals)."')");

That should be OK if you work with a pre-loaded taxonomy. It's sort of a sanity check that should catch a parser having messed up a species. If you use a pre-loaded NCBI taxonomy the results of the species parsing don't matter in all details so long as the NCBI taxonID is parsed out correctly, and then found in the database.

Note that this is actually a warn() in the main trunk version of BioPerl, so you might want to upgrade to that (or change throw() to warn() in your version). You still get the records flagged with that, but it isn't an exception.

> After doing so, load_seqdatabase.pl runs for several hours (until
> it evetually crashes; I haven't found out yet why), but proceeds
> really slowly.

It should certainly *not* crash. Note also that you can supply --safe on the command line, in which case the script will continue with the next record if one fails to load for whatever reason.

You will want to adjust the width constraint of dbxref.accession, for example to 128 chars. This will also be fixed for BioSQL 1.0.1.
See http://bugzilla.open-bio.org/show_bug.cgi?id=2474

> I also found some info on this for Pg and Oracle in the mailing
> list, but has anyone some approximate numbers for MySQL, how long
> should a first Swissprot load take?

Possibly around 20 hours according to Erik Rijkers. See http://lists.open-bio.org/pipermail/bioperl-l/2008-March/027427.html

You can use the --logchunks N option to have it print out performance statistics every N records.

Hope this helps,

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Tue Apr 1 22:38:12 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 1 Apr 2008 22:38:12 -0400
Subject: [Bioperl-l] Very basic implementation of GenBank XML SeqIO module
In-Reply-To: <47F13C2C.4070909@umdnj.edu>
References: <47F13C2C.4070909@umdnj.edu>
Message-ID:

Ryan - do you not have a committer account?

I do agree with Chris on the test. Modules w/o tests tend to become 'pseudogenized.'

-hilmar

On Mar 31, 2008, at 3:31 PM, Ryan Golhar wrote:

> I have a (very) basic SAX implementation of a SeqIO module to parse
> GenBank XML records. Right now, it only reads in basic information
> regarding the sequence and the sequence itself.
>
> It does not yet parse the features table. Should I submit it to be
> included in bioperl or wait until I implement more for the features
> table? I'm not sure when I'll get around to it though
>
> Ryan

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cain.cshl at gmail.com Tue Apr 1 23:12:04 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 01 Apr 2008 23:12:04 -0400
Subject: [Bioperl-l] quick update on bioperl nightly builds
In-Reply-To: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu>
References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu>
Message-ID: <1207105924.6184.4.camel@frissell>

Hi Chris,

The tarball is currently (Apr 1) being built in a tmp directory, so that the extracted tarball is ./tmp/bioperl-live/. Is that intended?

Thanks,
Scott

On Tue, 2008-04-01 at 19:45 -0500, Chris Fields wrote:
> I'm simplifying the nightly build archive names (removing svn revision
> # and date) in case anyone needs to update bioperl-live/run/db/network
> on a regular basis (read: GBrowse installations). When I have time
> I'll start working on automated builds, which will require some extra
> work with Module::Build and Build.PL.
>
> chris

--
------------------------------------------------------------------------
Scott Cain, Ph. D.
cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu Tue Apr 1 23:59:30 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 1 Apr 2008 22:59:30 -0500
Subject: [Bioperl-l] quick update on bioperl nightly builds
In-Reply-To: <1207105924.6184.4.camel@frissell>
References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> <1207105924.6184.4.camel@frissell>
Message-ID:

Nope, that isn't intended. I fixed it and reran it manually, so it should be fine now (note I didn't update the log file; the next cron run will catch that).

I may toy around with your recent passthrough flag addition to try getting automated PPM's up and running.

chris

On Apr 1, 2008, at 10:12 PM, Scott Cain wrote:

> Hi Chris,
>
> The tarball is currently (Apr 1) being built in a tmp directory, so
> that the extracted tarball is ./tmp/bioperl-live/. Is that intended?
>
> Thanks,
> Scott

From sdavis2 at mail.nih.gov Wed Apr 2 07:33:38 2008
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 2 Apr 2008 07:33:38 -0400
Subject: [Bioperl-l] How to make a network graphic using my genes in pathways?
In-Reply-To:
References:
Message-ID: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com>

On Tue, Apr 1, 2008 at 10:14 PM, Jinyan Huang wrote:
> I have 20 pathways. My interesting genes are in these pathways. There
> are some genes overlaps in these pathways. How can I make a graphic
> network using these genes? It means connecting these pathways through
> these overlap genes. What kind of software can I use?

R/Bioconductor has tools for working with graphs and pathways. Cytoscape is another open-source graphical solution. Ingenuity is, of course, not free. If you are looking at a perl solution, you can look at the various graph modules and their integration with the Graphviz libraries.

Sean

From cain.cshl at gmail.com Wed Apr 2 08:28:22 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Wed, 02 Apr 2008 08:28:22 -0400
Subject: [Bioperl-l] [Gmod-gbrowse] quick update on bioperl nightly builds
In-Reply-To:
References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> <1207105924.6184.4.camel@frissell>
Message-ID: <1207139302.6507.7.camel@frissell>

Hi Chris,

(trimmed out gbrowse mailing list since this is just bioperl business)

Speaking of the pass through stuff, Sendu mentioned that I stomped on some changes to Build.PL that you and he did when I committed that change, so it should be rolled back. Is there a good (svn) way to do that? Or should I just copy the contents of the old (good) Build.PL into a fresh file in my checkout and commit it?
Thanks,
Scott

On Tue, 2008-04-01 at 22:59 -0500, Chris Fields wrote:
> Nope, that isn't intended. I fixed it and reran it manually, so it
> should be fine now (note I didn't update the log file; the next cron
> run will catch that).
>
> I may toy around with your recent passthrough flag addition to try
> getting automated PPM's up and running.
>
> chris

--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

From robert.citek at gmail.com Wed Apr 2 08:24:06 2008
From: robert.citek at gmail.com (Robert Citek)
Date: Wed, 2 Apr 2008 07:24:06 -0500
Subject: [Bioperl-l] module for pubchem queries
Message-ID: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com>

Hello all,

I have a list of chemical compounds that have some kind of interaction with proteins or genes. The current list contains names or SMILES and I would like to get the CID number for those compounds.

Currently, I'm using perl to query the NCBI's eutils[1], which works great. But I was just curious to know if there was a bioperl module to do something similar. A quick google didn't turn up anything, so I thought I'd ask.

[1] http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html

Regards,
- Robert

From David.Messina at sbc.su.se Wed Apr 2 08:41:45 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 2 Apr 2008 14:41:45 +0200
Subject: [Bioperl-l] How to make a network graphic using my genes in pathways?
In-Reply-To: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com>
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com>
Message-ID: <628aabb70804020541v6cee4584ibd9935290ae7cc0a@mail.gmail.com>

I have no personal experience with it, but a colleague of mine suggested VisANT.
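
For the Perl route Sean mentioned, here is a minimal sketch of the idea, not a tested recipe: it links any two pathways that share at least one gene and writes the result as Graphviz DOT text. The pathway names, gene names and the helper subs (pathways_by_gene, shared_gene_edges, edges_to_dot) are made up for illustration, and it uses core Perl only rather than any particular CPAN graph module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: pathway name => genes of interest found in it.
my %pathway_genes = (
    'Pathway A' => [qw(GENE1 GENE2 GENE3)],
    'Pathway B' => [qw(GENE3 GENE4)],
    'Pathway C' => [qw(GENE2 GENE3 GENE5)],
);

# Invert the mapping: gene => sorted list of pathways containing it.
# Assumes each gene is listed at most once per pathway.
sub pathways_by_gene {
    my ($p2g) = @_;
    my %g2p;
    for my $pathway (sort keys %$p2g) {
        push @{ $g2p{$_} }, $pathway for @{ $p2g->{$pathway} };
    }
    return \%g2p;
}

# One undirected edge per pair of pathways sharing a gene;
# the value is the list of genes the two pathways have in common.
sub shared_gene_edges {
    my ($p2g) = @_;
    my $g2p = pathways_by_gene($p2g);
    my %edges;    # "from\tto" => [shared genes]
    for my $gene (sort keys %$g2p) {
        my @paths = @{ $g2p->{$gene} };
        for my $i (0 .. $#paths - 1) {
            for my $j ($i + 1 .. $#paths) {
                push @{ $edges{"$paths[$i]\t$paths[$j]"} }, $gene;
            }
        }
    }
    return \%edges;
}

# Render the edges as an undirected graph in Graphviz DOT syntax.
sub edges_to_dot {
    my ($edges) = @_;
    my $dot = "graph pathways {\n";
    for my $key (sort keys %$edges) {
        my ($from, $to) = split /\t/, $key;
        my $label = join ', ', @{ $edges->{$key} };
        $dot .= qq{  "$from" -- "$to" [label="$label"];\n};
    }
    return $dot . "}\n";
}

print edges_to_dot(shared_gene_edges(\%pathway_genes));
```

If the output is saved as pathways.dot, something like "dot -Tpng pathways.dot -o pathways.png" from Graphviz should render it. Pathways that share no gene with any other pathway do not show up in this sketch; they would need to be added as stand-alone nodes.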
Dave

From cjfields at uiuc.edu Wed Apr 2 11:03:32 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 2 Apr 2008 10:03:32 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] quick update on bioperl nightly builds
In-Reply-To: <1207139302.6507.7.camel@frissell>
References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> <1207105924.6184.4.camel@frissell> <1207139302.6507.7.camel@frissell>
Message-ID: <3B490712-3413-4662-99D7-7B115CECB6E1@uiuc.edu>

The changes I made were related to problems checking MySQL for Bio::DB::SeqFeature::Store tests when connectivity requires username/password. For some reason it tests DB connectivity up front, while Bio::DB::GFF assumes the DB setup is correct (no direct DB check) then runs tests assuming the setup is correct.

You can view the diffs for your commits here:

http://code.open-bio.org/svnweb/index.cgi/bioperl/diff/bioperl-live/trunk/ModuleBuildBioperl.pm?revs=14604&revs=14548

http://code.open-bio.org/svnweb/index.cgi/bioperl/diff/bioperl-live/trunk/Build.PL?revs=14604&revs=14565

I'll try working on merging them together today; it shouldn't be too hard (the changes were fairly minor in both Build.PL and Module::Build). I'll test to make sure your changes stay in as well. Down the road I believe we need to rethink how we want the Build process to run using Module::Build as it's a bit convoluted, but it works for now.

chris

On Apr 2, 2008, at 7:28 AM, Scott Cain wrote:

> Hi Chris,
>
> (trimmed out gbrowse mailing list since this is just bioperl business)
>
> Speaking of the pass through stuff, Sendu mentioned that I stomped on
> some changes to Build.PL that you and he did when I committed that
> change, so it should be rolled back. Is there a good (svn) way to do
> that? Or should I just copy the contents of the old (good) Build.PL
> into a fresh file in my checkout and commit it?
>
> Thanks,
> Scott

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Wed Apr 2 11:54:05 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 2 Apr 2008 10:54:05 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] quick update on bioperl nightly builds
In-Reply-To: <3B490712-3413-4662-99D7-7B115CECB6E1@uiuc.edu>
References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> <1207105924.6184.4.camel@frissell> <1207139302.6507.7.camel@frissell> <3B490712-3413-4662-99D7-7B115CECB6E1@uiuc.edu>
Message-ID: <71375DA3-A751-4908-8000-D9ACAE39B19C@uiuc.edu>

Okay, committed them. The accept passthrough still appears to work; let me know if anything pops up.

chris

On Apr 2, 2008, at 10:03 AM, Chris Fields wrote:

> ...
> I'll try working on merging them together today; it shouldn't be too
> hard (the changes were fairly minor in both Build.PL and
> Module::Build). I'll test to make sure your changes stay in as
> well. Down the road I believe we need to rethink how we want the
> Build process to run using Module::Build as it's a bit convoluted,
> but it works for now.
>
> chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From zhpan99 at yahoo.com Wed Apr 2 13:52:46 2008
From: zhpan99 at yahoo.com (Pan Zheng)
Date: Wed, 2 Apr 2008 10:52:46 -0700 (PDT)
Subject: [Bioperl-l] installing bioperl-1.5.2 errors:DB_File
Message-ID: <726978.82400.qm@web53105.mail.re2.yahoo.com>

Hi,

I am installing bioperl-1.5.2_102 under cygwin on my Windows XP and having some errors during the process.

When I was running "perl Build test", one major error is the error about DB_File. I tried to install DB_File from cpan and rpm without any luck.

++++++++++++++++++++++++
CPAN: File::Temp loaded ok (v0.16)
CPAN: YAML loaded ok (v0.62)
CPAN.pm: Going to build P/PM/PMQS/DB_File-1.817.tar.gz
Parsing config.in...
Looks Good.
Checking if your kit is complete...
Looks good
Note (probably harmless): No library found for -ldb
Writing Makefile for DB_File
cp DB_File.pm blib/lib/DB_File.pm
AutoSplitting blib/lib/DB_File.pm (blib/lib/auto/DB_File)
gcc -c -I/usr/local/BerkeleyDB/include -DPERL_USE_SAFE_PUTENV -fno-strict-aliasing -pipe -Wdeclaration-after-statement -DUSEIMPORTLIB -O3 -DVERSION=\"1.817\" -DXS_VERSION=\"1.817\" "-I/usr/lib/perl5/5.8/cygwin/CORE" -D_NOT_CORE -DmDB_Prefix_t=size_t -DmDB_Hash_t=u_int32_t version.c
version.c:30:16: db.h: No such file or directory
make: *** [version.o] Error 1
PMQS/DB_File-1.817.tar.gz
/usr/bin/make -- NOT OK
Running make test
Can't test without successful make
Running make install
Make had returned bad status, install seems impossible
Failed during this command:
PMQS/DB_File-1.817.tar.gz : make NO
+++++++++++++++++++++++++++++++++++++++++++++++

I can't remember having this kind of error while installing an earlier version. Would you please help me with the DB_File installation?

Thanks.

Pan

From dr.hogart at gmail.com Thu Apr 3 09:01:03 2008
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Thu, 03 Apr 2008 17:01:03 +0400
Subject: [Bioperl-l] support of clustalw2 in bio::run::tool::alignment
Message-ID:

As far as I understand, clustalw2 is not supported in bioperl v1.5.2.100. In which version will it be implemented? Thank you in advance.

From slduncan at iastate.edu Thu Apr 3 14:13:16 2008
From: slduncan at iastate.edu (slduncan at iastate.edu)
Date: Thu, 3 Apr 2008 13:13:16 -0500 (CDT)
Subject: [Bioperl-l] help installing bioperl with cygwin
Message-ID: <161313331084931@webmail.iastate.edu>

I am trying to use cpan to install bioperl and I had an error message saying:
c:\Documents not recognized as an external or internal....
Any ideas here? Also, I am new to the computer world so please be kind.
:) Stacy Duncan Iowa State University Bioinformatics and Computational Biology 1802 University Blvd. VMRI Building 6 Ames, IA 50011-1240 office phone: (515) 294-8385 office fax: (515) 294-1401 home phone: (336) 965-5622 e-mail: slduncan at iastate.edu From cjfields at uiuc.edu Fri Apr 4 16:13:23 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 4 Apr 2008 15:13:23 -0500 Subject: [Bioperl-l] help installing bioperl with cygwin In-Reply-To: <161313331084931@webmail.iastate.edu> References: <161313331084931@webmail.iastate.edu> Message-ID: It's best if you use ActiveState's Perl installation (it's the only one we really support at this moment, unless someone wants to give StrawberryPerl a run). See: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows chris On Apr 3, 2008, at 1:13 PM, slduncan at iastate.edu wrote: > I am trying to use cpan to install bioperl and I had an error > message saying: > c:\Documents not recognized as and external or internal.... > Any ideas here. Also, I am new to the computer world so please be > kind. :) > > Stacy Duncan > Iowa State University > Bioinformatics and Computational Biology > 1802 University Blvd. > VMRI Building 6 > Ames, IA 50011-1240 > office phone: (515) 294-8385 > office fax: (515) 294-1401 > home phone: (336) 965-5622 > e-mail: slduncan at iastate.edu > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Apr 4 16:07:12 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 4 Apr 2008 15:07:12 -0500 Subject: [Bioperl-l] installing bioperl-1.5.2 errors:DB_File In-Reply-To: <726978.82400.qm@web53105.mail.re2.yahoo.com> References: <726978.82400.qm@web53105.mail.re2.yahoo.com> Message-ID: I think you have to use the cygwin installer to install DB_File (it also installs dependencies, such as BDB). According to 'perldoc perlcygwin': .... Optional Libraries for Perl on Cygwin Several Perl functions and modules depend on the existence of some optional libraries. Configure will find them if they are installed in one of the directories listed as being used for library searches. Pre- built packages for most of these are available from the Cygwin installer. .... chris On Apr 2, 2008, at 12:52 PM, Pan Zheng wrote: > Hi, > > I am installing bioperl-1.5.2_102 under cygwin on my Windows XP and > having some errors during the process. > > When I was running "perl Build test", one major error is the error > about DB_File. I tried to install DB_File from cpan and rpm without > any luck. > > ++++++++++++++++++++++++ > CPAN: File::Temp loaded ok (v0.16) > CPAN: YAML loaded ok (v0.62) > CPAN.pm: Going to build P/PM/PMQS/DB_File-1.817.tar.gz > Parsing config.in... > Looks Good. > Checking if your kit is complete... 
> Looks good > Note (probably harmless): No library found for -ldb > Writing Makefile for DB_File > cp DB_File.pm blib/lib/DB_File.pm > AutoSplitting blib/lib/DB_File.pm (blib/lib/auto/DB_File) > gcc -c -I/usr/local/BerkeleyDB/include -DPERL_USE_SAFE_PUTENV -fno- > strict-alias > ing -pipe -Wdeclaration-after-statement -DUSEIMPORTLIB -O3 - > DVERSION=\"1.817\" > -DXS_VERSION=\"1.817\" "-I/usr/lib/perl5/5.8/cygwin/CORE" - > D_NOT_CORE -DmDB_ > Prefix_t=size_t -DmDB_Hash_t=u_int32_t version.c > version.c:30:16: db.h: No such file or directory > make: *** [version.o] Error 1 > PMQS/DB_File-1.817.tar.gz > /usr/bin/make -- NOT OK > Running make test > Can't test without successful make > Running make install > Make had returned bad status, install seems impossible > Failed during this command: > PMQS/DB_File-1.817.tar.gz : make NO > +++++++++++++++++++++++++++++++++++++++++++++++ > > > I can't remember I had this kind error while installing earlier > version. > > Would you please help me on DB_File installation ? > > Thanks. > > Pan > > > --------------------------------- > You rock. That's why Blockbuster's offering you one month of > Blockbuster Total Access, No Cost. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Apr 4 17:25:41 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 4 Apr 2008 16:25:41 -0500 Subject: [Bioperl-l] module for pubchem queries In-Reply-To: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> References: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> Message-ID: <15B44EC6-3660-4925-BA7A-6763D51E6837@uiuc.edu> Do you need something to access eutils via BioPerl, or are you looking for a specific set of classes? 
I wrote an interface to eutils (Bio::DB::EUtilities), you could do
something like this:

#!/usr/bin/perl -w

use strict;
use warnings;
use Bio::DB::EUtilities;

my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch',
                                     -term => 'dihydroorotate',
                                     -db => 'pcsubstance',
                                     -retmax => 1000);

print join(',',$eutil->get_ids)."\n";

chris

On Apr 2, 2008, at 7:24 AM, Robert Citek wrote:

> Hello all,
>
> I have a list of chemical compounds that have some kind of interaction
> with proteins or genes. The current list contains names or SMILES and
> I would like to get the CID number for those compounds. Currently,
> I'm using perl to query the NCBI's eutils[1], which works great. But
> I was just curious to know of there was a bioperl module to do
> something similar. A quick google didn't turn up anything, so I
> thought I'd ask.
>
> [1] http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
>
> Regards,
> - Robert
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From ekeen at mail.tongji.edu.cn Mon Apr 7 02:57:04 2008
From: ekeen at mail.tongji.edu.cn (Jinyan Huang)
Date: Mon, 7 Apr 2008 14:57:04 +0800
Subject: [Bioperl-l] How to analysis the relationship of my interesting KEGG pathways?
Message-ID:

In my research, I got 25 interesting pathways. I want to know the
regulated relationship of these pathways. It is better if there some
software to connect these KEGG pathways.

Thank you very much in advance.

From miguel.pignatelli at uv.es Mon Apr 7 06:12:58 2008
From: miguel.pignatelli at uv.es (Miguel Pignatelli)
Date: Mon, 07 Apr 2008 12:12:58 +0200
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com>
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com>
Message-ID: <47F9F3AA.2090003@uv.es>

Hi all,

Is there any way to obtain the date of creation of individual GenBank
entries? I don't mean the "last revision" date that can be found in the
first line of a GenBank file.

I can access this creation date by looking at the "revision history" of
any GenBank entry (for example, see
http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105),
but I need a systematic (and local=fast) way to access this information.

Any help would be very appreciated,
Thank you very much in advance,

M;

From Bank.Beszteri at awi.de Mon Apr 7 07:46:43 2008
From: Bank.Beszteri at awi.de (=?ISO-8859-1?Q?B=E1nk_Beszteri?=)
Date: Mon, 07 Apr 2008 13:46:43 +0200
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To:
References: <47F22B35.1030502@awi.de>
Message-ID: <47FA09A3.2070004@awi.de>

Hi Hilmar,

it was important to understand that the inconsistency in taxon names is
apparently only between the Swissprot entries with "non-standard" names
and the contents of the taxonomy tables and that it is best to use a
pre-loaded taxonomy, thanks for that! We have now updated to
bioperl-live (and bp-db-live, too) and load_seqdatabase.pl seems to have
loaded everything OK in ~26 hours (with many of the "The supplied
lineage does not start near..." warnings, but no other problems). Our
next test is to try to load trembl (will try to do this in parallel in
multiple chunks), hope it will work just as nicely!

Thanks for your tips & insights!

Bank

Hilmar Lapp wrote:

>
> On Apr 1, 2008, at 8:31 AM, Bánk Beszteri wrote:
>
>> [...]
So next we started to test BioSQL, by trying to load just >> Swissprot in a MySQL DB first, like: >> >> load_seqdatabase.pl --host mysql.awi.de --dbname biosql2 --dbuser >> xyz --dbpass abc --driver mysql --namespace uniprot_sprot --format >> swiss uniprot_sprot.dat >> >> Here we get an error message >> >> ########################################### >> >> Loading /biodb/spinkern/uniprot_sprot.dat ... >> Could not store Q6DAH5: >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: The supplied lineage does not start near 'Erwinia carotovora >> subsp. atroseptica' (I was supplied 'Erwinia carotovora subsp. | >> Pectobacterium | Enterobacteriaceae | Enterobacteriales | >> Gammaproteobacteria | Proteobacteria | Bacteria') >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /biodb/spinkern/bioperl-1.5/ >> bioperl-1.5.2_102/Bio/Root/Root.pm:359 >> STACK: Bio::Species::classification /biodb/spinkern/bioperl-1.5/ >> bioperl-1.5.2_102/Bio/Species.pm:174 >> STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /biodb/ >> spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm: 552 >> STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /biodb/ >> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SpeciesAdaptor.pm:281 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object / >> biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:1305 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key >> /biodb/ spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:973 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / >> biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:852 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/ >> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:182 >> STACK: Bio::DB::Persistent::PersistentObject::create /biodb/ >> 
spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm: 244 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/ >> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:169 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /biodb/ >> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:251 >> STACK: Bio::DB::Persistent::PersistentObject::store /biodb/spinkern/ >> bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271 >> STACK: load_seqdatabase.pl:622 >> ----------------------------------------------------------- >> >> at load_seqdatabase.pl line 635 >> >> ############################################ >> >> or similar, depending on whether we use a pre-loaded ncbi taxonomy >> or not > > > I recommend to always use a pre-loaded NCBI taxonomy unless you know > there are only a few organisms that are straightforward (for the > parser, that is). > >> , and which Swissprot release we are trying to load. It often seems >> to come from sg. like here, subsp. or other special addition to the >> species line; but alternative genus names and other curious things >> also to appear. It looks like Species.pm tries to validate the >> species name against the lineage info already there in the BioSQL >> DB, and in several cases, it finds inconsistencies. > > > It actually happens upon a successful lookup when the species object > is populated from the database. > >> [...] >> The only workaround we have found so far was to comment out line 174 >> in Species.pm: >> >> $self->throw("The supplied lineage does not start near '$name' (I >> was supplied '".join(" | ", @vals)."')"); > > > That should be OK if you work with a pre-loaded taxonomy. It's sort > of a sanity check that should catch a parser having messed up a > species. 
If you use a pre-loaded NCBI taxonomy the results of the > species parsing don't matter in all details so long as the NCBI > taxonID is parsed out correctly, and then found in the database. > > Note that this actually a warn() in the main trunk version of > BioPerl, so you might want to upgrade to that (or change throw() to > warn() in your version). You still get the records flagged with that, > but it isn't an exception. > >> >> After doing so, load_seqdatabase.pl runs for several hours (until it >> evetually crashes; I haven't found out yet why), but proceeds really >> slowly. > > > It should certainly *not* crash. Note also that you can supply --safe > on the command line, in which case the script will continue with the > next record if one fails to load for whatever reason. > > You will want to adjust the width constraint of dbxref.accession, for > example to 128 chars. This will also be fixed for BioSQL 1.0.1. > See http://bugzilla.open-bio.org/show_bug.cgi?id=2474 > > >> I also found some info on this for Pg and Oracle in the mailing >> list, but has anyone some approximate numbers for MySQL, how long >> should a first Swissprot load take? > > > Possibly around 20 hours according to Erik Rijkers: > See http://lists.open-bio.org/pipermail/bioperl-l/2008-March/027427.html > > You can use the --logchunks N option to have it print out performance > statistics every N records. > > Hope this helps, > > -hilmar From cjfields at uiuc.edu Mon Apr 7 08:32:45 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 7 Apr 2008 07:32:45 -0500 Subject: [Bioperl-l] Indexing large databases / BioSQL In-Reply-To: <47FA09A3.2070004@awi.de> References: <47F22B35.1030502@awi.de> <47FA09A3.2070004@awi.de> Message-ID: The warnings are something that we still need to resolve, but the only fix I can think of likely breaks backward compatibility with older bioperl-db installations (i.e.
storing the given scientific name instead of the binomial name, which is used as a fallback when no taxid is found). There is a full explanation here: http://bugzilla.open-bio.org/show_bug.cgi?id=2092 Anyway, I think it needs further testing when someone, likely Hilmar or I, have time. chris On Apr 7, 2008, at 6:46 AM, Bánk Beszteri wrote: > Hi Hilmar, > > it was important to understand that the inconsistency in taxon names > is apparently only between the Swissprot entries with "non-standard" > names and the contents of the taxonomy tables and that it is best to > use a pre-loaded taxonomy, thanks for that! We have now updated to > bioperl-live (and bp-db-live, too) and load_seqdatabase.pl seems to > have loaded everything OK in ~26 hours (with many of the "The > supplied lineage does not start near..." warnings, but no other > problems). Our next test is to try to load trembl (will try to do > this in parallel in multiple chunks), hope it will work just as > nicely! > > Thanks for your tips & insights! > > Bank > > Hilmar Lapp wrote: > >> >> On Apr 1, 2008, at 8:31 AM, Bánk Beszteri wrote: >> >>> [...] So next we started to test BioSQL, by trying to load just >>> Swissprot in a MySQL DB first, like: >>> >>> load_seqdatabase.pl --host mysql.awi.de --dbname biosql2 --dbuser >>> xyz --dbpass abc --driver mysql --namespace uniprot_sprot -- >>> format swiss uniprot_sprot.dat >>> >>> Here we get an error message >>> >>> ########################################### >>> >>> Loading /biodb/spinkern/uniprot_sprot.dat ... >>> Could not store Q6DAH5: >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: The supplied lineage does not start near 'Erwinia carotovora >>> subsp. atroseptica' (I was supplied 'Erwinia carotovora subsp.
| >>> Pectobacterium | Enterobacteriaceae | Enterobacteriales | >>> Gammaproteobacteria | Proteobacteria | Bacteria') >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw /biodb/spinkern/bioperl-1.5/ >>> bioperl-1.5.2_102/Bio/Root/Root.pm:359 >>> STACK: Bio::Species::classification /biodb/spinkern/bioperl-1.5/ >>> bioperl-1.5.2_102/Bio/Species.pm:174 >>> STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /biodb/ >>> spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/ >>> PersistentObject.pm: 552 >>> STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /biodb/ >>> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SpeciesAdaptor.pm:281 >>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object / >>> biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >>> BasePersistenceAdaptor.pm:1305 >>> STACK: >>> Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / >>> biodb/ spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >>> BasePersistenceAdaptor.pm:973 >>> STACK: >>> Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / >>> biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >>> BasePersistenceAdaptor.pm:852 >>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/ >>> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >>> BasePersistenceAdaptor.pm:182 >>> STACK: Bio::DB::Persistent::PersistentObject::create /biodb/ >>> spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/ >>> PersistentObject.pm: 244 >>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/ >>> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >>> BasePersistenceAdaptor.pm:169 >>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /biodb/ >>> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >>> BasePersistenceAdaptor.pm:251 >>> STACK: Bio::DB::Persistent::PersistentObject::store /biodb/ >>> spinkern/ bioperl-db-1.5.2_100/Bio/DB/Persistent/ >>> PersistentObject.pm:271 >>> STACK: load_seqdatabase.pl:622 >>> ----------------------------------------------------------- >>> >>> at 
load_seqdatabase.pl line 635 >>> >>> ############################################ >>> >>> or similar, depending on whether we use a pre-loaded ncbi >>> taxonomy or not >> >> >> I recommend to always use a pre-loaded NCBI taxonomy unless you >> know there are only a few organisms that are straightforward (for >> the parser, that is). >> >>> , and which Swissprot release we are trying to load. It often >>> seems to come from sg. like here, subsp. or other special >>> addition to the species line; but alternative genus names and >>> other curious things also to appear. It looks like Species.pm >>> tries to validate the species name against the lineage info >>> already there in the BioSQL DB, and in several cases, it finds >>> inconsistencies. >> >> >> It actually happens upon a successful lookup when the species >> object is populated from the database. >> >>> [...] >>> The only workaround we have found so far was to comment out line >>> 174 in Species.pm: >>> >>> $self->throw("The supplied lineage does not start near '$name' (I >>> was supplied '".join(" | ", @vals)."')"); >> >> >> That should be OK if you work with a pre-loaded taxonomy. It's >> sort of a sanity check that should catch a parser having messed up >> a species. If you use a pre-loaded NCBI taxonomy the results of >> the species parsing don't matter in all details so long as the >> NCBI taxonID is parsed out correctly, and then found in the >> database. >> >> Note that this actually a warn() in the main trunk version of >> BioPerl, so you might want to upgrade to that (or change throw() >> to warn() in your version). You still get the records flagged with >> that, but it isn't an exception. >> >>> >>> After doing so, load_seqdatabase.pl runs for several hours (until >>> it evetually crashes; I haven't found out yet why), but proceeds >>> really slowly. >> >> >> It should certainly *not* crash.
Note also that you can supply -- >> safe on the command line, in which case the script will continue >> with the next record if one fails to load for whatever reason. >> >> You will want to adjust the width constraint of dbxref.accession, >> for example to 128 chars. This will also be fixed for BioSQL 1.0.1. >> See http://bugzilla.open-bio.org/show_bug.cgi?id=2474 >> >> >>> I also found some info on this for Pg and Oracle in the mailing >>> list, but has anyone some approximate numbers for MySQL, how long >>> should a first Swissprot load take? >> >> >> Possibly around 20 hours according to Erik Rijkers: >> See http://lists.open-bio.org/pipermail/bioperl-l/2008-March/027427.html >> >> You can use the --logchunks N option to have it print out >> performance statistics every N records. >> >> Hope this helps, >> >> -hilmar > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Mon Apr 7 08:34:00 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 07 Apr 2008 13:34:00 +0100 Subject: [Bioperl-l] Indexing large databases / BioSQL In-Reply-To: <47FA09A3.2070004@awi.de> References: <47F22B35.1030502@awi.de> <47FA09A3.2070004@awi.de> Message-ID: <47FA14B8.7000500@sendu.me.uk> Bánk Beszteri wrote: > Hi Hilmar, > > it was important to understand that the inconsistency in taxon names is > apparently only between the Swissprot entries with "non-standard" names > and the contents of the taxonomy tables and that it is best to use a > pre-loaded taxonomy, thanks for that! We have now updated to > bioperl-live (and bp-db-live, too) and load_seqdatabase.pl seems to have > loaded everything OK in ~26 hours (with many of the "The supplied > lineage does not start near..." warnings, but no other problems).
Can you provide some examples of these warnings (of the taxons that cause them)? If there's anything consistent about them perhaps Bio::Species can be improved to accommodate them properly (instead of just issuing the warning and getting the classification wrong). From heikki at sanbi.ac.za Mon Apr 7 08:48:34 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Mon, 7 Apr 2008 14:48:34 +0200 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <47F9F3AA.2090003@uv.es> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> Message-ID: <200804071448.34769.heikki@sanbi.ac.za> Miguel, You probably know this but: - Your entry example below is a GenPept entry, not a GenBank entry - The NCBI sequence format "genbank" has only the last modified date. I do not know about other formats (ASN.1, ...) - NCBI Entrez is a great tool but it obscures the source database. - If you really are working on real GenBank entries, you can use the accession number to see find corresponding EMBL (and Swiss-Prot) flat file formats that have both creation and last modified dates. Post to the list if you have trouble getting the dates from EMBL/Swiss-Prot formats using bioperl. Yours, -Heikki On Monday 07 April 2008 12:12:58 Miguel Pignatelli wrote: > Hi all, > > Is there any way to obtain the date of creation of individual GenBank > entries? I don't mean the "last revision" date that can be found in the > first line of a GenBank file. > > I can access this creation date by looking at the "revision history" of > any GenBank entry (for example, see > http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105), > but I need a systematic (and local=fast) way to access this information. 
> > Any help would be very appreciated, > Thank you very much in advance, > > M; > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From granjeau at tagc.univ-mrs.fr Mon Apr 7 09:30:10 2008 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/ICIM) Date: Mon, 07 Apr 2008 15:30:10 +0200 Subject: [Bioperl-l] help installing bioperl with cygwin In-Reply-To: References: <161313331084931@webmail.iastate.edu> Message-ID: <47FA21E2.3010602@tagc.univ-mrs.fr> Hi, I'm using BioPerl under Cygwin, because Cygwin allows one to work in a Unix-like environment in a command line point of view. So, I use the CVS version which runs out of the box http://www.bioperl.org/wiki/Using_CVS which has been replaced by SVN at the beginning of the year http://www.bioperl.org/wiki/Using_Subversion So if you really want to work under Cygwin, you can try this quick and dirty way, but you still have to become experienced because BioPerl is not supported under Cygwin. You may try Strawberry, but in my experience in installing wxPerl, wxPerl fails on both flavours of Perl. ActiveState's Perl is still the easiest way to install many packages. Regards, Samuel Chris Fields wrote: > It's best if you use ActiveState's Perl installation (it's the only > one we really support at this moment, unless someone wants to give > StrawberryPerl a run). 
See: > > http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows > > chris > > On Apr 3, 2008, at 1:13 PM, slduncan at iastate.edu wrote: > >> I am trying to use cpan to install bioperl and I had an error message >> saying: >> c:\Documents not recognized as and external or internal.... >> Any ideas here. Also, I am new to the computer world so please be >> kind. :) >> >> Stacy Duncan >> Iowa State University >> Bioinformatics and Computational Biology >> 1802 University Blvd. >> VMRI Building 6 >> Ames, IA 50011-1240 >> office phone: (515) 294-8385 >> office fax: (515) 294-1401 >> home phone: (336) 965-5622 >> e-mail: slduncan at iastate.edu >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Samuel GRANJEAUD granjeau at tagc.univ-mrs.fr INSERM - ICIM - TAGC Tel: +33 (0)491 82 87 24 http://tagc.univ-mrs.fr Fax: +33 (0)491 82 87 01 http://icim.marseille.inserm.fr/proteomique From er at xs4all.nl Mon Apr 7 10:36:57 2008 From: er at xs4all.nl (Erik) Date: Mon, 7 Apr 2008 16:36:57 +0200 (CEST) Subject: [Bioperl-l] Indexing large databases / BioSQL Message-ID: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> On Mon, April 7, 2008 14:34, Sendu Bala wrote: > Bánk Beszteri wrote: >> Hi Hilmar, >> >> it was important to understand that the inconsistency in taxon names is >> apparently only between the Swissprot entries with "non-standard" names >> and the contents of the taxonomy tables and that it is best to use a >> pre-loaded taxonomy, thanks for that!
We have now updated to >> bioperl-live (and bp-db-live, too) and load_seqdatabase.pl seems to have >> loaded everything OK in ~26 hours (with many of the "The supplied >> lineage does not start near..." warnings, but no other problems). > > Can you provide some examples of these warnings (of the taxons that > cause them)? If there's anything consistent about them perhaps > Bio::Species can be improved to accommodate them properly (instead of > just issuing the warning and getting the classification wrong). > I did this a little while ago and saved the output (UniProtKB/Swiss-Prot Release 55.1 of 18-Mar-2008, I think). All warnings (and a few errors) for swissprot are here: http://bugzilla.open-bio.org/show_bug.cgi?id=2474 as an attached file I suppose the OP will have encountered similar output - I don't think there is much RDBMS-type-dependency involved. regards, Erik Rijkers From cjfields at uiuc.edu Mon Apr 7 11:46:01 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 7 Apr 2008 10:46:01 -0500 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <200804071448.34769.heikki@sanbi.ac.za> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <200804071448.34769.heikki@sanbi.ac.za> Message-ID: <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> Strangely enough, if you use NCBI's esummary you can get both dates. 
Via Bio::DB::EUtilities in bioperl-live, if you dump out DocSum data
(using a debugging method I added in a while back):

---------------------------------------

use Bio::DB::EUtilities;

# for multiple IDs use an array ref; also only use GI's (not accessions)
my $factory = Bio::DB::EUtilities->new(
    -eutil => 'esummary',
    -db => 'protein',
    -id => 1621261);

$factory->print_DocSums;

---------------------------------------

One gets the following tag/value pairs:

UID: 1621261
Caption :CAB02640
Title :PROBABLE PYRIMIDINE OPERON REGULATORY PROTEIN PYRR [Mycobacterium tuberculosis H37Rv]
Extra :gi|1621261|emb|CAB02640.1|[1621261]
Gi :1621261
CreateDate :2003/11/21
UpdateDate :2006/11/14
Flags :
TaxId :83332
Length :193
Status :live
ReplacedBy :
Comment :

I'll add in a method to grab the data element by tag (in this case,
grab the creation date by asking for the 'CreateDate' key). Might come
in handy for scripts.

chris

On Apr 7, 2008, at 7:48 AM, Heikki Lehvaslaiho wrote:

> Miguel,
>
> You probably know this but:
>
> - Your entry example below is a GenPept entry, not a GenBank entry
> - The NCBI sequence format "genbank" has only the last modified date.
> I do not know about other formats (ASN.1, ...)
> - NCBI Entrez is a great tool but it obscures the source database.
> - If you really are working on real GenBank entries, you can use the
> accession
> number to see find corresponding EMBL (and Swiss-Prot) flat file
> formats that
> have both creation and last modified dates.
>
> Post to the list if you have trouble getting the dates from EMBL/
> Swiss-Prot
> formats using bioperl.
>
> Yours,
>
> -Heikki
>
> On Monday 07 April 2008 12:12:58 Miguel Pignatelli wrote:
>> Hi all,
>>
>> Is there any way to obtain the date of creation of individual GenBank
>> entries? I don't mean the "last revision" date that can be found in
>> the
>> first line of a GenBank file.
>> >> I can access this creation date by looking at the "revision >> history" of >> any GenBank entry (for example, see >> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi? >> val=74311105), >> but I need a systematic (and local=fast) way to access this >> information. >> >> Any help would be very appreciated, >> Thank you very much in advance, >> >> M; >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From miguel.pignatelli at uv.es Mon Apr 7 12:24:50 2008 From: miguel.pignatelli at uv.es (Miguel Pignatelli) Date: Mon, 07 Apr 2008 18:24:50 +0200 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <200804071448.34769.heikki@sanbi.ac.za> <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> Message-ID: <47FA4AD2.5030206@uv.es> I've noticed that the ASN.1 version of those records has a "creation-date" tag. 
But this is somewhat strange, because the creation date you obtained and
the one obtained via the ASN.1 format are both 2003/11/21, whereas the
revision history of the record:

http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=CAB02640

reports a creation date of "Oct 19 1996 12:28 AM".

I don't know how to get this, because the EMBL version of this gene:

http://www.ebi.ac.uk/cgi-bin/dbfetch?db=emblcds&id=CAB02640&style=raw

doesn't have DT fields at all.

M;

From cjfields at uiuc.edu Mon Apr 7 13:48:45 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 7 Apr 2008 12:48:45 -0500
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To: <47FA4AD2.5030206@uv.es>
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com>
 <47F9F3AA.2090003@uv.es> <200804071448.34769.heikki@sanbi.ac.za>
 <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> <47FA4AD2.5030206@uv.es>
Message-ID:

Note that in the example I gave, the DBSOURCE changed during the revision
history at the point of the creation date (the original nuc. record was a
M. tuberculosis contig sequence, which later changed to an updated full
M. tuberculosis genome record at the time of the 'create date').

I couldn't find anything specific in the GenBank docs on this, but it
appears (at least for a protein record) that the creation date reflects
the date on which the sequence was either originally deposited or
originally derived from the nucleotide source record present in the
record. In other words, it may not reflect the original date of
deposition (which could have come from a different record, as in this
case).

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From Bank.Beszteri at awi.de Tue Apr 8 03:35:43 2008
From: Bank.Beszteri at awi.de (=?ISO-8859-1?Q?B=E1nk_Beszteri?=)
Date: Tue, 08 Apr 2008 09:35:43 +0200
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl>
References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl>
Message-ID: <47FB204F.90405@awi.de>

>>Can you provide some examples of these warnings (of the taxons that
>>cause them)?
>>If there's anything consistent about them perhaps
>>Bio::Species can be improved to accommodate them properly (instead of
>>just issuing the warning and getting the classification wrong).
>>
>
>All warnings (and a few errors) for swissprot are here:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2474
>
>as an attached file
>
>I suppose the OP will have encountered similar output - I don't think there is
>much RDBMS-type-dependency involved.

Hi Erik & Sendu,

yes, the same kind of thing, probably no DBMS-type dependency; in case
it could be useful, I uploaded my output as a second attachment to the
bugzilla report cited above.

Bank

From heikki at sanbi.ac.za Tue Apr 8 04:32:12 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Tue, 8 Apr 2008 10:32:12 +0200
Subject: [Bioperl-l] Blast database sequence retrieval perl script
In-Reply-To: <6BEABCD5CA640A44A848448A42A03B73079E48C9@ilrikeadx1.ILRI.CGIARAD.ORG>
References: <6BEABCD5CA640A44A848448A42A03B73079E48C9@ilrikeadx1.ILRI.CGIARAD.ORG>
Message-ID: <200804081032.12312.heikki@sanbi.ac.za>

Dear Nelson,

I am cc:ing the bioperl mailing list, where all queries of this kind
should go. More people can help you that way.

Since you have your own local data set, you need to create an index that
catalogues your sequences for easy retrieval.

You need to install bioperl-live first. See for example:
http://www.bioperl.org/wiki/Using_Subversion

Then you can follow this HOWTO:
http://www.bioperl.org/wiki/HOWTO:Flat_databases

The other HOWTOs will help you deal with the BioPerl sequence objects
that are retrieved: http://www.bioperl.org/wiki/HOWTOs.

Yours,

-Heikki

On Monday 07 April 2008 14:50:23 Ndegwa, Nelson (IITA-Nairobi) wrote:
> Dear Prof. Heikki,
>
> Hi. We met at the Pathogen Bioinformatics Conference held in Nairobi
> Kenya in May 2007 at ICIPE. I recall you are a developer of Bioperl and
> Perl. I have managed to install a local Blast, having just cowpea Contig
> sequences, about 50,000 in total. This runs fine, as I can perform
> various queries and get results. However, any good match/hit on the
> local Blast database is hard to retrieve and the only option seems to be
> to go back to that database and search manually for the top hit sequence
> - an exceedingly manual task. Might you perhaps have a Perl script I
> could adapt to my database to help with this task, such that the hits
> have a hyperlink which can be used to retrieve that specific entry? I
> have limited knowledge of Perl. Thank you.
>
> With Kind Regards,
>
> Nelson.

--
______ _/ _/_____________________________________________________
_/ _/
_/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
_/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From David.Messina at sbc.su.se Tue Apr 8 07:29:12 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 8 Apr 2008 13:29:12 +0200
Subject: [Bioperl-l] How to analysis the relationship of my interesting KEGG pathways?
In-Reply-To: <628aabb70804080053g1fd9120ex9d5fd12f65f216f9@mail.gmail.com>
References: <628aabb70804080053g1fd9120ex9d5fd12f65f216f9@mail.gmail.com>
Message-ID: <628aabb70804080429k2aa17a6eu12197709d4cc1af0@mail.gmail.com>

Hi Jinyan,

You asked a similar question last week and received a couple of
suggestions -- did you take a look at those?

I'm not an expert on this topic, but I believe that since regulatory
information is much harder to obtain experimentally and therefore much
less well known, there isn't a lot of it in pathway databases like KEGG.
You may have to look through the literature and start trying to put
together possible regulatory links on your own.

Dave

From hrh at sanger.ac.uk Tue Apr 8 08:48:32 2008
From: hrh at sanger.ac.uk (Hans Rudolf Hotz)
Date: Tue, 8 Apr 2008 13:48:32 +0100 (BST)
Subject: [Bioperl-l] Blast database sequence retrieval perl script
In-Reply-To: <200804081032.12312.heikki@sanbi.ac.za>
References: <6BEABCD5CA640A44A848448A42A03B73079E48C9@ilrikeadx1.ILRI.CGIARAD.ORG>
 <200804081032.12312.heikki@sanbi.ac.za>
Message-ID:

Nelson

or simply use the BLAST indices for the sequence retrieval as well. All
you need to do is add the "-o" option to the 'formatdb' command when
creating the BLAST index (this will create some extra files). Then you
can use 'fastacmd' (which is also part of the NCBI BLAST package) to
retrieve the sequences.

Hans

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From robert.citek at gmail.com Tue Apr 8 10:09:27 2008
From: robert.citek at gmail.com (Robert Citek)
Date: Tue, 8 Apr 2008 09:09:27 -0500
Subject: [Bioperl-l] module for pubchem queries
In-Reply-To: <15B44EC6-3660-4925-BA7A-6763D51E6837@uiuc.edu>
References: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com>
 <15B44EC6-3660-4925-BA7A-6763D51E6837@uiuc.edu>
Message-ID: <4145b6790804080709l20f1e56erf4b7af04b0a52870@mail.gmail.com>

Wrapping bioperl around eutils will work just fine. Thanks for the pointer.

http://search.cpan.org/~sendu/bioperl-1.5.2_102/Bio/DB/EUtilities.pm

Regards,
- Robert

On Fri, Apr 4, 2008 at 4:25 PM, Chris Fields wrote:
> Do you need something to access eutils via BioPerl, or are you looking for a
> specific set of classes? I wrote an interface to eutils
> (Bio::DB::EUtilities), you could do something like this:
>
> #!/usr/bin/perl -w
>
> use strict;
> use warnings;
> use Bio::DB::EUtilities;
>
> my $eutil = Bio::DB::EUtilities->new(-eutil  => 'esearch',
>                                      -term   => 'dihydroorotate',
>                                      -db     => 'pcsubstance',
>                                      -retmax => 1000);
>
> print join(',',$eutil->get_ids)."\n";
>
> chris

From cjfields at uiuc.edu Tue Apr 8 11:10:26 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 8 Apr 2008 10:10:26 -0500
Subject: [Bioperl-l] module for pubchem queries
In-Reply-To: <4145b6790804080709l20f1e56erf4b7af04b0a52870@mail.gmail.com>
References: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com>
 <15B44EC6-3660-4925-BA7A-6763D51E6837@uiuc.edu>
 <4145b6790804080709l20f1e56erf4b7af04b0a52870@mail.gmail.com>
Message-ID: <32D210FC-575E-4D95-95DA-FC6F5BE1FC24@uiuc.edu>

Just to note, the API has changed significantly from the interface in
the 1.5.2 release. The up-to-date (supported) interface is in
subversion; there are some example recipes here:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

I'm working on a full HOWTO, just haven't had time to get it up on the
wiki yet.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cuiw at ncbi.nlm.nih.gov Tue Apr 8 16:41:58 2008
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Tue, 8 Apr 2008 16:41:58 -0400
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To: <47F9F3AA.2090003@uv.es>
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com>
 <47F9F3AA.2090003@uv.es>
Message-ID: <6F230E9769AA8D4EB4BC401DF133EDB7180BE0@NIHCESMLBX15.nih.gov>

Hi, Miguel:

id1_fetch can do it. Detailed instructions can be found at:

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=toolkit.section.ch_demo.id1_fetch.html

Here is an example:

>id1_fetch -lt revisions -flat '12:74311105' -fmt fasta
GI       Loaded     DB   Retrieval No.
--       ------     --   -------------
74311105 12/07/2007 NCBI 19766263
74311105 01/23/2007 NCBI 16325656
74311105 03/30/2006 NCBI 13131204
74311105 03/03/2006 NCBI 12915541
74311105 03/02/2006 NCBI 12885275
74311105 12/03/2005 NCBI 12259793
74311105 09/09/2005 NCBI 11257262
74311105 09/09/2005 NCBI 11242667

Wenwu Cui PhD
NCBI/NLM/NIH

From miguel.pignatelli at uv.es Wed Apr 9 07:32:39 2008
From: miguel.pignatelli at uv.es (Miguel Pignatelli)
Date: Wed, 09 Apr 2008 13:32:39 +0200
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To: <6F230E9769AA8D4EB4BC401DF133EDB7180BE0@NIHCESMLBX15.nih.gov>
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com>
 <47F9F3AA.2090003@uv.es>
 <6F230E9769AA8D4EB4BC401DF133EDB7180BE0@NIHCESMLBX15.nih.gov>
Message-ID: <47FCA957.5040409@uv.es>

Wow, impressive, thanks Wenwu for the information, I have never used
this tool before. The problem is that I need to know the whole revision
history (or at least the creation date) for *all* the GIs present in nr
(well, or at least a significant portion of it) and this tool queries
via web.

The existence of this tool confirms to me that this information is
available somewhere. Is it possible to download the data that contains
this information?

Thanks again,

M;

From cuiw at ncbi.nlm.nih.gov Wed Apr 9 09:25:16 2008
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Wed, 9 Apr 2008 09:25:16 -0400
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To: <47FCA957.5040409@uv.es>
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com>
 <47F9F3AA.2090003@uv.es>
 <6F230E9769AA8D4EB4BC401DF133EDB7180BE0@NIHCESMLBX15.nih.gov>
 <47FCA957.5040409@uv.es>
Message-ID: <6F230E9769AA8D4EB4BC401DF133EDB7180BE1@NIHCESMLBX15.nih.gov>

Hi, Miguel,

I do not know whether the data file is publicly available.
However, you can perform a 'real time' query via id1_fetch:

####step 1: generate GI file #####
id1_fetch -query 'YOUR-GENBANK-QUERY-STRING' -lt none -db Nucleotide -out qfile

####step 2: retrieve revisions for GIs stored in qfile #####
id1_fetch -lt revisions -qf qfile -fmt fasta -db Nucleotide

Good luck!

Wenwu Cui

From CALLEY_JOHN_N at LILLY.COM Wed Apr 9 09:45:23 2008
From: CALLEY_JOHN_N at LILLY.COM (John N Calley)
Date: Wed, 9 Apr 2008 09:45:23 -0400
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To: <47FCA957.5040409@uv.es>
Message-ID:

You might want to keep in mind that the creation date is not always
reliable. I am aware of one example where the recorded creation date
precedes the sequencing date by several months (as determined by the
trace file date). NCBI was not able to explain exactly what happened but
(as I recall) hypothesized that some dates had been scrambled in a
database rebuild.
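
For anyone wanting a quick cross-check on a suspicious creation date, here is a minimal sketch that pulls the earliest 'Loaded' date per GI out of the id1_fetch revision listing posted earlier in this thread. It assumes the four whitespace-separated columns (GI, Loaded, DB, Retrieval No.) and the MM/DD/YYYY dates seen in that listing; the sub name earliest_loaded is made up for illustration and is not part of BioPerl or the NCBI toolkit. Whether the earliest revision really corresponds to the original deposition is a separate question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl or NCBI toolkit function): takes the
# text printed by "id1_fetch -lt revisions" and returns a hash ref mapping
# each GI to its earliest "Loaded" date, rewritten as YYYY-MM-DD.
sub earliest_loaded {
    my ($text) = @_;
    my %earliest;
    for my $line (split /\n/, $text) {
        # Data rows are assumed to look like: GI MM/DD/YYYY DB RetrievalNo
        # (header and separator rows do not match and are skipped)
        next unless $line =~ m#^\s*(\d+)\s+(\d\d)/(\d\d)/(\d{4})\s+\S+\s+\d+\s*$#;
        my ($gi, $iso) = ($1, "$4-$2-$3");
        # ISO-style dates compare correctly as plain strings
        $earliest{$gi} = $iso
            if !defined $earliest{$gi} || $iso lt $earliest{$gi};
    }
    return \%earliest;
}

# The revision listing for GI 74311105 as posted in this thread
my $report = <<'END';
GI       Loaded     DB   Retrieval No.
--       ------     --   -------------
74311105 12/07/2007 NCBI 19766263
74311105 01/23/2007 NCBI 16325656
74311105 03/30/2006 NCBI 13131204
74311105 03/03/2006 NCBI 12915541
74311105 03/02/2006 NCBI 12885275
74311105 12/03/2005 NCBI 12259793
74311105 09/09/2005 NCBI 11257262
74311105 09/09/2005 NCBI 11242667
END

my $first = earliest_loaded($report);
# prints the GI and 2005-09-09, separated by a tab
print "$_\t$first->{$_}\n" for sort keys %$first;
```

The same sub could be fed the output of the two-step id1_fetch run above, one listing per GI file, to build a table of earliest dates for a larger set of records.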
If there is interest I could probably pull up more details.

John Calley

From frederic.romagne at gmail.com Wed Apr 9 16:45:50 2008
From: frederic.romagne at gmail.com (=?ISO-8859-1?Q?Fr=E9d=E9ric_Romagn=E9?=)
Date: Wed, 09 Apr 2008 15:45:50 -0500
Subject: [Bioperl-l] question about clustalw module.
Message-ID: <1207773950.483.13.camel@kiss-laptop>

Hello,

I have a problem when using Bio::Tools::Run::Alignment::Clustalw:

I give it an array ref (the array contains some fasta sequences) and all
the right parameters, and I write the result via Bio::SeqIO.

The problem is that my result file only contains the accession number in
the header. An example:

the initial stream is:

>NM_052854 Homo sapiens cAMP responsive element binding protein 3-like 1 (CREB3L1), mRNA.
AGAAGACGTGCGGAGGGAGACGCAGAGACAGAGGAGAGGCCGGCAGCCACCCAGTCTCGG
GGGAGCACTTAGCTCCCCCGCCCCGGCTCCCACCCTGTCCGGGGGGCTCCTGAAGCCCTC
AGCCCCAACCCCGGGCTCCCCATGGAAGCCAGCTGTGCCCCAGGAGGAGCAGGAGGAGGT
GGAGTCGGCTGAATGCCCACGGTGCGCCCGGGGCCCCTGAGCCCATCCCGCTCCTAGCCG
CTGCCCTAAGGCCCCCGCGCGCCCCGCGCCCCCCACCCGGGGCCGCGCCGCCTCCGTCCG
CCCCTCCCCCGGGGCTTCGCCCCGGACCTGCCCCCCGCCCGTTTGCCAGCGCTCAGGCAG
GAGCTCTGGACTGGGCGCGCCGCCGCCCTGGAGTGAGGGAAGCCCAGTGGAAGGGGGTCC
CGGGAGCCGGCTGCGATGGACGCCGTCTTGGAACCCTTCCCGGCCGACAGGCTGTTCCCC
GGATCCAGCTTCCTGGACTTGGGGGATCTGAACGAGTCGGACTTCCTCAACAATGCGCAC
...

the result file is:

>NM_052854
---------------------------------------AGAAGACGTGCGGAGGGAGAC
GCAGAGACAGAGGAGAGGCCGGCAGCCACCCAGTCTCGGGGGAGCACTTAGCTCCCCCGC
CCCGGCTCCCACCCTGTCCGGGGGGCTCCTGAAGCCCTCAGCCCCAACCCCGGGCTCCCC
ATGGAAGCCAGCTGTGCCCCAGGAGGAGCAGGAGGAGGTGGAGTCGGCTGAATGCCCACG
GTGCGCCCGGGGCCCCTGAGCCCATCCCGCTCCTAGCCGCTGCCCTAAGGCCCCCGCGCG
CCCCGCGCCCCCCACCCGGGGCCGCGCCGCCTCCGTCCGCCCCTCCCCCGGGGCTTCGCC
CCGGACCTGCCCCCCGCCCGTTTGCCAGCGCTCAGGCAGGAGCTCTGGACTGGGCGCGCC
GCCGCCCTGGAGTGAGGGAAGCCCAGTGGAAGGGGGTCCCGGGAGCCGGCTGCGATGGAC
...

So I lost the other information provided by the header.

Is there any option to keep this information?

Here is a part of my code with my options:

my $seq_ref=\@seq;
my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM', 'quiet' => 1,
              'output' => 'FASTA');
my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
my $aln = $factory->align($seq_ref);

Thank you.

From jason at bioperl.org Wed Apr 9 16:55:13 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 9 Apr 2008 13:55:13 -0700
Subject: [Bioperl-l] question about clustalw module.
In-Reply-To: <1207773950.483.13.camel@kiss-laptop>
References: <1207773950.483.13.camel@kiss-laptop>
Message-ID:

The clustal alignment format does not allow for the description. If you
want to preserve it you'll have to add it back: make a hash indexed by
sequence ID and store the description, then when you get your alignment
back you can update the description field before writing it out with
AlignIO.

-jason

From lamq at usal.es Thu Apr 10 11:52:24 2008
From: lamq at usal.es (Luis A. M.
Quintales) Date: Thu, 10 Apr 2008 17:52:24 +0200 Subject: [Bioperl-l] xyplot glyph problem with previous aggregation Message-ID: <47FE37B8.9090404@usal.es> I am not able to add xyplot glyphs to one panel because I have some problems with the aggregations. Using that GFF file: ##sequence-region chr1 1 5578650 chr1 atfreq atpc 1 50 58.8000 . . atpc 1 chr1 atfreq atpc 51 100 58.4000 . . atpc 1 chr1 atfreq atpc 101 150 57.6000 . . atpc 1 chr1 atfreq atpc 151 200 57.8000 . . atpc 1 . . . And this source code for preparing the aggregated features necessary for the xyplot glyph: my $filin = $ARGV[0]; my $db = Bio::DB::GFF->new( -dsn => $filin, -adaptor => 'memory', -aggregator => 'at{atpc:atfreq}' ); my $segment = $db->segment('chr1'); my @features1 = $db->features('atpc'); print "$#features1 \n"; my @features2 = $segment->features('atpc'); print "$#features2 \n"; my @features3 = $db->features('at'); print "$#features3 \n"; my @features4 = $segment->features('at'); print "$#features4 \n"; I obtain: 111572 111572 0 0 What I am doing wrong with the aggregator? Many thanks.
From lincoln.stein at gmail.com Thu Apr 10 13:55:06 2008 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 10 Apr 2008 13:55:06 -0400 Subject: [Bioperl-l] xyplot glyph problem with previous aggregation In-Reply-To: <47FE37B8.9090404@usal.es> References: <47FE37B8.9090404@usal.es> Message-ID: <6dce9a0b0804101055w65e22abfgaa4f155751fef40f@mail.gmail.com> Hi Luis, When you aggregate the atpc 1 features together, you end up with one feature. Thus @features3 is an array of size 1. The $# operator returns the index of the last element, which is 0. If @features3 were empty, $#features3 would return -1. Lincoln On Thu, Apr 10, 2008 at 11:52 AM, Luis A. M. Quintales wrote: > I am not able to add xyplot glyphs to one panel because I have some > problems with the aggregations. > > Using that GFF file: > > ##sequence-region chr1 1 5578650 > chr1 atfreq atpc 1 50 58.8000 . . atpc 1 > chr1 atfreq atpc 51 100 58.4000 . . atpc 1 > chr1 atfreq atpc 101 150 57.6000 . . atpc 1 > chr1 atfreq atpc 151 200 57.8000 . . atpc 1 > . . . 
> > > And this source code for preparing the aggregated features necessary for > the xyplot glyph: > > my $filin = $ARGV[0]; > my $db = Bio::DB::GFF->new( -dsn => $filin, > -adaptor => 'memory', > -aggregator => 'at{atpc:atfreq}' > ); > my $segment = $db->segment('chr1'); > my @features1 = $db->features('atpc'); > print "$#features1 \n"; > my @features2 = $segment->features('atpc'); > print "$#features2 \n"; > my @features3 = $db->features('at'); > print "$#features3 \n"; > my @features4 = $segment->features('at'); > print "$#features4 \n"; > > I obtain: > > 111572 > 111572 > 0 > 0 > > What I am doing wrong with the aggregator? > > Many thanks. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From adsj at novozymes.com Fri Apr 11 04:53:23 2008 From: adsj at novozymes.com (Adam =?iso-8859-1?Q?Sj=F8gren?=) Date: Fri, 11 Apr 2008 10:53:23 +0200 Subject: [Bioperl-l] Bio::Factory::SequenceFactoryI SYNOPSIS example Message-ID: <87d4owixh8.fsf@topper.koldfront.dk> Hi. I am trying to make Bio::SeqIO return objects of my own type (a small extension of Bio::Seq::RichSeq), by setting -seqfactory. 
I am having a little trouble creating the correct object to pass with -seqfactory: Following the example given in SYNOPSIS of Bio::Factory::SequenceFactoryI, I get this error: $ perl -e ' > use Bio::Seq::SeqFactory; > my $seqbuilder = Bio::Seq::SeqFactory->new('type' => 'Bio::PrimarySeq'); > > my $seq = $seqbuilder->create(-seq => 'ACTGAT', > -display_id => 'exampleseq'); > > print "seq is a ", ref($seq), "\n"; > ' ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Can't locate type.pm in @INC (@INC contains: /z/bio/biotools/bioinfperlmodules/ /z/bio/adm/modules /etc/perl /usr/local/lib/perl/5.8.8 /usr/local/share/perl/5.8.8 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.8 /usr/share/perl/5.8 /usr/local/lib/site_perl .) at (eval 13) line 3. : Unrecognized Sequence type for SeqFactory 'type' STACK: Error::throw STACK: Bio::Root::Root::throw /usr/share/perl/5.8/Bio/Root/Root.pm:357 STACK: Bio::Seq::SeqFactory::type /usr/share/perl/5.8/Bio/Seq/SeqFactory.pm:134 STACK: Bio::Seq::SeqFactory::new /usr/share/perl/5.8/Bio/Seq/SeqFactory.pm:93 STACK: -e:3 ----------------------------------------------------------- $ If I go "Bio::Seq::SeqFactory('Bio::PrimarySeq'=>1)" instead, for instance, it seems to work: $ perl -e ' > use Bio::Seq::SeqFactory; > my $seqbuilder = Bio::Seq::SeqFactory->new('Bio::PrimarySeq'=>1); > > my $seq = $seqbuilder->create(-seq => 'ACTGAT', > -display_id => 'exampleseq'); > > print "seq is a ", ref($seq), "\n"; > ' seq is a Bio::PrimarySeq $ I was about to write a patch for the pod, when I realized that I'd better start by asking: Is this a buglet in the pod or the code? 
Best regards,

  Adam

--
Adam Sjøgren
adsj at novozymes.com

From hlapp at gmx.net Fri Apr 11 11:35:54 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 11 Apr 2008 11:35:54 -0400
Subject: [Bioperl-l] Bio::Factory::SequenceFactoryI SYNOPSIS example
In-Reply-To: <87d4owixh8.fsf@topper.koldfront.dk>
References: <87d4owixh8.fsf@topper.koldfront.dk>
Message-ID: <0037240B-F469-4388-972A-324101B11621@gmx.net>

On Apr 11, 2008, at 4:53 AM, Adam Sjøgren wrote:

> $ perl -e '
>> use Bio::Seq::SeqFactory;
>> my $seqbuilder = Bio::Seq::SeqFactory->new('type' =>
>> 'Bio::PrimarySeq');

You need to prefix the argument with a dash: '-type', not 'type'.
Otherwise, it assumes that the class you want instantiated is
'type.pm'.

	-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From 1zoujing at 163.com Thu Apr 10 01:08:52 2008
From: 1zoujing at 163.com (zoujing)
Date: Wed, 9 Apr 2008 22:08:52 -0700 (PDT)
Subject: [Bioperl-l] Bio::ASN1::EntrezGene parse so slowly?
Message-ID: <16602210.post@talk.nabble.com>

I want to parse the file "gene_info" from NCBI. The format of Gene in
NCBI is ASN1, right? So I used Bio::ASN1::EntrezGene, but it did not work
properly, or was too slow. The file is about 500M. The code is as follows:

use Bio::ASN1::EntrezGene;
my $parser = Bio::ASN1::EntrezGene->new('file' => $ARGV[0]);
my $i = 0;
while(my $result = $parser->next_seq)
{
    last; #something to do there, here use last for test
}

When it reaches the "while" part, it keeps processing and never exits,
even though I used "last" inside the loop. So I wonder whether it is too
slow, whether the module does not fit this job, or whether I did
something wrong.
Thank you!
--
View this message in context: http://www.nabble.com/Bio%3A%3AASN1%3A%3AEntrezGene-parse-so-slowly--tp16602210p16602210.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
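
For the gene_info question above, a minimal sketch of reading the file
without any ASN.1 parser may help. It assumes gene_info is plain
tab-delimited text with one record per line and the column names on the
first line (possibly prefixed with '#'). The helper name each_tab_row and
the three sample rows are made up for illustration; they are not part of
BioPerl or of the NCBI data. Rows are handed to a callback one at a time,
so a 500M file never has to be held in memory.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): read tab-delimited text whose
# first line holds the column names, and call $callback once per data row
# with a hash reference mapping column name to value.
sub each_tab_row {
    my ($fh, $callback) = @_;

    my $header = <$fh>;
    return 0 unless defined $header;
    chomp $header;
    $header =~ s/^#\s*//;                  # assumed: header may start with '#'
    my @cols = split /\t/, $header;

    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '' || $line =~ /^#/;   # skip blank and comment lines
        my @vals = split /\t/, $line, -1;       # -1 keeps trailing empty fields
        my %row;
        @row{@cols} = @vals;
        $callback->(\%row);
        $count++;
    }
    return $count;
}

# Made-up sample standing in for the first lines of a real gene_info file.
my $sample = "#tax_id\tGeneID\tSymbol\n"
           . "9823\t1\tGENE_A\n"
           . "9823\t2\tGENE_B\n";

open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
my @symbols;
my $count = each_tab_row($fh, sub { push @symbols, $_[0]{Symbol} });
close $fh;

print "$count rows: @symbols\n";
```

For the real file one would open it in place of the in-memory sample, for
example through a pipe from "gzip -dc gene_info.gz" if it is still
compressed.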
From 1zoujing at 163.com Thu Apr 10 02:17:41 2008 From: 1zoujing at 163.com (zoujing) Date: Wed, 9 Apr 2008 23:17:41 -0700 (PDT) Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags" Message-ID: <16602770.post@talk.nabble.com> I am a geen hand in Bioperl. When I run perl with "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error information: Data Error: none conforming data found on line 1 in Sus_scrofa.ags. But the Sus_scrofa.ags is download from NCBI, with the format of ASN1, should be the same as Homo_sapiens in the example. So it should be no error as the code is the example from Mingyi. I wonder why this happen, and should I change something about the file? -- View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16602770.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From 1zoujing at 163.com Thu Apr 10 02:56:52 2008 From: 1zoujing at 163.com (zoujing) Date: Wed, 9 Apr 2008 23:56:52 -0700 (PDT) Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags" In-Reply-To: <16602770.post@talk.nabble.com> References: <16602770.post@talk.nabble.com> Message-ID: <16603225.post@talk.nabble.com> Seached the web and found the answer now, quote the answer as following: The error was thrown by my Bio::ASN1::EntrezGene module because it expects a text file, while you fed it with a binary file. To use gzipped ASN binary file from NCBI, download the NCBI gene2xml (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml), then use this syntax to run my parser on the binary files: my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i Homo_sapiens.ags.gz -c -x -b | "); # Homo_sapiens.ags.gz is the gzipped binary file directly downloaded from NCBI Same syntax should be used when you're using SeqIO (thus SeqIO::entrezgene). Mingyi zoujing wrote: > > I am a geen hand in Bioperl. 
When I run perl with > "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error > information: > Data Error: none conforming data found on line 1 in Sus_scrofa.ags. > > But the Sus_scrofa.ags is download from NCBI, with the format of ASN1, > should be the same as Homo_sapiens in the example. So it should be no > error as the code is the example from Mingyi. > I wonder why this happen, and should I change something about the file? > > -- View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16603225.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
From 1zoujing at 163.com Thu Apr 10 03:10:26 2008 From: 1zoujing at 163.com (zoujing) Date: Thu, 10 Apr 2008 00:10:26 -0700 (PDT) Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags" Message-ID: <16603225.post@talk.nabble.com> Seached the web and found the answer now, quote the answer as following: The error was thrown by my Bio::ASN1::EntrezGene module because it expects a text file, while you fed it with a binary file. To use gzipped ASN binary file from NCBI, download the NCBI gene2xml (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml), then use this syntax to run my parser on the binary files: my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i Homo_sapiens.ags.gz -c -x -b | "); # Homo_sapiens.ags.gz is the gzipped binary file directly downloaded from NCBI Same syntax should be used when you're using SeqIO (thus SeqIO::entrezgene). Mingyi But there is still one thing, I want to parse "gene_info.gz" in Gene of NCBI. ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz. It doesn't work.Is that means "gene_info.gz"( tab-delimited,one line per GeneID, Column header line is the first line in the file) is not the right format for Bio::ASN1::EntrezGene? zoujing wrote: > > I am a geen hand in Bioperl. 
When I run perl with > "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error > information: > Data Error: none conforming data found on line 1 in Sus_scrofa.ags. > > But the Sus_scrofa.ags is download from NCBI, with the format of ASN1, > should be the same as Homo_sapiens in the example. So it should be no > error as the code is the example from Mingyi. > I wonder why this happen, and should I change something about the file? > > -- View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16603225.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From stefan.kirov at bms.com Fri Apr 11 15:59:29 2008 From: stefan.kirov at bms.com (Stefan Kirov) Date: Fri, 11 Apr 2008 15:59:29 -0400 (Eastern Daylight Time) Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags" In-Reply-To: <16602770.post@talk.nabble.com> References: <16602770.post@talk.nabble.com> Message-ID: AGS is a binary ASN.1 format and WILL NOT be parsed! You have to use gene2xml( weird, but this is NCBI) with these flags: -c -x -b -i. This will spit out text ASN which can be parsed. Stefan On Wed, 9 Apr 2008, zoujing wrote: > > I am a geen hand in Bioperl. When I run perl with > "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error > information: > Data Error: none conforming data found on line 1 in Sus_scrofa.ags. > > But the Sus_scrofa.ags is download from NCBI, with the format of ASN1, > should be the same as Homo_sapiens in the example. So it should be no error > as the code is the example from Mingyi. > I wonder why this happen, and should I change something about the file? > > -- > View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16602770.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From stefan.kirov at bms.com Fri Apr 11 16:01:30 2008 From: stefan.kirov at bms.com (Stefan Kirov) Date: Fri, 11 Apr 2008 16:01:30 -0400 (Eastern Daylight Time) Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags" In-Reply-To: <16603225.post@talk.nabble.com> References: <16603225.post@talk.nabble.com> Message-ID: It is not. If you use this file, why would you need a parser for it anyway? Just split on \t or read with OpenOffice or equiv. Stefan On Thu, 10 Apr 2008, zoujing wrote: > > Seached the web and found the answer now, quote the answer as following: > The error was thrown by my Bio::ASN1::EntrezGene module because it > expects a text file, while you fed it with a binary file. To use > gzipped ASN binary file from NCBI, download the NCBI gene2xml > (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml), > then use this syntax to run my parser on the binary files: > > my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i > Homo_sapiens.ags.gz -c -x -b | "); # Homo_sapiens.ags.gz is the gzipped > binary file directly downloaded from NCBI > > Same syntax should be used when you're using SeqIO (thus SeqIO::entrezgene). > Mingyi > > But there still one thing, I want to parse "gene_info.gz" in Gene of > NCBI. It doesn't work.Is that means "gene_info.gz"( tab-delimited,one line > per GeneID, Column header line is the first line in the file > ) is not the right format for Bio::ASN1::EntrezGene? > > > > zoujing wrote: >> >> I am a geen hand in Bioperl. When I run perl with >> "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error >> information: >> Data Error: none conforming data found on line 1 in Sus_scrofa.ags. 
>> >> But the Sus_scrofa.ags is download from NCBI, with the format of ASN1, >> should be the same as Homo_sapiens in the example. So it should be no >> error as the code is the example from Mingyi. >> I wonder why this happen, and should I change something about the file? >> >> > > -- > View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16603225.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From asjo at koldfront.dk Fri Apr 11 15:39:59 2008 From: asjo at koldfront.dk (Adam =?iso-8859-1?Q?Sj=F8gren?=) Date: Fri, 11 Apr 2008 21:39:59 +0200 Subject: [Bioperl-l] Bio::Factory::SequenceFactoryI SYNOPSIS example In-Reply-To: <0037240B-F469-4388-972A-324101B11621@gmx.net> (Hilmar Lapp's message of "Fri, 11 Apr 2008 11:35:54 -0400") References: <87d4owixh8.fsf@topper.koldfront.dk> <0037240B-F469-4388-972A-324101B11621@gmx.net> Message-ID: <877if4i3jk.fsf@topper.koldfront.dk> On Fri, 11 Apr 2008 11:35:54 -0400, Hilmar wrote: > On Apr 11, 2008, at 4:53 AM, Adam Sj?gren wrote: >>> my $seqbuilder = Bio::Seq::SeqFactory->new('type' => >>> 'Bio::PrimarySeq'); > You need to prefix the argument with a dash: '-type', not 'type'. > Otherwise, it assumes that the class you want instantiated is > 'type.pm'. I guess that means I should submit a patch for the SYNOPSIS. Attached. 
Thanks,

  Adam

Index: Bio/Factory/SequenceFactoryI.pm
===================================================================
--- Bio/Factory/SequenceFactoryI.pm (revision 14654)
+++ Bio/Factory/SequenceFactoryI.pm (working copy)
@@ -20,7 +20,7 @@
     # get a Bio::Factory::SequenceFactoryI object like
 
     use Bio::Seq::SeqFactory;
-    my $seqbuilder = Bio::Seq::SeqFactory->new('type' => 'Bio::PrimarySeq');
+    my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');
 
     my $seq = $seqbuilder->create(-seq => 'ACTGAT',
                                   -display_id => 'exampleseq');

--
 "Well, I'm a moon around you"                         Adam Sjøgren
                                                  asjo at koldfront.dk

From bamboowarrior at gmail.com Fri Apr 11 19:10:35 2008
From: bamboowarrior at gmail.com (Arkady)
Date: Fri, 11 Apr 2008 18:10:35 -0500
Subject: [Bioperl-l] Nucleotide Links in Gene DB (GenBank)
Message-ID: <91656c3f0804111610r24c8fa5es5bcb56b7a59e0208@mail.gmail.com>

Hi everyone,

I'm a bioperl n00b. Actually, kind of a genbank n00b, too, as I'm from a
CS background and just started bio things last June.

I'm trying to set up an analysis pipeline of primate protein CDSs (the
nucleotide seqs). I've written a script which does a pretty decent job of
downloading these from GenBank--but it's inconsistent, because a lot of
sequences in nucleotide are 'predicted' and named LOCthisorthat instead of
by gene name.

So what I was thinking was this (assume ANKRD43 is the gene for this
example):

1. Search 'gene' database for ANKRD43 AND (PRI*[ORGN])
   On NCBI, there's an option to show all nucleotide links. How do I get a
   list of those in bioperl? Can bioperl even search 'gene', or just
   'nucleotide'?
2. Search 'nucleotide' for the referenced items from #1, and also for
   ANKRD43[TITL] AND (PRI*[ORGN]), save CDSes.
3. BLAST mRNA for one of those CDSes, see if we pick up any other matches.
4. BLAT other primates for CDSes, see if we find anything not in GenBank.

On the other hand, I always get the feeling I'm doing things the hard
way--especially here, with #1 and #2.
Is there a much more obvious, simple way to do this? Thanks, folks. Cheers, John Woods Institute for Cellular and Molecular Biology The University of Texas at Austin From hlapp at gmx.net Fri Apr 11 19:19:44 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 11 Apr 2008 19:19:44 -0400 Subject: [Bioperl-l] Bio::Factory::SequenceFactoryI SYNOPSIS example In-Reply-To: <877if4i3jk.fsf@topper.koldfront.dk> References: <87d4owixh8.fsf@topper.koldfront.dk> <0037240B-F469-4388-972A-324101B11621@gmx.net> <877if4i3jk.fsf@topper.koldfront.dk> Message-ID: Thanks, applied. -hilmar On Apr 11, 2008, at 3:39 PM, Adam Sj?gren wrote: > On Fri, 11 Apr 2008 11:35:54 -0400, Hilmar wrote: > >> On Apr 11, 2008, at 4:53 AM, Adam Sj?gren wrote: > >>>> my $seqbuilder = Bio::Seq::SeqFactory->new('type' => >>>> 'Bio::PrimarySeq'); > >> You need to prefix the argument with a dash: '-type', not 'type'. >> Otherwise, it assumes that the class you want instantiated is >> 'type.pm'. > > I guess that means I should submit a patch for the SYNOPSIS. Attached. 
> > > Thanks, > > Adam > > > Index: Bio/Factory/SequenceFactoryI.pm > =================================================================== > --- Bio/Factory/SequenceFactoryI.pm (revision 14654) > +++ Bio/Factory/SequenceFactoryI.pm (working copy) > @@ -20,7 +20,7 @@ > # get a Bio::Factory::SequenceFactoryI object like > > use Bio::Seq::SeqFactory; > - my $seqbuilder = Bio::Seq::SeqFactory->new('type' => > 'Bio::PrimarySeq'); > + my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => > 'Bio::PrimarySeq'); > > my $seq = $seqbuilder->create(-seq => 'ACTGAT', > -display_id => 'exampleseq'); > > -- > "Well, I'm a moon around you" Adam > Sj?gren > > asjo at koldfront.dk > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From mmokrejs at ribosome.natur.cuni.cz Fri Apr 11 21:32:14 2008 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Sat, 12 Apr 2008 03:32:14 +0200 Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon_id In-Reply-To: References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com> <32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net> <320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com> <93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com> Message-ID: <4800111E.3030802@ribosome.natur.cuni.cz> Chris Fields wrote: > The counter to that perspective (using new sequences with old tax info) > would be to regularly update NCBI taxonomy, particularly in > circumstances prior to adding new sequences. Hilmar mentioned that once > tax is loaded it doesn't take as long to update, so you could set up a > cron job to update regularly. 
> > I remember someone mentioning weekly or monthly updates on the list > quite a while ago, but I'm unsure how often NCBI updates tax information > (i.e. with every release, monthly, weekly, etc). I can see instances > popping up where you used the an up-to-date taxonomy but a new sequence > contains a tax ID not present. I think bioperl-db handles these but I'm > not sure what other Bio* do. > I spent some time benchmarking this and inspecting the mysql log files. The current load_ncbi_taxonomy.pl script with minor modification to show timestamps does this on initial import into mysql and then update of the database using exactly same dataset (but anyway it has to walk through all the data): $ ./load_ncbi_taxonomy.pl --dbname=biosqldb --driver=mysql --host=127.0.01 \ --port=3306 --directory=/home/mmokrejs/bioinformatics/databases/ncbitax/dump \ --chunksize=0 --verbose=2 --mycnf=~/.my.cnf Sat Apr 12 01:58:43 MEST 2008 Loading NCBI taxon database in /home/mmokrejs/bioinformatics/databases/ncbitax/dump: ... retrieving all taxon nodes in the database Sat Apr 12 01:58:43 MEST 2008 ... reading in taxon nodes from nodes.dmp Sat Apr 12 01:58:58 MEST 2008 ... insert / update / delete taxon nodes 10000/421098 done (in 5 secs, 2000.0 rows/s) 20000/421098 done (in 4 secs, 2500.0 rows/s) ... 420000/421098 done (in 4 secs, 2500.0 rows/s) Sat Apr 12 02:02:21 MEST 2008 ... (committing nodes) Sat Apr 12 02:02:21 MEST 2008 ... rebuilding nested set left/right values 10000 done (in 24 secs, 416.7 rows/s) 20000 done (in 26 secs, 384.6 rows/s) 30000 done (in 24 secs, 416.7 rows/s) ... 420004 done (in 23 secs, 434.8 rows/s) Sat Apr 12 02:19:25 MEST 2008 ... reading in taxon names from names.dmp Sat Apr 12 02:19:25 MEST 2008 ... deleting old taxon names Sat Apr 12 02:19:25 MEST 2008 ... inserting new taxon names 10000 done (in 8 secs, 1250.0 rows/s) 20000 done (in 8 secs, 1250.0 rows/s) ... 580000 done (in 5 secs, 2000.0 rows/s) Sat Apr 12 02:24:48 MEST 2008 ... 
cleaning up Sat Apr 12 02:24:49 MEST 2008 Done. $ I decided to re-import the same data to mimic at least somehow the future updates, although no record should be UPDATEd, except zapping left and right values with NULL. :(( $ ./load_ncbi_taxonomy.pl --dbname=biosqldb --driver=mysql --host=127.0.01 --port=3306 --directory=/home/mmokrejs/bioinformatics/databases/ncbitax/dump \ --chunksize=0 --verbose=2 --mycnf=~/.my.cnf Sat Apr 12 02:35:20 MEST 2008 Loading NCBI taxon database in /home/mmokrejs/bioinformatics/databases/ncbitax/dump: ... retrieving all taxon nodes in the database Sat Apr 12 02:35:26 MEST 2008 ... reading in taxon nodes from nodes.dmp Sat Apr 12 02:35:46 MEST 2008 ... insert / update / delete taxon nodes 10000/421098 done (in 0 secs, 10000.0 rows/s) 20000/421098 done (in 0 secs, 10000.0 rows/s) ... 410000/421098 done (in 0 secs, 10000.0 rows/s) 420000/421098 done (in 0 secs, 10000.0 rows/s) Sat Apr 12 02:35:55 MEST 2008 ... (committing nodes) Sat Apr 12 02:35:55 MEST 2008 ... rebuilding nested set left/right values 10000 done (in 9 secs, 1111.1 rows/s) 20000 done (in 9 secs, 1111.1 rows/s) ... 410004 done (in 8 secs, 1250.0 rows/s) 420004 done (in 9 secs, 1111.1 rows/s) Sat Apr 12 02:41:54 MEST 2008 ... reading in taxon names from names.dmp Sat Apr 12 02:41:54 MEST 2008 ... deleting old taxon names Sat Apr 12 02:41:55 MEST 2008 ... inserting new taxon names 10000 done (in 5 secs, 2000.0 rows/s) 20000 done (in 5 secs, 2000.0 rows/s) ... 570000 done (in 6 secs, 1666.7 rows/s) 580000 done (in 5 secs, 2000.0 rows/s) Sat Apr 12 02:47:27 MEST 2008 ... cleaning up Sat Apr 12 02:47:27 MEST 2008 Done. $ ls -la /var/log/mysql/mysql.log -rw-rw---- 1 mysql mysql 483443314 Apr 12 03:15 /var/log/mysql/mysql.log $ Pentium4 M laptop, 1.8GHz, 1 GB RAM, mysql-5.0.56 with enabled SQL text logging, the slow version of logging all SQL commands compared to binary logging. The log was cleared before the tests. 
I could provide some bits from the log or upload it somewhere if anybody
else would like to dig into the details. I believe the recalculation step
could be made faster. See what happens:

31 Query SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '1' ORDER BY ncbi_taxon_id
31 Query SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '10239' ORDER BY ncbi_taxon_id
31 Query SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '12333' ORDER BY ncbi_taxon_id
31 Query SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '12335' ORDER BY ncbi_taxon_id
31 Query UPDATE taxon SET left_value = NULL, right_value = NULL WHERE left_value = '4'
31 Query UPDATE taxon SET left_value = NULL, right_value = NULL WHERE right_value = '5'
31 Query UPDATE taxon SET left_value = '4', right_value = '5' WHERE taxon_id = '12335'
31 Query SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '12340' ORDER BY ncbi_taxon_id
31 Query UPDATE taxon SET left_value = NULL, right_value = NULL WHERE left_value = '6'
31 Query UPDATE taxon SET left_value = NULL, right_value = NULL WHERE right_value = '7'
31 Query UPDATE taxon SET left_value = '6', right_value = '7' WHERE taxon_id = '12340'

The columns left_value and right_value are NULL when the table is
created, so there is no need to write NULL into them again. Avoiding that
would mean writing a wrapper function that mimics update() but first runs
a 'SELECT * FROM', compares the stored values with those about to be
written, and includes in the final UPDATE statement only those columns
whose values have changed. We use such a smart wrapper for our code in
python. ;-)

When the left and right columns have to be set to NULL during an update
of an existing database, I think it would be much faster to drop the
columns and re-create them with NULL values.
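
A minimal sketch of the change-detecting update wrapper described above, written in Perl rather than the Python mentioned in the message. It is not code from load_ncbi_taxonomy.pl or bioperl-db: the helper names changed_columns and build_update are hypothetical, and the row that a real wrapper would fetch with a SELECT is replaced here by a literal hash so that the example runs without a database. undef stands for SQL NULL.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a hash ref holding only the columns whose new
# value differs from the value currently stored. undef stands for SQL NULL.
sub changed_columns {
    my ($current, $new) = @_;
    my %changed;
    for my $col (sort keys %$new) {
        my ($old_val, $new_val) = ($current->{$col}, $new->{$col});
        next if !defined $old_val && !defined $new_val;    # NULL stays NULL
        next if defined $old_val && defined $new_val
             && $old_val eq $new_val;                      # value unchanged
        $changed{$col} = $new_val;
    }
    return \%changed;
}

# Hypothetical helper: build an UPDATE statement with placeholders for the
# changed columns only. Returns an empty list when nothing has to be written.
sub build_update {
    my ($table, $key_col, $key_val, $changed) = @_;
    my @cols = sort keys %$changed;
    return unless @cols;
    my $set = join ', ', map { "$_ = ?" } @cols;
    my $sql = "UPDATE $table SET $set WHERE $key_col = ?";
    return ($sql, [ @{$changed}{@cols}, $key_val ]);
}

# The row as a preceding SELECT would have returned it (assumed values) ...
my %row_in_db = (taxon_id => 12335, left_value => undef, right_value => undef);
# ... and the values the nested-set rebuild wants to store.
my %wanted = (left_value => 4, right_value => 5);

my ($sql, $bind) = build_update('taxon', 'taxon_id', 12335,
                                changed_columns(\%row_in_db, \%wanted));
print "$sql\n" if defined $sql;
```

With DBI, the returned statement and bind values could be passed on as $dbh->do($sql, undef, @$bind). When no column differs, build_update returns nothing and the UPDATE can be skipped altogether, which is the saving the message is after.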

I think it would be worth investigating whether the taxon and taxon_name
tables could be created empty as MyISAM tables and converted into InnoDB
tables only after all the imports and updates are done. One would
probably have to think a bit more about the foreign keys, but it might be
that they are not even lost during the conversion back and forth.
Actually, that is easy to check. Dump your current taxon and taxon_name
tables (maybe even without the row data, using --no-data), run
'ALTER TABLE taxon ... type=MyISAM' followed by
'ALTER TABLE taxon ... type=InnoDB', dump the database structure again
and compare it by diff with the original.

But, time for sleep here.
Martin

From sdavis2 at mail.nih.gov Fri Apr 11 23:50:44 2008
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Fri, 11 Apr 2008 23:50:44 -0400
Subject: [Bioperl-l] Bio::ASN1::EntrezGene parse so slowly?
In-Reply-To: <16602210.post@talk.nabble.com>
References: <16602210.post@talk.nabble.com>
Message-ID: <264855a00804112050gf785c2ei66d9c7463597eccd@mail.gmail.com>

gene_info is a tab-delimited text file, if I recall correctly. Have you
looked at it? If it is, you should be able to parse it in a few seconds
with just a couple of lines of code.

Sean

On Thu, Apr 10, 2008 at 1:08 AM, zoujing <1zoujing at 163.com> wrote:
>
> I want to parse a file "gene_info" from NCBI. The format of Gene in NCBI is
> ASN1, right? So I used Bio::ASN1::EntrezGene. But it didn't work
> properly/was too slow. The file is about 500M.
> The code is as follows:
> use Bio::ASN1::EntrezGene;
> my $parser = Bio::ASN1::EntrezGene->new('file' => $ARGV[0]);
> my $i = 0;
> while(my $result = $parser->next_seq)
> {
>   last; #something to do there, here use last for test
> }
>
> When it gets to the "while" part, it keeps processing on and on and never
> exits, even though I used "last" in the "while" part.
> So I wonder whether it is too slow, or the module is not fit for this job,
> or I did something wrong?
>
> Thank you!
> --
> View this message in context: http://www.nabble.com/Bio%3A%3AASN1%3A%3AEntrezGene-parse-so-slowly--tp16602210p16602210.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From david at burt7259.freeserve.co.uk Sat Apr 12 13:01:57 2008
From: david at burt7259.freeserve.co.uk (David Burt)
Date: Sat, 12 Apr 2008 18:01:57 +0100
Subject: [Bioperl-l] bioperl-db
Message-ID:

Hi Hilmar,

Hope you can help - I am using bioperl-db to create a biosql database

I have used scripts load_seqdatabase.pl and load_ontology.pl to install
human swissprot entries, gene ontology, sequence ontology and now want to
load interpro

Here's the command line I have tried

perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser root --dbpass chicken --driver mysql \
 --namespace "InterPro" --format InterPro interpro.xml

But I get this message

Can't call method "identifier" on an undefined value at /cygdrive/c/Bioinformatics/Ensembl/src/bioperl-live/Bio/Ontology/SimpleOntologyEngine.pm line 395

Any ideas?

Dave

PS: here's the top of the interpro.xml file

Kringle

From hlapp at gmx.net Sat Apr 12 14:10:44 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 12 Apr 2008 14:10:44 -0400
Subject: [Bioperl-l] personal vs list email
Message-ID:

I'm not sure why but I have received several Bioperl or BioSQL-related
email inquiries directed to me *personally* over the past few weeks.

I have been responding as I get to them, but I feel that I am doing both
the senders and this community a poor service, because sometimes someone
else on the list could have responded much faster, and when I respond,
others on the list who happen to be interested in the same question don't
get to see the answer.

So from now on as a policy I will redirect *every* email sent to me
personally and that asks a question related to one of the projects to the
respective mailing list. If you don't want this, please conspicuously say
so at the top of your email, and in that case if you do ask a
project-related question be prepared to wait and to possibly needing to
follow up.

As an aside, it's a pretty safe assumption to make that all other core
developers, and quite possibly *all* developers are following a similar
policy, whether expressly or not.

Isn't this somewhere in the FAQ too?

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Sat Apr 12 14:16:13 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 12 Apr 2008 14:16:13 -0400
Subject: [Bioperl-l] bioperl-db
In-Reply-To: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC>
References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC>
Message-ID:

Hi Burt,

can you try format interprosax instead of interpro? That variant is also
much more graceful regarding required space.

-hilmar

On Apr 12, 2008, at 1:01 PM, David Burt wrote:
> Hi Hilmar,
>
> Hope you can help - I am using bioperl-db to create a biosql database
>
> I have used scripts load_seqdatabase.pl and load_ontology.pl to
> install human swissprot entries, gene ontology, sequence ontology
> and now want to load interpro
>
> Here's the command line I have tried
>
> perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser
> root --dbpass chicken --driver mysql \
> --namespace "InterPro" --format InterPro interpro.xml
>
> But I get this message
>
> Can't call method "identifier" on an undefined value at
> /cygdrive/c/Bioinformatics/Ensembl/src/bioperl-live/Bio/Ontology/SimpleOntologyEngine.pm
> line 395
>
> Any ideas?
>
> Dave
>
> PS: here's the top of the interpro.xml file
>
> file_date="04-OCT-2006 00:00:00" />
> file_date="22-NOV-2006 00:00:00" />
> file_date="12-JUN-2007 00:00:00" />
> file_date="22-SEP-2005 00:00:00" />
> file_date="23-APR-2004 00:00:00" />
> file_date="14-NOV-2006 00:00:00" />
> file_date="27-JUL-2007 00:00:00" />
> file_date="28-SEP-2007 00:00:00" />
> file_date="11-SEP-2006 00:00:00" />
> file_date="30-NOV-2006 00:00:00" />
> entry_count="359942" file_date="18-MAR-2008 00:00:00" />
> file_date="18-MAR-2008 00:00:00" />
> file_date="19-MAR-2008 00:00:00" />
> file_date="27-MAR-2007 00:00:00" />
> file_date="12-JUL-2007 16:56:17" />
>
> protein_count="352">
> Kringle
>

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at uiuc.edu Sat Apr 12 16:17:43 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 12 Apr 2008 15:17:43 -0500
Subject: [Bioperl-l] [BioSQL-l] personal vs list email
In-Reply-To:
References:
Message-ID:

On Apr 12, 2008, at 1:10 PM, Hilmar Lapp wrote:
> I'm not sure why but I have received several Bioperl or BioSQL-
> related email inquiries directed to me *personally* over the past
> few weeks.
>
> I have been responding as I get to them, but I feel that I am doing
> both the senders and this community a poor service, because
> sometimes someone else on the list could have responded much faster,
> and when I respond, others on the list who happen to be interested
> in the same question don't get to see the answer.
>
> So from now on as a policy I will redirect *every* email sent to me
> personally and that asks a question related to one of the projects
> to the respective mailing list.
If you don't want this, please > conspicuously say so at the top of your email, and in that case if > you do ask a project-related question be prepared to wait and to > possibly needing to follow up. > > As an aside, it's a pretty safe assumption to make that all other > core developers, and quite possibly *all* developers are following a > similar policy, whether expressly or not. I agree; I'm sure several other core devs feel the same way. I always try to forward these to the list if I feel it is more relevant there. > Isn't this somewhere in the FAQ too? > > -hilmar No, but I've added it to the bioperl FAQ; might be worth checking over and editing. chris From hlapp at gmx.net Sat Apr 12 18:40:53 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 12 Apr 2008 18:40:53 -0400 Subject: [Bioperl-l] bioperl-db In-Reply-To: <000001c89ce2$5400a710$0202a8c0@STUDYPC> References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC> <000001c89ce2$5400a710$0202a8c0@STUDYPC> Message-ID: <3F77F49A-9C9E-4450-AE28-46F00CADBC8B@gmx.net> Burt - please keep your replies on the list. Others may have input too, or benefit from the answer too. As there is no name() method call on line 914 in the current version let's check first that you run a current version of BioPerl. It will need to be at least 1.5.2. However, I do suspect a problem in either the InterPro file itself (wouldn't be the first time), or the InterPro parser. -hilmar On Apr 12, 2008, at 5:15 PM, David Burt wrote: > Hilmar > > Many thanks seems to be working > > But got this output ? any comments/ideas what it means ? > > Dave > > > perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser > root --dbpass chicken --driver mysql \ > > --namespace "InterPro" --format interprosax interpro.xml > ...deleting all relationships for InterPro > ...parsing and loading InterPro > Can't call method "name" on an undefined value at load_ontology.pl > line 914. 
>
> HERE'S the name and definition in the ontology table
>
> Name = InterPro
>
> Definition =
>
> PANTHER version 6.1, 30128 entries, 04-OCT-2006
> PFAM version 21.0, 8957 entries, 22-NOV-2006
> PIRSF version 2.70, 2877 entries, 12-JUN-2007
> PRINTS version 38.0, 1900 entries, 22-SEP-2005
> PRODOM version 2005.1, 1522 entries, 23-APR-2004
> PROSITE version 20.0, 2006 entries, 14-NOV-2006
> SMART version 5.1, 724 entries, 27-JUL-2007
> TIGRFAMs version 7.0, 3423 entries, 28-SEP-2007
> GENE3D version 3.0.0, 2147 entries, 11-SEP-2006
> SSF version 1.69, 1538 entries, 30-NOV-2006
> SWISSPROT version 55.1, 359942 entries, 18-MAR-2008
> TREMBL version 38.1, 5443281 entries, 18-MAR-2008
> INTERPRO version 17.0, 16175 entries, 19-MAR-2008
> GO version N/A, 23937 entries, 27-MAR-2007
> MEROPS version 7.8, 2831 entries, 12-JUL-2007 |
>

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Sat Apr 12 18:43:25 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 12 Apr 2008 18:43:25 -0400
Subject: [Bioperl-l] bioperl-db
In-Reply-To: <000001c89ce5$a5df2e50$0202a8c0@STUDYPC>
References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC> <000001c89ce5$a5df2e50$0202a8c0@STUDYPC>
Message-ID:

I'm not sure what you mean by 'Check interpro.xml', but you can use the
--safe command-line option to keep going if an individual term fails to
load for whatever reason.

Can you post the data for the seemingly offending record? (and please cc
the list)

-hilmar

On Apr 12, 2008, at 5:39 PM, David Burt wrote:
> Hi Hilmar
>
> Just checked mysql database and only have 39 entries under interpro
> and loaded up to IPR000035
>
> Check interpro.xml looks OK from IPR000036 and onwards
>
> So seems to have crashed at IPR000035 ?
>
> dave

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From Russell.Smithies at agresearch.co.nz Sun Apr 13 22:51:41 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 14 Apr 2008 14:51:41 +1200
Subject: [Bioperl-l] Tandem Repeats Finder?
In-Reply-To:
References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC><000001c89ce5$a5df2e50$0202a8c0@STUDYPC>
Message-ID:

Has anyone tried TRF?
I notice UCSC is using it for all their simple repeat annotations and
thought it might be better than what we're currently using (Sputnik)

And is there a BioPerl parser for its output or am I going to have to
write my own ?

Thanx,

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From Russell.Smithies at agresearch.co.nz Sun Apr 13 22:53:46 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 14 Apr 2008 14:53:46 +1200
Subject: [Bioperl-l] Tandem Repeats Finder?
In-Reply-To:
References:
Message-ID:

Scratch the need for a parser.
I turned off html output and it's all nice white-space separated text :-)

Russell

> -----Original Message-----
> From: Smithies, Russell
> Sent: Monday, 14 April 2008 2:52 p.m.
> To: 'Bioperl BioPerl'
> Subject: Tandem Repeats Finder?
>
> Has anyone tried TRF?
> I notice UCSC is using it for all their simple repeat annotations and
> thought it might be better than what we're currently using (Sputnik)
>
> And is there a BioPerl parser for its output or am I going to have to
> write my own ?
>
> Thanx,
>
> Russell Smithies

From csaba.ortutay at gmail.com Mon Apr 14 00:15:22 2008
From: csaba.ortutay at gmail.com (Ortutay Csaba Péter)
Date: Mon, 14 Apr 2008 07:15:22 +0300
Subject: [Bioperl-l] Tandem Repeats Finder?
In-Reply-To:
References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC>
Message-ID: <200804140715.22702.csaba.ortutay@gmail.com>

Hello,

I have used TRF in my earlier projects. It is a nice and quick tool.
There were no ready-made parsers in those days (5-6 years ago), so we
wrote our own.

Csaba

> Has anyone tried TRF?
> I notice UCSC is using it for all their simple repeat annotations and
> thought it might be better than what we're currently using (Sputnik)
>
> And is there a BioPerl parser for its output or am I going to have to
> write my own ?
>
> Thanx,

--
Csaba Ortutay PhD
IMT Bioinformatics
University of Tampere
Finland

From avilella at gmail.com Mon Apr 14 07:13:26 2008
From: avilella at gmail.com (Albert Vilella)
Date: Mon, 14 Apr 2008 12:13:26 +0100
Subject: [Bioperl-l] how can I print a Bio::Tree newick sortby given list?
Message-ID: <358f4d650804140413x4271f18bx40af1b9054306df8@mail.gmail.com>

Hi,

I have a newick file that I want to sort by a given order and print again as newick.
For example, if I have

(((ENSPTRG00000013811:0.0011,ENSG00000142192:0.0021):0.0033,ENSPPYG00000003902:0.0326):0.0000,ENSMMUG00000014384:0.0366):0.3638;

I want to sort it by "ENSG:ENSPTRG:ENSPPYG:ENSMMUG".

Any suggestions on how to do this in bioperl?

Cheers,

Albert.

From lamq at usal.es Mon Apr 14 11:01:51 2008
From: lamq at usal.es (Luis A. M. Quintales)
Date: Mon, 14 Apr 2008 17:01:51 +0200
Subject: [Bioperl-l] xyplot glyph: scale problems
Message-ID: <480371DF.7040900@usal.es>

I have a problem with the scale numbers that the xyplot glyph
calculates. The shape of the graph looks fine, but the scale number 10
and its position in the output are not correct. I am sending the source
code, a simplified input file and the png output.

Thank you

Source code ex1.pl (also in http://avellano.usal.es/~luis/bioperl-l/ex1.pl)
============================
#!/usr/bin/perl
use Bio::DB::GFF;
use Bio::Graphics::Panel;
use strict;

my $filin = $ARGV[0];
my $db = Bio::DB::GFF->new( -dsn        => $filin,
                            -adaptor    => 'memory',
                            -aggregator => 'at{atpc:atfreq}' );
my $segment  = $db->segment('chr1');
my @features = $segment->features('at');
my $panel = Bio::Graphics::Panel->new( -offset    => 0,
                                       -grid      => 100,
                                       -length    => 500,
                                       -width     => 800,
                                       -pad_left  => 50,
                                       -pad_right => 50 );
$panel->add_track($segment,
                  -glyph   => 'generic',
                  -bgcolor => 'blue',
                  -label   => 1);
$panel->add_track(\@features,
                  -glyph      => 'xyplot',
                  -graph_type => 'boxes',
                  -scale      => 'left',
                  -height     => 200,
                 );
open (FI,"> sal.png") or die "cannot write sal.png: $!";
binmode FI;
print FI $panel->png;   # write the image; this step is missing from the script as posted
close FI;
============================

in1.gff file (also in http://avellano.usal.es/~luis/bioperl-l/in1.gff)
============================
##sequence-region chr1 1 5578650
chr1 atfreq atpc 1 10 64.0000 . . atpc 1
chr1 atfreq atpc 11 20 63.0000 . . atpc 1
chr1 atfreq atpc 21 30 62.0000 . . atpc 1
chr1 atfreq atpc 31 40 59.0000 . . atpc 1
chr1 atfreq atpc 41 50 59.0000 . . atpc 1
chr1 atfreq atpc 51 60 59.0000 . . atpc 1
chr1 atfreq atpc 61 70 59.0000 . . atpc 1
chr1 atfreq atpc 71 80 59.0000 . . atpc 1
chr1 atfreq atpc 81 90 61.0000 . . atpc 1
chr1 atfreq atpc 91 100 60.0000 . . atpc 1
chr1 atfreq atpc 101 110 60.0000 . . atpc 1
chr1 atfreq atpc 111 120 64.0000 . . atpc 1
chr1 atfreq atpc 121 130 64.0000 . . atpc 1
chr1 atfreq atpc 131 140 60.0000 . . atpc 1
chr1 atfreq atpc 141 150 60.0000 . . atpc 1
chr1 atfreq atpc 151 160 63.0000 . . atpc 1
chr1 atfreq atpc 161 170 62.0000 . . atpc 1
chr1 atfreq atpc 171 180 59.0000 . . atpc 1
chr1 atfreq atpc 181 190 54.0000 . . atpc 1
chr1 atfreq atpc 191 200 53.0000 . . atpc 1
chr1 atfreq atpc 201 210 54.0000 . . atpc 1
chr1 atfreq atpc 211 220 50.0000 . . atpc 1
chr1 atfreq atpc 221 230 51.0000 . . atpc 1
chr1 atfreq atpc 231 240 56.0000 . . atpc 1
chr1 atfreq atpc 241 250 58.0000 . . atpc 1
chr1 atfreq atpc 251 260 55.0000 . . atpc 1
chr1 atfreq atpc 261 270 54.0000 . . atpc 1
chr1 atfreq atpc 271 280 56.0000 . . atpc 1
chr1 atfreq atpc 281 290 59.0000 . . atpc 1
chr1 atfreq atpc 291 300 58.0000 . . atpc 1
chr1 atfreq atpc 301 310 60.0000 . . atpc 1
chr1 atfreq atpc 311 320 59.0000 . . atpc 1
chr1 atfreq atpc 321 330 59.0000 . . atpc 1
chr1 atfreq atpc 331 340 57.0000 . . atpc 1
chr1 atfreq atpc 341 350 56.0000 . . atpc 1
chr1 atfreq atpc 351 360 57.0000 . . atpc 1
chr1 atfreq atpc 361 370 57.0000 . . atpc 1
chr1 atfreq atpc 371 380 58.0000 . . atpc 1
chr1 atfreq atpc 381 390 56.0000 . . atpc 1
chr1 atfreq atpc 391 400 58.0000 . . atpc 1
chr1 atfreq atpc 401 410 56.0000 . . atpc 1
chr1 atfreq atpc 411 420 59.0000 . . atpc 1
chr1 atfreq atpc 421 430 58.0000 . . atpc 1
chr1 atfreq atpc 431 440 59.0000 . . atpc 1
chr1 atfreq atpc 441 450 58.0000 . . atpc 1
chr1 atfreq atpc 451 460 58.0000 . . atpc 1
chr1 atfreq atpc 461 470 56.0000 . . atpc 1
chr1 atfreq atpc 471 480 57.0000 . . atpc 1
chr1 atfreq atpc 481 490 59.0000 . . atpc 1
============================

The sal.png : http://avellano.usal.es/~luis/bioperl-l/sal.png

Thank you.
--
==================================================
Luis Antonio Miguel Quintales
Departamento de Informática y Automática
Facultad de Ciencias
Universidad de Salamanca
Plaza de la Merced s/n
37008-SALAMANCA
SPAIN
==================================================
Tel.: +34-923-294400(ext.1513)
Fax.: +34-923-294584
E-mail: lamq at usal.es
==================================================

From aaron.j.mackey at gsk.com Mon Apr 14 09:00:52 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Mon, 14 Apr 2008 09:00:52 -0400
Subject: [Bioperl-l] personal vs list email
In-Reply-To:
Message-ID:

I try to take it even one step further: I require the person to re-ask
their question on the mailing list (and then try to answer it there).
This has the added benefit of causing the person to pause a moment to
reflect on their question, and (sometimes) to spend a bit more time
preparing the question for broader public consumption.

-Aaron

From sutripa at vbi.vt.edu Mon Apr 14 12:54:47 2008
From: sutripa at vbi.vt.edu (Sucheta Tripathy)
Date: Mon, 14 Apr 2008 12:54:47 -0400 (EDT)
Subject: [Bioperl-l] Error installing XML::Parser
Message-ID: <1285.99.152.150.87.1208192087.squirrel@webmail.vbi.vt.edu>

Hello List,

I have recently installed bioperl using the following command. The
installation was successful. Now I am trying to install XML::Parser but
it returns error messages. Any clue what I may be doing wrong?

Thanks

Sucheta

Following is the last part of the error message:

### Error Message #######
Expat.c: In function ‘XS_XML__Parser__Expat_SkipUntil’:
Expat.c:2664: error: ‘XML_Parser’ undeclared (first use in this function)
Expat.c:2664: error: expected ‘;’ before ‘parser’
Expat.c:2665: warning: ISO C90 forbids mixed declarations and code
Expat.xs:2179: error: ‘parser’ undeclared (first use in this function)
Expat.xs:2179: warning: cast to pointer from integer of different size
Expat.xs:2180: error: ‘CallbackVector’ has no member named ‘st_serial’
Expat.xs:2182: error: ‘CallbackVector’ has no member named ‘skip_until’
Expat.c: In function ‘XS_XML__Parser__Expat_Do_External_Parse’:
Expat.c:2687: error: ‘XML_Parser’ undeclared (first use in this function)
Expat.c:2687: error: expected ‘;’ before ‘parser’
Expat.c:2688: warning: ISO C90 forbids mixed declarations and code
Expat.xs:2194: error: ‘parser’ undeclared (first use in this function)
Expat.xs:2194: warning: cast to pointer from integer of different size
Expat.xs:2205: warning: unused variable ‘pret’
Expat.xs:2194: warning: unused variable ‘cbv’
Expat.xs:2192: warning: unused variable ‘type’
make[1]: *** [Expat.o] Error 1
make[1]: Leaving directory `/root/.cpan/build/XML-Parser-2.36/Expat'
make: *** [subdirs] Error 2
  /usr/bin/make -- NOT OK
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible
#####

--
Sucheta Tripathy, Ph.D.
Virginia Bioinformatics Institute Phase-I
Washington street.
Virginia Tech.
Blacksburg,VA 24061-0447 phone:(540)231-8138 Fax: (540) 231-2606 From mmokrejs at ribosome.natur.cuni.cz Tue Apr 15 06:45:48 2008 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Tue, 15 Apr 2008 12:45:48 +0200 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <200804071448.34769.heikki@sanbi.ac.za> <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> <47FA4AD2.5030206@uv.es> Message-ID: <4804875C.80506@ribosome.natur.cuni.cz> Chris Fields wrote: > Note in the example I gave that, during the revision history, the > DBSOURCE changed at the point of the creation date (the original nuc. > record was a M. tuberculosis contig sequence, which later changed to > an updated full M. tuberculosis genome record at the time of the > 'create date'). > > Couldn't find anything specific in the GenBank docs on this, but it > appears (at least for a protein record) the creation date reflects > the date in which the sequence was either originally deposited or > originally derived from the nucleotide source record present in the > record. In other words, it may not reflect the original date of > deposition (which could have come from a different record, as in this > case). > > chris Hi, I have few answers from the past from NCBI staff to my similar questions regarding DATE issues and VERSION numbers not being increased upon "changes" in a record. I tried below to put into a more readable form my former correspondence. Hope this helps everybody to understand what happens in the black box. 
;) Martin

Date: Thu, 17 Jan 2002 15:40:07 -0500 (EST)
From: David Wheeler
Subject: Brucella_melitensis on ftp site

> Hi, I'd like to point you to the fact, that the descriptions of
> Brucella_melitensis differ in
> ftp.ncbi.nih.nlm.gov/genomes/Bacteria/Brucella_melitensis and
> ftp.ncbi.nih.nlm.gov/genbank/genomes/Bacteria/Brucella_melitensis
>
> Namely, the description of the strain is retained in *.gbk files
> under /genomes/Bacteria/Brucella_melitensis only under the strain
> description field, but not in the DEFINITION line, where it is
> present in *.gbk files under
> /genbank/genomes/Bacteria/Brucella_melitensis.
>
> LOCUS       NC_003318  1177787 bp  DNA  circular  BCT  13-NOV-2001
> DEFINITION  Brucella melitensis chromosome II, complete sequence.
> ACCESSION   NC_003318
> VERSION     NC_003318.1  GI:17988344
>
> compared to
>
> LOCUS       AE008918  1177787 bp  DNA  circular  BCT  27-DEC-2001
> DEFINITION  Brucella melitensis strain 16M chromosome II, complete sequence.
> ACCESSION   AE008918
> VERSION     AE008918
>
> This makes me worried about the data. Why is the release date of
> NON-curated files (AE008918) newer than the release date of CURATED
> data (NC_003318)? Is it expected case? Could someone explain me the
> difference between them (i.e. CURATED vs. NONCURATED)?

The curated record is initially a copy of the non-curated record with certain changes in documentation made in order to comply with the NCBI standard for reference genomes. One change which you have noticed is the difference in Definition line format. Curated genomic records are created in order to standardize annotation for genomes in the Entrez Genomes database while leaving editorial control for the parent GenBank records in the hands of the original submitters. Regardless of the date you see on the record, the curated version is derived from the non-curated one. In this case, it appears that the processing of the non-curated version lagged a little bit relative to that of the curated version.
Normally, however, the non-curated version will have the earlier date. Date: Sun, 27 Jan 2002 00:16:55 -0500 (EST) From: David Wheeler Subject: Re: CONSULT: Brucella_melitensis on ftp site > Are the raw sequence data always same in non-curated and curated > flatfiles? > > Is the annotation of orf's/proteins different between them? > > Are there any new or withdrawn orf's or proteins in the curated > flatfiles compared to non-curated ones? > > My feeling is that no-one except original submitters can modify > submitted data, so you cannot modify non-curated files, i.e. cannot > modify them and increase the version number. > > Because of that, you've introduced curated versions, which are just > copies of original but public data so you are free to modify it. So > once again, are the differences between non-curated and curated > flatfiles only in structure of the file? I don't think so. Examples > would be Listeria genomes or the 2 Agrobacterium's, if I remember > right. Initially, there should be no or very few differences, however, as time goes by, differences in the annotation will materialize. There may also be differences in the sequence, if errors in the original sequence come to light, but these differences should be very rare. So, practically speaking, you will probably find few differences but, since the purpose of the Refseq is to curate, there may well be some differences. Date: Mon, 17 Dec 2001 11:57:06 -0500 (EST) From: Dawn Lipshultz Subject: Re: Buggy date in Staphylococcus aureus N315 >>>> Hi, I've found there has been released Staphylococcus aureus >>>> N315 on 01-JAN-1900, which is nonsense. I guss you had y2K bug. >>>> >>>> >>>> Please see >>>> >> ftp://ncbi.nlm.nih.gov/genbank/genomes/Bacteria/Staphylococcus_aureus_N315/BA000018.gbk >> >>>> >>>> Can you please tell me the real release date? 
>>>> >>>> Also, is newer the NC_xxxx for Staphylococcus aureus N315 under >>>> >>>> ftp://ncbi.nlm.nih.gov/genomes/Bacteria/Staphylococcus_aureus_N315/ >>>> or this BA000018 non-cured version? >>>> >>>> >>>> LOCUS BA000018 2814816 bp DNA circular BCT >>>> 01-JAN-1900 DEFINITION Staphylococcus aureus strain N315, >>>> complete genome. >>> AP003129-AP003138. They are all dated June 2001. >>> >>> The date for the record in the ftp file is April 2001. The record >>> in GenBank (NC_002745) is dated October 2001. This version is >>> apparently more updated than the one on the ftp site. Therefore, >>> you may want to download the sequence from GenBank rather than >>> the ftp site. >>> >>> Regards, Dawn S. Lipshultz >> I cannot find the record to which you refer in your message. When I >> did a search for accession number BA000018, I received results for >> accession numbers AP003129-AP003138. They are all dated June 2001. >> >> >> The date for the record in the ftp file is April 2001. The record >> in GenBank (NC_002745) is dated October 2001. This version is >> apparently more updated than the one on the ftp site. Therefore, >> you may want to download the sequence from GenBank rather than the >> ftp site. Regards, Dawn S. Lipshultz > > Hmm, but I do get: > http://www.ncbi.nlm.nih.gov:80/cgi-bin/Entrez/framik?db=genome&gi=179 > > > look at the "GenBank: NC_002745" text in left upper part of the > window, it points to that OLD ftp file. The "RefSeq: NC_002745" > points to the April 2001 version. So what is the right way to get the > October 2001 release? > > Where can I find the difference between NC_002745 from April compared > to NC_002745 from October? > > What do you mean with "you may want to download the sequence from > GenBank rather than the ftp site."? > > BOTH ftp directories at ftp://ncbi.nlm.nih.gov are outdated. 
I mean > the genomes/Bacteria/Staphylococcus_aureus_N315/NC_002745.* version > and also the > genbank/genomes/Bacteria/Staphylococcus_aureus_N315/BA000018.* > version. > > The web links from www.ncbi.nlm.nih.gov:80/cgi-bin/Entrez/ point > anyway to the ftp site. Do you want to say that the ftp version > aren't updated anymore? The genome was originally released into the database on 4/20/2001 as 10 pieces with secondary accession number BA000018. You can find these pieces in Entrez nucleotides by querying with BA000018. The Genomes group here will fix the date on the record that is available from Entrez genomes. Regards, Dawn Date: Fri, 16 Nov 2001 16:09:59 -0500 (EST) From: Susan Dombrowski Subject: Re: Agrobacterium tumefaciens C58 > Dear colleague, I've noticed that there're somehow updated on Oct 17 > the genomic flatfiles of Agrobacterium tumefaciens C58 at > ftp://ncbi.nlm.nih.gov/genbank/genomes/Bacteria/Agrobacterium_tumefaciens/. > However, for example the AE007869.gbs does NOT self-explain what has > been changed and also the VERSION number is not increased. Would you > please explain what's the change, when can I find such information > next time on web? > > I've used the published sequence from your ftp site on 2001-08-29 > with same ID and would like to know, what differs. > > LOCUS AE007869 2841581 bp DNA circular CON > 17-OCT-2001 DEFINITION Agrobacterium tumefaciens strain C58 circular > chromosome, complete sequence. ACCESSION AE007869 VERSION > AE007869 Dear Colleague, The version number of a sequence will *only* change if the content of the actual sequence has changed in any way since it was first made available. Although the date has changed, this date refers to the last time the actual record was manipulated by an NCBI staff member. Even if there is something simple, like adding a reference, changing a spelling mistake, etc., this will cause a change in the date field of the record. 
Thus, since the version has not changed, there are no differences to report.

Best Regards,
Susan

Date: Wed, 26 Jun 2002 11:04:48 -0400 (EDT)
From: Eric Sayers
Subject: Re: Mesorhizobium_loti flatfiles

>>>>> Hi,
>>>>> I've found that you again silently changed flatfiles lying on your ftp
>>>>> some time ago without changing the revision number. Please apologize me,
>>>>> but this really causes troubles to other people working in this so called
>>>>> bioinformatics. :(
>>>>>
>>>>> A week ago there was:
>>>>>
>>>>> LOCUS       NC_002678  7036074 bp  DNA  circular  BCT  10-SEP-2001
>>>>> DEFINITION  Mesorhizobium loti, complete genome.
>>>>> ACCESSION   NC_002678
>>>>> VERSION     NC_002678.1  GI:13470324
>>>>>
>>>>>
>>>>> and two other plasmid sequences. This yields 7275 proteins.
>>>>>
>>>>> But, last autumn there was:
>>>>>
>>>>> LOCUS       NC_002678  7036074 bp  DNA  circular  BCT  28-MAR-2001
>>>>> DEFINITION  Mesorhizobium loti, complete genome.
>>>>> ACCESSION   NC_002678
>>>>> VERSION     NC_002678.1  GI:13470324
>>>>>
>>>>>
>>>>> That version had 7281 proteins in total.
>>>>> I have simple questions: "Why was NOT changed the VERSION number?".
>>>>>
>>>>> Do I understand it wrong, that it should get updated whenever a single
>>>>> character in the file contents is changed?
>>>
>>>> The version number of a sequence only changes if the sequence itself is
>>>> modified. If anything else in the flat file is changed (ie spelling, authors,
>>>> annotations, etc) the version will not change. However, the modification date in
>>>
>>> Sorry, do you under annotation also mean number of predicted genes, their
>>> coordinates(position) etc?
>>>
>>>> the top line of the flat file will change for any of these modifications. (Note
>>>> that the dates are different in the file you display: Mar 28, 2001 vs Sept 10,
>>>> 2001.) I would track the modification date rather than or as well as the version
>>>> number to catch all changes in the files.
>>>> Regards,
>>>> Eric W. Sayers, Ph.D.
>>>
>>> OK, but unless some of our programs have been buggy before or now (in
>>> either of those cases have failed to extract genes from flatfiles), I do
>>> not have an explanation for the differences in amount of
>>> predicted/annotated genes.
>>>
>>> I do not have anymore available the old flatfiles from Mar 28, but it
>>> seems to me that these were newly introduced in the Sept. 10 version:
>>> gi_15600768, gi_15600770, gi_15600769, gi_15600766, gi_15600767
>>
>> Dear Colleague,
>> Again, the only reason the version number will change is if the sequence itself
>> changes. The number of annotated/predicted genes is merely an annotation on the
>> sequence, and does not change the sequence itself. Therefore, the version will
>> not change when the number of annotations changes. The modification date on the
>> flat file will (and did) change, of course.
>>
>> Regards,
>> Eric W. Sayers, Ph.D.
>
> Finally I've heard that from someone, thanks!
> Now just tell me, how can I figure out what changed between those
> different "date" releases? Is there a changelog available?
> I consider annotations changes very important.

We do not provide the details of flat file changes on our public websites, except for changes in the version number (ie actual sequence changes). In that particular case, all of the previous versions are linked to the current one. My advice to you if you want to chronicle non-sequence changes would be to check the flat files of interest periodically (by a script, for example) and look for changes in the modification dates. You could then simply compare the before and after flat files.

Regards,
Eric W. Sayers, Ph.D.

> Hi, Miguel:
>
> id1_fetch can do it. Detailed instruction can be found at:
>
> http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=toolkit.section.ch_demo.id1_fetch.html
>
> Here is an example:
>
>> >id1_fetch -lt revisions -flat '12:74311105' -fmt fasta
> GI         Loaded      DB    Retrieval No.
> --         ------      --    -------------
> 74311105   12/07/2007  NCBI  19766263
> 74311105   01/23/2007  NCBI  16325656
> 74311105   03/30/2006  NCBI  13131204
> 74311105   03/03/2006  NCBI  12915541
> 74311105   03/02/2006  NCBI  12885275
> 74311105   12/03/2005  NCBI  12259793
> 74311105   09/09/2005  NCBI  11257262
> 74311105   09/09/2005  NCBI  11242667
>
> Wenwu Cui PhD

From david at burt7259.freeserve.co.uk Sun Apr 13 10:32:31 2008
From: david at burt7259.freeserve.co.uk (David Burt)
Date: Sun, 13 Apr 2008 15:32:31 +0100
Subject: [Bioperl-l] bioperl-db
In-Reply-To: <3F77F49A-9C9E-4450-AE28-46F00CADBC8B@gmx.net>
References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC> <000001c89ce2$5400a710$0202a8c0@STUDYPC> <3F77F49A-9C9E-4450-AE28-46F00CADBC8B@gmx.net>
Message-ID: <000001c89d73$3b49eec0$0202a8c0@STUDYPC>

Hi Hilmar

Many thanks for info - tried a few things

1. First tried --safe flag

perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser root --dbpass chicken --driver mysql --safe \
  --namespace "InterPro" --format interprosax interpro.xml

Still got same output as before

...deleting all relationships for InterPro
...parsing and loading InterPro
Can't call method "name" on an undefined value at load_ontology.pl line 914

Only 35 interpro entries entered into database

2. I am using bioperl 1.5.2

3.
I downloaded Release 17.0, 20 March 2008 of the interpro.xml file from
ftp://ftp.ebi.ac.uk/pub/databases/interpro/

I did not send this file, since it was ~10Mb gzipped

Dave

From david at burt7259.freeserve.co.uk Sun Apr 13 10:53:43 2008
From: david at burt7259.freeserve.co.uk (David Burt)
Date: Sun, 13 Apr 2008 15:53:43 +0100
Subject: [Bioperl-l] bioperl-db
In-Reply-To:
References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC> <000001c89ce5$a5df2e50$0202a8c0@STUDYPC>
Message-ID: <000001c89d76$319be060$0202a8c0@STUDYPC>

Hilmar

Also updated copy of bioperl - see output below

root@STUDY_PC /cygdrive/c/Bioinformatics/Ensembl/src
$ perl -MBio::Perl -le 'print Bio::Perl->VERSION;'
1.005002101

root@STUDY_PC /cygdrive/c/Bioinformatics/Ensembl/src
$ cvs -d :pserver:cvs@cvs.bioperl.org:/home/repository/bioperl login
Logging in to :pserver:cvs@cvs.bioperl.org:2401/home/repository/bioperl
CVS password:

root@STUDY_PC /cygdrive/c/Bioinformatics/Ensembl/src
$ cd bioperl-live

root@STUDY_PC /cygdrive/c/Bioinformatics/Ensembl/src/bioperl-live
$ cvs -q update -d -P -r bioperl-release-1-5-2
P Build.PL
P ModuleBuildBioperl.pm
P Bio/Root/Version.pm
cvs update: warning: t/data/taxdump/names.dmp was lost
U t/data/taxdump/names.dmp
cvs update: warning: t/data/taxdump/nodes.dmp was lost
U t/data/taxdump/nodes.dmp

root@STUDY_PC /cygdrive/c/Bioinformatics/Ensembl/src/bioperl-live
$ perl -MBio::Perl -le 'print Bio::Perl->VERSION;'
1.0050021

Why is the VERSION 1.0050021 rather than 1.5.2 ?

Dave

From heikki at sanbi.ac.za Wed Apr 16 07:36:16 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 16 Apr 2008 13:36:16 +0200
Subject: [Bioperl-l] bioperl-microarray: status?
In-Reply-To: References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> Message-ID: <200804161336.16879.heikki@sanbi.ac.za> FYI, Christoper Jones has just published [http://bioinformatics.oxfordjournals.org/cgi/content/short/24/8/1102 an article in Bioinformatics] about his [http://search.cpan.org/perldoc?Microarray Microarray perl module] in CPAN. (The text added into BioPerl wiki.) -Heikki On Friday 26 January 2007 16:05:01 Chris Fields wrote: > Don't know if it's worth it, but could the microarray package be > modified so that it deals with data generated from or interacts > directly with Bioconductor (i.e. maybe including some specialized > bioperl-run set of classes to run Bioconductor tasks, return > lightweight bioperl microarray classes)? Allen pointed out in a > previous post that Bioconductor is the best pick for certain tasks, > while Perl excels at others: > > http://article.gmane.org/gmane.comp.lang.perl.bio.general/13993 > > Might be nice if we could merge both strengths together in some way. > > chris > > On Jan 26, 2007, at 7:26 AM, Jay Hannah wrote: > > On Jan 25, 2007, at 2:30 AM, Allen Day wrote: > >> Eh, there is some discussion activity on the list, but not much. You > >> are really better off moving to Bioconductor. > > > > Ok, thanks. I added that to the wiki page: > > > > http://www.bioperl.org/wiki/Microarray_package > > > > j > > seqlab.net > > http://www.bioperl.org/wiki/User:Jhannah > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________
From pan.mueller at yahoo.de Wed Apr 16 08:34:51 2008 From: pan.mueller at yahoo.de (=?iso-8859-1?Q?Peter_M=FCller?=) Date: Wed, 16 Apr 2008 12:34:51 +0000 (GMT) Subject: [Bioperl-l] load_seqdatabase.pl --pipeline Message-ID: <297809.47580.qm@web28203.mail.ukl.yahoo.com> Dear list, a want to add gene symbols to unigene-cluster which were in a biosql database and lacks this information. So one way is to make a post-update script: my $adp = $db->get_object_adaptor('Bio::ClusterI'); my $pseq = $adp->find_by_primary_key(n); $adp->remove($pseq); $pseq->gene('symbol'); $adp->store($pseq); $adp->commit(); O.k., this works (I ask me why to remove the cluster first - bug or feature...?)
Second way - perhaps: Using the --pipeline option, but it looks like useable only for seq-objects (Bio::Factory::SeqProcessoI) right? regards pan From cjfields at uiuc.edu Wed Apr 16 11:00:51 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 16 Apr 2008 10:00:51 -0500 Subject: [Bioperl-l] bioperl-microarray: status? In-Reply-To: <200804161336.16879.heikki@sanbi.ac.za> References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <200804161336.16879.heikki@sanbi.ac.za> Message-ID: <479BD5A4-9C9A-4733-889D-65942F24A7F3@uiuc.edu> That would be worth looking into at some point, if anyone's interested (though it may be best to build a 'bridging' module). Wonder if it uses BioConductor and, if not, how performance is vs BioConductor? chris On Apr 16, 2008, at 6:36 AM, Heikki Lehvaslaiho wrote: > FYI, > > Christoper Jones has just published > [http://bioinformatics.oxfordjournals.org/cgi/content/short/ > 24/8/1102 an > article in Bioinformatics] about his > [http://search.cpan.org/perldoc?Microarray Microarray perl module] > in CPAN. > > (The text added into BioPerl wiki.) > > -Heikki > > > On Friday 26 January 2007 16:05:01 Chris Fields wrote: >> Don't know if it's worth it, but could the microarray package be >> modified so that it deals with data generated from or interacts >> directly with Bioconductor (i.e. maybe including some specialized >> bioperl-run set of classes to run Bioconductor tasks, return >> lightweight bioperl microarray classes)? Allen pointed out in a >> previous post that Bioconductor is the best pick for certain tasks, >> while Perl excels at others: >> >> http://article.gmane.org/gmane.comp.lang.perl.bio.general/13993 >> >> Might be nice if we could merge both strengths together in some way.
>> >> chris >> >> On Jan 26, 2007, at 7:26 AM, Jay Hannah wrote: >>> On Jan 25, 2007, at 2:30 AM, Allen Day wrote: >>>> Eh, there is some discussion activity on the list, but not much. >>>> You >>>> are really better off moving to Bioconductor. >>> >>> Ok, thanks. I added that to the wiki page: >>> >>> http://www.bioperl.org/wiki/Microarray_package >>> >>> j >>> seqlab.net >>> http://www.bioperl.org/wiki/User:Jhannah >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From j-keller2 at md.northwestern.edu Wed Apr 16 12:12:27 2008
From: j-keller2 at md.northwestern.edu (Jacob Keller)
Date: Wed, 16 Apr 2008 11:12:27 -0500
Subject: [Bioperl-l] Finding seqs of given domain architecture
In-Reply-To: <200804161336.16879.heikki@sanbi.ac.za>
References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <200804161336.16879.heikki@sanbi.ac.za>
Message-ID:

Hello All,

I am new to this list, so am not totally sure this is the right forum, so please forgive me if this is not the right place to ask the following question: I am seeking to get all sequences that have a given domain architecture, or at least that contain two given domains. I have thought of a few ways to do this.

1. Blast/Psi-blast for each domain, then compare the results for common sequences between the two lists, and fetch those. I would need to write a (simple) script to do this, but would prefer not to re-invent the wheel.

2. Search with a paradigm sequence of desired architecture/domain composition, somehow tweaking the psiblast parameters to find only matches over the whole search sequence, thereby finding both desired domains. I am not sure how to tweak blast to do this, though.

3. Pfam has this capability, i.e. to show all domains with a given architecture, but it is difficult to get at the actual sequences or even a list of accession numbers.

Does anybody have any suggestions as to how optimally to get these seq's?

Thanks for your consideration,

Jacob

*******************************************
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
Dallos Laboratory
F.
Searle 1-240 2240 Campus Drive Evanston IL 60208 lab: 847.491.2438 cel: 773.608.9185 email: j-keller2 at northwestern.edu ******************************************* ----- Original Message ----- From: "Heikki Lehvaslaiho" To: Cc: ; "Chris Fields" ; "Jay Hannah" ; Sent: Wednesday, April 16, 2008 6:36 AM Subject: Re: [Bioperl-l] bioperl-microarray: status? > FYI, > > Christoper Jones has just published > [http://bioinformatics.oxfordjournals.org/cgi/content/short/24/8/1102 an > article in Bioinformatics] about his > [http://search.cpan.org/perldoc?Microarray Microarray perl module] in > CPAN. > > (The text added into BioPerl wiki.) > > -Heikki > > > On Friday 26 January 2007 16:05:01 Chris Fields wrote: >> Don't know if it's worth it, but could the microarray package be >> modified so that it deals with data generated from or interacts >> directly with Bioconductor (i.e. maybe including some specialized >> bioperl-run set of classes to run Bioconductor tasks, return >> lightweight bioperl microarray classes)? Allen pointed out in a >> previous post that Bioconductor is the best pick for certain tasks, >> while Perl excels at others: >> >> http://article.gmane.org/gmane.comp.lang.perl.bio.general/13993 >> >> Might be nice if we could merge both strengths together in some way. >> >> chris >> >> On Jan 26, 2007, at 7:26 AM, Jay Hannah wrote: >> > On Jan 25, 2007, at 2:30 AM, Allen Day wrote: >> >> Eh, there is some discussion activity on the list, but not much. You >> >> are really better off moving to Bioconductor. >> > >> > Ok, thanks. I added that to the wiki page: >> > >> > http://www.bioperl.org/wiki/Microarray_package >> > >> > j >> > seqlab.net >> > http://www.bioperl.org/wiki/User:Jhannah >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. 
Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l >
From frederic.romagne at gmail.com Wed Apr 16 13:25:18 2008 From: frederic.romagne at gmail.com (=?ISO-8859-1?Q?Fr=E9d=E9ric_Romagn=E9?=) Date: Wed, 16 Apr 2008 12:25:18 -0500 Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix Message-ID: <1208366718.19084.15.camel@kiss-laptop> Hello, i made a program which use Bio::Index::GenBank and i tested it under unix, that worked well. But i have to launch it under windows and it seems not to work on. Here is the problem : my $dbobj = Bio::Index::Abstract->new("Data/$db"); my $seq = $dbobj->get_Seq_by_acc($id); print $seq->display_id."\n"; did not print the same number than $id !!!
So I am not working on the expected sequence.

I use the SVN sources on Unix and the Perl Package Manager on Windows.

Thanks.

From cjfields at uiuc.edu Wed Apr 16 13:52:59 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 16 Apr 2008 12:52:59 -0500
Subject: [Bioperl-l] Finding seqs of given domain architecture
In-Reply-To:
References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <200804161336.16879.heikki@sanbi.ac.za>
Message-ID:

You can try CDART:

http://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi?cmd=rps

There are probably other tools out there as well. If you want to roll your own, you can use bioperl wrappers for all of these (Bio::Tools::Run::StandAloneBlast is in bioperl-live, Bio::Tools::Run::Hmmer in bioperl-run), tweaking the parameters as you see fit, and either parse while running them or store the file for parsing later using Bio::SearchIO.

Personally, I wouldn't go with (2) unless you are absolutely sure the domains are found only once per sequence, are spatially conserved, and don't overlap. For instance, with many proteins you could have a domain structure like dom1-dom2, dom2-dom1, dom1-dom1-dom2, etc.

If you just want accessions from Pfam's Stockholm format (which are UniProt, I believe) you can get at accessions using Bio::AlignIO::stockholm (using perl 5.10):

    use strict;
    use warnings;
    use Bio::AlignIO;
    use feature 'say';

    my $file = shift || die "Must pass file as argument\n";
    my $in = Bio::AlignIO->new(-format => 'stockholm', -file => $file);

    # print one comma-separated line of accessions per alignment
    while (my $aln = $in->next_aln) {
        my @accs;
        for my $seq ($aln->each_seq) {
            push @accs, $seq->accession_number;
        }
        say join(',', @accs);
    }

chris

On Apr 16, 2008, at 11:12 AM, Jacob Keller wrote:

> Hello All,
>
> I am new to this list, so am not totally sure this is the right
> forum, so please forgive if this is not the right place to ask the
> following question: I am seeking to get all sequences that have a
> given domain architecture, or at least that contain two given
> domains. I have thought of a few ways to do this.
>
> 1. Blast/Psi-blast for each domain, then compare the results for
> common sequences between the two lists, and fetch those. I would
> need to write a (simple) script to do this, but would prefer not to
> re-invent the wheel.
>
> 2. Search with a paradigm sequence of desired architecture/domain
> composition, somehow tweaking the psiblast parameters to find only
> matches over the whole search sequence, thereby finding both desired
> domains. I am not sure how to tweak blast to do this, though.
>
> 3. Pfam has this capability, i.e. to show all domains with a given
> architecture, but it is difficult to get at the actual sequences or
> even a list of accession numbers.
>
> Does anybody have any suggestions as to how optimally to get these
> seq's?
>
> Thanks for your consideration,
>
> Jacob

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From David.Messina at sbc.su.se Wed Apr 16 14:23:27 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 16 Apr 2008 20:23:27 +0200
Subject: [Bioperl-l] Finding seqs of given domain architecture
In-Reply-To:
References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <200804161336.16879.heikki@sanbi.ac.za>
Message-ID: <628aabb70804161123s453bd96bqd2213b938dfdb3a2@mail.gmail.com>

Hey Jacob,

This forum is mostly geared toward the BioPerl software package rather than general bioinformatics assistance.

That being said, I would recommend using Pfam's Sequence Search to determine the domain content of your sequences and then simply looking at those which have the same two domains of interest.
If there are more sequences matching this criterion than can be examined manually, you could write up something (potentially using BioPerl) to then look at the relative order and number of those domains in your sequences.

However, if these sequences have UniProt IDs, you can start with the domains and Pfam will hand you a list of all the UniProt seqs having those domains. On the Pfam website's main page, click on "Help" (right side of menu at the top of the page) and then "Tools and Services" (left side menu).

Dave

From Russell.Smithies at agresearch.co.nz Wed Apr 16 16:49:49 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 17 Apr 2008 08:49:49 +1200
Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix
In-Reply-To: <1208366718.19084.15.camel@kiss-laptop>
References: <1208366718.19084.15.camel@kiss-laptop>
Message-ID:

Did you check the format of your input file?
i.e. DOS or UNIX line endings?

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Frédéric Romagné
> Sent: Thursday, 17 April 2008 5:25 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix
>
> Hello,
> i made a program which use Bio::Index::GenBank and i tested it under
> unix, that worked well.
>
> But i have to launch it under windows and it seems not to work on.
>
> Here is the problem :
>
>     my $dbobj = Bio::Index::Abstract->new("Data/$db");
>     my $seq = $dbobj->get_Seq_by_acc($id);
>     print $seq->display_id."\n";
>
> did not print the same number than $id !!! So i don't work on the
> sequence expected...
>
> I use the SVN sources on unix and the Perl package manager for
> windows...
>
> Thanks.
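
As a side note on the line-ending question above: a minimal sketch of how an accession read from a file can keep a stray carriage return even after chomp. This only illustrates the DOS/UNIX line-ending issue in general and may well not be the cause of the problem reported here; the helper name clean_id and the sample accession are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: strip a trailing line terminator whether it is
# LF (UNIX) or CRLF (DOS/Windows), then trim surrounding whitespace.
sub clean_id {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;      # remove LF or CRLF at the end of the string
    $line =~ s/^\s+|\s+$//g;   # trim leading/trailing whitespace
    return $line;
}

# A line as it would look when a DOS-formatted file is read without
# CRLF translation (made-up accession).
my $raw = "AB000123\r\n";

# chomp only removes the current value of $/ (by default "\n"),
# so the "\r" stays attached to the identifier.
my $chomped = $raw;
chomp $chomped;

printf "after chomp: %d chars, after clean_id: %d chars\n",
    length($chomped), length(clean_id($raw));
```

If the identifier still carries a "\r", a lookup such as get_Seq_by_acc($id) is asked for a key that differs from the one stored in the index, so comparing the lengths as above is a quick check.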
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From frederic.romagne at gmail.com Wed Apr 16 17:39:07 2008
From: frederic.romagne at gmail.com (Frédéric Romagné)
Date: Wed, 16 Apr 2008 16:39:07 -0500
Subject: [Bioperl-l] index::abstract on win and unix
In-Reply-To:
References: <1208366718.19084.15.camel@kiss-laptop>
Message-ID: <1208381947.16620.6.camel@kiss-laptop>

Well, if by input file you mean the database used, it is created with Bio::Index::GenBank from a GenBank file from the NCBI FTP site.

$id is an accession number read from a file, but I chomp the line.

I am trying to install the SVN version of BioPerl under Windows to see if there is an improvement.

On Thursday, 17 April 2008 at 08:49 +1200, Smithies, Russell wrote:
> Did you check the format of your input file?
> i.e. DOS or UNIX line endings?

From hubert.gaynor at yahoo.com Thu Apr 17 02:19:11 2008
From: hubert.gaynor at yahoo.com (Hubert Gaynor)
Date: Wed, 16 Apr 2008 23:19:11 -0700 (PDT)
Subject: [Bioperl-l] Can I use BLAST against a database like MySQL
Message-ID: <657734.41592.qm@web46008.mail.sp1.yahoo.com>

Hi,

As far as I know, the first thing to do before running a BLAST alignment is to run formatdb to construct a database.
But I was wondering whether it is possible to construct the database with MySQL instead, which would probably make the BLAST search faster and make database management much easier?

Thanks!

Hubert.

From sdavis2 at mail.nih.gov Thu Apr 17 06:36:32 2008
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 17 Apr 2008 06:36:32 -0400
Subject: [Bioperl-l] Can I use BLAST against a database like MySQL
In-Reply-To: <657734.41592.qm@web46008.mail.sp1.yahoo.com>
References: <657734.41592.qm@web46008.mail.sp1.yahoo.com>
Message-ID: <264855a00804170336o2a2bcff9xfcb05a33bac4c8dc@mail.gmail.com>

On Thu, Apr 17, 2008 at 2:19 AM, Hubert Gaynor wrote:
> Hi,
>
> As far as I know, before using BLAST to do the alignment the first thing should be done is typing formatdb to construct a database. But I was wondering whether it is possible to construct a database with MySQL which probably will grant the BLAST search a higher speed and make the database management much easier?
>

formatdb is used to make a representation that can be used efficiently by blast. That representation already makes blast faster. MySQL can't be used for such things.

As for speeding up blast, if you have a multiprocessor machine, you can take advantage of it by increasing the number of processors blast uses. Also, while blast is a very versatile program, it is not the only alignment program available. Depending on your needs, you could look at other programs such as blat or gmap that can be 2-3 orders of magnitude faster than blast.
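
For what it is worth, the formatdb step and the processor setting mentioned above could be scripted roughly as follows. This is only a sketch for the legacy NCBI command-line tools (formatdb and blastall); the file names, the processor count, and the helper subs are placeholders invented for the example, and the commands are only printed here, not executed.

```perl
use strict;
use warnings;

# Argument list for formatdb: -i input FASTA, -p T/F (protein or
# nucleotide), -o T to parse sequence identifiers and build an index.
sub formatdb_cmd {
    my ($fasta, $is_protein) = @_;
    return ('formatdb', '-i', $fasta, '-p', ($is_protein ? 'T' : 'F'), '-o', 'T');
}

# Argument list for blastall: -p program, -d database, -i query,
# -a number of processors to use, -o output file.
sub blastall_cmd {
    my (%opt) = @_;
    return ('blastall',
            '-p', $opt{program},
            '-d', $opt{db},
            '-i', $opt{query},
            '-a', $opt{cpus} || 1,
            '-o', $opt{out});
}

# Placeholder file names; adjust to your own data.
my @fmt   = formatdb_cmd('mydb.fasta', 0);
my @blast = blastall_cmd(program => 'blastn',
                         db      => 'mydb.fasta',
                         query   => 'query.fasta',
                         cpus    => 4,
                         out     => 'hits.txt');

print join(' ', @fmt), "\n";
print join(' ', @blast), "\n";

# To actually run them (requires the NCBI tools on the PATH):
# system(@fmt)   == 0 or die "formatdb failed: $?";
# system(@blast) == 0 or die "blastall failed: $?";
```

Passing the commands to system() as lists avoids shell quoting problems with unusual file names.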
Sean

From stefan.kirov at bms.com Thu Apr 17 09:40:29 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 17 Apr 2008 09:40:29 -0400
Subject: [Bioperl-l] bioperl-db woes
Message-ID: <4807534D.80105@bms.com>

I'm having problems passing all the tests for bioperl-db. There are 2 distinct errors. The first one:

    Can't locate Bio/DB/BioSQL/RichSeqAdaptor.pm

This, by the way, is embedded deep inside several layers of eval, so the actual error I get from the test is:

    t/04swiss.........ok 3/52Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 78.

or

    ------------- EXCEPTION: Bio::Root::Exception -------------
    MSG: Annotation of class Bio::Annotation::Collection not type-mapped. Internal error?
    STACK: Error::throw
    STACK: Bio::Root::Root::throw /home/kirovs/bioperl-live/Bio/Root/Root.pm:357
    STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695
    STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204
    STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
    STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
    STACK: Bio::DB::Persistent::PersistentObject::store Bio/DB/Persistent/PersistentObject.pm:271
    STACK: Bio::DB::BioSQL::SeqAdaptor::store_children Bio/DB/BioSQL/SeqAdaptor.pm:224
    STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
    STACK: Bio::DB::Persistent::PersistentObject::create Bio/DB/Persistent/PersistentObject.pm:244
    STACK: t/04swiss.t:36
    -----------------------------------------------------------

It turns out the adaptor is really not there???
My bioperl-db is from
dev.open-bio.org/home/svn-repositories/bioperl/bioperl-db/trunk bioperl-db (revision 14661)
Is this module being deprecated (I am sure it is not), or is my download incomplete?

The other problem was:

    DBD::Oracle::st execute failed: ORA-02292: integrity constraint (BIOSQL.FKTAX_ENT) violated - child record found (DBD ERROR: OCIStmtExecute) [for Statement "DELETE FROM taxon WHERE oid = ?" with ParamValues: :p1=9606] at /home/kirovs/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 320.
    not ok 76
    # Test 76 got: (t/02species.t at line 71)

I have not tried to debug this one.

Thanks!
Stefan

From stefan.kirov at bms.com Thu Apr 17 10:18:30 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 17 Apr 2008 10:18:30 -0400 (Eastern Daylight Time)
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To: <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu>
References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu>
Message-ID:

On Thu, 17 Apr 2008, Chris Fields wrote:

> The 'get_dbxrefs' problem looks related to recent changes I made when rolling
> back the significant feature/annotation changes introduced just prior to the
> 1.5 release, none which were fully implemented. I can check that one out.
> Odd though; these passed for me, but I'm using MySQL not oracle.

get_dbxrefs is not the problem. I think the error message is misleading:

    kirovs at horta:~/bioperl-db> grep get_dbxrefs /home/kirovs/bioperl-live/Bio/Ontology/Term.pm
    get_dbxrefs() instead, which handles both strings and DBLink
    "Use get_dbxrefs() instead");
    $self->get_dbxrefs($context);
    =head2 get_dbxrefs
    Title : get_dbxrefs()
    Usage : @ds = $term->get_dbxrefs();
    sub get_dbxrefs {
    } # get_dbxrefs
    my @old = $self->get_dbxrefs($context);
    sub each_dblink {shift->throw("use of each_dblink() is deprecated; use get_dbxrefs() instead")}

So it is there.
In any case I debugged and tracked that down to the RichSeq adaptor module missing.
It is not in the distro I downloaded, so I think this is my problem. Why it is missing is a different question. I looked at different repos (SVN, CVS, trunk, different tags) and I did not see RichSeq.pm. I am not sure what is going on. Perhaps Hilmar will be able to help when he is around.
Thanks for the help, Chris.
Stefan

>
> You may want to make sure you are using bioperl-live and that there isn't an
> older bioperl installation getting into the mix.
>
> chris

From cjfields at uiuc.edu Thu Apr 17 09:59:57 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 17 Apr 2008 08:59:57 -0500
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To: <4807534D.80105@bms.com>
References: <4807534D.80105@bms.com>
Message-ID: <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu>

The 'get_dbxrefs' problem looks related to recent changes I made when rolling back the significant feature/annotation changes introduced just prior to the 1.5 release, none of which were fully implemented. I can check that one out. Odd though; these passed for me, but I'm using MySQL, not Oracle.

You may want to make sure you are using bioperl-live and that there isn't an older bioperl installation getting into the mix.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From stefan.kirov at bms.com Thu Apr 17 10:52:32 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 17 Apr 2008 10:52:32 -0400 (Eastern Daylight Time)
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To: <9ECDEB39-95F3-4A94-9AF7-FFEBBDEFF0FA@gmx.net>
References: <4807534D.80105@bms.com> <9ECDEB39-95F3-4A94-9AF7-FFEBBDEFF0FA@gmx.net>
Message-ID:

That is correct, and I assumed I should not be concerned with this error.
Thanks
Stefan

On Thu, 17 Apr 2008, Hilmar Lapp wrote:

>
> On Apr 17, 2008, at 9:40 AM, Stefan Kirov wrote:
>> The other problem was:
>> DBD::Oracle::st execute failed: ORA-02292: integrity constraint
>> (BIOSQL.FKTAX_ENT) violated - child record found (DBD ERROR:
>> OCIStmtExecute) [for Statement "DELETE FROM taxon WHERE oid = ?" with
>> ParamValues: :p1=9606] at
>
> This sounds like you are running the tests against a non-empty database?
>
> -hilmar
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================

From hlapp at gmx.net Thu Apr 17 10:47:58 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 17 Apr 2008 10:47:58 -0400
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To:
References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu>
Message-ID: <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>

On Apr 17, 2008, at 10:18 AM, Stefan Kirov wrote:
> In any case I debugged and tracked that down to the RichSeq adaptor
> module missing.

That almost can't be the problem. Every Bio::Seq::RichSeq is-a Bio::Seq and a SeqAdaptor is present.

I'm afraid it gets stuck somewhere else and frankly I didn't see the RichSeqAdaptor failing to load in your stack trace:

> ------------- EXCEPTION: Bio::Root::Exception -------------
>
> MSG: Annotation of class Bio::Annotation::Collection not type-mapped. Internal error?
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /home/kirovs/bioperl-live/Bio/Root/Root.pm:357
> STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695
> STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
> STACK: Bio::DB::Persistent::PersistentObject::store Bio/DB/Persistent/PersistentObject.pm:271
> STACK: Bio::DB::BioSQL::SeqAdaptor::store_children Bio/DB/BioSQL/SeqAdaptor.pm:224
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
> STACK: Bio::DB::Persistent::PersistentObject::create Bio/DB/Persistent/PersistentObject.pm:244
> STACK: t/04swiss.t:36
> -----------------------------------------------------------

What that tells me is that when bioperl-db tries to store the annotation bundle of the (SwissProt) sequence, one of the annotations that it encounters is of type Bio::Annotation::Collection. At present bioperl-db doesn't know what to do with it; i.e., bioperl-db can't yet handle hierarchical annotation collections (collections within collections).

I believe this is due to recent changes in how the GN line is parsed in BioPerl - Chris, does this ring the right bell? I thought, though, that you had built in a method that would allow flattening it out?

It's worth noting that BioSQL itself can't really represent nested annotation collections other than by using ontology terms and their hierarchy, which at present I think isn't really appropriate, but I have to think through the issue more. In other words, in BioSQL you can't directly tie together a bunch of qualifier/value pairs into a "bag" and then nest this bag within another.
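
To make the idea of un-nesting such bags concrete, here is a minimal sketch in plain Perl. It deliberately uses ordinary array references as stand-ins for annotation collections and does not call any BioPerl or bioperl-db method; the sub name flatten_bag, the dotted key naming, and the sample tags and values are all assumptions made for the illustration.

```perl
use strict;
use warnings;

# A "bag" is modelled as an array of [tag, value] pairs, where a value
# may itself be a nested bag (another array reference).
# flatten_bag() returns a flat list of [tag, value] pairs; tags of
# nested bags are joined with "." (a naming choice made only for this
# sketch).
sub flatten_bag {
    my ($bag, $prefix) = @_;
    my @flat;
    for my $pair (@$bag) {
        my ($tag, $value) = @$pair;
        my $key = defined $prefix ? "$prefix.$tag" : $tag;
        if (ref $value eq 'ARRAY') {
            push @flat, flatten_bag($value, $key);   # recurse into nested bag
        }
        else {
            push @flat, [$key, $value];              # plain qualifier/value pair
        }
    }
    return @flat;
}

# Made-up example: a top-level bag containing one nested bag.
my $nested = [
    [ comment   => 'example record' ],
    [ gene_name => [ [ Name => 'abcA' ], [ Synonyms => 'abc1' ] ] ],
];

print "$_->[0] = $_->[1]\n" for flatten_bag($nested);
```

The result is a single level of qualifier/value pairs, which is the kind of shape that can be stored without a nested collection; how the flattened keys would map onto actual BioSQL terms is left open here.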
The way to make this work with the current schema is to flatten out the nesting.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Thu Apr 17 10:48:52 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 17 Apr 2008 10:48:52 -0400
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To: <4807534D.80105@bms.com>
References: <4807534D.80105@bms.com>
Message-ID: <9ECDEB39-95F3-4A94-9AF7-FFEBBDEFF0FA@gmx.net>

On Apr 17, 2008, at 9:40 AM, Stefan Kirov wrote:
> The other problem was:
> DBD::Oracle::st execute failed: ORA-02292: integrity constraint
> (BIOSQL.FKTAX_ENT) violated - child record found (DBD ERROR:
> OCIStmtExecute) [for Statement "DELETE FROM taxon WHERE oid = ?" with
> ParamValues: :p1=9606] at

This sounds like you are running the tests against a non-empty database?

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From stefan.kirov at bms.com Thu Apr 17 11:28:42 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 17 Apr 2008 11:28:42 -0400 (Eastern Daylight Time)
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To: <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
Message-ID:

Hilmar,

I think I saw what happens with this adaptor. In Bio::DB::BioSQL::DBAdaptor::_load_object_adaptor (called from create_persistent) there is a request to load this module:

    Bio/DB/BioSQL/RichSeqAdaptor.pm

There is no such module. The load always fails, but since it happens inside an eval, no actual error is reported. Perhaps this is a leftover? This got me fooled.

I guess Chris could be right: Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key is being passed a Bio::Annotation::Collection as the value of $obj->obj(). Or is it recursing too far? Anyway, I am just guessing here; I do not know the architecture of bioperl-db.

Thanks again for the help.
Stefan

From cjfields at uiuc.edu Thu Apr 17 12:26:41 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 17 Apr 2008 11:26:41 -0500
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To: <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
Message-ID:

On Apr 17, 2008, at 9:47 AM, Hilmar Lapp wrote:

>
> On Apr 17, 2008, at 10:18 AM, Stefan Kirov wrote:
>> In any case I debugged and tracked that down to the RichSeq adaptor
>> module missing.
>
> That almost can't be the problem. Every Bio::Seq::RichSeq is-a
> Bio::Seq and a SeqAdaptor is present.
>
> I'm afraid it gets stuck somewhere else and frankly I didn't see the
> RichSeqAdaptor failing to load in your stack trace:
>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>
>> MSG: Annotation of class Bio::Annotation::Collection not
>> type-mapped. Internal error?
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw
>> /home/kirovs/bioperl-live/Bio/Root/Root.pm:357
>> STACK:
>> Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key
>> Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695
>> STACK:
>> Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children
>> Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204
>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
>> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
>> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
>> STACK: Bio::DB::Persistent::PersistentObject::store
>> Bio/DB/Persistent/PersistentObject.pm:271
>> STACK: Bio::DB::BioSQL::SeqAdaptor::store_children
>> Bio/DB/BioSQL/SeqAdaptor.pm:224
>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
>> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
>> STACK: Bio::DB::Persistent::PersistentObject::create
>> Bio/DB/Persistent/PersistentObject.pm:244
>> STACK: t/04swiss.t:36
>> -----------------------------------------------------------
>
> What that tells me is that when bioperl-db tries to store the
> annotation bundle of the (SwissProt) sequence, one of the
> annotations that it encounters is of type
> Bio::Annotation::Collection. At present bioperl-db doesn't know what
> to do with it; i.e., bioperl-db can't yet handle hierarchical
> annotation collections (collections within collections).
>
> I believe this is due to recent changes in how the GN line is parsed
> in BioPerl - Chris does this ring the right bell? I thought though
> you had built in a method would allow flattening out

This appears to be using an older bioperl-live checkout, one where
Heikki changed GN parsing to use a nested Annotation::Collection. I
reverted that back in a later commit to svn specifically b/c of
bioperl-db problems. bioperl-live's swiss.pm now uses a new subclass of
Bio::Annotation::SimpleValue (Bio::Annotation::TagTree) that represents
nested values via Data::Stag's itext output (we can change that to
alternatives if needed).

Here are the last few relevant revisions in bioperl-live's main trunk
(mine is the latest):

------------------------------------------------------------------------
r14562 | cjfields | 2008-02-28 08:30:05 -0600 (Thu, 28 Feb 2008) | 1 line

bug 1825: updating swiss.pm/tests to try out TagTree (passes all tests).
Need to update Handler.t and related modules still...
------------------------------------------------------------------------
r14541 | heikki | 2008-02-25 00:10:48 -0600 (Mon, 25 Feb 2008) | 1 line

documentation for the GN line parsing and management
------------------------------------------------------------------------
r14538 | heikki | 2008-02-23 08:48:23 -0600 (Sat, 23 Feb 2008) | 1 line

GN (Gene Name) line parsing rewrite. Breaks backward compatibility. Can
now deal with >1 gene per entry and four categories of names per gene.
Parses old style syntax (...OR ... OR ... ) into one gene name and
synonyms for each gene. Docs to follow.

....

I just updated all code from dev and reran bioperl-db tests w/o
problems. Maybe someone else could do the same to see what happens?

> It's worth noting that BioSQL itself can't really represent nested
> annotation collections other than by using ontology terms and their
> hierarchy, which at present I think isn't really appropriate, but I
> have to think through the issue more. In other words, in BioSQL you
> can't directly tie together a bunch of qualifier value pairs into a
> "bag" and then nest this bag within another. The way to make this
> work with the current schema is to flatten out the nesting.
>
> 	-hilmar
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================

Might be worth looking into for a future BioSQL release, but we have a
decent workaround in place for now, as long as it works cross-platform
and cross-RDB.

chris

From stefan.kirov at bms.com Thu Apr 17 12:40:14 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 17 Apr 2008 12:40:14 -0400 (Eastern Daylight Time)
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To:
References: <4807534D.80105@bms.com>
	<82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu>
	<2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
Message-ID:

Hilmar,
sorry, I missed the part after the stack trace... In any case this is
still problem for me after I updated bioperl-live.
I see this with a number of other tests:
t/04swiss.........ok 3/52Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 78.
t/04swiss.........dubious
        Test returned status 2 (wstat 512, 0x200)
DIED. FAILED tests 6-52
        Failed 47/52 tests, 9.62% okay
t/05seqfeature....ok 4/48Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 72.
t/05seqfeature....FAILED tests 9-48
        Failed 40/48 tests, 16.67% okay
t/06comment.......ok
t/07dblink........ok
t/08genbank.......ok
t/09fuzzy2........ok
t/10ensembl.......ok 1/15Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 1420.
t/10ensembl.......dubious
        Test returned status 2 (wstat 512, 0x200)
DIED.
FAILED tests 3-15
        Failed 13/15 tests, 13.33% okay
t/11locuslink.....ok 4/110Can't locate object method "get_dbxrefs" via package "Bio::Annotation::OntologyTerm" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 1.
t/11locuslink.....dubious
        Test returned status 2 (wstat 512, 0x200)
DIED. FAILED tests 5-110
        Failed 106/110 tests, 3.64% okay
t/12ontology......ok 1/738Can't locate object method "get_dbxrefs" via package "Bio::Ontology::GOterm" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 98.
t/12ontology......dubious
        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 5-738
        Failed 734/738 tests, 0.54% okay
t/13remove........ok 2/59Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 145.
t/13remove........FAILED tests 11-59
        Failed 49/59 tests, 16.95% okay
t/14query.........ok
t/15cluster.......ok 3/160Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 1.
t/15cluster.......dubious
        Test returned status 2 (wstat 512, 0x200)
DIED. FAILED tests 6-160
        Failed 155/160 tests, 3.12% okay
t/16obda..........ok

From cjfields at uiuc.edu Thu Apr 17 13:06:39 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 17 Apr 2008 12:06:39 -0500
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To:
References: <4807534D.80105@bms.com>
	<82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu>
	<2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
Message-ID:

Stefan,

'get_dbxrefs' was introduced in bioperl-live a while back during the
feature/annotation rollback detailed here:

http://www.bioperl.org/wiki/Feature_Annotation_rollback

I still think this is an interfering old bioperl (and maybe bioperl-db)
installation causing the problems; I had similar issues at one point and
had to find and remove the old installation. It might be worth (1)
checking 'perldoc -l Bio::Root::Root', which will give the location of
the Bio::Root::Root in lib path being used, and (2) using './Build
install uninst=1' to remove any old bioperl/bioperl-db installations.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From stefan.kirov at bms.com Thu Apr 17 13:52:22 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 17 Apr 2008 13:52:22 -0400
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To:
References: <4807534D.80105@bms.com>
	<82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu>
	<2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
Message-ID: <48078E56.9000404@bms.com>

Chris Fields wrote:
> Stefan,
>
> 'get_dbxrefs' was introduced in bioperl-live a while back during the
> feature/annotation rollback detailed here:
>
> http://www.bioperl.org/wiki/Feature_Annotation_rollback
>
Chris was right!
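For anyone chasing the same symptom, the check suggested in this thread ('perldoc -l Bio::Root::Root') reports only the copy Perl finds first. The following is a minimal shell sketch, not part of BioPerl, that lists every copy of a module along a search path so that a stale installation shadowing a fresh checkout becomes visible. The helper name find_module_copies is hypothetical, and the sketch assumes the relevant directories are in the colon-separated path it is given (PERL5LIB by default); Perl's real @INC also contains compiled-in directories and anything a build or test script adds.

```shell
# Hypothetical helper (not a BioPerl tool): print every copy of a module
# file found along a colon-separated list of library directories.
#   $1  module path relative to a lib dir, e.g. Bio/Root/Root.pm
#   $2  search path; assumed to default to $PERL5LIB when omitted
find_module_copies() {
    rel=$1
    search_path=${2:-${PERL5LIB:-}}
    # Split the path on ':' and test each directory in order, so the
    # output order matches the order in which the path would be searched.
    printf '%s\n' "$search_path" | tr ':' '\n' | while IFS= read -r dir; do
        if [ -n "$dir" ] && [ -f "$dir/$rel" ]; then
            printf '%s\n' "$dir/$rel"
        fi
    done
}

# Example call (output depends on the local setup):
#   find_module_copies Bio/Root/Root.pm
```

To search what Perl itself would search, the second argument can be built from @INC, for instance with "$(perl -e 'print join(":", @INC)')". More than one line of output would suggest that an older installation may be getting in the way, though this is only a heuristic.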
> I still think this is an interfering old bioperl (and maybe
> bioperl-db) installation causing the problems; I had similar issues at
> one point and had to find and remove the old installation. It might
> be worth (1) checking 'perldoc -l Bio::Root::Root',
This is the first thing I did and it seemed fine from command line. So I
checked a new copy (vs. updating), set PERL5LIB to the minimum which is
necessary (Build changes INC), which is
/home/kirovs/bioperl-db/bplive:/stf/sysdev/perl/newlib/perl/lib/5.8/ia64-linux-multi/
(/home/kirovs/bioperl-db/bplive being the fresh copy and the other
having Module::Build, etc., but definitely no bioperl). This fixed the
problem. I still do not see where the old module came from, but that was
a really good guess.
Thanks
Stefan
> which will give the location of the Bio::Root::Root in lib path being
> used, and (2) using './Build install uninst=1' to remove any old
> bioperl/bioperl-db installations.
Unfortunately this is not an option for me.
>
> chris

From hubert.gaynor at yahoo.com Thu Apr 17 20:53:16 2008
From: hubert.gaynor at yahoo.com (Hubert Gaynor)
Date: Thu, 17 Apr 2008 17:53:16 -0700 (PDT)
Subject: [Bioperl-l] Can I use BLAST against a database like MySQL
Message-ID: <130971.67684.qm@web46007.mail.sp1.yahoo.com>

Hi Sean,

I got it. Thank you so much!
Hubert ----- Original Message ---- From: Sean Davis To: Hubert Gaynor Sent: Thursday, April 17, 2008 6:36:02 PM Subject: Re: [Bioperl-l] Can I use BLAST against a database like MySQL On Thu, Apr 17, 2008 at 2:19 AM, Hubert Gaynor wrote: > Hi, > > As far as I know, before using BLAST to do the alignment the first thing should be done is typing formatdb to construct a database. But I was wondering whether it is possible to construct a database with MySQL which probably will grant the BLAST search a higher speed and make the database management much easier? > formatdb is used to make a representation that can be used efficiently by blast. That representation already makes blast faster. MySQL can't be used for such things. As for speeding blast, if you have a multiprocessor machine, you can take advantage of those using blast and increasing the number of processors. Also, while blast is a very versatile program, it is not the only alignment program available. Depending on your needs, you could look at other programs such as blat or gmap that can be 2-3 orders of magnitude faster than blast. Sean From Russell.Smithies at agresearch.co.nz Thu Apr 17 21:39:23 2008 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 18 Apr 2008 13:39:23 +1200 Subject: [Bioperl-l] accessing params for custom glyphs? In-Reply-To: <130971.67684.qm@web46007.mail.sp1.yahoo.com> References: <130971.67684.qm@web46007.mail.sp1.yahoo.com> Message-ID: This is probably more of a Perl OO problem I'm having, but can anyone tell me how to access a parameter when I create a custom glyph? I've created a panel in the usual way then I add a feature with 'my_glyph' and want to pass the value of -new_parameter to the glyph drawing code.

    $panel->add_track( $feature,
        -font          => gdSmallFont,
        -glyph         => 'my_glyph',
        -height        => 10,
        -label         => 1,
        -strand        => "forward",
        -new_parameter => "test",
    );

In my_glyph.pm, I have the usual draw_component sub:

    sub draw_component {
        my $self = shift;
        my $gd   = shift;
        my ($x1,$y1,$x2,$y2) = $self->bounds(@_);
        my $fg = $self->fgcolor;
        my $params = $self->??????????   # <<--- how do I access the value of "new_parameter" set in the panel drawing code?

        $gd->line($x1,$y1,$x2,$y2,$fg);
        $gd->line($x1,$y2,$x2,$y1,$fg);
    }

Any ideas?

Thanx,

Russell Smithies

=======================================================================
Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
=======================================================================

From David.Messina at sbc.su.se Fri Apr 18 05:31:59 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 18 Apr 2008 11:31:59 +0200
Subject: [Bioperl-l] Finding seqs of given domain architecture
In-Reply-To: <628aabb70804170155n4e5dfd81r7020c3e9e11094ff@mail.gmail.com>
References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <200804161336.16879.heikki@sanbi.ac.za> <628aabb70804161112o6610ee1fkfb4b08e74730237d@mail.gmail.com> <1208420674.23342.15.camel@razor.sbc.su.se> <628aabb70804170155n4e5dfd81r7020c3e9e11094ff@mail.gmail.com>
Message-ID: <628aabb70804180231p2b9cef9dwd5441e85c31531fd@mail.gmail.com>

Jacob,

I talked about your question with a colleague of mine who has been working in this area. Below is his reply.
[I'm reposting this *without* the attachment mentioned since the mailing list wouldn't accept it otherwise. If anyone wants a copy of the code, just email me.]

Dave

-------

> 3. Pfam has this capability, i.e. to show all domains with a given
> architecture, but it is difficult to get at the actual sequences or
> even a list of accession numbers.

First, this should be available right away in PfamAlyzer:

http://pfamalyzer.sbc.su.se/pfamalyzer/index.html

although you might need to upgrade your browser to Java 1.6 to get it to work. If this does not work as suggested (an upgraded version is coming eventually), have a look at the file:

ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/swisspfam.gz

which contains the Pfam architectures for all UniProt sequences. You can parse that to get a file of sequence-to-architecture correspondences and just filter that to get the accession numbers. (Please find attached a Perl script to do just that.) Under UNIX, you can then just grep this for the domain IDs, like

    grep PF00008 domainArchitectureFile.txt | grep PF00456 > resultFile.txt

but I am sure there are solutions under other operating systems as well. You could then write a script to parse out the corresponding sequences from the UniProt fasta flatfile, if you wanted, or (again under UNIX) a script to wget them off the web page.

In case your sequences are not in UniProt, consider using HMMER and the Pfam HMM files to assign domains to all sequences in your dataset. I would then parse the HMMER output into the same format as the above, and use the same approach following that.

Hope this helps,

Yours sincerely,
Kristoffer Forslund
krifo at sbc.su.se

From lincoln.stein at gmail.com Fri Apr 18 15:16:19 2008
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Fri, 18 Apr 2008 15:16:19 -0400
Subject: [Bioperl-l] [Gmod-gbrowse] accessing params for custom glyphs?
In-Reply-To: References: <130971.67684.qm@web46007.mail.sp1.yahoo.com> Message-ID: <6dce9a0b0804181216q6564e580u8a805ae96c78df2e@mail.gmail.com> Hi Russell, It's very simple: my $params = $self->option('new_parameter'); Lincoln On Thu, Apr 17, 2008 at 9:39 PM, Smithies, Russell < Russell.Smithies at agresearch.co.nz> wrote: > This is probably more of a Perl OO problem I'm having, but can anyone > tell me how to access a parameter when I create a custom glyph? > > I've created a panel in the usual way then I add a feature with > 'my_glyph' and want to pass the value of -new_parameter to the glyph > drawing code. > > $panel->add_track( $feature, > -font => gdSmallFont, > -glyph => 'my_glyph' , > -height => 10, > -label => 1, > -strand => "forward", > -new_parameter => "test", > > > In my_glyph.pm, I have the usual draw_component sub: > > sub draw_component { > my $self = shift; > my $gd = shift; > my ($x1,$y1,$x2,$y2) = $self->bounds(@_); > my $fg = $self->fgcolor; > my $params = $self->?????????? <<--- how do I access the value of > "new_parameter" set in the panel drawing code? > > $gd->line($x1,$y1,$x2,$y2,$fg); > $gd->line($x1,$y2,$x2,$y1,$fg); > > } > > Any ideas? > > Thanx, > > Russell Smithies > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> =======================================================================
>
> _______________________________________________
> Gmod-gbrowse mailing list
> Gmod-gbrowse at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From jason at bioperl.org Fri Apr 18 22:35:10 2008
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 18 Apr 2008 19:35:10 -0700
Subject: [Bioperl-l] index::abstract on win and unix
In-Reply-To: <1208381947.16620.6.camel@kiss-laptop>
References: <1208366718.19084.15.camel@kiss-laptop> <1208381947.16620.6.camel@kiss-laptop>
Message-ID:

Do you want the LOCUS or the ACCESSION?
Do you mean the result is the completely wrong record or just the wrong field?
The accession number is available from the seq's accession_number() method.

-jason

On Apr 16, 2008, at 2:39 PM, Frédéric Romagné wrote:

> Well, if with input file you mean the database used, it's created
> with Bio::Index::GenBank from a ncbi FTP's genbank file.
>
> $id is an accession number read from a file but i chomp the line...
>
> I am trying to install the svn version of bioperl under windows to see
> if there is an improvement.
>
> On Thursday 17 April 2008 at 08:49 +1200, Smithies, Russell wrote:
>> Did you check the format of your input file?
>> i.e. DOS or UNIX line endings?
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Frédéric Romagné
>>> Sent: Thursday, 17 April 2008 5:25 a.m.
>>> To: bioperl-l at lists.open-bio.org
>>> Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix
>>>
>>> Hello,
>>> i made a program which use Bio::Index::GenBank and i tested it under
>>> unix, that worked well.
>>>
>>> But i have to launch it under windows and it seems not to work on.
>>>
>>> Here is the problem :
>>>
>>> my $dbobj = Bio::Index::Abstract->new("Data/$db");
>>> my $seq = $dbobj->get_Seq_by_acc($id);
>>> print $seq->display_id."\n";
>>>
>>> did not print the same number than $id !!! So i don't work on the
>>> sequence expected...
>>>
>>> I use the SVN sources on unix and the Perl package manager for
>>> windows...
>>>
>>> Thanks.
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> =====================================================================
>> ==
>> Attention: The information contained in this message and/or
>> attachments
>> from AgResearch Limited is intended only for the persons or entities
>> to which it is addressed and may contain confidential and/or
>> privileged
>> material. Any review, retransmission, dissemination or other use
>> of, or
>> taking of any action in reliance upon, this information by persons or
>> entities other than the intended recipients is prohibited by
>> AgResearch
>> Limited. If you have received this message in error, please notify
>> the
>> sender immediately.
>> =====================================================================
>> ==
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bioperlanand at yahoo.com Mon Apr 21 03:44:00 2008
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Mon, 21 Apr 2008 00:44:00 -0700 (PDT)
Subject: [Bioperl-l] a question on obtaining HTML formatted Blast output along with the Blast hits image
Message-ID: <372845.37134.qm@web36808.mail.mud.yahoo.com>

Hi everybody,

I would like to obtain an HTML formatted blast report along with a picture of the blast hits, as shown on Slide 60 in this pdf:

http://jason.open-bio.org/Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf

I have gotten the HTML output working using "Bio::SearchIO::Writer::HTMLResultWriter".

My question: How do I integrate it with Bio::Graphics to render the blast hits image at the correct position in my Bioperl reformatted html file? I ultimately want to be able to display my blast output files in a browser.

Here is my code so far:

----------------------------------------------------------------
#!/usr/bin/perl -w
# usage: $0
use strict;
use Bio::SearchIO;
use Bio::SearchIO::Writer::HTMLResultWriter;

# the BLAST report to convert is the only argument
my $infile = shift or die "usage: $0 blast_report\n";

my $searchio   = Bio::SearchIO->new( -format => 'blast', -file => $infile );
my $writerhtml = Bio::SearchIO::Writer::HTMLResultWriter->new();
my $outhtml    = Bio::SearchIO->new( -writer => $writerhtml,
                                     -file   => ">${infile}.html" );

# write the first result of the report as HTML
$outhtml->write_result($searchio->next_result);
----------------------------------------------------------------

Thanks in advance,
Anand
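
One possible way to combine the two pieces asked about above is to draw the hits with Bio::Graphics, save the picture next to the report, and add an image tag to the HTML after it has been rendered. What follows is a minimal sketch under stated assumptions, not a tested recipe: the helper names embed_image and hits_png, the 800 pixel width and the "<report>.png" file name are made up for illustration, the image is simply placed right after the opening <body> tag, and Bio::Graphics (with GD) is assumed to be installed. Older BioPerl releases spell add_SeqFeature as add_sub_SeqFeature.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: insert an <img> tag right after the opening <body ...>
# tag of an HTML page. If there is no <body> tag, the tag is prepended.
sub embed_image {
    my ($html, $img_src) = @_;
    my $tag = qq{<p><img src="$img_src" alt="BLAST hits"></p>\n};
    return $html if $html =~ s/(<body[^>]*>)/$1\n$tag/i;
    return $tag . $html;
}

# Hypothetical helper: draw one track of hits for a Bio::Search::Result
# object and return the PNG data. BioPerl modules are loaded lazily so that
# embed_image() can be used without them.
sub hits_png {
    my ($result) = @_;
    require Bio::Graphics;
    require Bio::SeqFeature::Generic;

    my $panel = Bio::Graphics::Panel->new(
        -length    => $result->query_length,
        -width     => 800,              # assumed image width
        -pad_left  => 10,
        -pad_right => 10,
    );

    # Scale arrow representing the full-length query.
    my $query = Bio::SeqFeature::Generic->new(
        -start        => 1,
        -end          => $result->query_length,
        -display_name => $result->query_name,
    );
    $panel->add_track($query, -glyph => 'arrow', -tick => 2, -label => 1);

    my $track = $panel->add_track(
        -glyph     => 'graded_segments',
        -label     => 1,
        -connector => 'dashed',
        -bgcolor   => 'blue',
    );
    for my $hit ($result->hits) {
        my $feature = Bio::SeqFeature::Generic->new(
            -score        => $hit->raw_score,
            -display_name => $hit->name,
        );
        # HSPs are sequence features; 'EXPAND' grows the parent to fit them.
        $feature->add_SeqFeature($_, 'EXPAND') for $hit->hsps;
        $track->add_feature($feature);
    }
    return $panel->png;
}

sub main {
    my ($infile) = @_;
    require Bio::SearchIO;
    require Bio::SearchIO::Writer::HTMLResultWriter;

    my $in     = Bio::SearchIO->new(-format => 'blast', -file => $infile);
    my $result = $in->next_result or die "no result found in $infile\n";

    # Save the picture next to the report.
    open my $png, '>', "$infile.png" or die "cannot write $infile.png: $!";
    binmode $png;
    print {$png} hits_png($result);
    close $png;

    # Render the report to a string, add the image tag, then save the page.
    my $writer = Bio::SearchIO::Writer::HTMLResultWriter->new;
    my $html   = embed_image($writer->to_string($result), basename("$infile.png"));
    open my $out, '>', "$infile.html" or die "cannot write $infile.html: $!";
    print {$out} $html;
    close $out;
}

# Only run when a report file is given on the command line.
main(@ARGV) if @ARGV;
```

Editing the finished page as a string keeps the sketch independent of how the writer lays out its output; if the picture should appear elsewhere in the report, only the pattern in embed_image would need to change.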
From cjfields at uiuc.edu Mon Apr 21 11:07:17 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 Apr 2008 10:07:17 -0500
Subject: [Bioperl-l] [Proposed change] HSP::frame()
Message-ID:

I have noticed (in relation to bug 2485, http://bugzilla.open-bio.org/show_bug.cgi?id=2485) that the Bio::Search::HSP::GenericHSP frame() method is implemented very differently from strand(), start(), end(), and most other HSP methods.

The current behavior is to return a list of two values (query and hit frame) in list context; in scalar context it returns the query frame if one value is passed and the subject frame if no value is passed. The latter behavior is unfortunately what leads to the aforementioned bug. The method is also implied to be a getter/setter, but the implementation doesn't allow that; it always sets to the instantiated values (in fact, repeatedly so).

In order to fix that and make the interface more consistent I am changing frame() to behave like strand(), etc., in that the first argument is 'query/subject/hit/list' (default = 'query' if no arg specified) and the rest are optional values for setting, in query/subject order.

One issue: I can catch and imitate most of the older behavior with a few additional checks, the one exception being the old frame() default return value, which is now 'query' (not context-dependent). If needed we can change the default to 'hit', but I believe method consistency is probably the better route, and I can always add a warning under old API circumstances indicating the change.

I am also modifying HSPTableWriter to print frame_hit and frame_query (previously it was only printing 'frame', which implied the hit frame). I can see this being an issue with anyone expecting 'frame' instead of 'frame_hit'; I could hack in a fix for that if needed.

If there aren't any objections or suggestions, I'll commit this in the next day or two.
chris From cjfields at uiuc.edu Mon Apr 21 11:32:59 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 21 Apr 2008 10:32:59 -0500 Subject: [Bioperl-l] Assembly.t test fails Message-ID: I'm getting some significant test failures in bioperl-live for Bio::Assembly: t/Assembly...... 1..35 ok 1 - use Bio::Assembly::IO; ok 2 - The object isa Bio::Assembly::IO ok 3 - The object isa Bio::Assembly::Scaffold ok 4 not ok 5 ok 6 - The object isa Bio::AnnotationCollectionI ok 7 - no annotations in Annotation collection? ok 8 # Failed test at t/Assembly.t line 35. # got: 'NoName' # expected: 'test' Can't locate object method "get_contig_seq_ids" via package "Bio::Assembly::Contig" at /Users/cjfields/bioperl/bioperl-live/blib/ lib/Bio/Assembly/Scaffold.pm line 189, line 733. # Looks like you planned 35 tests but only ran 8. # Looks like you failed 1 test of 8 run. # Looks like your test died just after 8. Dubious, test returned 255 (wstat 65280, 0xff00) Failed 28/35 subtests Test Summary Report ------------------- t/Assembly.t (Wstat: 65280 Tests: 8 Failed: 1) Failed test: 5 Non-zero exit status: 255 Parse errors: Bad plan. You planned 35 tests but ran 8. Files=1, Tests=8, 0 wallclock secs ( 0.01 usr 0.00 sys + 0.22 cusr 0.04 csys = 0.27 CPU) Result: FAIL Failed 1/1 test programs. 1/8 subtests failed. chris Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Mon Apr 21 11:44:21 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 21 Apr 2008 10:44:21 -0500 Subject: [Bioperl-l] Assembly.t test fails In-Reply-To: References: Message-ID: <2F199628-717E-4F88-85D7-408BD7BBE16D@uiuc.edu> Scratch that, figured it out (easy fix). chris On Apr 21, 2008, at 10:32 AM, Chris Fields wrote: > I'm getting some significant test failures in bioperl-live for > Bio::Assembly: > > t/Assembly...... 
> 1..35 > ok 1 - use Bio::Assembly::IO; > ok 2 - The object isa Bio::Assembly::IO > ok 3 - The object isa Bio::Assembly::Scaffold > ok 4 > not ok 5 > ok 6 - The object isa Bio::AnnotationCollectionI > ok 7 - no annotations in Annotation collection? > ok 8 > > # Failed test at t/Assembly.t line 35. > # got: 'NoName' > # expected: 'test' > Can't locate object method "get_contig_seq_ids" via package > "Bio::Assembly::Contig" at /Users/cjfields/bioperl/bioperl-live/blib/ > lib/Bio/Assembly/Scaffold.pm line 189, line 733. > # Looks like you planned 35 tests but only ran 8. > # Looks like you failed 1 test of 8 run. > # Looks like your test died just after 8. > Dubious, test returned 255 (wstat 65280, 0xff00) > Failed 28/35 subtests > > Test Summary Report > ------------------- > t/Assembly.t (Wstat: 65280 Tests: 8 Failed: 1) > Failed test: 5 > Non-zero exit status: 255 > Parse errors: Bad plan. You planned 35 tests but ran 8. > Files=1, Tests=8, 0 wallclock secs ( 0.01 usr 0.00 sys + 0.22 > cusr 0.04 csys = 0.27 CPU) > Result: FAIL > Failed 1/1 test programs. 1/8 subtests failed. > > > chris > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From frederic.romagne at gmail.com Mon Apr 21 11:53:11 2008
From: frederic.romagne at gmail.com (Frédéric Romagné)
Date: Mon, 21 Apr 2008 10:53:11 -0500
Subject: [Bioperl-l] index::abstract on win and unix
In-Reply-To:
References: <1208366718.19084.15.camel@kiss-laptop> <1208381947.16620.6.camel@kiss-laptop>
Message-ID: <1208793191.25906.9.camel@kiss-laptop>

In fact, I want the whole Bio::Seq object, and I verified that the ACCESSION and the LOCUS are the same in my GenBank files.

I saw that the program sometimes reports that it cannot find the entry:

    if( !defined $seq ) {
        warn("Sequence $id in Database $db is not present\n");
    }

I suspect it is the make_index function, rather than the get_Seq_by_acc function, that does not work properly on Windows...

On Friday 18 April 2008 at 19:35 -0700, Jason Stajich wrote:
> do you want the LOCUS or the ACCESSION?
> Do you mean the result is the completely wrong record or just the
> wrong field?
> accession number is available from the seq's accession_number() method.
> -jason
> On Apr 16, 2008, at 2:39 PM, Frédéric Romagné wrote:
>
> > Well, if with input file you mean the database used, it's created
> > with Bio::Index::GenBank from a ncbi FTP's genbank file.
> >
> > $id is an accession number read from a file but i chomp the line...
> >
> > I am trying to install the svn version of bioperl under windows to see
> > if there is an improvement.
> >
> > On Thursday 17 April 2008 at 08:49 +1200, Smithies, Russell wrote:
> >> Did you check the format of your input file?
> >> i.e. DOS or UNIX line endings?
> >>
> >>> -----Original Message-----
> >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Frédéric Romagné
> >>> Sent: Thursday, 17 April 2008 5:25 a.m.
> >>> To: bioperl-l at lists.open-bio.org > >>> Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix > >>> > >>> Hello, > >>> i made a program which use Bio::Index::GenBank and i tested it under > >>> unix, that worked well. > >>> > >>> But i have to launch it under windows and it seems not to work on. > >>> > >>> Here is the problem : > >>> > >>> my $dbobj = Bio::Index::Abstract->new("Data/$db"); > >>> my $seq = $dbobj->get_Seq_by_acc($id); > >>> print $seq->display_id."\n"; > >>> > >>> did not print the same number than $id !!! So i don't work on the > >>> sequence expected... > >>> > >>> I use the SVN sources on unix and the Perl package manager for > >>> windows... > >>> > >>> Thanks. > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> ===================================================================== > >> == > >> Attention: The information contained in this message and/or > >> attachments > >> from AgResearch Limited is intended only for the persons or entities > >> to which it is addressed and may contain confidential and/or > >> privileged > >> material. Any review, retransmission, dissemination or other use > >> of, or > >> taking of any action in reliance upon, this information by persons or > >> entities other than the intended recipients is prohibited by > >> AgResearch > >> Limited. If you have received this message in error, please notify > >> the > >> sender immediately. 
> >> ===================================================================== > >> == > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ewijaya at gmail.com Tue Apr 22 10:03:07 2008 From: ewijaya at gmail.com (Edward Wijaya) Date: Tue, 22 Apr 2008 22:03:07 +0800 Subject: [Bioperl-l] BioPerl Module to Parse BLAT alignment output Message-ID: <3521d3670804220703u4d8565c8q604036727aedf0a8@mail.gmail.com> Hi, Is there any module that can parse the following output of BLAT. This is taken from UCSC browser. The idea is to parse it and then extract the conserved block of aligned sequences. __DATA__ Alignment block 3 of 135 in window, 5860248 - 5860300, 53 bps B D D. melanogaster tgtg----tatttatgt-tttaaataaaggt-------tttctaaata---cgaaatttcaaatttaa B D D. simulans tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cgcaattttaaatttaa B D D. sechellia tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cccaattttaaatttaa B D D. yakuba tgtg----tatttatgt-tcttaataaaggt-------ttcctaaataa-ttcaaaatttaaattaaa D. erecta tgtg----tgtttatgt-ttttaataaaggt-------tttctaaataa--tcgaaattcatttcaaa D. ananassae taag----tttttatgtattttaaaatatag-------aaaataaata---aaaaaaattgaact--- D. pseudoobscura tata----ccagtacac-cttatatg------------tttttaaata-------------------- B D D. persimilis tata----ccagtacac-attatatg------------tttttaaata-------------------- D. willistoni aaaaaagttatttgaat-ttggaata------------taccaaaacatgttggaaatt------gaa D. virilis -------------gatt-ttataataaaattgcgctaatttctaa------------tttacgttaaa D. mojavensis -------------tagt-ccttaatataaatataatattaaataaata-------cttttaagttaaa D. grimshawi ==================================================================== T. castaneum ==================================================================== Inserts between block 3 and 4 in window D. pseudoobscura 2008bp B D D. persimilis 1421bp D. virilis 5bp D. 
mojavensis 4640bp Alignment block 4 of 135 in window, 5860301 - 5860344, 44 bps B D D. melanogaster ----tgggtagcagcgttgccagat--------------------aaagggacatgtttactggctga B D D. simulans ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga B D D. sechellia ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga B D D. yakuba ----tgagtaccaatgctgccagat-------------ctttgtaaagcggtaatgtttgctggctga D. erecta ----t-----ttaatgttgccagat-------------ctgcgtaaggcgctcatgttggctggctga D. pseudoobscura ==================================================================== B D D. persimilis ==================================================================== D. willistoni ----aggattacgaagttcctttat-------------------aaag-------------------- D. virilis gactagtttaatatctcagcccgttaagctaactgttactttttacagtattcgcgccattttgc--- D. mojavensis ==================================================================== D. grimshawi ==================================================================== T. castaneum ==================================================================== __ END__ From cjfields at uiuc.edu Tue Apr 22 10:22:45 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 22 Apr 2008 09:22:45 -0500 Subject: [Bioperl-l] BioPerl Module to Parse BLAT alignment output In-Reply-To: <3521d3670804220703u4d8565c8q604036727aedf0a8@mail.gmail.com> References: <3521d3670804220703u4d8565c8q604036727aedf0a8@mail.gmail.com> Message-ID: <766FDF9E-9F7B-4826-B7FA-87DF3B074EBC@uiuc.edu> A quick grep of bioperl-live gets me Bio::SearchIO::blast, Bio::SearchIO::axt, Bio::SearchIO::psl, Bio::Tools::Blat, and Bio::Tools::WebBlat. Haven't looked at the docs but it's a start! chris On Apr 22, 2008, at 9:03 AM, Edward Wijaya wrote: > Hi, > > Is there any module that can parse the following output > of BLAT. This is taken from UCSC browser. > > The idea is to parse it and then extract the conserved block > of aligned sequences. 
> > > __DATA__ > Alignment block 3 of 135 in window, 5860248 - 5860300, 53 bps > B D D. melanogaster > tgtg----tatttatgt-tttaaataaaggt-------tttctaaata---cgaaatttcaaatttaa > B D D. simulans > tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cgcaattttaaatttaa > B D D. sechellia > tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cccaattttaaatttaa > B D D. yakuba > tgtg----tatttatgt-tcttaataaaggt-------ttcctaaataa-ttcaaaatttaaattaaa > D. erecta > tgtg----tgtttatgt-ttttaataaaggt-------tttctaaataa--tcgaaattcatttcaaa > D. ananassae > taag----tttttatgtattttaaaatatag-------aaaataaata---aaaaaaattgaact--- > D. pseudoobscura > tata----ccagtacac-cttatatg------------tttttaaata-------------------- > B D D. persimilis > tata----ccagtacac-attatatg------------tttttaaata-------------------- > D. willistoni > aaaaaagttatttgaat-ttggaata------------taccaaaacatgttggaaatt------gaa > D. virilis > -------------gatt-ttataataaaattgcgctaatttctaa------------tttacgttaaa > D. mojavensis > -------------tagt-ccttaatataaatataatattaaataaata-------cttttaagttaaa > D. grimshawi > ==================================================================== > T. castaneum > ==================================================================== > > Inserts between block 3 and 4 in window > D. pseudoobscura 2008bp > B D D. persimilis 1421bp > D. virilis 5bp > D. mojavensis 4640bp > > Alignment block 4 of 135 in window, 5860301 - 5860344, 44 bps > B D D. melanogaster > ----tgggtagcagcgttgccagat--------------------aaagggacatgtttactggctga > B D D. simulans > ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga > B D D. sechellia > ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga > B D D. yakuba > ----tgagtaccaatgctgccagat-------------ctttgtaaagcggtaatgtttgctggctga > D. erecta > ----t-----ttaatgttgccagat-------------ctgcgtaaggcgctcatgttggctggctga > D. pseudoobscura > ==================================================================== > B D D. 
persimilis > ==================================================================== > D. willistoni > ----aggattacgaagttcctttat-------------------aaag-------------------- > D. virilis > gactagtttaatatctcagcccgttaagctaactgttactttttacagtattcgcgccattttgc--- > D. mojavensis > ==================================================================== > D. grimshawi > ==================================================================== > T. castaneum > ==================================================================== > > __ END__ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign
From jason at bioperl.org Tue Apr 22 14:49:32 2008 From: jason at bioperl.org (Jason Stajich) Date: Tue, 22 Apr 2008 11:49:32 -0700 Subject: [Bioperl-l] Fwd: [blast-announce] New BLAST URL available at the NCBI References: Message-ID: Does anyone want to take a look at how to use these URLs in the RemoteBlast module, if the interface is the same? -jason Begin forwarded message: > From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]" > Date: April 22, 2008 11:35:04 AM PDT > To: > Subject: [blast-announce] New BLAST URL available at the NCBI > > New BLAST URL available at the NCBI > > > > The NCBI has activated a new URL for BLAST searches at the NCBI: > http://blast.ncbi.nlm.nih.gov. > > > > Searches sent to this URL can take advantage of a larger number of > machines for searches and the system has a better overall fault > tolerance. > > > > We recommend migration of all BLAST links and bookmarks (e.g., > http://www.ncbi.nlm.nih.gov/BLAST/ and > http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi) to the new URL. > > > > Links on the NCBI and BLAST home pages will start to change in the > coming weeks. > > > > At this point in time the plans are to also maintain the current BLAST > URL.
> > > > > From jason at bioperl.org Tue Apr 22 14:51:08 2008 From: jason at bioperl.org (Jason Stajich) Date: Tue, 22 Apr 2008 11:51:08 -0700 Subject: [Bioperl-l] BioPerl Module to Parse BLAT alignment output In-Reply-To: <766FDF9E-9F7B-4826-B7FA-87DF3B074EBC@uiuc.edu> References: <3521d3670804220703u4d8565c8q604036727aedf0a8@mail.gmail.com> <766FDF9E-9F7B-4826-B7FA-87DF3B074EBC@uiuc.edu> Message-ID: <6C812413-B375-427B-9AF8-5A0AA6167CC8@bioperl.org> if you get it as axt it should parse fine in SearchIO but that is pairwise, if you can get an alignment blocks I can't remember what format this is from UCSC. MSAs are going to be better handed through Bio::AlignIO though so it might be better to build a parser on that. On Apr 22, 2008, at 7:22 AM, Chris Fields wrote: > A quick grep of bioperl-live gets me Bio::SearchIO::blast, > Bio::SearchIO::axt, Bio::SearchIO::psl, Bio::Tools::Blat, and > Bio::Tools::WebBlat. Haven't looked at the docs but it's a start! > > chris > > On Apr 22, 2008, at 9:03 AM, Edward Wijaya wrote: > >> Hi, >> >> Is there any module that can parse the following output >> of BLAT. This is taken from UCSC browser. >> >> The idea is to parse it and then extract the conserved block >> of aligned sequences. >> >> >> __DATA__ >> Alignment block 3 of 135 in window, 5860248 - 5860300, 53 bps >> B D D. melanogaster >> tgtg----tatttatgt-tttaaataaaggt-------tttctaaata---cgaaatttcaaatttaa >> B D D. simulans >> tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cgcaattttaaatttaa >> B D D. sechellia >> tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cccaattttaaatttaa >> B D D. yakuba >> tgtg----tatttatgt-tcttaataaaggt-------ttcctaaataa-ttcaaaatttaaattaaa >> D. erecta >> tgtg----tgtttatgt-ttttaataaaggt-------tttctaaataa--tcgaaattcatttcaaa >> D. ananassae >> taag----tttttatgtattttaaaatatag-------aaaataaata---aaaaaaattgaact--- >> D. pseudoobscura >> tata----ccagtacac-cttatatg------------tttttaaata-------------------- >> B D D. 
persimilis >> tata----ccagtacac-attatatg------------tttttaaata-------------------- >> D. willistoni >> aaaaaagttatttgaat-ttggaata------------taccaaaacatgttggaaatt------gaa >> D. virilis >> -------------gatt-ttataataaaattgcgctaatttctaa------------tttacgttaaa >> D. mojavensis >> -------------tagt-ccttaatataaatataatattaaataaata-------cttttaagttaaa >> D. grimshawi >> ==================================================================== >> T. castaneum >> ==================================================================== >> >> Inserts between block 3 and 4 in window >> D. pseudoobscura 2008bp >> B D D. persimilis 1421bp >> D. virilis 5bp >> D. mojavensis 4640bp >> >> Alignment block 4 of 135 in window, 5860301 - 5860344, 44 bps >> B D D. melanogaster >> ----tgggtagcagcgttgccagat--------------------aaagggacatgtttactggctga >> B D D. simulans >> ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga >> B D D. sechellia >> ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga >> B D D. yakuba >> ----tgagtaccaatgctgccagat-------------ctttgtaaagcggtaatgtttgctggctga >> D. erecta >> ----t-----ttaatgttgccagat-------------ctgcgtaaggcgctcatgttggctggctga >> D. pseudoobscura >> ==================================================================== >> B D D. persimilis >> ==================================================================== >> D. willistoni >> ----aggattacgaagttcctttat-------------------aaag-------------------- >> D. virilis >> gactagtttaatatctcagcccgttaagctaactgttactttttacagtattcgcgccattttgc--- >> D. mojavensis >> ==================================================================== >> D. grimshawi >> ==================================================================== >> T. 
castaneum >> ==================================================================== >> >> __ END__ >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Tue Apr 22 15:02:14 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 22 Apr 2008 14:02:14 -0500 Subject: [Bioperl-l] Fwd: [blast-announce] New BLAST URL available at the NCBI In-Reply-To: References: Message-ID: <13C2AD96-8297-40DD-ADCC-B2BEC923B9E0@uiuc.edu> They work exactly the same as the old URL, at least on the surface; I haven't tried changing many URLAPI parameters. I went ahead and changed the URL in RemoteBlast to http://blast.ncbi.nlm.nih.gov/Blast.cgi as it works with RemoteBlast.t. chris On Apr 22, 2008, at 1:49 PM, Jason Stajich wrote: > Does anyone want to take a look at how to use these URLs in the > RemoteBlast module, if the interface is the same? > > -jason > > Begin forwarded message: > >> From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]" >> >> Date: April 22, 2008 11:35:04 AM PDT >> To: >> Subject: [blast-announce] New BLAST URL available at the NCBI >> >> New BLAST URL available at the NCBI >> >> >> >> The NCBI has activated a new URL for BLAST searches at the NCBI: >> http://blast.ncbi.nlm.nih.gov. >> >> >> >> Searches sent to this URL can take advantage of a larger number of >> machines for searches and the system has a better overall fault >> tolerance. >> >> >> >> We recommend migration of all BLAST links and bookmarks (e.g., >> http://www.ncbi.nlm.nih.gov/BLAST/ and >> http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi) to the new URL. 
>> >> >> >> Links on the NCBI and BLAST home pages will start to change in the >> coming weeks. >> >> >> >> At this point in time the plans are to also maintain the current >> BLAST >> URL. >> >> >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue Apr 22 14:58:40 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 22 Apr 2008 13:58:40 -0500 Subject: [Bioperl-l] BioPerl Module to Parse BLAT alignment output In-Reply-To: <6C812413-B375-427B-9AF8-5A0AA6167CC8@bioperl.org> References: <3521d3670804220703u4d8565c8q604036727aedf0a8@mail.gmail.com> <766FDF9E-9F7B-4826-B7FA-87DF3B074EBC@uiuc.edu> <6C812413-B375-427B-9AF8-5A0AA6167CC8@bioperl.org> Message-ID: <43344C89-6B4D-4360-AF56-A6FDD065FFF3@uiuc.edu> Related to that, I have thought about building a parser for some of the query-anchored alignments produced by blastall, just haven't had time to devote to it. One of these days... chris On Apr 22, 2008, at 1:51 PM, Jason Stajich wrote: > if you get it as axt it should parse fine in SearchIO but that is > pairwise, if you can get an alignment blocks I can't remember what > format this is from UCSC. > MSAs are going to be better handed through Bio::AlignIO though so it > might be better to build a parser on that. > > On Apr 22, 2008, at 7:22 AM, Chris Fields wrote: > >> A quick grep of bioperl-live gets me Bio::SearchIO::blast, >> Bio::SearchIO::axt, Bio::SearchIO::psl, Bio::Tools::Blat, and >> Bio::Tools::WebBlat. Haven't looked at the docs but it's a start! >> >> chris >> >> On Apr 22, 2008, at 9:03 AM, Edward Wijaya wrote: >> >>> Hi, >>> >>> Is there any module that can parse the following output >>> of BLAT. This is taken from UCSC browser. 
>>> >>> The idea is to parse it and then extract the conserved block >>> of aligned sequences. >>> >>> >>> __DATA__ >>> Alignment block 3 of 135 in window, 5860248 - 5860300, 53 bps >>> B D D. melanogaster >>> tgtg----tatttatgt-tttaaataaaggt-------tttctaaata---cgaaatttcaaatttaa >>> B D D. simulans >>> tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cgcaattttaaatttaa >>> B D D. sechellia >>> tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cccaattttaaatttaa >>> B D D. yakuba >>> tgtg----tatttatgt-tcttaataaaggt-------ttcctaaataa-ttcaaaatttaaattaaa >>> D. erecta >>> tgtg----tgtttatgt-ttttaataaaggt-------tttctaaataa--tcgaaattcatttcaaa >>> D. ananassae >>> taag----tttttatgtattttaaaatatag-------aaaataaata---aaaaaaattgaact--- >>> D. pseudoobscura >>> tata----ccagtacac-cttatatg------------tttttaaata-------------------- >>> B D D. persimilis >>> tata----ccagtacac-attatatg------------tttttaaata-------------------- >>> D. willistoni >>> aaaaaagttatttgaat-ttggaata------------taccaaaacatgttggaaatt------gaa >>> D. virilis >>> -------------gatt-ttataataaaattgcgctaatttctaa------------tttacgttaaa >>> D. mojavensis >>> -------------tagt-ccttaatataaatataatattaaataaata-------cttttaagttaaa >>> D. grimshawi >>> ==================================================================== >>> T. castaneum >>> ==================================================================== >>> >>> Inserts between block 3 and 4 in window >>> D. pseudoobscura 2008bp >>> B D D. persimilis 1421bp >>> D. virilis 5bp >>> D. mojavensis 4640bp >>> >>> Alignment block 4 of 135 in window, 5860301 - 5860344, 44 bps >>> B D D. melanogaster >>> ----tgggtagcagcgttgccagat--------------------aaagggacatgtttactggctga >>> B D D. simulans >>> ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga >>> B D D. sechellia >>> ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga >>> B D D. yakuba >>> ----tgagtaccaatgctgccagat-------------ctttgtaaagcggtaatgtttgctggctga >>> D. 
erecta >>> ----t-----ttaatgttgccagat-------------ctgcgtaaggcgctcatgttggctggctga >>> D. pseudoobscura >>> ==================================================================== >>> B D D. persimilis >>> ==================================================================== >>> D. willistoni >>> ----aggattacgaagttcctttat-------------------aaag-------------------- >>> D. virilis >>> gactagtttaatatctcagcccgttaagctaactgttactttttacagtattcgcgccattttgc--- >>> D. mojavensis >>> ==================================================================== >>> D. grimshawi >>> ==================================================================== >>> T. castaneum >>> ==================================================================== >>> >>> __ END__ >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bioperlanand at yahoo.com Wed Apr 23 02:02:30 2008 From: bioperlanand at yahoo.com (Anand Venkatraman) Date: Tue, 22 Apr 2008 23:02:30 -0700 (PDT) Subject: [Bioperl-l] Question on integrating Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter Message-ID: <946658.12337.qm@web36802.mail.mud.yahoo.com> Hi everybody, I would like to use Bio::Graphics in conjunction with Bio::SearchIO::Writer::HTMLResultWriter to obtain a HTML formatted blast report output along with an image of the blast hits as shown on Slide 60 in this pdf: http://jason.open-bio.org/Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf I am able to get the HTML output using "Bio::SearchIO::Writer::HTMLResultWriter" and I am able to get the image using the examples outlined in the Bio::Graphics HOWTO: http://www.bioperl.org/wiki/HOWTO:Graphics My question: How do I integrate Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter to render the blast hits image at the correct position in my BioPerl reformatted html file. 
I also found that someone else asked something similar; it is listed under the "Orphans, Leftovers" category in the ListSummary:April 26-May 9,2006 document:
http://www.bioperl.org/wiki/ListSummary:April_26-May_9%2C2006#Orphans.2C_Leftovers

Here is my code so far:
----------------------------------------------------------------
#!/usr/bin/perl -w
# usage: $0
use strict;
use Bio::SearchIO;
use Bio::SearchIO::Writer::HTMLResultWriter;

# Take the BLAST report path from the command line; $! is not set here,
# so die with a usage message instead.
my $infile = shift or die "usage: $0 blast_report\n";

# Parse the BLAST report and write the first result back out as HTML.
my $searchio   = Bio::SearchIO->new(-format => 'blast', -file => $infile);
my $writerhtml = Bio::SearchIO::Writer::HTMLResultWriter->new();
my $outhtml    = Bio::SearchIO->new(-writer => $writerhtml,
                                    -file   => ">${infile}.html");

$outhtml->write_result($searchio->next_result);
----------------------------------------------------------------

Thanks in advance,

Anand

From jason at bioperl.org Wed Apr 23 02:15:28 2008
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 22 Apr 2008 23:15:28 -0700
Subject: [Bioperl-l] Question on integrating Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter
In-Reply-To: <946658.12337.qm@web36802.mail.mud.yahoo.com>
References: <946658.12337.qm@web36802.mail.mud.yahoo.com>
Message-ID: <952B0A4E-8A14-4E8E-B36D-14596B20E330@bioperl.org>

Basically you want to inject your own IMG tags into the file with these routines:

$writerhtml->start_report(\&my_start_report);
$writerhtml->title(\&my_title);
$writerhtml->hit_link_align(\&my_hit_link_align);
$writerhtml->hit_link_desc(\&my_hit_link_desc);

fgblast shows a way to do this in part. It relies on Gbrowse to generate the image, but you can replace the gbrowse_img reference with your own image-generating software.
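The callback idea in the message above can be sketched roughly as follows. This is only a sketch, not code from fgblast: the helper name make_title_callback, the StubResult class, and the "query name plus .png" image naming are all hypothetical, and it assumes the writer hands the title callback the result object (with query_name and algorithm methods), as the default title routine appears to do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a title callback that emits a heading followed by an IMG tag.
# $image_for is a hypothetical code reference that maps a result object
# to the path of an image rendered beforehand (e.g. with Bio::Graphics).
sub make_title_callback {
    my ($image_for) = @_;
    return sub {
        my ($result) = @_;
        my $name = $result->query_name;
        my $img  = $image_for->($result);
        return sprintf
            qq{<h1>%s report for %s</h1>\n<img src="%s" alt="hit overview for %s">\n},
            $result->algorithm, $name, $img, $name;
    };
}

# Stand-in for a real result object, so the sketch runs without BioPerl.
package StubResult;
sub new        { my ($class, %args) = @_; return bless {%args}, $class }
sub query_name { $_[0]{query_name} }
sub algorithm  { $_[0]{algorithm} }

package main;

# Assumed naming scheme: one PNG per report, named after the query.
my $callback = make_title_callback(sub { $_[0]->query_name . '.png' });
my $html = $callback->(
    StubResult->new(query_name => 'seq1', algorithm => 'BLASTN')
);
print $html;

# With BioPerl installed, the hook-up would follow the message above:
#   $writerhtml->title($callback);
```

For a directory of reports, the same callback could be reused for every file, provided each image is written out under the assumed name before the HTML is generated. Query names containing HTML special characters would need escaping, which this sketch leaves out.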
http://people.genome.duke.edu/~jes12/software/scripts/fgblast -jason On Apr 22, 2008, at 11:02 PM, Anand Venkatraman wrote: > Hi everybody, > > I would like to use Bio::Graphics in conjunction with > Bio::SearchIO::Writer::HTMLResultWriter to obtain a HTML formatted > blast report output along with an image of the blast hits as shown > on Slide 60 in this pdf: http://jason.open-bio.org/ > Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf > > I am able to get the HTML output using > "Bio::SearchIO::Writer::HTMLResultWriter" and I am able to get the > image using the examples outlined in the Bio::Graphics HOWTO: > http://www.bioperl.org/wiki/HOWTO:Graphics > > My question: How do I integrate Bio::Graphics with > Bio::SearchIO::Writer::HTMLResultWriter to render the blast hits > image at the correct position in my BioPerl reformatted html file. > > I also found that someone else has asked something similar to > whatever I am asking & is listed under the "Orphans, Leftovers" > category in the ListSummary:April 26-May 9,2006 document: > http://www.bioperl.org/wiki/ListSummary:April_26-May_9% > 2C2006#Orphans.2C_Leftovers > > Here is my code so far: > ---------------------------------------------------------------- > #!/usr/bin/perl -w > # usage: $0 > use strict; > use Bio::SearchIO; > use Bio::SearchIO::Writer::HTMLResultWriter; > > my $infile = shift or die $!; > > my $searchio = new Bio::SearchIO( -format => 'blast',-file => > $infile); > my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter(); > my $outhtml = new Bio::SearchIO(-writer => $writerhtml, > -file => ">$ > {infile}.html"); > > $outhtml->write_result($searchio->next_result); > ---------------------------------------------------------------- > > Thanks in advance, > > Anand > > > --------------------------------- > Be a better friend, newshound, and know-it-all with Yahoo! Mobile. > Try it now. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bamboowarrior at gmail.com Wed Apr 23 15:39:21 2008 From: bamboowarrior at gmail.com (Arkady) Date: Wed, 23 Apr 2008 14:39:21 -0500 Subject: [Bioperl-l] WebBlat, where'd it go? Message-ID: <91656c3f0804231239j159fb9d8q7bae51ba5cbcd442@mail.gmail.com> Hi folks, I'm trying to use BioPerl to run a BLAT search on the four primate genomes on UCSC. I understand that the proper tool for this is Bio::Tools::WebBlat. Unfortunately, it doesn't appear to be in my bioperl distribution (nor do I even know how to figure out what version that is, unfortunately, though it's a very recent install -- a month ago?). I also can't find it on CPAN. Is this deprecated? Has something else replaced it? Or are we always supposed to run local BLAT? Thanks. John Woods Institute for Cellular and Molecular Biology The University of Texas at Austin From spiros at lokku.com Wed Apr 23 15:48:12 2008 From: spiros at lokku.com (Spiros Denaxas) Date: Wed, 23 Apr 2008 20:48:12 +0100 Subject: [Bioperl-l] WebBlat, where'd it go? In-Reply-To: <91656c3f0804231239j159fb9d8q7bae51ba5cbcd442@mail.gmail.com> References: <91656c3f0804231239j159fb9d8q7bae51ba5cbcd442@mail.gmail.com> Message-ID: Hey, a quick look at the list of deprecated modules reveals that it has indeed been removed, http://www.bioperl.org/wiki/Deprecated_modules Spiros On Wed, Apr 23, 2008 at 8:39 PM, Arkady wrote: > Hi folks, > > I'm trying to use BioPerl to run a BLAT search on the four primate > genomes on UCSC. I understand that the proper tool for this is > Bio::Tools::WebBlat. Unfortunately, it doesn't appear to be in my > bioperl distribution (nor do I even know how to figure out what > version that is, unfortunately, though it's a very recent install -- a > month ago?). I also can't find it on CPAN. Is this deprecated? Has > something else replaced it? 
Or are we always supposed to run local > BLAT? > > Thanks. > > John Woods > > Institute for Cellular and Molecular Biology > The University of Texas at Austin > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Wed Apr 23 15:56:14 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 23 Apr 2008 14:56:14 -0500 Subject: [Bioperl-l] WebBlat, where'd it go? In-Reply-To: <91656c3f0804231239j159fb9d8q7bae51ba5cbcd442@mail.gmail.com> References: <91656c3f0804231239j159fb9d8q7bae51ba5cbcd442@mail.gmail.com> Message-ID: It's no longer maintained (deprecated); see the following for an explanation: http://article.gmane.org/gmane.comp.lang.perl.bio.general/13545 Basically, only local BLAT searches are supported through BioPerl. chris On Apr 23, 2008, at 2:39 PM, Arkady wrote: > Hi folks, > > I'm trying to use BioPerl to run a BLAT search on the four primate > genomes on UCSC. I understand that the proper tool for this is > Bio::Tools::WebBlat. Unfortunately, it doesn't appear to be in my > bioperl distribution (nor do I even know how to figure out what > version that is, unfortunately, though it's a very recent install -- a > month ago?). I also can't find it on CPAN. Is this deprecated? Has > something else replaced it? Or are we always supposed to run local > BLAT? > > Thanks. > > John Woods > > Institute for Cellular and Molecular Biology > The University of Texas at Austin > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bioperlanand at yahoo.com Wed Apr 23 19:05:27 2008 From: bioperlanand at yahoo.com (Anand Venkatraman) Date: Wed, 23 Apr 2008 16:05:27 -0700 (PDT) Subject: [Bioperl-l] Question on integrating Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter In-Reply-To: <952B0A4E-8A14-4E8E-B36D-14596B20E330@bioperl.org> Message-ID: <795696.39415.qm@web36804.mail.mud.yahoo.com> Hi Jason, Thanks for the reply. I am a little lost with the solution suggested. Is that how slide 60 in the pdf is obtained: http://jason.open-bio.org/Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf I guess I am missing something quite obvious, I apologize. What I have & want is this: I have a directory having say 100 different blast reports & hence I am looking to obtain 100 different bioperl formatted blast html outputs with the respective images just as it would appear in the blast report. Thanks, Anand Jason Stajich wrote: Basically you want to inject your own IMG tags into the file with these routines: $writerhtml->start_report(\&my_start_report); $writerhtml->title(\&my_title); $writerhtml->hit_link_align(\&my_hit_link_align); $writerhtml->hit_link_desc(\&my_hit_link_desc); fgblast shows a way to do this in part. It relies on Gbrowse to generate the image but you can replace the gbrowse_img reference to your own image generating software. 
http://people.genome.duke.edu/~jes12/software/scripts/fgblast -jason On Apr 22, 2008, at 11:02 PM, Anand Venkatraman wrote: Hi everybody, I would like to use Bio::Graphics in conjunction with Bio::SearchIO::Writer::HTMLResultWriter to obtain a HTML formatted blast report output along with an image of the blast hits as shown on Slide 60 in this pdf: http://jason.open-bio.org/Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf I am able to get the HTML output using "Bio::SearchIO::Writer::HTMLResultWriter" and I am able to get the image using the examples outlined in the Bio::Graphics HOWTO: http://www.bioperl.org/wiki/HOWTO:Graphics My question: How do I integrate Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter to render the blast hits image at the correct position in my BioPerl reformatted html file. I also found that someone else has asked something similar to whatever I am asking & is listed under the "Orphans, Leftovers" category in the ListSummary:April 26-May 9,2006 document: http://www.bioperl.org/wiki/ListSummary:April_26-May_9%2C2006#Orphans.2C_Leftovers Here is my code so far: ---------------------------------------------------------------- #!/usr/bin/perl -w # usage: $0 use strict; use Bio::SearchIO; use Bio::SearchIO::Writer::HTMLResultWriter; my $infile = shift or die $!; my $searchio = new Bio::SearchIO( -format => 'blast',-file => $infile); my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter(); my $outhtml = new Bio::SearchIO(-writer => $writerhtml, -file => ">${infile}.html"); $outhtml->write_result($searchio->next_result); ---------------------------------------------------------------- Thanks in advance, Anand --------------------------------- Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. 
_______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l --------------------------------- Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. From jason at bioperl.org Thu Apr 24 14:06:41 2008 From: jason at bioperl.org (Jason Stajich) Date: Thu, 24 Apr 2008 11:06:41 -0700 Subject: [Bioperl-l] Question on integrating Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter In-Reply-To: <795696.39415.qm@web36804.mail.mud.yahoo.com> References: <795696.39415.qm@web36804.mail.mud.yahoo.com> Message-ID: The overview graphic is generated basically from the script in scripts/graphics/search_overview.PLS So you'd have to run that on each report to generate the graphic, then use the other methods to insert images into each rendered HTML report. -jason On Apr 23, 2008, at 4:05 PM, Anand Venkatraman wrote: > Hi Jason, > > Thanks for the reply. > > I am a little lost with the solution suggested. Is that how slide > 60 in the pdf is obtained: http://jason.open-bio.org/ > Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf > > I guess I am missing something quite obvious, I apologize. > > What I have & want is this: I have a directory having say 100 > different blast reports & hence I am looking to obtain 100 > different bioperl formatted blast html outputs with the respective > images just as it would appear in the blast report. > > Thanks, > > Anand > > Jason Stajich wrote: > > Basically you want to inject your own IMG tags into the file with > these routines: > > > $writerhtml->start_report(\&my_start_report); > $writerhtml->title(\&my_title); > $writerhtml->hit_link_align(\&my_hit_link_align); > $writerhtml->hit_link_desc(\&my_hit_link_desc); > > > fgblast shows a way to do this in part. It relies on Gbrowse to > generate the image but you can replace the gbrowse_img reference to > your own image generating software. 
> http://people.genome.duke.edu/~jes12/software/scripts/fgblast > > > > > -jason > On Apr 22, 2008, at 11:02 PM, Anand Venkatraman wrote: > > Hi everybody, > > > I would like to use Bio::Graphics in conjunction with > Bio::SearchIO::Writer::HTMLResultWriter to obtain a HTML formatted > blast report output along with an image of the blast hits as shown > on Slide 60 in this pdf: http://jason.open-bio.org/ > Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf > > > I am able to get the HTML output using > "Bio::SearchIO::Writer::HTMLResultWriter" and I am able to get the > image using the examples outlined in the Bio::Graphics HOWTO: > http://www.bioperl.org/wiki/HOWTO:Graphics > > > My question: How do I integrate Bio::Graphics with > Bio::SearchIO::Writer::HTMLResultWriter to render the blast hits > image at the correct position in my BioPerl reformatted html file. > > > I also found that someone else has asked something similar to > whatever I am asking & is listed under the "Orphans, Leftovers" > category in the ListSummary:April 26-May 9,2006 document: > http://www.bioperl.org/wiki/ListSummary:April_26-May_9% > 2C2006#Orphans.2C_Leftovers > > > Here is my code so far: > ---------------------------------------------------------------- > #!/usr/bin/perl -w > # usage: $0 > use strict; > use Bio::SearchIO; > use Bio::SearchIO::Writer::HTMLResultWriter; > > > my $infile = shift or die $!; > > > my $searchio = new Bio::SearchIO( -format => 'blast',-file => > $infile); > my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter(); > my $outhtml = new Bio::SearchIO(-writer => $writerhtml, > -file => ">$ > {infile}.html"); > > > $outhtml->write_result($searchio->next_result); > ---------------------------------------------------------------- > > > Thanks in advance, > > > Anand > > > > > --------------------------------- > Be a better friend, newshound, and know-it-all with Yahoo! Mobile. > Try it now. 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From 1zoujing at 163.com Wed Apr 16 22:53:16 2008
From: 1zoujing at 163.com (zoujing)
Date: Wed, 16 Apr 2008 19:53:16 -0700 (PDT)
Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags"
In-Reply-To:
References: <16602770.post@talk.nabble.com> <16603225.post@talk.nabble.com>
Message-ID: <16737795.post@talk.nabble.com>

Thank you very much! I split the file on \t directly.

Zou Jing

Stefan Kirov-2 wrote:
>
> It is not. If you use this file, why would you need a parser for it
> anyway? Just split on \t or read with OpenOffice or equiv.
> Stefan
>
> On Thu, 10 Apr 2008, zoujing wrote:
>
>>
>> Searched the web and found the answer now; quoting it here:
>> The error was thrown by my Bio::ASN1::EntrezGene module because it
>> expects a text file, while you fed it with a binary file. To use
>> gzipped ASN binary file from NCBI, download the NCBI gene2xml
>> (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml),
>> then use this syntax to run my parser on the binary files:
>>
>> my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i Homo_sapiens.ags.gz -c -x -b | "); # Homo_sapiens.ags.gz is the gzipped binary file directly downloaded from NCBI
>>
>> Same syntax should be used when you're using SeqIO (thus
>> SeqIO::entrezgene).
>> Mingyi
>>
>> But there is still one thing: I want to parse "gene_info.gz" from NCBI
>> Gene, and it doesn't work. Does that mean "gene_info.gz" (tab-delimited,
>> one line per GeneID, column header line is the first line in the file)
>> is not the right format for Bio::ASN1::EntrezGene?
>>
>> zoujing wrote:
>>>
>>> I am a green hand in Bioperl.
>>> When I run perl with
>>> "parse_entrez_gene_example.pl Sus_scrofa.ags", it gives this error:
>>> Data Error: none conforming data found on line 1 in Sus_scrofa.ags.
>>>
>>> But Sus_scrofa.ags was downloaded from NCBI in ASN.1 format, so it
>>> should be the same as Homo_sapiens in the example. There should be no
>>> error, as the code is the example from Mingyi.
>>> I wonder why this happens, and whether I should change something about
>>> the file?
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16603225.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16737795.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From 1zoujing at 163.com Wed Apr 16 22:55:47 2008
From: 1zoujing at 163.com (zoujing)
Date: Wed, 16 Apr 2008 19:55:47 -0700 (PDT)
Subject: [Bioperl-l] Bio::ASN1::EntrezGene parse so slowly?
In-Reply-To: <264855a00804112050gf785c2ei66d9c7463597eccd@mail.gmail.com>
References: <16602210.post@talk.nabble.com> <264855a00804112050gf785c2ei66d9c7463597eccd@mail.gmail.com>
Message-ID: <16737804.post@talk.nabble.com>

Thank you very much! Solved the problem now.

Jing

Sean Davis-3 wrote:
>
> gene_info is a tab-delimited text file, if I recall correctly. Have
> you looked at it? If it is, you should be able to parse it in a few
> seconds with just a couple lines of code.
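The tab-splitting approach suggested in this thread can be sketched roughly as below. This is a minimal sketch, not code from any poster: the function name parse_gene_info is hypothetical, the column order in @COLUMNS is an assumption that should be checked against the '#' header line of the actual file, and the sample records are made up so the example runs without downloading anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed order of the first few gene_info columns; verify against the
# header line of your own copy before relying on it.
my @COLUMNS = qw(tax_id GeneID Symbol LocusTag Synonyms);

# Read tab-delimited records from an open filehandle and return a list of
# hash references, optionally keeping only one taxonomy ID.
sub parse_gene_info {
    my ($fh, $want_tax_id) = @_;
    my @genes;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ or $line !~ /\S/;   # skip header and blank lines
        my @fields = split /\t/, $line, -1;       # -1 keeps trailing empty fields
        next if defined $want_tax_id and $fields[0] ne $want_tax_id;
        my %gene;
        @gene{@COLUMNS} = @fields[0 .. $#COLUMNS];
        push @genes, \%gene;
    }
    return @genes;
}

# Tiny in-memory sample standing in for the real file (made-up records).
my $sample = join "\n",
    "#Format: tax_id GeneID Symbol LocusTag Synonyms",
    join("\t", 9823, 1001, 'GENE_A', '-', '-'),
    join("\t", 9606, 1002, 'GENE_B', '-', 'ALIAS1|ALIAS2'),
    '';
open my $fh, '<', \$sample or die "cannot open sample: $!";
my @pig_genes = parse_gene_info($fh, 9823);
close $fh;

printf "%s\t%s\n", $_->{GeneID}, $_->{Symbol} for @pig_genes;
```

For the real file one would presumably open gene_info itself (or a pipe from "gzip -dc gene_info.gz") in place of the in-memory sample and pass that handle to parse_gene_info. Reading line by line keeps memory use low even for a file of several hundred megabytes, as long as the filter discards most records.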
>
> Sean
>
>
> On Thu, Apr 10, 2008 at 1:08 AM, zoujing <1zoujing at 163.com> wrote:
>>
>> I want to parse a file "gene_info" from NCBI. The format of Gene in
>> NCBI is ASN.1, right? So I used Bio::ASN1::EntrezGene. But it didn't
>> work properly, or was too slow. The file is about 500 MB.
>> The code is as follows:
>> use Bio::ASN1::EntrezGene;
>> my $parser = Bio::ASN1::EntrezGene->new('file' => $ARGV[0]);
>> my $i = 0;
>> while(my $result = $parser->next_seq)
>> { last; # something to do there, here use last for test
>> }
>>
>> When it reaches the "while" part, it keeps processing on and on and
>> does not exit, even though I used "last" in the "while" part.
>> So I wonder whether it is too slow, or the module is not fit for this
>> job, or I did something wrong?
>>
>> Thank you!
>> --
>> View this message in context:
>> http://www.nabble.com/Bio%3A%3AASN1%3A%3AEntrezGene-parse-so-slowly--tp16602210p16602210.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
View this message in context: http://www.nabble.com/Bio%3A%3AASN1%3A%3AEntrezGene-parse-so-slowly--tp16602210p16737804.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From sbassi at clubdelarazon.org Sat Apr 26 13:49:20 2008
From: sbassi at clubdelarazon.org (Sebastian Bassi)
Date: Sat, 26 Apr 2008 14:49:20 -0300
Subject: [Bioperl-l] bioperl installation problem
Message-ID: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com>

I tried to install bioperl because I need to install cviewer.
Here (http://www.pastecode.com.ar/f37c1cd60) are both stdout and stderr outputs.
Here is one of the errors I get: set_attribute: not a compat02 graph at /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 10. sleeping for 3 seconds set_attribute: not a compat02 graph at /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 14. But I have GD::Graph, so I don't know what is going on: sbassi at ubuntuMAP:~$ sudo perl -MCPAN -e 'install GD::Graph' CPAN: Storable loaded ok Going to read /home/sbassi/.cpan/Metadata Database was generated on Fri, 25 Apr 2008 09:29:45 GMT GD::Graph is up to date. Any help regarding this: http://www.pastecode.com.ar/f37c1cd60 would be appreciated. Best, SB. -- Sebasti?n Bassi (???????). Diplomado en Ciencia y Tecnolog?a. Curso Biologia molecular para programadores: http://tinyurl.com/2vv8w6 Mostr? tu c?digo: http://www.pastecode.com.ar GPG Fingerprint: 9470 0980 620D ABFC BE63 A4A4 A3DE C97D 8422 D43D From jason at bioperl.org Sat Apr 26 15:23:37 2008 From: jason at bioperl.org (Jason Stajich) Date: Sat, 26 Apr 2008 12:23:37 -0700 Subject: [Bioperl-l] bioperl installation problem In-Reply-To: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com> References: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com> Message-ID: the error refers to the 'Graph' module not 'GD::Graph'; -jason On Apr 26, 2008, at 10:49 AM, Sebastian Bassi wrote: > I tried to install bioperl because I need to install cviewer. > Here (http://www.pastecode.com.ar/f37c1cd60) are both stdout and > sdterr outputs. > > Here is one of the errors I get: > > set_attribute: not a compat02 graph at > /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 10. > sleeping for 3 seconds > set_attribute: not a compat02 graph at > /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 14. 
> > But I have GD::Graph, so I don't know what is going on: > > sbassi at ubuntuMAP:~$ sudo perl -MCPAN -e 'install GD::Graph' > CPAN: Storable loaded ok > Going to read /home/sbassi/.cpan/Metadata > Database was generated on Fri, 25 Apr 2008 09:29:45 GMT > GD::Graph is up to date. > > Any help regarding this: http://www.pastecode.com.ar/f37c1cd60 > would be appreciated. > > Best, > SB. > > -- > Sebasti?n Bassi (???????). Diplomado en Ciencia y > Tecnolog?a. > Curso Biologia molecular para programadores: http://tinyurl.com/2vv8w6 > Mostr? tu c?digo: http://www.pastecode.com.ar > GPG Fingerprint: 9470 0980 620D ABFC BE63 A4A4 A3DE C97D 8422 D43D > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sbassi at clubdelarazon.org Sat Apr 26 17:08:13 2008 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Sat, 26 Apr 2008 18:08:13 -0300 Subject: [Bioperl-l] bioperl installation problem In-Reply-To: References: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com> Message-ID: <9e2f512b0804261408l45ff9f91j94f44065d21cd65f@mail.gmail.com> On Sat, Apr 26, 2008 at 4:23 PM, Jason Stajich wrote: > the error refers to the 'Graph' module not 'GD::Graph'; You are right, but I have it also installed: sbassi at ubuntuMAP:~$ sudo perl -MCPAN -e 'install Graph' Password: CPAN: Storable loaded ok Going to read /home/sbassi/.cpan/Metadata Database was generated on Fri, 25 Apr 2008 09:29:45 GMT Graph is up to date. -- Sebasti?n Bassi (???????). Diplomado en Ciencia y Tecnolog?a. Curso Biologia molecular para programadores: http://tinyurl.com/2vv8w6 Mostr? 
tu c?digo: http://www.pastecode.com.ar GPG Fingerprint: 9470 0980 620D ABFC BE63 A4A4 A3DE C97D 8422 D43D From bix at sendu.me.uk Sat Apr 26 19:30:56 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 27 Apr 2008 00:30:56 +0100 Subject: [Bioperl-l] bioperl installation problem In-Reply-To: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com> References: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com> Message-ID: <4813BB30.6060703@sendu.me.uk> Sebastian Bassi wrote: > I tried to install bioperl because I need to install cviewer. > Here (http://www.pastecode.com.ar/f37c1cd60) are both stdout and sdterr outputs. > > Here is one of the errors I get: > > set_attribute: not a compat02 graph at > /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 10. > sleeping for 3 seconds > set_attribute: not a compat02 graph at > /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 14. You're trying to install a very old version of Bioperl which apparently uses behaviour of the Graph module no longer supported: http://search.cpan.org/~jhi/Graph-0.84/lib/Graph.pod#Backward_compatibility_with_Graph_0.2 Your options are to force install your desired version of Bioperl (if you don't need to use the modules that are causing the errors you get), downgrade your version of Graph to pre-0.2, or install the latest version of Bioperl (1.5.2 or from svn). From dr.hogart at gmail.com Sun Apr 27 10:05:20 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Sun, 27 Apr 2008 18:05:20 +0400 Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics Message-ID: Hi all, is it possible to add a GD::graphic object (chart) to Bio::Graphics panel to obtain a file with image of both the chart and bioseq object? 
From Russell.Smithies at agresearch.co.nz Sun Apr 27 17:27:23 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 28 Apr 2008 09:27:23 +1200
Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
In-Reply-To:
References:
Message-ID:

You can get the GD object back from the Bio::Graphics::Panel, then draw on it using GD methods.

E.g.:

use GD;    # needed for the gdSmallFont constant
use Bio::Graphics;
use Bio::SeqFeature::Generic;

# create a BioPerl panel
my $panel = Bio::Graphics::Panel->new(
    -length  => 600,
    -width   => 800,
    -bgcolor => 'white'
);

# add your features
my $feature = Bio::SeqFeature::Generic->new( -start => 1, -end => 200 );
$panel->add_track($feature,
    -glyph   => 'segments',
    -label   => 0,
    -height  => 30,
    -bgcolor => 'red',
    -fgcolor => 'red'
);

# grab the GD thingy
my $gd = $panel->gd;

# create a color - not sure if there's a better way?
my $black = $gd->colorAllocate(0,0,0);

# draw on your GD thingy
$gd->line(10, 10, $panel->width - 10, 10, $black);
$gd->string(gdSmallFont, 20, 10, 'test', $black);

# print it as normal
print $panel->png;

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of sergei ryazansky
> Sent: Monday, 28 April 2008 2:05 a.m.
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
>
> Hi all,
>
> is it possible to add a GD::graphic object (chart) to Bio::Graphics panel
> to obtain a file with image of both the chart and bioseq object?
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material.
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From dr.hogart at gmail.com Sun Apr 27 20:25:18 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Mon, 28 Apr 2008 04:25:18 +0400 Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics References: Message-ID: Thanks for answer! Yours script works fine, but nevertheless, as for as I understand 'gd' method return the gd::image object. But I need the to merge bioseq object with gd::graph object (gd::graph::area). Is it possible? Or maybe I misunderstood something in your example? On Mon, 28 Apr 2008 01:27:23 +0400, Smithies, Russell wrote: > You can get the GD object back from the Bio::Graphics::Panel then draw > on it using GD methods > > Eg: > > #create a BioPerl panel > my $panel = Bio::Graphics::Panel->new( > -length => 600 > -width => 800, > -bgcolor => 'white' > ); > # add your features > my $feature = Bio::SeqFeature::Generic->new( -start => 1,-end => > 200,); > $panel->add_track($feature, glyph => 'segments', > -label => 0, > -height => 30, > -bgcolor => 'red', > -fgcolor => 'red' > ); > > # grab the GD thingy > my $gd = $panel->gd; > > #create a color - not sure if there's a better way? > $black = $gd->colorAllocate(0,0,0); > > #draw on your GD thingy > $gd->line(10,10,$panel->width -10,10,$black); > $gd->string(gdSmallFont,20,10,'test' ,'$black); > > # print it as normal > print $panel->png; > > > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open- >> bio.org] On Behalf Of sergei ryazansky >> Sent: Monday, 28 April 2008 2:05 a.m. 
>> To: bioperl-l at bioperl.org >> Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics >> >> Hi all, >> >> is it possible to add a GD::graphic object (chart) to Bio::Graphics > panel >> to obtain a file with image of both the chart and bioseq object? >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= From Bank.Beszteri at awi.de Mon Apr 28 08:18:20 2008 From: Bank.Beszteri at awi.de (=?UTF-8?B?QsOhbmsgQmVzenRlcmk=?=) Date: Mon, 28 Apr 2008 14:18:20 +0200 Subject: [Bioperl-l] Indexing large databases / BioSQL In-Reply-To: <47FB204F.90405@awi.de> References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> <47FB204F.90405@awi.de> Message-ID: <4815C08C.1060305@awi.de> Dear BioSQL / bioperl-db-ists, I would like to share my experiences with trying to load uniprot_trembl into a BioSQL db, and also to ask a couple of questions; perhaps some of you know the problems I encountered. I used bioperl-live and bioperl-db-live as of 2008-04-03 and uniprot_trembl.dat as of 2008-04-04. 
The command was like

load_seqdatabase.pl --safe --logchunk 1000 --host dbserv --dbname abc --dbuser efg --dbpass xyz --driver mysql --namespace uniprot_trembl --format embl uniprot_trembl.dat

although I split the dat file into 10 chunks and started them in parallel to make it faster.

This did not go quite as smoothly as Swissprot did. In the end, it seems to have loaded 5022284 entries of the 5443284 which appear to be there in the input file (when counting with grep -c "ID "). Besides the harmless taxonomy warnings which also appear with Swissprot (and have been discussed here a couple of weeks ago and also earlier), there came a couple of more serious errors. Perhaps some of you know them already:

First of all, the below error seems to lead to a crash, in spite of --safe:

>>>
------------- EXCEPTION -------------
MSG: A1XDT7 seems to have an invalid species classification.
STACK Bio::SeqIO::embl::_read_EMBL_Species /home/biocl/bbeszter/lib/bioperl-live/bioperl-live/Bio/SeqIO/embl.pm:1087
STACK Bio::SeqIO::embl::next_seq /home/biocl/bbeszter/lib/bioperl-live/bioperl-live/Bio/SeqIO/embl.pm:320
STACK toplevel /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:634
-------------------------------------

Command exited with non-zero status 255
<<<

What this is about is NCBI Tax_ID:435 (Acetobacter aceti; it has some 30 synonyms in my DB, too), which, to me, looks like a completely normal taxon: I could follow its taxonomy up to the root in my NCBI taxonomy in the BioSQL DB I used. I don't know if someone else has seen / can reproduce the problem, or should I think about some problem with my taxonomy db? Besides, is it the expected behaviour from load_seqdatabase.pl to die upon this error?

###################

The other problems did not lead to a crash, only to a failure to load the sequence, which would be what I'd expect with --safe.
The first type of errors looks like

>>>
Could not store Q49I36:
------------- EXCEPTION -------------
MSG: Unique key query in Bio::DB::BioSQL::SpeciesAdaptor returned 2 rows instead of 1. Query was [name_class="scientific name",binomial="Onchocerca volvulus"]
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:958
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:854
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
STACK Bio::DB::Persistent::PersistentObject::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/Persistent/PersistentObject.pm:244
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/Persistent/PersistentObject.pm:271
STACK (eval) /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:630
STACK toplevel /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:612
-------------------------------------
<<<

In this particular case, "Onchocerca volvulus" does indeed have two taxon_ids in my DB (6282 and 563188, of which only the first one is returned by a web search at NCBI taxonomy); but the same thing happened with a number of other taxa (followed by how many times the above error was caused by the particular taxa):

Wolbachia pipientis             64
Hemerocallis sp.                 1
Hypsiglena torquata              3
Salmonella enterica           1211
Burkholderia sp.                31
Streptococcus sp.                4
Rhizobium sp.                  600
Nostoc sp.                      19
Drosophila sp.                  18
Onchocerca volvulus             62
Atlapetes schistaceus            4
Symbiodinium sp.                 3
Escherichia coli              7421
Hieraaetus fasciatus             4
Borrelia burgdorferi group       1
Pseudomonas sp.                 29
Rotavirus A                   1076
Gorilla gorilla                746
Rana plancyi                    14
unclassified sequences           1

(This should be 11312 cases altogether, but the list might be incomplete because I accidentally removed one of my logs, which contained STDOUT & STDERR for ~ 10 % of the entries.)

Again, is this a known problem for some of you, or could there be a problem with my copy of NCBI taxonomy? I don't remember having updated it after the initial upload, so I'm quite surprised by such duplicate entries....

###################

Type 2 error w/o crash:

>>>
Could not store A5HU09:
------------- EXCEPTION -------------
MSG: create: object (Bio::Species) failed to insert or to be found by unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK Bio::DB::Persistent::PersistentObject::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/Persistent/PersistentObject.pm:244
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/Persistent/PersistentObject.pm:271
STACK (eval) /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:630
STACK toplevel /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:612
<<<

This particular record has the NCBI_TaxID 44271, which looks completely normal in the NCBI taxonomy loaded in my BioSQL DB, but the same problem appeared in 53 further cases (I could not
look into them in detail as yet to see whether they were all the same species). On the other hand, 7 records which were successfully loaded have this taxonomy ID in the DB (44271).

###################

Nr 3 no crash:

>>>
Could not store Q6T859:
Unmatched ( in regex; marked by <-- HERE in m/Camelina microcarpa (Littlepod false flax) ( <-- HERE microcarpa subsp.\s+/ at /home/biocl/bbeszter/lib/bioperl-live/bioperl-live/Bio/Species.pm line 466, line 357048.
<<<

This happens in the sub binomial in Species.pm using the option "FULL", which requests to also return subspecies. I have not looked much deeper into this yet, but is it possible that there is a parsing problem with multi-line species strings? In the above case the OS field in uniprot_trembl.dat looks like

OS Camelina microcarpa (Littlepod false flax) (Camelina microcarpa subsp.
OS sylvestris).

###################

I'm still looking for where the remaining records disappeared: of the 421000 records not showing up in the DB, I could find these:

crasher (Tax_ID=435): 45 entries
problem 1 ("MSG: Unique key query in Bio::DB::BioSQL::SpeciesAdaptor returned 2 rows instead of 1."): 11312 entries
problem 2 ("MSG: create: object (Bio::Species) failed to insert or to be found by unique key"): 54 entries
problem 3 ("Unmatched ( in regex"): 28241 entries

381348 still remain... These could in principle come from the first 10 %, for which I don't have the output, but they don't seem to: after restarting that chunk, I get ~ 30 "Could not store" errors. So the last question: are there any error messages I can expect which don't contain "Could not store" and which I thus missed here?
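[Editor's sketch, not part of the original message.] The "Unmatched ( in regex" error in problem 3 is what Perl reports when a string containing an unbalanced parenthesis is interpolated into a pattern without escaping. The snippet below is a minimal illustration of that mechanism and of the usual \Q...\E remedy. It is not the code from Bio/Species.pm; the variable names, the helper strip_prefix and the way the species string is split are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for a parsed species name and the full OS string.
my $species = 'Camelina microcarpa (Littlepod false flax) (Camelina';
my $full    = 'Camelina microcarpa (Littlepod false flax) '
            . '(Camelina microcarpa subsp. sylvestris)';

# Unescaped interpolation: "(" is read as a group opener, and the
# unbalanced one makes the pattern fail to compile at run time.
my $unsafe_ok = eval { my $m = ($full =~ /^$species/); 1 };
print "unescaped pattern died: $@" unless $unsafe_ok;

# \Q...\E (quotemeta) turns every metacharacter into a literal.
sub strip_prefix {
    my ($string, $prefix) = @_;
    (my $rest = $string) =~ s/^\Q$prefix\E\s*//;
    return $rest;
}

print "remainder: ", strip_prefix($full, $species), "\n";
```

Whether the right fix in the parser is escaping, or handling the multi-line OS field differently, is a separate question that this sketch does not settle.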
Bank Beszteri Bioinformatics Alfred Wegener Institute for Polar and Marine Research Am Handelshafen 12 27570 Bremerhaven From cjfields at uiuc.edu Mon Apr 28 09:20:39 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 28 Apr 2008 08:20:39 -0500 Subject: [Bioperl-l] Indexing large databases / BioSQL In-Reply-To: <4815C08C.1060305@awi.de> References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> <47FB204F.90405@awi.de> <4815C08C.1060305@awi.de> Message-ID: <5C383B1F-92AD-4194-B9B4-007AE51A092F@uiuc.edu> On Apr 28, 2008, at 7:18 AM, B?nk Beszteri wrote: > Dear BioSQL / bioperl-db-ists, > > I would like to share my experiences with trying to load > uniprot_trembl into a BioSQL db, and also to ask a couple of > questions; perhaps some of you know the problems I encountered. I > used bioperl-live and bioperl-db-live as of 2008-04-03 and > uniprot_trembl.dat as of 2008-04-04. The command was like > > load_seqdatabase.pl --safe --logchunk 1000 --host dbserv --dbname > abc --dbuser efg --dbpass xyz --driver mysql --namespace > uniprot_trembl --format embl uniprot_trembl.dat > > .... > > First of all, the below error seems to lead to a crash, in spite of > --safe: > > >>> > ------------- EXCEPTION ------------- > MSG: A1XDT7 seems to have an invalid species classification. > STACK Bio::SeqIO::embl::_read_EMBL_Species /home/biocl/bbeszter/lib/ > bioperl-live/bioperl-live/Bio/SeqIO/embl.pm:108 > 7 > STACK Bio::SeqIO::embl::next_seq /home/biocl/bbeszter/lib/bioperl- > live/bioperl-live/Bio/SeqIO/embl.pm:320 > STACK toplevel /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/ > scripts/biosql/load_seqdatabase.pl:634 > ------------------------------------- > > Command exited with non-zero status 255 > <<< > > What this is about is NCBI Tax_ID:435 (Acetobacter aceti; it has > some 30 synonyms in my DB, too), which, to me, looks like a > completely normal taxon: I could follow its taxonomy up to the root > in my NCBI taxonomy in the BioSQL DB I used. 
I don?t know if someone > else has seen / can reproduce the problem, or should I think about > some problem with my taxonomy db? Besides, is it the expected > behaviour from load_seqdatabase.pl to die upon this error? ... You should use 'swiss' format instead of 'embl' when loading Uniprot/ SwissProt sequences. Though on the surface they're similar the feature table (among other things) is completely different. I'm not sure if that's causing all of the issues here but it certainly could contribute to them. In the meantime, it's much easier for us to track these problems if you file a bug (BioPerl, file for bioperl-db): http://bugzilla.open-bio.org/ chris From cjfields at uiuc.edu Sun Apr 27 17:54:03 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 27 Apr 2008 16:54:03 -0500 Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics In-Reply-To: References: Message-ID: I think this is how some of the synteny mapping is done using SynBrowse (the trapezoids connecting syntenous genes on different tracks). http://www.gmod.org/wiki/index.php/SynView chris On Apr 27, 2008, at 4:27 PM, Smithies, Russell wrote: > You can get the GD object back from the Bio::Graphics::Panel then > draw > on it using GD methods > > Eg: > > #create a BioPerl panel > my $panel = Bio::Graphics::Panel->new( > -length => 600 > -width => 800, > -bgcolor => 'white' > ); > # add your features > my $feature = Bio::SeqFeature::Generic->new( -start => 1,-end => > 200,); > $panel->add_track($feature, glyph => 'segments', > -label => 0, > -height => 30, > -bgcolor => 'red', > -fgcolor => 'red' > ); > > # grab the GD thingy > my $gd = $panel->gd; > > #create a color - not sure if there's a better way? 
> $black = $gd->colorAllocate(0,0,0); > > #draw on your GD thingy > $gd->line(10,10,$panel->width -10,10,$black); > $gd->string(gdSmallFont,20,10,'test' ,'$black); > > # print it as normal > print $panel->png; > > > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open- >> bio.org] On Behalf Of sergei ryazansky >> Sent: Monday, 28 April 2008 2:05 a.m. >> To: bioperl-l at bioperl.org >> Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics >> >> Hi all, >> >> is it possible to add a GD::graphic object (chart) to Bio::Graphics > panel >> to obtain a file with image of both the chart and bioseq object? >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > = > ====================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. Any review, retransmission, dissemination or other use of, > or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by > AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > = > ====================================================================== > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From Bank.Beszteri at awi.de Mon Apr 28 09:51:53 2008
From: Bank.Beszteri at awi.de (Bánk Beszteri)
Date: Mon, 28 Apr 2008 15:51:53 +0200
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <5C383B1F-92AD-4194-B9B4-007AE51A092F@uiuc.edu>
References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> <47FB204F.90405@awi.de> <4815C08C.1060305@awi.de> <5C383B1F-92AD-4194-B9B4-007AE51A092F@uiuc.edu>
Message-ID: <4815D679.3070307@awi.de>

Chris Fields wrote:
>
> ...
>
> You should use 'swiss' format instead of 'embl' when loading
> Uniprot/SwissProt sequences. Though on the surface they're similar
> the feature table (among other things) is completely different. I'm
> not sure if that's causing all of the issues here but it certainly
> could contribute to them.
>
> In the meantime, it's much easier for us to track these problems if
> you file a bug (BioPerl, file for bioperl-db):
>
> http://bugzilla.open-bio.org/
>

Hi Chris,

I will do so; in the meantime: I'm not loading Swissprot, but TrEMBL. Is swiss also the appropriate format here? By reading http://expasy.org/sprot/userman.html#diffEMBL, I concluded that embl should be the one I'd need for TrEMBL.

Bank

From cjfields at uiuc.edu Mon Apr 28 12:24:39 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Apr 2008 11:24:39 -0500
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <4815D679.3070307@awi.de>
References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> <47FB204F.90405@awi.de> <4815C08C.1060305@awi.de> <5C383B1F-92AD-4194-B9B4-007AE51A092F@uiuc.edu> <4815D679.3070307@awi.de>
Message-ID:

On Apr 28, 2008, at 8:51 AM, Bánk Beszteri wrote:
> Chris Fields wrote:
>>
>> ...
>>
>> You should use 'swiss' format instead of 'embl' when loading
>> Uniprot/SwissProt sequences.
Though on the surface they're similar >> the feature table (among other things) is completely different. >> I'm not sure if that's causing all of the issues here but it >> certainly could contribute to them. >> >> In the meantime, it's much easier for us to track these problems if >> you file a bug (BioPerl, file for bioperl-db): >> >> http://bugzilla.open-bio.org/ >> > Hi Chris, > > I will do so; in the meanwhile: I?m not loading Swissprot, but > TrEMBL. Is swiss also the appropriate format here? By reading http://expasy.org/sprot/userman.html#diffEMBL > , I concluded that embl should be the one I?d need for TrEMBL. > > Bank The section you link to describes several important differences between EMBL and SwissProt/UniProt format (i.e. how each indicated line type differs between SwissProt and EMBL formats, including ID, AC, OS/OC, FT, etc). I'm unsure how you derived that 'embl' would work from that, e.g. they are close, but there are enough significant differences that using 'embl' for SwissProt (or vice versa) will not work as intended, if at all. chris From hlapp at gmx.net Mon Apr 28 15:46:07 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 28 Apr 2008 15:46:07 -0400 Subject: [Bioperl-l] Indexing large databases / BioSQL In-Reply-To: <4815D679.3070307@awi.de> References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> <47FB204F.90405@awi.de> <4815C08C.1060305@awi.de> <5C383B1F-92AD-4194-B9B4-007AE51A092F@uiuc.edu> <4815D679.3070307@awi.de> Message-ID: <3BD6A261-D023-4A5F-9CBC-C3216B0145F0@gmx.net> On Apr 28, 2008, at 9:51 AM, B?nk Beszteri wrote: > I?m not loading Swissprot, but TrEMBL. Is swiss also the > appropriate format here? Yes, though I guess it can be confusing. Maybe we should create a symlink uniprot.pm to swiss.pm, or in fact fork them if UniProt starts accumulating enough differences from the traditional Swissprot format. BTW as you had noticed, the --safe switch only protects the script from crashing due to a db loading error. 
A parsing error will still cause a crash. I guess you can argue that that's not nice, and having a chance to skip over the record that offends the (BioPerl) parser would be useful. The problem is that if the parser errors out, it's not guaranteed where we are in the file and whether the parser module is in a state that it can recover itself from. For the database it's a bit easier as one just needs to rollback() the transaction (each sequence is its own transaction).

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From Russell.Smithies at agresearch.co.nz Mon Apr 28 17:15:16 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 29 Apr 2008 09:15:16 +1200
Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
In-Reply-To:
References:
Message-ID:

I thought it was a bit of a hack but I guess if someone else is doing it too, it can't be all bad :-)

It looks like you can combine your drawing methods like this:
(I'm sure Lincoln will tell us this is bad but it seems to work ok)
-------------------------------------------------------------------------------------

#!perl -w
use GD::Graph::lines;
use GD::Graph::colour;
use GD::Graph::Data;

use Bio::Graphics;
use Bio::SeqFeature::Generic;

# create and draw on a graphics panel
my $panel = Bio::Graphics::Panel->new(
    -length => 500,
    -width  => 500
);
my $track = $panel->add_track(
    -glyph => 'generic',
    -label => 1
);

# create and add a few features
for($i = 100; $i < 500; $i+= 100){
    my $feature = Bio::SeqFeature::Generic->new(
        -display_name => "feature: $i",
        -score        => $i,
        -start        => $i,
        -end          => $i + 100
    );
    $track->add_feature($feature);
}

# create and draw the graph
my @data = (
    ["1st","2nd","3rd","4th","5th","6th","7th", "8th", "9th"],
    [ 1, 2, 5, 6, 3, 1.5, 1, 3, 4],
    [ sort { $a <=> $b } (1, 2, 5, 6, 3, 1.5, 1, 3, 4) ]
);
my $graph = GD::Graph::lines->new(500, 300);

$graph->set(
    x_label       => 'X Label',
    y_label       => 'Y label',
    title         => 'Some simple graph',
    y_max_value   => 8,
    y_tick_number => 8,
    y_label_skip  => 2
) or die $graph->error;

$graph->set( dclrs => [ qw( green blue black red pink) ] );

my $gd = $graph->plot(\@data) or die $graph->error;

# combine the two images
my $combined = $panel->gd($gd);

open(IMG, '>file.png') or die $!;
binmode IMG;
print IMG $combined->png;

------------------------------------------------------------------------------------------

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Monday, 28 April 2008 9:54 a.m.
> To: Smithies, Russell
> Cc: sergei ryazansky; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
>
> I think this is how some of the synteny mapping is done using
> SynBrowse (the trapezoids connecting syntenous genes on different
> tracks).
>
> http://www.gmod.org/wiki/index.php/SynView
>
> chris
>
> On Apr 27, 2008, at 4:27 PM, Smithies, Russell wrote:
>
> > You can get the GD object back from the Bio::Graphics::Panel then
> > draw
> > on it using GD methods
> >
> > Eg:
> >
> > #create a BioPerl panel
> > my $panel = Bio::Graphics::Panel->new(
> > -length => 600
> > -width => 800,
> > -bgcolor => 'white'
> > );
> > # add your features
> > my $feature = Bio::SeqFeature::Generic->new( -start => 1,-end =>
> > 200,);
> > $panel->add_track($feature, glyph => 'segments',
> > -label => 0,
> > -height => 30,
> > -bgcolor => 'red',
> > -fgcolor => 'red'
> > );
> >
> > # grab the GD thingy
> > my $gd = $panel->gd;
> >
> > #create a color - not sure if there's a better way?
> > $black = $gd->colorAllocate(0,0,0); > > > > #draw on your GD thingy > > $gd->line(10,10,$panel->width -10,10,$black); > > $gd->string(gdSmallFont,20,10,'test' ,'$black); > > > > # print it as normal > > print $panel->png; > > > > > > > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org > > [mailto:bioperl-l-bounces at lists.open- > >> bio.org] On Behalf Of sergei ryazansky > >> Sent: Monday, 28 April 2008 2:05 a.m. > >> To: bioperl-l at bioperl.org > >> Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics > >> > >> Hi all, > >> > >> is it possible to add a GD::graphic object (chart) to Bio::Graphics > > panel > >> to obtain a file with image of both the chart and bioseq object? > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > = > > > ============================================================= > ========= > > Attention: The information contained in this message and/or > > attachments > > from AgResearch Limited is intended only for the persons or entities > > to which it is addressed and may contain confidential and/or > > privileged > > material. Any review, retransmission, dissemination or other use of, > > or > > taking of any action in reliance upon, this information by persons or > > entities other than the intended recipients is prohibited by > > AgResearch > > Limited. If you have received this message in error, please notify the > > sender immediately. > > = > > > ============================================================= > ========= > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From lincoln.stein at gmail.com Mon Apr 28 17:33:19 2008 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Mon, 28 Apr 2008 17:33:19 -0400 Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics In-Reply-To: References: Message-ID: <6dce9a0b0804281433i697cda2fo2c47ce59010d0858@mail.gmail.com> Hi, No, I'm perfectly happy with combining images like this. It is part of what I intended. Another idea would be to use the Image glyph to embed graphs at particular genomic locations in the panel. Right now the glyph is designed in the expectation that the image passed to it is sitting on the file system (or a web URL), but it would be easy to modify it so that a callback can generate the GD on the fly, by using, for example GD::Graph. 
Lincoln

On Mon, Apr 28, 2008 at 5:15 PM, Smithies, Russell <Russell.Smithies at agresearch.co.nz> wrote:
> I thought it was a bit of a hack but I guess if someone else is doing it
> too, it can't be all bad :-)
>
> It looks like you can combine your drawing methods like this:
> (I'm sure Lincoln will tell us this is bad but it seems to work ok)

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From dr.hogart at gmail.com Tue Apr 29 03:56:24 2008
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Tue, 29 Apr 2008 11:56:24 +0400
Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
References:
Message-ID:

Thank you very much! It is exactly what I was looking for.

From d.gatherer at mrcvu.gla.ac.uk Tue Apr 29 08:21:05 2008
From: d.gatherer at mrcvu.gla.ac.uk (Derek Gatherer)
Date: Tue, 29 Apr 2008 13:21:05 +0100
Subject: [Bioperl-l] translate() oddities
Message-ID:

Hi

I thought I'd better run this by the community before I embarrass myself on
Bugzilla. It seems like a clear bug to me. I'm running Bioperl 1.5.0 on
RedHat.

For a test input:

>test
ATGATGATGATGATGTGA

the following code is fine.

while((my $seqobj = $seq_in->next_seq()))
{
    print "\n".$seqobj->display_id;
    my $len = $seqobj->length();
    print " length: $len";
    my $frame1_obj = $seqobj->translate();
    my $f1_prot = $frame1_obj->seq();
    print "\n$f1_prot";
}

Output:

test length: 18
MMMMM*

But if I want to change the frame as specified in the BioPerl tutorial, by
using:

my $frame1_obj = $seqobj->translate(frame => 1); # which should now give frame 2

I get:

test length: 18
MMMMM-frame

The frame is unchanged and the text "-frame" is tacked on the end of the
output. The same occurs with translate(frame => 2).

Any ideas? Can something as fundamental as translate() really be bugged?
or am I guilty of some particularly heinous syntax error?

Cheers
Derek

From tristan.lefebure at gmail.com Tue Apr 29 09:58:21 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Tue, 29 Apr 2008 09:58:21 -0400
Subject: [Bioperl-l] translate() oddities
In-Reply-To:
References:
Message-ID: <200804290958.21548.tristan.lefebure@gmail.com>

Aren't you forgetting the dash?

my $frame1_obj = $seqobj->translate(-frame => 1)

On Tuesday 29 April 2008 08:21:05 Derek Gatherer wrote:
> my $frame1_obj = $seqobj->translate(frame => 1)

-Tristan

From d.gatherer at mrcvu.gla.ac.uk Tue Apr 29 10:05:03 2008
From: d.gatherer at mrcvu.gla.ac.uk (Derek Gatherer)
Date: Tue, 29 Apr 2008 15:05:03 +0100
Subject: [Bioperl-l] translate() oddities
In-Reply-To: <481726BF.1060609@bms.com>
References: <481726BF.1060609@bms.com>
Message-ID:

Thanks Stefan

Actually, there was a typo in my message; I did use -frame => 1. However,
the problem disappears on upgrading from 1.5.0 to 1.5.2.

So not a bug anymore.

Cheers
Derek

At 14:46 29/04/2008, Stefan Kirov wrote:
>my $frame1_obj = $seqobj->translate(-frame => 1);
>not
>my $frame1_obj = $seqobj->translate(frame => 1);
>Stefan

From l.douchy at gmail.com Tue Apr 29 10:16:40 2008
From: l.douchy at gmail.com (Laurent DOUCHY)
Date: Tue, 29 Apr 2008 16:16:40 +0200
Subject: [Bioperl-l] translate() oddities
In-Reply-To: <200804290958.21548.tristan.lefebure@gmail.com>
References: <200804290958.21548.tristan.lefebure@gmail.com>
Message-ID: <2fb209dd0804290716x36e403dek55978dc4f54e34ff@mail.gmail.com>

Hello,

I resolved this issue in Bio::SeqIO with the following line:

my $sequence = $seq->translate('*', 'X', '0', '1', '0', '0', '0', '0')->seq;

The third parameter sets the frame.

I hope this is helpful.

laurent.

From roy.chaudhuri at gmail.com Tue Apr 29 10:27:10 2008
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 29 Apr 2008 15:27:10 +0100
Subject: [Bioperl-l] translate() oddities
In-Reply-To:
References: <481726BF.1060609@bms.com>
Message-ID: <4817303E.1040903@gmail.com>

Spent two minutes looking at this, so may as well chip in with what I
discovered even though you solved your problem.

This "bug" comes about because in version 1.5.1 and earlier, the arguments
to translate were a simple list, with the first argument the terminator
(defaults to "*"). Your old version therefore assumed that you wanted to
translate the stop codon to "-frame". Amusingly, given your typo, if you miss
the hyphen off the frame argument in version 1.5.2 it reverts to the old
interface and you end up with the output "MMMMMframe". The moral of the story
is of course to read the docs relevant to the version you are using.

Roy.
--
Dr. Roy Chaudhuri
Department of Veterinary Medicine
University of Cambridge, U.K.

From stefan.kirov at bms.com Tue Apr 29 09:46:39 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 29 Apr 2008 09:46:39 -0400
Subject: [Bioperl-l] translate() oddities
In-Reply-To:
References:
Message-ID: <481726BF.1060609@bms.com>

my $frame1_obj = $seqobj->translate(-frame => 1);
not
my $frame1_obj = $seqobj->translate(frame => 1);

Stefan

From cjfields at uiuc.edu Tue Apr 29 11:00:00 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 29 Apr 2008 10:00:00 -0500
Subject: [Bioperl-l] translate() oddities
In-Reply-To: <4817303E.1040903@gmail.com>
References: <481726BF.1060609@bms.com> <4817303E.1040903@gmail.com>
Message-ID: <36045A08-AEA8-4639-A384-1DC53B5DC129@uiuc.edu>

Yes, the interface changed somewhat post 1.5.1, mainly to accept named
parameters. I think a few methods do this now, as passing in lists of more
than 2 args, undef'ing those one doesn't want set, gets confusing.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Tue Apr 29 11:07:30 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 29 Apr 2008 10:07:30 -0500
Subject: [Bioperl-l] translate() oddities
In-Reply-To: <481726BF.1060609@bms.com>
References: <481726BF.1060609@bms.com>
Message-ID: <18DB95FB-52B9-4091-ACEE-996891F8A5AE@uiuc.edu>

As an aside, I've been playing around with perl6 (Rakudo) for a bit now.
Parameter-like passing (using autoaccessors and other means) will be added
in soon, so you will be able to do this:

my $seqobj = Seq.new(seq => 'ATGATGATGATGATGTGA', alphabet => 'dna');
my $protobj = $seqobj.translate(frame => 1);

Yes, I'm a geek. ;>

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From dr.hogart at gmail.com Tue Apr 29 11:57:51 2008
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Tue, 29 Apr 2008 19:57:51 +0400
Subject: [Bioperl-l] alignment by TCoffee as a subroutine
Message-ID:

Hi all!

I am trying to run a TCoffee alignment through the
Bio::Tools::Run::Alignment::TCoffee wrapper, as a subroutine in a script.
The subroutine itself works fine, but it is not the only subroutine: there
are a lot of others in the script. The problem is that when the tcoffee
subroutine finishes (NB: successfully), execution of the rest of the script
is interrupted as well. It seems that the tcoffee program itself interrupts
the Perl script.
Is it possible to get around this problem?

--

From darin.london at duke.edu Tue Apr 29 12:49:53 2008
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Tue, 29 Apr 2008 12:49:53 -0400
Subject: [Bioperl-l] BOSC 2008 Announcement and Call For Submissions
Message-ID: <200804291650.m3TGnr0H020814@tenero.duhs.duke.edu>

BOSC 2008 Call for Abstracts Reminder

The 9th annual Bioinformatics Open Source Conference (BOSC 2008) will take
place in Toronto, Ontario, Canada, as one of several Special Interest Group
(SIG) meetings occurring in conjunction with the 16th annual Intelligent
Systems for Molecular Biology Conference (ISMB 2008).

This is a reminder to submit your proposals for talks to the BOSC submission
system before May 11.

Submission Process:

All abstracts must be submitted through our Open Conference Systems site
(http://events.open-bio.org/BOSC2008/openconf.php). The form will ask for a
small Abstract Text to be pasted into it, and a full paper. The small
Abstract Text should be a summary, while the longer abstract should provide
more details, including the open-source license requirement details.
Full-length abstracts are limited to one page with one inch (2.5 cm) margins
on the top, sides, and bottom. The full-length abstract should include the
title, authors, and affiliations. We prefer your abstract to be in PDF
format, although plain t

Important Dates:

May 11: Abstract submission deadline.
June 2: Notification of accepted talks.
June 4: Early registration discount cut-off.
July 18-19: BOSC 2008!

We hope to see you at BOSC 2008!

Kam Dahlquist and Darin London
BOSC 2008 Co-organizers

From bix at sendu.me.uk Tue Apr 29 12:54:41 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 29 Apr 2008 17:54:41 +0100
Subject: [Bioperl-l] alignment by TCoffee as a subroutine
In-Reply-To:
References:
Message-ID: <481752D1.7010904@sendu.me.uk>

sergei ryazansky wrote:
> I am trying to run a TCoffee alignment through the
> Bio::Tools::Run::Alignment::TCoffee wrapper, as a subroutine in a script.
> The subroutine itself works fine, but it is not the only subroutine: there
> are a lot of others in the script. The problem is that when the tcoffee
> subroutine finishes (NB: successfully), execution of the rest of the script
> is interrupted as well. It seems that the tcoffee program itself interrupts
> the Perl script. Is it possible to get around this problem?

You'll have to supply us with a minimal version of the script and the
complete error message.

From dr.hogart at gmail.com Wed Apr 30 07:24:35 2008
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Wed, 30 Apr 2008 15:24:35 +0400
Subject: [Bioperl-l] alignment by TCoffee as a subroutine
References:
Message-ID:

On Tue, 29 Apr 2008 19:57:51 +0400, sergei ryazansky wrote:
> Hi all!
>
> I am trying to run a TCoffee alignment through the
> Bio::Tools::Run::Alignment::TCoffee wrapper, as a subroutine in a script.
> The subroutine itself works fine, but it is not the only subroutine: there
> are a lot of others in the script. The problem is that when the tcoffee
> subroutine finishes (NB: successfully), execution of the rest of the script
> is interrupted as well. It seems that the tcoffee program itself interrupts
> the Perl script. Is it possible to get around this problem?
>

My subroutine is as follows:

sub align {
    my $file=shift @_;
    my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => 'fasta', 'outfile' => 'temp_align.out');
    my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params);
    my $aln=$factory->align ($file);
    open (fy,'temp_align.out'); my @temp_file=<fy>; close fy;
    return @temp_file;
}

This subroutine is called by the following command:

my @align_fa = align($inputfile_align);

After successful execution of this subroutine (accompanied by the corresponding messages in the terminal window), execution of the rest of the script is terminated without any error messages.

--

From bix at sendu.me.uk Wed Apr 30 08:47:17 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 30 Apr 2008 13:47:17 +0100
Subject: [Bioperl-l] alignment by TCoffee as a subroutine
In-Reply-To:
References:
Message-ID: <48186A55.4030406@sendu.me.uk>

sergei ryazansky wrote:
> My subroutine is as follows:
>
> sub align {
>     my $file=shift @_;
>     my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' =>
> 'fasta', 'outfile' => 'temp_align.out');
>     my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params);
>     my $aln=$factory->align ($file);
>     open (fy,'temp_align.out'); my @temp_file=<fy>; close fy;
>     return @temp_file;
> }
>
> This subroutine is called by the following command:
>
> my @align_fa = align($inputfile_align);
>
> After successful execution of this subroutine (accompanied by the
> corresponding messages in the terminal window), execution of the
> rest of the script is terminated without any error messages.

The problem lies somewhere within the rest of your script, so we have to see it if you want help.

Why are you using Bio::Tools::Run::Alignment::TCoffee at all if you don't make use of the resulting alignment object? A system call might make more sense given what you're doing. The beauty of Bio::Tools::Run::Alignment::TCoffee is that you don't have to parse the result file (temp_align.out) yourself.
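Sendu's point about using the alignment object instead of re-reading temp_align.out might look roughly like the following. This is only a minimal sketch, not code from the thread: aligned_rows() is a hypothetical helper, and it assumes the object returned by align() behaves like a Bio::SimpleAlign, whose each_seq() yields sequence objects that answer display_id() and seq().

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): turn an alignment object into
# [id, gapped sequence] pairs without touching the output file.
# Assumption: $aln behaves like a Bio::SimpleAlign, i.e. each_seq() returns
# sequence objects that answer display_id() and seq().
sub aligned_rows {
    my ($aln) = @_;
    return map { [ $_->display_id, $_->seq ] } $aln->each_seq;
}

# Intended use with the factory from the thread (needs bioperl-run and a
# working t_coffee binary, so it is left commented out here):
#
#   my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params);
#   my $aln     = $factory->align($file);
#   print join("\t", @$_), "\n" for aligned_rows($aln);
```

Working from the returned object means the subroutine no longer depends on the 'outfile' parameter or on a fixed file name in the current directory.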
From dr.hogart at gmail.com Wed Apr 30 09:36:58 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Wed, 30 Apr 2008 17:36:58 +0400 Subject: [Bioperl-l] alignment by TCoffee as a subroutine References: <48186A55.4030406@sendu.me.uk> Message-ID: On Wed, 30 Apr 2008 16:47:17 +0400, Sendu Bala wrote: > sergei ryazansky wrote: >> My subroutine is following: >> sub align { >> my $file=shift @_; >> my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => >> 'fasta', 'outfile' => 'temp_align.out'); >> my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); >> my $aln=$factory->align ($file); >> open (fy,'temp_align.out'); my @temp_file=; close fy; >> return @temp_file; >> } >> This subroutine is called by the following command: >> my @align_fa = align($inputfile_align); >> After successful execution of this subroutine (accompaning with the >> corresponding messages on the terminal window) the execution of >> remainder script is terminated without any error messages. > > The problem lies somewhere within the rest of your script, so we have to > see it if you want help. > > Why are you using Bio::Tools::Run::Alignment::TCoffee at all if you > don't make use of the resulting alignment object? A system call might > make more sense given what you're doing. The beauty of > Bio::Tools::Run::Alignment::TCoffee is that you don't have to parse the > result file (temp_align.out) yourself. The rest of script,imho, is ok, because without this sub it is work fine. May be problem lies into the TCoffee itself? One of the feature of script is to estimate the quantity of nt changes in each position in the different similar sequences in comparing with consensus sequences. To perform this it is nesseccary to obtain the multiply alignment: the result of TCoffee alignment goes to another subroutine, that estemated the level of changes. Of course, I dont think that this way is the best approach, most probably there are a lot of the better ways to do it. 
But for my today purposes it is ok. -- From avilella at gmail.com Wed Apr 30 10:16:56 2008 From: avilella at gmail.com (Albert Vilella) Date: Wed, 30 Apr 2008 15:16:56 +0100 Subject: [Bioperl-l] alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> Message-ID: <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> Hi Sergei, Can you try to isolate this call with a simpler example to see if it still fails? When you say that the problems are in the compilation, do you mean that the interpreter won't even compile or that it fails during execution? Have you checked that you have all the dependencies right? Cheers, Albert. On Wed, Apr 30, 2008 at 2:36 PM, sergei ryazansky wrote: > On Wed, 30 Apr 2008 16:47:17 +0400, Sendu Bala wrote: > > sergei ryazansky wrote: > > > > > My subroutine is following: > > > sub align { > > > my $file=shift @_; > > > my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => > > > 'fasta', 'outfile' => 'temp_align.out'); > > > my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); > > > my $aln=$factory->align ($file); > > > open (fy,'temp_align.out'); my @temp_file=; close fy; > > > return @temp_file; > > > } > > > This subroutine is called by the following command: > > > my @align_fa = align($inputfile_align); > > > After successful execution of this subroutine (accompaning with the > > > corresponding messages on the terminal window) the execution of remainder > > > script is terminated without any error messages. > > > > > > > The problem lies somewhere within the rest of your script, so we have to > > see it if you want help. > > > > Why are you using Bio::Tools::Run::Alignment::TCoffee at all if you > > don't make use of the resulting alignment object? A system call might make > > more sense given what you're doing. The beauty of > > Bio::Tools::Run::Alignment::TCoffee is that you don't have to parse the > > result file (temp_align.out) yourself. 
> > > > The rest of script,imho, is ok, because without this sub it is work fine. > May be problem lies into the TCoffee itself? > > One of the feature of script is to estimate the quantity of nt changes in > each position in the different similar sequences in comparing with consensus > sequences. To perform this it is nesseccary to obtain the multiply > alignment: the result of TCoffee alignment goes to another subroutine, that > estemated the level of changes. Of course, I dont think that this way is the > best approach, most probably there are a lot of the better ways to do it. > But for my today purposes it is ok. > > -- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bix at sendu.me.uk Wed Apr 30 10:22:01 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 30 Apr 2008 15:22:01 +0100 Subject: [Bioperl-l] alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> Message-ID: <48188089.8000300@sendu.me.uk> sergei ryazansky wrote: > On Wed, 30 Apr 2008 16:47:17 +0400, Sendu Bala wrote: > >> sergei ryazansky wrote: >>> My subroutine is following: >>> sub align { >>> my $file=shift @_; >>> my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => >>> 'fasta', 'outfile' => 'temp_align.out'); >>> my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); >>> my $aln=$factory->align ($file); >>> open (fy,'temp_align.out'); my @temp_file=; close fy; >>> return @temp_file; >>> } >>> This subroutine is called by the following command: >>> my @align_fa = align($inputfile_align); >>> After successful execution of this subroutine (accompaning with the >>> corresponding messages on the terminal window) the execution of >>> remainder script is terminated without any error messages. >> >> The problem lies somewhere within the rest of your script, so we have >> to see it if you want help. 
> > The rest of script,imho, is ok, because without this sub it is work > fine. May be problem lies into the TCoffee itself? I've run your subroutine in a simple script of my own and it doesn't cause script termination. Again, the problem lies elsewhere in your script. Supply it or it is impossible for anyone to help you. From Sebastien.Moretti at unil.ch Wed Apr 30 10:06:28 2008 From: Sebastien.Moretti at unil.ch (Sebastien MORETTI) Date: Wed, 30 Apr 2008 16:06:28 +0200 Subject: [Bioperl-l] alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> Message-ID: <48187CE4.8030606@unil.ch> >>> My subroutine is following: >>> sub align { >>> my $file=shift @_; >>> my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => >>> 'fasta', 'outfile' => 'temp_align.out'); >>> my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); >>> my $aln=$factory->align ($file); >>> open (fy,'temp_align.out'); my @temp_file=; close fy; >>> return @temp_file; >>> } >>> This subroutine is called by the following command: >>> my @align_fa = align($inputfile_align); >>> After successful execution of this subroutine (accompaning with the >>> corresponding messages on the terminal window) the execution of >>> remainder script is terminated without any error messages. >> >> The problem lies somewhere within the rest of your script, so we have >> to see it if you want help. >> >> Why are you using Bio::Tools::Run::Alignment::TCoffee at all if you >> don't make use of the resulting alignment object? A system call might >> make more sense given what you're doing. The beauty of >> Bio::Tools::Run::Alignment::TCoffee is that you don't have to parse >> the result file (temp_align.out) yourself. > > The rest of script,imho, is ok, because without this sub it is work > fine. May be problem lies into the TCoffee itself? 
> > One of the feature of script is to estimate the quantity of nt changes > in each position in the different similar sequences in comparing with > consensus sequences. To perform this it is nesseccary to obtain the > multiply alignment: the result of TCoffee alignment goes to another > subroutine, that estemated the level of changes. Of course, I dont think > that this way is the best approach, most probably there are a lot of the > better ways to do it. But for my today purposes it is ok. Do you have tried to use the tcoffee command, called via bioperl, as a command line ? To check if it is a problem with tcoffee or with the tcoffee release that bioperl must use. -- S?bastien Moretti From dr.hogart at gmail.com Wed Apr 30 10:54:59 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Wed, 30 Apr 2008 18:54:59 +0400 Subject: [Bioperl-l] alignment by TCoffee as a subroutine References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> Message-ID: Hi Albert, The isolated call is executed without any problem, so the code is absolutely correct. The problem arise when this sub executed within the whole script - after successful execution of TCoffee alignment the execution of the rest of script is terminated. The whole code is very big (~500 lines), so for simplicity lets imagine the sheme of script in the following view: sub1; sub2; sub3; sub align; # TCoffe alignment; sub4; sub5; Each sub (subroutine) is independent from the others subs; The order of script execution is 1,2,3,align,4,5. But after the execution of align the execution of the rest of subs (4 and 5) is terminated. The script without sub align {} successfully execute the sub 4 and sub 5. So, I mean that interpreter won't compile sub 4 and 5 if sub align is placed before them. On Wed, 30 Apr 2008 18:16:56 +0400, Albert Vilella wrote: > Hi Sergei, > > Can you try to isolate this call with a simpler example to see if it > still > fails? 
When you say that the problems are in the compilation, do you mean > that the interpreter won't even compile or that it fails during > execution? > Have you checked that you have all the dependencies right? > > Cheers, > > Albert. > > On Wed, Apr 30, 2008 at 2:36 PM, sergei ryazansky > wrote: > >> On Wed, 30 Apr 2008 16:47:17 +0400, Sendu Bala wrote: >> >> sergei ryazansky wrote: >> > >> > > My subroutine is following: >> > > sub align { >> > > my $file=shift @_; >> > > my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => >> > > 'fasta', 'outfile' => 'temp_align.out'); >> > > my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); >> > > my $aln=$factory->align ($file); >> > > open (fy,'temp_align.out'); my @temp_file=; close fy; >> > > return @temp_file; >> > > } >> > > This subroutine is called by the following command: >> > > my @align_fa = align($inputfile_align); >> > > After successful execution of this subroutine (accompaning with the >> > > corresponding messages on the terminal window) the execution of >> remainder >> > > script is terminated without any error messages. >> > > >> > >> > The problem lies somewhere within the rest of your script, so we have >> to >> > see it if you want help. >> > >> > Why are you using Bio::Tools::Run::Alignment::TCoffee at all if you >> > don't make use of the resulting alignment object? A system call might >> make >> > more sense given what you're doing. The beauty of >> > Bio::Tools::Run::Alignment::TCoffee is that you don't have to parse >> the >> > result file (temp_align.out) yourself. >> > >> >> The rest of script,imho, is ok, because without this sub it is work >> fine. >> May be problem lies into the TCoffee itself? >> >> One of the feature of script is to estimate the quantity of nt changes >> in >> each position in the different similar sequences in comparing with >> consensus >> sequences. 
To perform this it is nesseccary to obtain the multiply >> alignment: the result of TCoffee alignment goes to another subroutine, >> that >> estemated the level of changes. Of course, I dont think that this way >> is the >> best approach, most probably there are a lot of the better ways to do >> it. >> But for my today purposes it is ok. >> >> -- >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> -- ?????????? M2, ????????????? ???????? ?????????? Opera: http://www.opera.com/mail/mail/ From dr.hogart at gmail.com Wed Apr 30 11:14:09 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Wed, 30 Apr 2008 19:14:09 +0400 Subject: [Bioperl-l] alignment by TCoffee as a subroutine References: <48186A55.4030406@sendu.me.uk> <48187CE4.8030606@unil.ch> Message-ID: No, I didn tried. To tell the truth the problem like this I have obtatin earlier. I simply wanted to aling the several set of sequences by TCoffee Bioperl package. The script should have been consequently add the set one after another to TCoffee wrapper. But after the alignment of the first set of sequences the alignment of the rest sets was terminated. So it was neccessary to use another "super_script" that called first script with different arguments linked to the corresponding set. > Do you have tried to use the tcoffee command, called via bioperl, as a > command line ? -- From bix at sendu.me.uk Wed Apr 30 11:28:50 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 30 Apr 2008 16:28:50 +0100 Subject: [Bioperl-l] alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> Message-ID: <48189032.20102@sendu.me.uk> sergei ryazansky wrote: > Hi Albert, > > The isolated call is executed without any problem, so the code is > absolutely correct. 
The problem arise when this sub executed within the > whole script - after successful execution of TCoffee alignment the > execution of the rest of script is terminated. The whole code is very > big (~500 lines), so for simplicity lets imagine the sheme of script in > the following view: > sub1; > sub2; > sub3; > sub align; # TCoffe alignment; > sub4; > sub5; > > Each sub (subroutine) is independent from the others subs; The order of > script execution is 1,2,3,align,4,5. But after the execution of align > the execution of the rest of subs (4 and 5) is terminated. The script > without sub align {} successfully execute the sub 4 and sub 5. So, I > mean that interpreter won't compile sub 4 and 5 if sub align is placed > before them. This has nothing to do with interpreter compilation, which is successful if the script runs at all. What do you do with the output of &align? The thing you are doing with that output is most likely the cause of your script terminating, which is why &sub4 and &sub5 run when you don't run &align (have no output that causes the problem). If you're not willing to show us your script, here are some simple debugging steps you can do yourself: # don't do anything with the output of align() - does &sub4 still run? 
# add some print statements after you call align(), and then after every further block of code in your script to see exactly where the script terminates
# reduce your script down to a minimal script that shows the problem (with the help of the previous step) and show us that

From dr.hogart at gmail.com Wed Apr 30 11:42:41 2008
From: dr.hogart at gmail.com (Sergei Ryazansky)
Date: Wed, 30 Apr 2008 19:42:41 +0400
Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine
In-Reply-To:
References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk>
Message-ID:

------- Forwarded message -------
From: "Sergei Ryazansky"
To: "Sendu Bala"
Cc:
Subject: Re: [Bioperl-l] alignment by TCoffee as a subroutine
Date: Wed, 30 Apr 2008 19:40:26 +0400

> What do you do with the output of &align? The thing you are doing with
> that output is most likely the cause of your script terminating, which
> is why &sub4 and &sub5 run when you don't run &align (have no output
> that causes the problem).

Please see my answer to Sebastien Moretti; it describes another, similar problem. The only thing I did with the output there was print it to a file. Nevertheless, the problem was the same.

> # don't do anything with the output of align() - does &sub4 still run?

Please see above.

> # add some print statements after you call align(), and then after every
> further block of code in your script to see exactly where the script
> terminates
> # reduce your script down to a minimal script that shows the problem
> (with the help of the previous step) and show us that

All tests with individual blocks were performed earlier. The results are OK.
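The debugging steps Sendu lists (trace where the script stops, then shrink it) could be sketched as below. This is an illustration under stated assumptions, not something posted in the thread: checkpoint() and slurp_lines() are hypothetical helper names. The sketch also checks the return value of open(), which the posted subroutine does not do.

```perl
use strict;
use warnings;

# Hypothetical helper: run one named step, report on STDERR whether it
# returned normally, and show the message if it died.
sub checkpoint {
    my ($label, $code) = @_;
    print STDERR "[start] $label\n";
    my @result = eval { $code->() };
    if (my $err = $@) {
        print STDERR "[died]  $label: $err";
        return;
    }
    print STDERR "[done]  $label\n";
    return @result;
}

# Hypothetical helper: read a whole file, failing loudly if it cannot be
# opened instead of silently returning an empty list.
sub slurp_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# Intended use around the calls from the thread (align() and sub4() are the
# poster's own subroutines, so this part is left commented out):
#
#   my @align_fa = checkpoint('align', sub { align($inputfile_align) });
#   checkpoint('sub4', sub { sub4() });
```

If a step prints "[start]" but neither "[done]" nor "[died]", the process ended inside that step without throwing an exception (for example through exit or a signal), which narrows down where to look.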
From cjfields at uiuc.edu Wed Apr 30 12:25:06 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 30 Apr 2008 11:25:06 -0500 Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk> Message-ID: <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu> Sergei, I agree with Sendu; we can't diagnose this unless we either have the entire script of a minimal version of it demonstrating the bug. The best way to handle this is to file a bug report, attaching relevant data using the 'Create a new attachment' link (including either the full script or a shortened one which demonstrates the bug). Otherwise we're just shooting in the dark trying to diagnose the problem. http://bugzilla.open-bio.org/ chris On Apr 30, 2008, at 10:42 AM, Sergei Ryazansky wrote: > > > ------- Forwarded message ------- > From: "Sergei Ryazansky" > To: "Sendu Bala" > Cc: > Subject: Re: [Bioperl-l] alignment by TCoffee as a subroutine > Date: Wed, 30 Apr 2008 19:40:26 +0400 > >> What do you do with the output of &align? The thing you are doing >> with that output is most likely the cause of your script >> terminating, which is why &sub4 and &sub5 run when you don't run >> &align (have no output that causes the problem). > > please sea my answer to Sebastien Moretti - there are description of > another similar problem. The only thing that I did there with output > is > printing to file. Nevetheless the problem was the same. > >> # don't do anything with the output of align() - does &sub4 still >> run? > > please sea above. 
> >> # add some print statements after you call align(), and then after >> every further block of code in your script to see exactly where the >> script terminates >> # reduce your script down to a minimal script that shows the >> problem (with the help of the previous step) and show us that > > all tests with individual bloks was performed earlier. the results > is ok. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From dr.hogart at gmail.com Wed Apr 30 12:40:19 2008 From: dr.hogart at gmail.com (Sergei Ryazansky) Date: Wed, 30 Apr 2008 20:40:19 +0400 Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine In-Reply-To: <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu> References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk> <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu> Message-ID: On Wed, 30 Apr 2008 20:25:06 +0400, Chris Fields wrote: Chris, I have already sent file to Sendu and also I am attaching it here. I have removed from it really unnecessary parts. > Sergei, > > I agree with Sendu; we can't diagnose this unless we either have the > entire script of a minimal version of it demonstrating the bug. > > The best way to handle this is to file a bug report, attaching relevant > data using the 'Create a new attachment' link (including either the full > script or a shortened one which demonstrates the bug). Otherwise we're > just shooting in the dark trying to diagnose the problem. > > http://bugzilla.open-bio.org/ > > chris -------------- next part -------------- A non-text attachment was scrubbed... 
Name: script.pl Type: application/octet-stream Size: 6870 bytes Desc: not available URL: From cjfields at uiuc.edu Wed Apr 30 13:02:19 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 30 Apr 2008 12:02:19 -0500 Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk> <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu> Message-ID: Hmm, maybe you were confused? From my last email: "The best way to handle this is to file a bug report, attaching relevant data using the 'Create a new attachment' link (including either the full script or a shortened one which demonstrates the bug). Otherwise we're just shooting in the dark trying to diagnose the problem." http://bugzilla.open-bio.org/ Anyone can work on fixing the issue there (so it'll probably get fixed faster). The devs can also track progress on the problem via the dev mail list (bioperl-guts). Diagnosing the bug may also reveal issues not just with Bio::Tools::Run::Alignment::TCoffee but also with other related modules. If needed I can post it to bugzilla, but it helps to submit the bug yourself (so you can receive posts on it's progress). chris On Apr 30, 2008, at 11:40 AM, Sergei Ryazansky wrote: > On Wed, 30 Apr 2008 20:25:06 +0400, Chris Fields > wrote: > > Chris, I have already sent file to Sendu and also I am attaching it > here. I have removed from it really unnecessary parts. > >> Sergei, >> >> I agree with Sendu; we can't diagnose this unless we either have >> the entire script of a minimal version of it demonstrating the bug. >> >> The best way to handle this is to file a bug report, attaching >> relevant data using the 'Create a new attachment' link (including >> either the full script or a shortened one which demonstrates the >> bug). Otherwise we're just shooting in the dark trying to diagnose >> the problem. 
>> >> http://bugzilla.open-bio.org/ >> >> chris From dr.hogart at gmail.com Wed Apr 30 13:39:35 2008 From: dr.hogart at gmail.com (Sergei Ryazansky) Date: Wed, 30 Apr 2008 21:39:35 +0400 Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk> <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu> Message-ID: On Wed, 30 Apr 2008 21:11:56 +0400, Sergei Ryazansky wrote: > Oh, sorry, you right - I too fast read you message. I do it slight later. > >> Hmm, maybe you were confused? From my last email: >> >> "The best way to handle this is to file a bug report, attaching >> relevant data using the 'Create a new attachment' link (including >> either the full script or a shortened one which demonstrates the bug). >> Otherwise we're just shooting in the dark trying to diagnose the >> problem." >> >> http://bugzilla.open-bio.org/ >> >> Anyone can work on fixing the issue there (so it'll probably get fixed >> faster). The devs can also track progress on the problem via the dev >> mail list (bioperl-guts). Diagnosing the bug may also reveal issues >> not just with Bio::Tools::Run::Alignment::TCoffee but also with other >> related modules. >> >> If needed I can post it to bugzilla, but it helps to submit the bug >> yourself (so you can receive posts on it's progress). >> >> chris >> >> On Apr 30, 2008, at 11:40 AM, Sergei Ryazansky wrote: >> >>> On Wed, 30 Apr 2008 20:25:06 +0400, Chris Fields >>> wrote: >>> >>> Chris, I have already sent file to Sendu and also I am attaching it >>> here. I have removed from it really unnecessary parts. >>> >>>> Sergei, >>>> >>>> I agree with Sendu; we can't diagnose this unless we either have the >>>> entire script of a minimal version of it demonstrating the bug. 
>>>> >>>> The best way to handle this is to file a bug report, attaching >>>> relevant data using the 'Create a new attachment' link (including >>>> either the full script or a shortened one which demonstrates the >>>> bug). Otherwise we're just shooting in the dark trying to diagnose >>>> the problem. >>>> >>>> http://bugzilla.open-bio.org/ >>>> >>>> chris > From cjfields at uiuc.edu Wed Apr 30 14:29:28 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 30 Apr 2008 13:29:28 -0500 Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk> <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu> Message-ID: <39A139E4-6783-41E6-8EE9-1FE60CB57577@uiuc.edu> Sorry, didn't catch that... chris On Apr 30, 2008, at 12:39 PM, Sergei Ryazansky wrote: > On Wed, 30 Apr 2008 21:11:56 +0400, Sergei Ryazansky > wrote: > >> Oh, sorry, you right - I too fast read you message. I do it slight >> later. >> >>> Hmm, maybe you were confused? From my last email: >>> >>> "The best way to handle this is to file a bug report, attaching >>> relevant data using the 'Create a new attachment' link (including >>> either the full script or a shortened one which demonstrates the >>> bug). Otherwise we're just shooting in the dark trying to diagnose >>> the problem." >>> >>> http://bugzilla.open-bio.org/ >>> >>> Anyone can work on fixing the issue there (so it'll probably get >>> fixed faster). The devs can also track progress on the problem >>> via the dev mail list (bioperl-guts). Diagnosing the bug may also >>> reveal issues not just with Bio::Tools::Run::Alignment::TCoffee >>> but also with other related modules. >>> >>> If needed I can post it to bugzilla, but it helps to submit the >>> bug yourself (so you can receive posts on it's progress). 
>>> >>> chris >>> >>> On Apr 30, 2008, at 11:40 AM, Sergei Ryazansky wrote: >>> >>>> On Wed, 30 Apr 2008 20:25:06 +0400, Chris Fields >>>> wrote: >>>> >>>> Chris, I have already sent file to Sendu and also I am attaching >>>> it here. I have removed from it really unnecessary parts. >>>> >>>>> Sergei, >>>>> >>>>> I agree with Sendu; we can't diagnose this unless we either have >>>>> the entire script of a minimal version of it demonstrating the >>>>> bug. >>>>> >>>>> The best way to handle this is to file a bug report, attaching >>>>> relevant data using the 'Create a new attachment' link >>>>> (including either the full script or a shortened one which >>>>> demonstrates the bug). Otherwise we're just shooting in the dark >>>>> trying to diagnose the problem. >>>>> >>>>> http://bugzilla.open-bio.org/ >>>>> >>>>> chris >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From Bank.Beszteri at awi.de Tue Apr 1 08:31:49 2008 From: Bank.Beszteri at awi.de (=?ISO-8859-1?Q?B=E1nk_Beszteri?=) Date: Tue, 01 Apr 2008 14:31:49 +0200 Subject: [Bioperl-l] Indexing large databases / BioSQL Message-ID: <47F22B35.1030502@awi.de> Dear list, we have recently started to try to find a solution for indexing large sequence databases / flat files for a java project, and because we ran into problems using biojava, and because both the OBDA and BioSQL ways seem to be compatible across bio~ projects, we also started to experiment with bioperl. It looks like this should work fine, but we had a couple of problems here, too. Perhaps some of you can give me hint what we are doing wrong! 
The first thing we tried was to use Bio::DB::Flat for indexing a TrEMBL flat file (~ 12 GB); but it seems we haven?t got a machine with enough memory to be able to handle this. (Perhaps you would be using the "bdb" style index in such a case in bioperl, but this apparently doesn?t work with biojava, so we had to stick with "flat"). So next we started to test BioSQL, by trying to load just Swissprot in a MySQL DB first, like: load_seqdatabase.pl --host mysql.awi.de --dbname biosql2 --dbuser xyz --dbpass abc --driver mysql --namespace uniprot_sprot --format swiss uniprot_sprot.dat Here we get an error message ########################################### Loading /biodb/spinkern/uniprot_sprot.dat ... Could not store Q6DAH5: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: The supplied lineage does not start near 'Erwinia carotovora subsp. atroseptica' (I was supplied 'Erwinia carotovora subsp. | Pectobacterium | Enterobacteriaceae | Enterobacteriales | Gammaproteobacteria | Proteobacteria | Bacteria') STACK: Error::throw STACK: Bio::Root::Root::throw /biodb/spinkern/bioperl-1.5/bioperl-1.5.2_102/Bio/Root/Root.pm:359 STACK: Bio::Species::classification /biodb/spinkern/bioperl-1.5/bioperl-1.5.2_102/Bio/Species.pm:174 STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:552 STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SpeciesAdaptor.pm:281 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1305 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:973 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852 STACK: 
Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182 STACK: Bio::DB::Persistent::PersistentObject::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:244 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 STACK: Bio::DB::Persistent::PersistentObject::store /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271 STACK: load_seqdatabase.pl:622 ----------------------------------------------------------- at load_seqdatabase.pl line 635 ############################################ or similar, depending on whether we use a pre-loaded ncbi taxonomy or not, and which Swissprot release we are trying to load. It often seems to come from sg. like here, subsp. or other special addition to the species line; but alternative genus names and other curious things also to appear. It looks like Species.pm tries to validate the species name against the lineage info already there in the BioSQL DB, and in several cases, it finds inconsistencies. If we start with the ncbi taxonomy already loaded in the database, the first error comes much earlier. I found a thread on the same problem from ~ two years ago (http://thread.gmane.org/gmane.comp.lang.perl.bio.general/13766/focus=13788), where the solution recommended was to update bioperl, so I was quite surprised to find the problem with the version you can see above (1.5.2_102 bioperl core, 1.5.2_100 bioperl_db). Can someone give me any hints as to what is going wrong here? 
The only workaround we have found so far was to comment out line 174 in Species.pm:

$self->throw("The supplied lineage does not start near '$name' (I was supplied '".join(" | ", @vals)."')");

After doing so, load_seqdatabase.pl runs for several hours (until it eventually crashes; I haven't found out yet why), but proceeds really slowly. I also found some info on this for Pg and Oracle in the mailing list, but does anyone have approximate numbers for MySQL: how long should a first Swissprot load take?

Would be grateful to hear about your ideas / experiences on these issues!

Bank Beszteri
Bioinformatics / Scientific Computing
Alfred Wegener Institute for Polar and Marine Research
Am Handelshafen 12.
27570 Bremerhaven
Germany

From cjfields at uiuc.edu Tue Apr 1 20:45:28 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 1 Apr 2008 19:45:28 -0500
Subject: [Bioperl-l] quick update on bioperl nightly builds
Message-ID: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu>

I'm simplifying the nightly build archive names (removing svn revision # and date) in case anyone needs to update bioperl-live/run/db/network on a regular basis (read: GBrowse installations). When I have time I'll start working on automated builds, which will require some extra work with Module::Build and Build.PL.

chris

From hiekeen at gmail.com Tue Apr 1 22:14:07 2008
From: hiekeen at gmail.com (Jinyan Huang)
Date: Wed, 2 Apr 2008 10:14:07 +0800
Subject: [Bioperl-l] How to make a network graphic using my genes in pathways?
Message-ID:

I have 20 pathways. The genes I am interested in are in these pathways, and some genes overlap between pathways. How can I make a network graphic using these genes, that is, connect the pathways through the overlapping genes? What kind of software can I use?

Thank you very much in advance.

--
Best regards,
Jinyan Huang (ekeen)
School of Life Sciences and Technology, 1302 Room
Tongji University
Siping Road 1239, Shanghai 200092
P.R.
China
Tel :0086-21-65981041
Msn: hiekeen at hotmail.com
eMail: hiekeen at gmail.com

From hlapp at gmx.net Tue Apr 1 22:30:06 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 1 Apr 2008 22:30:06 -0400
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <47F22B35.1030502@awi.de>
References: <47F22B35.1030502@awi.de>
Message-ID:

On Apr 1, 2008, at 8:31 AM, Bánk Beszteri wrote:

> [...] So next we started to test BioSQL, by trying to load just
> Swissprot in a MySQL DB first, like:
>
> load_seqdatabase.pl --host mysql.awi.de --dbname biosql2 --dbuser
> xyz --dbpass abc --driver mysql --namespace uniprot_sprot --format
> swiss uniprot_sprot.dat
>
> Here we get an error message
>
> ###########################################
>
> Loading /biodb/spinkern/uniprot_sprot.dat ...
> Could not store Q6DAH5:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: The supplied lineage does not start near 'Erwinia carotovora
> subsp. atroseptica' (I was supplied 'Erwinia carotovora subsp.
| > Pectobacterium | Enterobacteriaceae | Enterobacteriales | > Gammaproteobacteria | Proteobacteria | Bacteria') > STACK: Error::throw > STACK: Bio::Root::Root::throw /biodb/spinkern/bioperl-1.5/ > bioperl-1.5.2_102/Bio/Root/Root.pm:359 > STACK: Bio::Species::classification /biodb/spinkern/bioperl-1.5/ > bioperl-1.5.2_102/Bio/Species.pm:174 > STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm: > 552 > STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SpeciesAdaptor.pm:281 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object / > biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:1305 > STACK: > Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:973 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / > biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:852 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:182 > STACK: Bio::DB::Persistent::PersistentObject::create /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm: > 244 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:169 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:251 > STACK: Bio::DB::Persistent::PersistentObject::store /biodb/spinkern/ > bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271 > STACK: load_seqdatabase.pl:622 > ----------------------------------------------------------- > > at load_seqdatabase.pl line 635 > > ############################################ > > or similar, 
depending on whether we use a pre-loaded ncbi taxonomy
> or not

I recommend always using a pre-loaded NCBI taxonomy unless you know there are only a few organisms that are straightforward (for the parser, that is).

> , and which Swissprot release we are trying to load. It often seems
> to come from sg. like here, subsp. or other special addition to the
> species line; but alternative genus names and other curious things
> also to appear. It looks like Species.pm tries to validate the
> species name against the lineage info already there in the BioSQL
> DB, and in several cases, it finds inconsistencies.

It actually happens upon a successful lookup when the species object is populated from the database.

> [...]
> The only workaround we have found so far was to comment out line
> 174 in Species.pm:
>
> $self->throw("The supplied lineage does not start near '$name' (I
> was supplied '".join(" | ", @vals)."')");

That should be OK if you work with a pre-loaded taxonomy. It's sort of a sanity check that should catch a parser having messed up a species. If you use a pre-loaded NCBI taxonomy the results of the species parsing don't matter in all details so long as the NCBI taxonID is parsed out correctly, and then found in the database.

Note that this is actually a warn() in the main trunk version of BioPerl, so you might want to upgrade to that (or change throw() to warn() in your version). You still get the records flagged with that, but it isn't an exception.

>
> After doing so, load_seqdatabase.pl runs for several hours (until
> it evetually crashes; I haven't found out yet why), but proceeds
> really slowly.

It should certainly *not* crash. Note also that you can supply --safe on the command line, in which case the script will continue with the next record if one fails to load for whatever reason.

You will want to adjust the width constraint of dbxref.accession, for example to 128 chars. This will also be fixed for BioSQL 1.0.1.
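
For readers who want to try the dbxref.accession adjustment, here is a minimal sketch of how it might be done from Perl with DBI. It is not taken from the BioSQL distribution. The "BINARY NOT NULL" part of the column definition is an assumption about the existing MySQL schema, so compare it with the output of SHOW CREATE TABLE before applying anything. The BIOSQL_APPLY environment variable is a made-up switch for this sketch, and the connection details are simply the placeholders from the command line quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the ALTER statement. The "BINARY NOT NULL" part is an assumption
# about the existing column definition; verify it against your own schema.
sub widen_accession_sql {
    my ($width) = @_;
    die "width must be a positive integer\n"
        unless defined $width && $width =~ /^[1-9]\d*$/;
    return "ALTER TABLE dbxref MODIFY accession VARCHAR($width) BINARY NOT NULL";
}

my $sql = widen_accession_sql(128);
print "$sql\n";

# Dry run by default: the database is only touched when BIOSQL_APPLY is set
# (a hypothetical switch that exists only in this sketch).
if ($ENV{BIOSQL_APPLY}) {
    require DBI;
    my $dbh = DBI->connect('DBI:mysql:database=biosql2;host=mysql.awi.de',
                           'xyz', 'abc', { RaiseError => 1 });

    # Show the current table definition before changing it.
    my (undef, $create) = $dbh->selectrow_array('SHOW CREATE TABLE dbxref');
    print "Current definition:\n$create\n";

    $dbh->do($sql);
    $dbh->disconnect;
}
```

Run without BIOSQL_APPLY, the script only prints the statement, so it can be reviewed or pasted into the mysql client by hand.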
See http://bugzilla.open-bio.org/show_bug.cgi?id=2474 > I also found some info on this for Pg and Oracle in the mailing > list, but has anyone some approximate numbers for MySQL, how long > should a first Swissprot load take? Possibly around 20 hours according to Erik Rijkers: See http://lists.open-bio.org/pipermail/bioperl-l/2008-March/027427.html You can use the --logchunks N option to have it print out performance statistics every N records. Hope this helps, -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Tue Apr 1 22:38:12 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 1 Apr 2008 22:38:12 -0400 Subject: [Bioperl-l] Very basic implementation of GenBank XML SeqIO module In-Reply-To: <47F13C2C.4070909@umdnj.edu> References: <47F13C2C.4070909@umdnj.edu> Message-ID: Ryan - do you not have a committer account? I do agree with Chris on the test. Modules w/o tests tend to become 'pseudogenized.' -hilmar On Mar 31, 2008, at 3:31 PM, Ryan Golhar wrote: > I have a (very) basic SAX implementation of a SeqIO module to parse > GenBank XML records. Right now, it only reads in basic information > regarding the sequence and the sequence itself. > > It does not yet parse the features table. Should I submit it to be > included in bioperl or wait until I implement more for the features > table? 
I'm not sure when I'll get around to it though > > Ryan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cain.cshl at gmail.com Tue Apr 1 23:12:04 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Tue, 01 Apr 2008 23:12:04 -0400 Subject: [Bioperl-l] quick update on bioperl nightly builds In-Reply-To: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> Message-ID: <1207105924.6184.4.camel@frissell> Hi Chris, The tarball is currently (Apr 1) being built in a tmp directory, so that the extracted tarball is ./tmp/bioperl-live/. Is that intended? Thanks, Scott On Tue, 2008-04-01 at 19:45 -0500, Chris Fields wrote: > I'm simplifying the nightly build archive names (removing svn revision > # and date) in case anyone needs to update bioperl-live/run/db/network > on a regular basis (read: GBrowse installations). When I have time > I'll start working on automated builds, which will require some extra > work with Module::Build and Build.PL. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From cjfields at uiuc.edu Tue Apr 1 23:59:30 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 1 Apr 2008 22:59:30 -0500 Subject: [Bioperl-l] quick update on bioperl nightly builds In-Reply-To: <1207105924.6184.4.camel@frissell> References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> <1207105924.6184.4.camel@frissell> Message-ID: Nope, that isn't intended. I fixed it and reran it manually, so it should be fine now (note I didn't update the log file; the next cron run will catch that). I may toy around with your recent passthrough flag addition to try getting automated PPM's up and running. chris On Apr 1, 2008, at 10:12 PM, Scott Cain wrote: > Hi Chris, > > The tarball is currently (Apr 1) being built in a tmp directory, so > that > the extracted tarball is ./tmp/bioperl-live/. Is that intended? > > Thanks, > Scott > > On Tue, 2008-04-01 at 19:45 -0500, Chris Fields wrote: >> I'm simplifying the nightly build archive names (removing svn >> revision >> # and date) in case anyone needs to update bioperl-live/run/db/ >> network >> on a regular basis (read: GBrowse installations). When I have time >> I'll start working on automated builds, which will require some extra >> work with Module::Build and Build.PL. >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
cain at cshl.edu > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sdavis2 at mail.nih.gov Wed Apr 2 07:33:38 2008 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 2 Apr 2008 07:33:38 -0400 Subject: [Bioperl-l] How to make a network graphic using my genes in pathways? In-Reply-To: References: Message-ID: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> On Tue, Apr 1, 2008 at 10:14 PM, Jinyan Huang wrote: > I have 20 pathways. My interesting genes are in these pathways. There > are some genes overlaps in these pathways. How can I make a graphic > network using these genes? It means connecting these pathways through > these overlap genes. What kind of software can I use? R/Bioconductor has tools for working with graphs and pathways. Cytoscape is another open-source graphical solution. Ingenuity is, of course, not free. If you are looking at a perl solution, you can look at the various graph modules and their integration with the Graphviz libraries. SEan From cain.cshl at gmail.com Wed Apr 2 08:28:22 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Wed, 02 Apr 2008 08:28:22 -0400 Subject: [Bioperl-l] [Gmod-gbrowse] quick update on bioperl nightly builds In-Reply-To: References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> <1207105924.6184.4.camel@frissell> Message-ID: <1207139302.6507.7.camel@frissell> Hi Chris, (trimmed out gbrowse mailing list since this is just bioperl business) Speaking of the pass through stuff, Sendu mentioned that I stomped on some changes to Build.PL that you and he did when I committed that change, so it should be rolled back. Is there a good (svn) way to do that? Or should I just copy the contents of the old (good) Build.PL into a fresh file in my checkout and commit it? 
Thanks, Scott On Tue, 2008-04-01 at 22:59 -0500, Chris Fields wrote: > Nope, that isn't intended. I fixed it and reran it manually, so it > should be fine now (note I didn't update the log file; the next cron > run will catch that). > > I may toy around with your recent passthrough flag addition to try > getting automated PPM's up and running. > > chris > > On Apr 1, 2008, at 10:12 PM, Scott Cain wrote: > > > Hi Chris, > > > > The tarball is currently (Apr 1) being built in a tmp directory, so > > that > > the extracted tarball is ./tmp/bioperl-live/. Is that intended? > > > > Thanks, > > Scott > > > > On Tue, 2008-04-01 at 19:45 -0500, Chris Fields wrote: > >> I'm simplifying the nightly build archive names (removing svn > >> revision > >> # and date) in case anyone needs to update bioperl-live/run/db/ > >> network > >> on a regular basis (read: GBrowse installations). When I have time > >> I'll start working on automated builds, which will require some extra > >> work with Module::Build and Build.PL. > >> > >> chris > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. cain at cshl.edu > > GMOD Coordinator (http://www.gmod.org/) > > 216-392-3087 > > Cold Spring Harbor Laboratory > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > ------------------------------------------------------------------------- > Check out the new SourceForge.net Marketplace. > It's the best place to buy or sell services for > just about anything Open Source. 
> http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace > _______________________________________________ > Gmod-gbrowse mailing list > Gmod-gbrowse at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From robert.citek at gmail.com Wed Apr 2 08:24:06 2008 From: robert.citek at gmail.com (Robert Citek) Date: Wed, 2 Apr 2008 07:24:06 -0500 Subject: [Bioperl-l] module for pubchem queries Message-ID: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> Hello all, I have a list of chemical compounds that have some kind of interaction with proteins or genes. The current list contains names or SMILES and I would like to get the CID number for those compounds. Currently, I'm using perl to query the NCBI's eutils[1], which works great. But I was just curious to know of there was a bioperl module to do something similar. A quick google didn't turn up anything, so I thought I'd ask. [1] http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html Regards, - Robert From David.Messina at sbc.su.se Wed Apr 2 08:41:45 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 2 Apr 2008 14:41:45 +0200 Subject: [Bioperl-l] How to make a network graphic using my genes in pathways? In-Reply-To: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> Message-ID: <628aabb70804020541v6cee4584ibd9935290ae7cc0a@mail.gmail.com> I have no personal experience with it, but a colleague of mine suggested VisANT . 
Dave From cjfields at uiuc.edu Wed Apr 2 11:03:32 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 2 Apr 2008 10:03:32 -0500 Subject: [Bioperl-l] [Gmod-gbrowse] quick update on bioperl nightly builds In-Reply-To: <1207139302.6507.7.camel@frissell> References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> <1207105924.6184.4.camel@frissell> <1207139302.6507.7.camel@frissell> Message-ID: <3B490712-3413-4662-99D7-7B115CECB6E1@uiuc.edu> The changes I made were related to problems checking MySQL for Bio::DB::SeqFeature::Store tests when connectivity requires username/ password. For some reason it tests DB connectivity up front, while Bio::DB::GFF assumes the DB setup is correct (no direct DB check) then runs tests assuming the setup is correct. You can view the diffs for your commits here: http://code.open-bio.org/svnweb/index.cgi/bioperl/diff/bioperl-live/trunk/ModuleBuildBioperl.pm?revs=14604&revs=14548 http://code.open-bio.org/svnweb/index.cgi/bioperl/diff/bioperl-live/trunk/Build.PL?revs=14604&revs=14565 I'll try working on merging them together today; it shouldn't be too hard (the changes were fairly minor in both Build.PL and Module::Build). I'll test to make sure your changes stay in as well. Down the road I believe we need to rethink how we want the Build process to run using Module::Build as it's a bit convoluted, but it works for now. chris On Apr 2, 2008, at 7:28 AM, Scott Cain wrote: > Hi Chris, > > (trimmed out gbrowse mailing list since this is just bioperl business) > > Speaking of the pass through stuff, Sendu mentioned that I stomped on > some changes to Build.PL that you and he did when I committed that > change, so it should be rolled back. Is there a good (svn) way to do > that? Or should I just copy the contents of the old (good) Build.PL > into a fresh file in my checkout and commit it? > > Thanks, > Scott > > On Tue, 2008-04-01 at 22:59 -0500, Chris Fields wrote: >> Nope, that isn't intended. 
I fixed it and reran it manually, so it >> should be fine now (note I didn't update the log file; the next cron >> run will catch that). >> >> I may toy around with your recent passthrough flag addition to try >> getting automated PPM's up and running. >> >> chris >> >> On Apr 1, 2008, at 10:12 PM, Scott Cain wrote: >> >>> Hi Chris, >>> >>> The tarball is currently (Apr 1) being built in a tmp directory, so >>> that >>> the extracted tarball is ./tmp/bioperl-live/. Is that intended? >>> >>> Thanks, >>> Scott >>> >>> On Tue, 2008-04-01 at 19:45 -0500, Chris Fields wrote: >>>> I'm simplifying the nightly build archive names (removing svn >>>> revision >>>> # and date) in case anyone needs to update bioperl-live/run/db/ >>>> network >>>> on a regular basis (read: GBrowse installations). When I have time >>>> I'll start working on automated builds, which will require some >>>> extra >>>> work with Module::Build and Build.PL. >>>> >>>> chris >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> -- >>> ------------------------------------------------------------------------ >>> Scott Cain, Ph. D. cain at cshl.edu >>> GMOD Coordinator (http://www.gmod.org/) >>> 216-392-3087 >>> Cold Spring Harbor Laboratory >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> ------------------------------------------------------------------------- >> Check out the new SourceForge.net Marketplace. >> It's the best place to buy or sell services for >> just about anything Open Source. 
>> http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace >> _______________________________________________ >> Gmod-gbrowse mailing list >> Gmod-gbrowse at lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. cain.cshl at gmail.com > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Apr 2 11:54:05 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 2 Apr 2008 10:54:05 -0500 Subject: [Bioperl-l] [Gmod-gbrowse] quick update on bioperl nightly builds In-Reply-To: <3B490712-3413-4662-99D7-7B115CECB6E1@uiuc.edu> References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> <1207105924.6184.4.camel@frissell> <1207139302.6507.7.camel@frissell> <3B490712-3413-4662-99D7-7B115CECB6E1@uiuc.edu> Message-ID: <71375DA3-A751-4908-8000-D9ACAE39B19C@uiuc.edu> Okay, committed them. The accept passthrough still appears to work; let me know if anything pops up. chris On Apr 2, 2008, at 10:03 AM, Chris Fields wrote: > ... > I'll try working on merging them together today; it shouldn't be too > hard (the changes were fairly minor in both Build.PL and > Module::Build). I'll test to make sure your changes stay in as > well. Down the road I believe we need to rethink how we want the > Build process to run using Module::Build as it's a bit convoluted, > but it works for now. > > chris > > On Apr 2, 2008, at 7:28 AM, Scott Cain wrote: >> Hi Chris, >> >> (trimmed out gbrowse mailing list since this is just bioperl >> business) >> >> Speaking of the pass through stuff, Sendu mentioned that I stomped on >> some changes to Build.PL that you and he did when I committed that >> change, so it should be rolled back. 
Is there a good (svn) way to do >> that? Or should I just copy the contents of the old (good) Build.PL >> into a fresh file in my checkout and commit it? >> >> Thanks, >> Scott >> >> On Tue, 2008-04-01 at 22:59 -0500, Chris Fields wrote: >>> Nope, that isn't intended. I fixed it and reran it manually, so it >>> should be fine now (note I didn't update the log file; the next cron >>> run will catch that). >>> >>> I may toy around with your recent passthrough flag addition to try >>> getting automated PPM's up and running. >>> >>> chris >>> >>> On Apr 1, 2008, at 10:12 PM, Scott Cain wrote: >>> >>>> Hi Chris, >>>> >>>> The tarball is currently (Apr 1) being built in a tmp directory, so >>>> that >>>> the extracted tarball is ./tmp/bioperl-live/. Is that intended? >>>> >>>> Thanks, >>>> Scott >>>> >>>> On Tue, 2008-04-01 at 19:45 -0500, Chris Fields wrote: >>>>> I'm simplifying the nightly build archive names (removing svn >>>>> revision >>>>> # and date) in case anyone needs to update bioperl-live/run/db/ >>>>> network >>>>> on a regular basis (read: GBrowse installations). When I have >>>>> time >>>>> I'll start working on automated builds, which will require some >>>>> extra >>>>> work with Module::Build and Build.PL. >>>>> >>>>> chris >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> -- >>>> ------------------------------------------------------------------------ >>>> Scott Cain, Ph. D. cain at cshl.edu >>>> GMOD Coordinator (http://www.gmod.org/) >>>> 216-392-3087 >>>> Cold Spring Harbor Laboratory >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> ------------------------------------------------------------------------- >>> Check out the new SourceForge.net Marketplace. 
>>> It's the best place to buy or sell services for >>> just about anything Open Source. >>> http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace >>> _______________________________________________ >>> Gmod-gbrowse mailing list >>> Gmod-gbrowse at lists.sourceforge.net >>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. cain.cshl at gmail.com >> GMOD Coordinator (http://www.gmod.org/) >> 216-392-3087 >> Cold Spring Harbor Laboratory >> >> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From zhpan99 at yahoo.com Wed Apr 2 13:52:46 2008 From: zhpan99 at yahoo.com (Pan Zheng) Date: Wed, 2 Apr 2008 10:52:46 -0700 (PDT) Subject: [Bioperl-l] installing bioperl-1.5.2 errors:DB_File Message-ID: <726978.82400.qm@web53105.mail.re2.yahoo.com> Hi, I am installing bioperl-1.5.2_102 under cygwin on my Windows XP and having some errors during the process. When I was running "perl Build test", one major error is the error about DB_File. I tried to install DB_File from cpan and rpm without any luck. ++++++++++++++++++++++++ CPAN: File::Temp loaded ok (v0.16) CPAN: YAML loaded ok (v0.62) CPAN.pm: Going to build P/PM/PMQS/DB_File-1.817.tar.gz Parsing config.in... Looks Good. Checking if your kit is complete... 
Looks good
Note (probably harmless): No library found for -ldb
Writing Makefile for DB_File
cp DB_File.pm blib/lib/DB_File.pm
AutoSplitting blib/lib/DB_File.pm (blib/lib/auto/DB_File)
gcc -c -I/usr/local/BerkeleyDB/include -DPERL_USE_SAFE_PUTENV -fno-strict-aliasing -pipe -Wdeclaration-after-statement -DUSEIMPORTLIB -O3 -DVERSION=\"1.817\" -DXS_VERSION=\"1.817\" "-I/usr/lib/perl5/5.8/cygwin/CORE" -D_NOT_CORE -DmDB_Prefix_t=size_t -DmDB_Hash_t=u_int32_t version.c
version.c:30:16: db.h: No such file or directory
make: *** [version.o] Error 1
PMQS/DB_File-1.817.tar.gz
/usr/bin/make -- NOT OK
Running make test
Can't test without successful make
Running make install
Make had returned bad status, install seems impossible
Failed during this command:
PMQS/DB_File-1.817.tar.gz : make NO
+++++++++++++++++++++++++++++++++++++++++++++++

I can't remember having this kind of error while installing an earlier version.

Would you please help me with the DB_File installation?

Thanks.

Pan

From dr.hogart at gmail.com Thu Apr 3 09:01:03 2008
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Thu, 03 Apr 2008 17:01:03 +0400
Subject: [Bioperl-l] support of clustalw2 in bio::run::tool::alignment
Message-ID:

As far as I understand, clustalw2 is not supported in bioperl v1.5.2.100. In which version will it be implemented? Thank you in advance.

From slduncan at iastate.edu Thu Apr 3 14:13:16 2008
From: slduncan at iastate.edu (slduncan at iastate.edu)
Date: Thu, 3 Apr 2008 13:13:16 -0500 (CDT)
Subject: [Bioperl-l] help installing bioperl with cygwin
Message-ID: <161313331084931@webmail.iastate.edu>

I am trying to use cpan to install bioperl and I had an error message saying:
c:\Documents not recognized as and external or internal....
Any ideas here? Also, I am new to the computer world so please be kind.
:) Stacy Duncan Iowa State University Bioinformatics and Computational Biology 1802 University Blvd. VMRI Building 6 Ames, IA 50011-1240 office phone: (515) 294-8385 office fax: (515) 294-1401 home phone: (336) 965-5622 e-mail: slduncan at iastate.edu From cjfields at uiuc.edu Fri Apr 4 16:13:23 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 4 Apr 2008 15:13:23 -0500 Subject: [Bioperl-l] help installing bioperl with cygwin In-Reply-To: <161313331084931@webmail.iastate.edu> References: <161313331084931@webmail.iastate.edu> Message-ID: It's best if you use ActiveState's Perl installation (it's the only one we really support at this moment, unless someone wants to give StrawberryPerl a run). See: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows chris On Apr 3, 2008, at 1:13 PM, slduncan at iastate.edu wrote: > I am trying to use cpan to install bioperl and I had an error > message saying: > c:\Documents not recognized as and external or internal.... > Any ideas here. Also, I am new to the computer world so please be > kind. :) > > Stacy Duncan > Iowa State University > Bioinformatics and Computational Biology > 1802 University Blvd. > VMRI Building 6 > Ames, IA 50011-1240 > office phone: (515) 294-8385 > office fax: (515) 294-1401 > home phone: (336) 965-5622 > e-mail: slduncan at iastate.edu > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Apr 4 16:07:12 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 4 Apr 2008 15:07:12 -0500 Subject: [Bioperl-l] installing bioperl-1.5.2 errors:DB_File In-Reply-To: <726978.82400.qm@web53105.mail.re2.yahoo.com> References: <726978.82400.qm@web53105.mail.re2.yahoo.com> Message-ID: I think you have to use the cygwin installer to install DB_File (it also installs dependencies, such as BDB). According to 'perldoc perlcygwin': .... Optional Libraries for Perl on Cygwin Several Perl functions and modules depend on the existence of some optional libraries. Configure will find them if they are installed in one of the directories listed as being used for library searches. Pre- built packages for most of these are available from the Cygwin installer. .... chris On Apr 2, 2008, at 12:52 PM, Pan Zheng wrote: > Hi, > > I am installing bioperl-1.5.2_102 under cygwin on my Windows XP and > having some errors during the process. > > When I was running "perl Build test", one major error is the error > about DB_File. I tried to install DB_File from cpan and rpm without > any luck. > > ++++++++++++++++++++++++ > CPAN: File::Temp loaded ok (v0.16) > CPAN: YAML loaded ok (v0.62) > CPAN.pm: Going to build P/PM/PMQS/DB_File-1.817.tar.gz > Parsing config.in... > Looks Good. > Checking if your kit is complete... 
> Looks good > Note (probably harmless): No library found for -ldb > Writing Makefile for DB_File > cp DB_File.pm blib/lib/DB_File.pm > AutoSplitting blib/lib/DB_File.pm (blib/lib/auto/DB_File) > gcc -c -I/usr/local/BerkeleyDB/include -DPERL_USE_SAFE_PUTENV -fno- > strict-alias > ing -pipe -Wdeclaration-after-statement -DUSEIMPORTLIB -O3 - > DVERSION=\"1.817\" > -DXS_VERSION=\"1.817\" "-I/usr/lib/perl5/5.8/cygwin/CORE" - > D_NOT_CORE -DmDB_ > Prefix_t=size_t -DmDB_Hash_t=u_int32_t version.c > version.c:30:16: db.h: No such file or directory > make: *** [version.o] Error 1 > PMQS/DB_File-1.817.tar.gz > /usr/bin/make -- NOT OK > Running make test > Can't test without successful make > Running make install > Make had returned bad status, install seems impossible > Failed during this command: > PMQS/DB_File-1.817.tar.gz : make NO > +++++++++++++++++++++++++++++++++++++++++++++++ > > > I can't remember I had this kind error while installing earlier > version. > > Would you please help me on DB_File installation ? > > Thanks. > > Pan > > > --------------------------------- > You rock. That's why Blockbuster's offering you one month of > Blockbuster Total Access, No Cost. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Apr 4 17:25:41 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 4 Apr 2008 16:25:41 -0500 Subject: [Bioperl-l] module for pubchem queries In-Reply-To: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> References: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> Message-ID: <15B44EC6-3660-4925-BA7A-6763D51E6837@uiuc.edu> Do you need something to access eutils via BioPerl, or are you looking for a specific set of classes? 
I wrote an interface to eutils (Bio::DB::EUtilities); you could do
something like this:

#!/usr/bin/perl -w

use strict;
use warnings;
use Bio::DB::EUtilities;

# run an esearch for the term against the PubChem Substance database
my $eutil = Bio::DB::EUtilities->new(-eutil  => 'esearch',
                                     -term   => 'dihydroorotate',
                                     -db     => 'pcsubstance',
                                     -retmax => 1000);

# print the matching IDs as a comma-separated list
print join(',',$eutil->get_ids)."\n";

chris

On Apr 2, 2008, at 7:24 AM, Robert Citek wrote:

> Hello all,
>
> I have a list of chemical compounds that have some kind of interaction
> with proteins or genes. The current list contains names or SMILES and
> I would like to get the CID number for those compounds. Currently,
> I'm using perl to query the NCBI's eutils[1], which works great. But
> I was just curious to know if there was a bioperl module to do
> something similar. A quick google didn't turn up anything, so I
> thought I'd ask.
>
> [1] http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
>
> Regards,
> - Robert
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From ekeen at mail.tongji.edu.cn Mon Apr 7 02:57:04 2008
From: ekeen at mail.tongji.edu.cn (Jinyan Huang)
Date: Mon, 7 Apr 2008 14:57:04 +0800
Subject: [Bioperl-l] How to analysis the relationship of my interesting KEGG pathways?
Message-ID:

In my research, I found 25 interesting pathways. I want to know the
regulatory relationships among these pathways. It would be best if
there is some software that can connect these KEGG pathways.

Thank you very much in advance.
From miguel.pignatelli at uv.es Mon Apr 7 06:12:58 2008
From: miguel.pignatelli at uv.es (Miguel Pignatelli)
Date: Mon, 07 Apr 2008 12:12:58 +0200
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com>
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com>
Message-ID: <47F9F3AA.2090003@uv.es>

Hi all,

Is there any way to obtain the date of creation of individual GenBank
entries? I don't mean the "last revision" date that can be found in the
first line of a GenBank file.

I can access this creation date by looking at the "revision history" of
any GenBank entry (for example, see
http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105),
but I need a systematic (and local=fast) way to access this information.

Any help would be very much appreciated,
Thank you very much in advance,

M;

From Bank.Beszteri at awi.de Mon Apr 7 07:46:43 2008
From: Bank.Beszteri at awi.de (=?ISO-8859-1?Q?B=E1nk_Beszteri?=)
Date: Mon, 07 Apr 2008 13:46:43 +0200
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To:
References: <47F22B35.1030502@awi.de>
Message-ID: <47FA09A3.2070004@awi.de>

Hi Hilmar,

it was important to understand that the inconsistency in taxon names is
apparently only between the Swissprot entries with "non-standard" names
and the contents of the taxonomy tables and that it is best to use a
pre-loaded taxonomy, thanks for that! We have now updated to
bioperl-live (and bp-db-live, too) and load_seqdatabase.pl seems to have
loaded everything OK in ~26 hours (with many of the "The supplied
lineage does not start near..." warnings, but no other problems). Our
next test is to try to load trembl (will try to do this in parallel in
multiple chunks), hope it will work just as nicely!

Thanks for your tips & insights!

Bank

Hilmar Lapp wrote:

>
> On Apr 1, 2008, at 8:31 AM, Bánk Beszteri wrote:
>
>> [...] So next we started to test BioSQL, by trying to load just
>> Swissprot in a MySQL DB first, like:
>>
>> load_seqdatabase.pl --host mysql.awi.de --dbname biosql2 --dbuser xyz --dbpass abc --driver mysql --namespace uniprot_sprot --format swiss uniprot_sprot.dat
>>
>> Here we get an error message
>>
>> ###########################################
>>
>> Loading /biodb/spinkern/uniprot_sprot.dat ...
>> Could not store Q6DAH5:
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: The supplied lineage does not start near 'Erwinia carotovora subsp. atroseptica' (I was supplied 'Erwinia carotovora subsp. | Pectobacterium | Enterobacteriaceae | Enterobacteriales | Gammaproteobacteria | Proteobacteria | Bacteria')
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /biodb/spinkern/bioperl-1.5/bioperl-1.5.2_102/Bio/Root/Root.pm:359
>> STACK: Bio::Species::classification /biodb/spinkern/bioperl-1.5/bioperl-1.5.2_102/Bio/Species.pm:174
>> STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:552
>> STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SpeciesAdaptor.pm:281
>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1305
>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:973
>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
>> STACK: Bio::DB::Persistent::PersistentObject::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:244
>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
>> STACK: Bio::DB::Persistent::PersistentObject::store /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271
>> STACK: load_seqdatabase.pl:622
>> -----------------------------------------------------------
>>
>> at load_seqdatabase.pl line 635
>>
>> ############################################
>>
>> or similar, depending on whether we use a pre-loaded ncbi taxonomy
>> or not
>
>
> I recommend to always use a pre-loaded NCBI taxonomy unless you know
> there are only a few organisms that are straightforward (for the
> parser, that is).
>
>> , and which Swissprot release we are trying to load. It often seems
>> to come from sg. like here, subsp. or other special addition to the
>> species line; but alternative genus names and other curious things
>> also appear. It looks like Species.pm tries to validate the
>> species name against the lineage info already there in the BioSQL
>> DB, and in several cases, it finds inconsistencies.
>
>
> It actually happens upon a successful lookup when the species object
> is populated from the database.
>
>> [...]
>> The only workaround we have found so far was to comment out line 174
>> in Species.pm:
>>
>> $self->throw("The supplied lineage does not start near '$name' (I was supplied '".join(" | ", @vals)."')");
>
>
> That should be OK if you work with a pre-loaded taxonomy. It's sort
> of a sanity check that should catch a parser having messed up a
> species. If you use a pre-loaded NCBI taxonomy the results of the
> species parsing don't matter in all details so long as the NCBI
> taxonID is parsed out correctly, and then found in the database.
>
> Note that this is actually a warn() in the main trunk version of
> BioPerl, so you might want to upgrade to that (or change throw() to
> warn() in your version). You still get the records flagged with that,
> but it isn't an exception.
>
>>
>> After doing so, load_seqdatabase.pl runs for several hours (until it
>> eventually crashes; I haven't found out yet why), but proceeds really
>> slowly.
>
>
> It should certainly *not* crash. Note also that you can supply --safe
> on the command line, in which case the script will continue with the
> next record if one fails to load for whatever reason.
>
> You will want to adjust the width constraint of dbxref.accession, for
> example to 128 chars. This will also be fixed for BioSQL 1.0.1.
> See http://bugzilla.open-bio.org/show_bug.cgi?id=2474
>
>
>> I also found some info on this for Pg and Oracle in the mailing
>> list, but has anyone some approximate numbers for MySQL, how long
>> should a first Swissprot load take?
>
>
> Possibly around 20 hours according to Erik Rijkers:
> See http://lists.open-bio.org/pipermail/bioperl-l/2008-March/027427.html
>
> You can use the --logchunks N option to have it print out performance
> statistics every N records.
>
> Hope this helps,
>
> -hilmar

From cjfields at uiuc.edu Mon Apr 7 08:32:45 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 7 Apr 2008 07:32:45 -0500
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <47FA09A3.2070004@awi.de>
References: <47F22B35.1030502@awi.de> <47FA09A3.2070004@awi.de>
Message-ID:

The warnings are something that we still need to resolve, but the only
fix I can think of likely breaks backward compatibility with older
bioperl-db installations (i.e.
storing the given scientific name instead of the binomial name, which
is used as a fallback when no taxid is found). There is a full
explanation here:

http://bugzilla.open-bio.org/show_bug.cgi?id=2092

Anyway, I think it needs further testing when someone, likely Hilmar or
I, has time.

chris

On Apr 7, 2008, at 6:46 AM, Bánk Beszteri wrote:

> Hi Hilmar,
>
> it was important to understand that the inconsistency in taxon names
> is apparently only between the Swissprot entries with "non-standard"
> names and the contents of the taxonomy tables and that it is best to
> use a pre-loaded taxonomy, thanks for that! We have now updated to
> bioperl-live (and bp-db-live, too) and load_seqdatabase.pl seems to
> have loaded everything OK in ~26 hours (with many of the "The
> supplied lineage does not start near..." warnings, but no other
> problems). Our next test is to try to load trembl (will try to do
> this in parallel in multiple chunks), hope it will work just as
> nicely!
>
> Thanks for your tips & insights!
>
> Bank
>
> Hilmar Lapp wrote:
>
>> [...]

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bix at sendu.me.uk Mon Apr 7 08:34:00 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 07 Apr 2008 13:34:00 +0100
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <47FA09A3.2070004@awi.de>
References: <47F22B35.1030502@awi.de> <47FA09A3.2070004@awi.de>
Message-ID: <47FA14B8.7000500@sendu.me.uk>

Bánk Beszteri wrote:
> Hi Hilmar,
>
> it was important to understand that the inconsistency in taxon names is
> apparently only between the Swissprot entries with "non-standard" names
> and the contents of the taxonomy tables and that it is best to use a
> pre-loaded taxonomy, thanks for that! We have now updated to
> bioperl-live (and bp-db-live, too) and load_seqdatabase.pl seems to have
> loaded everything OK in ~26 hours (with many of the "The supplied
> lineage does not start near..." warnings, but no other problems).
Can you provide some examples of these warnings (of the taxons that
cause them)? If there's anything consistent about them perhaps
Bio::Species can be improved to accommodate them properly (instead of
just issuing the warning and getting the classification wrong).

From heikki at sanbi.ac.za Mon Apr 7 08:48:34 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Mon, 7 Apr 2008 14:48:34 +0200
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To: <47F9F3AA.2090003@uv.es>
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es>
Message-ID: <200804071448.34769.heikki@sanbi.ac.za>

Miguel,

You probably know this but:

- Your entry example below is a GenPept entry, not a GenBank entry
- The NCBI sequence format "genbank" has only the last modified date.
  I do not know about other formats (ASN.1, ...)
- NCBI Entrez is a great tool but it obscures the source database.
- If you really are working on real GenBank entries, you can use the
  accession number to find the corresponding EMBL (and Swiss-Prot)
  flat file formats that have both creation and last modified dates.

Post to the list if you have trouble getting the dates from
EMBL/Swiss-Prot formats using bioperl.

Yours,

-Heikki

On Monday 07 April 2008 12:12:58 Miguel Pignatelli wrote:
> Hi all,
>
> Is there any way to obtain the date of creation of individual GenBank
> entries? I don't mean the "last revision" date that can be found in the
> first line of a GenBank file.
>
> I can access this creation date by looking at the "revision history" of
> any GenBank entry (for example, see
> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105),
> but I need a systematic (and local=fast) way to access this information.
>
> Any help would be very much appreciated,
> Thank you very much in advance,
>
> M;

-- 
______ _/ _/_____________________________________________________
_/ _/
_/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
_/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From granjeau at tagc.univ-mrs.fr Mon Apr 7 09:30:10 2008
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/ICIM)
Date: Mon, 07 Apr 2008 15:30:10 +0200
Subject: [Bioperl-l] help installing bioperl with cygwin
In-Reply-To:
References: <161313331084931@webmail.iastate.edu>
Message-ID: <47FA21E2.3010602@tagc.univ-mrs.fr>

Hi,

I'm using BioPerl under Cygwin, because Cygwin allows one to work in a
Unix-like environment from a command-line point of view. So, I use the
CVS version, which runs out of the box

http://www.bioperl.org/wiki/Using_CVS

and which was replaced by SVN at the beginning of the year

http://www.bioperl.org/wiki/Using_Subversion

So if you really want to work under Cygwin, you can try this quick and
dirty way, but you will still have to become experienced, because
BioPerl is not supported under Cygwin.

You may try Strawberry, but in my experience in installing wxPerl,
wxPerl fails on both flavours of Perl. ActiveState's Perl is still the
easiest way to install many packages.

Regards,
Samuel

Chris Fields wrote:
> It's best if you use ActiveState's Perl installation (it's the only
> one we really support at this moment, unless someone wants to give
> StrawberryPerl a run).
> See:
>
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
>
> chris
>
> On Apr 3, 2008, at 1:13 PM, slduncan at iastate.edu wrote:
>
>> I am trying to use cpan to install bioperl and I had an error message
>> saying:
>> c:\Documents not recognized as an external or internal....
>> Any ideas here. Also, I am new to the computer world so please be
>> kind. :)
>>
>> Stacy Duncan
>> Iowa State University
>> Bioinformatics and Computational Biology
>> 1802 University Blvd.
>> VMRI Building 6
>> Ames, IA 50011-1240
>> office phone: (515) 294-8385
>> office fax: (515) 294-1401
>> home phone: (336) 965-5622
>> e-mail: slduncan at iastate.edu
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

-- 
Samuel GRANJEAUD          granjeau at tagc.univ-mrs.fr
INSERM - ICIM - TAGC      Tel: +33 (0)491 82 87 24
http://tagc.univ-mrs.fr   Fax: +33 (0)491 82 87 01
http://icim.marseille.inserm.fr/proteomique

From er at xs4all.nl Mon Apr 7 10:36:57 2008
From: er at xs4all.nl (Erik)
Date: Mon, 7 Apr 2008 16:36:57 +0200 (CEST)
Subject: [Bioperl-l] Indexing large databases / BioSQL
Message-ID: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl>

On Mon, April 7, 2008 14:34, Sendu Bala wrote:
> Bánk Beszteri wrote:
>> Hi Hilmar,
>>
>> it was important to understand that the inconsistency in taxon names is
>> apparently only between the Swissprot entries with "non-standard" names
>> and the contents of the taxonomy tables and that it is best to use a
>> pre-loaded taxonomy, thanks for that! We have now updated to
>> bioperl-live (and bp-db-live, too) and load_seqdatabase.pl seems to have
>> loaded everything OK in ~26 hours (with many of the "The supplied
>> lineage does not start near..." warnings, but no other problems).
>
> Can you provide some examples of these warnings (of the taxons that
> cause them)? If there's anything consistent about them perhaps
> Bio::Species can be improved to accommodate them properly (instead of
> just issuing the warning and getting the classification wrong).
>

I did this a little while ago and saved the output
(UniProtKB/Swiss-Prot Release 55.1 of 18-Mar-2008, I think).

All warnings (and a few errors) for swissprot are here:

http://bugzilla.open-bio.org/show_bug.cgi?id=2474

as an attached file.

I suppose the OP will have encountered similar output - I don't think
there is much RDBMS-type-dependency involved.

regards,

Erik Rijkers

From cjfields at uiuc.edu Mon Apr 7 11:46:01 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 7 Apr 2008 10:46:01 -0500
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To: <200804071448.34769.heikki@sanbi.ac.za>
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <200804071448.34769.heikki@sanbi.ac.za>
Message-ID: <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu>

Strangely enough, if you use NCBI's esummary you can get both dates.
Via Bio::DB::EUtilities in bioperl-live, if you dump out DocSum data
(using a debugging method I added in a while back):

---------------------------------------

use Bio::DB::EUtilities;

# for multiple IDs use an array ref; also only use GI's (not accessions)
my $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
                                       -db    => 'protein',
                                       -id    => 1621261);

$factory->print_DocSums;

---------------------------------------

One gets the following tag/value pairs:

UID: 1621261
Caption    :CAB02640
Title      :PROBABLE PYRIMIDINE OPERON REGULATORY PROTEIN PYRR [Mycobacterium tuberculosis H37Rv]
Extra      :gi|1621261|emb|CAB02640.1|[1621261]
Gi         :1621261
CreateDate :2003/11/21
UpdateDate :2006/11/14
Flags      :
TaxId      :83332
Length     :193
Status     :live
ReplacedBy :
Comment    :

I'll add in a method to grab the data element by tag (in this case,
grab the creation date by asking for the 'CreateDate' key). Might come
in handy for scripts.

chris

On Apr 7, 2008, at 7:48 AM, Heikki Lehvaslaiho wrote:

> Miguel,
>
> You probably know this but:
>
> - Your entry example below is a GenPept entry, not a GenBank entry
> - The NCBI sequence format "genbank" has only the last modified date.
>   I do not know about other formats (ASN.1, ...)
> - NCBI Entrez is a great tool but it obscures the source database.
> - If you really are working on real GenBank entries, you can use the
>   accession number to find the corresponding EMBL (and Swiss-Prot)
>   flat file formats that have both creation and last modified dates.
>
> Post to the list if you have trouble getting the dates from
> EMBL/Swiss-Prot formats using bioperl.
>
> Yours,
>
> -Heikki
>
> On Monday 07 April 2008 12:12:58 Miguel Pignatelli wrote:
>> [...]

From miguel.pignatelli at uv.es Mon Apr 7 12:24:50 2008
From: miguel.pignatelli at uv.es (Miguel Pignatelli)
Date: Mon, 07 Apr 2008 18:24:50 +0200
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To: <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu>
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <200804071448.34769.heikki@sanbi.ac.za> <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu>
Message-ID: <47FA4AD2.5030206@uv.es>

I've noticed that the ASN.1 version of those records has a
"creation-date" tag.
But this is somewhat strange, because the creation date you obtained
and the one obtained via the ASN.1 format are both 2003/11/21, but if
you look at the revision history of the record:

http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=CAB02640

it reports a creation date of "Oct 19 1996 12:28 AM".

I don't know how to get this, because the EMBL version of this gene:

http://www.ebi.ac.uk/cgi-bin/dbfetch?db=emblcds&id=CAB02640&style=raw

doesn't have DT fields at all.

M;

Chris Fields wrote:
> Strangely enough, if you use NCBI's esummary you can get both dates.
> Via Bio::DB::EUtilities in bioperl-live, if you dump out DocSum data
> (using a debugging method I added in a while back):
>
> ---------------------------------------
>
> use Bio::DB::EUtilities;
>
> # for multiple IDs use an array ref; also only use GI's (not accessions)
> my $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>                                        -db    => 'protein',
>                                        -id    => 1621261);
>
> $factory->print_DocSums;
>
> ---------------------------------------
>
> One gets the following tag/value pairs:
>
> UID: 1621261
> Caption    :CAB02640
> Title      :PROBABLE PYRIMIDINE OPERON REGULATORY PROTEIN PYRR [Mycobacterium tuberculosis H37Rv]
> Extra      :gi|1621261|emb|CAB02640.1|[1621261]
> Gi         :1621261
> CreateDate :2003/11/21
> UpdateDate :2006/11/14
> Flags      :
> TaxId      :83332
> Length     :193
> Status     :live
> ReplacedBy :
> Comment    :
>
> I'll add in a method to grab the data element by tag (in this case, grab
> the creation date by asking for the 'CreateDate' key). Might come in
> handy for scripts.
>
> chris
>
> On Apr 7, 2008, at 7:48 AM, Heikki Lehvaslaiho wrote:
>
>> [...]

From cjfields at uiuc.edu Mon Apr 7 13:48:45 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 7 Apr 2008 12:48:45 -0500
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To: <47FA4AD2.5030206@uv.es>
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <200804071448.34769.heikki@sanbi.ac.za> <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> <47FA4AD2.5030206@uv.es>
Message-ID:

Note in the example I gave that, during the revision history, the
DBSOURCE changed at the point of the creation date (the original nuc.
record was a M. tuberculosis contig sequence, which later changed to an
updated full M. tuberculosis genome record at the time of the 'create
date').

Couldn't find anything specific in the GenBank docs on this, but it
appears (at least for a protein record) the creation date reflects the
date on which the sequence was either originally deposited or
originally derived from the nucleotide source record present in the
record. In other words, it may not reflect the original date of
deposition (which could have come from a different record, as in this
case).

chris

On Apr 7, 2008, at 11:24 AM, Miguel Pignatelli wrote:

>
> I've noticed that the ASN.1 version of those records has a
> "creation-date" tag.
> But this is somewhat strange, because the creation date you obtained
> and the one obtained via the ASN.1 format are both 2003/11/21, but if
> you look at the revision history of the record:
>
> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=CAB02640
>
> it reports a creation date of "Oct 19 1996 12:28 AM".
>
> I don't know how to get this, because the EMBL version of this gene:
>
> http://www.ebi.ac.uk/cgi-bin/dbfetch?db=emblcds&id=CAB02640&style=raw
>
> doesn't have DT fields at all.
>
> M;
>
>
> Chris Fields wrote:
>> [...]

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From Bank.Beszteri at awi.de Tue Apr 8 03:35:43 2008
From: Bank.Beszteri at awi.de (=?ISO-8859-1?Q?B=E1nk_Beszteri?=)
Date: Tue, 08 Apr 2008 09:35:43 +0200
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl>
References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl>
Message-ID: <47FB204F.90405@awi.de>

>>Can you provide some examples of these warnings (of the taxons that
>>cause them)?
If there's anything consistent about them perhaps >>Bio::Species can be improved to accommodate them properly (instead of >>just issuing the warning and getting the classification wrong). >> >> > >All warnings (and a few errors) for swissprot are here: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2474 > >as an attached file > >I suppose the OP will have encountered similar output - I don't think there is >much RDBMS-type-dependency involved. > > Hi Erik & Sendu, yes, the same kind of thing, probably no DBMS-type dependency; in case it could be useful, I uploaded my output as a second attachment to the bugzilla report cited above. Bank From heikki at sanbi.ac.za Tue Apr 8 04:32:12 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Tue, 8 Apr 2008 10:32:12 +0200 Subject: [Bioperl-l] Blast database sequence retrieval perl script In-Reply-To: <6BEABCD5CA640A44A848448A42A03B73079E48C9@ilrikeadx1.ILRI.CGIARAD.ORG> References: <6BEABCD5CA640A44A848448A42A03B73079E48C9@ilrikeadx1.ILRI.CGIARAD.ORG> Message-ID: <200804081032.12312.heikki@sanbi.ac.za> Dear Nelson, I am cc:ing the bioperl mailing list where all these kind of queries should go. More people can help you that way. Since you have your own local data set, you need to create an index that catalogues you sequences for easy retrieval. You need to install bioperl-live first. See for example: http://www.bioperl.org/wiki/Using_Subversion Then you can follow this HOWTO: http://www.bioperl.org/wiki/HOWTO:Flat_databases The other HOWTOs will help you dealing with BioPerl sequence objects that are retrieved: http://www.bioperl.org/wiki/HOWTOs. Yours, -Heikki On Monday 07 April 2008 14:50:23 Ndegwa, Nelson (IITA-Nairobi) wrote: > Dear Prof. Heikki, > > Hi. We met at the Pathogen Bioinformatics Conference held in Nairobi > Kenya in May 2007 at ICIPE. I recall you are a developer of Bioperl and > Perl. I have managed to install a local Blast, having just cowpea Contig > sequences, about 50,000 in total. 
This runs fine, as I can perform > various queries and get results. However, any good match/hit on the > local Blast database is hard to retrieve and the only option seems to go > back to that database and search manually for the top hit sequence - an > exceedingly manual task. Might you perhaps be having a Perl script I > could adopt to my database to help with this task Such that the hits > have a hyperlink which can be used to retrieve that specific entry? I > have limited knowledge of Perl. Thank you. > > With Kind Regards, > > Nelson. -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From David.Messina at sbc.su.se Tue Apr 8 07:29:12 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 8 Apr 2008 13:29:12 +0200 Subject: [Bioperl-l] How to analysis the relationship of my interesting KEGG pathways? In-Reply-To: <628aabb70804080053g1fd9120ex9d5fd12f65f216f9@mail.gmail.com> References: <628aabb70804080053g1fd9120ex9d5fd12f65f216f9@mail.gmail.com> Message-ID: <628aabb70804080429k2aa17a6eu12197709d4cc1af0@mail.gmail.com> Hi Jinyan, You asked a similar question last week and received a couple of suggestions -- did you take a look at those? I'm not an expert on this topic, but I believe that since regulatory information is much harder to obtain experimentally and therefore much less well known, there isn't a lot of it in pathway databases like KEGG. You may have to look through the literature and start trying to put together possible regulatory links on your own. 
Dave From hrh at sanger.ac.uk Tue Apr 8 08:48:32 2008 From: hrh at sanger.ac.uk (Hans Rudolf Hotz) Date: Tue, 8 Apr 2008 13:48:32 +0100 (BST) Subject: [Bioperl-l] Blast database sequence retrieval perl script In-Reply-To: <200804081032.12312.heikki@sanbi.ac.za> References: <6BEABCD5CA640A44A848448A42A03B73079E48C9@ilrikeadx1.ILRI.CGIARAD.ORG> <200804081032.12312.heikki@sanbi.ac.za> Message-ID: Nelson or simply use the BLAST indices for the sequence retrieval as well. All you need to do is adding the "-o" option to the 'formatdb' command for the BLAST index creation (this will create some extra files). Then you can use 'fastacmd' (which is also part of the NCBI BLAST package) to retrieve the sequences. Hans On Tue, 8 Apr 2008, Heikki Lehvaslaiho wrote: > > Dear Nelson, > > I am cc:ing the bioperl mailing list where all these kind of queries should > go. More people can help you that way. > > > Since you have your own local data set, you need to create an index that > catalogues you sequences for easy retrieval. > > You need to install bioperl-live first. See for example: > http://www.bioperl.org/wiki/Using_Subversion > > Then you can follow this HOWTO: > http://www.bioperl.org/wiki/HOWTO:Flat_databases > > The other HOWTOs will help you dealing with BioPerl sequence objects that are > retrieved: http://www.bioperl.org/wiki/HOWTOs. > > > Yours, > > -Heikki > > > On Monday 07 April 2008 14:50:23 Ndegwa, Nelson (IITA-Nairobi) wrote: >> Dear Prof. Heikki, >> >> Hi. We met at the Pathogen Bioinformatics Conference held in Nairobi >> Kenya in May 2007 at ICIPE. I recall you are a developer of Bioperl and >> Perl. I have managed to install a local Blast, having just cowpea Contig >> sequences, about 50,000 in total. This runs fine, as I can perform >> various queries and get results. 
However, any good match/hit on the >> local Blast database is hard to retrieve and the only option seems to go >> back to that database and search manually for the top hit sequence - an >> exceedingly manual task. Might you perhaps be having a Perl script I >> could adopt to my database to help with this task Such that the hits >> have a hyperlink which can be used to retrieve that specific entry? I >> have limited knowledge of Perl. Thank you. >> >> With Kind Regards, >> >> Nelson. > > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From robert.citek at gmail.com Tue Apr 8 10:09:27 2008 From: robert.citek at gmail.com (Robert Citek) Date: Tue, 8 Apr 2008 09:09:27 -0500 Subject: [Bioperl-l] module for pubchem queries In-Reply-To: <15B44EC6-3660-4925-BA7A-6763D51E6837@uiuc.edu> References: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> <15B44EC6-3660-4925-BA7A-6763D51E6837@uiuc.edu> Message-ID: <4145b6790804080709l20f1e56erf4b7af04b0a52870@mail.gmail.com> Wrapping bioperl around eutils will work just fine. Thanks for the pointer. 
http://search.cpan.org/~sendu/bioperl-1.5.2_102/Bio/DB/EUtilities.pm Regards, - Robert On Fri, Apr 4, 2008 at 4:25 PM, Chris Fields wrote: > Do you need something to access eutils via BioPerl, or are you looking for a > specific set of classes? I wrote an interface to eutils > (Bio::DB::EUtilities), you could do something like this: > > #!/usr/bin/perl -w > > use strict; > use warnings; > use Bio::DB::EUtilities; > > my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch', > -term => 'dihydroorotate', > -db => 'pcsubstance', > -retmax => 1000); > > print join(',',$eutil->get_ids)."\n"; > > chris From cjfields at uiuc.edu Tue Apr 8 11:10:26 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 8 Apr 2008 10:10:26 -0500 Subject: [Bioperl-l] module for pubchem queries In-Reply-To: <4145b6790804080709l20f1e56erf4b7af04b0a52870@mail.gmail.com> References: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> <15B44EC6-3660-4925-BA7A-6763D51E6837@uiuc.edu> <4145b6790804080709l20f1e56erf4b7af04b0a52870@mail.gmail.com> Message-ID: <32D210FC-575E-4D95-95DA-FC6F5BE1FC24@uiuc.edu> Just to note, the the API has changed significantly from the interface in the 1.5.2 release. The up-to-date (supported) interface is in subversion; there are some example recipes here: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook I'm working on a full HOWTO, just haven't had time to get it up on the wiki yet. chris On Apr 8, 2008, at 9:09 AM, Robert Citek wrote: > Wrapping bioperl around eutils will work just fine. Thanks for the > pointer. > > http://search.cpan.org/~sendu/bioperl-1.5.2_102/Bio/DB/EUtilities.pm > > Regards, > - Robert > > On Fri, Apr 4, 2008 at 4:25 PM, Chris Fields > wrote: >> Do you need something to access eutils via BioPerl, or are you >> looking for a >> specific set of classes? 
I wrote an interface to eutils >> (Bio::DB::EUtilities), you could do something like this: >> >> #!/usr/bin/perl -w >> >> use strict; >> use warnings; >> use Bio::DB::EUtilities; >> >> my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch', >> -term => 'dihydroorotate', >> -db => 'pcsubstance', >> -retmax => 1000); >> >> print join(',',$eutil->get_ids)."\n"; >> >> chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cuiw at ncbi.nlm.nih.gov Tue Apr 8 16:41:58 2008 From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C]) Date: Tue, 8 Apr 2008 16:41:58 -0400 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <47F9F3AA.2090003@uv.es> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> Message-ID: <6F230E9769AA8D4EB4BC401DF133EDB7180BE0@NIHCESMLBX15.nih.gov> Hi, Miguel: id1_fetch can do it. Detailed instruction can be found at: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=toolkit.section.ch_demo.id 1_fetch.html Here is an example: >id1_fetch -lt revisions -flat '12:74311105' -fmt fasta GI Loaded DB Retrieval No. -- ------ -- ------------- 74311105 12/07/2007 NCBI 19766263 74311105 01/23/2007 NCBI 16325656 74311105 03/30/2006 NCBI 13131204 74311105 03/03/2006 NCBI 12915541 74311105 03/02/2006 NCBI 12885275 74311105 12/03/2005 NCBI 12259793 74311105 09/09/2005 NCBI 11257262 74311105 09/09/2005 NCBI 11242667 Wenwu Cui PhD NCBI/NLM/NIH > -----Original Message----- > From: Miguel Pignatelli [mailto:miguel.pignatelli at uv.es] > Sent: Monday, April 07, 2008 6:13 AM > Cc: bioperl-l at bioperl.org > Subject: [Bioperl-l] GenBank entries creation dates > > Hi all, > > Is there any way to obtain the date of creation of individual GenBank > entries? 
I don't mean the "last revision" date that can be found in the > first line of a GenBank file. > > I can access this creation date by looking at the "revision history" of > any GenBank entry (for example, see > http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105), > but I need a systematic (and local=fast) way to access this > information. > > Any help would be very appreciated, > Thank you very much in advance, > > M; > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From miguel.pignatelli at uv.es Wed Apr 9 07:32:39 2008 From: miguel.pignatelli at uv.es (Miguel Pignatelli) Date: Wed, 09 Apr 2008 13:32:39 +0200 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <6F230E9769AA8D4EB4BC401DF133EDB7180BE0@NIHCESMLBX15.nih.gov> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <6F230E9769AA8D4EB4BC401DF133EDB7180BE0@NIHCESMLBX15.nih.gov> Message-ID: <47FCA957.5040409@uv.es> Wow, impressive, thanks Wenwu for the information, I have never used this tool before. The problem is that I need to know all the revision history (or at least the creation date) for *all* the GIs present in nr (well, or at least a significant portion of it) and this tool queries via web. The existence of this tool confirms me that this information is available somewhere, is it possible to download the data that contains this information? Thanks again, M; Cui, Wenwu (NIH/NLM/NCBI) [C] wrote: > Hi, Miguel: > > id1_fetch can do it. Detailed instruction can be found at: > > http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=toolkit.section.ch_demo.id > 1_fetch.html > > Here is an example: > >> id1_fetch -lt revisions -flat '12:74311105' -fmt fasta > GI Loaded DB Retrieval No. 
> -- ------ -- ------------- > 74311105 12/07/2007 NCBI 19766263 > 74311105 01/23/2007 NCBI 16325656 > 74311105 03/30/2006 NCBI 13131204 > 74311105 03/03/2006 NCBI 12915541 > 74311105 03/02/2006 NCBI 12885275 > 74311105 12/03/2005 NCBI 12259793 > 74311105 09/09/2005 NCBI 11257262 > 74311105 09/09/2005 NCBI 11242667 > > Wenwu Cui PhD > NCBI/NLM/NIH > >> -----Original Message----- >> From: Miguel Pignatelli [mailto:miguel.pignatelli at uv.es] >> Sent: Monday, April 07, 2008 6:13 AM >> Cc: bioperl-l at bioperl.org >> Subject: [Bioperl-l] GenBank entries creation dates >> >> Hi all, >> >> Is there any way to obtain the date of creation of individual GenBank >> entries? I don't mean the "last revision" date that can be found in > the >> first line of a GenBank file. >> >> I can access this creation date by looking at the "revision history" > of >> any GenBank entry (for example, see >> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105), >> but I need a systematic (and local=fast) way to access this >> information. >> >> Any help would be very appreciated, >> Thank you very much in advance, >> >> M; >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cuiw at ncbi.nlm.nih.gov Wed Apr 9 09:25:16 2008 From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C]) Date: Wed, 9 Apr 2008 09:25:16 -0400 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <47FCA957.5040409@uv.es> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <6F230E9769AA8D4EB4BC401DF133EDB7180BE0@NIHCESMLBX15.nih.gov> <47FCA957.5040409@uv.es> Message-ID: <6F230E9769AA8D4EB4BC401DF133EDB7180BE1@NIHCESMLBX15.nih.gov> Hi, Miguel, I do not know whether the data file is publically available. 
However, you can perform 'real time' query via id1_fetch: ####step 1: generate GI file ##### id1_fetch -query 'YOUR-GENBANK-QUERY-STRING' -lt none -db Nucleotide -out qfile ####step 2: retrieve revisions for GIs stored in qfile ##### id1_fetch -lt revisions -qf qfile -fmt fasta -db Nucleotide Good luck! Wenwu Cui > -----Original Message----- > From: Miguel Pignatelli [mailto:miguel.pignatelli at uv.es] > Sent: Wednesday, April 09, 2008 7:33 AM > To: Cui, Wenwu (NIH/NLM/NCBI) [C] > Cc: bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] GenBank entries creation dates > > Wow, impressive, thanks Wenwu for the information, I have never used > this tool before. The problem is that I need to know all the revision > history (or at least the creation date) for *all* the GIs present in nr > (well, or at least a significant portion of it) and this tool queries > via web. > > The existence of this tool confirms me that this information is > available somewhere, is it possible to download the data that contains > this information? > > Thanks again, > > M; > > > Cui, Wenwu (NIH/NLM/NCBI) [C] wrote: > > Hi, Miguel: > > > > id1_fetch can do it. Detailed instruction can be found at: > > > > > http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=toolkit.section.ch_demo.i > d > > 1_fetch.html > > > > Here is an example: > > > >> id1_fetch -lt revisions -flat '12:74311105' -fmt fasta > > GI Loaded DB Retrieval No. 
> > -- ------ -- ------------- > > 74311105 12/07/2007 NCBI 19766263 > > 74311105 01/23/2007 NCBI 16325656 > > 74311105 03/30/2006 NCBI 13131204 > > 74311105 03/03/2006 NCBI 12915541 > > 74311105 03/02/2006 NCBI 12885275 > > 74311105 12/03/2005 NCBI 12259793 > > 74311105 09/09/2005 NCBI 11257262 > > 74311105 09/09/2005 NCBI 11242667 > > > > Wenwu Cui PhD > > NCBI/NLM/NIH > > > >> -----Original Message----- > >> From: Miguel Pignatelli [mailto:miguel.pignatelli at uv.es] > >> Sent: Monday, April 07, 2008 6:13 AM > >> Cc: bioperl-l at bioperl.org > >> Subject: [Bioperl-l] GenBank entries creation dates > >> > >> Hi all, > >> > >> Is there any way to obtain the date of creation of individual > GenBank > >> entries? I don't mean the "last revision" date that can be found in > > the > >> first line of a GenBank file. > >> > >> I can access this creation date by looking at the "revision history" > > of > >> any GenBank entry (for example, see > >> > http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105), > >> but I need a systematic (and local=fast) way to access this > >> information. > >> > >> Any help would be very appreciated, > >> Thank you very much in advance, > >> > >> M; > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > From CALLEY_JOHN_N at LILLY.COM Wed Apr 9 09:45:23 2008 From: CALLEY_JOHN_N at LILLY.COM (John N Calley) Date: Wed, 9 Apr 2008 09:45:23 -0400 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <47FCA957.5040409@uv.es> Message-ID: You might want to keep in mind that the creation date is not always reliable. I am aware of one example where the recorded creation date precedes the sequencing date by several months (as determined by the trace file date). NCBI was not able to explain exactly what happened but (as I recall) hypothesized that some dates had been scrambled in a database rebuild. 
If there was interest I could probably pull up more details. John Calley Miguel Pignatelli Sent by: bioperl-l-bounces at lists.open-bio.org 04/09/2008 07:32 AM Please respond to miguel.pignatelli at uv.es To "Cui, Wenwu (NIH/NLM/NCBI) [C]" cc bioperl-l at bioperl.org Subject Re: [Bioperl-l] GenBank entries creation dates Wow, impressive, thanks Wenwu for the information, I have never used this tool before. The problem is that I need to know all the revision history (or at least the creation date) for *all* the GIs present in nr (well, or at least a significant portion of it) and this tool queries via web. The existence of this tool confirms me that this information is available somewhere, is it possible to download the data that contains this information? Thanks again, M; Cui, Wenwu (NIH/NLM/NCBI) [C] wrote: > Hi, Miguel: > > id1_fetch can do it. Detailed instruction can be found at: > > http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=toolkit.section.ch_demo.id > 1_fetch.html > > Here is an example: > >> id1_fetch -lt revisions -flat '12:74311105' -fmt fasta > GI Loaded DB Retrieval No. > -- ------ -- ------------- > 74311105 12/07/2007 NCBI 19766263 > 74311105 01/23/2007 NCBI 16325656 > 74311105 03/30/2006 NCBI 13131204 > 74311105 03/03/2006 NCBI 12915541 > 74311105 03/02/2006 NCBI 12885275 > 74311105 12/03/2005 NCBI 12259793 > 74311105 09/09/2005 NCBI 11257262 > 74311105 09/09/2005 NCBI 11242667 > > Wenwu Cui PhD > NCBI/NLM/NIH > >> -----Original Message----- >> From: Miguel Pignatelli [mailto:miguel.pignatelli at uv.es] >> Sent: Monday, April 07, 2008 6:13 AM >> Cc: bioperl-l at bioperl.org >> Subject: [Bioperl-l] GenBank entries creation dates >> >> Hi all, >> >> Is there any way to obtain the date of creation of individual GenBank >> entries? I don't mean the "last revision" date that can be found in > the >> first line of a GenBank file. 
>>
>> I can access this creation date by looking at the "revision history" of
>> any GenBank entry (for example, see
>> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105),
>> but I need a systematic (and local=fast) way to access this
>> information.
>>
>> Any help would be very appreciated,
>> Thank you very much in advance,
>>
>> M;
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From frederic.romagne at gmail.com Wed Apr 9 16:45:50 2008
From: frederic.romagne at gmail.com (Frédéric Romagné)
Date: Wed, 09 Apr 2008 15:45:50 -0500
Subject: [Bioperl-l] question about clustalw module.
Message-ID: <1207773950.483.13.camel@kiss-laptop>

Hello,

i have a problem when using Bio::Tools::Run::Alignment::Clustalw :

I give it an array_ref scalar (the array contains some fasta sequences)
and all the good parameters and i write the result via Bio::SeqIO.

The fact is that my result file only contains the Accession number in
the header... An example :

the initial stream is :

>NM_052854 Homo sapiens cAMP responsive element binding protein 3-like 1 (CREB3L1), mRNA.
AGAAGACGTGCGGAGGGAGACGCAGAGACAGAGGAGAGGCCGGCAGCCACCCAGTCTCGG
GGGAGCACTTAGCTCCCCCGCCCCGGCTCCCACCCTGTCCGGGGGGCTCCTGAAGCCCTC
AGCCCCAACCCCGGGCTCCCCATGGAAGCCAGCTGTGCCCCAGGAGGAGCAGGAGGAGGT
GGAGTCGGCTGAATGCCCACGGTGCGCCCGGGGCCCCTGAGCCCATCCCGCTCCTAGCCG
CTGCCCTAAGGCCCCCGCGCGCCCCGCGCCCCCCACCCGGGGCCGCGCCGCCTCCGTCCG
CCCCTCCCCCGGGGCTTCGCCCCGGACCTGCCCCCCGCCCGTTTGCCAGCGCTCAGGCAG
GAGCTCTGGACTGGGCGCGCCGCCGCCCTGGAGTGAGGGAAGCCCAGTGGAAGGGGGTCC
CGGGAGCCGGCTGCGATGGACGCCGTCTTGGAACCCTTCCCGGCCGACAGGCTGTTCCCC
GGATCCAGCTTCCTGGACTTGGGGGATCTGAACGAGTCGGACTTCCTCAACAATGCGCAC

...

the result file is :

>NM_052854
---------------------------------------AGAAGACGTGCGGAGGGAGAC
GCAGAGACAGAGGAGAGGCCGGCAGCCACCCAGTCTCGGGGGAGCACTTAGCTCCCCCGC
CCCGGCTCCCACCCTGTCCGGGGGGCTCCTGAAGCCCTCAGCCCCAACCCCGGGCTCCCC
ATGGAAGCCAGCTGTGCCCCAGGAGGAGCAGGAGGAGGTGGAGTCGGCTGAATGCCCACG
GTGCGCCCGGGGCCCCTGAGCCCATCCCGCTCCTAGCCGCTGCCCTAAGGCCCCCGCGCG
CCCCGCGCCCCCCACCCGGGGCCGCGCCGCCTCCGTCCGCCCCTCCCCCGGGGCTTCGCC
CCGGACCTGCCCCCCGCCCGTTTGCCAGCGCTCAGGCAGGAGCTCTGGACTGGGCGCGCC
GCCGCCCTGGAGTGAGGGAAGCCCAGTGGAAGGGGGTCCCGGGAGCCGGCTGCGATGGAC

...

So i lost the other informations provided by the header...

Is there any option to keep these informations?

Here is a part of my code with my options :

  my $seq_ref=\@seq;
  my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM', 'quiet' => 1,
                'output' => 'FASTA');
  my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
  my $aln = $factory->align($seq_ref);

Thank you.

From jason at bioperl.org Wed Apr 9 16:55:13 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 9 Apr 2008 13:55:13 -0700
Subject: [Bioperl-l] question about clustalw module.
In-Reply-To: <1207773950.483.13.camel@kiss-laptop>
References: <1207773950.483.13.camel@kiss-laptop>
Message-ID:

the clustal alignment format does not allow for the description - if
you want to preserve it you'll have to add it back, make a hash
indexed by sequence ID and store the description, then when you get
your alignment back you can update the description field before
writing it out with AlignIO.

-jason

On Apr 9, 2008, at 1:45 PM, Frédéric Romagné wrote:
> Hello,
>
> i have a problem when using Bio::Tools::Run::Alignment::Clustalw :
>
> I give it an array_ref scalar (the array contains some fasta
> sequences)
> and all the good parameters and i write the result via Bio::SeqIO.
>
> The fact is that my result file only contains the Accession number in
> the header... An example :
>
> the initial stream is :
>
>> NM_052854 Homo sapiens cAMP responsive element binding protein 3-
>> like 1
> (CREB3L1), mRNA.
> AGAAGACGTGCGGAGGGAGACGCAGAGACAGAGGAGAGGCCGGCAGCCACCCAGTCTCGG
> GGGAGCACTTAGCTCCCCCGCCCCGGCTCCCACCCTGTCCGGGGGGCTCCTGAAGCCCTC
> AGCCCCAACCCCGGGCTCCCCATGGAAGCCAGCTGTGCCCCAGGAGGAGCAGGAGGAGGT
> GGAGTCGGCTGAATGCCCACGGTGCGCCCGGGGCCCCTGAGCCCATCCCGCTCCTAGCCG
> CTGCCCTAAGGCCCCCGCGCGCCCCGCGCCCCCCACCCGGGGCCGCGCCGCCTCCGTCCG
> CCCCTCCCCCGGGGCTTCGCCCCGGACCTGCCCCCCGCCCGTTTGCCAGCGCTCAGGCAG
> GAGCTCTGGACTGGGCGCGCCGCCGCCCTGGAGTGAGGGAAGCCCAGTGGAAGGGGGTCC
> CGGGAGCCGGCTGCGATGGACGCCGTCTTGGAACCCTTCCCGGCCGACAGGCTGTTCCCC
> GGATCCAGCTTCCTGGACTTGGGGGATCTGAACGAGTCGGACTTCCTCAACAATGCGCAC
>
> ...
>
> the result file is :
>
>> NM_052854
> ---------------------------------------AGAAGACGTGCGGAGGGAGAC
> GCAGAGACAGAGGAGAGGCCGGCAGCCACCCAGTCTCGGGGGAGCACTTAGCTCCCCCGC
> CCCGGCTCCCACCCTGTCCGGGGGGCTCCTGAAGCCCTCAGCCCCAACCCCGGGCTCCCC
> ATGGAAGCCAGCTGTGCCCCAGGAGGAGCAGGAGGAGGTGGAGTCGGCTGAATGCCCACG
> GTGCGCCCGGGGCCCCTGAGCCCATCCCGCTCCTAGCCGCTGCCCTAAGGCCCCCGCGCG
> CCCCGCGCCCCCCACCCGGGGCCGCGCCGCCTCCGTCCGCCCCTCCCCCGGGGCTTCGCC
> CCGGACCTGCCCCCCGCCCGTTTGCCAGCGCTCAGGCAGGAGCTCTGGACTGGGCGCGCC
> GCCGCCCTGGAGTGAGGGAAGCCCAGTGGAAGGGGGTCCCGGGAGCCGGCTGCGATGGAC
>
> ...
>
> So i lost the other informations provided by the header...
>
> Is there any option to keep these informations?
>
> Here is a part of my code with my options :
>
>
> my $seq_ref=\@seq;
> my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM', 'quiet' => 1,
> 'output' => 'FASTA');
> my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
> my $aln = $factory->align($seq_ref);
>
>
> Thank you.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From lamq at usal.es Thu Apr 10 11:52:24 2008
From: lamq at usal.es (Luis A. M. Quintales)
Date: Thu, 10 Apr 2008 17:52:24 +0200
Subject: [Bioperl-l] xyplot glyph problem with previous aggregation
Message-ID: <47FE37B8.9090404@usal.es>

I am not able to add xyplot glyphs to one panel because I have some
problems with the aggregations.

Using that GFF file:

  ##sequence-region chr1 1 5578650
  chr1 atfreq atpc 1 50 58.8000 . . atpc 1
  chr1 atfreq atpc 51 100 58.4000 . . atpc 1
  chr1 atfreq atpc 101 150 57.6000 . . atpc 1
  chr1 atfreq atpc 151 200 57.8000 . . atpc 1
  . . .

And this source code for preparing the aggregated features necessary for
the xyplot glyph:

  my $filin = $ARGV[0];
  my $db = Bio::DB::GFF->new( -dsn        => $filin,
                              -adaptor    => 'memory',
                              -aggregator => 'at{atpc:atfreq}'
                            );
  my $segment = $db->segment('chr1');
  my @features1 = $db->features('atpc');
  print "$#features1 \n";
  my @features2 = $segment->features('atpc');
  print "$#features2 \n";
  my @features3 = $db->features('at');
  print "$#features3 \n";
  my @features4 = $segment->features('at');
  print "$#features4 \n";

I obtain:

  111572
  111572
  0
  0

What I am doing wrong with the aggregator?

Many thanks.

From lincoln.stein at gmail.com Thu Apr 10 13:55:06 2008
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 10 Apr 2008 13:55:06 -0400
Subject: [Bioperl-l] xyplot glyph problem with previous aggregation
In-Reply-To: <47FE37B8.9090404@usal.es>
References: <47FE37B8.9090404@usal.es>
Message-ID: <6dce9a0b0804101055w65e22abfgaa4f155751fef40f@mail.gmail.com>

Hi Luis,

When you aggregate the atpc 1 features together, you end up with one
feature. Thus @features3 is an array of size 1. The $# operator returns the
index of the last element, which is 0. If @features3 were empty, $#features3
would return -1.

Lincoln

On Thu, Apr 10, 2008 at 11:52 AM, Luis A. M. Quintales wrote:

> I am not able to add xyplot glyphs to one panel because I have some
> problems with the aggregations.
>
> Using that GFF file:
>
> ##sequence-region chr1 1 5578650
> chr1 atfreq atpc 1 50 58.8000 . . atpc 1
> chr1 atfreq atpc 51 100 58.4000 . . atpc 1
> chr1 atfreq atpc 101 150 57.6000 . . atpc 1
> chr1 atfreq atpc 151 200 57.8000 . . atpc 1
> . . .
> > > And this source code for preparing the aggregated features necessary for > the xyplot glyph: > > my $filin = $ARGV[0]; > my $db = Bio::DB::GFF->new( -dsn => $filin, > -adaptor => 'memory', > -aggregator => 'at{atpc:atfreq}' > ); > my $segment = $db->segment('chr1'); > my @features1 = $db->features('atpc'); > print "$#features1 \n"; > my @features2 = $segment->features('atpc'); > print "$#features2 \n"; > my @features3 = $db->features('at'); > print "$#features3 \n"; > my @features4 = $segment->features('at'); > print "$#features4 \n"; > > I obtain: > > 111572 > 111572 > 0 > 0 > > What I am doing wrong with the aggregator? > > Many thanks. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From adsj at novozymes.com Fri Apr 11 04:53:23 2008 From: adsj at novozymes.com (Adam =?iso-8859-1?Q?Sj=F8gren?=) Date: Fri, 11 Apr 2008 10:53:23 +0200 Subject: [Bioperl-l] Bio::Factory::SequenceFactoryI SYNOPSIS example Message-ID: <87d4owixh8.fsf@topper.koldfront.dk> Hi. I am trying to make Bio::SeqIO return objects of my own type (a small extension of Bio::Seq::RichSeq), by setting -seqfactory. 
I am having a little trouble creating the correct object to pass with -seqfactory: Following the example given in SYNOPSIS of Bio::Factory::SequenceFactoryI, I get this error: $ perl -e ' > use Bio::Seq::SeqFactory; > my $seqbuilder = Bio::Seq::SeqFactory->new('type' => 'Bio::PrimarySeq'); > > my $seq = $seqbuilder->create(-seq => 'ACTGAT', > -display_id => 'exampleseq'); > > print "seq is a ", ref($seq), "\n"; > ' ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Can't locate type.pm in @INC (@INC contains: /z/bio/biotools/bioinfperlmodules/ /z/bio/adm/modules /etc/perl /usr/local/lib/perl/5.8.8 /usr/local/share/perl/5.8.8 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.8 /usr/share/perl/5.8 /usr/local/lib/site_perl .) at (eval 13) line 3. : Unrecognized Sequence type for SeqFactory 'type' STACK: Error::throw STACK: Bio::Root::Root::throw /usr/share/perl/5.8/Bio/Root/Root.pm:357 STACK: Bio::Seq::SeqFactory::type /usr/share/perl/5.8/Bio/Seq/SeqFactory.pm:134 STACK: Bio::Seq::SeqFactory::new /usr/share/perl/5.8/Bio/Seq/SeqFactory.pm:93 STACK: -e:3 ----------------------------------------------------------- $ If I go "Bio::Seq::SeqFactory('Bio::PrimarySeq'=>1)" instead, for instance, it seems to work: $ perl -e ' > use Bio::Seq::SeqFactory; > my $seqbuilder = Bio::Seq::SeqFactory->new('Bio::PrimarySeq'=>1); > > my $seq = $seqbuilder->create(-seq => 'ACTGAT', > -display_id => 'exampleseq'); > > print "seq is a ", ref($seq), "\n"; > ' seq is a Bio::PrimarySeq $ I was about to write a patch for the pod, when I realized that I'd better start by asking: Is this a buglet in the pod or the code? 
Best regards,

  Adam

--
Adam Sjøgren
adsj at novozymes.com

From hlapp at gmx.net Fri Apr 11 11:35:54 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 11 Apr 2008 11:35:54 -0400
Subject: [Bioperl-l] Bio::Factory::SequenceFactoryI SYNOPSIS example
In-Reply-To: <87d4owixh8.fsf@topper.koldfront.dk>
References: <87d4owixh8.fsf@topper.koldfront.dk>
Message-ID: <0037240B-F469-4388-972A-324101B11621@gmx.net>

On Apr 11, 2008, at 4:53 AM, Adam Sjøgren wrote:

> $ perl -e '
>> use Bio::Seq::SeqFactory;
>> my $seqbuilder = Bio::Seq::SeqFactory->new('type' =>
>> 'Bio::PrimarySeq');

You need to prefix the argument with a dash: '-type', not 'type'. Otherwise, it assumes that the class you want instantiated is 'type.pm'.

  -hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From 1zoujing at 163.com Thu Apr 10 01:08:52 2008
From: 1zoujing at 163.com (zoujing)
Date: Wed, 9 Apr 2008 22:08:52 -0700 (PDT)
Subject: [Bioperl-l] Bio::ASN1::EntrezGene parse so slowly?
Message-ID: <16602210.post@talk.nabble.com>

I want to parse the file "gene_info" from NCBI. The format of Gene data at NCBI is ASN.1, right? So I used Bio::ASN1::EntrezGene, but it either did not work properly or was too slow. The file is about 500 MB.
The code is as follows:

use Bio::ASN1::EntrezGene;
my $parser = Bio::ASN1::EntrezGene->new('file' => $ARGV[0]);
my $i = 0;
while (my $result = $parser->next_seq)
{
    last;   # something to do here; "last" is only used for this test
}

When it reaches the "while" part, it keeps processing on and on and never exits, even though I used "last" inside the loop.
So I wonder whether it is too slow, whether the module is not fit for this job, or whether I did something wrong?

Thank you!
--
View this message in context: http://www.nabble.com/Bio%3A%3AASN1%3A%3AEntrezGene-parse-so-slowly--tp16602210p16602210.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
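
A note on the file in question: as the replies in this thread point out, gene_info is a plain tab-delimited table (one line per GeneID, column header on the first line), not ASN.1, so it can be read without Bio::ASN1::EntrezGene. The following is only a minimal sketch in core Perl. The column order (tax_id, GeneID, Symbol first), the '#' header prefix and the two sample rows are assumptions made for illustration; check them against the header line of the file you actually downloaded.

```perl
use strict;
use warnings;

# Parse gene_info-style text: tab-delimited, one gene per line, header first.
# ASSUMPTION: the first three columns are tax_id, GeneID and Symbol, and the
# header line starts with '#'. Verify both against the real file.
sub parse_gene_info {
    my ($fh) = @_;
    my @genes;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ || $line !~ /\S/;   # skip header and blank lines
        my ($tax_id, $gene_id, $symbol) = split /\t/, $line;
        push @genes, { tax_id => $tax_id, gene_id => $gene_id, symbol => $symbol };
    }
    return \@genes;
}

# Made-up sample rows standing in for the real file (hypothetical values).
my $sample = join "\n",
    "#tax_id\tGeneID\tSymbol\tLocusTag",
    "9823\t100000001\tGENE_A\t-",
    "9823\t100000002\tGENE_B\t-",
    "";

# Read from an in-memory filehandle so the sketch is self-contained.
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
my $genes = parse_gene_info($fh);
close $fh;

printf "%s\t%s\n", $_->{gene_id}, $_->{symbol} for @$genes;
```

For the real data, the same function should work on a filehandle opened on the uncompressed gene_info file; because it reads line by line, memory use stays flat regardless of file size.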
From 1zoujing at 163.com Thu Apr 10 02:17:41 2008
From: 1zoujing at 163.com (zoujing)
Date: Wed, 9 Apr 2008 23:17:41 -0700 (PDT)
Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags"
Message-ID: <16602770.post@talk.nabble.com>

I am a green hand in Bioperl. When I run "parse_entrez_gene_example.pl Sus_scrofa.ags", I get this error message:
Data Error: none conforming data found on line 1 in Sus_scrofa.ags.

But Sus_scrofa.ags was downloaded from NCBI in ASN.1 format, so it should be the same as the Homo_sapiens file in the example, and there should be no error, since the code is Mingyi's example.
I wonder why this happens. Should I change something about the file?

--
View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16602770.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From 1zoujing at 163.com Thu Apr 10 02:56:52 2008
From: 1zoujing at 163.com (zoujing)
Date: Wed, 9 Apr 2008 23:56:52 -0700 (PDT)
Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags"
In-Reply-To: <16602770.post@talk.nabble.com>
References: <16602770.post@talk.nabble.com>
Message-ID: <16603225.post@talk.nabble.com>

I searched the web and found the answer. Quoting it here:

The error was thrown by my Bio::ASN1::EntrezGene module because it expects a text file, while you fed it with a binary file. To use gzipped ASN binary file from NCBI, download the NCBI gene2xml (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml), then use this syntax to run my parser on the binary files:

my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i Homo_sapiens.ags.gz -c -x -b | "); # Homo_sapiens.ags.gz is the gzipped binary file directly downloaded from NCBI

Same syntax should be used when you're using SeqIO (thus SeqIO::entrezgene).
Mingyi

zoujing wrote:
>
> I am a green hand in Bioperl. When I run perl with
> "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error
> information:
> Data Error: none conforming data found on line 1 in Sus_scrofa.ags.
>
> But the Sus_scrofa.ags is download from NCBI, with the format of ASN1,
> should be the same as Homo_sapiens in the example. So it should be no
> error as the code is the example from Mingyi.
> I wonder why this happen, and should I change something about the file?
>
>

--
View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16603225.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From 1zoujing at 163.com Thu Apr 10 03:10:26 2008
From: 1zoujing at 163.com (zoujing)
Date: Thu, 10 Apr 2008 00:10:26 -0700 (PDT)
Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags"
Message-ID: <16603225.post@talk.nabble.com>

I searched the web and found the answer. Quoting it here:

The error was thrown by my Bio::ASN1::EntrezGene module because it expects a text file, while you fed it with a binary file. To use gzipped ASN binary file from NCBI, download the NCBI gene2xml (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml), then use this syntax to run my parser on the binary files:

my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i Homo_sapiens.ags.gz -c -x -b | "); # Homo_sapiens.ags.gz is the gzipped binary file directly downloaded from NCBI

Same syntax should be used when you're using SeqIO (thus SeqIO::entrezgene).
Mingyi

But there is still one thing: I want to parse "gene_info.gz" from NCBI Gene (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz). It doesn't work. Does that mean "gene_info.gz" (tab-delimited, one line per GeneID, with the column header line as the first line in the file) is not the right format for Bio::ASN1::EntrezGene?

zoujing wrote:
>
> I am a green hand in Bioperl. When I run perl with
> "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error
> information:
> Data Error: none conforming data found on line 1 in Sus_scrofa.ags.
>
> But the Sus_scrofa.ags is download from NCBI, with the format of ASN1,
> should be the same as Homo_sapiens in the example. So it should be no
> error as the code is the example from Mingyi.
> I wonder why this happen, and should I change something about the file?
>
>

--
View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16603225.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From stefan.kirov at bms.com Fri Apr 11 15:59:29 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Fri, 11 Apr 2008 15:59:29 -0400 (Eastern Daylight Time)
Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags"
In-Reply-To: <16602770.post@talk.nabble.com>
References: <16602770.post@talk.nabble.com>
Message-ID:

AGS is a binary ASN.1 format and WILL NOT be parsed! You have to use gene2xml (weird, but this is NCBI) with these flags: -c -x -b -i. This will spit out text ASN which can be parsed.
Stefan

On Wed, 9 Apr 2008, zoujing wrote:

>
> I am a green hand in Bioperl. When I run perl with
> "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error
> information:
> Data Error: none conforming data found on line 1 in Sus_scrofa.ags.
>
> But the Sus_scrofa.ags is download from NCBI, with the format of ASN1,
> should be the same as Homo_sapiens in the example. So it should be no error
> as the code is the example from Mingyi.
> I wonder why this happen, and should I change something about the file?
>
> --
> View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16602770.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From stefan.kirov at bms.com Fri Apr 11 16:01:30 2008 From: stefan.kirov at bms.com (Stefan Kirov) Date: Fri, 11 Apr 2008 16:01:30 -0400 (Eastern Daylight Time) Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags" In-Reply-To: <16603225.post@talk.nabble.com> References: <16603225.post@talk.nabble.com> Message-ID: It is not. If you use this file, why would you need a parser for it anyway? Just split on \t or read with OpenOffice or equiv. Stefan On Thu, 10 Apr 2008, zoujing wrote: > > Seached the web and found the answer now, quote the answer as following: > The error was thrown by my Bio::ASN1::EntrezGene module because it > expects a text file, while you fed it with a binary file. To use > gzipped ASN binary file from NCBI, download the NCBI gene2xml > (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml), > then use this syntax to run my parser on the binary files: > > my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i > Homo_sapiens.ags.gz -c -x -b | "); # Homo_sapiens.ags.gz is the gzipped > binary file directly downloaded from NCBI > > Same syntax should be used when you're using SeqIO (thus SeqIO::entrezgene). > Mingyi > > But there still one thing, I want to parse "gene_info.gz" in Gene of > NCBI. It doesn't work.Is that means "gene_info.gz"( tab-delimited,one line > per GeneID, Column header line is the first line in the file > ) is not the right format for Bio::ASN1::EntrezGene? > > > > zoujing wrote: >> >> I am a geen hand in Bioperl. When I run perl with >> "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error >> information: >> Data Error: none conforming data found on line 1 in Sus_scrofa.ags. 
>> >> But the Sus_scrofa.ags is download from NCBI, with the format of ASN1, >> should be the same as Homo_sapiens in the example. So it should be no >> error as the code is the example from Mingyi. >> I wonder why this happen, and should I change something about the file? >> >> > > -- > View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16603225.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From asjo at koldfront.dk Fri Apr 11 15:39:59 2008 From: asjo at koldfront.dk (Adam =?iso-8859-1?Q?Sj=F8gren?=) Date: Fri, 11 Apr 2008 21:39:59 +0200 Subject: [Bioperl-l] Bio::Factory::SequenceFactoryI SYNOPSIS example In-Reply-To: <0037240B-F469-4388-972A-324101B11621@gmx.net> (Hilmar Lapp's message of "Fri, 11 Apr 2008 11:35:54 -0400") References: <87d4owixh8.fsf@topper.koldfront.dk> <0037240B-F469-4388-972A-324101B11621@gmx.net> Message-ID: <877if4i3jk.fsf@topper.koldfront.dk> On Fri, 11 Apr 2008 11:35:54 -0400, Hilmar wrote: > On Apr 11, 2008, at 4:53 AM, Adam Sj?gren wrote: >>> my $seqbuilder = Bio::Seq::SeqFactory->new('type' => >>> 'Bio::PrimarySeq'); > You need to prefix the argument with a dash: '-type', not 'type'. > Otherwise, it assumes that the class you want instantiated is > 'type.pm'. I guess that means I should submit a patch for the SYNOPSIS. Attached. 
Thanks,

  Adam

Index: Bio/Factory/SequenceFactoryI.pm
===================================================================
--- Bio/Factory/SequenceFactoryI.pm (revision 14654)
+++ Bio/Factory/SequenceFactoryI.pm (working copy)
@@ -20,7 +20,7 @@
     # get a Bio::Factory::SequenceFactoryI object like
 
     use Bio::Seq::SeqFactory;
-    my $seqbuilder = Bio::Seq::SeqFactory->new('type' => 'Bio::PrimarySeq');
+    my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');
 
     my $seq = $seqbuilder->create(-seq => 'ACTGAT',
                                   -display_id => 'exampleseq');

--
 "Well, I'm a moon around you"                                Adam Sjøgren
                                                         asjo at koldfront.dk

From bamboowarrior at gmail.com Fri Apr 11 19:10:35 2008
From: bamboowarrior at gmail.com (Arkady)
Date: Fri, 11 Apr 2008 18:10:35 -0500
Subject: [Bioperl-l] Nucleotide Links in Gene DB (GenBank)
Message-ID: <91656c3f0804111610r24c8fa5es5bcb56b7a59e0208@mail.gmail.com>

Hi everyone,

I'm a bioperl n00b. Actually, kind of a genbank n00b, too, as I'm from a CS background and just started bio things last June.

I'm trying to set up an analysis pipeline of primate protein CDSs (the nucleotide seqs). I've written a script which does a pretty decent job of downloading these from GenBank--but it's inconsistent, because a lot of sequences in nucleotide are 'predicted' and named LOCthisorthat instead of by gene name.

So what I was thinking was this (assume ANKRD43 is the gene for this example):

1. Search 'gene' database for ANKRD43 AND (PRI*[ORGN])
   On NCBI, there's an option to show all nucleotide links. How do I get a list of those in bioperl? Can bioperl even search 'gene', or just 'nucleotide'?
2. Search 'nucleotide' for the referenced items from #1, and also for ANKRD43[TITL] AND (PRI*[ORGN]), save CDSes.
3. BLAST mRNA for one of those CDSes, see if we pick up any other matches.
4. BLAT other primates for CDSes, see if we find anything not in GenBank.

On the other hand, I always get the feeling I'm doing things the hard way--especially here, with #1 and #2.
Is there a much more obvious, simple way to do this? Thanks, folks. Cheers, John Woods Institute for Cellular and Molecular Biology The University of Texas at Austin From hlapp at gmx.net Fri Apr 11 19:19:44 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 11 Apr 2008 19:19:44 -0400 Subject: [Bioperl-l] Bio::Factory::SequenceFactoryI SYNOPSIS example In-Reply-To: <877if4i3jk.fsf@topper.koldfront.dk> References: <87d4owixh8.fsf@topper.koldfront.dk> <0037240B-F469-4388-972A-324101B11621@gmx.net> <877if4i3jk.fsf@topper.koldfront.dk> Message-ID: Thanks, applied. -hilmar On Apr 11, 2008, at 3:39 PM, Adam Sj?gren wrote: > On Fri, 11 Apr 2008 11:35:54 -0400, Hilmar wrote: > >> On Apr 11, 2008, at 4:53 AM, Adam Sj?gren wrote: > >>>> my $seqbuilder = Bio::Seq::SeqFactory->new('type' => >>>> 'Bio::PrimarySeq'); > >> You need to prefix the argument with a dash: '-type', not 'type'. >> Otherwise, it assumes that the class you want instantiated is >> 'type.pm'. > > I guess that means I should submit a patch for the SYNOPSIS. Attached. 
> > > Thanks, > > Adam > > > Index: Bio/Factory/SequenceFactoryI.pm > =================================================================== > --- Bio/Factory/SequenceFactoryI.pm (revision 14654) > +++ Bio/Factory/SequenceFactoryI.pm (working copy) > @@ -20,7 +20,7 @@ > # get a Bio::Factory::SequenceFactoryI object like > > use Bio::Seq::SeqFactory; > - my $seqbuilder = Bio::Seq::SeqFactory->new('type' => > 'Bio::PrimarySeq'); > + my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => > 'Bio::PrimarySeq'); > > my $seq = $seqbuilder->create(-seq => 'ACTGAT', > -display_id => 'exampleseq'); > > -- > "Well, I'm a moon around you" Adam > Sj?gren > > asjo at koldfront.dk > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From mmokrejs at ribosome.natur.cuni.cz Fri Apr 11 21:32:14 2008 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Sat, 12 Apr 2008 03:32:14 +0200 Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon_id In-Reply-To: References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com> <32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net> <320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com> <93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com> Message-ID: <4800111E.3030802@ribosome.natur.cuni.cz> Chris Fields wrote: > The counter to that perspective (using new sequences with old tax info) > would be to regularly update NCBI taxonomy, particularly in > circumstances prior to adding new sequences. Hilmar mentioned that once > tax is loaded it doesn't take as long to update, so you could set up a > cron job to update regularly. 
> > I remember someone mentioning weekly or monthly updates on the list > quite a while ago, but I'm unsure how often NCBI updates tax information > (i.e. with every release, monthly, weekly, etc). I can see instances > popping up where you used the an up-to-date taxonomy but a new sequence > contains a tax ID not present. I think bioperl-db handles these but I'm > not sure what other Bio* do. > I spent some time benchmarking this and inspecting the mysql log files. The current load_ncbi_taxonomy.pl script with minor modification to show timestamps does this on initial import into mysql and then update of the database using exactly same dataset (but anyway it has to walk through all the data): $ ./load_ncbi_taxonomy.pl --dbname=biosqldb --driver=mysql --host=127.0.01 \ --port=3306 --directory=/home/mmokrejs/bioinformatics/databases/ncbitax/dump \ --chunksize=0 --verbose=2 --mycnf=~/.my.cnf Sat Apr 12 01:58:43 MEST 2008 Loading NCBI taxon database in /home/mmokrejs/bioinformatics/databases/ncbitax/dump: ... retrieving all taxon nodes in the database Sat Apr 12 01:58:43 MEST 2008 ... reading in taxon nodes from nodes.dmp Sat Apr 12 01:58:58 MEST 2008 ... insert / update / delete taxon nodes 10000/421098 done (in 5 secs, 2000.0 rows/s) 20000/421098 done (in 4 secs, 2500.0 rows/s) ... 420000/421098 done (in 4 secs, 2500.0 rows/s) Sat Apr 12 02:02:21 MEST 2008 ... (committing nodes) Sat Apr 12 02:02:21 MEST 2008 ... rebuilding nested set left/right values 10000 done (in 24 secs, 416.7 rows/s) 20000 done (in 26 secs, 384.6 rows/s) 30000 done (in 24 secs, 416.7 rows/s) ... 420004 done (in 23 secs, 434.8 rows/s) Sat Apr 12 02:19:25 MEST 2008 ... reading in taxon names from names.dmp Sat Apr 12 02:19:25 MEST 2008 ... deleting old taxon names Sat Apr 12 02:19:25 MEST 2008 ... inserting new taxon names 10000 done (in 8 secs, 1250.0 rows/s) 20000 done (in 8 secs, 1250.0 rows/s) ... 580000 done (in 5 secs, 2000.0 rows/s) Sat Apr 12 02:24:48 MEST 2008 ... 
cleaning up Sat Apr 12 02:24:49 MEST 2008 Done. $ I decided to re-import the same data to mimic at least somehow the future updates, although no record should be UPDATEd, except zapping left and right values with NULL. :(( $ ./load_ncbi_taxonomy.pl --dbname=biosqldb --driver=mysql --host=127.0.01 --port=3306 --directory=/home/mmokrejs/bioinformatics/databases/ncbitax/dump \ --chunksize=0 --verbose=2 --mycnf=~/.my.cnf Sat Apr 12 02:35:20 MEST 2008 Loading NCBI taxon database in /home/mmokrejs/bioinformatics/databases/ncbitax/dump: ... retrieving all taxon nodes in the database Sat Apr 12 02:35:26 MEST 2008 ... reading in taxon nodes from nodes.dmp Sat Apr 12 02:35:46 MEST 2008 ... insert / update / delete taxon nodes 10000/421098 done (in 0 secs, 10000.0 rows/s) 20000/421098 done (in 0 secs, 10000.0 rows/s) ... 410000/421098 done (in 0 secs, 10000.0 rows/s) 420000/421098 done (in 0 secs, 10000.0 rows/s) Sat Apr 12 02:35:55 MEST 2008 ... (committing nodes) Sat Apr 12 02:35:55 MEST 2008 ... rebuilding nested set left/right values 10000 done (in 9 secs, 1111.1 rows/s) 20000 done (in 9 secs, 1111.1 rows/s) ... 410004 done (in 8 secs, 1250.0 rows/s) 420004 done (in 9 secs, 1111.1 rows/s) Sat Apr 12 02:41:54 MEST 2008 ... reading in taxon names from names.dmp Sat Apr 12 02:41:54 MEST 2008 ... deleting old taxon names Sat Apr 12 02:41:55 MEST 2008 ... inserting new taxon names 10000 done (in 5 secs, 2000.0 rows/s) 20000 done (in 5 secs, 2000.0 rows/s) ... 570000 done (in 6 secs, 1666.7 rows/s) 580000 done (in 5 secs, 2000.0 rows/s) Sat Apr 12 02:47:27 MEST 2008 ... cleaning up Sat Apr 12 02:47:27 MEST 2008 Done. $ ls -la /var/log/mysql/mysql.log -rw-rw---- 1 mysql mysql 483443314 Apr 12 03:15 /var/log/mysql/mysql.log $ Pentium4 M laptop, 1.8GHz, 1 GB RAM, mysql-5.0.56 with enabled SQL text logging, the slow version of logging all SQL commands compared to binary logging. The log was cleared before the tests. 
I could provide some bits from the log or upload it somewhere if anybody else would like to dig into the details. I believe the recalculation step could be made faster. See what happens:

31 Query SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '1' ORDER BY ncbi_taxon_id
31 Query SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '10239' ORDER BY ncbi_taxon_id
31 Query SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '12333' ORDER BY ncbi_taxon_id
31 Query SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '12335' ORDER BY ncbi_taxon_id
31 Query UPDATE taxon SET left_value = NULL, right_value = NULL WHERE left_value = '4'
31 Query UPDATE taxon SET left_value = NULL, right_value = NULL WHERE right_value = '5'
31 Query UPDATE taxon SET left_value = '4', right_value = '5' WHERE taxon_id = '12335'
31 Query SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '12340' ORDER BY ncbi_taxon_id
31 Query UPDATE taxon SET left_value = NULL, right_value = NULL WHERE left_value = '6'
31 Query UPDATE taxon SET left_value = NULL, right_value = NULL WHERE right_value = '7'
31 Query UPDATE taxon SET left_value = '6', right_value = '7' WHERE taxon_id = '12340'

The columns left_value and right_value are already NULL when the table is created, so there is no need to write NULL into them again. Avoiding that would mean writing a wrapper function which mimics update() but first does a 'SELECT * FROM', compares the stored values with those to be written, and includes in the final UPDATE statement only those columns whose values have changed. We use such a smart wrapper for our code in python. ;-)

When the left and right columns are to be set to NULL during an update of an existing database, I think it would be much faster to drop the columns and re-create them with NULL values.
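
The change-detecting wrapper described above could be sketched roughly as follows. This is not code from load_ncbi_taxonomy.pl or bioperl-db. The function name build_minimal_update and its argument layout are hypothetical, and the sketch assumes the current row has already been fetched with a SELECT into a hash reference, with undef standing for SQL NULL. It only covers updates addressed by a key column, not the "WHERE left_value = ..." resets seen in the log.

```perl
use strict;
use warnings;

# Build an UPDATE that touches only the columns whose new value differs from
# the row currently stored. Returns an empty list if nothing changed.
# ASSUMPTION: $current is the row as fetched by a prior SELECT (hash ref),
# $new holds the values to be written; undef stands for SQL NULL.
sub build_minimal_update {
    my ($table, $key_col, $key_val, $current, $new) = @_;
    my (@set, @bind);
    for my $col (sort keys %$new) {
        my ($old, $val) = ($current->{$col}, $new->{$col});
        next if !defined $old && !defined $val;                # NULL stays NULL
        next if defined $old && defined $val && $old eq $val;  # value unchanged
        push @set,  "$col = ?";
        push @bind, $val;
    }
    return unless @set;
    return ("UPDATE $table SET " . join(', ', @set) . " WHERE $key_col = ?",
            @bind, $key_val);
}

# A freshly created taxon row: left_value and right_value are already NULL.
my %row = (taxon_id => 12335, left_value => undef, right_value => undef);

# Writing NULL over NULL changes nothing, so no statement is produced.
my @noop = build_minimal_update('taxon', 'taxon_id', 12335, \%row,
                                { left_value => undef, right_value => undef });

# Assigning real nested-set values yields a single UPDATE with placeholders.
my ($sql, @bind) = build_minimal_update('taxon', 'taxon_id', 12335, \%row,
                                        { left_value => 4, right_value => 5 });
print "$sql\n";
```

With DBI, the returned statement and bind values could then be passed on as $dbh->do($sql, undef, @bind), skipping the call entirely when the list comes back empty. Whether the extra SELECT per row pays off depends on how many rows actually change between taxonomy releases.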
I think it could be investigated more the possibility to create empty taxon and taxon_name tables as MyISAM tables and only after all the import and updates they could be converted into InnoDB tables. One would have to probably think a bit more of the foreign keys but it might be they would not even be lost during the conversion back and forth. Actually, easy to check. Dump your current taxon and taxon_name tables (maybe even without sql data using --without-data), run 'ALTER TABLE taxon ... type=MyISAM' followed by 'ALTER TABLE taxon ... type=InnoDB' dump again the database structure and compare by diff with the original. But, time for sleep here. Martin From sdavis2 at mail.nih.gov Fri Apr 11 23:50:44 2008 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Fri, 11 Apr 2008 23:50:44 -0400 Subject: [Bioperl-l] Bio::ASN1::EntrezGene parse so slowly? In-Reply-To: <16602210.post@talk.nabble.com> References: <16602210.post@talk.nabble.com> Message-ID: <264855a00804112050gf785c2ei66d9c7463597eccd@mail.gmail.com> gene_info is a tab-delimited text file, if I recall correctly. Have you looked at it? If it is, you should be able to parse it in a few seconds with just a couple lines of code. Sean On Thu, Apr 10, 2008 at 1:08 AM, zoujing <1zoujing at 163.com> wrote: > > I want to parse a file "gene_info" from NCBI. The format of Gene in NCBI is > ASN1, right? So I used Bio::ASN1::EntrezGene. But it didn't work > properly/too slow. The file is about 500M. > The code is following: > use Bio::ASN1::EntrezGene; > my $parser = Bio::ASN1::EntrezGene->new('file' => $ARGV[0]); > my $i = 0; > while(my $result = $parser->next_seq) > { last; #something to do there, here use last for test} > > When it goes to the "while" part, it is processing on and on, it does not > went out, even I used "last" in the "while" part. > So I wonder whether it is too slow or the module is not fit for this job, > or I did something wrong? > > Thank you! 
> --
> View this message in context: http://www.nabble.com/Bio%3A%3AASN1%3A%3AEntrezGene-parse-so-slowly--tp16602210p16602210.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From david at burt7259.freeserve.co.uk Sat Apr 12 13:01:57 2008
From: david at burt7259.freeserve.co.uk (David Burt)
Date: Sat, 12 Apr 2008 18:01:57 +0100
Subject: [Bioperl-l] bioperl-db
Message-ID:

Hi Hilmar,

Hope you can help. I am using bioperl-db to create a biosql database.

I have used the scripts load_seqdatabase.pl and load_ontology.pl to install human swissprot entries, gene ontology and sequence ontology, and now want to load interpro.

Here's the command line I have tried:

perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser root --dbpass chicken --driver mysql \
--namespace "InterPro" --format InterPro interpro.xml

But I get this message:

Can't call method "identifier" on an undefined value at /cygdrive/c/Bioinformatics/Ensembl/src/bioperl-live/Bio/Ontology/SimpleOntologyEngine.pm line 395

Any ideas?

Dave

PS: here's the top of the interpro.xml file

Kringle

From hlapp at gmx.net Sat Apr 12 14:10:44 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 12 Apr 2008 14:10:44 -0400
Subject: [Bioperl-l] personal vs list email
Message-ID:

I'm not sure why, but I have received several Bioperl or BioSQL-related email inquiries directed to me *personally* over the past few weeks.

I have been responding as I get to them, but I feel that I am doing both the senders and this community a poor service, because sometimes someone else on the list could have responded much faster, and when I respond, others on the list who happen to be interested in the same question don't get to see the answer.
So from now on, as a policy, I will redirect *every* email that is sent to me personally and asks a question related to one of the projects to the respective mailing list. If you don't want this, please say so conspicuously at the top of your email, and in that case, if you do ask a project-related question, be prepared to wait and possibly to have to follow up.

As an aside, it's a pretty safe assumption that all other core developers, and quite possibly *all* developers, are following a similar policy, whether expressly or not.

Isn't this somewhere in the FAQ too?

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Sat Apr 12 14:16:13 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 12 Apr 2008 14:16:13 -0400
Subject: [Bioperl-l] bioperl-db
In-Reply-To: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC>
References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC>
Message-ID:

Hi Burt,

can you try the format interprosax instead of interpro? That variant is also much more graceful regarding required space.

-hilmar

On Apr 12, 2008, at 1:01 PM, David Burt wrote:
> Hi Hilmar,
>
> Hope you can help. I am using bioperl-db to create a biosql database
>
> I have used scripts load_seqdatabase.pl and load_ontology.pl to
> install human swissprot entries, gene ontology, sequence ontology
> and now want to load interpro
>
> Here's the command line I have tried
>
> perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser
> root --dbpass chicken --driver mysql \
> --namespace "InterPro" --format InterPro interpro.xml
>
> But I get this message
>
> Can't call method "identifier" on an undefined value at /cygdrive/
> c/Bioinformatics/Ensembl/src/bioperl-live/Bio/Ontology/
> SimpleOntologyEngine.pm line 395
>
> Any ideas?
> > Dave > > PS: here?s the top of the interpro.xml file > > > > > > > > > file_date="04-OCT-2006 00:00:00" /> > file_date="22-NOV-2006 00:00:00" /> > file_date="12-JUN-2007 00:00:00" /> > file_date="22-SEP-2005 00:00:00" /> > file_date="23-APR-2004 00:00:00" /> > file_date="14-NOV-2006 00:00:00" /> > file_date="27-JUL-2007 00:00:00" /> > file_date="28-SEP-2007 00:00:00" /> > file_date="11-SEP-2006 00:00:00" /> > file_date="30-NOV-2006 00:00:00" /> > entry_count="359942" file_date="18-MAR-2008 00:00:00" /> > file_date="18-MAR-2008 00:00:00" /> > file_date="19-MAR-2008 00:00:00" /> > file_date="27-MAR-2007 00:00:00" /> > file_date="12-JUL-2007 16:56:17" /> > > protein_count="352"> > Kringle > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Sat Apr 12 16:17:43 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 12 Apr 2008 15:17:43 -0500 Subject: [Bioperl-l] [BioSQL-l] personal vs list email In-Reply-To: References: Message-ID: On Apr 12, 2008, at 1:10 PM, Hilmar Lapp wrote: > I'm not sure why but I have received several Bioperl or BioSQL- > related email inquiries directed to me *personally* over the past > few weeks. > > I have been responding as I get to them, but I feel that I am doing > both the senders and this community a poor service, because > sometimes someone else on the list could have responded much faster, > and when I respond, others on the list who happen to be interested > in the same question don't get to see the answer. > > So from now on as a policy I will redirect *every* email sent to me > personally and that asks a question related to one of the projects > to the respective mailing list. 
If you don't want this, please > conspicuously say so at the top of your email, and in that case if > you do ask a project-related question be prepared to wait and to > possibly needing to follow up. > > As an aside, it's a pretty safe assumption to make that all other > core developers, and quite possibly *all* developers are following a > similar policy, whether expressly or not. I agree; I'm sure several other core devs feel the same way. I always try to forward these to the list if I feel it is more relevant there. > Isn't this somewhere in the FAQ too? > > -hilmar No, but I've added it to the bioperl FAQ; might be worth checking over and editing. chris From hlapp at gmx.net Sat Apr 12 18:40:53 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 12 Apr 2008 18:40:53 -0400 Subject: [Bioperl-l] bioperl-db In-Reply-To: <000001c89ce2$5400a710$0202a8c0@STUDYPC> References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC> <000001c89ce2$5400a710$0202a8c0@STUDYPC> Message-ID: <3F77F49A-9C9E-4450-AE28-46F00CADBC8B@gmx.net> Burt - please keep your replies on the list. Others may have input too, or benefit from the answer too. As there is no name() method call on line 914 in the current version let's check first that you run a current version of BioPerl. It will need to be at least 1.5.2. However, I do suspect a problem in either the InterPro file itself (wouldn't be the first time), or the InterPro parser. -hilmar On Apr 12, 2008, at 5:15 PM, David Burt wrote: > Hilmar > > Many thanks seems to be working > > But got this output ? any comments/ideas what it means ? > > Dave > > > perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser > root --dbpass chicken --driver mysql \ > > --namespace "InterPro" --format interprosax interpro.xml > ...deleting all relationships for InterPro > ...parsing and loading InterPro > Can't call method "name" on an undefined value at load_ontology.pl > line 914. 
> > HERE?S the name and definition in the ontology table > > Name = InterPro > > Definition = > > PANTHER version 6.1, 30128 entries, 04-OCT-2006 > PFAM version 21.0, 8957 entries, 22-NOV-2006 > PIRSF version 2.70, 2877 entries, 12-JUN-2007 > PRINTS version 38.0, 1900 entries, 22-SEP-2005 > PRODOM version 2005.1, 1522 entries, 23-APR-2004 > PROSITE version 20.0, 2006 entries, 14-NOV-2006 > SMART version 5.1, 724 entries, 27-JUL-2007 > TIGRFAMs version 7.0, 3423 entries, 28-SEP-2007 > GENE3D version 3.0.0, 2147 entries, 11-SEP-2006 > SSF version 1.69, 1538 entries, 30-NOV-2006 > SWISSPROT version 55.1, 359942 entries, 18-MAR-2008 > TREMBL version 38.1, 5443281 entries, 18-MAR-2008 > INTERPRO version 17.0, 16175 entries, 19-MAR-2008 > GO version N/A, 23937 entries, 27-MAR-2007 > MEROPS version 7.8, 2831 entries, 12-JUL-2007 | > > > > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: 12 April 2008 19:16 > To: David Burt > Cc: Bioperl BioPerl > Subject: Re: bioperl-db > > Hi Burt, > > can you try format interprosax instead of interpro? That variant is > also much more graceful regarding required space. > > -hilmar > > On Apr 12, 2008, at 1:01 PM, David Burt wrote: > > > Hi Hilmar, > > Hope you can help ? I am using bioperl-db to create a biosql database > > I have used scripts load_seqdatabase.pl and load_ontology.pl to > install human swissprot entries, gene ontology, sequence ontology > and now want to load interpro > > Here?s the command line I have tried > > perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser > root --dbpass chicken --driver mysql \ > --namespace "InterPro" --format InterPro interpro.xml > > But I get this message > > Can't call method "identifier" on an undefined value at /cygdrive/ > c/Bioinformatics/Ensembl/src/bioperl-live/Bio/Ontology/ > SimpleOntologyEngine.pm line 395 > > Any ideas? 
> > Dave > > PS: here?s the top of the interpro.xml file > > > > > > > > > file_date="04-OCT-2006 00:00:00" /> > file_date="22-NOV-2006 00:00:00" /> > file_date="12-JUN-2007 00:00:00" /> > file_date="22-SEP-2005 00:00:00" /> > file_date="23-APR-2004 00:00:00" /> > file_date="14-NOV-2006 00:00:00" /> > file_date="27-JUL-2007 00:00:00" /> > file_date="28-SEP-2007 00:00:00" /> > file_date="11-SEP-2006 00:00:00" /> > file_date="30-NOV-2006 00:00:00" /> > entry_count="359942" file_date="18-MAR-2008 00:00:00" /> > file_date="18-MAR-2008 00:00:00" /> > file_date="19-MAR-2008 00:00:00" /> > file_date="27-MAR-2007 00:00:00" /> > file_date="12-JUL-2007 16:56:17" /> > > protein_count="352"> > Kringle > > > > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Apr 12 18:43:25 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 12 Apr 2008 18:43:25 -0400 Subject: [Bioperl-l] bioperl-db In-Reply-To: <000001c89ce5$a5df2e50$0202a8c0@STUDYPC> References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC> <000001c89ce5$a5df2e50$0202a8c0@STUDYPC> Message-ID: I'm not sure what you mean by 'Check interpro.xml', but you can use the --safe command-line option to keep going if an individual term fails to load for whatever reason. Can you post the data for the seemingly offending record? (and please cc the list) -hilmar On Apr 12, 2008, at 5:39 PM, David Burt wrote: > Hi Hilmar > > Just checked mysql database and only have 39 entries under interpro > and loaded up to IPR000035 > > Check unterpro.xml looks OK from IPR000036 and onwards > > So seems to have crashed at IPR000035 ? 
> > dave > > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: 12 April 2008 19:16 > To: David Burt > Cc: Bioperl BioPerl > Subject: Re: bioperl-db > > Hi Burt, > > can you try format interprosax instead of interpro? That variant is > also much more graceful regarding required space. > > -hilmar > > On Apr 12, 2008, at 1:01 PM, David Burt wrote: > > > Hi Hilmar, > > Hope you can help ? I am using bioperl-db to create a biosql database > > I have used scripts load_seqdatabase.pl and load_ontology.pl to > install human swissprot entries, gene ontology, sequence ontology > and now want to load interpro > > Here?s the command line I have tried > > perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser > root --dbpass chicken --driver mysql \ > --namespace "InterPro" --format InterPro interpro.xml > > But I get this message > > Can't call method "identifier" on an undefined value at /cygdrive/ > c/Bioinformatics/Ensembl/src/bioperl-live/Bio/Ontology/ > SimpleOntologyEngine.pm line 395 > > Any ideas? 
> > Dave > > PS: here?s the top of the interpro.xml file > > > > > > > > > file_date="04-OCT-2006 00:00:00" /> > file_date="22-NOV-2006 00:00:00" /> > file_date="12-JUN-2007 00:00:00" /> > file_date="22-SEP-2005 00:00:00" /> > file_date="23-APR-2004 00:00:00" /> > file_date="14-NOV-2006 00:00:00" /> > file_date="27-JUL-2007 00:00:00" /> > file_date="28-SEP-2007 00:00:00" /> > file_date="11-SEP-2006 00:00:00" /> > file_date="30-NOV-2006 00:00:00" /> > entry_count="359942" file_date="18-MAR-2008 00:00:00" /> > file_date="18-MAR-2008 00:00:00" /> > file_date="19-MAR-2008 00:00:00" /> > file_date="27-MAR-2007 00:00:00" /> > file_date="12-JUL-2007 16:56:17" /> > > protein_count="352"> > Kringle > > > > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From Russell.Smithies at agresearch.co.nz Sun Apr 13 22:51:41 2008 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Mon, 14 Apr 2008 14:51:41 +1200 Subject: [Bioperl-l] Tandem Repeats Finder? In-Reply-To: References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC><000001c89ce5$a5df2e50$0202a8c0@STUDYPC> Message-ID: Has anyone tried TRF? I notice UCSC is using it for all their simple repeat annotations and thought it might be better than what we're currently using (Sputnik) And is there a BioPerl parser for it's output or am I going to have to write my own ? Thanx, Russell Smithies Bioinformatics Applications Developer T +64 3 489 9085 E? russell.smithies at agresearch.co.nz Invermay? Research Centre Puddle Alley, Mosgiel, New Zealand T? +64 3 489 3809?? F? +64 3 489 9174? 
www.agresearch.co.nz ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From Russell.Smithies at agresearch.co.nz Sun Apr 13 22:53:46 2008 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Mon, 14 Apr 2008 14:53:46 +1200 Subject: [Bioperl-l] Tandem Repeats Finder? In-Reply-To: References: Message-ID: Scratch the need for a parser. I turned off html output and it's all nice white-space separated text :-) Russell > -----Original Message----- > From: Smithies, Russell > Sent: Monday, 14 April 2008 2:52 p.m. > To: 'Bioperl BioPerl' > Subject: Tandem Repeats Finder? > > Has anyone tried TRF? > I notice UCSC is using it for all their simple repeat annotations and thought it might > be better than what we're currently using (Sputnik) > > And is there a BioPerl parser for it's output or am I going to have to write my own ? > > Thanx, > > > Russell Smithies > > Bioinformatics Applications Developer > T +64 3 489 9085 > E? russell.smithies at agresearch.co.nz > > Invermay? Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T? +64 3 489 3809 > F? 
+64 3 489 9174
> www.agresearch.co.nz
>

From csaba.ortutay at gmail.com Mon Apr 14 00:15:22 2008
From: csaba.ortutay at gmail.com (Ortutay Csaba Péter)
Date: Mon, 14 Apr 2008 07:15:22 +0300
Subject: [Bioperl-l] Tandem Repeats Finder?
In-Reply-To:
References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC>
Message-ID: <200804140715.22702.csaba.ortutay@gmail.com>

Hello,

I have used TRF in my earlier projects. It is a nice and quick tool. There were no ready-made parsers at that time (5-6 years ago), so we wrote our own.

Csaba

> Has anyone tried TRF?
> I notice UCSC is using it for all their simple repeat annotations and
> thought it might be better than what we're currently using (Sputnik)
>
> And is there a BioPerl parser for its output or am I going to have to
> write my own ?
>
> Thanx,

--
Csaba Ortutay PhD
IMT Bioinformatics
University of Tampere
Finland

From avilella at gmail.com Mon Apr 14 07:13:26 2008
From: avilella at gmail.com (Albert Vilella)
Date: Mon, 14 Apr 2008 12:13:26 +0100
Subject: [Bioperl-l] how can I print a Bio::Tree newick sortby given list?
Message-ID: <358f4d650804140413x4271f18bx40af1b9054306df8@mail.gmail.com>

Hi,

I have a newick file whose leaves I want to sort in a given order and then print again as newick.
For example, if I have (((ENSPTRG00000013811:0.0011,ENSG00000142192:0.0021):0.0033,ENSPPYG00000003902:0.0326):0.0000,ENSMMUG00000014384:0.0366):0.3638; I want to sort it by "ENSG:ENSPTRG:ENSPPYG:ENSMMUG". Any suggestions on how to do this in bioperl? Cheers, Albert. From lamq at usal.es Mon Apr 14 11:01:51 2008 From: lamq at usal.es (Luis A. M. Quintales) Date: Mon, 14 Apr 2008 17:01:51 +0200 Subject: [Bioperl-l] xyplot glyph: scale problems Message-ID: <480371DF.7040900@usal.es> I have some problem with the xyplot scale numbers calculated by the glyph. The shape of the graph looks fine, but the scale number 10 and his position in the ouput is not correct. I send the source code, simplified input file and the png output. Thank you Source code ex1.pl (also in http://avellano.usal.es/~luis/bioperl-l/ex1.pl) ============================ #!/usr/bin/perl use Bio::DB::GFF; use Bio::Graphics::Panel; use strict; my $filin = $ARGV[0]; my $db = Bio::DB::GFF->new( -dsn => $filin,-adaptor => 'memory', -aggregator => 'at{atpc:atfreq}' ); my $segment = $db->segment('chr1'); my @features = $segment->features('at'); my $panel = Bio::Graphics::Panel->new( -offset => 0, -grid => 100, -length => 500, -width => 800, -pad_left => 50, -pad_right => 50 ); $panel->add_track($segment, -glyph => 'generic', -bgcolor => 'blue', -label => 1); $panel->add_track(\@features, -glyph => 'xyplot', -graph_type=>'boxes', -scale=>'left', -height=>200, ); open (FI,"> sal.png"); ============================ in1.gff file (also in http://avellano.usal.es/~luis/bioperl-l/in1.gff) ============================ ##sequence-region chr1 1 5578650 chr1 atfreq atpc 1 10 64.0000 . . atpc 1 chr1 atfreq atpc 11 20 63.0000 . . atpc 1 chr1 atfreq atpc 21 30 62.0000 . . atpc 1 chr1 atfreq atpc 31 40 59.0000 . . atpc 1 chr1 atfreq atpc 41 50 59.0000 . . atpc 1 chr1 atfreq atpc 51 60 59.0000 . . atpc 1 chr1 atfreq atpc 61 70 59.0000 . . atpc 1 chr1 atfreq atpc 71 80 59.0000 . . atpc 1 chr1 atfreq atpc 81 90 61.0000 . . 
atpc 1 chr1 atfreq atpc 91 100 60.0000 . . atpc 1 chr1 atfreq atpc 101 110 60.0000 . . atpc 1 chr1 atfreq atpc 111 120 64.0000 . . atpc 1 chr1 atfreq atpc 121 130 64.0000 . . atpc 1 chr1 atfreq atpc 131 140 60.0000 . . atpc 1 chr1 atfreq atpc 141 150 60.0000 . . atpc 1 chr1 atfreq atpc 151 160 63.0000 . . atpc 1 chr1 atfreq atpc 161 170 62.0000 . . atpc 1 chr1 atfreq atpc 171 180 59.0000 . . atpc 1 chr1 atfreq atpc 181 190 54.0000 . . atpc 1 chr1 atfreq atpc 191 200 53.0000 . . atpc 1 chr1 atfreq atpc 201 210 54.0000 . . atpc 1 chr1 atfreq atpc 211 220 50.0000 . . atpc 1 chr1 atfreq atpc 221 230 51.0000 . . atpc 1 chr1 atfreq atpc 231 240 56.0000 . . atpc 1 chr1 atfreq atpc 241 250 58.0000 . . atpc 1 chr1 atfreq atpc 251 260 55.0000 . . atpc 1 chr1 atfreq atpc 261 270 54.0000 . . atpc 1 chr1 atfreq atpc 271 280 56.0000 . . atpc 1 chr1 atfreq atpc 281 290 59.0000 . . atpc 1 chr1 atfreq atpc 291 300 58.0000 . . atpc 1 chr1 atfreq atpc 301 310 60.0000 . . atpc 1 chr1 atfreq atpc 311 320 59.0000 . . atpc 1 chr1 atfreq atpc 321 330 59.0000 . . atpc 1 chr1 atfreq atpc 331 340 57.0000 . . atpc 1 chr1 atfreq atpc 341 350 56.0000 . . atpc 1 chr1 atfreq atpc 351 360 57.0000 . . atpc 1 chr1 atfreq atpc 361 370 57.0000 . . atpc 1 chr1 atfreq atpc 371 380 58.0000 . . atpc 1 chr1 atfreq atpc 381 390 56.0000 . . atpc 1 chr1 atfreq atpc 391 400 58.0000 . . atpc 1 chr1 atfreq atpc 401 410 56.0000 . . atpc 1 chr1 atfreq atpc 411 420 59.0000 . . atpc 1 chr1 atfreq atpc 421 430 58.0000 . . atpc 1 chr1 atfreq atpc 431 440 59.0000 . . atpc 1 chr1 atfreq atpc 441 450 58.0000 . . atpc 1 chr1 atfreq atpc 451 460 58.0000 . . atpc 1 chr1 atfreq atpc 461 470 56.0000 . . atpc 1 chr1 atfreq atpc 471 480 57.0000 . . atpc 1 chr1 atfreq atpc 481 490 59.0000 . . atpc 1 ============================ The sal.png : http://avellano.usal.es/~luis/bioperl-l/sal.png Thank you. 
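One way to check the axis independently of the glyph is to compute the score range straight from the GFF file. The sketch below uses a hypothetical helper, score_range, which is not a Bio::Graphics function; it assumes whitespace-separated columns with the score in column 6, as in in1.gff. For the data shown above the scores lie between 50 and 64, so a scale labelled around 10 would indeed not match the data. The xyplot glyph documentation lists -min_score and -max_score options for fixing the axis range; whether they resolve this particular case is untested here.

```python
def score_range(gff_lines):
    """Return (min, max) of the score column across GFF feature lines.

    Comment/directive lines starting with '#' and features whose score
    is '.' are skipped.
    """
    scores = []
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # seqid source type start end score strand phase attributes
        fields = line.split(None, 8)
        if len(fields) < 6 or fields[5] == ".":
            continue
        scores.append(float(fields[5]))
    if not scores:
        raise ValueError("no scored features found")
    return min(scores), max(scores)


if __name__ == "__main__":
    sample = [
        "##sequence-region chr1 1 5578650",
        "chr1 atfreq atpc 1 10 64.0000 . . atpc 1",
        "chr1 atfreq atpc 211 220 50.0000 . . atpc 1",
    ]
    print(score_range(sample))
```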
--
==================================================
Luis Antonio Miguel Quintales
Departamento de Informática y Automática
Facultad de Ciencias
Universidad de Salamanca
Plaza de la Merced s/n
37008-SALAMANCA
SPAIN
==================================================
Tel.: +34-923-294400(ext.1513)
Fax.: +34-923-294584
E-mail: lamq at usal.es
==================================================

From aaron.j.mackey at gsk.com Mon Apr 14 09:00:52 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Mon, 14 Apr 2008 09:00:52 -0400
Subject: [Bioperl-l] personal vs list email
In-Reply-To:
Message-ID:

I try to take it even one step further: I require the person to re-ask their question on the mailing list (and then try to answer it there). This has the added benefit of causing the person to pause a moment to reflect on their question, and (sometimes) to spend a bit more time preparing the question for broader public consumption.

-Aaron

From sutripa at vbi.vt.edu Mon Apr 14 12:54:47 2008
From: sutripa at vbi.vt.edu (Sucheta Tripathy)
Date: Mon, 14 Apr 2008 12:54:47 -0400 (EDT)
Subject: [Bioperl-l] Error installing XML::Parser
Message-ID: <1285.99.152.150.87.1208192087.squirrel@webmail.vbi.vt.edu>

Hello List,

I have recently installed bioperl using the following command. The installation was successful. Now I am trying to install XML::Parser, but it fails with error messages. Any clue what I may be doing wrong?
Thanks
Sucheta

Following is the last part of the error message:

### Error Message #######
Expat.c: In function 'XS_XML__Parser__Expat_SkipUntil':
Expat.c:2664: error: 'XML_Parser' undeclared (first use in this function)
Expat.c:2664: error: expected ';' before 'parser'
Expat.c:2665: warning: ISO C90 forbids mixed declarations and code
Expat.xs:2179: error: 'parser' undeclared (first use in this function)
Expat.xs:2179: warning: cast to pointer from integer of different size
Expat.xs:2180: error: 'CallbackVector' has no member named 'st_serial'
Expat.xs:2182: error: 'CallbackVector' has no member named 'skip_until'
Expat.c: In function 'XS_XML__Parser__Expat_Do_External_Parse':
Expat.c:2687: error: 'XML_Parser' undeclared (first use in this function)
Expat.c:2687: error: expected ';' before 'parser'
Expat.c:2688: warning: ISO C90 forbids mixed declarations and code
Expat.xs:2194: error: 'parser' undeclared (first use in this function)
Expat.xs:2194: warning: cast to pointer from integer of different size
Expat.xs:2205: warning: unused variable 'pret'
Expat.xs:2194: warning: unused variable 'cbv'
Expat.xs:2192: warning: unused variable 'type'
make[1]: *** [Expat.o] Error 1
make[1]: Leaving directory `/root/.cpan/build/XML-Parser-2.36/Expat'
make: *** [subdirs] Error 2
/usr/bin/make -- NOT OK
Running make test
Can't test without successful make
Running make install
make had returned bad status, install seems impossible
#####

--
Sucheta Tripathy, Ph.D.
Virginia Bioinformatics Institute Phase-I
Washington street.
Virginia Tech.
Blacksburg,VA 24061-0447
phone:(540)231-8138
Fax: (540) 231-2606

From mmokrejs at ribosome.natur.cuni.cz Tue Apr 15 06:45:48 2008
From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ)
Date: Tue, 15 Apr 2008 12:45:48 +0200
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To:
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <200804071448.34769.heikki@sanbi.ac.za> <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> <47FA4AD2.5030206@uv.es>
Message-ID: <4804875C.80506@ribosome.natur.cuni.cz>

Chris Fields wrote:
> Note in the example I gave that, during the revision history, the
> DBSOURCE changed at the point of the creation date (the original nuc.
> record was a M. tuberculosis contig sequence, which later changed to
> an updated full M. tuberculosis genome record at the time of the
> 'create date').
>
> Couldn't find anything specific in the GenBank docs on this, but it
> appears (at least for a protein record) the creation date reflects
> the date in which the sequence was either originally deposited or
> originally derived from the nucleotide source record present in the
> record. In other words, it may not reflect the original date of
> deposition (which could have come from a different record, as in this
> case).
>
> chris

Hi,

I have a few answers that NCBI staff gave me in the past to similar questions of mine about DATE issues and about VERSION numbers not being increased upon "changes" in a record. Below I have tried to put that earlier correspondence into a more readable form. I hope this helps everybody understand what happens in the black box.
;)
Martin

Date: Thu, 17 Jan 2002 15:40:07 -0500 (EST)
From: David Wheeler
Subject: Brucella_melitensis on ftp site

> Hi, I'd like to point out that the descriptions of
> Brucella_melitensis differ in
> ftp.ncbi.nih.nlm.gov/genomes/Bacteria/Brucella_melitensis and
> ftp.ncbi.nih.nlm.gov/genbank/genomes/Bacteria/Brucella_melitensis
>
> Namely, the description of the strain is retained in *.gbk files
> under /genomes/Bacteria/Brucella_melitensis only under the strain
> description field, but not in the DEFINITION line, where it is
> present in *.gbk files under
> /genbank/genomes/Bacteria/Brucella_melitensis.
>
> LOCUS NC_003318 1177787 bp DNA circular BCT 13-NOV-2001
> DEFINITION Brucella melitensis chromosome II, complete sequence.
> ACCESSION NC_003318
> VERSION NC_003318.1 GI:17988344
>
> compared to
>
> LOCUS AE008918 1177787 bp DNA circular BCT 27-DEC-2001
> DEFINITION Brucella melitensis strain 16M chromosome II, complete sequence.
> ACCESSION AE008918
> VERSION AE008918
>
> This makes me worried about the data. Why is the release date of the
> NON-curated file (AE008918) newer than the release date of the CURATED
> data (NC_003318)? Is this expected? Could someone explain to me the
> difference between them (i.e. CURATED vs. NONCURATED)?

The curated record is initially a copy of the non-curated record with certain changes in documentation made in order to comply with the NCBI standard for reference genomes. One change which you have noticed is the difference in Definition line format. Curated genomic records are created in order to standardize annotation for genomes in the Entrez Genomes database while leaving editorial control for the parent GenBank records in the hands of the original submitters. Regardless of the date you see on the record, the curated version is derived from the non-curated one. In this case, it appears that the processing of the non-curated version lagged a little bit relative to that of the curated version.
Normally, however, the non-curated version will have the earlier date. Date: Sun, 27 Jan 2002 00:16:55 -0500 (EST) From: David Wheeler Subject: Re: CONSULT: Brucella_melitensis on ftp site > Are the raw sequence data always same in non-curated and curated > flatfiles? > > Is the annotation of orf's/proteins different between them? > > Are there any new or withdrawn orf's or proteins in the curated > flatfiles compared to non-curated ones? > > My feeling is that no-one except original submitters can modify > submitted data, so you cannot modify non-curated files, i.e. cannot > modify them and increase the version number. > > Because of that, you've introduced curated versions, which are just > copies of original but public data so you are free to modify it. So > once again, are the differences between non-curated and curated > flatfiles only in structure of the file? I don't think so. Examples > would be Listeria genomes or the 2 Agrobacterium's, if I remember > right. Initially, there should be no or very few differences, however, as time goes by, differences in the annotation will materialize. There may also be differences in the sequence, if errors in the original sequence come to light, but these differences should be very rare. So, practically speaking, you will probably find few differences but, since the purpose of the Refseq is to curate, there may well be some differences. Date: Mon, 17 Dec 2001 11:57:06 -0500 (EST) From: Dawn Lipshultz Subject: Re: Buggy date in Staphylococcus aureus N315 >>>> Hi, I've found there has been released Staphylococcus aureus >>>> N315 on 01-JAN-1900, which is nonsense. I guss you had y2K bug. >>>> >>>> >>>> Please see >>>> >> ftp://ncbi.nlm.nih.gov/genbank/genomes/Bacteria/Staphylococcus_aureus_N315/BA000018.gbk >> >>>> >>>> Can you please tell me the real release date? 
>>>> >>>> Also, is newer the NC_xxxx for Staphylococcus aureus N315 under >>>> >>>> ftp://ncbi.nlm.nih.gov/genomes/Bacteria/Staphylococcus_aureus_N315/ >>>> or this BA000018 non-cured version? >>>> >>>> >>>> LOCUS BA000018 2814816 bp DNA circular BCT >>>> 01-JAN-1900 DEFINITION Staphylococcus aureus strain N315, >>>> complete genome. >>> AP003129-AP003138. They are all dated June 2001. >>> >>> The date for the record in the ftp file is April 2001. The record >>> in GenBank (NC_002745) is dated October 2001. This version is >>> apparently more updated than the one on the ftp site. Therefore, >>> you may want to download the sequence from GenBank rather than >>> the ftp site. >>> >>> Regards, Dawn S. Lipshultz >> I cannot find the record to which you refer in your message. When I >> did a search for accession number BA000018, I received results for >> accession numbers AP003129-AP003138. They are all dated June 2001. >> >> >> The date for the record in the ftp file is April 2001. The record >> in GenBank (NC_002745) is dated October 2001. This version is >> apparently more updated than the one on the ftp site. Therefore, >> you may want to download the sequence from GenBank rather than the >> ftp site. Regards, Dawn S. Lipshultz > > Hmm, but I do get: > http://www.ncbi.nlm.nih.gov:80/cgi-bin/Entrez/framik?db=genome&gi=179 > > > look at the "GenBank: NC_002745" text in left upper part of the > window, it points to that OLD ftp file. The "RefSeq: NC_002745" > points to the April 2001 version. So what is the right way to get the > October 2001 release? > > Where can I find the difference between NC_002745 from April compared > to NC_002745 from October? > > What do you mean with "you may want to download the sequence from > GenBank rather than the ftp site."? > > BOTH ftp directories at ftp://ncbi.nlm.nih.gov are outdated. 
I mean > the genomes/Bacteria/Staphylococcus_aureus_N315/NC_002745.* version > and also the > genbank/genomes/Bacteria/Staphylococcus_aureus_N315/BA000018.* > version. > > The web links from www.ncbi.nlm.nih.gov:80/cgi-bin/Entrez/ point > anyway to the ftp site. Do you want to say that the ftp version > aren't updated anymore? The genome was originally released into the database on 4/20/2001 as 10 pieces with secondary accession number BA000018. You can find these pieces in Entrez nucleotides by querying with BA000018. The Genomes group here will fix the date on the record that is available from Entrez genomes. Regards, Dawn Date: Fri, 16 Nov 2001 16:09:59 -0500 (EST) From: Susan Dombrowski Subject: Re: Agrobacterium tumefaciens C58 > Dear colleague, I've noticed that there're somehow updated on Oct 17 > the genomic flatfiles of Agrobacterium tumefaciens C58 at > ftp://ncbi.nlm.nih.gov/genbank/genomes/Bacteria/Agrobacterium_tumefaciens/. > However, for example the AE007869.gbs does NOT self-explain what has > been changed and also the VERSION number is not increased. Would you > please explain what's the change, when can I find such information > next time on web? > > I've used the published sequence from your ftp site on 2001-08-29 > with same ID and would like to know, what differs. > > LOCUS AE007869 2841581 bp DNA circular CON > 17-OCT-2001 DEFINITION Agrobacterium tumefaciens strain C58 circular > chromosome, complete sequence. ACCESSION AE007869 VERSION > AE007869 Dear Colleague, The version number of a sequence will *only* change if the content of the actual sequence has changed in any way since it was first made available. Although the date has changed, this date refers to the last time the actual record was manipulated by an NCBI staff member. Even if there is something simple, like adding a reference, changing a spelling mistake, etc., this will cause a change in the date field of the record. 
Thus, since the version has not changed, there are no differences to report. Best Regards, Susan Date: Wed, 26 Jun 2002 11:04:48 -0400 (EDT) From: Eric Sayers Subject: Re: Mesorhizobium_loti flatfiles >>>>> Hi, >>>>> I've found that you again silently changed flatfiles lying on your ftp >>>>> some time ago without changing the revision number. Please apologize me, >>>>> but this really causes troubles to other people working in this so called >>>>> bioinformatics. :( >>>>> >>>>> A week ago there was: >>>>> >>>>> LOCUS NC_002678 7036074 bp DNA circular BCT 10-SEP-2001 >>>>> DEFINITION Mesorhizobium loti, complete genome. >>>>> ACCESSION NC_002678 >>>>> VERSION NC_002678.1 GI:13470324 >>>>> >>>>> >>>>> and two other plasmid sequences. This yelds 7275 proteins. >>>>> >>>>> But, last autumn there was: >>>>> >>>>> LOCUS NC_002678 7036074 bp DNA circular BCT 28-MAR-2001 >>>>> DEFINITION Mesorhizobium loti, complete genome. >>>>> ACCESSION NC_002678 >>>>> VERSION NC_002678.1 GI:13470324 >>>>> >>>>> >>>>> That version had 7281 proteins in total. >>>>> I have simple questions: "Why was NOT changed the VERSION number?". >>>>> >>>>> Do I understand it wrong, that it should get updated whenever a single >>>>> character in the file contents is changed? >>> >>>> The version number of a sequence only changes if the sequence itself is >>>> modified. If anything else in the flat file is changed (ie spelling, authors, >>>> annotations, etc) the version will not change. However, the modification date in >>> >>> Sorry, do you under annotation also mean number of predicted genes, their >>> coordinates(position) etc? >>> >>>> the top line of the flat file will change for any of these modifications. (Note >>>> that the dates are different in the file you display: Mar 28, 2001 vs Sept 10, >>>> 2001.) I would track the modification date rather than or as well as the version >>>> number to catch all changes in the files. >>>> Regards, >>>> Eric W. Sayers, Ph.D. 
>>> >>> OK, but unless some of our programs have been buggy before or now (in >>> either of those cases have failed to extract genes from flatfiles), I do >>> not have an explanation for the differencies in amount of >>> predicted/annotated genes. >>> >>> I do not have anymore available the old flatfiles from Mar 28, but it >>> seems to me that these were newly introduced in the Sept. 10 version: >>> gi_15600768, gi_15600770, gi_15600769, gi_15600766, gi_15600767 >> >> Dear Colleague, >> Again, the only reason the version number will change is if the sequence itself >> changes. The number of annotated/predicted genes is merely an annotation on the >> sequence, and does not change the sequence itself. Therefore, the version will >> not change when the number of annotations changes. The modification date on the >> flat file will (and did) change, of course. >> >> Regards, >> Eric W. Sayers, Ph.D. > > Finally I've heard that from someone, thanks! > Now just tell me, how can I figure out what changed between those > different "date" releases? Is there a changelog available? > I consider annotations changes very important. We do not provide the details of flat file changes on our public websites, except for changes in the version number (ie actual sequence changes). In that particular case, all of the previous versions are linked to the current one. My advice to you if you want to chronicle non-sequence changes would be to check the flat files of interest periodically (by a script, for example) and look for changes in the modification dates. You could then simply compare the before and after flat files. Regards, Eric W. Sayers, Ph.D. > Hi, Miguel: > > id1_fetch can do it. Detailed instruction can be found at: > > http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=toolkit.section.ch_demo.id1_fetch.html > > Here is an example: > >> >id1_fetch -lt revisions -flat '12:74311105' -fmt fasta > GI Loaded DB Retrieval No. 
> -- ------ -- ------------- > 74311105 12/07/2007 NCBI 19766263 > 74311105 01/23/2007 NCBI 16325656 > 74311105 03/30/2006 NCBI 13131204 > 74311105 03/03/2006 NCBI 12915541 > 74311105 03/02/2006 NCBI 12885275 > 74311105 12/03/2005 NCBI 12259793 > 74311105 09/09/2005 NCBI 11257262 > 74311105 09/09/2005 NCBI 11242667 > > Wenwu Cui PhD From david at burt7259.freeserve.co.uk Sun Apr 13 10:32:31 2008 From: david at burt7259.freeserve.co.uk (David Burt) Date: Sun, 13 Apr 2008 15:32:31 +0100 Subject: [Bioperl-l] bioperl-db In-Reply-To: <3F77F49A-9C9E-4450-AE28-46F00CADBC8B@gmx.net> References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC> <000001c89ce2$5400a710$0202a8c0@STUDYPC> <3F77F49A-9C9E-4450-AE28-46F00CADBC8B@gmx.net> Message-ID: <000001c89d73$3b49eec0$0202a8c0@STUDYPC> Hi Hilmar Many thanks for info - tried a few things 1. First tried --safe flag perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser root --dbpass chicken --driver mysql --safe \ --namespace "InterPro" --format interprosax interpro.xml Still got same output as before ...deleting all relationships for InterPro ...parsing and loading InterPro Can't call method "name" on an undefined value at load_ontology.pl line 914 Only 35 interpro entries entered into database 2. I am using bioperl 1.5.2 3. 
I downloaded Release 17.0, 20 March 2008 of the interpro.xml file from ftp://ftp.ebi.ac.uk/pub/databases/interpro/ I did not send this file, sine it was ~10Mb gzipped Dave From david at burt7259.freeserve.co.uk Sun Apr 13 10:53:43 2008 From: david at burt7259.freeserve.co.uk (David Burt) Date: Sun, 13 Apr 2008 15:53:43 +0100 Subject: [Bioperl-l] bioperl-db In-Reply-To: References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC> <000001c89ce5$a5df2e50$0202a8c0@STUDYPC> Message-ID: <000001c89d76$319be060$0202a8c0@STUDYPC> Hilmar Also updated copy of bioperl - see output below root at STUDY_PC /cygdrive/c/Bioinformatics/Ensembl/src $ perl -MBio::Perl -le 'print Bio::Perl->VERSION;' 1.005002101 root at STUDY_PC /cygdrive/c/Bioinformatics/Ensembl/src $ cvs -d :pserver:cvs at cvs.bioperl.org:/home/repository/bioperl login Logging in to :pserver:cvs at cvs.bioperl.org:2401/home/repository/bioperl CVS password: root at STUDY_PC /cygdrive/c/Bioinformatics/Ensembl/src $ cd bioperl-live root at STUDY_PC /cygdrive/c/Bioinformatics/Ensembl/src/bioperl-live $ cvs -q update -d -P -r bioperl-release-1-5-2 P Build.PL P ModuleBuildBioperl.pm P Bio/Root/Version.pm cvs update: warning: t/data/taxdump/names.dmp was lost U t/data/taxdump/names.dmp cvs update: warning: t/data/taxdump/nodes.dmp was lost U t/data/taxdump/nodes.dmp root at STUDY_PC /cygdrive/c/Bioinformatics/Ensembl/src/bioperl-live $ perl -MBio::Perl -le 'print Bio::Perl->VERSION;' 1.0050021 Why is the VERSION 1.0050021 rather than 1.5.2 ? Dave From heikki at sanbi.ac.za Wed Apr 16 07:36:16 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Wed, 16 Apr 2008 13:36:16 +0200 Subject: [Bioperl-l] bioperl-microarray: status? 
In-Reply-To: References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> Message-ID: <200804161336.16879.heikki@sanbi.ac.za> FYI, Christoper Jones has just published [http://bioinformatics.oxfordjournals.org/cgi/content/short/24/8/1102 an article in Bioinformatics] about his [http://search.cpan.org/perldoc?Microarray Microarray perl module] in CPAN. (The text added into BioPerl wiki.) -Heikki On Friday 26 January 2007 16:05:01 Chris Fields wrote: > Don't know if it's worth it, but could the microarray package be > modified so that it deals with data generated from or interacts > directly with Bioconductor (i.e. maybe including some specialized > bioperl-run set of classes to run Bioconductor tasks, return > lightweight bioperl microarray classes)? Allen pointed out in a > previous post that Bioconductor is the best pick for certain tasks, > while Perl excels at others: > > http://article.gmane.org/gmane.comp.lang.perl.bio.general/13993 > > Might be nice if we could merge both strengths together in some way. > > chris > > On Jan 26, 2007, at 7:26 AM, Jay Hannah wrote: > > On Jan 25, 2007, at 2:30 AM, Allen Day wrote: > >> Eh, there is some discussion activity on the list, but not much. You > >> are really better off moving to Bioconductor. > > > > Ok, thanks. I added that to the wiki page: > > > > http://www.bioperl.org/wiki/Microarray_package > > > > j > > seqlab.net > > http://www.bioperl.org/wiki/User:Jhannah > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________
From pan.mueller at yahoo.de Wed Apr 16 08:34:51 2008 From: pan.mueller at yahoo.de (=?iso-8859-1?Q?Peter_M=FCller?=) Date: Wed, 16 Apr 2008 12:34:51 +0000 (GMT) Subject: [Bioperl-l] load_seqdatabase.pl --pipeline Message-ID: <297809.47580.qm@web28203.mail.ukl.yahoo.com> Dear list, a want to add gene symbols to unigene-cluster which were in a biosql database and lacks this information. So one way is to make a post-update script: my $adp = $db->get_object_adaptor('Bio::ClusterI'); my $pseq = $adp->find_by_primary_key(n); $adp->remove($pseq); $pseq->gene('symbol'); $adp->store($pseq); $adp->commit(); O.k., this works (I ask me why to remove the cluster first - bug or feature...?)
Second way, perhaps: using the --pipeline option, but it looks like it is usable only for seq objects (Bio::Factory::SequenceProcessorI), right? regards pan
From cjfields at uiuc.edu Wed Apr 16 11:00:51 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 16 Apr 2008 10:00:51 -0500 Subject: [Bioperl-l] bioperl-microarray: status? In-Reply-To: <200804161336.16879.heikki@sanbi.ac.za> References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <200804161336.16879.heikki@sanbi.ac.za> Message-ID: <479BD5A4-9C9A-4733-889D-65942F24A7F3@uiuc.edu> That would be worth looking into at some point, if anyone's interested (though it may be best to build a 'bridging' module). Wonder if it uses BioConductor and, if not, how performance is vs BioConductor? chris On Apr 16, 2008, at 6:36 AM, Heikki Lehvaslaiho wrote: > FYI, > > Christoper Jones has just published > [http://bioinformatics.oxfordjournals.org/cgi/content/short/ > 24/8/1102 an > article in Bioinformatics] about his > [http://search.cpan.org/perldoc?Microarray Microarray perl module] > in CPAN. > > (The text added into BioPerl wiki.) > > -Heikki > > > On Friday 26 January 2007 16:05:01 Chris Fields wrote: >> Don't know if it's worth it, but could the microarray package be >> modified so that it deals with data generated from or interacts >> directly with Bioconductor (i.e. maybe including some specialized >> bioperl-run set of classes to run Bioconductor tasks, return >> lightweight bioperl microarray classes)? Allen pointed out in a >> previous post that Bioconductor is the best pick for certain tasks, >> while Perl excels at others: >> >> http://article.gmane.org/gmane.comp.lang.perl.bio.general/13993 >> >> Might be nice if we could merge both strengths together in some way. 
>> >> chris >> >> On Jan 26, 2007, at 7:26 AM, Jay Hannah wrote: >>> On Jan 25, 2007, at 2:30 AM, Allen Day wrote: >>>> Eh, there is some discussion activity on the list, but not much. >>>> You >>>> are really better off moving to Bioconductor. >>> >>> Ok, thanks. I added that to the wiki page: >>> >>> http://www.bioperl.org/wiki/Microarray_package >>> >>> j >>> seqlab.net >>> http://www.bioperl.org/wiki/User:Jhannah >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From j-keller2 at md.northwestern.edu Wed Apr 16 12:12:27 2008 From: j-keller2 at md.northwestern.edu (Jacob Keller) Date: Wed, 16 Apr 2008 11:12:27 -0500 Subject: [Bioperl-l] Finding seqs of given domain architecture In-Reply-To: <200804161336.16879.heikki@sanbi.ac.za> References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <200804161336.16879.heikki@sanbi.ac.za> Message-ID: Hello All, I am new to this list, so am not totally sure this is the right forum, so please forgive if this is not the right place to asl the following question: I am seeking to get all sequences that have a given domain architecture, or at least that contain two given domains. I have thought of a few ways to do this. 1. Blast/Psi-blast for each domain, then compare the results for common sequences between the two lists, and fetch those. I would need to write a (simple) script to do this, but would prefer not to re-invent the wheel. 2. Search with a paradigm sequence of desired architecture/domain composition, somehow tweaking the psiblast parameters to find only matches over the whole search sequence, thereby finding both desired domains. I am not sure how to tweak blast to do this, though. 3. Pfam has this capability, i.e. to show all domains with a given architecture, but it is difficult to get at the actual sequences or even a list of accession numbers. Does anybody have any suggestions as to how optimally to get these seq's? Thanks for your consideration, Jacob ******************************************* Jacob Pearson Keller Northwestern University Medical Scientist Training Program Dallos Laboratory F. 
Searle 1-240 2240 Campus Drive Evanston IL 60208 lab: 847.491.2438 cel: 773.608.9185 email: j-keller2 at northwestern.edu ******************************************* ----- Original Message ----- From: "Heikki Lehvaslaiho" To: Cc: ; "Chris Fields" ; "Jay Hannah" ; Sent: Wednesday, April 16, 2008 6:36 AM Subject: Re: [Bioperl-l] bioperl-microarray: status? > FYI, > > Christoper Jones has just published > [http://bioinformatics.oxfordjournals.org/cgi/content/short/24/8/1102 an > article in Bioinformatics] about his > [http://search.cpan.org/perldoc?Microarray Microarray perl module] in > CPAN. > > (The text added into BioPerl wiki.) > > -Heikki > > > On Friday 26 January 2007 16:05:01 Chris Fields wrote: >> Don't know if it's worth it, but could the microarray package be >> modified so that it deals with data generated from or interacts >> directly with Bioconductor (i.e. maybe including some specialized >> bioperl-run set of classes to run Bioconductor tasks, return >> lightweight bioperl microarray classes)? Allen pointed out in a >> previous post that Bioconductor is the best pick for certain tasks, >> while Perl excels at others: >> >> http://article.gmane.org/gmane.comp.lang.perl.bio.general/13993 >> >> Might be nice if we could merge both strengths together in some way. >> >> chris >> >> On Jan 26, 2007, at 7:26 AM, Jay Hannah wrote: >> > On Jan 25, 2007, at 2:30 AM, Allen Day wrote: >> >> Eh, there is some discussion activity on the list, but not much. You >> >> are really better off moving to Bioconductor. >> > >> > Ok, thanks. I added that to the wiki page: >> > >> > http://www.bioperl.org/wiki/Microarray_package >> > >> > j >> > seqlab.net >> > http://www.bioperl.org/wiki/User:Jhannah >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. 
Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l >
From frederic.romagne at gmail.com Wed Apr 16 13:25:18 2008 From: frederic.romagne at gmail.com (=?ISO-8859-1?Q?Fr=E9d=E9ric_Romagn=E9?=) Date: Wed, 16 Apr 2008 12:25:18 -0500 Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix Message-ID: <1208366718.19084.15.camel@kiss-laptop> Hello, i made a program which use Bio::Index::GenBank and i tested it under unix, that worked well. But i have to launch it under windows and it seems not to work on. Here is the problem : my $dbobj = Bio::Index::Abstract->new("Data/$db"); my $seq = $dbobj->get_Seq_by_acc($id); print $seq->display_id."\n"; did not print the same number than $id !!! 
So i don't work on the sequence expected... I use the SVN sources on unix and the Perl package manager for windows... Thanks.
From cjfields at uiuc.edu Wed Apr 16 13:52:59 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 16 Apr 2008 12:52:59 -0500 Subject: [Bioperl-l] Finding seqs of given domain architecture In-Reply-To: References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <200804161336.16879.heikki@sanbi.ac.za> Message-ID:

You can try CDART: http://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi?cmd=rps There are probably other tools out there as well.

If you want to roll your own, you can use bioperl wrappers for all of these (Bio::Tools::Run::StandAloneBlast is in bioperl-live, Bio::Tools::Run::Hmmer in bioperl-run), tweaking the parameters as you see fit, and either parse while running them or store the file for parsing later using Bio::SearchIO.

Personally, I wouldn't go with (2) unless you are absolutely sure the domains are found only once per sequence, are spatially conserved, and don't overlap. For instance, with many proteins you could have a domain structure like dom1-dom2, dom2-dom1, dom1-dom1-dom2, etc.

If you just want accessions from Pfam's Stockholm format (which are UniProt, I believe) you can get at accessions using Bio::AlignIO::stockholm (using perl 5.10):

    use Bio::AlignIO;
    use feature 'say';

    my $file = shift || die "Must pass file as argument\n";
    my $in = Bio::AlignIO->new(-format => 'stockholm', -file => $file);

    # one comma-separated list of accessions per alignment in the file
    while (my $aln = $in->next_aln) {
        my @accs;
        for my $seq ($aln->each_seq) {
            push @accs, $seq->accession_number;
        }
        say join(',', @accs);
    }

chris On Apr 16, 2008, at 11:12 AM, Jacob Keller wrote: > Hello All, > > I am new to this list, so am not totally sure this is the right > forum, so please forgive if this is not the right place to asl the > following question: I am seeking to get all sequences that have a > given domain architecture, or at least that contain two given > domains. 
I have thought of a few ways to do this. > > 1. Blast/Psi-blast for each domain, then compare the results for > common sequences between the two lists, and fetch those. I would > need to write a (simple) script to do this, but would prefer not to > re-invent the wheel. > > 2. Search with a paradigm sequence of desired architecture/domain > composition, somehow tweaking the psiblast parameters to find only > matches over the whole search sequence, thereby finding both desired > domains. I am not sure how to tweak blast to do this, though. > > 3. Pfam has this capability, i.e. to show all domains with a given > architecture, but it is difficult to get at the actual sequences or > even a list of accession numbers. > > Does anybody have any suggestions as to how optimally to get these > seq's? > > Thanks for your consideration, > > Jacob > > ******************************************* > Jacob Pearson Keller > Northwestern University > Medical Scientist Training Program > Dallos Laboratory > F. Searle 1-240 > 2240 Campus Drive > Evanston IL 60208 > lab: 847.491.2438 > cel: 773.608.9185 > email: j-keller2 at northwestern.edu > ******************************************* > > ----- Original Message ----- From: "Heikki Lehvaslaiho" > > To: > Cc: ; "Chris Fields" ; "Jay > Hannah" ; > Sent: Wednesday, April 16, 2008 6:36 AM > Subject: Re: [Bioperl-l] bioperl-microarray: status? > > >> FYI, >> >> Christoper Jones has just published >> [http://bioinformatics.oxfordjournals.org/cgi/content/short/ >> 24/8/1102 an >> article in Bioinformatics] about his >> [http://search.cpan.org/perldoc?Microarray Microarray perl module] >> in CPAN. >> >> (The text added into BioPerl wiki.) >> >> -Heikki >> >> >> On Friday 26 January 2007 16:05:01 Chris Fields wrote: >>> Don't know if it's worth it, but could the microarray package be >>> modified so that it deals with data generated from or interacts >>> directly with Bioconductor (i.e. 
maybe including some specialized >>> bioperl-run set of classes to run Bioconductor tasks, return >>> lightweight bioperl microarray classes)? Allen pointed out in a >>> previous post that Bioconductor is the best pick for certain tasks, >>> while Perl excels at others: >>> >>> http://article.gmane.org/gmane.comp.lang.perl.bio.general/13993 >>> >>> Might be nice if we could merge both strengths together in some way. >>> >>> chris >>> >>> On Jan 26, 2007, at 7:26 AM, Jay Hannah wrote: >>> > On Jan 25, 2007, at 2:30 AM, Allen Day wrote: >>> >> Eh, there is some discussion activity on the list, but not >>> much. You >>> >> are really better off moving to Bioconductor. >>> > >>> > Ok, thanks. I added that to the wiki page: >>> > >>> > http://www.bioperl.org/wiki/Microarray_package >>> > >>> > j >>> > seqlab.net >>> > http://www.bioperl.org/wiki/User:Jhannah >>> > >>> > _______________________________________________ >>> > Bioperl-l mailing list >>> > Bioperl-l at lists.open-bio.org >>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. 
Robert Switzer >>> Dept of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> -- >> ______ _/ _/ >> _____________________________________________________ >> _/ _/ >> _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za >> _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho >> _/ _/ _/ SANBI, South African National Bioinformatics Institute >> _/ _/ _/ University of Western Cape, South Africa >> _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 >> ___ _/_/_/_/_/ >> ________________________________________________________ >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From David.Messina at sbc.su.se Wed Apr 16 14:23:27 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 16 Apr 2008 20:23:27 +0200 Subject: [Bioperl-l] Finding seqs of given domain architecture In-Reply-To: References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <200804161336.16879.heikki@sanbi.ac.za> Message-ID: <628aabb70804161123s453bd96bqd2213b938dfdb3a2@mail.gmail.com> Hey Jacob, This forum is mostly geared toward the BioPerl software package rather than general bioinformatics assistance. That being said, I would recommend using Pfam's Sequence Search to determine the domain content of your sequences and then simply looking at those which have the same two domains of interest. 
If there are more sequences matching this criterion than can be examined manually, you could write up something (potentially using BioPerl) to then look at the relative order and number of those domains in your sequences. However, if these sequences have UniProt IDs, you can start with the domains and Pfam will hand you a list of all the UniProt seqs having those domains. On the Pfam website's main page, click on "Help" (right side of menu at the top of the page) and then "Tools and Services" (left side menu).

Dave

From Russell.Smithies at agresearch.co.nz Wed Apr 16 16:49:49 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 17 Apr 2008 08:49:49 +1200
Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix
In-Reply-To: <1208366718.19084.15.camel@kiss-laptop>
References: <1208366718.19084.15.camel@kiss-laptop>
Message-ID:

Did you check the format of your input file?
i.e. DOS or UNIX line endings?

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Frédéric Romagné
> Sent: Thursday, 17 April 2008 5:25 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix
>
> Hello,
> I made a program which uses Bio::Index::GenBank and I tested it under
> unix; that worked well.
>
> But I have to run it under Windows and it does not seem to work there.
>
> Here is the problem:
>
> my $dbobj = Bio::Index::Abstract->new("Data/$db");
> my $seq = $dbobj->get_Seq_by_acc($id);
> print $seq->display_id."\n";
>
> did not print the same number as $id!!! So I am not working on the
> expected sequence...
>
> I use the SVN sources on unix and the Perl package manager for
> windows...
>
> Thanks.
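
The line-ending question above can be checked with a few lines of Perl. This is only a minimal sketch, assuming the accession numbers are read from a text file that may have been saved with DOS line endings; the accession strings are hypothetical, and nothing here is confirmed to be the cause of the mismatch reported in the thread. It shows that a plain chomp can leave a trailing carriage return on the ID, which would then no longer match the key stored in the index.

```perl
use strict;
use warnings;

# Strip either a UNIX (\n) or a DOS (\r\n) line ending from an ID line.
sub clean_id {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;       # chomp alone would leave the \r of a DOS file
    $line =~ s/^\s+|\s+$//g;    # also drop stray surrounding whitespace
    return $line;
}

# Report which kind of line ending a raw line carries.
sub line_ending {
    my ($line) = @_;
    return $line =~ /\r\n\z/ ? 'DOS'
         : $line =~ /\n\z/   ? 'UNIX'
         :                     'none';
}

# Hypothetical accession lines, as they would look when read without
# any CRLF translation (e.g. a DOS file opened on UNIX).
for my $raw ("AB000001\r\n", "AB000002\n") {
    printf "%-5s [%s]\n", line_ending($raw), clean_id($raw);
}
```

Printing the ID between brackets makes an invisible trailing character easy to spot before it is passed to get_Seq_by_acc.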
=======================================================================
Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
=======================================================================

From frederic.romagne at gmail.com Wed Apr 16 17:39:07 2008
From: frederic.romagne at gmail.com (Frédéric Romagné)
Date: Wed, 16 Apr 2008 16:39:07 -0500
Subject: [Bioperl-l] index::abstract on win and unix
In-Reply-To:
References: <1208366718.19084.15.camel@kiss-laptop>
Message-ID: <1208381947.16620.6.camel@kiss-laptop>

Well, if by input file you mean the database used, it's created with Bio::Index::GenBank from a GenBank file from NCBI's FTP site. $id is an accession number read from a file, but I chomp the line...

I am trying to install the SVN version of BioPerl under Windows to see if there is an improvement.

On Thursday 17 April 2008 at 08:49 +1200, Smithies, Russell wrote:
> Did you check the format of your input file?
> i.e. DOS or UNIX line endings?
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Frédéric Romagné
> > Sent: Thursday, 17 April 2008 5:25 a.m.
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix
> >
> > Hello,
> > I made a program which uses Bio::Index::GenBank and I tested it under
> > unix; that worked well.
> >
> > But I have to run it under Windows and it does not seem to work there.
> >
> > Here is the problem:
> >
> > my $dbobj = Bio::Index::Abstract->new("Data/$db");
> > my $seq = $dbobj->get_Seq_by_acc($id);
> > print $seq->display_id."\n";
> >
> > did not print the same number as $id!!! So I am not working on the
> > expected sequence...
> >
> > I use the SVN sources on unix and the Perl package manager for
> > windows...
> >
> > Thanks.

From hubert.gaynor at yahoo.com Thu Apr 17 02:19:11 2008
From: hubert.gaynor at yahoo.com (Hubert Gaynor)
Date: Wed, 16 Apr 2008 23:19:11 -0700 (PDT)
Subject: [Bioperl-l] Can I use BLAST against a database like MySQL
Message-ID: <657734.41592.qm@web46008.mail.sp1.yahoo.com>

Hi,

As far as I know, before using BLAST to do an alignment, the first thing to do is run formatdb to construct a database.
But I was wondering whether it is possible to construct the database with MySQL instead, which would probably make the BLAST search faster and the database management much easier?

Thanks!

Hubert.

From sdavis2 at mail.nih.gov Thu Apr 17 06:36:32 2008
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 17 Apr 2008 06:36:32 -0400
Subject: [Bioperl-l] Can I use BLAST against a database like MySQL
In-Reply-To: <657734.41592.qm@web46008.mail.sp1.yahoo.com>
References: <657734.41592.qm@web46008.mail.sp1.yahoo.com>
Message-ID: <264855a00804170336o2a2bcff9xfcb05a33bac4c8dc@mail.gmail.com>

On Thu, Apr 17, 2008 at 2:19 AM, Hubert Gaynor wrote:
> Hi,
>
> As far as I know, before using BLAST to do an alignment, the first thing to do is run formatdb to construct a database. But I was wondering whether it is possible to construct the database with MySQL instead, which would probably make the BLAST search faster and the database management much easier?
>

formatdb is used to make a representation that can be used efficiently by blast. That representation already makes blast faster. MySQL can't be used for such things. As for speeding up blast, if you have a multiprocessor machine, you can take advantage of it by increasing the number of processors blast uses. Also, while blast is a very versatile program, it is not the only alignment program available. Depending on your needs, you could look at other programs such as blat or gmap that can be 2-3 orders of magnitude faster than blast.
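
As a rough illustration of the formatdb-then-blast workflow and of the multiprocessor option mentioned above, here is a minimal Perl sketch that only builds and prints the two command lines. It assumes the legacy NCBI formatdb and blastall programs; the file names and the processor count are hypothetical, and the flags should be checked against the help output of whatever version is installed.

```perl
use strict;
use warnings;

# Build the two legacy NCBI command lines as argument lists.
# All file names passed in are hypothetical placeholders.
sub blast_commands {
    my (%arg) = @_;

    # formatdb: -i input FASTA file, -p F for nucleotide input (T for protein)
    my @formatdb = ('formatdb', '-i', $arg{fasta}, '-p', 'F');

    # blastall: -p program, -d database, -i query file,
    #           -a number of processors to use, -o output file
    my @blastall = (
        'blastall', '-p', 'blastn',
        '-d', $arg{fasta},
        '-i', $arg{query},
        '-a', $arg{cpus},
        '-o', $arg{out},
    );

    return (\@formatdb, \@blastall);
}

my @commands = blast_commands(
    fasta => 'mydb.fasta',          # hypothetical database FASTA
    query => 'query.fasta',         # hypothetical query file
    cpus  => 4,                     # assumed number of available processors
    out   => 'query.blastn.out',    # hypothetical output file
);

for my $cmd (@commands) {
    print join(' ', @$cmd), "\n";
    # To actually run the step instead of just printing it:
    # system(@$cmd) == 0 or die "$cmd->[0] failed: $?";
}
```

Keeping each command as a list rather than a single string means it can be handed to system() later without any shell quoting issues.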

Sean

From stefan.kirov at bms.com Thu Apr 17 09:40:29 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 17 Apr 2008 09:40:29 -0400
Subject: [Bioperl-l] bioperl-db woes
Message-ID: <4807534D.80105@bms.com>

I'm having problems passing all the tests for bioperl-db. There are 2 distinct errors. The first one:

Can't locate Bio/DB/BioSQL/RichSeqAdaptor.pm

*** Which, by the way, is embedded deep in several layers of eval, so I am getting the actual error from the test:

***t/04swiss.........ok 3/52Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 78.

or

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Annotation of class Bio::Annotation::Collection not type-mapped. Internal error?
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/kirovs/bioperl-live/Bio/Root/Root.pm:357
STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695
STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store Bio/DB/Persistent/PersistentObject.pm:271
STACK: Bio::DB::BioSQL::SeqAdaptor::store_children Bio/DB/BioSQL/SeqAdaptor.pm:224
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK: Bio::DB::Persistent::PersistentObject::create Bio/DB/Persistent/PersistentObject.pm:244
STACK: t/04swiss.t:36
-----------------------------------------------------------

It turns out the adaptor is really not there???
My bioperl-db is from dev.open-bio.org/home/svn-repositories/bioperl/bioperl-db/trunk bioperl-db (revision 14661). Is this module being deprecated (I am sure it is not), or is my download incomplete?

The other problem was:

DBD::Oracle::st execute failed: ORA-02292: integrity constraint (BIOSQL.FKTAX_ENT) violated - child record found (DBD ERROR: OCIStmtExecute) [for Statement "DELETE FROM taxon WHERE oid = ?" with ParamValues: :p1=9606] at /home/kirovs/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 320.
not ok 76
# Test 76 got: (t/02species.t at line 71)

I have not tried to debug this one....

Thanks!
Stefan

From stefan.kirov at bms.com Thu Apr 17 10:18:30 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 17 Apr 2008 10:18:30 -0400 (Eastern Daylight Time)
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To: <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu>
References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu>
Message-ID:

On Thu, 17 Apr 2008, Chris Fields wrote:

> The 'get_dbxrefs' problem looks related to recent changes I made when rolling
> back the significant feature/annotation changes introduced just prior to the
> 1.5 release, none of which were fully implemented. I can check that one out.
> Odd though; these passed for me, but I'm using MySQL not oracle.

get_dbxrefs is not the problem; I think the error message is misleading:

kirovs at horta:~/bioperl-db> grep get_dbxrefs /home/kirovs/bioperl-live/Bio/Ontology/Term.pm
get_dbxrefs() instead, which handles both strings and DBLink
"Use get_dbxrefs() instead");
$self->get_dbxrefs($context);
=head2 get_dbxrefs
Title : get_dbxrefs()
Usage : @ds = $term->get_dbxrefs();
sub get_dbxrefs {
} # get_dbxrefs
my @old = $self->get_dbxrefs($context);
sub each_dblink {shift->throw("use of each_dblink() is deprecated; use get_dbxrefs() instead")}

So it is there. In any case I debugged and tracked that down to the RichSeq adaptor module missing.
It is not in the distro I downloaded, so I think this is my problem. It is a different question why... I looked at different repos (SVN, CVS, trunk, different tags) and I did not see RichSeq.pm. I am not sure what is going on. Perhaps Hilmar will be able to help when he is around. Thanks for the help Chris.... Stefan > > You may want to make sure you are using bioperl-live and that there isn't an > older bioperl installation getting into the mix. > > chris > > On Apr 17, 2008, at 8:40 AM, Stefan Kirov wrote: > >> I'm having problems passing all the tests for bioperl-db. There are 2 >> distinct errors, first one: >> Can't locate Bio/DB/BioSQL/RichSeqAdaptor.pm >> ***Which by the way is embed deep into several layers of eval, so I >> am getting the actual error from the test: >> ***t/04swiss.........ok 3/52Can't locate object method "get_dbxrefs" >> via package "Bio::Ontology::Term" at >> >> /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm >> line 552, line 78. >> or >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> >> MSG: Annotation of class Bio::Annotation::Collection not >> type-mapped. Internal error? 
>> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /home/kirovs/bioperl-live/Bio/Root/Root.pm:357 >> STACK: >> Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key >> Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695 >> STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children >> Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create >> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store >> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 >> STACK: Bio::DB::Persistent::PersistentObject::store >> Bio/DB/Persistent/PersistentObject.pm:271 >> STACK: Bio::DB::BioSQL::SeqAdaptor::store_children >> Bio/DB/BioSQL/SeqAdaptor.pm:224 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create >> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 >> STACK: Bio::DB::Persistent::PersistentObject::create >> Bio/DB/Persistent/PersistentObject.pm:244 >> STACK: t/04swiss.t:36 >> ----------------------------------------------------------- >> >> It turns out the adaptor is really not there??? >> My bioperl-db is from >> dev.open-bio.org/home/svn-repositories/bioperl/bioperl-db/trunk >> bioperl-db (revision 14661) >> Is this module being deprecated (I am sure it is not) my download >> incomplete....? >> The other problem was: >> DBD::Oracle::st execute failed: ORA-02292: integrity constraint >> (BIOSQL.FKTAX_ENT) violated - child record found (DBD ERROR: >> OCIStmtExecute) [for Statement "DELETE FROM taxon WHERE oid = ?" with >> ParamValues: :p1=9606] at >> /home/kirovs/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm >> line 320. >> not ok 76 >> # Test 76 got: (t/02species.t at line 71) >> I have not tried to debug this one.... >> Thanks! 
>> Stefan >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > From cjfields at uiuc.edu Thu Apr 17 09:59:57 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 17 Apr 2008 08:59:57 -0500 Subject: [Bioperl-l] bioperl-db woes In-Reply-To: <4807534D.80105@bms.com> References: <4807534D.80105@bms.com> Message-ID: <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> The 'get_dbxrefs' problem looks related to recent changes I made when rolling back the significant feature/annotation changes introduced just prior to the 1.5 release, none which were fully implemented. I can check that one out. Odd though; these passed for me, but I'm using MySQL not oracle. You may want to make sure you are using bioperl-live and that there isn't an older bioperl installation getting into the mix. chris On Apr 17, 2008, at 8:40 AM, Stefan Kirov wrote: > I'm having problems passing all the tests for bioperl-db. There are 2 > distinct errors, first one: > Can't locate Bio/DB/BioSQL/RichSeqAdaptor.pm > ***Which by the way is embed deep into several layers of eval, so I > am getting the actual error from the test: > ***t/04swiss.........ok 3/52Can't locate object method > "get_dbxrefs" > via package "Bio::Ontology::Term" at > > /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm > line 552, line 78. > or > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Annotation of class Bio::Annotation::Collection not > type-mapped. Internal error? 
> STACK: Error::throw > STACK: Bio::Root::Root::throw > /home/kirovs/bioperl-live/Bio/Root/Root.pm:357 > STACK: > Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key > Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695 > STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children > Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create > Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store > Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 > STACK: Bio::DB::Persistent::PersistentObject::store > Bio/DB/Persistent/PersistentObject.pm:271 > STACK: Bio::DB::BioSQL::SeqAdaptor::store_children > Bio/DB/BioSQL/SeqAdaptor.pm:224 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create > Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 > STACK: Bio::DB::Persistent::PersistentObject::create > Bio/DB/Persistent/PersistentObject.pm:244 > STACK: t/04swiss.t:36 > ----------------------------------------------------------- > > It turns out the adaptor is really not there??? > My bioperl-db is from > dev.open-bio.org/home/svn-repositories/bioperl/bioperl-db/trunk > bioperl-db (revision 14661) > Is this module being deprecated (I am sure it is not) my download > incomplete....? > The other problem was: > DBD::Oracle::st execute failed: ORA-02292: integrity constraint > (BIOSQL.FKTAX_ENT) violated - child record found (DBD ERROR: > OCIStmtExecute) [for Statement "DELETE FROM taxon WHERE oid = ?" with > ParamValues: :p1=9606] at > /home/kirovs/bioperl-db/blib/lib/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm > line 320. > not ok 76 > # Test 76 got: (t/02species.t at line 71) > I have not tried to debug this one.... > Thanks! > Stefan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From stefan.kirov at bms.com Thu Apr 17 10:52:32 2008 From: stefan.kirov at bms.com (Stefan Kirov) Date: Thu, 17 Apr 2008 10:52:32 -0400 (Eastern Daylight Time) Subject: [Bioperl-l] bioperl-db woes In-Reply-To: <9ECDEB39-95F3-4A94-9AF7-FFEBBDEFF0FA@gmx.net> References: <4807534D.80105@bms.com> <9ECDEB39-95F3-4A94-9AF7-FFEBBDEFF0FA@gmx.net> Message-ID: That is correct and I assumed I should not be concerned with this error. Thanks Stefan On Thu, 17 Apr 2008, Hilmar Lapp wrote: > > On Apr 17, 2008, at 9:40 AM, Stefan Kirov wrote: >> The other problem was: >> DBD::Oracle::st execute failed: ORA-02292: integrity constraint >> (BIOSQL.FKTAX_ENT) violated - child record found (DBD ERROR: >> OCIStmtExecute) [for Statement "DELETE FROM taxon WHERE oid = ?" with >> ParamValues: :p1=9606] at > > > This sounds like you are running the tests against a non-empty database? > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > From hlapp at gmx.net Thu Apr 17 10:47:58 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 17 Apr 2008 10:47:58 -0400 Subject: [Bioperl-l] bioperl-db woes In-Reply-To: References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> Message-ID: <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net> On Apr 17, 2008, at 10:18 AM, Stefan Kirov wrote: > In any case I debugged and tracked that down to the RichSeq adaptor > module missing. That almost can't be the problem. Every Bio::Seq::RichSeq is-a Bio::Seq and a SeqAdaptor is present. I'm afraid it gets stuck somewhere else and frankly I didn't see the RichSeqAdaptor failing to load in your stack trace: > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Annotation of class Bio::Annotation::Collection not > type-mapped. Internal error? 
> STACK: Error::throw > STACK: Bio::Root::Root::throw > /home/kirovs/bioperl-live/Bio/Root/Root.pm:357 > STACK: > Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key > Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695 > STACK: > Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children > Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create > Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store > Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 > STACK: Bio::DB::Persistent::PersistentObject::store > Bio/DB/Persistent/PersistentObject.pm:271 > STACK: Bio::DB::BioSQL::SeqAdaptor::store_children > Bio/DB/BioSQL/SeqAdaptor.pm:224 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create > Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 > STACK: Bio::DB::Persistent::PersistentObject::create > Bio/DB/Persistent/PersistentObject.pm:244 > STACK: t/04swiss.t:36 > ----------------------------------------------------------- What that tells me is that when bioperl-db tries to store the annotation bundle of the (SwissProt) sequence, one of the annotations that it encounters is of type Bio::Annotation::Collection. At present bioperl-db doesn't know what to do with it; i.e., bioperl-db can't yet handle hierarchical annotation collections (collections within collections). I believe this is due to recent changes in how the GN line is parsed in BioPerl - Chris does this ring the right bell? I thought though you had built in a method would allow flattening out? It's worth noting that BioSQL itself can't really represent nested annotation collections other than by using ontology terms and their hierarchy, which at present I think isn't really appropriate, but I have to think through the issue more. In other words, in BioSQL you can't directly tie together a bunch of qualifier value pairs into a "bag" and then nest this bag within another. 
The way to make this work with the current schema is to flatten out the nesting.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Thu Apr 17 10:48:52 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 17 Apr 2008 10:48:52 -0400
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To: <4807534D.80105@bms.com>
References: <4807534D.80105@bms.com>
Message-ID: <9ECDEB39-95F3-4A94-9AF7-FFEBBDEFF0FA@gmx.net>

On Apr 17, 2008, at 9:40 AM, Stefan Kirov wrote:
> The other problem was:
> DBD::Oracle::st execute failed: ORA-02292: integrity constraint
> (BIOSQL.FKTAX_ENT) violated - child record found (DBD ERROR:
> OCIStmtExecute) [for Statement "DELETE FROM taxon WHERE oid = ?" with
> ParamValues: :p1=9606] at

This sounds like you are running the tests against a non-empty database?

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From stefan.kirov at bms.com Thu Apr 17 11:28:42 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 17 Apr 2008 11:28:42 -0400 (Eastern Daylight Time)
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To: <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
Message-ID:

Hilmar,

I think I saw what happens with this adaptor. In Bio::DB::BioSQL::DBAdaptor::_load_object_adaptor (called from create_persistent) there is a request to load this module:

Bio/DB/BioSQL/RichSeqAdaptor.pm

There is no such module... This always fails, but since it is evaled, there is no actual error. Perhaps this is a leftover...? This got me fooled...
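
The general Perl behaviour described here, a module load that fails quietly because it sits inside an eval, can be reproduced in isolation. The following is a minimal sketch of that idiom only, not bioperl-db's actual code; the missing module name is hypothetical and try_load is a helper invented for the illustration.

```perl
use strict;
use warnings;

# Try to load a module by name; return 1 on success, 0 on failure.
# The reason for a failure is only visible if the caller inspects $@.
sub try_load {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Hypothetical module name, assumed not to be installed anywhere.
my $missing = 'My::Hypothetical::MissingAdaptor';

if (!try_load($missing)) {
    # Without this explicit report the failure would pass silently,
    # and a later, unrelated-looking error might be blamed instead.
    print "could not load $missing: $@";
}

# A core module loads normally through the same helper.
print "File::Spec loaded\n" if try_load('File::Spec');
```

A loader that falls back to a more general class when the specific one is absent would behave the same way, which is consistent with the "Can't locate" message only showing up while debugging.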
I guess Chris could be right- Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key is being passed Bio::Annotation::Collection as a value for $obj->obj(). Or recursing too far? Anyway, I am just guessing here- I do not know the architecture of bioperl-db... Thanks again for the help... Stefan On Thu, 17 Apr 2008, Hilmar Lapp wrote: > > On Apr 17, 2008, at 10:18 AM, Stefan Kirov wrote: >> In any case I debugged and tracked that down to the RichSeq adaptor module >> missing. > > > That almost can't be the problem. Every Bio::Seq::RichSeq is-a Bio::Seq and a > SeqAdaptor is present. > > I'm afraid it gets stuck somewhere else and frankly I didn't see the > RichSeqAdaptor failing to load in your stack trace: > >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> >> MSG: Annotation of class Bio::Annotation::Collection not >> type-mapped. Internal error? >> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /home/kirovs/bioperl-live/Bio/Root/Root.pm:357 >> STACK: >> Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key >> Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695 >> STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children >> Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create >> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store >> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 >> STACK: Bio::DB::Persistent::PersistentObject::store >> Bio/DB/Persistent/PersistentObject.pm:271 >> STACK: Bio::DB::BioSQL::SeqAdaptor::store_children >> Bio/DB/BioSQL/SeqAdaptor.pm:224 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create >> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 >> STACK: Bio::DB::Persistent::PersistentObject::create >> Bio/DB/Persistent/PersistentObject.pm:244 >> STACK: t/04swiss.t:36 >> ----------------------------------------------------------- > > What that tells me is that when bioperl-db tries to 
store the annotation > bundle of the (SwissProt) sequence, one of the annotations that it encounters > is of type Bio::Annotation::Collection. At present bioperl-db doesn't know > what to do with it; i.e., bioperl-db can't yet handle hierarchical annotation > collections (collections within collections). > > I believe this is due to recent changes in how the GN line is parsed in > BioPerl - Chris does this ring the right bell? I thought though you had built > in a method would allow flattening out? > > It's worth noting that BioSQL itself can't really represent nested annotation > collections other than by using ontology terms and their hierarchy, which at > present I think isn't really appropriate, but I have to think through the > issue more. In other words, in BioSQL you can't directly tie together a bunch > of qualifier value pairs into a "bag" and then nest this bag within another. > The way to make this work with the current schema is to flatten out the > nesting. > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From cjfields at uiuc.edu Thu Apr 17 12:26:41 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 17 Apr 2008 11:26:41 -0500 Subject: [Bioperl-l] bioperl-db woes In-Reply-To: <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net> References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net> Message-ID: On Apr 17, 2008, at 9:47 AM, Hilmar Lapp wrote: > > On Apr 17, 2008, at 10:18 AM, Stefan Kirov wrote: >> In any case I debugged and tracked that down to the RichSeq adaptor >> module missing. > > > That almost can't be the problem. Every Bio::Seq::RichSeq is-a > Bio::Seq and a SeqAdaptor is present. 
> > I'm afraid it gets stuck somewhere else and frankly I didn't see the > RichSeqAdaptor failing to load in your stack trace: > >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> >> MSG: Annotation of class Bio::Annotation::Collection not >> type-mapped. Internal error? >> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /home/kirovs/bioperl-live/Bio/Root/Root.pm:357 >> STACK: >> Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key >> Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695 >> STACK: >> Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children >> Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create >> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store >> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 >> STACK: Bio::DB::Persistent::PersistentObject::store >> Bio/DB/Persistent/PersistentObject.pm:271 >> STACK: Bio::DB::BioSQL::SeqAdaptor::store_children >> Bio/DB/BioSQL/SeqAdaptor.pm:224 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create >> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 >> STACK: Bio::DB::Persistent::PersistentObject::create >> Bio/DB/Persistent/PersistentObject.pm:244 >> STACK: t/04swiss.t:36 >> ----------------------------------------------------------- > > What that tells me is that when bioperl-db tries to store the > annotation bundle of the (SwissProt) sequence, one of the > annotations that it encounters is of type > Bio::Annotation::Collection. At present bioperl-db doesn't know what > to do with it; i.e., bioperl-db can't yet handle hierarchical > annotation collections (collections within collections). > > I believe this is due to recent changes in how the GN line is parsed > in BioPerl - Chris does this ring the right bell? 
I thought though > you had built in a method would allow flattening out This appears to be using an older bioperl-live checkout, one where Heikki changed GN parsing to use a nested Annotation::Collection. I reverted that back in a later commit to svn specifically b/c of bioperl-db problems. bioperl-live's swiss.pm now uses a new subclass of Bio::Annotation::SimpleValue (Bio::Annotation::TagTree) that represents nested values via Data::Stag's itext output (we can change that to alternatives if needed). Here are the last few relevant revisions in bioperl-live's main trunk (mine is the latest): ------------------------------------------------------------------------ r14562 | cjfields | 2008-02-28 08:30:05 -0600 (Thu, 28 Feb 2008) | 1 line bug 1825: updating swiss.pm/tests to try out TagTree (passes all tests). Need to update Handler.t and related modules still... ------------------------------------------------------------------------ r14541 | heikki | 2008-02-25 00:10:48 -0600 (Mon, 25 Feb 2008) | 1 line documentation for the GN line parsing and management ------------------------------------------------------------------------ r14538 | heikki | 2008-02-23 08:48:23 -0600 (Sat, 23 Feb 2008) | 1 line GN (Gene Name) line parsing rewrite. Breaks backward compatibility. Can now deal with >1 gene per entry and four categories of names per gene. Parses old style syntax (...OR ... OR ... ) into one gene name and synonyms for each gene. Docs to follow. .... I just updated all code from dev and reran bioperl-db tests w/o problems. Maybe someone else could do the same to see what happens? > It's worth noting that BioSQL itself can't really represent nested > annotation collections other than by using ontology terms and their > hierarchy, which at present I think isn't really appropriate, but I > have to think through the issue more. In other words, in BioSQL you > can't directly tie together a bunch of qualifier value pairs into a > "bag" and then nest this bag within another. 
The way to make this
> work with the current schema is to flatten out the nesting.
>
> -hilmar
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================

Might be worth looking into for a future BioSQL release, but we have a decent workaround in place for now, as long as it works cross-platform and cross-RDB.

chris

From stefan.kirov at bms.com Thu Apr 17 12:40:14 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 17 Apr 2008 12:40:14 -0400 (Eastern Daylight Time)
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To:
References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
Message-ID:

Hilmar,
sorry, I missed the part after the stack trace... In any case this is still a problem for me after I updated bioperl-live.
I see this with a number of other tests:
t/04swiss.........ok 3/52Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 78.
t/04swiss.........dubious
        Test returned status 2 (wstat 512, 0x200)
DIED. FAILED tests 6-52
        Failed 47/52 tests, 9.62% okay
t/05seqfeature....ok 4/48Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 72.
t/05seqfeature....FAILED tests 9-48
        Failed 40/48 tests, 16.67% okay
t/06comment.......ok
t/07dblink........ok
t/08genbank.......ok
t/09fuzzy2........ok
t/10ensembl.......ok 1/15Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 1420.
t/10ensembl.......dubious
        Test returned status 2 (wstat 512, 0x200)
DIED.
FAILED tests 3-15
        Failed 13/15 tests, 13.33% okay
t/11locuslink.....ok 4/110Can't locate object method "get_dbxrefs" via package "Bio::Annotation::OntologyTerm" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 1.
t/11locuslink.....dubious
        Test returned status 2 (wstat 512, 0x200)
DIED. FAILED tests 5-110
        Failed 106/110 tests, 3.64% okay
t/12ontology......ok 1/738Can't locate object method "get_dbxrefs" via package "Bio::Ontology::GOterm" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 98.
t/12ontology......dubious
        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 5-738
        Failed 734/738 tests, 0.54% okay
t/13remove........ok 2/59Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 145.
t/13remove........FAILED tests 11-59
        Failed 49/59 tests, 16.95% okay
t/14query.........ok
t/15cluster.......ok 3/160Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 1.
t/15cluster.......dubious
        Test returned status 2 (wstat 512, 0x200)
DIED. FAILED tests 6-160
        Failed 155/160 tests, 3.12% okay
t/16obda..........ok

From cjfields at uiuc.edu Thu Apr 17 13:06:39 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 17 Apr 2008 12:06:39 -0500
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To:
References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
Message-ID:

Stefan,

'get_dbxrefs' was introduced in bioperl-live a while back during the feature/annotation rollback detailed here:

http://www.bioperl.org/wiki/Feature_Annotation_rollback

I still think this is an interfering old bioperl (and maybe bioperl-db) installation causing the problems; I had similar issues at one point and had to find and remove the old installation. It might be worth (1) checking 'perldoc -l Bio::Root::Root', which will give the location of the Bio::Root::Root in lib path being used, and (2) using './Build install uninst=1' to remove any old bioperl/bioperl-db installations.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From stefan.kirov at bms.com Thu Apr 17 13:52:22 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 17 Apr 2008 13:52:22 -0400
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To:
References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
Message-ID: <48078E56.9000404@bms.com>

Chris Fields wrote:
> Stefan,
>
> 'get_dbxrefs' was introduced in bioperl-live a while back during the
> feature/annotation rollback detailed here:
>
> http://www.bioperl.org/wiki/Feature_Annotation_rollback
>
Chris was right!
> I still think this is an interfering old bioperl (and maybe
> bioperl-db) installation causing the problems; I had similar issues at
> one point and had to find and remove the old installation. It might
> be worth (1) checking 'perldoc -l Bio::Root::Root',
This is the first thing I did and it seemed fine from the command line. So I checked out a new copy (vs. updating), set PERL5LIB to the minimum which is necessary (Build changes INC), which is
/home/kirovs/bioperl-db/bplive:/stf/sysdev/perl/newlib/perl/lib/5.8/ia64-linux-multi/
(/home/kirovs/bioperl-db/bplive being the fresh copy and the other having Module::Build, etc., but definitely no bioperl). This fixed the problem. I still do not see where the old module came from, but that was a really good guess.
Thanks
Stefan
> which will give the location of the Bio::Root::Root in lib path being
> used, and (2) using './Build install uninst=1' to remove any old
> bioperl/bioperl-db installations.
Unfortunately this is not an option for me.

From hubert.gaynor at yahoo.com Thu Apr 17 20:53:16 2008
From: hubert.gaynor at yahoo.com (Hubert Gaynor)
Date: Thu, 17 Apr 2008 17:53:16 -0700 (PDT)
Subject: [Bioperl-l] Can I use BLAST against a database like MySQL
Message-ID: <130971.67684.qm@web46007.mail.sp1.yahoo.com>

Hi Sean,

I got it. Thank you so much!
Hubert

----- Original Message ----
From: Sean Davis
To: Hubert Gaynor
Sent: Thursday, April 17, 2008 6:36:02 PM
Subject: Re: [Bioperl-l] Can I use BLAST against a database like MySQL

On Thu, Apr 17, 2008 at 2:19 AM, Hubert Gaynor wrote:
> Hi,
>
> As far as I know, before using BLAST to do the alignment the first thing should be done is typing formatdb to construct a database. But I was wondering whether it is possible to construct a database with MySQL which probably will grant the BLAST search a higher speed and make the database management much easier?
>

formatdb is used to make a representation that can be used efficiently by blast. That representation already makes blast faster. MySQL can't be used for such things. As for speeding blast, if you have a multiprocessor machine, you can take advantage of those using blast and increasing the number of processors. Also, while blast is a very versatile program, it is not the only alignment program available. Depending on your needs, you could look at other programs such as blat or gmap that can be 2-3 orders of magnitude faster than blast.

Sean

From Russell.Smithies at agresearch.co.nz Thu Apr 17 21:39:23 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 18 Apr 2008 13:39:23 +1200
Subject: [Bioperl-l] accessing params for custom glyphs?
In-Reply-To: <130971.67684.qm@web46007.mail.sp1.yahoo.com>
References: <130971.67684.qm@web46007.mail.sp1.yahoo.com>
Message-ID:

This is probably more of a Perl OO problem I'm having, but can anyone tell me how to access a parameter when I create a custom glyph?

I've created a panel in the usual way then I add a feature with 'my_glyph' and want to pass the value of -new_parameter to the glyph drawing code.

    $panel->add_track( $feature,
                       -font          => gdSmallFont,
                       -glyph         => 'my_glyph',
                       -height        => 10,
                       -label         => 1,
                       -strand        => "forward",
                       -new_parameter => "test",
                     );

In my_glyph.pm, I have the usual draw_component sub:

    sub draw_component {
        my $self = shift;
        my $gd   = shift;
        my ($x1,$y1,$x2,$y2) = $self->bounds(@_);
        my $fg = $self->fgcolor;
        my $params = $self->??????????   # <<--- how do I access the value of "new_parameter" set in the panel drawing code?

        $gd->line($x1,$y1,$x2,$y2,$fg);
        $gd->line($x1,$y2,$x2,$y1,$fg);
    }

Any ideas?

Thanx,

Russell Smithies

From David.Messina at sbc.su.se Fri Apr 18 05:31:59 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 18 Apr 2008 11:31:59 +0200
Subject: [Bioperl-l] Finding seqs of given domain architecture
In-Reply-To: <628aabb70804170155n4e5dfd81r7020c3e9e11094ff@mail.gmail.com>
References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <200804161336.16879.heikki@sanbi.ac.za> <628aabb70804161112o6610ee1fkfb4b08e74730237d@mail.gmail.com> <1208420674.23342.15.camel@razor.sbc.su.se> <628aabb70804170155n4e5dfd81r7020c3e9e11094ff@mail.gmail.com>
Message-ID: <628aabb70804180231p2b9cef9dwd5441e85c31531fd@mail.gmail.com>

Jacob,

I talked about your question with a colleague of mine who has been working in this area. Below is his reply.

[I'm reposting this *without* the attachment mentioned since the mailing list wouldn't accept it otherwise. If anyone wants a copy of the code, just email me.]

Dave

-------

> 3. Pfam has this capability, i.e. to show all domains with a given
> architecture, but it is difficult to get at the actual sequences or
> even a list of accession numbers.

First, this should be available right away in PfamAlyzer:

http://pfamalyzer.sbc.su.se/pfamalyzer/index.html

although you might need to upgrade your browser to Java 1.6 to get it to work.

If this does not work as suggested (an upgraded version is coming eventually), have a look at the file:

ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/swisspfam.gz

which contains the Pfam architectures for all UniProt sequences. You can parse that to get a file of - correspondences and just filter that to get the accession numbers. (Please find attached a Perl script to do just that.) Under UNIX, you can then just grep this for the domain IDs, (like

grep PF00008 domainArchitectureFile.txt | grep PF00456 > resultFile.txt)

but I am sure there are solutions under other operating systems as well. You could then write a script to parse out the corresponding sequences from the UniProt fasta flatfile, if you wanted, or (again under UNIX) a script to wget them off the web page.

In case your sequences are not in UniProt, consider using HMMER and the Pfam HMM files to assign domains to all sequences in your dataset. I would then parse the HMMER output into the same format as the above, and use the same approach following that.

Hope this helps,

Yours sincerely,

Kristoffer Forslund
krifo at sbc.su.se

From lincoln.stein at gmail.com Fri Apr 18 15:16:19 2008
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Fri, 18 Apr 2008 15:16:19 -0400
Subject: [Bioperl-l] [Gmod-gbrowse] accessing params for custom glyphs?
In-Reply-To:
References: <130971.67684.qm@web46007.mail.sp1.yahoo.com>
Message-ID: <6dce9a0b0804181216q6564e580u8a805ae96c78df2e@mail.gmail.com>

Hi Russell,

It's very simple:

    my $params = $self->option('new_parameter');

Lincoln

On Thu, Apr 17, 2008 at 9:39 PM, Smithies, Russell <Russell.Smithies at agresearch.co.nz> wrote:

> This is probably more of a Perl OO problem I'm having, but can anyone
> tell me how to access a parameter when I create a custom glyph?
>
> I've created a panel in the usual way then I add a feature with
> 'my_glyph' and want to pass the value of -new_parameter to the glyph
> drawing code.
>
> $panel->add_track( $feature,
>                    -font          => gdSmallFont,
>                    -glyph         => 'my_glyph',
>                    -height        => 10,
>                    -label         => 1,
>                    -strand        => "forward",
>                    -new_parameter => "test",
>                  );
>
> In my_glyph.pm, I have the usual draw_component sub:
>
> sub draw_component {
>     my $self = shift;
>     my $gd   = shift;
>     my ($x1,$y1,$x2,$y2) = $self->bounds(@_);
>     my $fg = $self->fgcolor;
>     my $params = $self->??????????  <<--- how do I access the value of
> "new_parameter" set in the panel drawing code?
>
>     $gd->line($x1,$y1,$x2,$y2,$fg);
>     $gd->line($x1,$y2,$x2,$y1,$fg);
> }
>
> Any ideas?
>
> Thanx,
>
> Russell Smithies

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From jason at bioperl.org Fri Apr 18 22:35:10 2008
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 18 Apr 2008 19:35:10 -0700
Subject: [Bioperl-l] index::abstract on win and unix
In-Reply-To: <1208381947.16620.6.camel@kiss-laptop>
References: <1208366718.19084.15.camel@kiss-laptop> <1208381947.16620.6.camel@kiss-laptop>
Message-ID:

Do you want the LOCUS or the ACCESSION?
Do you mean the result is the completely wrong record or just the wrong field?
The accession number is available from the seq's accession_number() method.

-jason

On Apr 16, 2008, at 2:39 PM, Frédéric Romagné wrote:

> Well, if with input file you mean the database used, it's created
> with Bio::Index::GenBank from a ncbi FTP's genbank file.
>
> $id is an accession number read from a file but i chomp the line...
>
> I am trying to install the svn version of bioperl under windows to see
> if there is an improvement.
>
> On Thursday 17 April 2008 at 08:49 +1200, Smithies, Russell wrote:
>> Did you check the format of your input file?
>> i.e. DOS or UNIX line endings?
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Frédéric Romagné
>>> Sent: Thursday, 17 April 2008 5:25 a.m.
>>> To: bioperl-l at lists.open-bio.org
>>> Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix
>>>
>>> Hello,
>>> i made a program which use Bio::Index::GenBank and i tested it under
>>> unix, that worked well.
>>>
>>> But i have to launch it under windows and it seems not to work on.
>>>
>>> Here is the problem :
>>>
>>> my $dbobj = Bio::Index::Abstract->new("Data/$db");
>>> my $seq = $dbobj->get_Seq_by_acc($id);
>>> print $seq->display_id."\n";
>>>
>>> did not print the same number than $id !!! So i don't work on the
>>> sequence expected...
>>>
>>> I use the SVN sources on unix and the Perl package manager for
>>> windows...
>>>
>>> Thanks.

From bioperlanand at yahoo.com Mon Apr 21 03:44:00 2008
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Mon, 21 Apr 2008 00:44:00 -0700 (PDT)
Subject: [Bioperl-l] a question on obtaining HTML formatted Blast output along with the Blast hits image
Message-ID: <372845.37134.qm@web36808.mail.mud.yahoo.com>

Hi everybody,

I would like to obtain an HTML-formatted BLAST report along with a picture of the BLAST hits, as shown on Slide 60 in this pdf:
http://jason.open-bio.org/Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf

I have gotten the HTML output working using "Bio::SearchIO::Writer::HTMLResultWriter".

My question: how do I integrate it with Bio::Graphics to render the BLAST hits image at the correct position in my BioPerl-reformatted HTML file? I ultimately want to be able to display my BLAST output files in a browser.

Here is my code so far:
----------------------------------------------------------------
#!/usr/bin/perl -w
# usage: $0
use strict;
use Bio::SearchIO;
use Bio::SearchIO::Writer::HTMLResultWriter;

# die with a usage hint if no report file is given ($! is not set here)
my $infile = shift or die "usage: $0 blast_report_file\n";

my $searchio   = Bio::SearchIO->new( -format => 'blast', -file => $infile );
my $writerhtml = Bio::SearchIO::Writer::HTMLResultWriter->new();
my $outhtml    = Bio::SearchIO->new( -writer => $writerhtml,
                                     -file   => ">${infile}.html" );

$outhtml->write_result($searchio->next_result);
----------------------------------------------------------------

Thanks in advance,
Anand
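One possible way to approach the question above, offered as a sketch rather than a definitive answer: render the hits with Bio::Graphics into a PNG file, produce the report as a string with the writer's to_string() method, and splice an <img> tag into that string. The helper names (embed_image, hits_png, main) and the output file names are hypothetical. The track options follow the BLAST example in the Bio::Graphics HOWTO. The sketch assumes the generated report contains an opening <body> tag and simply places the image at the top of the page; placing it elsewhere would need a different anchor in the HTML.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# Put an <img> tag right after the opening <body> tag of an HTML page.
# If no <body> tag is found, the tag is prepended instead.
sub embed_image {
    my ($html, $img_src) = @_;
    my $tag = qq{<p><img src="$img_src" alt="BLAST hits overview"></p>};
    return $html if $html =~ s{(<body\b[^>]*>)}{$1\n$tag\n}i;
    return "$tag\n$html";
}

# Draw the query as a ruler plus one track of hits; returns PNG data.
# BioPerl is loaded lazily so embed_image() can be tried without it.
sub hits_png {
    my ($result) = @_;
    require Bio::Graphics;
    require Bio::SeqFeature::Generic;

    my $panel = Bio::Graphics::Panel->new(
        -length    => $result->query_length,
        -width     => 800,
        -pad_left  => 10,
        -pad_right => 10,
    );
    my $query = Bio::SeqFeature::Generic->new(
        -start        => 1,
        -end          => $result->query_length,
        -display_name => $result->query_name,
    );
    $panel->add_track($query, -glyph => 'arrow', -tick => 2,
                      -double => 1, -label => 1);

    my $track = $panel->add_track(
        -glyph      => 'graded_segments',
        -label      => 1,
        -connector  => 'dashed',
        -bgcolor    => 'blue',
        -sort_order => 'high_score',
    );
    # hits() and hsps() return plain lists, so the iterators used later
    # by the HTML writer are left untouched.
    for my $hit ($result->hits) {
        my $feature = Bio::SeqFeature::Generic->new(
            -score        => $hit->raw_score,
            -display_name => $hit->name,
        );
        $feature->add_sub_SeqFeature($_, 'EXPAND') for $hit->hsps;
        $track->add_feature($feature);
    }
    return $panel->png;
}

sub main {
    my ($infile) = @_;
    require Bio::SearchIO;
    require Bio::SearchIO::Writer::HTMLResultWriter;

    my $in     = Bio::SearchIO->new(-format => 'blast', -file => $infile);
    my $result = $in->next_result or die "no BLAST result in $infile\n";

    my $png_file = "$infile.png";
    open my $png, '>', $png_file or die "cannot write $png_file: $!\n";
    binmode $png;
    print {$png} hits_png($result);
    close $png;

    my $html = Bio::SearchIO::Writer::HTMLResultWriter->new->to_string($result);
    open my $out, '>', "$infile.html" or die "cannot write $infile.html: $!\n";
    # The image sits next to the HTML file, so reference it by base name.
    print {$out} embed_image($html, basename($png_file));
    close $out;
}

main(@ARGV) if @ARGV;
```

Saved under a name of your choice and run with a BLAST report as its only argument, this should write the report's .png and .html files side by side; only the first result in the report is handled.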
From cjfields at uiuc.edu Mon Apr 21 11:07:17 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 Apr 2008 10:07:17 -0500
Subject: [Bioperl-l] [Proposed change] HSP::frame()
Message-ID:

I have noticed (in relation to bug 2485, http://bugzilla.open-bio.org/show_bug.cgi?id=2485) that the Bio::Search::HSP::GenericHSP frame() method is implemented very differently from strand(), start(), end(), and most other HSP methods.

The current behavior depends on both the arguments and the calling context: it returns an array of two values (query and hit frame) in list context, the query frame if one value is passed, and, if no value is passed, the subject frame in scalar context and both frames in list context. The latter behavior is unfortunately what leads to the bug mentioned above. The method is also implied to be a getter/setter, but the implementation doesn't allow that; it always sets to the instantiated values (in fact, repeatedly so).

To fix that and make the interface more consistent, I am changing frame() to behave like strand(), etc.: the first argument is one of 'query', 'subject', 'hit' or 'list' (default 'query' if no argument is given), and the remaining arguments are optional values to set, in query/subject order.

One issue: I can catch and imitate most of the older behavior with a few additional checks. The one exception is the old default return value of frame(), which is now always the query frame (not context-dependent). If needed we can change the default to 'hit', but I believe method consistency is probably the better route, and I can always add a warning when the old API is used, indicating the change.

I am also modifying HSPTableWriter to print frame_hit and frame_query (previously it only printed 'frame', which implied the hit frame). I can see this being an issue for anyone expecting 'frame' instead of 'frame_hit'; I could hack in a fix for that if needed.

If there aren't any objections or suggestions, I'll commit this in the next day or two.
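For readers following the proposal, here is a minimal sketch of the calling convention described above. It is not the GenericHSP code: Sketch::HSP is a hypothetical stand-in class, and the hash keys and error message are assumptions made only for illustration.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in; the real method lives in
    # Bio::Search::HSP::GenericHSP and may differ in detail.
    package Sketch::HSP;

    sub new {
        my ($class, %args) = @_;
        return bless { query_frame => $args{query_frame},
                       hit_frame   => $args{hit_frame} }, $class;
    }

    # frame($which, $query_value, $hit_value)
    # $which is 'query' (default), 'hit', 'subject' or 'list';
    # any further arguments set the frames, in query/hit order.
    sub frame {
        my ($self, $which, @values) = @_;
        $which = defined $which ? lc $which : 'query';
        if (@values) {
            my ($query, $hit) = @values;
            $self->{query_frame} = $query if defined $query;
            $self->{hit_frame}   = $hit   if defined $hit;
        }
        return $self->{query_frame} if $which eq 'query';
        return $self->{hit_frame}   if $which eq 'hit' || $which eq 'subject';
        return ($self->{query_frame}, $self->{hit_frame}) if $which eq 'list';
        die "unknown frame type '$which'\n";
    }
}

my $hsp = Sketch::HSP->new(query_frame => 1, hit_frame => 2);
print $hsp->frame(), "\n";                     # query frame, the new default
print $hsp->frame('hit'), "\n";                # hit frame
print join(',', $hsp->frame('list')), "\n";    # both, query first
$hsp->frame('query', 0);                       # set the query frame only
```

The point of the sketch is that the return value no longer depends on scalar or list context, only on the first argument.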
chris From cjfields at uiuc.edu Mon Apr 21 11:32:59 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 21 Apr 2008 10:32:59 -0500 Subject: [Bioperl-l] Assembly.t test fails Message-ID: I'm getting some significant test failures in bioperl-live for Bio::Assembly: t/Assembly...... 1..35 ok 1 - use Bio::Assembly::IO; ok 2 - The object isa Bio::Assembly::IO ok 3 - The object isa Bio::Assembly::Scaffold ok 4 not ok 5 ok 6 - The object isa Bio::AnnotationCollectionI ok 7 - no annotations in Annotation collection? ok 8 # Failed test at t/Assembly.t line 35. # got: 'NoName' # expected: 'test' Can't locate object method "get_contig_seq_ids" via package "Bio::Assembly::Contig" at /Users/cjfields/bioperl/bioperl-live/blib/ lib/Bio/Assembly/Scaffold.pm line 189, line 733. # Looks like you planned 35 tests but only ran 8. # Looks like you failed 1 test of 8 run. # Looks like your test died just after 8. Dubious, test returned 255 (wstat 65280, 0xff00) Failed 28/35 subtests Test Summary Report ------------------- t/Assembly.t (Wstat: 65280 Tests: 8 Failed: 1) Failed test: 5 Non-zero exit status: 255 Parse errors: Bad plan. You planned 35 tests but ran 8. Files=1, Tests=8, 0 wallclock secs ( 0.01 usr 0.00 sys + 0.22 cusr 0.04 csys = 0.27 CPU) Result: FAIL Failed 1/1 test programs. 1/8 subtests failed. chris Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Mon Apr 21 11:44:21 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 21 Apr 2008 10:44:21 -0500 Subject: [Bioperl-l] Assembly.t test fails In-Reply-To: References: Message-ID: <2F199628-717E-4F88-85D7-408BD7BBE16D@uiuc.edu> Scratch that, figured it out (easy fix). chris On Apr 21, 2008, at 10:32 AM, Chris Fields wrote: > I'm getting some significant test failures in bioperl-live for > Bio::Assembly: > > t/Assembly...... 
> 1..35 > ok 1 - use Bio::Assembly::IO; > ok 2 - The object isa Bio::Assembly::IO > ok 3 - The object isa Bio::Assembly::Scaffold > ok 4 > not ok 5 > ok 6 - The object isa Bio::AnnotationCollectionI > ok 7 - no annotations in Annotation collection? > ok 8 > > # Failed test at t/Assembly.t line 35. > # got: 'NoName' > # expected: 'test' > Can't locate object method "get_contig_seq_ids" via package > "Bio::Assembly::Contig" at /Users/cjfields/bioperl/bioperl-live/blib/ > lib/Bio/Assembly/Scaffold.pm line 189, line 733. > # Looks like you planned 35 tests but only ran 8. > # Looks like you failed 1 test of 8 run. > # Looks like your test died just after 8. > Dubious, test returned 255 (wstat 65280, 0xff00) > Failed 28/35 subtests > > Test Summary Report > ------------------- > t/Assembly.t (Wstat: 65280 Tests: 8 Failed: 1) > Failed test: 5 > Non-zero exit status: 255 > Parse errors: Bad plan. You planned 35 tests but ran 8. > Files=1, Tests=8, 0 wallclock secs ( 0.01 usr 0.00 sys + 0.22 > cusr 0.04 csys = 0.27 CPU) > Result: FAIL > Failed 1/1 test programs. 1/8 subtests failed. > > > chris > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From frederic.romagne at gmail.com Mon Apr 21 11:53:11 2008
From: frederic.romagne at gmail.com (Frédéric Romagné)
Date: Mon, 21 Apr 2008 10:53:11 -0500
Subject: [Bioperl-l] index::abstract on win and unix
In-Reply-To:
References: <1208366718.19084.15.camel@kiss-laptop> <1208381947.16620.6.camel@kiss-laptop>
Message-ID: <1208793191.25906.9.camel@kiss-laptop>

In fact, I want the whole Bio::Seq object, but I verified that the ACCESSION and the LOCUS are the same in my GenBank files.

I saw that the program sometimes reports that it cannot find the entry:

if( !defined $seq ) {
    warn("Sequence $id in Database $db is not present\n");
}

I suspect that it is the make_index function that does not work properly on Windows, rather than the get_Seq_by_acc function...

On Friday 18 April 2008 at 19:35 -0700, Jason Stajich wrote:
> do you want the LOCUS or the ACCESSION?
> Do you mean the result is the completely wrong record or just the
> wrong field?
> accession number is available from the seq's accession_number() method.
> -jason
> On Apr 16, 2008, at 2:39 PM, Frédéric Romagné wrote:
>
> > Well, if with input file you mean the database used, it's created
> > with Bio::Index::GenBank from a ncbi FTP's genbank file.
> >
> > $id is an accession number read from a file but i chomp the line...
> >
> > I am trying to install the svn version of bioperl under windows to see
> > if there is an improvement.
> >
> > On Thursday 17 April 2008 at 08:49 +1200, Smithies, Russell wrote:
> >> Did you check the format of your input file?
> >> i.e. DOS or UNIX line endings?
> >>
> >>> -----Original Message-----
> >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Frédéric Romagné
> >>> Sent: Thursday, 17 April 2008 5:25 a.m.
> >>> To: bioperl-l at lists.open-bio.org > >>> Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix > >>> > >>> Hello, > >>> i made a program which use Bio::Index::GenBank and i tested it under > >>> unix, that worked well. > >>> > >>> But i have to launch it under windows and it seems not to work on. > >>> > >>> Here is the problem : > >>> > >>> my $dbobj = Bio::Index::Abstract->new("Data/$db"); > >>> my $seq = $dbobj->get_Seq_by_acc($id); > >>> print $seq->display_id."\n"; > >>> > >>> did not print the same number than $id !!! So i don't work on the > >>> sequence expected... > >>> > >>> I use the SVN sources on unix and the Perl package manager for > >>> windows... > >>> > >>> Thanks. > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> ===================================================================== > >> == > >> Attention: The information contained in this message and/or > >> attachments > >> from AgResearch Limited is intended only for the persons or entities > >> to which it is addressed and may contain confidential and/or > >> privileged > >> material. Any review, retransmission, dissemination or other use > >> of, or > >> taking of any action in reliance upon, this information by persons or > >> entities other than the intended recipients is prohibited by > >> AgResearch > >> Limited. If you have received this message in error, please notify > >> the > >> sender immediately. 
> >> ===================================================================== > >> == > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ewijaya at gmail.com Tue Apr 22 10:03:07 2008 From: ewijaya at gmail.com (Edward Wijaya) Date: Tue, 22 Apr 2008 22:03:07 +0800 Subject: [Bioperl-l] BioPerl Module to Parse BLAT alignment output Message-ID: <3521d3670804220703u4d8565c8q604036727aedf0a8@mail.gmail.com> Hi, Is there any module that can parse the following output of BLAT. This is taken from UCSC browser. The idea is to parse it and then extract the conserved block of aligned sequences. __DATA__ Alignment block 3 of 135 in window, 5860248 - 5860300, 53 bps B D D. melanogaster tgtg----tatttatgt-tttaaataaaggt-------tttctaaata---cgaaatttcaaatttaa B D D. simulans tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cgcaattttaaatttaa B D D. sechellia tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cccaattttaaatttaa B D D. yakuba tgtg----tatttatgt-tcttaataaaggt-------ttcctaaataa-ttcaaaatttaaattaaa D. erecta tgtg----tgtttatgt-ttttaataaaggt-------tttctaaataa--tcgaaattcatttcaaa D. ananassae taag----tttttatgtattttaaaatatag-------aaaataaata---aaaaaaattgaact--- D. pseudoobscura tata----ccagtacac-cttatatg------------tttttaaata-------------------- B D D. persimilis tata----ccagtacac-attatatg------------tttttaaata-------------------- D. willistoni aaaaaagttatttgaat-ttggaata------------taccaaaacatgttggaaatt------gaa D. virilis -------------gatt-ttataataaaattgcgctaatttctaa------------tttacgttaaa D. mojavensis -------------tagt-ccttaatataaatataatattaaataaata-------cttttaagttaaa D. grimshawi ==================================================================== T. castaneum ==================================================================== Inserts between block 3 and 4 in window D. pseudoobscura 2008bp B D D. persimilis 1421bp D. virilis 5bp D. 
mojavensis 4640bp Alignment block 4 of 135 in window, 5860301 - 5860344, 44 bps B D D. melanogaster ----tgggtagcagcgttgccagat--------------------aaagggacatgtttactggctga B D D. simulans ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga B D D. sechellia ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga B D D. yakuba ----tgagtaccaatgctgccagat-------------ctttgtaaagcggtaatgtttgctggctga D. erecta ----t-----ttaatgttgccagat-------------ctgcgtaaggcgctcatgttggctggctga D. pseudoobscura ==================================================================== B D D. persimilis ==================================================================== D. willistoni ----aggattacgaagttcctttat-------------------aaag-------------------- D. virilis gactagtttaatatctcagcccgttaagctaactgttactttttacagtattcgcgccattttgc--- D. mojavensis ==================================================================== D. grimshawi ==================================================================== T. castaneum ==================================================================== __ END__ From cjfields at uiuc.edu Tue Apr 22 10:22:45 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 22 Apr 2008 09:22:45 -0500 Subject: [Bioperl-l] BioPerl Module to Parse BLAT alignment output In-Reply-To: <3521d3670804220703u4d8565c8q604036727aedf0a8@mail.gmail.com> References: <3521d3670804220703u4d8565c8q604036727aedf0a8@mail.gmail.com> Message-ID: <766FDF9E-9F7B-4826-B7FA-87DF3B074EBC@uiuc.edu> A quick grep of bioperl-live gets me Bio::SearchIO::blast, Bio::SearchIO::axt, Bio::SearchIO::psl, Bio::Tools::Blat, and Bio::Tools::WebBlat. Haven't looked at the docs but it's a start! chris On Apr 22, 2008, at 9:03 AM, Edward Wijaya wrote: > Hi, > > Is there any module that can parse the following output > of BLAT. This is taken from UCSC browser. > > The idea is to parse it and then extract the conserved block > of aligned sequences. 
> > > __DATA__ > Alignment block 3 of 135 in window, 5860248 - 5860300, 53 bps > B D D. melanogaster > tgtg----tatttatgt-tttaaataaaggt-------tttctaaata---cgaaatttcaaatttaa > B D D. simulans > tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cgcaattttaaatttaa > B D D. sechellia > tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cccaattttaaatttaa > B D D. yakuba > tgtg----tatttatgt-tcttaataaaggt-------ttcctaaataa-ttcaaaatttaaattaaa > D. erecta > tgtg----tgtttatgt-ttttaataaaggt-------tttctaaataa--tcgaaattcatttcaaa > D. ananassae > taag----tttttatgtattttaaaatatag-------aaaataaata---aaaaaaattgaact--- > D. pseudoobscura > tata----ccagtacac-cttatatg------------tttttaaata-------------------- > B D D. persimilis > tata----ccagtacac-attatatg------------tttttaaata-------------------- > D. willistoni > aaaaaagttatttgaat-ttggaata------------taccaaaacatgttggaaatt------gaa > D. virilis > -------------gatt-ttataataaaattgcgctaatttctaa------------tttacgttaaa > D. mojavensis > -------------tagt-ccttaatataaatataatattaaataaata-------cttttaagttaaa > D. grimshawi > ==================================================================== > T. castaneum > ==================================================================== > > Inserts between block 3 and 4 in window > D. pseudoobscura 2008bp > B D D. persimilis 1421bp > D. virilis 5bp > D. mojavensis 4640bp > > Alignment block 4 of 135 in window, 5860301 - 5860344, 44 bps > B D D. melanogaster > ----tgggtagcagcgttgccagat--------------------aaagggacatgtttactggctga > B D D. simulans > ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga > B D D. sechellia > ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga > B D D. yakuba > ----tgagtaccaatgctgccagat-------------ctttgtaaagcggtaatgtttgctggctga > D. erecta > ----t-----ttaatgttgccagat-------------ctgcgtaaggcgctcatgttggctggctga > D. pseudoobscura > ==================================================================== > B D D. 
persimilis
> ====================================================================
> D. willistoni
> ----aggattacgaagttcctttat-------------------aaag--------------------
> D. virilis
> gactagtttaatatctcagcccgttaagctaactgttactttttacagtattcgcgccattttgc---
> D. mojavensis
> ====================================================================
> D. grimshawi
> ====================================================================
> T. castaneum
> ====================================================================
>
> __ END__
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From jason at bioperl.org Tue Apr 22 14:49:32 2008
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 22 Apr 2008 11:49:32 -0700
Subject: [Bioperl-l] Fwd: [blast-announce] New BLAST URL available at the NCBI
References:
Message-ID:

Does anyone want to take a look at how to use these URLs in the RemoteBlast module, if the interface is the same?

-jason

Begin forwarded message:

> From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]"
> Date: April 22, 2008 11:35:04 AM PDT
> To:
> Subject: [blast-announce] New BLAST URL available at the NCBI
>
> New BLAST URL available at the NCBI
>
> The NCBI has activated a new URL for BLAST searches at the NCBI:
> http://blast.ncbi.nlm.nih.gov.
>
> Searches sent to this URL can take advantage of a larger number of
> machines for searches and the system has a better overall fault
> tolerance.
>
> We recommend migration of all BLAST links and bookmarks (e.g.,
> http://www.ncbi.nlm.nih.gov/BLAST/ and
> http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi) to the new URL.
>
> Links on the NCBI and BLAST home pages will start to change in the
> coming weeks.
>
> At this point in time the plans are to also maintain the current BLAST
> URL.
>

From jason at bioperl.org Tue Apr 22 14:51:08 2008
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 22 Apr 2008 11:51:08 -0700
Subject: [Bioperl-l] BioPerl Module to Parse BLAT alignment output
In-Reply-To: <766FDF9E-9F7B-4826-B7FA-87DF3B074EBC@uiuc.edu>
References: <3521d3670804220703u4d8565c8q604036727aedf0a8@mail.gmail.com> <766FDF9E-9F7B-4826-B7FA-87DF3B074EBC@uiuc.edu>
Message-ID: <6C812413-B375-427B-9AF8-5A0AA6167CC8@bioperl.org>

If you get it as axt it should parse fine in SearchIO, but that is pairwise. If you can get alignment blocks, I can't remember what format this is from UCSC.
MSAs are going to be better handled through Bio::AlignIO though, so it might be better to build a parser on that.

On Apr 22, 2008, at 7:22 AM, Chris Fields wrote:

> A quick grep of bioperl-live gets me Bio::SearchIO::blast,
> Bio::SearchIO::axt, Bio::SearchIO::psl, Bio::Tools::Blat, and
> Bio::Tools::WebBlat. Haven't looked at the docs but it's a start!
>
> chris
>
> On Apr 22, 2008, at 9:03 AM, Edward Wijaya wrote:
>
>> Hi,
>>
>> Is there any module that can parse the following output
>> of BLAT. This is taken from UCSC browser.
>>
>> The idea is to parse it and then extract the conserved block
>> of aligned sequences.
>>
>>
>> __DATA__
>> Alignment block 3 of 135 in window, 5860248 - 5860300, 53 bps
>> B D D. melanogaster
>> tgtg----tatttatgt-tttaaataaaggt-------tttctaaata---cgaaatttcaaatttaa
>> B D D. simulans
>> tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cgcaattttaaatttaa
>> B D D. sechellia
>> tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cccaattttaaatttaa
>> B D D. yakuba
>> tgtg----tatttatgt-tcttaataaaggt-------ttcctaaataa-ttcaaaatttaaattaaa
>> D. erecta
>> tgtg----tgtttatgt-ttttaataaaggt-------tttctaaataa--tcgaaattcatttcaaa
>> D. ananassae
>> taag----tttttatgtattttaaaatatag-------aaaataaata---aaaaaaattgaact---
>> D. pseudoobscura
>> tata----ccagtacac-cttatatg------------tttttaaata--------------------
>> B D D.
persimilis >> tata----ccagtacac-attatatg------------tttttaaata-------------------- >> D. willistoni >> aaaaaagttatttgaat-ttggaata------------taccaaaacatgttggaaatt------gaa >> D. virilis >> -------------gatt-ttataataaaattgcgctaatttctaa------------tttacgttaaa >> D. mojavensis >> -------------tagt-ccttaatataaatataatattaaataaata-------cttttaagttaaa >> D. grimshawi >> ==================================================================== >> T. castaneum >> ==================================================================== >> >> Inserts between block 3 and 4 in window >> D. pseudoobscura 2008bp >> B D D. persimilis 1421bp >> D. virilis 5bp >> D. mojavensis 4640bp >> >> Alignment block 4 of 135 in window, 5860301 - 5860344, 44 bps >> B D D. melanogaster >> ----tgggtagcagcgttgccagat--------------------aaagggacatgtttactggctga >> B D D. simulans >> ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga >> B D D. sechellia >> ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga >> B D D. yakuba >> ----tgagtaccaatgctgccagat-------------ctttgtaaagcggtaatgtttgctggctga >> D. erecta >> ----t-----ttaatgttgccagat-------------ctgcgtaaggcgctcatgttggctggctga >> D. pseudoobscura >> ==================================================================== >> B D D. persimilis >> ==================================================================== >> D. willistoni >> ----aggattacgaagttcctttat-------------------aaag-------------------- >> D. virilis >> gactagtttaatatctcagcccgttaagctaactgttactttttacagtattcgcgccattttgc--- >> D. mojavensis >> ==================================================================== >> D. grimshawi >> ==================================================================== >> T. 
castaneum >> ==================================================================== >> >> __ END__ >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Tue Apr 22 15:02:14 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 22 Apr 2008 14:02:14 -0500 Subject: [Bioperl-l] Fwd: [blast-announce] New BLAST URL available at the NCBI In-Reply-To: References: Message-ID: <13C2AD96-8297-40DD-ADCC-B2BEC923B9E0@uiuc.edu> They work exactly the same as the old URL, at least on the surface; I haven't tried changing many URLAPI parameters. I went ahead and changed the URL in RemoteBlast to http://blast.ncbi.nlm.nih.gov/Blast.cgi as it works with RemoteBlast.t. chris On Apr 22, 2008, at 1:49 PM, Jason Stajich wrote: > Does anyone want to take a look at how to use these URLs in the > RemoteBlast module, if the interface is the same? > > -jason > > Begin forwarded message: > >> From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]" >> >> Date: April 22, 2008 11:35:04 AM PDT >> To: >> Subject: [blast-announce] New BLAST URL available at the NCBI >> >> New BLAST URL available at the NCBI >> >> >> >> The NCBI has activated a new URL for BLAST searches at the NCBI: >> http://blast.ncbi.nlm.nih.gov. >> >> >> >> Searches sent to this URL can take advantage of a larger number of >> machines for searches and the system has a better overall fault >> tolerance. >> >> >> >> We recommend migration of all BLAST links and bookmarks (e.g., >> http://www.ncbi.nlm.nih.gov/BLAST/ and >> http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi) to the new URL. 
>> >> >> >> Links on the NCBI and BLAST home pages will start to change in the >> coming weeks. >> >> >> >> At this point in time the plans are to also maintain the current >> BLAST >> URL. >> >> >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue Apr 22 14:58:40 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 22 Apr 2008 13:58:40 -0500 Subject: [Bioperl-l] BioPerl Module to Parse BLAT alignment output In-Reply-To: <6C812413-B375-427B-9AF8-5A0AA6167CC8@bioperl.org> References: <3521d3670804220703u4d8565c8q604036727aedf0a8@mail.gmail.com> <766FDF9E-9F7B-4826-B7FA-87DF3B074EBC@uiuc.edu> <6C812413-B375-427B-9AF8-5A0AA6167CC8@bioperl.org> Message-ID: <43344C89-6B4D-4360-AF56-A6FDD065FFF3@uiuc.edu> Related to that, I have thought about building a parser for some of the query-anchored alignments produced by blastall, just haven't had time to devote to it. One of these days... chris On Apr 22, 2008, at 1:51 PM, Jason Stajich wrote: > if you get it as axt it should parse fine in SearchIO but that is > pairwise, if you can get an alignment blocks I can't remember what > format this is from UCSC. > MSAs are going to be better handed through Bio::AlignIO though so it > might be better to build a parser on that. > > On Apr 22, 2008, at 7:22 AM, Chris Fields wrote: > >> A quick grep of bioperl-live gets me Bio::SearchIO::blast, >> Bio::SearchIO::axt, Bio::SearchIO::psl, Bio::Tools::Blat, and >> Bio::Tools::WebBlat. Haven't looked at the docs but it's a start! >> >> chris >> >> On Apr 22, 2008, at 9:03 AM, Edward Wijaya wrote: >> >>> Hi, >>> >>> Is there any module that can parse the following output >>> of BLAT. This is taken from UCSC browser. 
>>> >>> The idea is to parse it and then extract the conserved block >>> of aligned sequences. >>> >>> >>> __DATA__ >>> Alignment block 3 of 135 in window, 5860248 - 5860300, 53 bps >>> B D D. melanogaster >>> tgtg----tatttatgt-tttaaataaaggt-------tttctaaata---cgaaatttcaaatttaa >>> B D D. simulans >>> tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cgcaattttaaatttaa >>> B D D. sechellia >>> tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cccaattttaaatttaa >>> B D D. yakuba >>> tgtg----tatttatgt-tcttaataaaggt-------ttcctaaataa-ttcaaaatttaaattaaa >>> D. erecta >>> tgtg----tgtttatgt-ttttaataaaggt-------tttctaaataa--tcgaaattcatttcaaa >>> D. ananassae >>> taag----tttttatgtattttaaaatatag-------aaaataaata---aaaaaaattgaact--- >>> D. pseudoobscura >>> tata----ccagtacac-cttatatg------------tttttaaata-------------------- >>> B D D. persimilis >>> tata----ccagtacac-attatatg------------tttttaaata-------------------- >>> D. willistoni >>> aaaaaagttatttgaat-ttggaata------------taccaaaacatgttggaaatt------gaa >>> D. virilis >>> -------------gatt-ttataataaaattgcgctaatttctaa------------tttacgttaaa >>> D. mojavensis >>> -------------tagt-ccttaatataaatataatattaaataaata-------cttttaagttaaa >>> D. grimshawi >>> ==================================================================== >>> T. castaneum >>> ==================================================================== >>> >>> Inserts between block 3 and 4 in window >>> D. pseudoobscura 2008bp >>> B D D. persimilis 1421bp >>> D. virilis 5bp >>> D. mojavensis 4640bp >>> >>> Alignment block 4 of 135 in window, 5860301 - 5860344, 44 bps >>> B D D. melanogaster >>> ----tgggtagcagcgttgccagat--------------------aaagggacatgtttactggctga >>> B D D. simulans >>> ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga >>> B D D. sechellia >>> ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga >>> B D D. yakuba >>> ----tgagtaccaatgctgccagat-------------ctttgtaaagcggtaatgtttgctggctga >>> D. 
erecta >>> ----t-----ttaatgttgccagat-------------ctgcgtaaggcgctcatgttggctggctga >>> D. pseudoobscura >>> ==================================================================== >>> B D D. persimilis >>> ==================================================================== >>> D. willistoni >>> ----aggattacgaagttcctttat-------------------aaag-------------------- >>> D. virilis >>> gactagtttaatatctcagcccgttaagctaactgttactttttacagtattcgcgccattttgc--- >>> D. mojavensis >>> ==================================================================== >>> D. grimshawi >>> ==================================================================== >>> T. castaneum >>> ==================================================================== >>> >>> __ END__ >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bioperlanand at yahoo.com Wed Apr 23 02:02:30 2008 From: bioperlanand at yahoo.com (Anand Venkatraman) Date: Tue, 22 Apr 2008 23:02:30 -0700 (PDT) Subject: [Bioperl-l] Question on integrating Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter Message-ID: <946658.12337.qm@web36802.mail.mud.yahoo.com> Hi everybody, I would like to use Bio::Graphics in conjunction with Bio::SearchIO::Writer::HTMLResultWriter to obtain a HTML formatted blast report output along with an image of the blast hits as shown on Slide 60 in this pdf: http://jason.open-bio.org/Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf I am able to get the HTML output using "Bio::SearchIO::Writer::HTMLResultWriter" and I am able to get the image using the examples outlined in the Bio::Graphics HOWTO: http://www.bioperl.org/wiki/HOWTO:Graphics My question: How do I integrate Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter to render the blast hits image at the correct position in my BioPerl reformatted html file. 
I also found that someone else has asked something similar to whatever I am asking & is listed under the "Orphans, Leftovers" category in the ListSummary:April 26-May 9,2006 document: http://www.bioperl.org/wiki/ListSummary:April_26-May_9%2C2006#Orphans.2C_Leftovers Here is my code so far: ---------------------------------------------------------------- #!/usr/bin/perl -w # usage: $0 use strict; use Bio::SearchIO; use Bio::SearchIO::Writer::HTMLResultWriter; my $infile = shift or die $!; my $searchio = new Bio::SearchIO( -format => 'blast',-file => $infile); my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter(); my $outhtml = new Bio::SearchIO(-writer => $writerhtml, -file => ">${infile}.html"); $outhtml->write_result($searchio->next_result); ---------------------------------------------------------------- Thanks in advance, Anand --------------------------------- Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. From jason at bioperl.org Wed Apr 23 02:15:28 2008 From: jason at bioperl.org (Jason Stajich) Date: Tue, 22 Apr 2008 23:15:28 -0700 Subject: [Bioperl-l] Question on integrating Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter In-Reply-To: <946658.12337.qm@web36802.mail.mud.yahoo.com> References: <946658.12337.qm@web36802.mail.mud.yahoo.com> Message-ID: <952B0A4E-8A14-4E8E-B36D-14596B20E330@bioperl.org> Basically you want to inject your own IMG tags into the file with these routines: $writerhtml->start_report(\&my_start_report); $writerhtml->title(\&my_title); $writerhtml->hit_link_align(\&my_hit_link_align); $writerhtml->hit_link_desc(\&my_hit_link_desc); fgblast shows a way to do this in part. It relies on Gbrowse to generate the image but you can replace the gbrowse_img reference to your own image generating software. 
http://people.genome.duke.edu/~jes12/software/scripts/fgblast -jason On Apr 22, 2008, at 11:02 PM, Anand Venkatraman wrote: > Hi everybody, > > I would like to use Bio::Graphics in conjunction with > Bio::SearchIO::Writer::HTMLResultWriter to obtain a HTML formatted > blast report output along with an image of the blast hits as shown > on Slide 60 in this pdf: http://jason.open-bio.org/ > Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf > > I am able to get the HTML output using > "Bio::SearchIO::Writer::HTMLResultWriter" and I am able to get the > image using the examples outlined in the Bio::Graphics HOWTO: > http://www.bioperl.org/wiki/HOWTO:Graphics > > My question: How do I integrate Bio::Graphics with > Bio::SearchIO::Writer::HTMLResultWriter to render the blast hits > image at the correct position in my BioPerl reformatted html file. > > I also found that someone else has asked something similar to > whatever I am asking & is listed under the "Orphans, Leftovers" > category in the ListSummary:April 26-May 9,2006 document: > http://www.bioperl.org/wiki/ListSummary:April_26-May_9% > 2C2006#Orphans.2C_Leftovers > > Here is my code so far: > ---------------------------------------------------------------- > #!/usr/bin/perl -w > # usage: $0 > use strict; > use Bio::SearchIO; > use Bio::SearchIO::Writer::HTMLResultWriter; > > my $infile = shift or die $!; > > my $searchio = new Bio::SearchIO( -format => 'blast',-file => > $infile); > my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter(); > my $outhtml = new Bio::SearchIO(-writer => $writerhtml, > -file => ">$ > {infile}.html"); > > $outhtml->write_result($searchio->next_result); > ---------------------------------------------------------------- > > Thanks in advance, > > Anand > > > --------------------------------- > Be a better friend, newshound, and know-it-all with Yahoo! Mobile. > Try it now. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bamboowarrior at gmail.com Wed Apr 23 15:39:21 2008 From: bamboowarrior at gmail.com (Arkady) Date: Wed, 23 Apr 2008 14:39:21 -0500 Subject: [Bioperl-l] WebBlat, where'd it go? Message-ID: <91656c3f0804231239j159fb9d8q7bae51ba5cbcd442@mail.gmail.com> Hi folks, I'm trying to use BioPerl to run a BLAT search on the four primate genomes on UCSC. I understand that the proper tool for this is Bio::Tools::WebBlat. Unfortunately, it doesn't appear to be in my bioperl distribution (nor do I even know how to figure out what version that is, unfortunately, though it's a very recent install -- a month ago?). I also can't find it on CPAN. Is this deprecated? Has something else replaced it? Or are we always supposed to run local BLAT? Thanks. John Woods Institute for Cellular and Molecular Biology The University of Texas at Austin From spiros at lokku.com Wed Apr 23 15:48:12 2008 From: spiros at lokku.com (Spiros Denaxas) Date: Wed, 23 Apr 2008 20:48:12 +0100 Subject: [Bioperl-l] WebBlat, where'd it go? In-Reply-To: <91656c3f0804231239j159fb9d8q7bae51ba5cbcd442@mail.gmail.com> References: <91656c3f0804231239j159fb9d8q7bae51ba5cbcd442@mail.gmail.com> Message-ID: Hey, a quick look at the list of deprecated modules reveals that it has indeed been removed, http://www.bioperl.org/wiki/Deprecated_modules Spiros On Wed, Apr 23, 2008 at 8:39 PM, Arkady wrote: > Hi folks, > > I'm trying to use BioPerl to run a BLAT search on the four primate > genomes on UCSC. I understand that the proper tool for this is > Bio::Tools::WebBlat. Unfortunately, it doesn't appear to be in my > bioperl distribution (nor do I even know how to figure out what > version that is, unfortunately, though it's a very recent install -- a > month ago?). I also can't find it on CPAN. Is this deprecated? Has > something else replaced it? 
Or are we always supposed to run local > BLAT? > > Thanks. > > John Woods > > Institute for Cellular and Molecular Biology > The University of Texas at Austin > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Wed Apr 23 15:56:14 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 23 Apr 2008 14:56:14 -0500 Subject: [Bioperl-l] WebBlat, where'd it go? In-Reply-To: <91656c3f0804231239j159fb9d8q7bae51ba5cbcd442@mail.gmail.com> References: <91656c3f0804231239j159fb9d8q7bae51ba5cbcd442@mail.gmail.com> Message-ID: It's no longer maintained (deprecated); see the following for an explanation: http://article.gmane.org/gmane.comp.lang.perl.bio.general/13545 Basically, only local BLAT searches are supported through BioPerl. chris On Apr 23, 2008, at 2:39 PM, Arkady wrote: > Hi folks, > > I'm trying to use BioPerl to run a BLAT search on the four primate > genomes on UCSC. I understand that the proper tool for this is > Bio::Tools::WebBlat. Unfortunately, it doesn't appear to be in my > bioperl distribution (nor do I even know how to figure out what > version that is, unfortunately, though it's a very recent install -- a > month ago?). I also can't find it on CPAN. Is this deprecated? Has > something else replaced it? Or are we always supposed to run local > BLAT? > > Thanks. > > John Woods > > Institute for Cellular and Molecular Biology > The University of Texas at Austin > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bioperlanand at yahoo.com Wed Apr 23 19:05:27 2008 From: bioperlanand at yahoo.com (Anand Venkatraman) Date: Wed, 23 Apr 2008 16:05:27 -0700 (PDT) Subject: [Bioperl-l] Question on integrating Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter In-Reply-To: <952B0A4E-8A14-4E8E-B36D-14596B20E330@bioperl.org> Message-ID: <795696.39415.qm@web36804.mail.mud.yahoo.com> Hi Jason, Thanks for the reply. I am a little lost with the solution suggested. Is that how slide 60 in the pdf is obtained: http://jason.open-bio.org/Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf I guess I am missing something quite obvious, I apologize. What I have & want is this: I have a directory having say 100 different blast reports & hence I am looking to obtain 100 different bioperl formatted blast html outputs with the respective images just as it would appear in the blast report. Thanks, Anand Jason Stajich wrote: Basically you want to inject your own IMG tags into the file with these routines: $writerhtml->start_report(\&my_start_report); $writerhtml->title(\&my_title); $writerhtml->hit_link_align(\&my_hit_link_align); $writerhtml->hit_link_desc(\&my_hit_link_desc); fgblast shows a way to do this in part. It relies on Gbrowse to generate the image but you can replace the gbrowse_img reference to your own image generating software. 
_______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l --------------------------------- Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. From jason at bioperl.org Thu Apr 24 14:06:41 2008 From: jason at bioperl.org (Jason Stajich) Date: Thu, 24 Apr 2008 11:06:41 -0700 Subject: [Bioperl-l] Question on integrating Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter In-Reply-To: <795696.39415.qm@web36804.mail.mud.yahoo.com> References: <795696.39415.qm@web36804.mail.mud.yahoo.com> Message-ID: The overview graphic is generated basically from the script in scripts/graphics/search_overview.PLS So you'd have to run that on each report to generate the graphic, then use the other methods to insert images into each rendered HTML report. -jason On Apr 23, 2008, at 4:05 PM, Anand Venkatraman wrote: > Hi Jason, > > Thanks for the reply. > > I am a little lost with the solution suggested. Is that how slide > 60 in the pdf is obtained: http://jason.open-bio.org/ > Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf > > I guess I am missing something quite obvious, I apologize. > > What I have & want is this: I have a directory having say 100 > different blast reports & hence I am looking to obtain 100 > different bioperl formatted blast html outputs with the respective > images just as it would appear in the blast report. > > Thanks, > > Anand > > Jason Stajich wrote: > > Basically you want to inject your own IMG tags into the file with > these routines: > > > $writerhtml->start_report(\&my_start_report); > $writerhtml->title(\&my_title); > $writerhtml->hit_link_align(\&my_hit_link_align); > $writerhtml->hit_link_desc(\&my_hit_link_desc); > > > fgblast shows a way to do this in part. It relies on Gbrowse to > generate the image but you can replace the gbrowse_img reference to > your own image generating software. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > --------------------------------- > Be a better friend, newshound, and know-it-all with Yahoo! Mobile. > Try it now. From 1zoujing at 163.com Wed Apr 16 22:53:16 2008 From: 1zoujing at 163.com (zoujing) Date: Wed, 16 Apr 2008 19:53:16 -0700 (PDT) Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags" In-Reply-To: References: <16602770.post@talk.nabble.com> <16603225.post@talk.nabble.com> Message-ID: <16737795.post@talk.nabble.com> Thank you very much! I splited the file on \t directly. Zou Jing Stefan Kirov-2 wrote: > > It is not. If you use this file, why would you need a parser for it > anyway? Just split on \t or read with OpenOffice or equiv. > Stefan > > On Thu, 10 Apr 2008, zoujing wrote: > >> >> Seached the web and found the answer now, quote the answer as following: >> The error was thrown by my Bio::ASN1::EntrezGene module because it >> expects a text file, while you fed it with a binary file. To use >> gzipped ASN binary file from NCBI, download the NCBI gene2xml >> (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml), >> then use this syntax to run my parser on the binary files: >> >> my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i >> Homo_sapiens.ags.gz -c -x -b | "); # Homo_sapiens.ags.gz is the gzipped >> binary file directly downloaded from NCBI >> >> Same syntax should be used when you're using SeqIO (thus >> SeqIO::entrezgene). >> Mingyi >> >> But there still one thing, I want to parse "gene_info.gz" in Gene of >> NCBI. It doesn't work.Is that means "gene_info.gz"( tab-delimited,one >> line >> per GeneID, Column header line is the first line in the file >> ) is not the right format for Bio::ASN1::EntrezGene? >> >> >> >> zoujing wrote: >>> >>> I am a geen hand in Bioperl. 
When I run perl with >>> "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error >>> information: >>> Data Error: none conforming data found on line 1 in Sus_scrofa.ags. >>> >>> But the Sus_scrofa.ags is download from NCBI, with the format of >>> ASN1, >>> should be the same as Homo_sapiens in the example. So it should be no >>> error as the code is the example from Mingyi. >>> I wonder why this happen, and should I change something about the >>> file? >>> >>> >> >> -- >> View this message in context: >> http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16603225.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16737795.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From 1zoujing at 163.com Wed Apr 16 22:55:47 2008 From: 1zoujing at 163.com (zoujing) Date: Wed, 16 Apr 2008 19:55:47 -0700 (PDT) Subject: [Bioperl-l] Bio::ASN1::EntrezGene parse so slowly? In-Reply-To: <264855a00804112050gf785c2ei66d9c7463597eccd@mail.gmail.com> References: <16602210.post@talk.nabble.com> <264855a00804112050gf785c2ei66d9c7463597eccd@mail.gmail.com> Message-ID: <16737804.post@talk.nabble.com> Thank you vey much! Solved the problem now. Jing Sean Davis-3 wrote: > > gene_info is a tab-delimited text file, if I recall correctly. Have > you looked at it? If it is, you should be able to parse it in a few > seconds with just a couple lines of code. 
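The "couple lines of code" mentioned above could look roughly like the following. This is only a sketch, not code posted in the thread: it assumes that header or comment lines start with '#' and that the first three tab-separated columns are tax_id, GeneID and Symbol. The sub name parse_gene_info_line is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one line of the tab-delimited gene_info file.
# Assumption: the first three columns are tax_id, GeneID and Symbol,
# and header/comment lines start with '#'.
# Returns a hash reference, or nothing for header and blank lines.
sub parse_gene_info_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/ or $line !~ /\S/;
    my ( $tax_id, $gene_id, $symbol ) = split /\t/, $line;
    return { tax_id => $tax_id, gene_id => $gene_id, symbol => $symbol };
}

# Stream the file line by line so a ~500 MB file never sits in memory.
if ( my $file = shift @ARGV ) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    while ( my $line = <$fh> ) {
        my $gene = parse_gene_info_line($line) or next;
        print join( "\t", @{$gene}{qw(tax_id gene_id symbol)} ), "\n";
    }
    close $fh;
}
```

It would be run on the uncompressed file, e.g. "perl parse_gene_info.pl gene_info" (the script name is arbitrary). No ASN.1 parser is involved, which is why this takes seconds rather than hours.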
> > Sean > > > On Thu, Apr 10, 2008 at 1:08 AM, zoujing <1zoujing at 163.com> wrote: >> >> I want to parse a file "gene_info" from NCBI. The format of Gene in >> NCBI is >> ASN1, right? So I used Bio::ASN1::EntrezGene. But it didn't work >> properly/too slow. The file is about 500M. >> The code is following: >> use Bio::ASN1::EntrezGene; >> my $parser = Bio::ASN1::EntrezGene->new('file' => $ARGV[0]); >> my $i = 0; >> while(my $result = $parser->next_seq) >> { last; #something to do there, here use last for test} >> >> When it goes to the "while" part, it is processing on and on, it does >> not >> went out, even I used "last" in the "while" part. >> So I wonder whether it is too slow or the module is not fit for this >> job, >> or I did something wrong? >> >> Thank you! >> -- >> View this message in context: >> http://www.nabble.com/Bio%3A%3AASN1%3A%3AEntrezGene-parse-so-slowly--tp16602210p16602210.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Bio%3A%3AASN1%3A%3AEntrezGene-parse-so-slowly--tp16602210p16737804.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From sbassi at clubdelarazon.org Sat Apr 26 13:49:20 2008 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Sat, 26 Apr 2008 14:49:20 -0300 Subject: [Bioperl-l] bioperl installation problem Message-ID: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com> I tried to install bioperl because I need to install cviewer. Here (http://www.pastecode.com.ar/f37c1cd60) are both stdout and sdterr outputs. 
Here is one of the errors I get: set_attribute: not a compat02 graph at /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 10. sleeping for 3 seconds set_attribute: not a compat02 graph at /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 14. But I have GD::Graph, so I don't know what is going on: sbassi at ubuntuMAP:~$ sudo perl -MCPAN -e 'install GD::Graph' CPAN: Storable loaded ok Going to read /home/sbassi/.cpan/Metadata Database was generated on Fri, 25 Apr 2008 09:29:45 GMT GD::Graph is up to date. Any help regarding this: http://www.pastecode.com.ar/f37c1cd60 would be appreciated. Best, SB. -- Sebasti?n Bassi (???????). Diplomado en Ciencia y Tecnolog?a. Curso Biologia molecular para programadores: http://tinyurl.com/2vv8w6 Mostr? tu c?digo: http://www.pastecode.com.ar GPG Fingerprint: 9470 0980 620D ABFC BE63 A4A4 A3DE C97D 8422 D43D From jason at bioperl.org Sat Apr 26 15:23:37 2008 From: jason at bioperl.org (Jason Stajich) Date: Sat, 26 Apr 2008 12:23:37 -0700 Subject: [Bioperl-l] bioperl installation problem In-Reply-To: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com> References: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com> Message-ID: the error refers to the 'Graph' module not 'GD::Graph'; -jason On Apr 26, 2008, at 10:49 AM, Sebastian Bassi wrote: > I tried to install bioperl because I need to install cviewer. > Here (http://www.pastecode.com.ar/f37c1cd60) are both stdout and > sdterr outputs. > > Here is one of the errors I get: > > set_attribute: not a compat02 graph at > /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 10. > sleeping for 3 seconds > set_attribute: not a compat02 graph at > /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 14. 
> > But I have GD::Graph, so I don't know what is going on: > > sbassi at ubuntuMAP:~$ sudo perl -MCPAN -e 'install GD::Graph' > CPAN: Storable loaded ok > Going to read /home/sbassi/.cpan/Metadata > Database was generated on Fri, 25 Apr 2008 09:29:45 GMT > GD::Graph is up to date. > > Any help regarding this: http://www.pastecode.com.ar/f37c1cd60 > would be appreciated. > > Best, > SB. > > -- > Sebasti?n Bassi (???????). Diplomado en Ciencia y > Tecnolog?a. > Curso Biologia molecular para programadores: http://tinyurl.com/2vv8w6 > Mostr? tu c?digo: http://www.pastecode.com.ar > GPG Fingerprint: 9470 0980 620D ABFC BE63 A4A4 A3DE C97D 8422 D43D > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sbassi at clubdelarazon.org Sat Apr 26 17:08:13 2008 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Sat, 26 Apr 2008 18:08:13 -0300 Subject: [Bioperl-l] bioperl installation problem In-Reply-To: References: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com> Message-ID: <9e2f512b0804261408l45ff9f91j94f44065d21cd65f@mail.gmail.com> On Sat, Apr 26, 2008 at 4:23 PM, Jason Stajich wrote: > the error refers to the 'Graph' module not 'GD::Graph'; You are right, but I have it also installed: sbassi at ubuntuMAP:~$ sudo perl -MCPAN -e 'install Graph' Password: CPAN: Storable loaded ok Going to read /home/sbassi/.cpan/Metadata Database was generated on Fri, 25 Apr 2008 09:29:45 GMT Graph is up to date. -- Sebasti?n Bassi (???????). Diplomado en Ciencia y Tecnolog?a. Curso Biologia molecular para programadores: http://tinyurl.com/2vv8w6 Mostr? 
tu c?digo: http://www.pastecode.com.ar GPG Fingerprint: 9470 0980 620D ABFC BE63 A4A4 A3DE C97D 8422 D43D From bix at sendu.me.uk Sat Apr 26 19:30:56 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 27 Apr 2008 00:30:56 +0100 Subject: [Bioperl-l] bioperl installation problem In-Reply-To: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com> References: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com> Message-ID: <4813BB30.6060703@sendu.me.uk> Sebastian Bassi wrote: > I tried to install bioperl because I need to install cviewer. > Here (http://www.pastecode.com.ar/f37c1cd60) are both stdout and sdterr outputs. > > Here is one of the errors I get: > > set_attribute: not a compat02 graph at > /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 10. > sleeping for 3 seconds > set_attribute: not a compat02 graph at > /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 14. You're trying to install a very old version of Bioperl which apparently uses behaviour of the Graph module no longer supported: http://search.cpan.org/~jhi/Graph-0.84/lib/Graph.pod#Backward_compatibility_with_Graph_0.2 Your options are to force install your desired version of Bioperl (if you don't need to use the modules that are causing the errors you get), downgrade your version of Graph to pre-0.2, or install the latest version of Bioperl (1.5.2 or from svn). From dr.hogart at gmail.com Sun Apr 27 10:05:20 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Sun, 27 Apr 2008 18:05:20 +0400 Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics Message-ID: Hi all, is it possible to add a GD::graphic object (chart) to Bio::Graphics panel to obtain a file with image of both the chart and bioseq object? 
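
One possible way to get a chart and a Bio::Graphics panel into a single image file is to render each to its own GD::Image and copy both onto a shared canvas. The following is a minimal sketch under that assumption, not a method confirmed in this thread. The chart data, the image sizes and the helper name stack_images are made up for illustration. It relies on GD::Graph's plot() and Bio::Graphics::Panel's gd() both returning GD::Image objects.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use GD;
use GD::Graph::area;
use Bio::Graphics;
use Bio::SeqFeature::Generic;

# Stack two GD::Image objects vertically on a new truecolor canvas.
# (stack_images is a hypothetical helper, not part of GD or BioPerl.)
sub stack_images {
    my ( $top, $bottom ) = @_;
    my ( $tw, $th ) = $top->getBounds;
    my ( $bw, $bh ) = $bottom->getBounds;
    my $width  = $tw > $bw ? $tw : $bw;
    my $canvas = GD::Image->new( $width, $th + $bh, 1 );    # 1 = truecolor
    my $white  = $canvas->colorAllocate( 255, 255, 255 );
    $canvas->filledRectangle( 0, 0, $width - 1, $th + $bh - 1, $white );
    $canvas->copy( $top,    0, 0,   0, 0, $tw, $th );
    $canvas->copy( $bottom, 0, $th, 0, 0, $bw, $bh );
    return $canvas;
}

# Only build and write the combined image when an output file is given.
if ( my $out = shift @ARGV ) {

    # Made-up chart data: x values, then one data series.
    my @data  = ( [ 1 .. 5 ], [ 3, 7, 2, 9, 4 ] );
    my $graph = GD::Graph::area->new( 800, 200 );
    my $chart = $graph->plot( \@data ) or die $graph->error;

    # A panel with a single made-up feature.
    my $panel   = Bio::Graphics::Panel->new( -length => 600, -width => 800 );
    my $feature = Bio::SeqFeature::Generic->new( -start => 1, -end => 200 );
    $panel->add_track( $feature, -glyph => 'segments', -bgcolor => 'red' );

    my $combined = stack_images( $chart, $panel->gd );
    open my $fh, '>', $out or die "Cannot write $out: $!";
    binmode $fh;
    print {$fh} $combined->png;
    close $fh;
}
```

The canvas is created as truecolor so that the two source palettes cannot overflow a shared 256-colour palette. The x axis of the chart and the sequence coordinates of the panel are not aligned by this sketch; that would need matching widths and padding on both images.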
From Russell.Smithies at agresearch.co.nz Sun Apr 27 17:27:23 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 28 Apr 2008 09:27:23 +1200
Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
In-Reply-To:
References:
Message-ID:

You can get the GD object back from the Bio::Graphics::Panel and then draw on it using GD methods.

Eg:

use GD;                         # exports gdSmallFont
use Bio::Graphics;
use Bio::SeqFeature::Generic;

# create a BioPerl panel
my $panel = Bio::Graphics::Panel->new(
    -length  => 600,
    -width   => 800,
    -bgcolor => 'white'
);

# add your features
my $feature = Bio::SeqFeature::Generic->new( -start => 1, -end => 200 );
$panel->add_track($feature,
    -glyph   => 'segments',
    -label   => 0,
    -height  => 30,
    -bgcolor => 'red',
    -fgcolor => 'red'
);

# grab the GD thingy (a GD::Image)
my $gd = $panel->gd;

# create a color - not sure if there's a better way?
my $black = $gd->colorAllocate(0,0,0);

# draw on your GD thingy
$gd->line(10, 10, $panel->width - 10, 10, $black);
$gd->string(gdSmallFont, 20, 10, 'test', $black);

# print it as normal
print $panel->png;

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-
> bio.org] On Behalf Of sergei ryazansky
> Sent: Monday, 28 April 2008 2:05 a.m.
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
>
> Hi all,
>
> is it possible to add a GD::graphic object (chart) to Bio::Graphics panel
> to obtain a file with image of both the chart and bioseq object?
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material.
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From dr.hogart at gmail.com Sun Apr 27 20:25:18 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Mon, 28 Apr 2008 04:25:18 +0400 Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics References: Message-ID: Thanks for answer! Yours script works fine, but nevertheless, as for as I understand 'gd' method return the gd::image object. But I need the to merge bioseq object with gd::graph object (gd::graph::area). Is it possible? Or maybe I misunderstood something in your example? On Mon, 28 Apr 2008 01:27:23 +0400, Smithies, Russell wrote: > You can get the GD object back from the Bio::Graphics::Panel then draw > on it using GD methods > > Eg: > > #create a BioPerl panel > my $panel = Bio::Graphics::Panel->new( > -length => 600 > -width => 800, > -bgcolor => 'white' > ); > # add your features > my $feature = Bio::SeqFeature::Generic->new( -start => 1,-end => > 200,); > $panel->add_track($feature, glyph => 'segments', > -label => 0, > -height => 30, > -bgcolor => 'red', > -fgcolor => 'red' > ); > > # grab the GD thingy > my $gd = $panel->gd; > > #create a color - not sure if there's a better way? > $black = $gd->colorAllocate(0,0,0); > > #draw on your GD thingy > $gd->line(10,10,$panel->width -10,10,$black); > $gd->string(gdSmallFont,20,10,'test' ,'$black); > > # print it as normal > print $panel->png; > > > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open- >> bio.org] On Behalf Of sergei ryazansky >> Sent: Monday, 28 April 2008 2:05 a.m. 
>> To: bioperl-l at bioperl.org >> Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics >> >> Hi all, >> >> is it possible to add a GD::graphic object (chart) to Bio::Graphics > panel >> to obtain a file with image of both the chart and bioseq object? >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= From Bank.Beszteri at awi.de Mon Apr 28 08:18:20 2008 From: Bank.Beszteri at awi.de (=?UTF-8?B?QsOhbmsgQmVzenRlcmk=?=) Date: Mon, 28 Apr 2008 14:18:20 +0200 Subject: [Bioperl-l] Indexing large databases / BioSQL In-Reply-To: <47FB204F.90405@awi.de> References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> <47FB204F.90405@awi.de> Message-ID: <4815C08C.1060305@awi.de> Dear BioSQL / bioperl-db-ists, I would like to share my experiences with trying to load uniprot_trembl into a BioSQL db, and also to ask a couple of questions; perhaps some of you know the problems I encountered. I used bioperl-live and bioperl-db-live as of 2008-04-03 and uniprot_trembl.dat as of 2008-04-04. 
The command was like

load_seqdatabase.pl --safe --logchunk 1000 --host dbserv --dbname abc --dbuser efg --dbpass xyz --driver mysql --namespace uniprot_trembl --format embl uniprot_trembl.dat

although I split the dat file into 10 chunks and started them in
parallel to make it faster.

This did not go quite as smoothly as Swissprot did. In the end, it seems
to have loaded 5022284 entries of the 5443284 which appear to be there
in the input file (when counting with grep -c "ID "). Besides the
harmless taxonomy warnings which also appear with Swissprot (and have
been discussed here a couple of weeks ago and also earlier), there were
a couple of more serious errors. Perhaps some of you know them already:

First of all, the below error seems to lead to a crash, in spite of --safe:

>>>
------------- EXCEPTION -------------
MSG: A1XDT7 seems to have an invalid species classification.
STACK Bio::SeqIO::embl::_read_EMBL_Species /home/biocl/bbeszter/lib/bioperl-live/bioperl-live/Bio/SeqIO/embl.pm:1087
STACK Bio::SeqIO::embl::next_seq /home/biocl/bbeszter/lib/bioperl-live/bioperl-live/Bio/SeqIO/embl.pm:320
STACK toplevel /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:634
-------------------------------------

Command exited with non-zero status 255
<<<

What this is about is NCBI Tax_ID:435 (Acetobacter aceti; it has some 30
synonyms in my DB, too), which, to me, looks like a completely normal
taxon: I could follow its taxonomy up to the root in my NCBI taxonomy in
the BioSQL DB I used. I don't know if someone else has seen / can
reproduce the problem, or should I think about some problem with my
taxonomy db? Besides, is it the expected behaviour from
load_seqdatabase.pl to die upon this error?

###################

The other problems did not lead to a crash, only to a failure to load
the sequence, which would be what I'd expect with --safe.
The first type of error looks like

>>>
Could not store Q49I36:
------------- EXCEPTION -------------
MSG: Unique key query in Bio::DB::BioSQL::SpeciesAdaptor returned 2 rows instead of 1. Query was [name_class="scientific name",binomial="Onchocerca volvulus"]
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:958
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:854
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
STACK Bio::DB::Persistent::PersistentObject::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/Persistent/PersistentObject.pm:244
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/Persistent/PersistentObject.pm:271
STACK (eval) /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:630
STACK toplevel /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:612
-------------------------------------
<<<

In this particular case, "Onchocerca volvulus" does indeed have two
taxon_ids in my DB (6282 and 563188, of which only the first one is
returned by a web search at NCBI taxonomy); but the same thing happened
with a number of other taxa (each followed by the number of times it
caused the above error):

Wolbachia pipientis              64
Hemerocallis sp.                  1
Hypsiglena torquata               3
Salmonella enterica            1211
Burkholderia sp.                 31
Streptococcus sp.                 4
Rhizobium sp.                   600
Nostoc sp.                       19
Drosophila sp.                   18
Onchocerca volvulus              62
Atlapetes schistaceus             4
Symbiodinium sp.                  3
Escherichia coli               7421
Hieraaetus fasciatus              4
Borrelia burgdorferi group        1
Pseudomonas sp.                  29
Rotavirus A                    1076
Gorilla gorilla                 746
Rana plancyi                     14
unclassified sequences            1

(This should be 11312 cases altogether, but the list might be incomplete
because I accidentally removed one of my logs, which contained STDOUT &
STDERR for ~10 % of the entries)

Again, is this a known problem for some of you, or could there be a
problem with my copy of NCBI taxonomy? I don't remember having updated
it after the initial upload, so I'm quite surprised by such duplicate
entries....

###################

Type 2 error w/o crash:

>>>
Could not store A5HU09:
------------- EXCEPTION -------------
MSG: create: object (Bio::Species) failed to insert or to be found by unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK Bio::DB::Persistent::PersistentObject::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/Persistent/PersistentObject.pm:244
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/Persistent/PersistentObject.pm:271
STACK (eval) /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:630
STACK toplevel /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:612
<<<

This particular record has the NCBI_TaxID 44271, which looks completely
normal in the NCBI taxonomy loaded in my BioSQL DB, but the same problem
appeared in 53 further cases (I have not yet been able to look into them
in detail to see whether they were all the same species). On the other
hand, 7 records which were successfully loaded have this taxonomy ID in
the DB (44271).

###################

Type 3 error w/o crash:

>>>
Could not store Q6T859:
Unmatched ( in regex; marked by <-- HERE in m/Camelina microcarpa (Littlepod false flax) ( <-- HERE microcarpa subsp.\s+/ at /home/biocl/bbeszter/lib/bioperl-live/bioperl-live/Bio/Species.pm line 466, line 357048.
<<<

This happens in the sub binomial in Species.pm using the option "FULL",
which requests to also return subspecies. I have not looked much deeper
into this yet, but is it possible that there is a parsing problem with
multi-line species strings? In the above case the OS field in
uniprot_trembl.dat looks like

OS   Camelina microcarpa (Littlepod false flax) (Camelina microcarpa subsp.
OS   sylvestris).

###################

I'm still looking for where the remaining records disappeared: of the
421000 records not showing up in the DB, I could find these:

crasher (Tax_ID=435): 45 entries
problem 1 ("MSG: Unique key query in Bio::DB::BioSQL::SpeciesAdaptor returned 2 rows instead of 1."): 11312 entries
problem 2 ("MSG: create: object (Bio::Species) failed to insert or to be found by unique key"): 54 entries
problem 3 ("Unmatched ( in regex"): 28241 entries

381348 still remain... These could in principle come from the first
10 %, for which I don't have the output, but they don't seem to: after
restarting that chunk, I get ~ 30 "Could not store" errors. So the last
question: are there any error messages I can expect which don't contain
"Could not store" and which I thus missed here?
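
A note on the counts above, as a minimal shell sketch. It assumes the input is a UniProt-style flat file in which each record's ID line starts with "ID" followed by three spaces, and that the loader output was captured in log files. The function names count_records and count_store_errors are hypothetical helpers, not part of BioPerl or load_seqdatabase.pl. An unanchored pattern such as "ID " also matches any other line that happens to contain that string, so anchoring at the start of the line may give a more reliable record count to compare against the database.

```shell
# Hypothetical helper: count records in a UniProt-style flat file.
# Anchoring on the start of the line avoids matching "ID " inside
# description or comment lines.
count_records() {
    grep -c '^ID   ' "$1"
}

# Hypothetical helper: total number of "Could not store" messages
# across one or more captured loader logs.
count_store_errors() {
    cat "$@" | grep -c 'Could not store'
}
```

Usage would be along the lines of `count_records uniprot_trembl.dat` and `count_store_errors chunk*.log` (the log file names are placeholders); comparing the first number with the unanchored count shows whether the two patterns disagree on a given file.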

Bank Beszteri
Bioinformatics
Alfred Wegener Institute for Polar and Marine Research
Am Handelshafen 12
27570 Bremerhaven

From cjfields at uiuc.edu Mon Apr 28 09:20:39 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Apr 2008 08:20:39 -0500
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <4815C08C.1060305@awi.de>
References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> <47FB204F.90405@awi.de> <4815C08C.1060305@awi.de>
Message-ID: <5C383B1F-92AD-4194-B9B4-007AE51A092F@uiuc.edu>

On Apr 28, 2008, at 7:18 AM, Bánk Beszteri wrote:

> Dear BioSQL / bioperl-db-ists,
>
> I would like to share my experiences with trying to load
> uniprot_trembl into a BioSQL db, and also to ask a couple of
> questions; perhaps some of you know the problems I encountered. I
> used bioperl-live and bioperl-db-live as of 2008-04-03 and
> uniprot_trembl.dat as of 2008-04-04. The command was like
>
> load_seqdatabase.pl --safe --logchunk 1000 --host dbserv --dbname
> abc --dbuser efg --dbpass xyz --driver mysql --namespace
> uniprot_trembl --format embl uniprot_trembl.dat
>
> ....
>
> First of all, the below error seems to lead to a crash, in spite of
> --safe:
>
> >>>
> ------------- EXCEPTION -------------
> MSG: A1XDT7 seems to have an invalid species classification.
> STACK Bio::SeqIO::embl::_read_EMBL_Species /home/biocl/bbeszter/lib/bioperl-live/bioperl-live/Bio/SeqIO/embl.pm:1087
> STACK Bio::SeqIO::embl::next_seq /home/biocl/bbeszter/lib/bioperl-live/bioperl-live/Bio/SeqIO/embl.pm:320
> STACK toplevel /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:634
> -------------------------------------
>
> Command exited with non-zero status 255
> <<<
>
> What this is about is NCBI Tax_ID:435 (Acetobacter aceti; it has
> some 30 synonyms in my DB, too), which, to me, looks like a
> completely normal taxon: I could follow its taxonomy up to the root
> in my NCBI taxonomy in the BioSQL DB I used. I don't know if someone
> else has seen / can reproduce the problem, or should I think about
> some problem with my taxonomy db? Besides, is it the expected
> behaviour from load_seqdatabase.pl to die upon this error?

...

You should use 'swiss' format instead of 'embl' when loading
Uniprot/SwissProt sequences. Though on the surface they're similar, the
feature table (among other things) is completely different. I'm not
sure if that's causing all of the issues here but it certainly could
contribute to them.

In the meantime, it's much easier for us to track these problems if you
file a bug (BioPerl, file for bioperl-db):

http://bugzilla.open-bio.org/

chris

From cjfields at uiuc.edu Sun Apr 27 17:54:03 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 27 Apr 2008 16:54:03 -0500
Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
In-Reply-To:
References:
Message-ID:

I think this is how some of the synteny mapping is done using SynBrowse
(the trapezoids connecting syntenous genes on different tracks).

http://www.gmod.org/wiki/index.php/SynView

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From Bank.Beszteri at awi.de Mon Apr 28 09:51:53 2008
From: Bank.Beszteri at awi.de (Bánk Beszteri)
Date: Mon, 28 Apr 2008 15:51:53 +0200
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <5C383B1F-92AD-4194-B9B4-007AE51A092F@uiuc.edu>
References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> <47FB204F.90405@awi.de> <4815C08C.1060305@awi.de> <5C383B1F-92AD-4194-B9B4-007AE51A092F@uiuc.edu>
Message-ID: <4815D679.3070307@awi.de>

Chris Fields wrote:
>
> ...
>
> You should use 'swiss' format instead of 'embl' when loading
> Uniprot/SwissProt sequences. Though on the surface they're similar
> the feature table (among other things) is completely different. I'm
> not sure if that's causing all of the issues here but it certainly
> could contribute to them.
>
> In the meantime, it's much easier for us to track these problems if
> you file a bug (BioPerl, file for bioperl-db):
>
> http://bugzilla.open-bio.org/
>

Hi Chris,

I will do so; in the meantime: I'm not loading Swissprot, but TrEMBL.
Is swiss also the appropriate format here? By reading
http://expasy.org/sprot/userman.html#diffEMBL, I concluded that embl
should be the one I'd need for TrEMBL.

Bank

From cjfields at uiuc.edu Mon Apr 28 12:24:39 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Apr 2008 11:24:39 -0500
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <4815D679.3070307@awi.de>
References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> <47FB204F.90405@awi.de> <4815C08C.1060305@awi.de> <5C383B1F-92AD-4194-B9B4-007AE51A092F@uiuc.edu> <4815D679.3070307@awi.de>
Message-ID:

On Apr 28, 2008, at 8:51 AM, Bánk Beszteri wrote:

> Chris Fields wrote:
>>
>> ...
>>
>> You should use 'swiss' format instead of 'embl' when loading
>> Uniprot/SwissProt sequences. Though on the surface they're similar
>> the feature table (among other things) is completely different.
>> I'm not sure if that's causing all of the issues here but it
>> certainly could contribute to them.
>>
>> In the meantime, it's much easier for us to track these problems if
>> you file a bug (BioPerl, file for bioperl-db):
>>
>> http://bugzilla.open-bio.org/
>>
> Hi Chris,
>
> I will do so; in the meantime: I'm not loading Swissprot, but
> TrEMBL. Is swiss also the appropriate format here? By reading
> http://expasy.org/sprot/userman.html#diffEMBL, I concluded that embl
> should be the one I'd need for TrEMBL.
>
> Bank

The section you link to describes several important differences between
EMBL and SwissProt/UniProt format (i.e. how each indicated line type
differs between SwissProt and EMBL formats, including ID, AC, OS/OC,
FT, etc). I'm unsure how you derived from that that 'embl' would work:
the formats are close, but there are enough significant differences
that using 'embl' for SwissProt (or vice versa) will not work as
intended, if at all.

chris

From hlapp at gmx.net Mon Apr 28 15:46:07 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 28 Apr 2008 15:46:07 -0400
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <4815D679.3070307@awi.de>
References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> <47FB204F.90405@awi.de> <4815C08C.1060305@awi.de> <5C383B1F-92AD-4194-B9B4-007AE51A092F@uiuc.edu> <4815D679.3070307@awi.de>
Message-ID: <3BD6A261-D023-4A5F-9CBC-C3216B0145F0@gmx.net>

On Apr 28, 2008, at 9:51 AM, Bánk Beszteri wrote:

> I'm not loading Swissprot, but TrEMBL. Is swiss also the
> appropriate format here?

Yes, though I guess it can be confusing. Maybe we should create a
symlink uniprot.pm to swiss.pm, or in fact fork them if UniProt starts
accumulating enough differences from the traditional Swissprot format.

BTW as you had noticed, the --safe switch only protects the script from
crashing due to a db loading error.
A parsing error will still cause a crash. I guess you can argue that
that's not nice, and having a chance to skip over the record that
offends the (BioPerl) parser would be useful. The problem is that if
the parser errors out, it's not guaranteed where we are in the file or
whether the parser module is in a state from which it can recover. For
the database it's a bit easier, as one just needs to rollback() the
transaction (each sequence is its own transaction).

	-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From Russell.Smithies at agresearch.co.nz Mon Apr 28 17:15:16 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 29 Apr 2008 09:15:16 +1200
Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
In-Reply-To:
References:
Message-ID:

I thought it was a bit of a hack but I guess if someone else is doing it
too, it can't be all bad :-)

It looks like you can combine your drawing methods like this:
(I'm sure Lincoln will tell us this is bad but it seems to work ok)
-------------------------------------------------------------------------------------

#!perl -w
use strict;
use GD::Graph::lines;
use GD::Graph::colour;
use GD::Graph::Data;

use Bio::Graphics;
use Bio::SeqFeature::Generic;

# create and draw on a graphics panel
my $panel = Bio::Graphics::Panel->new(
    -length => 500,
    -width  => 500
);
my $track = $panel->add_track(
    -glyph => 'generic',
    -label => 1
);

# create and add a few features
for (my $i = 100; $i < 500; $i += 100) {
    my $feature = Bio::SeqFeature::Generic->new(
        -display_name => "feature: $i",
        -score        => $i,
        -start        => $i,
        -end          => $i + 100
    );
    $track->add_feature($feature);
}

# create and draw the graph
my @data = (
    ["1st","2nd","3rd","4th","5th","6th","7th", "8th", "9th"],
    [ 1, 2, 5, 6, 3, 1.5, 1, 3, 4],
    [ sort { $a <=> $b } (1, 2, 5, 6, 3, 1.5, 1, 3, 4) ]
);
my $graph = GD::Graph::lines->new(500, 300);

$graph->set(
    x_label       => 'X Label',
    y_label       => 'Y label',
    title         => 'Some simple graph',
    y_max_value   => 8,
    y_tick_number => 8,
    y_label_skip  => 2
) or die $graph->error;

$graph->set( dclrs => [ qw( green blue black red pink) ] );

my $gd = $graph->plot(\@data) or die $graph->error;

# combine the two images: the panel is drawn onto the graph's GD object
my $combined = $panel->gd($gd);

open(my $img, '>', 'file.png') or die $!;
binmode $img;
print $img $combined->png;
close($img);

------------------------------------------------------------------------------------------

From lincoln.stein at gmail.com Mon Apr 28 17:33:19 2008
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Mon, 28 Apr 2008 17:33:19 -0400
Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
In-Reply-To:
References:
Message-ID: <6dce9a0b0804281433i697cda2fo2c47ce59010d0858@mail.gmail.com>

Hi,

No, I'm perfectly happy with combining images like this. It is part of
what I intended.

Another idea would be to use the Image glyph to embed graphs at
particular genomic locations in the panel. Right now the glyph is
designed in the expectation that the image passed to it is sitting on
the file system (or at a web URL), but it would be easy to modify it so
that a callback can generate the GD on the fly, using, for example,
GD::Graph.

Lincoln

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From dr.hogart at gmail.com Tue Apr 29 03:56:24 2008
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Tue, 29 Apr 2008 11:56:24 +0400
Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
References:
Message-ID:

Thank you very much! It is exactly what I was looking for.

On Tue, 29 Apr 2008 01:15:16 +0400, Smithies, Russell wrote:

> I thought it was a bit of a hack but I guess if someone else is doing it
> too, it can't be all bad :-)
>
> It looks like you can combine your drawing methods like this:
> (I'm sure Lincoln will tell us this is bad but it seems to work ok)
> -------------------------------------------------------------------------------------
>
> #!perl -w
> use GD::Graph::lines;
> use GD::Graph::colour;
> use GD::Graph::Data;
>
> use Bio::Graphics;
> use Bio::SeqFeature::Generic;
>
> # create and draw on a graphics panel
> my $panel = Bio::Graphics::Panel->new(
>     -length => 500,
>     -width => 500
> );
> my $track = $panel->add_track(
>     -glyph => 'generic',
>     -label => 1
> );
>
> # create and add a few features
> for($i = 100; $i < 500; $i+= 100){
>     my $feature = Bio::SeqFeature::Generic->new(
>         -display_name => "feature: $i",
>         -score => $i,
>         -start => $i,
>         -end => $i + 100
>     );
>     $track->add_feature($feature);
> }
>
>
> # create and draw the graph
> my @data = (
>     ["1st","2nd","3rd","4th","5th","6th","7th", "8th", "9th"],
>     [ 1, 2, 5, 6, 3, 1.5, 1, 3, 4],
>     [ sort { $a <=> $b } (1, 2, 5, 6, 3, 1.5, 1, 3, 4) ]
> );
> my $graph =
GD::Graph::lines->new(500, 300); > > $graph->set( > x_label => 'X Label', > y_label => 'Y label', > title => 'Some simple graph', > y_max_value => 8, > y_tick_number => 8, > y_label_skip => 2 > ) or die $graph->error; > > $graph->set( dclrs => [ qw( green blue black red pink) ] ); > > my $gd = $graph->plot(\@data) or die $graph->error; > > # combine the two images > my $combined = $panel->gd($gd); > > open(IMG, '>file.png') or die $!; > binmode IMG; > print IMG $combined->png; > > ------------------------------------------------------------------------ > ------------------ > >> -----Original Message----- >> From: Chris Fields [mailto:cjfields at uiuc.edu] >> Sent: Monday, 28 April 2008 9:54 a.m. >> To: Smithies, Russell >> Cc: sergei ryazansky; bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] addition of GD::graphic object to > Bio::Graphics >> >> I think this is how some of the synteny mapping is done using >> SynBrowse (the trapezoids connecting syntenous genes on different >> tracks). >> >> http://www.gmod.org/wiki/index.php/SynView >> >> chris >> >> On Apr 27, 2008, at 4:27 PM, Smithies, Russell wrote: >> >> > You can get the GD object back from the Bio::Graphics::Panel then >> > draw >> > on it using GD methods >> > >> > Eg: >> > >> > #create a BioPerl panel >> > my $panel = Bio::Graphics::Panel->new( >> > -length => 600 >> > -width => > 800, >> > -bgcolor => 'white' >> > ); >> > # add your features >> > my $feature = Bio::SeqFeature::Generic->new( -start => 1,-end => >> > 200,); >> > $panel->add_track($feature, glyph => 'segments', >> > -label => 0, >> > -height => 30, >> > -bgcolor => 'red', >> > -fgcolor => 'red' >> > ); >> > >> > # grab the GD thingy >> > my $gd = $panel->gd; >> > >> > #create a color - not sure if there's a better way? 
>> > $black = $gd->colorAllocate(0,0,0);
>> >
>> > #draw on your GD thingy
>> > $gd->line(10,10,$panel->width -10,10,$black);
>> > $gd->string(gdSmallFont,20,10,'test',$black);
>> >
>> > # print it as normal
>> > print $panel->png;
>> >
>> >> -----Original Message-----
>> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of sergei ryazansky
>> >> Sent: Monday, 28 April 2008 2:05 a.m.
>> >> To: bioperl-l at bioperl.org
>> >> Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
>> >>
>> >> Hi all,
>> >>
>> >> is it possible to add a GD::graphic object (chart) to Bio::Graphics panel
>> >> to obtain a file with image of both the chart and bioseq object?

From d.gatherer at mrcvu.gla.ac.uk Tue Apr 29 08:21:05 2008
From: d.gatherer at mrcvu.gla.ac.uk (Derek Gatherer)
Date: Tue, 29 Apr 2008 13:21:05 +0100
Subject: [Bioperl-l] translate() oddities
Message-ID:

Hi

I thought I'd better run this by the community before I embarrass myself on Bugzilla. It seems like a clear bug to me. I'm running Bioperl 1.5.0 on RedHat.

For a test input:

>test
ATGATGATGATGATGTGA

the following code is fine.

while((my $seqobj = $seq_in->next_seq()))
{
    print "\n".$seqobj->display_id;
    my $len = $seqobj->length();
    print " length: $len";
    my $frame1_obj = $seqobj->translate();
    my $f1_prot = $frame1_obj->seq();
    print "\n$f1_prot";
}

Output:

test length: 18
MMMMM*

But if I want to change the frame as specified in the BioPerl tutorial, by using:

my $frame1_obj = $seqobj->translate(frame => 1); # which should now give frame 2, I get:

test length: 18
MMMMM-frame

The frame is unchanged and the text "-frame" is tacked on the end of the output. The same occurs with translate(frame => 2).

Any ideas? Can something as fundamental as translate() really be bugged?
or am I guilty of some particularly heinous syntax error? Cheers Derek From tristan.lefebure at gmail.com Tue Apr 29 09:58:21 2008 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Tue, 29 Apr 2008 09:58:21 -0400 Subject: [Bioperl-l] translate() oddities In-Reply-To: References: Message-ID: <200804290958.21548.tristan.lefebure@gmail.com> Aren't you forgetting the dash? my $frame1_obj = $seqobj->translate(-frame => 1) On Tuesday 29 April 2008 08:21:05 Derek Gatherer wrote: > my $frame1_obj = $seqobj->translate(frame => 1) -Tristan From d.gatherer at mrcvu.gla.ac.uk Tue Apr 29 10:05:03 2008 From: d.gatherer at mrcvu.gla.ac.uk (Derek Gatherer) Date: Tue, 29 Apr 2008 15:05:03 +0100 Subject: [Bioperl-l] translate() oddities In-Reply-To: <481726BF.1060609@bms.com> References: <481726BF.1060609@bms.com> Message-ID: Thanks Stefan Actually, there was a typo in my message, I did use -frame => 1. However, the problem disappears on upgrading from 1.5.0 to 1.5.2. So not a bug anymore. Cheers Derek At 14:46 29/04/2008, Stefan Kirov wrote: >my $frame1_obj = $seqobj->translate(-frame => 1); >not >my $frame1_obj = $seqobj->translate(frame => 1); >Stefan > >Derek Gatherer wrote: > > Hi > > > > I thought I'd better run this by the community before I embarrass > > myself on Bugzilla. It seems like a clear bug to me. I'm running > > Bioperl 1.5.0 on RedHat. > > > > For a test input: > > > > >test > > ATGATGATGATGATGTGA > > > > the following code is fine. 
> > > > while((my $seqobj = $seq_in->next_seq())) > > { > > print "\n".$seqobj->display_id; > > my $len = $seqobj->length(); > > print " length: $len"; > > my $frame1_obj = $seqobj->translate(); > > my $f1_prot = $frame1_obj->seq(); > > print "\n$f1_prot"; > > } > > > > Output: > > > > test length: 18 > > MMMMM* > > > > But if I want to change the frame as specified in the BioPerl > > tutorial, by using: > > > > my $frame1_obj = $seqobj->translate(frame => 1); # which should now > > give frame 2, I get: > > > > test length: 18 > > MMMMM-frame > > > > The frame is unchanged and the text "-frame" is tacked on the end of > > the output. The same occurs with translate(frame => 2). > > > > Any ideas? Can something as fundamental as translate() really be > > bugged? or am I guilty of some particularly heinous syntax error? > > > > Cheers > > Derek > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From l.douchy at gmail.com Tue Apr 29 10:16:40 2008 From: l.douchy at gmail.com (Laurent DOUCHY) Date: Tue, 29 Apr 2008 16:16:40 +0200 Subject: [Bioperl-l] translate() oddities In-Reply-To: <200804290958.21548.tristan.lefebure@gmail.com> References: <200804290958.21548.tristan.lefebure@gmail.com> Message-ID: <2fb209dd0804290716x36e403dek55978dc4f54e34ff@mail.gmail.com> Hello, I resolved this issue in Bio::seqIO with the following line : my $sequence = $seq->translate('*', 'X', '0', '1', '0', '0', '0', '0')->seq; the third parameter set the frame. I hope to have been helpful. laurent. On Tue, Apr 29, 2008 at 3:58 PM, Tristan Lefebure < tristan.lefebure at gmail.com> wrote: > Aren't you forgetting the dash? 
> > my $frame1_obj = $seqobj->translate(-frame => 1) > > > On Tuesday 29 April 2008 08:21:05 Derek Gatherer wrote: > > my $frame1_obj = $seqobj->translate(frame => 1) > > > > -Tristan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From roy.chaudhuri at gmail.com Tue Apr 29 10:27:10 2008 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 29 Apr 2008 15:27:10 +0100 Subject: [Bioperl-l] translate() oddities In-Reply-To: References: <481726BF.1060609@bms.com> Message-ID: <4817303E.1040903@gmail.com> Spent two minutes looking at this, so may as well chip in with what I discovered even though you solved your problem. This "bug" comes about because in version 1.5.1 and earlier, the arguments to translate were a simple list, with the first argument the terminator (defaults to "*"). Your old version therefore assumed that you wanted to translate the stop codon to "-frame". Amusingly given your typo, if you miss the hyphen off the frame argument in version 1.5.2 it reverts to the old interface and you end up with the output "MMMMMframe". The moral of the story is of course to read the docs relevant to the version you are using. Roy. -- Dr. Roy Chaudhuri Department of Veterinary Medicine University of Cambridge, U.K. Derek Gatherer wrote: > Thanks Stefan > > Actually, there was a typo in my message, I did use -frame => > 1. However, the problem disappears on upgrading from 1.5.0 to 1.5.2. > > So not a bug anymore. > > Cheers > Derek > > At 14:46 29/04/2008, Stefan Kirov wrote: >> my $frame1_obj = $seqobj->translate(-frame => 1); >> not >> my $frame1_obj = $seqobj->translate(frame => 1); >> Stefan >> >> Derek Gatherer wrote: >>> Hi >>> >>> I thought I'd better run this by the community before I embarrass >>> myself on Bugzilla. It seems like a clear bug to me. I'm running >>> Bioperl 1.5.0 on RedHat. 
>>> >>> For a test input: >>> >>>> test >>> ATGATGATGATGATGTGA >>> >>> the following code is fine. >>> >>> while((my $seqobj = $seq_in->next_seq())) >>> { >>> print "\n".$seqobj->display_id; >>> my $len = $seqobj->length(); >>> print " length: $len"; >>> my $frame1_obj = $seqobj->translate(); >>> my $f1_prot = $frame1_obj->seq(); >>> print "\n$f1_prot"; >>> } >>> >>> Output: >>> >>> test length: 18 >>> MMMMM* >>> >>> But if I want to change the frame as specified in the BioPerl >>> tutorial, by using: >>> >>> my $frame1_obj = $seqobj->translate(frame => 1); # which should now >>> give frame 2, I get: >>> >>> test length: 18 >>> MMMMM-frame >>> >>> The frame is unchanged and the text "-frame" is tacked on the end of >>> the output. The same occurs with translate(frame => 2). >>> >>> Any ideas? Can something as fundamental as translate() really be >>> bugged? or am I guilty of some particularly heinous syntax error? >>> >>> Cheers >>> Derek >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From stefan.kirov at bms.com Tue Apr 29 09:46:39 2008 From: stefan.kirov at bms.com (Stefan Kirov) Date: Tue, 29 Apr 2008 09:46:39 -0400 Subject: [Bioperl-l] translate() oddities In-Reply-To: References: Message-ID: <481726BF.1060609@bms.com> my $frame1_obj = $seqobj->translate(-frame => 1); not my $frame1_obj = $seqobj->translate(frame => 1); Stefan Derek Gatherer wrote: > Hi > > I thought I'd better run this by the community before I embarrass > myself on Bugzilla. It seems like a clear bug to me. I'm running > Bioperl 1.5.0 on RedHat. > > For a test input: > > >test > ATGATGATGATGATGTGA > > the following code is fine. 
> > while((my $seqobj = $seq_in->next_seq())) > { > print "\n".$seqobj->display_id; > my $len = $seqobj->length(); > print " length: $len"; > my $frame1_obj = $seqobj->translate(); > my $f1_prot = $frame1_obj->seq(); > print "\n$f1_prot"; > } > > Output: > > test length: 18 > MMMMM* > > But if I want to change the frame as specified in the BioPerl > tutorial, by using: > > my $frame1_obj = $seqobj->translate(frame => 1); # which should now > give frame 2, I get: > > test length: 18 > MMMMM-frame > > The frame is unchanged and the text "-frame" is tacked on the end of > the output. The same occurs with translate(frame => 2). > > Any ideas? Can something as fundamental as translate() really be > bugged? or am I guilty of some particularly heinous syntax error? > > Cheers > Derek > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Tue Apr 29 11:00:00 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 29 Apr 2008 10:00:00 -0500 Subject: [Bioperl-l] translate() oddities In-Reply-To: <4817303E.1040903@gmail.com> References: <481726BF.1060609@bms.com> <4817303E.1040903@gmail.com> Message-ID: <36045A08-AEA8-4639-A384-1DC53B5DC129@uiuc.edu> Yes the interface changed somewhat post 1.5.1, mainly to accept named parameters. I think a few methods do this now as passing in lists of more than 2 args, undef'ing those one doesn't want set, gets confusing. chris On Apr 29, 2008, at 9:27 AM, Roy Chaudhuri wrote: > Spent two minutes looking at this, so may as well chip in with what > I discovered even though you solved your problem. > > This "bug" comes about because in version 1.5.1 and earlier, the > arguments to translate were a simple list, with the first argument > the terminator (defaults to "*"). Your old version therefore assumed > that you wanted to translate the stop codon to "-frame". 
Amusingly > given your typo, if you miss the hyphen off the frame argument in > version 1.5.2 it reverts to the old interface and you end up with > the output "MMMMMframe". The moral of the story is of course to read > the docs relevant to the version you are using. > > Roy. > -- > Dr. Roy Chaudhuri > Department of Veterinary Medicine > University of Cambridge, U.K. > > Derek Gatherer wrote: >> Thanks Stefan >> Actually, there was a typo in my message, I did use -frame => 1. >> However, the problem disappears on upgrading from 1.5.0 to 1.5.2. >> So not a bug anymore. >> Cheers >> Derek >> At 14:46 29/04/2008, Stefan Kirov wrote: >>> my $frame1_obj = $seqobj->translate(-frame => 1); >>> not >>> my $frame1_obj = $seqobj->translate(frame => 1); >>> Stefan >>> >>> Derek Gatherer wrote: >>>> Hi >>>> >>>> I thought I'd better run this by the community before I embarrass >>>> myself on Bugzilla. It seems like a clear bug to me. I'm running >>>> Bioperl 1.5.0 on RedHat. >>>> >>>> For a test input: >>>> >>>>> test >>>> ATGATGATGATGATGTGA >>>> >>>> the following code is fine. >>>> >>>> while((my $seqobj = $seq_in->next_seq())) >>>> { >>>> print "\n".$seqobj->display_id; >>>> my $len = $seqobj->length(); >>>> print " length: $len"; >>>> my $frame1_obj = $seqobj->translate(); >>>> my $f1_prot = $frame1_obj->seq(); >>>> print "\n$f1_prot"; >>>> } >>>> >>>> Output: >>>> >>>> test length: 18 >>>> MMMMM* >>>> >>>> But if I want to change the frame as specified in the BioPerl >>>> tutorial, by using: >>>> >>>> my $frame1_obj = $seqobj->translate(frame => 1); # which should now >>>> give frame 2, I get: >>>> >>>> test length: 18 >>>> MMMMM-frame >>>> >>>> The frame is unchanged and the text "-frame" is tacked on the end >>>> of >>>> the output. The same occurs with translate(frame => 2). >>>> >>>> Any ideas? Can something as fundamental as translate() really be >>>> bugged? or am I guilty of some particularly heinous syntax error? 
>>>> >>>> Cheers >>>> Derek >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue Apr 29 11:07:30 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 29 Apr 2008 10:07:30 -0500 Subject: [Bioperl-l] translate() oddities In-Reply-To: <481726BF.1060609@bms.com> References: <481726BF.1060609@bms.com> Message-ID: <18DB95FB-52B9-4091-ACEE-996891F8A5AE@uiuc.edu> As an aside, I've been playing around with perl6 (Rakudo) for a bit now. Parameter-like passing (using autoaccessors and other means) will be added in soon, so you will be able to do this: $seqobj = Seq.new(seq => 'ATGATGATGATGATGTGA', alphabet => 'dna'); my $protobj = $seq.translate(frame => 1); Yes, I'm a geek. ; > chris On Apr 29, 2008, at 8:46 AM, Stefan Kirov wrote: > my $frame1_obj = $seqobj->translate(-frame => 1); > not > my $frame1_obj = $seqobj->translate(frame => 1); > Stefan > > Derek Gatherer wrote: >> Hi >> >> I thought I'd better run this by the community before I embarrass >> myself on Bugzilla. It seems like a clear bug to me. I'm running >> Bioperl 1.5.0 on RedHat. >> >> For a test input: >> >>> test >> ATGATGATGATGATGTGA >> >> the following code is fine. 
>> >> while((my $seqobj = $seq_in->next_seq())) >> { >> print "\n".$seqobj->display_id; >> my $len = $seqobj->length(); >> print " length: $len"; >> my $frame1_obj = $seqobj->translate(); >> my $f1_prot = $frame1_obj->seq(); >> print "\n$f1_prot"; >> } >> >> Output: >> >> test length: 18 >> MMMMM* >> >> But if I want to change the frame as specified in the BioPerl >> tutorial, by using: >> >> my $frame1_obj = $seqobj->translate(frame => 1); # which should now >> give frame 2, I get: >> >> test length: 18 >> MMMMM-frame >> >> The frame is unchanged and the text "-frame" is tacked on the end of >> the output. The same occurs with translate(frame => 2). >> >> Any ideas? Can something as fundamental as translate() really be >> bugged? or am I guilty of some particularly heinous syntax error? >> >> Cheers >> Derek >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From dr.hogart at gmail.com Tue Apr 29 11:57:51 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Tue, 29 Apr 2008 19:57:51 +0400 Subject: [Bioperl-l] alignment by TCoffee as a subroutine Message-ID: Hi all! I am trying to perform TCoffe aligment by Bio::Tools::Run::Alignment::TCoffee wrapper as subroutine into the script. This subroutine works fine, but it is not single subroutine - there are a lot of other ones in the script. The problem is when compilation of script finish execution (nb! successful execution) of tcoffee subroutine the compiliation of the end of the script also interrupted. It seems that the tcoffee program itself induce interraption of perl compilation. 
Is it possible to get around this problem?

--

From darin.london at duke.edu Tue Apr 29 12:49:53 2008
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Tue, 29 Apr 2008 12:49:53 -0400
Subject: [Bioperl-l] BOSC 2008 Announcement and Call For Submissions
Message-ID: <200804291650.m3TGnr0H020814@tenero.duhs.duke.edu>

BOSC 2008 Call for Abstracts Reminder

The 9th annual Bioinformatics Open Source Conference (BOSC 2008) will take place in Toronto, Ontario, Canada, as one of several Special Interest Group (SIG) meetings occurring in conjunction with the 16th annual Intelligent Systems for Molecular Biology Conference (ISMB 2008). This is a reminder to submit your proposals for talks to the BOSC submission system before May 11.

Submission Process:

All abstracts must be submitted through our Open Conference Systems site (http://events.open-bio.org/BOSC2008/openconf.php). The form will ask for a small Abstract Text to be pasted into it, and a full paper. The small Abstract Text should be a summary, while the longer abstract should provide more details, including the open-source license requirement details. Full-length abstracts are limited to one page with one inch (2.5 cm) margins on the top, sides, and bottom. The full-length abstract should include the title, authors, and affiliations. We prefer your abstract to be in PDF format, although plain t

Important Dates:

May 11: Abstract submission deadline.
June 2: Notification of accepted talks.
June 4: Early registration discount cut-off.
July 18-19: BOSC 2008!

We hope to see you at BOSC 2008!
Kam Dahlquist and Darin London
BOSC 2008 Co-organizers

From bix at sendu.me.uk Tue Apr 29 12:54:41 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 29 Apr 2008 17:54:41 +0100
Subject: [Bioperl-l] alignment by TCoffee as a subroutine
In-Reply-To:
References:
Message-ID: <481752D1.7010904@sendu.me.uk>

sergei ryazansky wrote:
> I am trying to perform TCoffe aligment by
> Bio::Tools::Run::Alignment::TCoffee wrapper as subroutine into the
> script. This subroutine works fine, but it is not single subroutine -
> there are a lot of other ones in the script. The problem is when
> compilation of script finish execution (nb! successful execution) of
> tcoffee subroutine the compiliation of the end of the script also
> interrupted. It seems that the tcoffee program itself induce
> interraption of perl compilation. Is it possible to pass this problem?

You'll have to supply us with a minimal version of the script and the complete error message.

From dr.hogart at gmail.com Wed Apr 30 07:24:35 2008
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Wed, 30 Apr 2008 15:24:35 +0400
Subject: [Bioperl-l] alignment by TCoffee as a subroutine
References:
Message-ID:

On Tue, 29 Apr 2008 19:57:51 +0400, sergei ryazansky wrote:

> Hi all!
>
> I am trying to perform TCoffe aligment by
> Bio::Tools::Run::Alignment::TCoffee wrapper as subroutine into the
> script. This subroutine works fine, but it is not single subroutine -
> there are a lot of other ones in the script. The problem is when
> compilation of script finish execution (nb! successful execution) of
> tcoffee subroutine the compiliation of the end of the script also
> interrupted. It seems that the tcoffee program itself induce
> interraption of perl compilation. Is it possible to pass this problem?
>

My subroutine is as follows:

sub align {
    my $file=shift @_;
    my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => 'fasta', 'outfile' => 'temp_align.out');
    my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params);
    my $aln=$factory->align ($file);
    open (fy,'temp_align.out'); my @temp_file=<fy>; close fy;
    return @temp_file;
}

This subroutine is called by the following command:

my @align_fa = align($inputfile_align);

After successful execution of this subroutine (accompanied by the corresponding messages in the terminal window), the execution of the remainder of the script is terminated without any error messages.

--

From bix at sendu.me.uk Wed Apr 30 08:47:17 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 30 Apr 2008 13:47:17 +0100
Subject: [Bioperl-l] alignment by TCoffee as a subroutine
In-Reply-To:
References:
Message-ID: <48186A55.4030406@sendu.me.uk>

sergei ryazansky wrote:
> My subroutine is following:
>
> sub align {
> my $file=shift @_;
> my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' =>
> 'fasta', 'outfile' => 'temp_align.out');
> my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params);
> my $aln=$factory->align ($file);
> open (fy,'temp_align.out'); my @temp_file=<fy>; close fy;
> return @temp_file;
> }
>
> This subroutine is called by the following command:
>
> my @align_fa = align($inputfile_align);
>
> After successful execution of this subroutine (accompaning with the
> corresponding messages on the terminal window) the execution of
> remainder script is terminated without any error messages.

The problem lies somewhere within the rest of your script, so we have to see it if you want help.

Why are you using Bio::Tools::Run::Alignment::TCoffee at all if you don't make use of the resulting alignment object? A system call might make more sense given what you're doing. The beauty of Bio::Tools::Run::Alignment::TCoffee is that you don't have to parse the result file (temp_align.out) yourself.
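
The point about using the alignment object can be sketched roughly as follows. This is not code from the thread: the helper names count_changes and changes_from_tcoffee, the per-column counting, and the 50% consensus threshold are illustrative assumptions. The align(), consensus_string() and each_seq() calls are the wrapper and Bio::SimpleAlign methods as documented in BioPerl, but they should be checked against the installed version.

```perl
use strict;
use warnings;

# Count, per alignment column, how many aligned rows differ from the
# consensus character. Works on plain strings of equal length.
sub count_changes {
    my ($consensus, @rows) = @_;
    my @changes = (0) x length($consensus);
    for my $row (@rows) {
        for my $i (0 .. length($consensus) - 1) {
            $changes[$i]++
                if substr($row, $i, 1) ne substr($consensus, $i, 1);
        }
    }
    return @changes;
}

# Hypothetical wrapper: run T-Coffee through BioPerl and work on the
# returned alignment object instead of re-reading an output file.
sub changes_from_tcoffee {
    my ($fasta_file) = @_;
    require Bio::Tools::Run::Alignment::TCoffee;    # loaded only when called
    my $factory = Bio::Tools::Run::Alignment::TCoffee->new(
        'ktuple' => 2, 'matrix' => 'BLOSUM');
    my $aln       = $factory->align($fasta_file);   # a Bio::SimpleAlign object
    my $consensus = $aln->consensus_string(50);     # threshold is an assumption
    my @rows      = map { $_->seq } $aln->each_seq; # aligned strings with gaps
    return count_changes($consensus, @rows);
}

# Pure-Perl demonstration that needs neither BioPerl nor t_coffee.
my @demo = count_changes('ACGT', 'ACGT', 'AGGT', 'ACGA');
print "@demo\n";    # prints "0 1 0 1"
```

Calling changes_from_tcoffee with a FASTA file name would additionally need bioperl-run and a t_coffee binary on the PATH; count_changes on its own is plain Perl and can be tested separately.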
From dr.hogart at gmail.com Wed Apr 30 09:36:58 2008
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Wed, 30 Apr 2008 17:36:58 +0400
Subject: [Bioperl-l] alignment by TCoffee as a subroutine
References: <48186A55.4030406@sendu.me.uk>
Message-ID:

On Wed, 30 Apr 2008 16:47:17 +0400, Sendu Bala wrote:

> sergei ryazansky wrote:
>> My subroutine is following:
>> sub align {
>> my $file=shift @_;
>> my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' =>
>> 'fasta', 'outfile' => 'temp_align.out');
>> my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params);
>> my $aln=$factory->align ($file);
>> open (fy,'temp_align.out'); my @temp_file=<fy>; close fy;
>> return @temp_file;
>> }
>> This subroutine is called by the following command:
>> my @align_fa = align($inputfile_align);
>> After successful execution of this subroutine (accompaning with the
>> corresponding messages on the terminal window) the execution of
>> remainder script is terminated without any error messages.
>
> The problem lies somewhere within the rest of your script, so we have to
> see it if you want help.
>
> Why are you using Bio::Tools::Run::Alignment::TCoffee at all if you
> don't make use of the resulting alignment object? A system call might
> make more sense given what you're doing. The beauty of
> Bio::Tools::Run::Alignment::TCoffee is that you don't have to parse the
> result file (temp_align.out) yourself.

The rest of the script, imho, is ok, because without this sub it works fine. Maybe the problem lies in TCoffee itself?

One of the features of the script is to estimate the number of nt changes at each position in different similar sequences compared with the consensus sequence. To do this it is necessary to obtain a multiple alignment: the result of the TCoffee alignment goes to another subroutine, which estimates the level of changes. Of course, I don't think that this way is the best approach; most probably there are a lot of better ways to do it.
But for my purposes today it is ok.

--

From avilella at gmail.com Wed Apr 30 10:16:56 2008
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 30 Apr 2008 15:16:56 +0100
Subject: [Bioperl-l] alignment by TCoffee as a subroutine
In-Reply-To:
References: <48186A55.4030406@sendu.me.uk>
Message-ID: <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com>

Hi Sergei,

Can you try to isolate this call with a simpler example to see if it still fails?

When you say that the problems are in the compilation, do you mean that the interpreter won't even compile or that it fails during execution?

Have you checked that you have all the dependencies right?

Cheers,

Albert.

On Wed, Apr 30, 2008 at 2:36 PM, sergei ryazansky wrote:

> On Wed, 30 Apr 2008 16:47:17 +0400, Sendu Bala wrote:
> > sergei ryazansky wrote:
> >
> > > My subroutine is following:
> > > sub align {
> > > my $file=shift @_;
> > > my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' =>
> > > 'fasta', 'outfile' => 'temp_align.out');
> > > my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params);
> > > my $aln=$factory->align ($file);
> > > open (fy,'temp_align.out'); my @temp_file=<fy>; close fy;
> > > return @temp_file;
> > > }
> > > This subroutine is called by the following command:
> > > my @align_fa = align($inputfile_align);
> > > After successful execution of this subroutine (accompaning with the
> > > corresponding messages on the terminal window) the execution of remainder
> > > script is terminated without any error messages.
> > >
> >
> > The problem lies somewhere within the rest of your script, so we have to
> > see it if you want help.
> >
> > Why are you using Bio::Tools::Run::Alignment::TCoffee at all if you
> > don't make use of the resulting alignment object? A system call might make
> > more sense given what you're doing. The beauty of
> > Bio::Tools::Run::Alignment::TCoffee is that you don't have to parse the
> > result file (temp_align.out) yourself.
> >
>
> The rest of script,imho, is ok, because without this sub it is work fine.
> May be problem lies into the TCoffee itself?
>
> One of the feature of script is to estimate the quantity of nt changes in
> each position in the different similar sequences in comparing with consensus
> sequences. To perform this it is nesseccary to obtain the multiply
> alignment: the result of TCoffee alignment goes to another subroutine, that
> estemated the level of changes. Of course, I dont think that this way is the
> best approach, most probably there are a lot of the better ways to do it.
> But for my today purposes it is ok.

From bix at sendu.me.uk Wed Apr 30 10:22:01 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 30 Apr 2008 15:22:01 +0100
Subject: [Bioperl-l] alignment by TCoffee as a subroutine
In-Reply-To:
References: <48186A55.4030406@sendu.me.uk>
Message-ID: <48188089.8000300@sendu.me.uk>

sergei ryazansky wrote:
> On Wed, 30 Apr 2008 16:47:17 +0400, Sendu Bala wrote:
>
>> sergei ryazansky wrote:
>>> My subroutine is following:
>>> sub align {
>>> my $file=shift @_;
>>> my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' =>
>>> 'fasta', 'outfile' => 'temp_align.out');
>>> my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params);
>>> my $aln=$factory->align ($file);
>>> open (fy,'temp_align.out'); my @temp_file=<fy>; close fy;
>>> return @temp_file;
>>> }
>>> This subroutine is called by the following command:
>>> my @align_fa = align($inputfile_align);
>>> After successful execution of this subroutine (accompaning with the
>>> corresponding messages on the terminal window) the execution of
>>> remainder script is terminated without any error messages.
>>
>> The problem lies somewhere within the rest of your script, so we have
>> to see it if you want help.
> > The rest of script,imho, is ok, because without this sub it is work > fine. May be problem lies into the TCoffee itself? I've run your subroutine in a simple script of my own and it doesn't cause script termination. Again, the problem lies elsewhere in your script. Supply it or it is impossible for anyone to help you. From Sebastien.Moretti at unil.ch Wed Apr 30 10:06:28 2008 From: Sebastien.Moretti at unil.ch (Sebastien MORETTI) Date: Wed, 30 Apr 2008 16:06:28 +0200 Subject: [Bioperl-l] alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> Message-ID: <48187CE4.8030606@unil.ch> >>> My subroutine is following: >>> sub align { >>> my $file=shift @_; >>> my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => >>> 'fasta', 'outfile' => 'temp_align.out'); >>> my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); >>> my $aln=$factory->align ($file); >>> open (fy,'temp_align.out'); my @temp_file=; close fy; >>> return @temp_file; >>> } >>> This subroutine is called by the following command: >>> my @align_fa = align($inputfile_align); >>> After successful execution of this subroutine (accompaning with the >>> corresponding messages on the terminal window) the execution of >>> remainder script is terminated without any error messages. >> >> The problem lies somewhere within the rest of your script, so we have >> to see it if you want help. >> >> Why are you using Bio::Tools::Run::Alignment::TCoffee at all if you >> don't make use of the resulting alignment object? A system call might >> make more sense given what you're doing. The beauty of >> Bio::Tools::Run::Alignment::TCoffee is that you don't have to parse >> the result file (temp_align.out) yourself. > > The rest of script,imho, is ok, because without this sub it is work > fine. May be problem lies into the TCoffee itself? 
> > One of the feature of script is to estimate the quantity of nt changes > in each position in the different similar sequences in comparing with > consensus sequences. To perform this it is nesseccary to obtain the > multiply alignment: the result of TCoffee alignment goes to another > subroutine, that estemated the level of changes. Of course, I dont think > that this way is the best approach, most probably there are a lot of the > better ways to do it. But for my today purposes it is ok. Do you have tried to use the tcoffee command, called via bioperl, as a command line ? To check if it is a problem with tcoffee or with the tcoffee release that bioperl must use. -- S?bastien Moretti From dr.hogart at gmail.com Wed Apr 30 10:54:59 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Wed, 30 Apr 2008 18:54:59 +0400 Subject: [Bioperl-l] alignment by TCoffee as a subroutine References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> Message-ID: Hi Albert, The isolated call is executed without any problem, so the code is absolutely correct. The problem arise when this sub executed within the whole script - after successful execution of TCoffee alignment the execution of the rest of script is terminated. The whole code is very big (~500 lines), so for simplicity lets imagine the sheme of script in the following view: sub1; sub2; sub3; sub align; # TCoffe alignment; sub4; sub5; Each sub (subroutine) is independent from the others subs; The order of script execution is 1,2,3,align,4,5. But after the execution of align the execution of the rest of subs (4 and 5) is terminated. The script without sub align {} successfully execute the sub 4 and sub 5. So, I mean that interpreter won't compile sub 4 and 5 if sub align is placed before them. On Wed, 30 Apr 2008 18:16:56 +0400, Albert Vilella wrote: > Hi Sergei, > > Can you try to isolate this call with a simpler example to see if it > still > fails? 
When you say that the problems are in the compilation, do you mean > that the interpreter won't even compile or that it fails during > execution? > Have you checked that you have all the dependencies right? > > Cheers, > > Albert. > > On Wed, Apr 30, 2008 at 2:36 PM, sergei ryazansky > wrote: > >> On Wed, 30 Apr 2008 16:47:17 +0400, Sendu Bala wrote: >> >> sergei ryazansky wrote: >> > >> > > My subroutine is following: >> > > sub align { >> > > my $file=shift @_; >> > > my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => >> > > 'fasta', 'outfile' => 'temp_align.out'); >> > > my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); >> > > my $aln=$factory->align ($file); >> > > open (fy,'temp_align.out'); my @temp_file=; close fy; >> > > return @temp_file; >> > > } >> > > This subroutine is called by the following command: >> > > my @align_fa = align($inputfile_align); >> > > After successful execution of this subroutine (accompaning with the >> > > corresponding messages on the terminal window) the execution of >> remainder >> > > script is terminated without any error messages. >> > > >> > >> > The problem lies somewhere within the rest of your script, so we have >> to >> > see it if you want help. >> > >> > Why are you using Bio::Tools::Run::Alignment::TCoffee at all if you >> > don't make use of the resulting alignment object? A system call might >> make >> > more sense given what you're doing. The beauty of >> > Bio::Tools::Run::Alignment::TCoffee is that you don't have to parse >> the >> > result file (temp_align.out) yourself. >> > >> >> The rest of script,imho, is ok, because without this sub it is work >> fine. >> May be problem lies into the TCoffee itself? >> >> One of the feature of script is to estimate the quantity of nt changes >> in >> each position in the different similar sequences in comparing with >> consensus >> sequences. 
>> To perform this it is nesseccary to obtain the multiply
>> alignment: the result of TCoffee alignment goes to another subroutine,
>> that estemated the level of changes. Of course, I dont think that this way
>> is the best approach, most probably there are a lot of the better ways to do
>> it. But for my today purposes it is ok.

From dr.hogart at gmail.com Wed Apr 30 11:14:09 2008
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Wed, 30 Apr 2008 19:14:09 +0400
Subject: [Bioperl-l] alignment by TCoffee as a subroutine
References: <48186A55.4030406@sendu.me.uk> <48187CE4.8030606@unil.ch>
Message-ID:

No, I haven't tried. To tell the truth, I have run into a problem like this
before. I simply wanted to align several sets of sequences with the TCoffee
BioPerl package. The script was supposed to pass the sets to the TCoffee
wrapper one after another. But after the first set of sequences was aligned,
the alignment of the remaining sets was terminated. So I had to use another
"super_script" that called the first script with different arguments, one
for each set.

> Do you have tried to use the tcoffee command, called via bioperl, as a
> command line ?

--

From bix at sendu.me.uk Wed Apr 30 11:28:50 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 30 Apr 2008 16:28:50 +0100
Subject: [Bioperl-l] alignment by TCoffee as a subroutine
In-Reply-To:
References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk>
Message-ID: <48189032.20102@sendu.me.uk>

sergei ryazansky wrote:
> Hi Albert,
>
> The isolated call is executed without any problem, so the code is
> absolutely correct.
> The problem arise when this sub executed within the
> whole script - after successful execution of TCoffee alignment the
> execution of the rest of script is terminated. The whole code is very
> big (~500 lines), so for simplicity lets imagine the sheme of script in
> the following view:
> sub1;
> sub2;
> sub3;
> sub align; # TCoffe alignment;
> sub4;
> sub5;
>
> Each sub (subroutine) is independent from the others subs; The order of
> script execution is 1,2,3,align,4,5. But after the execution of align
> the execution of the rest of subs (4 and 5) is terminated. The script
> without sub align {} successfully execute the sub 4 and sub 5. So, I
> mean that interpreter won't compile sub 4 and 5 if sub align is placed
> before them.

This has nothing to do with interpreter compilation, which is successful
if the script runs at all.

What do you do with the output of &align? The thing you are doing with
that output is most likely the cause of your script terminating, which
is why &sub4 and &sub5 run when you don't run &align (have no output
that causes the problem).

If you're not willing to show us your script, here are some simple
debugging steps you can do yourself:

# don't do anything with the output of align() - does &sub4 still run?
# add some print statements after you call align(), and then after every
further block of code in your script to see exactly where the script
terminates
# reduce your script down to a minimal script that shows the problem
(with the help of the previous step) and show us that

From dr.hogart at gmail.com Wed Apr 30 11:42:41 2008
From: dr.hogart at gmail.com (Sergei Ryazansky)
Date: Wed, 30 Apr 2008 19:42:41 +0400
Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine
In-Reply-To:
References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk>
Message-ID:

------- Forwarded message -------
From: "Sergei Ryazansky"
To: "Sendu Bala"
Cc:
Subject: Re: [Bioperl-l] alignment by TCoffee as a subroutine
Date: Wed, 30 Apr 2008 19:40:26 +0400

> What do you do with the output of &align? The thing you are doing with
> that output is most likely the cause of your script terminating, which
> is why &sub4 and &sub5 run when you don't run &align (have no output
> that causes the problem).

Please see my answer to Sebastien Moretti; it describes another, similar
problem. The only thing I did with the output there was print it to a
file. Nevertheless, the problem was the same.

> # don't do anything with the output of align() - does &sub4 still run?

Please see above.

> # add some print statements after you call align(), and then after every
> further block of code in your script to see exactly where the script
> terminates
> # reduce your script down to a minimal script that shows the problem
> (with the help of the previous step) and show us that

All the tests with individual blocks were performed earlier. The results
are OK.
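
The "add some print statements" step discussed in this thread could be sketched roughly as follows. This is only an illustration under stated assumptions: run_step and slurp_lines are hypothetical helper names, not part of BioPerl, and the align/sub4/sub5 calls in the trailing comment stand for the poster's own subroutines, which are not shown in the thread. The sketch also checks the return value of open, which the posted subroutine does not do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;    # unbuffered STDOUT, so ordinary prints are not lost if the script stops

# Hypothetical helper (not part of BioPerl): run one step of the script,
# print a checkpoint before and after it, and report any exception.
sub run_step {
    my ($name, $code) = @_;
    print STDERR "[start] $name\n";
    my @result;
    my $ok = eval { @result = $code->(); 1 };
    if (!$ok) {
        my $err = $@ || "unknown error\n";
        print STDERR "[died]  $name: $err";
        return;    # empty list signals failure to the caller
    }
    print STDERR "[done]  $name\n";
    return @result;
}

# Hypothetical helper: read a file into a list of lines, dying with the
# system error message if the file cannot be opened.
sub slurp_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open '$path': $!\n";
    my @lines = <$fh>;
    close($fh);
    return @lines;
}

# Usage in the real script (align, sub4 and sub5 are the poster's own subs):
#   my @align_fa = run_step('align', sub { align($inputfile_align) });
#   run_step('sub4', \&sub4);
#   run_step('sub5', \&sub5);
```

If "[start] align" is printed but neither "[done]" nor "[died]" follows, the process left that step without throwing a Perl exception (for example through an exit call or a signal), which would point away from the code that consumes the alignment output.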
From cjfields at uiuc.edu Wed Apr 30 12:25:06 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 30 Apr 2008 11:25:06 -0500 Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk> Message-ID: <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu> Sergei, I agree with Sendu; we can't diagnose this unless we either have the entire script of a minimal version of it demonstrating the bug. The best way to handle this is to file a bug report, attaching relevant data using the 'Create a new attachment' link (including either the full script or a shortened one which demonstrates the bug). Otherwise we're just shooting in the dark trying to diagnose the problem. http://bugzilla.open-bio.org/ chris On Apr 30, 2008, at 10:42 AM, Sergei Ryazansky wrote: > > > ------- Forwarded message ------- > From: "Sergei Ryazansky" > To: "Sendu Bala" > Cc: > Subject: Re: [Bioperl-l] alignment by TCoffee as a subroutine > Date: Wed, 30 Apr 2008 19:40:26 +0400 > >> What do you do with the output of &align? The thing you are doing >> with that output is most likely the cause of your script >> terminating, which is why &sub4 and &sub5 run when you don't run >> &align (have no output that causes the problem). > > please sea my answer to Sebastien Moretti - there are description of > another similar problem. The only thing that I did there with output > is > printing to file. Nevetheless the problem was the same. > >> # don't do anything with the output of align() - does &sub4 still >> run? > > please sea above. 
> >> # add some print statements after you call align(), and then after >> every further block of code in your script to see exactly where the >> script terminates >> # reduce your script down to a minimal script that shows the >> problem (with the help of the previous step) and show us that > > all tests with individual bloks was performed earlier. the results > is ok. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From dr.hogart at gmail.com Wed Apr 30 12:40:19 2008 From: dr.hogart at gmail.com (Sergei Ryazansky) Date: Wed, 30 Apr 2008 20:40:19 +0400 Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine In-Reply-To: <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu> References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk> <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu> Message-ID: On Wed, 30 Apr 2008 20:25:06 +0400, Chris Fields wrote: Chris, I have already sent file to Sendu and also I am attaching it here. I have removed from it really unnecessary parts. > Sergei, > > I agree with Sendu; we can't diagnose this unless we either have the > entire script of a minimal version of it demonstrating the bug. > > The best way to handle this is to file a bug report, attaching relevant > data using the 'Create a new attachment' link (including either the full > script or a shortened one which demonstrates the bug). Otherwise we're > just shooting in the dark trying to diagnose the problem. > > http://bugzilla.open-bio.org/ > > chris -------------- next part -------------- A non-text attachment was scrubbed... 
Name: script.pl Type: application/octet-stream Size: 6870 bytes Desc: not available URL: From cjfields at uiuc.edu Wed Apr 30 13:02:19 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 30 Apr 2008 12:02:19 -0500 Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk> <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu> Message-ID: Hmm, maybe you were confused? From my last email: "The best way to handle this is to file a bug report, attaching relevant data using the 'Create a new attachment' link (including either the full script or a shortened one which demonstrates the bug). Otherwise we're just shooting in the dark trying to diagnose the problem." http://bugzilla.open-bio.org/ Anyone can work on fixing the issue there (so it'll probably get fixed faster). The devs can also track progress on the problem via the dev mail list (bioperl-guts). Diagnosing the bug may also reveal issues not just with Bio::Tools::Run::Alignment::TCoffee but also with other related modules. If needed I can post it to bugzilla, but it helps to submit the bug yourself (so you can receive posts on it's progress). chris On Apr 30, 2008, at 11:40 AM, Sergei Ryazansky wrote: > On Wed, 30 Apr 2008 20:25:06 +0400, Chris Fields > wrote: > > Chris, I have already sent file to Sendu and also I am attaching it > here. I have removed from it really unnecessary parts. > >> Sergei, >> >> I agree with Sendu; we can't diagnose this unless we either have >> the entire script of a minimal version of it demonstrating the bug. >> >> The best way to handle this is to file a bug report, attaching >> relevant data using the 'Create a new attachment' link (including >> either the full script or a shortened one which demonstrates the >> bug). Otherwise we're just shooting in the dark trying to diagnose >> the problem. 
>> >> http://bugzilla.open-bio.org/ >> >> chris From dr.hogart at gmail.com Wed Apr 30 13:39:35 2008 From: dr.hogart at gmail.com (Sergei Ryazansky) Date: Wed, 30 Apr 2008 21:39:35 +0400 Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk> <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu> Message-ID: On Wed, 30 Apr 2008 21:11:56 +0400, Sergei Ryazansky wrote: > Oh, sorry, you right - I too fast read you message. I do it slight later. > >> Hmm, maybe you were confused? From my last email: >> >> "The best way to handle this is to file a bug report, attaching >> relevant data using the 'Create a new attachment' link (including >> either the full script or a shortened one which demonstrates the bug). >> Otherwise we're just shooting in the dark trying to diagnose the >> problem." >> >> http://bugzilla.open-bio.org/ >> >> Anyone can work on fixing the issue there (so it'll probably get fixed >> faster). The devs can also track progress on the problem via the dev >> mail list (bioperl-guts). Diagnosing the bug may also reveal issues >> not just with Bio::Tools::Run::Alignment::TCoffee but also with other >> related modules. >> >> If needed I can post it to bugzilla, but it helps to submit the bug >> yourself (so you can receive posts on it's progress). >> >> chris >> >> On Apr 30, 2008, at 11:40 AM, Sergei Ryazansky wrote: >> >>> On Wed, 30 Apr 2008 20:25:06 +0400, Chris Fields >>> wrote: >>> >>> Chris, I have already sent file to Sendu and also I am attaching it >>> here. I have removed from it really unnecessary parts. >>> >>>> Sergei, >>>> >>>> I agree with Sendu; we can't diagnose this unless we either have the >>>> entire script of a minimal version of it demonstrating the bug. 
>>>> >>>> The best way to handle this is to file a bug report, attaching >>>> relevant data using the 'Create a new attachment' link (including >>>> either the full script or a shortened one which demonstrates the >>>> bug). Otherwise we're just shooting in the dark trying to diagnose >>>> the problem. >>>> >>>> http://bugzilla.open-bio.org/ >>>> >>>> chris > From cjfields at uiuc.edu Wed Apr 30 14:29:28 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 30 Apr 2008 13:29:28 -0500 Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk> <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu> Message-ID: <39A139E4-6783-41E6-8EE9-1FE60CB57577@uiuc.edu> Sorry, didn't catch that... chris On Apr 30, 2008, at 12:39 PM, Sergei Ryazansky wrote: > On Wed, 30 Apr 2008 21:11:56 +0400, Sergei Ryazansky > wrote: > >> Oh, sorry, you right - I too fast read you message. I do it slight >> later. >> >>> Hmm, maybe you were confused? From my last email: >>> >>> "The best way to handle this is to file a bug report, attaching >>> relevant data using the 'Create a new attachment' link (including >>> either the full script or a shortened one which demonstrates the >>> bug). Otherwise we're just shooting in the dark trying to diagnose >>> the problem." >>> >>> http://bugzilla.open-bio.org/ >>> >>> Anyone can work on fixing the issue there (so it'll probably get >>> fixed faster). The devs can also track progress on the problem >>> via the dev mail list (bioperl-guts). Diagnosing the bug may also >>> reveal issues not just with Bio::Tools::Run::Alignment::TCoffee >>> but also with other related modules. >>> >>> If needed I can post it to bugzilla, but it helps to submit the >>> bug yourself (so you can receive posts on it's progress). 
>>> >>> chris >>> >>> On Apr 30, 2008, at 11:40 AM, Sergei Ryazansky wrote: >>> >>>> On Wed, 30 Apr 2008 20:25:06 +0400, Chris Fields >>>> wrote: >>>> >>>> Chris, I have already sent file to Sendu and also I am attaching >>>> it here. I have removed from it really unnecessary parts. >>>> >>>>> Sergei, >>>>> >>>>> I agree with Sendu; we can't diagnose this unless we either have >>>>> the entire script of a minimal version of it demonstrating the >>>>> bug. >>>>> >>>>> The best way to handle this is to file a bug report, attaching >>>>> relevant data using the 'Create a new attachment' link >>>>> (including either the full script or a shortened one which >>>>> demonstrates the bug). Otherwise we're just shooting in the dark >>>>> trying to diagnose the problem. >>>>> >>>>> http://bugzilla.open-bio.org/ >>>>> >>>>> chris >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From Bank.Beszteri at awi.de Tue Apr 1 08:31:49 2008 From: Bank.Beszteri at awi.de (=?ISO-8859-1?Q?B=E1nk_Beszteri?=) Date: Tue, 01 Apr 2008 14:31:49 +0200 Subject: [Bioperl-l] Indexing large databases / BioSQL Message-ID: <47F22B35.1030502@awi.de> Dear list, we have recently started to try to find a solution for indexing large sequence databases / flat files for a java project, and because we ran into problems using biojava, and because both the OBDA and BioSQL ways seem to be compatible across bio~ projects, we also started to experiment with bioperl. It looks like this should work fine, but we had a couple of problems here, too. Perhaps some of you can give me hint what we are doing wrong! 
The first thing we tried was to use Bio::DB::Flat for indexing a TrEMBL flat file (~ 12 GB); but it seems we haven?t got a machine with enough memory to be able to handle this. (Perhaps you would be using the "bdb" style index in such a case in bioperl, but this apparently doesn?t work with biojava, so we had to stick with "flat"). So next we started to test BioSQL, by trying to load just Swissprot in a MySQL DB first, like: load_seqdatabase.pl --host mysql.awi.de --dbname biosql2 --dbuser xyz --dbpass abc --driver mysql --namespace uniprot_sprot --format swiss uniprot_sprot.dat Here we get an error message ########################################### Loading /biodb/spinkern/uniprot_sprot.dat ... Could not store Q6DAH5: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: The supplied lineage does not start near 'Erwinia carotovora subsp. atroseptica' (I was supplied 'Erwinia carotovora subsp. | Pectobacterium | Enterobacteriaceae | Enterobacteriales | Gammaproteobacteria | Proteobacteria | Bacteria') STACK: Error::throw STACK: Bio::Root::Root::throw /biodb/spinkern/bioperl-1.5/bioperl-1.5.2_102/Bio/Root/Root.pm:359 STACK: Bio::Species::classification /biodb/spinkern/bioperl-1.5/bioperl-1.5.2_102/Bio/Species.pm:174 STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:552 STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SpeciesAdaptor.pm:281 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1305 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:973 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852 STACK: 
Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182 STACK: Bio::DB::Persistent::PersistentObject::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:244 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 STACK: Bio::DB::Persistent::PersistentObject::store /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271 STACK: load_seqdatabase.pl:622 ----------------------------------------------------------- at load_seqdatabase.pl line 635 ############################################ or similar, depending on whether we use a pre-loaded ncbi taxonomy or not, and which Swissprot release we are trying to load. It often seems to come from sg. like here, subsp. or other special addition to the species line; but alternative genus names and other curious things also to appear. It looks like Species.pm tries to validate the species name against the lineage info already there in the BioSQL DB, and in several cases, it finds inconsistencies. If we start with the ncbi taxonomy already loaded in the database, the first error comes much earlier. I found a thread on the same problem from ~ two years ago (http://thread.gmane.org/gmane.comp.lang.perl.bio.general/13766/focus=13788), where the solution recommended was to update bioperl, so I was quite surprised to find the problem with the version you can see above (1.5.2_102 bioperl core, 1.5.2_100 bioperl_db). Can someone give me any hints as to what is going wrong here? 
The only workaround we have found so far was to comment out line 174 in
Species.pm:

$self->throw("The supplied lineage does not start near '$name' (I was supplied '".join(" | ", @vals)."')");

After doing so, load_seqdatabase.pl runs for several hours (until it
eventually crashes; I haven't found out yet why), but proceeds really
slowly. I also found some info on this for Pg and Oracle in the mailing
list, but does anyone have approximate numbers for MySQL: how long should
a first Swissprot load take?

We would be grateful to hear about your ideas / experiences on these issues!

Bank Beszteri
Bioinformatics / Scientific Computing
Alfred Wegener Institute for Polar and Marine Research
Am Handelshafen 12.
27570 Bremerhaven
Germany

From cjfields at uiuc.edu Tue Apr 1 20:45:28 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 1 Apr 2008 19:45:28 -0500
Subject: [Bioperl-l] quick update on bioperl nightly builds
Message-ID: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu>

I'm simplifying the nightly build archive names (removing svn revision #
and date) in case anyone needs to update bioperl-live/run/db/network on a
regular basis (read: GBrowse installations). When I have time I'll start
working on automated builds, which will require some extra work with
Module::Build and Build.PL.

chris

From hiekeen at gmail.com Tue Apr 1 22:14:07 2008
From: hiekeen at gmail.com (Jinyan Huang)
Date: Wed, 2 Apr 2008 10:14:07 +0800
Subject: [Bioperl-l] How to make a network graphic using my genes in pathways?
Message-ID:

I have 20 pathways, and my genes of interest are in these pathways. Some
genes overlap between pathways. How can I draw a network graph using these
genes, that is, connect the pathways through the overlapping genes? What
kind of software can I use?

Thank you very much in advance.

--
Best regards,
Jinyan Huang (ekeen)
School of Life Sciences and Technology, 1302 Room
Tongji University
Siping Road 1239, Shanghai 200092
P.R.
China Tel :0086-21-65981041 Msn: hiekeen at hotmail.com eMail: hiekeen at gmail.com From hlapp at gmx.net Tue Apr 1 22:30:06 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 1 Apr 2008 22:30:06 -0400 Subject: [Bioperl-l] Indexing large databases / BioSQL In-Reply-To: <47F22B35.1030502@awi.de> References: <47F22B35.1030502@awi.de> Message-ID: On Apr 1, 2008, at 8:31 AM, B?nk Beszteri wrote: > [...] So next we started to test BioSQL, by trying to load just > Swissprot in a MySQL DB first, like: > > load_seqdatabase.pl --host mysql.awi.de --dbname biosql2 --dbuser > xyz --dbpass abc --driver mysql --namespace uniprot_sprot --format > swiss uniprot_sprot.dat > > Here we get an error message > > ########################################### > > Loading /biodb/spinkern/uniprot_sprot.dat ... > Could not store Q6DAH5: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: The supplied lineage does not start near 'Erwinia carotovora > subsp. atroseptica' (I was supplied 'Erwinia carotovora subsp. 
| > Pectobacterium | Enterobacteriaceae | Enterobacteriales | > Gammaproteobacteria | Proteobacteria | Bacteria') > STACK: Error::throw > STACK: Bio::Root::Root::throw /biodb/spinkern/bioperl-1.5/ > bioperl-1.5.2_102/Bio/Root/Root.pm:359 > STACK: Bio::Species::classification /biodb/spinkern/bioperl-1.5/ > bioperl-1.5.2_102/Bio/Species.pm:174 > STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm: > 552 > STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SpeciesAdaptor.pm:281 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object / > biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:1305 > STACK: > Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:973 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / > biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:852 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:182 > STACK: Bio::DB::Persistent::PersistentObject::create /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm: > 244 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:169 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:251 > STACK: Bio::DB::Persistent::PersistentObject::store /biodb/spinkern/ > bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271 > STACK: load_seqdatabase.pl:622 > ----------------------------------------------------------- > > at load_seqdatabase.pl line 635 > > ############################################ > > or similar, 
> depending on whether we use a pre-loaded ncbi taxonomy
> or not

I recommend always using a pre-loaded NCBI taxonomy unless you know there
are only a few organisms that are straightforward (for the parser, that
is).

> , and which Swissprot release we are trying to load. It often seems
> to come from sg. like here, subsp. or other special addition to the
> species line; but alternative genus names and other curious things
> also to appear. It looks like Species.pm tries to validate the
> species name against the lineage info already there in the BioSQL
> DB, and in several cases, it finds inconsistencies.

It actually happens upon a successful lookup when the species object is
populated from the database.

> [...]
> The only workaround we have found so far was to comment out line
> 174 in Species.pm:
>
> $self->throw("The supplied lineage does not start near '$name' (I
> was supplied '".join(" | ", @vals)."')");

That should be OK if you work with a pre-loaded taxonomy. It's sort of a
sanity check that should catch a parser having messed up a species. If
you use a pre-loaded NCBI taxonomy the results of the species parsing
don't matter in all details so long as the NCBI taxonID is parsed out
correctly, and then found in the database.

Note that this is actually a warn() in the main trunk version of BioPerl,
so you might want to upgrade to that (or change throw() to warn() in your
version). You still get the records flagged with that, but it isn't an
exception.

>
> After doing so, load_seqdatabase.pl runs for several hours (until
> it evetually crashes; I haven't found out yet why), but proceeds
> really slowly.

It should certainly *not* crash. Note also that you can supply --safe on
the command line, in which case the script will continue with the next
record if one fails to load for whatever reason.

You will want to adjust the width constraint of dbxref.accession, for
example to 128 chars. This will also be fixed for BioSQL 1.0.1.
See http://bugzilla.open-bio.org/show_bug.cgi?id=2474 > I also found some info on this for Pg and Oracle in the mailing > list, but has anyone some approximate numbers for MySQL, how long > should a first Swissprot load take? Possibly around 20 hours according to Erik Rijkers: See http://lists.open-bio.org/pipermail/bioperl-l/2008-March/027427.html You can use the --logchunks N option to have it print out performance statistics every N records. Hope this helps, -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Tue Apr 1 22:38:12 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 1 Apr 2008 22:38:12 -0400 Subject: [Bioperl-l] Very basic implementation of GenBank XML SeqIO module In-Reply-To: <47F13C2C.4070909@umdnj.edu> References: <47F13C2C.4070909@umdnj.edu> Message-ID: Ryan - do you not have a committer account? I do agree with Chris on the test. Modules w/o tests tend to become 'pseudogenized.' -hilmar On Mar 31, 2008, at 3:31 PM, Ryan Golhar wrote: > I have a (very) basic SAX implementation of a SeqIO module to parse > GenBank XML records. Right now, it only reads in basic information > regarding the sequence and the sequence itself. > > It does not yet parse the features table. Should I submit it to be > included in bioperl or wait until I implement more for the features > table? 
I'm not sure when I'll get around to it though > > Ryan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cain.cshl at gmail.com Tue Apr 1 23:12:04 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Tue, 01 Apr 2008 23:12:04 -0400 Subject: [Bioperl-l] quick update on bioperl nightly builds In-Reply-To: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> Message-ID: <1207105924.6184.4.camel@frissell> Hi Chris, The tarball is currently (Apr 1) being built in a tmp directory, so that the extracted tarball is ./tmp/bioperl-live/. Is that intended? Thanks, Scott On Tue, 2008-04-01 at 19:45 -0500, Chris Fields wrote: > I'm simplifying the nightly build archive names (removing svn revision > # and date) in case anyone needs to update bioperl-live/run/db/network > on a regular basis (read: GBrowse installations). When I have time > I'll start working on automated builds, which will require some extra > work with Module::Build and Build.PL. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From cjfields at uiuc.edu Tue Apr 1 23:59:30 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 1 Apr 2008 22:59:30 -0500 Subject: [Bioperl-l] quick update on bioperl nightly builds In-Reply-To: <1207105924.6184.4.camel@frissell> References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> <1207105924.6184.4.camel@frissell> Message-ID: Nope, that isn't intended. I fixed it and reran it manually, so it should be fine now (note I didn't update the log file; the next cron run will catch that). I may toy around with your recent passthrough flag addition to try getting automated PPM's up and running. chris On Apr 1, 2008, at 10:12 PM, Scott Cain wrote: > Hi Chris, > > The tarball is currently (Apr 1) being built in a tmp directory, so > that > the extracted tarball is ./tmp/bioperl-live/. Is that intended? > > Thanks, > Scott > > On Tue, 2008-04-01 at 19:45 -0500, Chris Fields wrote: >> I'm simplifying the nightly build archive names (removing svn >> revision >> # and date) in case anyone needs to update bioperl-live/run/db/ >> network >> on a regular basis (read: GBrowse installations). When I have time >> I'll start working on automated builds, which will require some extra >> work with Module::Build and Build.PL. >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
cain at cshl.edu > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sdavis2 at mail.nih.gov Wed Apr 2 07:33:38 2008 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 2 Apr 2008 07:33:38 -0400 Subject: [Bioperl-l] How to make a network graphic using my genes in pathways? In-Reply-To: References: Message-ID: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> On Tue, Apr 1, 2008 at 10:14 PM, Jinyan Huang wrote: > I have 20 pathways. My interesting genes are in these pathways. There > are some genes overlaps in these pathways. How can I make a graphic > network using these genes? It means connecting these pathways through > these overlap genes. What kind of software can I use? R/Bioconductor has tools for working with graphs and pathways. Cytoscape is another open-source graphical solution. Ingenuity is, of course, not free. If you are looking at a perl solution, you can look at the various graph modules and their integration with the Graphviz libraries. SEan From cain.cshl at gmail.com Wed Apr 2 08:28:22 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Wed, 02 Apr 2008 08:28:22 -0400 Subject: [Bioperl-l] [Gmod-gbrowse] quick update on bioperl nightly builds In-Reply-To: References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> <1207105924.6184.4.camel@frissell> Message-ID: <1207139302.6507.7.camel@frissell> Hi Chris, (trimmed out gbrowse mailing list since this is just bioperl business) Speaking of the pass through stuff, Sendu mentioned that I stomped on some changes to Build.PL that you and he did when I committed that change, so it should be rolled back. Is there a good (svn) way to do that? Or should I just copy the contents of the old (good) Build.PL into a fresh file in my checkout and commit it? 
Thanks, Scott On Tue, 2008-04-01 at 22:59 -0500, Chris Fields wrote: > Nope, that isn't intended. I fixed it and reran it manually, so it > should be fine now (note I didn't update the log file; the next cron > run will catch that). > > I may toy around with your recent passthrough flag addition to try > getting automated PPM's up and running. > > chris > > On Apr 1, 2008, at 10:12 PM, Scott Cain wrote: > > > Hi Chris, > > > > The tarball is currently (Apr 1) being built in a tmp directory, so > > that > > the extracted tarball is ./tmp/bioperl-live/. Is that intended? > > > > Thanks, > > Scott > > > > On Tue, 2008-04-01 at 19:45 -0500, Chris Fields wrote: > >> I'm simplifying the nightly build archive names (removing svn > >> revision > >> # and date) in case anyone needs to update bioperl-live/run/db/ > >> network > >> on a regular basis (read: GBrowse installations). When I have time > >> I'll start working on automated builds, which will require some extra > >> work with Module::Build and Build.PL. > >> > >> chris > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. cain at cshl.edu > > GMOD Coordinator (http://www.gmod.org/) > > 216-392-3087 > > Cold Spring Harbor Laboratory > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From robert.citek at gmail.com Wed Apr 2 08:24:06 2008 From: robert.citek at gmail.com (Robert Citek) Date: Wed, 2 Apr 2008 07:24:06 -0500 Subject: [Bioperl-l] module for pubchem queries Message-ID: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> Hello all, I have a list of chemical compounds that have some kind of interaction with proteins or genes. The current list contains names or SMILES and I would like to get the CID number for those compounds. Currently, I'm using perl to query the NCBI's eutils[1], which works great. But I was just curious to know if there was a bioperl module to do something similar. A quick google didn't turn up anything, so I thought I'd ask. [1] http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html Regards, - Robert From David.Messina at sbc.su.se Wed Apr 2 08:41:45 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 2 Apr 2008 14:41:45 +0200 Subject: [Bioperl-l] How to make a network graphic using my genes in pathways? In-Reply-To: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> Message-ID: <628aabb70804020541v6cee4584ibd9935290ae7cc0a@mail.gmail.com> I have no personal experience with it, but a colleague of mine suggested VisANT.
Dave From cjfields at uiuc.edu Wed Apr 2 11:03:32 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 2 Apr 2008 10:03:32 -0500 Subject: [Bioperl-l] [Gmod-gbrowse] quick update on bioperl nightly builds In-Reply-To: <1207139302.6507.7.camel@frissell> References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> <1207105924.6184.4.camel@frissell> <1207139302.6507.7.camel@frissell> Message-ID: <3B490712-3413-4662-99D7-7B115CECB6E1@uiuc.edu> The changes I made were related to problems checking MySQL for Bio::DB::SeqFeature::Store tests when connectivity requires username/ password. For some reason it tests DB connectivity up front, while Bio::DB::GFF assumes the DB setup is correct (no direct DB check) then runs tests assuming the setup is correct. You can view the diffs for your commits here: http://code.open-bio.org/svnweb/index.cgi/bioperl/diff/bioperl-live/trunk/ModuleBuildBioperl.pm?revs=14604&revs=14548 http://code.open-bio.org/svnweb/index.cgi/bioperl/diff/bioperl-live/trunk/Build.PL?revs=14604&revs=14565 I'll try working on merging them together today; it shouldn't be too hard (the changes were fairly minor in both Build.PL and Module::Build). I'll test to make sure your changes stay in as well. Down the road I believe we need to rethink how we want the Build process to run using Module::Build as it's a bit convoluted, but it works for now. chris On Apr 2, 2008, at 7:28 AM, Scott Cain wrote: > Hi Chris, > > (trimmed out gbrowse mailing list since this is just bioperl business) > > Speaking of the pass through stuff, Sendu mentioned that I stomped on > some changes to Build.PL that you and he did when I committed that > change, so it should be rolled back. Is there a good (svn) way to do > that? Or should I just copy the contents of the old (good) Build.PL > into a fresh file in my checkout and commit it? > > Thanks, > Scott > > On Tue, 2008-04-01 at 22:59 -0500, Chris Fields wrote: >> Nope, that isn't intended. 
I fixed it and reran it manually, so it >> should be fine now (note I didn't update the log file; the next cron >> run will catch that). >> >> I may toy around with your recent passthrough flag addition to try >> getting automated PPM's up and running. >> >> chris >> >> On Apr 1, 2008, at 10:12 PM, Scott Cain wrote: >> >>> Hi Chris, >>> >>> The tarball is currently (Apr 1) being built in a tmp directory, so >>> that >>> the extracted tarball is ./tmp/bioperl-live/. Is that intended? >>> >>> Thanks, >>> Scott >>> >>> On Tue, 2008-04-01 at 19:45 -0500, Chris Fields wrote: >>>> I'm simplifying the nightly build archive names (removing svn >>>> revision >>>> # and date) in case anyone needs to update bioperl-live/run/db/ >>>> network >>>> on a regular basis (read: GBrowse installations). When I have time >>>> I'll start working on automated builds, which will require some >>>> extra >>>> work with Module::Build and Build.PL. >>>> >>>> chris >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> -- >>> ------------------------------------------------------------------------ >>> Scott Cain, Ph. D. cain at cshl.edu >>> GMOD Coordinator (http://www.gmod.org/) >>> 216-392-3087 >>> Cold Spring Harbor Laboratory >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> ------------------------------------------------------------------------- >> Check out the new SourceForge.net Marketplace. >> It's the best place to buy or sell services for >> just about anything Open Source. 
>> http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace >> _______________________________________________ >> Gmod-gbrowse mailing list >> Gmod-gbrowse at lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. cain.cshl at gmail.com > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Apr 2 11:54:05 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 2 Apr 2008 10:54:05 -0500 Subject: [Bioperl-l] [Gmod-gbrowse] quick update on bioperl nightly builds In-Reply-To: <3B490712-3413-4662-99D7-7B115CECB6E1@uiuc.edu> References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> <1207105924.6184.4.camel@frissell> <1207139302.6507.7.camel@frissell> <3B490712-3413-4662-99D7-7B115CECB6E1@uiuc.edu> Message-ID: <71375DA3-A751-4908-8000-D9ACAE39B19C@uiuc.edu> Okay, committed them. The accept passthrough still appears to work; let me know if anything pops up. chris On Apr 2, 2008, at 10:03 AM, Chris Fields wrote: > ... > I'll try working on merging them together today; it shouldn't be too > hard (the changes were fairly minor in both Build.PL and > Module::Build). I'll test to make sure your changes stay in as > well. Down the road I believe we need to rethink how we want the > Build process to run using Module::Build as it's a bit convoluted, > but it works for now. > > chris > > On Apr 2, 2008, at 7:28 AM, Scott Cain wrote: >> Hi Chris, >> >> (trimmed out gbrowse mailing list since this is just bioperl >> business) >> >> Speaking of the pass through stuff, Sendu mentioned that I stomped on >> some changes to Build.PL that you and he did when I committed that >> change, so it should be rolled back. 
Is there a good (svn) way to do >> that? Or should I just copy the contents of the old (good) Build.PL >> into a fresh file in my checkout and commit it? >> >> Thanks, >> Scott >> >> On Tue, 2008-04-01 at 22:59 -0500, Chris Fields wrote: >>> Nope, that isn't intended. I fixed it and reran it manually, so it >>> should be fine now (note I didn't update the log file; the next cron >>> run will catch that). >>> >>> I may toy around with your recent passthrough flag addition to try >>> getting automated PPM's up and running. >>> >>> chris >>> >>> On Apr 1, 2008, at 10:12 PM, Scott Cain wrote: >>> >>>> Hi Chris, >>>> >>>> The tarball is currently (Apr 1) being built in a tmp directory, so >>>> that >>>> the extracted tarball is ./tmp/bioperl-live/. Is that intended? >>>> >>>> Thanks, >>>> Scott >>>> >>>> On Tue, 2008-04-01 at 19:45 -0500, Chris Fields wrote: >>>>> I'm simplifying the nightly build archive names (removing svn >>>>> revision >>>>> # and date) in case anyone needs to update bioperl-live/run/db/ >>>>> network >>>>> on a regular basis (read: GBrowse installations). When I have >>>>> time >>>>> I'll start working on automated builds, which will require some >>>>> extra >>>>> work with Module::Build and Build.PL. >>>>> >>>>> chris >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> -- >>>> ------------------------------------------------------------------------ >>>> Scott Cain, Ph. D. cain at cshl.edu >>>> GMOD Coordinator (http://www.gmod.org/) >>>> 216-392-3087 >>>> Cold Spring Harbor Laboratory >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> ------------------------------------------------------------------------- >>> Check out the new SourceForge.net Marketplace. 
>>> It's the best place to buy or sell services for >>> just about anything Open Source. >>> http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace >>> _______________________________________________ >>> Gmod-gbrowse mailing list >>> Gmod-gbrowse at lists.sourceforge.net >>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. cain.cshl at gmail.com >> GMOD Coordinator (http://www.gmod.org/) >> 216-392-3087 >> Cold Spring Harbor Laboratory >> >> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From zhpan99 at yahoo.com Wed Apr 2 13:52:46 2008 From: zhpan99 at yahoo.com (Pan Zheng) Date: Wed, 2 Apr 2008 10:52:46 -0700 (PDT) Subject: [Bioperl-l] installing bioperl-1.5.2 errors:DB_File Message-ID: <726978.82400.qm@web53105.mail.re2.yahoo.com> Hi, I am installing bioperl-1.5.2_102 under cygwin on my Windows XP and having some errors during the process. When I was running "perl Build test", one major error is the error about DB_File. I tried to install DB_File from cpan and rpm without any luck. ++++++++++++++++++++++++ CPAN: File::Temp loaded ok (v0.16) CPAN: YAML loaded ok (v0.62) CPAN.pm: Going to build P/PM/PMQS/DB_File-1.817.tar.gz Parsing config.in... Looks Good. Checking if your kit is complete... 
Looks good Note (probably harmless): No library found for -ldb Writing Makefile for DB_File cp DB_File.pm blib/lib/DB_File.pm AutoSplitting blib/lib/DB_File.pm (blib/lib/auto/DB_File) gcc -c -I/usr/local/BerkeleyDB/include -DPERL_USE_SAFE_PUTENV -fno-strict-aliasing -pipe -Wdeclaration-after-statement -DUSEIMPORTLIB -O3 -DVERSION=\"1.817\" -DXS_VERSION=\"1.817\" "-I/usr/lib/perl5/5.8/cygwin/CORE" -D_NOT_CORE -DmDB_Prefix_t=size_t -DmDB_Hash_t=u_int32_t version.c version.c:30:16: db.h: No such file or directory make: *** [version.o] Error 1 PMQS/DB_File-1.817.tar.gz /usr/bin/make -- NOT OK Running make test Can't test without successful make Running make install Make had returned bad status, install seems impossible Failed during this command: PMQS/DB_File-1.817.tar.gz : make NO +++++++++++++++++++++++++++++++++++++++++++++++ I don't remember having this kind of error while installing earlier versions. Would you please help me with the DB_File installation? Thanks. Pan From dr.hogart at gmail.com Thu Apr 3 09:01:03 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Thu, 03 Apr 2008 17:01:03 +0400 Subject: [Bioperl-l] support of clustalw2 in bio::run::tool::alignment Message-ID: As far as I understand, clustalw2 is not supported in bioperl v1.5.2.100. In which version will it be supported? Thank you in advance. From slduncan at iastate.edu Thu Apr 3 14:13:16 2008 From: slduncan at iastate.edu (slduncan at iastate.edu) Date: Thu, 3 Apr 2008 13:13:16 -0500 (CDT) Subject: [Bioperl-l] help installing bioperl with cygwin Message-ID: <161313331084931@webmail.iastate.edu> I am trying to use cpan to install bioperl and I had an error message saying: c:\Documents not recognized as an external or internal.... Any ideas? Also, I am new to the computer world so please be kind.
:) Stacy Duncan Iowa State University Bioinformatics and Computational Biology 1802 University Blvd. VMRI Building 6 Ames, IA 50011-1240 office phone: (515) 294-8385 office fax: (515) 294-1401 home phone: (336) 965-5622 e-mail: slduncan at iastate.edu From cjfields at uiuc.edu Fri Apr 4 16:13:23 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 4 Apr 2008 15:13:23 -0500 Subject: [Bioperl-l] help installing bioperl with cygwin In-Reply-To: <161313331084931@webmail.iastate.edu> References: <161313331084931@webmail.iastate.edu> Message-ID: It's best if you use ActiveState's Perl installation (it's the only one we really support at this moment, unless someone wants to give StrawberryPerl a run). See: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows chris On Apr 3, 2008, at 1:13 PM, slduncan at iastate.edu wrote: > I am trying to use cpan to install bioperl and I had an error > message saying: > c:\Documents not recognized as and external or internal.... > Any ideas here. Also, I am new to the computer world so please be > kind. :) > > Stacy Duncan > Iowa State University > Bioinformatics and Computational Biology > 1802 University Blvd. > VMRI Building 6 > Ames, IA 50011-1240 > office phone: (515) 294-8385 > office fax: (515) 294-1401 > home phone: (336) 965-5622 > e-mail: slduncan at iastate.edu > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Apr 4 16:07:12 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 4 Apr 2008 15:07:12 -0500 Subject: [Bioperl-l] installing bioperl-1.5.2 errors:DB_File In-Reply-To: <726978.82400.qm@web53105.mail.re2.yahoo.com> References: <726978.82400.qm@web53105.mail.re2.yahoo.com> Message-ID: I think you have to use the cygwin installer to install DB_File (it also installs dependencies, such as BDB). According to 'perldoc perlcygwin': .... Optional Libraries for Perl on Cygwin Several Perl functions and modules depend on the existence of some optional libraries. Configure will find them if they are installed in one of the directories listed as being used for library searches. Pre- built packages for most of these are available from the Cygwin installer. .... chris On Apr 2, 2008, at 12:52 PM, Pan Zheng wrote: > Hi, > > I am installing bioperl-1.5.2_102 under cygwin on my Windows XP and > having some errors during the process. > > When I was running "perl Build test", one major error is the error > about DB_File. I tried to install DB_File from cpan and rpm without > any luck. > > ++++++++++++++++++++++++ > CPAN: File::Temp loaded ok (v0.16) > CPAN: YAML loaded ok (v0.62) > CPAN.pm: Going to build P/PM/PMQS/DB_File-1.817.tar.gz > Parsing config.in... > Looks Good. > Checking if your kit is complete... 
> Looks good > Note (probably harmless): No library found for -ldb > Writing Makefile for DB_File > cp DB_File.pm blib/lib/DB_File.pm > AutoSplitting blib/lib/DB_File.pm (blib/lib/auto/DB_File) > gcc -c -I/usr/local/BerkeleyDB/include -DPERL_USE_SAFE_PUTENV -fno- > strict-alias > ing -pipe -Wdeclaration-after-statement -DUSEIMPORTLIB -O3 - > DVERSION=\"1.817\" > -DXS_VERSION=\"1.817\" "-I/usr/lib/perl5/5.8/cygwin/CORE" - > D_NOT_CORE -DmDB_ > Prefix_t=size_t -DmDB_Hash_t=u_int32_t version.c > version.c:30:16: db.h: No such file or directory > make: *** [version.o] Error 1 > PMQS/DB_File-1.817.tar.gz > /usr/bin/make -- NOT OK > Running make test > Can't test without successful make > Running make install > Make had returned bad status, install seems impossible > Failed during this command: > PMQS/DB_File-1.817.tar.gz : make NO > +++++++++++++++++++++++++++++++++++++++++++++++ > > > I can't remember I had this kind error while installing earlier > version. > > Would you please help me on DB_File installation ? > > Thanks. > > Pan > > > --------------------------------- > You rock. That's why Blockbuster's offering you one month of > Blockbuster Total Access, No Cost. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Apr 4 17:25:41 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 4 Apr 2008 16:25:41 -0500 Subject: [Bioperl-l] module for pubchem queries In-Reply-To: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> References: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> Message-ID: <15B44EC6-3660-4925-BA7A-6763D51E6837@uiuc.edu> Do you need something to access eutils via BioPerl, or are you looking for a specific set of classes? 
I wrote an interface to eutils (Bio::DB::EUtilities); you could do something like this:

#!/usr/bin/perl -w
use strict;
use warnings;
use Bio::DB::EUtilities;

# run an esearch against the PubChem Substance database for one term
my $eutil = Bio::DB::EUtilities->new(-eutil  => 'esearch',
                                     -term   => 'dihydroorotate',
                                     -db     => 'pcsubstance',
                                     -retmax => 1000);

# print the matching IDs as a comma-separated list
print join(',',$eutil->get_ids)."\n";

chris
On Apr 2, 2008, at 7:24 AM, Robert Citek wrote: > Hello all, > > I have a list of chemical compounds that have some kind of interaction > with proteins or genes. The current list contains names or SMILES and > I would like to get the CID number for those compounds. Currently, > I'm using perl to query the NCBI's eutils[1], which works great. But > I was just curious to know if there was a bioperl module to do > something similar. A quick google didn't turn up anything, so I > thought I'd ask. > > [1] http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html > > Regards, > - Robert > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From ekeen at mail.tongji.edu.cn Mon Apr 7 02:57:04 2008 From: ekeen at mail.tongji.edu.cn (Jinyan Huang) Date: Mon, 7 Apr 2008 14:57:04 +0800 Subject: [Bioperl-l] How to analysis the relationship of my interesting KEGG pathways? Message-ID: In my research, I found 25 pathways of interest. I want to know the regulatory relationships among these pathways. It would be helpful if there were some software to connect these KEGG pathways. Thank you very much in advance.
From miguel.pignatelli at uv.es Mon Apr 7 06:12:58 2008 From: miguel.pignatelli at uv.es (Miguel Pignatelli) Date: Mon, 07 Apr 2008 12:12:58 +0200 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> Message-ID: <47F9F3AA.2090003@uv.es> Hi all, Is there any way to obtain the date of creation of individual GenBank entries? I don't mean the "last revision" date that can be found in the first line of a GenBank file. I can access this creation date by looking at the "revision history" of any GenBank entry (for example, see http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105), but I need a systematic (and local=fast) way to access this information. Any help would be very appreciated, Thank you very much in advance, M; From Bank.Beszteri at awi.de Mon Apr 7 07:46:43 2008 From: Bank.Beszteri at awi.de (=?ISO-8859-1?Q?B=E1nk_Beszteri?=) Date: Mon, 07 Apr 2008 13:46:43 +0200 Subject: [Bioperl-l] Indexing large databases / BioSQL In-Reply-To: References: <47F22B35.1030502@awi.de> Message-ID: <47FA09A3.2070004@awi.de> Hi Hilmar, it was important to understand that the inconsistency in taxon names is apparently only between the Swissprot entries with "non-standard" names and the contents of the taxonomy tables and that it is best to use a pre-loaded taxonomy, thanks for that! We have now updated to bioperl-live (and bp-db-live, too) and load_seqdatabase.pl seems to have loaded everything OK in ~26 hours (with many of the "The supplied lineage does not start near..." warnings, but no other problems). Our next test is to try to load trembl (will try to do this in parallel in multiple chunks), hope it will work just as nicely! Thanks for your tips & insights! Bank Hilmar Lapp wrote: > > On Apr 1, 2008, at 8:31 AM, B?nk Beszteri wrote: > >> [...] 
So next we started to test BioSQL, by trying to load just >> Swissprot in a MySQL DB first, like: >> >> load_seqdatabase.pl --host mysql.awi.de --dbname biosql2 --dbuser >> xyz --dbpass abc --driver mysql --namespace uniprot_sprot --format >> swiss uniprot_sprot.dat >> >> Here we get an error message >> >> ########################################### >> >> Loading /biodb/spinkern/uniprot_sprot.dat ... >> Could not store Q6DAH5: >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: The supplied lineage does not start near 'Erwinia carotovora >> subsp. atroseptica' (I was supplied 'Erwinia carotovora subsp. | >> Pectobacterium | Enterobacteriaceae | Enterobacteriales | >> Gammaproteobacteria | Proteobacteria | Bacteria') >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /biodb/spinkern/bioperl-1.5/ >> bioperl-1.5.2_102/Bio/Root/Root.pm:359 >> STACK: Bio::Species::classification /biodb/spinkern/bioperl-1.5/ >> bioperl-1.5.2_102/Bio/Species.pm:174 >> STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /biodb/ >> spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm: 552 >> STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /biodb/ >> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SpeciesAdaptor.pm:281 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object / >> biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:1305 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key >> /biodb/ spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:973 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / >> biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:852 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/ >> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:182 >> STACK: Bio::DB::Persistent::PersistentObject::create /biodb/ >> 
spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm: 244 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/ >> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:169 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /biodb/ >> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:251 >> STACK: Bio::DB::Persistent::PersistentObject::store /biodb/spinkern/ >> bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271 >> STACK: load_seqdatabase.pl:622 >> ----------------------------------------------------------- >> >> at load_seqdatabase.pl line 635 >> >> ############################################ >> >> or similar, depending on whether we use a pre-loaded ncbi taxonomy >> or not > > > I recommend to always use a pre-loaded NCBI taxonomy unless you know > there are only a few organisms that are straightforward (for the > parser, that is). > >> , and which Swissprot release we are trying to load. It often seems >> to come from sg. like here, subsp. or other special addition to the >> species line; but alternative genus names and other curious things >> also to appear. It looks like Species.pm tries to validate the >> species name against the lineage info already there in the BioSQL >> DB, and in several cases, it finds inconsistencies. > > > It actually happens upon a successful lookup when the species object > is populated from the database. > >> [...] >> The only workaround we have found so far was to comment out line 174 >> in Species.pm: >> >> $self->throw("The supplied lineage does not start near '$name' (I >> was supplied '".join(" | ", @vals)."')"); > > > That should be OK if you work with a pre-loaded taxonomy. It's sort > of a sanity check that should catch a parser having messed up a > species. 
If you use a pre-loaded NCBI taxonomy the results of the > species parsing don't matter in all details so long as the NCBI > taxonID is parsed out correctly, and then found in the database. > > Note that this is actually a warn() in the main trunk version of > BioPerl, so you might want to upgrade to that (or change throw() to > warn() in your version). You still get the records flagged with that, > but it isn't an exception. > >> >> After doing so, load_seqdatabase.pl runs for several hours (until it >> eventually crashes; I haven't found out yet why), but proceeds really >> slowly. > > > It should certainly *not* crash. Note also that you can supply --safe > on the command line, in which case the script will continue with the > next record if one fails to load for whatever reason. > > You will want to adjust the width constraint of dbxref.accession, for > example to 128 chars. This will also be fixed for BioSQL 1.0.1. > See http://bugzilla.open-bio.org/show_bug.cgi?id=2474 > > >> I also found some info on this for Pg and Oracle in the mailing >> list, but has anyone some approximate numbers for MySQL, how long >> should a first Swissprot load take? > > > Possibly around 20 hours according to Erik Rijkers: > See http://lists.open-bio.org/pipermail/bioperl-l/2008-March/027427.html > > You can use the --logchunks N option to have it print out performance > statistics every N records. > > Hope this helps, > > -hilmar From cjfields at uiuc.edu Mon Apr 7 08:32:45 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 7 Apr 2008 07:32:45 -0500 Subject: [Bioperl-l] Indexing large databases / BioSQL In-Reply-To: <47FA09A3.2070004@awi.de> References: <47F22B35.1030502@awi.de> <47FA09A3.2070004@awi.de> Message-ID: The warnings are something that we still need to resolve, but the only fix I can think of likely breaks backward compatibility with older bioperl-db installations (i.e. 
storing the given scientific name instead of the binomial name, which is used as a fallback when no taxid is found). There is a full explanation here: http://bugzilla.open-bio.org/show_bug.cgi?id=2092 Anyway, I think it needs further testing when someone, likely Hilmar or I, have time. chris Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign
From bix at sendu.me.uk Mon Apr 7 08:34:00 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 07 Apr 2008 13:34:00 +0100 Subject: [Bioperl-l] Indexing large databases / BioSQL In-Reply-To: <47FA09A3.2070004@awi.de> References: <47F22B35.1030502@awi.de> <47FA09A3.2070004@awi.de> Message-ID: <47FA14B8.7000500@sendu.me.uk> Bánk Beszteri wrote: > Hi Hilmar, > > it was important to understand that the inconsistency in taxon names is > apparently only between the Swissprot entries with "non-standard" names > and the contents of the taxonomy tables and that it is best to use a > pre-loaded taxonomy, thanks for that! We have now updated to > bioperl-live (and bp-db-live, too) and load_seqdatabase.pl seems to have > loaded everything OK in ~26 hours (with many of the "The supplied > lineage does not start near..." warnings, but no other problems). 
Can you provide some examples of these warnings (of the taxons that cause them)? If there's anything consistent about them perhaps Bio::Species can be improved to accommodate them properly (instead of just issuing the warning and getting the classification wrong).
From heikki at sanbi.ac.za Mon Apr 7 08:48:34 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Mon, 7 Apr 2008 14:48:34 +0200 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <47F9F3AA.2090003@uv.es> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> Message-ID: <200804071448.34769.heikki@sanbi.ac.za> Miguel, You probably know this but:
- Your entry example below is a GenPept entry, not a GenBank entry
- The NCBI sequence format "genbank" has only the last modified date. I do not know about other formats (ASN.1, ...)
- NCBI Entrez is a great tool but it obscures the source database.
- If you really are working on real GenBank entries, you can use the accession number to find the corresponding EMBL (and Swiss-Prot) flat file formats that have both creation and last modified dates.
Post to the list if you have trouble getting the dates from EMBL/Swiss-Prot formats using bioperl. Yours, -Heikki On Monday 07 April 2008 12:12:58 Miguel Pignatelli wrote: > Hi all, > > Is there any way to obtain the date of creation of individual GenBank > entries? I don't mean the "last revision" date that can be found in the > first line of a GenBank file. > > I can access this creation date by looking at the "revision history" of > any GenBank entry (for example, see > http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105), > but I need a systematic (and local=fast) way to access this information. 
> > Any help would be very appreciated, > Thank you very much in advance, > > M; > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From granjeau at tagc.univ-mrs.fr Mon Apr 7 09:30:10 2008 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/ICIM) Date: Mon, 07 Apr 2008 15:30:10 +0200 Subject: [Bioperl-l] help installing bioperl with cygwin In-Reply-To: References: <161313331084931@webmail.iastate.edu> Message-ID: <47FA21E2.3010602@tagc.univ-mrs.fr> Hi, I'm using BioPerl under Cygwin, because Cygwin allows one to work in a Unix-like environment in a command line point of view. So, I use the CVS version which runs out of the box http://www.bioperl.org/wiki/Using_CVS which has been replaced by SVN at the beginning of the year http://www.bioperl.org/wiki/Using_Subversion So if you really want to work under Cygwin, you can try this quick and dirty way, but you still have to become experienced because BioPerl is not supported under Cygwin. You may try Strawberry, but in my experience in installing wxPerl, wxPerl fails on both flavours of Perl. ActiveState's Perl is still the easiest way to install many packages. Regards, Samuel Chris Fields wrote: > It's best if you use ActiveState's Perl installation (it's the only > one we really support at this moment, unless someone wants to give > StrawberryPerl a run). 
See: > > http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows > > chris > > On Apr 3, 2008, at 1:13 PM, slduncan at iastate.edu wrote: > >> I am trying to use cpan to install bioperl and I had an error message >> saying: >> c:\Documents not recognized as and external or internal.... >> Any ideas here. Also, I am new to the computer world so please be >> kind. :) >> >> Stacy Duncan >> Iowa State University >> Bioinformatics and Computational Biology > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > -- Samuel GRANJEAUD granjeau at tagc.univ-mrs.fr INSERM - ICIM - TAGC Tel: +33 (0)491 82 87 24 http://tagc.univ-mrs.fr Fax: +33 (0)491 82 87 01 http://icim.marseille.inserm.fr/proteomique
From er at xs4all.nl Mon Apr 7 10:36:57 2008 From: er at xs4all.nl (Erik) Date: Mon, 7 Apr 2008 16:36:57 +0200 (CEST) Subject: [Bioperl-l] Indexing large databases / BioSQL Message-ID: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> On Mon, April 7, 2008 14:34, Sendu Bala wrote: > Bánk Beszteri wrote: >> Hi Hilmar, >> >> it was important to understand that the inconsistency in taxon names is >> apparently only between the Swissprot entries with "non-standard" names >> and the contents of the taxonomy tables and that it is best to use a >> pre-loaded taxonomy, thanks for that! 
We have now updated to >> bioperl-live (and bp-db-live, too) and load_seqdatabase.pl seems to have >> loaded everything OK in ~26 hours (with many of the "The supplied >> lineage does not start near..." warnings, but no other problems). > > Can you provide some examples of these warnings (of the taxons that > cause them)? If there's anything consistent about them perhaps > Bio::Species can be improved to accommodate them properly (instead of > just issuing the warning and getting the classification wrong). > I did this a little while ago and saved the output (UniProtKB/Swiss-Prot Release 55.1 of 18-Mar-2008, I think). All warnings (and a few errors) for swissprot are here: http://bugzilla.open-bio.org/show_bug.cgi?id=2474 as an attached file I suppose the OP will have encountered similar output - I don't think there is much RDBMS-type-dependency involved. regards, Erik Rijkers From cjfields at uiuc.edu Mon Apr 7 11:46:01 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 7 Apr 2008 10:46:01 -0500 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <200804071448.34769.heikki@sanbi.ac.za> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <200804071448.34769.heikki@sanbi.ac.za> Message-ID: <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> Strangely enough, if you use NCBI's esummary you can get both dates. 
Via Bio::DB::EUtilities in bioperl-live, if you dump out DocSum data (using a debugging method I added in a while back):

---------------------------------------

use Bio::DB::EUtilities;

# for multiple IDs use an array ref; also only use GI's (not accessions)
my $factory = Bio::DB::EUtilities->new(
    -eutil => 'esummary',
    -db    => 'protein',
    -id    => 1621261);

$factory->print_DocSums;

---------------------------------------

One gets the following tag/value pairs:

UID: 1621261
Caption :CAB02640
Title :PROBABLE PYRIMIDINE OPERON REGULATORY PROTEIN PYRR [Mycobacterium tuberculosis H37Rv]
Extra :gi|1621261|emb|CAB02640.1|[1621261]
Gi :1621261
CreateDate :2003/11/21
UpdateDate :2006/11/14
Flags :
TaxId :83332
Length :193
Status :live
ReplacedBy :
Comment :

I'll add in a method to grab the data element by tag (in this case, grab the creation date by asking for the 'CreateDate' key). Might come in handy for scripts.

chris

From miguel.pignatelli at uv.es Mon Apr 7 12:24:50 2008 From: miguel.pignatelli at uv.es (Miguel Pignatelli) Date: Mon, 07 Apr 2008 18:24:50 +0200 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <200804071448.34769.heikki@sanbi.ac.za> <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> Message-ID: <47FA4AD2.5030206@uv.es> I've noticed that the ASN.1 version of those records has a "creation-date" tag. 
But this is somewhat strange, because the creation date you obtained and the one obtained via the ASN.1 format are both 2003/11/21, whereas the revision history of the record: http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=CAB02640 reports a creation date of "Oct 19 1996 12:28 AM". I don't know how to get this, because the EMBL version of this gene: http://www.ebi.ac.uk/cgi-bin/dbfetch?db=emblcds&id=CAB02640&style=raw doesn't have DT fields at all. M;
From cjfields at uiuc.edu Mon Apr 7 13:48:45 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 7 Apr 2008 12:48:45 -0500 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <47FA4AD2.5030206@uv.es> 
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <200804071448.34769.heikki@sanbi.ac.za> <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> <47FA4AD2.5030206@uv.es> Message-ID: Note in the example I gave that, during the revision history, the DBSOURCE changed at the point of the creation date (the original nuc. record was a M. tuberculosis contig sequence, which later changed to an updated full M. tuberculosis genome record at the time of the 'create date'). Couldn't find anything specific in the GenBank docs on this, but it appears (at least for a protein record) the creation date reflects the date in which the sequence was either originally deposited or originally derived from the nucleotide source record present in the record. In other words, it may not reflect the original date of deposition (which could have come from a different record, as in this case). chris Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign
From Bank.Beszteri at awi.de Tue Apr 8 03:35:43 2008 From: Bank.Beszteri at awi.de (Bánk Beszteri) Date: Tue, 08 Apr 2008 09:35:43 +0200 Subject: [Bioperl-l] Indexing large databases / BioSQL In-Reply-To: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> Message-ID: <47FB204F.90405@awi.de> >>Can you provide some examples of these warnings (of the taxons that >>cause them)? 
If there's anything consistent about them perhaps >>Bio::Species can be improved to accommodate them properly (instead of >>just issuing the warning and getting the classification wrong). >> >> > >All warnings (and a few errors) for swissprot are here: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2474 > >as an attached file > >I suppose the OP will have encountered similar output - I don't think there is >much RDBMS-type-dependency involved. > > Hi Erik & Sendu, yes, the same kind of thing, probably no DBMS-type dependency; in case it could be useful, I uploaded my output as a second attachment to the bugzilla report cited above. Bank
From heikki at sanbi.ac.za Tue Apr 8 04:32:12 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Tue, 8 Apr 2008 10:32:12 +0200 Subject: [Bioperl-l] Blast database sequence retrieval perl script In-Reply-To: <6BEABCD5CA640A44A848448A42A03B73079E48C9@ilrikeadx1.ILRI.CGIARAD.ORG> References: <6BEABCD5CA640A44A848448A42A03B73079E48C9@ilrikeadx1.ILRI.CGIARAD.ORG> Message-ID: <200804081032.12312.heikki@sanbi.ac.za> Dear Nelson, I am cc:ing the bioperl mailing list, where all queries of this kind should go. More people can help you that way. Since you have your own local data set, you need to create an index that catalogues your sequences for easy retrieval. You need to install bioperl-live first. See for example: http://www.bioperl.org/wiki/Using_Subversion Then you can follow this HOWTO: http://www.bioperl.org/wiki/HOWTO:Flat_databases The other HOWTOs will help you deal with the BioPerl sequence objects that are retrieved: http://www.bioperl.org/wiki/HOWTOs. Yours, -Heikki On Monday 07 April 2008 14:50:23 Ndegwa, Nelson (IITA-Nairobi) wrote: > Dear Prof. Heikki, > > Hi. We met at the Pathogen Bioinformatics Conference held in Nairobi > Kenya in May 2007 at ICIPE. I recall you are a developer of Bioperl and > Perl. I have managed to install a local Blast, having just cowpea Contig > sequences, about 50,000 in total. 
This runs fine, as I can perform > various queries and get results. However, any good match/hit on the > local Blast database is hard to retrieve and the only option seems to go > back to that database and search manually for the top hit sequence - an > exceedingly manual task. Might you perhaps be having a Perl script I > could adopt to my database to help with this task Such that the hits > have a hyperlink which can be used to retrieve that specific entry? I > have limited knowledge of Perl. Thank you. > > With Kind Regards, > > Nelson. -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From David.Messina at sbc.su.se Tue Apr 8 07:29:12 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 8 Apr 2008 13:29:12 +0200 Subject: [Bioperl-l] How to analysis the relationship of my interesting KEGG pathways? In-Reply-To: <628aabb70804080053g1fd9120ex9d5fd12f65f216f9@mail.gmail.com> References: <628aabb70804080053g1fd9120ex9d5fd12f65f216f9@mail.gmail.com> Message-ID: <628aabb70804080429k2aa17a6eu12197709d4cc1af0@mail.gmail.com> Hi Jinyan, You asked a similar question last week and received a couple of suggestions -- did you take a look at those? I'm not an expert on this topic, but I believe that since regulatory information is much harder to obtain experimentally and therefore much less well known, there isn't a lot of it in pathway databases like KEGG. You may have to look through the literature and start trying to put together possible regulatory links on your own. 
Dave From hrh at sanger.ac.uk Tue Apr 8 08:48:32 2008 From: hrh at sanger.ac.uk (Hans Rudolf Hotz) Date: Tue, 8 Apr 2008 13:48:32 +0100 (BST) Subject: [Bioperl-l] Blast database sequence retrieval perl script In-Reply-To: <200804081032.12312.heikki@sanbi.ac.za> References: <6BEABCD5CA640A44A848448A42A03B73079E48C9@ilrikeadx1.ILRI.CGIARAD.ORG> <200804081032.12312.heikki@sanbi.ac.za> Message-ID: Nelson or simply use the BLAST indices for the sequence retrieval as well. All you need to do is adding the "-o" option to the 'formatdb' command for the BLAST index creation (this will create some extra files). Then you can use 'fastacmd' (which is also part of the NCBI BLAST package) to retrieve the sequences. Hans On Tue, 8 Apr 2008, Heikki Lehvaslaiho wrote: > > Dear Nelson, > > I am cc:ing the bioperl mailing list where all these kind of queries should > go. More people can help you that way. > > > Since you have your own local data set, you need to create an index that > catalogues you sequences for easy retrieval. > > You need to install bioperl-live first. See for example: > http://www.bioperl.org/wiki/Using_Subversion > > Then you can follow this HOWTO: > http://www.bioperl.org/wiki/HOWTO:Flat_databases > > The other HOWTOs will help you dealing with BioPerl sequence objects that are > retrieved: http://www.bioperl.org/wiki/HOWTOs. > > > Yours, > > -Heikki > > > On Monday 07 April 2008 14:50:23 Ndegwa, Nelson (IITA-Nairobi) wrote: >> Dear Prof. Heikki, >> >> Hi. We met at the Pathogen Bioinformatics Conference held in Nairobi >> Kenya in May 2007 at ICIPE. I recall you are a developer of Bioperl and >> Perl. I have managed to install a local Blast, having just cowpea Contig >> sequences, about 50,000 in total. This runs fine, as I can perform >> various queries and get results. 
However, any good match/hit on the >> local Blast database is hard to retrieve and the only option seems to go >> back to that database and search manually for the top hit sequence - an >> exceedingly manual task. Might you perhaps be having a Perl script I >> could adopt to my database to help with this task Such that the hits >> have a hyperlink which can be used to retrieve that specific entry? I >> have limited knowledge of Perl. Thank you. >> >> With Kind Regards, >> >> Nelson. > > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From robert.citek at gmail.com Tue Apr 8 10:09:27 2008 From: robert.citek at gmail.com (Robert Citek) Date: Tue, 8 Apr 2008 09:09:27 -0500 Subject: [Bioperl-l] module for pubchem queries In-Reply-To: <15B44EC6-3660-4925-BA7A-6763D51E6837@uiuc.edu> References: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> <15B44EC6-3660-4925-BA7A-6763D51E6837@uiuc.edu> Message-ID: <4145b6790804080709l20f1e56erf4b7af04b0a52870@mail.gmail.com> Wrapping bioperl around eutils will work just fine. Thanks for the pointer. 
http://search.cpan.org/~sendu/bioperl-1.5.2_102/Bio/DB/EUtilities.pm

Regards,
- Robert

On Fri, Apr 4, 2008 at 4:25 PM, Chris Fields wrote:
> Do you need something to access eutils via BioPerl, or are you looking for a
> specific set of classes? I wrote an interface to eutils
> (Bio::DB::EUtilities), you could do something like this:
>
> #!/usr/bin/perl -w
>
> use strict;
> use warnings;
> use Bio::DB::EUtilities;
>
> my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch',
>                                      -term => 'dihydroorotate',
>                                      -db => 'pcsubstance',
>                                      -retmax => 1000);
>
> print join(',',$eutil->get_ids)."\n";
>
> chris

From cjfields at uiuc.edu Tue Apr 8 11:10:26 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 8 Apr 2008 10:10:26 -0500
Subject: [Bioperl-l] module for pubchem queries
In-Reply-To: <4145b6790804080709l20f1e56erf4b7af04b0a52870@mail.gmail.com>
References: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> <15B44EC6-3660-4925-BA7A-6763D51E6837@uiuc.edu> <4145b6790804080709l20f1e56erf4b7af04b0a52870@mail.gmail.com>
Message-ID: <32D210FC-575E-4D95-95DA-FC6F5BE1FC24@uiuc.edu>

Just to note, the API has changed significantly from the interface in the 1.5.2 release. The up-to-date (supported) interface is in subversion; there are some example recipes here:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

I'm working on a full HOWTO, just haven't had time to get it up on the wiki yet.

chris

On Apr 8, 2008, at 9:09 AM, Robert Citek wrote:

> Wrapping bioperl around eutils will work just fine. Thanks for the
> pointer.
>
> http://search.cpan.org/~sendu/bioperl-1.5.2_102/Bio/DB/EUtilities.pm
>
> Regards,
> - Robert
>
> On Fri, Apr 4, 2008 at 4:25 PM, Chris Fields
> wrote:
>> Do you need something to access eutils via BioPerl, or are you
>> looking for a
>> specific set of classes?
I wrote an interface to eutils >> (Bio::DB::EUtilities), you could do something like this: >> >> #!/usr/bin/perl -w >> >> use strict; >> use warnings; >> use Bio::DB::EUtilities; >> >> my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch', >> -term => 'dihydroorotate', >> -db => 'pcsubstance', >> -retmax => 1000); >> >> print join(',',$eutil->get_ids)."\n"; >> >> chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cuiw at ncbi.nlm.nih.gov Tue Apr 8 16:41:58 2008 From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C]) Date: Tue, 8 Apr 2008 16:41:58 -0400 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <47F9F3AA.2090003@uv.es> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> Message-ID: <6F230E9769AA8D4EB4BC401DF133EDB7180BE0@NIHCESMLBX15.nih.gov> Hi, Miguel: id1_fetch can do it. Detailed instruction can be found at: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=toolkit.section.ch_demo.id 1_fetch.html Here is an example: >id1_fetch -lt revisions -flat '12:74311105' -fmt fasta GI Loaded DB Retrieval No. -- ------ -- ------------- 74311105 12/07/2007 NCBI 19766263 74311105 01/23/2007 NCBI 16325656 74311105 03/30/2006 NCBI 13131204 74311105 03/03/2006 NCBI 12915541 74311105 03/02/2006 NCBI 12885275 74311105 12/03/2005 NCBI 12259793 74311105 09/09/2005 NCBI 11257262 74311105 09/09/2005 NCBI 11242667 Wenwu Cui PhD NCBI/NLM/NIH > -----Original Message----- > From: Miguel Pignatelli [mailto:miguel.pignatelli at uv.es] > Sent: Monday, April 07, 2008 6:13 AM > Cc: bioperl-l at bioperl.org > Subject: [Bioperl-l] GenBank entries creation dates > > Hi all, > > Is there any way to obtain the date of creation of individual GenBank > entries? 
I don't mean the "last revision" date that can be found in the > first line of a GenBank file. > > I can access this creation date by looking at the "revision history" of > any GenBank entry (for example, see > http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105), > but I need a systematic (and local=fast) way to access this > information. > > Any help would be very appreciated, > Thank you very much in advance, > > M; > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From miguel.pignatelli at uv.es Wed Apr 9 07:32:39 2008 From: miguel.pignatelli at uv.es (Miguel Pignatelli) Date: Wed, 09 Apr 2008 13:32:39 +0200 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <6F230E9769AA8D4EB4BC401DF133EDB7180BE0@NIHCESMLBX15.nih.gov> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <6F230E9769AA8D4EB4BC401DF133EDB7180BE0@NIHCESMLBX15.nih.gov> Message-ID: <47FCA957.5040409@uv.es> Wow, impressive, thanks Wenwu for the information, I have never used this tool before. The problem is that I need to know all the revision history (or at least the creation date) for *all* the GIs present in nr (well, or at least a significant portion of it) and this tool queries via web. The existence of this tool confirms me that this information is available somewhere, is it possible to download the data that contains this information? Thanks again, M; Cui, Wenwu (NIH/NLM/NCBI) [C] wrote: > Hi, Miguel: > > id1_fetch can do it. Detailed instruction can be found at: > > http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=toolkit.section.ch_demo.id > 1_fetch.html > > Here is an example: > >> id1_fetch -lt revisions -flat '12:74311105' -fmt fasta > GI Loaded DB Retrieval No. 
> -- ------ -- ------------- > 74311105 12/07/2007 NCBI 19766263 > 74311105 01/23/2007 NCBI 16325656 > 74311105 03/30/2006 NCBI 13131204 > 74311105 03/03/2006 NCBI 12915541 > 74311105 03/02/2006 NCBI 12885275 > 74311105 12/03/2005 NCBI 12259793 > 74311105 09/09/2005 NCBI 11257262 > 74311105 09/09/2005 NCBI 11242667 > > Wenwu Cui PhD > NCBI/NLM/NIH > >> -----Original Message----- >> From: Miguel Pignatelli [mailto:miguel.pignatelli at uv.es] >> Sent: Monday, April 07, 2008 6:13 AM >> Cc: bioperl-l at bioperl.org >> Subject: [Bioperl-l] GenBank entries creation dates >> >> Hi all, >> >> Is there any way to obtain the date of creation of individual GenBank >> entries? I don't mean the "last revision" date that can be found in > the >> first line of a GenBank file. >> >> I can access this creation date by looking at the "revision history" > of >> any GenBank entry (for example, see >> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105), >> but I need a systematic (and local=fast) way to access this >> information. >> >> Any help would be very appreciated, >> Thank you very much in advance, >> >> M; >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cuiw at ncbi.nlm.nih.gov Wed Apr 9 09:25:16 2008 From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C]) Date: Wed, 9 Apr 2008 09:25:16 -0400 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <47FCA957.5040409@uv.es> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <6F230E9769AA8D4EB4BC401DF133EDB7180BE0@NIHCESMLBX15.nih.gov> <47FCA957.5040409@uv.es> Message-ID: <6F230E9769AA8D4EB4BC401DF133EDB7180BE1@NIHCESMLBX15.nih.gov> Hi, Miguel, I do not know whether the data file is publically available. 
However, you can perform 'real time' query via id1_fetch: ####step 1: generate GI file ##### id1_fetch -query 'YOUR-GENBANK-QUERY-STRING' -lt none -db Nucleotide -out qfile ####step 2: retrieve revisions for GIs stored in qfile ##### id1_fetch -lt revisions -qf qfile -fmt fasta -db Nucleotide Good luck! Wenwu Cui > -----Original Message----- > From: Miguel Pignatelli [mailto:miguel.pignatelli at uv.es] > Sent: Wednesday, April 09, 2008 7:33 AM > To: Cui, Wenwu (NIH/NLM/NCBI) [C] > Cc: bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] GenBank entries creation dates > > Wow, impressive, thanks Wenwu for the information, I have never used > this tool before. The problem is that I need to know all the revision > history (or at least the creation date) for *all* the GIs present in nr > (well, or at least a significant portion of it) and this tool queries > via web. > > The existence of this tool confirms me that this information is > available somewhere, is it possible to download the data that contains > this information? > > Thanks again, > > M; > > > Cui, Wenwu (NIH/NLM/NCBI) [C] wrote: > > Hi, Miguel: > > > > id1_fetch can do it. Detailed instruction can be found at: > > > > > http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=toolkit.section.ch_demo.i > d > > 1_fetch.html > > > > Here is an example: > > > >> id1_fetch -lt revisions -flat '12:74311105' -fmt fasta > > GI Loaded DB Retrieval No. 
> > -- ------ -- ------------- > > 74311105 12/07/2007 NCBI 19766263 > > 74311105 01/23/2007 NCBI 16325656 > > 74311105 03/30/2006 NCBI 13131204 > > 74311105 03/03/2006 NCBI 12915541 > > 74311105 03/02/2006 NCBI 12885275 > > 74311105 12/03/2005 NCBI 12259793 > > 74311105 09/09/2005 NCBI 11257262 > > 74311105 09/09/2005 NCBI 11242667 > > > > Wenwu Cui PhD > > NCBI/NLM/NIH > > > >> -----Original Message----- > >> From: Miguel Pignatelli [mailto:miguel.pignatelli at uv.es] > >> Sent: Monday, April 07, 2008 6:13 AM > >> Cc: bioperl-l at bioperl.org > >> Subject: [Bioperl-l] GenBank entries creation dates > >> > >> Hi all, > >> > >> Is there any way to obtain the date of creation of individual > GenBank > >> entries? I don't mean the "last revision" date that can be found in > > the > >> first line of a GenBank file. > >> > >> I can access this creation date by looking at the "revision history" > > of > >> any GenBank entry (for example, see > >> > http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105), > >> but I need a systematic (and local=fast) way to access this > >> information. > >> > >> Any help would be very appreciated, > >> Thank you very much in advance, > >> > >> M; > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > From CALLEY_JOHN_N at LILLY.COM Wed Apr 9 09:45:23 2008 From: CALLEY_JOHN_N at LILLY.COM (John N Calley) Date: Wed, 9 Apr 2008 09:45:23 -0400 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <47FCA957.5040409@uv.es> Message-ID: You might want to keep in mind that the creation date is not always reliable. I am aware of one example where the recorded creation date precedes the sequencing date by several months (as determined by the trace file date). NCBI was not able to explain exactly what happened but (as I recall) hypothesized that some dates had been scrambled in a database rebuild. 
If there was interest I could probably pull up more details. John Calley Miguel Pignatelli Sent by: bioperl-l-bounces at lists.open-bio.org 04/09/2008 07:32 AM Please respond to miguel.pignatelli at uv.es To "Cui, Wenwu (NIH/NLM/NCBI) [C]" cc bioperl-l at bioperl.org Subject Re: [Bioperl-l] GenBank entries creation dates Wow, impressive, thanks Wenwu for the information, I have never used this tool before. The problem is that I need to know all the revision history (or at least the creation date) for *all* the GIs present in nr (well, or at least a significant portion of it) and this tool queries via web. The existence of this tool confirms me that this information is available somewhere, is it possible to download the data that contains this information? Thanks again, M; Cui, Wenwu (NIH/NLM/NCBI) [C] wrote: > Hi, Miguel: > > id1_fetch can do it. Detailed instruction can be found at: > > http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=toolkit.section.ch_demo.id > 1_fetch.html > > Here is an example: > >> id1_fetch -lt revisions -flat '12:74311105' -fmt fasta > GI Loaded DB Retrieval No. > -- ------ -- ------------- > 74311105 12/07/2007 NCBI 19766263 > 74311105 01/23/2007 NCBI 16325656 > 74311105 03/30/2006 NCBI 13131204 > 74311105 03/03/2006 NCBI 12915541 > 74311105 03/02/2006 NCBI 12885275 > 74311105 12/03/2005 NCBI 12259793 > 74311105 09/09/2005 NCBI 11257262 > 74311105 09/09/2005 NCBI 11242667 > > Wenwu Cui PhD > NCBI/NLM/NIH > >> -----Original Message----- >> From: Miguel Pignatelli [mailto:miguel.pignatelli at uv.es] >> Sent: Monday, April 07, 2008 6:13 AM >> Cc: bioperl-l at bioperl.org >> Subject: [Bioperl-l] GenBank entries creation dates >> >> Hi all, >> >> Is there any way to obtain the date of creation of individual GenBank >> entries? I don't mean the "last revision" date that can be found in > the >> first line of a GenBank file. 
>> >> I can access this creation date by looking at the "revision history" > of >> any GenBank entry (for example, see >> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105), >> but I need a systematic (and local=fast) way to access this >> information. >> >> Any help would be very appreciated, >> Thank you very much in advance, >> >> M; >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From frederic.romagne at gmail.com Wed Apr 9 16:45:50 2008 From: frederic.romagne at gmail.com (=?ISO-8859-1?Q?Fr=E9d=E9ric_Romagn=E9?=) Date: Wed, 09 Apr 2008 15:45:50 -0500 Subject: [Bioperl-l] question about clustalw module. Message-ID: <1207773950.483.13.camel@kiss-laptop> Hello, i have a problem when using Bio::Tools::Run::Alignment::Clustalw : I give it an array_ref scalar (the array contains some fasta sequences) and all the good parameters and i write the result via Bio::SeqIO. The fact is that my result file only contains the Accession number in the header... An example : the initial stream is : >NM_052854 Homo sapiens cAMP responsive element binding protein 3-like 1 (CREB3L1), mRNA. AGAAGACGTGCGGAGGGAGACGCAGAGACAGAGGAGAGGCCGGCAGCCACCCAGTCTCGG GGGAGCACTTAGCTCCCCCGCCCCGGCTCCCACCCTGTCCGGGGGGCTCCTGAAGCCCTC AGCCCCAACCCCGGGCTCCCCATGGAAGCCAGCTGTGCCCCAGGAGGAGCAGGAGGAGGT GGAGTCGGCTGAATGCCCACGGTGCGCCCGGGGCCCCTGAGCCCATCCCGCTCCTAGCCG CTGCCCTAAGGCCCCCGCGCGCCCCGCGCCCCCCACCCGGGGCCGCGCCGCCTCCGTCCG CCCCTCCCCCGGGGCTTCGCCCCGGACCTGCCCCCCGCCCGTTTGCCAGCGCTCAGGCAG GAGCTCTGGACTGGGCGCGCCGCCGCCCTGGAGTGAGGGAAGCCCAGTGGAAGGGGGTCC CGGGAGCCGGCTGCGATGGACGCCGTCTTGGAACCCTTCCCGGCCGACAGGCTGTTCCCC GGATCCAGCTTCCTGGACTTGGGGGATCTGAACGAGTCGGACTTCCTCAACAATGCGCAC ... 
the result file is :

>NM_052854
---------------------------------------AGAAGACGTGCGGAGGGAGAC
GCAGAGACAGAGGAGAGGCCGGCAGCCACCCAGTCTCGGGGGAGCACTTAGCTCCCCCGC
CCCGGCTCCCACCCTGTCCGGGGGGCTCCTGAAGCCCTCAGCCCCAACCCCGGGCTCCCC
ATGGAAGCCAGCTGTGCCCCAGGAGGAGCAGGAGGAGGTGGAGTCGGCTGAATGCCCACG
GTGCGCCCGGGGCCCCTGAGCCCATCCCGCTCCTAGCCGCTGCCCTAAGGCCCCCGCGCG
CCCCGCGCCCCCCACCCGGGGCCGCGCCGCCTCCGTCCGCCCCTCCCCCGGGGCTTCGCC
CCGGACCTGCCCCCCGCCCGTTTGCCAGCGCTCAGGCAGGAGCTCTGGACTGGGCGCGCC
GCCGCCCTGGAGTGAGGGAAGCCCAGTGGAAGGGGGTCCCGGGAGCCGGCTGCGATGGAC

...

So i lost the other informations provided by the header...

Is there any option to keep these informations?

Here is a part of my code with my options :

my $seq_ref=\@seq;
my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM', 'quiet' => 1, 'output' => 'FASTA');
my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
my $aln = $factory->align($seq_ref);

Thank you.

From jason at bioperl.org Wed Apr 9 16:55:13 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 9 Apr 2008 13:55:13 -0700
Subject: [Bioperl-l] question about clustalw module.
In-Reply-To: <1207773950.483.13.camel@kiss-laptop>
References: <1207773950.483.13.camel@kiss-laptop>
Message-ID:

the clustal alignment format does not allow for the description - if you want to preserve it you'll have to add it back, make a hash indexed by sequence ID and store the description, then when you get your alignment back you can update the description field before writing it out with AlignIO.

-jason

On Apr 9, 2008, at 1:45 PM, Frédéric Romagné wrote:

> Hello,
>
> i have a problem when using Bio::Tools::Run::Alignment::Clustalw :
>
> I give it an array_ref scalar (the array contains some fasta
> sequences)
> and all the good parameters and i write the result via Bio::SeqIO.
>
> The fact is that my result file only contains the Accession number in
> the header...
An example : > > the initial stream is : > >> NM_052854 Homo sapiens cAMP responsive element binding protein 3- >> like 1 > (CREB3L1), mRNA. > AGAAGACGTGCGGAGGGAGACGCAGAGACAGAGGAGAGGCCGGCAGCCACCCAGTCTCGG > GGGAGCACTTAGCTCCCCCGCCCCGGCTCCCACCCTGTCCGGGGGGCTCCTGAAGCCCTC > AGCCCCAACCCCGGGCTCCCCATGGAAGCCAGCTGTGCCCCAGGAGGAGCAGGAGGAGGT > GGAGTCGGCTGAATGCCCACGGTGCGCCCGGGGCCCCTGAGCCCATCCCGCTCCTAGCCG > CTGCCCTAAGGCCCCCGCGCGCCCCGCGCCCCCCACCCGGGGCCGCGCCGCCTCCGTCCG > CCCCTCCCCCGGGGCTTCGCCCCGGACCTGCCCCCCGCCCGTTTGCCAGCGCTCAGGCAG > GAGCTCTGGACTGGGCGCGCCGCCGCCCTGGAGTGAGGGAAGCCCAGTGGAAGGGGGTCC > CGGGAGCCGGCTGCGATGGACGCCGTCTTGGAACCCTTCCCGGCCGACAGGCTGTTCCCC > GGATCCAGCTTCCTGGACTTGGGGGATCTGAACGAGTCGGACTTCCTCAACAATGCGCAC > > ... > > the result file is : > >> NM_052854 > ---------------------------------------AGAAGACGTGCGGAGGGAGAC > GCAGAGACAGAGGAGAGGCCGGCAGCCACCCAGTCTCGGGGGAGCACTTAGCTCCCCCGC > CCCGGCTCCCACCCTGTCCGGGGGGCTCCTGAAGCCCTCAGCCCCAACCCCGGGCTCCCC > ATGGAAGCCAGCTGTGCCCCAGGAGGAGCAGGAGGAGGTGGAGTCGGCTGAATGCCCACG > GTGCGCCCGGGGCCCCTGAGCCCATCCCGCTCCTAGCCGCTGCCCTAAGGCCCCCGCGCG > CCCCGCGCCCCCCACCCGGGGCCGCGCCGCCTCCGTCCGCCCCTCCCCCGGGGCTTCGCC > CCGGACCTGCCCCCCGCCCGTTTGCCAGCGCTCAGGCAGGAGCTCTGGACTGGGCGCGCC > GCCGCCCTGGAGTGAGGGAAGCCCAGTGGAAGGGGGTCCCGGGAGCCGGCTGCGATGGAC > > ... > > So i lost the other informations provided by the header... > > Is there any option to keep these informations? > > Here is a part of my code with my options : > > > my $seq_ref=\@seq; > my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM', 'quiet' => 1, > 'output' => 'FASTA'); > my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); > my $aln = $factory->align($seq_ref); > > > Thank you. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lamq at usal.es Thu Apr 10 11:52:24 2008 From: lamq at usal.es (Luis A. M. 
Quintales) Date: Thu, 10 Apr 2008 17:52:24 +0200 Subject: [Bioperl-l] xyplot glyph problem with previous aggregation Message-ID: <47FE37B8.9090404@usal.es> I am not able to add xyplot glyphs to one panel because I have some problems with the aggregations. Using that GFF file: ##sequence-region chr1 1 5578650 chr1 atfreq atpc 1 50 58.8000 . . atpc 1 chr1 atfreq atpc 51 100 58.4000 . . atpc 1 chr1 atfreq atpc 101 150 57.6000 . . atpc 1 chr1 atfreq atpc 151 200 57.8000 . . atpc 1 . . . And this source code for preparing the aggregated features necessary for the xyplot glyph: my $filin = $ARGV[0]; my $db = Bio::DB::GFF->new( -dsn => $filin, -adaptor => 'memory', -aggregator => 'at{atpc:atfreq}' ); my $segment = $db->segment('chr1'); my @features1 = $db->features('atpc'); print "$#features1 \n"; my @features2 = $segment->features('atpc'); print "$#features2 \n"; my @features3 = $db->features('at'); print "$#features3 \n"; my @features4 = $segment->features('at'); print "$#features4 \n"; I obtain: 111572 111572 0 0 What I am doing wrong with the aggregator? Many thanks. From lincoln.stein at gmail.com Thu Apr 10 13:55:06 2008 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 10 Apr 2008 13:55:06 -0400 Subject: [Bioperl-l] xyplot glyph problem with previous aggregation In-Reply-To: <47FE37B8.9090404@usal.es> References: <47FE37B8.9090404@usal.es> Message-ID: <6dce9a0b0804101055w65e22abfgaa4f155751fef40f@mail.gmail.com> Hi Luis, When you aggregate the atpc 1 features together, you end up with one feature. Thus @features3 is an array of size 1. The $# operator returns the index of the last element, which is 0. If @features3 were empty, $#features3 would return -1. Lincoln On Thu, Apr 10, 2008 at 11:52 AM, Luis A. M. Quintales wrote: > I am not able to add xyplot glyphs to one panel because I have some > problems with the aggregations. > > Using that GFF file: > > ##sequence-region chr1 1 5578650 > chr1 atfreq atpc 1 50 58.8000 . . atpc 1 > chr1 atfreq atpc 51 100 58.4000 . . atpc 1 > chr1 atfreq atpc 101 150 57.6000 . . atpc 1 > chr1 atfreq atpc 151 200 57.8000 . . atpc 1 > . . .
> > > And this source code for preparing the aggregated features necessary for > the xyplot glyph: > > my $filin = $ARGV[0]; > my $db = Bio::DB::GFF->new( -dsn => $filin, > -adaptor => 'memory', > -aggregator => 'at{atpc:atfreq}' > ); > my $segment = $db->segment('chr1'); > my @features1 = $db->features('atpc'); > print "$#features1 \n"; > my @features2 = $segment->features('atpc'); > print "$#features2 \n"; > my @features3 = $db->features('at'); > print "$#features3 \n"; > my @features4 = $segment->features('at'); > print "$#features4 \n"; > > I obtain: > > 111572 > 111572 > 0 > 0 > > What I am doing wrong with the aggregator? > > Many thanks. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From adsj at novozymes.com Fri Apr 11 04:53:23 2008 From: adsj at novozymes.com (Adam =?iso-8859-1?Q?Sj=F8gren?=) Date: Fri, 11 Apr 2008 10:53:23 +0200 Subject: [Bioperl-l] Bio::Factory::SequenceFactoryI SYNOPSIS example Message-ID: <87d4owixh8.fsf@topper.koldfront.dk> Hi. I am trying to make Bio::SeqIO return objects of my own type (a small extension of Bio::Seq::RichSeq), by setting -seqfactory. 
I am having a little trouble creating the correct object to pass with -seqfactory: Following the example given in SYNOPSIS of Bio::Factory::SequenceFactoryI, I get this error: $ perl -e ' > use Bio::Seq::SeqFactory; > my $seqbuilder = Bio::Seq::SeqFactory->new('type' => 'Bio::PrimarySeq'); > > my $seq = $seqbuilder->create(-seq => 'ACTGAT', > -display_id => 'exampleseq'); > > print "seq is a ", ref($seq), "\n"; > ' ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Can't locate type.pm in @INC (@INC contains: /z/bio/biotools/bioinfperlmodules/ /z/bio/adm/modules /etc/perl /usr/local/lib/perl/5.8.8 /usr/local/share/perl/5.8.8 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.8 /usr/share/perl/5.8 /usr/local/lib/site_perl .) at (eval 13) line 3. : Unrecognized Sequence type for SeqFactory 'type' STACK: Error::throw STACK: Bio::Root::Root::throw /usr/share/perl/5.8/Bio/Root/Root.pm:357 STACK: Bio::Seq::SeqFactory::type /usr/share/perl/5.8/Bio/Seq/SeqFactory.pm:134 STACK: Bio::Seq::SeqFactory::new /usr/share/perl/5.8/Bio/Seq/SeqFactory.pm:93 STACK: -e:3 ----------------------------------------------------------- $ If I go "Bio::Seq::SeqFactory('Bio::PrimarySeq'=>1)" instead, for instance, it seems to work: $ perl -e ' > use Bio::Seq::SeqFactory; > my $seqbuilder = Bio::Seq::SeqFactory->new('Bio::PrimarySeq'=>1); > > my $seq = $seqbuilder->create(-seq => 'ACTGAT', > -display_id => 'exampleseq'); > > print "seq is a ", ref($seq), "\n"; > ' seq is a Bio::PrimarySeq $ I was about to write a patch for the pod, when I realized that I'd better start by asking: Is this a buglet in the pod or the code? 
Best regards,

  Adam

--
Adam Sjøgren
adsj at novozymes.com

From hlapp at gmx.net Fri Apr 11 11:35:54 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 11 Apr 2008 11:35:54 -0400
Subject: [Bioperl-l] Bio::Factory::SequenceFactoryI SYNOPSIS example
In-Reply-To: <87d4owixh8.fsf@topper.koldfront.dk>
References: <87d4owixh8.fsf@topper.koldfront.dk>
Message-ID: <0037240B-F469-4388-972A-324101B11621@gmx.net>

On Apr 11, 2008, at 4:53 AM, Adam Sjøgren wrote:

> $ perl -e '
>> use Bio::Seq::SeqFactory;
>> my $seqbuilder = Bio::Seq::SeqFactory->new('type' =>
>> 'Bio::PrimarySeq');

You need to prefix the argument with a dash: '-type', not 'type'. Otherwise, it assumes that the class you want instantiated is 'type.pm'.

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From 1zoujing at 163.com Thu Apr 10 01:08:52 2008
From: 1zoujing at 163.com (zoujing)
Date: Wed, 9 Apr 2008 22:08:52 -0700 (PDT)
Subject: [Bioperl-l] Bio::ASN1::EntrezGene parse so slowly?
Message-ID: <16602210.post@talk.nabble.com>

I want to parse a file "gene_info" from NCBI. The format of Gene in NCBI is ASN1, right? So I used Bio::ASN1::EntrezGene. But it didn't work properly/too slow. The file is about 500M. The code is following:

use Bio::ASN1::EntrezGene;
my $parser = Bio::ASN1::EntrezGene->new('file' => $ARGV[0]);
my $i = 0;
while(my $result = $parser->next_seq)
{
    last; #something to do there, here use last for test
}

When it goes to the "while" part, it is processing on and on, it does not went out, even I used "last" in the "while" part. So I wonder whether it is too slow or the module is not fit for this job, or I did something wrong?
Thank you!

--
View this message in context: http://www.nabble.com/Bio%3A%3AASN1%3A%3AEntrezGene-parse-so-slowly--tp16602210p16602210.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
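
For reference, NCBI distributes gene_info as a tab-delimited text file rather than ASN.1, so it can be read line by line without Bio::ASN1::EntrezGene. A minimal sketch in plain Perl, assuming the first three columns are tax_id, GeneID and Symbol and that header or comment lines start with '#'. The read_gene_info helper and the sample rows are made up for illustration and are not part of BioPerl:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read tab-delimited gene_info-style records from a filehandle.
# Assumption: columns 1-3 are tax_id, GeneID and Symbol, and lines
# starting with '#' are header/comment lines to be skipped.
sub read_gene_info {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ or $line eq '';
        my ($tax_id, $gene_id, $symbol) = split /\t/, $line;
        push @records,
            { tax_id => $tax_id, gene_id => $gene_id, symbol => $symbol };
    }
    return \@records;
}

# Made-up sample rows standing in for a real gene_info file.
my $sample = "#tax_id\tGeneID\tSymbol\n"
           . "9823\t1001\tGENE_A\n"
           . "9823\t1002\tGENE_B\n";

# Open the string as an in-memory file so the sketch is self-contained.
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my $records = read_gene_info($fh);
close $fh;

print scalar(@$records), " records; first symbol: $records->[0]{symbol}\n";
```

For a real file, the same helper could be given a handle opened on the uncompressed gene_info file instead of the in-memory sample; because it reads one line at a time, the 500M size should matter much less than with a full ASN.1 parse.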
From 1zoujing at 163.com Thu Apr 10 02:17:41 2008
From: 1zoujing at 163.com (zoujing)
Date: Wed, 9 Apr 2008 23:17:41 -0700 (PDT)
Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags"
Message-ID: <16602770.post@talk.nabble.com>

I am a green hand in Bioperl. When I run "parse_entrez_gene_example.pl Sus_scrofa.ags", I get this error:

Data Error: none conforming data found on line 1 in Sus_scrofa.ags.

But Sus_scrofa.ags was downloaded from NCBI in ASN.1 format, so it should be the same as the Homo_sapiens file in the example. There should be no error, since the code is Mingyi's example. I wonder why this happens. Should I change something about the file?

--
View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16602770.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From 1zoujing at 163.com Thu Apr 10 02:56:52 2008
From: 1zoujing at 163.com (zoujing)
Date: Wed, 9 Apr 2008 23:56:52 -0700 (PDT)
Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags"
In-Reply-To: <16602770.post@talk.nabble.com>
References: <16602770.post@talk.nabble.com>
Message-ID: <16603225.post@talk.nabble.com>

I searched the web and found the answer. Quoting it here:

The error was thrown by my Bio::ASN1::EntrezGene module because it expects a text file, while you fed it with a binary file. To use gzipped ASN binary file from NCBI, download the NCBI gene2xml (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml), then use this syntax to run my parser on the binary files:

my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i Homo_sapiens.ags.gz -c -x -b | ");
# Homo_sapiens.ags.gz is the gzipped binary file directly downloaded from NCBI

Same syntax should be used when you're using SeqIO (thus SeqIO::entrezgene).

Mingyi

zoujing wrote:
>
> I am a green hand in Bioperl.
When I run perl with > "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error > information: > Data Error: none conforming data found on line 1 in Sus_scrofa.ags. > > But the Sus_scrofa.ags is download from NCBI, with the format of ASN1, > should be the same as Homo_sapiens in the example. So it should be no > error as the code is the example from Mingyi. > I wonder why this happen, and should I change something about the file? > > -- View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16603225.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
From 1zoujing at 163.com Thu Apr 10 03:10:26 2008 From: 1zoujing at 163.com (zoujing) Date: Thu, 10 Apr 2008 00:10:26 -0700 (PDT) Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags" Message-ID: <16603225.post@talk.nabble.com> Seached the web and found the answer now, quote the answer as following: The error was thrown by my Bio::ASN1::EntrezGene module because it expects a text file, while you fed it with a binary file. To use gzipped ASN binary file from NCBI, download the NCBI gene2xml (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml), then use this syntax to run my parser on the binary files: my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i Homo_sapiens.ags.gz -c -x -b | "); # Homo_sapiens.ags.gz is the gzipped binary file directly downloaded from NCBI Same syntax should be used when you're using SeqIO (thus SeqIO::entrezgene). Mingyi But there is still one thing, I want to parse "gene_info.gz" in Gene of NCBI. ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz. It doesn't work.Is that means "gene_info.gz"( tab-delimited,one line per GeneID, Column header line is the first line in the file) is not the right format for Bio::ASN1::EntrezGene? zoujing wrote: > > I am a geen hand in Bioperl.
When I run perl with > "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error > information: > Data Error: none conforming data found on line 1 in Sus_scrofa.ags. > > But the Sus_scrofa.ags is download from NCBI, with the format of ASN1, > should be the same as Homo_sapiens in the example. So it should be no > error as the code is the example from Mingyi. > I wonder why this happen, and should I change something about the file? > > -- View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16603225.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From stefan.kirov at bms.com Fri Apr 11 15:59:29 2008 From: stefan.kirov at bms.com (Stefan Kirov) Date: Fri, 11 Apr 2008 15:59:29 -0400 (Eastern Daylight Time) Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags" In-Reply-To: <16602770.post@talk.nabble.com> References: <16602770.post@talk.nabble.com> Message-ID: AGS is a binary ASN.1 format and WILL NOT be parsed! You have to use gene2xml( weird, but this is NCBI) with these flags: -c -x -b -i. This will spit out text ASN which can be parsed. Stefan On Wed, 9 Apr 2008, zoujing wrote: > > I am a geen hand in Bioperl. When I run perl with > "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error > information: > Data Error: none conforming data found on line 1 in Sus_scrofa.ags. > > But the Sus_scrofa.ags is download from NCBI, with the format of ASN1, > should be the same as Homo_sapiens in the example. So it should be no error > as the code is the example from Mingyi. > I wonder why this happen, and should I change something about the file? > > -- > View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16602770.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From stefan.kirov at bms.com Fri Apr 11 16:01:30 2008 From: stefan.kirov at bms.com (Stefan Kirov) Date: Fri, 11 Apr 2008 16:01:30 -0400 (Eastern Daylight Time) Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags" In-Reply-To: <16603225.post@talk.nabble.com> References: <16603225.post@talk.nabble.com> Message-ID: It is not. If you use this file, why would you need a parser for it anyway? Just split on \t or read with OpenOffice or equiv. Stefan On Thu, 10 Apr 2008, zoujing wrote: > > Seached the web and found the answer now, quote the answer as following: > The error was thrown by my Bio::ASN1::EntrezGene module because it > expects a text file, while you fed it with a binary file. To use > gzipped ASN binary file from NCBI, download the NCBI gene2xml > (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml), > then use this syntax to run my parser on the binary files: > > my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i > Homo_sapiens.ags.gz -c -x -b | "); # Homo_sapiens.ags.gz is the gzipped > binary file directly downloaded from NCBI > > Same syntax should be used when you're using SeqIO (thus SeqIO::entrezgene). > Mingyi > > But there still one thing, I want to parse "gene_info.gz" in Gene of > NCBI. It doesn't work.Is that means "gene_info.gz"( tab-delimited,one line > per GeneID, Column header line is the first line in the file > ) is not the right format for Bio::ASN1::EntrezGene? > > > > zoujing wrote: >> >> I am a geen hand in Bioperl. When I run perl with >> "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error >> information: >> Data Error: none conforming data found on line 1 in Sus_scrofa.ags. 
>> >> But the Sus_scrofa.ags is download from NCBI, with the format of ASN1, >> should be the same as Homo_sapiens in the example. So it should be no >> error as the code is the example from Mingyi. >> I wonder why this happen, and should I change something about the file? >> >> > > -- > View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16603225.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From asjo at koldfront.dk Fri Apr 11 15:39:59 2008 From: asjo at koldfront.dk (Adam =?iso-8859-1?Q?Sj=F8gren?=) Date: Fri, 11 Apr 2008 21:39:59 +0200 Subject: [Bioperl-l] Bio::Factory::SequenceFactoryI SYNOPSIS example In-Reply-To: <0037240B-F469-4388-972A-324101B11621@gmx.net> (Hilmar Lapp's message of "Fri, 11 Apr 2008 11:35:54 -0400") References: <87d4owixh8.fsf@topper.koldfront.dk> <0037240B-F469-4388-972A-324101B11621@gmx.net> Message-ID: <877if4i3jk.fsf@topper.koldfront.dk> On Fri, 11 Apr 2008 11:35:54 -0400, Hilmar wrote: > On Apr 11, 2008, at 4:53 AM, Adam Sj?gren wrote: >>> my $seqbuilder = Bio::Seq::SeqFactory->new('type' => >>> 'Bio::PrimarySeq'); > You need to prefix the argument with a dash: '-type', not 'type'. > Otherwise, it assumes that the class you want instantiated is > 'type.pm'. I guess that means I should submit a patch for the SYNOPSIS. Attached. 
Thanks,

Adam

Index: Bio/Factory/SequenceFactoryI.pm
===================================================================
--- Bio/Factory/SequenceFactoryI.pm (revision 14654)
+++ Bio/Factory/SequenceFactoryI.pm (working copy)
@@ -20,7 +20,7 @@
     # get a Bio::Factory::SequenceFactoryI object like

     use Bio::Seq::SeqFactory;
-    my $seqbuilder = Bio::Seq::SeqFactory->new('type' => 'Bio::PrimarySeq');
+    my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');

     my $seq = $seqbuilder->create(-seq => 'ACTGAT',
                                   -display_id => 'exampleseq');

--
"Well, I'm a moon around you"
Adam Sjøgren
asjo at koldfront.dk

From bamboowarrior at gmail.com Fri Apr 11 19:10:35 2008
From: bamboowarrior at gmail.com (Arkady)
Date: Fri, 11 Apr 2008 18:10:35 -0500
Subject: [Bioperl-l] Nucleotide Links in Gene DB (GenBank)
Message-ID: <91656c3f0804111610r24c8fa5es5bcb56b7a59e0208@mail.gmail.com>

Hi everyone,

I'm a bioperl n00b. Actually, kind of a genbank n00b, too, as I'm from a CS background and just started bio things last June.

I'm trying to set up an analysis pipeline of primate protein CDSs (the nucleotide seqs). I've written a script which does a pretty decent job of downloading these from GenBank--but it's inconsistent, because a lot of sequences in nucleotide are 'predicted' and named LOCthisorthat instead of by gene name.

So what I was thinking was this (assume ANKRD43 is the gene for this example):

1. Search 'gene' database for ANKRD43 AND (PRI*[ORGN])
   On NCBI, there's an option to show all nucleotide links. How do I get a list of those in bioperl? Can bioperl even search 'gene', or just 'nucleotide'?
2. Search 'nucleotide' for the referenced items from #1, and also for ANKRD43[TITL] AND (PRI*[ORGN]), save CDSes.
3. BLAST mRNA for one of those CDSes, see if we pick up any other matches.
4. BLAT other primates for CDSes, see if we find anything not in GenBank.

On the other hand, I always get the feeling I'm doing things the hard way--especially here, with #1 and #2.
Is there a much more obvious, simple way to do this? Thanks, folks. Cheers, John Woods Institute for Cellular and Molecular Biology The University of Texas at Austin From hlapp at gmx.net Fri Apr 11 19:19:44 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 11 Apr 2008 19:19:44 -0400 Subject: [Bioperl-l] Bio::Factory::SequenceFactoryI SYNOPSIS example In-Reply-To: <877if4i3jk.fsf@topper.koldfront.dk> References: <87d4owixh8.fsf@topper.koldfront.dk> <0037240B-F469-4388-972A-324101B11621@gmx.net> <877if4i3jk.fsf@topper.koldfront.dk> Message-ID: Thanks, applied. -hilmar On Apr 11, 2008, at 3:39 PM, Adam Sj?gren wrote: > On Fri, 11 Apr 2008 11:35:54 -0400, Hilmar wrote: > >> On Apr 11, 2008, at 4:53 AM, Adam Sj?gren wrote: > >>>> my $seqbuilder = Bio::Seq::SeqFactory->new('type' => >>>> 'Bio::PrimarySeq'); > >> You need to prefix the argument with a dash: '-type', not 'type'. >> Otherwise, it assumes that the class you want instantiated is >> 'type.pm'. > > I guess that means I should submit a patch for the SYNOPSIS. Attached. 
> > > Thanks, > > Adam > > > Index: Bio/Factory/SequenceFactoryI.pm > =================================================================== > --- Bio/Factory/SequenceFactoryI.pm (revision 14654) > +++ Bio/Factory/SequenceFactoryI.pm (working copy) > @@ -20,7 +20,7 @@ > # get a Bio::Factory::SequenceFactoryI object like > > use Bio::Seq::SeqFactory; > - my $seqbuilder = Bio::Seq::SeqFactory->new('type' => > 'Bio::PrimarySeq'); > + my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => > 'Bio::PrimarySeq'); > > my $seq = $seqbuilder->create(-seq => 'ACTGAT', > -display_id => 'exampleseq'); > > -- > "Well, I'm a moon around you" Adam > Sj?gren > > asjo at koldfront.dk > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From mmokrejs at ribosome.natur.cuni.cz Fri Apr 11 21:32:14 2008 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Sat, 12 Apr 2008 03:32:14 +0200 Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon_id In-Reply-To: References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com> <32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net> <320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com> <93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com> Message-ID: <4800111E.3030802@ribosome.natur.cuni.cz> Chris Fields wrote: > The counter to that perspective (using new sequences with old tax info) > would be to regularly update NCBI taxonomy, particularly in > circumstances prior to adding new sequences. Hilmar mentioned that once > tax is loaded it doesn't take as long to update, so you could set up a > cron job to update regularly. 
> > I remember someone mentioning weekly or monthly updates on the list > quite a while ago, but I'm unsure how often NCBI updates tax information > (i.e. with every release, monthly, weekly, etc). I can see instances > popping up where you used the an up-to-date taxonomy but a new sequence > contains a tax ID not present. I think bioperl-db handles these but I'm > not sure what other Bio* do. > I spent some time benchmarking this and inspecting the mysql log files. The current load_ncbi_taxonomy.pl script with minor modification to show timestamps does this on initial import into mysql and then update of the database using exactly same dataset (but anyway it has to walk through all the data): $ ./load_ncbi_taxonomy.pl --dbname=biosqldb --driver=mysql --host=127.0.01 \ --port=3306 --directory=/home/mmokrejs/bioinformatics/databases/ncbitax/dump \ --chunksize=0 --verbose=2 --mycnf=~/.my.cnf Sat Apr 12 01:58:43 MEST 2008 Loading NCBI taxon database in /home/mmokrejs/bioinformatics/databases/ncbitax/dump: ... retrieving all taxon nodes in the database Sat Apr 12 01:58:43 MEST 2008 ... reading in taxon nodes from nodes.dmp Sat Apr 12 01:58:58 MEST 2008 ... insert / update / delete taxon nodes 10000/421098 done (in 5 secs, 2000.0 rows/s) 20000/421098 done (in 4 secs, 2500.0 rows/s) ... 420000/421098 done (in 4 secs, 2500.0 rows/s) Sat Apr 12 02:02:21 MEST 2008 ... (committing nodes) Sat Apr 12 02:02:21 MEST 2008 ... rebuilding nested set left/right values 10000 done (in 24 secs, 416.7 rows/s) 20000 done (in 26 secs, 384.6 rows/s) 30000 done (in 24 secs, 416.7 rows/s) ... 420004 done (in 23 secs, 434.8 rows/s) Sat Apr 12 02:19:25 MEST 2008 ... reading in taxon names from names.dmp Sat Apr 12 02:19:25 MEST 2008 ... deleting old taxon names Sat Apr 12 02:19:25 MEST 2008 ... inserting new taxon names 10000 done (in 8 secs, 1250.0 rows/s) 20000 done (in 8 secs, 1250.0 rows/s) ... 580000 done (in 5 secs, 2000.0 rows/s) Sat Apr 12 02:24:48 MEST 2008 ... 
cleaning up Sat Apr 12 02:24:49 MEST 2008 Done. $ I decided to re-import the same data to mimic at least somehow the future updates, although no record should be UPDATEd, except zapping left and right values with NULL. :(( $ ./load_ncbi_taxonomy.pl --dbname=biosqldb --driver=mysql --host=127.0.01 --port=3306 --directory=/home/mmokrejs/bioinformatics/databases/ncbitax/dump \ --chunksize=0 --verbose=2 --mycnf=~/.my.cnf Sat Apr 12 02:35:20 MEST 2008 Loading NCBI taxon database in /home/mmokrejs/bioinformatics/databases/ncbitax/dump: ... retrieving all taxon nodes in the database Sat Apr 12 02:35:26 MEST 2008 ... reading in taxon nodes from nodes.dmp Sat Apr 12 02:35:46 MEST 2008 ... insert / update / delete taxon nodes 10000/421098 done (in 0 secs, 10000.0 rows/s) 20000/421098 done (in 0 secs, 10000.0 rows/s) ... 410000/421098 done (in 0 secs, 10000.0 rows/s) 420000/421098 done (in 0 secs, 10000.0 rows/s) Sat Apr 12 02:35:55 MEST 2008 ... (committing nodes) Sat Apr 12 02:35:55 MEST 2008 ... rebuilding nested set left/right values 10000 done (in 9 secs, 1111.1 rows/s) 20000 done (in 9 secs, 1111.1 rows/s) ... 410004 done (in 8 secs, 1250.0 rows/s) 420004 done (in 9 secs, 1111.1 rows/s) Sat Apr 12 02:41:54 MEST 2008 ... reading in taxon names from names.dmp Sat Apr 12 02:41:54 MEST 2008 ... deleting old taxon names Sat Apr 12 02:41:55 MEST 2008 ... inserting new taxon names 10000 done (in 5 secs, 2000.0 rows/s) 20000 done (in 5 secs, 2000.0 rows/s) ... 570000 done (in 6 secs, 1666.7 rows/s) 580000 done (in 5 secs, 2000.0 rows/s) Sat Apr 12 02:47:27 MEST 2008 ... cleaning up Sat Apr 12 02:47:27 MEST 2008 Done. $ ls -la /var/log/mysql/mysql.log -rw-rw---- 1 mysql mysql 483443314 Apr 12 03:15 /var/log/mysql/mysql.log $ Pentium4 M laptop, 1.8GHz, 1 GB RAM, mysql-5.0.56 with enabled SQL text logging, the slow version of logging all SQL commands compared to binary logging. The log was cleared before the tests. 
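The "rebuilding nested set left/right values" phase takes roughly 17 of the 26 minutes in the first run above. For readers unfamiliar with the technique, here is a minimal in-memory sketch of what nested-set numbering computes: a depth-first walk hands out left and right values from a single counter. The tree is made up, and `nested_set_values` is a hypothetical helper, not the code load_ncbi_taxonomy.pl uses.

```perl
use strict;
use warnings;

# Assign nested-set left/right values to a tree given as
# parent_id => [child_ids]. Every descendant of a node ends up with
# values strictly between that node's own left and right, and a leaf
# gets right = left + 1.
sub nested_set_values {
    my ($children, $root) = @_;
    my (%left, %right);
    my $counter = 0;

    my $visit;
    $visit = sub {
        my ($node) = @_;
        $left{$node} = ++$counter;
        # Children are visited in ascending id order.
        $visit->($_) for sort { $a <=> $b } @{ $children->{$node} || [] };
        $right{$node} = ++$counter;
    };
    $visit->($root);
    undef $visit;    # break the closure's reference to itself

    return (\%left, \%right);
}

# Invented four-node tree: 1 is the root, 2 and 3 are its children,
# 4 is a child of 2.
my %tree = (1 => [2, 3], 2 => [4]);
my ($left, $right) = nested_set_values(\%tree, 1);
for my $node (sort { $a <=> $b } keys %$left) {
    print "node $node: left=$left->{$node} right=$right->{$node}\n";
}
```

With values like these, all descendants of a node can be selected with a single range condition on left_value, which is why the columns are worth maintaining despite the rebuild cost.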
I could provide some bits from the log or upload it somewhere if anybody else would like to dig into the details.

I believe the recalculation step could be made faster. See what happens:

31 Query SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '1' ORDER BY ncbi_taxon_id
31 Query SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '10239' ORDER BY ncbi_taxon_id
31 Query SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '12333' ORDER BY ncbi_taxon_id
31 Query SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '12335' ORDER BY ncbi_taxon_id
31 Query UPDATE taxon SET left_value = NULL, right_value = NULL WHERE left_value = '4'
31 Query UPDATE taxon SET left_value = NULL, right_value = NULL WHERE right_value = '5'
31 Query UPDATE taxon SET left_value = '4', right_value = '5' WHERE taxon_id = '12335'
31 Query SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '12340' ORDER BY ncbi_taxon_id
31 Query UPDATE taxon SET left_value = NULL, right_value = NULL WHERE left_value = '6'
31 Query UPDATE taxon SET left_value = NULL, right_value = NULL WHERE right_value = '7'
31 Query UPDATE taxon SET left_value = '6', right_value = '7' WHERE taxon_id = '12340'

The columns left_value and right_value are NULL when the table is created, so there is no need to write NULL into them again. This would mean writing a wrapper function which would mimic update(), but before doing that it would do a 'SELECT * FROM', compare the values with those to be written, and include in the final UPDATE statement only those columns whose values have changed. We use such a smart wrapper for our code in python. ;-)

When the columns for left and right are to be set to NULL during an update of an existing database, I think it would be much faster to drop the columns and re-create them with NULL values.
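The "smart wrapper" idea above, which writes only the columns whose values actually changed, might look roughly like this in Perl. `build_update` is a hypothetical helper written for this note and is not part of bioperl-db or BioSQL. It only builds the statement and its bind values; fetching the current row and executing the statement are left to the caller.

```perl
use strict;
use warnings;

# Compare the row currently in the database with the values we want
# to store, and build an UPDATE that touches only the changed columns.
# undef stands for NULL. Returns an empty list when nothing differs,
# otherwise the SQL text followed by its bind values.
sub build_update {
    my ($table, $key_column, $current, $wanted) = @_;

    my @changed;
    for my $column (sort keys %$wanted) {
        next if $column eq $key_column;
        my ($old, $new) = ($current->{$column}, $wanted->{$column});
        next if !defined $old && !defined $new;              # NULL stays NULL
        next if defined $old && defined $new && $old eq $new; # same value
        push @changed, $column;
    }
    return () unless @changed;

    my $sql = "UPDATE $table SET "
            . join(', ', map { "$_ = ?" } @changed)
            . " WHERE $key_column = ?";
    my @bind = (@{$wanted}{@changed}, $current->{$key_column});
    return ($sql, @bind);
}

# A freshly created row whose left/right values are still NULL.
my %in_db  = (taxon_id => 12335, left_value => undef, right_value => undef);
my %target = (left_value => 4, right_value => 5);

my ($sql, @bind) = build_update('taxon', 'taxon_id', \%in_db, \%target);
print "$sql\n";
print "bind values: @bind\n";
```

With a DBI handle the result could be run as `$dbh->do($sql, undef, @bind)`. The saving comes from skipping the statement entirely when the returned list is empty, which covers the case of re-importing unchanged data.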
I think it would be worth investigating whether the empty taxon and taxon_name tables could be created as MyISAM tables and converted into InnoDB tables only after all the imports and updates. One would probably have to think a bit more about the foreign keys, but it might be that they would not even be lost during the conversion back and forth. Actually, that is easy to check: dump your current taxon and taxon_name tables (maybe even without the row data, using --no-data), run 'ALTER TABLE taxon ... type=MyISAM' followed by 'ALTER TABLE taxon ... type=InnoDB', dump the database structure again and compare it by diff with the original.

But, time for sleep here.

Martin

From sdavis2 at mail.nih.gov Fri Apr 11 23:50:44 2008
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Fri, 11 Apr 2008 23:50:44 -0400
Subject: [Bioperl-l] Bio::ASN1::EntrezGene parse so slowly?
In-Reply-To: <16602210.post@talk.nabble.com>
References: <16602210.post@talk.nabble.com>
Message-ID: <264855a00804112050gf785c2ei66d9c7463597eccd@mail.gmail.com>

gene_info is a tab-delimited text file, if I recall correctly. Have you looked at it? If it is, you should be able to parse it in a few seconds with just a couple lines of code.

Sean

On Thu, Apr 10, 2008 at 1:08 AM, zoujing <1zoujing at 163.com> wrote:
>
> I want to parse a file "gene_info" from NCBI. The format of Gene in NCBI is
> ASN1, right? So I used Bio::ASN1::EntrezGene. But it didn't work
> properly/too slow. The file is about 500M.
> The code is following:
> use Bio::ASN1::EntrezGene;
> my $parser = Bio::ASN1::EntrezGene->new('file' => $ARGV[0]);
> my $i = 0;
> while(my $result = $parser->next_seq)
> { last; #something to do there, here use last for test}
>
> When it goes to the "while" part, it is processing on and on, it does not
> went out, even I used "last" in the "while" part.
> So I wonder whether it is too slow or the module is not fit for this job,
> or I did something wrong?
>
> Thank you!
> --
> View this message in context: http://www.nabble.com/Bio%3A%3AASN1%3A%3AEntrezGene-parse-so-slowly--tp16602210p16602210.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From david at burt7259.freeserve.co.uk Sat Apr 12 13:01:57 2008
From: david at burt7259.freeserve.co.uk (David Burt)
Date: Sat, 12 Apr 2008 18:01:57 +0100
Subject: [Bioperl-l] bioperl-db
Message-ID:

Hi Hilmar,

Hope you can help - I am using bioperl-db to create a biosql database.

I have used the scripts load_seqdatabase.pl and load_ontology.pl to install human swissprot entries, gene ontology and sequence ontology, and now want to load interpro.

Here's the command line I have tried:

perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser root --dbpass chicken --driver mysql \
--namespace "InterPro" --format InterPro interpro.xml

But I get this message:

Can't call method "identifier" on an undefined value at /cygdrive/c/Bioinformatics/Ensembl/src/bioperl-live/Bio/Ontology/SimpleOntologyEngine.pm line 395

Any ideas?

Dave

PS: here's the top of the interpro.xml file

Kringle

From hlapp at gmx.net Sat Apr 12 14:10:44 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 12 Apr 2008 14:10:44 -0400
Subject: [Bioperl-l] personal vs list email
Message-ID:

I'm not sure why but I have received several Bioperl or BioSQL-related email inquiries directed to me *personally* over the past few weeks.

I have been responding as I get to them, but I feel that I am doing both the senders and this community a poor service, because sometimes someone else on the list could have responded much faster, and when I respond, others on the list who happen to be interested in the same question don't get to see the answer.
So from now on as a policy I will redirect *every* email sent to me personally and that asks a question related to one of the projects to the respective mailing list. If you don't want this, please conspicuously say so at the top of your email, and in that case if you do ask a project-related question be prepared to wait and to possibly needing to follow up. As an aside, it's a pretty safe assumption to make that all other core developers, and quite possibly *all* developers are following a similar policy, whether expressly or not. Isn't this somewhere in the FAQ too? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Apr 12 14:16:13 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 12 Apr 2008 14:16:13 -0400 Subject: [Bioperl-l] bioperl-db In-Reply-To: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC> References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC> Message-ID: Hi Burt, can you try format interprosax instead of interpro? That variant is also much more graceful regarding required space. -hilmar On Apr 12, 2008, at 1:01 PM, David Burt wrote: > Hi Hilmar, > > Hope you can help ? I am using bioperl-db to create a biosql database > > I have used scripts load_seqdatabase.pl and load_ontology.pl to > install human swissprot entries, gene ontology, sequence ontology > and now want to load interpro > > Here?s the command line I have tried > > perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser > root --dbpass chicken --driver mysql \ > --namespace "InterPro" --format InterPro interpro.xml > > But I get this message > > Can't call method "identifier" on an undefined value at /cygdrive/ > c/Bioinformatics/Ensembl/src/bioperl-live/Bio/Ontology/ > SimpleOntologyEngine.pm line 395 > > Any ideas? 
> > Dave > > PS: here?s the top of the interpro.xml file > > > > > > > > > file_date="04-OCT-2006 00:00:00" /> > file_date="22-NOV-2006 00:00:00" /> > file_date="12-JUN-2007 00:00:00" /> > file_date="22-SEP-2005 00:00:00" /> > file_date="23-APR-2004 00:00:00" /> > file_date="14-NOV-2006 00:00:00" /> > file_date="27-JUL-2007 00:00:00" /> > file_date="28-SEP-2007 00:00:00" /> > file_date="11-SEP-2006 00:00:00" /> > file_date="30-NOV-2006 00:00:00" /> > entry_count="359942" file_date="18-MAR-2008 00:00:00" /> > file_date="18-MAR-2008 00:00:00" /> > file_date="19-MAR-2008 00:00:00" /> > file_date="27-MAR-2007 00:00:00" /> > file_date="12-JUL-2007 16:56:17" /> > > protein_count="352"> > Kringle > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Sat Apr 12 16:17:43 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 12 Apr 2008 15:17:43 -0500 Subject: [Bioperl-l] [BioSQL-l] personal vs list email In-Reply-To: References: Message-ID: On Apr 12, 2008, at 1:10 PM, Hilmar Lapp wrote: > I'm not sure why but I have received several Bioperl or BioSQL- > related email inquiries directed to me *personally* over the past > few weeks. > > I have been responding as I get to them, but I feel that I am doing > both the senders and this community a poor service, because > sometimes someone else on the list could have responded much faster, > and when I respond, others on the list who happen to be interested > in the same question don't get to see the answer. > > So from now on as a policy I will redirect *every* email sent to me > personally and that asks a question related to one of the projects > to the respective mailing list. 
If you don't want this, please > conspicuously say so at the top of your email, and in that case if > you do ask a project-related question be prepared to wait and to > possibly needing to follow up. > > As an aside, it's a pretty safe assumption to make that all other > core developers, and quite possibly *all* developers are following a > similar policy, whether expressly or not. I agree; I'm sure several other core devs feel the same way. I always try to forward these to the list if I feel it is more relevant there. > Isn't this somewhere in the FAQ too? > > -hilmar No, but I've added it to the bioperl FAQ; might be worth checking over and editing. chris From hlapp at gmx.net Sat Apr 12 18:40:53 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 12 Apr 2008 18:40:53 -0400 Subject: [Bioperl-l] bioperl-db In-Reply-To: <000001c89ce2$5400a710$0202a8c0@STUDYPC> References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC> <000001c89ce2$5400a710$0202a8c0@STUDYPC> Message-ID: <3F77F49A-9C9E-4450-AE28-46F00CADBC8B@gmx.net> Burt - please keep your replies on the list. Others may have input too, or benefit from the answer too. As there is no name() method call on line 914 in the current version let's check first that you run a current version of BioPerl. It will need to be at least 1.5.2. However, I do suspect a problem in either the InterPro file itself (wouldn't be the first time), or the InterPro parser. -hilmar On Apr 12, 2008, at 5:15 PM, David Burt wrote: > Hilmar > > Many thanks seems to be working > > But got this output ? any comments/ideas what it means ? > > Dave > > > perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser > root --dbpass chicken --driver mysql \ > > --namespace "InterPro" --format interprosax interpro.xml > ...deleting all relationships for InterPro > ...parsing and loading InterPro > Can't call method "name" on an undefined value at load_ontology.pl > line 914. 
> > HERE?S the name and definition in the ontology table > > Name = InterPro > > Definition = > > PANTHER version 6.1, 30128 entries, 04-OCT-2006 > PFAM version 21.0, 8957 entries, 22-NOV-2006 > PIRSF version 2.70, 2877 entries, 12-JUN-2007 > PRINTS version 38.0, 1900 entries, 22-SEP-2005 > PRODOM version 2005.1, 1522 entries, 23-APR-2004 > PROSITE version 20.0, 2006 entries, 14-NOV-2006 > SMART version 5.1, 724 entries, 27-JUL-2007 > TIGRFAMs version 7.0, 3423 entries, 28-SEP-2007 > GENE3D version 3.0.0, 2147 entries, 11-SEP-2006 > SSF version 1.69, 1538 entries, 30-NOV-2006 > SWISSPROT version 55.1, 359942 entries, 18-MAR-2008 > TREMBL version 38.1, 5443281 entries, 18-MAR-2008 > INTERPRO version 17.0, 16175 entries, 19-MAR-2008 > GO version N/A, 23937 entries, 27-MAR-2007 > MEROPS version 7.8, 2831 entries, 12-JUL-2007 | > > > > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: 12 April 2008 19:16 > To: David Burt > Cc: Bioperl BioPerl > Subject: Re: bioperl-db > > Hi Burt, > > can you try format interprosax instead of interpro? That variant is > also much more graceful regarding required space. > > -hilmar > > On Apr 12, 2008, at 1:01 PM, David Burt wrote: > > > Hi Hilmar, > > Hope you can help ? I am using bioperl-db to create a biosql database > > I have used scripts load_seqdatabase.pl and load_ontology.pl to > install human swissprot entries, gene ontology, sequence ontology > and now want to load interpro > > Here?s the command line I have tried > > perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser > root --dbpass chicken --driver mysql \ > --namespace "InterPro" --format InterPro interpro.xml > > But I get this message > > Can't call method "identifier" on an undefined value at /cygdrive/ > c/Bioinformatics/Ensembl/src/bioperl-live/Bio/Ontology/ > SimpleOntologyEngine.pm line 395 > > Any ideas? 
> > Dave > > PS: here?s the top of the interpro.xml file > > > > > > > > > file_date="04-OCT-2006 00:00:00" /> > file_date="22-NOV-2006 00:00:00" /> > file_date="12-JUN-2007 00:00:00" /> > file_date="22-SEP-2005 00:00:00" /> > file_date="23-APR-2004 00:00:00" /> > file_date="14-NOV-2006 00:00:00" /> > file_date="27-JUL-2007 00:00:00" /> > file_date="28-SEP-2007 00:00:00" /> > file_date="11-SEP-2006 00:00:00" /> > file_date="30-NOV-2006 00:00:00" /> > entry_count="359942" file_date="18-MAR-2008 00:00:00" /> > file_date="18-MAR-2008 00:00:00" /> > file_date="19-MAR-2008 00:00:00" /> > file_date="27-MAR-2007 00:00:00" /> > file_date="12-JUL-2007 16:56:17" /> > > protein_count="352"> > Kringle > > > > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Apr 12 18:43:25 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 12 Apr 2008 18:43:25 -0400 Subject: [Bioperl-l] bioperl-db In-Reply-To: <000001c89ce5$a5df2e50$0202a8c0@STUDYPC> References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC> <000001c89ce5$a5df2e50$0202a8c0@STUDYPC> Message-ID: I'm not sure what you mean by 'Check interpro.xml', but you can use the --safe command-line option to keep going if an individual term fails to load for whatever reason. Can you post the data for the seemingly offending record? (and please cc the list) -hilmar On Apr 12, 2008, at 5:39 PM, David Burt wrote: > Hi Hilmar > > Just checked mysql database and only have 39 entries under interpro > and loaded up to IPR000035 > > Check unterpro.xml looks OK from IPR000036 and onwards > > So seems to have crashed at IPR000035 ? 
> > dave > > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: 12 April 2008 19:16 > To: David Burt > Cc: Bioperl BioPerl > Subject: Re: bioperl-db > > Hi Burt, > > can you try format interprosax instead of interpro? That variant is > also much more graceful regarding required space. > > -hilmar > > On Apr 12, 2008, at 1:01 PM, David Burt wrote: > > > Hi Hilmar, > > Hope you can help ? I am using bioperl-db to create a biosql database > > I have used scripts load_seqdatabase.pl and load_ontology.pl to > install human swissprot entries, gene ontology, sequence ontology > and now want to load interpro > > Here?s the command line I have tried > > perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser > root --dbpass chicken --driver mysql \ > --namespace "InterPro" --format InterPro interpro.xml > > But I get this message > > Can't call method "identifier" on an undefined value at /cygdrive/ > c/Bioinformatics/Ensembl/src/bioperl-live/Bio/Ontology/ > SimpleOntologyEngine.pm line 395 > > Any ideas? 
> > Dave > > PS: here?s the top of the interpro.xml file > > > > > > > > > file_date="04-OCT-2006 00:00:00" /> > file_date="22-NOV-2006 00:00:00" /> > file_date="12-JUN-2007 00:00:00" /> > file_date="22-SEP-2005 00:00:00" /> > file_date="23-APR-2004 00:00:00" /> > file_date="14-NOV-2006 00:00:00" /> > file_date="27-JUL-2007 00:00:00" /> > file_date="28-SEP-2007 00:00:00" /> > file_date="11-SEP-2006 00:00:00" /> > file_date="30-NOV-2006 00:00:00" /> > entry_count="359942" file_date="18-MAR-2008 00:00:00" /> > file_date="18-MAR-2008 00:00:00" /> > file_date="19-MAR-2008 00:00:00" /> > file_date="27-MAR-2007 00:00:00" /> > file_date="12-JUL-2007 16:56:17" /> > > protein_count="352"> > Kringle > > > > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From Russell.Smithies at agresearch.co.nz Sun Apr 13 22:51:41 2008 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Mon, 14 Apr 2008 14:51:41 +1200 Subject: [Bioperl-l] Tandem Repeats Finder? In-Reply-To: References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC><000001c89ce5$a5df2e50$0202a8c0@STUDYPC> Message-ID: Has anyone tried TRF? I notice UCSC is using it for all their simple repeat annotations and thought it might be better than what we're currently using (Sputnik) And is there a BioPerl parser for it's output or am I going to have to write my own ? Thanx, Russell Smithies Bioinformatics Applications Developer T +64 3 489 9085 E? russell.smithies at agresearch.co.nz Invermay? Research Centre Puddle Alley, Mosgiel, New Zealand T? +64 3 489 3809?? F? +64 3 489 9174? 
www.agresearch.co.nz ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From Russell.Smithies at agresearch.co.nz Sun Apr 13 22:53:46 2008 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Mon, 14 Apr 2008 14:53:46 +1200 Subject: [Bioperl-l] Tandem Repeats Finder? In-Reply-To: References: Message-ID: Scratch the need for a parser. I turned off html output and it's all nice white-space separated text :-) Russell > -----Original Message----- > From: Smithies, Russell > Sent: Monday, 14 April 2008 2:52 p.m. > To: 'Bioperl BioPerl' > Subject: Tandem Repeats Finder? > > Has anyone tried TRF? > I notice UCSC is using it for all their simple repeat annotations and thought it might > be better than what we're currently using (Sputnik) > > And is there a BioPerl parser for it's output or am I going to have to write my own ? > > Thanx, > > > Russell Smithies > > Bioinformatics Applications Developer > T +64 3 489 9085 > E? russell.smithies at agresearch.co.nz > > Invermay? Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T? +64 3 489 3809 > F? 
+64 3 489 9174 > www.agresearch.co.nz > ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From csaba.ortutay at gmail.com Mon Apr 14 00:15:22 2008 From: csaba.ortutay at gmail.com (Ortutay Csaba =?iso-8859-1?q?P=E9ter?=) Date: Mon, 14 Apr 2008 07:15:22 +0300 Subject: [Bioperl-l] Tandem Repeats Finder? In-Reply-To: References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC> Message-ID: <200804140715.22702.csaba.ortutay@gmail.com> Hello, I have used TRF in my earlier projects. It is nice and quick tool. There was not ready made parsers those times (5-6 years ago) so we have written our own. Csaba > Has anyone tried TRF? > I notice UCSC is using it for all their simple repeat annotations and > thought it might be better than what we're currently using (Sputnik) > > And is there a BioPerl parser for it's output or am I going to have to > write my own ? > > Thanx, -- Csaba Ortutay PhD IMT Bioinformatics University of Tampere Finland From avilella at gmail.com Mon Apr 14 07:13:26 2008 From: avilella at gmail.com (Albert Vilella) Date: Mon, 14 Apr 2008 12:13:26 +0100 Subject: [Bioperl-l] how can I print a Bio::Tree newick sortby given list? Message-ID: <358f4d650804140413x4271f18bx40af1b9054306df8@mail.gmail.com> Hi, I have a newick file that I want to sort by a given order and print again as newick. 
For example, if I have

(((ENSPTRG00000013811:0.0011,ENSG00000142192:0.0021):0.0033,ENSPPYG00000003902:0.0326):0.0000,ENSMMUG00000014384:0.0366):0.3638;

I want to sort it by "ENSG:ENSPTRG:ENSPPYG:ENSMMUG".

Any suggestions on how to do this in BioPerl?

Cheers,

Albert.

From lamq at usal.es Mon Apr 14 11:01:51 2008
From: lamq at usal.es (Luis A. M. Quintales)
Date: Mon, 14 Apr 2008 17:01:51 +0200
Subject: [Bioperl-l] xyplot glyph: scale problems
Message-ID: <480371DF.7040900@usal.es>

I have a problem with the scale numbers that the xyplot glyph calculates. The shape of the graph looks fine, but the scale number 10 and its position in the output are not correct. I am sending the source code, a simplified input file and the PNG output.

Thank you

Source code ex1.pl (also in http://avellano.usal.es/~luis/bioperl-l/ex1.pl)
============================
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GFF;
use Bio::Graphics::Panel;

# GFF file to plot, given as the first command-line argument
my $filin = $ARGV[0];

# Load the GFF file into an in-memory database; the aggregator groups
# the atpc features (source atfreq) under the aggregate type 'at'
my $db = Bio::DB::GFF->new(
    -dsn        => $filin,
    -adaptor    => 'memory',
    -aggregator => 'at{atpc:atfreq}',
);

my $segment  = $db->segment('chr1');
my @features = $segment->features('at');

my $panel = Bio::Graphics::Panel->new(
    -offset    => 0,
    -grid      => 100,
    -length    => 500,
    -width     => 800,
    -pad_left  => 50,
    -pad_right => 50,
);

# Track 1: the segment itself
$panel->add_track($segment,
    -glyph   => 'generic',
    -bgcolor => 'blue',
    -label   => 1,
);

# Track 2: the scores as an xyplot with the scale drawn on the left
$panel->add_track(\@features,
    -glyph      => 'xyplot',
    -graph_type => 'boxes',
    -scale      => 'left',
    -height     => 200,
);

# Write the panel out as a PNG image
open(FI, '>', 'sal.png') or die "Cannot write sal.png: $!";
binmode FI;
print FI $panel->png;
close FI;
============================

in1.gff file (also in http://avellano.usal.es/~luis/bioperl-l/in1.gff)
============================
##sequence-region chr1 1 5578650
chr1 atfreq atpc 1 10 64.0000 . . atpc 1
chr1 atfreq atpc 11 20 63.0000 . . atpc 1
chr1 atfreq atpc 21 30 62.0000 . . atpc 1
chr1 atfreq atpc 31 40 59.0000 . . atpc 1
chr1 atfreq atpc 41 50 59.0000 . . atpc 1
chr1 atfreq atpc 51 60 59.0000 . . atpc 1
chr1 atfreq atpc 61 70 59.0000 . . atpc 1
chr1 atfreq atpc 71 80 59.0000 . . atpc 1
chr1 atfreq atpc 81 90 61.0000 . . atpc 1
chr1 atfreq atpc 91 100 60.0000 . . atpc 1
chr1 atfreq atpc 101 110 60.0000 . . atpc 1
chr1 atfreq atpc 111 120 64.0000 . . atpc 1
chr1 atfreq atpc 121 130 64.0000 . . atpc 1
chr1 atfreq atpc 131 140 60.0000 . . atpc 1
chr1 atfreq atpc 141 150 60.0000 . . atpc 1
chr1 atfreq atpc 151 160 63.0000 . . atpc 1
chr1 atfreq atpc 161 170 62.0000 . . atpc 1
chr1 atfreq atpc 171 180 59.0000 . . atpc 1
chr1 atfreq atpc 181 190 54.0000 . . atpc 1
chr1 atfreq atpc 191 200 53.0000 . . atpc 1
chr1 atfreq atpc 201 210 54.0000 . . atpc 1
chr1 atfreq atpc 211 220 50.0000 . . atpc 1
chr1 atfreq atpc 221 230 51.0000 . . atpc 1
chr1 atfreq atpc 231 240 56.0000 . . atpc 1
chr1 atfreq atpc 241 250 58.0000 . . atpc 1
chr1 atfreq atpc 251 260 55.0000 . . atpc 1
chr1 atfreq atpc 261 270 54.0000 . . atpc 1
chr1 atfreq atpc 271 280 56.0000 . . atpc 1
chr1 atfreq atpc 281 290 59.0000 . . atpc 1
chr1 atfreq atpc 291 300 58.0000 . . atpc 1
chr1 atfreq atpc 301 310 60.0000 . . atpc 1
chr1 atfreq atpc 311 320 59.0000 . . atpc 1
chr1 atfreq atpc 321 330 59.0000 . . atpc 1
chr1 atfreq atpc 331 340 57.0000 . . atpc 1
chr1 atfreq atpc 341 350 56.0000 . . atpc 1
chr1 atfreq atpc 351 360 57.0000 . . atpc 1
chr1 atfreq atpc 361 370 57.0000 . . atpc 1
chr1 atfreq atpc 371 380 58.0000 . . atpc 1
chr1 atfreq atpc 381 390 56.0000 . . atpc 1
chr1 atfreq atpc 391 400 58.0000 . . atpc 1
chr1 atfreq atpc 401 410 56.0000 . . atpc 1
chr1 atfreq atpc 411 420 59.0000 . . atpc 1
chr1 atfreq atpc 421 430 58.0000 . . atpc 1
chr1 atfreq atpc 431 440 59.0000 . . atpc 1
chr1 atfreq atpc 441 450 58.0000 . . atpc 1
chr1 atfreq atpc 451 460 58.0000 . . atpc 1
chr1 atfreq atpc 461 470 56.0000 . . atpc 1
chr1 atfreq atpc 471 480 57.0000 . . atpc 1
chr1 atfreq atpc 481 490 59.0000 . . atpc 1
============================

The sal.png : http://avellano.usal.es/~luis/bioperl-l/sal.png

Thank you.
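One way to take the automatic scale calculation out of the picture is to fix the score range explicitly. The sketch below is an illustration under stated assumptions, not a confirmed fix for the problem reported above: the helper score_range is hypothetical, and it assumes the score is the sixth whitespace-separated column of each GFF line, as in in1.gff. The xyplot glyph documents -min_score and -max_score options, and the two values returned here could be passed to them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): scan GFF lines and return
# the smallest and largest value found in the score column (column 6).
sub score_range {
    my (@lines) = @_;
    my ($min, $max);
    for my $line (@lines) {
        next if $line =~ /^#/;                 # skip directives and comments
        my @col = split ' ', $line;
        next unless @col >= 6 && $col[5] =~ /^-?\d+(?:\.\d+)?$/;
        my $score = $col[5] + 0;               # force numeric context
        $min = $score if !defined $min || $score < $min;
        $max = $score if !defined $max || $score > $max;
    }
    return ($min, $max);
}

# A few rows taken from in1.gff
my @gff = (
    '##sequence-region chr1 1 5578650',
    'chr1 atfreq atpc 1 10 64.0000 . . atpc 1',
    'chr1 atfreq atpc 211 220 50.0000 . . atpc 1',
    'chr1 atfreq atpc 241 250 58.0000 . . atpc 1',
);

my ($min, $max) = score_range(@gff);
print "min_score=$min max_score=$max\n";       # prints "min_score=50 max_score=64"

# Assumed usage with the panel from ex1.pl:
#   $panel->add_track(\@features,
#       -glyph      => 'xyplot',
#       -graph_type => 'boxes',
#       -scale      => 'left',
#       -height     => 200,
#       -min_score  => $min,
#       -max_score  => $max,
#   );
```

Rounding the range outward to round numbers (for example 0 and 100 for a percentage) may give more readable tick labels than the raw extremes.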
--
==================================================
Luis Antonio Miguel Quintales
Departamento de Informática y Automática
Facultad de Ciencias
Universidad de Salamanca
Plaza de la Merced s/n
37008-SALAMANCA
SPAIN
==================================================
Tel.: +34-923-294400(ext.1513)
Fax.: +34-923-294584
E-mail: lamq at usal.es
==================================================

From aaron.j.mackey at gsk.com Mon Apr 14 09:00:52 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Mon, 14 Apr 2008 09:00:52 -0400
Subject: [Bioperl-l] personal vs list email
In-Reply-To:
Message-ID:

I try to take it even one step further: I require the person to re-ask their question on the mailing list (and then try to answer it there). This has the added benefit of causing the person to pause a moment to reflect on their question, and (sometimes) to spend a bit more time preparing the question for broader public consumption.

-Aaron

From sutripa at vbi.vt.edu Mon Apr 14 12:54:47 2008
From: sutripa at vbi.vt.edu (Sucheta Tripathy)
Date: Mon, 14 Apr 2008 12:54:47 -0400 (EDT)
Subject: [Bioperl-l] Error installing XML::Parser
Message-ID: <1285.99.152.150.87.1208192087.squirrel@webmail.vbi.vt.edu>

Hello List,

I have recently installed bioperl using the following command. The installation was successful. Now I am trying to install XML::Parser but it returns with error messages. Any clue what I may be doing wrong?

Thanks

Sucheta

Following is the last part of the error message:

### Error Message #######
Expat.c: In function ‘XS_XML__Parser__Expat_SkipUntil’:
Expat.c:2664: error: ‘XML_Parser’ undeclared (first use in this function)
Expat.c:2664: error: expected ‘;’ before ‘parser’
Expat.c:2665: warning: ISO C90 forbids mixed declarations and code
Expat.xs:2179: error: ‘parser’ undeclared (first use in this function)
Expat.xs:2179: warning: cast to pointer from integer of different size
Expat.xs:2180: error: ‘CallbackVector’ has no member named ‘st_serial’
Expat.xs:2182: error: ‘CallbackVector’ has no member named ‘skip_until’
Expat.c: In function ‘XS_XML__Parser__Expat_Do_External_Parse’:
Expat.c:2687: error: ‘XML_Parser’ undeclared (first use in this function)
Expat.c:2687: error: expected ‘;’ before ‘parser’
Expat.c:2688: warning: ISO C90 forbids mixed declarations and code
Expat.xs:2194: error: ‘parser’ undeclared (first use in this function)
Expat.xs:2194: warning: cast to pointer from integer of different size
Expat.xs:2205: warning: unused variable ‘pret’
Expat.xs:2194: warning: unused variable ‘cbv’
Expat.xs:2192: warning: unused variable ‘type’
make[1]: *** [Expat.o] Error 1
make[1]: Leaving directory `/root/.cpan/build/XML-Parser-2.36/Expat'
make: *** [subdirs] Error 2
/usr/bin/make -- NOT OK
Running make test
Can't test without successful make
Running make install
make had returned bad status, install seems impossible
#####

--
Sucheta Tripathy, Ph.D.
Virginia Bioinformatics Institute Phase-I
Washington street.
Virginia Tech.
Blacksburg,VA 24061-0447 phone:(540)231-8138 Fax: (540) 231-2606 From mmokrejs at ribosome.natur.cuni.cz Tue Apr 15 06:45:48 2008 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Tue, 15 Apr 2008 12:45:48 +0200 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <200804071448.34769.heikki@sanbi.ac.za> <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> <47FA4AD2.5030206@uv.es> Message-ID: <4804875C.80506@ribosome.natur.cuni.cz> Chris Fields wrote: > Note in the example I gave that, during the revision history, the > DBSOURCE changed at the point of the creation date (the original nuc. > record was a M. tuberculosis contig sequence, which later changed to > an updated full M. tuberculosis genome record at the time of the > 'create date'). > > Couldn't find anything specific in the GenBank docs on this, but it > appears (at least for a protein record) the creation date reflects > the date in which the sequence was either originally deposited or > originally derived from the nucleotide source record present in the > record. In other words, it may not reflect the original date of > deposition (which could have come from a different record, as in this > case). > > chris Hi, I have few answers from the past from NCBI staff to my similar questions regarding DATE issues and VERSION numbers not being increased upon "changes" in a record. I tried below to put into a more readable form my former correspondence. Hope this helps everybody to understand what happens in the black box. 
;) Martin Date: Thu, 17 Jan 2002 15:40:07 -0500 (EST) From: David Wheeler Subject: Brucella_melitensis on ftp site > Hi, I'd like to point you to the fact, that the descriptions of > Brucella_melitensis differ in > ftp.ncbi.nih.nlm.gov/genomes/Bacteria/Brucella_melitensis and > ftp.ncbi.nih.nlm.gov/genbank/genomes/Bacteria/Brucella_melitensis > > Namely, the description of the strain is retained in *.gbk files > under /genomes/Bacteria/Brucella_melitensis only under the strain > description field, but not in the DEFINITION line, where it is > present in *.gbk files under > /genbank/genomes/Bacteria/Brucella_melitensis. > > LOCUS NC_003318 1177787 bp DNA circular BCT > 13-NOV-2001 DEFINITION Brucella melitensis chromosome II, complete > sequence. ACCESSION NC_003318 VERSION NC_003318.1 GI:17988344 > > compared to > > LOCUS AE008918 1177787 bp DNA circular BCT > 27-DEC-2001 DEFINITION Brucella melitensis strain 16M chromosome II, > complete sequence. ACCESSION AE008918 VERSION AE008918 > > This makes me worried about the data. Why is the release date of > NON-curated files (AE008918) newer than the release data of CURATED > data (NC_003318)? Is it expected case? Could someone explain me the > difference between them (i.e. CURATED vs. NONCURATED)? The curated record is initially a copy of the non-curated record with certain changes in documentation made in order to comply with the NCBI standard for reference genomes. One change which you have noticed is the difference in Definition line format. Curated genomic records are created in order to standardize annotation for genomes in the Entrez Genomes database while leaving editorial control for the parent GenBank records in the hands of the original submitters. Regardles of the date you see on the record, the curated version is derived from the non-curated one. In this case, it appears that the processing of the non-curated version lagged a little bit relative to that of the curated version. 
Normally, however, the non-curated version will have the earlier date. Date: Sun, 27 Jan 2002 00:16:55 -0500 (EST) From: David Wheeler Subject: Re: CONSULT: Brucella_melitensis on ftp site > Are the raw sequence data always same in non-curated and curated > flatfiles? > > Is the annotation of orf's/proteins different between them? > > Are there any new or withdrawn orf's or proteins in the curated > flatfiles compared to non-curated ones? > > My feeling is that no-one except original submitters can modify > submitted data, so you cannot modify non-curated files, i.e. cannot > modify them and increase the version number. > > Because of that, you've introduced curated versions, which are just > copies of original but public data so you are free to modify it. So > once again, are the differences between non-curated and curated > flatfiles only in structure of the file? I don't think so. Examples > would be Listeria genomes or the 2 Agrobacterium's, if I remember > right. Initially, there should be no or very few differences, however, as time goes by, differences in the annotation will materialize. There may also be differences in the sequence, if errors in the original sequence come to light, but these differences should be very rare. So, practically speaking, you will probably find few differences but, since the purpose of the Refseq is to curate, there may well be some differences. Date: Mon, 17 Dec 2001 11:57:06 -0500 (EST) From: Dawn Lipshultz Subject: Re: Buggy date in Staphylococcus aureus N315 >>>> Hi, I've found there has been released Staphylococcus aureus >>>> N315 on 01-JAN-1900, which is nonsense. I guss you had y2K bug. >>>> >>>> >>>> Please see >>>> >> ftp://ncbi.nlm.nih.gov/genbank/genomes/Bacteria/Staphylococcus_aureus_N315/BA000018.gbk >> >>>> >>>> Can you please tell me the real release date? 
>>>> >>>> Also, is newer the NC_xxxx for Staphylococcus aureus N315 under >>>> >>>> ftp://ncbi.nlm.nih.gov/genomes/Bacteria/Staphylococcus_aureus_N315/ >>>> or this BA000018 non-cured version? >>>> >>>> >>>> LOCUS BA000018 2814816 bp DNA circular BCT >>>> 01-JAN-1900 DEFINITION Staphylococcus aureus strain N315, >>>> complete genome. >>> AP003129-AP003138. They are all dated June 2001. >>> >>> The date for the record in the ftp file is April 2001. The record >>> in GenBank (NC_002745) is dated October 2001. This version is >>> apparently more updated than the one on the ftp site. Therefore, >>> you may want to download the sequence from GenBank rather than >>> the ftp site. >>> >>> Regards, Dawn S. Lipshultz >> I cannot find the record to which you refer in your message. When I >> did a search for accession number BA000018, I received results for >> accession numbers AP003129-AP003138. They are all dated June 2001. >> >> >> The date for the record in the ftp file is April 2001. The record >> in GenBank (NC_002745) is dated October 2001. This version is >> apparently more updated than the one on the ftp site. Therefore, >> you may want to download the sequence from GenBank rather than the >> ftp site. Regards, Dawn S. Lipshultz > > Hmm, but I do get: > http://www.ncbi.nlm.nih.gov:80/cgi-bin/Entrez/framik?db=genome&gi=179 > > > look at the "GenBank: NC_002745" text in left upper part of the > window, it points to that OLD ftp file. The "RefSeq: NC_002745" > points to the April 2001 version. So what is the right way to get the > October 2001 release? > > Where can I find the difference between NC_002745 from April compared > to NC_002745 from October? > > What do you mean with "you may want to download the sequence from > GenBank rather than the ftp site."? > > BOTH ftp directories at ftp://ncbi.nlm.nih.gov are outdated. 
I mean > the genomes/Bacteria/Staphylococcus_aureus_N315/NC_002745.* version > and also the > genbank/genomes/Bacteria/Staphylococcus_aureus_N315/BA000018.* > version. > > The web links from www.ncbi.nlm.nih.gov:80/cgi-bin/Entrez/ point > anyway to the ftp site. Do you want to say that the ftp version > aren't updated anymore? The genome was originally released into the database on 4/20/2001 as 10 pieces with secondary accession number BA000018. You can find these pieces in Entrez nucleotides by querying with BA000018. The Genomes group here will fix the date on the record that is available from Entrez genomes. Regards, Dawn Date: Fri, 16 Nov 2001 16:09:59 -0500 (EST) From: Susan Dombrowski Subject: Re: Agrobacterium tumefaciens C58 > Dear colleague, I've noticed that there're somehow updated on Oct 17 > the genomic flatfiles of Agrobacterium tumefaciens C58 at > ftp://ncbi.nlm.nih.gov/genbank/genomes/Bacteria/Agrobacterium_tumefaciens/. > However, for example the AE007869.gbs does NOT self-explain what has > been changed and also the VERSION number is not increased. Would you > please explain what's the change, when can I find such information > next time on web? > > I've used the published sequence from your ftp site on 2001-08-29 > with same ID and would like to know, what differs. > > LOCUS AE007869 2841581 bp DNA circular CON > 17-OCT-2001 DEFINITION Agrobacterium tumefaciens strain C58 circular > chromosome, complete sequence. ACCESSION AE007869 VERSION > AE007869 Dear Colleague, The version number of a sequence will *only* change if the content of the actual sequence has changed in any way since it was first made available. Although the date has changed, this date refers to the last time the actual record was manipulated by an NCBI staff member. Even if there is something simple, like adding a reference, changing a spelling mistake, etc., this will cause a change in the date field of the record. 
Thus, since the version has not changed, there are no differences to report. Best Regards, Susan Date: Wed, 26 Jun 2002 11:04:48 -0400 (EDT) From: Eric Sayers Subject: Re: Mesorhizobium_loti flatfiles >>>>> Hi, >>>>> I've found that you again silently changed flatfiles lying on your ftp >>>>> some time ago without changing the revision number. Please apologize me, >>>>> but this really causes troubles to other people working in this so called >>>>> bioinformatics. :( >>>>> >>>>> A week ago there was: >>>>> >>>>> LOCUS NC_002678 7036074 bp DNA circular BCT 10-SEP-2001 >>>>> DEFINITION Mesorhizobium loti, complete genome. >>>>> ACCESSION NC_002678 >>>>> VERSION NC_002678.1 GI:13470324 >>>>> >>>>> >>>>> and two other plasmid sequences. This yelds 7275 proteins. >>>>> >>>>> But, last autumn there was: >>>>> >>>>> LOCUS NC_002678 7036074 bp DNA circular BCT 28-MAR-2001 >>>>> DEFINITION Mesorhizobium loti, complete genome. >>>>> ACCESSION NC_002678 >>>>> VERSION NC_002678.1 GI:13470324 >>>>> >>>>> >>>>> That version had 7281 proteins in total. >>>>> I have simple questions: "Why was NOT changed the VERSION number?". >>>>> >>>>> Do I understand it wrong, that it should get updated whenever a single >>>>> character in the file contents is changed? >>> >>>> The version number of a sequence only changes if the sequence itself is >>>> modified. If anything else in the flat file is changed (ie spelling, authors, >>>> annotations, etc) the version will not change. However, the modification date in >>> >>> Sorry, do you under annotation also mean number of predicted genes, their >>> coordinates(position) etc? >>> >>>> the top line of the flat file will change for any of these modifications. (Note >>>> that the dates are different in the file you display: Mar 28, 2001 vs Sept 10, >>>> 2001.) I would track the modification date rather than or as well as the version >>>> number to catch all changes in the files. >>>> Regards, >>>> Eric W. Sayers, Ph.D. 
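The advice quoted above (track the modification date as well as the version number) can be sketched in a few lines of Perl. This is a minimal illustration, not an NCBI tool: the helpers flatfile_stamp and describe_change are hypothetical names, and the code assumes the modification date is the last field of the LOCUS line and that the identifier follows the VERSION keyword, as in the records quoted in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: extract the modification date (last field of the
# LOCUS line) and the VERSION identifier from the text of a flat file.
sub flatfile_stamp {
    my ($text) = @_;
    my %stamp = (date => '', version => '');
    for my $line (split /\n/, $text) {
        if ($line =~ /^LOCUS\s.*?(\d{2}-[A-Z]{3}-\d{4})\s*$/) {
            $stamp{date} = $1;
        }
        elsif ($line =~ /^VERSION\s+(\S+)/) {
            $stamp{version} = $1;
        }
        # Both fields live in the header, so stop at the sequence block
        last if $line =~ /^ORIGIN/;
    }
    return \%stamp;
}

# Hypothetical helper: classify the difference between two snapshots.
sub describe_change {
    my ($old, $new) = @_;
    return 'sequence changed'                   if $old->{version} ne $new->{version};
    return 'record changed, sequence unchanged' if $old->{date}    ne $new->{date};
    return 'no change detected';
}

# The two Mesorhizobium loti headers quoted in this thread
my $march = flatfile_stamp(<<'END');
LOCUS       NC_002678  7036074 bp    DNA   circular BCT  28-MAR-2001
DEFINITION  Mesorhizobium loti, complete genome.
ACCESSION   NC_002678
VERSION     NC_002678.1  GI:13470324
END

my $september = flatfile_stamp(<<'END');
LOCUS       NC_002678  7036074 bp    DNA   circular BCT  10-SEP-2001
DEFINITION  Mesorhizobium loti, complete genome.
ACCESSION   NC_002678
VERSION     NC_002678.1  GI:13470324
END

print describe_change($march, $september), "\n";
# prints "record changed, sequence unchanged"
```

A date comparison only signals that something changed. Seeing what changed in the annotation still requires keeping the earlier file and comparing it with the new one, and two modifications made on the same day would go unnoticed.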
>>> >>> OK, but unless some of our programs have been buggy before or now (in >>> either of those cases have failed to extract genes from flatfiles), I do >>> not have an explanation for the differencies in amount of >>> predicted/annotated genes. >>> >>> I do not have anymore available the old flatfiles from Mar 28, but it >>> seems to me that these were newly introduced in the Sept. 10 version: >>> gi_15600768, gi_15600770, gi_15600769, gi_15600766, gi_15600767 >> >> Dear Colleague, >> Again, the only reason the version number will change is if the sequence itself >> changes. The number of annotated/predicted genes is merely an annotation on the >> sequence, and does not change the sequence itself. Therefore, the version will >> not change when the number of annotations changes. The modification date on the >> flat file will (and did) change, of course. >> >> Regards, >> Eric W. Sayers, Ph.D. > > Finally I've heard that from someone, thanks! > Now just tell me, how can I figure out what changed between those > different "date" releases? Is there a changelog available? > I consider annotations changes very important. We do not provide the details of flat file changes on our public websites, except for changes in the version number (ie actual sequence changes). In that particular case, all of the previous versions are linked to the current one. My advice to you if you want to chronicle non-sequence changes would be to check the flat files of interest periodically (by a script, for example) and look for changes in the modification dates. You could then simply compare the before and after flat files. Regards, Eric W. Sayers, Ph.D. > Hi, Miguel: > > id1_fetch can do it. Detailed instruction can be found at: > > http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=toolkit.section.ch_demo.id1_fetch.html > > Here is an example: > >> >id1_fetch -lt revisions -flat '12:74311105' -fmt fasta > GI Loaded DB Retrieval No. 
> -- ------ -- ------------- > 74311105 12/07/2007 NCBI 19766263 > 74311105 01/23/2007 NCBI 16325656 > 74311105 03/30/2006 NCBI 13131204 > 74311105 03/03/2006 NCBI 12915541 > 74311105 03/02/2006 NCBI 12885275 > 74311105 12/03/2005 NCBI 12259793 > 74311105 09/09/2005 NCBI 11257262 > 74311105 09/09/2005 NCBI 11242667 > > Wenwu Cui PhD From david at burt7259.freeserve.co.uk Sun Apr 13 10:32:31 2008 From: david at burt7259.freeserve.co.uk (David Burt) Date: Sun, 13 Apr 2008 15:32:31 +0100 Subject: [Bioperl-l] bioperl-db In-Reply-To: <3F77F49A-9C9E-4450-AE28-46F00CADBC8B@gmx.net> References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC> <000001c89ce2$5400a710$0202a8c0@STUDYPC> <3F77F49A-9C9E-4450-AE28-46F00CADBC8B@gmx.net> Message-ID: <000001c89d73$3b49eec0$0202a8c0@STUDYPC> Hi Hilmar Many thanks for info - tried a few things 1. First tried --safe flag perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser root --dbpass chicken --driver mysql --safe \ --namespace "InterPro" --format interprosax interpro.xml Still got same output as before ...deleting all relationships for InterPro ...parsing and loading InterPro Can't call method "name" on an undefined value at load_ontology.pl line 914 Only 35 interpro entries entered into database 2. I am using bioperl 1.5.2 3. 
I downloaded Release 17.0, 20 March 2008 of the interpro.xml file from ftp://ftp.ebi.ac.uk/pub/databases/interpro/ I did not send this file, sine it was ~10Mb gzipped Dave From david at burt7259.freeserve.co.uk Sun Apr 13 10:53:43 2008 From: david at burt7259.freeserve.co.uk (David Burt) Date: Sun, 13 Apr 2008 15:53:43 +0100 Subject: [Bioperl-l] bioperl-db In-Reply-To: References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC> <000001c89ce5$a5df2e50$0202a8c0@STUDYPC> Message-ID: <000001c89d76$319be060$0202a8c0@STUDYPC> Hilmar Also updated copy of bioperl - see output below root at STUDY_PC /cygdrive/c/Bioinformatics/Ensembl/src $ perl -MBio::Perl -le 'print Bio::Perl->VERSION;' 1.005002101 root at STUDY_PC /cygdrive/c/Bioinformatics/Ensembl/src $ cvs -d :pserver:cvs at cvs.bioperl.org:/home/repository/bioperl login Logging in to :pserver:cvs at cvs.bioperl.org:2401/home/repository/bioperl CVS password: root at STUDY_PC /cygdrive/c/Bioinformatics/Ensembl/src $ cd bioperl-live root at STUDY_PC /cygdrive/c/Bioinformatics/Ensembl/src/bioperl-live $ cvs -q update -d -P -r bioperl-release-1-5-2 P Build.PL P ModuleBuildBioperl.pm P Bio/Root/Version.pm cvs update: warning: t/data/taxdump/names.dmp was lost U t/data/taxdump/names.dmp cvs update: warning: t/data/taxdump/nodes.dmp was lost U t/data/taxdump/nodes.dmp root at STUDY_PC /cygdrive/c/Bioinformatics/Ensembl/src/bioperl-live $ perl -MBio::Perl -le 'print Bio::Perl->VERSION;' 1.0050021 Why is the VERSION 1.0050021 rather than 1.5.2 ? Dave From heikki at sanbi.ac.za Wed Apr 16 07:36:16 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Wed, 16 Apr 2008 13:36:16 +0200 Subject: [Bioperl-l] bioperl-microarray: status? 
In-Reply-To: References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> Message-ID: <200804161336.16879.heikki@sanbi.ac.za> FYI, Christoper Jones has just published [http://bioinformatics.oxfordjournals.org/cgi/content/short/24/8/1102 an article in Bioinformatics] about his [http://search.cpan.org/perldoc?Microarray Microarray perl module] in CPAN. (The text added into BioPerl wiki.) -Heikki On Friday 26 January 2007 16:05:01 Chris Fields wrote: > Don't know if it's worth it, but could the microarray package be > modified so that it deals with data generated from or interacts > directly with Bioconductor (i.e. maybe including some specialized > bioperl-run set of classes to run Bioconductor tasks, return > lightweight bioperl microarray classes)? Allen pointed out in a > previous post that Bioconductor is the best pick for certain tasks, > while Perl excels at others: > > http://article.gmane.org/gmane.comp.lang.perl.bio.general/13993 > > Might be nice if we could merge both strengths together in some way. > > chris > > On Jan 26, 2007, at 7:26 AM, Jay Hannah wrote: > > On Jan 25, 2007, at 2:30 AM, Allen Day wrote: > >> Eh, there is some discussion activity on the list, but not much. You > >> are really better off moving to Bioconductor. > > > > Ok, thanks. I added that to the wiki page: > > > > http://www.bioperl.org/wiki/Microarray_package > > > > j > > seqlab.net > > http://www.bioperl.org/wiki/User:Jhannah > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________
From pan.mueller at yahoo.de Wed Apr 16 08:34:51 2008 From: pan.mueller at yahoo.de (=?iso-8859-1?Q?Peter_M=FCller?=) Date: Wed, 16 Apr 2008 12:34:51 +0000 (GMT) Subject: [Bioperl-l] load_seqdatabase.pl --pipeline Message-ID: <297809.47580.qm@web28203.mail.ukl.yahoo.com> Dear list, a want to add gene symbols to unigene-cluster which were in a biosql database and lacks this information. So one way is to make a post-update script: my $adp = $db->get_object_adaptor('Bio::ClusterI'); my $pseq = $adp->find_by_primary_key(n); $adp->remove($pseq); $pseq->gene('symbol'); $adp->store($pseq); $adp->commit(); O.k., this works (I ask me why to remove the cluster first - bug or feature...?)
Second way - perhaps: using the --pipeline option, but it looks usable only for seq objects (Bio::Factory::SequenceProcessorI), right? regards pan From cjfields at uiuc.edu Wed Apr 16 11:00:51 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 16 Apr 2008 10:00:51 -0500 Subject: [Bioperl-l] bioperl-microarray: status? In-Reply-To: <200804161336.16879.heikki@sanbi.ac.za> References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <200804161336.16879.heikki@sanbi.ac.za> Message-ID: <479BD5A4-9C9A-4733-889D-65942F24A7F3@uiuc.edu> That would be worth looking into at some point, if anyone's interested (though it may be best to build a 'bridging' module). Wonder if it uses BioConductor and, if not, how performance is vs BioConductor? chris On Apr 16, 2008, at 6:36 AM, Heikki Lehvaslaiho wrote: > FYI, > > Christoper Jones has just published > [http://bioinformatics.oxfordjournals.org/cgi/content/short/ > 24/8/1102 an > article in Bioinformatics] about his > [http://search.cpan.org/perldoc?Microarray Microarray perl module] > in CPAN. > > (The text added into BioPerl wiki.) > > -Heikki > > > On Friday 26 January 2007 16:05:01 Chris Fields wrote: >> Don't know if it's worth it, but could the microarray package be >> modified so that it deals with data generated from or interacts >> directly with Bioconductor (i.e. maybe including some specialized >> bioperl-run set of classes to run Bioconductor tasks, return >> lightweight bioperl microarray classes)? Allen pointed out in a >> previous post that Bioconductor is the best pick for certain tasks, >> while Perl excels at others: >> >> http://article.gmane.org/gmane.comp.lang.perl.bio.general/13993 >> >> Might be nice if we could merge both strengths together in some way.
>> >> chris >> >> On Jan 26, 2007, at 7:26 AM, Jay Hannah wrote: >>> On Jan 25, 2007, at 2:30 AM, Allen Day wrote: >>>> Eh, there is some discussion activity on the list, but not much. >>>> You >>>> are really better off moving to Bioconductor. >>> >>> Ok, thanks. I added that to the wiki page: >>> >>> http://www.bioperl.org/wiki/Microarray_package >>> >>> j >>> seqlab.net >>> http://www.bioperl.org/wiki/User:Jhannah >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From j-keller2 at md.northwestern.edu Wed Apr 16 12:12:27 2008 From: j-keller2 at md.northwestern.edu (Jacob Keller) Date: Wed, 16 Apr 2008 11:12:27 -0500 Subject: [Bioperl-l] Finding seqs of given domain architecture In-Reply-To: <200804161336.16879.heikki@sanbi.ac.za> References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <200804161336.16879.heikki@sanbi.ac.za> Message-ID: Hello All, I am new to this list, so am not totally sure this is the right forum, so please forgive if this is not the right place to asl the following question: I am seeking to get all sequences that have a given domain architecture, or at least that contain two given domains. I have thought of a few ways to do this. 1. Blast/Psi-blast for each domain, then compare the results for common sequences between the two lists, and fetch those. I would need to write a (simple) script to do this, but would prefer not to re-invent the wheel. 2. Search with a paradigm sequence of desired architecture/domain composition, somehow tweaking the psiblast parameters to find only matches over the whole search sequence, thereby finding both desired domains. I am not sure how to tweak blast to do this, though. 3. Pfam has this capability, i.e. to show all domains with a given architecture, but it is difficult to get at the actual sequences or even a list of accession numbers. Does anybody have any suggestions as to how optimally to get these seq's? Thanks for your consideration, Jacob ******************************************* Jacob Pearson Keller Northwestern University Medical Scientist Training Program Dallos Laboratory F. 
Searle 1-240 2240 Campus Drive Evanston IL 60208 lab: 847.491.2438 cel: 773.608.9185 email: j-keller2 at northwestern.edu ******************************************* ----- Original Message ----- From: "Heikki Lehvaslaiho" To: Cc: ; "Chris Fields" ; "Jay Hannah" ; Sent: Wednesday, April 16, 2008 6:36 AM Subject: Re: [Bioperl-l] bioperl-microarray: status? > FYI, > > Christoper Jones has just published > [http://bioinformatics.oxfordjournals.org/cgi/content/short/24/8/1102 an > article in Bioinformatics] about his > [http://search.cpan.org/perldoc?Microarray Microarray perl module] in > CPAN. > > (The text added into BioPerl wiki.) > > -Heikki > > > On Friday 26 January 2007 16:05:01 Chris Fields wrote: >> Don't know if it's worth it, but could the microarray package be >> modified so that it deals with data generated from or interacts >> directly with Bioconductor (i.e. maybe including some specialized >> bioperl-run set of classes to run Bioconductor tasks, return >> lightweight bioperl microarray classes)? Allen pointed out in a >> previous post that Bioconductor is the best pick for certain tasks, >> while Perl excels at others: >> >> http://article.gmane.org/gmane.comp.lang.perl.bio.general/13993 >> >> Might be nice if we could merge both strengths together in some way. >> >> chris >> >> On Jan 26, 2007, at 7:26 AM, Jay Hannah wrote: >> > On Jan 25, 2007, at 2:30 AM, Allen Day wrote: >> >> Eh, there is some discussion activity on the list, but not much. You >> >> are really better off moving to Bioconductor. >> > >> > Ok, thanks. I added that to the wiki page: >> > >> > http://www.bioperl.org/wiki/Microarray_package >> > >> > j >> > seqlab.net >> > http://www.bioperl.org/wiki/User:Jhannah >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. 
Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l >
From frederic.romagne at gmail.com Wed Apr 16 13:25:18 2008 From: frederic.romagne at gmail.com (=?ISO-8859-1?Q?Fr=E9d=E9ric_Romagn=E9?=) Date: Wed, 16 Apr 2008 12:25:18 -0500 Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix Message-ID: <1208366718.19084.15.camel@kiss-laptop> Hello, i made a program which use Bio::Index::GenBank and i tested it under unix, that worked well. But i have to launch it under windows and it seems not to work on. Here is the problem : my $dbobj = Bio::Index::Abstract->new("Data/$db"); my $seq = $dbobj->get_Seq_by_acc($id); print $seq->display_id."\n"; did not print the same number than $id !!!
So i don't work on the sequence expected... I use the SVN sources on unix and the Perl package manager for windows... Thanks. From cjfields at uiuc.edu Wed Apr 16 13:52:59 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 16 Apr 2008 12:52:59 -0500 Subject: [Bioperl-l] Finding seqs of given domain architecture In-Reply-To: References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <200804161336.16879.heikki@sanbi.ac.za> Message-ID: You can try CDART: http://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi?cmd=rps There are probably other tools out there as well. If you want to roll your own, you can use bioperl wrappers for all of these (Bio::Tools::Run::StandAloneBlast is in bioperl-live, Bio::Tools::Run::Hmmer in bioperl-run), tweaking the parameters as you see fit, and either parse while running them or store the file for parsing later using Bio::SearchIO. Personally, I wouldn't go with (2) unless you are absolutely sure the domains are found only once per sequence, are spatially conserved, and don't overlap. For instance, with many proteins you could have a domain structure like dom1-dom2, dom2-dom1, dom1-dom1-dom2, etc. If you just want accessions from Pfam's Stockholm format (which are UniProt, I believe) you can get at accessions using Bio::AlignIO::stockholm (using perl 5.10):

    use strict;
    use warnings;
    use feature 'say';
    use Bio::AlignIO;

    # Stockholm alignment file is passed as the first argument
    my $file = shift || die "Must pass file as argument\n";
    my $in = Bio::AlignIO->new(-format => 'stockholm', -file => $file);

    # Print one comma-separated line of accessions per alignment
    while (my $aln = $in->next_aln) {
        my @accs;
        for my $seq ($aln->each_seq) {
            push @accs, $seq->accession_number;
        }
        say join(',', @accs);
    }

chris On Apr 16, 2008, at 11:12 AM, Jacob Keller wrote: > Hello All, > > I am new to this list, so am not totally sure this is the right > forum, so please forgive if this is not the right place to asl the > following question: I am seeking to get all sequences that have a > given domain architecture, or at least that contain two given > domains.
I have thought of a few ways to do this. > > 1. Blast/Psi-blast for each domain, then compare the results for > common sequences between the two lists, and fetch those. I would > need to write a (simple) script to do this, but would prefer not to > re-invent the wheel. > > 2. Search with a paradigm sequence of desired architecture/domain > composition, somehow tweaking the psiblast parameters to find only > matches over the whole search sequence, thereby finding both desired > domains. I am not sure how to tweak blast to do this, though. > > 3. Pfam has this capability, i.e. to show all domains with a given > architecture, but it is difficult to get at the actual sequences or > even a list of accession numbers. > > Does anybody have any suggestions as to how optimally to get these > seq's? > > Thanks for your consideration, > > Jacob > > ******************************************* > Jacob Pearson Keller > Northwestern University > Medical Scientist Training Program > Dallos Laboratory > F. Searle 1-240 > 2240 Campus Drive > Evanston IL 60208 > lab: 847.491.2438 > cel: 773.608.9185 > email: j-keller2 at northwestern.edu > ******************************************* > > ----- Original Message ----- From: "Heikki Lehvaslaiho" > > To: > Cc: ; "Chris Fields" ; "Jay > Hannah" ; > Sent: Wednesday, April 16, 2008 6:36 AM > Subject: Re: [Bioperl-l] bioperl-microarray: status? > > >> FYI, >> >> Christoper Jones has just published >> [http://bioinformatics.oxfordjournals.org/cgi/content/short/ >> 24/8/1102 an >> article in Bioinformatics] about his >> [http://search.cpan.org/perldoc?Microarray Microarray perl module] >> in CPAN. >> >> (The text added into BioPerl wiki.) >> >> -Heikki >> >> >> On Friday 26 January 2007 16:05:01 Chris Fields wrote: >>> Don't know if it's worth it, but could the microarray package be >>> modified so that it deals with data generated from or interacts >>> directly with Bioconductor (i.e. 
maybe including some specialized >>> bioperl-run set of classes to run Bioconductor tasks, return >>> lightweight bioperl microarray classes)? Allen pointed out in a >>> previous post that Bioconductor is the best pick for certain tasks, >>> while Perl excels at others: >>> >>> http://article.gmane.org/gmane.comp.lang.perl.bio.general/13993 >>> >>> Might be nice if we could merge both strengths together in some way. >>> >>> chris >>> >>> On Jan 26, 2007, at 7:26 AM, Jay Hannah wrote: >>> > On Jan 25, 2007, at 2:30 AM, Allen Day wrote: >>> >> Eh, there is some discussion activity on the list, but not >>> much. You >>> >> are really better off moving to Bioconductor. >>> > >>> > Ok, thanks. I added that to the wiki page: >>> > >>> > http://www.bioperl.org/wiki/Microarray_package >>> > >>> > j >>> > seqlab.net >>> > http://www.bioperl.org/wiki/User:Jhannah >>> > >>> > _______________________________________________ >>> > Bioperl-l mailing list >>> > Bioperl-l at lists.open-bio.org >>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. 
Robert Switzer >>> Dept of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> -- >> ______ _/ _/ >> _____________________________________________________ >> _/ _/ >> _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za >> _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho >> _/ _/ _/ SANBI, South African National Bioinformatics Institute >> _/ _/ _/ University of Western Cape, South Africa >> _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 >> ___ _/_/_/_/_/ >> ________________________________________________________ >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From David.Messina at sbc.su.se Wed Apr 16 14:23:27 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 16 Apr 2008 20:23:27 +0200 Subject: [Bioperl-l] Finding seqs of given domain architecture In-Reply-To: References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <200804161336.16879.heikki@sanbi.ac.za> Message-ID: <628aabb70804161123s453bd96bqd2213b938dfdb3a2@mail.gmail.com> Hey Jacob, This forum is mostly geared toward the BioPerl software package rather than general bioinformatics assistance. That being said, I would recommend using Pfam's Sequence Search to determine the domain content of your sequences and then simply looking at those which have the same two domains of interest. 
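As a rough illustration of that filtering step, here is a minimal Perl sketch. It assumes the domain assignments have already been parsed into a hash of sequence ID to ordered domain list; the IDs, the domain names and the helper `with_all_domains` are made up for illustration and are not part of BioPerl or Pfam.

```perl
use strict;
use warnings;

# Hypothetical domain assignments per sequence, in N- to C-terminal order,
# e.g. as parsed from a Pfam sequence-search result. IDs and domain names
# are invented for this example.
my %domains_of = (
    seqA => [ 'dom1', 'dom2' ],
    seqB => [ 'dom2', 'dom1' ],
    seqC => [ 'dom1' ],
    seqD => [ 'dom1', 'dom1', 'dom2' ],
);

# Return the IDs (sorted) of sequences that contain every wanted domain,
# regardless of order or copy number.
sub with_all_domains {
    my ($assignments, @wanted) = @_;
    my @hits;
    for my $id (sort keys %$assignments) {
        my %seen = map { $_ => 1 } @{ $assignments->{$id} };
        # keep the sequence only if no wanted domain is missing
        push @hits, $id unless grep { !$seen{$_} } @wanted;
    }
    return @hits;
}

# Print each matching sequence with its domain order for manual inspection
for my $id (with_all_domains(\%domains_of, 'dom1', 'dom2')) {
    print "$id\t", join('-', @{ $domains_of{$id} }), "\n";
}
```

Printing the domain order next to each ID keeps architectures such as dom1-dom2 and dom2-dom1 distinguishable, so they can be separated afterwards if order matters.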
If there are more sequences matching this criterion than can be examined manually, you could write up something (potentially using BioPerl) to then look at the relative order and number of those domains in your sequences. However, if these sequences have UniProt IDs, you can start with the domains and Pfam will hand you a list of all the UniProt seqs having those domains. On the Pfam website's main page, click on "Help" (right side of menu at the top of the page) and then "Tools and Services" (left side menu). Dave From Russell.Smithies at agresearch.co.nz Wed Apr 16 16:49:49 2008 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 17 Apr 2008 08:49:49 +1200 Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix In-Reply-To: <1208366718.19084.15.camel@kiss-laptop> References: <1208366718.19084.15.camel@kiss-laptop> Message-ID: Did you check the format of your input file? i.e. DOS or UNIX line endings? > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open- > bio.org] On Behalf Of Fr?d?ric Romagn? > Sent: Thursday, 17 April 2008 5:25 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix > > Hello, > i made a program which use Bio::Index::GenBank and i tested it under > unix, that worked well. > > But i have to launch it under windows and it seems not to work on. > > Here is the problem : > > my $dbobj = Bio::Index::Abstract->new("Data/$db"); > ?my $seq = $dbobj->get_Seq_by_acc($id); > print $seq->display_id."\n"; > > did not print the same number than $id !!! So i don't work on the > sequence expected... > > I use the SVN sources on unix and the Perl package manager for > windows... > > Thanks. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From frederic.romagne at gmail.com Wed Apr 16 17:39:07 2008 From: frederic.romagne at gmail.com (=?ISO-8859-1?Q?Fr=E9d=E9ric_Romagn=E9?=) Date: Wed, 16 Apr 2008 16:39:07 -0500 Subject: [Bioperl-l] index::abstract on win and unix In-Reply-To: References: <1208366718.19084.15.camel@kiss-laptop> Message-ID: <1208381947.16620.6.camel@kiss-laptop> Well, if with input file you mean the database used, it's created with Bio::Index::GenBank from a ncbi FTP's genbank file. $id is an accession number read from a file but i chomp the line... I am trying to install the svn version of bioperl under windows to see if there is an improvement. On Thursday 17 April 2008 at 08:49 +1200, Smithies, Russell wrote: > Did you check the format of your input file? > i.e. DOS or UNIX line endings? > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open- > > bio.org] On Behalf Of Frédéric Romagné > > Sent: Thursday, 17 April 2008 5:25 a.m.
> > To: bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix > > > > Hello, > > i made a program which use Bio::Index::GenBank and i tested it under > > unix, that worked well. > > > > But i have to launch it under windows and it seems not to work on. > > > > Here is the problem : > > > > my $dbobj = Bio::Index::Abstract->new("Data/$db"); > > ?my $seq = $dbobj->get_Seq_by_acc($id); > > print $seq->display_id."\n"; > > > > did not print the same number than $id !!! So i don't work on the > > sequence expected... > > > > I use the SVN sources on unix and the Perl package manager for > > windows... > > > > Thanks. > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= From hubert.gaynor at yahoo.com Thu Apr 17 02:19:11 2008 From: hubert.gaynor at yahoo.com (Hubert Gaynor) Date: Wed, 16 Apr 2008 23:19:11 -0700 (PDT) Subject: [Bioperl-l] Can I use BLAST against a database like MySQL Message-ID: <657734.41592.qm@web46008.mail.sp1.yahoo.com> Hi, As far as I know, before using BLAST to do the alignment the first thing should be done is typing formatdb to construct a database. 
But I was wondering whether it is possible to construct a database with MySQL, which would probably give the BLAST search a higher speed and make database management much easier?

Thanks!

Hubert.

From sdavis2 at mail.nih.gov Thu Apr 17 06:36:32 2008
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 17 Apr 2008 06:36:32 -0400
Subject: [Bioperl-l] Can I use BLAST against a database like MySQL
In-Reply-To: <657734.41592.qm@web46008.mail.sp1.yahoo.com>
References: <657734.41592.qm@web46008.mail.sp1.yahoo.com>
Message-ID: <264855a00804170336o2a2bcff9xfcb05a33bac4c8dc@mail.gmail.com>

On Thu, Apr 17, 2008 at 2:19 AM, Hubert Gaynor wrote:
> Hi,
>
> As far as I know, before using BLAST to do the alignment, the first thing
> to do is to run formatdb to construct a database. But I was wondering
> whether it is possible to construct a database with MySQL, which would
> probably give the BLAST search a higher speed and make database
> management much easier?
>

formatdb is used to make a representation that can be used efficiently by blast. That representation already makes blast faster. MySQL can't be used for such things.

As for speeding up blast, if you have a multiprocessor machine, you can take advantage of it by increasing the number of processors blast uses. Also, while blast is a very versatile program, it is not the only alignment program available. Depending on your needs, you could look at other programs such as blat or gmap that can be 2-3 orders of magnitude faster than blast.
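For reference, the formatdb step and the multiprocessor option mentioned above might look roughly like this when driven from Perl. This is only a sketch: the file names and processor count are made up, it assumes the legacy NCBI tools (formatdb, blastall) are on the PATH, and by default it only prints the commands rather than running them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical file names and CPU count; adjust to your own data.
my $fasta = 'mydb.fasta';    # sequences to search against
my $query = 'query.fasta';   # query sequence(s)
my $cpus  = 4;               # processors a single search may use

# formatdb builds the index files that blastall searches.
# -p F marks the input as nucleotide (use -p T for protein).
my @format_cmd = ('formatdb', '-i', $fasta, '-p', 'F');

# blastall: -p program, -d database, -i query, -o output,
# -a number of processors to use.
my @blast_cmd = ('blastall', '-p', 'blastn', '-d', $fasta,
                 '-i', $query, '-o', 'hits.txt', '-a', $cpus);

# Dry run by default: print the commands. Pass --run to execute them.
my $run = @ARGV && $ARGV[0] eq '--run';
for my $cmd (\@format_cmd, \@blast_cmd) {
    print join(' ', @$cmd), "\n";
    next unless $run;
    system(@$cmd) == 0 or die "Command failed: @$cmd\n";
}
```

Run without arguments to see the two command lines; add --run once the file names match real data.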
Sean

From stefan.kirov at bms.com Thu Apr 17 09:40:29 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 17 Apr 2008 09:40:29 -0400
Subject: [Bioperl-l] bioperl-db woes
Message-ID: <4807534D.80105@bms.com>

I'm having problems passing all the tests for bioperl-db. There are 2 distinct errors. The first one:

Can't locate Bio/DB/BioSQL/RichSeqAdaptor.pm

***Which, by the way, is embedded deep in several layers of eval, so I am getting the actual error from the test:

***t/04swiss.........ok 3/52Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 78.

or

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Annotation of class Bio::Annotation::Collection not type-mapped. Internal error?
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/kirovs/bioperl-live/Bio/Root/Root.pm:357
STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695
STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store Bio/DB/Persistent/PersistentObject.pm:271
STACK: Bio::DB::BioSQL::SeqAdaptor::store_children Bio/DB/BioSQL/SeqAdaptor.pm:224
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK: Bio::DB::Persistent::PersistentObject::create Bio/DB/Persistent/PersistentObject.pm:244
STACK: t/04swiss.t:36
-----------------------------------------------------------

It turns out the adaptor is really not there???
My bioperl-db is from
dev.open-bio.org/home/svn-repositories/bioperl/bioperl-db/trunk
bioperl-db (revision 14661)
Is this module being deprecated (I am sure it is not), or is my download incomplete?

The other problem was:

DBD::Oracle::st execute failed: ORA-02292: integrity constraint (BIOSQL.FKTAX_ENT) violated - child record found (DBD ERROR: OCIStmtExecute) [for Statement "DELETE FROM taxon WHERE oid = ?" with ParamValues: :p1=9606] at /home/kirovs/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 320.
not ok 76
# Test 76 got: (t/02species.t at line 71)

I have not tried to debug this one....
Thanks!
Stefan

From stefan.kirov at bms.com Thu Apr 17 10:18:30 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 17 Apr 2008 10:18:30 -0400 (Eastern Daylight Time)
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To: <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu>
References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu>
Message-ID:

On Thu, 17 Apr 2008, Chris Fields wrote:

> The 'get_dbxrefs' problem looks related to recent changes I made when rolling
> back the significant feature/annotation changes introduced just prior to the
> 1.5 release, none of which were fully implemented. I can check that one out.
> Odd though; these passed for me, but I'm using MySQL not oracle.

get_dbxrefs is not the problem; I think the error message is misleading:

kirovs at horta:~/bioperl-db> grep get_dbxrefs /home/kirovs/bioperl-live/Bio/Ontology/Term.pm
get_dbxrefs() instead, which handles both strings and DBLink
"Use get_dbxrefs() instead");
$self->get_dbxrefs($context);
=head2 get_dbxrefs
Title : get_dbxrefs()
Usage : @ds = $term->get_dbxrefs();
sub get_dbxrefs {
} # get_dbxrefs
my @old = $self->get_dbxrefs($context);
sub each_dblink {shift->throw("use of each_dblink() is deprecated; use get_dbxrefs() instead")}

So it is there. In any case I debugged and tracked that down to the RichSeq adaptor module missing.
It is not in the distro I downloaded, so I think this is my problem. Why that is so is a different question... I looked at different repos (SVN, CVS, trunk, different tags) and I did not see RichSeq.pm. I am not sure what is going on. Perhaps Hilmar will be able to help when he is around.
Thanks for the help Chris....
Stefan

From cjfields at uiuc.edu Thu Apr 17 09:59:57 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 17 Apr 2008 08:59:57 -0500
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To: <4807534D.80105@bms.com>
References: <4807534D.80105@bms.com>
Message-ID: <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu>

The 'get_dbxrefs' problem looks related to recent changes I made when rolling back the significant feature/annotation changes introduced just prior to the 1.5 release, none of which were fully implemented. I can check that one out. Odd though; these passed for me, but I'm using MySQL, not Oracle.

You may want to make sure you are using bioperl-live and that there isn't an older bioperl installation getting into the mix.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From stefan.kirov at bms.com Thu Apr 17 10:52:32 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 17 Apr 2008 10:52:32 -0400 (Eastern Daylight Time)
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To: <9ECDEB39-95F3-4A94-9AF7-FFEBBDEFF0FA@gmx.net>
References: <4807534D.80105@bms.com> <9ECDEB39-95F3-4A94-9AF7-FFEBBDEFF0FA@gmx.net>
Message-ID:

That is correct, and I assumed I should not be concerned with this error.
Thanks
Stefan

On Thu, 17 Apr 2008, Hilmar Lapp wrote:

> On Apr 17, 2008, at 9:40 AM, Stefan Kirov wrote:
>> The other problem was:
>> DBD::Oracle::st execute failed: ORA-02292: integrity constraint
>> (BIOSQL.FKTAX_ENT) violated - child record found (DBD ERROR:
>> OCIStmtExecute) [for Statement "DELETE FROM taxon WHERE oid = ?" with
>> ParamValues: :p1=9606] at
>
> This sounds like you are running the tests against a non-empty database?
>
> -hilmar

From hlapp at gmx.net Thu Apr 17 10:47:58 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 17 Apr 2008 10:47:58 -0400
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To:
References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu>
Message-ID: <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>

On Apr 17, 2008, at 10:18 AM, Stefan Kirov wrote:
> In any case I debugged and tracked that down to the RichSeq adaptor
> module missing.

That almost can't be the problem. Every Bio::Seq::RichSeq is-a Bio::Seq and a SeqAdaptor is present.

I'm afraid it gets stuck somewhere else, and frankly I didn't see the RichSeqAdaptor failing to load in your stack trace:

> ------------- EXCEPTION: Bio::Root::Exception -------------
>
> MSG: Annotation of class Bio::Annotation::Collection not type-mapped. Internal error?
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /home/kirovs/bioperl-live/Bio/Root/Root.pm:357
> STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695
> STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
> STACK: Bio::DB::Persistent::PersistentObject::store Bio/DB/Persistent/PersistentObject.pm:271
> STACK: Bio::DB::BioSQL::SeqAdaptor::store_children Bio/DB/BioSQL/SeqAdaptor.pm:224
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
> STACK: Bio::DB::Persistent::PersistentObject::create Bio/DB/Persistent/PersistentObject.pm:244
> STACK: t/04swiss.t:36
> -----------------------------------------------------------

What that tells me is that when bioperl-db tries to store the annotation bundle of the (SwissProt) sequence, one of the annotations that it encounters is of type Bio::Annotation::Collection. At present bioperl-db doesn't know what to do with it; i.e., bioperl-db can't yet handle hierarchical annotation collections (collections within collections).

I believe this is due to recent changes in how the GN line is parsed in BioPerl - Chris, does this ring the right bell? I thought, though, that you had built in a method that would allow flattening out?

It's worth noting that BioSQL itself can't really represent nested annotation collections other than by using ontology terms and their hierarchy, which at present I think isn't really appropriate, but I have to think through the issue more. In other words, in BioSQL you can't directly tie together a bunch of qualifier/value pairs into a "bag" and then nest this bag within another.
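To make the "bag within a bag" point concrete, here is a small sketch of turning a nested structure into flat tag/value pairs. The gene-name data and the flatten_tags helper are invented for illustration; this is not how bioperl-db or BioSQL does it, just plain Perl showing the idea.

```perl
use strict;
use warnings;

# Hypothetical nested annotation, loosely modelled on a gene-name entry.
my $nested = {
    gene => {
        Name     => 'abcA',
        Synonyms => [ 'abc1', 'abc2' ],
    },
};

# Walk the structure recursively and return a list of [tag_path, value]
# pairs. Hash keys extend the dotted path; array elements share the path
# of the array that holds them.
sub flatten_tags {
    my ($node, $prefix) = @_;
    my @pairs;
    if (ref $node eq 'HASH') {
        for my $key (sort keys %$node) {
            my $path = defined $prefix ? "$prefix.$key" : $key;
            push @pairs, flatten_tags($node->{$key}, $path);
        }
    }
    elsif (ref $node eq 'ARRAY') {
        push @pairs, flatten_tags($_, $prefix) for @$node;
    }
    else {
        push @pairs, [ $prefix, $node ];
    }
    return @pairs;
}

my @flat = flatten_tags($nested);
printf "%s = %s\n", @$_ for @flat;
```

Each resulting pair is a single qualifier and value, which is the shape a flat schema can hold; the nesting survives only in the dotted tag name.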
The way to make this work with the current schema is to flatten out the nesting.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Thu Apr 17 10:48:52 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 17 Apr 2008 10:48:52 -0400
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To: <4807534D.80105@bms.com>
References: <4807534D.80105@bms.com>
Message-ID: <9ECDEB39-95F3-4A94-9AF7-FFEBBDEFF0FA@gmx.net>

On Apr 17, 2008, at 9:40 AM, Stefan Kirov wrote:
> The other problem was:
> DBD::Oracle::st execute failed: ORA-02292: integrity constraint
> (BIOSQL.FKTAX_ENT) violated - child record found (DBD ERROR:
> OCIStmtExecute) [for Statement "DELETE FROM taxon WHERE oid = ?" with
> ParamValues: :p1=9606] at

This sounds like you are running the tests against a non-empty database?

-hilmar

From stefan.kirov at bms.com Thu Apr 17 11:28:42 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 17 Apr 2008 11:28:42 -0400 (Eastern Daylight Time)
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To: <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
Message-ID:

Hilmar,
I think I saw what happens with this adaptor. In Bio::DB::BioSQL::DBAdaptor::_load_object_adaptor (called from create_persistent) there is a request to load this module:
Bio/DB/BioSQL/RichSeqAdaptor.pm
There is no such module... This always fails, but since it is evaled, no actual error is reported. Perhaps this is a leftover...? This got me fooled...
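The effect described here, a require inside eval failing without anyone noticing, is easy to reproduce with core Perl. The sketch below is not the bioperl-db code; the class name is deliberately one that does not exist, and try_load is a made-up helper.

```perl
use strict;
use warnings;

# Hypothetical class name; no such module is installed, which is the point.
my $class = 'My::Hypothetical::RichSeqAdaptor';

# Try to load a class by name. Returns (1, '') on success and
# (0, error message) on failure; the error is whatever eval left in $@.
sub try_load {
    my ($name) = @_;
    (my $file = "$name.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    my $ok = eval { require $file; 1 };
    return ($ok ? 1 : 0, $ok ? '' : $@);
}

# Silent style: only the success flag is looked at, so the
# "Can't locate ..." message never reaches the user.
my ($ok) = try_load($class);
print "silent load: ", ($ok ? "loaded" : "failed, nothing reported"), "\n";

# Reporting style: same call, but the captured error is printed.
my ($ok2, $err) = try_load($class);
print "reported load: $err" unless $ok2;
```

When chasing a failure like this, temporarily printing $@ after the eval (or running the require outside of it) shows whether a load error is real or just an expected fallback.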
I guess Chris could be right: Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key is being passed a Bio::Annotation::Collection as the value of $obj->obj(). Or is it recursing too far? Anyway, I am just guessing here; I do not know the architecture of bioperl-db...
Thanks again for the help...
Stefan

From cjfields at uiuc.edu Thu Apr 17 12:26:41 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 17 Apr 2008 11:26:41 -0500
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To: <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
Message-ID:

On Apr 17, 2008, at 9:47 AM, Hilmar Lapp wrote:
> What that tells me is that when bioperl-db tries to store the
> annotation bundle of the (SwissProt) sequence, one of the
> annotations that it encounters is of type
> Bio::Annotation::Collection. At present bioperl-db doesn't know what
> to do with it; i.e., bioperl-db can't yet handle hierarchical
> annotation collections (collections within collections).
>
> I believe this is due to recent changes in how the GN line is parsed
> in BioPerl - Chris does this ring the right bell? I thought though
> you had built in a method that would allow flattening out

This appears to be using an older bioperl-live checkout, one where Heikki changed GN parsing to use a nested Annotation::Collection. I reverted that in a later commit to svn specifically b/c of bioperl-db problems. bioperl-live's swiss.pm now uses a new subclass of Bio::Annotation::SimpleValue (Bio::Annotation::TagTree) that represents nested values via Data::Stag's itext output (we can change that to alternatives if needed).

Here are the last few relevant revisions in bioperl-live's main trunk (mine is the latest):

------------------------------------------------------------------------
r14562 | cjfields | 2008-02-28 08:30:05 -0600 (Thu, 28 Feb 2008) | 1 line

bug 1825: updating swiss.pm/tests to try out TagTree (passes all tests). Need to update Handler.t and related modules still...
------------------------------------------------------------------------
r14541 | heikki | 2008-02-25 00:10:48 -0600 (Mon, 25 Feb 2008) | 1 line

documentation for the GN line parsing and management
------------------------------------------------------------------------
r14538 | heikki | 2008-02-23 08:48:23 -0600 (Sat, 23 Feb 2008) | 1 line

GN (Gene Name) line parsing rewrite. Breaks backward compatibility. Can now deal with >1 gene per entry and four categories of names per gene. Parses old style syntax (...OR ... OR ... ) into one gene name and synonyms for each gene. Docs to follow.

....

I just updated all code from dev and reran bioperl-db tests w/o problems. Maybe someone else could do the same to see what happens?

> It's worth noting that BioSQL itself can't really represent nested
> annotation collections other than by using ontology terms and their
> hierarchy, which at present I think isn't really appropriate, but I
> have to think through the issue more. In other words, in BioSQL you
> can't directly tie together a bunch of qualifier value pairs into a
> "bag" and then nest this bag within another. The way to make this
> work with the current schema is to flatten out the nesting.

Might be worth looking into for a future BioSQL release, but we have a decent workaround in place for now, as long as it works cross-platform and cross-RDB.

chris

From stefan.kirov at bms.com Thu Apr 17 12:40:14 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 17 Apr 2008 12:40:14 -0400 (Eastern Daylight Time)
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To:
References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
Message-ID:

Hilmar,
sorry, I missed the part after the stack trace... In any case this is still a problem for me after I updated bioperl-live.
I see this with a number of other tests:

t/04swiss.........ok 3/52Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 78.
t/04swiss.........dubious
 Test returned status 2 (wstat 512, 0x200)
DIED. FAILED tests 6-52
 Failed 47/52 tests, 9.62% okay
t/05seqfeature....ok 4/48Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 72.
t/05seqfeature....FAILED tests 9-48
 Failed 40/48 tests, 16.67% okay
t/06comment.......ok
t/07dblink........ok
t/08genbank.......ok
t/09fuzzy2........ok
t/10ensembl.......ok 1/15Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 1420.
t/10ensembl.......dubious
 Test returned status 2 (wstat 512, 0x200)
DIED.
FAILED tests 3-15
        Failed 13/15 tests, 13.33% okay
t/11locuslink.....ok 4/110Can't locate object method "get_dbxrefs" via package "Bio::Annotation::OntologyTerm" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 1.
t/11locuslink.....dubious
        Test returned status 2 (wstat 512, 0x200)
DIED. FAILED tests 5-110
        Failed 106/110 tests, 3.64% okay
t/12ontology......ok 1/738Can't locate object method "get_dbxrefs" via package "Bio::Ontology::GOterm" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 98.
t/12ontology......dubious
        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 5-738
        Failed 734/738 tests, 0.54% okay
t/13remove........ok 2/59Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 145.
t/13remove........FAILED tests 11-59
        Failed 49/59 tests, 16.95% okay
t/14query.........ok
t/15cluster.......ok 3/160Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 1.
t/15cluster.......dubious
        Test returned status 2 (wstat 512, 0x200)
DIED. FAILED tests 6-160
        Failed 155/160 tests, 3.12% okay
t/16obda..........ok

On Thu, 17 Apr 2008, Chris Fields wrote:

>
> On Apr 17, 2008, at 9:47 AM, Hilmar Lapp wrote:
>
>>
>> On Apr 17, 2008, at 10:18 AM, Stefan Kirov wrote:
>>> In any case I debugged and tracked that down to the RichSeq adaptor module
>>> missing.
>>
>>
>> That almost can't be the problem. Every Bio::Seq::RichSeq is-a Bio::Seq and
>> a SeqAdaptor is present.
>>
>> I'm afraid it gets stuck somewhere else and frankly I didn't see the
>> RichSeqAdaptor failing to load in your stack trace:
>>
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>
>>> MSG: Annotation of class Bio::Annotation::Collection not
>>> type-mapped. Internal error?
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw /home/kirovs/bioperl-live/Bio/Root/Root.pm:357
>>> STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695
>>> STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204
>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
>>> STACK: Bio::DB::Persistent::PersistentObject::store Bio/DB/Persistent/PersistentObject.pm:271
>>> STACK: Bio::DB::BioSQL::SeqAdaptor::store_children Bio/DB/BioSQL/SeqAdaptor.pm:224
>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
>>> STACK: Bio::DB::Persistent::PersistentObject::create Bio/DB/Persistent/PersistentObject.pm:244
>>> STACK: t/04swiss.t:36
>>> -----------------------------------------------------------
>>
>> What that tells me is that when bioperl-db tries to store the annotation
>> bundle of the (SwissProt) sequence, one of the annotations that it
>> encounters is of type Bio::Annotation::Collection. At present bioperl-db
>> doesn't know what to do with it; i.e., bioperl-db can't yet handle
>> hierarchical annotation collections (collections within collections).
>>
>> I believe this is due to recent changes in how the GN line is parsed in
>> BioPerl - Chris does this ring the right bell? I thought though you had
>> built in a method would allow flattening out
>
> This appears to be using an older bioperl-live checkout, one where Heikki
> changed GN parsing to use a nested Annotation::Collection. I reverted that
> back in a later commit to svn specifically b/c of bioperl-db problems.
> bioperl-live's swiss.pm now uses a new subclass of
> Bio::Annotation::SimpleValue (Bio::Annotation::TagTree) that represents
> nested values via Data::Stag's itext output (we can change that to
> alternatives if needed).
>
> Here are the last few relevant revisions in bioperl-live's main trunk (mine
> is the latest):
>
> ------------------------------------------------------------------------
> r14562 | cjfields | 2008-02-28 08:30:05 -0600 (Thu, 28 Feb 2008) | 1 line
>
> bug 1825: updating swiss.pm/tests to try out TagTree (passes all tests).
> Need to update Handler.t and related modules still...
> ------------------------------------------------------------------------
> r14541 | heikki | 2008-02-25 00:10:48 -0600 (Mon, 25 Feb 2008) | 1 line
>
> documentation for the GN line parsing and management
> ------------------------------------------------------------------------
> r14538 | heikki | 2008-02-23 08:48:23 -0600 (Sat, 23 Feb 2008) | 1 line
>
> GN (Gene Name) line parsing rewrite. Breaks backward compatibility. Can now
> deal with >1 gene per entry and four categories of names per gene. Parses old
> style syntax (...OR ... OR ... ) into one gene name and synonyms for each
> gene. Docs to follow.
>
> ....
>
> I just updated all code from dev and reran bioperl-db tests w/o problems.
> Maybe someone else could do the same to see what happens?
>
>> It's worth noting that BioSQL itself can't really represent nested
>> annotation collections other than by using ontology terms and their
>> hierarchy, which at present I think isn't really appropriate, but I have to
>> think through the issue more. In other words, in BioSQL you can't directly
>> tie together a bunch of qualifier value pairs into a "bag" and then nest
>> this bag within another. The way to make this work with the current schema
>> is to flatten out the nesting.
>>
>> -hilmar
>> --
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>> ===========================================================
>
> Might be worth looking into for a future BioSQL release, but we have a decent
> workaround in place for now, as long as it works cross-platform and
> cross-RDB.
>
> chris
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu Thu Apr 17 13:06:39 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 17 Apr 2008 12:06:39 -0500
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To:
References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
Message-ID:

Stefan,

'get_dbxrefs' was introduced in bioperl-live a while back during the
feature/annotation rollback detailed here:

http://www.bioperl.org/wiki/Feature_Annotation_rollback

I still think this is an interfering old bioperl (and maybe bioperl-db)
installation causing the problems; I had similar issues at one point and
had to find and remove the old installation. It might be worth (1)
checking 'perldoc -l Bio::Root::Root', which will give the location of
the Bio::Root::Root in lib path being used, and (2) using
'./Build install uninst=1' to remove any old bioperl/bioperl-db
installations.

chris

On Apr 17, 2008, at 11:40 AM, Stefan Kirov wrote:

> Hilmar,
> sorry, I missed the part after the stack trace... In any case this
> is still problem for me after I updated bioperl-live.
> I see this with a number of other tests:
> [...]

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From stefan.kirov at bms.com Thu Apr 17 13:52:22 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 17 Apr 2008 13:52:22 -0400
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To:
References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
Message-ID: <48078E56.9000404@bms.com>

Chris Fields wrote:
> Stefan,
>
> 'get_dbxrefs' was introduced in bioperl-live a while back during the
> feature/annotation rollback detailed here:
>
> http://www.bioperl.org/wiki/Feature_Annotation_rollback
>
Chris was right!
> I still think this is an interfering old bioperl (and maybe
> bioperl-db) installation causing the problems; I had similar issues at
> one point and had to find and remove the old installation. It might
> be worth (1) checking 'perldoc -l Bio::Root::Root',
This is the first thing I did and it seemed fine from the command line.
So I checked out a new copy (vs. updating) and set PERL5LIB to the minimum
which is necessary (Build changes INC), which is
/home/kirovs/bioperl-db/bplive:/stf/sysdev/perl/newlib/perl/lib/5.8/ia64-linux-multi/
(/home/kirovs/bioperl-db/bplive being the fresh copy and the other
having Module::Build, etc., but definitely no bioperl). This fixed the
problem. I still do not see where the old module came from, but that was
a really good guess.
Thanks
Stefan
> which will give the location of the Bio::Root::Root in lib path being
> used, and (2) using './Build install uninst=1' to remove any old
> bioperl/bioperl-db installations.
Unfortunately this is not an option for me.
>
> chris
>
> On Apr 17, 2008, at 11:40 AM, Stefan Kirov wrote:
>
>> Hilmar,
>> sorry, I missed the part after the stack trace... In any case this is
>> still problem for me after I updated bioperl-live.
>> I see this with a number of other tests:
>> [...]
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

From hubert.gaynor at yahoo.com Thu Apr 17 20:53:16 2008
From: hubert.gaynor at yahoo.com (Hubert Gaynor)
Date: Thu, 17 Apr 2008 17:53:16 -0700 (PDT)
Subject: [Bioperl-l] Can I use BLAST against a database like MySQL
Message-ID: <130971.67684.qm@web46007.mail.sp1.yahoo.com>

Hi Sean,

I got it. Thank you so much!
Hubert

----- Original Message ----
From: Sean Davis
To: Hubert Gaynor
Sent: Thursday, April 17, 2008 6:36:02 PM
Subject: Re: [Bioperl-l] Can I use BLAST against a database like MySQL

On Thu, Apr 17, 2008 at 2:19 AM, Hubert Gaynor wrote:
> Hi,
>
> As far as I know, before using BLAST to do the alignment the first thing should be done is typing formatdb to construct a database. But I was wondering whether it is possible to construct a database with MySQL which probably will grant the BLAST search a higher speed and make the database management much easier?
>

formatdb is used to make a representation that can be used efficiently
by blast. That representation already makes blast faster. MySQL can't
be used for such things. As for speeding blast, if you have a
multiprocessor machine, you can take advantage of those using blast
and increasing the number of processors. Also, while blast is a very
versatile program, it is not the only alignment program available.
Depending on your needs, you could look at other programs such as blat
or gmap that can be 2-3 orders of magnitude faster than blast.

Sean

From Russell.Smithies at agresearch.co.nz Thu Apr 17 21:39:23 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 18 Apr 2008 13:39:23 +1200
Subject: [Bioperl-l] accessing params for custom glyphs?
In-Reply-To: <130971.67684.qm@web46007.mail.sp1.yahoo.com>
References: <130971.67684.qm@web46007.mail.sp1.yahoo.com>
Message-ID:

This is probably more of a Perl OO problem I'm having, but can anyone
tell me how to access a parameter when I create a custom glyph?

I've created a panel in the usual way then I add a feature with
'my_glyph' and want to pass the value of -new_parameter to the glyph
drawing code.

    $panel->add_track( $feature,
                       -font          => gdSmallFont,
                       -glyph         => 'my_glyph' ,
                       -height        => 10,
                       -label         => 1,
                       -strand        => "forward",
                       -new_parameter => "test",

In my_glyph.pm, I have the usual draw_component sub:

    sub draw_component {
        my $self = shift;
        my $gd = shift;
        my ($x1,$y1,$x2,$y2) = $self->bounds(@_);
        my $fg = $self->fgcolor;
        my $params = $self->?????????? <<--- how do I access the value of "new_parameter" set in the panel drawing code?

        $gd->line($x1,$y1,$x2,$y2,$fg);
        $gd->line($x1,$y2,$x2,$y1,$fg);
    }

Any ideas?

Thanx,

Russell Smithies

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From David.Messina at sbc.su.se Fri Apr 18 05:31:59 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 18 Apr 2008 11:31:59 +0200
Subject: [Bioperl-l] Finding seqs of given domain architecture
In-Reply-To: <628aabb70804170155n4e5dfd81r7020c3e9e11094ff@mail.gmail.com>
References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <200804161336.16879.heikki@sanbi.ac.za> <628aabb70804161112o6610ee1fkfb4b08e74730237d@mail.gmail.com> <1208420674.23342.15.camel@razor.sbc.su.se> <628aabb70804170155n4e5dfd81r7020c3e9e11094ff@mail.gmail.com>
Message-ID: <628aabb70804180231p2b9cef9dwd5441e85c31531fd@mail.gmail.com>

Jacob,

I talked about your question with a colleague of mine who has been
working in this area. Below is his reply.

[I'm reposting this *without* the attachment mentioned since the mailing
list wouldn't accept it otherwise. If anyone wants a copy of the code,
just email me.]

Dave

-------

> 3. Pfam has this capability, i.e. to show all domains with a given
> architecture, but it is difficult to get at the actual sequences or
> even a list of accession numbers.

First, this should be available right away in PfamAlyzer:

http://pfamalyzer.sbc.su.se/pfamalyzer/index.html

although you might need to upgrade your browser to Java 1.6 to get it to
work.

If this does not work as suggested (an upgraded version is coming
eventually), have a look at the file:

ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/swisspfam.gz

which contains the Pfam architectures for all UniProt sequences. You can
parse that to get a file of - correspondences and just filter that to get
the accession numbers. (Please find attached a Perl script to do just
that.) Under UNIX, you can then just grep this for the domain IDs, like

    grep PF00008 domainArchitectureFile.txt | grep PF00456 > resultFile.txt

(grep takes the pattern first and the file name second), but I am sure
there are solutions under other operating systems as well. You could then
write a script to parse out the corresponding sequences from the UniProt
fasta flatfile, if you wanted, or (again under UNIX) a script to wget
them off the web page.

In case your sequences are not in UniProt, consider using HMMER and the
Pfam HMM files to assign domains to all sequences in your dataset. I
would then parse the HMMER output into the same format as the above, and
use the same approach following that.

Hope this helps,

Yours sincerely,

Kristoffer Forslund
krifo at sbc.su.se

From lincoln.stein at gmail.com Fri Apr 18 15:16:19 2008
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Fri, 18 Apr 2008 15:16:19 -0400
Subject: [Bioperl-l] [Gmod-gbrowse] accessing params for custom glyphs?
In-Reply-To:
References: <130971.67684.qm@web46007.mail.sp1.yahoo.com>
Message-ID: <6dce9a0b0804181216q6564e580u8a805ae96c78df2e@mail.gmail.com>

Hi Russell,

It's very simple:

    my $params = $self->option('new_parameter');

Lincoln

On Thu, Apr 17, 2008 at 9:39 PM, Smithies, Russell <Russell.Smithies at agresearch.co.nz> wrote:

> This is probably more of a Perl OO problem I'm having, but can anyone
> tell me how to access a parameter when I create a custom glyph?
>
> I've created a panel in the usual way then I add a feature with
> 'my_glyph' and want to pass the value of -new_parameter to the glyph
> drawing code.
>
> $panel->add_track( $feature,
>                    -font          => gdSmallFont,
>                    -glyph         => 'my_glyph' ,
>                    -height        => 10,
>                    -label         => 1,
>                    -strand        => "forward",
>                    -new_parameter => "test",
>
>
> In my_glyph.pm, I have the usual draw_component sub:
>
> sub draw_component {
>     my $self = shift;
>     my $gd = shift;
>     my ($x1,$y1,$x2,$y2) = $self->bounds(@_);
>     my $fg = $self->fgcolor;
>     my $params = $self->?????????? <<--- how do I access the value of
> "new_parameter" set in the panel drawing code?
>
>     $gd->line($x1,$y1,$x2,$y2,$fg);
>     $gd->line($x1,$y2,$x2,$y1,$fg);
>
> }
>
> Any ideas?
>
> Thanx,
>
> Russell Smithies
>
> _______________________________________________
> Gmod-gbrowse mailing list
> Gmod-gbrowse at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From jason at bioperl.org Fri Apr 18 22:35:10 2008
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 18 Apr 2008 19:35:10 -0700
Subject: [Bioperl-l] index::abstract on win and unix
In-Reply-To: <1208381947.16620.6.camel@kiss-laptop>
References: <1208366718.19084.15.camel@kiss-laptop> <1208381947.16620.6.camel@kiss-laptop>
Message-ID:

Do you want the LOCUS or the ACCESSION? Do you mean the result is the
completely wrong record or just the wrong field?

The accession number is available from the seq's accession_number() method.

-jason

On Apr 16, 2008, at 2:39 PM, Frédéric Romagné wrote:

> Well, if with input file you mean the database used, it's created
> with Bio::Index::GenBank from a ncbi FTP's genbank file.
>
> $id is an accession number read from a file but i chomp the line...
>
> I am trying to install the svn version of bioperl under windows to see
> if there is an improvement.
>
> On Thursday 17 April 2008 at 08:49 +1200, Smithies, Russell wrote:
>> Did you check the format of your input file?
>> i.e. DOS or UNIX line endings?
>> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open- >>> bio.org] On Behalf Of Fr?d?ric Romagn? >>> Sent: Thursday, 17 April 2008 5:25 a.m. >>> To: bioperl-l at lists.open-bio.org >>> Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix >>> >>> Hello, >>> i made a program which use Bio::Index::GenBank and i tested it under >>> unix, that worked well. >>> >>> But i have to launch it under windows and it seems not to work on. >>> >>> Here is the problem : >>> >>> my $dbobj = Bio::Index::Abstract->new("Data/$db"); >>> my $seq = $dbobj->get_Seq_by_acc($id); >>> print $seq->display_id."\n"; >>> >>> did not print the same number than $id !!! So i don't work on the >>> sequence expected... >>> >>> I use the SVN sources on unix and the Perl package manager for >>> windows... >>> >>> Thanks. >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> ===================================================================== >> == >> Attention: The information contained in this message and/or >> attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or >> privileged >> material. Any review, retransmission, dissemination or other use >> of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by >> AgResearch >> Limited. If you have received this message in error, please notify >> the >> sender immediately. 
>> =======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bioperlanand at yahoo.com Mon Apr 21 03:44:00 2008
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Mon, 21 Apr 2008 00:44:00 -0700 (PDT)
Subject: [Bioperl-l] a question on obtaining HTML formatted Blast output
 along with the Blast hits image
Message-ID: <372845.37134.qm@web36808.mail.mud.yahoo.com>

Hi everybody,

I would like to obtain an HTML formatted blast report output along with a
picture of the blast hits as shown on Slide 60 in this pdf:
http://jason.open-bio.org/Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf

I have gotten the HTML output working using
"Bio::SearchIO::Writer::HTMLResultWriter".

My question: How do I integrate it with Bio::Graphics to render the blast
hits image at the correct position in my Bioperl reformatted html file?
I ultimately want to be able to display my blast output files in a browser.

Here is my code so far:
----------------------------------------------------------------
#!/usr/bin/perl -w
# usage: $0
use strict;
use Bio::SearchIO;
use Bio::SearchIO::Writer::HTMLResultWriter;

# BLAST report to reformat, taken from the first command-line argument
my $infile = shift or die "No input BLAST report given\n";

# Parser for the BLAST report
my $searchio = Bio::SearchIO->new(-format => 'blast', -file => $infile);

# Writer that renders a result as HTML into <infile>.html
my $writerhtml = Bio::SearchIO::Writer::HTMLResultWriter->new();
my $outhtml    = Bio::SearchIO->new(-writer => $writerhtml,
                                    -file   => ">${infile}.html");

# Only the first result in the report is written
$outhtml->write_result($searchio->next_result);
----------------------------------------------------------------

Thanks in advance,
Anand

From cjfields at uiuc.edu Mon Apr 21 11:07:17 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 Apr 2008 10:07:17 -0500
Subject: [Bioperl-l] [Proposed change] HSP::frame()
Message-ID:

I have noticed (in relation to bug 2485,
http://bugzilla.open-bio.org/show_bug.cgi?id=2485) that the
Bio::Search::HSP::GenericHSP frame() method is implemented very
differently from strand(), start(), end(), and most other HSP methods.

The current behavior depends on both the arguments and the calling
context. If one value is passed, the query frame is returned. If no
value is passed, the subject frame is returned in scalar context and an
array of two values (query and hit frame) in list context. The
no-argument behavior is unfortunately what leads to the aforementioned
bug. The method is also implied to be a getter/setter, but the
implementation doesn't allow that; it always sets to the instantiated
values (in fact, repeatedly so).

In order to fix that and make the interface more consistent I am
changing frame() to behave like strand(), etc.: the first argument is
one of 'query', 'subject', 'hit' or 'list' (default 'query' if no
argument is specified) and the remaining arguments are optional values
for setting, in query/subject order.

One issue: I can catch and imitate most of the older behavior with a few
additional checks, the one exception being the old frame() default
return value, which is now 'query' (not context-dependent). If needed we
can change the default to 'hit', but I believe method consistency is
probably the better route, and I can always add a warning under old API
circumstances indicating the change.

I am also modifying HSPTableWriter to print frame_hit and frame_query
(previously it was only printing 'frame', which implied the hit frame).
I can see this being an issue with anyone expecting 'frame' instead of
'frame_hit'; I could hack in a fix for that if needed.

If there aren't any objections or suggestions, I'll commit this in the
next day or two.

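For illustration only, here is a minimal sketch of the calling convention
described in this proposal. Sketch::HSP is a hypothetical stand-in class,
not part of BioPerl, and this is not the GenericHSP implementation; the
hash keys and the exact selector matching are assumptions made for the
example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Sketch::HSP;    # hypothetical stand-in, not a BioPerl class

sub new {
    my ($class, %args) = @_;
    return bless {
        query_frame => $args{query_frame},
        hit_frame   => $args{hit_frame},
    }, $class;
}

# frame($which, @values)
#   $which : 'query' (default), 'hit' / 'subject', or 'list'
#   @values: optional new value(s); for 'list' in query, hit order
sub frame {
    my ($self, $which, @values) = @_;
    $which = 'query' unless defined $which;

    if ($which =~ /^q/i) {
        $self->{query_frame} = $values[0] if @values;
        return $self->{query_frame};
    }
    if ($which =~ /^[hs]/i) {
        $self->{hit_frame} = $values[0] if @values;
        return $self->{hit_frame};
    }
    if ($which =~ /^l/i) {
        $self->{query_frame} = $values[0] if @values >= 1;
        $self->{hit_frame}   = $values[1] if @values >= 2;
        # meant to be called in list context
        return ($self->{query_frame}, $self->{hit_frame});
    }
    die "Unknown frame selector '$which'\n";
}

package main;

my $hsp = Sketch::HSP->new(query_frame => 1, hit_frame => 2);

# The default no longer depends on calling context: it is the query frame.
print "default (query): ", scalar($hsp->frame()), "\n";
print "hit:             ", scalar($hsp->frame('hit')), "\n";

# Setting works through the same method.
$hsp->frame('query', 0);
print "query, hit:      ", join(', ', $hsp->frame('list')), "\n";
```

The point of the design is that the selector, not wantarray, decides what
comes back, so the same call returns the same thing whether it sits in a
print list or a scalar assignment.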
chris

From cjfields at uiuc.edu Mon Apr 21 11:32:59 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 Apr 2008 10:32:59 -0500
Subject: [Bioperl-l] Assembly.t test fails
Message-ID:

I'm getting some significant test failures in bioperl-live for
Bio::Assembly:

t/Assembly......
1..35
ok 1 - use Bio::Assembly::IO;
ok 2 - The object isa Bio::Assembly::IO
ok 3 - The object isa Bio::Assembly::Scaffold
ok 4
not ok 5
ok 6 - The object isa Bio::AnnotationCollectionI
ok 7 - no annotations in Annotation collection?
ok 8

#   Failed test at t/Assembly.t line 35.
#          got: 'NoName'
#     expected: 'test'
Can't locate object method "get_contig_seq_ids" via package
"Bio::Assembly::Contig" at
/Users/cjfields/bioperl/bioperl-live/blib/lib/Bio/Assembly/Scaffold.pm
line 189, line 733.
# Looks like you planned 35 tests but only ran 8.
# Looks like you failed 1 test of 8 run.
# Looks like your test died just after 8.
Dubious, test returned 255 (wstat 65280, 0xff00)
Failed 28/35 subtests

Test Summary Report
-------------------
t/Assembly.t (Wstat: 65280 Tests: 8 Failed: 1)
  Failed test: 5
  Non-zero exit status: 255
  Parse errors: Bad plan. You planned 35 tests but ran 8.
Files=1, Tests=8, 0 wallclock secs ( 0.01 usr 0.00 sys + 0.22 cusr 0.04 csys = 0.27 CPU)
Result: FAIL
Failed 1/1 test programs. 1/8 subtests failed.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Mon Apr 21 11:44:21 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 Apr 2008 10:44:21 -0500
Subject: [Bioperl-l] Assembly.t test fails
In-Reply-To:
References:
Message-ID: <2F199628-717E-4F88-85D7-408BD7BBE16D@uiuc.edu>

Scratch that, figured it out (easy fix).

chris

On Apr 21, 2008, at 10:32 AM, Chris Fields wrote:

> I'm getting some significant test failures in bioperl-live for
> Bio::Assembly:
>
> t/Assembly......
> 1..35 > ok 1 - use Bio::Assembly::IO; > ok 2 - The object isa Bio::Assembly::IO > ok 3 - The object isa Bio::Assembly::Scaffold > ok 4 > not ok 5 > ok 6 - The object isa Bio::AnnotationCollectionI > ok 7 - no annotations in Annotation collection? > ok 8 > > # Failed test at t/Assembly.t line 35. > # got: 'NoName' > # expected: 'test' > Can't locate object method "get_contig_seq_ids" via package > "Bio::Assembly::Contig" at /Users/cjfields/bioperl/bioperl-live/blib/ > lib/Bio/Assembly/Scaffold.pm line 189, line 733. > # Looks like you planned 35 tests but only ran 8. > # Looks like you failed 1 test of 8 run. > # Looks like your test died just after 8. > Dubious, test returned 255 (wstat 65280, 0xff00) > Failed 28/35 subtests > > Test Summary Report > ------------------- > t/Assembly.t (Wstat: 65280 Tests: 8 Failed: 1) > Failed test: 5 > Non-zero exit status: 255 > Parse errors: Bad plan. You planned 35 tests but ran 8. > Files=1, Tests=8, 0 wallclock secs ( 0.01 usr 0.00 sys + 0.22 > cusr 0.04 csys = 0.27 CPU) > Result: FAIL > Failed 1/1 test programs. 1/8 subtests failed. > > > chris > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From frederic.romagne at gmail.com Mon Apr 21 11:53:11 2008
From: frederic.romagne at gmail.com (Frédéric Romagné)
Date: Mon, 21 Apr 2008 10:53:11 -0500
Subject: [Bioperl-l] index::abstract on win and unix
In-Reply-To:
References: <1208366718.19084.15.camel@kiss-laptop>
 <1208381947.16620.6.camel@kiss-laptop>
Message-ID: <1208793191.25906.9.camel@kiss-laptop>

In fact, I want the whole Bio::Seq object, but I verified that the
ACCESSION and the LOCUS are the same in my GenBank files.

I saw that the program sometimes reports that it cannot find the entry:

if( !defined $seq ) {
    warn("Sequence $id in Database $db is not present\n");
}

I suspect that it is the make_index function, rather than the
get_Seq_by_acc function, that does not work properly on Windows...

On Friday 18 April 2008 at 19:35 -0700, Jason Stajich wrote:
> do you want the LOCUS or the ACCESSION?
> Do you mean the result is the completely wrong record or just the
> wrong field?
> accession number is available from the seq's accession_number() method.
> -jason
> On Apr 16, 2008, at 2:39 PM, Frédéric Romagné wrote:
>
> > Well, if with input file you mean the database used, it's created
> > with Bio::Index::GenBank from a ncbi FTP's genbank file.
> >
> > $id is an accession number read from a file but i chomp the line...
> >
> > I am trying to install the svn version of bioperl under windows to see
> > if there is an improvement.
> >
> > On Thursday 17 April 2008 at 08:49 +1200, Smithies, Russell wrote:
> >> Did you check the format of your input file?
> >> i.e. DOS or UNIX line endings?
> >>
> >>> -----Original Message-----
> >>> From: bioperl-l-bounces at lists.open-bio.org
> >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Frédéric Romagné
> >>> Sent: Thursday, 17 April 2008 5:25 a.m.
> >>> To: bioperl-l at lists.open-bio.org > >>> Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix > >>> > >>> Hello, > >>> i made a program which use Bio::Index::GenBank and i tested it under > >>> unix, that worked well. > >>> > >>> But i have to launch it under windows and it seems not to work on. > >>> > >>> Here is the problem : > >>> > >>> my $dbobj = Bio::Index::Abstract->new("Data/$db"); > >>> my $seq = $dbobj->get_Seq_by_acc($id); > >>> print $seq->display_id."\n"; > >>> > >>> did not print the same number than $id !!! So i don't work on the > >>> sequence expected... > >>> > >>> I use the SVN sources on unix and the Perl package manager for > >>> windows... > >>> > >>> Thanks. > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> ===================================================================== > >> == > >> Attention: The information contained in this message and/or > >> attachments > >> from AgResearch Limited is intended only for the persons or entities > >> to which it is addressed and may contain confidential and/or > >> privileged > >> material. Any review, retransmission, dissemination or other use > >> of, or > >> taking of any action in reliance upon, this information by persons or > >> entities other than the intended recipients is prohibited by > >> AgResearch > >> Limited. If you have received this message in error, please notify > >> the > >> sender immediately. 
> >> ===================================================================== > >> == > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ewijaya at gmail.com Tue Apr 22 10:03:07 2008 From: ewijaya at gmail.com (Edward Wijaya) Date: Tue, 22 Apr 2008 22:03:07 +0800 Subject: [Bioperl-l] BioPerl Module to Parse BLAT alignment output Message-ID: <3521d3670804220703u4d8565c8q604036727aedf0a8@mail.gmail.com> Hi, Is there any module that can parse the following output of BLAT. This is taken from UCSC browser. The idea is to parse it and then extract the conserved block of aligned sequences. __DATA__ Alignment block 3 of 135 in window, 5860248 - 5860300, 53 bps B D D. melanogaster tgtg----tatttatgt-tttaaataaaggt-------tttctaaata---cgaaatttcaaatttaa B D D. simulans tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cgcaattttaaatttaa B D D. sechellia tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cccaattttaaatttaa B D D. yakuba tgtg----tatttatgt-tcttaataaaggt-------ttcctaaataa-ttcaaaatttaaattaaa D. erecta tgtg----tgtttatgt-ttttaataaaggt-------tttctaaataa--tcgaaattcatttcaaa D. ananassae taag----tttttatgtattttaaaatatag-------aaaataaata---aaaaaaattgaact--- D. pseudoobscura tata----ccagtacac-cttatatg------------tttttaaata-------------------- B D D. persimilis tata----ccagtacac-attatatg------------tttttaaata-------------------- D. willistoni aaaaaagttatttgaat-ttggaata------------taccaaaacatgttggaaatt------gaa D. virilis -------------gatt-ttataataaaattgcgctaatttctaa------------tttacgttaaa D. mojavensis -------------tagt-ccttaatataaatataatattaaataaata-------cttttaagttaaa D. grimshawi ==================================================================== T. castaneum ==================================================================== Inserts between block 3 and 4 in window D. pseudoobscura 2008bp B D D. persimilis 1421bp D. virilis 5bp D. 
mojavensis 4640bp Alignment block 4 of 135 in window, 5860301 - 5860344, 44 bps B D D. melanogaster ----tgggtagcagcgttgccagat--------------------aaagggacatgtttactggctga B D D. simulans ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga B D D. sechellia ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga B D D. yakuba ----tgagtaccaatgctgccagat-------------ctttgtaaagcggtaatgtttgctggctga D. erecta ----t-----ttaatgttgccagat-------------ctgcgtaaggcgctcatgttggctggctga D. pseudoobscura ==================================================================== B D D. persimilis ==================================================================== D. willistoni ----aggattacgaagttcctttat-------------------aaag-------------------- D. virilis gactagtttaatatctcagcccgttaagctaactgttactttttacagtattcgcgccattttgc--- D. mojavensis ==================================================================== D. grimshawi ==================================================================== T. castaneum ==================================================================== __ END__ From cjfields at uiuc.edu Tue Apr 22 10:22:45 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 22 Apr 2008 09:22:45 -0500 Subject: [Bioperl-l] BioPerl Module to Parse BLAT alignment output In-Reply-To: <3521d3670804220703u4d8565c8q604036727aedf0a8@mail.gmail.com> References: <3521d3670804220703u4d8565c8q604036727aedf0a8@mail.gmail.com> Message-ID: <766FDF9E-9F7B-4826-B7FA-87DF3B074EBC@uiuc.edu> A quick grep of bioperl-live gets me Bio::SearchIO::blast, Bio::SearchIO::axt, Bio::SearchIO::psl, Bio::Tools::Blat, and Bio::Tools::WebBlat. Haven't looked at the docs but it's a start! chris On Apr 22, 2008, at 9:03 AM, Edward Wijaya wrote: > Hi, > > Is there any module that can parse the following output > of BLAT. This is taken from UCSC browser. > > The idea is to parse it and then extract the conserved block > of aligned sequences. 
> > > __DATA__ > Alignment block 3 of 135 in window, 5860248 - 5860300, 53 bps > B D D. melanogaster > tgtg----tatttatgt-tttaaataaaggt-------tttctaaata---cgaaatttcaaatttaa > B D D. simulans > tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cgcaattttaaatttaa > B D D. sechellia > tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cccaattttaaatttaa > B D D. yakuba > tgtg----tatttatgt-tcttaataaaggt-------ttcctaaataa-ttcaaaatttaaattaaa > D. erecta > tgtg----tgtttatgt-ttttaataaaggt-------tttctaaataa--tcgaaattcatttcaaa > D. ananassae > taag----tttttatgtattttaaaatatag-------aaaataaata---aaaaaaattgaact--- > D. pseudoobscura > tata----ccagtacac-cttatatg------------tttttaaata-------------------- > B D D. persimilis > tata----ccagtacac-attatatg------------tttttaaata-------------------- > D. willistoni > aaaaaagttatttgaat-ttggaata------------taccaaaacatgttggaaatt------gaa > D. virilis > -------------gatt-ttataataaaattgcgctaatttctaa------------tttacgttaaa > D. mojavensis > -------------tagt-ccttaatataaatataatattaaataaata-------cttttaagttaaa > D. grimshawi > ==================================================================== > T. castaneum > ==================================================================== > > Inserts between block 3 and 4 in window > D. pseudoobscura 2008bp > B D D. persimilis 1421bp > D. virilis 5bp > D. mojavensis 4640bp > > Alignment block 4 of 135 in window, 5860301 - 5860344, 44 bps > B D D. melanogaster > ----tgggtagcagcgttgccagat--------------------aaagggacatgtttactggctga > B D D. simulans > ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga > B D D. sechellia > ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga > B D D. yakuba > ----tgagtaccaatgctgccagat-------------ctttgtaaagcggtaatgtttgctggctga > D. erecta > ----t-----ttaatgttgccagat-------------ctgcgtaaggcgctcatgttggctggctga > D. pseudoobscura > ==================================================================== > B D D. 
persimilis > ==================================================================== > D. willistoni > ----aggattacgaagttcctttat-------------------aaag-------------------- > D. virilis > gactagtttaatatctcagcccgttaagctaactgttactttttacagtattcgcgccattttgc--- > D. mojavensis > ==================================================================== > D. grimshawi > ==================================================================== > T. castaneum > ==================================================================== > > __ END__ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign
From jason at bioperl.org Tue Apr 22 14:49:32 2008 From: jason at bioperl.org (Jason Stajich) Date: Tue, 22 Apr 2008 11:49:32 -0700 Subject: [Bioperl-l] Fwd: [blast-announce] New BLAST URL available at the NCBI References: Message-ID: Does anyone want to take a look at how to use these URLs in the RemoteBlast module, if the interface is the same? -jason Begin forwarded message: > From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]" > Date: April 22, 2008 11:35:04 AM PDT > To: > Subject: [blast-announce] New BLAST URL available at the NCBI > > New BLAST URL available at the NCBI > > > > The NCBI has activated a new URL for BLAST searches at the NCBI: > http://blast.ncbi.nlm.nih.gov. > > > > Searches sent to this URL can take advantage of a larger number of > machines for searches and the system has a better overall fault > tolerance. > > > > We recommend migration of all BLAST links and bookmarks (e.g., > http://www.ncbi.nlm.nih.gov/BLAST/ and > http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi) to the new URL. > > > > Links on the NCBI and BLAST home pages will start to change in the > coming weeks. > > > > At this point in time the plans are to also maintain the current BLAST > URL.
>
>
>
>
>

From jason at bioperl.org Tue Apr 22 14:51:08 2008
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 22 Apr 2008 11:51:08 -0700
Subject: [Bioperl-l] BioPerl Module to Parse BLAT alignment output
In-Reply-To: <766FDF9E-9F7B-4826-B7FA-87DF3B074EBC@uiuc.edu>
References: <3521d3670804220703u4d8565c8q604036727aedf0a8@mail.gmail.com>
 <766FDF9E-9F7B-4826-B7FA-87DF3B074EBC@uiuc.edu>
Message-ID: <6C812413-B375-427B-9AF8-5A0AA6167CC8@bioperl.org>

If you get it as axt it should parse fine in SearchIO, but that is
pairwise. As for the alignment blocks, I can't remember what format this
is from UCSC.
MSAs are going to be better handled through Bio::AlignIO though, so it
might be better to build a parser on that.

On Apr 22, 2008, at 7:22 AM, Chris Fields wrote:

> A quick grep of bioperl-live gets me Bio::SearchIO::blast,
> Bio::SearchIO::axt, Bio::SearchIO::psl, Bio::Tools::Blat, and
> Bio::Tools::WebBlat. Haven't looked at the docs but it's a start!
>
> chris
>
> On Apr 22, 2008, at 9:03 AM, Edward Wijaya wrote:
>
>> Hi,
>>
>> Is there any module that can parse the following output
>> of BLAT. This is taken from UCSC browser.
>>
>> The idea is to parse it and then extract the conserved block
>> of aligned sequences.
>>
>>
>> __DATA__
>> Alignment block 3 of 135 in window, 5860248 - 5860300, 53 bps
>> B D D. melanogaster
>> tgtg----tatttatgt-tttaaataaaggt-------tttctaaata---cgaaatttcaaatttaa
>> B D D. simulans
>> tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cgcaattttaaatttaa
>> B D D. sechellia
>> tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cccaattttaaatttaa
>> B D D. yakuba
>> tgtg----tatttatgt-tcttaataaaggt-------ttcctaaataa-ttcaaaatttaaattaaa
>> D. erecta
>> tgtg----tgtttatgt-ttttaataaaggt-------tttctaaataa--tcgaaattcatttcaaa
>> D. ananassae
>> taag----tttttatgtattttaaaatatag-------aaaataaata---aaaaaaattgaact---
>> D. pseudoobscura
>> tata----ccagtacac-cttatatg------------tttttaaata--------------------
>> B D D.
persimilis >> tata----ccagtacac-attatatg------------tttttaaata-------------------- >> D. willistoni >> aaaaaagttatttgaat-ttggaata------------taccaaaacatgttggaaatt------gaa >> D. virilis >> -------------gatt-ttataataaaattgcgctaatttctaa------------tttacgttaaa >> D. mojavensis >> -------------tagt-ccttaatataaatataatattaaataaata-------cttttaagttaaa >> D. grimshawi >> ==================================================================== >> T. castaneum >> ==================================================================== >> >> Inserts between block 3 and 4 in window >> D. pseudoobscura 2008bp >> B D D. persimilis 1421bp >> D. virilis 5bp >> D. mojavensis 4640bp >> >> Alignment block 4 of 135 in window, 5860301 - 5860344, 44 bps >> B D D. melanogaster >> ----tgggtagcagcgttgccagat--------------------aaagggacatgtttactggctga >> B D D. simulans >> ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga >> B D D. sechellia >> ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga >> B D D. yakuba >> ----tgagtaccaatgctgccagat-------------ctttgtaaagcggtaatgtttgctggctga >> D. erecta >> ----t-----ttaatgttgccagat-------------ctgcgtaaggcgctcatgttggctggctga >> D. pseudoobscura >> ==================================================================== >> B D D. persimilis >> ==================================================================== >> D. willistoni >> ----aggattacgaagttcctttat-------------------aaag-------------------- >> D. virilis >> gactagtttaatatctcagcccgttaagctaactgttactttttacagtattcgcgccattttgc--- >> D. mojavensis >> ==================================================================== >> D. grimshawi >> ==================================================================== >> T. 
castaneum >> ==================================================================== >> >> __ END__ >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Tue Apr 22 15:02:14 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 22 Apr 2008 14:02:14 -0500 Subject: [Bioperl-l] Fwd: [blast-announce] New BLAST URL available at the NCBI In-Reply-To: References: Message-ID: <13C2AD96-8297-40DD-ADCC-B2BEC923B9E0@uiuc.edu> They work exactly the same as the old URL, at least on the surface; I haven't tried changing many URLAPI parameters. I went ahead and changed the URL in RemoteBlast to http://blast.ncbi.nlm.nih.gov/Blast.cgi as it works with RemoteBlast.t. chris On Apr 22, 2008, at 1:49 PM, Jason Stajich wrote: > Does anyone want to take a look at how to use these URLs in the > RemoteBlast module, if the interface is the same? > > -jason > > Begin forwarded message: > >> From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]" >> >> Date: April 22, 2008 11:35:04 AM PDT >> To: >> Subject: [blast-announce] New BLAST URL available at the NCBI >> >> New BLAST URL available at the NCBI >> >> >> >> The NCBI has activated a new URL for BLAST searches at the NCBI: >> http://blast.ncbi.nlm.nih.gov. >> >> >> >> Searches sent to this URL can take advantage of a larger number of >> machines for searches and the system has a better overall fault >> tolerance. >> >> >> >> We recommend migration of all BLAST links and bookmarks (e.g., >> http://www.ncbi.nlm.nih.gov/BLAST/ and >> http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi) to the new URL. 
>> >> >> >> Links on the NCBI and BLAST home pages will start to change in the >> coming weeks. >> >> >> >> At this point in time the plans are to also maintain the current >> BLAST >> URL. >> >> >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue Apr 22 14:58:40 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 22 Apr 2008 13:58:40 -0500 Subject: [Bioperl-l] BioPerl Module to Parse BLAT alignment output In-Reply-To: <6C812413-B375-427B-9AF8-5A0AA6167CC8@bioperl.org> References: <3521d3670804220703u4d8565c8q604036727aedf0a8@mail.gmail.com> <766FDF9E-9F7B-4826-B7FA-87DF3B074EBC@uiuc.edu> <6C812413-B375-427B-9AF8-5A0AA6167CC8@bioperl.org> Message-ID: <43344C89-6B4D-4360-AF56-A6FDD065FFF3@uiuc.edu> Related to that, I have thought about building a parser for some of the query-anchored alignments produced by blastall, just haven't had time to devote to it. One of these days... chris On Apr 22, 2008, at 1:51 PM, Jason Stajich wrote: > if you get it as axt it should parse fine in SearchIO but that is > pairwise, if you can get an alignment blocks I can't remember what > format this is from UCSC. > MSAs are going to be better handed through Bio::AlignIO though so it > might be better to build a parser on that. > > On Apr 22, 2008, at 7:22 AM, Chris Fields wrote: > >> A quick grep of bioperl-live gets me Bio::SearchIO::blast, >> Bio::SearchIO::axt, Bio::SearchIO::psl, Bio::Tools::Blat, and >> Bio::Tools::WebBlat. Haven't looked at the docs but it's a start! >> >> chris >> >> On Apr 22, 2008, at 9:03 AM, Edward Wijaya wrote: >> >>> Hi, >>> >>> Is there any module that can parse the following output >>> of BLAT. This is taken from UCSC browser. 
>>> >>> The idea is to parse it and then extract the conserved block >>> of aligned sequences. >>> >>> >>> __DATA__ >>> Alignment block 3 of 135 in window, 5860248 - 5860300, 53 bps >>> B D D. melanogaster >>> tgtg----tatttatgt-tttaaataaaggt-------tttctaaata---cgaaatttcaaatttaa >>> B D D. simulans >>> tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cgcaattttaaatttaa >>> B D D. sechellia >>> tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cccaattttaaatttaa >>> B D D. yakuba >>> tgtg----tatttatgt-tcttaataaaggt-------ttcctaaataa-ttcaaaatttaaattaaa >>> D. erecta >>> tgtg----tgtttatgt-ttttaataaaggt-------tttctaaataa--tcgaaattcatttcaaa >>> D. ananassae >>> taag----tttttatgtattttaaaatatag-------aaaataaata---aaaaaaattgaact--- >>> D. pseudoobscura >>> tata----ccagtacac-cttatatg------------tttttaaata-------------------- >>> B D D. persimilis >>> tata----ccagtacac-attatatg------------tttttaaata-------------------- >>> D. willistoni >>> aaaaaagttatttgaat-ttggaata------------taccaaaacatgttggaaatt------gaa >>> D. virilis >>> -------------gatt-ttataataaaattgcgctaatttctaa------------tttacgttaaa >>> D. mojavensis >>> -------------tagt-ccttaatataaatataatattaaataaata-------cttttaagttaaa >>> D. grimshawi >>> ==================================================================== >>> T. castaneum >>> ==================================================================== >>> >>> Inserts between block 3 and 4 in window >>> D. pseudoobscura 2008bp >>> B D D. persimilis 1421bp >>> D. virilis 5bp >>> D. mojavensis 4640bp >>> >>> Alignment block 4 of 135 in window, 5860301 - 5860344, 44 bps >>> B D D. melanogaster >>> ----tgggtagcagcgttgccagat--------------------aaagggacatgtttactggctga >>> B D D. simulans >>> ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga >>> B D D. sechellia >>> ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga >>> B D D. yakuba >>> ----tgagtaccaatgctgccagat-------------ctttgtaaagcggtaatgtttgctggctga >>> D. 
erecta >>> ----t-----ttaatgttgccagat-------------ctgcgtaaggcgctcatgttggctggctga >>> D. pseudoobscura >>> ==================================================================== >>> B D D. persimilis >>> ==================================================================== >>> D. willistoni >>> ----aggattacgaagttcctttat-------------------aaag-------------------- >>> D. virilis >>> gactagtttaatatctcagcccgttaagctaactgttactttttacagtattcgcgccattttgc--- >>> D. mojavensis >>> ==================================================================== >>> D. grimshawi >>> ==================================================================== >>> T. castaneum >>> ==================================================================== >>> >>> __ END__ >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bioperlanand at yahoo.com Wed Apr 23 02:02:30 2008
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Tue, 22 Apr 2008 23:02:30 -0700 (PDT)
Subject: [Bioperl-l] Question on integrating Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter
Message-ID: <946658.12337.qm@web36802.mail.mud.yahoo.com>

Hi everybody,

I would like to use Bio::Graphics in conjunction with
Bio::SearchIO::Writer::HTMLResultWriter to obtain an HTML-formatted
blast report along with an image of the blast hits, as shown on
Slide 60 in this pdf:
http://jason.open-bio.org/Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf

I am able to get the HTML output using
"Bio::SearchIO::Writer::HTMLResultWriter" and I am able to get the
image using the examples outlined in the Bio::Graphics HOWTO:
http://www.bioperl.org/wiki/HOWTO:Graphics

My question: How do I integrate Bio::Graphics with
Bio::SearchIO::Writer::HTMLResultWriter to render the blast hits
image at the correct position in my BioPerl reformatted html file?

I also found that someone else has asked a similar question; it is
listed under the "Orphans, Leftovers" category in the
ListSummary:April 26-May 9,2006 document:
http://www.bioperl.org/wiki/ListSummary:April_26-May_9%2C2006#Orphans.2C_Leftovers

Here is my code so far:
----------------------------------------------------------------
#!/usr/bin/perl -w
# usage: $0
use strict;
use Bio::SearchIO;
use Bio::SearchIO::Writer::HTMLResultWriter;

# Take the BLAST report file name from the command line.
my $infile = shift or die "usage: $0 blast_report_file\n";

my $searchio   = Bio::SearchIO->new(-format => 'blast', -file => $infile);
my $writerhtml = Bio::SearchIO::Writer::HTMLResultWriter->new();
my $outhtml    = Bio::SearchIO->new(-writer => $writerhtml,
                                    -file   => ">${infile}.html");

# Convert the first result in the report to HTML.
$outhtml->write_result($searchio->next_result);
----------------------------------------------------------------

Thanks in advance,

Anand

From jason at bioperl.org Wed Apr 23 02:15:28 2008
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 22 Apr 2008 23:15:28 -0700
Subject: [Bioperl-l] Question on integrating Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter
In-Reply-To: <946658.12337.qm@web36802.mail.mud.yahoo.com>
References: <946658.12337.qm@web36802.mail.mud.yahoo.com>
Message-ID: <952B0A4E-8A14-4E8E-B36D-14596B20E330@bioperl.org>

Basically you want to inject your own IMG tags into the file with
these routines:

$writerhtml->start_report(\&my_start_report);
$writerhtml->title(\&my_title);
$writerhtml->hit_link_align(\&my_hit_link_align);
$writerhtml->hit_link_desc(\&my_hit_link_desc);

fgblast shows a way to do this in part. It relies on Gbrowse to
generate the image, but you can replace the gbrowse_img reference with
a call to your own image-generating software.
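
A minimal sketch of one such callback, for illustration only. It assumes the writer passes the result object to the title routine (as its default title routine appears to expect), and that an overview image was already written next to each HTML file as "<query name>.png". The sub name my_title comes from the message above; the file-naming scheme and the exact markup are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical title callback: returns the report heading followed by an
# IMG tag. The "<query name>.png" naming scheme is an assumption; the
# image itself has to be generated separately for each report.
sub my_title {
    my ($result) = @_;
    my $query = $result->query_name;
    return sprintf(
        qq{<CENTER><H1>%s report for %s</H1>\n}
          . qq{<IMG SRC="%s.png" ALT="hit overview"></CENTER>\n},
        $result->algorithm, $query, $query
    );
}

# Hook-up, using the accessor named in the message above:
#   $writerhtml->title(\&my_title);
```

Query names containing characters that are special in HTML or in file names would need escaping before being used this way; the sketch leaves that out.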
http://people.genome.duke.edu/~jes12/software/scripts/fgblast -jason On Apr 22, 2008, at 11:02 PM, Anand Venkatraman wrote: > Hi everybody, > > I would like to use Bio::Graphics in conjunction with > Bio::SearchIO::Writer::HTMLResultWriter to obtain a HTML formatted > blast report output along with an image of the blast hits as shown > on Slide 60 in this pdf: http://jason.open-bio.org/ > Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf > > I am able to get the HTML output using > "Bio::SearchIO::Writer::HTMLResultWriter" and I am able to get the > image using the examples outlined in the Bio::Graphics HOWTO: > http://www.bioperl.org/wiki/HOWTO:Graphics > > My question: How do I integrate Bio::Graphics with > Bio::SearchIO::Writer::HTMLResultWriter to render the blast hits > image at the correct position in my BioPerl reformatted html file. > > I also found that someone else has asked something similar to > whatever I am asking & is listed under the "Orphans, Leftovers" > category in the ListSummary:April 26-May 9,2006 document: > http://www.bioperl.org/wiki/ListSummary:April_26-May_9% > 2C2006#Orphans.2C_Leftovers > > Here is my code so far: > ---------------------------------------------------------------- > #!/usr/bin/perl -w > # usage: $0 > use strict; > use Bio::SearchIO; > use Bio::SearchIO::Writer::HTMLResultWriter; > > my $infile = shift or die $!; > > my $searchio = new Bio::SearchIO( -format => 'blast',-file => > $infile); > my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter(); > my $outhtml = new Bio::SearchIO(-writer => $writerhtml, > -file => ">$ > {infile}.html"); > > $outhtml->write_result($searchio->next_result); > ---------------------------------------------------------------- > > Thanks in advance, > > Anand > > > --------------------------------- > Be a better friend, newshound, and know-it-all with Yahoo! Mobile. > Try it now. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bamboowarrior at gmail.com Wed Apr 23 15:39:21 2008 From: bamboowarrior at gmail.com (Arkady) Date: Wed, 23 Apr 2008 14:39:21 -0500 Subject: [Bioperl-l] WebBlat, where'd it go? Message-ID: <91656c3f0804231239j159fb9d8q7bae51ba5cbcd442@mail.gmail.com> Hi folks, I'm trying to use BioPerl to run a BLAT search on the four primate genomes on UCSC. I understand that the proper tool for this is Bio::Tools::WebBlat. Unfortunately, it doesn't appear to be in my bioperl distribution (nor do I even know how to figure out what version that is, unfortunately, though it's a very recent install -- a month ago?). I also can't find it on CPAN. Is this deprecated? Has something else replaced it? Or are we always supposed to run local BLAT? Thanks. John Woods Institute for Cellular and Molecular Biology The University of Texas at Austin From spiros at lokku.com Wed Apr 23 15:48:12 2008 From: spiros at lokku.com (Spiros Denaxas) Date: Wed, 23 Apr 2008 20:48:12 +0100 Subject: [Bioperl-l] WebBlat, where'd it go? In-Reply-To: <91656c3f0804231239j159fb9d8q7bae51ba5cbcd442@mail.gmail.com> References: <91656c3f0804231239j159fb9d8q7bae51ba5cbcd442@mail.gmail.com> Message-ID: Hey, a quick look at the list of deprecated modules reveals that it has indeed been removed, http://www.bioperl.org/wiki/Deprecated_modules Spiros On Wed, Apr 23, 2008 at 8:39 PM, Arkady wrote: > Hi folks, > > I'm trying to use BioPerl to run a BLAT search on the four primate > genomes on UCSC. I understand that the proper tool for this is > Bio::Tools::WebBlat. Unfortunately, it doesn't appear to be in my > bioperl distribution (nor do I even know how to figure out what > version that is, unfortunately, though it's a very recent install -- a > month ago?). I also can't find it on CPAN. Is this deprecated? Has > something else replaced it? 
Or are we always supposed to run local > BLAT? > > Thanks. > > John Woods > > Institute for Cellular and Molecular Biology > The University of Texas at Austin > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Wed Apr 23 15:56:14 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 23 Apr 2008 14:56:14 -0500 Subject: [Bioperl-l] WebBlat, where'd it go? In-Reply-To: <91656c3f0804231239j159fb9d8q7bae51ba5cbcd442@mail.gmail.com> References: <91656c3f0804231239j159fb9d8q7bae51ba5cbcd442@mail.gmail.com> Message-ID: It's no longer maintained (deprecated); see the following for an explanation: http://article.gmane.org/gmane.comp.lang.perl.bio.general/13545 Basically, only local BLAT searches are supported through BioPerl. chris On Apr 23, 2008, at 2:39 PM, Arkady wrote: > Hi folks, > > I'm trying to use BioPerl to run a BLAT search on the four primate > genomes on UCSC. I understand that the proper tool for this is > Bio::Tools::WebBlat. Unfortunately, it doesn't appear to be in my > bioperl distribution (nor do I even know how to figure out what > version that is, unfortunately, though it's a very recent install -- a > month ago?). I also can't find it on CPAN. Is this deprecated? Has > something else replaced it? Or are we always supposed to run local > BLAT? > > Thanks. > > John Woods > > Institute for Cellular and Molecular Biology > The University of Texas at Austin > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bioperlanand at yahoo.com Wed Apr 23 19:05:27 2008 From: bioperlanand at yahoo.com (Anand Venkatraman) Date: Wed, 23 Apr 2008 16:05:27 -0700 (PDT) Subject: [Bioperl-l] Question on integrating Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter In-Reply-To: <952B0A4E-8A14-4E8E-B36D-14596B20E330@bioperl.org> Message-ID: <795696.39415.qm@web36804.mail.mud.yahoo.com> Hi Jason, Thanks for the reply. I am a little lost with the solution suggested. Is that how slide 60 in the pdf is obtained: http://jason.open-bio.org/Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf I guess I am missing something quite obvious, I apologize. What I have & want is this: I have a directory having say 100 different blast reports & hence I am looking to obtain 100 different bioperl formatted blast html outputs with the respective images just as it would appear in the blast report. Thanks, Anand Jason Stajich wrote: Basically you want to inject your own IMG tags into the file with these routines: $writerhtml->start_report(\&my_start_report); $writerhtml->title(\&my_title); $writerhtml->hit_link_align(\&my_hit_link_align); $writerhtml->hit_link_desc(\&my_hit_link_desc); fgblast shows a way to do this in part. It relies on Gbrowse to generate the image but you can replace the gbrowse_img reference to your own image generating software. 
http://people.genome.duke.edu/~jes12/software/scripts/fgblast -jason On Apr 22, 2008, at 11:02 PM, Anand Venkatraman wrote: Hi everybody, I would like to use Bio::Graphics in conjunction with Bio::SearchIO::Writer::HTMLResultWriter to obtain a HTML formatted blast report output along with an image of the blast hits as shown on Slide 60 in this pdf: http://jason.open-bio.org/Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf I am able to get the HTML output using "Bio::SearchIO::Writer::HTMLResultWriter" and I am able to get the image using the examples outlined in the Bio::Graphics HOWTO: http://www.bioperl.org/wiki/HOWTO:Graphics My question: How do I integrate Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter to render the blast hits image at the correct position in my BioPerl reformatted html file. I also found that someone else has asked something similar to whatever I am asking & is listed under the "Orphans, Leftovers" category in the ListSummary:April 26-May 9,2006 document: http://www.bioperl.org/wiki/ListSummary:April_26-May_9%2C2006#Orphans.2C_Leftovers Here is my code so far: ---------------------------------------------------------------- #!/usr/bin/perl -w # usage: $0 use strict; use Bio::SearchIO; use Bio::SearchIO::Writer::HTMLResultWriter; my $infile = shift or die $!; my $searchio = new Bio::SearchIO( -format => 'blast',-file => $infile); my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter(); my $outhtml = new Bio::SearchIO(-writer => $writerhtml, -file => ">${infile}.html"); $outhtml->write_result($searchio->next_result); ---------------------------------------------------------------- Thanks in advance, Anand --------------------------------- Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. 
_______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l --------------------------------- Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. From jason at bioperl.org Thu Apr 24 14:06:41 2008 From: jason at bioperl.org (Jason Stajich) Date: Thu, 24 Apr 2008 11:06:41 -0700 Subject: [Bioperl-l] Question on integrating Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter In-Reply-To: <795696.39415.qm@web36804.mail.mud.yahoo.com> References: <795696.39415.qm@web36804.mail.mud.yahoo.com> Message-ID: The overview graphic is generated basically from the script in scripts/graphics/search_overview.PLS So you'd have to run that on each report to generate the graphic, then use the other methods to insert images into each rendered HTML report. -jason On Apr 23, 2008, at 4:05 PM, Anand Venkatraman wrote: > Hi Jason, > > Thanks for the reply. > > I am a little lost with the solution suggested. Is that how slide > 60 in the pdf is obtained: http://jason.open-bio.org/ > Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf > > I guess I am missing something quite obvious, I apologize. > > What I have & want is this: I have a directory having say 100 > different blast reports & hence I am looking to obtain 100 > different bioperl formatted blast html outputs with the respective > images just as it would appear in the blast report. > > Thanks, > > Anand > > Jason Stajich wrote: > > Basically you want to inject your own IMG tags into the file with > these routines: > > > $writerhtml->start_report(\&my_start_report); > $writerhtml->title(\&my_title); > $writerhtml->hit_link_align(\&my_hit_link_align); > $writerhtml->hit_link_desc(\&my_hit_link_desc); > > > fgblast shows a way to do this in part. It relies on Gbrowse to > generate the image but you can replace the gbrowse_img reference to > your own image generating software. 
> http://people.genome.duke.edu/~jes12/software/scripts/fgblast > > > > > -jason > On Apr 22, 2008, at 11:02 PM, Anand Venkatraman wrote: > > Hi everybody, > > > I would like to use Bio::Graphics in conjunction with > Bio::SearchIO::Writer::HTMLResultWriter to obtain a HTML formatted > blast report output along with an image of the blast hits as shown > on Slide 60 in this pdf: http://jason.open-bio.org/ > Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf > > > I am able to get the HTML output using > "Bio::SearchIO::Writer::HTMLResultWriter" and I am able to get the > image using the examples outlined in the Bio::Graphics HOWTO: > http://www.bioperl.org/wiki/HOWTO:Graphics > > > My question: How do I integrate Bio::Graphics with > Bio::SearchIO::Writer::HTMLResultWriter to render the blast hits > image at the correct position in my BioPerl reformatted html file. > > > I also found that someone else has asked something similar to > whatever I am asking & is listed under the "Orphans, Leftovers" > category in the ListSummary:April 26-May 9,2006 document: > http://www.bioperl.org/wiki/ListSummary:April_26-May_9% > 2C2006#Orphans.2C_Leftovers > > > Here is my code so far: > ---------------------------------------------------------------- > #!/usr/bin/perl -w > # usage: $0 > use strict; > use Bio::SearchIO; > use Bio::SearchIO::Writer::HTMLResultWriter; > > > my $infile = shift or die $!; > > > my $searchio = new Bio::SearchIO( -format => 'blast',-file => > $infile); > my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter(); > my $outhtml = new Bio::SearchIO(-writer => $writerhtml, > -file => ">$ > {infile}.html"); > > > $outhtml->write_result($searchio->next_result); > ---------------------------------------------------------------- > > > Thanks in advance, > > > Anand > > > > > --------------------------------- > Be a better friend, newshound, and know-it-all with Yahoo! Mobile. > Try it now. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > --------------------------------- > Be a better friend, newshound, and know-it-all with Yahoo! Mobile. > Try it now. From 1zoujing at 163.com Wed Apr 16 22:53:16 2008 From: 1zoujing at 163.com (zoujing) Date: Wed, 16 Apr 2008 19:53:16 -0700 (PDT) Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags" In-Reply-To: References: <16602770.post@talk.nabble.com> <16603225.post@talk.nabble.com> Message-ID: <16737795.post@talk.nabble.com> Thank you very much! I splited the file on \t directly. Zou Jing Stefan Kirov-2 wrote: > > It is not. If you use this file, why would you need a parser for it > anyway? Just split on \t or read with OpenOffice or equiv. > Stefan > > On Thu, 10 Apr 2008, zoujing wrote: > >> >> Seached the web and found the answer now, quote the answer as following: >> The error was thrown by my Bio::ASN1::EntrezGene module because it >> expects a text file, while you fed it with a binary file. To use >> gzipped ASN binary file from NCBI, download the NCBI gene2xml >> (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml), >> then use this syntax to run my parser on the binary files: >> >> my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i >> Homo_sapiens.ags.gz -c -x -b | "); # Homo_sapiens.ags.gz is the gzipped >> binary file directly downloaded from NCBI >> >> Same syntax should be used when you're using SeqIO (thus >> SeqIO::entrezgene). >> Mingyi >> >> But there still one thing, I want to parse "gene_info.gz" in Gene of >> NCBI. It doesn't work.Is that means "gene_info.gz"( tab-delimited,one >> line >> per GeneID, Column header line is the first line in the file >> ) is not the right format for Bio::ASN1::EntrezGene? >> >> >> >> zoujing wrote: >>> >>> I am a geen hand in Bioperl. 
When I run perl with >>> "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error >>> information: >>> Data Error: none conforming data found on line 1 in Sus_scrofa.ags. >>> >>> But the Sus_scrofa.ags is download from NCBI, with the format of >>> ASN1, >>> should be the same as Homo_sapiens in the example. So it should be no >>> error as the code is the example from Mingyi. >>> I wonder why this happen, and should I change something about the >>> file? >>> >>> >> >> -- >> View this message in context: >> http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16603225.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16737795.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From 1zoujing at 163.com Wed Apr 16 22:55:47 2008 From: 1zoujing at 163.com (zoujing) Date: Wed, 16 Apr 2008 19:55:47 -0700 (PDT) Subject: [Bioperl-l] Bio::ASN1::EntrezGene parse so slowly? In-Reply-To: <264855a00804112050gf785c2ei66d9c7463597eccd@mail.gmail.com> References: <16602210.post@talk.nabble.com> <264855a00804112050gf785c2ei66d9c7463597eccd@mail.gmail.com> Message-ID: <16737804.post@talk.nabble.com> Thank you vey much! Solved the problem now. Jing Sean Davis-3 wrote: > > gene_info is a tab-delimited text file, if I recall correctly. Have > you looked at it? If it is, you should be able to parse it in a few > seconds with just a couple lines of code. 
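
The "couple lines of code" can be sketched in plain Perl, with no BioPerl parser involved. This is only an illustration: the column positions (tax_id, GeneID, Symbol as the first three fields) are an assumption to check against the header line of the actual file, the sub name symbols_for_taxon is made up, and the demo rows are invented rather than real gene_info content.

```perl
use strict;
use warnings;

# Read tab-delimited rows from a filehandle and return a hash reference of
# GeneID => Symbol for one taxon. Column positions (0 = tax_id, 1 = GeneID,
# 2 = Symbol) are an assumption; verify them against the file's header.
sub symbols_for_taxon {
    my ($fh, $tax_id) = @_;
    my %symbol_for;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;              # skip the header/comment line
        chomp $line;
        my @col = split /\t/, $line, -1;    # -1 keeps trailing empty fields
        next unless @col >= 3 && $col[0] eq $tax_id;
        $symbol_for{ $col[1] } = $col[2];
    }
    return \%symbol_for;
}

# Made-up rows, for illustration only (not real gene_info content).
my $demo = join '',
    "#tax_id\tGeneID\tSymbol\n",
    "9823\t1001\tGENE_A\n",
    "9606\t2002\tGENE_B\n",
    "9823\t1003\tGENE_C\n";

open my $fh, '<', \$demo or die "cannot open in-memory file: $!";
my $pig = symbols_for_taxon($fh, '9823');    # 9823 = Sus scrofa
close $fh;

print "$_\t$pig->{$_}\n" for sort { $a <=> $b } keys %$pig;
```

For the real file, the in-memory handle could be swapped for a pipe such as open(my $fh, '-|', 'gzip', '-dc', 'gene_info.gz'), so the 500 MB file is streamed line by line rather than loaded into memory.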
> > Sean > > > On Thu, Apr 10, 2008 at 1:08 AM, zoujing <1zoujing at 163.com> wrote: >> >> I want to parse a file "gene_info" from NCBI. The format of Gene in >> NCBI is >> ASN1, right? So I used Bio::ASN1::EntrezGene. But it didn't work >> properly/too slow. The file is about 500M. >> The code is following: >> use Bio::ASN1::EntrezGene; >> my $parser = Bio::ASN1::EntrezGene->new('file' => $ARGV[0]); >> my $i = 0; >> while(my $result = $parser->next_seq) >> { last; #something to do there, here use last for test} >> >> When it goes to the "while" part, it is processing on and on, it does >> not >> went out, even I used "last" in the "while" part. >> So I wonder whether it is too slow or the module is not fit for this >> job, >> or I did something wrong? >> >> Thank you! >> -- >> View this message in context: >> http://www.nabble.com/Bio%3A%3AASN1%3A%3AEntrezGene-parse-so-slowly--tp16602210p16602210.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Bio%3A%3AASN1%3A%3AEntrezGene-parse-so-slowly--tp16602210p16737804.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From sbassi at clubdelarazon.org Sat Apr 26 13:49:20 2008 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Sat, 26 Apr 2008 14:49:20 -0300 Subject: [Bioperl-l] bioperl installation problem Message-ID: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com> I tried to install bioperl because I need to install cviewer. Here (http://www.pastecode.com.ar/f37c1cd60) are both stdout and sdterr outputs. 
Here is one of the errors I get: set_attribute: not a compat02 graph at /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 10. sleeping for 3 seconds set_attribute: not a compat02 graph at /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 14. But I have GD::Graph, so I don't know what is going on: sbassi at ubuntuMAP:~$ sudo perl -MCPAN -e 'install GD::Graph' CPAN: Storable loaded ok Going to read /home/sbassi/.cpan/Metadata Database was generated on Fri, 25 Apr 2008 09:29:45 GMT GD::Graph is up to date. Any help regarding this: http://www.pastecode.com.ar/f37c1cd60 would be appreciated. Best, SB. -- Sebasti?n Bassi (???????). Diplomado en Ciencia y Tecnolog?a. Curso Biologia molecular para programadores: http://tinyurl.com/2vv8w6 Mostr? tu c?digo: http://www.pastecode.com.ar GPG Fingerprint: 9470 0980 620D ABFC BE63 A4A4 A3DE C97D 8422 D43D From jason at bioperl.org Sat Apr 26 15:23:37 2008 From: jason at bioperl.org (Jason Stajich) Date: Sat, 26 Apr 2008 12:23:37 -0700 Subject: [Bioperl-l] bioperl installation problem In-Reply-To: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com> References: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com> Message-ID: the error refers to the 'Graph' module not 'GD::Graph'; -jason On Apr 26, 2008, at 10:49 AM, Sebastian Bassi wrote: > I tried to install bioperl because I need to install cviewer. > Here (http://www.pastecode.com.ar/f37c1cd60) are both stdout and > sdterr outputs. > > Here is one of the errors I get: > > set_attribute: not a compat02 graph at > /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 10. > sleeping for 3 seconds > set_attribute: not a compat02 graph at > /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 14. 
> > But I have GD::Graph, so I don't know what is going on: > > sbassi at ubuntuMAP:~$ sudo perl -MCPAN -e 'install GD::Graph' > CPAN: Storable loaded ok > Going to read /home/sbassi/.cpan/Metadata > Database was generated on Fri, 25 Apr 2008 09:29:45 GMT > GD::Graph is up to date. > > Any help regarding this: http://www.pastecode.com.ar/f37c1cd60 > would be appreciated. > > Best, > SB. > > -- > Sebasti?n Bassi (???????). Diplomado en Ciencia y > Tecnolog?a. > Curso Biologia molecular para programadores: http://tinyurl.com/2vv8w6 > Mostr? tu c?digo: http://www.pastecode.com.ar > GPG Fingerprint: 9470 0980 620D ABFC BE63 A4A4 A3DE C97D 8422 D43D > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sbassi at clubdelarazon.org Sat Apr 26 17:08:13 2008 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Sat, 26 Apr 2008 18:08:13 -0300 Subject: [Bioperl-l] bioperl installation problem In-Reply-To: References: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com> Message-ID: <9e2f512b0804261408l45ff9f91j94f44065d21cd65f@mail.gmail.com> On Sat, Apr 26, 2008 at 4:23 PM, Jason Stajich wrote: > the error refers to the 'Graph' module not 'GD::Graph'; You are right, but I have it also installed: sbassi at ubuntuMAP:~$ sudo perl -MCPAN -e 'install Graph' Password: CPAN: Storable loaded ok Going to read /home/sbassi/.cpan/Metadata Database was generated on Fri, 25 Apr 2008 09:29:45 GMT Graph is up to date. -- Sebasti?n Bassi (???????). Diplomado en Ciencia y Tecnolog?a. Curso Biologia molecular para programadores: http://tinyurl.com/2vv8w6 Mostr? 
tu c?digo: http://www.pastecode.com.ar GPG Fingerprint: 9470 0980 620D ABFC BE63 A4A4 A3DE C97D 8422 D43D From bix at sendu.me.uk Sat Apr 26 19:30:56 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 27 Apr 2008 00:30:56 +0100 Subject: [Bioperl-l] bioperl installation problem In-Reply-To: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com> References: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com> Message-ID: <4813BB30.6060703@sendu.me.uk> Sebastian Bassi wrote: > I tried to install bioperl because I need to install cviewer. > Here (http://www.pastecode.com.ar/f37c1cd60) are both stdout and sdterr outputs. > > Here is one of the errors I get: > > set_attribute: not a compat02 graph at > /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 10. > sleeping for 3 seconds > set_attribute: not a compat02 graph at > /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 14. You're trying to install a very old version of Bioperl which apparently uses behaviour of the Graph module no longer supported: http://search.cpan.org/~jhi/Graph-0.84/lib/Graph.pod#Backward_compatibility_with_Graph_0.2 Your options are to force install your desired version of Bioperl (if you don't need to use the modules that are causing the errors you get), downgrade your version of Graph to pre-0.2, or install the latest version of Bioperl (1.5.2 or from svn). From dr.hogart at gmail.com Sun Apr 27 10:05:20 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Sun, 27 Apr 2008 18:05:20 +0400 Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics Message-ID: Hi all, is it possible to add a GD::graphic object (chart) to Bio::Graphics panel to obtain a file with image of both the chart and bioseq object? 
From Russell.Smithies at agresearch.co.nz Sun Apr 27 17:27:23 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 28 Apr 2008 09:27:23 +1200
Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
In-Reply-To:
References:
Message-ID:

You can get the GD object back from the Bio::Graphics::Panel and then
draw on it using GD methods.

Eg:

use GD;                        # provides gdSmallFont
use Bio::Graphics;
use Bio::SeqFeature::Generic;

# create a BioPerl panel
my $panel = Bio::Graphics::Panel->new(
    -length  => 600,
    -width   => 800,
    -bgcolor => 'white',
);

# add your features
my $feature = Bio::SeqFeature::Generic->new(-start => 1, -end => 200);
$panel->add_track($feature,
    -glyph   => 'segments',
    -label   => 0,
    -height  => 30,
    -bgcolor => 'red',
    -fgcolor => 'red',
);

# grab the GD thingy
my $gd = $panel->gd;

# create a color - not sure if there's a better way?
my $black = $gd->colorAllocate(0, 0, 0);

# draw on your GD thingy
$gd->line(10, 10, $panel->width - 10, 10, $black);
$gd->string(gdSmallFont, 20, 10, 'test', $black);

# print it as normal
print $panel->png;

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of sergei ryazansky
> Sent: Monday, 28 April 2008 2:05 a.m.
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
>
> Hi all,
>
> is it possible to add a GD::graphic object (chart) to Bio::Graphics panel
> to obtain a file with image of both the chart and bioseq object?
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material.
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From dr.hogart at gmail.com Sun Apr 27 20:25:18 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Mon, 28 Apr 2008 04:25:18 +0400 Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics References: Message-ID: Thanks for answer! Yours script works fine, but nevertheless, as for as I understand 'gd' method return the gd::image object. But I need the to merge bioseq object with gd::graph object (gd::graph::area). Is it possible? Or maybe I misunderstood something in your example? On Mon, 28 Apr 2008 01:27:23 +0400, Smithies, Russell wrote: > You can get the GD object back from the Bio::Graphics::Panel then draw > on it using GD methods > > Eg: > > #create a BioPerl panel > my $panel = Bio::Graphics::Panel->new( > -length => 600 > -width => 800, > -bgcolor => 'white' > ); > # add your features > my $feature = Bio::SeqFeature::Generic->new( -start => 1,-end => > 200,); > $panel->add_track($feature, glyph => 'segments', > -label => 0, > -height => 30, > -bgcolor => 'red', > -fgcolor => 'red' > ); > > # grab the GD thingy > my $gd = $panel->gd; > > #create a color - not sure if there's a better way? > $black = $gd->colorAllocate(0,0,0); > > #draw on your GD thingy > $gd->line(10,10,$panel->width -10,10,$black); > $gd->string(gdSmallFont,20,10,'test' ,'$black); > > # print it as normal > print $panel->png; > > > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open- >> bio.org] On Behalf Of sergei ryazansky >> Sent: Monday, 28 April 2008 2:05 a.m. 
>> To: bioperl-l at bioperl.org >> Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics >> >> Hi all, >> >> is it possible to add a GD::graphic object (chart) to Bio::Graphics > panel >> to obtain a file with image of both the chart and bioseq object? >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= From Bank.Beszteri at awi.de Mon Apr 28 08:18:20 2008 From: Bank.Beszteri at awi.de (=?UTF-8?B?QsOhbmsgQmVzenRlcmk=?=) Date: Mon, 28 Apr 2008 14:18:20 +0200 Subject: [Bioperl-l] Indexing large databases / BioSQL In-Reply-To: <47FB204F.90405@awi.de> References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> <47FB204F.90405@awi.de> Message-ID: <4815C08C.1060305@awi.de> Dear BioSQL / bioperl-db-ists, I would like to share my experiences with trying to load uniprot_trembl into a BioSQL db, and also to ask a couple of questions; perhaps some of you know the problems I encountered. I used bioperl-live and bioperl-db-live as of 2008-04-03 and uniprot_trembl.dat as of 2008-04-04. 
The command was like

load_seqdatabase.pl --safe --logchunk 1000 --host dbserv --dbname abc --dbuser efg --dbpass xyz --driver mysql --namespace uniprot_trembl --format embl uniprot_trembl.dat

although I split the dat file into 10 chunks and started them in parallel to make it faster.

This did not go quite as smoothly as Swissprot did. In the end, it seems to have loaded 5022284 entries of the 5443284 which appear to be there in the input file (when counting with grep -c "ID "). Besides the harmless taxonomy warnings which also appear with Swissprot (and have been discussed here a couple of weeks ago and also earlier), there came a couple of more serious errors. Perhaps some of you know them already:

First of all, the error below seems to lead to a crash, in spite of --safe:

>>>
------------- EXCEPTION -------------
MSG: A1XDT7 seems to have an invalid species classification.
STACK Bio::SeqIO::embl::_read_EMBL_Species /home/biocl/bbeszter/lib/bioperl-live/bioperl-live/Bio/SeqIO/embl.pm:1087
STACK Bio::SeqIO::embl::next_seq /home/biocl/bbeszter/lib/bioperl-live/bioperl-live/Bio/SeqIO/embl.pm:320
STACK toplevel /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:634
-------------------------------------

Command exited with non-zero status 255
<<<

What this is about is NCBI Tax_ID:435 (Acetobacter aceti; it has some 30 synonyms in my DB, too), which, to me, looks like a completely normal taxon: I could follow its taxonomy up to the root in my NCBI taxonomy in the BioSQL DB I used. I don't know if someone else has seen / can reproduce the problem, or should I think about some problem with my taxonomy db? Besides, is it the expected behaviour of load_seqdatabase.pl to die upon this error?

###################

The other problems did not lead to a crash, only to a failure to load the sequence, which would be what I'd expect with --safe.
The first type of errors looks like

>>>
Could not store Q49I36:
------------- EXCEPTION -------------
MSG: Unique key query in Bio::DB::BioSQL::SpeciesAdaptor returned 2 rows instead of 1. Query was [name_class="scientific name",binomial="Onchocerca volvulus"]
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:958
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:854
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
STACK Bio::DB::Persistent::PersistentObject::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/Persistent/PersistentObject.pm:244
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/Persistent/PersistentObject.pm:271
STACK (eval) /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:630
STACK toplevel /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:612
-------------------------------------
<<<

In this particular case, "Onchocerca volvulus" does indeed have two taxon_ids in my DB (6282 and 563188, of which only the first one is returned by a web search at NCBI taxonomy); but the same thing happened with a number of other taxa (followed by how many times the above error was caused by the particular taxa):

Wolbachia pipientis            64
Hemerocallis sp.                1
Hypsiglena torquata             3
Salmonella enterica          1211
Burkholderia sp.               31
Streptococcus sp.               4
Rhizobium sp.                 600
Nostoc sp.                     19
Drosophila sp.                 18
Onchocerca volvulus            62
Atlapetes schistaceus           4
Symbiodinium sp.                3
Escherichia coli             7421
Hieraaetus fasciatus            4
Borrelia burgdorferi group      1
Pseudomonas sp.                29
Rotavirus A                  1076
Gorilla gorilla               746
Rana plancyi                   14
unclassified sequences          1

(This should be 11312 cases altogether, but the list might be incomplete because I accidentally removed one of my logs, which contained STDOUT & STDERR for ~ 10 % of the entries)

Again, is this a known problem for some of you, or could there be a problem with my copy of NCBI taxonomy? I don't remember having updated it after the initial upload, so I'm quite surprised by such duplicate entries....

###################

Type 2 error w/o crash:

>>>
Could not store A5HU09:
------------- EXCEPTION -------------
MSG: create: object (Bio::Species) failed to insert or to be found by unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK Bio::DB::Persistent::PersistentObject::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/Persistent/PersistentObject.pm:244
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/Persistent/PersistentObject.pm:271
STACK (eval) /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:630
STACK toplevel /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:612
<<<

This particular record has the NCBI_TaxID 44271, which looks completely normal in the NCBI taxonomy loaded in my BioSQL DB, but the same problem appeared in 53 further cases (I could not
look into them in detail yet to see whether they were all the same species). On the other hand, 7 records which were successfully loaded have this taxonomy ID in the DB (44271).

###################

Nr 3, no crash:

>>>
Could not store Q6T859:
Unmatched ( in regex; marked by <-- HERE in m/Camelina microcarpa (Littlepod false flax) ( <-- HERE microcarpa subsp.\s+/ at /home/biocl/bbeszter/lib/bioperl-live/bioperl-live/Bio/Species.pm line 466, line 357048.
<<<

This happens in the sub binomial in Species.pm when it is called with the option "FULL", which requests that the subspecies be returned as well. I have not looked much deeper into this yet, but is it possible that there is a parsing problem with multi-line species strings? In the above case the OS field in uniprot_trembl.dat looks like

OS   Camelina microcarpa (Littlepod false flax) (Camelina microcarpa subsp.
OS   sylvestris).

###################

I'm still looking for where the remaining records disappeared: of the 421000 records not showing up in the DB, I could find these:

crasher (Tax_ID=435): 45 entries
problem 1 ("MSG: Unique key query in Bio::DB::BioSQL::SpeciesAdaptor returned 2 rows instead of 1."): 11312 entries
problem 2 ("MSG: create: object (Bio::Species) failed to insert or to be found by unique key"): 54 entries
problem 3 ("Unmatched ( in regex"): 28241 entries

381348 still remain... These could in principle come from the first 10 %, for which I don't have the output, but they don't seem to: after restarting that chunk, I get ~ 30 "Could not store" errors.

So the last question: are there any error messages I can expect which don't contain "Could not store" and which I thus missed here?
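
The per-category counts in a summary like the one above can be pulled out of the loader logs mechanically. What follows is only a minimal sketch under stated assumptions: it assumes that each failure starts with a "Could not store <accession>:" line and that the next line which is neither blank nor a dashed banner carries the message, as in the excerpts in this thread. `tally_failures` is a hypothetical helper (not part of bioperl-db), and the sample log is abridged from the messages quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: count "Could not store" failures, grouped by the
# first 40 characters of the message line that follows each marker.
sub tally_failures {
    my ($fh) = @_;
    my %count;
    my $awaiting = 0;    # true right after a "Could not store" line
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^Could not store \S+:/) {
            $awaiting = 1;
            next;
        }
        next unless $awaiting;
        # skip blank lines and the dashed EXCEPTION banner
        next if $line =~ /^\s*$/ or $line =~ /^-+/;
        (my $msg = $line) =~ s/^MSG:\s*//;
        $count{ substr($msg, 0, 40) }++;
        $awaiting = 0;
    }
    return \%count;
}

# Sample input, abridged from the log excerpts in this thread.
my $sample = <<'LOG';
Could not store Q49I36:
------------- EXCEPTION -------------
MSG: Unique key query in Bio::DB::BioSQL::SpeciesAdaptor returned 2 rows instead of 1.
Could not store A5HU09:
------------- EXCEPTION -------------
MSG: create: object (Bio::Species) failed to insert or to be found by unique key
Could not store Q6T859:
Unmatched ( in regex; marked by <-- HERE in m/Camelina microcarpa
LOG

open my $fh, '<', \$sample or die "cannot open in-memory log: $!";
my $counts = tally_failures($fh);
close $fh;

# Print the categories, most frequent first.
printf "%6d  %s\n", $counts->{$_}, $_
    for sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b } keys %$counts;
```

Grouping on the first 40 characters is a crude way to fold together messages that differ only in a record-specific tail (the query text, for example). To run it on a real log, open the log file and pass that filehandle instead of the in-memory one. Errors that end the run without a preceding "Could not store" line, such as the parser exception for A1XDT7, are not counted by this approach.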
Bank Beszteri
Bioinformatics
Alfred Wegener Institute for Polar and Marine Research
Am Handelshafen 12
27570 Bremerhaven

From cjfields at uiuc.edu Mon Apr 28 09:20:39 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Apr 2008 08:20:39 -0500
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <4815C08C.1060305@awi.de>
References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> <47FB204F.90405@awi.de> <4815C08C.1060305@awi.de>
Message-ID: <5C383B1F-92AD-4194-B9B4-007AE51A092F@uiuc.edu>

On Apr 28, 2008, at 7:18 AM, Bánk Beszteri wrote:

> Dear BioSQL / bioperl-db-ists,
>
> I would like to share my experiences with trying to load
> uniprot_trembl into a BioSQL db, and also to ask a couple of
> questions; perhaps some of you know the problems I encountered. I
> used bioperl-live and bioperl-db-live as of 2008-04-03 and
> uniprot_trembl.dat as of 2008-04-04. The command was like
>
> load_seqdatabase.pl --safe --logchunk 1000 --host dbserv --dbname
> abc --dbuser efg --dbpass xyz --driver mysql --namespace
> uniprot_trembl --format embl uniprot_trembl.dat
>
> ....
>
> First of all, the below error seems to lead to a crash, in spite of
> --safe:
>
> >>>
> ------------- EXCEPTION -------------
> MSG: A1XDT7 seems to have an invalid species classification.
> STACK Bio::SeqIO::embl::_read_EMBL_Species /home/biocl/bbeszter/lib/bioperl-live/bioperl-live/Bio/SeqIO/embl.pm:1087
> STACK Bio::SeqIO::embl::next_seq /home/biocl/bbeszter/lib/bioperl-live/bioperl-live/Bio/SeqIO/embl.pm:320
> STACK toplevel /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:634
> -------------------------------------
>
> Command exited with non-zero status 255
> <<<
>
> What this is about is NCBI Tax_ID:435 (Acetobacter aceti; it has
> some 30 synonyms in my DB, too), which, to me, looks like a
> completely normal taxon: I could follow its taxonomy up to the root
> in my NCBI taxonomy in the BioSQL DB I used. I don't know if someone
> else has seen / can reproduce the problem, or should I think about
> some problem with my taxonomy db? Besides, is it the expected
> behaviour from load_seqdatabase.pl to die upon this error?

...

You should use 'swiss' format instead of 'embl' when loading Uniprot/SwissProt sequences. Though on the surface they're similar, the feature table (among other things) is completely different. I'm not sure if that's causing all of the issues here but it certainly could contribute to them.

In the meantime, it's much easier for us to track these problems if you file a bug (BioPerl, file for bioperl-db):

http://bugzilla.open-bio.org/

chris

From cjfields at uiuc.edu Sun Apr 27 17:54:03 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 27 Apr 2008 16:54:03 -0500
Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
In-Reply-To:
References:
Message-ID:

I think this is how some of the synteny mapping is done using SynBrowse (the trapezoids connecting syntenous genes on different tracks).

http://www.gmod.org/wiki/index.php/SynView

chris

On Apr 27, 2008, at 4:27 PM, Smithies, Russell wrote:

> You can get the GD object back from the Bio::Graphics::Panel then
> draw on it using GD methods

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From Bank.Beszteri at awi.de Mon Apr 28 09:51:53 2008
From: Bank.Beszteri at awi.de (Bánk Beszteri)
Date: Mon, 28 Apr 2008 15:51:53 +0200
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <5C383B1F-92AD-4194-B9B4-007AE51A092F@uiuc.edu>
References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> <47FB204F.90405@awi.de> <4815C08C.1060305@awi.de> <5C383B1F-92AD-4194-B9B4-007AE51A092F@uiuc.edu>
Message-ID: <4815D679.3070307@awi.de>

Chris Fields wrote:

> ...
>
> You should use 'swiss' format instead of 'embl' when loading
> Uniprot/SwissProt sequences. Though on the surface they're similar
> the feature table (among other things) is completely different. I'm
> not sure if that's causing all of the issues here but it certainly
> could contribute to them.
>
> In the meantime, it's much easier for us to track these problems if
> you file a bug (BioPerl, file for bioperl-db):
>
> http://bugzilla.open-bio.org/

Hi Chris,

I will do so; in the meanwhile: I'm not loading Swissprot, but TrEMBL. Is swiss also the appropriate format here? By reading http://expasy.org/sprot/userman.html#diffEMBL, I concluded that embl should be the one I'd need for TrEMBL.

Bank

From cjfields at uiuc.edu Mon Apr 28 12:24:39 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Apr 2008 11:24:39 -0500
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <4815D679.3070307@awi.de>
References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> <47FB204F.90405@awi.de> <4815C08C.1060305@awi.de> <5C383B1F-92AD-4194-B9B4-007AE51A092F@uiuc.edu> <4815D679.3070307@awi.de>
Message-ID:

On Apr 28, 2008, at 8:51 AM, Bánk Beszteri wrote:

> Chris Fields wrote:
>>
>> ...
>>
>> You should use 'swiss' format instead of 'embl' when loading
>> Uniprot/SwissProt sequences.
>> Though on the surface they're similar
>> the feature table (among other things) is completely different.
>> I'm not sure if that's causing all of the issues here but it
>> certainly could contribute to them.
>>
>> In the meantime, it's much easier for us to track these problems if
>> you file a bug (BioPerl, file for bioperl-db):
>>
>> http://bugzilla.open-bio.org/
>>
> Hi Chris,
>
> I will do so; in the meanwhile: I'm not loading Swissprot, but
> TrEMBL. Is swiss also the appropriate format here? By reading
> http://expasy.org/sprot/userman.html#diffEMBL, I concluded that embl
> should be the one I'd need for TrEMBL.
>
> Bank

The section you link to describes several important differences between EMBL and SwissProt/UniProt format (i.e. how each indicated line type differs between SwissProt and EMBL formats, including ID, AC, OS/OC, FT, etc). I'm unsure how you concluded from that that 'embl' would work: the formats are close, but there are enough significant differences that using 'embl' for SwissProt (or vice versa) will not work as intended, if at all.

chris

From hlapp at gmx.net Mon Apr 28 15:46:07 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 28 Apr 2008 15:46:07 -0400
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <4815D679.3070307@awi.de>
References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> <47FB204F.90405@awi.de> <4815C08C.1060305@awi.de> <5C383B1F-92AD-4194-B9B4-007AE51A092F@uiuc.edu> <4815D679.3070307@awi.de>
Message-ID: <3BD6A261-D023-4A5F-9CBC-C3216B0145F0@gmx.net>

On Apr 28, 2008, at 9:51 AM, Bánk Beszteri wrote:

> I'm not loading Swissprot, but TrEMBL. Is swiss also the
> appropriate format here?

Yes, though I guess it can be confusing. Maybe we should create a symlink uniprot.pm to swiss.pm, or in fact fork them if UniProt starts accumulating enough differences from the traditional Swissprot format.

BTW as you had noticed, the --safe switch only protects the script from crashing due to a db loading error.
A parsing error will still cause a crash. I guess you can argue that that's not nice, and having a chance to skip over the record that offends the (BioPerl) parser would be useful. The problem is that if the parser errors out, it's not guaranteed where we are in the file and whether the parser module is in a state that it can recover from. For the database it's a bit easier as one just needs to rollback() the transaction (each sequence is its own transaction).

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From Russell.Smithies at agresearch.co.nz Mon Apr 28 17:15:16 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 29 Apr 2008 09:15:16 +1200
Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
In-Reply-To:
References:
Message-ID:

I thought it was a bit of a hack but I guess if someone else is doing it too, it can't be all bad :-)

It looks like you can combine your drawing methods like this:
(I'm sure Lincoln will tell us this is bad but it seems to work ok)
-------------------------------------------------------------------------------------

#!perl -w
use GD::Graph::lines;
use GD::Graph::colour;
use GD::Graph::Data;

use Bio::Graphics;
use Bio::SeqFeature::Generic;

# create and draw on a graphics panel
my $panel = Bio::Graphics::Panel->new(
    -length => 500,
    -width  => 500
);
my $track = $panel->add_track(
    -glyph => 'generic',
    -label => 1
);

# create and add a few features
for (my $i = 100; $i < 500; $i += 100) {
    my $feature = Bio::SeqFeature::Generic->new(
        -display_name => "feature: $i",
        -score        => $i,
        -start        => $i,
        -end          => $i + 100
    );
    $track->add_feature($feature);
}

# create and draw the graph
my @data = (
    ["1st","2nd","3rd","4th","5th","6th","7th", "8th", "9th"],
    [ 1, 2, 5, 6, 3, 1.5, 1, 3, 4],
    [ sort { $a <=> $b } (1, 2, 5, 6, 3, 1.5, 1, 3, 4) ]
);
my $graph = GD::Graph::lines->new(500, 300);

$graph->set(
    x_label       => 'X Label',
    y_label       => 'Y label',
    title         => 'Some simple graph',
    y_max_value   => 8,
    y_tick_number => 8,
    y_label_skip  => 2
) or die $graph->error;

$graph->set( dclrs => [ qw( green blue black red pink) ] );

my $gd = $graph->plot(\@data) or die $graph->error;

# combine the two images: draw the panel onto the graph's GD object
my $combined = $panel->gd($gd);

open(IMG, '>file.png') or die $!;
binmode IMG;
print IMG $combined->png;

-------------------------------------------------------------------------------------

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Monday, 28 April 2008 9:54 a.m.
> To: Smithies, Russell
> Cc: sergei ryazansky; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
>
> I think this is how some of the synteny mapping is done using
> SynBrowse (the trapezoids connecting syntenous genes on different
> tracks).
>
> http://www.gmod.org/wiki/index.php/SynView
>
> chris

From lincoln.stein at gmail.com Mon Apr 28 17:33:19 2008
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Mon, 28 Apr 2008 17:33:19 -0400
Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
In-Reply-To:
References:
Message-ID: <6dce9a0b0804281433i697cda2fo2c47ce59010d0858@mail.gmail.com>

Hi,

No, I'm perfectly happy with combining images like this. It is part of what I intended.

Another idea would be to use the Image glyph to embed graphs at particular genomic locations in the panel. Right now the glyph is designed in the expectation that the image passed to it is sitting on the file system (or a web URL), but it would be easy to modify it so that a callback can generate the GD on the fly, by using, for example GD::Graph.
Lincoln On Mon, Apr 28, 2008 at 5:15 PM, Smithies, Russell < Russell.Smithies at agresearch.co.nz> wrote: > I thought it was a bit of a hack but I guess if someone else is doing it > too, it can't be all bad :-) > > It looks like you can combine your drawing methods like this: > (I'm sure Lincoln will tell us this is bad but it seems to work ok) > ------------------------------------------------------------------------ > ------------- > > #!perl -w > use GD::Graph::lines; > use GD::Graph::colour; > use GD::Graph::Data; > > use Bio::Graphics; > use Bio::SeqFeature::Generic; > > # create and draw on a graphics panel > my $panel = Bio::Graphics::Panel->new( > -length => 500, > -width => 500 > ); > my $track = $panel->add_track( > -glyph => 'generic', > -label => 1 > ); > > # create and add a few features > for($i = 100; $i < 500; $i+= 100){ > my $feature = Bio::SeqFeature::Generic->new( > -display_name => "feature: > $i", > -score => $i, > -start => $i, > -end => $i + 100 > ); > $track->add_feature($feature); > } > > > # create and draw the graph > my @data = ( > ["1st","2nd","3rd","4th","5th","6th","7th", "8th", "9th"], > [ 1, 2, 5, 6, 3, 1.5, 1, 3, 4], > [ sort { $a <=> $b } (1, 2, 5, 6, 3, 1.5, 1, 3, 4) ] > ); > my $graph = GD::Graph::lines->new(500, 300); > > $graph->set( > x_label => 'X Label', > y_label => 'Y label', > title => 'Some simple graph', > y_max_value => 8, > y_tick_number => 8, > y_label_skip => 2 > ) or die $graph->error; > > $graph->set( dclrs => [ qw( green blue black red pink) ] ); > > my $gd = $graph->plot(\@data) or die $graph->error; > > # combine the two images > my $combined = $panel->gd($gd); > > open(IMG, '>file.png') or die $!; > binmode IMG; > print IMG $combined->png; > > ------------------------------------------------------------------------ > ------------------ > > > -----Original Message----- > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > Sent: Monday, 28 April 2008 9:54 a.m. 
> > To: Smithies, Russell > > Cc: sergei ryazansky; bioperl-l at bioperl.org > > Subject: Re: [Bioperl-l] addition of GD::graphic object to > Bio::Graphics > > > > I think this is how some of the synteny mapping is done using > > SynBrowse (the trapezoids connecting syntenous genes on different > > tracks). > > > > http://www.gmod.org/wiki/index.php/SynView > > > > chris > > > > On Apr 27, 2008, at 4:27 PM, Smithies, Russell wrote: > > > > > You can get the GD object back from the Bio::Graphics::Panel then > > > draw > > > on it using GD methods > > > > > > Eg: > > > > > > #create a BioPerl panel > > > my $panel = Bio::Graphics::Panel->new( > > > -length => 600 > > > -width => > 800, > > > -bgcolor => 'white' > > > ); > > > # add your features > > > my $feature = Bio::SeqFeature::Generic->new( -start => 1,-end => > > > 200,); > > > $panel->add_track($feature, glyph => 'segments', > > > -label => 0, > > > -height => 30, > > > -bgcolor => 'red', > > > -fgcolor => 'red' > > > ); > > > > > > # grab the GD thingy > > > my $gd = $panel->gd; > > > > > > #create a color - not sure if there's a better way? > > > $black = $gd->colorAllocate(0,0,0); > > > > > > #draw on your GD thingy > > > $gd->line(10,10,$panel->width -10,10,$black); > > > $gd->string(gdSmallFont,20,10,'test' ,'$black); > > > > > > # print it as normal > > > print $panel->png; > > > > > > > > > > > > > > >> -----Original Message----- > > >> From: bioperl-l-bounces at lists.open-bio.org > > > [mailto:bioperl-l-bounces at lists.open- > > >> bio.org] On Behalf Of sergei ryazansky > > >> Sent: Monday, 28 April 2008 2:05 a.m. > > >> To: bioperl-l at bioperl.org > > >> Subject: [Bioperl-l] addition of GD::graphic object to > Bio::Graphics > > >> > > >> Hi all, > > >> > > >> is it possible to add a GD::graphic object (chart) to Bio::Graphics > > > panel > > >> to obtain a file with image of both the chart and bioseq object? 
> > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > = > > > > > ============================================================= > > ========= > > > Attention: The information contained in this message and/or > > > attachments > > > from AgResearch Limited is intended only for the persons or entities > > > to which it is addressed and may contain confidential and/or > > > privileged > > > material. Any review, retransmission, dissemination or other use of, > > > or > > > taking of any action in reliance upon, this information by persons > or > > > entities other than the intended recipients is prohibited by > > > AgResearch > > > Limited. If you have received this message in error, please notify > the > > > sender immediately. > > > = > > > > > ============================================================= > > ========= > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From dr.hogart at gmail.com Tue Apr 29 03:56:24 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Tue, 29 Apr 2008 11:56:24 +0400 Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics References: Message-ID: Thank you very much! It is exactly that I was looking for. On Tue, 29 Apr 2008 01:15:16 +0400, Smithies, Russell wrote: > I thought it was a bit of a hack but I guess if someone else is doing it > too, it can't be all bad :-) > > It looks like you can combine your drawing methods like this: > (I'm sure Lincoln will tell us this is bad but it seems to work ok) > ------------------------------------------------------------------------ > ------------- > > #!perl -w > use GD::Graph::lines; > use GD::Graph::colour; > use GD::Graph::Data; > > use Bio::Graphics; > use Bio::SeqFeature::Generic; > > # create and draw on a graphics panel > my $panel = Bio::Graphics::Panel->new( > -length => 500, > -width => 500 > ); > my $track = $panel->add_track( > -glyph => 'generic', > -label => 1 > ); > > # create and add a few features > for($i = 100; $i < 500; $i+= 100){ > my $feature = Bio::SeqFeature::Generic->new( > -display_name => "feature: > $i", > -score => $i, > -start => $i, > -end => $i + 100 > ); > $track->add_feature($feature); > } > > > # create and draw the graph > my @data = ( > ["1st","2nd","3rd","4th","5th","6th","7th", "8th", "9th"], > [ 1, 2, 5, 6, 3, 1.5, 1, 3, 4], > [ sort { $a <=> $b } (1, 2, 5, 6, 3, 1.5, 1, 3, 4) ] > ); > my $graph = 
GD::Graph::lines->new(500, 300); > > $graph->set( > x_label => 'X Label', > y_label => 'Y label', > title => 'Some simple graph', > y_max_value => 8, > y_tick_number => 8, > y_label_skip => 2 > ) or die $graph->error; > > $graph->set( dclrs => [ qw( green blue black red pink) ] ); > > my $gd = $graph->plot(\@data) or die $graph->error; > > # combine the two images > my $combined = $panel->gd($gd); > > open(IMG, '>file.png') or die $!; > binmode IMG; > print IMG $combined->png; > > ------------------------------------------------------------------------ > ------------------ > >> -----Original Message----- >> From: Chris Fields [mailto:cjfields at uiuc.edu] >> Sent: Monday, 28 April 2008 9:54 a.m. >> To: Smithies, Russell >> Cc: sergei ryazansky; bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] addition of GD::graphic object to > Bio::Graphics >> >> I think this is how some of the synteny mapping is done using >> SynBrowse (the trapezoids connecting syntenous genes on different >> tracks). >> >> http://www.gmod.org/wiki/index.php/SynView >> >> chris >> >> On Apr 27, 2008, at 4:27 PM, Smithies, Russell wrote: >> >> > You can get the GD object back from the Bio::Graphics::Panel then >> > draw >> > on it using GD methods >> > >> > Eg: >> > >> > #create a BioPerl panel >> > my $panel = Bio::Graphics::Panel->new( >> > -length => 600 >> > -width => > 800, >> > -bgcolor => 'white' >> > ); >> > # add your features >> > my $feature = Bio::SeqFeature::Generic->new( -start => 1,-end => >> > 200,); >> > $panel->add_track($feature, glyph => 'segments', >> > -label => 0, >> > -height => 30, >> > -bgcolor => 'red', >> > -fgcolor => 'red' >> > ); >> > >> > # grab the GD thingy >> > my $gd = $panel->gd; >> > >> > #create a color - not sure if there's a better way? 
>> > $black = $gd->colorAllocate(0,0,0); >> > >> > #draw on your GD thingy >> > $gd->line(10,10,$panel->width -10,10,$black); >> > $gd->string(gdSmallFont,20,10,'test', $black); >> > >> > # print it as normal >> > print $panel->png; >> > >> > >> > >> > >> >> -----Original Message----- >> >> From: bioperl-l-bounces at lists.open-bio.org >> > [mailto:bioperl-l-bounces at lists.open- >> >> bio.org] On Behalf Of sergei ryazansky >> >> Sent: Monday, 28 April 2008 2:05 a.m. >> >> To: bioperl-l at bioperl.org >> >> Subject: [Bioperl-l] addition of GD::graphic object to > Bio::Graphics >> >> >> >> Hi all, >> >> >> >> is it possible to add a GD::graphic object (chart) to Bio::Graphics >> > panel >> >> to obtain a file with image of both the chart and bioseq object? >> >> >> >> _______________________________________________ >> >> Bioperl-l mailing list >> >> Bioperl-l at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > = >> > >> ============================================================= >> ========= >> > Attention: The information contained in this message and/or >> > attachments >> > from AgResearch Limited is intended only for the persons or entities >> > to which it is addressed and may contain confidential and/or >> > privileged >> > material. Any review, retransmission, dissemination or other use of, >> > or >> > taking of any action in reliance upon, this information by persons > or >> > entities other than the intended recipients is prohibited by >> > AgResearch >> > Limited. If you have received this message in error, please notify > the >> > sender immediately. >> > = >> > >> ============================================================= >> ========= >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr.
Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= From d.gatherer at mrcvu.gla.ac.uk Tue Apr 29 08:21:05 2008 From: d.gatherer at mrcvu.gla.ac.uk (Derek Gatherer) Date: Tue, 29 Apr 2008 13:21:05 +0100 Subject: [Bioperl-l] translate() oddities Message-ID: Hi I thought I'd better run this by the community before I embarrass myself on Bugzilla. It seems like a clear bug to me. I'm running Bioperl 1.5.0 on RedHat. For a test input: >test ATGATGATGATGATGTGA the following code is fine. while((my $seqobj = $seq_in->next_seq())) { print "\n".$seqobj->display_id; my $len = $seqobj->length(); print " length: $len"; my $frame1_obj = $seqobj->translate(); my $f1_prot = $frame1_obj->seq(); print "\n$f1_prot"; } Output: test length: 18 MMMMM* But if I want to change the frame as specified in the BioPerl tutorial, by using: my $frame1_obj = $seqobj->translate(frame => 1); # which should now give frame 2, I get: test length: 18 MMMMM-frame The frame is unchanged and the text "-frame" is tacked on the end of the output. The same occurs with translate(frame => 2). Any ideas? Can something as fundamental as translate() really be bugged?
or am I guilty of some particularly heinous syntax error? Cheers Derek From tristan.lefebure at gmail.com Tue Apr 29 09:58:21 2008 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Tue, 29 Apr 2008 09:58:21 -0400 Subject: [Bioperl-l] translate() oddities In-Reply-To: References: Message-ID: <200804290958.21548.tristan.lefebure@gmail.com> Aren't you forgetting the dash? my $frame1_obj = $seqobj->translate(-frame => 1) On Tuesday 29 April 2008 08:21:05 Derek Gatherer wrote: > my $frame1_obj = $seqobj->translate(frame => 1) -Tristan From d.gatherer at mrcvu.gla.ac.uk Tue Apr 29 10:05:03 2008 From: d.gatherer at mrcvu.gla.ac.uk (Derek Gatherer) Date: Tue, 29 Apr 2008 15:05:03 +0100 Subject: [Bioperl-l] translate() oddities In-Reply-To: <481726BF.1060609@bms.com> References: <481726BF.1060609@bms.com> Message-ID: Thanks Stefan Actually, there was a typo in my message, I did use -frame => 1. However, the problem disappears on upgrading from 1.5.0 to 1.5.2. So not a bug anymore. Cheers Derek At 14:46 29/04/2008, Stefan Kirov wrote: >my $frame1_obj = $seqobj->translate(-frame => 1); >not >my $frame1_obj = $seqobj->translate(frame => 1); >Stefan > >Derek Gatherer wrote: > > Hi > > > > I thought I'd better run this by the community before I embarrass > > myself on Bugzilla. It seems like a clear bug to me. I'm running > > Bioperl 1.5.0 on RedHat. > > > > For a test input: > > > > >test > > ATGATGATGATGATGTGA > > > > the following code is fine. 
> > > > while((my $seqobj = $seq_in->next_seq())) > > { > > print "\n".$seqobj->display_id; > > my $len = $seqobj->length(); > > print " length: $len"; > > my $frame1_obj = $seqobj->translate(); > > my $f1_prot = $frame1_obj->seq(); > > print "\n$f1_prot"; > > } > > > > Output: > > > > test length: 18 > > MMMMM* > > > > But if I want to change the frame as specified in the BioPerl > > tutorial, by using: > > > > my $frame1_obj = $seqobj->translate(frame => 1); # which should now > > give frame 2, I get: > > > > test length: 18 > > MMMMM-frame > > > > The frame is unchanged and the text "-frame" is tacked on the end of > > the output. The same occurs with translate(frame => 2). > > > > Any ideas? Can something as fundamental as translate() really be > > bugged? or am I guilty of some particularly heinous syntax error? > > > > Cheers > > Derek > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From l.douchy at gmail.com Tue Apr 29 10:16:40 2008 From: l.douchy at gmail.com (Laurent DOUCHY) Date: Tue, 29 Apr 2008 16:16:40 +0200 Subject: [Bioperl-l] translate() oddities In-Reply-To: <200804290958.21548.tristan.lefebure@gmail.com> References: <200804290958.21548.tristan.lefebure@gmail.com> Message-ID: <2fb209dd0804290716x36e403dek55978dc4f54e34ff@mail.gmail.com> Hello, I resolved this issue in Bio::seqIO with the following line : my $sequence = $seq->translate('*', 'X', '0', '1', '0', '0', '0', '0')->seq; the third parameter set the frame. I hope to have been helpful. laurent. On Tue, Apr 29, 2008 at 3:58 PM, Tristan Lefebure < tristan.lefebure at gmail.com> wrote: > Aren't you forgetting the dash? 
> > my $frame1_obj = $seqobj->translate(-frame => 1) > > > On Tuesday 29 April 2008 08:21:05 Derek Gatherer wrote: > > my $frame1_obj = $seqobj->translate(frame => 1) > > > > -Tristan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From roy.chaudhuri at gmail.com Tue Apr 29 10:27:10 2008 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 29 Apr 2008 15:27:10 +0100 Subject: [Bioperl-l] translate() oddities In-Reply-To: References: <481726BF.1060609@bms.com> Message-ID: <4817303E.1040903@gmail.com> Spent two minutes looking at this, so may as well chip in with what I discovered even though you solved your problem. This "bug" comes about because in version 1.5.1 and earlier, the arguments to translate were a simple list, with the first argument the terminator (defaults to "*"). Your old version therefore assumed that you wanted to translate the stop codon to "-frame". Amusingly given your typo, if you miss the hyphen off the frame argument in version 1.5.2 it reverts to the old interface and you end up with the output "MMMMMframe". The moral of the story is of course to read the docs relevant to the version you are using. Roy. -- Dr. Roy Chaudhuri Department of Veterinary Medicine University of Cambridge, U.K. Derek Gatherer wrote: > Thanks Stefan > > Actually, there was a typo in my message, I did use -frame => > 1. However, the problem disappears on upgrading from 1.5.0 to 1.5.2. > > So not a bug anymore. > > Cheers > Derek > > At 14:46 29/04/2008, Stefan Kirov wrote: >> my $frame1_obj = $seqobj->translate(-frame => 1); >> not >> my $frame1_obj = $seqobj->translate(frame => 1); >> Stefan >> >> Derek Gatherer wrote: >>> Hi >>> >>> I thought I'd better run this by the community before I embarrass >>> myself on Bugzilla. It seems like a clear bug to me. I'm running >>> Bioperl 1.5.0 on RedHat. 
>>> >>> For a test input: >>> >>>> test >>> ATGATGATGATGATGTGA >>> >>> the following code is fine. >>> >>> while((my $seqobj = $seq_in->next_seq())) >>> { >>> print "\n".$seqobj->display_id; >>> my $len = $seqobj->length(); >>> print " length: $len"; >>> my $frame1_obj = $seqobj->translate(); >>> my $f1_prot = $frame1_obj->seq(); >>> print "\n$f1_prot"; >>> } >>> >>> Output: >>> >>> test length: 18 >>> MMMMM* >>> >>> But if I want to change the frame as specified in the BioPerl >>> tutorial, by using: >>> >>> my $frame1_obj = $seqobj->translate(frame => 1); # which should now >>> give frame 2, I get: >>> >>> test length: 18 >>> MMMMM-frame >>> >>> The frame is unchanged and the text "-frame" is tacked on the end of >>> the output. The same occurs with translate(frame => 2). >>> >>> Any ideas? Can something as fundamental as translate() really be >>> bugged? or am I guilty of some particularly heinous syntax error? >>> >>> Cheers >>> Derek >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From stefan.kirov at bms.com Tue Apr 29 09:46:39 2008 From: stefan.kirov at bms.com (Stefan Kirov) Date: Tue, 29 Apr 2008 09:46:39 -0400 Subject: [Bioperl-l] translate() oddities In-Reply-To: References: Message-ID: <481726BF.1060609@bms.com> my $frame1_obj = $seqobj->translate(-frame => 1); not my $frame1_obj = $seqobj->translate(frame => 1); Stefan Derek Gatherer wrote: > Hi > > I thought I'd better run this by the community before I embarrass > myself on Bugzilla. It seems like a clear bug to me. I'm running > Bioperl 1.5.0 on RedHat. > > For a test input: > > >test > ATGATGATGATGATGTGA > > the following code is fine. 
> > while((my $seqobj = $seq_in->next_seq())) > { > print "\n".$seqobj->display_id; > my $len = $seqobj->length(); > print " length: $len"; > my $frame1_obj = $seqobj->translate(); > my $f1_prot = $frame1_obj->seq(); > print "\n$f1_prot"; > } > > Output: > > test length: 18 > MMMMM* > > But if I want to change the frame as specified in the BioPerl > tutorial, by using: > > my $frame1_obj = $seqobj->translate(frame => 1); # which should now > give frame 2, I get: > > test length: 18 > MMMMM-frame > > The frame is unchanged and the text "-frame" is tacked on the end of > the output. The same occurs with translate(frame => 2). > > Any ideas? Can something as fundamental as translate() really be > bugged? or am I guilty of some particularly heinous syntax error? > > Cheers > Derek > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Tue Apr 29 11:00:00 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 29 Apr 2008 10:00:00 -0500 Subject: [Bioperl-l] translate() oddities In-Reply-To: <4817303E.1040903@gmail.com> References: <481726BF.1060609@bms.com> <4817303E.1040903@gmail.com> Message-ID: <36045A08-AEA8-4639-A384-1DC53B5DC129@uiuc.edu> Yes the interface changed somewhat post 1.5.1, mainly to accept named parameters. I think a few methods do this now as passing in lists of more than 2 args, undef'ing those one doesn't want set, gets confusing. chris On Apr 29, 2008, at 9:27 AM, Roy Chaudhuri wrote: > Spent two minutes looking at this, so may as well chip in with what > I discovered even though you solved your problem. > > This "bug" comes about because in version 1.5.1 and earlier, the > arguments to translate were a simple list, with the first argument > the terminator (defaults to "*"). Your old version therefore assumed > that you wanted to translate the stop codon to "-frame". 
Amusingly > given your typo, if you miss the hyphen off the frame argument in > version 1.5.2 it reverts to the old interface and you end up with > the output "MMMMMframe". The moral of the story is of course to read > the docs relevant to the version you are using. > > Roy. > -- > Dr. Roy Chaudhuri > Department of Veterinary Medicine > University of Cambridge, U.K. > > Derek Gatherer wrote: >> Thanks Stefan >> Actually, there was a typo in my message, I did use -frame => 1. >> However, the problem disappears on upgrading from 1.5.0 to 1.5.2. >> So not a bug anymore. >> Cheers >> Derek >> At 14:46 29/04/2008, Stefan Kirov wrote: >>> my $frame1_obj = $seqobj->translate(-frame => 1); >>> not >>> my $frame1_obj = $seqobj->translate(frame => 1); >>> Stefan >>> >>> Derek Gatherer wrote: >>>> Hi >>>> >>>> I thought I'd better run this by the community before I embarrass >>>> myself on Bugzilla. It seems like a clear bug to me. I'm running >>>> Bioperl 1.5.0 on RedHat. >>>> >>>> For a test input: >>>> >>>>> test >>>> ATGATGATGATGATGTGA >>>> >>>> the following code is fine. >>>> >>>> while((my $seqobj = $seq_in->next_seq())) >>>> { >>>> print "\n".$seqobj->display_id; >>>> my $len = $seqobj->length(); >>>> print " length: $len"; >>>> my $frame1_obj = $seqobj->translate(); >>>> my $f1_prot = $frame1_obj->seq(); >>>> print "\n$f1_prot"; >>>> } >>>> >>>> Output: >>>> >>>> test length: 18 >>>> MMMMM* >>>> >>>> But if I want to change the frame as specified in the BioPerl >>>> tutorial, by using: >>>> >>>> my $frame1_obj = $seqobj->translate(frame => 1); # which should now >>>> give frame 2, I get: >>>> >>>> test length: 18 >>>> MMMMM-frame >>>> >>>> The frame is unchanged and the text "-frame" is tacked on the end >>>> of >>>> the output. The same occurs with translate(frame => 2). >>>> >>>> Any ideas? Can something as fundamental as translate() really be >>>> bugged? or am I guilty of some particularly heinous syntax error? 
>>>> >>>> Cheers >>>> Derek >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue Apr 29 11:07:30 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 29 Apr 2008 10:07:30 -0500 Subject: [Bioperl-l] translate() oddities In-Reply-To: <481726BF.1060609@bms.com> References: <481726BF.1060609@bms.com> Message-ID: <18DB95FB-52B9-4091-ACEE-996891F8A5AE@uiuc.edu> As an aside, I've been playing around with perl6 (Rakudo) for a bit now. Parameter-like passing (using autoaccessors and other means) will be added in soon, so you will be able to do this: $seqobj = Seq.new(seq => 'ATGATGATGATGATGTGA', alphabet => 'dna'); my $protobj = $seq.translate(frame => 1); Yes, I'm a geek. ; > chris On Apr 29, 2008, at 8:46 AM, Stefan Kirov wrote: > my $frame1_obj = $seqobj->translate(-frame => 1); > not > my $frame1_obj = $seqobj->translate(frame => 1); > Stefan > > Derek Gatherer wrote: >> Hi >> >> I thought I'd better run this by the community before I embarrass >> myself on Bugzilla. It seems like a clear bug to me. I'm running >> Bioperl 1.5.0 on RedHat. >> >> For a test input: >> >>> test >> ATGATGATGATGATGTGA >> >> the following code is fine. 
>> >> while((my $seqobj = $seq_in->next_seq())) >> { >> print "\n".$seqobj->display_id; >> my $len = $seqobj->length(); >> print " length: $len"; >> my $frame1_obj = $seqobj->translate(); >> my $f1_prot = $frame1_obj->seq(); >> print "\n$f1_prot"; >> } >> >> Output: >> >> test length: 18 >> MMMMM* >> >> But if I want to change the frame as specified in the BioPerl >> tutorial, by using: >> >> my $frame1_obj = $seqobj->translate(frame => 1); # which should now >> give frame 2, I get: >> >> test length: 18 >> MMMMM-frame >> >> The frame is unchanged and the text "-frame" is tacked on the end of >> the output. The same occurs with translate(frame => 2). >> >> Any ideas? Can something as fundamental as translate() really be >> bugged? or am I guilty of some particularly heinous syntax error? >> >> Cheers >> Derek >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From dr.hogart at gmail.com Tue Apr 29 11:57:51 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Tue, 29 Apr 2008 19:57:51 +0400 Subject: [Bioperl-l] alignment by TCoffee as a subroutine Message-ID: Hi all! I am trying to perform TCoffe aligment by Bio::Tools::Run::Alignment::TCoffee wrapper as subroutine into the script. This subroutine works fine, but it is not single subroutine - there are a lot of other ones in the script. The problem is when compilation of script finish execution (nb! successful execution) of tcoffee subroutine the compiliation of the end of the script also interrupted. It seems that the tcoffee program itself induce interraption of perl compilation. 
Is it possible to pass this problem? -- From darin.london at duke.edu Tue Apr 29 12:49:53 2008 From: darin.london at duke.edu (darin.london at duke.edu) Date: Tue, 29 Apr 2008 12:49:53 -0400 Subject: [Bioperl-l] BOSC 2008 Announcement and Call For Submissions Message-ID: <200804291650.m3TGnr0H020814@tenero.duhs.duke.edu> BOSC 2008 Call for Abstracts Reminder The 9th annual Bioinformatics Open Source Conference (BOSC 2008) will take place in Toronto, Ontario, Canada, as one of several Special Interest Group (SIG) meetings occurring in conjunction with the 16th annual Intelligent Systems for Molecular Biology Conference (ISMB 2008). This is a reminder to submit your proposals for talks to the BOSC submission system before May 11. Submission Process: All abstracts must be submitted through our Open Conference Systems site (http://events.open-bio.org/BOSC2008/openconf.php). The form will ask for a small Abstract Text to be pasted into it, and a full paper. The small Abstract text should be a summary, while the longer abstract (should provide more details, including the open-source license requirement details) Full-length abstracts are limited to one page with one inch (2.5 cm) margins on the top, sides, and bottom. The full-length abstract should include the title, authors, and affiliations. We prefer your abstract to be in PDF format, although plain t Important Dates: May 11: Abstract submission deadline. June 2: Notification of accepted talks. June 4: Early registration discount cut-off. July 18-19: BOSC 2008! We hope to see you at BOSC 2008! 
Kam Dahlquist and Darin London BOSC 2008 Co-organizers From bix at sendu.me.uk Tue Apr 29 12:54:41 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 29 Apr 2008 17:54:41 +0100 Subject: [Bioperl-l] alignment by TCoffee as a subroutine In-Reply-To: References: Message-ID: <481752D1.7010904@sendu.me.uk> sergei ryazansky wrote: > I am trying to perform TCoffe aligment by > Bio::Tools::Run::Alignment::TCoffee wrapper as subroutine into the > script. This subroutine works fine, but it is not single subroutine - > there are a lot of other ones in the script. The problem is when > compilation of script finish execution (nb! successful execution) of > tcoffee subroutine the compiliation of the end of the script also > interrupted. It seems that the tcoffee program itself induce > interraption of perl compilation. Is it possible to pass this problem? You'll have to supply us with a minimal version of the script and the complete error message. From dr.hogart at gmail.com Wed Apr 30 07:24:35 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Wed, 30 Apr 2008 15:24:35 +0400 Subject: [Bioperl-l] alignment by TCoffee as a subroutine References: Message-ID: On Tue, 29 Apr 2008 19:57:51 +0400, sergei ryazansky wrote: > Hi all! > > I am trying to perform TCoffe aligment by > Bio::Tools::Run::Alignment::TCoffee wrapper as subroutine into the > script. This subroutine works fine, but it is not single subroutine - > there are a lot of other ones in the script. The problem is when > compilation of script finish execution (nb! successful execution) of > tcoffee subroutine the compiliation of the end of the script also > interrupted. It seems that the tcoffee program itself induce > interraption of perl compilation. Is it possible to pass this problem? 
> My subroutine is as follows: sub align { my $file=shift @_; my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => 'fasta', 'outfile' => 'temp_align.out'); my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); my $aln=$factory->align ($file); open (fy,'temp_align.out'); my @temp_file=<fy>; close fy; return @temp_file; } This subroutine is called by the following command: my @align_fa = align($inputfile_align); After successful execution of this subroutine (accompanied by the corresponding messages in the terminal window) the execution of the remainder of the script is terminated without any error messages. -- From bix at sendu.me.uk Wed Apr 30 08:47:17 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 30 Apr 2008 13:47:17 +0100 Subject: [Bioperl-l] alignment by TCoffee as a subroutine In-Reply-To: References: Message-ID: <48186A55.4030406@sendu.me.uk> sergei ryazansky wrote: > My subroutine is as follows: > > sub align { > my $file=shift @_; > my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => > 'fasta', 'outfile' => 'temp_align.out'); > my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); > my $aln=$factory->align ($file); > open (fy,'temp_align.out'); my @temp_file=<fy>; close fy; > return @temp_file; > } > > This subroutine is called by the following command: > > my @align_fa = align($inputfile_align); > > After successful execution of this subroutine (accompanied by the > corresponding messages in the terminal window) the execution of > the remainder of the script is terminated without any error messages. The problem lies somewhere within the rest of your script, so we have to see it if you want help. Why are you using Bio::Tools::Run::Alignment::TCoffee at all if you don't make use of the resulting alignment object? A system call might make more sense given what you're doing. The beauty of Bio::Tools::Run::Alignment::TCoffee is that you don't have to parse the result file (temp_align.out) yourself.
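As a rough illustration of the point about working from the alignment object instead of re-reading temp_align.out: the sketch below assumes the aligned rows have already been taken from the object returned by $factory->align, for example with map { $_->seq } $aln->each_seq (each_seq and seq are the usual Bio::SimpleAlign and sequence-object accessors, but this should be checked against the installed BioPerl version). To stay runnable without BioPerl or t_coffee it uses three made-up toy rows. The helper names majority_consensus and changes_per_column are hypothetical, and the majority-rule consensus is only one simple choice, not something TCoffee or BioPerl is claimed to compute.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a majority-rule consensus from aligned rows
# of equal length. Ties are broken alphabetically so the result is
# deterministic.
sub majority_consensus {
    my @rows = @_;
    my $len  = length $rows[0];
    my $cons = '';
    for my $col (0 .. $len - 1) {
        my %count;
        $count{ substr($_, $col, 1) }++ for @rows;
        my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
        $cons .= $best;
    }
    return $cons;
}

# Hypothetical helper: for each column, count the rows whose character
# differs from the consensus character in that column.
sub changes_per_column {
    my ($cons, @rows) = @_;
    my @changes;
    for my $col (0 .. length($cons) - 1) {
        my $ref = substr($cons, $col, 1);
        # grep in scalar context returns the number of matching rows
        $changes[$col] = grep { substr($_, $col, 1) ne $ref } @rows;
    }
    return @changes;
}

# Toy aligned rows (made-up data) standing in for:
#   my @rows = map { $_->seq } $aln->each_seq;
my @rows    = qw(ATGCA ATGGA TTGCA);
my $cons    = majority_consensus(@rows);
my @changes = changes_per_column($cons, @rows);

print "consensus: $cons\n";
print "changes:   @changes\n";
```

In a real script the toy @rows would be replaced by the rows taken from the alignment object, so the FASTA output file would not have to be opened and parsed by hand.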
From dr.hogart at gmail.com Wed Apr 30 09:36:58 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Wed, 30 Apr 2008 17:36:58 +0400 Subject: [Bioperl-l] alignment by TCoffee as a subroutine References: <48186A55.4030406@sendu.me.uk> Message-ID: On Wed, 30 Apr 2008 16:47:17 +0400, Sendu Bala wrote: > sergei ryazansky wrote: >> My subroutine is as follows: >> sub align { >> my $file=shift @_; >> my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => >> 'fasta', 'outfile' => 'temp_align.out'); >> my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); >> my $aln=$factory->align ($file); >> open (fy,'temp_align.out'); my @temp_file=<fy>; close fy; >> return @temp_file; >> } >> This subroutine is called by the following command: >> my @align_fa = align($inputfile_align); >> After successful execution of this subroutine (accompanied by the >> corresponding messages in the terminal window) the execution of >> the remainder of the script is terminated without any error messages. > > The problem lies somewhere within the rest of your script, so we have to > see it if you want help. > > Why are you using Bio::Tools::Run::Alignment::TCoffee at all if you > don't make use of the resulting alignment object? A system call might > make more sense given what you're doing. The beauty of > Bio::Tools::Run::Alignment::TCoffee is that you don't have to parse the > result file (temp_align.out) yourself. The rest of the script, imho, is ok, because without this sub it works fine. Maybe the problem lies in TCoffee itself? One of the features of the script is to estimate the number of nt changes at each position in the different similar sequences in comparison with consensus sequences. To do this it is necessary to obtain the multiple alignment: the result of the TCoffee alignment goes to another subroutine that estimates the level of changes. Of course, I don't think that this way is the best approach; most probably there are a lot of better ways to do it.
But for my purposes today it is ok. -- From avilella at gmail.com Wed Apr 30 10:16:56 2008 From: avilella at gmail.com (Albert Vilella) Date: Wed, 30 Apr 2008 15:16:56 +0100 Subject: [Bioperl-l] alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> Message-ID: <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> Hi Sergei, Can you try to isolate this call with a simpler example to see if it still fails? When you say that the problems are in the compilation, do you mean that the interpreter won't even compile or that it fails during execution? Have you checked that you have all the dependencies right? Cheers, Albert. On Wed, Apr 30, 2008 at 2:36 PM, sergei ryazansky wrote: > On Wed, 30 Apr 2008 16:47:17 +0400, Sendu Bala wrote: > > sergei ryazansky wrote: > > > > > My subroutine is as follows: > > > sub align { > > > my $file=shift @_; > > > my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => > > > 'fasta', 'outfile' => 'temp_align.out'); > > > my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); > > > my $aln=$factory->align ($file); > > > open (fy,'temp_align.out'); my @temp_file=<fy>; close fy; > > > return @temp_file; > > > } > > > This subroutine is called by the following command: > > > my @align_fa = align($inputfile_align); > > > After successful execution of this subroutine (accompanied by the > > > corresponding messages in the terminal window) the execution of the remainder > > > of the script is terminated without any error messages. > > > > > > > The problem lies somewhere within the rest of your script, so we have to > > see it if you want help. > > > > Why are you using Bio::Tools::Run::Alignment::TCoffee at all if you > > don't make use of the resulting alignment object? A system call might make > > more sense given what you're doing. The beauty of > > Bio::Tools::Run::Alignment::TCoffee is that you don't have to parse the > > result file (temp_align.out) yourself.
> > > > The rest of the script, imho, is ok, because without this sub it works fine. > Maybe the problem lies in TCoffee itself? > > One of the features of the script is to estimate the number of nt changes at > each position in the different similar sequences in comparison with consensus > sequences. To do this it is necessary to obtain the multiple > alignment: the result of the TCoffee alignment goes to another subroutine that > estimates the level of changes. Of course, I don't think that this way is the > best approach; most probably there are a lot of better ways to do it. > But for my purposes today it is ok. > > -- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bix at sendu.me.uk Wed Apr 30 10:22:01 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 30 Apr 2008 15:22:01 +0100 Subject: [Bioperl-l] alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> Message-ID: <48188089.8000300@sendu.me.uk> sergei ryazansky wrote: > On Wed, 30 Apr 2008 16:47:17 +0400, Sendu Bala wrote: > >> sergei ryazansky wrote: >>> My subroutine is as follows: >>> sub align { >>> my $file=shift @_; >>> my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => >>> 'fasta', 'outfile' => 'temp_align.out'); >>> my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); >>> my $aln=$factory->align ($file); >>> open (fy,'temp_align.out'); my @temp_file=<fy>; close fy; >>> return @temp_file; >>> } >>> This subroutine is called by the following command: >>> my @align_fa = align($inputfile_align); >>> After successful execution of this subroutine (accompanied by the >>> corresponding messages in the terminal window) the execution of >>> the remainder of the script is terminated without any error messages. >> >> The problem lies somewhere within the rest of your script, so we have >> to see it if you want help.
> > The rest of the script, imho, is ok, because without this sub it works > fine. Maybe the problem lies in TCoffee itself? I've run your subroutine in a simple script of my own and it doesn't cause script termination. Again, the problem lies elsewhere in your script. Supply it or it is impossible for anyone to help you. From Sebastien.Moretti at unil.ch Wed Apr 30 10:06:28 2008 From: Sebastien.Moretti at unil.ch (Sebastien MORETTI) Date: Wed, 30 Apr 2008 16:06:28 +0200 Subject: [Bioperl-l] alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> Message-ID: <48187CE4.8030606@unil.ch> >>> My subroutine is as follows: >>> sub align { >>> my $file=shift @_; >>> my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => >>> 'fasta', 'outfile' => 'temp_align.out'); >>> my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); >>> my $aln=$factory->align ($file); >>> open (fy,'temp_align.out'); my @temp_file=<fy>; close fy; >>> return @temp_file; >>> } >>> This subroutine is called by the following command: >>> my @align_fa = align($inputfile_align); >>> After successful execution of this subroutine (accompanied by the >>> corresponding messages in the terminal window) the execution of >>> the remainder of the script is terminated without any error messages. >> >> The problem lies somewhere within the rest of your script, so we have >> to see it if you want help. >> >> Why are you using Bio::Tools::Run::Alignment::TCoffee at all if you >> don't make use of the resulting alignment object? A system call might >> make more sense given what you're doing. The beauty of >> Bio::Tools::Run::Alignment::TCoffee is that you don't have to parse >> the result file (temp_align.out) yourself. > > The rest of the script, imho, is ok, because without this sub it works > fine. Maybe the problem lies in TCoffee itself?
> > One of the features of the script is to estimate the number of nt changes > at each position in the different similar sequences in comparison with > consensus sequences. To do this it is necessary to obtain the > multiple alignment: the result of the TCoffee alignment goes to another > subroutine that estimates the level of changes. Of course, I don't think > that this way is the best approach; most probably there are a lot of > better ways to do it. But for my purposes today it is ok. Have you tried running the tcoffee command that bioperl calls directly on the command line? That would show whether the problem is with tcoffee itself or with the tcoffee release that bioperl has to use. -- Sébastien Moretti From dr.hogart at gmail.com Wed Apr 30 10:54:59 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Wed, 30 Apr 2008 18:54:59 +0400 Subject: [Bioperl-l] alignment by TCoffee as a subroutine References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> Message-ID: Hi Albert, The isolated call is executed without any problem, so the code is absolutely correct. The problem arises when this sub is executed within the whole script: after successful execution of the TCoffee alignment, the execution of the rest of the script is terminated. The whole code is very big (~500 lines), so for simplicity let's imagine the scheme of the script as follows: sub1; sub2; sub3; sub align; # TCoffee alignment; sub4; sub5; Each sub (subroutine) is independent of the other subs. The order of script execution is 1,2,3,align,4,5. But after the execution of align, the execution of the remaining subs (4 and 5) is terminated. The script without sub align {} successfully executes sub 4 and sub 5. So, I mean that the interpreter won't compile sub 4 and 5 if sub align is placed before them. On Wed, 30 Apr 2008 18:16:56 +0400, Albert Vilella wrote: > Hi Sergei, > > Can you try to isolate this call with a simpler example to see if it > still > fails?
When you say that the problems are in the compilation, do you mean > that the interpreter won't even compile or that it fails during > execution? > Have you checked that you have all the dependencies right? > > Cheers, > > Albert. > > On Wed, Apr 30, 2008 at 2:36 PM, sergei ryazansky > wrote: > >> On Wed, 30 Apr 2008 16:47:17 +0400, Sendu Bala wrote: >> >> sergei ryazansky wrote: >> > >> > > My subroutine is following: >> > > sub align { >> > > my $file=shift @_; >> > > my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => >> > > 'fasta', 'outfile' => 'temp_align.out'); >> > > my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); >> > > my $aln=$factory->align ($file); >> > > open (fy,'temp_align.out'); my @temp_file=; close fy; >> > > return @temp_file; >> > > } >> > > This subroutine is called by the following command: >> > > my @align_fa = align($inputfile_align); >> > > After successful execution of this subroutine (accompaning with the >> > > corresponding messages on the terminal window) the execution of >> remainder >> > > script is terminated without any error messages. >> > > >> > >> > The problem lies somewhere within the rest of your script, so we have >> to >> > see it if you want help. >> > >> > Why are you using Bio::Tools::Run::Alignment::TCoffee at all if you >> > don't make use of the resulting alignment object? A system call might >> make >> > more sense given what you're doing. The beauty of >> > Bio::Tools::Run::Alignment::TCoffee is that you don't have to parse >> the >> > result file (temp_align.out) yourself. >> > >> >> The rest of script,imho, is ok, because without this sub it is work >> fine. >> May be problem lies into the TCoffee itself? >> >> One of the feature of script is to estimate the quantity of nt changes >> in >> each position in the different similar sequences in comparing with >> consensus >> sequences. 
To perform this it is necessary to obtain the multiple
>> alignment: the result of the TCoffee alignment goes to another subroutine,
>> that
>> estimates the level of changes. Of course, I don't think that this way
>> is the
>> best approach, most probably there are a lot of better ways to do
>> it.
>> But for my purposes today it is ok.
>>
>> --
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>

From dr.hogart at gmail.com Wed Apr 30 11:14:09 2008
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Wed, 30 Apr 2008 19:14:09 +0400
Subject: [Bioperl-l] alignment by TCoffee as a subroutine
References: <48186A55.4030406@sendu.me.uk> <48187CE4.8030606@unil.ch>
Message-ID:

No, I haven't tried. To tell the truth, I have run into a problem like this earlier. I simply wanted to align several sets of sequences with the TCoffee Bioperl package. The script was supposed to pass the sets one after another to the TCoffee wrapper. But after the alignment of the first set of sequences, the alignment of the remaining sets was terminated. So it was necessary to use another "super_script" that called the first script with different arguments for each corresponding set.

> Have you tried to use the tcoffee command, called via bioperl, as a
> command line?

--

From bix at sendu.me.uk Wed Apr 30 11:28:50 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 30 Apr 2008 16:28:50 +0100
Subject: [Bioperl-l] alignment by TCoffee as a subroutine
In-Reply-To:
References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com>
Message-ID: <48189032.20102@sendu.me.uk>

sergei ryazansky wrote:
> Hi Albert,
>
> The isolated call is executed without any problem, so the code is
> absolutely correct.
The problem arise when this sub executed within the > whole script - after successful execution of TCoffee alignment the > execution of the rest of script is terminated. The whole code is very > big (~500 lines), so for simplicity lets imagine the sheme of script in > the following view: > sub1; > sub2; > sub3; > sub align; # TCoffe alignment; > sub4; > sub5; > > Each sub (subroutine) is independent from the others subs; The order of > script execution is 1,2,3,align,4,5. But after the execution of align > the execution of the rest of subs (4 and 5) is terminated. The script > without sub align {} successfully execute the sub 4 and sub 5. So, I > mean that interpreter won't compile sub 4 and 5 if sub align is placed > before them. This has nothing to do with interpreter compilation, which is successful if the script runs at all. What do you do with the output of &align? The thing you are doing with that output is most likely the cause of your script terminating, which is why &sub4 and &sub5 run when you don't run &align (have no output that causes the problem). If you're not willing to show us your script, here are some simple debugging steps you can do yourself: # don't do anything with the output of align() - does &sub4 still run? 
# add some print statements after you call align(), and then after every further block of code in your script to see exactly where the script terminates
# reduce your script down to a minimal script that shows the problem (with the help of the previous step) and show us that

From dr.hogart at gmail.com Wed Apr 30 11:42:41 2008
From: dr.hogart at gmail.com (Sergei Ryazansky)
Date: Wed, 30 Apr 2008 19:42:41 +0400
Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine
In-Reply-To:
References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk>
Message-ID:

------- Forwarded message -------
From: "Sergei Ryazansky"
To: "Sendu Bala"
Cc:
Subject: Re: [Bioperl-l] alignment by TCoffee as a subroutine
Date: Wed, 30 Apr 2008 19:40:26 +0400

> What do you do with the output of &align? The thing you are doing with
> that output is most likely the cause of your script terminating, which
> is why &sub4 and &sub5 run when you don't run &align (have no output
> that causes the problem).

Please see my answer to Sebastien Moretti; it describes another, similar problem. The only thing I did with the output there was print it to a file. Nevertheless the problem was the same.

> # don't do anything with the output of align() - does &sub4 still run?

Please see above.

> # add some print statements after you call align(), and then after every
> further block of code in your script to see exactly where the script
> terminates
> # reduce your script down to a minimal script that shows the problem
> (with the help of the previous step) and show us that

All tests with individual blocks were performed earlier. The results are ok.
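
The print-statement tracing suggested in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: run_step() and the stand-in align() are hypothetical names, not part of BioPerl or of the script under discussion, and the stand-in does not call TCoffee at all; it simply dies when the input file is unreadable, so that the tracing has something to report.

```perl
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;    # unbuffered STDOUT, so output is not lost if the script stops early

# Report the exit status whenever the script ends, including a silent exit().
END { print STDERR "[trace] script ending with status $?\n" }

# Hypothetical stand-in for the real align(); it does NOT run TCoffee.
sub align {
    my ($file) = @_;
    die "cannot read '$file'\n" unless defined $file && -r $file;
    return ("aligned:$file");
}

# Run one named step, print a trace line before and after it, and catch die().
# Returns 1 if the step completed and 0 if it threw an exception.
sub run_step {
    my ($name, $code) = @_;
    print STDERR "[trace] entering $name\n";
    my $ok = eval { $code->(); 1 };
    if ($ok) {
        print STDERR "[trace] finished $name\n";
        return 1;
    }
    my $err = $@ || "unknown error\n";
    print STDERR "[trace] $name FAILED: $err";
    return 0;
}

# The align step fails here (the file is assumed not to exist); sub4 still runs.
my @results = (
    run_step('align', sub { align('no_such_file.fa') }),
    run_step('sub4',  sub { 1 }),
);
```

Note that eval only catches die(); if some code calls exit() the later steps never run, and the END block is then the only place that still reports anything.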
From cjfields at uiuc.edu Wed Apr 30 12:25:06 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 30 Apr 2008 11:25:06 -0500 Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk> Message-ID: <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu> Sergei, I agree with Sendu; we can't diagnose this unless we either have the entire script of a minimal version of it demonstrating the bug. The best way to handle this is to file a bug report, attaching relevant data using the 'Create a new attachment' link (including either the full script or a shortened one which demonstrates the bug). Otherwise we're just shooting in the dark trying to diagnose the problem. http://bugzilla.open-bio.org/ chris On Apr 30, 2008, at 10:42 AM, Sergei Ryazansky wrote: > > > ------- Forwarded message ------- > From: "Sergei Ryazansky" > To: "Sendu Bala" > Cc: > Subject: Re: [Bioperl-l] alignment by TCoffee as a subroutine > Date: Wed, 30 Apr 2008 19:40:26 +0400 > >> What do you do with the output of &align? The thing you are doing >> with that output is most likely the cause of your script >> terminating, which is why &sub4 and &sub5 run when you don't run >> &align (have no output that causes the problem). > > please sea my answer to Sebastien Moretti - there are description of > another similar problem. The only thing that I did there with output > is > printing to file. Nevetheless the problem was the same. > >> # don't do anything with the output of align() - does &sub4 still >> run? > > please sea above. 
> >> # add some print statements after you call align(), and then after >> every further block of code in your script to see exactly where the >> script terminates >> # reduce your script down to a minimal script that shows the >> problem (with the help of the previous step) and show us that > > all tests with individual bloks was performed earlier. the results > is ok. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From dr.hogart at gmail.com Wed Apr 30 12:40:19 2008 From: dr.hogart at gmail.com (Sergei Ryazansky) Date: Wed, 30 Apr 2008 20:40:19 +0400 Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine In-Reply-To: <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu> References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk> <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu> Message-ID: On Wed, 30 Apr 2008 20:25:06 +0400, Chris Fields wrote: Chris, I have already sent file to Sendu and also I am attaching it here. I have removed from it really unnecessary parts. > Sergei, > > I agree with Sendu; we can't diagnose this unless we either have the > entire script of a minimal version of it demonstrating the bug. > > The best way to handle this is to file a bug report, attaching relevant > data using the 'Create a new attachment' link (including either the full > script or a shortened one which demonstrates the bug). Otherwise we're > just shooting in the dark trying to diagnose the problem. > > http://bugzilla.open-bio.org/ > > chris -------------- next part -------------- A non-text attachment was scrubbed... 
Name: script.pl Type: application/octet-stream Size: 6870 bytes Desc: not available URL: From cjfields at uiuc.edu Wed Apr 30 13:02:19 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 30 Apr 2008 12:02:19 -0500 Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk> <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu> Message-ID: Hmm, maybe you were confused? From my last email: "The best way to handle this is to file a bug report, attaching relevant data using the 'Create a new attachment' link (including either the full script or a shortened one which demonstrates the bug). Otherwise we're just shooting in the dark trying to diagnose the problem." http://bugzilla.open-bio.org/ Anyone can work on fixing the issue there (so it'll probably get fixed faster). The devs can also track progress on the problem via the dev mail list (bioperl-guts). Diagnosing the bug may also reveal issues not just with Bio::Tools::Run::Alignment::TCoffee but also with other related modules. If needed I can post it to bugzilla, but it helps to submit the bug yourself (so you can receive posts on it's progress). chris On Apr 30, 2008, at 11:40 AM, Sergei Ryazansky wrote: > On Wed, 30 Apr 2008 20:25:06 +0400, Chris Fields > wrote: > > Chris, I have already sent file to Sendu and also I am attaching it > here. I have removed from it really unnecessary parts. > >> Sergei, >> >> I agree with Sendu; we can't diagnose this unless we either have >> the entire script of a minimal version of it demonstrating the bug. >> >> The best way to handle this is to file a bug report, attaching >> relevant data using the 'Create a new attachment' link (including >> either the full script or a shortened one which demonstrates the >> bug). Otherwise we're just shooting in the dark trying to diagnose >> the problem. 
>> >> http://bugzilla.open-bio.org/ >> >> chris From dr.hogart at gmail.com Wed Apr 30 13:39:35 2008 From: dr.hogart at gmail.com (Sergei Ryazansky) Date: Wed, 30 Apr 2008 21:39:35 +0400 Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk> <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu> Message-ID: On Wed, 30 Apr 2008 21:11:56 +0400, Sergei Ryazansky wrote: > Oh, sorry, you right - I too fast read you message. I do it slight later. > >> Hmm, maybe you were confused? From my last email: >> >> "The best way to handle this is to file a bug report, attaching >> relevant data using the 'Create a new attachment' link (including >> either the full script or a shortened one which demonstrates the bug). >> Otherwise we're just shooting in the dark trying to diagnose the >> problem." >> >> http://bugzilla.open-bio.org/ >> >> Anyone can work on fixing the issue there (so it'll probably get fixed >> faster). The devs can also track progress on the problem via the dev >> mail list (bioperl-guts). Diagnosing the bug may also reveal issues >> not just with Bio::Tools::Run::Alignment::TCoffee but also with other >> related modules. >> >> If needed I can post it to bugzilla, but it helps to submit the bug >> yourself (so you can receive posts on it's progress). >> >> chris >> >> On Apr 30, 2008, at 11:40 AM, Sergei Ryazansky wrote: >> >>> On Wed, 30 Apr 2008 20:25:06 +0400, Chris Fields >>> wrote: >>> >>> Chris, I have already sent file to Sendu and also I am attaching it >>> here. I have removed from it really unnecessary parts. >>> >>>> Sergei, >>>> >>>> I agree with Sendu; we can't diagnose this unless we either have the >>>> entire script of a minimal version of it demonstrating the bug. 
>>>> >>>> The best way to handle this is to file a bug report, attaching >>>> relevant data using the 'Create a new attachment' link (including >>>> either the full script or a shortened one which demonstrates the >>>> bug). Otherwise we're just shooting in the dark trying to diagnose >>>> the problem. >>>> >>>> http://bugzilla.open-bio.org/ >>>> >>>> chris > From cjfields at uiuc.edu Wed Apr 30 14:29:28 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 30 Apr 2008 13:29:28 -0500 Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk> <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu> Message-ID: <39A139E4-6783-41E6-8EE9-1FE60CB57577@uiuc.edu> Sorry, didn't catch that... chris On Apr 30, 2008, at 12:39 PM, Sergei Ryazansky wrote: > On Wed, 30 Apr 2008 21:11:56 +0400, Sergei Ryazansky > wrote: > >> Oh, sorry, you right - I too fast read you message. I do it slight >> later. >> >>> Hmm, maybe you were confused? From my last email: >>> >>> "The best way to handle this is to file a bug report, attaching >>> relevant data using the 'Create a new attachment' link (including >>> either the full script or a shortened one which demonstrates the >>> bug). Otherwise we're just shooting in the dark trying to diagnose >>> the problem." >>> >>> http://bugzilla.open-bio.org/ >>> >>> Anyone can work on fixing the issue there (so it'll probably get >>> fixed faster). The devs can also track progress on the problem >>> via the dev mail list (bioperl-guts). Diagnosing the bug may also >>> reveal issues not just with Bio::Tools::Run::Alignment::TCoffee >>> but also with other related modules. >>> >>> If needed I can post it to bugzilla, but it helps to submit the >>> bug yourself (so you can receive posts on it's progress). 
>>> >>> chris >>> >>> On Apr 30, 2008, at 11:40 AM, Sergei Ryazansky wrote: >>> >>>> On Wed, 30 Apr 2008 20:25:06 +0400, Chris Fields >>>> wrote: >>>> >>>> Chris, I have already sent file to Sendu and also I am attaching >>>> it here. I have removed from it really unnecessary parts. >>>> >>>>> Sergei, >>>>> >>>>> I agree with Sendu; we can't diagnose this unless we either have >>>>> the entire script of a minimal version of it demonstrating the >>>>> bug. >>>>> >>>>> The best way to handle this is to file a bug report, attaching >>>>> relevant data using the 'Create a new attachment' link >>>>> (including either the full script or a shortened one which >>>>> demonstrates the bug). Otherwise we're just shooting in the dark >>>>> trying to diagnose the problem. >>>>> >>>>> http://bugzilla.open-bio.org/ >>>>> >>>>> chris >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From Bank.Beszteri at awi.de Tue Apr 1 12:31:49 2008 From: Bank.Beszteri at awi.de (=?ISO-8859-1?Q?B=E1nk_Beszteri?=) Date: Tue, 01 Apr 2008 14:31:49 +0200 Subject: [Bioperl-l] Indexing large databases / BioSQL Message-ID: <47F22B35.1030502@awi.de> Dear list, we have recently started to try to find a solution for indexing large sequence databases / flat files for a java project, and because we ran into problems using biojava, and because both the OBDA and BioSQL ways seem to be compatible across bio~ projects, we also started to experiment with bioperl. It looks like this should work fine, but we had a couple of problems here, too. Perhaps some of you can give me hint what we are doing wrong! 
The first thing we tried was to use Bio::DB::Flat for indexing a TrEMBL flat file (~ 12 GB); but it seems we haven?t got a machine with enough memory to be able to handle this. (Perhaps you would be using the "bdb" style index in such a case in bioperl, but this apparently doesn?t work with biojava, so we had to stick with "flat"). So next we started to test BioSQL, by trying to load just Swissprot in a MySQL DB first, like: load_seqdatabase.pl --host mysql.awi.de --dbname biosql2 --dbuser xyz --dbpass abc --driver mysql --namespace uniprot_sprot --format swiss uniprot_sprot.dat Here we get an error message ########################################### Loading /biodb/spinkern/uniprot_sprot.dat ... Could not store Q6DAH5: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: The supplied lineage does not start near 'Erwinia carotovora subsp. atroseptica' (I was supplied 'Erwinia carotovora subsp. | Pectobacterium | Enterobacteriaceae | Enterobacteriales | Gammaproteobacteria | Proteobacteria | Bacteria') STACK: Error::throw STACK: Bio::Root::Root::throw /biodb/spinkern/bioperl-1.5/bioperl-1.5.2_102/Bio/Root/Root.pm:359 STACK: Bio::Species::classification /biodb/spinkern/bioperl-1.5/bioperl-1.5.2_102/Bio/Species.pm:174 STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:552 STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SpeciesAdaptor.pm:281 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1305 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:973 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852 STACK: 
Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182 STACK: Bio::DB::Persistent::PersistentObject::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:244 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 STACK: Bio::DB::Persistent::PersistentObject::store /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271 STACK: load_seqdatabase.pl:622 ----------------------------------------------------------- at load_seqdatabase.pl line 635 ############################################ or similar, depending on whether we use a pre-loaded ncbi taxonomy or not, and which Swissprot release we are trying to load. It often seems to come from sg. like here, subsp. or other special addition to the species line; but alternative genus names and other curious things also to appear. It looks like Species.pm tries to validate the species name against the lineage info already there in the BioSQL DB, and in several cases, it finds inconsistencies. If we start with the ncbi taxonomy already loaded in the database, the first error comes much earlier. I found a thread on the same problem from ~ two years ago (http://thread.gmane.org/gmane.comp.lang.perl.bio.general/13766/focus=13788), where the solution recommended was to update bioperl, so I was quite surprised to find the problem with the version you can see above (1.5.2_102 bioperl core, 1.5.2_100 bioperl_db). Can someone give me any hints as to what is going wrong here? 
The only workaround we have found so far was to comment out line 174 in Species.pm:

$self->throw("The supplied lineage does not start near '$name' (I was supplied '".join(" | ", @vals)."')");

After doing so, load_seqdatabase.pl runs for several hours (until it eventually crashes; I haven't found out yet why), but proceeds really slowly. I also found some info on this for Pg and Oracle in the mailing list, but does anyone have some approximate numbers for MySQL: how long should a first Swissprot load take?

Would be grateful to hear about your ideas / experiences on these issues!

Bank Beszteri
Bioinformatics / Scientific Computing
Alfred Wegener Institute for Polar and Marine Research
Am Handelshafen 12.
27570 Bremerhaven
Germany

From cjfields at uiuc.edu Wed Apr 2 00:45:28 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 1 Apr 2008 19:45:28 -0500
Subject: [Bioperl-l] quick update on bioperl nightly builds
Message-ID: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu>

I'm simplifying the nightly build archive names (removing svn revision # and date) in case anyone needs to update bioperl-live/run/db/network on a regular basis (read: GBrowse installations). When I have time I'll start working on automated builds, which will require some extra work with Module::Build and Build.PL.

chris

From hiekeen at gmail.com Wed Apr 2 02:14:07 2008
From: hiekeen at gmail.com (Jinyan Huang)
Date: Wed, 2 Apr 2008 10:14:07 +0800
Subject: [Bioperl-l] How to make a network graphic using my genes in pathways?
Message-ID:

I have 20 pathways. My genes of interest are in these pathways, and some genes overlap between pathways. How can I make a network graphic using these genes? That is, I want to connect the pathways through the overlapping genes. What kind of software can I use?

Thank you very much in advance.

--
Best regards,
Jinyan Huang (ekeen)
School of Life Sciences and Technology, 1302 Room
Tongji University
Siping Road 1239, Shanghai 200092 P.R.
China Tel :0086-21-65981041 Msn: hiekeen at hotmail.com eMail: hiekeen at gmail.com From hlapp at gmx.net Wed Apr 2 02:30:06 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 1 Apr 2008 22:30:06 -0400 Subject: [Bioperl-l] Indexing large databases / BioSQL In-Reply-To: <47F22B35.1030502@awi.de> References: <47F22B35.1030502@awi.de> Message-ID: On Apr 1, 2008, at 8:31 AM, B?nk Beszteri wrote: > [...] So next we started to test BioSQL, by trying to load just > Swissprot in a MySQL DB first, like: > > load_seqdatabase.pl --host mysql.awi.de --dbname biosql2 --dbuser > xyz --dbpass abc --driver mysql --namespace uniprot_sprot --format > swiss uniprot_sprot.dat > > Here we get an error message > > ########################################### > > Loading /biodb/spinkern/uniprot_sprot.dat ... > Could not store Q6DAH5: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: The supplied lineage does not start near 'Erwinia carotovora > subsp. atroseptica' (I was supplied 'Erwinia carotovora subsp. 
| > Pectobacterium | Enterobacteriaceae | Enterobacteriales | > Gammaproteobacteria | Proteobacteria | Bacteria') > STACK: Error::throw > STACK: Bio::Root::Root::throw /biodb/spinkern/bioperl-1.5/ > bioperl-1.5.2_102/Bio/Root/Root.pm:359 > STACK: Bio::Species::classification /biodb/spinkern/bioperl-1.5/ > bioperl-1.5.2_102/Bio/Species.pm:174 > STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm: > 552 > STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SpeciesAdaptor.pm:281 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object / > biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:1305 > STACK: > Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:973 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / > biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:852 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:182 > STACK: Bio::DB::Persistent::PersistentObject::create /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm: > 244 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:169 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /biodb/ > spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:251 > STACK: Bio::DB::Persistent::PersistentObject::store /biodb/spinkern/ > bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271 > STACK: load_seqdatabase.pl:622 > ----------------------------------------------------------- > > at load_seqdatabase.pl line 635 > > ############################################ > > or similar, 
depending on whether we use a pre-loaded ncbi taxonomy
> or not

I recommend always using a pre-loaded NCBI taxonomy unless you know there are only a few organisms that are straightforward (for the parser, that is).

> , and which Swissprot release we are trying to load. It often seems
> to come from sg. like here, subsp. or other special addition to the
> species line; but alternative genus names and other curious things
> also to appear. It looks like Species.pm tries to validate the
> species name against the lineage info already there in the BioSQL
> DB, and in several cases, it finds inconsistencies.

It actually happens upon a successful lookup when the species object is populated from the database.

> [...]
> The only workaround we have found so far was to comment out line
> 174 in Species.pm:
>
> $self->throw("The supplied lineage does not start near '$name' (I
> was supplied '".join(" | ", @vals)."')");

That should be OK if you work with a pre-loaded taxonomy. It's sort of a sanity check that should catch a parser having messed up a species. If you use a pre-loaded NCBI taxonomy the results of the species parsing don't matter in all details so long as the NCBI taxonID is parsed out correctly, and then found in the database.

Note that this is actually a warn() in the main trunk version of BioPerl, so you might want to upgrade to that (or change throw() to warn() in your version). You still get the records flagged with that, but it isn't an exception.

>
> After doing so, load_seqdatabase.pl runs for several hours (until
> it eventually crashes; I haven't found out yet why), but proceeds
> really slowly.

It should certainly *not* crash. Note also that you can supply --safe on the command line, in which case the script will continue with the next record if one fails to load for whatever reason.

You will want to adjust the width constraint of dbxref.accession, for example to 128 chars. This will also be fixed for BioSQL 1.0.1.
See http://bugzilla.open-bio.org/show_bug.cgi?id=2474 > I also found some info on this for Pg and Oracle in the mailing > list, but has anyone some approximate numbers for MySQL, how long > should a first Swissprot load take? Possibly around 20 hours according to Erik Rijkers: See http://lists.open-bio.org/pipermail/bioperl-l/2008-March/027427.html You can use the --logchunks N option to have it print out performance statistics every N records. Hope this helps, -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed Apr 2 02:38:12 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 1 Apr 2008 22:38:12 -0400 Subject: [Bioperl-l] Very basic implementation of GenBank XML SeqIO module In-Reply-To: <47F13C2C.4070909@umdnj.edu> References: <47F13C2C.4070909@umdnj.edu> Message-ID: Ryan - do you not have a committer account? I do agree with Chris on the test. Modules w/o tests tend to become 'pseudogenized.' -hilmar On Mar 31, 2008, at 3:31 PM, Ryan Golhar wrote: > I have a (very) basic SAX implementation of a SeqIO module to parse > GenBank XML records. Right now, it only reads in basic information > regarding the sequence and the sequence itself. > > It does not yet parse the features table. Should I submit it to be > included in bioperl or wait until I implement more for the features > table? 
I'm not sure when I'll get around to it though > > Ryan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cain.cshl at gmail.com Wed Apr 2 03:12:04 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Tue, 01 Apr 2008 23:12:04 -0400 Subject: [Bioperl-l] quick update on bioperl nightly builds In-Reply-To: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> Message-ID: <1207105924.6184.4.camel@frissell> Hi Chris, The tarball is currently (Apr 1) being built in a tmp directory, so that the extracted tarball is ./tmp/bioperl-live/. Is that intended? Thanks, Scott On Tue, 2008-04-01 at 19:45 -0500, Chris Fields wrote: > I'm simplifying the nightly build archive names (removing svn revision > # and date) in case anyone needs to update bioperl-live/run/db/network > on a regular basis (read: GBrowse installations). When I have time > I'll start working on automated builds, which will require some extra > work with Module::Build and Build.PL. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From cjfields at uiuc.edu Wed Apr 2 03:59:30 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 1 Apr 2008 22:59:30 -0500 Subject: [Bioperl-l] quick update on bioperl nightly builds In-Reply-To: <1207105924.6184.4.camel@frissell> References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> <1207105924.6184.4.camel@frissell> Message-ID: Nope, that isn't intended. I fixed it and reran it manually, so it should be fine now (note I didn't update the log file; the next cron run will catch that). I may toy around with your recent passthrough flag addition to try getting automated PPM's up and running. chris On Apr 1, 2008, at 10:12 PM, Scott Cain wrote: > Hi Chris, > > The tarball is currently (Apr 1) being built in a tmp directory, so > that > the extracted tarball is ./tmp/bioperl-live/. Is that intended? > > Thanks, > Scott > > On Tue, 2008-04-01 at 19:45 -0500, Chris Fields wrote: >> I'm simplifying the nightly build archive names (removing svn >> revision >> # and date) in case anyone needs to update bioperl-live/run/db/ >> network >> on a regular basis (read: GBrowse installations). When I have time >> I'll start working on automated builds, which will require some extra >> work with Module::Build and Build.PL. >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
cain at cshl.edu > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sdavis2 at mail.nih.gov Wed Apr 2 11:33:38 2008 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 2 Apr 2008 07:33:38 -0400 Subject: [Bioperl-l] How to make a network graphic using my genes in pathways? In-Reply-To: References: Message-ID: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> On Tue, Apr 1, 2008 at 10:14 PM, Jinyan Huang wrote: > I have 20 pathways. My interesting genes are in these pathways. There > are some genes overlaps in these pathways. How can I make a graphic > network using these genes? It means connecting these pathways through > these overlap genes. What kind of software can I use? R/Bioconductor has tools for working with graphs and pathways. Cytoscape is another open-source graphical solution. Ingenuity is, of course, not free. If you are looking at a perl solution, you can look at the various graph modules and their integration with the Graphviz libraries. SEan From cain.cshl at gmail.com Wed Apr 2 12:28:22 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Wed, 02 Apr 2008 08:28:22 -0400 Subject: [Bioperl-l] [Gmod-gbrowse] quick update on bioperl nightly builds In-Reply-To: References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> <1207105924.6184.4.camel@frissell> Message-ID: <1207139302.6507.7.camel@frissell> Hi Chris, (trimmed out gbrowse mailing list since this is just bioperl business) Speaking of the pass through stuff, Sendu mentioned that I stomped on some changes to Build.PL that you and he did when I committed that change, so it should be rolled back. Is there a good (svn) way to do that? Or should I just copy the contents of the old (good) Build.PL into a fresh file in my checkout and commit it? 
Thanks, Scott On Tue, 2008-04-01 at 22:59 -0500, Chris Fields wrote: > Nope, that isn't intended. I fixed it and reran it manually, so it > should be fine now (note I didn't update the log file; the next cron > run will catch that). > > I may toy around with your recent passthrough flag addition to try > getting automated PPM's up and running. > > chris > > On Apr 1, 2008, at 10:12 PM, Scott Cain wrote: > > > Hi Chris, > > > > The tarball is currently (Apr 1) being built in a tmp directory, so > > that > > the extracted tarball is ./tmp/bioperl-live/. Is that intended? > > > > Thanks, > > Scott > > > > On Tue, 2008-04-01 at 19:45 -0500, Chris Fields wrote: > >> I'm simplifying the nightly build archive names (removing svn > >> revision > >> # and date) in case anyone needs to update bioperl-live/run/db/ > >> network > >> on a regular basis (read: GBrowse installations). When I have time > >> I'll start working on automated builds, which will require some extra > >> work with Module::Build and Build.PL. > >> > >> chris > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. cain at cshl.edu > > GMOD Coordinator (http://www.gmod.org/) > > 216-392-3087 > > Cold Spring Harbor Laboratory > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > ------------------------------------------------------------------------- > Check out the new SourceForge.net Marketplace. > It's the best place to buy or sell services for > just about anything Open Source. 
> http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace > _______________________________________________ > Gmod-gbrowse mailing list > Gmod-gbrowse at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From robert.citek at gmail.com Wed Apr 2 12:24:06 2008 From: robert.citek at gmail.com (Robert Citek) Date: Wed, 2 Apr 2008 07:24:06 -0500 Subject: [Bioperl-l] module for pubchem queries Message-ID: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> Hello all, I have a list of chemical compounds that have some kind of interaction with proteins or genes. The current list contains names or SMILES and I would like to get the CID number for those compounds. Currently, I'm using perl to query the NCBI's eutils[1], which works great. But I was just curious to know of there was a bioperl module to do something similar. A quick google didn't turn up anything, so I thought I'd ask. [1] http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html Regards, - Robert From David.Messina at sbc.su.se Wed Apr 2 12:41:45 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 2 Apr 2008 14:41:45 +0200 Subject: [Bioperl-l] How to make a network graphic using my genes in pathways? In-Reply-To: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> Message-ID: <628aabb70804020541v6cee4584ibd9935290ae7cc0a@mail.gmail.com> I have no personal experience with it, but a colleague of mine suggested VisANT . 
Dave From cjfields at uiuc.edu Wed Apr 2 15:03:32 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 2 Apr 2008 10:03:32 -0500 Subject: [Bioperl-l] [Gmod-gbrowse] quick update on bioperl nightly builds In-Reply-To: <1207139302.6507.7.camel@frissell> References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> <1207105924.6184.4.camel@frissell> <1207139302.6507.7.camel@frissell> Message-ID: <3B490712-3413-4662-99D7-7B115CECB6E1@uiuc.edu> The changes I made were related to problems checking MySQL for Bio::DB::SeqFeature::Store tests when connectivity requires username/ password. For some reason it tests DB connectivity up front, while Bio::DB::GFF assumes the DB setup is correct (no direct DB check) then runs tests assuming the setup is correct. You can view the diffs for your commits here: http://code.open-bio.org/svnweb/index.cgi/bioperl/diff/bioperl-live/trunk/ModuleBuildBioperl.pm?revs=14604&revs=14548 http://code.open-bio.org/svnweb/index.cgi/bioperl/diff/bioperl-live/trunk/Build.PL?revs=14604&revs=14565 I'll try working on merging them together today; it shouldn't be too hard (the changes were fairly minor in both Build.PL and Module::Build). I'll test to make sure your changes stay in as well. Down the road I believe we need to rethink how we want the Build process to run using Module::Build as it's a bit convoluted, but it works for now. chris On Apr 2, 2008, at 7:28 AM, Scott Cain wrote: > Hi Chris, > > (trimmed out gbrowse mailing list since this is just bioperl business) > > Speaking of the pass through stuff, Sendu mentioned that I stomped on > some changes to Build.PL that you and he did when I committed that > change, so it should be rolled back. Is there a good (svn) way to do > that? Or should I just copy the contents of the old (good) Build.PL > into a fresh file in my checkout and commit it? > > Thanks, > Scott > > On Tue, 2008-04-01 at 22:59 -0500, Chris Fields wrote: >> Nope, that isn't intended. 
I fixed it and reran it manually, so it >> should be fine now (note I didn't update the log file; the next cron >> run will catch that). >> >> I may toy around with your recent passthrough flag addition to try >> getting automated PPM's up and running. >> >> chris >> >> On Apr 1, 2008, at 10:12 PM, Scott Cain wrote: >> >>> Hi Chris, >>> >>> The tarball is currently (Apr 1) being built in a tmp directory, so >>> that >>> the extracted tarball is ./tmp/bioperl-live/. Is that intended? >>> >>> Thanks, >>> Scott >>> >>> On Tue, 2008-04-01 at 19:45 -0500, Chris Fields wrote: >>>> I'm simplifying the nightly build archive names (removing svn >>>> revision >>>> # and date) in case anyone needs to update bioperl-live/run/db/ >>>> network >>>> on a regular basis (read: GBrowse installations). When I have time >>>> I'll start working on automated builds, which will require some >>>> extra >>>> work with Module::Build and Build.PL. >>>> >>>> chris >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> -- >>> ------------------------------------------------------------------------ >>> Scott Cain, Ph. D. cain at cshl.edu >>> GMOD Coordinator (http://www.gmod.org/) >>> 216-392-3087 >>> Cold Spring Harbor Laboratory >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> ------------------------------------------------------------------------- >> Check out the new SourceForge.net Marketplace. >> It's the best place to buy or sell services for >> just about anything Open Source. 
>> http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace >> _______________________________________________ >> Gmod-gbrowse mailing list >> Gmod-gbrowse at lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. cain.cshl at gmail.com > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Apr 2 15:54:05 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 2 Apr 2008 10:54:05 -0500 Subject: [Bioperl-l] [Gmod-gbrowse] quick update on bioperl nightly builds In-Reply-To: <3B490712-3413-4662-99D7-7B115CECB6E1@uiuc.edu> References: <02D78F8E-276F-46C1-91CD-F80BA6A09C14@uiuc.edu> <1207105924.6184.4.camel@frissell> <1207139302.6507.7.camel@frissell> <3B490712-3413-4662-99D7-7B115CECB6E1@uiuc.edu> Message-ID: <71375DA3-A751-4908-8000-D9ACAE39B19C@uiuc.edu> Okay, committed them. The accept passthrough still appears to work; let me know if anything pops up. chris On Apr 2, 2008, at 10:03 AM, Chris Fields wrote: > ... > I'll try working on merging them together today; it shouldn't be too > hard (the changes were fairly minor in both Build.PL and > Module::Build). I'll test to make sure your changes stay in as > well. Down the road I believe we need to rethink how we want the > Build process to run using Module::Build as it's a bit convoluted, > but it works for now. > > chris > > On Apr 2, 2008, at 7:28 AM, Scott Cain wrote: >> Hi Chris, >> >> (trimmed out gbrowse mailing list since this is just bioperl >> business) >> >> Speaking of the pass through stuff, Sendu mentioned that I stomped on >> some changes to Build.PL that you and he did when I committed that >> change, so it should be rolled back. 
Is there a good (svn) way to do >> that? Or should I just copy the contents of the old (good) Build.PL >> into a fresh file in my checkout and commit it? >> >> Thanks, >> Scott >> >> On Tue, 2008-04-01 at 22:59 -0500, Chris Fields wrote: >>> Nope, that isn't intended. I fixed it and reran it manually, so it >>> should be fine now (note I didn't update the log file; the next cron >>> run will catch that). >>> >>> I may toy around with your recent passthrough flag addition to try >>> getting automated PPM's up and running. >>> >>> chris >>> >>> On Apr 1, 2008, at 10:12 PM, Scott Cain wrote: >>> >>>> Hi Chris, >>>> >>>> The tarball is currently (Apr 1) being built in a tmp directory, so >>>> that >>>> the extracted tarball is ./tmp/bioperl-live/. Is that intended? >>>> >>>> Thanks, >>>> Scott >>>> >>>> On Tue, 2008-04-01 at 19:45 -0500, Chris Fields wrote: >>>>> I'm simplifying the nightly build archive names (removing svn >>>>> revision >>>>> # and date) in case anyone needs to update bioperl-live/run/db/ >>>>> network >>>>> on a regular basis (read: GBrowse installations). When I have >>>>> time >>>>> I'll start working on automated builds, which will require some >>>>> extra >>>>> work with Module::Build and Build.PL. >>>>> >>>>> chris >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> -- >>>> ------------------------------------------------------------------------ >>>> Scott Cain, Ph. D. cain at cshl.edu >>>> GMOD Coordinator (http://www.gmod.org/) >>>> 216-392-3087 >>>> Cold Spring Harbor Laboratory >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> ------------------------------------------------------------------------- >>> Check out the new SourceForge.net Marketplace. 
>>> It's the best place to buy or sell services for >>> just about anything Open Source. >>> http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace >>> _______________________________________________ >>> Gmod-gbrowse mailing list >>> Gmod-gbrowse at lists.sourceforge.net >>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. cain.cshl at gmail.com >> GMOD Coordinator (http://www.gmod.org/) >> 216-392-3087 >> Cold Spring Harbor Laboratory >> >> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From zhpan99 at yahoo.com Wed Apr 2 17:52:46 2008 From: zhpan99 at yahoo.com (Pan Zheng) Date: Wed, 2 Apr 2008 10:52:46 -0700 (PDT) Subject: [Bioperl-l] installing bioperl-1.5.2 errors:DB_File Message-ID: <726978.82400.qm@web53105.mail.re2.yahoo.com> Hi, I am installing bioperl-1.5.2_102 under cygwin on my Windows XP and having some errors during the process. When I was running "perl Build test", one major error is the error about DB_File. I tried to install DB_File from cpan and rpm without any luck. ++++++++++++++++++++++++ CPAN: File::Temp loaded ok (v0.16) CPAN: YAML loaded ok (v0.62) CPAN.pm: Going to build P/PM/PMQS/DB_File-1.817.tar.gz Parsing config.in... Looks Good. Checking if your kit is complete... 
Looks good
Note (probably harmless): No library found for -ldb
Writing Makefile for DB_File
cp DB_File.pm blib/lib/DB_File.pm
AutoSplitting blib/lib/DB_File.pm (blib/lib/auto/DB_File)
gcc -c -I/usr/local/BerkeleyDB/include -DPERL_USE_SAFE_PUTENV -fno-strict-aliasing -pipe -Wdeclaration-after-statement -DUSEIMPORTLIB -O3 -DVERSION=\"1.817\" -DXS_VERSION=\"1.817\" "-I/usr/lib/perl5/5.8/cygwin/CORE" -D_NOT_CORE -DmDB_Prefix_t=size_t -DmDB_Hash_t=u_int32_t version.c
version.c:30:16: db.h: No such file or directory
make: *** [version.o] Error 1
  PMQS/DB_File-1.817.tar.gz
  /usr/bin/make -- NOT OK
Running make test
  Can't test without successful make
Running make install
  Make had returned bad status, install seems impossible
Failed during this command:
 PMQS/DB_File-1.817.tar.gz : make NO
+++++++++++++++++++++++++++++++++++++++++++++++

I can't remember having this kind of error while installing earlier versions.

Would you please help me with the DB_File installation?

Thanks.

Pan

From dr.hogart at gmail.com Thu Apr 3 13:01:03 2008
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Thu, 03 Apr 2008 17:01:03 +0400
Subject: [Bioperl-l] support of clustalw2 in bio::run::tool::alignment
Message-ID:

As far as I understand, clustalw2 is not supported in bioperl v1.5.2.100. In which version will it be implemented?

Thank you in advance.

From slduncan at iastate.edu Thu Apr 3 18:13:16 2008
From: slduncan at iastate.edu (slduncan at iastate.edu)
Date: Thu, 3 Apr 2008 13:13:16 -0500 (CDT)
Subject: [Bioperl-l] help installing bioperl with cygwin
Message-ID: <161313331084931@webmail.iastate.edu>

I am trying to use cpan to install bioperl and I got an error message saying:
c:\Documents not recognized as an external or internal....
Any ideas here? Also, I am new to the computer world so please be kind.
:) Stacy Duncan Iowa State University Bioinformatics and Computational Biology 1802 University Blvd. VMRI Building 6 Ames, IA 50011-1240 office phone: (515) 294-8385 office fax: (515) 294-1401 home phone: (336) 965-5622 e-mail: slduncan at iastate.edu From cjfields at uiuc.edu Fri Apr 4 20:13:23 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 4 Apr 2008 15:13:23 -0500 Subject: [Bioperl-l] help installing bioperl with cygwin In-Reply-To: <161313331084931@webmail.iastate.edu> References: <161313331084931@webmail.iastate.edu> Message-ID: It's best if you use ActiveState's Perl installation (it's the only one we really support at this moment, unless someone wants to give StrawberryPerl a run). See: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows chris On Apr 3, 2008, at 1:13 PM, slduncan at iastate.edu wrote: > I am trying to use cpan to install bioperl and I had an error > message saying: > c:\Documents not recognized as and external or internal.... > Any ideas here. Also, I am new to the computer world so please be > kind. :) > > Stacy Duncan > Iowa State University > Bioinformatics and Computational Biology > 1802 University Blvd. > VMRI Building 6 > Ames, IA 50011-1240 > office phone: (515) 294-8385 > office fax: (515) 294-1401 > home phone: (336) 965-5622 > e-mail: slduncan at iastate.edu > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Apr 4 20:07:12 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 4 Apr 2008 15:07:12 -0500 Subject: [Bioperl-l] installing bioperl-1.5.2 errors:DB_File In-Reply-To: <726978.82400.qm@web53105.mail.re2.yahoo.com> References: <726978.82400.qm@web53105.mail.re2.yahoo.com> Message-ID: I think you have to use the cygwin installer to install DB_File (it also installs dependencies, such as BDB). According to 'perldoc perlcygwin': .... Optional Libraries for Perl on Cygwin Several Perl functions and modules depend on the existence of some optional libraries. Configure will find them if they are installed in one of the directories listed as being used for library searches. Pre- built packages for most of these are available from the Cygwin installer. .... chris On Apr 2, 2008, at 12:52 PM, Pan Zheng wrote: > Hi, > > I am installing bioperl-1.5.2_102 under cygwin on my Windows XP and > having some errors during the process. > > When I was running "perl Build test", one major error is the error > about DB_File. I tried to install DB_File from cpan and rpm without > any luck. > > ++++++++++++++++++++++++ > CPAN: File::Temp loaded ok (v0.16) > CPAN: YAML loaded ok (v0.62) > CPAN.pm: Going to build P/PM/PMQS/DB_File-1.817.tar.gz > Parsing config.in... > Looks Good. > Checking if your kit is complete... 
> Looks good > Note (probably harmless): No library found for -ldb > Writing Makefile for DB_File > cp DB_File.pm blib/lib/DB_File.pm > AutoSplitting blib/lib/DB_File.pm (blib/lib/auto/DB_File) > gcc -c -I/usr/local/BerkeleyDB/include -DPERL_USE_SAFE_PUTENV -fno- > strict-alias > ing -pipe -Wdeclaration-after-statement -DUSEIMPORTLIB -O3 - > DVERSION=\"1.817\" > -DXS_VERSION=\"1.817\" "-I/usr/lib/perl5/5.8/cygwin/CORE" - > D_NOT_CORE -DmDB_ > Prefix_t=size_t -DmDB_Hash_t=u_int32_t version.c > version.c:30:16: db.h: No such file or directory > make: *** [version.o] Error 1 > PMQS/DB_File-1.817.tar.gz > /usr/bin/make -- NOT OK > Running make test > Can't test without successful make > Running make install > Make had returned bad status, install seems impossible > Failed during this command: > PMQS/DB_File-1.817.tar.gz : make NO > +++++++++++++++++++++++++++++++++++++++++++++++ > > > I can't remember I had this kind error while installing earlier > version. > > Would you please help me on DB_File installation ? > > Thanks. > > Pan > > > --------------------------------- > You rock. That's why Blockbuster's offering you one month of > Blockbuster Total Access, No Cost. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Apr 4 21:25:41 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 4 Apr 2008 16:25:41 -0500 Subject: [Bioperl-l] module for pubchem queries In-Reply-To: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> References: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> Message-ID: <15B44EC6-3660-4925-BA7A-6763D51E6837@uiuc.edu> Do you need something to access eutils via BioPerl, or are you looking for a specific set of classes? 
I wrote an interface to eutils (Bio::DB::EUtilities); you could do
something like this:

#!/usr/bin/perl -w

use strict;
use warnings;
use Bio::DB::EUtilities;

my $eutil = Bio::DB::EUtilities->new(-eutil  => 'esearch',
                                     -term   => 'dihydroorotate',
                                     -db     => 'pcsubstance',
                                     -retmax => 1000);

print join(',',$eutil->get_ids)."\n";

chris

On Apr 2, 2008, at 7:24 AM, Robert Citek wrote:

> Hello all,
>
> I have a list of chemical compounds that have some kind of interaction
> with proteins or genes. The current list contains names or SMILES and
> I would like to get the CID number for those compounds. Currently,
> I'm using perl to query the NCBI's eutils[1], which works great. But
> I was just curious to know of there was a bioperl module to do
> something similar. A quick google didn't turn up anything, so I
> thought I'd ask.
>
> [1] http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
>
> Regards,
> - Robert
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From ekeen at mail.tongji.edu.cn Mon Apr 7 06:57:04 2008
From: ekeen at mail.tongji.edu.cn (Jinyan Huang)
Date: Mon, 7 Apr 2008 14:57:04 +0800
Subject: [Bioperl-l] How to analysis the relationship of my interesting KEGG pathways?
Message-ID:

In my research, I got 25 interesting pathways. I want to know the
regulatory relationships between these pathways. It would be helpful if
there were some software to connect these KEGG pathways.

Thank you very much in advance.
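
One rough way to see how a set of pathways hang together is to link any two
pathways that share genes and draw the result with Graphviz. The following is
only a minimal sketch in plain Perl under stated assumptions: it is not a KEGG
or BioPerl API, the pathway names and gene lists are made-up placeholders, and
shared_gene_edges and edges_to_dot are hypothetical helper names. It captures
gene overlap only, not the direction of regulation, and it assumes pathway
names do not themselves contain "--".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: pathway name => list of gene symbols.
# Replace with the gene lists of your own pathways.
my %pathways = (
    'pathway_A' => [qw(GENE1 GENE2 GENE3)],
    'pathway_B' => [qw(GENE3 GENE4)],
    'pathway_C' => [qw(GENE5 GENE2 GENE3)],
    'pathway_D' => [qw(GENE6)],
);

# For every pair of pathways, collect the genes they have in common.
# Returns a hash "X--Y" => [sorted shared genes], only for overlapping pairs.
sub shared_gene_edges {
    my ($pw) = @_;
    my %edges;
    my @names = sort keys %$pw;
    for my $i (0 .. $#names - 1) {
        my %in_first = map { $_ => 1 } @{ $pw->{ $names[$i] } };
        for my $j ($i + 1 .. $#names) {
            my %seen;    # guards against genes listed twice in one pathway
            my @shared = sort grep { $in_first{$_} && !$seen{$_}++ }
                         @{ $pw->{ $names[$j] } };
            $edges{"$names[$i]--$names[$j]"} = \@shared if @shared;
        }
    }
    return %edges;
}

# Turn the edge hash into Graphviz DOT text (undirected graph,
# each edge labelled with the shared genes).
sub edges_to_dot {
    my (%edges) = @_;
    my $dot = "graph pathways {\n";
    for my $pair (sort keys %edges) {
        my ($from, $to) = split /--/, $pair;
        my $label = join ',', @{ $edges{$pair} };
        $dot .= qq{    "$from" -- "$to" [label="$label"];\n};
    }
    return $dot . "}\n";
}

my %edges = shared_gene_edges(\%pathways);
print edges_to_dot(%edges);
```

If this were saved as pathways.pl (a hypothetical file name), the output could
be rendered with something like "perl pathways.pl | dot -Tpng -o pathways.png".
Pathways that share no gene with any other (pathway_D here) do not appear in
the graph.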
From miguel.pignatelli at uv.es Mon Apr 7 10:12:58 2008
From: miguel.pignatelli at uv.es (Miguel Pignatelli)
Date: Mon, 07 Apr 2008 12:12:58 +0200
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com>
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com>
Message-ID: <47F9F3AA.2090003@uv.es>

Hi all,

Is there any way to obtain the date of creation of individual GenBank
entries? I don't mean the "last revision" date that can be found in the
first line of a GenBank file.
I can access this creation date by looking at the "revision history" of
any GenBank entry (for example, see
http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105),
but I need a systematic (and local=fast) way to access this information.

Any help would be very appreciated,

Thank you very much in advance,

M;

From Bank.Beszteri at awi.de Mon Apr 7 11:46:43 2008
From: Bank.Beszteri at awi.de (=?ISO-8859-1?Q?B=E1nk_Beszteri?=)
Date: Mon, 07 Apr 2008 13:46:43 +0200
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To:
References: <47F22B35.1030502@awi.de>
Message-ID: <47FA09A3.2070004@awi.de>

Hi Hilmar,

it was important to understand that the inconsistency in taxon names is
apparently only between the Swissprot entries with "non-standard" names
and the contents of the taxonomy tables and that it is best to use a
pre-loaded taxonomy, thanks for that! We have now updated to
bioperl-live (and bp-db-live, too) and load_seqdatabase.pl seems to have
loaded everything OK in ~26 hours (with many of the "The supplied
lineage does not start near..." warnings, but no other problems). Our
next test is to try to load trembl (will try to do this in parallel in
multiple chunks), hope it will work just as nicely!

Thanks for your tips & insights!

Bank

Hilmar Lapp wrote:

>
> On Apr 1, 2008, at 8:31 AM, Bánk Beszteri wrote:
>
>> [...]
So next we started to test BioSQL, by trying to load just >> Swissprot in a MySQL DB first, like: >> >> load_seqdatabase.pl --host mysql.awi.de --dbname biosql2 --dbuser >> xyz --dbpass abc --driver mysql --namespace uniprot_sprot --format >> swiss uniprot_sprot.dat >> >> Here we get an error message >> >> ########################################### >> >> Loading /biodb/spinkern/uniprot_sprot.dat ... >> Could not store Q6DAH5: >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: The supplied lineage does not start near 'Erwinia carotovora >> subsp. atroseptica' (I was supplied 'Erwinia carotovora subsp. | >> Pectobacterium | Enterobacteriaceae | Enterobacteriales | >> Gammaproteobacteria | Proteobacteria | Bacteria') >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /biodb/spinkern/bioperl-1.5/ >> bioperl-1.5.2_102/Bio/Root/Root.pm:359 >> STACK: Bio::Species::classification /biodb/spinkern/bioperl-1.5/ >> bioperl-1.5.2_102/Bio/Species.pm:174 >> STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /biodb/ >> spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm: 552 >> STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /biodb/ >> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SpeciesAdaptor.pm:281 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object / >> biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:1305 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key >> /biodb/ spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:973 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / >> biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:852 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/ >> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:182 >> STACK: Bio::DB::Persistent::PersistentObject::create /biodb/ >> 
spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm: 244 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/ >> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:169 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /biodb/ >> spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:251 >> STACK: Bio::DB::Persistent::PersistentObject::store /biodb/spinkern/ >> bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271 >> STACK: load_seqdatabase.pl:622 >> ----------------------------------------------------------- >> >> at load_seqdatabase.pl line 635 >> >> ############################################ >> >> or similar, depending on whether we use a pre-loaded ncbi taxonomy >> or not > > > I recommend to always use a pre-loaded NCBI taxonomy unless you know > there are only a few organisms that are straightforward (for the > parser, that is). > >> , and which Swissprot release we are trying to load. It often seems >> to come from sg. like here, subsp. or other special addition to the >> species line; but alternative genus names and other curious things >> also to appear. It looks like Species.pm tries to validate the >> species name against the lineage info already there in the BioSQL >> DB, and in several cases, it finds inconsistencies. > > > It actually happens upon a successful lookup when the species object > is populated from the database. > >> [...] >> The only workaround we have found so far was to comment out line 174 >> in Species.pm: >> >> $self->throw("The supplied lineage does not start near '$name' (I >> was supplied '".join(" | ", @vals)."')"); > > > That should be OK if you work with a pre-loaded taxonomy. It's sort > of a sanity check that should catch a parser having messed up a > species. 
If you use a pre-loaded NCBI taxonomy the results of the
> species parsing don't matter in all details so long as the NCBI
> taxonID is parsed out correctly, and then found in the database.
>
> Note that this is actually a warn() in the main trunk version of
> BioPerl, so you might want to upgrade to that (or change throw() to
> warn() in your version). You still get the records flagged with that,
> but it isn't an exception.
>
>>
>> After doing so, load_seqdatabase.pl runs for several hours (until it
>> eventually crashes; I haven't found out yet why), but proceeds really
>> slowly.
>
>
> It should certainly *not* crash. Note also that you can supply --safe
> on the command line, in which case the script will continue with the
> next record if one fails to load for whatever reason.
>
> You will want to adjust the width constraint of dbxref.accession, for
> example to 128 chars. This will also be fixed for BioSQL 1.0.1.
> See http://bugzilla.open-bio.org/show_bug.cgi?id=2474
>
>
>> I also found some info on this for Pg and Oracle in the mailing
>> list, but has anyone some approximate numbers for MySQL, how long
>> should a first Swissprot load take?
>
>
> Possibly around 20 hours according to Erik Rijkers:
> See http://lists.open-bio.org/pipermail/bioperl-l/2008-March/027427.html
>
> You can use the --logchunks N option to have it print out performance
> statistics every N records.
>
> Hope this helps,
>
> -hilmar

From cjfields at uiuc.edu Mon Apr 7 12:32:45 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 7 Apr 2008 07:32:45 -0500
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <47FA09A3.2070004@awi.de>
References: <47F22B35.1030502@awi.de> <47FA09A3.2070004@awi.de>
Message-ID:

The warnings are something that we still need to resolve, but the only
fix I can think of likely breaks backward compatibility with older
bioperl-db installations (i.e.
storing the given scientific name instead of the binomial name, which is
used as a fallback when no taxid is found). There is a full explanation
here:

http://bugzilla.open-bio.org/show_bug.cgi?id=2092

Anyway, I think it needs further testing when someone, likely Hilmar or
I, has time.

chris

On Apr 7, 2008, at 6:46 AM, Bánk Beszteri wrote:

> Hi Hilmar,
>
> it was important to understand that the inconsistency in taxon names
> is apparently only between the Swissprot entries with "non-standard"
> names and the contents of the taxonomy tables and that it is best to
> use a pre-loaded taxonomy, thanks for that! We have now updated to
> bioperl-live (and bp-db-live, too) and load_seqdatabase.pl seems to
> have loaded everything OK in ~26 hours (with many of the "The
> supplied lineage does not start near..." warnings, but no other
> problems). Our next test is to try to load trembl (will try to do
> this in parallel in multiple chunks), hope it will work just as
> nicely!
>
> Thanks for your tips & insights!
>
> Bank
>
> Hilmar Lapp wrote:
>
>>
>> On Apr 1, 2008, at 8:31 AM, Bánk Beszteri wrote:
>>
>>> [...] So next we started to test BioSQL, by trying to load just
>>> Swissprot in a MySQL DB first, like:
>>>
>>> load_seqdatabase.pl --host mysql.awi.de --dbname biosql2 --dbuser
>>> xyz --dbpass abc --driver mysql --namespace uniprot_sprot --
>>> format swiss uniprot_sprot.dat
>>>
>>> Here we get an error message
>>>
>>> ###########################################
>>>
>>> Loading /biodb/spinkern/uniprot_sprot.dat ...
>>> Could not store Q6DAH5:
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: The supplied lineage does not start near 'Erwinia carotovora
>>> subsp. atroseptica' (I was supplied 'Erwinia carotovora subsp.
>>> | Pectobacterium | Enterobacteriaceae | Enterobacteriales |
>>> Gammaproteobacteria | Proteobacteria | Bacteria')
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw /biodb/spinkern/bioperl-1.5/bioperl-1.5.2_102/Bio/Root/Root.pm:359
>>> STACK: Bio::Species::classification /biodb/spinkern/bioperl-1.5/bioperl-1.5.2_102/Bio/Species.pm:174
>>> STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:552
>>> STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SpeciesAdaptor.pm:281
>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1305
>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:973
>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
>>> STACK: Bio::DB::Persistent::PersistentObject::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:244
>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
>>> STACK: Bio::DB::Persistent::PersistentObject::store /biodb/spinkern/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271
>>> STACK: load_seqdatabase.pl:622
>>> -----------------------------------------------------------
>>>
>>> at
load_seqdatabase.pl line 635
>>>
>>> ############################################
>>>
>>> or similar, depending on whether we use a pre-loaded ncbi
>>> taxonomy or not
>>
>>
>> I recommend always using a pre-loaded NCBI taxonomy unless you
>> know there are only a few organisms that are straightforward (for
>> the parser, that is).
>>
>>> , and which Swissprot release we are trying to load. It often
>>> seems to come from something like, as here, subsp. or another
>>> special addition to the species line; but alternative genus names
>>> and other curious things also appear. It looks like Species.pm
>>> tries to validate the species name against the lineage info
>>> already there in the BioSQL DB, and in several cases, it finds
>>> inconsistencies.
>>
>>
>> It actually happens upon a successful lookup when the species
>> object is populated from the database.
>>
>>> [...]
>>> The only workaround we have found so far was to comment out line
>>> 174 in Species.pm:
>>>
>>> $self->throw("The supplied lineage does not start near '$name' (I was supplied '".join(" | ", @vals)."')");
>>
>>
>> That should be OK if you work with a pre-loaded taxonomy. It's
>> sort of a sanity check that should catch a parser having messed up
>> a species. If you use a pre-loaded NCBI taxonomy the results of
>> the species parsing don't matter in all details so long as the
>> NCBI taxonID is parsed out correctly, and then found in the
>> database.
>>
>> Note that this is actually a warn() in the main trunk version of
>> BioPerl, so you might want to upgrade to that (or change throw()
>> to warn() in your version). You still get the records flagged with
>> that, but it isn't an exception.
>>
>>>
>>> After doing so, load_seqdatabase.pl runs for several hours (until
>>> it eventually crashes; I haven't found out yet why), but proceeds
>>> really slowly.
>>
>>
>> It should certainly *not* crash.
Note also that you can supply --safe
>> on the command line, in which case the script will continue
>> with the next record if one fails to load for whatever reason.
>>
>> You will want to adjust the width constraint of dbxref.accession,
>> for example to 128 chars. This will also be fixed for BioSQL 1.0.1.
>> See http://bugzilla.open-bio.org/show_bug.cgi?id=2474
>>
>>
>>> I also found some info on this for Pg and Oracle in the mailing
>>> list, but has anyone some approximate numbers for MySQL, how long
>>> should a first Swissprot load take?
>>
>>
>> Possibly around 20 hours according to Erik Rijkers:
>> See http://lists.open-bio.org/pipermail/bioperl-l/2008-March/027427.html
>>
>> You can use the --logchunks N option to have it print out
>> performance statistics every N records.
>>
>> Hope this helps,
>>
>> -hilmar
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bix at sendu.me.uk Mon Apr 7 12:34:00 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 07 Apr 2008 13:34:00 +0100
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <47FA09A3.2070004@awi.de>
References: <47F22B35.1030502@awi.de> <47FA09A3.2070004@awi.de>
Message-ID: <47FA14B8.7000500@sendu.me.uk>

Bánk Beszteri wrote:
> Hi Hilmar,
>
> it was important to understand that the inconsistency in taxon names is
> apparently only between the Swissprot entries with "non-standard" names
> and the contents of the taxonomy tables and that it is best to use a
> pre-loaded taxonomy, thanks for that! We have now updated to
> bioperl-live (and bp-db-live, too) and load_seqdatabase.pl seems to have
> loaded everything OK in ~26 hours (with many of the "The supplied
> lineage does not start near..." warnings, but no other problems).
Can you provide some examples of these warnings (of the taxa that cause them)? If there's anything consistent about them perhaps Bio::Species can be improved to accommodate them properly (instead of just issuing the warning and getting the classification wrong).

From heikki at sanbi.ac.za Mon Apr 7 12:48:34 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Mon, 7 Apr 2008 14:48:34 +0200
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To: <47F9F3AA.2090003@uv.es>
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es>
Message-ID: <200804071448.34769.heikki@sanbi.ac.za>

Miguel,

You probably know this but:

- Your entry example below is a GenPept entry, not a GenBank entry
- The NCBI sequence format "genbank" has only the last modified date.
  I do not know about other formats (ASN.1, ...)
- NCBI Entrez is a great tool but it obscures the source database.
- If you really are working on real GenBank entries, you can use the accession
  number to find the corresponding EMBL (and Swiss-Prot) flat file formats that
  have both creation and last modified dates.

Post to the list if you have trouble getting the dates from EMBL/Swiss-Prot
formats using bioperl.

Yours,

-Heikki

On Monday 07 April 2008 12:12:58 Miguel Pignatelli wrote:
> Hi all,
>
> Is there any way to obtain the date of creation of individual GenBank
> entries? I don't mean the "last revision" date that can be found in the
> first line of a GenBank file.
>
> I can access this creation date by looking at the "revision history" of
> any GenBank entry (for example, see
> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105),
> but I need a systematic (and local=fast) way to access this information.
> > Any help would be very appreciated, > Thank you very much in advance, > > M; > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From granjeau at tagc.univ-mrs.fr Mon Apr 7 13:30:10 2008 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/ICIM) Date: Mon, 07 Apr 2008 15:30:10 +0200 Subject: [Bioperl-l] help installing bioperl with cygwin In-Reply-To: References: <161313331084931@webmail.iastate.edu> Message-ID: <47FA21E2.3010602@tagc.univ-mrs.fr> Hi, I'm using BioPerl under Cygwin, because Cygwin allows one to work in a Unix-like environment in a command line point of view. So, I use the CVS version which runs out of the box http://www.bioperl.org/wiki/Using_CVS which has been replaced by SVN at the beginning of the year http://www.bioperl.org/wiki/Using_Subversion So if you really want to work under Cygwin, you can try this quick and dirty way, but you still have to become experienced because BioPerl is not supported under Cygwin. You may try Strawberry, but in my experience in installing wxPerl, wxPerl fails on both flavours of Perl. ActiveState's Perl is still the easiest way to install many packages. Regards, Samuel Chris Fields wrote: > It's best if you use ActiveState's Perl installation (it's the only > one we really support at this moment, unless someone wants to give > StrawberryPerl a run). 
See: > > http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows > > chris > > On Apr 3, 2008, at 1:13 PM, slduncan at iastate.edu wrote: > >> I am trying to use cpan to install bioperl and I had an error message >> saying: >> c:\Documents not recognized as and external or internal.... >> Any ideas here. Also, I am new to the computer world so please be >> kind. :) >> >> Stacy Duncan >> Iowa State University >> Bioinformatics and Computational Biology >> 1802 University Blvd. >> VMRI Building 6 >> Ames, IA 50011-1240 >> office phone: (515) 294-8385 >> office fax: (515) 294-1401 >> home phone: (336) 965-5622 >> e-mail: slduncan at iastate.edu >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Samuel GRANJEAUD granjeau at tagc.univ-mrs.fr INSERM - ICIM - TAGC Tel: +33 (0)491 82 87 24 http://tagc.univ-mrs.fr Fax: +33 (0)491 82 87 01 http://icim.marseille.inserm.fr/proteomique From er at xs4all.nl Mon Apr 7 14:36:57 2008 From: er at xs4all.nl (Erik) Date: Mon, 7 Apr 2008 16:36:57 +0200 (CEST) Subject: [Bioperl-l] Indexing large databases / BioSQL Message-ID: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> On Mon, April 7, 2008 14:34, Sendu Bala wrote: > B?nk Beszteri wrote: >> Hi Hilmar, >> >> it was important to understand that the inconsistency in taxon names is >> apparently only between the Swissprot entries with "non-standard" names >> and the contents of the taxonomy tables and that it is best to use a >> pre-loaded taxonomy, thanks for that! 
We have now updated to >> bioperl-live (and bp-db-live, too) and load_seqdatabase.pl seems to have >> loaded everything OK in ~26 hours (with many of the "The supplied >> lineage does not start near..." warnings, but no other problems). > > Can you provide some examples of these warnings (of the taxons that > cause them)? If there's anything consistent about them perhaps > Bio::Species can be improved to accommodate them properly (instead of > just issuing the warning and getting the classification wrong). > I did this a little while ago and saved the output (UniProtKB/Swiss-Prot Release 55.1 of 18-Mar-2008, I think). All warnings (and a few errors) for swissprot are here: http://bugzilla.open-bio.org/show_bug.cgi?id=2474 as an attached file I suppose the OP will have encountered similar output - I don't think there is much RDBMS-type-dependency involved. regards, Erik Rijkers From cjfields at uiuc.edu Mon Apr 7 15:46:01 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 7 Apr 2008 10:46:01 -0500 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <200804071448.34769.heikki@sanbi.ac.za> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <200804071448.34769.heikki@sanbi.ac.za> Message-ID: <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> Strangely enough, if you use NCBI's esummary you can get both dates. 
Via Bio::DB::EUtilities in bioperl-live, if you dump out DocSum data (using a debugging method I added in a while back):

---------------------------------------

use Bio::DB::EUtilities;

# for multiple IDs use an array ref; also only use GI's (not accessions)
my $factory = Bio::DB::EUtilities->new(
    -eutil => 'esummary',
    -db    => 'protein',
    -id    => 1621261);

$factory->print_DocSums;

---------------------------------------

One gets the following tag/value pairs:

UID: 1621261
Caption     :CAB02640
Title       :PROBABLE PYRIMIDINE OPERON REGULATORY PROTEIN PYRR [Mycobacterium tuberculosis H37Rv]
Extra       :gi|1621261|emb|CAB02640.1|[1621261]
Gi          :1621261
CreateDate  :2003/11/21
UpdateDate  :2006/11/14
Flags       :
TaxId       :83332
Length      :193
Status      :live
ReplacedBy  :
Comment     :

I'll add in a method to grab the data element by tag (in this case, grab the creation date by asking for the 'CreateDate' key). Might come in handy for scripts.

chris

On Apr 7, 2008, at 7:48 AM, Heikki Lehvaslaiho wrote:

> Miguel,
>
> You probably know this but:
>
> - Your entry example below is a GenPept entry, not a GenBank entry
> - The NCBI sequence format "genbank" has only the last modified date.
>   I do not know about other formats (ASN.1, ...)
> - NCBI Entrez is a great tool but it obscures the source database.
> - If you really are working on real GenBank entries, you can use the
>   accession number to find the corresponding EMBL (and Swiss-Prot)
>   flat file formats that have both creation and last modified dates.
>
> Post to the list if you have trouble getting the dates from
> EMBL/Swiss-Prot formats using bioperl.
>
> Yours,
>
> -Heikki
>
> On Monday 07 April 2008 12:12:58 Miguel Pignatelli wrote:
>> Hi all,
>>
>> Is there any way to obtain the date of creation of individual GenBank
>> entries? I don't mean the "last revision" date that can be found in
>> the first line of a GenBank file.
>> >> I can access this creation date by looking at the "revision >> history" of >> any GenBank entry (for example, see >> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi? >> val=74311105), >> but I need a systematic (and local=fast) way to access this >> information. >> >> Any help would be very appreciated, >> Thank you very much in advance, >> >> M; >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From miguel.pignatelli at uv.es Mon Apr 7 16:24:50 2008 From: miguel.pignatelli at uv.es (Miguel Pignatelli) Date: Mon, 07 Apr 2008 18:24:50 +0200 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <200804071448.34769.heikki@sanbi.ac.za> <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> Message-ID: <47FA4AD2.5030206@uv.es> I've noticed that the ASN.1 version of those records has a "creation-date" tag. 
But this is somewhat strange: the creation date you obtained and the one obtained via the ASN.1 format are both 2003/11/21, but the revision history of the record:

http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=CAB02640

reports a creation date of "Oct 19 1996 12:28 AM".

I don't know how to get this, because the EMBL version of this gene:

http://www.ebi.ac.uk/cgi-bin/dbfetch?db=emblcds&id=CAB02640&style=raw

doesn't have DT fields at all.

M;


Chris Fields wrote:
> Strangely enough, if you use NCBI's esummary you can get both dates.
> Via Bio::DB::EUtilities in bioperl-live, if you dump out DocSum data
> (using a debugging method I added in a while back):
>
> ---------------------------------------
>
> use Bio::DB::EUtilities;
>
> # for multiple IDs use an array ref; also only use GI's (not accessions)
> my $factory = Bio::DB::EUtilities->new(
>     -eutil => 'esummary',
>     -db    => 'protein',
>     -id    => 1621261);
>
> $factory->print_DocSums;
>
> ---------------------------------------
>
> One gets the following tag/value pairs:
>
> UID: 1621261
> Caption     :CAB02640
> Title       :PROBABLE PYRIMIDINE OPERON REGULATORY PROTEIN PYRR [Mycobacterium tuberculosis H37Rv]
> Extra       :gi|1621261|emb|CAB02640.1|[1621261]
> Gi          :1621261
> CreateDate  :2003/11/21
> UpdateDate  :2006/11/14
> Flags       :
> TaxId       :83332
> Length      :193
> Status      :live
> ReplacedBy  :
> Comment     :
>
> I'll add in a method to grab the data element by tag (in this case, grab
> the creation date by asking for the 'CreateDate' key). Might come in
> handy for scripts.
>
> chris
>
> On Apr 7, 2008, at 7:48 AM, Heikki Lehvaslaiho wrote:
>
>> Miguel,
>>
>> You probably know this but:
>>
>> - Your entry example below is a GenPept entry, not a GenBank entry
>> - The NCBI sequence format "genbank" has only the last modified date.
>>   I do not know about other formats (ASN.1, ...)
>> - NCBI Entrez is a great tool but it obscures the source database.
>> - If you really are working on real GenBank entries, you can use the >> accession >> number to see find corresponding EMBL (and Swiss-Prot) flat file >> formats that >> have both creation and last modified dates. >> >> Post to the list if you have trouble getting the dates from >> EMBL/Swiss-Prot >> formats using bioperl. >> >> Yours, >> >> -Heikki >> >> On Monday 07 April 2008 12:12:58 Miguel Pignatelli wrote: >>> Hi all, >>> >>> Is there any way to obtain the date of creation of individual GenBank >>> entries? I don't mean the "last revision" date that can be found in the >>> first line of a GenBank file. >>> >>> I can access this creation date by looking at the "revision history" of >>> any GenBank entry (for example, see >>> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105), >>> but I need a systematic (and local=fast) way to access this information. >>> >>> Any help would be very appreciated, >>> Thank you very much in advance, >>> >>> M; >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> -- >> ______ _/ _/_____________________________________________________ >> _/ _/ >> _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za >> _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho >> _/ _/ _/ SANBI, South African National Bioinformatics Institute >> _/ _/ _/ University of Western Cape, South Africa >> _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 >> ___ _/_/_/_/_/________________________________________________________ >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From cjfields at uiuc.edu Mon Apr 7 17:48:45 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 7 Apr 2008 12:48:45 -0500 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <47FA4AD2.5030206@uv.es> 
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <200804071448.34769.heikki@sanbi.ac.za> <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> <47FA4AD2.5030206@uv.es> Message-ID: Note in the example I gave that, during the revision history, the DBSOURCE changed at the point of the creation date (the original nuc. record was a M. tuberculosis contig sequence, which later changed to an updated full M. tuberculosis genome record at the time of the 'create date'). Couldn't find anything specific in the GenBank docs on this, but it appears (at least for a protein record) the creation date reflects the date in which the sequence was either originally deposited or originally derived from the nucleotide source record present in the record. In other words, it may not reflect the original date of deposition (which could have come from a different record, as in this case). chris On Apr 7, 2008, at 11:24 AM, Miguel Pignatelli wrote: > > I've noticed that the ASN.1 version of those records has a "creation- > date" tag. > But this is somehow strange, because the creation date obtained by > you and that obtained via ASN.1 format is 2003/11/21, but if you > look at the revision history of the record: > > http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=CAB02640 > > reports a creation date of "Oct 19 1996 12:28 AM" > > I don't know how to get this, because the EMBL version of this gene: > > http://www.ebi.ac.uk/cgi-bin/dbfetch?db=emblcds&id=CAB02640&style=raw > > doesn't has DT fields at all. > > M; > > > Chris Fields wrote: >> Strangely enough, if you use NCBI's esummary you can get both >> dates. 
Via Bio::DB::EUtilities in bioperl-live, if you dump out >> DocSum data (using a debugging method I added in a while back): >> --------------------------------------- >> use Bio::DB::EUtilities; >> # for multiple IDs use an array ref; also only use GI's (not >> accessions) >> my $factory = Bio::DB::EUtilities->new( >> -eutil => 'esummary', >> -db => 'protein', >> -id => 1621261); >> $factory->print_DocSums; >> --------------------------------------- >> One gets the following tag/value pairs: >> UID: 1621261 >> Caption :CAB02640 >> Title :PROBABLE PYRIMIDINE OPERON REGULATORY PROTEIN >> PYRR [Mycobacterium tuberculosis >> H37Rv] >> Extra :gi|1621261|emb|CAB02640.1|[1621261] >> Gi :1621261 >> CreateDate :2003/11/21 >> UpdateDate :2006/11/14 >> Flags : >> TaxId :83332 >> Length :193 >> Status :live >> ReplacedBy : >> Comment : >> I'll add in a method to grab the data element by tag (in this case, >> grab the creation date by asking for the 'CreateDate' key). Might >> come in handy for scripts. >> chris >> On Apr 7, 2008, at 7:48 AM, Heikki Lehvaslaiho wrote: >>> Miguel, >>> >>> You probably know this but: >>> >>> - Your entry example below is a GenPept entry, not a GenBank entry >>> - The NCBI sequence format "genbank" has only the last modified >>> date. >>> I do not know about other formats (ASN.1, ...) >>> - NCBI Entrez is a great tool but it obscures the source database. >>> - If you really are working on real GenBank entries, you can use >>> the accession >>> number to see find corresponding EMBL (and Swiss-Prot) flat file >>> formats that >>> have both creation and last modified dates. >>> >>> Post to the list if you have trouble getting the dates from EMBL/ >>> Swiss-Prot >>> formats using bioperl. >>> >>> Yours, >>> >>> -Heikki >>> >>> On Monday 07 April 2008 12:12:58 Miguel Pignatelli wrote: >>>> Hi all, >>>> >>>> Is there any way to obtain the date of creation of individual >>>> GenBank >>>> entries? 
I don't mean the "last revision" date that can be found >>>> in the >>>> first line of a GenBank file. >>>> >>>> I can access this creation date by looking at the "revision >>>> history" of >>>> any GenBank entry (for example, see >>>> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105) >>>> , >>>> but I need a systematic (and local=fast) way to access this >>>> information. >>>> >>>> Any help would be very appreciated, >>>> Thank you very much in advance, >>>> >>>> M; >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> -- >>> ______ _/ _/ >>> _____________________________________________________ >>> _/ _/ >>> _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za >>> _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho >>> _/ _/ _/ SANBI, South African National Bioinformatics Institute >>> _/ _/ _/ University of Western Cape, South Africa >>> _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 >>> ___ _/_/_/_/_/ >>> ________________________________________________________ >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From Bank.Beszteri at awi.de Tue Apr 8 07:35:43 2008 From: Bank.Beszteri at awi.de (=?ISO-8859-1?Q?B=E1nk_Beszteri?=) Date: Tue, 08 Apr 2008 09:35:43 +0200 Subject: [Bioperl-l] Indexing large databases / BioSQL In-Reply-To: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> Message-ID: <47FB204F.90405@awi.de> >>Can you provide some examples of these warnings (of the taxons that >>cause them)? 
If there's anything consistent about them perhaps
>>Bio::Species can be improved to accommodate them properly (instead of
>>just issuing the warning and getting the classification wrong).
>>
>>
>
>All warnings (and a few errors) for swissprot are here:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2474
>
>as an attached file
>
>I suppose the OP will have encountered similar output - I don't think there is
>much RDBMS-type-dependency involved.
>
>

Hi Erik & Sendu,

yes, the same kind of thing, probably no DBMS-type dependency; in case it could be useful, I uploaded my output as a second attachment to the bugzilla report cited above.

Bank

From heikki at sanbi.ac.za Tue Apr 8 08:32:12 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Tue, 8 Apr 2008 10:32:12 +0200
Subject: [Bioperl-l] Blast database sequence retrieval perl script
In-Reply-To: <6BEABCD5CA640A44A848448A42A03B73079E48C9@ilrikeadx1.ILRI.CGIARAD.ORG>
References: <6BEABCD5CA640A44A848448A42A03B73079E48C9@ilrikeadx1.ILRI.CGIARAD.ORG>
Message-ID: <200804081032.12312.heikki@sanbi.ac.za>

Dear Nelson,

I am cc:ing the bioperl mailing list, where all queries of this kind should go. More people can help you that way.

Since you have your own local data set, you need to create an index that catalogues your sequences for easy retrieval.

You need to install bioperl-live first. See for example:
http://www.bioperl.org/wiki/Using_Subversion

Then you can follow this HOWTO:
http://www.bioperl.org/wiki/HOWTO:Flat_databases

The other HOWTOs will help you deal with the BioPerl sequence objects that are retrieved: http://www.bioperl.org/wiki/HOWTOs.

Yours,

-Heikki

On Monday 07 April 2008 14:50:23 Ndegwa, Nelson (IITA-Nairobi) wrote:
> Dear Prof. Heikki,
>
> Hi. We met at the Pathogen Bioinformatics Conference held in Nairobi
> Kenya in May 2007 at ICIPE. I recall you are a developer of Bioperl and
> Perl. I have managed to install a local Blast, having just cowpea Contig
> sequences, about 50,000 in total.
This runs fine, as I can perform > various queries and get results. However, any good match/hit on the > local Blast database is hard to retrieve and the only option seems to go > back to that database and search manually for the top hit sequence - an > exceedingly manual task. Might you perhaps be having a Perl script I > could adopt to my database to help with this task Such that the hits > have a hyperlink which can be used to retrieve that specific entry? I > have limited knowledge of Perl. Thank you. > > With Kind Regards, > > Nelson. -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From David.Messina at sbc.su.se Tue Apr 8 11:29:12 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 8 Apr 2008 13:29:12 +0200 Subject: [Bioperl-l] How to analysis the relationship of my interesting KEGG pathways? In-Reply-To: <628aabb70804080053g1fd9120ex9d5fd12f65f216f9@mail.gmail.com> References: <628aabb70804080053g1fd9120ex9d5fd12f65f216f9@mail.gmail.com> Message-ID: <628aabb70804080429k2aa17a6eu12197709d4cc1af0@mail.gmail.com> Hi Jinyan, You asked a similar question last week and received a couple of suggestions -- did you take a look at those? I'm not an expert on this topic, but I believe that since regulatory information is much harder to obtain experimentally and therefore much less well known, there isn't a lot of it in pathway databases like KEGG. You may have to look through the literature and start trying to put together possible regulatory links on your own. 
Dave

From hrh at sanger.ac.uk Tue Apr 8 12:48:32 2008
From: hrh at sanger.ac.uk (Hans Rudolf Hotz)
Date: Tue, 8 Apr 2008 13:48:32 +0100 (BST)
Subject: [Bioperl-l] Blast database sequence retrieval perl script
In-Reply-To: <200804081032.12312.heikki@sanbi.ac.za>
References: <6BEABCD5CA640A44A848448A42A03B73079E48C9@ilrikeadx1.ILRI.CGIARAD.ORG> <200804081032.12312.heikki@sanbi.ac.za>
Message-ID:

Nelson

or simply use the BLAST indices for the sequence retrieval as well. All you need to do is add the "-o T" option to the 'formatdb' command when creating the BLAST index (this will create some extra files). Then you can use 'fastacmd' (which is also part of the NCBI BLAST package) to retrieve the sequences.

Hans

On Tue, 8 Apr 2008, Heikki Lehvaslaiho wrote:

>
> Dear Nelson,
>
> I am cc:ing the bioperl mailing list, where all queries of this kind should
> go. More people can help you that way.
>
>
> Since you have your own local data set, you need to create an index that
> catalogues your sequences for easy retrieval.
>
> You need to install bioperl-live first. See for example:
> http://www.bioperl.org/wiki/Using_Subversion
>
> Then you can follow this HOWTO:
> http://www.bioperl.org/wiki/HOWTO:Flat_databases
>
> The other HOWTOs will help you deal with the BioPerl sequence objects that are
> retrieved: http://www.bioperl.org/wiki/HOWTOs.
>
>
> Yours,
>
> -Heikki
>
>
> On Monday 07 April 2008 14:50:23 Ndegwa, Nelson (IITA-Nairobi) wrote:
>> Dear Prof. Heikki,
>>
>> Hi. We met at the Pathogen Bioinformatics Conference held in Nairobi
>> Kenya in May 2007 at ICIPE. I recall you are a developer of Bioperl and
>> Perl. I have managed to install a local Blast, having just cowpea Contig
>> sequences, about 50,000 in total. This runs fine, as I can perform
>> various queries and get results.
However, any good match/hit on the >> local Blast database is hard to retrieve and the only option seems to go >> back to that database and search manually for the top hit sequence - an >> exceedingly manual task. Might you perhaps be having a Perl script I >> could adopt to my database to help with this task Such that the hits >> have a hyperlink which can be used to retrieve that specific entry? I >> have limited knowledge of Perl. Thank you. >> >> With Kind Regards, >> >> Nelson. > > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From robert.citek at gmail.com Tue Apr 8 14:09:27 2008 From: robert.citek at gmail.com (Robert Citek) Date: Tue, 8 Apr 2008 09:09:27 -0500 Subject: [Bioperl-l] module for pubchem queries In-Reply-To: <15B44EC6-3660-4925-BA7A-6763D51E6837@uiuc.edu> References: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> <15B44EC6-3660-4925-BA7A-6763D51E6837@uiuc.edu> Message-ID: <4145b6790804080709l20f1e56erf4b7af04b0a52870@mail.gmail.com> Wrapping bioperl around eutils will work just fine. Thanks for the pointer. 
http://search.cpan.org/~sendu/bioperl-1.5.2_102/Bio/DB/EUtilities.pm Regards, - Robert On Fri, Apr 4, 2008 at 4:25 PM, Chris Fields wrote: > Do you need something to access eutils via BioPerl, or are you looking for a > specific set of classes? I wrote an interface to eutils > (Bio::DB::EUtilities), you could do something like this: > > #!/usr/bin/perl -w > > use strict; > use warnings; > use Bio::DB::EUtilities; > > my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch', > -term => 'dihydroorotate', > -db => 'pcsubstance', > -retmax => 1000); > > print join(',',$eutil->get_ids)."\n"; > > chris From cjfields at uiuc.edu Tue Apr 8 15:10:26 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 8 Apr 2008 10:10:26 -0500 Subject: [Bioperl-l] module for pubchem queries In-Reply-To: <4145b6790804080709l20f1e56erf4b7af04b0a52870@mail.gmail.com> References: <4145b6790804020524g33672578q535b287e93792bdd@mail.gmail.com> <15B44EC6-3660-4925-BA7A-6763D51E6837@uiuc.edu> <4145b6790804080709l20f1e56erf4b7af04b0a52870@mail.gmail.com> Message-ID: <32D210FC-575E-4D95-95DA-FC6F5BE1FC24@uiuc.edu> Just to note, the API has changed significantly from the interface in the 1.5.2 release. The up-to-date (supported) interface is in Subversion; there are some example recipes here: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook I'm working on a full HOWTO, but I haven't had time to get it up on the wiki yet. chris On Apr 8, 2008, at 9:09 AM, Robert Citek wrote: > Wrapping bioperl around eutils will work just fine. Thanks for the > pointer. > > http://search.cpan.org/~sendu/bioperl-1.5.2_102/Bio/DB/EUtilities.pm > > Regards, > - Robert > > On Fri, Apr 4, 2008 at 4:25 PM, Chris Fields > wrote: >> Do you need something to access eutils via BioPerl, or are you >> looking for a >> specific set of classes?
I wrote an interface to eutils >> (Bio::DB::EUtilities), you could do something like this: >> >> #!/usr/bin/perl -w >> >> use strict; >> use warnings; >> use Bio::DB::EUtilities; >> >> my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch', >> -term => 'dihydroorotate', >> -db => 'pcsubstance', >> -retmax => 1000); >> >> print join(',',$eutil->get_ids)."\n"; >> >> chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cuiw at ncbi.nlm.nih.gov Tue Apr 8 20:41:58 2008 From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C]) Date: Tue, 8 Apr 2008 16:41:58 -0400 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <47F9F3AA.2090003@uv.es> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> Message-ID: <6F230E9769AA8D4EB4BC401DF133EDB7180BE0@NIHCESMLBX15.nih.gov> Hi, Miguel: id1_fetch can do it. Detailed instruction can be found at: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=toolkit.section.ch_demo.id 1_fetch.html Here is an example: >id1_fetch -lt revisions -flat '12:74311105' -fmt fasta GI Loaded DB Retrieval No. -- ------ -- ------------- 74311105 12/07/2007 NCBI 19766263 74311105 01/23/2007 NCBI 16325656 74311105 03/30/2006 NCBI 13131204 74311105 03/03/2006 NCBI 12915541 74311105 03/02/2006 NCBI 12885275 74311105 12/03/2005 NCBI 12259793 74311105 09/09/2005 NCBI 11257262 74311105 09/09/2005 NCBI 11242667 Wenwu Cui PhD NCBI/NLM/NIH > -----Original Message----- > From: Miguel Pignatelli [mailto:miguel.pignatelli at uv.es] > Sent: Monday, April 07, 2008 6:13 AM > Cc: bioperl-l at bioperl.org > Subject: [Bioperl-l] GenBank entries creation dates > > Hi all, > > Is there any way to obtain the date of creation of individual GenBank > entries? 
I don't mean the "last revision" date that can be found in the > first line of a GenBank file. > > I can access this creation date by looking at the "revision history" of > any GenBank entry (for example, see > http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105), > but I need a systematic (and local=fast) way to access this > information. > > Any help would be very appreciated, > Thank you very much in advance, > > M; > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From miguel.pignatelli at uv.es Wed Apr 9 11:32:39 2008 From: miguel.pignatelli at uv.es (Miguel Pignatelli) Date: Wed, 09 Apr 2008 13:32:39 +0200 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <6F230E9769AA8D4EB4BC401DF133EDB7180BE0@NIHCESMLBX15.nih.gov> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <6F230E9769AA8D4EB4BC401DF133EDB7180BE0@NIHCESMLBX15.nih.gov> Message-ID: <47FCA957.5040409@uv.es> Wow, impressive, thanks Wenwu for the information, I have never used this tool before. The problem is that I need to know all the revision history (or at least the creation date) for *all* the GIs present in nr (well, or at least a significant portion of it) and this tool queries via web. The existence of this tool confirms me that this information is available somewhere, is it possible to download the data that contains this information? Thanks again, M; Cui, Wenwu (NIH/NLM/NCBI) [C] wrote: > Hi, Miguel: > > id1_fetch can do it. Detailed instruction can be found at: > > http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=toolkit.section.ch_demo.id > 1_fetch.html > > Here is an example: > >> id1_fetch -lt revisions -flat '12:74311105' -fmt fasta > GI Loaded DB Retrieval No. 
> -- ------ -- ------------- > 74311105 12/07/2007 NCBI 19766263 > 74311105 01/23/2007 NCBI 16325656 > 74311105 03/30/2006 NCBI 13131204 > 74311105 03/03/2006 NCBI 12915541 > 74311105 03/02/2006 NCBI 12885275 > 74311105 12/03/2005 NCBI 12259793 > 74311105 09/09/2005 NCBI 11257262 > 74311105 09/09/2005 NCBI 11242667 > > Wenwu Cui PhD > NCBI/NLM/NIH > >> -----Original Message----- >> From: Miguel Pignatelli [mailto:miguel.pignatelli at uv.es] >> Sent: Monday, April 07, 2008 6:13 AM >> Cc: bioperl-l at bioperl.org >> Subject: [Bioperl-l] GenBank entries creation dates >> >> Hi all, >> >> Is there any way to obtain the date of creation of individual GenBank >> entries? I don't mean the "last revision" date that can be found in > the >> first line of a GenBank file. >> >> I can access this creation date by looking at the "revision history" > of >> any GenBank entry (for example, see >> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105), >> but I need a systematic (and local=fast) way to access this >> information. >> >> Any help would be very appreciated, >> Thank you very much in advance, >> >> M; >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cuiw at ncbi.nlm.nih.gov Wed Apr 9 13:25:16 2008 From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C]) Date: Wed, 9 Apr 2008 09:25:16 -0400 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <47FCA957.5040409@uv.es> References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <6F230E9769AA8D4EB4BC401DF133EDB7180BE0@NIHCESMLBX15.nih.gov> <47FCA957.5040409@uv.es> Message-ID: <6F230E9769AA8D4EB4BC401DF133EDB7180BE1@NIHCESMLBX15.nih.gov> Hi, Miguel, I do not know whether the data file is publically available. 
However, you can perform 'real time' query via id1_fetch: ####step 1: generate GI file ##### id1_fetch -query 'YOUR-GENBANK-QUERY-STRING' -lt none -db Nucleotide -out qfile ####step 2: retrieve revisions for GIs stored in qfile ##### id1_fetch -lt revisions -qf qfile -fmt fasta -db Nucleotide Good luck! Wenwu Cui > -----Original Message----- > From: Miguel Pignatelli [mailto:miguel.pignatelli at uv.es] > Sent: Wednesday, April 09, 2008 7:33 AM > To: Cui, Wenwu (NIH/NLM/NCBI) [C] > Cc: bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] GenBank entries creation dates > > Wow, impressive, thanks Wenwu for the information, I have never used > this tool before. The problem is that I need to know all the revision > history (or at least the creation date) for *all* the GIs present in nr > (well, or at least a significant portion of it) and this tool queries > via web. > > The existence of this tool confirms me that this information is > available somewhere, is it possible to download the data that contains > this information? > > Thanks again, > > M; > > > Cui, Wenwu (NIH/NLM/NCBI) [C] wrote: > > Hi, Miguel: > > > > id1_fetch can do it. Detailed instruction can be found at: > > > > > http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=toolkit.section.ch_demo.i > d > > 1_fetch.html > > > > Here is an example: > > > >> id1_fetch -lt revisions -flat '12:74311105' -fmt fasta > > GI Loaded DB Retrieval No. 
> > -- ------ -- ------------- > > 74311105 12/07/2007 NCBI 19766263 > > 74311105 01/23/2007 NCBI 16325656 > > 74311105 03/30/2006 NCBI 13131204 > > 74311105 03/03/2006 NCBI 12915541 > > 74311105 03/02/2006 NCBI 12885275 > > 74311105 12/03/2005 NCBI 12259793 > > 74311105 09/09/2005 NCBI 11257262 > > 74311105 09/09/2005 NCBI 11242667 > > > > Wenwu Cui PhD > > NCBI/NLM/NIH > > > >> -----Original Message----- > >> From: Miguel Pignatelli [mailto:miguel.pignatelli at uv.es] > >> Sent: Monday, April 07, 2008 6:13 AM > >> Cc: bioperl-l at bioperl.org > >> Subject: [Bioperl-l] GenBank entries creation dates > >> > >> Hi all, > >> > >> Is there any way to obtain the date of creation of individual > GenBank > >> entries? I don't mean the "last revision" date that can be found in > > the > >> first line of a GenBank file. > >> > >> I can access this creation date by looking at the "revision history" > > of > >> any GenBank entry (for example, see > >> > http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105), > >> but I need a systematic (and local=fast) way to access this > >> information. > >> > >> Any help would be very appreciated, > >> Thank you very much in advance, > >> > >> M; > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > From CALLEY_JOHN_N at LILLY.COM Wed Apr 9 13:45:23 2008 From: CALLEY_JOHN_N at LILLY.COM (John N Calley) Date: Wed, 9 Apr 2008 09:45:23 -0400 Subject: [Bioperl-l] GenBank entries creation dates In-Reply-To: <47FCA957.5040409@uv.es> Message-ID: You might want to keep in mind that the creation date is not always reliable. I am aware of one example where the recorded creation date precedes the sequencing date by several months (as determined by the trace file date). NCBI was not able to explain exactly what happened but (as I recall) hypothesized that some dates had been scrambled in a database rebuild. 
If there was interest I could probably pull up more details. John Calley Miguel Pignatelli Sent by: bioperl-l-bounces at lists.open-bio.org 04/09/2008 07:32 AM Please respond to miguel.pignatelli at uv.es To "Cui, Wenwu (NIH/NLM/NCBI) [C]" cc bioperl-l at bioperl.org Subject Re: [Bioperl-l] GenBank entries creation dates Wow, impressive, thanks Wenwu for the information, I have never used this tool before. The problem is that I need to know all the revision history (or at least the creation date) for *all* the GIs present in nr (well, or at least a significant portion of it) and this tool queries via web. The existence of this tool confirms me that this information is available somewhere, is it possible to download the data that contains this information? Thanks again, M; Cui, Wenwu (NIH/NLM/NCBI) [C] wrote: > Hi, Miguel: > > id1_fetch can do it. Detailed instruction can be found at: > > http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=toolkit.section.ch_demo.id > 1_fetch.html > > Here is an example: > >> id1_fetch -lt revisions -flat '12:74311105' -fmt fasta > GI Loaded DB Retrieval No. > -- ------ -- ------------- > 74311105 12/07/2007 NCBI 19766263 > 74311105 01/23/2007 NCBI 16325656 > 74311105 03/30/2006 NCBI 13131204 > 74311105 03/03/2006 NCBI 12915541 > 74311105 03/02/2006 NCBI 12885275 > 74311105 12/03/2005 NCBI 12259793 > 74311105 09/09/2005 NCBI 11257262 > 74311105 09/09/2005 NCBI 11242667 > > Wenwu Cui PhD > NCBI/NLM/NIH > >> -----Original Message----- >> From: Miguel Pignatelli [mailto:miguel.pignatelli at uv.es] >> Sent: Monday, April 07, 2008 6:13 AM >> Cc: bioperl-l at bioperl.org >> Subject: [Bioperl-l] GenBank entries creation dates >> >> Hi all, >> >> Is there any way to obtain the date of creation of individual GenBank >> entries? I don't mean the "last revision" date that can be found in > the >> first line of a GenBank file. 
>> >> I can access this creation date by looking at the "revision history" > of >> any GenBank entry (for example, see >> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105), >> but I need a systematic (and local=fast) way to access this >> information. >> >> Any help would be very appreciated, >> Thank you very much in advance, >> >> M; >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From frederic.romagne at gmail.com Wed Apr 9 20:45:50 2008 From: frederic.romagne at gmail.com (=?ISO-8859-1?Q?Fr=E9d=E9ric_Romagn=E9?=) Date: Wed, 09 Apr 2008 15:45:50 -0500 Subject: [Bioperl-l] question about clustalw module. Message-ID: <1207773950.483.13.camel@kiss-laptop> Hello, i have a problem when using Bio::Tools::Run::Alignment::Clustalw : I give it an array_ref scalar (the array contains some fasta sequences) and all the good parameters and i write the result via Bio::SeqIO. The fact is that my result file only contains the Accession number in the header... An example : the initial stream is : >NM_052854 Homo sapiens cAMP responsive element binding protein 3-like 1 (CREB3L1), mRNA. AGAAGACGTGCGGAGGGAGACGCAGAGACAGAGGAGAGGCCGGCAGCCACCCAGTCTCGG GGGAGCACTTAGCTCCCCCGCCCCGGCTCCCACCCTGTCCGGGGGGCTCCTGAAGCCCTC AGCCCCAACCCCGGGCTCCCCATGGAAGCCAGCTGTGCCCCAGGAGGAGCAGGAGGAGGT GGAGTCGGCTGAATGCCCACGGTGCGCCCGGGGCCCCTGAGCCCATCCCGCTCCTAGCCG CTGCCCTAAGGCCCCCGCGCGCCCCGCGCCCCCCACCCGGGGCCGCGCCGCCTCCGTCCG CCCCTCCCCCGGGGCTTCGCCCCGGACCTGCCCCCCGCCCGTTTGCCAGCGCTCAGGCAG GAGCTCTGGACTGGGCGCGCCGCCGCCCTGGAGTGAGGGAAGCCCAGTGGAAGGGGGTCC CGGGAGCCGGCTGCGATGGACGCCGTCTTGGAACCCTTCCCGGCCGACAGGCTGTTCCCC GGATCCAGCTTCCTGGACTTGGGGGATCTGAACGAGTCGGACTTCCTCAACAATGCGCAC ... 
the result file is : >NM_052854 ---------------------------------------AGAAGACGTGCGGAGGGAGAC GCAGAGACAGAGGAGAGGCCGGCAGCCACCCAGTCTCGGGGGAGCACTTAGCTCCCCCGC CCCGGCTCCCACCCTGTCCGGGGGGCTCCTGAAGCCCTCAGCCCCAACCCCGGGCTCCCC ATGGAAGCCAGCTGTGCCCCAGGAGGAGCAGGAGGAGGTGGAGTCGGCTGAATGCCCACG GTGCGCCCGGGGCCCCTGAGCCCATCCCGCTCCTAGCCGCTGCCCTAAGGCCCCCGCGCG CCCCGCGCCCCCCACCCGGGGCCGCGCCGCCTCCGTCCGCCCCTCCCCCGGGGCTTCGCC CCGGACCTGCCCCCCGCCCGTTTGCCAGCGCTCAGGCAGGAGCTCTGGACTGGGCGCGCC GCCGCCCTGGAGTGAGGGAAGCCCAGTGGAAGGGGGTCCCGGGAGCCGGCTGCGATGGAC ... So I lost the other information provided by the header... Is there any option to keep this information? Here is a part of my code with my options : my $seq_ref=\@seq; my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM', 'quiet' => 1, 'output' => 'FASTA'); my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); my $aln = $factory->align($seq_ref); Thank you. From jason at bioperl.org Wed Apr 9 20:55:13 2008 From: jason at bioperl.org (Jason Stajich) Date: Wed, 9 Apr 2008 13:55:13 -0700 Subject: [Bioperl-l] question about clustalw module. In-Reply-To: <1207773950.483.13.camel@kiss-laptop> References: <1207773950.483.13.camel@kiss-laptop> Message-ID: The clustal alignment format does not allow for the description. If you want to preserve it, you'll have to add it back: make a hash indexed by sequence ID and store the description; then, when you get your alignment back, you can update the description field before writing it out with AlignIO. -jason On Apr 9, 2008, at 1:45 PM, Frédéric Romagné wrote: > Hello, > > i have a problem when using Bio::Tools::Run::Alignment::Clustalw : > > I give it an array_ref scalar (the array contains some fasta > sequences) > and all the good parameters and i write the result via Bio::SeqIO. > > The fact is that my result file only contains the Accession number in > the header...
An example : > > the initial stream is : > >> NM_052854 Homo sapiens cAMP responsive element binding protein 3- >> like 1 > (CREB3L1), mRNA. > AGAAGACGTGCGGAGGGAGACGCAGAGACAGAGGAGAGGCCGGCAGCCACCCAGTCTCGG > GGGAGCACTTAGCTCCCCCGCCCCGGCTCCCACCCTGTCCGGGGGGCTCCTGAAGCCCTC > AGCCCCAACCCCGGGCTCCCCATGGAAGCCAGCTGTGCCCCAGGAGGAGCAGGAGGAGGT > GGAGTCGGCTGAATGCCCACGGTGCGCCCGGGGCCCCTGAGCCCATCCCGCTCCTAGCCG > CTGCCCTAAGGCCCCCGCGCGCCCCGCGCCCCCCACCCGGGGCCGCGCCGCCTCCGTCCG > CCCCTCCCCCGGGGCTTCGCCCCGGACCTGCCCCCCGCCCGTTTGCCAGCGCTCAGGCAG > GAGCTCTGGACTGGGCGCGCCGCCGCCCTGGAGTGAGGGAAGCCCAGTGGAAGGGGGTCC > CGGGAGCCGGCTGCGATGGACGCCGTCTTGGAACCCTTCCCGGCCGACAGGCTGTTCCCC > GGATCCAGCTTCCTGGACTTGGGGGATCTGAACGAGTCGGACTTCCTCAACAATGCGCAC > > ... > > the result file is : > >> NM_052854 > ---------------------------------------AGAAGACGTGCGGAGGGAGAC > GCAGAGACAGAGGAGAGGCCGGCAGCCACCCAGTCTCGGGGGAGCACTTAGCTCCCCCGC > CCCGGCTCCCACCCTGTCCGGGGGGCTCCTGAAGCCCTCAGCCCCAACCCCGGGCTCCCC > ATGGAAGCCAGCTGTGCCCCAGGAGGAGCAGGAGGAGGTGGAGTCGGCTGAATGCCCACG > GTGCGCCCGGGGCCCCTGAGCCCATCCCGCTCCTAGCCGCTGCCCTAAGGCCCCCGCGCG > CCCCGCGCCCCCCACCCGGGGCCGCGCCGCCTCCGTCCGCCCCTCCCCCGGGGCTTCGCC > CCGGACCTGCCCCCCGCCCGTTTGCCAGCGCTCAGGCAGGAGCTCTGGACTGGGCGCGCC > GCCGCCCTGGAGTGAGGGAAGCCCAGTGGAAGGGGGTCCCGGGAGCCGGCTGCGATGGAC > > ... > > So i lost the other informations provided by the header... > > Is there any option to keep these informations? > > Here is a part of my code with my options : > > > my $seq_ref=\@seq; > my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM', 'quiet' => 1, > 'output' => 'FASTA'); > my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); > my $aln = $factory->align($seq_ref); > > > Thank you. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lamq at usal.es Thu Apr 10 15:52:24 2008 From: lamq at usal.es (Luis A. M. 
Quintales) Date: Thu, 10 Apr 2008 17:52:24 +0200 Subject: [Bioperl-l] xyplot glyph problem with previous aggregation Message-ID: <47FE37B8.9090404@usal.es> I am not able to add xyplot glyphs to one panel because I have some problems with the aggregations. Using that GFF file: ##sequence-region chr1 1 5578650 chr1 atfreq atpc 1 50 58.8000 . . atpc 1 chr1 atfreq atpc 51 100 58.4000 . . atpc 1 chr1 atfreq atpc 101 150 57.6000 . . atpc 1 chr1 atfreq atpc 151 200 57.8000 . . atpc 1 . . . And this source code for preparing the aggregated features necessary for the xyplot glyph: my $filin = $ARGV[0]; my $db = Bio::DB::GFF->new( -dsn => $filin, -adaptor => 'memory', -aggregator => 'at{atpc:atfreq}' ); my $segment = $db->segment('chr1'); my @features1 = $db->features('atpc'); print "$#features1 \n"; my @features2 = $segment->features('atpc'); print "$#features2 \n"; my @features3 = $db->features('at'); print "$#features3 \n"; my @features4 = $segment->features('at'); print "$#features4 \n"; I obtain: 111572 111572 0 0 What I am doing wrong with the aggregator? Many thanks. From lincoln.stein at gmail.com Thu Apr 10 17:55:06 2008 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 10 Apr 2008 13:55:06 -0400 Subject: [Bioperl-l] xyplot glyph problem with previous aggregation In-Reply-To: <47FE37B8.9090404@usal.es> References: <47FE37B8.9090404@usal.es> Message-ID: <6dce9a0b0804101055w65e22abfgaa4f155751fef40f@mail.gmail.com> Hi Luis, When you aggregate the atpc 1 features together, you end up with one feature. Thus @features3 is an array of size 1. The $# operator returns the index of the last element, which is 0. If @features3 were empty, $#features3 would return -1. Lincoln On Thu, Apr 10, 2008 at 11:52 AM, Luis A. M. Quintales wrote: > I am not able to add xyplot glyphs to one panel because I have some > problems with the aggregations. > > Using that GFF file: > > ##sequence-region chr1 1 5578650 > chr1 atfreq atpc 1 50 58.8000 . . atpc 1 > chr1 atfreq atpc 51 100 58.4000 . . atpc 1 > chr1 atfreq atpc 101 150 57.6000 . . atpc 1 > chr1 atfreq atpc 151 200 57.8000 . . atpc 1 > . . .
> > > And this source code for preparing the aggregated features necessary for > the xyplot glyph: > > my $filin = $ARGV[0]; > my $db = Bio::DB::GFF->new( -dsn => $filin, > -adaptor => 'memory', > -aggregator => 'at{atpc:atfreq}' > ); > my $segment = $db->segment('chr1'); > my @features1 = $db->features('atpc'); > print "$#features1 \n"; > my @features2 = $segment->features('atpc'); > print "$#features2 \n"; > my @features3 = $db->features('at'); > print "$#features3 \n"; > my @features4 = $segment->features('at'); > print "$#features4 \n"; > > I obtain: > > 111572 > 111572 > 0 > 0 > > What I am doing wrong with the aggregator? > > Many thanks. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From adsj at novozymes.com Fri Apr 11 08:53:23 2008 From: adsj at novozymes.com (Adam =?iso-8859-1?Q?Sj=F8gren?=) Date: Fri, 11 Apr 2008 10:53:23 +0200 Subject: [Bioperl-l] Bio::Factory::SequenceFactoryI SYNOPSIS example Message-ID: <87d4owixh8.fsf@topper.koldfront.dk> Hi. I am trying to make Bio::SeqIO return objects of my own type (a small extension of Bio::Seq::RichSeq), by setting -seqfactory. 
I am having a little trouble creating the correct object to pass with -seqfactory: Following the example given in SYNOPSIS of Bio::Factory::SequenceFactoryI, I get this error: $ perl -e ' > use Bio::Seq::SeqFactory; > my $seqbuilder = Bio::Seq::SeqFactory->new('type' => 'Bio::PrimarySeq'); > > my $seq = $seqbuilder->create(-seq => 'ACTGAT', > -display_id => 'exampleseq'); > > print "seq is a ", ref($seq), "\n"; > ' ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Can't locate type.pm in @INC (@INC contains: /z/bio/biotools/bioinfperlmodules/ /z/bio/adm/modules /etc/perl /usr/local/lib/perl/5.8.8 /usr/local/share/perl/5.8.8 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.8 /usr/share/perl/5.8 /usr/local/lib/site_perl .) at (eval 13) line 3. : Unrecognized Sequence type for SeqFactory 'type' STACK: Error::throw STACK: Bio::Root::Root::throw /usr/share/perl/5.8/Bio/Root/Root.pm:357 STACK: Bio::Seq::SeqFactory::type /usr/share/perl/5.8/Bio/Seq/SeqFactory.pm:134 STACK: Bio::Seq::SeqFactory::new /usr/share/perl/5.8/Bio/Seq/SeqFactory.pm:93 STACK: -e:3 ----------------------------------------------------------- $ If I go "Bio::Seq::SeqFactory('Bio::PrimarySeq'=>1)" instead, for instance, it seems to work: $ perl -e ' > use Bio::Seq::SeqFactory; > my $seqbuilder = Bio::Seq::SeqFactory->new('Bio::PrimarySeq'=>1); > > my $seq = $seqbuilder->create(-seq => 'ACTGAT', > -display_id => 'exampleseq'); > > print "seq is a ", ref($seq), "\n"; > ' seq is a Bio::PrimarySeq $ I was about to write a patch for the pod, when I realized that I'd better start by asking: Is this a buglet in the pod or the code? 
Best regards, Adam -- Adam Sjøgren adsj at novozymes.com From hlapp at gmx.net Fri Apr 11 15:35:54 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 11 Apr 2008 11:35:54 -0400 Subject: [Bioperl-l] Bio::Factory::SequenceFactoryI SYNOPSIS example In-Reply-To: <87d4owixh8.fsf@topper.koldfront.dk> References: <87d4owixh8.fsf@topper.koldfront.dk> Message-ID: <0037240B-F469-4388-972A-324101B11621@gmx.net> On Apr 11, 2008, at 4:53 AM, Adam Sjøgren wrote: > $ perl -e ' >> use Bio::Seq::SeqFactory; >> my $seqbuilder = Bio::Seq::SeqFactory->new('type' => >> 'Bio::PrimarySeq'); You need to prefix the argument with a dash: '-type', not 'type'. Otherwise, it assumes that the class you want instantiated is 'type.pm'. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From 1zoujing at 163.com Thu Apr 10 05:08:52 2008 From: 1zoujing at 163.com (zoujing) Date: Wed, 9 Apr 2008 22:08:52 -0700 (PDT) Subject: [Bioperl-l] Bio::ASN1::EntrezGene parse so slowly? Message-ID: <16602210.post@talk.nabble.com> I want to parse a file "gene_info" from NCBI. The format of Gene in NCBI is ASN1, right? So I used Bio::ASN1::EntrezGene. But it didn't work properly, or it is too slow. The file is about 500M. The code is as follows: use Bio::ASN1::EntrezGene; my $parser = Bio::ASN1::EntrezGene->new('file' => $ARGV[0]); my $i = 0; while(my $result = $parser->next_seq) { last; } # real processing would go here; "last" is used only for this test When it reaches the "while" part, it keeps processing and never exits, even though I used "last" inside the loop. So I wonder whether it is too slow, or the module is not fit for this job, or I did something wrong? Thank you! -- View this message in context: http://www.nabble.com/Bio%3A%3AASN1%3A%3AEntrezGene-parse-so-slowly--tp16602210p16602210.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
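A possible reading of the hang described in the message above, offered as an assumption rather than a confirmed diagnosis: NCBI distributes gene_info as a tab-delimited text table, not as ASN.1, so an ASN.1 parser may scan a large part of the file without finding a record. If that is the case, plain Perl is enough to read it. The minimal sketch below assumes a gunzipped gene_info file in which lines starting with '#' are header or comment lines and the first three columns are tax_id, GeneID and Symbol. The helper name parse_gene_info_line is hypothetical and is not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only. Assumes a gunzipped gene_info file in which '#' starts a
# header/comment line and the first three tab-separated columns are
# tax_id, GeneID and Symbol. parse_gene_info_line() is a hypothetical
# helper written for this example; it is not part of BioPerl.
sub parse_gene_info_line {
    my ($line) = @_;
    return if $line =~ /^#/;        # skip header / comment lines
    $line =~ s/\r?\n\z//;           # strip the line ending
    return unless length $line;     # skip blank lines
    my ($tax_id, $gene_id, $symbol) = split /\t/, $line, -1;
    return unless defined $symbol;  # skip lines with fewer than 3 columns
    return { tax_id => $tax_id, GeneID => $gene_id, Symbol => $symbol };
}

# When a file name is given, print the three columns for every record.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    while (my $line = <$fh>) {
        my $rec = parse_gene_info_line($line) or next;
        print join("\t", @{$rec}{qw(tax_id GeneID Symbol)}), "\n";
    }
    close $fh;
}
```

Saved under a name of your choice (say read_gene_info.pl, also hypothetical), it would be run as "perl read_gene_info.pl gene_info". Reading line by line keeps memory use flat regardless of file size.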
From 1zoujing at 163.com Thu Apr 10 06:17:41 2008 From: 1zoujing at 163.com (zoujing) Date: Wed, 9 Apr 2008 23:17:41 -0700 (PDT) Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags" Message-ID: <16602770.post@talk.nabble.com> I am a green hand in Bioperl. When I run perl with "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error information: Data Error: none conforming data found on line 1 in Sus_scrofa.ags. But the Sus_scrofa.ags was downloaded from NCBI, with the format of ASN1, should be the same as Homo_sapiens in the example. So it should be no error as the code is the example from Mingyi. I wonder why this happens, and should I change something about the file? -- View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16602770.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From 1zoujing at 163.com Thu Apr 10 06:56:52 2008 From: 1zoujing at 163.com (zoujing) Date: Wed, 9 Apr 2008 23:56:52 -0700 (PDT) Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags" In-Reply-To: <16602770.post@talk.nabble.com> References: <16602770.post@talk.nabble.com> Message-ID: <16603225.post@talk.nabble.com> Searched the web and found the answer; I quote it below: The error was thrown by my Bio::ASN1::EntrezGene module because it expects a text file, while you fed it with a binary file. To use gzipped ASN binary file from NCBI, download the NCBI gene2xml (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml), then use this syntax to run my parser on the binary files: my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i Homo_sapiens.ags.gz -c -x -b | "); # Homo_sapiens.ags.gz is the gzipped binary file directly downloaded from NCBI Same syntax should be used when you're using SeqIO (thus SeqIO::entrezgene). Mingyi zoujing wrote: > > I am a green hand in Bioperl.
When I run perl with > "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error > information: > Data Error: none conforming data found on line 1 in Sus_scrofa.ags. > > But the Sus_scrofa.ags is download from NCBI, with the format of ASN1, > should be the same as Homo_sapiens in the example. So it should be no > error as the code is the example from Mingyi. > I wonder why this happen, and should I change something about the file? > > -- View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16603225.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From 1zoujing at 163.com Thu Apr 10 07:10:26 2008 From: 1zoujing at 163.com (zoujing) Date: Thu, 10 Apr 2008 00:10:26 -0700 (PDT) Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags" Message-ID: <16603225.post@talk.nabble.com> Seached the web and found the answer now, quote the answer as following: The error was thrown by my Bio::ASN1::EntrezGene module because it expects a text file, while you fed it with a binary file. To use gzipped ASN binary file from NCBI, download the NCBI gene2xml (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml), then use this syntax to run my parser on the binary files: my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i Homo_sapiens.ags.gz -c -x -b | "); # Homo_sapiens.ags.gz is the gzipped binary file directly downloaded from NCBI Same syntax should be used when you're using SeqIO (thus SeqIO::entrezgene). Mingyi But there is still one thing, I want to parse "gene_info.gz" in Gene of NCBI. ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz. It doesn't work.Is that means "gene_info.gz"( tab-delimited,one line per GeneID, Column header line is the first line in the file) is not the right format for Bio::ASN1::EntrezGene? zoujing wrote: > > I am a geen hand in Bioperl. 
When I run perl with > "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error > information: > Data Error: none conforming data found on line 1 in Sus_scrofa.ags. > > But the Sus_scrofa.ags is download from NCBI, with the format of ASN1, > should be the same as Homo_sapiens in the example. So it should be no > error as the code is the example from Mingyi. > I wonder why this happen, and should I change something about the file? > > -- View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16603225.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From stefan.kirov at bms.com Fri Apr 11 19:59:29 2008 From: stefan.kirov at bms.com (Stefan Kirov) Date: Fri, 11 Apr 2008 15:59:29 -0400 (Eastern Daylight Time) Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags" In-Reply-To: <16602770.post@talk.nabble.com> References: <16602770.post@talk.nabble.com> Message-ID: AGS is a binary ASN.1 format and WILL NOT be parsed! You have to use gene2xml( weird, but this is NCBI) with these flags: -c -x -b -i. This will spit out text ASN which can be parsed. Stefan On Wed, 9 Apr 2008, zoujing wrote: > > I am a geen hand in Bioperl. When I run perl with > "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error > information: > Data Error: none conforming data found on line 1 in Sus_scrofa.ags. > > But the Sus_scrofa.ags is download from NCBI, with the format of ASN1, > should be the same as Homo_sapiens in the example. So it should be no error > as the code is the example from Mingyi. > I wonder why this happen, and should I change something about the file? > > -- > View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16602770.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From stefan.kirov at bms.com Fri Apr 11 20:01:30 2008 From: stefan.kirov at bms.com (Stefan Kirov) Date: Fri, 11 Apr 2008 16:01:30 -0400 (Eastern Daylight Time) Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags" In-Reply-To: <16603225.post@talk.nabble.com> References: <16603225.post@talk.nabble.com> Message-ID: It is not. If you use this file, why would you need a parser for it anyway? Just split on \t or read with OpenOffice or equiv. Stefan On Thu, 10 Apr 2008, zoujing wrote: > > Seached the web and found the answer now, quote the answer as following: > The error was thrown by my Bio::ASN1::EntrezGene module because it > expects a text file, while you fed it with a binary file. To use > gzipped ASN binary file from NCBI, download the NCBI gene2xml > (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml), > then use this syntax to run my parser on the binary files: > > my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i > Homo_sapiens.ags.gz -c -x -b | "); # Homo_sapiens.ags.gz is the gzipped > binary file directly downloaded from NCBI > > Same syntax should be used when you're using SeqIO (thus SeqIO::entrezgene). > Mingyi > > But there still one thing, I want to parse "gene_info.gz" in Gene of > NCBI. It doesn't work.Is that means "gene_info.gz"( tab-delimited,one line > per GeneID, Column header line is the first line in the file > ) is not the right format for Bio::ASN1::EntrezGene? > > > > zoujing wrote: >> >> I am a geen hand in Bioperl. When I run perl with >> "parse_entrez_gene_example.pl Sus_scrofa.ags", it turned out the error >> information: >> Data Error: none conforming data found on line 1 in Sus_scrofa.ags. 
>> >> But the Sus_scrofa.ags is download from NCBI, with the format of ASN1, >> should be the same as Homo_sapiens in the example. So it should be no >> error as the code is the example from Mingyi. >> I wonder why this happen, and should I change something about the file? >> >> > > -- > View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16603225.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From asjo at koldfront.dk Fri Apr 11 19:39:59 2008 From: asjo at koldfront.dk (Adam =?iso-8859-1?Q?Sj=F8gren?=) Date: Fri, 11 Apr 2008 21:39:59 +0200 Subject: [Bioperl-l] Bio::Factory::SequenceFactoryI SYNOPSIS example In-Reply-To: <0037240B-F469-4388-972A-324101B11621@gmx.net> (Hilmar Lapp's message of "Fri, 11 Apr 2008 11:35:54 -0400") References: <87d4owixh8.fsf@topper.koldfront.dk> <0037240B-F469-4388-972A-324101B11621@gmx.net> Message-ID: <877if4i3jk.fsf@topper.koldfront.dk> On Fri, 11 Apr 2008 11:35:54 -0400, Hilmar wrote: > On Apr 11, 2008, at 4:53 AM, Adam Sj?gren wrote: >>> my $seqbuilder = Bio::Seq::SeqFactory->new('type' => >>> 'Bio::PrimarySeq'); > You need to prefix the argument with a dash: '-type', not 'type'. > Otherwise, it assumes that the class you want instantiated is > 'type.pm'. I guess that means I should submit a patch for the SYNOPSIS. Attached. 
Thanks,

  Adam

Index: Bio/Factory/SequenceFactoryI.pm
===================================================================
--- Bio/Factory/SequenceFactoryI.pm (revision 14654)
+++ Bio/Factory/SequenceFactoryI.pm (working copy)
@@ -20,7 +20,7 @@
     # get a Bio::Factory::SequenceFactoryI object like
 
     use Bio::Seq::SeqFactory;
-    my $seqbuilder = Bio::Seq::SeqFactory->new('type' => 'Bio::PrimarySeq');
+    my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');
 
     my $seq = $seqbuilder->create(-seq => 'ACTGAT',
                                   -display_id => 'exampleseq');

-- 
 "Well, I'm a moon around you"                          Adam Sjøgren
                                                   asjo at koldfront.dk

From bamboowarrior at gmail.com Fri Apr 11 23:10:35 2008
From: bamboowarrior at gmail.com (Arkady)
Date: Fri, 11 Apr 2008 18:10:35 -0500
Subject: [Bioperl-l] Nucleotide Links in Gene DB (GenBank)
Message-ID: <91656c3f0804111610r24c8fa5es5bcb56b7a59e0208@mail.gmail.com>

Hi everyone,

I'm a bioperl n00b. Actually, kind of a genbank n00b, too, as I'm from a CS
background and just started bio things last June.

I'm trying to set up an analysis pipeline of primate protein CDSs (the
nucleotide seqs). I've written a script which does a pretty decent job of
downloading these from GenBank--but it's inconsistent, because a lot of
sequences in nucleotide are 'predicted' and named LOCthisorthat instead of by
gene name.

So what I was thinking was this (assume ANKRD43 is the gene for this example):

1. Search 'gene' database for ANKRD43 AND (PRI*[ORGN])
   On NCBI, there's an option to show all nucleotide links. How do I get a
   list of those in bioperl? Can bioperl even search 'gene', or just
   'nucleotide'?
2. Search 'nucleotide' for the referenced items from #1, and also for
   ANKRD43[TITL] AND (PRI*[ORGN]), save CDSes.
3. BLAST mRNA for one of those CDSes, see if we pick up any other matches.
4. BLAT other primates for CDSes, see if we find anything not in GenBank.

On the other hand, I always get the feeling I'm doing things the hard way--especially here, with #1 and #2.
Is there a much more obvious, simple way to do this? Thanks, folks. Cheers, John Woods Institute for Cellular and Molecular Biology The University of Texas at Austin From hlapp at gmx.net Fri Apr 11 23:19:44 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 11 Apr 2008 19:19:44 -0400 Subject: [Bioperl-l] Bio::Factory::SequenceFactoryI SYNOPSIS example In-Reply-To: <877if4i3jk.fsf@topper.koldfront.dk> References: <87d4owixh8.fsf@topper.koldfront.dk> <0037240B-F469-4388-972A-324101B11621@gmx.net> <877if4i3jk.fsf@topper.koldfront.dk> Message-ID: Thanks, applied. -hilmar On Apr 11, 2008, at 3:39 PM, Adam Sj?gren wrote: > On Fri, 11 Apr 2008 11:35:54 -0400, Hilmar wrote: > >> On Apr 11, 2008, at 4:53 AM, Adam Sj?gren wrote: > >>>> my $seqbuilder = Bio::Seq::SeqFactory->new('type' => >>>> 'Bio::PrimarySeq'); > >> You need to prefix the argument with a dash: '-type', not 'type'. >> Otherwise, it assumes that the class you want instantiated is >> 'type.pm'. > > I guess that means I should submit a patch for the SYNOPSIS. Attached. 
> > > Thanks, > > Adam > > > Index: Bio/Factory/SequenceFactoryI.pm > =================================================================== > --- Bio/Factory/SequenceFactoryI.pm (revision 14654) > +++ Bio/Factory/SequenceFactoryI.pm (working copy) > @@ -20,7 +20,7 @@ > # get a Bio::Factory::SequenceFactoryI object like > > use Bio::Seq::SeqFactory; > - my $seqbuilder = Bio::Seq::SeqFactory->new('type' => > 'Bio::PrimarySeq'); > + my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => > 'Bio::PrimarySeq'); > > my $seq = $seqbuilder->create(-seq => 'ACTGAT', > -display_id => 'exampleseq'); > > -- > "Well, I'm a moon around you" Adam > Sj?gren > > asjo at koldfront.dk > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From mmokrejs at ribosome.natur.cuni.cz Sat Apr 12 01:32:14 2008 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Sat, 12 Apr 2008 03:32:14 +0200 Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon_id In-Reply-To: References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com> <32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net> <320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com> <93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com> Message-ID: <4800111E.3030802@ribosome.natur.cuni.cz> Chris Fields wrote: > The counter to that perspective (using new sequences with old tax info) > would be to regularly update NCBI taxonomy, particularly in > circumstances prior to adding new sequences. Hilmar mentioned that once > tax is loaded it doesn't take as long to update, so you could set up a > cron job to update regularly. 
> > I remember someone mentioning weekly or monthly updates on the list > quite a while ago, but I'm unsure how often NCBI updates tax information > (i.e. with every release, monthly, weekly, etc). I can see instances > popping up where you used the an up-to-date taxonomy but a new sequence > contains a tax ID not present. I think bioperl-db handles these but I'm > not sure what other Bio* do. > I spent some time benchmarking this and inspecting the mysql log files. The current load_ncbi_taxonomy.pl script with minor modification to show timestamps does this on initial import into mysql and then update of the database using exactly same dataset (but anyway it has to walk through all the data): $ ./load_ncbi_taxonomy.pl --dbname=biosqldb --driver=mysql --host=127.0.01 \ --port=3306 --directory=/home/mmokrejs/bioinformatics/databases/ncbitax/dump \ --chunksize=0 --verbose=2 --mycnf=~/.my.cnf Sat Apr 12 01:58:43 MEST 2008 Loading NCBI taxon database in /home/mmokrejs/bioinformatics/databases/ncbitax/dump: ... retrieving all taxon nodes in the database Sat Apr 12 01:58:43 MEST 2008 ... reading in taxon nodes from nodes.dmp Sat Apr 12 01:58:58 MEST 2008 ... insert / update / delete taxon nodes 10000/421098 done (in 5 secs, 2000.0 rows/s) 20000/421098 done (in 4 secs, 2500.0 rows/s) ... 420000/421098 done (in 4 secs, 2500.0 rows/s) Sat Apr 12 02:02:21 MEST 2008 ... (committing nodes) Sat Apr 12 02:02:21 MEST 2008 ... rebuilding nested set left/right values 10000 done (in 24 secs, 416.7 rows/s) 20000 done (in 26 secs, 384.6 rows/s) 30000 done (in 24 secs, 416.7 rows/s) ... 420004 done (in 23 secs, 434.8 rows/s) Sat Apr 12 02:19:25 MEST 2008 ... reading in taxon names from names.dmp Sat Apr 12 02:19:25 MEST 2008 ... deleting old taxon names Sat Apr 12 02:19:25 MEST 2008 ... inserting new taxon names 10000 done (in 8 secs, 1250.0 rows/s) 20000 done (in 8 secs, 1250.0 rows/s) ... 580000 done (in 5 secs, 2000.0 rows/s) Sat Apr 12 02:24:48 MEST 2008 ... 
cleaning up Sat Apr 12 02:24:49 MEST 2008 Done. $ I decided to re-import the same data to mimic at least somehow the future updates, although no record should be UPDATEd, except zapping left and right values with NULL. :(( $ ./load_ncbi_taxonomy.pl --dbname=biosqldb --driver=mysql --host=127.0.01 --port=3306 --directory=/home/mmokrejs/bioinformatics/databases/ncbitax/dump \ --chunksize=0 --verbose=2 --mycnf=~/.my.cnf Sat Apr 12 02:35:20 MEST 2008 Loading NCBI taxon database in /home/mmokrejs/bioinformatics/databases/ncbitax/dump: ... retrieving all taxon nodes in the database Sat Apr 12 02:35:26 MEST 2008 ... reading in taxon nodes from nodes.dmp Sat Apr 12 02:35:46 MEST 2008 ... insert / update / delete taxon nodes 10000/421098 done (in 0 secs, 10000.0 rows/s) 20000/421098 done (in 0 secs, 10000.0 rows/s) ... 410000/421098 done (in 0 secs, 10000.0 rows/s) 420000/421098 done (in 0 secs, 10000.0 rows/s) Sat Apr 12 02:35:55 MEST 2008 ... (committing nodes) Sat Apr 12 02:35:55 MEST 2008 ... rebuilding nested set left/right values 10000 done (in 9 secs, 1111.1 rows/s) 20000 done (in 9 secs, 1111.1 rows/s) ... 410004 done (in 8 secs, 1250.0 rows/s) 420004 done (in 9 secs, 1111.1 rows/s) Sat Apr 12 02:41:54 MEST 2008 ... reading in taxon names from names.dmp Sat Apr 12 02:41:54 MEST 2008 ... deleting old taxon names Sat Apr 12 02:41:55 MEST 2008 ... inserting new taxon names 10000 done (in 5 secs, 2000.0 rows/s) 20000 done (in 5 secs, 2000.0 rows/s) ... 570000 done (in 6 secs, 1666.7 rows/s) 580000 done (in 5 secs, 2000.0 rows/s) Sat Apr 12 02:47:27 MEST 2008 ... cleaning up Sat Apr 12 02:47:27 MEST 2008 Done. $ ls -la /var/log/mysql/mysql.log -rw-rw---- 1 mysql mysql 483443314 Apr 12 03:15 /var/log/mysql/mysql.log $ Pentium4 M laptop, 1.8GHz, 1 GB RAM, mysql-5.0.56 with enabled SQL text logging, the slow version of logging all SQL commands compared to binary logging. The log was cleared before the tests. 
I could provide some excerpts from the log, or upload it somewhere, if anybody
else would like to dig into the details.

I believe the recalculation step could be made faster. See what happens:

31 Query SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '1' ORDER BY ncbi_taxon_id
31 Query SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '10239' ORDER BY ncbi_taxon_id
31 Query SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '12333' ORDER BY ncbi_taxon_id
31 Query SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '12335' ORDER BY ncbi_taxon_id
31 Query UPDATE taxon SET left_value = NULL, right_value = NULL WHERE left_value = '4'
31 Query UPDATE taxon SET left_value = NULL, right_value = NULL WHERE right_value = '5'
31 Query UPDATE taxon SET left_value = '4', right_value = '5' WHERE taxon_id = '12335'
31 Query SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '12340' ORDER BY ncbi_taxon_id
31 Query UPDATE taxon SET left_value = NULL, right_value = NULL WHERE left_value = '6'
31 Query UPDATE taxon SET left_value = NULL, right_value = NULL WHERE right_value = '7'
31 Query UPDATE taxon SET left_value = '6', right_value = '7' WHERE taxon_id = '12340'

The left_value and right_value columns are NULL when the table is created, so
there is no need to write NULL into them again. Avoiding that would mean
writing a wrapper function that mimics update() but first does a
'SELECT * FROM', compares the stored values with those to be written, and
includes in the final UPDATE statement only those columns whose values have
changed. We use such a smart wrapper for our code in python. ;-)

When the left and right columns are to be set to NULL during an update of an
existing database, I think it would be much faster to drop the columns and
re-create them with NULL values.
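
A minimal sketch of the compare-before-UPDATE wrapper idea mentioned above,
in Python since that is the language the message names. It is an illustration
under stated assumptions, not the wrapper the author actually uses and not
code from load_ncbi_taxonomy.pl: the function name smart_update is
hypothetical, and an in-memory SQLite table stands in for the MySQL taxon
table so that the example runs on its own.

```python
import sqlite3


def smart_update(conn, table, key_col, key_val, new_values):
    """Write only the columns whose stored value differs from new_values.

    Returns the list of column names that were actually updated.
    Table and column names are interpolated into the SQL text, so they
    must come from trusted code, never from user input.
    """
    cols = list(new_values)
    row = conn.execute(
        f"SELECT {', '.join(cols)} FROM {table} WHERE {key_col} = ?",
        (key_val,),
    ).fetchone()
    if row is None:
        return []  # no such row; the caller decides whether to INSERT

    # Keep only the columns whose current value is different.
    changed = [c for c, old in zip(cols, row) if old != new_values[c]]
    if changed:
        assignments = ", ".join(f"{c} = ?" for c in changed)
        params = [new_values[c] for c in changed] + [key_val]
        conn.execute(
            f"UPDATE {table} SET {assignments} WHERE {key_col} = ?", params
        )
    return changed


# Stand-in for the taxon table (assumption: only the columns used here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon ("
    "taxon_id INTEGER PRIMARY KEY, left_value INTEGER, right_value INTEGER)"
)
conn.execute("INSERT INTO taxon VALUES (12335, NULL, NULL)")

wanted = {"left_value": 4, "right_value": 5}
first = smart_update(conn, "taxon", "taxon_id", 12335, wanted)
second = smart_update(conn, "taxon", "taxon_id", 12335, wanted)
print(first)   # ['left_value', 'right_value']
print(second)  # []
```

The second call issues no UPDATE at all, which is the saving being proposed
for a re-import of unchanged data. The cost is one extra SELECT per row, so
whether this wins in practice would have to be measured against the log above.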
I think the possibility of creating empty taxon and taxon_name tables as
MyISAM tables, and converting them into InnoDB tables only after all the
imports and updates, could be investigated further. One would probably have to
think a bit more about the foreign keys, but it might be that they would not
even be lost during the conversion back and forth. Actually, that is easy to
check: dump your current taxon and taxon_name tables (maybe even without the
row data, e.g. with mysqldump --no-data), run
'ALTER TABLE taxon ... type=MyISAM' followed by
'ALTER TABLE taxon ... type=InnoDB', dump the database structure again, and
compare it with the original using diff.

But, time for sleep here.
Martin

From sdavis2 at mail.nih.gov Sat Apr 12 03:50:44 2008
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Fri, 11 Apr 2008 23:50:44 -0400
Subject: [Bioperl-l] Bio::ASN1::EntrezGene parse so slowly?
In-Reply-To: <16602210.post@talk.nabble.com>
References: <16602210.post@talk.nabble.com>
Message-ID: <264855a00804112050gf785c2ei66d9c7463597eccd@mail.gmail.com>

gene_info is a tab-delimited text file, if I recall correctly. Have you
looked at it? If it is, you should be able to parse it in a few seconds with
just a couple of lines of code.

Sean

On Thu, Apr 10, 2008 at 1:08 AM, zoujing <1zoujing at 163.com> wrote:
>
> I want to parse a file "gene_info" from NCBI. The format of Gene in NCBI is
> ASN1, right? So I used Bio::ASN1::EntrezGene. But it didn't work
> properly/too slow. The file is about 500M.
> The code is following:
> use Bio::ASN1::EntrezGene;
> my $parser = Bio::ASN1::EntrezGene->new('file' => $ARGV[0]);
> my $i = 0;
> while(my $result = $parser->next_seq)
> { last; #something to do there, here use last for test}
>
> When it goes to the "while" part, it is processing on and on, it does not
> went out, even I used "last" in the "while" part.
> So I wonder whether it is too slow or the module is not fit for this job,
> or I did something wrong?
>
> Thank you!
> -- > View this message in context: http://www.nabble.com/Bio%3A%3AASN1%3A%3AEntrezGene-parse-so-slowly--tp16602210p16602210.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From david at burt7259.freeserve.co.uk Sat Apr 12 17:01:57 2008 From: david at burt7259.freeserve.co.uk (David Burt) Date: Sat, 12 Apr 2008 18:01:57 +0100 Subject: [Bioperl-l] bioperl-db Message-ID: Hi Hilmar, Hope you can help ? I am using bioperl-db to create a biosql database I have used scripts load_seqdatabase.pl and load_ontology.pl to install human swissprot entries, gene ontology, sequence ontology and now want to load interpro Here?s the command line I have tried perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser root --dbpass chicken --driver mysql \ --namespace "InterPro" --format InterPro interpro.xml But I get this message Can't call method "identifier" on an undefined value at /cygdrive/c/ Bioinformatics/Ensembl/src/bioperl-live/Bio/Ontology/ SimpleOntologyEngine.pm line 395 Any ideas? Dave PS: here?s the top of the interpro.xml file Kringle From hlapp at gmx.net Sat Apr 12 18:10:44 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 12 Apr 2008 14:10:44 -0400 Subject: [Bioperl-l] personal vs list email Message-ID: I'm not sure why but I have received several Bioperl or BioSQL- related email inquiries directed to me *personally* over the past few weeks. I have been responding as I get to them, but I feel that I am doing both the senders and this community a poor service, because sometimes someone else on the list could have responded much faster, and when I respond, others on the list who happen to be interested in the same question don't get to see the answer. 
So from now on, as a policy, I will redirect *every* email that is sent to me
personally and asks a question related to one of the projects to the
respective mailing list. If you don't want this, please conspicuously say so
at the top of your email, and in that case, if you do ask a project-related
question, be prepared to wait and to possibly need to follow up.

As an aside, it's a pretty safe assumption to make that all other core
developers, and quite possibly *all* developers, are following a similar
policy, whether expressly or not.

Isn't this somewhere in the FAQ too?

-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Sat Apr 12 18:16:13 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 12 Apr 2008 14:16:13 -0400
Subject: [Bioperl-l] bioperl-db
In-Reply-To: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC>
References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC>
Message-ID:

Hi Burt,

can you try format interprosax instead of interpro? That variant is also much
more graceful regarding required space.

-hilmar

On Apr 12, 2008, at 1:01 PM, David Burt wrote:
> Hi Hilmar,
>
> Hope you can help - I am using bioperl-db to create a biosql database
>
> I have used scripts load_seqdatabase.pl and load_ontology.pl to
> install human swissprot entries, gene ontology, sequence ontology
> and now want to load interpro
>
> Here's the command line I have tried
>
> perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser
> root --dbpass chicken --driver mysql \
> --namespace "InterPro" --format InterPro interpro.xml
>
> But I get this message
>
> Can't call method "identifier" on an undefined value at /cygdrive/
> c/Bioinformatics/Ensembl/src/bioperl-live/Bio/Ontology/
> SimpleOntologyEngine.pm line 395
>
> Any ideas?
> > Dave > > PS: here?s the top of the interpro.xml file > > > > > > > > > file_date="04-OCT-2006 00:00:00" /> > file_date="22-NOV-2006 00:00:00" /> > file_date="12-JUN-2007 00:00:00" /> > file_date="22-SEP-2005 00:00:00" /> > file_date="23-APR-2004 00:00:00" /> > file_date="14-NOV-2006 00:00:00" /> > file_date="27-JUL-2007 00:00:00" /> > file_date="28-SEP-2007 00:00:00" /> > file_date="11-SEP-2006 00:00:00" /> > file_date="30-NOV-2006 00:00:00" /> > entry_count="359942" file_date="18-MAR-2008 00:00:00" /> > file_date="18-MAR-2008 00:00:00" /> > file_date="19-MAR-2008 00:00:00" /> > file_date="27-MAR-2007 00:00:00" /> > file_date="12-JUL-2007 16:56:17" /> > > protein_count="352"> > Kringle > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Sat Apr 12 20:17:43 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 12 Apr 2008 15:17:43 -0500 Subject: [Bioperl-l] [BioSQL-l] personal vs list email In-Reply-To: References: Message-ID: On Apr 12, 2008, at 1:10 PM, Hilmar Lapp wrote: > I'm not sure why but I have received several Bioperl or BioSQL- > related email inquiries directed to me *personally* over the past > few weeks. > > I have been responding as I get to them, but I feel that I am doing > both the senders and this community a poor service, because > sometimes someone else on the list could have responded much faster, > and when I respond, others on the list who happen to be interested > in the same question don't get to see the answer. > > So from now on as a policy I will redirect *every* email sent to me > personally and that asks a question related to one of the projects > to the respective mailing list. 
If you don't want this, please > conspicuously say so at the top of your email, and in that case if > you do ask a project-related question be prepared to wait and to > possibly needing to follow up. > > As an aside, it's a pretty safe assumption to make that all other > core developers, and quite possibly *all* developers are following a > similar policy, whether expressly or not. I agree; I'm sure several other core devs feel the same way. I always try to forward these to the list if I feel it is more relevant there. > Isn't this somewhere in the FAQ too? > > -hilmar No, but I've added it to the bioperl FAQ; might be worth checking over and editing. chris From hlapp at gmx.net Sat Apr 12 22:40:53 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 12 Apr 2008 18:40:53 -0400 Subject: [Bioperl-l] bioperl-db In-Reply-To: <000001c89ce2$5400a710$0202a8c0@STUDYPC> References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC> <000001c89ce2$5400a710$0202a8c0@STUDYPC> Message-ID: <3F77F49A-9C9E-4450-AE28-46F00CADBC8B@gmx.net> Burt - please keep your replies on the list. Others may have input too, or benefit from the answer too. As there is no name() method call on line 914 in the current version let's check first that you run a current version of BioPerl. It will need to be at least 1.5.2. However, I do suspect a problem in either the InterPro file itself (wouldn't be the first time), or the InterPro parser. -hilmar On Apr 12, 2008, at 5:15 PM, David Burt wrote: > Hilmar > > Many thanks seems to be working > > But got this output ? any comments/ideas what it means ? > > Dave > > > perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser > root --dbpass chicken --driver mysql \ > > --namespace "InterPro" --format interprosax interpro.xml > ...deleting all relationships for InterPro > ...parsing and loading InterPro > Can't call method "name" on an undefined value at load_ontology.pl > line 914. 
> > HERE?S the name and definition in the ontology table > > Name = InterPro > > Definition = > > PANTHER version 6.1, 30128 entries, 04-OCT-2006 > PFAM version 21.0, 8957 entries, 22-NOV-2006 > PIRSF version 2.70, 2877 entries, 12-JUN-2007 > PRINTS version 38.0, 1900 entries, 22-SEP-2005 > PRODOM version 2005.1, 1522 entries, 23-APR-2004 > PROSITE version 20.0, 2006 entries, 14-NOV-2006 > SMART version 5.1, 724 entries, 27-JUL-2007 > TIGRFAMs version 7.0, 3423 entries, 28-SEP-2007 > GENE3D version 3.0.0, 2147 entries, 11-SEP-2006 > SSF version 1.69, 1538 entries, 30-NOV-2006 > SWISSPROT version 55.1, 359942 entries, 18-MAR-2008 > TREMBL version 38.1, 5443281 entries, 18-MAR-2008 > INTERPRO version 17.0, 16175 entries, 19-MAR-2008 > GO version N/A, 23937 entries, 27-MAR-2007 > MEROPS version 7.8, 2831 entries, 12-JUL-2007 | > > > > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: 12 April 2008 19:16 > To: David Burt > Cc: Bioperl BioPerl > Subject: Re: bioperl-db > > Hi Burt, > > can you try format interprosax instead of interpro? That variant is > also much more graceful regarding required space. > > -hilmar > > On Apr 12, 2008, at 1:01 PM, David Burt wrote: > > > Hi Hilmar, > > Hope you can help ? I am using bioperl-db to create a biosql database > > I have used scripts load_seqdatabase.pl and load_ontology.pl to > install human swissprot entries, gene ontology, sequence ontology > and now want to load interpro > > Here?s the command line I have tried > > perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser > root --dbpass chicken --driver mysql \ > --namespace "InterPro" --format InterPro interpro.xml > > But I get this message > > Can't call method "identifier" on an undefined value at /cygdrive/ > c/Bioinformatics/Ensembl/src/bioperl-live/Bio/Ontology/ > SimpleOntologyEngine.pm line 395 > > Any ideas? 
> > Dave > > PS: here?s the top of the interpro.xml file > > > > > > > > > file_date="04-OCT-2006 00:00:00" /> > file_date="22-NOV-2006 00:00:00" /> > file_date="12-JUN-2007 00:00:00" /> > file_date="22-SEP-2005 00:00:00" /> > file_date="23-APR-2004 00:00:00" /> > file_date="14-NOV-2006 00:00:00" /> > file_date="27-JUL-2007 00:00:00" /> > file_date="28-SEP-2007 00:00:00" /> > file_date="11-SEP-2006 00:00:00" /> > file_date="30-NOV-2006 00:00:00" /> > entry_count="359942" file_date="18-MAR-2008 00:00:00" /> > file_date="18-MAR-2008 00:00:00" /> > file_date="19-MAR-2008 00:00:00" /> > file_date="27-MAR-2007 00:00:00" /> > file_date="12-JUL-2007 16:56:17" /> > > protein_count="352"> > Kringle > > > > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Apr 12 22:43:25 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 12 Apr 2008 18:43:25 -0400 Subject: [Bioperl-l] bioperl-db In-Reply-To: <000001c89ce5$a5df2e50$0202a8c0@STUDYPC> References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC> <000001c89ce5$a5df2e50$0202a8c0@STUDYPC> Message-ID: I'm not sure what you mean by 'Check interpro.xml', but you can use the --safe command-line option to keep going if an individual term fails to load for whatever reason. Can you post the data for the seemingly offending record? (and please cc the list) -hilmar On Apr 12, 2008, at 5:39 PM, David Burt wrote: > Hi Hilmar > > Just checked mysql database and only have 39 entries under interpro > and loaded up to IPR000035 > > Check unterpro.xml looks OK from IPR000036 and onwards > > So seems to have crashed at IPR000035 ? 
>
> dave
>
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: 12 April 2008 19:16
> To: David Burt
> Cc: Bioperl BioPerl
> Subject: Re: bioperl-db
>
> Hi Burt,
>
> can you try format interprosax instead of interpro? That variant is
> also much more graceful regarding required space.
>
> -hilmar
>
> On Apr 12, 2008, at 1:01 PM, David Burt wrote:
>
> Hi Hilmar,
>
> Hope you can help - I am using bioperl-db to create a biosql database
>
> I have used scripts load_seqdatabase.pl and load_ontology.pl to
> install human swissprot entries, gene ontology, sequence ontology
> and now want to load interpro
>
> Here's the command line I have tried
>
> perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser
> root --dbpass chicken --driver mysql \
> --namespace "InterPro" --format InterPro interpro.xml
>
> But I get this message
>
> Can't call method "identifier" on an undefined value at /cygdrive/
> c/Bioinformatics/Ensembl/src/bioperl-live/Bio/Ontology/
> SimpleOntologyEngine.pm line 395
>
> Any ideas?
>
> Dave
>
> PS: here's the top of the interpro.xml file
>
> file_date="04-OCT-2006 00:00:00" />
> file_date="22-NOV-2006 00:00:00" />
> file_date="12-JUN-2007 00:00:00" />
> file_date="22-SEP-2005 00:00:00" />
> file_date="23-APR-2004 00:00:00" />
> file_date="14-NOV-2006 00:00:00" />
> file_date="27-JUL-2007 00:00:00" />
> file_date="28-SEP-2007 00:00:00" />
> file_date="11-SEP-2006 00:00:00" />
> file_date="30-NOV-2006 00:00:00" />
> entry_count="359942" file_date="18-MAR-2008 00:00:00" />
> file_date="18-MAR-2008 00:00:00" />
> file_date="19-MAR-2008 00:00:00" />
> file_date="27-MAR-2007 00:00:00" />
> file_date="12-JUL-2007 16:56:17" />
>
> protein_count="352">
> Kringle
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From Russell.Smithies at agresearch.co.nz Mon Apr 14 02:51:41 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 14 Apr 2008 14:51:41 +1200
Subject: [Bioperl-l] Tandem Repeats Finder?
In-Reply-To:
References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC><000001c89ce5$a5df2e50$0202a8c0@STUDYPC>
Message-ID:

Has anyone tried TRF?
I notice UCSC is using it for all their simple repeat annotations and thought it might be better than what we're currently using (Sputnik)

And is there a BioPerl parser for its output or am I going to have to write my own?

Thanx,

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz
=======================================================================
Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
=======================================================================

From Russell.Smithies at agresearch.co.nz Mon Apr 14 02:53:46 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 14 Apr 2008 14:53:46 +1200
Subject: [Bioperl-l] Tandem Repeats Finder?
In-Reply-To:
References:
Message-ID:

Scratch the need for a parser. I turned off html output and it's all nice white-space separated text :-)

Russell

> -----Original Message-----
> From: Smithies, Russell
> Sent: Monday, 14 April 2008 2:52 p.m.
> To: 'Bioperl BioPerl'
> Subject: Tandem Repeats Finder?
>
> Has anyone tried TRF?
> I notice UCSC is using it for all their simple repeat annotations and thought it might
> be better than what we're currently using (Sputnik)
>
> And is there a BioPerl parser for its output or am I going to have to write my own?
>
> Thanx,
>
> Russell Smithies
>
> Bioinformatics Applications Developer
> T +64 3 489 9085
> E russell.smithies at agresearch.co.nz
>
> Invermay Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T +64 3 489 3809
> F +64 3 489 9174
> www.agresearch.co.nz
>

From csaba.ortutay at gmail.com Mon Apr 14 04:15:22 2008
From: csaba.ortutay at gmail.com (Ortutay Csaba Péter)
Date: Mon, 14 Apr 2008 07:15:22 +0300
Subject: [Bioperl-l] Tandem Repeats Finder?
In-Reply-To:
References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC>
Message-ID: <200804140715.22702.csaba.ortutay@gmail.com>

Hello,

I have used TRF in my earlier projects. It is a nice and quick tool. There were no ready-made parsers at that time (5-6 years ago), so we wrote our own.

Csaba

> Has anyone tried TRF?
> I notice UCSC is using it for all their simple repeat annotations and
> thought it might be better than what we're currently using (Sputnik)
>
> And is there a BioPerl parser for its output or am I going to have to
> write my own?
>
> Thanx,

--
Csaba Ortutay PhD
IMT Bioinformatics
University of Tampere
Finland

From avilella at gmail.com Mon Apr 14 11:13:26 2008
From: avilella at gmail.com (Albert Vilella)
Date: Mon, 14 Apr 2008 12:13:26 +0100
Subject: [Bioperl-l] how can I print a Bio::Tree newick sortby given list?
Message-ID: <358f4d650804140413x4271f18bx40af1b9054306df8@mail.gmail.com>

Hi,

I have a newick file that I want to sort by a given order and print again as newick.
For example, if I have

(((ENSPTRG00000013811:0.0011,ENSG00000142192:0.0021):0.0033,ENSPPYG00000003902:0.0326):0.0000,ENSMMUG00000014384:0.0366):0.3638;

I want to sort it by "ENSG:ENSPTRG:ENSPPYG:ENSMMUG".

Any suggestions on how to do this in bioperl?

Cheers,

Albert.

From lamq at usal.es Mon Apr 14 15:01:51 2008
From: lamq at usal.es (Luis A. M. Quintales)
Date: Mon, 14 Apr 2008 17:01:51 +0200
Subject: [Bioperl-l] xyplot glyph: scale problems
Message-ID: <480371DF.7040900@usal.es>

I have a problem with the xyplot scale numbers calculated by the glyph. The shape of the graph looks fine, but the scale number 10 and its position in the output are not correct. I am sending the source code, a simplified input file and the PNG output.

Thank you

Source code ex1.pl (also in http://avellano.usal.es/~luis/bioperl-l/ex1.pl)
============================
#!/usr/bin/perl
use Bio::DB::GFF;
use Bio::Graphics::Panel;
use strict;

my $filin = $ARGV[0];

# In-memory GFF database; 'atpc:atfreq' features are aggregated as 'at'
my $db = Bio::DB::GFF->new(
    -dsn        => $filin,
    -adaptor    => 'memory',
    -aggregator => 'at{atpc:atfreq}',
);
my $segment  = $db->segment('chr1');
my @features = $segment->features('at');

my $panel = Bio::Graphics::Panel->new(
    -offset    => 0,
    -grid      => 100,
    -length    => 500,
    -width     => 800,
    -pad_left  => 50,
    -pad_right => 50,
);
$panel->add_track($segment,
    -glyph   => 'generic',
    -bgcolor => 'blue',
    -label   => 1,
);
$panel->add_track(\@features,
    -glyph      => 'xyplot',
    -graph_type => 'boxes',
    -scale      => 'left',
    -height     => 200,
);

open(FI, "> sal.png") or die "Cannot open sal.png: $!";
binmode FI;
print FI $panel->png;   # assumed final step; the posted listing ends at open()
close FI;
============================

in1.gff file (also in http://avellano.usal.es/~luis/bioperl-l/in1.gff)
============================
##sequence-region chr1 1 5578650
chr1 atfreq atpc 1 10 64.0000 . . atpc 1
chr1 atfreq atpc 11 20 63.0000 . . atpc 1
chr1 atfreq atpc 21 30 62.0000 . . atpc 1
chr1 atfreq atpc 31 40 59.0000 . . atpc 1
chr1 atfreq atpc 41 50 59.0000 . . atpc 1
chr1 atfreq atpc 51 60 59.0000 . . atpc 1
chr1 atfreq atpc 61 70 59.0000 . . atpc 1
chr1 atfreq atpc 71 80 59.0000 . . atpc 1
chr1 atfreq atpc 81 90 61.0000 . . atpc 1
chr1 atfreq atpc 91 100 60.0000 . . atpc 1
chr1 atfreq atpc 101 110 60.0000 . . atpc 1
chr1 atfreq atpc 111 120 64.0000 . . atpc 1
chr1 atfreq atpc 121 130 64.0000 . . atpc 1
chr1 atfreq atpc 131 140 60.0000 . . atpc 1
chr1 atfreq atpc 141 150 60.0000 . . atpc 1
chr1 atfreq atpc 151 160 63.0000 . . atpc 1
chr1 atfreq atpc 161 170 62.0000 . . atpc 1
chr1 atfreq atpc 171 180 59.0000 . . atpc 1
chr1 atfreq atpc 181 190 54.0000 . . atpc 1
chr1 atfreq atpc 191 200 53.0000 . . atpc 1
chr1 atfreq atpc 201 210 54.0000 . . atpc 1
chr1 atfreq atpc 211 220 50.0000 . . atpc 1
chr1 atfreq atpc 221 230 51.0000 . . atpc 1
chr1 atfreq atpc 231 240 56.0000 . . atpc 1
chr1 atfreq atpc 241 250 58.0000 . . atpc 1
chr1 atfreq atpc 251 260 55.0000 . . atpc 1
chr1 atfreq atpc 261 270 54.0000 . . atpc 1
chr1 atfreq atpc 271 280 56.0000 . . atpc 1
chr1 atfreq atpc 281 290 59.0000 . . atpc 1
chr1 atfreq atpc 291 300 58.0000 . . atpc 1
chr1 atfreq atpc 301 310 60.0000 . . atpc 1
chr1 atfreq atpc 311 320 59.0000 . . atpc 1
chr1 atfreq atpc 321 330 59.0000 . . atpc 1
chr1 atfreq atpc 331 340 57.0000 . . atpc 1
chr1 atfreq atpc 341 350 56.0000 . . atpc 1
chr1 atfreq atpc 351 360 57.0000 . . atpc 1
chr1 atfreq atpc 361 370 57.0000 . . atpc 1
chr1 atfreq atpc 371 380 58.0000 . . atpc 1
chr1 atfreq atpc 381 390 56.0000 . . atpc 1
chr1 atfreq atpc 391 400 58.0000 . . atpc 1
chr1 atfreq atpc 401 410 56.0000 . . atpc 1
chr1 atfreq atpc 411 420 59.0000 . . atpc 1
chr1 atfreq atpc 421 430 58.0000 . . atpc 1
chr1 atfreq atpc 431 440 59.0000 . . atpc 1
chr1 atfreq atpc 441 450 58.0000 . . atpc 1
chr1 atfreq atpc 451 460 58.0000 . . atpc 1
chr1 atfreq atpc 461 470 56.0000 . . atpc 1
chr1 atfreq atpc 471 480 57.0000 . . atpc 1
chr1 atfreq atpc 481 490 59.0000 . . atpc 1
============================

The sal.png : http://avellano.usal.es/~luis/bioperl-l/sal.png

Thank you.
--
==================================================
Luis Antonio Miguel Quintales
Departamento de Informática y Automática
Facultad de Ciencias
Universidad de Salamanca
Plaza de la Merced s/n
37008-SALAMANCA
SPAIN
==================================================
Tel.: +34-923-294400(ext.1513)
Fax.: +34-923-294584
E-mail: lamq at usal.es
==================================================

From aaron.j.mackey at gsk.com Mon Apr 14 13:00:52 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Mon, 14 Apr 2008 09:00:52 -0400
Subject: [Bioperl-l] personal vs list email
In-Reply-To:
Message-ID:

I try to take it even one step further: I require the person to re-ask their question on the mailing list (and then try to answer it there). This has the added benefit of causing the person to pause a moment to reflect on their question, and (sometimes) to spend a bit more time preparing the question for broader public consumption.

-Aaron

From sutripa at vbi.vt.edu Mon Apr 14 16:54:47 2008
From: sutripa at vbi.vt.edu (Sucheta Tripathy)
Date: Mon, 14 Apr 2008 12:54:47 -0400 (EDT)
Subject: [Bioperl-l] Error installing XML::Parser
Message-ID: <1285.99.152.150.87.1208192087.squirrel@webmail.vbi.vt.edu>

Hello List,

I have recently installed bioperl using the following command. The installation was successful. Now I am trying to install XML::Parser but it returns with error messages. Any clue what I may be doing wrong?

Thanks
Sucheta

Following is the last part of the error message:

### Error Message #######
Expat.c: In function 'XS_XML__Parser__Expat_SkipUntil':
Expat.c:2664: error: 'XML_Parser' undeclared (first use in this function)
Expat.c:2664: error: expected ';' before 'parser'
Expat.c:2665: warning: ISO C90 forbids mixed declarations and code
Expat.xs:2179: error: 'parser' undeclared (first use in this function)
Expat.xs:2179: warning: cast to pointer from integer of different size
Expat.xs:2180: error: 'CallbackVector' has no member named 'st_serial'
Expat.xs:2182: error: 'CallbackVector' has no member named 'skip_until'
Expat.c: In function 'XS_XML__Parser__Expat_Do_External_Parse':
Expat.c:2687: error: 'XML_Parser' undeclared (first use in this function)
Expat.c:2687: error: expected ';' before 'parser'
Expat.c:2688: warning: ISO C90 forbids mixed declarations and code
Expat.xs:2194: error: 'parser' undeclared (first use in this function)
Expat.xs:2194: warning: cast to pointer from integer of different size
Expat.xs:2205: warning: unused variable 'pret'
Expat.xs:2194: warning: unused variable 'cbv'
Expat.xs:2192: warning: unused variable 'type'
make[1]: *** [Expat.o] Error 1
make[1]: Leaving directory `/root/.cpan/build/XML-Parser-2.36/Expat'
make: *** [subdirs] Error 2
  /usr/bin/make -- NOT OK
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible
#####

--
Sucheta Tripathy, Ph.D.
Virginia Bioinformatics Institute Phase-I
Washington street.
Virginia Tech.
Blacksburg,VA 24061-0447
phone:(540)231-8138
Fax: (540) 231-2606

From mmokrejs at ribosome.natur.cuni.cz Tue Apr 15 10:45:48 2008
From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ)
Date: Tue, 15 Apr 2008 12:45:48 +0200
Subject: [Bioperl-l] GenBank entries creation dates
In-Reply-To:
References: <264855a00804020433i2a260561x883189cc4fa0c58f@mail.gmail.com> <47F9F3AA.2090003@uv.es> <200804071448.34769.heikki@sanbi.ac.za> <2BA9950D-F106-4420-B128-A2AE2F46A020@uiuc.edu> <47FA4AD2.5030206@uv.es>
Message-ID: <4804875C.80506@ribosome.natur.cuni.cz>

Chris Fields wrote:
> Note in the example I gave that, during the revision history, the
> DBSOURCE changed at the point of the creation date (the original nuc.
> record was a M. tuberculosis contig sequence, which later changed to
> an updated full M. tuberculosis genome record at the time of the
> 'create date').
>
> Couldn't find anything specific in the GenBank docs on this, but it
> appears (at least for a protein record) the creation date reflects
> the date in which the sequence was either originally deposited or
> originally derived from the nucleotide source record present in the
> record. In other words, it may not reflect the original date of
> deposition (which could have come from a different record, as in this
> case).
>
> chris

Hi,

I have a few answers from NCBI staff to similar questions I asked in the past about DATE issues and about VERSION numbers not being increased upon "changes" in a record. Below I have tried to put my former correspondence into a more readable form. I hope this helps everybody understand what happens in the black box.
;)

Martin

Date: Thu, 17 Jan 2002 15:40:07 -0500 (EST)
From: David Wheeler
Subject: Brucella_melitensis on ftp site

> Hi, I'd like to point you to the fact, that the descriptions of
> Brucella_melitensis differ in
> ftp.ncbi.nih.nlm.gov/genomes/Bacteria/Brucella_melitensis and
> ftp.ncbi.nih.nlm.gov/genbank/genomes/Bacteria/Brucella_melitensis
>
> Namely, the description of the strain is retained in *.gbk files
> under /genomes/Bacteria/Brucella_melitensis only under the strain
> description field, but not in the DEFINITION line, where it is
> present in *.gbk files under
> /genbank/genomes/Bacteria/Brucella_melitensis.
>
> LOCUS       NC_003318 1177787 bp DNA circular BCT 13-NOV-2001
> DEFINITION  Brucella melitensis chromosome II, complete sequence.
> ACCESSION   NC_003318
> VERSION     NC_003318.1 GI:17988344
>
> compared to
>
> LOCUS       AE008918 1177787 bp DNA circular BCT 27-DEC-2001
> DEFINITION  Brucella melitensis strain 16M chromosome II, complete sequence.
> ACCESSION   AE008918
> VERSION     AE008918
>
> This makes me worried about the data. Why is the release date of
> NON-curated files (AE008918) newer than the release date of CURATED
> data (NC_003318)? Is it expected case? Could someone explain me the
> difference between them (i.e. CURATED vs. NONCURATED)?

The curated record is initially a copy of the non-curated record with certain changes in documentation made in order to comply with the NCBI standard for reference genomes. One change which you have noticed is the difference in Definition line format. Curated genomic records are created in order to standardize annotation for genomes in the Entrez Genomes database while leaving editorial control for the parent GenBank records in the hands of the original submitters. Regardless of the date you see on the record, the curated version is derived from the non-curated one. In this case, it appears that the processing of the non-curated version lagged a little bit relative to that of the curated version. Normally, however, the non-curated version will have the earlier date.

Date: Sun, 27 Jan 2002 00:16:55 -0500 (EST)
From: David Wheeler
Subject: Re: CONSULT: Brucella_melitensis on ftp site

> Are the raw sequence data always same in non-curated and curated
> flatfiles?
>
> Is the annotation of orf's/proteins different between them?
>
> Are there any new or withdrawn orf's or proteins in the curated
> flatfiles compared to non-curated ones?
>
> My feeling is that no-one except original submitters can modify
> submitted data, so you cannot modify non-curated files, i.e. cannot
> modify them and increase the version number.
>
> Because of that, you've introduced curated versions, which are just
> copies of original but public data so you are free to modify it. So
> once again, are the differences between non-curated and curated
> flatfiles only in structure of the file? I don't think so. Examples
> would be Listeria genomes or the 2 Agrobacterium's, if I remember
> right.

Initially, there should be no or very few differences, however, as time goes by, differences in the annotation will materialize. There may also be differences in the sequence, if errors in the original sequence come to light, but these differences should be very rare. So, practically speaking, you will probably find few differences but, since the purpose of the Refseq is to curate, there may well be some differences.

Date: Mon, 17 Dec 2001 11:57:06 -0500 (EST)
From: Dawn Lipshultz
Subject: Re: Buggy date in Staphylococcus aureus N315

>>>> Hi, I've found there has been released Staphylococcus aureus
>>>> N315 on 01-JAN-1900, which is nonsense. I guess you had y2K bug.
>>>>
>>>>
>>>> Please see
>>>>
>> ftp://ncbi.nlm.nih.gov/genbank/genomes/Bacteria/Staphylococcus_aureus_N315/BA000018.gbk
>>
>>>>
>>>> Can you please tell me the real release date?
>>>>
>>>> Also, is newer the NC_xxxx for Staphylococcus aureus N315 under
>>>>
>>>> ftp://ncbi.nlm.nih.gov/genomes/Bacteria/Staphylococcus_aureus_N315/
>>>> or this BA000018 non-cured version?
>>>>
>>>>
>>>> LOCUS       BA000018 2814816 bp DNA circular BCT 01-JAN-1900
>>>> DEFINITION  Staphylococcus aureus strain N315, complete genome.
>>
>> I cannot find the record to which you refer in your message. When I
>> did a search for accession number BA000018, I received results for
>> accession numbers AP003129-AP003138. They are all dated June 2001.
>>
>>
>> The date for the record in the ftp file is April 2001. The record
>> in GenBank (NC_002745) is dated October 2001. This version is
>> apparently more updated than the one on the ftp site. Therefore,
>> you may want to download the sequence from GenBank rather than the
>> ftp site.
>>
>> Regards, Dawn S. Lipshultz
>
> Hmm, but I do get:
> http://www.ncbi.nlm.nih.gov:80/cgi-bin/Entrez/framik?db=genome&gi=179
>
>
> look at the "GenBank: NC_002745" text in left upper part of the
> window, it points to that OLD ftp file. The "RefSeq: NC_002745"
> points to the April 2001 version. So what is the right way to get the
> October 2001 release?
>
> Where can I find the difference between NC_002745 from April compared
> to NC_002745 from October?
>
> What do you mean with "you may want to download the sequence from
> GenBank rather than the ftp site."?
>
> BOTH ftp directories at ftp://ncbi.nlm.nih.gov are outdated. I mean
> the genomes/Bacteria/Staphylococcus_aureus_N315/NC_002745.* version
> and also the
> genbank/genomes/Bacteria/Staphylococcus_aureus_N315/BA000018.*
> version.
>
> The web links from www.ncbi.nlm.nih.gov:80/cgi-bin/Entrez/ point
> anyway to the ftp site. Do you want to say that the ftp version
> aren't updated anymore?

The genome was originally released into the database on 4/20/2001 as 10 pieces with secondary accession number BA000018. You can find these pieces in Entrez nucleotides by querying with BA000018. The Genomes group here will fix the date on the record that is available from Entrez genomes.

Regards, Dawn

Date: Fri, 16 Nov 2001 16:09:59 -0500 (EST)
From: Susan Dombrowski
Subject: Re: Agrobacterium tumefaciens C58

> Dear colleague, I've noticed that there're somehow updated on Oct 17
> the genomic flatfiles of Agrobacterium tumefaciens C58 at
> ftp://ncbi.nlm.nih.gov/genbank/genomes/Bacteria/Agrobacterium_tumefaciens/.
> However, for example the AE007869.gbs does NOT self-explain what has
> been changed and also the VERSION number is not increased. Would you
> please explain what's the change, when can I find such information
> next time on web?
>
> I've used the published sequence from your ftp site on 2001-08-29
> with same ID and would like to know, what differs.
>
> LOCUS       AE007869 2841581 bp DNA circular CON 17-OCT-2001
> DEFINITION  Agrobacterium tumefaciens strain C58 circular chromosome, complete sequence.
> ACCESSION   AE007869
> VERSION     AE007869

Dear Colleague,

The version number of a sequence will *only* change if the content of the actual sequence has changed in any way since it was first made available. Although the date has changed, this date refers to the last time the actual record was manipulated by an NCBI staff member. Even if there is something simple, like adding a reference, changing a spelling mistake, etc., this will cause a change in the date field of the record. Thus, since the version has not changed, there are no differences to report.

Best Regards, Susan

Date: Wed, 26 Jun 2002 11:04:48 -0400 (EDT)
From: Eric Sayers
Subject: Re: Mesorhizobium_loti flatfiles

>>>>> Hi,
>>>>> I've found that you again silently changed flatfiles lying on your ftp
>>>>> some time ago without changing the revision number. Please apologize me,
>>>>> but this really causes troubles to other people working in this so called
>>>>> bioinformatics. :(
>>>>>
>>>>> A week ago there was:
>>>>>
>>>>> LOCUS       NC_002678 7036074 bp DNA circular BCT 10-SEP-2001
>>>>> DEFINITION  Mesorhizobium loti, complete genome.
>>>>> ACCESSION   NC_002678
>>>>> VERSION     NC_002678.1 GI:13470324
>>>>>
>>>>>
>>>>> and two other plasmid sequences. This yields 7275 proteins.
>>>>>
>>>>> But, last autumn there was:
>>>>>
>>>>> LOCUS       NC_002678 7036074 bp DNA circular BCT 28-MAR-2001
>>>>> DEFINITION  Mesorhizobium loti, complete genome.
>>>>> ACCESSION   NC_002678
>>>>> VERSION     NC_002678.1 GI:13470324
>>>>>
>>>>>
>>>>> That version had 7281 proteins in total.
>>>>> I have simple questions: "Why was NOT changed the VERSION number?".
>>>>>
>>>>> Do I understand it wrong, that it should get updated whenever a single
>>>>> character in the file contents is changed?
>>>
>>>> The version number of a sequence only changes if the sequence itself is
>>>> modified. If anything else in the flat file is changed (ie spelling, authors,
>>>> annotations, etc) the version will not change. However, the modification date in
>>>
>>> Sorry, do you under annotation also mean number of predicted genes, their
>>> coordinates(position) etc?
>>>
>>>> the top line of the flat file will change for any of these modifications. (Note
>>>> that the dates are different in the file you display: Mar 28, 2001 vs Sept 10,
>>>> 2001.) I would track the modification date rather than or as well as the version
>>>> number to catch all changes in the files.
>>>> Regards,
>>>> Eric W. Sayers, Ph.D.
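The advice quoted above (watch the LOCUS date as well as the VERSION) could be sketched roughly as follows. This is only a minimal illustration, not anything NCBI provides: the function names are made up for the example, and it assumes that the modification date is the last field of the LOCUS line and that the second field of the VERSION line identifies the sequence version, as in the records quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this sketch).

locus_date() {
    # Last field of the first LOCUS line, e.g. 10-SEP-2001
    awk '/^LOCUS/ { print $NF; exit }' "$1"
}

seq_version() {
    # Second field of the first VERSION line, e.g. NC_002678.1
    awk '/^VERSION/ { print $2; exit }' "$1"
}

compare_flatfiles() {
    # $1 = older snapshot of a flat file, $2 = newer snapshot
    if [ "$(seq_version "$1")" != "$(seq_version "$2")" ]; then
        echo "version changed"
    elif [ "$(locus_date "$1")" != "$(locus_date "$2")" ]; then
        echo "date changed, version unchanged"
    else
        echo "unchanged"
    fi
}
```

Called as `compare_flatfiles old/NC_002678.gbk new/NC_002678.gbk` (paths are placeholders), a result of "date changed, version unchanged" would be the cue to run `diff` on the two snapshots to see which annotations differ.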
>>>
>>> OK, but unless some of our programs have been buggy before or now (in
>>> either of those cases have failed to extract genes from flatfiles), I do
>>> not have an explanation for the differences in amount of
>>> predicted/annotated genes.
>>>
>>> I do not have anymore available the old flatfiles from Mar 28, but it
>>> seems to me that these were newly introduced in the Sept. 10 version:
>>> gi_15600768, gi_15600770, gi_15600769, gi_15600766, gi_15600767
>>
>> Dear Colleague,
>> Again, the only reason the version number will change is if the sequence itself
>> changes. The number of annotated/predicted genes is merely an annotation on the
>> sequence, and does not change the sequence itself. Therefore, the version will
>> not change when the number of annotations changes. The modification date on the
>> flat file will (and did) change, of course.
>>
>> Regards,
>> Eric W. Sayers, Ph.D.
>
> Finally I've heard that from someone, thanks!
> Now just tell me, how can I figure out what changed between those
> different "date" releases? Is there a changelog available?
> I consider annotations changes very important.

We do not provide the details of flat file changes on our public websites, except for changes in the version number (ie actual sequence changes). In that particular case, all of the previous versions are linked to the current one. My advice to you if you want to chronicle non-sequence changes would be to check the flat files of interest periodically (by a script, for example) and look for changes in the modification dates. You could then simply compare the before and after flat files.

Regards,
Eric W. Sayers, Ph.D.

> Hi, Miguel:
>
> id1_fetch can do it. Detailed instruction can be found at:
>
> http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=toolkit.section.ch_demo.id1_fetch.html
>
> Here is an example:
>
>> >id1_fetch -lt revisions -flat '12:74311105' -fmt fasta
> GI        Loaded      DB    Retrieval No.
> --        ------      --    -------------
> 74311105  12/07/2007  NCBI  19766263
> 74311105  01/23/2007  NCBI  16325656
> 74311105  03/30/2006  NCBI  13131204
> 74311105  03/03/2006  NCBI  12915541
> 74311105  03/02/2006  NCBI  12885275
> 74311105  12/03/2005  NCBI  12259793
> 74311105  09/09/2005  NCBI  11257262
> 74311105  09/09/2005  NCBI  11242667
>
> Wenwu Cui PhD

From david at burt7259.freeserve.co.uk Sun Apr 13 14:32:31 2008
From: david at burt7259.freeserve.co.uk (David Burt)
Date: Sun, 13 Apr 2008 15:32:31 +0100
Subject: [Bioperl-l] bioperl-db
In-Reply-To: <3F77F49A-9C9E-4450-AE28-46F00CADBC8B@gmx.net>
References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC> <000001c89ce2$5400a710$0202a8c0@STUDYPC> <3F77F49A-9C9E-4450-AE28-46F00CADBC8B@gmx.net>
Message-ID: <000001c89d73$3b49eec0$0202a8c0@STUDYPC>

Hi Hilmar

Many thanks for info - tried a few things

1. First tried --safe flag

perl load_ontology.pl --host 127.0.0.1 --dbname bioseqdb --dbuser root --dbpass chicken --driver mysql --safe \
--namespace "InterPro" --format interprosax interpro.xml

Still got same output as before

...deleting all relationships for InterPro
...parsing and loading InterPro
Can't call method "name" on an undefined value at load_ontology.pl line 914

Only 35 interpro entries entered into database

2. I am using bioperl 1.5.2

3. I downloaded Release 17.0, 20 March 2008 of the interpro.xml file from ftp://ftp.ebi.ac.uk/pub/databases/interpro/

I did not send this file, since it was ~10Mb gzipped

Dave

From david at burt7259.freeserve.co.uk Sun Apr 13 14:53:43 2008
From: david at burt7259.freeserve.co.uk (David Burt)
Date: Sun, 13 Apr 2008 15:53:43 +0100
Subject: [Bioperl-l] bioperl-db
In-Reply-To:
References: <000001c89cbe$f2b92b80$0202a8c0@STUDYPC> <000001c89ce5$a5df2e50$0202a8c0@STUDYPC>
Message-ID: <000001c89d76$319be060$0202a8c0@STUDYPC>

Hilmar

Also updated copy of bioperl - see output below

root at STUDY_PC /cygdrive/c/Bioinformatics/Ensembl/src
$ perl -MBio::Perl -le 'print Bio::Perl->VERSION;'
1.005002101

root at STUDY_PC /cygdrive/c/Bioinformatics/Ensembl/src
$ cvs -d :pserver:cvs at cvs.bioperl.org:/home/repository/bioperl login
Logging in to :pserver:cvs at cvs.bioperl.org:2401/home/repository/bioperl
CVS password:

root at STUDY_PC /cygdrive/c/Bioinformatics/Ensembl/src
$ cd bioperl-live

root at STUDY_PC /cygdrive/c/Bioinformatics/Ensembl/src/bioperl-live
$ cvs -q update -d -P -r bioperl-release-1-5-2
P Build.PL
P ModuleBuildBioperl.pm
P Bio/Root/Version.pm
cvs update: warning: t/data/taxdump/names.dmp was lost
U t/data/taxdump/names.dmp
cvs update: warning: t/data/taxdump/nodes.dmp was lost
U t/data/taxdump/nodes.dmp

root at STUDY_PC /cygdrive/c/Bioinformatics/Ensembl/src/bioperl-live
$ perl -MBio::Perl -le 'print Bio::Perl->VERSION;'
1.0050021

Why is the VERSION 1.0050021 rather than 1.5.2 ?

Dave

From heikki at sanbi.ac.za Wed Apr 16 11:36:16 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 16 Apr 2008 13:36:16 +0200
Subject: [Bioperl-l] bioperl-microarray: status?
In-Reply-To: References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> Message-ID: <200804161336.16879.heikki@sanbi.ac.za> FYI, Christoper Jones has just published [http://bioinformatics.oxfordjournals.org/cgi/content/short/24/8/1102 an article in Bioinformatics] about his [http://search.cpan.org/perldoc?Microarray Microarray perl module] in CPAN. (The text added into BioPerl wiki.) -Heikki On Friday 26 January 2007 16:05:01 Chris Fields wrote: > Don't know if it's worth it, but could the microarray package be > modified so that it deals with data generated from or interacts > directly with Bioconductor (i.e. maybe including some specialized > bioperl-run set of classes to run Bioconductor tasks, return > lightweight bioperl microarray classes)? Allen pointed out in a > previous post that Bioconductor is the best pick for certain tasks, > while Perl excels at others: > > http://article.gmane.org/gmane.comp.lang.perl.bio.general/13993 > > Might be nice if we could merge both strengths together in some way. > > chris > > On Jan 26, 2007, at 7:26 AM, Jay Hannah wrote: > > On Jan 25, 2007, at 2:30 AM, Allen Day wrote: > >> Eh, there is some discussion activity on the list, but not much. You > >> are really better off moving to Bioconductor. > > > > Ok, thanks. I added that to the wiki page: > > > > http://www.bioperl.org/wiki/Microarray_package > > > > j > > seqlab.net > > http://www.bioperl.org/wiki/User:Jhannah > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________
From pan.mueller at yahoo.de Wed Apr 16 12:34:51 2008 From: pan.mueller at yahoo.de (=?iso-8859-1?Q?Peter_M=FCller?=) Date: Wed, 16 Apr 2008 12:34:51 +0000 (GMT) Subject: [Bioperl-l] load_seqdatabase.pl --pipeline Message-ID: <297809.47580.qm@web28203.mail.ukl.yahoo.com> Dear list, a want to add gene symbols to unigene-cluster which were in a biosql database and lacks this information. So one way is to make a post-update script: my $adp = $db->get_object_adaptor('Bio::ClusterI'); my $pseq = $adp->find_by_primary_key(n); $adp->remove($pseq); $pseq->gene('symbol'); $adp->store($pseq); $adp->commit(); O.k., this works (I ask me why to remove the cluster first - bug or feature...?)
Second way - perhaps: Using the --pipeline option, but it looks like useable only for seq-objects (Bio::Factory::SeqProcessoI) right? regards pan Machen Sie Yahoo! zu Ihrer Startseite. Los geht's: http://de.yahoo.com/set From cjfields at uiuc.edu Wed Apr 16 15:00:51 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 16 Apr 2008 10:00:51 -0500 Subject: [Bioperl-l] bioperl-microarray: status? In-Reply-To: <200804161336.16879.heikki@sanbi.ac.za> References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <200804161336.16879.heikki@sanbi.ac.za> Message-ID: <479BD5A4-9C9A-4733-889D-65942F24A7F3@uiuc.edu> That would be worth looking into at some point, if anyone's interested (though it may be best to build a 'bridging' module). Wonder if it uses BioConductor and, if not, how performance is vs BioConductor? chris On Apr 16, 2008, at 6:36 AM, Heikki Lehvaslaiho wrote: > FYI, > > Christoper Jones has just published > [http://bioinformatics.oxfordjournals.org/cgi/content/short/ > 24/8/1102 an > article in Bioinformatics] about his > [http://search.cpan.org/perldoc?Microarray Microarray perl module] > in CPAN. > > (The text added into BioPerl wiki.) > > -Heikki > > > On Friday 26 January 2007 16:05:01 Chris Fields wrote: >> Don't know if it's worth it, but could the microarray package be >> modified so that it deals with data generated from or interacts >> directly with Bioconductor (i.e. maybe including some specialized >> bioperl-run set of classes to run Bioconductor tasks, return >> lightweight bioperl microarray classes)? Allen pointed out in a >> previous post that Bioconductor is the best pick for certain tasks, >> while Perl excels at others: >> >> http://article.gmane.org/gmane.comp.lang.perl.bio.general/13993 >> >> Might be nice if we could merge both strengths together in some way. 
>> >> chris >> >> On Jan 26, 2007, at 7:26 AM, Jay Hannah wrote: >>> On Jan 25, 2007, at 2:30 AM, Allen Day wrote: >>>> Eh, there is some discussion activity on the list, but not much. >>>> You >>>> are really better off moving to Bioconductor. >>> >>> Ok, thanks. I added that to the wiki page: >>> >>> http://www.bioperl.org/wiki/Microarray_package >>> >>> j >>> seqlab.net >>> http://www.bioperl.org/wiki/User:Jhannah >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From j-keller2 at md.northwestern.edu Wed Apr 16 16:12:27 2008 From: j-keller2 at md.northwestern.edu (Jacob Keller) Date: Wed, 16 Apr 2008 11:12:27 -0500 Subject: [Bioperl-l] Finding seqs of given domain architecture In-Reply-To: <200804161336.16879.heikki@sanbi.ac.za> References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <200804161336.16879.heikki@sanbi.ac.za> Message-ID: Hello All, I am new to this list, so am not totally sure this is the right forum, so please forgive if this is not the right place to asl the following question: I am seeking to get all sequences that have a given domain architecture, or at least that contain two given domains. I have thought of a few ways to do this. 1. Blast/Psi-blast for each domain, then compare the results for common sequences between the two lists, and fetch those. I would need to write a (simple) script to do this, but would prefer not to re-invent the wheel. 2. Search with a paradigm sequence of desired architecture/domain composition, somehow tweaking the psiblast parameters to find only matches over the whole search sequence, thereby finding both desired domains. I am not sure how to tweak blast to do this, though. 3. Pfam has this capability, i.e. to show all domains with a given architecture, but it is difficult to get at the actual sequences or even a list of accession numbers. Does anybody have any suggestions as to how optimally to get these seq's? Thanks for your consideration, Jacob ******************************************* Jacob Pearson Keller Northwestern University Medical Scientist Training Program Dallos Laboratory F. 
Searle 1-240 2240 Campus Drive Evanston IL 60208 lab: 847.491.2438 cel: 773.608.9185 email: j-keller2 at northwestern.edu ******************************************* ----- Original Message ----- From: "Heikki Lehvaslaiho" To: Cc: ; "Chris Fields" ; "Jay Hannah" ; Sent: Wednesday, April 16, 2008 6:36 AM Subject: Re: [Bioperl-l] bioperl-microarray: status? > FYI, > > Christoper Jones has just published > [http://bioinformatics.oxfordjournals.org/cgi/content/short/24/8/1102 an > article in Bioinformatics] about his > [http://search.cpan.org/perldoc?Microarray Microarray perl module] in > CPAN. > > (The text added into BioPerl wiki.) > > -Heikki > > > On Friday 26 January 2007 16:05:01 Chris Fields wrote: >> Don't know if it's worth it, but could the microarray package be >> modified so that it deals with data generated from or interacts >> directly with Bioconductor (i.e. maybe including some specialized >> bioperl-run set of classes to run Bioconductor tasks, return >> lightweight bioperl microarray classes)? Allen pointed out in a >> previous post that Bioconductor is the best pick for certain tasks, >> while Perl excels at others: >> >> http://article.gmane.org/gmane.comp.lang.perl.bio.general/13993 >> >> Might be nice if we could merge both strengths together in some way. >> >> chris >> >> On Jan 26, 2007, at 7:26 AM, Jay Hannah wrote: >> > On Jan 25, 2007, at 2:30 AM, Allen Day wrote: >> >> Eh, there is some discussion activity on the list, but not much. You >> >> are really better off moving to Bioconductor. >> > >> > Ok, thanks. I added that to the wiki page: >> > >> > http://www.bioperl.org/wiki/Microarray_package >> > >> > j >> > seqlab.net >> > http://www.bioperl.org/wiki/User:Jhannah >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. 
Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l >
From frederic.romagne at gmail.com Wed Apr 16 17:25:18 2008 From: frederic.romagne at gmail.com (=?ISO-8859-1?Q?Fr=E9d=E9ric_Romagn=E9?=) Date: Wed, 16 Apr 2008 12:25:18 -0500 Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix Message-ID: <1208366718.19084.15.camel@kiss-laptop> Hello, i made a program which use Bio::Index::GenBank and i tested it under unix, that worked well. But i have to launch it under windows and it seems not to work on. Here is the problem : my $dbobj = Bio::Index::Abstract->new("Data/$db"); my $seq = $dbobj->get_Seq_by_acc($id); print $seq->display_id."\n"; did not print the same number than $id !!!
So I am not working on the expected sequence... I use the SVN sources on unix and the Perl package manager for windows... Thanks. From cjfields at uiuc.edu Wed Apr 16 17:52:59 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 16 Apr 2008 12:52:59 -0500 Subject: [Bioperl-l] Finding seqs of given domain architecture In-Reply-To: References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <200804161336.16879.heikki@sanbi.ac.za> Message-ID: You can try CDART: http://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi?cmd=rps There are probably other tools out there as well. If you want to roll your own, you can use bioperl wrappers for all of these (Bio::Tools::Run::StandAloneBlast is in bioperl-live, Bio::Tools::Run::Hmmer in bioperl-run), tweaking the parameters as you see fit, and either parse while running them or store the file for parsing later using Bio::SearchIO. Personally, I wouldn't go with (2) unless you are absolutely sure the domains are found only once per sequence, are spatially conserved, and don't overlap. For instance, with many proteins you could have a domain structure like dom1-dom2, dom2-dom1, dom1-dom1-dom2, etc. If you just want accessions from Pfam's Stockholm format (which are UniProt, I believe) you can get at accessions using Bio::AlignIO::stockholm (using perl 5.10): use Bio::AlignIO; use feature 'say'; my $file = shift || die "Must pass file as argument\n"; my $in = Bio::AlignIO->new(-format => 'stockholm', -file => $file); while (my $aln = $in->next_aln) { my @accs; for my $seq ($aln->each_seq) { push @accs, $seq->accession_number; } say join(',', @accs); } chris On Apr 16, 2008, at 11:12 AM, Jacob Keller wrote: > Hello All, > > I am new to this list, so am not totally sure this is the right > forum, so please forgive if this is not the right place to asl the > following question: I am seeking to get all sequences that have a > given domain architecture, or at least that contain two given > domains.
I have thought of a few ways to do this. > > 1. Blast/Psi-blast for each domain, then compare the results for > common sequences between the two lists, and fetch those. I would > need to write a (simple) script to do this, but would prefer not to > re-invent the wheel. > > 2. Search with a paradigm sequence of desired architecture/domain > composition, somehow tweaking the psiblast parameters to find only > matches over the whole search sequence, thereby finding both desired > domains. I am not sure how to tweak blast to do this, though. > > 3. Pfam has this capability, i.e. to show all domains with a given > architecture, but it is difficult to get at the actual sequences or > even a list of accession numbers. > > Does anybody have any suggestions as to how optimally to get these > seq's? > > Thanks for your consideration, > > Jacob > > ******************************************* > Jacob Pearson Keller > Northwestern University > Medical Scientist Training Program > Dallos Laboratory > F. Searle 1-240 > 2240 Campus Drive > Evanston IL 60208 > lab: 847.491.2438 > cel: 773.608.9185 > email: j-keller2 at northwestern.edu > ******************************************* > > ----- Original Message ----- From: "Heikki Lehvaslaiho" > > To: > Cc: ; "Chris Fields" ; "Jay > Hannah" ; > Sent: Wednesday, April 16, 2008 6:36 AM > Subject: Re: [Bioperl-l] bioperl-microarray: status? > > >> FYI, >> >> Christoper Jones has just published >> [http://bioinformatics.oxfordjournals.org/cgi/content/short/ >> 24/8/1102 an >> article in Bioinformatics] about his >> [http://search.cpan.org/perldoc?Microarray Microarray perl module] >> in CPAN. >> >> (The text added into BioPerl wiki.) >> >> -Heikki >> >> >> On Friday 26 January 2007 16:05:01 Chris Fields wrote: >>> Don't know if it's worth it, but could the microarray package be >>> modified so that it deals with data generated from or interacts >>> directly with Bioconductor (i.e. 
maybe including some specialized >>> bioperl-run set of classes to run Bioconductor tasks, return >>> lightweight bioperl microarray classes)? Allen pointed out in a >>> previous post that Bioconductor is the best pick for certain tasks, >>> while Perl excels at others: >>> >>> http://article.gmane.org/gmane.comp.lang.perl.bio.general/13993 >>> >>> Might be nice if we could merge both strengths together in some way. >>> >>> chris >>> >>> On Jan 26, 2007, at 7:26 AM, Jay Hannah wrote: >>> > On Jan 25, 2007, at 2:30 AM, Allen Day wrote: >>> >> Eh, there is some discussion activity on the list, but not >>> much. You >>> >> are really better off moving to Bioconductor. >>> > >>> > Ok, thanks. I added that to the wiki page: >>> > >>> > http://www.bioperl.org/wiki/Microarray_package >>> > >>> > j >>> > seqlab.net >>> > http://www.bioperl.org/wiki/User:Jhannah >>> > >>> > _______________________________________________ >>> > Bioperl-l mailing list >>> > Bioperl-l at lists.open-bio.org >>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. 
Robert Switzer >>> Dept of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> -- >> ______ _/ _/ >> _____________________________________________________ >> _/ _/ >> _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za >> _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho >> _/ _/ _/ SANBI, South African National Bioinformatics Institute >> _/ _/ _/ University of Western Cape, South Africa >> _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 >> ___ _/_/_/_/_/ >> ________________________________________________________ >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From David.Messina at sbc.su.se Wed Apr 16 18:23:27 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 16 Apr 2008 20:23:27 +0200 Subject: [Bioperl-l] Finding seqs of given domain architecture In-Reply-To: References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <200804161336.16879.heikki@sanbi.ac.za> Message-ID: <628aabb70804161123s453bd96bqd2213b938dfdb3a2@mail.gmail.com> Hey Jacob, This forum is mostly geared toward the BioPerl software package rather than general bioinformatics assistance. That being said, I would recommend using Pfam's Sequence Search to determine the domain content of your sequences and then simply looking at those which have the same two domains of interest. 
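The filtering step described here can be sketched in a few lines of plain Perl. This is only an illustration under stated assumptions: the %domains_of table, the sequence IDs and the domain names dom1/dom2 are hypothetical placeholders, and the domain lists are assumed to have been collected beforehand (for example from Pfam search results); none of this is a Pfam or BioPerl API.

```perl
use strict;
use warnings;

# Hypothetical input: sequence ID => domains in N- to C-terminal order,
# e.g. as collected beforehand from Pfam sequence-search results.
my %domains_of = (
    seqA => [qw(dom1 dom2)],
    seqB => [qw(dom2 dom1)],
    seqC => [qw(dom1 dom1 dom2)],
    seqD => [qw(dom1)],
);

# Return the IDs (sorted) of all sequences that contain every wanted domain.
sub seqs_with_domains {
    my ($domains_of, @wanted) = @_;
    my @keep;
    for my $id (sort keys %$domains_of) {
        my %seen = map { $_ => 1 } @{ $domains_of->{$id} };
        # Keep the sequence only if no wanted domain is missing.
        push @keep, $id unless grep { !$seen{$_} } @wanted;
    }
    return @keep;
}

# Print each matching sequence together with its architecture string,
# so that order and copy number (dom1-dom2 vs dom2-dom1) stay visible.
for my $id (seqs_with_domains(\%domains_of, 'dom1', 'dom2')) {
    print "$id\t", join('-', @{ $domains_of{$id} }), "\n";
}
```

Printing the architecture string alongside each ID keeps the order and number of domains visible, which matters when the same two domains occur in different arrangements.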
If there are more sequences matching this criterion than can be examined manually, you could write up something (potentially using BioPerl) to then look at the relative order and number of those domains in your sequences. However, if these sequences have UniProt IDs, you can start with the domains and Pfam will hand you a list of all the UniProt seqs having those domains. On the Pfam website's main page, click on "Help" (right side of menu at the top of the page) and then "Tools and Services" (left side menu). Dave From Russell.Smithies at agresearch.co.nz Wed Apr 16 20:49:49 2008 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 17 Apr 2008 08:49:49 +1200 Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix In-Reply-To: <1208366718.19084.15.camel@kiss-laptop> References: <1208366718.19084.15.camel@kiss-laptop> Message-ID: Did you check the format of your input file? i.e. DOS or UNIX line endings? > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Frédéric Romagné > Sent: Thursday, 17 April 2008 5:25 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix > > Hello, > i made a program which use Bio::Index::GenBank and i tested it under > unix, that worked well. > > But i have to launch it under windows and it seems not to work on. > > Here is the problem : > > my $dbobj = Bio::Index::Abstract->new("Data/$db"); > my $seq = $dbobj->get_Seq_by_acc($id); > print $seq->display_id."\n"; > > did not print the same number than $id !!! So i don't work on the > sequence expected... > > I use the SVN sources on unix and the Perl package manager for > windows... > > Thanks.
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From frederic.romagne at gmail.com Wed Apr 16 21:39:07 2008 From: frederic.romagne at gmail.com (=?ISO-8859-1?Q?Fr=E9d=E9ric_Romagn=E9?=) Date: Wed, 16 Apr 2008 16:39:07 -0500 Subject: [Bioperl-l] index::abstract on win and unix In-Reply-To: References: <1208366718.19084.15.camel@kiss-laptop> Message-ID: <1208381947.16620.6.camel@kiss-laptop> Well, if by input file you mean the database used, it was created with Bio::Index::GenBank from a GenBank file from the NCBI FTP site. $id is an accession number read from a file, but I chomp the line... I am trying to install the svn version of bioperl under windows to see if there is an improvement. On Thursday 17 April 2008 at 08:49 +1200, Smithies, Russell wrote: > Did you check the format of your input file? > i.e. DOS or UNIX line endings? > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Frédéric Romagné > > Sent: Thursday, 17 April 2008 5:25 a.m.
> > To: bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix > > > > Hello, > > i made a program which use Bio::Index::GenBank and i tested it under > > unix, that worked well. > > > > But i have to launch it under windows and it seems not to work on. > > > > Here is the problem : > > > > my $dbobj = Bio::Index::Abstract->new("Data/$db"); > > ?my $seq = $dbobj->get_Seq_by_acc($id); > > print $seq->display_id."\n"; > > > > did not print the same number than $id !!! So i don't work on the > > sequence expected... > > > > I use the SVN sources on unix and the Perl package manager for > > windows... > > > > Thanks. > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= From hubert.gaynor at yahoo.com Thu Apr 17 06:19:11 2008 From: hubert.gaynor at yahoo.com (Hubert Gaynor) Date: Wed, 16 Apr 2008 23:19:11 -0700 (PDT) Subject: [Bioperl-l] Can I use BLAST against a database like MySQL Message-ID: <657734.41592.qm@web46008.mail.sp1.yahoo.com> Hi, As far as I know, before using BLAST to do the alignment the first thing should be done is typing formatdb to construct a database. 
But I was wondering whether it is possible to construct a database with MySQL which probably will grant the BLAST search a higher speed and make the database management much easier? Thanks! Hubert. ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From sdavis2 at mail.nih.gov Thu Apr 17 10:36:32 2008 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu, 17 Apr 2008 06:36:32 -0400 Subject: [Bioperl-l] Can I use BLAST against a database like MySQL In-Reply-To: <657734.41592.qm@web46008.mail.sp1.yahoo.com> References: <657734.41592.qm@web46008.mail.sp1.yahoo.com> Message-ID: <264855a00804170336o2a2bcff9xfcb05a33bac4c8dc@mail.gmail.com> On Thu, Apr 17, 2008 at 2:19 AM, Hubert Gaynor wrote: > Hi, > > As far as I know, before using BLAST to do the alignment the first thing should be done is typing formatdb to construct a database. But I was wondering whether it is possible to construct a database with MySQL which probably will grant the BLAST search a higher speed and make the database management much easier? > formatdb is used to make a representation that can be used efficiently by blast. That representation already makes blast faster. MySQL can't be used for such things. As for speeding blast, if you have a multiprocessor machine, you can take advantage of those using blast and increasing the number of processors. Also, while blast is a very versatile program, it is not the only alignment program available. Depending on your needs, you could look at other programs such as blat or gmap that can be 2-3 orders of magnitude faster than blast. 
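For readers who want to see what the formatdb step and the multiprocessor option look like in practice, here is a minimal sketch that only builds the command lines for the legacy NCBI tools. The file names (mydb.fasta, query.fasta, hits.blast) and the helper names formatdb_cmd and blastall_cmd are assumptions made up for illustration; the flags used (formatdb -i/-p, blastall -p/-d/-i/-a/-o) are the standard legacy BLAST options, with -a setting the number of processors.

```perl
use strict;
use warnings;

# Build the argument list for legacy NCBI formatdb.
# -i: input FASTA file; -p: T for protein, F for nucleotide.
sub formatdb_cmd {
    my ($fasta, $is_protein) = @_;
    return ('formatdb', '-i', $fasta, '-p', ($is_protein ? 'T' : 'F'));
}

# Build the argument list for legacy blastall.
# -a sets the number of processors to use (defaults to 1 here).
sub blastall_cmd {
    my (%arg) = @_;
    return ('blastall',
        '-p', $arg{program},
        '-d', $arg{db},
        '-i', $arg{query},
        '-a', $arg{cpus} || 1,
        '-o', $arg{out});
}

# Hypothetical file names, for illustration only.
my @format = formatdb_cmd('mydb.fasta', 1);
my @search = blastall_cmd(
    program => 'blastp',
    db      => 'mydb.fasta',
    query   => 'query.fasta',
    cpus    => 4,
    out     => 'hits.blast',
);

print join(' ', @format), "\n";
print join(' ', @search), "\n";

# To actually run them (requires the legacy BLAST binaries on PATH):
# system(@format) == 0 or die "formatdb failed: $?";
# system(@search) == 0 or die "blastall failed: $?";
```

Passing the commands to system() as lists rather than as one string avoids shell quoting problems with unusual file names.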
Sean From stefan.kirov at bms.com Thu Apr 17 13:40:29 2008 From: stefan.kirov at bms.com (Stefan Kirov) Date: Thu, 17 Apr 2008 09:40:29 -0400 Subject: [Bioperl-l] bioperl-db woes Message-ID: <4807534D.80105@bms.com> I'm having problems passing all the tests for bioperl-db. There are 2 distinct errors, first one: Can't locate Bio/DB/BioSQL/RichSeqAdaptor.pm ***Which by the way is embed deep into several layers of eval, so I am getting the actual error from the test: ***t/04swiss.........ok 3/52Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 78. or ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Annotation of class Bio::Annotation::Collection not type-mapped. Internal error? STACK: Error::throw STACK: Bio::Root::Root::throw /home/kirovs/bioperl-live/Bio/Root/Root.pm:357 STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695 STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 STACK: Bio::DB::Persistent::PersistentObject::store Bio/DB/Persistent/PersistentObject.pm:271 STACK: Bio::DB::BioSQL::SeqAdaptor::store_children Bio/DB/BioSQL/SeqAdaptor.pm:224 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 STACK: Bio::DB::Persistent::PersistentObject::create Bio/DB/Persistent/PersistentObject.pm:244 STACK: t/04swiss.t:36 ----------------------------------------------------------- It turns out the adaptor is really not there??? 
My bioperl-db is from dev.open-bio.org/home/svn-repositories/bioperl/bioperl-db/trunk bioperl-db (revision 14661) Is this module being deprecated (I am sure it is not) my download incomplete....? The other problem was: DBD::Oracle::st execute failed: ORA-02292: integrity constraint (BIOSQL.FKTAX_ENT) violated - child record found (DBD ERROR: OCIStmtExecute) [for Statement "DELETE FROM taxon WHERE oid = ?" with ParamValues: :p1=9606] at /home/kirovs/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 320. not ok 76 # Test 76 got: (t/02species.t at line 71) I have not tried to debug this one.... Thanks! Stefan From stefan.kirov at bms.com Thu Apr 17 14:18:30 2008 From: stefan.kirov at bms.com (Stefan Kirov) Date: Thu, 17 Apr 2008 10:18:30 -0400 (Eastern Daylight Time) Subject: [Bioperl-l] bioperl-db woes In-Reply-To: <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> Message-ID: On Thu, 17 Apr 2008, Chris Fields wrote: > The 'get_dbxrefs' problem looks related to recent changes I made when rolling > back the significant feature/annotation changes introduced just prior to the > 1.5 release, none which were fully implemented. I can check that one out. > Odd though; these passed for me, but I'm using MySQL not oracle. get_dbxref is not the problem- I think the error message is misleading: kirovs at horta:~/bioperl-db> grep get_dbxrefs /home/kirovs/bioperl-live/Bio/Ontology/Term.pm get_dbxrefs() instead, which handles both strings and DBLink "Use get_dbxrefs() instead"); $self->get_dbxrefs($context); =head2 get_dbxrefs Title : get_dbxrefs() Usage : @ds = $term->get_dbxrefs(); sub get_dbxrefs { } # get_dbxrefs my @old = $self->get_dbxrefs($context); sub each_dblink {shift->throw("use of each_dblink() is deprecated; use get_dbxrefs() instead")} So it is there. In any case I debugged and tracked that down to the RichSeq adaptor module missing. 
It is not in the distro I downloaded, so I think this is my problem. It is a different question why... I looked at different repos (SVN, CVS, trunk, different tags) and I did not see RichSeq.pm. I am not sure what is going on. Perhaps Hilmar will be able to help when he is around. Thanks for the help Chris.... Stefan > > You may want to make sure you are using bioperl-live and that there isn't an > older bioperl installation getting into the mix. > > chris > > On Apr 17, 2008, at 8:40 AM, Stefan Kirov wrote: > >> I'm having problems passing all the tests for bioperl-db. There are 2 >> distinct errors, first one: >> Can't locate Bio/DB/BioSQL/RichSeqAdaptor.pm >> ***Which by the way is embed deep into several layers of eval, so I >> am getting the actual error from the test: >> ***t/04swiss.........ok 3/52Can't locate object method "get_dbxrefs" >> via package "Bio::Ontology::Term" at >> >> /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm >> line 552, line 78. >> or >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> >> MSG: Annotation of class Bio::Annotation::Collection not >> type-mapped. Internal error? 
>> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /home/kirovs/bioperl-live/Bio/Root/Root.pm:357 >> STACK: >> Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key >> Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695 >> STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children >> Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create >> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store >> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 >> STACK: Bio::DB::Persistent::PersistentObject::store >> Bio/DB/Persistent/PersistentObject.pm:271 >> STACK: Bio::DB::BioSQL::SeqAdaptor::store_children >> Bio/DB/BioSQL/SeqAdaptor.pm:224 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create >> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 >> STACK: Bio::DB::Persistent::PersistentObject::create >> Bio/DB/Persistent/PersistentObject.pm:244 >> STACK: t/04swiss.t:36 >> ----------------------------------------------------------- >> >> It turns out the adaptor is really not there??? >> My bioperl-db is from >> dev.open-bio.org/home/svn-repositories/bioperl/bioperl-db/trunk >> bioperl-db (revision 14661) >> Is this module being deprecated (I am sure it is not) my download >> incomplete....? >> The other problem was: >> DBD::Oracle::st execute failed: ORA-02292: integrity constraint >> (BIOSQL.FKTAX_ENT) violated - child record found (DBD ERROR: >> OCIStmtExecute) [for Statement "DELETE FROM taxon WHERE oid = ?" with >> ParamValues: :p1=9606] at >> /home/kirovs/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm >> line 320. >> not ok 76 >> # Test 76 got: (t/02species.t at line 71) >> I have not tried to debug this one.... >> Thanks! 
>> Stefan >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > From cjfields at uiuc.edu Thu Apr 17 13:59:57 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 17 Apr 2008 08:59:57 -0500 Subject: [Bioperl-l] bioperl-db woes In-Reply-To: <4807534D.80105@bms.com> References: <4807534D.80105@bms.com> Message-ID: <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> The 'get_dbxrefs' problem looks related to recent changes I made when rolling back the significant feature/annotation changes introduced just prior to the 1.5 release, none which were fully implemented. I can check that one out. Odd though; these passed for me, but I'm using MySQL not oracle. You may want to make sure you are using bioperl-live and that there isn't an older bioperl installation getting into the mix. chris On Apr 17, 2008, at 8:40 AM, Stefan Kirov wrote: > I'm having problems passing all the tests for bioperl-db. There are 2 > distinct errors, first one: > Can't locate Bio/DB/BioSQL/RichSeqAdaptor.pm > ***Which by the way is embed deep into several layers of eval, so I > am getting the actual error from the test: > ***t/04swiss.........ok 3/52Can't locate object method > "get_dbxrefs" > via package "Bio::Ontology::Term" at > > /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm > line 552, line 78. > or > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Annotation of class Bio::Annotation::Collection not > type-mapped. Internal error? 
> STACK: Error::throw > STACK: Bio::Root::Root::throw > /home/kirovs/bioperl-live/Bio/Root/Root.pm:357 > STACK: > Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key > Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695 > STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children > Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create > Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store > Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 > STACK: Bio::DB::Persistent::PersistentObject::store > Bio/DB/Persistent/PersistentObject.pm:271 > STACK: Bio::DB::BioSQL::SeqAdaptor::store_children > Bio/DB/BioSQL/SeqAdaptor.pm:224 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create > Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 > STACK: Bio::DB::Persistent::PersistentObject::create > Bio/DB/Persistent/PersistentObject.pm:244 > STACK: t/04swiss.t:36 > ----------------------------------------------------------- > > It turns out the adaptor is really not there??? > My bioperl-db is from > dev.open-bio.org/home/svn-repositories/bioperl/bioperl-db/trunk > bioperl-db (revision 14661) > Is this module being deprecated (I am sure it is not) my download > incomplete....? > The other problem was: > DBD::Oracle::st execute failed: ORA-02292: integrity constraint > (BIOSQL.FKTAX_ENT) violated - child record found (DBD ERROR: > OCIStmtExecute) [for Statement "DELETE FROM taxon WHERE oid = ?" with > ParamValues: :p1=9606] at > /home/kirovs/bioperl-db/blib/lib/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm > line 320. > not ok 76 > # Test 76 got: (t/02species.t at line 71) > I have not tried to debug this one.... > Thanks! > Stefan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From stefan.kirov at bms.com Thu Apr 17 14:52:32 2008 From: stefan.kirov at bms.com (Stefan Kirov) Date: Thu, 17 Apr 2008 10:52:32 -0400 (Eastern Daylight Time) Subject: [Bioperl-l] bioperl-db woes In-Reply-To: <9ECDEB39-95F3-4A94-9AF7-FFEBBDEFF0FA@gmx.net> References: <4807534D.80105@bms.com> <9ECDEB39-95F3-4A94-9AF7-FFEBBDEFF0FA@gmx.net> Message-ID: That is correct and I assumed I should not be concerned with this error. Thanks Stefan On Thu, 17 Apr 2008, Hilmar Lapp wrote: > > On Apr 17, 2008, at 9:40 AM, Stefan Kirov wrote: >> The other problem was: >> DBD::Oracle::st execute failed: ORA-02292: integrity constraint >> (BIOSQL.FKTAX_ENT) violated - child record found (DBD ERROR: >> OCIStmtExecute) [for Statement "DELETE FROM taxon WHERE oid = ?" with >> ParamValues: :p1=9606] at > > > This sounds like you are running the tests against a non-empty database? > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > From hlapp at gmx.net Thu Apr 17 14:47:58 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 17 Apr 2008 10:47:58 -0400 Subject: [Bioperl-l] bioperl-db woes In-Reply-To: References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> Message-ID: <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net> On Apr 17, 2008, at 10:18 AM, Stefan Kirov wrote: > In any case I debugged and tracked that down to the RichSeq adaptor > module missing. That almost can't be the problem. Every Bio::Seq::RichSeq is-a Bio::Seq and a SeqAdaptor is present. I'm afraid it gets stuck somewhere else and frankly I didn't see the RichSeqAdaptor failing to load in your stack trace: > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Annotation of class Bio::Annotation::Collection not > type-mapped. Internal error? 
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /home/kirovs/bioperl-live/Bio/Root/Root.pm:357
> STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695
> STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
> STACK: Bio::DB::Persistent::PersistentObject::store Bio/DB/Persistent/PersistentObject.pm:271
> STACK: Bio::DB::BioSQL::SeqAdaptor::store_children Bio/DB/BioSQL/SeqAdaptor.pm:224
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
> STACK: Bio::DB::Persistent::PersistentObject::create Bio/DB/Persistent/PersistentObject.pm:244
> STACK: t/04swiss.t:36
> -----------------------------------------------------------

What that tells me is that when bioperl-db tries to store the
annotation bundle of the (SwissProt) sequence, one of the annotations
that it encounters is of type Bio::Annotation::Collection. At present
bioperl-db doesn't know what to do with it; i.e., bioperl-db can't yet
handle hierarchical annotation collections (collections within
collections).

I believe this is due to recent changes in how the GN line is parsed in
BioPerl - Chris, does this ring the right bell? I thought, though, that
you had built in a method that would allow flattening out?

It's worth noting that BioSQL itself can't really represent nested
annotation collections other than by using ontology terms and their
hierarchy, which at present I think isn't really appropriate, but I
have to think through the issue more. In other words, in BioSQL you
can't directly tie together a bunch of qualifier/value pairs into a
"bag" and then nest this bag within another.
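To make the idea of a nested "bag" and its flat equivalent concrete, here is a minimal sketch in plain Perl. It is not bioperl-db or BioSQL code: the data structure, the tag names and the `flatten` helper are all hypothetical, and joining nested tag names with a dot is just one possible convention.

```perl
use strict;
use warnings;

# Hypothetical nested "bag" of gene-name annotations. The tag names are
# made up for illustration; this is not the BioPerl or BioSQL data model.
my $nested = {
    gene => {
        name    => 'dnaA',
        synonym => [ 'dnaH', 'dnaJ' ],
    },
};

# Walk the structure recursively and return flat [tag, value] pairs,
# joining nested tag names with a dot.
sub flatten {
    my ($node, $prefix) = @_;
    my @pairs;
    if (ref($node) eq 'HASH') {
        for my $key (sort keys %$node) {
            my $tag = length($prefix) ? "$prefix.$key" : $key;
            push @pairs, flatten($node->{$key}, $tag);
        }
    }
    elsif (ref($node) eq 'ARRAY') {
        # Repeated values keep the same tag, one pair per value.
        push @pairs, flatten($_, $prefix) for @$node;
    }
    else {
        push @pairs, [ $prefix, $node ];
    }
    return @pairs;
}

my @pairs = flatten($nested, '');
print join("\t", @$_), "\n" for @pairs;
```

Each resulting pair is a simple tag and value, which is the shape a schema without nesting can store; the price is that the grouping survives only in the tag names.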
The way to make this work with the current schema is to flatten out the
nesting.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Thu Apr 17 14:48:52 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 17 Apr 2008 10:48:52 -0400
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To: <4807534D.80105@bms.com>
References: <4807534D.80105@bms.com>
Message-ID: <9ECDEB39-95F3-4A94-9AF7-FFEBBDEFF0FA@gmx.net>

On Apr 17, 2008, at 9:40 AM, Stefan Kirov wrote:

> The other problem was:
> DBD::Oracle::st execute failed: ORA-02292: integrity constraint
> (BIOSQL.FKTAX_ENT) violated - child record found (DBD ERROR:
> OCIStmtExecute) [for Statement "DELETE FROM taxon WHERE oid = ?" with
> ParamValues: :p1=9606] at

This sounds like you are running the tests against a non-empty database?

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From stefan.kirov at bms.com Thu Apr 17 15:28:42 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 17 Apr 2008 11:28:42 -0400 (Eastern Daylight Time)
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To: <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
Message-ID:

Hilmar,
I think I see what happens with this adaptor. In
Bio::DB::BioSQL::DBAdaptor::_load_object_adaptor (called from
create_persistent) there is a request to load this module:
Bio/DB/BioSQL/RichSeqAdaptor.pm
There is no such module... This always fails, but since it is evaled,
no actual error is reported. Perhaps this is a leftover...? This got me
fooled...
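The effect described here, a `require` inside `eval` that fails without any visible error, can be reproduced with core Perl alone. The sketch below is not the code in Bio::DB::BioSQL::DBAdaptor::_load_object_adaptor; the helper `try_load` and the module name `My::Missing::Adaptor` are hypothetical, chosen only to show how the failure reason ends up in `$@` and is lost unless someone prints it.

```perl
use strict;
use warnings;

# Try to load a class by name; return 1 on success, 0 on failure.
# eval traps the die from require and leaves the reason in $@, so the
# failure is invisible unless the caller inspects $@.
sub try_load {
    my ($class) = @_;
    (my $file = "$class.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Silent variant: the missing module goes unnoticed.
try_load('My::Missing::Adaptor');

# Verbose variant: report why the load failed before falling back.
unless (try_load('My::Missing::Adaptor')) {
    warn "could not load adaptor: $@";
}

# A core module loads fine, for comparison.
print "List::Util loaded\n" if try_load('List::Util');
```

When a loader probes for several candidate classes like this, a missing candidate may be perfectly normal, which is probably why the error is swallowed; printing `$@` only under a debug flag would be one way to keep the probing quiet while still making the cause traceable.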
I guess Chris could be right- Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key is being passed Bio::Annotation::Collection as a value for $obj->obj(). Or recursing too far? Anyway, I am just guessing here- I do not know the architecture of bioperl-db... Thanks again for the help... Stefan On Thu, 17 Apr 2008, Hilmar Lapp wrote: > > On Apr 17, 2008, at 10:18 AM, Stefan Kirov wrote: >> In any case I debugged and tracked that down to the RichSeq adaptor module >> missing. > > > That almost can't be the problem. Every Bio::Seq::RichSeq is-a Bio::Seq and a > SeqAdaptor is present. > > I'm afraid it gets stuck somewhere else and frankly I didn't see the > RichSeqAdaptor failing to load in your stack trace: > >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> >> MSG: Annotation of class Bio::Annotation::Collection not >> type-mapped. Internal error? >> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /home/kirovs/bioperl-live/Bio/Root/Root.pm:357 >> STACK: >> Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key >> Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695 >> STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children >> Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create >> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store >> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 >> STACK: Bio::DB::Persistent::PersistentObject::store >> Bio/DB/Persistent/PersistentObject.pm:271 >> STACK: Bio::DB::BioSQL::SeqAdaptor::store_children >> Bio/DB/BioSQL/SeqAdaptor.pm:224 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create >> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 >> STACK: Bio::DB::Persistent::PersistentObject::create >> Bio/DB/Persistent/PersistentObject.pm:244 >> STACK: t/04swiss.t:36 >> ----------------------------------------------------------- > > What that tells me is that when bioperl-db tries to 
store the annotation > bundle of the (SwissProt) sequence, one of the annotations that it encounters > is of type Bio::Annotation::Collection. At present bioperl-db doesn't know > what to do with it; i.e., bioperl-db can't yet handle hierarchical annotation > collections (collections within collections). > > I believe this is due to recent changes in how the GN line is parsed in > BioPerl - Chris does this ring the right bell? I thought though you had built > in a method would allow flattening out? > > It's worth noting that BioSQL itself can't really represent nested annotation > collections other than by using ontology terms and their hierarchy, which at > present I think isn't really appropriate, but I have to think through the > issue more. In other words, in BioSQL you can't directly tie together a bunch > of qualifier value pairs into a "bag" and then nest this bag within another. > The way to make this work with the current schema is to flatten out the > nesting. > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From cjfields at uiuc.edu Thu Apr 17 16:26:41 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 17 Apr 2008 11:26:41 -0500 Subject: [Bioperl-l] bioperl-db woes In-Reply-To: <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net> References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net> Message-ID: On Apr 17, 2008, at 9:47 AM, Hilmar Lapp wrote: > > On Apr 17, 2008, at 10:18 AM, Stefan Kirov wrote: >> In any case I debugged and tracked that down to the RichSeq adaptor >> module missing. > > > That almost can't be the problem. Every Bio::Seq::RichSeq is-a > Bio::Seq and a SeqAdaptor is present. 
> > I'm afraid it gets stuck somewhere else and frankly I didn't see the > RichSeqAdaptor failing to load in your stack trace: > >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> >> MSG: Annotation of class Bio::Annotation::Collection not >> type-mapped. Internal error? >> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /home/kirovs/bioperl-live/Bio/Root/Root.pm:357 >> STACK: >> Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key >> Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695 >> STACK: >> Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children >> Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create >> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store >> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 >> STACK: Bio::DB::Persistent::PersistentObject::store >> Bio/DB/Persistent/PersistentObject.pm:271 >> STACK: Bio::DB::BioSQL::SeqAdaptor::store_children >> Bio/DB/BioSQL/SeqAdaptor.pm:224 >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create >> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 >> STACK: Bio::DB::Persistent::PersistentObject::create >> Bio/DB/Persistent/PersistentObject.pm:244 >> STACK: t/04swiss.t:36 >> ----------------------------------------------------------- > > What that tells me is that when bioperl-db tries to store the > annotation bundle of the (SwissProt) sequence, one of the > annotations that it encounters is of type > Bio::Annotation::Collection. At present bioperl-db doesn't know what > to do with it; i.e., bioperl-db can't yet handle hierarchical > annotation collections (collections within collections). > > I believe this is due to recent changes in how the GN line is parsed > in BioPerl - Chris does this ring the right bell? 
I thought though
> you had built in a method would allow flattening out

This appears to be using an older bioperl-live checkout, one where
Heikki changed GN parsing to use a nested Annotation::Collection. I
reverted that back in a later commit to svn specifically b/c of
bioperl-db problems. bioperl-live's swiss.pm now uses a new subclass of
Bio::Annotation::SimpleValue (Bio::Annotation::TagTree) that represents
nested values via Data::Stag's itext output (we can change that to
alternatives if needed).

Here are the last few relevant revisions in bioperl-live's main trunk
(mine is the latest):

------------------------------------------------------------------------
r14562 | cjfields | 2008-02-28 08:30:05 -0600 (Thu, 28 Feb 2008) | 1 line

bug 1825: updating swiss.pm/tests to try out TagTree (passes all tests). Need to update Handler.t and related modules still...
------------------------------------------------------------------------
r14541 | heikki | 2008-02-25 00:10:48 -0600 (Mon, 25 Feb 2008) | 1 line

documentation for the GN line parsing and management
------------------------------------------------------------------------
r14538 | heikki | 2008-02-23 08:48:23 -0600 (Sat, 23 Feb 2008) | 1 line

GN (Gene Name) line parsing rewrite. Breaks backward compatibility. Can now deal with >1 gene per entry and four categories of names per gene. Parses old style syntax (...OR ... OR ... ) into one gene name and synonyms for each gene. Docs to follow.

....

I just updated all code from dev and reran bioperl-db tests w/o
problems. Maybe someone else could do the same to see what happens?

> It's worth noting that BioSQL itself can't really represent nested
> annotation collections other than by using ontology terms and their
> hierarchy, which at present I think isn't really appropriate, but I
> have to think through the issue more. In other words, in BioSQL you
> can't directly tie together a bunch of qualifier value pairs into a
> "bag" and then nest this bag within another.
The way to make this
> work with the current schema is to flatten out the nesting.
>
> -hilmar
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================

Might be worth looking into for a future BioSQL release, but we have a
decent workaround in place for now, as long as it works cross-platform
and cross-RDB.

chris

From stefan.kirov at bms.com Thu Apr 17 16:40:14 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 17 Apr 2008 12:40:14 -0400 (Eastern Daylight Time)
Subject: [Bioperl-l] bioperl-db woes
In-Reply-To:
References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net>
Message-ID:

Hilmar,
sorry, I missed the part after the stack trace... In any case this is
still a problem for me after I updated bioperl-live.
I see this with a number of other tests:
t/04swiss.........ok 3/52Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 78.
t/04swiss.........dubious
Test returned status 2 (wstat 512, 0x200)
DIED. FAILED tests 6-52
Failed 47/52 tests, 9.62% okay
t/05seqfeature....ok 4/48Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 72.
t/05seqfeature....FAILED tests 9-48
Failed 40/48 tests, 16.67% okay
t/06comment.......ok
t/07dblink........ok
t/08genbank.......ok
t/09fuzzy2........ok
t/10ensembl.......ok 1/15Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 1420.
t/10ensembl.......dubious
Test returned status 2 (wstat 512, 0x200)
DIED. FAILED tests 3-15
Failed 13/15 tests, 13.33% okay
t/11locuslink.....ok 4/110Can't locate object method "get_dbxrefs" via package "Bio::Annotation::OntologyTerm" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 1.
t/11locuslink.....dubious
Test returned status 2 (wstat 512, 0x200)
DIED. FAILED tests 5-110
Failed 106/110 tests, 3.64% okay
t/12ontology......ok 1/738Can't locate object method "get_dbxrefs" via package "Bio::Ontology::GOterm" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 98.
t/12ontology......dubious
Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 5-738
Failed 734/738 tests, 0.54% okay
t/13remove........ok 2/59Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 145.
t/13remove........FAILED tests 11-59
Failed 49/59 tests, 16.95% okay
t/14query.........ok
t/15cluster.......ok 3/160Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 1.
t/15cluster.......dubious
Test returned status 2 (wstat 512, 0x200)
DIED. FAILED tests 6-160
Failed 155/160 tests, 3.12% okay
t/16obda..........ok

On Thu, 17 Apr 2008, Chris Fields wrote:

>
> On Apr 17, 2008, at 9:47 AM, Hilmar Lapp wrote:
>
>>
>> On Apr 17, 2008, at 10:18 AM, Stefan Kirov wrote:
>>> In any case I debugged and tracked that down to the RichSeq adaptor module
>>> missing.
>>
>>
>> That almost can't be the problem. Every Bio::Seq::RichSeq is-a Bio::Seq and
>> a SeqAdaptor is present.
>>
>> I'm afraid it gets stuck somewhere else and frankly I didn't see the
>> RichSeqAdaptor failing to load in your stack trace:
>>
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>
>>> MSG: Annotation of class Bio::Annotation::Collection not
>>> type-mapped. Internal error?
>>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw >>> /home/kirovs/bioperl-live/Bio/Root/Root.pm:357 >>> STACK: >>> Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key >>> Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695 >>> STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children >>> Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204 >>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create >>> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 >>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store >>> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 >>> STACK: Bio::DB::Persistent::PersistentObject::store >>> Bio/DB/Persistent/PersistentObject.pm:271 >>> STACK: Bio::DB::BioSQL::SeqAdaptor::store_children >>> Bio/DB/BioSQL/SeqAdaptor.pm:224 >>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create >>> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 >>> STACK: Bio::DB::Persistent::PersistentObject::create >>> Bio/DB/Persistent/PersistentObject.pm:244 >>> STACK: t/04swiss.t:36 >>> ----------------------------------------------------------- >> >> What that tells me is that when bioperl-db tries to store the annotation >> bundle of the (SwissProt) sequence, one of the annotations that it >> encounters is of type Bio::Annotation::Collection. At present bioperl-db >> doesn't know what to do with it; i.e., bioperl-db can't yet handle >> hierarchical annotation collections (collections within collections). >> >> I believe this is due to recent changes in how the GN line is parsed in >> BioPerl - Chris does this ring the right bell? I thought though you had >> built in a method would allow flattening out > > This appears to be using an older bioperl-live checkout, one where Heikki > changed GN parsing to use a nested Annotation::Collection. I reverted that > back in a later commit to svn specifically b/c of bioperl-db problems. 
> bioperl-live's swiss.pm now uses a new subclass of > Bio::Annotation::SimpleValue (Bio::Annotation::TagTree) that represents > nested values via Data::Stag's itext output (we can change that to > alternatives if needed). > > Here are the last few relevant revisions in bioperl-live's main trunk (mine > is the latest): > > ------------------------------------------------------------------------ > r14562 | cjfields | 2008-02-28 08:30:05 -0600 (Thu, 28 Feb 2008) | 1 line > > bug 1825: updating swiss.pm/tests to try out TagTree (passes all tests). > Need to update Handler.t and related modules still... > ------------------------------------------------------------------------ > r14541 | heikki | 2008-02-25 00:10:48 -0600 (Mon, 25 Feb 2008) | 1 line > > documentation for the GN line parsing and management > ------------------------------------------------------------------------ > r14538 | heikki | 2008-02-23 08:48:23 -0600 (Sat, 23 Feb 2008) | 1 line > > GN (Gene Name) line parsing rewrite. Breaks backward compatibility. Can now > deal with >1 gene per entry and four categories of names per gene. Parses old > style syntax (...OR ... OR ... ) into one gene name and synonyms for each > gene. Docs to follow. > > .... > > I just updated all code from dev and reran bioperl-db tests w/o problems. > Maybe someone else could do the same to see what happens? > >> It's worth noting that BioSQL itself can't really represent nested >> annotation collections other than by using ontology terms and their >> hierarchy, which at present I think isn't really appropriate, but I have to >> think through the issue more. In other words, in BioSQL you can't directly >> tie together a bunch of qualifier value pairs into a "bag" and then nest >> this bag within another. The way to make this work with the current schema >> is to flatten out the nesting. 
>> >> -hilmar >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== > > Might be worth looking into for a future BioSQL release, but we have a decent > workaround in place for now, as long as it works cross-platform and > cross-RDB. > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Thu Apr 17 17:06:39 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 17 Apr 2008 12:06:39 -0500 Subject: [Bioperl-l] bioperl-db woes In-Reply-To: References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net> Message-ID: Stefan, 'get_dbxrefs' was introduced in bioperl-live a while back during the feature/annotation rollback detailed here: http://www.bioperl.org/wiki/Feature_Annotation_rollback I still think this is an interfering old bioperl (and maybe bioperl- db) installation causing the problems; I had similar issues at one point and had to find and remove the old installation. It might be worth (1) checking 'perldoc -l Bio::Root::Root', which will give the location of the Bio::Root::Root in lib path being used, and (2) using './Build install uninst=1' to remove any old bioperl/bioperl-db installations. chris On Apr 17, 2008, at 11:40 AM, Stefan Kirov wrote: > Hilmar, > sorry, I missed the part after the stack trace... In any case this > is still problem for me after I updated bioperl-live. > I see this with a number of other tests: > t/04swiss.........ok 3/52Can't locate object method "get_dbxrefs" > via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/ > lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 78. > t/04swiss.........dubious > Test returned status 2 (wstat 512, 0x200) > DIED. 
FAILED tests 6-52 > Failed 47/52 tests, 9.62% okay > t/05seqfeature....ok 4/48Can't locate object method "get_dbxrefs" > via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/ > lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 72. > t/05seqfeature....FAILED tests 9-48 > Failed 40/48 tests, 16.67% okay > t/06comment.......ok > t/07dblink........ok > t/08genbank.......ok > t/09fuzzy2........ok > t/10ensembl.......ok 1/15Can't locate object method "get_dbxrefs" > via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/ > lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 1420. > t/10ensembl.......dubious > Test returned status 2 (wstat 512, 0x200) > DIED. FAILED tests 3-15 > Failed 13/15 tests, 13.33% okay > t/11locuslink.....ok 4/110Can't locate object method "get_dbxrefs" > via package "Bio::Annotation::OntologyTerm" at /home/kirovs/bioperl- > db/blib/lib/Bio/DB/Persistent/PersistentObject.pm line 552, > line 1. > t/11locuslink.....dubious > Test returned status 2 (wstat 512, 0x200) > DIED. FAILED tests 5-110 > Failed 106/110 tests, 3.64% okay > t/12ontology......ok 1/738Can't locate object method "get_dbxrefs" > via package "Bio::Ontology::GOterm" at /home/kirovs/bioperl-db/blib/ > lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 98. > t/12ontology......dubious > Test returned status 255 (wstat 65280, 0xff00) > DIED. FAILED tests 5-738 > Failed 734/738 tests, 0.54% okay > t/13remove........ok 2/59Can't locate object method "get_dbxrefs" > via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/ > lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 145. > t/13remove........FAILED tests 11-59 > Failed 49/59 tests, 16.95% okay > t/14query.........ok > t/15cluster.......ok 3/160Can't locate object method "get_dbxrefs" > via package "Bio::Ontology::Term" at /home/kirovs/bioperl-db/blib/ > lib/Bio/DB/Persistent/PersistentObject.pm line 552, line 1. 
> t/15cluster.......dubious > Test returned status 2 (wstat 512, 0x200) > DIED. FAILED tests 6-160 > Failed 155/160 tests, 3.12% okay > t/16obda..........ok > > On Thu, 17 Apr 2008, Chris Fields wrote: > >> >> On Apr 17, 2008, at 9:47 AM, Hilmar Lapp wrote: >> >>> On Apr 17, 2008, at 10:18 AM, Stefan Kirov wrote: >>>> In any case I debugged and tracked that down to the RichSeq >>>> adaptor module missing. >>> That almost can't be the problem. Every Bio::Seq::RichSeq is-a >>> Bio::Seq and a SeqAdaptor is present. >>> I'm afraid it gets stuck somewhere else and frankly I didn't see >>> the RichSeqAdaptor failing to load in your stack trace: >>> >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>> >>>> MSG: Annotation of class Bio::Annotation::Collection not >>>> type-mapped. Internal error? >>>> STACK: Error::throw >>>> STACK: Bio::Root::Root::throw >>>> /home/kirovs/bioperl-live/Bio/Root/Root.pm:357 >>>> STACK: >>>> Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key >>>> Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695 >>>> STACK: >>>> Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children >>>> Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204 >>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create >>>> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 >>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store >>>> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 >>>> STACK: Bio::DB::Persistent::PersistentObject::store >>>> Bio/DB/Persistent/PersistentObject.pm:271 >>>> STACK: Bio::DB::BioSQL::SeqAdaptor::store_children >>>> Bio/DB/BioSQL/SeqAdaptor.pm:224 >>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create >>>> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 >>>> STACK: Bio::DB::Persistent::PersistentObject::create >>>> Bio/DB/Persistent/PersistentObject.pm:244 >>>> STACK: t/04swiss.t:36 >>>> ----------------------------------------------------------- >>> What that tells me is that when bioperl-db tries to store the >>> annotation 
bundle of the (SwissProt) sequence, one of the >>> annotations that it encounters is of type >>> Bio::Annotation::Collection. At present bioperl-db doesn't know >>> what to do with it; i.e., bioperl-db can't yet handle hierarchical >>> annotation collections (collections within collections). >>> I believe this is due to recent changes in how the GN line is >>> parsed in BioPerl - Chris does this ring the right bell? I thought >>> though you had built in a method would allow flattening out >> >> This appears to be using an older bioperl-live checkout, one where >> Heikki changed GN parsing to use a nested Annotation::Collection. >> I reverted that back in a later commit to svn specifically b/c of >> bioperl-db problems. bioperl-live's swiss.pm now uses a new >> subclass of Bio::Annotation::SimpleValue (Bio::Annotation::TagTree) >> that represents nested values via Data::Stag's itext output (we can >> change that to alternatives if needed). >> >> Here are the last few relevant revisions in bioperl-live's main >> trunk (mine is the latest): >> >> ------------------------------------------------------------------------ >> r14562 | cjfields | 2008-02-28 08:30:05 -0600 (Thu, 28 Feb 2008) | >> 1 line >> >> bug 1825: updating swiss.pm/tests to try out TagTree (passes all >> tests). Need to update Handler.t and related modules still... >> ------------------------------------------------------------------------ >> r14541 | heikki | 2008-02-25 00:10:48 -0600 (Mon, 25 Feb 2008) | 1 >> line >> >> documentation for the GN line parsing and management >> ------------------------------------------------------------------------ >> r14538 | heikki | 2008-02-23 08:48:23 -0600 (Sat, 23 Feb 2008) | 1 >> line >> >> GN (Gene Name) line parsing rewrite. Breaks backward compatibility. >> Can now deal with >1 gene per entry and four categories of names >> per gene. Parses old style syntax (...OR ... OR ... ) into one gene >> name and synonyms for each gene. Docs to follow. >> >> .... 
>> >> I just updated all code from dev and reran bioperl-db tests w/o >> problems. Maybe someone else could do the same to see what happens? >> >>> It's worth noting that BioSQL itself can't really represent nested >>> annotation collections other than by using ontology terms and >>> their hierarchy, which at present I think isn't really >>> appropriate, but I have to think through the issue more. In other >>> words, in BioSQL you can't directly tie together a bunch of >>> qualifier value pairs into a "bag" and then nest this bag within >>> another. The way to make this work with the current schema is to >>> flatten out the nesting. >>> >>> -hilmar >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >> >> Might be worth looking into for a future BioSQL release, but we >> have a decent workaround in place for now, as long as it works >> cross-platform and cross-RDB. >> >> chris >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From stefan.kirov at bms.com Thu Apr 17 17:52:22 2008 From: stefan.kirov at bms.com (Stefan Kirov) Date: Thu, 17 Apr 2008 13:52:22 -0400 Subject: [Bioperl-l] bioperl-db woes In-Reply-To: References: <4807534D.80105@bms.com> <82B3844B-A133-4AF3-9F08-774730F9B44C@uiuc.edu> <2D6AEAD9-286C-4F3F-8992-1778847708A8@gmx.net> Message-ID: <48078E56.9000404@bms.com> Chris Fields wrote: > Stefan, > > 'get_dbxrefs' was introduced in bioperl-live a while back during the > feature/annotation rollback detailed here: > > http://www.bioperl.org/wiki/Feature_Annotation_rollback > Chris was right! 
> I still think this is an interfering old bioperl (and maybe > bioperl-db) installation causing the problems; I had similar issues at > one point and had to find and remove the old installation. It might > be worth (1) checking 'perldoc -l Bio::Root::Root', This is the first thing I did and it seemed fine from command line. So I checked a new copy (vs. updating), set PERL5LIB to the minimum which is necessary (Build changes INC), which is /home/kirovs/bioperl-db/bplive:/stf/sysdev/perl/newlib/perl/lib/5.8/ia64-linux-multi/ (/home/kirovs/bioperl-db/bplive being the fresh copy and the other having Module::Build, etc., but definitely no bioperl). This fixed the problem. I still do not see where the old module came from, but that was a really good guess. Thanks Stefan > which will give the location of the Bio::Root::Root in lib path being > used, and (2) using './Build install uninst=1' to remove any old > bioperl/bioperl-db installations. Unfortunately this is not an option for me. > > chris > > On Apr 17, 2008, at 11:40 AM, Stefan Kirov wrote: > >> Hilmar, >> sorry, I missed the part after the stack trace... In any case this is >> still problem for me after I updated bioperl-live. >> I see this with a number of other tests: >> t/04swiss.........ok 3/52Can't locate object method "get_dbxrefs" via >> package "Bio::Ontology::Term" at >> /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm >> line 552, line 78. >> t/04swiss.........dubious >> Test returned status 2 (wstat 512, 0x200) >> DIED. FAILED tests 6-52 >> Failed 47/52 tests, 9.62% okay >> t/05seqfeature....ok 4/48Can't locate object method "get_dbxrefs" via >> package "Bio::Ontology::Term" at >> /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm >> line 552, line 72. 
>> t/05seqfeature....FAILED tests 9-48 >> Failed 40/48 tests, 16.67% okay >> t/06comment.......ok >> t/07dblink........ok >> t/08genbank.......ok >> t/09fuzzy2........ok >> t/10ensembl.......ok 1/15Can't locate object method "get_dbxrefs" via >> package "Bio::Ontology::Term" at >> /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm >> line 552, line 1420. >> t/10ensembl.......dubious >> Test returned status 2 (wstat 512, 0x200) >> DIED. FAILED tests 3-15 >> Failed 13/15 tests, 13.33% okay >> t/11locuslink.....ok 4/110Can't locate object method "get_dbxrefs" >> via package "Bio::Annotation::OntologyTerm" at >> /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm >> line 552, line 1. >> t/11locuslink.....dubious >> Test returned status 2 (wstat 512, 0x200) >> DIED. FAILED tests 5-110 >> Failed 106/110 tests, 3.64% okay >> t/12ontology......ok 1/738Can't locate object method "get_dbxrefs" >> via package "Bio::Ontology::GOterm" at >> /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm >> line 552, line 98. >> t/12ontology......dubious >> Test returned status 255 (wstat 65280, 0xff00) >> DIED. FAILED tests 5-738 >> Failed 734/738 tests, 0.54% okay >> t/13remove........ok 2/59Can't locate object method "get_dbxrefs" via >> package "Bio::Ontology::Term" at >> /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm >> line 552, line 145. >> t/13remove........FAILED tests 11-59 >> Failed 49/59 tests, 16.95% okay >> t/14query.........ok >> t/15cluster.......ok 3/160Can't locate object method "get_dbxrefs" >> via package "Bio::Ontology::Term" at >> /home/kirovs/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm >> line 552, line 1. >> t/15cluster.......dubious >> Test returned status 2 (wstat 512, 0x200) >> DIED. 
FAILED tests 6-160 >> Failed 155/160 tests, 3.12% okay >> t/16obda..........ok >> >> On Thu, 17 Apr 2008, Chris Fields wrote: >> >>> >>> On Apr 17, 2008, at 9:47 AM, Hilmar Lapp wrote: >>> >>>> On Apr 17, 2008, at 10:18 AM, Stefan Kirov wrote: >>>>> In any case I debugged and tracked that down to the RichSeq >>>>> adaptor module missing. >>>> That almost can't be the problem. Every Bio::Seq::RichSeq is-a >>>> Bio::Seq and a SeqAdaptor is present. >>>> I'm afraid it gets stuck somewhere else and frankly I didn't see >>>> the RichSeqAdaptor failing to load in your stack trace: >>>> >>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>> >>>>> MSG: Annotation of class Bio::Annotation::Collection not >>>>> type-mapped. Internal error? >>>>> STACK: Error::throw >>>>> STACK: Bio::Root::Root::throw >>>>> /home/kirovs/bioperl-live/Bio/Root/Root.pm:357 >>>>> STACK: >>>>> Bio::DB::BioSQL::AnnotationCollectionAdaptor::_annotation_map_key >>>>> Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:695 >>>>> STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children >>>>> Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:204 >>>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create >>>>> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 >>>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store >>>>> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 >>>>> STACK: Bio::DB::Persistent::PersistentObject::store >>>>> Bio/DB/Persistent/PersistentObject.pm:271 >>>>> STACK: Bio::DB::BioSQL::SeqAdaptor::store_children >>>>> Bio/DB/BioSQL/SeqAdaptor.pm:224 >>>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create >>>>> Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214 >>>>> STACK: Bio::DB::Persistent::PersistentObject::create >>>>> Bio/DB/Persistent/PersistentObject.pm:244 >>>>> STACK: t/04swiss.t:36 >>>>> ----------------------------------------------------------- >>>> What that tells me is that when bioperl-db tries to store the >>>> annotation bundle of the (SwissProt) sequence, one 
of the >>>> annotations that it encounters is of type >>>> Bio::Annotation::Collection. At present bioperl-db doesn't know >>>> what to do with it; i.e., bioperl-db can't yet handle hierarchical >>>> annotation collections (collections within collections). >>>> I believe this is due to recent changes in how the GN line is >>>> parsed in BioPerl - Chris does this ring the right bell? I thought >>>> though you had built in a method would allow flattening out >>> >>> This appears to be using an older bioperl-live checkout, one where >>> Heikki changed GN parsing to use a nested Annotation::Collection. I >>> reverted that back in a later commit to svn specifically b/c of >>> bioperl-db problems. bioperl-live's swiss.pm now uses a new subclass >>> of Bio::Annotation::SimpleValue (Bio::Annotation::TagTree) that >>> represents nested values via Data::Stag's itext output (we can >>> change that to alternatives if needed). >>> >>> Here are the last few relevant revisions in bioperl-live's main >>> trunk (mine is the latest): >>> >>> ------------------------------------------------------------------------ >>> >>> r14562 | cjfields | 2008-02-28 08:30:05 -0600 (Thu, 28 Feb 2008) | 1 >>> line >>> >>> bug 1825: updating swiss.pm/tests to try out TagTree (passes all >>> tests). Need to update Handler.t and related modules still... >>> ------------------------------------------------------------------------ >>> >>> r14541 | heikki | 2008-02-25 00:10:48 -0600 (Mon, 25 Feb 2008) | 1 line >>> >>> documentation for the GN line parsing and management >>> ------------------------------------------------------------------------ >>> >>> r14538 | heikki | 2008-02-23 08:48:23 -0600 (Sat, 23 Feb 2008) | 1 line >>> >>> GN (Gene Name) line parsing rewrite. Breaks backward compatibility. >>> Can now deal with >1 gene per entry and four categories of names per >>> gene. Parses old style syntax (...OR ... OR ... ) into one gene name >>> and synonyms for each gene. Docs to follow. >>> >>> .... 
>>> >>> I just updated all code from dev and reran bioperl-db tests w/o >>> problems. Maybe someone else could do the same to see what happens? >>> >>>> It's worth noting that BioSQL itself can't really represent nested >>>> annotation collections other than by using ontology terms and their >>>> hierarchy, which at present I think isn't really appropriate, but I >>>> have to think through the issue more. In other words, in BioSQL you >>>> can't directly tie together a bunch of qualifier value pairs into a >>>> "bag" and then nest this bag within another. The way to make this >>>> work with the current schema is to flatten out the nesting. >>>> >>>> -hilmar >>>> --=========================================================== >>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>>> =========================================================== >>> >>> Might be worth looking into for a future BioSQL release, but we have >>> a decent workaround in place for now, as long as it works >>> cross-platform and cross-RDB. >>> >>> chris >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From hubert.gaynor at yahoo.com Fri Apr 18 00:53:16 2008 From: hubert.gaynor at yahoo.com (Hubert Gaynor) Date: Thu, 17 Apr 2008 17:53:16 -0700 (PDT) Subject: [Bioperl-l] Can I use BLAST against a database like MySQL Message-ID: <130971.67684.qm@web46007.mail.sp1.yahoo.com> Hi Sean, I got it. Thank you so much! 
Hubert ----- Original Message ---- From: Sean Davis To: Hubert Gaynor Sent: Thursday, April 17, 2008 6:36:02 PM Subject: Re: [Bioperl-l] Can I use BLAST against a database like MySQL On Thu, Apr 17, 2008 at 2:19 AM, Hubert Gaynor wrote: > Hi, > > As far as I know, before using BLAST to do the alignment the first thing should be done is typing formatdb to construct a database. But I was wondering whether it is possible to construct a database with MySQL which probably will grant the BLAST search a higher speed and make the database management much easier? > formatdb is used to make a representation that can be used efficiently by blast. That representation already makes blast faster. MySQL can't be used for such things. As for speeding blast, if you have a multiprocessor machine, you can take advantage of those using blast and increasing the number of processors. Also, while blast is a very versatile program, it is not the only alignment program available. Depending on your needs, you could look at other programs such as blat or gmap that can be 2-3 orders of magnitude faster than blast. Sean ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From Russell.Smithies at agresearch.co.nz Fri Apr 18 01:39:23 2008 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 18 Apr 2008 13:39:23 +1200 Subject: [Bioperl-l] accessing params for custom glyphs? In-Reply-To: <130971.67684.qm@web46007.mail.sp1.yahoo.com> References: <130971.67684.qm@web46007.mail.sp1.yahoo.com> Message-ID: This is probably more of a Perl OO problem I'm having, but can anyone tell me how to access a parameter when I create a custom glyph? I've created a panel in the usual way then I add a feature with 'my_glyph' and want to pass the value of -new_parameter to the glyph drawing code. 
$panel->add_track( $feature,
                   -font          => gdSmallFont,
                   -glyph         => 'my_glyph',
                   -height        => 10,
                   -label         => 1,
                   -strand        => "forward",
                   -new_parameter => "test",
                 );

In my_glyph.pm, I have the usual draw_component sub:

sub draw_component {
    my $self = shift;
    my $gd   = shift;
    my ($x1,$y1,$x2,$y2) = $self->bounds(@_);
    my $fg = $self->fgcolor;
    my $params = $self->?????????? <<--- how do I access the value of "new_parameter" set in the panel drawing code?

    $gd->line($x1,$y1,$x2,$y2,$fg);
    $gd->line($x1,$y2,$x2,$y1,$fg);
}

Any ideas?

Thanx,

Russell Smithies
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From David.Messina at sbc.su.se Fri Apr 18 09:31:59 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 18 Apr 2008 11:31:59 +0200
Subject: [Bioperl-l] Finding seqs of given domain architecture
In-Reply-To: <628aabb70804170155n4e5dfd81r7020c3e9e11094ff@mail.gmail.com>
References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <200804161336.16879.heikki@sanbi.ac.za> <628aabb70804161112o6610ee1fkfb4b08e74730237d@mail.gmail.com> <1208420674.23342.15.camel@razor.sbc.su.se> <628aabb70804170155n4e5dfd81r7020c3e9e11094ff@mail.gmail.com>
Message-ID: <628aabb70804180231p2b9cef9dwd5441e85c31531fd@mail.gmail.com>

Jacob,

I talked about your question with a colleague of mine who has been working in this area. Below is his reply.
[I'm reposting this *without* the attachment mentioned since the mailing list wouldn't accept it otherwise. If anyone wants a copy of the code, just email me.]

Dave

-------

> 3. Pfam has this capability, i.e. to show all domains with a given
> architecture, but it is difficult to get at the actual sequences or
> even a list of accession numbers.

First, this should be available right away in PfamAlyzer:

http://pfamalyzer.sbc.su.se/pfamalyzer/index.html

although you might need to upgrade your browser to Java 1.6 to get it to work.

If this does not work as suggested (an upgraded version is coming eventually), have a look at the file:

ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/swisspfam.gz

which contains the Pfam architectures for all UniProt sequences. You can parse that to get a file of - correspondences and just filter that to get the accession numbers. (Please find attached a Perl script to do just that.) Under UNIX, you can then just grep this for the domain IDs (like

grep PF00008 domainArchitectureFile.txt | grep PF00456 > resultFile.txt)

but I am sure there are solutions under other operating systems as well. You could then write a script to parse out the corresponding sequences from the UniProt fasta flatfile, if you wanted, or (again under UNIX) a script to wget them off the webpage.

In case your sequences are not in UniProt, consider using HMMER and the Pfam HMM files to assign domains to all sequences in your dataset. I would then parse the HMMER output into the same format as the above, and use the same approach following that.

Hope this helps,

Yours sincerely,

Kristoffer Forslund
krifo at sbc.su.se

From lincoln.stein at gmail.com Fri Apr 18 19:16:19 2008
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Fri, 18 Apr 2008 15:16:19 -0400
Subject: [Bioperl-l] [Gmod-gbrowse] accessing params for custom glyphs?
In-Reply-To: References: <130971.67684.qm@web46007.mail.sp1.yahoo.com> Message-ID: <6dce9a0b0804181216q6564e580u8a805ae96c78df2e@mail.gmail.com> Hi Russell, It's very simple: my $params = $self->option('new_parameter'); Lincoln On Thu, Apr 17, 2008 at 9:39 PM, Smithies, Russell < Russell.Smithies at agresearch.co.nz> wrote: > This is probably more of a Perl OO problem I'm having, but can anyone > tell me how to access a parameter when I create a custom glyph? > > I've created a panel in the usual way then I add a feature with > 'my_glyph' and want to pass the value of -new_parameter to the glyph > drawing code. > > $panel->add_track( $feature, > -font => gdSmallFont, > -glyph => 'my_glyph' , > -height => 10, > -label => 1, > -strand => "forward", > -new_parameter => "test", > > > In my_glyph.pm, I have the usual draw_component sub: > > sub draw_component { > my $self = shift; > my $gd = shift; > my ($x1,$y1,$x2,$y2) = $self->bounds(@_); > my $fg = $self->fgcolor; > my $params = $self->?????????? <<--- how do I access the value of > "new_parameter" set in the panel drawing code? > > $gd->line($x1,$y1,$x2,$y2,$fg); > $gd->line($x1,$y2,$x2,$y1,$fg); > > } > > Any ideas? > > Thanx, > > Russell Smithies > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> =======================================================================
>
> _______________________________________________
> Gmod-gbrowse mailing list
> Gmod-gbrowse at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From jason at bioperl.org Sat Apr 19 02:35:10 2008
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 18 Apr 2008 19:35:10 -0700
Subject: [Bioperl-l] index::abstract on win and unix
In-Reply-To: <1208381947.16620.6.camel@kiss-laptop>
References: <1208366718.19084.15.camel@kiss-laptop> <1208381947.16620.6.camel@kiss-laptop>
Message-ID:

Do you want the LOCUS or the ACCESSION?
Do you mean the result is the completely wrong record or just the wrong field?
The accession number is available from the seq's accession_number() method.

-jason

On Apr 16, 2008, at 2:39 PM, Frédéric Romagné wrote:

> Well, if with input file you mean the database used, it's created
> with Bio::Index::GenBank from a ncbi FTP's genbank file.
>
> $id is an accession number read from a file but i chomp the line...
>
> I am trying to install the svn version of bioperl under windows to see
> if there is an improvement.
>
> On Thursday 17 April 2008 at 08:49 +1200, Smithies, Russell wrote:
>> Did you check the format of your input file?
>> i.e. DOS or UNIX line endings?
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Frédéric Romagné
>>> Sent: Thursday, 17 April 2008 5:25 a.m.
>>> To: bioperl-l at lists.open-bio.org
>>> Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix
>>>
>>> Hello,
>>> i made a program which use Bio::Index::GenBank and i tested it under
>>> unix, that worked well.
>>>
>>> But i have to launch it under windows and it seems not to work on.
>>>
>>> Here is the problem :
>>>
>>> my $dbobj = Bio::Index::Abstract->new("Data/$db");
>>> my $seq = $dbobj->get_Seq_by_acc($id);
>>> print $seq->display_id."\n";
>>>
>>> did not print the same number than $id !!! So i don't work on the
>>> sequence expected...
>>>
>>> I use the SVN sources on unix and the Perl package manager for
>>> windows...
>>>
>>> Thanks.
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> =====================================================================
>> ==
>> Attention: The information contained in this message and/or
>> attachments
>> from AgResearch Limited is intended only for the persons or entities
>> to which it is addressed and may contain confidential and/or
>> privileged
>> material. Any review, retransmission, dissemination or other use
>> of, or
>> taking of any action in reliance upon, this information by persons or
>> entities other than the intended recipients is prohibited by
>> AgResearch
>> Limited. If you have received this message in error, please notify
>> the
>> sender immediately.
>> =====================================================================
>> ==
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bioperlanand at yahoo.com Mon Apr 21 07:44:00 2008
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Mon, 21 Apr 2008 00:44:00 -0700 (PDT)
Subject: [Bioperl-l] a question on obtaining HTML formatted Blast output along with the Blast hits image
Message-ID: <372845.37134.qm@web36808.mail.mud.yahoo.com>

Hi everybody,

I would like to obtain an HTML formatted blast report along with a picture of the blast hits, as shown on Slide 60 in this pdf:
http://jason.open-bio.org/Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf

I have gotten the HTML output working using "Bio::SearchIO::Writer::HTMLResultWriter".

My question: how do I integrate it with Bio::Graphics to render the blast hits image at the correct position in my Bioperl-reformatted html file? I ultimately want to be able to display my blast output files in a browser.

Here is my code so far:
----------------------------------------------------------------
#!/usr/bin/perl -w
# usage: $0
use strict;
use Bio::SearchIO;
use Bio::SearchIO::Writer::HTMLResultWriter;

# The blast report to convert is the first command-line argument.
my $infile = shift or die "usage: $0 blast_report\n";

# Parse the blast report.
my $searchio = Bio::SearchIO->new( -format => 'blast', -file => $infile );

# Write the first result back out as HTML next to the input file.
my $writerhtml = Bio::SearchIO::Writer::HTMLResultWriter->new();
my $outhtml    = Bio::SearchIO->new( -writer => $writerhtml, -file => ">${infile}.html" );
$outhtml->write_result( $searchio->next_result );
----------------------------------------------------------------

Thanks in advance,
Anand
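
One possible way to approach the question above, sketched under stated assumptions rather than taken from the thread: draw the hit overview separately with Bio::Graphics, save it as a PNG file (Bio::Graphics::Panel has a png method that returns the image data), and then patch an <img> tag into the HTML file that the writer produced. The helper below, embed_image_in_html, is hypothetical and is not part of BioPerl. It is plain Perl, it assumes the report contains an opening <body> tag, and the file name hits.png is only a placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): put an <img> tag for a hit
# overview picture right after the opening <body> tag of an HTML report.
# The image file name is inserted as-is, so it should be a simple name.
sub embed_image_in_html {
    my ($html, $image_file) = @_;
    my $img = qq{<p><img src="$image_file" alt="BLAST hit overview"></p>\n};

    # Insert after the first <body ...> tag, matched case-insensitively.
    return $html if $html =~ s{(<body\b[^>]*>)}{$1\n$img}i;

    # No <body> tag found: fall back to putting the image first.
    return $img . $html;
}

# Small demonstration on a stand-in for the writer's output.
my $report = qq{<HTML><BODY BGCOLOR="WHITE"><h1>BLAST report</h1></BODY></HTML>};
print embed_image_in_html($report, 'hits.png'), "\n";
```

In practice the script would read the generated ${infile}.html into a string, pass it through the helper, and write it back; where exactly the image should appear in the report is a layout choice that the regular expression can be adapted to.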
From cjfields at uiuc.edu Mon Apr 21 15:07:17 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 21 Apr 2008 10:07:17 -0500 Subject: [Bioperl-l] [Proposed change] HSP::frame() Message-ID: I have noticed (in relation to bug 2485, http://bugzilla.open-bio.org/show_bug.cgi?id=2485) that the Bio::Search::HSP::GenericHSP frame() method is implemented very differently from strand(), start(), end(), and most other HSP methods. The current behavior is to return an array of two values (query and hit frame) under list conditions, the query frame if one value is passed, and the subject frame if no value is passed under scalar context and both under list context. The latter behavior is unfortunately leading to the aforementioned bug above. The method is also implied to be a getter/setter, but the implementation doesn't allow that; it always sets to the instantiated values (in fact, repeatedly so). In order to fix that and make the interface more consistent I am changing frame() to behave like strand(), etc., in that the first argument is 'query/subject/hit/list' (default = 'query' if no arg specified) and the rest optional values for setting, in query/subject order. One issue: I can catch and imitate most of the older behavior with a few additional checks, the one exception being the old frame() default return value which is now 'query' (not context-dependent). If needed we can change the default to 'hit', but I believe method consistency is probably the better route, and I can always add a warning under old API circumstances indicating the change. I am also modifying HSPTableWriter to print frame_hit and frame_query (previously it was only printing 'frame', which implied the hit frame). I can see this being an issue with anyone expecting 'frame' instead of 'frame_hit'; I could hack in a fix for that if needed. If there aren't any objections or suggestions, I'll commit this in the next day or two. 
chris From cjfields at uiuc.edu Mon Apr 21 15:32:59 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 21 Apr 2008 10:32:59 -0500 Subject: [Bioperl-l] Assembly.t test fails Message-ID: I'm getting some significant test failures in bioperl-live for Bio::Assembly: t/Assembly...... 1..35 ok 1 - use Bio::Assembly::IO; ok 2 - The object isa Bio::Assembly::IO ok 3 - The object isa Bio::Assembly::Scaffold ok 4 not ok 5 ok 6 - The object isa Bio::AnnotationCollectionI ok 7 - no annotations in Annotation collection? ok 8 # Failed test at t/Assembly.t line 35. # got: 'NoName' # expected: 'test' Can't locate object method "get_contig_seq_ids" via package "Bio::Assembly::Contig" at /Users/cjfields/bioperl/bioperl-live/blib/ lib/Bio/Assembly/Scaffold.pm line 189, line 733. # Looks like you planned 35 tests but only ran 8. # Looks like you failed 1 test of 8 run. # Looks like your test died just after 8. Dubious, test returned 255 (wstat 65280, 0xff00) Failed 28/35 subtests Test Summary Report ------------------- t/Assembly.t (Wstat: 65280 Tests: 8 Failed: 1) Failed test: 5 Non-zero exit status: 255 Parse errors: Bad plan. You planned 35 tests but ran 8. Files=1, Tests=8, 0 wallclock secs ( 0.01 usr 0.00 sys + 0.22 cusr 0.04 csys = 0.27 CPU) Result: FAIL Failed 1/1 test programs. 1/8 subtests failed. chris Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Mon Apr 21 15:44:21 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 21 Apr 2008 10:44:21 -0500 Subject: [Bioperl-l] Assembly.t test fails In-Reply-To: References: Message-ID: <2F199628-717E-4F88-85D7-408BD7BBE16D@uiuc.edu> Scratch that, figured it out (easy fix). chris On Apr 21, 2008, at 10:32 AM, Chris Fields wrote: > I'm getting some significant test failures in bioperl-live for > Bio::Assembly: > > t/Assembly...... 
> 1..35 > ok 1 - use Bio::Assembly::IO; > ok 2 - The object isa Bio::Assembly::IO > ok 3 - The object isa Bio::Assembly::Scaffold > ok 4 > not ok 5 > ok 6 - The object isa Bio::AnnotationCollectionI > ok 7 - no annotations in Annotation collection? > ok 8 > > # Failed test at t/Assembly.t line 35. > # got: 'NoName' > # expected: 'test' > Can't locate object method "get_contig_seq_ids" via package > "Bio::Assembly::Contig" at /Users/cjfields/bioperl/bioperl-live/blib/ > lib/Bio/Assembly/Scaffold.pm line 189, line 733. > # Looks like you planned 35 tests but only ran 8. > # Looks like you failed 1 test of 8 run. > # Looks like your test died just after 8. > Dubious, test returned 255 (wstat 65280, 0xff00) > Failed 28/35 subtests > > Test Summary Report > ------------------- > t/Assembly.t (Wstat: 65280 Tests: 8 Failed: 1) > Failed test: 5 > Non-zero exit status: 255 > Parse errors: Bad plan. You planned 35 tests but ran 8. > Files=1, Tests=8, 0 wallclock secs ( 0.01 usr 0.00 sys + 0.22 > cusr 0.04 csys = 0.27 CPU) > Result: FAIL > Failed 1/1 test programs. 1/8 subtests failed. > > > chris > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From frederic.romagne at gmail.com Mon Apr 21 15:53:11 2008
From: frederic.romagne at gmail.com (=?ISO-8859-1?Q?Fr=E9d=E9ric_Romagn=E9?=)
Date: Mon, 21 Apr 2008 10:53:11 -0500
Subject: [Bioperl-l] index::abstract on win and unix
In-Reply-To:
References: <1208366718.19084.15.camel@kiss-laptop> <1208381947.16620.6.camel@kiss-laptop>
Message-ID: <1208793191.25906.9.camel@kiss-laptop>

In fact, I want the whole Bio::Seq object, but I verified that the ACCESSION and the LOCUS are the same in my GenBank files.

I saw that the program sometimes reports that it cannot find the entry:

if( !defined $seq ) {
    warn("Sequence $id in Database $db is not present\n");
}

I suspect that it is the make_index function, rather than the get_Seq_by_acc function, that does not work properly on Windows...

On Friday 18 April 2008 at 19:35 -0700, Jason Stajich wrote:
> do you want the LOCUS or the ACCESSION?
> Do you mean the result is the completely wrong record or just the
> wrong field?
> accession number is available from the seq's accession_number() method.
> -jason
> On Apr 16, 2008, at 2:39 PM, Frédéric Romagné wrote:
>
> > Well, if with input file you mean the database used, it's created
> > with Bio::Index::GenBank from a ncbi FTP's genbank file.
> >
> > $id is an accession number read from a file but i chomp the line...
> >
> > I am trying to install the svn version of bioperl under windows to see
> > if there is an improvement.
> >
> > On Thursday 17 April 2008 at 08:49 +1200, Smithies, Russell wrote:
> >> Did you check the format of your input file?
> >> i.e. DOS or UNIX line endings?
> >>
> >>> -----Original Message-----
> >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Frédéric Romagné
> >>> Sent: Thursday, 17 April 2008 5:25 a.m.
> >>> To: bioperl-l at lists.open-bio.org > >>> Subject: [Bioperl-l] [bioperl-l] index::abstract on win and unix > >>> > >>> Hello, > >>> i made a program which use Bio::Index::GenBank and i tested it under > >>> unix, that worked well. > >>> > >>> But i have to launch it under windows and it seems not to work on. > >>> > >>> Here is the problem : > >>> > >>> my $dbobj = Bio::Index::Abstract->new("Data/$db"); > >>> my $seq = $dbobj->get_Seq_by_acc($id); > >>> print $seq->display_id."\n"; > >>> > >>> did not print the same number than $id !!! So i don't work on the > >>> sequence expected... > >>> > >>> I use the SVN sources on unix and the Perl package manager for > >>> windows... > >>> > >>> Thanks. > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> ===================================================================== > >> == > >> Attention: The information contained in this message and/or > >> attachments > >> from AgResearch Limited is intended only for the persons or entities > >> to which it is addressed and may contain confidential and/or > >> privileged > >> material. Any review, retransmission, dissemination or other use > >> of, or > >> taking of any action in reliance upon, this information by persons or > >> entities other than the intended recipients is prohibited by > >> AgResearch > >> Limited. If you have received this message in error, please notify > >> the > >> sender immediately. 
> >> ===================================================================== > >> == > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ewijaya at gmail.com Tue Apr 22 14:03:07 2008 From: ewijaya at gmail.com (Edward Wijaya) Date: Tue, 22 Apr 2008 22:03:07 +0800 Subject: [Bioperl-l] BioPerl Module to Parse BLAT alignment output Message-ID: <3521d3670804220703u4d8565c8q604036727aedf0a8@mail.gmail.com> Hi, Is there any module that can parse the following output of BLAT. This is taken from UCSC browser. The idea is to parse it and then extract the conserved block of aligned sequences. __DATA__ Alignment block 3 of 135 in window, 5860248 - 5860300, 53 bps B D D. melanogaster tgtg----tatttatgt-tttaaataaaggt-------tttctaaata---cgaaatttcaaatttaa B D D. simulans tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cgcaattttaaatttaa B D D. sechellia tgtg----tatttatgt-tttaaataaaggt-------tttttaaata---cccaattttaaatttaa B D D. yakuba tgtg----tatttatgt-tcttaataaaggt-------ttcctaaataa-ttcaaaatttaaattaaa D. erecta tgtg----tgtttatgt-ttttaataaaggt-------tttctaaataa--tcgaaattcatttcaaa D. ananassae taag----tttttatgtattttaaaatatag-------aaaataaata---aaaaaaattgaact--- D. pseudoobscura tata----ccagtacac-cttatatg------------tttttaaata-------------------- B D D. persimilis tata----ccagtacac-attatatg------------tttttaaata-------------------- D. willistoni aaaaaagttatttgaat-ttggaata------------taccaaaacatgttggaaatt------gaa D. virilis -------------gatt-ttataataaaattgcgctaatttctaa------------tttacgttaaa D. mojavensis -------------tagt-ccttaatataaatataatattaaataaata-------cttttaagttaaa D. grimshawi ==================================================================== T. castaneum ==================================================================== Inserts between block 3 and 4 in window D. pseudoobscura 2008bp B D D. persimilis 1421bp D. virilis 5bp D. 
mojavensis 4640bp Alignment block 4 of 135 in window, 5860301 - 5860344, 44 bps B D D. melanogaster ----tgggtagcagcgttgccagat--------------------aaagggacatgtttactggctga B D D. simulans ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga B D D. sechellia ----tgggaagcagcgttgccagat-------------------gaaacgggcatgtttgcaggctga B D D. yakuba ----tgagtaccaatgctgccagat-------------ctttgtaaagcggtaatgtttgctggctga D. erecta ----t-----ttaatgttgccagat-------------ctgcgtaaggcgctcatgttggctggctga D. pseudoobscura ==================================================================== B D D. persimilis ==================================================================== D. willistoni ----aggattacgaagttcctttat-------------------aaag-------------------- D. virilis gactagtttaatatctcagcccgttaagctaactgttactttttacagtattcgcgccattttgc--- D. mojavensis ==================================================================== D. grimshawi ==================================================================== T. castaneum ==================================================================== __ END__ From cjfields at uiuc.edu Tue Apr 22 14:22:45 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 22 Apr 2008 09:22:45 -0500 Subject: [Bioperl-l] BioPerl Module to Parse BLAT alignment output In-Reply-To: <3521d3670804220703u4d8565c8q604036727aedf0a8@mail.gmail.com> References: <3521d3670804220703u4d8565c8q604036727aedf0a8@mail.gmail.com> Message-ID: <766FDF9E-9F7B-4826-B7FA-87DF3B074EBC@uiuc.edu> A quick grep of bioperl-live gets me Bio::SearchIO::blast, Bio::SearchIO::axt, Bio::SearchIO::psl, Bio::Tools::Blat, and Bio::Tools::WebBlat. Haven't looked at the docs but it's a start! chris On Apr 22, 2008, at 9:03 AM, Edward Wijaya wrote: > Hi, > > Is there any module that can parse the following output > of BLAT. This is taken from UCSC browser. > > The idea is to parse it and then extract the conserved block > of aligned sequences. 

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From jason at bioperl.org Tue Apr 22 18:49:32 2008
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 22 Apr 2008 11:49:32 -0700
Subject: [Bioperl-l] Fwd: [blast-announce] New BLAST URL available at the NCBI
References:
Message-ID:

Does anyone want to take a look at how to use these URLs in the
RemoteBlast module, if the interface is the same?

-jason

Begin forwarded message:

> From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]"
> Date: April 22, 2008 11:35:04 AM PDT
> To:
> Subject: [blast-announce] New BLAST URL available at the NCBI
>
> New BLAST URL available at the NCBI
>
> The NCBI has activated a new URL for BLAST searches at the NCBI:
> http://blast.ncbi.nlm.nih.gov.
>
> Searches sent to this URL can take advantage of a larger number of
> machines for searches and the system has a better overall fault
> tolerance.
>
> We recommend migration of all BLAST links and bookmarks (e.g.,
> http://www.ncbi.nlm.nih.gov/BLAST/ and
> http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi) to the new URL.
>
> Links on the NCBI and BLAST home pages will start to change in the
> coming weeks.
>
> At this point in time the plans are to also maintain the current BLAST
> URL.
>

From jason at bioperl.org Tue Apr 22 18:51:08 2008
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 22 Apr 2008 11:51:08 -0700
Subject: [Bioperl-l] BioPerl Module to Parse BLAT alignment output
In-Reply-To: <766FDF9E-9F7B-4826-B7FA-87DF3B074EBC@uiuc.edu>
References: <3521d3670804220703u4d8565c8q604036727aedf0a8@mail.gmail.com> <766FDF9E-9F7B-4826-B7FA-87DF3B074EBC@uiuc.edu>
Message-ID: <6C812413-B375-427B-9AF8-5A0AA6167CC8@bioperl.org>

If you get it as axt it should parse fine in SearchIO, but that is
pairwise. If you can get alignment blocks, I can't remember what format
that is from UCSC.
MSAs are going to be better handled through Bio::AlignIO though, so it
might be better to build a parser on that.

On Apr 22, 2008, at 7:22 AM, Chris Fields wrote:

> A quick grep of bioperl-live gets me Bio::SearchIO::blast,
> Bio::SearchIO::axt, Bio::SearchIO::psl, Bio::Tools::Blat, and
> Bio::Tools::WebBlat. Haven't looked at the docs but it's a start!
>
> chris
>
> On Apr 22, 2008, at 9:03 AM, Edward Wijaya wrote:
>
>> Hi,
>>
>> Is there any module that can parse the following output
>> of BLAT. This is taken from UCSC browser.
>>
>> The idea is to parse it and then extract the conserved block
>> of aligned sequences.
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Tue Apr 22 19:02:14 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 22 Apr 2008 14:02:14 -0500
Subject: [Bioperl-l] Fwd: [blast-announce] New BLAST URL available at the NCBI
In-Reply-To:
References:
Message-ID: <13C2AD96-8297-40DD-ADCC-B2BEC923B9E0@uiuc.edu>

They work exactly the same as the old URL, at least on the surface; I
haven't tried changing many URLAPI parameters. I went ahead and changed
the URL in RemoteBlast to http://blast.ncbi.nlm.nih.gov/Blast.cgi as it
works with RemoteBlast.t.

chris

On Apr 22, 2008, at 1:49 PM, Jason Stajich wrote:

> Does anyone want to take a look at how to use these URLs in the
> RemoteBlast module, if the interface is the same?
>
> -jason
>
> Begin forwarded message:
>
>> From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]"
>> Date: April 22, 2008 11:35:04 AM PDT
>> To:
>> Subject: [blast-announce] New BLAST URL available at the NCBI
>>
>> New BLAST URL available at the NCBI
>>
>> The NCBI has activated a new URL for BLAST searches at the NCBI:
>> http://blast.ncbi.nlm.nih.gov.
>>
>> Searches sent to this URL can take advantage of a larger number of
>> machines for searches and the system has a better overall fault
>> tolerance.
>>
>> We recommend migration of all BLAST links and bookmarks (e.g.,
>> http://www.ncbi.nlm.nih.gov/BLAST/ and
>> http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi) to the new URL.
>> >> >> >> Links on the NCBI and BLAST home pages will start to change in the >> coming weeks. >> >> >> >> At this point in time the plans are to also maintain the current >> BLAST >> URL. >> >> >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue Apr 22 18:58:40 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 22 Apr 2008 13:58:40 -0500 Subject: [Bioperl-l] BioPerl Module to Parse BLAT alignment output In-Reply-To: <6C812413-B375-427B-9AF8-5A0AA6167CC8@bioperl.org> References: <3521d3670804220703u4d8565c8q604036727aedf0a8@mail.gmail.com> <766FDF9E-9F7B-4826-B7FA-87DF3B074EBC@uiuc.edu> <6C812413-B375-427B-9AF8-5A0AA6167CC8@bioperl.org> Message-ID: <43344C89-6B4D-4360-AF56-A6FDD065FFF3@uiuc.edu> Related to that, I have thought about building a parser for some of the query-anchored alignments produced by blastall, just haven't had time to devote to it. One of these days... chris On Apr 22, 2008, at 1:51 PM, Jason Stajich wrote: > if you get it as axt it should parse fine in SearchIO but that is > pairwise, if you can get an alignment blocks I can't remember what > format this is from UCSC. > MSAs are going to be better handed through Bio::AlignIO though so it > might be better to build a parser on that. > > On Apr 22, 2008, at 7:22 AM, Chris Fields wrote: > >> A quick grep of bioperl-live gets me Bio::SearchIO::blast, >> Bio::SearchIO::axt, Bio::SearchIO::psl, Bio::Tools::Blat, and >> Bio::Tools::WebBlat. Haven't looked at the docs but it's a start! >> >> chris >> >> On Apr 22, 2008, at 9:03 AM, Edward Wijaya wrote: >> >>> Hi, >>> >>> Is there any module that can parse the following output >>> of BLAT. This is taken from UCSC browser. 
>>>
>>> The idea is to parse it and then extract the conserved block
>>> of aligned sequences.
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bioperlanand at yahoo.com Wed Apr 23 06:02:30 2008
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Tue, 22 Apr 2008 23:02:30 -0700 (PDT)
Subject: [Bioperl-l] Question on integrating Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter
Message-ID: <946658.12337.qm@web36802.mail.mud.yahoo.com>

Hi everybody,

I would like to use Bio::Graphics in conjunction with
Bio::SearchIO::Writer::HTMLResultWriter to obtain an HTML-formatted BLAST
report along with an image of the BLAST hits, as shown on Slide 60 in
this pdf:
http://jason.open-bio.org/Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf

I am able to get the HTML output using
"Bio::SearchIO::Writer::HTMLResultWriter" and I am able to get the image
using the examples outlined in the Bio::Graphics HOWTO:
http://www.bioperl.org/wiki/HOWTO:Graphics

My question: how do I integrate Bio::Graphics with
Bio::SearchIO::Writer::HTMLResultWriter to render the BLAST hits image at
the correct position in my BioPerl-reformatted HTML file?

I also found that someone else has asked something similar; it is listed
under the "Orphans, Leftovers" category in the ListSummary:April
26-May 9,2006 document:
http://www.bioperl.org/wiki/ListSummary:April_26-May_9%2C2006#Orphans.2C_Leftovers

Here is my code so far:
----------------------------------------------------------------
#!/usr/bin/perl -w
# usage: $0
use strict;
use Bio::SearchIO;
use Bio::SearchIO::Writer::HTMLResultWriter;

# The BLAST report to convert is the first command-line argument.
my $infile = shift or die "No BLAST report file given\n";

# Parse the report and set up an HTML writer for it.
my $searchio   = Bio::SearchIO->new(-format => 'blast', -file => $infile);
my $writerhtml = Bio::SearchIO::Writer::HTMLResultWriter->new();
my $outhtml    = Bio::SearchIO->new(-writer => $writerhtml,
                                    -file   => ">${infile}.html");

# Write the first result of the report as HTML.
$outhtml->write_result($searchio->next_result);
----------------------------------------------------------------

Thanks in advance,

Anand

From jason at bioperl.org Wed Apr 23 06:15:28 2008
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 22 Apr 2008 23:15:28 -0700
Subject: [Bioperl-l] Question on integrating Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter
In-Reply-To: <946658.12337.qm@web36802.mail.mud.yahoo.com>
References: <946658.12337.qm@web36802.mail.mud.yahoo.com>
Message-ID: <952B0A4E-8A14-4E8E-B36D-14596B20E330@bioperl.org>

Basically you want to inject your own IMG tags into the file with
these routines:

$writerhtml->start_report(\&my_start_report);
$writerhtml->title(\&my_title);
$writerhtml->hit_link_align(\&my_hit_link_align);
$writerhtml->hit_link_desc(\&my_hit_link_desc);

fgblast shows a way to do this in part. It relies on Gbrowse to
generate the image, but you can replace the gbrowse_img reference with
a reference to your own image-generating software.
http://people.genome.duke.edu/~jes12/software/scripts/fgblast -jason On Apr 22, 2008, at 11:02 PM, Anand Venkatraman wrote: > Hi everybody, > > I would like to use Bio::Graphics in conjunction with > Bio::SearchIO::Writer::HTMLResultWriter to obtain a HTML formatted > blast report output along with an image of the blast hits as shown > on Slide 60 in this pdf: http://jason.open-bio.org/ > Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf > > I am able to get the HTML output using > "Bio::SearchIO::Writer::HTMLResultWriter" and I am able to get the > image using the examples outlined in the Bio::Graphics HOWTO: > http://www.bioperl.org/wiki/HOWTO:Graphics > > My question: How do I integrate Bio::Graphics with > Bio::SearchIO::Writer::HTMLResultWriter to render the blast hits > image at the correct position in my BioPerl reformatted html file. > > I also found that someone else has asked something similar to > whatever I am asking & is listed under the "Orphans, Leftovers" > category in the ListSummary:April 26-May 9,2006 document: > http://www.bioperl.org/wiki/ListSummary:April_26-May_9% > 2C2006#Orphans.2C_Leftovers > > Here is my code so far: > ---------------------------------------------------------------- > #!/usr/bin/perl -w > # usage: $0 > use strict; > use Bio::SearchIO; > use Bio::SearchIO::Writer::HTMLResultWriter; > > my $infile = shift or die $!; > > my $searchio = new Bio::SearchIO( -format => 'blast',-file => > $infile); > my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter(); > my $outhtml = new Bio::SearchIO(-writer => $writerhtml, > -file => ">$ > {infile}.html"); > > $outhtml->write_result($searchio->next_result); > ---------------------------------------------------------------- > > Thanks in advance, > > Anand > > > --------------------------------- > Be a better friend, newshound, and know-it-all with Yahoo! Mobile. > Try it now. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bamboowarrior at gmail.com Wed Apr 23 19:39:21 2008 From: bamboowarrior at gmail.com (Arkady) Date: Wed, 23 Apr 2008 14:39:21 -0500 Subject: [Bioperl-l] WebBlat, where'd it go? Message-ID: <91656c3f0804231239j159fb9d8q7bae51ba5cbcd442@mail.gmail.com> Hi folks, I'm trying to use BioPerl to run a BLAT search on the four primate genomes on UCSC. I understand that the proper tool for this is Bio::Tools::WebBlat. Unfortunately, it doesn't appear to be in my bioperl distribution (nor do I even know how to figure out what version that is, unfortunately, though it's a very recent install -- a month ago?). I also can't find it on CPAN. Is this deprecated? Has something else replaced it? Or are we always supposed to run local BLAT? Thanks. John Woods Institute for Cellular and Molecular Biology The University of Texas at Austin From spiros at lokku.com Wed Apr 23 19:48:12 2008 From: spiros at lokku.com (Spiros Denaxas) Date: Wed, 23 Apr 2008 20:48:12 +0100 Subject: [Bioperl-l] WebBlat, where'd it go? In-Reply-To: <91656c3f0804231239j159fb9d8q7bae51ba5cbcd442@mail.gmail.com> References: <91656c3f0804231239j159fb9d8q7bae51ba5cbcd442@mail.gmail.com> Message-ID: Hey, a quick look at the list of deprecated modules reveals that it has indeed been removed, http://www.bioperl.org/wiki/Deprecated_modules Spiros On Wed, Apr 23, 2008 at 8:39 PM, Arkady wrote: > Hi folks, > > I'm trying to use BioPerl to run a BLAT search on the four primate > genomes on UCSC. I understand that the proper tool for this is > Bio::Tools::WebBlat. Unfortunately, it doesn't appear to be in my > bioperl distribution (nor do I even know how to figure out what > version that is, unfortunately, though it's a very recent install -- a > month ago?). I also can't find it on CPAN. Is this deprecated? Has > something else replaced it? 
Or are we always supposed to run local > BLAT? > > Thanks. > > John Woods > > Institute for Cellular and Molecular Biology > The University of Texas at Austin > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Wed Apr 23 19:56:14 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 23 Apr 2008 14:56:14 -0500 Subject: [Bioperl-l] WebBlat, where'd it go? In-Reply-To: <91656c3f0804231239j159fb9d8q7bae51ba5cbcd442@mail.gmail.com> References: <91656c3f0804231239j159fb9d8q7bae51ba5cbcd442@mail.gmail.com> Message-ID: It's no longer maintained (deprecated); see the following for an explanation: http://article.gmane.org/gmane.comp.lang.perl.bio.general/13545 Basically, only local BLAT searches are supported through BioPerl. chris On Apr 23, 2008, at 2:39 PM, Arkady wrote: > Hi folks, > > I'm trying to use BioPerl to run a BLAT search on the four primate > genomes on UCSC. I understand that the proper tool for this is > Bio::Tools::WebBlat. Unfortunately, it doesn't appear to be in my > bioperl distribution (nor do I even know how to figure out what > version that is, unfortunately, though it's a very recent install -- a > month ago?). I also can't find it on CPAN. Is this deprecated? Has > something else replaced it? Or are we always supposed to run local > BLAT? > > Thanks. > > John Woods > > Institute for Cellular and Molecular Biology > The University of Texas at Austin > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bioperlanand at yahoo.com Wed Apr 23 23:05:27 2008 From: bioperlanand at yahoo.com (Anand Venkatraman) Date: Wed, 23 Apr 2008 16:05:27 -0700 (PDT) Subject: [Bioperl-l] Question on integrating Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter In-Reply-To: <952B0A4E-8A14-4E8E-B36D-14596B20E330@bioperl.org> Message-ID: <795696.39415.qm@web36804.mail.mud.yahoo.com> Hi Jason, Thanks for the reply. I am a little lost with the solution suggested. Is that how slide 60 in the pdf is obtained: http://jason.open-bio.org/Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf I guess I am missing something quite obvious, I apologize. What I have & want is this: I have a directory having say 100 different blast reports & hence I am looking to obtain 100 different bioperl formatted blast html outputs with the respective images just as it would appear in the blast report. Thanks, Anand Jason Stajich wrote: Basically you want to inject your own IMG tags into the file with these routines: $writerhtml->start_report(\&my_start_report); $writerhtml->title(\&my_title); $writerhtml->hit_link_align(\&my_hit_link_align); $writerhtml->hit_link_desc(\&my_hit_link_desc); fgblast shows a way to do this in part. It relies on Gbrowse to generate the image but you can replace the gbrowse_img reference to your own image generating software. 
http://people.genome.duke.edu/~jes12/software/scripts/fgblast

-jason

On Apr 22, 2008, at 11:02 PM, Anand Venkatraman wrote:

Hi everybody,

I would like to use Bio::Graphics in conjunction with Bio::SearchIO::Writer::HTMLResultWriter to obtain an HTML-formatted BLAST report along with an image of the BLAST hits, as shown on slide 60 of this PDF:
http://jason.open-bio.org/Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf

I am able to get the HTML output using "Bio::SearchIO::Writer::HTMLResultWriter", and I am able to get the image using the examples outlined in the Bio::Graphics HOWTO:
http://www.bioperl.org/wiki/HOWTO:Graphics

My question: how do I integrate Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter so that the image of the BLAST hits is rendered at the correct position in my BioPerl-reformatted HTML file?

I also found that someone else asked something similar; it is listed under the "Orphans, Leftovers" category in the ListSummary:April 26-May 9,2006 document:
http://www.bioperl.org/wiki/ListSummary:April_26-May_9%2C2006#Orphans.2C_Leftovers

Here is my code so far:
----------------------------------------------------------------
#!/usr/bin/perl -w
# usage: $0
use strict;
use Bio::SearchIO;
use Bio::SearchIO::Writer::HTMLResultWriter;

# the BLAST report to convert is the first command-line argument
my $infile = shift or die "No input BLAST report given\n";

my $searchio   = Bio::SearchIO->new(-format => 'blast', -file => $infile);
my $writerhtml = Bio::SearchIO::Writer::HTMLResultWriter->new();
my $outhtml    = Bio::SearchIO->new(-writer => $writerhtml,
                                    -file   => ">${infile}.html");

# write the first result of the report as HTML
$outhtml->write_result($searchio->next_result);
----------------------------------------------------------------

Thanks in advance,

Anand
_______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l --------------------------------- Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. From jason at bioperl.org Thu Apr 24 18:06:41 2008 From: jason at bioperl.org (Jason Stajich) Date: Thu, 24 Apr 2008 11:06:41 -0700 Subject: [Bioperl-l] Question on integrating Bio::Graphics with Bio::SearchIO::Writer::HTMLResultWriter In-Reply-To: <795696.39415.qm@web36804.mail.mud.yahoo.com> References: <795696.39415.qm@web36804.mail.mud.yahoo.com> Message-ID: The overview graphic is generated basically from the script in scripts/graphics/search_overview.PLS So you'd have to run that on each report to generate the graphic, then use the other methods to insert images into each rendered HTML report. -jason On Apr 23, 2008, at 4:05 PM, Anand Venkatraman wrote: > Hi Jason, > > Thanks for the reply. > > I am a little lost with the solution suggested. Is that how slide > 60 in the pdf is obtained: http://jason.open-bio.org/ > Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf > > I guess I am missing something quite obvious, I apologize. > > What I have & want is this: I have a directory having say 100 > different blast reports & hence I am looking to obtain 100 > different bioperl formatted blast html outputs with the respective > images just as it would appear in the blast report. > > Thanks, > > Anand > > Jason Stajich wrote: > > Basically you want to inject your own IMG tags into the file with > these routines: > > > $writerhtml->start_report(\&my_start_report); > $writerhtml->title(\&my_title); > $writerhtml->hit_link_align(\&my_hit_link_align); > $writerhtml->hit_link_desc(\&my_hit_link_desc); > > > fgblast shows a way to do this in part. It relies on Gbrowse to > generate the image but you can replace the gbrowse_img reference to > your own image generating software. 
> http://people.genome.duke.edu/~jes12/software/scripts/fgblast > > > > > -jason > On Apr 22, 2008, at 11:02 PM, Anand Venkatraman wrote: > > Hi everybody, > > > I would like to use Bio::Graphics in conjunction with > Bio::SearchIO::Writer::HTMLResultWriter to obtain a HTML formatted > blast report output along with an image of the blast hits as shown > on Slide 60 in this pdf: http://jason.open-bio.org/ > Bioperl_Tutorials/NESCENT_2007/CSHL_Bioperl_I.pdf > > > I am able to get the HTML output using > "Bio::SearchIO::Writer::HTMLResultWriter" and I am able to get the > image using the examples outlined in the Bio::Graphics HOWTO: > http://www.bioperl.org/wiki/HOWTO:Graphics > > > My question: How do I integrate Bio::Graphics with > Bio::SearchIO::Writer::HTMLResultWriter to render the blast hits > image at the correct position in my BioPerl reformatted html file. > > > I also found that someone else has asked something similar to > whatever I am asking & is listed under the "Orphans, Leftovers" > category in the ListSummary:April 26-May 9,2006 document: > http://www.bioperl.org/wiki/ListSummary:April_26-May_9% > 2C2006#Orphans.2C_Leftovers > > > Here is my code so far: > ---------------------------------------------------------------- > #!/usr/bin/perl -w > # usage: $0 > use strict; > use Bio::SearchIO; > use Bio::SearchIO::Writer::HTMLResultWriter; > > > my $infile = shift or die $!; > > > my $searchio = new Bio::SearchIO( -format => 'blast',-file => > $infile); > my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter(); > my $outhtml = new Bio::SearchIO(-writer => $writerhtml, > -file => ">$ > {infile}.html"); > > > $outhtml->write_result($searchio->next_result); > ---------------------------------------------------------------- > > > Thanks in advance, > > > Anand > > > > > --------------------------------- > Be a better friend, newshound, and know-it-all with Yahoo! Mobile. > Try it now. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > --------------------------------- > Be a better friend, newshound, and know-it-all with Yahoo! Mobile. > Try it now. From 1zoujing at 163.com Thu Apr 17 02:53:16 2008 From: 1zoujing at 163.com (zoujing) Date: Wed, 16 Apr 2008 19:53:16 -0700 (PDT) Subject: [Bioperl-l] Error with "parse_entrez_gene_example.pl Sus_scrofa.ags" In-Reply-To: References: <16602770.post@talk.nabble.com> <16603225.post@talk.nabble.com> Message-ID: <16737795.post@talk.nabble.com> Thank you very much! I splited the file on \t directly. Zou Jing Stefan Kirov-2 wrote: > > It is not. If you use this file, why would you need a parser for it > anyway? Just split on \t or read with OpenOffice or equiv. > Stefan > > On Thu, 10 Apr 2008, zoujing wrote: > >> >> Seached the web and found the answer now, quote the answer as following: >> The error was thrown by my Bio::ASN1::EntrezGene module because it >> expects a text file, while you fed it with a binary file. To use >> gzipped ASN binary file from NCBI, download the NCBI gene2xml >> (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml), >> then use this syntax to run my parser on the binary files: >> >> my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i >> Homo_sapiens.ags.gz -c -x -b | "); # Homo_sapiens.ags.gz is the gzipped >> binary file directly downloaded from NCBI >> >> Same syntax should be used when you're using SeqIO (thus >> SeqIO::entrezgene). >> Mingyi >> >> But there still one thing, I want to parse "gene_info.gz" in Gene of >> NCBI. It doesn't work.Is that means "gene_info.gz"( tab-delimited,one >> line >> per GeneID, Column header line is the first line in the file >> ) is not the right format for Bio::ASN1::EntrezGene? >> >> >> >> zoujing wrote: >>> >>> I am a geen hand in Bioperl. 
When I run
>>> "parse_entrez_gene_example.pl Sus_scrofa.ags", I get this error:
>>> Data Error: none conforming data found on line 1 in Sus_scrofa.ags.
>>>
>>> But Sus_scrofa.ags was downloaded from NCBI in ASN.1 format, so it
>>> should be the same as Homo_sapiens in the example. There should be no
>>> error, since the code is the example from Mingyi.
>>> I wonder why this happens. Should I change something about the file?
>>>
>>
>
--
View this message in context: http://www.nabble.com/Error-with-%22parse_entrez_gene_example.pl-Sus_scrofa.ags%22-tp16602770p16737795.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From 1zoujing at 163.com Thu Apr 17 02:55:47 2008
From: 1zoujing at 163.com (zoujing)
Date: Wed, 16 Apr 2008 19:55:47 -0700 (PDT)
Subject: [Bioperl-l] Bio::ASN1::EntrezGene parse so slowly?
In-Reply-To: <264855a00804112050gf785c2ei66d9c7463597eccd@mail.gmail.com>
References: <16602210.post@talk.nabble.com> <264855a00804112050gf785c2ei66d9c7463597eccd@mail.gmail.com>
Message-ID: <16737804.post@talk.nabble.com>

Thank you very much! The problem is solved now.

Jing

Sean Davis-3 wrote:
>
> gene_info is a tab-delimited text file, if I recall correctly. Have
> you looked at it? If it is, you should be able to parse it in a few
> seconds with just a couple lines of code.
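
For readers of the archive, the "couple lines of code" suggested above might look roughly like the sketch below. It uses only core Perl and an in-memory sample. The column order (tax_id, GeneID, Symbol first) and the sample rows are assumptions made for illustration, so check them against the header of the file you actually downloaded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse tab-delimited text with one record per line.
# Lines starting with '#' (header/comments) and blank lines are skipped.
# ASSUMPTION: the first three columns are tax_id, GeneID and Symbol.
sub parse_gene_info {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ or $line !~ /\S/;
        my ($tax_id, $gene_id, $symbol) = split /\t/, $line;
        push @records,
            { tax_id => $tax_id, gene_id => $gene_id, symbol => $symbol };
    }
    return \@records;
}

# Hypothetical sample rows, made up for illustration only.
my $sample = join "\n",
    "#tax_id\tGeneID\tSymbol\tdescription",
    "9823\t1001\tGENE_A\tmade-up gene A",
    "9823\t1002\tGENE_B\tmade-up gene B",
    "";

# Read from an in-memory filehandle so the sketch is self-contained.
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
my $records = parse_gene_info($fh);
close $fh;

printf "%s\t%s\t%s\n", @{$_}{qw(tax_id gene_id symbol)} for @$records;
```

For the real file, the in-memory handle would be replaced by a handle on the uncompressed gene_info file, or by a pipe such as open(my $fh, '-|', 'gzip', '-dc', 'gene_info.gz') on systems where gzip is available.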
> > Sean > > > On Thu, Apr 10, 2008 at 1:08 AM, zoujing <1zoujing at 163.com> wrote: >> >> I want to parse a file "gene_info" from NCBI. The format of Gene in >> NCBI is >> ASN1, right? So I used Bio::ASN1::EntrezGene. But it didn't work >> properly/too slow. The file is about 500M. >> The code is following: >> use Bio::ASN1::EntrezGene; >> my $parser = Bio::ASN1::EntrezGene->new('file' => $ARGV[0]); >> my $i = 0; >> while(my $result = $parser->next_seq) >> { last; #something to do there, here use last for test} >> >> When it goes to the "while" part, it is processing on and on, it does >> not >> went out, even I used "last" in the "while" part. >> So I wonder whether it is too slow or the module is not fit for this >> job, >> or I did something wrong? >> >> Thank you! >> -- >> View this message in context: >> http://www.nabble.com/Bio%3A%3AASN1%3A%3AEntrezGene-parse-so-slowly--tp16602210p16602210.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Bio%3A%3AASN1%3A%3AEntrezGene-parse-so-slowly--tp16602210p16737804.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From sbassi at clubdelarazon.org Sat Apr 26 17:49:20 2008 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Sat, 26 Apr 2008 14:49:20 -0300 Subject: [Bioperl-l] bioperl installation problem Message-ID: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com> I tried to install bioperl because I need to install cviewer. Here (http://www.pastecode.com.ar/f37c1cd60) are both stdout and sdterr outputs. 
Here is one of the errors I get:

set_attribute: not a compat02 graph at
/usr/local/share/perl/5.8.7/Graph.pm line 2394, line 10.
sleeping for 3 seconds
set_attribute: not a compat02 graph at
/usr/local/share/perl/5.8.7/Graph.pm line 2394, line 14.

But I have GD::Graph, so I don't know what is going on:

sbassi at ubuntuMAP:~$ sudo perl -MCPAN -e 'install GD::Graph'
CPAN: Storable loaded ok
Going to read /home/sbassi/.cpan/Metadata
Database was generated on Fri, 25 Apr 2008 09:29:45 GMT
GD::Graph is up to date.

Any help regarding this: http://www.pastecode.com.ar/f37c1cd60
would be appreciated.

Best,
SB.

--
Sebastián Bassi. Diplomado en Ciencia y Tecnología.
Curso Biologia molecular para programadores: http://tinyurl.com/2vv8w6
Mostrá tu código: http://www.pastecode.com.ar
GPG Fingerprint: 9470 0980 620D ABFC BE63 A4A4 A3DE C97D 8422 D43D

From jason at bioperl.org Sat Apr 26 19:23:37 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 26 Apr 2008 12:23:37 -0700
Subject: [Bioperl-l] bioperl installation problem
In-Reply-To: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com>
References: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com>
Message-ID:

the error refers to the 'Graph' module not 'GD::Graph';

-jason

On Apr 26, 2008, at 10:49 AM, Sebastian Bassi wrote:

> I tried to install bioperl because I need to install cviewer.
> Here (http://www.pastecode.com.ar/f37c1cd60) are both stdout and
> stderr outputs.
>
> Here is one of the errors I get:
>
> set_attribute: not a compat02 graph at
> /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 10.
> sleeping for 3 seconds
> set_attribute: not a compat02 graph at
> /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 14.
> > But I have GD::Graph, so I don't know what is going on: > > sbassi at ubuntuMAP:~$ sudo perl -MCPAN -e 'install GD::Graph' > CPAN: Storable loaded ok > Going to read /home/sbassi/.cpan/Metadata > Database was generated on Fri, 25 Apr 2008 09:29:45 GMT > GD::Graph is up to date. > > Any help regarding this: http://www.pastecode.com.ar/f37c1cd60 > would be appreciated. > > Best, > SB. > > -- > Sebasti?n Bassi (???????). Diplomado en Ciencia y > Tecnolog?a. > Curso Biologia molecular para programadores: http://tinyurl.com/2vv8w6 > Mostr? tu c?digo: http://www.pastecode.com.ar > GPG Fingerprint: 9470 0980 620D ABFC BE63 A4A4 A3DE C97D 8422 D43D > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sbassi at clubdelarazon.org Sat Apr 26 21:08:13 2008 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Sat, 26 Apr 2008 18:08:13 -0300 Subject: [Bioperl-l] bioperl installation problem In-Reply-To: References: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com> Message-ID: <9e2f512b0804261408l45ff9f91j94f44065d21cd65f@mail.gmail.com> On Sat, Apr 26, 2008 at 4:23 PM, Jason Stajich wrote: > the error refers to the 'Graph' module not 'GD::Graph'; You are right, but I have it also installed: sbassi at ubuntuMAP:~$ sudo perl -MCPAN -e 'install Graph' Password: CPAN: Storable loaded ok Going to read /home/sbassi/.cpan/Metadata Database was generated on Fri, 25 Apr 2008 09:29:45 GMT Graph is up to date. -- Sebasti?n Bassi (???????). Diplomado en Ciencia y Tecnolog?a. Curso Biologia molecular para programadores: http://tinyurl.com/2vv8w6 Mostr? 
tu código: http://www.pastecode.com.ar
GPG Fingerprint: 9470 0980 620D ABFC BE63 A4A4 A3DE C97D 8422 D43D

From bix at sendu.me.uk Sat Apr 26 23:30:56 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 27 Apr 2008 00:30:56 +0100
Subject: [Bioperl-l] bioperl installation problem
In-Reply-To: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com>
References: <9e2f512b0804261049s4c1d829cy79b702f6f5680474@mail.gmail.com>
Message-ID: <4813BB30.6060703@sendu.me.uk>

Sebastian Bassi wrote:
> I tried to install bioperl because I need to install cviewer.
> Here (http://www.pastecode.com.ar/f37c1cd60) are both stdout and stderr outputs.
>
> Here is one of the errors I get:
>
> set_attribute: not a compat02 graph at
> /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 10.
> sleeping for 3 seconds
> set_attribute: not a compat02 graph at
> /usr/local/share/perl/5.8.7/Graph.pm line 2394, line 14.

You're trying to install a very old version of Bioperl which apparently
uses behaviour of the Graph module no longer supported:
http://search.cpan.org/~jhi/Graph-0.84/lib/Graph.pod#Backward_compatibility_with_Graph_0.2

Your options are to force install your desired version of Bioperl (if
you don't need to use the modules that are causing the errors you get),
downgrade your version of Graph to pre-0.2, or install the latest
version of Bioperl (1.5.2 or from svn).

From dr.hogart at gmail.com Sun Apr 27 14:05:20 2008
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Sun, 27 Apr 2008 18:05:20 +0400
Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
Message-ID:

Hi all,

is it possible to add a GD::graphic object (chart) to a Bio::Graphics panel
to obtain a file with an image of both the chart and the bioseq object?
From Russell.Smithies at agresearch.co.nz Sun Apr 27 21:27:23 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 28 Apr 2008 09:27:23 +1200
Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
In-Reply-To:
References:
Message-ID:

You can get the GD object back from the Bio::Graphics::Panel, then draw
on it using GD methods.

E.g.:

use strict;
use warnings;
use GD;                        # exports gdSmallFont
use Bio::Graphics;
use Bio::SeqFeature::Generic;

# create a BioPerl panel
my $panel = Bio::Graphics::Panel->new(
    -length  => 600,
    -width   => 800,
    -bgcolor => 'white',
);

# add your features
my $feature = Bio::SeqFeature::Generic->new(-start => 1, -end => 200);
$panel->add_track(
    $feature,
    -glyph   => 'segments',
    -label   => 0,
    -height  => 30,
    -bgcolor => 'red',
    -fgcolor => 'red',
);

# grab the GD thingy
my $gd = $panel->gd;

# create a color - not sure if there's a better way?
my $black = $gd->colorAllocate(0, 0, 0);

# draw on your GD thingy
$gd->line(10, 10, $panel->width - 10, 10, $black);
$gd->string(gdSmallFont, 20, 10, 'test', $black);

# print it as normal
print $panel->png;

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of sergei ryazansky
> Sent: Monday, 28 April 2008 2:05 a.m.
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
>
> Hi all,
>
> is it possible to add a GD::graphic object (chart) to Bio::Graphics panel
> to obtain a file with image of both the chart and bioseq object?

=======================================================================
Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material.
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From dr.hogart at gmail.com Mon Apr 28 00:25:18 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Mon, 28 Apr 2008 04:25:18 +0400 Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics References: Message-ID: Thanks for answer! Yours script works fine, but nevertheless, as for as I understand 'gd' method return the gd::image object. But I need the to merge bioseq object with gd::graph object (gd::graph::area). Is it possible? Or maybe I misunderstood something in your example? On Mon, 28 Apr 2008 01:27:23 +0400, Smithies, Russell wrote: > You can get the GD object back from the Bio::Graphics::Panel then draw > on it using GD methods > > Eg: > > #create a BioPerl panel > my $panel = Bio::Graphics::Panel->new( > -length => 600 > -width => 800, > -bgcolor => 'white' > ); > # add your features > my $feature = Bio::SeqFeature::Generic->new( -start => 1,-end => > 200,); > $panel->add_track($feature, glyph => 'segments', > -label => 0, > -height => 30, > -bgcolor => 'red', > -fgcolor => 'red' > ); > > # grab the GD thingy > my $gd = $panel->gd; > > #create a color - not sure if there's a better way? > $black = $gd->colorAllocate(0,0,0); > > #draw on your GD thingy > $gd->line(10,10,$panel->width -10,10,$black); > $gd->string(gdSmallFont,20,10,'test' ,'$black); > > # print it as normal > print $panel->png; > > > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open- >> bio.org] On Behalf Of sergei ryazansky >> Sent: Monday, 28 April 2008 2:05 a.m. 
>> To: bioperl-l at bioperl.org >> Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics >> >> Hi all, >> >> is it possible to add a GD::graphic object (chart) to Bio::Graphics > panel >> to obtain a file with image of both the chart and bioseq object? >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= From Bank.Beszteri at awi.de Mon Apr 28 12:18:20 2008 From: Bank.Beszteri at awi.de (=?UTF-8?B?QsOhbmsgQmVzenRlcmk=?=) Date: Mon, 28 Apr 2008 14:18:20 +0200 Subject: [Bioperl-l] Indexing large databases / BioSQL In-Reply-To: <47FB204F.90405@awi.de> References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> <47FB204F.90405@awi.de> Message-ID: <4815C08C.1060305@awi.de> Dear BioSQL / bioperl-db-ists, I would like to share my experiences with trying to load uniprot_trembl into a BioSQL db, and also to ask a couple of questions; perhaps some of you know the problems I encountered. I used bioperl-live and bioperl-db-live as of 2008-04-03 and uniprot_trembl.dat as of 2008-04-04. 
The command was like

load_seqdatabase.pl --safe --logchunk 1000 --host dbserv --dbname abc --dbuser efg --dbpass xyz --driver mysql --namespace uniprot_trembl --format embl uniprot_trembl.dat

although I split the dat file into 10 chunks and ran them in parallel to make it faster.

This did not go quite as smoothly as Swissprot did. In the end, it seems to have loaded 5022284 of the 5443284 entries which appear to be in the input file (when counting with grep -c "ID "). Besides the harmless taxonomy warnings which also appear with Swissprot (and which were discussed here a couple of weeks ago and also earlier), there were a couple of more serious errors. Perhaps some of you know them already:

First of all, the error below seems to lead to a crash, in spite of --safe:

>>>
------------- EXCEPTION -------------
MSG: A1XDT7 seems to have an invalid species classification.
STACK Bio::SeqIO::embl::_read_EMBL_Species /home/biocl/bbeszter/lib/bioperl-live/bioperl-live/Bio/SeqIO/embl.pm:1087
STACK Bio::SeqIO::embl::next_seq /home/biocl/bbeszter/lib/bioperl-live/bioperl-live/Bio/SeqIO/embl.pm:320
STACK toplevel /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:634
-------------------------------------

Command exited with non-zero status 255
<<<

This concerns NCBI Tax_ID:435 (Acetobacter aceti; it has some 30 synonyms in my DB, too), which, to me, looks like a completely normal taxon: I could follow its taxonomy up to the root in the NCBI taxonomy loaded in the BioSQL DB I used. I don't know whether someone else has seen or can reproduce the problem, or whether I should suspect a problem with my taxonomy DB. Besides, is it the expected behaviour of load_seqdatabase.pl to die upon this error?

###################

The other problems did not lead to a crash, only to a failure to load the sequence, which is what I'd expect with --safe.
The first type of error looks like

>>>
Could not store Q49I36:
------------- EXCEPTION -------------
MSG: Unique key query in Bio::DB::BioSQL::SpeciesAdaptor returned 2 rows instead of 1. Query was [name_class="scientific name",binomial="Onchocerca volvulus"]
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:958
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:854
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
STACK Bio::DB::Persistent::PersistentObject::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/Persistent/PersistentObject.pm:244
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/Persistent/PersistentObject.pm:271
STACK (eval) /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:630
STACK toplevel /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:612
-------------------------------------
<<<

In this particular case, "Onchocerca volvulus" does indeed have two taxon_ids in my DB (6282 and 563188, of which only the first one is returned by a web search at NCBI taxonomy); but the same thing happened with a number of other taxa (each followed by the number of times that taxon caused the above error):

Wolbachia pipientis 64
Hemerocallis sp. 1
Hypsiglena torquata 3
Salmonella enterica 1211
Burkholderia sp. 31
Streptococcus sp.
4
Rhizobium sp. 600
Nostoc sp. 19
Drosophila sp. 18
Onchocerca volvulus 62
Atlapetes schistaceus 4
Symbiodinium sp. 3
Escherichia coli 7421
Hieraaetus fasciatus 4
Borrelia burgdorferi group 1
Pseudomonas sp. 29
Rotavirus A 1076
Gorilla gorilla 746
Rana plancyi 14
unclassified sequences 1

(This should be 11312 cases altogether, but the list might be incomplete because I accidentally removed one of my logs, which contained STDOUT & STDERR for ~ 10 % of the entries)

Again, is this a known problem for some of you, or could there be a problem with my copy of the NCBI taxonomy? I don't remember having updated it after the initial upload, so I'm quite surprised by such duplicate entries....

###################

Type 2 error w/o crash:

>>>
Could not store A5HU09:
------------- EXCEPTION -------------
MSG: create: object (Bio::Species) failed to insert or to be found by unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK Bio::DB::Persistent::PersistentObject::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/Persistent/PersistentObject.pm:244
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/Bio/DB/Persistent/PersistentObject.pm:271
STACK (eval) /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:630
STACK toplevel /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:612
<<<

This particular record has the NCBI_TaxID 44271, which looks completely normal in the NCBI taxonomy loaded in my BioSQL DB, but the same problem appeared in 53 further cases (I could not
look into them in detail as yet to see whether they were all the same species). On the other hand, 7 records which were successfully loaded have this taxonomy ID in the DB (44271).

###################

Nr 3 no crash:

>>>
Could not store Q6T859:
Unmatched ( in regex; marked by <-- HERE in m/Camelina microcarpa (Littlepod false flax) ( <-- HERE microcarpa subsp.\s+/ at /home/biocl/bbeszter/lib/bioperl-live/bioperl-live/Bio/Species.pm line 466, line 357048.
<<<

This happens in the sub binomial in Species.pm when using the option "FULL", which requests that the subspecies be returned as well. I have not looked much deeper into this yet, but is it possible that there is a parsing problem with multi-line species strings? In the above case the OS field in uniprot_trembl.dat looks like

OS   Camelina microcarpa (Littlepod false flax) (Camelina microcarpa subsp.
OS   sylvestris).

###################

I'm still looking for where the remaining records disappeared: of the 421000 records not showing up in the DB, I could find these:

crasher (Tax_ID=435): 45 entries
problem 1 ("MSG: Unique key query in Bio::DB::BioSQL::SpeciesAdaptor returned 2 rows instead of 1."): 11312 entries
problem 2 ("MSG: create: object (Bio::Species) failed to insert or to be found by unique key"): 54 entries
problem 3 ("Unmatched ( in regex"): 28241 entries

381348 still remain... These could in principle come from the first 10 %, for which I don't have the output, but they don't seem to: after restarting that chunk, I get only ~ 30 "Could not store" errors.

So the last question: are there any error messages I can expect which don't contain "Could not store" and which I thus missed here?
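
The "Unmatched ( in regex" message under Nr 3 is the kind of error Perl raises when a string containing an unbalanced parenthesis is interpolated into a pattern unescaped. The sketch below only illustrates that general mechanism in core Perl. It is not taken from Species.pm and is not claimed to be the actual cause or fix there. The sample string and the helper name starts_with_literal are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical species string with an unbalanced "(", modeled on the
# OS line quoted above.
my $name = 'Camelina microcarpa (Littlepod false flax) (Camelina';

# Returns 1 if $text starts with $prefix taken literally, 0 otherwise.
# \Q...\E (quotemeta) escapes regex metacharacters in the interpolated value.
sub starts_with_literal {
    my ($text, $prefix) = @_;
    return $text =~ /^\Q$prefix\E\s+/ ? 1 : 0;
}

# Interpolating the raw string makes Perl compile it as a pattern at
# run time; the unbalanced "(" is a fatal error, caught here by eval.
my $ok = eval { my $re = qr/^$name\s+/; 1 };
print "raw interpolation failed: $@" unless $ok;

# The quoted version treats the parentheses as plain characters.
print "quoted match: ",
    starts_with_literal("$name microcarpa subsp. sylvestris)", $name), "\n";
```

Whether quoting is the right behaviour for a given parser depends on what the pattern is meant to match, so this is a starting point for debugging rather than a patch.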
Bank Beszteri
Bioinformatics
Alfred Wegener Institute for Polar and Marine Research
Am Handelshafen 12
27570 Bremerhaven

From cjfields at uiuc.edu Mon Apr 28 13:20:39 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Apr 2008 08:20:39 -0500
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <4815C08C.1060305@awi.de>
References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> <47FB204F.90405@awi.de> <4815C08C.1060305@awi.de>
Message-ID: <5C383B1F-92AD-4194-B9B4-007AE51A092F@uiuc.edu>

On Apr 28, 2008, at 7:18 AM, Bánk Beszteri wrote:

> Dear BioSQL / bioperl-db-ists,
>
> I would like to share my experiences with trying to load
> uniprot_trembl into a BioSQL db, and also to ask a couple of
> questions; perhaps some of you know the problems I encountered. I
> used bioperl-live and bioperl-db-live as of 2008-04-03 and
> uniprot_trembl.dat as of 2008-04-04. The command was like
>
> load_seqdatabase.pl --safe --logchunk 1000 --host dbserv --dbname
> abc --dbuser efg --dbpass xyz --driver mysql --namespace
> uniprot_trembl --format embl uniprot_trembl.dat
>
> ....
>
> First of all, the below error seems to lead to a crash, in spite of
> --safe:
>
> >>>
> ------------- EXCEPTION -------------
> MSG: A1XDT7 seems to have an invalid species classification.
> STACK Bio::SeqIO::embl::_read_EMBL_Species /home/biocl/bbeszter/lib/bioperl-live/bioperl-live/Bio/SeqIO/embl.pm:1087
> STACK Bio::SeqIO::embl::next_seq /home/biocl/bbeszter/lib/bioperl-live/bioperl-live/Bio/SeqIO/embl.pm:320
> STACK toplevel /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:634
> -------------------------------------
>
> Command exited with non-zero status 255
> <<<
>
> What this is about is NCBI Tax_ID:435 (Acetobacter aceti; it has
> some 30 synonyms in my DB, too), which, to me, looks like a
> completely normal taxon: I could follow its taxonomy up to the root
> in my NCBI taxonomy in the BioSQL DB I used.
I don?t know if someone > else has seen / can reproduce the problem, or should I think about > some problem with my taxonomy db? Besides, is it the expected > behaviour from load_seqdatabase.pl to die upon this error? ... You should use 'swiss' format instead of 'embl' when loading Uniprot/ SwissProt sequences. Though on the surface they're similar the feature table (among other things) is completely different. I'm not sure if that's causing all of the issues here but it certainly could contribute to them. In the meantime, it's much easier for us to track these problems if you file a bug (BioPerl, file for bioperl-db): http://bugzilla.open-bio.org/ chris From cjfields at uiuc.edu Sun Apr 27 21:54:03 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 27 Apr 2008 16:54:03 -0500 Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics In-Reply-To: References: Message-ID: I think this is how some of the synteny mapping is done using SynBrowse (the trapezoids connecting syntenous genes on different tracks). http://www.gmod.org/wiki/index.php/SynView chris On Apr 27, 2008, at 4:27 PM, Smithies, Russell wrote: > You can get the GD object back from the Bio::Graphics::Panel then > draw > on it using GD methods > > Eg: > > #create a BioPerl panel > my $panel = Bio::Graphics::Panel->new( > -length => 600 > -width => 800, > -bgcolor => 'white' > ); > # add your features > my $feature = Bio::SeqFeature::Generic->new( -start => 1,-end => > 200,); > $panel->add_track($feature, glyph => 'segments', > -label => 0, > -height => 30, > -bgcolor => 'red', > -fgcolor => 'red' > ); > > # grab the GD thingy > my $gd = $panel->gd; > > #create a color - not sure if there's a better way? 
> $black = $gd->colorAllocate(0,0,0); > > #draw on your GD thingy > $gd->line(10,10,$panel->width -10,10,$black); > $gd->string(gdSmallFont,20,10,'test' ,'$black); > > # print it as normal > print $panel->png; > > > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open- >> bio.org] On Behalf Of sergei ryazansky >> Sent: Monday, 28 April 2008 2:05 a.m. >> To: bioperl-l at bioperl.org >> Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics >> >> Hi all, >> >> is it possible to add a GD::graphic object (chart) to Bio::Graphics > panel >> to obtain a file with image of both the chart and bioseq object? >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > = > ====================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. Any review, retransmission, dissemination or other use of, > or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by > AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > = > ====================================================================== > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From Bank.Beszteri at awi.de Mon Apr 28 13:51:53 2008
From: Bank.Beszteri at awi.de (=?ISO-8859-1?Q?B=E1nk_Beszteri?=)
Date: Mon, 28 Apr 2008 15:51:53 +0200
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <5C383B1F-92AD-4194-B9B4-007AE51A092F@uiuc.edu>
References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> <47FB204F.90405@awi.de> <4815C08C.1060305@awi.de> <5C383B1F-92AD-4194-B9B4-007AE51A092F@uiuc.edu>
Message-ID: <4815D679.3070307@awi.de>

Chris Fields wrote:
>
> ...
>
> You should use 'swiss' format instead of 'embl' when loading
> Uniprot/SwissProt sequences. Though on the surface they're similar
> the feature table (among other things) is completely different. I'm
> not sure if that's causing all of the issues here but it certainly
> could contribute to them.
>
> In the meantime, it's much easier for us to track these problems if
> you file a bug (BioPerl, file for bioperl-db):
>
> http://bugzilla.open-bio.org/
>

Hi Chris,

I will do so; in the meanwhile: I'm not loading Swissprot, but TrEMBL. Is
swiss also the appropriate format here? By reading
http://expasy.org/sprot/userman.html#diffEMBL, I concluded that embl should
be the one I'd need for TrEMBL.

Bank

From cjfields at uiuc.edu Mon Apr 28 16:24:39 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Apr 2008 11:24:39 -0500
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <4815D679.3070307@awi.de>
References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> <47FB204F.90405@awi.de> <4815C08C.1060305@awi.de> <5C383B1F-92AD-4194-B9B4-007AE51A092F@uiuc.edu> <4815D679.3070307@awi.de>
Message-ID:

On Apr 28, 2008, at 8:51 AM, Bánk Beszteri wrote:

> Hi Chris,
>
> I will do so; in the meanwhile: I'm not loading Swissprot, but
> TrEMBL. Is swiss also the appropriate format here? By reading
> http://expasy.org/sprot/userman.html#diffEMBL, I concluded that embl
> should be the one I'd need for TrEMBL.
>
> Bank

The section you link to describes several important differences between EMBL
and SwissProt/UniProt format (i.e. how each indicated line type differs
between SwissProt and EMBL formats, including ID, AC, OS/OC, FT, etc). I'm
unsure how you derived that 'embl' would work from that, e.g. they are close,
but there are enough significant differences that using 'embl' for SwissProt
(or vice versa) will not work as intended, if at all.

chris

From hlapp at gmx.net Mon Apr 28 19:46:07 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 28 Apr 2008 15:46:07 -0400
Subject: [Bioperl-l] Indexing large databases / BioSQL
In-Reply-To: <4815D679.3070307@awi.de>
References: <19992.156.83.1.157.1207579017.squirrel@webmail.xs4all.nl> <47FB204F.90405@awi.de> <4815C08C.1060305@awi.de> <5C383B1F-92AD-4194-B9B4-007AE51A092F@uiuc.edu> <4815D679.3070307@awi.de>
Message-ID: <3BD6A261-D023-4A5F-9CBC-C3216B0145F0@gmx.net>

On Apr 28, 2008, at 9:51 AM, Bánk Beszteri wrote:

> I'm not loading Swissprot, but TrEMBL. Is swiss also the
> appropriate format here?

Yes, though I guess it can be confusing. Maybe we should create a symlink
uniprot.pm to swiss.pm, or in fact fork them if UniProt starts accumulating
enough differences from the traditional Swissprot format.

BTW as you had noticed, the --safe switch only protects the script from
crashing due to a db loading error.
A parsing error will still cause a crash. I guess you can argue that that's
not nice, and having a chance to skip over the record that offends the
(BioPerl) parser would be useful. The problem is that if the parser errors
out, it's not guaranteed where we are in the file and whether the parser
module is in a state that it can recover itself from. For the database it's a
bit easier as one just needs to rollback() the transaction (each sequence is
its own transaction).

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From Russell.Smithies at agresearch.co.nz Mon Apr 28 21:15:16 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 29 Apr 2008 09:15:16 +1200
Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
In-Reply-To:
References:
Message-ID:

I thought it was a bit of a hack but I guess if someone else is doing it
too, it can't be all bad :-)

It looks like you can combine your drawing methods like this:
(I'm sure Lincoln will tell us this is bad but it seems to work ok)
-------------------------------------------------------------------------------------

#!perl -w
use GD::Graph::lines;
use GD::Graph::colour;
use GD::Graph::Data;

use Bio::Graphics;
use Bio::SeqFeature::Generic;

# create and draw on a graphics panel
my $panel = Bio::Graphics::Panel->new(
    -length => 500,
    -width  => 500
);
my $track = $panel->add_track(
    -glyph => 'generic',
    -label => 1
);

# create and add a few features
for($i = 100; $i < 500; $i+= 100){
    my $feature = Bio::SeqFeature::Generic->new(
        -display_name => "feature: $i",
        -score        => $i,
        -start        => $i,
        -end          => $i + 100
    );
    $track->add_feature($feature);
}

# create and draw the graph
my @data = (
    ["1st","2nd","3rd","4th","5th","6th","7th", "8th", "9th"],
    [ 1, 2, 5, 6, 3, 1.5, 1, 3, 4],
    [ sort { $a <=> $b } (1, 2, 5, 6, 3, 1.5, 1, 3, 4) ]
);
my $graph = GD::Graph::lines->new(500, 300);

$graph->set(
    x_label       => 'X Label',
    y_label       => 'Y label',
    title         => 'Some simple graph',
    y_max_value   => 8,
    y_tick_number => 8,
    y_label_skip  => 2
) or die $graph->error;

$graph->set( dclrs => [ qw( green blue black red pink) ] );

my $gd = $graph->plot(\@data) or die $graph->error;

# combine the two images
my $combined = $panel->gd($gd);

open(IMG, '>file.png') or die $!;
binmode IMG;
print IMG $combined->png;

------------------------------------------------------------------------------------------

From lincoln.stein at gmail.com Mon Apr 28 21:33:19 2008
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Mon, 28 Apr 2008 17:33:19 -0400
Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
In-Reply-To:
References:
Message-ID: <6dce9a0b0804281433i697cda2fo2c47ce59010d0858@mail.gmail.com>

Hi,

No, I'm perfectly happy with combining images like this. It is part of what
I intended.

Another idea would be to use the Image glyph to embed graphs at particular
genomic locations in the panel. Right now the glyph is designed in the
expectation that the image passed to it is sitting on the file system (or a
web URL), but it would be easy to modify it so that a callback can generate
the GD on the fly, by using, for example GD::Graph.
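Another way to get a chart and a panel into one file, without drawing one onto the other, is to copy both GD images onto a larger canvas. This is only a sketch under stated assumptions: it uses the plain GD module, and `$panel_img` and `$chart_img` are stand-ins built by hand so the example runs on its own. In real use they would be the images returned by `$panel->gd` and `$graph->plot(\@data)`. The output name combined.png is arbitrary.

```perl
use strict;
use warnings;
use GD;

# Stand-in for the panel image (in real use: my $panel_img = $panel->gd;).
my $panel_img = GD::Image->new(500, 100);
$panel_img->colorAllocate(255, 255, 255);      # first colour is the background
my $red = $panel_img->colorAllocate(255, 0, 0);
$panel_img->filledRectangle(100, 40, 400, 60, $red);

# Stand-in for the chart image (in real use: the result of $graph->plot).
my $chart_img = GD::Image->new(500, 300);
$chart_img->colorAllocate(255, 255, 255);
my $blue = $chart_img->colorAllocate(0, 0, 255);
$chart_img->line(0, 299, 499, 0, $blue);

# Canvas wide enough for the wider image and tall enough for both.
my ($pw, $ph) = $panel_img->getBounds;
my ($cw, $ch) = $chart_img->getBounds;
my $width  = $pw > $cw ? $pw : $cw;
my $canvas = GD::Image->new($width, $ph + $ch);
$canvas->colorAllocate(255, 255, 255);

# copy(source, dstX, dstY, srcX, srcY, width, height)
$canvas->copy($panel_img, 0, 0,   0, 0, $pw, $ph);   # panel on top
$canvas->copy($chart_img, 0, $ph, 0, 0, $cw, $ch);   # chart underneath

open my $out, '>', 'combined.png' or die "Cannot write combined.png: $!";
binmode $out;
print {$out} $canvas->png;
close $out;
```

Stacking keeps the two images independent, so neither can overwrite the other; the trade-off is that the chart is no longer aligned to panel coordinates.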
Lincoln

On Mon, Apr 28, 2008 at 5:15 PM, Smithies, Russell <Russell.Smithies at agresearch.co.nz> wrote:

> I thought it was a bit of a hack but I guess if someone else is doing it
> too, it can't be all bad :-)
>
> It looks like you can combine your drawing methods like this:
> (I'm sure Lincoln will tell us this is bad but it seems to work ok)

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From dr.hogart at gmail.com Tue Apr 29 07:56:24 2008
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Tue, 29 Apr 2008 11:56:24 +0400
Subject: [Bioperl-l] addition of GD::graphic object to Bio::Graphics
References:
Message-ID:

Thank you very much! It is exactly what I was looking for.

On Tue, 29 Apr 2008 01:15:16 +0400, Smithies, Russell wrote:

> I thought it was a bit of a hack but I guess if someone else is doing it
> too, it can't be all bad :-)
>
> It looks like you can combine your drawing methods like this:

From d.gatherer at mrcvu.gla.ac.uk Tue Apr 29 12:21:05 2008
From: d.gatherer at mrcvu.gla.ac.uk (Derek Gatherer)
Date: Tue, 29 Apr 2008 13:21:05 +0100
Subject: [Bioperl-l] translate() oddities
Message-ID:

Hi

I thought I'd better run this by the community before I embarrass myself on
Bugzilla. It seems like a clear bug to me. I'm running Bioperl 1.5.0 on
RedHat.

For a test input:

>test
ATGATGATGATGATGTGA

the following code is fine.

while((my $seqobj = $seq_in->next_seq()))
{
    print "\n".$seqobj->display_id;
    my $len = $seqobj->length();
    print " length: $len";
    my $frame1_obj = $seqobj->translate();
    my $f1_prot = $frame1_obj->seq();
    print "\n$f1_prot";
}

Output:

test length: 18
MMMMM*

But if I want to change the frame as specified in the BioPerl tutorial, by
using:

my $frame1_obj = $seqobj->translate(frame => 1);

which should now give frame 2, I get:

test length: 18
MMMMM-frame

The frame is unchanged and the text "-frame" is tacked on the end of the
output. The same occurs with translate(frame => 2).

Any ideas? Can something as fundamental as translate() really be bugged?
or am I guilty of some particularly heinous syntax error?

Cheers
Derek

From tristan.lefebure at gmail.com Tue Apr 29 13:58:21 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Tue, 29 Apr 2008 09:58:21 -0400
Subject: [Bioperl-l] translate() oddities
In-Reply-To:
References:
Message-ID: <200804290958.21548.tristan.lefebure@gmail.com>

Aren't you forgetting the dash?

my $frame1_obj = $seqobj->translate(-frame => 1)

On Tuesday 29 April 2008 08:21:05 Derek Gatherer wrote:
> my $frame1_obj = $seqobj->translate(frame => 1)

-Tristan

From d.gatherer at mrcvu.gla.ac.uk Tue Apr 29 14:05:03 2008
From: d.gatherer at mrcvu.gla.ac.uk (Derek Gatherer)
Date: Tue, 29 Apr 2008 15:05:03 +0100
Subject: [Bioperl-l] translate() oddities
In-Reply-To: <481726BF.1060609@bms.com>
References: <481726BF.1060609@bms.com>
Message-ID:

Thanks Stefan

Actually, there was a typo in my message, I did use -frame => 1. However,
the problem disappears on upgrading from 1.5.0 to 1.5.2.

So not a bug anymore.

Cheers
Derek

At 14:46 29/04/2008, Stefan Kirov wrote:
>my $frame1_obj = $seqobj->translate(-frame => 1);
>not
>my $frame1_obj = $seqobj->translate(frame => 1);
>Stefan

From l.douchy at gmail.com Tue Apr 29 14:16:40 2008
From: l.douchy at gmail.com (Laurent DOUCHY)
Date: Tue, 29 Apr 2008 16:16:40 +0200
Subject: [Bioperl-l] translate() oddities
In-Reply-To: <200804290958.21548.tristan.lefebure@gmail.com>
References: <200804290958.21548.tristan.lefebure@gmail.com>
Message-ID: <2fb209dd0804290716x36e403dek55978dc4f54e34ff@mail.gmail.com>

Hello,

I resolved this issue in Bio::SeqIO with the following line:

my $sequence = $seq->translate('*', 'X', '0', '1', '0', '0', '0', '0')->seq;

The third parameter sets the frame.

I hope this helps.

laurent.

From roy.chaudhuri at gmail.com Tue Apr 29 14:27:10 2008
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 29 Apr 2008 15:27:10 +0100
Subject: [Bioperl-l] translate() oddities
In-Reply-To:
References: <481726BF.1060609@bms.com>
Message-ID: <4817303E.1040903@gmail.com>

Spent two minutes looking at this, so may as well chip in with what I
discovered even though you solved your problem.

This "bug" comes about because in version 1.5.1 and earlier, the arguments
to translate were a simple list, with the first argument the terminator
(defaults to "*"). Your old version therefore assumed that you wanted to
translate the stop codon to "-frame".

Amusingly given your typo, if you miss the hyphen off the frame argument in
version 1.5.2 it reverts to the old interface and you end up with the output
"MMMMMframe".

The moral of the story is of course to read the docs relevant to the version
you are using.

Roy.
--
Dr. Roy Chaudhuri
Department of Veterinary Medicine
University of Cambridge, U.K.

From stefan.kirov at bms.com Tue Apr 29 13:46:39 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 29 Apr 2008 09:46:39 -0400
Subject: [Bioperl-l] translate() oddities
In-Reply-To:
References:
Message-ID: <481726BF.1060609@bms.com>

my $frame1_obj = $seqobj->translate(-frame => 1);
not
my $frame1_obj = $seqobj->translate(frame => 1);
Stefan

From cjfields at uiuc.edu Tue Apr 29 15:00:00 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 29 Apr 2008 10:00:00 -0500
Subject: [Bioperl-l] translate() oddities
In-Reply-To: <4817303E.1040903@gmail.com>
References: <481726BF.1060609@bms.com> <4817303E.1040903@gmail.com>
Message-ID: <36045A08-AEA8-4639-A384-1DC53B5DC129@uiuc.edu>

Yes, the interface changed somewhat post 1.5.1, mainly to accept named
parameters. I think a few methods do this now, as passing in lists of more
than 2 args, undef'ing those one doesn't want set, gets confusing.

chris

On Apr 29, 2008, at 9:27 AM, Roy Chaudhuri wrote:

> Spent two minutes looking at this, so may as well chip in with what
> I discovered even though you solved your problem.
>
> This "bug" comes about because in version 1.5.1 and earlier, the
> arguments to translate were a simple list, with the first argument
> the terminator (defaults to "*"). Your old version therefore assumed
> that you wanted to translate the stop codon to "-frame".
Amusingly > given your typo, if you miss the hyphen off the frame argument in > version 1.5.2 it reverts to the old interface and you end up with > the output "MMMMMframe". The moral of the story is of course to read > the docs relevant to the version you are using. > > Roy. > -- > Dr. Roy Chaudhuri > Department of Veterinary Medicine > University of Cambridge, U.K. > > Derek Gatherer wrote: >> Thanks Stefan >> Actually, there was a typo in my message, I did use -frame => 1. >> However, the problem disappears on upgrading from 1.5.0 to 1.5.2. >> So not a bug anymore. >> Cheers >> Derek >> At 14:46 29/04/2008, Stefan Kirov wrote: >>> my $frame1_obj = $seqobj->translate(-frame => 1); >>> not >>> my $frame1_obj = $seqobj->translate(frame => 1); >>> Stefan >>> >>> Derek Gatherer wrote: >>>> Hi >>>> >>>> I thought I'd better run this by the community before I embarrass >>>> myself on Bugzilla. It seems like a clear bug to me. I'm running >>>> Bioperl 1.5.0 on RedHat. >>>> >>>> For a test input: >>>> >>>>> test >>>> ATGATGATGATGATGTGA >>>> >>>> the following code is fine. >>>> >>>> while((my $seqobj = $seq_in->next_seq())) >>>> { >>>> print "\n".$seqobj->display_id; >>>> my $len = $seqobj->length(); >>>> print " length: $len"; >>>> my $frame1_obj = $seqobj->translate(); >>>> my $f1_prot = $frame1_obj->seq(); >>>> print "\n$f1_prot"; >>>> } >>>> >>>> Output: >>>> >>>> test length: 18 >>>> MMMMM* >>>> >>>> But if I want to change the frame as specified in the BioPerl >>>> tutorial, by using: >>>> >>>> my $frame1_obj = $seqobj->translate(frame => 1); # which should now >>>> give frame 2, I get: >>>> >>>> test length: 18 >>>> MMMMM-frame >>>> >>>> The frame is unchanged and the text "-frame" is tacked on the end >>>> of >>>> the output. The same occurs with translate(frame => 2). >>>> >>>> Any ideas? Can something as fundamental as translate() really be >>>> bugged? or am I guilty of some particularly heinous syntax error? 
>>>>
>>>> Cheers
>>>> Derek
>>>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Tue Apr 29 15:07:30 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 29 Apr 2008 10:07:30 -0500
Subject: [Bioperl-l] translate() oddities
In-Reply-To: <481726BF.1060609@bms.com>
References: <481726BF.1060609@bms.com>
Message-ID: <18DB95FB-52B9-4091-ACEE-996891F8A5AE@uiuc.edu>

As an aside, I've been playing around with perl6 (Rakudo) for a bit now. Parameter-like passing (using autoaccessors and other means) will be added in soon, so you will be able to do this:

$seqobj = Seq.new(seq => 'ATGATGATGATGATGTGA', alphabet => 'dna');
my $protobj = $seqobj.translate(frame => 1);

Yes, I'm a geek. ;>

chris

On Apr 29, 2008, at 8:46 AM, Stefan Kirov wrote:

> my $frame1_obj = $seqobj->translate(-frame => 1);
> not
> my $frame1_obj = $seqobj->translate(frame => 1);
> Stefan
>
> Derek Gatherer wrote:
>> Hi
>>
>> I thought I'd better run this by the community before I embarrass
>> myself on Bugzilla. It seems like a clear bug to me. I'm running
>> Bioperl 1.5.0 on RedHat.
>>
>> For a test input:
>>
>> >test
>> ATGATGATGATGATGTGA
>>
>> the following code is fine.
>> >> while((my $seqobj = $seq_in->next_seq())) >> { >> print "\n".$seqobj->display_id; >> my $len = $seqobj->length(); >> print " length: $len"; >> my $frame1_obj = $seqobj->translate(); >> my $f1_prot = $frame1_obj->seq(); >> print "\n$f1_prot"; >> } >> >> Output: >> >> test length: 18 >> MMMMM* >> >> But if I want to change the frame as specified in the BioPerl >> tutorial, by using: >> >> my $frame1_obj = $seqobj->translate(frame => 1); # which should now >> give frame 2, I get: >> >> test length: 18 >> MMMMM-frame >> >> The frame is unchanged and the text "-frame" is tacked on the end of >> the output. The same occurs with translate(frame => 2). >> >> Any ideas? Can something as fundamental as translate() really be >> bugged? or am I guilty of some particularly heinous syntax error? >> >> Cheers >> Derek >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From dr.hogart at gmail.com Tue Apr 29 15:57:51 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Tue, 29 Apr 2008 19:57:51 +0400 Subject: [Bioperl-l] alignment by TCoffee as a subroutine Message-ID: Hi all! I am trying to perform TCoffe aligment by Bio::Tools::Run::Alignment::TCoffee wrapper as subroutine into the script. This subroutine works fine, but it is not single subroutine - there are a lot of other ones in the script. The problem is when compilation of script finish execution (nb! successful execution) of tcoffee subroutine the compiliation of the end of the script also interrupted. It seems that the tcoffee program itself induce interraption of perl compilation. 
Is it possible to work around this problem?

--

From darin.london at duke.edu Tue Apr 29 16:49:53 2008
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Tue, 29 Apr 2008 12:49:53 -0400
Subject: [Bioperl-l] BOSC 2008 Announcement and Call For Submissions
Message-ID: <200804291650.m3TGnr0H020814@tenero.duhs.duke.edu>

BOSC 2008 Call for Abstracts Reminder

The 9th annual Bioinformatics Open Source Conference (BOSC 2008) will take place in Toronto, Ontario, Canada, as one of several Special Interest Group (SIG) meetings occurring in conjunction with the 16th annual Intelligent Systems for Molecular Biology Conference (ISMB 2008). This is a reminder to submit your proposals for talks to the BOSC submission system before May 11.

Submission Process:

All abstracts must be submitted through our Open Conference Systems site (http://events.open-bio.org/BOSC2008/openconf.php). The form will ask for a small Abstract Text to be pasted into it, and a full paper. The small Abstract Text should be a summary, while the longer abstract should provide more details, including the open-source license requirement details. Full-length abstracts are limited to one page with one inch (2.5 cm) margins on the top, sides, and bottom. The full-length abstract should include the title, authors, and affiliations. We prefer your abstract to be in PDF format, although plain t

Important Dates:

May 11: Abstract submission deadline.
June 2: Notification of accepted talks.
June 4: Early registration discount cut-off.
July 18-19: BOSC 2008!

We hope to see you at BOSC 2008!
Kam Dahlquist and Darin London BOSC 2008 Co-organizers From bix at sendu.me.uk Tue Apr 29 16:54:41 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 29 Apr 2008 17:54:41 +0100 Subject: [Bioperl-l] alignment by TCoffee as a subroutine In-Reply-To: References: Message-ID: <481752D1.7010904@sendu.me.uk> sergei ryazansky wrote: > I am trying to perform TCoffe aligment by > Bio::Tools::Run::Alignment::TCoffee wrapper as subroutine into the > script. This subroutine works fine, but it is not single subroutine - > there are a lot of other ones in the script. The problem is when > compilation of script finish execution (nb! successful execution) of > tcoffee subroutine the compiliation of the end of the script also > interrupted. It seems that the tcoffee program itself induce > interraption of perl compilation. Is it possible to pass this problem? You'll have to supply us with a minimal version of the script and the complete error message. From dr.hogart at gmail.com Wed Apr 30 11:24:35 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Wed, 30 Apr 2008 15:24:35 +0400 Subject: [Bioperl-l] alignment by TCoffee as a subroutine References: Message-ID: On Tue, 29 Apr 2008 19:57:51 +0400, sergei ryazansky wrote: > Hi all! > > I am trying to perform TCoffe aligment by > Bio::Tools::Run::Alignment::TCoffee wrapper as subroutine into the > script. This subroutine works fine, but it is not single subroutine - > there are a lot of other ones in the script. The problem is when > compilation of script finish execution (nb! successful execution) of > tcoffee subroutine the compiliation of the end of the script also > interrupted. It seems that the tcoffee program itself induce > interraption of perl compilation. Is it possible to pass this problem? 
>

My subroutine is as follows:

sub align {
    my $file = shift @_;
    my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM', 'output' => 'fasta', 'outfile' => 'temp_align.out');
    my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params);
    my $aln = $factory->align($file);
    open (fy, 'temp_align.out'); my @temp_file = <fy>; close fy;
    return @temp_file;
}

This subroutine is called by the following command:

my @align_fa = align($inputfile_align);

After successful execution of this subroutine (accompanied by the corresponding messages in the terminal window), execution of the remainder of the script is terminated without any error messages.

--

From bix at sendu.me.uk Wed Apr 30 12:47:17 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 30 Apr 2008 13:47:17 +0100
Subject: [Bioperl-l] alignment by TCoffee as a subroutine
In-Reply-To:
References:
Message-ID: <48186A55.4030406@sendu.me.uk>

sergei ryazansky wrote:
> My subroutine is following:
>
> sub align {
> my $file=shift @_;
> my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' =>
> 'fasta', 'outfile' => 'temp_align.out');
> my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params);
> my $aln=$factory->align ($file);
> open (fy,'temp_align.out'); my @temp_file=<fy>; close fy;
> return @temp_file;
> }
>
> This subroutine is called by the following command:
>
> my @align_fa = align($inputfile_align);
>
> After successful execution of this subroutine (accompaning with the
> corresponding messages on the terminal window) the execution of
> remainder script is terminated without any error messages.

The problem lies somewhere within the rest of your script, so we have to see it if you want help.

Why are you using Bio::Tools::Run::Alignment::TCoffee at all if you don't make use of the resulting alignment object? A system call might make more sense given what you're doing. The beauty of Bio::Tools::Run::Alignment::TCoffee is that you don't have to parse the result file (temp_align.out) yourself.
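
[Editorial sketch] The point about not re-reading temp_align.out can be illustrated with a small, hedged example. The function below is plain Perl and only assumes that the aligned rows and a consensus are available as equal-length strings; the name count_changes is hypothetical and not part of BioPerl. The commented lines show how those strings could plausibly be taken from the alignment object instead of the output file, assuming $aln is the Bio::SimpleAlign returned by $factory->align.

```perl
use strict;
use warnings;

# Count, for each alignment column, how many sequences differ from the
# consensus. All inputs are plain strings of equal length ('-' for gaps).
sub count_changes {
    my ($consensus, @aligned) = @_;
    my @changes = (0) x length $consensus;
    for my $seq (@aligned) {
        die "length mismatch\n" unless length($seq) == length($consensus);
        for my $i (0 .. length($consensus) - 1) {
            $changes[$i]++
                if substr($seq, $i, 1) ne substr($consensus, $i, 1);
        }
    }
    return @changes;
}

# Assumption: with BioPerl, the inputs could come from the alignment object
# rather than from parsing temp_align.out by hand:
#   my $consensus = $aln->consensus_string();
#   my @aligned   = map { $_->seq } $aln->each_seq;

# Toy data standing in for a real alignment.
my @changes = count_changes('ATGA', 'ATGA', 'ACGA', 'AT-A');
print join(',', @changes), "\n";   # prints 0,1,1,0
```

Keeping the counting logic separate from the aligner call also makes it easy to test the second half of the pipeline without running t_coffee at all.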
From dr.hogart at gmail.com Wed Apr 30 13:36:58 2008 From: dr.hogart at gmail.com (sergei ryazansky) Date: Wed, 30 Apr 2008 17:36:58 +0400 Subject: [Bioperl-l] alignment by TCoffee as a subroutine References: <48186A55.4030406@sendu.me.uk> Message-ID: On Wed, 30 Apr 2008 16:47:17 +0400, Sendu Bala wrote: > sergei ryazansky wrote: >> My subroutine is following: >> sub align { >> my $file=shift @_; >> my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => >> 'fasta', 'outfile' => 'temp_align.out'); >> my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); >> my $aln=$factory->align ($file); >> open (fy,'temp_align.out'); my @temp_file=; close fy; >> return @temp_file; >> } >> This subroutine is called by the following command: >> my @align_fa = align($inputfile_align); >> After successful execution of this subroutine (accompaning with the >> corresponding messages on the terminal window) the execution of >> remainder script is terminated without any error messages. > > The problem lies somewhere within the rest of your script, so we have to > see it if you want help. > > Why are you using Bio::Tools::Run::Alignment::TCoffee at all if you > don't make use of the resulting alignment object? A system call might > make more sense given what you're doing. The beauty of > Bio::Tools::Run::Alignment::TCoffee is that you don't have to parse the > result file (temp_align.out) yourself. The rest of script,imho, is ok, because without this sub it is work fine. May be problem lies into the TCoffee itself? One of the feature of script is to estimate the quantity of nt changes in each position in the different similar sequences in comparing with consensus sequences. To perform this it is nesseccary to obtain the multiply alignment: the result of TCoffee alignment goes to another subroutine, that estemated the level of changes. Of course, I dont think that this way is the best approach, most probably there are a lot of the better ways to do it. 
But for my today purposes it is ok. -- From avilella at gmail.com Wed Apr 30 14:16:56 2008 From: avilella at gmail.com (Albert Vilella) Date: Wed, 30 Apr 2008 15:16:56 +0100 Subject: [Bioperl-l] alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> Message-ID: <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> Hi Sergei, Can you try to isolate this call with a simpler example to see if it still fails? When you say that the problems are in the compilation, do you mean that the interpreter won't even compile or that it fails during execution? Have you checked that you have all the dependencies right? Cheers, Albert. On Wed, Apr 30, 2008 at 2:36 PM, sergei ryazansky wrote: > On Wed, 30 Apr 2008 16:47:17 +0400, Sendu Bala wrote: > > sergei ryazansky wrote: > > > > > My subroutine is following: > > > sub align { > > > my $file=shift @_; > > > my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => > > > 'fasta', 'outfile' => 'temp_align.out'); > > > my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); > > > my $aln=$factory->align ($file); > > > open (fy,'temp_align.out'); my @temp_file=; close fy; > > > return @temp_file; > > > } > > > This subroutine is called by the following command: > > > my @align_fa = align($inputfile_align); > > > After successful execution of this subroutine (accompaning with the > > > corresponding messages on the terminal window) the execution of remainder > > > script is terminated without any error messages. > > > > > > > The problem lies somewhere within the rest of your script, so we have to > > see it if you want help. > > > > Why are you using Bio::Tools::Run::Alignment::TCoffee at all if you > > don't make use of the resulting alignment object? A system call might make > > more sense given what you're doing. The beauty of > > Bio::Tools::Run::Alignment::TCoffee is that you don't have to parse the > > result file (temp_align.out) yourself. 
> > > > The rest of script,imho, is ok, because without this sub it is work fine. > May be problem lies into the TCoffee itself? > > One of the feature of script is to estimate the quantity of nt changes in > each position in the different similar sequences in comparing with consensus > sequences. To perform this it is nesseccary to obtain the multiply > alignment: the result of TCoffee alignment goes to another subroutine, that > estemated the level of changes. Of course, I dont think that this way is the > best approach, most probably there are a lot of the better ways to do it. > But for my today purposes it is ok. > > -- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bix at sendu.me.uk Wed Apr 30 14:22:01 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 30 Apr 2008 15:22:01 +0100 Subject: [Bioperl-l] alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> Message-ID: <48188089.8000300@sendu.me.uk> sergei ryazansky wrote: > On Wed, 30 Apr 2008 16:47:17 +0400, Sendu Bala wrote: > >> sergei ryazansky wrote: >>> My subroutine is following: >>> sub align { >>> my $file=shift @_; >>> my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => >>> 'fasta', 'outfile' => 'temp_align.out'); >>> my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); >>> my $aln=$factory->align ($file); >>> open (fy,'temp_align.out'); my @temp_file=; close fy; >>> return @temp_file; >>> } >>> This subroutine is called by the following command: >>> my @align_fa = align($inputfile_align); >>> After successful execution of this subroutine (accompaning with the >>> corresponding messages on the terminal window) the execution of >>> remainder script is terminated without any error messages. >> >> The problem lies somewhere within the rest of your script, so we have >> to see it if you want help. 
> > The rest of script,imho, is ok, because without this sub it is work > fine. May be problem lies into the TCoffee itself? I've run your subroutine in a simple script of my own and it doesn't cause script termination. Again, the problem lies elsewhere in your script. Supply it or it is impossible for anyone to help you. From Sebastien.Moretti at unil.ch Wed Apr 30 14:06:28 2008 From: Sebastien.Moretti at unil.ch (Sebastien MORETTI) Date: Wed, 30 Apr 2008 16:06:28 +0200 Subject: [Bioperl-l] alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> Message-ID: <48187CE4.8030606@unil.ch> >>> My subroutine is following: >>> sub align { >>> my $file=shift @_; >>> my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => >>> 'fasta', 'outfile' => 'temp_align.out'); >>> my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); >>> my $aln=$factory->align ($file); >>> open (fy,'temp_align.out'); my @temp_file=; close fy; >>> return @temp_file; >>> } >>> This subroutine is called by the following command: >>> my @align_fa = align($inputfile_align); >>> After successful execution of this subroutine (accompaning with the >>> corresponding messages on the terminal window) the execution of >>> remainder script is terminated without any error messages. >> >> The problem lies somewhere within the rest of your script, so we have >> to see it if you want help. >> >> Why are you using Bio::Tools::Run::Alignment::TCoffee at all if you >> don't make use of the resulting alignment object? A system call might >> make more sense given what you're doing. The beauty of >> Bio::Tools::Run::Alignment::TCoffee is that you don't have to parse >> the result file (temp_align.out) yourself. > > The rest of script,imho, is ok, because without this sub it is work > fine. May be problem lies into the TCoffee itself? 
>
> One of the feature of script is to estimate the quantity of nt changes
> in each position in the different similar sequences in comparing with
> consensus sequences. To perform this it is nesseccary to obtain the
> multiply alignment: the result of TCoffee alignment goes to another
> subroutine, that estemated the level of changes. Of course, I dont think
> that this way is the best approach, most probably there are a lot of the
> better ways to do it. But for my today purposes it is ok.

Have you tried running the tcoffee command that bioperl calls directly on the command line? That would show whether the problem is with tcoffee itself or with the tcoffee release that bioperl has to use.

--
Sébastien Moretti

From dr.hogart at gmail.com Wed Apr 30 14:54:59 2008
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Wed, 30 Apr 2008 18:54:59 +0400
Subject: [Bioperl-l] alignment by TCoffee as a subroutine
References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com>
Message-ID:

Hi Albert,

The isolated call is executed without any problem, so the code is absolutely correct. The problem arises when this sub is executed within the whole script: after successful execution of the TCoffee alignment, execution of the rest of the script is terminated. The whole code is very big (~500 lines), so for simplicity let's imagine the scheme of the script as follows:

sub1;
sub2;
sub3;
sub align; # TCoffee alignment;
sub4;
sub5;

Each sub (subroutine) is independent of the other subs. The order of script execution is 1,2,3,align,4,5. But after the execution of align, the execution of the remaining subs (4 and 5) is terminated. The script without sub align {} successfully executes sub 4 and sub 5. So, I mean that the interpreter won't compile sub 4 and 5 if sub align is placed before them.

On Wed, 30 Apr 2008 18:16:56 +0400, Albert Vilella wrote:

> Hi Sergei,
>
> Can you try to isolate this call with a simpler example to see if it
> still
> fails?
When you say that the problems are in the compilation, do you mean > that the interpreter won't even compile or that it fails during > execution? > Have you checked that you have all the dependencies right? > > Cheers, > > Albert. > > On Wed, Apr 30, 2008 at 2:36 PM, sergei ryazansky > wrote: > >> On Wed, 30 Apr 2008 16:47:17 +0400, Sendu Bala wrote: >> >> sergei ryazansky wrote: >> > >> > > My subroutine is following: >> > > sub align { >> > > my $file=shift @_; >> > > my @params = ('ktuple' => 2,'matrix' => 'BLOSUM', 'output' => >> > > 'fasta', 'outfile' => 'temp_align.out'); >> > > my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); >> > > my $aln=$factory->align ($file); >> > > open (fy,'temp_align.out'); my @temp_file=; close fy; >> > > return @temp_file; >> > > } >> > > This subroutine is called by the following command: >> > > my @align_fa = align($inputfile_align); >> > > After successful execution of this subroutine (accompaning with the >> > > corresponding messages on the terminal window) the execution of >> remainder >> > > script is terminated without any error messages. >> > > >> > >> > The problem lies somewhere within the rest of your script, so we have >> to >> > see it if you want help. >> > >> > Why are you using Bio::Tools::Run::Alignment::TCoffee at all if you >> > don't make use of the resulting alignment object? A system call might >> make >> > more sense given what you're doing. The beauty of >> > Bio::Tools::Run::Alignment::TCoffee is that you don't have to parse >> the >> > result file (temp_align.out) yourself. >> > >> >> The rest of script,imho, is ok, because without this sub it is work >> fine. >> May be problem lies into the TCoffee itself? >> >> One of the feature of script is to estimate the quantity of nt changes >> in >> each position in the different similar sequences in comparing with >> consensus >> sequences. 
To perform this it is nesseccary to obtain the multiply
>> alignment: the result of TCoffee alignment goes to another subroutine,
>> that
>> estemated the level of changes. Of course, I dont think that this way
>> is the
>> best approach, most probably there are a lot of the better ways to do
>> it.
>> But for my today purposes it is ok.
>>
>> --

From dr.hogart at gmail.com Wed Apr 30 15:14:09 2008
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Wed, 30 Apr 2008 19:14:09 +0400
Subject: [Bioperl-l] alignment by TCoffee as a subroutine
References: <48186A55.4030406@sendu.me.uk> <48187CE4.8030606@unil.ch>
Message-ID:

No, I haven't tried that. To tell the truth, I have run into a problem like this before. I simply wanted to align several sets of sequences with the TCoffee Bioperl package. The script was supposed to pass the sets one after another to the TCoffee wrapper. But after the alignment of the first set of sequences, the alignment of the remaining sets was terminated. So it was necessary to use another "super_script" that called the first script with different arguments for each corresponding set.

> Do you have tried to use the tcoffee command, called via bioperl, as a
> command line ?

--

From bix at sendu.me.uk Wed Apr 30 15:28:50 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 30 Apr 2008 16:28:50 +0100
Subject: [Bioperl-l] alignment by TCoffee as a subroutine
In-Reply-To:
References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com>
Message-ID: <48189032.20102@sendu.me.uk>

sergei ryazansky wrote:
> Hi Albert,
>
> The isolated call is executed without any problem, so the code is
> absolutely correct.
The problem arise when this sub executed within the > whole script - after successful execution of TCoffee alignment the > execution of the rest of script is terminated. The whole code is very > big (~500 lines), so for simplicity lets imagine the sheme of script in > the following view: > sub1; > sub2; > sub3; > sub align; # TCoffe alignment; > sub4; > sub5; > > Each sub (subroutine) is independent from the others subs; The order of > script execution is 1,2,3,align,4,5. But after the execution of align > the execution of the rest of subs (4 and 5) is terminated. The script > without sub align {} successfully execute the sub 4 and sub 5. So, I > mean that interpreter won't compile sub 4 and 5 if sub align is placed > before them. This has nothing to do with interpreter compilation, which is successful if the script runs at all. What do you do with the output of &align? The thing you are doing with that output is most likely the cause of your script terminating, which is why &sub4 and &sub5 run when you don't run &align (have no output that causes the problem). If you're not willing to show us your script, here are some simple debugging steps you can do yourself: # don't do anything with the output of align() - does &sub4 still run? 
# add some print statements after you call align(), and then after every further block of code in your script to see exactly where the script terminates

# reduce your script down to a minimal script that shows the problem (with the help of the previous step) and show us that

From dr.hogart at gmail.com Wed Apr 30 15:42:41 2008
From: dr.hogart at gmail.com (Sergei Ryazansky)
Date: Wed, 30 Apr 2008 19:42:41 +0400
Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine
In-Reply-To:
References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk>
Message-ID:

------- Forwarded message -------
From: "Sergei Ryazansky"
To: "Sendu Bala"
Cc:
Subject: Re: [Bioperl-l] alignment by TCoffee as a subroutine
Date: Wed, 30 Apr 2008 19:40:26 +0400

> What do you do with the output of &align? The thing you are doing with
> that output is most likely the cause of your script terminating, which
> is why &sub4 and &sub5 run when you don't run &align (have no output
> that causes the problem).

Please see my answer to Sebastien Moretti; it describes another, similar problem. The only thing I did with the output there was print it to a file. Nevertheless, the problem was the same.

> # don't do anything with the output of align() - does &sub4 still run?

Please see above.

> # add some print statements after you call align(), and then after every
> further block of code in your script to see exactly where the script
> terminates
> # reduce your script down to a minimal script that shows the problem
> (with the help of the previous step) and show us that

All tests with the individual blocks were performed earlier. The results are OK.
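
[Editorial sketch] The print-statement debugging suggested in this thread could look roughly like the following. It is a minimal sketch in plain Perl, not the poster's script: run_step and the stand-in subs are hypothetical names, and the "simulated failure" only imitates a step that dies. Each step runs inside eval so that a die() is reported rather than ending the script silently, and trace lines go to STDERR so they show up immediately.

```perl
use strict;
use warnings;

$| = 1;   # unbuffered STDOUT, so output is not lost on an abrupt exit

# Run one named step inside eval and report whether it completed.
# Returns 1 if the step finished, 0 if it died.
sub run_step {
    my ($name, $code) = @_;
    print STDERR "[trace] entering $name\n";
    my $ok = eval { $code->(); 1 };
    if ($ok) {
        print STDERR "[trace] finished $name\n";
    }
    else {
        print STDERR "[trace] $name died: $@";
    }
    return $ok ? 1 : 0;
}

# Hypothetical stand-ins for the real subroutines of the script.
my @steps = (
    [ sub3  => sub { 1 } ],
    [ align => sub { die "simulated failure\n" } ],
    [ sub4  => sub { 1 } ],
);

my @results = map { run_step(@$_) } @steps;
print "completed: @results\n";   # prints: completed: 1 0 1
```

One caveat: eval only traps die(). If the trace shows "entering align" followed by neither "finished" nor "died", the process was most likely ended by an exit() call or a signal, which would point at the external program call rather than at the code that follows it.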
From cjfields at uiuc.edu Wed Apr 30 16:25:06 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 30 Apr 2008 11:25:06 -0500 Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine In-Reply-To: References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk> Message-ID: <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu> Sergei, I agree with Sendu; we can't diagnose this unless we either have the entire script of a minimal version of it demonstrating the bug. The best way to handle this is to file a bug report, attaching relevant data using the 'Create a new attachment' link (including either the full script or a shortened one which demonstrates the bug). Otherwise we're just shooting in the dark trying to diagnose the problem. http://bugzilla.open-bio.org/ chris On Apr 30, 2008, at 10:42 AM, Sergei Ryazansky wrote: > > > ------- Forwarded message ------- > From: "Sergei Ryazansky" > To: "Sendu Bala" > Cc: > Subject: Re: [Bioperl-l] alignment by TCoffee as a subroutine > Date: Wed, 30 Apr 2008 19:40:26 +0400 > >> What do you do with the output of &align? The thing you are doing >> with that output is most likely the cause of your script >> terminating, which is why &sub4 and &sub5 run when you don't run >> &align (have no output that causes the problem). > > please sea my answer to Sebastien Moretti - there are description of > another similar problem. The only thing that I did there with output > is > printing to file. Nevetheless the problem was the same. > >> # don't do anything with the output of align() - does &sub4 still >> run? > > please sea above. 
>
>> # add some print statements after you call align(), and then after
>> every further block of code in your script to see exactly where the
>> script terminates
>> # reduce your script down to a minimal script that shows the
>> problem (with the help of the previous step) and show us that
>
> All tests with the individual blocks were performed earlier. The
> results are OK.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From dr.hogart at gmail.com Wed Apr 30 16:40:19 2008
From: dr.hogart at gmail.com (Sergei Ryazansky)
Date: Wed, 30 Apr 2008 20:40:19 +0400
Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine
In-Reply-To: <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu>
References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk> <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu>
Message-ID:

On Wed, 30 Apr 2008 20:25:06 +0400, Chris Fields wrote:

Chris, I have already sent the file to Sendu, and I am also attaching it here. I have removed the really unnecessary parts from it.

> Sergei,
>
> I agree with Sendu; we can't diagnose this unless we have either the
> entire script or a minimal version of it demonstrating the bug.
>
> The best way to handle this is to file a bug report, attaching relevant
> data using the 'Create a new attachment' link (including either the full
> script or a shortened one which demonstrates the bug). Otherwise we're
> just shooting in the dark trying to diagnose the problem.
>
> http://bugzilla.open-bio.org/
>
> chris

-------------- next part --------------
A non-text attachment was scrubbed...
Name: script.pl
Type: application/octet-stream
Size: 6870 bytes
Desc: not available
URL:

From cjfields at uiuc.edu Wed Apr 30 17:02:19 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 30 Apr 2008 12:02:19 -0500
Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine
In-Reply-To:
References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk> <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu>
Message-ID:

Hmm, maybe you were confused? From my last email:

"The best way to handle this is to file a bug report, attaching relevant data using the 'Create a new attachment' link (including either the full script or a shortened one which demonstrates the bug). Otherwise we're just shooting in the dark trying to diagnose the problem."

http://bugzilla.open-bio.org/

Anyone can work on fixing the issue there (so it'll probably get fixed faster). The devs can also track progress on the problem via the dev mail list (bioperl-guts). Diagnosing the bug may also reveal issues not just with Bio::Tools::Run::Alignment::TCoffee but also with other related modules.

If needed I can post it to bugzilla, but it helps to submit the bug yourself (so you can receive posts on its progress).

chris

On Apr 30, 2008, at 11:40 AM, Sergei Ryazansky wrote:

> On Wed, 30 Apr 2008 20:25:06 +0400, Chris Fields
> wrote:
>
> Chris, I have already sent the file to Sendu, and I am also attaching
> it here. I have removed the really unnecessary parts from it.
>
>> Sergei,
>>
>> I agree with Sendu; we can't diagnose this unless we have either
>> the entire script or a minimal version of it demonstrating the bug.
>>
>> The best way to handle this is to file a bug report, attaching
>> relevant data using the 'Create a new attachment' link (including
>> either the full script or a shortened one which demonstrates the
>> bug). Otherwise we're just shooting in the dark trying to diagnose
>> the problem.
>>
>> http://bugzilla.open-bio.org/
>>
>> chris

From dr.hogart at gmail.com Wed Apr 30 17:39:35 2008
From: dr.hogart at gmail.com (Sergei Ryazansky)
Date: Wed, 30 Apr 2008 21:39:35 +0400
Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine
In-Reply-To:
References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk> <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu>
Message-ID:

On Wed, 30 Apr 2008 21:11:56 +0400, Sergei Ryazansky wrote:

> Oh, sorry, you're right - I read your message too quickly. I will do it
> a little later.
>
>> Hmm, maybe you were confused? From my last email:
>>
>> "The best way to handle this is to file a bug report, attaching
>> relevant data using the 'Create a new attachment' link (including
>> either the full script or a shortened one which demonstrates the bug).
>> Otherwise we're just shooting in the dark trying to diagnose the
>> problem."
>>
>> http://bugzilla.open-bio.org/
>>
>> Anyone can work on fixing the issue there (so it'll probably get fixed
>> faster). The devs can also track progress on the problem via the dev
>> mail list (bioperl-guts). Diagnosing the bug may also reveal issues
>> not just with Bio::Tools::Run::Alignment::TCoffee but also with other
>> related modules.
>>
>> If needed I can post it to bugzilla, but it helps to submit the bug
>> yourself (so you can receive posts on its progress).
>>
>> chris
>>
>> On Apr 30, 2008, at 11:40 AM, Sergei Ryazansky wrote:
>>
>>> On Wed, 30 Apr 2008 20:25:06 +0400, Chris Fields
>>> wrote:
>>>
>>> Chris, I have already sent the file to Sendu, and I am also attaching
>>> it here. I have removed the really unnecessary parts from it.
>>>
>>>> Sergei,
>>>>
>>>> I agree with Sendu; we can't diagnose this unless we have either the
>>>> entire script or a minimal version of it demonstrating the bug.
>>>>
>>>> The best way to handle this is to file a bug report, attaching
>>>> relevant data using the 'Create a new attachment' link (including
>>>> either the full script or a shortened one which demonstrates the
>>>> bug). Otherwise we're just shooting in the dark trying to diagnose
>>>> the problem.
>>>>
>>>> http://bugzilla.open-bio.org/
>>>>
>>>> chris
>

From cjfields at uiuc.edu Wed Apr 30 18:29:28 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 30 Apr 2008 13:29:28 -0500
Subject: [Bioperl-l] Fwd: Re: alignment by TCoffee as a subroutine
In-Reply-To:
References: <48186A55.4030406@sendu.me.uk> <358f4d650804300716j2a40360fsca340370e552d238@mail.gmail.com> <48189032.20102@sendu.me.uk> <5F24BE07-4085-4458-8A7D-178769BE6110@uiuc.edu>
Message-ID: <39A139E4-6783-41E6-8EE9-1FE60CB57577@uiuc.edu>

Sorry, didn't catch that...

chris

On Apr 30, 2008, at 12:39 PM, Sergei Ryazansky wrote:

> On Wed, 30 Apr 2008 21:11:56 +0400, Sergei Ryazansky
> wrote:
>
>> Oh, sorry, you're right - I read your message too quickly. I will do
>> it a little later.
>>
>>> Hmm, maybe you were confused? From my last email:
>>>
>>> "The best way to handle this is to file a bug report, attaching
>>> relevant data using the 'Create a new attachment' link (including
>>> either the full script or a shortened one which demonstrates the
>>> bug). Otherwise we're just shooting in the dark trying to diagnose
>>> the problem."
>>>
>>> http://bugzilla.open-bio.org/
>>>
>>> Anyone can work on fixing the issue there (so it'll probably get
>>> fixed faster). The devs can also track progress on the problem
>>> via the dev mail list (bioperl-guts). Diagnosing the bug may also
>>> reveal issues not just with Bio::Tools::Run::Alignment::TCoffee
>>> but also with other related modules.
>>>
>>> If needed I can post it to bugzilla, but it helps to submit the
>>> bug yourself (so you can receive posts on its progress).
>>>
>>> chris
>>>
>>> On Apr 30, 2008, at 11:40 AM, Sergei Ryazansky wrote:
>>>
>>>> On Wed, 30 Apr 2008 20:25:06 +0400, Chris Fields
>>>> wrote:
>>>>
>>>> Chris, I have already sent the file to Sendu, and I am also
>>>> attaching it here. I have removed the really unnecessary parts
>>>> from it.
>>>>
>>>>> Sergei,
>>>>>
>>>>> I agree with Sendu; we can't diagnose this unless we have either
>>>>> the entire script or a minimal version of it demonstrating the
>>>>> bug.
>>>>>
>>>>> The best way to handle this is to file a bug report, attaching
>>>>> relevant data using the 'Create a new attachment' link
>>>>> (including either the full script or a shortened one which
>>>>> demonstrates the bug). Otherwise we're just shooting in the dark
>>>>> trying to diagnose the problem.
>>>>>
>>>>> http://bugzilla.open-bio.org/
>>>>>
>>>>> chris
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign