From akarger at CGR.Harvard.edu Fri Jul 1 11:14:40 2005
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Fri Jul 1 11:04:01 2005
Subject: [Bioperl-l] why string overload is bad
Message-ID: <339D68B133EAD311971E009027DC47970321A7FC@montecarlo.cgr.harvard.edu>

> -----Original Message-----
> From: Ewan Birney [mailto:birney@ebi.ac.uk]
> Sent: Tuesday, June 28, 2005 9:57 AM
> To: Stefan Kirov
> Cc: Hilmar Lapp; Bioperl
> Subject: Re: [Bioperl-l] why string overload is bad
>
> >> Do people really want to go the route of string-overloading the
> >> annotation classes? To me it's really over the top and is a step
> >> backwards for ease of using the toolkit.
> >
> > Hilmar definitely has a point here.
>
> I have always been against string overloading. The subtlety of the bugs
> generated and non-obvious code paths (when Perl wants a number, does
> it go via the string-overloaded case...)
>
> I also (personally) think overloading in C++ is bad. I just think
> overloading is bad wherever.

I wouldn't say "wherever". For example, it's probably worth it for complex-number libraries, so that you don't have to call a "plus" function every time you want to add two values. (Especially because you need to overload + - * / % etc., so code will be MUCH more readable with the overloaded operators.)

That said, IMO it should be used only in cases that are clear wins, not for minor convenience or even slightly increased elegance. And it's better if the overloading is clearly defined and scoped, without side effects. It sounds like there are side effects here. String overloading is probably more side-effect-prone than numeric overloading, because you do fewer complicated things with numbers (math, conversion to boolean) than with strings (tons of Perl functions, not to mention m// and s///).
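To make the "non-obvious code path" concrete, here is a minimal sketch using a hypothetical class (StringyAnnotation is not a BioPerl module) that overloads only stringification. When bool is not overloaded itself, Perl's overload pragma derives boolean conversion from the string conversion, so an object whose string value is "0" tests false even though it is a perfectly valid object.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical annotation class that overloads only stringification.
package StringyAnnotation;
use overload
    '""'     => sub { $_[0]->{value} },   # "$obj" returns the stored value
    fallback => 1;                         # let Perl derive the other operators

sub new {
    my ($class, $value) = @_;
    return bless { value => $value }, $class;
}

package main;

my $ann = StringyAnnotation->new('0');

# bool is not overloaded, so Perl derives it from '""': the string "0" is false.
if ($ann) {
    print "annotation present\n";
}
else {
    print ref($ann), " object tests false in boolean context\n";
}
```

A caller writing `if ($ann) { ... }` to mean "is there an annotation?" would silently skip this one. If string overloading is kept anyway, one way to avoid this particular trap is to overload bool explicitly (for example `bool => sub { 1 }`), so object existence and string value are not conflated.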
-Amir Karger

From jason.stajich at duke.edu Fri Jul 1 14:26:15 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Jul 1 14:17:31 2005
Subject: [Bioperl-l] Getting nucleotide seq from protein accession
In-Reply-To: <42BAF1B7.10109@york.ac.uk>
References: <42BAF1B7.10109@york.ac.uk>
Message-ID: <6A814EB4-D5A4-4510-A8B7-4D67197A29DA@duke.edu>

Did you try the FAQ?

http://www.bioperl.org/Core/Latest/faq.html#Q5.4

On Jun 23, 2005, at 1:30 PM, Kat Hull wrote:

> Hi there,
> I was wondering whether anyone has a solution to my problem. I have
> a list of protein accession numbers and want to retrieve the
> corresponding nucleotide sequences automatically. I thought it
> would be possible to do this by changing the NCBI URL, but this
> doesn't seem to be the case.
> Is there a bio-perl module that can do this?
>
> Kind regards,
> Kat
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/

From dsam at ucsd.edu Fri Jul 1 06:51:50 2005
From: dsam at ucsd.edu (dsam@ucsd.edu)
Date: Fri Jul 1 14:27:50 2005
Subject: [Bioperl-l] Re: go-perl
Message-ID: <43846.66.91.255.236.1120215110.squirrel@acs-webmail.ucsd.edu>

Hello,

I tried to install BioPerl using CPAN, but I get quite a few failed tests.

I need BioPerl in order to install the GO Perl API. These are needed to calculate semantic similarity based on the Gene Ontology (http://www.cs.man.ac.uk/~phillord/semantic_sim.html).

I was wondering what each failed test means (e.g. simpleGOparser) and whether the failed tests can be ignored. Any insight would be greatly appreciated.
Below are the failed tests:

Thanks,
Daniel

/**************
CPAN
**************/

cpan> d /bioperl/
Distribution A/AL/ALLENDAY/bioperl-microarray-0.1.tar.gz
Distribution B/BI/BIRNEY/bioperl-0.05.1.tar.gz
Distribution B/BI/BIRNEY/bioperl-0.6.2.tar.gz
Distribution B/BI/BIRNEY/bioperl-0.7.0.tar.gz
Distribution B/BI/BIRNEY/bioperl-1.0.2.tar.gz
Distribution B/BI/BIRNEY/bioperl-1.0.tar.gz
Distribution B/BI/BIRNEY/bioperl-1.2.1.tar.gz
Distribution B/BI/BIRNEY/bioperl-1.2.2.tar.gz
Distribution B/BI/BIRNEY/bioperl-1.2.3.tar.gz
Distribution B/BI/BIRNEY/bioperl-1.2.tar.gz
Distribution B/BI/BIRNEY/bioperl-1.4.tar.gz
Distribution B/BI/BIRNEY/bioperl-db-0.1.tar.gz
Distribution B/BI/BIRNEY/bioperl-ext-1.4.tar.gz
Distribution B/BI/BIRNEY/bioperl-gui-0.7.tar.gz
Distribution B/BI/BIRNEY/bioperl-run-1.2.2.tar.gz
Distribution B/BI/BIRNEY/bioperl-run-1.4.tar.gz
Distribution B/BO/BOZO/Fry-Lib-BioPerl-0.15.tar.gz
Distribution C/CR/CRAFFI/Bundle-BioPerl-2.1.5.tar.gz
18 items found

cpan> install B/BI/BIRNEY/bioperl-1.4.tar.gz
...
...

t/Variation_IO...............FAILED tests 15, 20, 25
        Failed 3/25 tests, 88.00% okay
t/WABA.......................ok
t/XEMBL_DB...................ok
Failed Test         Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/BioFetch_DB.t                  27    2   7.41%  20-21
t/DB.t                           78    2   2.56%  30-31
t/EMBL_DB.t                      15    2  13.33%  13-14
t/Ontology.t         255 65280   50  100 200.00%  1-50
t/TreeIO.t                       41    1   2.44%  42
t/Variation_IO.t                 25    3  12.00%  15 20 25
t/simpleGOparser.t   255 65280   98  196 200.00%  1-98
121 subtests skipped.
Failed 7/179 test scripts, 96.09% okay. 156/8268 subtests failed, 98.11% okay.
make: *** [test_dynamic] Error 2
  /usr/bin/make test -- NOT OK
Running make install
  make test had returned bad status, won't install without force

cpan>

From skirov at utk.edu Fri Jul 1 14:46:54 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Fri Jul 1 14:38:03 2005
Subject: [Bioperl-l] Getting nucleotide seq from protein accession
In-Reply-To: <6A814EB4-D5A4-4510-A8B7-4D67197A29DA@duke.edu>
References: <42BAF1B7.10109@york.ac.uk> <6A814EB4-D5A4-4510-A8B7-4D67197A29DA@duke.edu>
Message-ID: <42C58F9E.9060204@utk.edu>

Yup, it's always useful to read the manual first. But it is not as much fun :-)

Stefan

Jason Stajich wrote:

> Did you try the FAQ?
>
> http://www.bioperl.org/Core/Latest/faq.html#Q5.4

From jason.stajich at duke.edu Fri Jul 1 14:48:26 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Jul 1 14:39:34 2005
Subject: [Bioperl-l] TreeIO::nhx doesn't write internal node labels
In-Reply-To: <174361574726.20050630205556@princeton.edu>
References: <174361574726.20050630205556@princeton.edu>
Message-ID:

Can you send your code and an example file as a bug in http://bugzilla.open-bio.org?
On Jun 30, 2005, at 12:55 PM, Georgii Bazykin wrote:

> Hi,
>
> I am new to BioPerl, and I am having trouble trying to save a tree in
> NHX format. I load a nexus tree and parse a PAUP log file ("branch
> linkages") to get internal node ids (I will then need to process
> character changes between internal nodes; this is why I need internal
> node ids). I then write the PAUP ids (which are numbers) as ids of the
> internal nodes of the tree, and write the tree in NHX format, hoping
> that the internal node labels will be preserved. But the resulting NHX
> file has only empty [&&NHX] labels and no internal node labels. Is
> this a feature, or am I doing something wrong?
>
> Please help!
>
> Yegor

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/

From cjm at fruitfly.org Fri Jul 1 15:30:44 2005
From: cjm at fruitfly.org (Chris Mungall)
Date: Fri Jul 1 15:23:31 2005
Subject: [Bioperl-l] Re: go-perl
In-Reply-To: <43846.66.91.255.236.1120215110.squirrel@acs-webmail.ucsd.edu>
References: <43846.66.91.255.236.1120215110.squirrel@acs-webmail.ucsd.edu>
Message-ID:

Hi Daniel

In general you should be wary of forcing an install if the tests fail.

However, in this case I can tell you that none of the failed tests are of any consequence for either go-db-perl (bioperl isn't required at all for go-perl) or the semantic similarity tool.

Cheers
Chris

On Fri, 1 Jul 2005 dsam@ucsd.edu wrote:

> Hello,
>
> I tried to install BioPerl using CPAN, but I get quite a few failed tests.
>
> I need BioPerl in order to install the GO Perl API.
> These are needed to calculate semantic similarity based on Gene
> Ontology (http://www.cs.man.ac.uk/~phillord/semantic_sim.html).
>
> I was wondering what does each failed test mean (e.g. simpleGOparser)
> and if the failed tests can be ignored.
From hlapp at gnf.org Fri Jul 1 16:04:15 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Jul 1 15:53:04 2005
Subject: [Bioperl-l] Re: go-perl
In-Reply-To: <43846.66.91.255.236.1120215110.squirrel@acs-webmail.ucsd.edu>
References: <43846.66.91.255.236.1120215110.squirrel@acs-webmail.ucsd.edu>
Message-ID: <9a98b045f729e7056728526cc2ada0fe@gnf.org>

You should probably upgrade to a snapshot from the CVS main trunk (in essence equivalent to a 1.5.x version) if you want to use Bioperl. As ChrisM said, bioperl is not required for go-perl. In fact, the next version of Bioperl will optionally depend on go-perl if you want .obo format support.

-hilmar

On Jul 1, 2005, at 3:51 AM, dsam@ucsd.edu wrote:

> Hello,
>
> I tried to install BioPerl using CPAN, but I get quite a few failed
> tests.
>
> I need BioPerl in order to install the GO Perl API.
> These are needed to calculate semantic similarity based on Gene
> Ontology (http://www.cs.man.ac.uk/~phillord/semantic_sim.html).
>
> I was wondering what does each failed test mean (e.g. simpleGOparser)
> and if the failed tests can be ignored.
> Any insight would be greatly appreciated.
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From sgoegel at gmail.com Fri Jul 1 18:27:44 2005
From: sgoegel at gmail.com (SG)
Date: Fri Jul 1 18:19:18 2005
Subject: [Bioperl-l] Download sequence annotations without sequence ??
In-Reply-To:
References:
Message-ID: <200507011727.45112.sgoegel@gmail.com>

I have scripts and modules set up that, for a given BLAST report, go through and download sequences (when not available locally) for certain subjects (hits) and extract information such as db_xref fields, Gene Ontology annotations, taxon IDs, and features. The one thing I am not using is the actual DNA or amino acid sequence itself. For large sequences such as genomic DNA, which can be several megabases or more, it is impractical to download the entire sequence, which I do not need.

My question is: does Bioperl currently have a way to download only the annotations/features associated with a sequence (in GenBank format, for example), but not the sequence itself? If NCBI does not currently offer a way to do that, all that would be necessary would be to terminate the connection with the server when the ORIGIN line is reached. Of course, that would limit it to one sequence per query, which is perfectly fine under the circumstances. For pipelined downloads (the default), the $/ input separator would have to be modified accordingly. I have done this, but I want to make sure it's not already a standard function of any part of Bioperl.
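A minimal sketch of the "stop at ORIGIN" idea, assuming one GenBank flat-file record per stream. read_genbank_header is a hypothetical helper, not a Bioperl function: it reads from any Perl filehandle (a file, a pipe, or a socket you opened yourself) and returns the text from LOCUS through FEATURES without reading the sequence block. Closing the underlying connection early, and handing the result to a parser, are left to the caller.

```perl
use strict;
use warnings;

# Hypothetical helper: return the annotation/feature part of one GenBank
# flat-file record, stopping as soon as the ORIGIN line is seen so the
# sequence block is never read from $fh.
sub read_genbank_header {
    my ($fh) = @_;
    my @lines;
    while (defined(my $line = <$fh>)) {
        last if $line =~ /^ORIGIN/;    # sequence data follows this line
        push @lines, $line;
    }
    return join '', @lines;
}

# Example: read a record from a file named on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print read_genbank_header($fh);
    # With a network stream, closing here would abandon the remaining data.
    close $fh;
}
```

Because the loop stops at the first ORIGIN, this only covers one record per stream; for pipelined multi-record downloads you would still need to skip ahead to the next `//` terminator (or change $/, as described above). Whether Bio::SeqIO's GenBank parser accepts a record truncated this way has not been checked here.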
Also, if Bioperl does not currently do this, is there interest in a patch to add this functionality (assuming I get around to making one)?

SG

From sac at portal.open-bio.org Fri Jul 1 16:56:07 2005
From: sac at portal.open-bio.org (Steve Chervitz)
Date: Sat Jul 2 12:15:34 2005
Subject: [Bioperl-l] Re: A question about Bioperl module
In-Reply-To: <1120147420.42c417dc5b242@webmail.pobox.upenn.edu>
Message-ID:

Hi Gao Zhang,

No, SeqPattern cannot generate random motifs, and I'm not aware of any modules in Bioperl that can do so (anyone else know?).

The String::Random module might be sufficient for your needs:

http://search.cpan.org/~steve/String-Random-0.20/Random.pm

Steve

> From:
> Date: Thu, 30 Jun 2005 12:03:40 -0400
> To:
> Subject: A question about Bioperl module
>
> Dear Steve Chervitz,
>
> Hi! This is Gao Zhang, a Ph.D. student in the Graduate Group
> in Genomics and Computational Biology at the University of
> Pennsylvania. I am working on discovery of motifs using
> DNA sequence and found that the Bio::Tools::SeqPattern module might be
> helpful for me.
>
> My question is whether it has any module which is
> able to generate a random motif of width w, like 8. In this motif,
> each position will have a dominant letter with probability around
> x, like 0.91.
>
> Thank you very much and look forward to your reply!
>
> Best Regards,
> Gao Zhang

From senger at ebi.ac.uk Sun Jul 3 08:50:17 2005
From: senger at ebi.ac.uk (Martin Senger)
Date: Sun Jul 3 08:42:22 2005
Subject: [Bioperl-l] pubmed article download and storing in object
Message-ID:

Hi,
My few cents regarding the Bio::Biblio module:

1) You are right that the documentation of the created Perl objects is poor. I will try to improve it.

2) The modules in Bio::Biblio are of two categories: the first ones get you XML from MEDLINE/PubMed (by default they get it from the EBI using SOAP).
And the second ones convert it: either to nothing, so you still have XML, or to Perl biblio objects (which are poorly documented, as mentioned above; blame me), or to a simple hash (with key names similar to those used in the Perl objects). I agree that it would be nice to have more outputs (like printed versions at various levels of detail).

3) The best way to see how the Bio::Biblio modules work is to check the script bioperl-live/scripts/biblio/biblio.PLS (try it with -h first). It uses all the methods, so you can use it directly or copy and paste the code into your own programs.

With regards,
Martin

--
Martin Senger
   EMBL Outstation - Hinxton                Senger@EBI.ac.uk
   European Bioinformatics Institute        Phone: (+44) 1223 494636
   Wellcome Trust Genome Campus             (Switchboard: 494444)
   Hinxton                                  Fax  : (+44) 1223 494468
   Cambridge CB10 1SD
   United Kingdom                           http://industry.ebi.ac.uk/~senger

From ylin9 at gel.ym.edu.tw Sun Jul 3 09:00:31 2005
From: ylin9 at gel.ym.edu.tw (Yu-Hsuan Lin)
Date: Sun Jul 3 11:32:06 2005
Subject: [Bioperl-l] Problem of installing bioperl-run-1.4
References: <002501c57d7e$68c66170$7a4e818c@sandy> <1120146046.42c4127e11197@webmail.duke.edu>
Message-ID: <000501c57fcf$32025a60$7a4e818c@sandy>

Thank you for your reply.

I can run EMBOSS programs directly by typing the program name in my home directory. This is what I get when I type echo $PATH:

/usr/lib/j2re1.5-sun/bin:/usr/lib/j2re1.5-sun:/var/local/sbin:/var/local/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games:/home/tools/EMBOSS-2.10.0:/home/tools/EMBOSS-2.10.0/emboss

But I still get the same error message when I type "make test" on the command line. Please help me with this problem. Thank you very much.

Vincent.

----- Original Message -----
From: "Jason Stajich"
To: "Yu-Hsuan Lin"
Cc:
Sent: Thursday, June 30, 2005 11:40 PM
Subject: Re: [Bioperl-l] Problem of installing bioperl-run-1.4

> did you make sure the EMBOSS bin directory is in your PATH?
>
> -jason
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
>
> Quoting "Yu-Hsuan Lin":
>
>> Hi, all,
>>
>> I have a problem installing bioperl-run-1.4. Because I want to use EMBOSS
>> programs within my bioperl script, I installed bioperl 1.4 and EMBOSS
>> (/home/tools/EMBOSS-2.10.0) on my debian linux system. When I type
>> "make test" on the command line, it says
>>
>> t/EMBOSS..................ok
>>         28/30 skipped: EMBOSS not installed locally or XML::Twig not installed
>>
>> I also installed XML::Twig and tried a symbolic link from /usr/local to
>> /home/tools/EMBOSS-2.10.0, but I still get the same message. Can anyone
>> kindly tell me how to solve this problem or where to find a solution?
>>
>> Thank you very much,
>>
>> Vincent,
>

From jason.stajich at duke.edu Sun Jul 3 11:43:04 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun Jul 3 11:34:12 2005
Subject: [Bioperl-l] Problem of installing bioperl-run-1.4
In-Reply-To: <000501c57fcf$32025a60$7a4e818c@sandy>
References: <002501c57d7e$68c66170$7a4e818c@sandy> <1120146046.42c4127e11197@webmail.duke.edu> <000501c57fcf$32025a60$7a4e818c@sandy>
Message-ID:

What do you see when you run the test individually? You will get more detailed error messages.

$ perl -I. -w t/EMBOSS.t

On Jul 3, 2005, at 9:00 AM, Yu-Hsuan Lin wrote:

>>
>> Quoting "Yu-Hsuan Lin":
>>
>>> Hi, all,
>>>
>>> I have a problem to install bioperl-run-1.4. Because I want to use EMBOSS
>>> program within my bioperl script, I installed bioperl 1.4 and EMBOSS
>>> ( /home/tools/EMBOSS-2.10.0 ) in my debian linux system.
--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/

From ylin9 at gel.ym.edu.tw Mon Jul 4 01:06:41 2005
From: ylin9 at gel.ym.edu.tw (Yu-Hsuan Lin)
Date: Mon Jul 4 01:30:27 2005
Subject: [Bioperl-l] Problem of installing bioperl-run-1.4
References: <002501c57d7e$68c66170$7a4e818c@sandy> <1120146046.42c4127e11197@webmail.duke.edu> <000501c57fcf$32025a60$7a4e818c@sandy>
Message-ID: <002c01c58056$2a305300$7a4e818c@sandy>

$ perl -I. -w t/EMBOSS.t
1..30
# Running under perl version 5.008006 for linux
# Current time local: Mon Jul 4 13:01:26 2005
# Current time GMT: Mon Jul 4 05:01:26 2005
# Using Test.pm version 1.25
ok 1
ok 2
ok 3 # skip EMBOSS not installed locally or XML::Twig not installed
ok 4 # skip EMBOSS not installed locally or XML::Twig not installed
ok 5 # skip EMBOSS not installed locally or XML::Twig not installed
ok 6 # skip EMBOSS not installed locally or XML::Twig not installed
ok 7 # skip EMBOSS not installed locally or XML::Twig not installed
ok 8 # skip EMBOSS not installed locally or XML::Twig not installed
ok 9 # skip EMBOSS not installed locally or XML::Twig not installed
ok 10 # skip EMBOSS not installed locally or XML::Twig not installed
ok 11 # skip EMBOSS not installed locally or XML::Twig not installed
ok 12 # skip EMBOSS not installed locally or XML::Twig not installed
ok 13 # skip EMBOSS not installed locally or XML::Twig not installed
ok 14 # skip EMBOSS not installed locally or XML::Twig not installed
ok 15 # skip EMBOSS not installed locally or XML::Twig not installed
ok 16 # skip EMBOSS not installed locally or XML::Twig not installed
ok 17 # skip EMBOSS not installed locally or XML::Twig not installed
ok 18 # skip EMBOSS not installed locally or XML::Twig not installed
ok 19 # skip EMBOSS not installed locally or XML::Twig not installed
ok 20 # skip EMBOSS not installed locally or XML::Twig not installed
ok 21 # skip EMBOSS not installed locally or XML::Twig not installed
ok 22 # skip EMBOSS not installed locally or XML::Twig not installed
ok 23 # skip EMBOSS not installed locally or XML::Twig not installed
ok 24 # skip EMBOSS not installed locally or XML::Twig not installed
ok 25 # skip EMBOSS not installed locally or XML::Twig not installed
ok 26 # skip EMBOSS not installed locally or XML::Twig not installed
ok 27 # skip EMBOSS not installed locally or XML::Twig not installed
ok 28 # skip EMBOSS not installed locally or XML::Twig not installed
ok 29 # skip EMBOSS not installed locally or XML::Twig not installed
ok 30 # skip EMBOSS not installed locally or XML::Twig not installed

I don't think I need to set $PATH for XML::Twig, should I? I installed XML::Twig with CPAN, and it is up to date.

Vincent.

----- Original Message -----
From: Jason Stajich
To: Yu-Hsuan Lin
Cc: bioperl-l@portal.open-bio.org
Sent: Sunday, July 03, 2005 11:43 PM
Subject: Re: [Bioperl-l] Problem of installing bioperl-run-1.4

What do you see when you run the test individually? You will get more detailed error messages.

$ perl -I. -w t/EMBOSS.t
From michael.watson at bbsrc.ac.uk Mon Jul 4 03:59:18 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Mon Jul 4 03:50:34 2005
Subject: [Bioperl-l] BLAST scores
Message-ID: <8975119BCD0AC5419D61A9CF1A923E950172D847@iahce2knas1.iah.bbsrc.reserved>

On another note, I was parsing BLAST output using Bio::SearchIO and found it took ages, so I switched to BPLite and my parsing took about a tenth of the time. You may want to try it :-)

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Josh Lauricha
Sent: 30 June 2005 22:03
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] BLAST scores

Not really a BioPerl question, but...

I ran a bunch of BLASTs using the tabular output. However, I need the score, and apparently the tabular output doesn't report it. The reason I'm using the tabular format is to speed up parsing, since that was taking more than half the CPU time...

Anyhow, is there any way to compute the score from the e-value and/or bit scores? Or am I stuck rerunning all those BLASTs?

Thanks

--
 ------------------------------------------------------
| Josh Lauricha             | Ford, you're turning     |
| laurichj@bioinfo.ucr.edu  | into a penguin. Stop     |
| Bioinformatics, UCR       | it                       |
|----------------------------------------------------|
| OpenPGP:                                           |
| 4E7D 0FC0 DB6C E91D 4D7B C7F3 9BE9 8740 E4DC 6184  |
|----------------------------------------------------|

From Marc.Logghe at devgen.com Mon Jul 4 05:43:08 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Mon Jul 4 05:34:29 2005
Subject: [Bioperl-l] SeqWithQuality and biosql
Message-ID: <0C528E3670D8CE4B8E013F6749231AA62F53C7@ANTARESIA.be.devgen.com>

Hi all,
I am currently exploring the possibility of storing a Bio::Seq::SeqWithQuality object in biosql.
Has anyone ever tried this?
One possibility would be to
1) split the Bio::Seq::SeqWithQuality object into a plain Bio::Seq::RichSeq and a Bio::Seq::PrimaryQual
2) store them separately in biosql, in different namespaces
3) link them with a relation term
4) write a custom adaptor to fetch the persistent objects from biosql and reconstruct the Bio::Seq::SeqWithQuality

Does that make sense? Any other suggestions/possibilities?
As a test I tried to load a Bio::Seq::PrimaryQual into biosql using load_seqdatabase.pl, but it fails because Bio::Seq::PrimaryQual does not have a namespace method.
I hope I'm wrong, but I have the impression there is a long way to go ;-)

Marc

From heikki at ebi.ac.uk Mon Jul 4 12:15:20 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Mon Jul 4 12:06:42 2005
Subject: [Bioperl-l] SeqWithQuality and biosql
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA62F53C7@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA62F53C7@ANTARESIA.be.devgen.com>
Message-ID: <200507041715.20558.heikki@ebi.ac.uk>

Marc,

I have not actually talked about this with Chad, but I've had a long-standing plan to refactor Bio::Seq::SeqWithQuality to inherit from Bio::Seq::MetaI. It does not at the moment because Chad was there first.
Some time later there were other needs to attach meta information to residues, and to avoid having several implementations in bioperl I wrote Bio::Seq::MetaI and its implementation classes.

I do not know if there are any reasons why Bio::Seq::SeqWithQuality could not be a Bio::Seq::MetaI, but it would be a good thing to explore that, and to implement only one very generic way to store residue-based meta values in biosql.

-Heikki

On Monday 04 July 2005 10:43, Marc Logghe wrote:
> Hi all,
> I am currently exploring the possibility to store a
> Bio::Seq::SeqWithQuality object in biosql.
--
______ _/ _/_____________________________________________________
      _/ _/                      http://www.ebi.ac.uk/mutations/
     _/ _/ _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/ _/ _/  Wellcome Trust Genome Campus, Hinxton
  _/ _/ _/  Cambridge, CB10 1SD, United Kingdom
     _/  Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From heikki at ebi.ac.uk Tue Jul 5 10:32:47 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Tue Jul 5 10:26:01 2005
Subject: [Bioperl-l] Bio::Tree::Compatible, Bio::Tree::Draw::Cladogram
Message-ID: <200507051532.48179.heikki@ebi.ac.uk>

Gabriel,

While testing bioperl module SYNOPSIS sections for runnability, I found that there are two modules in bioperl-live that have external dependencies that are not in Makefile.PL:

Bio::Tree::Compatible
  Testing compatibility of phylogenetic trees with nested taxa.
  depends on Set::Scalar

Bio::Tree::Draw::Cladogram
  Drawing phylogenetic trees in Encapsulated PostScript (EPS) format.
  depends on PostScript::TextBlock

They are both yours.

I have not been that active on the mailing list lately, so I searched the list for a discussion of these new modules. I started getting a bit alarmed that there was none: no emails ever to the bioperl mailing list from you. Finally, I checked the t (test) directory, and there were no tests for these modules.

Could we have that discussion now and, hopefully at the end of the discussion, add the dependencies to the Makefile.PL? In a project this big, we have to keep each other informed so that we can keep all parts of bioperl functional and avoid confusing and alienating users.
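For the Makefile.PL question, one generic pattern (a sketch, not bioperl's actual Makefile.PL mechanism) is to probe optional modules at runtime with `require` inside `eval`. The same check lets a test script skip gracefully, the way t/EMBOSS.t skips when EMBOSS or XML::Twig is missing. missing_modules is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of modules that cannot be loaded.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        # Turn Foo::Bar into Foo/Bar.pm so require() treats it as a file path
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

# The two external dependencies named in the message above.
my @missing = missing_modules('Set::Scalar', 'PostScript::TextBlock');
print @missing
    ? "Missing optional modules: @missing\n"
    : "All optional modules found\n";
```

Converting the package name to a file path matters: `require $string` with a bare "Set::Scalar" would look for a file literally named `Set::Scalar`, not for `Set/Scalar.pm`.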
Where do these modules come from? What functionality do they add? Is the namespace used correct? Could we see test code that demonstrates the functionality? Is there something else that you are planning to do?

Yours,

-Heikki, who feels that he is probably overreacting ;-) ... so do not take it personally

From jinsun at indiana.edu Tue Jul 5 10:05:15 2005
From: jinsun at indiana.edu (jinsun@indiana.edu)
Date: Tue Jul 5 10:47:48 2005
Subject: [Bioperl-l] [Bioperl-guts-l] a question of retrieval information
Message-ID: <89DD54D7-C124-4BB2-8476-472AF90EC92F@indiana.edu>

To whom it may concern:

I am trying to write a Perl program using bioperl to retrieve information from the NCBI website. The purpose of the program is to get a protein's annotation from its gi number. For example, given the gi number 16128448, I would get

acrAB operon repressor [Escherichia coli K12] gi|16128448|ref|NP_414997.1| [16128448]

I wrote this bioperl code:

use Bio::DB::GenBank;

$gb = new Bio::DB::GenBank;
$seqobj = $gb->get_Seq_by_gi('16128448');
$ann_coll = $seqobj->annotation;

for $ann ($ann_coll->get_Annotations) {
  print "Features: ",$ann->as_text if ($ann->tagname eq "features");
  print "Comment: ",$ann->as_text if ($ann->tagname eq "comment");
  print "Title: ",$ann->as_text if ($ann->tagname eq "title");
  print "Organism: ",$ann->as_text if ($ann->tagname eq "organism");
  print "Definition: ",$ann->as_text if ($ann->tagname eq "definition");
}

It does not work. For some gi numbers I cannot get $seqobj; for other gi numbers I can get $seqobj but no annotation.
Could you please help me get information from NCBI with a program? Thank you.

Jingjun Sun

----- End forwarded message -----

_______________________________________________
Bioperl-guts-l mailing list
Bioperl-guts-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l

From johan.viklund at gmail.com Tue Jul 5 11:01:40 2005
From: johan.viklund at gmail.com (Johan Viklund)
Date: Tue Jul 5 10:52:43 2005
Subject: [Bioperl-l] bioperl-db: exporting data
Message-ID: <5e924f0a05070508012bbb63d3@mail.gmail.com>

Hi

I'm trying to add COG annotations from Entrez Gene to sequences (from RefSeq, in GenBank format) that I have in a BioSQL database (on MySQL). The problem is that I can't get them out again with the bioentry2flat.pl script (the bioentries appear without what I've added).

I don't use bioperl for this (I've got ~40000 COG annotations, linked to GeneIDs). Instead I add them to the seqfeature_qualifier_value table, similar to the way GeneIDs are represented (as far as I've figured out), with the term_id corresponding to db_xref and the same seqfeature_id as the GeneID. For rank I've tried a few different variations, but none seem to work (the first free rank larger than the GeneID's, and 1).

How should I add this annotation to the database so that it gets exported when I use bioperl?

I also have another question: what is rank for?

--
Johan Viklund
E-mail:

From heikki at ebi.ac.uk Tue Jul 5 11:06:10 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Tue Jul 5 10:56:53 2005
Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] a question of retrieval information
Message-ID: <200507051606.10387.heikki@ebi.ac.uk>

---------- Forwarded Message ----------

Subject: [Bioperl-guts-l] a question of retrieval information
Date: Tuesday 05 July 2005 15:05
From: jinsun@indiana.edu
To: bioperl-guts-l@bioperl.org

To whom it may concern:

I am trying to write a Perl program using bioperl to retrieve information from the NCBI website.
The purpose of the program is to get a protein's annotation from its gi number. For example, given the gi number 16128448, I would get

acrAB operon repressor [Escherichia coli K12] gi|16128448|ref|NP_414997.1|[16128448]

I wrote this bioperl code:

use Bio::DB::GenBank;

$gb = new Bio::DB::GenBank;
$seqobj = $gb->get_Seq_by_gi('16128448');
$ann_coll = $seqobj->annotation;

for $ann ($ann_coll->get_Annotations) {
  print "Features: ",$ann->as_text if ($ann->tagname eq "features");
  print "Comment: ",$ann->as_text if ($ann->tagname eq "comment");
  print "Title: ",$ann->as_text if ($ann->tagname eq "title");
  print "Organism: ",$ann->as_text if ($ann->tagname eq "organism");
  print "Definition: ",$ann->as_text if ($ann->tagname eq "definition");
}

It does not work. For some gi numbers I cannot get $seqobj; for other gi numbers I can get $seqobj but no annotation.

Could you please help me get information from NCBI with a program? Thank you.

Jingjun Sun

----- End forwarded message -----

-------------------------------------------------------

From wackattack at gmail.com Tue Jul 5 03:23:04 2005
From: wackattack at gmail.com (Wacki)
Date: Tue Jul 5 10:59:21 2005
Subject: [Bioperl-l] Problems with Bioperl graphics
Message-ID: <2b8a4eeb05070500235eb437f8@mail.gmail.com>

I followed the tutorial here:

http://bioperl.org/HOWTOs/Graphics-HOWTO/gettingStarted.html

And ran this exact
code:

http://biokdd.informatics.indiana.edu/jnowacki/render_blast1.txt

The image produced is shown here:

http://biokdd.informatics.indiana.edu/jnowacki/test.png

It doesn't show the names of the hits. What is wrong? The code is exactly the same as the tutorial's, is it not?

code:

#!/usr/bin/perl

# This is code example 2 in the Graphics-HOWTO
use strict;
use lib '/home/lstein/projects/bioperl-live';
use Bio::Graphics;
use Bio::SeqFeature::Generic;

my $panel = Bio::Graphics::Panel->new(-length    => 1000,
                                      -width     => 800,
                                      -pad_left  => 10,
                                      -pad_right => 10,
                                     );
my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
$panel->add_track($full_length,
                  -glyph   => 'arrow',
                  -tick    => 2,
                  -fgcolor => 'black',
                  -double  => 1,
                 );

my $track = $panel->add_track(-glyph     => 'graded_segments',
                              -label     => 4,
                              -bgcolor   => 'blue',
                              -min_score => 0,
                              -max_score => 1000);

while (<>) { # read blast file
  chomp;
  next if /^\#/;  # ignore comments
  my($name,$score,$start,$end) = split /\t+/;
  my $feature = Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,
                                              -start=>$start,-end=>$end);
  $track->add_feature($feature);
}

binmode(STDOUT);
print $panel->png;

From crabtree at tigr.ORG Tue Jul 5 11:23:22 2005
From: crabtree at tigr.ORG (Crabtree, Jonathan)
Date: Tue Jul 5 11:16:26 2005
Subject: [Bioperl-l] Problems with Bioperl graphics
Message-ID:

One difference between your code and the tutorial is that you've set -label to 4 in your call to add_track(); in the tutorial this parameter is set to 1. Try changing the 4 to 1 and see what happens.
Another difference is that you're using the 'graded_segments' glyph instead of the 'generic' glyph (I don't think this should matter, but you were asking whether your code differs from that in the tutorial ;)

Jonathan

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Wacki
> Sent: Tuesday, July 05, 2005 3:23 AM
> To: bioperl-l@bioperl.org
> Subject: [Bioperl-l] Problems with Bioperl graphics
>
> I followed the tutorial here:
>
> http://bioperl.org/HOWTOs/Graphics-HOWTO/gettingStarted.html
>
> And ran this exact code:
>
> http://biokdd.informatics.indiana.edu/jnowacki/render_blast1.txt
>
> The image produced is shown here:
>
> http://biokdd.informatics.indiana.edu/jnowacki/test.png
>
> It doesn't show the names of the hits. What is wrong? The
> code is exactly the same as the tutorial's, is it not?
>
> code:
>
> #!/usr/bin/perl
>
> # This is code example 2 in the Graphics-HOWTO
> use strict;
> use lib '/home/lstein/projects/bioperl-live';
> use Bio::Graphics;
> use Bio::SeqFeature::Generic;
>
> my $panel = Bio::Graphics::Panel->new(-length    => 1000,
>                                       -width     => 800,
>                                       -pad_left  => 10,
>                                       -pad_right => 10,
>                                      );
> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
> $panel->add_track($full_length,
>                   -glyph   => 'arrow',
>                   -tick    => 2,
>                   -fgcolor => 'black',
>                   -double  => 1,
>                  );
>
> my $track = $panel->add_track(-glyph     => 'graded_segments',
>                               -label     => 4,
>                               -bgcolor   => 'blue',
>                               -min_score => 0,
>                               -max_score => 1000);
>
> while (<>) { # read blast file
>   chomp;
>   next if /^\#/;  # ignore comments
>   my($name,$score,$start,$end) = split /\t+/;
>   my $feature = Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,
>                                               -start=>$start,-end=>$end);
>   $track->add_feature($feature);
> }
>
> binmode(STDOUT);
> print $panel->png;

From heikki at ebi.ac.uk Tue Jul 5 11:32:33 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Tue Jul 5 11:24:55 2005
Subject: [Bioperl-l] [Bioperl-guts-l] a question of retrieval information
In-Reply-To: <89DD54D7-C124-4BB2-8476-472AF90EC92F@indiana.edu>
References: <89DD54D7-C124-4BB2-8476-472AF90EC92F@indiana.edu>
Message-ID: <200507051632.33248.heikki@ebi.ac.uk>

Jinsun,

Your code works for me. It retrieves the sequence text file, creates the objects and prints out the comment.

Are you sure the other gi numbers are valid? Try them on Entrez first to see the entry.

There are no annotations with names like 'features', 'title' or 'definition'. These are attributes of the sequence object itself. Try

  $seqobj->id
  $seqobj->desc
  $seqobj->species
  $seqobj->all_SeqFeatures

Some of these return strings, some objects or arrays of objects.

These are good places to start learning how bioperl works:

http://bio.perl.org/HOWTOs/
http://bio.perl.org/Core/Latest/faq.html

Yours,

-Heikki

P.S. Do not post to the guts mailing list. It is only for automatically generated reports.

On Tuesday 05 July 2005 15:05, jinsun@indiana.edu wrote:
> To whom it may concern:
>
> I am trying to write a Perl program using bioperl to retrieve
> information from the NCBI website. The purpose of the program is to get
> a protein's annotation from its gi number. For example, given the gi
> number 16128448, I would get acrAB operon repressor [Escherichia coli
> K12] gi|16128448|ref|NP_414997.1| [16128448].
>
> I wrote this bioperl code:
>
> use Bio::DB::GenBank;
>
> $gb = new Bio::DB::GenBank;
> $seqobj = $gb->get_Seq_by_gi('16128448');
> $ann_coll = $seqobj->annotation;
>
> for $ann ($ann_coll->get_Annotations) {
>   print "Features: ",$ann->as_text if ($ann->tagname eq "features");
>   print "Comment: ",$ann->as_text if ($ann->tagname eq "comment");
>   print "Title: ",$ann->as_text if ($ann->tagname eq "title");
>   print "Organism: ",$ann->as_text if ($ann->tagname eq "organism");
>   print "Definition: ",$ann->as_text if ($ann->tagname eq "definition");
> }
>
> It does not work. For some gi numbers I cannot get $seqobj; for
> other gi numbers I can get $seqobj but no annotation.
>
> Could you please help me get information from NCBI with a
> program? Thank you.
>
> Jingjun Sun
>
> ----- End forwarded message -----

From chad at dieselwurks.com Tue Jul 5 13:41:27 2005
From: chad at dieselwurks.com (Chad Matsalla)
Date: Tue Jul 5 13:32:35 2005
Subject: [Bioperl-l] SeqWithQuality and biosql
In-Reply-To: <200507041715.20558.heikki@ebi.ac.uk>
References: <0C528E3670D8CE4B8E013F6749231AA62F53C7@ANTARESIA.be.devgen.com> <200507041715.20558.heikki@ebi.ac.uk>
Message-ID:

On Mon, 4 Jul 2005,
Heikki Lehvaslaiho wrote:

> I have not actually talked about this with Chad, but I've had a long-time plan
> to refactor Bio::Seq::SeqWithQuality to inherit from Bio::Seq::MetaI. It does
> not at the moment because Chad was there first.

Ha! I won! I remember doing a victory dance.

> Some time later there were some other needs to attach meta information
> to residues and to avoid having several implementations in bioperl I
> wrote Bio::Seq::MetaI and its implementation classes.

So how can I help with the retrofit? Is my help necessary?

> I do not know if there are any issues why Bio::Seq::SeqWithQuality could not
> be Bio::Seq::MetaI, but it would be a good thing to explore that, and implement
> only one very generic way to store residue-based meta values in biosql.

This sounds good. I'm willing to help as much as I can.

Chad Matsalla

From hlapp at gnf.org Tue Jul 5 14:55:10 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jul 5 14:43:40 2005
Subject: [Bioperl-l] Re: SeqWithQuality and biosql
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA62F53D1@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA62F53D1@ANTARESIA.be.devgen.com>
Message-ID: <4672e7ad470df9973b998dd1188db923@gnf.org>

(I don't think posting to bioperl was a mistake, so I'm including it here again)

I think I like Mark's proposal best, i.e., the fundamental model of at most one sequence for each bioentry (e.g., Bio::SeqI object) is left intact, and the problem is reformulated as how to encode/decode sequences from alphabet cross-products as strings. Encoding/decoding wouldn't be difficult to implement, even such that the encoded string is still human-readable.
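A minimal sketch of such a human-readable encoding, using the slash-delimited form Hilmar proposes later in this same message ('A/22 T/30 ...'): each residue is joined to its quality value with a slash, and positions are separated by spaces. The function names are hypothetical; this is not an existing Bioperl or bioperl-db API.

```perl
use strict;
use warnings;

# Hypothetical encoder: one residue and one quality value per position,
# written as 'A/22 T/30 ...' (slash between operands, space between positions).
sub encode_seq_qual {
    my ($seq, $quals) = @_;
    my @residues = split //, $seq;
    die "sequence and quality lengths differ\n" unless @residues == @$quals;
    return join ' ', map { "$residues[$_]/$quals->[$_]" } 0 .. $#residues;
}

# Hypothetical decoder: the inverse of encode_seq_qual.
sub decode_seq_qual {
    my ($encoded) = @_;
    my $seq = '';
    my @quals;
    for my $token (split ' ', $encoded) {
        my ($residue, $qual) = split m{/}, $token, 2;
        $seq .= $residue;
        push @quals, $qual;
    }
    return ($seq, \@quals);
}

print encode_seq_qual('ATAGC', [22, 30, 32, 35, 35]), "\n";   # A/22 T/30 A/32 G/35 C/35
```

The decoder only works as long as neither the slash nor the space occurs in any of the alphabets, which is the condition Hilmar states for the delimiter.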
Biojava has a natural provision for doing this (SymbolTokenizer?), but Bioperl does not: in Bioperl the object model assumes that the sequence is a flat string, and the alphabet is also a flat string; there is no object you could ask to provide you with an encoder/decoder appropriate for either the alphabet or the type of sequence object. I'd like to hear some feedback from the Bioperl folks as to whether you'd consider this capability a generally useful addition to Bioperl. (It could be designed in a number of ways, ranging from more intrusive to completely neutral: e.g., adding this as a method to SeqI [like $seq->seq_encoder()], making $seq->alphabet() return an object with this and other capabilities, or creating a separate factory class that would return the appropriate encoder known to [or registered with] it based on a given alphabet and type of sequence object.)

As for Bio::Seq::MetaI, this could certainly be the interface for SeqWithQuality, but it wouldn't solve the de/serialization problem. Also, at least conceptually, MetaI-derived classes could represent multi-dimensional meta-information, right? That is, the problem of how to encode/decode the meta-information isn't trivial or restricted to two dimensions here either.

As for creating a specialized adaptor in Bioperl-db, that would certainly work too and would most likely be the fastest way to get something that works. In the long term, however, it would solve the problem only for SeqWithQuality, and not the more general problem of how to store sequences that are based on cross-product alphabets.
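The third option Hilmar lists, a separate factory that hands out the encoder registered for a given alphabet, might look roughly like the sketch below. The package name SeqCodecFactory, the alphabet keys and the codec layout are all hypothetical, not Bioperl API. For variety, the 'dna+quality' codec uses the 2-byte-pair layout Richard Holland suggests further down in this thread (one ASCII byte for the base, one unsigned 8-bit byte for the quality).

```perl
use strict;
use warnings;

# Hypothetical factory: maps an alphabet name to an encode/decode pair.
package SeqCodecFactory;

my %registry;

sub register {
    my ($class, $alphabet, %codec) = @_;
    die "need both encode and decode\n" unless $codec{encode} && $codec{decode};
    $registry{$alphabet} = \%codec;
    return;
}

sub codec_for {
    my ($class, $alphabet) = @_;
    my $codec = $registry{$alphabet}
        or die "no codec registered for alphabet '$alphabet'\n";
    return $codec;
}

package main;

# Plain DNA: the string is stored unchanged.
SeqCodecFactory->register('dna',
    encode => sub { $_[0] },
    decode => sub { $_[0] },
);

# DNA x quality as 2-byte pairs: one ASCII byte for the base,
# one unsigned 8-bit byte (0..255) for the quality.
SeqCodecFactory->register('dna+quality',
    encode => sub { pack '(a C)*', map { @$_ } @{ $_[0] } },
    decode => sub {
        my @flat = unpack '(a C)*', $_[0];
        my @pairs;
        push @pairs, [ splice @flat, 0, 2 ] while @flat;
        return \@pairs;
    },
);

my $codec = SeqCodecFactory->codec_for('dna+quality');
my $bytes = $codec->{encode}->([ ['A', 22], ['T', 30] ]);
printf "%d bytes\n", length $bytes;    # 4 bytes
```

One design caveat: quality bytes below 32 are control characters, which may matter if the sequence column is a text (CLOB) type, whereas the slash-delimited form stays printable.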
BTW if you do implement a specialized adaptor, then instead of storing two bioentries and connecting them, you might as well implement the sequence encoding/decoding for this particular object in the adaptor. You'd gain speed, because instead of increasing the number of database operations you'd spend a couple more CPU cycles in Perl code, and you wouldn't be burdened with two bioentries that aren't coupled by a foreign key constraint.

As for a consensus on how to encode sequence with quality values, I'd include a delimiter between the alphabet operands in the cross-product, e.g. using slash as the delimiter: 'A/22 T/30 A/32 G/35 C/35'. This can easily be extended to multi-dimensional cross-products, so long as the delimiter between them isn't a symbol in any of the alphabets.

-hilmar

On Jul 5, 2005, at 12:39 AM, Marc Logghe wrote:

> Thanks for the feedback.
> Good to know I am not alone in this ;-)
> I totally agree with Mark that there should be a kind of consensus on
> how to store this in Bio*.
> Yesterday I mistakenly posted my original mail to the bioperl list.
> Heikki responded to that; it might be a good starting point but I am not
> familiar with it:
> http://portal.open-bio.org/pipermail/bioperl-l/2005-July/019271.html
> So much for the long-term solution.
> In the short term, to have at least something that works, I'll experiment a
> little with storing separate objects. I remember one of Hilmar's
> presentations, where he gave the example of making an adaptor
> and storing 2 sequence objects that interacted with each other as a
> result of a Two Hybrid experiment in yeast.
> Cheers,
> Marc
>
>>
>> I'd think storing it in BioSQL as 2-byte pairs would be good.
>> First byte is the base (an ASCII character), second byte is
>> the quality (an 8-bit integer). Sure it wastes a few bits but
>> so does normal DNA...
>>
>> Richard Holland
>> Bioinformatics Specialist
>> GIS extension 8199
>>
>>> -----Original Message-----
>>> From: biosql-l-bounces@portal.open-bio.org
>>> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of
>>> mark.schreiber@novartis.com
>>> Sent: Tuesday, July 05, 2005 1:44 PM
>>> To: Marc Logghe
>>> Cc: biosql-l-bounces@portal.open-bio.org; biosql-l@open-bio.org
>>> Subject: Re: [BioSQL-l] FW: SeqWithQuality and biosql
>>>
>>> Hello -
>>>
>>> I was wondering about similar issues with biojava. As you may (or may
>>> not) know, biojava can make sequences from symbols in any alphabet; two
>>> examples are DNA and the integer alphabet (a collection of Symbols
>>> that are integers). Biojava can also make compound alphabets; one such
>>> example is the Phred alphabet, which is the multiplication of DNA x
>>> Integer (technically a subset of Integer from 0 to 99).
>>>
>>> Because sequence in BioSQL is stored in a CLOB, if you can encode your
>>> SeqWithQuality as a String of characters you can store it. With the case
>>> above (which is probably similar to yours) you would need 400 distinct
>>> characters to store it, which is too many for ASCII but could be done
>>> in Unicode. The downside is that your persistence layer needs to know how to
>>> encode and decode your SeqWithQuality. I'm not familiar with how BioPerl
>>> would do this. BioJava would need to implement a SymbolTokenizer for
>>> the alphabet, and then persistence would happen automatically (assuming
>>> your DB is OK with Unicode).
>>> An alternative would be to make a tokenizer that uses more than
>>> single-character tokens for encoding (e.g. A23 G40 T34 C22 etc).
>>>
>>> The alternative you suggest, storing two sequences with a
>>> relationship, is also nice (because you can retrieve each part
>>> separately) but also requires your persistence layer to know about it.
>>> However, it has big disadvantages because they are not strongly tied
>>> to each other. If you manipulate one you might invalidate the other.
>>> Also, if you delete one, the other will probably not be deleted in a
>>> cascade.
>>>
>>> Not sure if any of this helps, but a consensus on how to store this
>>> kind of information would be good so the bio* projects do it the same
>>> way. Consensus in this case will probably mean whatever the first
>>> implementation is.
>>>
>>> - Mark
>>>
>>> "Marc Logghe"
>>> Sent by: biosql-l-bounces@portal.open-bio.org
>>> 07/04/2005 05:56 PM
>>>
>>> To:
>>> cc: (bcc: Mark Schreiber/GP/Novartis)
>>> Subject: [BioSQL-l] FW: SeqWithQuality and biosql
>>>
>>> Apologies for cross-posting, I had picked the wrong mail address :-(
>>>
>>> -----Original Message-----
>>> From: Marc Logghe
>>> Sent: Monday, July 04, 2005 11:43 AM
>>> To: bioperl-l@portal.open-bio.org
>>> Subject: SeqWithQuality and biosql
>>>
>>> Hi all,
>>> I am currently exploring the possibility to store a
>>> Bio::Seq::SeqWithQuality object in biosql.
>>> Has anyone ever tried this ?
>>> One possibility would be to
>>> 1) split up the Bio::Seq::SeqWithQuality object into a plain
>>> Bio::Seq::RichSeq and a Bio::Seq::PrimaryQual
>>> 2) store them separately in biosql; different namespaces
>>> 3) link them with a relation term.
>>> 4) make a custom adaptor to fetch the persistent objects from biosql
>>> and reconstruct the Bio::Seq::SeqWithQuality
>>>
>>> Does that make sense ? Any other suggestions/possibilities ?
>>> As a test I tried to load a Bio::Seq::PrimaryQual in biosql using the
>>> load_seqdatabase.pl but it fails because Bio::Seq::PrimaryQual does
>>> not have a namespace method.
>>> I hope I'm wrong but I have the impression there is a long way to go
>>> ;-)
>>>
>>> Marc
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l@open-bio.org
>>> http://open-bio.org/mailman/listinfo/biosql-l

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org Tue Jul 5 23:29:30 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jul 5 23:18:18 2005
Subject: [Bioperl-l] Re: SeqWithQuality and biosql
In-Reply-To:
References:
Message-ID: <12d0914aa33fca6d2e5175ddf85cd0d4@gnf.org>

On Jul 5, 2005, at 7:55 PM, mark.schreiber@novartis.com wrote:

> I would propose the
> following for compound alphabets...
>
> (aca)(gtc) for codon alphabets.
> (g17)(t40) for quality type alphabets.

In your convention, wouldn't this need to be

(g(17))(t(40))

Otherwise you'd have trouble representing higher-dimensional cross-products, unless you alternate chars and digits, which would be a useless restriction.

-hilmar

From hlapp at gmx.net Wed Jul 6 03:47:21 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Jul 6 03:43:02 2005
Subject: [Bioperl-l] bioperl-db: exporting data
In-Reply-To: <5e924f0a05070508012bbb63d3@mail.gmail.com>
References: <5e924f0a05070508012bbb63d3@mail.gmail.com>
Message-ID:

The way you're describing doesn't sound too far off. The rank is an ordering index as well as a component of the unique key constraint, i.e., you can't have two seqfeature qualifier values for the same feature and tag name unless the rank is different.

Have you convinced yourself that you can log in to the database and retrieve those additions by hand (using SQL)? Can you reduce this to a test case where you load a single sequence record, then issue SQL to add your custom annotation, and then retrieve the record again? Email me the entry you loaded, the SQL statements you issued, and the entry you got out.

-hilmar

On Jul 5, 2005, at 8:01 AM, Johan Viklund wrote:

> Hi
>
> I'm trying to add COG annotations from Entrez Gene to sequences (from
> RefSeq, in GenBank format) that I have in a BioSQL database (on MySQL).
> The problem is that I can't get them out again with the bioentry2flat.pl
> script (the bioentries appear without what I've added).
>
> I don't use bioperl for this (I've got ~40000 COG annotations, linked
> to GeneIDs). Instead I add them to the seqfeature_qualifier_value table,
> similar to the way GeneIDs are represented (as far as I've figured out),
> with the term_id corresponding to db_xref and the same seqfeature_id as
> the GeneID. For rank I've tried a few different variations, but none
> seem to work (the first free rank larger than the GeneID's, and 1).
>
> How should I add this annotation to the database so that it gets
> exported when I use bioperl?
>
> I also have another question: what is rank for?
> > -- > Johan Viklund > E-mail: > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From mark.schreiber at novartis.com Tue Jul 5 22:55:40 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Wed Jul 6 09:44:41 2005 Subject: [Bioperl-l] Re: SeqWithQuality and biosql Message-ID: The BioJava SymbolTokenizer can either tokenize to characters or Strings. Obviously not all alphabets can sensibly tokenize to characters (eg large compound alphabets). Currently by default it would tokenize a compound symbol to its compound names. For example the codon ACA would be (adenosine cytosine adenosine) This is obviously not ideal for a database and it can easily be changed in biojava without breaking things (to be honest, tokenization of compound alphas in biojava is not a common task at all). I would propose the following for compound alphabets... (aca)(gtc) for codon alphabets. (g17)(t40) for quality type alphabets. I like the use of brakets because it is possible in biojava to do something like this ((DNAxDNAxDNA)xPROTEIN) which would represent an alignement of codons with their amino acids or even ((DNAxDNA)x(DNAxDNAxDNA)), which I'm not sure you would ever use but their might be a good reason for it. The brackets help to disambiguate better than spaces would. For example ((ctc)S) for the first example or, ((atg)(gc)) for the second example. To make this work there also needs to be a uniform way to store the alphabet name in the sequence table. The above examples show how biojava constructs alphabet names but there maybe (probably are) better ways. 
For quality information you could use (DNAxINTEGER), techincally the biojava name would be (DNAxSubIntegerAlphabet[0..99]). Of course you don't have to use this convention and aliasing would be nice (eg the 'official' name for INTEGER in BioJava would be 'Alphabet of all integers' which is a bit long winded!) - Mark Hilmar Lapp 07/06/2005 02:55 AM To: "Marc Logghe" cc: Mark Schreiber/GP/Novartis@PH, Bioperl , OBDA BioSQL , Richard HOLLAND Subject: Re: SeqWithQuality and biosql (I don't think posting to bioperl was a mistake, so I'm including it here again) I think I like Mark's proposal best, i.e., the fundamental model of at most one sequence for each bioentry (e.g., Bio::SeqI object) is left intact, and the problem is reformulated as how to encode/decode sequences from alphabet cross-products as strings. Encoding/decoding wouldn't be difficult to implement, even such that the encoded string is still humanly readable. Biojava has a natural provision for doing this (SymbolTokenizer?), but Bioperl does not, i.e., in Bioperl the object model assumes that the sequence is a flat string, and the alphabet is also a flat string; there is no object you could ask to provide you with an encoder/decoder appropriate for either the alphabet or the type of sequence object. I'd like to hear some feedback from the Bioperl folks as to whether you'd consider this capability a generally useful addition to Bioperl. (It could be designed in a number of ways ranging from more intrusive to completely neutral - e.g., adding this as a method to SeqI [like $seq->seq_encoder()], or making $seq->alphabet() return an object with this and other capabilities, or creating a separate factory class that would return the appropriate encoder known to [or registered with] it based on a given alphabet and type of sequence object.) As for Bio::Seq::MetaI, this could certainly be the interface for SeqWithQuality, but wouldn't solve the de/serialization problem. 
Also, at least conceptually MetaI-derived classes could represent multi-dimensional meta-information, right? That is, the problem of how to encode/decode the meta-information isn't trivial or restricted to two dimensions here either. As for creating a specialized adaptor in Bioperl-db, that would certainly work too and would most likely be the fastest way to get something that works. However, long-term it would solve the problem only for SeqWithQuality and not for the more general problem of how to store sequences that are based on cross-product alphabets. BTW if you do implement a specialized adaptor, then instead of storing two bioentries and connecting them you might as well implement the sequence encoding/decoding for this particular object in the adaptor - you'd gain speed because instead of increasing the number of database operations you'd spend a couple more CPU cycles in Perl code, and you wouldn't be burdened with two bioentries that aren't coupled by foreign key constraint. As for consensus for how to encode sequence with quality values, I'd include a delimiter between the alphabet operands in the cross-product. I.e., using e.g. slash as the delimiter: 'A/22 T/30 A/32 G/35 C/35'. This can be easily extended to multi-dimensional cross-products so long as the delimiter between them isn't a symbol in any of the alphabets. -hilmar On Jul 5, 2005, at 12:39 AM, Marc Logghe wrote: > Thanks for the feedback. > Good to know I am not alone in this ;-) > I totally agree with Mark that there should be a kind of consensus on > how to store this in Bio*. > Yesterday I mistakenly posted my original mail to the bioperl list. > Heikki responded to that; it might be a good starting point but I am > not > familiar with it: > http://portal.open-bio.org/pipermail/bioperl-l/2005-July/019271.html > So far the long term solustion. > In short term, to have at least something that works, I'll experiment a > little with storing separate objects. 
I remember one of the > presentations of Hilmar, where he gave the example of making an adaptor > and storing 2 sequence objects that interacted with each other as a > result of a Two Hybrid experiment in yeast. > Cheers, > Marc > > >> >> I'd think storing it in BioSQL as 2-byte pairs would be good. >> First byte is the base (an ASCII character), second byte is >> the quality (an 8-bit integer). Sure it wastes a few bits but >> so does normal DNA... >> >> >> Richard Holland >> Bioinformatics Specialist >> GIS extension 8199 >> --------------------------------------------- >> This email is confidential and may be privileged. If you are >> not the intended recipient, please delete it and notify us >> immediately. Please do not copy or use it for any purpose, or >> disclose its content to any other person. Thank you. >> --------------------------------------------- >> >> >>> -----Original Message----- >>> From: biosql-l-bounces@portal.open-bio.org >>> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of >>> mark.schreiber@novartis.com >>> Sent: Tuesday, July 05, 2005 1:44 PM >>> To: Marc Logghe >>> Cc: biosql-l-bounces@portal.open-bio.org; biosql-l@open-bio.org >>> Subject: Re: [BioSQL-l] FW: SeqWithQuality and biosql >>> >>> >>> Hello - >>> >>> I was wondering about similar issues with biojava. As you >> may (or may >>> not) know biojava can make sequences from symbols in any >> alphabet, two >>> examples are DNA and the integer alphabet (a collection of Symbols >>> that are integers). Biojava can also make compound >> alphabets, one such >>> example is the Phred alphabet which is the multiplication of DNA x >>> Integer (technically a subset of Integer from 0 to 99). >>> >>> Because sequence in BioSQL is stored in a CLOB if you can >> encode your >>> SeqWithQuality as a String of characters you can store it. 
>>> With the case >>> above (which is probably similar to yours) you would need 400 >>> distinct characters to store it, which is too many for ASCII but >> could be done >>> in Unicode. The downside is your persistence layer needs to >> know how to >>> encode and decode your SeqWithQuality. I'm not familiar with how BioPerl >>> would do this. BioJava would need to implement a >> SymbolTokenizer for >>> the alphabet and then persistence would happen >> automatically (assuming >>> your DB is OK with Unicode). An alternative would be to make a >>> tokenizer that uses more than single character tokens for >> encoding (e.g. >>> A23 G40 T34 C22 etc.). >>> >>> The alternative you suggest of storing two sequences with a >>> relationship is also nice (because you can retrieve each part >>> separately) but also requires your persistence layer to know >> about it. >>> However, it has big disadvantages because they are not >> strongly tied >>> to each other. If you manipulate one you might invalidate >> the other. >>> Also if you delete one the other will probably not be deleted in a >>> cascade. >>> >>> Not sure if any of this helps but a consensus on how to store this >>> kind of information would be good so the bio* projects do >> it the same >>> way. >>> Consensus in this case will probably mean whatever the first >>> implementation is. >>> >>> - Mark >>> >>> >>> >>> >>> >>> "Marc Logghe" Sent by: >>> biosql-l-bounces@portal.open-bio.org >>> 07/04/2005 05:56 PM >>> >>> >>> To: >>> cc: (bcc: Mark Schreiber/GP/Novartis) >>> Subject: [BioSQL-l] FW: SeqWithQuality and biosql >>> >>> >>> Apologies for cross-posting, I had picked the wrong mail address :-( >>> >>> -----Original Message----- >>> From: Marc Logghe >>> Sent: Monday, July 04, 2005 11:43 AM >>> To: bioperl-l@portal.open-bio.org >>> Subject: SeqWithQuality and biosql >>> >>> Hi all, >>> I am currently exploring the possibility of storing a >>> Bio::Seq::SeqWithQuality object in biosql. >>> Has anyone ever tried this?
>>> One possibility would be to >>> 1) split up the Bio::Seq::SeqWithQuality object into a plain >>> Bio::Seq::RichSeq and a Bio::Seq::PrimaryQual >>> 2) store them separately in biosql; different namespaces >>> 3) link them with a relation term. >>> 4) make a custom adaptor to fetch the persistent objects >> from biosql >>> and reconstruct the Bio::Seq::SeqWithQuality >>> >>> Does that make sense ? Any other suggestions/possibilities ? >>> As a test I tried to load a Bio::Seq::PrimaryQual in biosql >> using the >>> load_seqdatabase.pl but it fails because Bio::Seq::PrimaryQual does >>> not have a namespace method. >>> I hope I'm wrong but I have the impression there is a long >> way to go >>> ;-) >>> >>> Marc >>> >>> >>> >>> >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l@open-bio.org >>> http://open-bio.org/mailman/listinfo/biosql-l >>> >>> >>> >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l@open-bio.org >>> http://open-bio.org/mailman/listinfo/biosql-l >>> >> > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hollandr at gis.a-star.edu.sg Tue Jul 5 23:38:51 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Wed Jul 6 09:44:42 2005 Subject: [Bioperl-l] RE: SeqWithQuality and biosql Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCB226@BIONIC.biopolis.one-north.com> Good point. To correctly represent compound alphabets in a consistent manner would require extra tables in BioSQL (version 1.1?). Some kind of alphabet table with a name and a related table with alphabet ids and ranks to construct cross products etc. 
Why not store the delimiter as an attribute of the alphabet in this table. That way we can use whatever delimiters we like. I don't think grouping is necessary - after all we know from the alphabet definition that there are a fixed number of tokens per symbol and what order they come in, so we just read the first three tokens to build the first symbol, and so on. Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you. --------------------------------------------- > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp@gnf.org] > Sent: Wednesday, July 06, 2005 11:30 AM > To: mark.schreiber@novartis.com > Cc: Bioperl; Richard HOLLAND > Subject: Re: SeqWithQuality and biosql > > > > On Jul 5, 2005, at 7:55 PM, mark.schreiber@novartis.com wrote: > > > I would propose the > > following for compound alphabets... > > > > (aca)(gtc) for codon alphabets. > > (g17)(t40) for quality type alphabets. > > In your convention wouldn't this need to be > (g(17))(t(40)) > > Otherwise you'd have trouble representing higher-dimensional > cross-products unless you alternate chars and digits which would be a > useless restriction. > > -hilmar > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > From mark.schreiber at novartis.com Wed Jul 6 01:37:21 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Wed Jul 6 09:44:43 2005 Subject: [Bioperl-l] RE: SeqWithQuality and biosql Message-ID: Actually under my proposal (a(17)) would imply (DNAx(SubInteger[0..9]xSubInteger[0..9])) "Richard HOLLAND" 07/06/2005 11:38 AM To: "Hilmar Lapp" , Mark Schreiber/GP/Novartis@PH cc: "Bioperl" , Subject: RE: SeqWithQuality and biosql Good point. To correctly represent compound alphabets in a consistent manner would require extra tables in BioSQL (version 1.1?). Some kind of alphabet table with a name and a related table with alphabet ids and ranks to construct cross products etc. Why not store the delimiter as an attribute of the alphabet in this table. That way we can use whatever delimiters we like. I don't think grouping is necessary - after all we know from the alphabet definition that there are a fixed number of tokens per symbol and what order they come in, so we just read the first three tokens to build the first symbol, and so on. Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you. --------------------------------------------- > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp@gnf.org] > Sent: Wednesday, July 06, 2005 11:30 AM > To: mark.schreiber@novartis.com > Cc: Bioperl; Richard HOLLAND > Subject: Re: SeqWithQuality and biosql > > > > On Jul 5, 2005, at 7:55 PM, mark.schreiber@novartis.com wrote: > > > I would propose the > > following for compound alphabets... > > > > (aca)(gtc) for codon alphabets. > > (g17)(t40) for quality type alphabets. 
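Richard's point, that grouping markers become unnecessary once the alphabet definition fixes the number of tokens per symbol, can be sketched as follows. The %alphabet hash is a hypothetical stand-in for the proposed alphabet table (a name, a delimiter attribute, and ranked component alphabets); it is not an existing BioSQL or Bioperl structure. The example alphabet is DNA x SubInteger[0..9] x SubInteger[0..9], the reading Mark gives for (a(17)).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an alphabet row (name + delimiter attribute)
# plus its ranked component alphabets; NOT an existing BioSQL schema.
my %alphabet = (
    name      => 'DNA x SubInteger[0..9] x SubInteger[0..9]',
    delimiter => ',',
    operands  => [ qr/^[acgt]$/i, qr/^[0-9]$/, qr/^[0-9]$/ ],
);

# Split on the alphabet's delimiter, then take tokens N at a time
# (N = number of component alphabets) to rebuild each symbol.
sub decode_by_arity {
    my ($string, $alpha) = @_;
    my $arity  = @{ $alpha->{operands} };
    my $delim  = quotemeta $alpha->{delimiter};
    my @tokens = split /$delim/, $string, -1;
    die "token count is not a multiple of $arity\n" if @tokens % $arity;
    my @symbols;
    while (my @group = splice @tokens, 0, $arity) {
        for my $rank (0 .. $arity - 1) {
            die "token '$group[$rank]' is not valid at rank $rank\n"
                unless $group[$rank] =~ $alpha->{operands}[$rank];
        }
        push @symbols, [@group];    # copy, so each symbol is independent
    }
    return @symbols;
}

# Inverse: flatten the symbols and join every token with the delimiter.
sub encode_by_arity {
    my ($alpha, @symbols) = @_;
    return join $alpha->{delimiter}, map { @$_ } @symbols;
}

my @symbols = decode_by_arity('a,1,7,t,4,0', \%alphabet);
print join(' ', map { my $s = join '', @$_; "($s)" } @symbols), "\n";    # (a17) (t40)
```

Taking the tokens three at a time is the "read the first three tokens to build the first symbol" step. Because the delimiter separates every token, an operand alphabet such as SubInteger[0..99] would need only a different validating pattern, not fixed digit counts.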
> > In your convention wouldn't this need to be > (g(17))(t(40)) > > Otherwise you'd have trouble representing higher-dimensional > cross-products unless you alternate chars and digits which would be a > useless restriction. > > -hilmar > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > From heikki at ebi.ac.uk Wed Jul 6 12:28:43 2005 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Wed Jul 6 12:20:00 2005 Subject: [Bioperl-l] Re: Bio::Tree::Compatible, Bio::Tree::Draw::Cladogram In-Reply-To: <42CC0C6A.80001@jaist.ac.jp> References: <200507051532.48179.heikki@ebi.ac.uk> <42CC0C6A.80001@jaist.ac.jp> Message-ID: <200507061728.43482.heikki@ebi.ac.uk> Hi Gabriel, I thought this must have gone through Jason ;-) He is the most active contributor to bioperl, but this just demonstrates how complex bioperl has become. One person just cannot monitor everything. We can easily put the blame on him. He can take it like a man! On Wednesday 06 July 2005 17:52, Gabriel Valiente wrote: > Dear Heikki, > > I've been discussing everything about these modules with Jason Stajich; I just > didn't know of the need to move the discussion to a mailing list. > Sorry about that. Please tell me how to proceed; I haven't yet > subscribed to any BioPerl mailing list. To keep everyone informed about commits it is customary to - be a member of the bioperl mailing list http://bio.perl.org/MailList.shtml - announce plans and major code commits to the list - commit tests, preferably at the same time as code (There is no more than a paragraph in biodesign.pod, so here is the beginning of a new tutorial... I just remembered that I wrote something about this in docbook format more than a year ago, but I cannot find it now. If I ever gave the text to anyone, I'd love to see it again!)
Take a look at a few of the example test files in the t directory, e.g. t/Spidey.t. They all contain a BEGIN block of varying complexity that uses the Test module and declares how many tests there will be. Test (see 'man Test') exports the function ok(), which takes care of printing the output. That output is all this file should write out when run using 'perl -w t/Spidey.t' or using make to run the perl test harness (see 'man Test::Harness'). Running these tests periodically enables maintainers to see if a change somewhere has broken some other feature. You normally start by testing whether you can 'use' your new module, then that you can create an object, and then proceed by testing at least all the public methods. Data files can be put into t/data. You can have output from your test script, but it should clean up all new files at exit (do that within an END block). > In a nutshell, I've written these modules to support my research on > algorithms in bioinformatics. Bio::Tree::Draw::Cladogram is in an early > stage; I'm still working on the optimal tanglegram layout problem (to > minimize the number of edge crossings among the taxa of the two trees). If there is more interest, it would be cool to have an abstraction layer and be able to output more formats. I guess we cannot do anything about the PostScript::TextBlock dependency here. > Bio::Tree::Compatible is perhaps in much better shape; I've tested it > over all pairs of trees from the TreeBASE database. There's a paper > (still under review) about it; the preprint is available from either of: > > http://www.lsi.upc.es/dept/techreps/listado_concreto.php?id=766 > http://arxiv.org/abs/cs.DM/0505086 Is the use of Set::Scalar really necessary? It is yet another dependency, although I do like it myself, and it might turn out to be useful to other modules, too, in the future. > I don't know much about including test code in the distribution.
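A bare-bones test file following the layout described above might look like the sketch below. It uses only the core Test module (plan() and ok()); the file name, the My::Counter package, and the scratch file name are hypothetical placeholders standing in for a real module under test, so that the example runs without Bioperl installed.

```perl
#!/usr/bin/perl -w
# Hypothetical t/Counter.t laid out as described above.
use strict;

BEGIN {
    use Test;            # core module; exports plan() and ok()
    plan tests => 4;     # declare how many tests will run
}

# Stand-in for the module under test. A real t/*.t file would load the
# new module instead (e.g. "use Bio::Tree::Compatible;") and follow it
# with ok(1); My::Counter is purely illustrative.
package My::Counter;
sub new   { my ($class) = @_; return bless { n => 0 }, $class }
sub add   { my ($self, $k) = @_; $self->{n} += $k; return $self->{n} }
sub count { my ($self) = @_; return $self->{n} }

package main;

# Scratch file written by the test; removed in END so nothing is left behind.
my $tmp = "counter_test.$$.tmp";
END { unlink $tmp if defined $tmp && -e $tmp }

ok(My::Counter->can('new') ? 1 : 0);      # 1: module is available
my $counter = My::Counter->new;
ok(ref $counter, 'My::Counter');          # 2: object creation
ok($counter->add(3), 3);                  # 3: public method add()

open my $fh, '>', $tmp or die "cannot write $tmp: $!";
print $fh $counter->count, "\n";
close $fh;
ok(-s $tmp ? 1 : 0);                      # 4: output file was written
```

Run it with 'perl -w' on the file, or place it in t/ with a .t extension so that 'make test' picks it up through Test::Harness; the END block removes the scratch file when the script exits.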
Please > give me some guidelines; I definitely want to see these modules (and > whatever else I may write in the future) included in the whole BioPerl > distribution. I'm on vacation now, but will try to keep up with the > discussion anyway. They are in. No real hurry with the tests. Yours, -Heikki > Thanks, > > Gabriel > > >Gabriel, > > > >While testing bioperl module SYNOPSIS sections for runnability I found out > >that there are two modules in bioperl-live that have external dependencies > >that are not in Makefile.PL: > > > >Bio::Tree::Compatible > > Testing compatibility of phylogenetic trees with nested taxa. > > depends on Set::Scalar > > > >Bio::Tree::Draw::Cladogram > > Drawing phylogenetic trees in Encapsulated PostScript (EPS) format. > > depends on PostScript::TextBlock.pm > > > >They both are yours. > > > >I have not been that active on the mailing list lately, so I searched the > > list for a discussion on these new modules. I started getting a bit > > alarmed that there were none: no emails ever to the bioperl mailing list > > from you. Finally, I checked the t (test) directory and there were no > > tests for these modules. > > > >Could we have that discussion now and hopefully at the end of the > > discussion add the dependencies to the Makefile.PL? In a project this > > big, we have to keep each other informed so that we can keep all parts > > of bioperl functional and avoid confusing and alienating users. > > > > > >Where do these modules come from? > >What functionality do they add? > >Is the namespace used correct? > >Could we see test code that demonstrates the functionality? > >Is there something else that you are planning to do? > > > > > >Yours, > > > > -Heikki, who feels that he is probably overreacting ;-) > > ...
so do not take it personally -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambridge, CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From hlapp at gnf.org Wed Jul 6 12:30:06 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Wed Jul 6 12:22:08 2005 Subject: [Bioperl-l] RE: SeqWithQuality and biosql In-Reply-To: References: Message-ID: On Jul 5, 2005, at 10:37 PM, mark.schreiber@novartis.com wrote: > Actually under my proposal > > (a(17)) would imply (DNAx(SubInteger[0..9]xSubInteger[0..9])) > That's why I didn't like it - how would you encode (DNAx(SubInteger[0..99]xSubInteger[0..99])) in this proposal? Require each component to be two-digit? There ought to be delimiters between the operands, no? -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From sm_middha at yahoo.com Wed Jul 6 12:37:05 2005 From: sm_middha at yahoo.com (sumit middha) Date: Wed Jul 6 12:28:06 2005 Subject: [Bioperl-l] FASTA.pm issue In-Reply-To: Message-ID: <20050706163705.67979.qmail@web30711.mail.mud.yahoo.com> Well, here's a small test script I made to explain my problem. Please let me know your suggestions. Thanks.
--------------code-----------------
#!/usr/bin/perl -w
use strict;
use Bio::DB::Fasta;
use Bio::DB::Flat;
use Bio::Index::Fasta;
use Bio::Seq;

my $db = Bio::DB::Fasta->new("f1");
#my $db = Bio::Index::Fasta->new("f1");
my $seqobj = $db->get_Seq_by_id("abc");
my $str = $seqobj->seq();
print $str;

exit;
-----------end of code ------------

And here is the error I get (which I did not get a few months back):

> perl -w test.pl
AnyDBM_File doesn't define an EXISTS method at /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm line 577

and the f1 fasta file is:

> cat f1
>abc
AGCATCG

--- Brian Osborne wrote: > Sumit, > > You'll have to show us the code that gives you the > error, I think. > > > Brian O. > > > On 6/23/05 1:07 PM, "sumit middha" > wrote: > > > > > Thanks for the reply Brian. > > Changing it to Bio::Index::Fasta helped, but gave > > another problem in my script, which I don't have a > > clue about. > > > > ------------- EXCEPTION ------------- > > MSG: Can't open 'SDBM_File' dbm file > > '../Dyak/dyak_chr_ucsc.fa.rev' : No such file or > > directory > > STACK Bio::Index::Abstract::open_dbm > > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:392 > > STACK Bio::Index::Abstract::new > > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:150 > > STACK Bio::Index::AbstractSeq::new > > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/AbstractSeq.pm:91 > > STACK toplevel get_ortho.pl:31 > > > > I know that the file exists, and has been > formatted as > > a database to use BLAST search. > > > > sumit > > > > --- Brian Osborne > wrote: > > > >> Sumit, > >> > >> In perl 5.8 a module that's using a tied hash is > >> supposed to have an EXISTS > >> method, but it appears that AnyDBM_File doesn't. > You > >> could try using > >> Bio::Index::Fasta instead, or Bio::DB::Flat. > >> > >> Brian O.
> >> > >> > >> On 6/22/05 6:24 PM, "sumit middha" > >> wrote: > >> > >>> > >>> Hello, > >>> > >>> I have a trouble with using fasta module > >>> > >>> I use the required statements > >>> > >>> use Bio::DB::Fasta; > >>> use Bio::Seq; > >>> > >>> The error was: > >>> > >>> AnyDBM_File doesn't define an EXISTS method at > >>> > >> > /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm > >>> line 577 > >>> > >>> thanks, > >>> sm __________________________________ Do you Yahoo!? Make Yahoo! your home page http://www.yahoo.com/r/hs From lehvasla at ebi.ac.uk Wed Jul 6 17:40:49 2005 From: lehvasla at ebi.ac.uk (lehvasla@ebi.ac.uk) Date: Wed Jul 6 18:09:21 2005 Subject: [Bioperl-l] FASTA.pm issue In-Reply-To: <20050706163705.67979.qmail@web30711.mail.mud.yahoo.com> References: <20050706163705.67979.qmail@web30711.mail.mud.yahoo.com> Message-ID: <49934.84.12.20.100.1120686049.squirrel@webmail.ebi.ac.uk> Dumit, Your code works under perl v5.8.4. I do not get any errors or warnings. There has to be some change between perl releases. What is the version of your AnyDBM_File?
Mine is perl -MAnyDBM_File -le 'print AnyDBM_File->VERSION;' 1.00 -Heikki > > Well heres a small test code I made to explain my > problem. Please let me know your suggestions. > Thanks. > > --------------code----------------- > #!/usr/bin/perl -w > use strict; > use Bio::DB::Fasta; > use Bio::DB::Flat; > use Bio::Index::Fasta; > use Bio::Seq; > > my $db = Bio::DB::Fasta->new("f1"); > #my $db = Bio::Index::Fasta->new("f1"); > my $seqobj = $db->get_Seq_by_id("abc"); > my $str = $seqobj->seq(); > print $str; > > exit; > -----------end of code ------------ > > And here is the error I get (which I did not a few > months back) > >> perl -w test.pl > AnyDBM_File doesn't define an EXISTS method at > /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm > line 577 > > and f1 fasta file is >> cat f1 >>abc > AGCATCG > > > --- Brian Osborne wrote: > >> Sumit, >> >> You'll have to show us the code that gives you the >> error, I think. >> >> >> Brian O. >> >> >> On 6/23/05 1:07 PM, "sumit middha" >> wrote: >> >> > >> > Thanks for the reply Brian. >> > Changing it to Bio::Index::Fasta helped, but gave >> > another problem in my script, which I dont have a >> > clue. >> > >> > ------------- EXCEPTION ------------- >> > MSG: Can't open 'SDBM_File' dbm file >> > '../Dyak/dyak_chr_ucsc.fa.rev' : No such file or >> > directory >> > STACK Bio::Index::Abstract::open_dbm >> > >> > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:392 >> > STACK Bio::Index::Abstract::new >> > >> > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:150 >> > STACK Bio::Index::AbstractSeq::new >> > >> > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/AbstractSeq.pm:91 >> > STACK toplevel get_ortho.pl:31 >> > >> > I know that the file exists, and has been >> formatted as >> > a database to use BLAST search. 
>> > >> > sumit >> > >> > --- Brian Osborne >> wrote: >> > >> >> Sumit, >> >> >> >> In perl 5.8 a module that's using a tied hash is >> >> supposed to have an EXISTS >> >> method, but it appears that AnyDBM_File doesn't. >> You >> >> could try using >> >> Bio::Index::Fasta instead, or Bio::DB::Flat. >> >> >> >> Brian O. >> >> >> >> >> >> On 6/22/05 6:24 PM, "sumit middha" >> >> wrote: >> >> >> >>> >> >>> Hello, >> >>> >> >>> I have a trouble with using fasta module >> >>> >> >>> I use the required statements >> >>> >> >>> use Bio::DB::Fasta; >> >>> use Bio::Seq; >> >>> >> >>> The error was: >> >>> >> >>> AnyDBM_File doesn't define an EXISTS method at >> >>> >> >> >> /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm >> >>> line 577 >> >>> >> >>> thanks, >> >>> sm > > > > __________________________________ > Do you Yahoo!? > Make Yahoo! your home page > http://www.yahoo.com/r/hs > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From mark.schreiber at novartis.com Wed Jul 6 20:59:13 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Wed Jul 6 21:23:52 2005 Subject: [Bioperl-l] RE: SeqWithQuality and biosql Message-ID: Good point. I would prefer a system that only uses delimiters for ambiguous cases like the one you show but I guess thats pretty complex so maybe delimiters for every sub-alphabet. - Mark Hilmar Lapp 07/07/2005 12:30 AM To: Mark Schreiber/GP/Novartis@PH cc: "Richard HOLLAND" , Bioperl , biosql-l@open-bio.org Subject: Re: [Bioperl-l] RE: SeqWithQuality and biosql On Jul 5, 2005, at 10:37 PM, mark.schreiber@novartis.com wrote: > Actually under my proposal > > (a(17)) would imply (DNAx(SubInteger[0..9]xSubInteger[0..9])) > That's why I didn't like it - how would you encode (DNAx(SubInteger[0..99]xSubInteger[0..99]) in this proposal? Require each component to be two-digit? There ought to be delimiters between the operands, no? -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca.
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From sm_middha at yahoo.com Thu Jul 7 01:05:50 2005 From: sm_middha at yahoo.com (sumit middha) Date: Thu Jul 7 00:56:56 2005 Subject: [Bioperl-l] FASTA.pm issue In-Reply-To: <49935.84.12.20.100.1120686057.squirrel@webmail.ebi.ac.uk> Message-ID: <20050707050551.28599.qmail@web30710.mail.mud.yahoo.com> Mine is > perl -v This is perl, v5.8.5 built for sun4-solaris > perl -MAnyDBM_File -le 'print AnyDBM_File->VERSION;' 1.00 :( any guesses ?? --- lehvasla@ebi.ac.uk wrote: > > Dumit, > > Your code works under perl v5.8.4. I do not get any > errors or warnings. > There has to be some change between perl releases. > What is the version of > your AnyDBM_File? Mine is > > perl -MAnyDBM_File -le 'print AnyDBM_File->VERSION;' > 1.00 > > -Heikki > > > > > > Well heres a small test code I made to explain my > > problem. Please let me know your suggestions. > > Thanks. > > > > --------------code----------------- > > #!/usr/bin/perl -w > > use strict; > > use Bio::DB::Fasta; > > use Bio::DB::Flat; > > use Bio::Index::Fasta; > > use Bio::Seq; > > > > my $db = Bio::DB::Fasta->new("f1"); > > #my $db = Bio::Index::Fasta->new("f1"); > > my $seqobj = $db->get_Seq_by_id("abc"); > > my $str = $seqobj->seq(); > > print $str; > > > > exit; > > -----------end of code ------------ > > > > And here is the error I get (which I did not a few > > months back) > > > >> perl -w test.pl > > AnyDBM_File doesn't define an EXISTS method at > > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm > > line 577 > > > > and f1 fasta file is > >> cat f1 > >>abc > > AGCATCG > > > > > > --- Brian Osborne > wrote: > > > >> Sumit, > >> > >> You'll have to show us the code that gives you > the > >> error, I think. > >> > >> > >> Brian O. > >> > >> > >> On 6/23/05 1:07 PM, "sumit middha" > >> wrote: > >> > >> > > >> > Thanks for the reply Brian. 
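Both machines report AnyDBM_File version 1.00, but that number says little, because AnyDBM_File only forwards to whichever DBM module it manages to load first. A small diagnostic like the one below (core Perl only, nothing Bioperl-specific) prints which backend was selected and whether an EXISTS method is reachable through it. Comparing its output on the 5.8.4 and 5.8.5 machines is a guess at where the difference might lie, not a confirmed diagnosis.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use AnyDBM_File;

# AnyDBM_File defines no methods itself: on load it sets @AnyDBM_File::ISA
# to the first DBM module it can load (NDBM_File, DB_File, GDBM_File,
# SDBM_File, ODBM_File by default), and tied-hash methods such as EXISTS
# are inherited from that module.
my ($backend) = @AnyDBM_File::ISA;
my $version   = $backend->VERSION;

print "AnyDBM_File version: ", AnyDBM_File->VERSION, "\n";
print "selected backend:    $backend\n";
print "backend version:     ", (defined $version ? $version : 'unknown'), "\n";
print "EXISTS inherited:    ",
    (AnyDBM_File->can('EXISTS') ? 'yes' : 'no'), "\n";
```

If the two machines report different backends, the AnyDBM_File documentation describes overriding the search order by assigning @AnyDBM_File::ISA in a BEGIN block before AnyDBM_File is first loaded (here, before 'use Bio::DB::Fasta'); whether that cures this particular error is untested.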
> >> > Changing it to Bio::Index::Fasta helped, but > gave > >> > another problem in my script, which I dont have > a > >> > clue. > >> > > >> > ------------- EXCEPTION ------------- > >> > MSG: Can't open 'SDBM_File' dbm file > >> > '../Dyak/dyak_chr_ucsc.fa.rev' : No such file > or > >> > directory > >> > STACK Bio::Index::Abstract::open_dbm > >> > > >> > > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:392 > >> > STACK Bio::Index::Abstract::new > >> > > >> > > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:150 > >> > STACK Bio::Index::AbstractSeq::new > >> > > >> > > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/AbstractSeq.pm:91 > >> > STACK toplevel get_ortho.pl:31 > >> > > >> > I know that the file exists, and has been > >> formatted as > >> > a database to use BLAST search. > >> > > >> > sumit > >> > > >> > --- Brian Osborne > >> wrote: > >> > > >> >> Sumit, > >> >> > >> >> In perl 5.8 a module that's using a tied hash > is > >> >> supposed to have an EXISTS > >> >> method, but it appears that AnyDBM_File > doesn't. > >> You > >> >> could try using > >> >> Bio::Index::Fasta instead, or Bio::DB::Flat. > >> >> > >> >> Brian O. > >> >> > >> >> > >> >> On 6/22/05 6:24 PM, "sumit middha" > >> >> wrote: > >> >> > >> >>> > >> >>> Hello, > >> >>> > >> >>> I have a trouble with using fasta module > >> >>> > >> >>> I use the required statements > >> >>> > >> >>> use Bio::DB::Fasta; > >> >>> use Bio::Seq; > >> >>> > >> >>> The error was: > >> >>> > >> >>> AnyDBM_File doesn't define an EXISTS method > at > >> >>> > >> >> > >> > /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm > >> >>> line 577 > >> >>> > >> >>> thanks, > >> >>> sm > > > > > > > > __________________________________ > > Do you Yahoo!? > > Make Yahoo! 
your home page > > http://www.yahoo.com/r/hs > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > ____________________________________________________ Sell on Yahoo! Auctions ? no fees. Bid on great items. http://auctions.yahoo.com/ From sm_middha at yahoo.com Thu Jul 7 01:05:50 2005 From: sm_middha at yahoo.com (sumit middha) Date: Thu Jul 7 00:56:58 2005 Subject: [Bioperl-l] FASTA.pm issue In-Reply-To: <49935.84.12.20.100.1120686057.squirrel@webmail.ebi.ac.uk> Message-ID: <20050707050551.28599.qmail@web30710.mail.mud.yahoo.com> Mine is > perl -v This is perl, v5.8.5 built for sun4-solaris > perl -MAnyDBM_File -le 'print AnyDBM_File->VERSION;' 1.00 :( any guesses ?? --- lehvasla@ebi.ac.uk wrote: > > Dumit, > > Your code works under perl v5.8.4. I do not get any > errors or warnings. > There has to be some change between perl releases. > What is the version of > your AnyDBM_File? Mine is > > perl -MAnyDBM_File -le 'print AnyDBM_File->VERSION;' > 1.00 > > -Heikki > > > > > > Well heres a small test code I made to explain my > > problem. Please let me know your suggestions. > > Thanks. 
> > > > --------------code----------------- > > #!/usr/bin/perl -w > > use strict; > > use Bio::DB::Fasta; > > use Bio::DB::Flat; > > use Bio::Index::Fasta; > > use Bio::Seq; > > > > my $db = Bio::DB::Fasta->new("f1"); > > #my $db = Bio::Index::Fasta->new("f1"); > > my $seqobj = $db->get_Seq_by_id("abc"); > > my $str = $seqobj->seq(); > > print $str; > > > > exit; > > -----------end of code ------------ > > > > And here is the error I get (which I did not a few > > months back) > > > >> perl -w test.pl > > AnyDBM_File doesn't define an EXISTS method at > > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm > > line 577 > > > > and f1 fasta file is > >> cat f1 > >>abc > > AGCATCG > > > > > > --- Brian Osborne > wrote: > > > >> Sumit, > >> > >> You'll have to show us the code that gives you > the > >> error, I think. > >> > >> > >> Brian O. > >> > >> > >> On 6/23/05 1:07 PM, "sumit middha" > >> wrote: > >> > >> > > >> > Thanks for the reply Brian. > >> > Changing it to Bio::Index::Fasta helped, but > gave > >> > another problem in my script, which I dont have > a > >> > clue. > >> > > >> > ------------- EXCEPTION ------------- > >> > MSG: Can't open 'SDBM_File' dbm file > >> > '../Dyak/dyak_chr_ucsc.fa.rev' : No such file > or > >> > directory > >> > STACK Bio::Index::Abstract::open_dbm > >> > > >> > > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:392 > >> > STACK Bio::Index::Abstract::new > >> > > >> > > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:150 > >> > STACK Bio::Index::AbstractSeq::new > >> > > >> > > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/AbstractSeq.pm:91 > >> > STACK toplevel get_ortho.pl:31 > >> > > >> > I know that the file exists, and has been > >> formatted as > >> > a database to use BLAST search. 
> >> > > >> > sumit > >> > > >> > --- Brian Osborne > >> wrote: > >> > > >> >> Sumit, > >> >> > >> >> In perl 5.8 a module that's using a tied hash > is > >> >> supposed to have an EXISTS > >> >> method, but it appears that AnyDBM_File > doesn't. > >> You > >> >> could try using > >> >> Bio::Index::Fasta instead, or Bio::DB::Flat. > >> >> > >> >> Brian O. > >> >> > >> >> > >> >> On 6/22/05 6:24 PM, "sumit middha" > >> >> wrote: > >> >> > >> >>> > >> >>> Hello, > >> >>> > >> >>> I have a trouble with using fasta module > >> >>> > >> >>> I use the required statements > >> >>> > >> >>> use Bio::DB::Fasta; > >> >>> use Bio::Seq; > >> >>> > >> >>> The error was: > >> >>> > >> >>> AnyDBM_File doesn't define an EXISTS method > at > >> >>> > >> >> > >> > /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm > >> >>> line 577 > >> >>> > >> >>> thanks, > >> >>> sm > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > From khoueiry at ibdm.univ-mrs.fr Thu Jul 7 04:42:30 2005 From: khoueiry at ibdm.univ-mrs.fr (khoueiry) Date: Thu Jul 7 04:32:43 2005 Subject: [Bioperl-l] FASTA.pm issue In-Reply-To: <20050706163705.67979.qmail@web30711.mail.mud.yahoo.com> References: <20050706163705.67979.qmail@web30711.mail.mud.yahoo.com> Message-ID: <1120725750.27317.1.camel@DavidLinux> Hi sumit,

I suggest changing your indexing method. Try this... (in fact, both your code and the one below work well for me)

------
#!/usr/bin/perl -w
use strict;
use Bio::Index::Fasta;

# Indexing....
# BIOPERL_INDEX_TYPE, if set, selects the DBM package used for the index
my $type = $ENV{'BIOPERL_INDEX_TYPE'};
if ($type) {
    $Bio::Index::Abstract::USE_DBM_TYPE = $type;
}

my $index = Bio::Index::Fasta->new( "/home/pierre/BioperlTest/f1.idx", 'WRITE' );
$index->make_index("/home/pierre/BioperlTest/f1");

my $seqobj = $index->fetch("abc");
my $str = $seqobj->seq();
print $str."\n";

exit;
-----------

On Wednesday 06 July 2005 at 09:37 -0700, sumit middha wrote: > Well heres a small test code I made to explain my > problem. Please let me know your suggestions. > Thanks. > > --------------code----------------- > #!/usr/bin/perl -w > use strict; > use Bio::DB::Fasta; > use Bio::DB::Flat; > use Bio::Index::Fasta; > use Bio::Seq; > > my $db = Bio::DB::Fasta->new("f1"); > #my $db = Bio::Index::Fasta->new("f1"); > my $seqobj = $db->get_Seq_by_id("abc"); > my $str = $seqobj->seq(); > print $str; > > exit; > -----------end of code ------------ > > And here is the error I get (which I did not a few > months back) > > > perl -w test.pl > AnyDBM_File doesn't define an EXISTS method at > /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm > line 577 > > and f1 fasta file is > > cat f1 > >abc > AGCATCG > > > --- Brian Osborne wrote: > > > Sumit, > > > > You'll have to show us the code that gives you the > > error, I think. > > > > > > Brian O. > > > > > > On 6/23/05 1:07 PM, "sumit middha" > > wrote: > > > > > > > > Thanks for the reply Brian. > > > Changing it to Bio::Index::Fasta helped, but gave > > > another problem in my script, which I dont have a > > > clue.
> > > > > > ------------- EXCEPTION ------------- > > > MSG: Can't open 'SDBM_File' dbm file > > > '../Dyak/dyak_chr_ucsc.fa.rev' : No such file or > > > directory > > > STACK Bio::Index::Abstract::open_dbm > > > > > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:392 > > > STACK Bio::Index::Abstract::new > > > > > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:150 > > > STACK Bio::Index::AbstractSeq::new > > > > > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/AbstractSeq.pm:91 > > > STACK toplevel get_ortho.pl:31 > > > > > > I know that the file exists, and has been > > formatted as > > > a database to use BLAST search. > > > > > > sumit > > > > > > --- Brian Osborne > > wrote: > > > > > >> Sumit, > > >> > > >> In perl 5.8 a module that's using a tied hash is > > >> supposed to have an EXISTS > > >> method, but it appears that AnyDBM_File doesn't. > > You > > >> could try using > > >> Bio::Index::Fasta instead, or Bio::DB::Flat. > > >> > > >> Brian O. > > >> > > >> > > >> On 6/22/05 6:24 PM, "sumit middha" > > >> wrote: > > >> > > >>> > > >>> Hello, > > >>> > > >>> I have a trouble with using fasta module > > >>> > > >>> I use the required statements > > >>> > > >>> use Bio::DB::Fasta; > > >>> use Bio::Seq; > > >>> > > >>> The error was: > > >>> > > >>> AnyDBM_File doesn't define an EXISTS method at > > >>> > > >> > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm > > >>> line 577 > > >>> > > >>> thanks, > > >>> sm > > > > __________________________________ > Do you Yahoo!? > Make Yahoo! 
your home page > http://www.yahoo.com/r/hs > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From heikki at ebi.ac.uk Thu Jul 7 05:13:45 2005 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Thu Jul 7 05:05:13 2005 Subject: [Bioperl-l] use cases for SeqWithQuality, please Message-ID: <200507071013.45813.heikki@ebi.ac.uk> I've compared Bio::Seq::SeqWithQuality with the Bio::Seq::MetaI schema and there do not seem to be many differences. All the functionality seems to be there already. The main problem is that there are many different ways to call the constructor. There are so many ways to call it, and some methods are already deprecated, that it would be better to write a replacement module than to try to rewrite all the methods. Could I ask those who use Bio::Seq::SeqWithQuality now to send me sample code that shows how they call this module in practice? With that information I could write a Bio::Seq::Quality that implements Bio::Seq::MetaI, and we could deprecate Bio::Seq::SeqWithQuality. Are you happy with that, Chad? -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambridge, CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From victor.ruotti at gmail.com Thu Jul 7 11:05:24 2005 From: victor.ruotti at gmail.com (Victor) Date: Thu Jul 7 10:56:32 2005 Subject: [Bioperl-l] Overlapping Features with GFF dbase Message-ID: <36d7e55505070708053442bb23@mail.gmail.com> Hello, I was wondering if someone could point me to how best to retrieve a set of overlapping features from the GFF schema.
Right now I am looking at GFF.pm to do this:

use Bio::DB::GFF;
my $db = Bio::DB::GFF->new(-dsn         => 'mydbase',
                           -aggregators => 'gene_model{CDS,five_prime_UTR,three_prime_UTR}');

my $gene_stream = $db->get_seq_stream('gene_model:UCSC_hg16');

while (my $gene = $gene_stream->next_seq) {
    print $gene->name, "\n";
    for my $part ($gene->get_SeqFeatures) {
        print "\t", join("\t", $part->method, $part->start, $part->end), "\n";
    }
    print "\n";
}

This gets all the genes from the GFF schema. Should I be using another while loop to retrieve other features that overlap with these genes? Is there a bioperl module to retrieve overlapping features? I would like to be able to get all the features that overlap with a particular gene or a whole set of genes. Thanks in advance. Victor From sm_middha at yahoo.com Thu Jul 7 11:11:26 2005 From: sm_middha at yahoo.com (sumit middha) Date: Thu Jul 7 11:02:52 2005 Subject: [Bioperl-l] FASTA.pm issue In-Reply-To: <1120725750.27317.1.camel@DavidLinux> Message-ID: <20050707151126.4199.qmail@web30705.mail.mud.yahoo.com> Nopes, that did not help either. I tried it on a different machine and both versions of the code worked. My guess is that something might have gone bad with the perl installed on this machine, but I cannot guess what it could be, or how to correct it! > perl test.pl Use of uninitialized value in numeric gt (>) at /usr/local/lib/perl5/5.8.5/sun4-solaris/DB_File.pm line 271. Deep recursion on subroutine "DB_File::AUTOLOAD" at /usr/local/lib/perl5/5.8.5/sun4-solaris/DB_File.pm line 234. Thanks for your help. --- khoueiry wrote: > Hi sumit, > > I suggest you to change your index method. Try > that... (In fact your > code and the below one works well for me) > > ------ > #!/usr/bin/perl -w > use strict; > use Bio::Index::Fasta; > > > #Indexing....
> my $type = $ENV{'BIOPER_INDEX_TYPE'}; > if ($type) { > $Bio::Index::Abstract::USE_DBM_TYPE = $type; > } > > my $index = Bio::Index::Fasta->new( > "/home/pierre/BioperlTest/f1.idx", > 'WRITE' ); > $index->make_index("/home/pierre/BioperlTest/f1"); > > > my $seqobj = $index->fetch("abc"); > my $str = $seqobj->seq(); > print $str."\n"; > > exit; > > ----------- > > On Wednesday 06 July 2005 at 09:37 -0700, sumit > middha wrote: > > > Well heres a small test code I made to explain my > > problem. Please let me know your suggestions. > > Thanks. > > > > --------------code----------------- > > #!/usr/bin/perl -w > > use strict; > > use Bio::DB::Fasta; > > use Bio::DB::Flat; > > use Bio::Index::Fasta; > > use Bio::Seq; > > > > my $db = Bio::DB::Fasta->new("f1"); > > #my $db = Bio::Index::Fasta->new("f1"); > > my $seqobj = $db->get_Seq_by_id("abc"); > > my $str = $seqobj->seq(); > > print $str; > > > > exit; > > -----------end of code ------------ > > > > And here is the error I get (which I did not a few > > months back) > > > > > perl -w test.pl > > AnyDBM_File doesn't define an EXISTS method at > > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm > > line 577 > > > > and f1 fasta file is > > > cat f1 > > >abc > > AGCATCG > > > > > > --- Brian Osborne > wrote: > > > > > Sumit, > > > > > > You'll have to show us the code that gives you > the > > > error, I think. > > > > > > > > > Brian O. > > > > > > > > > On 6/23/05 1:07 PM, "sumit middha" > > > wrote: > > > > > > > > > > > Thanks for the reply Brian. > > > > Changing it to Bio::Index::Fasta helped, but > gave > > > > another problem in my script, which I dont > have a > > > > clue.
> > > > > > > > ------------- EXCEPTION ------------- > > > > MSG: Can't open 'SDBM_File' dbm file > > > > '../Dyak/dyak_chr_ucsc.fa.rev' : No such file > or > > > > directory > > > > STACK Bio::Index::Abstract::open_dbm > > > > > > > > > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:392 > > > > STACK Bio::Index::Abstract::new > > > > > > > > > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:150 > > > > STACK Bio::Index::AbstractSeq::new > > > > > > > > > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/AbstractSeq.pm:91 > > > > STACK toplevel get_ortho.pl:31 > > > > > > > > I know that the file exists, and has been > > > formatted as > > > > a database to use BLAST search. > > > > > > > > sumit > > > > > > > > --- Brian Osborne > > > wrote: > > > > > > > >> Sumit, > > > >> > > > >> In perl 5.8 a module that's using a tied hash > is > > > >> supposed to have an EXISTS > > > >> method, but it appears that AnyDBM_File > doesn't. > > > You > > > >> could try using > > > >> Bio::Index::Fasta instead, or Bio::DB::Flat. > > > >> > > > >> Brian O. > > > >> > > > >> > > > >> On 6/22/05 6:24 PM, "sumit middha" > > > >> wrote: > > > >> > > > >>> > > > >>> Hello, > > > >>> > > > >>> I have a trouble with using fasta module > > > >>> > > > >>> I use the required statements > > > >>> > > > >>> use Bio::DB::Fasta; > > > >>> use Bio::Seq; > > > >>> > > > >>> The error was: > > > >>> > > > >>> AnyDBM_File doesn't define an EXISTS method > at > > > >>> > > > >> > > > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm > > > >>> line 577 > > > >>> > > > >>> thanks, > > > >>> sm > > > > > > > > __________________________________ > > Do you Yahoo!? > > Make Yahoo! your home page > > http://www.yahoo.com/r/hs > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > ____________________________________________________ Sell on Yahoo! 
Auctions ? no fees. Bid on great items. http://auctions.yahoo.com/ From lstein at cshl.edu Thu Jul 7 12:28:39 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Thu Jul 7 12:21:29 2005 Subject: [Bioperl-l] Overlapping Features with GFF dbase In-Reply-To: <36d7e55505070708053442bb23@mail.gmail.com> References: <36d7e55505070708053442bb23@mail.gmail.com> Message-ID: <200507071228.40400.lstein@cshl.edu> Hi Victor, Once you get a gene, you can do this: my @overlapping_features = $gene->features; The same filtering syntax that you use, as well as the get_seq_stream() method call, works with features as well as segments. Lincoln On Thursday 07 July 2005 11:05 am, Victor wrote: > Hello, > I was wondering if someone can point me out on how best to retrieve a set > of overlapping features from the GFF schema. Right now I am looking at the > GFF.pm to do this by: > > use Bio::DB::GFF; > my $db = Bio::DB:GFF->new(-dsn =>'mydbase', > -aggregators =>'gene_model{CDS ,five_prime_UTR,three_prime_UTR'}); > > my $gene_stream = $db=>get_seq_stream('gene_model:UCSC_hg16'); > > while (my $gene = $gene_stream->next_seq) { > print $gene->name, "\n"; > for my $part ($gene->get_SeqFeatures) { > print "\t", join("\t", $part->method,$part->start,$part->end), "\n"; > } > print "\n"; > } > > This gets all the genes from the GFF schema. Should I be using another > while loop to retrieve other features that overlap with these genes? It > there a bioperl module to retrieve overlapping features? I would like to be > able to get all the features that overlap with a particular gene or a whole > set of genes. > > Thanks in advance. > Victor > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 From F.Zhang at surrey.ac.uk Thu Jul 7 12:09:43 2005 From: F.Zhang at surrey.ac.uk (F.Zhang) Date: Thu Jul 7 23:16:31 2005 Subject: [Bioperl-l] about features extraction from PDB files Message-ID: Skipped content of type multipart/alternative-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 7957 bytes Desc: image001.jpg Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050707/1637979e/attachment.jpg From fgarret at ub.edu Fri Jul 8 10:39:21 2005 From: fgarret at ub.edu (Filipe Garrett) Date: Fri Jul 8 10:31:26 2005 Subject: [Bioperl-l] How to get the intron phase Message-ID: <42CE9019.40502@ub.edu> Hi all, I'm new to bioperl and I was looking for a way to obtain the intron phases from genes in a FASTA format like this: >CG3427-RA type=transcript; loc=2R:complement(2273725..2274587,2274647..2274996,2275280..2275413,2275634..2275804,2275864..2276117,2276188..2276549,2277349..2277510,2277748..2277924,2278864..2279008,2279228..2279373,2279935..2280127,2280182..2280323,2280392..2280478,2280739..2280836,2281121..2281172,2285453..2285599,2300275..2300819); ID=CG3427-RA; name=Epac-RA; db_xref=FlyBase:FBtr0086132,FlyBase:FBgn0033102,Gadfly:CG3427-RA; release=r4.1; species=dmel; len=4028 CTCTCCAGCGGCGCACAACTCGATCGCTGGCCCAGAGGTTCAGTTCGGTT TGGTTCGGTTCGGTTTGAATCTCTGCCTCTGTTTACGCCTCTATATC... I've looked at the script directory and found the phase method inside the Bio::SeqFeature::Gene::Intron object, but the examples are from data parsed from a GFF file. Can I bypass the GFF stuff and use the FASTA header information directly? 
Thanks in advance, Bests From jason.stajich at duke.edu Fri Jul 8 11:12:04 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Jul 8 11:03:26 2005 Subject: [Bioperl-l] How to get the intron phase In-Reply-To: <42CE9019.40502@ub.edu> References: <42CE9019.40502@ub.edu> Message-ID: You can calculate it pretty easily: just build the split-location object from the location string.

use strict;
use warnings;
use Bio::Factory::FTLocationFactory;

my $file = shift @ARGV;
# the trailing '|' runs grep and reads its output
open(my $fh, "grep '^>' $file |") || die "cannot run grep on $file: $!";
my $factory = Bio::Factory::FTLocationFactory->new;
while (<$fh>) {
    if ( /loc=(\S+):(\S+);/ ) {
        my ($seqid, $locationstr) = ($1, $2);
        # assumption: rewrite FlyBase's complement(a..b,c..d) into the
        # GenBank-style complement(join(a..b,c..d)) before parsing
        $locationstr =~ s/^complement\(([^()]*,[^()]*)\)$/complement(join($1))/;
        my $location = $factory->from_string($locationstr);
        # order the exons 5' -> 3' along the transcript
        my @exons = sort { $a->start <=> $b->start } $location->each_Location;
        @exons = reverse @exons if $locationstr =~ /^complement/;
        my $runninglength = 0;
        for my $i (0 .. $#exons - 1) {    # no intron after the last exon
            # I may be sloppy here, pls check that this is working the way you expect
            # defining A^TG is phase 1 and AT^G is phase 2; the phases are only
            # right if the coordinates start at the start codon (no 5' UTR)
            my $phase = ($runninglength += $exons[$i]->length) % 3;
            print "phase of intron $i is $phase\n";
        }
    }
}

On Jul 8, 2005, at 10:39 AM, Filipe Garrett wrote: > Hi all, > > I'm new to bioperl and I was looking for a way to obtain the intron > phases from genes in a FASTA format like this: > > >CG3427-RA type=transcript; loc=2R:complement > (2273725..2274587,2274647..2274996,2275280..2275413,2275634..2275804,2 > 275864..2276117,2276188..2276549,2277349..2277510,2277748..2277924,227 > 8864..2279008,2279228..2279373,2279935..2280127,2280182..2280323,22803 > 92..2280478,2280739..2280836,2281121..2281172,2285453..2285599,2300275 > ..2300819); ID=CG3427-RA; name=Epac-RA; > db_xref=FlyBase:FBtr0086132,FlyBase:FBgn0033102,Gadfly:CG3427-RA; > release=r4.1; species=dmel; len=4028 > CTCTCCAGCGGCGCACAACTCGATCGCTGGCCCAGAGGTTCAGTTCGGTT > TGGTTCGGTTCGGTTTGAATCTCTGCCTCTGTTTACGCCTCTATATC... > > I've looked at the script directory and found the phase method > inside the Bio::SeqFeature::Gene::Intron object, but the examples > are from data parsed from a GFF file.
> > Can I bypass the GFF stuff and use the FASTA header information > directly? > > Thanks in advance, > > Bests > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From chandan.kr.singh at gmail.com Fri Jul 8 15:05:23 2005 From: chandan.kr.singh at gmail.com (CHANDAN SINGH) Date: Fri Jul 8 14:56:16 2005 Subject: [Bioperl-l] DEBUGGED Remoteblast from Bio::Perl through proxy Message-ID: <2d4f32050708120534f59f1a@mail.gmail.com> Hi eveybody Those of u ,having problem in blasting sequences from Bio::Perl module through proxy and getting "time out " or " no route to host " errors do need to set the environment proxy variable ( hello smarty we all know it ) and just give the following argument ( env_proxy => 1 ) to $self->{'_ua'} = new LWP::UserAgent( ); as $self->{'_ua'} = new LWP::UserAgent( env_proxy => 1 ); in the following sub in Bio::Tools::Run::RemoteBlast.pm sub ua { my ($self, $value) = @_; if( ! defined $self->{'_ua'} ) { $self->{'_ua'} = new LWP::UserAgent( ); my $nm = ref($self); $nm =~ s/::/_/g; $self->{'_ua'}->agent("bioperl-$nm/$MODVERSION"); } return $self->{'_ua'}; } I saw this bug in the stable version and also in the one downloaded from CVS yesterday . From chandan.kr.singh at gmail.com Fri Jul 8 15:37:13 2005 From: chandan.kr.singh at gmail.com (CHANDAN SINGH) Date: Fri Jul 8 15:28:29 2005 Subject: [Bioperl-l] Remote::Blast In-Reply-To: <425D8AAF.6040903@ime.usp.br> References: <425D8AAF.6040903@ime.usp.br> Message-ID: <2d4f32050708123744e99cfe@mail.gmail.com> Dear Thiago I was out of touch with bioperl for quite sometime and today i solved my problem but it seems from your last email that your problem was slow proxy connection or hence time out ,while in my case ,the RemoteBlast.pm module was not reading the env proxy variable . 
I dint used to get any output .You can see the solution in my recent mail to the group . Regards Chandan On 4/14/05, Thiago Motta Venancio wrote: > Hi all. > I am using the Remote::Blast module. > The script was running ok, but it become out because of a 500 error and > gaves a timeout www.ncbi.nih.go:80. > Later, it came back, but the vast majority of sequences returned no > matches, some of them are not really no matches. > Any lights? > Thanks in advance > Thiago > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From chandan.kr.singh at gmail.com Fri Jul 8 16:00:15 2005 From: chandan.kr.singh at gmail.com (CHANDAN SINGH) Date: Fri Jul 8 15:51:09 2005 Subject: [Bioperl-l] DEBUGGED Remoteblast from Bio::Perl through proxy..cont Message-ID: <2d4f32050708130030b7a0dd@mail.gmail.com> sorry for the reduplication of the mail but i had forgot to mention the more bugging bug which is How come ,others dont get this problem or have i misunderstood something . do reply Hi eveybody Those of u ,having problem in blasting sequences from Bio::Perl module through proxy and getting "time out " or " no route to host " errors do need to set the environment proxy variable ( hello smarty we all know it ) and just give the following argument ( env_proxy => 1 ) to $self->{'_ua'} = new LWP::UserAgent( ); as $self->{'_ua'} = new LWP::UserAgent( env_proxy => 1 ); in the following sub in Bio::Tools::Run::RemoteBlast.pm sub ua { my ($self, $value) = @_; if( ! defined $self->{'_ua'} ) { $self->{'_ua'} = new LWP::UserAgent( ); my $nm = ref($self); $nm =~ s/::/_/g; $self->{'_ua'}->agent("bioperl-$nm/$MODVERSION"); } return $self->{'_ua'}; } I saw this bug in the stable version and also in the one downloaded from CVS yesterday . 
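Jason Stajich's reply to Filipe Garrett earlier in this thread builds the exon list with Bio::Factory::FTLocationFactory. As a cross-check that needs no BioPerl, here is a minimal core-Perl sketch of the same running-length calculation. It assumes the FlyBase-style `loc=ARM:LOCATION;` header field shown in Filipe's message, that a leading `complement(` marks a minus-strand gene, and that the ranges cover coding sequence starting at the start codon (a `type=transcript` record that includes UTR exons would give offset phases). The helper names `intron_phases` and `phases_from_header` and the example header are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Intron phases from a location string such as
#   "complement(2273725..2274587,2274647..2274996)" or "100..200,300..400".
# The phase of the intron after exon i is (coding length of exons 1..i) % 3:
#   0 = intron lies between codons, 1 = after the first codon base (A^TG),
#   2 = after the second codon base (AT^G).
# ASSUMPTION: the ranges cover coding sequence only, starting at the start
# codon; records with UTR exons give offset phases.
sub intron_phases {
    my ($locstr) = @_;
    my $minus = $locstr =~ /^complement\(/ ? 1 : 0;

    # [start, end] pairs, ordered 5' -> 3' along the transcript
    my @exons = sort { $a->[0] <=> $b->[0] }
                map  { [ split /\.\./ ] } ( $locstr =~ /(\d+\.\.\d+)/g );
    @exons = reverse @exons if $minus;

    my $running = 0;
    my @phases;
    for my $i ( 0 .. $#exons - 1 ) {    # no intron after the last exon
        my ( $start, $end ) = @{ $exons[$i] };
        $running += $end - $start + 1;
        push @phases, $running % 3;
    }
    return @phases;
}

# Extract LOCATION from a FlyBase-style header field "loc=ARM:LOCATION;".
sub phases_from_header {
    my ($header) = @_;
    my ($loc) = $header =~ /\bloc=[^:;\s]+:([^;]+);/;
    return defined $loc ? intron_phases($loc) : ();
}

# Hypothetical header, for illustration only.
my $header = '>CG0000-RA type=CDS; loc=2R:complement(100..199,300..349,400..451); ID=CG0000-RA;';
print join( ',', phases_from_header($header) ), "\n";    # prints 1,0
```

To process a whole file, loop over its `>` lines and call `phases_from_header` on each; parsing the numbers directly avoids depending on how a location factory orders the sub-locations of a minus-strand gene.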
From chad at dieselwurks.com Fri Jul 8 18:15:13 2005 From: chad at dieselwurks.com (Chad Matsalla) Date: Fri Jul 8 18:06:15 2005 Subject: [Bioperl-l] Re: use cases for SeqWithQuality, please In-Reply-To: <200507071013.45813.heikki@ebi.ac.uk> References: <200507071013.45813.heikki@ebi.ac.uk> Message-ID: On Thu, 7 Jul 2005, Heikki Lehvaslaiho wrote: > There are so many ways to call it and some methods are already depreciated > that it would be better to write a replacement module than try to rewrite all > methods. That sounds ok. > With that information I could write a Bio::Seq::Quality that > implements Bio::Seq::MetaI and we could depreciate > Bio::Seq::SeqWithQuality. > > Are you happy with that, Chad? Absolutely. I'll dig through our code to find use cases.
After that you'll let me know how I can help? Chad -- George Orwell was an optimist. From J.A.Page at newcastle.ac.uk Sat Jul 9 18:26:00 2005 From: J.A.Page at newcastle.ac.uk (Jaqueline Ann Page) Date: Sun Jul 10 08:08:32 2005 Subject: [Bioperl-l] Advice on using bioperl Message-ID: Hi Everyone I had trouble using the remote blast bioperl as I could't set the proxy. So I used NCBI webblasst.pl code on their web site. This sends queries to qblast gets back the result into a variable called $content ( containing the blast report). I dont know to pass this to bioperl code. How do I create a $blast_report object to pass it to. Then I would be able to use my $result = $blast_report->next_result; while( $result = $in->next_result ) etc Thanks in advance Jackie From chandan.kr.singh at gmail.com Sun Jul 10 09:14:14 2005 From: chandan.kr.singh at gmail.com (CHANDAN SINGH) Date: Sun Jul 10 09:06:22 2005 Subject: [Bioperl-l] Advice on using bioperl In-Reply-To: References: Message-ID: <2d4f32050710061426db929@mail.gmail.com> Hi JAP I dont understand why u cant set the proxy . If the environment proxy variable is set u can easily use Remoteblast.pm .I had posted a mail regarding this two days ago . It might help u . Do reply if it helps . See u Chandan On 7/10/05, Jaqueline Ann Page wrote: > > Hi Everyone > > I had trouble using the remote blast bioperl as I could't set the proxy. So I used NCBI webblasst.pl code on their web site. This sends queries to qblast gets back the result into a variable called $content ( containing the blast report). I dont know > to pass this to bioperl code. How do I create a $blast_report object to pass it to. 
Then I would be able to use my $result = $blast_report->next_result; > > > while( $result = $in->next_result ) > > > etc > > Thanks in advance > > Jackie > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From oxcorder at cs.uu.nl Mon Jul 11 05:56:50 2005 From: oxcorder at cs.uu.nl (Otto X. Cordero) Date: Mon Jul 11 05:48:12 2005 Subject: [Bioperl-l] MSG: Replacing one sequence Message-ID: <55393.131.211.52.202.1121075810.squirrel@mail.students.cs.uu.nl> Dear all, I have a simple script that converts my alignments from fasta to phylip format. It is mostly a copy-paste from the code in the module documentation, very simple stuff. I noticed that some sequences where replaced: -------------------- WARNING --------------------- MSG: Replacing one sequence [305.Q8XFS3.NR/1-1275] Can anyone explain why this happens? Thanks very much, Otto. ======================================= Otto X. Cordero Theoretical Biology and Bioinformatics Utrecht University +31 30 2539043 Room Z508, Padualaan 8, 3584 CH Utrecht The Netherlands From n.haigh at sheffield.ac.uk Mon Jul 11 11:50:13 2005 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Mon Jul 11 11:41:04 2005 Subject: [Bioperl-l] Gene Features Message-ID: I'm working on Arabidopsis thaliana and I'd like to identify candidate genes based on their gene features. In particular I'd like to identify genes with introns within a specific range. I have obtained a file from TIGR describing gene features: ftp://ftp.arabidopsis.org/home/tair/Maps/seqviewer_data/sv_gene_feature.data I wondered if anyone might have some code for doing this type of thing? Would the use of Bio::SeqFeature be overkill and can it be used without actually having the gene sequences? 
Thanks Nathan ---------------------------------- Nathan Haigh Bioinformatics PostDoctoral Research Associate Room B2 211 Department of Animal and Plant Sciences University of Sheffield Western Bank Sheffield S10 2TN Tel: +44 (0)114 22 20112 Mob: +44 (0)7742 533 569 Fax: +44 (0)114 22 20002 From heikki at ebi.ac.uk Mon Jul 11 12:02:52 2005 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Mon Jul 11 11:53:23 2005 Subject: [Bioperl-l] Announce: Bio::Seq::Quality Message-ID: <200507111702.52264.heikki@ebi.ac.uk> Bio::Seq::Quality is a new module that allows you to store per-residue quality and trace index values using Bio::Seq::MetaI interface. It replaces Bio::Seq::SeqWithQuality which is now deprecated. Solutions to persistence should focus on storing Bio::Seq::Meta and Bio::Seq::Meta::Array objects. It should be easy to stringify most real world meta values. Then the persistence could be implemented by storing the sequence object and N number of meta strings. All the functional code is in Bio::Seq::Meta::Array, Bio::Seq::Quality merely adds a convenient interface. The POD contains a discussion of differences from Bio::Seq::SeqWithQuality. If the following, or anything else, is a problem let me know as soon as possible: The greatest difference to Bio::Seq::SeqWithQuality is that in this implementation quality for all sequence residues are automatically assigned a value of '0' (zero) unless you set it to something else. Length of the quality array always equals the length of the sequence. Therefore, length() never returns "DIFFERENT". 
Enjoy, -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambridge, CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From jason.stajich at duke.edu Mon Jul 11 14:29:26 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Jul 11 14:23:23 2005 Subject: [Bioperl-l] MSG: Replacing one sequence In-Reply-To: <55393.131.211.52.202.1121075810.squirrel@mail.students.cs.uu.nl> References: <55393.131.211.52.202.1121075810.squirrel@mail.students.cs.uu.nl> Message-ID: <11CE92FD-AEF1-47CA-816A-C2D271087F81@duke.edu> sequence names are probably not unique. -jason On Jul 11, 2005, at 5:56 AM, Otto X. Cordero wrote: > Dear all, > > I have a simple script that converts my alignments from fasta to > phylip > format. It is mostly a copy-paste from the code in the module > documentation, very simple stuff. I noticed that some sequences where > replaced: > > -------------------- WARNING --------------------- > MSG: Replacing one sequence [305.Q8XFS3.NR/1-1275] > > Can anyone explain why this happens? > > Thanks very much, > > Otto. > > ======================================= > Otto X. 
Cordero > Theoretical Biology and Bioinformatics > Utrecht University > +31 30 2539043 > Room Z508, Padualaan 8, 3584 CH Utrecht > The Netherlands > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From jason.stajich at duke.edu Mon Jul 11 14:32:25 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Jul 11 14:25:23 2005 Subject: [Bioperl-l] DEBUGGED Remoteblast from Bio::Perl through proxy..cont In-Reply-To: <2d4f32050708130030b7a0dd@mail.gmail.com> References: <2d4f32050708130030b7a0dd@mail.gmail.com> Message-ID: Thanks - I think you can just reset the LWP object directly if you like in your script code w/o modifying the module: $remoteblast->ua(LWP::UserAgent->new(env_proxy =>1)); We can certainly update the module to add this default initialization though. You should submit it as feature request at http://bugzilla.open-bio.org/ so we can track whether or not someone has done it. On Jul 8, 2005, at 4:00 PM, CHANDAN SINGH wrote: > sorry for the reduplication of the mail but i had forgot to mention > the more bugging bug which is How come ,others dont get this > problem or > have i misunderstood something . > do reply > > Hi eveybody > Those of u ,having problem in blasting sequences from Bio::Perl > module through > proxy and getting > "time out " or " no route to host " > errors > do need to set the environment proxy variable ( hello smarty we > all know it ) > and just give the following argument > ( env_proxy => 1 ) > to > $self->{'_ua'} = new LWP::UserAgent( ); > as > $self->{'_ua'} = new LWP::UserAgent( env_proxy => 1 ); > > in the following sub in Bio::Tools::Run::RemoteBlast.pm > sub ua { > my ($self, $value) = @_; > if( !
defined $self->{'_ua'} ) { > $self->{'_ua'} = new LWP::UserAgent( ); > my $nm = ref($self); > $nm =~ s/::/_/g; > $self->{'_ua'}->agent("bioperl-$nm/$MODVERSION"); > } > return $self->{'_ua'}; > } > I saw this bug in the stable version and also in the one downloaded > from CVS yesterday. From chandan.kr.singh at gmail.com Mon Jul 11 14:45:58 2005 From: chandan.kr.singh at gmail.com (CHANDAN SINGH) Date: Mon Jul 11 14:36:45 2005 Subject: [Bioperl-l] DEBUGGED Remoteblast from Bio::Perl through proxy..cont In-Reply-To: References: <2d4f32050708130030b7a0dd@mail.gmail.com> Message-ID: <2d4f3205071111456c0d2103@mail.gmail.com> Hi Jason, I am not sure it can be done the way you describe; anyway, I'll try it. I submitted the problem to Bugzilla back in April, and I installed BioPerl from CVS recently, so it is quite possible that the fix is not included yet. See you, Chandan From jason.stajich at duke.edu Mon Jul 11 14:56:21 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Jul 11 14:47:16 2005 Subject: [Bioperl-l] DEBUGGED Remoteblast from Bio::Perl through proxy..cont In-Reply-To: References: <2d4f32050708130030b7a0dd@mail.gmail.com> Message-ID: <19A5C4D1-80D5-44A3-9687-2A985D75EEE1@duke.edu> Sorry, I meant just do this: $remoteblast->ua->env_proxy; The ua function is not currently written to accept storing a new ua object, but of course you can just do: $remoteblast->{'_ua'} = LWP::UserAgent->new(env_proxy => 1); -jason From chandan.kr.singh at gmail.com Mon Jul 11 15:12:58 2005 From: chandan.kr.singh at gmail.com (CHANDAN SINGH) Date: Mon Jul 11 15:04:16 2005 Subject: [Bioperl-l] DEBUGGED Remoteblast from Bio::Perl through proxy..cont In-Reply-To: <19A5C4D1-80D5-44A3-9687-2A985D75EEE1@duke.edu> References: <2d4f32050708130030b7a0dd@mail.gmail.com> <19A5C4D1-80D5-44A3-9687-2A985D75EEE1@duke.edu> Message-ID: <2d4f32050711121236c3186a@mail.gmail.com> I know there are ways to do it, but if you remember, my program was nothing but the second example in the bptutorial on the net, and no such one-liner can help there. You seem to be referring to your own script, which might be a different one. That example is disheartening enough for a newbie. It seems there are options to include a proxy if we use RemoteBlast.pm directly. chandan From mckays at cshl.edu Mon Jul 11 12:29:54 2005 From: mckays at cshl.edu (Sheldon McKay) Date: Mon Jul 11 18:44:55 2005 Subject: [Bioperl-l] extract info from .game.xml In-Reply-To: References: <497101aad05f378c5e1805c206b1cfd8@cshl.edu> Message-ID: Hi Tuan, Your game XML file contains only sequence and computational_analysis elements, with no annotation elements. Unfortunately, the lack of annotations is fatal, and computational analysis features are not supported in the BioPerl parser. Lack of annotations does not necessarily need to be fatal, though; I will see what I can do about that. Sheldon On Jul 11, 2005, at 12:05 PM, Tuan A. Tran wrote: > Hi Sheldon, > > Thanks very much for your email. Yes, I am still interested in doing > that. > It was quite a while ago, so I don't remember what I might have done > wrong. Anyway, I seem to recall that in the attached data file there is > no 'annotation' anywhere. I have not checked since then. I just > downloaded the attached file from flybase.org. > > I hope that you can help me figure it out. > > Sincerely, > Tuan > > > On 7/7/05, Sheldon McKay wrote: >> Hi, >> >> Sorry for taking so long to reply. If you are still interested in >> doing this, could you send me the file you are trying to parse and I >> will see if I can figure out what is wrong?
>> >> Thanks, >> Sheldon >> >> >> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- >> Sheldon McKay, PhD >> Cold Spring Harbor Laboratory >> 1 Bungtown Road >> Cold Spring Harbor, NY 11724 >> On Jun 14, 2005, at 8:03 PM, Tuan A. Tran wrote: >> >>> Hi, >>> >>> I am trying to extract some information from a file filename.game.xml >>> (I got this file from flybase.org). I wrote a simple script to test >>> it. However, I keep getting the following message >>> >>> ------------- EXCEPTION ------------- >>> MSG: No annotations >>> STACK Bio::SeqIO::game::gameHandler::load >>> /usr/local/share/perl/5.8.4/Bio/SeqIO/game/gameHandler.pm:121 >>> STACK Bio::SeqIO::game::_getseqs >>> /usr/local/share/perl/5.8.4/Bio/SeqIO/game.pm:156 >>> STACK Bio::SeqIO::game::next_seq >>> /usr/local/share/perl/5.8.4/Bio/SeqIO/game.pm:101 >>> STACK toplevel fetchseq_game_xml.pl:64 >>> >>> I have no idea why. Can anyone help? >>> Thanks in advance, >>> TAT >>> >>> --------------------------------- >>> My simple script is >>> >>> #!/usr/local/lib/perl >>> >>> use strict; >>> >>> sub NULL () {0}; >>> >>> use Bio::Seq; >>> use Bio::SeqIO; >>> #use Bio::SeqIO::game; >>> #use Bio::Annotation; >>> use Bio::SearchIO; >>> use Bio::AlignIO; >>> use Bio::SimpleAlign; >>> use Bio::LocatableSeq; >>> use Bio::Tools::Run::StandAloneBlast; >>> use Bio::Tools::Run::Alignment::Clustalw; >>> use Getopt::Long; >>> use Bio::DB::GenBank; >>> use Bio::DB::Flat::BDB; >>> #use Bio::Index::GenBank; >>> use Bio::Index::Fasta; >>> use Bio::SeqFeature::Generic; >>> use DBI; >>> >>> >>> my $infile = shift; >>> my $in = Bio::SeqIO->new( -file=> $infile, -format=>'game'); >>> >>> while (my $query = $in->next_seq() ) { >>> >>> print $query->id,"\n"; >>> } >> >> > <3R_27900000_28200000.game.xml.gz> From avilella at gmail.com Tue Jul 12 11:19:45 2005 From: avilella at
gmail.com (Albert Vilella) Date: Tue Jul 12 11:12:41 2005 Subject: [Bioperl-l] bioperl-run Codeml.pm fix_blength Message-ID: <1121181586.8167.13.camel@localhost.localdomain> Hi, I noticed that the valid values for fix_blength in Codeml.pm do not include option "fix_blength 1: initial". If you agree, I would add it myself in CVS: bioperl-run/Bio/Tools/Run/Phylo/PAML/Codeml.pm 'fix_blength' => [0,-1,2], change to: 'fix_blength' => [0,-1,1,2], Jason? Bests, Albert. From jason.stajich at duke.edu Tue Jul 12 11:28:10 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Jul 12 11:19:26 2005 Subject: [Bioperl-l] Re: bioperl-run Codeml.pm fix_blength In-Reply-To: <1121181586.8167.13.camel@localhost.localdomain> References: <1121181586.8167.13.camel@localhost.localdomain> Message-ID: Sure - fix away. I think it was a bit misguided on my part to think we could really capture all the valid values in this init hash - we could possibly remove the whole system of checking and just establish default values. Anyway, feel free to check that in. -jason On Jul 12, 2005, at 11:19 AM, Albert Vilella wrote: > Hi, > > I noticed that the valid values for fix_blength in Codeml.pm do not > include option "fix_blength 1: initial", > > I agreed, I would add it myself in cvs: > > bioperl-run/Bio/Tools/Run/Phylo/PAML/Codeml.pm > > 'fix_blength' => [0,-1,2], > change to: > 'fix_blength' => [0,-1,1,2], > > Jason? > > Bests, > > Albert. From valiente at jaist.ac.jp Mon Jul 11 22:37:12 2005 From: valiente at jaist.ac.jp (Gabriel Valiente) Date: Tue Jul 12 11:25:00 2005 Subject: [Bioperl-l] Announce: Bio::Tree::Draw::Cladogram In-Reply-To: <200507111702.52264.heikki@ebi.ac.uk> References: <200507111702.52264.heikki@ebi.ac.uk> Message-ID: <42D32CD8.6070409@jaist.ac.jp> Bio::Tree::Draw::Cladogram is a new module for drawing Bio::Tree::Tree objects in Encapsulated PostScript (EPS) format.
It can be utilized both for displaying a single phylogenetic tree (a cladogram) and for the comparative display of two phylogenetic trees (a tanglegram) such as a gene tree and a species tree, a host tree and a parasite tree, two alternative trees for the same set of taxa, or two alternative trees for overlapping sets of taxa. The POD contains a detailed description of the way in which cladograms and tanglegrams are built. However, tests are still missing and I'm afraid I won't be able to work on this until August. Many extensions are possible, such as using branch lengths and producing output in other graphic formats. Any suggestions are welcome. Enjoy, Gabriel From valiente at jaist.ac.jp Mon Jul 11 22:45:17 2005 From: valiente at jaist.ac.jp (Gabriel Valiente) Date: Tue Jul 12 11:25:05 2005 Subject: [Bioperl-l] Announce: Bio::Tree::Compatible In-Reply-To: <42D32CD8.6070409@jaist.ac.jp> References: <200507111702.52264.heikki@ebi.ac.uk> <42D32CD8.6070409@jaist.ac.jp> Message-ID: <42D32EBD.6070909@jaist.ac.jp> Bio::Tree::Compatible is a new module for testing compatibility of phylogenetic trees with nested taxa represented as Bio::Tree::Tree objects. It is based on a recent characterization of ancestral compatibility of semi-labeled trees in terms of their cluster representations. The POD is now complete but tests are still missing and I'm afraid I won't be able to work on this until August. However, I've tested this module on all pairs of trees from TreeBASE. Any suggestions are welcome. 
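To make "cluster representation" concrete: a cluster is the set of taxa below one node of a rooted tree. Bio::Tree::Compatible handles semi-labeled trees with nested taxa, which is more general than anything shown here. The minimal sketch below only illustrates the classical rule for ordinary rooted trees: two clusters are compatible when they are disjoint or one contains the other, and a collection of clusters fits into a single rooted tree when every pair is compatible. It is not the module's algorithm or API. `clusters_compatible` and `all_compatible` are hypothetical names, and clusters are assumed to be plain array refs of taxon labels with no duplicates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Classical pairwise test for ordinary rooted trees (hypothetical helper,
# not Bio::Tree::Compatible): compatible if disjoint or nested.
# Assumes each cluster is an array ref of distinct taxon labels.
sub clusters_compatible {
    my ($x, $y) = @_;
    my %in_x   = map { $_ => 1 } @$x;
    my $shared = grep { $in_x{$_} } @$y;    # size of the intersection
    return $shared == 0                     # disjoint
        || $shared == @$y                   # y is contained in x
        || $shared == @$x;                  # x is contained in y
}

# A set of clusters fits into one rooted tree when every pair is compatible.
sub all_compatible {
    my @clusters = @_;
    for my $i (0 .. $#clusters) {
        for my $j ($i + 1 .. $#clusters) {
            return 0 unless clusters_compatible($clusters[$i], $clusters[$j]);
        }
    }
    return 1;
}

# ((A,B),C) has the cluster {A,B}; ((A,C),B) has the cluster {A,C}.
print all_compatible(['A', 'B'], ['A', 'C']) ? "compatible\n" : "incompatible\n";
# prints: incompatible
```

The pairwise loop is quadratic in the number of clusters. That is fine for a sketch, but it is not meant to suggest how the module scales on large trees.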
The theory behind this module can be found at: http://www.lsi.upc.es/dept/techreps/listado_concreto.php?id=766 http://arxiv.org/abs/cs.DM/0505086 Enjoy, Gabriel From avilella at gmail.com Tue Jul 12 11:40:41 2005 From: avilella at gmail.com (Albert Vilella) Date: Tue Jul 12 11:32:38 2005 Subject: [Bioperl-l] Re: bioperl-run Codeml.pm fix_blength In-Reply-To: References: <1121181586.8167.13.camel@localhost.localdomain> Message-ID: <1121182841.8167.22.camel@localhost.localdomain> On Tue, 12/07/2005 at 11:28 -0400, Jason Stajich wrote: > sure - fix away. Done. Also, in my pipeline it would be useful to call Codeml.pm via BioPerl while keeping the temp files in a specified directory. As I understand it, save_tempfiles keeps the generated temp files in the temporary directory, i.e. they remain in $tempdir. An $outdir could be specified so that the codeml run is saved where the user specifies. What do you think? Albert. From jason.stajich at duke.edu Tue Jul 12 11:47:19 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Jul 12 11:38:15 2005 Subject: [Bioperl-l] Re: bioperl-run Codeml.pm fix_blength In-Reply-To: <1121182841.8167.22.camel@localhost.localdomain> References: <1121181586.8167.13.camel@localhost.localdomain> <1121182841.8167.22.camel@localhost.localdomain> Message-ID: Sounds good - would you just copy the dir to the user-specified outdir? Another way to go is to make tempdir a settable value (see Bio::Tools::Run::WrapperBase in the bioperl-live repository) - but it may not be as clear how to use it? -jason On Jul 12, 2005, at 11:40 AM, Albert Vilella wrote: > On Tue, 12/07/2005 at 11:28 -0400, Jason Stajich > wrote: > >> sure - fix away. >> > > done. > > Also, in my pipeline it would be interesting to call Codeml.pm via > bioperl keeping the tempfiles in a specified directory: > > I understand that save_tempfiles will save the generated tempfiles in > the temp directory, the dir will remain in $tempdir.
> An $outdir could be specified so that the codeml run is saved where > the > user specifies. > > What do you think? > > Albert. From avilella at gmail.com Tue Jul 12 12:02:57 2005 From: avilella at gmail.com (Albert Vilella) Date: Tue Jul 12 11:54:55 2005 Subject: [Bioperl-l] Re: bioperl-run Codeml.pm fix_blength In-Reply-To: References: <1121181586.8167.13.camel@localhost.localdomain> <1121182841.8167.22.camel@localhost.localdomain> Message-ID: <1121184178.8167.28.camel@localhost.localdomain> On Tue, 12/07/2005 at 11:47 -0400, Jason Stajich wrote: > Sounds good - would you just copy the dir to the users specified > outdir? Yes. > Another way to go is make tempdir a settable value (see > Bio::Tools::Run::WrapperBase -- in bioperl-live repository) - but > this may not be as clear on how to use it? Well, it is not as direct as the other way, but maybe it is cleaner in the sense that it will run the analysis directly in $tempdir, and no extra cp or mv would be needed... Albert. From wrp at virginia.edu Tue Jul 12 12:41:41 2005 From: wrp at virginia.edu (William R.
Pearson) Date: Tue Jul 12 12:41:41 2005 Subject: [Bioperl-l] Computational and Comparative Genomics Course - July 15 Deadline In-Reply-To: <200507121533.j6CFXha6021002@portal.open-bio.org> References: <200507121533.j6CFXha6021002@portal.open-bio.org> Message-ID: <56C4C92F-B286-4C0F-8EC9-B094BA9A7528@virginia.edu> Course announcement - Application deadline, July 15, 2005 ================================================================ Cold Spring Harbor COMPUTATIONAL & COMPARATIVE GENOMICS November 2 - 8, 2005 Application Deadline: July 15, 2005 INSTRUCTORS: Pearson, William, Ph.D., University of Virginia, Charlottesville, VA Smith, Randall, Ph.D., SmithKline Beecham Pharmaceuticals, King of Prussia, PA Beyond BLAST and FASTA - Alignment: from proteins to genomes - This course presents a comprehensive overview of the theory and practice of computational methods for extracting the maximum amount of information from protein and DNA sequence similarity through sequence database searches, statistical analysis, multiple sequence alignment, and genome-scale alignment. Additional topics include gene finding, identifying signals in unaligned sequences, and integrating genetic and sequence information in biological databases. The course combines lectures with hands-on exercises; students are encouraged to pose challenging sequence analysis problems using their own data. The course makes extensive use of local WWW pages to present problem sets and the computing tools to solve them. Students use Windows and Mac workstations attached to a UNIX server; participants should be comfortable using the Unix operating system and a Unix text editor. The course is designed for biologists seeking advanced training in biological sequence analysis, for computational biology core resource directors and staff, and for scientists in other disciplines, such as computer science, who wish to survey current research problems in biological sequence analysis and comparative genomics.
The primary focus of the Computational and Comparative Genomics Course is the theory and practice of algorithms used in computational biology, with the goal of using current methods more effectively and developing new algorithms. Cold Spring Harbor also offers an Advanced Bioinformatics Programming course, which focuses more on software development. Over the past few years, the course has been expanded to cover more algorithms and exercises on comparative genomics and genome databases. For additional information, including the lecture schedule and problem sets for the 2004 course, see: http://fasta.bioch.virginia.edu/cshl04 ================================================================ To apply to the course, fill out the form at: http://meetings.cshl.edu/courses/courseapplication.asp ================================================================ From heikki at ebi.ac.uk Wed Jul 13 09:06:24 2005 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Wed Jul 13 08:56:51 2005 Subject: [Bioperl-l] EMBL ID line parsing error Message-ID: <200507131406.24939.heikki@ebi.ac.uk> I noticed that one BioFetch test was failing. It was caused by an EMBL entry object not having a display ID. The failure was caused by a regular expression in the EMBL parser not allowing spaces in the molecule substring of the ID line:

    ID BUM standard; genomic RNA; VRL; 200 BP.

    was: (\S+);
    fix: ([\S ]+);   now in bioperl-live

The affected Bio::Seq::RichSeq methods are: display_id(), id(), molecule(), division(). Here is a breakdown of all molecule values in the current EMBL release:

    circular genomic dna         7427
    circular genomic rna          687
    circular mrna                  23
    circular other dna            915
    circular other rna              9
    circular trna                   1
    circular unassigned dna       266
    circular unassigned rna         2
    genomic dna              14573961
    genomic rna                152219
    mrna                     28138477
    other dna                    6956
    other rna                    1827
    pre-rna                       898
    rrna                         5999
    scrna                          95
    snorna                        981
    snrna                         455
    trna                          667
    unassigned dna            1941868
    unassigned rna             102162

One third of the EMBL entries are affected.
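A minimal sketch in plain Perl (no BioPerl needed) that replays the regex change over the molecule types in the breakdown, as a way to check both the failure and the "one third" estimate. It assumes every ID line has the same shape as the BUM example, with only the molecule field varying. For the molecule field it uses the `([^;]+)` form that Heikki committed later in this thread, not the interim `([\S ]+)`. The `parse_id_line` helper and the synthetic ID lines are hypothetical. The counts are copied from the table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Molecule-type counts copied from the EMBL breakdown in this message.
my %count = (
    'circular genomic dna'    => 7427,
    'circular genomic rna'    => 687,
    'circular mrna'           => 23,
    'circular other dna'      => 915,
    'circular other rna'      => 9,
    'circular trna'           => 1,
    'circular unassigned dna' => 266,
    'circular unassigned rna' => 2,
    'genomic dna'             => 14573961,
    'genomic rna'             => 152219,
    'mrna'                    => 28138477,
    'other dna'               => 6956,
    'other rna'               => 1827,
    'pre-rna'                 => 898,
    'rrna'                    => 5999,
    'scrna'                   => 95,
    'snorna'                  => 981,
    'snrna'                   => 455,
    'trna'                    => 667,
    'unassigned dna'          => 1941868,
    'unassigned rna'          => 102162,
);

# Old pattern: the molecule field must be a single run of non-space characters.
my $old_re = qr/^ID\s+(\S+).*;\s+(\S+);\s+(\S+);/;
# Fixed pattern: the molecule field is anything up to the next semicolon.
my $new_re = qr/^ID\s+(\S+).*;\s+([^;]+);\s+(\S+);/;

# Hypothetical helper: returns (name, molecule, division), or () on no match.
sub parse_id_line {
    my ($line, $re) = @_;
    return $line =~ $re;
}

my ($affected, $total) = (0, 0);
for my $mol (sort keys %count) {
    # Synthetic ID line shaped like the BUM example; only the molecule varies.
    my $line = "ID BUM standard; $mol; VRL; 200 BP.";
    my (undef, $old_mol) = parse_id_line($line, $old_re);
    my (undef, $new_mol) = parse_id_line($line, $new_re);
    die "fixed pattern misparsed '$mol'\n"
        unless defined $new_mol && $new_mol eq $mol;
    $total += $count{$mol};
    # Count entries whose molecule field the old pattern cannot recover.
    $affected += $count{$mol} unless defined $old_mol && $old_mol eq $mol;
}

printf "%d of %d entries (%.1f%%) break the old pattern\n",
    $affected, $total, 100 * $affected / $total;
# prints: 16788323 of 44935895 entries (37.4%) break the old pattern
```

Under these assumptions the tally comes to roughly 37%, which is in line with the "one third" figure. Single-word molecule types such as mrna or trna parse identically under both patterns. Only the multi-word types (genomic dna, unassigned rna, the circular variants, ...) make the old pattern fail to match the whole line, which is why display_id() came back empty.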
This error does not affect GenBank entries, which use a different syntax. I wonder how long this error has been there! -Heikki From heikki at ebi.ac.uk Wed Jul 13 10:49:09 2005 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Wed Jul 13 10:46:53 2005 Subject: [Bioperl-l] Announce: Bio::Seq::Quality In-Reply-To: <200507111702.52264.heikki@ebi.ac.uk> References: <200507111702.52264.heikki@ebi.ac.uk> Message-ID: <200507131549.09525.heikki@ebi.ac.uk> I've been cleaning Bio::Seq::SeqWithQuality usage out of bioperl-live modules and replacing it with Bio::Seq::Quality. Everything seems to work. I've left Bio::Seq::PrimaryQual for the next rewrite. Its functionality is all in the Quality class (get and set id and quality values), but you cannot get the quality values from a Bio::Seq::Quality object if the sequence is not set. Usually qualities without residues do not make much sense, but something in the Bio::Assembly code, or at least in its tests, needs plain qualities. -Heikki On Monday 11 July 2005 17:02, Heikki Lehvaslaiho wrote: > Bio::Seq::Quality is a new module that allows you to store per-residue > quality and trace index values using the Bio::Seq::MetaI interface. It replaces > Bio::Seq::SeqWithQuality, which is now deprecated. > > Solutions to persistence should focus on storing Bio::Seq::Meta and > Bio::Seq::Meta::Array objects. It should be easy to stringify most real > world meta values. Then the persistence could be implemented by storing the > sequence object and N meta strings.
> > All the functional code is in Bio::Seq::Meta::Array; Bio::Seq::Quality > merely adds a convenient interface. > > The POD contains a discussion of differences from Bio::Seq::SeqWithQuality. > If the following, or anything else, is a problem, let me know as soon as > possible: > > The greatest difference from Bio::Seq::SeqWithQuality is that in this > implementation the qualities of all sequence residues are automatically > assigned a value of '0' (zero) unless you set them to something > else. The length of the quality array always equals the length of the > sequence. Therefore, length() never returns "DIFFERENT". > > > Enjoy, > -Heikki From michael.watson at bbsrc.ac.uk Wed Jul 13 11:04:59 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Wed Jul 13 10:57:27 2005 Subject: [Bioperl-l] Getting hit or subject length in BPlite Message-ID: <8975119BCD0AC5419D61A9CF1A923E9502067A87@iahce2knas1.iah.bbsrc.reserved> Hi, Maybe I'm just being dumb, but I can't see a way to get the hit length (note: NOT the HSP length) using BPlite to parse a BLAST report... Any help? Mick From pierre_rioux at yahoo.com Wed Jul 13 13:51:56 2005 From: pierre_rioux at yahoo.com (Pierre Rioux) Date: Wed Jul 13 13:42:41 2005 Subject: [Bioperl-l] EMBL ID line parsing error In-Reply-To: <200507131406.24939.heikki@ebi.ac.uk> Message-ID: <20050713175156.77649.qmail@web53003.mail.yahoo.com> Hi, > I noticed that one BioFetch test was failing. It was caused by an EMBL entry > object not having a display ID. The failure was caused by regular expression
The failure was caused by regular expression > in the EMBL parser not allowing spaces in the molecule substring of the ID > line: > > > ID BUM standard; genomic RNA; VRL; 200 BP. > was: (\S+); > fix: ([\S ]+); now in bioperl-live Because regular expressions are greedy, and because \S also matches the semicolon ";", I think maybe a better fix would be ([^;]); That way, if the EMBL line format ever gets extended to include more semicolon-separated fields, it will still work. (Personally, when I write regexes, I always try to make sure the specific character that is used as delimiter cannot be matched by the parenthesized regex for the fields... otherwise you're putting too much trust on the NUMBER of fields in the line for the whole line-matching regex to succeed as planned). Pierre ____________________________________________________ Start your day with Yahoo! - make it your home page http://www.yahoo.com/r/hs From pierre_rioux at yahoo.com Wed Jul 13 14:08:09 2005 From: pierre_rioux at yahoo.com (Pierre Rioux) Date: Wed Jul 13 13:58:55 2005 Subject: [Bioperl-l] EMBL ID line parsing error In-Reply-To: <20050713175156.77649.qmail@web53003.mail.yahoo.com> Message-ID: <20050713180810.81238.qmail@web53001.mail.yahoo.com> Small correction. I wrote: > ([^;]); But it should be: ([^;]*); I hope mail readers out there won't turn this into some kind of weird smiley. :-) Pierre ____________________________________________________ Start your day with Yahoo! - make it your home page http://www.yahoo.com/r/hs From reneehalbrook74 at yahoo.com Wed Jul 13 17:19:57 2005 From: reneehalbrook74 at yahoo.com (Renee Halbrook) Date: Wed Jul 13 17:10:47 2005 Subject: [Bioperl-l] COG parsing ? Message-ID: <20050713211957.6373.qmail@web40513.mail.yahoo.com> Hi, Does BioPerl have a parser for the Clusters of Orthologous Groups of proteins (COGs) from NCBI ? Thanks for any help, Renee Halbrook __________________________________ Yahoo! Mail Stay connected, organized, and protected. 
Take the tour: http://tour.mail.yahoo.com/mailtour.html From heikki at ebi.ac.uk Wed Jul 13 17:49:39 2005 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Wed Jul 13 17:40:02 2005 Subject: [Bioperl-l] EMBL ID line parsing error In-Reply-To: <20050713180810.81238.qmail@web53001.mail.yahoo.com> References: <20050713180810.81238.qmail@web53001.mail.yahoo.com> Message-ID: <200507132249.40102.heikki@ebi.ac.uk> Hi Pierre, You are quite right: ([^;]*); or ([^;]+); really is a much better way of writing it, and that is the way I committed the fix:

    ($name, $mol, $div) = ($line =~ /^ID\s+(\S+).*;\s+([^;]+);\s+(\S+);/);

I started writing the email as a note for myself when I first verified the bug, and then forgot to change the text before sending. Sorry to waste your time, -Heikki On Wednesday 13 July 2005 19:08, Pierre Rioux wrote: > Small correction. > > I wrote: > > ([^;]); > > But it should be: > > ([^;]*); > > I hope mail readers out there won't turn this into > some kind of weird smiley. :-) > > Pierre From Marc.Logghe at devgen.com Thu Jul 14 05:53:30 2005 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Thu Jul 14 05:44:18 2005 Subject: [Bioperl-l] Announce: Bio::Seq::Quality Message-ID: <0C528E3670D8CE4B8E013F6749231AA62F5439@ANTARESIA.be.devgen.com> Hi Heikki, > I've left Bio::Seq::PrimaryQual for the next rewrite.
Its > functionality is all in the Quality class (get and set id and > quality values), but you can not get the quality values from > a Bio::Seq::Quality object if you do not have the sequence > set. Usually qualities without residues do not make such > sense, but there is something in Bio::Assembly code or at > least in its tests that need plain qualities. > > The greatest difference to Bio::Seq::SeqWithQuality is > that in this > > implementation quality for all sequence residues are automatically > > assigned a value of '0' (zero) unless you set it to something > > else. Length of the quality array always equals the length of the > > sequence. Therefore, length() never returns "DIFFERENT". Considering these two extracts from your mail, am I correct in thinking that the lengths of the sequence and the quality array are only identical when you pass a sequence to the constructor together with the quality string? In all other cases, how can one be sure that the lengths are equal? E.g. you can first create the Bio::Seq::Quality object, passing it the quality, and assign the sequence afterwards by calling $qual->seq($seq). As you indicated, it is even possible not to set the sequence at all, so the sequence length is zero while the quality length is not. Does this mean a user should check the lengths explicitly? BTW, I am currently editing Bio::Assembly::IO::ace so that it also parses CAP3-generated ACE files. Do you think it is OK to also set the contig sequence for the quality object, i.e. not leave the seq attribute empty? I'll check the Bio::Assembly code to see why plain qualities are needed to pass the tests. Cheers, Marc From michael.watson at bbsrc.ac.uk Thu Jul 14 06:53:17 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu Jul 14 06:44:16 2005 Subject: [Bioperl-l] Blast features added to wrong strand??? Message-ID: <8975119BCD0AC5419D61A9CF1A923E9502067A90@iahce2knas1.iah.bbsrc.reserved> Hi, I'm using bioperl-1.4. I have a genomic region.
I am first using bl2seq and blastn to align it with some custom sequences, then blastall and blastx to blast it against UniProt. I'm using SearchIO and pretty standard code to add the blast hits as features to the sequence, e.g.: my $feature = Bio::SeqFeature::Generic->new(-primary_tag => 'CDS', -score => $hit->raw_score, -display_name => $hit->name, -tag => { locus_tag => $name, note => $note, } ); # @hsps is a filtered list of HSPs obtained from $hit->next_hsp foreach my $hsp (@hsps) { $feature->add_sub_SeqFeature($hsp,'EXPAND'); } $genome->add_SeqFeature($feature); # $genome is a Bio::Seq object Now, the bl2seq hits all have the strand reported as "Plus / Minus" and the blastx hits all have the strand reported as -1, i.e. there is a gene on the other strand of my sequence. HOWEVER, using the above code for both the bl2seq results and the blastx results, ONLY the blastx results get annotated on the reverse strand - the bl2seq results, which report the strand as "Plus / Minus", get annotated on the forward strand and hence point the wrong way when I draw them :-( So my question is, what am I doing wrong in the above code (which is pretty much ripped off from the bioperl HOWTOs) that makes the bl2seq "Plus / Minus" hits get annotated on the plus strand on my sequence?? Many thanks Mick From jason.stajich at duke.edu Thu Jul 14 08:40:47 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu Jul 14 08:31:56 2005 Subject: [Bioperl-l] Blast features added to wrong strand??? In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9502067A90@iahce2knas1.iah.bbsrc.reserved> References: <8975119BCD0AC5419D61A9CF1A923E9502067A90@iahce2knas1.iah.bbsrc.reserved> Message-ID: <86A43AEE-AC14-479E-B396-901D3C25C906@duke.edu> Why use bl2seq when you can use fasta....
From the Bio::SearchIO::blast documentation =head2 bl2seq parsing Since I cannot differentiate between BLASTX and TBLASTN since bl2seq doesn't report the algorithm used - I assume it is BLASTX by default - you can supply the program type with -report_type in the SearchIO constructor i.e. my $parser = new Bio::SearchIO(-format => 'blast', -file => 'bl2seq.tblastn.report', -report_type => 'tblastn'); This only really affects where the frame and strand information are put - they will always be on the $hsp->query instead of on the $hsp->hit part of the feature pair for blastx and tblastn bl2seq produced reports. Hope that's clear... On Jul 14, 2005, at 6:53 AM, michael watson ((IAH-C)) wrote: > Hi > > I'm using bioperl-1.4. > > I have a genomic region. I am first using bl2seq and blastn to > align it > with some custom sequences, then blastall and blastx to blast it > agains > uniprot. > > I'm using SearchIO and pretty standard code to add the blast hits as > features to the sequence e.g.: > > my $feature = Bio::SeqFeature::Generic->new(-primary_tag => 'CDS', > -score > => $hit->raw_score, > -display_name > => $hit->name, > -tag > => { > > locus_tag => > $name, > > note => $note, > } > ); > > # @hsps is a filtered list of HSPs obtained from $hit->next_hsp > foreach $hsp (@hsps) { > $feature->add_sub_SeqFeature($hsp,'EXPAND'); > } > > $genome->add_SeqFeature($feature); # $genome is a Bio::Seq feature > > Now, the bl2seq hits all have the strand reported as "Plus / Minus" > and > the blastx hits all have the strand reported as -1 i.e. there is a > gene > on the other strand of my sequence.
> > HOWEVER, using the above code for both the bl2seq results and the > blastx > results, ONLY the blastx results get annotated on the reverse strand - > the bl2seq results, which report the strand as "Plus / Minus", get > annotated on the forward strand and hence point the wrong way when I > draw them :-( > > So my question is, what am I doing wrong in the above code (which is > pretty much ripped off from the bioperl HOWTOs) that makes the bl2seq > "Plus / Minus" hits get annotated on the plus strand on my sequence?? > > Many thanks > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From michael.watson at bbsrc.ac.uk Thu Jul 14 09:09:45 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu Jul 14 09:00:35 2005 Subject: [Bioperl-l] Blast features added to wrong strand??? Message-ID: <8975119BCD0AC5419D61A9CF1A923E950172D9EF@iahce2knas1.iah.bbsrc.reserved> No, see, something definitely weird is going on, but I am happy to accept it may be my own misuse. To the code that parses the bl2seq reports I have added "-report_type => 'blastn'", and I get the same results. The *really* weird thing is that after I have created these features, if I write them to an EMBL file, they are annotated as being on the -1 strand, e.g. complement(1..230) etc. However, when I pass those very same features to $panel->add_track, they are drawn on the + strand. If I iterate through them and "print $feat->strand", they all say -1. Write them out as EMBL and they say complement(1..23) etc. Draw them, and they point --------------------> that way. If I write them out as EMBL, then read them back in using Bio::SeqIO, then pass them to $panel->add_track, they point in the right direction. So something is getting set wrong - could it be "frame"?
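Mick's question turns on how the strand text in a bl2seq/BLASTN report maps onto BioPerl's numeric strand. In BLASTN output, "Strand = Plus / Minus" describes the query first and the subject (hit) second. The hit therefore lies on the reverse strand relative to the query, and which value a feature should carry depends on which of the two sequences is being annotated. The sketch below only illustrates that mapping. `parse_strand_line` is a hypothetical helper, not a BioPerl method, and it does not explain the Bio::Graphics drawing behaviour described above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: turn a BLASTN strand line such as
# " Strand = Plus / Minus" into the numeric strands used by BioPerl features
# (1 = forward, -1 = reverse). The first word refers to the query, the
# second to the subject (hit). Returns an empty list for any other line.
sub parse_strand_line {
    my ($line) = @_;
    my %code = (plus => 1, minus => -1);
    my ($query, $subject) =
        $line =~ m{Strand\s*=\s*(Plus|Minus)\s*/\s*(Plus|Minus)}i
        or return;
    return ($code{lc $query}, $code{lc $subject});
}

my ($query_strand, $hit_strand) = parse_strand_line(' Strand = Plus / Minus');
print "query strand: $query_strand, hit strand: $hit_strand\n";
```

If the genomic region is the bl2seq query, its own strand value stays at 1 for a "Plus / Minus" match. The -1 belongs to the subject side of the HSP, so it is worth checking which side the feature's strand is taken from.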
-----Original Message----- From: Jason Stajich [mailto:jason.stajich@duke.edu] Sent: 14 July 2005 13:41 To: michael watson (IAH-C) Cc: bioperl-l@portal.open-bio.org Subject: Re: [Bioperl-l] Blast features added to wrong strand??? Why use bl2seq when you can use fasta.... From the Bio::SearchIO::blast documentation =head2 bl2seq parsing Since I cannot differentiate between BLASTX and TBLASTN since bl2seq doesn't report the algorithm used - I assume it is BLASTX by default - you can supply the program type with -report_type in the SearchIO constructor i.e. my $parser = new Bio::SearchIO(-format => 'blast', -file => 'bl2seq.tblastn.report', -report_type => 'tblastn'); This only really affects where the frame and strand information are put - they will always be on the $hsp->query instead of on the $hsp->hit part of the feature pair for blastx and tblastn bl2seq produced reports. Hope that's clear... On Jul 14, 2005, at 6:53 AM, michael watson ((IAH-C)) wrote: > Hi > > I'm using bioperl-1.4. > > I have a genomic region. I am first using bl2seq and blastn to > align it > with some custom sequences, then blastall and blastx to blast it > agains > uniprot. > > I'm using SearchIO and pretty standard code to add the blast hits as > features to the sequence e.g.: > > my $feature = Bio::SeqFeature::Generic->new(-primary_tag => 'CDS', > -score > => $hit->raw_score, > -display_name > => $hit->name, > -tag > => { > > locus_tag => > $name, > > note => $note, > } > ); > > # @hsps is a filtered list of HSPs obtained from $hit->next_hsp > foreach $hsp (@hsps) { > $feature->add_sub_SeqFeature($hsp,'EXPAND'); > } > > $genome->add_SeqFeature($feature); # $genome is a Bio::Seq feature > > Now, the bl2seq hits all have the strand reported as "Plus / Minus" > and > the blastx hits all have the strand reported as -1 i.e. there is a > gene > on the other strand of my sequence.
> > HOWEVER, using the above code for both the bl2seq results and the > blastx > results, ONLY the blastx results get annotated on the reverse strand - > the bl2seq results, which report the strand as "Plus / Minus", get > annotated on the forward strand and hence point the wrong way when I > draw them :-( > > So my question is, what am I doing wrong in the above code (which is > pretty much ripped off from the bioperl HOWTOs) that makes the bl2seq > "Plus / Minus" hits get annotated on the plus strand on my sequence?? > > Many thanks > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From jbedell at oriongenomics.com Thu Jul 14 10:24:39 2005 From: jbedell at oriongenomics.com (Joseph Bedell) Date: Thu Jul 14 10:16:08 2005 Subject: [Bioperl-l] Getting hit or subject length in BPlite Message-ID: <434AF352F9D03C4C896782B8CC78BC7687F922@VADER.oriongenomics.com> Hey Mick, Here's how to get the queryLength and the sbjct Length. Is this what you're looking for? my $report = new BPlite(\*STDIN); $report->queryLength; while(my $sbjct = $report->nextSbjct) { $sbjct->length; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Joseph A Bedell, Ph.D. office: 314-615-6979 Director, Bioinformatics fax: 314-615-6975 Orion Genomics cell: 314-518-1343 4041 Forest Park Ave St. Louis, MO 63108 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >-----Original Message----- >From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- >bounces@portal.open-bio.org] On Behalf Of michael watson (IAH-C) >Sent: Wednesday, July 13, 2005 10:05 AM >To: bioperl-l@portal.open-bio.org >Subject: [Bioperl-l] Getting hit or subject length in BPlite > >Hi > >Maybe I'm just being dumb, but I can't see a way to get the hit length >(note: NOT hsp length) using Bplite to parse a blast report.... > >Any help? 
> >Mick > > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l From michael.watson at bbsrc.ac.uk Thu Jul 14 10:32:11 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu Jul 14 10:23:57 2005 Subject: [Bioperl-l] Getting hit or subject length in BPlite Message-ID: <8975119BCD0AC5419D61A9CF1A923E950172D9F8@iahce2knas1.iah.bbsrc.reserved> Error message: Can't locate object method "length" via package "Bio::Tools::Bplite::Sbjct" I managed to access it by hacking, ie I call $sbjct->{'LENGTH'} But it seems a little bit of an oversight to store it and yet not provide an accessor method? Mick -----Original Message----- From: Joseph Bedell [mailto:jbedell@oriongenomics.com] Sent: 14 July 2005 15:25 To: michael watson (IAH-C); bioperl-l@portal.open-bio.org Subject: RE: [Bioperl-l] Getting hit or subject length in BPlite Hey Mick, Here's how to get the queryLength and the sbjct Length. Is this what you're looking for? my $report = new BPlite(\*STDIN); $report->queryLength; while(my $sbjct = $report->nextSbjct) { $sbjct->length; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Joseph A Bedell, Ph.D. office: 314-615-6979 Director, Bioinformatics fax: 314-615-6975 Orion Genomics cell: 314-518-1343 4041 Forest Park Ave St. Louis, MO 63108 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >-----Original Message----- >From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- >bounces@portal.open-bio.org] On Behalf Of michael watson (IAH-C) >Sent: Wednesday, July 13, 2005 10:05 AM >To: bioperl-l@portal.open-bio.org >Subject: [Bioperl-l] Getting hit or subject length in BPlite > >Hi > >Maybe I'm just being dumb, but I can't see a way to get the hit length >(note: NOT hsp length) using Bplite to parse a blast report.... > >Any help? 
> >Mick > > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l From heikki at ebi.ac.uk Thu Jul 14 11:19:56 2005 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Thu Jul 14 11:10:27 2005 Subject: [Bioperl-l] Announce: Bio::Seq::Quality In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA62F5439@ANTARESIA.be.devgen.com> References: <0C528E3670D8CE4B8E013F6749231AA62F5439@ANTARESIA.be.devgen.com> Message-ID: <200507141619.56288.heikki@ebi.ac.uk> Marc, The way I wrote the Bio::Seq::Meta modules, you can set meta sets and the sequence completely independently. Everything is stored within the object, but only the parts of the meta arrays that have residues are returned. I have now tried this out and realised that it does not work for padding the quality values: e.g. $s = new Bio::Seq::Quality(-qual => "6 6 7"); $s->qual(); # returns [] $s->qual_text(); # returns '' $s->seq('atcg'); $s->qual_text(); # should return '6 6 7 0' but returns '6 6 7' I have to tweak the code now. So, what do you think? Is the automatic padding a good or bad thing? Should I get rid of it or make sure it works as I planned? In other words, do you think it is better to let users make their own mistakes and offer ways to check for inconsistencies, or offer a "padded" foolproof system? (If this fool gets it right in the first place.) -Heikki On Thursday 14 July 2005 10:53, Marc Logghe wrote: > Hi Heikki, > > > I've left Bio::Seq::PrimaryQual for the next rewrite. Its > > functionality is all in the Quality class (get and set id and > > quality values), but you can not get the quality values from > > a Bio::Seq::Quality object if you do not have the sequence > > set. Usually qualities without residues do not make such > > sense, but there is something in Bio::Assembly code or at > > least in its tests that need plain qualities.
> > > > > The greatest difference to Bio::Seq::SeqWithQuality is > > > > that in this > > > > > implementation quality for all sequence residues are automatically > > > assigned a value of '0' (zero) unless you set it to something > > > else. Length of the quality array always equals the length of the > > > sequence. Therefore, length() never returns "DIFFERENT". > > When these to extracts of your mail are considered, am I correct in > thinking that the lengths of the sequence and quality array only are > identical when you pass a sequence in the construcor together with the > quality string ? But in all the other cases, how can one be sure that > the lengths are equal ? > E.g. you can first create the Bio::Seq::Quality object passing it the > quality and assign the sequence afterwards by calling $qual->seq($seq). > As you indicated, it is even possible not to set the sequence, so seq > length is zero while the quality is not. > Does it mean a user should check the lengths explicitely ? > BTW I am currently editing Bio::Assembly::IO::ace so that it also parses > CAP3 generated ACE files. You think it is OK also to set the contig > sequence for the quality object, e.g. not leaving the seq attribute > empty. I'll check the Bio::Assembly code why plain quality is needed to > pass the tests. 
> Cheers, > Marc -- Heikki Lehvaslaiho heikki at_ebi _ac _uk http://www.ebi.ac.uk/mutations/ EMBL Outstation, European Bioinformatics Institute Wellcome Trust Genome Campus, Hinxton Cambridge, CB10 1SD, United Kingdom Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 From Marc.Logghe at devgen.com Thu Jul 14 11:54:10 2005 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Thu Jul 14 11:44:58 2005 Subject: [Bioperl-l] Announce: Bio::Seq::Quality Message-ID: <0C528E3670D8CE4B8E013F6749231AA62F5440@ANTARESIA.be.devgen.com> > > The way I wrote Bio::Seq::Meta modules is that you can set > meta sets and sequence completely independently and > everything is stored within the object, but only that part of > the meta arrays are rerurned that have residues. > > I tried this out now and realised that it does not work for > padding the quality values: > > e.g. > $s = new Bio::Seq::Quality(-qual=> "6 6 7") $s->qual(); # > returns [] $s->qual_text(); # returns '' > $s->seq(atcg); > $qual_text(); # should return '6 6 7 0' but returns '6 6 7'; Ah, I see. The length checking and padding are only triggered when one calls qual(). > I have to tweak the code now. So, what do you think? Is the > automatic padding a good or bad thing? Should I get rid of it > or make sure it works as I planned? Personally, I'd make that optional by setting/resetting a padding flag or something similar. I'd be more interested in having a way to validate your Bio::Seq::Quality one way or another. In case padding is switched off, I'd like to know whether my sequence length is exactly the same as the length of my quality array. Does that make sense?
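The inconsistency check and optional padding that Marc suggests could look roughly like the following. This is a plain-Perl sketch that works on a residue string and an array of quality values. `qual_is_consistent` and `pad_qual` are hypothetical names and are not part of the Bio::Seq::Quality API. Following Heikki's description, padding fills missing values with 0 and drops any values beyond the last residue.

```perl
use strict;
use warnings;

# Hypothetical helpers, NOT part of the Bio::Seq::Quality API.

# True if there is exactly one quality value per residue.
sub qual_is_consistent {
    my ($seq, $qual) = @_;              # residue string, array ref of values
    return length($seq) == @$qual;
}

# Return a new array ref whose length matches the sequence: missing values
# are filled with $pad (default 0) and values beyond the last residue dropped.
sub pad_qual {
    my ($seq, $qual, $pad) = @_;
    $pad = 0 unless defined $pad;
    my $len = length $seq;
    my @q   = @$qual;
    push @q, ($pad) x ($len - @q) if @q < $len;
    splice @q, $len if @q > $len;
    return \@q;
}

my @qual = split ' ', '6 6 7';
print qual_is_consistent('atcg', \@qual) ? "consistent\n" : "length mismatch\n";
print join(' ', @{ pad_qual('atcg', \@qual) }), "\n";
```

Keeping the check and the padding as separate steps leaves the choice open. A caller can warn about or reject an inconsistent object, or it can pad explicitly, and neither has to happen silently.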
The thing is, I am currently struggling with the Bio::Assembly* modules because we've noticed that the contig sequence object may contain gaps and as a consequence is longer than the quality object that can be extracted from the ACE file produced by CAP3. I'd like to add to Bio::Assembly::IO::ace the feature that a Bio::Seq::Quality object is created *including* the cleaned contig sequence (gaps removed). In that process it would be very useful to have a way to check for inconsistencies. I'm not sure, however, what to do when an inconsistency actually occurs: throw an exception, issue a warning, trash the contig or keep it? > > In other words, do you think it is better to let users make > their own mistakes and offer ways to check for > inconsistencies, or offer a "padded" fool proof system? (If > this fool gets it right in the first place.) In conclusion, I'd opt for an inconsistency check and an optional padding feature. Cheers, ML From n.haigh at sheffield.ac.uk Fri Jul 15 08:43:16 2005 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Fri Jul 15 08:34:05 2005 Subject: [Bioperl-l] Bio::Graphics and primer3 pipeline Message-ID: I'm creating a pipeline that passes around 200 sequences to Primer3 in order to generate primers. I want to be able to use Bio::Graphics to create a PNG file for each sequence showing the positions of the primers and some of the details about each primer (e.g. Tm, %GC). Here's what I have so far in pseudocode: Foreach Bio::Seq object Add position of introns as Bio::SeqFeature::Generic features Run Primer3 with Bio::Seq object Loop through primers, returning a Bio::Seq::PrimedSeq object Add primer as features using: Bio::Seq object->add_SeqFeature(Bio::Seq::PrimedSeq); Now create png using Bio::Graphics This works OK, but I'm lost trying to get the Tm and GC content of the primers as returned by Primer3. Does anyone have a script that does something similar that I could use to work out what's going on?
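If the values Primer3 reports are not readily accessible, GC content and a rough melting temperature can be computed directly from each primer string as a fallback. The sketch below uses the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is an assumption made here for illustration. Primer3 itself uses a nearest-neighbour thermodynamic model, so its Tm values will differ. `gc_percent` and `wallace_tm` are hypothetical helpers, and the primer sequence is made up.

```perl
use strict;
use warnings;

# Percentage of G and C in a primer (case-insensitive). Any other
# characters still count towards the length.
sub gc_percent {
    my ($primer) = @_;
    my $len = length $primer;
    return 0 unless $len;
    my $gc = ($primer =~ tr/GCgc//);    # tr/// with no replacement just counts
    return 100 * $gc / $len;
}

# Wallace-rule Tm estimate: 2 degC per A/T plus 4 degC per G/C.
# Only a rough guide for short oligos; Primer3 uses a nearest-neighbour
# model, so its reported Tm will not match this number.
sub wallace_tm {
    my ($primer) = @_;
    my $at = ($primer =~ tr/ATat//);
    my $gc = ($primer =~ tr/GCgc//);
    return 2 * $at + 4 * $gc;
}

my $primer = 'ATGCGCTAGCTAGGCTAAGC';    # made-up example primer
printf "%s  GC=%.1f%%  Tm(Wallace)=%d\n",
    $primer, gc_percent($primer), wallace_tm($primer);
```

Values computed this way are only approximations. Where Primer3's own figures can be retrieved, those are the ones to display.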
Thanks Nathan ---------------------------------- Nathan Haigh Bioinformatics PostDoctoral Research Associate Room B2 211 Department of Animal and Plant Sciences University of Sheffield Western Bank Sheffield S10 2TN Tel: +44 (0)114 22 20112 Mob: +44 (0)7742 533 569 Fax: +44 (0)114 22 20002 From halwaniradzi at yahoo.com Fri Jul 15 00:47:13 2005 From: halwaniradzi at yahoo.com (halwani radzi) Date: Fri Jul 15 10:34:09 2005 Subject: [Bioperl-l] anyone have experience on developing parallel smith-waterman for sequence alignment? Message-ID: <20050715044713.10107.qmail@web90009.mail.scd.yahoo.com> Hi everyone, I have to do research on parallelizing the matrix calculation of the Smith-Waterman local alignment algorithm for sequence comparison. I have read some threads here discussing this. I would really appreciate it if anyone with experience in this area could share some information, such as example code, suitable frameworks and hardware, or even papers/people/websites to refer to. Thank you very much. --------------------------------- Do you Yahoo!? Read only the mail you want - Yahoo! Mail SpamGuard. From mayagao1999 at yahoo.com Fri Jul 15 13:36:48 2005 From: mayagao1999 at yahoo.com (Alex Zhang) Date: Fri Jul 15 13:27:29 2005 Subject: [Bioperl-l] A question about replacing a substring using Bioperl Message-ID: <20050715173648.55102.qmail@web53508.mail.yahoo.com> Dear all, I have a txt file that stores 20 short DNA sequences, each of length 8; let's call it A. I also have another txt file that contains 100 long DNA sequences, each of length 200; let's call it B. I want to replace a substring of each sequence in B with one of the sequences in A. The replacement start site could either be specified (such as starting at position 1 for the first sequence in B, the 10th for the 2nd sequence in B, the 20th for the 3rd, up to the 190th for the 20th in B) or picked randomly by the program.
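Perl's four-argument substr (substr EXPR, OFFSET, LENGTH, REPLACEMENT) overwrites a region of a string in place, which covers the core of the replacement described above. What follows is a minimal sketch built on several assumptions. Each file holds one plain sequence per line (not FASTA). Long sequence i receives short sequence i, cycling through A when it has fewer entries than B. The positions in the question are 1-based, so "position 1" is offset 0. `implant` and `read_lines` are hypothetical helpers, and the file names in the comment are placeholders.

```perl
use strict;
use warnings;

# Overwrite a region of each long sequence with a short one using
# four-argument substr. Long sequence $i receives short sequence
# $i % (number of short sequences). Offsets are 0-based; if no offset is
# given for a sequence, a random valid one is chosen.
sub implant {
    my ($long, $short, $offsets) = @_;        # array refs
    $offsets ||= [];
    my @out;
    for my $i (0 .. $#$long) {
        my $seq   = $long->[$i];
        my $motif = $short->[ $i % @$short ];
        my $max   = length($seq) - length($motif);   # last valid offset
        die "Short sequence longer than long sequence $i\n" if $max < 0;
        my $off = defined $offsets->[$i] ? $offsets->[$i] : int rand($max + 1);
        die "Offset $off out of range for sequence $i\n"
            if $off < 0 || $off > $max;
        substr($seq, $off, length $motif, $motif);  # replace in place
        push @out, $seq;
    }
    return \@out;
}

# Read one sequence per line, skipping blank lines.
sub read_lines {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    chomp(my @lines = <$fh>);
    close $fh;
    return [ grep { length } @lines ];
}

# With the real (placeholder-named) files and random offsets:
# my $new = implant(read_lines('B.txt'), read_lines('A.txt'));
my $demo = implant([ 'A' x 20, 'C' x 20 ], [ 'GGGG', 'TTTT' ], [ 0, 9 ]);
print "$_\n" for @$demo;
```

Because LENGTH equals the length of the short sequence, each long sequence keeps its original length of 200. To use fixed start sites, pass an array of 0-based offsets as the third argument.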
I am pretty sure substr(string, index, length, replacement) can do part of this work, but I have limited experience of using Perl to manipulate two files. Can anybody give me some suggestions? Thank you very much; I look forward to your reply! Best Regards, Maya __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From skirov at utk.edu Fri Jul 15 14:20:42 2005 From: skirov at utk.edu (Stefan Kirov) Date: Fri Jul 15 14:12:09 2005 Subject: [Bioperl-l] Re: Information regarding BioPerl Transfac module In-Reply-To: <6.2.0.14.0.20050715113315.0216a310@anumail.anu.edu.au> References: <6.2.0.14.0.20050715113315.0216a310@anumail.anu.edu.au> Message-ID: <42D7FE7A.2050803@utk.edu> Nagesh, Please address all questions to the bioperl list (if appropriate) and not to me personally. Bio::Matrix::PSM::IO can parse only the matrix.dat file at the moment, and I have no intention of extending it. As for TFBS::DB::LocalTRANSFAC, it is not part of BioPerl at the moment, and I believe questions should be addressed to Boris Lenhard or Leonardo Ramirez. But it is clear to me from the documentation that it also parses only matrix.dat (see the NAME section). Stefan Nagesh Chakka wrote: > Dear Stefan Kirov, > I am Nagesh doing my PhD in Medical Sciences at Australian National > University. I have read some of your postings in the BioPerl mailing > list and thought of writing to you as I was not able to get the > information I wanted. I wanted to know whether we can achieve a simple > task of getting the information (about the name of the TF, Cell > specificity, tissue expression and other information) from the .dat > file if we know the matrix ID information using the > TFBS::DB::LocalTRANSFAC. I haven't seen any method that returns what I > am interested in. So are there any other modules to achieve this task.
> I also had a look at the Bio::Matrix::PSM::IO (could not find what I > wanted). > Thanks very much for your attention. Any information related to this > would be highly appreciated. > Regards > > Nagesh Chakka > PhD Student > John Curtin School of Medical Research > Australian National University > PO Box 334, Canberra ACT 2601 > Phone: +61-2-6125-8303 > Fax: +61-2-6125-0415 From chad at dieselwurks.com Fri Jul 15 15:51:48 2005 From: chad at dieselwurks.com (Chad Matsalla) Date: Fri Jul 15 15:42:33 2005 Subject: [Bioperl-l] genbank2gff3.PLS and the unflatenner - Inconsistent order? Message-ID: Greetings, I posted to bioperl-l, hmm, back in June reporting issues with the genbank2gff* scripts. Time moved on but I returned to this project where I'm searching through the Arabidopsis mitochondria for things. I want to gff-i-fy this: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide&cmd=search&term=nc_001284 I downloaded the genbank record and ran this command: $BIOPERL_LIVE/scripts/scripts/Bio-DB-GFF/genbank2gff3.PLS genbank/nc_001284.gb Unordered features [on strand:-1]: NC_001284 Unflattening error: Details: ------------- EXCEPTION ------------- MSG: ASSERTION ERROR: inconsistent order Note that I put a BEGIN clause on top of genbank2gff3.PLS script this morning and cvs committed it. This BEGIN clause ensures that the script is using the cvs-versions of Bio::* modules which I've compiled into ./blib/lib . The issue is for more than my mitochondria - it breaks even against test data: $BIOPERL/scripts/Bio-DB-GFF/genbank2gff3.PLS $BIOPERL/t/data/test.genbank -o . ...stuff... L26462 Unflattening error: ...big exception... Unfortunately, I don't know much about the unflattener but I'll help if someone can point me in the right direction. Oh, and just for an additional datapoint, the file t/data/AE003644_Adh-genomic.gb works. 
Thank you, Chad Matsalla From lstein at cshl.edu Fri Jul 15 16:45:28 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Fri Jul 15 16:36:37 2005 Subject: [Bioperl-l] GFF3 and Gbrowse In-Reply-To: <200506281654.35206.lstein@cshl.edu> References: <1119987851.3365.46.camel@localhost.localdomain> <200506281654.35206.lstein@cshl.edu> Message-ID: <200507151645.29893.lstein@cshl.edu> My memory is failing. The glyph and aggregator are named "processed_transcript". I've also just now created a pair called "so_transcript" that do exactly the same thing. They should work with the GFF3 "canonical gene." Let me know if they don't. Lincoln On Tuesday 28 June 2005 04:54 pm, Lincoln Stein wrote: > It's in bioperl CVS. A copy is also in the gbrowse CVS which will be > installed if it detects an old version of bioperl. > > Lincoln > > On Tuesday 28 June 2005 03:44 pm, Scott Cain wrote: > > Lincoln, > > > > This is the first I've heard of the so_transcript aggregator; have you > > committed it anywhere? > > > > Scott > > > > On Tue, 2005-06-28 at 13:24 -0400, Lincoln Stein wrote: > > > The bioperl GFF database (both the inmemory and relational database > > > versions) need to be brought up to date to handle the full expressive > > > powerof GFF3. So for the time being ID trumps Name. Also you must use > > > the so_transcript aggregator instead of the processed_transcript > > > aggregator. > > > > > > Lincoln > > > > > > On Tuesday 28 June 2005 11:21 am, Andrew Nunberg wrote: > > > > I was wondering if there is any documentation about using GFF3 format > > > > with Gbrowse. Since this is the "new" format, I wanted to start > > > > using it, but observing some behaviors. > > > > > > > > The GFF3 documentation on http://song.sourceforge.net/gff3.shtml > > > > indicates the Name tag is the id to be displayed and the ID tag is > > > > unique and internal, however when I use Gbrowse 1.62 it is ID that is > > > > being displayed as the label.
> > > > > > > > I wish to use processed_transcript aggregator, the GFF3 document > > > > indicates you only need to display the exons and CDS and the UTRs > > > > will be inferred, however I did not see that when viewed in Gbrowse. > > > > > > > > If there is some extra code or documentation I need please let me > > > > know > > > > > > > > Thanks > > > > Andy -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse@cshl.edu
From mayagao at gmail.com Fri Jul 15 13:32:32 2005 From: mayagao at gmail.com (Gao Zhang) Date: Fri Jul 15 18:45:40 2005 Subject: [Bioperl-l] A question about replacing a substring using Bioperl In-Reply-To: References: <7beac6a05071508535feea867@mail.gmail.com> Message-ID: <7beac6a0507151032148bd615@mail.gmail.com> Dear all, I have a txt file which stores 20 short DNA sequences and the length of each is 8, let's call it A. Meanwhile, I have another txt file which owns 100 long DNA sequences and the length of each is 200, let's call it B. Then, I want to replace a substring of each sequence in B with each one in A. The replacement starting site could be specified as you want(such as starting at position 1 for the first sequence in B, 10th for the 2nd sequence in B, 20th for the 3rd, until 190th for the 20th in B ) or picked by the program randomly.
I am pretty sure substr(string,index,length,replacement string) can finish a part of this work. But I have limited experience of using Perl to manipulate two files. Can anybody give me some suggestions? Thank you very much and look forward to your reply! Best Regards, Maya From rob at salmonella.org Sat Jul 16 11:07:27 2005 From: rob at salmonella.org (Rob Edwards) Date: Sat Jul 16 10:57:59 2005 Subject: [Bioperl-l] Bio::Graphics and primer3 pipeline In-Reply-To: References: Message-ID: <29CA8456-285F-4A66-B865-4BFC35E4730F@salmonella.org> You should be able to get the primers Tm from the Bio::SeqFeature::Primer objects - there are two different methods in there for calculating Tm's. The GC content is not a method at the moment, but could be added as one. Rob On Jul 15, 2005, at 5:43 AM, Nathan Haigh wrote: > I'm creating a pipeline for passing around 200 sequences to primer3 > in order > to generate primers. I want to be able to use Bio::Graphics to > create a png > file for each sequence with the position of the primers shown and > some of > the details about each primer (e.g. Tm, %GC). > > > > Here's what I have so far in pseudocode: > > > > Foreach Bio::Seq object > > Add position of introns as Bio::SeqFeature::Generic features > > Run Primer3 with Bio::Seq object > > Loop through primers, returning a Bio::Seq::PrimedSeq object > > Add primer as features using: Bio::Seq > object->add_SeqFeature(Bio::Seq::PrimedSeq); > > > > Now create png using Bio::Graphics > > > > This works ok, but I'm lost trying to get the Tm and GC content of the > primers as returned by Primer3 > > > > Does anyone have a script that can do something similar that I > might try to > work out whats going on? 
> > Thanks > > Nathan > > > > > > > > ---------------------------------- > > Nathan Haigh > > Bioinformatics PostDoctoral Research Associate > > > > Room B2 211 > > Department of Animal and Plant Sciences > > University of Sheffield > > Western Bank > > Sheffield > > S10 2TN > > > > Tel: +44 (0)114 22 20112 > > Mob: +44 (0)7742 533 569 > > Fax: +44 (0)114 22 20002 > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From hlapp at gmx.net Sat Jul 16 20:02:05 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Jul 16 19:52:45 2005 Subject: [Bioperl-l] COG parsing ? In-Reply-To: <20050713211957.6373.qmail@web40513.mail.yahoo.com> References: <20050713211957.6373.qmail@web40513.mail.yahoo.com> Message-ID: <038871a824ed3175897809adfeac5c98@gmx.net> Not that I'm aware of. If you're thinking about contributing one, that'd be cool. -hilmar On Jul 13, 2005, at 2:19 PM, Renee Halbrook wrote: > Hi, > > Does BioPerl have a parser for the Clusters of > Orthologous Groups of proteins (COGs) from NCBI ? > > > Thanks for any help, > Renee Halbrook > > > > __________________________________ > Yahoo! Mail > Stay connected, organized, and protected. Take the tour: > http://tour.mail.yahoo.com/mailtour.html > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121                                  phone: +1-858-812-1757
-------------------------------------------------------------

From n.haigh at sheffield.ac.uk Sun Jul 17 09:22:43 2005
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Sun Jul 17 09:16:02 2005
Subject: [Bioperl-l] Bio::Graphics and primer3 pipeline
In-Reply-To: <29CA8456-285F-4A66-B865-4BFC35E4730F@salmonella.org>
Message-ID:

Aren't these parsed out of the primer3 output file at all, without the need to actually calculate them? There seem to be quite a few useful annotations that could be extracted from the output file.

Nath

From iain.m.wallace at gmail.com Mon Jul 18 07:54:41 2005
From: iain.m.wallace at gmail.com (Iain Wallace)
Date: Mon Jul 18 07:46:46 2005
Subject: [Bioperl-l] [Bioperl -l] Passing argument via Command line
Message-ID: <8cff3eb80507180454108946df@mail.gmail.com>

Hi all,

I am wondering if anybody can help me. I am trying to open a sequence file and parse it via Bio::SeqIO.

My script works fine if I pass the filename in via the command line, e.g. perl test_embl.pl filename, but it doesn't work if I hard-code the filename into the script, and I can't figure out why.
The only two lines I change are:

#my $seqfile = $ARGV[0];
my $seqfile = "HBB_HUMAN.BC007075.embl";

Thanks for any help you can give me,

Iain

--The Script--
use Bio::AlignIO;
use Bio::SeqIO;
use Bio::LocatableSeq;

#my $seqfile = $ARGV[0];
my $seqfile = "HBB_HUMAN.BC007075.embl";
print $seqfile, "\n";
my $input = new Bio::SeqIO->new(-file => $seqfile, -format => 'EMBL');

while (my $seq = $input->next_seq()) {
    print $seq->id, "\n";
    @features = $seq->get_SeqFeatures(); # just top level
    foreach my $feat (@features) {
        if ($feat->primary_tag eq "CDS") {
            $cds_obj = $feat->spliced_seq;
            $cds_seq = $cds_obj->seq;
            my @translated = $feat->each_tag_value('translation');
            $translated_seq = $translated[0];
            print $translated_seq, "\n";
        }
    }
}

From Marc.Logghe at devgen.com Mon Jul 18 08:43:25 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Mon Jul 18 08:34:02 2005
Subject: [Bioperl-l] [Bioperl -l] Passing argument via Command line
Message-ID: <0C528E3670D8CE4B8E013F6749231AA62F545F@ANTARESIA.be.devgen.com>

Hi Iain,

It is because you call the new method on Bio::SeqIO twice:

new Bio::SeqIO->new( ... )

This is like writing:

my $input = Bio::SeqIO->new;
$input = $input->new( -file => $seqfile, -format => 'EMBL' );

When you take your script with the hardcoded $seqfile and run it as 'perl test_embl.pl HBB_HUMAN.BC007075.embl', everything works fine. The first time new() is called, you do not actually pass any arguments, so by default bioperl looks for a filename in @ARGV, which was given. Using that filename it tries to guess the format, and this succeeds. The second call to new() also succeeds. But when you run it as 'perl test_embl.pl', it fails: the first call to new() has no filename (@ARGV is empty), so there is no chance to guess the format.
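Marc's explanation can be checked without BioPerl installed. The sketch below uses a hypothetical package, Toy::IO, standing in for Bio::SeqIO; its constructor simply logs each call. Written as new Toy::IO->new(...), the expression is parsed as Toy::IO->new->new(...), because indirect-object syntax binds only the class name, so the constructor runs twice and the first call receives no arguments.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a class such as Bio::SeqIO.
package Toy::IO;

our @calls;    # argument list of every constructor call, in order

sub new {
    my ($proto, @args) = @_;
    my $class = ref($proto) || $proto;    # works on a class name or an object
    push @calls, [@args];
    return bless {@args}, $class;
}

package main;

# Indirect-object syntax followed by an arrow:
# parsed as Toy::IO->new->new(-file => ..., -format => ...)
my $twice = new Toy::IO->new(-file => 'x.embl', -format => 'EMBL');
my $calls_twice     = scalar @Toy::IO::calls;
my $first_call_argc = scalar @{ $Toy::IO::calls[0] };    # args seen by the first call

@Toy::IO::calls = ();

# Plain arrow syntax: the constructor runs exactly once.
my $once = Toy::IO->new(-file => 'x.embl', -format => 'EMBL');
my $calls_once = scalar @Toy::IO::calls;

print "indirect + arrow: $calls_twice calls (first with $first_call_argc args); ",
      "arrow only: $calls_once call\n";
```

Sticking to the arrow form, Class->new(...), avoids this ambiguity entirely.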
Your instantiation should look like:

my $input = Bio::SeqIO->new( -file => $seqfile, -format => 'EMBL');

HTH,
Marc

From mayagao1999 at yahoo.com Mon Jul 18 17:06:10 2005
From: mayagao1999 at yahoo.com (Alex Zhang)
Date: Mon Jul 18 16:56:46 2005
Subject: [Bioperl-l] how to work on two txt files simultaneously by handle corresponding lines from each file
Message-ID: <20050718210610.18944.qmail@web53509.mail.yahoo.com>

Dear All,

Sorry to bother you again.

I have two text files to handle. One is "short_sequences" and the other one is "long_sequences".
The file "short_sequences" holds 100 short sequences (8 nucleotides long) and "long_sequences" holds 100 long sequences (200 nucleotides long).

For example, the first short sequence is "TTGACATA" and the first long sequence is
"GAATCATATATTAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
GAACCTTGGACTAACCACTGTCTGGATA".

Basically, I want to pick a random position as the starting site and replace the substring of the long sequence that begins there with the short sequence. In this example, if we choose the 5th nucleotide of the long sequence as the starting site, then after replacing with "TTGACATA" the long sequence becomes
"GAATTTGACATAAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
GAACCTTGGACTAACCACTGTCTGGATA".

Then I want to do the same for the 2nd long sequence with the 2nd short sequence, and repeat this until the last long sequence has been processed. I think the only constraint is that the starting site must not be larger than 193; otherwise there are not enough nucleotides left in the long sequence for the replacement.

Furthermore, I want to keep track of the starting replacement site for each long sequence.

I am copying my code below.
******************************************
use strict;
use warnings;

my (@short, @long, $offset); # @short holds the short sequences,
                             # @long holds the long sequences

open(FILE1, '<', "short_sequences.txt") || die "Can't open short_sequences.txt: $!\n";
while (<FILE1>) {
    chomp;
    push(@short, $_);
}
close FILE1; # Close the file

open(FILE2, '<', "long_sequences.txt") || die "Can't open long_sequences.txt: $!\n";
while (<FILE2>) {
    chomp;
    push(@long, $_);
}
close FILE2; # Close the file

# replacement
foreach my $short (@short) {
    foreach my $long (@long) {
        # last valid 0-based start is length($long) - length($short)
        $offset = int(rand(length($long) - length($short) + 1));
        substr($long, $offset, length($short), $short);
        printf "%3d", $offset + 1;
        print "\n", $long, "\n";
    }
}
********************************************

But I just realized that there is a problem with the two loops: each short sequence is used to replace all the long sequences, not just the corresponding one.

So I'd appreciate your suggestions on how to handle the two files simultaneously in my case.

Thank you very much; I look forward to your reply!

Best Regards,
Alex

From khoueiry at ibdm.univ-mrs.fr Mon Jul 18 17:19:47 2005
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Mon Jul 18 17:32:58 2005
Subject: [Bioperl-l] how to work on two txt files simultaneously by handle corresponding lines from each file
In-Reply-To: <20050718210610.18944.qmail@web53509.mail.yahoo.com>
References: <20050718210610.18944.qmail@web53509.mail.yahoo.com>
Message-ID: <20050718211306.M22651@ibdm.univ-mrs.fr>

If I understood your idea correctly, I suggest accessing the arrays by index (see the code below). I didn't test this code, but I think it's a fine way to solve your problem.
# replacement
for (my $i = 0; $i < @short; $i++) {
    $offset = int(rand(length($long[$i]) - length($short[$i]) + 1));
    printf "%3d", $offset + 1;
    substr($long[$i], $offset, length($short[$i]), $short[$i]);
    print "\n", $long[$i], "\n";
}
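A self-contained sketch of the same index-based pairing, assuming both files have already been read into @short and @long as in Alex's script. The subroutine name plant_motifs and the example data are hypothetical. The start is drawn from 0 to length(long) - length(short), which for 200-nt and 8-nt sequences gives 1-based positions 1 to 193, and each chosen position is returned alongside the modified sequence so it can be tracked.

```perl
use strict;
use warnings;

# Hypothetical helper: for each index $i, overwrite a random window of
# $long->[$i] with $short->[$i] and remember the 1-based start position.
sub plant_motifs {
    my ($short, $long) = @_;
    die "files have different numbers of lines\n" unless @$short == @$long;

    my @results;
    for my $i (0 .. $#$short) {
        my ($motif, $seq) = ($short->[$i], $long->[$i]);    # $seq is a copy
        my $max_start = length($seq) - length($motif);      # last valid 0-based start
        die "motif $i is longer than its target sequence\n" if $max_start < 0;

        my $offset = int(rand($max_start + 1));              # 0 .. $max_start
        substr($seq, $offset, length($motif), $motif);        # in-place replacement
        push @results, { start => $offset + 1, seq => $seq };
    }
    return @results;
}

# Example with made-up data; in practice fill these from the two files.
my @short = ('TTGACATA', 'GGGGCCCC');
my @long  = ('A' x 200,  'T' x 200);

for my $r (plant_motifs(\@short, \@long)) {
    printf "%3d\n%s\n", $r->{start}, $r->{seq};
}
```

Because each long sequence is copied before substr() modifies it, @long itself is left unchanged; working on $long->[$i] directly would update the array in place instead.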
From tgra at ceh.ac.uk Tue Jul 19 08:23:14 2005
From: tgra at ceh.ac.uk (Tanya Gray)
Date: Tue Jul 19 08:11:21 2005
Subject: [Bioperl-l] FeatureIO::gff.pm -- error fetching sofa.definition
Message-ID:

Hi,

I have a simple test script to read a GFF3 file using FeatureIO::gff.pm.
Unfortunately it is throwing errors related to retrieving the sofa.definition file:

MSG: failed to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, server threw 500

I am using Bioperl-1.5.0. I just wonder if anyone might know what the problem is. A copy of the relevant script and error messages is below.

Thank you,
Tanya

relevant lines of script
---------------------------

my $file_features = "gff3.test";

my $fio = Bio::FeatureIO->new( -file =>$file_features, -format =>"GFF", -validate_terms=>0, -version=>3) or print "\nError occurred: " . $! ;

gff3.test
----------

##gff-version 3
##sequence-region ctg123 1 1497228
ctg123 . gene 1000 9000 . + . ID=gene00001;Name=EDEN

ERROR MESSAGES
---------------------
perl gff3test.pl [11:11]

-------------------- WARNING ---------------------
MSG: [1/5] tried to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, but server threw 500. retrying...
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: [2/5] tried to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, but server threw 500. retrying...
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: [3/5] tried to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, but server threw 500. retrying...
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: [4/5] tried to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, but server threw 500. retrying...
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: [5/5] tried to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, but server threw 500. retrying...
---------------------------------------------------

------------- EXCEPTION -------------
MSG: failed to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, server threw 500
STACK Bio::Root::IO::_initialize_io /usr/local/share/perl/5.8.7/Bio/Root/IO.pm:276
STACK Bio::Root::IO::new /usr/local/share/perl/5.8.7/Bio/Root/IO.pm:227
STACK Bio::OntologyIO::dagflat::defs_url /usr/local/share/perl/5.8.7/Bio/OntologyIO/dagflat.pm:361
STACK Bio::OntologyIO::dagflat::_initialize /usr/local/share/perl/5.8.7/Bio/OntologyIO/dagflat.pm:188
STACK Bio::OntologyIO::soflat::_initialize /usr/local/share/perl/5.8.7/Bio/OntologyIO/soflat.pm:145
STACK Bio::OntologyIO::new /usr/local/share/perl/5.8.7/Bio/OntologyIO.pm:169
STACK Bio::OntologyIO::new /usr/local/share/perl/5.8.7/Bio/OntologyIO.pm:178
STACK Bio::Ontology::OntologyStore::get_ontology /usr/local/share/perl/5.8.7/Bio/Ontology/OntologyStore.pm:225
STACK Bio::FeatureIO::gff::_initialize /usr/local/share/perl/5.8.7/Bio/FeatureIO/gff.pm:110
STACK Bio::FeatureIO::new /usr/local/share/perl/5.8.7/Bio/FeatureIO.pm:268
STACK Bio::FeatureIO::new /usr/local/share/perl/5.8.7/Bio/FeatureIO.pm:288
STACK toplevel gff3test.pl:15

From grassi.e at virgilio.it Tue Jul 19 09:58:40 2005
From: grassi.e at virgilio.it (Elena Grassi)
Date: Tue Jul 19 09:49:20 2005
Subject: [Bioperl-l] Bio-perl and webpages?
Message-ID: <1121781520.4064.38.camel@localhost.localdomain>

Hi,

I've got a bunch of scripts (OK, it should be a complete program, but that's not the point now...) written in Perl (together with other people, and not very object-oriented), and now I need to make them work through a website.
My first idea is to use a little bit of dirty PHP; my second is to translate the Perl into PHP (I have to admit that I'd prefer not to go that way...); the third involves BioPerl: if I decide to rewrite the scripts with BioPerl, is there any suitable and fast tool to put them into an HTML-based structure?
Sorry for the nearly OT question,
E.

-- 
If I were a swan, I'd be gone.
If - Pink Floyd - Atom Heart Mother

From cain at cshl.edu Tue Jul 19 09:59:17 2005
From: cain at cshl.edu (Scott Cain)
Date: Tue Jul 19 09:49:56 2005
Subject: [Bioperl-l] FeatureIO::gff.pm -- error fetching sofa.definition
In-Reply-To:
Message-ID:

Hi Tanya,

The version of Bio::FeatureIO::gff in bioperl-1.5 was still a little rough. In particular, it required downloading SOFA from a hard-coded location to validate the types in the GFF3. In bioperl-live, validation has become an option you can pass to the constructor, and it is off by default. I believe the location to get SOFA from is still hard-coded, though. I would suggest using bioperl-live if you can.

Scott

----------------------------------------------------------------------
Scott Cain, Ph. D.                                       cain@cshl.org
GMOD Coordinator, http://www.gmod.org/                    (216)392-3087
----------------------------------------------------------------------

From palmeida at igc.gulbenkian.pt Tue Jul 19 10:31:05 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Tue Jul 19 10:22:00 2005
Subject: [Bioperl-l] Bio-perl and webpages?
In-Reply-To: <1121781520.4064.38.camel@localhost.localdomain>
References: <1121781520.4064.38.camel@localhost.localdomain>
Message-ID: <42DD0EA9.6000306@igc.gulbenkian.pt>

Hi Elena,

If you already have your scripts in Perl, it would probably be best to use the Perl CGI module instead of PHP (you can find it at http://search.cpan.org/dist/CGI.pm/).
You can adapt the input so it is read from a web form and change the print commands to print HTML.

Incidentally, I'm not sure this is appropriate for the list, but since I'm on the subject... I tried to adapt a script to run on the Web. I wanted to use taint mode, but I got an error saying that something in the Clustalw module of BioPerl was using an unsafe variable:

Insecure $ENV{PATH} while running with -T switch at /usr/local/share/perl/5.8.4/Bio/Tools/Run/Alignment/Clustalw.pm line 556, line 2.

I wouldn't mind hardcoding the path of Clustal, but I couldn't figure out a way to do it, or to untaint the variable. Can anyone help?

Thanks,

--Paulo

From jeremy_just at netcourrier.com Tue Jul 19 11:25:32 2005
From: jeremy_just at netcourrier.com (Jérémy JUST)
Date: Tue Jul 19 11:16:47 2005
Subject: [Bioperl-l] Bio-perl and webpages?
In-Reply-To: <42DD0EA9.6000306@igc.gulbenkian.pt>
References: <1121781520.4064.38.camel@localhost.localdomain> <42DD0EA9.6000306@igc.gulbenkian.pt>
Message-ID: <20050719172532.00007ca9@pearson.infobiogen.fr>

On Tue, 19 Jul 2005 15:31:05 +0100
Paulo Almeida wrote:

> Insecure $ENV{PATH} while running with -T switch at
> /usr/local/share/perl/5.8.4/Bio/Tools/Run/Alignment/Clustalw.pm line
> 556, line 2.
>
> I wouldn't mind hardcoding the path of Clustal, but I couldn't figure
> out a way to do it, or to untaint the variable. Can anyone help?

The content of %ENV is considered unsafe, since it comes from outside your program.
One secure way of untainting the PATH is to set it at the beginning of your code:

$ENV{PATH} = '/bin:/usr/bin:/usr/local/bin' ;

I think you are bound to hardcode the PATH into your program for it to be really safe.
I've seen another solution in the SpamAssassin code: it checks each element of the PATH to verify that there are no world-writable or group-writable directories in it.

See also perldoc perlsec for more details.

-- 
Jérémy JUST

From lstein at cshl.edu Tue Jul 19 12:43:36 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue Jul 19 12:34:29 2005
Subject: [Bioperl-l] Re: [Gmod-gbrowse] Adding human chromosomes as reference sequences
In-Reply-To:
References:
Message-ID: <200507191243.37491.lstein@cshl.edu>

Hi,

The bug involving _maxbin() was fixed in the CVS version of bioperl some time ago. You also get the fix when you install the latest CVS version of GBrowse.

I'm sorry that the ucsc_genes2gff.pl script isn't loading the chromosome extents; we just need a similar script called ucsc_chromosomes2gff.pl or something similar. Ilari, since you've already essentially done this, perhaps you'd be willing to contribute the script? I'll add it to bioperl.

Thanks for the information about load_ucsc.pl. Although I can't use it, due to not having the enum.pm module installed, I did see immediately where the problem had arisen and have fixed it in bioperl CVS (hope I didn't break it in so doing!)

As of about a week ago the xyplot.pm glyph has been enhanced to accept negative scores. You can also colorize the bars and points according to the score or other criteria.

lincoln

On Tuesday 19 July 2005 05:57 am, Ilari Scheinin wrote:
> Hello.
>
> I recently installed gbrowse for visualizing the human genome.
> By browsing this list, I found out that the easiest way to import the
> genome data is to get it from UCSC.
>
> So I downloaded these files from
> ftp://hgdownload.cse.ucsc.edu/goldenPath/hg17/database/:
> chromInfo.txt, kgXref.txt, knownGeneMrna.txt, knownGenePep.txt,
> knownGene.txt, knownToLocusLink.txt, knownToPfam.txt,
> knownToU133Plus2.txt, knownToU133.txt, knownToU95.txt, refLink.txt,
> refSeqSummary.txt
>
> and these from ftp://ftp.ncbi.nlm.nih.gov/refseq/LocusLink/ARCHIVE/:
> log2UG, loc2acc, loc2go
> and also /gene/DATA/gene2accession (renamed to genebank2accessions.txt)
>
> and then ran ucsc_genes2gff.pl (from gmod-0.003) and bp_load_gff.pl with
> % ./ucsc_genes2gff.pl -annotations hg17 | bp_load_gff.pl -c -d
> "dbi:mysql:database=gbrowse;host=" --user -p -f
> sequencedata/ -
>
> It works fine and loads the data into the database, but it doesn't add
> the reference entries for the chromosomes, so when I try to search for
> chr1 (or just 1) in gbrowse, I get "The landmark named chr1 is not
> recognized.". I tried adding an entry for chr1 directly in mysql and
> gbrowse worked fine with that.
>
> So next I took the file chromInfo.txt, which contains the lengths of the
> chromosomes, and edited it into a GFF file. I tried to load it with
> % bp_load_gff.pl -d "dbi:mysql:database=gbrowse;host=" --user
> -p chromosomes.gff
>
> I get:
> chromosomes.gff: loading...
> Can't locate object method "_maxbin" via package
> "Bio::DB::GFF::Adaptor::dbi::mysqlopt" at
> /usr/lib/perl5/site_perl/5.8.1/Bio/DB/GFF/Adaptor/dbi/mysql.pm line
> 687, <> line 2.
> DBI::db=HASH(0x11f8080)->disconnect invalidates 2 active statement
> handles (either destroy statement handles or call finish on them before
> disconnecting) at
> /usr/lib/perl5/site_perl/5.8.1/Bio/DB/GFF/Adaptor/dbi/caching_handle.pm
> line 228, <> line 2.
>
> I noticed that this is a problem with long features. Chr1 is
> 245,522,847 bp. If I drop the 7 from the end, it works.
> The default for maxfeature is 100,000,000, but adding
> --maxfeature 1000000000 to bp_load_gff.pl doesn't have any effect. As
> you can see, this is with perl 5.8.1, and the same thing happens on
> another machine with 5.8.3. Bioperl is 1.5.0. Is the script broken or
> am I doing something wrong?
>
> I then made a little script that goes through chromInfo.txt and adds
> the chromosomes directly to mysql. I ignored the column fbin, because I
> didn't know what it was for. This seems to work fine; gbrowse is able
> to find the chromosomes. But is there an "official" or better way to
> import the human genome data into gbrowse?
>
> I also tried load_ucsc.pl from bioperl-1.5.0, but it didn't add the
> chromosome entries either. By the way, the script produces an empty GFF
> file for each input file, but everything is written to stdout, so all
> the files remain empty.
>
> Also one other thing. Can the score values in GFF be negative? I'm
> using gbrowse to visualize CGH data, but the xyplot doesn't seem to
> work with negative log ratios.
>
> Regards,
> Ilari

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse@cshl.edu

From Andrew.Mather at dpi.vic.gov.au Tue Jul 19 00:11:48 2005
From: Andrew.Mather at dpi.vic.gov.au (Andrew.Mather@dpi.vic.gov.au)
Date: Tue Jul 19 13:19:40 2005
Subject: [Bioperl-l] Bioperl-ext, Staden and x86_64
Message-ID:

Hi Bioperlers,

I'm having some problems getting bioperl-ext to install under RHEL3 Update 5 for Opteron. I guess it's not strictly a Bioperl problem, but I figure someone here will have tried this before me and can offer some advice.

The problem seems to be related to the Staden io_lib. Version 1.8.11 wouldn't compile, as the configure step fails because it doesn't appear to understand Opterons. I looked around and found version 1.9.0 on SourceForge, and this appears to compile cleanly; however, it doesn't look like it has left any .so files in /usr/local/lib (or anywhere else, for that matter).

From reading the staden::read makefile, this (and I'm guessing it's this) causes the make process to fail, and I can't build ext. It leaves the .a files, but no .so files. I've copied the Read, os and configure header files into /usr/local/include, which seems to be a common problem, but this makes no difference.

Has anyone on the list compiled the Staden io_lib on Opteron? If so, pointers to appropriate info/versions etc. gratefully received.

Thanks,

Andrew

Animal Genetics and Genomics, PIRVic
Attwood
475 Mickleham Road, Attwood, 3049
ph +61 3 92174342
mob 0413 009 761
----------------
There are 10 kinds of people...those who understand binary and those who don't.

From palmeida at igc.gulbenkian.pt Tue Jul 19 14:45:31 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Tue Jul 19 14:36:27 2005
Subject: [Bioperl-l] Bio-perl and webpages?
In-Reply-To: <20050719172532.00007ca9@pearson.infobiogen.fr>
References: <1121781520.4064.38.camel@localhost.localdomain> <42DD0EA9.6000306@igc.gulbenkian.pt> <20050719172532.00007ca9@pearson.infobiogen.fr>
Message-ID: <200507191945.31605.palmeida@igc.gulbenkian.pt>

Hey,

I did what you said and it seems to be working. Thank you very much. I had been changing things in Clustalw.pm back and forth and never thought of solving the problem within my script.

-- Paulo

From astew at wam.umd.edu Tue Jul 19 18:03:19 2005
From: astew at wam.umd.edu (Andrew Stewart)
Date: Tue Jul 19 21:02:52 2005
Subject: [Bioperl-l] error installing bioperl-db
Message-ID: <42DD78A7.5060507@wam.umd.edu>

I'm having a problem while trying to install the bioperl-db modules. While running make test, I get the following error:

t/01dbadaptor.....ok 1/13
------------- EXCEPTION -------------
MSG: Failed to load module Bio::DB::DBI::postgresql.
Can't locate Bio/DB/DBI/postgresql.pm in @INC (@INC contains: t /usr/local/bioperl-db/blib/lib /usr/local/bioperl-db/blib/arch /sw/lib/perl5/5.8.1/darwin-thread-multi-2level /sw/lib/perl5/5.8.1 /sw/lib/perl5 /sw/lib/perl5/darwin /Users/astew/usr/lib /System/Library/Perl/5.8.1/darwin-thread-multi-2level /System/Library/Perl/5.8.1 /Library/Perl/5.8.1/darwin-thread-multi-2level /Library/Perl/5.8.1 /Library/Perl /Network/Library/Perl/5.8.1/darwin-thread-multi-2level /Network/Library/Perl/5.8.1 /Network/Library/Perl .) at /sw/lib/perl5/5.8.1/Bio/Root/Root.pm line 396. STACK Bio::Root::Root::_load_module /sw/lib/perl5/5.8.1/Bio/Root/Root.pm:398 STACK Bio::DB::SimpleDBContext::dbi /usr/local/bioperl-db/blib/lib/Bio/DB/SimpleDBContext.pm:296 STACK Bio::DB::BioSQL::DBAdaptor::new /usr/local/bioperl-db/blib/lib/Bio/DB/BioSQL/DBAdaptor.pm:85 STACK Bio::DB::BioDB::new /usr/local/bioperl-db/blib/lib/Bio/DB/BioDB.pm:203 STACK DBTestHarness::get_DBAdaptor t/DBTestHarness.pm:257 STACK DBTestHarness::get_DBContext t/DBTestHarness.pm:272 STACK toplevel t/01dbadaptor.t:23 -------------------------------------- t/01dbadaptor.....dubious Test returned status 2 (wstat 512, 0x200) DIED. FAILED tests 2-13 Failed 12/13 tests, 7.69% okay Where am I supposed to find Bio::DB::DBI::postgresql ? -Andrew Stewart From hlapp at gmx.net Wed Jul 20 04:50:32 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Jul 20 04:48:20 2005 Subject: [Bioperl-l] error installing bioperl-db In-Reply-To: <42DD78A7.5060507@wam.umd.edu> References: <42DD78A7.5060507@wam.umd.edu> Message-ID: <00afefecc06d2d2bb22f5be09fb4410a@gmx.net> Did you configure the name of your driver to be 'postgresql'? Is this a new DBD driver for PostgreSQL? The DBD driver used to be Pg (DBD::Pg), so bioperl-db just uses the same convention or name. I.e., change postgresql to Pg in your configuration (t/DBHarness.biosql.conf), unless the DBD driver you are using is indeed DBD::postgresql. 
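Before renaming or copying anything, it can help to confirm which driver name Perl can actually see. The sketch below is an illustration only, not part of bioperl-db: `find_module_file` is a hypothetical helper that checks each `@INC` directory for the module's .pm file (roughly what `require` does, ignoring `@INC` hooks), and the module names in the loop are simply the ones discussed in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of bioperl-db): return the path of the first
# file in @INC that provides $module, or undef if no directory has it.
sub find_module_file {
    my ($module) = @_;
    my @parts = split /::/, $module;    # Bio::DB::DBI::Pg -> (Bio, DB, DBI, Pg)
    $parts[-1] .= '.pm';                # last component becomes Pg.pm
    for my $dir (@INC) {
        next if ref $dir;               # skip code-ref hooks that may be in @INC
        my $path = File::Spec->catfile($dir, @parts);
        return $path if -f $path;
    }
    return undef;
}

# Check the bioperl-db driver adaptors and the DBD driver they map to.
for my $module (qw(Bio::DB::DBI::Pg Bio::DB::DBI::postgresql DBD::Pg)) {
    my $path = find_module_file($module);
    printf "%-26s %s\n", $module, defined $path ? $path : 'not found in @INC';
}
```

If only Bio::DB::DBI::Pg and DBD::Pg turn up, that points to using Pg as the driver name in t/DBHarness.biosql.conf.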
If the DBD driver you are using is indeed DBD::postgresql and not DBD::Pg then copy Bio/DB/DBI/Pg.pm to Bio/DB/DBI/postgresql.pm and rename (or copy) the directory Bio/DB/BioSQL/Pg to Bio/DB/BioSQL/postgresql. Hth, -hilmar On Jul 19, 2005, at 3:03 PM, Andrew Stewart wrote: > I'm having a problem while trying to install the bioperl-db modules. > While trying to run a make test, I get the following error: > > t/01dbadaptor.....ok 1/13 > ------------- EXCEPTION ------------- > *MSG: Failed to load module Bio::DB::DBI::postgresql*. Can't locate > Bio/DB/DBI/postgresql.pm in @INC (@INC contains: t > /usr/local/bioperl-db/blib/lib /usr/local/bioperl-db/blib/arch > /sw/lib/perl5/5.8.1/darwin-thread-multi-2level /sw/lib/perl5/5.8.1 > /sw/lib/perl5 /sw/lib/perl5/darwin /Users/astew/usr/lib > /System/Library/Perl/5.8.1/darwin-thread-multi-2level > /System/Library/Perl/5.8.1 > /Library/Perl/5.8.1/darwin-thread-multi-2level /Library/Perl/5.8.1 > /Library/Perl /Network/Library/Perl/5.8.1/darwin-thread-multi-2level > /Network/Library/Perl/5.8.1 /Network/Library/Perl .) at > /sw/lib/perl5/5.8.1/Bio/Root/Root.pm line 396. > > STACK Bio::Root::Root::_load_module > /sw/lib/perl5/5.8.1/Bio/Root/Root.pm:398 > STACK Bio::DB::SimpleDBContext::dbi > /usr/local/bioperl-db/blib/lib/Bio/DB/SimpleDBContext.pm:296 > STACK Bio::DB::BioSQL::DBAdaptor::new > /usr/local/bioperl-db/blib/lib/Bio/DB/BioSQL/DBAdaptor.pm:85 > STACK Bio::DB::BioDB::new > /usr/local/bioperl-db/blib/lib/Bio/DB/BioDB.pm:203 > STACK DBTestHarness::get_DBAdaptor t/DBTestHarness.pm:257 > STACK DBTestHarness::get_DBContext t/DBTestHarness.pm:272 > STACK toplevel t/01dbadaptor.t:23 > > -------------------------------------- > t/01dbadaptor.....dubious > Test returned status 2 (wstat 512, 0x200) > DIED. FAILED tests 2-13 > Failed 12/13 tests, 7.69% okay > > > Where am I supposed to find Bio::DB::DBI::postgresql ? 
> > > -Andrew Stewart > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Wed Jul 20 12:33:50 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Jul 20 12:27:29 2005 Subject: [Bioperl-l] error installing bioperl-db In-Reply-To: <42DE7912.4020300@wam.umd.edu> References: <42DD78A7.5060507@wam.umd.edu> <00afefecc06d2d2bb22f5be09fb4410a@gmx.net> <42DE7912.4020300@wam.umd.edu> Message-ID: <9e25eb983a2480c4493d3f0422ad0365@gmx.net> First off, the bioperl-microarray mailing list has little to do with this topic. The appropriate list is bioperl-l, to which you posted first. You can find the subscription page at www.bioperl.org. As for your error report, seeing errors in the tests is usually not a good sign. You can force an install, but there's likely a problem that needs to be fixed. In PostgreSQL, the first failed statement invalidates the entire transaction, and no other SQL command can succeed until the transaction is rolled back. To deal with this, the PostgreSQL version of the schema defines 'rules' that do lookups to prevent unique key clashes. Here, one of the statements either failed unexpectedly, or it failed when it should have been caught by one of the rules. Which version of PostgreSQL are you using? Did you download the schema from CVS, and were there any errors when you instantiated it? I'll need to replicate the error before I can judge further what's going on. -hilmar On Jul 20, 2005, at 9:17 AM, Andrew Stewart wrote: > Doh, that was a sloppy oversight on my part. Thanks for pointing it > out.
> > make test now reports 97% ok with the following error: > t/03simpleseq.....NOK 33Use of uninitialized value in join or string > at /usr/local/bioperl-db/blib/lib/Bio/DB/BioSQL/BaseDriver.pm line > 1845. > > -------------------- WARNING --------------------- > MSG: update in Bio::DB::BioSQL::PrimarySeqAdaptor (driver) failed, > values were ("NM_003319","","NM_003319","Homo sapiens titin (TTN), > transcript variant N2-B, mRNA","3") FKs (2) > ERROR: current transaction is aborted, commands ignored until end of > transaction block > > --------------------------------------------------- > Use of uninitialized value in join or string at > /usr/local/bioperl-db/blib/lib/Bio/DB/BioSQL/BaseDriver.pm line 1845. > Use of uninitialized value in join or string at > /usr/local/bioperl-db/blib/lib/Bio/DB/BioSQL/BaseDriver.pm line 1845. > Use of uninitialized value in join or string at > /usr/local/bioperl-db/blib/lib/Bio/DB/BioSQL/BaseDriver.pm line 1845. > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::BiosequenceAdaptor (driver) failed, > values were ("","82027","dna","","") FKs (2) > ERROR: current transaction is aborted, commands ignored until end of > transaction block > > --------------------------------------------------- > t/03simpleseq.....ok 34/59 > ------------- EXCEPTION ------------- > MSG: error while executing statement in > Bio::DB::BioSQL::PrimarySeqAdaptor::find_by_unique_key: ERROR: > current transaction is aborted, commands ignored until end of > transaction block > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key > /usr/local/bioperl-db/blib/lib/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:951 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key > /usr/local/bioperl-db/blib/lib/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:855 > STACK (eval) t/03simpleseq.t:112 > STACK toplevel t/03simpleseq.t:66 > > -------------------------------------- > t/03simpleseq.....FAILED tests 32-33, 35, 
38-59 > Failed 25/59 tests, 57.63% okay > > > Is this an error I should be worried about or should I go ahead and > force make install? > > > Thanks for the help. Could I be added to this listserv by the way? > > -Andrew Stewart > US Navy BDRD > > > > > Hilmar Lapp wrote: > >> Did you configure the name of your driver to be 'postgresql'? Is this >> a new DBD driver for PostgreSQL? The DBD driver used to be Pg >> (DBD::Pg), so bioperl-db just uses the same convention or name. >> >> I.e., change postgresql to Pg in your configuration >> (t/DBHarness.biosql.conf), unless the DBD driver you are using is >> indeed DBD::postgresql. >> >> If the DBD driver you are using is indeed DBD::postgresql and not >> DBD::Pg then copy Bio/DB/DBI/Pg.pm to Bio/DB/DBI/postgresql.pm and >> rename (or copy) the directory Bio/DB/BioSQL/Pg to >> Bio/DB/BioSQL/postgresql. >> >> Hth, >> >> -hilmar >> >> On Jul 19, 2005, at 3:03 PM, Andrew Stewart wrote: >> >>> I'm having a problem while trying to install the bioperl-db modules. >>> While trying to run a make test, I get the following error: >>> >>> t/01dbadaptor.....ok 1/13 >>> ------------- EXCEPTION ------------- >>> *MSG: Failed to load module Bio::DB::DBI::postgresql*. Can't locate >>> Bio/DB/DBI/postgresql.pm in @INC (@INC contains: t >>> /usr/local/bioperl-db/blib/lib /usr/local/bioperl-db/blib/arch >>> /sw/lib/perl5/5.8.1/darwin-thread-multi-2level /sw/lib/perl5/5.8.1 >>> /sw/lib/perl5 /sw/lib/perl5/darwin /Users/astew/usr/lib >>> /System/Library/Perl/5.8.1/darwin-thread-multi-2level >>> /System/Library/Perl/5.8.1 >>> /Library/Perl/5.8.1/darwin-thread-multi-2level /Library/Perl/5.8.1 >>> /Library/Perl /Network/Library/Perl/5.8.1/darwin-thread-multi-2level >>> /Network/Library/Perl/5.8.1 /Network/Library/Perl .) at >>> /sw/lib/perl5/5.8.1/Bio/Root/Root.pm line 396. 
>>> >>> STACK Bio::Root::Root::_load_module >>> /sw/lib/perl5/5.8.1/Bio/Root/Root.pm:398 >>> STACK Bio::DB::SimpleDBContext::dbi >>> /usr/local/bioperl-db/blib/lib/Bio/DB/SimpleDBContext.pm:296 >>> STACK Bio::DB::BioSQL::DBAdaptor::new >>> /usr/local/bioperl-db/blib/lib/Bio/DB/BioSQL/DBAdaptor.pm:85 >>> STACK Bio::DB::BioDB::new >>> /usr/local/bioperl-db/blib/lib/Bio/DB/BioDB.pm:203 >>> STACK DBTestHarness::get_DBAdaptor t/DBTestHarness.pm:257 >>> STACK DBTestHarness::get_DBContext t/DBTestHarness.pm:272 >>> STACK toplevel t/01dbadaptor.t:23 >>> >>> -------------------------------------- >>> t/01dbadaptor.....dubious >>> Test returned status 2 (wstat 512, 0x200) >>> DIED. FAILED tests 2-13 >>> Failed 12/13 tests, 7.69% okay >>> >>> >>> Where am I supposed to find Bio::DB::DBI::postgresql ? >>> >>> >>> -Andrew Stewart > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From lstein at cshl.edu Wed Jul 20 12:39:23 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Wed Jul 20 12:30:56 2005 Subject: [Bioperl-l] Re: [Gmod-gbrowse] Adding human chromosomes as reference sequences In-Reply-To: <42DE0A24.6030705@molecular-sciences.ox.ac.uk> References: <200507191243.37491.lstein@cshl.edu> <42DE0A24.6030705@molecular-sciences.ox.ac.uk> Message-ID: <200507201239.24760.lstein@cshl.edu> These changes were all in bioperl itself, so you don't have to update gbrowse. Lincoln On Wednesday 20 July 2005 04:24 am, Steve Taylor wrote: > Hi, > > > As of about a week ago the xyplot.pm glyph has been enhanced to accept > > negative scores. You can also colorize the bars and points according to > > the score or other criteria. > > That's great!
Is it best to do a full CVS update of bioperl and gbrowse > (1_62-bugfixes branch) or will just updating bioperl suffice to get these > features? > > Thanks and Regards, > > Steve > _______________________________________________ > Gmod-gbrowse mailing list > Gmod-gbrowse@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse@cshl.edu From Kary at ioc.fiocruz.br Wed Jul 20 12:27:03 2005 From: Kary at ioc.fiocruz.br (Kary Ann Del Carmen Soriano Ocana) Date: Wed Jul 20 13:09:15 2005 Subject: [Bioperl-l] Add a new parameters for Hmmpfam Message-ID: <29AC1A3F62AAF54BA71E367C6D62CEB096C2A0@alpha.ioc.fiocruz.br> Dear Marc Logghe: I have a Perl script to run hmmpfam (included below). I would like to add another parameter to my @params, because when I run hmmpfam from the shell with the expert option --forward (which scores hits with the Forward algorithm instead of Viterbi), it returns many more hits:
1.- SHELL COMMAND
my $factory = system("hmmpfam -E 0.1 --forward modelos_hmmer_alignm.hmm $seq >results/hmmer_alignm.out")
2.- PERL SCRIPT
#!/usr/bin/perl -w
$ENV{HMMPFAMDIR} = '/usr/local/bin/';
use lib "/usr/local/bioperl14";
use lib "/usr/local/bioperl-run-1.4";
use strict;
use Bio::Tools::Run::Hmmpfam;
use Bio::SearchIO;
use Bio::SearchIO::Writer::HTMLResultWriter;
use Bio::SearchIO::Writer::TextResultWriter;
use Bio::SearchIO::Writer::HSPTableWriter;
use Bio::SearchIO::Writer::ResultTableWriter;
use Bio::SeqIO;

my @params = ('DB' => 'modelos_hmmer_tcoffee.hmm', 'E' => 0.1);
my $factory = Bio::Tools::Run::Hmmpfam->new(@params);
my $seq = $ARGV[0]; #any old protein fasta file
my $search = $factory->run($seq);
my $writer = Bio::SearchIO::Writer::HSPTableWriter->new(
    -columns => [qw( hit_name query_name score expect
                     start_hit end_hit start_query end_query )] );
my $out = Bio::SearchIO->new( -writer => $writer,
                              -file   => ">results/searchio_tcoffee.out" );
while (my $result = $search->next_result()) {
    $out->write_result($result);
}
Thank you very much for your help. Yours faithfully, Kary Soriano From smarkel at scitegic.com Wed Jul 20 17:19:28 2005 From: smarkel at scitegic.com (Scott Markel) Date: Wed Jul 20 17:13:28 2005 Subject: [Bioperl-l] HTTP response size check in Bio::Tools::Run::RemoteBlast Message-ID: <42DEBFE0.1080209@scitegic.com> Sometime last week NCBI made a change to the HTTP response for remote BLAST requests. Based on when my regressions started to fail, I think it was on the 14th. The if( $size > 1000 ) check in retrieve_blast() now passes when it shouldn't, meaning that intermediate pages are assumed to be final results. I'm now seeing response sizes of just under 2000 for the intermediate pages. A customer of mine is getting about the same. If this check is changed to 2000, then we're back in business. We can't make the number too big or we'll start missing small result sets. A request for a single BLASTp hit gives me a result size of about 3400. Has anyone else seen this problem? Is this a reasonable fix to propose? I'm a little concerned that whatever the number is, it's very susceptible to changes at NCBI. Scott -- Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel@scitegic.com SciTegic Inc. mobile: +1 858 205 3653 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext.
253 San Diego, CA 92123 fax: +1 858 279 8804 USA web: http://www.scitegic.com From jason.stajich at duke.edu Wed Jul 20 21:23:35 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Jul 20 21:15:00 2005 Subject: [Bioperl-l] HTTP response size check in Bio::Tools::Run::RemoteBlast In-Reply-To: <42DEBFE0.1080209@scitegic.com> References: <42DEBFE0.1080209@scitegic.com> Message-ID: <013A3AB6-52F7-4813-A220-6F2A0F57B92F@duke.edu> I got an email from Guido to the same effect. Guido - best to post to the mailing list in the future so I am not the bottleneck. I just haven't had time to actually make the changes. Really need someone else to maintain this module to be honest. Anyway, any way to make the module more robust to NCBI changes would be appreciated - it really started as a simple hack - I don't know if it needs to mirror more closely the example code that NCBI provides for submitting remote blasts. -jason On Jul 20, 2005, at 5:19 PM, Scott Markel wrote: > Sometime last week NCBI made a change to the HTTP response > for remote BLAST requests. Based on when my regressions > started to fail, I think it was on the 14th. > > The if( $size > 1000 ) check in retrieve_blast() now passes > when it shouldn't, meaning that intermediate pages are assumed > to be final results. I'm now seeing response sizes of just > under 2000 for the intermediate pages. A customer of mine is > getting about the same. > > If this check is changed to 2000, then we're back in business. > We can't make the number too big or we'll start missing small > result sets. A request for a single BLASTp hit gives me a > result size of about 3400. > > Has anyone else seen this problem? Is this a reasonable fix > to propose? I'm a little concerned that whatever the number > is, it's very susceptible to changes at NCBI. > > Scott > > -- > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel@scitegic.com > SciTegic Inc.
mobile: +1 858 205 3653 > 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 > San Diego, CA 92123 fax: +1 858 279 8804 > USA web: http://www.scitegic.com > -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From ferdinand.marletaz at gmail.com Thu Jul 21 04:49:31 2005 From: ferdinand.marletaz at gmail.com (Ferdinand Marlétaz) Date: Thu Jul 21 04:41:08 2005 Subject: [Bioperl-l] Blast : Bus Error Message-ID: <7c7aa474050721014961ce6a6f@mail.gmail.com> Hi, I know my current problem is only loosely related to BioPerl, but maybe somebody has already encountered it, so it's worth a try... I am trying to run BLAST (tblastx) on a G5 Power Mac (OS X 10.4 Tiger, but the same was happening with 10.3 Panther). It starts perfectly normally, but after some time it stops and displays either 'bus error' or 'segmentation fault'... I'm quite surprised, because I've never had this problem on a second, identical G5 in my lab. I've tried changing the BLAST version from 2.10 to 2.11... but that didn't solve the problem. I verified that it's not related to my databases by reformatting them from FASTA... So I don't see where the problem can come from. Has anybody encountered such problems or errors and found a solution, or does anyone have an idea? I'd like to avoid reinstalling the system on this machine because of the time it would take... Thanks a lot Cheers Ferdi ______________________ Ferdinand Marlétaz Evolution and Phylogeny of Metazoans UMR 6540 DIMAR CNRS Station Marine d'Endoume Rue Batterie-des-Lions 13007 MARSEILLE Tel. 33(0)4 91 04 16 54 Port. 33(0)6 30 35 58 49 e-mail.
Ferdinand.Marletaz@ens-lyon.fr From l.douchy at gmail.com Thu Jul 21 05:12:26 2005 From: l.douchy at gmail.com (Laurent DOUCHY) Date: Thu Jul 21 05:04:43 2005 Subject: [Bioperl-l] Blast : Bus Error In-Reply-To: <7c7aa474050721014961ce6a6f@mail.gmail.com> References: <7c7aa474050721014961ce6a6f@mail.gmail.com> Message-ID: <2fb209dd0507210212672ea750@mail.gmail.com> Hello, This problem can happen for several reasons: your RAM is not sufficient and/or you are working against a database like nt that is too big for the PPC/BLAST/database combination. First, verify your RAM (500 MB is not enough); second, try to work on a part of nt when you can; also, try the BLAST build optimised by the BioTeam... Cordially LN 2005/7/21, Ferdinand Marlétaz : > Hi, > > I know my current problem is only loosely related to BioPerl, but maybe > somebody has already encountered it, so it's worth a try... > > I am trying to run BLAST (tblastx) on a G5 Power Mac (OS X 10.4 > Tiger, but the same was happening with 10.3 Panther). It starts perfectly > normally, but after some time it stops and displays either 'bus error' > or 'segmentation fault'... I'm quite surprised, because I've never had > this problem on a second, identical G5 in my lab. I've tried changing the > BLAST version from 2.10 to 2.11... but that didn't solve the problem. > I verified that it's not related to my databases by reformatting them > from FASTA... > > So I don't see where the problem can come from. Has anybody > encountered such problems or errors and found a solution, or does anyone have > an idea? I'd like to avoid reinstalling the system on this machine > because of the time it would take... > > Thanks a lot > > Cheers > > Ferdi > > ______________________ > Ferdinand Marlétaz > Evolution and Phylogeny of Metazoans > UMR 6540 DIMAR CNRS > Station Marine d'Endoume > Rue Batterie-des-Lions > 13007 MARSEILLE > Tel. 33(0)4 91 04 16 54 > Port. 33(0)6 30 35 58 49 > e-mail.
Ferdinand.Marletaz@ens-lyon.fr From ferdinand.marletaz at gmail.com Thu Jul 21 05:58:27 2005 From: ferdinand.marletaz at gmail.com (Ferdinand Marlétaz) Date: Thu Jul 21 05:49:43 2005 Subject: [Bioperl-l] Blast : Bus Error In-Reply-To: <20050721094219.GA14638@ebi.ac.uk> References: <7c7aa474050721014961ce6a6f@mail.gmail.com> <2fb209dd0507210212672ea750@mail.gmail.com> <20050721094219.GA14638@ebi.ac.uk> Message-ID: <7c7aa474050721025839062a98@mail.gmail.com> Well, I rule out memory problems (2 GB RAM on these machines) and database size problems (the error happens with both large and small databases, e.g. 50 MB). On top of that, I've already performed identical BLAST searches on the two computers, and the other computer runs very well... I don't suspect hardware problems either, because this misbehaving computer has run similar searches in the past without problems... So something could have happened to the configuration that makes the BLAST process fail! I do know that somebody tried to install Linux on this computer and didn't manage to finish the installation. Maybe that's the source of my current problems? What do you all think? Thanks Ferdi 2005/7/21, Andreas Kahari : > [not to the list] > > Hi guys, > > There could also be a problem with a faulty memory module... If > the error is not consistently reproducible, then this is one > possible cause. > > Running out of memory should not produce a Bus Error. It might > produce a Segmentation Fault if the program doesn't care that > the memory allocation failed, but not a Bus Error (as far as I > know, but I don't run OS X here). > > A way to diagnose this is to run exactly the same set-up on two > identical machines until one of them causes the error more than > once.
If the other machine seems to run ok then it is very > possible that there is a hardware fault on the first machine (or > some important system configuration setting is different without > you knowing it). > > Regards, > Andreas > > On Thu, Jul 21, 2005 at 11:12:26AM +0200, Laurent DOUCHY wrote: > > Hello, > > This problem can happen for several reasons: > > your RAM is not sufficient and/or you are working against a database like > > nt that is too big for the PPC/BLAST/database combination. First, verify your RAM > > (500 MB is not enough); second, try to work on a part > > of nt when you can; also, try the BLAST build optimised by the BioTeam... > > Cordially > > > > LN > > > > 2005/7/21, Ferdinand Marlétaz : > > > Hi, > > > > > > I know my current problem is only loosely related to BioPerl, but maybe > > > somebody has already encountered it, so it's worth a try... > > > > > > I am trying to run BLAST (tblastx) on a G5 Power Mac (OS X 10.4 > > > Tiger, but the same was happening with 10.3 Panther). It starts perfectly > > > normally, but after some time it stops and displays either 'bus error' > > > or 'segmentation fault'... I'm quite surprised, because I've never had > > > this problem on a second, identical G5 in my lab. I've tried changing the > > > BLAST version from 2.10 to 2.11... but that didn't solve the problem. > > > I verified that it's not related to my databases by reformatting them > > > from FASTA... > > > > > > So I don't see where the problem can come from. Has anybody > > > encountered such problems or errors and found a solution, or does anyone have > > > an idea? I'd like to avoid reinstalling the system on this machine > > > because of the time it would take...
> [cut] > > -- > Andreas Kähäri > > EMBL-EBI/ensembl > www.ensembl.org > > 1024D/C2E163CB F4C4 A41A 665B 448A 3FA9 6AEA 12E3 39DA C2E1 63CB > From johan.viklund at gmail.com Thu Jul 21 08:10:20 2005 From: johan.viklund at gmail.com (Johan Viklund) Date: Thu Jul 21 08:02:25 2005 Subject: [Bioperl-l] bioperl-db: exporting data In-Reply-To: References: <5e924f0a05070508012bbb63d3@mail.gmail.com> Message-ID: <5e924f0a05072105101b55e307@mail.gmail.com> Thanks for the help, it works now (it was a small programming error). (Sending this so someone else in a similar predicament can find help.) On 7/6/05, Hilmar Lapp wrote: > The way you're describing doesn't sound too far off. The rank is an > ordering index as well as a component of the unique key constraint, > i.e., you can't have two seqfeature qualifier values for the same > feature and tag name unless the rank is different. > > Have you convinced yourself that you can log in to the database and > retrieve those additions by hand (using SQL)? > > Can you reduce this to a test case where you load a single sequence > record, then issue SQL to add your custom annotation, and then retrieve > the record again? Email me the entry you loaded, the SQL statements you > issued, and the entry you got out. > > -hilmar > > On Jul 5, 2005, at 8:01 AM, Johan Viklund wrote: > > > Hi > > > > I'm trying to add COG annotations from Entrez Gene to sequences (from > > RefSeq in GenBank format) I have in a BioSQL database (on MySQL). The > > problem is I can't get them out again with the bioentry2flat.pl script > > (the bioentries appear without what I've added). > > > > I don't use bioperl for this (I've got ~40000 COG annotations (linked > > to GeneIDs)).
Instead I add it to the seqfeature_qualifier_value table > > similar to the way GeneIDs are represented (as far as I've figured), > > with term_id corresponding to db_xref and the same seqfeature_id as the > > GeneID had. For rank, I've tried a few different variations, but none > > seem to work (the first free one that's larger than the GeneID's, and 1). > > > > How should I add this annotation to the database so it gets exported > > when I use bioperl? > > > > I've also got another question: What is rank for? > > > > -- > > Johan Viklund > > E-mail: > > > > > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > -- Johan Viklund E-mail: From johan.viklund at gmail.com Thu Jul 21 08:18:35 2005 From: johan.viklund at gmail.com (Johan Viklund) Date: Thu Jul 21 08:23:50 2005 Subject: [Bioperl-l] bioperl-db: Searcing Message-ID: <5e924f0a05072105187840349f@mail.gmail.com> Hello again, I've got a new bioperl-db problem. This is my context: I've got a number of sequences in the database (complete genomes from RefSeq). I want to be able to find all the db_xrefs for a feature when I've got the GeneID or GI for that feature (preferably this should be returned as a Bio::SeqFeatureI compliant object). If this isn't [currently] possible, how do I get a Bio::SeqFeatureI object from the database? For the record, I can do this with SQL queries and DBI, but I want to know if there's a bioperl way.
-- Johan Viklund E-mail: ----------------- perl -we '$,=" ";$_=bless sub{shift;print split(/::/,ref)},Just::Another::Perl::Hacker;&$_' From amackey at pcbi.upenn.edu Thu Jul 21 12:09:37 2005 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Thu Jul 21 12:00:30 2005 Subject: [Bioperl-l] HTTP response size check in Bio::Tools::Run::RemoteBlast In-Reply-To: <013A3AB6-52F7-4813-A220-6F2A0F57B92F@duke.edu> References: <42DEBFE0.1080209@scitegic.com> <013A3AB6-52F7-4813-A220-6F2A0F57B92F@duke.edu> Message-ID: <3E91A072-C504-4595-AE8F-39F686F21EC5@pcbi.upenn.edu> This is what I do to distinguish intermediate pages from , and it seems to be stable (at least so far): if ($html =~ m/Status=WAITING/iso) -Aaron On Jul 20, 2005, at 9:23 PM, Jason Stajich wrote: > I got an email from Guido to the same effect. Guido - best to post > to the mailing list in the future so I am not the bottleneck. > > I just haven't had time to actually make the changes. > > Really need someone else to maintain this module to be honest.
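Aaron's approach of keying on the page content rather than on its size can be wrapped in a small helper. This is a sketch under stated assumptions, not the RemoteBlast code: `blast_still_waiting` is a hypothetical name, the two example pages are made-up snippets for illustration only, and a real poller would also need to handle failed or unknown job states instead of treating everything that is not WAITING as finished.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if an NCBI BLAST response page still reports the
# job as waiting. It looks for the Status=WAITING marker (Aaron's test) instead
# of comparing the page size against a fixed byte threshold.
sub blast_still_waiting {
    my ($html) = @_;
    return 0 unless defined $html;
    return $html =~ m/Status=WAITING/i ? 1 : 0;
}

# Made-up example pages, for illustration only.
my %pages = (
    intermediate => "<html><!--\nQBlastInfoBegin\n\tStatus=WAITING\nQBlastInfoEnd\n--></html>",
    final        => "<html><!--\nQBlastInfoBegin\n\tStatus=READY\nQBlastInfoEnd\n--></html>",
);

for my $name (sort keys %pages) {
    my $action = blast_still_waiting($pages{$name}) ? 'keep polling' : 'parse results';
    print "$name: $action\n";
}
```

Unlike the `$size > 1000` threshold, this check does not depend on how large NCBI's intermediate pages happen to be, although it still depends on the status marker staying in the page.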
> > Anyway, any way to make the module more robust to NCBI changes > would be appreciated - it really started as a simple hack - I don't > know if it needs to mirror more closely the example code that NCBI > provides for submitting remote blasts. > > -jason > On Jul 20, 2005, at 5:19 PM, Scott Markel wrote: > > >> Sometime last week NCBI made a change to the HTTP response >> for remote BLAST requests. Based on when my regressions >> started to fail, I think it was on the 14th. >> >> The if( $size > 1000 ) check in retrieve_blast() now passes >> when it shouldn't, meaning that intermediate pages are assumed >> to be final results. I'm now seeing response sizes of just >> under 2000 for the intermediate pages. A customer of mine is >> getting about the same. >> >> If this check is changed to 2000, then we're back in business. >> We can't make the number too big or we'll start missing small >> result sets. A request for a single BLASTp hit gives me a >> result size of about 3400. >> >> Has anyone else seen this problem? Is this a reasonable fix >> to propose? I'm a little concerned that whatever the number >> is, it's very susceptible to changes at NCBI. >> >> Scott >> >> -- >> Scott Markel, Ph.D. >> Principal Bioinformatics Architect email: smarkel@scitegic.com >> SciTegic Inc. mobile: +1 858 205 3653 >> 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext.
253 >> San Diego, CA 92123 fax: +1 858 279 8804 >> USA web: http://www.scitegic.com > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12/ From sdshlxh at gmail.com Thu Jul 21 12:49:37 2005 From: sdshlxh at gmail.com (Ping Yao) Date: Thu Jul 21 12:41:52 2005 Subject: [Bioperl-l] Re: Bioperl-l Digest, Vol 27, Issue 8 In-Reply-To: <200507210909.j6L98gTw018162@portal.open-bio.org> References: <200507210909.j6L98gTw018162@portal.open-bio.org> Message-ID: Hi group: I want to download genes from GenBank and put them into my local MySQL database. Right now, all I can do is download them into separate files. Can anyone help me put them into MySQL? Or does anyone have code for this that I could try? Ping From palmeida at igc.gulbenkian.pt Thu Jul 21 13:16:38 2005 From: palmeida at igc.gulbenkian.pt (Paulo Almeida) Date: Thu Jul 21 13:07:32 2005 Subject: [Bioperl-l] Re: Bioperl-l Digest, Vol 27, Issue 8 In-Reply-To: References: <200507210909.j6L98gTw018162@portal.open-bio.org> Message-ID: <200507211816.39202.palmeida@igc.gulbenkian.pt> Hi Ping, Are you familiar with DBD-mysql? If not, check it out at http://search.cpan.org/dist/DBD-mysql/ On Thursday 21 July 2005 17:49, Ping Yao wrote: > Hi group: > I want to download genes from GenBank and put them into my local MySQL > database. > Right now, all I can do is download them into separate files. > Can anyone help me put them into MySQL? > Or does anyone have code for this that I could try?
> Ping > -- Paulo Almeida Tel: +351 21 4464635, Fax: +351 21 4407970 Instituto Gulbenkian de Ci?ncia Rua da Quinta Grande, 6 P-2780-156 Oeiras Portugal http://www.igc.gulbenkian.pt From chiromatzo at gmail.com Thu Jul 21 14:15:55 2005 From: chiromatzo at gmail.com (Alynne Chiromatzo) Date: Thu Jul 21 14:06:35 2005 Subject: [Bioperl-l] $hsp->seq_inds and axt file Message-ID: <5865004505072111156b10d5bd@mail.gmail.com> Hi! I'm having trouble in finding the hsp->seq_inds in the axt file(whole genome alignment from UCSC Genome Browser). The code is below and a sample of the input file. It doens't show the sequence that it suppose to contain. Anyone can help me? Thanks very much! Alynne Oya. #! /usr/bin/perl use Bio::SearchIO; my $parser = new Bio::SearchIO(-format => 'axt', -file => '/work/project/align/testeaxt'); while( my $result = $parser->next_result ) { while( my $hit = $result->next_hit ) { while( my $hsp = $hit->next_hsp) { print "Hank: ".$hsp->rank." Strand : ".$hsp->strand('hit')."\n"; print "Query Name: ".$result->query_name." Hit Name: ".$hit->name."\n"; ($query_beg, $query_end) = $hsp->range('query');#encontra os valores de inicio-final, mas soh q somados de 1 ($hit_beg,$hit_end) = $hsp->range('hit'); print "Range: ".($query_beg-1)."-".($query_end-1)." ".($hit_beg-1)."-".($hit_end-1)."\n"; print $hsp->query_string."\n".$hsp->hit_string."\n"; @h_ind = $hsp->seq_inds('query', 'identical', 1); #Here doesn't apper the index sequence like it suppose to show foreach (@h_ind){ print "==> ".$_." 
"; } print "\n"; } This is a sample of the input file: 1 SCAFFOLD1 1535 1688 chrX 44389546 44389697 + 6498 TACAATAGGTCAAGGGTCTGCAAACTATAGGTTTAAAAATTAAAAAGAA-GAAAAATATATGGTGGAGACTGGTTGGGATCATAAAGCCCAATATATTTATTGTATGGTCtgtgt-tagccaggagtcttcagagaaacagaaccaataagataCA TACAATAAATCAGAGGTCAGCAAGCTATAGGTTTT----TTAAACAGGACAAAAAATATACAACAGAGAAAATGTAGGACCAGAAAACCCAACATATTTATTATATGGGCTTTTTGTGgtcagggttctcctgtgaaacaggaccaataggatgta 3 SCAFFOLD1 3665 3845 chrX 44391563 44391740 + 7187 CCCTAAAAAGTCA-GTTTTTCA------AGAAGCATAAGCATAGTGTAAATGTAGGAGTTCATAGATCCATAGCAGGGAGAGCTGTTTAGCCTACTTATAGCTTATTTCCAGCTTATATCATCTGTTTGGGGCACGGTCATCCCTAGAGGCAGAGGAA-GAGATTTGGAATGAGGTTTTAGCATGATAT TCCTGAAAATTTATATTTTTCACCAAGAAGAAACATAAACATCTTGCACA---AGGA---CATAAATCTATAGCTGGGGGTGCTGTT-AGTCTAGTTCTAGCATATTTCTAGCCTACATCATCTGTTTGGGGCATAATCATGTCTGGAAGAAAAGGAATGAGGTTTG----GGGATTTTAGCATGGTAT 17 SCAFFOLD2 22789 22919 chrX 44409117 44409239 - 5180 AGAATACACATCATAGTTATCATAGGGGAAT-GTTTAGGTGGCAGGATAAGGCATATTT--TTTTCTTTTCTCTGGTCTGTAAATTCTCTAACATAACTATATTGCTTTTAAATTTTAAATTGATTTTCAATTA agaaaacacacc-cacttataatagtggatttgtccaggtggcaggactatacatctttgttttctttttttcttgtTTATAAATGTTCTAATATAACTATATTGCCtttaaa----------atttttaatta From cjfields at uiuc.edu Thu Jul 21 14:28:49 2005 From: cjfields at uiuc.edu (Chris Fields) Date: Thu Jul 21 14:19:55 2005 Subject: [Bioperl-l] PPM for bioperl-1.5? Message-ID: <6.2.1.2.2.20050721131602.03c8f370@express.cites.uiuc.edu> I noticed the PPM for the latest developer bioperl (v 1.5) isn't found in http://bioperl.org/DIST. I saw that Nathan created one a while back; did anyone transfer it over to the above directory? __________________________________ Chris Fields - Postdoctoral Researcher Lab of Dr. Robert Switzer Address: University of Illinois at Urbana-Champaign Dept. of Biochemistry - 323 RAL 600 S. Mathews Ave. 
Urbana, IL 61801 Phone : (217) 333-7098 Fax : (217) 244-5858
From cjfields at uiuc.edu Thu Jul 21 15:11:27 2005 From: cjfields at uiuc.edu (Chris Fields) Date: Thu Jul 21 15:02:10 2005 Subject: [Bioperl-l] PPM for bioperl-1.5? In-Reply-To: <6.2.1.2.2.20050721131602.03c8f370@express.cites.uiuc.edu> References: <6.2.1.2.2.20050721131602.03c8f370@express.cites.uiuc.edu> Message-ID: <6.2.1.2.2.20050721140910.03cfac40@express.cites.uiuc.edu> Also, not to complicate things, but are Lincoln's bioperl-1.5 PPM (at http://www.gmod.org/ggb/ppm/) and Nathan's version (at http://web.ukonline.co.uk/nathanhaigh/bioperl/bioperl.ppd) essentially the same? I noticed that Nathan's has a bunch of dependencies but Lincoln's doesn't. Chris
From jason.stajich at duke.edu Thu Jul 21 15:12:39 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu Jul 21 15:03:38 2005 Subject: [Bioperl-l] $hsp->seq_inds and axt file In-Reply-To: <5865004505072111156b10d5bd@mail.gmail.com> References: <5865004505072111156b10d5bd@mail.gmail.com> Message-ID: <1121973159.42dff3a7c358d@webmail.duke.edu> There's no midline/homology line in the axt format so there is no way to know which columns are identical so I don't see how it can work. -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/
From cain at cshl.edu Thu Jul 21 15:21:31 2005 From: cain at cshl.edu (Scott Cain) Date: Thu Jul 21 15:13:02 2005 Subject: [Bioperl-l] PPM for bioperl-1.5?
In-Reply-To: <6.2.1.2.2.20050721140910.03cfac40@express.cites.uiuc.edu> References: <6.2.1.2.2.20050721131602.03c8f370@express.cites.uiuc.edu> <6.2.1.2.2.20050721140910.03cfac40@express.cites.uiuc.edu> Message-ID: <1121973691.3494.37.camel@localhost.localdomain> The ppm on the GMOD website is really intended to be just enough to get GBrowse working and nothing more (though I'm sure you could do more with it, just not the things whose dependencies are missing). Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory
From hartzell at kestrel.alerce.com Thu Jul 21 15:34:19 2005 From: hartzell at kestrel.alerce.com (George Hartzell) Date: Thu Jul 21 15:26:03 2005 Subject: [Bioperl-l] "Be forgiving in what you accept" and Bio::Tools::GuessSeqFormat Message-ID: <200507211934.j6LJYJO3007600@satchel.alerce.com> There's a great "old" Internet maxim, "Be forgiving in what you accept and strict in what you send". The Bio::SeqIO modules seem to be able to cope with "fasta" formatted files that have a space separating the ">" from the rest of the line (e.g. "> ape") if a) you explicitly specify the format or b) you have the sequence in a file whose name ends in "fa" (or generally matches the list of patterns that correspond to fasta file names). But if you happen to have the sequence in a file with a funny name (e.g. /var/tmp/apreq23ZHis [aka a form upload]), then it fails. It can't guess based on the filename, and the file content test is strict and wants to see the header line without the whitespace (">ape"). Is there any reason not to extend the regexp a bit and relax that constraint (since everything else seems to cope with it)? Something like this:

*** /usr/local/lib/perl5/site_perl/5.8.6/Bio/Tools/GuessSeqFormat.pm.orig Thu Jul 21 12:30:55 2005
--- /usr/local/lib/perl5/site_perl/5.8.6/Bio/Tools/GuessSeqFormat.pm Thu Jul 21 12:31:45 2005
***************
*** 591,595 ****
      my ($line, $lineno) = (shift, shift);
      return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
!             $line =~ /^>\w/);
  }

--- 591,595 ----
      my ($line, $lineno) = (shift, shift);
      return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
!             $line =~ /^>\s*\w/);
  }

g.
From brian_osborne at cognia.com Thu Jul 21 16:04:29 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Jul 21 15:55:26 2005 Subject: [Bioperl-l] "Be forgiving in what you accept" and Bio::Tools::GuessSeqFormat In-Reply-To: <200507211934.j6LJYJO3007600@satchel.alerce.com> Message-ID: George, This does sound like a reasonable change; I will make it unless someone has an objection. Let's wait a moment... Brian O.
From akarger at CGR.Harvard.edu Thu Jul 21 15:58:19 2005 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Thu Jul 21 16:02:11 2005 Subject: [Bioperl-l] The Scriptome now has mailing lists Message-ID: <339D68B133EAD311971E009027DC47970321AF08@montecarlo.cgr.harvard.edu> A few months ago, I introduced the Scriptome, a new cookbook/toolbox of Perl one-liners that allows non-programmer biologists to manipulate their data. I've just created some mailing lists, so I don't have to clutter this list anymore. The scriptome-announce list will be very low traffic (maybe 1 email per month), and scriptome-users will (hopefully) be busier. Subscribe to either or both at http://bioinformatics.org/mail/?group_id=505 Most of you already know how to program, but I'm hoping you'll let some non-geeks know about this resource, or maybe help build it. Now that we've got 40 or 50 tools on the website (http://cgr.harvard.edu/cbg/scriptome), we would love to get feedback from Real Biologists and the people who support them. Cheers, - Amir Karger Computational Biology Group Bauer Center for Genomics Research Harvard University
From Marc.Logghe at devgen.com Thu Jul 21 17:06:42 2005 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Thu Jul 21 16:57:12 2005 Subject: [Bioperl-l] Re: Bioperl-l Digest, Vol 27, Issue 8 Message-ID: <0C528E3670D8CE4B8E013F6749231AA607D885@ANTARESIA.be.devgen.com> Hi Ping, I have a strong feeling you are looking for bioperl-db/biosql.
As soon as you have set up the system, you can load your GenBank records with the command:

load_seqdatabase.pl --host localhost --dbname biosql \
    --namespace my_genbank --format genbank \
    my/genbank/file.gb

You can perform queries using the API, or fetch records by accession number with the bioentry2flat.pl script. A good starting point is Hilmar's BOSC2003 presentation: http://www.open-bio.org/bosc2003/slides/Persistent_Bioperl_BOSC03.pdf HTH, Marc
From stephen.taylor at molecular-sciences.ox.ac.uk Wed Jul 20 04:24:04 2005 From: stephen.taylor at molecular-sciences.ox.ac.uk (Steve Taylor) Date: Thu Jul 21 17:13:29 2005 Subject: [Bioperl-l] Re: [Gmod-gbrowse] Adding human chromosomes as reference sequences In-Reply-To: <200507191243.37491.lstein@cshl.edu> References: <200507191243.37491.lstein@cshl.edu> Message-ID: <42DE0A24.6030705@molecular-sciences.ox.ac.uk> Hi, > As of about a week ago the xyplot.pm glyph has been enhanced to accept > negative scores. You can also colorize the bars and points according to the > score or other criteria. That's great! Is it best to do a full CVS update of bioperl and gbrowse (1_62-bugfixes branch), or will just updating bioperl suffice to get these features?
Thanks and Regards, Steve
From ilari.scheinin at helsinki.fi Wed Jul 20 05:37:36 2005 From: ilari.scheinin at helsinki.fi (Ilari Scheinin) Date: Thu Jul 21 17:13:40 2005 Subject: [Bioperl-l] Re: [Gmod-gbrowse] Adding human chromosomes as reference sequences In-Reply-To: <200507191243.37491.lstein@cshl.edu> References: <200507191243.37491.lstein@cshl.edu> Message-ID: On 19.7.2005, at 19:43, Lincoln Stein wrote: > I'm sorry that the ucsc_genes2gff.pl script isn't loading the chromosome > extents; we just need a similar script called ucsc_chromosomes2gff.pl or > something similar. Ilari, since you've already essentially done this, perhaps > you'd be willing to contribute the script? I'll add it to bioperl. Actually I wrote my script in PHP, because I don't really know much about Perl. I only recently wanted to use gbrowse, and installed Bioperl for that reason. I have started teaching myself some Perl, but I think I'm more at the "Hello world" stage than the chromosomes2gff stage. Anyway, the chromInfo.txt file from UCSC is just a tab-delimited file where the first field is the name of the chromosome and the second field contains the number of bases, so writing a chromosomes2gff script is really simple. If anyone is interested, here is the PHP script I used. It doesn't convert the chromosome info to a GFF file, but loads the data directly into a MySQL database. It is a really minimal script: it doesn't check whether it can actually read the provided file or access the database, or whether some of the data already exists. It doesn't touch the fbin column of the fdata table, because I have no idea what it is for; it is not mentioned in perldoc Bio::DB::GFF::Adaptor::dbi::mysql.
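Lincoln's request above was for a Perl ucsc_chromosomes2gff.pl. A minimal sketch of such a converter follows, under stated assumptions: the script name and the chrom_info_to_gff helper are hypothetical (not part of bioperl), chromInfo.txt is taken to hold the chromosome name and length in its first two tab-separated columns (as Ilari describes), and the GFF2 source, method and group class simply mirror the values Ilari's PHP script writes (assembly, chromosome, "chromosome <name>"). Check the output against what your Bio::DB::GFF loader expects before relying on it.

```perl
#!/usr/bin/perl
# Hypothetical ucsc_chromosomes2gff.pl: an illustrative sketch, not a script
# that ships with bioperl. Reads UCSC chromInfo.txt-style lines
# (name <TAB> length ...) and prints one GFF2 reference line per chromosome.
use strict;
use warnings;

# Convert one chromInfo line to a GFF2 line; returns undef for blank lines,
# comment lines, or lines whose second column is not a plain integer.
sub chrom_info_to_gff {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                     # strip Unix or DOS line ending
    return undef if $line =~ /^\s*(?:#|\z)/;  # skip blanks and comments
    my ($name, $length) = split /\t/, $line;
    return undef unless defined $length && $length =~ /^\d+$/;
    # Columns: seqid, source, method, start, end, score, strand, phase, group.
    # Source, method and group class mirror the values in Ilari's PHP script.
    return join "\t", $name, 'assembly', 'chromosome', 1, $length,
                      '.', '.', '.', "chromosome $name";
}

# Convert every file named on the command line, writing GFF to STDOUT.
sub main {
    for my $file (@_) {
        open my $fh, '<', $file or die "Cannot open $file: $!\n";
        while (my $line = <$fh>) {
            my $gff = chrom_info_to_gff($line);
            print "$gff\n" if defined $gff;
        }
        close $fh;
    }
}

main(@ARGV) if @ARGV;
```

Usage would be something like: perl ucsc_chromosomes2gff.pl chromInfo.txt > chromosomes.gff. Emitting GFF rather than inserting rows by hand means the table columns (including the fbin column Ilari was unsure about) should be filled in by whichever Bio::DB::GFF loader reads the file.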
#!/usr/bin/php -f
\n"; exit(); }
$con = mysql_connect($host, $user, $pass);
mysql_select_db($db, $con);
mysql_query("insert into ftype (fmethod, fsource) values ('chromosome', 'assembly')", $con);
$ftypeid = mysql_insert_id($con);
$fp = fopen($file, "r");
$count = 0;
while ($line = fgets($fp)) {
    $fields = explode("\t", $line);
    mysql_query("insert into fgroup (gclass, gname) values ('chromosome', '$fields[0]')", $con);
    $gid = mysql_insert_id();
    mysql_query("insert into fdata (fref, fstart, fstop, ftypeid, gid) values ('$fields[0]', 1, $fields[1], $ftypeid, $gid)", $con);
    $count++;
}
fclose($fp);
mysql_close($con);
echo "Added $count entries.\n";
?>
Ilari
From Guido.Dieterich at gbf.de Thu Jul 21 09:57:42 2005 From: Guido.Dieterich at gbf.de (Guido Dieterich) Date: Thu Jul 21 17:13:44 2005 Subject: [Bioperl-l] HTTP response size check in Bio::Tools::Run::RemoteBlast In-Reply-To: <013A3AB6-52F7-4813-A220-6F2A0F57B92F@duke.edu> References: <42DEBFE0.1080209@scitegic.com> <013A3AB6-52F7-4813-A220-6F2A0F57B92F@duke.edu> Message-ID: <1121954262.16542.51.camel@sb289.gbf-braunschweig.de>
Skipped content of type multipart/alternative
-------------- next part --------------
An embedded message was scrubbed...
From: Jason Stajich Subject: Re: [Bioperl-l] HTTP response size check in Bio::Tools::Run::RemoteBlast Date: Wed, 20 Jul 2005 21:23:35 -0400 Size: 4381 Url: http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050721/25ac06a2/attachment.eml
From n.haigh at sheffield.ac.uk Fri Jul 22 04:14:34 2005 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Fri Jul 22 04:09:39 2005 Subject: [Bioperl-l] PPM for bioperl-1.5? In-Reply-To: <6.2.1.2.2.20050721140910.03cfac40@express.cites.uiuc.edu> Message-ID: I created the ppd from the 1.5 release and then manually added as many of bioperl's dependencies as I could find, to make things as simple and complete as possible for those who wish to install via PPM.
As a result it should be a pretty self-contained download for bioperl 1.5, but if you are missing many of the dependencies it could take a while to download them all! I have used that particular ppd to install bioperl 1.5 on a clean system, and as far as I remember it installs most things if you have added to PPM the repositories mentioned in the INSTALL.WIN file: http://bioperl.org/Core/Latest/INSTALL.WIN A point of interest when using PPM: try not to use PPM to do something like "upgrade "; inconsistencies in PPM and in people's naming of ppd files can result in an old version of a package being installed. Therefore always use "search " and "install " in order to obtain the correct version of a package. Let me know if you have any problems. Nathan
From n.haigh at sheffield.ac.uk Fri Jul 22 04:40:38 2005 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Fri Jul 22 04:35:25 2005 Subject: [Bioperl-l] PPM for bioperl-1.5? In-Reply-To: <6.2.1.2.2.20050721131602.03c8f370@express.cites.uiuc.edu> Message-ID: No one transferred the ppm file over to the bioperl server. If someone is able to do this, then please do, but note the following: the latest version of the ppd file should be named bioperl.ppd, and other versions should be named something like bioperl-version_no.ppd. Does anyone know which server the website is served from? If it is the pub.open-bio.org server, I could make the changes myself if I get the permissions. Nathan
From jrm at compbio.dundee.ac.uk Fri Jul 22 04:39:40 2005 From: jrm at compbio.dundee.ac.uk (Jon Manning) Date: Fri Jul 22 04:35:50 2005 Subject: [Bioperl-l] Bio::Structure::IO and single chain output Message-ID: <42E0B0CC.7010609@compbio.dundee.ac.uk> Hi all, I've been using Bio::Structure::IO to read PDB files. I'm currently trying to calculate solvent accessibilities on a per-chain basis, so I want to write Bio::Structure::Chain objects out to PDB-format files and feed each one on its own to NAccess. I've tried passing them directly to the IO object; this looked like it might work:

...
$out = Bio::Structure::IO->new(-file => ">test.pdb", '-format' => 'pdb');
$out->write_structure($chain);
...

(where $chain is a Bio::Structure::Chain). But I get this sort of error:

------------- EXCEPTION -------------
MSG: Bio::Structure::Chain=HASH(0x8dc7368) is not a StructureI compliant module.
STACK Bio::Structure::IO::pdb::write_structure /usr/lib/perl5/site_perl/5.8.5/Bio/Structure/IO/pdb.pm:531
STACK toplevel ./structureIOtest.pl:26

So I then thought to create a Bio::Structure::Entry object, add the chain, and feed that to IO, like:

...
my $entry = Bio::Structure::Entry->new(-id => 'structure_id');
$entry->chain($chainobject);
$out = Bio::Structure::IO->new(-file => ">test.pdb", '-format' => 'pdb');
$out->write_structure($entry);
...

But I've clearly misunderstood the entry initialisation somewhere, because I get this sort of error:

Use of uninitialized value in concatenation (.) or string at /usr/lib/perl5/site_perl/5.8.5/Bio/Structure/Entry.pm line 331, line 12836.
------------- EXCEPTION -------------
MSG: add_chain: first argument needs to be a Model object ()
STACK Bio::Structure::Entry::add_chain /usr/lib/perl5/site_perl/5.8.5/Bio/Structure/Entry.pm:330
STACK Bio::Structure::Entry::get_chains /usr/lib/perl5/site_perl/5.8.5/Bio/Structure/Entry.pm:386
STACK Bio::Structure::Entry::chain /usr/lib/perl5/site_perl/5.8.5/Bio/Structure/Entry.pm:300
STACK toplevel ./structureIOtest.pl:25
--------------------------------------

I'd really appreciate some pointers on how to go about doing this. Thanks, Jon
From n.haigh at sheffield.ac.uk Fri Jul 22 05:05:51 2005 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Fri Jul 22 04:57:48 2005 Subject: [Bioperl-l] "Be forgiving in what you accept" and Bio::Tools::GuessSeqFormat In-Reply-To: Message-ID: May I ask what software is producing this FASTA format file, which has a space immediately after the '>' in the description line? Although I am not aware of a formal description of the FASTA format, I have never seen any files with a space immediately after '>'. Although I don't object to relaxing this a little in bioperl, you may find that these files are not compatible with other software. Nathan
From n.haigh at sheffield.ac.uk Fri Jul 22 04:48:06 2005 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Fri Jul 22 05:24:15 2005 Subject: [Bioperl-l] PPM for bioperl-1.5?
In-Reply-To: <6.2.1.2.2.20050721140910.03cfac40@express.cites.uiuc.edu> Message-ID: Opps, just realized that although the ppd file is available at: The actual file containing the bioperl stuff isn't available at: http://bioperl.org/DIST/bioperl-1.5-ppm.tar.gz If someone could volunteer to put these files in the http://bioperl.org/DIST directory or grant me access I can do this myself; just let me know and I'll pass on the relevant files! Nathan -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Chris Fields Sent: 21 July 2005 20:11 To: bioperl-l@bioperl.org Subject: Re: [Bioperl-l] PPM for bioperl-1.5? Also, not to complicate things, but are Lincoln's bioperl-1.5 PPM (at http://www.gmod.org/ggb/ppm/) and Nathan's version (at http://web.ukonli ne.co.uk/nathanhaigh/bioperl/bioperl.ppd) essentially the same? I noticed that Nathan's has a bunch of dependencies but Lincoln's doesn't. Chris At 01:28 PM 7/21/2005, Chris Fields wrote: >I noticed the PPM for the latest developer bioperl (v 1.5) isn't found in >http://bioperl.org/DIST. I saw that Nathan created one a while back; did >anyone transfer it over to the above directory? > >__________________________________ > >Chris Fields - Postdoctoral Researcher >Lab of Dr. Robert Switzer > >Address: > >University of Illinois at Urbana-Champaign >Dept. of Biochemistry - 323 RAL >600 S. Mathews Ave. >Urbana, IL 61801 > >Phone : (217) 333-7098 >Fax : (217) 244-5858 >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l __________________________________ Chris Fields - Postdoctoral Researcher Lab of Dr. Robert Switzer Address: University of Illinois at Urbana-Champaign Dept. of Biochemistry - 323 RAL 600 S. Mathews Ave. 
Urbana, IL 61801 Phone : (217) 333-7098 Fax : (217) 244-5858 From khoueiry at ibdm.univ-mrs.fr Fri Jul 22 10:38:51 2005 From: khoueiry at ibdm.univ-mrs.fr (khoueiry) Date: Fri Jul 22 10:27:33 2005 Subject: [Bioperl-l] getting patterns consensus Message-ID: <1122043131.16107.6.camel@DavidLinux> Hi all, Let's say I have the following pattern: $PAT = A[AT]GAT[CT]A Is there a bioperl method or a fine/fast perl way to get all the sequences matching that pattern, i.e.: AAGATCA AAGATTA ATGATCA ATGATTA Thanks pierre From skirov at utk.edu Fri Jul 22 10:59:40 2005 From: skirov at utk.edu (Stefan Kirov) Date: Fri Jul 22 10:50:13 2005 Subject: [Bioperl-l] getting patterns consensus In-Reply-To: <1122043131.16107.6.camel@DavidLinux> References: <1122043131.16107.6.camel@DavidLinux> Message-ID: <42E109DC.9030906@utk.edu> Yes. Look at Bio::Tools::IUPAC. Create the Bio::Seq object, using IUPAC coding for ambiguous nucleotides (see the documentation), and then create the IUPAC object based on the seq one. Then use the next_seq method; it will give you exactly what you need. Stefan khoueiry wrote: >Hi all, > >Let's admit that I have the following pattern : > >$PAT = A[AT]GAT[CT]A > >Is there a bioperl method or a fine/fast perl way to get all the >consensus relative to that pattern: > (i.e) > >AAGATCA >AAGATTA >ATGATCA >ATGATTA > >Thanks > >pierre > > -- Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From brian_osborne at cognia.com Fri Jul 22 11:04:39 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Jul 22 10:55:09 2005 Subject: [Bioperl-l] getting patterns consensus In-Reply-To: <1122043131.16107.6.camel@DavidLinux> Message-ID: Pierre, I haven't taken a very close look but I believe you can do this with Bio::Tools::SeqPattern. There's an accompanying script, examples/tools/seq_pattern.pl. Brian O. On 7/22/05 10:38 AM, "khoueiry" wrote: > Hi all, > > Let's admit that I have the following pattern : > > $PAT = A[AT]GAT[CT]A > > Is there a bioperl method or a fine/fast perl way to get all the > consensus relative to that pattern: > (i.e) > > AAGATCA > AAGATTA > ATGATCA > ATGATTA > > Thanks > > pierre From cjfields at uiuc.edu Fri Jul 22 11:11:25 2005 From: cjfields at uiuc.edu (Chris Fields) Date: Fri Jul 22 11:01:56 2005 Subject: [Bioperl-l] PPM for bioperl-1.5? In-Reply-To: References: <6.2.1.2.2.20050721131602.03c8f370@express.cites.uiuc.edu> Message-ID: <6.2.1.2.2.20050722093124.01e1fe88@express.cites.uiuc.edu> I found the links for both the PPD and PPM in one of your older posts to the bioperl list (from Jan 2005, I think), downloaded the files to a local directory, and installed everything through a local repository (although I had a few slight hitches; see below). I also installed GBrowse directly from the GMOD site and it works as well. I also found GD-SVG and installed it, and I get SVG images with GBrowse just fine!
A few notes: I had to change the location of the tar archive in the PPD files for Bioperl and GD-SVG. For some reason PPM kept looking for them on the Bioperl website (where they are still MIA); I found the link in the PPD files for both Bioperl and GD-SVG and changed it from the path to the file on the site to the local file, reflecting its location in the local repository (this is done for every link in each file), so for GD-SVG: to Voila! Also, the repository for Kobes (http://theoryx5.uwinnipeg.ca/ppms) didn't work and stopped installation of Bioperl; it is very likely due to a problem with the latest XML-SAX module (and not Bioperl), which leaves out or doesn't initialize the file ParserDetails.ini, which may be causing some problems when parsing repositories (although I can't see why); I kept getting messages that ParserDetails.ini couldn't be found. I got around it by using the PPM repository for Randy Kobes that ActiveState lists (http://theoryx5.uwinnipeg.ca/cgi-bin/ppmserver?urn:/PPMServer58 for Perl 5.8, http://theoryx5.uwinnipeg.ca/cgi-bin/ppmserver?urn:/PPMServer for 5.6). Installation worked fine after that. I also reinstalled XML-SAX from the Kobes repository and everything now loads from the original repository listed under the INSTALL.WIN file for Bioperl. This may be something we want to remember if someone comes up with a similar issue; since your PPM package has XML-SAX listed as a dependency, it may install the ActiveState version (the bad one) vs. the Kobes version (the good one). Cheers Chris At 03:51 AM 7/22/2005, you wrote: >Would you be confident enough to install these locally from your own >computer if I give you the relevant files? It would involve saving the files >to a dir on your computer, adding that directory to your PPM repository list >and then installing as usual through PPM. Let me know and I can send you the >files (just over 2Mb), or make them available from a web server.
> >Nathan > > >-----Original Message----- >From: bioperl-l-bounces@portal.open-bio.org >[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Chris Fields >Sent: 21 July 2005 19:29 >To: bioperl-l@bioperl.org >Subject: [Bioperl-l] PPM for bioperl-1.5? > >I noticed the PPM for the latest developer bioperl (v 1.5) isn't found in >http://bioperl.org/DIST. I saw that Nathan created one a while back; did >anyone transfer it over to the above directory? From hartzell at kestrel.alerce.com Fri Jul 22 11:35:47 2005 From: hartzell at kestrel.alerce.com (George Hartzell) Date: Fri Jul 22 11:27:45 2005 Subject: [Bioperl-l] "Be forgiving in what you accept" andBio::Tools::GuessSeqFormat In-Reply-To: References: Message-ID: <17121.4691.792887.4566@satchel.alerce.com> Nathan Haigh writes: > May I ask what software is producing this FASTA format file which has a > space immediately after the '>' in the description line? I don't know what created it. Wouldn't surprise me to find out it was created in Microsoft Word.... It was given to me as an example input file/test case. > Although I am not aware of a formal description of FASTA format, I have > never seen any files with a space immediately after '>'.
Although I don't > object to relaxing this a little in bioperl, you may find that these files > are not compatible with other software. Yeah, there is that. On the other hand, by that logic we should make the equivalent change and have the Bio::SeqIO object fail on them even when it's told that they're Fasta (e.g. by -format or by guessing based on filename). I was just frustrated when stuff worked up until the moment that I uploaded the file into a tool via the web (at which point it ended up in an oddly named file and the guessing heuristic broke). I'd vote for relaxing the constraint, but, hey.... g. From n.haigh at sheffield.ac.uk Fri Jul 22 12:15:34 2005 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Fri Jul 22 12:07:29 2005 Subject: [Bioperl-l] "Be forgiving in what you accept"andBio::Tools::GuessSeqFormat In-Reply-To: <17121.4691.792887.4566@satchel.alerce.com> Message-ID: If you specified that the file was FASTA, I'm not sure how the parser would work for pulling out primary_id, display_id, etc. for the sequence; have you checked that the parser is flexible enough to pull these out of a sequence description that has a space after the '>'? It may be better to strip out these spaces prior to using them in bioperl? But to be honest I wouldn't be bothered either way! :o) Nathan -----Original Message----- From: George Hartzell [mailto:hartzell@kestrel.alerce.com] Sent: 22 July 2005 16:36 To: n.haigh@sheffield.ac.uk Cc: 'Brian Osborne'; 'bioperl-l' Subject: RE: [Bioperl-l] "Be forgiving in what you accept"andBio::Tools::GuessSeqFormat Nathan Haigh writes: > May I ask what software is producing this FASTA format file which has a > space immediately after the '>' in the description line? I don't know what created it. Wouldn't surprise me to find out it was created in Microsoft Word.... It was given to me as an example input file/test case.
> Although I am not aware of a formal description of FASTA format, I have > never seen any files with a space immediately after '>'. Although I don't > object to relaxing this a little in bioperl, you may find that these files > are not compatible with other software. Yeah, there is that. On the other hand, by that logic we should make the equivalent change and have the Bio::SeqIO object fail on them even when it's told that they're Fasta (e.g. by -format or by guessing based on filename). I was just frustrated when stuff worked up until the moment that I uploaded the file into a tool via the web (at which point it ended up in an oddly named file and the guessing heuristic broke). I'd vote for relaxing the constraint, but, hey.... g. From khoueiry at ibdm.univ-mrs.fr Fri Jul 22 12:34:36 2005 From: khoueiry at ibdm.univ-mrs.fr (khoueiry) Date: Fri Jul 22 12:23:45 2005 Subject: [Bioperl-l] getting patterns consensus In-Reply-To: <42E109DC.9030906@utk.edu> References: <1122043131.16107.6.camel@DavidLinux> <42E109DC.9030906@utk.edu> Message-ID: <1122050076.5577.2.camel@DavidLinux> Thanks, Kirov, your method did exactly what I needed. Brian, no, I don't think Bio::Tools::SeqPattern solves the problem, unless I'm missing something. Thanks again Pierre On Friday 22 July 2005 at 10:59 -0400, Stefan Kirov wrote: > Yes. Look at Bio::Tools::IUPAC. Create the Bio::Seq object, using IUPAC > coding for ambiguous nucleotides (see the documentation) and then create > the IUPAC object based on the seq one. Then use next_seq method- it will > give you exactly what you need.
> Stefan > > khoueiry wrote: > > >Hi all, > > > >Let's admit that I have the following pattern : > > > >$PAT = A[AT]GAT[CT]A > > > >Is there a bioperl method or a fine/fast perl way to get all the > >consensus relative to that pattern: > > (i.e) > > > >AAGATCA > >AAGATTA > >ATGATCA > >ATGATTA > > > >Thanks > > > >pierre > > From victor.ruotti at gmail.com Fri Jul 22 13:16:22 2005 From: victor.ruotti at gmail.com (Victor) Date: Fri Jul 22 13:07:31 2005 Subject: [Bioperl-l] The Scriptome now has mailing lists In-Reply-To: <339D68B133EAD311971E009027DC47970321AF08@montecarlo.cgr.harvard.edu> References: <339D68B133EAD311971E009027DC47970321AF08@montecarlo.cgr.harvard.edu> Message-ID: <36d7e55505072210166a3f666e@mail.gmail.com> Hi Lincoln: We are testing your bp_fast_load_gff.pl program on Solaris 10. The script was downloaded from the bioperl-live CVS repository. It won't get past creating the pipe files:
./bp_fast_load_gff.pl -d human_test test.gff
loading normalized group, type and attribute information...ok
creating load file /export/home/victor/fastload/fdata.16445...ok
opening load file for writing...
Here is the line of code where I think it is having problems:
$FH{$_} = IO::File->new($file,'>') or die $_,": $!";
Is this because of the pipes used to speed things up? Thanks, Victor From skirov at utk.edu Fri Jul 22 13:39:07 2005 From: skirov at utk.edu (Stefan Kirov) Date: Fri Jul 22 13:34:24 2005 Subject: [Bioperl-l] getting patterns consensus In-Reply-To: <1122050076.5577.2.camel@DavidLinux> References: <1122043131.16107.6.camel@DavidLinux> <42E109DC.9030906@utk.edu> <1122050076.5577.2.camel@DavidLinux> Message-ID: <42E12F3B.6020806@utk.edu> Pierre, Thanks, but actually the module was written by Aaron Mackay, so I guess your gratitude goes to him :-) .
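For reference, Stefan's Bio::Tools::IUPAC route expects the pattern written with IUPAC ambiguity codes: [AT] is W and [CT] is Y, so A[AT]GAT[CT]A becomes AWGATYA. If you would rather expand the bracket notation directly, a minimal pure-Perl sketch follows. `expand_pattern` is a hypothetical helper, not a BioPerl method, and it assumes the pattern contains only literal bases and [...] classes (no quantifiers, ranges or alternation).

```perl
use strict;
use warnings;

# Hypothetical helper: expand a bracket pattern such as 'A[AT]GAT[CT]A'
# into every concrete sequence it describes. Only literal characters and
# [...] classes are understood; anything else makes it die.
sub expand_pattern {
    my ($pattern) = @_;
    my @positions;
    # Walk the pattern left to right: each step is a [...] class or one literal.
    while ($pattern =~ /\G(?:\[([^\]]+)\]|([^\[\]]))/gc) {
        push @positions, defined $1 ? [ split //, $1 ] : [ $2 ];
    }
    # /c keeps pos() after the final failed match, so we can check coverage.
    my $consumed = pos($pattern) || 0;
    die "cannot parse pattern '$pattern' at offset $consumed\n"
        unless $consumed == length $pattern;

    # Cartesian product, built one position at a time.
    my @seqs = ('');
    for my $choices (@positions) {
        @seqs = map { my $prefix = $_; map { $prefix . $_ } @$choices } @seqs;
    }
    return @seqs;
}

print "$_\n" for expand_pattern('A[AT]GAT[CT]A');
```

This prints the four sequences in the same order as Pierre's list. Because it builds the full product up front, output grows multiplicatively with each class (ten two-way classes already give 1024 sequences), so very ambiguous patterns are better handled one sequence at a time.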
Stefan khoueiry wrote: > Thanks, > > Kirov, Your method did exactly what I need. brian, no I don't think > that Bio::Tools::SeqPattern resolve the prob unless something's > missing me. > > Thanks again > > Pierre > > On Friday 22 July 2005 at 10:59 -0400, Stefan Kirov wrote: > >>Yes. Look at Bio::Tools::IUPAC. Create the Bio::Seq object, using IUPAC >>coding for ambiguous nucleotides (see the documentation) and then create >>the IUPAC object based on the seq one. Then use next_seq method- it will >>give you exactly what you need. >>Stefan >> >>khoueiry wrote: >> >>>Hi all, >>> >>>Let's admit that I have the following pattern : >>> >>>$PAT = A[AT]GAT[CT]A >>> >>>Is there a bioperl method or a fine/fast perl way to get all the >>>consensus relative to that pattern: >>> (i.e) >>> >>>AAGATCA >>>AAGATTA >>>ATGATCA >>>ATGATTA >>> >>>Thanks >>> >>>pierre From Steve_Chervitz at affymetrix.com Fri Jul 22 14:22:26 2005 From: Steve_Chervitz at affymetrix.com (Chervitz, Steve) Date: Fri Jul 22 14:14:20 2005 Subject: [Bioperl-l] "Be forgiving in what you accept" and Bio::Tools::GuessSeqFormat In-Reply-To: <200507211934.j6LJYJO3007600@satchel.alerce.com> Message-ID: George Hartzell : > Is there any reason not to extend the regexp a bit and relax that > constraint (since everything else seems to cope with it)? Seems reasonable. I've seen fasta files where there was no id at all, just a '>' by itself on a line followed by a line of sequence.
Perhaps the sequence format guesser should accept as fasta any input with a line beginning with '>'? But maybe this is too radical... > But, if you happen to have the sequence in a file with a funny name > (e.g. /var/tmp/apreq23ZHis [aka a form upload]) then it fails. It > can't guess based on the filename and the file content test is strict > and wants to see the header line without the whitespace (">ape"). It would be good to add this example to the SeqIO test suite. > > There's a great "old" Internet maxim, "Be forgiving in what you accept > and strict in what you send". Here's an interesting discussion on this philosophy: http://www.artima.com/forums/flat.jsp?forum=106&thread=4204 The FCC had this notion long before the internet. It's part of the specs on many off-the-shelf electronic devices: CFR part 15 "Devices must not interfere with licensed services and must accept interference from licensed services." I found a recent presentation on the FCC site showing results of a survey about whether part 15 stifles innovation (10/14 respondents said no, and 9/5 said more stringent regulations might even permit *more* innovation): http://www.fcc.gov/oet/tac/Part_15_Survey_12_4_02.ppt Flexibility in input acceptance may be an issue in Bioperl to the extent that it leads to complicated code that is difficult to maintain or for others to grok. But in this particular SeqIO case, flexibility seems warranted. I think it should be up to a specific application to wield authority over what it accepts and produces for fasta files. Since bioperl is a library used by multiple apps, high flexibility in acceptance seems like a bonus.
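To see what the one-character change in George's patch does, here is a standalone sketch of the per-line test from the diff, with a flag to switch between the original /^>\w/ header check and the proposed /^>\s*\w/. The sub name `looks_like_fasta_line` and the `$relaxed` flag are illustrative assumptions; the real check lives inside Bio::Tools::GuessSeqFormat.

```perl
use strict;
use warnings;

# Illustrative copy of the per-line test shown in the diff: a line counts as
# FASTA-like if, past line 1, it is a run of sequence letters, or if it is a
# header line. $relaxed switches between the original and the patched regexp.
sub looks_like_fasta_line {
    my ($line, $lineno, $relaxed) = @_;
    my $header = $relaxed ? qr/^>\s*\w/ : qr/^>\w/;
    return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) || $line =~ $header);
}

# Compare both variants on a normal header, a spaced header and a bare '>'.
for my $hdr ('>ape', '> ape', '>') {
    printf "%-6s strict=%d relaxed=%d\n", $hdr,
        looks_like_fasta_line($hdr, 1, 0) ? 1 : 0,
        looks_like_fasta_line($hdr, 1, 1) ? 1 : 0;
}
```

Note that even the relaxed pattern still requires a word character after the '>', so a bare '>' header line like the ones Steve mentions would still fail the content test.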
Steve From brian_osborne at cognia.com Fri Jul 22 16:17:44 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Jul 22 16:15:04 2005 Subject: [Bioperl-l] 1.5 for Windows In-Reply-To: <200507211934.j6LJYJO3007600@satchel.alerce.com> Message-ID: bioperl-l, The following files, provided by Nathan Haigh, have been uploaded to bioperl.org/DIST: GD-SVG-0.25-ppm.tar.gz GD-SVG.ppd bioperl-1.5-ppm.tar.gz Bioperl-1.5.ppd The MD5's have also been added to SIGNATURES.md5. Brian O. From rvosa at sfu.ca Fri Jul 22 19:54:09 2005 From: rvosa at sfu.ca (Rutger Vos) Date: Fri Jul 22 19:44:54 2005 Subject: [Bioperl-l] who, if anyone, "owns" Bio:: Message-ID: <42E18721.10000@sfu.ca> Dear bioperlers, for a while now, I've been working on a set of perl modules for phylogenetic analysis. Obviously, I would like other people to use these too (and perhaps contribute to development as well), and so I wish to upload them to the CPAN. The working title for the root name space has been "Phylo::" but I'd like to change this because - rightly - the perl community is hesitant towards a proliferation of top level name spaces. Also, I might wish to incorporate my work into the bioperl release (that is, if the core developers agree) because, well, fragmentation helps no-one. I am however on the fence on this second issue. To help me understand these issues I direct the following questions to the folks in the know: * is the Bio:: namespace reserved for BioPerl proper or can other people - if appropriate - use it as their top level name space, in the same way as, say, WWW:: or CGI::? * Do modules that use the Bio:: namespace have to be part of the bioperl release? I for one think bioperl is wonderful, but it is mostly aimed at molecular biologists, and so a phylogeneticist or evolutionary biologist might not want to install the whole thing just to use peripheral functionality. After all, BioPerl is (with all due respect) starting to grow into a fairly monolithic install. 
Basically, I'm curious whether it would be okay on your part if I submitted my work under the Bio:: namespace, but as separate installs, to the Comprehensive Perl Archive Network. Like I said, I am at this point not entirely of one mind w.r.t. merging/contributing to bioperl proper - to be honest, I fear that might be a hairy proposition what with other modules possibly making assumptions about underlying data structures in the packages rather than following the advertised API and what not, but I'm certainly interested in discussing that also. Looking forward to your replies! Best wishes, Rutger -- ++++++++++++++++++++++++++++++++++++++++++++ Rutger Vos, PhD. candidate Department of Biological Sciences Simon Fraser University 8888 University Drive Burnaby, BC, V5A1S6 Phone: 604-291-5625 Fax: 604-291-3496 Personal site: http://www.sfu.ca/~rvosa FAB* lab: http://www.sfu.ca/~fabstar ++++++++++++++++++++++++++++++++++++++++++++ From senger at ebi.ac.uk Sat Jul 23 12:07:16 2005 From: senger at ebi.ac.uk (Martin Senger) Date: Sat Jul 23 11:57:42 2005 Subject: [Bioperl-l] Bio::Biblio - small but important changes Message-ID: Hi, I have slightly changed the Bio::Biblio modules that use SOAP to get to the MEDLINE repository (at EBI). There are two changes, and neither of them should have any impact on your code (because the interface of Bio::Biblio has not been changed). Therefore, it should be enough just to update your local copy of these modules. Anyway, the changes are: 1) The default location of the MEDLINE/EBI Web Service changed. The new one is http://www.ebi.ac.uk/openbqs/services/MedlineSRS. 2) The Web Service API (which is hidden from most of you by Bio::Biblio's API) has changed in order to comply with the WSDL specification (method overloading has been removed, which is why some method names changed there). Of course, I will be happy to hear about any problems you may (hopefully not) run into after these changes.
Regards, Martin -- Martin Senger EMBL Outstation - Hinxton Senger@EBI.ac.uk European Bioinformatics Institute Phone: (+44) 1223 494636 Wellcome Trust Genome Campus (Switchboard: 494444) Hinxton Fax : (+44) 1223 494468 Cambridge CB10 1SD United Kingdom http://industry.ebi.ac.uk/~senger From hartzell at kestrel.alerce.com Sat Jul 23 16:45:26 2005 From: hartzell at kestrel.alerce.com (George Hartzell) Date: Sat Jul 23 16:36:17 2005 Subject: [Bioperl-l] Stop him before he codes again! [almost-multiple-alignment tool?] Message-ID: <17122.44134.422603.381087@satchel.alerce.com> Yep, there's nothing scarier than some people at the keyboard.... I have: 1 putative sequence (DNA, not too big 1-5kb). many (1-10) empirically determined sequences (DNA) that should be fairly similar to the putative sequence. It's trivial to: produce a pairwise alignment of each empirical sequence against the putative sequence. I'd ultimately like to produce (actually, I don't have much choice...): an almost multiple-alignment-like figure, with the putative sequence (e.g.) along the bottom and the empirical sequences piled up above it, gapped in emp. and put. sequences where necessary. It's pretty much similar to piling up a bunch of EST's/cDNA vs. the corresponding genomic, with a simpler gapping model (no splicing, etc...). Ideally I'd like to just wave my hands and have it all work :). I'll write code if I have to.... Since I'm not really looking for a multiple alignment, I'd like to avoid the cost of actually computing/approximating one. I'd like to just do the pairwise alignments then shoehorn the results into an existing bioperl multiple-alignment representation and play with it from there. I'd love comments (and will happily pay in beer/coffee next time you're in Berkeley) about: - existing tools that do just this (or even close) - what's the cleanest bioperl object to shoehorn it into? SimpleAlign? Align? 
- given one of those objects, where should I start digging for pretty output routines? - if there's nothing particularly useful, any suggestions on how to structure things so that I can deposit them into BioPerl? - should I just punt and run something (e.g. clustalw, pileup) against the putative and all the empirical and be done with it? Tool recommendations? Thanks, g. From wackattack at gmail.com Sat Jul 23 23:53:18 2005 From: wackattack at gmail.com (Wacki) Date: Sun Jul 24 14:52:37 2005 Subject: [Bioperl-l] Accessing database settings Message-ID: <2b8a4eeb05072320531f2349b8@mail.gmail.com> I installed bioperl but didn't use the right password/username combo. How do I change it? Thanks for the help. From amackey at pcbi.upenn.edu Mon Jul 25 08:25:50 2005 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Mon Jul 25 08:19:01 2005 Subject: [Bioperl-l] Bioperl-ext, Staden and x86_64 In-Reply-To: References: Message-ID: <408F0127-3C94-4DD7-9B62-B7A6EE369B39@pcbi.upenn.edu> I haven't looked, but if it's using an autoconf-like build system (i.e. you first type "configure" then "make"), you may need to add "--enable-shared" to the "configure" invocation. -Aaron On Jul 19, 2005, at 12:11 AM, Andrew.Mather@dpi.vic.gov.au wrote: > I looked around and found Version 1.9.0 on > Sourceforge and this appears to compile cleanly, however it doesn't > look > like it's left any .so files in /usr/local/lib (or anywhere else > for that > matter). > From chiromatzo at gmail.com Mon Jul 25 09:20:28 2005 From: chiromatzo at gmail.com (Alynne Chiromatzo) Date: Mon Jul 25 09:13:10 2005 Subject: [Bioperl-l] How can I acess the alignmet score of the axt file? Message-ID: <5865004505072506204715b019@mail.gmail.com> Hi! I'm working with axt files. I need to know how I can access the alignment score from the axt file. I've tried to use $hsp->raw_score but it didn't work. Can anyone help me? Thanks. Alynne Oya.
From jbedell at oriongenomics.com Mon Jul 25 10:04:26 2005 From: jbedell at oriongenomics.com (Joseph Bedell) Date: Mon Jul 25 09:55:08 2005 Subject: [Bioperl-l] Stop him before he codes again![almost-multiple-alignment tool?] Message-ID: <434AF352F9D03C4C896782B8CC78BC7687FEE7@VADER.oriongenomics.com> Hey George, Have you looked at the display option of -m 1 in NCBI BLAST? That gives a multiple sequence alignment-like output. Joey ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Joseph A Bedell, Ph.D. office: 314-615-6979 Director, Bioinformatics fax: 314-615-6975 Orion Genomics cell: 314-518-1343 4041 Forest Park Ave St. Louis, MO 63108 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >-----Original Message----- >From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- >bounces@portal.open-bio.org] On Behalf Of George Hartzell >Sent: Saturday, July 23, 2005 3:45 PM >To: BioPerl MailingList >Subject: [Bioperl-l] Stop him before he codes again![almost-multiple- >alignment tool?] > > >Yep, there's nothing scarier than some people at the keyboard.... > >I have: > > 1 putative sequence (DNA, not too big 1-5kb). > many (1-10) empirically determined sequences (DNA) that should be > fairly similar to the putative sequence. > >It's trivial to: > > produce a pairwise alignment of each empirical sequence against the > putative sequence. > >I'd ultimately like to produce (actually, I don't have much choice...): > > an almost multiple-alignment-like figure, with the putative sequence > (e.g.) along the bottom and the empirical sequences piled up above > it, gapped in emp. and put. sequences where necessary. > >It's pretty much similar to piling up a bunch of EST's/cDNA vs. the >corresponding genomic, with a simpler gapping model (no splicing, >etc...). > >Ideally I'd like to just wave my hands and have it all work :). I'll >write code if I have to.... > >Since I'm not really looking for a multiple alignment, I'd like to >avoid the cost of actually computing/approximating one. 
> >I'd like to just do the pairwise alignments then shoehorn the results >into an existing bioperl multiple-alignment representation and play >with it from there. > >I'd love comments (and will happily pay in beer/coffee next time >you're in Berkeley) about: > > - existing tools that do just this (or even close) > > - what's the cleanest bioperl object to shoehorn it into? > SimpleAlign? Align? > > - given one of those objects, where should I start digging for > pretty output routines? > > - if there's nothing particularly useful, any suggestions on how > to structure things so that I can deposit them into BioPerl? > > - should I just punt and run something (e.g. clustalw, pileup) > against the putative and all the empirical and be done with it? > Tool recommendations? > >Thanks, > >g. From brian_osborne at cognia.com Mon Jul 25 10:11:16 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Mon Jul 25 10:01:44 2005 Subject: [Bioperl-l] "Be forgiving in what you accept" and Bio::Tools::GuessSeqFormat In-Reply-To: <200507211934.j6LJYJO3007600@satchel.alerce.com> Message-ID: George, Done. Brian O. On 7/21/05 3:34 PM, "George Hartzell" wrote: > > There's a great "old" Internet maxim, "Be forgiving in what you accept > and strict in what you send". > > The Bio::SeqIO modules seem to be able to cope with "fasta" formatted > files that have a space separating the ">" from the rest of the line > (e.g. "> ape") if a) you explicitly specify the format or b) if you > have the sequence in a file that ends in "fa" (or generally matches > the list of patterns that correspond to fasta file names). > > But, if you happen to have the sequence in a file with a funny name > (e.g. /var/tmp/apreq23ZHis [aka a form upload]) then it fails.
It > can't guess based on the filename and the file content test is strict > and wants to see the header line without the whitespace (">ape"). > > Is there any reason not to extend the regexp a bit and relax that > constraint (since everything else seems to cope with it)? > > Something like this:
>
> *** /usr/local/lib/perl5/site_perl/5.8.6/Bio/Tools/GuessSeqFormat.pm.orig Thu Jul 21 12:30:55 2005
> --- /usr/local/lib/perl5/site_perl/5.8.6/Bio/Tools/GuessSeqFormat.pm Thu Jul 21 12:31:45 2005
> ***************
> *** 591,595 ****
>       my ($line, $lineno) = (shift, shift);
>       return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
> !             $line =~ /^>\w/);
>   }
>
> --- 591,595 ----
>       my ($line, $lineno) = (shift, shift);
>       return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
> !             $line =~ /^>\s*\w/);
>   }
>
> g.
From cain at cshl.edu Mon Jul 25 10:34:09 2005 From: cain at cshl.edu (Scott Cain) Date: Mon Jul 25 10:25:18 2005 Subject: [Bioperl-l] Accessing database settings In-Reply-To: <2b8a4eeb05072320531f2349b8@mail.gmail.com> References: <2b8a4eeb05072320531f2349b8@mail.gmail.com> Message-ID: <1122302049.3293.5.camel@localhost.localdomain> Hello, I am guessing you are writing about accessing a GFF database, since you are also posting questions to the gbrowse mailing list. Also, from your posts there, I'm guessing that you have figured it out. If I am not guessing correctly, please re-ask the question and make it a little clearer what you mean. Scott On Sat, 2005-07-23 at 22:53 -0500, Wacki wrote: > I installed bioperl but didn't use the right password/username combo. How do > I change it? Thanks for the help.
-- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From jason.stajich at duke.edu Mon Jul 25 10:44:54 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Jul 25 10:35:17 2005 Subject: [Bioperl-l] Accessing database settings In-Reply-To: <1122302049.3293.5.camel@localhost.localdomain> References: <2b8a4eeb05072320531f2349b8@mail.gmail.com> <1122302049.3293.5.camel@localhost.localdomain> Message-ID: <3a974032dc21830d0456985486249ef8@duke.edu> I assume you just mean for setting up the tests? See t/BioDBGFF.t to see where it reads the conf from (t/data/dbfa). -jason On Jul 25, 2005, at 7:34 AM, Scott Cain wrote: > Hello, > > I am guessing you are writing about accessing a GFF database, since you > are also posting questions to the gbrowse mailing list. Also, from > your > posts there, I'm guessing that you have figured it out. If I am not > guessing correctly, please re-ask the question and make it a little > more > clear what you mean. > > Scott > > > On Sat, 2005-07-23 at 22:53 -0500, Wacki wrote: >> I installed bioperl but didn't use the right password/username combo. >> How do >> I change it? Thanks for the help. >> > -- > ----------------------------------------------------------------------- > - > Scott Cain, Ph. D.
> cain@cshl.edu > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > -- Jason Stajich http://www.duke.edu/~jes12 jason.stajich -at- duke.edu From avilella at gmail.com Mon Jul 25 11:06:53 2005 From: avilella at gmail.com (Albert Vilella) Date: Mon Jul 25 10:57:35 2005 Subject: [Bioperl-l] How can I acess the alignmet score of the axt file? In-Reply-To: <5865004505072506204715b019@mail.gmail.com> References: <5865004505072506204715b019@mail.gmail.com> Message-ID: <1122304013.8201.5.camel@localhost.localdomain> On Mon 25 Jul 2005 at 10:20 -0300, Alynne Chiromatzo wrote: > Hi! I'm working with axt files. I need to know how can I access the > alignment score from the axt file. I've tried to use the > $hsp->raw_score but it isn't worked. Anyone can help me? Looking at bioperl-live/t/SearchIO.t, it seems that raw_score is for a $hit, whereas the method for an $hsp would be "score":
while( my $hit = $r->next_hit ) {
    my $d = shift @dcompare;
    ok($hit->name, shift @$d);
    ok($hit->length, shift @$d);
    ok($hit->raw_score, shift @$d);
    ok($hit->significance, shift @$d);
    my $hsp = $hit->next_hsp;
    # ... (excerpt from t/SearchIO.t)
}
so maybe $hsp->score? Hope it helps, Albert. > > Thanks. > Alynne Oya.
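If `$hsp->score` does not give you the number, note that in the UCSC axt format the alignment score is simply the last field of each block's summary line, so it can also be read without SearchIO. The sketch below assumes that layout (summary line, two aligned-sequence lines, blank line); `read_axt_scores` is a hypothetical helper, not part of BioPerl, and the sample block is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: read UCSC-style axt blocks from a filehandle and
# return one hashref per alignment. Each block is assumed to be
#   <num> <tChrom> <tStart> <tEnd> <qChrom> <qStart> <qEnd> <strand> <score>
#   <aligned target sequence>
#   <aligned query sequence>
#   <blank line>
sub read_axt_scores {
    my ($fh) = @_;
    my @fields = qw(num t_chrom t_start t_end q_chrom q_start q_end strand score);
    my @blocks;
    while (my $line = <$fh>) {
        next if $line =~ /^#/ || $line !~ /\S/;    # skip comments and blank lines
        chomp $line;
        my @f = split ' ', $line;
        die "unexpected axt summary line: $line\n" unless @f == @fields;
        my %block;
        @block{@fields} = @f;
        # Skip the two aligned-sequence lines that belong to this block.
        for (1 .. 2) {
            defined(my $seq = <$fh>) or die "truncated axt block: $line\n";
        }
        push @blocks, \%block;
    }
    return @blocks;
}

# Usage with a made-up, in-memory axt block:
my $axt = "0 chr1 101 108 chr2 201 208 + 42\nACGTACGT\nACGTTCGT\n\n";
open my $fh, '<', \$axt or die "cannot open in-memory file: $!";
print "block $_->{num}: score $_->{score}\n" for read_axt_scores($fh);
```

Only the summary fields are kept here; if you also need the aligned sequences, store the two lines read in the inner loop instead of discarding them.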
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From taerwin at tpg.com.au Mon Jul 25 19:30:05 2005 From: taerwin at tpg.com.au (Tim Erwin) Date: Mon Jul 25 19:27:04 2005 Subject: [Bioperl-l] Blast : Bus Error In-Reply-To: <7c7aa474050721025839062a98@mail.gmail.com> References: <7c7aa474050721014961ce6a6f@mail.gmail.com> <2fb209dd0507210212672ea750@mail.gmail.com> <20050721094219.GA14638@ebi.ac.uk> <7c7aa474050721025839062a98@mail.gmail.com> Message-ID: <1122334205.1779.101.camel@bacp4> I don't think it would be the Linux installation, as that would be set up on different partitions. It could be faulty memory; you can test it with: http://www.memtestosx.org/ The only other thing I can think of is: are you using the right binary? (Did you download a Linux binary?) Regards, Tim On Thu, 2005-07-21 at 11:58 +0200, Ferdinand Marlétaz wrote: > Well, I can exclude memory problems (2 GB RAM on these machines) and > database size problems (the error happens with both large and small > DBs, e.g. 50 MB). On top of that, I've already run identical blast searches on the two > computers, and the other computer runs very > well... > I don't think it's a hardware problem either, because this failing > computer has run similar searches in the past without problems... So > something may have changed in the configuration that makes the blast > process fail! I just know that somebody tried to install Linux > on this computer and didn't manage to finish the installation. Could that be a > source of my current problems? > > What do you all think about that? > > Thanks > > Ferdi > > > 2005/7/21, Andreas Kahari : > > [not to the list] > > > > Hi guys, > > > > There could also be a problem with a faulty memory module... If > > the error is not consistently reproducible, then this is one > > possible cause. > > > > Running out of memory should not produce a Bus Error.
It might > > produce a Segmentation Fault if the program doesn't care that > > the memory allocation failed, but not a Bus Error (as far as I > > know, but I don't run OS X here). > > > > A way to diagnose this is to run exactly the same set-up on two > > identical machines until one of them causes the error more than > > once. If the other machine seems to run ok then it is very > > possible that there is a hardware fault on the first machine (or > > some important system configuration setting is different without > > you knowing it). > > > > Regards, > > Andreas > > > > On Thu, Jul 21, 2005 at 11:12:26AM +0200, Laurent DOUCHY wrote: > > > Hello, > > > This problem can happen for several reasons: > > > your RAM is not sufficient and/or you are working against a db like > > > nt that is too big for the combination PPC/blast/db. First verify your RAM > > > (500 MB is not enough); secondly, try to work on a part > > > of nt when you can; try the blast optimised by the Bioteam... > > > Cordially > > > > > > LN > > > > > > 2005/7/21, Ferdinand Marlétaz : > > > > Hi, > > > > > > > > I know my current problem is only loosely related to bioperl, but maybe > > > > somebody has already encountered it, so it's worth a try... > > > > > > > > I'm trying to run blast (tblastx) on a G5 Power Mac (OS: OS X 10.4 > > > > Tiger, but the same happened with 10.3 Panther); it starts perfectly > > > > normally, but after some time it stops and displays either 'bus error' > > > > or 'segmentation fault'... I'm quite surprised because I've never had > > > > this problem on a second, identical G5 in my lab. I've tried changing the > > > > blast version from 2.10 to 2.11... but it didn't solve the problem. > > > > I verified that it's not related to my databases by reformatting them > > > > from fasta... > > > > > > > > So, I don't see where the problem can come from.
Has anybody > > > > encountered such problems or errors and found a solution, or have an idea? > > > > I'd like to avoid reinstalling the system on this machine > > > > because of the time it would cost... > > [cut] > > > > -- > > Andreas Kähäri > > > > EMBL-EBI/ensembl > > www.ensembl.org > > > > 1024D/C2E163CB F4C4 A41A 665B 448A 3FA9 6AEA 12E3 39DA C2E1 63CB > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From Andrew.Mather at dpi.vic.gov.au Mon Jul 25 21:24:51 2005 From: Andrew.Mather at dpi.vic.gov.au (Andrew.Mather@dpi.vic.gov.au) Date: Mon Jul 25 21:16:58 2005 Subject: [Bioperl-l] Bioperl-ext, Staden and x86_64 Message-ID: Hi Aaron, Yes, it does use the autoconf build system, but unfortunately, --enable-shared (and --enable-shared=yes) made no observable difference. This one's got me stumped. The older version compiles and installs without problems on the x86 machines, but the Opterons don't seem to want to cooperate. Andrew On 25/07/2005 10:25 PM, amackey@pcbi.upenn.edu wrote: To: Andrew.Mather@dpi.vic.gov.au cc: bioperl-l@portal.open-bio.org Subject: Re: [Bioperl-l] Bioperl-ext, Staden and x86_64 I haven't looked, but if it's using an autoconf-like build system (i.e. you first type "configure" then "make"), you may need to add "--enable-shared" to the "configure" invocation.
-Aaron On Jul 19, 2005, at 12:11 AM, Andrew.Mather@dpi.vic.gov.au wrote: > I looked around and found Version 1.9.0 on > SourceForge and this appears to compile cleanly; however, it doesn't look > like it's left any .so files in /usr/local/lib (or anywhere else for that > matter). > From n.haigh at sheffield.ac.uk Tue Jul 26 09:02:49 2005 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Tue Jul 26 08:53:03 2005 Subject: [Bioperl-l] getting pubmed id from genbank files Message-ID: I want to be able to supply a list of GI's, retrieve the genbank files and parse out the pubmed id's. I know I can do the first steps of retrieving the genbank files directly, but how do I get the pubmed id's? I've been playing around with things and haven't yet found out if this can be done. Cheers, Nathan ---------------------------------- Nathan Haigh Bioinformatics PostDoctoral Research Associate Room B2 211 Department of Animal and Plant Sciences University of Sheffield Western Bank Sheffield S10 2TN Tel: +44 (0)114 22 20112 Mob: +44 (0)7742 533 569 Fax: +44 (0)114 22 20002 From jason.stajich at duke.edu Tue Jul 26 10:28:15 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Jul 26 10:19:08 2005 Subject: [Bioperl-l] getting pubmed id from genbank files Message-ID: Here is part of the synopsis in Bio::Seq: foreach my $ref ( $ann->get_Annotations('reference') ) { print "Reference ",$ref->title,"\n"; } so do $ref->pubmed instead of $ref->title. -jason > On Jul 26, 2005, at 6:02 AM, Nathan Haigh wrote: > >> I want to be able to supply a list of GI's, retrieve the genbank >> files and >> parse out the pubmed id's. >> >> >> >> I know I can do the first steps of retrieving the genbank files >> directly, >> but how do I get the pubmed id's? I've been playing around with >> things and >> haven't yet found out if this can be done.
>> >> >> >> Cheers, >> >> Nathan >> >> >> >> ---------------------------------- >> >> Nathan Haigh >> >> Bioinformatics PostDoctoral Research Associate >> >> >> >> Room B2 211 >> >> Department of Animal and Plant Sciences >> >> University of Sheffield >> >> Western Bank >> >> Sheffield >> >> S10 2TN >> >> >> >> Tel: +44 (0)114 22 20112 >> >> Mob: +44 (0)7742 533 569 >> >> Fax: +44 (0)114 22 20002 >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > -- > Jason Stajich > http://www.duke.edu/~jes12 > jason.stajich -at- duke.edu > > -- Jason Stajich http://www.duke.edu/~jes12 jason.stajich -at- duke.edu From wackattack at gmail.com Mon Jul 25 17:28:01 2005 From: wackattack at gmail.com (Wacki) Date: Tue Jul 26 10:23:25 2005 Subject: [Bioperl-l] Accessing database settings In-Reply-To: <1122302049.3293.5.camel@localhost.localdomain> References: <2b8a4eeb05072320531f2349b8@mail.gmail.com> <1122302049.3293.5.camel@localhost.localdomain> Message-ID: <2b8a4eeb050725142819660772@mail.gmail.com> When I run: bp_load_gff.pl -c -d volvox volvox_all.fa volvox_all.gff I get: ------------------------------------------------------------------------------------------------------------------------------------------------ DBI connect('volvox','',...) 
failed: Access denied for user ''@'localhost' to database 'volvox' at /usr/lib/perl5/site_perl/5.8.5/Bio/DB/GFF/Adaptor/dbi/caching_handle.pm line 139 ------------- EXCEPTION ------------- MSG: Can't connect to database: Access denied for user ''@'localhost' to database 'volvox' STACK Bio::DB::GFF::Adaptor::dbi::caching_handle::new /usr/lib/perl5/site_perl/5.8.5/Bio/DB/GFF/Adaptor/dbi/caching_handle.pm:89 STACK Bio::DB::GFF::Adaptor::dbi::new /usr/lib/perl5/site_perl/5.8.5/Bio/DB/GFF/Adaptor/dbi.pm:93 STACK Bio::DB::GFF::Adaptor::dbi::mysql::new /usr/lib/perl5/site_perl/5.8.5/Bio/DB/GFF/Adaptor/dbi/mysql.pm:270 STACK Bio::DB::GFF::new /usr/lib/perl5/site_perl/5.8.5/Bio/DB/GFF.pm:599 STACK toplevel /usr/bin/bp_load_gff.pl:103 -------------------------------------- -------------------------------------- On 7/25/05, Scott Cain wrote: > > Hello, > > I am guessing you are writing about accessing a GFF database, since you > are also posting questions to the gbrowse mailing list. Also, from your > posts there, I'm guessing that you have figured it out. If I am not > guessing correctly, please re-ask the question and make it a little more > clear what you mean. > > Scott > > > On Sat, 2005-07-23 at 22:53 -0500, Wacki wrote: > > I installed bioperl but didn't use the right password/username combo. > How do > > I change it? Thanks for the help. > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
cain@cshl.edu > GMOD Coordinator (http://www.gmod.org/) 216-392-3087 > Cold Spring Harbor Laboratory > > From n.haigh at sheffield.ac.uk Tue Jul 26 10:49:22 2005 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Tue Jul 26 10:39:36 2005 Subject: [Bioperl-l] getting pubmed id from genbank files In-Reply-To: Message-ID: Yeah, I tried this after I found a previous post from someone wanting to do the same thing and you suggested the same thing that time. However, it doesn't return anything! My script is simply: -- snip -- use Bio::DB::GenBank; use Data::Dumper; my $db = Bio::DB::GenBank->new; while (<STDIN>) { chomp; my $seq = $db->get_Seq_by_gi($_); my $ac = $seq->annotation; for my $ref ($ac->get_Annotations('reference')) { print "Reference :", $ref->title,"\t"; print "PubMed :", $ref->pubmed,"\n"; } } -- snip -- if I pass 46367591 on STDIN I get the following output: -- snip -- Reference :Functional divergence in tandemly duplicated Arabidopsis thaliana trypsin inhibitor genes PubMed : Reference :Direct Submission PubMed : Reference :Direct Submission PubMed : -- snip -- If I do Data::Dumper on $ref I get: -- snip -- $VAR1 = bless( { 'authors' => 'Clauss,M.J. and Mitchell-Olds,T.', 'location' => 'Genetics 166 (3), 1419-1436 (2004) PUBMED 15082560', 'title' => 'Functional divergence in tandemly duplicated Arabidopsis thaliana trypsin inhibitor genes', 'tagname' => 'reference' }, 'Bio::Annotation::Reference' ); -- snip -- The pubmed id doesn't seem to be getting parsed out! Any ideas? Nathan -----Original Message----- From: Jason Stajich [mailto:jason.stajich@duke.edu] Sent: 26 July 2005 15:28 To: Bioperl-l@portal.open-bio.org Cc: Nathan Haigh Subject: [Bioperl-l] getting pubmed id from genbank files Here is part of the synopsis in Bio::Seq: foreach my $ref ( $ann->get_Annotations('reference') ) { print "Reference ",$ref->title,"\n"; } so do $ref->pubmed instead of $ref->title.
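As the Data::Dumper output in Nathan's message shows, the PubMed ID is not lost; it has been folded into the reference's location string. Until a fixed parser is installed, one hedged workaround is to fall back to scanning that string when pubmed() comes back empty. pubmed_from_location is a hypothetical helper, not part of Bio::Annotation::Reference, and it assumes the merged text keeps the literal "PUBMED <digits>" form seen in the dump.

```perl
use strict;
use warnings;

# Hypothetical workaround (not part of Bio::Annotation::Reference): when the
# GenBank parser has folded the PUBMED line into the JOURNAL text, pubmed()
# comes back empty but the ID is still present in location(). This extracts
# the first "PUBMED <digits>" occurrence, or returns undef if there is none.
sub pubmed_from_location {
    my ($location) = @_;
    return undef unless defined $location;
    return $location =~ /\bPUBMED\s+(\d+)/ ? $1 : undef;
}

# Inside the reference loop of Nathan's script, this might be used as:
#   my $pmid = $ref->pubmed || pubmed_from_location($ref->location);
my $loc  = 'Genetics 166 (3), 1419-1436 (2004) PUBMED 15082560';
my $pmid = pubmed_from_location($loc);
print defined $pmid ? "PubMed: $pmid\n" : "no PubMed ID\n";
# prints: PubMed: 15082560
```

The "Direct Submission" references have no PUBMED text at all, so the helper simply returns undef for them.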
-jason > On Jul 26, 2005, at 6:02 AM, Nathan Haigh wrote: > >> I want to be able to supply a list of GI's, retrieve the genbank >> files and >> parse out the pubmed id's. >> >> >> >> I know I can do the first steps of retrieving the genbank files >> directly, >> but how do I get the pubmed id's? I've been playing around with >> things and >> haven't yet found out if this can be done. >> >> >> >> Cheers, >> >> Nathan >> >> >> >> ---------------------------------- >> >> Nathan Haigh >> >> Bioinformatics PostDoctoral Research Associate >> >> >> >> Room B2 211 >> >> Department of Animal and Plant Sciences >> >> University of Sheffield >> >> Western Bank >> >> Sheffield >> >> S10 2TN >> >> >> >> Tel: +44 (0)114 22 20112 >> >> Mob: +44 (0)7742 533 569 >> >> Fax: +44 (0)114 22 20002 >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > -- > Jason Stajich > http://www.duke.edu/~jes12 > jason.stajich -at- duke.edu > > -- Jason Stajich http://www.duke.edu/~jes12 jason.stajich -at- duke.edu From golharam at umdnj.edu Tue Jul 26 12:05:51 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Tue Jul 26 11:53:06 2005 Subject: [Bioperl-l] Parsing EMBOSS::needle output Message-ID: <00bd01c591fb$e53cc0f0$0301a8c0@GOLHARMOBILE1> I'm trying to parse the output of EMBOSS::needle (EMBOSS 3.0.0) using `needle -asequence /tmp/genbank.cds -bsequence ../Seq/$tuple/$organism -gapopen 10 -gapextend 0.5 -outfile /tmp/compare.needle 2>/dev/null`; my $alnobj = new Bio::AlignIO(-format => 'emboss', -file => '/tmp/compare.needle'); my $alignment = $alnobj->next_aln; print "\tPercentage Identity: ", $alignment->percentage_identity, "\n"; However $alignment never gets defined. $alnobj never returns an alignment object. I saw other posts relating to this but no solutions... Any ideas?
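If the emboss AlignIO parser does not recognise EMBOSS 3.0.0 output, as suggested later in the thread, and only the identity figure is needed, a hedged fallback is to read the "# Identity:" summary line that needle writes ahead of the alignment. needle_identity is a hypothetical helper, not part of Bio::AlignIO; the header layout is assumed from needle's default pairwise output, and Bio::SimpleAlign's percentage_identity may be computed differently, so the two numbers need not match exactly.

```perl
use strict;
use warnings;

# Hypothetical fallback (not part of Bio::AlignIO): scan needle's default
# pairwise output for its summary line, e.g. "# Identity:     390/478 (81.6%)",
# and return the identical count, aligned length and percentage.
sub needle_identity {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        if ($line =~ m{^#\s*Identity:\s*(\d+)\s*/\s*(\d+)\s*\(\s*([\d.]+)%\)}) {
            return { identical => $1, length => $2, percent => $3 };
        }
    }
    return;    # no Identity line found
}

# With the file from the question this would be:
#   open my $fh, '<', '/tmp/compare.needle' or die "Cannot open: $!";
my $report = "# Length: 478\n"
           . "# Identity:     390/478 (81.6%)\n"
           . "# Similarity:   420/478 (87.9%)\n";
open my $fh, '<', \$report or die "Cannot open in-memory string: $!";
my $id = needle_identity($fh);
close $fh;
print "Percentage Identity: $id->{percent}\n" if $id;
# prints: Percentage Identity: 81.6
```

This only recovers the summary statistics; getting a full alignment object still needs a parser that understands the file, or a switch to another output format such as the MSF route Jason mentions.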
Ryan From cain at cshl.edu Tue Jul 26 12:18:47 2005 From: cain at cshl.edu (Scott Cain) Date: Tue Jul 26 12:09:34 2005 Subject: [Bioperl-l] Accessing database settings In-Reply-To: <2b8a4eeb050725142819660772@mail.gmail.com> References: <2b8a4eeb05072320531f2349b8@mail.gmail.com> <1122302049.3293.5.camel@localhost.localdomain> <2b8a4eeb050725142819660772@mail.gmail.com> Message-ID: <1122394727.3293.41.camel@localhost.localdomain> Generally, the load script will get the user name from the shell, but in your case it seems to not be picking it up. From `perldoc bp_load_gff.pl`, you can supply a --user argument to supply a MySQL username. This, of course, assumes that you have already granted your MySQL user permission to operate on the database as the directions in the INSTALL doc and the tutorial indicate. Scott On Mon, 2005-07-25 at 17:28 -0400, Wacki wrote: > When I run: bp_load_gff.pl -c -d volvox volvox_all.fa volvox_all.gff > > > I get: > ------------------------------------------------------------------------------------------------------------------------------------------------ > > DBI connect('volvox','',...) 
failed: Access denied for user > ''@'localhost' to database 'volvox' > at /usr/lib/perl5/site_perl/5.8.5/Bio/DB/GFF/Adaptor/dbi/caching_handle.pm line 139 > > ------------- EXCEPTION ------------- > MSG: Can't connect to database: Access denied for user ''@'localhost' > to database 'volvox' > STACK > Bio::DB::GFF::Adaptor::dbi::caching_handle::new /usr/lib/perl5/site_perl/5.8.5/Bio/DB/GFF/Adaptor/dbi/caching_handle.pm:89 > STACK > Bio::DB::GFF::Adaptor::dbi::new /usr/lib/perl5/site_perl/5.8.5/Bio/DB/GFF/Adaptor/dbi.pm:93 > STACK > Bio::DB::GFF::Adaptor::dbi::mysql::new /usr/lib/perl5/site_perl/5.8.5/Bio/DB/GFF/Adaptor/dbi/mysql.pm:270 > STACK > Bio::DB::GFF::new /usr/lib/perl5/site_perl/5.8.5/Bio/DB/GFF.pm:599 > STACK toplevel /usr/bin/bp_load_gff.pl:103 > > -------------------------------------- > > > -------------------------------------- > > > On 7/25/05, Scott Cain wrote: > Hello, > > I am guessing you are writing about accessing a GFF database, > since you > are also posting questions to the gbrowse mailing list. Also, > from your > posts there, I'm guessing that you have figured it out. If I > am not > guessing correctly, please re-ask the question and make it a > little more > clear what you mean. > > Scott > > > On Sat, 2005-07-23 at 22:53 -0500, Wacki wrote: > > I installed bioperl but didn't use the right > password/username combo. How do > > I change it? Thanks for the help. > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. > cain@cshl.edu > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
cain@cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From hlapp at gnf.org Tue Jul 26 13:05:14 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Jul 26 12:55:47 2005 Subject: [Bioperl-l] getting pubmed id from genbank files In-Reply-To: References: Message-ID: On Jul 26, 2005, at 7:49 AM, Nathan Haigh wrote: > -- snip -- > $VAR1 = bless( { > 'authors' => 'Clauss,M.J. and Mitchell-Olds,T.', > 'location' => 'Genetics 166 (3), 1419-1436 (2004) PUBMED > 15082560', > 'title' => 'Functional divergence in tandemly duplicated > Arabidopsis > thaliana trypsin inhibitor genes', > 'tagname' => 'reference' > }, 'Bio::Annotation::Reference' ); > -- snip -- This is odd. The PUBMED line should not be concatenated with the JOURNAL line. I wonder where this happens and why. Can you download the record from NCBI (using the web interface, format 'GenBank', 'Send all to file') and then parse it with Bio::SeqIO? If it works then the problem must be in the code that deals with the HTTP-response. -hilmar > > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich@duke.edu] > Sent: 26 July 2005 15:28 > To: Bioperl-l@portal.open-bio.org > Cc: Nathan Haigh > Subject: [Bioperl-l] getting pubmed id from genbank files > > > > Here is part of the synopsis in Bio::Seq: > > foreach my $ref ( $ann->get_Annotations('reference') ) { > print "Reference ",$ref->title,"\n"; > } > > so do $ref->pubmed instead of $ref->title. > > > -jason >> On Jul 26, 2005, at 6:02 AM, Nathan Haigh wrote: >> >>> I want to be able to supply a list of GI's, retrieve the genbank >>> files and >>> parse out the pubmed id's. >>> >>> >>> >>> I know I can do the first steps of retrieving the genbank files >>> directly, >>> but how do I get the pubmed id's? I've been playing around with >>> things and >>> haven't yet found out if this can be done. 
>>> >>> >>> >>> Cheers, >>> >>> Nathan >>> >>> >>> >>> ---------------------------------- >>> >>> Nathan Haigh >>> >>> Bioinformatics PostDoctoral Research Associate >>> >>> >>> >>> Room B2 211 >>> >>> Department of Animal and Plant Sciences >>> >>> University of Sheffield >>> >>> Western Bank >>> >>> Sheffield >>> >>> S10 2TN >>> >>> >>> >>> Tel: +44 (0)114 22 20112 >>> >>> Mob: +44 (0)7742 533 569 >>> >>> Fax: +44 (0)114 22 20002 >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >> -- >> Jason Stajich >> http://www.duke.edu/~jes12 >> jason.stajich -at- duke.edu >> >> > -- > Jason Stajich > http://www.duke.edu/~jes12 > jason.stajich -at- duke.edu > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hartzell at kestrel.alerce.com Tue Jul 26 14:54:23 2005 From: hartzell at kestrel.alerce.com (George Hartzell) Date: Tue Jul 26 14:44:42 2005 Subject: [Bioperl-l] is the Bio::Ext::Align stuff supposed to work? Message-ID: <17126.34527.231530.271197@satchel.alerce.com> I've been playing with Bio::Tools::dpAlign, which involved installing Bio::Ext. Bio::Ext did a really poor job of installing itself (FreeBSD 6-{various}, perl 5.8.[67]). I managed to mv and cp the various parts around to where they were supposed to be. I'm not sure if it's me, FreeBSD, or Bio::Ext. Does it work for other folks? The tests all work fine, they get away with some judicious -I../this-that-the-other, but if you copy e.g. the Align test file to your home directory and just try to run it, it doesn't work. 
In particular, the .so and .bs files didn't end up where they belong, and I ended up with /.../Bio/Ext/Align/Align.pm instead of /.../Bio/Ext/Align.pm. I'm sure I can figure it out and pass some patches back; I just wanted to understand who else might be seeing the problem. g. From hlapp at gnf.org Tue Jul 26 16:08:33 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Jul 26 15:58:56 2005 Subject: [Bioperl-l] getting pubmed id from genbank files In-Reply-To: References: Message-ID: <9af6be32bc65e26bc28dfe18d1d68d8e@gnf.org> There are indeed JOURNAL entries spanning multiple lines; the parser was once unable to deal with this and was subsequently fixed ... as we see, this introduced other problems ... On Jul 26, 2005, at 1:07 PM, Barry Moore wrote: > Nathan- > > That sounds like you are using bioperl 1.4? The error is in > Bio/SeqIO/genbank.pm and was fixed by Jason in cvs version 1.102 of > that file. However, the current code still looks a bit odd to me. > Starting at line 1068 of the current cvs version (1.119) of genbank.pm > we have: > > 1068 if (/^\s{2}JOURNAL\s+(.*)/o) { > 1069 push(@loc, $1); > 1070 while ( defined($_ = $self->_readline) ) { > 1071 # we only match when there are at least 4 spaces > 1072 # there is probably a better way to match this > 1073 # as it assumes that the describing tag is short enough > 1074 /^\s{4,}(.*)/o && do { push(@loc, $1); > 1075 next; > 1076 }; > 1077 last; > 1078 } > 1079 $ref->location(join(' ', @loc)); > > This is all dealing with parsing the Journal line which is handled fine > by lines 1068-69. The while loop at 1070 looks at successive lines to > find something to add to the Journal line. The regex at line 1074 used > to read /^\s{3,}(.*)/o which would not match if the next line after > JOURNAL began with '  MEDLINE', but would match '   PUBMED' (Nathan's > situation) causing that line to be added to the JOURNAL line. Is there > ever a JOURNAL entry with more than one line?
If so, shouldn't the > following lines always be untagged and thus indented 12 spaces, making the > regex /^\s{12}(.*)/o safer? The current situation would add any line to the > JOURNAL line if its tag is shorter than 6 characters, and I don't think > that's what we want. > > Barry > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Hilmar Lapp > Sent: Tuesday, July 26, 2005 11:05 AM > To: n.haigh@sheffield.ac.uk > Cc: 'bioperl-l' > Subject: Re: [Bioperl-l] getting pubmed id from genbank files > > > On Jul 26, 2005, at 7:49 AM, Nathan Haigh wrote: > >> -- snip -- >> $VAR1 = bless( { >> 'authors' => 'Clauss,M.J. and Mitchell-Olds,T.', >> 'location' => 'Genetics 166 (3), 1419-1436 (2004) PUBMED >> 15082560', >> 'title' => 'Functional divergence in tandemly duplicated >> Arabidopsis >> thaliana trypsin inhibitor genes', >> 'tagname' => 'reference' >> }, 'Bio::Annotation::Reference' ); >> -- snip -- > > This is odd. The PUBMED line should not be concatenated with the > JOURNAL line. I wonder where this happens and why. Can you download the > record from NCBI (using the web interface, format 'GenBank', 'Send all > to file') and then parse it with Bio::SeqIO? If it works then the > problem must be in the code that deals with the HTTP-response. > > -hilmar > > >> >> -----Original Message----- >> From: Jason Stajich [mailto:jason.stajich@duke.edu] >> Sent: 26 July 2005 15:28 >> To: Bioperl-l@portal.open-bio.org >> Cc: Nathan Haigh >> Subject: [Bioperl-l] getting pubmed id from genbank files >> >> >> >> Here is part of the synopsis in Bio::Seq: >> >> foreach my $ref ( $ann->get_Annotations('reference') ) { >> print "Reference ",$ref->title,"\n"; >> } >> >> so do $ref->pubmed instead of $ref->title. >> >> >> -jason >>> On Jul 26, 2005, at 6:02 AM, Nathan Haigh wrote: >>> >>>> I want to be able to supply a list of GI's, retrieve the genbank >>>> files and >>>> parse out the pubmed id's.
>>>> >>>> >>>> >>>> I know I can do the first steps of retrieving the genbank files >>>> directly, >>>> but how do I get the pubmed id's? I've been playing around with >>>> things and >>>> haven't yet found out if this can be done. >>>> >>>> >>>> >>>> Cheers, >>>> >>>> Nathan >>>> >>>> >>>> >>>> ---------------------------------- >>>> >>>> Nathan Haigh >>>> >>>> Bioinformatics PostDoctoral Research Associate >>>> >>>> >>>> >>>> Room B2 211 >>>> >>>> Department of Animal and Plant Sciences >>>> >>>> University of Sheffield >>>> >>>> Western Bank >>>> >>>> Sheffield >>>> >>>> S10 2TN >>>> >>>> >>>> >>>> Tel: +44 (0)114 22 20112 >>>> >>>> Mob: +44 (0)7742 533 569 >>>> >>>> Fax: +44 (0)114 22 20002 >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@portal.open-bio.org >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> -- >>> Jason Stajich >>> http://www.duke.edu/~jes12 >>> jason.stajich -at- duke.edu >>> >>> >> -- >> Jason Stajich >> http://www.duke.edu/~jes12 >> jason.stajich -at- duke.edu >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Tue Jul 26 16:20:07 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue Jul 26 16:10:40 2005 Subject: [Bioperl-l] error installing bioperl-db In-Reply-To: <42E63FA9.9070001@wam.umd.edu> References: <42DD78A7.5060507@wam.umd.edu> <00afefecc06d2d2bb22f5be09fb4410a@gmx.net> <42DE7912.4020300@wam.umd.edu> <9e25eb983a2480c4493d3f0422ad0365@gmx.net> <42DFD96C.1050003@wam.umd.edu> <42E0E8F0.6080904@wam.umd.edu> <42E63FA9.9070001@wam.umd.edu> Message-ID: <13450598ab5947a3aad589497b44b3e7@gmx.net> I fixed this. It should propagate to anonymous cvs within the next few hours; the new version of the module will be 1.3. I tested against a PostgreSQL 8.0.3 server and all tests pass. For the curious, the problem was that DBD::Pg binds all parameters as type VARCHAR by default, and does use 'real' prepared statements by default if the server is 8.x but not if it's 7.3.x. This is why the problem only surfaces when using an 8.x server. The server apparently doesn't like VARCHAR-type parameters bound to the SUBSTRING arguments, so what I did was explicitly specify the type as integer in the $sth->bind_param() call. -hilmar On Jul 26, 2005, at 6:50 AM, Andrew Stewart wrote: > I updated BiosequenceAdaptorDriver.pm to 1.2. Here's the first > erroneous bit of the make test. Looks like the same thing? > > -Andrew > > > preparing SELECT statement: SELECT SUBSTRING(seq FROM ? FOR ?) FROM > biosequence WHERE bioentry_id = ? > ok 30 > ok 31 > DBD::Pg::st execute failed: ERROR: invalid escape string > HINT: Escape string must be empty or one character. > CONTEXT: SQL function "substring" statement 1 > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca.
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Tue Jul 26 17:42:08 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue Jul 26 17:32:35 2005 Subject: [Bioperl-l] getting pubmed id from genbank files In-Reply-To: References: Message-ID: <0d27e80a4a44b81e1686149febdfb6f2@gmx.net> Right - but don't tell only me :-) On Jul 26, 2005, at 1:29 PM, Barry Moore wrote: > Then would it be safe to assume that in the case of multi-line JOURNAL > entries, all lines following the initial tagged JOURNAL line would be > untagged? If so, the regex could probably be made a bit safer. > > Barry > > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp@gnf.org] > Sent: Tuesday, July 26, 2005 2:09 PM > To: Barry Moore > Cc: bioperl-l; n.haigh@sheffield.ac.uk > Subject: Re: [Bioperl-l] getting pubmed id from genbank files > > There are indeed JOURNAL entries spanning multiple lines; the parser > was once unable to deal with this and was subsequently fixed ... as we > see, this introduced other problems ... > > On Jul 26, 2005, at 1:07 PM, Barry Moore wrote: > >> Nathan- >> >> That sounds like you are using bioperl 1.4? The error is in >> Bio/SeqIO/genbank.pm and was fixed by Jason in cvs version 1.102 of >> that file. However, the current code still looks a bit odd to me. >> Starting at line 1068 of the current cvs version (1.119) of genbank.pm >> we have: >> >> 1068 if (/^\s{2}JOURNAL\s+(.*)/o) { >> 1069 push(@loc, $1); >> 1070 while ( defined($_ = $self->_readline) ) { >> 1071 # we only match when there are at least 4 spaces >> 1072 # there is probably a better way to match this >> 1073 # as it assumes that the describing tag is short enough >> 1074 /^\s{4,}(.*)/o && do { push(@loc, $1); >> 1075 next; >> 1076 }; >> 1077 last; >> 1078 } >> 1079 $ref->location(join(' ', @loc)); >> >> This is all dealing with parsing the Journal line which is handled fine >> by lines 1068-69.
The while loop at 1070 looks at successive lines to >> find something to add to the Journal line. The regex at line 1074 used >> to read /^\s{3,}(.*)/o which would not match if the next line after >> JOURNAL began with '  MEDLINE', but would match '   PUBMED' (Nathan's >> situation) causing that line to be added to the JOURNAL line. Is there >> ever a JOURNAL entry with more than one line? If so, shouldn't the >> following lines always be untagged and thus indented 12 spaces, making the >> regex /^\s{12}(.*)/o safer? The current situation would add any line to the >> JOURNAL line if its tag is shorter than 6 characters, and I don't think >> that's what we want. >> >> Barry >> >> -----Original Message----- >> From: bioperl-l-bounces@portal.open-bio.org >> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Hilmar Lapp >> Sent: Tuesday, July 26, 2005 11:05 AM >> To: n.haigh@sheffield.ac.uk >> Cc: 'bioperl-l' >> Subject: Re: [Bioperl-l] getting pubmed id from genbank files >> >> >> On Jul 26, 2005, at 7:49 AM, Nathan Haigh wrote: >> >>> -- snip -- >>> $VAR1 = bless( { >>> 'authors' => 'Clauss,M.J. and Mitchell-Olds,T.', >>> 'location' => 'Genetics 166 (3), 1419-1436 (2004) PUBMED >>> 15082560', >>> 'title' => 'Functional divergence in tandemly duplicated >>> Arabidopsis >>> thaliana trypsin inhibitor genes', >>> 'tagname' => 'reference' >>> }, 'Bio::Annotation::Reference' ); >>> -- snip -- >> >> This is odd. The PUBMED line should not be concatenated with the >> JOURNAL line. I wonder where this happens and why. Can you download the >> record from NCBI (using the web interface, format 'GenBank', 'Send all >> to file') and then parse it with Bio::SeqIO? If it works then the >> problem must be in the code that deals with the HTTP-response.
>> >> -hilmar >> >> >>> >>> -----Original Message----- >>> From: Jason Stajich [mailto:jason.stajich@duke.edu] >>> Sent: 26 July 2005 15:28 >>> To: Bioperl-l@portal.open-bio.org >>> Cc: Nathan Haigh >>> Subject: [Bioperl-l] getting pubmed id from genbank files >>> >>> >>> >>> Here is part of the synopsis in Bio::Seq: >>> >>> foreach my $ref ( $ann->get_Annotations('reference') ) { >>> print "Reference ",$ref->title,"\n"; >>> } >>> >>> so do $ref->pubmed instead of $ref->title. >>> >>> >>> -jason >>>> On Jul 26, 2005, at 6:02 AM, Nathan Haigh wrote: >>>> >>>>> I want to be able to supply a list of GI's, retrieve the genbank >>>>> files and >>>>> parse out the pubmed id's. >>>>> >>>>> >>>>> >>>>> I know I can do the first steps of retrieving the genbank files >>>>> directly, >>>>> but how do I get the pubmed id's? I've been playing around with >>>>> things and >>>>> haven't yet found out if this can be done. >>>>> >>>>> >>>>> >>>>> Cheers, >>>>> >>>>> Nathan >>>>> >>>>> >>>>> >>>>> ---------------------------------- >>>>> >>>>> Nathan Haigh >>>>> >>>>> Bioinformatics PostDoctoral Research Associate >>>>> >>>>> >>>>> >>>>> Room B2 211 >>>>> >>>>> Department of Animal and Plant Sciences >>>>> >>>>> University of Sheffield >>>>> >>>>> Western Bank >>>>> >>>>> Sheffield >>>>> >>>>> S10 2TN >>>>> >>>>> >>>>> >>>>> Tel: +44 (0)114 22 20112 >>>>> >>>>> Mob: +44 (0)7742 533 569 >>>>> >>>>> Fax: +44 (0)114 22 20002 >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l@portal.open-bio.org >>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> -- >>>> Jason Stajich >>>> http://www.duke.edu/~jes12 >>>> jason.stajich -at- duke.edu >>>> >>>> >>> -- >>> Jason Stajich >>> http://www.duke.edu/~jes12 >>> jason.stajich -at- duke.edu >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> 
http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >> -- >> ------------------------------------------------------------- >> Hilmar Lapp email: lapp at gnf.org >> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 >> ------------------------------------------------------------- >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From jason.stajich at duke.edu Tue Jul 26 21:13:09 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Jul 26 21:03:52 2005 Subject: [Bioperl-l] Parsing EMBOSS::needle output In-Reply-To: <00bd01c591fb$e53cc0f0$0301a8c0@GOLHARMOBILE1> References: <00bd01c591fb$e53cc0f0$0301a8c0@GOLHARMOBILE1> Message-ID: <4816562302e855bdf87abb347c216c8d@duke.edu> I think the "emboss" format changed in 3.0.0 solutions: a) fix the AlignIO::emboss parser to handle both flavors (old and new) b) have it output MSF format and use AlignIO::msf. -jason On Jul 26, 2005, at 9:05 AM, Ryan Golhar wrote: > I'm trying to parse the output of EMBOSS::needle (EMBOSS 3.0.0) using > > `needle -asequence /tmp/genbank.cds -bsequence ../Seq/$tuple/$organism > - > gapopen 10 -gapextend 0.5 -outfile /tmp/compare.needle 2>/dev/null`; > > my $alnobj = new Bio::AlignIO(-format => 'emboss', > -file => '/tmp/compare.needle'); > my $alignment = $alnobj->next_aln; > print "\tPercentage Identity: ", $alignment->percentage_identity, "\n"; > > However $alignment never gets defined. 
> $alnobj never returns an alignment object. I saw other posts relating
> to this but not solutions...
>
> Any ideas?
>
> Ryan
>
--
Jason Stajich
http://www.duke.edu/~jes12
jason.stajich -at- duke.edu

From bmoore at genetics.utah.edu Tue Jul 26 16:07:16 2005
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Tue Jul 26 21:13:27 2005
Subject: [Bioperl-l] getting pubmed id from genbank files
Message-ID:

Nathan-

That sounds like you are using bioperl 1.4? The error is in
Bio/SeqIO/genbank.pm and was fixed by Jason in cvs version 1.102 of
that file. However, the current code still looks a bit odd to me.
Starting at line 1068 of the current cvs version (1.119) of genbank.pm
we have:

1068    if (/^\s{2}JOURNAL\s+(.*)/o) {
1069        push(@loc, $1);
1070        while ( defined($_ = $self->_readline) ) {
1071            # we only match when there are at least 4 spaces
1072            # there is probably a better way to match this
1073            # as it assumes that the describing tag is short enough
1074            /^\s{4,}(.*)/o && do { push(@loc, $1);
1075                                   next;
1076                              };
1077            last;
1078        }
1079        $ref->location(join(' ', @loc));

This is all dealing with parsing the JOURNAL line, which is handled fine
by lines 1068-69. The while loop at 1070 looks at successive lines to
find something to add to the JOURNAL line. The regex at line 1074 used
to read /^\s{3,}(.*)/o, which would not match if the next line after
JOURNAL began with '  MEDLINE' (two leading spaces), but would match
'   PUBMED' (three leading spaces; Nathan's situation), causing that
line to be added to the JOURNAL line. Is there ever a JOURNAL entry
with more than one line? If so, shouldn't the following lines always be
untagged and thus indented 12, making the regex /^\s{12}(.*)/o safer?
The current situation would add any line to the JOURNAL line if its tag
is shorter than 6 characters, and I don't think that's what we want.
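To see the effect Barry describes, the loop can be replayed outside BioPerl. This is a standalone sketch, not the genbank.pm code: the `journal_location` helper is hypothetical, and the sample REFERENCE block is rebuilt from Nathan's record assuming the usual GenBank layout (sub-keywords indented 2 spaces, PUBMED indented 3, continuation lines indented 12). It runs the same loop once with the old `/^\s{3,}/` pattern and once with the proposed `/^\s{12}/`.

```perl
use strict;
use warnings;

# Hypothetical helper: replays the JOURNAL loop quoted above (cvs lines
# 1068-1079) on an array of lines instead of $self->_readline, with the
# continuation-line pattern passed in so old and new regexes can be compared.
sub journal_location {
    my ($lines, $continuation_re) = @_;
    my @loc;
    for my $i (0 .. $#$lines) {
        next unless $lines->[$i] =~ /^\s{2}JOURNAL\s+(.*)/;
        push @loc, $1;
        # Append following lines only while they match the continuation pattern.
        for my $next (@{$lines}[$i + 1 .. $#$lines]) {
            last unless $next =~ $continuation_re;
            push @loc, $1;
        }
        last;    # only the first JOURNAL line is of interest here
    }
    return join ' ', @loc;
}

# Illustrative REFERENCE block built from Nathan's record (assumed layout:
# sub-keywords indented 2 spaces, PUBMED indented 3, continuations 12).
my @reference = (
    'REFERENCE   1',
    '  AUTHORS   Clauss,M.J. and Mitchell-Olds,T.',
    '  TITLE     Functional divergence in tandemly duplicated Arabidopsis',
    '            thaliana trypsin inhibitor genes',
    '  JOURNAL   Genetics 166 (3), 1419-1436 (2004)',
    '   PUBMED   15082560',
);

my $old    = journal_location(\@reference, qr/^\s{3,}(.*)/);  # pre-1.102 pattern
my $strict = journal_location(\@reference, qr/^\s{12}(.*)/);  # proposed pattern

print "old:    $old\n";     # old:    Genetics 166 (3), 1419-1436 (2004) PUBMED   15082560
print "strict: $strict\n";  # strict: Genetics 166 (3), 1419-1436 (2004)
```

With the `/^\s{4,}/` pattern from cvs 1.119 the three-space PUBMED line is rejected as well; the `{12}` form additionally stops at any tagged line indented by 4 to 11 spaces, while still joining genuine 12-space continuation lines.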
Barry

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Hilmar Lapp
Sent: Tuesday, July 26, 2005 11:05 AM
To: n.haigh@sheffield.ac.uk
Cc: 'bioperl-l'
Subject: Re: [Bioperl-l] getting pubmed id from genbank files


On Jul 26, 2005, at 7:49 AM, Nathan Haigh wrote:

> -- snip --
> $VAR1 = bless( {
>                  'authors' => 'Clauss,M.J. and Mitchell-Olds,T.',
>                  'location' => 'Genetics 166 (3), 1419-1436 (2004) PUBMED 15082560',
>                  'title' => 'Functional divergence in tandemly duplicated Arabidopsis thaliana trypsin inhibitor genes',
>                  'tagname' => 'reference'
>                }, 'Bio::Annotation::Reference' );
> -- snip --

This is odd. The PUBMED line should not be concatenated with the
JOURNAL line. I wonder where this happens and why. Can you download the
record from NCBI (using the web interface, format 'GenBank', 'Send all
to file') and then parse it with Bio::SeqIO? If it works then the
problem must be in the code that deals with the HTTP-response.

-hilmar

> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich@duke.edu]
> Sent: 26 July 2005 15:28
> To: Bioperl-l@portal.open-bio.org
> Cc: Nathan Haigh
> Subject: [Bioperl-l] getting pubmed id from genbank files
>
> Here is part of the synopsis in Bio::Seq:
>
>   foreach my $ref ( $ann->get_Annotations('reference') ) {
>       print "Reference ",$ref->title,"\n";
>   }
>
> so do $ref->pubmed instead of $ref->title.
>
> -jason
>> On Jul 26, 2005, at 6:02 AM, Nathan Haigh wrote:
>>
>>> I want to be able to supply a list of GI's, retrieve the genbank
>>> files and parse out the pubmed id's.
>>>
>>> I know I can do the first steps of retrieving the genbank files
>>> directly, but how do I get the pubmed id's? I've been playing around
>>> with things and haven't yet found out if this can be done.
>>>
>>> Cheers,
>>>
>>> Nathan
>>>
>>> ----------------------------------
>>> Nathan Haigh
>>> Bioinformatics PostDoctoral Research Associate
>>>
>>> Room B2 211
>>> Department of Animal and Plant Sciences
>>> University of Sheffield
>>> Western Bank
>>> Sheffield
>>> S10 2TN
>>>
>>> Tel: +44 (0)114 22 20112
>>> Mob: +44 (0)7742 533 569
>>> Fax: +44 (0)114 22 20002
>>>
> --
> Jason Stajich
> http://www.duke.edu/~jes12
> jason.stajich -at- duke.edu
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From bmoore at genetics.utah.edu Tue Jul 26 16:31:03 2005
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Tue Jul 26 21:13:30 2005
Subject: [Bioperl-l] getting pubmed id from genbank files
Message-ID:

Then would it be safe to assume that in the case of multi-line JOURNAL
entries, all lines following the initial tagged JOURNAL line would be
untagged? If so, the regex could probably be made a bit safer.
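The question that started this thread was how to get PubMed IDs out of GenBank records, and Jason's answer was to call `pubmed` on each reference annotation rather than `title`. A minimal helper along those lines is sketched below; the name `pubmed_ids` is hypothetical, and it assumes only the `get_Annotations('reference')` and `pubmed` methods used in that reply.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): collect the PubMed IDs of all
# 'reference' annotations in an annotation collection, such as the object
# returned by $seq->annotation. Only get_Annotations('reference') and
# pubmed() are called; references without a PubMed ID are skipped.
sub pubmed_ids {
    my ($annotation) = @_;
    my @ids;
    for my $ref ($annotation->get_Annotations('reference')) {
        my $pmid = $ref->pubmed;
        push @ids, $pmid if defined $pmid && length $pmid;
    }
    return @ids;
}
```

With BioPerl installed, it would be called once per record, e.g. on `$seq->annotation` for each `$seq` returned by `next_seq` on a `Bio::SeqIO->new(-file => $file, -format => 'genbank')` stream. On a parser affected by the JOURNAL problem above, the ID ends up inside `location` instead (as in Nathan's dump), so the helper would return nothing for that reference.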
Barry

-----Original Message-----
From: Hilmar Lapp [mailto:hlapp@gnf.org]
Sent: Tuesday, July 26, 2005 2:09 PM
To: Barry Moore
Cc: bioperl-l; n.haigh@sheffield.ac.uk
Subject: Re: [Bioperl-l] getting pubmed id from genbank files

There are indeed JOURNAL entries spanning multiple lines; the parser
was once unable to deal with this and was subsequently fixed ... as we
see this introduced other problems ...

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From bmoore at genetics.utah.edu Tue Jul 26 16:56:11 2005
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Tue Jul 26 21:13:31 2005
Subject: [Bioperl-l] Parsing EMBOSS::needle output
Message-ID:

Ryan-

This works for me with my own sequence files.
I don't know if your mailer line-wrapped your script, but when I copied
your script I had to fix the '-gapopen' parameter in your needle command
line. You can't have any whitespace between the '-' and 'gapopen'. Did
you check to be sure that /tmp/compare.needle was actually written? If
you still have trouble, you can send along the files that you're
comparing, and I'll see if they run for me.

Barry

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Ryan Golhar
Sent: Tuesday, July 26, 2005 10:06 AM
To: 'Bioperl List'
Subject: [Bioperl-l] Parsing EMBOSS::needle output

I'm trying to parse the output of EMBOSS::needle (EMBOSS 3.0.0) using

`needle -asequence /tmp/genbank.cds -bsequence ../Seq/$tuple/$organism -
gapopen 10 -gapextend 0.5 -outfile /tmp/compare.needle 2>/dev/null`;

my $alnobj = new Bio::AlignIO(-format => 'emboss',
                              -file => '/tmp/compare.needle');
my $alignment = $alnobj->next_aln;
print "\tPercentage Identity: ", $alignment->percentage_identity, "\n";

However $alignment never gets defined. $alnobj never returns an
alignment object. I saw other posts relating to this but not
solutions...

Any ideas?

Ryan

From N.Haigh at sheffield.ac.uk Wed Jul 27 04:09:59 2005
From: N.Haigh at sheffield.ac.uk (Nathan Haigh)
Date: Wed Jul 27 04:00:56 2005
Subject: [Bioperl-l] getting pubmed id from genbank files
In-Reply-To:
References:
Message-ID: <1122451799.42e7415740f57@webmail.shef.ac.uk>

Yeah, I'm pretty sure I was using bioperl-live updated that morning.
Your explanation of the problem seems feasible from what I was looking
at in the perl debugger. I'll look into this a bit more later this
morning.

Nathan

Quoting Barry Moore :

> Nathan-
>
> That sounds like you are using bioperl 1.4? The error is in
> Bio/SeqIO/genbank.pm and was fixed by Jason in cvs version 1.102 of
> that file.

From Andrew.Mather at dpi.vic.gov.au Wed Jul 27 06:52:46 2005
From: Andrew.Mather at dpi.vic.gov.au (Andrew.Mather@dpi.vic.gov.au)
Date: Wed Jul 27 06:44:09 2005
Subject: [Bioperl-l] is the Bio::Ext::Align stuff supposed to work?
Message-ID:

Hi George

>
> I've been playing with Bio::Tools::dpAlign, which involved installing
> Bio::Ext.
>
> Bio::Ext did a really poor job of installing itself (FreeBSD
> 6-{various}, perl 5.8.[67]). I managed to mv and cp the various parts
> around to where they were supposed to be.
>
> I'm not sure if it's me, FreeBSD, or Bio::Ext. Does it work for other
> folks? The tests all work fine, they get away with some judicious
> -I../this-that-the-other, but if you copy e.g. the Align test file to
> your home directory and just try to run it, it doesn't work.
>
> In particular, the .so and .bs files didn't end up where they belong,
> and I ended up with /.../Bio/Ext/Align/Align.pm instead of
> /.../Bio/Ext/Align.pm.
>
> I'm sure I can figure it out and pass some patches back, just wanted
> to understand who else might be seeing the problem.
>

I've been having a few battles with staden io_lib myself, which have
caused problems with Bio::Ext. I have a system with a mix of RHEL3 on
IA32 and AMD64 machines. The staden compiled fine on the Intel machines
and once I'd copied the usual .h files to where they were expected, Ext
set up fine. On the AMDs though, no such luck. I had to find the 1.9
(or is it 1.1.9? I'm not near the machines now) version before it would
even compile, however it doesn't create any .so files at all. This
isn't strictly a bioperl problem I suppose, but it is related. There
were a couple of suggestions raised here, but so far no good. I've had
to go on to other things at the moment, but I'm still trying to find a
solution when I can get back to it.

Andrew

Animal Genetics and Genomics, PIRVic
Attwood
475 Mickleham Road, Attwood, 3049
ph +61 3 92174342
mob 0413 009 761
----------------
There are 10 kinds of people...those who understand binary and those who don't.

From senger at ebi.ac.uk Wed Jul 27 10:57:33 2005
From: senger at ebi.ac.uk (Martin Senger)
Date: Wed Jul 27 10:48:04 2005
Subject: [Bioperl-l] Bio::Tools::Run::Analysis - small but important changes
In-Reply-To:
Message-ID:

Hi,
   This is a similar message to the recent one about Bio::Biblio: the
default location of the SOAP-based services running at EBI has been
changed. Nothing has changed in the API of these services.

   If you are using these services (details at
http://www.ebi.ac.uk/soaplab/Perl_Client.html, although those pages are
not yet updated), just update your bioperl modules, or override the
default location in your scripts with the new one:

   -location => 'http://www.ebi.ac.uk/soaplab/service'

   Regards,
   Martin

--
Martin Senger
   EMBL Outstation - Hinxton                Senger@EBI.ac.uk
   European Bioinformatics Institute        Phone: (+44) 1223 494636
   Wellcome Trust Genome Campus             (Switchboard:     494444)
   Hinxton                                  Fax  : (+44) 1223 494468
   Cambridge CB10 1SD
   United Kingdom                           http://industry.ebi.ac.uk/~senger

From adil_iqbal75 at yahoo.com Wed Jul 27 14:21:38 2005
From: adil_iqbal75 at yahoo.com (adil iqbal)
Date: Wed Jul 27 14:16:45 2005
Subject: [Bioperl-l] Why do you keep sending messages over and over? Is there something special? If so, please write in Urdu, not in English, ok
Message-ID: <20050727182138.59616.qmail@web32401.mail.mud.yahoo.com>

From n.haigh at sheffield.ac.uk Thu Jul 28 07:36:56 2005
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Thu Jul 28 07:28:44 2005
Subject: [Bioperl-l] getting pubmed id from genbank files
In-Reply-To: <1122451799.42e7415740f57@webmail.shef.ac.uk>
Message-ID:

Big Oops! I wasn't using bioperl-live! Things now seem to be ok - well,
at least with that one genbank file!

Thanks for the input anyway! :o)

Nathan

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Nathan Haigh
Sent: 27 July 2005 09:10
To: Barry Moore
Cc: Hilmar Lapp; bioperl-l
Subject: RE: [Bioperl-l] getting pubmed id from genbank files

Yeah, I'm pretty sure I was using bioperl-live updated that morning.
Your explanation of the problem seems feasible from what I was looking
at in the perl debugger. I'll look into this a bit more later this
morning.
Nathan

From cjm at fruitfly.org Thu Jul 28 15:42:48 2005
From: cjm at fruitfly.org (Chris Mungall)
Date: Thu Jul 28 15:33:35 2005
Subject: [Bioperl-l] Fixing bioperl [was Re: [GMOD-devel] Re: [Gmod-gbrowse] Analysis features (Re: Final alpha release of gmod (chado))]
In-Reply-To: <1122570166.3288.10.camel@localhost.localdomain>
References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu> <42E909E3.2030102@infobiogen.fr> <1122570166.3288.10.camel@localhost.localdomain>
Message-ID:

[sorry for the cross-posting, but I think it's really important to have
a gmod to bioperl chit chat on this. I've removed gmod-gbrowse from the
cc list]

On Thu, 28 Jul 2005, Scott Cain wrote:

> Hi Cyril,
>
> I think Bio::Tools::GFF is somewhat hacky and not a tool I would use to
> produce 'safe' GFF3. On the other hand Bio::FeatureIO is still a little
> immature, but it is what I used for the chado GFF3 bulk loader, so it
> does handle (parse) Target features. So my suggestion would be to use
> BFIO::gff, but be prepared for some problems; when you find them
> complain loudly on the bioperl mailing list or fix the problems and
> commit them (or both!).

I think the answer may be even more complicated than this.

Lurkers and contributors to the bioperl mailing list may have noticed
that there have been some major obstacles to progress lately,
particularly in getting a stable release of the code out. bp1.4 is
fairly old, and 1.5 is a developer release, though this is the one
required by GMOD.
My understanding is that this bottleneck can be traced back to changes in the SeqFeature and Annotation model. These changes appear to be required by Bio::SeqFeature::Annotated which is produced by Bio::FeatureIO::gff (which in turn is used by the GMOD bulk loader, which is the main reason GMOD requires 1.5, I believe?). Unfortunately, these changes also break existing code and have a severe negative impact on memory usage. Before advising Cyril and others to switch to BFIO::gff I think it's important to make sure there is a clear path forward with bioperl. My impression is that there is something of a stalemate here. The bioperl developers would like to retract the aforementioned changes, but they believe they cannot do this without breaking GMOD code. They are also extremely uncomfortable about leaving these changes in. Everyone gives up and starts coding around bioperl. Here is why the changes were introduced: BioPerl has a 'scruffy' typing model, whereby feature types (primary_tag in bioperl) and featureprop types (tags in bioperl) are labels or strings. In contrast, Chado forces all types to be some class or relation in an ontology. Now obviously I'm rather partial to the Chado model, but that doesn't mean I think it should be forced upon bioperl. I often use bioperl in scruffy mode (on scruffy data); or in some combination whereby I map the scruffy types to ontologies in some non-bioperl code. When using bioperl as a middleware component over a nicely organised database, ontology-typed mode is definitely best. However, the majority of bioperl users (including myself) spend a large proportion of their time working with scruffy data, in which case lightweight scruffy types are more appropriate. It seems that there is a perfectly simple way of reconciling both approaches. We revert bioperl back to the simpler scruffy model. The majority of users and developers breathe a sigh of relief. We then extend SeqFeatureI with something like SeqFeatureAnnotatedI. 
This forces types to be stored as OntologyTerms (and I haven't even touched on some of the problems here, but at least we are insulating the standard bioperl layer that 99% of users use from these issues). All classes implementing SFAI will necessarily implement SFI, and the primary_tag and tag_values methods will be supported (not deprecated) as simple delegations to the OntologyTerm objects. We can then modify BFIO::gff (which is an incredibly useful piece of code) and get rid of all the dependencies on SO and Bio::Ontology* and instead allow the user of this module to plug in their own resolver/validator - so they can choose whether they just want fast scruffy lightweight SFI features, or whether they want ontology-typed SFAI features. If the latter, then they can choose their own resolver strategy - by a user-supplied hash, by a copy of SO auto-downloaded from sourceforge, by a local chado db, by the genbank->SO mapping table, during parsing vs post-parsing, whatever. In fact there is already Bio::SeqFeature::Tools::TypeMapper, but currently this is mostly concerned with helping Bio::SeqFeature::Tools::Unflattener convert scruffy genbank to something sensible. GMOD (and perhaps biosql) would use SFAI, everyone else would use the simpler SFI. Someone can even get a stable 1.6 release out before all the SFAI details such as how the resolver would work are finalised. I'd really like to see 1.6 include a simpler BFIO::gff that can optionally produce features that aren't SeqFeature::Annotateds, but that's negotiable. There are vast swathes of both GMOD and BioPerl code I'm not familiar with, so it's possible my analysis above is flawed in some way. If it is, then it's up to someone from either camp to speak up! If not, then there is no excuse for the relevant people not to start sorting out this mess, beginning with the solution outlined above.
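The layering proposed above can be sketched in plain Perl. This is a minimal illustration under stated assumptions, not BioPerl code: My::Feature, My::AnnotatedFeature and My::Term are hypothetical stand-ins for SeqFeatureI, the proposed SeqFeatureAnnotatedI and an ontology term object, and the resolver is simply a user-supplied code reference backed by a hash. It shows the two properties the proposal relies on: an ontology-typed feature is still a plain feature (it inherits from it), and primary_tag keeps working by delegating to the term.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an ontology term object.
package My::Term;
sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, identifier => $args{identifier} }, $class;
}
sub name       { $_[0]{name} }
sub identifier { $_[0]{identifier} }

# Hypothetical "scruffy" feature (SFI-like): the type is just a string.
package My::Feature;
sub new {
    my ($class, %args) = @_;
    return bless { primary_tag => $args{primary_tag} }, $class;
}
sub primary_tag { $_[0]{primary_tag} }

# Hypothetical ontology-typed feature (SFAI-like). It inherits from
# My::Feature, so it can be used anywhere a plain feature is expected.
package My::AnnotatedFeature;
our @ISA = ('My::Feature');
sub new {
    my ($class, %args) = @_;
    # The resolver is user-supplied: a code ref mapping a label to a term.
    my $term = $args{resolver}->($args{primary_tag})
        or die "cannot resolve type '$args{primary_tag}'\n";
    return bless { type_term => $term }, $class;
}
sub type_term   { $_[0]{type_term} }
# primary_tag stays supported by delegating to the term's name.
sub primary_tag { $_[0]->type_term->name }

package main;

# A hash-backed resolver (SO ids shown for illustration only);
# a real resolver might consult SO or a local chado database instead.
my %so = (
    gene => My::Term->new(name => 'gene', identifier => 'SO:0000704'),
    exon => My::Term->new(name => 'exon', identifier => 'SO:0000147'),
);
my $resolver = sub { $so{ $_[0] } };

my $plain = My::Feature->new(primary_tag => 'my_odd_label');
my $typed = My::AnnotatedFeature->new(primary_tag => 'exon', resolver => $resolver);

print $plain->primary_tag, "\n";              # my_odd_label
print $typed->primary_tag, "\n";              # exon
print $typed->type_term->identifier, "\n";    # SO:0000147
```

Swapping the hash-backed resolver for one that queries chado or a downloaded copy of SO would only change the code reference passed in; the feature classes would stay the same.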
Cheers Chris > > Scott > > > On Thu, 2005-07-28 at 18:37 +0200, Cyril Pommier wrote: > > Hello, > > We are going to store analysis results in chado, and we are of course > > very interested in these future evolutions of GFF3/chado. > > So we would like to make sure that the parsers and conversion programs > > we are writing now will be compatible with the future GFF3. > > > > We are using Bio::SeqFeature::Generic objects that we write with > > Bio::Tools::GFF. > > > > Do you think that Bio::Tools::GFF will be able to handle the new 'type' > > column or is it better to switch to Bio::FeatureIO::gff? > > > > Thanks in advance for any advice. > > > > Cyril > > > > Don Gilbert wrote: > > > > > > > > Scott, > > > > > > Your notes in gmod_bulk_load_gff3.pl suggest it is headed in > > > the same direction I suggest below. More about these todo points > > > > > >> - address flybase's use of analysisfeature combined with feature to > > >> give source-type information (in GFF terms). This will need to > > >> be addressed in the GBrowse adaptor. > > >> - modify the bulk loader to allow "mixed" GFF3 files (that is, > > >> containing > > >> both analysis results and annotations). See perldoc > > >> gmod_bulk_load_gff3.pl > > >> for more info > > > > > > > > > Use of chado's analysisfeature table is something others who know > > > it better can comment on. But after working with it for a while > > > it makes sense to me to use in this way: > > > > > > For a future GFF -> Chado loader, treat analysis features such as > > > gene finding results, BLAST, sim4 as 'analysisfeature type' rather > > > than feature CV term type (the ones that now end up with a generic > > > 'match' cvterm).
In these cases the Analysis table is populated with > > > program:database_sourcename > > > as the basis of this 'analysisfeature type', such as > > > match:blastx:na_pe.dros > > > match:sim4:DGC > > > match:genie:dummy (or maybe exon:genie) > > > > > > The program:database fits neatly in GFF source field, as > > > #ref source type start stop ... > > > chr1 blastx:na_pe.dros match 1 100 ... > > > chr1 sim4:DGC match 1 100 ... > > > > > > These can be treated in database adaptor analogously to the CVterm > > > table feature types. See at end a list of current GFF feature > > > type:source from worm, rice, yeast, fly MODs. Fly and rice use a > > > syntax like above and worm gff uses BLAT_EMBL_BEST, instead of > > > BLAT:EMBL_BEST. > > > > > > From POD of your bulk_load_gff3.pl > > > > Analysis > > > > If you are loading analysis results (ie, BLAT results, gene > > > > predictions), you should specify the -a flag. If no arguments are > > > > supplied with the -a, then the loader will assume that the results > > > > belong to an analysis set with a name that is the concatenation of > > > > the source (column 2) and the method (column 3) with an underscore > > > > in between. > > > > > > "... then the loader will assume that the results belong to an > > > analysis table row with a program name and database source name > > > taken from Source (column 2, colon separated program:sourcename), > > > with a SOFA feature type taken from Method (column 3). If > > > sourcename doesn't apply, e.g. genefinder, don't add or use 'dummy'. > > > Use the generic 'match' SOFA type if others don't apply." > > > [see also http://song.sourceforge.net/gff3-jan04.shtml#ALIGNMENTS] > > > > > > Note that sourcename of database is a common attribute (all those > > > blasts, blats, sim4, ... are run on several different databases). > > > > > > For that underscore between method and source, where does that go into > > > database? 
It is used as parts of program or database sourcename names, > > > so it may be problematic to add one if not needed. > > > > > > Oh, I see now from bulk_load_gff3.PLS, you are creating a 'Name' entry > > > for the analysis table. This probably is less useful than using Program > > > and Sourcename fields as flybase does, which comes from the common > > > usage where people run various programs, with various database sources > > > and want to plop the results into a database easily. These go into those > > > two fields directly, no need to create or parse a Name entry > > > (which can be and is null in flybase data). > > > > > > > my $search_analysis > > > > = $db->prepare("SELECT analysis_id FROM analysis WHERE name=?"); > > > > > > I think it would be better as > > > my $search_analysis > > > = $db->prepare("SELECT analysis_id FROM analysis WHERE program=? and > > > sourcename=?"); > > > > > > > Otherwise, the argument provided with -a will be taken > > > > as the name of the analysis set. Either way, the analysis set must > > > > already be in the analysis table. The easiest way to do this is to > > > > insert it directly in the psql shell: > > > > > > > > INSERT INTO analysis (name, program, programversion) > > > > VALUES ('genscan 2005-2-28','genscan','5.4'); > > > > > > My choice would be to populate the analysis table from GFF data, rather > > > than expect preparation by the user (or as another option). > > > > > > INSERT INTO analysis (program, sourcename) > > > VALUES ('tblastx','na_baylorf1_scfchunk.dpse'); > > > INSERT INTO analysis (program, sourcename) > > > VALUES ('sim4','na_gb.dmel'); > > > INSERT INTO analysis (program, sourcename, programversion) > > > VALUES ('genie_masked','dummy', '1.0'); > > > > > > > There are other columns in the analysis table that are optional; see > > > > the schema documentation and '\d analysis' in psql for more > > > > information. > > > > > > > ....
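Populating the analysis table from the GFF data itself, as suggested above, mostly amounts to collecting the distinct program:sourcename values in column 2. The Perl sketch below does only that in-memory part, under these assumptions: the input is GFF3 text already split into lines, sourcename is left undefined when the source has no colon, and the analysis_id values are simulated with a counter rather than read back from a database. The helper names distinct_analyses and analysis_maps are hypothetical. The two hashes it returns play the role of the an_term2name / an_name2term maps in Don's ChadoFC snippet.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the distinct (program, sourcename) pairs named in GFF column 2,
# in first-seen order. sourcename stays undef when there is no colon.
sub distinct_analyses {
    my (%seen, @rows);
    for my $line (@_) {
        next if $line =~ /^#/ || $line !~ /\S/;   # skip directives and blank lines
        my $source = (split /\t/, $line)[1];
        my ($program, $sourcename) = split /:/, $source, 2;
        my $key = join "\t", $program, $sourcename // '';
        push @rows, { program => $program, sourcename => $sourcename }
            unless $seen{$key}++;
    }
    return @rows;
}

# Build the id -> name and name -> id maps, naming each analysis
# "match:program:sourcename" (or "match:program" without a sourcename).
# The two hashes are simply declared empty; a list assignment such as
# ({},{}) would store both hash refs as one key/value pair in the first.
sub analysis_maps {
    my (%term2name, %name2term);
    for my $row (@_) {
        my $name = join ':', 'match',
            grep { defined && length } @{$row}{qw(program sourcename)};
        $term2name{ $row->{analysis_id} } = $name;
        $name2term{$name} = $row->{analysis_id};
    }
    return (\%term2name, \%name2term);
}

my @gff = ('##gff-version 3', map { join("\t", @$_) } (
    ['chr1', 'blastx:na_pe.dros', 'match', 1,   100, '.', '+', '.', 'ID=m1'],
    ['chr1', 'sim4:DGC',          'match', 1,   100, '.', '+', '.', 'ID=m2'],
    ['chr1', 'sim4:DGC',          'match', 200, 300, '.', '+', '.', 'ID=m3'],
    ['chr1', 'genscan',           'match', 1,   100, '.', '+', '.', 'ID=m4'],
));

my @rows = distinct_analyses(@gff);
my $id = 0;
$_->{analysis_id} = ++$id for @rows;   # stand-in for database-assigned ids

my ($term2name, $name2term) = analysis_maps(@rows);
print "$_\t$term2name->{$_}\n" for sort keys %$term2name;
# 1	match:blastx:na_pe.dros
# 2	match:sim4:DGC
# 3	match:genscan
```

One caveat for the database side: if sourcename is stored as NULL, a lookup written as sourcename=? will not find that row when passed undef, because NULL never compares equal in SQL. A loader would need an IS NULL branch for that case, or a placeholder value such as FlyBase's 'dummy'.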
> > > > A planned addition to the functionality of handling analysis results > > > > is to allow "mixed" GFF files, where some lines are analysis results > > > > and some are not. > > > > > > This is the case for drosophila GFF now (see others also below). If > > > you make the default assumption that ($method =~ /.*match/) and > > > ($source =~ m/([^:]+):(.+)/), you should get all/most of the > > > analysisfeature types, and probably not anything else. > > > > > > > Additionally, one will be able to supply lists of > > > > types (optionally with sources) and their associated entry in the > > > > analysis table. The format will probably be tag value pairs: > > > > > > > > --analysis match:Rice_est=rice_est_blast, \ > > > > match:Maize_cDNA=maize_cdna_blast, \ > > > > mRNA=genscan_prediction,exon=genscan_prediction > > > > > > My suggestion for this (as per GFF source,type columns) would be > > > --analysis match:program:sourcename ... > > > --analysis match:blast:Rice_est,match:blast:Maize_cDNA,\ > > > mRNA:genscan:dummy, exon:genscan:dummy > > > > > > I guess the 'dummy' data sourcename need not be added; flybase uses it > > > to keep that field not-null, but it isn't required by the schema. > > > > > > Here are some snippets from the ChadoFC adaptor I modified > > > from yours (will get into cvs.sf.net 'real soon'), showing that > > > it isn't much work to add this as an analog to how cvterm types > > > are used.
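The default test for mixed files and the proposed --analysis syntax are both easy to try out. This is a minimal Perl sketch, assuming the type (column 3) and source (column 2) have already been split out of each GFF line; analysis_source and parse_analysis_option are hypothetical helper names, not part of gmod_bulk_load_gff3.pl. The regexes are the ones given above, with the redundant leading .* dropped from the type test.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Heuristic for "mixed" GFF files: treat a line as an analysis result when
# its type contains "match" and its source looks like program:sourcename.
# Returns (program, sourcename) on a hit, an empty list otherwise.
sub analysis_source {
    my ($type, $source) = @_;
    return unless $type =~ /match/ && $source =~ /([^:]+):(.+)/;
    return ($1, $2);
}

# Parse the suggested option value: comma-separated type:program:sourcename
# entries, tolerating spaces after commas; sourcename may be omitted.
sub parse_analysis_option {
    my ($value) = @_;
    my @entries;
    for my $item (split /\s*,\s*/, $value) {
        my ($type, $program, $sourcename) = split /:/, $item, 3;
        die "bad --analysis entry '$item'\n"
            unless defined $program && length $program;
        push @entries,
            { type => $type, program => $program, sourcename => $sourcename };
    }
    return @entries;
}

# Sample (type, source) pairs taken from the MOD listings in this thread.
for my $pair (['match', 'sim4:na_gb.dmel'],
              ['EST_match', 'BLAT_EST_BEST'],
              ['exon', 'FgenesH:Monocot']) {
    my @hit = analysis_source(@$pair);
    print join("\t", @$pair, @hit ? "analysis(@hit)" : 'plain feature'), "\n";
}

for my $e (parse_analysis_option(
        'match:blast:Rice_est,match:blast:Maize_cDNA,mRNA:genscan:dummy, exon:genscan:dummy')) {
    print join("\t", $e->{type}, $e->{program}, $e->{sourcename} // '-'), "\n";
}
```

As the sample rows show, the heuristic misses WormBase-style sources such as BLAT_EST_BEST, which use an underscore instead of a colon, as well as predictions typed exon or mRNA, so an explicit --analysis list would still be needed for those.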
> > > > > > -- Don > > > > > > ## Bio::DB::Das::ChadoFC.pm, part of new() - load analysis types > > > ## treat similar to CV table types > > > > > > sub getAnalysisFeatureHash > > > { > > > my $self= shift; > > > > > > my $dbh= $self->dbh(); > > > my $sth = $dbh->prepare("select analysis_id,program,sourcename from > > > analysis") > > > or $self->throw("unable to prepare select analysis"); > > > $sth->execute or $self->throw("unable to select analysis"); > > > > > > my (%term2name, %name2term); # start with two empty hashes > > > > > > while (my $hashref = $sth->fetchrow_hashref) { > > > > > > ## this is dgg syntax of analysis feature names for GFF > > > ## all have generic 'match' method and program:source as 'source' > > > ## a problem, want other main types: EST_match:xxx, mRNA:genie .. etc. > > > my $anfeat= "match:".$hashref->{program}.":".$hashref->{sourcename}; > > > > > > $term2name{ $hashref->{analysis_id} } = $anfeat; > > > $name2term{ $anfeat } = $hashref->{analysis_id}; > > > } > > > $self->an_term2name(\%term2name); > > > $self->an_name2term(\%name2term); > > > } > > > > > > ## Das::ChadoFC::Segment snippets > > > sub features { > > > $self->{has_anatype}=0; > > > my $sql_range = ''; > > > my ($interbase_start,$rend,$srcfeature_id,$sql_types); > > > unless ($feature_id) { > > > $sql_range = $self->sql_range($rangetype); > > > > > > $sql_types = $self->sql_types($types, -1); # dgg > > > > > > $srcfeature_id = $self->{srcfeature_id}; > > > } > > > ... > > > elsif($self->{has_anatype}) { > > > $from_part .= "left join analysisfeature af using (feature_id) "; > > > } > > > > > > > > > sub sql_types > > > .. > > > $valid_type = $factory->name2term($temp_type); > > > $is_anatype= 0; > > > unless ($valid_type) { > > > $valid_type = $factory->an_name2term($temp_type); > > > $self->{has_anatype}= $is_anatype= 1 if ($valid_type); > > > } > > > ..
> > > ## leave out extra invalid types > > > if (!$valid_type) { > > > ### skip > > > } elsif ($temp_dbxref) { > > > $sql_types .= $orsql."(f.type_id = $valid_type and fd.dbxref_id = > > > $temp_dbxref)"; > > > } elsif($is_anatype) { > > > $sql_types .= $orsql."(af.analysis_id = $valid_type)"; #<<< > > > } else { > > > $sql_types .= $orsql."(f.type_id = $valid_type)"; > > > } > > > > > > > > > Lists of GFF feature type:source from some current MOD data > > > where * are probably analysisfeature types (program:database) > > > > > > rice gff type:source > > > ftp://ftp.gramene.org/pub/gramene/release17/data/sequence_annotation/ > > > gff3/ > > > -------------------- > > > CDS:known > > > CDS:tigr > > > EST:cmap > > > EST_match:Barley (? might be EST_match:someprogram:Barley) > > > EST_match:Maize > > > EST_match:Millet > > > EST_match:Rice > > > EST_match:Sorghum > > > EST_match:Wheat > > > cDNA_match:Rice > > > cross_genome_match:Maize > > > cross_genome_match:Rice > > > cross_genome_match:Sorghum > > > * exon:FgenesH:Monocot > > > exon:known > > > exon:tigr > > > five_prime_UTR:tigr > > > gene:known > > > gene:tigr > > > * mRNA:FgenesH:Monocot > > > mRNA:known > > > mRNA:tigr > > > microsatellite:cmap > > > three_prime_UTR:known > > > three_prime_UTR:tigr > > > transposable_element_insertion_site:cmap > > > > > > worm gff type:source > > > ftp://ftp.wormbase.org/pub/wormbase/species/elegans/ > > > genome_feature_tables/GFF3/ > > > ---------------------- > > > CDS:Coding_transcript > > > * CDS:Genefinder > > > CDS:Transposon_CDS > > > CDS:history > > > * CDS:twinscan > > > * EST_match:BLAT_EST_BEST (~ EST_match:BLAT:EST_BEST) > > > * EST_match:BLAT_EST_OTHER > > > PCR_product:GenePair_STS > > > PCR_product:Orfeome > > > RNAi_reagent:RNAi_primary > > > RNAi_reagent:RNAi_secondary > > > SNP:Allele > > > binding_site:binding_site > > > * cDNA_match:BLAT_mRNA_BEST (~ cDNA_match:BLAT:mRNA_BEST ) > > > * cDNA_match:BLAT_mRNA_OTHER > > > clone_end:. > > > clone_start:. 
> > > complex_substitution :Allele > > > deletion:Allele > > > exon:Coding_transcript > > > * exon:Genefinder > > > exon:Non_coding_transcript > > > exon:Pseudogene > > > exon:Transposon_CDS > > > exon:history > > > exon:miRNA > > > exon:rRNA > > > exon:scRNA > > > exon:snRNA > > > exon:snoRNA > > > exon:tRNA > > > * exon:tRNAscan-SE-1.23 > > > * exon:twinscan > > > experimental_result_region:Expr_profile > > > experimental_result_region:cDNA_for_RNAi > > > * expressed_sequence_match:BLAT_OST_BEST (~ > > > expressed_sequence_match:BLAT:OST_BEST ) > > > * expressed_sequence_match:BLAT_OST_OTHER > > > five_prime_UTR:Coding_transcript > > > gene:Coding_transcript > > > gene:gene > > > gene:history > > > gene:landmark > > > insertion:Allele > > > inverted_repeat:inverted > > > mRNA:Coding_transcript > > > * mRNA:Genefinder > > > mRNA:Transposon_CDS > > > mRNA:history > > > * mRNA:twinscan > > > miRNA:miRNA > > > nc_primary_transcript:Non_coding_transcript > > > * nucleotide_match:BLAT_EMBL_BEST (~ nucleotide_match:BLAT:EMBL_BEST ) > > > * nucleotide_match:BLAT_EMBL_OTHER > > > * nucleotide_match:BLAT_TC1_BEST > > > * nucleotide_match:BLAT_TC1_OTHER > > > * nucleotide_match:BLAT_ncRNA_BEST > > > * nucleotide_match:BLAT_ncRNA_OTHER > > > * nucleotide_match:TEC_RED > > > * nucleotide_match:waba_coding > > > * nucleotide_match:waba_strong > > > * nucleotide_match:waba_weak > > > oligo:. > > > operon:operon > > > polyA_signal_sequence:polyA_signal_sequence > > > polyA_site:polyA_site > > > processed_transcript:gene > > > protein_coding_primary_transcript:Coding_transcript > > > * protein_match:wublastx > > > pseudogene:Pseudogene > > > pseudogene:history > > > rRNA:rRNA > > > reagent:Oligo_set > > > region:. > > > region:Genbank > > > region:Genomic_canonical > > > region:Link > > > * repeat_region:RepeatMasker > > > scRNA:scRNA > > > sequence_variant:. 
> > > sequence_variant:Allele > > > snRNA:snRNA > > > snoRNA:snoRNA > > > substitution:Allele > > > tRNA:tRNA > > > * tRNA:tRNAscan-SE-1.23 > > > tandem_repeat:tandem > > > three_prime_UTR:Coding_transcript > > > trans_splice_acceptor_site:SL1 > > > trans_splice_acceptor_site:SL2 > > > transcript:SAGE_transcript > > > * translated_nucleotide_match:BLAT_NEMATODE (~ > > > translated_nucleotide_match:BLAT:NEMATODE ) > > > transposable_element:Transposon > > > transposable_element:Transposon_CDS > > > transposable_element_insertion_site:Allele > > > transposable_element_insertion_site:Mos_insertion_allele > > > > > > > > > fly gff type:source > > > ftp://ftp.flybase.net/genomes/dmel/current/gff/ > > > ----------------------- > > > BAC:. > > > CDS:. > > > aberration_junction:. > > > chromosome:. > > > chromosome_arm:. > > > chromosome_band:. > > > enhancer:. > > > exon:. > > > five_prime_UTR:. > > > gene:. > > > insertion_site:. > > > intron:. > > > mRNA:. > > > * match:RNAiHDP > > > * match:assembly:path > > > * match:blastx:aa_SPTR.dmel > > > * match:blastx:aa_SPTR.insect > > > * match:blastx:aa_SPTR.othinv > > > * match:blastx:aa_SPTR.othvert > > > * match:blastx:aa_SPTR.plant > > > * match:blastx:aa_SPTR.primate > > > * match:blastx:aa_SPTR.rodent > > > * match:blastx:aa_SPTR.worm > > > * match:blastx:aa_SPTR.yeast > > > * match:genscan > > > * match:repeatmasker > > > * match:sim4:na_ARGs.dros > > > * match:sim4:na_ARGsCDS.dros > > > * match:sim4:na_DGC_dros > > > * match:sim4:na_dbEST.diff.dmel > > > * match:sim4:na_dbEST.same.dmel > > > * match:sim4:na_gadfly_dmel_r2 > > > * match:sim4:na_gb.dmel > > > * match:sim4:na_gb.tpa.dmel > > > * match:sim4:na_smallRNA.dros > > > * match:sim4:na_transcript_dmel_r31 > > > * match:sim4:na_transcript_dmel_r32 > > > * match:tRNAscan-SE:. 
> > > * match:tblastx:na_agambiae > > > * match:tblastx:na_dbEST.insect > > > * match:tblastx:na_dpse > > > * match_part:RNAiHDP > > > * match_part:assembly:path > > > * match_part:blastx:aa_SPTR.dmel > > > * match_part:blastx:aa_SPTR.insect > > > * match_part:blastx:aa_SPTR.othinv > > > * match_part:blastx:aa_SPTR.othvert > > > * match_part:blastx:aa_SPTR.plant > > > * match_part:blastx:aa_SPTR.primate > > > * match_part:blastx:aa_SPTR.rodent > > > * match_part:blastx:aa_SPTR.worm > > > * match_part:blastx:aa_SPTR.yeast > > > * match_part:genscan > > > * match_part:repeatmasker > > > * match_part:sim4:na_ARGs.dros > > > * match_part:sim4:na_ARGsCDS.dros > > > * match_part:sim4:na_DGC_dros > > > * match_part:sim4:na_dbEST.diff.dmel > > > * match_part:sim4:na_dbEST.same.dmel > > > * match_part:sim4:na_gadfly_dmel_r2 > > > * match_part:sim4:na_gb.dmel > > > * match_part:sim4:na_gb.tpa.dmel > > > * match_part:sim4:na_smallRNA.dros > > > * match_part:sim4:na_transcript_dmel_r31 > > > * match_part:sim4:na_transcript_dmel_r32 > > > * match_part:tRNAscan-SE:. > > > * match_part:tblastx:na_agambiae > > > * match_part:tblastx:na_dbEST.insect > > > * match_part:tblastx:na_dpse > > > mature_peptide:. > > > ncRNA:. > > > oligo:. > > > point_mutation:. > > > polyA_site:. > > > protein_binding_site:. > > > pseudogene:. > > > region:. > > > regulatory_region:. > > > rescue_fragment:. > > > scaffold:. > > > sequence_variant:. > > > snRNA:. > > > snoRNA:. > > > tRNA:. > > > three_prime_UTR:. > > > transcription_start_site:. > > > transposable_element:. > > > transposable_element_insertion_site:. 
3116 > > > > > > > > > yeast gff type:source count > > > ftp://genome-ftp.stanford.edu/pub/yeast/data_download/ > > > chromosomal_feature/saccharomyces_cerevisiae.gff > > > ------------------------- > > > ARS:SGD > > > CDS:SGD > > > binding_site:SGD > > > centromere:SGD > > > chromosome:SGD > > > gene:SGD > > > insertion:SGD > > > intron:SGD > > > ncRNA:SGD > > > nc_primary_transcript:SGD > > > nucleotide_match:SGD > > > pseudogene:SGD > > > rRNA:SGD > > > region:SGD > > > region:landmark > > > repeat_family:SGD > > > repeat_region:SGD > > > snRNA:SGD > > > snoRNA:SGD > > > tRNA:SGD > > > telomere:SGD > > > transposable_element:SGD > > > transposable_element_gene:SGD > > > > > > -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405 > > > -- gilbertd@indiana.edu -- http://marmot.bio.indiana.edu/ > > > _______________________________________________ > > > Gmod-gbrowse mailing list > > > Gmod-gbrowse@lists.sourceforge.net > > > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse > > > > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D.
cain@cshl.edu > GMOD Coordinator (http://www.gmod.org/) 216-392-3087 > Cold Spring Harbor Laboratory > > > > _______________________________________________ > Gmod-devel mailing list > Gmod-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-devel > From birney at ebi.ac.uk Thu Jul 28 19:20:39 2005 From: birney at ebi.ac.uk (Ewan Birney) Date: Thu Jul 28 19:12:17 2005 Subject: [Bioperl-l] Fixing bioperl [was Re: [GMOD-devel] Re: [Gmod-gbrowse] Analysis features (Re: Final alpha release of gmod (chado))] In-Reply-To: References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu> <42E909E3.2030102@infobiogen.fr> <1122570166.3288.10.camel@localhost.localdomain> Message-ID: <42E96847.1060900@ebi.ac.uk> Just my $0.02 on this.... Chris - this seems bang on the money and what we should do (roll back out the changes, extend the interface and then in the extended interface have the "scruffy" types delegate to the short_name or whatever in the main types). So - for what it is worth, this is the way to go for me. Chris Mungall wrote: > [sorry for the cross-posting, but I think it's really important to have a > gmod to bioperl chit chat on this. I've removed gmod-gbrowse from the cc > list] > > On Thu, 28 Jul 2005, Scott Cain wrote: > > >>Hi Cyril, >> >>I think Bio::Tools::GFF is somewhat hacky and not a tool I would use to >>produce 'safe' GFF3. On the other hand Bio::FeatureIO is still a little >>immature, but it is what I used for the chado GFF3 bulk loader, so it >>does handle (parse) Target features.
So my suggestion would be to use >>BFIO::gff, but be prepared for some problems; when you find them >>complain loudly on the bioperl mailing list or fix the problems and >>commit them (or both!). > > > I think the answer may be even more complicated than this. > > Lurkers and contributors to the bioperl mailing list may have noticed that > there have been some major obstacles to progress lately, particularly in > getting a stable release of the code out. bp1.4 is fairly old, 1.5 is a > developer release, though this is the one required by GMOD. > > My understanding is that this bottleneck can be traced back to changes in > the SeqFeature and Annotation model. These changes appear to be required > by Bio::SeqFeature::Annotated which is produced by Bio::FeatureIO::gff > (which in turn is used by the GMOD bulk loader, which is the main reason > GMOD requires 1.5, I believe?). Unfortunately, these changes also break > existing code and have a severe negative impact on memory usage. > > Before advising Cyril and others to switch to BFIO::gff I think it's > important to make sure there is a clear path forward with bioperl. My > impression is that there is something of a stalemate here. The bioperl > developers would like to retract the aforementioned changes, but they > believe they cannot do this without breaking GMOD code. They are also > extremely uncomfortable about leaving these changes in. Everyone gives up > and starts coding around bioperl. > > Here is why the changes were introduced: > > BioPerl has a 'scruffy' typing model, whereby feature types (primary_tag > in bioperl) and featureprop types (tags in bioperl) are labels or strings. > In contrast, Chado forces all types to be some class or relation in an > ontology. > > Now obviously I'm rather partial to the Chado model, but that doesn't mean > I think it should be forced upon bioperl.
I often use bioperl in scruffy > mode (on scruffy data); or in some combination whereby I map the scruffy > types to ontologies in some non-bioperl code. When using bioperl as a > middleware component over a nicely organised database, ontology-typed mode > is definitely best. However, the majority of bioperl users (including > myself) spend a large proportion of their time working with scruffy data, > in which case lightweight scruffy types are more appropriate. > > It seems that there is a perfectly simple way of reconciling both > approaches. We revert bioperl back to the simpler scruffy model. The > majority of users and developers breathe a sigh of relief. We then extend > SeqFeatureI with something like SeqFeatureAnnotatedI. This forces types to > be stored as OntologyTerms (and I haven't even touched on some of the > problems here, but at least we are insulating the standard bioperl layer > that 99% of users use from these issues). All classes implementing SFAI > will necessarily implement SFI, and the primary_tag and tag_values methods > will be supported (not deprecated) as simple delegations to the > OntologyTerm objects. > > We can then modify BFIO::gff (which is an incredibly useful piece of code) > and get rid of all the dependencies on SO and Bio::Ontology* and instead > allow the user of this module to plug in their own resolver/validator - so > they can choose whether they just want fast scruffy lightweight SFI > features, or whether they want ontology-typed SFAI features. If the > latter, then they can choose their own resolver strategy - by a user > supplied hash, by a copy of SO auto-downloaded from sourceforge, by a > local chado db, by the genbank->SO mapping table, during parsing vs > post-parsing, whatever. In fact there is already > Bio::SeqFeature::Tools::TypeMapper, but currently this is mostly concerned > with helping Bio::SeqFeature::Tools::Unflattener convert scruffy genbank > to something sensible. 
> > GMOD (and perhaps biosql) would use SFAI, everyone else would use the > simpler SFI. Someone can even get a stable 1.6 release out before all the > SFAI details such as how the resolver would work are finalised. I'd really > like to see 1.6 include a simpler BFIO::gff that can optionally produce > features that aren't SeqFeature::Annotateds, but that's negotiable. > > There are vast swathes of both GMOD and BioPerl code I'm not familiar with, > so it's possible my analysis above is flawed in some way. If it is, then > it's up to someone from either camp to speak up! If not, then there is no > excuse for the relevant people not to start sorting out this mess, > beginning with the solution outlined above. > > Cheers > Chris > > >>Scott >> >> >>On Thu, 2005-07-28 at 18:37 +0200, Cyril Pommier wrote: >> >>>Hello, >>>We are going to store analysis results in chado, and we are of course >>>very interested in these future evolutions of GFF3/chado. >>>So we would like to make sure that the parsers and conversion programs >>>we are writing now will be compatible with the future GFF3. >>> >>>We are using Bio::SeqFeature::Generic objects that we write with >>>Bio::Tools::GFF. >>> >>>Do you think that Bio::Tools::GFF will be able to handle the new 'type' >>>column or is it better to switch to Bio::FeatureIO::gff? >>> >>>Thanks in advance for any advice. >>> >>>Cyril >>> >>>Don Gilbert wrote: >>> >>> >>>>Scott, >>>> >>>>Your notes in gmod_bulk_load_gff3.pl suggest it is headed in >>>>the same direction I suggest below. More about these todo points >>>> >>>> >>>>>- address flybase's use of analysisfeature combined with feature to >>>>>give source-type information (in GFF terms). This will need to >>>>>be addressed in the GBrowse adaptor. >>>>>- modify the bulk loader to allow "mixed" GFF3 files (that is, >>>>>containing >>>>>both analysis results and annotations).
See perldoc >>>>>gmod_bulk_load_gff3.pl >>>>>for more info >>>> >>>> >>>>Use of chado's analysisfeature table is something others who know >>>>it better can comment on. But after working with it for a while >>>>it makes sense to me to use in this way: >>>> >>>>For a future GFF -> Chado loader, treat analysis features such as >>>>gene finding results, BLAST, sim4 as 'analysisfeature type' rather >>>>than feature CV term type (the ones that now end up with a generic >>>>'match' cvterm). In these cases the Analysis table is populated with >>>>program:database_sourcename >>>>as the basis of this 'analysisfeature type', such as >>>>match:blastx:na_pe.dros >>>>match:sim4:DGC >>>>match:genie:dummy (or maybe exon:genie) >>>> >>>>The program:database fits neatly in GFF source field, as >>>>#ref source type start stop ... >>>>chr1 blastx:na_pe.dros match 1 100 ... >>>>chr1 sim4:DGC match 1 100 ... >>>> >>>>These can be treated in database adaptor analogously to the CVterm >>>>table feature types. See at end a list of current GFF feature >>>>type:source from worm, rice, yeast, fly MODs. Fly and rice use a >>>>syntax like above and worm gff uses BLAT_EMBL_BEST, instead of >>>>BLAT:EMBL_BEST. >>>> >>>>From POD of your bulk_load_gff3.pl >>>> >>>>>Analysis >>>>>If you are loading analysis results (ie, BLAT results, gene >>>>>predictions), you should specify the -a flag. If no arguments are >>>>>supplied with the -a, then the loader will assume that the results >>>>>belong to an analysis set with a name that is the concatenation of >>>>>the source (column 2) and the method (column 3) with an underscore >>>>>in between. >>>> >>>>"... then the loader will assume that the results belong to an >>>>analysis table row with a program name and database source name >>>>taken from Source (column 2, colon separated program:sourcename), >>>>with a SOFA feature type taken from Method (column 3). If >>>>sourcename doesn't apply, e.g. genefinder, don't add or use 'dummy'. 
>>>>Use the generic 'match' SOFA type if others don't apply." >>>>[see also http://song.sourceforge.net/gff3-jan04.shtml#ALIGNMENTS] >>>> >>>>Note that sourcename of database is a common attribute (all those >>>>blasts, blats, sim4, ... are run on several different databases). >>>> >>>>For that underscore between method and source, where does that go into >>>>database? It is used as parts of program or database sourcename names, >>>>so it may be problematic to add one if not needed. >>>> >>>>Oh, I see now from bulk_load_gff3.PLS, you are creating a 'Name' entry >>>>for the analysis table. This probably is less useful than using Program >>>>and Sourcename fields as flybase does, which comes from the common >>>>usage where people run various programs, with various database sources >>>>and want to plop the results into a database easily. These go into those >>>>two fields directly, no need to create or parse a Name entry >>>>(which can be and is null in flybase data). >>>> >>>> >>>>>my $search_analysis >>>>>= $db->prepare("SELECT analysis_id FROM analysis WHERE name=?"); >>>> >>>>I think it would be better as >>>>my $search_analysis >>>>= $db->prepare("SELECT analysis_id FROM analysis WHERE program=? and >>>>sourcename=?"); >>>> >>>> >>>>>Otherwise, the argument provided with -a will be taken >>>>>as the name of the analysis set. Either way, the analysis set must >>>>>already be in the analysis table. The easiest way to do this is to >>>>>insert it directly in the psql shell: >>>>> >>>>>INSERT INTO analysis (name, program, programversion) >>>>>VALUES ('genscan 2005-2-28','genscan','5.4'); >>>> >>>>My choice would be to populate the analysis table from GFF data, rather >>>>than expect preparation by the user (or as another option).
>>>> >>>>INSERT INTO analysis (program, sourcename) >>>>VALUES ('tblastx','na_baylorf1_scfchunk.dpse'); >>>>INSERT INTO analysis (program, sourcename) >>>>VALUES ('sim4','na_gb.dmel'); >>>>INSERT INTO analysis (program, sourcename, programversion) >>>>VALUES ('genie_masked','dummy', '1.0'); >>>> >>>> >>>>>There are other columns in the analysis table that are optional; see >>>>>the schema documentation and '\d analysis' in psql for more >>>>>information. >>>>> >>>> >>>>.... >>>> >>>>>A planned addition to the functionality of handling analysis results >>>>>is to allow "mixed" GFF files, where some lines are analysis results >>>>>and some are not. >>>> >>>>This is the case for drosophila GFF now (see the others below). If >>>>you make the default assumption that ($method =~ /.*match/) and >>>>($source =~ m/([^:]+):(.+)/) identify analysis features, you should get all/most of the >>>>analysisfeature types, and probably not anything else. >>>> >>>> >>>>>Additionally, one will be able to supply lists of >>>>>types (optionally with sources) and their associated entry in the >>>>>analysis table. The format will probably be tag value pairs: >>>>> >>>>>--analysis match:Rice_est=rice_est_blast, \ >>>>>match:Maize_cDNA=maize_cdna_blast, \ >>>>>mRNA=genscan_prediction,exon=genscan_prediction >>>> >>>>My suggestion for this (as per GFF source,type columns) would be >>>>--analysis match:program:sourcename ... >>>>--analysis match:blast:Rice_est,match:blast:Maize_cDNA,\ >>>>mRNA:genscan:dummy, exon:genscan:dummy >>>> >>>>I guess the 'dummy' data sourcename need not be added; flybase uses it >>>>to keep that field not-null, but it isn't required by the schema. >>>> >>>>Here are some snippets from the ChadoFC adaptor I modified >>>>from yours (will get into cvs.sf.net 'real soon'), showing that >>>>it isn't much work to add this as an analog to how cvterm types >>>>are used.
>>>> >>>>-- Don >>>> >>>>## Bio::DB::Das::ChadoFC.pm, part of new() - load analysis types >>>>## treat similar to CV table types >>>> >>>>sub getAnalysisFeatureHash >>>>{ >>>>my $self= shift; >>>> >>>>my $dbh= $self->dbh(); >>>>my $sth = $dbh->prepare("select analysis_id,program,sourcename from >>>>analysis") >>>>or warn "unable to prepare select cvterms"; >>>>$sth->execute or $self->throw("unable to select cvterms"); >>>> >>>>my(%term2name,%name2term) = ({},{}); >>>> >>>>while (my $hashref = $sth->fetchrow_hashref) { >>>> >>>>## this is dgg syntax of analysis feature names for GFF >>>>## all have generic 'match' method and program:source as 'source' >>>>## a problem, want other main types: EST_match:xxx, mRNA:genie .. etc. >>>>my $anfeat= "match:".$hashref->{program}.":".$hashref->{sourcename}; >>>> >>>>$term2name{ $hashref->{analysis_id} } = $anfeat; >>>>$name2term{ $anfeat } = $hashref->{analysis_id}; >>>>} >>>>$self->an_term2name(\%term2name); >>>>$self->an_name2term(\%name2term); >>>>} >>>> >>>>## Das::ChadoFC::Segment snippets >>>>sub features { >>>>$self->{has_anatype}=0; >>>>my $sql_range = ''; >>>>my ($interbase_start,$rend,$srcfeature_id,$sql_types); >>>>unless ($feature_id) { >>>>$sql_range = $self->sql_range($rangetype); >>>> >>>>$sql_types = $self->sql_types($types, -1); # dgg >>>> >>>>$srcfeature_id = $self->{srcfeature_id}; >>>>} >>>>... >>>>elsif($self->{has_anatype}) { >>>>$from_part .= "left join analysisfeature af using (feature_id) "; >>>>} >>>> >>>> >>>>sub sql_types >>>>.. >>>>$valid_type = $factory->name2term($temp_type); >>>>$is_anatype= 0; >>>>unless ($valid_type) { >>>>$valid_type = $factory->an_name2term($temp_type); >>>>$self->{has_anatype}= $is_anatype= 1 if ($valid_type); >>>>} >>>>.. 
>>>>## leave out extra invalid types >>>>if (!$valid_type) { >>>>### skip >>>>} elsif ($temp_dbxref) { >>>>$sql_types .= $orsql."(f.type_id = $valid_type and fd.dbxref_id = >>>>$temp_dbxref)"; >>>>} elsif($is_anatype) { >>>>$sql_types .= $orsql."(af.analysis_id = $valid_type)"; #<<< >>>>} else { >>>>$sql_types .= $orsql."(f.type_id = $valid_type)"; >>>>} >>>> >>>> >>>>Lists of GFF feature type:source from some current MOD data >>>>where * are probably analysisfeature types (program:database) >>>> >>>>rice gff type:source >>>>ftp://ftp.gramene.org/pub/gramene/release17/data/sequence_annotation/ >>>>gff3/ >>>>-------------------- >>>>CDS:known >>>>CDS:tigr >>>>EST:cmap >>>>EST_match:Barley (? might be EST_match:someprogram:Barley) >>>>EST_match:Maize >>>>EST_match:Millet >>>>EST_match:Rice >>>>EST_match:Sorghum >>>>EST_match:Wheat >>>>cDNA_match:Rice >>>>cross_genome_match:Maize >>>>cross_genome_match:Rice >>>>cross_genome_match:Sorghum >>>>* exon:FgenesH:Monocot >>>>exon:known >>>>exon:tigr >>>>five_prime_UTR:tigr >>>>gene:known >>>>gene:tigr >>>>* mRNA:FgenesH:Monocot >>>>mRNA:known >>>>mRNA:tigr >>>>microsatellite:cmap >>>>three_prime_UTR:known >>>>three_prime_UTR:tigr >>>>transposable_element_insertion_site:cmap >>>> >>>>worm gff type:source >>>>ftp://ftp.wormbase.org/pub/wormbase/species/elegans/ >>>>genome_feature_tables/GFF3/ >>>>---------------------- >>>>CDS:Coding_transcript >>>>* CDS:Genefinder >>>>CDS:Transposon_CDS >>>>CDS:history >>>>* CDS:twinscan >>>>* EST_match:BLAT_EST_BEST (~ EST_match:BLAT:EST_BEST) >>>>* EST_match:BLAT_EST_OTHER >>>>PCR_product:GenePair_STS >>>>PCR_product:Orfeome >>>>RNAi_reagent:RNAi_primary >>>>RNAi_reagent:RNAi_secondary >>>>SNP:Allele >>>>binding_site:binding_site >>>>* cDNA_match:BLAT_mRNA_BEST (~ cDNA_match:BLAT:mRNA_BEST ) >>>>* cDNA_match:BLAT_mRNA_OTHER >>>>clone_end:. >>>>clone_start:. 
>>>>complex_substitution :Allele >>>>deletion:Allele >>>>exon:Coding_transcript >>>>* exon:Genefinder >>>>exon:Non_coding_transcript >>>>exon:Pseudogene >>>>exon:Transposon_CDS >>>>exon:history >>>>exon:miRNA >>>>exon:rRNA >>>>exon:scRNA >>>>exon:snRNA >>>>exon:snoRNA >>>>exon:tRNA >>>>* exon:tRNAscan-SE-1.23 >>>>* exon:twinscan >>>>experimental_result_region:Expr_profile >>>>experimental_result_region:cDNA_for_RNAi >>>>* expressed_sequence_match:BLAT_OST_BEST (~ >>>>expressed_sequence_match:BLAT:OST_BEST ) >>>>* expressed_sequence_match:BLAT_OST_OTHER >>>>five_prime_UTR:Coding_transcript >>>>gene:Coding_transcript >>>>gene:gene >>>>gene:history >>>>gene:landmark >>>>insertion:Allele >>>>inverted_repeat:inverted >>>>mRNA:Coding_transcript >>>>* mRNA:Genefinder >>>>mRNA:Transposon_CDS >>>>mRNA:history >>>>* mRNA:twinscan >>>>miRNA:miRNA >>>>nc_primary_transcript:Non_coding_transcript >>>>* nucleotide_match:BLAT_EMBL_BEST (~ nucleotide_match:BLAT:EMBL_BEST ) >>>>* nucleotide_match:BLAT_EMBL_OTHER >>>>* nucleotide_match:BLAT_TC1_BEST >>>>* nucleotide_match:BLAT_TC1_OTHER >>>>* nucleotide_match:BLAT_ncRNA_BEST >>>>* nucleotide_match:BLAT_ncRNA_OTHER >>>>* nucleotide_match:TEC_RED >>>>* nucleotide_match:waba_coding >>>>* nucleotide_match:waba_strong >>>>* nucleotide_match:waba_weak >>>>oligo:. >>>>operon:operon >>>>polyA_signal_sequence:polyA_signal_sequence >>>>polyA_site:polyA_site >>>>processed_transcript:gene >>>>protein_coding_primary_transcript:Coding_transcript >>>>* protein_match:wublastx >>>>pseudogene:Pseudogene >>>>pseudogene:history >>>>rRNA:rRNA >>>>reagent:Oligo_set >>>>region:. >>>>region:Genbank >>>>region:Genomic_canonical >>>>region:Link >>>>* repeat_region:RepeatMasker >>>>scRNA:scRNA >>>>sequence_variant:. 
>>>>sequence_variant:Allele >>>>snRNA:snRNA >>>>snoRNA:snoRNA >>>>substitution:Allele >>>>tRNA:tRNA >>>>* tRNA:tRNAscan-SE-1.23 >>>>tandem_repeat:tandem >>>>three_prime_UTR:Coding_transcript >>>>trans_splice_acceptor_site:SL1 >>>>trans_splice_acceptor_site:SL2 >>>>transcript:SAGE_transcript >>>>* translated_nucleotide_match:BLAT_NEMATODE (~ >>>>translated_nucleotide_match:BLAT:NEMATODE ) >>>>transposable_element:Transposon >>>>transposable_element:Transposon_CDS >>>>transposable_element_insertion_site:Allele >>>>transposable_element_insertion_site:Mos_insertion_allele >>>> >>>> >>>>fly gff type:source >>>>ftp://ftp.flybase.net/genomes/dmel/current/gff/ >>>>----------------------- >>>>BAC:. >>>>CDS:. >>>>aberration_junction:. >>>>chromosome:. >>>>chromosome_arm:. >>>>chromosome_band:. >>>>enhancer:. >>>>exon:. >>>>five_prime_UTR:. >>>>gene:. >>>>insertion_site:. >>>>intron:. >>>>mRNA:. >>>>* match:RNAiHDP >>>>* match:assembly:path >>>>* match:blastx:aa_SPTR.dmel >>>>* match:blastx:aa_SPTR.insect >>>>* match:blastx:aa_SPTR.othinv >>>>* match:blastx:aa_SPTR.othvert >>>>* match:blastx:aa_SPTR.plant >>>>* match:blastx:aa_SPTR.primate >>>>* match:blastx:aa_SPTR.rodent >>>>* match:blastx:aa_SPTR.worm >>>>* match:blastx:aa_SPTR.yeast >>>>* match:genscan >>>>* match:repeatmasker >>>>* match:sim4:na_ARGs.dros >>>>* match:sim4:na_ARGsCDS.dros >>>>* match:sim4:na_DGC_dros >>>>* match:sim4:na_dbEST.diff.dmel >>>>* match:sim4:na_dbEST.same.dmel >>>>* match:sim4:na_gadfly_dmel_r2 >>>>* match:sim4:na_gb.dmel >>>>* match:sim4:na_gb.tpa.dmel >>>>* match:sim4:na_smallRNA.dros >>>>* match:sim4:na_transcript_dmel_r31 >>>>* match:sim4:na_transcript_dmel_r32 >>>>* match:tRNAscan-SE:. 
>>>>* match:tblastx:na_agambiae >>>>* match:tblastx:na_dbEST.insect >>>>* match:tblastx:na_dpse >>>>* match_part:RNAiHDP >>>>* match_part:assembly:path >>>>* match_part:blastx:aa_SPTR.dmel >>>>* match_part:blastx:aa_SPTR.insect >>>>* match_part:blastx:aa_SPTR.othinv >>>>* match_part:blastx:aa_SPTR.othvert >>>>* match_part:blastx:aa_SPTR.plant >>>>* match_part:blastx:aa_SPTR.primate >>>>* match_part:blastx:aa_SPTR.rodent >>>>* match_part:blastx:aa_SPTR.worm >>>>* match_part:blastx:aa_SPTR.yeast >>>>* match_part:genscan >>>>* match_part:repeatmasker >>>>* match_part:sim4:na_ARGs.dros >>>>* match_part:sim4:na_ARGsCDS.dros >>>>* match_part:sim4:na_DGC_dros >>>>* match_part:sim4:na_dbEST.diff.dmel >>>>* match_part:sim4:na_dbEST.same.dmel >>>>* match_part:sim4:na_gadfly_dmel_r2 >>>>* match_part:sim4:na_gb.dmel >>>>* match_part:sim4:na_gb.tpa.dmel >>>>* match_part:sim4:na_smallRNA.dros >>>>* match_part:sim4:na_transcript_dmel_r31 >>>>* match_part:sim4:na_transcript_dmel_r32 >>>>* match_part:tRNAscan-SE:. >>>>* match_part:tblastx:na_agambiae >>>>* match_part:tblastx:na_dbEST.insect >>>>* match_part:tblastx:na_dpse >>>>mature_peptide:. >>>>ncRNA:. >>>>oligo:. >>>>point_mutation:. >>>>polyA_site:. >>>>protein_binding_site:. >>>>pseudogene:. >>>>region:. >>>>regulatory_region:. >>>>rescue_fragment:. >>>>scaffold:. >>>>sequence_variant:. >>>>snRNA:. >>>>snoRNA:. >>>>tRNA:. >>>>three_prime_UTR:. >>>>transcription_start_site:. >>>>transposable_element:. >>>>transposable_element_insertion_site:. 
3116 >>>> >>>> >>>>yeast gff type:source count >>>>ftp://genome-ftp.stanford.edu/pub/yeast/data_download/ >>>>chromosomal_feature/saccharomyces_cerevisiae.gff >>>>------------------------- >>>>ARS:SGD >>>>CDS:SGD >>>>binding_site:SGD >>>>centromere:SGD >>>>chromosome:SGD >>>>gene:SGD >>>>insertion:SGD >>>>intron:SGD >>>>ncRNA:SGD >>>>nc_primary_transcript:SGD >>>>nucleotide_match:SGD >>>>pseudogene:SGD >>>>rRNA:SGD >>>>region:SGD >>>>region:landmark >>>>repeat_family:SGD >>>>repeat_region:SGD >>>>snRNA:SGD >>>>snoRNA:SGD >>>>tRNA:SGD >>>>telomere:SGD >>>>transposable_element:SGD >>>>transposable_element_gene:SGD >>>> >>>>-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405 >>>>-- gilbertd@indiana.edu -- http://marmot.bio.indiana.edu/ >>>> >>>> >>>> >>>>_______________________________________________ >>>>Gmod-gbrowse mailing list >>>>Gmod-gbrowse@lists.sourceforge.net >>>>https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse >>>> >>> >>> >>-- >>------------------------------------------------------------------------ >>Scott Cain, Ph. D.
cain@cshl.edu >>GMOD Coordinator (http://www.gmod.org/) 216-392-3087 >>Cold Spring Harbor Laboratory >> >> >> >>_______________________________________________ >>Gmod-devel mailing list >>Gmod-devel@lists.sourceforge.net >>https://lists.sourceforge.net/lists/listinfo/gmod-devel >> > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From pmiguel at purdue.edu Fri Jul 29 10:45:09 2005 From: pmiguel at purdue.edu (Phillip SanMiguel) Date: Fri Jul 29 10:36:13 2005 Subject: [Bioperl-l] Patching lucy Message-ID: <42EA40F5.3090707@purdue.edu> The patch to the lucy source code given in the appendix of http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Lucy.html does not seem to work for lucy-1.19p or lucy-1.19s. The patch itself applies fine, but the resulting executable (after make) segfaults when run on the lucy test data. Any advice? I've emailed the module's author, Andrew G. Walsh, directly, as the module requests, but I'm not sure he regularly monitors the Hotmail account listed there. So I thought I'd post here, in case someone has a patch that works on lucy-1.19.
-- Phillip SanMiguel Purdue Genomics Core Facility From cain at cshl.edu Fri Jul 29 11:17:12 2005 From: cain at cshl.edu (Scott Cain) Date: Fri Jul 29 11:07:52 2005 Subject: [Bioperl-l] Re: Fixing bioperl [was Re: [GMOD-devel] Re: [Gmod-gbrowse] Analysis features (Re: Final alpha release of gmod (chado))] In-Reply-To: References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu> <42E909E3.2030102@infobiogen.fr> <1122570166.3288.10.camel@localhost.localdomain> Message-ID: <1122650232.10455.31.camel@localhost.localdomain> Hi Chris, I agree that the changes you suggest below need to happen, and I am willing to move forward with them. After the last release of gmod/chado, I was planning to restructure several sections of the gmod architecture, so incorporating changes in bioperl will just go along for the ride. The main section of affected code in gmod is the GFF bulk loader, but after we make the changes to the bioperl API, it shouldn't be too hard to fix the loader. In fact, some of those changes may have already started. I remember that a few weeks before I released the gmod/chado package, Hilmar sent out an announcement that he had made some changes. I should have paid attention then, but I was busy getting my release together and everything seemed to work, so I ignored it. Unfortunately, the reason things continued to work was that I had forgotten to update my bioperl-live, and as a result, the gmod release doesn't work with bioperl-live. So now there is a tarball of bioperl distributed with the gmod release. OK, mentally put parentheses around most of the last paragraph, as it is mostly an aside. The other section of code that could have been affected, but won't be, is the ontology loader. The current ontology loader depends on Bio::Ontology, but I was already planning on migrating to go-perl for loading ontologies anyway, so that won't be a problem. So, who wants to take the lead on this?
Thanks, Scott On Thu, 2005-07-28 at 12:42 -0700, Chris Mungall wrote: > I think the answer may be even more complicated than this. > > Lurkers and contributors to the bioperl mailing list may have noticed that > there have been some major obstacles to progress lately, particularly in > getting a stable release of the code out. bp1.4 is fairly old, and 1.5 is a > developer release, though 1.5 is the one required by GMOD. > > My understanding is that this bottleneck can be traced back to changes in > the SeqFeature and Annotation model. These changes appear to be required > by Bio::SeqFeature::Annotated, which is produced by Bio::FeatureIO::gff > (which in turn is used by the GMOD bulk loader, which is the main reason > GMOD requires 1.5, I believe?). Unfortunately, these changes also break > existing code and have a severe negative impact on memory usage. > > Before advising Cyril and others to switch to BFIO::gff I think it's > important to make sure there is a clear path forward with bioperl. My > impression is that there is something of a stalemate here. The bioperl > developers would like to retract the aforementioned changes, but they > believe they cannot do this without breaking GMOD code. They are also > extremely uncomfortable about leaving these changes in. So everyone gives up > and starts coding around bioperl. > > Here is why the changes were introduced: > > BioPerl has a 'scruffy' typing model, whereby feature types (primary_tag > in bioperl) and featureprop types (tags in bioperl) are labels or strings. > In contrast, Chado forces all types to be some class or relation in an > ontology. > > Now obviously I'm rather partial to the Chado model, but that doesn't mean > I think it should be forced upon bioperl. I often use bioperl in scruffy > mode (on scruffy data), or in some combination whereby I map the scruffy > types to ontologies in some non-bioperl code.
When using bioperl as a > middleware component over a nicely organised database, ontology-typed mode > is definitely best. However, the majority of bioperl users (including > myself) spend a large proportion of their time working with scruffy data, > in which case lightweight scruffy types are more appropriate. > > It seems that there is a perfectly simple way of reconciling both > approaches. We revert bioperl back to the simpler scruffy model. The > majority of users and developers breathe a sigh of relief. We then extend > SeqFeatureI with something like SeqFeatureAnnotatedI. This forces types to > be stored as OntologyTerms (and I haven't even touched on some of the > problems here, but at least we are insulating the standard bioperl layer > that 99% of users use from these issues). All classes implementing SFAI > will necessarily implement SFI, and the primary_tag and tag_values methods > will be supported (not deprecated) as simple delegations to the > OntologyTerm objects. > > We can then modify BFIO::gff (which is an incredibly useful piece of code) > and get rid of all the dependencies on SO and Bio::Ontology* and instead > allow the user of this module to plug in their own resolver/validator - so > they can choose whether they just want fast scruffy lightweight SFI > features, or whether they want ontology-typed SFAI features. If the > latter, then they can choose their own resolver strategy - by a user > supplied hash, by a copy of SO auto-downloaded from sourceforge, by a > local chado db, by the genbank->SO mapping table, during parsing vs > post-parsing, whatever. In fact there is already > Bio::SeqFeature::Tools::TypeMapper, but currently this is mostly concerned > with helping Bio::SeqFeature::Tools::Unflattener convert scruffy genbank > to something sensible. > > GMOD (and perhaps biosql) would use SFAI, everyone else would use the > simpler SFI. 
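The split Chris proposes (features that keep the plain string interface while storing their type as an ontology term, with a pluggable resolver instead of a hard-wired SO dependency) can be sketched in a few lines of Perl. The package names Sketch::OntologyTerm and Sketch::SeqFeatureAnnotated are hypothetical stand-ins, not BioPerl's Bio::Ontology::Term or Bio::SeqFeature::Annotated, and the hash-based resolver is only one of the strategies he lists; a real SFAI class would implement the full SeqFeatureI interface, including tag_values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an ontology term (e.g. a Sequence Ontology class).
package Sketch::OntologyTerm;

sub new {
    my ($class, %args) = @_;
    return bless { identifier => $args{identifier}, name => $args{name} }, $class;
}
sub identifier { return $_[0]{identifier} }
sub name       { return $_[0]{name} }

# Hypothetical "annotated" feature: the type is stored as a term object, and
# primary_tag() keeps the plain-string interface by delegating to that term.
package Sketch::SeqFeatureAnnotated;

sub new {
    my ($class, %args) = @_;
    my $resolver = $args{resolver}
        or die "a type resolver is required\n";
    my $term = $resolver->($args{type})
        or die "cannot resolve type '$args{type}'\n";
    return bless { type => $term }, $class;
}
sub type        { return $_[0]{type} }
sub primary_tag { return $_[0]{type}->name }

package main;

# One user-supplied resolver strategy: a plain hash from labels to terms.
my %so = (
    gene => Sketch::OntologyTerm->new(identifier => 'SO:0000704', name => 'gene'),
    mRNA => Sketch::OntologyTerm->new(identifier => 'SO:0000234', name => 'mRNA'),
);
my $hash_resolver = sub { return $so{ $_[0] } };

my $f = Sketch::SeqFeatureAnnotated->new(type => 'mRNA', resolver => $hash_resolver);
printf "%s %s\n", $f->primary_tag, $f->type->identifier;   # prints "mRNA SO:0000234"
```

Swapping the hash resolver for one backed by a local chado database or a downloaded copy of SO would leave the feature class unchanged, which is the decoupling the proposal argues for.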
 Someone can even get a stable 1.6 release out before all the > SFAI details such as how the resolver would work are finalised. I'd really > like to see 1.6 include a simpler BFIO::gff that can optionally produce > features that aren't SeqFeature::Annotateds, but that's negotiable. > > There are vast swathes of both GMOD and BioPerl code I'm not familiar with, > so it's possible my analysis above is flawed in some way. If it is, then > it's up to someone from either camp to speak up! If not, then there's no > excuse for the relevant people not to start sorting out this mess by > commencing with the solution outlined above. > > Cheers > Chris > > > > > Scott > > > > > > On Thu, 2005-07-28 at 18:37 +0200, Cyril Pommier wrote: > > > Hello, > > > We are going to store analysis results in chado, and we are of course > > > very interested in these future developments of GFF3/chado. > > > So we would like to make sure that the parsers and conversion programs > > > we are writing now will be compatible with the future GFF3. > > > > > > We are using Bio::SeqFeature::Generic objects that we write with > > > Bio::Tools::GFF. > > > > > > Do you think that Bio::Tools::GFF will be able to handle the new 'type' > > > column or is it better to switch to Bio::FeatureIO::gff? > > > > > > Thanks in advance for any advice. > > > > > > Cyril > > > > > > Don Gilbert wrote: > > > > > > > > > > > Scott, > > > > > > > > Your notes in gmod_bulk_load_gff3.pl suggest it is headed in > > > > the same direction I suggest below. More about these to-do points: > > > > > > > >> - address flybase's use of analysisfeature combined with feature to > > > >> give source-type information (in GFF terms). This will need to > > > >> be addressed in the GBrowse adaptor. > > > >> - modify the bulk loader to allow "mixed" GFF3 files (that is, > > > >> containing > > > >> both analysis results and annotations).
See perldoc > > > >> gmod_bulk_load_gff3.pl > > > >> for more info > > > > > > > > > > > > Use of chado's analysisfeature table is something others who know > > > > it better can comment on. But after working with it for a while > > > > it makes sense to me to use in this way: > > > > > > > > For a future GFF -> Chado loader, treat analysis features such as > > > > gene finding results, BLAST, sim4 as 'analysisfeature type' rather > > > > than feature CV term type (the ones that now end up with a generic > > > > 'match' cvterm). In these cases the Analysis table is populated with > > > > program:database_sourcename > > > > as the basis of this 'analysisfeature type', such as > > > > match:blastx:na_pe.dros > > > > match:sim4:DGC > > > > match:genie:dummy (or maybe exon:genie) > > > > > > > > The program:database fits neatly in GFF source field, as > > > > #ref source type start stop ... > > > > chr1 blastx:na_pe.dros match 1 100 ... > > > > chr1 sim4:DGC match 1 100 ... > > > > > > > > These can be treated in database adaptor analogously to the CVterm > > > > table feature types. See at end a list of current GFF feature > > > > type:source from worm, rice, yeast, fly MODs. Fly and rice use a > > > > syntax like above and worm gff uses BLAT_EMBL_BEST, instead of > > > > BLAT:EMBL_BEST. > > > > > > > > From POD of your bulk_load_gff3.pl > > > > > Analysis > > > > > If you are loading analysis results (ie, BLAT results, gene > > > > > predictions), you should specify the -a flag. If no arguments are > > > > > supplied with the -a, then the loader will assume that the results > > > > > belong to an analysis set with a name that is the concatenation of > > > > > the source (column 2) and the method (column 3) with an underscore > > > > > in between. > > > > > > > > "... 
then the loader will assume that the results belong to an > > > > analysis table row with a program name and database source name > > > > taken from Source (column 2, colon separated program:sourcename), > > > > with a SOFA feature type taken from Method (column 3). If > > > > sourcename doesn't apply, e.g. genefinder, don't add or use 'dummy'. > > > > Use the generic 'match' SOFA type if others don't apply." > > > > [see also http://song.sourceforge.net/gff3-jan04.shtml#ALIGNMENTS] > > > > > > > > Note that sourcename of database is a common attribute (all those > > > > blasts, blats, sim4, ... are run on several different databases). > > > > > > > > For that underscore between method and source, where does that go into > > > > database? It is used as parts of program or database sourcename names, > > > > so it may be problematic to add one if not needed. > > > > > > > > Oh, I see now from bulk_load_gff3.PLS, you are creating a 'Name' entry > > > > for analysis table. This probably is less useful than using Program > > > > and Sourcename fields as flybase does, which comes from the common > > > > usage where people run various programs, with various database sources > > > > and want to plop the results into a database easily. These go into those > > > > two fields directly, no need to create or parse a Name entry > > > > (which can be and is null in flybase data). > > > > > > > > > my $search_analysis > > > > > = $db->prepare("SELECT analysis_id FROM analysis WHERE name=?"); > > > > > > > > I think it would be better as > > > > my $search_analysis > > > > = $db->prepare("SELECT analysis_id FROM analysis WHERE program=? and > > > > sourcename=?"); > > > > > > > > > Otherwise, the argument provided with -a will be taken > > > > > as the name of the analysis set. Either way, the analysis set must > > > > > already be in the analysis table. 
 The easiest way to do this is to > > > > > insert it directly in the psql shell: > > > > > > > > > > INSERT INTO analysis (name, program, programversion) > > > > > VALUES ('genscan 2005-2-28','genscan','5.4'); > > > > > > > > My choice would be to populate the analysis table from GFF data, rather > > > > than expect preparation by the user (or as another option). > > > > > > > > INSERT INTO analysis (program, sourcename) > > > > VALUES ('tblastx','na_baylorf1_scfchunk.dpse'); > > > > INSERT INTO analysis (program, sourcename) > > > > VALUES ('sim4','na_gb.dmel'); > > > > INSERT INTO analysis (program, sourcename, programversion) > > > > VALUES ('genie_masked','dummy', '1.0'); > > > > > > > > > There are other columns in the analysis table that are optional; see > > > > > the schema documentation and '\d analysis' in psql for more > > > > > information. > > > > > > > > > .... > > > > > A planned addition to the functionality of handling analysis results > > > > > is to allow "mixed" GFF files, where some lines are analysis results > > > > > and some are not. > > > > > > > > This is the case for drosophila GFF now (see the others below). If > > > > you make the default assumption that ($method =~ /.*match/) and > > > > ($source =~ m/([^:]+):(.+)/) identify analysis features, you should get all/most of the > > > > analysisfeature types, and probably not anything else. > > > > > > > > > Additionally, one will be able to supply lists of > > > > > types (optionally with sources) and their associated entry in the > > > > > analysis table. The format will probably be tag value pairs: > > > > > > > > > > --analysis match:Rice_est=rice_est_blast, \ > > > > > match:Maize_cDNA=maize_cdna_blast, \ > > > > > mRNA=genscan_prediction,exon=genscan_prediction > > > > > > > > My suggestion for this (as per GFF source,type columns) would be > > > > --analysis match:program:sourcename ...
> > > > --analysis match:blast:Rice_est,match:blast:Maize_cDNA,\ > > > > mRNA:genscan:dummy, exon:genscan:dummy > > > > > > > > I guess the 'dummy' data sourcename need not be added; flybase uses it > > > > to keep that field not-null, but it isn't required by the schema. > > > > > > > > Here are some snippets from the ChadoFC adaptor I modified > > > > from yours (will get into cvs.sf.net 'real soon'), showing that > > > > it isn't much work to add this as an analog to how cvterm types > > > > are used. > > > > > > > > -- Don > > > > > > > > ## Bio::DB::Das::ChadoFC.pm, part of new() - load analysis types > > > > ## treat similar to CV table types > > > > > > > > sub getAnalysisFeatureHash > > > > { > > > > my $self= shift; > > > > > > > > my $dbh= $self->dbh(); > > > > my $sth = $dbh->prepare("select analysis_id,program,sourcename from > > > > analysis") > > > > or warn "unable to prepare select cvterms"; > > > > $sth->execute or $self->throw("unable to select cvterms"); > > > > > > > > my(%term2name,%name2term) = ({},{}); > > > > > > > > while (my $hashref = $sth->fetchrow_hashref) { > > > > > > > > ## this is dgg syntax of analysis feature names for GFF > > > > ## all have generic 'match' method and program:source as 'source' > > > > ## a problem, want other main types: EST_match:xxx, mRNA:genie .. etc. 
> > > > my $anfeat= "match:".$hashref->{program}.":".$hashref->{sourcename}; > > > > > > > > $term2name{ $hashref->{analysis_id} } = $anfeat; > > > > $name2term{ $anfeat } = $hashref->{analysis_id}; > > > > } > > > > $self->an_term2name(\%term2name); > > > > $self->an_name2term(\%name2term); > > > > } > > > > > > > > ## Das::ChadoFC::Segment snippets > > > > sub features { > > > > $self->{has_anatype}=0; > > > > my $sql_range = ''; > > > > my ($interbase_start,$rend,$srcfeature_id,$sql_types); > > > > unless ($feature_id) { > > > > $sql_range = $self->sql_range($rangetype); > > > > > > > > $sql_types = $self->sql_types($types, -1); # dgg > > > > > > > > $srcfeature_id = $self->{srcfeature_id}; > > > > } > > > > ... > > > > elsif($self->{has_anatype}) { > > > > $from_part .= "left join analysisfeature af using (feature_id) "; > > > > } > > > > > > > > > > > > sub sql_types > > > > .. > > > > $valid_type = $factory->name2term($temp_type); > > > > $is_anatype= 0; > > > > unless ($valid_type) { > > > > $valid_type = $factory->an_name2term($temp_type); > > > > $self->{has_anatype}= $is_anatype= 1 if ($valid_type); > > > > } > > > > .. > > > > ## leave out extra invalid types > > > > if (!$valid_type) { > > > > ### skip > > > > } elsif ($temp_dbxref) { > > > > $sql_types .= $orsql."(f.type_id = $valid_type and fd.dbxref_id = > > > > $temp_dbxref)"; > > > > } elsif($is_anatype) { > > > > $sql_types .= $orsql."(af.analysis_id = $valid_type)"; #<<< > > > > } else { > > > > $sql_types .= $orsql."(f.type_id = $valid_type)"; > > > > } > > > > > > > > > > > > Lists of GFF feature type:source from some current MOD data > > > > where * are probably analysisfeature types (program:database) > > > > > > > > rice gff type:source > > > > ftp://ftp.gramene.org/pub/gramene/release17/data/sequence_annotation/ > > > > gff3/ > > > > -------------------- > > > > CDS:known > > > > CDS:tigr > > > > EST:cmap > > > > EST_match:Barley (? 
might be EST_match:someprogram:Barley) > > > > EST_match:Maize > > > > EST_match:Millet > > > > EST_match:Rice > > > > EST_match:Sorghum > > > > EST_match:Wheat > > > > cDNA_match:Rice > > > > cross_genome_match:Maize > > > > cross_genome_match:Rice > > > > cross_genome_match:Sorghum > > > > * exon:FgenesH:Monocot > > > > exon:known > > > > exon:tigr > > > > five_prime_UTR:tigr > > > > gene:known > > > > gene:tigr > > > > * mRNA:FgenesH:Monocot > > > > mRNA:known > > > > mRNA:tigr > > > > microsatellite:cmap > > > > three_prime_UTR:known > > > > three_prime_UTR:tigr > > > > transposable_element_insertion_site:cmap > > > > > > > > worm gff type:source > > > > ftp://ftp.wormbase.org/pub/wormbase/species/elegans/ > > > > genome_feature_tables/GFF3/ > > > > ---------------------- > > > > CDS:Coding_transcript > > > > * CDS:Genefinder > > > > CDS:Transposon_CDS > > > > CDS:history > > > > * CDS:twinscan > > > > * EST_match:BLAT_EST_BEST (~ EST_match:BLAT:EST_BEST) > > > > * EST_match:BLAT_EST_OTHER > > > > PCR_product:GenePair_STS > > > > PCR_product:Orfeome > > > > RNAi_reagent:RNAi_primary > > > > RNAi_reagent:RNAi_secondary > > > > SNP:Allele > > > > binding_site:binding_site > > > > * cDNA_match:BLAT_mRNA_BEST (~ cDNA_match:BLAT:mRNA_BEST ) > > > > * cDNA_match:BLAT_mRNA_OTHER > > > > clone_end:. > > > > clone_start:. 
> > > > complex_substitution :Allele > > > > deletion:Allele > > > > exon:Coding_transcript > > > > * exon:Genefinder > > > > exon:Non_coding_transcript > > > > exon:Pseudogene > > > > exon:Transposon_CDS > > > > exon:history > > > > exon:miRNA > > > > exon:rRNA > > > > exon:scRNA > > > > exon:snRNA > > > > exon:snoRNA > > > > exon:tRNA > > > > * exon:tRNAscan-SE-1.23 > > > > * exon:twinscan > > > > experimental_result_region:Expr_profile > > > > experimental_result_region:cDNA_for_RNAi > > > > * expressed_sequence_match:BLAT_OST_BEST (~ > > > > expressed_sequence_match:BLAT:OST_BEST ) > > > > * expressed_sequence_match:BLAT_OST_OTHER > > > > five_prime_UTR:Coding_transcript > > > > gene:Coding_transcript > > > > gene:gene > > > > gene:history > > > > gene:landmark > > > > insertion:Allele > > > > inverted_repeat:inverted > > > > mRNA:Coding_transcript > > > > * mRNA:Genefinder > > > > mRNA:Transposon_CDS > > > > mRNA:history > > > > * mRNA:twinscan > > > > miRNA:miRNA > > > > nc_primary_transcript:Non_coding_transcript > > > > * nucleotide_match:BLAT_EMBL_BEST (~ nucleotide_match:BLAT:EMBL_BEST ) > > > > * nucleotide_match:BLAT_EMBL_OTHER > > > > * nucleotide_match:BLAT_TC1_BEST > > > > * nucleotide_match:BLAT_TC1_OTHER > > > > * nucleotide_match:BLAT_ncRNA_BEST > > > > * nucleotide_match:BLAT_ncRNA_OTHER > > > > * nucleotide_match:TEC_RED > > > > * nucleotide_match:waba_coding > > > > * nucleotide_match:waba_strong > > > > * nucleotide_match:waba_weak > > > > oligo:. > > > > operon:operon > > > > polyA_signal_sequence:polyA_signal_sequence > > > > polyA_site:polyA_site > > > > processed_transcript:gene > > > > protein_coding_primary_transcript:Coding_transcript > > > > * protein_match:wublastx > > > > pseudogene:Pseudogene > > > > pseudogene:history > > > > rRNA:rRNA > > > > reagent:Oligo_set > > > > region:. 
> > > > region:Genbank > > > > region:Genomic_canonical > > > > region:Link > > > > * repeat_region:RepeatMasker > > > > scRNA:scRNA > > > > sequence_variant:. > > > > sequence_variant:Allele > > > > snRNA:snRNA > > > > snoRNA:snoRNA > > > > substitution:Allele > > > > tRNA:tRNA > > > > * tRNA:tRNAscan-SE-1.23 > > > > tandem_repeat:tandem > > > > three_prime_UTR:Coding_transcript > > > > trans_splice_acceptor_site:SL1 > > > > trans_splice_acceptor_site:SL2 > > > > transcript:SAGE_transcript > > > > * translated_nucleotide_match:BLAT_NEMATODE (~ > > > > translated_nucleotide_match:BLAT:NEMATODE ) > > > > transposable_element:Transposon > > > > transposable_element:Transposon_CDS > > > > transposable_element_insertion_site:Allele > > > > transposable_element_insertion_site:Mos_insertion_allele > > > > > > > > > > > > fly gff type:source > > > > ftp://ftp.flybase.net/genomes/dmel/current/gff/ > > > > ----------------------- > > > > BAC:. > > > > CDS:. > > > > aberration_junction:. > > > > chromosome:. > > > > chromosome_arm:. > > > > chromosome_band:. > > > > enhancer:. > > > > exon:. > > > > five_prime_UTR:. > > > > gene:. > > > > insertion_site:. > > > > intron:. > > > > mRNA:. 
> > > > * match:RNAiHDP > > > > * match:assembly:path > > > > * match:blastx:aa_SPTR.dmel > > > > * match:blastx:aa_SPTR.insect > > > > * match:blastx:aa_SPTR.othinv > > > > * match:blastx:aa_SPTR.othvert > > > > * match:blastx:aa_SPTR.plant > > > > * match:blastx:aa_SPTR.primate > > > > * match:blastx:aa_SPTR.rodent > > > > * match:blastx:aa_SPTR.worm > > > > * match:blastx:aa_SPTR.yeast > > > > * match:genscan > > > > * match:repeatmasker > > > > * match:sim4:na_ARGs.dros > > > > * match:sim4:na_ARGsCDS.dros > > > > * match:sim4:na_DGC_dros > > > > * match:sim4:na_dbEST.diff.dmel > > > > * match:sim4:na_dbEST.same.dmel > > > > * match:sim4:na_gadfly_dmel_r2 > > > > * match:sim4:na_gb.dmel > > > > * match:sim4:na_gb.tpa.dmel > > > > * match:sim4:na_smallRNA.dros > > > > * match:sim4:na_transcript_dmel_r31 > > > > * match:sim4:na_transcript_dmel_r32 > > > > * match:tRNAscan-SE:. > > > > * match:tblastx:na_agambiae > > > > * match:tblastx:na_dbEST.insect > > > > * match:tblastx:na_dpse > > > > * match_part:RNAiHDP > > > > * match_part:assembly:path > > > > * match_part:blastx:aa_SPTR.dmel > > > > * match_part:blastx:aa_SPTR.insect > > > > * match_part:blastx:aa_SPTR.othinv > > > > * match_part:blastx:aa_SPTR.othvert > > > > * match_part:blastx:aa_SPTR.plant > > > > * match_part:blastx:aa_SPTR.primate > > > > * match_part:blastx:aa_SPTR.rodent > > > > * match_part:blastx:aa_SPTR.worm > > > > * match_part:blastx:aa_SPTR.yeast > > > > * match_part:genscan > > > > * match_part:repeatmasker > > > > * match_part:sim4:na_ARGs.dros > > > > * match_part:sim4:na_ARGsCDS.dros > > > > * match_part:sim4:na_DGC_dros > > > > * match_part:sim4:na_dbEST.diff.dmel > > > > * match_part:sim4:na_dbEST.same.dmel > > > > * match_part:sim4:na_gadfly_dmel_r2 > > > > * match_part:sim4:na_gb.dmel > > > > * match_part:sim4:na_gb.tpa.dmel > > > > * match_part:sim4:na_smallRNA.dros > > > > * match_part:sim4:na_transcript_dmel_r31 > > > > * match_part:sim4:na_transcript_dmel_r32 > > > > * 
match_part:tRNAscan-SE:. > > > > * match_part:tblastx:na_agambiae > > > > * match_part:tblastx:na_dbEST.insect > > > > * match_part:tblastx:na_dpse > > > > mature_peptide:. > > > > ncRNA:. > > > > oligo:. > > > > point_mutation:. > > > > polyA_site:. > > > > protein_binding_site:. > > > > pseudogene:. > > > > region:. > > > > regulatory_region:. > > > > rescue_fragment:. > > > > scaffold:. > > > > sequence_variant:. > > > > snRNA:. > > > > snoRNA:. > > > > tRNA:. > > > > three_prime_UTR:. > > > > transcription_start_site:. > > > > transposable_element:. > > > > transposable_element_insertion_site:. 3116 > > > > > > > > > > > > yeast gff type:source count > > > > ftp://genome-ftp.stanford.edu/pub/yeast/data_download/ > > > > chromosomal_feature/saccharomyces_cerevisiae.gff > > > > ------------------------- > > > > ARS:SGD > > > > CDS:SGD > > > > binding_site:SGD > > > > centromere:SGD > > > > chromosome:SGD > > > > gene:SGD > > > > insertion:SGD > > > > intron:SGD > > > > ncRNA:SGD > > > > nc_primary_transcript:SGD > > > > nucleotide_match:SGD > > > > pseudogene:SGD > > > > rRNA:SGD > > > > region:SGD > > > > region:landmark > > > > repeat_family:SGD > > > > repeat_region:SGD > > > > snRNA:SGD > > > > snoRNA:SGD > > > > tRNA:SGD > > > > telomere:SGD > > > > transposable_element:SGD > > > > transposable_element_gene:SGD > > > > > > > > -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405 > > > > -- gilbertd@indiana.edu -- http://marmot.bio.indiana.edu/ > > > > > > > > > > > > > > > > ------------------------------------------------------- > > > > This SF.Net email is sponsored by the 'Do More With Dual!' webinar > > > > happening > > > > July 14 at 8am PDT/11am EDT. We invite you to explore the latest in dual > > > > core and dual graphics technology at this free one hour event hosted > > > > by HP, AMD, and NVIDIA. 
To register visit > > > > http://www.hp.com/go/dualwebinar > > > > _______________________________________________ > > > > Gmod-gbrowse mailing list > > > > Gmod-gbrowse@lists.sourceforge.net > > > > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse > > > > > > > > > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. cain@cshl.edu > > GMOD Coordinator (http://www.gmod.org/) 216-392-3087 > > Cold Spring Harbor Laboratory > > > > > > > > ------------------------------------------------------- > > SF.Net email is Sponsored by the Better Software Conference & EXPO September > > 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices > > Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA > > Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf > > _______________________________________________ > > Gmod-devel mailing list > > Gmod-devel@lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/gmod-devel > > > > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.
cain@cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From mebradley at chem.ufl.edu Fri Jul 29 17:39:08 2005 From: mebradley at chem.ufl.edu (Michael Bradley) Date: Fri Jul 29 17:47:02 2005 Subject: [Bioperl-l] constructing a tree object Message-ID: <003201c59485$f68500c0$ab05a8c0@bradleydell> Can anyone tell me how to do $treeObj = Bio::TreeIO->new(-file => "somefile", -format => 'newick'); from a variable instead of a file? Suppose that my tree is stored in $treestring. I would like to do something like: $treeObj = Bio::TreeIO->new(-$treestring, -format => 'newick'); Thanks, Mike Bradley From hlapp at gnf.org Fri Jul 29 20:07:35 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Fri Jul 29 19:58:09 2005 Subject: [Bioperl-l] Re: Fixing bioperl [was Re: [GMOD-devel] Re: [Gmod-gbrowse] Analysis features (Re: Final alpha release of gmod (chado))] In-Reply-To: References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu> <42E909E3.2030102@infobiogen.fr> <1122570166.3288.10.camel@localhost.localdomain> Message-ID: <08c0281f4eda0b376b27944a7aa99191@gnf.org> Hi Chris, this sounds like a way to go. As you note, I'm not very comfortable with transforming the API to use structured objects where flat strings will do just fine in most applications until someone demonstrates that this doesn't substantially impact performance/memory hogging in large-throughput use-cases. And operator overloading is just too bug-prone IMHO. The one thing that makes me hesitate is the introduction of another interface - but maybe I should be cool if you are ;) OTOH, adding typed methods to SeqFeatureI instead of recasting the existing ones may cause just as much confusion. -hilmar On Jul 28, 2005, at 12:42 PM, Chris Mungall wrote: > > [sorry for the cross-posting, but I think it's really important to > have a > gmod to bioperl chit chat on this.
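Michael Bradley's question above (building a Bio::TreeIO object from a string rather than a file) is commonly answered by giving Bio::TreeIO an open filehandle through `-fh` instead of `-file`. A minimal sketch, assuming Perl 5.8 or later (for in-memory filehandles) and a BioPerl whose Bio::TreeIO accepts `-fh`; the tree string is made-up example data. If BioPerl is not installed, the fallback branch only shows that the in-memory handle reads back the string. The IO::String module, long used inside BioPerl, can serve the same purpose on older Perls.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A Newick tree held in a variable instead of a file (example data).
my $treestring = '((A:1,B:2):0.5,C:3);';

# Since Perl 5.8, open() can read from a scalar held in memory.
open(my $fh, '<', \$treestring)
    or die "cannot open in-memory filehandle: $!";

if (eval { require Bio::TreeIO; 1 }) {
    # Hand the handle to Bio::TreeIO via -fh instead of -file.
    my $treeio = Bio::TreeIO->new(-fh => $fh, -format => 'newick');
    while (my $tree = $treeio->next_tree) {
        my @ids = map { $_->id } $tree->get_leaf_nodes;
        print "leaves: @ids\n";
    }
}
else {
    # BioPerl is not installed here: just show the handle yields the string.
    my $text = do { local $/; <$fh> };
    print "read from memory: $text\n";
}
close($fh);
```

The design choice is to leave Bio::TreeIO untouched: anything that reads from a filehandle can be pointed at a string this way.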
I've removed gmod-gbrowse from the > cc > list] > > On Thu, 28 Jul 2005, Scott Cain wrote: > >> Hi Cyril, >> >> I think Bio::Tools::GFF is somewhat hacky and not a tool I would use >> to >> produce 'safe' GFF3. On the other hand, Bio::FeatureIO is still a >> little >> immature, but it is what I used for the chado GFF3 bulk loader, so it >> does handle (parse) Target features. So my suggestion would be to use >> BFIO::gff, but be prepared for some problems; when you find them, >> complain loudly on the bioperl mailing list or fix the problems and >> commit them (or both!). > > I think the answer may be even more complicated than this. > > Lurkers and contributors to the bioperl mailing list may have noticed > that > there have been some major obstacles in progressing lately, > particularly in > getting a stable release of the code out. bp1.4 is fairly old, 1.5 is a > developers release, though this is the one required by GMOD. > > My understanding is that this bottleneck can be traced back to changes > in > the SeqFeature and Annotation model. These changes appear to be > required > by Bio::SeqFeature::Annotated which is produced by Bio::FeatureIO::gff > (which in turn is used by the GMOD bulk loader, which is the main > reason > GMOD requires 1.5, I believe?). Unfortunately, these changes also break > existing code and have a severe negative impact on memory usage. > > Before advising Cyril and others to switch to BFIO::gff I think it's > important to make sure there is a clear path forward with bioperl. My > impression is that there is something of a stalemate here. The bioperl > developers would like to retract the aforementioned changes, but they > believe they cannot do this without breaking GMOD code. They are also > extremely uncomfortable about leaving these changes in. Everyone gives > up > and starts coding around bioperl.
> > Here is why the changes were introduced: > > BioPerl has a 'scruffy' typing model, whereby feature types > (primary_tag > in bioperl) and featureprop types (tags in bioperl) are labels or > strings. > In contrast, Chado forces all types to be some class or relation in an > ontology. > > Now obviously I'm rather partial to the Chado model, but that doesn't > mean > I think it should be forced upon bioperl. I often use bioperl in > scruffy > mode (on scruffy data); or in some combination whereby I map the > scruffy > types to ontologies in some non-bioperl code. When using bioperl as a > middleware component over a nicely organised database, ontology-typed > mode > is definitely best. However, the majority of bioperl users (including > myself) spend a large proportion of their time working with scruffy > data, > in which case lightweight scruffy types are more appropriate. > > It seems that there is a perfectly simple way of reconciling both > approaches. We revert bioperl back to the simpler scruffy model. The > majority of users and developers breathe a sigh of relief. We then > extend > SeqFeatureI with something like SeqFeatureAnnotatedI. This forces > types to > be stored as OntologyTerms (and I haven't even touched on some of the > problems here, but at least we are insulating the standard bioperl > layer > that 99% of users use from these issues). All classes implementing SFAI > will necessarily implement SFI, and the primary_tag and tag_values > methods > will be supported (not deprecated) as simple delegations to the > OntologyTerm objects. > > We can then modify BFIO::gff (which is an incredibly useful piece of > code) > and get rid of all the dependencies on SO and Bio::Ontology* and > instead > allow the user of this module to plug in their own resolver/validator > - so > they can choose whether they just want fast scruffy lightweight SFI > features, or whether they want ontology-typed SFAI features. 
If the > latter, then they can choose their own resolver strategy - by a user > supplied hash, by a copy of SO auto-downloaded from sourceforge, by a > local chado db, by the genbank->SO mapping table, during parsing vs > post-parsing, whatever. In fact there is already > Bio::SeqFeature::Tools::TypeMapper, but currently this is mostly > concerned > with helping Bio::SeqFeature::Tools::Unflattener convert scruffy > genbank > to something sensible. > > GMOD (and perhaps biosql) would use SFAI, everyone else would use the > simpler SFI. Someone can even get a stable 1.6 release out before all > the > SFAI details such as how the resolver would work are finalised. I'd > really > like to see 1.6 include a simpler BFIO::gff that can optionally > produces > features that aren't SeqFeature::Annotateds, but that's negotiable. > > There's vast swathes of both GMOD and BioPerl code I'm not familiar > with, > so it's possible my analysis above is flawed in some way. If it is, > then > it's up to someone from either camp to speak up! If not, then there's > no > excuses for the relevant people to start sorting out this mess by > commencing with the solution outlined above. > > Cheers > Chris > >> >> Scott >> >> >> On Thu, 2005-07-28 at 18:37 +0200, Cyril Pommier wrote: >>> Hello, >>> We are going to store analysis results in chado, and we are of course >>> very interressed by these futur evolutions of GFF3/chado. >>> So we would like to make sure that the parsers and conversions >>> programs >>> we are writing now will be compatible with the futur GFF3. >>> >>> We are using Bio::SeqFeature::Generic objects that we write with >>> Bio::Tools::GFF. >>> >>> Do you think that Bio::Tools::GFF will be able to handle the new >>> 'type' >>> column or is it better to switch to Bio::FeatureIO::gff ? >>> >>> Thanks in advance for any advice. 
>>> >>> Cyril >>> >>> Don Gilbert wrote: >>> >>>> >>>> Scott, >>>> >>>> Your notes in gmod_bulk_load_gff3.pl suggest it is headed in >>>> the same direction I suggest below. More about these todo points >>>> >>>>> - address flybase's use of analysisfeature combined with >>>>> feature to >>>>> give source-type information (in GFF terms). This will need to >>>>> be addressed in the GBrowse adaptor. >>>>> - modify the bulk loader to allow "mixed" GFF3 files (that is, >>>>> containing >>>>> both analysis results and annotations). See perldoc >>>>> gmod_bulk_load_gff3.pl >>>>> for more info >>>> >>>> >>>> Use of chado's analysisfeature table is something others who know >>>> it better can comment on. But after working with it for a while >>>> it makes sense to me to use it in this way: >>>> >>>> For a future GFF -> Chado loader, treat analysis features such as >>>> gene finding results, BLAST, sim4 as 'analysisfeature type' rather >>>> than feature CV term type (the ones that now end up with a generic >>>> 'match' cvterm). In these cases the Analysis table is populated with >>>> program:database_sourcename >>>> as the basis of this 'analysisfeature type', such as >>>> match:blastx:na_pe.dros >>>> match:sim4:DGC >>>> match:genie:dummy (or maybe exon:genie) >>>> >>>> The program:database fits neatly in the GFF source field, as >>>> #ref source type start stop ... >>>> chr1 blastx:na_pe.dros match 1 100 ... >>>> chr1 sim4:DGC match 1 100 ... >>>> >>>> These can be treated in database adaptor analogously to the CVterm >>>> table feature types. See at end a list of current GFF feature >>>> type:source from worm, rice, yeast, fly MODs. Fly and rice use a >>>> syntax like above and worm gff uses BLAT_EMBL_BEST, instead of >>>> BLAT:EMBL_BEST. >>>> >>>> From POD of your bulk_load_gff3.pl >>>>> Analysis >>>>> If you are loading analysis results (ie, BLAT results, gene >>>>> predictions), you should specify the -a flag.
If no arguments are >>>>> supplied with the -a, then the loader will assume that the results >>>>> belong to an analysis set with a name that is the concatenation of >>>>> the source (column 2) and the method (column 3) with an underscore >>>>> in between. >>>> >>>> "... then the loader will assume that the results belong to an >>>> analysis table row with a program name and database source name >>>> taken from Source (column 2, colon separated program:sourcename), >>>> with a SOFA feature type taken from Method (column 3). If >>>> sourcename doesn't apply, e.g. genefinder, don't add or use 'dummy'. >>>> Use the generic 'match' SOFA type if others don't apply." >>>> [see also http://song.sourceforge.net/gff3-jan04.shtml#ALIGNMENTS] >>>> >>>> Note that sourcename of database is a common attribute (all those >>>> blasts, blats, sim4, ... are run on several different databases). >>>> >>>> For that underscore between method and source, where does that go >>>> into >>>> database? It is used as parts of program or database sourcename >>>> names, >>>> so it may be problematic to add one if not needed. >>>> >>>> Oh, I see now from bulk_load_gff3.PLS, you are creating a 'Name' >>>> entry >>>> for analysis table. This probably is less useful than using Program >>>> and Sourcename fields as flybase does, which comes from the common >>>> usage where people run various programs, with various database >>>> sources >>>> and want to plop the results into a database easily. These go into >>>> those >>>> two fields directly, no need to create or parse a Name entry >>>> (which can be and is null in flybase data). >>>> >>>>> my $search_analysis >>>>> = $db->prepare("SELECT analysis_id FROM analysis WHERE name=?"); >>>> >>>> I think it would be better as >>>> my $search_analysis >>>> = $db->prepare("SELECT analysis_id FROM analysis WHERE program=? and >>>> sourcename=?"); >>>> >>>>> Otherwise, the argument provided with -a will be taken >>>>> as the name of the analysis set. 
Either way, the analysis set must >>>>> already be in the analysis table. The easiest way to do this is to >>>>> insert it directly in the psql shell: >>>>> >>>>> INSERT INTO analysis (name, program, programversion) >>>>> VALUES ('genscan 2005-2-28','genscan','5.4'); >>>> >>>> My choice would be to populate the analysis table from GFF data, >>>> rather >>>> than expect preparation by user (or as another option). >>>> >>>> INSERT INTO analysis (program, sourcename) >>>> VALUES ('tblastx','na_baylorf1_scfchunk.dpse'); >>>> INSERT INTO analysis (program, sourcename) >>>> VALUES ('sim4','na_gb.dmel'); >>>> INSERT INTO analysis (program, sourcename, programversion) >>>> VALUES ('genie_masked','dummy', '1.0'); >>>> >>>>> There are other columns in the analysis table that are optional; >>>>> see >>>>> the schema documentation and '\d analysis' in psql for more >>>>> information. >>>>> >>>> .... >>>>> A planned addition to the functionality of handling analysis results >>>>> is to allow "mixed" GFF files, where some lines are analysis >>>>> results >>>>> and some are not. >>>> >>>> This is the case for drosophila GFF now (see others also below). If >>>> you make the default assumption that if ($method =~ /.*match/) and >>>> ($source =~ m/([^:]+):(.+)/), you should get all/most of the >>>> analysisfeature types, and probably not anything else. >>>> >>>>> Additionally, one will be able to supply lists of >>>>> types (optionally with sources) and their associated entry in the >>>>> analysis table. The format will probably be tag value pairs: >>>>> >>>>> --analysis match:Rice_est=rice_est_blast, \ >>>>> match:Maize_cDNA=maize_cdna_blast, \ >>>>> mRNA=genscan_prediction,exon=genscan_prediction >>>> >>>> My suggestion for this (as per GFF source,type columns) would be >>>> --analysis match:program:sourcename ...
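Don's default rule quoted above treats a GFF line as an analysis result when its method (column 3) contains "match" and its source (column 2) splits as `program:sourcename`. It can be sketched in plain Perl, together with his suggestion to populate the analysis table from the GFF data itself. This is a minimal sketch: the subroutine name `analysis_of` is hypothetical, the sample lines reuse Don's `chr1` examples, and the INSERT statements are only printed. Real loader code should go through DBI placeholders, as in his `prepare("... WHERE program=? and sourcename=?")` suggestion.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Default rule from the thread: a "match"-like type plus a program:sourcename
# source marks an analysis result. Returns nothing for ordinary annotations.
sub analysis_of {
    my ($type, $source) = @_;
    return unless $type =~ /match/ && $source =~ m/([^:]+):(.+)/;
    my ($program, $sourcename) = ($1, $2);
    return {
        program    => $program,
        sourcename => $sourcename,
        # like the match:program:sourcename names in getAnalysisFeatureHash,
        # but keeping the line's actual type instead of always "match"
        name       => "$type:$program:$sourcename",
    };
}

# Example GFF lines (seqid, source, type, start, end), after Don's examples.
my @gff = (
    "chr1\tblastx:na_pe.dros\tmatch\t1\t100",
    "chr1\tsim4:DGC\tmatch\t1\t100",
    "chr1\t.\texon\t1\t100",
);

my %analysis;    # distinct (program, sourcename) pairs seen in the file
for my $line (@gff) {
    my (undef, $source, $type) = split /\t/, $line;
    my $an = analysis_of($type, $source) or next;    # skip annotations
    $analysis{"$an->{program}\t$an->{sourcename}"} = $an;
}

# Rows one might add to chado's analysis table (printed for display only).
for my $key (sort keys %analysis) {
    my $an = $analysis{$key};
    print "INSERT INTO analysis (program, sourcename) "
        . "VALUES ('$an->{program}','$an->{sourcename}');\n";
}
```

As Don notes, such a rule catches "all/most" analysis types. Sources without a colon, such as `genscan` or worm's `BLAT_EST_BEST`, fall through and would still need an explicit `--analysis` mapping.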
>>>> --analysis match:blast:Rice_est,match:blast:Maize_cDNA,\ >>>> mRNA:genscan:dummy, exon:genscan:dummy >>>> >>>> I guess the 'dummy' data sourcename need not be added; flybase uses >>>> it >>>> to keep that field not-null, but it isn't required by the schema. >>>> >>>> Here are some snippets from the ChadoFC adaptor I modified >>>> from yours (will get into cvs.sf.net 'real soon'), showing that >>>> it isn't much work to add this as an analog to how cvterm types >>>> are used. >>>> >>>> -- Don >>>> >>>> ## Bio::DB::Das::ChadoFC.pm, part of new() - load analysis types >>>> ## treat similar to CV table types >>>> >>>> sub getAnalysisFeatureHash >>>> { >>>> my $self= shift; >>>> >>>> my $dbh= $self->dbh(); >>>> my $sth = $dbh->prepare("select analysis_id,program,sourcename from >>>> analysis") >>>> or $self->throw("unable to prepare select from analysis"); >>>> $sth->execute or $self->throw("unable to select from analysis"); >>>> >>>> my (%term2name, %name2term); # two empty lookup hashes >>>> >>>> while (my $hashref = $sth->fetchrow_hashref) { >>>> >>>> ## this is dgg syntax of analysis feature names for GFF >>>> ## all have generic 'match' method and program:source as 'source' >>>> ## a problem, want other main types: EST_match:xxx, mRNA:genie .. etc. >>>> my $anfeat= "match:".$hashref->{program}.":".$hashref->{sourcename}; >>>> >>>> $term2name{ $hashref->{analysis_id} } = $anfeat; >>>> $name2term{ $anfeat } = $hashref->{analysis_id}; >>>> } >>>> $self->an_term2name(\%term2name); >>>> $self->an_name2term(\%name2term); >>>> } >>>> >>>> ## Das::ChadoFC::Segment snippets >>>> sub features { >>>> $self->{has_anatype}=0; >>>> my $sql_range = ''; >>>> my ($interbase_start,$rend,$srcfeature_id,$sql_types); >>>> unless ($feature_id) { >>>> $sql_range = $self->sql_range($rangetype); >>>> >>>> $sql_types = $self->sql_types($types, -1); # dgg >>>> >>>> $srcfeature_id = $self->{srcfeature_id}; >>>> } >>>> ...
>>>> elsif($self->{has_anatype}) { >>>> $from_part .= "left join analysisfeature af using (feature_id) "; >>>> } >>>> >>>> >>>> sub sql_types >>>> .. >>>> $valid_type = $factory->name2term($temp_type); >>>> $is_anatype= 0; >>>> unless ($valid_type) { >>>> $valid_type = $factory->an_name2term($temp_type); >>>> $self->{has_anatype}= $is_anatype= 1 if ($valid_type); >>>> } >>>> .. >>>> ## leave out extra invalid types >>>> if (!$valid_type) { >>>> ### skip >>>> } elsif ($temp_dbxref) { >>>> $sql_types .= $orsql."(f.type_id = $valid_type and fd.dbxref_id = >>>> $temp_dbxref)"; >>>> } elsif($is_anatype) { >>>> $sql_types .= $orsql."(af.analysis_id = $valid_type)"; #<<< >>>> } else { >>>> $sql_types .= $orsql."(f.type_id = $valid_type)"; >>>> } >>>> >>>> >>>> Lists of GFF feature type:source from some current MOD data >>>> where * are probably analysisfeature types (program:database) >>>> >>>> rice gff type:source >>>> ftp://ftp.gramene.org/pub/gramene/release17/data/ >>>> sequence_annotation/ >>>> gff3/ >>>> -------------------- >>>> CDS:known >>>> CDS:tigr >>>> EST:cmap >>>> EST_match:Barley (? 
might be EST_match:someprogram:Barley) >>>> EST_match:Maize >>>> EST_match:Millet >>>> EST_match:Rice >>>> EST_match:Sorghum >>>> EST_match:Wheat >>>> cDNA_match:Rice >>>> cross_genome_match:Maize >>>> cross_genome_match:Rice >>>> cross_genome_match:Sorghum >>>> * exon:FgenesH:Monocot >>>> exon:known >>>> exon:tigr >>>> five_prime_UTR:tigr >>>> gene:known >>>> gene:tigr >>>> * mRNA:FgenesH:Monocot >>>> mRNA:known >>>> mRNA:tigr >>>> microsatellite:cmap >>>> three_prime_UTR:known >>>> three_prime_UTR:tigr >>>> transposable_element_insertion_site:cmap >>>> >>>> worm gff type:source >>>> ftp://ftp.wormbase.org/pub/wormbase/species/elegans/ >>>> genome_feature_tables/GFF3/ >>>> ---------------------- >>>> CDS:Coding_transcript >>>> * CDS:Genefinder >>>> CDS:Transposon_CDS >>>> CDS:history >>>> * CDS:twinscan >>>> * EST_match:BLAT_EST_BEST (~ EST_match:BLAT:EST_BEST) >>>> * EST_match:BLAT_EST_OTHER >>>> PCR_product:GenePair_STS >>>> PCR_product:Orfeome >>>> RNAi_reagent:RNAi_primary >>>> RNAi_reagent:RNAi_secondary >>>> SNP:Allele >>>> binding_site:binding_site >>>> * cDNA_match:BLAT_mRNA_BEST (~ cDNA_match:BLAT:mRNA_BEST ) >>>> * cDNA_match:BLAT_mRNA_OTHER >>>> clone_end:. >>>> clone_start:. 
>>>> complex_substitution :Allele >>>> deletion:Allele >>>> exon:Coding_transcript >>>> * exon:Genefinder >>>> exon:Non_coding_transcript >>>> exon:Pseudogene >>>> exon:Transposon_CDS >>>> exon:history >>>> exon:miRNA >>>> exon:rRNA >>>> exon:scRNA >>>> exon:snRNA >>>> exon:snoRNA >>>> exon:tRNA >>>> * exon:tRNAscan-SE-1.23 >>>> * exon:twinscan >>>> experimental_result_region:Expr_profile >>>> experimental_result_region:cDNA_for_RNAi >>>> * expressed_sequence_match:BLAT_OST_BEST (~ >>>> expressed_sequence_match:BLAT:OST_BEST ) >>>> * expressed_sequence_match:BLAT_OST_OTHER >>>> five_prime_UTR:Coding_transcript >>>> gene:Coding_transcript >>>> gene:gene >>>> gene:history >>>> gene:landmark >>>> insertion:Allele >>>> inverted_repeat:inverted >>>> mRNA:Coding_transcript >>>> * mRNA:Genefinder >>>> mRNA:Transposon_CDS >>>> mRNA:history >>>> * mRNA:twinscan >>>> miRNA:miRNA >>>> nc_primary_transcript:Non_coding_transcript >>>> * nucleotide_match:BLAT_EMBL_BEST (~ >>>> nucleotide_match:BLAT:EMBL_BEST ) >>>> * nucleotide_match:BLAT_EMBL_OTHER >>>> * nucleotide_match:BLAT_TC1_BEST >>>> * nucleotide_match:BLAT_TC1_OTHER >>>> * nucleotide_match:BLAT_ncRNA_BEST >>>> * nucleotide_match:BLAT_ncRNA_OTHER >>>> * nucleotide_match:TEC_RED >>>> * nucleotide_match:waba_coding >>>> * nucleotide_match:waba_strong >>>> * nucleotide_match:waba_weak >>>> oligo:. >>>> operon:operon >>>> polyA_signal_sequence:polyA_signal_sequence >>>> polyA_site:polyA_site >>>> processed_transcript:gene >>>> protein_coding_primary_transcript:Coding_transcript >>>> * protein_match:wublastx >>>> pseudogene:Pseudogene >>>> pseudogene:history >>>> rRNA:rRNA >>>> reagent:Oligo_set >>>> region:. >>>> region:Genbank >>>> region:Genomic_canonical >>>> region:Link >>>> * repeat_region:RepeatMasker >>>> scRNA:scRNA >>>> sequence_variant:. 
>>>> sequence_variant:Allele >>>> snRNA:snRNA >>>> snoRNA:snoRNA >>>> substitution:Allele >>>> tRNA:tRNA >>>> * tRNA:tRNAscan-SE-1.23 >>>> tandem_repeat:tandem >>>> three_prime_UTR:Coding_transcript >>>> trans_splice_acceptor_site:SL1 >>>> trans_splice_acceptor_site:SL2 >>>> transcript:SAGE_transcript >>>> * translated_nucleotide_match:BLAT_NEMATODE (~ >>>> translated_nucleotide_match:BLAT:NEMATODE ) >>>> transposable_element:Transposon >>>> transposable_element:Transposon_CDS >>>> transposable_element_insertion_site:Allele >>>> transposable_element_insertion_site:Mos_insertion_allele >>>> >>>> >>>> fly gff type:source >>>> ftp://ftp.flybase.net/genomes/dmel/current/gff/ >>>> ----------------------- >>>> BAC:. >>>> CDS:. >>>> aberration_junction:. >>>> chromosome:. >>>> chromosome_arm:. >>>> chromosome_band:. >>>> enhancer:. >>>> exon:. >>>> five_prime_UTR:. >>>> gene:. >>>> insertion_site:. >>>> intron:. >>>> mRNA:. >>>> * match:RNAiHDP >>>> * match:assembly:path >>>> * match:blastx:aa_SPTR.dmel >>>> * match:blastx:aa_SPTR.insect >>>> * match:blastx:aa_SPTR.othinv >>>> * match:blastx:aa_SPTR.othvert >>>> * match:blastx:aa_SPTR.plant >>>> * match:blastx:aa_SPTR.primate >>>> * match:blastx:aa_SPTR.rodent >>>> * match:blastx:aa_SPTR.worm >>>> * match:blastx:aa_SPTR.yeast >>>> * match:genscan >>>> * match:repeatmasker >>>> * match:sim4:na_ARGs.dros >>>> * match:sim4:na_ARGsCDS.dros >>>> * match:sim4:na_DGC_dros >>>> * match:sim4:na_dbEST.diff.dmel >>>> * match:sim4:na_dbEST.same.dmel >>>> * match:sim4:na_gadfly_dmel_r2 >>>> * match:sim4:na_gb.dmel >>>> * match:sim4:na_gb.tpa.dmel >>>> * match:sim4:na_smallRNA.dros >>>> * match:sim4:na_transcript_dmel_r31 >>>> * match:sim4:na_transcript_dmel_r32 >>>> * match:tRNAscan-SE:. 
>>>> * match:tblastx:na_agambiae >>>> * match:tblastx:na_dbEST.insect >>>> * match:tblastx:na_dpse >>>> * match_part:RNAiHDP >>>> * match_part:assembly:path >>>> * match_part:blastx:aa_SPTR.dmel >>>> * match_part:blastx:aa_SPTR.insect >>>> * match_part:blastx:aa_SPTR.othinv >>>> * match_part:blastx:aa_SPTR.othvert >>>> * match_part:blastx:aa_SPTR.plant >>>> * match_part:blastx:aa_SPTR.primate >>>> * match_part:blastx:aa_SPTR.rodent >>>> * match_part:blastx:aa_SPTR.worm >>>> * match_part:blastx:aa_SPTR.yeast >>>> * match_part:genscan >>>> * match_part:repeatmasker >>>> * match_part:sim4:na_ARGs.dros >>>> * match_part:sim4:na_ARGsCDS.dros >>>> * match_part:sim4:na_DGC_dros >>>> * match_part:sim4:na_dbEST.diff.dmel >>>> * match_part:sim4:na_dbEST.same.dmel >>>> * match_part:sim4:na_gadfly_dmel_r2 >>>> * match_part:sim4:na_gb.dmel >>>> * match_part:sim4:na_gb.tpa.dmel >>>> * match_part:sim4:na_smallRNA.dros >>>> * match_part:sim4:na_transcript_dmel_r31 >>>> * match_part:sim4:na_transcript_dmel_r32 >>>> * match_part:tRNAscan-SE:. >>>> * match_part:tblastx:na_agambiae >>>> * match_part:tblastx:na_dbEST.insect >>>> * match_part:tblastx:na_dpse >>>> mature_peptide:. >>>> ncRNA:. >>>> oligo:. >>>> point_mutation:. >>>> polyA_site:. >>>> protein_binding_site:. >>>> pseudogene:. >>>> region:. >>>> regulatory_region:. >>>> rescue_fragment:. >>>> scaffold:. >>>> sequence_variant:. >>>> snRNA:. >>>> snoRNA:. >>>> tRNA:. >>>> three_prime_UTR:. >>>> transcription_start_site:. >>>> transposable_element:. >>>> transposable_element_insertion_site:. 
3116 >>>> >>>> >>>> yeast gff type:source count >>>> ftp://genome-ftp.stanford.edu/pub/yeast/data_download/ >>>> chromosomal_feature/saccharomyces_cerevisiae.gff >>>> ------------------------- >>>> ARS:SGD >>>> CDS:SGD >>>> binding_site:SGD >>>> centromere:SGD >>>> chromosome:SGD >>>> gene:SGD >>>> insertion:SGD >>>> intron:SGD >>>> ncRNA:SGD >>>> nc_primary_transcript:SGD >>>> nucleotide_match:SGD >>>> pseudogene:SGD >>>> rRNA:SGD >>>> region:SGD >>>> region:landmark >>>> repeat_family:SGD >>>> repeat_region:SGD >>>> snRNA:SGD >>>> snoRNA:SGD >>>> tRNA:SGD >>>> telomere:SGD >>>> transposable_element:SGD >>>> transposable_element_gene:SGD >>>> >>>> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405 >>>> -- gilbertd@indiana.edu -- http://marmot.bio.indiana.edu/ >>>> >>> >>> >> -- >> ---------------------------------------------------------------------- >> -- >> Scott Cain, Ph. D.
>> cain@cshl.edu >> GMOD Coordinator (http://www.gmod.org/) >> 216-392-3087 >> Cold Spring Harbor Laboratory >> > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca.
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gnf.org Fri Jul 29 20:20:19 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Fri Jul 29 20:10:54 2005 Subject: [Bioperl-l] Re: Fixing bioperl [was Re: Analysis features] In-Reply-To: <1122650232.10455.31.camel@localhost.localdomain> References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu> <42E909E3.2030102@infobiogen.fr> <1122570166.3288.10.camel@localhost.localdomain> <1122650232.10455.31.camel@localhost.localdomain> Message-ID: <51a02b5bd508f35301ee3c847b104895@gnf.org> On Jul 29, 2005, at 8:17 AM, Scott Cain wrote: > > The main section of affected code in gmod is the GFF bulk loader, but > after we make the changes to the bioperl API, it shouldn't be too hard > to fix the loader. In fact, some of those changes may have already > started. I remember a few weeks before I released the gmod/chado > package, Hilmar sent out an announcement that he made some changes. You mean around the time of ISMB? I fixed the ontology modules ... they should actually work better now, not worse, unless you assumed the presence of some bugs ;) > While I should have paid attention then, I was busy getting my release > together, and everything seemed to work, so I ignored it. > Unfortunately, the reason things continued to work was that I forgot to > update my bioperl-live, and as a result, the gmod release doesn't work > with bioperl-live. Scott, what would really help in such a situation is if you ran the bioperl test suite and reported any failures, especially those that appear potentially connected to your problem. The last time the gmod ontology loader ceased to work, the problem would have been readily exposed by the ontology tests in bioperl. It just helps in zooming in on the problem.
I'd be eager to help make bioperl work with gmod and vice versa and I'm sure many others are too, but it'll be difficult if we don't work towards this collaboratively. For this I really liked the spirit of Chris' proposal - that's the way to make this work. > [...] > The other section of code that could have been affected but won't be is > the ontology loader. The current ontology loader depends on > Bio::Ontology, but I was already planning on migrating to go-perl for > loading ontologies anyway, so that won't be a problem. I'm closing in on the last bugs in the go-perl integration. As Chris made me aware in Detroit, it remains to be seen how fast the result is, but if it works this will give you both worlds to choose from. -hilmar > > So, who wants to take the lead on this? > > Thanks, > Scott > > > On Thu, 2005-07-28 at 12:42 -0700, Chris Mungall wrote: >> I think the answer may be even more complicated than this. >> >> Lurkers and contributors to the bioperl mailing list may have noticed >> that >> there have been some major obstacles to progress lately, >> particularly in >> getting a stable release of the code out. bp1.4 is fairly old, 1.5 is >> a >> developer release, though this is the one required by GMOD. >> >> My understanding is that this bottleneck can be traced back to >> changes in >> the SeqFeature and Annotation model. These changes appear to be >> required >> by Bio::SeqFeature::Annotated which is produced by Bio::FeatureIO::gff >> (which in turn is used by the GMOD bulk loader, which is the main >> reason >> GMOD requires 1.5, I believe?). Unfortunately, these changes also >> break >> existing code and have a severe negative impact on memory usage. >> >> Before advising Cyril and others to switch to BFIO::gff I think it's >> important to make sure there is a clear path forward with bioperl. My >> impression is that there is something of a stalemate here.
The bioperl >> developers would like to retract the aforementioned changes, but they >> believe they cannot do this without breaking GMOD code. They are also >> extremely uncomfortable about leaving these changes in. Everyone >> gives up >> and starts coding around bioperl. >> >> Here is why the changes were introduced: >> >> BioPerl has a 'scruffy' typing model, whereby feature types >> (primary_tag >> in bioperl) and featureprop types (tags in bioperl) are labels or >> strings. >> In contrast, Chado forces all types to be some class or relation in an >> ontology. >> >> Now obviously I'm rather partial to the Chado model, but that doesn't >> mean >> I think it should be forced upon bioperl. I often use bioperl in >> scruffy >> mode (on scruffy data); or in some combination whereby I map the >> scruffy >> types to ontologies in some non-bioperl code. When using bioperl as a >> middleware component over a nicely organised database, ontology-typed >> mode >> is definitely best. However, the majority of bioperl users (including >> myself) spend a large proportion of their time working with scruffy >> data, >> in which case lightweight scruffy types are more appropriate. >> >> It seems that there is a perfectly simple way of reconciling both >> approaches. We revert bioperl back to the simpler scruffy model. The >> majority of users and developers breathe a sigh of relief. We then >> extend >> SeqFeatureI with something like SeqFeatureAnnotatedI. This forces >> types to >> be stored as OntologyTerms (and I haven't even touched on some of the >> problems here, but at least we are insulating the standard bioperl >> layer >> that 99% of users use from these issues). All classes implementing >> SFAI >> will necessarily implement SFI, and the primary_tag and tag_values >> methods >> will be supported (not deprecated) as simple delegations to the >> OntologyTerm objects. 
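Chris's proposal for an ontology-typed subclass can be sketched in plain Perl. This is a minimal sketch under stated assumptions. The package names My::OntologyTerm, My::SeqFeature and My::SeqFeature::Annotated are hypothetical stand-ins, not BioPerl classes. The method names primary_tag() and get_tag_values() mirror the Bio::SeqFeatureI accessors, but here they are implemented from scratch. The sketch only shows the idea: an ontology-typed feature is still a plain feature, and it answers the plain accessors by delegating to its term objects.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an ontology term (e.g. a Sequence Ontology class).
package My::OntologyTerm;
sub new        { my ($class, %args) = @_; return bless {%args}, $class }
sub identifier { $_[0]{identifier} }
sub name       { $_[0]{name} }

# Hypothetical "scruffy" feature: type and tag values are plain strings.
package My::SeqFeature;
sub new {
    my ($class, %args) = @_;
    return bless { primary_tag => $args{primary_tag}, tags => $args{tags} || {} }, $class;
}
sub primary_tag { $_[0]{primary_tag} }
sub get_tag_values {
    my ($self, $tag) = @_;
    return @{ $self->{tags}{$tag} || [] };
}

# Hypothetical ontology-typed feature: it stores term objects, but because it
# inherits from My::SeqFeature and overrides the accessors, code written
# against the plain interface keeps working through simple delegation.
package My::SeqFeature::Annotated;
our @ISA = ('My::SeqFeature');
sub new {
    my ($class, %args) = @_;
    return bless { type => $args{type}, tags => $args{tags} || {} }, $class;
}
sub type        { $_[0]{type} }
sub primary_tag { $_[0]->type->name }    # delegate to the type term
sub get_tag_values {
    my ($self, $tag) = @_;
    # Values may be term objects or plain strings; report terms by name.
    return map { ref $_ ? $_->name : $_ } @{ $self->{tags}{$tag} || [] };
}

package main;

my $gene = My::OntologyTerm->new(identifier => 'SO:0000704', name => 'gene');
my $f    = My::SeqFeature::Annotated->new(type => $gene, tags => { note => ['example'] });
print $f->primary_tag, "\n";    # prints "gene"
```

Code that only knows the plain interface calls `$f->primary_tag` and gets a string. Ontology-aware code can call `$f->type->identifier` to reach the term itself. This is the split between the SFI and SFAI audiences that the proposal describes.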
>> >> We can then modify BFIO::gff (which is an incredibly useful piece of >> code) >> and get rid of all the dependencies on SO and Bio::Ontology* and >> instead >> allow the user of this module to plug in their own resolver/validator >> - so >> they can choose whether they just want fast scruffy lightweight SFI >> features, or whether they want ontology-typed SFAI features. If the >> latter, then they can choose their own resolver strategy - by a user >> supplied hash, by a copy of SO auto-downloaded from sourceforge, by a >> local chado db, by the genbank->SO mapping table, during parsing vs >> post-parsing, whatever. In fact there is already >> Bio::SeqFeature::Tools::TypeMapper, but currently this is mostly >> concerned >> with helping Bio::SeqFeature::Tools::Unflattener convert scruffy >> genbank >> to something sensible. >> >> GMOD (and perhaps biosql) would use SFAI, everyone else would use the >> simpler SFI. Someone can even get a stable 1.6 release out before all >> the >> SFAI details such as how the resolver would work are finalised. I'd >> really >> like to see 1.6 include a simpler BFIO::gff that can optionally >> produce >> features that aren't SeqFeature::Annotateds, but that's negotiable. >> >> There are vast swathes of both GMOD and BioPerl code I'm not familiar >> with, >> so it's possible my analysis above is flawed in some way. If it is, >> then >> it's up to someone from either camp to speak up! If not, then there's >> no >> excuse for the relevant people not to start sorting out this mess by >> commencing with the solution outlined above. >> >> Cheers >> Chris >> >>> >>> Scott >>> >>> >>> On Thu, 2005-07-28 at 18:37 +0200, Cyril Pommier wrote: >>>> Hello, >>>> We are going to store analysis results in chado, and we are of >>>> course >>>> very interested in these future developments of GFF3/chado. >>>> So we would like to make sure that the parsers and conversion >>>> programs >>>> we are writing now will be compatible with the future GFF3.
>>>> >>>> We are using Bio::SeqFeature::Generic objects that we write with >>>> Bio::Tools::GFF. >>>> >>>> Do you think that Bio::Tools::GFF will be able to handle the new >>>> 'type' >>>> column or is it better to switch to Bio::FeatureIO::gff? >>>> >>>> Thanks in advance for any advice. >>>> >>>> Cyril >>>> >>>> Don Gilbert wrote: >>>> >>>>> >>>>> Scott, >>>>> >>>>> Your notes in gmod_bulk_load_gff3.pl suggest it is headed in >>>>> the same direction I suggest below. More about these todo points >>>>> >>>>>> - address flybase's use of analysisfeature combined with >>>>>> feature to >>>>>> give source-type information (in GFF terms). This will need to >>>>>> be addressed in the GBrowse adaptor. >>>>>> - modify the bulk loader to allow "mixed" GFF3 files (that is, >>>>>> containing >>>>>> both analysis results and annotations). See perldoc >>>>>> gmod_bulk_load_gff3.pl >>>>>> for more info >>>>> >>>>> >>>>> Use of chado's analysisfeature table is something others who know >>>>> it better can comment on. But after working with it for a while >>>>> it makes sense to me to use it in this way: >>>>> >>>>> For a future GFF -> Chado loader, treat analysis features such as >>>>> gene finding results, BLAST, sim4 as 'analysisfeature type' rather >>>>> than feature CV term type (the ones that now end up with a generic >>>>> 'match' cvterm). In these cases the Analysis table is populated >>>>> with >>>>> program:database_sourcename >>>>> as the basis of this 'analysisfeature type', such as >>>>> match:blastx:na_pe.dros >>>>> match:sim4:DGC >>>>> match:genie:dummy (or maybe exon:genie) >>>>> >>>>> The program:database fits neatly in the GFF source field, as >>>>> #ref source type start stop ... >>>>> chr1 blastx:na_pe.dros match 1 100 ... >>>>> chr1 sim4:DGC match 1 100 ... >>>>> >>>>> These can be treated in the database adaptor analogously to the CVterm >>>>> table feature types. See at end a list of current GFF feature >>>>> type:source from worm, rice, yeast, fly MODs. Fly and rice use a
Fly and rice use a >>>>> syntax like above and worm gff uses BLAT_EMBL_BEST, instead of >>>>> BLAT:EMBL_BEST. >>>>> >>>>> From POD of your bulk_load_gff3.pl >>>>>> Analysis >>>>>> If you are loading analysis results (ie, BLAT results, gene >>>>>> predictions), you should specify the -a flag. If no arguments are >>>>>> supplied with the -a, then the loader will assume that the results >>>>>> belong to an analysis set with a name that is the concatenation of >>>>>> the source (column 2) and the method (column 3) with an underscore >>>>>> in between. >>>>> >>>>> "... then the loader will assume that the results belong to an >>>>> analysis table row with a program name and database source name >>>>> taken from Source (column 2, colon separated program:sourcename), >>>>> with a SOFA feature type taken from Method (column 3). If >>>>> sourcename doesn't apply, e.g. genefinder, don't add or use >>>>> 'dummy'. >>>>> Use the generic 'match' SOFA type if others don't apply." >>>>> [see also http://song.sourceforge.net/gff3-jan04.shtml#ALIGNMENTS] >>>>> >>>>> Note that sourcename of database is a common attribute (all those >>>>> blasts, blats, sim4, ... are run on several different databases). >>>>> >>>>> For that underscore between method and source, where does that go >>>>> into >>>>> database? It is used as parts of program or database sourcename >>>>> names, >>>>> so it may be problematic to add one if not needed. >>>>> >>>>> Oh, I see now from bulk_load_gff3.PLS, you are creating a 'Name' >>>>> entry >>>>> for analysis table. This probably is less useful than using Program >>>>> and Sourcename fields as flybase does, which comes from the common >>>>> usage where people run various programs, with various database >>>>> sources >>>>> and want to plop the results into a database easily. These go into >>>>> those >>>>> two fields directly, no need to create or parse a Name entry >>>>> (which can be and is null in flybase data). 
>>>>> >>>>>> my $search_analysis >>>>>> = $db->prepare("SELECT analysis_id FROM analysis WHERE name=?"); >>>>> >>>>> I think it would be better as >>>>> my $search_analysis >>>>> = $db->prepare("SELECT analysis_id FROM analysis WHERE program=? >>>>> and >>>>> sourcename=?"); >>>>> >>>>>> Otherwise, the argument provided with -a will be taken >>>>>> as the name of the analysis set. Either way, the analysis set must >>>>>> already be in the analysis table. The easiest way to do this is to >>>>>> insert it directly in the psql shell: >>>>>> >>>>>> INSERT INTO analysis (name, program, programversion) >>>>>> VALUES ('genscan 2005-2-28','genscan','5.4'); >>>>> >>>>> My choice would be to populate the analysis table from GFF data, >>>>> rather >>>>> than expect preparation by the user (or as another option). >>>>> >>>>> INSERT INTO analysis (program, sourcename) >>>>> VALUES ('tblastx','na_baylorf1_scfchunk.dpse'); >>>>> INSERT INTO analysis (program, sourcename) >>>>> VALUES ('sim4','na_gb.dmel'); >>>>> INSERT INTO analysis (program, sourcename, programversion) >>>>> VALUES ('genie_masked','dummy', '1.0'); >>>>> >>>>>> There are other columns in the analysis table that are optional; >>>>>> see >>>>>> the schema documentation and '\d analysis' in psql for more >>>>>> information. >>>>>> >>>>> .... >>>>>> A planned addition to the functionality of handling analysis >>>>>> results >>>>>> is to allow "mixed" GFF files, where some lines are analysis >>>>>> results >>>>>> and some are not. >>>>> >>>>> This is the case for drosophila GFF now (see others also below). If >>>>> you assume by default that a line is an analysis feature when ($method =~ /.*match/) and >>>>> ($source =~ m/([^:]+):(.+)/), you should get all/most of the >>>>> analysisfeature types, and probably not anything else. >>>>> >>>>>> Additionally, one will be able to supply lists of >>>>>> types (optionally with sources) and their associated entry in the >>>>>> analysis table.
>>>>> The format will probably be tag-value pairs: >>>>>> >>>>>> --analysis match:Rice_est=rice_est_blast, \ >>>>>> match:Maize_cDNA=maize_cdna_blast, \ >>>>>> mRNA=genscan_prediction,exon=genscan_prediction >>>>> >>>>> My suggestion for this (as per GFF source,type columns) would be >>>>> --analysis match:program:sourcename ... >>>>> --analysis match:blast:Rice_est,match:blast:Maize_cDNA,\ >>>>> mRNA:genscan:dummy, exon:genscan:dummy >>>>> >>>>> I guess the 'dummy' data sourcename need not be added; flybase >>>>> uses it >>>>> to keep that field not-null, but it isn't required by the schema. >>>>> >>>>> Here are some snippets from the ChadoFC adaptor I modified >>>>> from yours (will get into cvs.sf.net 'real soon'), showing that >>>>> it isn't much work to add this as an analog to how cvterm types >>>>> are used. >>>>> >>>>> -- Don >>>>> >>>>> ## Bio::DB::Das::ChadoFC.pm, part of new() - load analysis types >>>>> ## treat similar to CV table types >>>>> >>>>> sub getAnalysisFeatureHash >>>>> { >>>>> my $self= shift; >>>>> >>>>> my $dbh= $self->dbh(); >>>>> my $sth = $dbh->prepare("select analysis_id,program,sourcename from >>>>> analysis") >>>>> or $self->throw("unable to prepare select from analysis"); >>>>> $sth->execute or $self->throw("unable to select from analysis"); >>>>> >>>>> my (%term2name,%name2term); >>>>> >>>>> while (my $hashref = $sth->fetchrow_hashref) { >>>>> >>>>> ## this is dgg syntax of analysis feature names for GFF >>>>> ## all have generic 'match' method and program:source as 'source' >>>>> ## a problem, want other main types: EST_match:xxx, mRNA:genie .. >>>>> etc.
>>>>> my $anfeat= >>>>> "match:".$hashref->{program}.":".$hashref->{sourcename}; >>>>> >>>>> $term2name{ $hashref->{analysis_id} } = $anfeat; >>>>> $name2term{ $anfeat } = $hashref->{analysis_id}; >>>>> } >>>>> $self->an_term2name(\%term2name); >>>>> $self->an_name2term(\%name2term); >>>>> } >>>>> >>>>> ## Das::ChadoFC::Segment snippets >>>>> sub features { >>>>> $self->{has_anatype}=0; >>>>> my $sql_range = ''; >>>>> my ($interbase_start,$rend,$srcfeature_id,$sql_types); >>>>> unless ($feature_id) { >>>>> $sql_range = $self->sql_range($rangetype); >>>>> >>>>> $sql_types = $self->sql_types($types, -1); # dgg >>>>> >>>>> $srcfeature_id = $self->{srcfeature_id}; >>>>> } >>>>> ... >>>>> elsif($self->{has_anatype}) { >>>>> $from_part .= "left join analysisfeature af using (feature_id) "; >>>>> } >>>>> >>>>> >>>>> sub sql_types >>>>> .. >>>>> $valid_type = $factory->name2term($temp_type); >>>>> $is_anatype= 0; >>>>> unless ($valid_type) { >>>>> $valid_type = $factory->an_name2term($temp_type); >>>>> $self->{has_anatype}= $is_anatype= 1 if ($valid_type); >>>>> } >>>>> .. >>>>> ## leave out extra invalid types >>>>> if (!$valid_type) { >>>>> ### skip >>>>> } elsif ($temp_dbxref) { >>>>> $sql_types .= $orsql."(f.type_id = $valid_type and fd.dbxref_id = >>>>> $temp_dbxref)"; >>>>> } elsif($is_anatype) { >>>>> $sql_types .= $orsql."(af.analysis_id = $valid_type)"; #<<< >>>>> } else { >>>>> $sql_types .= $orsql."(f.type_id = $valid_type)"; >>>>> } >>>>> >>>>> >>>>> Lists of GFF feature type:source from some current MOD data >>>>> where * are probably analysisfeature types (program:database) >>>>> >>>>> rice gff type:source >>>>> ftp://ftp.gramene.org/pub/gramene/release17/data/ >>>>> sequence_annotation/ >>>>> gff3/ >>>>> -------------------- >>>>> CDS:known >>>>> CDS:tigr >>>>> EST:cmap >>>>> EST_match:Barley (? 
might be EST_match:someprogram:Barley) >>>>> EST_match:Maize >>>>> EST_match:Millet >>>>> EST_match:Rice >>>>> EST_match:Sorghum >>>>> EST_match:Wheat >>>>> cDNA_match:Rice >>>>> cross_genome_match:Maize >>>>> cross_genome_match:Rice >>>>> cross_genome_match:Sorghum >>>>> * exon:FgenesH:Monocot >>>>> exon:known >>>>> exon:tigr >>>>> five_prime_UTR:tigr >>>>> gene:known >>>>> gene:tigr >>>>> * mRNA:FgenesH:Monocot >>>>> mRNA:known >>>>> mRNA:tigr >>>>> microsatellite:cmap >>>>> three_prime_UTR:known >>>>> three_prime_UTR:tigr >>>>> transposable_element_insertion_site:cmap >>>>> >>>>> worm gff type:source >>>>> ftp://ftp.wormbase.org/pub/wormbase/species/elegans/ >>>>> genome_feature_tables/GFF3/ >>>>> ---------------------- >>>>> CDS:Coding_transcript >>>>> * CDS:Genefinder >>>>> CDS:Transposon_CDS >>>>> CDS:history >>>>> * CDS:twinscan >>>>> * EST_match:BLAT_EST_BEST (~ EST_match:BLAT:EST_BEST) >>>>> * EST_match:BLAT_EST_OTHER >>>>> PCR_product:GenePair_STS >>>>> PCR_product:Orfeome >>>>> RNAi_reagent:RNAi_primary >>>>> RNAi_reagent:RNAi_secondary >>>>> SNP:Allele >>>>> binding_site:binding_site >>>>> * cDNA_match:BLAT_mRNA_BEST (~ cDNA_match:BLAT:mRNA_BEST ) >>>>> * cDNA_match:BLAT_mRNA_OTHER >>>>> clone_end:. >>>>> clone_start:. 
>>>>> complex_substitution :Allele >>>>> deletion:Allele >>>>> exon:Coding_transcript >>>>> * exon:Genefinder >>>>> exon:Non_coding_transcript >>>>> exon:Pseudogene >>>>> exon:Transposon_CDS >>>>> exon:history >>>>> exon:miRNA >>>>> exon:rRNA >>>>> exon:scRNA >>>>> exon:snRNA >>>>> exon:snoRNA >>>>> exon:tRNA >>>>> * exon:tRNAscan-SE-1.23 >>>>> * exon:twinscan >>>>> experimental_result_region:Expr_profile >>>>> experimental_result_region:cDNA_for_RNAi >>>>> * expressed_sequence_match:BLAT_OST_BEST (~ >>>>> expressed_sequence_match:BLAT:OST_BEST ) >>>>> * expressed_sequence_match:BLAT_OST_OTHER >>>>> five_prime_UTR:Coding_transcript >>>>> gene:Coding_transcript >>>>> gene:gene >>>>> gene:history >>>>> gene:landmark >>>>> insertion:Allele >>>>> inverted_repeat:inverted >>>>> mRNA:Coding_transcript >>>>> * mRNA:Genefinder >>>>> mRNA:Transposon_CDS >>>>> mRNA:history >>>>> * mRNA:twinscan >>>>> miRNA:miRNA >>>>> nc_primary_transcript:Non_coding_transcript >>>>> * nucleotide_match:BLAT_EMBL_BEST (~ >>>>> nucleotide_match:BLAT:EMBL_BEST ) >>>>> * nucleotide_match:BLAT_EMBL_OTHER >>>>> * nucleotide_match:BLAT_TC1_BEST >>>>> * nucleotide_match:BLAT_TC1_OTHER >>>>> * nucleotide_match:BLAT_ncRNA_BEST >>>>> * nucleotide_match:BLAT_ncRNA_OTHER >>>>> * nucleotide_match:TEC_RED >>>>> * nucleotide_match:waba_coding >>>>> * nucleotide_match:waba_strong >>>>> * nucleotide_match:waba_weak >>>>> oligo:. >>>>> operon:operon >>>>> polyA_signal_sequence:polyA_signal_sequence >>>>> polyA_site:polyA_site >>>>> processed_transcript:gene >>>>> protein_coding_primary_transcript:Coding_transcript >>>>> * protein_match:wublastx >>>>> pseudogene:Pseudogene >>>>> pseudogene:history >>>>> rRNA:rRNA >>>>> reagent:Oligo_set >>>>> region:. >>>>> region:Genbank >>>>> region:Genomic_canonical >>>>> region:Link >>>>> * repeat_region:RepeatMasker >>>>> scRNA:scRNA >>>>> sequence_variant:. 
>>>>> sequence_variant:Allele >>>>> snRNA:snRNA >>>>> snoRNA:snoRNA >>>>> substitution:Allele >>>>> tRNA:tRNA >>>>> * tRNA:tRNAscan-SE-1.23 >>>>> tandem_repeat:tandem >>>>> three_prime_UTR:Coding_transcript >>>>> trans_splice_acceptor_site:SL1 >>>>> trans_splice_acceptor_site:SL2 >>>>> transcript:SAGE_transcript >>>>> * translated_nucleotide_match:BLAT_NEMATODE (~ >>>>> translated_nucleotide_match:BLAT:NEMATODE ) >>>>> transposable_element:Transposon >>>>> transposable_element:Transposon_CDS >>>>> transposable_element_insertion_site:Allele >>>>> transposable_element_insertion_site:Mos_insertion_allele >>>>> >>>>> >>>>> fly gff type:source >>>>> ftp://ftp.flybase.net/genomes/dmel/current/gff/ >>>>> ----------------------- >>>>> BAC:. >>>>> CDS:. >>>>> aberration_junction:. >>>>> chromosome:. >>>>> chromosome_arm:. >>>>> chromosome_band:. >>>>> enhancer:. >>>>> exon:. >>>>> five_prime_UTR:. >>>>> gene:. >>>>> insertion_site:. >>>>> intron:. >>>>> mRNA:. >>>>> * match:RNAiHDP >>>>> * match:assembly:path >>>>> * match:blastx:aa_SPTR.dmel >>>>> * match:blastx:aa_SPTR.insect >>>>> * match:blastx:aa_SPTR.othinv >>>>> * match:blastx:aa_SPTR.othvert >>>>> * match:blastx:aa_SPTR.plant >>>>> * match:blastx:aa_SPTR.primate >>>>> * match:blastx:aa_SPTR.rodent >>>>> * match:blastx:aa_SPTR.worm >>>>> * match:blastx:aa_SPTR.yeast >>>>> * match:genscan >>>>> * match:repeatmasker >>>>> * match:sim4:na_ARGs.dros >>>>> * match:sim4:na_ARGsCDS.dros >>>>> * match:sim4:na_DGC_dros >>>>> * match:sim4:na_dbEST.diff.dmel >>>>> * match:sim4:na_dbEST.same.dmel >>>>> * match:sim4:na_gadfly_dmel_r2 >>>>> * match:sim4:na_gb.dmel >>>>> * match:sim4:na_gb.tpa.dmel >>>>> * match:sim4:na_smallRNA.dros >>>>> * match:sim4:na_transcript_dmel_r31 >>>>> * match:sim4:na_transcript_dmel_r32 >>>>> * match:tRNAscan-SE:. 
>>>>> * match:tblastx:na_agambiae >>>>> * match:tblastx:na_dbEST.insect >>>>> * match:tblastx:na_dpse >>>>> * match_part:RNAiHDP >>>>> * match_part:assembly:path >>>>> * match_part:blastx:aa_SPTR.dmel >>>>> * match_part:blastx:aa_SPTR.insect >>>>> * match_part:blastx:aa_SPTR.othinv >>>>> * match_part:blastx:aa_SPTR.othvert >>>>> * match_part:blastx:aa_SPTR.plant >>>>> * match_part:blastx:aa_SPTR.primate >>>>> * match_part:blastx:aa_SPTR.rodent >>>>> * match_part:blastx:aa_SPTR.worm >>>>> * match_part:blastx:aa_SPTR.yeast >>>>> * match_part:genscan >>>>> * match_part:repeatmasker >>>>> * match_part:sim4:na_ARGs.dros >>>>> * match_part:sim4:na_ARGsCDS.dros >>>>> * match_part:sim4:na_DGC_dros >>>>> * match_part:sim4:na_dbEST.diff.dmel >>>>> * match_part:sim4:na_dbEST.same.dmel >>>>> * match_part:sim4:na_gadfly_dmel_r2 >>>>> * match_part:sim4:na_gb.dmel >>>>> * match_part:sim4:na_gb.tpa.dmel >>>>> * match_part:sim4:na_smallRNA.dros >>>>> * match_part:sim4:na_transcript_dmel_r31 >>>>> * match_part:sim4:na_transcript_dmel_r32 >>>>> * match_part:tRNAscan-SE:. >>>>> * match_part:tblastx:na_agambiae >>>>> * match_part:tblastx:na_dbEST.insect >>>>> * match_part:tblastx:na_dpse >>>>> mature_peptide:. >>>>> ncRNA:. >>>>> oligo:. >>>>> point_mutation:. >>>>> polyA_site:. >>>>> protein_binding_site:. >>>>> pseudogene:. >>>>> region:. >>>>> regulatory_region:. >>>>> rescue_fragment:. >>>>> scaffold:. >>>>> sequence_variant:. >>>>> snRNA:. >>>>> snoRNA:. >>>>> tRNA:. >>>>> three_prime_UTR:. >>>>> transcription_start_site:. >>>>> transposable_element:. >>>>> transposable_element_insertion_site:. 
3116 >>>>> >>>>> >>>>> yeast gff type:source count >>>>> ftp://genome-ftp.stanford.edu/pub/yeast/data_download/ >>>>> chromosomal_feature/saccharomyces_cerevisiae.gff >>>>> ------------------------- >>>>> ARS:SGD >>>>> CDS:SGD >>>>> binding_site:SGD >>>>> centromere:SGD >>>>> chromosome:SGD >>>>> gene:SGD >>>>> insertion:SGD >>>>> intron:SGD >>>>> ncRNA:SGD >>>>> nc_primary_transcript:SGD >>>>> nucleotide_match:SGD >>>>> pseudogene:SGD >>>>> rRNA:SGD >>>>> region:SGD >>>>> region:landmark >>>>> repeat_family:SGD >>>>> repeat_region:SGD >>>>> snRNA:SGD >>>>> snoRNA:SGD >>>>> tRNA:SGD >>>>> telomere:SGD >>>>> transposable_element:SGD >>>>> transposable_element_gene:SGD >>>>> >>>>> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405 >>>>> -- gilbertd@indiana.edu -- http://marmot.bio.indiana.edu/ >>>>> >>>> >>>> >>> -- >>> --------------------------------------------------------------------- >>> --- >>> Scott Cain, Ph. D.
>>> cain@cshl.edu >>> GMOD Coordinator (http://www.gmod.org/) >>> 216-392-3087 >>> Cold Spring Harbor Laboratory >>> >> > -- > ----------------------------------------------------------------------- > - > Scott Cain, Ph. D. > cain@cshl.edu > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca.
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From mayagao1999 at yahoo.com Fri Jul 29 20:37:35 2005 From: mayagao1999 at yahoo.com (Alex Zhang) Date: Fri Jul 29 20:29:59 2005 Subject: [Bioperl-l] A problem about a subroutin in my code Message-ID: <20050730003735.48916.qmail@web53506.mail.yahoo.com> Dear all, Sorry to bother you, but I need some help with my code. I have an input file named "origin8.txt" which holds 200 short sequences of width 8. My code uses each short sequence from "origin8.txt" as a template to generate 100 short sequences of the same width and stores them in a text file A. Then the code reads the 100 short sequences from text file A and 100 long sequences of width 200 from a text file B, and replaces a substring of each long sequence with the corresponding short sequence. This produces two text files, C and D; file C holds the 100 replaced long sequences. In other words, I want to use "origin8.txt" to get 200 D files. My code does generate 200 D files, but each of them is empty. So I guess the problem is caused by a failure to pass the data to the subroutine named "make_files". Can anybody suggest how to modify that? Thank you very much in advance!
Sincerely,
Alex

My code:
*******************************************************************
#!/usr/bin/perl
use strict;
use warnings;

my $N_Sequences = 100;
my @Alphabet    = split(//, 'ACGT');
my $P_Consensus = 0.85;    # This is the probability of the dominant letter

# ====== Globals ==========================
my @Probabilities;    # Stores the probability of each character

# ====== Program ==========================
# origin8.txt holds 200 sequences used as motif templates
open(my $origin_fh, '<', 'origin8.txt') or die "Unable to open origin8.txt: $!";
chomp(my @origin = <$origin_fh>);
close $origin_fh;

for (my $y = 0; $y <= $#origin; $y++) {
    # Get the motif template from origin8.txt. The variable must not be in
    # single quotes: '$origin[$y]' is literal text, not the sequence.
    my @Motif = split(//, $origin[$y]);
    open(my $out_norm, '>', "short_sequences8_[$y].txt")
        or die "Unable to open file: $!";
    for (my $i = 0; $i < $N_Sequences; $i++) {
        for (my $j = 0; $j < scalar(@Motif); $j++) {
            loadConsensusCharacter($Motif[$j]);
            addNoiseToDistribution();
            convertToIntervals();
            print {$out_norm} getRandomCharacter(rand(1.0));
        }
        print {$out_norm} "\n";
    }
    # Close (and so flush) the file before make_files() reads it back, and
    # call make_files() once per template rather than once per sequence.
    close $out_norm;
    make_files($y);
}
exit();

# ====== Subroutines =======================
sub loadConsensusCharacter {
    my ($char) = @_;
    my $Found = 'FALSE';
    for (my $i = 0; $i < scalar(@Alphabet); $i++) {
        if ($char eq $Alphabet[$i]) {
            $Probabilities[$i] = 1.0;
            $Found = 'TRUE';
        }
        else {
            $Probabilities[$i] = 0.0;
        }
    }
    if ($Found eq 'FALSE') {
        die("Panic: Motif-Character \"$char\" was not found in Alphabet. Aborting.\n");
    }
    return();
}

# ==========================================
sub addNoiseToDistribution {
    my $P_NonConsensus = (1.0 - $P_Consensus) / (scalar(@Alphabet) - 1);
    for (my $i = 0; $i < scalar(@Probabilities); $i++) {
        if ($Probabilities[$i] == 1.0) {
            $Probabilities[$i] = $P_Consensus;
        }
        else {
            $Probabilities[$i] = $P_NonConsensus;
        }
    }
    return();
}

# ==========================================
sub convertToIntervals {
    # Turn the probabilities into a running (cumulative) sum
    for (my $i = 1; $i < scalar(@Probabilities); $i++) {
        $Probabilities[$i] += $Probabilities[$i-1];
    }
    return();
}

# ==========================================
sub getRandomCharacter {
    my ($RandomNumber) = @_;
    my $i;
    for ($i = 0; $i < scalar(@Probabilities); $i++) {
        last if $Probabilities[$i] > $RandomNumber;
    }
    # Guard against floating-point round-off possibly leaving the last
    # interval end just below 1.0, which would index past the end of @Alphabet.
    $i = $#Alphabet if $i > $#Alphabet;
    return($Alphabet[$i]);
}

# ==========================================
sub make_files {
    my ($y) = @_;    # index of the current motif template
    open(my $short_fh, '<', "short_sequences8_[$y].txt") or die "Cannot read short sequences: $!";
    chomp(my @short = <$short_fh>);
    close $short_fh;
    open(my $long_fh, '<', 'long_sequences.txt') or die "Cannot read long_sequences.txt: $!";
    chomp(my @long = <$long_fh>);
    close $long_fh;
    open(my $out_initial,  '>', "output8_[$y]1.txt") or die "Cannot write output8_[$y]1.txt: $!";
    open(my $out_replaced, '>', "output8_[$y]2.txt") or die "Cannot write output8_[$y]2.txt: $!";
    my $r = 2;    # position at which the short sequence is implanted
    for (my $x = 0; $x <= $#short && $x <= $#long; $x++) {
        print {$out_initial} ">SeqName$x\n$long[$x]\n";
        # Replace the substring in a copy of the long sequence (4-argument substr)
        my $replaced = $long[$x];
        substr($replaced, $r, length $short[$x], $short[$x]);
        print {$out_replaced} ">SeqName$x\n$replaced\n";
    }
    close $out_initial;
    close $out_replaced;
}
*******************************************************************
Input file "origin8.txt" holds 200 sequences as:
TTTATAAT TGTCAATG CGTTGATG CGTCCTAG GGCTTCCA ATTAGCCT GTCCTGAT TGTAAATC CGCTTATT TTGACATA CCTGATAT ATGAATCG CGTCCGAT TGGCCCAT ATCCTGAT TGCCCATT CCCTAACT AAAAAAAA TTTTTTTT CCCCCCCC GGGGGGGG AAAAAAAT AAAAAAAG AAAAAAAC AAAAAACC AAAAAATT AAAAAAGG AAAAAACT AAAAAACG AAAAAACA AAAAACAA AAAACAAA AAACAAAA AACAAAAA ACAAAAAA CAAAAAAA AAAAAATA AAAAATAA AAAATAAA AAATAAAA AATAAAAA ATAAAAAA TAAAAAAA AAAAAAGA AAAAAGAA AAAAGAAA AAAGAAAA AAGAAAAA AGAAAAAA GAAAAAAA AAAACCAA AACCAAAA CCAAAAAA AAAATTAA AATTAAAA TTAAAAAA AAAAACCC AAAACCCA AAACCCAA AACCCAAA ACCCAAAA CCCAAAAA AAAAATTT AAAATTTA AAATTTAA AATTTAAA ATTTAAAA TTTAAAAA AAAAAGGG AAAAGGGA AAAGGGAA AAGGGAAA AGGGAAAA GGGAAAAA AAAACCCC AAACCCCA AACCCCAA ACCCCAAA CCCCAAAA AAAATTTT AAATTTTA AATTTTAA ATTTTAAA TTTTAAAA AAAAGGGG AAAGGGGA AAGGGGAA AGGGGAAA GGGGAAAA AAACCCCC AACCCCCA ACCCCCAA CCCCCAAA AAATTTTT AATTTTTA ATTTTTAA TTTTTAAA AAAGGGGG AAGGGGGA AGGGGGAA GGGGGAAA AAGGGGGG AGGGGGGA GGGGGGAA AACCCCCC ACCCCCCA CCCCCCAA AATTTTTT ATTTTTTA TTTTTTAA ATTTTTTT TTTTTTTA ACCCCCCC CCCCCCCA AGGGGGGG GGGGGGGA ATTTTTTT TTTTTTTA ATAAAATA AATAAATA AAATAATA AAAATATA ACAAAACA AACAAACA AAACAACA AAAACACA AGAAAAGA AAGAAAGA AAAGAAGA AAAAGAGA ATAAAAGA ATAAAACA AGAAAATA AGAAAACA ACAAAAGA ACAAAATA ATTAAATA AATTAATA AAATTATA ACCAAACA AACCAACA AAACCACA AGGAAAGA AAGGAAGA AAAGGAGA ATTTAATA AATTTATA ACCCAACA AACCCACA AGGGAAGA AAGGGAGA ATTTAACA ATTTAAGA AATTTACA AATTTAGA ACCCAATA ACCCAAGA AACCCATA AACCCAGA AGGGAACA AGGGAATA AAGGGATA AAGGGACA TTGGGACA CCGGGACA AGAAGGGA TGCCCATA TAAAAAAT TGCCTATA CCGTAGTC ACTTGACT CTGATCCC TGTGACTA CCTGATCC CCTGAACC TGATCACG GGGTAACC CTTTTGAA TTGTATGA CCTGATAA CTGGTTAG CCCCGACC TTGGGGAC GGTTTGAC GCTTAGAC GTTACACC TTGTACCA TGGTACCA CCGTACAT CCCTTGCC GTGTTGGT ATCGATCG ACGTACGT TCAGTCAG GCTATACG GTCCATAC CCGTCCGT ATATATCC GTGTCCCC
---------------------------------
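The two operations Alex's script needs can be sketched more compactly without global state. This is a hedged alternative, not a drop-in replacement. noisy_copy draws each position directly: it keeps the consensus letter with probability $p_consensus, otherwise it picks one of the other three letters uniformly. For 0.85, this yields the same 0.85/0.05/0.05/0.05 distribution as the cumulative-interval subroutines. implant uses 4-argument substr to replace part of a long sequence. Both function names are hypothetical. The optional $rng code reference is an assumption added only so the behaviour can be tested deterministically.

```perl
use strict;
use warnings;

my @ALPHABET = qw(A C G T);

# Hypothetical helper: return a noisy copy of $motif. Each position keeps its
# letter with probability $p_consensus; otherwise one of the other letters is
# chosen uniformly. $rng (optional) returns numbers in [0, 1).
sub noisy_copy {
    my ($motif, $p_consensus, $rng) = @_;
    $rng ||= sub { rand() };
    my $copy = '';
    for my $char (split //, $motif) {
        die "Unexpected character '$char' in motif\n" unless $char =~ /^[ACGT]$/;
        if ($rng->() < $p_consensus) {
            $copy .= $char;
        }
        else {
            my @others = grep { $_ ne $char } @ALPHABET;
            $copy .= $others[ int($rng->() * @others) ];
        }
    }
    return $copy;
}

# Hypothetical helper: replace length($short) characters of $long, starting
# at $pos, with $short. Works on a copy, so $long itself is left unchanged.
sub implant {
    my ($long, $short, $pos) = @_;
    die "Position out of range\n" if $pos + length($short) > length($long);
    my $result = $long;
    substr($result, $pos, length $short, $short);
    return $result;
}

my $copy = noisy_copy('TTTATAAT', 0.85);
print implant('A' x 20, $copy, 2), "\n";
```

In the script above, noisy_copy would stand in for the loadConsensusCharacter / addNoiseToDistribution / convertToIntervals / getRandomCharacter chain for each motif, and implant for the substr line in make_files.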
From jason.stajich at duke.edu Sat Jul 30 02:20:44 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Sat Jul 30 02:12:42 2005 Subject: [Bioperl-l] constructing a tree object In-Reply-To: <003201c59485$f68500c0$ab05a8c0@bradleydell> References: <003201c59485$f68500c0$ab05a8c0@bradleydell> Message-ID: <5087f484f5a8d9088172fa4b1d26fba9@duke.edu> See the FAQ entry on IO::String and SeqIO; the same approach applies to TreeIO. -jason On Jul 29, 2005, at 2:39 PM, Michael Bradley wrote: > Can anyone tell me how to do $treeObj = Bio::TreeIO->new(-file => "somefile", -format => 'newick') > from a variable instead of a file? > > Suppose that my tree is stored in $treestring. I would like to do > something like: $treeObj = Bio::TreeIO->new(-$treestring, -format => 'newick'). > > Thanks, > > Mike Bradley > -- Jason Stajich http://www.duke.edu/~jes12 jason.stajich -at- duke.edu From rvosa at sfu.ca Sat Jul 30 10:01:30 2005 From: rvosa at sfu.ca (Rutger Vos) Date: Sat Jul 30 09:51:45 2005 Subject: [Bioperl-l] Bio:: namespace question Message-ID: <42EB883A.7060605@sfu.ca> Dear fellow perl-using-biologists, I want to submit to CPAN a module for phylogenetic analysis. I am trying to decide what top-level namespace to use. Can I use Bio::Phylo or something like that? Or is Bio:: reserved for BioPerl? Thanks! Rutger -- ++++++++++++++++++++++++++++++++++++++++++++ Rutger Vos, PhD.
candidate Department of Biological Sciences Simon Fraser University 8888 University Drive Burnaby, BC, V5A1S6 Phone: 604-291-5625 Fax: 604-291-3496 Personal site: http://www.sfu.ca/~rvosa FAB* lab: http://www.sfu.ca/~fabstar ++++++++++++++++++++++++++++++++++++++++++++ From jason.stajich at duke.edu Sat Jul 30 12:28:37 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Sat Jul 30 12:19:53 2005 Subject: [Bioperl-l] Bio:: namespace question In-Reply-To: <42EB883A.7060605@sfu.ca> References: <42EB883A.7060605@sfu.ca> Message-ID: <40c2ad0d56c965bb329ed08ef934f209@duke.edu> You'll see there are other non-BioPerl modules on CPAN under the Bio:: namespace, so we have no restrictions on what people can submit to CPAN. BioPerl has some phylogenetics-related modules scattered around, mostly for data parsing (Bio::Tree, Bio::TreeIO, Bio::Tools::Phylo) and for running programs (Bio::Tools::Run::Phylo). My main concerns are not confusing people about how the modules interrelate and can be used together, and avoiding a namespace clash; we are not currently using Bio::Phylo, though. I don't know if any other folks have opinions -- I suppose waiting to see what I'd say? -jason On Jul 30, 2005, at 7:01 AM, Rutger Vos wrote: > Dear fellow perl-using-biologists, > > I want to submit to CPAN a module for phylogenetic analysis. I am > trying to decide what top-level namespace to use. Can I use Bio::Phylo > or something like that? Or is Bio:: reserved for BioPerl? > > Thanks! > > Rutger > > -- > ++++++++++++++++++++++++++++++++++++++++++++ > Rutger Vos, PhD.
candidate > Department of Biological Sciences > Simon Fraser University > 8888 University Drive > Burnaby, BC, V5A1S6 > Phone: 604-291-5625 Fax: 604-291-3496 > Personal site: http://www.sfu.ca/~rvosa > FAB* lab: http://www.sfu.ca/~fabstar > ++++++++++++++++++++++++++++++++++++++++++++ > -- Jason Stajich http://www.duke.edu/~jes12 jason.stajich -at- duke.edu From hlapp at gmx.net Sat Jul 30 18:56:20 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Jul 30 18:46:46 2005 Subject: [Bioperl-l] Bio:: namespace question In-Reply-To: <40c2ad0d56c965bb329ed08ef934f209@duke.edu> References: <42EB883A.7060605@sfu.ca> <40c2ad0d56c965bb329ed08ef934f209@duke.edu> Message-ID: <3d1bb154cc3225a57c9e9ad031b57fc1@gmx.net> On Jul 30, 2005, at 9:28 AM, Jason Stajich wrote: > I don't know if any other folks have opinions -- I suppose waiting to > see what I'd say? > Right. Selfishly, I'd reserve Bio::Phylo for future Bioperl use, but that wouldn't really be fair ;-) -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From jim.hu.biobio at gmail.com Sat Jul 30 16:05:42 2005 From: jim.hu.biobio at gmail.com (Jim Hu) Date: Sun Jul 31 13:13:56 2005 Subject: [Bioperl-l] Newbie gbrowse help - script to make gff from fasta In-Reply-To: <42E96847.1060900@ebi.ac.uk> References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu> <42E909E3.2030102@infobiogen.fr> <1122570166.3288.10.camel@localhost.localdomain> <42E96847.1060900@ebi.ac.uk> Message-ID: <1AC69124-28AD-48B2-B910-7C5D8057908E@gmail.com> 1) Is there an existing script to convert a RefSeq FASTA file into a GFF flat file compatible with GBrowse 1.62?
As far as I can tell, the output of bp_genbank2gff.pl --accession NC_001416 --stdout > lambda.gff needs some additional tweaking/parsing. I'll probably load these into MySQL eventually (though for phage genomes, is it worth it?), but I wanted to learn with flat files first. 2) Is there a repository of standard track stanzas and aggregators that match the feature types generated by such scripts? 3) Is there a FAQ I missed that I should have consulted first? 4) Is this even the right list for these questions? I didn't want to reinvent any wheels if possible. Sorry if this is off topic. Thanks! Jim Hu
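For a bare FASTA file with no annotation, one do-it-yourself option is to emit a GFF file containing a single feature that spans each sequence, followed by the sequence itself. The sketch below is a minimal core-Perl illustration under several assumptions: it writes GFF3 (not GFF2), uses 'RefSeq' as the source column and the Sequence Ontology term 'region' as the type, and treats the first word after '>' as the sequence ID. Whether GBrowse 1.62's loader accepts this file as-is is untested; check it against the GBrowse 1.x loading documentation before relying on it. The script name fasta2gff3.pl is hypothetical.

```perl
use strict;
use warnings;

# Parse FASTA text into [id, sequence] pairs; the ID is the first word after '>'.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        if ($line =~ /^>(\S+)/) {
            push @records, [ $1, '' ];
        } elsif (@records) {
            (my $residues = $line) =~ s/\s+//g;    # drop spaces and any \r
            $records[-1][1] .= $residues;
        }
    }
    return @records;
}

# One GFF3 'region' feature spanning each sequence, then the sequences
# themselves in a ##FASTA section.
sub fasta_to_gff3 {
    my ($text) = @_;
    my ($directives, $features, $fasta) = ("##gff-version 3\n", '', "##FASTA\n");
    for my $rec (parse_fasta($text)) {
        my ($id, $seq) = @$rec;
        my $len = length $seq;
        $directives .= "##sequence-region $id 1 $len\n";
        # Columns: seqid source type start end score strand phase attributes
        $features .= join("\t", $id, 'RefSeq', 'region', 1, $len, '.', '.', '.',
                          "ID=$id;Name=$id") . "\n";
        $fasta .= ">$id\n";
        $fasta .= "$_\n" for $seq =~ /(.{1,60})/g;    # 60 residues per line
    }
    return $directives . $features . $fasta;
}

# Usage: perl fasta2gff3.pl NC_001416.fasta > lambda.gff
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    my $text = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    print fasta_to_gff3(defined $text ? $text : '');
}
```

GFF3 reserves some characters (tab, newline, ';', '=', '&', ',') in the attributes column, so the sketch assumes simple accession-style IDs that need no escaping; NCBI-style headers such as gi|...|ref|NC_001416.1| may be worth trimming to the bare accession first.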