From bartwiegmans at gmail.com Sun Apr 1 07:55:01 2012
From: bartwiegmans at gmail.com (Bart Wiegmans)
Date: Sun, 1 Apr 2012 13:55:01 +0200
Subject: [Bioperl-l] Implementing Bioperl6 for GSoC 2012
Message-ID:

Hello all BioPerl-ers,

This is my first e-mail to the list and thus my introduction. My name is Bart Wiegmans, I study biology at the University of Groningen, the Netherlands. It is my goal to implement bioperl6 this summer as part of the GSoC program.

Why would I want such a thing? For a start, I'd like to learn more about bioinformatics. As I told you I study biology, so this has an obvious advantage for me. Also, I'd like to learn perl6 well, and this is only possible when one writes a significant program in it. Moreover, I think perl6 is awesome, and having a real-world toolkit like bioperl out there might just be enough to develop a significant community using it. Third, I think perl5's object support is crufty, and difficult to learn for many people. These people include biologists who might not be inclined to learn it, and would rather use some other tools instead.

As to who I am, I already told you my name. I am 24 years old, and study biology at an undergraduate level. (For those interested, yes this means I haven't exactly been flying through my courses :-)). I have been programming computers ever since I was 16 years old, and earlier if you count BASIC. Starting out with C, most of that has been websites (in PHP), scripts (in Perl), and other smallish programs (in Java / Perl). For example, I implemented a parser and decoder for the Dirac video specification as part of GSoC 2008, and a script which reads the NIH Bookshelf website and translates this into ePub e-books. I read quite a few of them that way.

Aside from my motivation and capabilities, two other factors somewhat complicate my involvement with GSoC. The first is that the academic year ends in mid-July in the Netherlands, not in May as in the USA and in many other countries.
This means that I am not 'free' in a real sense before that time. Also, I have a day job as a PHP programmer for a local online students' magazine, which also takes some time. Which is unfortunate, because I'd rather spend my time writing useful programs; hence, if you would accept me as a student I plan to take leave from this job during the period of GSoC.

Anyway, I realize this has been enough information for any interested reader. If there is any interest on your side, I frequent freenode under the nickname brrt. Other than that and this e-mail address, I don't have much of an online presence.

Kind regards,
Bart Wiegmans

From l.m.timmermans at students.uu.nl Sun Apr 1 10:38:13 2012
From: l.m.timmermans at students.uu.nl (Leon Timmermans)
Date: Sun, 1 Apr 2012 16:38:13 +0200
Subject: [Bioperl-l] Implementing Bioperl6 for GSoC 2012
In-Reply-To:
References:
Message-ID:

On Sun, Apr 1, 2012 at 1:55 PM, Bart Wiegmans wrote:
> This is my first e-mail to the list and thus my introduction. My name > is Bart Wiegmans, I study biology at the University of Groningen, the > Netherlands. It is my goal to implement bioperl6 this summer as part > of the GSoC program. >

Cool. Though I am wondering what exactly you want to implement. BioPerl as a whole is 2000 modules; not even a dozen GSoC students could implement that. You will have to focus on something.

> Also, I'd like to learn perl6 well, and this > is only possible when one writes a significant program in it. > Moreover, I think perl6 is awesome, and having a real-world toolkit > like bioperl out there might just be enough to develop a significant > community using it.

How much time do you expect that to cost? Having to learn a new language means you will get less done than you would ordinarily. This doesn't have to be a problem, but do take it into account.

> Third, I think perl5's object support is > crufty, and difficult to learn for many people.
> These people include > biologists who might not be inclined to learn it, and would rather use some > other tools instead. >

Perl 5's object support can be quite elegant with modern OO frameworks such as Moose and relatives. Sadly, BioPerl itself is based on fairly dated paradigms.

> Aside from my motivation and capabilities, two other factors somewhat > complicate my involvement with GSoC. The first is that the academic > year ends in mid-July in the Netherlands, not in May as in the USA > and in many other countries. This means that I am not 'free' in a real > sense before that time. Also, I have a day job as a PHP programmer for > a local online students' magazine, which also takes some time. Which > is unfortunate, because I'd rather spend my time writing useful > programs; hence, if you would accept me as a student I plan to take > leave from this job during the period of GSoC. >

Yeah, I'm familiar with that problem; it's rather unfortunate.

> Anyway, I realize this has been enough information for any interested > reader. If there is any interest on your side, I frequent freenode > under the nickname brrt. Other than that and this e-mail address, I > don't have much of an online presence. >

Well, come join us at #bioperl and #perl6, then.

Leon

From cjfields at illinois.edu Sun Apr 1 21:57:53 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 2 Apr 2012 01:57:53 +0000
Subject: [Bioperl-l] Implementing Bioperl6 for GSoC 2012
In-Reply-To:
References:
Message-ID: <6D16BB69-A74A-4A7E-B81C-0B33E9A73473@illinois.edu>

Bart,

I think this is a good idea, but to address one concern up-front: I believe the time frame (possible start in mid-July) will not be in your favor unless you can participate according to Google's schedule. There is some flexibility at the beginning, mainly during the community bonding period, but beyond that there isn't much wiggle room; we're as constrained as you are.
The proposals to OBF over the last few years have been highly competitive, and this year will prove even more so, as a large number of major Perl-based FLOSS orgs (primarily The Perl Foundation) were not accepted into GSoC.

Now for Perl 6:

BioPerl6 is a project Philip Mabon and I have already started up on github:

   https://github.com/cjfields/bioperl6

The code is basically proof-of-concept stuff at this point, and I don't believe it was current with the spec last I checked, or that it works on either the most current Niecza or Rakudo. All this was enough of a moving target to make it hard to maintain, but that seems to be settling somewhat now. It's pretty wide open, though, as far as I'm concerned.

If you are to pursue this as a proposal, you will need to focus on a specific task or set of classes to implement, along with docs and tests. A straight port isn't the best option; you should take advantage of Perl 6's strengths and possibly work on redesign where needed, keeping in mind that some aspects of bioperl might be considered a bit over-designed and out-of-date with regard to modern Perl dictums. Also, learning a new language is nice, but that isn't the main focus for any GSoC project. At the end of the day, we should have usable code for the community, and if it acts as an entry point for more people to get involved with Perl 6, all the better :)

(I see that Leon has also chimed in on this with similar comments.)

I will be on and off #bioperl this week (pyrimidine). IRC is also logged in case I need to backlog (provided by one Moritz Lenz):

   http://irclog.perlgeek.de/bioperl/today

chris

On Apr 1, 2012, at 6:55 AM, Bart Wiegmans wrote:

> Hello all BioPerl-ers, > > This is my first e-mail to the list and thus my introduction. My name > is Bart Wiegmans, I study biology at the University of Groningen, the > Netherlands. It is my goal to implement bioperl6 this summer as part > of the GSoC program. > > Why would I want such a thing?
> For a start, I'd like to learn more > about bioinformatics. As I told you I study biology, so this has an > obvious advantage for me. Also, I'd like to learn perl6 well, and this > is only possible when one writes a significant program in it. > Moreover, I think perl6 is awesome, and having a real-world toolkit > like bioperl out there might just be enough to develop a significant > community using it. Third, I think perl5's object support is > crufty, and difficult to learn for many people. These people include > biologists who might not be inclined to learn it, and would rather use some > other tools instead. > > As to who I am, I already told you my name. I am 24 years old, and > study biology at an undergraduate level. (For those interested, yes > this means I haven't exactly been flying through my courses :-)). I > have been programming computers ever since I was 16 years old, and > earlier if you count BASIC. Starting out with C, most of that has been > websites (in PHP), scripts (in Perl), and other smallish programs (in > Java / Perl). For example, I implemented a parser and decoder for the > Dirac video specification as part of GSoC 2008, and a script which > reads the NIH Bookshelf website and translates this into ePub e-books. > I read quite a few of them that way. > > Aside from my motivation and capabilities, two other factors somewhat > complicate my involvement with GSoC. The first is that the academic > year ends in mid-July in the Netherlands, not in May as in the USA > and in many other countries. This means that I am not 'free' in a real > sense before that time. Also, I have a day job as a PHP programmer for > a local online students' magazine, which also takes some time. Which > is unfortunate, because I'd rather spend my time writing useful > programs; hence, if you would accept me as a student I plan to take > leave from this job during the period of GSoC. > > Anyway, I realize this has been enough information for any interested > reader.
> If there is any interest on your side, I frequent freenode > under the nickname brrt. Other than that and this e-mail address, I > don't have much of an online presence. > > Kind regards, > Bart Wiegmans > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l

From joseramonblas at gmail.com Mon Apr 2 01:21:55 2012
From: joseramonblas at gmail.com (José Ramón Blas Pastor)
Date: Mon, 2 Apr 2012 07:21:55 +0200
Subject: [Bioperl-l] data into R from Perl arrays
Message-ID:

Hi,

a very simple doubt, but I do not know how to manage this.

I want to plot a histogram for all data in 'datos.txt'.

a) by using R:
datos<-scan("datos.txt")
pdf("xh.pdf")
hist(datos)
dev.off()

b) How could I invoke R inside Perl to do the same??
#!/usr/bin/perl
open(DAT,"datos.txt");
while (<DAT>) {
  chomp;
  push(@datos,$_);
}
#now I want a histogram of values in @datos

Thanks!!

JR

--
José Ramón Blas - PhD
Dept. Biochemistry - Medicine School
University of Castilla-La Mancha
C Almansa, 14
02006 Albacete (Spain)

Phone: +34 967599200 ext. 2958

From avilella at gmail.com Mon Apr 2 02:24:09 2012
From: avilella at gmail.com (Albert Vilella)
Date: Mon, 2 Apr 2012 07:24:09 +0100
Subject: [Bioperl-l] data into R from Perl arrays
In-Reply-To:
References:
Message-ID:

You may get a faster answer on a generic programming site, like stackoverflow.com...

On Mon, Apr 2, 2012 at 6:21 AM, José Ramón Blas Pastor wrote:
> Hi, > > a very simple doubt, but I do not know how to manage this. > > I want to plot a histogram for all data in 'datos.txt'. > > a) by using R: > datos<-scan("datos.txt") > pdf("xh.pdf") > hist(datos) > dev.off() > > > b) How could I invoke R inside Perl to do the same?? > #!/usr/bin/perl > open(DAT,"datos.txt"); > while (<DAT>) { > chomp; > push(@datos,$_); > } > #now I want a histogram of values in @datos > > Thanks!! > > JR > > -- > José Ramón Blas - PhD > Dept. Biochemistry - Medicine School > University of Castilla-La Mancha > C Almansa, 14 > 02006 Albacete (Spain) > > Phone: +34 967599200 ext. 2958

From awitney at sgul.ac.uk Mon Apr 2 04:17:56 2012
From: awitney at sgul.ac.uk (Adam Witney)
Date: Mon, 2 Apr 2012 09:17:56 +0100
Subject: [Bioperl-l] data into R from Perl arrays
In-Reply-To:
References:
Message-ID: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk>

The quickest way to do this specific example is probably to just save your R commands to a file (e.g. R_commands.R) and then run some kind of system/exec/backtick function in your perl script to invoke R, something like (untested):

#!/usr/bin/perl
use strict;
use warnings;
system( 'R --file=R_commands.R' );

Alternatively, if you want perl and R to be able to interact and pass data back and forth, you could use something like Statistics::R

http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm

HTH

adam

On 2 Apr 2012, at 06:21, José Ramón Blas Pastor wrote:

> Hi, > > a very simple doubt, but I do not know how to manage this. > > I want to plot a histogram for all data in 'datos.txt'. > > a) by using R: > datos<-scan("datos.txt") > pdf("xh.pdf") > hist(datos) > dev.off() > > > b) How could I invoke R inside Perl to do the same?? > #!/usr/bin/perl > open(DAT,"datos.txt"); > while (<DAT>) { > chomp; > push(@datos,$_); > } > #now I want a histogram of values in @datos > > Thanks!! > > JR > > -- > José Ramón Blas - PhD > Dept. Biochemistry - Medicine School > University of Castilla-La Mancha > C Almansa, 14 > 02006 Albacete (Spain) > > Phone: +34 967599200 ext.
2958

From roy.chaudhuri at gmail.com Mon Apr 2 06:33:06 2012
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 02 Apr 2012 11:33:06 +0100
Subject: [Bioperl-l] data into R from Perl arrays
In-Reply-To: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk>
References: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk>
Message-ID: <4F798062.4050908@gmail.com>

Alternatively, you could go for a Perl-only approach using something like GD::Graph::Histogram.

Cheers,
Roy.

On 02/04/2012 09:17, Adam Witney wrote:
> > The quickest way to do this specific example is probably to just save > your R commands to a file (e.g. R_commands.R) and then run some kind of > system/exec/backtick function in your perl script to invoke R, > something like (untested): > > #!/usr/bin/perl use strict; use warnings; system( 'R --file=R_commands.R' ); > > Alternatively, if you want perl and R to be able to interact and pass > data back and forth, you could use something like Statistics::R > > http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm > > HTH > > adam > > On 2 Apr 2012, at 06:21, José Ramón Blas Pastor wrote: > >> Hi, >> >> a very simple doubt, but I do not know how to manage this. >> >> I want to plot a histogram for all data in 'datos.txt'. >> >> a) by using R: datos<-scan("datos.txt") pdf("xh.pdf") hist(datos) >> dev.off() >> >> >> b) How could I invoke R inside Perl to do the same?? >> #!/usr/bin/perl open(DAT,"datos.txt"); while (<DAT>) { chomp; >> push(@datos,$_); } #now I want a histogram of values in @datos >> >> Thanks!! >> >> JR >> >> -- José Ramón Blas - PhD Dept. Biochemistry - Medicine School >> University of Castilla-La Mancha C Almansa, 14 02006 Albacete >> (Spain) >> >> Phone: +34 967599200 ext.
2958

From cjfields at illinois.edu Mon Apr 2 08:59:40 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 2 Apr 2012 12:59:40 +0000
Subject: [Bioperl-l] data into R from Perl arrays
In-Reply-To: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk>
References: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk>
Message-ID:

Not sure how well it is supported, but there is also Statistics::useR (which has an XS layer for conversing with R).

https://metacpan.org/module/Statistics::useR

chris

On Apr 2, 2012, at 3:17 AM, Adam Witney wrote:
> > The quickest way to do this specific example is probably to just save your R commands to a file (e.g. R_commands.R) and then run some kind of system/exec/backtick function in your perl script to invoke R, something like (untested): > > #!/usr/bin/perl > use strict; > use warnings; > system( 'R --file=R_commands.R' ); > > Alternatively, if you want perl and R to be able to interact and pass data back and forth, you could use something like Statistics::R > > http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm > > HTH > > adam > > On 2 Apr 2012, at 06:21, José Ramón Blas Pastor wrote: > >> Hi, >> >> a very simple doubt, but I do not know how to manage this. >> >> I want to plot a histogram for all data in 'datos.txt'. >> >> a) by using R: >> datos<-scan("datos.txt") >> pdf("xh.pdf") >> hist(datos) >> dev.off() >> >> >> b) How could I invoke R inside Perl to do the same?? >> #!/usr/bin/perl >> open(DAT,"datos.txt"); >> while (<DAT>) { >> chomp; >> push(@datos,$_); >> } >> #now I want a histogram of values in @datos >> >> Thanks!! >> >> JR >> >> -- >> José Ramón Blas - PhD >> Dept.
Biochemistry - Medicine School >> University of Castilla-La Mancha >> C Almansa, 14 >> 02006 Albacete (Spain) >> >> Phone: +34 967599200 ext. 2958

From bartwiegmans at gmail.com Mon Apr 2 13:10:47 2012
From: bartwiegmans at gmail.com (Bart Wiegmans)
Date: Mon, 2 Apr 2012 19:10:47 +0200
Subject: [Bioperl-l] Implementing Bioperl6 for GSoC 2012
In-Reply-To: <6D16BB69-A74A-4A7E-B81C-0B33E9A73473@illinois.edu>
References: <6D16BB69-A74A-4A7E-B81C-0B33E9A73473@illinois.edu>
Message-ID:

Chris, Leon, others,

Thank you for your timely responses. As far as the timeframe is concerned, I might be able to get student credits for participating in this project, as it is related to my studies. In that case I would have more free time. At any rate, I understand it is suboptimal to start working in July, so I will do my best to free up as much time as possible.

I've already checked out the bioperl6 project as well as the biome project from github. I am not quite sure what scope of project to choose, and I was hoping for your advice. File format import/export and database connectivity come to mind, as these are the subjects I am most familiar with. In such a scenario, aside from a set of modules/classes, the end goal would be a script that could search for and import a sequence from a number of popular databases, and save it to the user's hard disk. I am very much open to suggestions, however.

Anyway, thank you for your time.
Kind regards,
Bart Wiegmans

2012/4/2 Fields, Christopher J :
> Bart, > > I think this is a good idea, but to address one concern up-front: I believe the time frame (possible start in mid-July) will not be in your favor unless you can participate according to Google's schedule. There is some flexibility at the beginning, mainly during the community bonding period, but beyond that there isn't much wiggle room; we're as constrained as you are. The proposals to OBF over the last few years have been highly competitive, and this year will prove even more so, as a large number of major Perl-based FLOSS orgs (primarily The Perl Foundation) were not accepted into GSoC. > > Now for Perl 6: > > BioPerl6 is a project Philip Mabon and I have already started up on github: > > https://github.com/cjfields/bioperl6 > > The code is basically proof-of-concept stuff at this point, and I don't believe it was current with the spec last I checked, or that it works on either the most current Niecza or Rakudo. All this was enough of a moving target to make it hard to maintain, but that seems to be settling somewhat now. It's pretty wide open, though, as far as I'm concerned. > > If you are to pursue this as a proposal, you will need to focus on a specific task or set of classes to implement, along with docs and tests. A straight port isn't the best option; you should take advantage of Perl 6's strengths and possibly work on redesign where needed, keeping in mind that some aspects of bioperl might be considered a bit over-designed and out-of-date with regard to modern Perl dictums. Also, learning a new language is nice, but that isn't the main focus for any GSoC project. At the end of the day, we should have usable code for the community, and if it acts as an entry point for more people to get involved with Perl 6, all the better :) > > (I see that Leon has also chimed in on this with similar comments.) > > I will be on and off #bioperl this week (pyrimidine).
> IRC is also logged in case I need to backlog (provided by one Moritz Lenz): > > http://irclog.perlgeek.de/bioperl/today > > chris > > On Apr 1, 2012, at 6:55 AM, Bart Wiegmans wrote: > >> Hello all BioPerl-ers, >> >> This is my first e-mail to the list and thus my introduction. My name >> is Bart Wiegmans, I study biology at the University of Groningen, the >> Netherlands. It is my goal to implement bioperl6 this summer as part >> of the GSoC program. >> >> Why would I want such a thing? For a start, I'd like to learn more >> about bioinformatics. As I told you I study biology, so this has an >> obvious advantage for me. Also, I'd like to learn perl6 well, and this >> is only possible when one writes a significant program in it. >> Moreover, I think perl6 is awesome, and having a real-world toolkit >> like bioperl out there might just be enough to develop a significant >> community using it. Third, I think perl5's object support is >> crufty, and difficult to learn for many people. These people include >> biologists who might not be inclined to learn it, and would rather use some >> other tools instead. >> >> As to who I am, I already told you my name. I am 24 years old, and >> study biology at an undergraduate level. (For those interested, yes >> this means I haven't exactly been flying through my courses :-)). I >> have been programming computers ever since I was 16 years old, and >> earlier if you count BASIC. Starting out with C, most of that has been >> websites (in PHP), scripts (in Perl), and other smallish programs (in >> Java / Perl). For example, I implemented a parser and decoder for the >> Dirac video specification as part of GSoC 2008, and a script which >> reads the NIH Bookshelf website and translates this into ePub e-books. >> I read quite a few of them that way. >> >> Aside from my motivation and capabilities, two other factors somewhat >> complicate my involvement with GSoC.
>> The first is that the academic >> year ends in mid-July in the Netherlands, not in May as in the USA >> and in many other countries. This means that I am not 'free' in a real >> sense before that time. Also, I have a day job as a PHP programmer for >> a local online students' magazine, which also takes some time. Which >> is unfortunate, because I'd rather spend my time writing useful >> programs; hence, if you would accept me as a student I plan to take >> leave from this job during the period of GSoC. >> >> Anyway, I realize this has been enough information for any interested >> reader. If there is any interest on your side, I frequent freenode >> under the nickname brrt. Other than that and this e-mail address, I >> don't have much of an online presence. >> >> Kind regards, >> Bart Wiegmans

From florent.angly at gmail.com Mon Apr 2 18:30:09 2012
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 03 Apr 2012 08:30:09 +1000
Subject: [Bioperl-l] data into R from Perl arrays
In-Reply-To:
References: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk>
Message-ID: <4F7A2871.8080207@gmail.com>

To execute R commands from Perl, you can also try Statistics::R (http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm), which has been around for longer, and which I have recently refactored.

Regards,
Florent

On 02/04/12 22:59, Fields, Christopher J wrote:
> Not sure how well it is supported, but there is also Statistics::useR (which has an XS layer for conversing with R).
> > https://metacpan.org/module/Statistics::useR > > chris > > On Apr 2, 2012, at 3:17 AM, Adam Witney wrote: > >> The quickest way to do this specific example is probably to just save your R commands to a file (e.g. R_commands.R) and then run some kind of system/exec/backtick function in your perl script to invoke R, something like (untested): >> >> #!/usr/bin/perl >> use strict; >> use warnings; >> system( 'R --file=R_commands.R' ); >> >> Alternatively, if you want perl and R to be able to interact and pass data back and forth, you could use something like Statistics::R >> >> http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm >> >> HTH >> >> adam >> >> On 2 Apr 2012, at 06:21, José Ramón Blas Pastor wrote: >> >>> Hi, >>> >>> a very simple doubt, but I do not know how to manage this. >>> >>> I want to plot a histogram for all data in 'datos.txt'. >>> >>> a) by using R: >>> datos<-scan("datos.txt") >>> pdf("xh.pdf") >>> hist(datos) >>> dev.off() >>> >>> >>> b) How could I invoke R inside Perl to do the same?? >>> #!/usr/bin/perl >>> open(DAT,"datos.txt"); >>> while (<DAT>) { >>> chomp; >>> push(@datos,$_); >>> } >>> #now I want a histogram of values in @datos >>> >>> Thanks!! >>> >>> JR >>> >>> -- >>> José Ramón Blas - PhD >>> Dept. Biochemistry - Medicine School >>> University of Castilla-La Mancha >>> C Almansa, 14 >>> 02006 Albacete (Spain) >>> >>> Phone: +34 967599200 ext.
2958

From huangyifeicmb at gmail.com Mon Apr 2 20:41:54 2012
From: huangyifeicmb at gmail.com (Yifei Huang)
Date: Mon, 2 Apr 2012 20:41:54 -0400
Subject: [Bioperl-l] data into R from Perl arrays
In-Reply-To: <4F7A2871.8080207@gmail.com>
References: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk> <4F7A2871.8080207@gmail.com>
Message-ID:

You may try RSPerl.

http://www.omegahat.org/RSPerl/

Yifei

2012/4/2 Florent Angly
> To execute R commands from Perl, you can also try Statistics::R (http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm), > which has been around for longer, and which I have recently refactored. > Regards, > Florent > > > On 02/04/12 22:59, Fields, Christopher J wrote: > >> Not sure how well it is supported, but there is also Statistics::useR >> (which has an XS layer for conversing with R).
>> >> https://metacpan.org/module/Statistics::useR >> >> chris >> >> On Apr 2, 2012, at 3:17 AM, Adam Witney wrote: >> >> The quickest way to do this specific example is probably to just save >>> your R commands to a file (e.g. R_commands.R) and then run some kind of >>> system/exec/backtick function in your perl script to invoke R, something >>> like (untested): >>> >>> #!/usr/bin/perl >>> use strict; >>> use warnings; >>> system( 'R --file=R_commands.R' ); >>> >>> Alternatively, if you want perl and R to be able to interact and pass >>> data back and forth, you could use something like Statistics::R >>> >>> http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm >>> >>> HTH >>> >>> adam >>> >>> On 2 Apr 2012, at 06:21, José Ramón Blas Pastor wrote: >>> >>> Hi, >>>> >>>> a very simple doubt, but I do not know how to manage this. >>>> >>>> I want to plot a histogram for all data in 'datos.txt'. >>>> >>>> a) by using R: >>>> datos<-scan("datos.txt") >>>> pdf("xh.pdf") >>>> hist(datos) >>>> dev.off() >>>> >>>> >>>> b) How could I invoke R inside Perl to do the same?? >>>> #!/usr/bin/perl >>>> open(DAT,"datos.txt"); >>>> while (<DAT>) { >>>> chomp; >>>> push(@datos,$_); >>>> } >>>> #now I want a histogram of values in @datos >>>> >>>> Thanks!! >>>> >>>> JR >>>> >>>> -- >>>> José Ramón Blas - PhD >>>> Dept. Biochemistry - Medicine School >>>> University of Castilla-La Mancha >>>> C Almansa, 14 >>>> 02006 Albacete (Spain) >>>> >>>> Phone: +34 967599200 ext.
2958

--
Yifei Huang
Department of Biology
McMaster University

From gregonomic at yahoo.co.nz Mon Apr 2 20:37:38 2012
From: gregonomic at yahoo.co.nz (Gregory Baillie)
Date: Mon, 2 Apr 2012 17:37:38 -0700 (PDT)
Subject: [Bioperl-l] data into R from Perl arrays
In-Reply-To:
References:
Message-ID: <1333413458.84973.YahooMailNeo@web112506.mail.gq1.yahoo.com>

Hi.

If you only want to plot histograms (or other kinds of graphs), and don't need to use the statistics capabilities of R, then gnuplot (http://www.gnuplot.info/) is a good option. You can construct a string of gnuplot commands and your data, and pipe that directly to gnuplot, without having to write the commands or data to separate files and then use a system call to R. Of course, you do have to learn gnuplot, which may be more trouble than it's worth (I like it, but some people don't).

Greg.

________________________________
From: José Ramón Blas Pastor
To: bioperl-l at lists.open-bio.org; Bioperl mailing list
Sent: Monday, 2 April 2012 3:21 PM
Subject: [Bioperl-l] data into R from Perl arrays

Hi,

a very simple doubt, but I do not know how to manage this.

I want to plot a histogram for all data in 'datos.txt'.
a) by using R: datos<-scan("datos.txt") pdf("xh.pdf") hist(datos) dev.off() b) How could I invoke R inside Perl to do the same?? #!/usr/bin/perl open(DAT,"datos.txt"); while (<DAT>) { chomp; push(@datos,$_); } #now I want a histogram of values in @datos Thanks!! JR -- José Ramón Blas - PhD Dept. Biochemistry - Medicine School University of Castilla-La Mancha C Almansa, 14 02006 Albacete (Spain) Phone: +34 967599200 ext. 2958 _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l
From shalabh.sharma7 at gmail.com Tue Apr 3 11:34:43 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Tue, 3 Apr 2012 11:34:43 -0400 Subject: [Bioperl-l] Downloading refseq genomes in batch Message-ID: Hi All, I am trying to download refseq genomes in batch. But instead of accession number i have genome names (=~ 500). Is there any way i can download them using some bioperl module ? Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From carandraug+dev at gmail.com Tue Apr 3 11:53:32 2012 From: carandraug+dev at gmail.com (Carnë Draug) Date: Tue, 3 Apr 2012 16:53:32 +0100 Subject: [Bioperl-l] Downloading refseq genomes in batch In-Reply-To: References: Message-ID: On 3 April 2012 16:34, shalabh sharma wrote: > Hi All, > I am trying to download refseq genomes in batch. But instead of > accession number i have genome names (=~ 500). > Is there any way i can download them using some bioperl module ? If you have their name/official symbol, then searching on the database should only return one hit, therefore one UID. Make the search, get that number, and use it for download. The EUtilities module should do that. Carnë From shalabh.sharma7 at gmail.com Tue Apr 3 14:15:16 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Tue, 3 Apr 2012 14:15:16 -0400 Subject: [Bioperl-l] Downloading refseq genomes in batch In-Reply-To: References: Message-ID: Hi Came, Thanks for your reply. I tried to get UID from genome names but i cant find on EUtilities.
I have taxa id for those genomes, can i download genomes with taxa id in batch ? Thanks Shalabh On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug wrote: > On 3 April 2012 16:34, shalabh sharma wrote: > > Hi All, > > I am trying to download refseq genomes in batch. But instead of > > accession number i have genome names (=~ 500). > > Is there any way i can download them using some bioperl module ? > > If you have their name/official symbol, then searching on the database > should only return one hit, therefore one UID. Make the search, get > that number, and use it for download. The EUtilities module should do > that. > > Carnë > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From berry at exisoft.nl Tue Apr 3 16:24:54 2012 From: berry at exisoft.nl (Berry Kriesels) Date: Tue, 03 Apr 2012 22:24:54 +0200 Subject: [Bioperl-l] Google summer of code Bio::Structure Message-ID: <4F7B5C96.5090208@exisoft.nl> Dear all, Currently I am considering applying as a student for the 'google summer of code' and would like to contribute to BioPerl via this way. At the moment I am investigating extending the BioPerl Bio::Structure library in such a way that also some protein modelling can be done or at least add a method so one could do a pdb structure quality assessment. One way is to do it with the use of online services such as for instance Prosaweb (and thus creating a wrapper for this service). Also I could make libraries which one could use to assess the phi and psi angles of certain atoms within a PDB file or the distance in angstrom among many other coordinate measurements within a protein PDB file but also among (comparison) of multiple PDB files. Also adding functions such as DOPE (Discrete Optimized Protein Energy) for model comparisons is an option. There are tons of options to add. However...
I have a few questions regarding this and hope some of you will be willing to answer: 1. As users of BioPerl would you consider extending the current Bio::Structure library as an added value or would you rather see effort made in different areas? 2. If one would see extension of the current Bio::Structure library as a useful project, what would your main interests and wishes be? Thank you for input and time. With kind regards, Berry Msc student Bio-informatics. From jovel_juan at hotmail.com Tue Apr 3 17:02:26 2012 From: jovel_juan at hotmail.com (Juan Jovel) Date: Tue, 3 Apr 2012 21:02:26 +0000 Subject: [Bioperl-l] Downloading refseq genomes in batch In-Reply-To: References: , , Message-ID: Hi Shalab You can try use Bio::DB::GenBank, but I believe the NCBI does not like people doing many remote lookups. I would advise you download the whole database you are interested in, and then you parse it locally. Cheers, Juan > Date: Tue, 3 Apr 2012 14:15:16 -0400 > From: shalabh.sharma7 at gmail.com > To: carandraug+dev at gmail.com > CC: Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Downloading refseq genomes in batch > > Hi Came, > Thanks for your reply. > I tried to get UID from genome names but i cant find on EUtilities. > I have taxa id for those genomes, can i download genomes with taxa id in > batch ? > > Thanks > Shalabh > > > On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug wrote: > > > On 3 April 2012 16:34, shalabh sharma wrote: > > > Hi All, > > > I am trying to download refseq genomes in batch. But instead of > > > accession number i have genome names (=~ 500). > > > Is there any way i can download them using some bioperl module ? > > > > If you have their name/official symbol, then searching on the database > > should only return one hit, therefore one UID. Make the search, get > > that number, and use it for download. The EUtilities module should do > > that. > > > > Carnë
> > > > > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Apr 3 17:19:07 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 3 Apr 2012 21:19:07 +0000 Subject: [Bioperl-l] Downloading refseq genomes in batch In-Reply-To: References: , , Message-ID: 500 sequences isn't too bad for a remote lookup (I have run about ~20K myself). It's much easier if you can grab them as a batch, e.g. run esearch for the IDs, use efetch with the webenv/key to grab the sequences. NCBI is more worried about the number of requests made, the length of time between requests, and the time of day requests are made. In fact, I recall updating EUtilities recently so it can use a POST, so you can grab ~2000 seqs at a time w/o having to iterate through them. chris On Apr 3, 2012, at 4:02 PM, Juan Jovel wrote: > > Hi Shalab > You can try use Bio::DB::GenBank, but I believe the NCBI does not like people doing many remote lookups. I would advise you download the whole database you are interested in, and then you parse it locally. > Cheers, Juan >> Date: Tue, 3 Apr 2012 14:15:16 -0400 >> From: shalabh.sharma7 at gmail.com >> To: carandraug+dev at gmail.com >> CC: Bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Downloading refseq genomes in batch >> >> Hi Came, >> Thanks for your reply. >> I tried to get UID from genome names but i cant find on EUtilities. >> I have taxa id for those genomes, can i download genomes with taxa id in >> batch ? >> >> Thanks >> Shalabh >> >> >> On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug wrote: >> >>> On 3 April 2012 16:34, shalabh sharma wrote: >>>> Hi All, >>>> I am trying to download refseq genomes in batch.
But instead of >>>> accession number i have genome names (=~ 500). >>>> Is there any way i can download them using some bioperl module ? >>> >>> If you have their name/official symbol, then searching on the database >>> should only return one hit, therefore one UID. Make the search, get >>> that number, and use it for download. The EUtilities module should do >>> that. >>> >>> Carnë >>> >> >> >> >> -- >> Shalabh Sharma >> Scientific Computing Professional Associate (Bioinformatics Specialist) >> Department of Marine Sciences >> University of Georgia >> Athens, GA 30602-3636 >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From shalabh.sharma7 at gmail.com Wed Apr 4 17:24:08 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Wed, 4 Apr 2012 17:24:08 -0400 Subject: [Bioperl-l] Weird efetch problem. Message-ID: Hi All, I am facing a really weird problem using efetch. I am getting different outputs if i am using different method of passing values. Like if i am using this method: #!/usr/bin/perl -w use Bio::DB::EUtilities; use Bio::SeqIO; my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', -db => 'protein', -rettype => 'fasta', -email => 'ssharmai at uga.edu', -id => '256009369'); my $file = 'genome.fasta'; $factory->get_Response(-file => $file); I am getting correct protein sequence but if i am passing values (same id) via an array i am getting nucleotide sequences.
use Bio::DB::EUtilities; use Bio::SeqIO; my @ids; my $c = 0; open(IN,"$ARGV[0]"); while(<IN>){ my $id = $_; chomp($id);chop($id); $ids[$c] = $id; print "$id\n"; $c++; } close(IN); my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', -db => 'protein', -rettype => 'fasta', -email => 'ssharmai at uga.edu', -id => \@ids); my $file = 'genome.fasta'; Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From sidd.basu at gmail.com Thu Apr 5 06:31:47 2012 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Thu, 5 Apr 2012 05:31:47 -0500 Subject: [Bioperl-l] Re: Weird efetch problem. In-Reply-To: References: Message-ID: <20120405103146.GA5544@Macintosh-388.local> On Wed, 04 Apr 2012, shalabh sharma wrote: > Hi All, > I am facing a really weird problem using efetch. I am getting > different outputs if i am using different method of passing values. > > Like if i am using this method: > > #!/usr/bin/perl -w > use Bio::DB::EUtilities; > use Bio::SeqIO; > my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', > -db => 'protein', > -rettype => 'fasta', > -email => 'ssharmai at uga.edu', > -id => '256009369'); > > my $file = 'genome.fasta'; > $factory->get_Response(-file => $file); > > I am getting correct protein sequence but if i am passing values (same id) > via an array i am getting nucleotide sequences. > > use Bio::DB::EUtilities; > use Bio::SeqIO; > my @ids; > my $c = 0; > open(IN,"$ARGV[0]"); > while(<IN>){ > my $id = $_; > chomp($id);chop($id); > $ids[$c] = $id; > print "$id\n"; > $c++; > } > close(IN); > my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', > -db => 'protein', > -rettype => 'fasta', > -email => 'ssharmai at uga.edu', > -id => \@ids); Could you send the ids here.
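One thing worth ruling out before filing a bug report (a guess, not something confirmed in this thread): the ID-reading loop above calls `chomp` and then `chop` on every line. `chomp` already removes the newline, so if the ID file has Unix line endings, `chop` then deletes the last digit of each GI, and the array version would query different IDs than the single-ID script. The sketch below uses core Perl only and strips trailing whitespace (including a Windows `\r`) without touching the ID itself. `read_ids` is a hypothetical helper name, not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): read one ID per line and
# strip surrounding whitespace, including the "\r" left by Windows line
# endings, without ever removing a real character of the ID.
sub read_ids {
    my ($fh) = @_;
    my @ids;
    while ( my $line = <$fh> ) {
        $line =~ s/\s+\z//;    # trailing "\n", "\r\n", spaces, tabs
        $line =~ s/\A\s+//;    # leading spaces or tabs
        next if $line eq '';   # skip blank lines
        push @ids, $line;
    }
    return @ids;
}

# Usage (assumes the ID file is given on the command line):
#   open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
#   my @ids = read_ids($in);
#   close $in;
#   # ...then pass -id => \@ids to Bio::DB::EUtilities->new as before.
```

Printing each cleaned ID, as the original loop already does, and comparing it with the single ID that worked (`256009369`) is a quick way to confirm or rule this out.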
-siddhartha > > my $file = 'genome.fasta'; > > Thanks > Shalabh > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Apr 5 09:07:28 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 5 Apr 2012 13:07:28 +0000 Subject: [Bioperl-l] Weird efetch problem. In-Reply-To: <20120405103146.GA5544@Macintosh-388.local> References: <20120405103146.GA5544@Macintosh-388.local> Message-ID: On Apr 5, 2012, at 5:31 AM, Siddhartha Basu wrote: > On Wed, 04 Apr 2012, shalabh sharma wrote: > >> Hi All, >> I am facing a really weird problem using efetch. I am getting >> different outputs if i am using different method of passing values. >> ... >> >> I am getting correct protein sequence but if i am passing values (same id) >> via an array i am getting nucleotide sequences. >> >> .. > Could you send the ids here. > > -siddhartha And please file a bug report on this if something is found. I do know if you use accession numbers you can sometimes get odd results. I recommend only using UIDs (the GI in the case of protein and nuc seqs). chris From shalabh.sharma7 at gmail.com Thu Apr 5 10:40:06 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 5 Apr 2012 10:40:06 -0400 Subject: [Bioperl-l] Downloading refseq genomes in batch In-Reply-To: References: Message-ID: Hi All, Thanks for all the suggestions. Thanks a lot Chris, i am using your method to pull out genomes. Its working fine. Thanks Shalabh On Tue, Apr 3, 2012 at 5:19 PM, Fields, Christopher J wrote: > 500 sequences isn't too bad for a remote lookup (I have run about ~20K > myself). It's much easier if you can grab them as a batch, e.g. 
run > esearch for the IDs, use efetch with the webenv/key to grab the sequences. > NCBI is more worried about the number of requests made, the length of time > between requests, and the time of day requests are made. > > In fact, I recall updating EUtilities recently so it can use a POST, so > you can grab ~2000 seqs at a time w/o having to iterate through them. > > chris > > On Apr 3, 2012, at 4:02 PM, Juan Jovel wrote: > > > > > Hi Shalab > > You can try use Bio::DB::GenBank, but I believe the NCBI does not like > people doing many remote lookups. I would advise you download the whole > database you are interested in, and then you parse it locally. > > Cheers, Juan > >> Date: Tue, 3 Apr 2012 14:15:16 -0400 > >> From: shalabh.sharma7 at gmail.com > >> To: carandraug+dev at gmail.com > >> CC: Bioperl-l at lists.open-bio.org > >> Subject: Re: [Bioperl-l] Downloading refseq genomes in batch > >> > >> Hi Came, > >> Thanks for your reply. > >> I tried to get UID from genome names but i cant find on EUtilities. > >> I have taxa id for those genomes, can i download genomes with taxa id in > >> batch ? > >> > >> Thanks > >> Shalabh > >> > >> > >> On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug >wrote: > >> > >>> On 3 April 2012 16:34, shalabh sharma > wrote: > >>>> Hi All, > >>>> I am trying to download refseq genomes in batch. But instead of > >>>> accession number i have genome names (=~ 500). > >>>> Is there any way i can download them using some bioperl module ? > >>> > >>> If you have their name/official symbol, then searching on the database > >>> should only return one hit, therefore one UID. Make the search, get > >>> that number, and use it for download. The EUtilities module should do > >>> that. > >>> > >>> Carnë
> >>> > >> > >> > >> > >> -- > >> Shalabh Sharma > >> Scientific Computing Professional Associate (Bioinformatics Specialist) > >> Department of Marine Sciences > >> University of Georgia > >> Athens, GA 30602-3636 > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From k.d.murray.91 at gmail.com Fri Apr 6 09:49:32 2012 From: k.d.murray.91 at gmail.com (Kevin Murray) Date: Fri, 6 Apr 2012 23:49:32 +1000 Subject: [Bioperl-l] sequence proxy server Message-ID: Hi all, I'm an undergrad student in molecular biology at the ANU in Australia, and my research projects are becoming increasingly bioinformatics heavy. The latest one has involved quite a large amount of sequence retrieval from GenBank and GenPept. The download speed to Australia from NCBI's servers is rather slow, and i've been thinking about how we can improve this. One solution would be to use Bio::DB::Flat with GenBank sequences on a local computer. However, in a situation where there are multiple people in a lab doing bioinformatics, it seems to me a bit of a waste to have the entire genbank/genpept database, or even the relevant sections thereof, on each computer. So, i though about writing a "sequence proxy" cgi script, and a corresponding module, which would work a bit like this: The user calls Bio::DB::SeqProxy::GenBank as they would Bio::DB::GenBank, with the exception that a parameter for the address of the sequence proxy server is required. 
The module then sends a request similar to that sent to NCBI's servers by calling Bio::DB::GenBank->get_Seq_by_x() to the sequence proxy server I believe all requests go to the efetch page now (please correct me if I'm wrong, i have read the relevant bioperl module code but not thoroughly), so the CGI script on the sequence proxy would take arguments in a similar fashion to make writing the client side module easier. The CGI script would use a Bio::DB::Flat database, or an interface to an SQL database to determine if the required sequence is stored locally. (as a aside, i'd like your thoughts on Bio::DB::Flat vs Bio::DB::Sql or similar) If the sequence exists locally, it would be returned to the user, either as plain text, or inside an XML container (see below). If not, it would be retrieved from the remote database using the relevant Bio::DB module, and returned. The sequence would either be returned as the relevant sequence format (which would default to GenBank format) in plain text, or as an XML document similar to: 1 ___YOUR GENBANK FILE HERE___ Local Database The aim of the xml document would be to simplify handling of server errors and allow for the specification of other metadata such as which database the sequence came from. Firstly, I'd like to know if this sounds feasible, and if so, if someone is already working on something similar? I don't want to reinvent the wheel. Secondly, I'd like to ask for your comments and advice. Being reasonably new to bioperl (started using bioperl about 6 months ago, but I've been coding in various languages for 8 years) I don't expect to have considered things that may seem obvious to a more experienced bioperl-er, so please be as brutally constructive in your criticism as you see fit =]. I know this is alot of questions, so thanks in advance for your help. Cheers, and a happy Easter to those who celebrate it. 
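The server-side lookup described above (answer from the local store when the record is there, otherwise fetch it from the remote database and return it) can be sketched in plain Perl to show the control flow. Everything here is a stand-in: `SeqCache`, `new`, and `get` are hypothetical names, a hash plays the role of the Bio::DB::Flat or SQL store, and a code reference plays the role of the remote Bio::DB::GenBank call. Keeping a copy of remotely fetched records is an assumption about how such a proxy would behave; the message does not spell it out.

```perl
package SeqCache;    # hypothetical name, not a BioPerl module
use strict;
use warnings;

# store  => hash reference used as the local database (id => record text)
# remote => code reference that fetches a record by id, or returns undef
sub new {
    my ( $class, %args ) = @_;
    my $self = {
        store  => $args{store}  || {},
        remote => $args{remote} || sub { return },
    };
    return bless $self, $class;
}

# Returns ($record, $source) with $source set to 'local' or 'remote',
# or an empty list when neither the store nor the remote has the id.
sub get {
    my ( $self, $id ) = @_;
    if ( exists $self->{store}{$id} ) {
        return ( $self->{store}{$id}, 'local' );
    }
    my $record = $self->{remote}->($id);
    return unless defined $record;
    $self->{store}{$id} = $record;    # keep a copy for later requests
    return ( $record, 'remote' );
}

package main;

# Usage sketch (fetch_from_ncbi is a hypothetical remote lookup):
#   my $cache = SeqCache->new( remote => sub { fetch_from_ncbi( $_[0] ) } );
#   my ( $gb_text, $source ) = $cache->get('some_accession');
```

The second return value corresponds to the "which database the sequence came from" metadata proposed for the XML wrapper, and an empty return is the case where the server would report an error to the client.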
Regards Kevin Murray From shalabh.sharma7 at gmail.com Fri Apr 6 10:52:30 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 6 Apr 2012 10:52:30 -0400 Subject: [Bioperl-l] Question about EUtils esearch Message-ID: Hi All, I am trying to get all the UIDs for few genomes. For example: for 'Homo sapiens" if i use esearch i get "3277310" UIDs but if i go to NCBI genome page i see there are only =~ 32,00 proteins. http://www.ncbi.nlm.nih.gov/genome?term=Homo%20sapiens I have done this for lot of genomes and i am afraid that i have to do this again. Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From shalabh.sharma7 at gmail.com Fri Apr 6 14:27:29 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 6 Apr 2012 14:27:29 -0400 Subject: [Bioperl-l] Downloading refseq genomes in batch In-Reply-To: References: Message-ID: Hi Chris, I am using the method you suggested. But i have a question. The UIDs that i am searching using "esearch" are not same as the number of proteins in that genome. For Example: for 'Homo sapiens" if i use esearch i get "3277310" UIDs but if i go to NCBI genome page i see there are only =~ 32,00 proteins. http://www.ncbi.nlm.nih.gov/genome?term=Homo%20sapiens Thanks Shalabh On Thu, Apr 5, 2012 at 10:40 AM, shalabh sharma wrote: > Hi All, > Thanks for all the suggestions. > Thanks a lot Chris, i am using your method to pull out genomes. Its > working fine. > > Thanks > Shalabh > > > On Tue, Apr 3, 2012 at 5:19 PM, Fields, Christopher J < > cjfields at illinois.edu> wrote: > >> 500 sequences isn't too bad for a remote lookup (I have run about ~20K >> myself). It's much easier if you can grab them as a batch, e.g. run >> esearch for the IDs, use efetch with the webenv/key to grab the sequences. 
>> NCBI is more worried about the number of requests made, the length of time >> between requests, and the time of day requests are made. >> >> In fact, I recall updating EUtilities recently so it can use a POST, so >> you can grab ~2000 seqs at a time w/o having to iterate through them. >> >> chris >> >> On Apr 3, 2012, at 4:02 PM, Juan Jovel wrote: >> >> > >> > Hi Shalab >> > You can try use Bio::DB::GenBank, but I believe the NCBI does not like >> people doing many remote lookups. I would advise you download the whole >> database you are interested in, and then you parse it locally. >> > Cheers, Juan >> >> Date: Tue, 3 Apr 2012 14:15:16 -0400 >> >> From: shalabh.sharma7 at gmail.com >> >> To: carandraug+dev at gmail.com >> >> CC: Bioperl-l at lists.open-bio.org >> >> Subject: Re: [Bioperl-l] Downloading refseq genomes in batch >> >> >> >> Hi Came, >> >> Thanks for your reply. >> >> I tried to get UID from genome names but i cant find on EUtilities. >> >> I have taxa id for those genomes, can i download genomes with taxa id >> in >> >> batch ? >> >> >> >> Thanks >> >> Shalabh >> >> >> >> >> >> On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug > >wrote: >> >> >> >>> On 3 April 2012 16:34, shalabh sharma >> wrote: >> >>>> Hi All, >> >>>> I am trying to download refseq genomes in batch. But instead >> of >> >>>> accession number i have genome names (=~ 500). >> >>>> Is there any way i can download them using some bioperl module ? >> >>> >> >>> If you have their name/official symbol, then searching on the database >> >>> should only return one hit, therefore one UID. Make the search, get >> >>> that number, and use it for download. The EUtilities module should do >> >>> that. >> >>> >> >>> Carnë
>> >>> >> >> >> >> >> >> >> >> -- >> >> Shalabh Sharma >> >> Scientific Computing Professional Associate (Bioinformatics Specialist) >> >> Department of Marine Sciences >> >> University of Georgia >> >> Athens, GA 30602-3636 >> >> >> >> _______________________________________________ >> >> Bioperl-l mailing list >> >> Bioperl-l at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From cjfields at illinois.edu Fri Apr 6 15:09:23 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 6 Apr 2012 19:09:23 +0000 Subject: [Bioperl-l] Question about EUtils esearch In-Reply-To: References: Message-ID: Shalabh, You should try getting the specific genome project ID of interest, linking to the proteins, and then grab those. The EUtilities cookbook has a few examples on how to do that. chris On Apr 6, 2012, at 9:52 AM, shalabh sharma wrote: > Hi All, > I am trying to get all the UIDs for few genomes. > For example: > for 'Homo sapiens" if i use esearch i get "3277310" UIDs but if i go to > NCBI genome page i see there are only =~ 32,00 proteins. > http://www.ncbi.nlm.nih.gov/genome?term=Homo%20sapiens > > I have done this for lot of genomes and i am afraid that i have to do this > again. 
> > Thanks > Shalabh > > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From wrp at virginia.edu Sat Apr 7 16:56:16 2012 From: wrp at virginia.edu (William Pearson) Date: Sat, 7 Apr 2012 16:56:16 -0400 Subject: [Bioperl-l] Bioperl-l Digest, Vol 108, Issue 7 In-Reply-To: References: Message-ID: To get the UIDs (GIs) that you want, search for human[organism] AND srcdb_refseq[Properties] This will get you the refseq proteins you want. Bill Pearson > Message: 1 > Date: Fri, 6 Apr 2012 14:27:29 -0400 > From: shalabh sharma > Subject: Re: [Bioperl-l] Downloading refseq genomes in batch > > Hi Chris, > I am using the method you suggested. > But i have a question. The UIDs that i am searching using "esearch" are not > same as the number of proteins in that genome. > > For Example: > for 'Homo sapiens" if i use esearch i get "3277310" UIDs but if i go to > NCBI genome page i see there are only =~ 32,00 proteins. > http://www.ncbi.nlm.nih.gov/genome?term=Homo%20sapiens > > Thanks > Shalabh > From joel.klein at wur.nl Sun Apr 8 19:35:18 2012 From: joel.klein at wur.nl (Bradyjoel) Date: Sun, 8 Apr 2012 16:35:18 -0700 (PDT) Subject: [Bioperl-l] Need some help with : Error "Use of uninitialized value" Message-ID: <33653318.post@talk.nabble.com> Hi all, I have little experiences in programming with Perl/Bioperl. I'm currently working on a script that takes a whole genome from a bacteria as input, converts it into a multiple fasta file containing all the open reading frames and blast it against a multiple protein fasta file with know proteins. 
When I get a hit I want to combine the header of the known protein with the orf sequence, here it gives an error when I try to go through the orf file and extract the right corresponding sequence. The error it gives is : Use of uninitialized value $seq in print at blastscript.pl line .. Is there someone who has an idea what caused this error, and can help me with solving it? Regards, Joel (I put my script in the attachment) http://old.nabble.com/file/p33653318/blastscript.pl blastscript.pl -- View this message in context: http://old.nabble.com/Need-some-help-with-%3A-Error-%22Use-of-uninitialized-value%22-tp33653318p33653318.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From afia.hayati at gmail.com Thu Apr 5 00:52:01 2012 From: afia.hayati at gmail.com (afia hayati) Date: Thu, 5 Apr 2012 13:52:01 +0900 Subject: [Bioperl-l] Proposal of GSoC 2012 >> ACEtoSAM and SAMtoACE Message-ID: Dear all, I am afia, a PhD student in Bioinformatics. I am so interested to participate GSoC and would like to apply for project Bio::Assembly, Ace2Sam and Sam2Ace converter. I have written a proposal based on the guidance for prospective GSoC student. I paste my proposal in here. If you have time, please give me suggestions. Thank you very much. Sincerely, Afiahayati [~Be Passion, Patient and Persistent~] *Google Summer of Code 2012* *Proposal* *ACEtoSAM and SAMtoACE* 1. *Contact information * 1. Full name :Afiahayati 2. Address : Hiyoshi International House (Room C301-1), 223-0061 Yokohama-shi Kouhoku-ku, Hiyoshi 2-27 Kanagawa, Japan 3. Email : afia.hayati at gmail.com 4. Phone number : 818044637237 5. IRC nick : afia 2. *Motivation to join this project * I am a PhD student in bioinformatics. My research is in genome assembly, especially metagenome assembly. I have same idea that the converter from ACEtoSAM and vice versa is very useful. I am familiar with Perl and BioPerl, so there is no reason for not participating in this project 3.
*Programming experience and skills * 1. Perl also BioPerl since January 2010 2. R, since January 2008 3. Oracle, since January 2008 4. Biojava, since January 2007 5. PHP , since January 2006 6. C++, since January 2006 7. Java, since January 2006 8. MySQL, since January 2005 9. C , since January 2005 4. *Open source projects involved with * 1. Metagenome Assembly, 2012 (with supervisor) Develop de novo assembler for metagenomic data from short sequence reads Using C, C++ and Perl 2. Develop some interfaces in RCommander, 2010 (in team) 3. Computer system of academic hospital, 2009 (in team) By modifying an open source hospital information system, Care2x Using PHP, Java script and HTML 4. Academic data warehouse and data mining, 2008 (in team) Using Pentaho Business Analytics and R programming language 5. *Project Plan * 1. *Before April 23 * 1. Study the format of SAM and ACE more detail 2. Study the biodesign related to module Bio::Assembly::IO especially Bio::Assembly::IO::ACE and Bio::Assembly::IO::SAM 3. Study the documentation and the code of module Bio::Assembly::IO::ACE and Bio::Assembly::IO::SAM 2. *April 23 - May 20 (before official coding period) * 1. To do self coding for Bio::Assembly::IO::ACE and Bio::Assembly::IO::SAM to improve my understand. 2. Keep contact with my mentor and the BioPerl community. I will active in mailing list and IRC to confirm my understanding about Bio::Assembly::IO::ACE and Bio::Assembly::IO::SAM and also discuss the operations (the methods) needed for a module ACEtoSAM and SAMtoACE converting. 3. With the supervision from my mentor, try to determine the appropriate design of module ACEtoSAM and SAMtoACE converting. 3. *May 21 - June 21 * 1. Determine the final design of module ACEtoSAM and SAMtoACE converting. 2. Code the module ACEtoSAM and SAMtoACE converting 3. Test my code by myself 4. Discuss with my mentor to design good test 5. Test my code based on the test design 4. *June 22 - July 8 * 1. 
Discuss with my mentor about my code in order to publish in bioperl community 2. Publish my code to the community and learn the feedback *JULY 9 MID TERM EVALUATION * 5. *July 9 - August 5 * 1. Improving the code (do iteration activities) : 1. Keep contact with the community, learn the feedback 2. Make changes in the code, with the supervision from my mentor 3. Test the code and publish the code to the community 2. Finalize the code 3. Start writing the POD documentation 6. *August 6 - August 13 * For final documentation *A buffer of a week for unpredicted delay * *AUGUST 20 FINAL EVALUATION* From heath.obrien at gmail.com Tue Apr 3 12:56:31 2012 From: heath.obrien at gmail.com (Heath O'Brien) Date: Tue, 3 Apr 2012 16:56:31 +0000 (UTC) Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) Message-ID: Hi All, I've encountered a bug in the trunc_with_features function in SeqUtils.pm, or at least behavior that was unexpected to me: Features with fuzzy coordinates in the original sequence are converted to exact coordinates in the truncated sequence. For example, the script below changes the coordinates for the feature from <1..5 to 1..5. I have modified the code to change this behavior on my system, but I thought I'd post something here in case others encounter the same problem. all good things, Heath #!/usr/bin/perl -w use strict; use warnings; use Bio::SeqIO; use Bio::SeqUtils; my $infile= shift; my $inIO = Bio::SeqIO->new('-file' => $infile, '-format' => 'genbank') or die "could not open seq file $infile\n"; my $outfile = $infile .
'_out.gbk'; my $outIO = Bio::SeqIO->new('-file' => ">$outfile", '-format' => 'genbank') or die "could not open seq file $outfile\n"; my $in_seq = $inIO->next_seq; my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); $outIO->write_seq($out_seq); exit; LOCUS test_sequence 57303 bp DNA linear UNA DEFINITION Sequence to demonstrate unexpected behavior of trunc_with_features ACCESSION unknown KEYWORDS . FEATURES Location/Qualifiers source 1..10 /mol_type="genomic DNA" gene <1..5 /gene="test" CDS <1..5 /product="hypothetical protein" ORIGIN 1 caagattaaa // From mkhalfan at cshl.edu Thu Apr 5 15:29:35 2012 From: mkhalfan at cshl.edu (Khalfan, Mohammed) Date: Thu, 5 Apr 2012 19:29:35 +0000 Subject: [Bioperl-l] Bio::AlignIO->add_new ORDER? Message-ID: <7A8AC28B954CFA428BD67E1376B889A21F0077FC@EX-HS-MBX03.cshl.edu> Hi, I am having a problem trying to add a new sequence to an alignment using the order parameter. I would like to add the sequence to the first position (row) in the alignment, is this possible? Here is what my code looks like: use Bio::AlignIO; use Bio::LocatableSeq; use Bio::SimpleAlign; my $in = Bio::AlignIO->new( -file => $infile ); # $infile is the output from muscle my $aln = $in->next_aln; # build a consensus from the current alignment my $consensus = $aln->consensus_string(); # make the consensus sequence obtained in the above step into a LocatableSeq object my $consensus_obj = new Bio::LocatableSeq ( -seq => $consensus, -id => 'Consensus', -start => 1, -end => length($consensus), ); # add consensus sequence to alignment $aln->add_seq($consensus_obj, 1); ## END CODE ## I have tried $aln->add_seq(seq=>$consensus_obj, order=1); $aln->add_seq(SEQ=>$consensus_obj, ORDER=1); But no luck, I cannot get the consensus listed as the first sequence in the alignment. Is this possible? I can add it in like this successfully, but it adds it to the end, which is not what I need. 
$aln->add_seq($consensus_obj);

These are the errors I get:

Using this syntax: $aln->add_seq($consensus_obj, 1);
I get this error:
Use of uninitialized value in subtraction (-) at /usr/lib/perl5/site_perl/5.8.8/Bio/SimpleAlign.pm line 327, line 132.

Using this syntax: $aln->add_seq(seq=>$consensus_obj, order=1);
I get this error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Unable to process non locatable sequences []
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::SimpleAlign::add_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SimpleAlign.pm:297
STACK: ./muscle_post_processor.pl:49
-----------------------------------------------------------

Any assistance would be much appreciated. Thank you.
From jason.stajich at gmail.com Mon Apr 9 15:52:43 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 9 Apr 2012 14:52:43 -0500 Subject: [Bioperl-l] Need some help with : Error "Use of uninitialized value" In-Reply-To: <33653318.post@talk.nabble.com> References: <33653318.post@talk.nabble.com> Message-ID: <74341B2D-5EC2-4421-B66B-F0193CA4FB52@gmail.com>
You really want to create sequence object(s) and pass these into the BLAST factory. I also can't figure out why you are manually parsing the EMBL file and then using SeqIO later. Why not use SeqIO to parse the EMBL/GenBank file?

You also don't report the line number of your current problem, but one can surmise it is here:

my $seq = $db->seq($id);
print $seq,"\n";

The error indicates you are looking up a sequence ID that doesn't exist, since you get an undefined sequence. I would suggest printing out the ID you are asking for to make sure it is correct.
Typically we protect these queries like:

if( my $seqstr = $db->seq($id) ) {
    print $seqstr, "\n";
} else {
    warn "cannot find $id in sequence db file\n";
}

I think you have not really structured your logic well enough in that loop - you only want to build the Bio::DB::Fasta object once; the whole point is to index once and then query it multiple times.

You might consider starting with this code, which does a lot of what you are trying to do to extract annotated features:
https://github.com/bioperl/bioperl-live/blob/master/scripts/seq/bp_extract_feature_seq.pl

I think you are also using tr wrong - if you want to replace a string with an empty string you should use s///, and you also need to escape the | character in the pattern, since it has a special meaning in a regular expression.

I guess in your case you just want the sequence - you would use Bio::SeqIO to read in your sequence and then write it back out as FASTA to give to getorf. I don't know if we have a wrapper for EMBOSS's getorf.

There are probably a lot more things that need some attention, but you should start with these.

Jason

On Apr 8, 2012, at 6:35 PM, Bradyjoel wrote: > > Hi all, > > I have little experiences in programming with Perl/Bioperl. I'm currently > working on a script that takes a whole genome from a bacteria as input, > converts it into a multiple fasta file containing all the open reading > frames and blast it against a multiple protein fasta file with know > proteins. When I get a hit I want to combine the header of the known protein > with the orf sequence, here it gives an error when I try to go through the > orf file and extract the right corresponding sequence. The error it gives is > : Use of uninitialized value $seq in print at blastscript.pl line .. > Is there someone who has an idea what caused this error, and can help me > with solving it?
> > Regards, Joel (I put my script in the attachment) > http://old.nabble.com/file/p33653318/blastscript.pl blastscript.pl > -- > View this message in context: http://old.nabble.com/Need-some-help-with-%3A-Error-%22Use-of-uninitialized-value%22-tp33653318p33653318.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org
From jason.stajich at gmail.com Mon Apr 9 15:57:52 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 9 Apr 2012 14:57:52 -0500 Subject: [Bioperl-l] Bio::AlignIO->add_new ORDER? In-Reply-To: <7A8AC28B954CFA428BD67E1376B889A21F0077FC@EX-HS-MBX03.cshl.edu> References: <7A8AC28B954CFA428BD67E1376B889A21F0077FC@EX-HS-MBX03.cshl.edu> Message-ID:
You cannot use order=1; it would have to be order => 1, as you are passing in a hash (key/value pairs), not an assignment. However, I think the rearrange function that parses arguments prefers a leading '-', so it should be -order => 1. Same thing for -seq => $seq, not seq => $seq.

Did you try using exactly what is in the perldoc?

 Title    : add_seq
 Usage    : $myalign->add_seq($newseq);
            $myalign->add_seq(-SEQ=>$newseq, -ORDER=>5);
 Function : Adds another sequence to the alignment. *Does not* align
            it - just adds it to the hashes.
            If -ORDER is specified, the sequence is inserted at the
            the position spec'd by -ORDER, and existing sequences
            are pushed down the storage array.
 Returns  : nothing
 Args     : A Bio::LocatableSeq object
            Positive integer for the sequence position (optional)

Also - I am not sure what version of the code you are using; the line in the error you report does not match the current code, so you may have to print out what is on those lines or consider upgrading to the latest version of the code.
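A minimal, self-contained sketch (plain Perl, no BioPerl needed) of why the `-KEY => value` form matters: named arguments arrive as a flat list that the parser walks two elements at a time, so each key must be followed by its value. `parse_named_args` and `add_seq_sketch` are hypothetical helpers, loosely modelled on the leading-dash argument handling Jason describes. They are not the real `Bio::SimpleAlign::add_seq`. The sketch also assumes 1-based positions. Check the add_seq documentation for your BioPerl version to see how -ORDER is actually counted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, loosely modelled on BioPerl-style argument
# parsing: accepts (-KEY => value, ...) pairs in any case and returns
# the values for the requested keys, in the requested order.
sub parse_named_args {
    my ($wanted, @args) = @_;

    # Positional call (first argument has no leading '-'):
    # hand the arguments back unchanged.
    return @args unless defined $args[0] && $args[0] =~ /^-/;

    my %given;
    while (my ($key, $value) = splice @args, 0, 2) {
        (my $name = uc $key) =~ s/^-//;    # "-order" -> "ORDER"
        $given{$name} = $value;
    }
    return @given{ map { uc($_) } @$wanted };
}

# Hypothetical stand-in for add_seq on a plain array of row names:
# insert at 1-based position ORDER (assumption), or append when
# ORDER is absent.
sub add_seq_sketch {
    my ($rows, @args) = @_;
    my ($seq, $order) = parse_named_args([qw(SEQ ORDER)], @args);
    die "no sequence given\n" unless defined $seq;

    if (defined $order) {
        splice @$rows, $order - 1, 0, $seq;    # later rows move down
    }
    else {
        push @$rows, $seq;
    }
    return;
}

my @rows = qw(seqA seqB seqC);
add_seq_sketch(\@rows, -SEQ => 'Consensus', -ORDER => 1);
print "@rows\n";    # Consensus seqA seqB seqC
```

Because the list is consumed as key/value pairs, `-ORDER => 1` works, while an assignment such as `order=1` does not supply a key/value pair for the parser to find.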
On Apr 5, 2012, at 2:29 PM, Khalfan, Mohammed wrote: > Hi, > > I am having a problem trying to add a new sequence to an alignment using the order parameter. > > I would like to add the sequence to the first position (row) in the alignment, is this possible? Here is what my code looks like: > > use Bio::AlignIO; > use Bio::LocatableSeq; > use Bio::SimpleAlign; > > my $in = Bio::AlignIO->new( -file => $infile ); # $infile is the output from muscle > > my $aln = $in->next_aln; > > # build a consensus from the current alignment > my $consensus = $aln->consensus_string(); > > # make the consensus sequence obtained in the above step into a LocatableSeq object > my $consensus_obj = new Bio::LocatableSeq ( > -seq => $consensus, > -id => 'Consensus', > -start => 1, > -end => length($consensus), > ); > > # add consensus sequence to alignment > $aln->add_seq($consensus_obj, 1); > > ## END CODE ## > > I have tried > $aln->add_seq(seq=>$consensus_obj, order=1); > $aln->add_seq(SEQ=>$consensus_obj, ORDER=1); > > But no luck, I cannot get the consensus listed as the first sequence in the alignment. Is this possible? > > I can add it in like this successfully, but it adds it to the end, which is not what I need. > $aln->add_seq($consensus_obj); > > These are the errors I get: > > Using this syntax: $aln->add_seq($consensus_obj, 1); > I get this error: > Use of uninitialized value in subtraction (-) at /usr/lib/perl5/site_perl/5.8.8/Bio/SimpleAlign.pm line 327, line 132. 
> > Using this syntax: $aln->add_seq(seq=>$consensus_obj, order=1); > I get this error: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Unable to process non locatable sequences [] > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::SimpleAlign::add_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SimpleAlign.pm:297 > STACK: ./muscle_post_processor.pl:49 > ----------------------------------------------------------- > > Any assistance would be much appreciated. Thank you. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From heath.obrien at gmail.com Mon Apr 9 17:37:56 2012 From: heath.obrien at gmail.com (Heath O'Brien) Date: Mon, 9 Apr 2012 17:37:56 -0400 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <4F8352DB.6060106@sanger.ac.uk> References: <4F8352DB.6060106@sanger.ac.uk> Message-ID: <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> Hi Frank, I just tried it with the latest version from bioperl-live, and it worked the way I described in my email. all good things, Heath On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: > Hi Heath, > > I have recently worked a bit on that module and contributed the code > to bioperl-live. I think this behaviour may already have changed but > I'm not 100% sure at the moment. When I have some time I will review > the code to confirm. In the meantime, you could give it a go with > the bioperl-live version if that's an option for you? 
> > Cheers, > > Frank > > > On 03/04/12 17:56, Heath O'Brien wrote: >> Hi All, >> >> I've encountered a bug in the trunc_with_features function in >> SeqUtils.pm, or at >> least behavior that was unexpected to me: >> >> Features with fuzzy coordinates in the original sequence are >> converted to exact >> coordinates in the truncated sequence. For example, the script >> below changes the >> coordinates for the feature from<1..5 to 1..5. >> >> I have modified the code to change this behavior on my system, but >> I thought I'd >> post something here in case others encounter the same problem. >> >> all good things, >> Heath >> >> >> >> #!/usr/bin/perl -w >> >> use strict; >> use warnings; >> use Bio::SeqIO; >> use Bio::SeqUtils; >> >> my $infile= shift; >> >> my $inIO = Bio::SeqIO->new('-file' => $infile, >> '-format' => 'genbank') or die "could not open seq file >> $infile\n"; >> >> my $outfile = $infile . '_out.gbk'; >> >> my $outIO = Bio::SeqIO->new('-file' => ">$outfile", >> '-format' => 'genbank') or die "could not open seq file >> $outfile\n"; >> >> my $in_seq = $inIO->next_seq; >> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >> $outIO->write_seq($out_seq); >> exit; >> >> >> LOCUS test_sequence 57303 bp DNA linear UNA >> DEFINITION Sequence to demonstrate unexpected behavior of >> trunc_with_features >> ACCESSION unknown >> KEYWORDS . >> FEATURES Location/Qualifiers >> source 1..10 >> /mol_type="genomic DNA" >> gene<1..5 >> /gene="test" >> CDS<1..5 >> /product="hypothetical protein" >> ORIGIN >> 1 caagattaaa >> // >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. 
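Here is a small, self-contained sketch of the behaviour Heath expected. When a feature location is shifted into the coordinates of a truncated sequence, its `<`/`>` fuzziness markers should survive. `trunc_location` is a hypothetical helper that handles only simple `start..end` location strings. It is not BioPerl's implementation, which works on location objects such as Bio::Location::Fuzzy. The rule that an end clipped by the truncation window becomes fuzzy is an assumption made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): shift a simple GenBank-style
# location such as "<1..5" into the coordinates of a subsequence spanning
# $win_start..$win_end (1-based, inclusive), keeping any "<" / ">"
# fuzziness markers from the original location.
sub trunc_location {
    my ($loc, $win_start, $win_end) = @_;
    my ($lt, $s, $gt, $e) = $loc =~ /^(<?)(\d+)\.\.(>?)(\d+)$/
        or die "unsupported location '$loc'\n";

    # Feature lies completely outside the window: nothing to keep.
    return if $e < $win_start || $s > $win_end;

    # Assumed policy: an end clipped by the window becomes fuzzy,
    # because the feature really extends beyond the new sequence.
    if ($s < $win_start) { $s = $win_start; $lt = '<' }
    if ($e > $win_end)   { $e = $win_end;   $gt = '>' }

    my $offset = $win_start - 1;
    return sprintf '%s%d..%s%d', $lt, $s - $offset, $gt, $e - $offset;
}

print trunc_location('<1..5', 1, 5), "\n";    # <1..5 (marker preserved)
print trunc_location('3..8',  2, 6), "\n";    # 2..>5 (clipped end)
```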
From fs5 at sanger.ac.uk Mon Apr 9 17:21:31 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Mon, 09 Apr 2012 22:21:31 +0100 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: References: Message-ID: <4F8352DB.6060106@sanger.ac.uk> Hi Heath, I have recently worked a bit on that module and contributed the code to bioperl-live. I think this behaviour may already have changed but I'm not 100% sure at the moment. When I have some time I will review the code to confirm. In the meantime, you could give it a go with the bioperl-live version if that's an option for you? Cheers, Frank On 03/04/12 17:56, Heath O'Brien wrote: > Hi All, > > I've encountered a bug in the trunc_with_features function in SeqUtils.pm, or at > least behavior that was unexpected to me: > > Features with fuzzy coordinates in the original sequence are converted to exact > coordinates in the truncated sequence. For example, the script below changes the > coordinates for the feature from<1..5 to 1..5. > > I have modified the code to change this behavior on my system, but I thought I'd > post something here in case others encounter the same problem. > > all good things, > Heath > > > > #!/usr/bin/perl -w > > use strict; > use warnings; > use Bio::SeqIO; > use Bio::SeqUtils; > > my $infile= shift; > > my $inIO = Bio::SeqIO->new('-file' => $infile, > '-format' => 'genbank') or die "could not open seq file $infile\n"; > > my $outfile = $infile . '_out.gbk'; > > my $outIO = Bio::SeqIO->new('-file' => ">$outfile", > '-format' => 'genbank') or die "could not open seq file $outfile\n"; > > my $in_seq = $inIO->next_seq; > my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); > $outIO->write_seq($out_seq); > exit; > > > LOCUS test_sequence 57303 bp DNA linear UNA > DEFINITION Sequence to demonstrate unexpected behavior of trunc_with_features > ACCESSION unknown > KEYWORDS . 
> FEATURES Location/Qualifiers > source 1..10 > /mol_type="genomic DNA" > gene<1..5 > /gene="test" > CDS<1..5 > /product="hypothetical protein" > ORIGIN > 1 caagattaaa > // > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From longbow0 at gmail.com Tue Apr 10 00:40:16 2012 From: longbow0 at gmail.com (longbow leo) Date: Mon, 9 Apr 2012 23:40:16 -0500 Subject: [Bioperl-l] Strange behavior of $node->height for scientific notation format branch length Message-ID: Hi all, I have encountered a strange behavior while calculating the tree height at root node. If the branch length of the tree was in scientific notation format, such as MrBayes created trees, it is unable to give correct results. For example, Tree 1: (((A:0.02,B:0.025):0.12,C:0.071):0.34,D:0.6); Tree 2: (((A:2e-2,B:2.5e-2):1.2e-1,C:7.1e-2):3.4e-1,D:6e-1); These two trees are identical besides the expression of branch length. The Perl script: # ============================================================ #!/usr/bin/perl use 5.010; use strict; use warnings; use Bio::TreeIO; my $usage = << "EOS"; Display branch lengths for leave nodes. Usage: t_branchlen.pl [] Params: : Tree file. : Tree format. Optional. Default "newick". 
EOS my ($ftre, $fmt) = @ARGV; die $usage unless ( defined $ftre ); $fmt = 'newick' unless ( defined $fmt); my $o_treei = Bio::TreeIO->new( -file => $ftre, -format => $fmt, ); my $o_tree = $o_treei->next_tree; my @o_leaves = $o_tree->get_leaf_nodes(); say join("\t", ("Node", "Branch Length", "Depth")); for my $o_node ( @o_leaves ) { say $o_node->id, "\t", $o_node->branch_length, "\t", $o_node->depth; } my $o_root = $o_tree->get_root_node; # say; say "Root height:\t", $o_root->height; exit 0; # ============================================================ For tree 1, the output is: Node Branch Length Depth A 0.02 0.48 B 0.025 0.485 C 0.071 0.411 D 0.6 0.6 *Root height: 0.6* For tree 2, Node Branch Length Depth A 2e-2 0.48 B 2.5e-2 0.485 C 7.1e-2 0.411 D 6e-1 0.6 *Root height: 3* The interesting thing is, the node depth values are correct, but I have no idea how the root height calculated. Are there any ideas to resolve this problem? Thanks! Haizhou From jason.stajich at gmail.com Tue Apr 10 02:33:00 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 9 Apr 2012 23:33:00 -0700 Subject: [Bioperl-l] Strange behavior of $node->height for scientific notation format branch length In-Reply-To: References: Message-ID: <1839F94F-178E-44F2-8A5C-6E2657AAD59C@gmail.com> It also looks like there is some code in calculating height that only processes numbers that are floating point - see line 64. I am not sure why this is in there, but I guess it was a protection from something that was failing in some other situation. 
62: foreach my $subnode ( $self->each_Descendent ) {
63:     my $bl = $subnode->branch_length;
64:     $bl = 1 unless (defined $bl && $bl =~ /^\-?\d+(\.\d+)?$/);
65:     my $s = $subnode->height + $bl;

A branch length written as 2e-2 does not match the pattern on line 64, so it is replaced by 1. In your second tree every branch is therefore counted as length 1, and the root height becomes the number of edges on the longest root-to-tip path (3 here).

You can work around this by first rewriting all your branch lengths in plain decimal notation after you read the tree in (the root may have no branch length, hence the defined check):

for my $node ( $tree->get_all_nodes ) {
    my $bl = $node->branch_length;
    $node->branch_length( sprintf("%f", $bl) ) if defined $bl;
}

Note that %f keeps only six decimal places; use a wider format such as %.10f if your tree has very short branches.

We should think about how we might handle scientific notation branch lengths properly in the code in the future if someone wants to take this on.

Jason

> Hi all, > > I have encountered a strange behavior while calculating the tree height at > root node. > > If the branch length of the tree was in scientific notation format, such as > MrBayes created trees, it is unable to give correct results. > > For example, > > Tree 1: > > (((A:0.02,B:0.025):0.12,C:0.071):0.34,D:0.6); > > Tree 2: > > (((A:2e-2,B:2.5e-2):1.2e-1,C:7.1e-2):3.4e-1,D:6e-1); > > These two trees are identical besides the expression of branch length. > > The Perl script: > > # ============================================================ > > #!/usr/bin/perl > > use 5.010; > use strict; > use warnings; > > use Bio::TreeIO; > > my $usage = << "EOS"; > Display branch lengths for leave nodes. > Usage: > t_branchlen.pl [] > Params: > : Tree file. > : Tree format. Optional. Default "newick".
> EOS > > my ($ftre, $fmt) = @ARGV; > > die $usage unless ( defined $ftre ); > > $fmt = 'newick' unless ( defined $fmt); > > my $o_treei = Bio::TreeIO->new( > -file => $ftre, > -format => $fmt, > ); > > my $o_tree = $o_treei->next_tree; > > my @o_leaves = $o_tree->get_leaf_nodes(); > > say join("\t", ("Node", "Branch Length", "Depth")); > > for my $o_node ( @o_leaves ) { > say $o_node->id, "\t", $o_node->branch_length, "\t", $o_node->depth; > } > > my $o_root = $o_tree->get_root_node; > > # say; > > say "Root height:\t", $o_root->height; > > exit 0; > > # ============================================================ > > For tree 1, the output is: > > Node Branch Length Depth > A 0.02 0.48 > B 0.025 0.485 > C 0.071 0.411 > D 0.6 0.6 > *Root height: 0.6* > > For tree 2, > > Node Branch Length Depth > A 2e-2 0.48 > B 2.5e-2 0.485 > C 7.1e-2 0.411 > D 6e-1 0.6 > *Root height: 3* > > The interesting thing is, the node depth values are correct, but I have no > idea how the root height calculated. > > Are there any ideas to resolve this problem? > > Thanks! > > Haizhou > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From fs5 at sanger.ac.uk Tue Apr 10 04:42:54 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Tue, 10 Apr 2012 09:42:54 +0100 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> Message-ID: <4F83F28E.4080000@sanger.ac.uk> Hi Heath, Yes, I just had a look too and it's true that it would currently ignore the original type. I had added some new methods (delete, insert, ligate) and with those the location type is preserved but not with the already existing methods like trunc_with_features. 
I will look into it when I have some time and make some changes. Cheers, Frank On 09/04/12 22:37, Heath O'Brien wrote: > Hi Frank, > > I just tried it with the latest version from bioperl-live, and it worked > the way I described in my email. > > all good things, > Heath > > > On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: > >> Hi Heath, >> >> I have recently worked a bit on that module and contributed the code >> to bioperl-live. I think this behaviour may already have changed but >> I'm not 100% sure at the moment. When I have some time I will review >> the code to confirm. In the meantime, you could give it a go with the >> bioperl-live version if that's an option for you? >> >> Cheers, >> >> Frank >> >> >> On 03/04/12 17:56, Heath O'Brien wrote: >>> Hi All, >>> >>> I've encountered a bug in the trunc_with_features function in >>> SeqUtils.pm, or at >>> least behavior that was unexpected to me: >>> >>> Features with fuzzy coordinates in the original sequence are >>> converted to exact >>> coordinates in the truncated sequence. For example, the script below >>> changes the >>> coordinates for the feature from<1..5 to 1..5. >>> >>> I have modified the code to change this behavior on my system, but I >>> thought I'd >>> post something here in case others encounter the same problem. >>> >>> all good things, >>> Heath >>> >>> >>> >>> #!/usr/bin/perl -w >>> >>> use strict; >>> use warnings; >>> use Bio::SeqIO; >>> use Bio::SeqUtils; >>> >>> my $infile= shift; >>> >>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>> '-format' => 'genbank') or die "could not open seq file $infile\n"; >>> >>> my $outfile = $infile . 
'_out.gbk'; >>> >>> my $outIO = Bio::SeqIO->new('-file' => ">$outfile", >>> '-format' => 'genbank') or die "could not open seq file $outfile\n"; >>> >>> my $in_seq = $inIO->next_seq; >>> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>> $outIO->write_seq($out_seq); >>> exit; >>> >>> >>> LOCUS test_sequence 57303 bp DNA linear UNA >>> DEFINITION Sequence to demonstrate unexpected behavior of >>> trunc_with_features >>> ACCESSION unknown >>> KEYWORDS . >>> FEATURES Location/Qualifiers >>> source 1..10 >>> /mol_type="genomic DNA" >>> gene<1..5 >>> /gene="test" >>> CDS<1..5 >>> /product="hypothetical protein" >>> ORIGIN >>> 1 caagattaaa >>> // >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> -- >> The Wellcome Trust Sanger Institute is operated by Genome Research >> Limited, a charity registered in England with number 1021457 and a >> company registered in England with number 2742969, whose registered >> office is 215 Euston Road, London, NW1 2BE. > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From awitney at sgul.ac.uk Tue Apr 10 05:11:51 2012 From: awitney at sgul.ac.uk (Adam Witney) Date: Tue, 10 Apr 2012 10:11:51 +0100 Subject: [Bioperl-l] Output of a BLAST parse to text file In-Reply-To: References: Message-ID: <908D24DC-1A0E-4EE1-8573-F68FB3487071@sgul.ac.uk> Hi Zac, how do you want to sort the information? if its just on num_hsps... then you will have to store the results in an array or something and then sort that before printing your output adam On 1 Apr 2012, at 04:35, Zachariah Wylde wrote: > Hi there, > > I am very new to Bioperl, so excuse me if come across as simple! 
I need to > write a bioperl script to extract information from BLAST results. > The script needs to count how many HSPs are on each mouse chromosome and > be written to a tab-separated table. I have this so far, but do not > understand how to > sort the information. I would much, appreciate if you could help me?? > > Yours sincerely, > > Zac Wylde > > use strict; > use warnings; > use lib "C:/Program Files (x86)/BioPerl"; > use Bio::SearchIO; > > my $infile = "Alignment_Ref_Seq.txt"; > open INFILE, $infile or die "Cannot open $infile: $!"; > > my $outfile = "assignment2.txt"; > open OUTFILE, ">$outfile" or die "Cannot open $outfile: $!"; > > > my $parser = new Bio::SearchIO(-format => 'blast', -file => > 'Alignment_Ref_Seq.txt'); > > > while (my $result = $parser->next_result){ > while (my $hit = $result->next_hit){ > while (my $hsp = $hit->next_hsp){ > if ($hit->description =~ /(mus musculus)|(mouse)/i){ > if ($hit->description =~ /chromosome (\w+)/){ > print "Hit = ", $hit->name, " \t", > "chromosome = ", $1, " \t", > "HSPs = ", $hit->num_hsps, "\n"; > } > } > } > } > } > > close INFILE; > close OUTFILE; > > #unknown > #chromosome from > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roy.chaudhuri at gmail.com Tue Apr 10 07:10:36 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 10 Apr 2012 12:10:36 +0100 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <4F83F28E.4080000@sanger.ac.uk> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> Message-ID: <4F84152C.7030300@gmail.com> Hi Heath, Frank, This was probably my fault back in the mists of time. 
Looks like an easy fix though, I've reported the issue on Redmine and submitted a patch: https://redmine.open-bio.org/issues/3339 We should probably also add Heath's example as a test case. Cheers, Roy. On 10/04/2012 09:42, Frank Schwach wrote: > Hi Heath, > > Yes, I just had a look too and it's true that it would currently ignore > the original type. I had added some new methods (delete, insert, ligate) > and with those the location type is preserved but not with the already > existing methods like trunc_with_features. I will look into it when I > have some time and make some changes. > > Cheers, > > Frank > > > On 09/04/12 22:37, Heath O'Brien wrote: >> Hi Frank, >> >> I just tried it with the latest version from bioperl-live, and it worked >> the way I described in my email. >> >> all good things, >> Heath >> >> >> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >> >>> Hi Heath, >>> >>> I have recently worked a bit on that module and contributed the code >>> to bioperl-live. I think this behaviour may already have changed but >>> I'm not 100% sure at the moment. When I have some time I will review >>> the code to confirm. In the meantime, you could give it a go with the >>> bioperl-live version if that's an option for you? >>> >>> Cheers, >>> >>> Frank >>> >>> >>> On 03/04/12 17:56, Heath O'Brien wrote: >>>> Hi All, >>>> >>>> I've encountered a bug in the trunc_with_features function in >>>> SeqUtils.pm, or at >>>> least behavior that was unexpected to me: >>>> >>>> Features with fuzzy coordinates in the original sequence are >>>> converted to exact >>>> coordinates in the truncated sequence. For example, the script below >>>> changes the >>>> coordinates for the feature from<1..5 to 1..5. >>>> >>>> I have modified the code to change this behavior on my system, but I >>>> thought I'd >>>> post something here in case others encounter the same problem. 
>>>> >>>> all good things, >>>> Heath >>>> >>>> >>>> >>>> #!/usr/bin/perl -w >>>> >>>> use strict; >>>> use warnings; >>>> use Bio::SeqIO; >>>> use Bio::SeqUtils; >>>> >>>> my $infile= shift; >>>> >>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>> '-format' => 'genbank') or die "could not open seq file $infile\n"; >>>> >>>> my $outfile = $infile . '_out.gbk'; >>>> >>>> my $outIO = Bio::SeqIO->new('-file' => ">$outfile", >>>> '-format' => 'genbank') or die "could not open seq file $outfile\n"; >>>> >>>> my $in_seq = $inIO->next_seq; >>>> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>> $outIO->write_seq($out_seq); >>>> exit; >>>> >>>> >>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>> DEFINITION Sequence to demonstrate unexpected behavior of >>>> trunc_with_features >>>> ACCESSION unknown >>>> KEYWORDS . >>>> FEATURES Location/Qualifiers >>>> source 1..10 >>>> /mol_type="genomic DNA" >>>> gene<1..5 >>>> /gene="test" >>>> CDS<1..5 >>>> /product="hypothetical protein" >>>> ORIGIN >>>> 1 caagattaaa >>>> // >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> -- >>> The Wellcome Trust Sanger Institute is operated by Genome Research >>> Limited, a charity registered in England with number 1021457 and a >>> company registered in England with number 2742969, whose registered >>> office is 215 Euston Road, London, NW1 2BE. 
>> > >

From roy.chaudhuri at gmail.com Tue Apr 10 10:45:21 2012
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 10 Apr 2012 15:45:21 +0100
Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm)
In-Reply-To: <4F841EF3.6000603@sanger.ac.uk>
References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> <4F84152C.7030300@gmail.com> <4F841EF3.6000603@sanger.ac.uk>
Message-ID: <4F844781.90005@gmail.com>

Turns out I spoke too soon; I added in some new tests and they
highlighted problems with both trunc_with_features and
revcom_with_features. I think I have resolved all the issues in the
most recent Redmine patch. Frank, Heath, please could you check that
it works for you?

Cheers,
Roy.

From heath.obrien at gmail.com Tue Apr 10 11:34:59 2012
From: heath.obrien at gmail.com (Heath O'Brien)
Date: Tue, 10 Apr 2012 11:34:59 -0400
Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm)
In-Reply-To: <4F844781.90005@gmail.com>
References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> <4F84152C.7030300@gmail.com> <4F841EF3.6000603@sanger.ac.uk> <4F844781.90005@gmail.com>
Message-ID: <80696D43-B1D4-4228-A4B5-6B9299F46D41@gmail.com>

Works perfectly for me. Thanks!

all good things,
Heath

From cjfields at illinois.edu Tue Apr 10 13:08:45 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 10 Apr 2012 17:08:45 +0000
Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm)
In-Reply-To: <80696D43-B1D4-4228-A4B5-6B9299F46D41@gmail.com>
References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> <4F84152C.7030300@gmail.com> <4F841EF3.6000603@sanger.ac.uk> <4F844781.90005@gmail.com> <80696D43-B1D4-4228-A4B5-6B9299F46D41@gmail.com>
Message-ID:

I have committed these to bioperl-live; they passed tests for me. I
have left the bug report open, however, in case more work needs to be
done. Roy, did you want to close that when you are ready?

chris

From p.j.a.cock at googlemail.com Tue Apr 10 16:07:28 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 10 Apr 2012 21:07:28 +0100
Subject: [Bioperl-l] sequence proxy server
In-Reply-To:
References:
Message-ID:

On Fri, Apr 6, 2012 at 2:49 PM, Kevin Murray wrote:
> Hi all,
>
> I'm an undergrad student in molecular biology at the ANU in Australia,
> and my research projects are becoming increasingly bioinformatics
> heavy. The latest one has involved quite a large amount of sequence
> retrieval from GenBank and GenPept. The download speed to Australia
> from NCBI's servers is rather slow, and i've been thinking about how
> we can improve this. ...So, i though about writing a "sequence proxy" ...

Have you tried TogoWS? It is based in Japan and offers access to
some of the local databases, but it also proxies some important
EMBL/EBI and NCBI resources, including GenBank.
I would expect you'd get much faster response times from
Australia than talking directly to the NCBI.
http://togows.dbcls.jp/site/en/rest.html

I think the TogoWS REST API is very nice to use, and it seems to
give much clearer error messages than the NCBI Entrez site
(TogoWS uses HTTP error codes pretty consistently).

Biopython 1.59 onwards has a simple API for the TogoWS
REST interface, but their URL structure is very easy, so for
a simple one-off task you can easily roll your own in Perl
(or write one for BioPerl?).

Peter

From cjfields at illinois.edu Tue Apr 10 21:20:48 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 11 Apr 2012 01:20:48 +0000
Subject: [Bioperl-l] sequence proxy server
In-Reply-To:
References:
Message-ID:

On Apr 10, 2012, at 3:07 PM, Peter Cock wrote:

> Biopython 1.59 onwards has a simple API for the TogoWS
> REST interface, but their URL structure is very easy, so for
> a simple one off task you can easily roll your own in Perl
> (or write one for BioPerl?).
>
> Peter

Should be easy enough if the API is well-documented. Related to this,
does anyone know if NCBI's REST API is documented anywhere?
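Below is a minimal sketch of the "roll your own in Perl" route. It assumes TogoWS entry URLs take the form http://togows.dbcls.jp/entry/<db>/<id>.<format>. Check the REST page linked above for the exact database names and formats; the name `nucleotide` used here is an assumption. The sketch uses only core modules (HTTP::Tiny has been core since Perl 5.14). It also adds a one-file-per-URL disk cache as a simplified stand-in for the local "sequence proxy" Kevin asked about. The names togows_entry_url, http_fetch and cached_fetch are hypothetical helpers, not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;
use File::Path qw(make_path);
use File::Spec;

# Base URL of the TogoWS REST service.
my $TOGOWS_BASE = 'http://togows.dbcls.jp';

# Build an entry URL of the ASSUMED form /entry/<db>/<id>[.<format>],
# e.g. /entry/nucleotide/X52960.fasta (verify against the TogoWS docs).
sub togows_entry_url {
    my ($db, $id, $format) = @_;
    my $url = "$TOGOWS_BASE/entry/$db/$id";
    $url .= ".$format" if defined $format && length $format;
    return $url;
}

# Fetch a URL and die with the HTTP status on failure
# (TogoWS reports errors through HTTP status codes).
sub http_fetch {
    my ($url) = @_;
    my $res = HTTP::Tiny->new->get($url);
    die "GET $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}

# Minimal on-disk cache: one file per URL, so repeated requests for the
# same record are answered locally instead of crossing the network.
# $fetcher is any code ref that takes a URL and returns the record text.
sub cached_fetch {
    my ($url, $cache_dir, $fetcher) = @_;
    (my $key = $url) =~ s{[^A-Za-z0-9._-]+}{_}g;   # URL -> safe file name
    my $path = File::Spec->catfile($cache_dir, $key);
    if (-e $path) {
        open my $in, '<', $path or die "Cannot read $path: $!";
        local $/;                                  # slurp mode
        return scalar <$in>;
    }
    my $content = $fetcher->($url);
    make_path($cache_dir);
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} $content;
    close $out;
    return $content;
}

# Usage: perl togows_fetch.pl <db> <accession> [format]
if (@ARGV) {
    my ($db, $id, $format) = @ARGV;
    my $url = togows_entry_url($db, $id, $format);
    print cached_fetch($url, 'togows_cache', \&http_fetch);
}
```

Passing the fetcher in as a code ref keeps the cache independent of the transport. The same idea could therefore sit in front of another web service, or be swapped for a Bio::DB::Flat index once records need to be looked up by ID instead of by URL.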
Related to this, anyone know if NCBI's REST API is documented anywhere? chris From roy.chaudhuri at gmail.com Wed Apr 11 06:55:49 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Wed, 11 Apr 2012 11:55:49 +0100 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> <4F84152C.7030300@gmail.com> <4F841EF3.6000603@sanger.ac.uk> <4F844781.90005@gmail.com> <80696D43-B1D4-4228-A4B5-6B9299F46D41@gmail.com> Message-ID: <4F856335.1000503@gmail.com> Hi Chris, I think it should be fine to close, but my account doesn't have permission to do so. Cheers, Roy. On 10/04/2012 18:08, Fields, Christopher J wrote: > I have committed these to bioperl-live, they passed tests for me. I > have left the bug report open, however, in case more work needs to be > done. Roy, did you want to close that when you are ready? > > chris > > On Apr 10, 2012, at 10:34 AM, Heath O'Brien wrote: > >> Works perfect for me. Thanks! >> >> all good things, Heath >> >> On 10-Apr-12, at 10:45 AM, Roy Chaudhuri wrote: >> >>> Turns out I spoke too soon, I added in some new tests and they >>> highlighted problems with both trunc_with_features and >>> revcom_with_features. I think I have resolved all the issues in >>> the most recent Redmine patch - Frank, Heath, please could you >>> check that it works for you? >>> >>> Cheers, Roy. >>> >>> On 10/04/2012 12:52, Frank Schwach wrote: >>>> Brilliant, thanks Roy! Frank >>>> >>>> >>>> On 10/04/12 12:10, Roy Chaudhuri wrote: >>>>> Hi Heath, Frank, >>>>> >>>>> This was probably my fault back in the mists of time. Looks >>>>> like an easy fix though, I've reported the issue on Redmine >>>>> and submitted a patch: >>>>> https://redmine.open-bio.org/issues/3339 >>>>> >>>>> We should probably also add Heath's example as a test case. >>>>> >>>>> Cheers, Roy. 
>>>>> >>>>> On 10/04/2012 09:42, Frank Schwach wrote: >>>>>> Hi Heath, >>>>>> >>>>>> Yes, I just had a look too and it's true that it would >>>>>> currently ignore the original type. I had added some new >>>>>> methods (delete, insert, ligate) and with those the >>>>>> location type is preserved but not with the already >>>>>> existing methods like trunc_with_features. I will look into >>>>>> it when I have some time and make some changes. >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> On 09/04/12 22:37, Heath O'Brien wrote: >>>>>>> Hi Frank, >>>>>>> >>>>>>> I just tried it with the latest version from >>>>>>> bioperl-live, and it worked the way I described in my >>>>>>> email. >>>>>>> >>>>>>> all good things, Heath >>>>>>> >>>>>>> >>>>>>> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >>>>>>> >>>>>>>> Hi Heath, >>>>>>>> >>>>>>>> I have recently worked a bit on that module and >>>>>>>> contributed the code to bioperl-live. I think this >>>>>>>> behaviour may already have changed but I'm not 100% >>>>>>>> sure at the moment. When I have some time I will >>>>>>>> review the code to confirm. In the meantime, you could >>>>>>>> give it a go with the bioperl-live version if that's an >>>>>>>> option for you? >>>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> Frank >>>>>>>> >>>>>>>> >>>>>>>> On 03/04/12 17:56, Heath O'Brien wrote: >>>>>>>>> Hi All, >>>>>>>>> >>>>>>>>> I've encountered a bug in the trunc_with_features >>>>>>>>> function in SeqUtils.pm, or at least behavior that >>>>>>>>> was unexpected to me: >>>>>>>>> >>>>>>>>> Features with fuzzy coordinates in the original >>>>>>>>> sequence are converted to exact coordinates in the >>>>>>>>> truncated sequence. For example, the script below >>>>>>>>> changes the coordinates for the feature from<1..5 to >>>>>>>>> 1..5. >>>>>>>>> >>>>>>>>> I have modified the code to change this behavior on >>>>>>>>> my system, but I thought I'd post something here in >>>>>>>>> case others encounter the same problem. 
>>>>>>>>> >>>>>>>>> all good things, Heath >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> #!/usr/bin/perl -w >>>>>>>>> >>>>>>>>> use strict; use warnings; use Bio::SeqIO; use >>>>>>>>> Bio::SeqUtils; >>>>>>>>> >>>>>>>>> my $infile= shift; >>>>>>>>> >>>>>>>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>>>>>>> '-format' => 'genbank') or die "could not open seq >>>>>>>>> file $infile\n"; >>>>>>>>> >>>>>>>>> my $outfile = $infile . '_out.gbk'; >>>>>>>>> >>>>>>>>> my $outIO = Bio::SeqIO->new('-file' => >>>>>>>>> ">$outfile", '-format' => 'genbank') or die "could >>>>>>>>> not open seq file $outfile\n"; >>>>>>>>> >>>>>>>>> my $in_seq = $inIO->next_seq; my $out_seq = >>>>>>>>> Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>>>>>>> $outIO->write_seq($out_seq); exit; >>>>>>>>> >>>>>>>>> >>>>>>>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>>>>>>> DEFINITION Sequence to demonstrate unexpected >>>>>>>>> behavior of trunc_with_features ACCESSION unknown >>>>>>>>> KEYWORDS . FEATURES Location/Qualifiers source 1..10 >>>>>>>>> /mol_type="genomic DNA" gene<1..5 /gene="test" >>>>>>>>> CDS<1..5 /product="hypothetical protein" ORIGIN 1 >>>>>>>>> caagattaaa // >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Bioperl-l mailing list Bioperl-l at lists.open-bio.org >>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> -- The Wellcome Trust Sanger Institute is operated by >>>>>>>> Genome Research Limited, a charity registered in >>>>>>>> England with number 1021457 and a company registered in >>>>>>>> England with number 2742969, whose registered office is >>>>>>>> 215 Euston Road, London, NW1 2BE. 
>>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>> >> >> _______________________________________________ Bioperl-l mailing >> list Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Wed Apr 11 11:28:38 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 11 Apr 2012 15:28:38 +0000 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <4F856335.1000503@gmail.com> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> <4F84152C.7030300@gmail.com> <4F841EF3.6000603@sanger.ac.uk> <4F844781.90005@gmail.com> <80696D43-B1D4-4228-A4B5-6B9299F46D41@gmail.com> <4F856335.1000503@gmail.com> Message-ID: Okay, closed it. Thanks again! chris On Apr 11, 2012, at 5:55 AM, Roy Chaudhuri wrote: > Hi Chris, > > I think it should be fine to close, but my account doesn't have permission to do so. > > Cheers, > Roy. > > On 10/04/2012 18:08, Fields, Christopher J wrote: >> I have committed these to bioperl-live, they passed tests for me. I >> have left the bug report open, however, in case more work needs to be >> done. Roy, did you want to close that when you are ready? >> >> chris >> >> On Apr 10, 2012, at 10:34 AM, Heath O'Brien wrote: >> >>> Works perfect for me. Thanks! >>> >>> all good things, Heath >>> >>> On 10-Apr-12, at 10:45 AM, Roy Chaudhuri wrote: >>> >>>> Turns out I spoke too soon, I added in some new tests and they >>>> highlighted problems with both trunc_with_features and >>>> revcom_with_features. I think I have resolved all the issues in >>>> the most recent Redmine patch - Frank, Heath, please could you >>>> check that it works for you? >>>> >>>> Cheers, Roy. >>>> >>>> On 10/04/2012 12:52, Frank Schwach wrote: >>>>> Brilliant, thanks Roy! Frank >>>>> >>>>> >>>>> On 10/04/12 12:10, Roy Chaudhuri wrote: >>>>>> Hi Heath, Frank, >>>>>> >>>>>> This was probably my fault back in the mists of time. 
Looks >>>>>> like an easy fix though, I've reported the issue on Redmine >>>>>> and submitted a patch: >>>>>> https://redmine.open-bio.org/issues/3339 >>>>>> >>>>>> We should probably also add Heath's example as a test case. >>>>>> >>>>>> Cheers, Roy. >>>>>> >>>>>> On 10/04/2012 09:42, Frank Schwach wrote: >>>>>>> Hi Heath, >>>>>>> >>>>>>> Yes, I just had a look too and it's true that it would >>>>>>> currently ignore the original type. I had added some new >>>>>>> methods (delete, insert, ligate) and with those the >>>>>>> location type is preserved but not with the already >>>>>>> existing methods like trunc_with_features. I will look into >>>>>>> it when I have some time and make some changes. >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> Frank >>>>>>> >>>>>>> >>>>>>> On 09/04/12 22:37, Heath O'Brien wrote: >>>>>>>> Hi Frank, >>>>>>>> >>>>>>>> I just tried it with the latest version from >>>>>>>> bioperl-live, and it worked the way I described in my >>>>>>>> email. >>>>>>>> >>>>>>>> all good things, Heath >>>>>>>> >>>>>>>> >>>>>>>> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >>>>>>>> >>>>>>>>> Hi Heath, >>>>>>>>> >>>>>>>>> I have recently worked a bit on that module and >>>>>>>>> contributed the code to bioperl-live. I think this >>>>>>>>> behaviour may already have changed but I'm not 100% >>>>>>>>> sure at the moment. When I have some time I will >>>>>>>>> review the code to confirm. In the meantime, you could >>>>>>>>> give it a go with the bioperl-live version if that's an >>>>>>>>> option for you? >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> Frank >>>>>>>>> >>>>>>>>> >>>>>>>>> On 03/04/12 17:56, Heath O'Brien wrote: >>>>>>>>>> Hi All, >>>>>>>>>> >>>>>>>>>> I've encountered a bug in the trunc_with_features >>>>>>>>>> function in SeqUtils.pm, or at least behavior that >>>>>>>>>> was unexpected to me: >>>>>>>>>> >>>>>>>>>> Features with fuzzy coordinates in the original >>>>>>>>>> sequence are converted to exact coordinates in the >>>>>>>>>> truncated sequence. 
For example, the script below >>>>>>>>>> changes the coordinates for the feature from<1..5 to >>>>>>>>>> 1..5. >>>>>>>>>> >>>>>>>>>> I have modified the code to change this behavior on >>>>>>>>>> my system, but I thought I'd post something here in >>>>>>>>>> case others encounter the same problem. >>>>>>>>>> >>>>>>>>>> all good things, Heath >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> #!/usr/bin/perl -w >>>>>>>>>> >>>>>>>>>> use strict; use warnings; use Bio::SeqIO; use >>>>>>>>>> Bio::SeqUtils; >>>>>>>>>> >>>>>>>>>> my $infile= shift; >>>>>>>>>> >>>>>>>>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>>>>>>>> '-format' => 'genbank') or die "could not open seq >>>>>>>>>> file $infile\n"; >>>>>>>>>> >>>>>>>>>> my $outfile = $infile . '_out.gbk'; >>>>>>>>>> >>>>>>>>>> my $outIO = Bio::SeqIO->new('-file' => >>>>>>>>>> ">$outfile", '-format' => 'genbank') or die "could >>>>>>>>>> not open seq file $outfile\n"; >>>>>>>>>> >>>>>>>>>> my $in_seq = $inIO->next_seq; my $out_seq = >>>>>>>>>> Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>>>>>>>> $outIO->write_seq($out_seq); exit; >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>>>>>>>> DEFINITION Sequence to demonstrate unexpected >>>>>>>>>> behavior of trunc_with_features ACCESSION unknown >>>>>>>>>> KEYWORDS . 
FEATURES Location/Qualifiers source 1..10 >>>>>>>>>> /mol_type="genomic DNA" gene<1..5 /gene="test" >>>>>>>>>> CDS<1..5 /product="hypothetical protein" ORIGIN 1 >>>>>>>>>> caagattaaa // >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Bioperl-l mailing list Bioperl-l at lists.open-bio.org >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>> >>>>>>>>> >>>>>>>>> -- The Wellcome Trust Sanger Institute is operated by >>>>>>>>> Genome Research Limited, a charity registered in >>>>>>>>> England with number 1021457 and a company registered in >>>>>>>>> England with number 2742969, whose registered office is >>>>>>>>> 215 Euston Road, London, NW1 2BE. >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>> >>> _______________________________________________ Bioperl-l mailing >>> list Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From p.j.a.cock at googlemail.com Thu Apr 12 08:47:05 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 12 Apr 2012 13:47:05 +0100 Subject: [Bioperl-l] Proposal of GSoC 2012 >> ACEtoSAM and SAMtoACE In-Reply-To: References: Message-ID: On Thu, Apr 5, 2012 at 5:52 AM, afia hayati wrote: > Dear all, > > I am afia, a PhD student in Bioinformatics. ?I am so interested to > participate GSoC and would like to apply for project Bio::Assembly, Ace2Sam > and Sam2Ace converter. I have written a proposal based on the guidance for > prospective GSoC student. I paste my proposal in here. > If you have time, please give me suggestions. > Thank you very much. > > Sincerely, > Afiahayati Hello Afiahayati, What would you use this converter for? I can see it is useful to convert ACE to SAM/BAM for downstream analysis and visualization. At the moment the only assemblers I regularly use which produce ACE are the Roche 'Newbler' gsAssember, and MIRA. 
For MIRA, Bastien is working on native SAM output, but for the moment
I wrote and maintain a converter from MIRA's alignment format (MAF) to
SAM: https://github.com/peterjc/maf2sam

Or is the idea more to support SAM (and BAM) assemblies within the
existing BioPerl Bio::Assembly::IO framework to allow easier
manipulation from Perl?

Peter

From florent.angly at gmail.com Thu Apr 12 22:41:54 2012
From: florent.angly at gmail.com (Florent Angly)
Date: Fri, 13 Apr 2012 12:41:54 +1000
Subject: [Bioperl-l] Proposal of GSoC 2012 >> ACEtoSAM and SAMtoACE
In-Reply-To:
References:
Message-ID: <4F879272.30306@gmail.com>

Bioperl has a module to read and write ACE files, Bio::Assembly::IO::ace.
It also has a module to read (but not write) SAM files,
Bio::Assembly::IO::sam. So, it looks like you can already do SAMtoACE
within Bioperl. Implementing ACEtoSAM would involve adding write support
to the Bio::Assembly::IO::sam module. Looking at how
Bio::Assembly::IO::ace and Bio::Assembly::IO::tigr implement write
support would help with this.

Regards,
Florent

From p.j.a.cock at googlemail.com Fri Apr 13 04:32:00 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 13 Apr 2012 09:32:00 +0100
Subject: [Bioperl-l] Proposal of GSoC 2012 >> ACEtoSAM and SAMtoACE
In-Reply-To: <4F879272.30306@gmail.com>
References: <4F879272.30306@gmail.com>
Message-ID:

On Fri, Apr 13, 2012 at 3:41 AM, Florent Angly wrote:
> Bioperl has a module to read and write ACE files, Bio::Assembly::IO::ace. It
> also has a module to read (but not write) SAM files, Bio::Assembly::IO::sam.
> So, it looks like you can already do SAMtoACE within Bioperl. Implementing
> ACEtoSAM would involve adding write support to the Bio::Assembly::sam
> module. This can be helped by looking at how Bio::Assembly::IO::ace and
> Bio::Assembly::tigr implement write support.
> Regards,
> Florent

Does SAMtoACE within Bioperl using Bio::Assembly::IO actually work?

Note that proper multiple sequence alignments in SAM/BAM format are
relatively rare - the vast majority of SAM/BAM files are just pairwise
alignments which are not a good fit for ACE.

Peter

From k.d.murray.91 at gmail.com Fri Apr 13 05:31:06 2012
From: k.d.murray.91 at gmail.com (Kevin Murray)
Date: Fri, 13 Apr 2012 19:31:06 +1000
Subject: [Bioperl-l] sequence proxy server
In-Reply-To:
References:
Message-ID:

Hi Chris and Peter,

Thanks for the advice, it is much appreciated.
I have found almost exactly what i was taking about in the bioperl scripts, github link https://github.com/bioperl/bioperl-live/blob/master/scripts/DB/bp_biofetch_genbank_proxy.pl I will have a go at porting this to use a Bio::DB::Flat cache, given that would be exactly what i envisaged. With regards to implementing a Bio::DB module for TogoWS, i may have a crack at it if no one else is (although it will probably take me a while). Are there any pointers or particular styles you guys have (other than TMTOWTDI). Cheers, Regards Kevin Murray From afia.hayati at gmail.com Sat Apr 14 20:15:11 2012 From: afia.hayati at gmail.com (afia hayati) Date: Sun, 15 Apr 2012 09:15:11 +0900 Subject: [Bioperl-l] Proposal of GSoC 2012 >> ACEtoSAM and SAMtoACE In-Reply-To: References: <4F879272.30306@gmail.com> Message-ID: Peter, Florent, and all, thanks for the responses. Ya.., the idea is more to support SAM assemblies within the existing Bio::Assembly::IO. SAM or ACE files once imported should have similar handles and methods. Bio::Assembly::IO::SAM is a read only. I also will try to add write support for that module. In Bio::Assembly::ACE, there are write methods, completed with the quality score, so it "looks like" we can do SAMtoACE converter. Anyway, the main point is to add write support in Bio::Assembly::SAM. Please CMIIW, I am open to corrections and suggestions. best regards, Afiahayati On Fri, Apr 13, 2012 at 5:32 PM, Peter Cock wrote: > On Fri, Apr 13, 2012 at 3:41 AM, Florent Angly > wrote: > > Bioperl has a module to read and write ACE files, > Bio::Assembly::IO::ace. It > > also has a module to read (but not write) SAM files, > Bio::Assembly::IO::sam. > > So, it looks like you can already do SAMtoACE within Bioperl. > Implementing > > ACEtoSAM would involve adding write support to the Bio::Assembly::sam > > module. This can be helped by looking at how Bio::Assembly::IO::ace and > > Bio::Assembly::tigr implement write support. 
> > Regards, > > Florent > > Does SAMtoACE within Bioperl using Bio::Assembly::IO actually work? > > Note that proper multiple sequence alignments in SAM/BAM format are > relatively rare - the vast majority of SAM/BAM files are just pairwise > alignments which are not a good fit for ACE. > > Peter From jovel_juan at hotmail.com Sat Apr 14 23:27:57 2012 From: jovel_juan at hotmail.com (Juan Jovel) Date: Sun, 15 Apr 2012 03:27:57 +0000 Subject: [Bioperl-l] how to get specific sub-sequence from GenBank file with Bio::DB::GenBank In-Reply-To: References: , , <4F879272.30306@gmail.com>, , Message-ID: Hello All, I want to get some subsequences from provirus sequences in GenBank. I got the whole sequences with the script below, but I only want a specific sub-sequence, which appears in the GenBank files in the line: LTR 9091..9723. How can I modify my script to get only nts 9091-9723 (in this example) instead of the whole sequence? Thanks a lot in advance! Here is the script:

    #!/usr/bin/perl -w
    use strict;
    use Bio::DB::GenBank;
    use Bio::SeqIO;

    chomp(my $infile = $ARGV[0]);
    open(IN, "$infile") or die "$!";
    my @ids = <IN>;
    chomp(my $outfile = $ARGV[1]);
    my $seqs_out = Bio::SeqIO->new(-file => ">$outfile", -format => "fasta");
    foreach my $entry (@ids) {
        print "$entry";
        my $db = Bio::DB::GenBank->new;
        my $seq = $db->get_Seq_by_acc($entry);
        $seqs_out->write_seq($seq);
    }
    exit;

From roy.chaudhuri at gmail.com Mon Apr 16 07:16:57 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 16 Apr 2012 12:16:57 +0100 Subject: [Bioperl-l] how to get specific sub-sequence from GenBank file with Bio::DB::GenBank In-Reply-To: References: , , <4F879272.30306@gmail.com>, , Message-ID: <4F8BFFA9.9030305@gmail.com> Hi Juan, If you know the LTR coordinates in advance, then you can download a specific subsequence using Bio::DB::GenBank as shown here:
http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::GenBank_when_you_have_genomic_coordinates_to_get_a_Seq_object If you don't, then you will need to download the whole sequence as you are doing, but add in some code to print out just the sequence associated with the LTR feature. Something like (untested):

    for my $feat ($seq->get_SeqFeatures) {
        $seqs_out->write_seq($feat->spliced_seq) if $feat->primary_tag eq 'LTR';
    }

Cheers, Roy. On 15/04/2012 04:27, Juan Jovel wrote: > Hello All, I want to get some subsequences from provirus sequences in > the GenBank, I got the whole sequences with the script below. > However, I want to get a specific sub-sequence, which appears in the > GenBank files in the line: LTR 9091..9723 how can I > modify my script to get only nts 9091-9723 (in this example), instead > of the whole sequence. Thanks a lot in > advance! HERE THE SCRIPT:
> #!/usr/bin/perl -w
> use strict;
> use Bio::DB::GenBank;
> use Bio::SeqIO;
> chomp(my $infile = $ARGV[0]);
> open(IN, "$infile") or die "$!";
> my @ids = <IN>;
> chomp(my $outfile = $ARGV[1]);
> my $seqs_out = Bio::SeqIO->new(-file => ">$outfile", -format => "fasta");
> foreach my $entry (@ids) {
>     print "$entry";
>     my $db = Bio::DB::GenBank->new;
>     my $seq = $db->get_Seq_by_acc($entry);
>     $seqs_out->write_seq($seq);
> }
> exit;
From sharmashalu.bio at gmail.com Mon Apr 16 16:08:23 2012 From: sharmashalu.bio at gmail.com (shalu sharma) Date: Mon, 16 Apr 2012 16:08:23 -0400 Subject: [Bioperl-l] Amino Acid sequence to nucleotides sequence Message-ID: Hi All, Is there any way in Bioperl I can convert amino acid sequences to nucleotide sequences?
Thanks Shalu From p.j.a.cock at googlemail.com Mon Apr 16 16:32:20 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 16 Apr 2012 21:32:20 +0100 Subject: [Bioperl-l] Amino Acid sequence to nucleotides sequence In-Reply-To: References: Message-ID: On Mon, Apr 16, 2012 at 9:08 PM, shalu sharma wrote: > Hi All, > Is there any way in Bioperl i can convert amino acid sequences > to nucleotide sequences. > > Thanks > Shalu Probably - but there is more than one answer since the codon tables are a many-to-one mapping. Are you hoping for one possible nucleotide sequence, perhaps with IUPAC ambiguity characters? Perhaps a specific example of what you want would help - back-translation is a fuzzy term. If you are trying to combine a protein alignment with the original unaligned nucleotide sequences to make a codon alignment, that's a different task. Peter From cjfields at illinois.edu Mon Apr 16 16:44:21 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 16 Apr 2012 20:44:21 +0000 Subject: [Bioperl-l] Amino Acid sequence to nucleotides sequence In-Reply-To: References: Message-ID: <45DE0C13-27B3-4E1C-AB8A-83B99DD407AF@illinois.edu> On Apr 16, 2012, at 3:32 PM, Peter Cock wrote: > On Mon, Apr 16, 2012 at 9:08 PM, shalu sharma wrote: >> Hi All, >> Is there any way in Bioperl i can convert amino acid sequences >> to nucleotide sequences. >> >> Thanks >> Shalu > > Probably - but there is more than one answer since the codon > tables are a many-to-one mapping. Are you hoping for one > possible nucleotide sequence, perhaps with IUPAC ambiguity > characters? Perhaps a specific example of what you want > would help - back-translation is a fuzzy term. > > If you are trying to combine a protein alignment with the > original unaligned nucleotide sequences to make a codon > alignment that's a different task.
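Peter's point is that the genetic code is a many-to-one mapping. His suggestion of a single answer written with IUPAC ambiguity characters can be shown without BioPerl. Below is a minimal standalone Perl sketch. It is not the implementation behind Bio::Tools::CodonTable's revtranslate. It assumes NCBI translation table 1 (the standard code), hard-coded as a 64-character string in TCAG order. The names `%codons_for` and `back_translate` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: NCBI translation table 1 (standard code), one residue per codon,
# with codons enumerated in TCAG order at each of the three positions.
my $TABLE1 = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my @BASES  = qw(T C A G);

# Invert the many-to-one table: residue => list of codons that encode it.
my %codons_for;
my $i = 0;
for my $b1 (@BASES) {
    for my $b2 (@BASES) {
        for my $b3 (@BASES) {
            push @{ $codons_for{ substr($TABLE1, $i++, 1) } }, "$b1$b2$b3";
        }
    }
}

# IUPAC nucleotide code for each alphabetically sorted set of bases.
my %IUPAC = (
    A   => 'A', C   => 'C', G   => 'G', T   => 'T',
    AG  => 'R', CT  => 'Y', CG  => 'S', AT  => 'W', GT => 'K', AC => 'M',
    CGT => 'B', AGT => 'D', ACT => 'H', ACG => 'V', ACGT => 'N',
);

# Back-translate a protein string into one degenerate DNA string by merging,
# position by position, all codons for each residue into a single IUPAC code.
sub back_translate {
    my ($protein) = @_;
    my $dna = '';
    for my $aa (split //, uc $protein) {
        my $codons = $codons_for{$aa}
            or die "No codons known for residue '$aa'\n";
        for my $pos (0 .. 2) {
            my %seen;
            $seen{ substr($_, $pos, 1) } = 1 for @$codons;
            $dna .= $IUPAC{ join '', sort keys %seen };
        }
    }
    return $dna;
}

print back_translate('MKW'), "\n";              # ATGAARTGG
print join(',', @{ $codons_for{L} }), "\n";     # TTA,TTG,CTT,CTC,CTA,CTG
```

Merging codons position by position over-generates for residues whose codons do not fit a single IUPAC pattern. Leucine comes out as YTN, which also matches the phenylalanine codons TTT and TTC. Serine (WSN), arginine (MGN) and stop (TRR) behave the same way. Keeping the per-residue codon lists, as `%codons_for` does, avoids that loss of information. The cost is a combinatorial number of full-length candidate sequences. Within BioPerl itself, Chris's pointer below is the place to start: `Bio::Tools::CodonTable->new(-id => 1)` and its `revtranslate` method, which its documentation describes as returning the codons for a residue.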
> > Peter We do have a revtranslate function in bioperl that is supposed to deal with ambiguities: https://metacpan.org/module/Bio::Tools::CodonTable#revtranslate I don't know how well-tested it is, but it was added a few years back to Bio::Tools::CodonTable. IIRC Mark Jensen was the developer who did that, and he's pretty meticulous. chris From Russell.Smithies at agresearch.co.nz Mon Apr 16 17:28:11 2012 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 17 Apr 2012 09:28:11 +1200 Subject: [Bioperl-l] sequence proxy server In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCE50A550@exchsth.agresearch.co.nz> I assume you've done the obvious thing and tried downloading from your local mirror? ftp://biomirror.aarnet.edu.au/biomirror/ Or ours: http://www.biomirror.org.nz/ If you have a large number of requests it's almost always faster to download the refseq files and extract locally rather than run queries against NCBI via the web. --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Kevin Murray > Sent: Saturday, 7 April 2012 1:50 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] sequence proxy server > > Hi all, > > I'm an undergrad student in molecular biology at the ANU in Australia, and > my research projects are becoming increasingly bioinformatics heavy. The > latest one has involved quite a large amount of sequence retrieval from > GenBank and GenPept. The download speed to Australia from NCBI's servers > is rather slow, and i've been thinking about how we can improve this. One > solution would be to use Bio::DB::Flat with GenBank sequences on a local > computer. However, in a situation where there are multiple people in a lab > doing bioinformatics, it seems to me a bit of a waste to have the entire > genbank/genpept database, or even the relevant sections thereof, on each > computer. 
So, I thought about writing a "sequence proxy" CGI script, and a > corresponding module, which would work a bit like this: > > The user calls Bio::DB::SeqProxy::GenBank as they would Bio::DB::GenBank, > with the exception that a parameter for the address of the sequence proxy > server is required. > The module then sends a request, similar to that sent to NCBI's servers by > calling Bio::DB::GenBank->get_Seq_by_x(), to the sequence proxy server. I > believe all requests go to the efetch page now (please correct me if I'm > wrong, I have read the relevant bioperl module code but not thoroughly), so > the CGI script on the sequence proxy would take arguments in a similar > fashion to make writing the client side module easier. > The CGI script would use a Bio::DB::Flat database, or an interface to an SQL > database, to determine if the required sequence is stored locally. (As an aside, > I'd like your thoughts on Bio::DB::Flat vs Bio::DB::Sql or similar.) If the > sequence exists locally, it would be returned to the user, either as plain text, > or inside an XML container (see below). > If not, it would be retrieved from the remote database using the relevant > Bio::DB module, and returned. > > The sequence would either be returned as the relevant sequence format > (which would default to GenBank format) in plain text, or as an XML > document similar to: > > > 1 > ___YOUR GENBANK FILE HERE___ Local > Database The aim of the XML document would be to > simplify handling of server errors and allow for the specification of other > metadata such as which database the sequence came from. > > > Firstly, I'd like to know if this sounds feasible, and if so, whether someone is already > working on something similar? I don't want to reinvent the wheel. > Secondly, I'd like to ask for your comments and advice.
Being reasonably new > to bioperl (started using bioperl about 6 months ago, but I've been coding in > various languages for 8 years), I expect I haven't considered things that > may seem obvious to a more experienced bioperl-er, so please be as brutally > constructive in your criticism as you see fit =]. > > I know this is a lot of questions, so thanks in advance for your help. > > Cheers, and a happy Easter to those who celebrate it. > > Regards > Kevin Murray ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From hnorpois at googlemail.com Thu Apr 19 10:44:50 2012 From: hnorpois at googlemail.com (Hermann Norpois) Date: Thu, 19 Apr 2012 16:44:50 +0200 Subject: [Bioperl-l] Transcriptional Regulatory Element Database Message-ID: Hello, I would like to access the Transcriptional Regulatory Element Database (http://rulai.cshl.edu/cgi-bin/TRED/tred.cgi?process=searchPromForm) via Bioperl. I did not find a module that does the job. Is it possible to modify an existing module? Is it possible at all to access this database with bioperl?
Thank you norpois From jason.stajich at gmail.com Thu Apr 19 18:45:32 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Thu, 19 Apr 2012 15:45:32 -0700 Subject: [Bioperl-l] Transcriptional Regulatory Element Database In-Reply-To: References: Message-ID: <80CFDDE6-FA7F-4614-AE5D-22A5398EAA17@gmail.com> Have you first tried emailing the author listed at the bottom of the page? That seems like a more direct way to get this information. On Apr 19, 2012, at 7:44 AM, Hermann Norpois wrote: > Hello, > > I would like to get access to the Transcriptional Regulatory Element > Database (http://rulai.cshl.edu/cgi-bin/TRED/tred.cgi?process=searchPromForm) > via Bioperl. I did not find a module that does the job. Is it possible to > modify a module? Is it generally possible to access this database (by means > of bioperl)? > Thank you > norpois Jason Stajich jason.stajich at gmail.com jason at bioperl.org From merche at uni-bonn.de Mon Apr 23 08:31:27 2012 From: merche at uni-bonn.de (Merche Castillo) Date: Mon, 23 Apr 2012 14:31:27 +0200 Subject: [Bioperl-l] Problems installing BioPerl & cpan Message-ID: <4F954B9F.9020506@uni-bonn.de> I'm posting this message out of pure desperation, because I really don't know what else to try. I'm a beginner in bioperl and I'm working on a script to parse some results I got from MolQuest fgenesh. The results are in .txt format, and I want to convert them to GFF and to FASTA files of the mRNA and protein sequences, to make comparison with other results we have easier. I would like to use BioPerl for other purposes in the future, so I'm very interested in getting it working on my PC. I followed the instructions here: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix. I managed to install CPAN in root mode (otherwise it wouldn't work) and BioPerl via CPAN.
All tests passed, but when I ran this script to test the installation:

    use strict;
    use warnings;

    use Getopt::Long;
    use Bio::EnsEMBL::Registry;

    my $reg = "Bio::EnsEMBL::Registry";
    $reg->load_registry_from_db(
        -host => "ensembldb.ensembl.org",
        -user => "anonymous"
    );
    my $db_list = $reg->get_all_adaptors();
    my @line;

    foreach my $db (@$db_list) {
        @line = split('=', $db);
        print $line[0] . "\n";
    }

I got the error: "Can't locate Bio/EnsEMBL/Registry.pm in @INC". I tried to install BioPerl again via Build.PL, running as root, but still got the same outcome. Thanks for your help, Merche -- ************************************;) Mercedes Castillo INRES, Dept. Molecular Phytomedicine University of Bonn Karlrobert-Kreiten-str 13 53115 Bonn +49(0)22873-60143 merche at uni-bonn.de ***************************************** From jason.stajich at gmail.com Mon Apr 23 09:44:51 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 23 Apr 2012 06:44:51 -0700 Subject: [Bioperl-l] Problems installing BioPerl & cpan In-Reply-To: <4F954B9F.9020506@uni-bonn.de> References: <4F954B9F.9020506@uni-bonn.de> Message-ID: <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com> That module really has nothing to do with Fgenesh parsing, so I don't think this is a particularly good test script - try one of the scripts that come in the bioperl scripts directory, like scripts/utilities/bp_sreformat.pl. However, you should know that module is part of EnsEMBL, not BioPerl - it requires an additional package of modules to be installed. If you use CPAN to install things you can do

    cpan> install Bio::EnsEMBL::Registry

On Apr 23, 2012, at 5:31 AM, Merche Castillo wrote: > I'm posting this message out of pure desperation, because I really don't know what else to try. I'm a beginner in bioperl and I'm working on a script to parse out some results I got from MolQuest fgenesh.
Results are out in .txt format and I want to parse them to GFF and fasta file for mRNA and protein sequences to facilitate comparison with other results we have. I would like to use BioPerl for other purposes in the future so I'm very interested in getting it ready on my pc > > I followed the instructions herehttp://www.bioperl.org/wiki/Installing_Bioperl_for_Unix. I managed to install CPAN in root mode (otherwise it wouldn't work) and BioPerl via CPAN. All tests were ok, but when I ran this script to test the installation > > | use strict; > use warnings; > > use Getopt::Long; > use Bio::EnsEMBL::Registry; > > my $reg = "Bio::EnsEMBL::Registry"; > $reg->load_registry_from_db( > -host => "ensembldb.ensembl.org", > -user => "anonymous" > ); > my $db_list=$reg->get_all_adaptors(); > my @line; > > foreach my $db (@$db_list){ > @line = split ('=',$db); > print $line[0]."\n"; > } > | > > I got the error:*"Can't locate Bio/EnsEMBL/Registry.pm in @INC"* > > I tried to install BioPerl again via Build.PL, running as root, but still came to the same outcome. > > Thanks for your help Merche > > -- > ************************************;) > Mercedes Castillo > INRES, Dept. 
Molecular Phytomedicine > University of Bonn > > Karlrobert-Kreiten-str 13 > 53115 Bonn > +49(0)22873-60143 > merche at uni-bonn.de > ***************************************** > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From cjfields at illinois.edu Mon Apr 23 09:48:53 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 23 Apr 2012 13:48:53 +0000 Subject: [Bioperl-l] Problems installing BioPerl & cpan In-Reply-To: <4F954B9F.9020506@uni-bonn.de> References: <4F954B9F.9020506@uni-bonn.de> Message-ID: <26594DC3-0C8D-41E5-BD22-BC3F1DC7E1F0@illinois.edu> You need the Ensembl Perl API code, which requires bioperl but is not part of the bioperl distribution. See here for the latest: http://ensembl.org/info/docs/api/index.html chris On Apr 23, 2012, at 7:31 AM, Merche Castillo wrote: > I'm posting this message out of pure desperation, because I really don't know what else to try. I'm a beginner in bioperl and I'm working on a script to parse out some results I got from MolQuest fgenesh. Results are out in .txt format and I want to parse them to GFF and fasta file for mRNA and protein sequences to facilitate comparison with other results we have. I would like to use BioPerl for other purposes in the future so I'm very interested in getting it ready on my pc > > I followed the instructions herehttp://www.bioperl.org/wiki/Installing_Bioperl_for_Unix. I managed to install CPAN in root mode (otherwise it wouldn't work) and BioPerl via CPAN. 
All tests were ok, but when I ran this script to test the installation > > | use strict; > use warnings; > > use Getopt::Long; > use Bio::EnsEMBL::Registry; > > my $reg = "Bio::EnsEMBL::Registry"; > $reg->load_registry_from_db( > -host => "ensembldb.ensembl.org", > -user => "anonymous" > ); > my $db_list=$reg->get_all_adaptors(); > my @line; > > foreach my $db (@$db_list){ > @line = split ('=',$db); > print $line[0]."\n"; > } > | > > I got the error:*"Can't locate Bio/EnsEMBL/Registry.pm in @INC"* > > I tried to install BioPerl again via Build.PL, running as root, but still came to the same outcome. > > Thanks for your help Merche > > -- > ************************************;) > Mercedes Castillo > INRES, Dept. Molecular Phytomedicine > University of Bonn > > Karlrobert-Kreiten-str 13 > 53115 Bonn > +49(0)22873-60143 > merche at uni-bonn.de > ***************************************** > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at gmail.com Mon Apr 23 09:54:54 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 23 Apr 2012 06:54:54 -0700 Subject: [Bioperl-l] Problems installing BioPerl & cpan In-Reply-To: <4F955E52.50400@uni-bonn.de> References: <4F954B9F.9020506@uni-bonn.de> <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com> <4F955E52.50400@uni-bonn.de> Message-ID: <78EB7156-3EC8-4CCD-AE6E-C221B12D4F58@gmail.com> Then the next logical thing to do is go to the Ensembl page for info on how to install their modules. http://uswest.ensembl.org/info/docs/api/api_installation.html On Apr 23, 2012, at 6:51 AM, Merche Castillo wrote: > Hi > > Thanks for your reply. I'm working on some EnsEMBL scripts too, that's why I tried this script. I did look for the Bio::EnsEMBL::Registry on cpan but returns "no object found". 
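The "Can't locate Bio/EnsEMBL/Registry.pm in @INC" error and the empty CPAN lookup have the same cause: Perl cannot find the module file on its search path. A small standalone script can show which modules a given perl can actually load. The sketch below is one way to do it. `module_status` is a hypothetical helper, not part of BioPerl or Ensembl, and it uses only core Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report whether each module can be loaded from the
# current @INC. A module that is found but fails to compile is also reported,
# with the first line of the error message.
sub module_status {
    my @modules = @_;
    my %status;
    for my $module (@modules) {
        # Bio::EnsEMBL::Registry -> Bio/EnsEMBL/Registry.pm
        (my $file = "$module.pm") =~ s{::}{/}g;
        if (eval { require $file; 1 }) {
            $status{$module} = 'ok';
        }
        else {
            my ($why) = split /\n/, ($@ || 'unknown error');
            $status{$module} = "not loadable: $why";
        }
    }
    return \%status;
}

# Check the modules named on the command line, or a default pair.
my $status = module_status(@ARGV ? @ARGV : qw(Bio::Seq Bio::EnsEMBL::Registry));
printf "%-28s %s\n", $_, $status->{$_} for sort keys %$status;

print "\@INC is:\n";
print "  $_\n" for @INC;
```

Run without arguments, the script checks Bio::Seq (BioPerl) and Bio::EnsEMBL::Registry. According to the replies in this thread, the latter comes from the separate Ensembl Perl API. It should therefore only report `ok` after that API has been installed as described on the Ensembl installation page, and reinstalling BioPerl alone would not be expected to change its status.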
> > > > On 04/23/2012 03:44 PM, Jason Stajich wrote: >> >> That module really has nothing to do with Fgenesh parsing so I don't think this is a particularly good test script - try one of the scripts that comes in bioperl scripts directory like scripts/utilities/bp_sreformat.pl >> >> However, you should know that module is part of EnsEMBL not BioPerl - it requires an additional package of modules be installed. >> >> if you use CPAN to install things you can do >> cpan> install Bio::EnsEMBL::Registry >> >> >> On Apr 23, 2012, at 5:31 AM, Merche Castillo wrote: >> >>> I'm posting this message out of pure desperation, because I really don't know what else to try. I'm a beginner in bioperl and I'm working on a script to parse out some results I got from MolQuest fgenesh. Results are out in .txt format and I want to parse them to GFF and fasta file for mRNA and protein sequences to facilitate comparison with other results we have. I would like to use BioPerl for other purposes in the future so I'm very interested in getting it ready on my pc >>> >>> I followed the instructions herehttp://www.bioperl.org/wiki/Installing_Bioperl_for_Unix. I managed to install CPAN in root mode (otherwise it wouldn't work) and BioPerl via CPAN. All tests were ok, but when I ran this script to test the installation >>> >>> | use strict; >>> use warnings; >>> >>> use Getopt::Long; >>> use Bio::EnsEMBL::Registry; >>> >>> my $reg = "Bio::EnsEMBL::Registry"; >>> $reg->load_registry_from_db( >>> -host => "ensembldb.ensembl.org", >>> -user => "anonymous" >>> ); >>> my $db_list=$reg->get_all_adaptors(); >>> my @line; >>> >>> foreach my $db (@$db_list){ >>> @line = split ('=',$db); >>> print $line[0]."\n"; >>> } >>> | >>> >>> I got the error:*"Can't locate Bio/EnsEMBL/Registry.pm in @INC"* >>> >>> I tried to install BioPerl again via Build.PL, running as root, but still came to the same outcome. 
>>> >>> Thanks for your help Merche >>> >>> -- >>> ************************************;) >>> Mercedes Castillo >>> INRES, Dept. Molecular Phytomedicine >>> University of Bonn >>> >>> Karlrobert-Kreiten-str 13 >>> 53115 Bonn >>> +49(0)22873-60143 >>> merche at uni-bonn.de >>> ***************************************** >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> > > > -- > ************************************;) > Mercedes Castillo > INRES, Dept. Molecular Phytomedicine > University of Bonn > > Karlrobert-Kreiten-str 13 > 53115 Bonn > +49(0)22873-60143 > merche at uni-bonn.de > ***************************************** Jason Stajich jason.stajich at gmail.com jason at bioperl.org From cjfields at illinois.edu Mon Apr 23 09:51:24 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 23 Apr 2012 13:51:24 +0000 Subject: [Bioperl-l] Problems installing BioPerl & cpan In-Reply-To: <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com> References: <4F954B9F.9020506@uni-bonn.de> <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com> Message-ID: <10108081-D3CC-4A8F-9372-390A2C377088@illinois.edu> Unfortunately the Ensembl code isn't on CPAN (wish it were, it might make things a little easier). chris On Apr 23, 2012, at 8:44 AM, Jason Stajich wrote: > That module really has nothing to do with Fgenesh parsing so I don't think this is a particularly good test script - try one of the scripts that comes in bioperl scripts directory like scripts/utilities/bp_sreformat.pl > > However, you should know that module is part of EnsEMBL not BioPerl - it requires an additional package of modules be installed. 
> > if you use CPAN to install things you can do > cpan> install Bio::EnsEMBL::Registry > > > On Apr 23, 2012, at 5:31 AM, Merche Castillo wrote: > >> I'm posting this message out of pure desperation, because I really don't know what else to try. I'm a beginner in bioperl and I'm working on a script to parse out some results I got from MolQuest fgenesh. Results are out in .txt format and I want to parse them to GFF and fasta file for mRNA and protein sequences to facilitate comparison with other results we have. I would like to use BioPerl for other purposes in the future so I'm very interested in getting it ready on my pc >> >> I followed the instructions herehttp://www.bioperl.org/wiki/Installing_Bioperl_for_Unix. I managed to install CPAN in root mode (otherwise it wouldn't work) and BioPerl via CPAN. All tests were ok, but when I ran this script to test the installation >> >> | use strict; >> use warnings; >> >> use Getopt::Long; >> use Bio::EnsEMBL::Registry; >> >> my $reg = "Bio::EnsEMBL::Registry"; >> $reg->load_registry_from_db( >> -host => "ensembldb.ensembl.org", >> -user => "anonymous" >> ); >> my $db_list=$reg->get_all_adaptors(); >> my @line; >> >> foreach my $db (@$db_list){ >> @line = split ('=',$db); >> print $line[0]."\n"; >> } >> | >> >> I got the error:*"Can't locate Bio/EnsEMBL/Registry.pm in @INC"* >> >> I tried to install BioPerl again via Build.PL, running as root, but still came to the same outcome. >> >> Thanks for your help Merche >> >> -- >> ************************************;) >> Mercedes Castillo >> INRES, Dept. 
Molecular Phytomedicine >> University of Bonn >> >> Karlrobert-Kreiten-str 13 >> 53115 Bonn >> +49(0)22873-60143 >> merche at uni-bonn.de >> ***************************************** >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From l.m.timmermans at students.uu.nl Mon Apr 23 10:16:04 2012 From: l.m.timmermans at students.uu.nl (Leon Timmermans) Date: Mon, 23 Apr 2012 16:16:04 +0200 Subject: [Bioperl-l] Problems installing BioPerl & cpan In-Reply-To: <10108081-D3CC-4A8F-9372-390A2C377088@illinois.edu> References: <4F954B9F.9020506@uni-bonn.de> <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com> <10108081-D3CC-4A8F-9372-390A2C377088@illinois.edu> Message-ID: Yeah, that's fairly useless. Does anyone know the reason for that? Is it just inertia, or is it more? Leon On Mon, Apr 23, 2012 at 3:51 PM, Fields, Christopher J < cjfields at illinois.edu> wrote: > Unfortunately the Ensembl code isn't on CPAN (wish it were, it might make > things a little easier). > > chris > > On Apr 23, 2012, at 8:44 AM, Jason Stajich wrote: > > > That module really has nothing to do with Fgenesh parsing so I don't > think this is a particularly good test script - try one of the scripts that > comes in bioperl scripts directory like scripts/utilities/bp_sreformat.pl > > > > However, you should know that module is part of EnsEMBL not BioPerl - it > requires an additional package of modules be installed. 
> > > > if you use CPAN to install things you can do > > cpan> install Bio::EnsEMBL::Registry > > > > > > On Apr 23, 2012, at 5:31 AM, Merche Castillo wrote: > > > >> I'm posting this message out of pure desperation, because I really > don't know what else to try. I'm a beginner in bioperl and I'm working on a > script to parse out some results I got from MolQuest fgenesh. Results are > out in .txt format and I want to parse them to GFF and fasta file for mRNA > and protein sequences to facilitate comparison with other results we have. > I would like to use BioPerl for other purposes in the future so I'm very > interested in getting it ready on my pc > >> > >> I followed the instructions herehttp:// > www.bioperl.org/wiki/Installing_Bioperl_for_Unix. I managed to install > CPAN in root mode (otherwise it wouldn't work) and BioPerl via CPAN. All > tests were ok, but when I ran this script to test the installation > >> > >> | use strict; > >> use warnings; > >> > >> use Getopt::Long; > >> use Bio::EnsEMBL::Registry; > >> > >> my $reg = "Bio::EnsEMBL::Registry"; > >> $reg->load_registry_from_db( > >> -host => "ensembldb.ensembl.org", > >> -user => "anonymous" > >> ); > >> my $db_list=$reg->get_all_adaptors(); > >> my @line; > >> > >> foreach my $db (@$db_list){ > >> @line = split ('=',$db); > >> print $line[0]."\n"; > >> } > >> | > >> > >> I got the error:*"Can't locate Bio/EnsEMBL/Registry.pm in @INC"* > >> > >> I tried to install BioPerl again via Build.PL, running as root, but > still came to the same outcome. > >> > >> Thanks for your help Merche > >> > >> -- > >> ************************************;) > >> Mercedes Castillo > >> INRES, Dept. 
Molecular Phytomedicine > >> University of Bonn > >> > >> Karlrobert-Kreiten-str 13 > >> 53115 Bonn > >> +49(0)22873-60143 > >> merche at uni-bonn.de > >> ***************************************** > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Jason Stajich > > jason.stajich at gmail.com > > jason at bioperl.org > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Apr 23 10:20:59 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 23 Apr 2012 14:20:59 +0000 Subject: [Bioperl-l] Problems installing BioPerl & cpan In-Reply-To: References: <4F954B9F.9020506@uni-bonn.de> <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com> <10108081-D3CC-4A8F-9372-390A2C377088@illinois.edu> Message-ID: <70FCB632-4CD5-4F28-A6B6-F93507397435@illinois.edu> Not sure, but it may have something to do with the requirement for a very old bioperl (v1.2.3). chris On Apr 23, 2012, at 9:16 AM, Leon Timmermans wrote: > Yeah, that's fairly useless. Does anyone know the reason for that? Is it just inertia, or is it more? > > Leon > > On Mon, Apr 23, 2012 at 3:51 PM, Fields, Christopher J wrote: > Unfortunately the Ensembl code isn't on CPAN (wish it were, it might make things a little easier). 
> > chris > > On Apr 23, 2012, at 8:44 AM, Jason Stajich wrote: > > > That module really has nothing to do with Fgenesh parsing so I don't think this is a particularly good test script - try one of the scripts that comes in bioperl scripts directory like scripts/utilities/bp_sreformat.pl > > > > However, you should know that module is part of EnsEMBL not BioPerl - it requires an additional package of modules be installed. > > > > if you use CPAN to install things you can do > > cpan> install Bio::EnsEMBL::Registry > > > > > > On Apr 23, 2012, at 5:31 AM, Merche Castillo wrote: > > > >> I'm posting this message out of pure desperation, because I really don't know what else to try. I'm a beginner in bioperl and I'm working on a script to parse out some results I got from MolQuest fgenesh. Results are out in .txt format and I want to parse them to GFF and fasta file for mRNA and protein sequences to facilitate comparison with other results we have. I would like to use BioPerl for other purposes in the future so I'm very interested in getting it ready on my pc > >> > >> I followed the instructions herehttp://www.bioperl.org/wiki/Installing_Bioperl_for_Unix. I managed to install CPAN in root mode (otherwise it wouldn't work) and BioPerl via CPAN. All tests were ok, but when I ran this script to test the installation > >> > >> | use strict; > >> use warnings; > >> > >> use Getopt::Long; > >> use Bio::EnsEMBL::Registry; > >> > >> my $reg = "Bio::EnsEMBL::Registry"; > >> $reg->load_registry_from_db( > >> -host => "ensembldb.ensembl.org", > >> -user => "anonymous" > >> ); > >> my $db_list=$reg->get_all_adaptors(); > >> my @line; > >> > >> foreach my $db (@$db_list){ > >> @line = split ('=',$db); > >> print $line[0]."\n"; > >> } > >> | > >> > >> I got the error:*"Can't locate Bio/EnsEMBL/Registry.pm in @INC"* > >> > >> I tried to install BioPerl again via Build.PL, running as root, but still came to the same outcome. 
> >> > >> Thanks for your help Merche > >> > >> -- > >> ************************************;) > >> Mercedes Castillo > >> INRES, Dept. Molecular Phytomedicine > >> University of Bonn > >> > >> Karlrobert-Kreiten-str 13 > >> 53115 Bonn > >> +49(0)22873-60143 > >> merche at uni-bonn.de > >> ***************************************** > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Jason Stajich > > jason.stajich at gmail.com > > jason at bioperl.org > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From rbuels at gmail.com Mon Apr 23 19:49:10 2012 From: rbuels at gmail.com (Robert Buels) Date: Mon, 23 Apr 2012 19:49:10 -0400 Subject: [Bioperl-l] Announcing OBF Google Summer of Code Accepted Students Message-ID: <4F95EA76.4030004@gmail.com> Hello all, I'm very pleased and excited to announce that the Open Bioinformatics Foundation has selected 5 very capable students to work on OBF projects this summer as part of the Google Summer of Code program. 
The accepted students, their projects, and their mentors (in alphabetical order):

Wibowo Arindrarto
    SearchIO Implementation in Biopython
    mentored by Peter Cock

Lenna Peterson
    Diff My DNA: Development of a Genomic Variant Toolkit for Biopython
    mentored by Brad Chapman

Marjan Povolni
    The world's fastest parallelized GFF3/GTF parser in D, and an interfacing biogem plugin for Ruby
    mentored by Pjotr Prins, Francesco Strozzi, Raoul Bonnal

Artem Tarasov
    Fast parallelized GFF3/GTF parser in C++, with Ruby FFI bindings
    mentored by Pjotr Prins, Francesco Strozzi, Raoul Bonnal

Clayton Wheeler
    Multiple Alignment Format parser for BioRuby
    mentored by Francesco Strozzi and Raoul Bonnal

As in every year, we received many great applications and ideas. However, funding and mentor resources are limited, and we were not able to accept as many as we would have liked. Our deepest thanks to all the students who applied: we sincerely appreciate the time and effort you put into your applications, and hope you will still consider being a part of the OBF's open source projects, even without Google funding. I speak for myself and all of the mentors who read and scored applications when I say that we were truly honored by the number and quality of the applications we received. For the accepted students: congratulations! You have risen to the top of a very competitive application process. Now it's time to "put your money where your mouth is", as the saying goes. Let's get out there and write some great code this summer!
Best regards, Rob ---- Robert Buels OBF GSoC 2012 Administrator From Simon.Guest at agresearch.co.nz Mon Apr 30 02:00:26 2012 From: Simon.Guest at agresearch.co.nz (Guest, Simon) Date: Mon, 30 Apr 2012 18:00:26 +1200 Subject: [Bioperl-l] Circular dependency problems packaging BioPerl as RPM Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCECE5EF4@exchsth.agresearch.co.nz> Dear BioPerlers, I am building BioPerl and BioPerl-Run as RPMs, since I have to install them on several servers, and really don't want to run CPAN installation scripts on each machine. It has been a tortuous journey of chasing down dependencies and packaging them (thank goodness for cpanspec), but I think I am nearly done. However, I have hit a circular dependency / incompatibility problem between BioPerl and BioPerl-Run. When building BioPerl-Run-1.006900, it insists on BioPerl-1.6.901: Checking prerequisites... - ERROR: Bio::Root::Version (1.006001) is installed, but we need version >= 1.006900 But then BioPerl-Run-1.006900 has dependencies on Bio::Expression::DataSet Bio::Expression::Platform Bio::Expression::Sample Bio::Expression::Contact which seem to be in BioPerl-1.6.1 but not BioPerl-1.6.901 Does anyone know of this problem? Are there any suggestions for work arounds? cheers, Simon ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From cjfields at illinois.edu Mon Apr 30 09:42:34 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 30 Apr 2012 13:42:34 +0000 Subject: [Bioperl-l] Circular dependency problems packaging BioPerl as RPM In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF34CCECE5EF4@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF34CCECE5EF4@exchsth.agresearch.co.nz> Message-ID: <3E2BD058-EDB0-482B-9727-8DB7FB466737@illinois.edu> The Bio::Expression dependencies are unusual, I'll have to look through and find the modules responsible for pulling these in. When I last ran these no tests failed, so either the dependency is off or no tests have been written for the modules in question. We can always release a new CPAN BioPerl-Run to deal with it. chris On Apr 30, 2012, at 1:00 AM, Guest, Simon wrote: > Dear BioPerlers, > > I am building BioPerl and BioPerl-Run as RPMs, since I have to install them on > several servers, and really don't want to run CPAN installation scripts on > each machine. > > It has been a tortuous journey of chasing down dependencies and packaging them > (thank goodness for cpanspec), but I think I am nearly done. > > However, I have hit a circular dependency / incompatibility problem between > BioPerl and BioPerl-Run. > > When building BioPerl-Run-1.006900, it insists on BioPerl-1.6.901: > Checking prerequisites... > - ERROR: Bio::Root::Version (1.006001) is installed, but we need version >= 1.006900 > > But then BioPerl-Run-1.006900 has dependencies on > Bio::Expression::DataSet > Bio::Expression::Platform > Bio::Expression::Sample > Bio::Expression::Contact > which seem to be in BioPerl-1.6.1 but not BioPerl-1.6.901 > > Does anyone know of this problem? > > Are there any suggestions for work arounds? 
> > cheers, > Simon From hnorpois at googlemail.com Mon Apr 30 12:45:50 2012 From: hnorpois at googlemail.com (Hermann Norpois) Date: Mon, 30 Apr 2012 18:45:50 +0200 Subject: [Bioperl-l] different interpretion of get_seq_by_id by DB::GenBank and DB:Entrez::Gene Message-ID: I am confused by the different interpretations of get_Seq_by_id. Obviously it means something different for the two modules.
Script1:
#!/bin/perl -w
use strict;
use Bio::DB::GenBank;
use Bio::SeqIO;
# Set the output format
my $seqio_obj = Bio::SeqIO->new(-file => '>sequence.fasta', -format => 'fasta');
my $db_obj = Bio::DB::GenBank->new;
my $id = "BC049766"; # accession number
my $seq_obj = $db_obj->get_Seq_by_id($id);
$seqio_obj->write_seq($seq_obj);

Script2:
#!/bin/perl -w
use strict;
use Bio::DB::EntrezGene;
my $id = "Penk1"; # name of the gene
my $db = Bio::DB::EntrezGene->new;
my $seq = $db->get_Seq_by_id($id);
my $ac = $seq->annotation;
for my $ann ($ac->get_Annotations('dblink')) {
    if ($ann->database eq "Evidence Viewer") {
        # get the sequence identifier, the start, and the stop
        my ($contig,$from,$to) = $ann->url =~ /contig=([^&]+).+from=(\d+)&to=(\d+)/;
        print "$contig\t$from\t$to\n";
    }
}

Thank you Hermann Norpois From jimhu at tamu.edu Mon Apr 30 13:38:23 2012 From: jimhu at tamu.edu (Jim Hu) Date: Mon, 30 Apr 2012 12:38:23 -0500 Subject: [Bioperl-l] Gbrowse file uploads, bigwig and chromosome sizes files Message-ID: <1F4B23DC-2CD1-4D61-A6F4-D823B4C7C7D1@tamu.edu> I'm not sure how many of our issues are GBrowse-specific vs. more general BioPerl issues, so I'm cross-posting to both lists. We think we've traced our problems uploading wiggle files to our GBrowse to the failure to create the chromosome.size file.
Short version:
- What is supposed to be in the locationlist? Chromosomes only, or genes as well?
- Why does the chromosome-sizes step try to get everything in the locationlist, whether or not it's a chromosome?
Long version: Our E. coli MG1655 database was loaded several years ago with
bp_seqfeature_load.pl -d gb_MG1655_jh -f -c NC_000913.gb.gff NC_000913.gb.fasta -u -p
The MySQL database has 4,146 entries in the locationlist, where the first one is for the chromosome and the others are named for genes. When we ask GBrowse to generate the chromosome sizes file, instead of doing what I expect (looking up the reference feature names), it tries to get the size of every feature in the locationlist.
I can't actually find the fasta file I used. When this happens, the eval in Bio::Graphics::Browser2::DataLoader dies, because it does not seem to pass allow_aliases to this subroutine in Bio::DB::SeqFeature::Store::DBI::mysql:
sub _name_sql {
    my $self = shift;
    my ($name,$allow_aliases,$join) = @_;
    my $name_table = $self->_name_table;
    my $from = "$name_table as n";
    my ($match,$string) = $self->_match_sql($name);
    my $where = "n.id=$join AND n.name $match";
    $where .= " AND n.display_name>0" unless $allow_aliases;
    return ($from,$where,'',$string);
}
Here's the backtrace: CHROMOSOME SIZES at /usr/local/share/perl/5.10.1/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 942, referer: Bio::DB::SeqFeature::Store::DBI::mysql::_name_sql('Bio::DB::SeqFeature::Store::DBI::mysql=HASH(0xb2bfed0)', 'b0001', undef, 'f.id') called at /usr/local/share/perl/5.10.1/Bio/DB/SeqFeature/Store/DBI/mysql.pm Bio::DB::SeqFeature::Store::DBI::mysql::_features('Bio::DB::SeqFeature::Store::DBI::mysql=HASH(0xb2bfed0)', '-name', 'b0001', '-class', undef, '-aliases', undef, Bio::DB::SeqFeature::Store::get_features_by_name('Bio::DB::SeqFeature::Store::DBI::mysql=HASH(0xb2bfed0)', '-name', 'b0001') called at /usr/local/share/perl/5.10.1/Bio/DB/SeqFeature/Store.pm line Bio::DB::SeqFeature::Store::segment('Bio::DB::SeqFeature::Store::DBI::mysql=HASH(0xb2bfed0)', 'b0001') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/DataLoader.pm line 171, eval {...} called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/DataLoader.pm line 169, Bio::Graphics::Browser2::DataLoader::generate_chrom_sizes('Bio::Graphics::Browser2::DataLoader=HASH(0xb2bfbd0)', '/var/tmp/gbrowse2/chrom_sizes/MG1655.sizes') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/DataLoader.pm line 143, Bio::Graphics::Browser2::DataLoader::chrom_sizes('Bio::Graphics::Browser2::DataLoader=HASH(0xb2bfbd0)') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/Action.pm line 1117, referer:
Bio::Graphics::Browser2::Action::ACTION_chrom_sizes('Bio::Graphics::Browser2::Action=REF(0xa993ea0)', 'CGI=HASH(0xaf57450)') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/Render.pm line 427, Bio::Graphics::Browser2::Render::asynchronous_event('Bio::Graphics::Browser2::Render::HTML=HASH(0xaf590d8)') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/Render.pm line 356, referer: Bio::Graphics::Browser2::Render::run_asynchronous_event('Bio::Graphics::Browser2::Render::HTML=HASH(0xaf590d8)') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/Render.pm line 274, referer: Bio::Graphics::Browser2::Render::run('Bio::Graphics::Browser2::Render::HTML=HASH(0xaf590d8)') called at /usr/lib/cgi-bin/gb2/gbrowse line 50, referer: ===================================== Jim Hu Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From hnorpois at googlemail.com Mon Apr 30 14:06:40 2012 From: hnorpois at googlemail.com (Hermann Norpois) Date: Mon, 30 Apr 2012 20:06:40 +0200 Subject: [Bioperl-l] Retrieving promoter sequenc Message-ID: Dear list, I am trying to write a script that retrieves the 700 bp sequence upstream of the 5' end of a gene, i.e. upstream of the transcription start site (a putative promoter sequence).
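Whatever accession ends up in $chr_acc_ver (the open question in this message), the upstream window itself is simple arithmetic. A minimal sketch, assuming $tss is the 1-based reference coordinate of the transcription start site and that the gene's strand is known; upstream_window is a hypothetical helper, not a BioPerl function. Unlike the $from-700 .. $from range in the script below, it excludes the TSS base itself and handles minus-strand genes, whose upstream region lies at higher coordinates.

```perl
use strict;
use warnings;

# Hypothetical helper: 1-based, inclusive coordinates of the $len bp window
# immediately upstream of a TSS at position $tss on strand $strand (+1/-1).
# The TSS base itself is excluded; on the minus strand "upstream" means
# higher coordinates.
sub upstream_window {
    my ($tss, $strand, $len) = @_;
    my ($start, $stop);
    if ($strand >= 0) {
        ($start, $stop) = ($tss - $len, $tss - 1);
    }
    else {
        ($start, $stop) = ($tss + 1, $tss + $len);
    }
    # Clamp at the first base of the reference; the upper end is not
    # clamped because the reference length is not known here.
    $start = 1 if $start < 1;
    return ($start, $stop);
}

my ($start, $stop) = upstream_window(1000, 1, 700);
print "$start\t$stop\n";    # prints 300 and 999, tab-separated
```

The two values would then go to -seq_start and -seq_stop of Bio::DB::GenBank->new, as in the script; for minus-strand genes the -strand option (commented out there) would also be needed, with the strand values documented for Bio::DB::GenBank.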
This page gave me some information on how to do so (sections *Using Bio::DB::EntrezGene to get genomic coordinates* AND *Using Bio::DB::GenBank when you have genomic coordinates to get a Seq object*): http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences Actually, I have no idea how to define $chr_acc_ver (see below):
#!/bin/perl -w
use strict;
use Bio::DB::EntrezGene;
use Bio::SeqIO;
use Bio::DB::GenBank;

my $id = "12064"; # bdnf
my $seqio_obj = Bio::SeqIO->new(-file => '>s2.fasta', -format => 'fasta' );
my $db = Bio::DB::EntrezGene->new;
my $seq = $db->get_Seq_by_id($id);
my $ac = $seq->annotation;
for my $ann ($ac->get_Annotations('dblink' )) {
    if ($ann->database eq "Evidence Viewer") {
        # get the sequence identifier, the start, and the stop
        my ($contig,$from,$to) = $ann->url =~ /contig=([^&]+).+from=(\d+)&to=(\d+)/;
        my $chr_start = $from-700;
        my $chr_stop = $from;
        my $gb = Bio::DB::GenBank->new(-format => 'genbank',
                                       -seq_start => $chr_start,
                                       -seq_stop => $chr_stop,
                                       # -strand => $strand
                                      );
        my $obj = $gb->get_Seq_by_id($chr_acc_ver); # How do I define $chr_acc_ver?
        $seqio_obj->write_seq($obj);
        # print "$contig\t$from\t$to\n$chr_start\t$chr_stop\n";
    }
}
Can anybody give me a hint how this might work? Thanks Hermann Norpois From maquino at knome.com Mon Apr 30 15:15:26 2012 From: maquino at knome.com (Mark Aquino) Date: Mon, 30 Apr 2012 15:15:26 -0400 Subject: [Bioperl-l] unblessed reference in $sam->pileup error. Message-ID: <984E64E7-DF37-4BD1-BB84-DC86816A42E8@knome.com> Hi all, I'm trying to call all bases from a BAM file and count their depths. At first I was doing this by getting all alignments that cover a certain region, but I realized that writing the logic to detect indels via the CIGAR string was a bit more complicated than I thought, so I decided to try the pileup method from Bio::DB::Sam / Bio::DB::Bam::Pileup. However, I am getting this error: Can't call method "b" on unblessed reference at ./coverageDepths.pl line 114, line 1.
when trying to use the $pileup->alignment method. Does anyone have any idea what I'm missing?
109 $sam->pileup('1:550968-550969',
110     sub {
111         my ($seqid,$pos,$pileup) = @_;
112         for my $p (@$pileup){
113             if ($p->indel){ print "INDEL!\n"};
114             my $b = $pileup->b;
115             my $qbase = substr($b->qseq, $pileup->qpos,1);
116             print "$qbase\n";
117         }
118     });
Thanks, Mark From maquino at knome.com Mon Apr 30 15:18:35 2012 From: maquino at knome.com (Mark Aquino) Date: Mon, 30 Apr 2012 15:18:35 -0400 Subject: [Bioperl-l] unblessed reference on sam->pileup Message-ID: <33D55D60-7986-4971-9802-47AB9CDE3E24@knome.com> Never mind, as usual 5 seconds after sending an email to the group I realized what I was doing wrong the whole time. From Simon.Guest at agresearch.co.nz Mon Apr 30 23:29:58 2012 From: Simon.Guest at agresearch.co.nz (Guest, Simon) Date: Tue, 1 May 2012 15:29:58 +1200 Subject: [Bioperl-l] Circular dependency problems packaging BioPerl as RPM In-Reply-To: <3E2BD058-EDB0-482B-9727-8DB7FB466737@illinois.edu> References: <18DF7D20DFEC044098A1062202F5FFF34CCECE5EF4@exchsth.agresearch.co.nz> <3E2BD058-EDB0-482B-9727-8DB7FB466737@illinois.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCECE61BE@exchsth.agresearch.co.nz> > -----Original Message----- > From: Fields, Christopher J [mailto:cjfields at illinois.edu] > Sent: Tuesday, 1 May 2012 1:43 a.m. > To: Guest, Simon > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Circular dependency problems packaging BioPerl as > RPM > > The Bio::Expression dependencies are unusual, I'll have to look through and > find the modules responsible for pulling these in. When I last ran these no > tests failed, so either the dependency is off or no tests have been written for > the modules in question. > > We can always release a new CPAN BioPerl-Run to deal with it. Hi Chris, I ignored the Bio::Expression dependencies and everything eventually built OK, using BioPerl-1.6.901 and BioPerl-Run-1.006900.
If you release a new BioPerl-Run, I would be interested in packaging it, as I have come this far. Do you have any ideas about where I could submit the BioPerl and dependency RPMs I built for CentOS 6? I now have around 40 RPMs that weren't in CentOS or EPEL, which were all built straight from CPAN using cpanspec. I guess others might like to benefit from this (and it would also serve to validate the builds). My other unknown is what non-Perl dependencies I should add to the BioPerl RPM. I don't know what to do here. The dependencies page on the BioPerl Wiki seems to list only Perl module dependencies. cheers, Simon From exceptlowang at gmail.com Tue Apr 17 20:00:08 2012 From: exceptlowang at gmail.com (Tim White) Date: Wed, 18 Apr 2012 00:00:08 -0000 Subject: [Bioperl-l] Bio::SeqIO::tab deletes gap characters when reading sequences, which is inconvenient Message-ID: <4F8E03FE.7000506@gmail.com> Hi, Bio::SeqIO::tab (what you get when specifying -format => 'tab' to Bio::SeqIO->new()) is perfect for converting sequences into a one-per-line format, so that standard line-oriented UNIX tools (grep, comm etc.) work as expected. Except... I just discovered that it deletes gap ("-") characters when reading sequences, so it can't be used to round-trip any files that contain these.
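The gap loss described here can be reproduced without BioPerl. This sketch compares the s/\W//g cleanup that Tim quotes from Bio::SeqIO::tab's next_seq() further down with the s/\s//g alternative he proposes; strip_nonword and strip_whitespace are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers comparing the two cleanup regexes discussed here.
sub strip_nonword    { my ($s) = @_; $s =~ s/\W//g; return $s }  # current next_seq() cleanup
sub strip_whitespace { my ($s) = @_; $s =~ s/\s//g; return $s }  # proposed replacement

my $raw = "ACGT--AC.GT*\r";    # gaps, a period, a stop and a stray Windows CR
print strip_nonword($raw), "\n";       # ACGTACGT
print strip_whitespace($raw), "\n";    # ACGT--AC.GT*
```

Both versions still drop the trailing \r, which is the case Tim mentions; only the whitespace version keeps "-", "." and "*". Neither removes underscores, since \w counts them as word characters.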
This is a source of grief as I frequently work with FASTA files that contain aligned sequences, and thus gap characters. This is all because the next_seq() function in Bio::SeqIO::tab.pm contains the line: $seq =~ s/\W//g; which removes all non-alphanumeric characters from the sequence data. IMHO it would be *much* better if this was changed to: $seq =~ s/\s//g; which simply removes all whitespace characters (particularly including the \r that often appears at the ends of lines on text files that have visited Windows), enabling gap characters (and, for example, periods and asterisks) to be preserved. Alternatively, you could simply get rid of this line of code and allow whitespace characters through. I'm not sure whether this counts as a "bug", as a cursory search didn't turn up any docs explaining precisely what characters are and aren't preserved by classes implementing Bio::SeqIO, but it's certainly inconsistent (at least Bio::SeqIO::fasta, and Bio::SeqIO::table, with columns and delimiters set up appropriately, allow round-tripping of files containing gap characters) as well as extremely inconvenient for me personally, and I suspect for others. Assuming no harm would be done by making the above change, what's the best thing to do to get this changed? I've simply edited my own local copy of tab.pm to make the above change, but obviously if others agree I'd like to get the change done upstream. Thanks, Tim From mohammadali.alavi at edu.uni-graz.at Sat Apr 21 05:22:21 2012 From: mohammadali.alavi at edu.uni-graz.at (Alavi, Mohammadali (0313xxx)) Date: Sat, 21 Apr 2012 11:22:21 +0200 Subject: [Bioperl-l] piping values into an existing GENBANK file Message-ID: <70DA93B804A15C4387B05DEF33BC255701A1CE149E36@MSIGI.stud.ad.uni-graz.at> Hello All, I have a GENBANK file already, to which I need to add some features. To be precise, I want to add data (the COG functions) to the CDSs present in the GENBANK file.
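Regarding the COG question in this message: the nested for $item (@COGlist) loop in the script further down adds every value to every feature. One way to pair them up is to walk both lists with a single index. This is a sketch under two assumptions: only features whose primary_tag is 'CDS' should be annotated, and @COGlist is already in CDS order. annotate_cds_in_order is a hypothetical helper; it only calls primary_tag and add_tag_value, the standard feature methods the original script already works with.

```perl
use strict;
use warnings;

# Hypothetical helper: give the i-th CDS feature the i-th value, in file order.
# Works on any objects with primary_tag() and add_tag_value(), e.g. the
# features returned by $seq_object->get_SeqFeatures.
sub annotate_cds_in_order {
    my ($features, $values, $tag) = @_;
    $tag = 'note' unless defined $tag;
    my @cds = grep { $_->primary_tag eq 'CDS' } @$features;
    warn scalar(@cds) . " CDS features but " . scalar(@$values) . " values\n"
        if @cds != @$values;
    # One index for both lists instead of two nested loops
    for my $i (0 .. $#cds) {
        last if $i > $#$values;
        $cds[$i]->add_tag_value($tag, $values->[$i]);
    }
    return scalar @cds;
}
```

In the script, the call would be something like annotate_cds_in_order([ $seq_object->get_SeqFeatures ], \@COGlist); followed by writing $seq_object out with write_seq on a Bio::SeqIO object opened with -format => 'genbank'. Note also that qw(motility General metabolism nunknown) splits on whitespace, so "General metabolism" becomes two separate elements; a quoted list such as ('motility', 'General metabolism', 'unknown') keeps it as one.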
The data (COG functions) I need to add is stored in an array, in order: the first value belongs to the first CDS in the GENBANK file, the second value to the second CDS, and so on. I tried adding the data in tag/value style to the CDSs (as described in the HOWTO:Features-Annotation provided by BioPerl), which basically works. The problem, though, is that I do not know how to tell Perl/BioPerl to take one single value at a time, add it in tag/value style to a CDS, then take the next (and only the next) value and add it to the NEXT CDS, and so on. Here is the code I used. As you can see, using for $item (@array) is not appropriate, since it adds all the values of my array to all CDSs! So is there a way of piping values one after another into successive CDSs in a file using BioPerl? Or maybe another way of doing it in regular Perl? I would appreciate any help on that very much! BioPerl I'm using: 1.6.1. ActivePerl I'm using: 5.12.4 (on Windows Vista).
#!/bin/perl
use Bio::SeqIO;
use Bio::SeqFeature::Generic;
use warnings;
@COGlist = qw(motility General metabolism nunknown); # think of this as the
# array I would like to add the values of to my file; the real one of course has
# as many values as the number of CDSs in the GENBANK file
$seqio_object = Bio::SeqIO -> new(-file => "file.gbk", -format => "genbank");
$seq_object = $seqio_object -> next_seq;
for $feat_object ($seq_object -> get_SeqFeatures){
    for $item (@COGlist){ # this would add all elements of the array to all CDSs and is therefore wrong!
        $feat_object -> add_tag_value("note", $item);
    }
    for $tags ($feat_object -> get_all_tags){
        print "tag:".$tags . "\n";
        for $values ($feat_object -> get_tag_values($tags)){
            print "value: " . $values . "\n"; # as one might imagine this does not give the output I have been looking for :-))
        }
    }
}
From huansheng.xu at gmail.com Sun Apr 22 10:15:44 2012 From: huansheng.xu at gmail.com (Huansheng Xu) Date: Sun, 22 Apr 2012 10:15:44 -0400 Subject: [Bioperl-l] configuration problem with Bio::Tools::Run::Alignment::ClustalW Message-ID: Hi, I am a postdoc fellow at Massachusetts General Hospital in Boston. I am writing to seek help with the Bio::Tools::Run::Alignment::Clustalw module available at the BioPerl website. I tried to align some DNA sequences contained in a FASTA file with the module embedded in a program (as shown below), but got stuck there. The program works very well for protein sequences. I think maybe I need to configure the module specifically for DNA, but I do not know how to do that. Could you take a look and let me know how to do the configuration? Thanks a lot! Best, Huansheng Xu
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
#!/usr/bin/perl
use Bio::Perl;
use Bio::SeqIO;
use Bio::SearchIO;
use Bio::AlignIO;
use Bio::Tools::Run::Alignment::Clustalw;
use warnings;
use strict;
my $filename = $ARGV[0];
die "Usage: $0 \n" unless $filename;
die "File $filename not found.\n" unless -f $filename;
# Read the list of raw sequences from the file you feed the program
my $fh = Bio::SeqIO->newFh(-file=>$filename, -format=>'fasta');
my @seq_array=<$fh>;
# pass the parameters and generate a factory to run the alignment with ClustalW
my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
@params = ('ktuple' => 2, 'dnamatrix' => 'IUB') if ($seq_array[0]->alphabet eq 'dna');
my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
my $seq_array_ref = \@seq_array;
my $aln = $factory->align($seq_array_ref);
# create a new AlignIO object
my $out = Bio::AlignIO->new(-file=> ">$filename.aln", -format=> 'clustalw');
$out->write_aln($aln);
From bubli_thakur at rediffmail.com Fri Apr 20 22:59:50 2012 From: bubli_thakur at rediffmail.com (subarna thakur) Date: 21 Apr 2012 02:59:50 -0000 Subject: [Bioperl-l] codon usage Message-ID: <20120421025950.8579.qmail@f4mail-235-122.rediffmail.com> I am writing a script for determining the number of genes containing a particular codon. The codons are listed in a separate file. The output comes out all right for the first codon in the file, but for the other codons the script does not work. Please suggest the error in the script.
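A likely cause, judging from the script that follows: the inner while ($seq1 = $seqio_obj->next_seq) loop consumes the whole FASTA stream while the first codon is processed, so for every later codon next_seq returns nothing. $codon is also never reset between codons, so totals accumulate. A minimal sketch of the counting step, assuming the sequences are read into memory once and only reading frame 1 matters (as in the original substr loop); count_seqs_with_codon is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: how many of the sequences contain $codon in frame 1?
# $seqs is an array ref of plain sequence strings, read once up front.
sub count_seqs_with_codon {
    my ($seqs, $codon) = @_;
    $codon = uc $codon;
    my $count = 0;
    for my $seq (@$seqs) {
        my $str = uc $seq;
        # Step through complete codons only (offsets 0, 3, 6, ...)
        for (my $k = 0; $k + 3 <= length($str); $k += 3) {
            if (substr($str, $k, 3) eq $codon) {
                $count++;
                last;    # count each sequence at most once
            }
        }
    }
    return $count;
}

my @seqs = ('ATGAAATAG', 'ATGCCCTGA', 'AATGAA');
for my $codon (qw(ATG TAG TGA)) {
    print "$codon\t", count_seqs_with_codon(\@seqs, $codon), "\n";   # ATG 2, TAG 1, TGA 1
}
```

In the original script this means collecting $seq1->seq from $seqio_obj->next_seq into an array before the while (my $line = <$fh2>) loop, then calling the helper once per chomped codon line.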
The script is as follows ----
#!/usr/bin/perl -w
use Bio::SeqIO;
$file2="table.txt";
$codon=0;
open OUT, ">out-test.txt" or die $!;
$seqio_obj = Bio::SeqIO->new( -file => "gopi2.txt" , '-format' => 'Fasta');
open( my $fh2, $file2 ) or die "$!";
while( my $line = <$fh2> ){
    $acc=$line;
    chomp $acc;
    while ($seq1= $seqio_obj->next_seq){
        my @output = $seq1->id;
        my $string = $seq1->seq;
        $v=0;
        $l= length($string);
        $t=$l/3;
        $k=0;
        for ($i=1; $i <= $t; $i++){
            @array2 = substr($string, $k, 3);
            $k=$k+3;
            foreach $value (@array2) {
                if ($value eq "$acc") {
                    print OUT " The sequence id is @output\n";
                    print OUT "$acc codon found in position $i\n\n";
                    $v=$v+1;
                }
            }
        }
        if ($v==0) { $h=0; } else { $h=1; }
        $codon=$codon+$h;
    }
    print OUT "Total number of sequences with $acc codon";
    print OUT "\t";
    print OUT $codon;
}
exit;
From msprasad693 at gmail.com Thu Apr 26 08:16:39 2012 From: msprasad693 at gmail.com (prasad ms) Date: Thu, 26 Apr 2012 17:46:39 +0530 Subject: [Bioperl-l] Bioperl for global alignment Message-ID: Hello sir, I am Prasad, student of MS in bioinformatics. I am doing my final year project, and sequence alignment is the part of my project. I am having nearly 50k sequences and i want to do a pairwise global alignment (NW alignment). I read the bioperl tutorial. But in that there is no mention about this. Could you please guide how can i do this type of alignment using bioperl. I assure that all the usage is purely for academic. Looking forward to hear from you. Thank you, Regards, Prasad MS From msprasad693 at gmail.com Mon Apr 30 01:40:43 2012 From: msprasad693 at gmail.com (prasad ms) Date: Mon, 30 Apr 2012 11:10:43 +0530 Subject: [Bioperl-l] Fwd: Bioperl for global alignment In-Reply-To: References: Message-ID: Hello sir, I am Prasad, student of MS in bioinformatics. I am doing my final year project, and sequence alignment is the part of my project. I am having nearly 50k sequences and i want to do a pairwise global alignment (NW alignment). I read the bioperl tutorial.
But in that there is no mention about this. Could you please guide how can i do this type of alignment using bioperl. I assure that all the usage is purely for academic. Looking forward to hear from you. Thank you, Regards, Prasad MS From bartwiegmans at gmail.com Sun Apr 1 07:55:01 2012 From: bartwiegmans at gmail.com (Bart Wiegmans) Date: Sun, 1 Apr 2012 13:55:01 +0200 Subject: [Bioperl-l] Implementing Bioperl6 for GSoC 2012 Message-ID: Hello all BioPerl-ers, This is my first e-mail to the list and thus my introduction. My name is Bart Wiegmans, I study biology at the University of Groningen, the netherlands. It is my goal to implement bioperl6 this summer as part of the GSoC program. Why would I want such a thing? For a start, I'd like to learn more about bioinformatics. As I told you I study biology, so this has an obvious advantage for me. Also, I'd like to learn perl6 well, and this is only possible when one writes a significant program in it. Moreover, I think perl6 is awesome, and having a real-world toolkit like bioperl out there might just be enough to develop a significant community using it. As a third, I think perl5's object support is crufty, and difficult to learn for many people. These people include biologists who might not be inclined to learn it, and rather use some other tools instead. As to who I am, I already told you my name. I am 24 years old, and study biology at an undergraduate level. (For those interested, yes this means I haven't exactly been flying through my courses :-)). I have been programming computers ever since I was 16 years old, and earlier if you count BASIC. Starting out with C, most of that has been websites (in PHP), scripts (in Perl), and other smallish programs (in Java / Perl). For example, I implemented a parser and decoder for the dirac video specification as part of GSoC 2008, and a script which reads the NIH bookshelf website and translates this into ePub e-books. Read quite a few of them that way. 
Aside from my motivation and capabilities, two other factors somewhat complicate my involvement with GSoC. The first is that the academic year ends halfway in July in the netherlands, not in may as in the USA and in many other countries. This means that I am not 'free' in a real sense before that time. Also, I have a day job as a PHP programmer for a local online students' magazine, which also takes some time. Which is unfortunate, because I'd rather spend my time writing useful programs; hence, if you would accept me as a student I plan to take leave from this job during the period of GSoC. Anyway, I realize this has been enough information for any interested reader. If there is any interest on your side, I frequent freenode under the nickname brrt. Other than that and this e-mail address, I don't have much of an online presence. Kind regards, Bart Wiegmans From l.m.timmermans at students.uu.nl Sun Apr 1 10:38:13 2012 From: l.m.timmermans at students.uu.nl (Leon Timmermans) Date: Sun, 1 Apr 2012 16:38:13 +0200 Subject: [Bioperl-l] Implementing Bioperl6 for GSoC 2012 In-Reply-To: References: Message-ID: On Sun, Apr 1, 2012 at 1:55 PM, Bart Wiegmans wrote: > This is my first e-mail to the list and thus my introduction. My name > is Bart Wiegmans, I study biology at the University of Groningen, the > netherlands. It is my goal to implement bioperl6 this summer as part > of the GSoC program. > Cool. Though I am wondering what exactly you want to implement. BioPerl as a whole is 2000 modules, not even a dozen GSOC students could implement that. You will have to focus on something. Also, I'd like to learn perl6 well, and this > is only possible when one writes a significant program in it. > Moreover, I think perl6 is awesome, and having a real-world toolkit > like bioperl out there might just be enough to develop a significant > community using it. How much time do you expect that to cost? 
Having to learn a new language means you will get less done that you would ordinarily. This doesn't have to be a problem, but do keep it into account. > As a third, I think perl5's object support is > crufty, and difficult to learn for many people. These people include > biologists who might not be inclined to learn it, and rather use some > other tools instead. > Perl 5's object support can be quite elegant with modern OO frameworks such as Moose and relatives. Sadly, BioPerl itself is based on fairly dated paradigms. Aside from my motivation and capabilities, two other factors somewhat > complicate my involvement with GSoC. The first is that the academic > year ends halfway in July in the netherlands, not in may as in the USA > and in many other countries. This means that I am not 'free' in a real > sense before that time. Also, I have a day job as a PHP programmer for > a local online students' magazine, which also takes some time. Which > is unfortunate, because I'd rather spend my time writing useful > programs; hence, if you would accept me as a student I plan to take > leave from this job during the period of GSoC. > Yeah, I'm familiar with that problem, it's rather unfortunate. > Anyway, I realize this has been enough information for any interested > reader. If there is any interest on your side, I frequent freenode > under the nickname brrt. Other than that and this e-mail address, I > don't have much of an online presence. > Well, then come join us at #bioperl and #perl6 then. 
Leon From cjfields at illinois.edu Sun Apr 1 21:57:53 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 2 Apr 2012 01:57:53 +0000 Subject: [Bioperl-l] Implementing Bioperl6 for GSoC 2012 In-Reply-To: References: Message-ID: <6D16BB69-A74A-4A7E-B81C-0B33E9A73473@illinois.edu> Bart, I think this is a good idea, but to address one concern up-front: I believe the time frame (possible start in mid-July) will not be in your favor unless you can participate according to Google's schedule. There is some flexibility at the beginning, mainly during the community bonding period, but beyond that there isn't much wiggle room; we're as constrained as you are. The proposals to OBF over the last few years have been highly competitive, and this year will prove even more so as a large number of major Perl-based FLOSS orgs (primarily The Perl Foundation) were not accepted into GSoC. Now for Perl 6: BioPerl6 is a project Philip Mabon and I have already started up on github: https://github.com/cjfields/bioperl6 The code is basically proof-of-concept stuff at this point, and I don't believe it's current to spec last I checked or that it works on either the most current niecza or rakudo. All this was enough of a moving target to make it hard to maintain, but that seems to be settling somewhat now. It's pretty wide open, though, as far as I'm concerned. If you are to pursue this as a proposal, you will need to focus on a specific task or set of classes to implement along with docs and tests. A straight port isn't the best option; you should take advantage of Perl 6's strengths and possibly work on redesign where needed, keeping in mind that some aspects of bioperl might be considered a bit over-designed and out-of-date w/ re: to modern perl dictums. Also, learning a new language is nice, but that isn't the main focus for any GSoC project. 
At the end of the day, we should have usable code for the community, and if it acts as an entry point for more people to get involved with Perl 6, all the better :) (I see that Leon has also chimed in on this with similar comments) I will be on and off #bioperl this week (pyrimidine). IRC is also logged in case I need to backlog (provided by one Moritz Lenz): http://irclog.perlgeek.de/bioperl/today chris On Apr 1, 2012, at 6:55 AM, Bart Wiegmans wrote: > Hello all BioPerl-ers, > > This is my first e-mail to the list and thus my introduction. My name > is Bart Wiegmans, I study biology at the University of Groningen, the > netherlands. It is my goal to implement bioperl6 this summer as part > of the GSoC program. > > Why would I want such a thing? For a start, I'd like to learn more > about bioinformatics. As I told you I study biology, so this has an > obvious advantage for me. Also, I'd like to learn perl6 well, and this > is only possible when one writes a significant program in it. > Moreover, I think perl6 is awesome, and having a real-world toolkit > like bioperl out there might just be enough to develop a significant > community using it. As a third, I think perl5's object support is > crufty, and difficult to learn for many people. These people include > biologists who might not be inclined to learn it, and rather use some > other tools instead. > > As to who I am, I already told you my name. I am 24 years old, and > study biology at an undergraduate level. (For those interested, yes > this means I haven't exactly been flying through my courses :-)). I > have been programming computers ever since I was 16 years old, and > earlier if you count BASIC. Starting out with C, most of that has been > websites (in PHP), scripts (in Perl), and other smallish programs (in > Java / Perl).
For example, I implemented a parser and decoder for the > dirac video specification as part of GSoC 2008, and a script which > reads the NIH bookshelf website and translates this into ePub e-books. > Read quite a few of them that way. > > Aside from my motivation and capabilities, two other factors somewhat > complicate my involvement with GSoC. The first is that the academic > year ends halfway in July in the netherlands, not in may as in the USA > and in many other countries. This means that I am not 'free' in a real > sense before that time. Also, I have a day job as a PHP programmer for > a local online students' magazine, which also takes some time. Which > is unfortunate, because I'd rather spend my time writing useful > programs; hence, if you would accept me as a student I plan to take > leave from this job during the period of GSoC. > > Anyway, I realize this has been enough information for any interested > reader. If there is any interest on your side, I frequent freenode > under the nickname brrt. Other than that and this e-mail address, I > don't have much of an online presence. > > Kind regards, > Bart Wiegmans > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From joseramonblas at gmail.com Mon Apr 2 01:21:55 2012 From: joseramonblas at gmail.com (José Ramón Blas Pastor) Date: Mon, 2 Apr 2012 07:21:55 +0200 Subject: [Bioperl-l] data into R from Perl arrays Message-ID: Hi, a very simple doubt, but I do not know how to manage this. I want to plot a histogram for all data in 'datos.txt'. a) by using R: datos<-scan("datos.txt") pdf("xh.pdf") hist(datos) dev.off() b) How could I invoke R inside Perl to do the same?? #!/usr/bin/perl open(DAT,"datos.txt"); while (<DAT>) { chomp; push(@datos,$_); } #now I want a histogram of values in @datos Thanks!! JR -- José Ramón Blas - PhD Dept.
Biochemistry - Medicine School University of Castilla-La Mancha C Almansa, 14 02006 Albacete (Spain) Phone: +34 967599200 ext. 2958 From avilella at gmail.com Mon Apr 2 02:24:09 2012 From: avilella at gmail.com (Albert Vilella) Date: Mon, 2 Apr 2012 07:24:09 +0100 Subject: [Bioperl-l] data into R from Perl arrays In-Reply-To: References: Message-ID: You may get a faster answer on a generic programming site, like stackoverflow.com... On Mon, Apr 2, 2012 at 6:21 AM, José Ramón Blas Pastor wrote: > Hi, > > a very simple doubt, but I do not know how to manage this. > > I want to plot a histogram for all data in 'datos.txt'. > > a) by using R: > datos<-scan("datos.txt") > pdf("xh.pdf") > hist(datos) > dev.off() > > > b) How could I invoke R inside Perl to do the same?? > #!/usr/bin/perl > open(DAT,"datos.txt"); > while (<DAT>) { > chomp; > push(@datos,$_); > } > #now I want a histogram of values in @datos > > Thanks!! > > JR > > -- > José Ramón Blas - PhD > Dept. Biochemistry - Medicine School > University of Castilla-La Mancha > C Almansa, 14 > 02006 Albacete (Spain) > > Phone: +34 967599200 ext.
2958 From awitney at sgul.ac.uk Mon Apr 2 04:17:56 2012 From: awitney at sgul.ac.uk (Adam Witney) Date: Mon, 2 Apr 2012 09:17:56 +0100 Subject: [Bioperl-l] data into R from Perl arrays In-Reply-To: References: Message-ID: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk> The quickest way to do this specific example is probably to just save your R commands to a file (eg R_commands.R) and then run some kind of system/exec/backtick function in your perl script to invoke R, something like (untested): #!/usr/bin/perl use strict; use warnings; system( 'R --file=R_commands.R' ); Alternatively if you want perl and R to be able to interact and pass data back and forth, you could use something like Statistics::R http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm HTH adam On 2 Apr 2012, at 06:21, José Ramón Blas Pastor wrote: > Hi, > > a very simple doubt, but I do not know how to manage this. > > I want to plot a histogram for all data in 'datos.txt'. > > a) by using R: > datos<-scan("datos.txt") > pdf("xh.pdf") > hist(datos) > dev.off() > > > b) How could I invoke R inside Perl to do the same?? > #!/usr/bin/perl > open(DAT,"datos.txt"); > while (<DAT>) { > chomp; > push(@datos,$_); > } > #now I want a histogram of values in @datos > > Thanks!! > > JR > > -- > José Ramón Blas - PhD > Dept. Biochemistry - Medicine School > University of Castilla-La Mancha > C Almansa, 14 > 02006 Albacete (Spain) > > Phone: +34 967599200 ext.
2958 From roy.chaudhuri at gmail.com Mon Apr 2 06:33:06 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 02 Apr 2012 11:33:06 +0100 Subject: [Bioperl-l] data into R from Perl arrays In-Reply-To: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk> References: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk> Message-ID: <4F798062.4050908@gmail.com> Alternatively you could go for a Perl-only approach using something like GD::Graph::Histogram. Cheers, Roy. On 02/04/2012 09:17, Adam Witney wrote: > > The quickest way to do this specific example is probably to just save > your R commands to a file (eg R_commands.R) and then run some kind of > system/exec/backtick function in your perl script to invoke R, > something like (untested): > > #!/usr/bin/perl use strict; use warnings; system( 'R --file=R_commands.R' ); > > Alternatively if you want perl and R to be able to interact and pass > data back and forth, you could use something like Statistics::R > > http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm > > HTH > > adam > > On 2 Apr 2012, at 06:21, José Ramón Blas Pastor wrote: > >> Hi, >> >> a very simple doubt, but I do not know how to manage this. >> >> I want to plot a histogram for all data in 'datos.txt'. >> >> a) by using R: datos<-scan("datos.txt") pdf("xh.pdf") hist(datos) >> dev.off() >> >> >> b) How could I invoke R inside Perl to do the same?? >> #!/usr/bin/perl open(DAT,"datos.txt"); while (<DAT>) { chomp; >> push(@datos,$_); } #now I want a histogram of values in @datos >> >> Thanks!! >> >> JR >> >> -- José Ramón Blas - PhD Dept. Biochemistry - Medicine School >> University of Castilla-La Mancha C Almansa, 14 02006 Albacete >> (Spain) >> >> Phone: +34 967599200 ext.
2958 From cjfields at illinois.edu Mon Apr 2 08:59:40 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 2 Apr 2012 12:59:40 +0000 Subject: [Bioperl-l] data into R from Perl arrays In-Reply-To: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk> References: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk> Message-ID: Not sure how well it is supported, but there is also Statistics::useR (which has an XS layer for conversing with R). https://metacpan.org/module/Statistics::useR chris On Apr 2, 2012, at 3:17 AM, Adam Witney wrote: > > The quickest way to do this specific example is probably to just save your R commands to a file (eg R_commands.R) and then run some kind of system/exec/backtick function in your perl script to invoke R, something like (untested): > > #!/usr/bin/perl > use strict; > use warnings; > system( 'R --file=R_commands.R' ); > > Alternatively if you want perl and R to be able to interact and pass data back and forth, you could use something like Statistics::R > > http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm > > HTH > > adam > > On 2 Apr 2012, at 06:21, José Ramón Blas Pastor wrote: > >> Hi, >> >> a very simple doubt, but I do not know how to manage this. >> >> I want to plot a histogram for all data in 'datos.txt'. >> >> a) by using R: >> datos<-scan("datos.txt") >> pdf("xh.pdf") >> hist(datos) >> dev.off() >> >> >> b) How could I invoke R inside Perl to do the same?? >> #!/usr/bin/perl >> open(DAT,"datos.txt"); >> while (<DAT>) { >> chomp; >> push(@datos,$_); >> } >> #now I want a histogram of values in @datos >> >> Thanks!! >> >> JR >> >> -- >> José Ramón Blas - PhD >> Dept.
Biochemistry - Medicine School >> University of Castilla-La Mancha >> C Almansa, 14 >> 02006 Albacete (Spain) >> >> Phone: +34 967599200 ext. 2958 From bartwiegmans at gmail.com Mon Apr 2 13:10:47 2012 From: bartwiegmans at gmail.com (Bart Wiegmans) Date: Mon, 2 Apr 2012 19:10:47 +0200 Subject: [Bioperl-l] Implementing Bioperl6 for GSoC 2012 In-Reply-To: <6D16BB69-A74A-4A7E-B81C-0B33E9A73473@illinois.edu> References: <6D16BB69-A74A-4A7E-B81C-0B33E9A73473@illinois.edu> Message-ID: Chris, Leon, others, Thank you for your timely responses. So far as the timeframe is concerned, I might be able to get student credits for participating in this project as it is related to my study. In that case I would have more time free. At any rate, I understand it is suboptimal to start working in July, so I will do my best to make as much time free as possible. I've already checked out the bioperl6 project as well as the biome project from github. I am not quite sure what scope of project to choose and I was hoping for your advice. File format import / export and database connectivity would come to mind, as these are subjects I am most familiar with. In such a scenario, aside from a set of modules / classes, the end goal would be a script that could search for and import a sequence from a number of popular databases, and save it on the user's hard disk. I am very much open to suggestions, however. Anyway, thank you for your time.
Kind regards, Bart Wiegmans 2012/4/2 Fields, Christopher J : > Bart, > > I think this is a good idea, but to address one concern up-front: I believe the time frame (possible start in mid-July) will not be in your favor unless you can participate according to Google's schedule. There is some flexibility at the beginning, mainly during the community bonding period, but beyond that there isn't much wiggle room; we're as constrained as you are. The proposals to OBF over the last few years have been highly competitive, and this year will prove even more so as a large number of major Perl-based FLOSS orgs (primarily The Perl Foundation) were not accepted into GSoC. > > Now for Perl 6: > > BioPerl6 is a project Philip Mabon and I have already started up on github: > > https://github.com/cjfields/bioperl6 > > The code is basically proof-of-concept stuff at this point, and I don't believe it's current to spec last I checked or that it works on either the most current niecza or rakudo. All this was enough of a moving target to make it hard to maintain, but that seems to be settling somewhat now. It's pretty wide open, though, as far as I'm concerned. > > If you are to pursue this as a proposal, you will need to focus on a specific task or set of classes to implement along with docs and tests. A straight port isn't the best option; you should take advantage of Perl 6's strengths and possibly work on redesign where needed, keeping in mind that some aspects of bioperl might be considered a bit over-designed and out-of-date w/ re: to modern perl dictums. Also, learning a new language is nice, but that isn't the main focus for any GSoC project. At the end of the day, we should have usable code for the community, and if it acts as an entry point for more people to get involved with Perl 6, all the better :) > > (I see that Leon has also chimed in on this with similar comments) > > I will be on and off #bioperl this week (pyrimidine).
IRC is also logged in case I need to backlog (provided by one Moritz Lenz): > > http://irclog.perlgeek.de/bioperl/today > > chris > > On Apr 1, 2012, at 6:55 AM, Bart Wiegmans wrote: > >> Hello all BioPerl-ers, >> >> This is my first e-mail to the list and thus my introduction. My name >> is Bart Wiegmans, I study biology at the University of Groningen, the >> netherlands. It is my goal to implement bioperl6 this summer as part >> of the GSoC program. >> >> Why would I want such a thing? For a start, I'd like to learn more >> about bioinformatics. As I told you I study biology, so this has an >> obvious advantage for me. Also, I'd like to learn perl6 well, and this >> is only possible when one writes a significant program in it. >> Moreover, I think perl6 is awesome, and having a real-world toolkit >> like bioperl out there might just be enough to develop a significant >> community using it. As a third, I think perl5's object support is >> crufty, and difficult to learn for many people. These people include >> biologists who might not be inclined to learn it, and rather use some >> other tools instead. >> >> As to who I am, I already told you my name. I am 24 years old, and >> study biology at an undergraduate level. (For those interested, yes >> this means I haven't exactly been flying through my courses :-)). I >> have been programming computers ever since I was 16 years old, and >> earlier if you count BASIC. Starting out with C, most of that has been >> websites (in PHP), scripts (in Perl), and other smallish programs (in >> Java / Perl). For example, I implemented a parser and decoder for the >> dirac video specification as part of GSoC 2008, and a script which >> reads the NIH bookshelf website and translates this into ePub e-books. >> Read quite a few of them that way. >> >> Aside from my motivation and capabilities, two other factors somewhat >> complicate my involvement with GSoC.
The first is that the academic >> year ends halfway in July in the netherlands, not in may as in the USA >> and in many other countries. This means that I am not 'free' in a real >> sense before that time. Also, I have a day job as a PHP programmer for >> a local online students' magazine, which also takes some time. Which >> is unfortunate, because I'd rather spend my time writing useful >> programs; hence, if you would accept me as a student I plan to take >> leave from this job during the period of GSoC. >> >> Anyway, I realize this has been enough information for any interested >> reader. If there is any interest on your side, I frequent freenode >> under the nickname brrt. Other than that and this e-mail address, I >> don't have much of an online presence. >> >> Kind regards, >> Bart Wiegmans From florent.angly at gmail.com Mon Apr 2 18:30:09 2012 From: florent.angly at gmail.com (Florent Angly) Date: Tue, 03 Apr 2012 08:30:09 +1000 Subject: [Bioperl-l] data into R from Perl arrays In-Reply-To: References: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk> Message-ID: <4F7A2871.8080207@gmail.com> To execute R commands from Perl, you can also try Statistics::R (http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm), which has been around for longer, and which I have recently refactored. Regards, Florent On 02/04/12 22:59, Fields, Christopher J wrote: > Not sure how well it is supported, but there is also Statistics::useR (which has an XS layer for conversing with R).
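A minimal sketch of the file-based route Adam describes, extended so that the numbers collected in the Perl array reach R through a temporary data file. It assumes an `R` executable on the PATH that accepts `--vanilla` and `--file=FILE`; the subroutine names `read_values`, `build_r_script` and `plot_histogram_with_r` are hypothetical, and the R commands are the ones from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read one numeric value per line, skipping blank lines
# (the loop from the original post, with a lexical filehandle).
sub read_values {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my @values;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;
        push @values, $line;
    }
    close $fh;
    return @values;
}

# Return the R commands from the original post, pointed at the given
# data file and PDF output file.
sub build_r_script {
    my ($data_file, $pdf_file) = @_;
    # R string literals treat backslashes as escapes, so use forward slashes.
    s{\\}{/}g for $data_file, $pdf_file;
    return join("\n",
        qq{datos <- scan("$data_file")},
        qq{pdf("$pdf_file")},
        'hist(datos)',
        'dev.off()',
    ) . "\n";
}

# Write the values and the R script to temporary files, then run R on the
# script. Assumes an "R" executable on the PATH.
sub plot_histogram_with_r {
    my ($values, $pdf_file) = @_;

    my ($dfh, $data_file) = tempfile(SUFFIX => '.txt', UNLINK => 1);
    print {$dfh} "$_\n" for @$values;
    close $dfh;

    my ($sfh, $script_file) = tempfile(SUFFIX => '.R', UNLINK => 1);
    print {$sfh} build_r_script($data_file, $pdf_file);
    close $sfh;

    # The list form of system() bypasses the shell, so paths need no quoting.
    system('R', '--vanilla', "--file=$script_file") == 0
        or die "R exited with status $?\n";
    return $pdf_file;
}
```

Under those assumptions, `plot_histogram_with_r([ read_values('datos.txt') ], 'xh.pdf');` should produce the same `xh.pdf` as running the R snippet directly. Keeping the data in a file leaves the R side unchanged; Statistics::R or RSPerl become more attractive once results need to flow back from R into Perl.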
> > https://metacpan.org/module/Statistics::useR > > chris > > On Apr 2, 2012, at 3:17 AM, Adam Witney wrote: > >> The quickest way to do this specific example is probably to just save your R commands to a file (eg R_commands.R) and then run some kind of system/exec/backtick function in your perl script to invoke R, something like (untested): >> >> #!/usr/bin/perl >> use strict; >> use warnings; >> system( 'R --file=R_commands.R' ); >> >> Alternatively if you want perl and R to be able to interact and pass data back and forth, you could use something like Statistics::R >> >> http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm >> >> HTH >> >> adam >> >> On 2 Apr 2012, at 06:21, José Ramón Blas Pastor wrote: >> >>> Hi, >>> >>> a very simple doubt, but I do not know how to manage this. >>> >>> I want to plot a histogram for all data in 'datos.txt'. >>> >>> a) by using R: >>> datos<-scan("datos.txt") >>> pdf("xh.pdf") >>> hist(datos) >>> dev.off() >>> >>> >>> b) How could I invoke R inside Perl to do the same?? >>> #!/usr/bin/perl >>> open(DAT,"datos.txt"); >>> while (<DAT>) { >>> chomp; >>> push(@datos,$_); >>> } >>> #now I want a histogram of values in @datos >>> >>> Thanks!! >>> >>> JR >>> >>> -- >>> José Ramón Blas - PhD >>> Dept. Biochemistry - Medicine School >>> University of Castilla-La Mancha >>> C Almansa, 14 >>> 02006 Albacete (Spain) >>> >>> Phone: +34 967599200 ext.
2958 From huangyifeicmb at gmail.com Mon Apr 2 20:41:54 2012 From: huangyifeicmb at gmail.com (Yifei Huang) Date: Mon, 2 Apr 2012 20:41:54 -0400 Subject: [Bioperl-l] data into R from Perl arrays In-Reply-To: <4F7A2871.8080207@gmail.com> References: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk> <4F7A2871.8080207@gmail.com> Message-ID: You may try RSPerl. http://www.omegahat.org/RSPerl/ Yifei 2012/4/2 Florent Angly > To execute R commands from Perl, you can also try Statistics::R > (http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm), > which has been around for longer, and which I have recently refactored. > Regards, > Florent > > > On 02/04/12 22:59, Fields, Christopher J wrote: > >> Not sure how well it is supported, but there is also Statistics::useR >> (which has an XS layer for conversing with R).
>> >> https://metacpan.org/module/Statistics::useR >> >> chris >> >> On Apr 2, 2012, at 3:17 AM, Adam Witney wrote: >> >>> The quickest way to do this specific example is probably to just save >>> your R commands to a file (eg R_commands.R) and then run some kind of >>> system/exec/backtick function in your perl script to invoke R, something >>> like (untested): >>> >>> #!/usr/bin/perl >>> use strict; >>> use warnings; >>> system( 'R --file=R_commands.R' ); >>> >>> Alternatively if you want perl and R to be able to interact and pass >>> data back and forth, you could use something like Statistics::R >>> >>> http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm >>> >>> HTH >>> >>> adam >>> >>> On 2 Apr 2012, at 06:21, José Ramón Blas Pastor wrote: >>> >>>> Hi, >>>> >>>> a very simple doubt, but I do not know how to manage this. >>>> >>>> I want to plot a histogram for all data in 'datos.txt'. >>>> >>>> a) by using R: >>>> datos<-scan("datos.txt") >>>> pdf("xh.pdf") >>>> hist(datos) >>>> dev.off() >>>> >>>> >>>> b) How could I invoke R inside Perl to do the same?? >>>> #!/usr/bin/perl >>>> open(DAT,"datos.txt"); >>>> while (<DAT>) { >>>> chomp; >>>> push(@datos,$_); >>>> } >>>> #now I want a histogram of values in @datos >>>> >>>> Thanks!! >>>> >>>> JR >>>> >>>> -- >>>> José Ramón Blas - PhD >>>> Dept. Biochemistry - Medicine School >>>> University of Castilla-La Mancha >>>> C Almansa, 14 >>>> 02006 Albacete (Spain) >>>> >>>> Phone: +34 967599200 ext.
2958 -- Yifei Huang Department of Biology McMaster University From gregonomic at yahoo.co.nz Mon Apr 2 20:37:38 2012 From: gregonomic at yahoo.co.nz (Gregory Baillie) Date: Mon, 2 Apr 2012 17:37:38 -0700 (PDT) Subject: [Bioperl-l] data into R from Perl arrays In-Reply-To: References: Message-ID: <1333413458.84973.YahooMailNeo@web112506.mail.gq1.yahoo.com> Hi. If you only want to plot histograms (or other kinds of graphs), and don't need to use the statistics capabilities of R, then gnuplot (http://www.gnuplot.info/) is a good option. You can construct a string of gnuplot commands and your data, and pipe that directly to gnuplot, without having to write the commands or data to separate files and then use a system call to R. Of course, you do have to learn gnuplot, which may be more trouble than it's worth (I like it, but some people don't). Greg. ________________________________ From: José Ramón Blas Pastor To: bioperl-l at lists.open-bio.org; Bioperl mailing list Sent: Monday, 2 April 2012 3:21 PM Subject: [Bioperl-l] data into R from Perl arrays Hi, a very simple doubt, but I do not know how to manage this. I want to plot a histogram for all data in 'datos.txt'.
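Greg's gnuplot suggestion can be sketched as follows. The binning is done in Perl with equal-width bins (a choice made for this example, not something from the thread), and the bin centres and counts are sent inline after `plot '-'`, terminated by a line containing `e`, so nothing is written to disk. It assumes a `gnuplot` executable on the PATH with the `png` terminal available; `bin_values`, `gnuplot_script` and `plot_with_gnuplot` are hypothetical names.

```perl
use strict;
use warnings;
use List::Util qw(min max);
use POSIX qw(floor);

# Count values into $nbins equal-width bins covering [min, max].
# Returns the left edge of the first bin, the bin width and a ref to the counts.
sub bin_values {
    my ($values, $nbins) = @_;
    die "need at least one value\n" unless @$values;
    my ($lo, $hi) = (min(@$values), max(@$values));
    my $width = ($hi - $lo) / $nbins || 1;    # all values equal: use width 1
    my @counts = (0) x $nbins;
    for my $v (@$values) {
        my $i = floor(($v - $lo) / $width);
        $i = $nbins - 1 if $i >= $nbins;      # the maximum goes in the last bin
        $counts[$i]++;
    }
    return ($lo, $width, \@counts);
}

# Build a gnuplot script with the bin centres and counts inlined as data.
sub gnuplot_script {
    my ($lo, $width, $counts, $png) = @_;
    my @lines = (
        'set terminal png',
        qq{set output "$png"},
        'set style fill solid',
        "set boxwidth $width",
        q{plot '-' using 1:2 with boxes notitle},
    );
    for my $i (0 .. $#$counts) {
        push @lines, ($lo + ($i + 0.5) * $width) . " $counts->[$i]";
    }
    push @lines, 'e';    # marks the end of the inline data
    return join("\n", @lines) . "\n";
}

# Pipe the script straight into gnuplot; no intermediate files.
sub plot_with_gnuplot {
    my ($script) = @_;
    open(my $gp, '|-', 'gnuplot') or die "Cannot start gnuplot: $!";
    print {$gp} $script;
    close($gp) or die "gnuplot failed (status $?)\n";
}
```

A call such as `my ($lo, $w, $c) = bin_values(\@datos, 20); plot_with_gnuplot(gnuplot_script($lo, $w, $c, 'xh.png'));` would then replace the R step entirely. Values that sit exactly on a bin edge may land in the lower bin because of floating-point rounding, which is usually acceptable for a quick look at a distribution.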
a) by using R: datos<-scan("datos.txt") pdf("xh.pdf") hist(datos) dev.off() b) How could I invoke R inside Perl to do the same?? #!/usr/bin/perl open(DAT,"datos.txt"); while (<DAT>) { chomp; push(@datos,$_); } #now I want a histogram of values in @datos Thanks!! JR -- José Ramón Blas - PhD Dept. Biochemistry - Medicine School University of Castilla-La Mancha C Almansa, 14 02006 Albacete (Spain) Phone: +34 967599200 ext. 2958 From shalabh.sharma7 at gmail.com Tue Apr 3 11:34:43 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Tue, 3 Apr 2012 11:34:43 -0400 Subject: [Bioperl-l] Downloading refseq genomes in batch Message-ID: Hi All, I am trying to download refseq genomes in batch. But instead of accession number i have genome names (=~ 500). Is there any way i can download them using some bioperl module ? Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From carandraug+dev at gmail.com Tue Apr 3 11:53:32 2012 From: carandraug+dev at gmail.com (Carnë Draug) Date: Tue, 3 Apr 2012 16:53:32 +0100 Subject: [Bioperl-l] Downloading refseq genomes in batch In-Reply-To: References: Message-ID: On 3 April 2012 16:34, shalabh sharma wrote: > Hi All, > I am trying to download refseq genomes in batch. But instead of > accession number i have genome names (=~ 500). > Is there any way i can download them using some bioperl module ? If you have their name/official symbol, then searching on the database should only return one hit, therefore one UID. Make the search, get that number, and use it for download. The EUtilities module should do that. Carnë From shalabh.sharma7 at gmail.com Tue Apr 3 14:15:16 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Tue, 3 Apr 2012 14:15:16 -0400 Subject: [Bioperl-l] Downloading refseq genomes in batch In-Reply-To: References: Message-ID: Hi Came, Thanks for your reply. I tried to get UID from genome names but i cant find on EUtilities.
I have taxa id for those genomes, can i download genomes with taxa id in batch ? Thanks Shalabh On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug wrote: > On 3 April 2012 16:34, shalabh sharma wrote: > > Hi All, > > I am trying to download refseq genomes in batch. But instead of > > accession number i have genome names (=~ 500). > > Is there any way i can download them using some bioperl module ? > > If you have their name/official symbol, then searching on the database > should only return one hit, therefore one UID. Make the search, get > that number, and use it for download. The EUtilities module should do > that. > > Carnë > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From berry at exisoft.nl Tue Apr 3 16:24:54 2012 From: berry at exisoft.nl (Berry Kriesels) Date: Tue, 03 Apr 2012 22:24:54 +0200 Subject: [Bioperl-l] Google summer of code Bio::Structure Message-ID: <4F7B5C96.5090208@exisoft.nl> Dear all, Currently I am considering applying as a student for the Google Summer of Code and would like to contribute to BioPerl this way. At the moment I am investigating extending the BioPerl Bio::Structure library in such a way that some protein modelling can also be done, or at least adding a method so one could do a PDB structure quality assessment. One way is to do it with the use of online services such as ProSA-web (and thus creating a wrapper for this service). I could also make libraries which one could use to assess the phi and psi angles of residues within a PDB file, or the distance in angstrom among many other coordinate measurements, within a single protein PDB file but also across multiple PDB files (for comparison). Adding functions such as DOPE (Discrete Optimized Protein Energy) for model comparisons is also an option. There are tons of options to add. However...
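The distance and phi/psi measurements described here come down to a little vector arithmetic on PDB coordinates. A rough sketch in plain Perl, independent of Bio::Structure (all subroutine names are hypothetical): `parse_atom` reads the fixed coordinate columns of an ATOM/HETATM record, and `dihedral` uses the standard atan2 formulation, so angles fall between -180 and 180 degrees. Backbone phi for residue i is the dihedral C(i-1), N(i), CA(i), C(i), and psi is N(i), CA(i), C(i), N(i+1).

```perl
use strict;
use warnings;

use constant PI => 4 * atan2(1, 1);

# Parse an ATOM/HETATM record using the fixed PDB columns:
# atom name 13-16, residue number 23-26, x/y/z 31-38, 39-46, 47-54.
sub parse_atom {
    my ($line) = @_;
    return unless $line =~ /^(?:ATOM  |HETATM)/;
    (my $name = substr($line, 12, 4)) =~ s/\s+//g;
    return {
        name => $name,
        resi => substr($line, 22, 4) + 0,
        xyz  => [ map { substr($line, $_, 8) + 0 } 30, 38, 46 ],
    };
}

sub vsub { my ($p, $q) = @_; return [ map { $p->[$_] - $q->[$_] } 0 .. 2 ] }
sub dot  { my ($p, $q) = @_; return $p->[0]*$q->[0] + $p->[1]*$q->[1] + $p->[2]*$q->[2] }

sub cross {
    my ($p, $q) = @_;
    return [ $p->[1]*$q->[2] - $p->[2]*$q->[1],
             $p->[2]*$q->[0] - $p->[0]*$q->[2],
             $p->[0]*$q->[1] - $p->[1]*$q->[0] ];
}

# Euclidean distance, in the units of the file (Angstrom for PDB coordinates).
sub distance {
    my ($p, $q) = @_;
    my $d = vsub($p, $q);
    return sqrt(dot($d, $d));
}

# Dihedral angle p1-p2-p3-p4 in degrees:
# atan2(|b2| * b1.(b2 x b3), (b1 x b2).(b2 x b3)).
sub dihedral {
    my ($p1, $p2, $p3, $p4) = @_;
    my ($b1, $b2, $b3) = (vsub($p2, $p1), vsub($p3, $p2), vsub($p4, $p3));
    my $y = sqrt(dot($b2, $b2)) * dot($b1, cross($b2, $b3));
    my $x = dot(cross($b1, $b2), cross($b2, $b3));
    return atan2($y, $x) * 180 / PI;
}
```

Grouping the parsed atoms by residue number and feeding the right four backbone atoms to `dihedral` gives phi and psi per residue; that grouping assumes a single chain with consecutive numbering and no insertion codes, which real PDB files do not always satisfy.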
I have a few questions regarding this and hope some of you will be willing to answer: 1. As users of BioPerl, would you consider extending the current Bio::Structure library an added value, or would you rather see effort made in different areas? 2. If one would see extension of the current Bio::Structure library as a useful project, what would your main interests and wishes be? Thank you for input and time. With kind regards, Berry MSc student Bio-informatics. From jovel_juan at hotmail.com Tue Apr 3 17:02:26 2012 From: jovel_juan at hotmail.com (Juan Jovel) Date: Tue, 3 Apr 2012 21:02:26 +0000 Subject: [Bioperl-l] Downloading refseq genomes in batch In-Reply-To: References: , , Message-ID: Hi Shalab You can try use Bio::DB::GenBank, but I believe the NCBI does not like people doing many remote lookups. I would advise you download the whole database you are interested in, and then you parse it locally. Cheers, Juan > Date: Tue, 3 Apr 2012 14:15:16 -0400 > From: shalabh.sharma7 at gmail.com > To: carandraug+dev at gmail.com > CC: Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Downloading refseq genomes in batch > > Hi Came, > Thanks for your reply. > I tried to get UID from genome names but i cant find on EUtilities. > I have taxa id for those genomes, can i download genomes with taxa id in > batch ? > > Thanks > Shalabh > > > On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug wrote: > > > On 3 April 2012 16:34, shalabh sharma wrote: > > > Hi All, > > > I am trying to download refseq genomes in batch. But instead of > > > accession number i have genome names (=~ 500). > > > Is there any way i can download them using some bioperl module ? > > > > If you have their name/official symbol, then searching on the database > > should only return one hit, therefore one UID. Make the search, get > > that number, and use it for download. The EUtilities module should do > > that. > > > > Carnë
> > > > > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Apr 3 17:19:07 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 3 Apr 2012 21:19:07 +0000 Subject: [Bioperl-l] Downloading refseq genomes in batch In-Reply-To: References: , , Message-ID: 500 sequences isn't too bad for a remote lookup (I have run about ~20K myself). It's much easier if you can grab them as a batch, e.g. run esearch for the IDs, use efetch with the webenv/key to grab the sequences. NCBI is more worried about the number of requests made, the length of time between requests, and the time of day requests are made. In fact, I recall updating EUtilities recently so it can use a POST, so you can grab ~2000 seqs at a time w/o having to iterate through them. chris On Apr 3, 2012, at 4:02 PM, Juan Jovel wrote: > > Hi Shalab > You can try use Bio::DB::GenBank, but I believe the NCBI does not like people doing many remote lookups. I would advise you download the whole database you are interested in, and then you parse it locally. > Cheers, Juan >> Date: Tue, 3 Apr 2012 14:15:16 -0400 >> From: shalabh.sharma7 at gmail.com >> To: carandraug+dev at gmail.com >> CC: Bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Downloading refseq genomes in batch >> >> Hi Came, >> Thanks for your reply. >> I tried to get UID from genome names but i cant find on EUtilities. >> I have taxa id for those genomes, can i download genomes with taxa id in >> batch ? >> >> Thanks >> Shalabh >> >> >> On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug wrote: >> >>> On 3 April 2012 16:34, shalabh sharma wrote: >>>> Hi All, >>>> I am trying to download refseq genomes in batch.
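The esearch-then-efetch approach described in the reply above is often run in pages when the UID list is long: each request asks for retmax records starting at a 0-based retstart offset. The sketch below covers only that paging arithmetic in plain Perl. page_offsets is a hypothetical helper, and the commented Bio::DB::EUtilities calls follow the EUtilities cookbook pattern as an assumption, so check them against the installed version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a result set of $count records into pages of $retmax records.
# Each page is an array ref [retstart, retmax]; retstart is a 0-based
# offset, and the last page simply returns fewer than $retmax records.
sub page_offsets {
    my ($count, $retmax) = @_;
    die "retmax must be positive\n" unless $retmax > 0;
    my @pages;
    for (my $retstart = 0; $retstart < $count; $retstart += $retmax) {
        push @pages, [ $retstart, $retmax ];
    }
    return @pages;
}

# Example: 4500 UIDs fetched 2000 at a time gives offsets 0, 2000 and 4000.
for my $page (page_offsets(4500, 2000)) {
    my ($retstart, $retmax) = @$page;
    print "retstart=$retstart retmax=$retmax\n";
    # In a real script each page would be requested from the stored esearch
    # history, e.g. (cookbook-style, verify against your EUtilities version):
    #   $factory->set_parameters(-retstart => $retstart, -retmax => $retmax);
    #   $factory->get_Response(-cb => sub { print $out $_[0] });
}
```

Computing the offsets up front keeps the retry logic for a failed page separate from the bookkeeping of which records are still missing.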
But instead of >>>> accession number i have genome names (=~ 500). >>>> Is there any way i can download them using some bioperl module ? >>> >>> If you have their name/official symbol, then searching on the database >>> should only return one hit, therefore one UID. Make the search, get >>> that number, and use it for download. The EUtilities module should do >>> that. >>> >>> Carnë >>> >> >> >> >> -- >> Shalabh Sharma >> Scientific Computing Professional Associate (Bioinformatics Specialist) >> Department of Marine Sciences >> University of Georgia >> Athens, GA 30602-3636 >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From shalabh.sharma7 at gmail.com Wed Apr 4 17:24:08 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Wed, 4 Apr 2012 17:24:08 -0400 Subject: [Bioperl-l] Weird efetch problem. Message-ID: Hi All, I am facing a really weird problem using efetch. I am getting different outputs if i am using different method of passing values. Like if i am using this method:
#!/usr/bin/perl -w
use Bio::DB::EUtilities;
use Bio::SeqIO;
my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
                                       -db => 'protein',
                                       -rettype => 'fasta',
                                       -email => 'ssharmai at uga.edu',
                                       -id => '256009369');

my $file = 'genome.fasta';
$factory->get_Response(-file => $file);
I am getting correct protein sequence but if i am passing values (same id) via an array i am getting nucleotide sequences.
use Bio::DB::EUtilities;
use Bio::SeqIO;
my @ids;
my $c = 0;
open(IN,"$ARGV[0]");
while(<IN>){
    my $id = $_;
    chomp($id);chop($id);
    $ids[$c] = $id;
    print "$id\n";
    $c++;
}
close(IN);
my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
                                       -db => 'protein',
                                       -rettype => 'fasta',
                                       -email => 'ssharmai at uga.edu',
                                       -id => \@ids);
my $file = 'genome.fasta';
Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From sidd.basu at gmail.com Thu Apr 5 06:31:47 2012 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Thu, 5 Apr 2012 05:31:47 -0500 Subject: [Bioperl-l] Re: Weird efetch problem. In-Reply-To: References: Message-ID: <20120405103146.GA5544@Macintosh-388.local> On Wed, 04 Apr 2012, shalabh sharma wrote: > Hi All, > I am facing a really weird problem using efetch. I am getting > different outputs if i am using different method of passing values. > > Like if i am using this method: > > #!/usr/bin/perl -w > use Bio::DB::EUtilities; > use Bio::SeqIO; > my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', > -db => 'protein', > -rettype => 'fasta', > -email => 'ssharmai at uga.edu', > -id => '256009369'); > > my $file = 'genome.fasta'; > $factory->get_Response(-file => $file); > > I am getting correct protein sequence but if i am passing values (same id) > via an array i am getting nucleotide sequences. > > use Bio::DB::EUtilities; > use Bio::SeqIO; > my @ids; > my $c = 0; > open(IN,"$ARGV[0]"); > while(<IN>){ > my $id = $_; > chomp($id);chop($id); > $ids[$c] = $id; > print "$id\n"; > $c++; > } > close(IN); > my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', > -db => 'protein', > -rettype => 'fasta', > -email => 'ssharmai at uga.edu', > -id => \@ids); Could you send the ids here.
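One detail worth checking in the array-based script: after chomp($id) has removed the newline, chop($id) removes one more character. With Unix line endings that is the last digit of each GI, so a different UID is requested; only with Windows (CRLF) line endings is the removed character the stray carriage return. Whether this explains the nucleotide records reported in this thread is an assumption, not something the thread confirms. A minimal sketch of an ID reader that trims surrounding whitespace instead (read_ids is a hypothetical helper name):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one UID per line from a filehandle. Leading and trailing whitespace,
# including a "\r" left by Windows line endings, is trimmed instead of using
# chomp + chop, which would cut the last digit off Unix-formatted IDs.
sub read_ids {
    my ($fh) = @_;
    my @ids;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;    # trim both ends
        next if $line eq '';        # skip blank lines
        push @ids, $line;
    }
    return @ids;
}

# Usage with a real file named on the command line:
#   open(my $in, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
#   my @ids = read_ids($in);
#   close($in);
# @ids can then be passed as -id => \@ids to Bio::DB::EUtilities->new.
```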
-siddhartha > > my $file = 'genome.fasta'; > > Thanks > Shalabh > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Apr 5 09:07:28 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 5 Apr 2012 13:07:28 +0000 Subject: [Bioperl-l] Weird efetch problem. In-Reply-To: <20120405103146.GA5544@Macintosh-388.local> References: <20120405103146.GA5544@Macintosh-388.local> Message-ID: On Apr 5, 2012, at 5:31 AM, Siddhartha Basu wrote: > On Wed, 04 Apr 2012, shalabh sharma wrote: > >> Hi All, >> I am facing a really weird problem using efetch. I am getting >> different outputs if i am using different method of passing values. >> ... >> >> I am getting correct protein sequence but if i am passing values (same id) >> via an array i am getting nucleotide sequences. >> >> .. > Could you send the ids here. > > -siddhartha And please file a bug report on this if something is found. I do know if you use accession numbers you can sometimes get odd results. I recommend only using UIDs (the GI in the case of protein and nuc seqs). chris From shalabh.sharma7 at gmail.com Thu Apr 5 10:40:06 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 5 Apr 2012 10:40:06 -0400 Subject: [Bioperl-l] Downloading refseq genomes in batch In-Reply-To: References: Message-ID: Hi All, Thanks for all the suggestions. Thanks a lot Chris, i am using your method to pull out genomes. Its working fine. Thanks Shalabh On Tue, Apr 3, 2012 at 5:19 PM, Fields, Christopher J wrote: > 500 sequences isn't too bad for a remote lookup (I have run about ~20K > myself). It's much easier if you can grab them as a batch, e.g. 
run > esearch for the IDs, use efetch with the webenv/key to grab the sequences. > NCBI is more worried about the number of requests made, the length of time > between requests, and the time of day requests are made. > > In fact, I recall updating EUtilities recently so it can use a POST, so > you can grab ~2000 seqs at a time w/o having to iterate through them. > > chris > > On Apr 3, 2012, at 4:02 PM, Juan Jovel wrote: > > > > > Hi Shalab > > You can try use Bio::DB::GenBank, but I believe the NCBI does not like > people doing many remote lookups. I would advise you download the whole > database you are interested in, and then you parse it locally. > > Cheers, Juan > >> Date: Tue, 3 Apr 2012 14:15:16 -0400 > >> From: shalabh.sharma7 at gmail.com > >> To: carandraug+dev at gmail.com > >> CC: Bioperl-l at lists.open-bio.org > >> Subject: Re: [Bioperl-l] Downloading refseq genomes in batch > >> > >> Hi Came, > >> Thanks for your reply. > >> I tried to get UID from genome names but i cant find on EUtilities. > >> I have taxa id for those genomes, can i download genomes with taxa id in > >> batch ? > >> > >> Thanks > >> Shalabh > >> > >> > >> On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug >wrote: > >> > >>> On 3 April 2012 16:34, shalabh sharma > wrote: > >>>> Hi All, > >>>> I am trying to download refseq genomes in batch. But instead of > >>>> accession number i have genome names (=~ 500). > >>>> Is there any way i can download them using some bioperl module ? > >>> > >>> If you have their name/official symbol, then searching on the database > >>> should only return one hit, therefore one UID. Make the search, get > >>> that number, and use it for download. The EUtilities module should do > >>> that. > >>> > >>> Carnë
> >>> > >> > >> > >> > >> -- > >> Shalabh Sharma > >> Scientific Computing Professional Associate (Bioinformatics Specialist) > >> Department of Marine Sciences > >> University of Georgia > >> Athens, GA 30602-3636 > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From k.d.murray.91 at gmail.com Fri Apr 6 09:49:32 2012 From: k.d.murray.91 at gmail.com (Kevin Murray) Date: Fri, 6 Apr 2012 23:49:32 +1000 Subject: [Bioperl-l] sequence proxy server Message-ID: Hi all, I'm an undergrad student in molecular biology at the ANU in Australia, and my research projects are becoming increasingly bioinformatics heavy. The latest one has involved quite a large amount of sequence retrieval from GenBank and GenPept. The download speed to Australia from NCBI's servers is rather slow, and i've been thinking about how we can improve this. One solution would be to use Bio::DB::Flat with GenBank sequences on a local computer. However, in a situation where there are multiple people in a lab doing bioinformatics, it seems to me a bit of a waste to have the entire genbank/genpept database, or even the relevant sections thereof, on each computer. So, i though about writing a "sequence proxy" cgi script, and a corresponding module, which would work a bit like this: The user calls Bio::DB::SeqProxy::GenBank as they would Bio::DB::GenBank, with the exception that a parameter for the address of the sequence proxy server is required. 
The module then sends a request similar to that sent to NCBI's servers by calling Bio::DB::GenBank->get_Seq_by_x() to the sequence proxy server. I believe all requests go to the efetch page now (please correct me if I'm wrong, I have read the relevant bioperl module code but not thoroughly), so the CGI script on the sequence proxy would take arguments in a similar fashion to make writing the client-side module easier. The CGI script would use a Bio::DB::Flat database, or an interface to an SQL database, to determine if the required sequence is stored locally. (As an aside, I'd like your thoughts on Bio::DB::Flat vs Bio::DB::Sql or similar.) If the sequence exists locally, it would be returned to the user, either as plain text, or inside an XML container (see below). If not, it would be retrieved from the remote database using the relevant Bio::DB module, and returned. The sequence would either be returned as the relevant sequence format (which would default to GenBank format) in plain text, or as an XML document similar to: 1 ___YOUR GENBANK FILE HERE___ Local Database The aim of the XML document would be to simplify handling of server errors and allow for the specification of other metadata such as which database the sequence came from. Firstly, I'd like to know if this sounds feasible, and if so, if someone is already working on something similar? I don't want to reinvent the wheel. Secondly, I'd like to ask for your comments and advice. Being reasonably new to bioperl (started using bioperl about 6 months ago, but I've been coding in various languages for 8 years) I don't expect to have considered things that may seem obvious to a more experienced bioperl-er, so please be as brutally constructive in your criticism as you see fit =]. I know this is a lot of questions, so thanks in advance for your help. Cheers, and a happy Easter to those who celebrate it.
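The proxy design described above comes down to a cache-or-fetch lookup: answer from the local store when possible, otherwise fetch remotely and keep a copy for the next request. Below is a minimal sketch of that control flow in plain Perl. It assumes a hash as the local store and a caller-supplied code reference as the remote fetcher; in the proposal those roles would be filled by a Bio::DB::Flat or SQL index and Bio::DB::GenBank. The name make_cached_fetcher, the 'Remote Database' label and the NP_000001 accession are all hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a closure that consults a local store first and only calls the
# remote fetcher on a miss, caching whatever it fetched.
#   $store        - hash ref standing in for the local database (hypothetical)
#   $fetch_remote - code ref: accession -> record text, or undef if not found
sub make_cached_fetcher {
    my ($store, $fetch_remote) = @_;
    return sub {
        my ($acc) = @_;
        return ($store->{$acc}, 'Local Database') if exists $store->{$acc};
        my $record = $fetch_remote->($acc);
        return (undef, 'Not found') unless defined $record;
        $store->{$acc} = $record;    # keep a copy for the next request
        return ($record, 'Remote Database');
    };
}

# Usage with a fake remote source that counts how often it is called.
my $remote_calls = 0;
my $lookup = make_cached_fetcher({}, sub {
    my ($acc) = @_;
    $remote_calls++;
    return $acc eq 'NP_000001' ? "LOCUS       $acc ...\n" : undef;
});
my ($rec, $source) = $lookup->('NP_000001');    # miss: fetched remotely
($rec, $source)    = $lookup->('NP_000001');    # hit: served locally
print "$source after $remote_calls remote call(s)\n";
```

Passing the fetcher in as a code reference lets the lookup logic be exercised offline, as in the usage lines, before any network code is written.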
Regards Kevin Murray From shalabh.sharma7 at gmail.com Fri Apr 6 10:52:30 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 6 Apr 2012 10:52:30 -0400 Subject: [Bioperl-l] Question about EUtils esearch Message-ID: Hi All, I am trying to get all the UIDs for few genomes. For example: for 'Homo sapiens" if i use esearch i get "3277310" UIDs but if i go to NCBI genome page i see there are only =~ 32,00 proteins. http://www.ncbi.nlm.nih.gov/genome?term=Homo%20sapiens I have done this for lot of genomes and i am afraid that i have to do this again. Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From shalabh.sharma7 at gmail.com Fri Apr 6 14:27:29 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 6 Apr 2012 14:27:29 -0400 Subject: [Bioperl-l] Downloading refseq genomes in batch In-Reply-To: References: Message-ID: Hi Chris, I am using the method you suggested. But i have a question. The UIDs that i am searching using "esearch" are not same as the number of proteins in that genome. For Example: for 'Homo sapiens" if i use esearch i get "3277310" UIDs but if i go to NCBI genome page i see there are only =~ 32,00 proteins. http://www.ncbi.nlm.nih.gov/genome?term=Homo%20sapiens Thanks Shalabh On Thu, Apr 5, 2012 at 10:40 AM, shalabh sharma wrote: > Hi All, > Thanks for all the suggestions. > Thanks a lot Chris, i am using your method to pull out genomes. Its > working fine. > > Thanks > Shalabh > > > On Tue, Apr 3, 2012 at 5:19 PM, Fields, Christopher J < > cjfields at illinois.edu> wrote: > >> 500 sequences isn't too bad for a remote lookup (I have run about ~20K >> myself). It's much easier if you can grab them as a batch, e.g. run >> esearch for the IDs, use efetch with the webenv/key to grab the sequences. 
>> NCBI is more worried about the number of requests made, the length of time >> between requests, and the time of day requests are made. >> >> In fact, I recall updating EUtilities recently so it can use a POST, so >> you can grab ~2000 seqs at a time w/o having to iterate through them. >> >> chris >> >> On Apr 3, 2012, at 4:02 PM, Juan Jovel wrote: >> >> > >> > Hi Shalab >> > You can try use Bio::DB::GenBank, but I believe the NCBI does not like >> people doing many remote lookups. I would advise you download the whole >> database you are interested in, and then you parse it locally. >> > Cheers, Juan >> >> Date: Tue, 3 Apr 2012 14:15:16 -0400 >> >> From: shalabh.sharma7 at gmail.com >> >> To: carandraug+dev at gmail.com >> >> CC: Bioperl-l at lists.open-bio.org >> >> Subject: Re: [Bioperl-l] Downloading refseq genomes in batch >> >> >> >> Hi Came, >> >> Thanks for your reply. >> >> I tried to get UID from genome names but i cant find on EUtilities. >> >> I have taxa id for those genomes, can i download genomes with taxa id >> in >> >> batch ? >> >> >> >> Thanks >> >> Shalabh >> >> >> >> >> >> On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug > >wrote: >> >> >> >>> On 3 April 2012 16:34, shalabh sharma >> wrote: >> >>>> Hi All, >> >>>> I am trying to download refseq genomes in batch. But instead >> of >> >>>> accession number i have genome names (=~ 500). >> >>>> Is there any way i can download them using some bioperl module ? >> >>> >> >>> If you have their name/official symbol, then searching on the database >> >>> should only return one hit, therefore one UID. Make the search, get >> >>> that number, and use it for download. The EUtilities module should do >> >>> that. >> >>> >> >>> Carnë
>> >>> >> >> >> >> >> >> >> >> -- >> >> Shalabh Sharma >> >> Scientific Computing Professional Associate (Bioinformatics Specialist) >> >> Department of Marine Sciences >> >> University of Georgia >> >> Athens, GA 30602-3636 >> >> >> >> _______________________________________________ >> >> Bioperl-l mailing list >> >> Bioperl-l at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From cjfields at illinois.edu Fri Apr 6 15:09:23 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 6 Apr 2012 19:09:23 +0000 Subject: [Bioperl-l] Question about EUtils esearch In-Reply-To: References: Message-ID: Shalabh, You should try getting the specific genome project ID of interest, linking to the proteins, and then grab those. The EUtilities cookbook has a few examples on how to do that. chris On Apr 6, 2012, at 9:52 AM, shalabh sharma wrote: > Hi All, > I am trying to get all the UIDs for few genomes. > For example: > for 'Homo sapiens" if i use esearch i get "3277310" UIDs but if i go to > NCBI genome page i see there are only =~ 32,00 proteins. > http://www.ncbi.nlm.nih.gov/genome?term=Homo%20sapiens > > I have done this for lot of genomes and i am afraid that i have to do this > again. 
> > Thanks > Shalabh > > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From wrp at virginia.edu Sat Apr 7 16:56:16 2012 From: wrp at virginia.edu (William Pearson) Date: Sat, 7 Apr 2012 16:56:16 -0400 Subject: [Bioperl-l] Bioperl-l Digest, Vol 108, Issue 7 In-Reply-To: References: Message-ID: To get the UIDs (GIs) that you want, search for human[organism] AND srcdb_refseq[Properties] This will get you the refseq proteins you want. Bill Pearson > Message: 1 > Date: Fri, 6 Apr 2012 14:27:29 -0400 > From: shalabh sharma > Subject: Re: [Bioperl-l] Downloading refseq genomes in batch > > Hi Chris, > I am using the method you suggested. > But i have a question. The UIDs that i am searching using "esearch" are not > same as the number of proteins in that genome. > > For Example: > for 'Homo sapiens" if i use esearch i get "3277310" UIDs but if i go to > NCBI genome page i see there are only =~ 32,00 proteins. > http://www.ncbi.nlm.nih.gov/genome?term=Homo%20sapiens > > Thanks > Shalabh > From joel.klein at wur.nl Sun Apr 8 19:35:18 2012 From: joel.klein at wur.nl (Bradyjoel) Date: Sun, 8 Apr 2012 16:35:18 -0700 (PDT) Subject: [Bioperl-l] Need some help with : Error "Use of uninitialized value" Message-ID: <33653318.post@talk.nabble.com> Hi all, I have little experiences in programming with Perl/Bioperl. I'm currently working on a script that takes a whole genome from a bacteria as input, converts it into a multiple fasta file containing all the open reading frames and blast it against a multiple protein fasta file with know proteins. 
When I get a hit I want to combine the header of the known protein with the orf sequence, here it gives an error when I try to go through the orf file and extract the right corresponding sequence. The error it gives is : Use of uninitialized value $seq in print at blastscript.pl line .. Is there someone who has an idea what caused this error, and can help me with solving it? Regards, Joel (I put my script in the attachment) http://old.nabble.com/file/p33653318/blastscript.pl blastscript.pl -- View this message in context: http://old.nabble.com/Need-some-help-with-%3A-Error-%22Use-of-uninitialized-value%22-tp33653318p33653318.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From afia.hayati at gmail.com Thu Apr 5 00:52:01 2012 From: afia.hayati at gmail.com (afia hayati) Date: Thu, 5 Apr 2012 13:52:01 +0900 Subject: [Bioperl-l] Proposal of GSoC 2012 >> ACEtoSAM and SAMtoACE Message-ID: Dear all, I am afia, a PhD student in Bioinformatics. I am so interested to participate GSoC and would like to apply for project Bio::Assembly, Ace2Sam and Sam2Ace converter. I have written a proposal based on the guidance for prospective GSoC student. I paste my proposal in here. If you have time, please give me suggestions. Thank you very much. Sincerely, Afiahayati [~Be Passion, Patient and Persistent~] *Google Summer of Code 2012* *Proposal* *ACEtoSAM and SAMtoACE* 1. *Contact information * 1. Full name :Afiahayati 2. Address : Hiyoshi International House (Room C301-1), 223-0061 Yokohama-shi Kouhoku-ku, Hiyoshi 2-27 Kanagawa ? Japan 3. Email : afia.hayati at gmail.com 4. Phone number : 818044637237 5. IRC nick : afia 2. *Motivation to join this project * I am a PhD student in bioinformatics. My research is in genome assembly, especially metagenome assembly. I have same idea that the converter from ACEtoSAM and vice versa is very useful. I am familiar with Perl and BioPerl, so there is no reason for not participating in this project 3. 
*Programming experience and skills * 1. Perl also BioPerl since January 2010 2. R, since January 2008 3. Oracle, since January 2008 4. Biojava, since January 2007 5. PHP , since January 2006 6. C++, since January 2006 7. Java, since January 2006 8. MySQL, since January 2005 9. C , since January 2005 4. *Open source projects involved with * 1. Metagenome Assembly, 2012 (with supervisor) Develop de novo assembler for metagenomic data from short sequence reads Using C, C++ and Perl 2. Develop some interfaces in RCommander, 2010 (in team) 3. Computer system of academic hospital, 2009 (in team) By modifying an open source hospital information system, Care2x Using PHP, Java script and HTML 4. Academic data warehouse and data mining, 2008 (in team) Using Pentaho Business Analytics and R programming language 5. *Project Plan * 1. *Before April 23 * 1. Study the format of SAM and ACE more detail 2. Study the biodesign related to module Bio::Assembly::IO especially Bio::Assembly::IO::ACE and Bio::Assembly::IO::SAM 3. Study the documentation and the code of module Bio::Assembly::IO::ACE and Bio::Assembly::IO::SAM 2. *April 23 - May 20 (before official coding period) * 1. To do self coding for Bio::Assembly::IO::ACE and Bio::Assembly::IO::SAM to improve my understand. 2. Keep contact with my mentor and the BioPerl community. I will active in mailing list and IRC to confirm my understanding about Bio::Assembly::IO::ACE and Bio::Assembly::IO::SAM and also discuss the operations (the methods) needed for a module ACEtoSAM and SAMtoACE converting. 3. With the supervision from my mentor, try to determine the appropriate design of module ACEtoSAM and SAMtoACE converting. 3. *May 21 - June 21 * 1. Determine the final design of module ACEtoSAM and SAMtoACE converting. 2. Code the module ACEtoSAM and SAMtoACE converting 3. Test my code by myself 4. Discuss with my mentor to design good test 5. Test my code based on the test design 4. *June 22 - July 8 * 1. 
Discuss with my mentor about my code in order to publish in bioperl community 2. Publish my code to the community and learn the feedback *JULY 9 MID TERM EVALUATION * 5. *July 9 - August 5 * 1. Improving the code (do iteration activities) : 1. Keep contact with the community, learn the feedback 2. Make changes in the code, with the supervision from my mentor 3. Test the code and publish the code to the community 2. Finalize the code 3. Start writing the POD documentation 6. *August 6 - August 13 * For final documentation *A buffer of a week for unpredicted delay * *AUGUST 20 FINAL EVALUATION* From heath.obrien at gmail.com Tue Apr 3 12:56:31 2012 From: heath.obrien at gmail.com (Heath O'Brien) Date: Tue, 3 Apr 2012 16:56:31 +0000 (UTC) Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) Message-ID: Hi All, I've encountered a bug in the trunc_with_features function in SeqUtils.pm, or at least behavior that was unexpected to me: Features with fuzzy coordinates in the original sequence are converted to exact coordinates in the truncated sequence. For example, the script below changes the coordinates for the feature from <1..5 to 1..5. I have modified the code to change this behavior on my system, but I thought I'd post something here in case others encounter the same problem. all good things, Heath
#!/usr/bin/perl -w
use strict;
use warnings;
use Bio::SeqIO;
use Bio::SeqUtils;

my $infile= shift;
my $inIO = Bio::SeqIO->new('-file' => $infile, '-format' => 'genbank') or die "could not open seq file $infile\n";
my $outfile = $infile . '_out.gbk';
my $outIO = Bio::SeqIO->new('-file' => ">$outfile", '-format' => 'genbank') or die "could not open seq file $outfile\n";
my $in_seq = $inIO->next_seq;
my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5);
$outIO->write_seq($out_seq);
exit;

LOCUS       test_sequence          57303 bp    DNA     linear   UNA
DEFINITION  Sequence to demonstrate unexpected behavior of trunc_with_features
ACCESSION   unknown
KEYWORDS    .
FEATURES             Location/Qualifiers
     source          1..10
                     /mol_type="genomic DNA"
     gene            <1..5
                     /gene="test"
     CDS             <1..5
                     /product="hypothetical protein"
ORIGIN
        1 caagattaaa
//
From mkhalfan at cshl.edu Thu Apr 5 15:29:35 2012 From: mkhalfan at cshl.edu (Khalfan, Mohammed) Date: Thu, 5 Apr 2012 19:29:35 +0000 Subject: [Bioperl-l] Bio::AlignIO->add_new ORDER? Message-ID: <7A8AC28B954CFA428BD67E1376B889A21F0077FC@EX-HS-MBX03.cshl.edu> Hi, I am having a problem trying to add a new sequence to an alignment using the order parameter. I would like to add the sequence to the first position (row) in the alignment, is this possible? Here is what my code looks like:
use Bio::AlignIO;
use Bio::LocatableSeq;
use Bio::SimpleAlign;

my $in = Bio::AlignIO->new( -file => $infile ); # $infile is the output from muscle

my $aln = $in->next_aln;

# build a consensus from the current alignment
my $consensus = $aln->consensus_string();

# make the consensus sequence obtained in the above step into a LocatableSeq object
my $consensus_obj = new Bio::LocatableSeq (
    -seq => $consensus,
    -id => 'Consensus',
    -start => 1,
    -end => length($consensus),
);

# add consensus sequence to alignment
$aln->add_seq($consensus_obj, 1);

## END CODE ##
I have tried
$aln->add_seq(seq=>$consensus_obj, order=1);
$aln->add_seq(SEQ=>$consensus_obj, ORDER=1);
But no luck, I cannot get the consensus listed as the first sequence in the alignment. Is this possible? I can add it in like this successfully, but it adds it to the end, which is not what I need.
$aln->add_seq($consensus_obj);
These are the errors I get: Using this syntax: $aln->add_seq($consensus_obj, 1); I get this error: Use of uninitialized value in subtraction (-) at /usr/lib/perl5/site_perl/5.8.8/Bio/SimpleAlign.pm line 327, line 132. Using this syntax: $aln->add_seq(seq=>$consensus_obj, order=1); I get this error:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Unable to process non locatable sequences []
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::SimpleAlign::add_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SimpleAlign.pm:297
STACK: ./muscle_post_processor.pl:49
-----------------------------------------------------------
Any assistance would be much appreciated. Thank you. From jason.stajich at gmail.com Mon Apr 9 15:52:43 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 9 Apr 2012 14:52:43 -0500 Subject: [Bioperl-l] Need some help with : Error "Use of uninitialized value" In-Reply-To: <33653318.post@talk.nabble.com> References: <33653318.post@talk.nabble.com> Message-ID: <74341B2D-5EC2-4421-B66B-F0193CA4FB52@gmail.com> You really want to create sequence object(s) and pass these into the BLAST factory. I also can't figure out why you are manually parsing the EMBL file and then using SeqIO later. Why not use SeqIO to parse the embl/genbank file? You also don't report the line number of your current problem, but one can surmise it is here:
my $seq = $db->seq($id);
print $seq,"\n";
The error indicates you are looking up a sequence ID that doesn't exist since you get an undefined sequence. I would suggest printing out the name of the ID you are asking for to make sure it is correct.
Typically we protect these queries like:
if( my $seqstr = $db->seq($id) ) {
    print $seqstr, "\n";
} else {
    warn "cannot find $id in sequence db file\n";
}
I think you have not really structured your logic well enough in that loop - you only want to build Bio::DB::Fasta once; the whole point is to index once and then query it multiple times. You might consider starting with this code which does a lot of the stuff you are trying to do to extract annotated features. https://github.com/bioperl/bioperl-live/blob/master/scripts/seq/bp_extract_feature_seq.pl I think you are also using tr wrong - if you want to replace a string with an empty string you should use s/// and you also need to escape the | character since it has special meaning in a regular expression. I guess in your case you just want the sequence - you would use Bio::SeqIO to read in your sequence and then pass this back out as FASTA to give to getorf. I don't know if we have a wrapper for EMBOSS's getorf. There are probably a lot more things that need some attention but you should start on these. Jason On Apr 8, 2012, at 6:35 PM, Bradyjoel wrote: > > Hi all, > > I have little experiences in programming with Perl/Bioperl. I'm currently > working on a script that takes a whole genome from a bacteria as input, > converts it into a multiple fasta file containing all the open reading > frames and blast it against a multiple protein fasta file with know > proteins. When I get a hit I want to combine the header of the known protein > with the orf sequence, here it gives an error when I try to go through the > orf file and extract the right corresponding sequence. The error it gives is > : Use of uninitialized value $seq in print at blastscript.pl line .. > Is there someone who has an idea what caused this error, and can help me > with solving it?
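Jason's suggestions above (index the FASTA file once, use s/// with an escaped | rather than tr///, and guard each lookup) fit together in one small core-Perl sketch. It assumes a FASTA file small enough to hold in memory and stands in for Bio::DB::Fasta rather than reproducing it. index_fasta and the 'lcl|' header prefix are hypothetical examples.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse FASTA text from a filehandle once, returning a hash ref of
# id => sequence. The id is the first word of the header with a leading
# "lcl|" removed (hypothetical prefix). tr/lcl|// would not do this: tr///
# works on single characters and, with an empty replacement list and no /d,
# leaves the string unchanged. s/// with an escaped "|" is needed instead.
sub index_fasta {
    my ($fh) = @_;
    my (%seq, $id);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            $id = $1;
            $id =~ s/^lcl\|//;
            $seq{$id} = '';
        }
        elsif (defined $id) {
            $line =~ s/\s+//g;
            $seq{$id} .= $line;
        }
    }
    return \%seq;
}

# Build the index once, then query it as often as needed, guarding each
# lookup so a missing ID gives a clear warning instead of
# "Use of uninitialized value".
my $fasta = ">lcl|orf1 some description\nATGAAA\nTAG\n>lcl|orf2\nATGCCC\n";
open(my $fh, '<', \$fasta) or die "Cannot open in-memory FASTA: $!\n";
my $db = index_fasta($fh);
close($fh);

for my $id (qw(orf1 orf3)) {
    if (defined(my $seqstr = $db->{$id})) {
        print "$id\t$seqstr\n";
    }
    else {
        warn "cannot find $id in sequence db file\n";
    }
}
```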
> > Regards, Joel (I put my script in the attachment) > http://old.nabble.com/file/p33653318/blastscript.pl blastscript.pl > -- > View this message in context: http://old.nabble.com/Need-some-help-with-%3A-Error-%22Use-of-uninitialized-value%22-tp33653318p33653318.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From jason.stajich at gmail.com Mon Apr 9 15:57:52 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 9 Apr 2012 14:57:52 -0500 Subject: [Bioperl-l] Bio::AlignIO->add_new ORDER? In-Reply-To: <7A8AC28B954CFA428BD67E1376B889A21F0077FC@EX-HS-MBX03.cshl.edu> References: <7A8AC28B954CFA428BD67E1376B889A21F0077FC@EX-HS-MBX03.cshl.edu> Message-ID: You cannot use order=1, it would have to be order => 1 as you are passing in a hash not an assignment. However, I think the rearrange function that parses arguments prefers a leading '-' so it should be -order => 1. Same thing for -seq=>$seq not seq=$seq Did you try using exactly what is in the perldoc? Title : add_seq Usage : $myalign->add_seq($newseq); $myalign->add_seq(-SEQ=>$newseq, -ORDER=>5); Function : Adds another sequence to the alignment. *Does not* align it - just adds it to the hashes. If -ORDER is specified, the sequence is inserted at the the position spec'd by -ORDER, and existing sequences are pushed down the storage array. Returns : nothing Args : A Bio::LocatableSeq object Positive integer for the sequence position (optional) Also - I am not sure what version of the code you are using, that line error you report is not in the current code so you may have to print out what is on those lines or consider upgrading to latest version of the code. 
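Jason's point about the leading '-' can be illustrated without BioPerl. The sketch below uses a hypothetical rearrange_sketch helper, a simplified stand-in and not BioPerl's actual argument parser, and it assumes that when the first argument lacks a leading '-' the arguments are taken positionally. Under that assumption, seq => $consensus_obj would hand the literal string 'seq' to add_seq as the sequence, which could explain the "non locatable sequences" exception.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for BioPerl's named-argument parsing
# (not the real BioPerl code). Assumed convention: -KEY => value pairs,
# falling back to positional arguments when the first argument does not
# start with '-'.
sub rearrange_sketch {
    my ($names, @args) = @_;

    # No leading '-': hand the arguments back in the order given.
    return @args unless @args && defined $args[0] && $args[0] =~ /^-/;

    my %value_for;
    while (my ($key, $value) = splice @args, 0, 2) {
        $key = uc $key;     # -order, -ORDER and -Order are treated alike
        $key =~ s/^-//;     # strip the leading dash
        $value_for{$key} = $value;
    }
    return @value_for{@$names};    # hash slice in the requested order
}

my @names = qw(SEQ ORDER);

# Documented form: both values land where expected.
my ($seq, $order) = rearrange_sketch(\@names, -SEQ => 'consensus_obj', -ORDER => 1);
print "named:     seq=$seq order=$order\n";   # seq=consensus_obj order=1

# Without dashes the list is read positionally, so the "sequence"
# becomes the literal string 'seq'.
my ($seq2, $order2) = rearrange_sketch(\@names, seq => 'consensus_obj', order => 1);
print "no dashes: seq=$seq2 order=$order2\n"; # seq=seq order=consensus_obj
```

Under the same assumption, the positional call $aln->add_seq($consensus_obj, 1) would already map to SEQ and ORDER, which is consistent with Jason's suggestion that the "uninitialized value" warning comes from the older SimpleAlign.pm installed in site_perl.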
On Apr 5, 2012, at 2:29 PM, Khalfan, Mohammed wrote: > Hi, > > I am having a problem trying to add a new sequence to an alignment using the order parameter. > > I would like to add the sequence to the first position (row) in the alignment, is this possible? Here is what my code looks like: > > use Bio::AlignIO; > use Bio::LocatableSeq; > use Bio::SimpleAlign; > > my $in = Bio::AlignIO->new( -file => $infile ); # $infile is the output from muscle > > my $aln = $in->next_aln; > > # build a consensus from the current alignment > my $consensus = $aln->consensus_string(); > > # make the consensus sequence obtained in the above step into a LocatableSeq object > my $consensus_obj = new Bio::LocatableSeq ( > -seq => $consensus, > -id => 'Consensus', > -start => 1, > -end => length($consensus), > ); > > # add consensus sequence to alignment > $aln->add_seq($consensus_obj, 1); > > ## END CODE ## > > I have tried > $aln->add_seq(seq=>$consensus_obj, order=1); > $aln->add_seq(SEQ=>$consensus_obj, ORDER=1); > > But no luck, I cannot get the consensus listed as the first sequence in the alignment. Is this possible? > > I can add it in like this successfully, but it adds it to the end, which is not what I need. > $aln->add_seq($consensus_obj); > > These are the errors I get: > > Using this syntax: $aln->add_seq($consensus_obj, 1); > I get this error: > Use of uninitialized value in subtraction (-) at /usr/lib/perl5/site_perl/5.8.8/Bio/SimpleAlign.pm line 327, line 132. 
> > Using this syntax: $aln->add_seq(seq=>$consensus_obj, order=1); > I get this error: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Unable to process non locatable sequences [] > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::SimpleAlign::add_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SimpleAlign.pm:297 > STACK: ./muscle_post_processor.pl:49 > ----------------------------------------------------------- > > Any assistance would be much appreciated. Thank you. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From heath.obrien at gmail.com Mon Apr 9 17:37:56 2012 From: heath.obrien at gmail.com (Heath O'Brien) Date: Mon, 9 Apr 2012 17:37:56 -0400 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <4F8352DB.6060106@sanger.ac.uk> References: <4F8352DB.6060106@sanger.ac.uk> Message-ID: <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> Hi Frank, I just tried it with the latest version from bioperl-live, and it worked the way I described in my email. all good things, Heath On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: > Hi Heath, > > I have recently worked a bit on that module and contributed the code > to bioperl-live. I think this behaviour may already have changed but > I'm not 100% sure at the moment. When I have some time I will review > the code to confirm. In the meantime, you could give it a go with > the bioperl-live version if that's an option for you? 
> > Cheers, > > Frank > > > On 03/04/12 17:56, Heath O'Brien wrote: >> Hi All, >> >> I've encountered a bug in the trunc_with_features function in >> SeqUtils.pm, or at >> least behavior that was unexpected to me: >> >> Features with fuzzy coordinates in the original sequence are >> converted to exact >> coordinates in the truncated sequence. For example, the script >> below changes the >> coordinates for the feature from<1..5 to 1..5. >> >> I have modified the code to change this behavior on my system, but >> I thought I'd >> post something here in case others encounter the same problem. >> >> all good things, >> Heath >> >> >> >> #!/usr/bin/perl -w >> >> use strict; >> use warnings; >> use Bio::SeqIO; >> use Bio::SeqUtils; >> >> my $infile= shift; >> >> my $inIO = Bio::SeqIO->new('-file' => $infile, >> '-format' => 'genbank') or die "could not open seq file >> $infile\n"; >> >> my $outfile = $infile . '_out.gbk'; >> >> my $outIO = Bio::SeqIO->new('-file' => ">$outfile", >> '-format' => 'genbank') or die "could not open seq file >> $outfile\n"; >> >> my $in_seq = $inIO->next_seq; >> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >> $outIO->write_seq($out_seq); >> exit; >> >> >> LOCUS test_sequence 57303 bp DNA linear UNA >> DEFINITION Sequence to demonstrate unexpected behavior of >> trunc_with_features >> ACCESSION unknown >> KEYWORDS . >> FEATURES Location/Qualifiers >> source 1..10 >> /mol_type="genomic DNA" >> gene<1..5 >> /gene="test" >> CDS<1..5 >> /product="hypothetical protein" >> ORIGIN >> 1 caagattaaa >> // >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. 
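The behaviour Heath reports (a <1..5 location coming back as 1..5) comes down to losing the fuzzy marker when coordinates are shifted into the truncated sequence. The minimal pure-Perl sketch below works on GenBank-style location strings. The truncate_location helper is hypothetical and is not how Bio::SeqUtils::trunc_with_features is implemented. It only handles simple start..end locations with optional '<' and '>' markers, and it assumes that a feature clipped by the truncation window should gain a fuzzy marker at the clipped end.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl code): re-express a simple GenBank-style
# location "start..end", with optional '<'/'>' fuzzy markers, in the
# coordinates of the subsequence $win_start..$win_end (1-based, inclusive).
sub truncate_location {
    my ($loc, $win_start, $win_end) = @_;
    my ($lt, $start, $gt, $end) = $loc =~ /^(<?)(\d+)\.\.(>?)(\d+)$/
        or die "unsupported location: $loc\n";

    return if $end < $win_start || $start > $win_end;    # no overlap

    # Clip to the window; assume a clipped end should become fuzzy.
    if ($start < $win_start) { $start = $win_start; $lt = '<' }
    if ($end   > $win_end)   { $end   = $win_end;   $gt = '>' }

    # Shift so that $win_start becomes position 1, keeping the markers.
    my $offset = $win_start - 1;
    return sprintf '%s%d..%s%d', $lt, $start - $offset, $gt, $end - $offset;
}

print truncate_location('<1..5', 1, 5), "\n";   # <1..5 (marker kept)
print truncate_location('3..10', 5, 8), "\n";   # <1..>4 (clipped at both ends)
```

Keeping the markers as separate captures, rather than rebuilding the location from bare integers, is what prevents the <1..5 to 1..5 conversion in this sketch.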
From fs5 at sanger.ac.uk Mon Apr 9 17:21:31 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Mon, 09 Apr 2012 22:21:31 +0100 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: References: Message-ID: <4F8352DB.6060106@sanger.ac.uk> Hi Heath, I have recently worked a bit on that module and contributed the code to bioperl-live. I think this behaviour may already have changed but I'm not 100% sure at the moment. When I have some time I will review the code to confirm. In the meantime, you could give it a go with the bioperl-live version if that's an option for you? Cheers, Frank On 03/04/12 17:56, Heath O'Brien wrote: > Hi All, > > I've encountered a bug in the trunc_with_features function in SeqUtils.pm, or at > least behavior that was unexpected to me: > > Features with fuzzy coordinates in the original sequence are converted to exact > coordinates in the truncated sequence. For example, the script below changes the > coordinates for the feature from<1..5 to 1..5. > > I have modified the code to change this behavior on my system, but I thought I'd > post something here in case others encounter the same problem. > > all good things, > Heath > > > > #!/usr/bin/perl -w > > use strict; > use warnings; > use Bio::SeqIO; > use Bio::SeqUtils; > > my $infile= shift; > > my $inIO = Bio::SeqIO->new('-file' => $infile, > '-format' => 'genbank') or die "could not open seq file $infile\n"; > > my $outfile = $infile . '_out.gbk'; > > my $outIO = Bio::SeqIO->new('-file' => ">$outfile", > '-format' => 'genbank') or die "could not open seq file $outfile\n"; > > my $in_seq = $inIO->next_seq; > my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); > $outIO->write_seq($out_seq); > exit; > > > LOCUS test_sequence 57303 bp DNA linear UNA > DEFINITION Sequence to demonstrate unexpected behavior of trunc_with_features > ACCESSION unknown > KEYWORDS . 
> FEATURES Location/Qualifiers > source 1..10 > /mol_type="genomic DNA" > gene<1..5 > /gene="test" > CDS<1..5 > /product="hypothetical protein" > ORIGIN > 1 caagattaaa > // > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From longbow0 at gmail.com Tue Apr 10 00:40:16 2012 From: longbow0 at gmail.com (longbow leo) Date: Mon, 9 Apr 2012 23:40:16 -0500 Subject: [Bioperl-l] Strange behavior of $node->height for scientific notation format branch length Message-ID: Hi all, I have encountered a strange behavior while calculating the tree height at root node. If the branch length of the tree was in scientific notation format, such as MrBayes created trees, it is unable to give correct results. For example, Tree 1: (((A:0.02,B:0.025):0.12,C:0.071):0.34,D:0.6); Tree 2: (((A:2e-2,B:2.5e-2):1.2e-1,C:7.1e-2):3.4e-1,D:6e-1); These two trees are identical besides the expression of branch length. The Perl script: # ============================================================ #!/usr/bin/perl use 5.010; use strict; use warnings; use Bio::TreeIO; my $usage = << "EOS"; Display branch lengths for leave nodes. Usage: t_branchlen.pl [] Params: : Tree file. : Tree format. Optional. Default "newick". 
EOS my ($ftre, $fmt) = @ARGV; die $usage unless ( defined $ftre ); $fmt = 'newick' unless ( defined $fmt); my $o_treei = Bio::TreeIO->new( -file => $ftre, -format => $fmt, ); my $o_tree = $o_treei->next_tree; my @o_leaves = $o_tree->get_leaf_nodes(); say join("\t", ("Node", "Branch Length", "Depth")); for my $o_node ( @o_leaves ) { say $o_node->id, "\t", $o_node->branch_length, "\t", $o_node->depth; } my $o_root = $o_tree->get_root_node; # say; say "Root height:\t", $o_root->height; exit 0; # ============================================================ For tree 1, the output is: Node Branch Length Depth A 0.02 0.48 B 0.025 0.485 C 0.071 0.411 D 0.6 0.6 *Root height: 0.6* For tree 2, Node Branch Length Depth A 2e-2 0.48 B 2.5e-2 0.485 C 7.1e-2 0.411 D 6e-1 0.6 *Root height: 3* The interesting thing is, the node depth values are correct, but I have no idea how the root height is calculated. Are there any ideas to resolve this problem? Thanks! Haizhou From jason.stajich at gmail.com Tue Apr 10 02:33:00 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 9 Apr 2012 23:33:00 -0700 Subject: [Bioperl-l] Strange behavior of $node->height for scientific notation format branch length In-Reply-To: References: Message-ID: <1839F94F-178E-44F2-8A5C-6E2657AAD59C@gmail.com> It also looks like the code that calculates the height only accepts branch lengths written as plain decimal numbers (see line 64); anything else, including scientific notation such as 2e-2, is replaced by 1. I am not sure why this is in there, but I guess it was a protection from something that was failing in some other situation.
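The value 3 follows directly from the check at line 64. If every scientific-notation branch length fails that test and is replaced by 1, the root height becomes the number of edges on the longest root-to-leaf path, which is 3 for tree 2 (root, then the ((A,B),C) clade, then (A,B), then A). The plain-Perl sketch below reproduces both numbers without BioPerl. The nested-hash tree and the height helper are hypothetical stand-ins for Bio::Tree::Node, and taking the maximum over children is an assumption, since the excerpt does not show that part.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Tree 2 from the post, (((A:2e-2,B:2.5e-2):1.2e-1,C:7.1e-2):3.4e-1,D:6e-1);
# as nested hashes: bl = branch length (as a string), kids = child nodes.
sub leaf { my ($bl) = @_; return { bl => $bl, kids => [] } }

my $tree = {
    bl   => undef,    # the root has no branch length
    kids => [
        { bl => '3.4e-1', kids => [
            { bl => '1.2e-1', kids => [ leaf('2e-2'), leaf('2.5e-2') ] },
            leaf('7.1e-2'),
        ] },
        leaf('6e-1'),
    ],
};

# Height: the longest sum of branch lengths from $node down to a leaf.
# Branch lengths rejected by $accept fall back to 1, mirroring line 64.
sub height {
    my ($node, $accept) = @_;
    my $max = 0;
    for my $kid (@{ $node->{kids} }) {
        my $bl = $kid->{bl};
        $bl = 1 unless defined $bl && $accept->($bl);
        my $h = height($kid, $accept) + $bl;
        $max = $h if $h > $max;
    }
    return $max;
}

my $decimal_only = sub { $_[0] =~ /^\-?\d+(\.\d+)?$/ };   # the test from line 64
my $any_number   = sub { looks_like_number($_[0]) };       # also accepts 2e-2

printf "decimal-only test: %g\n", height($tree, $decimal_only);   # 3
printf "looks_like_number: %g\n", height($tree, $any_number);     # 0.6
```

Replacing the regex with a looks_like_number check is one possible fix; it is only a suggestion here, not a change BioPerl has made. The sprintf workaround takes the other route and rewrites the branch lengths so that they pass the existing test.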
62: foreach my $subnode ( $self->each_Descendent ) { 63: my $bl = $subnode->branch_length; 64: $bl = 1 unless (defined $bl && $bl =~ /^\-?\d+(\.\d+)?$/); 65: my $s = $subnode->height + $bl; you can work around this by first rewriting all your branch lengths in plain decimal notation after you read the tree in: for my $node ( $tree->get_nodes ) { my $bl = $node->branch_length; $node->branch_length( sprintf("%f", $bl) ) if defined $bl; } Note that %f rounds to six decimal places, so very short branches may need a wider format such as %.10f. We should think about how we might handle scientific notation branch lengths properly in the code in the future if someone wants to take this on. Jason > Hi all, > > I have encountered a strange behavior while calculating the tree height at > root node. > > If the branch length of the tree was in scientific notation format, such as > MrBayes created trees, it is unable to give correct results. > > For example, > > Tree 1: > > (((A:0.02,B:0.025):0.12,C:0.071):0.34,D:0.6); > > Tree 2: > > (((A:2e-2,B:2.5e-2):1.2e-1,C:7.1e-2):3.4e-1,D:6e-1); > > These two trees are identical besides the expression of branch length. > > The Perl script: > > # ============================================================ > > #!/usr/bin/perl > > use 5.010; > use strict; > use warnings; > > use Bio::TreeIO; > > my $usage = << "EOS"; > Display branch lengths for leave nodes. > Usage: > t_branchlen.pl [] > Params: > : Tree file. > : Tree format. Optional. Default "newick".
> EOS > > my ($ftre, $fmt) = @ARGV; > > die $usage unless ( defined $ftre ); > > $fmt = 'newick' unless ( defined $fmt); > > my $o_treei = Bio::TreeIO->new( > -file => $ftre, > -format => $fmt, > ); > > my $o_tree = $o_treei->next_tree; > > my @o_leaves = $o_tree->get_leaf_nodes(); > > say join("\t", ("Node", "Branch Length", "Depth")); > > for my $o_node ( @o_leaves ) { > say $o_node->id, "\t", $o_node->branch_length, "\t", $o_node->depth; > } > > my $o_root = $o_tree->get_root_node; > > # say; > > say "Root height:\t", $o_root->height; > > exit 0; > > # ============================================================ > > For tree 1, the output is: > > Node Branch Length Depth > A 0.02 0.48 > B 0.025 0.485 > C 0.071 0.411 > D 0.6 0.6 > *Root height: 0.6* > > For tree 2, > > Node Branch Length Depth > A 2e-2 0.48 > B 2.5e-2 0.485 > C 7.1e-2 0.411 > D 6e-1 0.6 > *Root height: 3* > > The interesting thing is, the node depth values are correct, but I have no > idea how the root height calculated. > > Are there any ideas to resolve this problem? > > Thanks! > > Haizhou > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From fs5 at sanger.ac.uk Tue Apr 10 04:42:54 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Tue, 10 Apr 2012 09:42:54 +0100 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> Message-ID: <4F83F28E.4080000@sanger.ac.uk> Hi Heath, Yes, I just had a look too and it's true that it would currently ignore the original type. I had added some new methods (delete, insert, ligate) and with those the location type is preserved but not with the already existing methods like trunc_with_features. 
I will look into it when I have some time and make some changes. Cheers, Frank On 09/04/12 22:37, Heath O'Brien wrote: > Hi Frank, > > I just tried it with the latest version from bioperl-live, and it worked > the way I described in my email. > > all good things, > Heath > > > On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: > >> Hi Heath, >> >> I have recently worked a bit on that module and contributed the code >> to bioperl-live. I think this behaviour may already have changed but >> I'm not 100% sure at the moment. When I have some time I will review >> the code to confirm. In the meantime, you could give it a go with the >> bioperl-live version if that's an option for you? >> >> Cheers, >> >> Frank >> >> >> On 03/04/12 17:56, Heath O'Brien wrote: >>> Hi All, >>> >>> I've encountered a bug in the trunc_with_features function in >>> SeqUtils.pm, or at >>> least behavior that was unexpected to me: >>> >>> Features with fuzzy coordinates in the original sequence are >>> converted to exact >>> coordinates in the truncated sequence. For example, the script below >>> changes the >>> coordinates for the feature from<1..5 to 1..5. >>> >>> I have modified the code to change this behavior on my system, but I >>> thought I'd >>> post something here in case others encounter the same problem. >>> >>> all good things, >>> Heath >>> >>> >>> >>> #!/usr/bin/perl -w >>> >>> use strict; >>> use warnings; >>> use Bio::SeqIO; >>> use Bio::SeqUtils; >>> >>> my $infile= shift; >>> >>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>> '-format' => 'genbank') or die "could not open seq file $infile\n"; >>> >>> my $outfile = $infile . 
'_out.gbk'; >>> >>> my $outIO = Bio::SeqIO->new('-file' => ">$outfile", >>> '-format' => 'genbank') or die "could not open seq file $outfile\n"; >>> >>> my $in_seq = $inIO->next_seq; >>> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>> $outIO->write_seq($out_seq); >>> exit; >>> >>> >>> LOCUS test_sequence 57303 bp DNA linear UNA >>> DEFINITION Sequence to demonstrate unexpected behavior of >>> trunc_with_features >>> ACCESSION unknown >>> KEYWORDS . >>> FEATURES Location/Qualifiers >>> source 1..10 >>> /mol_type="genomic DNA" >>> gene<1..5 >>> /gene="test" >>> CDS<1..5 >>> /product="hypothetical protein" >>> ORIGIN >>> 1 caagattaaa >>> // >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> -- >> The Wellcome Trust Sanger Institute is operated by Genome Research >> Limited, a charity registered in England with number 1021457 and a >> company registered in England with number 2742969, whose registered >> office is 215 Euston Road, London, NW1 2BE. > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From awitney at sgul.ac.uk Tue Apr 10 05:11:51 2012 From: awitney at sgul.ac.uk (Adam Witney) Date: Tue, 10 Apr 2012 10:11:51 +0100 Subject: [Bioperl-l] Output of a BLAST parse to text file In-Reply-To: References: Message-ID: <908D24DC-1A0E-4EE1-8573-F68FB3487071@sgul.ac.uk> Hi Zac, how do you want to sort the information? If it's just on num_hsps, then you will have to store the results in an array or something and then sort that before printing your output. adam On 1 Apr 2012, at 04:35, Zachariah Wylde wrote: > Hi there, > > I am very new to Bioperl, so excuse me if come across as simple!
I need to > write a bioperl script to extract information from BLAST results. > The script needs to count how many HSPs are on each mouse chromosome and > be written to a tab-separated table. I have this so far, but do not > understand how to > sort the information. I would much, appreciate if you could help me?? > > Yours sincerely, > > Zac Wylde > > use strict; > use warnings; > use lib "C:/Program Files (x86)/BioPerl"; > use Bio::SearchIO; > > my $infile = "Alignment_Ref_Seq.txt"; > open INFILE, $infile or die "Cannot open $infile: $!"; > > my $outfile = "assignment2.txt"; > open OUTFILE, ">$outfile" or die "Cannot open $outfile: $!"; > > > my $parser = new Bio::SearchIO(-format => 'blast', -file => > 'Alignment_Ref_Seq.txt'); > > > while (my $result = $parser->next_result){ > while (my $hit = $result->next_hit){ > while (my $hsp = $hit->next_hsp){ > if ($hit->description =~ /(mus musculus)|(mouse)/i){ > if ($hit->description =~ /chromosome (\w+)/){ > print "Hit = ", $hit->name, " \t", > "chromosome = ", $1, " \t", > "HSPs = ", $hit->num_hsps, "\n"; > } > } > } > } > } > > close INFILE; > close OUTFILE; > > #unknown > #chromosome from > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roy.chaudhuri at gmail.com Tue Apr 10 07:10:36 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 10 Apr 2012 12:10:36 +0100 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <4F83F28E.4080000@sanger.ac.uk> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> Message-ID: <4F84152C.7030300@gmail.com> Hi Heath, Frank, This was probably my fault back in the mists of time. 
Looks like an easy fix though, I've reported the issue on Redmine and submitted a patch: https://redmine.open-bio.org/issues/3339 We should probably also add Heath's example as a test case. Cheers, Roy. On 10/04/2012 09:42, Frank Schwach wrote: > Hi Heath, > > Yes, I just had a look too and it's true that it would currently ignore > the original type. I had added some new methods (delete, insert, ligate) > and with those the location type is preserved but not with the already > existing methods like trunc_with_features. I will look into it when I > have some time and make some changes. > > Cheers, > > Frank > > > On 09/04/12 22:37, Heath O'Brien wrote: >> Hi Frank, >> >> I just tried it with the latest version from bioperl-live, and it worked >> the way I described in my email. >> >> all good things, >> Heath >> >> >> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >> >>> Hi Heath, >>> >>> I have recently worked a bit on that module and contributed the code >>> to bioperl-live. I think this behaviour may already have changed but >>> I'm not 100% sure at the moment. When I have some time I will review >>> the code to confirm. In the meantime, you could give it a go with the >>> bioperl-live version if that's an option for you? >>> >>> Cheers, >>> >>> Frank >>> >>> >>> On 03/04/12 17:56, Heath O'Brien wrote: >>>> Hi All, >>>> >>>> I've encountered a bug in the trunc_with_features function in >>>> SeqUtils.pm, or at >>>> least behavior that was unexpected to me: >>>> >>>> Features with fuzzy coordinates in the original sequence are >>>> converted to exact >>>> coordinates in the truncated sequence. For example, the script below >>>> changes the >>>> coordinates for the feature from<1..5 to 1..5. >>>> >>>> I have modified the code to change this behavior on my system, but I >>>> thought I'd >>>> post something here in case others encounter the same problem. 
>>>> >>>> all good things, >>>> Heath >>>> >>>> >>>> >>>> #!/usr/bin/perl -w >>>> >>>> use strict; >>>> use warnings; >>>> use Bio::SeqIO; >>>> use Bio::SeqUtils; >>>> >>>> my $infile= shift; >>>> >>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>> '-format' => 'genbank') or die "could not open seq file $infile\n"; >>>> >>>> my $outfile = $infile . '_out.gbk'; >>>> >>>> my $outIO = Bio::SeqIO->new('-file' => ">$outfile", >>>> '-format' => 'genbank') or die "could not open seq file $outfile\n"; >>>> >>>> my $in_seq = $inIO->next_seq; >>>> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>> $outIO->write_seq($out_seq); >>>> exit; >>>> >>>> >>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>> DEFINITION Sequence to demonstrate unexpected behavior of >>>> trunc_with_features >>>> ACCESSION unknown >>>> KEYWORDS . >>>> FEATURES Location/Qualifiers >>>> source 1..10 >>>> /mol_type="genomic DNA" >>>> gene<1..5 >>>> /gene="test" >>>> CDS<1..5 >>>> /product="hypothetical protein" >>>> ORIGIN >>>> 1 caagattaaa >>>> // >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> -- >>> The Wellcome Trust Sanger Institute is operated by Genome Research >>> Limited, a charity registered in England with number 1021457 and a >>> company registered in England with number 2742969, whose registered >>> office is 215 Euston Road, London, NW1 2BE. 
>> > > From roy.chaudhuri at gmail.com Tue Apr 10 10:45:21 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 10 Apr 2012 15:45:21 +0100 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <4F841EF3.6000603@sanger.ac.uk> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> <4F84152C.7030300@gmail.com> <4F841EF3.6000603@sanger.ac.uk> Message-ID: <4F844781.90005@gmail.com> Turns out I spoke too soon, I added in some new tests and they highlighted problems with both trunc_with_features and revcom_with_features. I think I have resolved all the issues in the most recent Redmine patch - Frank, Heath, please could you check that it works for you? Cheers, Roy. On 10/04/2012 12:52, Frank Schwach wrote: > Brilliant, thanks Roy! > Frank > > > On 10/04/12 12:10, Roy Chaudhuri wrote: >> Hi Heath, Frank, >> >> This was probably my fault back in the mists of time. Looks like an easy >> fix though, I've reported the issue on Redmine and submitted a patch: >> https://redmine.open-bio.org/issues/3339 >> >> We should probably also add Heath's example as a test case. >> >> Cheers, >> Roy. >> >> On 10/04/2012 09:42, Frank Schwach wrote: >>> Hi Heath, >>> >>> Yes, I just had a look too and it's true that it would currently ignore >>> the original type. I had added some new methods (delete, insert, ligate) >>> and with those the location type is preserved but not with the already >>> existing methods like trunc_with_features. I will look into it when I >>> have some time and make some changes. >>> >>> Cheers, >>> >>> Frank >>> >>> >>> On 09/04/12 22:37, Heath O'Brien wrote: >>>> Hi Frank, >>>> >>>> I just tried it with the latest version from bioperl-live, and it worked >>>> the way I described in my email. 
>>>> >>>> all good things, >>>> Heath >>>> >>>> >>>> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >>>> >>>>> Hi Heath, >>>>> >>>>> I have recently worked a bit on that module and contributed the code >>>>> to bioperl-live. I think this behaviour may already have changed but >>>>> I'm not 100% sure at the moment. When I have some time I will review >>>>> the code to confirm. In the meantime, you could give it a go with the >>>>> bioperl-live version if that's an option for you? >>>>> >>>>> Cheers, >>>>> >>>>> Frank >>>>> >>>>> >>>>> On 03/04/12 17:56, Heath O'Brien wrote: >>>>>> Hi All, >>>>>> >>>>>> I've encountered a bug in the trunc_with_features function in >>>>>> SeqUtils.pm, or at >>>>>> least behavior that was unexpected to me: >>>>>> >>>>>> Features with fuzzy coordinates in the original sequence are >>>>>> converted to exact >>>>>> coordinates in the truncated sequence. For example, the script below >>>>>> changes the >>>>>> coordinates for the feature from<1..5 to 1..5. >>>>>> >>>>>> I have modified the code to change this behavior on my system, but I >>>>>> thought I'd >>>>>> post something here in case others encounter the same problem. >>>>>> >>>>>> all good things, >>>>>> Heath >>>>>> >>>>>> >>>>>> >>>>>> #!/usr/bin/perl -w >>>>>> >>>>>> use strict; >>>>>> use warnings; >>>>>> use Bio::SeqIO; >>>>>> use Bio::SeqUtils; >>>>>> >>>>>> my $infile= shift; >>>>>> >>>>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>>>> '-format' => 'genbank') or die "could not open seq file $infile\n"; >>>>>> >>>>>> my $outfile = $infile . 
'_out.gbk'; >>>>>> >>>>>> my $outIO = Bio::SeqIO->new('-file' => ">$outfile", >>>>>> '-format' => 'genbank') or die "could not open seq file $outfile\n"; >>>>>> >>>>>> my $in_seq = $inIO->next_seq; >>>>>> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>>>> $outIO->write_seq($out_seq); >>>>>> exit; >>>>>> >>>>>> >>>>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>>>> DEFINITION Sequence to demonstrate unexpected behavior of >>>>>> trunc_with_features >>>>>> ACCESSION unknown >>>>>> KEYWORDS . >>>>>> FEATURES Location/Qualifiers >>>>>> source 1..10 >>>>>> /mol_type="genomic DNA" >>>>>> gene<1..5 >>>>>> /gene="test" >>>>>> CDS<1..5 >>>>>> /product="hypothetical protein" >>>>>> ORIGIN >>>>>> 1 caagattaaa >>>>>> // >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> -- >>>>> The Wellcome Trust Sanger Institute is operated by Genome Research >>>>> Limited, a charity registered in England with number 1021457 and a >>>>> company registered in England with number 2742969, whose registered >>>>> office is 215 Euston Road, London, NW1 2BE. >>>> >>> >>> >> > > From heath.obrien at gmail.com Tue Apr 10 11:34:59 2012 From: heath.obrien at gmail.com (Heath O'Brien) Date: Tue, 10 Apr 2012 11:34:59 -0400 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <4F844781.90005@gmail.com> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> <4F84152C.7030300@gmail.com> <4F841EF3.6000603@sanger.ac.uk> <4F844781.90005@gmail.com> Message-ID: <80696D43-B1D4-4228-A4B5-6B9299F46D41@gmail.com> Works perfect for me. Thanks! 
all good things, Heath On 10-Apr-12, at 10:45 AM, Roy Chaudhuri wrote: > Turns out I spoke too soon, I added in some new tests and they > highlighted problems with both trunc_with_features and > revcom_with_features. I think I have resolved all the issues in the > most recent Redmine patch - Frank, Heath, please could you check > that it works for you? > > Cheers, > Roy. > > On 10/04/2012 12:52, Frank Schwach wrote: >> Brilliant, thanks Roy! >> Frank >> >> >> On 10/04/12 12:10, Roy Chaudhuri wrote: >>> Hi Heath, Frank, >>> >>> This was probably my fault back in the mists of time. Looks like >>> an easy >>> fix though, I've reported the issue on Redmine and submitted a >>> patch: >>> https://redmine.open-bio.org/issues/3339 >>> >>> We should probably also add Heath's example as a test case. >>> >>> Cheers, >>> Roy. >>> >>> On 10/04/2012 09:42, Frank Schwach wrote: >>>> Hi Heath, >>>> >>>> Yes, I just had a look too and it's true that it would currently >>>> ignore >>>> the original type. I had added some new methods (delete, insert, >>>> ligate) >>>> and with those the location type is preserved but not with the >>>> already >>>> existing methods like trunc_with_features. I will look into it >>>> when I >>>> have some time and make some changes. >>>> >>>> Cheers, >>>> >>>> Frank >>>> >>>> >>>> On 09/04/12 22:37, Heath O'Brien wrote: >>>>> Hi Frank, >>>>> >>>>> I just tried it with the latest version from bioperl-live, and >>>>> it worked >>>>> the way I described in my email. >>>>> >>>>> all good things, >>>>> Heath >>>>> >>>>> >>>>> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >>>>> >>>>>> Hi Heath, >>>>>> >>>>>> I have recently worked a bit on that module and contributed the >>>>>> code >>>>>> to bioperl-live. I think this behaviour may already have >>>>>> changed but >>>>>> I'm not 100% sure at the moment. When I have some time I will >>>>>> review >>>>>> the code to confirm. 
In the meantime, you could give it a go >>>>>> with the >>>>>> bioperl-live version if that's an option for you? >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> On 03/04/12 17:56, Heath O'Brien wrote: >>>>>>> Hi All, >>>>>>> >>>>>>> I've encountered a bug in the trunc_with_features function in >>>>>>> SeqUtils.pm, or at >>>>>>> least behavior that was unexpected to me: >>>>>>> >>>>>>> Features with fuzzy coordinates in the original sequence are >>>>>>> converted to exact >>>>>>> coordinates in the truncated sequence. For example, the script >>>>>>> below >>>>>>> changes the >>>>>>> coordinates for the feature from<1..5 to 1..5. >>>>>>> >>>>>>> I have modified the code to change this behavior on my system, >>>>>>> but I >>>>>>> thought I'd >>>>>>> post something here in case others encounter the same problem. >>>>>>> >>>>>>> all good things, >>>>>>> Heath >>>>>>> >>>>>>> >>>>>>> >>>>>>> #!/usr/bin/perl -w >>>>>>> >>>>>>> use strict; >>>>>>> use warnings; >>>>>>> use Bio::SeqIO; >>>>>>> use Bio::SeqUtils; >>>>>>> >>>>>>> my $infile= shift; >>>>>>> >>>>>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>>>>> '-format' => 'genbank') or die "could not open seq file >>>>>>> $infile\n"; >>>>>>> >>>>>>> my $outfile = $infile . '_out.gbk'; >>>>>>> >>>>>>> my $outIO = Bio::SeqIO->new('-file' => ">$outfile", >>>>>>> '-format' => 'genbank') or die "could not open seq file >>>>>>> $outfile\n"; >>>>>>> >>>>>>> my $in_seq = $inIO->next_seq; >>>>>>> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>>>>> $outIO->write_seq($out_seq); >>>>>>> exit; >>>>>>> >>>>>>> >>>>>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>>>>> DEFINITION Sequence to demonstrate unexpected behavior of >>>>>>> trunc_with_features >>>>>>> ACCESSION unknown >>>>>>> KEYWORDS . 
>>>>>>> FEATURES Location/Qualifiers >>>>>>> source 1..10 >>>>>>> /mol_type="genomic DNA" >>>>>>> gene<1..5 >>>>>>> /gene="test" >>>>>>> CDS<1..5 >>>>>>> /product="hypothetical protein" >>>>>>> ORIGIN >>>>>>> 1 caagattaaa >>>>>>> // >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> -- >>>>>> The Wellcome Trust Sanger Institute is operated by Genome >>>>>> Research >>>>>> Limited, a charity registered in England with number 1021457 >>>>>> and a >>>>>> company registered in England with number 2742969, whose >>>>>> registered >>>>>> office is 215 Euston Road, London, NW1 2BE. >>>>> >>>> >>>> >>> >> >> > From cjfields at illinois.edu Tue Apr 10 13:08:45 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 10 Apr 2012 17:08:45 +0000 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <80696D43-B1D4-4228-A4B5-6B9299F46D41@gmail.com> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> <4F84152C.7030300@gmail.com> <4F841EF3.6000603@sanger.ac.uk> <4F844781.90005@gmail.com> <80696D43-B1D4-4228-A4B5-6B9299F46D41@gmail.com> Message-ID: I have committed these to bioperl-live, they passed tests for me. I have left the bug report open, however, in case more work needs to be done. Roy, did you want to close that when you are ready? chris On Apr 10, 2012, at 10:34 AM, Heath O'Brien wrote: > Works perfect for me. Thanks! > > all good things, > Heath > > On 10-Apr-12, at 10:45 AM, Roy Chaudhuri wrote: > >> Turns out I spoke too soon, I added in some new tests and they highlighted problems with both trunc_with_features and revcom_with_features. I think I have resolved all the issues in the most recent Redmine patch - Frank, Heath, please could you check that it works for you? >> >> Cheers, >> Roy. 
>> >> On 10/04/2012 12:52, Frank Schwach wrote: >>> Brilliant, thanks Roy! >>> Frank >>> >>> >>> On 10/04/12 12:10, Roy Chaudhuri wrote: >>>> Hi Heath, Frank, >>>> >>>> This was probably my fault back in the mists of time. Looks like an easy >>>> fix though, I've reported the issue on Redmine and submitted a patch: >>>> https://redmine.open-bio.org/issues/3339 >>>> >>>> We should probably also add Heath's example as a test case. >>>> >>>> Cheers, >>>> Roy. >>>> >>>> On 10/04/2012 09:42, Frank Schwach wrote: >>>>> Hi Heath, >>>>> >>>>> Yes, I just had a look too and it's true that it would currently ignore >>>>> the original type. I had added some new methods (delete, insert, ligate) >>>>> and with those the location type is preserved but not with the already >>>>> existing methods like trunc_with_features. I will look into it when I >>>>> have some time and make some changes. >>>>> >>>>> Cheers, >>>>> >>>>> Frank >>>>> >>>>> >>>>> On 09/04/12 22:37, Heath O'Brien wrote: >>>>>> Hi Frank, >>>>>> >>>>>> I just tried it with the latest version from bioperl-live, and it worked >>>>>> the way I described in my email. >>>>>> >>>>>> all good things, >>>>>> Heath >>>>>> >>>>>> >>>>>> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >>>>>> >>>>>>> Hi Heath, >>>>>>> >>>>>>> I have recently worked a bit on that module and contributed the code >>>>>>> to bioperl-live. I think this behaviour may already have changed but >>>>>>> I'm not 100% sure at the moment. When I have some time I will review >>>>>>> the code to confirm. In the meantime, you could give it a go with the >>>>>>> bioperl-live version if that's an option for you? 
>>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> Frank >>>>>>> >>>>>>> >>>>>>> On 03/04/12 17:56, Heath O'Brien wrote: >>>>>>>> Hi All, >>>>>>>> >>>>>>>> I've encountered a bug in the trunc_with_features function in >>>>>>>> SeqUtils.pm, or at >>>>>>>> least behavior that was unexpected to me: >>>>>>>> >>>>>>>> Features with fuzzy coordinates in the original sequence are >>>>>>>> converted to exact >>>>>>>> coordinates in the truncated sequence. For example, the script below >>>>>>>> changes the >>>>>>>> coordinates for the feature from<1..5 to 1..5. >>>>>>>> >>>>>>>> I have modified the code to change this behavior on my system, but I >>>>>>>> thought I'd >>>>>>>> post something here in case others encounter the same problem. >>>>>>>> >>>>>>>> all good things, >>>>>>>> Heath >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> #!/usr/bin/perl -w >>>>>>>> >>>>>>>> use strict; >>>>>>>> use warnings; >>>>>>>> use Bio::SeqIO; >>>>>>>> use Bio::SeqUtils; >>>>>>>> >>>>>>>> my $infile= shift; >>>>>>>> >>>>>>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>>>>>> '-format' => 'genbank') or die "could not open seq file $infile\n"; >>>>>>>> >>>>>>>> my $outfile = $infile . '_out.gbk'; >>>>>>>> >>>>>>>> my $outIO = Bio::SeqIO->new('-file' => ">$outfile", >>>>>>>> '-format' => 'genbank') or die "could not open seq file $outfile\n"; >>>>>>>> >>>>>>>> my $in_seq = $inIO->next_seq; >>>>>>>> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>>>>>> $outIO->write_seq($out_seq); >>>>>>>> exit; >>>>>>>> >>>>>>>> >>>>>>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>>>>>> DEFINITION Sequence to demonstrate unexpected behavior of >>>>>>>> trunc_with_features >>>>>>>> ACCESSION unknown >>>>>>>> KEYWORDS . 
>>>>>>>> FEATURES Location/Qualifiers >>>>>>>> source 1..10 >>>>>>>> /mol_type="genomic DNA" >>>>>>>> gene<1..5 >>>>>>>> /gene="test" >>>>>>>> CDS<1..5 >>>>>>>> /product="hypothetical protein" >>>>>>>> ORIGIN >>>>>>>> 1 caagattaaa >>>>>>>> // >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> The Wellcome Trust Sanger Institute is operated by Genome Research >>>>>>> Limited, a charity registered in England with number 1021457 and a >>>>>>> company registered in England with number 2742969, whose registered >>>>>>> office is 215 Euston Road, London, NW1 2BE. >>>>>> >>>>> >>>>> >>>> >>> >>> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From p.j.a.cock at googlemail.com Tue Apr 10 16:07:28 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 10 Apr 2012 21:07:28 +0100 Subject: [Bioperl-l] sequence proxy server In-Reply-To: References: Message-ID: On Fri, Apr 6, 2012 at 2:49 PM, Kevin Murray wrote: > Hi all, > > I'm an undergrad student in molecular biology at the ANU in Australia, > and my research projects are becoming increasingly bioinformatics > heavy. The latest one has involved quite a large amount of sequence > retrieval from GenBank and GenPept. The download speed to Australia > from NCBI's servers is rather slow, and i've been thinking about how > we can improve this. ...So, i though about writing a "sequence proxy" ... Have you tried TogoWS? It is based in Japan and offers access to some of the local databases but also proxies some important EMBL/EBI and NCBI resources as well - including GenBank. I would expect you'd get much faster response times from Australia than talking directly to the NCBI.
http://togows.dbcls.jp/site/en/rest.html I think the TogoWS REST API is very nice to use, and seems to give much clearer error messages than the NCBI Entrez site (TogoWS uses HTTP error codes pretty consistently). Biopython 1.59 onwards has a simple API for the TogoWS REST interface, but their URL structure is very easy, so for a simple one off task you can easily roll your own in Perl (or write one for BioPerl?). Peter From cjfields at illinois.edu Tue Apr 10 21:20:48 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 11 Apr 2012 01:20:48 +0000 Subject: [Bioperl-l] sequence proxy server In-Reply-To: References: Message-ID: On Apr 10, 2012, at 3:07 PM, Peter Cock wrote: > On Fri, Apr 6, 2012 at 2:49 PM, Kevin Murray wrote: >> Hi all, >> >> I'm an undergrad student in molecular biology at the ANU in Australia, >> and my research projects are becoming increasingly bioinformatics >> heavy. The latest one has involved quite a large amount of sequence >> retrieval from GenBank and GenPept. The download speed to Australia >> from NCBI's servers is rather slow, and i've been thinking about how >> we can improve this. ...So, i though about writing a "sequence proxy" ... > > Have you tried TogoWS? It is based Japan and offers access to > some of the local databases but also proxies some important > EMBL/EBI and NCBI resources as well - including GenBank. > I would expect you'd get much faster response times from > Australia than talking directly to the NCBI. > http://togows.dbcls.jp/site/en/rest.html > > I think the TogoWS REST API is very nice to use, and seems to > give much clearer error messages than the NCBI Entrez site > (TogoWS uses HTTP error codes pretty consistently). > > Biopython 1.59 onwards has a simple API for the TogoWS > REST interface, but their URL structure is very easy, so for > a simple one off task you can easily roll your own in Perl > (or write one for BioPerl?). > > Peter Should be easy enough if the API is well-documented. 
Related to this, anyone know if NCBI's REST API is documented anywhere? chris From roy.chaudhuri at gmail.com Wed Apr 11 06:55:49 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Wed, 11 Apr 2012 11:55:49 +0100 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> <4F84152C.7030300@gmail.com> <4F841EF3.6000603@sanger.ac.uk> <4F844781.90005@gmail.com> <80696D43-B1D4-4228-A4B5-6B9299F46D41@gmail.com> Message-ID: <4F856335.1000503@gmail.com> Hi Chris, I think it should be fine to close, but my account doesn't have permission to do so. Cheers, Roy. On 10/04/2012 18:08, Fields, Christopher J wrote: > I have committed these to bioperl-live, they passed tests for me. I > have left the bug report open, however, in case more work needs to be > done. Roy, did you want to close that when you are ready? > > chris > > On Apr 10, 2012, at 10:34 AM, Heath O'Brien wrote: > >> Works perfect for me. Thanks! >> >> all good things, Heath >> >> On 10-Apr-12, at 10:45 AM, Roy Chaudhuri wrote: >> >>> Turns out I spoke too soon, I added in some new tests and they >>> highlighted problems with both trunc_with_features and >>> revcom_with_features. I think I have resolved all the issues in >>> the most recent Redmine patch - Frank, Heath, please could you >>> check that it works for you? >>> >>> Cheers, Roy. >>> >>> On 10/04/2012 12:52, Frank Schwach wrote: >>>> Brilliant, thanks Roy! Frank >>>> >>>> >>>> On 10/04/12 12:10, Roy Chaudhuri wrote: >>>>> Hi Heath, Frank, >>>>> >>>>> This was probably my fault back in the mists of time. Looks >>>>> like an easy fix though, I've reported the issue on Redmine >>>>> and submitted a patch: >>>>> https://redmine.open-bio.org/issues/3339 >>>>> >>>>> We should probably also add Heath's example as a test case. >>>>> >>>>> Cheers, Roy. 
>>>>> >>>>> On 10/04/2012 09:42, Frank Schwach wrote: >>>>>> Hi Heath, >>>>>> >>>>>> Yes, I just had a look too and it's true that it would >>>>>> currently ignore the original type. I had added some new >>>>>> methods (delete, insert, ligate) and with those the >>>>>> location type is preserved but not with the already >>>>>> existing methods like trunc_with_features. I will look into >>>>>> it when I have some time and make some changes. >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> On 09/04/12 22:37, Heath O'Brien wrote: >>>>>>> Hi Frank, >>>>>>> >>>>>>> I just tried it with the latest version from >>>>>>> bioperl-live, and it worked the way I described in my >>>>>>> email. >>>>>>> >>>>>>> all good things, Heath >>>>>>> >>>>>>> >>>>>>> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >>>>>>> >>>>>>>> Hi Heath, >>>>>>>> >>>>>>>> I have recently worked a bit on that module and >>>>>>>> contributed the code to bioperl-live. I think this >>>>>>>> behaviour may already have changed but I'm not 100% >>>>>>>> sure at the moment. When I have some time I will >>>>>>>> review the code to confirm. In the meantime, you could >>>>>>>> give it a go with the bioperl-live version if that's an >>>>>>>> option for you? >>>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> Frank >>>>>>>> >>>>>>>> >>>>>>>> On 03/04/12 17:56, Heath O'Brien wrote: >>>>>>>>> Hi All, >>>>>>>>> >>>>>>>>> I've encountered a bug in the trunc_with_features >>>>>>>>> function in SeqUtils.pm, or at least behavior that >>>>>>>>> was unexpected to me: >>>>>>>>> >>>>>>>>> Features with fuzzy coordinates in the original >>>>>>>>> sequence are converted to exact coordinates in the >>>>>>>>> truncated sequence. For example, the script below >>>>>>>>> changes the coordinates for the feature from<1..5 to >>>>>>>>> 1..5. >>>>>>>>> >>>>>>>>> I have modified the code to change this behavior on >>>>>>>>> my system, but I thought I'd post something here in >>>>>>>>> case others encounter the same problem. 
>>>>>>>>> >>>>>>>>> all good things, Heath >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> #!/usr/bin/perl -w >>>>>>>>> >>>>>>>>> use strict; use warnings; use Bio::SeqIO; use >>>>>>>>> Bio::SeqUtils; >>>>>>>>> >>>>>>>>> my $infile= shift; >>>>>>>>> >>>>>>>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>>>>>>> '-format' => 'genbank') or die "could not open seq >>>>>>>>> file $infile\n"; >>>>>>>>> >>>>>>>>> my $outfile = $infile . '_out.gbk'; >>>>>>>>> >>>>>>>>> my $outIO = Bio::SeqIO->new('-file' => >>>>>>>>> ">$outfile", '-format' => 'genbank') or die "could >>>>>>>>> not open seq file $outfile\n"; >>>>>>>>> >>>>>>>>> my $in_seq = $inIO->next_seq; my $out_seq = >>>>>>>>> Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>>>>>>> $outIO->write_seq($out_seq); exit; >>>>>>>>> >>>>>>>>> >>>>>>>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>>>>>>> DEFINITION Sequence to demonstrate unexpected >>>>>>>>> behavior of trunc_with_features ACCESSION unknown >>>>>>>>> KEYWORDS . FEATURES Location/Qualifiers source 1..10 >>>>>>>>> /mol_type="genomic DNA" gene<1..5 /gene="test" >>>>>>>>> CDS<1..5 /product="hypothetical protein" ORIGIN 1 >>>>>>>>> caagattaaa // >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Bioperl-l mailing list Bioperl-l at lists.open-bio.org >>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> -- The Wellcome Trust Sanger Institute is operated by >>>>>>>> Genome Research Limited, a charity registered in >>>>>>>> England with number 1021457 and a company registered in >>>>>>>> England with number 2742969, whose registered office is >>>>>>>> 215 Euston Road, London, NW1 2BE. 
>>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>> >> >> _______________________________________________ Bioperl-l mailing >> list Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Wed Apr 11 11:28:38 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 11 Apr 2012 15:28:38 +0000 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <4F856335.1000503@gmail.com> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> <4F84152C.7030300@gmail.com> <4F841EF3.6000603@sanger.ac.uk> <4F844781.90005@gmail.com> <80696D43-B1D4-4228-A4B5-6B9299F46D41@gmail.com> <4F856335.1000503@gmail.com> Message-ID: Okay, closed it. Thanks again! chris On Apr 11, 2012, at 5:55 AM, Roy Chaudhuri wrote: > Hi Chris, > > I think it should be fine to close, but my account doesn't have permission to do so. > > Cheers, > Roy. > > On 10/04/2012 18:08, Fields, Christopher J wrote: >> I have committed these to bioperl-live, they passed tests for me. I >> have left the bug report open, however, in case more work needs to be >> done. Roy, did you want to close that when you are ready? >> >> chris >> >> On Apr 10, 2012, at 10:34 AM, Heath O'Brien wrote: >> >>> Works perfect for me. Thanks! >>> >>> all good things, Heath >>> >>> On 10-Apr-12, at 10:45 AM, Roy Chaudhuri wrote: >>> >>>> Turns out I spoke too soon, I added in some new tests and they >>>> highlighted problems with both trunc_with_features and >>>> revcom_with_features. I think I have resolved all the issues in >>>> the most recent Redmine patch - Frank, Heath, please could you >>>> check that it works for you? >>>> >>>> Cheers, Roy. >>>> >>>> On 10/04/2012 12:52, Frank Schwach wrote: >>>>> Brilliant, thanks Roy! Frank >>>>> >>>>> >>>>> On 10/04/12 12:10, Roy Chaudhuri wrote: >>>>>> Hi Heath, Frank, >>>>>> >>>>>> This was probably my fault back in the mists of time. 
Looks >>>>>> like an easy fix though, I've reported the issue on Redmine >>>>>> and submitted a patch: >>>>>> https://redmine.open-bio.org/issues/3339 >>>>>> >>>>>> We should probably also add Heath's example as a test case. >>>>>> >>>>>> Cheers, Roy. >>>>>> >>>>>> On 10/04/2012 09:42, Frank Schwach wrote: >>>>>>> Hi Heath, >>>>>>> >>>>>>> Yes, I just had a look too and it's true that it would >>>>>>> currently ignore the original type. I had added some new >>>>>>> methods (delete, insert, ligate) and with those the >>>>>>> location type is preserved but not with the already >>>>>>> existing methods like trunc_with_features. I will look into >>>>>>> it when I have some time and make some changes. >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> Frank >>>>>>> >>>>>>> >>>>>>> On 09/04/12 22:37, Heath O'Brien wrote: >>>>>>>> Hi Frank, >>>>>>>> >>>>>>>> I just tried it with the latest version from >>>>>>>> bioperl-live, and it worked the way I described in my >>>>>>>> email. >>>>>>>> >>>>>>>> all good things, Heath >>>>>>>> >>>>>>>> >>>>>>>> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >>>>>>>> >>>>>>>>> Hi Heath, >>>>>>>>> >>>>>>>>> I have recently worked a bit on that module and >>>>>>>>> contributed the code to bioperl-live. I think this >>>>>>>>> behaviour may already have changed but I'm not 100% >>>>>>>>> sure at the moment. When I have some time I will >>>>>>>>> review the code to confirm. In the meantime, you could >>>>>>>>> give it a go with the bioperl-live version if that's an >>>>>>>>> option for you? >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> Frank >>>>>>>>> >>>>>>>>> >>>>>>>>> On 03/04/12 17:56, Heath O'Brien wrote: >>>>>>>>>> Hi All, >>>>>>>>>> >>>>>>>>>> I've encountered a bug in the trunc_with_features >>>>>>>>>> function in SeqUtils.pm, or at least behavior that >>>>>>>>>> was unexpected to me: >>>>>>>>>> >>>>>>>>>> Features with fuzzy coordinates in the original >>>>>>>>>> sequence are converted to exact coordinates in the >>>>>>>>>> truncated sequence. 
For example, the script below >>>>>>>>>> changes the coordinates for the feature from<1..5 to >>>>>>>>>> 1..5. >>>>>>>>>> >>>>>>>>>> I have modified the code to change this behavior on >>>>>>>>>> my system, but I thought I'd post something here in >>>>>>>>>> case others encounter the same problem. >>>>>>>>>> >>>>>>>>>> all good things, Heath >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> #!/usr/bin/perl -w >>>>>>>>>> >>>>>>>>>> use strict; use warnings; use Bio::SeqIO; use >>>>>>>>>> Bio::SeqUtils; >>>>>>>>>> >>>>>>>>>> my $infile= shift; >>>>>>>>>> >>>>>>>>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>>>>>>>> '-format' => 'genbank') or die "could not open seq >>>>>>>>>> file $infile\n"; >>>>>>>>>> >>>>>>>>>> my $outfile = $infile . '_out.gbk'; >>>>>>>>>> >>>>>>>>>> my $outIO = Bio::SeqIO->new('-file' => >>>>>>>>>> ">$outfile", '-format' => 'genbank') or die "could >>>>>>>>>> not open seq file $outfile\n"; >>>>>>>>>> >>>>>>>>>> my $in_seq = $inIO->next_seq; my $out_seq = >>>>>>>>>> Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>>>>>>>> $outIO->write_seq($out_seq); exit; >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>>>>>>>> DEFINITION Sequence to demonstrate unexpected >>>>>>>>>> behavior of trunc_with_features ACCESSION unknown >>>>>>>>>> KEYWORDS . 
FEATURES Location/Qualifiers source 1..10 >>>>>>>>>> /mol_type="genomic DNA" gene<1..5 /gene="test" >>>>>>>>>> CDS<1..5 /product="hypothetical protein" ORIGIN 1 >>>>>>>>>> caagattaaa // >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Bioperl-l mailing list Bioperl-l at lists.open-bio.org >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>> >>>>>>>>> >>>>>>>>> -- The Wellcome Trust Sanger Institute is operated by >>>>>>>>> Genome Research Limited, a charity registered in >>>>>>>>> England with number 1021457 and a company registered in >>>>>>>>> England with number 2742969, whose registered office is >>>>>>>>> 215 Euston Road, London, NW1 2BE. >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>> >>> _______________________________________________ Bioperl-l mailing >>> list Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From p.j.a.cock at googlemail.com Thu Apr 12 08:47:05 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 12 Apr 2012 13:47:05 +0100 Subject: [Bioperl-l] Proposal of GSoC 2012 >> ACEtoSAM and SAMtoACE In-Reply-To: References: Message-ID: On Thu, Apr 5, 2012 at 5:52 AM, afia hayati wrote: > Dear all, > > I am afia, a PhD student in Bioinformatics. I am so interested to > participate GSoC and would like to apply for project Bio::Assembly, Ace2Sam > and Sam2Ace converter. I have written a proposal based on the guidance for > prospective GSoC student. I paste my proposal in here. > If you have time, please give me suggestions. > Thank you very much. > > Sincerely, > Afiahayati Hello Afiahayati, What would you use this converter for? I can see it is useful to convert ACE to SAM/BAM for downstream analysis and visualization. At the moment the only assemblers I regularly use which produce ACE are the Roche 'Newbler' gsAssembler, and MIRA.
For MIRA, Bastien is working on native SAM output, but for the moment I wrote and maintain a converter from MIRA's alignment format (MAF) to SAM: https://github.com/peterjc/maf2sam Or is the idea more to support SAM (and BAM) assemblies within the existing BioPerl Bio::Assembly::IO framework to allow easier manipulation from Perl? Peter From florent.angly at gmail.com Thu Apr 12 22:41:54 2012 From: florent.angly at gmail.com (Florent Angly) Date: Fri, 13 Apr 2012 12:41:54 +1000 Subject: [Bioperl-l] Proposal of GSoC 2012 >> ACEtoSAM and SAMtoACE In-Reply-To: References: Message-ID: <4F879272.30306@gmail.com> Bioperl has a module to read and write ACE files, Bio::Assembly::IO::ace. It also has a module to read (but not write) SAM files, Bio::Assembly::IO::sam. So, it looks like you can already do SAMtoACE within Bioperl. Implementing ACEtoSAM would involve adding write support to the Bio::Assembly::IO::sam module. This can be helped by looking at how Bio::Assembly::IO::ace and Bio::Assembly::IO::tigr implement write support. Regards, Florent On 12/04/12 22:47, Peter Cock wrote: > On Thu, Apr 5, 2012 at 5:52 AM, afia hayati wrote: >> Dear all, >> >> I am afia, a PhD student in Bioinformatics. I am so interested to >> participate GSoC and would like to apply for project Bio::Assembly, Ace2Sam >> and Sam2Ace converter. I have written a proposal based on the guidance for >> prospective GSoC student. I paste my proposal in here. >> If you have time, please give me suggestions. >> Thank you very much. >> >> Sincerely, >> Afiahayati > Hello Afiahayati, > > What would you use this converter for? > > I can see it is useful to convert ACE to SAM/BAM for downstream analysis > and visualization. At the moment the only assemblers I regularly use which > produce ACE are the Roche 'Newbler' gsAssembler, and MIRA.
> > For MIRA, Bastien is working on native SAM output, but for the moment > I wrote and maintain a converter from MIRA's alignment format (MAF) to > SAM: https://github.com/peterjc/maf2sam > > Or is the idea more to support SAM (and BAM) assemblies within the > existing BioPerl Bio::Assembly::IO: framework to allow easier > manipulation from Perl? > > Peter > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From p.j.a.cock at googlemail.com Fri Apr 13 04:32:00 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 13 Apr 2012 09:32:00 +0100 Subject: [Bioperl-l] Proposal of GSoC 2012 >> ACEtoSAM and SAMtoACE In-Reply-To: <4F879272.30306@gmail.com> References: <4F879272.30306@gmail.com> Message-ID: On Fri, Apr 13, 2012 at 3:41 AM, Florent Angly wrote: > Bioperl has a module to read and write ACE files, Bio::Assembly::IO::ace. It > also has a module to read (but not write) SAM files, Bio::Assembly::IO::sam. > So, it looks like you can already do SAMtoACE within Bioperl. Implementing > ACEtoSAM would involve adding write support to the Bio::Assembly::sam > module. This can be helped by looking at how Bio::Assembly::IO::ace and > Bio::Assembly::tigr implement write support. > Regards, > Florent Does SAMtoACE within Bioperl using Bio::Assembly::IO actually work? Note that proper multiple sequence alignments in SAM/BAM format are relatively rare - the vast majority of SAM/BAM files are just pairwise alignments which are not a good fit for ACE. Peter From k.d.murray.91 at gmail.com Fri Apr 13 05:31:06 2012 From: k.d.murray.91 at gmail.com (Kevin Murray) Date: Fri, 13 Apr 2012 19:31:06 +1000 Subject: [Bioperl-l] sequence proxy server In-Reply-To: References: Message-ID: Hi Chris and Peter, Thanks for the advice, it is much appreciated. 
I have found almost exactly what i was talking about in the bioperl scripts, github link https://github.com/bioperl/bioperl-live/blob/master/scripts/DB/bp_biofetch_genbank_proxy.pl I will have a go at porting this to use a Bio::DB::Flat cache, given that would be exactly what i envisaged. With regards to implementing a Bio::DB module for TogoWS, i may have a crack at it if no one else is (although it will probably take me a while). Are there any pointers or particular styles you guys have (other than TMTOWTDI)? Cheers, Regards Kevin Murray From afia.hayati at gmail.com Sat Apr 14 20:15:11 2012 From: afia.hayati at gmail.com (afia hayati) Date: Sun, 15 Apr 2012 09:15:11 +0900 Subject: [Bioperl-l] Proposal of GSoC 2012 >> ACEtoSAM and SAMtoACE In-Reply-To: References: <4F879272.30306@gmail.com> Message-ID: Peter, Florent, and all, thanks for the responses. Ya.., the idea is more to support SAM assemblies within the existing Bio::Assembly::IO. SAM or ACE files once imported should have similar handles and methods. Bio::Assembly::IO::sam is read-only. I will also try to add write support to that module. In Bio::Assembly::IO::ace there are write methods, complete with quality scores, so it "looks like" we can already do SAMtoACE conversion. Anyway, the main point is to add write support to Bio::Assembly::IO::sam. Please CMIIW, I am open to corrections and suggestions. best regards, Afiahayati On Fri, Apr 13, 2012 at 5:32 PM, Peter Cock wrote: > On Fri, Apr 13, 2012 at 3:41 AM, Florent Angly > wrote: > > Bioperl has a module to read and write ACE files, > Bio::Assembly::IO::ace. It > > also has a module to read (but not write) SAM files, > Bio::Assembly::IO::sam. > > So, it looks like you can already do SAMtoACE within Bioperl. > Implementing > > ACEtoSAM would involve adding write support to the Bio::Assembly::sam > > module. This can be helped by looking at how Bio::Assembly::IO::ace and > > Bio::Assembly::tigr implement write support.
> > Regards, > > Florent > > Does SAMtoACE within Bioperl using Bio::Assembly::IO actually work? > > Note that proper multiple sequence alignments in SAM/BAM format are > relatively rare - the vast majority of SAM/BAM files are just pairwise > alignments which are not a good fit for ACE. > > Peter From jovel_juan at hotmail.com Sat Apr 14 23:27:57 2012 From: jovel_juan at hotmail.com (Juan Jovel) Date: Sun, 15 Apr 2012 03:27:57 +0000 Subject: [Bioperl-l] how to get specific sub-sequence from GenBank file with Bio::DB::GenBank In-Reply-To: References: , , <4F879272.30306@gmail.com>, , Message-ID: Hello All, I want to get some subsequences from provirus sequences in the GenBank, I got the whole sequences with the script below. However, I want to get a specific sub-sequence, which appears in the GenBank files in the line: LTR 9091..9723 how can I modify my script to get only nts 9091-9723 (in this example), instead of the whole sequence. Thanks a lot in advance!
________________________
HERE THE SCRIPT:

#!/usr/bin/perl -w
use strict;
use Bio::DB::GenBank;
use Bio::SeqIO;

chomp(my $infile = $ARGV[0]);
open(IN, "$infile") or die "$!";
my @ids = <IN>;
chomp(my $outfile = $ARGV[1]);
my $seqs_out = Bio::SeqIO->new(-file => ">$outfile", -format => "fasta");
foreach my $entry (@ids) {
    print "$entry";
    my $db = Bio::DB::GenBank->new;
    my $seq = $db->get_Seq_by_acc($entry);
    $seqs_out->write_seq($seq);
}
exit;

From roy.chaudhuri at gmail.com Mon Apr 16 07:16:57 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 16 Apr 2012 12:16:57 +0100 Subject: [Bioperl-l] how to get specific sub-sequence from GenBank file with Bio::DB::GenBank In-Reply-To: References: , , <4F879272.30306@gmail.com>, , Message-ID: <4F8BFFA9.9030305@gmail.com> Hi Juan, If you know the LTR coordinates in advance, then you can download a specific subsequence using Bio::DB::GenBank as shown here:
http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::GenBank_when_you_have_genomic_coordinates_to_get_a_Seq_object If you don't, then you will need to download the whole sequence as you are doing, but add in some code to print out just the sequence associated with the LTR feature. Something like (untested):

for my $feat ($seq->get_SeqFeatures) {
    $seqs_out->write_seq($feat->spliced_seq) if $feat->primary_tag eq 'LTR';
}

Cheers, Roy. On 15/04/2012 04:27, Juan Jovel wrote: > > > Hello All, I want to get some subsequences from provirus sequences in > the GenBank, I got the whole sequences with the script below. > However, I want to get a specific sub-sequence, which appears in the > GenBank files in the line: LTR 9091..9723 how can I > modify my script to get only nts 9091-9723 (in this example), instead > of the whole sequence. Thanks a lot in > advance!________________________HERE THE SCRIPT: #!/usr/bin/perl > -w use strict; use Bio::DB::GenBank; use Bio::SeqIO; chomp(my $infile = > $ARGV[0]); open(IN, "$infile") or die "$!"; my @ids = <IN>; chomp(my > $outfile = $ARGV[1]); my $seqs_out = Bio::SeqIO -> new(-file => > ">$outfile", -format => "fasta"); foreach my $entry(@ids){ > print "$entry"; my $db = Bio::DB::GenBank->new; my $seq = > $db->get_Seq_by_acc($entry); $seqs_out->write_seq($seq); } exit; > > > > > > > > _______________________________________________ Bioperl-l mailing > list Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sharmashalu.bio at gmail.com Mon Apr 16 16:08:23 2012 From: sharmashalu.bio at gmail.com (shalu sharma) Date: Mon, 16 Apr 2012 16:08:23 -0400 Subject: [Bioperl-l] Amino Acid sequence to nucleotides sequence Message-ID: Hi All, Is there any way in Bioperl i can convert amino acid sequences to nucleotide sequences.
Thanks Shalu From p.j.a.cock at googlemail.com Mon Apr 16 16:32:20 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 16 Apr 2012 21:32:20 +0100 Subject: [Bioperl-l] Amino Acid sequence to nucleotides sequence In-Reply-To: References: Message-ID: On Mon, Apr 16, 2012 at 9:08 PM, shalu sharma wrote: > Hi All, > Is there any way in Bioperl i can convert amino acid sequences > to nucleotide sequences. > > Thanks > Shalu Probably - but there is more than one answer since the codon tables are a many-to-one mapping. Are you hoping for one possible nucleotide sequence, perhaps with IUPAC ambiguity characters? Perhaps a specific example of what you want would help - back-translation is a fuzzy term. If you are trying to combine a protein alignment with the original unaligned nucleotide sequences to make a codon alignment that's a different task. Peter From cjfields at illinois.edu Mon Apr 16 16:44:21 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 16 Apr 2012 20:44:21 +0000 Subject: [Bioperl-l] Amino Acid sequence to nucleotides sequence In-Reply-To: References: Message-ID: <45DE0C13-27B3-4E1C-AB8A-83B99DD407AF@illinois.edu> On Apr 16, 2012, at 3:32 PM, Peter Cock wrote: > On Mon, Apr 16, 2012 at 9:08 PM, shalu sharma wrote: >> Hi All, >> Is there any way in Bioperl i can convert amino acid sequences >> to nucleotide sequences. >> >> Thanks >> Shalu > > Probably - but there is more than one answer since the codon > tables are a many-to-one mapping. Are you hoping for one > possible nucleotide sequence, perhaps with IUPAC ambiguity > characters? Perhaps a specific example of what you want > would help - back-translation is a fuzzy term. > > If you are trying to combine a protein alignment with the > original unaligned nucleotide sequences to make a codon > alignment that's a different task.
> > Peter We do have a revtranslate function in bioperl that is supposed to deal with ambiguities: https://metacpan.org/module/Bio::Tools::CodonTable#revtranslate I don't know how well-tested it is, but it was added a few years back to Bio::Tools::CodonTable. IIRC Mark Jensen was the developer who did that, and he's pretty meticulous. chris From Russell.Smithies at agresearch.co.nz Mon Apr 16 17:28:11 2012 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 17 Apr 2012 09:28:11 +1200 Subject: [Bioperl-l] sequence proxy server In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCE50A550@exchsth.agresearch.co.nz> I assume you've done the obvious thing and tried downloading from your local mirror? ftp://biomirror.aarnet.edu.au/biomirror/ Or ours: http://www.biomirror.org.nz/ If you have a large number of requests, it's almost always faster to download the RefSeq files and extract locally rather than run queries against NCBI via the web. --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Kevin Murray > Sent: Saturday, 7 April 2012 1:50 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] sequence proxy server > > Hi all, > > I'm an undergrad student in molecular biology at the ANU in Australia, and > my research projects are becoming increasingly bioinformatics-heavy. The > latest one has involved quite a large amount of sequence retrieval from > GenBank and GenPept. The download speed to Australia from NCBI's servers > is rather slow, and I've been thinking about how we can improve this. One > solution would be to use Bio::DB::Flat with GenBank sequences on a local > computer. However, in a situation where there are multiple people in a lab > doing bioinformatics, it seems to me a bit of a waste to have the entire > GenBank/GenPept database, or even the relevant sections thereof, on each > computer.
So, I thought about writing a "sequence proxy" CGI script, and a > corresponding module, which would work a bit like this: > > The user calls Bio::DB::SeqProxy::GenBank as they would Bio::DB::GenBank, > with the exception that a parameter for the address of the sequence proxy > server is required. > The module then sends a request similar to that sent to NCBI's servers by > calling Bio::DB::GenBank->get_Seq_by_x() to the sequence proxy server. I > believe all requests go to the efetch page now (please correct me if I'm > wrong; I have read the relevant bioperl module code but not thoroughly), so > the CGI script on the sequence proxy would take arguments in a similar > fashion to make writing the client-side module easier. > The CGI script would use a Bio::DB::Flat database, or an interface to an SQL > database, to determine if the required sequence is stored locally. (As an aside, > I'd like your thoughts on Bio::DB::Flat vs Bio::DB::Sql or similar.) If the > sequence exists locally, it would be returned to the user, either as plain text, > or inside an XML container (see below). > If not, it would be retrieved from the remote database using the relevant > Bio::DB module, and returned. > > The sequence would either be returned as the relevant sequence format > (which would default to GenBank format) in plain text, or as an XML > document similar to: > > > 1 > ___YOUR GENBANK FILE HERE___ Local > Database The aim of the XML document would be to > simplify handling of server errors and allow for the specification of other > metadata such as which database the sequence came from. > > > Firstly, I'd like to know if this sounds feasible, and if so, whether someone is already > working on something similar. I don't want to reinvent the wheel. > Secondly, I'd like to ask for your comments and advice.
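The lookup order Kevin describes (serve the record from the local store if it is there, otherwise fetch it remotely and keep a copy) is a read-through cache. A minimal plain-Perl sketch follows; the SeqCache package, its fetcher callback and the in-memory hash are hypothetical stand-ins, not BioPerl modules. A real server would persist the store (for example in a Bio::DB::Flat index or an SQL table, as suggested above) and would plug a Bio::DB::GenBank lookup into the fetcher.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical read-through cache: local store first, remote fetch on a miss.
package SeqCache;

sub new {
    my ($class, %args) = @_;
    die "fetcher coderef required\n" unless ref($args{fetcher}) eq 'CODE';
    return bless { store => {}, fetcher => $args{fetcher},
                   hits => 0, misses => 0 }, $class;
}

# Returns (record, source), where source says where the record came from.
sub get {
    my ($self, $acc) = @_;
    if (exists $self->{store}{$acc}) {
        $self->{hits}++;
        return ($self->{store}{$acc}, 'Local Database');
    }
    $self->{misses}++;
    my $record = $self->{fetcher}->($acc);    # slow remote lookup, misses only
    $self->{store}{$acc} = $record if defined $record;
    return ($record, 'Remote');
}

package main;

# Fake remote fetcher that just counts how often it is called.
my $remote_calls = 0;
my $cache = SeqCache->new(fetcher => sub {
    my ($acc) = @_;
    $remote_calls++;
    return ">$acc\nACGT\n";                  # placeholder record
});

my ($rec1, $src1) = $cache->get('BC049766'); # miss: fetched "remotely"
my ($rec2, $src2) = $cache->get('BC049766'); # hit: served from the store
print "$src1 then $src2; remote calls: $remote_calls\n";
```

The design point is that every client shares one store, so each accession crosses the slow link at most once per lab rather than once per workstation.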
Being reasonably new > to bioperl (started using bioperl about 6 months ago, but I've been coding in > various languages for 8 years), I expect I haven't considered things that > may seem obvious to a more experienced bioperl-er, so please be as brutally > constructive in your criticism as you see fit =]. > > I know this is a lot of questions, so thanks in advance for your help. > > Cheers, and a happy Easter to those who celebrate it. > > Regards > Kevin Murray > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From hnorpois at googlemail.com Thu Apr 19 10:44:50 2012 From: hnorpois at googlemail.com (Hermann Norpois) Date: Thu, 19 Apr 2012 16:44:50 +0200 Subject: [Bioperl-l] Transcriptional Regulatory Element Database Message-ID: Hello, I would like to get access to the Transcriptional Regulatory Element Database (http://rulai.cshl.edu/cgi-bin/TRED/tred.cgi?process=searchPromForm) via Bioperl. I did not find a module that does the job. Is it possible to modify a module? Is it generally possible to access this database (by means of bioperl)?
Thank you norpois From jason.stajich at gmail.com Thu Apr 19 18:45:32 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Thu, 19 Apr 2012 15:45:32 -0700 Subject: [Bioperl-l] Transcriptional Regulatory Element Database In-Reply-To: References: Message-ID: <80CFDDE6-FA7F-4614-AE5D-22A5398EAA17@gmail.com> Have you first tried emailing the author listed at the bottom of the page? That seems like a more direct way to get this information. On Apr 19, 2012, at 7:44 AM, Hermann Norpois wrote: > Hello, > > I would like to get access to the Transcriptional Regulatory Element > Database (http://rulai.cshl.edu/cgi-bin/TRED/tred.cgi?process=searchPromForm) > via Bioperl. I did not find a module that does the job. Is it possible to > modify a module? Is it generally possible to access this database (by means > of bioperl)? > Thank you > norpois > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From merche at uni-bonn.de Mon Apr 23 08:31:27 2012 From: merche at uni-bonn.de (Merche Castillo) Date: Mon, 23 Apr 2012 14:31:27 +0200 Subject: [Bioperl-l] Problems installing BioPerl & cpan Message-ID: <4F954B9F.9020506@uni-bonn.de> I'm posting this message out of pure desperation, because I really don't know what else to try. I'm a beginner in bioperl and I'm working on a script to parse out some results I got from MolQuest fgenesh. Results are out in .txt format and I want to parse them into GFF and FASTA files for mRNA and protein sequences to facilitate comparison with other results we have. I would like to use BioPerl for other purposes in the future, so I'm very interested in getting it ready on my PC. I followed the instructions here: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix. I managed to install CPAN in root mode (otherwise it wouldn't work) and BioPerl via CPAN.
All tests were OK, but when I ran this script to test the installation:

    use strict;
    use warnings;

    use Getopt::Long;
    use Bio::EnsEMBL::Registry;

    my $reg = "Bio::EnsEMBL::Registry";
    $reg->load_registry_from_db(
        -host => "ensembldb.ensembl.org",
        -user => "anonymous"
    );
    my $db_list = $reg->get_all_adaptors();
    my @line;

    foreach my $db (@$db_list) {
        @line = split('=', $db);
        print $line[0] . "\n";
    }

I got the error: "Can't locate Bio/EnsEMBL/Registry.pm in @INC". I tried to install BioPerl again via Build.PL, running as root, but still got the same outcome. Thanks for your help Merche -- ************************************;) Mercedes Castillo INRES, Dept. Molecular Phytomedicine University of Bonn Karlrobert-Kreiten-str 13 53115 Bonn +49(0)22873-60143 merche at uni-bonn.de ***************************************** From jason.stajich at gmail.com Mon Apr 23 09:44:51 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 23 Apr 2012 06:44:51 -0700 Subject: [Bioperl-l] Problems installing BioPerl & cpan In-Reply-To: <4F954B9F.9020506@uni-bonn.de> References: <4F954B9F.9020506@uni-bonn.de> Message-ID: <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com> That module really has nothing to do with Fgenesh parsing, so I don't think this is a particularly good test script; try one of the scripts that come in the bioperl scripts directory, like scripts/utilities/bp_sreformat.pl. However, you should know that module is part of EnsEMBL, not BioPerl; it requires an additional package of modules to be installed. If you use CPAN to install things, you can do: cpan> install Bio::EnsEMBL::Registry On Apr 23, 2012, at 5:31 AM, Merche Castillo wrote: > I'm posting this message out of pure desperation, because I really don't know what else to try. I'm a beginner in bioperl and I'm working on a script to parse out some results I got from MolQuest fgenesh.
Results are out in .txt format and I want to parse them to GFF and fasta file for mRNA and protein sequences to facilitate comparison with other results we have. I would like to use BioPerl for other purposes in the future so I'm very interested in getting it ready on my pc > > I followed the instructions herehttp://www.bioperl.org/wiki/Installing_Bioperl_for_Unix. I managed to install CPAN in root mode (otherwise it wouldn't work) and BioPerl via CPAN. All tests were ok, but when I ran this script to test the installation > > | use strict; > use warnings; > > use Getopt::Long; > use Bio::EnsEMBL::Registry; > > my $reg = "Bio::EnsEMBL::Registry"; > $reg->load_registry_from_db( > -host => "ensembldb.ensembl.org", > -user => "anonymous" > ); > my $db_list=$reg->get_all_adaptors(); > my @line; > > foreach my $db (@$db_list){ > @line = split ('=',$db); > print $line[0]."\n"; > } > | > > I got the error:*"Can't locate Bio/EnsEMBL/Registry.pm in @INC"* > > I tried to install BioPerl again via Build.PL, running as root, but still came to the same outcome. > > Thanks for your help Merche > > -- > ************************************;) > Mercedes Castillo > INRES, Dept. 
Molecular Phytomedicine > University of Bonn > > Karlrobert-Kreiten-str 13 > 53115 Bonn > +49(0)22873-60143 > merche at uni-bonn.de > ***************************************** > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From cjfields at illinois.edu Mon Apr 23 09:48:53 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 23 Apr 2012 13:48:53 +0000 Subject: [Bioperl-l] Problems installing BioPerl & cpan In-Reply-To: <4F954B9F.9020506@uni-bonn.de> References: <4F954B9F.9020506@uni-bonn.de> Message-ID: <26594DC3-0C8D-41E5-BD22-BC3F1DC7E1F0@illinois.edu> You need the Ensembl Perl API code, which requires bioperl but is not part of the bioperl distribution. See here for the latest: http://ensembl.org/info/docs/api/index.html chris On Apr 23, 2012, at 7:31 AM, Merche Castillo wrote: > I'm posting this message out of pure desperation, because I really don't know what else to try. I'm a beginner in bioperl and I'm working on a script to parse out some results I got from MolQuest fgenesh. Results are out in .txt format and I want to parse them to GFF and fasta file for mRNA and protein sequences to facilitate comparison with other results we have. I would like to use BioPerl for other purposes in the future so I'm very interested in getting it ready on my pc > > I followed the instructions herehttp://www.bioperl.org/wiki/Installing_Bioperl_for_Unix. I managed to install CPAN in root mode (otherwise it wouldn't work) and BioPerl via CPAN. 
All tests were ok, but when I ran this script to test the installation > > | use strict; > use warnings; > > use Getopt::Long; > use Bio::EnsEMBL::Registry; > > my $reg = "Bio::EnsEMBL::Registry"; > $reg->load_registry_from_db( > -host => "ensembldb.ensembl.org", > -user => "anonymous" > ); > my $db_list=$reg->get_all_adaptors(); > my @line; > > foreach my $db (@$db_list){ > @line = split ('=',$db); > print $line[0]."\n"; > } > | > > I got the error:*"Can't locate Bio/EnsEMBL/Registry.pm in @INC"* > > I tried to install BioPerl again via Build.PL, running as root, but still came to the same outcome. > > Thanks for your help Merche > > -- > ************************************;) > Mercedes Castillo > INRES, Dept. Molecular Phytomedicine > University of Bonn > > Karlrobert-Kreiten-str 13 > 53115 Bonn > +49(0)22873-60143 > merche at uni-bonn.de > ***************************************** > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at gmail.com Mon Apr 23 09:54:54 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 23 Apr 2012 06:54:54 -0700 Subject: [Bioperl-l] Problems installing BioPerl & cpan In-Reply-To: <4F955E52.50400@uni-bonn.de> References: <4F954B9F.9020506@uni-bonn.de> <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com> <4F955E52.50400@uni-bonn.de> Message-ID: <78EB7156-3EC8-4CCD-AE6E-C221B12D4F58@gmail.com> Then the next logical thing to do is go to the Ensembl page for info on how to install their modules. http://uswest.ensembl.org/info/docs/api/api_installation.html On Apr 23, 2012, at 6:51 AM, Merche Castillo wrote: > Hi > > Thanks for your reply. I'm working on some EnsEMBL scripts too, that's why I tried this script. I did look for the Bio::EnsEMBL::Registry on cpan but returns "no object found". 
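A "Can't locate ... in @INC" error means perl could not find the module file in any of its search directories, so a quick check before and after installing the Ensembl API is to ask perl directly whether it can load the module and from where. The sketch below uses only core Perl; the subroutine name is hypothetical, and it does not describe how Ensembl should be installed (see the Ensembl installation page linked in this thread). It only reports whether the module is visible on @INC.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the path a module was loaded from, or undef if it cannot be
# loaded from the current @INC (missing, or failing to compile).
sub module_location {
    my ($module) = @_;
    # Bio::EnsEMBL::Registry -> Bio/EnsEMBL/Registry.pm
    (my $file = "$module.pm") =~ s{::}{/}g;
    return $INC{$file} if eval { require $file; 1 };
    return;
}

for my $module (qw(List::Util Bio::EnsEMBL::Registry)) {
    my $where = module_location($module);
    printf "%-25s %s\n", $module,
        defined $where ? "found at $where" : 'NOT found in @INC';
}

# The directories perl searches. A manually unpacked API has to be added
# here, for example through the PERL5LIB environment variable or "use lib".
print "\@INC:\n";
print "  $_\n" for @INC;
```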
> > > > On 04/23/2012 03:44 PM, Jason Stajich wrote: >> >> That module really has nothing to do with Fgenesh parsing so I don't think this is a particularly good test script - try one of the scripts that comes in bioperl scripts directory like scripts/utilities/bp_sreformat.pl >> >> However, you should know that module is part of EnsEMBL not BioPerl - it requires an additional package of modules be installed. >> >> if you use CPAN to install things you can do >> cpan> install Bio::EnsEMBL::Registry >> >> >> On Apr 23, 2012, at 5:31 AM, Merche Castillo wrote: >> >>> I'm posting this message out of pure desperation, because I really don't know what else to try. I'm a beginner in bioperl and I'm working on a script to parse out some results I got from MolQuest fgenesh. Results are out in .txt format and I want to parse them to GFF and fasta file for mRNA and protein sequences to facilitate comparison with other results we have. I would like to use BioPerl for other purposes in the future so I'm very interested in getting it ready on my pc >>> >>> I followed the instructions herehttp://www.bioperl.org/wiki/Installing_Bioperl_for_Unix. I managed to install CPAN in root mode (otherwise it wouldn't work) and BioPerl via CPAN. All tests were ok, but when I ran this script to test the installation >>> >>> | use strict; >>> use warnings; >>> >>> use Getopt::Long; >>> use Bio::EnsEMBL::Registry; >>> >>> my $reg = "Bio::EnsEMBL::Registry"; >>> $reg->load_registry_from_db( >>> -host => "ensembldb.ensembl.org", >>> -user => "anonymous" >>> ); >>> my $db_list=$reg->get_all_adaptors(); >>> my @line; >>> >>> foreach my $db (@$db_list){ >>> @line = split ('=',$db); >>> print $line[0]."\n"; >>> } >>> | >>> >>> I got the error:*"Can't locate Bio/EnsEMBL/Registry.pm in @INC"* >>> >>> I tried to install BioPerl again via Build.PL, running as root, but still came to the same outcome. 
>>> >>> Thanks for your help Merche >>> >>> -- >>> ************************************;) >>> Mercedes Castillo >>> INRES, Dept. Molecular Phytomedicine >>> University of Bonn >>> >>> Karlrobert-Kreiten-str 13 >>> 53115 Bonn >>> +49(0)22873-60143 >>> merche at uni-bonn.de >>> ***************************************** >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> > > > -- > ************************************;) > Mercedes Castillo > INRES, Dept. Molecular Phytomedicine > University of Bonn > > Karlrobert-Kreiten-str 13 > 53115 Bonn > +49(0)22873-60143 > merche at uni-bonn.de > ***************************************** Jason Stajich jason.stajich at gmail.com jason at bioperl.org From cjfields at illinois.edu Mon Apr 23 09:51:24 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 23 Apr 2012 13:51:24 +0000 Subject: [Bioperl-l] Problems installing BioPerl & cpan In-Reply-To: <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com> References: <4F954B9F.9020506@uni-bonn.de> <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com> Message-ID: <10108081-D3CC-4A8F-9372-390A2C377088@illinois.edu> Unfortunately the Ensembl code isn't on CPAN (wish it were, it might make things a little easier). chris On Apr 23, 2012, at 8:44 AM, Jason Stajich wrote: > That module really has nothing to do with Fgenesh parsing so I don't think this is a particularly good test script - try one of the scripts that comes in bioperl scripts directory like scripts/utilities/bp_sreformat.pl > > However, you should know that module is part of EnsEMBL not BioPerl - it requires an additional package of modules be installed. 
> > if you use CPAN to install things you can do > cpan> install Bio::EnsEMBL::Registry > > > On Apr 23, 2012, at 5:31 AM, Merche Castillo wrote: > >> I'm posting this message out of pure desperation, because I really don't know what else to try. I'm a beginner in bioperl and I'm working on a script to parse out some results I got from MolQuest fgenesh. Results are out in .txt format and I want to parse them to GFF and fasta file for mRNA and protein sequences to facilitate comparison with other results we have. I would like to use BioPerl for other purposes in the future so I'm very interested in getting it ready on my pc >> >> I followed the instructions herehttp://www.bioperl.org/wiki/Installing_Bioperl_for_Unix. I managed to install CPAN in root mode (otherwise it wouldn't work) and BioPerl via CPAN. All tests were ok, but when I ran this script to test the installation >> >> | use strict; >> use warnings; >> >> use Getopt::Long; >> use Bio::EnsEMBL::Registry; >> >> my $reg = "Bio::EnsEMBL::Registry"; >> $reg->load_registry_from_db( >> -host => "ensembldb.ensembl.org", >> -user => "anonymous" >> ); >> my $db_list=$reg->get_all_adaptors(); >> my @line; >> >> foreach my $db (@$db_list){ >> @line = split ('=',$db); >> print $line[0]."\n"; >> } >> | >> >> I got the error:*"Can't locate Bio/EnsEMBL/Registry.pm in @INC"* >> >> I tried to install BioPerl again via Build.PL, running as root, but still came to the same outcome. >> >> Thanks for your help Merche >> >> -- >> ************************************;) >> Mercedes Castillo >> INRES, Dept. 
Molecular Phytomedicine >> University of Bonn >> >> Karlrobert-Kreiten-str 13 >> 53115 Bonn >> +49(0)22873-60143 >> merche at uni-bonn.de >> ***************************************** >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From l.m.timmermans at students.uu.nl Mon Apr 23 10:16:04 2012 From: l.m.timmermans at students.uu.nl (Leon Timmermans) Date: Mon, 23 Apr 2012 16:16:04 +0200 Subject: [Bioperl-l] Problems installing BioPerl & cpan In-Reply-To: <10108081-D3CC-4A8F-9372-390A2C377088@illinois.edu> References: <4F954B9F.9020506@uni-bonn.de> <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com> <10108081-D3CC-4A8F-9372-390A2C377088@illinois.edu> Message-ID: Yeah, that's fairly useless. Does anyone know the reason for that? Is it just inertia, or is it more? Leon On Mon, Apr 23, 2012 at 3:51 PM, Fields, Christopher J < cjfields at illinois.edu> wrote: > Unfortunately the Ensembl code isn't on CPAN (wish it were, it might make > things a little easier). > > chris > > On Apr 23, 2012, at 8:44 AM, Jason Stajich wrote: > > > That module really has nothing to do with Fgenesh parsing so I don't > think this is a particularly good test script - try one of the scripts that > comes in bioperl scripts directory like scripts/utilities/bp_sreformat.pl > > > > However, you should know that module is part of EnsEMBL not BioPerl - it > requires an additional package of modules be installed. 
> > > > if you use CPAN to install things you can do > > cpan> install Bio::EnsEMBL::Registry > > > > > > On Apr 23, 2012, at 5:31 AM, Merche Castillo wrote: > > > >> I'm posting this message out of pure desperation, because I really > don't know what else to try. I'm a beginner in bioperl and I'm working on a > script to parse out some results I got from MolQuest fgenesh. Results are > out in .txt format and I want to parse them to GFF and fasta file for mRNA > and protein sequences to facilitate comparison with other results we have. > I would like to use BioPerl for other purposes in the future so I'm very > interested in getting it ready on my pc > >> > >> I followed the instructions herehttp:// > www.bioperl.org/wiki/Installing_Bioperl_for_Unix. I managed to install > CPAN in root mode (otherwise it wouldn't work) and BioPerl via CPAN. All > tests were ok, but when I ran this script to test the installation > >> > >> | use strict; > >> use warnings; > >> > >> use Getopt::Long; > >> use Bio::EnsEMBL::Registry; > >> > >> my $reg = "Bio::EnsEMBL::Registry"; > >> $reg->load_registry_from_db( > >> -host => "ensembldb.ensembl.org", > >> -user => "anonymous" > >> ); > >> my $db_list=$reg->get_all_adaptors(); > >> my @line; > >> > >> foreach my $db (@$db_list){ > >> @line = split ('=',$db); > >> print $line[0]."\n"; > >> } > >> | > >> > >> I got the error:*"Can't locate Bio/EnsEMBL/Registry.pm in @INC"* > >> > >> I tried to install BioPerl again via Build.PL, running as root, but > still came to the same outcome. > >> > >> Thanks for your help Merche > >> > >> -- > >> ************************************;) > >> Mercedes Castillo > >> INRES, Dept. 
Molecular Phytomedicine > >> University of Bonn > >> > >> Karlrobert-Kreiten-str 13 > >> 53115 Bonn > >> +49(0)22873-60143 > >> merche at uni-bonn.de > >> ***************************************** > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Jason Stajich > > jason.stajich at gmail.com > > jason at bioperl.org > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Apr 23 10:20:59 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 23 Apr 2012 14:20:59 +0000 Subject: [Bioperl-l] Problems installing BioPerl & cpan In-Reply-To: References: <4F954B9F.9020506@uni-bonn.de> <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com> <10108081-D3CC-4A8F-9372-390A2C377088@illinois.edu> Message-ID: <70FCB632-4CD5-4F28-A6B6-F93507397435@illinois.edu> Not sure, but it may have something to do with the requirement for a very old bioperl (v1.2.3). chris On Apr 23, 2012, at 9:16 AM, Leon Timmermans wrote: > Yeah, that's fairly useless. Does anyone know the reason for that? Is it just inertia, or is it more? > > Leon > > On Mon, Apr 23, 2012 at 3:51 PM, Fields, Christopher J wrote: > Unfortunately the Ensembl code isn't on CPAN (wish it were, it might make things a little easier). 
> > chris > > On Apr 23, 2012, at 8:44 AM, Jason Stajich wrote: > > > That module really has nothing to do with Fgenesh parsing so I don't think this is a particularly good test script - try one of the scripts that comes in bioperl scripts directory like scripts/utilities/bp_sreformat.pl > > > > However, you should know that module is part of EnsEMBL not BioPerl - it requires an additional package of modules be installed. > > > > if you use CPAN to install things you can do > > cpan> install Bio::EnsEMBL::Registry > > > > > > On Apr 23, 2012, at 5:31 AM, Merche Castillo wrote: > > > >> I'm posting this message out of pure desperation, because I really don't know what else to try. I'm a beginner in bioperl and I'm working on a script to parse out some results I got from MolQuest fgenesh. Results are out in .txt format and I want to parse them to GFF and fasta file for mRNA and protein sequences to facilitate comparison with other results we have. I would like to use BioPerl for other purposes in the future so I'm very interested in getting it ready on my pc > >> > >> I followed the instructions herehttp://www.bioperl.org/wiki/Installing_Bioperl_for_Unix. I managed to install CPAN in root mode (otherwise it wouldn't work) and BioPerl via CPAN. All tests were ok, but when I ran this script to test the installation > >> > >> | use strict; > >> use warnings; > >> > >> use Getopt::Long; > >> use Bio::EnsEMBL::Registry; > >> > >> my $reg = "Bio::EnsEMBL::Registry"; > >> $reg->load_registry_from_db( > >> -host => "ensembldb.ensembl.org", > >> -user => "anonymous" > >> ); > >> my $db_list=$reg->get_all_adaptors(); > >> my @line; > >> > >> foreach my $db (@$db_list){ > >> @line = split ('=',$db); > >> print $line[0]."\n"; > >> } > >> | > >> > >> I got the error:*"Can't locate Bio/EnsEMBL/Registry.pm in @INC"* > >> > >> I tried to install BioPerl again via Build.PL, running as root, but still came to the same outcome. 
> >> > >> Thanks for your help Merche > >> > >> -- > >> ************************************;) > >> Mercedes Castillo > >> INRES, Dept. Molecular Phytomedicine > >> University of Bonn > >> > >> Karlrobert-Kreiten-str 13 > >> 53115 Bonn > >> +49(0)22873-60143 > >> merche at uni-bonn.de > >> ***************************************** > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Jason Stajich > > jason.stajich at gmail.com > > jason at bioperl.org > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From rbuels at gmail.com Mon Apr 23 19:49:10 2012 From: rbuels at gmail.com (Robert Buels) Date: Mon, 23 Apr 2012 19:49:10 -0400 Subject: [Bioperl-l] Announcing OBF Google Summer of Code Accepted Students Message-ID: <4F95EA76.4030004@gmail.com> Hello all, I'm very pleased and excited to announce that the Open Bioinformatics Foundation has selected 5 very capable students to work on OBF projects this summer as part of the Google Summer of Code program. 
The accepted students, their projects, and their mentors (in alphabetical order):

  Wibowo Arindrarto
    SearchIO Implementation in Biopython
    mentored by Peter Cock

  Lenna Peterson
    Diff My DNA: Development of a Genomic Variant Toolkit for Biopython
    mentored by Brad Chapman

  Marjan Povolni
    The world's fastest parallelized GFF3/GTF parser in D, and an interfacing biogem plugin for Ruby
    mentored by Pjotr Prins, Francesco Strozzi, Raoul Bonnal

  Artem Tarasov
    Fast parallelized GFF3/GTF parser in C++, with Ruby FFI bindings
    mentored by Pjotr Prins, Francesco Strozzi, Raoul Bonnal

  Clayton Wheeler
    Multiple Alignment Format parser for BioRuby
    mentored by Francesco Strozzi and Raoul Bonnal

As every year, we received many great applications and ideas. However, funding and mentor resources are limited, and we were not able to accept as many as we would have liked. Our deepest thanks to all the students who applied: we sincerely appreciate the time and effort you put into your applications, and hope you will still consider being a part of the OBF's open source projects, even without Google funding. I speak for myself and all of the mentors who read and scored applications when I say that we were truly honored by the number and quality of the applications we received. For the accepted students: congratulations! You have risen to the top of a very competitive application process. Now it's time to "put your money where your mouth is", as the saying goes. Let's get out there and write some great code this summer!
Best regards, Rob ---- Robert Buels OBF GSoC 2012 Administrator From Simon.Guest at agresearch.co.nz Mon Apr 30 02:00:26 2012 From: Simon.Guest at agresearch.co.nz (Guest, Simon) Date: Mon, 30 Apr 2012 18:00:26 +1200 Subject: [Bioperl-l] Circular dependency problems packaging BioPerl as RPM Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCECE5EF4@exchsth.agresearch.co.nz> Dear BioPerlers, I am building BioPerl and BioPerl-Run as RPMs, since I have to install them on several servers, and really don't want to run CPAN installation scripts on each machine. It has been a tortuous journey of chasing down dependencies and packaging them (thank goodness for cpanspec), but I think I am nearly done. However, I have hit a circular dependency / incompatibility problem between BioPerl and BioPerl-Run. When building BioPerl-Run-1.006900, it insists on BioPerl-1.6.901: Checking prerequisites... - ERROR: Bio::Root::Version (1.006001) is installed, but we need version >= 1.006900 But then BioPerl-Run-1.006900 has dependencies on Bio::Expression::DataSet Bio::Expression::Platform Bio::Expression::Sample Bio::Expression::Contact which seem to be in BioPerl-1.6.1 but not BioPerl-1.6.901 Does anyone know of this problem? Are there any suggestions for work arounds? cheers, Simon ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From cjfields at illinois.edu Mon Apr 30 09:42:34 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 30 Apr 2012 13:42:34 +0000 Subject: [Bioperl-l] Circular dependency problems packaging BioPerl as RPM In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF34CCECE5EF4@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF34CCECE5EF4@exchsth.agresearch.co.nz> Message-ID: <3E2BD058-EDB0-482B-9727-8DB7FB466737@illinois.edu> The Bio::Expression dependencies are unusual, I'll have to look through and find the modules responsible for pulling these in. When I last ran these no tests failed, so either the dependency is off or no tests have been written for the modules in question. We can always release a new CPAN BioPerl-Run to deal with it. chris On Apr 30, 2012, at 1:00 AM, Guest, Simon wrote: > Dear BioPerlers, > > I am building BioPerl and BioPerl-Run as RPMs, since I have to install them on > several servers, and really don't want to run CPAN installation scripts on > each machine. > > It has been a tortuous journey of chasing down dependencies and packaging them > (thank goodness for cpanspec), but I think I am nearly done. > > However, I have hit a circular dependency / incompatibility problem between > BioPerl and BioPerl-Run. > > When building BioPerl-Run-1.006900, it insists on BioPerl-1.6.901: > Checking prerequisites... > - ERROR: Bio::Root::Version (1.006001) is installed, but we need version >= 1.006900 > > But then BioPerl-Run-1.006900 has dependencies on > Bio::Expression::DataSet > Bio::Expression::Platform > Bio::Expression::Sample > Bio::Expression::Contact > which seem to be in BioPerl-1.6.1 but not BioPerl-1.6.901 > > Does anyone know of this problem? > > Are there any suggestions for work arounds? 
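The version numbers in this error are easier to compare once both are in the same form: the Build.PL check reports Bio::Root::Version as a decimal, while releases are named with dotted versions. A minimal plain-Perl sketch, assuming the usual convention that each component after the major number is zero-padded to three digits (so 1.6.1 becomes 1.006001); the subroutine names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed convention: dotted 1.6.901 <=> decimal 1.006901 (minor and patch
# each padded to three digits).
sub dotted_to_decimal {
    my ($major, $minor, $patch) = split /\./, shift;
    return sprintf '%d.%03d%03d', $major, $minor // 0, $patch // 0;
}

# True if the installed dotted version meets the required dotted version.
sub satisfies {
    my ($installed, $required) = @_;
    return dotted_to_decimal($installed) >= dotted_to_decimal($required);
}

for my $release (qw(1.6.1 1.6.900 1.6.901)) {
    printf "%-8s => %s  (meets >= 1.006900: %s)\n", $release,
        dotted_to_decimal($release),
        satisfies($release, '1.6.900') ? 'yes' : 'no';
}
```

Read this way, the installed 1.006001 is BioPerl 1.6.1, and the ">= 1.006900" requirement is met by 1.6.901 but not by 1.6.1.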
> > cheers, > Simon > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hnorpois at googlemail.com Mon Apr 30 12:45:50 2012 From: hnorpois at googlemail.com (Hermann Norpois) Date: Mon, 30 Apr 2012 18:45:50 +0200 Subject: [Bioperl-l] different interpretion of get_seq_by_id by DB::GenBank and DB:Entrez::Gene Message-ID: I am a confused by the different interpretation of get_seq_by_id. Obviously it is something different for the two modules. 
Script1:

#!/usr/bin/perl -w
use strict;
use Bio::DB::GenBank;
use Bio::SeqIO;

# Set the output format
my $seqio_obj = Bio::SeqIO->new(-file => '>sequence.fasta', -format => 'fasta');
my $db_obj = Bio::DB::GenBank->new;
my $id = "BC049766";    # accession number
my $seq_obj = $db_obj->get_Seq_by_id($id);
$seqio_obj->write_seq($seq_obj);

Script2:

#!/usr/bin/perl -w
use strict;
use Bio::DB::EntrezGene;

my $id = "Penk1";    # name of the gene
my $db = Bio::DB::EntrezGene->new;
my $seq = $db->get_Seq_by_id($id);
my $ac = $seq->annotation;
for my $ann ($ac->get_Annotations('dblink')) {
    if ($ann->database eq "Evidence Viewer") {
        # get the sequence identifier, the start, and the stop
        my ($contig, $from, $to) = $ann->url =~ /contig=([^&]+).+from=(\d+)&to=(\d+)/;
        print "$contig\t$from\t$to\n";
    }
}

Thank you, Hermann Norpois

From jimhu at tamu.edu Mon Apr 30 13:38:23 2012 From: jimhu at tamu.edu (Jim Hu) Date: Mon, 30 Apr 2012 12:38:23 -0500 Subject: [Bioperl-l] Gbrowse file uploads, bigwig and chromosome sizes files Message-ID: <1F4B23DC-2CD1-4D61-A6F4-D823B4C7C7D1@tamu.edu>

I'm not sure how many of our issues are GBrowse-specific and how many are more general BioPerl issues, so I'm cross-posting to both lists. We think we've traced our problems uploading wiggle files to our GBrowse to the failure to create the chromosome.size file.

Short version:
- What is supposed to be in the locationlist? Chromosomes only, or just genes?
- Why does the chromosome-sizes step try to get everything in the locationlist, whether or not it's a chromosome?

Long version: Our E. coli MG1655 database was loaded several years ago with

bp_seqfeature_load.pl -d gb_MG1655_jh -f -c NC_000913.gb.gff NC_000913.gb.fasta -u -p

The MySQL database has 4,146 entries in the locationlist. The first is for the chromosome and the others are named for genes. When we ask GBrowse to generate the chromosome sizes file, it does not do what I expect (look up the reference feature names). Instead, it tries to get the size of every feature in the locationlist.
I can't actually find the FASTA file I used. When this happens, the eval in Bio::Graphics::Browser2::DataLoader dies, apparently because allow_aliases is not being passed to this subroutine in Bio::DB::SeqFeature::Store::DBI::mysql:

sub _name_sql {
    my $self = shift;
    my ($name,$allow_aliases,$join) = @_;
    my $name_table = $self->_name_table;
    my $from = "$name_table as n";
    my ($match,$string) = $self->_match_sql($name);
    my $where = "n.id=$join AND n.name $match";
    $where .= " AND n.display_name>0" unless $allow_aliases;
    return ($from,$where,'',$string);
}

Here's the backtrace:

CHROMOSOME SIZES at /usr/local/share/perl/5.10.1/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 942, referer:
Bio::DB::SeqFeature::Store::DBI::mysql::_name_sql('Bio::DB::SeqFeature::Store::DBI::mysql=HASH(0xb2bfed0)', 'b0001', undef, 'f.id') called at /usr/local/share/perl/5.10.1/Bio/DB/SeqFeature/Store/DBI/mysql.pm
Bio::DB::SeqFeature::Store::DBI::mysql::_features('Bio::DB::SeqFeature::Store::DBI::mysql=HASH(0xb2bfed0)', '-name', 'b0001', '-class', undef, '-aliases', undef,
Bio::DB::SeqFeature::Store::get_features_by_name('Bio::DB::SeqFeature::Store::DBI::mysql=HASH(0xb2bfed0)', '-name', 'b0001') called at /usr/local/share/perl/5.10.1/Bio/DB/SeqFeature/Store.pm line
Bio::DB::SeqFeature::Store::segment('Bio::DB::SeqFeature::Store::DBI::mysql=HASH(0xb2bfed0)', 'b0001') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/DataLoader.pm line 171,
eval {...} called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/DataLoader.pm line 169,
Bio::Graphics::Browser2::DataLoader::generate_chrom_sizes('Bio::Graphics::Browser2::DataLoader=HASH(0xb2bfbd0)', '/var/tmp/gbrowse2/chrom_sizes/MG1655.sizes') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/DataLoader.pm line 143,
Bio::Graphics::Browser2::DataLoader::chrom_sizes('Bio::Graphics::Browser2::DataLoader=HASH(0xb2bfbd0)') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/Action.pm line 1117, referer:
Bio::Graphics::Browser2::Action::ACTION_chrom_sizes('Bio::Graphics::Browser2::Action=REF(0xa993ea0)', 'CGI=HASH(0xaf57450)') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/Render.pm line 427,
Bio::Graphics::Browser2::Render::asynchronous_event('Bio::Graphics::Browser2::Render::HTML=HASH(0xaf590d8)') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/Render.pm line 356, referer:
Bio::Graphics::Browser2::Render::run_asynchronous_event('Bio::Graphics::Browser2::Render::HTML=HASH(0xaf590d8)') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/Render.pm line 274, referer:
Bio::Graphics::Browser2::Render::run('Bio::Graphics::Browser2::Render::HTML=HASH(0xaf590d8)') called at /usr/lib/cgi-bin/gb2/gbrowse line 50, referer:

=====================================
Jim Hu Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054

From hnorpois at googlemail.com Mon Apr 30 14:06:40 2012 From: hnorpois at googlemail.com (Hermann Norpois) Date: Mon, 30 Apr 2012 20:06:40 +0200 Subject: [Bioperl-l] Retrieving promoter sequenc Message-ID: 

Dear list, I am trying to write a script that retrieves the 700 bp sequence upstream of the transcription start site (TSS) at the 5' end of a gene (a putative promoter sequence).
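Before any sequence is fetched, the upstream window has to be computed from the gene's coordinates and strand. The following is a minimal pure-Perl sketch. It assumes 1-based, inclusive coordinates (as GenBank uses) and a hypothetical helper named upstream_window(). It only does the arithmetic: fetching the region and reverse-complementing a minus-strand window are not shown. Note that a window written as "from - 700 through from" is 701 bases long and includes the gene's first base.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the 1-based, inclusive
# (start, stop) of the $len bases immediately upstream of a gene's 5' end,
# given the gene's span ($from <= $to) on the reference and its strand.
sub upstream_window {
    my ($from, $to, $strand, $len) = @_;
    if ($strand >= 0) {
        # Plus strand: the 5' end is $from, so the window ends one base before it.
        my $start = $from - $len;
        $start = 1 if $start < 1;    # do not run off the start of the sequence
        return ($start, $from - 1);
    }
    # Minus strand: the 5' end is $to, so "upstream" lies at higher coordinates.
    # The caller must check the stop against the sequence length, and the
    # retrieved region must be reverse-complemented to read 5'->3'.
    return ($to + 1, $to + $len);
}

my ($start, $stop) = upstream_window(5000, 9000, 1, 700);
print "plus strand:  $start-$stop\n";    # plus strand:  4300-4999
($start, $stop) = upstream_window(5000, 9000, -1, 700);
print "minus strand: $start-$stop\n";    # minus strand: 9001-9700
```

Both windows are exactly 700 bases long. The plus-strand one stops just short of the first transcribed base.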
This page gave me some information on how to do so (the sections *Using Bio::DB::EntrezGene to get genomic coordinates* AND *Using Bio::DB::GenBank when you have genomic coordinates to get a Seq object*): http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences However, I have no idea how to define $chr_acc_ver (see below):

#!/usr/bin/perl -w
use strict;
use Bio::DB::EntrezGene;
use Bio::SeqIO;
use Bio::DB::GenBank;

my $id = "12064";    # bdnf
my $seqio_obj = Bio::SeqIO->new(-file => '>s2.fasta', -format => 'fasta');
my $db = Bio::DB::EntrezGene->new;
my $seq = $db->get_Seq_by_id($id);
my $ac = $seq->annotation;
for my $ann ($ac->get_Annotations('dblink')) {
    if ($ann->database eq "Evidence Viewer") {
        # get the sequence identifier, the start, and the stop
        my ($contig, $from, $to) = $ann->url =~ /contig=([^&]+).+from=(\d+)&to=(\d+)/;
        my $chr_start = $from - 700;
        my $chr_stop = $from;
        my $gb = Bio::DB::GenBank->new(-format    => 'genbank',
                                       -seq_start => $chr_start,
                                       -seq_stop  => $chr_stop,
                                       # -strand  => $strand
                                      );
        my $obj = $gb->get_Seq_by_id($chr_acc_ver);    # *How do I define $chr_acc_ver?*
        $seqio_obj->write_seq($obj);
        # print "$contig\t$from\t$to\n$chr_start\t$chr_stop\n";
    }
}

Can anybody give me a hint on how this might work? Thanks, Hermann Norpois

From maquino at knome.com Mon Apr 30 15:15:26 2012 From: maquino at knome.com (Mark Aquino) Date: Mon, 30 Apr 2012 15:15:26 -0400 Subject: [Bioperl-l] unblessed reference in $sam->pileup error. Message-ID: <984E64E7-DF37-4BD1-BB84-DC86816A42E8@knome.com>

Hi all, I'm trying to call all bases from a BAM file and count their depths. At first I did this by getting all alignments that cover a certain region. I then realized that writing the logic to detect indels via the CIGAR string was more complicated than I thought, so I decided to try the pileup method from Bio::DB::Sam / Bio::DB::Bam::Pileup. However, I am getting this error:

Can't call method "b" on unblessed reference at ./coverageDepths.pl line 114, line 1.
when trying to use the $pileup->alignment method. Does anyone have any idea what I'm missing?

109 $sam->pileup('1:550968-550969',
110     sub {
111         my ($seqid,$pos,$pileup) = @_;
112         for my $p (@$pileup){
113             if ($p->indel){ print "INDEL!\n"};
114             my $b = $pileup->b;
115             my $qbase = substr($b->qseq, $pileup->qpos,1);
116             print "$qbase\n";
117         }
118     });

Thanks, Mark

From maquino at knome.com Mon Apr 30 15:18:35 2012 From: maquino at knome.com (Mark Aquino) Date: Mon, 30 Apr 2012 15:18:35 -0400 Subject: [Bioperl-l] unblessed reference on sam->pileup Message-ID: <33D55D60-7986-4971-9802-47AB9CDE3E24@knome.com>

Never mind. As usual, 5 seconds after sending an email to the group I realized what I had been doing wrong the whole time.

From Simon.Guest at agresearch.co.nz Mon Apr 30 23:29:58 2012 From: Simon.Guest at agresearch.co.nz (Guest, Simon) Date: Tue, 1 May 2012 15:29:58 +1200 Subject: [Bioperl-l] Circular dependency problems packaging BioPerl as RPM In-Reply-To: <3E2BD058-EDB0-482B-9727-8DB7FB466737@illinois.edu> References: <18DF7D20DFEC044098A1062202F5FFF34CCECE5EF4@exchsth.agresearch.co.nz> <3E2BD058-EDB0-482B-9727-8DB7FB466737@illinois.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCECE61BE@exchsth.agresearch.co.nz>

Hi Chris, I ignored the Bio::Expression dependencies and everything eventually built OK, using BioPerl-1.6.901 and BioPerl-Run-1.006900.
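The version numbers in this thread use two notations: dotted release names such as BioPerl-1.6.901, and decimal module versions such as the 1.006001 reported by Bio::Root::Version. This sketch assumes BioPerl follows the usual Perl convention in which x.y.z corresponds to x.00y00z. Under that assumption, the core version module can convert between the two forms and compare them. The helper names release_to_decimal() and meets_minimum() are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use version;    # core module since Perl 5.10

# Hypothetical helper: dotted release name -> decimal module version,
# e.g. "1.6.901" -> "1.006901" (each part after the first padded to 3 digits).
sub release_to_decimal {
    my ($release) = @_;
    return version->parse($release)->numify;
}

# Hypothetical helper: does an installed version satisfy ">= $required"?
# version.pm normalizes dotted and decimal forms before comparing.
sub meets_minimum {
    my ($installed, $required) = @_;
    return version->parse($installed) >= version->parse($required);
}

for my $release ('1.6.1', '1.6.901') {
    printf "%-8s -> %s, satisfies >= 1.006900: %s\n",
        $release,
        release_to_decimal($release),
        meets_minimum($release, '1.006900') ? 'yes' : 'no';
}
```

Run as is, this reports that 1.6.1 (1.006001) fails the ">= 1.006900" check and 1.6.901 (1.006901) passes it. That matches the prerequisite error quoted earlier in the thread.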
If you release a new BioPerl-Run, I would be interested in packaging it, as I have come this far. Do you have any ideas about where I could submit the BioPerl and dependency RPMs I built for CentOS 6? I now have around 40 RPMs that weren't in CentOS or EPEL, all built straight from CPAN using cpanspec. I guess others might like to benefit from this (and it would also serve to validate the builds). My other unknown is which non-Perl dependencies I should add to the BioPerl RPM; I don't know what to do here. The dependencies page on the BioPerl wiki seems to list only Perl module dependencies. cheers, Simon

From exceptlowang at gmail.com Tue Apr 17 20:00:08 2012 From: exceptlowang at gmail.com (Tim White) Date: Wed, 18 Apr 2012 00:00:08 -0000 Subject: [Bioperl-l] Bio::SeqIO::tab deletes gap characters when reading sequences, which is inconvenient Message-ID: <4F8E03FE.7000506@gmail.com>

Hi, Bio::SeqIO::tab (what you get when you pass -format => 'tab' to Bio::SeqIO->new()) is perfect for converting sequences into a one-per-line format, so that standard line-oriented UNIX tools (grep, comm, etc.) work as expected. Except... I just discovered that it deletes gap ("-") characters when reading sequences, so it can't be used to round-trip any files that contain them.
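The gap stripping described here can be reproduced without BioPerl. This is a minimal pure-Perl sketch. The helper names are hypothetical, and the two substitutions only mirror the \W and \s patterns discussed in this thread, not the rest of Bio::SeqIO::tab's parsing. It shows which characters each pattern removes from an aligned row.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Strip every non-word character: letters, digits and '_' survive,
# so gap '-', '.', '*' and a trailing "\r" are all removed.
sub strip_nonword {
    my ($seq) = @_;
    $seq =~ s/\W//g;
    return $seq;
}

# Strip only whitespace (spaces, tabs, "\r", "\n"), keeping gap and
# other alignment symbols intact.
sub strip_whitespace {
    my ($seq) = @_;
    $seq =~ s/\s//g;
    return $seq;
}

my $aligned = "AC-GT.A*\r";    # aligned row with a gap, '.', '*' and a Windows CR
print strip_nonword($aligned), "\n";       # ACGTA
print strip_whitespace($aligned), "\n";    # AC-GT.A*
```

Both patterns remove the stray carriage return. Only the whitespace-only pattern keeps the alignment symbols, which is what makes round-tripping possible.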
This is a source of grief, as I frequently work with FASTA files that contain aligned sequences, and thus gap characters. The cause is that the next_seq() function in Bio::SeqIO::tab (tab.pm) contains the line:

$seq =~ s/\W//g;

which removes all non-word characters (anything other than letters, digits and underscores) from the sequence data. IMHO it would be *much* better if this were changed to:

$seq =~ s/\s//g;

which simply removes all whitespace characters, in particular the \r that often appears at the ends of lines in text files that have visited Windows. That way gap characters (and, for example, periods and asterisks) would be preserved. Alternatively, you could simply remove this line and let whitespace characters through. I'm not sure whether this counts as a "bug". A cursory search didn't turn up any docs explaining precisely which characters are and aren't preserved by classes implementing Bio::SeqIO. But it's certainly inconsistent, because at least Bio::SeqIO::fasta, and Bio::SeqIO::table with columns and delimiters set up appropriately, allow round-tripping of files containing gap characters. It is also extremely inconvenient for me personally, and I suspect for others. Assuming no harm would be done by making the above change, what's the best way to get it changed? I've simply edited my own local copy of tab.pm to make the above change, but obviously if others agree I'd like to get the change made upstream. Thanks, Tim

From mohammadali.alavi at edu.uni-graz.at Sat Apr 21 05:22:21 2012 From: mohammadali.alavi at edu.uni-graz.at (Alavi, Mohammadali (0313xxx)) Date: Sat, 21 Apr 2012 11:22:21 +0200 Subject: [Bioperl-l] piping values into an existing GENBANK file Message-ID: <70DA93B804A15C4387B05DEF33BC255701A1CE149E36@MSIGI.stud.ad.uni-graz.at>

Hello All, I already have a GENBANK file, to which I need to add some features. To be precise, I want to add data (COG functions) to the CDSs present in the GENBANK file.
The data (COG functions) I need to add are stored in an array. The first value must be added to the first CDS in the GENBANK file, the second value to the second CDS, and so on. I tried to add the data tag/value style to the CDSs (as described in the HOWTO:Features-Annotation provided by BioPerl), which basically works. The problem, though, is that I do not know how to tell Perl/BioPerl to take only one value at a time and add it tag/value style to a CDS, then take the next (and only the next) value and add it to the NEXT CDS, and so on. Here is the code I used. As you can see, using for $item (@array) is not appropriate, since it adds all the values of my array to every CDS! So is there a way, using BioPerl, to feed values one after another into successive CDSs in a file? Or is there another way of doing it in plain Perl? I would very much appreciate any help! BioPerl version: 1.6.1. ActivePerl version: 5.12.4 (on Windows Vista).

#!/usr/bin/perl
use Bio::SeqIO;
use Bio::SeqFeature::Generic;
use warnings;

@COGlist = qw(motility General metabolism nunknown);
# think of this as the array whose values I would like to add to my file;
# the real one of course has as many values as there are CDSs in the GENBANK file
$seqio_object = Bio::SeqIO->new(-file => "file.gbk", -format => "genbank");
$seq_object = $seqio_object->next_seq;
for $feat_object ($seq_object->get_SeqFeatures){
    for $item (@COGlist){
        # this adds all elements of the array to every CDS and is therefore wrong!
        $feat_object->add_tag_value("note", $item);
    }
    for $tags ($feat_object->get_all_tags){
        print "tag:" . $tags . "\n";
        for $values ($feat_object->get_tag_values($tags)){
            print "value: " . $values . "\n";
            # as one might imagine, this does not give the output I have been looking for :-))
        }
    }
}

From huansheng.xu at gmail.com Sun Apr 22 10:15:44 2012 From: huansheng.xu at gmail.com (Huansheng Xu) Date: Sun, 22 Apr 2012 10:15:44 -0400 Subject: [Bioperl-l] configuration problem with Bio::Tools::Run::Alignment::ClustalW Message-ID: 

Hi, I am a postdoctoral fellow at Massachusetts General Hospital in Boston. I am writing to ask for help with the Bio::Tools::Run::Alignment::Clustalw module available on the BioPerl website. I tried to align some DNA sequences contained in a FASTA file using the module inside a program (shown below), but got stuck. The program works very well for protein sequences. I think I may need to configure the module specifically for DNA, but I do not know how to do that. Could you take a look and let me know how to do the configuration? Thanks a lot! Best, Huansheng Xu

#!/usr/bin/perl
use Bio::Perl;
use Bio::SeqIO;
use Bio::SearchIO;
use Bio::AlignIO;
use Bio::Tools::Run::Alignment::Clustalw;
use warnings;
use strict;

my $filename = $ARGV[0];
die "Usage: $0 \n" unless $filename;
die "File $filename not found.\n" unless -f $filename;

# Read the list of raw sequences from the file you feed the program
my $fh = Bio::SeqIO->newFh(-file => $filename, -format => 'fasta');
my @seq_array = <$fh>;

# pass the parameters and generate a factory to run the alignment with ClustalW
my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
@params = ('ktuple' => 2, 'dnamatrix' => 'IUB') if ($seq_array[0]->alphabet eq 'dna');
my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
my $seq_array_ref = \@seq_array;
my $aln = $factory->align($seq_array_ref);

# create a new AlignIO object
my $out = Bio::AlignIO->new(-file => ">$filename.aln", -format => 'clustalw');
$out->write_aln($aln);

From bubli_thakur at rediffmail.com Fri Apr 20 22:59:50 2012 From: bubli_thakur at rediffmail.com (subarna thakur) Date: 21 Apr 2012 02:59:50 -0000 Subject: [Bioperl-l] codon usage Message-ID: <20120421025950.8579.qmail@f4mail-235-122.rediffmail.com>

I am writing a script to determine the number of genes containing a particular codon. The codons are listed in a separate file. The output is correct for the first codon listed in the file, but for the other codons the script does not work. Please point out the error in the script.
The script is as follows:

#!/usr/bin/perl -w
use Bio::SeqIO;
$file2="table.txt";
$codon=0;
open OUT, ">out-test.txt" or die $!;
$seqio_obj = Bio::SeqIO->new( -file => "gopi2.txt" , '-format' => 'Fasta');
open( my $fh2, $file2 ) or die "$!";
while( my $line = <$fh2> ){
    $acc=$line;
    chomp $acc;
    while ($seq1= $seqio_obj->next_seq){
        my @output = $seq1->id;
        my $string = $seq1->seq;
        $v=0;
        $l= length($string);
        $t=$l/3;
        $k=0;
        for ($i=1; $i <= $t; $i++){
            @array2 = substr($string, $k, 3);
            $k=$k+3;
            foreach $value (@array2) {
                if ($value eq "$acc") {
                    print OUT " The sequence id is @output\n";
                    print OUT "$acc codon found in position $i\n\n";
                    $v=$v+1;
                }
            }
        }
        if ($v==0) { $h=0; } else { $h=1; }
        $codon=$codon+$h;
    }
    print OUT "Total number of sequences with $acc codon";
    print OUT "\t";
    print OUT $codon;
}
exit;

From msprasad693 at gmail.com Thu Apr 26 08:16:39 2012 From: msprasad693 at gmail.com (prasad ms) Date: Thu, 26 Apr 2012 17:46:39 +0530 Subject: [Bioperl-l] Bioperl for global alignment Message-ID: 

Hello sir, I am Prasad, an MS student in bioinformatics. I am doing my final-year project, and sequence alignment is part of it. I have nearly 50k sequences and I want to do pairwise global alignment (NW alignment). I read the BioPerl tutorial, but it does not mention this. Could you please advise me on how to do this type of alignment using BioPerl? I assure you that this is purely for academic use. Looking forward to hearing from you. Thank you, Regards, Prasad MS

From msprasad693 at gmail.com Mon Apr 30 01:40:43 2012 From: msprasad693 at gmail.com (prasad ms) Date: Mon, 30 Apr 2012 11:10:43 +0530 Subject: [Bioperl-l] Fwd: Bioperl for global alignment In-Reply-To: References: Message-ID: 

Hello sir, I am Prasad, an MS student in bioinformatics. I am doing my final-year project, and sequence alignment is part of it. I have nearly 50k sequences and I want to do pairwise global alignment (NW alignment). I read the BioPerl tutorial.
But it does not mention this. Could you please advise me on how to do this type of alignment using BioPerl? I assure you that this is purely for academic use. Looking forward to hearing from you. Thank you, Regards, Prasad MS

From bartwiegmans at gmail.com Sun Apr 1 07:55:01 2012 From: bartwiegmans at gmail.com (Bart Wiegmans) Date: Sun, 1 Apr 2012 13:55:01 +0200 Subject: [Bioperl-l] Implementing Bioperl6 for GSoC 2012 Message-ID: 

Hello all BioPerl-ers, This is my first e-mail to the list and thus my introduction. My name is Bart Wiegmans; I study biology at the University of Groningen, the Netherlands. It is my goal to implement bioperl6 this summer as part of the GSoC program. Why would I want such a thing? For a start, I'd like to learn more about bioinformatics. As I told you, I study biology, so this has an obvious advantage for me. Also, I'd like to learn perl6 well, and this is only possible when one writes a significant program in it. Moreover, I think perl6 is awesome, and having a real-world toolkit like bioperl out there might just be enough to develop a significant community using it. Third, I think perl5's object support is crufty and difficult for many people to learn. These people include biologists who might not be inclined to learn it and would rather use other tools instead. As to who I am, I already told you my name. I am 24 years old and study biology at undergraduate level. (For those interested: yes, this means I haven't exactly been flying through my courses :-)). I have been programming computers ever since I was 16 years old, and earlier if you count BASIC. Starting out with C, most of that has been websites (in PHP), scripts (in Perl), and other smallish programs (in Java / Perl). For example, I implemented a parser and decoder for the Dirac video specification as part of GSoC 2008, and a script which reads the NIH Bookshelf website and translates it into ePub e-books. I read quite a few of them that way.
Aside from my motivation and capabilities, two other factors somewhat complicate my involvement with GSoC. The first is that the academic year in the Netherlands ends in mid-July, not in May as in the USA and many other countries. This means that I am not really 'free' before that time. Also, I have a day job as a PHP programmer for a local online students' magazine, which also takes some time. This is unfortunate, because I'd rather spend my time writing useful programs. Hence, if you accept me as a student, I plan to take leave from this job during the GSoC period. Anyway, I realize this has been enough information for any interested reader. If there is any interest on your side, I frequent freenode under the nickname brrt. Other than that and this e-mail address, I don't have much of an online presence. Kind regards, Bart Wiegmans

From l.m.timmermans at students.uu.nl Sun Apr 1 10:38:13 2012 From: l.m.timmermans at students.uu.nl (Leon Timmermans) Date: Sun, 1 Apr 2012 16:38:13 +0200 Subject: [Bioperl-l] Implementing Bioperl6 for GSoC 2012 In-Reply-To: References: Message-ID: 

On Sun, Apr 1, 2012 at 1:55 PM, Bart Wiegmans wrote: > This is my first e-mail to the list and thus my introduction. My name > is Bart Wiegmans, I study biology at the University of Groningen, the > netherlands. It is my goal to implement bioperl6 this summer as part > of the GSoC program. > Cool. Though I am wondering what exactly you want to implement. BioPerl as a whole is 2000 modules; not even a dozen GSoC students could implement that. You will have to focus on something. > Also, I'd like to learn perl6 well, and this > is only possible when one writes a significant program in it. > Moreover, I think perl6 is awesome, and having a real-world toolkit > like bioperl out there might just be enough to develop a significant > community using it. How much time do you expect that to cost?
Having to learn a new language means you will get less done than you ordinarily would. This doesn't have to be a problem, but do take it into account. > As a third, I think perl5's object support is > crufty, and difficult to learn for many people. These people include > biologists who might not be inclined to learn it, and rather use some > other tools instead. > Perl 5's object support can be quite elegant with modern OO frameworks such as Moose and its relatives. Sadly, BioPerl itself is based on fairly dated paradigms. > Aside from my motivation and capabilities, two other factors somewhat > complicate my involvement with GSoC. The first is that the academic > year ends halfway in July in the netherlands, not in may as in the USA > and in many other countries. This means that I am not 'free' in a real > sense before that time. Also, I have a day job as a PHP programmer for > a local online students' magazine, which also takes some time. Which > is unfortunate, because I'd rather spend my time writing useful > programs; hence, if you would accept me as a student I plan to take > leave from this job during the period of GSoC. > Yeah, I'm familiar with that problem; it's rather unfortunate. > Anyway, I realize this has been enough information for any interested > reader. If there is any interest on your side, I frequent freenode > under the nickname brrt. Other than that and this e-mail address, I > don't have much of an online presence. > Well, come join us at #bioperl and #perl6, then.
Leon

From cjfields at illinois.edu Sun Apr 1 21:57:53 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 2 Apr 2012 01:57:53 +0000 Subject: [Bioperl-l] Implementing Bioperl6 for GSoC 2012 In-Reply-To: References: Message-ID: <6D16BB69-A74A-4A7E-B81C-0B33E9A73473@illinois.edu>

Bart, I think this is a good idea, but to address one concern up front: I believe the time frame (a possible start in mid-July) will not be in your favor unless you can participate according to Google's schedule. There is some flexibility at the beginning, mainly during the community bonding period, but beyond that there isn't much wiggle room; we're as constrained as you are. The proposals to OBF over the last few years have been highly competitive, and this year will prove even more so, as a large number of major Perl-based FLOSS orgs (primarily The Perl Foundation) were not accepted into GSoC. Now for Perl 6: BioPerl6 is a project Philip Mabon and I have already started on GitHub: https://github.com/cjfields/bioperl6 The code is basically proof-of-concept stuff at this point. Last I checked, I don't believe it was current with the spec or that it worked on either the most current Niecza or Rakudo. All this was enough of a moving target to make it hard to maintain, but that seems to be settling somewhat now. It's pretty wide open, though, as far as I'm concerned. If you are to pursue this as a proposal, you will need to focus on a specific task or set of classes to implement, along with docs and tests. A straight port isn't the best option. You should take advantage of Perl 6's strengths and possibly work on redesign where needed, keeping in mind that some aspects of BioPerl might be considered a bit over-designed and out of date with regard to modern Perl dictums. Also, learning a new language is nice, but that isn't the main focus of any GSoC project.
At the end of the day, we should have usable code for the community, and if it acts as an entry point for more people to get involved with Perl 6, so much the better :) (I see that Leon has chimed in on this with similar comments.) I will be on and off #bioperl this week (pyrimidine). IRC is also logged in case I need to backlog (provided by one Moritz Lenz): http://irclog.perlgeek.de/bioperl/today chris

From joseramonblas at gmail.com Mon Apr 2 01:21:55 2012 From: joseramonblas at gmail.com (José Ramón Blas Pastor) Date: Mon, 2 Apr 2012 07:21:55 +0200 Subject: [Bioperl-l] data into R from Perl arrays Message-ID: 

Hi, a very simple question, but I do not know how to manage this. I want to plot a histogram of all the data in 'datos.txt'.

a) Using R:

datos<-scan("datos.txt")
pdf("xh.pdf")
hist(datos)
dev.off()

b) How could I invoke R from Perl to do the same?

#!/usr/bin/perl
use strict;
use warnings;
my @datos;
open(DAT, "datos.txt") or die "Cannot open datos.txt: $!";
while (<DAT>) {
    chomp;
    push(@datos, $_);
}
close(DAT);
# now I want a histogram of the values in @datos

Thanks!! JR -- José Ramón Blas - PhD Dept.
Biochemistry - Medicine School University of Castilla-La Mancha C Almansa, 14 02006 Albacete (Spain) Phone: +34 967599200 ext. 2958 From joseramonblas at gmail.com Mon Apr 2 01:21:55 2012 From: joseramonblas at gmail.com (=?ISO-8859-1?Q?Jos=E9_Ram=F3n_Blas_Pastor?=) Date: Mon, 2 Apr 2012 07:21:55 +0200 Subject: [Bioperl-l] data into R from Perl arrays Message-ID: Hi, a very simple doubt, but I do not know how to manage this. I want to plot a histogram for all data in 'datos.txt'. a) by using R: datos<-scan("datos.txt") pdf("xh.pdf") hist(datos) dev.off() b) How could I invoke R inside Perl to do the same?? #!/usr/bin/perl open(DAT,"datos.txt"); while () { chomp; push(@datos,$_); } #now I want a histogram of values in @datos Thanks!! JR -- Jos? Ram?n Blas - PhD Dept. Biochemistry - Medicine School University of Castilla-La Mancha C Almansa, 14 02006 Albacete (Spain) Phone: +34 967599200 ext. 2958 From avilella at gmail.com Mon Apr 2 02:24:09 2012 From: avilella at gmail.com (Albert Vilella) Date: Mon, 2 Apr 2012 07:24:09 +0100 Subject: [Bioperl-l] data into R from Perl arrays In-Reply-To: References: Message-ID: You may get a faster answer in a generic programming site, like stackoverflow.com... On Mon, Apr 2, 2012 at 6:21 AM, Jos? Ram?n Blas Pastor wrote: > Hi, > > a very simple doubt, but I do not know how to manage this. > > I want to plot a histogram for all data in 'datos.txt'. > > a) by using R: > datos<-scan("datos.txt") > pdf("xh.pdf") > hist(datos) > dev.off() > > > b) How could I invoke R inside Perl to do the same?? > #!/usr/bin/perl > open(DAT,"datos.txt"); > while () { > ?chomp; > ?push(@datos,$_); > } > #now I want a histogram of values in @datos > > Thanks!! > > JR > > -- > Jos? Ram?n Blas - PhD > Dept. Biochemistry - Medicine School > University of Castilla-La Mancha > C Almansa, 14 > 02006 Albacete (Spain) > > Phone: +34 967599200 ext. 
2958
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From awitney at sgul.ac.uk Mon Apr 2 04:17:56 2012
From: awitney at sgul.ac.uk (Adam Witney)
Date: Mon, 2 Apr 2012 09:17:56 +0100
Subject: [Bioperl-l] data into R from Perl arrays
In-Reply-To:
References:
Message-ID: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk>


The quickest way to do this specific example is probably to just save your R commands to a file (eg R_commands.R) and then run some kind of system/exec/backtick function in your perl script to invoke R, something like (untested):

#!/usr/bin/perl
use strict;
use warnings;
system( 'R --file R_commands.R' );

Alternatively if you want perl and R to be able to interact and pass data back and forth, you could use something like Statistics::R

http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm

HTH

adam

On 2 Apr 2012, at 06:21, José Ramón Blas Pastor wrote:

> Hi,
>
> a very simple doubt, but I do not know how to manage this.
>
> I want to plot a histogram for all data in 'datos.txt'.
>
> a) by using R:
> datos<-scan("datos.txt")
> pdf("xh.pdf")
> hist(datos)
> dev.off()
>
>
> b) How could I invoke R inside Perl to do the same??
> #!/usr/bin/perl
> open(DAT,"datos.txt");
> while (<DAT>) {
> chomp;
> push(@datos,$_);
> }
> #now I want a histogram of values in @datos
>
> Thanks!!
>
> JR
>
> --
> José Ramón Blas - PhD
> Dept. Biochemistry - Medicine School
> University of Castilla-La Mancha
> C Almansa, 14
> 02006 Albacete (Spain)
>
> Phone: +34 967599200 ext.
2958
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From roy.chaudhuri at gmail.com Mon Apr 2 06:33:06 2012
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 02 Apr 2012 11:33:06 +0100
Subject: [Bioperl-l] data into R from Perl arrays
In-Reply-To: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk>
References: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk>
Message-ID: <4F798062.4050908@gmail.com>

Alternatively you could go for a Perl-only approach using something like GD::Graph::Histogram.

Cheers,
Roy.

On 02/04/2012 09:17, Adam Witney wrote:
>
> The quickest way to do this specific example is probably to just save
> your R commands to a file (eg R_commands.R) and then run some kind of
> system/exec/backtick function in your perl script to invoke R,
> something like (untested):
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> system( 'R --file R_commands.R' );
>
> Alternatively if you want perl and R to be able to interact and pass
> data back and forth, you could use something like Statistics::R
>
> http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm
>
> HTH
>
> adam
>
> On 2 Apr 2012, at 06:21, José Ramón Blas Pastor wrote:
>
>> Hi,
>>
>> a very simple doubt, but I do not know how to manage this.
>>
>> I want to plot a histogram for all data in 'datos.txt'.
>>
>> a) by using R:
>> datos<-scan("datos.txt")
>> pdf("xh.pdf")
>> hist(datos)
>> dev.off()
>>
>>
>> b) How could I invoke R inside Perl to do the same??
>> #!/usr/bin/perl
>> open(DAT,"datos.txt");
>> while (<DAT>) {
>> chomp;
>> push(@datos,$_);
>> }
>> #now I want a histogram of values in @datos
>>
>> Thanks!!
>>
>> JR
>>
>> --
>> José Ramón Blas - PhD
>> Dept. Biochemistry - Medicine School
>> University of Castilla-La Mancha
>> C Almansa, 14
>> 02006 Albacete (Spain)
>>
>> Phone: +34 967599200 ext.
2958
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Mon Apr 2 08:59:40 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 2 Apr 2012 12:59:40 +0000
Subject: [Bioperl-l] data into R from Perl arrays
In-Reply-To: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk>
References: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk>
Message-ID:

Not sure how well it is supported, but there is also Statistics::useR (which has an XS layer for conversing with R).

https://metacpan.org/module/Statistics::useR

chris

On Apr 2, 2012, at 3:17 AM, Adam Witney wrote:

>
> The quickest way to do this specific example is probably to just save your R commands to a file (eg R_commands.R) and then run some kind of system/exec/backtick function in your perl script to invoke R, something like (untested):
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> system( 'R --file R_commands.R' );
>
> Alternatively if you want perl and R to be able to interact and pass data back and forth, you could use something like Statistics::R
>
> http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm
>
> HTH
>
> adam
>
> On 2 Apr 2012, at 06:21, José Ramón Blas Pastor wrote:
>
>> Hi,
>>
>> a very simple doubt, but I do not know how to manage this.
>>
>> I want to plot a histogram for all data in 'datos.txt'.
>>
>> a) by using R:
>> datos<-scan("datos.txt")
>> pdf("xh.pdf")
>> hist(datos)
>> dev.off()
>>
>>
>> b) How could I invoke R inside Perl to do the same??
>> #!/usr/bin/perl
>> open(DAT,"datos.txt");
>> while (<DAT>) {
>> chomp;
>> push(@datos,$_);
>> }
>> #now I want a histogram of values in @datos
>>
>> Thanks!!
>>
>> JR
>>
>> --
>> José Ramón Blas - PhD
>> Dept.
Biochemistry - Medicine School
>> University of Castilla-La Mancha
>> C Almansa, 14
>> 02006 Albacete (Spain)
>>
>> Phone: +34 967599200 ext. 2958
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bartwiegmans at gmail.com Mon Apr 2 13:10:47 2012
From: bartwiegmans at gmail.com (Bart Wiegmans)
Date: Mon, 2 Apr 2012 19:10:47 +0200
Subject: [Bioperl-l] Implementing Bioperl6 for GSoC 2012
In-Reply-To: <6D16BB69-A74A-4A7E-B81C-0B33E9A73473@illinois.edu>
References: <6D16BB69-A74A-4A7E-B81C-0B33E9A73473@illinois.edu>
Message-ID:

Chris, Leon, others,

Thank you for your timely responses. So far as the timeframe is concerned, I might be able to get student credits for participating in this project, as it is related to my studies. In that case I would have more free time. At any rate, I understand it is suboptimal to start working in July, so I will do my best to free up as much time as possible.

I've already checked out the bioperl6 project as well as the biome project from github. I am not quite sure what scope of project to choose and I was hoping for your advice. File format import / export and database connectivity would come to mind, as these are the subjects I am most familiar with. In such a scenario, aside from a set of modules / classes, the end goal would be a script that could search for and import a sequence from a number of popular databases, and save it to the user's hard disk. I am very much open to suggestions, however.

Anyway, thank you for your time.
Kind regards,
Bart Wiegmans

2012/4/2 Fields, Christopher J:
> Bart,
>
> I think this is a good idea, but to address one concern up-front: I believe the time frame (possible start in mid-July) will not be in your favor unless you can participate according to Google's schedule. There is some flexibility at the beginning, mainly during the community bonding period, but beyond that there isn't much wiggle room; we're as constrained as you are. The proposals to OBF over the last few years have been highly competitive, and this year will prove even more so as a large number of major Perl-based FLOSS orgs (primarily The Perl Foundation) were not accepted into GSoC.
>
> Now for Perl 6:
>
> BioPerl6 is a project Philip Mabon and I have already started up on github:
>
>   https://github.com/cjfields/bioperl6
>
> The code is basically proof-of-concept stuff at this point, and I don't believe it's current to spec last I checked or that it works on either the most current niecza or rakudo. All this was enough of a moving target to make it hard to maintain, but that seems to be settling somewhat now. It's pretty wide open, though, as far as I'm concerned.
>
> If you are to pursue this as a proposal, you will need to focus on a specific task or set of classes to implement along with docs and tests. A straight port isn't the best option; you should take advantage of Perl 6's strengths and possibly work on redesign where needed, keeping in mind that some aspects of bioperl might be considered a bit over-designed and out-of-date w/ re: to modern perl dictums. Also, learning a new language is nice, but that isn't the main focus for any GSoC project. At the end of the day, we should have usable code for the community, and if it acts as an entry point for more people to get involved with Perl 6 the better :)
>
> (I see that Leon has also chimed in on this with similar comments as well)
>
> I will be on and off #bioperl this week (pyrimidine).
IRC is also logged in case I need to backlog (provided by one Moritz Lenz):
>
>   http://irclog.perlgeek.de/bioperl/today
>
> chris
>
> On Apr 1, 2012, at 6:55 AM, Bart Wiegmans wrote:
>
>> Hello all BioPerl-ers,
>>
>> This is my first e-mail to the list and thus my introduction. My name
>> is Bart Wiegmans, I study biology at the University of Groningen, the
>> Netherlands. It is my goal to implement bioperl6 this summer as part
>> of the GSoC program.
>>
>> Why would I want such a thing? For a start, I'd like to learn more
>> about bioinformatics. As I told you I study biology, so this has an
>> obvious advantage for me. Also, I'd like to learn perl6 well, and this
>> is only possible when one writes a significant program in it.
>> Moreover, I think perl6 is awesome, and having a real-world toolkit
>> like bioperl out there might just be enough to develop a significant
>> community using it. As a third, I think perl5's object support is
>> crufty, and difficult to learn for many people. These people include
>> biologists who might not be inclined to learn it, and rather use some
>> other tools instead.
>>
>> As to who I am, I already told you my name. I am 24 years old, and
>> study biology at an undergraduate level. (For those interested, yes
>> this means I haven't exactly been flying through my courses :-)). I
>> have been programming computers ever since I was 16 years old, and
>> earlier if you count BASIC. Starting out with C, most of that has been
>> websites (in PHP), scripts (in Perl), and other smallish programs (in
>> Java / Perl). For example, I implemented a parser and decoder for the
>> dirac video specification as part of GSoC 2008, and a script which
>> reads the NIH bookshelf website and translates this into ePub e-books.
>> Read quite a few of them that way.
>>
>> Aside from my motivation and capabilities, two other factors somewhat
>> complicate my involvement with GSoC.
The first is that the academic
>> year ends halfway in July in the Netherlands, not in May as in the USA
>> and in many other countries. This means that I am not 'free' in a real
>> sense before that time. Also, I have a day job as a PHP programmer for
>> a local online students' magazine, which also takes some time. Which
>> is unfortunate, because I'd rather spend my time writing useful
>> programs; hence, if you would accept me as a student I plan to take
>> leave from this job during the period of GSoC.
>>
>> Anyway, I realize this has been enough information for any interested
>> reader. If there is any interest on your side, I frequent freenode
>> under the nickname brrt. Other than that and this e-mail address, I
>> don't have much of an online presence.
>>
>> Kind regards,
>> Bart Wiegmans
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From florent.angly at gmail.com Mon Apr 2 18:30:09 2012
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 03 Apr 2012 08:30:09 +1000
Subject: [Bioperl-l] data into R from Perl arrays
In-Reply-To:
References: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk>
Message-ID: <4F7A2871.8080207@gmail.com>

To execute R commands from Perl, you can also try Statistics::R (http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm), which has been around for longer, and which I have recently refactored.
Regards,
Florent

On 02/04/12 22:59, Fields, Christopher J wrote:
> Not sure how well it is supported, but there is also Statistics::useR (which has an XS layer for conversing with R).
>
> https://metacpan.org/module/Statistics::useR
>
> chris
>
> On Apr 2, 2012, at 3:17 AM, Adam Witney wrote:
>
>> The quickest way to do this specific example is probably to just save your R commands to a file (eg R_commands.R) and then run some kind of system/exec/backtick function in your perl script to invoke R, something like (untested):
>>
>> #!/usr/bin/perl
>> use strict;
>> use warnings;
>> system( 'R --file R_commands.R' );
>>
>> Alternatively if you want perl and R to be able to interact and pass data back and forth, you could use something like Statistics::R
>>
>> http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm
>>
>> HTH
>>
>> adam
>>
>> On 2 Apr 2012, at 06:21, José Ramón Blas Pastor wrote:
>>
>>> Hi,
>>>
>>> a very simple doubt, but I do not know how to manage this.
>>>
>>> I want to plot a histogram for all data in 'datos.txt'.
>>>
>>> a) by using R:
>>> datos<-scan("datos.txt")
>>> pdf("xh.pdf")
>>> hist(datos)
>>> dev.off()
>>>
>>>
>>> b) How could I invoke R inside Perl to do the same??
>>> #!/usr/bin/perl
>>> open(DAT,"datos.txt");
>>> while (<DAT>) {
>>> chomp;
>>> push(@datos,$_);
>>> }
>>> #now I want a histogram of values in @datos
>>>
>>> Thanks!!
>>>
>>> JR
>>>
>>> --
>>> José Ramón Blas - PhD
>>> Dept. Biochemistry - Medicine School
>>> University of Castilla-La Mancha
>>> C Almansa, 14
>>> 02006 Albacete (Spain)
>>>
>>> Phone: +34 967599200 ext.
2958
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From huangyifeicmb at gmail.com Mon Apr 2 20:41:54 2012
From: huangyifeicmb at gmail.com (Yifei Huang)
Date: Mon, 2 Apr 2012 20:41:54 -0400
Subject: [Bioperl-l] data into R from Perl arrays
In-Reply-To: <4F7A2871.8080207@gmail.com>
References: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk> <4F7A2871.8080207@gmail.com>
Message-ID:

You may try RSPerl.

http://www.omegahat.org/RSPerl/

Yifei

2012/4/2 Florent Angly

> To execute R commands from Perl, you can also try Statistics::R
> (http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm),
> which has been around for longer, and which I have recently refactored.
> Regards,
> Florent
>
>
> On 02/04/12 22:59, Fields, Christopher J wrote:
>
>> Not sure how well it is supported, but there is also Statistics::useR
>> (which has an XS layer for conversing with R).
>>
>> https://metacpan.org/module/Statistics::useR
>>
>> chris
>>
>> On Apr 2, 2012, at 3:17 AM, Adam Witney wrote:
>>
>>> The quickest way to do this specific example is probably to just save
>>> your R commands to a file (eg R_commands.R) and then run some kind of
>>> system/exec/backtick function in your perl script to invoke R, something
>>> like (untested):
>>>
>>> #!/usr/bin/perl
>>> use strict;
>>> use warnings;
>>> system( 'R --file R_commands.R' );
>>>
>>> Alternatively if you want perl and R to be able to interact and pass
>>> data back and forth, you could use something like Statistics::R
>>>
>>> http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm
>>>
>>> HTH
>>>
>>> adam
>>>
>>> On 2 Apr 2012, at 06:21, José Ramón Blas Pastor wrote:
>>>
>>>> Hi,
>>>>
>>>> a very simple doubt, but I do not know how to manage this.
>>>>
>>>> I want to plot a histogram for all data in 'datos.txt'.
>>>>
>>>> a) by using R:
>>>> datos<-scan("datos.txt")
>>>> pdf("xh.pdf")
>>>> hist(datos)
>>>> dev.off()
>>>>
>>>>
>>>> b) How could I invoke R inside Perl to do the same??
>>>> #!/usr/bin/perl
>>>> open(DAT,"datos.txt");
>>>> while (<DAT>) {
>>>> chomp;
>>>> push(@datos,$_);
>>>> }
>>>> #now I want a histogram of values in @datos
>>>>
>>>> Thanks!!
>>>>
>>>> JR
>>>>
>>>> --
>>>> José Ramón Blas - PhD
>>>> Dept. Biochemistry - Medicine School
>>>> University of Castilla-La Mancha
>>>> C Almansa, 14
>>>> 02006 Albacete (Spain)
>>>>
>>>> Phone: +34 967599200 ext.
2958
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Yifei Huang
Department of Biology
McMaster University

From gregonomic at yahoo.co.nz Mon Apr 2 20:37:38 2012
From: gregonomic at yahoo.co.nz (Gregory Baillie)
Date: Mon, 2 Apr 2012 17:37:38 -0700 (PDT)
Subject: [Bioperl-l] data into R from Perl arrays
In-Reply-To:
References:
Message-ID: <1333413458.84973.YahooMailNeo@web112506.mail.gq1.yahoo.com>

Hi.

If you only want to plot histograms (or other kinds of graphs), and don't need to use the statistics capabilities of R, then gnuplot (http://www.gnuplot.info/) is a good option. You can construct a string of gnuplot commands and your data, and pipe that directly to gnuplot, without having to write the commands or data to separate files and then use a system call to R. Of course, you do have to learn gnuplot, which may be more trouble than it's worth (I like it, but some people don't).

Greg.

________________________________
From: José Ramón Blas Pastor
To: bioperl-l at lists.open-bio.org; Bioperl mailing list
Sent: Monday, 2 April 2012 3:21 PM
Subject: [Bioperl-l] data into R from Perl arrays

Hi,

a very simple doubt, but I do not know how to manage this.

I want to plot a histogram for all data in 'datos.txt'.
a) by using R:
datos<-scan("datos.txt")
pdf("xh.pdf")
hist(datos)
dev.off()


b) How could I invoke R inside Perl to do the same??
#!/usr/bin/perl
open(DAT,"datos.txt");
while (<DAT>) {
 chomp;
 push(@datos,$_);
}
#now I want a histogram of values in @datos

Thanks!!

JR

--
José Ramón Blas - PhD
Dept. Biochemistry - Medicine School
University of Castilla-La Mancha
C Almansa, 14
02006 Albacete (Spain)

Phone: +34 967599200 ext. 2958
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From shalabh.sharma7 at gmail.com Tue Apr 3 11:34:43 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Tue, 3 Apr 2012 11:34:43 -0400
Subject: [Bioperl-l] Downloading refseq genomes in batch
Message-ID:

Hi All,
I am trying to download refseq genomes in batch. But instead of accession number i have genome names (=~ 500).
Is there any way i can download them using some bioperl module ?

Thanks
Shalabh

--
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

From carandraug+dev at gmail.com Tue Apr 3 11:53:32 2012
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Tue, 3 Apr 2012 16:53:32 +0100
Subject: [Bioperl-l] Downloading refseq genomes in batch
In-Reply-To:
References:
Message-ID:

On 3 April 2012 16:34, shalabh sharma wrote:
> Hi All,
> I am trying to download refseq genomes in batch. But instead of
> accession number i have genome names (=~ 500).
> Is there any way i can download them using some bioperl module ?

If you have their name/official symbol, then searching on the database
should only return one hit, therefore one UID. Make the search, get
that number, and use it for download. The EUtilities module should do
that.

Carnë

From shalabh.sharma7 at gmail.com Tue Apr 3 14:15:16 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Tue, 3 Apr 2012 14:15:16 -0400
Subject: [Bioperl-l] Downloading refseq genomes in batch
In-Reply-To:
References:
Message-ID:

Hi Came,
Thanks for your reply.
I tried to get UID from genome names but i cant find on EUtilities.
I have taxa id for those genomes, can i download genomes with taxa id in batch ?

Thanks
Shalabh


On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug wrote:

> On 3 April 2012 16:34, shalabh sharma wrote:
> > Hi All,
> > I am trying to download refseq genomes in batch. But instead of
> > accession number i have genome names (=~ 500).
> > Is there any way i can download them using some bioperl module ?
>
> If you have their name/official symbol, then searching on the database
> should only return one hit, therefore one UID. Make the search, get
> that number, and use it for download. The EUtilities module should do
> that.
>
> Carnë
>

--
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

From berry at exisoft.nl Tue Apr 3 16:24:54 2012
From: berry at exisoft.nl (Berry Kriesels)
Date: Tue, 03 Apr 2012 22:24:54 +0200
Subject: [Bioperl-l] Google summer of code Bio::Structure
Message-ID: <4F7B5C96.5090208@exisoft.nl>

Dear all,

Currently I am considering applying as a student for the Google Summer of Code, and would like to contribute to BioPerl in this way. At the moment I am investigating extending the BioPerl Bio::Structure library so that some protein modelling can also be done, or at least adding a method for PDB structure quality assessment.

One way is to do it with the use of online services such as Prosaweb (and thus creating a wrapper for this service). I could also write libraries to assess the phi and psi angles of certain atoms within a PDB file, or the distance in angstroms, among many other coordinate measurements, both within a single protein PDB file and across (comparing) multiple PDB files. Adding functions such as DOPE (Discrete Optimized Protein Energy) for model comparisons is also an option. There are tons of options to add.

However...
I have a few questions regarding this and hope some of you will be willing to answer:

1. As users of BioPerl, would you consider extending the current Bio::Structure library an added value, or would you rather see effort made in different areas?
2. If one would see extension of the current Bio::Structure library as a useful project, what would your main interests and wishes be?

Thank you for your input and time.

With kind regards,

Berry
MSc student Bioinformatics

From jovel_juan at hotmail.com Tue Apr 3 17:02:26 2012
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Tue, 3 Apr 2012 21:02:26 +0000
Subject: [Bioperl-l] Downloading refseq genomes in batch
In-Reply-To:
References:
Message-ID:


Hi Shalabh,
You can try using Bio::DB::GenBank, but I believe the NCBI does not like people doing many remote lookups. I would advise you to download the whole database you are interested in, and then parse it locally.
Cheers, Juan
> Date: Tue, 3 Apr 2012 14:15:16 -0400
> From: shalabh.sharma7 at gmail.com
> To: carandraug+dev at gmail.com
> CC: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Downloading refseq genomes in batch
>
> Hi Came,
> Thanks for your reply.
> I tried to get UID from genome names but i cant find on EUtilities.
> I have taxa id for those genomes, can i download genomes with taxa id in
> batch ?
>
> Thanks
> Shalabh
>
>
> On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug wrote:
>
> > On 3 April 2012 16:34, shalabh sharma wrote:
> > > Hi All,
> > > I am trying to download refseq genomes in batch. But instead of
> > > accession number i have genome names (=~ 500).
> > > Is there any way i can download them using some bioperl module ?
> >
> > If you have their name/official symbol, then searching on the database
> > should only return one hit, therefore one UID. Make the search, get
> > that number, and use it for download. The EUtilities module should do
> > that.
> >
> > Carnë
> >
>
>
>
> --
> Shalabh Sharma
> Scientific Computing Professional Associate (Bioinformatics Specialist)
> Department of Marine Sciences
> University of Georgia
> Athens, GA 30602-3636
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Tue Apr 3 17:19:07 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 3 Apr 2012 21:19:07 +0000
Subject: [Bioperl-l] Downloading refseq genomes in batch
In-Reply-To:
References:
Message-ID:

500 sequences isn't too bad for a remote lookup (I have run about ~20K myself). It's much easier if you can grab them as a batch, e.g. run esearch for the IDs, use efetch with the webenv/key to grab the sequences. NCBI is more worried about the number of requests made, the length of time between requests, and the time of day requests are made.

In fact, I recall updating EUtilities recently so it can use a POST, so you can grab ~2000 seqs at a time w/o having to iterate through them.

chris

On Apr 3, 2012, at 4:02 PM, Juan Jovel wrote:

>
> Hi Shalabh,
> You can try using Bio::DB::GenBank, but I believe the NCBI does not like people doing many remote lookups. I would advise you to download the whole database you are interested in, and then parse it locally.
> Cheers, Juan
>> Date: Tue, 3 Apr 2012 14:15:16 -0400
>> From: shalabh.sharma7 at gmail.com
>> To: carandraug+dev at gmail.com
>> CC: Bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Downloading refseq genomes in batch
>>
>> Hi Came,
>> Thanks for your reply.
>> I tried to get UID from genome names but i cant find on EUtilities.
>> I have taxa id for those genomes, can i download genomes with taxa id in
>> batch ?
>>
>> Thanks
>> Shalabh
>>
>>
>> On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug wrote:
>>
>>> On 3 April 2012 16:34, shalabh sharma wrote:
>>>> Hi All,
>>>> I am trying to download refseq genomes in batch. But instead of
But instead of >>>> accession number i have genome names (=~ 500). >>>> Is there any way i can download them using some bioperl module ? >>> >>> If you have their name/official symbol, then searching on the database >>> should nly return one hit, therefore one UID. Make the search, get >>> that number, and use it for download. The EUtilities module should do >>> that. >>> >>> Carn? >>> >> >> >> >> -- >> Shalabh Sharma >> Scientific Computing Professional Associate (Bioinformatics Specialist) >> Department of Marine Sciences >> University of Georgia >> Athens, GA 30602-3636 >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From shalabh.sharma7 at gmail.com Wed Apr 4 17:24:08 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Wed, 4 Apr 2012 17:24:08 -0400 Subject: [Bioperl-l] Weird efetch problem. Message-ID: Hi All, I am facing a really weird problem using efetch. I am getting different outputs if i am using different method of passing values. Like if i am using this method: #!/usr/bin/perl -w use Bio::DB::EUtilities; use Bio::SeqIO; my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', -db => 'protein', -rettype => 'fasta', -email => 'ssharmai at uga.edu', -id => '256009369'); my $file = 'genome.fasta'; $factory->get_Response(-file => $file); I am getting correct protein sequence but if i am passing values (same id) via an array i am getting nucleotide sequences. 
use Bio::DB::EUtilities;
use Bio::SeqIO;
my @ids;
my $c = 0;
open(IN,"$ARGV[0]");
while(<IN>){
    my $id = $_;
    chomp($id);chop($id);
    $ids[$c] = $id;
    print "$id\n";
    $c++;
}
close(IN);
my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                       -db      => 'protein',
                                       -rettype => 'fasta',
                                       -email   => 'ssharmai at uga.edu',
                                       -id      => \@ids);

my $file = 'genome.fasta';

Thanks
Shalabh

--
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

From sidd.basu at gmail.com Thu Apr 5 06:31:47 2012
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Thu, 5 Apr 2012 05:31:47 -0500
Subject: [Bioperl-l] Re: Weird efetch problem.
In-Reply-To:
References:
Message-ID: <20120405103146.GA5544@Macintosh-388.local>

On Wed, 04 Apr 2012, shalabh sharma wrote:

> Hi All,
> I am facing a really weird problem using efetch. I am getting
> different outputs if i am using different methods of passing values.
>
> Like if i am using this method:
>
> #!/usr/bin/perl -w
> use Bio::DB::EUtilities;
> use Bio::SeqIO;
> my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
>                                        -db => 'protein',
>                                        -rettype => 'fasta',
>                                        -email => 'ssharmai at uga.edu',
>                                        -id => '256009369');
>
> my $file = 'genome.fasta';
> $factory->get_Response(-file => $file);
>
> I am getting correct protein sequence but if i am passing values (same id)
> via an array i am getting nucleotide sequences.
>
> use Bio::DB::EUtilities;
> use Bio::SeqIO;
> my @ids;
> my $c = 0;
> open(IN,"$ARGV[0]");
> while(<IN>){
> my $id = $_;
> chomp($id);chop($id);
> $ids[$c] = $id;
> print "$id\n";
> $c++;
> }
> close(IN);
> my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
>                                        -db => 'protein',
>                                        -rettype => 'fasta',
>                                        -email => 'ssharmai at uga.edu',
>                                        -id => \@ids);

Could you send the IDs here?
-siddhartha

>
> my $file = 'genome.fasta';
>
> Thanks
> Shalabh

From cjfields at illinois.edu Thu Apr 5 09:07:28 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 5 Apr 2012 13:07:28 +0000
Subject: Re: [Bioperl-l] Weird efetch problem.
In-Reply-To: <20120405103146.GA5544@Macintosh-388.local>
References: <20120405103146.GA5544@Macintosh-388.local>
Message-ID:

On Apr 5, 2012, at 5:31 AM, Siddhartha Basu wrote:

> On Wed, 04 Apr 2012, shalabh sharma wrote:
>
>> Hi All,
>> I am facing a really weird problem using efetch. I am getting
>> different outputs if i am using different method of passing values.
>> ...
>>
>> I am getting correct protein sequence but if i am passing values (same id)
>> via an array i am getting nucleotide sequences.
>>
>> ..
> Could you send the ids here.
>
> -siddhartha

And please file a bug report on this if something is found. I do know that
if you use accession numbers you can sometimes get odd results. I recommend
only using UIDs (the GI in the case of protein and nuc seqs).

chris

From shalabh.sharma7 at gmail.com Thu Apr 5 10:40:06 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Thu, 5 Apr 2012 10:40:06 -0400
Subject: Re: [Bioperl-l] Downloading refseq genomes in batch
In-Reply-To:
References:
Message-ID:

Hi All,
Thanks for all the suggestions.
Thanks a lot Chris, i am using your method to pull out genomes. It's
working fine.

Thanks
Shalabh

On Tue, Apr 3, 2012 at 5:19 PM, Fields, Christopher J wrote:

> 500 sequences isn't too bad for a remote lookup (I have run about ~20K
> myself). It's much easier if you can grab them as a batch, e.g. run
> esearch for the IDs, use efetch with the webenv/key to grab the sequences.
> NCBI is more worried about the number of requests made, the length of time
> between requests, and the time of day requests are made.
>
> In fact, I recall updating EUtilities recently so it can use a POST, so
> you can grab ~2000 seqs at a time w/o having to iterate through them.
>
> chris
>
> On Apr 3, 2012, at 4:02 PM, Juan Jovel wrote:
>
> >
> > Hi Shalab
> > You can try using Bio::DB::GenBank, but I believe the NCBI does not like
> people doing many remote lookups. I would advise you to download the whole
> database you are interested in, and then parse it locally.
> > Cheers, Juan
> >> Date: Tue, 3 Apr 2012 14:15:16 -0400
> >> From: shalabh.sharma7 at gmail.com
> >> To: carandraug+dev at gmail.com
> >> CC: Bioperl-l at lists.open-bio.org
> >> Subject: Re: [Bioperl-l] Downloading refseq genomes in batch
> >>
> >> Hi Carnë,
> >> Thanks for your reply.
> >> I tried to get UID from genome names but i can't find on EUtilities.
> >> I have taxa id for those genomes, can i download genomes with taxa id in
> >> batch ?
> >>
> >> Thanks
> >> Shalabh
> >>
> >>
> >> On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug wrote:
> >>
> >>> On 3 April 2012 16:34, shalabh sharma wrote:
> >>>> Hi All,
> >>>> I am trying to download refseq genomes in batch. But instead of
> >>>> accession number i have genome names (=~ 500).
> >>>> Is there any way i can download them using some bioperl module ?
> >>>
> >>> If you have their name/official symbol, then searching on the database
> >>> should only return one hit, therefore one UID. Make the search, get
> >>> that number, and use it for download. The EUtilities module should do
> >>> that.
> >>>
> >>> Carnë
> >>>

From k.d.murray.91 at gmail.com Fri Apr 6 09:49:32 2012
From: k.d.murray.91 at gmail.com (Kevin Murray)
Date: Fri, 6 Apr 2012 23:49:32 +1000
Subject: [Bioperl-l] sequence proxy server
Message-ID:

Hi all,

I'm an undergrad student in molecular biology at the ANU in Australia,
and my research projects are becoming increasingly bioinformatics
heavy. The latest one has involved quite a large amount of sequence
retrieval from GenBank and GenPept. The download speed to Australia
from NCBI's servers is rather slow, and I've been thinking about how we
can improve this.

One solution would be to use Bio::DB::Flat with GenBank sequences on a
local computer. However, in a situation where there are multiple people
in a lab doing bioinformatics, it seems to me a bit of a waste to have
the entire GenBank/GenPept database, or even the relevant sections
thereof, on each computer.

So, I thought about writing a "sequence proxy" CGI script, and a
corresponding module, which would work a bit like this:

The user calls Bio::DB::SeqProxy::GenBank as they would
Bio::DB::GenBank, with the exception that a parameter for the address
of the sequence proxy server is required.
The module then sends a request similar to that sent to NCBI's servers
by calling Bio::DB::GenBank->get_Seq_by_x() to the sequence proxy
server. I believe all requests go to the efetch page now (please
correct me if I'm wrong, I have read the relevant bioperl module code
but not thoroughly), so the CGI script on the sequence proxy would take
arguments in a similar fashion to make writing the client side module
easier.

The CGI script would use a Bio::DB::Flat database, or an interface to
an SQL database, to determine if the required sequence is stored
locally. (As an aside, I'd like your thoughts on Bio::DB::Flat vs
Bio::DB::Sql or similar.)

If the sequence exists locally, it would be returned to the user,
either as plain text, or inside an XML container (see below).

If not, it would be retrieved from the remote database using the
relevant Bio::DB module, and returned.

The sequence would either be returned as the relevant sequence format
(which would default to GenBank format) in plain text, or as an XML
document similar to:

    1
    ___YOUR GENBANK FILE HERE___
    Local Database

The aim of the XML document would be to simplify handling of server
errors and allow for the specification of other metadata such as which
database the sequence came from.

Firstly, I'd like to know if this sounds feasible, and if so, if
someone is already working on something similar? I don't want to
reinvent the wheel.

Secondly, I'd like to ask for your comments and advice. Being
reasonably new to bioperl (started using bioperl about 6 months ago,
but I've been coding in various languages for 8 years) I don't expect
to have considered things that may seem obvious to a more experienced
bioperl-er, so please be as brutally constructive in your criticism as
you see fit =].

I know this is a lot of questions, so thanks in advance for your help.

Cheers, and a happy Easter to those who celebrate it.
Regards
Kevin Murray

From shalabh.sharma7 at gmail.com Fri Apr 6 10:52:30 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Fri, 6 Apr 2012 10:52:30 -0400
Subject: [Bioperl-l] Question about EUtils esearch
Message-ID:

Hi All,
I am trying to get all the UIDs for a few genomes.
For example:
for "Homo sapiens" if i use esearch i get "3277310" UIDs but if i go to
NCBI genome page i see there are only =~ 32,00 proteins.
http://www.ncbi.nlm.nih.gov/genome?term=Homo%20sapiens

I have done this for a lot of genomes and i am afraid that i have to do
this again.

Thanks
Shalabh

From shalabh.sharma7 at gmail.com Fri Apr 6 14:27:29 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Fri, 6 Apr 2012 14:27:29 -0400
Subject: Re: [Bioperl-l] Downloading refseq genomes in batch
In-Reply-To:
References:
Message-ID:

Hi Chris,
I am using the method you suggested.
But i have a question. The UIDs that i am searching using "esearch" are not
same as the number of proteins in that genome.

For Example:
for "Homo sapiens" if i use esearch i get "3277310" UIDs but if i go to
NCBI genome page i see there are only =~ 32,00 proteins.
http://www.ncbi.nlm.nih.gov/genome?term=Homo%20sapiens

Thanks
Shalabh

On Thu, Apr 5, 2012 at 10:40 AM, shalabh sharma wrote:

> Hi All,
> Thanks for all the suggestions.
> Thanks a lot Chris, i am using your method to pull out genomes. It's
> working fine.
>
> Thanks
> Shalabh
>
>
> On Tue, Apr 3, 2012 at 5:19 PM, Fields, Christopher J <
> cjfields at illinois.edu> wrote:
>
>> 500 sequences isn't too bad for a remote lookup (I have run about ~20K
>> myself). It's much easier if you can grab them as a batch, e.g. run
>> esearch for the IDs, use efetch with the webenv/key to grab the sequences.
>> NCBI is more worried about the number of requests made, the length of time
>> between requests, and the time of day requests are made.
>>
>> In fact, I recall updating EUtilities recently so it can use a POST, so
>> you can grab ~2000 seqs at a time w/o having to iterate through them.
>>
>> chris
>>
>> On Apr 3, 2012, at 4:02 PM, Juan Jovel wrote:
>>
>> >
>> > Hi Shalab
>> > You can try using Bio::DB::GenBank, but I believe the NCBI does not like
>> people doing many remote lookups. I would advise you to download the whole
>> database you are interested in, and then parse it locally.
>> > Cheers, Juan
>> >> Date: Tue, 3 Apr 2012 14:15:16 -0400
>> >> From: shalabh.sharma7 at gmail.com
>> >> To: carandraug+dev at gmail.com
>> >> CC: Bioperl-l at lists.open-bio.org
>> >> Subject: Re: [Bioperl-l] Downloading refseq genomes in batch
>> >>
>> >> Hi Carnë,
>> >> Thanks for your reply.
>> >> I tried to get UID from genome names but i can't find on EUtilities.
>> >> I have taxa id for those genomes, can i download genomes with taxa id in
>> >> batch ?
>> >>
>> >> Thanks
>> >> Shalabh
>> >>
>> >>
>> >> On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug wrote:
>> >>
>> >>> On 3 April 2012 16:34, shalabh sharma wrote:
>> >>>> Hi All,
>> >>>> I am trying to download refseq genomes in batch. But instead of
>> >>>> accession number i have genome names (=~ 500).
>> >>>> Is there any way i can download them using some bioperl module ?
>> >>>
>> >>> If you have their name/official symbol, then searching on the database
>> >>> should only return one hit, therefore one UID. Make the search, get
>> >>> that number, and use it for download. The EUtilities module should do
>> >>> that.
>> >>>
>> >>> Carnë
>> >>>

From cjfields at illinois.edu Fri Apr 6 15:09:23 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 6 Apr 2012 19:09:23 +0000
Subject: Re: [Bioperl-l] Question about EUtils esearch
In-Reply-To:
References:
Message-ID:

Shalabh,

You should try getting the specific genome project ID of interest,
linking to the proteins, and then grabbing those. The EUtilities
cookbook has a few examples on how to do that.

chris

On Apr 6, 2012, at 9:52 AM, shalabh sharma wrote:

> Hi All,
> I am trying to get all the UIDs for a few genomes.
> For example:
> for "Homo sapiens" if i use esearch i get "3277310" UIDs but if i go to
> NCBI genome page i see there are only =~ 32,00 proteins.
> http://www.ncbi.nlm.nih.gov/genome?term=Homo%20sapiens
>
> I have done this for a lot of genomes and i am afraid that i have to do
> this again.
>
> Thanks
> Shalabh

From wrp at virginia.edu Sat Apr 7 16:56:16 2012
From: wrp at virginia.edu (William Pearson)
Date: Sat, 7 Apr 2012 16:56:16 -0400
Subject: Re: [Bioperl-l] Bioperl-l Digest, Vol 108, Issue 7
In-Reply-To:
References:
Message-ID:

To get the UIDs (GIs) that you want, search for

    human[organism] AND srcdb_refseq[Properties]

This will get you the refseq proteins you want.

Bill Pearson

> Message: 1
> Date: Fri, 6 Apr 2012 14:27:29 -0400
> From: shalabh sharma
> Subject: Re: [Bioperl-l] Downloading refseq genomes in batch
>
> Hi Chris,
> I am using the method you suggested.
> But i have a question. The UIDs that i am searching using "esearch" are not
> same as the number of proteins in that genome.
>
> For Example:
> for "Homo sapiens" if i use esearch i get "3277310" UIDs but if i go to
> NCBI genome page i see there are only =~ 32,00 proteins.
> http://www.ncbi.nlm.nih.gov/genome?term=Homo%20sapiens
>
> Thanks
> Shalabh
>

From joel.klein at wur.nl Sun Apr 8 19:35:18 2012
From: joel.klein at wur.nl (Bradyjoel)
Date: Sun, 8 Apr 2012 16:35:18 -0700 (PDT)
Subject: [Bioperl-l] Need some help with : Error "Use of uninitialized value"
Message-ID: <33653318.post@talk.nabble.com>

Hi all,

I have little experience in programming with Perl/Bioperl. I'm currently
working on a script that takes a whole genome from a bacterium as input,
converts it into a multiple fasta file containing all the open reading
frames and blasts it against a multiple protein fasta file with known
proteins.
When I get a hit I want to combine the header of the known protein
with the orf sequence; here it gives an error when I try to go through
the orf file and extract the right corresponding sequence. The error it
gives is:

Use of uninitialized value $seq in print at blastscript.pl line ..

Is there someone who has an idea what caused this error, and can help me
with solving it?

Regards, Joel (I put my script in the attachment)
http://old.nabble.com/file/p33653318/blastscript.pl blastscript.pl
--
View this message in context: http://old.nabble.com/Need-some-help-with-%3A-Error-%22Use-of-uninitialized-value%22-tp33653318p33653318.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From afia.hayati at gmail.com Thu Apr 5 00:52:01 2012
From: afia.hayati at gmail.com (afia hayati)
Date: Thu, 5 Apr 2012 13:52:01 +0900
Subject: [Bioperl-l] Proposal of GSoC 2012 >> ACEtoSAM and SAMtoACE
Message-ID:

Dear all,

I am afia, a PhD student in Bioinformatics. I am very interested in
participating in GSoC and would like to apply for the project
Bio::Assembly, Ace2Sam and Sam2Ace converter. I have written a proposal
based on the guidance for prospective GSoC students. I paste my
proposal here. If you have time, please give me suggestions.

Thank you very much.

Sincerely,
Afiahayati
[~Be Passion, Patient and Persistent~]

*Google Summer of Code 2012*
*Proposal*
*ACEtoSAM and SAMtoACE*

1. *Contact information*
   1. Full name: Afiahayati
   2. Address: Hiyoshi International House (Room C301-1), 223-0061
      Yokohama-shi Kouhoku-ku, Hiyoshi 2-27 Kanagawa, Japan
   3. Email: afia.hayati at gmail.com
   4. Phone number: 818044637237
   5. IRC nick: afia
2. *Motivation to join this project*
   I am a PhD student in bioinformatics. My research is in genome
   assembly, especially metagenome assembly. I share the idea that a
   converter from ACE to SAM and vice versa is very useful. I am familiar
   with Perl and BioPerl, so there is no reason for me not to participate
   in this project.
3. *Programming experience and skills*
   1. Perl and BioPerl, since January 2010
   2. R, since January 2008
   3. Oracle, since January 2008
   4. Biojava, since January 2007
   5. PHP, since January 2006
   6. C++, since January 2006
   7. Java, since January 2006
   8. MySQL, since January 2005
   9. C, since January 2005
4. *Open source projects involved with*
   1. Metagenome Assembly, 2012 (with supervisor)
      Develop a de novo assembler for metagenomic data from short
      sequence reads, using C, C++ and Perl
   2. Develop some interfaces in RCommander, 2010 (in team)
   3. Computer system of academic hospital, 2009 (in team)
      By modifying an open source hospital information system, Care2x,
      using PHP, JavaScript and HTML
   4. Academic data warehouse and data mining, 2008 (in team)
      Using Pentaho Business Analytics and the R programming language
5. *Project Plan*
   1. *Before April 23*
      1. Study the SAM and ACE formats in more detail
      2. Study the biodesign related to module Bio::Assembly::IO,
         especially Bio::Assembly::IO::ACE and Bio::Assembly::IO::SAM
      3. Study the documentation and the code of modules
         Bio::Assembly::IO::ACE and Bio::Assembly::IO::SAM
   2. *April 23 - May 20 (before official coding period)*
      1. Do my own practice coding with Bio::Assembly::IO::ACE and
         Bio::Assembly::IO::SAM to improve my understanding.
      2. Keep in contact with my mentor and the BioPerl community. I will
         be active on the mailing list and IRC to confirm my
         understanding of Bio::Assembly::IO::ACE and
         Bio::Assembly::IO::SAM and also discuss the operations (the
         methods) needed for an ACEtoSAM and SAMtoACE converter module.
      3. With the supervision of my mentor, try to determine the
         appropriate design of the ACEtoSAM and SAMtoACE converter module.
   3. *May 21 - June 21*
      1. Determine the final design of the ACEtoSAM and SAMtoACE
         converter module
      2. Code the ACEtoSAM and SAMtoACE converter module
      3. Test my code by myself
      4. Discuss with my mentor to design good tests
      5. Test my code based on the test design
   4. *June 22 - July 8*
      1. Discuss my code with my mentor in order to publish it to the
         BioPerl community
      2. Publish my code to the community and learn from the feedback

   *JULY 9 MID TERM EVALUATION*

   5. *July 9 - August 5*
      1. Improve the code (iteration activities):
         1. Keep in contact with the community, learn from the feedback
         2. Make changes in the code, with the supervision of my mentor
         3. Test the code and publish the code to the community
      2. Finalize the code
      3. Start writing the POD documentation
   6. *August 6 - August 13*
      Final documentation

   *A buffer of a week for unpredicted delay*

   *AUGUST 20 FINAL EVALUATION*

From heath.obrien at gmail.com Tue Apr 3 12:56:31 2012
From: heath.obrien at gmail.com (Heath O'Brien)
Date: Tue, 3 Apr 2012 16:56:31 +0000 (UTC)
Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm)
Message-ID:

Hi All,

I've encountered a bug in the trunc_with_features function in SeqUtils.pm,
or at least behavior that was unexpected to me:

Features with fuzzy coordinates in the original sequence are converted to
exact coordinates in the truncated sequence. For example, the script below
changes the coordinates for the feature from <1..5 to 1..5.

I have modified the code to change this behavior on my system, but I
thought I'd post something here in case others encounter the same problem.

all good things,
Heath



#!/usr/bin/perl -w

use strict;
use warnings;
use Bio::SeqIO;
use Bio::SeqUtils;

my $infile= shift;

my $inIO = Bio::SeqIO->new('-file' => $infile,
    '-format' => 'genbank') or die "could not open seq file $infile\n";

my $outfile = $infile . '_out.gbk';

my $outIO = Bio::SeqIO->new('-file' => ">$outfile",
    '-format' => 'genbank') or die "could not open seq file $outfile\n";

my $in_seq = $inIO->next_seq;
my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5);
$outIO->write_seq($out_seq);
exit;


LOCUS       test_sequence          57303 bp    DNA     linear   UNA
DEFINITION  Sequence to demonstrate unexpected behavior of trunc_with_features
ACCESSION   unknown
KEYWORDS    .
FEATURES             Location/Qualifiers
     source          1..10
                     /mol_type="genomic DNA"
     gene            <1..5
                     /gene="test"
     CDS             <1..5
                     /product="hypothetical protein"
ORIGIN
        1 caagattaaa
//

From mkhalfan at cshl.edu Thu Apr 5 15:29:35 2012
From: mkhalfan at cshl.edu (Khalfan, Mohammed)
Date: Thu, 5 Apr 2012 19:29:35 +0000
Subject: [Bioperl-l] Bio::AlignIO->add_new ORDER?
Message-ID: <7A8AC28B954CFA428BD67E1376B889A21F0077FC@EX-HS-MBX03.cshl.edu>

Hi,

I am having a problem trying to add a new sequence to an alignment using the order parameter.

I would like to add the sequence to the first position (row) in the alignment, is this possible? Here is what my code looks like:

use Bio::AlignIO;
use Bio::LocatableSeq;
use Bio::SimpleAlign;

my $in = Bio::AlignIO->new( -file => $infile ); # $infile is the output from muscle

my $aln = $in->next_aln;

# build a consensus from the current alignment
my $consensus = $aln->consensus_string();

# make the consensus sequence obtained in the above step into a LocatableSeq object
my $consensus_obj = new Bio::LocatableSeq (
    -seq   => $consensus,
    -id    => 'Consensus',
    -start => 1,
    -end   => length($consensus),
);

# add consensus sequence to alignment
$aln->add_seq($consensus_obj, 1);

## END CODE ##

I have tried
$aln->add_seq(seq=>$consensus_obj, order=1);
$aln->add_seq(SEQ=>$consensus_obj, ORDER=1);

But no luck, I cannot get the consensus listed as the first sequence in the alignment. Is this possible?

I can add it in like this successfully, but it adds it to the end, which is not what I need.
$aln->add_seq($consensus_obj);

These are the errors I get:

Using this syntax: $aln->add_seq($consensus_obj, 1);
I get this error:
Use of uninitialized value in subtraction (-) at /usr/lib/perl5/site_perl/5.8.8/Bio/SimpleAlign.pm line 327, line 132.

Using this syntax: $aln->add_seq(seq=>$consensus_obj, order=1);
I get this error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Unable to process non locatable sequences []
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::SimpleAlign::add_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SimpleAlign.pm:297
STACK: ./muscle_post_processor.pl:49
-----------------------------------------------------------

Any assistance would be much appreciated. Thank you.

From jason.stajich at gmail.com Mon Apr 9 15:52:43 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 9 Apr 2012 14:52:43 -0500
Subject: Re: [Bioperl-l] Need some help with : Error "Use of uninitialized value"
In-Reply-To: <33653318.post@talk.nabble.com>
References: <33653318.post@talk.nabble.com>
Message-ID: <74341B2D-5EC2-4421-B66B-F0193CA4FB52@gmail.com>

You really want to create sequence object(s) and pass these into the
BLAST factory.

I also can't figure out why you are manually parsing the EMBL file and
then using SeqIO later. Why not use SeqIO to parse the embl/genbank file?

You also don't report the line number of your current problem, but one
can surmise it is here:

my $seq = $db->seq($id);
print $seq,"\n";

The error indicates you are looking up a sequence ID that doesn't exist
since you get an undefined sequence. I would suggest printing out the
name of the ID you are asking for to make sure it is correct.
Typically we protect these queries like

if( my $seqstr = $db->seq($id) ) {
    print $seqstr, "\n";
} else {
    warn "cannot find $id in sequence db file\n";
}

I think you have not really structured your logic well enough in that
loop - you only want to build Bio::DB::Fasta once, the whole point is to
index once and then query it multiple times.

You might consider starting with this code which does a lot of the stuff
you are trying to do to extract annotated features.
https://github.com/bioperl/bioperl-live/blob/master/scripts/seq/bp_extract_feature_seq.pl

I think you are also using tr wrong - if you want to replace a string
with an empty string you should use s/// and you also need to escape the
| character since it has special meaning.

I guess in your case you just want the sequence - you would use
Bio::SeqIO to read in your sequence and then pass this back out as FASTA
to give to getorf. I don't know if we have a wrapper for EMBOSS's getorf.

There are probably a lot more things that need some attention but you
should start on these.

Jason

On Apr 8, 2012, at 6:35 PM, Bradyjoel wrote:

>
> Hi all,
>
> I have little experience in programming with Perl/Bioperl. I'm currently
> working on a script that takes a whole genome from a bacterium as input,
> converts it into a multiple fasta file containing all the open reading
> frames and blasts it against a multiple protein fasta file with known
> proteins. When I get a hit I want to combine the header of the known protein
> with the orf sequence; here it gives an error when I try to go through the
> orf file and extract the right corresponding sequence. The error it gives is
> : Use of uninitialized value $seq in print at blastscript.pl line ..
> Is there someone who has an idea what caused this error, and can help me
> with solving it?
>
> Regards, Joel (I put my script in the attachment)
> http://old.nabble.com/file/p33653318/blastscript.pl blastscript.pl

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From jason.stajich at gmail.com Mon Apr 9 15:57:52 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 9 Apr 2012 14:57:52 -0500
Subject: Re: [Bioperl-l] Bio::AlignIO->add_new ORDER?
In-Reply-To: <7A8AC28B954CFA428BD67E1376B889A21F0077FC@EX-HS-MBX03.cshl.edu>
References: <7A8AC28B954CFA428BD67E1376B889A21F0077FC@EX-HS-MBX03.cshl.edu>
Message-ID:

You cannot use order=1, it would have to be order => 1 as you are passing
in a hash not an assignment. However, I think the rearrange function that
parses arguments prefers a leading '-' so it should be -order => 1. Same
thing for -seq=>$seq not seq=$seq.

Did you try using exactly what is in the perldoc?

 Title   : add_seq
 Usage   : $myalign->add_seq($newseq);
           $myalign->add_seq(-SEQ=>$newseq, -ORDER=>5);
 Function: Adds another sequence to the alignment. *Does not* align
           it - just adds it to the hashes.
           If -ORDER is specified, the sequence is inserted at the
           position spec'd by -ORDER, and existing sequences
           are pushed down the storage array.
 Returns : nothing
 Args    : A Bio::LocatableSeq object
           Positive integer for the sequence position (optional)

Also - I am not sure what version of the code you are using, that line
error you report is not in the current code so you may have to print out
what is on those lines or consider upgrading to the latest version of the
code.
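Jason's point about the leading dash can be checked without BioPerl installed. The sketch below is a simplified, hypothetical stand-in for the named-argument handling BioPerl methods use (rearrange_args is an illustrative name, not the library's code): when the first argument does not start with '-', the list is taken positionally, so with seq => $obj the literal string 'seq' lands in the sequence slot. That would be consistent with the empty [] in the "Unable to process non locatable sequences" exception reported in this thread, though the real parser may differ in details.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for BioPerl-style named-argument
# handling; NOT the library's own code. $names lists the expected
# parameter names in positional order.
sub rearrange_args {
    my ($names, @args) = @_;

    # Positional call: the first argument does not look like -NAME,
    # so the arguments are returned unchanged.
    return @args unless defined $args[0] && $args[0] =~ /^-/;

    # Named call: drop the leading dash and upper-case each key.
    my %param;
    while (@args) {
        my ($key, $value) = splice @args, 0, 2;
        $key =~ s/^-//;
        $param{ uc $key } = $value;
    }
    return @param{ map { uc($_) } @$names };
}

my @names = qw(SEQ ORDER);

# Dashed, named form: each value lands in the slot it was meant for.
my ($seq, $order) = rearrange_args(\@names, -SEQ => 'consensus', -ORDER => 1);
print "named:      seq=$seq order=$order\n";

# Without the dash the call is treated as positional, so the first slot
# receives the literal string 'seq' instead of the sequence object.
my ($seq2, $order2) = rearrange_args(\@names, seq => 'consensus', order => 1);
print "positional: seq=$seq2 order=$order2\n";
```

Going by the perldoc above, which describes -ORDER as a positive integer position, the call to try for the first row would be $aln->add_seq(-SEQ => $consensus_obj, -ORDER => 1); whether 1 means the first row in a given BioPerl version is an assumption worth checking against the installed code.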
On Apr 5, 2012, at 2:29 PM, Khalfan, Mohammed wrote:

> Hi,
>
> I am having a problem trying to add a new sequence to an alignment using the order parameter.
>
> I would like to add the sequence to the first position (row) in the alignment, is this possible? Here is what my code looks like:
>
> use Bio::AlignIO;
> use Bio::LocatableSeq;
> use Bio::SimpleAlign;
>
> my $in = Bio::AlignIO->new( -file => $infile ); # $infile is the output from muscle
>
> my $aln = $in->next_aln;
>
> # build a consensus from the current alignment
> my $consensus = $aln->consensus_string();
>
> # make the consensus sequence obtained in the above step into a LocatableSeq object
> my $consensus_obj = new Bio::LocatableSeq (
>     -seq   => $consensus,
>     -id    => 'Consensus',
>     -start => 1,
>     -end   => length($consensus),
> );
>
> # add consensus sequence to alignment
> $aln->add_seq($consensus_obj, 1);
>
> ## END CODE ##
>
> I have tried
> $aln->add_seq(seq=>$consensus_obj, order=1);
> $aln->add_seq(SEQ=>$consensus_obj, ORDER=1);
>
> But no luck, I cannot get the consensus listed as the first sequence in the alignment. Is this possible?
>
> I can add it in like this successfully, but it adds it to the end, which is not what I need.
> $aln->add_seq($consensus_obj);
>
> These are the errors I get:
>
> Using this syntax: $aln->add_seq($consensus_obj, 1);
> I get this error:
> Use of uninitialized value in subtraction (-) at /usr/lib/perl5/site_perl/5.8.8/Bio/SimpleAlign.pm line 327, line 132.
>
> Using this syntax: $aln->add_seq(seq=>$consensus_obj, order=1);
> I get this error:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Unable to process non locatable sequences []
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::SimpleAlign::add_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SimpleAlign.pm:297
> STACK: ./muscle_post_processor.pl:49
> -----------------------------------------------------------
>
> Any assistance would be much appreciated. Thank you.

From heath.obrien at gmail.com Mon Apr 9 17:37:56 2012
From: heath.obrien at gmail.com (Heath O'Brien)
Date: Mon, 9 Apr 2012 17:37:56 -0400
Subject: Re: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm)
In-Reply-To: <4F8352DB.6060106@sanger.ac.uk>
References: <4F8352DB.6060106@sanger.ac.uk>
Message-ID: <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com>

Hi Frank,

I just tried it with the latest version from bioperl-live, and it worked
the way I described in my email.

all good things,
Heath

On 9-Apr-12, at 5:21 PM, Frank Schwach wrote:

> Hi Heath,
>
> I have recently worked a bit on that module and contributed the code
> to bioperl-live. I think this behaviour may already have changed but
> I'm not 100% sure at the moment. When I have some time I will review
> the code to confirm. In the meantime, you could give it a go with
> the bioperl-live version if that's an option for you?
>
> Cheers,
>
> Frank
>
>
> On 03/04/12 17:56, Heath O'Brien wrote:
>> Hi All,
>>
>> I've encountered a bug in the trunc_with_features function in
>> SeqUtils.pm, or at least behavior that was unexpected to me:
>>
>> Features with fuzzy coordinates in the original sequence are
>> converted to exact coordinates in the truncated sequence. For
>> example, the script below changes the coordinates for the feature
>> from <1..5 to 1..5.
>>
>> I have modified the code to change this behavior on my system, but
>> I thought I'd post something here in case others encounter the same
>> problem.
>>
>> all good things,
>> Heath
>>
>>
>>
>> #!/usr/bin/perl -w
>>
>> use strict;
>> use warnings;
>> use Bio::SeqIO;
>> use Bio::SeqUtils;
>>
>> my $infile= shift;
>>
>> my $inIO = Bio::SeqIO->new('-file' => $infile,
>>     '-format' => 'genbank') or die "could not open seq file $infile\n";
>>
>> my $outfile = $infile . '_out.gbk';
>>
>> my $outIO = Bio::SeqIO->new('-file' => ">$outfile",
>>     '-format' => 'genbank') or die "could not open seq file $outfile\n";
>>
>> my $in_seq = $inIO->next_seq;
>> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5);
>> $outIO->write_seq($out_seq);
>> exit;
>>
>>
>> LOCUS       test_sequence          57303 bp    DNA     linear   UNA
>> DEFINITION  Sequence to demonstrate unexpected behavior of trunc_with_features
>> ACCESSION   unknown
>> KEYWORDS    .
>> FEATURES             Location/Qualifiers
>>      source          1..10
>>                      /mol_type="genomic DNA"
>>      gene            <1..5
>>                      /gene="test"
>>      CDS             <1..5
>>                      /product="hypothetical protein"
>> ORIGIN
>>         1 caagattaaa
>> //
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
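Whether the leading '<' survives truncation can be checked by comparing each feature's location string before and after calling trunc_with_features. The pure-Perl helper below (fuzzy_ends is a hypothetical name, not part of BioPerl) only classifies simple start..end spans. In a real script the strings would come from something like $feature->location->to_FTstring; this sketch assumes that source rather than calling it, so it runs without BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the ends of a simple GenBank span such as
# "<1..5", "1..>10" or "3..8". Compound locations such as join(...) or
# complement(...) are not handled and yield an empty list.
sub fuzzy_ends {
    my ($loc) = @_;
    my ($lt, $start, $gt, $end) = $loc =~ /^(<?)(\d+)\.\.(>?)(\d+)$/
        or return;
    return (
        start       => $start,
        end         => $end,
        start_fuzzy => ($lt eq '<' ? 1 : 0),
        end_fuzzy   => ($gt eq '>' ? 1 : 0),
    );
}

# Compare a feature's location string before and after truncation.
for my $pair (['<1..5', '1..5'], ['<1..5', '<1..5']) {
    my ($before, $after) = @$pair;
    my %before_ends = fuzzy_ends($before);
    my %after_ends  = fuzzy_ends($after);
    my $same = $before_ends{start_fuzzy} == $after_ends{start_fuzzy}
            && $before_ends{end_fuzzy}   == $after_ends{end_fuzzy};
    print "$before -> $after: fuzzy ends ", ($same ? 'preserved' : 'changed'), "\n";
}
```

For the record in Heath's report, the first pair mirrors the behaviour he describes (<1..5 becoming 1..5), and the second is what he expected to see.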
From fs5 at sanger.ac.uk Mon Apr 9 17:21:31 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Mon, 09 Apr 2012 22:21:31 +0100 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: References: Message-ID: <4F8352DB.6060106@sanger.ac.uk> Hi Heath, I have recently worked a bit on that module and contributed the code to bioperl-live. I think this behaviour may already have changed but I'm not 100% sure at the moment. When I have some time I will review the code to confirm. In the meantime, you could give it a go with the bioperl-live version if that's an option for you? Cheers, Frank On 03/04/12 17:56, Heath O'Brien wrote: > Hi All, > > I've encountered a bug in the trunc_with_features function in SeqUtils.pm, or at > least behavior that was unexpected to me: > > Features with fuzzy coordinates in the original sequence are converted to exact > coordinates in the truncated sequence. For example, the script below changes the > coordinates for the feature from<1..5 to 1..5. > > I have modified the code to change this behavior on my system, but I thought I'd > post something here in case others encounter the same problem. > > all good things, > Heath > > > > #!/usr/bin/perl -w > > use strict; > use warnings; > use Bio::SeqIO; > use Bio::SeqUtils; > > my $infile= shift; > > my $inIO = Bio::SeqIO->new('-file' => $infile, > '-format' => 'genbank') or die "could not open seq file $infile\n"; > > my $outfile = $infile . '_out.gbk'; > > my $outIO = Bio::SeqIO->new('-file' => ">$outfile", > '-format' => 'genbank') or die "could not open seq file $outfile\n"; > > my $in_seq = $inIO->next_seq; > my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); > $outIO->write_seq($out_seq); > exit; > > > LOCUS test_sequence 57303 bp DNA linear UNA > DEFINITION Sequence to demonstrate unexpected behavior of trunc_with_features > ACCESSION unknown > KEYWORDS . 
> FEATURES Location/Qualifiers > source 1..10 > /mol_type="genomic DNA" > gene<1..5 > /gene="test" > CDS<1..5 > /product="hypothetical protein" > ORIGIN > 1 caagattaaa > // > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From longbow0 at gmail.com Tue Apr 10 00:40:16 2012 From: longbow0 at gmail.com (longbow leo) Date: Mon, 9 Apr 2012 23:40:16 -0500 Subject: [Bioperl-l] Strange behavior of $node->height for scientific notation format branch length Message-ID: Hi all, I have encountered a strange behavior while calculating the tree height at the root node. If the branch lengths of the tree are in scientific notation, as in trees created by MrBayes, it does not give correct results. For example, Tree 1: (((A:0.02,B:0.025):0.12,C:0.071):0.34,D:0.6); Tree 2: (((A:2e-2,B:2.5e-2):1.2e-1,C:7.1e-2):3.4e-1,D:6e-1); These two trees are identical except for how the branch lengths are written. The Perl script: # ============================================================ #!/usr/bin/perl use 5.010; use strict; use warnings; use Bio::TreeIO; my $usage = << "EOS"; Display branch lengths for leaf nodes. Usage: t_branchlen.pl [] Params: : Tree file. : Tree format. Optional. Default "newick".
EOS my ($ftre, $fmt) = @ARGV; die $usage unless ( defined $ftre ); $fmt = 'newick' unless ( defined $fmt); my $o_treei = Bio::TreeIO->new( -file => $ftre, -format => $fmt, ); my $o_tree = $o_treei->next_tree; my @o_leaves = $o_tree->get_leaf_nodes(); say join("\t", ("Node", "Branch Length", "Depth")); for my $o_node ( @o_leaves ) { say $o_node->id, "\t", $o_node->branch_length, "\t", $o_node->depth; } my $o_root = $o_tree->get_root_node; # say; say "Root height:\t", $o_root->height; exit 0; # ============================================================ For tree 1, the output is: Node Branch Length Depth A 0.02 0.48 B 0.025 0.485 C 0.071 0.411 D 0.6 0.6 *Root height: 0.6* For tree 2, Node Branch Length Depth A 2e-2 0.48 B 2.5e-2 0.485 C 7.1e-2 0.411 D 6e-1 0.6 *Root height: 3* The interesting thing is, the node depth values are correct, but I have no idea how the root height is calculated. Are there any ideas to resolve this problem? Thanks! Haizhou From jason.stajich at gmail.com Tue Apr 10 02:33:00 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 9 Apr 2012 23:33:00 -0700 Subject: [Bioperl-l] Strange behavior of $node->height for scientific notation format branch length In-Reply-To: References: Message-ID: <1839F94F-178E-44F2-8A5C-6E2657AAD59C@gmail.com> It also looks like the code that calculates height only accepts branch lengths written as plain decimals such as 0.02 - see line 64. Anything else, including scientific notation such as 2e-2, is silently replaced by a length of 1, which is why tree 2 gets a root height of 3: the longest root-to-leaf path (root to A or B) has three branches, each counted as 1. I am not sure why this check is in there, but I guess it was protection against something that was failing in some other situation.
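To see the effect of the line-64 check in isolation, here is a minimal sketch in plain Perl (core modules only, no BioPerl) that reproduces both root heights from the original post. The hash-based nodes and the tree_height() helper are hypothetical stand-ins for Bio::Tree::Node, written only to mimic that check. The looks_like_number() test from Scalar::Util is one possible way to accept scientific notation; it is not what BioPerl currently does.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical minimal node: a branch length (kept as the string read
# from the Newick file) plus a list of child nodes. Not Bio::Tree::Node.
sub node {
    my ($bl, @kids) = @_;
    return { bl => $bl, kids => \@kids };
}

# Tree 1: (((A:0.02,B:0.025):0.12,C:0.071):0.34,D:0.6);
my $tree1 = node(undef,
    node('0.34', node('0.12', node('0.02'), node('0.025')), node('0.071')),
    node('0.6'));

# Tree 2: (((A:2e-2,B:2.5e-2):1.2e-1,C:7.1e-2):3.4e-1,D:6e-1);
my $tree2 = node(undef,
    node('3.4e-1', node('1.2e-1', node('2e-2'), node('2.5e-2')), node('7.1e-2')),
    node('6e-1'));

# Height = longest sum of branch lengths from this node down to a leaf.
# Leaves return 0. A branch length rejected by $is_valid counts as 1,
# mirroring the fallback on line 64.
sub tree_height {
    my ($node, $is_valid) = @_;
    my $max = 0;
    for my $child (@{ $node->{kids} }) {
        my $bl = $child->{bl};
        $bl = 1 unless defined $bl && $is_valid->($bl);
        my $h = tree_height($child, $is_valid) + $bl;
        $max = $h if $h > $max;
    }
    return $max;
}

# The check from line 64: plain decimals only.
my $decimal_only = sub { $_[0] =~ /^\-?\d+(\.\d+)?$/ };

# A possible alternative: anything Perl itself treats as a number.
my $any_number = sub { looks_like_number($_[0]) };

printf "tree 1, line-64 check:     %g\n", tree_height($tree1, $decimal_only);
printf "tree 2, line-64 check:     %g\n", tree_height($tree2, $decimal_only);
printf "tree 2, looks_like_number: %g\n", tree_height($tree2, $any_number);

# The sprintf("%f") workaround turns 2e-2 into 0.020000, which passes the
# line-64 check, but %f keeps only six decimal places.
printf "%-5s -> %s\n", $_, sprintf('%f', $_) for ('2e-2', '1e-7');
```

With the line-64 check, tree 2 comes out at 3, while tree 1 gives 0.6; with looks_like_number() tree 2 also gives 0.6. The last two lines show that sprintf("%f") rounds 1e-7 to 0.000000, so a wider format such as "%.10f" may be safer when a tree has very short branches. Note also that looks_like_number() accepts strings like "Inf" and "NaN", so a stricter numeric pattern might be preferable in library code.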
62: foreach my $subnode ( $self->each_Descendent ) { 63: my $bl = $subnode->branch_length; 64: $bl = 1 unless (defined $bl && $bl =~ /^\-?\d+(\.\d+)?$/); 65: my $s = $subnode->height + $bl; You can work around this by first rewriting all your branch lengths in plain decimal notation after you read the tree in: for my $node ($tree->get_nodes) { my $bl = $node->branch_length; next unless defined $bl; $node->branch_length(sprintf("%f", $bl)); } We should think about how we might handle scientific notation branch lengths properly in the code in the future if someone wants to take this on. Jason > Hi all, > > I have encountered a strange behavior while calculating the tree height at > the root node. > > If the branch lengths of the tree are in scientific notation, as in > trees created by MrBayes, it does not give correct results. > > For example, > > Tree 1: > > (((A:0.02,B:0.025):0.12,C:0.071):0.34,D:0.6); > > Tree 2: > > (((A:2e-2,B:2.5e-2):1.2e-1,C:7.1e-2):3.4e-1,D:6e-1); > > These two trees are identical except for how the branch lengths are written. > > The Perl script: > > # ============================================================ > > #!/usr/bin/perl > > use 5.010; > use strict; > use warnings; > > use Bio::TreeIO; > > my $usage = << "EOS"; > Display branch lengths for leaf nodes. > Usage: > t_branchlen.pl [] > Params: > : Tree file. > : Tree format. Optional. Default "newick".
> EOS > > my ($ftre, $fmt) = @ARGV; > > die $usage unless ( defined $ftre ); > > $fmt = 'newick' unless ( defined $fmt); > > my $o_treei = Bio::TreeIO->new( > -file => $ftre, > -format => $fmt, > ); > > my $o_tree = $o_treei->next_tree; > > my @o_leaves = $o_tree->get_leaf_nodes(); > > say join("\t", ("Node", "Branch Length", "Depth")); > > for my $o_node ( @o_leaves ) { > say $o_node->id, "\t", $o_node->branch_length, "\t", $o_node->depth; > } > > my $o_root = $o_tree->get_root_node; > > # say; > > say "Root height:\t", $o_root->height; > > exit 0; > > # ============================================================ > > For tree 1, the output is: > > Node Branch Length Depth > A 0.02 0.48 > B 0.025 0.485 > C 0.071 0.411 > D 0.6 0.6 > *Root height: 0.6* > > For tree 2, > > Node Branch Length Depth > A 2e-2 0.48 > B 2.5e-2 0.485 > C 7.1e-2 0.411 > D 6e-1 0.6 > *Root height: 3* > > The interesting thing is, the node depth values are correct, but I have no > idea how the root height is calculated. > > Are there any ideas to resolve this problem? > > Thanks! > > Haizhou > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From fs5 at sanger.ac.uk Tue Apr 10 04:42:54 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Tue, 10 Apr 2012 09:42:54 +0100 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> Message-ID: <4F83F28E.4080000@sanger.ac.uk> Hi Heath, Yes, I just had a look too and it's true that it would currently ignore the original type. I had added some new methods (delete, insert, ligate) and with those the location type is preserved but not with the already existing methods like trunc_with_features.
I will look into it when I have some time and make some changes. Cheers, Frank On 09/04/12 22:37, Heath O'Brien wrote: > Hi Frank, > > I just tried it with the latest version from bioperl-live, and it worked > the way I described in my email. > > all good things, > Heath > > > On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: > >> Hi Heath, >> >> I have recently worked a bit on that module and contributed the code >> to bioperl-live. I think this behaviour may already have changed but >> I'm not 100% sure at the moment. When I have some time I will review >> the code to confirm. In the meantime, you could give it a go with the >> bioperl-live version if that's an option for you? >> >> Cheers, >> >> Frank >> >> >> On 03/04/12 17:56, Heath O'Brien wrote: >>> Hi All, >>> >>> I've encountered a bug in the trunc_with_features function in >>> SeqUtils.pm, or at >>> least behavior that was unexpected to me: >>> >>> Features with fuzzy coordinates in the original sequence are >>> converted to exact >>> coordinates in the truncated sequence. For example, the script below >>> changes the >>> coordinates for the feature from<1..5 to 1..5. >>> >>> I have modified the code to change this behavior on my system, but I >>> thought I'd >>> post something here in case others encounter the same problem. >>> >>> all good things, >>> Heath >>> >>> >>> >>> #!/usr/bin/perl -w >>> >>> use strict; >>> use warnings; >>> use Bio::SeqIO; >>> use Bio::SeqUtils; >>> >>> my $infile= shift; >>> >>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>> '-format' => 'genbank') or die "could not open seq file $infile\n"; >>> >>> my $outfile = $infile . 
'_out.gbk'; >>> >>> my $outIO = Bio::SeqIO->new('-file' => ">$outfile", >>> '-format' => 'genbank') or die "could not open seq file $outfile\n"; >>> >>> my $in_seq = $inIO->next_seq; >>> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>> $outIO->write_seq($out_seq); >>> exit; >>> >>> >>> LOCUS test_sequence 57303 bp DNA linear UNA >>> DEFINITION Sequence to demonstrate unexpected behavior of >>> trunc_with_features >>> ACCESSION unknown >>> KEYWORDS . >>> FEATURES Location/Qualifiers >>> source 1..10 >>> /mol_type="genomic DNA" >>> gene<1..5 >>> /gene="test" >>> CDS<1..5 >>> /product="hypothetical protein" >>> ORIGIN >>> 1 caagattaaa >>> // >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> -- >> The Wellcome Trust Sanger Institute is operated by Genome Research >> Limited, a charity registered in England with number 1021457 and a >> company registered in England with number 2742969, whose registered >> office is 215 Euston Road, London, NW1 2BE. > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From awitney at sgul.ac.uk Tue Apr 10 05:11:51 2012 From: awitney at sgul.ac.uk (Adam Witney) Date: Tue, 10 Apr 2012 10:11:51 +0100 Subject: [Bioperl-l] Output of a BLAST parse to text file In-Reply-To: References: Message-ID: <908D24DC-1A0E-4EE1-8573-F68FB3487071@sgul.ac.uk> Hi Zac, how do you want to sort the information? If it's just on num_hsps, then you will have to store the results in an array or something and then sort that before printing your output. adam On 1 Apr 2012, at 04:35, Zachariah Wylde wrote: > Hi there, > > I am very new to Bioperl, so excuse me if I come across as simple! I need to
I need to > write a bioperl script to extract information from BLAST results. > The script needs to count how many HSPs are on each mouse chromosome and > be written to a tab-separated table. I have this so far, but do not > understand how to > sort the information. I would much, appreciate if you could help me?? > > Yours sincerely, > > Zac Wylde > > use strict; > use warnings; > use lib "C:/Program Files (x86)/BioPerl"; > use Bio::SearchIO; > > my $infile = "Alignment_Ref_Seq.txt"; > open INFILE, $infile or die "Cannot open $infile: $!"; > > my $outfile = "assignment2.txt"; > open OUTFILE, ">$outfile" or die "Cannot open $outfile: $!"; > > > my $parser = new Bio::SearchIO(-format => 'blast', -file => > 'Alignment_Ref_Seq.txt'); > > > while (my $result = $parser->next_result){ > while (my $hit = $result->next_hit){ > while (my $hsp = $hit->next_hsp){ > if ($hit->description =~ /(mus musculus)|(mouse)/i){ > if ($hit->description =~ /chromosome (\w+)/){ > print "Hit = ", $hit->name, " \t", > "chromosome = ", $1, " \t", > "HSPs = ", $hit->num_hsps, "\n"; > } > } > } > } > } > > close INFILE; > close OUTFILE; > > #unknown > #chromosome from > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roy.chaudhuri at gmail.com Tue Apr 10 07:10:36 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 10 Apr 2012 12:10:36 +0100 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <4F83F28E.4080000@sanger.ac.uk> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> Message-ID: <4F84152C.7030300@gmail.com> Hi Heath, Frank, This was probably my fault back in the mists of time. 
Looks like an easy fix though, I've reported the issue on Redmine and submitted a patch: https://redmine.open-bio.org/issues/3339 We should probably also add Heath's example as a test case. Cheers, Roy. On 10/04/2012 09:42, Frank Schwach wrote: > Hi Heath, > > Yes, I just had a look too and it's true that it would currently ignore > the original type. I had added some new methods (delete, insert, ligate) > and with those the location type is preserved but not with the already > existing methods like trunc_with_features. I will look into it when I > have some time and make some changes. > > Cheers, > > Frank > > > On 09/04/12 22:37, Heath O'Brien wrote: >> Hi Frank, >> >> I just tried it with the latest version from bioperl-live, and it worked >> the way I described in my email. >> >> all good things, >> Heath >> >> >> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >> >>> Hi Heath, >>> >>> I have recently worked a bit on that module and contributed the code >>> to bioperl-live. I think this behaviour may already have changed but >>> I'm not 100% sure at the moment. When I have some time I will review >>> the code to confirm. In the meantime, you could give it a go with the >>> bioperl-live version if that's an option for you? >>> >>> Cheers, >>> >>> Frank >>> >>> >>> On 03/04/12 17:56, Heath O'Brien wrote: >>>> Hi All, >>>> >>>> I've encountered a bug in the trunc_with_features function in >>>> SeqUtils.pm, or at >>>> least behavior that was unexpected to me: >>>> >>>> Features with fuzzy coordinates in the original sequence are >>>> converted to exact >>>> coordinates in the truncated sequence. For example, the script below >>>> changes the >>>> coordinates for the feature from<1..5 to 1..5. >>>> >>>> I have modified the code to change this behavior on my system, but I >>>> thought I'd >>>> post something here in case others encounter the same problem. 
>>>> >>>> all good things, >>>> Heath >>>> >>>> >>>> >>>> #!/usr/bin/perl -w >>>> >>>> use strict; >>>> use warnings; >>>> use Bio::SeqIO; >>>> use Bio::SeqUtils; >>>> >>>> my $infile= shift; >>>> >>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>> '-format' => 'genbank') or die "could not open seq file $infile\n"; >>>> >>>> my $outfile = $infile . '_out.gbk'; >>>> >>>> my $outIO = Bio::SeqIO->new('-file' => ">$outfile", >>>> '-format' => 'genbank') or die "could not open seq file $outfile\n"; >>>> >>>> my $in_seq = $inIO->next_seq; >>>> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>> $outIO->write_seq($out_seq); >>>> exit; >>>> >>>> >>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>> DEFINITION Sequence to demonstrate unexpected behavior of >>>> trunc_with_features >>>> ACCESSION unknown >>>> KEYWORDS . >>>> FEATURES Location/Qualifiers >>>> source 1..10 >>>> /mol_type="genomic DNA" >>>> gene<1..5 >>>> /gene="test" >>>> CDS<1..5 >>>> /product="hypothetical protein" >>>> ORIGIN >>>> 1 caagattaaa >>>> // >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> -- >>> The Wellcome Trust Sanger Institute is operated by Genome Research >>> Limited, a charity registered in England with number 1021457 and a >>> company registered in England with number 2742969, whose registered >>> office is 215 Euston Road, London, NW1 2BE. 
>> > > From roy.chaudhuri at gmail.com Tue Apr 10 10:45:21 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 10 Apr 2012 15:45:21 +0100 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <4F841EF3.6000603@sanger.ac.uk> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> <4F84152C.7030300@gmail.com> <4F841EF3.6000603@sanger.ac.uk> Message-ID: <4F844781.90005@gmail.com> Turns out I spoke too soon, I added in some new tests and they highlighted problems with both trunc_with_features and revcom_with_features. I think I have resolved all the issues in the most recent Redmine patch - Frank, Heath, please could you check that it works for you? Cheers, Roy. On 10/04/2012 12:52, Frank Schwach wrote: > Brilliant, thanks Roy! > Frank > > > On 10/04/12 12:10, Roy Chaudhuri wrote: >> Hi Heath, Frank, >> >> This was probably my fault back in the mists of time. Looks like an easy >> fix though, I've reported the issue on Redmine and submitted a patch: >> https://redmine.open-bio.org/issues/3339 >> >> We should probably also add Heath's example as a test case. >> >> Cheers, >> Roy. >> >> On 10/04/2012 09:42, Frank Schwach wrote: >>> Hi Heath, >>> >>> Yes, I just had a look too and it's true that it would currently ignore >>> the original type. I had added some new methods (delete, insert, ligate) >>> and with those the location type is preserved but not with the already >>> existing methods like trunc_with_features. I will look into it when I >>> have some time and make some changes. >>> >>> Cheers, >>> >>> Frank >>> >>> >>> On 09/04/12 22:37, Heath O'Brien wrote: >>>> Hi Frank, >>>> >>>> I just tried it with the latest version from bioperl-live, and it worked >>>> the way I described in my email. 
>>>> >>>> all good things, >>>> Heath >>>> >>>> >>>> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >>>> >>>>> Hi Heath, >>>>> >>>>> I have recently worked a bit on that module and contributed the code >>>>> to bioperl-live. I think this behaviour may already have changed but >>>>> I'm not 100% sure at the moment. When I have some time I will review >>>>> the code to confirm. In the meantime, you could give it a go with the >>>>> bioperl-live version if that's an option for you? >>>>> >>>>> Cheers, >>>>> >>>>> Frank >>>>> >>>>> >>>>> On 03/04/12 17:56, Heath O'Brien wrote: >>>>>> Hi All, >>>>>> >>>>>> I've encountered a bug in the trunc_with_features function in >>>>>> SeqUtils.pm, or at >>>>>> least behavior that was unexpected to me: >>>>>> >>>>>> Features with fuzzy coordinates in the original sequence are >>>>>> converted to exact >>>>>> coordinates in the truncated sequence. For example, the script below >>>>>> changes the >>>>>> coordinates for the feature from<1..5 to 1..5. >>>>>> >>>>>> I have modified the code to change this behavior on my system, but I >>>>>> thought I'd >>>>>> post something here in case others encounter the same problem. >>>>>> >>>>>> all good things, >>>>>> Heath >>>>>> >>>>>> >>>>>> >>>>>> #!/usr/bin/perl -w >>>>>> >>>>>> use strict; >>>>>> use warnings; >>>>>> use Bio::SeqIO; >>>>>> use Bio::SeqUtils; >>>>>> >>>>>> my $infile= shift; >>>>>> >>>>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>>>> '-format' => 'genbank') or die "could not open seq file $infile\n"; >>>>>> >>>>>> my $outfile = $infile . 
'_out.gbk'; >>>>>> >>>>>> my $outIO = Bio::SeqIO->new('-file' => ">$outfile", >>>>>> '-format' => 'genbank') or die "could not open seq file $outfile\n"; >>>>>> >>>>>> my $in_seq = $inIO->next_seq; >>>>>> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>>>> $outIO->write_seq($out_seq); >>>>>> exit; >>>>>> >>>>>> >>>>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>>>> DEFINITION Sequence to demonstrate unexpected behavior of >>>>>> trunc_with_features >>>>>> ACCESSION unknown >>>>>> KEYWORDS . >>>>>> FEATURES Location/Qualifiers >>>>>> source 1..10 >>>>>> /mol_type="genomic DNA" >>>>>> gene<1..5 >>>>>> /gene="test" >>>>>> CDS<1..5 >>>>>> /product="hypothetical protein" >>>>>> ORIGIN >>>>>> 1 caagattaaa >>>>>> // >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> -- >>>>> The Wellcome Trust Sanger Institute is operated by Genome Research >>>>> Limited, a charity registered in England with number 1021457 and a >>>>> company registered in England with number 2742969, whose registered >>>>> office is 215 Euston Road, London, NW1 2BE. >>>> >>> >>> >> > > From heath.obrien at gmail.com Tue Apr 10 11:34:59 2012 From: heath.obrien at gmail.com (Heath O'Brien) Date: Tue, 10 Apr 2012 11:34:59 -0400 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <4F844781.90005@gmail.com> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> <4F84152C.7030300@gmail.com> <4F841EF3.6000603@sanger.ac.uk> <4F844781.90005@gmail.com> Message-ID: <80696D43-B1D4-4228-A4B5-6B9299F46D41@gmail.com> Works perfect for me. Thanks! 
all good things, Heath On 10-Apr-12, at 10:45 AM, Roy Chaudhuri wrote: > Turns out I spoke too soon, I added in some new tests and they > highlighted problems with both trunc_with_features and > revcom_with_features. I think I have resolved all the issues in the > most recent Redmine patch - Frank, Heath, please could you check > that it works for you? > > Cheers, > Roy. > > On 10/04/2012 12:52, Frank Schwach wrote: >> Brilliant, thanks Roy! >> Frank >> >> >> On 10/04/12 12:10, Roy Chaudhuri wrote: >>> Hi Heath, Frank, >>> >>> This was probably my fault back in the mists of time. Looks like >>> an easy >>> fix though, I've reported the issue on Redmine and submitted a >>> patch: >>> https://redmine.open-bio.org/issues/3339 >>> >>> We should probably also add Heath's example as a test case. >>> >>> Cheers, >>> Roy. >>> >>> On 10/04/2012 09:42, Frank Schwach wrote: >>>> Hi Heath, >>>> >>>> Yes, I just had a look too and it's true that it would currently >>>> ignore >>>> the original type. I had added some new methods (delete, insert, >>>> ligate) >>>> and with those the location type is preserved but not with the >>>> already >>>> existing methods like trunc_with_features. I will look into it >>>> when I >>>> have some time and make some changes. >>>> >>>> Cheers, >>>> >>>> Frank >>>> >>>> >>>> On 09/04/12 22:37, Heath O'Brien wrote: >>>>> Hi Frank, >>>>> >>>>> I just tried it with the latest version from bioperl-live, and >>>>> it worked >>>>> the way I described in my email. >>>>> >>>>> all good things, >>>>> Heath >>>>> >>>>> >>>>> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >>>>> >>>>>> Hi Heath, >>>>>> >>>>>> I have recently worked a bit on that module and contributed the >>>>>> code >>>>>> to bioperl-live. I think this behaviour may already have >>>>>> changed but >>>>>> I'm not 100% sure at the moment. When I have some time I will >>>>>> review >>>>>> the code to confirm. 
In the meantime, you could give it a go >>>>>> with the >>>>>> bioperl-live version if that's an option for you? >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> On 03/04/12 17:56, Heath O'Brien wrote: >>>>>>> Hi All, >>>>>>> >>>>>>> I've encountered a bug in the trunc_with_features function in >>>>>>> SeqUtils.pm, or at >>>>>>> least behavior that was unexpected to me: >>>>>>> >>>>>>> Features with fuzzy coordinates in the original sequence are >>>>>>> converted to exact >>>>>>> coordinates in the truncated sequence. For example, the script >>>>>>> below >>>>>>> changes the >>>>>>> coordinates for the feature from<1..5 to 1..5. >>>>>>> >>>>>>> I have modified the code to change this behavior on my system, >>>>>>> but I >>>>>>> thought I'd >>>>>>> post something here in case others encounter the same problem. >>>>>>> >>>>>>> all good things, >>>>>>> Heath >>>>>>> >>>>>>> >>>>>>> >>>>>>> #!/usr/bin/perl -w >>>>>>> >>>>>>> use strict; >>>>>>> use warnings; >>>>>>> use Bio::SeqIO; >>>>>>> use Bio::SeqUtils; >>>>>>> >>>>>>> my $infile= shift; >>>>>>> >>>>>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>>>>> '-format' => 'genbank') or die "could not open seq file >>>>>>> $infile\n"; >>>>>>> >>>>>>> my $outfile = $infile . '_out.gbk'; >>>>>>> >>>>>>> my $outIO = Bio::SeqIO->new('-file' => ">$outfile", >>>>>>> '-format' => 'genbank') or die "could not open seq file >>>>>>> $outfile\n"; >>>>>>> >>>>>>> my $in_seq = $inIO->next_seq; >>>>>>> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>>>>> $outIO->write_seq($out_seq); >>>>>>> exit; >>>>>>> >>>>>>> >>>>>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>>>>> DEFINITION Sequence to demonstrate unexpected behavior of >>>>>>> trunc_with_features >>>>>>> ACCESSION unknown >>>>>>> KEYWORDS . 
>>>>>>> FEATURES Location/Qualifiers >>>>>>> source 1..10 >>>>>>> /mol_type="genomic DNA" >>>>>>> gene<1..5 >>>>>>> /gene="test" >>>>>>> CDS<1..5 >>>>>>> /product="hypothetical protein" >>>>>>> ORIGIN >>>>>>> 1 caagattaaa >>>>>>> // >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> -- >>>>>> The Wellcome Trust Sanger Institute is operated by Genome >>>>>> Research >>>>>> Limited, a charity registered in England with number 1021457 >>>>>> and a >>>>>> company registered in England with number 2742969, whose >>>>>> registered >>>>>> office is 215 Euston Road, London, NW1 2BE. >>>>> >>>> >>>> >>> >> >> > From cjfields at illinois.edu Tue Apr 10 13:08:45 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 10 Apr 2012 17:08:45 +0000 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <80696D43-B1D4-4228-A4B5-6B9299F46D41@gmail.com> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> <4F84152C.7030300@gmail.com> <4F841EF3.6000603@sanger.ac.uk> <4F844781.90005@gmail.com> <80696D43-B1D4-4228-A4B5-6B9299F46D41@gmail.com> Message-ID: I have committed these to bioperl-live, they passed tests for me. I have left the bug report open, however, in case more work needs to be done. Roy, did you want to close that when you are ready? chris On Apr 10, 2012, at 10:34 AM, Heath O'Brien wrote: > Works perfect for me. Thanks! > > all good things, > Heath > > On 10-Apr-12, at 10:45 AM, Roy Chaudhuri wrote: > >> Turns out I spoke too soon, I added in some new tests and they highlighted problems with both trunc_with_features and revcom_with_features. I think I have resolved all the issues in the most recent Redmine patch - Frank, Heath, please could you check that it works for you? >> >> Cheers, >> Roy. 
>> >> On 10/04/2012 12:52, Frank Schwach wrote: >>> Brilliant, thanks Roy! >>> Frank >>> >>> >>> On 10/04/12 12:10, Roy Chaudhuri wrote: >>>> Hi Heath, Frank, >>>> >>>> This was probably my fault back in the mists of time. Looks like an easy >>>> fix though, I've reported the issue on Redmine and submitted a patch: >>>> https://redmine.open-bio.org/issues/3339 >>>> >>>> We should probably also add Heath's example as a test case. >>>> >>>> Cheers, >>>> Roy. >>>> >>>> On 10/04/2012 09:42, Frank Schwach wrote: >>>>> Hi Heath, >>>>> >>>>> Yes, I just had a look too and it's true that it would currently ignore >>>>> the original type. I had added some new methods (delete, insert, ligate) >>>>> and with those the location type is preserved but not with the already >>>>> existing methods like trunc_with_features. I will look into it when I >>>>> have some time and make some changes. >>>>> >>>>> Cheers, >>>>> >>>>> Frank >>>>> >>>>> >>>>> On 09/04/12 22:37, Heath O'Brien wrote: >>>>>> Hi Frank, >>>>>> >>>>>> I just tried it with the latest version from bioperl-live, and it worked >>>>>> the way I described in my email. >>>>>> >>>>>> all good things, >>>>>> Heath >>>>>> >>>>>> >>>>>> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >>>>>> >>>>>>> Hi Heath, >>>>>>> >>>>>>> I have recently worked a bit on that module and contributed the code >>>>>>> to bioperl-live. I think this behaviour may already have changed but >>>>>>> I'm not 100% sure at the moment. When I have some time I will review >>>>>>> the code to confirm. In the meantime, you could give it a go with the >>>>>>> bioperl-live version if that's an option for you? 
>>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> Frank >>>>>>> >>>>>>> >>>>>>> On 03/04/12 17:56, Heath O'Brien wrote: >>>>>>>> Hi All, >>>>>>>> >>>>>>>> I've encountered a bug in the trunc_with_features function in >>>>>>>> SeqUtils.pm, or at >>>>>>>> least behavior that was unexpected to me: >>>>>>>> >>>>>>>> Features with fuzzy coordinates in the original sequence are >>>>>>>> converted to exact >>>>>>>> coordinates in the truncated sequence. For example, the script below >>>>>>>> changes the >>>>>>>> coordinates for the feature from<1..5 to 1..5. >>>>>>>> >>>>>>>> I have modified the code to change this behavior on my system, but I >>>>>>>> thought I'd >>>>>>>> post something here in case others encounter the same problem. >>>>>>>> >>>>>>>> all good things, >>>>>>>> Heath >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> #!/usr/bin/perl -w >>>>>>>> >>>>>>>> use strict; >>>>>>>> use warnings; >>>>>>>> use Bio::SeqIO; >>>>>>>> use Bio::SeqUtils; >>>>>>>> >>>>>>>> my $infile= shift; >>>>>>>> >>>>>>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>>>>>> '-format' => 'genbank') or die "could not open seq file $infile\n"; >>>>>>>> >>>>>>>> my $outfile = $infile . '_out.gbk'; >>>>>>>> >>>>>>>> my $outIO = Bio::SeqIO->new('-file' => ">$outfile", >>>>>>>> '-format' => 'genbank') or die "could not open seq file $outfile\n"; >>>>>>>> >>>>>>>> my $in_seq = $inIO->next_seq; >>>>>>>> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>>>>>> $outIO->write_seq($out_seq); >>>>>>>> exit; >>>>>>>> >>>>>>>> >>>>>>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>>>>>> DEFINITION Sequence to demonstrate unexpected behavior of >>>>>>>> trunc_with_features >>>>>>>> ACCESSION unknown >>>>>>>> KEYWORDS . 
> >>>>>>>> FEATURES Location/Qualifiers >>>>>>>> source 1..10 >>>>>>>> /mol_type="genomic DNA" >>>>>>>> gene<1..5 >>>>>>>> /gene="test" >>>>>>>> CDS<1..5 >>>>>>>> /product="hypothetical protein" >>>>>>>> ORIGIN >>>>>>>> 1 caagattaaa >>>>>>>> // >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> The Wellcome Trust Sanger Institute is operated by Genome Research >>>>>>> Limited, a charity registered in England with number 1021457 and a >>>>>>> company registered in England with number 2742969, whose registered >>>>>>> office is 215 Euston Road, London, NW1 2BE. >>>>>> >>>>> >>>>> >>>> >>> >>> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From p.j.a.cock at googlemail.com Tue Apr 10 16:07:28 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 10 Apr 2012 21:07:28 +0100 Subject: [Bioperl-l] sequence proxy server In-Reply-To: References: Message-ID: On Fri, Apr 6, 2012 at 2:49 PM, Kevin Murray wrote: > Hi all, > > I'm an undergrad student in molecular biology at the ANU in Australia, > and my research projects are becoming increasingly bioinformatics > heavy. The latest one has involved quite a large amount of sequence > retrieval from GenBank and GenPept. The download speed to Australia > from NCBI's servers is rather slow, and I've been thinking about how > we can improve this. ...So, I thought about writing a "sequence proxy" ... Have you tried TogoWS? It is based in Japan and offers access to some of the local databases but also proxies some important EMBL/EBI and NCBI resources as well - including GenBank. I would expect you'd get much faster response times from Australia than talking directly to the NCBI.
http://togows.dbcls.jp/site/en/rest.html I think the TogoWS REST API is very nice to use, and seems to give much clearer error messages than the NCBI Entrez site (TogoWS uses HTTP error codes pretty consistently). Biopython 1.59 onwards has a simple API for the TogoWS REST interface, but their URL structure is very easy, so for a simple one-off task you can easily roll your own in Perl (or write one for BioPerl?). Peter From cjfields at illinois.edu Tue Apr 10 21:20:48 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 11 Apr 2012 01:20:48 +0000 Subject: [Bioperl-l] sequence proxy server In-Reply-To: References: Message-ID: On Apr 10, 2012, at 3:07 PM, Peter Cock wrote: > On Fri, Apr 6, 2012 at 2:49 PM, Kevin Murray wrote: >> Hi all, >> >> I'm an undergrad student in molecular biology at the ANU in Australia, >> and my research projects are becoming increasingly bioinformatics >> heavy. The latest one has involved quite a large amount of sequence >> retrieval from GenBank and GenPept. The download speed to Australia >> from NCBI's servers is rather slow, and i've been thinking about how >> we can improve this. ...So, i thought about writing a "sequence proxy" ... > > Have you tried TogoWS? It is based in Japan and offers access to > some of the local databases but also proxies some important > EMBL/EBI and NCBI resources as well - including GenBank. > I would expect you'd get much faster response times from > Australia than talking directly to the NCBI. > http://togows.dbcls.jp/site/en/rest.html > > I think the TogoWS REST API is very nice to use, and seems to > give much clearer error messages than the NCBI Entrez site > (TogoWS uses HTTP error codes pretty consistently). > > Biopython 1.59 onwards has a simple API for the TogoWS > REST interface, but their URL structure is very easy, so for > a simple one-off task you can easily roll your own in Perl > (or write one for BioPerl?). > > Peter Should be easy enough if the API is well-documented.
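Peter's suggestion to roll your own TogoWS client in Perl can be sketched with core modules only. This is a minimal sketch that has not been checked against the live service. It assumes the entry URL layout `http://togows.dbcls.jp/entry/<database>/<id>[.<format>]` described in the TogoWS REST documentation. The helper names (`uri_escape_simple`, `togows_entry_url`, `togows_fetch`) are hypothetical and not part of BioPerl, and the database names and accession are placeholders. Because TogoWS reports failures through HTTP status codes, the sketch treats any non-2xx response as an error.

```perl
#!/usr/bin/perl
# Hypothetical minimal TogoWS REST client (not part of BioPerl).
# Assumes the entry URL layout /entry/<database>/<id>[.<format>].
use strict;
use warnings;
use HTTP::Tiny;    # core module since Perl 5.14

my $TOGOWS_BASE = 'http://togows.dbcls.jp';

# Percent-encode every character outside the RFC 3986 unreserved set.
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build an entry URL such as .../entry/nucleotide/AF237819.fasta
sub togows_entry_url {
    my ($db, $id, $format) = @_;
    my $url = join '/', $TOGOWS_BASE, 'entry',
        uri_escape_simple($db), uri_escape_simple($id);
    $url .= '.' . $format if defined $format;
    return $url;
}

# Fetch one entry; TogoWS signals problems through HTTP status codes,
# so any non-2xx response is turned into an exception.
sub togows_fetch {
    my ($db, $id, $format) = @_;
    my $url = togows_entry_url($db, $id, $format);
    my $res = HTTP::Tiny->new(timeout => 60)->get($url);
    die "TogoWS request for $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}

# Run as: perl togows.pl <database> <id> [format]
if (!caller && @ARGV >= 2) {
    print togows_fetch(@ARGV);
}
```

If the service now redirects to HTTPS, HTTP::Tiny also needs IO::Socket::SSL installed. A real Bio::DB module, as Chris and Kevin discuss, would hide this URL building behind the usual get_Seq_by_acc style interface.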
Related to this, anyone know if NCBI's REST API is documented anywhere? chris From roy.chaudhuri at gmail.com Wed Apr 11 06:55:49 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Wed, 11 Apr 2012 11:55:49 +0100 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> <4F84152C.7030300@gmail.com> <4F841EF3.6000603@sanger.ac.uk> <4F844781.90005@gmail.com> <80696D43-B1D4-4228-A4B5-6B9299F46D41@gmail.com> Message-ID: <4F856335.1000503@gmail.com> Hi Chris, I think it should be fine to close, but my account doesn't have permission to do so. Cheers, Roy. On 10/04/2012 18:08, Fields, Christopher J wrote: > I have committed these to bioperl-live, they passed tests for me. I > have left the bug report open, however, in case more work needs to be > done. Roy, did you want to close that when you are ready? > > chris > > On Apr 10, 2012, at 10:34 AM, Heath O'Brien wrote: > >> Works perfect for me. Thanks! >> >> all good things, Heath >> >> On 10-Apr-12, at 10:45 AM, Roy Chaudhuri wrote: >> >>> Turns out I spoke too soon, I added in some new tests and they >>> highlighted problems with both trunc_with_features and >>> revcom_with_features. I think I have resolved all the issues in >>> the most recent Redmine patch - Frank, Heath, please could you >>> check that it works for you? >>> >>> Cheers, Roy. >>> >>> On 10/04/2012 12:52, Frank Schwach wrote: >>>> Brilliant, thanks Roy! Frank >>>> >>>> >>>> On 10/04/12 12:10, Roy Chaudhuri wrote: >>>>> Hi Heath, Frank, >>>>> >>>>> This was probably my fault back in the mists of time. Looks >>>>> like an easy fix though, I've reported the issue on Redmine >>>>> and submitted a patch: >>>>> https://redmine.open-bio.org/issues/3339 >>>>> >>>>> We should probably also add Heath's example as a test case. >>>>> >>>>> Cheers, Roy. 
>>>>> >>>>> On 10/04/2012 09:42, Frank Schwach wrote: >>>>>> Hi Heath, >>>>>> >>>>>> Yes, I just had a look too and it's true that it would >>>>>> currently ignore the original type. I had added some new >>>>>> methods (delete, insert, ligate) and with those the >>>>>> location type is preserved but not with the already >>>>>> existing methods like trunc_with_features. I will look into >>>>>> it when I have some time and make some changes. >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> On 09/04/12 22:37, Heath O'Brien wrote: >>>>>>> Hi Frank, >>>>>>> >>>>>>> I just tried it with the latest version from >>>>>>> bioperl-live, and it worked the way I described in my >>>>>>> email. >>>>>>> >>>>>>> all good things, Heath >>>>>>> >>>>>>> >>>>>>> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >>>>>>> >>>>>>>> Hi Heath, >>>>>>>> >>>>>>>> I have recently worked a bit on that module and >>>>>>>> contributed the code to bioperl-live. I think this >>>>>>>> behaviour may already have changed but I'm not 100% >>>>>>>> sure at the moment. When I have some time I will >>>>>>>> review the code to confirm. In the meantime, you could >>>>>>>> give it a go with the bioperl-live version if that's an >>>>>>>> option for you? >>>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> Frank >>>>>>>> >>>>>>>> >>>>>>>> On 03/04/12 17:56, Heath O'Brien wrote: >>>>>>>>> Hi All, >>>>>>>>> >>>>>>>>> I've encountered a bug in the trunc_with_features >>>>>>>>> function in SeqUtils.pm, or at least behavior that >>>>>>>>> was unexpected to me: >>>>>>>>> >>>>>>>>> Features with fuzzy coordinates in the original >>>>>>>>> sequence are converted to exact coordinates in the >>>>>>>>> truncated sequence. For example, the script below >>>>>>>>> changes the coordinates for the feature from <1..5 to >>>>>>>>> 1..5. >>>>>>>>> >>>>>>>>> I have modified the code to change this behavior on >>>>>>>>> my system, but I thought I'd post something here in >>>>>>>>> case others encounter the same problem.
>>>>>>>>> >>>>>>>>> all good things, Heath >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> #!/usr/bin/perl -w >>>>>>>>> >>>>>>>>> use strict; use warnings; use Bio::SeqIO; use >>>>>>>>> Bio::SeqUtils; >>>>>>>>> >>>>>>>>> my $infile= shift; >>>>>>>>> >>>>>>>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>>>>>>> '-format' => 'genbank') or die "could not open seq >>>>>>>>> file $infile\n"; >>>>>>>>> >>>>>>>>> my $outfile = $infile . '_out.gbk'; >>>>>>>>> >>>>>>>>> my $outIO = Bio::SeqIO->new('-file' => >>>>>>>>> ">$outfile", '-format' => 'genbank') or die "could >>>>>>>>> not open seq file $outfile\n"; >>>>>>>>> >>>>>>>>> my $in_seq = $inIO->next_seq; my $out_seq = >>>>>>>>> Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>>>>>>> $outIO->write_seq($out_seq); exit; >>>>>>>>> >>>>>>>>> >>>>>>>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>>>>>>> DEFINITION Sequence to demonstrate unexpected >>>>>>>>> behavior of trunc_with_features ACCESSION unknown >>>>>>>>> KEYWORDS . FEATURES Location/Qualifiers source 1..10 >>>>>>>>> /mol_type="genomic DNA" gene<1..5 /gene="test" >>>>>>>>> CDS<1..5 /product="hypothetical protein" ORIGIN 1 >>>>>>>>> caagattaaa //
> From cjfields at illinois.edu Wed Apr 11 11:28:38 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 11 Apr 2012 15:28:38 +0000 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <4F856335.1000503@gmail.com> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> <4F84152C.7030300@gmail.com> <4F841EF3.6000603@sanger.ac.uk> <4F844781.90005@gmail.com> <80696D43-B1D4-4228-A4B5-6B9299F46D41@gmail.com> <4F856335.1000503@gmail.com> Message-ID: Okay, closed it. Thanks again! chris On Apr 11, 2012, at 5:55 AM, Roy Chaudhuri wrote: > Hi Chris, > > I think it should be fine to close, but my account doesn't have permission to do so. > > Cheers, > Roy. > > On 10/04/2012 18:08, Fields, Christopher J wrote: >> I have committed these to bioperl-live, they passed tests for me. I >> have left the bug report open, however, in case more work needs to be >> done. Roy, did you want to close that when you are ready? >> >> chris >> >> On Apr 10, 2012, at 10:34 AM, Heath O'Brien wrote: >> >>> Works perfect for me. Thanks! >>> >>> all good things, Heath >>> >>> On 10-Apr-12, at 10:45 AM, Roy Chaudhuri wrote: >>> >>>> Turns out I spoke too soon, I added in some new tests and they >>>> highlighted problems with both trunc_with_features and >>>> revcom_with_features. I think I have resolved all the issues in >>>> the most recent Redmine patch - Frank, Heath, please could you >>>> check that it works for you? >>>> >>>> Cheers, Roy. >>>> >>>> On 10/04/2012 12:52, Frank Schwach wrote: >>>>> Brilliant, thanks Roy! Frank >>>>> >>>>> >>>>> On 10/04/12 12:10, Roy Chaudhuri wrote: >>>>>> Hi Heath, Frank, >>>>>> >>>>>> This was probably my fault back in the mists of time.
Looks >>>>>> like an easy fix though, I've reported the issue on Redmine >>>>>> and submitted a patch: >>>>>> https://redmine.open-bio.org/issues/3339 >>>>>> >>>>>> We should probably also add Heath's example as a test case. >>>>>> >>>>>> Cheers, Roy. >>>>>> >>>>>> On 10/04/2012 09:42, Frank Schwach wrote: >>>>>>> Hi Heath, >>>>>>> >>>>>>> Yes, I just had a look too and it's true that it would >>>>>>> currently ignore the original type. I had added some new >>>>>>> methods (delete, insert, ligate) and with those the >>>>>>> location type is preserved but not with the already >>>>>>> existing methods like trunc_with_features. I will look into >>>>>>> it when I have some time and make some changes. >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> Frank >>>>>>> >>>>>>> >>>>>>> On 09/04/12 22:37, Heath O'Brien wrote: >>>>>>>> Hi Frank, >>>>>>>> >>>>>>>> I just tried it with the latest version from >>>>>>>> bioperl-live, and it worked the way I described in my >>>>>>>> email. >>>>>>>> >>>>>>>> all good things, Heath >>>>>>>> >>>>>>>> >>>>>>>> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >>>>>>>> >>>>>>>>> Hi Heath, >>>>>>>>> >>>>>>>>> I have recently worked a bit on that module and >>>>>>>>> contributed the code to bioperl-live. I think this >>>>>>>>> behaviour may already have changed but I'm not 100% >>>>>>>>> sure at the moment. When I have some time I will >>>>>>>>> review the code to confirm. In the meantime, you could >>>>>>>>> give it a go with the bioperl-live version if that's an >>>>>>>>> option for you? >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> Frank >>>>>>>>> >>>>>>>>> >>>>>>>>> On 03/04/12 17:56, Heath O'Brien wrote: >>>>>>>>>> Hi All, >>>>>>>>>> >>>>>>>>>> I've encountered a bug in the trunc_with_features >>>>>>>>>> function in SeqUtils.pm, or at least behavior that >>>>>>>>>> was unexpected to me: >>>>>>>>>> >>>>>>>>>> Features with fuzzy coordinates in the original >>>>>>>>>> sequence are converted to exact coordinates in the >>>>>>>>>> truncated sequence. 
For example, the script below >>>>>>>>>> changes the coordinates for the feature from <1..5 to >>>>>>>>>> 1..5. >>>>>>>>>> >>>>>>>>>> I have modified the code to change this behavior on >>>>>>>>>> my system, but I thought I'd post something here in >>>>>>>>>> case others encounter the same problem. >>>>>>>>>> >>>>>>>>>> all good things, Heath >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> #!/usr/bin/perl -w >>>>>>>>>> >>>>>>>>>> use strict; use warnings; use Bio::SeqIO; use >>>>>>>>>> Bio::SeqUtils; >>>>>>>>>> >>>>>>>>>> my $infile= shift; >>>>>>>>>> >>>>>>>>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>>>>>>>> '-format' => 'genbank') or die "could not open seq >>>>>>>>>> file $infile\n"; >>>>>>>>>> >>>>>>>>>> my $outfile = $infile . '_out.gbk'; >>>>>>>>>> >>>>>>>>>> my $outIO = Bio::SeqIO->new('-file' => >>>>>>>>>> ">$outfile", '-format' => 'genbank') or die "could >>>>>>>>>> not open seq file $outfile\n"; >>>>>>>>>> >>>>>>>>>> my $in_seq = $inIO->next_seq; my $out_seq = >>>>>>>>>> Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>>>>>>>> $outIO->write_seq($out_seq); exit; >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>>>>>>>> DEFINITION Sequence to demonstrate unexpected >>>>>>>>>> behavior of trunc_with_features ACCESSION unknown >>>>>>>>>> KEYWORDS .
FEATURES Location/Qualifiers source 1..10 >>>>>>>>>> /mol_type="genomic DNA" gene<1..5 /gene="test" >>>>>>>>>> CDS<1..5 /product="hypothetical protein" ORIGIN 1 >>>>>>>>>> caagattaaa // > From p.j.a.cock at googlemail.com Thu Apr 12 08:47:05 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 12 Apr 2012 13:47:05 +0100 Subject: [Bioperl-l] Proposal of GSoC 2012 >> ACEtoSAM and SAMtoACE In-Reply-To: References: Message-ID: On Thu, Apr 5, 2012 at 5:52 AM, afia hayati wrote: > Dear all, > > I am afia, a PhD student in Bioinformatics. I am so interested to > participate GSoC and would like to apply for project Bio::Assembly, Ace2Sam > and Sam2Ace converter. I have written a proposal based on the guidance for > prospective GSoC student. I paste my proposal in here. > If you have time, please give me suggestions. > Thank you very much. > > Sincerely, > Afiahayati Hello Afiahayati, What would you use this converter for? I can see it is useful to convert ACE to SAM/BAM for downstream analysis and visualization. At the moment the only assemblers I regularly use which produce ACE are the Roche 'Newbler' gsAssembler, and MIRA.
For MIRA, Bastien is working on native SAM output, but for the moment I wrote and maintain a converter from MIRA's alignment format (MAF) to SAM: https://github.com/peterjc/maf2sam Or is the idea more to support SAM (and BAM) assemblies within the existing BioPerl Bio::Assembly::IO framework to allow easier manipulation from Perl? Peter From florent.angly at gmail.com Thu Apr 12 22:41:54 2012 From: florent.angly at gmail.com (Florent Angly) Date: Fri, 13 Apr 2012 12:41:54 +1000 Subject: [Bioperl-l] Proposal of GSoC 2012 >> ACEtoSAM and SAMtoACE In-Reply-To: References: Message-ID: <4F879272.30306@gmail.com> Bioperl has a module to read and write ACE files, Bio::Assembly::IO::ace. It also has a module to read (but not write) SAM files, Bio::Assembly::IO::sam. So, it looks like you can already do SAMtoACE within Bioperl. Implementing ACEtoSAM would involve adding write support to the Bio::Assembly::IO::sam module. This can be helped by looking at how Bio::Assembly::IO::ace and Bio::Assembly::IO::tigr implement write support. Regards, Florent On 12/04/12 22:47, Peter Cock wrote: > On Thu, Apr 5, 2012 at 5:52 AM, afia hayati wrote: >> Dear all, >> >> I am afia, a PhD student in Bioinformatics. I am so interested to >> participate GSoC and would like to apply for project Bio::Assembly, Ace2Sam >> and Sam2Ace converter. I have written a proposal based on the guidance for >> prospective GSoC student. I paste my proposal in here. >> If you have time, please give me suggestions. >> Thank you very much. >> >> Sincerely, >> Afiahayati > Hello Afiahayati, > > What would you use this converter for? > > I can see it is useful to convert ACE to SAM/BAM for downstream analysis > and visualization. At the moment the only assemblers I regularly use which > produce ACE are the Roche 'Newbler' gsAssembler, and MIRA.
> > For MIRA, Bastien is working on native SAM output, but for the moment > I wrote and maintain a converter from MIRA's alignment format (MAF) to > SAM: https://github.com/peterjc/maf2sam > > Or is the idea more to support SAM (and BAM) assemblies within the > existing BioPerl Bio::Assembly::IO framework to allow easier > manipulation from Perl? > > Peter From p.j.a.cock at googlemail.com Fri Apr 13 04:32:00 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 13 Apr 2012 09:32:00 +0100 Subject: [Bioperl-l] Proposal of GSoC 2012 >> ACEtoSAM and SAMtoACE In-Reply-To: <4F879272.30306@gmail.com> References: <4F879272.30306@gmail.com> Message-ID: On Fri, Apr 13, 2012 at 3:41 AM, Florent Angly wrote: > Bioperl has a module to read and write ACE files, Bio::Assembly::IO::ace. It > also has a module to read (but not write) SAM files, Bio::Assembly::IO::sam. > So, it looks like you can already do SAMtoACE within Bioperl. Implementing > ACEtoSAM would involve adding write support to the Bio::Assembly::IO::sam > module. This can be helped by looking at how Bio::Assembly::IO::ace and > Bio::Assembly::IO::tigr implement write support. > Regards, > Florent Does SAMtoACE within Bioperl using Bio::Assembly::IO actually work? Note that proper multiple sequence alignments in SAM/BAM format are relatively rare - the vast majority of SAM/BAM files are just pairwise alignments which are not a good fit for ACE. Peter From k.d.murray.91 at gmail.com Fri Apr 13 05:31:06 2012 From: k.d.murray.91 at gmail.com (Kevin Murray) Date: Fri, 13 Apr 2012 19:31:06 +1000 Subject: [Bioperl-l] sequence proxy server In-Reply-To: References: Message-ID: Hi Chris and Peter, Thanks for the advice, it is much appreciated.
I have found almost exactly what i was talking about in the bioperl scripts, github link https://github.com/bioperl/bioperl-live/blob/master/scripts/DB/bp_biofetch_genbank_proxy.pl I will have a go at porting this to use a Bio::DB::Flat cache, given that would be exactly what i envisaged. With regards to implementing a Bio::DB module for TogoWS, i may have a crack at it if no one else is (although it will probably take me a while). Are there any pointers or particular styles you guys have (other than TMTOWTDI)? Cheers, Regards Kevin Murray From afia.hayati at gmail.com Sat Apr 14 20:15:11 2012 From: afia.hayati at gmail.com (afia hayati) Date: Sun, 15 Apr 2012 09:15:11 +0900 Subject: [Bioperl-l] Proposal of GSoC 2012 >> ACEtoSAM and SAMtoACE In-Reply-To: References: <4F879272.30306@gmail.com> Message-ID: Peter, Florent, and all, thanks for the responses. Ya.., the idea is more to support SAM assemblies within the existing Bio::Assembly::IO. SAM or ACE files once imported should have similar handles and methods. Bio::Assembly::IO::sam is read-only. I also will try to add write support for that module. In Bio::Assembly::IO::ace, there are write methods, complete with quality scores, so it "looks like" we can do a SAMtoACE conversion. Anyway, the main point is to add write support in Bio::Assembly::IO::sam. Please CMIIW, I am open to corrections and suggestions. best regards, Afiahayati On Fri, Apr 13, 2012 at 5:32 PM, Peter Cock wrote: > On Fri, Apr 13, 2012 at 3:41 AM, Florent Angly > wrote: > > Bioperl has a module to read and write ACE files, > Bio::Assembly::IO::ace. It > > also has a module to read (but not write) SAM files, > Bio::Assembly::IO::sam. > > So, it looks like you can already do SAMtoACE within Bioperl. > Implementing > > ACEtoSAM would involve adding write support to the Bio::Assembly::IO::sam > > module. This can be helped by looking at how Bio::Assembly::IO::ace and > > Bio::Assembly::IO::tigr implement write support.
> > Regards, > > Florent > > Does SAMtoACE within Bioperl using Bio::Assembly::IO actually work? > > Note that proper multiple sequence alignments in SAM/BAM format are > relatively rare - the vast majority of SAM/BAM files are just pairwise > alignments which are not a good fit for ACE. > > Peter From jovel_juan at hotmail.com Sat Apr 14 23:27:57 2012 From: jovel_juan at hotmail.com (Juan Jovel) Date: Sun, 15 Apr 2012 03:27:57 +0000 Subject: [Bioperl-l] how to get specific sub-sequence from GenBank file with Bio::DB::GenBank In-Reply-To: References: , , <4F879272.30306@gmail.com>, , Message-ID: Hello All, I want to get some subsequences from provirus sequences in GenBank. I got the whole sequences with the script below. However, I want to get a specific sub-sequence, which appears in the GenBank files in the line: LTR 9091..9723. How can I modify my script to get only nts 9091-9723 (in this example), instead of the whole sequence? Thanks a lot in advance! Here is the script:

#!/usr/bin/perl -w
use strict;
use Bio::DB::GenBank;
use Bio::SeqIO;

chomp(my $infile = $ARGV[0]);
open(IN, "$infile") or die "$!";
my @ids = <IN>;
chomp(my $outfile = $ARGV[1]);
my $seqs_out = Bio::SeqIO->new(-file => ">$outfile", -format => "fasta");
foreach my $entry (@ids) {
    print "$entry";
    my $db = Bio::DB::GenBank->new;
    my $seq = $db->get_Seq_by_acc($entry);
    $seqs_out->write_seq($seq);
}
exit;

From roy.chaudhuri at gmail.com Mon Apr 16 07:16:57 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 16 Apr 2012 12:16:57 +0100 Subject: [Bioperl-l] how to get specific sub-sequence from GenBank file with Bio::DB::GenBank In-Reply-To: References: , , <4F879272.30306@gmail.com>, , Message-ID: <4F8BFFA9.9030305@gmail.com> Hi Juan, If you know the LTR coordinates in advance, then you can download a specific subsequence using Bio::DB::GenBank as shown here:
http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::GenBank_when_you_have_genomic_coordinates_to_get_a_Seq_object If you don't, then you will need to download the whole sequence as you are doing, but add in some code to print out just the sequence associated with the LTR feature. Something like (untested):

for my $feat ($seq->get_SeqFeatures) {
    $seqs_out->write_seq($feat->spliced_seq) if $feat->primary_tag eq 'LTR';
}

Cheers, Roy. On 15/04/2012 04:27, Juan Jovel wrote: > > > Hello All, I want to get some subsequences from provirus sequences in > the GenBank, I got the whole sequences with the script below. > However, I want to get a specific sub-sequence, which appears in the > GenBank files in the line: LTR 9091..9723 how can I > modify my script to get only nts 9091-9723 (in this example), instead > of the whole sequence. Thanks a lot in > advance!________________________HERE THE SCRIPT: #!/usr/bin/perl > -w use strict; use Bio::DB::GenBank; use Bio::SeqIO; chomp(my $infile = > $ARGV[0]); open(IN, "$infile") or die "$!"; my @ids = <IN>; chomp(my > $outfile = $ARGV[1]); my $seqs_out = Bio::SeqIO -> new(-file => > ">$outfile", -format => "fasta"); foreach my $entry(@ids){ > print "$entry"; my $db = Bio::DB::GenBank->new; my $seq = > $db->get_Seq_by_acc($entry); $seqs_out->write_seq($seq); } exit; From sharmashalu.bio at gmail.com Mon Apr 16 16:08:23 2012 From: sharmashalu.bio at gmail.com (shalu sharma) Date: Mon, 16 Apr 2012 16:08:23 -0400 Subject: [Bioperl-l] Amino Acid sequence to nucleotides sequence Message-ID: Hi All, Is there any way in Bioperl i can convert amino acid sequences to nucleotide sequences.
Thanks Shalu From p.j.a.cock at googlemail.com Mon Apr 16 16:32:20 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 16 Apr 2012 21:32:20 +0100 Subject: [Bioperl-l] Amino Acid sequence to nucleotides sequence In-Reply-To: References: Message-ID: On Mon, Apr 16, 2012 at 9:08 PM, shalu sharma wrote: > Hi All, > Is there any way in Bioperl i can convert amino acid sequences > to nucleotide sequences. > > Thanks > Shalu Probably - but there is more than one answer since the codon tables are a many-to-one mapping. Are you hoping for one possible nucleotide sequence, perhaps with IUPAC ambiguity characters? Perhaps a specific example of what you want would help - back-translation is a fuzzy term. If you are trying to combine a protein alignment with the original unaligned nucleotide sequences to make a codon alignment, that's a different task. Peter From cjfields at illinois.edu Mon Apr 16 16:44:21 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 16 Apr 2012 20:44:21 +0000 Subject: [Bioperl-l] Amino Acid sequence to nucleotides sequence In-Reply-To: References: Message-ID: <45DE0C13-27B3-4E1C-AB8A-83B99DD407AF@illinois.edu> On Apr 16, 2012, at 3:32 PM, Peter Cock wrote: > On Mon, Apr 16, 2012 at 9:08 PM, shalu sharma wrote: >> Hi All, >> Is there any way in Bioperl i can convert amino acid sequences >> to nucleotide sequences. >> >> Thanks >> Shalu > > Probably - but there is more than one answer since the codon > tables are a many-to-one mapping. Are you hoping for one > possible nucleotide sequence, perhaps with IUPAC ambiguity > characters? Perhaps a specific example of what you want > would help - back-translation is a fuzzy term. > > If you are trying to combine a protein alignment with the > original unaligned nucleotide sequences to make a codon > alignment, that's a different task.
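Peter's first option is a single nucleotide sequence written with IUPAC ambiguity characters. It can be sketched in plain Perl without BioPerl. This sketch is illustrative only. It does not use Bio::Tools::CodonTable's revtranslate method and makes no claim about what that method returns. It assumes the standard genetic code (NCBI translation table 1), builds the list of codons for each amino acid, and collapses those codons position by position into a single degenerate codon. The helper names `back_translate` and `degenerate_codon` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# NCBI standard genetic code (transl_table=1). For index i, the codon's
# bases come from T/C/A/G as: first = i/16, second = (i/4)%4, third = i%4.
my $AAS   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my @BASES = qw(T C A G);

# IUPAC nucleotide code for each alphabetically sorted set of bases.
my %IUPAC = (
    A   => 'A', C   => 'C', G   => 'G', T   => 'T',
    AG  => 'R', CT  => 'Y', CG  => 'S', AT  => 'W', GT => 'K', AC => 'M',
    CGT => 'B', AGT => 'D', ACT => 'H', ACG => 'V', ACGT => 'N',
);

# Amino acid (or '*' for stop) => list of codons that encode it.
my %codons_for;
for my $i (0 .. 63) {
    my $codon = $BASES[ int($i / 16) ] . $BASES[ int($i / 4) % 4 ] . $BASES[ $i % 4 ];
    push @{ $codons_for{ substr($AAS, $i, 1) } }, $codon;
}

# Collapse a set of codons into one degenerate codon, position by position.
sub degenerate_codon {
    my @codons = @_;
    my $out = '';
    for my $pos (0 .. 2) {
        my %seen;
        $seen{ substr($_, $pos, 1) } = 1 for @codons;
        $out .= $IUPAC{ join '', sort keys %seen };
    }
    return $out;
}

# Back-translate a protein string into one IUPAC-degenerate DNA string.
sub back_translate {
    my ($protein) = @_;
    my $dna = '';
    for my $aa (split //, uc $protein) {
        my $codons = $codons_for{$aa} or die "No codons for residue '$aa'\n";
        $dna .= degenerate_codon(@$codons);
    }
    return $dna;
}

print back_translate('MWF'), "\n";    # prints ATGTGGTTY
```

Collapsing each position independently is a simplification. For residues encoded by two separate codon families (Leu, Ser, Arg), the result also matches other amino acids. For example, Leu becomes YTN, which includes the Phe codons TTT and TTC. An exact answer needs the full list of codons per residue instead.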
> > Peter We do have a revtranslate function in bioperl that is supposed to deal with ambiguities: https://metacpan.org/module/Bio::Tools::CodonTable#revtranslate I don't know how well-tested it is, but it was added a few years back to Bio::Tools::CodonTable. IIRC Mark Jensen was the developer who did that, and he's pretty meticulous. chris From Russell.Smithies at agresearch.co.nz Mon Apr 16 17:28:11 2012 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 17 Apr 2012 09:28:11 +1200 Subject: [Bioperl-l] sequence proxy server In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCE50A550@exchsth.agresearch.co.nz> I assume you've done the obvious thing and tried downloading from your local mirror? ftp://biomirror.aarnet.edu.au/biomirror/ Or ours: http://www.biomirror.org.nz/ If you have a large number of requests it's almost always faster to download the refseq files and extract locally rather than run queries against NCBI via the web. --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Kevin Murray > Sent: Saturday, 7 April 2012 1:50 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] sequence proxy server > > Hi all, > > I'm an undergrad student in molecular biology at the ANU in Australia, and > my research projects are becoming increasingly bioinformatics heavy. The > latest one has involved quite a large amount of sequence retrieval from > GenBank and GenPept. The download speed to Australia from NCBI's servers > is rather slow, and i've been thinking about how we can improve this. One > solution would be to use Bio::DB::Flat with GenBank sequences on a local > computer. However, in a situation where there are multiple people in a lab > doing bioinformatics, it seems to me a bit of a waste to have the entire > genbank/genpept database, or even the relevant sections thereof, on each > computer. 
So, i thought about writing a "sequence proxy" cgi script, and a > corresponding module, which would work a bit like this: > > The user calls Bio::DB::SeqProxy::GenBank as they would Bio::DB::GenBank, > with the exception that a parameter for the address of the sequence proxy > server is required. > The module then sends a request similar to that sent to NCBI's servers by > calling Bio::DB::GenBank->get_Seq_by_x() to the sequence proxy server. I > believe all requests go to the efetch page now (please correct me if I'm > wrong, i have read the relevant bioperl module code but not thoroughly), so > the CGI script on the sequence proxy would take arguments in a similar > fashion to make writing the client side module easier. > The CGI script would use a Bio::DB::Flat database, or an interface to an SQL > database to determine if the required sequence is stored locally. (as an aside, > i'd like your thoughts on Bio::DB::Flat vs Bio::DB::Sql or similar) If the > sequence exists locally, it would be returned to the user, either as plain text, > or inside an XML container (see below). > If not, it would be retrieved from the remote database using the relevant > Bio::DB module, and returned. > > The sequence would either be returned as the relevant sequence format > (which would default to GenBank format) in plain text, or as an XML > document similar to: > > > 1 > ___YOUR GENBANK FILE HERE___ Local > Database The aim of the xml document would be to > simplify handling of server errors and allow for the specification of other > metadata such as which database the sequence came from. > > > Firstly, I'd like to know if this sounds feasible, and if so, if someone is already > working on something similar? I don't want to reinvent the wheel. > Secondly, I'd like to ask for your comments and advice. Being reasonably new
Being reasonably new > to bioperl (started using bioperl about 6 months ago, but I've been coding in > various languages for 8 years) I don't expect to have considered things that > may seem obvious to a more experienced bioperl-er, so please be as brutally > constructive in your criticism as you see fit =]. > > I know this is alot of questions, so thanks in advance for your help. > > Cheers, and a happy Easter to those who celebrate it. > > Regards > Kevin Murray > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From hnorpois at googlemail.com Thu Apr 19 10:44:50 2012 From: hnorpois at googlemail.com (Hermann Norpois) Date: Thu, 19 Apr 2012 16:44:50 +0200 Subject: [Bioperl-l] Transcriptional Regulatory Element Database Message-ID: Hello, I would like to get access to the Transcriptional Regulatory Element Database (http://rulai.cshl.edu/cgi-bin/TRED/tred.cgi?process=searchPromForm) via Bioperl. I did not find a module that does the job. Is it possible to modify a module? Is it generally possible to access this database (by means of bioperl)? 
Thank you norpois From jason.stajich at gmail.com Thu Apr 19 18:45:32 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Thu, 19 Apr 2012 15:45:32 -0700 Subject: [Bioperl-l] Transcriptional Regulatory Element Database In-Reply-To: References: Message-ID: <80CFDDE6-FA7F-4614-AE5D-22A5398EAA17@gmail.com> Have you first tried emailing the author listed at the bottom of the page? That seems like a more direct way to get this information. On Apr 19, 2012, at 7:44 AM, Hermann Norpois wrote: > Hello, > > I would like to get access to the Transcriptional Regulatory Element > Database (http://rulai.cshl.edu/cgi-bin/TRED/tred.cgi?process=searchPromForm) > via Bioperl. I did not find a module that does the job. Is it possible to > modify a module? Is it generally possible to access this database (by means > of bioperl)? > Thank you > norpois Jason Stajich jason.stajich at gmail.com jason at bioperl.org From merche at uni-bonn.de Mon Apr 23 08:31:27 2012 From: merche at uni-bonn.de (Merche Castillo) Date: Mon, 23 Apr 2012 14:31:27 +0200 Subject: [Bioperl-l] Problems installing BioPerl & cpan Message-ID: <4F954B9F.9020506@uni-bonn.de> I'm posting this message out of pure desperation, because I really don't know what else to try. I'm a beginner in bioperl and I'm working on a script to parse out some results I got from MolQuest fgenesh. Results are out in .txt format and I want to parse them to GFF and FASTA files for mRNA and protein sequences to facilitate comparison with other results we have. I would like to use BioPerl for other purposes in the future so I'm very interested in getting it ready on my PC. I followed the instructions here: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix. I managed to install CPAN in root mode (otherwise it wouldn't work) and BioPerl via CPAN. All tests were ok, but when I ran this script to test the installation:
All tests were ok, but when I ran this script to test the installation:

use strict;
use warnings;

use Getopt::Long;
use Bio::EnsEMBL::Registry;

my $reg = "Bio::EnsEMBL::Registry";
$reg->load_registry_from_db(
  -host => "ensembldb.ensembl.org",
  -user => "anonymous"
);
my $db_list=$reg->get_all_adaptors();
my @line;

foreach my $db (@$db_list){
  @line = split ('=',$db);
  print $line[0]."\n";
}

I got the error: "Can't locate Bio/EnsEMBL/Registry.pm in @INC"

I tried to install BioPerl again via Build.PL, running as root, but still
came to the same outcome.

Thanks for your help
Merche

--
************************************;)
Mercedes Castillo
INRES, Dept. Molecular Phytomedicine
University of Bonn

Karlrobert-Kreiten-str 13
53115 Bonn
+49(0)22873-60143
merche at uni-bonn.de
*****************************************

From jason.stajich at gmail.com  Mon Apr 23 09:44:51 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 23 Apr 2012 06:44:51 -0700
Subject: [Bioperl-l] Problems installing BioPerl & cpan
In-Reply-To: <4F954B9F.9020506@uni-bonn.de>
References: <4F954B9F.9020506@uni-bonn.de>
Message-ID: <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com>

That module really has nothing to do with Fgenesh parsing so I don't think
this is a particularly good test script - try one of the scripts that comes
in bioperl scripts directory like scripts/utilities/bp_sreformat.pl

However, you should know that module is part of EnsEMBL not BioPerl - it
requires an additional package of modules be installed.

if you use CPAN to install things you can do
cpan> install Bio::EnsEMBL::Registry

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From cjfields at illinois.edu  Mon Apr 23 09:48:53 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 23 Apr 2012 13:48:53 +0000
Subject: [Bioperl-l] Problems installing BioPerl & cpan
In-Reply-To: <4F954B9F.9020506@uni-bonn.de>
References: <4F954B9F.9020506@uni-bonn.de>
Message-ID: <26594DC3-0C8D-41E5-BD22-BC3F1DC7E1F0@illinois.edu>

You need the Ensembl Perl API code, which requires bioperl but is not part
of the bioperl distribution. See here for the latest:

http://ensembl.org/info/docs/api/index.html

chris

From jason.stajich at gmail.com  Mon Apr 23 09:54:54 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 23 Apr 2012 06:54:54 -0700
Subject: [Bioperl-l] Problems installing BioPerl & cpan
In-Reply-To: <4F955E52.50400@uni-bonn.de>
References: <4F954B9F.9020506@uni-bonn.de>
    <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com>
    <4F955E52.50400@uni-bonn.de>
Message-ID: <78EB7156-3EC8-4CCD-AE6E-C221B12D4F58@gmail.com>

Then the next logical thing to do is go to the Ensembl page for info on how
to install their modules.
http://uswest.ensembl.org/info/docs/api/api_installation.html

On Apr 23, 2012, at 6:51 AM, Merche Castillo wrote:

> Hi
>
> Thanks for your reply. I'm working on some EnsEMBL scripts too, that's why
> I tried this script. I did look for the Bio::EnsEMBL::Registry on cpan but
> returns "no object found".

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From cjfields at illinois.edu  Mon Apr 23 09:51:24 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 23 Apr 2012 13:51:24 +0000
Subject: [Bioperl-l] Problems installing BioPerl & cpan
In-Reply-To: <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com>
References: <4F954B9F.9020506@uni-bonn.de>
    <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com>
Message-ID: <10108081-D3CC-4A8F-9372-390A2C377088@illinois.edu>

Unfortunately the Ensembl code isn't on CPAN (wish it were, it might make
things a little easier).

chris

From l.m.timmermans at students.uu.nl  Mon Apr 23 10:16:04 2012
From: l.m.timmermans at students.uu.nl (Leon Timmermans)
Date: Mon, 23 Apr 2012 16:16:04 +0200
Subject: [Bioperl-l] Problems installing BioPerl & cpan
In-Reply-To: <10108081-D3CC-4A8F-9372-390A2C377088@illinois.edu>
References: <4F954B9F.9020506@uni-bonn.de>
    <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com>
    <10108081-D3CC-4A8F-9372-390A2C377088@illinois.edu>

Yeah, that's fairly useless. Does anyone know the reason for that? Is it
just inertia, or is it more?

Leon

On Mon, Apr 23, 2012 at 3:51 PM, Fields, Christopher J <
cjfields at illinois.edu> wrote:

> Unfortunately the Ensembl code isn't on CPAN (wish it were, it might make
> things a little easier).
>
> chris

From cjfields at illinois.edu  Mon Apr 23 10:20:59 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 23 Apr 2012 14:20:59 +0000
Subject: [Bioperl-l] Problems installing BioPerl & cpan
References: <4F954B9F.9020506@uni-bonn.de>
    <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com>
    <10108081-D3CC-4A8F-9372-390A2C377088@illinois.edu>
Message-ID: <70FCB632-4CD5-4F28-A6B6-F93507397435@illinois.edu>

Not sure, but it may have something to do with the requirement for a very
old bioperl (v1.2.3).

chris

On Apr 23, 2012, at 9:16 AM, Leon Timmermans wrote:

> Yeah, that's fairly useless. Does anyone know the reason for that? Is it
> just inertia, or is it more?
>
> Leon

From rbuels at gmail.com  Mon Apr 23 19:49:10 2012
From: rbuels at gmail.com (Robert Buels)
Date: Mon, 23 Apr 2012 19:49:10 -0400
Subject: [Bioperl-l] Announcing OBF Google Summer of Code Accepted Students
Message-ID: <4F95EA76.4030004@gmail.com>

Hello all,

I'm very pleased and excited to announce that the Open Bioinformatics
Foundation has selected 5 very capable students to work on OBF projects
this summer as part of the Google Summer of Code program.
The accepted students, their projects, and their mentors (in alphabetical
order):

Wibowo Arindrarto
    SearchIO Implementation in Biopython
    mentored by Peter Cock

Lenna Peterson
    Diff My DNA: Development of a Genomic Variant Toolkit for Biopython
    mentored by Brad Chapman

Marjan Povolni
    The world's fastest parallelized GFF3/GTF parser in D, and an
    interfacing biogem plugin for Ruby
    mentored by Pjotr Prins, Francesco Strozzi, Raoul Bonnal

Artem Tarasov
    Fast parallelized GFF3/GTF parser in C++, with Ruby FFI bindings
    mentored by Pjotr Prins, Francesco Strozzi, Raoul Bonnal

Clayton Wheeler
    Multiple Alignment Format parser for BioRuby
    mentored by Francesco Strozzi and Raoul Bonnal

As in every year, we received many great applications and ideas. However,
funding and mentor resources are limited, and we were not able to accept as
many as we would have liked. Our deepest thanks to all the students who
applied: we sincerely appreciate the time and effort you put into your
applications, and hope you will still consider being a part of the OBF's
open source projects, even without Google funding. I speak for myself and
all of the mentors who read and scored applications when I say that we were
truly honored by the number and quality of the applications we received.

For the accepted students: congratulations! You have risen to the top of a
very competitive application process. Now it's time to "put your money
where your mouth is", as the saying goes. Let's get out there and write
some great code this summer!
Best regards,
Rob

----
Robert Buels
OBF GSoC 2012 Administrator

From Simon.Guest at agresearch.co.nz  Mon Apr 30 02:00:26 2012
From: Simon.Guest at agresearch.co.nz (Guest, Simon)
Date: Mon, 30 Apr 2012 18:00:26 +1200
Subject: [Bioperl-l] Circular dependency problems packaging BioPerl as RPM
Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCECE5EF4@exchsth.agresearch.co.nz>

Dear BioPerlers,

I am building BioPerl and BioPerl-Run as RPMs, since I have to install them on
several servers, and really don't want to run CPAN installation scripts on
each machine.

It has been a tortuous journey of chasing down dependencies and packaging them
(thank goodness for cpanspec), but I think I am nearly done.

However, I have hit a circular dependency / incompatibility problem between
BioPerl and BioPerl-Run.

When building BioPerl-Run-1.006900, it insists on BioPerl-1.6.901:
  Checking prerequisites...
    - ERROR: Bio::Root::Version (1.006001) is installed, but we need version >= 1.006900

But then BioPerl-Run-1.006900 has dependencies on
  Bio::Expression::DataSet
  Bio::Expression::Platform
  Bio::Expression::Sample
  Bio::Expression::Contact
which seem to be in BioPerl-1.6.1 but not BioPerl-1.6.901

Does anyone know of this problem?

Are there any suggestions for work arounds?

cheers,
Simon

From cjfields at illinois.edu  Mon Apr 30 09:42:34 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 30 Apr 2012 13:42:34 +0000
Subject: [Bioperl-l] Circular dependency problems packaging BioPerl as RPM
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF34CCECE5EF4@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF34CCECE5EF4@exchsth.agresearch.co.nz>
Message-ID: <3E2BD058-EDB0-482B-9727-8DB7FB466737@illinois.edu>

The Bio::Expression dependencies are unusual, I'll have to look through and
find the modules responsible for pulling these in. When I last ran these no
tests failed, so either the dependency is off or no tests have been written for
the modules in question.

We can always release a new CPAN BioPerl-Run to deal with it.

chris

From hnorpois at googlemail.com  Mon Apr 30 12:45:50 2012
From: hnorpois at googlemail.com (Hermann Norpois)
Date: Mon, 30 Apr 2012 18:45:50 +0200
Subject: [Bioperl-l] different interpretion of get_seq_by_id by DB::GenBank and DB:Entrez::Gene

I am confused by the different interpretation of get_seq_by_id. Obviously
it is something different for the two modules.
Script1:

#!/bin/perl -w

use Bio::DB::GenBank;
use Bio::SeqIO;

# Set the output format
$seqio_obj = Bio::SeqIO->new(-file => '>sequence.fasta', -format => 'fasta' );

$db_obj = Bio::DB::GenBank->new;
$id = "BC049766"; # accession number
$seq_obj = $db_obj->get_Seq_by_id($id);
$seqio_obj->write_seq($seq_obj);

Script2:

#!/bin/perl -w
use strict;
use Bio::DB::EntrezGene;

my $id = "Penk1"; #name of the gene
my $db = new Bio::DB::EntrezGene;
my $seq = $db->get_Seq_by_id($id);
my $ac = $seq->annotation;

for my $ann ($ac->get_Annotations('dblink')) {
    if ($ann->database eq "Evidence Viewer") {
        # get the sequence identifier, the start, and the stop
        my ($contig,$from,$to) = $ann->url =~ /contig=([^&]+).+from=(\d+)&to=(\d+)/;
        print "$contig\t$from\t$to\n";
    }
}

Thank you
Hermann Norpois

From jimhu at tamu.edu  Mon Apr 30 13:38:23 2012
From: jimhu at tamu.edu (Jim Hu)
Date: Mon, 30 Apr 2012 12:38:23 -0500
Subject: [Bioperl-l] Gbrowse file uploads, bigwig and chromosome sizes files
Message-ID: <1F4B23DC-2CD1-4D61-A6F4-D823B4C7C7D1@tamu.edu>

I'm not sure how many of our issues are gbrowse-specific vs. more general
bioperl issues, so I'm cross-posting to both lists.

We think we've traced our problems uploading wiggle files to our gbrowse to
the failure to create the chromosome.size file.

Short version:
- what is supposed to be in the locationlist? Chromosomes only or just genes?
- why does the chromosome sizes try to get everything in the locationlist,
  whether or not it's a chromosome?

Long version:

Our E. coli MG1655 database was loaded several years ago with

bp_seqfeature_load.pl -d gb_MG1655_jh -f -c NC_000913.gb.gff NC_000913.gb.fasta -u -p

The mysql database has 4,146 entries in the locationlist where the first one
is for the chromosome and the others are named for genes. When we ask
Gbrowse to generate the chromosome sizes file, instead of doing what I
expect (look up the reference feature names), it tries to get the size of
every feature in the locationlist.
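A chromosome-sizes file is just one `name<TAB>length` line per reference sequence (the same layout UCSC's bigWig tools read). If the reference FASTA is at hand, a minimal pure-Perl sketch like this one can produce it without going through the feature database at all. `fasta_sizes` and `write_chrom_sizes` are hypothetical helpers, not part of BioPerl or GBrowse, and the sketch assumes the first word of each `>` header is the reference name GBrowse expects:

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA text from a filehandle and return a list of
# [name, length] pairs, one per record, in file order.
sub fasta_sizes {
    my ($fh) = @_;
    my @sizes;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;                  # tolerate DOS line endings
        if ($line =~ /^>(\S+)/) {
            push @sizes, [ $1, 0 ];        # start a new record
        }
        elsif (@sizes) {
            $line =~ s/\s+//g;             # ignore embedded whitespace
            $sizes[-1][1] += length $line;
        }
    }
    return @sizes;
}

# Write one "name<TAB>length" line per record (chrom.sizes layout).
sub write_chrom_sizes {
    my ($out, @sizes) = @_;
    print {$out} "$_->[0]\t$_->[1]\n" for @sizes;
}

# Usage: perl chrom_sizes.pl NC_000913.gb.fasta > MG1655.sizes
if (!caller && @ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    write_chrom_sizes(\*STDOUT, fasta_sizes($in));
    close $in;
}
```

Only records in the FASTA are reported, so gene-level entries in the locationlist never reach the sizes file.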
I can't actually find the fasta file I used.

When this happens, the eval in Bio::Graphics::Browser2::DataLoader dies
because it does not seem to be passing allow_aliases to this subroutine in
Bio::DB::SeqFeature::Store::DBI::mysql

sub _name_sql {
  my $self = shift;
  my ($name,$allow_aliases,$join) = @_;
  my $name_table = $self->_name_table;

  my $from = "$name_table as n";
  my ($match,$string) = $self->_match_sql($name);

  my $where = "n.id=$join AND n.name $match";
  $where .= " AND n.display_name>0" unless $allow_aliases;
  return ($from,$where,'',$string);
}

Here's the backtrace:

CHROMOSOME SIZES at /usr/local/share/perl/5.10.1/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 942, referer:
  Bio::DB::SeqFeature::Store::DBI::mysql::_name_sql('Bio::DB::SeqFeature::Store::DBI::mysql=HASH(0xb2bfed0)', 'b0001', undef, 'f.id') called at /usr/local/share/perl/5.10.1/Bio/DB/SeqFeature/Store/DBI/mysql.pm
  Bio::DB::SeqFeature::Store::DBI::mysql::_features('Bio::DB::SeqFeature::Store::DBI::mysql=HASH(0xb2bfed0)', '-name', 'b0001', '-class', undef, '-aliases', undef,
  Bio::DB::SeqFeature::Store::get_features_by_name('Bio::DB::SeqFeature::Store::DBI::mysql=HASH(0xb2bfed0)', '-name', 'b0001') called at /usr/local/share/perl/5.10.1/Bio/DB/SeqFeature/Store.pm line
  Bio::DB::SeqFeature::Store::segment('Bio::DB::SeqFeature::Store::DBI::mysql=HASH(0xb2bfed0)', 'b0001') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/DataLoader.pm line 171,
  eval {...} called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/DataLoader.pm line 169,
  Bio::Graphics::Browser2::DataLoader::generate_chrom_sizes('Bio::Graphics::Browser2::DataLoader=HASH(0xb2bfbd0)', '/var/tmp/gbrowse2/chrom_sizes/MG1655.sizes') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/DataLoader.pm line 143,
  Bio::Graphics::Browser2::DataLoader::chrom_sizes('Bio::Graphics::Browser2::DataLoader=HASH(0xb2bfbd0)') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/Action.pm line 1117, referer:
  Bio::Graphics::Browser2::Action::ACTION_chrom_sizes('Bio::Graphics::Browser2::Action=REF(0xa993ea0)', 'CGI=HASH(0xaf57450)') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/Render.pm line 427,
  Bio::Graphics::Browser2::Render::asynchronous_event('Bio::Graphics::Browser2::Render::HTML=HASH(0xaf590d8)') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/Render.pm line 356, referer:
  Bio::Graphics::Browser2::Render::run_asynchronous_event('Bio::Graphics::Browser2::Render::HTML=HASH(0xaf590d8)') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/Render.pm line 274, referer:
  Bio::Graphics::Browser2::Render::run('Bio::Graphics::Browser2::Render::HTML=HASH(0xaf590d8)') called at /usr/lib/cgi-bin/gb2/gbrowse line 50, referer:

=====================================
Jim Hu
Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054

From hnorpois at googlemail.com  Mon Apr 30 14:06:40 2012
From: hnorpois at googlemail.com (Hermann Norpois)
Date: Mon, 30 Apr 2012 20:06:40 +0200
Subject: [Bioperl-l] Retrieving promoter sequenc

Dear list,

I try to write a script for retrieving a 700bp sequence upstream of the
5' end of TTS (a putative promoter sequence).
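For a fixed-length promoter window, the coordinate arithmetic depends on strand: on the plus strand the upstream region lies below the gene's lowest coordinate, on the minus strand above its highest one. A minimal sketch in plain Perl, assuming 1-based inclusive coordinates and a strand given as 1 or -1; `promoter_window` is a hypothetical helper. Note that `$from-700` to `$from` in the script that follows spans 701 bases and includes the gene's first base, whereas this sketch returns exactly `$len` bases ending just before it:

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: given a gene's 1-based inclusive coordinates, its strand
# (1 = plus, -1 = minus) and a window length, return the 1-based inclusive
# start and stop of the region immediately upstream of the gene.
sub promoter_window {
    my ($from, $to, $strand, $len) = @_;
    $len //= 700;
    my ($lo, $hi) = (min($from, $to), max($from, $to));
    my ($start, $stop);
    if ($strand >= 0) {
        ($start, $stop) = ($lo - $len, $lo - 1);    # upstream = lower coordinates
    }
    else {
        ($start, $stop) = ($hi + 1, $hi + $len);    # upstream = higher coordinates
    }
    $start = 1 if $start < 1;    # clamp at the contig start (may leave start > stop)
    return ($start, $stop);
}
```

The returned start and stop would presumably go into `-seq_start` and `-seq_stop` of `Bio::DB::GenBank->new`, with the contig accession captured from the Evidence Viewer URL (`$contig`) as the ID passed to `get_Seq_by_id`; that pairing, and how `-strand` has to be encoded, are assumptions to check against the Bio::DB::GenBank documentation.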
This page gave me some information on how to do so (chapters *Using Bio::DB::EntrezGene to get genomic coordinates* and *Using Bio::DB::GenBank when you have genomic coordinates to get a Seq object*): http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences

However, I do not know how to define $chr_acc_ver (see below).

#!/bin/perl -w
use strict;
use Bio::DB::EntrezGene;
use Bio::SeqIO;
use Bio::DB::GenBank;

my $id = "12064"; # bdnf
my $seqio_obj = Bio::SeqIO->new(-file => '>s2.fasta', -format => 'fasta' );
my $db = new Bio::DB::EntrezGene;
my $seq = $db->get_Seq_by_id($id);
my $ac = $seq->annotation;
for my $ann ($ac->get_Annotations('dblink' )) {
    if ($ann->database eq "Evidence Viewer") {
        # get the sequence identifier, the start, and the stop
        my ($contig,$from,$to) = $ann->url =~ /contig=([^&]+).+from=(\d+)&to=(\d+)/;
        my $chr_start = $from-700;
        my $chr_stop = $from;
        my $gb = Bio::DB::GenBank->new(-format => 'genbank',
                                       -seq_start => $chr_start,
                                       -seq_stop => $chr_stop,
                                       # -strand => $strand
                                      );
        my $obj = $gb->get_Seq_by_id($chr_acc_ver); # *How do I define $chr_acc_ver?*
        $seqio_obj->write_seq($obj);
        # print "$contig\t$from\t$to\n$chr_start\t$chr_stop\n";
    }
}

Can anybody give me a hint how this might work?

Thanks
Hermann Norpois

From maquino at knome.com Mon Apr 30 15:15:26 2012 From: maquino at knome.com (Mark Aquino) Date: Mon, 30 Apr 2012 15:15:26 -0400 Subject: [Bioperl-l] unblessed reference in $sam->pileup error. Message-ID: <984E64E7-DF37-4BD1-BB84-DC86816A42E8@knome.com>

Hi all,

I'm trying to call all bases from a bam and count their depths. At first I did this by getting all alignments that cover a certain region, but I realized that writing the logic to detect indels via the cigar string was more complicated than I thought. So I decided to try the pileup method from Bio::DB::Sam / Bio::DB::Bam::Pileup; however, I am getting this error:

Can't call method "b" on unblessed reference at ./coverageDepths.pl line 114, line 1.
when trying to use the $pileup->alignment method. Does anyone have any idea what I'm missing?

109  $sam->pileup('1:550968-550969',
110      sub {
111          my ($seqid,$pos,$pileup) = @_;
112          for my $p (@$pileup){
113              if ($p->indel){ print "INDEL!\n"};
114              my $b = $pileup->b;
115              my $qbase = substr($b->qseq, $pileup->qpos,1);
116              print "$qbase\n";
117          }
118      });

Thanks,
Mark

From maquino at knome.com Mon Apr 30 15:18:35 2012 From: maquino at knome.com (Mark Aquino) Date: Mon, 30 Apr 2012 15:18:35 -0400 Subject: [Bioperl-l] unblessed reference on sam->pileup Message-ID: <33D55D60-7986-4971-9802-47AB9CDE3E24@knome.com>

Nevermind, as usual 5 seconds after sending an email to the group I realized what I was doing wrong the whole time.

From Simon.Guest at agresearch.co.nz Mon Apr 30 23:29:58 2012 From: Simon.Guest at agresearch.co.nz (Guest, Simon) Date: Tue, 1 May 2012 15:29:58 +1200 Subject: [Bioperl-l] Circular dependency problems packaging BioPerl as RPM In-Reply-To: <3E2BD058-EDB0-482B-9727-8DB7FB466737@illinois.edu> References: <18DF7D20DFEC044098A1062202F5FFF34CCECE5EF4@exchsth.agresearch.co.nz> <3E2BD058-EDB0-482B-9727-8DB7FB466737@illinois.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCECE61BE@exchsth.agresearch.co.nz>

> -----Original Message-----
> From: Fields, Christopher J [mailto:cjfields at illinois.edu]
> Sent: Tuesday, 1 May 2012 1:43 a.m.
> To: Guest, Simon
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Circular dependency problems packaging BioPerl as
> RPM
>
> The Bio::Expression dependencies are unusual, I'll have to look through and
> find the modules responsible for pulling these in. When I last ran these no
> tests failed, so either the dependency is off or no tests have been written for
> the modules in question.
>
> We can always release a new CPAN BioPerl-Run to deal with it.

Hi Chris,

I ignored the Bio::Expression dependencies and everything eventually built OK, using BioPerl-1.6.901 and BioPerl-Run-1.006900.
If you release a new BioPerl-Run, I would be interested in packaging it, as I have come this far. Do you have any ideas about where I could submit the BioPerl and dependency RPMs I built for CentOS 6? I now have around 40 RPMs that weren't in CentOS or EPEL, which were all built straight from CPAN using cpanspec. I guess others might like to benefit from this (and it would also serve to validate the builds). My other unknown is what non-Perl dependencies I should add to the BioPerl RPM. I don't know what to do here. The dependencies page on the BioPerl Wiki seems to list only Perl module dependencies. cheers, Simon From exceptlowang at gmail.com Tue Apr 17 20:00:08 2012 From: exceptlowang at gmail.com (Tim White) Date: Wed, 18 Apr 2012 00:00:08 -0000 Subject: [Bioperl-l] Bio::SeqIO::tab deletes gap characters when reading sequences, which is inconvenient Message-ID: <4F8E03FE.7000506@gmail.com> Hi, Bio::SeqIO::tab (what you get when specifying -format => 'tab' to Bio::SeqIO->new()) is perfect for converting sequences into a one-per-line format, so that standard line-oriented UNIX tools (grep, comm etc.) work as expected. Except... I just discovered that it deletes gap ("-") characters when reading sequences, so it can't be used to round-trip any files that contain these.
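The effect described here can be reproduced with the two substitutions discussed in this message: s/\W//g, the line in next_seq() of Bio::SeqIO::tab, and s/\s//g, the proposed replacement. This is a stand-alone sketch that does not call BioPerl. The input string is made up and contains gaps, a period, a stop '*' and a trailing carriage return.

```perl
use strict;
use warnings;

# Made-up aligned sequence line as it might look after a trip through
# Windows: gap characters, a '.', a stop '*', and a trailing carriage return.
my $raw = "ACGT--ACG.T*\r";

(my $word_only = $raw) =~ s/\W//g;   # current behaviour: drops every non-word character
(my $no_space  = $raw) =~ s/\s//g;   # proposed behaviour: drops whitespace only

print "$word_only\n";   # ACGTACGT
print "$no_space\n";    # ACGT--ACG.T*
```

\W matches any character other than a letter, digit or underscore, so it strips '.', '*' and '\r' as well as gaps. \s matches only whitespace, so alignment symbols survive while Windows line endings are still removed.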
This is a source of grief as I frequently work with FASTA files that contain aligned sequences, and thus gap characters.

This is all because the next_seq() function in Bio::SeqIO::tab.pm contains the line:

  $seq =~ s/\W//g;

which removes all non-word characters (everything except letters, digits and underscores) from the sequence data. IMHO it would be *much* better if this was changed to:

  $seq =~ s/\s//g;

which simply removes all whitespace characters (particularly including the \r that often appears at the ends of lines on text files that have visited Windows), enabling gap characters (and, for example, periods and asterisks) to be preserved. Alternatively, you could simply get rid of this line of code and allow whitespace characters through.

I'm not sure whether this counts as a "bug", as a cursory search didn't turn up any docs explaining precisely what characters are and aren't preserved by classes implementing Bio::SeqIO, but it's certainly inconsistent (at least Bio::SeqIO::fasta, and Bio::SeqIO::table, with columns and delimiters set up appropriately, allow round-tripping of files containing gap characters) as well as extremely inconvenient for me personally, and I suspect for others.

Assuming no harm would be done by making the above change, what's the best thing to do to get this changed? I've simply edited my own local copy of tab.pm to make the above change, but obviously if others agree I'd like to get the change done upstream.

Thanks,
Tim

From mohammadali.alavi at edu.uni-graz.at Sat Apr 21 05:22:21 2012 From: mohammadali.alavi at edu.uni-graz.at (Alavi, Mohammadali (0313xxx)) Date: Sat, 21 Apr 2012 11:22:21 +0200 Subject: [Bioperl-l] piping values into an existing GENBANK file Message-ID: <70DA93B804A15C4387B05DEF33BC255701A1CE149E36@MSIGI.stud.ad.uni-graz.at>

Hello All,

I have a GENBANK file already, to which I need to add some features. To be precise, I want to add data (on COG function) to the CDSs present in the GENBANK file.
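The request in this message is to attach the i-th value of a list to the i-th CDS. The loop has to keep only the CDS features and take one value per CDS; looping over the whole list for every feature is what goes wrong. This is a minimal sketch, assuming the values are already in an array in CDS order. StubFeature is a hypothetical stand-in for Bio::SeqFeature::Generic. It mimics only primary_tag() and add_tag_value() so that the example runs without BioPerl. The helper name annotate_cds is also hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::SeqFeature::Generic: only the two methods
# used below are mimicked, so the sketch runs without BioPerl installed.
package StubFeature;
sub new           { my ($class, $tag) = @_; return bless { tag => $tag, notes => [] }, $class }
sub primary_tag   { return $_[0]{tag} }
sub add_tag_value { my ($self, $tag, $value) = @_; push @{ $self->{notes} }, $value if $tag eq 'note'; return }

package main;

# Hypothetical helper: attach the i-th value to the i-th CDS, skipping
# non-CDS features. Returns how many values were left unused.
sub annotate_cds {
    my ($features, $values) = @_;
    my @queue = @$values;                  # copy, so the caller's array is untouched
    for my $feat (grep { $_->primary_tag eq 'CDS' } @$features) {
        last unless @queue;                # more CDSs than values: stop here
        $feat->add_tag_value('note', shift @queue);
    }
    return scalar @queue;
}

# Usage with stand-in features: a gene followed by two CDSs.
my @demo = map { StubFeature->new($_) } qw(gene CDS CDS);
my $unused = annotate_cds(\@demo, ['motility', 'metabolism']);
print join(',', map { @{ $_->{notes} } } @demo), "\n";    # motility,metabolism
```

With real BioPerl objects the call would presumably be annotate_cds([$seq_object->get_SeqFeatures], \@COGlist), followed by writing $seq_object out through a Bio::SeqIO object opened for output in genbank format. Note also that qw() splits on whitespace, so a multi-word category such as "General metabolism" becomes two separate list elements.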
The data (COG functions) I need to add is stored in an array such that the first value belongs to the first CDS in the GENBANK file, the second value to the second CDS, and so on. I tried to add the data in a tag/value style to the CDSs (as described in the HOWTO:Feature-Annotation page provided by BioPerl), which basically works. The problem is that I do not know how to tell Perl/BioPerl to take only one value at a time, add it in tag/value style to a CDS, and then take the next (and only the next) value and add it to the NEXT CDS, and so on.

Here is the code I used. As you can see, using for $item (@array) is not appropriate, since it adds all the values of my array to all CDSs! So is there a way of piping values, one after another, into consecutive CDSs in a file using BioPerl? Or is there another way of doing it in regular Perl? I would appreciate any help on that very much!

BioPerl I'm using: 1.6.1
ActivePerl I'm using: 5.12.4 (on Windows Vista)

#!/bin/perl
use Bio::SeqIO;
use Bio::SeqFeature::Generic;
use warnings;

@COGlist = qw(motility General metabolism nunknown); # think of this as the
# array I would like to add the values of to my file; the real one of course has
# as many values as the number of CDSs in the GENBANK file

$seqio_object = Bio::SeqIO -> new(-file => "file.gbk", -format => "genbank");
$seq_object = $seqio_object -> next_seq;

for $feat_object ($seq_object -> get_SeqFeatures){
    for $item (@COGlist){ # this would add all elements of the array to all of CDSs and is therefore wrong!
        $feat_object -> add_tag_value("note", $item);
    }
    for $tags ($feat_object -> get_all_tags){
        print "tag:".$tags . "\n";
        for $values ($feat_object -> get_tag_values($tags)){
            print "value: " . $values . "\n"; # as one might imagine this does not give the output I have been looking for :-))
        }
    }
}

From huansheng.xu at gmail.com Sun Apr 22 10:15:44 2012 From: huansheng.xu at gmail.com (Huansheng Xu) Date: Sun, 22 Apr 2012 10:15:44 -0400 Subject: [Bioperl-l] configuration problem with Bio::Tools::Run::Alignment::ClustalW Message-ID:

Hi,

I am a postdoc fellow at Massachusetts General Hospital in Boston. I am writing to seek help with the Bio::Tools::Run::Alignment::ClustalW module available at the BioPerl website. I tried to align some DNA sequences contained in a FASTA file with the module embedded in a program (as shown below), but got stuck there. The program works very well for protein sequences. I think maybe I need to configure the module specifically for DNA, but I do not know how to do that. Could you take a look and let me know how to do the configuration? Thanks a lot!

Best,
Huansheng Xu
--------------------------------------------------------------------------------------------------------------------------------------------------------------------

#!/usr/bin/perl
use Bio::Perl;
use Bio::SearchIO;
use Bio::AlignIO;
use Bio::Tools::Run::Alignment::Clustalw;
use warnings;
use strict;

my $filename = $ARGV[0];
die "Usage: $0 \n" unless $filename;
die "File $filename not found.\n" unless -f $filename;

# Read the list of raw sequences from the file you feed the program
my $fh = Bio::SeqIO->newFh(-file=>$filename, -format=>'fasta');
my @seq_array=<$fh>;

# pass the parameters and generate a factory to run the alignment with ClustalW
my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
@params = ('ktuple' => 2, 'dnamatrix' => 'IUB') if ($seq_array[0]->alphabet eq 'dna');
my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
my $seq_array_ref = \@seq_array;
my $aln = $factory->align($seq_array_ref);

# create a new AlignIO object
my $out = Bio::AlignIO->new(-file=> ">$filename.aln", -format=> 'clustalw');
$out->write_aln($aln);

From bubli_thakur at rediffmail.com Fri Apr 20 22:59:50 2012 From: bubli_thakur at rediffmail.com (subarna thakur) Date: 21 Apr 2012 02:59:50 -0000 Subject: [Bioperl-l] codon usage Message-ID: <20120421025950.8579.qmail@f4mail-235-122.rediffmail.com>

I am writing a script to determine the number of genes containing a particular codon. The codons are listed in a separate file. The output is correct for the first codon in the file, but for the other codons the script is not working. Please suggest the error in the script.
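Two things in the script that follows are consistent with the symptom. First, $seqio_obj->next_seq is called inside the loop over codons. The single Bio::SeqIO stream is therefore exhausted after the first codon and returns nothing for the remaining ones. Second, $codon is never reset between codons, so the totals would accumulate. The sketch below reads the sequences once and then counts each codon separately. To keep it runnable without BioPerl or input files, the sequences and codons are hypothetical in-memory data. In the real script the array would be filled once from next_seq, as the comment shows. The helper name count_seqs_with_codon is also hypothetical.

```perl
use strict;
use warnings;

# Hypothetical in-memory data standing in for gopi2.txt (FASTA) and table.txt.
# In the real script the records would be collected once, before the codon loop:
#   my @seqs; while (my $s = $seqio_obj->next_seq) { push @seqs, [$s->id, $s->seq] }
my @seqs   = (['g1', 'ATGAAATAA'], ['g2', 'ATGCCCTAG'], ['g3', 'ATGAAACCCTGA']);
my @codons = qw(AAA CCC TAA);

# Count how many sequences contain $codon in reading frame 1
# (non-overlapping triplets starting at the first base).
sub count_seqs_with_codon {
    my ($codon, @records) = @_;
    my $n = 0;                                    # starts at zero for every codon
    for my $rec (@records) {
        my $str = uc $rec->[1];
        for (my $k = 0; $k + 3 <= length($str); $k += 3) {
            if (substr($str, $k, 3) eq $codon) { $n++; last }   # count each sequence once
        }
    }
    return $n;
}

for my $codon (@codons) {
    printf "%s\t%d\n", $codon, count_seqs_with_codon($codon, @seqs);
}
```

Because the sequences are held in an array, every codon is checked against the full set, and each call to the helper starts counting from zero.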
The script is as follows ----

#!/usr/bin/perl -w
use Bio::SeqIO;
$file2="table.txt";
$codon=0;
open OUT, ">out-test.txt" or die $!;
$seqio_obj = Bio::SeqIO->new( -file => "gopi2.txt" , '-format' => 'Fasta');
open( my $fh2, $file2 ) or die "$!";
while( my $line = <$fh2> ){
    $acc=$line;
    chomp $acc;
    while ($seq1= $seqio_obj->next_seq){
        my @output = $seq1->id;
        my $string = $seq1->seq;
        $v=0;
        $l= length($string);
        $t=$l/3;
        $k=0;
        for ($i=1; $i <= $t; $i++){
            @array2 = substr($string, $k, 3);
            $k=$k+3;
            foreach $value (@array2) {
                if ($value eq "$acc") {
                    print OUT " The sequence id is @output\n";
                    print OUT "$acc codon found in position $i\n\n";
                    $v=$v+1;
                }
            }
        }
        if ($v==0) {
            $h=0;
        } else {
            $h=1;
        }
        $codon=$codon+$h;
    }
    print OUT "Total number of sequences with $acc codon";
    print OUT "\t";
    print OUT $codon;
}
exit;

From msprasad693 at gmail.com Thu Apr 26 08:16:39 2012 From: msprasad693 at gmail.com (prasad ms) Date: Thu, 26 Apr 2012 17:46:39 +0530 Subject: [Bioperl-l] Bioperl for global alignment Message-ID: Hello sir, I am Prasad, student of MS in bioinformatics. I am doing my final year project, and sequence alignment is the part of my project. I am having nearly 50k sequences and i want to do a pairwise global alignment (NW alignment). I read the bioperl tutorial. But in that there is no mention about this. Could you please guide how can i do this type of alignment using bioperl. I assure that all the usage is purely for academic. Looking forward to hear from you. Thank you, Regards, Prasad MS

From msprasad693 at gmail.com Mon Apr 30 01:40:43 2012 From: msprasad693 at gmail.com (prasad ms) Date: Mon, 30 Apr 2012 11:10:43 +0530 Subject: [Bioperl-l] Fwd: Bioperl for global alignment In-Reply-To: References: Message-ID: Hello sir, I am Prasad, student of MS in bioinformatics. I am doing my final year project, and sequence alignment is the part of my project. I am having nearly 50k sequences and i want to do a pairwise global alignment (NW alignment). I read the bioperl tutorial.
But in that there is no mention about this. Could you please guide how can i do this type of alignment using bioperl. I assure that all the usage is purely for academic. Looking forward to hear from you. Thank you, Regards, Prasad MS

From cjfields at illinois.edu Mon Apr 2 01:57:53 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 2 Apr 2012 01:57:53 +0000 Subject: [Bioperl-l] Implementing Bioperl6 for GSoC 2012 In-Reply-To: References: Message-ID: <6D16BB69-A74A-4A7E-B81C-0B33E9A73473@illinois.edu> Bart, I think this is a good idea, but to address one concern up-front: I believe the time frame (possible start in mid-July) will not be in your favor unless you can participate according to Google's schedule. There is some flexibility at the beginning, mainly during the community bonding period, but beyond that there isn't much wiggle room; we're as constrained as you are. The proposals to OBF over the last few years have been highly competitive, and this year will prove even more so as a large number of major Perl-based FLOSS orgs (primarily The Perl Foundation) were not accepted into GSoC. Now for Perl 6: BioPerl6 is a project Philip Mabon and I have already started up on github: https://github.com/cjfields/bioperl6 The code is basically proof-of-concept stuff at this point, and I don't believe it's current to spec last I checked or that it works on either the most current niecza or rakudo. All this was enough of a moving target to make it hard to maintain, but that seems to be settling somewhat now. It's pretty wide open, though, as far as I'm concerned. If you are to pursue this as a proposal, you will need to focus on a specific task or set of classes to implement along with docs and tests. A straight port isn't the best option; you should take advantage of Perl 6's strengths and possibly work on redesign where needed, keeping in mind that some aspects of bioperl might be considered a bit over-designed and out-of-date w/ re: to modern perl dictums. Also, learning a new language is nice, but that isn't the main focus for any GSoC project.
At the end of the day, we should have usable code for the community, and if it acts as a entry point for more people to get involved with Perl 6 the better :) (I see that Leon has also chimed in on this with similar comments as well) I will be on and off #bioperl this week (pyrimidine). IRC is also logged in case I need to backlog (provided by one Moritz Lenz): http://irclog.perlgeek.de/bioperl/today chris On Apr 1, 2012, at 6:55 AM, Bart Wiegmans wrote: > Hello all BioPerl-ers, > > This is my first e-mail to the list and thus my introduction. My name > is Bart Wiegmans, I study biology at the University of Groningen, the > netherlands. It is my goal to implement bioperl6 this summer as part > of the GSoC program. > > Why would I want such a thing? For a start, I'd like to learn more > about bioinformatics. As I told you I study biology, so this has an > obvious advantage for me. Also, I'd like to learn perl6 well, and this > is only possible when one writes a significant program in it. > Moreover, I think perl6 is awesome, and having a real-world toolkit > like bioperl out there might just be enough to develop a significant > community using it. As a third, I think perl5's object support is > crufty, and difficult to learn for many people. These people include > biologists who might not be inclined to learn it, and rather use some > other tools instead. > > As to who I am, I already told you my name. I am 24 years old, and > study biology at an undergraduate level. (For those interested, yes > this means I haven't exactly been flying through my courses :-)). I > have been programming computers ever since I was 16 years old, and > earlier if you count BASIC. Starting out with C, most of that has been > websites (in PHP), scripts (in Perl), and other smallish programs (in > Java / Perl). 
> For example, I implemented a parser and decoder for the > dirac video specification as part of GSoC 2008, and a script which > reads the NIH bookshelf website and translates this into ePub e-books. > Read quite a few of them that way. > > Aside from my motivation and capabilities, two other factors somewhat > complicate my involvement with GSoC. The first is that the academic > year ends halfway in July in the netherlands, not in may as in the USA > and in many other countries. This means that I am not 'free' in a real > sense before that time. Also, I have a day job as a PHP programmer for > a local online students' magazine, which also takes some time. Which > is unfortunate, because I'd rather spend my time writing useful > programs; hence, if you would accept me as a student I plan to take > leave from this job during the period of GSoC. > > Anyway, I realize this has been enough information for any interested > reader. If there is any interest on your side, I frequent freenode > under the nickname brrt. Other than that and this e-mail address, I > don't have much of an online presence. > > Kind regards, > Bart Wiegmans > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l

From joseramonblas at gmail.com Mon Apr 2 05:21:55 2012 From: joseramonblas at gmail.com (José Ramón Blas Pastor) Date: Mon, 2 Apr 2012 07:21:55 +0200 Subject: [Bioperl-l] data into R from Perl arrays Message-ID:

Hi,

a very simple doubt, but I do not know how to manage this.

I want to plot a histogram for all data in 'datos.txt'.

a) by using R:
datos<-scan("datos.txt")
pdf("xh.pdf")
hist(datos)
dev.off()

b) How could I invoke R inside Perl to do the same??
#!/usr/bin/perl
open(DAT,"datos.txt");
while (<DAT>) {
  chomp;
  push(@datos,$_);
}
#now I want a histogram of values in @datos

Thanks!!

JR

--
José Ramón Blas - PhD
Dept. Biochemistry - Medicine School
University of Castilla-La Mancha
C Almansa, 14
02006 Albacete (Spain)

Phone: +34 967599200 ext. 2958

From avilella at gmail.com Mon Apr 2 06:24:09 2012 From: avilella at gmail.com (Albert Vilella) Date: Mon, 2 Apr 2012 07:24:09 +0100 Subject: [Bioperl-l] data into R from Perl arrays In-Reply-To: References: Message-ID:

You may get a faster answer in a generic programming site, like stackoverflow.com...

On Mon, Apr 2, 2012 at 6:21 AM, José Ramón Blas Pastor wrote:
> Hi,
>
> a very simple doubt, but I do not know how to manage this.
>
> I want to plot a histogram for all data in 'datos.txt'.
>
> a) by using R:
> datos<-scan("datos.txt")
> pdf("xh.pdf")
> hist(datos)
> dev.off()
>
>
> b) How could I invoke R inside Perl to do the same??
> #!/usr/bin/perl
> open(DAT,"datos.txt");
> while (<DAT>) {
>   chomp;
>   push(@datos,$_);
> }
> #now I want a histogram of values in @datos
>
> Thanks!!
>
> JR
>
> --
> José Ramón Blas - PhD
> Dept. Biochemistry - Medicine School
> University of Castilla-La Mancha
> C Almansa, 14
> 02006 Albacete (Spain)
>
> Phone: +34 967599200 ext.
2958
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From awitney at sgul.ac.uk Mon Apr 2 08:17:56 2012 From: awitney at sgul.ac.uk (Adam Witney) Date: Mon, 2 Apr 2012 09:17:56 +0100 Subject: [Bioperl-l] data into R from Perl arrays In-Reply-To: References: Message-ID: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk>

The quickest way to do this specific example is probably to just save your R commands to a file (eg R_commands.R) and then run some kind of system/exec/backtick function in your perl script to invoke R, something like (untested):

#!/usr/bin/perl
use strict;
use warnings;
system( 'R --file R_commands.R' );

Alternatively if you want perl and R to be able to interact and pass data back and forth, you could use something like Statistics::R

http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm

HTH

adam

On 2 Apr 2012, at 06:21, José Ramón Blas Pastor wrote:

> Hi,
>
> a very simple doubt, but I do not know how to manage this.
>
> I want to plot a histogram for all data in 'datos.txt'.
>
> a) by using R:
> datos<-scan("datos.txt")
> pdf("xh.pdf")
> hist(datos)
> dev.off()
>
>
> b) How could I invoke R inside Perl to do the same??
> #!/usr/bin/perl
> open(DAT,"datos.txt");
> while (<DAT>) {
>   chomp;
>   push(@datos,$_);
> }
> #now I want a histogram of values in @datos
>
> Thanks!!
>
> JR
>
> --
> José Ramón Blas - PhD
> Dept. Biochemistry - Medicine School
> University of Castilla-La Mancha
> C Almansa, 14
> 02006 Albacete (Spain)
>
> Phone: +34 967599200 ext.
2958 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roy.chaudhuri at gmail.com Mon Apr 2 10:33:06 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 02 Apr 2012 11:33:06 +0100 Subject: [Bioperl-l] data into R from Perl arrays In-Reply-To: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk> References: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk> Message-ID: <4F798062.4050908@gmail.com> Alternatively you could go for a Perl-only approach using something like GD::Graph::Histogram. Cheers, Roy. On 02/04/2012 09:17, Adam Witney wrote: > > The quickest way to do this specific example is probably to just save > your R commands to a file (eg R_commands.R) and then run some kind of > system/exec/backtick function in your perl script to invoke R, > something like (untested): > > #!/usr/bin/perl use strict; use warnings; system( 'R --file > R_commands.R' ); > > Alternatively if you want perl and R to be able to interact and pass > data back and forth, you could use something like Statistics::R > > http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm > > HTH > > adam > > On 2 Apr 2012, at 06:21, José Ramón Blas Pastor wrote: > >> Hi, >> >> a very simple doubt, but I do not know how to manage this. >> >> I want to plot a histogram for all data in 'datos.txt'. >> >> a) by using R: datos<-scan("datos.txt") pdf("xh.pdf") hist(datos) >> dev.off() >> >> >> b) How could I invoke R inside Perl to do the same?? >> #!/usr/bin/perl open(DAT,"datos.txt"); while (<DAT>) { chomp; >> push(@datos,$_); } #now I want a histogram of values in @datos >> >> Thanks!! >> >> JR >> >> -- José Ramón Blas - PhD Dept. Biochemistry - Medicine School >> University of Castilla-La Mancha C Almansa, 14 02006 Albacete >> (Spain) >> >> Phone: +34 967599200 ext.
2958 >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ Bioperl-l mailing > list Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Apr 2 12:59:40 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 2 Apr 2012 12:59:40 +0000 Subject: [Bioperl-l] data into R from Perl arrays In-Reply-To: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk> References: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk> Message-ID: Not sure how well it is supported, but there is also Statistics::useR (which has an XS layer for conversing with R). https://metacpan.org/module/Statistics::useR chris On Apr 2, 2012, at 3:17 AM, Adam Witney wrote: > > The quickest way to do this specific example is probably to just save your R commands to a file (eg R_commands.R) and then run some kind of system/exec/backtick function in your perl script to invoke R, something like (untested): > > #!/usr/bin/perl > use strict; > use warnings; > system( 'R --file R_commands.R' ); > > Alternatively if you want perl and R to be able to interact and pass data back and forth, you could use something like Statistics::R > > http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm > > HTH > > adam > > On 2 Apr 2012, at 06:21, José Ramón Blas Pastor wrote: > >> Hi, >> >> a very simple doubt, but I do not know how to manage this. >> >> I want to plot a histogram for all data in 'datos.txt'. >> >> a) by using R: >> datos<-scan("datos.txt") >> pdf("xh.pdf") >> hist(datos) >> dev.off() >> >> >> b) How could I invoke R inside Perl to do the same?? >> #!/usr/bin/perl >> open(DAT,"datos.txt"); >> while (<DAT>) { >> chomp; >> push(@datos,$_); >> } >> #now I want a histogram of values in @datos >> >> Thanks!! >> >> JR >> >> -- >> José Ramón Blas - PhD >> Dept.
Biochemistry - Medicine School >> University of Castilla-La Mancha >> C Almansa, 14 >> 02006 Albacete (Spain) >> >> Phone: +34 967599200 ext. 2958 >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bartwiegmans at gmail.com Mon Apr 2 17:10:47 2012 From: bartwiegmans at gmail.com (Bart Wiegmans) Date: Mon, 2 Apr 2012 19:10:47 +0200 Subject: [Bioperl-l] Implementing Bioperl6 for GSoC 2012 In-Reply-To: <6D16BB69-A74A-4A7E-B81C-0B33E9A73473@illinois.edu> References: <6D16BB69-A74A-4A7E-B81C-0B33E9A73473@illinois.edu> Message-ID: Chris, Leon, others, Thank you for your timely responses. So far as the timeframe is concerned, I might be able to get student credits for participating in this projects as it is related to my study. In that case I would have more time free. At any rate, I understand it is suboptimal to start working in july, so I will do my best to make as much time free as possible. I've already checked out the bioper-6 projects as well as the biome project from github. I am not quite sure what scope of project to choose and I was hoping for your advice. File format import / export and database connectivity would come to mind, as these are subjects I am most familiar with. In such a scenario, aside from a set of modules / classes, the end goal would be a script that could search for and import a sequence from a number of popular databases, and save it on the users' hard disk. I am very much open to suggestions, however. Anyway, thank you for your time. 
Kind regards, Bart Wiegmans 2012/4/2 Fields, Christopher J : > Bart, > > I think this is a good idea, but to address one concern up-front: I believe the time frame (possible start in mid-July) will not be in your favor unless you can participate according to Google's schedule. There is some flexibility at the beginning, mainly during the community bonding period, but beyond that there isn't much wiggle room; we're as constrained as you are. The proposals to OBF over the last few years have been highly competitive, and this year will prove even more so as a large number of major Perl-based FLOSS orgs (primarily The Perl Foundation) were not accepted into GSoC. > > Now for Perl 6: > > BioPerl6 is a project Philip Mabon and I have already started up on github: > > https://github.com/cjfields/bioperl6 > > The code is basically proof-of-concept stuff at this point, and I don't believe it's current to spec last I checked or that it works on either the most current niecza or rakudo. All this was enough of a moving target to make it hard to maintain, but that seems to be settling somewhat now. It's pretty wide open, though, as far as I'm concerned. > > If you are to pursue this as a proposal, you will need to focus on a specific task or set of classes to implement along with docs and tests. A straight port isn't the best option; you should take advantage of Perl 6's strengths and possibly work on redesign where needed, keeping in mind that some aspects of bioperl might be considered a bit over-designed and out-of-date w/ re: to modern perl dictums. Also, learning a new language is nice, but that isn't the main focus for any GSoC project. At the end of the day, we should have usable code for the community, and if it acts as a entry point for more people to get involved with Perl 6 the better :) > > (I see that Leon has also chimed in on this with similar comments as well) > > I will be on and off #bioperl this week (pyrimidine).
IRC is also logged in case I need to backlog (provided by one Moritz Lenz): > > http://irclog.perlgeek.de/bioperl/today > > chris > > On Apr 1, 2012, at 6:55 AM, Bart Wiegmans wrote: > >> Hello all BioPerl-ers, >> >> This is my first e-mail to the list and thus my introduction. My name >> is Bart Wiegmans, I study biology at the University of Groningen, the >> netherlands. It is my goal to implement bioperl6 this summer as part >> of the GSoC program. >> >> Why would I want such a thing? For a start, I'd like to learn more >> about bioinformatics. As I told you I study biology, so this has an >> obvious advantage for me. Also, I'd like to learn perl6 well, and this >> is only possible when one writes a significant program in it. >> Moreover, I think perl6 is awesome, and having a real-world toolkit >> like bioperl out there might just be enough to develop a significant >> community using it. As a third, I think perl5's object support is >> crufty, and difficult to learn for many people. These people include >> biologists who might not be inclined to learn it, and rather use some >> other tools instead. >> >> As to who I am, I already told you my name. I am 24 years old, and >> study biology at an undergraduate level. (For those interested, yes >> this means I haven't exactly been flying through my courses :-)). I >> have been programming computers ever since I was 16 years old, and >> earlier if you count BASIC. Starting out with C, most of that has been >> websites (in PHP), scripts (in Perl), and other smallish programs (in >> Java / Perl). For example, I implemented a parser and decoder for the >> dirac video specification as part of GSoC 2008, and a script which >> reads the NIH bookshelf website and translates this into ePub e-books. >> Read quite a few of them that way. >> >> Aside from my motivation and capabilities, two other factors somewhat >> complicate my involvement with GSoC.
The first is that the academic >> year ends halfway in July in the netherlands, not in may as in the USA >> and in many other countries. This means that I am not 'free' in a real >> sense before that time. Also, I have a day job as a PHP programmer for >> a local online students' magazine, which also takes some time. Which >> is unfortunate, because I'd rather spend my time writing useful >> programs; hence, if you would accept me as a student I plan to take >> leave from this job during the period of GSoC. >> >> Anyway, I realize this has been enough information for any interested >> reader. If there is any interest on your side, I frequent freenode >> under the nickname brrt. Other than that and this e-mail address, I >> don't have much of an online presence. >> >> Kind regards, >> Bart Wiegmans >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From florent.angly at gmail.com Mon Apr 2 22:30:09 2012 From: florent.angly at gmail.com (Florent Angly) Date: Tue, 03 Apr 2012 08:30:09 +1000 Subject: [Bioperl-l] data into R from Perl arrays In-Reply-To: References: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk> Message-ID: <4F7A2871.8080207@gmail.com> To execute R commands from Perl, you can also try Statistics::R (http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm ), which has been around for longer, and which I have recently refactored. Regards, Florent On 02/04/12 22:59, Fields, Christopher J wrote: > Not sure how well it is supported, but there is also Statistics::useR (which has an XS layer for conversing with R). 
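Adam's suggestion in this thread (write the R commands to a file, then start R from Perl with system()) can be sketched as below. This is a minimal sketch, assuming R's `Rscript` front end is on the PATH and a Unix-like system, since the temporary file paths are pasted into R string literals without escaping. The sub names `histogram_r_script` and `plot_histogram_with_r` are hypothetical and not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return R commands that read one number per line
# from $data_file and draw a histogram of them into $pdf_file.
sub histogram_r_script {
    my ($data_file, $pdf_file) = @_;
    return join "\n",
        qq{datos <- scan("$data_file", quiet = TRUE)},
        qq{pdf("$pdf_file")},
        'hist(datos)',
        'invisible(dev.off())',
        '';
}

# Hypothetical helper: write the values and the R script to temporary
# files (deleted when the program exits), then run the script with
# Rscript. Returns true if Rscript exited successfully.
sub plot_histogram_with_r {
    my ($values, $pdf_file) = @_;

    my ($data_fh, $data_file) = tempfile(SUFFIX => '.txt', UNLINK => 1);
    print {$data_fh} "$_\n" for @$values;
    close $data_fh or die "Cannot write $data_file: $!";

    my ($r_fh, $r_file) = tempfile(SUFFIX => '.R', UNLINK => 1);
    print {$r_fh} histogram_r_script($data_file, $pdf_file);
    close $r_fh or die "Cannot write $r_file: $!";

    # The list form of system() starts Rscript directly, without a shell.
    return system('Rscript', $r_file) == 0;
}
```

For the original question, once `datos.txt` has been read into `@datos`, the call would be `plot_histogram_with_r(\@datos, 'xh.pdf') or die "Rscript failed\n";`. Going through temporary files keeps Perl and R fully decoupled; Statistics::R, mentioned in this thread, is the option to look at when data has to flow back from R into Perl.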
> > https://metacpan.org/module/Statistics::useR > > chris > > On Apr 2, 2012, at 3:17 AM, Adam Witney wrote: > >> The quickest way to do this specific example is probably to just save your R commands to a file (eg R_commands.R) and then run some kind of system/exec/backtick function in your perl script to invoke R, something like (untested): >> >> #!/usr/bin/perl >> use strict; >> use warnings; >> system( 'R --file R_commands.R' ); >> >> Alternatively if you want perl and R to be able to interact and pass data back and forth, you could use something like Statistics::R >> >> http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm >> >> HTH >> >> adam >> >> On 2 Apr 2012, at 06:21, José Ramón Blas Pastor wrote: >> >>> Hi, >>> >>> a very simple doubt, but I do not know how to manage this. >>> >>> I want to plot a histogram for all data in 'datos.txt'. >>> >>> a) by using R: >>> datos<-scan("datos.txt") >>> pdf("xh.pdf") >>> hist(datos) >>> dev.off() >>> >>> >>> b) How could I invoke R inside Perl to do the same?? >>> #!/usr/bin/perl >>> open(DAT,"datos.txt"); >>> while (<DAT>) { >>> chomp; >>> push(@datos,$_); >>> } >>> #now I want a histogram of values in @datos >>> >>> Thanks!! >>> >>> JR >>> >>> -- >>> José Ramón Blas - PhD >>> Dept. Biochemistry - Medicine School >>> University of Castilla-La Mancha >>> C Almansa, 14 >>> 02006 Albacete (Spain) >>> >>> Phone: +34 967599200 ext.
2958 >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From huangyifeicmb at gmail.com Tue Apr 3 00:41:54 2012 From: huangyifeicmb at gmail.com (Yifei Huang) Date: Mon, 2 Apr 2012 20:41:54 -0400 Subject: [Bioperl-l] data into R from Perl arrays In-Reply-To: <4F7A2871.8080207@gmail.com> References: <01AA2168-9E29-4270-9FDA-3C1215DA994A@sgul.ac.uk> <4F7A2871.8080207@gmail.com> Message-ID: You may try RSPerl. http://www.omegahat.org/RSPerl/ Yifei 2012/4/2 Florent Angly > To execute R commands from Perl, you can also try Statistics::R ( > http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm), > which has been around for longer, and which I have recently refactored. > Regards, > Florent > > > On 02/04/12 22:59, Fields, Christopher J wrote: > >> Not sure how well it is supported, but there is also Statistics::useR >> (which has an XS layer for conversing with R).
>> >> https://metacpan.org/module/Statistics::useR >> >> chris >> >> On Apr 2, 2012, at 3:17 AM, Adam Witney wrote: >> >> The quickest way to do this specific example is probably to just save >>> your R commands to a file (eg R_commands.R) and then run some kind of >>> system/exec/backtick function in your perl script to invoke R, something >>> like (untested): >>> >>> #!/usr/bin/perl >>> use strict; >>> use warnings; >>> system( 'R --file R_commands.R' ); >>> >>> Alternatively if you want perl and R to be able to interact and pass >>> data back and forth, you could use something like Statistics::R >>> >>> http://search.cpan.org/~fangly/Statistics-R-0.27/lib/Statistics/R.pm >>> >>> HTH >>> >>> adam >>> >>> On 2 Apr 2012, at 06:21, José Ramón Blas Pastor wrote: >>> >>> Hi, >>>> >>>> a very simple doubt, but I do not know how to manage this. >>>> >>>> I want to plot a histogram for all data in 'datos.txt'. >>>> >>>> a) by using R: >>>> datos<-scan("datos.txt") >>>> pdf("xh.pdf") >>>> hist(datos) >>>> dev.off() >>>> >>>> >>>> b) How could I invoke R inside Perl to do the same?? >>>> #!/usr/bin/perl >>>> open(DAT,"datos.txt"); >>>> while (<DAT>) { >>>> chomp; >>>> push(@datos,$_); >>>> } >>>> #now I want a histogram of values in @datos >>>> >>>> Thanks!! >>>> >>>> JR >>>> >>>> -- >>>> José Ramón Blas - PhD >>>> Dept. Biochemistry - Medicine School >>>> University of Castilla-La Mancha >>>> C Almansa, 14 >>>> 02006 Albacete (Spain) >>>> >>>> Phone: +34 967599200 ext.
2958 >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Yifei Huang Department of Biology McMaster University From gregonomic at yahoo.co.nz Tue Apr 3 00:37:38 2012 From: gregonomic at yahoo.co.nz (Gregory Baillie) Date: Mon, 2 Apr 2012 17:37:38 -0700 (PDT) Subject: [Bioperl-l] data into R from Perl arrays In-Reply-To: References: Message-ID: <1333413458.84973.YahooMailNeo@web112506.mail.gq1.yahoo.com> Hi. If you only want to plot histograms (or other kinds of graphs), and don't need to use the statistics capabilities of R, then gnuplot (http://www.gnuplot.info/) is a good option. You can construct a string of gnuplot commands and your data, and pipe that directly to gnuplot, without having to write the commands or data to separate files and then use a system call to R. Of course, you do have to learn gnuplot, which may be more trouble than it's worth (I like it, but some people don't). Greg. ________________________________ From: José Ramón Blas Pastor To: bioperl-l at lists.open-bio.org; Bioperl mailing list Sent: Monday, 2 April 2012 3:21 PM Subject: [Bioperl-l] data into R from Perl arrays Hi, a very simple doubt, but I do not know how to manage this. I want to plot a histogram for all data in 'datos.txt'.
a) by using R: datos<-scan("datos.txt") pdf("xh.pdf") hist(datos) dev.off() b) How could I invoke R inside Perl to do the same?? #!/usr/bin/perl open(DAT,"datos.txt"); while (<DAT>) { chomp; push(@datos,$_); } #now I want a histogram of values in @datos Thanks!! JR -- José Ramón Blas - PhD Dept. Biochemistry - Medicine School University of Castilla-La Mancha C Almansa, 14 02006 Albacete (Spain) Phone: +34 967599200 ext. 2958 _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From shalabh.sharma7 at gmail.com Tue Apr 3 15:34:43 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Tue, 3 Apr 2012 11:34:43 -0400 Subject: [Bioperl-l] Downloading refseq genomes in batch Message-ID: Hi All, I am trying to download refseq genomes in batch. But instead of accession number i have genome names (=~ 500). Is there any way i can download them using some bioperl module ? Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From carandraug+dev at gmail.com Tue Apr 3 15:53:32 2012 From: carandraug+dev at gmail.com (Carnë Draug) Date: Tue, 3 Apr 2012 16:53:32 +0100 Subject: [Bioperl-l] Downloading refseq genomes in batch In-Reply-To: References: Message-ID: On 3 April 2012 16:34, shalabh sharma wrote: > Hi All, > I am trying to download refseq genomes in batch. But instead of > accession number i have genome names (=~ 500). > Is there any way i can download them using some bioperl module ? If you have their name/official symbol, then searching on the database should nly return one hit, therefore one UID. Make the search, get that number, and use it for download. The EUtilities module should do that. 
Carnë From shalabh.sharma7 at gmail.com Tue Apr 3 18:15:16 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Tue, 3 Apr 2012 14:15:16 -0400 Subject: [Bioperl-l] Downloading refseq genomes in batch In-Reply-To: References: Message-ID: Hi Came, Thanks for your reply. I tried to get UID from genome names but i cant find on EUtilities.
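Carnë's suggestion in this thread (search by name, take the UID, then download) could be sketched with Bio::DB::EUtilities roughly as follows. This is a minimal sketch with several assumptions: the names are organism names, the [Organism] Entrez field tag is the right restriction, and any narrowing clause (for example `refseq[filter]` or a title term) is left to the caller. A bare [Organism] search can return far more than one record per name. `organism_query` and `uids_for_names` are hypothetical helpers; `new` with `-eutil => 'esearch'` and `get_ids` are existing Bio::DB::EUtilities calls.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a free-text organism name into an Entrez
# query restricted to the [Organism] field, optionally ANDed with an
# extra clause chosen by the caller.
sub organism_query {
    my ($name, $extra) = @_;
    (my $clean = $name) =~ s/\A\s+|\s+\z//g;
    my $term = qq{"$clean"[Organism]};
    $term .= " AND $extra" if defined $extra && length $extra;
    return $term;
}

# Hypothetical helper: one esearch per name; returns { name => [UIDs] }.
# Bio::DB::EUtilities is loaded only when this sub is actually called.
sub uids_for_names {
    my ($db, $email, $extra, @names) = @_;
    require Bio::DB::EUtilities;
    my %uids;
    for my $name (@names) {
        my $factory = Bio::DB::EUtilities->new(
            -eutil  => 'esearch',
            -db     => $db,
            -term   => organism_query($name, $extra),
            -email  => $email,
            -retmax => 20,
        );
        $uids{$name} = [ $factory->get_ids ];
        sleep 1;    # pause between requests to stay polite to NCBI
    }
    return \%uids;
}
```

Before passing the results on to efetch, it is worth checking that each name produced exactly one UID, since ambiguous or overly broad names are the likely failure case.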
I have taxa id for those genomes, can i download genomes with taxa id in batch ? Thanks Shalabh On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug wrote: > On 3 April 2012 16:34, shalabh sharma wrote: > > Hi All, > > I am trying to download refseq genomes in batch. But instead of > > accession number i have genome names (=~ 500). > > Is there any way i can download them using some bioperl module ? > > If you have their name/official symbol, then searching on the database > should nly return one hit, therefore one UID. Make the search, get > that number, and use it for download. The EUtilities module should do > that. > > Carnë > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From berry at exisoft.nl Tue Apr 3 20:24:54 2012 From: berry at exisoft.nl (Berry Kriesels) Date: Tue, 03 Apr 2012 22:24:54 +0200 Subject: [Bioperl-l] Google summer of code Bio::Structure Message-ID: <4F7B5C96.5090208@exisoft.nl> Dear all, Currently I am considering applying as a student for the 'google summer of code' and would like to contribute to BioPerl via this way. At the moment I am investigating extending the BioPerl Bio::Structure library in such a way that also some protein modelling can be done or at least add a method so one could do a pdb structure quality assessment. One way is to do it with the use of online services such as for instance Prosaweb (and thus creating a wrapper for this service). Also I could make libraries which one could use to asses the phi and psi angles of certain atoms within a PDB file or the distance in angstrom among many other coordinate measurements within a protein PDB file but also among (comparison) of multiple PDB files. Also adding functions such as DOPE (Discrete Optimized Protein Energy) for model comparisons is an option. There are tons of options to add. However...
I have a few questions regarding this and hope some of you will be willing to answer: 1. As users of BioPerl would you consider extending the current Bio::Structure library as a added value or would you rather see effort made in different areas. 2. If one would see extension of the current Bio::Structure library as a useful project, what would your main interests and wishes be? Thank you for input and time. With kind regards, Berry Msc student Bio-informatics. From jovel_juan at hotmail.com Tue Apr 3 21:02:26 2012 From: jovel_juan at hotmail.com (Juan Jovel) Date: Tue, 3 Apr 2012 21:02:26 +0000 Subject: [Bioperl-l] Downloading refseq genomes in batch In-Reply-To: References: , , Message-ID: Hi Shalab You can try use Bio::DB::GenBank, but I believe the NCBI does not like people doing many remote lookups. I would advise you download the whole database you are interested in, and then you parse it locally. Cheers, Juan > Date: Tue, 3 Apr 2012 14:15:16 -0400 > From: shalabh.sharma7 at gmail.com > To: carandraug+dev at gmail.com > CC: Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Downloading refseq genomes in batch > > Hi Came, > Thanks for your reply. > I tried to get UID from genome names but i cant find on EUtilities. > I have taxa id for those genomes, can i download genomes with taxa id in > batch ? > > Thanks > Shalabh > > > On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug wrote: > > > On 3 April 2012 16:34, shalabh sharma wrote: > > > Hi All, > > > I am trying to download refseq genomes in batch. But instead of > > > accession number i have genome names (=~ 500). > > > Is there any way i can download them using some bioperl module ? > > > > If you have their name/official symbol, then searching on the database > > should nly return one hit, therefore one UID. Make the search, get > > that number, and use it for download. The EUtilities module should do > > that. > > > > Carnë
> > > > > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Apr 3 21:19:07 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 3 Apr 2012 21:19:07 +0000 Subject: [Bioperl-l] Downloading refseq genomes in batch In-Reply-To: References: , , Message-ID: 500 sequences isn't too bad for a remote lookup (I have run about ~20K myself). It's much easier if you can grab them as a batch, e.g. run esearch for the IDs, use efetch with the webenv/key to grab the sequences. NCBI is more worried about the number of requests made, the length of time between requests, and the time of day requests are made. In fact, I recall updating EUtilities recently so it can use a POST, so you can grab ~2000 seqs at a time w/o having to iterate through them. chris On Apr 3, 2012, at 4:02 PM, Juan Jovel wrote: > > Hi Shalab > You can try use Bio::DB::GenBank, but I believe the NCBI does not like people doing many remote lookups. I would advise you download the whole database you are interested in, and then you parse it locally. > Cheers, Juan >> Date: Tue, 3 Apr 2012 14:15:16 -0400 >> From: shalabh.sharma7 at gmail.com >> To: carandraug+dev at gmail.com >> CC: Bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Downloading refseq genomes in batch >> >> Hi Came, >> Thanks for your reply. >> I tried to get UID from genome names but i cant find on EUtilities. >> I have taxa id for those genomes, can i download genomes with taxa id in >> batch ? >> >> Thanks >> Shalabh >> >> >> On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug wrote: >> >>> On 3 April 2012 16:34, shalabh sharma wrote: >>>> Hi All, >>>> I am trying to download refseq genomes in batch. But instead of
But instead of >>>> accession number i have genome names (=~ 500). >>>> Is there any way i can download them using some bioperl module ? >>> >>> If you have their name/official symbol, then searching on the database >>> should nly return one hit, therefore one UID. Make the search, get >>> that number, and use it for download. The EUtilities module should do >>> that. >>> >>> Carn? >>> >> >> >> >> -- >> Shalabh Sharma >> Scientific Computing Professional Associate (Bioinformatics Specialist) >> Department of Marine Sciences >> University of Georgia >> Athens, GA 30602-3636 >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From shalabh.sharma7 at gmail.com Wed Apr 4 21:24:08 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Wed, 4 Apr 2012 17:24:08 -0400 Subject: [Bioperl-l] Weird efetch problem. Message-ID: Hi All, I am facing a really weird problem using efetch. I am getting different outputs if i am using different method of passing values. Like if i am using this method: #!/usr/bin/perl -w use Bio::DB::EUtilities; use Bio::SeqIO; my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', -db => 'protein', -rettype => 'fasta', -email => 'ssharmai at uga.edu', -id => '256009369'); my $file = 'genome.fasta'; $factory->get_Response(-file => $file); I am getting correct protein sequence but if i am passing values (same id) via an array i am getting nucleotide sequences. 
use Bio::DB::EUtilities; use Bio::SeqIO; my @ids; my $c = 0; open(IN,"$ARGV[0]"); while(<IN>){ my $id = $_; chomp($id);chop($id); $ids[$c] = $id; print "$id\n"; $c++; } close(IN); my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', -db => 'protein', -rettype => 'fasta', -email => 'ssharmai at uga.edu', -id => \@ids); my $file = 'genome.fasta'; Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From sidd.basu at gmail.com Thu Apr 5 10:31:47 2012 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Thu, 5 Apr 2012 05:31:47 -0500 Subject: [Bioperl-l] Re: Weird efetch problem. In-Reply-To: References: Message-ID: <20120405103146.GA5544@Macintosh-388.local> On Wed, 04 Apr 2012, shalabh sharma wrote: > Hi All, > I am facing a really weird problem using efetch. I am getting > different outputs if i am using different method of passing values. > > Like if i am using this method: > > #!/usr/bin/perl -w > use Bio::DB::EUtilities; > use Bio::SeqIO; > my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', > -db => 'protein', > -rettype => 'fasta', > -email => 'ssharmai at uga.edu', > -id => '256009369'); > > my $file = 'genome.fasta'; > $factory->get_Response(-file => $file); > > I am getting correct protein sequence but if i am passing values (same id) > via an array i am getting nucleotide sequences. > > use Bio::DB::EUtilities; > use Bio::SeqIO; > my @ids; > my $c = 0; > open(IN,"$ARGV[0]"); > while(<IN>){ > my $id = $_; > chomp($id);chop($id); > $ids[$c] = $id; > print "$id\n"; > $c++; > } > close(IN); > my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', > -db => 'protein', > -rettype => 'fasta', > -email => 'ssharmai at uga.edu', > -id => \@ids); Could you send the ids here.
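One detail of the array version above is worth ruling out first. `chomp()` already removes the trailing newline, so the `chop()` that follows removes the last character of every ID unless the file has Windows (CRLF) line endings. With Unix line endings, 256009369 would be sent as 25600936, which may explain the unexpected records. Below is a minimal sketch of a reader that strips only whitespace. The sub name `read_ids` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read one ID per line from an open filehandle.
# Leading and trailing whitespace (including the "\r" left by CRLF
# files) is removed and blank lines are skipped; the ID characters
# themselves are never touched, unlike chomp() followed by chop().
sub read_ids {
    my ($fh) = @_;
    my @ids;
    while (my $line = <$fh>) {
        $line =~ s/\A\s+|\s+\z//g;
        push @ids, $line if length $line;
    }
    return @ids;
}
```

In the script above, the read loop would become `open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!"; my @ids = read_ids($in);`, with `\@ids` passed to `-id` as before. Printing the IDs with delimiters, for example `print "[$_]\n" for @ids;`, makes truncated or padded values easy to spot.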
-siddhartha > > my $file = 'genome.fasta'; > > Thanks > Shalabh > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Apr 5 13:07:28 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 5 Apr 2012 13:07:28 +0000 Subject: [Bioperl-l] Weird efetch problem. In-Reply-To: <20120405103146.GA5544@Macintosh-388.local> References: <20120405103146.GA5544@Macintosh-388.local> Message-ID: On Apr 5, 2012, at 5:31 AM, Siddhartha Basu wrote: > On Wed, 04 Apr 2012, shalabh sharma wrote: > >> Hi All, >> I am facing a really weird problem using efetch. I am getting >> different outputs if i am using different method of passing values. >> ... >> >> I am getting correct protein sequence but if i am passing values (same id) >> via an array i am getting nucleotide sequences. >> >> .. > Could you send the ids here. > > -siddhartha And please file a bug report on this if something is found. I do know if you use accession numbers you can sometimes get odd results. I recommend only using UIDs (the GI in the case of protein and nuc seqs). chris From shalabh.sharma7 at gmail.com Thu Apr 5 14:40:06 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 5 Apr 2012 10:40:06 -0400 Subject: [Bioperl-l] Downloading refseq genomes in batch In-Reply-To: References: Message-ID: Hi All, Thanks for all the suggestions. Thanks a lot Chris, i am using your method to pull out genomes. Its working fine. Thanks Shalabh On Tue, Apr 3, 2012 at 5:19 PM, Fields, Christopher J wrote: > 500 sequences isn't too bad for a remote lookup (I have run about ~20K > myself). It's much easier if you can grab them as a batch, e.g. 
run > esearch for the IDs, use efetch with the webenv/key to grab the sequences. > NCBI is more worried about the number of requests made, the length of time > between requests, and the time of day requests are made. > > In fact, I recall updating EUtilities recently so it can use a POST, so > you can grab ~2000 seqs at a time w/o having to iterate through them. > > chris > > On Apr 3, 2012, at 4:02 PM, Juan Jovel wrote: > > > > > Hi Shalab > > You can try use Bio::DB::GenBank, but I believe the NCBI does not like > people doing many remote lookups. I would advise you download the whole > database you are interested in, and then you parse it locally. > > Cheers, Juan > >> Date: Tue, 3 Apr 2012 14:15:16 -0400 > >> From: shalabh.sharma7 at gmail.com > >> To: carandraug+dev at gmail.com > >> CC: Bioperl-l at lists.open-bio.org > >> Subject: Re: [Bioperl-l] Downloading refseq genomes in batch > >> > >> Hi Came, > >> Thanks for your reply. > >> I tried to get UID from genome names but i cant find on EUtilities. > >> I have taxa id for those genomes, can i download genomes with taxa id in > >> batch ? > >> > >> Thanks > >> Shalabh > >> > >> > >> On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug >wrote: > >> > >>> On 3 April 2012 16:34, shalabh sharma > wrote: > >>>> Hi All, > >>>> I am trying to download refseq genomes in batch. But instead of > >>>> accession number i have genome names (=~ 500). > >>>> Is there any way i can download them using some bioperl module ? > >>> > >>> If you have their name/official symbol, then searching on the database > >>> should nly return one hit, therefore one UID. Make the search, get > >>> that number, and use it for download. The EUtilities module should do > >>> that. > >>> > >>> Carnë
> >>> > >> > >> > >> > >> -- > >> Shalabh Sharma > >> Scientific Computing Professional Associate (Bioinformatics Specialist) > >> Department of Marine Sciences > >> University of Georgia > >> Athens, GA 30602-3636 > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From k.d.murray.91 at gmail.com Fri Apr 6 13:49:32 2012 From: k.d.murray.91 at gmail.com (Kevin Murray) Date: Fri, 6 Apr 2012 23:49:32 +1000 Subject: [Bioperl-l] sequence proxy server Message-ID: Hi all, I'm an undergrad student in molecular biology at the ANU in Australia, and my research projects are becoming increasingly bioinformatics heavy. The latest one has involved quite a large amount of sequence retrieval from GenBank and GenPept. The download speed to Australia from NCBI's servers is rather slow, and i've been thinking about how we can improve this. One solution would be to use Bio::DB::Flat with GenBank sequences on a local computer. However, in a situation where there are multiple people in a lab doing bioinformatics, it seems to me a bit of a waste to have the entire genbank/genpept database, or even the relevant sections thereof, on each computer. So, i though about writing a "sequence proxy" cgi script, and a corresponding module, which would work a bit like this: The user calls Bio::DB::SeqProxy::GenBank as they would Bio::DB::GenBank, with the exception that a parameter for the address of the sequence proxy server is required. 
The module then sends a request similar to that sent to NCBI's servers by calling Bio::DB::GenBank->get_Seq_by_x() to the sequence proxy server I believe all requests go to the efetch page now (please correct me if I'm wrong, i have read the relevant bioperl module code but not thoroughly), so the CGI script on the sequence proxy would take arguments in a similar fashion to make writing the client side module easier. The CGI script would use a Bio::DB::Flat database, or an interface to an SQL database to determine if the required sequence is stored locally. (as a aside, i'd like your thoughts on Bio::DB::Flat vs Bio::DB::Sql or similar) If the sequence exists locally, it would be returned to the user, either as plain text, or inside an XML container (see below). If not, it would be retrieved from the remote database using the relevant Bio::DB module, and returned. The sequence would either be returned as the relevant sequence format (which would default to GenBank format) in plain text, or as an XML document similar to: 1 ___YOUR GENBANK FILE HERE___ Local Database The aim of the xml document would be to simplify handling of server errors and allow for the specification of other metadata such as which database the sequence came from. Firstly, I'd like to know if this sounds feasible, and if so, if someone is already working on something similar? I don't want to reinvent the wheel. Secondly, I'd like to ask for your comments and advice. Being reasonably new to bioperl (started using bioperl about 6 months ago, but I've been coding in various languages for 8 years) I don't expect to have considered things that may seem obvious to a more experienced bioperl-er, so please be as brutally constructive in your criticism as you see fit =]. I know this is alot of questions, so thanks in advance for your help. Cheers, and a happy Easter to those who celebrate it. 
Regards Kevin Murray From shalabh.sharma7 at gmail.com Fri Apr 6 14:52:30 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 6 Apr 2012 10:52:30 -0400 Subject: [Bioperl-l] Question about EUtils esearch Message-ID: Hi All, I am trying to get all the UIDs for few genomes. For example: for 'Homo sapiens" if i use esearch i get "3277310" UIDs but if i go to NCBI genome page i see there are only =~ 32,00 proteins. http://www.ncbi.nlm.nih.gov/genome?term=Homo%20sapiens I have done this for lot of genomes and i am afraid that i have to do this again. Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From shalabh.sharma7 at gmail.com Fri Apr 6 18:27:29 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 6 Apr 2012 14:27:29 -0400 Subject: [Bioperl-l] Downloading refseq genomes in batch In-Reply-To: References: Message-ID: Hi Chris, I am using the method you suggested. But i have a question. The UIDs that i am searching using "esearch" are not same as the number of proteins in that genome. For Example: for 'Homo sapiens" if i use esearch i get "3277310" UIDs but if i go to NCBI genome page i see there are only =~ 32,00 proteins. http://www.ncbi.nlm.nih.gov/genome?term=Homo%20sapiens Thanks Shalabh On Thu, Apr 5, 2012 at 10:40 AM, shalabh sharma wrote: > Hi All, > Thanks for all the suggestions. > Thanks a lot Chris, i am using your method to pull out genomes. Its > working fine. > > Thanks > Shalabh > > > On Tue, Apr 3, 2012 at 5:19 PM, Fields, Christopher J < > cjfields at illinois.edu> wrote: > >> 500 sequences isn't too bad for a remote lookup (I have run about ~20K >> myself). It's much easier if you can grab them as a batch, e.g. run >> esearch for the IDs, use efetch with the webenv/key to grab the sequences. 
>> NCBI is more worried about the number of requests made, the length of time >> between requests, and the time of day requests are made. >> >> In fact, I recall updating EUtilities recently so it can use a POST, so >> you can grab ~2000 seqs at a time w/o having to iterate through them. >> >> chris >> >> On Apr 3, 2012, at 4:02 PM, Juan Jovel wrote: >> >> > >> > Hi Shalab >> > You can try use Bio::DB::GenBank, but I believe the NCBI does not like >> people doing many remote lookups. I would advise you download the whole >> database you are interested in, and then you parse it locally. >> > Cheers, Juan >> >> Date: Tue, 3 Apr 2012 14:15:16 -0400 >> >> From: shalabh.sharma7 at gmail.com >> >> To: carandraug+dev at gmail.com >> >> CC: Bioperl-l at lists.open-bio.org >> >> Subject: Re: [Bioperl-l] Downloading refseq genomes in batch >> >> >> >> Hi Came, >> >> Thanks for your reply. >> >> I tried to get UID from genome names but i cant find on EUtilities. >> >> I have taxa id for those genomes, can i download genomes with taxa id >> in >> >> batch ? >> >> >> >> Thanks >> >> Shalabh >> >> >> >> >> >> On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug > >wrote: >> >> >> >>> On 3 April 2012 16:34, shalabh sharma >> wrote: >> >>>> Hi All, >> >>>> I am trying to download refseq genomes in batch. But instead >> of >> >>>> accession number i have genome names (=~ 500). >> >>>> Is there any way i can download them using some bioperl module ? >> >>> >> >>> If you have their name/official symbol, then searching on the database >> >>> should nly return one hit, therefore one UID. Make the search, get >> >>> that number, and use it for download. The EUtilities module should do >> >>> that. >> >>> >> >>> Carnë
>> >>> >> >> >> >> >> >> >> >> -- >> >> Shalabh Sharma >> >> Scientific Computing Professional Associate (Bioinformatics Specialist) >> >> Department of Marine Sciences >> >> University of Georgia >> >> Athens, GA 30602-3636 >> >> >> >> _______________________________________________ >> >> Bioperl-l mailing list >> >> Bioperl-l at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From cjfields at illinois.edu Fri Apr 6 19:09:23 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 6 Apr 2012 19:09:23 +0000 Subject: [Bioperl-l] Question about EUtils esearch In-Reply-To: References: Message-ID: Shalabh, You should try getting the specific genome project ID of interest, linking to the proteins, and then grab those. The EUtilities cookbook has a few examples on how to do that. chris On Apr 6, 2012, at 9:52 AM, shalabh sharma wrote: > Hi All, > I am trying to get all the UIDs for few genomes. > For example: > for 'Homo sapiens" if i use esearch i get "3277310" UIDs but if i go to > NCBI genome page i see there are only =~ 32,00 proteins. > http://www.ncbi.nlm.nih.gov/genome?term=Homo%20sapiens > > I have done this for lot of genomes and i am afraid that i have to do this > again. 
> > Thanks > Shalabh > > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From wrp at virginia.edu Sat Apr 7 20:56:16 2012 From: wrp at virginia.edu (William Pearson) Date: Sat, 7 Apr 2012 16:56:16 -0400 Subject: [Bioperl-l] Bioperl-l Digest, Vol 108, Issue 7 In-Reply-To: References: Message-ID: To get the UIDs (GIs) that you want, search for human[organism] AND srcdb_refseq[Properties] This will get you the refseq proteins you want. Bill Pearson > Message: 1 > Date: Fri, 6 Apr 2012 14:27:29 -0400 > From: shalabh sharma > Subject: Re: [Bioperl-l] Downloading refseq genomes in batch > > Hi Chris, > I am using the method you suggested. > But i have a question. The UIDs that i am searching using "esearch" are not > same as the number of proteins in that genome. > > For Example: > for 'Homo sapiens" if i use esearch i get "3277310" UIDs but if i go to > NCBI genome page i see there are only =~ 32,00 proteins. > http://www.ncbi.nlm.nih.gov/genome?term=Homo%20sapiens > > Thanks > Shalabh > From joel.klein at wur.nl Sun Apr 8 23:35:18 2012 From: joel.klein at wur.nl (Bradyjoel) Date: Sun, 8 Apr 2012 16:35:18 -0700 (PDT) Subject: [Bioperl-l] Need some help with : Error "Use of uninitialized value" Message-ID: <33653318.post@talk.nabble.com> Hi all, I have little experiences in programming with Perl/Bioperl. I'm currently working on a script that takes a whole genome from a bacteria as input, converts it into a multiple fasta file containing all the open reading frames and blast it against a multiple protein fasta file with know proteins. 
When I get a hit I want to combine the header of the known protein with the orf sequence, here it gives an error when I try to go through the orf file and extract the right corresponding sequence. The error it gives is : Use of uninitialized value $seq in print at blastscript.pl line .. Is there someone who has an idea what caused this error, and can help me with solving it? Regards, Joel (I put my script in the attachment) http://old.nabble.com/file/p33653318/blastscript.pl blastscript.pl -- View this message in context: http://old.nabble.com/Need-some-help-with-%3A-Error-%22Use-of-uninitialized-value%22-tp33653318p33653318.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From afia.hayati at gmail.com Thu Apr 5 04:52:01 2012 From: afia.hayati at gmail.com (afia hayati) Date: Thu, 5 Apr 2012 13:52:01 +0900 Subject: [Bioperl-l] Proposal of GSoC 2012 >> ACEtoSAM and SAMtoACE Message-ID: Dear all, I am afia, a PhD student in Bioinformatics. I am so interested to participate GSoC and would like to apply for project Bio::Assembly, Ace2Sam and Sam2Ace converter. I have written a proposal based on the guidance for prospective GSoC student. I paste my proposal in here. If you have time, please give me suggestions. Thank you very much. Sincerely, Afiahayati [~Be Passion, Patient and Persistent~] *Google Summer of Code 2012* *Proposal* *ACEtoSAM and SAMtoACE* 1. *Contact information * 1. Full name :Afiahayati 2. Address : Hiyoshi International House (Room C301-1), 223-0061 Yokohama-shi Kouhoku-ku, Hiyoshi 2-27 Kanagawa, Japan 3. Email : afia.hayati at gmail.com 4. Phone number : 818044637237 5. IRC nick : afia 2. *Motivation to join this project * I am a PhD student in bioinformatics. My research is in genome assembly, especially metagenome assembly. I have same idea that the converter from ACEtoSAM and vice versa is very useful. I am familiar with Perl and BioPerl, so there is no reason for not participating in this project 3.
*Programming experience and skills * 1. Perl also BioPerl since January 2010 2. R, since January 2008 3. Oracle, since January 2008 4. Biojava, since January 2007 5. PHP , since January 2006 6. C++, since January 2006 7. Java, since January 2006 8. MySQL, since January 2005 9. C , since January 2005 4. *Open source projects involved with * 1. Metagenome Assembly, 2012 (with supervisor) Develop de novo assembler for metagenomic data from short sequence reads Using C, C++ and Perl 2. Develop some interfaces in RCommander, 2010 (in team) 3. Computer system of academic hospital, 2009 (in team) By modifying an open source hospital information system, Care2x Using PHP, Java script and HTML 4. Academic data warehouse and data mining, 2008 (in team) Using Pentaho Business Analytics and R programming language 5. *Project Plan * 1. *Before April 23 * 1. Study the format of SAM and ACE more detail 2. Study the biodesign related to module Bio::Assembly::IO especially Bio::Assembly::IO::ACE and Bio::Assembly::IO::SAM 3. Study the documentation and the code of module Bio::Assembly::IO::ACE and Bio::Assembly::IO::SAM 2. *April 23 - May 20 (before official coding period) * 1. To do self coding for Bio::Assembly::IO::ACE and Bio::Assembly::IO::SAM to improve my understand. 2. Keep contact with my mentor and the BioPerl community. I will active in mailing list and IRC to confirm my understanding about Bio::Assembly::IO::ACE and Bio::Assembly::IO::SAM and also discuss the operations (the methods) needed for a module ACEtoSAM and SAMtoACE converting. 3. With the supervision from my mentor, try to determine the appropriate design of module ACEtoSAM and SAMtoACE converting. 3. *May 21 - June 21 * 1. Determine the final design of module ACEtoSAM and SAMtoACE converting. 2. Code the module ACEtoSAM and SAMtoACE converting 3. Test my code by myself 4. Discuss with my mentor to design good test 5. Test my code based on the test design 4. *June 22 - July 8 * 1. 
Discuss with my mentor about my code in order to publish in bioperl community 2. Publish my code to the community and learn the feedback *JULY 9 MID TERM EVALUATION * 5. *July 9 - August 5 * 1. Improving the code (do iteration activities) : 1. Keep contact with the community, learn the feedback 2. Make changes in the code, with the supervision from my mentor 3. Test the code and publish the code to the community 2. Finalize the code 3. Start writing the POD documentation 6. *August 6 - August 13 * For final documentation *A buffer of a week for unpredicted delay * *AUGUST 20 FINAL EVALUATION* From zwyl001 at aucklanduni.ac.nz Sun Apr 1 03:35:51 2012 From: zwyl001 at aucklanduni.ac.nz (Zachariah Wylde) Date: Sun, 1 Apr 2012 15:35:51 +1200 Subject: [Bioperl-l] Output of a BLAST parse to text file Message-ID: Hi there, I am very new to Bioperl, so excuse me if come across as simple! I need to write a bioperl script to extract information from BLAST results. The script needs to count how many HSPs are on each mouse chromosome and be written to a tab-separated table. I have this so far, but do not understand how to sort the information. I would much, appreciate if you could help me?? 
Yours sincerely, Zac Wylde use strict; use warnings; use lib "C:/Program Files (x86)/BioPerl"; use Bio::SearchIO; my $infile = "Alignment_Ref_Seq.txt"; open INFILE, $infile or die "Cannot open $infile: $!"; my $outfile = "assignment2.txt"; open OUTFILE, ">$outfile" or die "Cannot open $outfile: $!"; my $parser = new Bio::SearchIO(-format => 'blast', -file => 'Alignment_Ref_Seq.txt'); while (my $result = $parser->next_result){ while (my $hit = $result->next_hit){ while (my $hsp = $hit->next_hsp){ if ($hit->description =~ /(mus musculus)|(mouse)/i){ if ($hit->description =~ /chromosome (\w+)/){ print "Hit = ", $hit->name, " \t", "chromosome = ", $1, " \t", "HSPs = ", $hit->num_hsps, "\n"; } } } } } close INFILE; close OUTFILE; #unknown #chromosome from From heath.obrien at gmail.com Tue Apr 3 16:56:31 2012 From: heath.obrien at gmail.com (Heath O'Brien) Date: Tue, 3 Apr 2012 16:56:31 +0000 (UTC) Subject: [Bioperl-l] =?utf-8?q?problem_with_trunc=5Fwith=5Ffeatures_=28Seq?= =?utf-8?b?VXRpbHMucG0p?= Message-ID: Hi All, I've encountered a bug in the trunc_with_features function in SeqUtils.pm, or at least behavior that was unexpected to me: Features with fuzzy coordinates in the original sequence are converted to exact coordinates in the truncated sequence. For example, the script below changes the coordinates for the feature from <1..5 to 1..5. I have modified the code to change this behavior on my system, but I thought I'd post something here in case others encounter the same problem. all good things, Heath #!/usr/bin/perl -w use strict; use warnings; use Bio::SeqIO; use Bio::SeqUtils; my $infile= shift; my $inIO = Bio::SeqIO->new('-file' => $infile, '-format' => 'genbank') or die "could not open seq file $infile\n"; my $outfile = $infile . 
'_out.gbk'; my $outIO = Bio::SeqIO->new('-file' => ">$outfile", '-format' => 'genbank') or die "could not open seq file $outfile\n"; my $in_seq = $inIO->next_seq; my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); $outIO->write_seq($out_seq); exit; LOCUS test_sequence 57303 bp DNA linear UNA DEFINITION Sequence to demonstrate unexpected behavior of trunc_with_features ACCESSION unknown KEYWORDS . FEATURES Location/Qualifiers source 1..10 /mol_type="genomic DNA" gene <1..5 /gene="test" CDS <1..5 /product="hypothetical protein" ORIGIN 1 caagattaaa // From mkhalfan at cshl.edu Thu Apr 5 19:29:35 2012 From: mkhalfan at cshl.edu (Khalfan, Mohammed) Date: Thu, 5 Apr 2012 19:29:35 +0000 Subject: [Bioperl-l] Bio::AlignIO->add_new ORDER? Message-ID: <7A8AC28B954CFA428BD67E1376B889A21F0077FC@EX-HS-MBX03.cshl.edu> Hi, I am having a problem trying to add a new sequence to an alignment using the order parameter. I would like to add the sequence to the first position (row) in the alignment, is this possible? Here is what my code looks like: use Bio::AlignIO; use Bio::LocatableSeq; use Bio::SimpleAlign; my $in = Bio::AlignIO->new( -file => $infile ); # $infile is the output from muscle my $aln = $in->next_aln; # build a consensus from the current alignment my $consensus = $aln->consensus_string(); # make the consensus sequence obtained in the above step into a LocatableSeq object my $consensus_obj = new Bio::LocatableSeq ( -seq => $consensus, -id => 'Consensus', -start => 1, -end => length($consensus), ); # add consensus sequence to alignment $aln->add_seq($consensus_obj, 1); ## END CODE ## I have tried $aln->add_seq(seq=>$consensus_obj, order=1); $aln->add_seq(SEQ=>$consensus_obj, ORDER=1); But no luck, I cannot get the consensus listed as the first sequence in the alignment. Is this possible? I can add it in like this successfully, but it adds it to the end, which is not what I need. 
$aln->add_seq($consensus_obj); These are the errors I get: Using this syntax: $aln->add_seq($consensus_obj, 1); I get this error: Use of uninitialized value in subtraction (-) at /usr/lib/perl5/site_perl/5.8.8/Bio/SimpleAlign.pm line 327, line 132. Using this syntax: $aln->add_seq(seq=>$consensus_obj, order=1); I get this error: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Unable to process non locatable sequences [] STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 STACK: Bio::SimpleAlign::add_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SimpleAlign.pm:297 STACK: ./muscle_post_processor.pl:49 ----------------------------------------------------------- Any assistance would be much appreciated. Thank you. From jason.stajich at gmail.com Mon Apr 9 19:52:43 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 9 Apr 2012 14:52:43 -0500 Subject: [Bioperl-l] Need some help with : Error "Use of uninitialized value" In-Reply-To: <33653318.post@talk.nabble.com> References: <33653318.post@talk.nabble.com> Message-ID: <74341B2D-5EC2-4421-B66B-F0193CA4FB52@gmail.com> You really want to create sequence object(s) an pass these into the BLAST factory. I also can't figure out why you are manually parsing the EMBL file and then using SeqIO later. Why not use SeqIO to parse the embl/genbank file? You also don't report the line number of your current problem, but one can surmise it is here: my $seq = $db->seq($id); print $seq,"\n"; The error indicates you are looking up a sequence ID that doesn't exist since you get an undefined sequence. I would suggest printing out the name of the ID you are asking for to make sure it is correct. 
Typically we protect these queries like if( my $seqstr = $db->seq($id) ) { print $seqstr, "\n"; } else { warn "cannot find $id in sequence db file\n"; } I think you have not really structured your logic well enough in that loop - you only want to build Bio::DB::Fasta once, the whole point is to index once and then query it multiple times. You might consider starting with this code which does a lot of the stuff you are trying to do to extract annotated features. https://github.com/bioperl/bioperl-live/blob/master/scripts/seq/bp_extract_feature_seq.pl I think you are also using tr wrong - if you want to replace a string with an empty string you should use s/// and you also need to escape the | character since it has special meaning. I guess in your case you just want the sequence - you would use Bio::SeqIO to read in your sequence and then pass this back out as FASTA to give to getorf. I don't know if we have a wrapper for EMBOSS's getorf. There are probably a lot more things that need some attention but you should start on these. Jason On Apr 8, 2012, at 6:35 PM, Bradyjoel wrote: > > Hi all, > > I have little experiences in programming with Perl/Bioperl. I'm currently > working on a script that takes a whole genome from a bacteria as input, > converts it into a multiple fasta file containing all the open reading > frames and blast it against a multiple protein fasta file with know > proteins. When I get a hit I want to combine the header of the known protein > with the orf sequence, here it gives an error when I try to go through the > orf file and extract the right corresponding sequence. The error it gives is > : Use of uninitialized value $seq in print at blastscript.pl line .. > Is there someone who has an idea what caused this error, and can help me > with solving it?
> > Regards, Joel (I put my script in the attachment) > http://old.nabble.com/file/p33653318/blastscript.pl blastscript.pl > -- > View this message in context: http://old.nabble.com/Need-some-help-with-%3A-Error-%22Use-of-uninitialized-value%22-tp33653318p33653318.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From jason.stajich at gmail.com Mon Apr 9 19:57:52 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 9 Apr 2012 14:57:52 -0500 Subject: [Bioperl-l] Bio::AlignIO->add_new ORDER? In-Reply-To: <7A8AC28B954CFA428BD67E1376B889A21F0077FC@EX-HS-MBX03.cshl.edu> References: <7A8AC28B954CFA428BD67E1376B889A21F0077FC@EX-HS-MBX03.cshl.edu> Message-ID: You cannot use order=1, it would have to be order => 1 as you are passing in a hash not an assignment. However, I think the rearrange function that parses arguments prefers a leading '-' so it should be -order => 1. Same thing for -seq=>$seq not seq=$seq Did you try using exactly what is in the perldoc? Title : add_seq Usage : $myalign->add_seq($newseq); $myalign->add_seq(-SEQ=>$newseq, -ORDER=>5); Function : Adds another sequence to the alignment. *Does not* align it - just adds it to the hashes. If -ORDER is specified, the sequence is inserted at the the position spec'd by -ORDER, and existing sequences are pushed down the storage array. Returns : nothing Args : A Bio::LocatableSeq object Positive integer for the sequence position (optional) Also - I am not sure what version of the code you are using, that line error you report is not in the current code so you may have to print out what is on those lines or consider upgrading to latest version of the code. 
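A minimal pure-Perl sketch of the -ORDER insertion semantics described in the perldoc above, assuming nothing beyond core Perl. The helper insert_at_order() is hypothetical and is not part of BioPerl; it only shows how '-KEY => value' named arguments arrive as a hash and how inserting at a position pushes the existing rows down. The perldoc quoted above does not say whether -ORDER counts from 0 or 1; this sketch assumes 1, so check the row order of a real alignment after calling $aln->add_seq(-SEQ => $consensus_obj, -ORDER => ...).

```perl
use strict;
use warnings;

# insert_at_order() is a hypothetical helper, NOT part of BioPerl.
# It mimics the documented -ORDER behaviour of Bio::SimpleAlign::add_seq:
# with -ORDER the new row goes to that position and later rows move down;
# without it the row is appended at the end.
sub insert_at_order {
    my %args = @_;                          # '-KEY => value' pairs become a hash
    my $rows = $args{'-ROWS'} or die "-ROWS is required\n";
    my $seq  = $args{'-SEQ'};
    die "-SEQ is required\n" unless defined $seq;

    if (defined $args{'-ORDER'}) {
        # Assumption: ORDER is a 1-based row number; clamp it to the array.
        my $idx = $args{'-ORDER'} - 1;
        $idx = 0      if $idx < 0;
        $idx = @$rows if $idx > @$rows;
        splice @$rows, $idx, 0, $seq;       # insert, pushing later rows down
    }
    else {
        push @$rows, $seq;                  # no -ORDER: append
    }
    return $rows;
}

my @rows = qw(seqA seqB seqC);
insert_at_order(-ROWS => \@rows, -SEQ => 'Consensus', -ORDER => 1);
print join(',', @rows), "\n";               # Consensus,seqA,seqB,seqC
```

The fat comma (=>) is what turns -ORDER and its value into a key/value pair; a plain "=" is an assignment and never reaches the method as an argument name.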
On Apr 5, 2012, at 2:29 PM, Khalfan, Mohammed wrote: > Hi, > > I am having a problem trying to add a new sequence to an alignment using the order parameter. > > I would like to add the sequence to the first position (row) in the alignment, is this possible? Here is what my code looks like: > > use Bio::AlignIO; > use Bio::LocatableSeq; > use Bio::SimpleAlign; > > my $in = Bio::AlignIO->new( -file => $infile ); # $infile is the output from muscle > > my $aln = $in->next_aln; > > # build a consensus from the current alignment > my $consensus = $aln->consensus_string(); > > # make the consensus sequence obtained in the above step into a LocatableSeq object > my $consensus_obj = new Bio::LocatableSeq ( > -seq => $consensus, > -id => 'Consensus', > -start => 1, > -end => length($consensus), > ); > > # add consensus sequence to alignment > $aln->add_seq($consensus_obj, 1); > > ## END CODE ## > > I have tried > $aln->add_seq(seq=>$consensus_obj, order=1); > $aln->add_seq(SEQ=>$consensus_obj, ORDER=1); > > But no luck, I cannot get the consensus listed as the first sequence in the alignment. Is this possible? > > I can add it in like this successfully, but it adds it to the end, which is not what I need. > $aln->add_seq($consensus_obj); > > These are the errors I get: > > Using this syntax: $aln->add_seq($consensus_obj, 1); > I get this error: > Use of uninitialized value in subtraction (-) at /usr/lib/perl5/site_perl/5.8.8/Bio/SimpleAlign.pm line 327, line 132. 
> > Using this syntax: $aln->add_seq(seq=>$consensus_obj, order=1); > I get this error: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Unable to process non locatable sequences [] > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::SimpleAlign::add_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SimpleAlign.pm:297 > STACK: ./muscle_post_processor.pl:49 > ----------------------------------------------------------- > > Any assistance would be much appreciated. Thank you. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From heath.obrien at gmail.com Mon Apr 9 21:37:56 2012 From: heath.obrien at gmail.com (Heath O'Brien) Date: Mon, 9 Apr 2012 17:37:56 -0400 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <4F8352DB.6060106@sanger.ac.uk> References: <4F8352DB.6060106@sanger.ac.uk> Message-ID: <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> Hi Frank, I just tried it with the latest version from bioperl-live, and it worked the way I described in my email. all good things, Heath On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: > Hi Heath, > > I have recently worked a bit on that module and contributed the code > to bioperl-live. I think this behaviour may already have changed but > I'm not 100% sure at the moment. When I have some time I will review > the code to confirm. In the meantime, you could give it a go with > the bioperl-live version if that's an option for you? 
> > Cheers, > > Frank > > > On 03/04/12 17:56, Heath O'Brien wrote: >> Hi All, >> >> I've encountered a bug in the trunc_with_features function in >> SeqUtils.pm, or at >> least behavior that was unexpected to me: >> >> Features with fuzzy coordinates in the original sequence are >> converted to exact >> coordinates in the truncated sequence. For example, the script >> below changes the >> coordinates for the feature from<1..5 to 1..5. >> >> I have modified the code to change this behavior on my system, but >> I thought I'd >> post something here in case others encounter the same problem. >> >> all good things, >> Heath >> >> >> >> #!/usr/bin/perl -w >> >> use strict; >> use warnings; >> use Bio::SeqIO; >> use Bio::SeqUtils; >> >> my $infile= shift; >> >> my $inIO = Bio::SeqIO->new('-file' => $infile, >> '-format' => 'genbank') or die "could not open seq file >> $infile\n"; >> >> my $outfile = $infile . '_out.gbk'; >> >> my $outIO = Bio::SeqIO->new('-file' => ">$outfile", >> '-format' => 'genbank') or die "could not open seq file >> $outfile\n"; >> >> my $in_seq = $inIO->next_seq; >> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >> $outIO->write_seq($out_seq); >> exit; >> >> >> LOCUS test_sequence 57303 bp DNA linear UNA >> DEFINITION Sequence to demonstrate unexpected behavior of >> trunc_with_features >> ACCESSION unknown >> KEYWORDS . >> FEATURES Location/Qualifiers >> source 1..10 >> /mol_type="genomic DNA" >> gene<1..5 >> /gene="test" >> CDS<1..5 >> /product="hypothetical protein" >> ORIGIN >> 1 caagattaaa >> // >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. 
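One way to picture the behaviour Heath expected: when trunc_with_features shifts a feature into the coordinates of the subsequence, the '<' and '>' partial markers should travel with the numbers. The sketch below works on plain GenBank location strings rather than BioPerl location objects (in BioPerl the markers are carried by Bio::Location::Fuzzy). remap_location() is a hypothetical helper, handles only simple 'start..end' locations, and does not deal with features that run past the truncation boundaries.

```perl
use strict;
use warnings;

# remap_location() is a hypothetical helper, NOT part of Bio::SeqUtils.
# It maps a simple GenBank location ("<1..5", "10..>20", "3..7") into the
# coordinates of a subsequence that begins at $trunc_start (1-based),
# keeping the '<' / '>' fuzzy markers instead of dropping them.
sub remap_location {
    my ($loc, $trunc_start) = @_;
    my ($before, $start, $after, $end) = $loc =~ /^(<?)(\d+)\.\.(>?)(\d+)$/
        or die "unsupported location: $loc\n";
    my $offset = $trunc_start - 1;          # new = old - trunc_start + 1
    return sprintf '%s%d..%s%d',
        $before, $start - $offset, $after, $end - $offset;
}

print remap_location('<1..5', 1), "\n";     # <1..5  (marker kept)
print remap_location('10..>20', 6), "\n";   # 5..>15
```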
From fs5 at sanger.ac.uk Mon Apr 9 21:21:31 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Mon, 09 Apr 2012 22:21:31 +0100 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: References: Message-ID: <4F8352DB.6060106@sanger.ac.uk> Hi Heath, I have recently worked a bit on that module and contributed the code to bioperl-live. I think this behaviour may already have changed but I'm not 100% sure at the moment. When I have some time I will review the code to confirm. In the meantime, you could give it a go with the bioperl-live version if that's an option for you? Cheers, Frank On 03/04/12 17:56, Heath O'Brien wrote: > Hi All, > > I've encountered a bug in the trunc_with_features function in SeqUtils.pm, or at > least behavior that was unexpected to me: > > Features with fuzzy coordinates in the original sequence are converted to exact > coordinates in the truncated sequence. For example, the script below changes the > coordinates for the feature from<1..5 to 1..5. > > I have modified the code to change this behavior on my system, but I thought I'd > post something here in case others encounter the same problem. > > all good things, > Heath > > > > #!/usr/bin/perl -w > > use strict; > use warnings; > use Bio::SeqIO; > use Bio::SeqUtils; > > my $infile= shift; > > my $inIO = Bio::SeqIO->new('-file' => $infile, > '-format' => 'genbank') or die "could not open seq file $infile\n"; > > my $outfile = $infile . '_out.gbk'; > > my $outIO = Bio::SeqIO->new('-file' => ">$outfile", > '-format' => 'genbank') or die "could not open seq file $outfile\n"; > > my $in_seq = $inIO->next_seq; > my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); > $outIO->write_seq($out_seq); > exit; > > > LOCUS test_sequence 57303 bp DNA linear UNA > DEFINITION Sequence to demonstrate unexpected behavior of trunc_with_features > ACCESSION unknown > KEYWORDS . 
> FEATURES Location/Qualifiers > source 1..10 > /mol_type="genomic DNA" > gene<1..5 > /gene="test" > CDS<1..5 > /product="hypothetical protein" > ORIGIN > 1 caagattaaa > // > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From longbow0 at gmail.com Tue Apr 10 04:40:16 2012 From: longbow0 at gmail.com (longbow leo) Date: Mon, 9 Apr 2012 23:40:16 -0500 Subject: [Bioperl-l] Strange behavior of $node->height for scientific notation format branch length Message-ID:

Hi all,

I have encountered a strange behavior while calculating the tree height at the root node.

If a tree's branch lengths are written in scientific notation, as in trees created by MrBayes, the root height is not calculated correctly.

For example,

Tree 1:

(((A:0.02,B:0.025):0.12,C:0.071):0.34,D:0.6);

Tree 2:

(((A:2e-2,B:2.5e-2):1.2e-1,C:7.1e-2):3.4e-1,D:6e-1);

These two trees are identical except for how the branch lengths are written.

The Perl script:

# ============================================================

#!/usr/bin/perl

use 5.010;
use strict;
use warnings;

use Bio::TreeIO;

my $usage = << "EOS";
Display branch lengths for leaf nodes.
Usage:
    t_branchlen.pl []
Params:
    : Tree file.
    : Tree format. Optional. Default "newick".
EOS

my ($ftre, $fmt) = @ARGV;

die $usage unless ( defined $ftre );

$fmt = 'newick' unless ( defined $fmt );

my $o_treei = Bio::TreeIO->new(
    -file   => $ftre,
    -format => $fmt,
);

my $o_tree = $o_treei->next_tree;

my @o_leaves = $o_tree->get_leaf_nodes();

say join("\t", ("Node", "Branch Length", "Depth"));

for my $o_node ( @o_leaves ) {
    say $o_node->id, "\t", $o_node->branch_length, "\t", $o_node->depth;
}

my $o_root = $o_tree->get_root_node;

# say;

say "Root height:\t", $o_root->height;

exit 0;

# ============================================================

For tree 1, the output is:

Node    Branch Length    Depth
A       0.02             0.48
B       0.025            0.485
C       0.071            0.411
D       0.6              0.6
*Root height: 0.6*

For tree 2:

Node    Branch Length    Depth
A       2e-2             0.48
B       2.5e-2           0.485
C       7.1e-2           0.411
D       6e-1             0.6
*Root height: 3*

Interestingly, the node depth values are correct, but I have no idea how the root height is calculated.

Are there any ideas to resolve this problem?

Thanks!

Haizhou

From jason.stajich at gmail.com Tue Apr 10 06:33:00 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 9 Apr 2012 23:33:00 -0700 Subject: [Bioperl-l] Strange behavior of $node->height for scientific notation format branch length In-Reply-To: References: Message-ID: <1839F94F-178E-44F2-8A5C-6E2657AAD59C@gmail.com>

It also looks like there is some code in calculating height that only accepts branch lengths written as plain decimal numbers (see line 64); any other value, including scientific notation such as 2e-2, is replaced by 1. I am not sure why this is in there, but I guess it was a protection from something that was failing in some other situation.
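To see where the root height of 3 comes from, here is a minimal pure-Perl model of the line-64 fallback quoted just below. It is not BioPerl code: each node is a plain hash with hypothetical `bl` (branch length string) and `kids` (child list) keys. With the original regex, every length in tree 2 fails the test and counts as 1, so the root height becomes the number of branches on the longest root-to-tip path (root to A or B), which is 3. Swapping in `looks_like_number` from the core `Scalar::Util` module is a hypothetical alternative, not an actual BioPerl patch; it accepts `2e-2` and gives 0.6, matching tree 1.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# The test quoted from line 64: only plain decimals such as "0.02" pass.
sub is_plain_decimal {
    my ($bl) = @_;
    return defined $bl && $bl =~ /^\-?\d+(\.\d+)?$/;
}

# Hypothetical alternative: looks_like_number() also accepts "2e-2".
# (It accepts strings such as "Inf" and "NaN" too, so a real fix may want
# to be stricter.)
sub is_numeric_length {
    my ($bl) = @_;
    return defined $bl && looks_like_number($bl);
}

# Toy model of height(): a leaf has height 0; otherwise take the maximum of
# (child height + branch length), replacing any length that fails $accept by 1.
sub height {
    my ($node, $accept) = @_;
    my $max = 0;
    for my $kid ( @{ $node->{kids} || [] } ) {
        my $bl = $kid->{bl};
        $bl = 1 unless $accept->($bl);
        my $h = height($kid, $accept) + $bl;
        $max = $h if $h > $max;
    }
    return $max;
}

# Tree 2: (((A:2e-2,B:2.5e-2):1.2e-1,C:7.1e-2):3.4e-1,D:6e-1);
my $tree2 = {
    kids => [
        {   bl   => '3.4e-1',
            kids => [
                { bl => '1.2e-1', kids => [ { bl => '2e-2' }, { bl => '2.5e-2' } ] },
                { bl => '7.1e-2' },
            ],
        },
        { bl => '6e-1' },
    ],
};

print "regex check:       ", height($tree2, \&is_plain_decimal),  "\n";   # 3
print "looks_like_number: ", height($tree2, \&is_numeric_length), "\n";   # 0.6
```

The toy tree mirrors only the part of the calculation that matters here; node depths are unaffected because they are not computed through this check.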
62:    foreach my $subnode ( $self->each_Descendent ) {
63:        my $bl = $subnode->branch_length;
64:        $bl = 1 unless (defined $bl && $bl =~ /^\-?\d+(\.\d+)?$/);
65:        my $s = $subnode->height + $bl;

You can work around this by first rewriting all your branch lengths as plain decimal numbers after you read the tree in:

# Rewrite each defined branch length as a plain decimal ("2e-2" becomes
# "0.020000") so that it passes the check on line 64. The root normally has
# no branch length, hence the defined test. "%f" keeps only six decimal
# places, so very small lengths (around 1e-6 or less) lose precision or round to 0.
for my $node ( $tree->get_nodes ) {
    my $bl = $node->branch_length;
    $node->branch_length( sprintf("%f", $bl) ) if defined $bl;
}

We should think about how we might handle scientific notation branch lengths properly in the code in the future if someone wants to take this on.

Jason

> Hi all, > > I have encountered a strange behavior while calculating the tree height at > root node. > > If the branch length of the tree was in scientific notation format, such as > MrBayes created trees, it is unable to give correct results. > > For example, > > Tree 1: > > (((A:0.02,B:0.025):0.12,C:0.071):0.34,D:0.6); > > Tree 2: > > (((A:2e-2,B:2.5e-2):1.2e-1,C:7.1e-2):3.4e-1,D:6e-1); > > These two trees are identical besides the expression of branch length. > > The Perl script: > > # ============================================================ > > #!/usr/bin/perl > > use 5.010; > use strict; > use warnings; > > use Bio::TreeIO; > > my $usage = << "EOS"; > Display branch lengths for leave nodes. > Usage: > t_branchlen.pl [] > Params: > : Tree file. > : Tree format. Optional. Default "newick".
> EOS > > my ($ftre, $fmt) = @ARGV; > > die $usage unless ( defined $ftre ); > > $fmt = 'newick' unless ( defined $fmt); > > my $o_treei = Bio::TreeIO->new( > -file => $ftre, > -format => $fmt, > ); > > my $o_tree = $o_treei->next_tree; > > my @o_leaves = $o_tree->get_leaf_nodes(); > > say join("\t", ("Node", "Branch Length", "Depth")); > > for my $o_node ( @o_leaves ) { > say $o_node->id, "\t", $o_node->branch_length, "\t", $o_node->depth; > } > > my $o_root = $o_tree->get_root_node; > > # say; > > say "Root height:\t", $o_root->height; > > exit 0; > > # ============================================================ > > For tree 1, the output is: > > Node Branch Length Depth > A 0.02 0.48 > B 0.025 0.485 > C 0.071 0.411 > D 0.6 0.6 > *Root height: 0.6* > > For tree 2, > > Node Branch Length Depth > A 2e-2 0.48 > B 2.5e-2 0.485 > C 7.1e-2 0.411 > D 6e-1 0.6 > *Root height: 3* > > The interesting thing is, the node depth values are correct, but I have no > idea how the root height calculated. > > Are there any ideas to resolve this problem? > > Thanks! > > Haizhou > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From fs5 at sanger.ac.uk Tue Apr 10 08:42:54 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Tue, 10 Apr 2012 09:42:54 +0100 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> Message-ID: <4F83F28E.4080000@sanger.ac.uk> Hi Heath, Yes, I just had a look too and it's true that it would currently ignore the original type. I had added some new methods (delete, insert, ligate) and with those the location type is preserved but not with the already existing methods like trunc_with_features. 
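As an illustration of what preserving the location type means for a truncation, here is a hypothetical pure-Perl helper. It is not the code in BioPerl or in the Redmine patch; it maps a simple GenBank location string such as `<1..5` onto a truncation window, renumbers it so the window starts at 1, and carries any `<`/`>` markers through instead of dropping them. Marking an end as fuzzy when the window clips it is an extra assumption of this sketch, and it handles only the simple `start..end` form, not `join(...)` or `complement(...)`.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl's implementation): map a simple location
# string "[<]start..[>]end" onto the window $win_start..$win_end of the parent
# sequence, renumbered so that the window starts at position 1.
# Returns undef when the feature lies entirely outside the window.
sub trunc_location {
    my ($loc, $win_start, $win_end) = @_;
    my ($lt, $start, $gt, $end) = $loc =~ /^(<?)(\d+)\.\.(>?)(\d+)$/
        or die "unsupported location '$loc'\n";

    return if $end < $win_start || $start > $win_end;    # no overlap

    # Assumption: an end cut off by the window becomes fuzzy as well.
    if ($start < $win_start) { $start = $win_start; $lt = '<' }
    if ($end > $win_end)     { $end   = $win_end;   $gt = '>' }

    my $offset = $win_start - 1;
    # Existing '<' / '>' markers are carried over instead of being dropped.
    return sprintf '%s%d..%s%d', $lt, $start - $offset, $gt, $end - $offset;
}

# Heath's example: CDS <1..5 truncated to 1..5 should stay fuzzy.
print trunc_location('<1..5', 1, 5), "\n";    # <1..5
```

Heath's `<1..5` case doubles as a regression check: a truncation that returns `1..5` here has lost the fuzzy start.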
I will look into it when I have some time and make some changes. Cheers, Frank On 09/04/12 22:37, Heath O'Brien wrote: > Hi Frank, > > I just tried it with the latest version from bioperl-live, and it worked > the way I described in my email. > > all good things, > Heath > > > On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: > >> Hi Heath, >> >> I have recently worked a bit on that module and contributed the code >> to bioperl-live. I think this behaviour may already have changed but >> I'm not 100% sure at the moment. When I have some time I will review >> the code to confirm. In the meantime, you could give it a go with the >> bioperl-live version if that's an option for you? >> >> Cheers, >> >> Frank >> >> >> On 03/04/12 17:56, Heath O'Brien wrote: >>> Hi All, >>> >>> I've encountered a bug in the trunc_with_features function in >>> SeqUtils.pm, or at >>> least behavior that was unexpected to me: >>> >>> Features with fuzzy coordinates in the original sequence are >>> converted to exact >>> coordinates in the truncated sequence. For example, the script below >>> changes the >>> coordinates for the feature from<1..5 to 1..5. >>> >>> I have modified the code to change this behavior on my system, but I >>> thought I'd >>> post something here in case others encounter the same problem. >>> >>> all good things, >>> Heath >>> >>> >>> >>> #!/usr/bin/perl -w >>> >>> use strict; >>> use warnings; >>> use Bio::SeqIO; >>> use Bio::SeqUtils; >>> >>> my $infile= shift; >>> >>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>> '-format' => 'genbank') or die "could not open seq file $infile\n"; >>> >>> my $outfile = $infile . 
'_out.gbk'; >>> >>> my $outIO = Bio::SeqIO->new('-file' => ">$outfile", >>> '-format' => 'genbank') or die "could not open seq file $outfile\n"; >>> >>> my $in_seq = $inIO->next_seq; >>> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>> $outIO->write_seq($out_seq); >>> exit; >>> >>> >>> LOCUS test_sequence 57303 bp DNA linear UNA >>> DEFINITION Sequence to demonstrate unexpected behavior of >>> trunc_with_features >>> ACCESSION unknown >>> KEYWORDS . >>> FEATURES Location/Qualifiers >>> source 1..10 >>> /mol_type="genomic DNA" >>> gene<1..5 >>> /gene="test" >>> CDS<1..5 >>> /product="hypothetical protein" >>> ORIGIN >>> 1 caagattaaa >>> // >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> -- >> The Wellcome Trust Sanger Institute is operated by Genome Research >> Limited, a charity registered in England with number 1021457 and a >> company registered in England with number 2742969, whose registered >> office is 215 Euston Road, London, NW1 2BE. > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From awitney at sgul.ac.uk Tue Apr 10 09:11:51 2012 From: awitney at sgul.ac.uk (Adam Witney) Date: Tue, 10 Apr 2012 10:11:51 +0100 Subject: [Bioperl-l] Output of a BLAST parse to text file In-Reply-To: References: Message-ID: <908D24DC-1A0E-4EE1-8573-F68FB3487071@sgul.ac.uk> Hi Zac, how do you want to sort the information? If it's just on num_hsps, then you will have to store the results in an array or something and then sort that before printing your output adam On 1 Apr 2012, at 04:35, Zachariah Wylde wrote: > Hi there, > > I am very new to Bioperl, so excuse me if I come across as simple!
I need to > write a bioperl script to extract information from BLAST results. > The script needs to count how many HSPs are on each mouse chromosome and > write the counts to a tab-separated table. I have this so far, but do not > understand how to > sort the information. I would much appreciate it if you could help me. > > Yours sincerely, > > Zac Wylde > > use strict; > use warnings; > use lib "C:/Program Files (x86)/BioPerl"; > use Bio::SearchIO; > > my $infile = "Alignment_Ref_Seq.txt"; > open INFILE, $infile or die "Cannot open $infile: $!"; > > my $outfile = "assignment2.txt"; > open OUTFILE, ">$outfile" or die "Cannot open $outfile: $!"; > > > my $parser = new Bio::SearchIO(-format => 'blast', -file => > 'Alignment_Ref_Seq.txt'); > > > while (my $result = $parser->next_result){ > while (my $hit = $result->next_hit){ > while (my $hsp = $hit->next_hsp){ > if ($hit->description =~ /(mus musculus)|(mouse)/i){ > if ($hit->description =~ /chromosome (\w+)/){ > print "Hit = ", $hit->name, " \t", > "chromosome = ", $1, " \t", > "HSPs = ", $hit->num_hsps, "\n"; > } > } > } > } > } > > close INFILE; > close OUTFILE; > > #unknown > #chromosome from > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roy.chaudhuri at gmail.com Tue Apr 10 11:10:36 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 10 Apr 2012 12:10:36 +0100 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <4F83F28E.4080000@sanger.ac.uk> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> Message-ID: <4F84152C.7030300@gmail.com> Hi Heath, Frank, This was probably my fault back in the mists of time.
Looks like an easy fix though, I've reported the issue on Redmine and submitted a patch: https://redmine.open-bio.org/issues/3339 We should probably also add Heath's example as a test case. Cheers, Roy. On 10/04/2012 09:42, Frank Schwach wrote: > Hi Heath, > > Yes, I just had a look too and it's true that it would currently ignore > the original type. I had added some new methods (delete, insert, ligate) > and with those the location type is preserved but not with the already > existing methods like trunc_with_features. I will look into it when I > have some time and make some changes. > > Cheers, > > Frank > > > On 09/04/12 22:37, Heath O'Brien wrote: >> Hi Frank, >> >> I just tried it with the latest version from bioperl-live, and it worked >> the way I described in my email. >> >> all good things, >> Heath >> >> >> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >> >>> Hi Heath, >>> >>> I have recently worked a bit on that module and contributed the code >>> to bioperl-live. I think this behaviour may already have changed but >>> I'm not 100% sure at the moment. When I have some time I will review >>> the code to confirm. In the meantime, you could give it a go with the >>> bioperl-live version if that's an option for you? >>> >>> Cheers, >>> >>> Frank >>> >>> >>> On 03/04/12 17:56, Heath O'Brien wrote: >>>> Hi All, >>>> >>>> I've encountered a bug in the trunc_with_features function in >>>> SeqUtils.pm, or at >>>> least behavior that was unexpected to me: >>>> >>>> Features with fuzzy coordinates in the original sequence are >>>> converted to exact >>>> coordinates in the truncated sequence. For example, the script below >>>> changes the >>>> coordinates for the feature from<1..5 to 1..5. >>>> >>>> I have modified the code to change this behavior on my system, but I >>>> thought I'd >>>> post something here in case others encounter the same problem. 
>>>> >>>> all good things, >>>> Heath >>>> >>>> >>>> >>>> #!/usr/bin/perl -w >>>> >>>> use strict; >>>> use warnings; >>>> use Bio::SeqIO; >>>> use Bio::SeqUtils; >>>> >>>> my $infile= shift; >>>> >>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>> '-format' => 'genbank') or die "could not open seq file $infile\n"; >>>> >>>> my $outfile = $infile . '_out.gbk'; >>>> >>>> my $outIO = Bio::SeqIO->new('-file' => ">$outfile", >>>> '-format' => 'genbank') or die "could not open seq file $outfile\n"; >>>> >>>> my $in_seq = $inIO->next_seq; >>>> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>> $outIO->write_seq($out_seq); >>>> exit; >>>> >>>> >>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>> DEFINITION Sequence to demonstrate unexpected behavior of >>>> trunc_with_features >>>> ACCESSION unknown >>>> KEYWORDS . >>>> FEATURES Location/Qualifiers >>>> source 1..10 >>>> /mol_type="genomic DNA" >>>> gene<1..5 >>>> /gene="test" >>>> CDS<1..5 >>>> /product="hypothetical protein" >>>> ORIGIN >>>> 1 caagattaaa >>>> // >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> -- >>> The Wellcome Trust Sanger Institute is operated by Genome Research >>> Limited, a charity registered in England with number 1021457 and a >>> company registered in England with number 2742969, whose registered >>> office is 215 Euston Road, London, NW1 2BE. 
>> > > From roy.chaudhuri at gmail.com Tue Apr 10 14:45:21 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 10 Apr 2012 15:45:21 +0100 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <4F841EF3.6000603@sanger.ac.uk> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> <4F84152C.7030300@gmail.com> <4F841EF3.6000603@sanger.ac.uk> Message-ID: <4F844781.90005@gmail.com> Turns out I spoke too soon, I added in some new tests and they highlighted problems with both trunc_with_features and revcom_with_features. I think I have resolved all the issues in the most recent Redmine patch - Frank, Heath, please could you check that it works for you? Cheers, Roy. On 10/04/2012 12:52, Frank Schwach wrote: > Brilliant, thanks Roy! > Frank > > > On 10/04/12 12:10, Roy Chaudhuri wrote: >> Hi Heath, Frank, >> >> This was probably my fault back in the mists of time. Looks like an easy >> fix though, I've reported the issue on Redmine and submitted a patch: >> https://redmine.open-bio.org/issues/3339 >> >> We should probably also add Heath's example as a test case. >> >> Cheers, >> Roy. >> >> On 10/04/2012 09:42, Frank Schwach wrote: >>> Hi Heath, >>> >>> Yes, I just had a look too and it's true that it would currently ignore >>> the original type. I had added some new methods (delete, insert, ligate) >>> and with those the location type is preserved but not with the already >>> existing methods like trunc_with_features. I will look into it when I >>> have some time and make some changes. >>> >>> Cheers, >>> >>> Frank >>> >>> >>> On 09/04/12 22:37, Heath O'Brien wrote: >>>> Hi Frank, >>>> >>>> I just tried it with the latest version from bioperl-live, and it worked >>>> the way I described in my email. 
>>>> >>>> all good things, >>>> Heath >>>> >>>> >>>> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >>>> >>>>> Hi Heath, >>>>> >>>>> I have recently worked a bit on that module and contributed the code >>>>> to bioperl-live. I think this behaviour may already have changed but >>>>> I'm not 100% sure at the moment. When I have some time I will review >>>>> the code to confirm. In the meantime, you could give it a go with the >>>>> bioperl-live version if that's an option for you? >>>>> >>>>> Cheers, >>>>> >>>>> Frank >>>>> >>>>> >>>>> On 03/04/12 17:56, Heath O'Brien wrote: >>>>>> Hi All, >>>>>> >>>>>> I've encountered a bug in the trunc_with_features function in >>>>>> SeqUtils.pm, or at >>>>>> least behavior that was unexpected to me: >>>>>> >>>>>> Features with fuzzy coordinates in the original sequence are >>>>>> converted to exact >>>>>> coordinates in the truncated sequence. For example, the script below >>>>>> changes the >>>>>> coordinates for the feature from<1..5 to 1..5. >>>>>> >>>>>> I have modified the code to change this behavior on my system, but I >>>>>> thought I'd >>>>>> post something here in case others encounter the same problem. >>>>>> >>>>>> all good things, >>>>>> Heath >>>>>> >>>>>> >>>>>> >>>>>> #!/usr/bin/perl -w >>>>>> >>>>>> use strict; >>>>>> use warnings; >>>>>> use Bio::SeqIO; >>>>>> use Bio::SeqUtils; >>>>>> >>>>>> my $infile= shift; >>>>>> >>>>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>>>> '-format' => 'genbank') or die "could not open seq file $infile\n"; >>>>>> >>>>>> my $outfile = $infile . 
'_out.gbk'; >>>>>> >>>>>> my $outIO = Bio::SeqIO->new('-file' => ">$outfile", >>>>>> '-format' => 'genbank') or die "could not open seq file $outfile\n"; >>>>>> >>>>>> my $in_seq = $inIO->next_seq; >>>>>> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>>>> $outIO->write_seq($out_seq); >>>>>> exit; >>>>>> >>>>>> >>>>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>>>> DEFINITION Sequence to demonstrate unexpected behavior of >>>>>> trunc_with_features >>>>>> ACCESSION unknown >>>>>> KEYWORDS . >>>>>> FEATURES Location/Qualifiers >>>>>> source 1..10 >>>>>> /mol_type="genomic DNA" >>>>>> gene<1..5 >>>>>> /gene="test" >>>>>> CDS<1..5 >>>>>> /product="hypothetical protein" >>>>>> ORIGIN >>>>>> 1 caagattaaa >>>>>> // >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> -- >>>>> The Wellcome Trust Sanger Institute is operated by Genome Research >>>>> Limited, a charity registered in England with number 1021457 and a >>>>> company registered in England with number 2742969, whose registered >>>>> office is 215 Euston Road, London, NW1 2BE. >>>> >>> >>> >> > > From heath.obrien at gmail.com Tue Apr 10 15:34:59 2012 From: heath.obrien at gmail.com (Heath O'Brien) Date: Tue, 10 Apr 2012 11:34:59 -0400 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <4F844781.90005@gmail.com> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> <4F84152C.7030300@gmail.com> <4F841EF3.6000603@sanger.ac.uk> <4F844781.90005@gmail.com> Message-ID: <80696D43-B1D4-4228-A4B5-6B9299F46D41@gmail.com> Works perfect for me. Thanks! 
all good things, Heath On 10-Apr-12, at 10:45 AM, Roy Chaudhuri wrote: > Turns out I spoke too soon, I added in some new tests and they > highlighted problems with both trunc_with_features and > revcom_with_features. I think I have resolved all the issues in the > most recent Redmine patch - Frank, Heath, please could you check > that it works for you? > > Cheers, > Roy. > > On 10/04/2012 12:52, Frank Schwach wrote: >> Brilliant, thanks Roy! >> Frank >> >> >> On 10/04/12 12:10, Roy Chaudhuri wrote: >>> Hi Heath, Frank, >>> >>> This was probably my fault back in the mists of time. Looks like >>> an easy >>> fix though, I've reported the issue on Redmine and submitted a >>> patch: >>> https://redmine.open-bio.org/issues/3339 >>> >>> We should probably also add Heath's example as a test case. >>> >>> Cheers, >>> Roy. >>> >>> On 10/04/2012 09:42, Frank Schwach wrote: >>>> Hi Heath, >>>> >>>> Yes, I just had a look too and it's true that it would currently >>>> ignore >>>> the original type. I had added some new methods (delete, insert, >>>> ligate) >>>> and with those the location type is preserved but not with the >>>> already >>>> existing methods like trunc_with_features. I will look into it >>>> when I >>>> have some time and make some changes. >>>> >>>> Cheers, >>>> >>>> Frank >>>> >>>> >>>> On 09/04/12 22:37, Heath O'Brien wrote: >>>>> Hi Frank, >>>>> >>>>> I just tried it with the latest version from bioperl-live, and >>>>> it worked >>>>> the way I described in my email. >>>>> >>>>> all good things, >>>>> Heath >>>>> >>>>> >>>>> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >>>>> >>>>>> Hi Heath, >>>>>> >>>>>> I have recently worked a bit on that module and contributed the >>>>>> code >>>>>> to bioperl-live. I think this behaviour may already have >>>>>> changed but >>>>>> I'm not 100% sure at the moment. When I have some time I will >>>>>> review >>>>>> the code to confirm. 
In the meantime, you could give it a go >>>>>> with the >>>>>> bioperl-live version if that's an option for you? >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> On 03/04/12 17:56, Heath O'Brien wrote: >>>>>>> Hi All, >>>>>>> >>>>>>> I've encountered a bug in the trunc_with_features function in >>>>>>> SeqUtils.pm, or at >>>>>>> least behavior that was unexpected to me: >>>>>>> >>>>>>> Features with fuzzy coordinates in the original sequence are >>>>>>> converted to exact >>>>>>> coordinates in the truncated sequence. For example, the script >>>>>>> below >>>>>>> changes the >>>>>>> coordinates for the feature from<1..5 to 1..5. >>>>>>> >>>>>>> I have modified the code to change this behavior on my system, >>>>>>> but I >>>>>>> thought I'd >>>>>>> post something here in case others encounter the same problem. >>>>>>> >>>>>>> all good things, >>>>>>> Heath >>>>>>> >>>>>>> >>>>>>> >>>>>>> #!/usr/bin/perl -w >>>>>>> >>>>>>> use strict; >>>>>>> use warnings; >>>>>>> use Bio::SeqIO; >>>>>>> use Bio::SeqUtils; >>>>>>> >>>>>>> my $infile= shift; >>>>>>> >>>>>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>>>>> '-format' => 'genbank') or die "could not open seq file >>>>>>> $infile\n"; >>>>>>> >>>>>>> my $outfile = $infile . '_out.gbk'; >>>>>>> >>>>>>> my $outIO = Bio::SeqIO->new('-file' => ">$outfile", >>>>>>> '-format' => 'genbank') or die "could not open seq file >>>>>>> $outfile\n"; >>>>>>> >>>>>>> my $in_seq = $inIO->next_seq; >>>>>>> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>>>>> $outIO->write_seq($out_seq); >>>>>>> exit; >>>>>>> >>>>>>> >>>>>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>>>>> DEFINITION Sequence to demonstrate unexpected behavior of >>>>>>> trunc_with_features >>>>>>> ACCESSION unknown >>>>>>> KEYWORDS . 
>>>>>>> FEATURES Location/Qualifiers >>>>>>> source 1..10 >>>>>>> /mol_type="genomic DNA" >>>>>>> gene<1..5 >>>>>>> /gene="test" >>>>>>> CDS<1..5 >>>>>>> /product="hypothetical protein" >>>>>>> ORIGIN >>>>>>> 1 caagattaaa >>>>>>> // >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> -- >>>>>> The Wellcome Trust Sanger Institute is operated by Genome >>>>>> Research >>>>>> Limited, a charity registered in England with number 1021457 >>>>>> and a >>>>>> company registered in England with number 2742969, whose >>>>>> registered >>>>>> office is 215 Euston Road, London, NW1 2BE. >>>>> >>>> >>>> >>> >> >> > From cjfields at illinois.edu Tue Apr 10 17:08:45 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 10 Apr 2012 17:08:45 +0000 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <80696D43-B1D4-4228-A4B5-6B9299F46D41@gmail.com> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> <4F84152C.7030300@gmail.com> <4F841EF3.6000603@sanger.ac.uk> <4F844781.90005@gmail.com> <80696D43-B1D4-4228-A4B5-6B9299F46D41@gmail.com> Message-ID: I have committed these to bioperl-live, they passed tests for me. I have left the bug report open, however, in case more work needs to be done. Roy, did you want to close that when you are ready? chris On Apr 10, 2012, at 10:34 AM, Heath O'Brien wrote: > Works perfect for me. Thanks! > > all good things, > Heath > > On 10-Apr-12, at 10:45 AM, Roy Chaudhuri wrote: > >> Turns out I spoke too soon, I added in some new tests and they highlighted problems with both trunc_with_features and revcom_with_features. I think I have resolved all the issues in the most recent Redmine patch - Frank, Heath, please could you check that it works for you? >> >> Cheers, >> Roy. 
>> >> On 10/04/2012 12:52, Frank Schwach wrote: >>> Brilliant, thanks Roy! >>> Frank >>> >>> >>> On 10/04/12 12:10, Roy Chaudhuri wrote: >>>> Hi Heath, Frank, >>>> >>>> This was probably my fault back in the mists of time. Looks like an easy >>>> fix though, I've reported the issue on Redmine and submitted a patch: >>>> https://redmine.open-bio.org/issues/3339 >>>> >>>> We should probably also add Heath's example as a test case. >>>> >>>> Cheers, >>>> Roy. >>>> >>>> On 10/04/2012 09:42, Frank Schwach wrote: >>>>> Hi Heath, >>>>> >>>>> Yes, I just had a look too and it's true that it would currently ignore >>>>> the original type. I had added some new methods (delete, insert, ligate) >>>>> and with those the location type is preserved but not with the already >>>>> existing methods like trunc_with_features. I will look into it when I >>>>> have some time and make some changes. >>>>> >>>>> Cheers, >>>>> >>>>> Frank >>>>> >>>>> >>>>> On 09/04/12 22:37, Heath O'Brien wrote: >>>>>> Hi Frank, >>>>>> >>>>>> I just tried it with the latest version from bioperl-live, and it worked >>>>>> the way I described in my email. >>>>>> >>>>>> all good things, >>>>>> Heath >>>>>> >>>>>> >>>>>> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >>>>>> >>>>>>> Hi Heath, >>>>>>> >>>>>>> I have recently worked a bit on that module and contributed the code >>>>>>> to bioperl-live. I think this behaviour may already have changed but >>>>>>> I'm not 100% sure at the moment. When I have some time I will review >>>>>>> the code to confirm. In the meantime, you could give it a go with the >>>>>>> bioperl-live version if that's an option for you? 
>>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> Frank >>>>>>> >>>>>>> >>>>>>> On 03/04/12 17:56, Heath O'Brien wrote: >>>>>>>> Hi All, >>>>>>>> >>>>>>>> I've encountered a bug in the trunc_with_features function in >>>>>>>> SeqUtils.pm, or at >>>>>>>> least behavior that was unexpected to me: >>>>>>>> >>>>>>>> Features with fuzzy coordinates in the original sequence are >>>>>>>> converted to exact >>>>>>>> coordinates in the truncated sequence. For example, the script below >>>>>>>> changes the >>>>>>>> coordinates for the feature from<1..5 to 1..5. >>>>>>>> >>>>>>>> I have modified the code to change this behavior on my system, but I >>>>>>>> thought I'd >>>>>>>> post something here in case others encounter the same problem. >>>>>>>> >>>>>>>> all good things, >>>>>>>> Heath >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> #!/usr/bin/perl -w >>>>>>>> >>>>>>>> use strict; >>>>>>>> use warnings; >>>>>>>> use Bio::SeqIO; >>>>>>>> use Bio::SeqUtils; >>>>>>>> >>>>>>>> my $infile= shift; >>>>>>>> >>>>>>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>>>>>> '-format' => 'genbank') or die "could not open seq file $infile\n"; >>>>>>>> >>>>>>>> my $outfile = $infile . '_out.gbk'; >>>>>>>> >>>>>>>> my $outIO = Bio::SeqIO->new('-file' => ">$outfile", >>>>>>>> '-format' => 'genbank') or die "could not open seq file $outfile\n"; >>>>>>>> >>>>>>>> my $in_seq = $inIO->next_seq; >>>>>>>> my $out_seq = Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>>>>>> $outIO->write_seq($out_seq); >>>>>>>> exit; >>>>>>>> >>>>>>>> >>>>>>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>>>>>> DEFINITION Sequence to demonstrate unexpected behavior of >>>>>>>> trunc_with_features >>>>>>>> ACCESSION unknown >>>>>>>> KEYWORDS . 
>>>>>>>> FEATURES Location/Qualifiers >>>>>>>> source 1..10 >>>>>>>> /mol_type="genomic DNA" >>>>>>>> gene<1..5 >>>>>>>> /gene="test" >>>>>>>> CDS<1..5 >>>>>>>> /product="hypothetical protein" >>>>>>>> ORIGIN >>>>>>>> 1 caagattaaa >>>>>>>> // >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> The Wellcome Trust Sanger Institute is operated by Genome Research >>>>>>> Limited, a charity registered in England with number 1021457 and a >>>>>>> company registered in England with number 2742969, whose registered >>>>>>> office is 215 Euston Road, London, NW1 2BE. >>>>>> >>>>> >>>>> >>>> >>> >>> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From p.j.a.cock at googlemail.com Tue Apr 10 20:07:28 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 10 Apr 2012 21:07:28 +0100 Subject: [Bioperl-l] sequence proxy server In-Reply-To: References: Message-ID: On Fri, Apr 6, 2012 at 2:49 PM, Kevin Murray wrote: > Hi all, > > I'm an undergrad student in molecular biology at the ANU in Australia, > and my research projects are becoming increasingly bioinformatics > heavy. The latest one has involved quite a large amount of sequence > retrieval from GenBank and GenPept. The download speed to Australia > from NCBI's servers is rather slow, and I've been thinking about how > we can improve this. ...So, I thought about writing a "sequence proxy" ... Have you tried TogoWS? It is based in Japan and offers access to some of the local databases but also proxies some important EMBL/EBI and NCBI resources as well - including GenBank. I would expect you'd get much faster response times from Australia than talking directly to the NCBI.
http://togows.dbcls.jp/site/en/rest.html I think the TogoWS REST API is very nice to use, and seems to give much clearer error messages than the NCBI Entrez site (TogoWS uses HTTP error codes pretty consistently). Biopython 1.59 onwards has a simple API for the TogoWS REST interface, but their URL structure is very easy, so for a simple one-off task you can easily roll your own in Perl (or write one for BioPerl?). Peter From cjfields at illinois.edu Wed Apr 11 01:20:48 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 11 Apr 2012 01:20:48 +0000 Subject: [Bioperl-l] sequence proxy server In-Reply-To: References: Message-ID: On Apr 10, 2012, at 3:07 PM, Peter Cock wrote: > On Fri, Apr 6, 2012 at 2:49 PM, Kevin Murray wrote: >> Hi all, >> >> I'm an undergrad student in molecular biology at the ANU in Australia, >> and my research projects are becoming increasingly bioinformatics >> heavy. The latest one has involved quite a large amount of sequence >> retrieval from GenBank and GenPept. The download speed to Australia >> from NCBI's servers is rather slow, and I've been thinking about how >> we can improve this. ...So, I thought about writing a "sequence proxy" ... > > Have you tried TogoWS? It is based in Japan and offers access to > some of the local databases but also proxies some important > EMBL/EBI and NCBI resources as well - including GenBank. > I would expect you'd get much faster response times from > Australia than talking directly to the NCBI. > http://togows.dbcls.jp/site/en/rest.html > > I think the TogoWS REST API is very nice to use, and seems to > give much clearer error messages than the NCBI Entrez site > (TogoWS uses HTTP error codes pretty consistently). > > Biopython 1.59 onwards has a simple API for the TogoWS > REST interface, but their URL structure is very easy, so for > a simple one-off task you can easily roll your own in Perl > (or write one for BioPerl?). > > Peter Should be easy enough if the API is well-documented.
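Rolling your own TogoWS URL builder in Perl, as Peter suggests, can be sketched with core Perl only. The sketch assumes the entry URL layout `http://togows.dbcls.jp/entry/<database>/<id>[,<id>...][/<field>][.<format>]`; that layout, the helper names, the `nucleotide` database name and the accessions in the example are all assumptions to check against the TogoWS REST documentation linked above, not a confirmed description of the service.

```perl
use strict;
use warnings;

# Assumed TogoWS REST "entry" layout (check the TogoWS REST documentation):
#   http://togows.dbcls.jp/entry/<database>/<id>[,<id>...][/<field>][.<format>]
my $TOGOWS_BASE = 'http://togows.dbcls.jp';

# Percent-encode everything except RFC 3986 unreserved characters.
sub escape_part {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical helper: build an entry URL for one or more IDs.
sub togows_entry_url {
    my (%arg) = @_;
    my $db = $arg{db} or die "db is required\n";
    my @ids = ref $arg{ids} ? @{ $arg{ids} } : ( $arg{ids} );
    die "at least one id is required\n" unless @ids && defined $ids[0];

    my $url = join '/', $TOGOWS_BASE, 'entry', escape_part($db),
        join( ',', map { escape_part($_) } @ids );
    $url .= '/' . escape_part( $arg{field} )  if defined $arg{field};
    $url .= '.' . escape_part( $arg{format} ) if defined $arg{format};
    return $url;
}

my $url = togows_entry_url(
    db     => 'nucleotide',            # assumed database name
    ids    => [qw(X52960 X62281)],     # placeholder accessions
    format => 'fasta',
);
print "$url\n";    # http://togows.dbcls.jp/entry/nucleotide/X52960,X62281.fasta
```

To fetch the entry, the core `HTTP::Tiny` module (Perl 5.14 and later) is enough: `HTTP::Tiny->new->get($url)` returns a hash reference whose `success`, `status` and `content` keys show whether the server answered with an HTTP error code, which fits Peter's remark that TogoWS reports errors that way.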
Related to this, anyone know if NCBI's REST API is documented anywhere? chris From roy.chaudhuri at gmail.com Wed Apr 11 10:55:49 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Wed, 11 Apr 2012 11:55:49 +0100 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> <4F84152C.7030300@gmail.com> <4F841EF3.6000603@sanger.ac.uk> <4F844781.90005@gmail.com> <80696D43-B1D4-4228-A4B5-6B9299F46D41@gmail.com> Message-ID: <4F856335.1000503@gmail.com> Hi Chris, I think it should be fine to close, but my account doesn't have permission to do so. Cheers, Roy. On 10/04/2012 18:08, Fields, Christopher J wrote: > I have committed these to bioperl-live, they passed tests for me. I > have left the bug report open, however, in case more work needs to be > done. Roy, did you want to close that when you are ready? > > chris > > On Apr 10, 2012, at 10:34 AM, Heath O'Brien wrote: > >> Works perfect for me. Thanks! >> >> all good things, Heath >> >> On 10-Apr-12, at 10:45 AM, Roy Chaudhuri wrote: >> >>> Turns out I spoke too soon, I added in some new tests and they >>> highlighted problems with both trunc_with_features and >>> revcom_with_features. I think I have resolved all the issues in >>> the most recent Redmine patch - Frank, Heath, please could you >>> check that it works for you? >>> >>> Cheers, Roy. >>> >>> On 10/04/2012 12:52, Frank Schwach wrote: >>>> Brilliant, thanks Roy! Frank >>>> >>>> >>>> On 10/04/12 12:10, Roy Chaudhuri wrote: >>>>> Hi Heath, Frank, >>>>> >>>>> This was probably my fault back in the mists of time. Looks >>>>> like an easy fix though, I've reported the issue on Redmine >>>>> and submitted a patch: >>>>> https://redmine.open-bio.org/issues/3339 >>>>> >>>>> We should probably also add Heath's example as a test case. >>>>> >>>>> Cheers, Roy. 
>>>>> >>>>> On 10/04/2012 09:42, Frank Schwach wrote: >>>>>> Hi Heath, >>>>>> >>>>>> Yes, I just had a look too and it's true that it would >>>>>> currently ignore the original type. I had added some new >>>>>> methods (delete, insert, ligate) and with those the >>>>>> location type is preserved but not with the already >>>>>> existing methods like trunc_with_features. I will look into >>>>>> it when I have some time and make some changes. >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> On 09/04/12 22:37, Heath O'Brien wrote: >>>>>>> Hi Frank, >>>>>>> >>>>>>> I just tried it with the latest version from >>>>>>> bioperl-live, and it worked the way I described in my >>>>>>> email. >>>>>>> >>>>>>> all good things, Heath >>>>>>> >>>>>>> >>>>>>> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >>>>>>> >>>>>>>> Hi Heath, >>>>>>>> >>>>>>>> I have recently worked a bit on that module and >>>>>>>> contributed the code to bioperl-live. I think this >>>>>>>> behaviour may already have changed but I'm not 100% >>>>>>>> sure at the moment. When I have some time I will >>>>>>>> review the code to confirm. In the meantime, you could >>>>>>>> give it a go with the bioperl-live version if that's an >>>>>>>> option for you? >>>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> Frank >>>>>>>> >>>>>>>> >>>>>>>> On 03/04/12 17:56, Heath O'Brien wrote: >>>>>>>>> Hi All, >>>>>>>>> >>>>>>>>> I've encountered a bug in the trunc_with_features >>>>>>>>> function in SeqUtils.pm, or at least behavior that >>>>>>>>> was unexpected to me: >>>>>>>>> >>>>>>>>> Features with fuzzy coordinates in the original >>>>>>>>> sequence are converted to exact coordinates in the >>>>>>>>> truncated sequence. For example, the script below >>>>>>>>> changes the coordinates for the feature from<1..5 to >>>>>>>>> 1..5. >>>>>>>>> >>>>>>>>> I have modified the code to change this behavior on >>>>>>>>> my system, but I thought I'd post something here in >>>>>>>>> case others encounter the same problem. 
>>>>>>>>> >>>>>>>>> all good things, Heath >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> #!/usr/bin/perl -w >>>>>>>>> >>>>>>>>> use strict; use warnings; use Bio::SeqIO; use >>>>>>>>> Bio::SeqUtils; >>>>>>>>> >>>>>>>>> my $infile= shift; >>>>>>>>> >>>>>>>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>>>>>>> '-format' => 'genbank') or die "could not open seq >>>>>>>>> file $infile\n"; >>>>>>>>> >>>>>>>>> my $outfile = $infile . '_out.gbk'; >>>>>>>>> >>>>>>>>> my $outIO = Bio::SeqIO->new('-file' => >>>>>>>>> ">$outfile", '-format' => 'genbank') or die "could >>>>>>>>> not open seq file $outfile\n"; >>>>>>>>> >>>>>>>>> my $in_seq = $inIO->next_seq; my $out_seq = >>>>>>>>> Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>>>>>>> $outIO->write_seq($out_seq); exit; >>>>>>>>> >>>>>>>>> >>>>>>>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>>>>>>> DEFINITION Sequence to demonstrate unexpected >>>>>>>>> behavior of trunc_with_features ACCESSION unknown >>>>>>>>> KEYWORDS . FEATURES Location/Qualifiers source 1..10 >>>>>>>>> /mol_type="genomic DNA" gene<1..5 /gene="test" >>>>>>>>> CDS<1..5 /product="hypothetical protein" ORIGIN 1 >>>>>>>>> caagattaaa // >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Bioperl-l mailing list Bioperl-l at lists.open-bio.org >>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> -- The Wellcome Trust Sanger Institute is operated by >>>>>>>> Genome Research Limited, a charity registered in >>>>>>>> England with number 1021457 and a company registered in >>>>>>>> England with number 2742969, whose registered office is >>>>>>>> 215 Euston Road, London, NW1 2BE. 
>>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>> >> >> _______________________________________________ Bioperl-l mailing >> list Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Wed Apr 11 15:28:38 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 11 Apr 2012 15:28:38 +0000 Subject: [Bioperl-l] problem with trunc_with_features (SeqUtils.pm) In-Reply-To: <4F856335.1000503@gmail.com> References: <4F8352DB.6060106@sanger.ac.uk> <0928DDE4-AD67-4846-A65B-DCF95EBA942F@gmail.com> <4F83F28E.4080000@sanger.ac.uk> <4F84152C.7030300@gmail.com> <4F841EF3.6000603@sanger.ac.uk> <4F844781.90005@gmail.com> <80696D43-B1D4-4228-A4B5-6B9299F46D41@gmail.com> <4F856335.1000503@gmail.com> Message-ID: Okay, closed it. Thanks again! chris On Apr 11, 2012, at 5:55 AM, Roy Chaudhuri wrote: > Hi Chris, > > I think it should be fine to close, but my account doesn't have permission to do so. > > Cheers, > Roy. > > On 10/04/2012 18:08, Fields, Christopher J wrote: >> I have committed these to bioperl-live, they passed tests for me. I >> have left the bug report open, however, in case more work needs to be >> done. Roy, did you want to close that when you are ready? >> >> chris >> >> On Apr 10, 2012, at 10:34 AM, Heath O'Brien wrote: >> >>> Works perfect for me. Thanks! >>> >>> all good things, Heath >>> >>> On 10-Apr-12, at 10:45 AM, Roy Chaudhuri wrote: >>> >>>> Turns out I spoke too soon, I added in some new tests and they >>>> highlighted problems with both trunc_with_features and >>>> revcom_with_features. I think I have resolved all the issues in >>>> the most recent Redmine patch - Frank, Heath, please could you >>>> check that it works for you? >>>> >>>> Cheers, Roy. >>>> >>>> On 10/04/2012 12:52, Frank Schwach wrote: >>>>> Brilliant, thanks Roy! Frank >>>>> >>>>> >>>>> On 10/04/12 12:10, Roy Chaudhuri wrote: >>>>>> Hi Heath, Frank, >>>>>> >>>>>> This was probably my fault back in the mists of time. 
Looks >>>>>> like an easy fix though, I've reported the issue on Redmine >>>>>> and submitted a patch: >>>>>> https://redmine.open-bio.org/issues/3339 >>>>>> >>>>>> We should probably also add Heath's example as a test case. >>>>>> >>>>>> Cheers, Roy. >>>>>> >>>>>> On 10/04/2012 09:42, Frank Schwach wrote: >>>>>>> Hi Heath, >>>>>>> >>>>>>> Yes, I just had a look too and it's true that it would >>>>>>> currently ignore the original type. I had added some new >>>>>>> methods (delete, insert, ligate) and with those the >>>>>>> location type is preserved but not with the already >>>>>>> existing methods like trunc_with_features. I will look into >>>>>>> it when I have some time and make some changes. >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> Frank >>>>>>> >>>>>>> >>>>>>> On 09/04/12 22:37, Heath O'Brien wrote: >>>>>>>> Hi Frank, >>>>>>>> >>>>>>>> I just tried it with the latest version from >>>>>>>> bioperl-live, and it worked the way I described in my >>>>>>>> email. >>>>>>>> >>>>>>>> all good things, Heath >>>>>>>> >>>>>>>> >>>>>>>> On 9-Apr-12, at 5:21 PM, Frank Schwach wrote: >>>>>>>> >>>>>>>>> Hi Heath, >>>>>>>>> >>>>>>>>> I have recently worked a bit on that module and >>>>>>>>> contributed the code to bioperl-live. I think this >>>>>>>>> behaviour may already have changed but I'm not 100% >>>>>>>>> sure at the moment. When I have some time I will >>>>>>>>> review the code to confirm. In the meantime, you could >>>>>>>>> give it a go with the bioperl-live version if that's an >>>>>>>>> option for you? >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> Frank >>>>>>>>> >>>>>>>>> >>>>>>>>> On 03/04/12 17:56, Heath O'Brien wrote: >>>>>>>>>> Hi All, >>>>>>>>>> >>>>>>>>>> I've encountered a bug in the trunc_with_features >>>>>>>>>> function in SeqUtils.pm, or at least behavior that >>>>>>>>>> was unexpected to me: >>>>>>>>>> >>>>>>>>>> Features with fuzzy coordinates in the original >>>>>>>>>> sequence are converted to exact coordinates in the >>>>>>>>>> truncated sequence. 
For example, the script below >>>>>>>>>> changes the coordinates for the feature from<1..5 to >>>>>>>>>> 1..5. >>>>>>>>>> >>>>>>>>>> I have modified the code to change this behavior on >>>>>>>>>> my system, but I thought I'd post something here in >>>>>>>>>> case others encounter the same problem. >>>>>>>>>> >>>>>>>>>> all good things, Heath >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> #!/usr/bin/perl -w >>>>>>>>>> >>>>>>>>>> use strict; use warnings; use Bio::SeqIO; use >>>>>>>>>> Bio::SeqUtils; >>>>>>>>>> >>>>>>>>>> my $infile= shift; >>>>>>>>>> >>>>>>>>>> my $inIO = Bio::SeqIO->new('-file' => $infile, >>>>>>>>>> '-format' => 'genbank') or die "could not open seq >>>>>>>>>> file $infile\n"; >>>>>>>>>> >>>>>>>>>> my $outfile = $infile . '_out.gbk'; >>>>>>>>>> >>>>>>>>>> my $outIO = Bio::SeqIO->new('-file' => >>>>>>>>>> ">$outfile", '-format' => 'genbank') or die "could >>>>>>>>>> not open seq file $outfile\n"; >>>>>>>>>> >>>>>>>>>> my $in_seq = $inIO->next_seq; my $out_seq = >>>>>>>>>> Bio::SeqUtils->trunc_with_features($in_seq, 1, 5); >>>>>>>>>> $outIO->write_seq($out_seq); exit; >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> LOCUS test_sequence 57303 bp DNA linear UNA >>>>>>>>>> DEFINITION Sequence to demonstrate unexpected >>>>>>>>>> behavior of trunc_with_features ACCESSION unknown >>>>>>>>>> KEYWORDS . 
FEATURES Location/Qualifiers source 1..10 >>>>>>>>>> /mol_type="genomic DNA" gene<1..5 /gene="test" >>>>>>>>>> CDS<1..5 /product="hypothetical protein" ORIGIN 1 >>>>>>>>>> caagattaaa // >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Bioperl-l mailing list Bioperl-l at lists.open-bio.org >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>> >>>>>>>>> >>>>>>>>> -- The Wellcome Trust Sanger Institute is operated by >>>>>>>>> Genome Research Limited, a charity registered in >>>>>>>>> England with number 1021457 and a company registered in >>>>>>>>> England with number 2742969, whose registered office is >>>>>>>>> 215 Euston Road, London, NW1 2BE. >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>> >>> _______________________________________________ Bioperl-l mailing >>> list Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From p.j.a.cock at googlemail.com Thu Apr 12 12:47:05 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 12 Apr 2012 13:47:05 +0100 Subject: [Bioperl-l] Proposal of GSoC 2012 >> ACEtoSAM and SAMtoACE In-Reply-To: References: Message-ID: On Thu, Apr 5, 2012 at 5:52 AM, afia hayati wrote: > Dear all, > > I am afia, a PhD student in Bioinformatics. I am so interested to > participate GSoC and would like to apply for project Bio::Assembly, Ace2Sam > and Sam2Ace converter. I have written a proposal based on the guidance for > prospective GSoC student. I paste my proposal in here. > If you have time, please give me suggestions. > Thank you very much. > > Sincerely, > Afiahayati Hello Afiahayati, What would you use this converter for? I can see it is useful to convert ACE to SAM/BAM for downstream analysis and visualization. At the moment the only assemblers I regularly use which produce ACE are the Roche 'Newbler' gsAssember, and MIRA.
For MIRA, Bastien is working on native SAM output, but for the moment I wrote and maintain a converter from MIRA's alignment format (MAF) to SAM: https://github.com/peterjc/maf2sam Or is the idea more to support SAM (and BAM) assemblies within the existing BioPerl Bio::Assembly::IO: framework to allow easier manipulation from Perl? Peter From florent.angly at gmail.com Fri Apr 13 02:41:54 2012 From: florent.angly at gmail.com (Florent Angly) Date: Fri, 13 Apr 2012 12:41:54 +1000 Subject: [Bioperl-l] Proposal of GSoC 2012 >> ACEtoSAM and SAMtoACE In-Reply-To: References: Message-ID: <4F879272.30306@gmail.com> Bioperl has a module to read and write ACE files, Bio::Assembly::IO::ace. It also has a module to read (but not write) SAM files, Bio::Assembly::IO::sam. So, it looks like you can already do SAMtoACE within Bioperl. Implementing ACEtoSAM would involve adding write support to the Bio::Assembly::sam module. This can be helped by looking at how Bio::Assembly::IO::ace and Bio::Assembly::tigr implement write support. Regards, Florent On 12/04/12 22:47, Peter Cock wrote: > On Thu, Apr 5, 2012 at 5:52 AM, afia hayati wrote: >> Dear all, >> >> I am afia, a PhD student in Bioinformatics. I am so interested to >> participate GSoC and would like to apply for project Bio::Assembly, Ace2Sam >> and Sam2Ace converter. I have written a proposal based on the guidance for >> prospective GSoC student. I paste my proposal in here. >> If you have time, please give me suggestions. >> Thank you very much. >> >> Sincerely, >> Afiahayati > Hello Afiahayati, > > What would you use this converter for? > > I can see it is useful to convert ACE to SAM/BAM for downstream analysis > and visualization. At the moment the only assemblers I regularly use which > produce ACE are the Roche 'Newbler' gsAssember, and MIRA. 
> > For MIRA, Bastien is working on native SAM output, but for the moment > I wrote and maintain a converter from MIRA's alignment format (MAF) to > SAM: https://github.com/peterjc/maf2sam > > Or is the idea more to support SAM (and BAM) assemblies within the > existing BioPerl Bio::Assembly::IO: framework to allow easier > manipulation from Perl? > > Peter > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From p.j.a.cock at googlemail.com Fri Apr 13 08:32:00 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 13 Apr 2012 09:32:00 +0100 Subject: [Bioperl-l] Proposal of GSoC 2012 >> ACEtoSAM and SAMtoACE In-Reply-To: <4F879272.30306@gmail.com> References: <4F879272.30306@gmail.com> Message-ID: On Fri, Apr 13, 2012 at 3:41 AM, Florent Angly wrote: > Bioperl has a module to read and write ACE files, Bio::Assembly::IO::ace. It > also has a module to read (but not write) SAM files, Bio::Assembly::IO::sam. > So, it looks like you can already do SAMtoACE within Bioperl. Implementing > ACEtoSAM would involve adding write support to the Bio::Assembly::sam > module. This can be helped by looking at how Bio::Assembly::IO::ace and > Bio::Assembly::tigr implement write support. > Regards, > Florent Does SAMtoACE within Bioperl using Bio::Assembly::IO actually work? Note that proper multiple sequence alignments in SAM/BAM format are relatively rare - the vast majority of SAM/BAM files are just pairwise alignments which are not a good fit for ACE. Peter From k.d.murray.91 at gmail.com Fri Apr 13 09:31:06 2012 From: k.d.murray.91 at gmail.com (Kevin Murray) Date: Fri, 13 Apr 2012 19:31:06 +1000 Subject: [Bioperl-l] sequence proxy server In-Reply-To: References: Message-ID: Hi Chris and Peter, Thanks for the advice, it is much appreciated. 
I have found almost exactly what i was taking about in the bioperl scripts, github link https://github.com/bioperl/bioperl-live/blob/master/scripts/DB/bp_biofetch_genbank_proxy.pl I will have a go at porting this to use a Bio::DB::Flat cache, given that would be exactly what i envisaged. With regards to implementing a Bio::DB module for TogoWS, i may have a crack at it if no one else is (although it will probably take me a while). Are there any pointers or particular styles you guys have (other than TMTOWTDI). Cheers, Regards Kevin Murray From afia.hayati at gmail.com Sun Apr 15 00:15:11 2012 From: afia.hayati at gmail.com (afia hayati) Date: Sun, 15 Apr 2012 09:15:11 +0900 Subject: [Bioperl-l] Proposal of GSoC 2012 >> ACEtoSAM and SAMtoACE In-Reply-To: References: <4F879272.30306@gmail.com> Message-ID: Peter, Florent, and all, thanks for the responses. Ya.., the idea is more to support SAM assemblies within the existing Bio::Assembly::IO. SAM or ACE files once imported should have similar handles and methods. Bio::Assembly::IO::SAM is a read only. I also will try to add write support for that module. In Bio::Assembly::ACE, there are write methods, completed with the quality score, so it "looks like" we can do SAMtoACE converter. Anyway, the main point is to add write support in Bio::Assembly::SAM. Please CMIIW, I am open to corrections and suggestions. best regards, Afiahayati On Fri, Apr 13, 2012 at 5:32 PM, Peter Cock wrote: > On Fri, Apr 13, 2012 at 3:41 AM, Florent Angly > wrote: > > Bioperl has a module to read and write ACE files, > Bio::Assembly::IO::ace. It > > also has a module to read (but not write) SAM files, > Bio::Assembly::IO::sam. > > So, it looks like you can already do SAMtoACE within Bioperl. > Implementing > > ACEtoSAM would involve adding write support to the Bio::Assembly::sam > > module. This can be helped by looking at how Bio::Assembly::IO::ace and > > Bio::Assembly::tigr implement write support. 
> > Regards, > > Florent > > Does SAMtoACE within Bioperl using Bio::Assembly::IO actually work? > > Note that proper multiple sequence alignments in SAM/BAM format are > relatively rare - the vast majority of SAM/BAM files are just pairwise > alignments which are not a good fit for ACE. > > Peter From jovel_juan at hotmail.com Sun Apr 15 03:27:57 2012 From: jovel_juan at hotmail.com (Juan Jovel) Date: Sun, 15 Apr 2012 03:27:57 +0000 Subject: [Bioperl-l] how to get specific sub-sequence from GenBank file with Bio::DB::GenBank In-Reply-To: References: , , <4F879272.30306@gmail.com>, , Message-ID: Hello All, I want to get some subsequences from provirus sequences in the GenBank, I got the whole sequences with the script below. However, I want to get a specific sub-sequence, which appears in the GenBank files in the line: LTR 9091..9723. How can I modify my script to get only nts 9091-9723 (in this example), instead of the whole sequence? Thanks a lot in advance!
________________________
HERE THE SCRIPT:

#!/usr/bin/perl -w
use strict;
use Bio::DB::GenBank;
use Bio::SeqIO;

chomp(my $infile = $ARGV[0]);
open(IN, "$infile") or die "$!";
my @ids = <IN>;
chomp(my $outfile = $ARGV[1]);
my $seqs_out = Bio::SeqIO -> new(-file => ">$outfile", -format => "fasta");
foreach my $entry(@ids){
    print "$entry";
    my $db = Bio::DB::GenBank->new;
    my $seq = $db->get_Seq_by_acc($entry);
    $seqs_out->write_seq($seq);
}
exit;

From roy.chaudhuri at gmail.com Mon Apr 16 11:16:57 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 16 Apr 2012 12:16:57 +0100 Subject: [Bioperl-l] how to get specific sub-sequence from GenBank file with Bio::DB::GenBank In-Reply-To: References: , , <4F879272.30306@gmail.com>, , Message-ID: <4F8BFFA9.9030305@gmail.com> Hi Juan, If you know the LTR coordinates in advance, then you can download a specific subsequence using Bio::DB::GenBank as shown here:
http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::GenBank_when_you_have_genomic_coordinates_to_get_a_Seq_object If you don't, then you will need to download the whole sequence as you are doing, but add in some code to print out just the sequence associated with the LTR feature. Something like (untested):

for my $feat ($seq->get_SeqFeatures) {
    $seqs_out->write_seq($feat->spliced_seq) if $feat->primary_tag eq 'LTR';
}

Cheers, Roy. On 15/04/2012 04:27, Juan Jovel wrote: > > > Hello All, I want to get some subsequences from provirus sequences in > the GenBank, I got the whole sequences with the script below. > However, I want to get a specific sub-sequence, which appears in the > GenBank files in the line: LTR 9091..9723. How can I > modify my script to get only nts 9091-9723 (in this example), instead > of the whole sequence? Thanks a lot in > advance! ________________________ HERE THE SCRIPT:
> #!/usr/bin/perl -w
> use strict;
> use Bio::DB::GenBank;
> use Bio::SeqIO;
> chomp(my $infile = $ARGV[0]);
> open(IN, "$infile") or die "$!";
> my @ids = <IN>;
> chomp(my $outfile = $ARGV[1]);
> my $seqs_out = Bio::SeqIO -> new(-file => ">$outfile", -format => "fasta");
> foreach my $entry(@ids){
>     print "$entry";
>     my $db = Bio::DB::GenBank->new;
>     my $seq = $db->get_Seq_by_acc($entry);
>     $seqs_out->write_seq($seq);
> }
> exit;
> _______________________________________________ Bioperl-l mailing > list Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sharmashalu.bio at gmail.com Mon Apr 16 20:08:23 2012 From: sharmashalu.bio at gmail.com (shalu sharma) Date: Mon, 16 Apr 2012 16:08:23 -0400 Subject: [Bioperl-l] Amino Acid sequence to nucleotides sequence Message-ID: Hi All, Is there any way in Bioperl i can convert amino acid sequences to nucleotide sequences.
Thanks Shalu From p.j.a.cock at googlemail.com Mon Apr 16 20:32:20 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 16 Apr 2012 21:32:20 +0100 Subject: [Bioperl-l] Amino Acid sequence to nucleotides sequence In-Reply-To: References: Message-ID: On Mon, Apr 16, 2012 at 9:08 PM, shalu sharma wrote: > Hi All, > Is there any way in Bioperl i can convert amino acid sequences > to nucleotide sequences. > > Thanks > Shalu Probably - but there is more than one answer since the codon tables are a many-to-one mapping. Are you hoping for one possible nucleotide sequence, perhaps with IUPAC ambiguity characters? Perhaps a specific example of what you want would help - back-translation is a fuzzy term. If you are trying to combine a protein alignment with the original unaligned nucleotide sequences to make a codon alignment that's a different task. Peter From cjfields at illinois.edu Mon Apr 16 20:44:21 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 16 Apr 2012 20:44:21 +0000 Subject: [Bioperl-l] Amino Acid sequence to nucleotides sequence In-Reply-To: References: Message-ID: <45DE0C13-27B3-4E1C-AB8A-83B99DD407AF@illinois.edu> On Apr 16, 2012, at 3:32 PM, Peter Cock wrote: > On Mon, Apr 16, 2012 at 9:08 PM, shalu sharma wrote: >> Hi All, >> Is there any way in Bioperl i can convert amino acid sequences >> to nucleotide sequences. >> >> Thanks >> Shalu > > Probably - but there is more than one answer since the codon > tables are a many-to-one mapping. Are you hoping for one > possible nucleotide sequence, perhaps with IUPAC ambiguity > characters? Perhaps a specific example of what you want > would help - back-translation is a fuzzy term. > > If you are trying to combine a protein alignment with the > original unaligned nucleotide sequences to make a codon > alignment that's a different task.
> > Peter We do have a revtranslate function in bioperl that is supposed to deal with ambiguities: https://metacpan.org/module/Bio::Tools::CodonTable#revtranslate I don't know how well-tested it is, but it was added a few years back to Bio::Tools::CodonTable. IIRC Mark Jensen was the developer who did that, and he's pretty meticulous. chris From Russell.Smithies at agresearch.co.nz Mon Apr 16 21:28:11 2012 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 17 Apr 2012 09:28:11 +1200 Subject: [Bioperl-l] sequence proxy server In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCE50A550@exchsth.agresearch.co.nz> I assume you've done the obvious thing and tried downloading from your local mirror? ftp://biomirror.aarnet.edu.au/biomirror/ Or ours: http://www.biomirror.org.nz/ If you have a large number of requests it's almost always faster to download the refseq files and extract locally rather than run queries against NCBI via the web. --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Kevin Murray > Sent: Saturday, 7 April 2012 1:50 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] sequence proxy server > > Hi all, > > I'm an undergrad student in molecular biology at the ANU in Australia, and > my research projects are becoming increasingly bioinformatics heavy. The > latest one has involved quite a large amount of sequence retrieval from > GenBank and GenPept. The download speed to Australia from NCBI's servers > is rather slow, and i've been thinking about how we can improve this. One > solution would be to use Bio::DB::Flat with GenBank sequences on a local > computer. However, in a situation where there are multiple people in a lab > doing bioinformatics, it seems to me a bit of a waste to have the entire > genbank/genpept database, or even the relevant sections thereof, on each > computer. 
So, i though about writing a "sequence proxy" cgi script, and a > corresponding module, which would work a bit like this: > > The user calls Bio::DB::SeqProxy::GenBank as they would Bio::DB::GenBank, > with the exception that a parameter for the address of the sequence proxy > server is required. > The module then sends a request similar to that sent to NCBI's servers by > calling Bio::DB::GenBank->get_Seq_by_x() to the sequence proxy server I > believe all requests go to the efetch page now (please correct me if I'm > wrong, i have read the relevant bioperl module code but not thoroughly), so > the CGI script on the sequence proxy would take arguments in a similar > fashion to make writing the client side module easier. > The CGI script would use a Bio::DB::Flat database, or an interface to an SQL > database to determine if the required sequence is stored locally. (as a aside, > i'd like your thoughts on Bio::DB::Flat vs Bio::DB::Sql or similar) If the > sequence exists locally, it would be returned to the user, either as plain text, > or inside an XML container (see below). > If not, it would be retrieved from the remote database using the relevant > Bio::DB module, and returned. > > The sequence would either be returned as the relevant sequence format > (which would default to GenBank format) in plain text, or as an XML > document similar to: > > > 1 > ___YOUR GENBANK FILE HERE___ Local > Database The aim of the xml document would be to > simplify handling of server errors and allow for the specification of other > metadata such as which database the sequence came from. > > > Firstly, I'd like to know if this sounds feasible, and if so, if someone is already > working on something similar? I don't want to reinvent the wheel. > Secondly, I'd like to ask for your comments and advice. 
Being reasonably new > to bioperl (started using bioperl about 6 months ago, but I've been coding in > various languages for 8 years) I don't expect to have considered things that > may seem obvious to a more experienced bioperl-er, so please be as brutally > constructive in your criticism as you see fit =]. > > I know this is alot of questions, so thanks in advance for your help. > > Cheers, and a happy Easter to those who celebrate it. > > Regards > Kevin Murray > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From hnorpois at googlemail.com Thu Apr 19 14:44:50 2012 From: hnorpois at googlemail.com (Hermann Norpois) Date: Thu, 19 Apr 2012 16:44:50 +0200 Subject: [Bioperl-l] Transcriptional Regulatory Element Database Message-ID: Hello, I would like to get access to the Transcriptional Regulatory Element Database (http://rulai.cshl.edu/cgi-bin/TRED/tred.cgi?process=searchPromForm) via Bioperl. I did not find a module that does the job. Is it possible to modify a module? Is it generally possible to access this database (by means of bioperl)? 
Thank you norpois From jason.stajich at gmail.com Thu Apr 19 22:45:32 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Thu, 19 Apr 2012 15:45:32 -0700 Subject: [Bioperl-l] Transcriptional Regulatory Element Database In-Reply-To: References: Message-ID: <80CFDDE6-FA7F-4614-AE5D-22A5398EAA17@gmail.com> Have you first tried emailing the author listed at the bottom of the page? That seems like a more direct way to get this information. On Apr 19, 2012, at 7:44 AM, Hermann Norpois wrote: > Hello, > > I would like to get access to the Transcriptional Regulatory Element > Database (http://rulai.cshl.edu/cgi-bin/TRED/tred.cgi?process=searchPromForm) > via Bioperl. I did not find a module that does the job. Is it possible to > modify a module? Is it generally possible to access this database (by means > of bioperl)? > Thank you > norpois > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From merche at uni-bonn.de Mon Apr 23 12:31:27 2012 From: merche at uni-bonn.de (Merche Castillo) Date: Mon, 23 Apr 2012 14:31:27 +0200 Subject: [Bioperl-l] Problems installing BioPerl & cpan Message-ID: <4F954B9F.9020506@uni-bonn.de> I'm posting this message out of pure desperation, because I really don't know what else to try. I'm a beginner in bioperl and I'm working on a script to parse out some results I got from MolQuest fgenesh. Results are out in .txt format and I want to parse them to GFF and fasta file for mRNA and protein sequences to facilitate comparison with other results we have. I would like to use BioPerl for other purposes in the future so I'm very interested in getting it ready on my pc. I followed the instructions here: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix. I managed to install CPAN in root mode (otherwise it wouldn't work) and BioPerl via CPAN.
All tests were ok, but when I ran this script to test the installation:

use strict;
use warnings;

use Getopt::Long;
use Bio::EnsEMBL::Registry;

my $reg = "Bio::EnsEMBL::Registry";
$reg->load_registry_from_db(
    -host => "ensembldb.ensembl.org",
    -user => "anonymous"
);
my $db_list=$reg->get_all_adaptors();
my @line;

foreach my $db (@$db_list){
    @line = split ('=',$db);
    print $line[0]."\n";
}

I got the error: "Can't locate Bio/EnsEMBL/Registry.pm in @INC". I tried to install BioPerl again via Build.PL, running as root, but still came to the same outcome. Thanks for your help Merche -- ************************************;) Mercedes Castillo INRES, Dept. Molecular Phytomedicine University of Bonn Karlrobert-Kreiten-str 13 53115 Bonn +49(0)22873-60143 merche at uni-bonn.de ***************************************** From jason.stajich at gmail.com Mon Apr 23 13:44:51 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 23 Apr 2012 06:44:51 -0700 Subject: [Bioperl-l] Problems installing BioPerl & cpan In-Reply-To: <4F954B9F.9020506@uni-bonn.de> References: <4F954B9F.9020506@uni-bonn.de> Message-ID: <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com> That module really has nothing to do with Fgenesh parsing so I don't think this is a particularly good test script - try one of the scripts that comes in bioperl scripts directory like scripts/utilities/bp_sreformat.pl. However, you should know that module is part of EnsEMBL not BioPerl - it requires an additional package of modules be installed. If you use CPAN to install things you can do

cpan> install Bio::EnsEMBL::Registry

On Apr 23, 2012, at 5:31 AM, Merche Castillo wrote: > I'm posting this message out of pure desperation, because I really don't know what else to try. I'm a beginner in bioperl and I'm working on a script to parse out some results I got from MolQuest fgenesh.
Results are out in .txt format and I want to parse them to GFF and fasta file for mRNA and protein sequences to facilitate comparison with other results we have. I would like to use BioPerl for other purposes in the future so I'm very interested in getting it ready on my pc > > I followed the instructions here: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix. I managed to install CPAN in root mode (otherwise it wouldn't work) and BioPerl via CPAN. All tests were ok, but when I ran this script to test the installation > > use strict; > use warnings; > > use Getopt::Long; > use Bio::EnsEMBL::Registry; > > my $reg = "Bio::EnsEMBL::Registry"; > $reg->load_registry_from_db( > -host => "ensembldb.ensembl.org", > -user => "anonymous" > ); > my $db_list=$reg->get_all_adaptors(); > my @line; > > foreach my $db (@$db_list){ > @line = split ('=',$db); > print $line[0]."\n"; > } > > I got the error: "Can't locate Bio/EnsEMBL/Registry.pm in @INC" > > I tried to install BioPerl again via Build.PL, running as root, but still came to the same outcome. > > Thanks for your help Merche > > -- > ************************************;) > Mercedes Castillo > INRES, Dept.
Molecular Phytomedicine > University of Bonn > > Karlrobert-Kreiten-str 13 > 53115 Bonn > +49(0)22873-60143 > merche at uni-bonn.de > ***************************************** > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From cjfields at illinois.edu Mon Apr 23 13:48:53 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 23 Apr 2012 13:48:53 +0000 Subject: [Bioperl-l] Problems installing BioPerl & cpan In-Reply-To: <4F954B9F.9020506@uni-bonn.de> References: <4F954B9F.9020506@uni-bonn.de> Message-ID: <26594DC3-0C8D-41E5-BD22-BC3F1DC7E1F0@illinois.edu> You need the Ensembl Perl API code, which requires bioperl but is not part of the bioperl distribution. See here for the latest: http://ensembl.org/info/docs/api/index.html chris On Apr 23, 2012, at 7:31 AM, Merche Castillo wrote: > I'm posting this message out of pure desperation, because I really don't know what else to try. I'm a beginner in bioperl and I'm working on a script to parse out some results I got from MolQuest fgenesh. Results are out in .txt format and I want to parse them to GFF and fasta file for mRNA and protein sequences to facilitate comparison with other results we have. I would like to use BioPerl for other purposes in the future so I'm very interested in getting it ready on my pc > > I followed the instructions herehttp://www.bioperl.org/wiki/Installing_Bioperl_for_Unix. I managed to install CPAN in root mode (otherwise it wouldn't work) and BioPerl via CPAN. 
From jason.stajich at gmail.com Mon Apr 23 13:54:54 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 23 Apr 2012 06:54:54 -0700 Subject: [Bioperl-l] Problems installing BioPerl & cpan In-Reply-To: <4F955E52.50400@uni-bonn.de> References: <4F954B9F.9020506@uni-bonn.de> <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com> <4F955E52.50400@uni-bonn.de> Message-ID: <78EB7156-3EC8-4CCD-AE6E-C221B12D4F58@gmail.com> Then the next logical thing to do is go to the Ensembl page for info on how to install their modules. http://uswest.ensembl.org/info/docs/api/api_installation.html On Apr 23, 2012, at 6:51 AM, Merche Castillo wrote: > Hi > > Thanks for your reply. I'm working on some EnsEMBL scripts too, that's why I tried this script. I did look for the Bio::EnsEMBL::Registry on cpan but returns "no object found".
Jason Stajich jason.stajich at gmail.com jason at bioperl.org From cjfields at illinois.edu Mon Apr 23 13:51:24 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 23 Apr 2012 13:51:24 +0000 Subject: [Bioperl-l] Problems installing BioPerl & cpan In-Reply-To: <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com> References: <4F954B9F.9020506@uni-bonn.de> <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com> Message-ID: <10108081-D3CC-4A8F-9372-390A2C377088@illinois.edu> Unfortunately the Ensembl code isn't on CPAN (wish it were, it might make things a little easier). chris On Apr 23, 2012, at 8:44 AM, Jason Stajich wrote: > That module really has nothing to do with Fgenesh parsing so I don't think this is a particularly good test script - try one of the scripts that comes in bioperl scripts directory like scripts/utilities/bp_sreformat.pl > > However, you should know that module is part of EnsEMBL not BioPerl - it requires an additional package of modules be installed.
From l.m.timmermans at students.uu.nl Mon Apr 23 14:16:04 2012 From: l.m.timmermans at students.uu.nl (Leon Timmermans) Date: Mon, 23 Apr 2012 16:16:04 +0200 Subject: [Bioperl-l] Problems installing BioPerl & cpan In-Reply-To: <10108081-D3CC-4A8F-9372-390A2C377088@illinois.edu> References: <4F954B9F.9020506@uni-bonn.de> <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com> <10108081-D3CC-4A8F-9372-390A2C377088@illinois.edu> Message-ID: Yeah, that's fairly useless. Does anyone know the reason for that? Is it just inertia, or is it more? Leon On Mon, Apr 23, 2012 at 3:51 PM, Fields, Christopher J < cjfields at illinois.edu> wrote: > Unfortunately the Ensembl code isn't on CPAN (wish it were, it might make > things a little easier). > > chris > > On Apr 23, 2012, at 8:44 AM, Jason Stajich wrote: > > > That module really has nothing to do with Fgenesh parsing so I don't > think this is a particularly good test script - try one of the scripts that > comes in bioperl scripts directory like scripts/utilities/bp_sreformat.pl > > > > However, you should know that module is part of EnsEMBL not BioPerl - it > requires an additional package of modules be installed.
From cjfields at illinois.edu Mon Apr 23 14:20:59 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 23 Apr 2012 14:20:59 +0000 Subject: [Bioperl-l] Problems installing BioPerl & cpan In-Reply-To: References: <4F954B9F.9020506@uni-bonn.de> <60AE84D8-FEE8-4B3C-8044-387B35071F3A@gmail.com> <10108081-D3CC-4A8F-9372-390A2C377088@illinois.edu> Message-ID: <70FCB632-4CD5-4F28-A6B6-F93507397435@illinois.edu> Not sure, but it may have something to do with the requirement for a very old bioperl (v1.2.3). chris On Apr 23, 2012, at 9:16 AM, Leon Timmermans wrote: > Yeah, that's fairly useless. Does anyone know the reason for that? Is it just inertia, or is it more? > > Leon > > On Mon, Apr 23, 2012 at 3:51 PM, Fields, Christopher J wrote: > Unfortunately the Ensembl code isn't on CPAN (wish it were, it might make things a little easier).
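The "Can't locate Bio/EnsEMBL/Registry.pm in @INC" error in this thread means that none of the directories Perl searches (@INC) contains that file. That is expected until the separate Ensembl Perl API is installed and made visible to Perl; Ensembl's installation instructions typically do this by adding the API's modules directory to PERL5LIB. As a quick diagnostic, here is a minimal pure-Perl sketch, not part of BioPerl or Ensembl, that repeats the file lookup require performs and reports where a module would be loaded from. The helper names module_to_file and find_in_inc are hypothetical.

```perl
use strict;
use warnings;

# Convert a package name into the relative path that `use`/`require`
# searches for, e.g. Bio::EnsEMBL::Registry -> Bio/EnsEMBL/Registry.pm
sub module_to_file {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return $file;
}

# Return the full path of the first @INC entry containing the module,
# or undef if no directory has it.
sub find_in_inc {
    my ($module) = @_;
    my $file = module_to_file($module);
    for my $dir (grep { !ref } @INC) {    # skip code-ref hooks in @INC
        return "$dir/$file" if -e "$dir/$file";
    }
    return;
}

print "Directories searched (\@INC):\n";
print "  $_\n" for grep { !ref } @INC;

for my $module (qw(strict Bio::EnsEMBL::Registry)) {
    my $path = find_in_inc($module);
    printf "%-25s %s\n", $module, defined $path ? $path : 'NOT FOUND in @INC';
}
```

The one-liner `perl -MBio::EnsEMBL::Registry -e 1` answers the same question more bluntly: it prints nothing if the module loads and the same "Can't locate" error otherwise.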
From rbuels at gmail.com Mon Apr 23 23:49:10 2012 From: rbuels at gmail.com (Robert Buels) Date: Mon, 23 Apr 2012 19:49:10 -0400 Subject: [Bioperl-l] Announcing OBF Google Summer of Code Accepted Students Message-ID: <4F95EA76.4030004@gmail.com> Hello all, I'm very pleased and excited to announce that the Open Bioinformatics Foundation has selected 5 very capable students to work on OBF projects this summer as part of the Google Summer of Code program.
The accepted students, their projects, and their mentors (in alphabetical order):

Wibowo Arindrarto
  SearchIO Implementation in Biopython
  mentored by Peter Cock

Lenna Peterson
  Diff My DNA: Development of a Genomic Variant Toolkit for Biopython
  mentored by Brad Chapman

Marjan Povolni
  The world's fastest parallelized GFF3/GTF parser in D, and an interfacing biogem plugin for Ruby
  mentored by Pjotr Prins, Francesco Strozzi, Raoul Bonnal

Artem Tarasov
  Fast parallelized GFF3/GTF parser in C++, with Ruby FFI bindings
  mentored by Pjotr Prins, Francesco Strozzi, Raoul Bonnal

Clayton Wheeler
  Multiple Alignment Format parser for BioRuby
  mentored by Francesco Strozzi and Raoul Bonnal

As in every year, we received many great applications and ideas. However, funding and mentor resources are limited, and we were not able to accept as many as we would have liked. Our deepest thanks to all the students who applied: we sincerely appreciate the time and effort you put into your applications, and hope you will still consider being a part of the OBF's open source projects, even without Google funding. I speak for myself and all of the mentors who read and scored applications when I say that we were truly honored by the number and quality of the applications we received. For the accepted students: congratulations! You have risen to the top of a very competitive application process. Now it's time to "put your money where your mouth is", as the saying goes. Let's get out there and write some great code this summer!
Best regards, Rob ---- Robert Buels OBF GSoC 2012 Administrator From Simon.Guest at agresearch.co.nz Mon Apr 30 06:00:26 2012 From: Simon.Guest at agresearch.co.nz (Guest, Simon) Date: Mon, 30 Apr 2012 18:00:26 +1200 Subject: [Bioperl-l] Circular dependency problems packaging BioPerl as RPM Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCECE5EF4@exchsth.agresearch.co.nz> Dear BioPerlers, I am building BioPerl and BioPerl-Run as RPMs, since I have to install them on several servers, and really don't want to run CPAN installation scripts on each machine. It has been a tortuous journey of chasing down dependencies and packaging them (thank goodness for cpanspec), but I think I am nearly done. However, I have hit a circular dependency / incompatibility problem between BioPerl and BioPerl-Run. When building BioPerl-Run-1.006900, it insists on BioPerl-1.6.901:

Checking prerequisites...
 - ERROR: Bio::Root::Version (1.006001) is installed, but we need version >= 1.006900

But then BioPerl-Run-1.006900 has dependencies on

Bio::Expression::DataSet
Bio::Expression::Platform
Bio::Expression::Sample
Bio::Expression::Contact

which seem to be in BioPerl-1.6.1 but not BioPerl-1.6.901. Does anyone know of this problem? Are there any suggestions for workarounds? cheers, Simon ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
======================================================================= From cjfields at illinois.edu Mon Apr 30 13:42:34 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 30 Apr 2012 13:42:34 +0000 Subject: [Bioperl-l] Circular dependency problems packaging BioPerl as RPM In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF34CCECE5EF4@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF34CCECE5EF4@exchsth.agresearch.co.nz> Message-ID: <3E2BD058-EDB0-482B-9727-8DB7FB466737@illinois.edu> The Bio::Expression dependencies are unusual, I'll have to look through and find the modules responsible for pulling these in. When I last ran these no tests failed, so either the dependency is off or no tests have been written for the modules in question. We can always release a new CPAN BioPerl-Run to deal with it. chris On Apr 30, 2012, at 1:00 AM, Guest, Simon wrote: > Dear BioPerlers, > > I am building BioPerl and BioPerl-Run as RPMs, since I have to install them on > several servers, and really don't want to run CPAN installation scripts on > each machine. > > It has been a tortuous journey of chasing down dependencies and packaging them > (thank goodness for cpanspec), but I think I am nearly done. > > However, I have hit a circular dependency / incompatibility problem between > BioPerl and BioPerl-Run. > > When building BioPerl-Run-1.006900, it insists on BioPerl-1.6.901: > Checking prerequisites... > - ERROR: Bio::Root::Version (1.006001) is installed, but we need version >= 1.006900 > > But then BioPerl-Run-1.006900 has dependencies on > Bio::Expression::DataSet > Bio::Expression::Platform > Bio::Expression::Sample > Bio::Expression::Contact > which seem to be in BioPerl-1.6.1 but not BioPerl-1.6.901 > > Does anyone know of this problem? > > Are there any suggestions for work arounds? 
From hnorpois at googlemail.com Mon Apr 30 16:45:50 2012 From: hnorpois at googlemail.com (Hermann Norpois) Date: Mon, 30 Apr 2012 18:45:50 +0200 Subject: [Bioperl-l] different interpretion of get_seq_by_id by DB::GenBank and DB:Entrez::Gene Message-ID: I am confused by the different interpretation of get_Seq_by_id. Apparently it means something different for the two modules.
Script1:

#!/bin/perl -w
use Bio::DB::GenBank;
use Bio::SeqIO;

# Set the output format
$seqio_obj = Bio::SeqIO->new(-file => '>sequence.fasta', -format => 'fasta' );
$db_obj = Bio::DB::GenBank->new;
$id = "BC049766"; # accession number
$seq_obj = $db_obj->get_Seq_by_id($id);
$seqio_obj->write_seq($seq_obj);

Script2:

#!/bin/perl -w
use strict;
use Bio::DB::EntrezGene;

my $id = "Penk1"; #name of the gene
my $db = new Bio::DB::EntrezGene;
my $seq = $db->get_Seq_by_id($id);
my $ac = $seq->annotation;
for my $ann ($ac->get_Annotations('dblink')) {
    if ($ann->database eq "Evidence Viewer") {
        # get the sequence identifier, the start, and the stop
        my ($contig,$from,$to) = $ann->url =~ /contig=([^&]+).+from=(\d+)&to=(\d+)/;
        print "$contig\t$from\t$to\n";
    }
}

Thank you Hermann Norpois From jimhu at tamu.edu Mon Apr 30 17:38:23 2012 From: jimhu at tamu.edu (Jim Hu) Date: Mon, 30 Apr 2012 12:38:23 -0500 Subject: [Bioperl-l] Gbrowse file uploads, bigwig and chromosome sizes files Message-ID: <1F4B23DC-2CD1-4D61-A6F4-D823B4C7C7D1@tamu.edu> I'm not sure how many of our issues are gbrowse-specific vs. more general bioperl issues, so I'm cross-posting to both lists. We think we've traced our problems uploading wiggle files to our gbrowse to the failure to create the chromosome.size file. Short version:

- what is supposed to be in the locationlist? Chromosomes only or just genes?
- why does generating the chromosome sizes file try to get everything in the locationlist, whether or not it's a chromosome?

Long version: Our E. coli MG1655 database was loaded several years ago with

bp_seqfeature_load.pl -d gb_MG1655_jh -f -c NC_000913.gb.gff NC_000913.gb.fasta -u -p

The mysql database has 4,146 entries in the locationlist where the first one is for the chromosome and the others are named for genes. When we ask Gbrowse to generate the chromosome sizes file, instead of doing what I expect (look up the reference feature names), it tries to get the size of every feature in the locationlist.
I can't actually find the fasta file I used. When this happens, the eval in Bio::Graphics::Browser2::DataLoader dies because it does not seem to be passing allow_aliases to this subroutine in Bio::DB::SeqFeature::Store::DBI::mysql:

sub _name_sql {
  my $self = shift;
  my ($name,$allow_aliases,$join) = @_;
  my $name_table = $self->_name_table;

  my $from = "$name_table as n";
  my ($match,$string) = $self->_match_sql($name);

  my $where = "n.id=$join AND n.name $match";
  $where .= " AND n.display_name>0" unless $allow_aliases;
  return ($from,$where,'',$string);
}

Here's the backtrace:

CHROMOSOME SIZES at /usr/local/share/perl/5.10.1/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 942, referer:
Bio::DB::SeqFeature::Store::DBI::mysql::_name_sql('Bio::DB::SeqFeature::Store::DBI::mysql=HASH(0xb2bfed0)', 'b0001', undef, 'f.id') called at /usr/local/share/perl/5.10.1/Bio/DB/SeqFeature/Store/DBI/mysql.pm
Bio::DB::SeqFeature::Store::DBI::mysql::_features('Bio::DB::SeqFeature::Store::DBI::mysql=HASH(0xb2bfed0)', '-name', 'b0001', '-class', undef, '-aliases', undef,
Bio::DB::SeqFeature::Store::get_features_by_name('Bio::DB::SeqFeature::Store::DBI::mysql=HASH(0xb2bfed0)', '-name', 'b0001') called at /usr/local/share/perl/5.10.1/Bio/DB/SeqFeature/Store.pm line
Bio::DB::SeqFeature::Store::segment('Bio::DB::SeqFeature::Store::DBI::mysql=HASH(0xb2bfed0)', 'b0001') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/DataLoader.pm line 171,
eval {...} called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/DataLoader.pm line 169,
Bio::Graphics::Browser2::DataLoader::generate_chrom_sizes('Bio::Graphics::Browser2::DataLoader=HASH(0xb2bfbd0)', '/var/tmp/gbrowse2/chrom_sizes/MG1655.sizes') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/DataLoader.pm line 143,
Bio::Graphics::Browser2::DataLoader::chrom_sizes('Bio::Graphics::Browser2::DataLoader=HASH(0xb2bfbd0)') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/Action.pm line 1117, referer:
Bio::Graphics::Browser2::Action::ACTION_chrom_sizes('Bio::Graphics::Browser2::Action=REF(0xa993ea0)', 'CGI=HASH(0xaf57450)') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/Render.pm line 427,
Bio::Graphics::Browser2::Render::asynchronous_event('Bio::Graphics::Browser2::Render::HTML=HASH(0xaf590d8)') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/Render.pm line 356, referer:
Bio::Graphics::Browser2::Render::run_asynchronous_event('Bio::Graphics::Browser2::Render::HTML=HASH(0xaf590d8)') called at /usr/local/lib/perl/5.10.1/Bio/Graphics/Browser2/Render.pm line 274, referer:
Bio::Graphics::Browser2::Render::run('Bio::Graphics::Browser2::Render::HTML=HASH(0xaf590d8)') called at /usr/lib/cgi-bin/gb2/gbrowse line 50, referer:

===================================== Jim Hu Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From hnorpois at googlemail.com Mon Apr 30 18:06:40 2012 From: hnorpois at googlemail.com (Hermann Norpois) Date: Mon, 30 Apr 2012 20:06:40 +0200 Subject: [Bioperl-l] Retrieving promoter sequenc Message-ID: Dear list, I am trying to write a script for retrieving a 700 bp sequence upstream of the 5' end, i.e. upstream of the TSS (a putative promoter sequence).
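For a promoter window like this, which coordinates to fetch depends on the gene's strand: "upstream" means lower coordinates on the plus strand but higher coordinates on the minus strand. A minimal pure-Perl sketch, assuming 1-based inclusive coordinates and a window that excludes the TSS base itself; upstream_window is a hypothetical helper, not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper: 1-based, inclusive coordinates of the $len bases
# immediately upstream of a transcription start site ($tss).
# Plus strand: upstream = lower coordinates; minus strand: higher ones.
# The TSS base itself is not included in the window.
sub upstream_window {
    my ($tss, $strand, $len) = @_;
    my ($start, $end);
    if ($strand >= 0) {
        ($start, $end) = ($tss - $len, $tss - 1);
        $start = 1 if $start < 1;    # do not run off the sequence start
    }
    else {
        ($start, $end) = ($tss + 1, $tss + $len);
    }
    return ($start, $end);
}

my ($s, $e) = upstream_window(10_000, 1, 700);
print "plus strand:  $s..$e\n";     # plus strand:  9300..9999
($s, $e) = upstream_window(10_000, -1, 700);
print "minus strand: $s..$e\n";     # minus strand: 10001..10700
```

Under this convention, a range of $from-700 to $from is 701 bases long and includes the start position itself. A minus-strand window also has to be reverse-complemented after it is retrieved, for example with the revcom method that BioPerl sequence objects provide.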
This page gave me some information on how to do so (Chapter *Using Bio::DB::EntrezGene to get genomic coordinates* AND *Using Bio::DB::GenBank when you have genomic coordinates to get a Seq object*): http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences However, I do not know how to define $chr_acc_ver (see below):

#!/bin/perl -w
use strict;
use Bio::DB::EntrezGene;
use Bio::SeqIO;
use Bio::DB::GenBank;

my $id = "12064"; # bdnf
my $seqio_obj = Bio::SeqIO->new(-file => '>s2.fasta', -format => 'fasta' );
my $db = new Bio::DB::EntrezGene;
my $seq = $db->get_Seq_by_id($id);
my $ac = $seq->annotation;
for my $ann ($ac->get_Annotations('dblink' )) {
    if ($ann->database eq "Evidence Viewer") {
        # get the sequence identifier, the start, and the stop
        my ($contig,$from,$to) = $ann->url =~ /contig=([^&]+).+from=(\d+)&to=(\d+)/;
        my $chr_start = $from-700;
        my $chr_stop = $from;
        my $gb = Bio::DB::GenBank->new(-format => 'genbank',
                                       -seq_start => $chr_start,
                                       -seq_stop => $chr_stop,
                                       # -strand => $strand
                                      );
        my $obj = $gb->get_Seq_by_id($chr_acc_ver); # *How do I define $chr_acc_ver?*
        $seqio_obj->write_seq($obj);
        # print "$contig\t$from\t$to\n$chr_start\t$chr_stop\n";
    }
}

Can anybody give me a hint how this might work? Thanks Hermann Norpois From maquino at knome.com Mon Apr 30 19:15:26 2012 From: maquino at knome.com (Mark Aquino) Date: Mon, 30 Apr 2012 15:15:26 -0400 Subject: [Bioperl-l] unblessed reference in $sam->pileup error. Message-ID: <984E64E7-DF37-4BD1-BB84-DC86816A42E8@knome.com> Hi all, I'm trying to call all bases from a BAM file and count their depths. At first I was doing this by getting all alignments that cover a certain region, but realized that writing the logic to detect indels via the CIGAR string was a bit more complicated than I thought, so I decided to try this with the pileup method from Bio::DB::Sam / Bio::DB::Bam::Pileup. However, I am getting this error:

Can't call method "b" on unblessed reference at ./coverageDepths.pl line 114, line 1.
when trying to use the $pileup->alignment method. Does anyone have any idea what I'm missing?

109 $sam->pileup('1:550968-550969',
110   sub {
111     my ($seqid,$pos,$pileup) = @_;
112     for my $p (@$pileup){
113       if ($p->indel){ print "INDEL!\n"};
114       my $b = $pileup->b;
115       my $qbase = substr($b->qseq, $pileup->qpos,1);
116       print "$qbase\n";
117     }
118   });

Thanks, Mark From maquino at knome.com Mon Apr 30 19:18:35 2012 From: maquino at knome.com (Mark Aquino) Date: Mon, 30 Apr 2012 15:18:35 -0400 Subject: [Bioperl-l] unblessed reference on sam->pileup Message-ID: <33D55D60-7986-4971-9802-47AB9CDE3E24@knome.com> Never mind, as usual 5 seconds after sending an email to the group I realized what I was doing wrong the whole time. From exceptlowang at gmail.com Wed Apr 18 00:00:08 2012 From: exceptlowang at gmail.com (Tim White) Date: Wed, 18 Apr 2012 00:00:08 -0000 Subject: [Bioperl-l] Bio::SeqIO::tab deletes gap characters when reading sequences, which is inconvenient Message-ID: <4F8E03FE.7000506@gmail.com> Hi, Bio::SeqIO::tab (what you get when specifying -format => 'tab' to Bio::SeqIO->new()) is perfect for converting sequences into a one-per-line format, so that standard line-oriented UNIX tools (grep, comm etc.) work as expected. Except... I just discovered that it deletes gap ("-") characters when reading sequences, so it can't be used to round-trip any files that contain these. This is a source of grief as I frequently work with FASTA files that contain aligned sequences, and thus gap characters. This is all because the next_seq() function in Bio::SeqIO::tab.pm contains the line:

$seq =~ s/\W//g;

which removes all non-alphanumeric characters from the sequence data.
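The difference between the two substitutions is easy to see on a made-up aligned sequence. This minimal pure-Perl sketch (not the actual Bio::SeqIO::tab code) applies the current s/\W//g and the whitespace-only s/\s//g side by side. \W matches any non-word character, so it removes gaps, periods, asterisks and a stray carriage return alike, while keeping underscores, which count as word characters.

```perl
use strict;
use warnings;

# Made-up aligned sequence: a gap, a period, a stop '*' and a trailing CR.
my $raw = "ATG-CC.T*\r";

# Current behaviour: strip every non-word character.
(my $non_word_stripped = $raw) =~ s/\W//g;

# Proposed behaviour: strip only whitespace ("\r" counts as whitespace).
(my $ws_stripped = $raw) =~ s/\s//g;

print "s/\\W//g -> $non_word_stripped\n";   # s/\W//g -> ATGCCT
print "s/\\s//g -> $ws_stripped\n";         # s/\s//g -> ATG-CC.T*
```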
IMHO it would be *much* better if this were changed to:

    $seq =~ s/\s//g;

which simply removes all whitespace characters (in particular the \r that often appears at the ends of lines in text files that have visited Windows), allowing gap characters (and, for example, periods and asterisks) to be preserved. Alternatively, you could simply get rid of this line of code and allow whitespace characters through.

I'm not sure whether this counts as a "bug", as a cursory search didn't turn up any docs explaining precisely which characters are and aren't preserved by classes implementing Bio::SeqIO, but it's certainly inconsistent (at least Bio::SeqIO::fasta, and Bio::SeqIO::table with columns and delimiters set up appropriately, allow round-tripping of files containing gap characters) as well as extremely inconvenient for me personally, and I suspect for others.

Assuming no harm would be done by making the above change, what's the best way to get it changed? I've simply edited my own local copy of tab.pm to make the above change, but obviously if others agree I'd like to get the change made upstream.

Thanks,
Tim

From mohammadali.alavi at edu.uni-graz.at Sat Apr 21 09:22:21 2012
From: mohammadali.alavi at edu.uni-graz.at (Alavi, Mohammadali (0313xxx))
Date: Sat, 21 Apr 2012 11:22:21 +0200
Subject: [Bioperl-l] piping values into an existing GENBANK file
Message-ID: <70DA93B804A15C4387B05DEF33BC255701A1CE149E36@MSIGI.stud.ad.uni-graz.at>

Hello All,

I already have a GENBANK file, to which I need to add some features. To be precise, I want to add data (about the COG function) to the CDSs present in the GENBANK file. The data (COG functions) I need to add is stored in an array such that the first value must be added to the first CDS in the GENBANK file, the second value to the second CDS, and so on.
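One way to pair the i-th value with the i-th CDS is to collect the CDS features into an array first and then walk both arrays by index, instead of nesting one loop inside the other. The sketch below assumes $seq is a Bio::Seq read with Bio::SeqIO in genbank format and considers only the top-level features returned by get_SeqFeatures. It relies only on get_SeqFeatures, primary_tag and add_tag_value. The helper name add_values_to_cds_in_order is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: give the i-th value to the i-th CDS feature, in file order.
# $seq is expected to be a Bio::Seq, e.g. from
#   Bio::SeqIO->new(-file => 'file.gbk', -format => 'genbank')->next_seq
# Only top-level features returned by get_SeqFeatures are considered.
sub add_values_to_cds_in_order {
    my ($seq, $values, $tag) = @_;
    $tag //= 'note';

    # Keep only CDS features; genes, sources etc. must not consume values.
    my @cds = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;

    warn sprintf("%d CDS features but %d values\n", scalar @cds, scalar @$values)
        if @cds != @$values;

    # Walk both lists by index: value $i goes to CDS $i only.
    my $n = @cds < @$values ? scalar @cds : scalar @$values;
    for my $i (0 .. $n - 1) {
        $cds[$i]->add_tag_value($tag, $values->[$i]);
    }
    return $n;    # number of CDS features that received a value
}
```

To save the result, pass the same $seq to write_seq on a Bio::SeqIO object opened for output with -format => 'genbank'. When building the list of values, quote labels that contain spaces. qw() splits on whitespace, so qw(General metabolism) yields two elements.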
I tried to add the data in tag/value style to the CDSs (as described in HOWTO:Feature-Annotation provided by BioPerl), which basically works. The problem, though, is that I do not know how to tell Perl/BioPerl to take only one value at a time, add it in tag/value style to a CDS, then take the next (and only the next) value and add it to the NEXT CDS, and so on. Here is the code I used. As you can see, using for $item (@array) is not appropriate, since it adds all the values of my array to all CDSs! So is there a way of piping values, one after another, into the CDSs of a file, one after another, using BioPerl? Or is there another way of doing it in plain Perl? I would appreciate any help on that very much!

BioPerl I'm using: 1.6.1
ActivePerl I'm using: 5.12.4 (on Windows Vista)

#!/bin/perl
use Bio::SeqIO;
use Bio::SeqFeature::Generic;
use warnings;

@COGlist = qw(motility General metabolism nunknown); # think of this as the
# array I would like to add the values of to my file; the real one of course has
# as many values as the number of CDSs in the GENBANK file

$seqio_object = Bio::SeqIO -> new(-file => "file.gbk", -format => "genbank");
$seq_object = $seqio_object -> next_seq;
for $feat_object ($seq_object -> get_SeqFeatures){
    for $item(@COGlist){ # this would add all elements of the array to all of CDSs and is therefore wrong!
        $feat_object -> add_tag_value("note", $item);
    }
    for $tags ($feat_object -> get_all_tags){
        print "tag:".$tags . "\n";
        for $values ($feat_object -> get_tag_values($tags)){
            print "value: " . $values . "\n"; # as one might imagine this does not give the output I have been looking for :-))
        }
    }
}

From huansheng.xu at gmail.com Sun Apr 22 14:15:44 2012
From: huansheng.xu at gmail.com (Huansheng Xu)
Date: Sun, 22 Apr 2012 10:15:44 -0400
Subject: [Bioperl-l] configuration problem with Bio::Tools::Run::Alignment::ClustalW
Message-ID:

Hi,

I am a postdoc fellow at Massachusetts General Hospital in Boston.
I am writing to seek help with the Bio::Tools::Run::Alignment::ClustalW module available on the BioPerl website. I tried to align some DNA sequences contained in a FASTA file with the module embedded in a program (shown below), but got stuck there. The program works very well for protein sequences. I think maybe I need to configure the module specifically for DNA, but I do not know how to do that. Could you take a look and let me know how to do the configuration? Thanks a lot!

Best,
Huansheng Xu

--------------------------------------------------------------------------------

#! /usr/bin/perl
use Bio::Perl;
use Bio::SearchIO;
use Bio::AlignIO;
use Bio::Tools::Run::Alignment::Clustalw;
use warnings;
use strict;

my $filename = $ARGV[0];
die "Usage: $0 \n" unless $filename;
die "File $filename not found.\n" unless -f $filename;

# Read the list of raw sequences from the file you feed the program
my $fh = Bio::SeqIO->newFh(-file=>$filename, -format=>'fasta');
my @seq_array=<$fh>;

# pass the parameters and generate a factory to run the alignment with ClustalW
my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
@params = ('ktuple' => 2, 'dnamatrix' => 'IUB') if ($seq_array[0]->alphabet eq 'dna');
my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
my $seq_array_ref = \@seq_array;
my $aln = $factory->align($seq_array_ref);

# create a new AlignIO object
my $out = Bio::AlignIO->new(-file=> ">$filename.aln", -format=> 'clustalw');
$out->write_aln($aln);

From bubli_thakur at rediffmail.com Sat Apr 21 02:59:50 2012
From: bubli_thakur at rediffmail.com (subarna thakur)
Date: 21 Apr 2012 02:59:50 -0000
Subject: [Bioperl-l] codon usage
Message-ID: <20120421025950.8579.qmail@f4mail-235-122.rediffmail.com>

I am writing a script to determine the number of genes containing a particular codon. The codons are listed in a separate file.
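A minimal sketch of that count follows. It assumes codons are read in frame from the first base of each sequence and that sequence ids are unique. It takes the sequences as a hash (id => sequence string) so the FASTA file is read only once. A Bio::SeqIO stream returns no more sequences after it has been read to the end, so each codon from the table has to be checked against a stored copy, and the count starts from zero for every codon. The helper name genes_with_codon is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: for one codon, find which sequences contain it in
# reading frame 1 (triplets at offsets 0, 3, 6, ...).
# $seqs is a hash ref of id => sequence string.
# Returns the number of sequences with at least one hit, plus a hash ref
# of id => [1-based codon positions].
sub genes_with_codon {
    my ($seqs, $codon) = @_;
    $codon = uc $codon;
    my %hits;
    for my $id (sort keys %$seqs) {
        my $s = uc $seqs->{$id};
        # Step through complete triplets only; a trailing partial codon is ignored.
        for (my $k = 0; $k + 3 <= length($s); $k += 3) {
            push @{ $hits{$id} }, $k / 3 + 1 if substr($s, $k, 3) eq $codon;
        }
    }
    return (scalar keys %hits, \%hits);
}

# Usage sketch (needs BioPerl and the input files, so it is left commented out):
#   use Bio::SeqIO;
#   my $in = Bio::SeqIO->new(-file => 'gopi2.txt', -format => 'fasta');
#   my %seqs;
#   while (my $s = $in->next_seq) { $seqs{$s->id} = $s->seq }   # read once
#   open my $table, '<', 'table.txt' or die $!;
#   while (my $codon = <$table>) {
#       chomp $codon;
#       my ($count) = genes_with_codon(\%seqs, $codon);
#       print "Total number of sequences with $codon codon\t$count\n";
#   }
```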
The output is correct for the first codon listed in the file, but for the other codons the script does not work. Could you please point out the error in the script? The script is as follows:

#!/usr/bin/perl -w
use Bio::SeqIO;
$file2="table.txt";
$codon=0;
open OUT, ">out-test.txt" or die $!;
$seqio_obj = Bio::SeqIO->new( -file => "gopi2.txt" , '-format' => 'Fasta');
open( my $fh2, $file2 ) or die "$!";
while( my $line = <$fh2> ){
    $acc=$line;
    chomp $acc;
    while ($seq1= $seqio_obj->next_seq){
        my @output = $seq1->id;
        my $string = $seq1->seq;
        $v=0;
        $l= length($string);
        $t=$l/3;
        $k=0;
        for ($i=1; $i <= $t; $i++){
            @array2 = substr($string, $k, 3);
            $k=$k+3;
            foreach $value (@array2) {
                if ($value eq "$acc") {
                    print OUT " The sequence id is @output\n";
                    print OUT "$acc codon found in position $i\n\n";
                    $v=$v+1;
                }
            }
        }
        if ($v==0) {
            $h=0;
        } else {
            $h=1;
        }
        $codon=$codon+$h;
    }
    print OUT "Total number of sequences with $acc codon";
    print OUT "\t";
    print OUT $codon;
}
exit;

From msprasad693 at gmail.com Thu Apr 26 12:16:39 2012
From: msprasad693 at gmail.com (prasad ms)
Date: Thu, 26 Apr 2012 17:46:39 +0530
Subject: [Bioperl-l] Bioperl for global alignment
Message-ID:

Hello sir,

I am Prasad, an MS student in bioinformatics. I am doing my final year project, and sequence alignment is part of it. I have nearly 50k sequences and I want to do pairwise global alignment (NW alignment). I read the BioPerl tutorial, but it does not mention this. Could you please guide me on how to do this type of alignment using BioPerl? I assure you that it is purely for academic use. I look forward to hearing from you.

Thank you,
Regards,
Prasad MS

From msprasad693 at gmail.com Mon Apr 30 05:40:43 2012
From: msprasad693 at gmail.com (prasad ms)
Date: Mon, 30 Apr 2012 11:10:43 +0530
Subject: [Bioperl-l] Fwd: Bioperl for global alignment
In-Reply-To:
References:
Message-ID:

Hello sir,

I am Prasad, an MS student in bioinformatics.
I am doing my final year project, and sequence alignment is part of it. I have nearly 50k sequences and I want to do pairwise global alignment (NW alignment). I read the BioPerl tutorial, but it does not mention this. Could you please guide me on how to do this type of alignment using BioPerl? I assure you that it is purely for academic use. I look forward to hearing from you.

Thank you,
Regards,
Prasad MS
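The following plain-Perl sketch shows what a pairwise global (Needleman-Wunsch) alignment computes, using a linear gap penalty. It is not a BioPerl module. The scores (match +1, mismatch -1, gap -1) are illustrative defaults that can be overridden, for example nw_align($s, $t, gap => -2), and the name nw_align is hypothetical. It needs O(nm) time and memory per pair, so for tens of thousands of sequences a compiled implementation such as the EMBOSS needle program would be a more practical engine.

```perl
use strict;
use warnings;

# Needleman-Wunsch global alignment with a linear gap penalty.
# Default scores (match +1, mismatch -1, gap -1) are illustrative only.
# Returns (score, aligned first sequence, aligned second sequence).
sub nw_align {
    my ($s, $t, %opt) = @_;
    my $match    = $opt{match}    // 1;
    my $mismatch = $opt{mismatch} // -1;
    my $gap      = $opt{gap}      // -1;
    my @sa = split //, $s;
    my @ta = split //, $t;
    my ($n, $m) = (scalar @sa, scalar @ta);
    # Substitution score for residue $i of $s against residue $j of $t (1-based).
    my $pair_score = sub { $sa[$_[0] - 1] eq $ta[$_[1] - 1] ? $match : $mismatch };

    # $F[$i][$j]: best score for aligning the first $i residues of $s
    # with the first $j residues of $t.
    my @F;
    $F[$_][0] = $_ * $gap for 0 .. $n;
    $F[0][$_] = $_ * $gap for 0 .. $m;
    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $best = $F[$i-1][$j-1] + $pair_score->($i, $j);   # (mis)match
            my $up   = $F[$i-1][$j] + $gap;                      # gap in $t
            my $left = $F[$i][$j-1] + $gap;                      # gap in $s
            $best = $up   if $up   > $best;
            $best = $left if $left > $best;
            $F[$i][$j] = $best;
        }
    }

    # Traceback from the bottom-right cell, preferring diagonal moves.
    my ($x, $y) = ('', '');
    my ($i, $j) = ($n, $m);
    while ($i > 0 || $j > 0) {
        if ($i > 0 && $j > 0 && $F[$i][$j] == $F[$i-1][$j-1] + $pair_score->($i, $j)) {
            $x = $sa[$i-1] . $x;  $y = $ta[$j-1] . $y;  $i--;  $j--;
        }
        elsif ($i > 0 && $F[$i][$j] == $F[$i-1][$j] + $gap) {
            $x = $sa[$i-1] . $x;  $y = '-' . $y;  $i--;
        }
        else {
            $x = '-' . $x;  $y = $ta[$j-1] . $y;  $j--;
        }
    }
    return ($F[$n][$m], $x, $y);
}

my ($score, $top, $bottom) = nw_align('GATTACA', 'GCATGCU');
print "score $score\n$top\n$bottom\n";
```

Because the traceback prefers diagonal moves, it returns just one of the optimal alignments when several share the best score.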