From p.j.a.cock at googlemail.com Mon Apr 1 10:28:43 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 1 Apr 2013 15:28:43 +0100
Subject: [Bioperl-l] Google Summer of Code - BioPerl / BioSQL proposal?
Message-ID:

On 18 March 2013 21:26, Christopher Fields wrote:
> Just a heads-up, if there are any students interested in the Google Summer of
> Code, the Open Bioinformatics Foundation is planning on participating again
> this year! Pjotr Prins will be organizing for OBF; all the Bio* projects are
> looking for prospective projects.
>
> We're open for any project ideas this year, so let us know what you would like
> to do!

I suggested this last year too, but improving support for BioSQL on
SQLite would be great - the schema exists and seems to work fine,
but is currently only handled by the Biopython BioSQL bindings.

So, the core of a BioSQL/BioPerl GSoC project could tackle:

* Adding SQLite support to the BioSQL scripts for loading taxonomies etc
  https://github.com/biosql/biosql

* Adding SQLite support to BioPerl's BioSQL adapter, bioperl-db
  https://github.com/bioperl/bioperl-db

There are a number of things that could be added to this basic idea
to make the project more ambitious and to fill out a full summer. One
is to extend this to doing BioSQL on SQLite bindings for BioRuby or
BioJava (assuming suitable co-mentors are available).

One of the nice things about SQLite compared to MySQL or PostgreSQL
is that the database is just one binary file on disk which is easily portable and
can even be checked into source code control for unit tests. This means
we can use it to make cross-binding testing far, far easier. Thus another
part of a GSoC project could be to use the SQLite bindings to establish
cross-project testing of the BioSQL implementations for consistency.
At that point I'd be interested from the BioSQL and Biopython side,
and Biopython may have a few possible co-mentors here.

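The single-file property described above can be sketched in a few lines of Python, used here only because its standard-library sqlite3 module keeps the example self-contained. This is a minimal illustration, not BioSQL's schema or any Bio* binding: the demo_entry table, the accessions, and the file names are all hypothetical.

```python
import os
import shutil
import sqlite3
import tempfile


def build_fixture(path):
    """Create a tiny SQLite database file holding one toy table.

    The table is a hypothetical stand-in, not part of the BioSQL schema.
    """
    conn = sqlite3.connect(path)
    try:
        conn.execute(
            "CREATE TABLE demo_entry (accession TEXT PRIMARY KEY, description TEXT)"
        )
        conn.executemany(
            "INSERT INTO demo_entry VALUES (?, ?)",
            [("X00001", "first toy record"), ("X00002", "second toy record")],
        )
        conn.commit()
    finally:
        conn.close()


def read_accessions(path):
    """Return the accessions stored in the database file at `path`."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT accession FROM demo_entry ORDER BY accession"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


def roundtrip():
    """Build a fixture, copy the file, and read both copies back."""
    workdir = tempfile.mkdtemp()
    try:
        original = os.path.join(workdir, "fixture.db")
        copied = os.path.join(workdir, "fixture_copy.db")
        build_fixture(original)
        # A plain file copy is all it takes to move the whole database.
        shutil.copyfile(original, copied)
        return read_accessions(original), read_accessions(copied)
    finally:
        shutil.rmtree(workdir)
```

Calling roundtrip() returns the same list of accessions from the original file and from its copy, which is the property that would let one fixture file be shared between test suites in different languages.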
Anyway, to be viable this project would need a Perl mentor with a
good knowledge of BioSQL and BioPerl's bindings for it.

Is this worth adding to the BioPerl GSoC as a possible idea?
http://bioperl.org/wiki/Google_Summer_of_Code#Project_Ideas_for_2013

Regards,

Peter
(Who doesn't know enough Perl to qualify as a mentor for this)

From cjfields at illinois.edu Mon Apr 1 10:35:25 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 1 Apr 2013 14:35:25 +0000
Subject: [Bioperl-l] Google Summer of Code - BioPerl / BioSQL proposal?
In-Reply-To:
References:
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DC9169@CHIMBX5.ad.uillinois.edu>

On Apr 1, 2013, at 9:28 AM, Peter Cock wrote:

> On 18 March 2013 21:26, Christopher Fields wrote:
>> Just a heads-up, if there are any students interested in the Google Summer of
>> Code, the Open Bioinformatics Foundation is planning on participating again
>> this year! Pjotr Prins will be organizing for OBF; all the Bio* projects are
>> looking for prospective projects.
>>
>> We're open for any project ideas this year, so let us know what you would like
>> to do!
>
> I suggested this last year too, but improving support for BioSQL on
> SQLite would be great - the schema exists and seems to work fine,
> but is currently only handled by the Biopython BioSQL bindings.

++++1

> So, the core of a BioSQL/BioPerl GSoC project could tackle:
>
> * Adding SQLite support to the BioSQL scripts for loading taxonomies etc
>   https://github.com/biosql/biosql
>
> * Adding SQLite support to BioPerl's BioSQL adapter, bioperl-db
>   https://github.com/bioperl/bioperl-db
>
> There are a number of things that could be added to this basic idea
> to make the project more ambitious and to fill out a full summer. One
> is to extend this to doing BioSQL on SQLite bindings for BioRuby or
> BioJava (assuming suitable co-mentors are available).
>
> One of the nice things about SQLite compared to MySQL or PostgreSQL
> is that the database is just one binary file on disk which is easily portable and
> can even be checked into source code control for unit tests. This means
> we can use it to make cross-binding testing far, far easier. Thus another
> part of a GSoC project could be to use the SQLite bindings to establish
> cross-project testing of the BioSQL implementations for consistency.
> At that point I'd be interested from the BioSQL and Biopython side,
> and Biopython may have a few possible co-mentors here.
>
> Anyway, to be viable this project would need a Perl mentor with a
> good knowledge of BioSQL and BioPerl's bindings for it.
>
> Is this worth adding to the BioPerl GSoC as a possible idea?
> http://bioperl.org/wiki/Google_Summer_of_Code#Project_Ideas_for_2013

Yes. Will add this now.

> Regards,
>
> Peter
> (Who doesn't know enough Perl to qualify as a mentor for this)

There was a reasonable push for this a while back (BOSC in Boston) but nothing came of it code-wise that I have seen. Not sure where it stands now.

chris

From hlapp at drycafe.net Mon Apr 1 12:38:46 2013
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Mon, 1 Apr 2013 12:38:46 -0400
Subject: [Bioperl-l] Google Summer of Code - BioPerl / BioSQL proposal?
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74DC9169@CHIMBX5.ad.uillinois.edu>
References: <118F034CF4C3EF48A96F86CE585B94BF74DC9169@CHIMBX5.ad.uillinois.edu>
Message-ID: <6B9A2644-9075-486F-977C-86750D88B737@drycafe.net>

Just to add another possibility for a summer project here, Bioperl-db (the BioPerl bindings to BioSQL) in essence constitutes a self-made ORM, invented at a time when DBIx::Class didn't exist yet.
As such, it has some advantages (if you are willing to count overly clever features in this category), but arguably many more disadvantages, chief among them being the unsustainably small (you could also say non-existent) developer community supporting it, and the fact that DBIx::Class has now existed for years and is fairly mature.

So, rewriting Bioperl-db with DBIx::Class (or another well-supported generic ORM) would, I think, stand to make a considerable impact on our ability to further develop Bioperl's relational storage capabilities, as well as BioSQL itself.

And I'd be willing to help out with such a project in at least a co-mentoring capacity. (If primary mentor, I'd need a committed co-mentor to make it viable.)

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From hlapp at drycafe.net Mon Apr 1 12:49:17 2013
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Mon, 1 Apr 2013 12:49:17 -0400
Subject: [Bioperl-l] Google Summer of Code - BioPerl proposals
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74DC8BD8@CHIMBX5.ad.uillinois.edu>
References: <723EBB03-97B3-4750-876C-68F55256DCB6@drycafe.net> <118F034CF4C3EF48A96F86CE585B94BF74DC843A@CHIMBX5.ad.uillinois.edu> <118F034CF4C3EF48A96F86CE585B94BF74DC8BD8@CHIMBX5.ad.uillinois.edu>
Message-ID: <31EF3BDE-9C45-4088-A83B-39D97F78D650@drycafe.net>

On Mar 31, 2013, at 11:28 PM, Fields, Christopher J wrote:

> a case would have to be made as to why the initial run at it wasn't completed, or why it would work better this time. We're not suggesting that this can't be done, but the above point would have to be answered.

Perhaps just to clarify a bit: as an applying student, the readers you'd have in mind when making the case aren't the senior BioPerl developers or potential mentors (who'd know much of your argument already), but rather those from other OBF projects who will take part in prioritizing applications across the OBF board, but won't know much about the specific BioPerl situation.

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From carandraug+dev at gmail.com Mon Apr 1 13:17:49 2013
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Mon, 1 Apr 2013 18:17:49 +0100
Subject: [Bioperl-l] Google Summer of Code - BioPerl proposals
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74DC8BD8@CHIMBX5.ad.uillinois.edu>
References: <723EBB03-97B3-4750-876C-68F55256DCB6@drycafe.net> <118F034CF4C3EF48A96F86CE585B94BF74DC843A@CHIMBX5.ad.uillinois.edu> <118F034CF4C3EF48A96F86CE585B94BF74DC8BD8@CHIMBX5.ad.uillinois.edu>
Message-ID:

On 1 April 2013 04:28, Fields, Christopher J wrote:
> On Mar 31, 2013, at 9:05 PM, Carnë Draug wrote:
>
>> On 1 April 2013 01:34, Fields, Christopher J wrote:
>>> I agree. Another approach might be to cleave off a section that you could mould into your own; this could be done for bioperl-run, bioperl-live, etc.
>>
>> Why did the project run out of time 2 years ago? The blog posts about
>> it are very few and don't sound too bad. They mention having prepared a
>> couple of them, but none was actually ever released. Instead, the
>> source was also kept in bioperl-live and seems to have already
>> branched. Is there any reason for this? It was my understanding that
>> splitting the project is still desirable, from a discussion back in
>> February
>>
>> http://article.gmane.org/gmane.comp.lang.perl.bio.general/26395
>>
>> it just happens that no one has picked it up yet.
>
> The project actually made a lot of headway; the particular pieces moved out (Bio::Root, Bio::Factory, etc) worked fine, but we never followed up on exactly what to do next on master branch. It's perfectly feasible for someone to go ahead and finish the initial part of that (in fact, I believe there were some branches that started along this path but never merged back in).

Can I merge any branching between these and bioperl-live and set them
up so you only have to run dzil on their repos?

>> I think splitting bioperl-live into subdistributions and making a new
>> 1.70 release of each of them is perfectly doable over a summer. And I
>> say this after having split and released Bio-Biblio. This is one of my
>> itches with BioPerl. I have been using it for almost 3 years, but have
>> never seen a release. I would like to make new releases of everything,
>> no changes at the start, but take them to the point that "dzil
>> release" does everything. Make it really easy for anyone to come in
>> and contribute and even easier for a maintainer to make a new release
>> after receiving a contribution. Is this desirable for the project?
>>
>
> Hilmar's point is pretty valid, namely that a case would have to be made as to why the initial run at it wasn't completed, or why it would work better this time. We're not suggesting that this can't be done, but the above point would have to be answered.

The only reason why I claim to be able to finish this is that I'm very
familiar with both BioPerl and the tools to make the split. Plus,
I already split one (and am trying to split another) to get a clear idea
of what it involves.

> Frankly, the project has been pretty reliant on me for releases, so it's perfectly valid to point out the modules haven't made it out yet b/c I haven't made a release since then. From that point of view, this would be a continuation of that work, maybe with the intent/focus on making code releases much easier.

As a maintainer of another gigantic FOSS project that is also a
collection of libraries, I can relate to this. Of course it can be
much more interesting to write new sexy code and add it to the huge
pile of modules already in bioperl-live but I want to make it easier
for others to develop on BioPerl. Comparing with chemistry, I want
this to be the equivalent of a catalyst for the development, rather
than another reactant.

> Regarding updating Bioperl to use Dist::Zilla amongst other modern perl tools (Moose included), yes, it is very much our wish/intent to have this, in any way possible. But I don't think we can call it BioPerl v1.7, simply based on past release cycles; we're somewhat bound by deprecations, etc. We really need a clean break.
>
> So, my general feeling is that while we are cleaving out code and releasing the independent dist and core, we should re-christen core as 1.9 (i.e. pre-v2). We move to v2 when we feel we're at the right point. Each of the individual distributions would have to start with their own versions, anything greater than the point where they left the core/live distribution should work. I agree with you in that I don't think it would take a long time, but we also have bioperl-run in the mix (and in many cases it would make sense to combine wrappers with the proper parsers), so simply cleaving out from one repo may not be the best approach.
>
> With that in mind, my point was meant to indicate we can also start afresh with a section of the code that you would like to focus on, using some of the same ideas (pulling out the relevant modules you want to work on). This might be an attainable goal in the minds of GSoC reviewers and might suit your particular needs (for instance, if you had a research project reliant on such code). I'm supportive either way, and I don't think you'll have a problem finding a mentor if you need one.

I suggested 1.70 only because it has no changes. And it won't be
BioPerl 1.7. It would be Bio-Seq, Bio-Align, Bio-Popgen, etc v 1.70.
These smaller distributions can then stay as they are or evolve into
2.0 if their maintainers are so interested. I saw biome and liked it,
but is the plan to make a BioPerl 2.00 written in Moose? Won't that
path take us to the same place we are now in a couple of years? Won't
it be better to make the split now, and make the clean break on each
smaller distribution?

Would you be available to talk about this on #bioperl? I'm online
there most of the time.

Carnë

From cjfields at illinois.edu Mon Apr 1 17:45:16 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 1 Apr 2013 21:45:16 +0000
Subject: [Bioperl-l] Google Summer of Code - BioPerl proposals
In-Reply-To: <31EF3BDE-9C45-4088-A83B-39D97F78D650@drycafe.net>
References: <723EBB03-97B3-4750-876C-68F55256DCB6@drycafe.net> <118F034CF4C3EF48A96F86CE585B94BF74DC843A@CHIMBX5.ad.uillinois.edu> <118F034CF4C3EF48A96F86CE585B94BF74DC8BD8@CHIMBX5.ad.uillinois.edu> <31EF3BDE-9C45-4088-A83B-39D97F78D650@drycafe.net>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DCA76E@CHIMBX5.ad.uillinois.edu>

On Apr 1, 2013, at 11:49 AM, Hilmar Lapp wrote:

>
> On Mar 31, 2013, at 11:28 PM, Fields, Christopher J wrote:
>
>> a case would have to be made as to why the initial run at it wasn't completed, or why it would work better this time. We're not suggesting that this can't be done, but the above point would have to be answered.
>
> Perhaps just to clarify a bit: as an applying student, the readers you'd have in mind when making the case aren't the senior BioPerl developers or potential mentors (who'd know much of your argument already), but rather those from other OBF projects who will take part in prioritizing applications across the OBF board, but won't know much about the specific BioPerl situation.
>
> -hilmar

Yep, exactly.

-c

From cjfields at illinois.edu Mon Apr 1 18:23:45 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 1 Apr 2013 22:23:45 +0000
Subject: [Bioperl-l] Google Summer of Code - BioPerl proposals
In-Reply-To:
References: <723EBB03-97B3-4750-876C-68F55256DCB6@drycafe.net> <118F034CF4C3EF48A96F86CE585B94BF74DC843A@CHIMBX5.ad.uillinois.edu> <118F034CF4C3EF48A96F86CE585B94BF74DC8BD8@CHIMBX5.ad.uillinois.edu>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DCA9D4@CHIMBX5.ad.uillinois.edu>

On Apr 1, 2013, at 12:17 PM, Carnë
Draug wrote:

> On 1 April 2013 04:28, Fields, Christopher J wrote:
>> On Mar 31, 2013, at 9:05 PM, Carnë Draug wrote:
>>
>>> On 1 April 2013 01:34, Fields, Christopher J wrote:
>>>> I agree. Another approach might be to cleave off a section that you could mould into your own; this could be done for bioperl-run, bioperl-live, etc.
>>>
>>> Why did the project run out of time 2 years ago? The blog posts about
>>> it are very few and don't sound too bad. They mention having prepared a
>>> couple of them, but none was actually ever released. Instead, the
>>> source was also kept in bioperl-live and seems to have already
>>> branched. Is there any reason for this? It was my understanding that
>>> splitting the project is still desirable, from a discussion back in
>>> February
>>>
>>> http://article.gmane.org/gmane.comp.lang.perl.bio.general/26395
>>>
>>> it just happens that no one has picked it up yet.
>>
>> The project actually made a lot of headway; the particular pieces moved out (Bio::Root, Bio::Factory, etc) worked fine, but we never followed up on exactly what to do next on master branch. It's perfectly feasible for someone to go ahead and finish the initial part of that (in fact, I believe there were some branches that started along this path but never merged back in).
>
> Can I merge any branching between these and bioperl-live and set them
> up so you only have to run dzil on their repos?

I wouldn't worry about the branches, they are probably too stale. Have it so dzil works for the various repos from that project (it should already).

We will likely need to think about having a stub Build.PL that can be used for basic installation, but would be auto-generated based on the needs for that repo (and so shouldn't be committed). This is mainly to help git-savvy users, not devs; we don't necessarily want users to install dzil, which had somewhere north of 40 or so dependencies IIRC.

>>> I think splitting bioperl-live into subdistributions and making a new
>>> 1.70 release of each of them is perfectly doable over a summer. And I
>>> say this after having split and released Bio-Biblio. This is one of my
>>> itches with BioPerl. I have been using it for almost 3 years, but have
>>> never seen a release. I would like to make new releases of everything,
>>> no changes at the start, but take them to the point that "dzil
>>> release" does everything. Make it really easy for anyone to come in
>>> and contribute and even easier for a maintainer to make a new release
>>> after receiving a contribution. Is this desirable for the project?
>>>
>>
>> Hilmar's point is pretty valid, namely that a case would have to be made as to why the initial run at it wasn't completed, or why it would work better this time. We're not suggesting that this can't be done, but the above point would have to be answered.
>
> The only reason why I claim to be able to finish this is that I'm very
> familiar with both BioPerl and the tools to make the split. Plus,
> I already split one (and am trying to split another) to get a clear idea
> of what it involves.

Right, I do think it's feasible. But see Hilmar's response on this point; you don't have to convince us.

>> Frankly, the project has been pretty reliant on me for releases, so it's perfectly valid to point out the modules haven't made it out yet b/c I haven't made a release since then. From that point of view, this would be a continuation of that work, maybe with the intent/focus on making code releases much easier.
>
> As a maintainer of another gigantic FOSS project that is also a
> collection of libraries, I can relate to this. Of course it can be
> much more interesting to write new sexy code and add it to the huge
> pile of modules already in bioperl-live but I want to make it easier
> for others to develop on BioPerl. Comparing with chemistry, I want
> this to be the equivalent of a catalyst for the development, rather
> than another reactant.
>
>> Regarding updating Bioperl to use Dist::Zilla amongst other modern perl tools (Moose included), yes, it is very much our wish/intent to have this, in any way possible. But I don't think we can call it BioPerl v1.7, simply based on past release cycles; we're somewhat bound by deprecations, etc. We really need a clean break.
>>
>> So, my general feeling is that while we are cleaving out code and releasing the independent dist and core, we should re-christen core as 1.9 (i.e. pre-v2). We move to v2 when we feel we're at the right point. Each of the individual distributions would have to start with their own versions, anything greater than the point where they left the core/live distribution should work. I agree with you in that I don't think it would take a long time, but we also have bioperl-run in the mix (and in many cases it would make sense to combine wrappers with the proper parsers), so simply cleaving out from one repo may not be the best approach.
>>
>> With that in mind, my point was meant to indicate we can also start afresh with a section of the code that you would like to focus on, using some of the same ideas (pulling out the relevant modules you want to work on). This might be an attainable goal in the minds of GSoC reviewers and might suit your particular needs (for instance, if you had a research project reliant on such code). I'm supportive either way, and I don't think you'll have a problem finding a mentor if you need one.
>
> I suggested 1.70 only because it has no changes. And it won't be
> BioPerl 1.7. It would be Bio-Seq, Bio-Align, Bio-Popgen, etc v 1.70.

There may be a point where we will likely find it hard to split out more w/o running into circular dependency issues. This will likely center around Bio::Seq, Bio::SeqFeature, and Bio::Annotation (with others thrown in). But let's see how far we can go with it. If we get to a point where division becomes problematic, we can deem that 'core'. But I would like to see Bio::Seq etc in their own space.

Re: versioning: I'm not particularly hung up on any particular versioning scheme, but the key point is support. It's easy for me to say "as of bioperl v2 the installation scheme will be something completely different" as opposed to doing so with v1.7. Will installation of v1.7 be the same as it was for v1.6 (or even similar)? Will it install the same modules by default? We would be changing a key step in using BioPerl (installation) w/o much warning.

> These smaller distributions can then stay as they are or evolve into
> 2.0 if their maintainers are so interested. I saw biome and liked it,
> but is the plan to make a BioPerl 2.00 written in Moose?

Not necessarily, unless it can be demonstrated to help considerably. I think it can FWIW.

> Won't that
> path take us to the same place we are now in a couple of years? Won't
> it be better to make the split now, and make the clean break on each
> smaller distribution?

Right. Exactly. (the latter point :)

> Would you be available to talk about this on #bioperl? I'm online
> there most of the time.
>
> Carnë

I'll join in tomorrow, sure. I may be on and off channel due to meetings.

chris

From carandraug+dev at gmail.com Mon Apr 1 21:54:49 2013
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Tue, 2 Apr 2013 02:54:49 +0100
Subject: [Bioperl-l] Google Summer of Code - BioPerl proposals
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74DCA9D4@CHIMBX5.ad.uillinois.edu>
References: <723EBB03-97B3-4750-876C-68F55256DCB6@drycafe.net> <118F034CF4C3EF48A96F86CE585B94BF74DC843A@CHIMBX5.ad.uillinois.edu> <118F034CF4C3EF48A96F86CE585B94BF74DC8BD8@CHIMBX5.ad.uillinois.edu> <118F034CF4C3EF48A96F86CE585B94BF74DCA9D4@CHIMBX5.ad.uillinois.edu>
Message-ID:

On 1 April 2013 23:23, Fields, Christopher J wrote:
> On Apr 1, 2013, at 12:17 PM, Carnë
Draug wrote:
>> Can I merge any branching between these and bioperl-live and set them
>> up so you only have to run dzil on their repos?
>
> I wouldn't worry about the branches, they are probably too stale. Have it so dzil works for the various repos from that project (it should already).

I tried but I don't have push permissions for Bio-Root like I have for
the other BioPerl repos.

> We will likely need to think about having a stub Build.PL that can be used for basic installation, but would be auto-generated based on the needs for that repo (and so shouldn't be committed). This is mainly to help git-savvy users, not devs; we don't necessarily want users to install dzil, which had somewhere north of 40 or so dependencies IIRC.

Bah! People using development versions should be prepared to act as
developers. Otherwise they should be content with the stable released
versions. Development versions are not meant to be stable. I see no
reason to give users the chance to shoot themselves, especially when
it's more work for developers and maintainers.

> Re: versioning: I'm not particularly hung up on any particular versioning scheme, but the key point is support. It's easy for me to say "as of bioperl v2 the installation scheme will be something completely different" as opposed to doing so with v1.7. Will installation of v1.7 be the same as it was for v1.6 (or even similar)? Will it install the same modules by default? We would be changing a key step in using BioPerl (installation) w/o much warning.

That is my idea yes. Exactly what happened with Bio-Biblio, it changed
close to nothing. There were a few minor changes on the code to pass
the tests already in place, but mostly it was in POD to use the
BioPerl's distzilla and podweaver configuration.

Carnë

From cjfields at illinois.edu Mon Apr 1 22:19:23 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 2 Apr 2013 02:19:23 +0000
Subject: [Bioperl-l] Google Summer of Code - BioPerl proposals
In-Reply-To:
References: <723EBB03-97B3-4750-876C-68F55256DCB6@drycafe.net> <118F034CF4C3EF48A96F86CE585B94BF74DC843A@CHIMBX5.ad.uillinois.edu> <118F034CF4C3EF48A96F86CE585B94BF74DC8BD8@CHIMBX5.ad.uillinois.edu> <118F034CF4C3EF48A96F86CE585B94BF74DCA9D4@CHIMBX5.ad.uillinois.edu>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DCACE9@CHIMBX5.ad.uillinois.edu>

On Apr 1, 2013, at 8:54 PM, Carnë Draug wrote:

> On 1 April 2013 23:23, Fields, Christopher J wrote:
>> On Apr 1, 2013, at 12:17 PM, Carnë Draug wrote:
>>> Can I merge any branching between these and bioperl-live and set them
>>> up so you only have to run dzil on their repos?
>>
>> I wouldn't worry about the branches, they are probably too stale. Have it so dzil works for the various repos from that project (it should already).
>
> I tried but I don't have push permissions for Bio-Root like I have for
> the other BioPerl repos.

Should be fixed now, that one repo didn't have team set, just owners.

>> We will likely need to think about having a stub Build.PL that can be used for basic installation, but would be auto-generated based on the needs for that repo (and so shouldn't be committed). This is mainly to help git-savvy users, not devs; we don't necessarily want users to install dzil, which had somewhere north of 40 or so dependencies IIRC.
>
> Bah! People using development versions should be prepared to act as
> developers. Otherwise they should be content with the stable released
> versions. Development versions are not meant to be stable. I see no
> reason to give users the chance to shoot themselves, especially when
> it's more work for developers and maintainers.

I agree (though the definition of when something is in 'development' vs 'stable/release' is very subjective). I wouldn't do this unless requested, though, and I think the current plugin bundle does have some basic functionality that supports something if needed.

>> Re: versioning: I'm not particularly hung up on any particular versioning scheme, but the key point is support. It's easy for me to say "as of bioperl v2 the installation scheme will be something completely different" as opposed to doing so with v1.7. Will installation of v1.7 be the same as it was for v1.6 (or even similar)? Will it install the same modules by default? We would be changing a key step in using BioPerl (installation) w/o much warning.
>
> That is my idea yes. Exactly what happened with Bio-Biblio, it changed
> close to nothing. There were a few minor changes on the code to pass
> the tests already in place, but mostly it was in POD to use the
> BioPerl's distzilla and podweaver configuration.
>
> Carnë

That works for Bio-Biblio, but my point is: would one be able to get an old-school all-inclusive (i.e. install everything) bioperl? Maybe the answer should be 'of course not', and we should create a bundle to take care of this instead.

I wouldn't worry about it, frankly. We should just forge ahead, damn the torpedoes.

chris

From cjfields at illinois.edu Mon Apr 1 23:21:16 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 2 Apr 2013 03:21:16 +0000
Subject: [Bioperl-l] Google Summer of Code - BioPerl / BioSQL proposal?
In-Reply-To: <6B9A2644-9075-486F-977C-86750D88B737@drycafe.net>
References: <118F034CF4C3EF48A96F86CE585B94BF74DC9169@CHIMBX5.ad.uillinois.edu> <6B9A2644-9075-486F-977C-86750D88B737@drycafe.net>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DCAEAE@CHIMBX5.ad.uillinois.edu>

Added!

http://bioperl.org/wiki/Google_Summer_of_Code http://www.open-bio.org/wiki/Google_Summer_of_Code#Project_ideas chris On Apr 1, 2013, at 11:38 AM, Hilmar Lapp wrote: > Just to add another possibility for a summer project here, Bioperl-db (the BioPerl bindings to BioSQL) in essence constitute a self-made ORM, invented at a time when DBIx::Class didn't exist yet. As such, it has some advantages (if you are willing to count overly clever features to be counted in this category), but arguably many more disadvantages, chief among them being the unsustainably small (you could also say non-existent) developer community supporting it, and the fact that DBIx::Class now has existed for years, and is fairly mature. > > So, rewriting Bioperl-db with a DBIx::Class (or another well-supported generic ORM) would, I think, stand to make a considerable impact on our ability to further develop Bioperl's relational storage capabilities, as well as BioSQL itself. > > And I'd be willing to help out with such a project in a at least a co-mentoring capacity. (If primary mentor, I'd need a committed co-mentor to make it viable.) > > -hilmar > > On Apr 1, 2013, at 10:35 AM, Fields, Christopher J wrote: > >> On Apr 1, 2013, at 9:28 AM, Peter Cock wrote: >> >>> On 18 March 2013 21:26, Christopher Fields wrote: >>>> Just a heads-up, if there are any students interested in the Google Summer of >>>> Code, the Open Bioinformatics Foundation is planning on participating again >>>> this year! Pjotr Prins will be organizing for OBF; all the Bio* projects are >>>> looking for prospective projects. >>>> >>>> We're open for any project ideas this year, so let us know what you would like >>>> to do! >>> >>> I suggested this last year too, but improving support for BioSQL on >>> SQLite would be great - the schema exists and seems to work fine, >>> but is currently only handled by the Biopython BioSQL bindings. 
>> >> ++++1 >> >>> So, the core of a BioSQL/BioPerl GSoC project could tackle: >>> >>> * Adding SQLite support to the BioSQL scripts for loading taxonomies etc >>> https://github.com/biosql/biosql >>> >>> * Adding SQLite support to BioPerl's BioSQL adapter, bioperl-db >>> https://github.com/bioperl/bioperl-db >>> >>> There are a number of things that could be added to this basic idea >>> to make the project more ambitious and to fill out a full summer. One >>> is to extend this to doing BioSQL on SQLite bindings for BioRuby or >>> BioJava (assuming suitable co-mentors are available). >>> >>> One of the nice things about SQLite compared to MySQL or PostgreSQL >>> is the database is just one binary file on disk which is easily portable and >>> can even be checked into source code control for unit tests. This means >>> we can use it to make cross-binding testing far far easier. Thus another >>> part of a GSoC project could be to use the SQlite bindings to establish >>> cross-project testing of the BioSQL implementations for consistency. >>> At that point I'd be interested from the BioSQL and Biopython side, >>> and Biopython may have a few possible co-mentors here. >>> >>> Anyway, to be viable this project would need a Perl mentor with a >>> good knowledge of BioSQL and BioPerl's bindings for it. >>> >>> Is this worth adding to the BioPerl GSoC as a possible idea? >>> http://bioperl.org/wiki/Google_Summer_of_Code#Project_Ideas_for_2013 >> >> Yes. Will add this now. >> >>> Regards, >>> >>> Peter >>> (Who doesn't know enough Perl to qualify as a mentor for this) >> >> There was a reasonable push for this a while back (BOSC in Boston) but nothing came of it code-wise that I have seen. Not sure where it stands now. 
>> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > From carandraug+dev at gmail.com Tue Apr 2 00:18:19 2013 From: carandraug+dev at gmail.com (Carnë Draug) Date: Tue, 2 Apr 2013 05:18:19 +0100 Subject: [Bioperl-l] releasing Bio-Root Message-ID: I have prepared the Bio-Root repo so a release can be made with

    dzil release
    git push --tags

IF the development version of the BioPerl pluginbundles is installed. Note that BioPerl's Dist::Zilla and Pod::Weaver pluginbundles are still not available on CPAN. Could someone please upload them to CPAN or give me co-maintenance? I have removed the Bio::Root modules and tests from bioperl-live that are part of the Bio-Root distribution (that leaves one behind which was not forgotten. It was imported back in 2011 and then removed). What to do about Bio::Root::Version? I didn't change its code, but I'm guessing it should be edited to do something so it keeps backward compatibility with this release. With the multiple distributions, this makes no sense since each of them may be at a different version at any given time. Plus, BioPerl's distzilla pluginbundle uses the version plugin which already does it for each module. Carnë From cjfields at illinois.edu Tue Apr 2 00:54:47 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 2 Apr 2013 04:54:47 +0000 Subject: [Bioperl-l] releasing Bio-Root In-Reply-To: References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DCAFF9@CHIMBX5.ad.uillinois.edu> On Apr 1, 2013, at 11:18 PM, Carnë
Draug wrote: > I have prepared the Bio-Root repo so a release can be made with > > dzil release > git push --tags > > IF the development version of the BioPerl pluginbundles are installed. > Note that Bioperl's dist zilla and pod weaver pluginbundles are still > not available on CPAN. Could someone please upload them to CPAN or > give me co-maintenance? You have co-maint on Dist::Zilla::PluginBundle::BioPerl. Note that the Pod::Weaver module wasn't in the original submission; we'll need you to transfer primary maint to BIOPERLML when you can (you should still be co-maint on it). > I have removed the Bio::Root modules and tests from bioperl-live that > are part of the Bio-Root distribution (that leaves one behind which > was not forgotten. It was imported back in 2011 and then removed). Okay. > What to do about Bio::Root::Version? I didn't change its code, but I'm > guessing it should be edited to do something so it keeps backward > compatibility with this release. With the multiple distributions, this > makes no sense since each of them may be a different version at any > given time. Plus, BioPerl's distzilla pluginbundle uses the version > plugin which already does it for each module. > > Carnë It will be deprecated for most use cases for the reasons you mention, but yes it will need to be fixed to deal with things for the time being, at least until we can get Dist::Zilla running for the main bioperl repo. chris From carandraug+dev at gmail.com Tue Apr 2 01:18:25 2013 From: carandraug+dev at gmail.com (Carnë Draug) Date: Tue, 2 Apr 2013 06:18:25 +0100 Subject: [Bioperl-l] releasing Bio-Root In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74DCAFF9@CHIMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF74DCAFF9@CHIMBX5.ad.uillinois.edu> Message-ID: On 2 April 2013 05:54, Fields, Christopher J wrote: > On Apr 1, 2013, at 11:18 PM, Carnë
Draug wrote: > >> I have prepared the Bio-Root repo so a release can be made with >> >> dzil release >> git push --tags >> >> IF the development version of the BioPerl pluginbundles are installed. >> Note that Bioperl's dist zilla and pod weaver pluginbundles are still >> not available on CPAN. Could someone please upload them to CPAN or >> give me co-maintenance? > > You have co-maint on Dist::Zilla::PluginBundle::BioPerl. Note that the Pod::Weaver module wasn't in the original submission; we'll need you to transfer primary maint to BIOPERLML when you can (you should still be co-maint on it). Done. Version 0.20 of the pluginbundles has been released. BIOPERLML already has primary maintenance of the Pod Weaver plugin namespace. >> What to do about Bio::Root::Version? I didn't change its code, but I'm >> guessing it should be edited to do something so it keeps backward >> compatibility with this release. With the multiple distributions, this >> makes no sense since each of them may be a different version at any >> given time. Plus, BioPerl's distzilla pluginbundle uses the version >> plugin which already does it for each module. > > It will be deprecated for most use cases for the reasons you mention, but yes it will need to be fixed to deal with things for the time being, at least until we can get Dist::Zilla running for the main bioperl repo. By the way, I just noticed that Bio::Root::Build does not get the version from Bio::Root::Version; it has its own value hardcoded. Carnë From cjfields at illinois.edu Tue Apr 2 09:00:53 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 2 Apr 2013 13:00:53 +0000 Subject: [Bioperl-l] releasing Bio-Root In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF74DCAFF9@CHIMBX5.ad.uillinois.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DCB16C@CHIMBX5.ad.uillinois.edu> On Apr 2, 2013, at 12:18 AM, Carnë
Draug wrote: > On 2 April 2013 05:54, Fields, Christopher J wrote: >> On Apr 1, 2013, at 11:18 PM, Carnë Draug wrote: >> >>> I have prepared the Bio-Root repo so a release can be made with >>> >>> dzil release >>> git push --tags >>> >>> IF the development version of the BioPerl pluginbundles are installed. >>> Note that Bioperl's dist zilla and pod weaver pluginbundles are still >>> not available on CPAN. Could someone please upload them to CPAN or >>> give me co-maintenance? >> >> You have co-maint on Dist::Zilla::PluginBundle::BioPerl. Note that the Pod::Weaver module wasn't in the original submission; we'll need you to transfer primary maint to BIOPERLML when you can (you should still be co-maint on it). > > Done. Version 0.20 of the pluginbundles have been released. BIOPERLML > already has primary maintenance of the Pod Weaver plugin namespace. Okay, that works for me. >>> What to do about Bio::Root::Version? I didn't change its code, but I'm >>> guessing it should be edited to do something so it keeps backward >>> compatibility with this release. With the multiple distributions, this >>> makes no sense since each of them may be a different version at any >>> given time. Plus, BioPerl's distzilla pluginbundle uses the version >>> plugin which already does it for each module. >> >> It will be deprecated for most use cases for the reasons you mention, but yes it will need to be fixed to deal with things for the time being, at least until we can get Dist::Zilla running for the main bioperl repo. > > By the way, I just noticed that Bio::Root::Build does not get the > version from Bio::Root::Version, it has its own value hardcoded. > > Carnë Yeah, that's due to the way it is implemented. I have been pushing away from using Bio::Root::Build, though, as it has some very specific hacks for dealing with the original BioPerl installation process (same for Bio::Root::Test re: tests) when we switched from MakeMaker to Module::Build.
They may be deprecated if there isn't a broad need for them. chris From carandraug+dev at gmail.com Wed Apr 3 18:41:54 2013 From: carandraug+dev at gmail.com (Carnë Draug) Date: Wed, 3 Apr 2013 23:41:54 +0100 Subject: [Bioperl-l] releasing Bio-Root In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74DCAFF9@CHIMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF74DCAFF9@CHIMBX5.ad.uillinois.edu> Message-ID: On 2 April 2013 05:54, Fields, Christopher J wrote: > On Apr 1, 2013, at 11:18 PM, Carnë Draug wrote: > >> I have removed the Bio::Root modules and tests from bioperl-live that >> are part of the Bio-Root distribution (that leaves one behind which >> was not forgotten. It was imported back in 2011 and then removed). > > Okay. I didn't remove them from the branch v1 in bioperl-live. Should I? Carnë From cjfields at illinois.edu Wed Apr 3 20:47:09 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 4 Apr 2013 00:47:09 +0000 Subject: [Bioperl-l] releasing Bio-Root In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF74DCAFF9@CHIMBX5.ad.uillinois.edu>, Message-ID: <1663ad72n9fycm2ci49vv758.1365036425922@email.android.com> No, leave them. If we do another 1.6 series release it will be from that branch, but if this progresses far enough we should just move to incremental 1.7 releases. Chris Sent via a capable phone of some make and model -------- Original message -------- From: Carnë Draug Date: 04/03/2013 5:42 PM (GMT-06:00) To: "Fields, Christopher J" Cc: bioperl mailing list Subject: Re: releasing Bio-Root On 2 April 2013 05:54, Fields, Christopher J wrote: > On Apr 1, 2013, at 11:18 PM, Carnë Draug wrote: > >> I have removed the Bio::Root modules and tests from bioperl-live that >> are part of the Bio-Root distribution (that leaves one behind which >> was not forgotten. It was imported back in 2011 and then removed). > > Okay. I didn't remove them from the branch v1 in bioperl-live. Should I? Carnë
From carandraug+dev at gmail.com Thu Apr 4 15:16:56 2013 From: carandraug+dev at gmail.com (Carnë Draug) Date: Thu, 4 Apr 2013 20:16:56 +0100 Subject: [Bioperl-l] releasing Bio-coordinate Message-ID: I have prepared the Bio-Coordinate repo so a release can be made with

    dzil release
    git push --tags

provided the latest BioPerl plugin bundles are installed. Carnë From carandraug+dev at gmail.com Thu Apr 4 15:34:28 2013 From: carandraug+dev at gmail.com (Carnë Draug) Date: Thu, 4 Apr 2013 20:34:28 +0100 Subject: [Bioperl-l] 2013Q2 Call for Grant Proposals Message-ID: Hi I was talking on #perl and was told about TPF grants, and that we should apply for one. I was told that even simple projects, such as analysing the code for possible improvements and speed-ups (with Devel::NYTProf for example), should be acceptable. Though we can probably try bigger things as well. http://news.perlfoundation.org/2013/03/2013q2-call-for-grant-proposal.html Apparently last time, they were short of project proposals. Carnë From avilella at gmail.com Fri Apr 5 09:44:25 2013 From: avilella at gmail.com (Albert Vilella) Date: Fri, 5 Apr 2013 14:44:25 +0100 Subject: [Bioperl-l] Has anyone looked at using Perl with Illumina Basespace? Message-ID: Hi all, Has anyone looked at using Perl with Illumina Basespace? I would like to know if there is some work done already before I go and re-create already existing pieces. Looking forward to your responses, Albert. From cjfields at illinois.edu Fri Apr 5 10:13:42 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 5 Apr 2013 14:13:42 +0000 Subject: [Bioperl-l] Has anyone looked at using Perl with Illumina Basespace? In-Reply-To: References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DCF9D5@CHIMBX5.ad.uillinois.edu> Would be the first I have heard of this, but we would be interested in this for our seq facility.
chris On Apr 5, 2013, at 8:44 AM, Albert Vilella wrote: > Hi all, > > Has anyone looked at using Perl with Illumina Basespace? > > I would like to know if there is some work done already before I go and > re-create already existing pieces. > > Looking forward to your responses, > > Albert. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Apr 5 10:21:22 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 5 Apr 2013 14:21:22 +0000 Subject: [Bioperl-l] Has anyone looked at using Perl with Illumina Basespace? In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74DCF9D5@CHIMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF74DCF9D5@CHIMBX5.ad.uillinois.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DCFA4B@CHIMBX5.ad.uillinois.edu> (clarify: meaning any Perl/Python/etc interface to it) Our sequencing facility is looking into testing it out at some point, but we have a lot of comp resources locally to process data, so it depends on long-term costs. chris On Apr 5, 2013, at 9:13 AM, "Fields, Christopher J" wrote: > Would be the first I have heard of this, but we would be interested in this for our seq facility. > > chris > > On Apr 5, 2013, at 8:44 AM, Albert Vilella wrote: > >> Hi all, >> >> Has anyone looked at using Perl with Illumina Basespace? >> >> I would like to know if there is some work done already before I go and >> re-create already existing pieces. >> >> Looking forward to your responses, >> >> Albert. 
From alexeymorozov1991 at gmail.com Sun Apr 7 21:32:02 2013 From: alexeymorozov1991 at gmail.com (Alexey Morozov) Date: Mon, 8 Apr 2013 10:32:02 +0900 Subject: [Bioperl-l] Has anyone looked at using Perl with Illumina Basespace? In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74DCFA4B@CHIMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF74DCF9D5@CHIMBX5.ad.uillinois.edu> <118F034CF4C3EF48A96F86CE585B94BF74DCFA4B@CHIMBX5.ad.uillinois.edu> Message-ID: Speaking of uncommon sequence formats, does Bio::Seq or any of Bio::Seq::Foos support SOLiD colorspace sequences? I implemented basic stuff like translation to base space myself and can help write a module if it does not already exist. 2013/4/5 Fields, Christopher J > (clarify: meaning any Perl/Python/etc interface to it) > > Our sequencing facility is looking into testing it out at some point, but > we have a lot of comp resources locally to process data, so it depends on > long-term costs. > > chris > > On Apr 5, 2013, at 9:13 AM, "Fields, Christopher J" > wrote: > > > Would be the first I have heard of this, but we would be interested in > this for our seq facility. > > > > chris > > > > On Apr 5, 2013, at 8:44 AM, Albert Vilella wrote: > > > >> Hi all, > >> > >> Has anyone looked at using Perl with Illumina Basespace? > >> > >> I would like to know if there is some work done already before I go and > >> re-create already existing pieces. > >> > >> Looking forward to your responses, > >> > >> Albert.
> >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Alexey Morozov, LIN SB RAS, bioinformatics group. Irkutsk, Russia. From alexeymorozov1991 at gmail.com Tue Apr 9 03:18:06 2013 From: alexeymorozov1991 at gmail.com (Alexey Morozov) Date: Tue, 9 Apr 2013 16:18:06 +0900 Subject: [Bioperl-l] UNIPROT foo.tsv files support Message-ID: Does BioPerl currently support UNIPROT data? For a couple of families I have downloaded foo.fasta and foo.tsv files and want to turn data in them to Bio::Seq objects, add taxonomy data and do every kind of fancy stuff I please. It seems that parsing of UNIPROT tsv files is not described on Bioperl wiki and search for 'uniprot' at CPAN gives only a couple of modules with totally different functions. -- Alexey Morozov, LIN SB RAS, bioinformatics group. Irkutsk, Russia. From p.j.a.cock at googlemail.com Tue Apr 9 06:20:43 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 9 Apr 2013 11:20:43 +0100 Subject: [Bioperl-l] OBF not accepted for GSoC 2013 Message-ID: Dear all, Unfortunately this year we have not been accepted on the Google Summer of Code scheme: I'm sure the rest of the OBF board and the other Bio* developers will join me in thanking Pjotr Prins for his efforts as the OBF GSoC administrator co-ordinating our application this year, as well as last year's administrator Rob Bruels and the other mentors for their efforts. 
For those of you not subscribed to the OBF's GSoC mailing list, I am forwarding Pjotr's email from last night (also below): http://lists.open-bio.org/pipermail/gsoc/2013/000211.html In all 177 organisations were accepted (about the same as the last few years), and they will be listed here (once they have filled out their profile information): https://google-melange.appspot.com/gsoc/accepted_orgs/google/gsoc2013 To potential students this summer, the good news is that some related organisations have been accepted, such as NESCent, the National Resource for Network Biology (NRNB - known for Cytoscape), SciRuby (Ruby Science Foundation), so there is still some scope for doing a bioinformatics related project in GSoC 2013, perhaps even with a Bio* developer as a co-mentor. Thank you all, Peter (Biopython developer, OBF board member) ---------- Forwarded message ---------- From: Pjotr Prins Date: Mon, Apr 8, 2013 at 9:13 PM Subject: Re: GSoC 2013 is ON To: Pjotr Prins Cc: ..., OBF GSoC Sadly, our application got rejected by GSoC this year. I am not sure what the reason was, but I am convinced our application was similar to that of other years. Maybe the project ideas could have been better presented. I am not sure at this stage. I'll make a list of successful projects to see if we can digest some truths. The upside is that FOSS is going strong! And that the field is getting increasingly competitive. As an open source geezer I can only be happy, even if it hurts our own application. Sorry everyone, and many thanks for the trouble you took getting projects written up. Let's not feel discouraged for next year. Pj. From cjfields at illinois.edu Tue Apr 9 08:53:34 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 9 Apr 2013 12:53:34 +0000 Subject: [Bioperl-l] UNIPROT foo.tsv files support In-Reply-To: References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DD4044@CHIMBX5.ad.uillinois.edu> We support Swissprot format. 
You'll have to point out where you retrieved the data, TSV isn't listed as a specific file type on their downloads page: http://www.uniprot.org/downloads If this is tab-delimited, it should be pretty easy to drum up a custom parser using Text::CSV and the Bioperl FASTA parser. chris On Apr 9, 2013, at 2:18 AM, Alexey Morozov wrote: > Does BioPerl currently support UNIPROT data? For a couple of families I > have downloaded foo.fasta and foo.tsv files and want to turn data in them > to Bio::Seq objects, add taxonomy data and do every kind of fancy stuff I > please. > It seems that parsing of UNIPROT tsv files is not described on Bioperl wiki > and search for 'uniprot' at CPAN gives only a couple of modules with > totally different functions. > -- > Alexey Morozov, > LIN SB RAS, bioinformatics group. > Irkutsk, Russia. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Tue Apr 9 09:08:59 2013 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 09 Apr 2013 09:08:59 -0400 Subject: [Bioperl-l] UNIPROT foo.tsv files support In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74DD4044@CHIMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF74DD4044@CHIMBX5.ad.uillinois.edu> Message-ID: <3F77909C-2314-49D0-9A58-6F97B88A1FC2@verizon.net> Or Bio::SeqIO::table, perhaps. On Apr 9, 2013, at 8:53 AM, "Fields, Christopher J" wrote: > We support Swissprot format. You'll have to point out where you retrieved the data, TSV isn't listed as a specific file type on their downloads page: > > http://www.uniprot.org/downloads > > If this is tab-delimited, it should be pretty easy to drum up a custom parser using Text::CSV and the Bioperl FASTA parser. > > chris > > On Apr 9, 2013, at 2:18 AM, Alexey Morozov wrote: > >> Does BioPerl currently support UNIPROT data? 
For a couple of families I >> have downloaded foo.fasta and foo.tsv files and want to turn data in them >> to Bio::Seq objects, add taxonomy data and do every kind of fancy stuff I >> please. >> It seems that parsing of UNIPROT tsv files is not described on Bioperl wiki >> and search for 'uniprot' at CPAN gives only a couple of modules with >> totally different functions. >> -- >> Alexey Morozov, >> LIN SB RAS, bioinformatics group. >> Irkutsk, Russia. From pengyu.ut at gmail.com Wed Apr 3 21:54:25 2013 From: pengyu.ut at gmail.com (Peng Yu) Date: Wed, 3 Apr 2013 20:54:25 -0500 Subject: [Bioperl-l] Parsing uniprot fasta header? Message-ID: Hi, I need to parse the header of uniprot fasta format. Although it is not very difficult to do it myself, I'm wondering if there is already something available in bioperl. Thanks. http://www.uniprot.org/help/fasta-headers -- Regards, Peng From scott at scottcain.net Tue Apr 9 06:57:40 2013 From: scott at scottcain.net (Scott Cain) Date: Tue, 9 Apr 2013 11:57:40 +0100 Subject: [Bioperl-l] OBF not accepted for GSoC 2013 In-Reply-To: References: Message-ID: <3155B235-4068-491C-B611-1117EF1B5C50@scottcain.net> Hi all, I was certainly surprised that the openbio application wasn't accepted; competition must be very tight. I just want to point out another bio project for interested students: the Genome Informatics project is a GSoC collaboration of several well-known projects, including GMOD (chado, gbrowse, jbrowse), Reactome, and Galaxy.
For more information, see: http://gmod.org/wiki/GSoC Thanks, Scott Sent from my iPhone On Apr 9, 2013, at 11:20 AM, Peter Cock wrote: > Dear all, > > Unfortunately this year we have not been accepted on the Google > Summer of Code scheme: > > I'm sure the rest of the OBF board and the other Bio* developers > will join me in thanking Pjotr Prins for his efforts as the OBF > GSoC administrator co-ordinating our application this year, as > well as last year's administrator Rob Bruels and the other mentors > for their efforts. > > For those of you not subscribed to the OBF's GSoC mailing list, > I am forwarding Pjotr's email from last night (also below): > http://lists.open-bio.org/pipermail/gsoc/2013/000211.html > > In all 177 organisations were accepted (about the same as the > last few years), and they will be listed here (once they have filled > out their profile information): > https://google-melange.appspot.com/gsoc/accepted_orgs/google/gsoc2013 > > To potential students this summer, the good news is that some > related organisations have been accepted, such as NESCent, > the National Resource for Network Biology (NRNB - known for > Cytoscape), SciRuby (Ruby Science Foundation), so there is > still some scope for doing a bioinformatics related project in > GSoC 2013, perhaps even with a Bio* developer as a co-mentor. > > Thank you all, > > Peter > (Biopython developer, OBF board member) > > ---------- Forwarded message ---------- > From: Pjotr Prins > Date: Mon, Apr 8, 2013 at 9:13 PM > Subject: Re: GSoC 2013 is ON > To: Pjotr Prins > Cc: ..., OBF GSoC > > > Sadly, our application got rejected by GSoC this year. I am not sure > what the reason was, but I am convinced our application was similar to > that of other years. Maybe the project ideas could have been better > presented. I am not sure at this stage. I'll make a list of successful > projects to see if we can digest some truths. > > The upside is that FOSS is going strong! 
And that the field is getting > increasingly competitive. As an open source geezer I can only be > happy, even if it hurts our own application. > > Sorry everyone, and many thanks for the trouble you took getting > projects written up. Let's not feel discouraged for next year. > > Pj. From l.m.timmermans at students.uu.nl Tue Apr 9 08:24:24 2013 From: l.m.timmermans at students.uu.nl (Leon Timmermans) Date: Tue, 9 Apr 2013 14:24:24 +0200 Subject: [Bioperl-l] OBF not accepted for GSoC 2013 In-Reply-To: References: Message-ID: On Tue, Apr 9, 2013 at 12:20 PM, Peter Cock wrote: > Sadly, our application got rejected by GSoC this year. I am not sure > what the reason was, but I am convinced our application was similar to > that of other years. Maybe the project ideas could have been better > presented. I am not sure at this stage. I'll make a list of successful > projects to see if we can digest some truths. Something similar happened to The Perl Foundation last year, so we asked them why. We had been a well-performing organization (100% success rate in the preceding year) that put a lot of effort into our submission and had a solid community behind us. The answer was not quite what we expected. It had little to do with the quality of the application, and everything to do with them wanting new blood in the SoC. The same had happened to other projects, I'm told.
Leon From alexeymorozov1991 at gmail.com Tue Apr 9 22:48:54 2013 From: alexeymorozov1991 at gmail.com (Alexey Morozov) Date: Wed, 10 Apr 2013 11:48:54 +0900 Subject: [Bioperl-l] UNIPROT foo.tsv files support In-Reply-To: <3F77909C-2314-49D0-9A58-6F97B88A1FC2@verizon.net> References: <118F034CF4C3EF48A96F86CE585B94BF74DD4044@CHIMBX5.ad.uillinois.edu> <3F77909C-2314-49D0-9A58-6F97B88A1FC2@verizon.net> Message-ID: These are tab-delimited files which are available for download from individual pages for families or other groups of proteins, e.g. here https://www.ebi.ac.uk/interpro/entry/IPR012078/proteins-matched in top right corner of the page. But I have also decided to make my own simple parser, especially because I need only a little subset of data available in files. 2013/4/9 Brian Osborne > Or Bio::SeqIO::table, perhaps. > > > On Apr 9, 2013, at 8:53 AM, "Fields, Christopher J" > wrote: > > > We support Swissprot format. You'll have to point out where you > retrieved the data, TSV isn't listed as a specific file type on their > downloads page: > > > > http://www.uniprot.org/downloads > > > > If this is tab-delimited, it should be pretty easy to drum up a custom > parser using Text::CSV and the Bioperl FASTA parser. > > > > chris > > > > On Apr 9, 2013, at 2:18 AM, Alexey Morozov > wrote: > > > >> Does BioPerl currently support UNIPROT data? For a couple of families I > >> have downloaded foo.fasta and foo.tsv files and want to turn data in > them > >> to Bio::Seq objects, add taxonomy data and do every kind of fancy stuff > I > >> please. > >> It seems that parsing of UNIPROT tsv files is not described on Bioperl > wiki > >> and search for 'uniprot' at CPAN gives only a couple of modules with > >> totally different functions. > >> -- > >> Alexey Morozov, > >> LIN SB RAS, bioinformatics group. > >> Irkutsk, Russia. 
> >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Alexey Morozov, LIN SB RAS, bioinformatics group. Irkutsk, Russia. From dhoworth at mrc-lmb.cam.ac.uk Wed Apr 10 04:50:55 2013 From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth) Date: Wed, 10 Apr 2013 09:50:55 +0100 Subject: [Bioperl-l] UNIPROT foo.tsv files support In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF74DD4044@CHIMBX5.ad.uillinois.edu> <3F77909C-2314-49D0-9A58-6F97B88A1FC2@verizon.net> Message-ID: <516527EF.6080603@mrc-lmb.cam.ac.uk> Alexey Morozov wrote: > These are tab-delimited files which are available for download from > individual pages for families or other groups of proteins, e.g. here > https://www.ebi.ac.uk/interpro/entry/IPR012078/proteins-matched > in top right corner of the page. But I have also decided to make my own > simple parser, especially because I need only a little subset of data > available in files. They are InterPro rather than UniProt! > 2013/4/9 Brian Osborne > >> Or Bio::SeqIO::table, perhaps. >> >> >> On Apr 9, 2013, at 8:53 AM, "Fields, Christopher J" >> wrote: >> >>> We support Swissprot format. You'll have to point out where you >> retrieved the data, TSV isn't listed as a specific file type on their >> downloads page: >>> http://www.uniprot.org/downloads >>> >>> If this is tab-delimited, it should be pretty easy to drum up a custom >> parser using Text::CSV and the Bioperl FASTA parser. >>> chris >>> >>> On Apr 9, 2013, at 2:18 AM, Alexey Morozov >> wrote: >>>> Does BioPerl currently support UNIPROT data? 
For a couple of families I >>>> have downloaded foo.fasta and foo.tsv files and want to turn data in them >>>> to Bio::Seq objects, add taxonomy data and do every kind of fancy stuff I >>>> please. >>>> It seems that parsing of UNIPROT tsv files is not described on Bioperl wiki >>>> and search for 'uniprot' at CPAN gives only a couple of modules with >>>> totally different functions. >>>> -- >>>> Alexey Morozov, >>>> LIN SB RAS, bioinformatics group. >>>> Irkutsk, Russia. From wkretzsch at gmail.com Wed Apr 10 11:41:57 2013 From: wkretzsch at gmail.com (Warren W. Kretzschmar) Date: Wed, 10 Apr 2013 16:41:57 +0100 Subject: [Bioperl-l] Missing Bio/Root/Version.pm in bioperl-live Message-ID: Hi everybody, I have updated Bio/SeqIO/msout.pm and a few test files and I am trying to figure out what branch to merge against before submitting a pull request. On commit ee8b48d583f07b575a0c77e7c458c687cbe97b11, when I build bioperl-live using perl Build.PL, I get the error message in the postscript. Apparently Bio/Root/Version.pm is missing? Should I be merging against another branch instead? Regards, Warren PS: Here is the output from perl Build.PL: Checking prerequisites...
recommends:
  * Algorithm::Munkres is not installed
  * Array::Compare is not installed
  * Convert::Binary::C is not installed
  * GD is not installed
  * Graph is not installed
  * GraphViz is not installed
  * HTML::TableExtract is not installed
  * Math::Random is not installed
  * PostScript::TextBlock is not installed
  * SOAP::Lite is not installed
  * SVG is not installed
  * SVG::Graph is not installed
  * Set::Scalar is not installed
  * Sort::Naturally is not installed
  * Spreadsheet::ParseExcel is not installed
  * XML::Parser::PerlSAX is not installed
  * XML::SAX::Writer is not installed
  * XML::Twig is not installed
Checking optional features...
EntrezGene................disabled
  requires:
    ! Bio::ASN1::EntrezGene is not installed
MySQL Tests...............disabled
  requires:
    ! DBD::mysql is not installed
Pg Tests..................disabled
  requires:
    ! DBD::Pg is not installed
ERRORS/WARNINGS FOUND IN PREREQUISITES. You may wish to install the versions
of the modules indicated above before proceeding with this installation
Run 'Build installdeps' to install missing prerequisites.
Can't find file Bio/Root/Version.pm to determine version at /Library/Perl/5.12/Module/Build/Base.pm line 1193.
From cjfields at illinois.edu Wed Apr 10 12:47:00 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 10 Apr 2013 16:47:00 +0000 Subject: [Bioperl-l] Missing Bio/Root/Version.pm in bioperl-live In-Reply-To: References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DD5C8A@CHIMBX5.ad.uillinois.edu> Warren, Use the 'v1' branch; we're making some major changes on 'master' (moving chunks of code to separate repos). chris On Apr 10, 2013, at 10:41 AM, Warren W. Kretzschmar wrote: > Hi everybody, > I have updated Bio/SeqIO/msout.pm and a few test files and I am trying to > figure out what branch to merge against before submitting a pull request. > On commit ee8b48d583f07b575a0c77e7c458c687cbe97b11 When I build > bioperl-live using perl Build.PL, I get the error message in the > postscript.
Apparently Bio/Root/Version.pm is missing? > > Should I be merging against another branch instead? > > Regards, > Warren From wkretzsch at gmail.com Wed Apr 10 13:32:44 2013 From: wkretzsch at gmail.com (Warren W.
Kretzschmar) Date: Wed, 10 Apr 2013 18:32:44 +0100 Subject: [Bioperl-l] Missing Bio/Root/Version.pm in bioperl-live In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74DD5C8A@CHIMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF74DD5C8A@CHIMBX5.ad.uillinois.edu> Message-ID: Great, thanks. I have merged the changes with v1 and added a pull request. Warren On Wed, Apr 10, 2013 at 5:47 PM, Fields, Christopher J < cjfields at illinois.edu> wrote: > Warren, > > Use the 'v1' branch; we're making some major changes on 'master' (moving > chunks of code to separate repos). > > chris From alexeymorozov1991 at gmail.com Wed Apr 10 21:26:53 2013 From: alexeymorozov1991 at gmail.com (Alexey Morozov) Date: Thu, 11 Apr 2013 10:26:53 +0900 Subject: [Bioperl-l] UNIPROT foo.tsv files support In-Reply-To: <516527EF.6080603@mrc-lmb.cam.ac.uk> References: <118F034CF4C3EF48A96F86CE585B94BF74DD4044@CHIMBX5.ad.uillinois.edu> <3F77909C-2314-49D0-9A58-6F97B88A1FC2@verizon.net> <516527EF.6080603@mrc-lmb.cam.ac.uk> Message-ID: Truly they are! Sorry for messing up databases. 2013/4/10 Dave Howorth > Alexey Morozov wrote: > > These are tab-delimited files which are available for download from > > individual pages for families or other groups of proteins, e.g. here > > https://www.ebi.ac.uk/interpro/entry/IPR012078/proteins-matched > > in top right corner of the page. But I have also decided to make my own > > simple parser, especially because I need only a little subset of data > > available in files. > > They are InterPro rather than UniProt! -- Alexey Morozov, LIN SB RAS, bioinformatics group. Irkutsk, Russia.
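A rough sketch of the approach suggested in this thread: read the tab-delimited table, read the FASTA file, and join the two on the accession. It is written in Python with only the standard library so that it stays self-contained; in Perl the same shape would use Text::CSV and the BioPerl FASTA parser mentioned above. The column name "Accession", the sample rows and all helper names are assumptions made up for illustration, since the real layout of the downloaded .tsv file is not shown in the thread.

```python
import csv
import io


def read_tsv(handle, key="Accession"):
    """Map each accession to its row (a dict of column name -> value)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[key]: row for row in reader}


def split_header(header):
    """Split 'id description...' into (id, description)."""
    ident, _, desc = header.partition(" ")
    return ident, desc


def read_fasta(handle):
    """Yield (identifier, description, sequence) for each FASTA record."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield split_header(header) + ("".join(chunks),)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield split_header(header) + ("".join(chunks),)


def annotate(fasta_handle, tsv_handle, key="Accession"):
    """Attach the matching table row (or None) to every sequence record."""
    table = read_tsv(tsv_handle, key)
    return [
        {"id": ident, "description": desc, "seq": seq, "meta": table.get(ident)}
        for ident, desc, seq in read_fasta(fasta_handle)
    ]


# Hypothetical sample data standing in for foo.tsv and foo.fasta.
SAMPLE_TSV = (
    "Accession\tOrganism\tLength\n"
    "ACC001\tHomo sapiens\t8\n"
    "ACC002\tMus musculus\t6\n"
)
SAMPLE_FASTA = (
    ">ACC001 example protein\nMKTA\nYIAK\n"
    ">ACC002\nMSDNEL\n"
    ">ACC999 not in table\nAAAA\n"
)

records = annotate(io.StringIO(SAMPLE_FASTA), io.StringIO(SAMPLE_TSV))
for rec in records:
    organism = rec["meta"]["Organism"] if rec["meta"] else "unknown"
    print(rec["id"], len(rec["seq"]), organism)
```

Real FASTA headers may carry the accession inside a longer identifier, in which case the join key would need to be extracted from the header first. Records without a matching table row are kept with no metadata rather than dropped, so mismatches between the two files stay visible.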
From hlapp at drycafe.net Thu Apr 11 19:48:05 2013 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 11 Apr 2013 16:48:05 -0700 Subject: [Bioperl-l] Convert BioPerl-DB to DBIx::Class In-Reply-To: <516749D5.2050705@gmail.com> References: <516749D5.2050705@gmail.com> Message-ID: <46CEFA70-E8EA-4B62-B01B-9FE032C84F36@drycafe.net> Hi Denis, The OBF has not been accepted into GSoC this year, and thus BioPerl isn't participating in an official capacity either. That doesn't mean you can't undertake this project; it would just have to be outside of GSoC and on a voluntary basis. I'm copying the BioPerl list, so others from our community can chime in. -hilmar On Apr 11, 2013, at 4:40 PM, denisboyun wrote: > Hi, Hilmar. My name is Denis. I understand that BioPerl is taking part in GSoC 2013. The project idea "Convert BioPerl-DB to DBIx::Class" interests me. Here is the plan as I see it. > > 1. First, generate the schema classes from the database tables (with the dbicdump tool). > 2. Second, rewrite all parts of the db repository to use DBIx::Class objects (DBIx::Exception etc.). > 3. After that, merge this code with bioperl-live. > > The database schema is at https://github.com/denisboyun/bioperl-db2.git > > Please share your opinions. What do you think about this? Which parts are the most difficult and need the most attention?
> > Thanks:) -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From l.m.timmermans at students.uu.nl Fri Apr 12 11:33:28 2013 From: l.m.timmermans at students.uu.nl (Leon Timmermans) Date: Fri, 12 Apr 2013 16:33:28 +0100 Subject: [Bioperl-l] Convert BioPerl-DB to DBIx::Class In-Reply-To: <46CEFA70-E8EA-4B62-B01B-9FE032C84F36@drycafe.net> References: <516749D5.2050705@gmail.com> <46CEFA70-E8EA-4B62-B01B-9FE032C84F36@drycafe.net> Message-ID: On Fri, Apr 12, 2013 at 12:48 AM, Hilmar Lapp wrote: > Hi Denis, > > The OBF has not been accepted into GSoC this year, and thus BioPerl isn't participating in an official capacity either. That doesn't mean you can't undertake this project, it just would have to be outside of GSoC and on a voluntary basis. > > I'm copying the BioPerl list, so others from our community can chime in. > > -hilmar I'm sitting next to ribasushi (one of the main developers of DBIx::Class) right now, he's saying that if you run into issues you can ask the mailinglist (http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/dbix-class) or himself (or the #dbix-class on irc.perl.org) :-) Leon From voldrani at gmail.com Tue Apr 16 07:44:11 2013 From: voldrani at gmail.com (Chris Maloney) Date: Tue, 16 Apr 2013 07:44:11 -0400 Subject: [Bioperl-l] Some trouble getting started Message-ID: [Resending this -- helps to be subscribed to the list.] Hi, I'm very new to bioperl. Here are some problems I've encountered in the last couple of days: * The wiki doesn't allow new users to log in: https://redmine.open-bio.org/issues/3424: neither the OBF wiki nor the bioperl wiki. Presumably this has something to do with the migration to AWS (http://news.open-bio.org/news/2012/11/server-transition-to-aws/) in November last year. If so, that's five months -- a long time! That ticket itself has been open for more than a month. 
I would like to fix it, but I don't know whom to contact, and ... * The OBF blog I just linked to requires you to log in to leave a comment. But there's no way that I can discern to create a new account. I'd think this should be changed to allow anyone to leave a comment, with a CAPTCHA. I haven't run a WordPress blog before, but I'd guess this would be enough to keep out the spam. * The Redmine bug tracker installation has an expired security certificate. I guess most people are probably aware of this. I don't *want* to complain, but it seems to me these are pretty important issues in terms of community building. If people are too busy, let me take a look; I will gladly try my best to fix these issues. Peter Cock or genehack could vouch for me, I think. (At least, insofar as I wouldn't do anything malicious.) Chris Maloney From p.j.a.cock at googlemail.com Tue Apr 16 08:26:14 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 16 Apr 2013 13:26:14 +0100 Subject: [Bioperl-l] Some trouble getting started In-Reply-To: References: Message-ID: On Tuesday, April 16, 2013, Chris Maloney wrote: > [Resending this -- helps to be subscribed to the list.] > > > Hi, I'm very new to bioperl. Here are some problems I've encountered > in the last couple of days: Thanks for taking the time to report this :) > * The wiki doesn't allow new users to log in: > https://redmine.open-bio.org/issues/3424: neither the OBF wiki nor > the bioperl wiki. Presumably this has something to do with the > migration to AWS > (http://news.open-bio.org/news/2012/11/server-transition-to-aws/) in > November last year. If so, that's five months -- a long time! That > ticket itself has been open for more than a month. > I would like to fix it, but I don't know whom to contact, and ... I wasn't aware of this until very recently; it was reported on the Biopython list and forwarded to the root-l SysAdmin team.
http://lists.open-bio.org/pipermail/biopython/2013-April/008510.html > * The OBF blog I just linked to requires you to log in to leave a > comment. But there's no way that I can discern to create a new > account. I'd think this should be changed to allow anyone to > leave a comment, with a CATPCHA. I haven't run a wordpress > blog before, but I'd guess this would be enough to keep out > the spam. That may have been an accidental change from when the theme was changed (I find it hard to log in and I have an account). I'll pass that on to the root-l list too. > > * The redmine bug tracker installation has an expired security > certificate. I guess most people are probably aware of this. > > Already in hand, although buying the new certificate has been complicated due to the timing of the OBF/SPI asset transfer. > I don't *want* to complain, but it seems to me these are pretty > important issues in terms of community building. If people are > too busy, let me take a look, I will gladly try my best to fix these > issues. > > Peter Cock or genehack could vouch for me, I think. (At least, > insofar as that I wouldn't do anything malicious.) > > Chris Maloney Are you volunteering for some SysAdmin type stuff then? Regards, Peter From p.j.a.cock at googlemail.com Tue Apr 16 08:29:10 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 16 Apr 2013 13:29:10 +0100 Subject: [Bioperl-l] Some trouble getting started In-Reply-To: References:

Message-ID: On Tuesday, April 16, 2013, Peter Cock wrote: > > > On Tuesday, April 16, 2013, Chris Maloney wrote: > > >> * The wiki doesn't allow new users to log in: >> https://redmine.open-bio.org/issues/3424: neither the OBF wiki nor >> the bioperl wiki. Presumably this has something to do with the >> migration to AWS >> (http://news.open-bio.org/news/2012/11/server-transition-to-aws/) in >> November last year. If so, that's five months -- a long time! That >> ticket itself has been open for more than a month. >> I would like to fix it, but I don't know whom to contact, and ... > > > I wasn't aware of this until very recently, it was reported on the > Biopython list and forwarded to the root-l SysAdmin team. > http://lists.open-bio.org/pipermail/biopython/2013-April/008510.html > > That bug is a bit odd - it doesn't seem to have been filed against any particular project, and has no assignee - and so most likely no one was ever emailed of the bug's existence. :( Peter From cjfields at illinois.edu Tue Apr 16 09:23:34 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 16 Apr 2013 13:23:34 +0000 Subject: [Bioperl-l] Some trouble getting started In-Reply-To: References:

Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DE1105@CHIMBX5.ad.uillinois.edu> On Apr 16, 2013, at 7:29 AM, Peter Cock wrote: > On Tuesday, April 16, 2013, Peter Cock wrote: >> >> >> On Tuesday, April 16, 2013, Chris Maloney wrote: >> >> >>> * The wiki doesn't allow new users to log in: >>> https://redmine.open-bio.org/issues/3424: neither the OBF wiki nor >>> the bioperl wiki. Presumably this has something to do with the >>> migration to AWS >>> (http://news.open-bio.org/news/2012/11/server-transition-to-aws/) in >>> November last year. If so, that's five months -- a long time! That >>> ticket itself has been open for more than a month. >>> I would like to fix it, but I don't know whom to contact, and ... >> >> >> I wasn't aware of this until very recently, it was reported on the >> Biopython list and forwarded to the root-l SysAdmin team. >> http://lists.open-bio.org/pipermail/biopython/2013-April/008510.html >> >> > That bug is a bit odd - it doesn't seem to have been filed against > any particular project, and has no assignee - and so most likely > no one was ever emailed of the bug's existence. :( > > Peter It's an OBF bug; makes sense in light of the common wiki issues across Bio* wikis at the moment. Looks as if we never decided where such issues end up going or who they would be assigned to. What is the consensus on using redmine at the moment? Are the various Bio* (except BioPerl) still using it? Using Github? I haven't gone there primarily b/c they don't allow attachments (though that isn't necessarily a bad thing in some cases?). Reason I ask: I would suggest we set up this project tracker so that it's primarily around web-admin, etc. for OBF, point the email forwards to somewhere useful, populate it with devs from the various groups, etc. But, if no one but bioperl is using redmine then I don't see the point. 
chris From voldrani at gmail.com Tue Apr 16 09:49:32 2013 From: voldrani at gmail.com (Chris Maloney) Date: Tue, 16 Apr 2013 09:49:32 -0400 Subject: [Bioperl-l] Some trouble getting started In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74DE1105@CHIMBX5.ad.uillinois.edu> References:

<118F034CF4C3EF48A96F86CE585B94BF74DE1105@CHIMBX5.ad.uillinois.edu> Message-ID: > Are you volunteering for some SysAdmin type stuff then? Yes -- I was starting to get the impression there was no one minding the store. In which case, I could take a stab at fixing these issues, if someone were willing to give me the keys. But otherwise ... it doesn't matter, as long as someone takes a look. I could probably fix the mediawiki and wordpress issues, though, if no one else has time. On Tue, Apr 16, 2013 at 9:23 AM, Fields, Christopher J wrote: > On Apr 16, 2013, at 7:29 AM, Peter Cock wrote: > >> On Tuesday, April 16, 2013, Peter Cock wrote: >>> >>> >>> On Tuesday, April 16, 2013, Chris Maloney wrote: >>> >>> >>>> * The wiki doesn't allow new users to log in: >>>> https://redmine.open-bio.org/issues/3424: neither the OBF wiki nor >>>> the bioperl wiki. Presumably this has something to do with the >>>> migration to AWS >>>> (http://news.open-bio.org/news/2012/11/server-transition-to-aws/) in >>>> November last year. If so, that's five months -- a long time! That >>>> ticket itself has been open for more than a month. >>>> I would like to fix it, but I don't know whom to contact, and ... >>> >>> >>> I wasn't aware of this until very recently, it was reported on the >>> Biopython list and forwarded to the root-l SysAdmin team. >>> http://lists.open-bio.org/pipermail/biopython/2013-April/008510.html >>> >>> >> That bug is a bit odd - it doesn't seem to have been filed against >> any particular project, and has no assignee - and so most likely >> no one was ever emailed of the bug's existence. :( >> >> Peter > > It's an OBF bug; makes sense in light of the common wiki issues across Bio* wikis at the moment. Looks as if we never decided where such issues end up going or who they would be assigned to. > > What is the consensus on using redmine at the moment? Are the various Bio* (except BioPerl) still using it? Using Github? 
I haven't gone there primarily b/c they don't allow attachments (though that isn't necessarily a bad thing in some cases?). Reason I ask: I would suggest we set up this project tracker so that it's primarily around web-admin, etc. for OBF, point the email forwards to somewhere useful, populate it with devs from the various groups, etc. But, if no one but bioperl is using redmine then I don't see the point. > > chris >
From fangl at big.ac.cn Mon Apr 15 04:13:05 2013 From: fangl at big.ac.cn (fangl) Date: Mon, 15 Apr 2013 16:13:05 +0800 Subject: [Bioperl-l] BUG Message-ID: <201304151613054277071@big.ac.cn>
Hi, I found a bug when using Bio::SearchIO to parse hmmscan output. My hmmscan result has 2 hits for 1 query, but Bio::SearchIO parses only 1 hit from the output file.
e.g.:
hmmscan result:
# hmmscan :: search sequence(s) against a profile database
# HMMER 3.0 (March 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query sequence file: pro_2.fa
# target HMM database: /leofs/biodenovo/fangl/data/Pfam/Pfam-A.hmm
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query: PEG03688 [L=200]
Scores for complete sequence (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Model          Description
    ------- ------ -----    ------- ------ -----   ---- --  --------       -----------
    0.00038   20.7   0.1    0.00054   20.2   0.1    1.3  1  SnoaL_2        SnoaL-like domain
  ------ inclusion threshold ------
      0.068   12.9   0.0       0.43   10.3   0.0    2.0  1  FAD_binding_8  FAD-binding domain
Domain annotation for each model (and alignments):
>> SnoaL_2 SnoaL-like domain
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !   20.2   0.1   7.9e-08   0.00054      11      86 ..      59     157 ..      46     172 .. 0.72
  Alignments for each domain:
  == domain 1 score: 20.2 bits; conditional E-value: 7.9e-08
                  T-HHHHHHTEEEEEEEE............CT.S--..E---HHHHHHHTTHHHCEECEEEEEEEEESSTTEEEEEEEEE-E........ESBS--EEEE CS
     SnoaL_2  11  gdldalaallapdvvwe............dp.fge..lrGrealraffrallaafpdlrfevedviadgdrvvvrwtvtgt........ipptgrgvtv 86
                  d l +l++ d+++ +p +g+ + G++a+ + f ++ + f+++++++d+ v++++ +t+t i p ++ +v
    PEG03688  59  PDYNLLKELVTYDCTYIsltfdnptlhgiMPwAGThtHVGPQAFIDIFTRVGLYWDRGPFSIDHIFGDDGNVTAWGSFTATsrtlgktvISPWAARARV 157
                  3666888999999999877777555545545544488******************************************98555555554444444444 PP
>> FAD_binding_8 FAD-binding domain
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 ?   10.3   0.0   6.3e-05      0.43      19      63 ..      77     128 ..      63     197 .. 0.69
  Alignments for each domain:
  == domain 1 score: 10.3 bits; conditional E-value: 6.3e-05
  FAD_binding_8  19  lklkkpkks.......lkykpGqyvfini.pslsklflqsHPFtiasapeddk 63
                     l++ p+ ++ + G +fi i ++ l+++ PF+i dd
       PEG03688  77  LTFDNPTLHgimpwagTHTHVGPQAFIDIfTRVG-LYWDRGPFSIDHIFGDDG 128
                     555555444444444468899999*****66666.99********88777664 PP
Internal pipeline statistics summary:
-------------------------------------
Query sequence(s): 1 (200 residues)
Target model(s): 13672 (2396357 nodes)
Passed MSV filter: 244 (0.0178467); expected 273.4 (0.02)
Passed bias filter: 219 (0.0160181); expected 273.4 (0.02)
Passed Vit filter: 14 (0.00102399); expected 13.7 (0.001)
Passed Fwd filter: 2 (0.000146284); expected 0.1 (1e-05)
Initial search space (Z): 13672 [actual number of targets]
Domain search space (domZ): 2 [number of targets reported over threshold]
# CPU time: 0.15u 0.13s 00:00:00.28 Elapsed: 00:00:06.72
# Mc/sec: 71.32
//
my parse result:
query     len  hit      len  query_start  query_end  query_match_len  hit_start  hit_end  hit_match_len  evalue   cov
PEG03688  200  SnoaL_2  0    59           157        99               11         86       76             7.9e-08  0.49
Please help solve this problem! Thanks! Best wishes! fangl
From john.chen.bioinfo at gmail.com Sat Apr 13 13:37:36 2013 From: john.chen.bioinfo at gmail.com (John Chen) Date: Sat, 13 Apr 2013 10:37:36 -0700 (PDT) Subject: [Bioperl-l] Bioperl-live has (near) empty Bio::Root::Root? Message-ID:
Hi, I have perlbrew with perl-5.8.9 installed, and I tried to get bioperl-live via Git following the instructions on the official BioPerl site: http://www.bioperl.org/wiki/Using_Git However, testing the BioPerl version immediately produced the error below. Manually checking the cloned bioperl-live shows that there is only one module, HTTPget.pm, under Bio/Root (see below). I also checked the online GitHub bioperl-live repository, and its Bio/Root folder indeed contains only one module: HTTPget.pm. Is this normal, or do I need some alternative approach to get BioPerl (the CPAN package is rather old, though)? Thanks John
*% perl -MBio::Perl -le 'print Bio::Perl->VERSION;'*
------------------------------------------------------------------------------------------------------------
Base class package *"Bio::Root::Root" is empty.*
    (Perhaps you need to 'use' the module which defines that package first,
    or make that module available in @INC (@INC contains: $HOME/src/bioperl-live/ $HOME/perlbrew//perls/perl-5.8.9/lib/5.8.9/x86_64-linux $HOME/perlbrew//perls/perl-5.8.9/lib/5.8.9 $HOME/perlbrew//perls/perl-5.8.9/lib/site_perl/5.8.9/x86_64-linux $HOME/perlbrew//perls/perl-5.8.9/lib/site_perl/5.8.9 .).
 at $HOME/src/bioperl-live//Bio/Location/WidestCoordPolicy.pm line 80
BEGIN failed--compilation aborted at $HOME/src/bioperl-live//Bio/Location/WidestCoordPolicy.pm line 80.
Compilation failed in require at $HOME/src/bioperl-live//Bio/Location/Atomic.pm line 79.
BEGIN failed--compilation aborted at $HOME/src/bioperl-live//Bio/Location/Atomic.pm line 79.
Compilation failed in require at (eval 1) line 3.
        ...propagated at $HOME/perlbrew//perls/perl-5.8.9/lib/5.8.9/base.pm line 93.
BEGIN failed--compilation aborted at $HOME/src/bioperl-live//Bio/Location/Simple.pm line 87.
Compilation failed in require at $HOME/src/bioperl-live//Bio/Factory/FTLocationFactory.pm line 97.
BEGIN failed--compilation aborted at $HOME/src/bioperl-live//Bio/Factory/FTLocationFactory.pm line 97.
Compilation failed in require at $HOME/src/bioperl-live//Bio/SeqIO.pm line 328.
BEGIN failed--compilation aborted at $HOME/src/bioperl-live//Bio/SeqIO.pm line 328.
Compilation failed in require at $HOME/src/bioperl-live//Bio/Perl.pm line 120.
BEGIN failed--compilation aborted at $HOME/src/bioperl-live//Bio/Perl.pm line 120.
Compilation failed in require.
BEGIN failed--compilation aborted.
------------------------------------------------------------------------------------------------------------
%ls $HOME/src/bioperl-live/Bio/Root/
HTTPget.pm
------------------------------------------------------------------------------------------------------------
From lalitha.viswanathan79 at gmail.com Tue Apr 16 06:10:14 2013 From: lalitha.viswanathan79 at gmail.com (Lalitha Viswanathan) Date: Tue, 16 Apr 2013 15:40:14 +0530 Subject: [Bioperl-l] Trouble installing BIo:DB:SAM tools In-Reply-To: References: Message-ID:
Hi, I am trying to install Bio::DB::Sam tools version 1.37 on Ubuntu 12.10. I have BioPerl 1.6 installed and working fine. I installed samtools-0.1.19 as below:
a) make
b) Then ran perl Build.PL in Bio::DB::Sam tools version 1.37
c) ./Build install throws the error below:
/usr/bin/ld: /home/lalitha/Documents/samtools-0.1.19/libbam.a(bgzf.o): relocation R_X86_64_32S against `.rodata.str1.1' can not be used when making a shared object; recompile with -fPIC
/home/lalitha/Documents/samtools-0.1.19/libbam.a: could not read symbols: Bad value
collect2: error: ld returned 1 exit status
error building blib/arch/auto/Bio/DB/Sam/Sam.so from lib/Bio/DB/Sam.o c_bin/bam2bedgraph.o at /usr/share/perl/5.14/ExtUtils/CBuilder/Base.pm line 312.
d) Same error is encountered when building through cpan Any help will be appreciated Thanks Lalitha From p.j.a.cock at googlemail.com Tue Apr 16 10:00:20 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 16 Apr 2013 15:00:20 +0100 Subject: [Bioperl-l] Some trouble getting started In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74DE1105@CHIMBX5.ad.uillinois.edu> References:

<118F034CF4C3EF48A96F86CE585B94BF74DE1105@CHIMBX5.ad.uillinois.edu> Message-ID: Shall we move this to the cross-project list and/or root-l instead? On Tue, Apr 16, 2013 at 2:23 PM, Fields, Christopher J wrote: >> That bug is a bit odd - it doesn't seem to have been filed against >> any particular project, and has no assignee - and so most likely >> no one was ever emailed of the bug's existence. :( >> >> Peter > > It's an OBF bug; makes sense in light of the common wiki issues > across Bio* wikis at the moment. Looks as if we never decided > where such issues end up going or who they would be assigned to. My mistake - it wasn't clear from the bug page, but issue 3424 is actually filed under the general obf project, https://redmine.open-bio.org/issues/3424 https://redmine.open-bio.org/projects/obf I guess we could have had the root-l or open-bio-l lists the default assignee for that project (much like how we have the biopython-dev mailing list as the default assignee for Biopython issues on RedMine). > What is the consensus on using redmine at the moment? Are the > various Bio* (except BioPerl) still using it? Using Github? I haven't > gone there primarily b/c they don't allow attachments (though that > isn't necessarily a bad thing in some cases?). Reason I ask: I > would suggest we set up this project tracker so that it's primarily > around web-admin, etc. for OBF, point the email forwards to > somewhere useful, populate it with devs from the various groups, > etc. But, if no one but bioperl is using redmine then I don't see > the point. In the short term improving the OBF project setup on RedMine seems a good idea (default assignees and new issue alerts). https://redmine.open-bio.org/projects/obf BioSQL and BioRuby are also using RedMine according to their homepages. 
Biopython is still using RedMine but we're talking about moving to the GitHub issue tracker instead (you can use github gists for attachments), most likely a manual transfer and triage of all the old open issues. According to their homepage, BioJava are already using GitHub, I don't know if they turned off filing bugs on Redmine (and if not, that would be sensible): https://redmine.open-bio.org/projects/biojava If we do all move to GitHub issues, then for general OBF bug tracking (e.g. wordpress setup, DNS issues) we could use something under https://github.com/OBF/ for that. Regards, Peter From p.j.a.cock at googlemail.com Tue Apr 16 10:02:41 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 16 Apr 2013 15:02:41 +0100 Subject: [Bioperl-l] Some trouble getting started In-Reply-To: References:

<118F034CF4C3EF48A96F86CE585B94BF74DE1105@CHIMBX5.ad.uillinois.edu> Message-ID: On Tue, Apr 16, 2013 at 2:49 PM, Chris Maloney wrote: >> Are you volunteering for some SysAdmin type stuff then? > > Yes -- I was starting to get the impression there was no one minding > the store. In which case, I could take a stab at fixing these issues, > if someone were willing to give me the keys. But otherwise ... it > doesn't matter, as long as someone takes a look. I could probably fix > the mediawiki and wordpress issues, though, if no one else has time. In this case you've found a couple of communication road blocks meaning the root-l team never knew about the issues. If you'd like to sign up to root-l you'll be able to read the archive etc and we can bring you up to speed. http://lists.open-bio.org/mailman/listinfo/root-l Thanks, Peter From cjfields at illinois.edu Tue Apr 16 10:29:44 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 16 Apr 2013 14:29:44 +0000 Subject: [Bioperl-l] BUG In-Reply-To: <201304151613054277071@big.ac.cn> References: <201304151613054277071@big.ac.cn> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DE1415@CHIMBX5.ad.uillinois.edu> fangl, It would help if you can post a bug report on this, just to make sure we can reproduce it. Attach an example data set and code. https://redmine.open-bio.org/ chris On Apr 15, 2013, at 3:13 AM, fangl wrote: > Hi, > I find a bug when I use Bio::SearchIO to parse Hmmscan. > The result of my Hmmscan has 2 hits against 1 query ,but the module of Bio::SearchIO can only parse 1 hit in the output file. > eg: > hmmscan result: > # hmmscan :: search sequence(s) against a profile database > # HMMER 3.0 (March 2010); http://hmmer.org/ > # Copyright (C) 2010 Howard Hughes Medical Institute. > # Freely distributed under the GNU General Public License (GPLv3). 
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > # query sequence file: pro_2.fa > # target HMM database: /leofs/biodenovo/fangl/data/Pfam/Pfam-A.hmm > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > Query: PEG03688 [L=200] > Scores for complete sequence (score includes all domains): > --- full sequence --- --- best 1 domain --- -#dom- > E-value score bias E-value score bias exp N Model Description > ------- ------ ----- ------- ------ ----- ---- -- -------- ----------- > 0.00038 20.7 0.1 0.00054 20.2 0.1 1.3 1 SnoaL_2 SnoaL-like domain > ------ inclusion threshold ------ > 0.068 12.9 0.0 0.43 10.3 0.0 2.0 1 FAD_binding_8 FAD-binding domain > > > Domain annotation for each model (and alignments): >>> SnoaL_2 SnoaL-like domain > # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc > --- ------ ----- --------- --------- ------- ------- ------- ------- ------- ------- ---- > 1 ! 20.2 0.1 7.9e-08 0.00054 11 86 .. 59 157 .. 46 172 .. 0.72 > > Alignments for each domain: > == domain 1 score: 20.2 bits; conditional E-value: 7.9e-08 > T-HHHHHHTEEEEEEEE............CT.S--..E---HHHHHHHTTHHHCEECEEEEEEEEESSTTEEEEEEEEE-E........ESBS--EEEE CS > SnoaL_2 11 gdldalaallapdvvwe............dp.fge..lrGrealraffrallaafpdlrfevedviadgdrvvvrwtvtgt........ipptgrgvtv 86 > d l +l++ d+++ +p +g+ + G++a+ + f ++ + f+++++++d+ v++++ +t+t i p ++ +v > PEG03688 59 PDYNLLKELVTYDCTYIsltfdnptlhgiMPwAGThtHVGPQAFIDIFTRVGLYWDRGPFSIDHIFGDDGNVTAWGSFTATsrtlgktvISPWAARARV 157 > 3666888999999999877777555545545544488******************************************98555555554444444444 PP > >>> FAD_binding_8 FAD-binding domain > # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc > --- ------ ----- --------- --------- ------- ------- ------- ------- ------- ------- ---- > 1 ? 10.3 0.0 6.3e-05 0.43 19 63 .. 77 128 .. 63 197 .. 
0.69 > > Alignments for each domain: > == domain 1 score: 10.3 bits; conditional E-value: 6.3e-05 > FAD_binding_8 19 lklkkpkks.......lkykpGqyvfini.pslsklflqsHPFtiasapeddk 63 > l++ p+ ++ + G +fi i ++ l+++ PF+i dd > PEG03688 77 LTFDNPTLHgimpwagTHTHVGPQAFIDIfTRVG-LYWDRGPFSIDHIFGDDG 128 > 555555444444444468899999*****66666.99********88777664 PP > > > > Internal pipeline statistics summary: > ------------------------------------- > Query sequence(s): 1 (200 residues) > Target model(s): 13672 (2396357 nodes) > Passed MSV filter: 244 (0.0178467); expected 273.4 (0.02) > Passed bias filter: 219 (0.0160181); expected 273.4 (0.02) > Passed Vit filter: 14 (0.00102399); expected 13.7 (0.001) > Passed Fwd filter: 2 (0.000146284); expected 0.1 (1e-05) > Initial search space (Z): 13672 [actual number of targets] > Domain search space (domZ): 2 [number of targets reported over threshold] > # CPU time: 0.15u 0.13s 00:00:00.28 Elapsed: 00:00:06.72 > # Mc/sec: 71.32 > // > > my parse result: > query len hit len query_start query_end query_match_len hit_start hit_end hit_match_len evalue cov > PEG03688 200 SnoaL_2 0 59 157 99 11 86 76 7.9e-08 0.49 > > > pleast to solve this problem! > Thanks! > > Best wishes! > > > > fangl > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Apr 16 10:21:40 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 16 Apr 2013 14:21:40 +0000 Subject: [Bioperl-l] Some trouble getting started In-Reply-To: References:

<118F034CF4C3EF48A96F86CE585B94BF74DE1105@CHIMBX5.ad.uillinois.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DE1348@CHIMBX5.ad.uillinois.edu> On Apr 16, 2013, at 9:00 AM, Peter Cock wrote: > Shall we move this to the cross-project list and/or root-l instead? > > On Tue, Apr 16, 2013 at 2:23 PM, Fields, Christopher J > wrote: >>> That bug is a bit odd - it doesn't seem to have been filed against >>> any particular project, and has no assignee - and so most likely >>> no one was ever emailed of the bug's existence. :( >>> >>> Peter >> >> It's an OBF bug; makes sense in light of the common wiki issues >> across Bio* wikis at the moment. Looks as if we never decided >> where such issues end up going or who they would be assigned to. > > My mistake - it wasn't clear from the bug page, but issue 3424 is > actually filed under the general obf project, > https://redmine.open-bio.org/issues/3424 > https://redmine.open-bio.org/projects/obf > > I guess we could have had the root-l or open-bio-l lists the > default assignee for that project (much like how we have the > biopython-dev mailing list as the default assignee for Biopython > issues on RedMine). That sounds fine to me; I anticipate very few of these things cropping up. We'll need to set up the redmine email on the white list for the group chosen. >> What is the consensus on using redmine at the moment? Are the >> various Bio* (except BioPerl) still using it? Using Github? I haven't >> gone there primarily b/c they don't allow attachments (though that >> isn't necessarily a bad thing in some cases?). Reason I ask: I >> would suggest we set up this project tracker so that it's primarily >> around web-admin, etc. for OBF, point the email forwards to >> somewhere useful, populate it with devs from the various groups, >> etc. But, if no one but bioperl is using redmine then I don't see >> the point. 
> > In the short term improving the OBF project setup on RedMine > seems a good idea (default assignees and new issue alerts). > https://redmine.open-bio.org/projects/obf > > BioSQL and BioRuby are also using RedMine according to their > homepages. > > Biopython is still using RedMine but we're talking about moving > to the GitHub issue tracker instead (you can use github gists for > attachments), most likely a manual transfer and triage of all the > old open issues. gists don't accept attachments as uploads; it's a copy and paste (so text is fine, but items such as a tarball are not w/o embedding some base64 encoding). You can do image attachments, though: https://github.com/blog/1347-issue-attachments Again, it's not necessarily a bad thing that (most) attachments are not supported, but we do get users sending in scripts, data, etc. so it's worth bringing up. > According to their homepage, BioJava are already using > GitHub, I don't know if they turned off filing bugs on Redmine > (and if not, that would be sensible): > https://redmine.open-bio.org/projects/biojava Might be worth checking with them. > If we do all move to GitHub issues, then for general OBF > bug tracking (e.g. wordpress setup, DNS issues) we could > use something under https://github.com/OBF/ for that. > > Regards, > > Peter Yes; could probably set up a web-admin repo (or similar) if needed to activate Issues. chris From lincoln.stein at gmail.com Tue Apr 16 11:10:01 2013 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Tue, 16 Apr 2013 16:10:01 +0100 Subject: [Bioperl-l] Trouble installing BIo:DB:SAM tools In-Reply-To: References:

Message-ID:

From the Bio::DB::SAM README:

TROUBLESHOOTING:

> If you encounter problems during compiling, you may need to edit
> Build.PL so that extra_compiler_flags matches the CFLAGS and DFLAGS
> settings in the Samtools Makefile. Here are some common problems:
>
> 1. When building this module, you get an error like the following:
>
>    relocation R_X86_64_32 against `a local symbol' can not be used when
>    making a shared object; recompile with -fPIC
>
> To fix this, edit the Makefile in the Samtools distribution by adding
> "-fPIC" to the CFLAGS line. It should look like this:
>
>    CFLAGS= -g -Wall -O2 -fPIC #-m64 #-arch ppc
>
> Then do "make clean; make" in the Samtools directory to recompile the
> library. After this you should be able to build this module without
> errors.

On Tue, Apr 16, 2013 at 11:10 AM, Lalitha Viswanathan <lalitha.viswanathan79 at gmail.com> wrote:

> Hi
> I am trying to install Bio::DB::SAM tools version 1.37 on ubuntu 12,10
> I have BioPerl 1.6 installed and working fine
> I installed samtools-0.1.19 as below
> a) make
>
> b) Then ran perl Build.PL in Bio::DB::SAM tools version 1.37
> c) ./Build install throws below error
> /usr/bin/ld: /home/lalitha/Documents/samtools-0.1.19/libbam.a(bgzf.o):
> relocation R_X86_64_32S against `.rodata.str1.1' can not be used when
> making a shared object; recompile with -fPIC
> /home/lalitha/Documents/samtools-0.1.19/libbam.a: could not read symbols:
> Bad value
> collect2: error: ld returned 1 exit status
> error building blib/arch/auto/Bio/DB/Sam/Sam.so from lib/Bio/DB/Sam.o
> c_bin/bam2bedgraph.o at /usr/share/perl/5.14/ExtUtils/CBuilder/Base.pm line
> 312.
>
> d) Same error is encountered when building through cpan
>
> Any help will be appreciated
>
> Thanks
> Lalitha
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa

From armendarez77 at hotmail.com Wed Apr 17 12:32:53 2013
From: armendarez77 at hotmail.com (Veronica A.)
Date: Wed, 17 Apr 2013 09:32:53 -0700
Subject: [Bioperl-l] PRIMER_THERMODYNAMIC_PARAMETERS_PATH
Message-ID:

Hello,

I've just updated to bioperl-run (1.006900) and perl (v5.14.2) installed on a Linux machine running Ubuntu 12.04.

I have Perl scripts that use Bio::Tools::Run::Primer3 and, before the updates, they ran as expected. However, they are now failing.

While testing primer3_core on the command line, I realized the problem was the PRIMER_THERMODYNAMIC_PARAMETERS_PATH setting. I added this to my primer3_core input file and primers were designed. I.T. then moved the primer3_config directory to /opt/primer3_config as instructed, and this helped when running primer3_core on the command line, as I no longer needed to include PRIMER_THERMODYNAMIC_PARAMETERS_PATH in the input file. However, the Perl scripts still do not work.

Is there a BioPerl config file that requires the PRIMER_THERMODYNAMIC_PARAMETERS_PATH setting? Or did something not install correctly?

Thank you,
Veronica

From mccormack at molbio.mgh.harvard.edu Wed Apr 17 19:08:22 2013
From: mccormack at molbio.mgh.harvard.edu (Matthew McCormack)
Date: Wed, 17 Apr 2013 19:08:22 -0400
Subject: [Bioperl-l] get CDS start site for entry in NCBI
Message-ID: <516F2B66.7060001@molbio.mgh.harvard.edu>

I am not much of a Perl coder and I have a few questions.

First, I would like to write a script that will go to NCBI GenBank and get the base number for the start of the CDS region, e.g. 235 (given a particular accession number). I have looked at the HOWTOs and documentation for Bio::SeqIO and Bio::DB::GenBank, and I can cut and paste the examples and they work, but I cannot figure out how to get what I want: the CDS start site.
I have difficulty knowing what all the methods and their options are for the seqio object and seq_object. Most of the examples seem to be using a file to get information and not a website. Actually, what I have to start with is a TAIR locus number such as AT4g08500, but I can not search on this at NCBI and come up with a unique entry. I may have to have a table of conversions from TAIR locus number to accession numbers. Also, I was looking for a bit of advice. What I am doing is getting data off another web site. I have a script using the WWW::Mechanize module in which I can input a link and go to that webpage, and then go down a line of links (over 100) getting information from each link. As part of that information that I am getting is the number base of a binding site, but I want to know if that binding site is in the CDS. The start number is the start of the gene, so say if the binding site is 235, then I want to know if this is in the CDS. This data is not provided by the website, that is why I want to go to NCBI and get the start of the CDS. The data at NCBI for 'gene' has the same length as the first webpage, but also contains the beginning of the CDS, say 299, so with this information I can tell if the binding site is in the CDS. Do you think the best way to do this is extract the info from the link on the first web page, then go to NCBI and extract the CDS, then back to the original web page and the next link, and so on, for a couple of hundred links ? Or is there a better way ? I am concerned about a script that will keep going back to NCBI. Matthew The information in this e-mail is intended only for the person to whom it is addressed. If you believe this e-mail was sent to you in error and the e-mail contains patient information, please contact the Partners Compliance HelpLine at http://www.partners.org/complianceline . 
If the e-mail was sent to you in error but does not contain patient information, please contact the sender and properly dispose of the e-mail.

From cjfields at illinois.edu Wed Apr 17 21:28:58 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 18 Apr 2013 01:28:58 +0000
Subject: [Bioperl-l] PRIMER_THERMODYNAMIC_PARAMETERS_PATH
In-Reply-To:
References:
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DE4970@CHIMBX5.ad.uillinois.edu>

If you are running a newer version of Primer3 you must use the Bio::Tools::Primer3Redux distribution (which also contains a modified wrapper, Bio::Tools::Run::Primer3Redux). The original module had some fundamental issues that were never addressed and required a rewrite; since a rewrite raised backwards-compatibility issues (and I needed something to work yesterday), I rewrote it as a separate distribution.

The code is on CPAN and has very few requirements beyond BioPerl (of course):

https://metacpan.org/release/Bio-Tools-Primer3Redux

chris

On Apr 17, 2013, at 11:32 AM, Veronica A. wrote:

> Hello,
> I've just updated to bioperl-run (1.006900) and perl (v5.14.2) installed on a Linux machine running Ubuntu 12.04.
> I have perl scripts that use Bio::Tools::Run::Primer3 and before the updates, ran as expected. However, they are now failing.
> While testing primer3_core on the command line, I realized the problem was the PRIMER_THERMODYNAMIC_PARAMETERS_PATH setting. I added this to my primer3_core input file and primers were designed. I.T. then moved the primer3_config directory to /opt/primer3_config as instructed and this helped running primer3_core on the command line as I no longer needed to include PRIMER_THERMODYNAMIC_PARAMETERS_PATH to the input file. However, the perl scripts still do not work.
> Is there a BioPerl config file that requires the PRIMER_THERMODYNAMIC_PARAMETERS_PATH setting? Or did something not install correctly?
> Thank you, > Veronica > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Apr 17 21:31:00 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 18 Apr 2013 01:31:00 +0000 Subject: [Bioperl-l] [Root-l] Some trouble getting started In-Reply-To: References:

<118F034CF4C3EF48A96F86CE585B94BF74DE1105@CHIMBX5.ad.uillinois.edu>

Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DE49AB@CHIMBX5.ad.uillinois.edu> On Apr 17, 2013, at 2:59 PM, Spencer Bliven wrote: > > On Tue, Apr 16, 2013 at 7:00 AM, Peter Cock wrote: > According to their homepage, BioJava are already using > GitHub, I don't know if they turned off filing bugs on Redmine > (and if not, that would be sensible): > https://redmine.open-bio.org/projects/biojava > > > We should indeed turn off filing new bugs for BioJava, although it would be nice to still be able to access existing bugs. I'm not exactly sure how to do this. Would anyone mind switching the BioJava project to read-only, or should I dig into the redmine docs? > > -Spencer Closing a repo (making read-only) is only supported in newer versions of redmine, but we can probably set it up so that bug reports for that project require elevated permissions (which would actively prevent anyone beyond a group admin from submitting new ones). Should suffice; I can check up on that. chris From voldrani at gmail.com Thu Apr 18 10:13:05 2013 From: voldrani at gmail.com (Chris Maloney) Date: Thu, 18 Apr 2013 10:13:05 -0400 Subject: [Bioperl-l] get CDS start site for entry in NCBI In-Reply-To: <516F2B66.7060001@molbio.mgh.harvard.edu> References: <516F2B66.7060001@molbio.mgh.harvard.edu> Message-ID: I am a noob with BioPerl, so I don't know how to implement this exactly, but from an NCBI Eutilities perspective, you can get many records at once. 
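For anyone who wants to see the shape of those E-utilities requests before reaching for BioPerl, here is a minimal sketch in Python (standard library only, no network access). It only builds the ESearch and EFetch URLs and pulls the IDs out of an ESearch reply. The function names and the sample XML are made up for illustration; the base URL, the db/term/id/retmode parameters and the two example IDs are the ones quoted in this thread, and I am assuming the reply keeps its IDs in IdList/Id elements.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base URL as quoted in this thread (assumption: check the NCBI docs
# for the currently recommended host and scheme).
EUTILS = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="nuccore"):
    """Build an ESearch URL that asks for the IDs matching a search term."""
    params = {"db": db, "term": term, "retmode": "xml"}
    return EUTILS + "/esearch.fcgi?" + urlencode(params)


def parse_ids(esearch_xml):
    """Return the text of every <Id> inside <IdList> of an ESearch XML reply."""
    root = ET.fromstring(esearch_xml)
    return [el.text for el in root.findall("./IdList/Id")]


def efetch_url(ids, db="nuccore"):
    """Build one EFetch URL for several IDs (comma-separated in 'id')."""
    params = {"db": db, "id": ",".join(ids), "retmode": "xml"}
    # safe="," keeps the commas readable instead of encoding them as %2C
    return EUTILS + "/efetch.fcgi?" + urlencode(params, safe=",")


# Hand-written stand-in for an ESearch reply (hypothetical, heavily trimmed).
SAMPLE_REPLY = """<eSearchResult>
  <Count>2</Count>
  <IdList><Id>332656411</Id><Id>240256243</Id></IdList>
</eSearchResult>"""

if __name__ == "__main__":
    print(esearch_url("AT4g08500"))
    ids = parse_ids(SAMPLE_REPLY)
    print(efetch_url(ids))
```

Asking for many IDs in a single EFetch request like this keeps the number of round trips to NCBI low, which seems to be the main worry in the original question.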
You can use ESearch to get a list of IDs:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nuccore&term=AT4g08500&retmode=xml

Then, with EFetch (http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch), you can give a list of IDs right in the request:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=332656411,240256243&retmode=xml

If the list is long, then set usehistory=1 in your esearch:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nuccore&term=AT4g08500&retmode=xml&usehistory=1

and from that result, grab the WebEnv and use it in your efetch request.

I think all of these should be well supported by BioPerl. Probably look at https://metacpan.org/module/Bio::Tools::EUtilities.

Hope that helps!
Chris Maloney

On Wed, Apr 17, 2013 at 7:08 PM, Matthew McCormack wrote:

> I am not much of a Perl coder and I have a few questions.
>
> First, I would like to write a script that will go to NCBI genebank and
> get the base number for the start of the CDS region, e.g. 235 (given a
> particular accession number). I have looked at HOWTO's and documentation for
> Bio::SeqIO and Bio::DB::GenBank and I can cut and paste the examples and
> they work, but I can not figure out how to get what I want; the CDS start
> site. I have difficulty knowing what all the methods and their options are
> for the seqio object and seq_object. Most of the examples seem to be using a
> file to get information and not a website.
>
> Actually, what I have to start with is a TAIR locus number such as
> AT4g08500, but I can not search on this at NCBI and come up with a unique
> entry. I may have to have a table of conversions from TAIR locus number to
> accession numbers.
>
> Also, I was looking for a bit of advice. What I am doing is getting data
> off another web site.
I have a script using the WWW::Mechanize module in > which I can input a link and go to that webpage, and then go down a line of > links (over 100) getting information from each link. As part of that > information that I am getting is the number base of a binding site, but I > want to know if that binding site is in the CDS. The start number is the > start of the gene, so say if the binding site is 235, then I want to know if > this is in the CDS. This data is not provided by the website, that is why I > want to go to NCBI and get the start of the CDS. The data at NCBI for 'gene' > has the same length as the first webpage, but also contains the beginning of > the CDS, say 299, so with this information I can tell if the binding site is > in the CDS. Do you think the best way to do this is extract the info from > the link on the first web page, then go to NCBI and extract the CDS, then > back to the original web page and the next link, and so on, for a couple of > hundred links ? Or is there a better way ? I am concerned about a script > that will keep going back to NCBI. > > Matthew > > > > The information in this e-mail is intended only for the person to whom it is > addressed. If you believe this e-mail was sent to you in error and the > e-mail > contains patient information, please contact the Partners Compliance > HelpLine at > http://www.partners.org/complianceline . If the e-mail was sent to you in > error > but does not contain patient information, please contact the sender and > properly > dispose of the e-mail. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sharmashalu.bio at gmail.com Thu Apr 18 11:17:02 2013 From: sharmashalu.bio at gmail.com (shalu sharma) Date: Thu, 18 Apr 2013 11:17:02 -0400 Subject: [Bioperl-l] Taxa Id from blast report Message-ID: Hi All. 
I have a default blastx report and I want to get taxa ids from it along with other usual information. I am using BioPerl to parse the BLAST report, but I don't know how to get taxa ids.

Thanks
Shalu

From p.j.a.cock at googlemail.com Thu Apr 18 12:05:13 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 18 Apr 2013 17:05:13 +0100
Subject: [Bioperl-l] Taxa Id from blast report
In-Reply-To:
References:
Message-ID:

On Thu, Apr 18, 2013 at 4:17 PM, shalu sharma wrote:
> Hi All.
> I have a default blastx report and i want to get taxa ids from it
> along with other usual information. I am using Bioperl to parse the blast
> report, but i don't know how to get taxa ids.
>
> Thanks
> Shalu

If you upgrade to the latest BLAST+ you can have this as a column in the tabular output - easy to parse in Perl etc:

http://blastedbio.blogspot.co.uk/2012/05/blast-tabular-missing-descriptions.html

For example,

$ blastx -outfmt "6 std staxids" ...

Peter

From sharmashalu.bio at gmail.com Thu Apr 18 12:13:40 2013
From: sharmashalu.bio at gmail.com (shalu sharma)
Date: Thu, 18 Apr 2013 12:13:40 -0400
Subject: [Bioperl-l] Taxa Id from blast report
In-Reply-To:
References:

Message-ID:

Thanks a lot Peter.
Actually, I still use legacy BLAST, so do I also get the description of the hit if I use the tabular format? And do I have to format the database with some modification?
I really appreciate your help.

Thanks
Shalu


On Thu, Apr 18, 2013 at 12:05 PM, Peter Cock wrote:

> On Thu, Apr 18, 2013 at 4:17 PM, shalu sharma
> wrote:
> > Hi All.
> > I have a default blastx report and i want to get taxa ids from it
> > along with other usual information. I am using Bioperl to parse the blast
> > report, but i don't know how to get taxa ids.
> >
> > Thanks
> > Shalu
>
> If you upgrade to the latest BLAST+ and you can have this as a
> column in the tabular output - easy to parse in Perl etc:
>
> http://blastedbio.blogspot.co.uk/2012/05/blast-tabular-missing-descriptions.html
>
> For example,
>
> $ blastx -outfmt "6 std staxids" ...
>
> Peter
>

From p.j.a.cock at googlemail.com Thu Apr 18 12:21:03 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 18 Apr 2013 17:21:03 +0100
Subject: [Bioperl-l] Taxa Id from blast report
In-Reply-To:
References:

Message-ID: > On Thu, Apr 18, 2013 at 12:05 PM, Peter Cock > wrote: >> >> On Thu, Apr 18, 2013 at 4:17 PM, shalu sharma >> wrote: >> > Hi All. >> > I have a default blastx report and i want to get taxa ids from >> > it along with other usual information. I am using Bioperl to parse >> > the blast report, but i don't know how to get taxa ids. >> > >> > Thanks >> > Shalu >> >> If you upgrade to the latest BLAST+ and you can have this as a >> column in the tabular output - easy to parse in Perl etc: >> >> http://blastedbio.blogspot.co.uk/2012/05/blast-tabular-missing-descriptions.html >> >> For example, >> >> $ blastx -outfmt "6 std staxids" ... >> >> Peter > > On Thu, Apr 18, 2013 at 5:13 PM, shalu sharma wrote: > Thanks a lot Peter. > Actually i still use legacy blast, so do i also get description of the hit > if i use tabular format and also do i have to format the database with some > modification ? > I really appreciate you help. > > Thanks > Shalu If you are using BLAST 2.2.28+ then you can also ask for the hit descriptions, also explored on my blog post. e.g. $ blastx -outfmt "6 std staxids stitle" ... or, $ blastx -outfmt "6 std staxids salltitles" ... Peter From shalabh.sharma7 at gmail.com Thu Apr 18 12:32:08 2013 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 18 Apr 2013 12:32:08 -0400 Subject: [Bioperl-l] Taxa Id from blast report In-Reply-To: References:

Message-ID:

Hey Peter,
Thanks a lot, I really appreciate it. I have wanted these things implemented in BLAST for a long time.

Thanks
Shalabh


On Thu, Apr 18, 2013 at 12:21 PM, Peter Cock wrote:

> > On Thu, Apr 18, 2013 at 12:05 PM, Peter Cock
> > wrote:
> >>
> >> On Thu, Apr 18, 2013 at 4:17 PM, shalu sharma <sharmashalu.bio at gmail.com>
> >> wrote:
> >> > Hi All.
> >> > I have a default blastx report and i want to get taxa ids from
> >> > it along with other usual information. I am using Bioperl to parse
> >> > the blast report, but i don't know how to get taxa ids.
> >> >
> >> > Thanks
> >> > Shalu
> >>
> >> If you upgrade to the latest BLAST+ and you can have this as a
> >> column in the tabular output - easy to parse in Perl etc:
> >>
> >> http://blastedbio.blogspot.co.uk/2012/05/blast-tabular-missing-descriptions.html
> >>
> >> For example,
> >>
> >> $ blastx -outfmt "6 std staxids" ...
> >>
> >> Peter
> >
> >
> > On Thu, Apr 18, 2013 at 5:13 PM, shalu sharma wrote:
> > Thanks a lot Peter.
> > Actually i still use legacy blast, so do i also get description of the hit
> > if i use tabular format and also do i have to format the database with some
> > modification ?
> > I really appreciate you help.
> >
> > Thanks
> > Shalu
>
> If you are using BLAST 2.2.28+ then you can also ask for the
> hit descriptions, also explored on my blog post. e.g.
>
> $ blastx -outfmt "6 std staxids stitle" ...
>
> or,
>
> $ blastx -outfmt "6 std staxids salltitles" ...
> > Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From p.j.a.cock at googlemail.com Thu Apr 18 12:48:53 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 18 Apr 2013 17:48:53 +0100 Subject: [Bioperl-l] Taxa Id from blast report In-Reply-To: References:

Message-ID: On Thu, Apr 18, 2013 at 5:32 PM, shalabh sharma wrote: > Hey Peter, > Thanks a lot, I really appreciate it. I wanted these things > implemented in blast from long time. > > Thanks > Shalabh Me too. You can get the descriptions from the plain text BLAST or XML output already of course, but they're not so nice to work with. Peter From cjfields at illinois.edu Thu Apr 18 12:49:16 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 18 Apr 2013 16:49:16 +0000 Subject: [Bioperl-l] Taxa Id from blast report In-Reply-To: References:

Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DE7086@CHIMBX5.ad.uillinois.edu>

The old C-based BLAST is now considered deprecated by NCBI, so new features are only implemented in BLAST+. That is unfortunate, as a lot of people are resisting a move to BLAST+ for many (sometimes very good) reasons. Ah well.

chris

On Apr 18, 2013, at 11:32 AM, shalabh sharma wrote:

> Hey Peter,
> Thanks a lot, I really appreciate it. I wanted these things
> implemented in blast from long time.
>
> Thanks
> Shalabh
>
>
> On Thu, Apr 18, 2013 at 12:21 PM, Peter Cock wrote:
>
>>> On Thu, Apr 18, 2013 at 12:05 PM, Peter Cock
>>> wrote:
>>>>
>>>> On Thu, Apr 18, 2013 at 4:17 PM, shalu sharma <sharmashalu.bio at gmail.com>
>>>> wrote:
>>>>> Hi All.
>>>>> I have a default blastx report and i want to get taxa ids from
>>>>> it along with other usual information. I am using Bioperl to parse
>>>>> the blast report, but i don't know how to get taxa ids.
>>>>>
>>>>> Thanks
>>>>> Shalu
>>>>
>>>> If you upgrade to the latest BLAST+ and you can have this as a
>>>> column in the tabular output - easy to parse in Perl etc:
>>>>
>>>> http://blastedbio.blogspot.co.uk/2012/05/blast-tabular-missing-descriptions.html
>>>>
>>>> For example,
>>>>
>>>> $ blastx -outfmt "6 std staxids" ...
>>>>
>>>> Peter
>>>
>>>
>>
>> On Thu, Apr 18, 2013 at 5:13 PM, shalu sharma wrote:
>>> Thanks a lot Peter.
>>> Actually i still use legacy blast, so do i also get description of the hit
>>> if i use tabular format and also do i have to format the database with some
>>> modification ?
>>> I really appreciate you help.
>>>
>>> Thanks
>>> Shalu
>>
>> If you are using BLAST 2.2.28+ then you can also ask for the
>> hit descriptions, also explored on my blog post. e.g.
>>
>> $ blastx -outfmt "6 std staxids stitle" ...
>>
>> or,
>>
>> $ blastx -outfmt "6 std staxids salltitles" ...
>> >> Peter >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Apr 18 14:26:13 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 18 Apr 2013 18:26:13 +0000 Subject: [Bioperl-l] Taxa Id from blast report In-Reply-To: References:

Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DE75B3@CHIMBX5.ad.uillinois.edu> On Apr 18, 2013, at 11:48 AM, Peter Cock wrote: > On Thu, Apr 18, 2013 at 5:32 PM, shalabh sharma > wrote: >> Hey Peter, >> Thanks a lot, I really appreciate it. I wanted these things >> implemented in blast from long time. >> >> Thanks >> Shalabh > > Me too. You can get the descriptions from the plain text BLAST or > XML output already of course, but they're not so nice to work with. > > Peter NCBI has been much more receptive of user input over the last several years, much more so than in the past. I understand the reasoning for dropping BLAST support (though there were definitely needless bumps in that process). chris From Russell.Smithies at agresearch.co.nz Thu Apr 18 15:52:39 2013 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 19 Apr 2013 07:52:39 +1200 Subject: [Bioperl-l] Taxa Id from blast report In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74DE75B3@CHIMBX5.ad.uillinois.edu> References:

<118F034CF4C3EF48A96F86CE585B94BF74DE75B3@CHIMBX5.ad.uillinois.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF37365A0E601@exchsth.agresearch.co.nz> I agree they have finally listened and added features requested by users but I've been suggesting they have a compressed output format available from eutils or genbank for years but have made no headway ;- ( What's so hard about gzip'ping the output? I'm sure it would go a long way toward solving all the problems we get with truncated replies from queries!! --Russell -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Fields, Christopher J Sent: Friday, 19 April 2013 6:26 a.m. To: Peter Cock Cc: bioperl-l at lists.open-bio.org; shalu sharma; shalabh sharma Subject: Re: [Bioperl-l] Taxa Id from blast report On Apr 18, 2013, at 11:48 AM, Peter Cock wrote: > On Thu, Apr 18, 2013 at 5:32 PM, shalabh sharma > wrote: >> Hey Peter, >> Thanks a lot, I really appreciate it. I wanted these things >> implemented in blast from long time. >> >> Thanks >> Shalabh > > Me too. You can get the descriptions from the plain text BLAST or XML > output already of course, but they're not so nice to work with. > > Peter NCBI has been much more receptive of user input over the last several years, much more so than in the past. I understand the reasoning for dropping BLAST support (though there were definitely needless bumps in that process). chris _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. 
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From sbliven at ucsd.edu Wed Apr 17 15:59:46 2013 From: sbliven at ucsd.edu (Spencer Bliven) Date: Wed, 17 Apr 2013 12:59:46 -0700 Subject: [Bioperl-l] [Root-l] Some trouble getting started In-Reply-To: References:

<118F034CF4C3EF48A96F86CE585B94BF74DE1105@CHIMBX5.ad.uillinois.edu> Message-ID: On Tue, Apr 16, 2013 at 7:00 AM, Peter Cock wrote: > According to their homepage, BioJava are already using > GitHub, I don't know if they turned off filing bugs on Redmine > (and if not, that would be sensible): > https://redmine.open-bio.org/projects/biojava > We should indeed turn off filing new bugs for BioJava, although it would be nice to still be able to access existing bugs. I'm not exactly sure how to do this. Would anyone mind switching the BioJava project to read-only, or should I dig into the redmine docs? -Spencer From vhanna77 at gmail.com Wed Apr 17 21:35:43 2013 From: vhanna77 at gmail.com (Veronica Armendarez) Date: Wed, 17 Apr 2013 18:35:43 -0700 Subject: [Bioperl-l] PRIMER_THERMODYNAMIC_PARAMETERS_PATH In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74DE4970@CHIMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF74DE4970@CHIMBX5.ad.uillinois.edu> Message-ID: Thank you. On Apr 17, 2013 6:30 PM, "Fields, Christopher J" wrote: > If you are running a newer version of Primer3 you must use the > Bio::Tools::Primer3Redux distribution (which also contains a modified > wrapper, Bio::Tools::Run::Primer3Redux). The reason was the original > module had some fundamental issues that were never addressed and required a > rewrite, but since there were issues with backwards-compatibility (and I > needed something to work yesterday) I rewrote it. > > The code is on CPAN and have very few requirements beyond bioperl (of > course): > > https://metacpan.org/release/Bio-Tools-Primer3Redux > > chris > > On Apr 17, 2013, at 11:32 AM, Veronica A. > wrote: > > > Hello, > > I've just updated to bioperl-run (1.006900) and perl (v5.14.2) installed > on a Linux machine running Ubuntu 12.04. > > I have perl scripts that use Bio::Tools::Run::Primer3 and before the > updates, ran as expected. However, they are now failing. 
> > While testing primer3_core on the command line, I realized the problem > was the PRIMER_THERMODYNAMIC_PARAMETERS_PATH setting. I added this to my > primer3_core input file and primers were designed. I.T. then moved the > primer3_config directory to /opt/primer3_config as instructed and this > helped running primer3_core on the command line as I no longer needed to > include PRIMER_THERMODYNAMIC_PARAMETERS_PATH to the input file. However, > the perl scripts still do not work. > > Is there a BioPerl config file that requires the > PRIMER_THERMODYNAMIC_PARAMETERS_PATH setting? Or did something not install > correctly? > > Thank you, > > Veronica From brianfoleynm at gmail.com Thu Apr 18 19:17:59 2013 From: brianfoleynm at gmail.com (Brian Foley) Date: Thu, 18 Apr 2013 16:17:59 -0700 (PDT) Subject: [Bioperl-l] Annotation-assisted (and/or BLAST assisted) multiple sequence alignment tool? Message-ID: <1a8f3740-bb6f-4817-b312-7776fdb6f47d@googlegroups.com> At the HIV Sequence and Immunology Databases (http://www.hiv.lanl.gov) where I work, we have used a bit of creativity to solve some difficult problems in multiple sequence alignment, because we often want to produce an alignment of gene sequences from more than 20,000 different isolates of HIV-1 in less than a few minutes' time. We are very good at "deep" multiple alignment: thousands of copies of the same small genome. My problem comes when I want to align the genomes of other viruses or similar-sized gene regions (the complete mitochondrial genomes of vertebrates, for example, which are roughly 17 kb in size): they don't always have the same gene order.
A good example is the mitochondrial genomes of birds and mammals, which are mostly co-linear, but with the NADH6 gene moved to a different location. See attached JPG of Aardvark and Japanese Eagle-Hawk mitochondrial genomes. In other cases (I think it is the primate mitochondrial genomes), the authors all used a different site for "base #1" in the circular genome. So although the primate mitochondrial genomes are 100% co-linear with other vertebrates, we have to chop several thousand bases off the right end and paste them onto the left end (5' end, beginning) to make them align with the mt-genomes of other mammals.

So, it seems to me that there ought to be a multiple sequence alignment tool that can read GenBank files with their annotation, and use the annotation to help with the alignment process.

One tool that I am aware of, which can help a lot, is the "Artemis Genome Comparison Tool" (ACT) and its associated DOUBLE-ACT server: http://www.hpa-bioinfotools.org.uk/pise/double_act.html The DOUBLE-ACT server uses BLAST to find regions on a pair of genomes which are homologous/similar and creates a table of these matched regions. The Artemis Comparison Tool then loads both genomes into an ARTEMIS Genome Browser tool and uses the BLAST hit table to help the browser get both genomes "in sync" with each other as you browse the genomes. Although the DOUBLE-ACT BLAST step here is not dependent on annotations at all, the annotations are visible when browsing the genomes in ACT.

I am quite sure that I am not the only one in the world who needs this type of tool. I am increasingly seeing large multiple sequence alignments being done for classification of organisms, where the authors could have used such a tool. Please let me know if you have any ideas about where to look for such a tool, or which groups of bioinformatics workers might be able to develop one.

Brian T.
Foley, PhD HIV Databases Los Alamos National Laboratory btf at lanl.gov 505 665-1970 -------------- next part -------------- A non-text attachment was scrubbed... Name: Avian_Mammal_mtGenomeMaps.jpg Type: image/jpeg Size: 170205 bytes Desc: not available URL: From sharmashalu.bio at gmail.com Fri Apr 19 13:05:47 2013 From: sharmashalu.bio at gmail.com (shalu sharma) Date: Fri, 19 Apr 2013 13:05:47 -0400 Subject: [Bioperl-l] Taxa Id from blast report In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF37365A0E601@exchsth.agresearch.co.nz> References:

<118F034CF4C3EF48A96F86CE585B94BF74DE75B3@CHIMBX5.ad.uillinois.edu> <18DF7D20DFEC044098A1062202F5FFF37365A0E601@exchsth.agresearch.co.nz> Message-ID: Hi, Thanks everyone for your input. @Peter: I got really excited when I saw that you can even get the super kingdom, but when I tried to test it I just got taxa ids and not the super kingdom. Do you have any idea what's going wrong?
my command:
blastx -query test.fas -db /db/ncbiblast/refseq/latest/refseq_protein -max_target_seqs 1 -outfmt "6 staxids sskingdoms"
output:
246200 N/A
246200 N/A
Thanks Shalu On Thu, Apr 18, 2013 at 3:52 PM, Smithies, Russell < Russell.Smithies at agresearch.co.nz> wrote: > I agree they have finally listened and added features requested by users > but I've been suggesting they have a compressed output format available > from eutils or genbank for years but have made no headway ;- ( > What's so hard about gzip'ping the output? I'm sure it would go a long way > toward solving all the problems we get with truncated replies from queries!! > > --Russell > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto: > bioperl-l-bounces at lists.open-bio.org] On Behalf Of Fields, Christopher J > Sent: Friday, 19 April 2013 6:26 a.m. > To: Peter Cock > Cc: bioperl-l at lists.open-bio.org; shalu sharma; shalabh sharma > Subject: Re: [Bioperl-l] Taxa Id from blast report > > On Apr 18, 2013, at 11:48 AM, Peter Cock > wrote: > > > On Thu, Apr 18, 2013 at 5:32 PM, shalabh sharma > > wrote: > >> Hey Peter, > >> Thanks a lot, I really appreciate it. I wanted these things > >> implemented in blast from long time. > >> > >> Thanks > >> Shalabh > > > > Me too. You can get the descriptions from the plain text BLAST or XML > > output already of course, but they're not so nice to work with. > > > > Peter > > NCBI has been much more receptive of user input over the last several > years, much more so than in the past.
I understand the reasoning for > dropping BLAST support (though there were definitely needless bumps in that > process). > > chris From jason.stajich at gmail.com Fri Apr 19 15:32:32 2013 From: jason.stajich at gmail.com (Jason Stajich) Date: Fri, 19 Apr 2013 15:32:32 -0400 Subject: [Bioperl-l] get CDS start site for entry in NCBI In-Reply-To: <516F2B66.7060001@molbio.mgh.harvard.edu> References: <516F2B66.7060001@molbio.mgh.harvard.edu> Message-ID: <60256F4E-1D92-4E46-A872-ABD27E4E0251@gmail.com> You want to loop through all the features, pick out the ones which are CDS, and then in this case get the start of each feature. The code for that is:

# Print the start coordinate of every CDS feature on the sequence object $seq
for my $feat ( $seq->get_SeqFeatures() ) {
    if ( $feat->primary_tag eq 'CDS' ) {
        print $feat->start, "\n";
    }
}

On Apr 17, 2013, at 7:08 PM, Matthew McCormack wrote: > I am not much of a Perl coder and I have a few questions. > > First, I would like to write a script that will go to NCBI genebank and get the base number for the start of the CDS region, e.g. 235 (given a particular accession number).
I have looked at HOWTO's and documentation for Bio::SeqIO and Bio::DB::GenBank and I can cut and paste the examples and they work, but I can not figure out how to get what I want; the CDS start site. I have difficulty knowing what all the methods and their options are for the seqio object and seq_object. Most of the examples seem to be using a file to get information and not a website. > > Actually, what I have to start with is a TAIR locus number such as AT4g08500, but I can not search on this at NCBI and come up with a unique entry. I may have to have a table of conversions from TAIR locus number to accession numbers. > > Also, I was looking for a bit of advice. What I am doing is getting data off another web site. I have a script using the WWW::Mechanize module in which I can input a link and go to that webpage, and then go down a line of links (over 100) getting information from each link. As part of that information that I am getting is the number base of a binding site, but I want to know if that binding site is in the CDS. The start number is the start of the gene, so say if the binding site is 235, then I want to know if this is in the CDS. This data is not provided by the website, that is why I want to go to NCBI and get the start of the CDS. The data at NCBI for 'gene' has the same length as the first webpage, but also contains the beginning of the CDS, say 299, so with this information I can tell if the binding site is in the CDS. Do you think the best way to do this is extract the info from the link on the first web page, then go to NCBI and extract the CDS, then back to the original web page and the next link, and so on, for a couple of hundred links ? Or is there a better way ? I am concerned about a script that will keep going back to NCBI. > > Matthew > > > > The information in this e-mail is intended only for the person to whom it is > addressed. 
If you believe this e-mail was sent to you in error and the e-mail > contains patient information, please contact the Partners Compliance HelpLine at > http://www.partners.org/complianceline . If the e-mail was sent to you in error > but does not contain patient information, please contact the sender and properly > dispose of the e-mail. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From jason.stajich at gmail.com Fri Apr 19 15:38:00 2013 From: jason.stajich at gmail.com (Jason Stajich) Date: Fri, 19 Apr 2013 15:38:00 -0400 Subject: [Bioperl-l] Taxa Id from blast report In-Reply-To: References:

<118F034CF4C3EF48A96F86CE585B94BF74DE75B3@CHIMBX5.ad.uillinois.edu> <18DF7D20DFEC044098A1062202F5FFF37365A0E601@exchsth.agresearch.co.nz> Message-ID: Did you provide -parse_seqids in the header? Peter dealt with related things here: http://blastedbio.blogspot.com/2012/10/my-ids-not-good-enough-for-ncbi-blast.html Jason On Apr 19, 2013, at 1:05 PM, shalu sharma wrote: > Hi, > Thanks everyone for you inputs. > @Peter: > I got really excited when i saw that you can even get super kingdom, but > when i tried to test it i just got taxa ids but not the super kingdom. Do > you have any idea whats going wrong? > my command: > blastx -query test.fas -db /db/ncbiblast/refseq/latest/refseq_protein > -max_target_seqs 1 -outfmt "6 staxids sskingdoms" > > output: > 246200 N/A > 246200 N/A > > Thanks > Shalu > > > On Thu, Apr 18, 2013 at 3:52 PM, Smithies, Russell < > Russell.Smithies at agresearch.co.nz> wrote: > >> I agree they have finally listened and added features requested by users >> but I've been suggesting they have a compressed output format available >> from eutils or genbank for years but have made no headway ;- ( >> What's so hard about gzip'ping the output? I'm sure it would go a long way >> toward solving all the problems we get with truncated replies from queries!! >> >> --Russell >> >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto: >> bioperl-l-bounces at lists.open-bio.org] On Behalf Of Fields, Christopher J >> Sent: Friday, 19 April 2013 6:26 a.m. >> To: Peter Cock >> Cc: bioperl-l at lists.open-bio.org; shalu sharma; shalabh sharma >> Subject: Re: [Bioperl-l] Taxa Id from blast report >> >> On Apr 18, 2013, at 11:48 AM, Peter Cock >> wrote: >> >>> On Thu, Apr 18, 2013 at 5:32 PM, shalabh sharma >>> wrote: >>>> Hey Peter, >>>> Thanks a lot, I really appreciate it. I wanted these things >>>> implemented in blast from long time. >>>> >>>> Thanks >>>> Shalabh >>> >>> Me too. 
You can get the descriptions from the plain text BLAST or XML >>> output already of course, but they're not so nice to work with. >>> >>> Peter >> >> NCBI has been much more receptive of user input over the last several >> years, much more so than in the past. I understand the reasoning for >> dropping BLAST support (though there were definitely needless bumps in that >> process). >> >> chris >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> ======================================================================= >> Attention: The information contained in this message and/or attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or privileged >> material. Any review, retransmission, dissemination or other use of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by AgResearch >> Limited. If you have received this message in error, please notify the >> sender immediately. 
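
For the tabular output discussed in this thread, a minimal sketch of pulling the taxon IDs out of each hit line might look like the following. It assumes the report was produced with -outfmt "6 std staxids stitle", so each line carries the 12 standard columns followed by the two extra ones. The sub name parse_tabular_line and the sample line are made up for illustration and are not part of BioPerl or BLAST+.

```perl
use strict;
use warnings;

# Column names for -outfmt "6 std staxids stitle": the 12 "std" fields
# followed by the two extra fields requested on the command line.
my @COLUMNS = qw(qseqid sseqid pident length mismatch gapopen
                 qstart qend sstart send evalue bitscore staxids stitle);

# Turn one tab-separated hit line into a hash reference keyed by column name.
sub parse_tabular_line {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line, scalar @COLUMNS;
    my %hit;
    @hit{@COLUMNS} = @fields;
    # staxids may hold several IDs joined by ';', or 'N/A' when BLAST
    # could not find taxonomy information for the subject sequence.
    my $taxids = defined $hit{staxids} ? $hit{staxids} : 'N/A';
    $hit{taxid_list} = $taxids eq 'N/A' ? [] : [ split /;/, $taxids ];
    return \%hit;
}

# Made-up example line (hypothetical identifiers), not real BLAST output.
my $example = join "\t", 'query1', 'subject1', '98.50', 200, 3, 0,
                         1, 600, 1, 200, '1e-100', 400, '246200',
                         'hypothetical protein';
my $hit = parse_tabular_line($example);
print join("\t", $hit->{qseqid}, $hit->{sseqid}, @{ $hit->{taxid_list} }), "\n";
```

Lines where the taxonomy columns come back as N/A, as in the output reported earlier in the thread, simply end up with an empty taxid_list, which makes them easy to count or skip.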
>> ======================================================================= >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From cjfields at illinois.edu Sat Apr 20 01:24:16 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Sat, 20 Apr 2013 05:24:16 +0000 Subject: [Bioperl-l] get CDS start site for entry in NCBI In-Reply-To: <516F2B66.7060001@molbio.mgh.harvard.edu> References: <516F2B66.7060001@molbio.mgh.harvard.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DFA246@CHIMBX5.ad.uillinois.edu> On Apr 17, 2013, at 6:08 PM, Matthew McCormack wrote: > I am not much of a Perl coder and I have a few questions. > > First, I would like to write a script that will go to NCBI genebank and get the base number for the start of the CDS region, e.g. 235 (given a particular accession number). I have looked at HOWTO's and documentation for Bio::SeqIO and Bio::DB::GenBank and I can cut and paste the examples and they work, but I can not figure out how to get what I want; the CDS start site. I have difficulty knowing what all the methods and their options are for the seqio object and seq_object. Most of the examples seem to be using a file to get information and not a website. > > Actually, what I have to start with is a TAIR locus number such as AT4g08500, but I can not search on this at NCBI and come up with a unique entry. I may have to have a table of conversions from TAIR locus number to accession numbers. Saying this as a Bioperl developer, there are easier ways than using NCBI. I would possibly use Gramene's Biomart for this: http://www.gramene.org/biomart/martview You can select TAIR10, then (under 'Filters') select ID list limit and paste in your list. 
Under 'Attributes' there are several options under the 'Features' button; you can change that button to 'Structure' and 'Exon' will pop up, where you can select CDS start/end. Using your ID, when you click results you get (sorry about the tabs):

Ensembl Gene ID  Ensembl Transcript ID  CDS end (within cDNA)  CDS start (within cDNA)
AT4G08500        AT4G08500.1            1058                   1
AT4G08500        AT4G08500.1            1143                   1059
AT4G08500        AT4G08500.1            1206                   1144
AT4G08500        AT4G08500.1            1364                   1207
AT4G08500        AT4G08500.1            1437                   1365
AT4G08500        AT4G08500.1            1497                   1438
AT4G08500        AT4G08500.1            1611                   1498
AT4G08500        AT4G08500.1            1827                   1612

It also appears you can retrieve the protein sequences in bulk from TAIR: http://www.arabidopsis.org/tools/bulk/sequences/index.jsp The sequence descriptors contain coords (start-end and strand); this might suffice if you merely want the translational start/end. > Also, I was looking for a bit of advice. What I am doing is getting data off another web site. I have a script using the WWW::Mechanize module in which I can input a link and go to that webpage, and then go down a line of links (over 100) getting information from each link. As part of that information that I am getting is the number base of a binding site, but I want to know if that binding site is in the CDS. The start number is the start of the gene, so say if the binding site is 235, then I want to know if this is in the CDS. This data is not provided by the website, that is why I want to go to NCBI and get the start of the CDS. The data at NCBI for 'gene' has the same length as the first webpage, but also contains the beginning of the CDS, say 299, so with this information I can tell if the binding site is in the CDS. Do you think the best way to do this is extract the info from the link on the first web page, then go to NCBI and extract the CDS, then back to the original web page and the next link, and so on, for a couple of hundred links ? Or is there a better way ?
I am concerned about a script that will keep going back to NCBI. First, I would normally suggest if there is any way to get your hands on the raw data then do it; however, if you've gone to the extent of using WWW::Mechanize you've probably gone that route already w/o success. If you can get the initial data in batches (e.g. limit the number of requests) then it might help speed things up. Also, re: NCBI and other website queries: try to limit these to either a small # per day (hundred); depending on the site, if they see an inordinate # of requests from the same IP they may block it to prevent overloading the site or even potential DDoS attacks (NCBI and UCSC do this for example). You can also use EUtilities to retrieve the raw data in bulk. What you are trying to do is actually very common, namely finding intersections/unions of features. BioPerl can do that; many other tools are also available (BEDtools being the most prominently used). The key part is getting it into a format that is acceptable (BED, GFF, etc), but once that is in place such comparisons aren't terribly hard to do. chris > Matthew > > > > The information in this e-mail is intended only for the person to whom it is > addressed. If you believe this e-mail was sent to you in error and the e-mail > contains patient information, please contact the Partners Compliance HelpLine at > http://www.partners.org/complianceline . If the e-mail was sent to you in error > but does not contain patient information, please contact the sender and properly > dispose of the e-mail. 
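
Once the CDS coordinates are in hand, the "is the binding site in the CDS" question from this thread comes down to a simple interval test. The sketch below uses plain Perl and assumes 1-based inclusive coordinates on the same reference as the binding site. The segment boundaries and the sub name in_any_segment are hypothetical, loosely following the 235 and 299 figures in the question.

```perl
use strict;
use warnings;

# Return 1 if $pos falls inside any [start, end] segment (inclusive), else 0.
sub in_any_segment {
    my ($pos, @segments) = @_;
    for my $seg (@segments) {
        my ($start, $end) = @$seg;
        return 1 if $pos >= $start && $pos <= $end;
    }
    return 0;
}

# Hypothetical CDS segments: coding sequence starting at 299, split by one intron.
my @cds = ( [ 299, 450 ], [ 520, 900 ] );

# Hypothetical binding-site positions to classify.
for my $site ( 235, 310, 480 ) {
    printf "%d\t%s\n", $site, in_any_segment( $site, @cds ) ? 'in CDS' : 'outside CDS';
}
```

Keeping the CDS as a list of segments rather than a single start/end pair means a site that falls in an intron between two coding exons is reported as outside the CDS.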
> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Mon Apr 22 15:43:29 2013 From: scott at scottcain.net (Scott Cain) Date: Mon, 22 Apr 2013 15:43:29 -0400 Subject: [Bioperl-l] Fwd: [Genome Informatics] Student Proposals Now Being Accepted for Genome Infromatics - Google Summer of Code 2013 In-Reply-To: <2d2cdbed-55b6-461a-a378-d24ffadda6b8@googlegroups.com> References: <2d2cdbed-55b6-461a-a378-d24ffadda6b8@googlegroups.com> Message-ID: ---------- Forwarded message ---------- From: rhaw Date: Mon, Apr 22, 2013 at 3:38 PM Subject: [Genome Informatics] Student Proposals Now Being Accepted for Genome Infromatics - Google Summer of Code 2013 To: genome-informatics at googlegroups.com Hi there, We are now accepting applications from students to participate in Google Summer of Code 2013. Please check out the FAQs [1], timeline [2], and student manual [3] if you are unfamiliar with the process. You can also read the Melange manual if you need help with Melange [4]. The deadline to apply is May 3 at 19:00 UTC [5]. Late proposals will not be accepted for any reason. Regards, Robin The Genome Informatics GSoC homepage is at http://gmod.org/wiki/GSoC [1] - http://www.google-melange.com/**gsoc/document/show/gsoc_** program/google/gsoc2013/help_**page# [2] - http://www.google-melange.com/**gsoc/events/google/gsoc2013 [3] - http://en.flossmanuals.net/**GSoCstudentguide/ [4] - http://en.flossmanuals.net/**melange/students-students-** application-phase/ [5] - http://goo.gl/ZSYyp -- You received this message because you are subscribed to the Google Groups "Genome Informatics-Google Summer of Code Group" group. To unsubscribe from this group and stop receiving emails from it, send an email to genome-informatics+unsubscribe at googlegroups.com. To post to this group, send email to genome-informatics at googlegroups.com. 
Visit this group at http://groups.google.com/group/genome-informatics?hl=en-US. For more options, visit https://groups.google.com/groups/opt_out. -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From voldrani at gmail.com Mon Apr 22 17:00:50 2013 From: voldrani at gmail.com (Chris Maloney) Date: Mon, 22 Apr 2013 17:00:50 -0400 Subject: [Bioperl-l] YAPC-NA 2013 and BioPerl talk Message-ID: I just noticed this event, and I see that genehack posted about it to the list back in February. Nevertheless, I thought it was worth an update / reminder: http://www.yapcna.org/yn2013/talk/4687. > Jay and John will spend 50 minutes ping-ponging back and forth defending their positions: Bioperl sucks and/or is the pinnacle of human scientific achievement. You decide. Is anyone planning on going? I'm thinking about it, but it's a long shot. Chris Maloney From cjfields at illinois.edu Mon Apr 22 17:14:49 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 22 Apr 2013 21:14:49 +0000 Subject: [Bioperl-l] YAPC-NA 2013 and BioPerl talk In-Reply-To: References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74E03594@CHIMBX5.ad.uillinois.edu> On Apr 22, 2013, at 4:00 PM, Chris Maloney wrote: > I just noticed this event, and I see that genehack posted about it to > the list back in February. Nevertheless, I thought it was worth an > update / reminder: http://www.yapcna.org/yn2013/talk/4687. > >> Jay and John will spend 50 minutes ping-ponging back and forth defending their positions: Bioperl sucks and/or is the pinnacle of human scientific achievement. You decide. > > Is anyone planning on going? I'm thinking about it, but it's a long shot. > > Chris Maloney It would be worth going to see; I can't do it (I'll be prepping for a class around that time). I'm interested to see what the perl community at large thinks. 
chris (f) From genehack at genehack.org Mon Apr 22 17:16:05 2013 From: genehack at genehack.org (John SJ Anderson) Date: Mon, 22 Apr 2013 14:16:05 -0700 Subject: [Bioperl-l] YAPC-NA 2013 and BioPerl talk In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74E03594@CHIMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF74E03594@CHIMBX5.ad.uillinois.edu> Message-ID: On Mon, Apr 22, 2013 at 2:14 PM, Fields, Christopher J < cjfields at illinois.edu> wrote: > > It would be worth going to see; I can't do it (I'll be prepping for a > class around that time). I'm interested to see what the perl community at > large thinks. > ... no pressure. 8^) From mccormack at molbio.mgh.harvard.edu Mon Apr 22 19:11:05 2013 From: mccormack at molbio.mgh.harvard.edu (Matthew McCormack) Date: Mon, 22 Apr 2013 19:11:05 -0400 Subject: [Bioperl-l] get CDS start site for entry in NCBI In-Reply-To: <516F2B66.7060001@molbio.mgh.harvard.edu> References: <516F2B66.7060001@molbio.mgh.harvard.edu> Message-ID: <5175C389.4040906@molbio.mgh.harvard.edu> Ke, Chris and Christopher, Exploring the ENSEMBL perl API, BioPerl with NCBI Eutilities, and Gramene's Biomart, I have learned much and am sure that I can find a solution among them. Thank you very much for your help and suggestions. Matthew On 4/17/2013 7:08 PM, Matthew McCormack wrote: > I am not much of a Perl coder and I have a few questions. > > First, I would like to write a script that will go to NCBI > genebank and get the base number for the start of the CDS region, e.g. > 235 (given a particular accession number). I have looked at HOWTO's > and documentation for Bio::SeqIO and Bio::DB::GenBank and I can cut > and paste the examples and they work, but I can not figure out how to > get what I want; the CDS start site. I have difficulty knowing what > all the methods and their options are for the seqio object and > seq_object. Most of the examples seem to be using a file to get > information and not a website. 
> > Actually, what I have to start with is a TAIR locus number such as > AT4g08500, but I can not search on this at NCBI and come up with a > unique entry. I may have to have a table of conversions from TAIR > locus number to accession numbers. > > Also, I was looking for a bit of advice. What I am doing is getting > data off another web site. I have a script using the WWW::Mechanize > module in which I can input a link and go to that webpage, and then go > down a line of links (over 100) getting information from each link. As > part of that information that I am getting is the number base of a > binding site, but I want to know if that binding site is in the CDS. > The start number is the start of the gene, so say if the binding site > is 235, then I want to know if this is in the CDS. This data is not > provided by the website, that is why I want to go to NCBI and get the > start of the CDS. The data at NCBI for 'gene' has the same length as > the first webpage, but also contains the beginning of the CDS, say > 299, so with this information I can tell if the binding site is in the > CDS. Do you think the best way to do this is extract the info from the > link on the first web page, then go to NCBI and extract the CDS, then > back to the original web page and the next link, and so on, for a > couple of hundred links ? Or is there a better way ? I am concerned > about a script that will keep going back to NCBI. > > Matthew > > > > The information in this e-mail is intended only for the person to whom > it is > addressed. If you believe this e-mail was sent to you in error and the > e-mail > contains patient information, please contact the Partners Compliance > HelpLine at > http://www.partners.org/complianceline . If the e-mail was sent to you > in error > but does not contain patient information, please contact the sender > and properly > dispose of the e-mail. 
> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sharmashalu.bio at gmail.com Tue Apr 23 13:14:04 2013 From: sharmashalu.bio at gmail.com (shalu sharma) Date: Tue, 23 Apr 2013 13:14:04 -0400 Subject: [Bioperl-l] Taxa Id from blast report In-Reply-To: References:

<118F034CF4C3EF48A96F86CE585B94BF74DE75B3@CHIMBX5.ad.uillinois.edu> <18DF7D20DFEC044098A1062202F5FFF37365A0E601@exchsth.agresearch.co.nz>

Message-ID:

Hi Jason,
Thanks a lot for your suggestion. I tried that too, but I am still not
getting the super kingdom; actually, I don't know how to put the super
kingdom in the database.
For example, this is how I formatted my refseq microbial database:

makeblastdb -dbtype prot -in microbial_protein_mask.fasta -out refMicro -taxid_map GItaxa.txt -parse_seqids

(where GItaxa.txt is the file); there is no super kingdom.
So when I run this blast command:

blastx -query test.fas -db refMicro -max_target_seqs 1 -outfmt "6 staxids sscinames sskingdoms"
246200	N/A	N/A
246200	N/A	N/A

I would really appreciate your help.

Thanks
Shalu

On Fri, Apr 19, 2013 at 3:38 PM, Jason Stajich wrote:

> Did you provide -parse_seqids in the header?
>
> Peter dealt with related things here:
>
> http://blastedbio.blogspot.com/2012/10/my-ids-not-good-enough-for-ncbi-blast.html
>
> Jason
>
> On Apr 19, 2013, at 1:05 PM, shalu sharma wrote:
>
> Hi,
> Thanks everyone for you inputs.
> @Peter:
> I got really excited when i saw that you can even get super kingdom, but
> when i tried to test it i just got taxa ids but not the super kingdom. Do
> you have any idea whats going wrong?
> my command:
> blastx -query test.fas -db /db/ncbiblast/refseq/latest/refseq_protein
> -max_target_seqs 1 -outfmt "6 staxids sskingdoms"
>
> output:
> 246200 N/A
> 246200 N/A
>
> Thanks
> Shalu
>
>
> On Thu, Apr 18, 2013 at 3:52 PM, Smithies, Russell <
> Russell.Smithies at agresearch.co.nz> wrote:
>
> I agree they have finally listened and added features requested by users
> but I've been suggesting they have a compressed output format available
> from eutils or genbank for years but have made no headway ;- (
> What's so hard about gzip'ping the output? I'm sure it would go a long way
> toward solving all the problems we get with truncated replies from
> queries!!
>
> --Russell
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:
> bioperl-l-bounces at lists.open-bio.org] On Behalf Of Fields, Christopher J
> Sent: Friday, 19 April 2013 6:26 a.m.
> To: Peter Cock
> Cc: bioperl-l at lists.open-bio.org; shalu sharma; shalabh sharma
> Subject: Re: [Bioperl-l] Taxa Id from blast report
>
> On Apr 18, 2013, at 11:48 AM, Peter Cock wrote:
>
> On Thu, Apr 18, 2013 at 5:32 PM, shalabh sharma wrote:
>
> Hey Peter,
> Thanks a lot, I really appreciate it. I wanted these things
> implemented in blast from long time.
>
> Thanks
> Shalabh
>
>
> Me too. You can get the descriptions from the plain text BLAST or XML
> output already of course, but they're not so nice to work with.
>
> Peter
>
>
> NCBI has been much more receptive of user input over the last several
> years, much more so than in the past. I understand the reasoning for
> dropping BLAST support (though there were definitely needless bumps in that
> process).
>
> chris
>
>
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>
>
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
>
>

From brianfoleynm at gmail.com Sat Apr 20 18:36:46 2013
From: brianfoleynm at gmail.com (Brian Foley)
Date: Sat, 20 Apr 2013 15:36:46 -0700 (PDT)
Subject: [Bioperl-l] Annotation-assisted (and/or BLAST assisted) multiple sequence alignment tool?
In-Reply-To: <1a8f3740-bb6f-4817-b312-7776fdb6f47d@googlegroups.com>
References: <1a8f3740-bb6f-4817-b312-7776fdb6f47d@googlegroups.com>
Message-ID: <4a890997-a1cb-4ad8-8b17-08a12b552c1d@googlegroups.com>

From another group, I got an answer. The MAUVE genome alignment software
is supposed to solve this problem.

http://gel.ahabs.wisc.edu/mauve/

I have not yet downloaded it and tested it, but it does seem to be the
right tool for the job.

From stephane.teletchea at inserm.fr Tue Apr 23 12:10:23 2013
From: stephane.teletchea at inserm.fr (Téletchéa Stéphane)
Date: Tue, 23 Apr 2013 18:10:23 +0200
Subject: [Bioperl-l] Problem using column_from_residue_number
Message-ID: <5176B26F.2010607@inserm.fr>

Dear bioperlers,

I am facing a problem I used to encounter in the past and failed to report
it properly.

I am either mis-using it or there is a bug, I would like your feedback.

Consider the alignment between sequence A and B in test.ali,
and the bioperl_test.pl as examples (attached to this mail).

I would like to know at position i what is the amino acid of seq A,
and what is the amino acid in position i in sequence B (of course these
residues may not be aligned).

The problem is:

a) I cannot parse the first residue of the alignment, it has to start at
1, otherwise I have:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Second argument residue number missing
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:472
STACK: Bio::SimpleAlign::column_from_residue_number /usr/share/perl5/Bio/SimpleAlign.pm:2626
STACK: ./bioperl_test.pl:19
-----------------------------------------------------------

b) Although, if I understand correctly, this function should return only
the residues, sometimes I have a gap inserted, so the function is not
working properly:

./bioperl_test.pl |grep '-'
P:11 for seq 1 and -:30 for seq 2
S:76 for seq 1 and -:96 for seq 2
-:124 for seq 1 and L:146 for seq 2
-:150 for seq 1 and K:170 for seq 2
-:175 for seq 1 and I:190 for seq 2
-:202 for seq 1 and L:215 for seq 2
-:224 for seq 1 and G:233 for seq 2
W:264 for seq 1 and -:263 for seq 2
S:296 for seq 1 and -:297 for seq 2
-:343 for seq 1 and S:345 for seq 2
-:369 for seq 1 and H:361 for seq 2

Can we discuss it here or I have to open a bug report?

Thanks in advance,

Stéphane

-- 
Equipe DSIMB - Dynamique des Structures et
des Interactions des Macromolécules Biologiques
INTS, INSERM-Paris-Diderot UMR-S665
6 rue Alexandre Cabanel - 75739 Paris cedex 15- France
Tél : +33 144 493 057
Fax : +33 147 347 431
http://www.dsimb.inserm.fr / http://steletch.free.fr

-------------- next part --------------
CLUSTAL 2.1 multiple sequence alignment


sp|Q16570|DUFFY_HUMAN      MGNCLHRAELSPSTENSSQLDFEDVWNSSYGVNDSFPDGDYGANLEAAAP
sp|P61073|CXCR4_HUMAN      -------------------MEGISIYTSDN-YTEEMGSGDYDSMKEPCFR
                                              ::  .::.*.   .:.: .***.:  *..

sp|Q16570|DUFFY_HUMAN      CHSCNLLDDSALPFFILTSVLGILASSTVLFMLFRPLFRWQLCPGWPVLA
sp|P61073|CXCR4_HUMAN      EENANFNKIFLPTIYSIIFLTGIVGNGLVILVMGYQKKLRSMTDKY--RL
                            ...*: .    .:: :  : **:... *::::       .:   :

sp|Q16570|DUFFY_HUMAN      QLAVGSALFSIVVPVLAPGLGSTR--SSALCSLGYCVWYGSAFAQALLLG
sp|P61073|CXCR4_HUMAN      HLSVADLLFVITLPFWAVDAVANWYFGNFLCKAVHVIYTVNLYSSVLILA
                           :*:*.. ** *.:*. * .  :.   .. **.  : ::  . ::..*:*.

sp|Q16570|DUFFY_HUMAN      -----CHASLGHRLGAGQVPGLTLG--LTVGIWGVAALLTLPVTLASGAS
sp|P61073|CXCR4_HUMAN      FISLDRYLAIVHATNSQRPRKLLAEKVVYVGVWIPALLLTIPDFIFANVS
                                 : :: *  .: :   *     : **:*  * ***:*  : :..*

sp|Q16570|DUFFY_HUMAN      GG----LCTLIYSTELKALQATHT----------VACLAIFVLLPLGLFG
sp|P61073|CXCR4_HUMAN      EADDRYICDRFYPNDLWVVVFQFQHIMVGLILPGIVILSCYCIIISKLSH
                            .    :*  :*..:* .:   .           :. *: : ::   *

sp|Q16570|DUFFY_HUMAN      AKGLKKALGMGPGPWMNILWAWFIFWWPHGVVLGLDFLVRSKLLLLSTCL
sp|P61073|CXCR4_HUMAN      SKGHQKRKALKTT--VILILAFFACWLPYYIGISIDSFILLEIIKQG-CE
                           :** :*  .: .   : :: *:*  * *: : :.:* ::  :::  . *

sp|Q16570|DUFFY_HUMAN      AQQALDLLLNLAEALAILHCVATPLLLALFCHQATRTLLPSLP-------
sp|P61073|CXCR4_HUMAN      FENTVHKWISITEALAFFHCCLNPILYAFLGAKFKTSAQHALTSVSRGSS
                            ::::.  :.::****::**  .*:* *::  : . :   :*.

sp|Q16570|DUFFY_HUMAN      ---LPEGWSSHLDTLGSKS--------
sp|P61073|CXCR4_HUMAN      LKILSKGKRGGHSSVSTESESSSFHSS
                              *.:*  .  .::.::*

-------------- next part --------------
A non-text attachment was scrubbed...
Name: bioperl_test.pl
Type: application/x-perl
Size: 803 bytes
Desc: not available
URL:

From roy.chaudhuri at gmail.com Tue Apr 23 13:48:40 2013
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 23 Apr 2013 18:48:40 +0100
Subject: [Bioperl-l] Problem using column_from_residue_number
In-Reply-To: <5176B26F.2010607@inserm.fr>
References: <5176B26F.2010607@inserm.fr>
Message-ID:

Hi Stéphane,

I think you are confusing Bioperl's biological coordinates (which start at
1) with Perl coordinates (which start at 0). Just realising this
distinction should answer your first problem - the first column of the
alignment is column 1, so of course you get an error (admittedly a not very
helpful one) when you specify column 0.
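
As a rough illustration of the coordinate distinction described above, here
is a plain-Perl sketch (no BioPerl required). The helper column_from_residue
and the toy alignment strings are hypothetical stand-ins, not the code from
bioperl_test.pl and not how Bio::SimpleAlign implements
column_from_residue_number; they only show why a 1-based column has to be
shifted by one before it is handed to Perl's 0-based substr.

```perl
use strict;
use warnings;

# Hypothetical helper: map a 1-based residue number to the 1-based
# alignment column that holds it, skipping gap characters ('-').
sub column_from_residue {
    my ($aligned, $resnum) = @_;
    die "residue number must be >= 1\n"
        if !defined $resnum || $resnum < 1;
    my $seen = 0;
    for my $col (1 .. length $aligned) {
        # substr() is 0-based, so column $col lives at offset $col - 1
        next if substr($aligned, $col - 1, 1) eq '-';
        return $col if ++$seen == $resnum;
    }
    die "residue $resnum is beyond the end of the sequence\n";
}

# Toy alignment rows (made-up data, not the DUFFY/CXCR4 alignment)
my $seq_a = 'MG-NCLH';
my $seq_b = 'MGANC-H';

for my $res (1 .. 5) {
    my $col   = column_from_residue($seq_a, $res);
    my $res_a = substr($seq_a, $col - 1, 1);  # shifted to 0-based: correct
    my $res_b = substr($seq_b, $col - 1, 1);  # same column in the other row
    my $wrong = substr($seq_a, $col, 1);      # unshifted: reads the next column
    print "residue $res -> column $col: A=$res_a B=$res_b ",
          "(unshifted substr gives '$wrong')\n";
}
```

For residue 2 of the first toy row, the shifted call returns G while the
unshifted one returns the gap in column 3, which is the same kind of
spurious '-' that shows up in the grep output.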

The same confusion causes a subtle bug in your example code for problem b -
you use $col1 and $col2 for your substr coordinates, but they are
biological coordinates returned by Bioperl, so you need to subtract one (or
you could use the Bio::Seq method subseq rather than Perl's substr).
Correcting this bug means that you do not get any gaps reported in the
output.

Cheers,
Roy.

On 23 April 2013 17:10, Téletchéa Stéphane wrote:

> Dear bioperlers,
>
> I am facing a problem I used to encounter in the past and failed to report
> it properly.
>
> I am either mis-using it or there is a bug, I would like your feedback.
>
> Consider the alignment between sequence A and B in test.ali,
> and the bioperl_test.pl as examples (attached to this mail).
>
> I would like to know at position i what is the amino acid of seq A,
> and what is the amino acid in position i in sequence B (of course these
> residues may not be aligned).
>
> The problem is:
>
> a) I cannot parse the first residue of the alignment, it has to start at
> 1, otherwise I have:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Second argument residue number missing
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:472
> STACK: Bio::SimpleAlign::column_from_residue_number
> /usr/share/perl5/Bio/SimpleAlign.pm:2626
> STACK: ./bioperl_test.pl:19
> -----------------------------------------------------------
>
> b) Although, if I understand correctly, this function should return only
> the residues,
> sometimes I have a gap inserted, so the function is not working properly:
>
> ./bioperl_test.pl |grep '-'
> P:11 for seq 1 and -:30 for seq 2
> S:76 for seq 1 and -:96 for seq 2
> -:124 for seq 1 and L:146 for seq 2
> -:150 for seq 1 and K:170 for seq 2
> -:175 for seq 1 and I:190 for seq 2
> -:202 for seq 1 and L:215 for seq 2
> -:224 for seq 1 and G:233 for seq 2
> W:264 for seq 1 and -:263 for seq 2
> S:296 for seq 1 and -:297 for seq 2
> -:343 for seq 1 and S:345 for seq 2
> -:369 for seq 1 and H:361 for seq 2
>
> Can we discuss it here or I have to open a bug report?
>
> Thanks in advance,
>
> Stéphane
>
> --
> Equipe DSIMB - Dynamique des Structures et
> des Interactions des Macromolécules Biologiques
> INTS, INSERM-Paris-Diderot UMR-S665
> 6 rue Alexandre Cabanel - 75739 Paris cedex 15- France
> Tél : +33 144 493 057
> Fax : +33 147 347 431
> http://www.dsimb.inserm.fr / http://steletch.free.fr
>

From stephane.teletchea at inserm.fr Tue Apr 23 14:44:47 2013
From: stephane.teletchea at inserm.fr (Téletchéa Stéphane)
Date: Tue, 23 Apr 2013 20:44:47 +0200
Subject: [Bioperl-l] Problem using column_from_residue_number
In-Reply-To:
References: <5176B26F.2010607@inserm.fr>
Message-ID: <5176D69F.2040602@inserm.fr>

On 23/04/2013 19:48, Roy Chaudhuri wrote:
> Hi Stéphane,
>
> I think you are confusing Bioperl's biological coordinates (which
> start at 1) with Perl coordinates (which start at 0). Just realising
> this distinction should answer your first problem - the first column
> of the alignment is column 1, so of course you get an error
> (admittedly a not very helpful one) when you specify column 0.

Hi Roy,

Yes, I am aware of this offset, which is logical from a human point of
view, and so from bioperl's perspective, so this is correct.

On the other hand ...

>
> The same confusion causes a subtle bug in your example code for
> problem b - you use $col1 and $col2 for your substr coordinates, but
> they are biological coordinates returned by Bioperl, so you need to
> subtract one (or you could use the Bio::Seq method subseq rather than
> Perl's substr). Correcting this bug means that you do not get any gaps
> reported in the output.

I simply did not catch this issue, and it never crossed my mind that I was
undoing the perl->bioperl conversion here!
I'm so used to using plain perl that I did not even think of using
subseq...

Thanks a lot for the rapid answer, (brain) bug solved :-)

>
> Cheers,
> Roy.
>

Cheers,
Stéphane

-- 
Equipe DSIMB - Dynamique des Structures et
des Interactions des Macromolécules Biologiques
INTS, INSERM-Paris-Diderot UMR-S665
6 rue Alexandre Cabanel - 75739 Paris cedex 15- France
Tél : +33 144 493 057
Fax : +33 147 347 431
http://www.dsimb.inserm.fr / http://steletch.free.fr

From Russell.Smithies at agresearch.co.nz Tue Apr 23 18:32:25 2013
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 24 Apr 2013 10:32:25 +1200
Subject: [Bioperl-l] Taxa Id from blast report
In-Reply-To:
References:

<118F034CF4C3EF48A96F86CE585B94BF74DE75B3@CHIMBX5.ad.uillinois.edu> <18DF7D20DFEC044098A1062202F5FFF37365A0E601@exchsth.agresearch.co.nz>

Message-ID: <18DF7D20DFEC044098A1062202F5FFF37365A0F326@exchsth.agresearch.co.nz>

It works OK if I do it with NCBI's pre-formatted databases, eg.

illustrious$ blastx -query gold_small.fa -db /bifo/infernal/active/blastdata/mirror/nr -max_target_seqs 1 -outfmt "6 staxids sscinames sskingdoms"
411903	Collinsella aerofaciens ATCC 25986	Bacteria
411903	Collinsella aerofaciens ATCC 25986	Bacteria
39947	Oryza sativa Japonica Group	Eukaryota
39947	Oryza sativa Japonica Group	Eukaryota
39947	Oryza sativa Japonica Group	Eukaryota
498761	Heliobacterium modesticaldum Ice1	Bacteria
391296	Streptococcus suis 98HAH33	Bacteria
391296	Streptococcus suis 98HAH33	Bacteria

Perhaps it's something to do with your database formatting or sequence IDs?

--Russell

From: shalu sharma [mailto:sharmashalu.bio at gmail.com]
Sent: Wednesday, 24 April 2013 5:14 a.m.
To: Jason Stajich
Cc: Smithies, Russell; Fields, Christopher J; Peter Cock; shalabh sharma; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Taxa Id from blast report

Hi Jason,
Thanks a lot for you suggestion. I tried that too but i am still not
getting super kingdom, actually i don't know how to put super kingdom in
the database.
For example:
This is how i formatted my refseq microbial database:
makeblastdb -dbtype prot -in microbial_protein_mask.fasta -out refMicro -taxid_map GItaxa.txt -parse_seqids
( where GItaxa is the file ), there is no super kingdom.
So when i run this blast command:
blastx -query test.fas -db refMicro -max_target_seqs 1 -outfmt "6 staxids sscinames sskingdoms"
246200	N/A	N/A
246200	N/A	N/A

I would really appreciate you help.

Thanks
Shalu

From francesco.brundu at gmail.com Mon Apr 29 12:36:42 2013
From: francesco.brundu at gmail.com (francesco.brundu at gmail.com)
Date: Mon, 29 Apr 2013 18:36:42 +0200
Subject: [Bioperl-l] Information on warning message using bp_genbank2gff.pl
Message-ID:

Hi all,
I am having some trouble running bioperl script bp_genbank2gff.pl.
The script works fine, but I don't understand a warning message when
retrieving a gff (in this case gff3).
That message is:

'skipping a misc_feature'

I have searched a lot on google but I didn't find an answer.
I tried to look up the code..
http://doc.bioperl.org/bioperl-live/Bio/DB/GFF/Adaptor/biofetch.html
but since I am not fluent in Perl I just saw that in case feature is a misc
feature, warning is raised.
Now, maybe I misunderstood, but the misc feature is one from column 9 tag
as described in http://gmod.org/wiki/GFF ?
If yes, shouldn't its name be printed out when it is being skipped, to let
the user know which feature has been removed from gff?
Thanks

-- 
~Francesco Brundu

From shalabh.sharma7 at gmail.com Mon Apr 29 16:25:58 2013
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 29 Apr 2013 16:25:58 -0400
Subject: [Bioperl-l] Downloading sequences in batch from Trace Archive
Message-ID:

Hi All,
Is there any module in Bioperl that can download sequences from
NCBI's trace archive?

Thanks
Shalabh

-- 
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

From witch.of.agnessi at gmail.com Mon Apr 29 17:55:01 2013
From: witch.of.agnessi at gmail.com (WoA)
Date: Mon, 29 Apr 2013 14:55:01 -0700 (PDT)
Subject: [Bioperl-l] Missing Method: "get_tiled_alns"
Message-ID: <35347929.post@talk.nabble.com>

Hi All,

I'm trying to get alignment from tiled information generated out of a Blastx
report, (for two sequences made by 'blas2seq').
I'm using ActivePerl 15.6.2 and Bioperl 1.6 in Windows 7

However I'm getting the following error:

Can't locate object method "get_tiled_alns" via package
"Bio::Search::Tiling::MapTiling" at tile_align_blastx.pm line 11, line 72.

The code is as follows:

use strict;
use warnings;
use Bio::SearchIO;
use Bio::Search::Tiling::MapTiling;

my $blio = Bio::SearchIO->new( -file => 'test.blastx2seq.out', -format => 'blast');
my $result = $blio->next_result;
my $hit = $result->next_hit;
my $tiling = Bio::Search::Tiling::MapTiling->new($hit);
my @alns = $tiling->get_tiled_alns('abc123');

my $concat_seq_obj = $alns[0]->get_seq_by_id('abc123');

-- 
View this message in context: http://old.nabble.com/Missing-Method%3A-%22get_tiled_alns%22-tp35347929p35347929.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From maj at fortinbras.us Tue Apr 30 12:43:58 2013
From: maj at fortinbras.us (Mark A.
Jensen)
Date: Tue, 30 Apr 2013 16:43:58 +0000
Subject: [Bioperl-l] Missing Method:"get tiled alnnns"
Message-ID:

That was probably a beta method (though it has tests). Not being around
much, I couldn't tell you details, but you can find the code (you would
need all the modules here) at
https://github.com/bioperl/bioperl-live/tree/8164a826a4366ad3ec56c33cadfd9f0639262217/Bio/Search/Tiling

MAJ

>-----Original Message-----
>From: WoA [mailto:witch.of.agnessi at gmail.com]
>Sent: Monday, April 29, 2013 05:56 PM
>To: Bioperl-l at lists.open-bio.org
>Subject: [Bioperl-l] Missing Method: "get_tiled_alns"
>
>
>I'm trying to get alignment from tiled information generated out of a Blastx
>report, (for two sequences made by 'blas2seq'). I'm using ActivePerl 15.6.2
>and Bioperl 1.6 in Windows 7
>
>However I'm getting the following error:
>
>Can't locate object method "get_tiled_alns" via package
>"Bio::Search::Tiling::MapTiling" at tile_align_blastx.pm line 11, line 72.
>
>
>The code is as follows:
>
>use strict;
>use warnings;
>use Bio::SearchIO;
>use Bio::Search::Tiling::MapTiling;
>
>my $blio = Bio::SearchIO->new( -file => 'test.blastx2seq.out', -format =>'blast');
>my $result = $blio->next_result;
>my $hit = $result->next_hit;
>my $tiling = Bio::Search::Tiling::MapTiling->new($hit);
>my @alns = $tiling->get_tiled_alns('abc123');
>
>my $concat_seq_obj = $alns[0]->get_seq_by_id('abc123');
>
>
>--
>View this message in context: http://old.nabble.com/Missing-Method%3A-%22get_tiled_alns%22-tp35347933p35347933.html
>Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>

From john.chen.bioinfo at gmail.com Sun Apr 28 21:00:12 2013
From: john.chen.bioinfo at gmail.com (John Chen)
Date: Mon, 29 Apr 2013 01:00:12 -0000
Subject: [Bioperl-l] Bioperl-live has (near) empty Bio::Root::Root?
In-Reply-To:
References:
Message-ID:

Hi Chris, Leon, Warren and Tristan,

Got it. Thank you so much.

best,

John

On Saturday, April 13, 2013 1:37:36 PM UTC-4, John Chen wrote:
>
> Hi,
>
> I have perlbrew with perl-5.8.9 installed. And I tried to get the
> bioperl-live via Git following the instruction on bioperl official site:
> http://www.bioperl.org/wiki/Using_Git
>
> However, it seems the testing bioperl version encountered error (below)
> immediately.
> Manually checking the cloned bioperl-live revealed there is only one
> module HTTPget.pm under Bio/Root (see below),
> I also check the online Github bioperl-live repository, the Bio/Root
> folder indeed only contain only one modeul: HTTPget.pm
>
> is this normal or I need to use some alternative approach to get the
> bioperl (the CPAN package is rather old though)?
>
> Thanks
>
> John
>
> % perl -MBio::Perl -le 'print Bio::Perl->VERSION;'
> ------------------------------------------------------------------------------------------------------------
>
> Base class package "Bio::Root::Root" is empty.
> (Perhaps you need to 'use' the module which defines that package first,
> or make that module available in @INC (@INC contains:
> $HOME/src/bioperl-live/
> $HOME/perlbrew//perls/perl-5.8.9/lib/5.8.9/x86_64-linux
> $HOME/perlbrew//perls/perl-5.8.9/lib/5.8.9
> $HOME/perlbrew//perls/perl-5.8.9/lib/site_perl/5.8.9/x86_64-linux
> $HOME/perlbrew//perls/perl-5.8.9/lib/site_perl/5.8.9 .).
> at $HOME/src/bioperl-live//Bio/Location/WidestCoordPolicy.pm line 80
> BEGIN failed--compilation aborted at $HOME/src/bioperl-live//Bio/Location/WidestCoordPolicy.pm line 80.
> Compilation failed in require at $HOME/src/bioperl-live//Bio/Location/Atomic.pm line 79.
> BEGIN failed--compilation aborted at $HOME/src/bioperl-live//Bio/Location/Atomic.pm line 79.
> Compilation failed in require at (eval 1) line 3.
> ...propagated at $HOME/perlbrew//perls/perl-5.8.9/lib/5.8.9/base.pm line 93.
> BEGIN failed--compilation aborted at > $HOME/src/bioperl-live//Bio/Location/Simple.pm line 87. > Compilation failed in require at > $HOME/src/bioperl-live//Bio/Factory/FTLocationFactory.pm line 97. > BEGIN failed--compilation aborted at > $HOME/src/bioperl-live//Bio/Factory/FTLocationFactory.pm line 97. > Compilation failed in require at $HOME/src/bioperl-live//Bio/SeqIO.pm line > 328. > BEGIN failed--compilation aborted at $HOME/src/bioperl-live//Bio/SeqIO.pm > line 328. > Compilation failed in require at $HOME/src/bioperl-live//Bio/Perl.pm line > 120. > BEGIN failed--compilation aborted at $HOME/src/bioperl-live//Bio/Perl.pm > line 120. > Compilation failed in require. > BEGIN failed--compilation aborted. > ------------------------------------------------------------------------------------------------------------ > > %ls $HOME/src/bioperl-live/Bio/Root/ > HTTPget.pm > ------------------------------------------------------------------------------------------------------------ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From tianwenlan at gmail.com Mon Apr 29 14:17:10 2013 From: tianwenlan at gmail.com (Wenlan Tian) Date: Mon, 29 Apr 2013 18:17:10 -0000 Subject: [Bioperl-l] blastxml Message-ID: <87a06932-b199-4bc5-abc7-a55d9186bf70@googlegroups.com> Hi,all, I had done a blastall search and got a xml output file. How could i parse it to other format? I've found many scripts online, but no one worked. I am new to linux. Can anyone give me a good script to use? Thanks, Vivi From p.j.a.cock at googlemail.com Mon Apr 1 10:28:43 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 1 Apr 2013 15:28:43 +0100 Subject: [Bioperl-l] Google Summer of Code - BioPerl / BioSQL proposal? 
Message-ID: On 18 March 2013 21:26, Christopher Fields wrote: > Just a heads-up, if there are any students interested in the Google Summer of > Code, the Open Bioinformatics Foundation is planning on participating again > this year! Pjotr Prins will be organizing for OBF; all the Bio* projects are > looking for prospective projects. > > We're open for any project ideas this year, so let us know what you would like > to do! I suggested this last year too, but improving support for BioSQL on SQLite would be great - the schema exists and seems to work fine, but is currently only handled by the Biopython BioSQL bindings. So, the core of a BioSQL/BioPerl GSoC project could tackle: * Adding SQLite support to the BioSQL scripts for loading taxonomies etc https://github.com/biosql/biosql * Adding SQLite support to BioPerl's BioSQL adapter, bioperl-db https://github.com/bioperl/bioperl-db There are a number of things that could be added to this basic idea to make the project more ambitious and to fill out a full summer. One is to extend this to doing BioSQL on SQLite bindings for BioRuby or BioJava (assuming suitable co-mentors are available). One of the nice things about SQLite compared to MySQL or PostgreSQL is the database is just one binary file on disk which is easily portable and can even be checked into source code control for unit tests. This means we can use it to make cross-binding testing far far easier. Thus another part of a GSoC project could be to use the SQlite bindings to establish cross-project testing of the BioSQL implementations for consistency. At that point I'd be interested from the BioSQL and Biopython side, and Biopython may have a few possible co-mentors here. Anyway, to be viable this project would need a Perl mentor with a good knowledge of BioSQL and BioPerl's bindings for it. Is this worth adding to the BioPerl GSoC as a possible idea? 
http://bioperl.org/wiki/Google_Summer_of_Code#Project_Ideas_for_2013

Regards,

Peter
(Who doesn't know enough Perl to qualify as a mentor for this)

From cjfields at illinois.edu Mon Apr 1 10:35:25 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 1 Apr 2013 14:35:25 +0000
Subject: [Bioperl-l] Google Summer of Code - BioPerl / BioSQL proposal?
In-Reply-To:
References:
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DC9169@CHIMBX5.ad.uillinois.edu>

On Apr 1, 2013, at 9:28 AM, Peter Cock wrote:

> On 18 March 2013 21:26, Christopher Fields wrote:
>> Just a heads-up, if there are any students interested in the Google Summer of
>> Code, the Open Bioinformatics Foundation is planning on participating again
>> this year! Pjotr Prins will be organizing for OBF; all the Bio* projects are
>> looking for prospective projects.
>>
>> We're open for any project ideas this year, so let us know what you would like
>> to do!
>
> I suggested this last year too, but improving support for BioSQL on
> SQLite would be great - the schema exists and seems to work fine,
> but is currently only handled by the Biopython BioSQL bindings.

++++1

> So, the core of a BioSQL/BioPerl GSoC project could tackle:
>
> * Adding SQLite support to the BioSQL scripts for loading taxonomies etc
> https://github.com/biosql/biosql
>
> * Adding SQLite support to BioPerl's BioSQL adapter, bioperl-db
> https://github.com/bioperl/bioperl-db
>
> There are a number of things that could be added to this basic idea
> to make the project more ambitious and to fill out a full summer. One
> is to extend this to doing BioSQL on SQLite bindings for BioRuby or
> BioJava (assuming suitable co-mentors are available).
>
> One of the nice things about SQLite compared to MySQL or PostgreSQL
> is the database is just one binary file on disk which is easily portable and
> can even be checked into source code control for unit tests. This means
> we can use it to make cross-binding testing far far easier.
> Thus another
> part of a GSoC project could be to use the SQLite bindings to establish
> cross-project testing of the BioSQL implementations for consistency.
> At that point I'd be interested from the BioSQL and Biopython side,
> and Biopython may have a few possible co-mentors here.
>
> Anyway, to be viable this project would need a Perl mentor with a
> good knowledge of BioSQL and BioPerl's bindings for it.
>
> Is this worth adding to the BioPerl GSoC as a possible idea?
> http://bioperl.org/wiki/Google_Summer_of_Code#Project_Ideas_for_2013

Yes. Will add this now.

> Regards,
>
> Peter
> (Who doesn't know enough Perl to qualify as a mentor for this)

There was a reasonable push for this a while back (BOSC in Boston) but nothing came of it code-wise that I have seen. Not sure where it stands now.

chris

From hlapp at drycafe.net Mon Apr 1 12:38:46 2013
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Mon, 1 Apr 2013 12:38:46 -0400
Subject: [Bioperl-l] Google Summer of Code - BioPerl / BioSQL proposal?
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74DC9169@CHIMBX5.ad.uillinois.edu>
References: <118F034CF4C3EF48A96F86CE585B94BF74DC9169@CHIMBX5.ad.uillinois.edu>
Message-ID: <6B9A2644-9075-486F-977C-86750D88B737@drycafe.net>

Just to add another possibility for a summer project here, Bioperl-db (the BioPerl bindings to BioSQL) in essence constitutes a self-made ORM, invented at a time when DBIx::Class didn't exist yet. As such, it has some advantages (if you are willing to count overly clever features in this category), but arguably many more disadvantages, chief among them being the unsustainably small (you could also say non-existent) developer community supporting it, and the fact that DBIx::Class has now existed for years and is fairly mature.
So, rewriting Bioperl-db with DBIx::Class (or another well-supported generic ORM) would, I think, stand to make a considerable impact on our ability to further develop Bioperl's relational storage capabilities, as well as BioSQL itself. And I'd be willing to help out with such a project in at least a co-mentoring capacity. (If primary mentor, I'd need a committed co-mentor to make it viable.)

-hilmar

On Apr 1, 2013, at 10:35 AM, Fields, Christopher J wrote:

> On Apr 1, 2013, at 9:28 AM, Peter Cock wrote:
>
>> On 18 March 2013 21:26, Christopher Fields wrote:
>>> Just a heads-up, if there are any students interested in the Google Summer of
>>> Code, the Open Bioinformatics Foundation is planning on participating again
>>> this year! Pjotr Prins will be organizing for OBF; all the Bio* projects are
>>> looking for prospective projects.
>>>
>>> We're open for any project ideas this year, so let us know what you would like
>>> to do!
>>
>> I suggested this last year too, but improving support for BioSQL on
>> SQLite would be great - the schema exists and seems to work fine,
>> but is currently only handled by the Biopython BioSQL bindings.
>
> ++++1
>
>> So, the core of a BioSQL/BioPerl GSoC project could tackle:
>>
>> * Adding SQLite support to the BioSQL scripts for loading taxonomies etc
>> https://github.com/biosql/biosql
>>
>> * Adding SQLite support to BioPerl's BioSQL adapter, bioperl-db
>> https://github.com/bioperl/bioperl-db
>>
>> There are a number of things that could be added to this basic idea
>> to make the project more ambitious and to fill out a full summer. One
>> is to extend this to doing BioSQL on SQLite bindings for BioRuby or
>> BioJava (assuming suitable co-mentors are available).
>>
>> One of the nice things about SQLite compared to MySQL or PostgreSQL
>> is the database is just one binary file on disk which is easily portable and
>> can even be checked into source code control for unit tests.
>> This means
>> we can use it to make cross-binding testing far far easier. Thus another
>> part of a GSoC project could be to use the SQLite bindings to establish
>> cross-project testing of the BioSQL implementations for consistency.
>> At that point I'd be interested from the BioSQL and Biopython side,
>> and Biopython may have a few possible co-mentors here.
>>
>> Anyway, to be viable this project would need a Perl mentor with a
>> good knowledge of BioSQL and BioPerl's bindings for it.
>>
>> Is this worth adding to the BioPerl GSoC as a possible idea?
>> http://bioperl.org/wiki/Google_Summer_of_Code#Project_Ideas_for_2013
>
> Yes. Will add this now.
>
>> Regards,
>>
>> Peter
>> (Who doesn't know enough Perl to qualify as a mentor for this)
>
> There was a reasonable push for this a while back (BOSC in Boston) but nothing came of it code-wise that I have seen. Not sure where it stands now.
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From hlapp at drycafe.net Mon Apr 1 12:49:17 2013
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Mon, 1 Apr 2013 12:49:17 -0400
Subject: [Bioperl-l] Google Summer of Code - BioPerl proposals
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74DC8BD8@CHIMBX5.ad.uillinois.edu>
References: <723EBB03-97B3-4750-876C-68F55256DCB6@drycafe.net>
 <118F034CF4C3EF48A96F86CE585B94BF74DC843A@CHIMBX5.ad.uillinois.edu>
 <118F034CF4C3EF48A96F86CE585B94BF74DC8BD8@CHIMBX5.ad.uillinois.edu>
Message-ID: <31EF3BDE-9C45-4088-A83B-39D97F78D650@drycafe.net>

On Mar 31, 2013, at 11:28 PM, Fields, Christopher J wrote:

> a case would have to be made as to why the initial run at it wasn't completed, or why it would work better this time.
> We're not suggesting that this can't be done, but the above point would have to be answered.

Perhaps just to clarify a bit, as an applying student, the readers you'd have in mind when making the case aren't the senior BioPerl developers or potential mentors (who'd know much of your argument already), but rather those from other OBF projects who will take part in prioritizing applications across the OBF board, but won't know much about the specific BioPerl situation.

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From carandraug+dev at gmail.com Mon Apr 1 13:17:49 2013
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Mon, 1 Apr 2013 18:17:49 +0100
Subject: [Bioperl-l] Google Summer of Code - BioPerl proposals
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74DC8BD8@CHIMBX5.ad.uillinois.edu>
References: <723EBB03-97B3-4750-876C-68F55256DCB6@drycafe.net>
 <118F034CF4C3EF48A96F86CE585B94BF74DC843A@CHIMBX5.ad.uillinois.edu>
 <118F034CF4C3EF48A96F86CE585B94BF74DC8BD8@CHIMBX5.ad.uillinois.edu>
Message-ID:

On 1 April 2013 04:28, Fields, Christopher J wrote:
> On Mar 31, 2013, at 9:05 PM, Carnë Draug wrote:
>
>> On 1 April 2013 01:34, Fields, Christopher J wrote:
>>> I agree. Another approach might be to cleave off a section that you could mould into your own; this could be done for bioperl-run, bioperl-live, etc.
>>
>> Why did the project run out of time 2 years ago? The blog posts about
>> it are very few and don't sound too bad. It mentions having prepared a
>> couple of them, but none was actually ever released. Instead, the
>> source was also kept in bioperl-live and seems to have already
>> branched. Is there any reason for this?
>> It was my understanding that
>> splitting the project is still desirable, from a discussion back in
>> February
>>
>> http://article.gmane.org/gmane.comp.lang.perl.bio.general/26395
>>
>> it just happens that no one has picked it up yet.
>
> The project actually made a lot of headway; the particular pieces moved out (Bio::Root, Bio::Factory, etc) worked fine, but we never followed up on exactly what to do next on master branch. It's perfectly feasible for someone to go ahead and finish the initial part of that (in fact, I believe there were some branches that started along this path but never merged back in).

Can I merge any branching between these and bioperl-live and set them
up so you only have to run dzil on their repos?

>> I think splitting bioperl-live into subdistributions and making a new
>> 1.70 release of each of them is perfectly doable over a summer. And I
>> say this after having split and released Bio-Biblio. This is one of my
>> itches with BioPerl. I have been using it for almost 3 years, but have
>> never seen a release. I would like to make new releases of everything,
>> no changes at the start, but take them to the point that "dzil
>> release" does everything. Make it really easy for anyone to come in
>> and contribute and even easier for a maintainer to make a new release
>> after receiving a contribution. Is this desirable for the project?
>>
>
> Hilmar's point is pretty valid, namely that a case would have to be made as to why the initial run at it wasn't completed, or why it would work better this time. We're not suggesting that this can't be done, but the above point would have to be answered.

The only reason why I claim to be able to finish this is that I'm very
familiar with both BioPerl and the tools to make the split. Plus,
I already split one (and am trying to split another) to get a clear idea
of what it involves.
> Frankly, the project has been pretty reliant on me for releases, so it's perfectly valid to point out the modules haven't made it out yet b/c I haven't made a release since then. From that point of view, this would be a continuation of that work, maybe with the intent/focus on making code releases much easier.

As a maintainer of another gigantic FOSS project that is also a
collection of libraries, I can relate to this. Of course it can be
much more interesting to write new sexy code and add it to the huge
pile of modules already in bioperl-live but I want to make it easier
for others to develop on BioPerl. Comparing with chemistry, I want
this to be the equivalent of a catalyst for the development, rather
than another reactant.

> Regarding updating Bioperl to use Dist::Zilla amongst other modern perl tools (Moose included), yes, it is very much our wish/intent to have this, in any way possible. But I don't think we can call it BioPerl v1.7, simply based on past release cycles; we're somewhat bound by deprecations, etc. We really need a clean break.
>
> So, my general feeling is that while we are cleaving out code and releasing the independent dist and core, we should re-christen core as 1.9 (e.g. pre-v2). We move to v2 when we feel we're at the right point. Each of the individual distributions would have to start with their own versions, anything greater than the point where they left the core/live distribution should work. I agree with you in that I don't think it would take a long time, but we also have bioperl-run in the mix (and in many cases it would make sense to combine wrappers with the proper parsers), so simply cleaving out from one repo may not be the best approach.
>
> With that in mind, my point was meant to indicate we can also start afresh with a section of the code that you would like to focus on, using some of the same ideas (pulling out the relevant modules you want to work on).
> This might be an attainable goal in the minds of GSoC reviewers and might suit your particular needs (for instance, if you had a research project reliant on such code). I'm supportive either way, and I don't think you'll have a problem finding a mentor if you need one.

I suggested 1.70 only because it introduces no changes. And it won't be
BioPerl 1.7. It would be Bio-Seq, Bio-Align, Bio-Popgen, etc v 1.70.
These smaller distributions can then stay as they are or evolve into
2.0 if their maintainers are so interested. I saw biome and liked it,
but is the plan to make a BioPerl 2.00 written in Moose? Won't that
path take us to the same place we are now in a couple of years? Won't
it be better to make the split now, and make the clean break on each
smaller distribution?

Would you be available to talk about this on #bioperl? I'm online
there most of the time.

Carnë

From cjfields at illinois.edu Mon Apr 1 17:45:16 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 1 Apr 2013 21:45:16 +0000
Subject: [Bioperl-l] Google Summer of Code - BioPerl proposals
In-Reply-To: <31EF3BDE-9C45-4088-A83B-39D97F78D650@drycafe.net>
References: <723EBB03-97B3-4750-876C-68F55256DCB6@drycafe.net>
 <118F034CF4C3EF48A96F86CE585B94BF74DC843A@CHIMBX5.ad.uillinois.edu>
 <118F034CF4C3EF48A96F86CE585B94BF74DC8BD8@CHIMBX5.ad.uillinois.edu>
 <31EF3BDE-9C45-4088-A83B-39D97F78D650@drycafe.net>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DCA76E@CHIMBX5.ad.uillinois.edu>

On Apr 1, 2013, at 11:49 AM, Hilmar Lapp wrote:

>
> On Mar 31, 2013, at 11:28 PM, Fields, Christopher J wrote:
>
>> a case would have to be made as to why the initial run at it wasn't completed, or why it would work better this time. We're not suggesting that this can't be done, but the above point would have to be answered.
>
> Perhaps just to clarify a bit, as an applying student, the readers you'd have in mind when making the case aren't the senior BioPerl developers or potential mentors (who'd know much of your argument already), but rather those from other OBF projects who will take part in prioritizing applications across the OBF board, but won't know much about the specific BioPerl situation.
>
> -hilmar

Yep, exactly.

-c

From cjfields at illinois.edu Mon Apr 1 18:23:45 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 1 Apr 2013 22:23:45 +0000
Subject: [Bioperl-l] Google Summer of Code - BioPerl proposals
In-Reply-To:
References: <723EBB03-97B3-4750-876C-68F55256DCB6@drycafe.net>
 <118F034CF4C3EF48A96F86CE585B94BF74DC843A@CHIMBX5.ad.uillinois.edu>
 <118F034CF4C3EF48A96F86CE585B94BF74DC8BD8@CHIMBX5.ad.uillinois.edu>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74DCA9D4@CHIMBX5.ad.uillinois.edu>

On Apr 1, 2013, at 12:17 PM, Carnë Draug wrote:

> On 1 April 2013 04:28, Fields, Christopher J wrote:
>> On Mar 31, 2013, at 9:05 PM, Carnë Draug wrote:
>>
>>> On 1 April 2013 01:34, Fields, Christopher J wrote:
>>>> I agree. Another approach might be to cleave off a section that you could mould into your own; this could be done for bioperl-run, bioperl-live, etc.
>>>
>>> Why did the project run out of time 2 years ago? The blog posts about
>>> it are very few and don't sound too bad. It mentions having prepared a
>>> couple of them, but none was actually ever released. Instead, the
>>> source was also kept in bioperl-live and seems to have already
>>> branched. Is there any reason for this? It was my understanding that
>>> splitting the project is still desirable, from a discussion back in
>>> February
>>>
>>> http://article.gmane.org/gmane.comp.lang.perl.bio.general/26395
>>>
>>> it just happens that no one has picked it up yet.
>>
>> The project actually made a lot of headway; the particular pieces moved out (Bio::Root, Bio::Factory, etc) worked fine, but we never followed up on exactly what to do next on master branch. It's perfectly feasible for someone to go ahead and finish the initial part of that (in fact, I believe there were some branches that started along this path but never merged back in).
>
> Can I merge any branching between these and bioperl-live and set them
> up so you only have to run dzil on their repos?

I wouldn't worry about the branches, they are probably too stale. Have it so dzil works for the various repos from that project (it should already).

We will likely need to think about having a stub Build.PL that can be used for basic installation, but would be auto-generated based on the needs for that repo (and so shouldn't be committed to). This is mainly to help git-savvy users, not devs; we don't necessarily want users to install dzil, which had somewhere north of 40 or so dependencies IIRC.

>>> I think splitting bioperl-live into subdistributions and making a new
>>> 1.70 release of each of them is perfectly doable over a summer. And I
>>> say this after having split and released Bio-Biblio. This is one of my
>>> itches with BioPerl. I have been using it for almost 3 years, but have
>>> never seen a release. I would like to make new releases of everything,
>>> no changes at the start, but take them to the point that "dzil
>>> release" does everything. Make it really easy for anyone to come in
>>> and contribute and even easier for a maintainer to make a new release
>>> after receiving a contribution. Is this desirable for the project?
>>>
>>
>> Hilmar's point is pretty valid, namely that a case would have to be made as to why the initial run at it wasn't completed, or why it would work better this time. We're not suggesting that this can't be done, but the above point would have to be answered.
>
> The only reason why I claim to be able to finish this is that I'm very
> familiar with both BioPerl and the tools to make the split. Plus,
> I already split one (and am trying to split another) to get a clear idea
> of what it involves.

Right, I do think it's feasible. But see Hilmar's response on this point; you don't have to convince us.

>> Frankly, the project has been pretty reliant on me for releases, so it's perfectly valid to point out the modules haven't made it out yet b/c I haven't made a release since then. From that point of view, this would be a continuation of that work, maybe with the intent/focus on making code releases much easier.
>
> As a maintainer of another gigantic FOSS project that is also a
> collection of libraries, I can relate to this. Of course it can be
> much more interesting to write new sexy code and add it to the huge
> pile of modules already in bioperl-live but I want to make it easier
> for others to develop on BioPerl. Comparing with chemistry, I want
> this to be the equivalent of a catalyst for the development, rather
> than another reactant.
>
>> Regarding updating Bioperl to use Dist::Zilla amongst other modern perl tools (Moose included), yes, it is very much our wish/intent to have this, in any way possible. But I don't think we can call it BioPerl v1.7, simply based on past release cycles; we're somewhat bound by deprecations, etc. We really need a clean break.
>>
>> So, my general feeling is that while we are cleaving out code and releasing the independent dist and core, we should re-christen core as 1.9 (e.g. pre-v2). We move to v2 when we feel we're at the right point. Each of the individual distributions would have to start with their own versions, anything greater than the point where they left the core/live distribution should work.
>> I agree with you in that I don't think it would take a long time, but we also have bioperl-run in the mix (and in many cases it would make sense to combine wrappers with the proper parsers), so simply cleaving out from one repo may not be the best approach.
>>
>> With that in mind, my point was meant to indicate we can also start afresh with a section of the code that you would like to focus on, using some of the same ideas (pulling out the relevant modules you want to work on). This might be an attainable goal in the minds of GSoC reviewers and might suit your particular needs (for instance, if you had a research project reliant on such code). I'm supportive either way, and I don't think you'll have a problem finding a mentor if you need one.
>
> I suggested 1.70 only because it introduces no changes. And it won't be
> BioPerl 1.7. It would be Bio-Seq, Bio-Align, Bio-Popgen, etc v 1.70.

There may be a point where we will likely find it hard to split out more w/o running into circular dependency issues. This will likely center around Bio::Seq, Bio::SeqFeature, and Bio::Annotation (with others thrown in). But let's see how far we can go with it. If we get to a point where division becomes problematic, we can deem that 'core'. But I would like to see Bio::Seq etc in their own space.

Re: versioning: I'm not particularly hung up on any particular versioning scheme, but the key point is support. It's easy for me to say "as of bioperl v2 the installation scheme will be something completely different" as opposed to doing so with v1.7. Will installation of v1.7 be the same as it was for v1.6 (or even similar)? Will it install the same modules by default? We would be changing a key step in using BioPerl (installation) w/o much warning.

> These smaller distributions can then stay as they are or evolve into
> 2.0 if their maintainers are so interested. I saw biome and liked it,
> but is the plan to make a BioPerl 2.00 written in Moose?
Not necessarily, unless it can be demonstrated to help considerably. I think it can FWIW.

> Won't that
> path take us to the same place we are now in a couple of years? Won't
> it be better to make the split now, and make the clean break on each
> smaller distribution?

Right. Exactly. (the latter point :)

> Would you be available to talk about this on #bioperl? I'm online
> there most of the time.
>
> Carnë

I'll join in tomorrow, sure. I may be on and off channel due to meetings.

chris

From carandraug+dev at gmail.com Mon Apr 1 21:54:49 2013
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Tue, 2 Apr 2013 02:54:49 +0100
Subject: [Bioperl-l] Google Summer of Code - BioPerl proposals
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74DCA9D4@CHIMBX5.ad.uillinois.edu>
References: <723EBB03-97B3-4750-876C-68F55256DCB6@drycafe.net>
 <118F034CF4C3EF48A96F86CE585B94BF74DC843A@CHIMBX5.ad.uillinois.edu>
 <118F034CF4C3EF48A96F86CE585B94BF74DC8BD8@CHIMBX5.ad.uillinois.edu>
 <118F034CF4C3EF48A96F86CE585B94BF74DCA9D4@CHIMBX5.ad.uillinois.edu>
Message-ID:

On 1 April 2013 23:23, Fields, Christopher J wrote:
> On Apr 1, 2013, at 12:17 PM, Carnë Draug wrote:
>> Can I merge any branching between these and bioperl-live and set them
>> up so you only have to run dzil on their repos?
>
> I wouldn't worry about the branches, they are probably too stale. Have it so dzil works for the various repos from that project (it should already).

I tried but I don't have push permissions for Bio-Root like I have for the other BioPerl repos.

> We will likely need to think about having a stub Build.PL that can be used for basic installation, but would be auto-generated based on the needs for that repo (and so shouldn't be committed to). This is mainly to help git-savvy users, not devs; we don't necessarily want users to install dzil, which had somewhere north of 40 or so dependencies IIRC.

Bah! People using development versions should be prepared to act as developers.
Otherwise they should be content with the stable released versions. Development versions are not meant to be stable. I see no reason to give users the chance to shoot themselves in the foot, especially when it's more work for developers and maintainers.

> Re: versioning: I'm not particularly hung up on any particular versioning scheme, but the key point is support. It's easy for me to say "as of bioperl v2 the installation scheme will be something completely different" as opposed to doing so with v1.7. Will installation of v1.7 be the same as it was for v1.6 (or even similar)? Will it install the same modules by default? We would be changing a key step in using BioPerl (installation) w/o much warning.

That is my idea, yes. It is exactly what happened with Bio-Biblio: it changed close to nothing. There were a few minor changes to the code to pass the tests already in place, but mostly the changes were in the POD, to use BioPerl's Dist::Zilla and Pod::Weaver configuration.

Carnë

From cjfields at illinois.edu Mon Apr 1 22:19:23 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 2 Apr 2013 02:19:23 +0000
Subject: [Bioperl-l] Google Summer of Code - BioPerl proposals
In-Reply-To:
References: <723EBB03-97B3-4750-876C-68F55256DCB6@drycafe.net>
 <118F034CF4C3EF48A96F86CE585B94BF74DC843A@CHIMBX5.ad.uillinois.edu>
 <118F034CF4C3EF48A96F86CE585B94BF74DC8BD8@CHIMBX5.ad.uillinois.edu>
 <118F034CF4C3EF48A96F86CE585B94BF74DCA9D4@CHIMBX5.ad.uillinois.edu>