[Bioperl-l] Bioperl-l Digest, Vol 96, Issue 28

khush ........ bioinfo.khush at gmail.com
Fri Apr 29 06:34:29 UTC 2011


Dear all,

I am trying to calculate the Ka/Ks ratio of sequences that I aligned with
ClustalX. For this I am using the script given at
https://github.com/bioperl/bioperl-live/blob/master/scripts/utilities/pairwise_kaks.PLS

When I try to run it, it points me to the line

 "warn("Could not find the executable for $aln_prog, make sure you have
installed it and have either set ".uc($aln_prog)."DIR or it is in your
PATH");"

"Could not find the executable for clustaw, make sure you have installed it
and have either set CLUSTAWDIR or it is in your PATH at kaks.pl line 52."

I have clustalw2 and clustalx installed on my system. How and where do I
set the path for them, and how do I calculate the Ka/Ks ratio for my
sequences?

Thank you
Kamal
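
The warning quoted above names two places where the aligner may be found: a directory variable (the upper-cased program name plus `DIR`) or the `PATH`. A minimal shell sketch of such a lookup follows. The helper name `locate_aln_prog` and the variable name `CLUSTALW2DIR` are illustrative assumptions built from the warning text, not BioPerl's own code, and the order in which BioPerl checks the two places is not confirmed here.

```sh
# locate_aln_prog PROGRAM [DIR]
# Prints the full path of PROGRAM if it is executable inside DIR,
# otherwise falls back to searching PATH. Returns non-zero if neither
# lookup finds it. (Hypothetical helper, for illustration only.)
locate_aln_prog() {
    prog=$1
    dir=${2:-}
    if [ -n "$dir" ] && [ -x "$dir/$prog" ]; then
        printf '%s\n' "$dir/$prog"
        return 0
    fi
    command -v "$prog"
}

# Example: look for clustalw2 in the directory named by CLUSTALW2DIR
# (an assumed variable name), then in PATH.
locate_aln_prog clustalw2 "${CLUSTALW2DIR:-}" ||
    echo "clustalw2 not found; add its directory to PATH" >&2
```

If the lookup fails, prepending the install directory to `PATH` (for example `export PATH="/path/to/clustalw/bin:$PATH"`, where the path is a placeholder) in the shell that runs the script is the usual way to make the second branch succeed.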






On Fri, Apr 29, 2011 at 11:16 AM, <bioperl-l-request at lists.open-bio.org> wrote:

> Today's Topics:
>
>   1. Re:  GSoC/BioPerl Reorganization Project (Sheena Scroggins)
>   2. Re:  GSoC/BioPerl Reorganization Project (Chris Fields)
>   3. Re:  GSoC/BioPerl Reorganization Project (Robert Buels)
>   4.   Re: GSoC/BioPerl Reorganization Project (Siddhartha Basu)
>   5. Re:  Standalone blast (khush ........)
>   6. Re:  GSoC/BioPerl Reorganization Project (Robert Buels)
>   7. Re:  Standalone blast (Florent Angly)
>   8. Re:  Standalone blast (khush ........)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 28 Apr 2011 12:53:49 -0700
> From: Sheena Scroggins <sheena.scroggins at gmail.com>
> Subject: Re: [Bioperl-l] GSoC/BioPerl Reorganization Project
> To: Chris Fields <cjfields at illinois.edu>
> Cc: bioperl-l at lists.open-bio.org
> Message-ID: <BANLkTimee8HidYyh6wRY15LdsqdL5KrEuA at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Chris,
>
> We haven't talked much about the versioning yet, but it will be on the list
> to figure out asap.
>
> So far, the plan is to split out Bio::Root first, followed by a couple
> modules that depend only on Bio::Root. The plan I proposed was Bio::Das,
> Bio::Event then Bio::Location. Depending on how much time is remaining for
> the GSoC project, the next to split out would be Bio::Factory and
> Bio::Coordinate, because they depend on Bio::Root and Bio::Location. I plan
> to still help with the reorganization after the internship is over, but I
> obviously have to have a stopping point for the GSoC project.
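
The split order described above follows from the stated dependencies, so it can be derived mechanically with a topological sort. Below is a minimal shell sketch using the standard `tsort` utility. The function name `split_order` is made up for illustration, and the dependency pairs are only the ones mentioned in this message, not a real analysis of bioperl-live.

```sh
# Print one valid order in which to split out the subsystems.
# Each input pair "A B" tells tsort that A must come before B,
# i.e. B depends on A. (Hypothetical helper; edges taken from the plan above.)
split_order() {
    tsort <<'EOF'
Bio::Root Bio::Das
Bio::Root Bio::Event
Bio::Root Bio::Location
Bio::Root Bio::Factory
Bio::Root Bio::Coordinate
Bio::Location Bio::Factory
Bio::Location Bio::Coordinate
EOF
}

split_order
```

`tsort` may print modules that do not depend on each other in any relative order; only the before/after constraints are guaranteed. It should also complain when the input contains a loop, which is one way to spot the circular dependencies the thread expects to run into later.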
>
> Rob provided me with a really nice script to list the dependencies of the
> modules, so I plan to make a roadmap towards the end of the summer that will
> help guide the rest of the reorganization. At that point, we'll have to deal
> with the circular dependencies carefully.
>
> This is a huge project, much bigger than I can do in one summer. But I plan
> to get it started in a way that makes it easy for others to contribute.
>
> Sheena
>
>
> On Wed, Apr 27, 2011 at 12:35 PM, Chris Fields <cjfields at illinois.edu> wrote:
>
> > Sheena,
> >
> > Congrats on being accepted! We've talked about doing this over the years,
> > but it's not an easy task and it needs a dedicated project to get the ball
> > rolling, so to speak.  Hopefully this isn't tl;dr.  I'll start off with a
> > few of my questions/thoughts (Rob could probably chime in as well, but I
> > think his general thoughts on the project parallel mine):
> >
> > 1) The current BioPerl CPAN could just be a simple install script, acting
> > like a 'Task' or 'Bundle' module, installing the actual Bio-specific
> > distributions.  Doing it this way would allow you to iteratively split off
> > additional code but retain the original Task/Bundle-based approach to
> > installation.  For instance, the first pass could split out Root, then have
> > a dependency-light and 'extras' distribution, 2nd round split further based
> > on function, and so on:
> >
> >  1st round (v 1.9)   :  BioPerl (just an installer) -> installs root, min-deps, extra-deps
> >  2nd round (v 1.901) :  BioPerl (just an installer) -> root, seq/feature, other-min-deps, extra-deps
> >  ...
> >  Xth round (v 1.99)  :  BioPerl (just an installer) -> root, tools, seq, tree, align, coord, map, everything-else
> >  ...
> >
> > Also, one could potentially install modules in various ways: interactively,
> > in predetermined groups, using a user-defined list, etc (one could
> > effectively create custom BioPerl installs for GBrowse or other tools for
> > instance).  Of course I would only pick the easiest route to start, but
> > maybe that gives some ideas.  Regardless, if the dependency tree is set up
> > correctly any reliance on other Bio* modules would be defined in the various
> > Build.PL/Makefile.PL and then installed via CPAN (as is any dependency).
> >
> > 2) The Bio::Root modules are probably the true core modules and are the
> > most stable with regards to changes, so those could be moved to something
> > like BioPerl-Core.  Beyond that, what are the proposed splits?  (we've
> > discussed this on-list before, but it's appropriate to bring this up again)
> >
> > 3) How do we want to handle versioning?  We can't (and probably shouldn't)
> > release everything on a synchronized versioning scheme (via
> > Bio::Root::Version, for instance); that'll quickly fall apart.  Personally I
> > can foresee each split-off dist having its own version, with the BioPerl
> > network of modules being in effect its own mini-CPAN.
> >
> > 5) Related to versioning, in my opinion we should maybe aim at eventually
> > calling this BioPerl v2.0 and starting with a simpler X.Y versioning scheme.
> > Lincoln has already done something like this with Bio::Graphics, which was
> > originally part of BioPerl but split off prior to v 1.6.0.
> >
> > 6) In some cases I can see particularly thorny problems, such as circular
> > dependencies.  I can think of a few ways to address that (creating a simple
> > lightweight Bio::Species class as a fallback if Bio::Tree code isn't
> > present, for instance), but any additional thoughts on this would be
> > helpful.
> >
> > 7) Do we want to set up something like 'git submodule' for the devs to pull
> > down all BioPerl-relevant code?
> >
> > Other thoughts?
> >
> > chris
> >
> > On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote:
> >
> > > Hey everyone,
> > >
> > > I wanted to take a minute to introduce myself as one of the Google Summer
> > > of Code interns. I was the lucky one chosen to work on the BioPerl
> > > Reorganization (*crowd cheers*). I am a grad student in bioinformatics, and
> > > somewhat new to this level of programming so bear with me as I learn the
> > > technical jargon. Luckily I have both Rob and Chris to mentor me this
> > > summer!
> > >
> > > Reading through the mailing list archives, I see there have been many
> > > discussions and differing opinions about tackling this project. Given the
> > > time frame for GSoC and my limited experience, there is no way I will
> > > complete this project on my own but I will at least be able to start it,
> > > which will hopefully motivate others to pitch in. So far, the plan for the
> > > GSoC project is to start by breaking out Bio::Root, followed by a couple
> > > other modules based on their dependencies and the time allowed. Each will
> > > be published to CPAN independently. You can follow the project (once it
> > > starts) on github at https://github.com/sheenams.
> > >
> > > I look forward to collaborating with many of you on the reorganization
> > > (hint hint)!
> > >
> > > Sheena
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
>
>
> ------------------------------
>
> Message: 2
> Date: Thu, 28 Apr 2011 16:04:51 -0500
> From: Chris Fields <cjfields at illinois.edu>
> Subject: Re: [Bioperl-l] GSoC/BioPerl Reorganization Project
> To: Sheena Scroggins <sheena.scroggins at gmail.com>
> Cc: BioPerl List <bioperl-l at lists.open-bio.org>,        Robert Buels
>        <rmb32 at cornell.edu>
> Message-ID: <1FF62DC3-941A-4DCB-8464-89D220E4A9C5 at illinois.edu>
> Content-Type: text/plain; charset="us-ascii"
>
> Sounds fine; I think (as you indicate) we can deal with issues along the
> way.  Rob, anything to add?
>
> chris
>
>
> ------------------------------
>
> Message: 3
> Date: Thu, 28 Apr 2011 16:19:51 -0700
> From: Robert Buels <rmb32 at cornell.edu>
> Subject: Re: [Bioperl-l] GSoC/BioPerl Reorganization Project
> To: Chris Fields <cjfields at illinois.edu>
> Cc: Sheena Scroggins <sheena.scroggins at gmail.com>,      BioPerl List
>        <bioperl-l at lists.open-bio.org>
> Message-ID: <4DB9F617.6070705 at cornell.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> I think you guys are on the right track; here are some slightly more
> detailed plans.  I'll use Chris's subject numbering.
>
> 1,2,3,5.) I envision the splitting algorithm going like this:
>
>      no strict; # this is pseudocode!
>
>      my $split_count = 0;
>      for $subsystem (qw( Bio::Root Bio::Das Bio::Event ... )) {
>
>          - take $subsystem modules and tests out of bioperl-live
>
>            (my $new_dist_name = $subsystem) =~ s/::/-/g;
>          - extract $subsystem modules into new dist called
>            $new_dist_name.  Make sure all its tests pass, and write
>            some more tests if necessary.
>
>          - add dep on $subsystem to bioperl-live/Build.PL
>
>          - push $new_dist_name and bioperl-live to CPAN.
>            $new_dist_name has version '2.000', and bioperl-live has
>            version "1.7.$split_count".
>      }
>
>      and then, at the end of this loop, bioperl-live will be
>      nothing but a Build.PL and a couple of other things
>      for backcompat, like Bio::Root::Version, Bio::Perl, etc.
>
>      Important things to notice about this algorithm are that, at each
>      step in the loop:
>
>         a.) For users that install bioperl with CPAN,
>             doing cpan 'Bio::Perl' or cpan 'Bio::Root::Version' will
>             get you the same set of modules as before the split
>             started, with the split-off modules at 2.000 versions, and
>             the non-split-off ones at 1.7.x versions.
>
>         b.) For users (not developers) that are git cloning
>             bioperl-live, even though they are naughty (wink), they
>             can do 'perl Build.PL; ./Build installdeps' to get the
>             split-off modules, downloaded like any other CPAN
>             dependency.  There may be some lag before the split-off
>             distribution is downloadable from CPAN.
>
>         c.) For BioPerl developers, unless they are working on a
>             certain module, they should install the split-off modules
>             from CPAN like everybody else, and git clone only the piece
>             they are working on.
>
>         d.) The version of bioperl-live keeps increasing by 0.001 with
>             each split.  The systems that are split off have a 2.x
>             version number, each slightly different depending on when it
>             was split off.  After this point, their release schedules
>             and version numbers are independent of each other and of
>             bioperl-live.  For Bio::Perl and Bio::Root::Version, the
>             things that stay in bioperl-live, installing the latest
>             version will get you all the split-off modules.
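
The loop above can be sketched as a small runnable shell script that only prints the release plan, without touching any repository. The function names `dist_name` and `plan_splits` are illustrative, and incrementing the counter before each release is an assumption: the pseudocode never says where `$split_count` changes.

```sh
# Turn a Perl namespace such as Bio::Root into a dist name such as Bio-Root,
# mirroring the s/::/-/g substitution in the pseudocode.
dist_name() {
    printf '%s\n' "$1" | sed 's/::/-/g'
}

# For each subsystem given as an argument, print the new dist with its
# initial version and the bioperl-live version released alongside it.
# Assumption: the split counter is bumped once per split, before release.
plan_splits() {
    split_count=0
    for subsystem in "$@"; do
        split_count=$((split_count + 1))
        printf '%s 2.000 bioperl-live 1.7.%d\n' \
            "$(dist_name "$subsystem")" "$split_count"
    done
}

plan_splits Bio::Root Bio::Das Bio::Event
```

Each output line pairs a split-off distribution with the bioperl-live release that first depends on it, which is the invariant points a.) through d.) rely on.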
>
>
> 6.) (thorny circular dependencies and stuff)  Those will become quickly
> apparent as this process proceeds.  They'll take some finesse and/or
> ruthlessness and/or hacking to get around.  We'll burn those bridges as
> we come to them.
>
> 7.) (git submodules) Git submodules probably won't be necessary, since
> at each step in the process BioPerl devs can use ./Build installdeps or
> cpanm --installdeps .  to install whatever the dependencies are for the
> piece they are working on, whether it's bioperl-live (in the case of a
> module that has not yet been split off), or one of the distributions
> that has already been split off (in which case their improvements will
> probably be releasable to CPAN immediately!).
>
> Lots of detail there.  I tried to make it structured and easy to skim
> though.  Thoughts?
>
> Rob
>
>
>
>
>
>
> ------------------------------
>
> Message: 4
> Date: Thu, 28 Apr 2011 21:15:01 -0500
> From: Siddhartha Basu <sidd.basu at gmail.com>
> Subject: [Bioperl-l]  Re: GSoC/BioPerl Reorganization Project
> To: bioperl-l at lists.open-bio.org
> Message-ID: <20110429021457.GA351 at Macintosh-235.local>
> Content-Type: text/plain; charset=us-ascii
>
> Hi Robert,
> At what point in the flow will the dependencies between the split modules
> be added? Is there a particular order in which the split modules will be
> created? And how will those split-off modules be released to CPAN: one by
> one as they are generated, or all of them in a batch, after which they
> follow their own release schedules?
>
> -siddhartha
>
>
>
> On Thu, 28 Apr 2011, Robert Buels wrote:
>
> > On 04/28/2011 02:04 PM, Chris Fields wrote:
> > > Sounds fine; I think (as you indicate) we can deal with issues along
> the
> > > way.  Rob, anything to add?
> > >
> > > chris
> > >
> > > On Apr 28, 2011, at 2:53 PM, Sheena Scroggins wrote:
> > >
> > >> Chris,
> > >>
> > >> We haven't talked much about the versioning yet, but it will be on the
> > >> list to figure out asap.
> > >>
> > >> So far, the plan is to split out Bio::Root first, followed by a couple
> > >> modules that depend only on Bio::Root. The plan I proposed was
> Bio::Das,
> > >> Bio::Event then Bio::Location. Depending on how much time is remaining
> > >> for the GSoC project, the next to split out would be Bio::Factory and
> > >> Bio::Coordinate, because they depend on Bio::Root and Bio::Location. I
> > >> plan to still help with the reorganization after the internship is
> over,
> > >> but I obviously have to have a stopping point for the GSoC project.
> > >>
> > >> Rob provided me with a really nice script to list the dependencies of
> > >> the modules, so I plan to make a roadmap towards the end of the summer
> > >> that will help guide the rest of the reorganization. At that point,
> > >> we'll have to deal with the circular dependencies carefully.
> > >>
> > >> This is a huge project, much bigger than I can do in one summer. But I
> > >> plan to get it started in a way that makes it easy for others to
> > >> contribute.
> > >>
> > >> Sheena
> > >>
> > >>
> > >> On Wed, Apr 27, 2011 at 12:35 PM, Chris Fields<cjfields at illinois.edu>
> > >> wrote:
> > >> Sheena,
> > >>
> > >> Congrats on being accepted! We've talked about doing this over the
> years,
> > >> but it's not an easy task and it needs a dedicated project to get the
> > >> ball rolling, so to speak.  Hopefully this isn't tl;dr.  I'll start
> off
> > >> with a few of my questions/thoughts (Rob could probably chime in as
> well,
> > >> but I think his general thoughts on the project parallel mine):
> > >>
> > >> 1) The current BioPerl CPAN could just be a simple install script, acting
> > >> like a 'Task' or 'Bundle' module, installing the actual Bio-specific
> > >> distributions.  Doing it this way would allow you to iteratively split
> > >> off additional code but retain the original Task/Bundle-based approach to
> > >> installation.  For instance, the first pass could split out Root, then
> > >> have a dependency-light and 'extras' distribution, 2nd round split
> > >> further based on function, and so on:
> > >>
> > >>   1st round (v 1.9)   :  BioPerl (just an installer) ->  installs root,
> > >>                          min-deps, extra-deps
> > >>   2nd round (v 1.901) :  BioPerl (just an installer) ->  root,
> > >>                          seq/feature, other-min-deps, extra-deps
> > >>   ...
> > >>   Xth round (v 1.99)  :  BioPerl (just an installer) ->  root, tools,
> > >>                          seq, tree, align, coord, map, everything-else
> > >>   ...
> > >>
> > >> Also, one could potentially install modules in various ways:
> > >> interactively, in predetermined groups, using a user-defined list, etc
> > >> (one could effectively create custom BioPerl installs for GBrowse or
> > >> other tools for instance).  Of course I would only pick the easiest route
> > >> to start, but maybe that gives some ideas.  Regardless, if the dependency
> > >> tree is set up correctly any reliance on other Bio* modules would be
> > >> defined in the various Build.PL/Makefile.PL and then installed via CPAN
> > >> (as is any dependency).
> > >>
> > >> 2) The Bio::Root modules are probably the true core modules and are the
> > >> most stable with regards to changes, so those could be moved to something
> > >> like BioPerl-Core.  Beyond that, what are the proposed splits?  (we've
> > >> discussed this on-list before, but it's appropriate to bring this up
> > >> again)
> > >>
> > >> 3) How do we want to handle versioning?  We can't (and probably
> > >> shouldn't) release everything on a synchronized versioning scheme (via
> > >> Bio::Root::Version, for instance), that'll quickly fall apart.
> > >> Personally I can foresee each split-off dist having its own version,
> > >> with the BioPerl network of modules being in effect its own mini-CPAN.
> > >>
> > >> 5) Related to versioning, in my opinion we should maybe aim at eventually
> > >> calling this BioPerl v2.0 and starting with a simpler X.Y versioning
> > >> scheme.  Lincoln has already done something like this with Bio::Graphics,
> > >> which was originally part of BioPerl but split off prior to v 1.6.0.
> > >>
> > >> 6) In some cases I can see particularly thorny problems, such as circular
> > >> dependencies.  I can think of a few ways to address that (creating a
> > >> simple lightweight Bio::Species class as a fallback if Bio::Tree code
> > >> isn't present, for instance), but any additional thoughts on this would
> > >> be helpful.
> > >>
> > >> 7) Do we want to set up something like 'git submodule' for the devs to
> > >> pull down all BioPerl-relevant code?
> > >>
> > >> Other thoughts?
> > >>
> > >> chris
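
[Inline note on point 1: a minimal sketch of how a split-off distribution
could declare its reliance on another Bio* distribution in its Build.PL, so
that a CPAN client pulls the prerequisite in automatically. The distribution
name Bio::Location and the version number are placeholders for illustration,
not decisions made in this thread; Module::Build is assumed to be the build
tool.]

```perl
# Build.PL for a hypothetical split-off distribution (names are assumptions).
use strict;
use warnings;
use Module::Build;

my $build = Module::Build->new(
    module_name  => 'Bio::Location',   # hypothetical split-off distribution
    dist_version => '0.01',            # placeholder version
    license      => 'perl',
    requires     => {
        # Provided by the split-out Bio::Root distribution; 0 = any version.
        'Bio::Root::Root' => 0,
    },
);

# Writes the ./Build script that installs the distribution.
$build->create_build_script;
```

[With prerequisites declared this way, the top-level BioPerl installer would
only need to list the split distributions themselves.]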
> > >>
> > >> On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote:
> > >>
> > >>> Hey everyone,
> > >>>
> > >>> I wanted to take a minute to introduce myself as one of the Google
> > >>> Summer of Code interns. I was the lucky one chosen to work on the BioPerl
> > >>> Reorganization (*crowd cheers*). I am a grad student in bioinformatics,
> > >>> and somewhat new to this level of programming, so bear with me as I learn
> > >>> the technical jargon. Luckily I have both Rob and Chris to mentor me this
> > >>> summer!
> > >>>
> > >>> Reading through the mailing list archives, I see there have been many
> > >>> discussions and differing opinions about tackling this project. Given the
> > >>> time frame for GSoC and my limited experience, there is no way I will
> > >>> complete this project on my own, but I will at least be able to start it,
> > >>> which will hopefully motivate others to pitch in. So far, the plan for
> > >>> the GSoC project is to start by breaking out Bio::Root, followed by a
> > >>> couple other modules based on their dependencies and the time allowed.
> > >>> Each will be published to CPAN independently. You can follow the project
> > >>> (once it starts) on github at https://github.com/sheenams.
> > >>>
> > >>> I look forward to collaborating with many of you on the reorganization
> > >>> (hint hint)!
> > >>>
> > >>> Sheena
> > >>> _______________________________________________
> > >>> Bioperl-l mailing list
> > >>> Bioperl-l at lists.open-bio.org
> > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>
> > >>
> > >
> > >
> >
>
>
> ------------------------------
>
> Message: 5
> Date: Fri, 29 Apr 2011 10:23:50 +0530
> From: "khush ........" <bioinfo.khush at gmail.com>
> Subject: Re: [Bioperl-l] Standalone blast
> To: Dave Messina <David.Messina at sbc.su.se>
> Cc: bioperl-l at lists.open-bio.org
> Message-ID: <BANLkTikjFc-HBBKLMRam1g+Kxoro+WAE_g at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Dear Dave,
>
> Thank you for your support.
>
> Do I need to change the following lines like this?
>
> $blast_obj = Bio::Tools::Run::StandAloneBlast->new(-program  => 'blastx',
> -database => 'nr.fa');
>
> $seq_obj = Bio::Seq->new(-id  =>"test query", -seq =>"file.fa");
>
> I have a simple and basic question, as I am a beginner in BioPerl: do I need
> to download the whole nr database from NCBI to run the code, or will it
> fetch the information directly from the NCBI website? I ask because
> downloading the whole nr database takes a long time for me.
>
> How can I read a whole FASTA file in -seq instead of a simple string like
> "TTTATAGATAGAGACAG"? Is there a simple way to do this under my conditions?
>
> Thank you
> Kamal
>
>
> On Thu, Apr 28, 2011 at 12:59 PM, Dave Messina <David.Messina at sbc.su.se
> >wrote:
>
> > Hi Kamal,
> >
> > This is covered in the beginners' HOWTO:
> > http://www.bioperl.org/wiki/HOWTO:Beginners#BLAST
> >
> >
> > Dave
> >
> >
> > On Thu, Apr 28, 2011 at 07:22, khush ........ <bioinfo.khush at gmail.com
> >wrote:
> >
> >> Hi,
> >>
> >> I have about 250 sequences and want to use BLASTX to search them against
> >> the NCBI nr database, as doing this through the web-based search is time
> >> consuming. Can someone please tell me how to get started with BioPerl on
> >> such a problem? I know that this is possible with BioPerl, but I do not
> >> know how.
> >>
> >> Any suggestions will be appreciated.
> >>
> >> Thanks in advance
> >> Kamal
> >>
> >
> >
>
>
> ------------------------------
>
> Message: 6
> Date: Thu, 28 Apr 2011 22:15:01 -0700
> From: Robert Buels <rmb32 at cornell.edu>
> Subject: Re: [Bioperl-l] GSoC/BioPerl Reorganization Project
> To: BioPerl List <bioperl-l at lists.open-bio.org>
> Message-ID: <4DBA4955.2030003 at cornell.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 04/28/2011 07:15 PM, Siddhartha Basu wrote:
> > At what point in the flow will the dependencies between the split modules
> > be added? Is there any particular order in which the split modules would
> > be created?
>
> Dependencies are added and characterized at the time each distribution
> is created.  That's why the splitting order starts at Bio::Root, so that
> you can proceed up the hierarchy of dependencies without having to
> modify the dependency lists of the distributions that have already been
> extracted.
>
> > And how will those split-off modules be released on CPAN: one by one as
> > they are generated, or all of them in a batch, after which they will
> > follow their own release schedules?
>
> One by one, as they are generated.  I think it would be a good idea to
> re-release bioperl-live with each split as well.  This will probably
> lead to bioperl-live being released nearly every week as the split is
> ongoing.  As a consequence, the master branch of bioperl-live will need
> to be kept in very good shape.  This is easy if you just follow good
> practice: develop in branches, run *all* the tests before committing, go
> on IRC and send pull requests for code review, etc.
>
> Rob
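
[Inline note: the bottom-up splitting order Rob describes amounts to a
topological sort of the dependency graph. Below is a minimal sketch in plain
Perl, assuming only the dependencies mentioned earlier in this thread
(Bio::Das, Bio::Event and Bio::Location depend on Bio::Root; Bio::Factory and
Bio::Coordinate depend on Bio::Root and Bio::Location). The real graph is
much larger, and the function names here are made up for illustration.]

```perl
use strict;
use warnings;

# Dependencies as described in this thread; illustrative, not complete.
my %deps = (
    'Bio::Root'       => [],
    'Bio::Das'        => ['Bio::Root'],
    'Bio::Event'      => ['Bio::Root'],
    'Bio::Location'   => ['Bio::Root'],
    'Bio::Factory'    => ['Bio::Root', 'Bio::Location'],
    'Bio::Coordinate' => ['Bio::Root', 'Bio::Location'],
);

# Depth-first visit: a namespace is appended to @$order only after
# everything it depends on has been appended.
sub _visit {
    my ($deps, $state, $order, $name) = @_;
    my $s = $state->{$name} || 0;          # 0 = unseen, 1 = in progress, 2 = done
    return if $s == 2;
    die "circular dependency involving $name\n" if $s == 1;
    $state->{$name} = 1;
    _visit($deps, $state, $order, $_) for sort @{ $deps->{$name} || [] };
    $state->{$name} = 2;
    push @$order, $name;
    return;
}

# Return the namespaces so that each one follows all of its dependencies.
sub split_order {
    my ($deps) = @_;
    my (%state, @order);
    _visit($deps, \%state, \@order, $_) for sort keys %$deps;
    return @order;
}

print join("\n", split_order(\%deps)), "\n";
```

[Bio::Root comes out first, so distributions extracted later never force a
change to the dependency list of one that has already been released. The
sketch dies on a cycle, which is where the circular dependencies mentioned
earlier would need handling by hand.]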
>
>
> ------------------------------
>
> Message: 7
> Date: Fri, 29 Apr 2011 15:24:45 +1000
> From: Florent Angly <florent.angly at gmail.com>
> Subject: Re: [Bioperl-l] Standalone blast
> To: bioinfo.khush at gmail.com
> Cc: bioperl-l at lists.open-bio.org
> Message-ID: <4DBA4B9D.1010400 at gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi Kamal,
>
> To run BLAST the way Dave described, you need to have BLAST installed on
> your computer, and you need to download BLAST databases to your computer
> (or make them yourself with the formatdb command). There are plenty of
> databases available on the NCBI FTP website: ftp://ftp.ncbi.nih.gov/.
> And yes, some of these databases are very large and will take a long
> time to download. By the way, BLAST itself may also take a very long time
> to run against large databases, so you had better run the analysis
> on a powerful computer or a server.
>
> Also read this documentation:
>
> http://search.cpan.org/~cjfields/BioPerl-1.6.900/Bio/Tools/Run/StandAloneBlast.pm
> It stipulates that you can BLAST an entire FASTA file (not just a
> sequence object):
>
>   $inputfilename  =  't/testquery.fa';
>   $blast_report  =  $factory->blastall($inputfilename);
>
>
> Regards,
>
> Florent
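
[Inline note: putting Florent's two lines together with the constructor from
the original question gives roughly the sketch below. Assumptions: BioPerl
1.6.x with legacy NCBI BLAST (blastall) installed locally, a database named
'nr' already downloaded or built with formatdb, and a query file called
'file.fa'; the database and file names are placeholders. Note that -seq in
Bio::Seq->new expects the sequence letters themselves, not a file name, which
is why the file name goes to blastall instead.]

```perl
use strict;
use warnings;
use Bio::Tools::Run::StandAloneBlast;

# Assumes a local, formatdb-formatted protein database named 'nr'.
my $factory = Bio::Tools::Run::StandAloneBlast->new(
    -program  => 'blastx',
    -database => 'nr',
);

# Placeholder name: a FASTA file holding all of the query sequences.
my $inputfilename = 'file.fa';

# Passing a file name BLASTs every sequence in the file in one run.
my $blast_report = $factory->blastall($inputfilename);

# With the default settings the report is read through Bio::SearchIO:
# one result per query sequence, one hit per matching database entry.
while ( my $result = $blast_report->next_result ) {
    while ( my $hit = $result->next_hit ) {
        print join( "\t", $result->query_name, $hit->name,
                    $hit->significance ), "\n";
    }
}
```

[Nothing here contacts the NCBI website; everything runs against the local
database, so the download (or formatdb step) has to happen first.]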
>
>
>
>
>
>
>
> ------------------------------
>
> Message: 8
> Date: Fri, 29 Apr 2011 11:16:38 +0530
> From: "khush ........" <bioinfo.khush at gmail.com>
> Subject: Re: [Bioperl-l] Standalone blast
> To: Florent Angly <florent.angly at gmail.com>
> Cc: bioperl-l at lists.open-bio.org
> Message-ID: <BANLkTin_E2-Pq4Hk+W72x78bKpTRoEdy6g at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Dear Florent,
>
> Thank you very much for your kind reply and for clarifying how BLAST is
> run. I am working on a simple machine, so I need to get permission from my
> administrator to use a good server that can hold the whole nr database
> from NCBI and run blastx.
>
> Thank you
>
> Kamal
> Bioperl is great.
>
>
>
>
> ------------------------------
>
>
> End of Bioperl-l Digest, Vol 96, Issue 28
> *****************************************
>


