[Bioperl-l] Installing BioPerl on Windows
Barry Moore
barry.moore at genetics.utah.edu
Wed Dec 8 16:15:12 EST 2004
Jason, Brian, Others-
A recent message to the bioperl list suggests that new Windows users are
still having problems installing Bioperl on Windows. This is not
necessary because it's actually quite easy to install Bioperl 1.4. I had
a look at the INSTALL.WIN document and I think that while it has been
updated a bit, it is starting to suffer from fragmented editing over a
long period of time. All the information that you need is there, but it
doesn't really fit together too well anymore, and there is still some
outdated and conflicting information present. Since new Windows users
are often the least likely to be experienced programmers and also likely
to have little Unix experience it may also need to be written with that
in mind, providing more explanation for how things are done. I've taken
a crack at this and rewritten INSTALL.WIN with a longer (perhaps too
long) introduction to Bioperl, and updated installation instructions for
Bioperl 1.4. In fact I think that the file name INSTALL.WIN should
probably be changed, as that is a filename that is intuitive only to
someone who has done a lot of installing from source.
Installing_Bioperl_on_Windows.txt may be a more obvious filename to new
Windows users. If you think it looks useful please feel free to post it
on the Bioperl web site as a replacement for or in addition to the
current INSTALL.WIN. I'll be happy to try to keep this document up to
date, but I'll need one of the developers to put it on the site for me.
Finally, I didn't touch the Cygwin sections of the previous INSTALL.WIN
document because I have no experience with it, so I'll have to assume
that it is accurate and let others contribute any fixes necessary there.
Let me know if I've made any errors or omissions that need to be corrected.
Barry
==================================================================================
Installing Bioperl on Windows
=============================
1) Quick Instructions for the impatient
2) Bioperl on Windows
3) Perl on Windows
4) BioPerl on Windows
5) Beyond the Core
6) BioPerl in Cygwin
7) Cygwin tips
This installation guide was written by Barry Moore and other Bioperl
authors based on the original work of Paul Boutros. Please report
problems and/or fixes to the bioperl mailing list,
bioperl-l at bioperl.org
1) Quick instructions for the impatient, lucky, or experienced user.
=====================================================================
Download the ActivePerl MSI from
http://www.activestate.com/Products/ActivePerl/
Run the ActivePerl Installer (accepting all defaults is fine).
Open a command prompt (Start->Run, then type cmd) and run the ppm
shell (C:\>ppm).
Add two new ppm repositories with the following commands:
ppm> rep add Bioperl http://bioperl.org/DIST
ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms
Install Bioperl-1.4 with the following command:
ppm> install Bioperl-1.4
Go to http://www.bioperl.org and start reading documentation or try the
example script at the end of this file.
2) Bioperl on Windows
======================
Bioperl is a large collection of Perl modules (extensions to the Perl
language) that aid in the task of writing perl code to deal with
sequence data in a myriad of ways. Bioperl provides objects for various
types of sequence data and their associated features and annotations.
It provides interfaces for analysis of these sequences with a wide
variety of external programs (BLAST, fasta, clustalw and EMBOSS to name
just a few). It provides interfaces to various types of databases both
remote (GenBank, EMBL etc.) and local (MySQL, flat files, GFF etc.) for
storage and retrieval of sequences. And finally with its associated
documentation and mailing list Bioperl represents a community of
bioinformatics professionals working in perl who are committed to
supporting both development of Bioperl and the new users who are drawn
to the project.
While most bioinformatics and computational biology applications are
developed in Unix/Linux environments, more and more programs are being
ported to other operating systems like Windows, and many users (often
biologists with little background in programming) are looking for ways
to automate bioinformatics analyses in the Windows environment. Perl
and Bioperl can be installed natively on Windows NT/2000/XP. Most of
the functionality of Bioperl is available with this type of install.
Much of the heavy lifting in bioinformatics is done by programs
originally developed in lower-level languages like C and Pascal (e.g.
BLAST, clustalw, Staden etc.). For these, Bioperl simply acts as a
wrapper for running the external programs and parsing their output.
Some of those programs (BLAST for example) are ported to Windows. These
can be installed and work quite happily with BioPerl in the native
Windows environment. Others, such as clustalw, have Windows ports, but
the BioPerl developer who wrote the interface used Unix-specific system
calls to interact with these programs, so these wrappers will not work
in the Windows environment. And finally some external programs such as
Staden and the EMBOSS suite of programs cannot be installed on Windows
at all, and therefore any part of Bioperl that interacts with these
packages either won’t work or can’t be installed at all.
If you have a fairly simple project in mind, want to start using
Bioperl quickly, only have access to a computer running Windows, and/or
don’t mind bumping up against some limitations then Bioperl on Windows
may be a good place for you to start. For example, downloading a bunch
of sequences from GenBank and sorting out the ones that have a
particular annotation or feature works great. Running a bunch of your
sequences against remote or local BLAST, parsing the output and storing
it in a MySQL database would be fine also. Be aware that most if not
all of the Bioperl developers are working in some type of Unix
environment (Linux, OS X, Cygwin). If you have problems with Bioperl
that are specific to the Windows environment, you may be blazing new
ground and your pleas for help on the Bioperl mailing list may get few
responses – simply because no one knows the answer to your
Windows-specific problem. If this is or becomes a problem for you then
you are better off working in some type of Unix-like environment. One
solution to this problem that will keep you working on a Windows
machine is to install Cygwin, a Unix emulation environment for Windows.
A number of Bioperl users are using this approach successfully and it
is discussed more below.
3) Perl on Windows
===================
There are a couple of ways of installing Perl on a Windows machine. The
most common and easiest is to get the most recent build from
ActiveState. ActiveState is a software company
(http://www.activestate.com) that provides free builds of Perl for
Windows users. The current (December 2004) build is ActivePerl
5.8.4.810 (ActivePerl 5.6.1.638 is also available and should work just
fine). To install ActivePerl on Windows:
Download the ActivePerl MSI from
http://www.activestate.com/Products/ActivePerl/
Run the ActivePerl Installer (accepting all defaults is fine).
You can also build Perl yourself (which requires a C compiler) or
download one of the other binary distributions. The Perl source for
building it yourself is available from CPAN (http://www.cpan.org), as
are a few other binary distributions that are alternatives to
ActiveState. This approach is not recommended unless you have specific
reasons for doing so and know what you’re doing. If that’s the case you
probably don’t need to be reading this guide.
Cygwin is a Unix emulation environment for Windows and comes with its
own copy of Perl. Information on Cygwin and Bioperl is found below.
4) BioPerl on Windows
======================
Perl is a programming language that has been extended a lot by the
addition of external modules. These modules work with the core language
to extend the functionality of Perl. Bioperl is one such extension to
Perl. These modular extensions to Perl sometimes depend on the
functionality of other Perl modules and this creates a dependency. You
can’t install module X unless you have already installed module Y. Some
Perl modules are so fundamentally useful that the Perl developers have
included them in the core distribution of Perl – if you’ve installed
Perl then these modules are already installed. Other modules are freely
available from CPAN, but you’ll have to install them yourself if you
want to use them. BioPerl has such dependencies.
Bioperl is actually a large collection of perl modules (over 1000
currently) and these modules are split into six groups. These six
groups are:
Bioperl Group        Functions
-----------------------------------------------------------------
bioperl (the core)   Most of the main functionality of Bioperl.
bioperl-run          Wrappers to a lot of external programs.
bioperl-ext          Interaction with some alignment functions
                     and the Staden package.
bioperl-db           Using bioperl with BioSQL and local
                     relational databases.
bioperl-microarray   Microarray specific functions.
bioperl-gui          Some preliminary work on a graphical user
                     interface to some Bioperl functions.
The Bioperl core is what most new users will want to start with.
Bioperl 1.4 (the core) and the Perl modules that it depends on can be
easily installed with ppm. PPM (Perl Package Manager) is an ActivePerl
utility for installing Perl modules on systems using ActivePerl. PPM
will look online (you have to be connected to the internet of course)
for files (these files end with .ppd) that tell it how to install the
modules you want and what other modules your new modules depend on. It
will then download and install your modules and all of the modules they
depend on for you. These .ppd files are stored online in ppm
repositories. ActiveState maintains the largest ppm repository, and
when you installed ActivePerl, ppm was installed with directions for
using the ActiveState repositories. Unfortunately the ActiveState
repositories are far from complete and other ActivePerl users maintain
their own ppm repositories to fill in the gaps. Installing Bioperl will
require you to direct ppm to look in two new repositories. You do this
by opening a Windows command prompt, typing ppm to start the ppm shell
and then typing the following two commands:
ppm> rep add Bioperl http://bioperl.org/DIST
ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms
Once ppm knows where to look for Bioperl and its dependencies you
simply tell ppm to install it. This is done with the command:
ppm> install Bioperl-1.4
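Once the install finishes, one quick way to confirm that Perl can find
the new modules is to try loading one of them. The following is a
minimal sketch for a Bourne-style shell (for example bash under
Cygwin). have_perl_module is a hypothetical helper name, and Bio::Seq
is used as the probe module on the assumption that it is part of the
Bioperl core you installed. At the Windows command prompt the inner
command, perl -MBio::Seq -e 1, can be typed directly; no output and no
error means the module loaded.

```shell
# Hypothetical helper: succeeds only if perl can load the named module.
have_perl_module() {
    # -M loads the module; -e 1 runs a trivial program and exits.
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Bio::Seq is assumed to be installed as part of the Bioperl core.
if have_perl_module Bio::Seq; then
    echo "Bioperl modules found"
else
    echo "Bioperl modules not found; check the ppm install step"
fi
```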
5) Beyond the Core
===================
You may find that you want some of the features of other Bioperl groups
like bioperl-run or bioperl-db. There are currently no ppm packages for
installing these parts of Bioperl. You will have to install these
manually from source. For this you will need a Windows version of the
program make called nmake
(http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe).
You will also want to have a willingness to experiment. You’ll have to
read the installation documents for each component that you want to
install, and use nmake where the instructions call for make. You will
have to determine from the installation documents what dependencies are
required, and you will have to get them, read their documentation and
install them first. The details of this are beyond the scope of this
guide. Read the documentation. Search Google. Try your best, and if you
get stuck consult with others on the bioperl mailing list.
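Perl packages distributed as source commonly follow the Makefile.PL
build sequence, and that is the sequence into which nmake gets
substituted. The sketch below only prints the commands for a given make
program rather than running them. print_build_steps is a hypothetical
helper, and whether a particular Bioperl package uses exactly these
steps is an assumption to check against that package's own installation
document.

```shell
# Hypothetical helper: print the usual Makefile.PL build sequence,
# using the make program given as the first argument (default: make).
print_build_steps() {
    local make_prog="${1:-make}"
    printf '%s\n' \
        "perl Makefile.PL" \
        "$make_prog" \
        "$make_prog test" \
        "$make_prog install"
}

# On native Windows, substitute nmake for make.
print_build_steps nmake
```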
6) BioPerl in Cygwin
=====================
Cygwin is a Unix emulator and shell environment available free at
www.cygwin.com. BioPerl runs well within Cygwin. Some users claim that
installation of Bioperl is easier within Cygwin than within Windows,
but these may be users with Unix backgrounds.
One advantage of using Bioperl in Cygwin is that all the external
modules are available through CPAN and most if not all external
programs can be installed and run, so many of the limitations of
Bioperl on Windows are circumvented.
To get Bioperl running first install the basic Cygwin package as well
as the Cygwin Perl, make, and gcc packages. Clicking the "View" button
in the upper right of the installer enables you to see details on the
various packages. Then follow the BioPerl installation instructions for
Unix in BioPerl's INSTALL file.
Note that expat comes with Cygwin (it's used by the module XML::Parser).
One known issue is that DBD::mysql can be tricky to install in Cygwin,
and this module is required for the bioperl-db, Biosql, and
bioperl-pipeline external packages. Fortunately there are some good
instructions online:
http://search.cpan.org/src/JWIED/DBD-mysql-2.1025/INSTALL.html#windows/cygwin
Also, set the environment variable TMPDIR, because programs like BLAST
and clustalw need a place to create temporary files. For example:
setenv TMPDIR e:/cygwin/tmp # csh, tcsh
export TMPDIR=e:/cygwin/tmp # sh, bash
Note that this is not Cygwin's own path syntax, which would be
something like "/cygdrive/e/cygwin/tmp". It is the syntax that a Perl
module expects on Windows.
If this variable is not set correctly you'll see errors like this when
you run Bio::Tools::Run::StandAloneBlast:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not open /tmp/gXkwEbrL0a: No such file or directory
STACK: Error::throw
..........
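A quick check of the TMPDIR value before running a wrapper can catch
this early. The sketch below is for sh or bash. is_drive_letter_path is
a hypothetical helper that only tests whether a value has the
drive-letter form used in the TMPDIR examples (e:/...), not whether the
directory actually exists.

```shell
# Hypothetical helper: succeeds if the argument looks like "e:/some/dir".
is_drive_letter_path() {
    case "$1" in
        [A-Za-z]:/*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Report on the current TMPDIR setting (empty string if unset).
if is_drive_letter_path "${TMPDIR:-}"; then
    echo "TMPDIR looks fine: $TMPDIR"
else
    echo "TMPDIR is unset or not in drive-letter form: '${TMPDIR:-}'"
fi
```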
7) Cygwin tips
===============
The easiest way to install MySQL is to use the Windows binaries
available at www.mysql.com. Note that Windows does not have Unix domain
sockets, so you need to force the MySQL connections to use TCP/IP
instead. Do this by using the "-h" option from the command line:
>mysql -h 127.0.0.1 -u blip -pblop biosql
Or, alias the mysql command in your .tcshrc, .cshrc, or .bashrc so it
uses a host. For example, if your databases are installed locally:
alias mysql 'mysql -h 127.0.0.1' # csh, tcsh
alias mysql='mysql -h 127.0.0.1' # sh, bash
If you're trying to use some application or resource "outside" of
Cygwin and you're having a problem, remember that Cygwin's path syntax
may not be the correct one. Cygwin understands '/home/jacky' or
'/cygdrive/e/cygwin/home/jacky' (when referring to the E: drive) but
the external resource may want 'E:/cygwin/home/jacky'. So your *rc
files may end up with paths written in these different syntaxes,
depending on which program reads them.
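The /cygdrive form can be rewritten into the drive-letter form with a
few lines of shell. This is a minimal sketch for bash;
cygdrive_to_drive_letter is a hypothetical helper, and it leaves paths
such as /home/jacky unchanged because resolving those requires knowing
where Cygwin itself is installed. Cygwin also ships a cygpath utility
for converting between path styles, which is likely the better choice
for real use.

```shell
# Hypothetical helper: /cygdrive/e/dir/file -> E:/dir/file
# Any path not under /cygdrive/<letter>/ is printed unchanged.
cygdrive_to_drive_letter() {
    local rest drive
    case "$1" in
        /cygdrive/[A-Za-z]/*)
            rest=${1#/cygdrive/}        # e.g. "e/cygwin/home/jacky"
            drive=$(printf '%s' "${rest%%/*}" | tr '[:lower:]' '[:upper:]')
            printf '%s:/%s\n' "$drive" "${rest#*/}"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

cygdrive_to_drive_letter /cygdrive/e/cygwin/home/jacky
```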
If you can, install Cygwin on a drive or partition that's
NTFS-formatted, not FAT32-formatted. When you install Cygwin on a FAT32
partition you will not be able to set permissions and ownership
correctly. In most situations this probably won't make any difference
but there may be occasions where this is a problem.
If you want to use BLAST we recommend that the Windows binary be
obtained from NCBI
(ftp://ftp.ncbi.nih.gov/blast/executables/LATEST-BLAST - the file will
be named something like blast-2.2.6-ia32-win32.exe). Then follow the
Windows instructions in README.bls.
Although we've recommended using the BLAST and MySQL binaries you
should be able to compile just about everything else from source code
using Cygwin's gcc. You'll notice when you're installing Cygwin that
many different libraries are also available (gd, jpeg, etc.).