From nicolas.chauvat at logilab.fr  Mon Apr  2 11:58:49 2007
From: nicolas.chauvat at logilab.fr (Nicolas Chauvat)
Date: Mon, 2 Apr 2007 17:58:49 +0200
Subject: [BioPython] [ANN] EuroPython 2007: Call for Proposals
Message-ID: <20070402155849.GF24884@crater.logilab.fr>

Book Monday 9th July to Wednesday 11th July 2007 in your calendar!
EuroPython 2007, the European Python and Zope Conference, will be held in
Vilnius, Lithuania.  Last year's conference was a great success, featuring
a variety of tracks, amazing lightning talks and inspiring keynotes.  With
your participation, we want to make EuroPython 2007, the sixth EuroPython,
even more successful than the previous five.

Talks, Papers and Themes
------------------------

This year we have decided to borrow a few good ideas from PyCon, one of
which is to move away from the 'track' structure.  Instead, speakers are
invited to submit presentations about anything they have done that they
think would be of interest to the Python community.  We will then arrange
them into related groups and schedule them in the space available.  In the
past, EuroPython participants have found the following themes to be of
interest:

 * Science
 * Python Language and Libraries
 * Web Related Technologies
 * Education
 * Games
 * Agile Methodologies and Testing
 * Social Skills

In addition to talks, we will also accept full paper submissions about any
of the above themes.  The Call for Refereed Papers will be posted shortly.

The deadline for talk proposals is Friday 18th May at midnight (24:00
CEST, Central European Summer Time, UTC+2).

Other ways to participate
-------------------------

Apart from giving talks, there are plenty of other ways to participate in
the conference.  Just attending and talking to people you find here can be
satisfying enough, but there are three other kinds of activity you may wish
to plan for: Lightning Talks, Open Space and Sprints.  Lightning Talks are
very short talks that give you just enough time to introduce a topic or
project, Open Space is an area reserved for informal discussions, and
Sprints are focused gatherings for developers interested in particular
projects.  For more information please see the following pages:

 * Lightning Talks: http://www.europython.org/sections/events/lightning_talks
 * Open Space: http://www.europython.org/sections/events/open_space
 * Sprints: http://www.europython.org/sections/sprints_and_wiki

Your Contribution
-----------------

To propose a talk or a paper, go to...

 * http://www.europython.org/submit

For more general information on the conference, please visit...

 * http://www.europython.org/

Looking forward to seeing what you fine folk have been up to,

The EuroPython Team


-- 
Nicolas Chauvat

logilab.fr - services en informatique avancée et gestion de connaissances

From alexl at users.sourceforge.net  Sun Apr  8 05:27:27 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Sun, 08 Apr 2007 02:27:27 -0700
Subject: [BioPython] Biopython package for Fedora
Message-ID: <274pnrcj28.fsf@delpy.biol.berkeley.edu>

(Apologies if you receive multiple copies, this is a repost, my
original bounced)

Hello Biopythonistas,

I have created a preliminary RPM package of the latest release of
Biopython (1.43) for Fedora as part of the "Fedora Package Collection"
(formerly "Fedora Extras" since Fedora Core+Fedora Extras are
merging).

(I am also packaging Bioperl; you can see some of my progress,
including links to the reviews, here:

http://fedoraproject.org/wiki/AlexLancaster)

I am almost ready to submit my package for review, but several issues
have arisen during the packaging that I hope the biopython list can
help clarify before I do so:

1) Will Biopython work OK with Python 2.5?  I ask because the next
   release of Fedora (Fedora 7) will only ship with Python 2.5 and
   packages first need to build in the development branch (which will
   eventually become Fedora 7).

2) The "python setup.py install" step appears to install a lot of
   scripts with the "#!/usr/bin/env python" at the top into the main
   /usr/lib/python2.4/site-packages/Bio/ namespace, e.g.:

   /usr/lib/python2.4/site-packages/Bio/GFF/GenericTools.py

   should these scripts be installed somewhere more appropriate such
   as /usr/bin/GenericTools.py or do they also function as classes as
   well as executables in their own right?  

   The "rpmlint" tool which is part of the packaging scans a package
   built for Fedora and identifies certain aspects of the package as
   not following the package and/or file system hierarchy (FHS)
   guidelines. [1]  

3) The setup.py install also installs some architecture-independent
   non-code data files (such as DTDs) which I would normally expect to
   live in /usr/share/python-biopython/DTDs (or somesuch) for example:

   /usr/lib/python2.4/site-packages/Bio/EUtils/DTDs/eSearch_020511.dtd

   Is this the normal location for these DTDs and does the rest of the
   Biopython framework expect to find these files in this location?

4) If possible, Fedora packages should run all unit tests provided in
   the upstream package at package time, just before creating the RPM.
   I would like to do this for biopython as well, but there doesn't
   seem to be an easy way to disable the PyUnit GUI that pops up and
   run in batch-only non-GUI mode.  I looked at the code in
   Tests/run_tests.py and it does have a "--no-gui" option, but there
   does not appear to be any way to run this from the top-level
   setup.py file, e.g.:

   python setup.py test --no-gui

   doesn't work.

5) My initial package depends on the required software: python, mx,
   python-numeric, as well as the optional python-reportlab,
   MySQL-python and flex which are all also included in Fedora, but I
   won't have Wise2 available since it is not yet in Fedora, at least
   not until I (or somebody else) packages Wise2.

6) Is Biopython-corba still active, and if so, should it also be
   packaged?  Are there any interdependencies with the base biopython
   package?  (No promises, though!)

Thanks,
Alex

[1] I attempted to attach the list at the end of the e-mail for the
    developers to identify and tell me if these files are OK where the
    setup.py currently puts them, but my original e-mail bounced
    probably because of the attachment.
--
Alex Lancaster, Ph.D. | Ecology & Evolutionary Biology, University of Arizona

From sbassi at gmail.com  Sun Apr  8 15:12:23 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Sun, 8 Apr 2007 16:12:23 -0300
Subject: [BioPython] Biopython package for Fedora
In-Reply-To: <274pnrcj28.fsf@delpy.biol.berkeley.edu>
References: <274pnrcj28.fsf@delpy.biol.berkeley.edu>
Message-ID: <b43bf2080704081212q3efa1bcfy23959c2a388b8e67@mail.gmail.com>

On 4/8/07, Alex Lancaster <alexl at users.sourceforge.net> wrote:
> 1) Will Biopython work OK with Python 2.5?  I ask because the next
>    release of Fedora (Fedora 7) will only ship with Python 2.5 and
>    packages first need to build in the development branch (which will
>    eventually become Fedora 7).

This is the only question I am able to answer. Yes, it does work with
Python 2.5.

From chris.lasher at gmail.com  Sun Apr  8 16:14:54 2007
From: chris.lasher at gmail.com (Chris Lasher)
Date: Sun, 8 Apr 2007 16:14:54 -0400
Subject: [BioPython] Biopython package for Fedora
In-Reply-To: <274pnrcj28.fsf@delpy.biol.berkeley.edu>
References: <274pnrcj28.fsf@delpy.biol.berkeley.edu>
Message-ID: <128a885f0704081314r490b7fbdj71d8b16612e8b54c@mail.gmail.com>

On 4/8/07, Alex Lancaster <alexl at users.sourceforge.net> wrote:
> 2) The "python setup.py install" step appears to install a lot of
>    scripts with the "#!/usr/bin/env python" at the top into the main
>    /usr/lib/python2.4/site-packages/Bio/ namespace, e.g.:
>
>    /usr/lib/python2.4/site-packages/Bio/GFF/GenericTools.py
>
>    should these scripts be installed somewhere more appropriate such
>    as /usr/bin/GenericTools.py or do they also function as classes as
>    well as executables in their own right?

The line
#!/usr/bin/env python

retrieves the appropriate Python installation as specified by the
user's defined environment. This is preferable to hard-coding
#!/usr/bin/python, which will always use the Python installation
pointed to by /usr/bin/python. For most users, this doesn't matter,
but if the user desires to use a local or custom installation of
Python, they must change all these scripts by hand to point to their
preferred Python install.

Say my distribution's Python is version 2.3 but I have installed a
local copy of version 2.5 which is symlinked at /usr/local/bin/python.
I can set /usr/local/bin/python ahead in my path and the scripts with
"#!/usr/bin/env python" will then execute with my preferred version
(2.5) of Python rather than the system version (2.3), but the scripts
with "#!/usr/bin/python" will execute with the system version (2.3)
rather than my preferred version (2.5). Web search for more details.
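
The PATH lookup described above can be sketched in present-day Python 3.
This is only an illustration, assuming that "env" starts the first
executable with the given name found on the PATH; the helper name
resolve_env_interpreter is made up for this example, and shutil.which
stands in for the search that "env" performs.

```python
import shutil


def resolve_env_interpreter(name="python", path=None):
    """Return the executable that '#!/usr/bin/env <name>' would start.

    `path` is a PATH-style string (directories joined by os.pathsep);
    None means the PATH of the current process. Returns None when no
    executable with that name is found.
    """
    return shutil.which(name, path=path)


if __name__ == "__main__":
    # Which interpreter would a '#!/usr/bin/env python' script get here?
    print(resolve_env_interpreter("python"))
```

Putting /usr/local/bin ahead of /usr/bin in the `path` argument changes
the result in the same way as reordering the shell's PATH, whereas a
script starting with "#!/usr/bin/python" never consults the PATH at all.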

Chris

From alexl at users.sourceforge.net  Sun Apr  8 18:28:06 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Sun, 08 Apr 2007 15:28:06 -0700
Subject: [BioPython] Biopython package for Fedora
Message-ID: <e4fy7abix5.fsf@delpy.biol.berkeley.edu>

>>>>> "CL" == Chris Lasher  writes:

CL> On 4/8/07, Alex Lancaster <alexl at users.sourceforge.net> wrote:
>> 2) The "python setup.py install" step appears to install a lot of
>> scripts with the "#!/usr/bin/env python" at the top into the main
>> /usr/lib/python2.4/site-packages/Bio/ namespace, e.g.:
>> 
>> /usr/lib/python2.4/site-packages/Bio/GFF/GenericTools.py
>> 
>> should these scripts be installed somewhere more appropriate such
>> as /usr/bin/GenericTools.py or do they also function as classes as
>> well as executables in their own right?

CL> The line #!/usr/bin/env python

CL> retrieves the appropriate Python installation as specified by the
CL> user's defined environment. 

[...]

I'm aware of the function of the "/usr/bin/env python"
vs. "/usr/bin/python", that isn't the problem.  My question was about
the *location* of the script files when installed in
/usr/lib/python2.4/site-packages/Bio/* vs. being installed as
executables in /usr/bin/.

It seems that there are a number of files which contain both classes
and scripts and rpmlint identifies all files containing scripts which
aren't installed in a location like /usr/bin/ to make sure that
scripts aren't unintentionally installed in a non-executable location.
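
To get a quick list of the affected files without running rpmlint, a
scan along the following lines could be used. This is a rough sketch in
present-day Python 3 and not rpmlint's actual check: it only reports .py
files whose first two bytes are "#!", and the function name
find_shebang_files is invented for this example.

```python
import os


def find_shebang_files(root):
    """Return sorted paths of .py files under `root` that start with '#!'."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".py"):
                continue
            path = os.path.join(dirpath, filename)
            # Read only the first two bytes; enough to recognise a shebang.
            with open(path, "rb") as handle:
                if handle.read(2) == b"#!":
                    hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Install location taken from the message above.
    for path in find_shebang_files("/usr/lib/python2.4/site-packages/Bio"):
        print(path)
```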

Alex

From alexl at users.sourceforge.net  Sun Apr  8 00:51:04 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Sat, 07 Apr 2007 21:51:04 -0700
Subject: [BioPython] Biopython package(s) for Fedora
Message-ID: <n0hcrrcvuv.fsf@delpy.biol.berkeley.edu>

An embedded and charset-unspecified text was scrubbed...
Name: biopython-rpmlint.txt
Url: http://lists.open-bio.org/pipermail/biopython/attachments/20070407/e9f04488/attachment.txt 

From chris.lasher at gmail.com  Wed Apr 11 00:43:14 2007
From: chris.lasher at gmail.com (Chris Lasher)
Date: Wed, 11 Apr 2007 00:43:14 -0400
Subject: [BioPython] Biopython package for Fedora
In-Reply-To: <274pnrcj28.fsf@delpy.biol.berkeley.edu>
References: <274pnrcj28.fsf@delpy.biol.berkeley.edu>
Message-ID: <128a885f0704102143j3697fe93keb6eb557da63e4fc@mail.gmail.com>

On 4/8/07, Alex Lancaster <alexl at users.sourceforge.net> wrote:
> 4) If possible, Fedora packages should run all unit tests provided in
>    the upstream package at package time, just before creating the RPM.
>    I would like to do this for biopython as well, but there doesn't
>    seem to be an easy way to disable the PyUnit GUI that pops up and
>    run in batch-only non-GUI mode.  I looked at the code in
>    Tests/run_tests.py and it does have a "--no-gui" option, but there
>    does not appear to be any way to run this from the top-level
>    setup.py file, e.g.:
>
>    python setup.py test --no-gui
>
>    doesn't work.

Alex, thanks for pointing this out. I sat down tonight and resolved this issue.

<http://bugzilla.open-bio.org/show_bug.cgi?id=2266>

The patch on there should be the fix needed. Save it as
setup_test.patch (or whatever, but that's convenient), place it in the
same directory as setup.py, and patch with the command

patch -p0 < setup_test.patch

Alternatively, I can send you the patched files (setup.py and
Tests/run_tests.py).

Thanks again for pointing this out.

Chris

From timmcilveen at talktalk.net  Wed Apr 11 10:15:52 2007
From: timmcilveen at talktalk.net (tim)
Date: Wed, 11 Apr 2007 15:15:52 +0100
Subject: [BioPython] installing on Mandriva Linux
Message-ID: <1176300953.3621.13.camel@localhost>

Hi,
I am getting lots of errors during the Biopython "setup.py install"
step. I am running Python 2.4.3 on Linux and have mxTextTools,
Numeric and headers etc. installed. The installation is definitely not
working, as I get a traceback error when I type some of the test code
such as:
from Bio.Seq import Seq

Can anyone help? I'm new to biopython and Linux. I have everything
working fine under Windows.


I get problems from this point onwards in the install, with lots of
Bio/Cluster/clustermodule errors:


Do you want to continue this installation? (Y/n)  Y

*** Bio.KDTree *** NOT built by default

The Bio.PDB.NeighborSearch module depends on the Bio.KDTree module,
which in turn, depends on C++ code that does not compile cleanly on
all platforms. Hence, Bio.KDTree is not built by default.

Would you like to build Bio.KDTree ? (y/N)  y

creating build/temp.linux-i686-2.4/Bio/Cluster
gcc -pthread -fno-strict-aliasing -DNDEBUG -O2 -g -pipe
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fomit-frame-pointer -march=i586
-mtune=pentiumpro -fasynchronous-unwind-tables -g -fPIC -IBio/Cluster
-I/usr/include/python2.4 -c Bio/Cluster/clustermodule.c -o
build/temp.linux-i686-2.4/Bio/Cluster/clustermodule.o
Bio/Cluster/clustermodule.c:2:33: error: Numeric/arrayobject.h: No such
file or directory
Bio/Cluster/clustermodule.c:20: error: expected declaration specifiers
or ?...? before ?PyArrayObject?
Bio/Cluster/clustermodule.c: In function ?parse_data?:
Bio/Cluster/clustermodule.c:27: error: ?array? undeclared (first use in
this function)
Bio/Cluster/clustermodule.c:27: error: (Each undeclared identifier is
reported only once
Bio/Cluster/clustermodule.c:27: error: for each function it appears in.)
Bio/Cluster/clustermodule.c:27: error: ?PyArrayObject? undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:27: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:35: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:44: error: ?PyArray_DOUBLE? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:45: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:84: error: expected ?)? before ?*? token
Bio/Cluster/clustermodule.c:98: error: expected declaration specifiers
or ?...? before ?PyArrayObject?
Bio/Cluster/clustermodule.c: In function ?parse_mask?:
Bio/Cluster/clustermodule.c:109: error: ?array? undeclared (first use in
this function)
Bio/Cluster/clustermodule.c:113: error: ?PyArrayObject? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:113: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:121: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:128: error: ?PyArray_INT? undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:130: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:178: error: expected ?)? before ?*? token
Bio/Cluster/clustermodule.c:191: error: expected declaration specifiers
or ?...? before ?PyArrayObject?
Bio/Cluster/clustermodule.c: In function ?parse_weight?:
Bio/Cluster/clustermodule.c:197: error: ?array? undeclared (first use in
this function)
Bio/Cluster/clustermodule.c:201: error: ?PyArrayObject? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:201: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:209: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:210: error: ?PyArray_DOUBLE? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:212: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:255: error: expected ?)? before ?*? token
Bio/Cluster/clustermodule.c:265: error: expected ?=?, ?,?, ?;?, ?asm? or
?__attribute__? before ?*? token
Bio/Cluster/clustermodule.c:372: error: expected declaration specifiers
or ?...? before ?PyArrayObject?
Bio/Cluster/clustermodule.c: In function ?parse_clusterid?:
Bio/Cluster/clustermodule.c:383: error: ?array? undeclared (first use in
this function)
Bio/Cluster/clustermodule.c:389: error: ?PyArrayObject? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:389: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:397: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:399: error: ?PyArray_INT? undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:401: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:471: error: expected ?)? before ?*? token
Bio/Cluster/clustermodule.c:482: error: expected declaration specifiers
or ?...? before ?PyArrayObject?
Bio/Cluster/clustermodule.c: In function ?free_distances?:
Bio/Cluster/clustermodule.c:485: error: ?array? undeclared (first use in
this function)
Bio/Cluster/clustermodule.c:489: error: ?PyArrayObject? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:489: error: ?a? undeclared (first use in
this function)
Bio/Cluster/clustermodule.c:489: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:515: error: expected declaration specifiers
or ?...? before ?PyArrayObject?
Bio/Cluster/clustermodule.c: In function ?parse_distance?:
Bio/Cluster/clustermodule.c:522: error: ?array? undeclared (first use in
this function)
Bio/Cluster/clustermodule.c:522: error: ?PyArrayObject? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:522: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:545: error: ?a? undeclared (first use in
this function)
Bio/Cluster/clustermodule.c:545: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:557: error: ?PyArray_DOUBLE? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:576: warning: assignment makes pointer from
integer without a cast
Bio/Cluster/clustermodule.c:584: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:601: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:628: warning: passing argument 3 of
?free_distances? makes integer from pointer without a cast
Bio/Cluster/clustermodule.c:628: error: too many arguments to function
?free_distances?
Bio/Cluster/clustermodule.c:637: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:640: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:716: error: expected declaration specifiers
or ?...? before ?PyArrayObject?
Bio/Cluster/clustermodule.c: In function ?create_celldata?:
Bio/Cluster/clustermodule.c:725: error: ?array? undeclared (first use in
this function)
Bio/Cluster/clustermodule.c:725: error: ?PyArrayObject? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:725: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:753: error: expected declaration specifiers
or ?...? before ?PyArrayObject?
Bio/Cluster/clustermodule.c: In function ?parse_index?:
Bio/Cluster/clustermodule.c:757: error: ?array? undeclared (first use in
this function)
Bio/Cluster/clustermodule.c:766: error: ?PyArrayObject? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:766: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:776: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:778: error: ?PyArray_INT? undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:780: warning: assignment makes pointer from
integer without a cast
Bio/Cluster/clustermodule.c:787: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:803: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:818: error: expected ?)? before ?*? token
Bio/Cluster/clustermodule.c: In function ?PyTree_cut?:
Bio/Cluster/clustermodule.c:1165: error: ?PyArrayObject? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:1165: error: ?aCLUSTERID? undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:1165: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:1181: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:1187: error: ?clusterid? undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:1197: warning: return makes pointer from
integer without a cast
Bio/Cluster/clustermodule.c: In function ?py_kcluster?:
Bio/Cluster/clustermodule.c:1312: error: ?PyArrayObject? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:1312: error: ?aDATA? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:1315: error: ?aMASK? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:1318: error: ?aWEIGHT? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:1325: error: ?aCLUSTERID? undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:1379: error: too many arguments to function
?parse_data?
Bio/Cluster/clustermodule.c:1384: error: too many arguments to function
?parse_mask?
Bio/Cluster/clustermodule.c:1416: error: too many arguments to function
?parse_weight?
Bio/Cluster/clustermodule.c: In function ?py_kmedoids?:
Bio/Cluster/clustermodule.c:1501: error: ?PyArrayObject? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:1501: error: ?aDISTANCES? undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:1504: error: ?aCLUSTERID? undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:1533: error: too many arguments to function
?parse_distance?
Bio/Cluster/clustermodule.c:1538: warning: passing argument 3 of
?free_distances? makes integer from pointer without a cast
Bio/Cluster/clustermodule.c:1538: error: too many arguments to function
?free_distances?
Bio/Cluster/clustermodule.c:1545: warning: passing argument 3 of
?free_distances? makes integer from pointer without a cast
Bio/Cluster/clustermodule.c:1545: error: too many arguments to function
?free_distances?
Bio/Cluster/clustermodule.c:1552: warning: passing argument 3 of
?free_distances? makes integer from pointer without a cast
Bio/Cluster/clustermodule.c:1552: error: too many arguments to function
?free_distances?
Bio/Cluster/clustermodule.c:1565: warning: passing argument 3 of
?free_distances? makes integer from pointer without a cast
Bio/Cluster/clustermodule.c:1565: error: too many arguments to function
?free_distances?
Bio/Cluster/clustermodule.c: In function ?py_treecluster?:
Bio/Cluster/clustermodule.c:1706: error: ?PyArrayObject? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:1706: error: ?aDATA? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:1707: error: ?aMASK? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:1708: error: ?aWEIGHT? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:1726: error: too many arguments to function
?parse_data?
Bio/Cluster/clustermodule.c:1733: error: too many arguments to function
?parse_mask?
Bio/Cluster/clustermodule.c:1739: error: too many arguments to function
?parse_weight?
Bio/Cluster/clustermodule.c:1762: error: ?aDISTANCEMATRIX? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:1770: error: too many arguments to function
?parse_distance?
Bio/Cluster/clustermodule.c:1783: warning: passing argument 3 of
?free_distances? makes integer from pointer without a cast
Bio/Cluster/clustermodule.c:1783: error: too many arguments to function
?free_distances?
Bio/Cluster/clustermodule.c: In function ?py_somcluster?:
Bio/Cluster/clustermodule.c:1849: error: ?PyArrayObject? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:1849: error: ?aDATA? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:1852: error: ?aMASK? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:1855: error: ?aWEIGHT? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:1863: error: ?aCELLDATA? undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:1865: error: ?aCLUSTERID? undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:1922: error: too many arguments to function
?parse_data?
Bio/Cluster/clustermodule.c:1929: error: too many arguments to function
?parse_mask?
Bio/Cluster/clustermodule.c:1935: error: too many arguments to function
?parse_weight?
Bio/Cluster/clustermodule.c:1944: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:1954: error: too many arguments to function
?create_celldata?
Bio/Cluster/clustermodule.c: In function ?py_median?:
Bio/Cluster/clustermodule.c:1996: error: ?PyArrayObject? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:1996: error: ?aDATA? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2007: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:2015: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:2018: error: ?PyArray_DOUBLE? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:2019: warning: initialization makes pointer
from integer without a cast
Bio/Cluster/clustermodule.c:2021: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:2037: warning: initialization makes pointer
from integer without a cast
Bio/Cluster/clustermodule.c:2043: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c: In function ?py_mean?:
Bio/Cluster/clustermodule.c:2062: error: ?PyArrayObject? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:2062: error: ?aDATA? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2073: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:2081: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:2084: error: ?PyArray_DOUBLE? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:2085: warning: initialization makes pointer
from integer without a cast
Bio/Cluster/clustermodule.c:2087: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:2103: warning: initialization makes pointer
from integer without a cast
Bio/Cluster/clustermodule.c:2109: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c: In function ?py_clusterdistance?:
Bio/Cluster/clustermodule.c:2167: error: ?PyArrayObject? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:2167: error: ?aDATA? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2170: error: ?aMASK? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2173: error: ?aWEIGHT? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2181: error: ?aINDEX1? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2184: error: ?aINDEX2? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2216: error: too many arguments to function
?parse_data?
Bio/Cluster/clustermodule.c:2222: error: too many arguments to function
?parse_mask?
Bio/Cluster/clustermodule.c:2228: error: too many arguments to function
?parse_weight?
Bio/Cluster/clustermodule.c:2235: error: too many arguments to function
?parse_index?
Bio/Cluster/clustermodule.c:2242: error: too many arguments to function
?parse_index?
Bio/Cluster/clustermodule.c: In function ?py_clustercentroids?:
Bio/Cluster/clustermodule.c:2312: error: ?PyArrayObject? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:2312: error: ?aDATA? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2315: error: ?aMASK? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2318: error: ?aCLUSTERID? undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:2322: error: ?aCDATA? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2324: error: ?aCMASK? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2350: error: too many arguments to function
?parse_data?
Bio/Cluster/clustermodule.c:2356: error: too many arguments to function
?parse_mask?
Bio/Cluster/clustermodule.c:2362: warning: passing argument 3 of
?parse_clusterid? makes pointer from integer without a cast
Bio/Cluster/clustermodule.c:2362: error: too many arguments to function
?parse_clusterid?
Bio/Cluster/clustermodule.c:2371: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c:2384: error: expected expression before ?)?
token
Bio/Cluster/clustermodule.c: In function ?py_distancematrix?:
Bio/Cluster/clustermodule.c:2466: error: ?PyArrayObject? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:2466: error: ?aDATA? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2469: error: ?aMASK? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2472: error: ?aWEIGHT? undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2507: error: too many arguments to function
?parse_data?
Bio/Cluster/clustermodule.c:2514: error: too many arguments to function
?parse_mask?
Bio/Cluster/clustermodule.c:2520: error: too many arguments to function
?parse_weight?
Bio/Cluster/clustermodule.c:2542: error: ?PyArray_DOUBLE? undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:2542: warning: initialization makes pointer
from integer without a cast
Bio/Cluster/clustermodule.c:2548: error: expected expression before ?)?
token
error: command 'gcc' failed with exit status 1
[tim at localhost biopython-1.43]$
[tim at localhost biopython-1.43]$      
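
The first error in the log is the missing Numeric/arrayobject.h header;
the later PyArrayObject and PyArray_DOUBLE errors appear to follow from
it. A minimal sketch, in present-day Python 3, of checking whether that
header is present under a given include directory is shown here. The
helper name has_numeric_header is made up for this example, and it
looks only in the one directory it is given, whereas gcc also searches
its default include paths.

```python
import os


def has_numeric_header(include_dir):
    """Return True if <include_dir>/Numeric/arrayobject.h exists."""
    header = os.path.join(include_dir, "Numeric", "arrayobject.h")
    return os.path.isfile(header)


if __name__ == "__main__":
    # /usr/include/python2.4 is the -I directory from the gcc command above.
    print(has_numeric_header("/usr/include/python2.4"))
```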


Thanks,
Tim


From alexl at users.sourceforge.net  Wed Apr 11 10:44:29 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 11 Apr 2007 07:44:29 -0700
Subject: [BioPython] Biopython package for Fedora
In-Reply-To: <128a885f0704102143j3697fe93keb6eb557da63e4fc@mail.gmail.com>
	(Chris Lasher's message of "Wed\, 11 Apr 2007 00\:43\:14 -0400")
References: <274pnrcj28.fsf@delpy.biol.berkeley.edu>
	<128a885f0704102143j3697fe93keb6eb557da63e4fc@mail.gmail.com>
Message-ID: <n4hcrnas36.fsf@delpy.biol.berkeley.edu>

>>>>> "CL" == Chris Lasher  writes:

[...]

CL> Alex, thanks for pointing this out. I sat down tonight and
CL> resolved this issue.

CL> <http://bugzilla.open-bio.org/show_bug.cgi?id=2266>

CL> The patch on there should be the fix needed. Save it as
CL> setup_test.patch (or whatever, but that's convenient), place it in
CL> the same directory as setup.py, and patch with the command

CL> patch -p0 < setup_test.patch

CL> Alternatively, I can send you the patched files (setup.py and
CL> Tests/run_tests.py).

CL> Thanks again for pointing this out.

Hi Chris,

Thanks, the patch works fine for me.  I've added the patch to the
package and I can now run the tests in command-line only mode fine.
By the way, I've filed my package review for Fedora:

https://bugzilla.redhat.com/235989 

if anybody wants to keep track of its progress.  I am currently still
disabling the tests because they hang for some reason on test_Cluster,
I get:

$ python setup.py test --no-gui
running test
test_Ace ... ok
test_BioSQL ... Skipping test because of import error: Skipping BioSQL
tests -- enable tests in Tests/test_BioSQL.py
ok
test_CAPS ... ok
test_Cluster ... 

then the CPU spins indefinitely.

Also I need to make sure that all tests that require network access
are skipped cleanly because the package build environment for Fedora
requires that all packages build without network access.

On another packaging note: I now remove all #!/usr/bin/ etc. from the
top of files found in the /usr/lib/python2.4/site-packages/Bio/* area
to keep rpmlint happy.  These can still be run using python directly
e.g.:

python /usr/lib/python2.4/site-packages/Bio/biblio.py
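
The shebang clean-up described above could be scripted along the
following lines.  This is only a minimal sketch, assuming each module is
a plain text file whose first line may start with "#!".  The helper name
strip_shebang is hypothetical; it is not part of the Fedora package or
of Biopython.

```python
def strip_shebang(source):
    """Return the source text without a leading '#!' interpreter line."""
    if source.startswith("#!"):
        # Drop everything up to and including the first newline.
        newline = source.find("\n")
        return "" if newline == -1 else source[newline + 1:]
    return source


example = "#!/usr/bin/env python\nimport sys\n"
print(strip_shebang(example))
```

Files without a main program would be handled the same way, since the
function only ever touches the first line.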

Note that there's a lot of inconsistency here: some are "/usr/bin/env
python", others are /usr/bin/python or even /usr/bin/python2.3, others
don't have a main program contained within, and so the #!/usr/bin line
should be removed completely.  Somebody should go through and
cleanup/rationalise the installation process: check that the files
installed when "python setup.py install" is run are appropriate .py
package files, e.g. the EUtils installs its own "setup.py" file in a
subdirectory, which isn't very clean.

Alex

From mdehoon at c2b2.columbia.edu  Wed Apr 11 11:44:30 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Wed, 11 Apr 2007 17:44:30 +0200
Subject: [BioPython] installing on Mandriva Linux
In-Reply-To: <1176300953.3621.13.camel@localhost>
References: <1176300953.3621.13.camel@localhost>
Message-ID: <461D025E.9070107@c2b2.columbia.edu>

tim wrote:

>I get problems from this point onwards in the install, with lots of
>Bio/Cluster/clustermodule errors:
>...
>creating build/temp.linux-i686-2.4/Bio/Cluster
>gcc -pthread -fno-strict-aliasing -DNDEBUG -O2 -g -pipe
>-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fomit-frame-pointer -march=i586
>-mtune=pentiumpro -fasynchronous-unwind-tables -g -fPIC -IBio/Cluster
>-I/usr/include/python2.4 -c Bio/Cluster/clustermodule.c -o
>build/temp.linux-i686-2.4/Bio/Cluster/clustermodule.o
>Bio/Cluster/clustermodule.c:2:33: error: Numeric/arrayobject.h: No such
>file or directory
>  
>
This is the first error message that you get. Did you check that you 
have the header file arrayobject.h? And is it in the correct location?
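
One way to check for the header from Python is sketched below.  It
assumes the header is expected below the Python include directory, as
the -I/usr/include/python2.4 option in the gcc command suggests.  The
function name find_header is hypothetical, and the sysconfig module used
here is the modern one (on Python 2.4 the include directory would come
from distutils.sysconfig.get_python_inc() instead).

```python
import os
import sysconfig


def find_header(relative_path, include_dirs=None):
    """Return the first existing path to relative_path, or None."""
    if include_dirs is None:
        # Directory holding Python.h; extension headers usually sit below it.
        include_dirs = [sysconfig.get_paths()["include"]]
    for directory in include_dirs:
        candidate = os.path.join(directory, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None


# Prints None when Numeric's header is not installed where the build looks.
print(find_header(os.path.join("Numeric", "arrayobject.h")))
```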

--Michiel


From jhortia1 at jhu.edu  Fri Apr 13 15:21:54 2007
From: jhortia1 at jhu.edu (JASON HORTIATIS)
Date: Fri, 13 Apr 2007 15:21:54 -0400
Subject: [BioPython] Local Blast Output
Message-ID: <f459cf8972a3.461fa012@johnshopkins.edu>

I'm an undergraduate using biopython to run local blast searches and I'm trying to find out how to save the entire sequence of each protein hit directly to a file.  I have only managed to be able to print the portion of the sequence that matches the query using hsp.sbjct[0:].  My goal is to use the search results from one blast run as a database to search against for a subsequent run so a fasta file is needed for each hit of the first run.
Thanks for the help!

Jason  

From sbassi at gmail.com  Sat Apr 14 00:14:20 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Sat, 14 Apr 2007 01:14:20 -0300
Subject: [BioPython] Local Blast Output
In-Reply-To: <f459cf8972a3.461fa012@johnshopkins.edu>
References: <f459cf8972a3.461fa012@johnshopkins.edu>
Message-ID: <b43bf2080704132114g7b018e8ax5da968eca1efc768@mail.gmail.com>

On 4/13/07, JASON HORTIATIS <jhortia1 at jhu.edu> wrote:
> I'm an undergraduate using biopython to run local blast searches and I'm trying to find out how to save the entire sequence of each protein hit directly to a file.  I have only managed to be able to print the portion of the sequence that matches the query using hsp.sbjct[0:].  My goal is to use the search results from one blast run as a database to search against for a subsequent run so a fasta file is needed for each hit of the first run.
> Thanks for the help!

You can only parse from the BLAST result what is inside the BLAST
output, and that output does not contain the whole sequence, just the
portion you've retrieved. You may need to parse the GID (the GI number)
of the protein and then look it up in your BLAST DB (using fastacmd).
Or you may use PSI-BLAST as an alternative.
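
A minimal sketch of that suggestion follows.  It assumes the hit titles
use the NCBI "gi|number|..." layout and that the legacy NCBI fastacmd
tool is installed, with -d naming the database and -s the identifier to
fetch.  The helper names, the database name "mydb" and the example title
are made up for illustration.

```python
import re

GI_PATTERN = re.compile(r"gi\|(\d+)\|")


def gi_from_title(title):
    """Return the GI number from an NCBI-style hit title, or None if absent."""
    match = GI_PATTERN.search(title)
    return match.group(1) if match else None


def fastacmd_args(database, gi):
    """Build the argument list for fetching one full sequence by GI."""
    return ["fastacmd", "-d", database, "-s", gi]


# Made-up hit title, only to show the expected layout.
title = ">gi|12345|ref|XP_000001.1| hypothetical protein"
gi = gi_from_title(title)
# Pass this list to subprocess.call() on a machine where fastacmd exists.
print(fastacmd_args("mydb", gi))
```

Collecting the FASTA text returned for every hit into one file would
give the database for the next run.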

From elventear at gmail.com  Tue Apr 17 13:52:40 2007
From: elventear at gmail.com (Pepe Barbe)
Date: Tue, 17 Apr 2007 12:52:40 -0500
Subject: [BioPython] Martel Help
Message-ID: <3e73596b0704171052g7ba3abb0uc04cbce3952d2bd2@mail.gmail.com>

Hello,

I am interested in using Martel for parsing some Biology formats (So
far nothing new).

While the module seems really good, I've been struggling to find some
meaningful documentation. So far I feel I am walking in the dark.
Still I've made some progress. If there is some tutorial or complete
documentation out there I would appreciate if someone would point me to
it.

My current question is the following. I have the impression that every
single line that the Martel parser is going to parse must be
recognized, and otherwise it will raise an Exception. Is this
correct? If it's true, how can I ignore anything that doesn't match a
RegEx and just process what matches?

Thanks,
Pepe

From elventear at gmail.com  Wed Apr 18 12:54:30 2007
From: elventear at gmail.com (Pepe Barbe)
Date: Wed, 18 Apr 2007 11:54:30 -0500
Subject: [BioPython] Martel Help
In-Reply-To: <3e73596b0704171052g7ba3abb0uc04cbce3952d2bd2@mail.gmail.com>
References: <3e73596b0704171052g7ba3abb0uc04cbce3952d2bd2@mail.gmail.com>
Message-ID: <3e73596b0704180954k752f9be9n6a4f4f46ea2c0435@mail.gmail.com>

Hello,

I've been reading the meager information available for Martel and I
have made good progress, I think. I am basically following the example
in the Exelixis presentation.

In the example, there are some things whose purpose is obvious but the
implementation details (or all the possible options) aren't. Currently
I am curious how Martel.HeaderFooter and Std.record affect the
parsing.

Later in that example they use: blat.format.make_iterator("record").
Where does the "record" come from? Because of using Std.record?

Any help would be deeply appreciated.

Pepe

From dalke at dalkescientific.com  Wed Apr 18 17:45:00 2007
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Wed, 18 Apr 2007 23:45:00 +0200
Subject: [BioPython] Martel Help
In-Reply-To: <3e73596b0704180954k752f9be9n6a4f4f46ea2c0435@mail.gmail.com>
References: <3e73596b0704171052g7ba3abb0uc04cbce3952d2bd2@mail.gmail.com>
	<3e73596b0704180954k752f9be9n6a4f4f46ea2c0435@mail.gmail.com>
Message-ID: <C55A75D7-E615-46DF-ADCE-3353650488D9@dalkescientific.com>

On Apr 18, 2007, at 6:54 PM, Pepe Barbe wrote:
> In the example, there are some things whose purpose is obvious but the
> implementation details (Or all the possible options) aren't. Currently
> I am curious on how does Martel.HeaderFooter and Std.record affect the
> parsing.

I'm having to think back several years now.

A limitation with Martel is parsing large data files.  It
has a memory overhead of several times the data file being
processed.  Eg, a 1 MB file might take 7 or so MB to process.

Most bioinformatics formats are composed of records.  Eg,
a GenBank file contains many GenBank records.  The idea of the
Header / Footer / HeaderFooter classes is to break the large
file down into small records, and only have the overhead for
parsing a record.

(But it doesn't help processing large records, like the
entire chromosome as a single FASTA record.)

In FASTA files there is no header or footer.  It can be
read and split up using a RecordReader.  Specifically with
a StartsWith record reader told to look for a ">" which
marks the start of a new record.  Compare to SwissProt
where the record ends with a "//" line.
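
The record-splitting idea can be sketched in plain Python.  This is not
Martel's RecordReader API, just an illustration of the two strategies;
the function names are hypothetical.

```python
def split_starts_with(lines, marker):
    """Yield records (lists of lines), each beginning at a marker line.

    Any lines before the first marker are yielded as a record of their own.
    """
    record = []
    for line in lines:
        if line.startswith(marker) and record:
            yield record
            record = []
        record.append(line)
    if record:
        yield record


def split_ends_with(lines, marker):
    """Yield records, each finishing at a line that starts with marker."""
    record = []
    for line in lines:
        record.append(line)
        if line.startswith(marker):
            yield record
            record = []
    if record:
        # Trailing lines without a terminator form a final record.
        yield record


fasta = [">a", "ACGT", ">b", "GG"]
swiss = ["ID   A", "//", "ID   B", "//"]
print(list(split_starts_with(fasta, ">")))
print(list(split_ends_with(swiss, "//")))
```

Because both functions are generators, only one record is held in
memory at a time, which is the point of splitting before parsing.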

Some formats are more complicated.  GenBank is one.  Real
genbank files start with a header, something like

GBGSS1.SEQ           Genetic Sequence Data Bank
                           February 15 2003

                 NCBI-GenBank Flat File Release 134.0

                            GSS Sequences (Part 1)

    88066 loci,    66600405 bases, from    88066 reported sequences


There needs to be a way to process a single, unique header,
followed by 0-or-more repeats of a record, followed by an
optional footer.

Use the HeaderFooter expression for this case.
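
The header / records / footer layout can be illustrated in plain Python
as well.  Again this is not the Martel HeaderFooter expression itself,
only a sketch under the assumption that records start with "LOCUS" and
end with "//"; the function name is hypothetical.  For brevity it
collects everything in lists, whereas a real reader would hand over one
record at a time.

```python
def split_header_records(lines, start="LOCUS", end="//"):
    """Return (header, records, footer) for a header/records/footer file."""
    header, records, trailing = [], [], []
    current = None
    for line in lines:
        if current is not None:
            # Inside a record: keep collecting until the end marker.
            current.append(line)
            if line.startswith(end):
                records.append(current)
                current = None
        elif line.startswith(start):
            current = [line]
            trailing = []  # lines between two records are discarded
        elif records:
            trailing.append(line)  # candidate footer lines
        else:
            header.append(line)  # everything before the first record
    if current is not None:
        records.append(current)  # unterminated final record
    return header, records, trailing


lines = ["GBGSS1.SEQ  Genetic Sequence Data Bank", "",
         "LOCUS A", "ORIGIN", "//", "LOCUS B", "//"]
header, records, footer = split_header_records(lines)
print(len(header), len(records), len(footer))
```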

In general, this is a clumsy solution.


Ignore the Std.record.  My thought was that the different terms
in the expression could be standardized.  For example, that
all sequences are tagged with "bio:seq".  I hoped this would
minimize the work needed to add a new format because most of
the handlers would look for expected tags, and not depend so
much on the actual structure of the XML.

It proved too complicated to explain and use.

> Later in that example they use: blat.format.make_iterator("record").
> Where does the "record" come from? Because of using Std.record?

The "record" comes from a group name used in the expression.
It describes the point where the repetition will be done.


				Andrew
				dalke at dalkescientific.com


From skhadar at gmail.com  Fri Apr 20 08:47:07 2007
From: skhadar at gmail.com (Shameer Khadar)
Date: Fri, 20 Apr 2007 18:17:07 +0530
Subject: [BioPython] Protparam using BioPython
Message-ID: <b6ff81950704200547t77a0edb1ycacf404116f8e655@mail.gmail.com>

Dear All,

I am looking for a script to run Protparam for a 1000 sequence. It will be
great if anyone can point me to a program / web page to get it done.

Many thanks in advance,
Shameer Khadar

From biopython at maubp.freeserve.co.uk  Fri Apr 20 09:51:54 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 20 Apr 2007 14:51:54 +0100
Subject: [BioPython] Protparam using BioPython
In-Reply-To: <b6ff81950704200547t77a0edb1ycacf404116f8e655@mail.gmail.com>
References: <b6ff81950704200547t77a0edb1ycacf404116f8e655@mail.gmail.com>
Message-ID: <4628C57A.7010803@maubp.freeserve.co.uk>

Shameer Khadar wrote:
> Dear All,
> 
> I am looking for a script to run Protparam for a 1000 sequence. It will be
> great if anyone can point me to a program / web page to get it done.

Do you mean the Biopython module Bio.SeqUtils.ProtParam, which does 
protein analysis (e.g. isoelectric point)?

Did you mean the Expasy ProtParam tool available online?  If you only 
have a few sequences doing them online by hand would be easy:
http://www.expasy.org/tools/protparam.html

Or did you mean something else?

Peter

P.S. did you mean 1000 different sequences, or a single 1000 amino acid 
sequence?


From skhadar at gmail.com  Fri Apr 20 11:19:01 2007
From: skhadar at gmail.com (Shameer Khadar)
Date: Fri, 20 Apr 2007 20:49:01 +0530
Subject: [BioPython] Protparam using BioPython
In-Reply-To: <4628C57A.7010803@maubp.freeserve.co.uk>
References: <b6ff81950704200547t77a0edb1ycacf404116f8e655@mail.gmail.com>
	<4628C57A.7010803@maubp.freeserve.co.uk>
Message-ID: <b6ff81950704200819h372f6296sfaa547113c8c2c5c@mail.gmail.com>

Dear Peter,

Thanks for your reply. I was looking for a script based on Bio.SeqUtils.
I got the following script from a website, and it works perfectly for me.
The problem is that I have around 1000 sequences (in raw format without
headers) and I thought I would process them using a foreach equivalent in
Python (I am a Python newbie). Only a couple of minutes ago I learned that
there is no foreach in Python, but that some better alternative is
available.  It would be great if you could help me process my file using
this program.

program :
from Bio.SeqUtils import ProtParam, ProtParamData
def PrintDictionary(MyDict):
        for i in MyDict.keys():
                print "%s\t%.2f" %(i, MyDict[i])
        print "MAEGEITTFTALTEKFNLPPGNYKKPKLLYCSNGGHFL"
X = ProtParam.ProteinAnalysis("")
print "Instability index of test protein: %.2f" % X.instability_index()

first few lines of my file :
AEGEFAHLYGTFRED
AEGEFAHLZGTFRED
AEGEFGATYGVYTSD
AEGEFGATZGVYTSD
AEGEFGATYGVZTSD
AEGEFGATZGVZTSD
AEGEFLYGEIQGTQD

Thank you once again,
Shameer


From alexl at users.sourceforge.net  Wed Apr 25 04:22:44 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 25 Apr 2007 01:22:44 -0700
Subject: [BioPython] Biopython packages now available for Fedora
Message-ID: <3kzm4w50dn.fsf@delpy.biol.berkeley.edu>

Hi all,

Fedora packages for Biopython are now available in the official Fedora
repositories.  Packages for Fedora Core 6 (FC-6) and Rawhide (the
soon-to-be Fedora 7) are available immediately and are installable via
the simple yum command:

# sudo yum install python-biopython

and through any other GUI based installers available for Fedora, such
as pirut, smart or yumex.  The name of the package is
python-biopython.  (A package for Fedora Core 5 has been built and
should be in the FC-5 repository within the next 24 hours or so).

These packages have all optional packages enabled by default:
MySQL-python, python-reportlab and Wise2.  Please file bugs on these
packages in Red Hat/Fedora bugzilla under "Fedora Extras":

https://bugzilla.redhat.com/bugzilla/

please choose your release and select the "python-biopython"
component.

If somebody could update the wiki page with this information, that
would be great: http://biopython.org/wiki/Download

Alex

From biopython at maubp.freeserve.co.uk  Fri Apr 27 05:55:42 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Apr 2007 10:55:42 +0100
Subject: [BioPython] Protparam using BioPython
In-Reply-To: <4628C57A.7010803@maubp.freeserve.co.uk>
References: <b6ff81950704200547t77a0edb1ycacf404116f8e655@mail.gmail.com>
	<4628C57A.7010803@maubp.freeserve.co.uk>
Message-ID: <4631C89E.3090208@maubp.freeserve.co.uk>

Shameer Khadar wrote:
> Dear Peter,
> 
> Thanks for your reply.

Sorry for the delay - I was away on a course this week.

 > I was looking for a script based on Bio.SeqUtils.
> I got the following script from a website, its working perfect for me. But
> the problem is i have around 1000 sequence (in raw format without headers)
> and i thought to process it using a foreach equivalent in python(I am a
> python newbie). But its only a couple of minutes back i came to know that
> there is no foreach in python, but some better alternative is available
> !!!.

There is a "for each" equivalent in python! 
http://docs.python.org/tut/node6.html

If you don't have a good introductory python book, that online tutorial 
is an excellent starting point.

 > It will be great if you can help to process my file using this
> program.
> 
> program :
> from Bio.SeqUtils import ProtParam, ProtParamData
> def PrintDictionary(MyDict):
>         for i in MyDict.keys():
>                 print "%s\t%.2f" %(i, MyDict[i])
>         print "MAEGEITTFTALTEKFNLPPGNYKKPKLLYCSNGGHFL"
> X = ProtParam.ProteinAnalysis("")
> print "Instability index of test protein: %.2f" % X.instability_index()

It seems like you have only given bits of a program, so I have tried to 
guess what you meant.

> first few lines of my file :
> AEGEFAHLYGTFRED
> AEGEFAHLZGTFRED
> AEGEFGATYGVYTSD
> AEGEFGATZGVYTSD
> AEGEFGATYGVZTSD
> AEGEFGATZGVZTSD
> AEGEFLYGEIQGTQD

In the following example, I am assuming your sequences are in a plain 
text file, called protparam.txt, which contains each sequence on a 
single line.

Try something like this first of all, and make sure that it prints out 
your sequences correctly:

for line in open("protparam.txt") :
     #Remove any trailing new lines or white space
     seq_string = line.rstrip()
     print "Sequence <%s>" % seq_string

Then try doing the ProtParam.ProteinAnalysis of each sequence string:

from Bio.SeqUtils import ProtParam, ProtParamData
for line in open("protparam.txt") :
     #Remove any trailing new lines or white space
     seq_string = line.rstrip()
     print "Sequence <%s>" % seq_string
     X = ProtParam.ProteinAnalysis(seq_string)
     print "Instability index: %.2f" % X.instability_index()

You'll find it doesn't like the "Z" (presumably this is Glx - glutamic 
acid or glutamine? i.e. E or Q) present in many of your sequences, so 
this next version uses error handling to note this and then carry on to 
the next sequence:

from Bio.SeqUtils import ProtParam, ProtParamData
for line in open("protparam.txt") :
     #Remove any trailing new lines or white space
     seq_string = line.rstrip()

     print #blank line
     print "Sequence <%s>" % seq_string
     X = ProtParam.ProteinAnalysis(seq_string)
     try :
         print "Instability index: %.2f" % X.instability_index()
     except KeyError, e :
         print "Problem with the letter %s in the sequence?" % str(e)

The output is:

Sequence <AEGEFAHLYGTFRED>
Instability index: 8.39

Sequence <AEGEFAHLZGTFRED>
Problem with the letter 'Z' in the sequence?

Sequence <AEGEFGATYGVYTSD>
Instability index: -17.70

Sequence <AEGEFGATZGVYTSD>
Problem with the letter 'Z' in the sequence?

Sequence <AEGEFGATYGVZTSD>
Problem with the letter 'Z' in the sequence?

Sequence <AEGEFGATZGVZTSD>
Problem with the letter 'Z' in the sequence?

Sequence <AEGEFLYGEIQGTQD>
Instability index: 8.61

You'll have to check yourself to see if these numbers are sensible.  I 
don't know what to suggest for your "Z" entries - the stability will be 
different if you try using E or Q instead.
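
One way to explore the "Z" entries is to generate every E/Q variant of a
sequence and analyse each in turn.  The sketch below only builds the
variants; the function name expand_glx is hypothetical and not part of
Biopython.

```python
from itertools import product


def expand_glx(sequence):
    """Return every sequence obtained by replacing each 'Z' with 'E' or 'Q'."""
    choices = [("E", "Q") if residue == "Z" else (residue,)
               for residue in sequence]
    return ["".join(variant) for variant in product(*choices)]


# Two Z residues give four variants.
print(expand_glx("AEGEFGATZGVZTSD"))
```

Each variant could then be passed to ProtParam.ProteinAnalysis exactly
as in the loop above, which would show the range of instability index
values a "Z" sequence could take.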

Peter


From biopython at maubp.freeserve.co.uk  Sat Apr 28 04:58:40 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Apr 2007 09:58:40 +0100
Subject: [BioPython] EMBL parsing in Biopython 1.43
Message-ID: <46330CC0.9060708@maubp.freeserve.co.uk>

As part of the new SeqIO system introduced in Biopython 1.43, I added 
the ability to read in EMBL format sequences.

http://biopython.org/wiki/SeqIO

I would be interested to hear feedback (positive or negative) from 
anyone who has tried to use this.

Peter


From alexl at users.sourceforge.net  Sat Apr 28 06:21:40 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Sat, 28 Apr 2007 03:21:40 -0700
Subject: [BioPython] Somebody vandalised the wiki download page
Message-ID: <7x1wi4re8b.fsf@delpy.biol.berkeley.edu>

I just created an account and fixed it with this edit:

http://biopython.org/w/index.php?title=Download&diff=1868&oldid=1867

Can somebody with sufficient admin privileges block user "Uzman"?

Thanks,
Alex

From cjfields at uiuc.edu  Sat Apr 28 09:53:37 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 28 Apr 2007 08:53:37 -0500
Subject: [BioPython] Somebody vandalised the wiki download page
In-Reply-To: <7x1wi4re8b.fsf@delpy.biol.berkeley.edu>
References: <7x1wi4re8b.fsf@delpy.biol.berkeley.edu>
Message-ID: <D6C3B50F-AF45-4B59-904E-2FE9D5FEC857@uiuc.edu>

Done.

chris

On Apr 28, 2007, at 5:21 AM, Alex Lancaster wrote:

> I just created an account and fixed it with this edit:
>
> http://biopython.org/w/index.php?title=Download&diff=1868&oldid=1867
>
> Can somebody with sufficient admin privileges block user "Uzman"?
>
> Thanks,
> Alex
> _______________________________________________
> BioPython mailing list  -  BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From mdehoon at c2b2.columbia.edu  Sun Apr 29 06:16:31 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 29 Apr 2007 19:16:31 +0900
Subject: [BioPython] EMBL parsing in Biopython 1.43
In-Reply-To: <46330CC0.9060708@maubp.freeserve.co.uk>
References: <46330CC0.9060708@maubp.freeserve.co.uk>
Message-ID: <4634707F.5060607@c2b2.columbia.edu>

Thanks Peter!

I tried this EMBL-formatted file (using the latest version of Biopython 
in CVS):

ftp://ftp.pasteur.fr/pub/GenomeDB/SubtiList/FlatFiles/SLR16.1_embl.txt

but I got this error message:

 >>> from Bio import SeqIO
 >>> input = open("SLR16.1_embl.txt")
 >>> records = SeqIO.parse(input, format="embl")
 >>> records.next()
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", line 410, in parse_records
     record = self.parse(handle)
   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", line 393, in parse
     if self.feed(handle, consumer) :
   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", line 360, in feed
     self._feed_first_line(consumer, self.line)
   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", line 540, in _feed_first_line
     assert len(fields) == 7
AssertionError
 >>>

Do you have an idea as to what may be going wrong here?

--Michiel.




From biopython at maubp.freeserve.co.uk  Sun Apr 29 16:02:05 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 29 Apr 2007 21:02:05 +0100
Subject: [BioPython] EMBL parsing in Biopython 1.43
In-Reply-To: <4634707F.5060607@c2b2.columbia.edu>
References: <46330CC0.9060708@maubp.freeserve.co.uk>
	<4634707F.5060607@c2b2.columbia.edu>
Message-ID: <4634F9BD.8070909@maubp.freeserve.co.uk>

Michiel de Hoon wrote:
> Thanks Peter!
> 
> I tried this EMBL-formatted file (using the latest version of Biopython 
> in CVS):
> 
> ftp://ftp.pasteur.fr/pub/GenomeDB/SubtiList/FlatFiles/SLR16.1_embl.txt
> 
> but I got this error message:
> 
>  >>> from Bio import SeqIO
>  >>> input = open("SLR16.1_embl.txt")
>  >>> records = SeqIO.parse(input, format="embl")
>  >>> records.next()
> Traceback (most recent call last):
...
> "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", 
> line 540, in _feed_first_line
>      assert len(fields) == 7
> AssertionError
>  >>>

It does the same here with CVS Biopython on Linux with Python 2.4.

> Do you have an idea as to what may be going wrong here?

Yes - I wrote an EMBL parser using the latest file format, while I 
suspect your file from the Pasteur Institute uses an older format - 
specifically one where the first line (the ID line) has a different 
number of fields.

This is reminiscent of the various revisions to the GenBank LOCUS line 
which we also have to cope with.

I hope to have a fix in CVS today/tomorrow.

Peter


From biopython at maubp.freeserve.co.uk  Sun Apr 29 18:11:07 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 29 Apr 2007 23:11:07 +0100
Subject: [BioPython] EMBL parsing in Biopython 1.43
In-Reply-To: <4634F9BD.8070909@maubp.freeserve.co.uk>
References: <46330CC0.9060708@maubp.freeserve.co.uk>	<4634707F.5060607@c2b2.columbia.edu>
	<4634F9BD.8070909@maubp.freeserve.co.uk>
Message-ID: <463517FB.9090706@maubp.freeserve.co.uk>

Peter wrote:
> Michiel de Hoon wrote:
>> Do you have an idea as to what may be going wrong here?
> 
> Yes - I wrote and EMBL parser using the latest file format, while I 
> suspect your file from the Pasteur Institute uses an older format - 
> specifically one where the first list (the ID line) has a different 
> number of fields.

The file you tried seems to use the pre 2006 style ID line.  I found 
another example like this on the BioPerl webpage.  See also:

http://www.ebi.ac.uk/embl/Documentation/archivedchanges.html

> I hope to have a fix in CVS today/tomorrow.

I have updated Bio/GenBank/Scanner.py to cope with these old EMBL ID 
lines and added another EMBL test case to test_SeqIO.py
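
The difference between the two ID line layouts can be illustrated by
counting the semicolon-separated fields.  This is only a sketch, not the
code in Bio/GenBank/Scanner.py; the function name and the two example
lines are illustrative assumptions.

```python
def embl_id_style(id_line):
    """Classify an EMBL ID line by its number of ';'-separated fields."""
    if not id_line.startswith("ID "):
        raise ValueError("not an EMBL ID line")
    body = id_line[2:].strip().rstrip(".")
    fields = [field.strip() for field in body.split(";")]
    if len(fields) == 7:
        # accession; SV n; topology; molecule; class; division; length
        return "new"
    if len(fields) == 4:
        # name and class; molecule; division; length
        return "old"
    return "unknown"


print(embl_id_style("ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP."))
print(embl_id_style("ID   SC10H5 standard; DNA; PRO; 4870 BP."))
```

A parser that asserts exactly seven fields, as in the traceback above,
would accept the first line and fail on the second.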

Your example now parses fine, giving a single SeqRecord as expected.  I 
have not checked the annotation or features...

Peter


From skhadar at gmail.com  Mon Apr 30 09:01:56 2007
From: skhadar at gmail.com (Shameer Khadar)
Date: Mon, 30 Apr 2007 18:31:56 +0530
Subject: [BioPython] Protparam using BioPython
In-Reply-To: <4631C89E.3090208@maubp.freeserve.co.uk>
References: <b6ff81950704200547t77a0edb1ycacf404116f8e655@mail.gmail.com>
	<4628C57A.7010803@maubp.freeserve.co.uk>
	<4631C89E.3090208@maubp.freeserve.co.uk>
Message-ID: <b6ff81950704300601w436f5837xb7a86a033a22f0d8@mail.gmail.com>

Dear Peter,

Thanks a lot for your detailed reply and splendid help!
It worked!
Cheers,
Shameer


From jhortia1 at jhu.edu  Mon Apr 30 16:16:42 2007
From: jhortia1 at jhu.edu (JASON HORTIATIS)
Date: Mon, 30 Apr 2007 16:16:42 -0400
Subject: [BioPython] local blast output
Message-ID: <f64fc8e94d78.4636166a@johnshopkins.edu>

Dear all,
I'm a novice using biopython to run local blast searches and save the output to a file, but I've run into a problem because it seems as though the b_parser has a limit of 250 sequences, whereas my searches are returning far more than 250 sequences.  Does anyone know if the parser really is limited, and if so, whether it is possible to work around this?
Thanks for the help, 

Jason

From sbassi at gmail.com  Mon Apr 30 17:26:50 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 30 Apr 2007 18:26:50 -0300
Subject: [BioPython] local blast output
In-Reply-To: <f64fc8e94d78.4636166a@johnshopkins.edu>
References: <f64fc8e94d78.4636166a@johnshopkins.edu>
Message-ID: <b43bf2080704301426p12b9d878weedc40e2dd98246b@mail.gmail.com>

On 4/30/07, JASON HORTIATIS <jhortia1 at jhu.edu> wrote:
> Dear all,
> I'm a novice using biopython to run local blast searches and save the output to a file, but i've run into a problem becuase it seems as though the b_parser has a limit of 250 sequences, however my searches are returning far more than 250 sequences.  Does anyone know if the parser really is limited, and if so if it is possible to work around this?
> Thanks for the help,

There is no 250 limit in the parser. Please show us the code so we can
help you. Also tell us your BLAST and Biopython versions.
Best,
SB.

-- 
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From nicolas.chauvat at logilab.fr  Mon Apr  2 15:58:49 2007
From: nicolas.chauvat at logilab.fr (Nicolas Chauvat)
Date: Mon, 2 Apr 2007 17:58:49 +0200
Subject: [BioPython] [ANN] EuroPython 2007: Call for Proposals
Message-ID: <20070402155849.GF24884@crater.logilab.fr>

Book Monday 9th July to Wednesday 11th July 2007 in your calendar!
EuroPython 2007, the European Python and Zope Conference, will be held in
Vilnius, Lithuania.  Last year's conference was a great success, featuring
a variety of tracks, amazing lightning talks and inspiring keynotes.  With
your participation, we want to make EuroPython 2007, the sixth EuroPython,
even more successful than the previous five.

Talks, Papers and Themes
------------------------

This year we have decided to borrow a few good ideas from PyCon, one of
which is to move away from the 'track' structure.  Instead, speakers are
invited to submit presentations about anything they have done that they
think would be of interest to the Python community.  We will then arrange
them into related groups and schedule them in the space available.  In the
past, EuroPython participants have found the following themes to be of
interest:

 * Science
 * Python Language and Libraries
 * Web Related Technologies
 * Education
 * Games
 * Agile Methodologies and Testing
 * Social Skills

In addition to talks, we will also accept full paper submissions about any
of the above themes.  The Call for Refereed Papers will be posted shortly.

The deadline for talk proposals is Friday 18th May at midnight (24:00
CEST, Central European Summer Time, UTC+2).

Other ways to participate
-------------------------

Apart from giving talks, there are plenty of other ways to participate in
the conference.  Just attending and talking to people you find here can be
satisfying enough, but there are three other kinds of activity you may wish
to plan for: Lightning Talks, Open Space and Sprints.  Lightning Talks are
very short talks that give you just enough time to introduce a topic or
project, Open Space is an area reserved for informal discussions, and
Sprints are focused gatherings for developers interested in particular
projects.  For more information please see the following pages:

 * Lightning Talks: http://www.europython.org/sections/events/lightning_talks
 * Open Space: http://www.europython.org/sections/events/open_space
 * Sprints: http://www.europython.org/sections/sprints_and_wiki

Your Contribution
-----------------

To propose a talk or a paper, go to...

 * http://www.europython.org/submit

For more general information on the conference, please visit...

 * http://www.europython.org/

Looking forward to seeing what you fine folk have been up to,

The EuroPython Team


-- 
Nicolas Chauvat

logilab.fr - services en informatique avancée et gestion de connaissances


From alexl at users.sourceforge.net  Sun Apr  8 09:27:27 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Sun, 08 Apr 2007 02:27:27 -0700
Subject: [BioPython] Biopython package for Fedora
Message-ID: <274pnrcj28.fsf@delpy.biol.berkeley.edu>

(Apologies if you receive multiple copies, this is a repost, my
original bounced)

Hello Biopythonistas,

I have created a preliminary RPM package of the latest release of
Biopython (1.43) for Fedora as part of the "Fedora Package Collection"
(formerly "Fedora Extras" since Fedora Core+Fedora Extras are
merging).  

(I am also packaging Bioperl; you can see some of my progress,
including links to the reviews here:

http://fedoraproject.org/wiki/AlexLancaster)

I am almost ready to submit my package for review, but several issues
have arisen during the packaging that I hope the biopython list can
help clarify before I do so:

1) Will Biopython work OK with Python 2.5?  I ask because the next
   release of Fedora (Fedora 7) will only ship with Python 2.5 and
   packages first need to build in the development branch (which will
   eventually become Fedora 7).

2) The "python setup.py install" step appears to install a lot of
   scripts with the "#!/usr/bin/env python" at the top into the main
   /usr/lib/python2.4/site-packages/Bio/ namespace, e.g.:

   /usr/lib/python2.4/site-packages/Bio/GFF/GenericTools.py

   should these scripts be installed somewhere more appropriate such
   as /usr/bin/GenericTools.py or do they also function as classes as
   well as executables in their own right?  

   The "rpmlint" tool which is part of the packaging scans a package
   built for Fedora and identifies certain aspects of the package as
   not following the package and/or file system hierarchy (FHS)
   guidelines. [1]  

3) The setup.py install also installs some architecture-independent
   non-code data files (such as DTDs) which I would normally expect to
   live in /usr/share/python-biopython/DTDs (or somesuch) for example:

   /usr/lib/python2.4/site-packages/Bio/EUtils/DTDs/eSearch_020511.dtd

   Is this the normal location for these DTDs and does the rest of the
   Biopython framework expect to find these files in this location?

4) If possible, Fedora packages should run all unit tests provided in
   the upstream package at package time, just before creating the RPM.
   I would like to do this for biopython as well, but there doesn't
   seem to be an easy way to disable the PyUnit GUI that pops up and
   run in batch-only non-GUI mode.  I looked at the code in
   Tests/run_tests.py and it does have a "--no-gui" option, but there
   does not appear to be any way to run this from the top-level
   setup.py file, e.g.:

   python setup.py test --no-gui

   doesn't work.

5) My initial package depends on the required software: python, mx,
   python-numeric, as well as the optional python-reportlab,
   MySQL-python and flex which are all also included in Fedora, but I
   won't have Wise2 available since it is not yet in Fedora, at least
   not until I (or somebody else) packages Wise2.

6) Is Biopython-corba still active, and if so, should it also be
   packaged?  Are there any interdependencies with the base biopython
   package?  (No promises, though!)

Thanks,
Alex

[1] I attempted to attach the list at the end of the e-mail for the
    developers to identify and tell me if these files are OK where the
    setup.py currently puts them, but my original e-mail bounced
    probably because of the attachment.
--
Alex Lancaster, Ph.D. | Ecology & Evolutionary Biology, University of Arizona


From sbassi at gmail.com  Sun Apr  8 19:12:23 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Sun, 8 Apr 2007 16:12:23 -0300
Subject: [BioPython] Biopython package for Fedora
In-Reply-To: <274pnrcj28.fsf@delpy.biol.berkeley.edu>
References: <274pnrcj28.fsf@delpy.biol.berkeley.edu>
Message-ID: <b43bf2080704081212q3efa1bcfy23959c2a388b8e67@mail.gmail.com>

On 4/8/07, Alex Lancaster <alexl at users.sourceforge.net> wrote:
> 1) Will Biopython work OK with Python 2.5?  I ask because the next
>    release of Fedora (Fedora 7) will only ship with Python 2.5 and
>    packages first need to build in the development branch (which will
>    eventually become Fedora 7).

This is the only question I am able to answer. Yes, it does work with
Python 2.5.


From chris.lasher at gmail.com  Sun Apr  8 20:14:54 2007
From: chris.lasher at gmail.com (Chris Lasher)
Date: Sun, 8 Apr 2007 16:14:54 -0400
Subject: [BioPython] Biopython package for Fedora
In-Reply-To: <274pnrcj28.fsf@delpy.biol.berkeley.edu>
References: <274pnrcj28.fsf@delpy.biol.berkeley.edu>
Message-ID: <128a885f0704081314r490b7fbdj71d8b16612e8b54c@mail.gmail.com>

On 4/8/07, Alex Lancaster <alexl at users.sourceforge.net> wrote:
> 2) The "python setup.py install" step appears to install a lot of
>    scripts with the "#!/usr/bin/env python" at the top into the main
>    /usr/lib/python2.4/site-packages/Bio/ namespace, e.g.:
>
>    /usr/lib/python2.4/site-packages/Bio/GFF/GenericTools.py
>
>    should these scripts be installed somewhere more appropriate such
>    as /usr/bin/GenericTools.py or do they also function as classes as
>    well as executables in their own right?

The line
#!/usr/bin/env python

retrieves the appropriate Python installation as specified by the
user's defined environment. This is preferable to hard-coding
#!/usr/bin/python, which will always use the Python installation
pointed to by /usr/bin/python. For most users, this doesn't matter,
but with a hard-coded path, a user who wants a local or custom
installation of Python must change all these scripts by hand to point
to their preferred Python install.

Say my distribution's Python is version 2.3 but I have installed a
local copy of version 2.5 which is symlinked at /usr/local/bin/python.
I can put /usr/local/bin ahead of /usr/bin in my PATH and the scripts
with "#!/usr/bin/env python" will then execute with my preferred
version (2.5) of Python rather than the system version (2.3), but the
scripts with "#!/usr/bin/python" will still execute with the system
version (2.3) rather than my preferred version (2.5). A web search
will turn up more details.
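
The PATH lookup described above can be sketched in a few lines. This
is only an illustration, assuming a modern Python 3 interpreter
(shutil.which does not exist in the Python 2.x versions discussed in
this thread). The helper name resolve_like_env is made up for the
example and is not part of Biopython.

```python
import os
import shutil


def resolve_like_env(command, search_dirs):
    """Return the first executable named `command` found in `search_dirs`.

    This mimics how "#!/usr/bin/env python" picks an interpreter: the
    directories are searched in order and the first match wins.
    `resolve_like_env` is a hypothetical helper used for illustration.
    """
    return shutil.which(command, path=os.pathsep.join(search_dirs))


if __name__ == "__main__":
    # With /usr/local/bin listed first, a Python installed there would be
    # chosen ahead of the system interpreter in /usr/bin.
    print(resolve_like_env("python", ["/usr/local/bin", "/usr/bin"]))
    # A hard-coded "#!/usr/bin/python" line skips this search entirely.
```

Swapping the order of the two directories changes which interpreter
is returned, which is the whole difference between the two shebang
styles.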

Chris


From alexl at users.sourceforge.net  Sun Apr  8 22:28:06 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Sun, 08 Apr 2007 15:28:06 -0700
Subject: [BioPython] Biopython package for Fedora
Message-ID: <e4fy7abix5.fsf@delpy.biol.berkeley.edu>

>>>>> "CL" == Chris Lasher  writes:

CL> On 4/8/07, Alex Lancaster <alexl at users.sourceforge.net> wrote:
>> 2) The "python setup.py install" step appears to install a lot of
>> scripts with the "#!/usr/bin/env python" at the top into the main
>> /usr/lib/python2.4/site-packages/Bio/ namespace, e.g.:
>> 
>> /usr/lib/python2.4/site-packages/Bio/GFF/GenericTools.py
>> 
>> should these scripts be installed somewhere more appropriate such
>> as /usr/bin/GenericTools.py or do they also function as classes as
>> well as executables in their own right?

CL> The line #!/usr/bin/env python

CL> retrieves the appropriate Python installation as specified by the
CL> user's defined environment. 

[...]

I'm aware of the difference between "/usr/bin/env python" and
"/usr/bin/python"; that isn't the problem.  My question was about
the *location* of the script files when installed in
/usr/lib/python2.4/site-packages/Bio/* vs. being installed as
executables in /usr/bin/.

It seems that a number of files contain both classes and scripts.
rpmlint flags every file containing a script that isn't installed in
a location like /usr/bin/, to make sure that scripts aren't
unintentionally installed in a non-executable location.
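
A rough way to reproduce that kind of report locally is sketched
below. It assumes the check amounts to "the file starts with a #!
line but is not marked executable"; that is an approximation for
illustration, not rpmlint's actual implementation, and the function
name find_nonexecutable_scripts is hypothetical. It is written for a
modern Python 3 interpreter.

```python
import os
import stat


def find_nonexecutable_scripts(root):
    """Return sorted paths under `root` that start with "#!" but have no execute bit.

    Approximates the kind of check described above; this is an assumption
    about the rule, not a copy of what rpmlint does.
    """
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    has_shebang = handle.read(2) == b"#!"
                mode = os.stat(path).st_mode
            except OSError:
                continue  # unreadable file: skip rather than guess
            executable_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
            if has_shebang and not (mode & executable_bits):
                flagged.append(path)
    return sorted(flagged)
```

Calling find_nonexecutable_scripts on the installed Bio/ directory
would list files such as Bio/GFF/GenericTools.py if they carry a
shebang line without execute permission.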

Alex


From alexl at users.sourceforge.net  Sun Apr  8 04:51:04 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Sat, 07 Apr 2007 21:51:04 -0700
Subject: [BioPython] Biopython package(s) for Fedora
Message-ID: <n0hcrrcvuv.fsf@delpy.biol.berkeley.edu>

Hello Biopythonistas,

I have created a preliminary RPM package of the latest release of
Biopython (1.43) for Fedora as part of the "Fedora Package Collection"
(formerly "Fedora Extras" since Fedora Core+Fedora Extras are
merging).  

(I am also packaging Bioperl; you can see some of my progress,
including links to the reviews here:

http://fedoraproject.org/wiki/AlexLancaster)

I am almost ready to submit my package for review, but several issues
have arisen during the packaging that I hope the biopython list can
help clarify before I do so:

1) Will Biopython work OK with Python 2.5?  I ask because the next
   release of Fedora (Fedora 7) will only ship with Python 2.5 and
   packages first need to build in the development branch (which will
   eventually become Fedora 7).

2) The "python setup.py install" step appears to install a lot of
   scripts with the "#!/usr/bin/env python" at the top into the main
   /usr/lib/python2.4/site-packages/Bio/ namespace, e.g.:

   /usr/lib/python2.4/site-packages/Bio/GFF/GenericTools.py

   should these scripts be installed somewhere more appropriate such
   as /usr/bin/GenericTools.py or do they also function as classes as
   well as executables in their own right?  

   The "rpmlint" tool which is part of the packaging scans a package
   built for Fedora and identifies certain aspects of the package as
   not following the package and/or file system hierarchy (FHS)
   guidelines and I attach the list at the end of the e-mail for the
   developers to identify and tell me if these files are OK where
   the setup.py currently puts them.

3) The setup.py install also installs some architecture-independent
   non-code data files (such as DTDs) which I would normally expect to
   live in /usr/share/python-biopython/DTDs (or somesuch) for example:

   /usr/lib/python2.4/site-packages/Bio/EUtils/DTDs/eSearch_020511.dtd

   Is this the normal location for these DTDs and does the rest of the
   Biopython framework expect to find these files in this location?

4) If possible, Fedora packages should run all unit tests provided in
   the upstream package at package time, just before creating the RPM.
   I would like to do this for biopython as well, but there doesn't
   seem to be an easy way to disable the PyUnit GUI that pops up and
   run in batch-only non-GUI mode.  I looked at the code in
   Tests/run_tests.py and it does have a "--no-gui" option, but there
   does not appear to be any way to run this from the top-level
   setup.py file, e.g.:

   python setup.py test --no-gui

   doesn't work.

5) My initial package depends on the required software: python, mx,
   python-numeric, as well as the optional python-reportlab,
   MySQL-python and flex which are all also included in Fedora, but I
   won't have Wise2 available since it is not yet in Fedora, at least
   not until I (or somebody else) packages Wise2.

6) Is Biopython-corba still active, and if so, should it also be
   packaged?  Are there any interdependencies with the base biopython
   package?  (No promises, though!)

Thanks,
Alex
--
Alex Lancaster, Ph.D. | Ecology & Evolutionary Biology, University of Arizona

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: biopython-rpmlint.txt
URL: <http://lists.open-bio.org/pipermail/biopython/attachments/20070407/e9f04488/attachment-0002.txt>

From chris.lasher at gmail.com  Wed Apr 11 04:43:14 2007
From: chris.lasher at gmail.com (Chris Lasher)
Date: Wed, 11 Apr 2007 00:43:14 -0400
Subject: [BioPython] Biopython package for Fedora
In-Reply-To: <274pnrcj28.fsf@delpy.biol.berkeley.edu>
References: <274pnrcj28.fsf@delpy.biol.berkeley.edu>
Message-ID: <128a885f0704102143j3697fe93keb6eb557da63e4fc@mail.gmail.com>

On 4/8/07, Alex Lancaster <alexl at users.sourceforge.net> wrote:
> 4) If possible, Fedora packages should run all unit tests provided in
>    the upstream package at package time, just before creating the RPM.
>    I would like to do this for biopython as well, but there doesn't
>    seem to be an easy way to disable the PyUnit GUI that pops up and
>    run in batch-only non-GUI mode.  I looked at the code in
>    Tests/run_tests.py and it does have a "--no-gui" option, but there
>    does not appear to be any way to run this from the top-level
>    setup.py file, e.g.:
>
>    python setup.py test --no-gui
>
>    doesn't work.

Alex, thanks for pointing this out. I sat down tonight and resolved this issue.

<http://bugzilla.open-bio.org/show_bug.cgi?id=2266>

The patch on there should be the fix needed. Save it as
setup_test.patch (or whatever, but that's convenient), place it in the
same directory as setup.py, and patch with the command

patch -p0 < setup_test.patch

Alternatively, I can send you the patched files (setup.py and
Tests/run_tests.py).

Thanks again for pointing this out.

Chris


From timmcilveen at talktalk.net  Wed Apr 11 14:15:52 2007
From: timmcilveen at talktalk.net (tim)
Date: Wed, 11 Apr 2007 15:15:52 +0100
Subject: [BioPython] installing on Mandriva Linux
Message-ID: <1176300953.3621.13.camel@localhost>

Hi,
I am getting lots of errors when installing Biopython with "python
setup.py install". I am running Python 2.4.3 on Linux and have
mxTextTools, Numeric, headers etc. installed. The installation is
definitely not working, as I get a traceback when I type some of the
test code, such as:
from Bio.Seq import Seq

Can anyone help? I'm new to Biopython and Linux. I have everything
working fine under Windows.


I get problems from this point onwards in the install, with lots of
Bio/Cluster/clustermodule errors:


Do you want to continue this installation? (Y/n)  Y

*** Bio.KDTree *** NOT built by default

The Bio.PDB.NeighborSearch module depends on the Bio.KDTree module,
which in turn, depends on C++ code that does not compile cleanly on
all platforms. Hence, Bio.KDTree is not built by default.

Would you like to build Bio.KDTree ? (y/N)  y

creating build/temp.linux-i686-2.4/Bio/Cluster
gcc -pthread -fno-strict-aliasing -DNDEBUG -O2 -g -pipe
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fomit-frame-pointer -march=i586
-mtune=pentiumpro -fasynchronous-unwind-tables -g -fPIC -IBio/Cluster
-I/usr/include/python2.4 -c Bio/Cluster/clustermodule.c -o
build/temp.linux-i686-2.4/Bio/Cluster/clustermodule.o
Bio/Cluster/clustermodule.c:2:33: error: Numeric/arrayobject.h: No such
file or directory
Bio/Cluster/clustermodule.c:20: error: expected declaration specifiers
or ‘...’ before ‘PyArrayObject’
Bio/Cluster/clustermodule.c: In function ‘parse_data’:
Bio/Cluster/clustermodule.c:27: error: ‘array’ undeclared (first use in
this function)
Bio/Cluster/clustermodule.c:27: error: (Each undeclared identifier is
reported only once
Bio/Cluster/clustermodule.c:27: error: for each function it appears in.)
Bio/Cluster/clustermodule.c:27: error: ‘PyArrayObject’ undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:27: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:35: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:44: error: ‘PyArray_DOUBLE’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:45: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:84: error: expected ‘)’ before ‘*’ token
Bio/Cluster/clustermodule.c:98: error: expected declaration specifiers
or ‘...’ before ‘PyArrayObject’
Bio/Cluster/clustermodule.c: In function ‘parse_mask’:
Bio/Cluster/clustermodule.c:109: error: ‘array’ undeclared (first use in
this function)
Bio/Cluster/clustermodule.c:113: error: ‘PyArrayObject’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:113: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:121: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:128: error: ‘PyArray_INT’ undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:130: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:178: error: expected ‘)’ before ‘*’ token
Bio/Cluster/clustermodule.c:191: error: expected declaration specifiers
or ‘...’ before ‘PyArrayObject’
Bio/Cluster/clustermodule.c: In function ‘parse_weight’:
Bio/Cluster/clustermodule.c:197: error: ‘array’ undeclared (first use in
this function)
Bio/Cluster/clustermodule.c:201: error: ‘PyArrayObject’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:201: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:209: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:210: error: ‘PyArray_DOUBLE’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:212: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:255: error: expected ‘)’ before ‘*’ token
Bio/Cluster/clustermodule.c:265: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or
‘__attribute__’ before ‘*’ token
Bio/Cluster/clustermodule.c:372: error: expected declaration specifiers
or ‘...’ before ‘PyArrayObject’
Bio/Cluster/clustermodule.c: In function ‘parse_clusterid’:
Bio/Cluster/clustermodule.c:383: error: ‘array’ undeclared (first use in
this function)
Bio/Cluster/clustermodule.c:389: error: ‘PyArrayObject’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:389: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:397: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:399: error: ‘PyArray_INT’ undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:401: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:471: error: expected ‘)’ before ‘*’ token
Bio/Cluster/clustermodule.c:482: error: expected declaration specifiers
or ‘...’ before ‘PyArrayObject’
Bio/Cluster/clustermodule.c: In function ‘free_distances’:
Bio/Cluster/clustermodule.c:485: error: ‘array’ undeclared (first use in
this function)
Bio/Cluster/clustermodule.c:489: error: ‘PyArrayObject’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:489: error: ‘a’ undeclared (first use in
this function)
Bio/Cluster/clustermodule.c:489: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:515: error: expected declaration specifiers
or ‘...’ before ‘PyArrayObject’
Bio/Cluster/clustermodule.c: In function ‘parse_distance’:
Bio/Cluster/clustermodule.c:522: error: ‘array’ undeclared (first use in
this function)
Bio/Cluster/clustermodule.c:522: error: ‘PyArrayObject’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:522: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:545: error: ‘a’ undeclared (first use in
this function)
Bio/Cluster/clustermodule.c:545: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:557: error: ‘PyArray_DOUBLE’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:576: warning: assignment makes pointer from
integer without a cast
Bio/Cluster/clustermodule.c:584: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:601: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:628: warning: passing argument 3 of
‘free_distances’ makes integer from pointer without a cast
Bio/Cluster/clustermodule.c:628: error: too many arguments to function
‘free_distances’
Bio/Cluster/clustermodule.c:637: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:640: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:716: error: expected declaration specifiers
or ‘...’ before ‘PyArrayObject’
Bio/Cluster/clustermodule.c: In function ‘create_celldata’:
Bio/Cluster/clustermodule.c:725: error: ‘array’ undeclared (first use in
this function)
Bio/Cluster/clustermodule.c:725: error: ‘PyArrayObject’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:725: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:753: error: expected declaration specifiers
or ‘...’ before ‘PyArrayObject’
Bio/Cluster/clustermodule.c: In function ‘parse_index’:
Bio/Cluster/clustermodule.c:757: error: ‘array’ undeclared (first use in
this function)
Bio/Cluster/clustermodule.c:766: error: ‘PyArrayObject’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:766: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:776: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:778: error: ‘PyArray_INT’ undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:780: warning: assignment makes pointer from
integer without a cast
Bio/Cluster/clustermodule.c:787: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:803: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:818: error: expected ‘)’ before ‘*’ token
Bio/Cluster/clustermodule.c: In function ‘PyTree_cut’:
Bio/Cluster/clustermodule.c:1165: error: ‘PyArrayObject’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:1165: error: ‘aCLUSTERID’ undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:1165: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:1181: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:1187: error: ‘clusterid’ undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:1197: warning: return makes pointer from
integer without a cast
Bio/Cluster/clustermodule.c: In function ‘py_kcluster’:
Bio/Cluster/clustermodule.c:1312: error: ‘PyArrayObject’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:1312: error: ‘aDATA’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:1315: error: ‘aMASK’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:1318: error: ‘aWEIGHT’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:1325: error: ‘aCLUSTERID’ undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:1379: error: too many arguments to function
‘parse_data’
Bio/Cluster/clustermodule.c:1384: error: too many arguments to function
‘parse_mask’
Bio/Cluster/clustermodule.c:1416: error: too many arguments to function
‘parse_weight’
Bio/Cluster/clustermodule.c: In function ‘py_kmedoids’:
Bio/Cluster/clustermodule.c:1501: error: ‘PyArrayObject’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:1501: error: ‘aDISTANCES’ undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:1504: error: ‘aCLUSTERID’ undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:1533: error: too many arguments to function
‘parse_distance’
Bio/Cluster/clustermodule.c:1538: warning: passing argument 3 of
‘free_distances’ makes integer from pointer without a cast
Bio/Cluster/clustermodule.c:1538: error: too many arguments to function
‘free_distances’
Bio/Cluster/clustermodule.c:1545: warning: passing argument 3 of
‘free_distances’ makes integer from pointer without a cast
Bio/Cluster/clustermodule.c:1545: error: too many arguments to function
‘free_distances’
Bio/Cluster/clustermodule.c:1552: warning: passing argument 3 of
‘free_distances’ makes integer from pointer without a cast
Bio/Cluster/clustermodule.c:1552: error: too many arguments to function
‘free_distances’
Bio/Cluster/clustermodule.c:1565: warning: passing argument 3 of
‘free_distances’ makes integer from pointer without a cast
Bio/Cluster/clustermodule.c:1565: error: too many arguments to function
‘free_distances’
Bio/Cluster/clustermodule.c: In function ‘py_treecluster’:
Bio/Cluster/clustermodule.c:1706: error: ‘PyArrayObject’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:1706: error: ‘aDATA’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:1707: error: ‘aMASK’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:1708: error: ‘aWEIGHT’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:1726: error: too many arguments to function
‘parse_data’
Bio/Cluster/clustermodule.c:1733: error: too many arguments to function
‘parse_mask’
Bio/Cluster/clustermodule.c:1739: error: too many arguments to function
‘parse_weight’
Bio/Cluster/clustermodule.c:1762: error: ‘aDISTANCEMATRIX’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:1770: error: too many arguments to function
‘parse_distance’
Bio/Cluster/clustermodule.c:1783: warning: passing argument 3 of
‘free_distances’ makes integer from pointer without a cast
Bio/Cluster/clustermodule.c:1783: error: too many arguments to function
‘free_distances’
Bio/Cluster/clustermodule.c: In function ‘py_somcluster’:
Bio/Cluster/clustermodule.c:1849: error: ‘PyArrayObject’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:1849: error: ‘aDATA’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:1852: error: ‘aMASK’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:1855: error: ‘aWEIGHT’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:1863: error: ‘aCELLDATA’ undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:1865: error: ‘aCLUSTERID’ undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:1922: error: too many arguments to function
‘parse_data’
Bio/Cluster/clustermodule.c:1929: error: too many arguments to function
‘parse_mask’
Bio/Cluster/clustermodule.c:1935: error: too many arguments to function
‘parse_weight’
Bio/Cluster/clustermodule.c:1944: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:1954: error: too many arguments to function
‘create_celldata’
Bio/Cluster/clustermodule.c: In function ‘py_median’:
Bio/Cluster/clustermodule.c:1996: error: ‘PyArrayObject’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:1996: error: ‘aDATA’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2007: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:2015: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:2018: error: ‘PyArray_DOUBLE’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:2019: warning: initialization makes pointer
from integer without a cast
Bio/Cluster/clustermodule.c:2021: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:2037: warning: initialization makes pointer
from integer without a cast
Bio/Cluster/clustermodule.c:2043: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c: In function ‘py_mean’:
Bio/Cluster/clustermodule.c:2062: error: ‘PyArrayObject’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:2062: error: ‘aDATA’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2073: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:2081: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:2084: error: ‘PyArray_DOUBLE’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:2085: warning: initialization makes pointer
from integer without a cast
Bio/Cluster/clustermodule.c:2087: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:2103: warning: initialization makes pointer
from integer without a cast
Bio/Cluster/clustermodule.c:2109: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c: In function ‘py_clusterdistance’:
Bio/Cluster/clustermodule.c:2167: error: ‘PyArrayObject’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:2167: error: ‘aDATA’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2170: error: ‘aMASK’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2173: error: ‘aWEIGHT’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2181: error: ‘aINDEX1’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2184: error: ‘aINDEX2’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2216: error: too many arguments to function
‘parse_data’
Bio/Cluster/clustermodule.c:2222: error: too many arguments to function
‘parse_mask’
Bio/Cluster/clustermodule.c:2228: error: too many arguments to function
‘parse_weight’
Bio/Cluster/clustermodule.c:2235: error: too many arguments to function
‘parse_index’
Bio/Cluster/clustermodule.c:2242: error: too many arguments to function
‘parse_index’
Bio/Cluster/clustermodule.c: In function ‘py_clustercentroids’:
Bio/Cluster/clustermodule.c:2312: error: ‘PyArrayObject’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:2312: error: ‘aDATA’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2315: error: ‘aMASK’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2318: error: ‘aCLUSTERID’ undeclared (first
use in this function)
Bio/Cluster/clustermodule.c:2322: error: ‘aCDATA’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2324: error: ‘aCMASK’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2350: error: too many arguments to function
‘parse_data’
Bio/Cluster/clustermodule.c:2356: error: too many arguments to function
‘parse_mask’
Bio/Cluster/clustermodule.c:2362: warning: passing argument 3 of
‘parse_clusterid’ makes pointer from integer without a cast
Bio/Cluster/clustermodule.c:2362: error: too many arguments to function
‘parse_clusterid’
Bio/Cluster/clustermodule.c:2371: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:2384: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c: In function ‘py_distancematrix’:
Bio/Cluster/clustermodule.c:2466: error: ‘PyArrayObject’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:2466: error: ‘aDATA’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2469: error: ‘aMASK’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2472: error: ‘aWEIGHT’ undeclared (first use
in this function)
Bio/Cluster/clustermodule.c:2507: error: too many arguments to function
‘parse_data’
Bio/Cluster/clustermodule.c:2514: error: too many arguments to function
‘parse_mask’
Bio/Cluster/clustermodule.c:2520: error: too many arguments to function
‘parse_weight’
Bio/Cluster/clustermodule.c:2542: error: ‘PyArray_DOUBLE’ undeclared
(first use in this function)
Bio/Cluster/clustermodule.c:2542: warning: initialization makes pointer
from integer without a cast
Bio/Cluster/clustermodule.c:2548: error: expected expression before ‘)’
token
error: command 'gcc' failed with exit status 1
[tim at localhost biopython-1.43]$
[tim at localhost biopython-1.43]$      


Thanks,
Tim


From alexl at users.sourceforge.net  Wed Apr 11 14:44:29 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 11 Apr 2007 07:44:29 -0700
Subject: [BioPython] Biopython package for Fedora
In-Reply-To: <128a885f0704102143j3697fe93keb6eb557da63e4fc@mail.gmail.com>
	(Chris Lasher's message of "Wed\, 11 Apr 2007 00\:43\:14 -0400")
References: <274pnrcj28.fsf@delpy.biol.berkeley.edu>
	<128a885f0704102143j3697fe93keb6eb557da63e4fc@mail.gmail.com>
Message-ID: <n4hcrnas36.fsf@delpy.biol.berkeley.edu>

>>>>> "CL" == Chris Lasher  writes:

[...]

CL> Alex, thanks for pointing this out. I sat down tonight and
CL> resolved this issue.

CL> <http://bugzilla.open-bio.org/show_bug.cgi?id=2266>

CL> The patch on there should be the fix needed. Save it as
CL> setup_test.patch (or whatever, but that's convenient), place it in
CL> the same directory as setup.py, and patch with the command

CL> patch -p0 < setup_test.patch

CL> Alternatively, I can send you the patched files (setup.py and
CL> Tests/run_tests.py).

CL> Thanks again for pointing this out.

Hi Chris,

Thanks, the patch works fine for me.  I've added the patch to the
package and I can now run the tests in command-line only mode fine.
By the way, I've filed my package review for Fedora:

https://bugzilla.redhat.com/235989 

if anybody wants to keep track of its progress.  I am currently still
disabling the tests because they hang for some reason on test_Cluster.
I get:

$ python setup.py test --no-gui
running test
test_Ace ... ok
test_BioSQL ... Skipping test because of import error: Skipping BioSQL
tests -- enable tests in Tests/test_BioSQL.py
ok
test_CAPS ... ok
test_Cluster ... 

then the CPU spins indefinitely.

Also I need to make sure that all tests that require network access
are skipped cleanly because the package build environment for Fedora
requires that all packages build without network access.

On another packaging note: I now remove all #!/usr/bin/ etc. from the
top of files found in the /usr/lib/python2.4/site-packages/Bio/* area
to keep rpmlint happy.  These can still be run using python directly
e.g.:

python /usr/lib/python2.4/site-packages/Bio/biblio.py

Note that there's a lot of inconsistency here: some use "/usr/bin/env
python", others /usr/bin/python or even /usr/bin/python2.3, and others
don't contain a main program at all, so the #!/usr/bin line
should be removed completely.  Somebody should go through and
clean up/rationalise the installation process: check that the files
installed when "python setup.py install" is run are appropriate .py
package files, e.g. EUtils installs its own "setup.py" file in a
subdirectory, which isn't very clean.
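
For anyone wanting to do the same shebang clean-up outside of an RPM
spec file, here is a minimal sketch in modern Python 3.  It is only an
illustration of the idea described above, not what the Fedora package
actually runs; the function name strip_shebangs is made up for this
example.

```python
import os


def strip_shebangs(root):
    """Remove a leading '#!' line from every .py file under root.

    Returns the list of files that were changed.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "r") as handle:
                lines = handle.readlines()
            # Only the very first line can be a shebang.
            if lines and lines[0].startswith("#!"):
                with open(path, "w") as handle:
                    handle.writelines(lines[1:])
                changed.append(path)
    return changed
```

It would be called on a copy of the installed tree, e.g.
strip_shebangs("/path/to/site-packages/Bio"), and it rewrites files in
place, so try it on a scratch copy first.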

Alex


From mdehoon at c2b2.columbia.edu  Wed Apr 11 15:44:30 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Wed, 11 Apr 2007 17:44:30 +0200
Subject: [BioPython] installing on Mandriva Linux
In-Reply-To: <1176300953.3621.13.camel@localhost>
References: <1176300953.3621.13.camel@localhost>
Message-ID: <461D025E.9070107@c2b2.columbia.edu>

tim wrote:

>I get problems from this point onwards in the install, with lots of
>Bio/Cluster/clustermodule errors:
>...
>creating build/temp.linux-i686-2.4/Bio/Cluster
>gcc -pthread -fno-strict-aliasing -DNDEBUG -O2 -g -pipe
>-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fomit-frame-pointer -march=i586
>-mtune=pentiumpro -fasynchronous-unwind-tables -g -fPIC -IBio/Cluster
>-I/usr/include/python2.4 -c Bio/Cluster/clustermodule.c -o
>build/temp.linux-i686-2.4/Bio/Cluster/clustermodule.o
>Bio/Cluster/clustermodule.c:2:33: error: Numeric/arrayobject.h: No such
>file or directory
>  
>
This is the first error message that you get. Did you check that you 
have the header file arrayobject.h? And is it in the correct location?
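
One quick way to check, sketched here in modern Python 3: the compile
line above passes -I/usr/include/python2.4 and the source includes
"Numeric/arrayobject.h", so the compiler expects the file under the
Python include directory.  The helper name find_numeric_header is made
up for this example, and defaulting to the running interpreter's
include directory is an assumption.

```python
import os
import sysconfig


def find_numeric_header(include_dir=None):
    """Return the path to Numeric/arrayobject.h if it exists, else None."""
    if include_dir is None:
        # Assumption: the header lives under the interpreter's include dir.
        include_dir = sysconfig.get_paths()["include"]
    candidate = os.path.join(include_dir, "Numeric", "arrayobject.h")
    return candidate if os.path.isfile(candidate) else None
```

For the build log above you would call
find_numeric_header("/usr/include/python2.4"); a result of None means
the header is missing from that location.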

--Michiel


From jhortia1 at jhu.edu  Fri Apr 13 19:21:54 2007
From: jhortia1 at jhu.edu (JASON HORTIATIS)
Date: Fri, 13 Apr 2007 15:21:54 -0400
Subject: [BioPython] Local Blast Output
Message-ID: <f459cf8972a3.461fa012@johnshopkins.edu>

I'm an undergraduate using biopython to run local blast searches and I'm trying to find out how to save the entire sequence of each protein hit directly to a file.  I have only managed to be able to print the portion of the sequence that matches the query using hsp.sbjct[0:].  My goal is to use the search results from one blast run as a database to search against for a subsequent run so a fasta file is needed for each hit of the first run.
Thanks for the help!

Jason  


From sbassi at gmail.com  Sat Apr 14 04:14:20 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Sat, 14 Apr 2007 01:14:20 -0300
Subject: [BioPython] Local Blast Output
In-Reply-To: <f459cf8972a3.461fa012@johnshopkins.edu>
References: <f459cf8972a3.461fa012@johnshopkins.edu>
Message-ID: <b43bf2080704132114g7b018e8ax5da968eca1efc768@mail.gmail.com>

On 4/13/07, JASON HORTIATIS <jhortia1 at jhu.edu> wrote:
> I'm an undergraduate using biopython to run local blast searches and I'm trying to find out how to save the entire sequence of each protein hit directly to a file.  I have only managed to be able to print the portion of the sequence that matches the query using hsp.sbjct[0:].  My goal is to use the search results from one blast run as a database to search against for a subsequent run so a fasta file is needed for each hit of the first run.
> Thanks for the help!

You can only parse from the BLAST result what is inside the BLAST
output, and that output does not contain the whole sequence, just the
portion you've retrieved. You may need to parse the GID of the protein
and then look it up in your BLAST DB (using fastacmd).
Or you may use PSI-BLAST as an alternative.
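
A rough sketch of the first suggestion, in modern Python 3 using only
the standard library.  It assumes the hit titles carry NCBI-style
"gi|number|..." identifiers; the titles, the database name
"my_proteins" and both function names are made up for illustration.
The command is only built, not run.

```python
import re

# Matches the numeric GI in titles such as "gi|123456|ref|NP_000001.1| ..."
GI_PATTERN = re.compile(r"gi\|(\d+)\|")


def extract_gi(hit_title):
    """Return the GI number found in a hit title, or None if there is none."""
    match = GI_PATTERN.search(hit_title)
    return match.group(1) if match else None


def fastacmd_command(gi, database, output_file):
    """Build (but do not run) a fastacmd command line for one identifier."""
    return ["fastacmd", "-d", database, "-s", gi, "-o", output_file]


# Made-up hit titles standing in for the ones read from a BLAST report.
hit_titles = [
    "gi|123456|ref|NP_000001.1| hypothetical protein",
    "gi|789012|gb|AAA00001.1| another made-up hit",
]
for title in hit_titles:
    gi = extract_gi(title)
    if gi is not None:
        print(" ".join(fastacmd_command(gi, "my_proteins", gi + ".fasta")))
```

Each printed command would fetch one full-length sequence from the
local database into its own FASTA file.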


From elventear at gmail.com  Tue Apr 17 17:52:40 2007
From: elventear at gmail.com (Pepe Barbe)
Date: Tue, 17 Apr 2007 12:52:40 -0500
Subject: [BioPython] Martel Help
Message-ID: <3e73596b0704171052g7ba3abb0uc04cbce3952d2bd2@mail.gmail.com>

Hello,

I am interested in using Martel for parsing some Biology formats (So
far nothing new).

While the module seems really good, I've been struggling to find some
meaningful documentation. So far I feel I am walking in the dark.
Still I've made some progress. If there is some tutorial or complete
documentation out there I would appreciate it if someone would point me
to it.

My current question is the following. I have the impression that every
single line that the Martel parser is going to parse must be
recognized, and otherwise it will raise an Exception. Is this
correct? If it's true, how can I ignore anything that doesn't match a
RegEx and just process what matches?

Thanks,
Pepe


From elventear at gmail.com  Wed Apr 18 16:54:30 2007
From: elventear at gmail.com (Pepe Barbe)
Date: Wed, 18 Apr 2007 11:54:30 -0500
Subject: [BioPython] Martel Help
In-Reply-To: <3e73596b0704171052g7ba3abb0uc04cbce3952d2bd2@mail.gmail.com>
References: <3e73596b0704171052g7ba3abb0uc04cbce3952d2bd2@mail.gmail.com>
Message-ID: <3e73596b0704180954k752f9be9n6a4f4f46ea2c0435@mail.gmail.com>

Hello,

I've been reading the meager information available for Martel and I
have made good progress, I think. I am basically following the example
in the Exelixis presentation.

In the example, there are some things whose purpose is obvious but whose
implementation details (or all the possible options) aren't. Currently
I am curious about how Martel.HeaderFooter and Std.record affect the
parsing.

Later in that example they use: blat.format.make_iterator("record").
Where does the "record" come from? Because of using Std.record?

Any help would be deeply appreciated.

Pepe


From dalke at dalkescientific.com  Wed Apr 18 21:45:00 2007
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Wed, 18 Apr 2007 23:45:00 +0200
Subject: [BioPython] Martel Help
In-Reply-To: <3e73596b0704180954k752f9be9n6a4f4f46ea2c0435@mail.gmail.com>
References: <3e73596b0704171052g7ba3abb0uc04cbce3952d2bd2@mail.gmail.com>
	<3e73596b0704180954k752f9be9n6a4f4f46ea2c0435@mail.gmail.com>
Message-ID: <C55A75D7-E615-46DF-ADCE-3353650488D9@dalkescientific.com>

On Apr 18, 2007, at 6:54 PM, Pepe Barbe wrote:
> In the example, there are some things whose purpose is obvious but the
> implementation details (Or all the possible options) aren't. Currently
> I am curious on how does Martel.HeaderFooter and Std.record affect the
> parsing.

I'm having to think back several years now.

A limitation with Martel is parsing large data files.  It
has a memory overhead of several times the data file being
processed.  Eg, a 1 MB file might take 7 or so MB to process.

Most bioinformatics formats are composed of records.  Eg,
a GenBank file contains many GenBank records.  The idea of the
Header / Footer / HeaderFooter classes is to break the large
file down into small records, and only have the overhead for
parsing a record.

(But it doesn't help processing large records, like the
entire chromosome as a single FASTA record.)

In FASTA files there is no header or footer.  It can be
read and split up using a RecordReader.  Specifically with
a StartsWith record reader told to look for a ">" which
marks the start of a new record.  Compare to SwissProt
where the record ends with a "//" line.
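
To make the two record-splitting styles concrete, here is a small
sketch in plain modern Python 3.  It does not use Martel's own
RecordReader classes; the function names are made up, and it only
illustrates the "starts with" versus "ends with" idea.

```python
def split_starts_with(lines, marker):
    """Yield records (lists of lines), each beginning at a marker line.

    Any lines before the first marker come out as a record of their own.
    """
    record = []
    for line in lines:
        if line.startswith(marker) and record:
            yield record
            record = []
        record.append(line)
    if record:
        yield record


def split_ends_with(lines, marker):
    """Yield records (lists of lines), each finishing at a marker line."""
    record = []
    for line in lines:
        record.append(line)
        if line.startswith(marker):
            yield record
            record = []
    if record:
        yield record


fasta = [">seq1", "ACGT", ">seq2", "GGCC", "TTAA"]
swissprot_like = ["ID   A", "SQ   ...", "//", "ID   B", "//"]
print(list(split_starts_with(fasta, ">")))
print(list(split_ends_with(swissprot_like, "//")))
```

Because both functions are generators, only one record is held in
memory at a time, which is the point of splitting before parsing.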

Some formats are more complicated.  GenBank is one.  Real
genbank files start with a header, something like

GBGSS1.SEQ           Genetic Sequence Data Bank
                           February 15 2003

                 NCBI-GenBank Flat File Release 134.0

                            GSS Sequences (Part 1)

    88066 loci,    66600405 bases, from    88066 reported sequences


There needs to be a way to process a single, unique header,
followed by 0-or-more repeats of a record, followed by an
optional footer.

Use the HeaderFooter expression for this case.
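
As a plain Python 3 illustration of that layout (one header, then
repeated records), and not of Martel's HeaderFooter API itself: the
sketch below assumes each record starts with a line beginning with
"LOCUS", as GenBank records do.  The function name is made up, the
optional footer is not handled, and for brevity everything is kept in
memory, which a real reader would avoid.

```python
def split_header_records(lines, record_start):
    """Return (header_lines, records) for one header followed by records.

    Each record starts at a line beginning with record_start.
    """
    header = []
    records = []
    current = None
    for line in lines:
        if line.startswith(record_start):
            # A new record begins here.
            current = [line]
            records.append(current)
        elif current is None:
            # Still before the first record, so this is header text.
            header.append(line)
        else:
            current.append(line)
    return header, records


lines = [
    "GBGSS1.SEQ           Genetic Sequence Data Bank",
    "",
    "LOCUS       A",
    "//",
    "LOCUS       B",
    "//",
]
header, records = split_header_records(lines, "LOCUS")
print(len(header), "header lines,", len(records), "records")
```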

In general, this is a clumsy solution.


Ignore the Std.record.  My thought was that the different terms
in the expression could be standardized.  For example, that
all sequences are tagged with "bio:seq".  I hoped this would
minimize the work needed to add a new format because most of
the handlers would look for expected tags, and not depend so
much on the actual structure of the XML.

It proved too complicated to explain and use.

> Later in that example they use: blat.format.make_iterator("record").
> Where does the "record" come from? Because of using Std.record?

The "record" comes from a group name used in the expression.
It describes the point where the repetition will be done.


				Andrew
				dalke at dalkescientific.com


From skhadar at gmail.com  Fri Apr 20 12:47:07 2007
From: skhadar at gmail.com (Shameer Khadar)
Date: Fri, 20 Apr 2007 18:17:07 +0530
Subject: [BioPython] Protparam using BioPythn
Message-ID: <b6ff81950704200547t77a0edb1ycacf404116f8e655@mail.gmail.com>

Dear All,

I am looking for a script to run ProtParam on 1000 sequences. It would be
great if anyone could point me to a program / web page to get it done.

Many thanks in advance,
Shameer Khadar


From biopython at maubp.freeserve.co.uk  Fri Apr 20 13:51:54 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 20 Apr 2007 14:51:54 +0100
Subject: [BioPython] Protparam using BioPython
In-Reply-To: <b6ff81950704200547t77a0edb1ycacf404116f8e655@mail.gmail.com>
References: <b6ff81950704200547t77a0edb1ycacf404116f8e655@mail.gmail.com>
Message-ID: <4628C57A.7010803@maubp.freeserve.co.uk>

Shameer Khadar wrote:
> Dear All,
> 
> I am looking for a script to run Protparam for a 1000 sequence. It will be
> great if anyone can point me to a program / web page to get it done.

Do you mean the Biopython module Bio.SeqUtils.ProtParam, which does
protein analysis (e.g. isoelectric point)?

Or do you mean the ExPASy ProtParam tool available online?  If you only
have a few sequences, doing them online by hand would be easy:
http://www.expasy.org/tools/protparam.html

Or did you mean something else?

Peter

P.S. did you mean 1000 different sequences, or a single 1000 amino acid 
sequence?


From skhadar at gmail.com  Fri Apr 20 15:19:01 2007
From: skhadar at gmail.com (Shameer Khadar)
Date: Fri, 20 Apr 2007 20:49:01 +0530
Subject: [BioPython] Protparam using BioPython
In-Reply-To: <4628C57A.7010803@maubp.freeserve.co.uk>
References: <b6ff81950704200547t77a0edb1ycacf404116f8e655@mail.gmail.com>
	<4628C57A.7010803@maubp.freeserve.co.uk>
Message-ID: <b6ff81950704200819h372f6296sfaa547113c8c2c5c@mail.gmail.com>

Dear Peter,

Thanks for your reply. I was looking for a script based on Bio.SeqUtils.
I got the following script from a website, and it's working perfectly for
me. But the problem is that I have around 1000 sequences (in raw format
without headers) and I thought I would process them using a foreach
equivalent in Python (I am a Python newbie). But only a couple of minutes
ago I came to know that there is no foreach in Python, though some better
alternative is available!  It would be great if you could help me process
my file using this program.

program :
from Bio.SeqUtils import ProtParam, ProtParamData
def PrintDictionary(MyDict):
        for i in MyDict.keys():
                print "%s\t%.2f" %(i, MyDict[i])
        print "MAEGEITTFTALTEKFNLPPGNYKKPKLLYCSNGGHFL"
X = ProtParam.ProteinAnalysis("")
print "Instability index of test protein: %.2f" % X.instability_index()

first few lines of my file :
AEGEFAHLYGTFRED
AEGEFAHLZGTFRED
AEGEFGATYGVYTSD
AEGEFGATZGVYTSD
AEGEFGATYGVZTSD
AEGEFGATZGVZTSD
AEGEFLYGEIQGTQD

Thank you once again,
Shameer



From alexl at users.sourceforge.net  Wed Apr 25 08:22:44 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 25 Apr 2007 01:22:44 -0700
Subject: [BioPython] Biopython packages now available for Fedora
Message-ID: <3kzm4w50dn.fsf@delpy.biol.berkeley.edu>

Hi all,

Fedora packages for Biopython are now available in the official Fedora
repositories.  Packages for Fedora Core 6 (FC-6) and Rawhide (the
soon-to-be Fedora 7) are available immediately and are installable via
the simple yum command:

# sudo yum install python-biopython

and through any other GUI-based installers available for Fedora, such
as pirut, smart or yumex.  The name of the package is
python-biopython.  (A package for Fedora Core 5 has been built and
should be in the FC-5 repository within the next 24 hours or so.)

These packages have all optional packages enabled by default:
MySQL-python, python-reportlab and Wise2.  Please file bugs on these
packages in Red Hat/Fedora bugzilla under "Fedora Extras":

https://bugzilla.redhat.com/bugzilla/

please choose your release and select the "python-biopython"
component.

If somebody could update the wiki page with this information, that
would be great: http://biopython.org/wiki/Download

Alex


From biopython at maubp.freeserve.co.uk  Fri Apr 27 09:55:42 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Apr 2007 10:55:42 +0100
Subject: [BioPython] Protparam using BioPython
In-Reply-To: <4628C57A.7010803@maubp.freeserve.co.uk>
References: <b6ff81950704200547t77a0edb1ycacf404116f8e655@mail.gmail.com>
	<4628C57A.7010803@maubp.freeserve.co.uk>
Message-ID: <4631C89E.3090208@maubp.freeserve.co.uk>

Shameer Khadar wrote:
> Dear Peter,
> 
> Thanks for your reply.

Sorry for the delay - I was away on a course this week.

> I was looking for a script based on Bio.SeqUtils.
> I got the following script from a website, its working perfect for me. But
> the problem is i have around 1000 sequence (in raw format without headers)
> and i thought to process it using a foreach equivalent in python(I am a
> python newbie). But its only a couple of minutes back i came to know that
> there is no foreach in python, but some better alternative is available
> !!!.

There is a "for each" equivalent in python! 
http://docs.python.org/tut/node6.html

If you don't have a good introductory python book, that online tutorial 
is an excellent starting point.

> It will be great if you can help to process my file using this
> program.
> 
> program :
> from Bio.SeqUtils import ProtParam, ProtParamData
> def PrintDictionary(MyDict):
>         for i in MyDict.keys():
>                 print "%s\t%.2f" %(i, MyDict[i])
>         print "MAEGEITTFTALTEKFNLPPGNYKKPKLLYCSNGGHFL"
> X = ProtParam.ProteinAnalysis("")
> print "Instability index of test protein: %.2f" % X.instability_index()

It seems like you have only given bits of a program, so I have tried to 
guess what you meant.

> first few lines of my file :
> AEGEFAHLYGTFRED
> AEGEFAHLZGTFRED
> AEGEFGATYGVYTSD
> AEGEFGATZGVYTSD
> AEGEFGATYGVZTSD
> AEGEFGATZGVZTSD
> AEGEFLYGEIQGTQD

In the following example, I am assuming your sequences are in a plain 
text file, called protparam.txt, which contains each sequence on a 
single line.

Try something like this first of all, and make sure that it prints out 
your sequences correctly:

for line in open("protparam.txt") :
     #Remove any trailing new lines or white space
     seq_string = line.rstrip()
     print "Sequence <%s>" % seq_string

Then try doing the ProtParam.ProteinAnalysis of each sequence string:

from Bio.SeqUtils import ProtParam, ProtParamData
for line in open("protparam.txt") :
     #Remove any trailing new lines or white space
     seq_string = line.rstrip()
     print "Sequence <%s>" % seq_string
     X = ProtParam.ProteinAnalysis(seq_string)
     print "Instability index: %.2f" % X.instability_index()

You'll find it doesn't like the "Z" (presumably this is Glx - glutamic 
acid or glutamine? i.e. E or Q) present in many of your sequences, so 
this next version uses error handling to note this and then carry on to 
the next sequence:

from Bio.SeqUtils import ProtParam, ProtParamData
for line in open("protparam.txt") :
     #Remove any trailing new lines or white space
     seq_string = line.rstrip()

     print #blank line
     print "Sequence <%s>" % seq_string
     X = ProtParam.ProteinAnalysis(seq_string)
     try :
         print "Instability index: %.2f" % X.instability_index()
     except KeyError, e :
         print "Problem with the letter %s in the sequence?" % str(e)

The output is:

Sequence <AEGEFAHLYGTFRED>
Instability index: 8.39

Sequence <AEGEFAHLZGTFRED>
Problem with the letter 'Z' in the sequence?

Sequence <AEGEFGATYGVYTSD>
Instability index: -17.70

Sequence <AEGEFGATZGVYTSD>
Problem with the letter 'Z' in the sequence?

Sequence <AEGEFGATYGVZTSD>
Problem with the letter 'Z' in the sequence?

Sequence <AEGEFGATZGVZTSD>
Problem with the letter 'Z' in the sequence?

Sequence <AEGEFLYGEIQGTQD>
Instability index: 8.61

You'll have to check yourself to see if these numbers are sensible.  I 
don't know what to suggest for your "Z" entries - the stability will be 
different if you try using E or Q instead.
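
If you do want to try both possibilities, one option is to expand each
"Z" into its E and Q variants and analyse every resulting sequence.
This is only a sketch in modern Python 3 with a made-up function name,
not something Biopython does for you, and the number of variants
doubles with every Z in the sequence.

```python
from itertools import product


def expand_glx(sequence):
    """Return every sequence obtained by replacing each 'Z' with 'E' or 'Q'."""
    choices = [("E", "Q") if letter == "Z" else (letter,) for letter in sequence]
    return ["".join(combo) for combo in product(*choices)]


for variant in expand_glx("AEGEFGATZGVZTSD"):
    print(variant)
```

For the example sequence, which has two Z residues, this prints four
variants; each one could then be passed to ProtParam.ProteinAnalysis as
in the loop above.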

Peter


From biopython at maubp.freeserve.co.uk  Sat Apr 28 08:58:40 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Apr 2007 09:58:40 +0100
Subject: [BioPython] EMBL parsing in Biopython 1.43
Message-ID: <46330CC0.9060708@maubp.freeserve.co.uk>

As part of the new SeqIO system introduced in Biopython 1.43, I added 
the ability to read in EMBL format sequences.

http://biopython.org/wiki/SeqIO

I would be interested to hear feedback (positive or negative) from 
anyone who has tried to use this.

Peter


From alexl at users.sourceforge.net  Sat Apr 28 10:21:40 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Sat, 28 Apr 2007 03:21:40 -0700
Subject: [BioPython] Somebody vandalised the wiki download page
Message-ID: <7x1wi4re8b.fsf@delpy.biol.berkeley.edu>

I just created an account and fixed it with this edit:

http://biopython.org/w/index.php?title=Download&diff=1868&oldid=1867

Can somebody with sufficient admin privileges block user "Uzman"?

Thanks,
Alex


From cjfields at uiuc.edu  Sat Apr 28 13:53:37 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 28 Apr 2007 08:53:37 -0500
Subject: [BioPython] Somebody vandalised the wiki download page
In-Reply-To: <7x1wi4re8b.fsf@delpy.biol.berkeley.edu>
References: <7x1wi4re8b.fsf@delpy.biol.berkeley.edu>
Message-ID: <D6C3B50F-AF45-4B59-904E-2FE9D5FEC857@uiuc.edu>

Done.

chris

On Apr 28, 2007, at 5:21 AM, Alex Lancaster wrote:

> I just created an account and fixed it with this edit:
>
> http://biopython.org/w/index.php?title=Download&diff=1868&oldid=1867
>
> Can somebody with sufficient admin privileges block user "Uzman"?
>
> Thanks,
> Alex
> _______________________________________________
> BioPython mailing list  -  BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From mdehoon at c2b2.columbia.edu  Sun Apr 29 10:16:31 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 29 Apr 2007 19:16:31 +0900
Subject: [BioPython] EMBL parsing in Biopython 1.43
In-Reply-To: <46330CC0.9060708@maubp.freeserve.co.uk>
References: <46330CC0.9060708@maubp.freeserve.co.uk>
Message-ID: <4634707F.5060607@c2b2.columbia.edu>

Thanks Peter!

I tried this EMBL-formatted file (using the latest version of Biopython 
in CVS):

ftp://ftp.pasteur.fr/pub/GenomeDB/SubtiList/FlatFiles/SLR16.1_embl.txt

but I got this error message:

 >>> from Bio import SeqIO
 >>> input = open("SLR16.1_embl.txt")
 >>> records = SeqIO.parse(input, format="embl")
 >>> records.next()
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File 
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", 
line 410, in parse_records
     record = self.parse(handle)
   File 
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", 
line 393, in parse
     if self.feed(handle, consumer) :
   File 
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", 
line 360, in feed
     self._feed_first_line(consumer, self.line)
   File 
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", 
line 540, in _feed_first_line
     assert len(fields) == 7
AssertionError
 >>>

Do you have an idea as to what may be going wrong here?

--Michiel.




From biopython at maubp.freeserve.co.uk  Sun Apr 29 20:02:05 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 29 Apr 2007 21:02:05 +0100
Subject: [BioPython] EMBL parsing in Biopython 1.43
In-Reply-To: <4634707F.5060607@c2b2.columbia.edu>
References: <46330CC0.9060708@maubp.freeserve.co.uk>
	<4634707F.5060607@c2b2.columbia.edu>
Message-ID: <4634F9BD.8070909@maubp.freeserve.co.uk>

Michiel de Hoon wrote:
> Thanks Peter!
> 
> I tried this EMBL-formatted file (using the latest version of Biopython 
> in CVS):
> 
> ftp://ftp.pasteur.fr/pub/GenomeDB/SubtiList/FlatFiles/SLR16.1_embl.txt
> 
> but I got this error message:
> 
>  >>> from Bio import SeqIO
>  >>> input = open("SLR16.1_embl.txt")
>  >>> records = SeqIO.parse(input, format="embl")
>  >>> records.next()
> Traceback (most recent call last):
...
> "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", 
> line 540, in _feed_first_line
>      assert len(fields) == 7
> AssertionError
>  >>>

It does the same here with CVS Biopython on Linux with Python 2.4.

> Do you have an idea as to what may be going wrong here?

Yes - I wrote an EMBL parser using the latest file format, while I
suspect your file from the Pasteur Institute uses an older format -
specifically one where the first line (the ID line) has a different
number of fields.

This is reminiscent of the various revisions to the GenBank LOCUS line 
which we also have to cope with.

I hope to have a fix in CVS today/tomorrow.

Peter


From biopython at maubp.freeserve.co.uk  Sun Apr 29 22:11:07 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 29 Apr 2007 23:11:07 +0100
Subject: [BioPython] EMBL parsing in Biopython 1.43
In-Reply-To: <4634F9BD.8070909@maubp.freeserve.co.uk>
References: <46330CC0.9060708@maubp.freeserve.co.uk>	<4634707F.5060607@c2b2.columbia.edu>
	<4634F9BD.8070909@maubp.freeserve.co.uk>
Message-ID: <463517FB.9090706@maubp.freeserve.co.uk>

Peter wrote:
> Michiel de Hoon wrote:
>> Do you have an idea as to what may be going wrong here?
> 
> Yes - I wrote an EMBL parser using the latest file format, while I
> suspect your file from the Pasteur Institute uses an older format -
> specifically one where the first line (the ID line) has a different
> number of fields.

The file you tried seems to use the pre-2006 style ID line.  I found
another example like this on the BioPerl webpage.  See also:

http://www.ebi.ac.uk/embl/Documentation/archivedchanges.html

> I hope to have a fix in CVS today/tomorrow.

I have updated Bio/GenBank/Scanner.py to cope with these old EMBL ID 
lines and added another EMBL test case to test_SeqIO.py

Your example now parses fine, giving a single SeqRecord as expected.  I 
have not checked the annotation or features...
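
For readers wondering what "coping with both ID line styles" might look
like, here is a standalone sketch in modern Python 3.  It is not the
code in Bio/GenBank/Scanner.py.  The seven-field branch follows the
assertion shown in the traceback; the four-field layout for older files
("name class; molecule; division; length") is my assumption about the
pre-2006 format, and the example lines and function names are
illustrative only.

```python
def _length(text):
    """Turn a field such as '1859 BP' into the integer 1859."""
    return int(text.split()[0])


def parse_embl_id_line(line):
    """Parse an EMBL ID line in either the 7-field or the older 4-field layout."""
    if not line.startswith("ID "):
        raise ValueError("not an ID line: %r" % line)
    fields = [f.strip() for f in line[2:].strip().rstrip(".").split(";")]
    if len(fields) == 7:
        # Newer style: accession; SV n; topology; molecule; class; division; length
        return {"name": fields[0], "division": fields[5],
                "length": _length(fields[6])}
    if len(fields) == 4:
        # Older style (assumed): "name class; molecule; division; length"
        return {"name": fields[0].split()[0], "division": fields[2],
                "length": _length(fields[3])}
    raise ValueError("unexpected number of ID fields: %d" % len(fields))


print(parse_embl_id_line("ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP."))
print(parse_embl_id_line("ID   EXAMPLE1   standard; DNA; PRO; 1859 BP."))
```

Branching on the field count keeps the strict check for the current
format while still accepting older files, instead of failing with a
bare AssertionError.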

Peter


From skhadar at gmail.com  Mon Apr 30 13:01:56 2007
From: skhadar at gmail.com (Shameer Khadar)
Date: Mon, 30 Apr 2007 18:31:56 +0530
Subject: [BioPython] Protparam using BioPython
In-Reply-To: <4631C89E.3090208@maubp.freeserve.co.uk>
References: <b6ff81950704200547t77a0edb1ycacf404116f8e655@mail.gmail.com>
	<4628C57A.7010803@maubp.freeserve.co.uk>
	<4631C89E.3090208@maubp.freeserve.co.uk>
Message-ID: <b6ff81950704300601w436f5837xb7a86a033a22f0d8@mail.gmail.com>

Dear Peter,

Thanks a  lot for you detailed reply and splendid help !!!
It worked !!
Cheers,
Shameer



From jhortia1 at jhu.edu  Mon Apr 30 20:16:42 2007
From: jhortia1 at jhu.edu (JASON HORTIATIS)
Date: Mon, 30 Apr 2007 16:16:42 -0400
Subject: [BioPython] local blast output
Message-ID: <f64fc8e94d78.4636166a@johnshopkins.edu>

Dear all,
I'm a novice using biopython to run local blast searches and save the output to a file, but i've run into a problem becuase it seems as though the b_parser has a limit of 250 sequences, however my searches are returning far more than 250 sequences.  Does anyone know if the parser really is limited, and if so if it is possible to work around this?
Thanks for the help, 

Jason


From sbassi at gmail.com  Mon Apr 30 21:26:50 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 30 Apr 2007 18:26:50 -0300
Subject: [BioPython] local blast output
In-Reply-To: <f64fc8e94d78.4636166a@johnshopkins.edu>
References: <f64fc8e94d78.4636166a@johnshopkins.edu>
Message-ID: <b43bf2080704301426p12b9d878weedc40e2dd98246b@mail.gmail.com>

On 4/30/07, JASON HORTIATIS <jhortia1 at jhu.edu> wrote:
> Dear all,
> I'm a novice using biopython to run local blast searches and save the output to a file, but i've run into a problem becuase it seems as though the b_parser has a limit of 250 sequences, however my searches are returning far more than 250 sequences.  Does anyone know if the parser really is limited, and if so if it is possible to work around this?
> Thanks for the help,

There is no 250 limit in the parser. Please show us the code so we can
help you. Also tell us your BLAST and Biopython versions.
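
One possible cause, which is only a guess and is not confirmed anywhere
in this thread: NCBI's blastall itself reports at most 250 alignments by
default (the -b option) and 500 one-line descriptions (the -v option),
so the cut-off may be in the BLAST output rather than in the parser.
The sketch below, in modern Python 3, just builds a command line with
both limits raised; the function name, file names and database name are
made up, and nothing is executed.

```python
def blastall_command(program, database, query_file, output_file,
                     descriptions=1000, alignments=1000):
    """Build (but do not run) a blastall command line with raised limits."""
    return [
        "blastall",
        "-p", program,
        "-d", database,
        "-i", query_file,
        "-o", output_file,
        "-v", str(descriptions),  # one-line descriptions to report
        "-b", str(alignments),    # alignments to report
    ]


print(" ".join(blastall_command("blastp", "my_proteins",
                                "query.fasta", "result.txt")))
```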
Best,
SB.

-- 
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318