From bugzilla-daemon at portal.open-bio.org  Thu Apr  1 18:23:04 2004
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sat Mar  5 14:43:31 2005
Subject: [Biopython-dev] [Bug 1613] New: pubmed example doesn't work.
	corrected example included
Message-ID: <200404012323.i31NN4Jg017348@portal.open-bio.org>

http://bugzilla.bioperl.org/show_bug.cgi?id=1613

           Summary: pubmed example doesn't work. corrected example included
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Documentation
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: cariaso@yahoo.com


While the bug is small, it's enough to scare away new users.

In the cookbook, section 3.3.1, the example is:


from Bio.Medline import PubMed

search_term = 'orchid'
orchid_ids = PubMed.search_for(search_term)


but it doesn't work (for me) that way. However, this does work:

from Bio import PubMed

search_term = 'orchid'
orchid_ids = PubMed.search_for(search_term)

just delete the '.Medline'
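
A script that has to run against installations with either layout could try each import path in turn. This is only a sketch: import_first is a hypothetical helper, not part of Biopython, and the two module paths in the usage comment are simply the ones quoted in this report.

```python
import importlib


def import_first(*module_names):
    """Return the first module in module_names that imports cleanly."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            # Remember why this candidate failed and try the next one.
            errors.append("%s: %s" % (name, exc))
    raise ImportError("no candidate could be imported:\n" + "\n".join(errors))


# Usage with the paths from this report (requires a Biopython release
# that still ships one of them):
# PubMed = import_first("Bio.PubMed", "Bio.Medline.PubMed")
```

Fixing the cookbook text is still the real remedy; the fallback only helps code that must tolerate both layouts.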


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From chapmanb at uga.edu  Fri Apr  2 11:07:52 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:31 2005
Subject: [Biopython-dev] Martel NCBI blastn format
In-Reply-To: <1080751311.406af4cfe2d54@webmail.ipk-gatersleben.de>
References: <1080751311.406af4cfe2d54@webmail.ipk-gatersleben.de>
Message-ID: <20040402160752.GA45713@evostick.agtec.uga.edu>

Hi Heiko;

> The parser expression for blastn defined in Bio.expressions.blast.ncbiblast.py
> is broken for version BLASTN 2.2.8 [Jan-05-2004] (and even for the older
> BLASTN 2.2.6 [Apr-09-2003]).
> In the output is an additional 'hsp_info' section, which was not defined in the
> blastn expression. This could be patched in one line.

Thanks. Patch applied to CVS. I appreciate you looking at the Martel
formats for blast -- these are not heavily integrated into Biopython
yet (the Bio.Blast parsers do not use them) so they don't get as
much checking as they need. If you find any other problems or have
any suggestions, please do let us know.

Thanks again.
Brad


From idoerg at burnham.org  Fri Apr  2 13:32:05 2004
From: idoerg at burnham.org (Iddo Friedberg)
Date: Sat Mar  5 14:43:31 2005
Subject: [Biopython-dev] aaindex1
Message-ID: <406DB1A5.3000304@burnham.org>

Hi all,

I wrote a module for parsing the aaindex1 database. From the aaindex1 
README file:
'An amino acid index is a set of 20 numerical values representing any
of the different physicochemical and biological properties of amino
acids.  The AAindex1 section of the Amino Acid Index Database is a
collection of published indices together with the result of cluster
analysis using the correlation coefficient as the distance between
two indices.  This section currently contains 494 indices.'

See http://www.genome.ad.jp/dbget/aaindex.html for details. The aaindex1
file may be downloaded from
ftp://ftp.genome.ad.jp/pub/db/genomenet/aaindex/aaindex1

This module contains the following classes:
AAIndex1Record: holds the information from a single aaindex1 entry
AAIndex1Reader: parses the aaindex1 file

and one utility which reads the aaindex1 file and returns a dictionary 
of entries.
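
To make the description concrete, here is a minimal sketch of what such a reader could look like. It is not the module announced here: read_aaindex1 and the attribute names are assumptions, only the H, D and I fields are handled, and the entry is made up rather than taken from the database. The sketch assumes the usual aaindex1 layout, where the "I" header lists residue pairs such as A/L and is followed by two rows of ten values, with "//" closing each entry.

```python
import io


class AAIndex1Record:
    """Holds the information from a single aaindex1 entry (sketch)."""

    def __init__(self, accession, description, values):
        self.accession = accession
        self.description = description
        self.values = values  # one-letter residue code -> float, or None for NA


def _build_record(accession, description, index_lines):
    header, rows = index_lines[0], index_lines[1:]
    # "A/L" means: column holds A in the first row and L in the second row.
    pairs = [token.split("/") for token in header.split()]
    letters = [p[0] for p in pairs] + [p[1] for p in pairs]
    numbers = [None if tok == "NA" else float(tok)
               for row in rows for tok in row.split()]
    return AAIndex1Record(accession, " ".join(description),
                          dict(zip(letters, numbers)))


def read_aaindex1(handle):
    """Return a dictionary mapping accession -> AAIndex1Record."""
    records = {}
    accession, description, index_lines, tag = None, [], [], None
    for raw in handle:
        line = raw.rstrip("\n")
        if line.startswith("//"):  # end of entry
            if accession is not None and index_lines:
                records[accession] = _build_record(
                    accession, description, index_lines)
            accession, description, index_lines, tag = None, [], [], None
            continue
        if not line.strip():
            continue
        if line[0] != " ":
            tag, body = line[0], line[1:].strip()
        else:
            body = line.strip()  # continuation of the previous field
        if tag == "H":
            accession = body
        elif tag == "D":
            description.append(body)
        elif tag == "I":
            index_lines.append(body)
    return records


# Made-up entry, used only to exercise the parser.
TOY_ENTRY = """\
H TOY0000001
D Made-up index used only to exercise the parser
I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V
     1.0     2.0     3.0     4.0     5.0     6.0     7.0     8.0     9.0    10.0
    11.0    12.0    13.0    14.0    15.0    16.0    17.0    18.0    19.0      NA
//
"""

toy_records = read_aaindex1(io.StringIO(TOY_ENTRY))
```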

Should I open a new directory under Bio/AAIndex1, or is there a more 
appropriate place for this?

./I

-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
http://ffas.ljcrf.edu/~iddo

From iliketobicycle at yahoo.ca  Sat Apr  3 15:09:45 2004
From: iliketobicycle at yahoo.ca (Harry Zuzan)
Date: Sat Mar  5 14:43:31 2005
Subject: [Biopython-dev] OligoPython for Affymetrix data
Message-ID: <20040403200945.66503.qmail@web21410.mail.yahoo.com>

Hello,

A short while back I asked if there was an interest in code for
handling Affymetrix data.  The answer was yes, so I put together a
simple module for reading in an Affymetrix cel file.  The data parsed
from the cel file are then available in the form of Numeric arrays.

There is a complementary module that encapsulates these data and makes
them much more useful but it is more complex so I thought I'd find my
BioPython legs with this simple module.  I need to work on an install
script, better documentation, better error handling and an alpha
version number.  I also need to make the Makefile for the C++ module
friendly to more users.

The code is at www.oligopython.org.  I'm calling it OligoPython until
it is in the BioPython cvs tree.  The OligoPython license is the
BioPython license verbatim except for the heading.

Since I am unfamiliar with cvs and writing Python install scripts and
writing Makefiles that compile under many versions of unix, I would
appreciate any help integrating this module with BioPython so that I
can add another important module soon.

I'm hoping that there will be interest in this so any feedback is
appreciated.

Best,

Harry

From idoerg at burnham.org  Sat Apr  3 21:57:16 2004
From: idoerg at burnham.org (Iddo Friedberg)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] OligoPython for Affymetrix data
In-Reply-To: <20040403200945.66503.qmail@web21410.mail.yahoo.com>
References: <20040403200945.66503.qmail@web21410.mail.yahoo.com>
Message-ID: <406F798C.5040408@burnham.org>

Hi Harry,

Thanks for your effort. This sounds like a welcome contribution.

I am getting "connection refused" when I try to get the URL you 
listed. Can you check that out please?

Thanks,

Iddo


-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
http://ffas.ljcrf.edu/~iddo

From bugzilla-daemon at portal.open-bio.org  Sun Apr  4 10:13:09 2004
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] [Bug 1605] kMeans.py should be deprecated
Message-ID: <200404041413.i34ED9e7016155@portal.open-bio.org>

http://bugzilla.bioperl.org/show_bug.cgi?id=1605

mdehoon@ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Additional Comments From mdehoon@ims.u-tokyo.ac.jp  2004-04-04 10:13 -------
I added a DeprecationWarning to kMeans.py and xkMeans.py (which relies on
kMeans.py).
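
For readers unfamiliar with the mechanism, a deprecation notice of this kind is typically raised with the standard warnings module. The sketch below is an assumption about the general shape, not the code that was committed; warn_deprecated is a hypothetical helper.

```python
import warnings


def warn_deprecated(module_name, replacement=None):
    """Emit a DeprecationWarning naming the deprecated module."""
    message = "%s is deprecated" % module_name
    if replacement:
        message += "; use %s instead" % replacement
    # stacklevel=2 attributes the warning to the caller, not this helper.
    warnings.warn(message, DeprecationWarning, stacklevel=2)


# A deprecated module would call this once at import time, for example:
# warn_deprecated("kMeans")
```

Python hides DeprecationWarning in many situations by default; running with "python -W default" makes it visible.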



From iliketobicycle at yahoo.ca  Mon Apr  5 11:04:31 2004
From: iliketobicycle at yahoo.ca (Harry Zuzan)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] Oligopython page
Message-ID: <20040405150431.67425.qmail@web21408.mail.yahoo.com>

Hi,

I can't get dyndns to update www.oligopython.org.

For the time being, anyone interested can get the same pages from
oligopython.dyndns.org.

Sorry about any confusion.

Also, there is not a lot of code at this point so I just attached it.

Harry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OligoPython.tgz
Type: application/x-gzip-compressed
Size: 3325 bytes
Desc: OligoPython.tgz
Url : http://portal.open-bio.org/pipermail/biopython-dev/attachments/20040405/b13f5a60/OligoPython.bin
From mdehoon at ims.u-tokyo.ac.jp  Tue Apr  6 01:10:08 2004
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] Oligopython page
In-Reply-To: <20040405150431.67425.qmail@web21408.mail.yahoo.com>
References: <20040405150431.67425.qmail@web21408.mail.yahoo.com>
Message-ID: <40723BB0.5040809@ims.u-tokyo.ac.jp>

Hi,

Thanks for writing oligopython! I had a look at your package to see if I could 
write a setup.py for it, and I noticed that the file parser makes use of C++ 
rather than C. If I'm not mistaken, the only C++ code currently in Biopython is 
Bio.KDTree, which is not installed by default because of problems building it on 
some platforms. Is there some Biopython policy on C++ code? It may also be 
possible to use Martel for the file parsing. I am not familiar with Martel 
myself, so I cannot give much guidance there, but maybe somebody can.

--Michiel.


-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon


From iliketobicycle at yahoo.ca  Tue Apr  6 09:26:24 2004
From: iliketobicycle at yahoo.ca (Harry Zuzan)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] Oligopython page
In-Reply-To: <40723BB0.5040809@ims.u-tokyo.ac.jp>
Message-ID: <20040406132624.52327.qmail@web21407.mail.yahoo.com>

Hi,

Originally the parser was in Python but it was too slow.  I could have
written it in C instead of C++ but I have a lot of other code related
to this that is written in C++.

If I write a class in Python that is too slow because of the amount of
data it has to handle I often solve the problem by converting the
Python class to a C++ class.  Then I write a thin wrapper in Python
around the C++ class.  So in my Python scripts nothing changes.  This
is where the C++ code comes from.
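
The wrapper pattern described above might be sketched as follows. Everything here is illustrative: _hypothetical_cel_ext stands in for a compiled C++ extension and does not exist, and the Stats class is a toy, not OligoPython code. Because the extension is missing, the sketch falls back to the pure Python implementation.

```python
try:
    # Hypothetical compiled extension exposing the same class in C++.
    import _hypothetical_cel_ext as _backend
except ImportError:
    _backend = None


class _PurePythonStats:
    """Slow reference implementation written in plain Python."""

    def __init__(self, values):
        self._values = list(values)

    def mean(self):
        return sum(self._values) / len(self._values)


class Stats:
    """Thin wrapper: callers see one interface whichever backend is used."""

    def __init__(self, values):
        impl = _backend.Stats if _backend is not None else _PurePythonStats
        self._impl = impl(values)

    def mean(self):
        # Pure delegation; no logic lives in the wrapper itself.
        return self._impl.mean()


result = Stats([1.0, 2.0, 3.0]).mean()
```

Scripts keep calling Stats(...).mean() unchanged, which is the point of the approach.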

I'll read up on install scripts myself.  In the meantime I'm grateful
for any help.

Harry



From dalke at dalkescientific.com  Tue Apr  6 16:11:42 2004
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] EUtils
Message-ID: <9F378DC6-8806-11D8-B94E-000393C92466@dalkescientific.com>

I was at the PyCon conference a couple weeks ago and talked with
Mark Johnson from NCBI about several things, including the EUtils
client package in Biopython.

That reminded me that I needed to clean up the package and update
it to support the latest NCBI interface (they added a couple new
features in the last two years).  I've got some free time this
week so I'm working on it.

The changes are:
   - don't require the DTD for the response.  I now believe that DTDs
      for this task are the wrong approach.  I'm using elementtree.
   - get rid of the silly Problem class hierarchy I was using
   - simplify the regression tests so they're easier to read
   - add more tests (found some bugs in a few corners already)
   - figure out a better way to do live tests against the server
   - include support for a request throttle
   - pull the documentation out of the docstrings and into an
       actual document
   - resubmit my bug reports in the hopes that they'll fix
       the server
   - rearrange the source tree
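
As an illustration of the request throttle item in the list above, here is one possible shape for it. This is a sketch under assumptions, not the EUtils implementation: RequestThrottle is a hypothetical name, and the interval is a placeholder rather than NCBI's actual policy. The clock and sleep functions are injectable so the behaviour can be tested without waiting.

```python
import time


class RequestThrottle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage: call throttle.wait() immediately before each request.
throttle = RequestThrottle(min_interval=3.0)  # placeholder interval
```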

For the last, what I want to do is move the documentation
and tests out of Bio/EUtils/ and into the proper places for
Biopython.  As it stands, EUtils is distributable independent of
Biopython.  I don't think that's worthwhile so I'm thinking
of no longer doing that.  If I do make it independently
distributable then I'll have some little script to assemble
the bits and pieces from the Biopython tree.

Is anyone here using the EUtils client for anything and
has code that I can use?  I want to examine common use
cases and see if there's a way to simplify the interface.
(Eg, have some top-level functions for doing retrieval
rather than making a client object.)  I also want to make
sure I don't change the API to break existing code.


					Andrew
					dalke@dalkescientific.com


From chapmanb at uga.edu  Tue Apr  6 18:00:24 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] Oligopython page
In-Reply-To: <40723BB0.5040809@ims.u-tokyo.ac.jp>
References: <20040405150431.67425.qmail@web21408.mail.yahoo.com>
	<40723BB0.5040809@ims.u-tokyo.ac.jp>
Message-ID: <20040406220024.GB25784@evostick.agtec.uga.edu>

Hey Harry and Michiel;

[Harry announces OligoPython]
> >For the time being, anyone interested can get the same pages from
> >oligopython.dyndns.org.

Thanks for this -- very nice to see people working on microarray
code and we don't have anything like this in Biopython so it is very
welcome.

Michiel:
> Thanks for writing oligopython! I had a look at your package to see if I 
> could write a setup.py for it, and I noticed that the file parser makes use 
> of C++ rather than C. If I'm not mistaken, the only C++ code currently in 
> Biopython is Bio.KDTree, which is not installed by default because of 
> problems building it on some platforms. Is there some Biopython policy on 
> C++ code?

If I remember properly, the reason KDTree is not installed by
default isn't because it's C++ but rather because it used the C++
standard library (stdc++) which caused building problems on some
systems without development libraries.

I think either C++ or C is fine -- basically our only requirement is
that it builds on multiple platforms. Sadly, sometimes figuring that
out requires including it and seeing how many people complain :-).

A simple setup.py that will work for this code is:

from distutils.core import setup
from distutils.extension import Extension

setup(
  name = "OligoPython",
  version = "0.1",
  packages = ["Affymetrix"],
  ext_modules = [Extension("Affymetrix._cel",
                           ["Affymetrix/celmodule.cc"])]
)

This assumes that you put the code into a module directory called
Affymetrix. The other change that is necessary is that the includes
do not need to be relative to the python directory, so celmodule.cc
just needs to do:

#include "Python.h"
#include "Numeric/arrayobject.h"

After this it seems to build and install fine.

I'd be happy to include this in Biopython if you are willing, but do 
have a few suggestions for the code:

1. I'd prefer it to be named something like Bio.Affymetrix rather
than something more generic like Bio.Oligo -- since that would
reflect its purpose and use a little better. I'm not sure exactly
what your development goals are with this, but if the main goal now
is parsing cel files and manipulating them this makes some sense.
Michiel may also have some input here (which is probably more useful
than mine).

2. The current Cel class integrates both parsing and storing the
resulting data in the same class. To be more consistent with
Biopython, I think it would be nice to separate out the work into
two classes something like:

CelParser -- has the parse function and returns a CelRecord object
CelRecord -- contains the parsed data (all of the _pixels, _stdev,
_npix, _nrows, _ncols attributes) and the functions which return
them.
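
A rough sketch of that split, to show the idea rather than prescribe an implementation. The input format is a toy invented for this example (a "nrows ncols" line, then "row col mean stdev npix" lines), not the real cel layout, and plain lists stand in for Numeric arrays. Only the class names and the underscore attributes come from the suggestion above.

```python
import io


class CelRecord:
    """Contains the parsed data; knows nothing about file formats."""

    def __init__(self, nrows, ncols, pixels, stdev, npix):
        self._nrows = nrows
        self._ncols = ncols
        self._pixels = pixels
        self._stdev = stdev
        self._npix = npix

    def shape(self):
        return (self._nrows, self._ncols)

    def intensity(self, row, col):
        return self._pixels[row][col]


class CelParser:
    """Has the parse function and returns a CelRecord object."""

    def parse(self, handle):
        nrows, ncols = (int(tok) for tok in handle.readline().split())
        pixels = [[0.0] * ncols for _ in range(nrows)]
        stdev = [[0.0] * ncols for _ in range(nrows)]
        npix = [[0] * ncols for _ in range(nrows)]
        for line in handle:
            if not line.strip():
                continue
            row, col, mean, sd, count = line.split()
            r, c = int(row), int(col)
            pixels[r][c] = float(mean)
            stdev[r][c] = float(sd)
            npix[r][c] = int(count)
        return CelRecord(nrows, ncols, pixels, stdev, npix)


# Toy input in the invented format described above.
TOY_CEL = "1 2\n0 0 100.5 3.2 16\n0 1 250.0 4.1 16\n"
record = CelParser().parse(io.StringIO(TOY_CEL))
```

With this split, a different file version only needs a new parser; code that consumes CelRecord stays the same.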

Other than this, things look good -- let me know how you want to
proceed forward on this and maybe coordinate with Michiel if he has
plans for dealing with microarray data and integrating this with
Cluster code and would like to be involved.

Thanks again for the work!
Brad

From chapmanb at uga.edu  Tue Apr  6 18:12:23 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] aaindex1
In-Reply-To: <406DB1A5.3000304@burnham.org>
References: <406DB1A5.3000304@burnham.org>
Message-ID: <20040406221223.GC25784@evostick.agtec.uga.edu>

Hey Iddo;

> I wrote a module for parsing the aaindex1 database. 

Sweet. Good stuff. Thanks much.

> This module contains the following classes:
> AAIndex1Record: holds the information from a single aaindex1 entry
> AAIndex1Reader: parses the aaindex1 file
> 
> and one utility which reads the aaindex1 file and returns a dictionary 
> of entries.
> 
> Should I open a new directory under Bio/AAIndex1, or is there a more 
> appropriate place for this?

Can we use Bio/AAIndex instead of AAIndex1? I'm not sure I really
understand what the 1 signifies but it just seems a bit weird to me
for no really good reason. Other than that, checking it in under
there sounds great. If there's a good reason for using AAIndex1
(like there is an AAIndex2 or something else) just tell me and do
what you think is best.

Oh yeah, and where is my documentation for Alphabet/Reduced.py? I'm
gonna have to send my "documentation collection" agency out to see
you pretty soon :-).

Thanks again!
Brad

From idoerg at burnham.org  Tue Apr  6 19:05:13 2004
From: idoerg at burnham.org (Iddo Friedberg)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] aaindex1
Message-ID: <407337A9.1060409@burnham.org>


Brad Chapman wrote:


> Can we use Bio/AAIndex instead of AAIndex1? I'm not sure I really
> understand what the 1 signifies but it just seems a bit weird to me
> for no really good reason. Other than that, checking it in under
> there sounds great. If there's a good reason for using AAIndex1
> (like there is an AAIndex2 or something else) just tell me and do
> what you think is best.

AAIndex1 is a database (flat file) of 494 numeric values assigned to
amino acids. AAIndex2 are substitution matrices. I can create an AAIndex
directory, under which I will place the AAIndex1.py file. AAIndex2 will
have to wait.


> Oh yeah, and where is my documentation for Alphabet/Reduced.py? I'm
> gonna have to send my "documentation collection" agency out to see
> you pretty soon :-).

Egads! Not the dread DCA!!!

Just committed new versions of Bio.utils and Alphabet.Reduced to CVS...

./I


-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
http://ffas.ljcrf.edu/~iddo



From chapmanb at uga.edu  Tue Apr  6 19:07:45 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] aaindex1
In-Reply-To: <407337A9.1060409@burnham.org>
References: <407337A9.1060409@burnham.org>
Message-ID: <20040406230745.GG25784@evostick.agtec.uga.edu>

Hi Iddo;

> AAIndex1 is a database (flat file) of 494 numeric values assigned to
> amino acids. AAIndex2 are substitution matrices. I can create an AAIndex
> directory, under which I will place the AAIndex1.py file. AAIndex2 will
> have to wait.

That sounds great -- no worries about the AAIndex2 -- it'll happen
when it happens. In the meantime, as long as it's documented
(Biopython DCA, watch out) what AAIndex1 does and what files it
deals with, that sounds great.

> Just committed new versions of Bio.utils and Alphabet.Reduced to CVS...

Sweet. Threats always work best -- I never imagined marrying into
the mob would do so much to advance my career.

Brad

From chapmanb at uga.edu  Tue Apr  6 18:31:14 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] EUtils
In-Reply-To: <9F378DC6-8806-11D8-B94E-000393C92466@dalkescientific.com>
References: <9F378DC6-8806-11D8-B94E-000393C92466@dalkescientific.com>
Message-ID: <20040406223114.GD25784@evostick.agtec.uga.edu>

Hey Andrew;
Great to hear from you, as always! Hope you're doing well.

> I was at the PyCon conference a couple weeks ago and talked with
> Mark Johnson from NCBI about several things, including the EUtils
> client package in Biopython.
> 
> That reminded me that I needed to clean up the package and update
> it to support the latest NCBI interface (they added a couple new
> features in the last two years).  I've got some free time this
> week so I'm working on it.

Sweet -- that sounds great. Always glad to have work on it. I know a
few small fixes have gone into the CVS so you may want to work
directly from it, but other than that it's all yours.

>   - don't require the DTD for the response.  I now believe that DTDs
>      for this task are the wrong approach.  I'm using elementtree.

Cool. This will also clean up a lot of the install messiness in
setup.py necessary to get the DTDs installed.
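
For context, parsing a response without a DTD can be as small as the sketch below. It uses xml.etree.ElementTree, the standard-library descendant of the elementtree package. The sample document is hand-written in the general shape of a search result; its element names and the parse_search_result helper are assumptions for illustration, not the EUtils client code.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the ids are placeholders, not real records.
SAMPLE = """<?xml version="1.0"?>
<eSearchResult>
  <Count>2</Count>
  <RetMax>2</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>11111</Id>
    <Id>22222</Id>
  </IdList>
</eSearchResult>"""


def parse_search_result(xml_text):
    """Return (count, ids) from a search result document; no DTD needed."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count", default="0"))
    ids = [elem.text for elem in root.findall("./IdList/Id")]
    return count, ids


count, ids = parse_search_result(SAMPLE)
```

Nothing has to be installed alongside the code for this to work, which is where the setup.py simplification comes from.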

>   - figure out a better way to do live tests against the server

If you get somewhere with this please do let me know -- I added some
tests for Registry code recently but have been fairly frustrated
with getting errors because the server is dying. One thing I have
been thinking about is using timeoutsocket:

http://www.timo-tasi.org/python/timeoutsocket.py

to just die neatly on tests if the server is timing out.
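
One way to get that behaviour with only the standard library is to set a default socket timeout and turn network failures into skipped tests. This is a sketch of the idea, not what the Biopython test suite does: skip_on_network_failure and configure_timeout are hypothetical names, and the 30 second value is a placeholder.

```python
import functools
import socket
import unittest


def configure_timeout(seconds=30):
    """Apply a default timeout to sockets created from now on."""
    socket.setdefaulttimeout(seconds)


def skip_on_network_failure(test_func):
    """Report a test as skipped, not failed, when the server is unreachable."""
    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        try:
            return test_func(*args, **kwargs)
        except (socket.timeout, ConnectionError) as exc:
            raise unittest.SkipTest("server unavailable: %r" % (exc,))
    return wrapper


class LiveServerTest(unittest.TestCase):
    @skip_on_network_failure
    def test_fetch(self):
        # A real test would talk to the server here; this stand-in
        # simulates the server timing out.
        raise socket.timeout("timed out")
```

Genuine assertion failures still surface as failures; only timeouts and connection errors are downgraded to skips.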

> For the last, what I want to do is move the documentation
> and tests out of Bio/EUtils/ and into the proper places for
> Biopython.  As stands EUtils is distributable independent of
> Biopython.  I don't think that's worthwhile so I'm thinking
> of no longer doing that.  

Sounds great -- I've been adding independent documentation into
Docs/cookbook. You'd just need to make a directory for EUtils.

> Is anyone here using the EUtils client for anything and
> has code that I can use?  

I've recently added it within the registry system in
config/DBRegistry.py (class EUtilsDB), so that we can do retrieval
from NCBI in the current right way.

Other than that I've used it in a couple of talks I've given:

http://www.biopython.org/docs/presentations/biopython_exelixis.pdf

and:

http://www.biopython.org/docs/presentations/bosc_biopython.pdf

and I attached a file which I use semi-regularly for retrieving
databases by Taxonomy ID from NCBI (which actually used the
timeoutsocket module I mention above, but from my own personal code
base I use at work).

Hope those help some. Glad to see you around.
Brad
-------------- next part --------------
#!/usr/bin/env python
"""Fetch a number of FASTA files from NCBI based on taxonomy queries.

This script is meant to be used to schedule batch downloads of a number
of databases based on taxonomy ids. It reads an input file specified on the
commandline that looks like:

file_name,tax_id,not_tax_id,not_tax_id

Where:
    file_name -- the name of the output file (ie. arabidopsis.fasta)
    tax_id -- the primary taxonomy id we are fetching (ie. 3328)
    not_tax_id -- a list of taxonomy ids to not include. This list can
    be empty or as long as you want.

Usage:
    python get_taxonomy_list.py <tax info file>

Where:
    <tax info file> -- the location of the file to read the taxonomy information
    from, formatted as above

The files are output to a directory called "taxa-<date>" where <date> is the
date the program was run on (ie. 20020904).
"""
import sys
import os
import sgmllib
import time
import urlparse

# biopython
from Bio import Fasta
from Bio.EUtils import HistoryClient

from Bio.PGML.Utils import timeoutsocket
timeoutsocket.setDefaultSocketTimeout(30)

VERBOSE = 1

def main(tax_file):
    assert os.path.exists(tax_file), "Cannot find taxonomy file"
    # create the output directory
    start_dir = os.path.dirname(tax_file)
    time_info = time.localtime(time.time())
    time_name = "%i%02i%02i" % (time_info[0], time_info[1], time_info[2])
    out_name = "taxa-%s" % time_name
    output_dir = os.path.join(start_dir, out_name)
    if not(os.path.exists(output_dir)):
        os.makedirs(output_dir)
    
    all_tax_info = process_tax_file(tax_file)
    for file_name, tax_id, not_ids in all_tax_info:
        output_file = os.path.join(output_dir, file_name)
        process_taxonomy_id(tax_id, not_ids, output_file)

def process_tax_file(tax_file):
    """Split a taxonomy information file up into relevant info from commas.
    """
    tax_info = []
    handle = open(tax_file, "r")
    for line in handle.xreadlines():
        line = line.strip()
        if line and line.find("#") == -1:
            parts = line.split(",")
            if len(parts) < 2:
                raise ValueError("Line is badly formatted: %s" % line)
            filename = parts[0]
            tax_id = parts[1]
            not_ids = parts[2:]
            tax_info.append((filename, tax_id, not_ids))
    
    handle.close()
    return tax_info

def process_taxonomy_id(taxonomy_id, not_ids, output_file):
    """Deal with the process of retrieving the FASTA file for a query.
    
    This uses the new EUtils interface at NCBI, which is so much faster
    and less annoying than the old way.
    """
    if VERBOSE:
        print "Saving taxonomy id %s to %s" % (taxonomy_id, output_file)
       
    # build the query
    query = "txid%s[Organism]" % taxonomy_id
    for not_id in not_ids:
        query += " NOT txid%s[Organism]" % not_id

    client = HistoryClient.HistoryClient()
    while 1:
        try:
            results = client.search(query, db = "nucleotide")
            break
        except (timeoutsocket.Timeout, timeoutsocket.Error, ValueError):
            print "Timed out talking to NCBI, trying again"
            time.sleep(10)

    if VERBOSE:
        print "\tSearch got %s results" % (len(results))

    while 1:
        try:
            fasta = results.efetch(retmode = "text", rettype = "fasta")
            break
        except (timeoutsocket.Timeout, timeoutsocket.Error, ValueError):
            print "Timed out talking to NCBI, trying again"
            time.sleep(10)

    output_handle = open(output_file, "w")
    while 1:
        line = fasta.readline()
        if not(line):
            break
        output_handle.write(line)
    output_handle.close()

    if VERBOSE:
        output_handle = open(output_file)
        num_seqs = check_seqs(output_handle)
        output_handle.close()
        print "\tWrote %s sequences" % (num_seqs)

def check_seqs(handle):
    """Count the number of FASTA sequences in the passed handle.
    """
    num_seqs = 0
    iterator = Fasta.Iterator(handle)
    while 1:
        rec = iterator.next()
        if not rec:
            break
        num_seqs += 1

    return num_seqs

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print "Invalid number of arguments supplied"
        print __doc__
        sys.exit()

    sys.exit(main(sys.argv[1]))
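
The two "while 1" loops in the script above retry forever if the server stays down. A bounded variant could look like the sketch below. It is written for current Python rather than the Python 2 of the script, and retry is a hypothetical helper, not part of Biopython. The attempt count is a placeholder; the 10 second delay matches the script.

```python
import time


def retry(operation, retriable, attempts=5, delay=10.0, sleep=time.sleep):
    """Call operation(), retrying on the given exceptions up to a limit.

    Re-raises the last exception once the attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retriable:
            if attempt == attempts:
                raise
            sleep(delay)


# In the script this would wrap the search call, along the lines of:
# results = retry(lambda: client.search(query, db="nucleotide"),
#                 (timeoutsocket.Timeout, timeoutsocket.Error, ValueError))
```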
    
From thamelry at binf.ku.dk  Wed Apr  7 07:23:55 2004
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] Oligopython page
In-Reply-To: <20040406220024.GB25784@evostick.agtec.uga.edu>
References: <20040405150431.67425.qmail@web21408.mail.yahoo.com>
	<40723BB0.5040809@ims.u-tokyo.ac.jp>
	<20040406220024.GB25784@evostick.agtec.uga.edu>
Message-ID: <200404071323.55612.thamelry@binf.ku.dk>

On Wednesday 07 April 2004 00:00, Brad Chapman wrote:

> If I remember properly, the reason KDTree is not installed by
> default isn't because it's C++ but rather because it used the C++
> standard library (stdc++) which caused building problems on some
> systems without development libraries.

To quote from an older post:

KDTree works fine. But: it needs a working C++ compiler, and 
a complete installation of Numpy (including header files) to compile. 
It seems that on Solaris it does not compile due to a bug in Distutils, which 
is not really coping well with C++ on some platforms (ie. missing flags, 
compiling with gcc instead of g++ etc.). 

---
Thomas Hamelryck
Bioinformatik centret      
Universitetsparken 15     
Bygning 10                 
DK-2100 København Ø
Denmark
http://www.binf.ku.dk/users/thamelry/


From mdehoon at ims.u-tokyo.ac.jp  Wed Apr  7 22:54:32 2004
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] Oligopython page
In-Reply-To: <200404071323.55612.thamelry@binf.ku.dk>
References: <20040405150431.67425.qmail@web21408.mail.yahoo.com>	<40723BB0.5040809@ims.u-tokyo.ac.jp>	<20040406220024.GB25784@evostick.agtec.uga.edu>
	<200404071323.55612.thamelry@binf.ku.dk>
Message-ID: <4074BEE8.3070704@ims.u-tokyo.ac.jp>

I tried compiling KDTree on a Unix machine running SunOS 5.8. It has two Python 
versions, one compiled with gcc and the other one with the native cc compiler. 
Distutils uses the same compiler as the one used to compile Python itself. The 
gcc-Python compiled KDTree without problems, but the cc-Python did not, as cc 
doesn't handle C++. The same problem may occur on other Unix platforms if the 
native compiler rather than gcc was used to compile Python.
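
To check which situation a given installation is in, the compiler recorded at build time can be read from the standard sysconfig module. A small sketch follows: looks_like_gcc is a hypothetical, name-based heuristic and will misjudge unusual compiler names.

```python
import os
import sysconfig


def build_compiler():
    """Return the C compiler command Python was built with, or None.

    The "CC" variable is typically unset on Windows builds.
    """
    return sysconfig.get_config_var("CC")


def looks_like_gcc(cc):
    """Heuristic: does the compiler command name gcc or g++?"""
    if not cc:
        return False
    program = os.path.basename(cc.split()[0])
    return "gcc" in program or "g++" in program


cc = build_compiler()
if cc is not None and not looks_like_gcc(cc):
    print("Python was built with %r; C++ extensions may not compile." % cc)
```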

--Michiel.

Thomas Hamelryck wrote:

> On Wednesday 07 April 2004 00:00, Brad Chapman wrote:
> 
> 
>>If I remember properly, the reason KDTree is not installed by
>>default isn't because it's C++ but rather because it used the C++
>>standard library (stdc++) which caused building problems on some
>>systems without development libraries.
> 
> 
> To quote from an older post:
> 
> KDTree works fine. But: it needs a working C++ compiler, and 
> a complete installation of Numpy (including header files) to compile. 
> It seems that on Solaris it does not compile due to a bug in Distutils, which 
> is not really coping well with C++ on some platforms (ie. missing flags, 
> compiling with gcc instead of g++ etc.). 

-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon


From chapmanb at uga.edu  Mon Apr 19 07:56:16 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] Work towards getting KDTree compiling
Message-ID: <20040419115616.GA12006@misterbd.agtec.uga.edu>

Hello Thomas and all;
Since Michiel wrote about the problems that were causing the KDTree
C++ extension to fail, I've been taking a look into seeing how we
might fix that problem and get KDTree compiled by default in
Biopython. To quote from Michiel's mail:

> I tried compiling KDTree on a Unix machine running SunOS 5.8. It has two 
> Python versions, one compiled with gcc and the other one with the native cc 
> compiler. Distutils uses the same compiler as the one used to compile 
> Python itself. The gcc-Python compiled KDTree without problems, but the 
> cc-Python did not, as cc doesn't handle C++. The same problem may occur on 
> other Unix platforms if the native compiler rather than gcc was used to 
> compile Python.

I was a bit surprised that distutils didn't recognize the C++ code
as such and use a C++ compiler, so I dug around a bit in the
distutils code and looked at how it figured out what is C++ and what
is C. Turns out, it uses the filename extensions to do this via the
following dictionary:

    language_map = {".c"   : "c",
                    ".cc"  : "c++",
                    ".cpp" : "c++",
                    ".cxx" : "c++",
                    ".m"   : "objc",
                   }

I guess if it doesn't find it here, it defaults to using the C
compiler, which is what it seemed like it was doing.
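
The lookup described above can be sketched in a few lines. This is a simplified stand-in, not the distutils source: the function name detect_language and the priority order are assumptions made for illustration, and only the dictionary contents come from the listing above. It shows why files ending in an upper-case ".C" are not recognized as C++.

```python
import os

# Extension-to-language table as quoted above.
LANGUAGE_MAP = {
    ".c": "c",
    ".cc": "c++",
    ".cpp": "c++",
    ".cxx": "c++",
    ".m": "objc",
}

# Assumed priority: if any source is C++, treat the whole extension as C++.
LANGUAGE_ORDER = ["c++", "objc", "c"]


def detect_language(sources):
    """Return the highest-priority language found, or None if nothing matches."""
    best = None
    for source in sources:
        ext = os.path.splitext(source)[1]  # case-sensitive: ".C" != ".c"
        lang = LANGUAGE_MAP.get(ext)
        if lang is None:
            continue  # unknown extension, no information gained
        if best is None or LANGUAGE_ORDER.index(lang) < LANGUAGE_ORDER.index(best):
            best = lang
    return best


if __name__ == "__main__":
    print(detect_language(["Bio/KDTree/KDTree.C", "Bio/KDTree/KDTree.swig.C"]))
    print(detect_language(["module.c", "helper.cpp"]))
```

With the KDTree file names the lookup finds nothing, which matches the observed fallback to the C compiler.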

As I dug further, I discovered you can set the language when
specifying the extension, and then this'll be used instead of the
filename extension detection.

So, I checked in a modified setup.py that compiles KDTree by default
and sets the language to c++:
    
    Extension('Bio.KDTree._CKDTree',
              ["Bio/KDTree/KDTree.C",
               "Bio/KDTree/KDTree.swig.C"],
              libraries=["stdc++"],
              language="c++"
              ),

I hope maybe this will fix the problem on non-gcc systems and get
KDTree compiled by default.

I was wondering if people on potential problem systems (Solaris,
Windows, Mac OSX) would mind testing this out to see if it compiles
happily. If so, I'll leave it in for the next release -- otherwise,
we'll have to keep working.

Thanks much for any reports (hopefully of success :-).
Brad

From jeffrey_chang at stanfordalumni.org  Mon Apr 19 14:32:38 2004
From: jeffrey_chang at stanfordalumni.org (Jeffrey Chang)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] Work towards getting KDTree compiling
In-Reply-To: <20040419115616.GA12006@misterbd.agtec.uga.edu>
References: <20040419115616.GA12006@misterbd.agtec.uga.edu>
Message-ID: <EF820C0B-922F-11D8-9764-000A956845CE@stanfordalumni.org>

On Apr 19, 2004, at 7:56 AM, Brad Chapman wrote:

> So, I checked in a modified setup.py that compiles KDTree by default
> and sets the language to c++:


> I hope maybe this will fix the problem on non-gcc systems and get
> KDTree compiled by default.
>
> I was wondering if people on potential problem systems (Solaris,
> Windows, Mac OSX) would mind testing this out to see if it compiles
> happily. If so, I'll leave it in for the next release -- otherwise,
> we'll have to keep working.

It built successfully for me, on Mac OS X 10.3.3.  The native compiler 
for Macs is gcc, so it also worked before the modification.  The 
modification did, however, change the compiler for the module from gcc 
to c++.  However, this is fine, because c++ is symlinked (by default?  
I didn't do it) to g++.

Relevant build output is attached.

Jeff


building 'Bio.KDTree._CKDTree' extension
creating build/temp.darwin-7.3.0-Power_Macintosh-2.3/Bio/KDTree
gcc -fno-strict-aliasing -Wno-long-double -no-cpp-precomp 
-mno-fused-madd -DNDEBUG -g -O3 -Wall -Wstrict-prototypes 
-I/opt/local/include/python2.3 -c Bio/KDTree/KDTree.C -o 
build/temp.darwin-7.3.0-Power_Macintosh-2.3/Bio/KDTree/KDTree.o
Bio/KDTree/KDTree.C: In member function `void
    KDTree::neighbor_simple_search(float)':
Bio/KDTree/KDTree.C:914: warning: comparison between signed and unsigned
    integer expressions
Bio/KDTree/KDTree.C:923: warning: comparison between signed and unsigned
    integer expressions
gcc -fno-strict-aliasing -Wno-long-double -no-cpp-precomp 
-mno-fused-madd -DNDEBUG -g -O3 -Wall -Wstrict-prototypes 
-I/opt/local/include/python2.3 -c Bio/KDTree/KDTree.swig.C -o 
build/temp.darwin-7.3.0-Power_Macintosh-2.3/Bio/KDTree/KDTree.swig.o
c++ -L/opt/local/lib -bundle -bundle_loader /opt/local/bin/python2.3 
build/temp.darwin-7.3.0-Power_Macintosh-2.3/Bio/KDTree/KDTree.o 
build/temp.darwin-7.3.0-Power_Macintosh-2.3/Bio/KDTree/KDTree.swig.o 
-lstdc++ -o 
build/lib.darwin-7.3.0-Power_Macintosh-2.3/Bio/KDTree/_CKDTree.so


From thamelry at binf.ku.dk  Mon Apr 19 15:12:30 2004
From: thamelry at binf.ku.dk (thamelry@binf.ku.dk)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] Re: Work towards getting KDTree compiling
In-Reply-To: <20040419115616.GA12006@misterbd.agtec.uga.edu>
References: <20040419115616.GA12006@misterbd.agtec.uga.edu>
Message-ID: <32794.80.63.229.120.1082401950.squirrel@www.binf.ku.dk>

Hi Brad,

> Since Michiel wrote about the problems that were causing the KDTree C++
> extension to fail, I've been taking a look into seeing how we
> might fix that problem and get KDTree compiled by default in
> Biopython.

It works for me on Mandrake 9.2 (it also worked before, of course).
But I've noticed that in the first two steps gcc is still used instead of
g++.
Output:

gcc -pthread -fno-strict-aliasing -DNDEBUG -O2 -fomit-frame-pointer -pipe
-march=i586 -mcpu=pentiumpro -g -fPIC -I/usr/include/python2.3 -c
Bio/KDTree/KDTree.swig.C -o
build/temp.linux-i686-2.3/Bio/KDTree/KDTree.swig.o
gcc -pthread -fno-strict-aliasing -DNDEBUG -O2 -fomit-frame-pointer -pipe
-march=i586 -mcpu=pentiumpro -g -fPIC -I/usr/include/python2.3 -c
Bio/KDTree/KDTree.C -o build/temp.linux-i686-2.3/Bio/KDTree/KDTree.o
g++ -pthread -shared build/temp.linux-i686-2.3/Bio/KDTree/KDTree.o
build/temp.linux-i686-2.3/Bio/KDTree/KDTree.swig.o -lstdc++ -o
build/lib.linux-i686-2.3/Bio/KDTree/_CKDTree.so
-Thomas


From hoffman at ebi.ac.uk  Mon Apr 19 16:01:24 2004
From: hoffman at ebi.ac.uk (Michael Hoffman)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] Re: Work towards getting KDTree compiling
In-Reply-To: <32794.80.63.229.120.1082401950.squirrel@www.binf.ku.dk>
References: <20040419115616.GA12006@misterbd.agtec.uga.edu>
	<32794.80.63.229.120.1082401950.squirrel@www.binf.ku.dk>
Message-ID: <Pine.LNX.4.58.0404192051170.30434@qnzvnan.rov.np.hx>

On Mon, 19 Apr 2004 thamelry@binf.ku.dk wrote:

> It works for me on Mandrake 9.2 (it also worked before, of course).
> But I've noticed that in the first two steps gcc is still used instead of
> g++.

On OSF1 V5.1 it also used cc instead of cxx which means it doesn't work.

The reason for this is that the compiler_so attribute of the compiler
object is always set to the C compiler. There is no compiler_cxx_so. I
think this is a bug in distutils. If you need additional options set
for shared object compilation you can always do it through the CXX
environment variable.

I hope no one minds that I just checked in a "customization"
(translation: slightly hacky workaround) to the build_ext_biopython
class which fixed this on OSF1. Things still work for me on Redhat
Linux 9 with the new setup.py.
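
For readers who want to see the general idea, here is a minimal sketch of one way such a workaround could look. It is not the patch that was checked in: the helper name cxx_compile_command is hypothetical, and it only assumes that the compiler object exposes its commands as lists of strings in compiler_so and compiler_cxx, as the Unix compiler class in distutils does.

```python
def cxx_compile_command(compiler_so, compiler_cxx):
    """Return compiler_so with its executable replaced by the C++ compiler.

    compiler_so  -- command list used for compiling shared-object code,
                    e.g. ["cc", "-fPIC"]
    compiler_cxx -- command list for the C++ compiler, e.g. ["cxx"]
    """
    if not compiler_cxx:
        # No C++ compiler known: leave the command untouched.
        return list(compiler_so)
    # Keep the flags, swap only the program name.
    return [compiler_cxx[0]] + list(compiler_so[1:])


if __name__ == "__main__":
    print(cxx_compile_command(["gcc", "-pthread", "-fPIC"], ["g++"]))
```

A build_ext subclass could apply something like this to the compiler object before building an extension whose language is "c++", and restore the original value afterwards.
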
-- 
Michael Hoffman <hoffman@ebi.ac.uk>
European Bioinformatics Institute

From chapmanb at uga.edu  Mon Apr 19 12:23:05 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] Re: Work towards getting KDTree compiling
In-Reply-To: <Pine.LNX.4.58.0404192051170.30434@qnzvnan.rov.np.hx>
References: <20040419115616.GA12006@misterbd.agtec.uga.edu>
	<32794.80.63.229.120.1082401950.squirrel@www.binf.ku.dk>
	<Pine.LNX.4.58.0404192051170.30434@qnzvnan.rov.np.hx>
Message-ID: <20040419162305.GB596@misterbd.agtec.uga.edu>

Hey all;

> > It works for me on Mandrake 9.2 (it also worked before, of course).
> > But I've noticed that in the first two steps gcc is still used instead of
> > g++.

Ah ha -- well spotted. 

> On OSF1 V5.1 it also used cc instead of cxx which means it doesn't work.
> 
> The reason for this is that the compiler_so attribute of the compiler
> object is always set to the C compiler. There is no compiler_cxx_so. I
> think this is a bug in distutils. 

Yeah, I'd say so as well -- it seems pretty pointless to only be
able to set the language for part of the compilation. I just went
back and looked again at what the behavior was when the language is
autodetected, and it is the same.

> I hope no one minds that I just checked in a "customization"
> (translation: slightly hacky workaround) to the build_ext_biopython
> class which fixed this on OSF1. Things still work for me on Redhat
> Linux 9 with the new setup.py.

Brilliant. It seems simple enough and only requires us to specify
the language as c++ for our included C++ code. +1 from me for keeping
this in the setup.py.

If people on disparate platforms could give this another test and
let me know if anything breaks, that would be great. I'd especially like
to hear from the ol' Windows folks. For the record, it all works
fine for me on a FreeBSD machine with gcc (but we probably already
knew that :-).

Thanks for all the testing and thanks again for the patch Michael.
Brad

From mdehoon at ims.u-tokyo.ac.jp  Sun Apr 25 08:00:24 2004
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] Documentation for Bio.LogisticRegression
Message-ID: <408BA858.2030207@ims.u-tokyo.ac.jp>

Dear all,

Recently I have been using the logistic regression model in 
Bio.LogisticRegression to predict transcription factors in bacteria (thanks 
Jeff! Great work). Over the weekend, I wrote some documentation for this module 
and submitted it to CVS. Jeff (or other interested people), can you have a look 
at it and check if you agree with my description? Feel free to add yourself as 
one of the authors. By the way, the function train in Bio.LogisticRegression has 
a keyword typecode, whose usage I didn't understand, so it is not included in 
the documentation.

The documentation is under biopython/Doc/cookbook/LogisticRegression:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/cookbook/LogisticRegression/?cvsroot=biopython

--Michiel.

-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon


From thamelry at binf.ku.dk  Sun Apr 25 11:50:04 2004
From: thamelry at binf.ku.dk (thamelry@binf.ku.dk)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] 3D Vector class
In-Reply-To: <408BA858.2030207@ims.u-tokyo.ac.jp>
References: <408BA858.2030207@ims.u-tokyo.ac.jp>
Message-ID: <33255.80.63.229.248.1082908204.squirrel@www.binf.ku.dk>


A Vector class representing a 3D vector was added to Bio.PDB.
Operations include dot and cross product, addition, subtraction,
division by a scalar, and calculation of norm, angles and
dihedral angles.
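
As an illustration of the operations listed, here is a small self-contained sketch using plain tuples. It is not the Bio.PDB implementation and does not use its API; all function names below are chosen for this example only.

```python
import math


def sub(a, b):
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    """Dot product."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    """Cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    """Euclidean length."""
    return math.sqrt(dot(a, a))


def angle(a, b):
    """Angle between a and b in radians."""
    cosine = dot(a, b) / (norm(a) * norm(b))
    return math.acos(max(-1.0, min(1.0, cosine)))  # clamp rounding noise


def dihedral(p1, p2, p3, p4):
    """Dihedral angle in radians defined by four points."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    x = dot(n1, n2)
    y = dot(cross(n1, n2), b2) / norm(b2)
    return math.atan2(y, x)


if __name__ == "__main__":
    print(angle((1, 0, 0), (0, 1, 0)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```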

Cheers,

-Thomas


From jeffrey_chang at stanfordalumni.org  Sun Apr 25 21:45:23 2004
From: jeffrey_chang at stanfordalumni.org (Jeffrey Chang)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] Documentation for Bio.LogisticRegression
In-Reply-To: <408BA858.2030207@ims.u-tokyo.ac.jp>
References: <408BA858.2030207@ims.u-tokyo.ac.jp>
Message-ID: <61FF19EA-9723-11D8-8A46-000A956845CE@stanfordalumni.org>

Hi Michiel,

Wow, this is a really nice document!  About the only comment that I  
have, is that in the first sentence "distinguish K classes from each  
other" should be "distinguish 2 classes."  While there are multinomial  
models for logistic regression, this code does not handle them.

The "typecode" parameter allows the user to choose the type of Numeric
matrix to use.  Since Newton-Raphson can eat up a lot of memory on
large problems, it may sometimes be beneficial to use single-precision
floats rather than doubles, which are used by default.  Thus, "typecode"
accepts Numeric typecodes, which are defined in the Numeric library,
like Numeric.Float16, Numeric.Float32, etc.  I left that parameter
undocumented because 1) it goes deeper into the internals than I
normally like to expose, and 2) I wasn't sure how useful it is.
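
To give a rough feel for the memory trade-off, the sketch below compares the storage needed for a matrix in single and double precision. It uses the standard-library array module as a stand-in, not Numeric, and its typecodes ("f", "d") are not Numeric typecodes; the helper name matrix_bytes is made up for this example.

```python
from array import array


def matrix_bytes(n_rows, n_cols, typecode):
    """Bytes needed to store an n_rows x n_cols matrix of the given typecode."""
    return n_rows * n_cols * array(typecode).itemsize


if __name__ == "__main__":
    # "f" is a C float (single precision), "d" a C double.
    for code in ("f", "d"):
        print(code, matrix_bytes(1000, 1000, code), "bytes")
```

On common platforms a double takes twice the space of a float, so the single-precision matrix needs about half the memory.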

Jeff


On Apr 25, 2004, at 8:00 AM, Michiel Jan Laurens de Hoon wrote:

> Dear all,
>
> Recently I have been using the logistic regression model in  
> Bio.LogisticRegression to predict transcription factors in bacteria  
> (thanks Jeff! Great work). Over the weekend, I wrote some  
> documentation for this module and submitted it to CVS. Jeff (or other  
> interested people), can you have a look at it and check if you agree  
> with my description? Feel free to add yourself as one of the authors.  
> By the way, the function train in Bio.LogisticRegression has a keyword  
> typecode, whose usage I didn't understand, so it is not included in  
> the documentation.
>
> The documentation is under biopython/Doc/cookbook/LogisticRegression:
>
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/cookbook/LogisticRegression/?cvsroot=biopython
>
> --Michiel.


From Marijn.vanderGaag at wur.nl  Mon Apr 26 03:45:23 2004
From: Marijn.vanderGaag at wur.nl (Gaag, Marijn van der)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] new biopython modules 
Message-ID: <5F9035D8A446C84C903301CCD5FC8DB4190653@salte0010.wurnet.nl>

Hi,

I have made a parser and a record module for Fasta/ssearch similarity search results, similar to those made by Jeff Chang for parsing blast/recording the results.
Are you interested in it for use in biopython? 
If so, let me know and I will send you the scripts. We (Plant Research International at the Wageningen University and Research Centre, The Netherlands) are using these two modules for storing results of batch fasta searches. 

However, I will not be available for maintaining these modules since I will move to another job soon and will not be involved in bioinformatics anymore. 

Greetings, 
marijn van der gaag
Plant research International
Wageningen University and Research Centre
The Netherlands


From j.a.casbon at qmul.ac.uk  Tue Apr 27 07:45:20 2004
From: j.a.casbon at qmul.ac.uk (James Casbon)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] COMPASS parsing code
Message-ID: <200404271245.20940.j.a.casbon@qmul.ac.uk>


Hi,

I have written some code for parsing compass results.  Compass implements 
profile/profile alignment and is available by ftp.  See:

http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12547212
http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14500884

for more details.

I have attached the code, which you might like to include in the biopython 
distribution. 

There are probably a few issues with the code that could make it better:

* The unit tests use some sample input, the files comtest1 and comtest2.  These 
are just read using open.  I have seen someone use test.locate or something 
like that, but I'm not sure how that works.  If you want to enlighten me, I'll 
change it.

* I have used regular expressions inefficiently, as I'm not sure how you're 
supposed to cache them using the _Scanner/_Consumer framework.  At the moment 
each subroutine compiles a regular expression when called, which can't be good.  
Again, please enlighten me to a better way and I will change it.
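
On the second point, one common approach is to compile each pattern once at module level and reuse it in the parsing routines. The sketch below is independent of the _Scanner/_Consumer framework and of the attached Compass.py; the names _SCORE_RE and parse_score_line are hypothetical, and the pattern is written against the score line in the sample output included in this message.

```python
import re

# Compiled once when the module is imported, then reused on every call.
_SCORE_RE = re.compile(r"Smith-Waterman score = (\d+)\s+Evalue = (\S+)")


def parse_score_line(line):
    """Return (score, evalue) from a COMPASS score line."""
    match = _SCORE_RE.search(line)
    if match is None:
        raise ValueError("not a score line: %r" % line)
    return int(match.group(1)), float(match.group(2))


if __name__ == "__main__":
    print(parse_score_line("Smith-Waterman score = 35        Evalue = 1.01e+03"))
```

The same idea carries over to a class: the compiled patterns can live as module constants or class attributes, so the methods only call search or match.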


regards,
James

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Compass.py
Type: application/x-python
Size: 12778 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython-dev/attachments/20040427/96a80e10/Compass.bin
-------------- next part --------------
Ali1: 60456.blo.gz.aln  Ali2: allscop//14982.blo.gz.aln
Threshold of effective gap content in columns: 0.5
length1=388     filtered_length1=386    length2=116     filtered_length2=115
Nseqs1=399      Neff1=12.972    Nseqs2=1        Neff2=11.313
Smith-Waterman score = 35        Evalue = 1.01e+03

QUERY   178    KKDLEEIAD
               ++ ++++++
QUERY   9      QAAVQAVTA

Ali1: 60456.blo.gz.aln  Ali2: allscop//14983.blo.gz.aln
Threshold of effective gap content in columns: 0.5
length1=388     filtered_length1=386    length2=121     filtered_length2=119
Nseqs1=399      Neff1=12.972    Nseqs2=1        Neff2=11.168
Smith-Waterman score = 35        Evalue = 1.01e+03

QUERY   178    KKDLEEIAD
               ++ ++++++
QUERY   9      REAVEAAVD

Ali1: 60456.blo.gz.aln  Ali2: allscop//14984.blo.gz.aln
Threshold of effective gap content in columns: 0.5
length1=388     filtered_length1=386    length2=145     filtered_length2=137
Nseqs1=399      Neff1=12.972    Nseqs2=1        Neff2=5.869
Smith-Waterman score = 37        Evalue = 5.75e+02

QUERY   371    LEEAMDRMER~~~V
               + ++++ + +   +
QUERY   76     LQNFIDQLDNpddL

Ali1: 60456.blo.gz.aln  Ali2: allscop//15010.blo.gz.aln
Threshold of effective gap content in columns: 0.5
length1=388     filtered_length1=386    length2=141     filtered_length2=141
Nseqs1=399      Neff1=12.972    Nseqs2=1        Neff2=6.099
Smith-Waterman score = 37        Evalue = 5.75e+02

QUERY   163    LIINSP
               ++++++
QUERY   32     LFDAHD


-------------- next part --------------
....Ali1: 60456.blo.gz.aln      Ali2: 60456.blo.gz.aln
Threshold of effective gap content in columns: 0.5
length1=388     filtered_length1=386    length2=388     filtered_length2=386
Nseqs1=399      Neff1=12.972    Nseqs2=399      Neff2=12.972
Smith-Waterman score = 2759      Evalue = 0.00e+00

QUERY   2      LSDRLELVSASEIRKLFDIAAGMKDVISLGIGEPDFDTPQHIKEYAKEALDKGLTHYGPN
               ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
QUERY   2      LSDRLELVSASEIRKLFDIAAGMKDVISLGIGEPDFDTPQHIKEYAKEALDKGLTHYGPN


QUERY          IGLLELREAIAEKLKKQNGIEADPKTEIMVLLGANQAFLMGLSAFLKDGEEVLIPTPAFV
               ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
QUERY          IGLLELREAIAEKLKKQNGIEADPKTEIMVLLGANQAFLMGLSAFLKDGEEVLIPTPAFV


QUERY          SYAPAVILAGGKPVEVPTYEEDEFRLNVDELKKYVTDKTRALIINSPCNPTGAVLTKKDL
               ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
QUERY          SYAPAVILAGGKPVEVPTYEEDEFRLNVDELKKYVTDKTRALIINSPCNPTGAVLTKKDL


QUERY          EEIADFVVEHDLIVISDEVYEHFIYDDARHYSIASLDGMFERTITVNGFSKTFAMTGWRL
               ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
QUERY          EEIADFVVEHDLIVISDEVYEHFIYDDARHYSIASLDGMFERTITVNGFSKTFAMTGWRL


QUERY          GFVAAPSWIIERMVKFQMYNATCPVTFIQYAAAKALKDERSWKAVEEMRKEYDRRRKLVW
               ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
QUERY          GFVAAPSWIIERMVKFQMYNATCPVTFIQYAAAKALKDERSWKAVEEMRKEYDRRRKLVW


QUERY          KRLNEMGLPTVKPKGAFYIFPRIRDTGLTSKKFSELMLKEARVAVVPGSAFGKAGEGYVR
               ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
QUERY          KRLNEMGLPTVKPKGAFYIFPRIRDTGLTSKKFSELMLKEARVAVVPGSAFGKAGEGYVR


QUERY          ISYATAYEKLEEAMDRMERVLKERKL
               ++++++++++++++++++++++++++
QUERY          ISYATAYEKLEEAMDRMERVLKERKL

From bugzilla-daemon at portal.open-bio.org  Tue Apr 27 19:31:34 2004
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] [Bug 1627] New: "Unexpected end of stream" when
	parsing Blast results
Message-ID: <200404272331.i3RNVY73001338@portal.open-bio.org>

http://bugzilla.bioperl.org/show_bug.cgi?id=1627

           Summary: "Unexpected end of stream" when parsing Blast results
           Product: Biopython
           Version: 1.24
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: rayka@mit.edu


The parsing of Blast results seems to have difficulty in recognizing the end 
of a Blast file.  I've gotten an "Unexpected end of stream" error with several 
different genes.

The code:
blast_results = NCBIWWW.blast('blastn', 'nr', Seq.tostring(), 
entrez_query=org, filter='off', expect='1000', word_size='7')
blast_parser = NCBIWWW.BlastParser()
blast_record = blast_parser.parse(blast_results)

Results in error:
  blast_record = blast_parser.parse(blast_results)
  File "C:\Python23\lib\site-packages\Bio\Blast\NCBIWWW.py", line 48, in parse
    self._scanner.feed(handle, self._consumer)
  File "C:\Python23\lib\site-packages\Bio\Blast\NCBIWWW.py", line 97, in feed
    has_re=re.compile(r'<b>.?BLAST'))
  File "C:\Python23\lib\site-packages\Bio\ParserSupport.py", line 335, in read_and_call_until
    line = safe_readline(uhandle)
  File "C:\Python23\lib\site-packages\Bio\ParserSupport.py", line 411, in safe_readline
    raise SyntaxError, "Unexpected end of stream."
SyntaxError: Unexpected end of stream.

When the line in safe_readline is printed out before the error is thrown the 
output looks like:
effective search space used: 26390847234
T: 0
A: 0
X1: 6 (11.9 bits)
X2: 15 (29.7 bits)
S1: 12 (24.3 bits)
S2: 13 (26.3 bits)
I suspect this is actually the end of the Blast file, but the program does not 
recognize it as such.
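
The failure mode can be reproduced with a much simplified stand-in for the helpers named in the traceback. This is a sketch in current Python, not the Bio.ParserSupport code: the function read_until is a hypothetical simplification, and ValueError is used here in place of the SyntaxError shown above. The parser keeps reading until a line matches the expected pattern, so if that line never arrives, the read runs off the end of the input.

```python
import io
import re


def safe_readline(handle):
    """Read one line, raising if the input is exhausted."""
    line = handle.readline()
    if not line:
        raise ValueError("Unexpected end of stream.")
    return line


def read_until(handle, pattern):
    """Collect lines until one matches pattern (the matching line is dropped)."""
    lines = []
    while True:
        line = safe_readline(handle)
        if pattern.search(line):
            return lines
        lines.append(line)


if __name__ == "__main__":
    tail = io.StringIO("S1: 12 (24.3 bits)\nS2: 13 (26.3 bits)\n")
    try:
        read_until(tail, re.compile(r"<b>.?BLAST"))
    except ValueError as err:
        print("error:", err)
```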


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From idoerg at burnham.org  Thu Apr 29 14:52:42 2004
From: idoerg at burnham.org (Iddo Friedberg)
Date: Sat Mar  5 14:43:32 2005
Subject: [Biopython-dev] How many people on biopython lists?
Message-ID: <40914EFA.4010809@burnham.org>

Hi,

Can someone tell me how many subscribers there are on the biopython and 
biopython-dev lists? It's for a book chapter... good PR.

Thanks,

Iddo


-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
http://ffas.ljcrf.edu/~iddo