From Y.Benita at pharm.uu.nl  Fri Aug  1 02:49:12 2003
From: Y.Benita at pharm.uu.nl (Yair Benita)
Date: Sat Mar  5 14:43:24 2005
Subject: [Biopython-dev] Re: [BioPython] GRAVY index program anyone?
In-Reply-To: <3F298B68.4060200@burnham.org>
Message-ID: <BB4FDA08.2CA0%Y.Benita@pharm.uu.nl>

on 31/7/03 23:34, Iddo Friedberg at idoerg@burnham.org wrote:

> Hi,
> 
> Does anybody have a program which calculates a sequence's Grand average
> of hydropathicity (GRAVY)?
> 
> TIA,
> 
> ./I
> 
> 
Here is one of my functions. I have a collection of many protein analysis
functions, maybe its time to put together a module.
Yair 


# Kyte & Doolittle hydrophobiciy index
kd = { 'A': 1.8,'R':-4.5,'N':-3.5,'D':-3.5,'C': 2.5,
       'Q':-3.5,'E':-3.5,'G':-0.4,'H':-3.2,'I': 4.5,
       'L': 3.8,'K':-3.9,'M': 1.9,'F': 2.8,'P':-1.6,
       'S':-0.8,'T':-0.7,'W':-0.9,'Y':-1.3,'V': 4.2 }


# calculate the garvy according to kyte and doolittle.
def Gravy(ProteinSequence):
    
    if ProteinSequence.islower():
        ProteinSequence = ProteinSequence.upper()
        
    ProtGravy=0.0
    for i in ProteinSequence:
        ProtGravy += kd[i]
        
    return ProtGravy/len(ProteinSequence)


-- 
Yair Benita
Pharmaceutical Proteomics
Faculty of Pharmacy
Utrecht University


From dalke at dalkescientific.com  Fri Aug  1 03:40:05 2003
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat Mar  5 14:43:24 2005
Subject: [Biopython-dev] Re: [BioPython] GRAVY index program anyone?
In-Reply-To: <BB4FDA08.2CA0%Y.Benita@pharm.uu.nl>
Message-ID: <5E12B6AE-C3F3-11D7-961C-000393C92466@dalkescientific.com>

Yair:
> Here is one of my functions. I have a collection of many protein 
> analysis
> functions, maybe its time to put together a module.

It would be.

BTW, here's a way to make things go faster - make the dict include
the lowercase characters.  This means you don't need to scan/convert
the sequence before acting on it.

# Kyte & Doolittle hydrophobiciy index
kd = { 'A': 1.8,'R':-4.5,'N':-3.5,'D':-3.5,'C': 2.5,
        'Q':-3.5,'E':-3.5,'G':-0.4,'H':-3.2,'I': 4.5,
        'L': 3.8,'K':-3.9,'M': 1.9,'F': 2.8,'P':-1.6,
        'S':-0.8,'T':-0.7,'W':-0.9,'Y':-1.3,'V': 4.2 }

# add in the lowercase characters
_full_kd = kd.copy()
_full_kd.update(dict([ (k.lower(), v) for k, v in kd.items()]))

# calculate the garvy according to kyte and doolittle.
def Gravy(ProteinSequence):
     _kd = _full_kd  # slightly faster performance with a local name 
lookup
     ProtGravy=0.0
     for i in ProteinSequence:
         ProtGravy += _kd[i]

     return ProtGravy/len(ProteinSequence)

I don't think there's a faster way.  Other tricks, like
   sum([kd[c] for c in s])
and
   sum(map(kd.__getitem__, s))
for the main loop are both slower because they build up the
intermediate list.  I even played around with

def iter_lookup(d, s):
   for c in s: yield d

   sum(iter_lookup(_kd, ProteinSequence))

but at least for a short sequence it's also slower - perhaps because
of the '.next()' method call overhead?

					Andrew
					dalke@dalkescientific.com


From dalke at dalkescientific.com  Fri Aug  1 05:04:56 2003
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat Mar  5 14:43:24 2005
Subject: [Biopython-dev] Re: [BioPython] GRAVY index program anyone?
In-Reply-To: <5E12B6AE-C3F3-11D7-961C-000393C92466@dalkescientific.com>
Message-ID: <38832842-C3FF-11D7-961C-000393C92466@dalkescientific.com>

Me:
> # add in the lowercase characters
> _full_kd = kd.copy()
> _full_kd.update(dict([ (k.lower(), v) for k, v in kd.items()]))

BTW, I've been experimenting with some of the new 2.3 features.
Unfortunately, I went overboard.  The following is better

_full_kd = {}
for k, v in kd.items():
   _full_kd[k] = _full_kd[k.lower()] = v

					Andrew
					dalke@dalkescientific.com


From Y.Benita at pharm.uu.nl  Fri Aug  1 09:07:39 2003
From: Y.Benita at pharm.uu.nl (Yair Benita)
Date: Sat Mar  5 14:43:24 2005
Subject: [Biopython-dev] Re: [BioPython] GRAVY index program anyone?
In-Reply-To: <5E12B6AE-C3F3-11D7-961C-000393C92466@dalkescientific.com>
Message-ID: <BB5032BB.2CB4%Y.Benita@pharm.uu.nl>

on 1/8/03 9:40, Andrew Dalke at dalke@dalkescientific.com wrote:

> Yair:
>> Here is one of my functions. I have a collection of many protein
>> analysis
>> functions, maybe its time to put together a module.

> Andrew: 
> It would be.
Thanks for the feedback, Andrew. I already implemented your suggestions.

I recall we discussed this issue a few months ago. We talked about a "Tools"
module which will hold all analysis functions. The "Tools" module evolved to
something different and the analysis methods were forgotten.

My own modules have two analysis classes which are initialized with an
appropriate sequence object and have these methods:

class DnaAnalysis
    nucleotide content
    GC content
    Codon adaptation index

class ProteinAnalysis
    Amino acid content
    Molecular weight
    Aromaticity
    Calculate scale -> sliding window with any given dictionary.
    Instability index
    Flexibility
    isoelectric point
    Gravy
    secondary structure -> using Garnier
    
I guess there is some redundancy with existing modules. How should we
proceed with that? Maybe some of you prefer to separate DNA and Protein or
even make separate functions instead of classes.

I will clean up my code and send it soon.
Yair
-- 
Yair Benita
Pharmaceutical Proteomics
Faculty of Pharmacy
Utrecht University


From jchang at smi.stanford.edu  Fri Aug  1 13:07:02 2003
From: jchang at smi.stanford.edu (Jeffrey Chang)
Date: Sat Mar  5 14:43:24 2005
Subject: [Biopython-dev] Re: [BioPython] GRAVY index program anyone?
In-Reply-To: <38832842-C3FF-11D7-961C-000393C92466@dalkescientific.com>
Message-ID: <919ED7E4-C442-11D7-B81C-000A956845CE@smi.stanford.edu>

On Friday, August 1, 2003, at 02:04  AM, Andrew Dalke wrote:

> BTW, I've been experimenting with some of the new 2.3 features.

What is everyone's feelings about Python 2.3 for Biopython?  I'd rather 
be conservative with the versioning (Biopython now requires Python 
2.2), so that people aren't required to upgrade their Python.  However, 
if there are compelling features in 2.3 that people need to use, that 
might be a good reason to bump up the requirement for the next release. 
  Are there features that people need, and are using in 2.3?  I haven't 
upgraded mine yet.

Jeff


From Yves.Bastide at irisa.fr  Mon Aug  4 03:51:22 2003
From: Yves.Bastide at irisa.fr (Yves Bastide)
Date: Sat Mar  5 14:43:24 2005
Subject: [Biopython-dev] Python 2.3 [Was Re: [BioPython] GRAVY index
	program anyone?]
In-Reply-To: <919ED7E4-C442-11D7-B81C-000A956845CE@smi.stanford.edu>
References: <919ED7E4-C442-11D7-B81C-000A956845CE@smi.stanford.edu>
Message-ID: <3F2E107A.4090809@irisa.fr>

Jeffrey Chang wrote:
> On Friday, August 1, 2003, at 02:04  AM, Andrew Dalke wrote:
> 
>> BTW, I've been experimenting with some of the new 2.3 features.
> 
> 
> What is everyone's feelings about Python 2.3 for Biopython?  I'd rather 
> be conservative with the versioning (Biopython now requires Python 2.2), 
> so that people aren't required to upgrade their Python.  However, if 
> there are compelling features in 2.3 that people need to use, that might 
> be a good reason to bump up the requirement for the next release.  Are 
> there features that people need, and are using in 2.3?  I haven't 
> upgraded mine yet.

* enumerate: easy to duplicate.
* The dict() constructor: ditto.
* Universal newline support: nice, though Biopython should do it OK by now.
* Some nice modules: logging, csv, optparse.  None useful for Biopython.

All in all, I think it's best to stick to Python 2.2 -- well, to 
incrementally upgrade Biopython to Python 2 :)

> 
> Jeff
> 

yves


From dalke at dalkescientific.com  Mon Aug  4 04:13:38 2003
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat Mar  5 14:43:24 2005
Subject: [Biopython-dev] Python 2.3 [Was Re: [BioPython] GRAVY index
	program anyone?]
In-Reply-To: <3F2E107A.4090809@irisa.fr>
Message-ID: <8D5682C5-C653-11D7-961C-000393C92466@dalkescientific.com>

I don't think there's a strong reason to move to 2.3.  Here's the most
relevant changes

sum and enumerate builtins - we can get by with old-style code.

The module I like most is 'datetime'.  As some point we should get our
database records to use this.

For the chemistry work I do, csv is nice, but surprisingly unused
in biology.

The bsddb and bz2 modules are nice for Mindy, but not essential.
(*sigh*, and I need to finish that off.)

The logging might be nice, but I'm just not a logging type of person.

I'm told that optparse is better than getopt, so we should move
scripts over to use the new API

sets is potentially useful, but will require API changes.  Eg,
we could return search results as a set rather than a list.  This
would allow us to do intersections and unions pretty easily.  But
then we lose native order.

socket timeout might be handy in a few cases.  I don't think so
though - the socket should be passed in to objects rather than
created internally.

zipimport suggests a nice way to distribute biopython (excepting
the C extensions)

A lot of ugly slice code can be fixed up with the new slice object
methods.

					Andrew


From Y.Benita at pharm.uu.nl  Mon Aug  4 10:16:21 2003
From: Y.Benita at pharm.uu.nl (Yair Benita)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Additions to SeqUtils
Message-ID: <BB543757.2CC2%Y.Benita@pharm.uu.nl>

Hi All,
As promised a few days ago I submit code to be added to the SeqUtils module.
The modules include:
Codon adaptation index -> for DNA sequence
Protein analysis methods such as isoelectric point, molecular weight and
more. Take a look.

You just have to change the import statement at the top to fit the location
you use for the module.

I would appreciate any comments or feedback.
Thanks,
Yair
-- 
Yair Benita
Pharmaceutical Proteomics
Faculty of Pharmacy
Utrecht University

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/octet-stream
Size: 184320 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython-dev/attachments/20030804/084691a3/attachment.obj
From grouse at mail.utexas.edu  Mon Aug  4 12:34:22 2003
From: grouse at mail.utexas.edu (Michael Hoffman)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Python 2.3
In-Reply-To: <919ED7E4-C442-11D7-B81C-000A956845CE@smi.stanford.edu>
References: <38832842-C3FF-11D7-961C-000393C92466@dalkescientific.com>
	<919ED7E4-C442-11D7-B81C-000A956845CE@smi.stanford.edu>
Message-ID: <Pine.LNX.4.55.0308041125490.5780@indy.icmb.utexas.edu>

On Fri, 1 Aug 2003, Jeffrey Chang wrote:

> What is everyone's feelings about Python 2.3 for Biopython?

I agree with the previous posters--there are some nice features but it
isn't essential. However, I think the programmers might have a hard
time resisting using all those new features forever! ;-) We've already
seen examples of this. It might be a good idea to decide on a
migration schedule that will give the installed base time to migrate
to Python 2.3 before we allow check-in of code. Maybe 1 August 2004?
Don't worry, it will be here before you realize it. :-)

As far as enumerate, I put it into Bio.GFF.GenericTools as soon as I
saw the PEP (indeed the code came straight out of the PEP), as are
many other non-biological utility classes/functions. Have fun!

Also, we have been using optik (now optparse) for all of our scripts
locally for some time. It has greatly increased maintainability and
documentation of our code. I highly recommend you check it out.

csv (available standalone for some time) is very useful for GFF stuff
of course. None of the GFF code I have checked in uses it though.
-- 
Michael Hoffman <grouse@alumni.utexas.net>
The University of Texas at Austin


From Y.Benita at pharm.uu.nl  Tue Aug  5 03:20:02 2003
From: Y.Benita at pharm.uu.nl (Yair Benita)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Additions to SeqUtils
In-Reply-To: <a06001a0abb54b0af32aa@[192.168.1.100]>
Message-ID: <BB552742.2CCD%Y.Benita@pharm.uu.nl>

on 5/8/03 3:16, Mark Yeager at yeagerm@comcast.net wrote:

>> As promised a few days ago I submit code to be added to the SeqUtils module.
>> The modules include:
>> Codon adaptation index -> for DNA sequence
>> Protein analysis methods such as isoelectric point, molecular weight and
>> more. Take a look.
> 
> Hello Yair- I am just starting out (one week now) with learning Python and
> BioPython for small bioinformatics utilities. I am an old Fortran programmer
> so it is very new to me. I started to learn Perl and BioPerl but I could never
> make useful sense of code examples and I decided to go with Python instead.
> 
> To start with, I'd like to script adding additional info to flat file
> databases of proteins of interest. Your example of CAI would be a perfect
> starting point.
> 
> Is there a small example program to orient me to actually do something useful-
> to specify an accession number and lookup the sequence in a fasta file and
> then calculate the CAI? A working example I can play with but actually do
> something useful.
> 
> I am continuing to read through the tutorials but I have yet to make it to
> BioPython. There is probably something already there along these lines-
> perhaps you can point me to that?
> 
> Thanks very much for your contributions, best regards,
> 
> Mark Yeager

Hi Mark,
You can look in the test files for more examples. Here are a few lines which
can help you fetch a gene from Genbank and get the CAI.

from Bio.WWW import NCBI
from Bio import Fasta
import CodonUsage #make sure you put the module on your python path

# fetch a gene from genebank
aGene = 
NCBI.efetch('nucleotide',id='23113',seq_start=373,seq_stop=1113,rettype='fas
ta')

# set up the fasta parser to read it.
parser = Fasta.RecordParser()
iterator = Fasta.Iterator(aGene, parser)
record = iterator.next()

# create an instance of CodonAdaptationIndex
aGeneCai = CodonUsage.CodonAdaptationIndex()

# print the gene in fasta format
print record

# print the CAI for the gene using the cai_for_gene method.
# Note that the default Shart & Li Ecoli index is used when
# you don't specify a different index.
# look in the test_CodonUsage for an example on making your own index.
print "\nCodon adaptation index for the above gene: %.2f" %
aGeneCai.cai_for_gene(record.sequence)

-- 
Yair Benita
Pharmaceutical Proteomics
Faculty of Pharmacy
Utrecht University


From andreas.kuntzagk at mdc-berlin.de  Tue Aug  5 03:33:33 2003
From: andreas.kuntzagk at mdc-berlin.de (Andreas Kuntzagk)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Python 2.3
In-Reply-To: <Pine.LNX.4.55.0308041125490.5780@indy.icmb.utexas.edu>
References: <38832842-C3FF-11D7-961C-000393C92466@dalkescientific.com>
	<919ED7E4-C442-11D7-B81C-000A956845CE@smi.stanford.edu> 
	<Pine.LNX.4.55.0308041125490.5780@indy.icmb.utexas.edu>
Message-ID: <1060068757.11841.58.camel@sulawesi>

Am Mon, 2003-08-04 um 18.34 schrieb Michael Hoffman:
> On Fri, 1 Aug 2003, Jeffrey Chang wrote:
> 
> > What is everyone's feelings about Python 2.3 for Biopython?
> 
> I agree with the previous posters--there are some nice features but it
> isn't essential. 

I think the same.

> However, I think the programmers might have a hard
> time resisting using all those new features forever! 

I can resist for some time. (At least until I find out by accident, that
the next distribution I install comes with 2.3 preinstalled.

> ;-) We've already
> seen examples of this. It might be a good idea to decide on a
> migration schedule that will give the installed base time to migrate
> to Python 2.3 before we allow check-in of code. Maybe 1 August 2004?
> Don't worry, it will be here before you realize it. :-)
> 
> As far as enumerate, I put it into Bio.GFF.GenericTools as soon as I
> saw the PEP (indeed the code came straight out of the PEP), as are
> many other non-biological utility classes/functions. Have fun!

So we already depend on 2.3 ? Or can you use enumerate with 2.2?
The longer I think about enumerate, the more I think it is a useful
feature. Maybe we can change the migration date to September 2003 ;-)

Btw. I was thinking, is there a better way to use the Iterator-classes
in the different Modules? 

Up to now, I do it like this:
Parser = GenBank.RecordParser()
File = open ("foo")
Iterator = GenBank.Iterator(File,parser=Parser)
while 1:
    record = it.next()
    if not record: break
    ...

I think, nicer looking would be:

File = GenBank.Flatfile("foo",Parser)
for record in File:
    # work with record

Can I already do this? Maybe with the new Parsers? (I still haven't
checked it out.)

Andreas


From dalke at dalkescientific.com  Tue Aug  5 03:55:16 2003
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Python 2.3
In-Reply-To: <1060068757.11841.58.camel@sulawesi>
Message-ID: <26B71C22-C71A-11D7-961C-000393C92466@dalkescientific.com>

Andreas Kuntzagk:
> So we already depend on 2.3 ? Or can you use enumerate with 2.2?

The PEP which introduces enumerate includes Python 2.2 compatible
code which implements the same functionality.  Michael Hoffman
added that snippet to his code, so he could use it despite not
supporting 2.3.

It is possible to have a 'compatibility' module which lets some
2.3-style code work on 2.2 Pythons.  For example,

from Bio.compatibility import enumerate

This would be implemented as

try:
   enumerate
except NameError:
   def enumerate ...

I've seen similar code used (rarely) in other modules.  I'm
not enthusiastic one way or the other, but I don't see any deep
problem with it.

> Btw. I was thinking, is there a better way to use the Iterator-classes
> in the different Modules?

*sigh*  Last time I worked on the parser code we still had 2.1
compatibility so I have lots of tricky, ugly code to emulate
iterators through __getitem__.  I need to clean that up.

(And what I really want is someone to pay me for it.  :)

Sadly, not happening soon.

					Andrew
					dalke@dalkescientific.com


From thomas at cbs.dtu.dk  Tue Aug  5 07:46:10 2003
From: thomas at cbs.dtu.dk (Thomas Sicheritz-Ponten)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Additions to SeqUtils
In-Reply-To: <6FCDAE95-C712-11D7-BC7F-000A956845CE@jeffchang.com>
References: <6FCDAE95-C712-11D7-BC7F-000A956845CE@jeffchang.com>
Message-ID: <m3k79sqsnn.fsf@fkogtsp1.bmc.uu.se>

Jeffrey Chang <jchang@jeffchang.com> writes:

> Hej Thomas,
> 
> How are you doing?  Could you take a look at this and submit it if is
> looks OK to you?

?hhmm .. I am currently moving from Uppsala to Malm?, all the essential
stuff (computers, coffee machine, mp3's) are in the moving company truck
Arghhhh - I feel so lonely without my computers!!!! :-)

I can look into that after restoring my original chaos, just send me a
reminding email next week,

cheers
-thomas


> 
> 
> Jeff
> 
> 
> 
> On Monday, August 4, 2003, at 07:16  AM, Yair Benita wrote:
> 
> > Hi All,
> > As promised a few days ago I submit code to be added to the SeqUtils
> > module.
> 
> > The modules include:
> > Codon adaptation index -> for DNA sequence
> > Protein analysis methods such as isoelectric point, molecular weight
> > and
> 
> > more. Take a look.
> >
> > You just have to change the import statement at the top to fit the
> > location
> 
> > you use for the module.
> >
> > I would appreciate any comments or feedback.
> > Thanks,
> > Yair
> > -- 
> > Yair Benita
> > Pharmaceutical Proteomics
> > Faculty of Pharmacy
> > Utrecht University
> >
> > <AddToSeqUtils.tar>_______________________________________________
> > Biopython-dev mailing list
> > Biopython-dev@biopython.org
> > http://biopython.org/mailman/listinfo/biopython-dev
> 

-- 
Sicheritz-Ponten Thomas, Ph.D, thomas@biopython.org      (
Center for Biological Sequence Analysis                   \
BioCentrum-DTU, Technical University of Denmark            )
CBS: +45 45 252485      Building 208, DK-2800 Lyngby  ##----->
Fax: +45 45 931585      http://www.cbs.dtu.dk/thomas       )
                                                          /
     ... damn arrow eating trees ...                     (

From bugzilla-daemon at portal.open-bio.org  Tue Aug  5 11:48:57 2003
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] [Bug 1409] Problems with Martel and setup.py
Message-ID: <200308051548.h75Fmv26001554@localhost.localdomain>

http://bugzilla.bioperl.org/show_bug.cgi?id=1409


------- Additional Comments From dvorak@mcs.anl.gov  2003-08-05 11:48 -------
Hello:

When I try to install, I am having similar problems.  Please see the last line
of output below.

[dvorak@jlogin2 biopython-1.21]$ python setup.py install
--prefix=/soft/apps/packages/python-biopython-1.21
running install
*** Martel *** is either not installed or out of date.

This package is required for many Biopython features.  Please install
it before you install Biopython.
You can find Martel at http://www.biopython.org/~dalke/Martel/.

Do you want to continue this installation? (y/N) y
 *** Reportlab *** is either not installed or out of date.

This package is optional, which means it is only used in a few
specialized modules in Biopython.  You probably don't need this is you
are unsure.  You can ignore this requirement, and install it later if
you see ImportErrors.
You can find Reportlab at http://www.reportlab.com/download.html.

Do you want to continue this installation? (Y/n) y
 running build
running build_py
Traceback (most recent call last):
  File "setup.py", line 387, in ?
    ext_modules=EXTENSIONS,
  File "/var/tmp/python-2.2.1-root/usr/lib/python2.2/distutils/core.py", line
138, in setup
  File "/var/tmp/python-2.2.1-root/usr/lib/python2.2/distutils/dist.py", line
893, in run_commands
  File "/var/tmp/python-2.2.1-root/usr/lib/python2.2/distutils/dist.py", line
913, in run_command
  File "setup.py", line 137, in run
    install.run(self)
  File
"/var/tmp/python-2.2.1-root/usr/lib/python2.2/distutils/command/install.py",
line 491, in run
  File "/usr/lib/python2.2/cmd.py", line 330, in run_command
    print "\n"
  File "/var/tmp/python-2.2.1-root/usr/lib/python2.2/distutils/dist.py", line
913, in run_command
  File
"/var/tmp/python-2.2.1-root/usr/lib/python2.2/distutils/command/build.py", line
107, in run
  File "/usr/lib/python2.2/cmd.py", line 330, in run_command
    print "\n"
  File "/var/tmp/python-2.2.1-root/usr/lib/python2.2/distutils/dist.py", line
913, in run_command
  File "setup.py", line 144, in run
    if not is_Martel_installed():
  File "setup.py", line 190, in is_Martel_installed
    m = can_import("Martel")
  File "setup.py", line 179, in can_import
    return __import__(module_name)
  File "Martel/__init__.py", line 78, in ?
    NoCase = Expression.NoCase
AttributeError: 'module' object has no attribute 'NoCase'


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From jchang at smi.stanford.edu  Tue Aug  5 13:14:04 2003
From: jchang at smi.stanford.edu (Jeffrey Chang)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Python 2.3
In-Reply-To: <1060068757.11841.58.camel@sulawesi>
Message-ID: <36C93FA0-C768-11D7-8FEF-000A956845CE@smi.stanford.edu>

On Tuesday, August 5, 2003, at 12:32  AM, Andreas Kuntzagk wrote:

> Btw. I was thinking, is there a better way to use the Iterator-classes
> in the different Modules?
>
> Up to now, I do it like this:
> Parser = GenBank.RecordParser()
> File = open ("foo")
> Iterator = GenBank.Iterator(File,parser=Parser)
> while 1:
>     record = it.next()
>     if not record: break
>     ...
>
> I think, nicer looking would be:
>
> File = GenBank.Flatfile("foo",Parser)
> for record in File:
>     # work with record

Yes, that is much nicer.  We can do that, now that we require Python 
2.2.  However, the code usually lags behind somewhat, because people 
are reluctant to change working code (and update the documentation), 
and break people's code.  I did go back and add iterator support to the 
Medline code, which introduced some bugs that are now fixed in the CVS. 
  :(

But yes, I think using iterators is cleaner, and should be done at some 
point.

Jeff


From letondal at pasteur.fr  Tue Aug  5 14:23:05 2003
From: letondal at pasteur.fr (Catherine Letondal)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Pise url change
Message-ID: <200308051823.h75IN5BX114747@electre.pasteur.fr>


Hi,

The url of the Pise software that is indicated in:
http://www.biopython.org/scriptcentral/
(http://www-alt.pasteur.fr/~letondal/Pise/#pisepython)
is now invalid.

Would it be possible to change it to the new location:
http://www.pasteur.fr/recherche/unites/sis/Pise/#pisepython

Thanks a lot in advance,

-- 
Catherine Letondal -- Pasteur Institute Computing Center

From chapmanb at uga.edu  Tue Aug  5 14:40:19 2003
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Pise url change
In-Reply-To: <200308051823.h75IN5BX114747@electre.pasteur.fr>
References: <200308051823.h75IN5BX114747@electre.pasteur.fr>
Message-ID: <20030805184018.GB86377@evostick.agtec.uga.edu>

Hi Catherine;

> The url of the Pise software that is indicated in:
> http://www.biopython.org/scriptcentral/
> (http://www-alt.pasteur.fr/~letondal/Pise/#pisepython)
> is now invalid.
> 
> Would it be possible to change it to the new location:
> http://www.pasteur.fr/recherche/unites/sis/Pise/#pisepython

No problem. All done. By the way, you can edit these pages yourself
(which I hoped would make things easier and keep things up to date).
The username and password are at the bottom of:

http://biopython.org/docs/developer/website_technical.html

Same info and password for both the Participants and the Script
Central pages.

Brad

From bugzilla-daemon at portal.open-bio.org  Tue Aug  5 20:19:40 2003
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] [Bug 1409] Problems with Martel and setup.py
Message-ID: <200308060019.h760JecW003032@localhost.localdomain>

http://bugzilla.bioperl.org/show_bug.cgi?id=1409

jchang@biopython.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED


------- Additional Comments From jchang@biopython.org  2003-08-05 20:19 -------
This can happen if the mx.TextTools is not already installed before building Biopython.  If this is 
the case, then import Martel fails because of the mx.TextTools dependency.  However, the 
"Expressions" module is still left in the namespace.  This will be tricky to fix.  The work-around is 
to install mx.TextTools first.


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From Y.Benita at pharm.uu.nl  Wed Aug  6 02:33:55 2003
From: Y.Benita at pharm.uu.nl (Yair Benita)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Updates to protein weights
Message-ID: <BB566DF3.2CF0%Y.Benita@pharm.uu.nl>

The protein weights in Bio-> Data -> IUPACData are not accurate enough. For
large proteins I get an error which is unacceptable. Can someone with access
please update them:

protein_weights = {
    "A": 89.09,
    "C": 121.16,
    "D": 133.10,
    "E": 147.13,
    "F": 165.19,
    "G": 75.07,
    "H": 155.16,
    "I": 131.18,
    "K": 146.19,
    "L": 131.18,
    "M": 149.21,
    "N": 132.12,
    "P": 115.13,
    "Q": 146.15,
    "R": 174.20,
    "S": 105.09,
    "T": 119.12,
    "V": 117.15,
    "W": 204.23,
    "Y": 181.19}


-- 
Yair Benita
Pharmaceutical Proteomics
Faculty of Pharmacy
Utrecht University


From Y.Benita at pharm.uu.nl  Wed Aug  6 03:04:55 2003
From: Y.Benita at pharm.uu.nl (Yair Benita)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Additions to SeqUtils
In-Reply-To: <BB543757.2CC2%Y.Benita@pharm.uu.nl>
Message-ID: <BB567537.2CF3%Y.Benita@pharm.uu.nl>

Here is an update to the molecular weight functions in ProtParam:
 
   # Calculate MW from Protein sequence
    def molecular_weight (self):
        # make local dictionary for speed
        MwDict = {}
        # remove a molecule of water from the amino acid weights.
        for i in IUPACData.protein_weights.keys():
            MwDict[i] = IUPACData.protein_weights[i] - 18.02
        MW = 18.02 # add just one water molecule for the whole sequence.
        for i in self.sequence:
            MW += MwDict[i]
        return MW
 

on 4/8/03 16:16, Yair Benita at Y.Benita@pharm.uu.nl wrote:

> Hi All,
> As promised a few days ago I submit code to be added to the SeqUtils module.
> The modules include:
> Codon adaptation index -> for DNA sequence
> Protein analysis methods such as isoelectric point, molecular weight and
> more. Take a look.
> 
> You just have to change the import statement at the top to fit the location
> you use for the module.
> 
> I would appreciate any comments or feedback.
> Thanks,
> Yair

-- 
Yair Benita
Pharmaceutical Proteomics
Faculty of Pharmacy
Utrecht University


From mark_yeager at merck.com  Wed Aug  6 09:46:57 2003
From: mark_yeager at merck.com (Yeager, Mark D)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Updates to protein weights
Message-ID: <70A1E2C0DD86D511B81500508BB23F1F07066C6D@uswpmx00.merck.com>

I am very new to Python and Biopython, so I may be making some irrelevant
suggestions...

For those doing mass spectroscopy (proteomics), it might be beneficial to
include monoisotopic as well as average MW for the amino acids, and
bookkeeping of the additional H2O for a polypeptide could be done at the end
of the calculation.

For example here are a couple amino acids without H2O

 	Monoiso.	Average	
A	71.03711	71.0788	
R	156.10111	156.1875	
	
monoiso H2O= 18.01057
average H2O= 18.0152

(I'm not sure about the accuracy of average H20 -- have to dig into IUPAC)

With the spreading use of highly accurate FTICR (Fourier transform ion
cyclotron resonance) mass spec, this might be a good thing to have in place
sooner rather than later.

FYI:
Carbon is a mix of 12C and 13C isotopes (and 14C which we neglect here).
Natural abundance of 98.9% and 1.1%.
By definition 12C = 12.00000 and all other isotope masses refer to this
standard.
13C = 13.00335.1 
1H = 1.00783
16O is 15.99491.

--
Mark

-----Original Message-----
From: Yair Benita [mailto:Y.Benita@pharm.uu.nl]
Sent: Wednesday, August 06, 2003 2:34 AM
To: biopython-dev@biopython.org
Subject: [Biopython-dev] Updates to protein weights


The protein weights in Bio-> Data -> IUPACData are not accurate enough. For
large proteins I get an error which is unacceptable. Can someone with access
please update them:

protein_weights = {
    "A": 89.09,
    "C": 121.16,
    "D": 133.10,
    "E": 147.13,
    "F": 165.19,
    "G": 75.07,
    "H": 155.16,
    "I": 131.18,
    "K": 146.19,
    "L": 131.18,
    "M": 149.21,
    "N": 132.12,
    "P": 115.13,
    "Q": 146.15,
    "R": 174.20,
    "S": 105.09,
    "T": 119.12,
    "V": 117.15,
    "W": 204.23,
    "Y": 181.19}


-- 
Yair Benita
Pharmaceutical Proteomics
Faculty of Pharmacy
Utrecht University

------------------------------------------------------------------------------
Notice:  This e-mail message, together with any attachments, contains
information of Merck & Co., Inc. (Whitehouse Station, New Jersey, USA), and/or
its affiliates (which may be known outside the United States as Merck Frosst,
Merck Sharp & Dohme or MSD) that may be confidential, proprietary copyrighted
and/or legally privileged, and is intended solely for the use of the
individual or entity named on this message.  If you are not the intended
recipient, and have received this message in error, please immediately return
this by e-mail and then delete it.
------------------------------------------------------------------------------

From jchang at smi.stanford.edu  Wed Aug  6 11:39:40 2003
From: jchang at smi.stanford.edu (Jeffrey Chang)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Updates to protein weights
In-Reply-To: <BB566DF3.2CF0%Y.Benita@pharm.uu.nl>
Message-ID: <31B521C2-C824-11D7-A6BC-000A956845CE@smi.stanford.edu>

I have committed these in the the CVS.

Jeff


On Tuesday, August 5, 2003, at 11:33  PM, Yair Benita wrote:

> The protein weights in Bio-> Data -> IUPACData are not accurate 
> enough. For
> large proteins I get an error which is unacceptable. Can someone with 
> access
> please update them:
>
> protein_weights = {
>     "A": 89.09,
>     "C": 121.16,
>     "D": 133.10,
>     "E": 147.13,
>     "F": 165.19,
>     "G": 75.07,
>     "H": 155.16,
>     "I": 131.18,
>     "K": 146.19,
>     "L": 131.18,
>     "M": 149.21,
>     "N": 132.12,
>     "P": 115.13,
>     "Q": 146.15,
>     "R": 174.20,
>     "S": 105.09,
>     "T": 119.12,
>     "V": 117.15,
>     "W": 204.23,
>     "Y": 181.19}
>
>
> -- 
> Yair Benita
> Pharmaceutical Proteomics
> Faculty of Pharmacy
> Utrecht University
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev@biopython.org
> http://biopython.org/mailman/listinfo/biopython-dev


From Yves.Bastide at irisa.fr  Fri Aug  8 06:09:03 2003
From: Yves.Bastide at irisa.fr (Yves Bastide)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] [patch] NCBIStandalone.py cleanup
Message-ID: <3F3376BF.2070304@irisa.fr>

Hi,

Here's a patch bringing Bio.Blast.NCBIStandalone nearer to Python 3 
compatibility :-)

Changelog:
* Don't use the string module
* use .find() == -1, not .find() <= 0
* use .startswith() and .endwith()

Regards,

yves
-------------- next part --------------
Index: NCBIStandalone.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Blast/NCBIStandalone.py,v
retrieving revision 1.47
diff -u -p -r1.47 NCBIStandalone.py
--- NCBIStandalone.py	2003/06/09 02:12:09	1.47
+++ NCBIStandalone.py	2003/08/08 10:04:35
@@ -34,9 +34,7 @@ blastpgp        Execute blastpgp.
 """
 
 import os
-import string
 import re
-from types import *
 
 from Bio import File
 from Bio.ParserSupport import *
@@ -145,9 +143,9 @@ class _Scanner:
 
         while 1:
             line = safe_peekline(uhandle)
-            if line[:9] != 'Searching' and \
-               re.search(r"Score +E", line) is None and \
-               string.find(line, 'No hits found') < 0:
+            if (not line.startswith('Searching') and
+                re.search(r"Score +E", line) is None and
+                line.find('No hits found') == -1):
                 break
 
             self._scan_descriptions(uhandle, consumer)
@@ -197,7 +195,7 @@ class _Scanner:
         # Check for these error lines and ignore them for now.  Let
         # the BlastErrorParser deal with them.
         line = uhandle.peekline()
-        if line.find("ERROR:") >= 0 or line.startswith("done"):
+        if line.find("ERROR:") != -1 or line.startswith("done"):
             read_and_call_while(uhandle, consumer.noevent, contains="ERROR:")
             read_and_call(uhandle, consumer.noevent, start="done")
 
@@ -256,7 +254,7 @@ class _Scanner:
 
         # Read the descriptions and the following blank lines, making
         # sure that there are descriptions.
-        if uhandle.peekline()[:19] != 'Sequences not found':
+        if not uhandle.peekline().startswith('Sequences not found'):
             read_and_call_until(uhandle, consumer.description, blank=1)
             read_and_call_while(uhandle, consumer.noevent, blank=1)
 
@@ -269,7 +267,7 @@ class _Scanner:
             # Read the descriptions and the following blank lines.
             read_and_call_while(uhandle, consumer.noevent, blank=1)
             l = safe_peekline(uhandle)
-            if l[:9] != 'CONVERGED' and l[0] != '>':
+            if not l.startswith('CONVERGED') and l[0] != '>':
                 read_and_call_until(uhandle, consumer.description, blank=1)
                 read_and_call_while(uhandle, consumer.noevent, blank=1)
 
@@ -281,7 +279,7 @@ class _Scanner:
     def _scan_alignments(self, uhandle, consumer):
         # First, check to see if I'm at the database report.
         line = safe_peekline(uhandle)
-        if line[:10] == '  Database':
+        if line.startswith('  Database'):
             return
         elif line[0] == '>':
             # XXX make a better check here between pairwise and masterslave
@@ -305,7 +303,7 @@ class _Scanner:
         # Scan a bunch of score/alignment pairs.
         while 1:
             line = safe_peekline(uhandle)
-            if line[:6] != ' Score':
+            if not line.startswith(' Score'):
                 break
             self._scan_hsp(uhandle, consumer)
         consumer.end_alignment()
@@ -318,7 +316,7 @@ class _Scanner:
         read_and_call(uhandle, consumer.title, start='>')
         while 1:
             line = safe_readline(uhandle)
-            if string.lstrip(line)[:8] == 'Length =':
+            if line.lstrip().startswith('Length ='):
                 consumer.length(line)
                 break
             elif is_blank_line(line):
@@ -372,7 +370,7 @@ class _Scanner:
             read_and_call_while(uhandle, consumer.noevent, blank=1)
             line = safe_peekline(uhandle)
             # Alignment continues if I see a 'Query' or the spaces for Blastn.
-            if line[:5] != 'Query' and line[:5] != '     ':
+            if not (line.startswith('Query') or line.startswith('     ')):
                 break
  
     def _scan_masterslave_alignment(self, uhandle, consumer):
@@ -382,10 +380,10 @@ class _Scanner:
             # Check to see whether I'm finished reading the alignment.
             # This is indicated by 1) database section, 2) next psi-blast round
             # patch by chapmanb
-            if line[:9] == 'Searching':
+            if line.startswith('Searching'):
                 uhandle.saveline(line)
                 break
-            elif line[:10] == '  Database':
+            elif line.startswith('  Database'):
                 uhandle.saveline(line)
                 break
             elif is_blank_line(line):
@@ -423,7 +421,7 @@ class _Scanner:
 
 	    line = safe_readline(uhandle)
 	    uhandle.saveline(line)
-            if string.find(line, 'Lambda') >= 0:
+            if line.find('Lambda') != -1:
 		break
 
 	read_and_call(uhandle, consumer.noevent, start='Lambda')
@@ -577,22 +575,22 @@ class _HeaderConsumer:
         self._header = Record.Header()
         
     def version(self, line):
-        c = string.split(line)
+        c = line.split()
         self._header.application = c[0]
         self._header.version = c[1]
         self._header.date = c[2][1:-1]
 
     def reference(self, line):
-        if line[:11] == 'Reference: ':
+        if line.startswith('Reference: '):
             self._header.reference = line[11:]
         else:
             self._header.reference = self._header.reference + line
             
     def query_info(self, line):
-        if line[:7] == 'Query= ':
+        if line.startswith('Query= '):
             self._header.query = line[7:]
-        elif line[:7] != '       ':  # continuation of query_info
-            self._header.query = self._header.query + line
+        elif not line.startswith('       '):  # continuation of query_info
+            self._header.query = "%s%s" % (self._header.query, line)
         else:
             letters, = _re_search(
                 r"([0-9,]+) letters", line,
@@ -600,11 +598,11 @@ class _HeaderConsumer:
             self._header.query_letters = _safe_int(letters)
                 
     def database_info(self, line):
-        line = string.rstrip(line)
-        if line[:10] == 'Database: ':
+        line = line.rstrip()
+        if line.startswith('Database: '):
             self._header.database = line[10:]
-	elif not line[-13:] == 'total letters':
-            self._header.database = self._header.database + string.strip(line)
+	elif not line.endswith('total letters'):
+            self._header.database = self._header.database + line.strip()
         else:
             sequences, letters =_re_search(
                 r"([0-9,]+) sequences; ([0-9,]+) total letters", line,
@@ -614,8 +612,8 @@ class _HeaderConsumer:
 
     def end_header(self):
         # Get rid of the trailing newlines
-        self._header.reference = string.rstrip(self._header.reference)
-        self._header.query = string.rstrip(self._header.query)
+        self._header.reference = self._header.reference.rstrip()
+        self._header.query = self._header.query.rstrip()
 
 class _DescriptionConsumer:
     def start_descriptions(self):
@@ -629,8 +627,8 @@ class _DescriptionConsumer:
         self.__has_n = 0   # Does the description line contain an N value?
 
     def description_header(self, line):
-        if line[:19] == 'Sequences producing':
-            cols = string.split(line)
+        if line.startswith('Sequences producing'):
+            cols = line.split()
             if cols[-1] == 'N':
                 self.__has_n = 1
     
@@ -656,9 +654,9 @@ class _DescriptionConsumer:
         pass
 
     def round(self, line):
-        if line[:18] != 'Results from round':
+        if not line.startswith('Results from round'):
             raise SyntaxError, "I didn't understand the round line\n%s" % line
-        self._roundnum = _safe_int(string.strip(line[18:]))
+        self._roundnum = _safe_int(line[18:])
 
     def end_descriptions(self):
         pass
@@ -674,23 +672,23 @@ class _DescriptionConsumer:
         #   - title must be preserved exactly (including whitespaces)
         #   - score could be equal to e-value (not likely, but what if??)
         #   - sometimes there's an "N" score of '1'.
-        cols = string.split(line)
+        cols = line.split()
         if len(cols) < 3:
             raise SyntaxError, \
                   "Line does not appear to contain description:\n%s" % line
         if self.__has_n:
-            i = string.rfind(line, cols[-1])        # find start of N
-            i = string.rfind(line, cols[-2], 0, i)  # find start of p-value
-            i = string.rfind(line, cols[-3], 0, i)  # find start of score
+            i = line.rfind(cols[-1])        # find start of N
+            i = line.rfind(cols[-2], 0, i)  # find start of p-value
+            i = line.rfind(cols[-3], 0, i)  # find start of score
         else:
-            i = string.rfind(line, cols[-1])        # find start of p-value
-            i = string.rfind(line, cols[-2], 0, i)  # find start of score
+            i = line.rfind(cols[-1])        # find start of p-value
+            i = line.rfind(cols[-2], 0, i)  # find start of score
         if self.__has_n:
             dh.title, dh.score, dh.e, dh.num_alignments = \
-                      string.rstrip(line[:i]), cols[-3], cols[-2], cols[-1]
+                      line[:i].rstrip(), cols[-3], cols[-2], cols[-1]
         else:
             dh.title, dh.score, dh.e, dh.num_alignments = \
-                      string.rstrip(line[:i]), cols[-2], cols[-1], 1
+                      line[:i].rstrip(), cols[-2], cols[-1], 1
         dh.num_alignments = _safe_int(dh.num_alignments)
         dh.score = _safe_int(dh.score)
         dh.e = _safe_float(dh.e)
@@ -706,52 +704,52 @@ class _AlignmentConsumer:
         self._multiple_alignment = Record.MultipleAlignment()
 
     def title(self, line):
-        self._alignment.title = self._alignment.title + string.lstrip(line)
+        self._alignment.title = "%s%s" % (self._alignment.title,
+                                           line.lstrip())
 
     def length(self, line):
-        self._alignment.length = string.split(line)[2]
+        self._alignment.length = line.split()[2]
         self._alignment.length = _safe_int(self._alignment.length)
 
     def multalign(self, line):
         # Standalone version uses 'QUERY', while WWW version uses blast_tmp.
-        if line[:5] == 'QUERY' or line[:9] == 'blast_tmp':
+        if line.startswith('QUERY') or line.startswith('blast_tmp'):
             # If this is the first line of the multiple alignment,
             # then I need to figure out how the line is formatted.
             
             # Format of line is:
             # QUERY 1   acttg...gccagaggtggtttattcagtctccataagagaggggacaaacg 60
             try:
-                name, start, seq, end = string.split(line)
+                name, start, seq, end = line.split()
             except ValueError:
                 raise SyntaxError, "I do not understand the line\n%s" \
                       % line
-            self._start_index = string.index(line, start, len(name))
-            self._seq_index = string.index(line, seq,
-                                           self._start_index+len(start))
+            self._start_index = line.index(start, len(name))
+            self._seq_index = line.index(seq,
+                                         self._start_index+len(start))
             # subtract 1 for the space
             self._name_length = self._start_index - 1
             self._start_length = self._seq_index - self._start_index - 1
-            self._seq_length = string.rfind(line, end) - self._seq_index - 1
+            self._seq_length = line.rfind(end) - self._seq_index - 1
             
-            #self._seq_index = string.index(line, seq)
+            #self._seq_index = line.index(seq)
             ## subtract 1 for the space
-            #self._seq_length = string.rfind(line, end) - self._seq_index - 1
-            #self._start_index = string.index(line, start)
+            #self._seq_length = line.rfind(end) - self._seq_index - 1
+            #self._start_index = line.index(start)
             #self._start_length = self._seq_index - self._start_index - 1
             #self._name_length = self._start_index
 
         # Extract the information from the line
-        name = string.rstrip(line[:self._name_length])
-        start = string.rstrip(
-            line[self._start_index:self._start_index+self._start_length])
+        name = line[:self._name_length]
+        name = name.rstrip()
+        start = line[self._start_index:self._start_index+self._start_length]
+        start = start.rstrip()
         if start:
             start = _safe_int(start)
-        end = string.rstrip(
-            line[self._seq_index+self._seq_length:])
+        end = line[self._seq_index+self._seq_length:].rstrip()
         if end:
             end = _safe_int(end)
-        seq = string.rstrip(
-            line[self._seq_index:self._seq_index+self._seq_length])
+        seq = line[self._seq_index:self._seq_index+self._seq_length].rstrip()
         # right pad the sequence with spaces if necessary
         if len(seq) < self._seq_length:
             seq = seq + ' '*(self._seq_length-len(seq))
@@ -826,7 +824,7 @@ class _AlignmentConsumer:
     def end_alignment(self):
         # Remove trailing newlines
         if self._alignment:
-            self._alignment.title = string.rstrip(self._alignment.title)
+            self._alignment.title = self._alignment.title.rstrip()
 
         # This code is also obsolete.  See note above.
         # If there's a multiple alignment, I will need to make sure
@@ -883,16 +881,16 @@ class _HSPConsumer:
             "I could not find the identities in line\n%s" % line)
         self._hsp.identities = _safe_int(x), _safe_int(y)
 
-        if string.find(line, 'Positives') >= 0:
+        if line.find('Positives') != -1:
             x, y = _re_search(
                 r"Positives = (\d+)\/(\d+)", line,
                 "I could not find the positives in line\n%s" % line)
             self._hsp.positives = _safe_int(x), _safe_int(y)
 
-        if string.find(line, 'Gaps') >= 0:
+        if line.find('Gaps') != -1:
             x, y = _re_search(
                 r"Gaps = (\d+)\/(\d+)", line,
-                "I could not find the positives in line\n%s" % line)
+                "I could not find the gaps in line\n%s" % line)
             self._hsp.gaps = _safe_int(x), _safe_int(y)
 
         
@@ -905,7 +903,7 @@ class _HSPConsumer:
         # Frame can be in formats:
         # Frame = +1
         # Frame = +2 / +2
-        if string.find(line, '/') >= 0:
+        if line.find('/') != -1:
             self._hsp.frame = _re_search(
                 r"Frame = ([-+][123]) / ([-+][123])", line,
                 "I could not find the frame in line\n%s" % line)
@@ -931,7 +929,7 @@ class _HSPConsumer:
         self._query_len = len(seq)
 
     def align(self, line):
-        seq = string.rstrip(line[self._query_start_index:])
+        seq = line[self._query_start_index:].rstrip()
         if len(seq) < self._query_len:
             # Make sure the alignment is the same length as the query
             seq = seq + ' ' * (self._query_len-len(seq))
@@ -948,7 +946,7 @@ class _HSPConsumer:
 	#On occasion, there is a blast hit with no subject match
 	#so far, it only occurs with 1-line short "matches"
 	#I have decided to let these pass as they appear
-	if not string.strip(seq):
+	if not seq.strip():
             seq = ' ' * self._query_len
         self._hsp.sbjct = self._hsp.sbjct + seq
         if self._hsp.sbjct_start is None:
@@ -976,8 +974,8 @@ class _DatabaseReportConsumer:
             self._dr.database_name.append(m.group(1))
         elif self._dr.database_name:
             # This must be a continuation of the previous name.
-            x = self._dr.database_name[-1] + string.strip(line)
-            self._dr.database_name[-1] = x
+            self._dr.database_name[-1] = "%s%s" % (self._dr.database_name[-1],
+                                                   line.strip())
 
     def posted_date(self, line):
         self._dr.posted_date.append(_re_search(
@@ -995,14 +993,14 @@ class _DatabaseReportConsumer:
         self._dr.num_sequences_in_database.append(_safe_int(sequences))
 
     def ka_params(self, line):
-        x = string.split(line)
+        x = line.split()
         self._dr.ka_params = map(_safe_float, x)
 
     def gapped(self, line):
         self._dr.gapped = 1
 
     def ka_params_gap(self, line):
-        x = string.split(line)
+        x = line.split()
         self._dr.ka_params_gap = map(_safe_float, x)
 
     def end_database_report(self):
@@ -1013,7 +1011,7 @@ class _ParametersConsumer:
         self._params = Record.Parameters()
 
     def matrix(self, line):
-        self._params.matrix = string.rstrip(line[8:])
+        self._params.matrix = line[8:].rstrip()
 
     def gap_penalties(self, line):
         x = _get_cols(
@@ -1021,7 +1019,7 @@ class _ParametersConsumer:
         self._params.gap_penalties = map(_safe_float, x)
 
     def num_hits(self, line):
-        if string.find(line, '1st pass') >= 0:
+        if line.find('1st pass') != -1:
             x, = _get_cols(line, (-4,), ncols=11, expected={2:"Hits"})
             self._params.num_hits = _safe_int(x)
         else:
@@ -1029,7 +1027,7 @@ class _ParametersConsumer:
             self._params.num_hits = _safe_int(x)
 
     def num_sequences(self, line):
-        if string.find(line, '1st pass') >= 0:
+        if line.find('1st pass') != -1:
             x, = _get_cols(line, (-4,), ncols=9, expected={2:"Sequences:"})
             self._params.num_sequences = _safe_int(x)
         else:
@@ -1037,7 +1035,7 @@ class _ParametersConsumer:
             self._params.num_sequences = _safe_int(x)
 
     def num_extends(self, line):
-        if string.find(line, '1st pass') >= 0:
+        if line.find('1st pass') != -1:
             x, = _get_cols(line, (-4,), ncols=9, expected={2:"extensions:"})
             self._params.num_extends = _safe_int(x)
         else:
@@ -1045,7 +1043,7 @@ class _ParametersConsumer:
             self._params.num_extends = _safe_int(x)
 
     def num_good_extends(self, line):
-        if string.find(line, '1st pass') >= 0:
+        if line.find('1st pass') != -1:
             x, = _get_cols(line, (-4,), ncols=10, expected={3:"extensions:"})
             self._params.num_good_extends = _safe_int(x)
         else:
@@ -1297,8 +1295,13 @@ class Iterator:
         If set to None, then the raw contents of the file will be returned.
 
         """
-        if type(handle) is not FileType and type(handle) is not InstanceType:
-            raise ValueError, "I expected a file handle or file-like object"
+        try:
+            dummy = handle.readline
+        except AttributeError:
+            raise ValueError(
+                "I expected a file handle or file-like object, got %s"
+                % type(handle))
+        del dummy
         self._uhandle = File.UndoHandle(handle)
         self._parser = parser
 
@@ -1315,7 +1318,8 @@ class Iterator:
             if not line:
                 break
             # If I've reached the next one, then put the line back and stop.
-            if lines and (line[:5] == 'BLAST' or line[1:6] == 'BLAST'):
+            if lines and (line.startswith('BLAST')
+                          or line.startswith('BLAST', start = 1)):
                 self._uhandle.saveline(line)
                 break
             lines.append(line)
@@ -1323,7 +1327,7 @@ class Iterator:
         if not lines:
             return None
             
-        data = string.join(lines, '')
+        data = ''.join(lines)
         if self._parser is not None:
             return self._parser.parse(File.StringHandle(data))
         return data
@@ -1559,7 +1563,7 @@ def _re_search(regex, line, error_msg):
     return m.groups()
 
 def _get_cols(line, cols_to_get, ncols=None, expected={}):
-    cols = string.split(line)
+    cols = line.split()
 
     # Check to make sure number of columns is correct
     if ncols is not None and len(cols) != ncols:
@@ -1584,13 +1588,14 @@ def _safe_int(str):
     except ValueError:
         # Something went wrong.  Try to clean up the string.
         # Remove all commas from the string
-        str = string.replace(str, ',', '')
+        str = str.replace(',', '')
     try:
         # try again.
         return int(str)
     except ValueError:
         pass
     # If it fails again, maybe it's too long?
+    # XXX why converting to float?
     return long(float(str))
 
 def _safe_float(str):
@@ -1599,13 +1604,13 @@ def _safe_float(str):
     # we need to check the string for this condition.
     
     # Sometimes BLAST leaves of the '1' in front of an exponent.
-    if str[0] in ['E', 'e']:
+    if str and str[0] in ['E', 'e']:
         str = '1' + str
     try:
         return float(str)
     except ValueError:
         # Remove all commas from the string
-        str = string.replace(str, ',', '')
+        str = str.replace(',', '')
     # try again.
     return float(str)
 
@@ -1613,7 +1618,7 @@ class _BlastErrorConsumer(_BlastConsumer
     def __init__(self):
         _BlastConsumer.__init__(self)
     def noevent(self, line):
-        if line.find("Query must be at least wordsize") >= 0:
+        if line.find("Query must be at least wordsize") != -1:
             raise ShortQueryBlastError, "Query must be at least wordsize"
         # Now pass the line back up to the superclass.
         method = getattr(_BlastConsumer, 'noevent',
@@ -1687,7 +1692,7 @@ class BlastErrorParser(AbstractParser):
             # 'Searchingdone' instead of 'Searching......done' seems
             # to indicate a failure to perform the BLAST due to
             # low quality sequence
-            if line[:13] == 'Searchingdone':
+            if line.startswith('Searchingdone'):
                 raise LowQualityBlastError("Blast failure occured on query: ",
                                            data_record.query)
             line = handle.readline()
From Yves.Bastide at irisa.fr  Fri Aug  8 08:45:02 2003
From: Yves.Bastide at irisa.fr (Yves Bastide)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Flat files indices
Message-ID: <3F339B4E.5090304@irisa.fr>

Hi,

Are the "Open-bio flat-file indexing systems" implemented soemwhere?

yves


From jchang at jeffchang.com  Fri Aug  8 15:31:46 2003
From: jchang at jeffchang.com (Jeffrey Chang)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] [patch] NCBIStandalone.py cleanup
In-Reply-To: <3F3376BF.2070304@irisa.fr>
Message-ID: <F31DC132-C9D6-11D7-A249-000A956845CE@jeffchang.com>

Great patch!  I've applied it to the CVS repository.

Jeff


On Friday, August 8, 2003, at 03:09  AM, Yves Bastide wrote:

> Hi,
>
> Here's a patch bringing Bio.Blast.NCBIStandalone nearer to Python 3 
> compatibility :-)
>
> Changelog:
> * Don't use the string module
> * use .find() == -1, not .find() <= 0
> * use .startswith() and .endwith()
>
> Regards,
>
> yves
> Index: NCBIStandalone.py
> ===================================================================
> RCS file: 
> /home/repository/biopython/biopython/Bio/Blast/NCBIStandalone.py,v
> retrieving revision 1.47
> diff -u -p -r1.47 NCBIStandalone.py
> --- NCBIStandalone.py	2003/06/09 02:12:09	1.47
> +++ NCBIStandalone.py	2003/08/08 10:04:35
> @@ -34,9 +34,7 @@ blastpgp        Execute blastpgp.
>  """
>
>  import os
> -import string
>  import re
> -from types import *
>
>  from Bio import File
>  from Bio.ParserSupport import *
> @@ -145,9 +143,9 @@ class _Scanner:
>
>          while 1:
>              line = safe_peekline(uhandle)
> -            if line[:9] != 'Searching' and \
> -               re.search(r"Score +E", line) is None and \
> -               string.find(line, 'No hits found') < 0:
> +            if (not line.startswith('Searching') and
> +                re.search(r"Score +E", line) is None and
> +                line.find('No hits found') == -1):
>                  break
>
>              self._scan_descriptions(uhandle, consumer)
> @@ -197,7 +195,7 @@ class _Scanner:
>          # Check for these error lines and ignore them for now.  Let
>          # the BlastErrorParser deal with them.
>          line = uhandle.peekline()
> -        if line.find("ERROR:") >= 0 or line.startswith("done"):
> +        if line.find("ERROR:") != -1 or line.startswith("done"):
>              read_and_call_while(uhandle, consumer.noevent, 
> contains="ERROR:")
>              read_and_call(uhandle, consumer.noevent, start="done")
>
> @@ -256,7 +254,7 @@ class _Scanner:
>
>          # Read the descriptions and the following blank lines, making
>          # sure that there are descriptions.
> -        if uhandle.peekline()[:19] != 'Sequences not found':
> +        if not uhandle.peekline().startswith('Sequences not found'):
>              read_and_call_until(uhandle, consumer.description, 
> blank=1)
>              read_and_call_while(uhandle, consumer.noevent, blank=1)
>
> @@ -269,7 +267,7 @@ class _Scanner:
>              # Read the descriptions and the following blank lines.
>              read_and_call_while(uhandle, consumer.noevent, blank=1)
>              l = safe_peekline(uhandle)
> -            if l[:9] != 'CONVERGED' and l[0] != '>':
> +            if not l.startswith('CONVERGED') and l[0] != '>':
>                  read_and_call_until(uhandle, consumer.description, 
> blank=1)
>                  read_and_call_while(uhandle, consumer.noevent, 
> blank=1)
>
> @@ -281,7 +279,7 @@ class _Scanner:
>      def _scan_alignments(self, uhandle, consumer):
>          # First, check to see if I'm at the database report.
>          line = safe_peekline(uhandle)
> -        if line[:10] == '  Database':
> +        if line.startswith('  Database'):
>              return
>          elif line[0] == '>':
>              # XXX make a better check here between pairwise and 
> masterslave
> @@ -305,7 +303,7 @@ class _Scanner:
>          # Scan a bunch of score/alignment pairs.
>          while 1:
>              line = safe_peekline(uhandle)
> -            if line[:6] != ' Score':
> +            if not line.startswith(' Score'):
>                  break
>              self._scan_hsp(uhandle, consumer)
>          consumer.end_alignment()
> @@ -318,7 +316,7 @@ class _Scanner:
>          read_and_call(uhandle, consumer.title, start='>')
>          while 1:
>              line = safe_readline(uhandle)
> -            if string.lstrip(line)[:8] == 'Length =':
> +            if line.lstrip().startswith('Length ='):
>                  consumer.length(line)
>                  break
>              elif is_blank_line(line):
> @@ -372,7 +370,7 @@ class _Scanner:
>              read_and_call_while(uhandle, consumer.noevent, blank=1)
>              line = safe_peekline(uhandle)
>              # Alignment continues if I see a 'Query' or the spaces 
> for Blastn.
> -            if line[:5] != 'Query' and line[:5] != '     ':
> +            if not (line.startswith('Query') or line.startswith('     
> ')):
>                  break
>
>      def _scan_masterslave_alignment(self, uhandle, consumer):
> @@ -382,10 +380,10 @@ class _Scanner:
>              # Check to see whether I'm finished reading the alignment.
>              # This is indicated by 1) database section, 2) next 
> psi-blast round
>              # patch by chapmanb
> -            if line[:9] == 'Searching':
> +            if line.startswith('Searching'):
>                  uhandle.saveline(line)
>                  break
> -            elif line[:10] == '  Database':
> +            elif line.startswith('  Database'):
>                  uhandle.saveline(line)
>                  break
>              elif is_blank_line(line):
> @@ -423,7 +421,7 @@ class _Scanner:
>
>  	    line = safe_readline(uhandle)
>  	    uhandle.saveline(line)
> -            if string.find(line, 'Lambda') >= 0:
> +            if line.find('Lambda') != -1:
>  		break
>
>  	read_and_call(uhandle, consumer.noevent, start='Lambda')
> @@ -577,22 +575,22 @@ class _HeaderConsumer:
>          self._header = Record.Header()
>
>      def version(self, line):
> -        c = string.split(line)
> +        c = line.split()
>          self._header.application = c[0]
>          self._header.version = c[1]
>          self._header.date = c[2][1:-1]
>
>      def reference(self, line):
> -        if line[:11] == 'Reference: ':
> +        if line.startswith('Reference: '):
>              self._header.reference = line[11:]
>          else:
>              self._header.reference = self._header.reference + line
>
>      def query_info(self, line):
> -        if line[:7] == 'Query= ':
> +        if line.startswith('Query= '):
>              self._header.query = line[7:]
> -        elif line[:7] != '       ':  # continuation of query_info
> -            self._header.query = self._header.query + line
> +        elif not line.startswith('       '):  # continuation of 
> query_info
> +            self._header.query = "%s%s" % (self._header.query, line)
>          else:
>              letters, = _re_search(
>                  r"([0-9,]+) letters", line,
> @@ -600,11 +598,11 @@ class _HeaderConsumer:
>              self._header.query_letters = _safe_int(letters)
>
>      def database_info(self, line):
> -        line = string.rstrip(line)
> -        if line[:10] == 'Database: ':
> +        line = line.rstrip()
> +        if line.startswith('Database: '):
>              self._header.database = line[10:]
> -	elif not line[-13:] == 'total letters':
> -            self._header.database = self._header.database + 
> string.strip(line)
> +	elif not line.endswith('total letters'):
> +            self._header.database = self._header.database + 
> line.strip()
>          else:
>              sequences, letters =_re_search(
>                  r"([0-9,]+) sequences; ([0-9,]+) total letters", line,
> @@ -614,8 +612,8 @@ class _HeaderConsumer:
>
>      def end_header(self):
>          # Get rid of the trailing newlines
> -        self._header.reference = string.rstrip(self._header.reference)
> -        self._header.query = string.rstrip(self._header.query)
> +        self._header.reference = self._header.reference.rstrip()
> +        self._header.query = self._header.query.rstrip()
>
>  class _DescriptionConsumer:
>      def start_descriptions(self):
> @@ -629,8 +627,8 @@ class _DescriptionConsumer:
>          self.__has_n = 0   # Does the description line contain an N 
> value?
>
>      def description_header(self, line):
> -        if line[:19] == 'Sequences producing':
> -            cols = string.split(line)
> +        if line.startswith('Sequences producing'):
> +            cols = line.split()
>              if cols[-1] == 'N':
>                  self.__has_n = 1
>
> @@ -656,9 +654,9 @@ class _DescriptionConsumer:
>          pass
>
>      def round(self, line):
> -        if line[:18] != 'Results from round':
> +        if not line.startswith('Results from round'):
>              raise SyntaxError, "I didn't understand the round 
> line\n%s" % line
> -        self._roundnum = _safe_int(string.strip(line[18:]))
> +        self._roundnum = _safe_int(line[18:])
>
>      def end_descriptions(self):
>          pass
> @@ -674,23 +672,23 @@ class _DescriptionConsumer:
>          #   - title must be preserved exactly (including whitespaces)
>          #   - score could be equal to e-value (not likely, but what 
> if??)
>          #   - sometimes there's an "N" score of '1'.
> -        cols = string.split(line)
> +        cols = line.split()
>          if len(cols) < 3:
>              raise SyntaxError, \
>                    "Line does not appear to contain description:\n%s" 
> % line
>          if self.__has_n:
> -            i = string.rfind(line, cols[-1])        # find start of N
> -            i = string.rfind(line, cols[-2], 0, i)  # find start of 
> p-value
> -            i = string.rfind(line, cols[-3], 0, i)  # find start of 
> score
> +            i = line.rfind(cols[-1])        # find start of N
> +            i = line.rfind(cols[-2], 0, i)  # find start of p-value
> +            i = line.rfind(cols[-3], 0, i)  # find start of score
>          else:
> -            i = string.rfind(line, cols[-1])        # find start of 
> p-value
> -            i = string.rfind(line, cols[-2], 0, i)  # find start of 
> score
> +            i = line.rfind(cols[-1])        # find start of p-value
> +            i = line.rfind(cols[-2], 0, i)  # find start of score
>          if self.__has_n:
>              dh.title, dh.score, dh.e, dh.num_alignments = \
> -                      string.rstrip(line[:i]), cols[-3], cols[-2], 
> cols[-1]
> +                      line[:i].rstrip(), cols[-3], cols[-2], cols[-1]
>          else:
>              dh.title, dh.score, dh.e, dh.num_alignments = \
> -                      string.rstrip(line[:i]), cols[-2], cols[-1], 1
> +                      line[:i].rstrip(), cols[-2], cols[-1], 1
>          dh.num_alignments = _safe_int(dh.num_alignments)
>          dh.score = _safe_int(dh.score)
>          dh.e = _safe_float(dh.e)
> @@ -706,52 +704,52 @@ class _AlignmentConsumer:
>          self._multiple_alignment = Record.MultipleAlignment()
>
>      def title(self, line):
> -        self._alignment.title = self._alignment.title + 
> string.lstrip(line)
> +        self._alignment.title = "%s%s" % (self._alignment.title,
> +                                           line.lstrip())
>
>      def length(self, line):
> -        self._alignment.length = string.split(line)[2]
> +        self._alignment.length = line.split()[2]
>          self._alignment.length = _safe_int(self._alignment.length)
>
>      def multalign(self, line):
>          # Standalone version uses 'QUERY', while WWW version uses 
> blast_tmp.
> -        if line[:5] == 'QUERY' or line[:9] == 'blast_tmp':
> +        if line.startswith('QUERY') or line.startswith('blast_tmp'):
>              # If this is the first line of the multiple alignment,
>              # then I need to figure out how the line is formatted.
>
>              # Format of line is:
>              # QUERY 1   
> acttg...gccagaggtggtttattcagtctccataagagaggggacaaacg 60
>              try:
> -                name, start, seq, end = string.split(line)
> +                name, start, seq, end = line.split()
>              except ValueError:
>                  raise SyntaxError, "I do not understand the line\n%s" 
> \
>                        % line
> -            self._start_index = string.index(line, start, len(name))
> -            self._seq_index = string.index(line, seq,
> -                                           
> self._start_index+len(start))
> +            self._start_index = line.index(start, len(name))
> +            self._seq_index = line.index(seq,
> +                                         self._start_index+len(start))
>              # subtract 1 for the space
>              self._name_length = self._start_index - 1
>              self._start_length = self._seq_index - self._start_index 
> - 1
> -            self._seq_length = string.rfind(line, end) - 
> self._seq_index - 1
> +            self._seq_length = line.rfind(end) - self._seq_index - 1
>
> -            #self._seq_index = string.index(line, seq)
> +            #self._seq_index = line.index(seq)
>              ## subtract 1 for the space
> -            #self._seq_length = string.rfind(line, end) - 
> self._seq_index - 1
> -            #self._start_index = string.index(line, start)
> +            #self._seq_length = line.rfind(end) - self._seq_index - 1
> +            #self._start_index = line.index(start)
>              #self._start_length = self._seq_index - self._start_index 
> - 1
>              #self._name_length = self._start_index
>
>          # Extract the information from the line
> -        name = string.rstrip(line[:self._name_length])
> -        start = string.rstrip(
> -            
> line[self._start_index:self._start_index+self._start_length])
> +        name = line[:self._name_length]
> +        name = name.rstrip()
> +        start = 
> line[self._start_index:self._start_index+self._start_length]
> +        start = start.rstrip()
>          if start:
>              start = _safe_int(start)
> -        end = string.rstrip(
> -            line[self._seq_index+self._seq_length:])
> +        end = line[self._seq_index+self._seq_length:].rstrip()
>          if end:
>              end = _safe_int(end)
> -        seq = string.rstrip(
> -            line[self._seq_index:self._seq_index+self._seq_length])
> +        seq = 
> line[self._seq_index:self._seq_index+self._seq_length].rstrip()
>          # right pad the sequence with spaces if necessary
>          if len(seq) < self._seq_length:
>              seq = seq + ' '*(self._seq_length-len(seq))
> @@ -826,7 +824,7 @@ class _AlignmentConsumer:
>      def end_alignment(self):
>          # Remove trailing newlines
>          if self._alignment:
> -            self._alignment.title = 
> string.rstrip(self._alignment.title)
> +            self._alignment.title = self._alignment.title.rstrip()
>
>          # This code is also obsolete.  See note above.
>          # If there's a multiple alignment, I will need to make sure
> @@ -883,16 +881,16 @@ class _HSPConsumer:
>              "I could not find the identities in line\n%s" % line)
>          self._hsp.identities = _safe_int(x), _safe_int(y)
>
> -        if string.find(line, 'Positives') >= 0:
> +        if line.find('Positives') != -1:
>              x, y = _re_search(
>                  r"Positives = (\d+)\/(\d+)", line,
>                  "I could not find the positives in line\n%s" % line)
>              self._hsp.positives = _safe_int(x), _safe_int(y)
>
> -        if string.find(line, 'Gaps') >= 0:
> +        if line.find('Gaps') != -1:
>              x, y = _re_search(
>                  r"Gaps = (\d+)\/(\d+)", line,
> -                "I could not find the positives in line\n%s" % line)
> +                "I could not find the gaps in line\n%s" % line)
>              self._hsp.gaps = _safe_int(x), _safe_int(y)
>
>
> @@ -905,7 +903,7 @@ class _HSPConsumer:
>          # Frame can be in formats:
>          # Frame = +1
>          # Frame = +2 / +2
> -        if string.find(line, '/') >= 0:
> +        if line.find('/') != -1:
>              self._hsp.frame = _re_search(
>                  r"Frame = ([-+][123]) / ([-+][123])", line,
>                  "I could not find the frame in line\n%s" % line)
> @@ -931,7 +929,7 @@ class _HSPConsumer:
>          self._query_len = len(seq)
>
>      def align(self, line):
> -        seq = string.rstrip(line[self._query_start_index:])
> +        seq = line[self._query_start_index:].rstrip()
>          if len(seq) < self._query_len:
>              # Make sure the alignment is the same length as the query
>              seq = seq + ' ' * (self._query_len-len(seq))
> @@ -948,7 +946,7 @@ class _HSPConsumer:
>  	#On occasion, there is a blast hit with no subject match
>  	#so far, it only occurs with 1-line short "matches"
>  	#I have decided to let these pass as they appear
> -	if not string.strip(seq):
> +	if not seq.strip():
>              seq = ' ' * self._query_len
>          self._hsp.sbjct = self._hsp.sbjct + seq
>          if self._hsp.sbjct_start is None:
> @@ -976,8 +974,8 @@ class _DatabaseReportConsumer:
>              self._dr.database_name.append(m.group(1))
>          elif self._dr.database_name:
>              # This must be a continuation of the previous name.
> -            x = self._dr.database_name[-1] + string.strip(line)
> -            self._dr.database_name[-1] = x
> +            self._dr.database_name[-1] = "%s%s" % 
> (self._dr.database_name[-1],
> +                                                   line.strip())
>
>      def posted_date(self, line):
>          self._dr.posted_date.append(_re_search(
> @@ -995,14 +993,14 @@ class _DatabaseReportConsumer:
>          
> self._dr.num_sequences_in_database.append(_safe_int(sequences))
>
>      def ka_params(self, line):
> -        x = string.split(line)
> +        x = line.split()
>          self._dr.ka_params = map(_safe_float, x)
>
>      def gapped(self, line):
>          self._dr.gapped = 1
>
>      def ka_params_gap(self, line):
> -        x = string.split(line)
> +        x = line.split()
>          self._dr.ka_params_gap = map(_safe_float, x)
>
>      def end_database_report(self):
> @@ -1013,7 +1011,7 @@ class _ParametersConsumer:
>          self._params = Record.Parameters()
>
>      def matrix(self, line):
> -        self._params.matrix = string.rstrip(line[8:])
> +        self._params.matrix = line[8:].rstrip()
>
>      def gap_penalties(self, line):
>          x = _get_cols(
> @@ -1021,7 +1019,7 @@ class _ParametersConsumer:
>          self._params.gap_penalties = map(_safe_float, x)
>
>      def num_hits(self, line):
> -        if string.find(line, '1st pass') >= 0:
> +        if line.find('1st pass') != -1:
>              x, = _get_cols(line, (-4,), ncols=11, expected={2:"Hits"})
>              self._params.num_hits = _safe_int(x)
>          else:
> @@ -1029,7 +1027,7 @@ class _ParametersConsumer:
>              self._params.num_hits = _safe_int(x)
>
>      def num_sequences(self, line):
> -        if string.find(line, '1st pass') >= 0:
> +        if line.find('1st pass') != -1:
>              x, = _get_cols(line, (-4,), ncols=9, 
> expected={2:"Sequences:"})
>              self._params.num_sequences = _safe_int(x)
>          else:
> @@ -1037,7 +1035,7 @@ class _ParametersConsumer:
>              self._params.num_sequences = _safe_int(x)
>
>      def num_extends(self, line):
> -        if string.find(line, '1st pass') >= 0:
> +        if line.find('1st pass') != -1:
>              x, = _get_cols(line, (-4,), ncols=9, 
> expected={2:"extensions:"})
>              self._params.num_extends = _safe_int(x)
>          else:
> @@ -1045,7 +1043,7 @@ class _ParametersConsumer:
>              self._params.num_extends = _safe_int(x)
>
>      def num_good_extends(self, line):
> -        if string.find(line, '1st pass') >= 0:
> +        if line.find('1st pass') != -1:
>              x, = _get_cols(line, (-4,), ncols=10, 
> expected={3:"extensions:"})
>              self._params.num_good_extends = _safe_int(x)
>          else:
> @@ -1297,8 +1295,13 @@ class Iterator:
>          If set to None, then the raw contents of the file will be 
> returned.
>
>          """
> -        if type(handle) is not FileType and type(handle) is not 
> InstanceType:
> -            raise ValueError, "I expected a file handle or file-like 
> object"
> +        try:
> +            dummy = handle.readline
> +        except AttributeError:
> +            raise ValueError(
> +                "I expected a file handle or file-like object, got %s"
> +                % type(handle))
> +        del dummy
>          self._uhandle = File.UndoHandle(handle)
>          self._parser = parser
>
> @@ -1315,7 +1318,8 @@ class Iterator:
>              if not line:
>                  break
>              # If I've reached the next one, then put the line back 
> and stop.
> -            if lines and (line[:5] == 'BLAST' or line[1:6] == 
> 'BLAST'):
> +            if lines and (line.startswith('BLAST')
> +                          or line.startswith('BLAST', start = 1)):
>                  self._uhandle.saveline(line)
>                  break
>              lines.append(line)
> @@ -1323,7 +1327,7 @@ class Iterator:
>          if not lines:
>              return None
>
> -        data = string.join(lines, '')
> +        data = ''.join(lines)
>          if self._parser is not None:
>              return self._parser.parse(File.StringHandle(data))
>          return data
> @@ -1559,7 +1563,7 @@ def _re_search(regex, line, error_msg):
>      return m.groups()
>
>  def _get_cols(line, cols_to_get, ncols=None, expected={}):
> -    cols = string.split(line)
> +    cols = line.split()
>
>      # Check to make sure number of columns is correct
>      if ncols is not None and len(cols) != ncols:
> @@ -1584,13 +1588,14 @@ def _safe_int(str):
>      except ValueError:
>          # Something went wrong.  Try to clean up the string.
>          # Remove all commas from the string
> -        str = string.replace(str, ',', '')
> +        str = str.replace(',', '')
>      try:
>          # try again.
>          return int(str)
>      except ValueError:
>          pass
>      # If it fails again, maybe it's too long?
> +    # XXX why converting to float?
>      return long(float(str))
>
>  def _safe_float(str):
> @@ -1599,13 +1604,13 @@ def _safe_float(str):
>      # we need to check the string for this condition.
>
>      # Sometimes BLAST leaves of the '1' in front of an exponent.
> -    if str[0] in ['E', 'e']:
> +    if str and str[0] in ['E', 'e']:
>          str = '1' + str
>      try:
>          return float(str)
>      except ValueError:
>          # Remove all commas from the string
> -        str = string.replace(str, ',', '')
> +        str = str.replace(',', '')
>      # try again.
>      return float(str)
>
> @@ -1613,7 +1618,7 @@ class _BlastErrorConsumer(_BlastConsumer
>      def __init__(self):
>          _BlastConsumer.__init__(self)
>      def noevent(self, line):
> -        if line.find("Query must be at least wordsize") >= 0:
> +        if line.find("Query must be at least wordsize") != -1:
>              raise ShortQueryBlastError, "Query must be at least 
> wordsize"
>          # Now pass the line back up to the superclass.
>          method = getattr(_BlastConsumer, 'noevent',
> @@ -1687,7 +1692,7 @@ class BlastErrorParser(AbstractParser):
>              # 'Searchingdone' instead of 'Searching......done' seems
>              # to indicate a failure to perform the BLAST due to
>              # low quality sequence
> -            if line[:13] == 'Searchingdone':
> +            if line.startswith('Searchingdone'):
>                  raise LowQualityBlastError("Blast failure occured on 
> query: ",
>                                             data_record.query)
>              line = handle.readline()
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev@biopython.org
> http://biopython.org/mailman/listinfo/biopython-dev


From jefftc at stanford.edu  Fri Aug  8 16:02:38 2003
From: jefftc at stanford.edu (Jeffrey Chang)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Re: [BioPython] Updated Logo
In-Reply-To: <200308071740.h77HeVTN011918@kira.skynet.be>
Message-ID: <4278F5D5-C9DB-11D7-A249-000A956845CE@stanford.edu>

(Moved to the -dev mailing list.)

Brad, what's the best way to add images to the webpage?  Should they be 
installed in a separate directory, or mixed in with the other pages?  
We will need for them to be accessible from other pages.  Someone 
should be able to have an IMG SRC point to our logo.

Jeff


On Thursday, August 7, 2003, at 10:42  AM, Thomas Hamelryck wrote:

>
> Hi,
>
> Here's a smaller version of the biopython logo (300x89).
> The large image scales well using gimp, BTW, or so I think.
> Can it now be added to the biopython homepage?
>
> Regards,
>
> ---
> Thomas Hamelryck
> ULTR/COMO
> Institute for molecular biology/Computer Science Department
> Vrije Universiteit Brussel (VUB)
> Brussels, Belgium
> http://homepages.vub.ac.be/~thamelry
> <biopython_small.jpg>_______________________________________________
> BioPython mailing list  -  BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython


From chapmanb at uga.edu  Fri Aug  8 17:29:19 2003
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Re: [BioPython] Updated Logo
In-Reply-To: <4278F5D5-C9DB-11D7-A249-000A956845CE@stanford.edu>
References: <200308071740.h77HeVTN011918@kira.skynet.be>
	<4278F5D5-C9DB-11D7-A249-000A956845CE@stanford.edu>
Message-ID: <20030808212919.GC47653@evostick.agtec.uga.edu>

Thomas:
> >Here's a smaller version of the biopython logo (300x89).
> >The large image scales well using gimp, BTW, or so I think.
> >Can it now be added to the biopython homepage?

Sure-o. I put it up at the top where the ol' BIOPYTHON text used to
be. Looks quite nice, in my non-artistic opinion. What do people
think?

Jeff:
> Brad, what's the best way to add images to the webpage?  Should they be 
> installed in a separate directory, or mixed in with the other pages?  
> We will need for them to be accessible from other pages.  Someone 
> should be able to have an IMG SRC point to our logo.

I created an images directory, so now:

http://www.biopython.org/images/

has both the full size and scaled logo in it, for easy linking and
all other good things people might like to do.

Very pretty. Thanks for the logo Thomas!
Brad

From grouse at mail.utexas.edu  Fri Aug  8 17:49:16 2003
From: grouse at mail.utexas.edu (Michael Hoffman)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Re: [BioPython] Updated Logo
In-Reply-To: <20030808212919.GC47653@evostick.agtec.uga.edu>
References: <200308071740.h77HeVTN011918@kira.skynet.be>
	<4278F5D5-C9DB-11D7-A249-000A956845CE@stanford.edu>
	<20030808212919.GC47653@evostick.agtec.uga.edu>
Message-ID: <Pine.LNX.4.55.0308081646000.17756@indy.icmb.utexas.edu>

On Fri, 8 Aug 2003, Brad Chapman wrote:

> Sure-o. I put it up at the top where the ol' BIOPYTHON text used to
> be. Looks quite nice, in my non-artistic opinion. What do people
> think?

There's no difference between the major and minor grooves! The
horror!

Just kidding. I like it. It would be nice if the web site colors
matched up with the logo more or vice versa.
-- 
Michael Hoffman <grouse@alumni.utexas.net>
The University of Texas at Austin


From chapmanb at uga.edu  Fri Aug  8 18:08:23 2003
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Flat files indices
In-Reply-To: <3F339B4E.5090304@irisa.fr>
References: <3F339B4E.5090304@irisa.fr>
Message-ID: <20030808220823.GE47653@evostick.agtec.uga.edu>

Hi Yves;

> Are the "Open-bio flat-file indexing systems" implemented soemwhere?

Yes, in Bio.Mindy. All of the components are there to be used,
although the documentation and "ease of use" of them is lagging
behind what has actually been finished. But, you can do useful work
with them. 

I'm attaching a file of example code ripped from some work I've been
doing which hopefully demonstrates the using it. This current code
indexes a standard set of FASTA files downloaded from GenBank based
on GI numbers. It also has some support for version numbers and
other things tossed in which aren't use here, but which I have used
in other things (aaah, the ugliness of cut-n-paste code). 

This uses exclusively the Martel based parsers, which allows it to
work on pretty darn huge FASTA files. A previous version used the
standard Fasta RecordParser which doesn't do well on huge entries (I
was doing rice work for someone and had entries like all of
chromosome 10 tossed in).

So, yeah -- it's all there but needs some work to make it more
user-friendly and documented. Volunteers are always welcome :-).

Hope this helps!
Brad
-------------- next part --------------
import cStringIO

from Bio import Fasta
from Bio import Mindy
from Bio.Mindy import SimpleSeqRecord
from Bio.expressions import fasta

from Martel.LAX import LAX

class MindyIndexOrganizer:
    """Organizer to deal with a set of Mindy indexes of Fasta (or other) files
    """
    def __init__(self):
        self._databases ={}
        self._opened_mindydbs = {}
        self._retrieved_seqs = {}

    def _get_indexer(self):
        """Derive from and override this class to get different indexers.
        """
        return _FastaIdIndexer()

    def index(self, index_name, index_file, index_dir, force_index = 0):
        """Index, if necessary, the file and load the index into the organizer.
        """
        if not(os.path.exists(index_dir)):
            os.makedirs(index_dir)

        full_index = os.path.join(index_dir, index_name)
        if not(os.path.exists(full_index)) or force_index:
            print "Indexing %s..." % index_file
            indexer = self._get_indexer()
            SimpleSeqRecord.create_berkeleydb([index_file], full_index,
                                              indexer)

        self._databases[index_name] = full_index

    def retrieve(self, database_name, id_key):
        """Retrieve a sequence record from the given Mindy sequence db.

        This returns a Record object, specified by the parser passed
        to the class.
        """
        # see if we've already retrieved this sequence -- save time with big
        # sequences
        try:
            return self._retrieved_seqs[(database_name, id_key)]
        except KeyError:
            pass
        
        # Get the sequence database
        seq_db = self._get_opened_mindydb(database_name)

        # first try retrieving by the base id
        try:
            seqs = seq_db.lookup(id = id_key)
        # if we can't do that, we have to fetch by alias
        # this deals with the problem of multiple sequence versions
        except KeyError:
            seqs = seq_db.lookup(aliases = id_key)

        # easy case -- 1 sequence
        if len(seqs) == 1:
            seq_ref = seqs[0]
        # harder case -- multiple sequence versions
        else:
            seq_ref = self._get_latest_version(seqs)

        it_builder = fasta.format.make_iterator("record")
        handle = cStringIO.StringIO(seq_ref.text)
        iterator = it_builder.iterateFile(handle,
                LAX(fields = ['bioformat:sequence']))
        result = iterator.next()

        seq_record = Fasta.Record()
        seq_record.title = id_key
        seq_record.sequence = "".join(result['bioformat:sequence'])

        # add the retrieved sequence to an in-memory dictionary
        self._retrieved_seqs[(database_name, id_key)] = seq_record

        return seq_record

    def _get_opened_mindydb(self, name):
        """Retrieve an open Mindy database with the given name.

        This stores databases we already opened to try and prevent multiple 
        opens of the same database (and thus save time).
        """
        try:
            return self._opened_mindydbs[name]
        except KeyError:
            self._opened_mindydbs[name] = Mindy.open(self._databases[name])
            return self._get_opened_mindydb(name)

    def _get_latest_version(self, seqs):
        """Find the latest version of a sequence from a list of choices.

        This deals with the problem of multiple versions of a sequence by
        finding and returning the most recent version
        """
        version_dict = {}
        the_base_name = None # error checking
        for seq in seqs:
            text_parts = seq.text.split("\n")
            first_line_parts = text_parts[0].split(" ")
            seq_name = first_line_parts[0]
            assert seq_name.find(".") >= 0, \
              "Expected versioned seqs: %s" % seq_name
            base_name, version = seq_name.split(".")
            if the_base_name is None:
                the_base_name = base_name
            else:
                assert base_name == the_base_name, "Different seqs"
            version_dict[int(version)] = seq
        versions = version_dict.keys()
        versions.sort()
        # use the most recent version, seems the best plan
        return version_dict[max(versions)]

class _FastaIdIndexer(SimpleSeqRecord.BaseSeqRecordIndexer):
    """Simple indexer to index by GI values.

    This assumes that the description of the sequence records is in the
    standard GenBank-like format:
    >gi|33304516|gb|AY339367.1| Oryza sativa (japonica cultivar-group)...

    It indexes these based on the GI number of the sequence.
    """
    def __init__(self):
        SimpleSeqRecord.BaseSeqRecordIndexer.__init__(self)

    def primary_key_name(self):
        return "id"

    def secondary_key_names(self):
        return ["name", "aliases"]

    def get_id_dictionary(self, seq_record):
        parts = seq_record.description.split(" ")
        record_id = parts[0]
        record_parts = record_id.split("|")
        sequence_id = record_parts[1] # GI number
        aliases = []
        if not(sequence_id):
            raise ValueError("No sequence ID: %s" % seq_record.description)
            
        id_info = {"id" : [sequence_id],
                   "name" : [],
                   "aliases" : aliases}
        return id_info

From Yves.Bastide at irisa.fr  Tue Aug 12 15:18:41 2003
From: Yves.Bastide at irisa.fr (Yves Bastide)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Flat files indices
In-Reply-To: <20030808220823.GE47653@evostick.agtec.uga.edu>
References: <3F339B4E.5090304@irisa.fr>
	<20030808220823.GE47653@evostick.agtec.uga.edu>
Message-ID: <3F393D91.3010108@irisa.fr>

Brad Chapman wrote:
> Hi Yves;
> 
> 
>>Are the "Open-bio flat-file indexing systems" implemented soemwhere?
> 
> 
> Yes, in Bio.Mindy. All of the components are there to be used,
> although the documentation and "ease of use" of them is lagging
> behind what has actually been finished. But, you can do useful work
> with them. 
> 
> I'm attaching a file of example code ripped from some work I've been
> doing which hopefully demonstrates the using it. This current code
> indexes a standard set of FASTA files downloaded from GenBank based
> on GI numbers. It also has some support for version numbers and
> other things tossed in which aren't use here, but which I have used
> in other things (aaah, the ugliness of cut-n-paste code). 
> 
> This uses exclusively the Martel based parsers, which allows it to
> work on pretty darn huge FASTA files. A previous version used the
> standard Fasta RecordParser which doesn't do well on huge entries (I
> was doing rice work for someone and had entries like all of
> chromosome 10 tossed in).
> 
> So, yeah -- it's all there but needs some work to make it more
> user-friendly and documented. Volunteers are always welcome :-).
> 
> Hope this helps!
> Brad

Thanks!

I think there are a few bugs in Mindy (eg., the use of fileid_onfo in 
WriteDB); here's a patch with gratuitous cosmetic changes, perhaps 
useful ones, and certainly new bugs :) (I also started to add 
docstrings, then saw the hour here)

I also patched sprot38.py to read a current snapshot of SwissProt: 
2A5D_HUMAN has a multi-lines RP. I dunno if there are users of the 
parser to change, though...

yves
-------------- next part --------------
Index: Bio/Mindy/BaseDB.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Mindy/BaseDB.py,v
retrieving revision 1.5
diff -u -p -r1.5 BaseDB.py
--- Bio/Mindy/BaseDB.py	2002/12/10 20:56:05	1.5
+++ Bio/Mindy/BaseDB.py	2003/08/12 19:12:21
@@ -3,6 +3,7 @@ import Bio
 import compression
 
 def _int_str(i):
+    # XXX doesn't seem useful
     s = str(i)
     if s[-1:] == "l":
         return s[:-1]
@@ -12,22 +13,22 @@ class WriteDB:
     # Must define 'self.filename_map' mapping from filename -> fileid
     # Must define 'self.fileid_info' mapping from fileid -> (filename,size)
 
-    def add_filename(self, filename, size, fileid_info):
+    def add_filename(self, filename, size):
         fileid = self.filename_map.get(filename, None)
         if fileid is not None:
             return fileid
         s = str(len(self.filename_map))
         self.filename_map[filename] = s  # map from filename -> id
-        assert s not in fileid_info.keys(), "Duplicate entry! %s" % (s,)
+        assert s not in self.fileid_info.keys(), "Duplicate entry! %s" % (s,)
         self.fileid_info[s] = (filename, size)
         return s
 
-    def load(self, filename, builder, fileid_info, record_tag = "record"):
+    def load(self, filename, builder, record_tag="record"):
         formatname = self.formatname
         size = os.path.getsize(filename)
-        filetag = self.add_filename(filename, size, fileid_info)
+        filetag = self.add_filename(filename, size)
 
-        source = compression.open_file(filename, "rb")
+        source = compression.open(filename, "rb")
         if formatname == "unknown":
             formatname = "sequence"
         
@@ -66,7 +67,7 @@ class DictLookup:
     def items(self):
         return [(key, self[key]) for key in self.keys()]
 
-    def get(self, key, default = None):
+    def get(self, key, default=None):
         try:
             return self[key]
         except KeyError:
@@ -97,7 +98,7 @@ class OpenDB(DictLookup):
             if os.path.getsize(filename) != size:
                 raise TypeError(
                     "File %s has changed size from %d to %d bytes!" %
-                    (size, os.path.getsize(filename)))
+                    (filename, size, os.path.getsize(filename)))
 
         self.filename_map = filename_map
         self.fileid_info = fileid_info
Index: Bio/Mindy/BerkeleyDB.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Mindy/BerkeleyDB.py,v
retrieving revision 1.8
diff -u -p -r1.8 BerkeleyDB.py
--- Bio/Mindy/BerkeleyDB.py	2002/12/10 21:55:40	1.8
+++ Bio/Mindy/BerkeleyDB.py	2003/08/12 19:12:21
@@ -1,15 +1,19 @@
+"""Open-Bio BerkeleyDB indexing system for flat-files databanks."""
+
 import os
-from bsddb3 import db
+try:
+    from bsddb import db
+except ImportError:
+    from bsddb3 import db
 import Location
 import BaseDB
 import Bio
 
-_open = open  # rename for internal use -- gets redefined below
-
 INDEX_TYPE = "BerkeleyDB/1"
 
 def create(dbname, primary_namespace, secondary_namespaces,
-           formatname = "unknown"):
+           formatname="unknown"):
+    """BerkeleyDB creator factory"""
     os.mkdir(dbname)
     config_filename = os.path.join(dbname, "config.dat")
     BaseDB.write_config(config_filename = config_filename,
@@ -39,7 +43,7 @@ def create(dbname, primary_namespace, se
     primary_table.close()
     dbenv.close()
 
-    return open(dbname, "rw")
+    return BerkeleyDB(dbname, "rw")
     
 
 class PrimaryNamespace(BaseDB.DictLookup):
@@ -92,7 +96,7 @@ class SecondaryNamespace(BaseDB.DictLook
         return table.keys()
 
 class BerkeleyDB(BaseDB.OpenDB, BaseDB.WriteDB):
-    def __init__(self, dbname, mode = "r"):
+    def __init__(self, dbname, mode="r"):
         if mode not in ("r", "rw"):
             raise TypeError("Unknown mode: %r" % (mode,))
         self.__need_flush = 0
@@ -173,7 +177,7 @@ class BerkeleyDB(BaseDB.OpenDB, BaseDB.W
         [x.close() for x in self.secondary_tables.values()]
         self.dbenv.close()
         self.dbenv = self.primary_table = self.fileid_info = \
-                     self.secondary_tables = self.fileid_info = None
+                     self.secondary_tables = None
         
     def __del__(self):
         if self.dbenv is not None:
@@ -188,4 +192,3 @@ class BerkeleyDB(BaseDB.OpenDB, BaseDB.W
             return SecondaryNamespace(self, key)
 
 open = BerkeleyDB
-
Index: Bio/Mindy/FlatDB.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Mindy/FlatDB.py,v
retrieving revision 1.6
diff -u -p -r1.6 FlatDB.py
--- Bio/Mindy/FlatDB.py	2002/03/01 15:07:21	1.6
+++ Bio/Mindy/FlatDB.py	2003/08/12 19:12:21
@@ -1,9 +1,11 @@
+"""Open-Bio flat indexing system for flat-files databanks."""
 
-import os, bisect
-import BaseDB, Location
+import os
+import bisect
+import BaseDB
+import Location
 import Bio
 
-_open = open
 INDEX_TYPE = "flat/1"
 
 def _parse_primary_table_entry(s):
@@ -11,7 +13,7 @@ def _parse_primary_table_entry(s):
     return name, filetag, long(startpos), long(length)
 
 def _read_primary_table(filename):
-    infile = _open(filename, "rb")
+    infile = file(filename, "rb")
     size = int(infile.read(4))
     table = {}
     while 1:
@@ -36,7 +38,7 @@ def _write_primary_table(filename, prima
                 raise AssertionError(
                     "Primary index record too large for format spec! " +
                     " %s bytes in %r" % (n, s))
-    outfile = _open(filename, "wb")
+    outfile = file(filename, "wb")
     outfile.write("%04d" % n)
     for k, v in info:
         s = "%s\t%s" % (k, v)
@@ -47,7 +49,7 @@ def _parse_secondary_table_entry(s):
     return s.rstrip().split("\t")
 
 def _read_secondary_table(filename):
-    infile = _open(filename, "rb")
+    infile = file(filename, "rb")
     size = int(infile.read(4))
     table = {}
     while 1:
@@ -75,7 +77,7 @@ def _write_secondary_table(filename, tab
                "Secondary index record too large for format spec! " +
                " %s bytes in %r" % (n, s))
     # And write the output
-    outfile = _open(filename, "wb")
+    outfile = file(filename, "wb")
     outfile.write("%04d" % n)
     for k, v in items:
         for x in v:
@@ -127,7 +129,7 @@ class MemoryFlatDB(BaseDB.WriteDB, BaseF
     def __init__(self, dbname):
         self.__in_constructor = 1
         self._need_flush = 0
-        BaseFlatDB.__init__(self, dbname, INDEX_TYPE)
+        BaseFlatDB.__init__(self, dbname)
 
         primary_filename = os.path.join(self.dbname,
                              "key_%s.key" % (self.primary_namespace,) )
@@ -145,7 +147,8 @@ class MemoryFlatDB(BaseDB.WriteDB, BaseF
         if len(key_list) != 1:
             raise TypeError(
                 "Field %s has %d entries but must have only one "
-                "(must be unique)" % (repr(unique), len(key_list)))
+                "(must be unique)" % (repr(self.primary_namespace),
+                                      len(key_list)))
         key = key_list[0]
         if self.primary_table.has_key(key):
             raise TypeError("Field %r = %r already exists; must be unique" %
@@ -227,7 +230,7 @@ class BisectFile:
 
 def _find_entry(filename, wantword):
     size = os.path.getsize(filename)
-    infile = _open(filename, "rb")
+    infile = file(filename, "rb")
 
     bf = BisectFile(infile, size)
     left = bisect.bisect_left(bf, wantword)
@@ -238,7 +241,7 @@ def _find_entry(filename, wantword):
 
 def _find_range(filename, wantword):
     size = os.path.getsize(filename)
-    infile = _open(filename, "rb")
+    infile = file(filename, "rb")
 
     bf = BisectFile(infile, size)
     left = bisect.bisect_left(bf, wantword)
@@ -272,7 +275,7 @@ def _lookup_alias(id_filename, word):
     return primary_keys
 
 def create(dbname, primary_namespace, secondary_namespaces,
-           formatname = "unknown"):
+           formatname="unknown"):
     os.mkdir(dbname)
     config_filename = os.path.join(dbname, "config.dat")
     BaseDB.write_config(config_filename = config_filename,
@@ -297,7 +300,7 @@ def create(dbname, primary_namespace, se
     return open(dbname, "rw")
     
 
-def open(dbname, mode = "r"):
+def open(dbname, mode="r"):
     if mode == "r":
         return DiskFlatDB(dbname)
     elif mode == "rw":
@@ -308,7 +311,7 @@ def open(dbname, mode = "r"):
         raise TypeError("Unknown mode: %r" % (mode,))
 
 def _get_first_words(filename):
-    infile = _open(filename, "rb")
+    infile = file(filename, "rb")
     size = int(infile.read(4))
     data = []
     while 1:
Index: Bio/Mindy/Location.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Mindy/Location.py,v
retrieving revision 1.2
diff -u -p -r1.2 Location.py
--- Bio/Mindy/Location.py	2002/02/26 11:32:26	1.2
+++ Bio/Mindy/Location.py	2003/08/12 19:12:21
@@ -1,6 +1,6 @@
 import compression
 
-class Location:
+class Location(object):
     """Handle for a record (use 'text' to get the record's text)"""
     def __init__(self, namespace, name, filename, startpos, length):
         self.namespace = namespace
@@ -9,26 +9,26 @@ class Location:
         self.startpos = startpos
         self.length = length
     def __repr__(self):
-        return "Location(namespace = %r, name = %r, filename = %r, startpos = %r, length = %r)" % (self.namespace, self.name, self.filename, self.startpos, self.length)
+        return "Location(namespace = %r, name = %r, filename = %r," \
+               " startpos = %r, length = %r)" % \
+               (self.namespace, self.name, self.filename,
+                self.startpos, self.length)
     def __str__(self):
         return "Location(%s:%s at %s: %s, %s)" % \
                (self.namespace, self.name,
                 self.filename,self.startpos, self.length)
-    def __getattr__(self, key):
-        if key == "text":
-            infile = compression.open_file(self.filename)
-            if hasattr(infile, "seek"):
-                infile.seek(self.startpos)
-                return infile.read(self.length)
-            # read 1MB chunks at a time
-            CHUNKSIZE = 1000000
-            count = 0
-            while count + CHUNKSIZE < self.startpos:
-                infile.read(CHUNKSIZE)
-                count += CHUNKSIZE
-            infile.read(self.startpos - count)
+    def get_text(self):
+        infile = compression.open(self.filename)
+        if hasattr(infile, "seek"):
+            infile.seek(self.startpos)
             return infile.read(self.length)
-        elif key == "__members__":
-            return ["text"]
-        raise AttributeError(key)
+        # read 1MiB chunks at a time
+        CHUNKSIZE = 1048576
+        count = 0
+        while count + CHUNKSIZE < self.startpos:
+            infile.read(CHUNKSIZE)
+            count += CHUNKSIZE
+        infile.read(self.startpos - count)
+        return infile.read(self.length)
+    text = property(get_text)
 
Index: Bio/Mindy/SimpleSeqRecord.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Mindy/SimpleSeqRecord.py,v
retrieving revision 1.2
diff -u -p -r1.2 SimpleSeqRecord.py
--- Bio/Mindy/SimpleSeqRecord.py	2002/12/10 20:56:05	1.2
+++ Bio/Mindy/SimpleSeqRecord.py	2003/08/12 19:12:22
@@ -94,8 +94,10 @@ class FixDocumentBuilder(BuildSeqRecord)
 
 # --- convenience functions for indexing
 # you should just use these unless you are doing something fancy
-def create_berkeleydb(files, db_name, indexer = SimpleIndexer()):
+def create_berkeleydb(files, db_name, indexer=None):
     from Bio.Mindy import BerkeleyDB
+    if indexer is None:
+        indexer = SimpleIndexer()
     unique_name = indexer.primary_key_name()
     alias_names = indexer.secondary_key_names()
     creator = BerkeleyDB.create(db_name, unique_name, alias_names)
@@ -104,8 +106,10 @@ def create_berkeleydb(files, db_name, in
         creator.load(filename, builder = builder, fileid_info = {})
     creator.close()
 
-def create_flatdb(files, db_name, indexer = SimpleIndexer()):
+def create_flatdb(files, db_name, indexer=None):
     from Bio.Mindy import FlatDB
+    if indexer is None:
+        indexer = SimpleIndexer()
     unique_name = indexer.primary_key_name()
     alias_names = indexer.secondary_key_names()
     creator = FlatDB.create(db_name, unique_name, alias_names)
Index: Bio/Mindy/XPath.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Mindy/XPath.py,v
retrieving revision 1.3
diff -u -p -r1.3 XPath.py
--- Bio/Mindy/XPath.py	2002/03/01 15:07:21	1.3
+++ Bio/Mindy/XPath.py	2003/08/12 19:12:22
@@ -1,4 +1,5 @@
-import xml.sax, re
+import xml.sax
+import re
 
 from Bio import Std
 
@@ -10,7 +11,7 @@ _pat_tag_re = re.compile(r"""^//(%s)(\[@
                                                    #')  # emacs cruft
 
 
-def parse_simple_xpath(s):
+def _parse_simple_xpath(s):
     # Only supports two formats
     # //tag
     # //tag[@attr="value"]
@@ -32,11 +33,23 @@ def parse_simple_xpath(s):
 def xpath_index(dbname,
                 filenames,
                 primary_namespace,
-                extract_info,  # pair of (data_value, xpath)
-                format = "sequence",
-                record_tag = Std.record.tag,
-                creator_factory = None,
+                extract_info,
+                format="sequence",
+                record_tag=Std.record.tag,
+                creator_factory=None,
                 ):
+    """Index a flat-file databank.
+
+    Arguments:
+    dbname -- databank name
+    filenames -- list of file names; full paths should be used
+    primary_namespace -- primary identifier namespace
+    extract_info -- list of pairs (data_value, xpath)
+    format -- name of the file format (default: "sequence")
+    record_tag -- record tag (default: `Bio.Std.record.tag`)
+    creator_factory -- creator factory (default: BerkeleyDB.create)
+
+    """
     if creator_factory is None:
         import BerkeleyDB
         creator_factory = BerkeleyDB.create
@@ -55,28 +68,32 @@ def xpath_index(dbname,
         raise TypeError("Property %r has no xpath definition" %
                         (primary_namespace,))
 
-    creator = creator_factory(dbname, primary_namespace, data_names)
-    builder = GrabXPathNodes(extract_info)
+    creator = creator_factory(dbname, primary_namespace, data_names,
+                              formatname = format)
+    builder = _GrabXPathNodes(extract_info)
+    fileid_info = {}
     for filename in filenames:
-        creator.load(filename, builder = builder, record_tag = record_tag,
-                     formatname = format)
+        creator.load(filename, builder = builder, fileid_info = fileid_info,
+                     record_tag = record_tag)
     creator.close()
 
 
-class GrabXPathNodes(xml.sax.ContentHandler):
+class _GrabXPathNodes(xml.sax.ContentHandler):
     def __init__(self, extractinfo):
+        xml.sax.ContentHandler.__init__(self)
         self._fast_tags = _fast_tags = {}
         for property, xpath in extractinfo:
-            tag, attrs = parse_simple_xpath(xpath)
+            tag, attrs = _parse_simple_xpath(xpath)
             _fast_tags.setdefault(tag, []).append( (attrs, property) )
 
         # for doing the endElement in the correct order,
         # which is opposite to the input order
-        self._rev_tags = _rev_tags = {}
+        _rev_tags = {}
         for k, v in self._fast_tags.items():
             v = v[:]
             v.reverse()
-            self._rev_tags[k] = v
+            _rev_tags[k] = v
+        self._rev_tags = _rev_tags
 
     def uses_tags(self):
         return self._fast_tags.keys()
Index: Bio/Mindy/__init__.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Mindy/__init__.py,v
retrieving revision 1.6
diff -u -p -r1.6 __init__.py
--- Bio/Mindy/__init__.py	2002/03/01 15:07:21	1.6
+++ Bio/Mindy/__init__.py	2003/08/12 19:12:22
@@ -1,9 +1,13 @@
-import os, sys
+import os
 
-_open = open  # rename for internal use -- gets redefined below
+# For python 2.1 compatibility, one can add
+##try:
+##    file
+##except NameError:
+##    file = open
 
-def open(dbname, mode = "r"):
-    text = _open(os.path.join(dbname, "config.dat"), "rb").read()
+def open(dbname, mode="r"):
+    text = file(os.path.join(dbname, "config.dat"), "rb").read()
     line = text.split("\n")[0]
     if line == "index\tBerkeleyDB/1":
         import BerkeleyDB
@@ -18,19 +22,19 @@ def open(dbname, mode = "r"):
 def main():
     from Bio import Std
     import XPath
-    import FlatDB
+    ##import FlatDB
     XPath.xpath_index(
-        #dbname = "sprot_flat",
+        ##dbname = "sprot_flat",
         dbname = "sprot_small",
         filenames = ["/home/dalke/ftps/swissprot/smaller_sprot38.dat",
-        #filenames = ["/home/dalke/ftps/swissprot/sprot38.dat",
+        ##filenames = ["/home/dalke/ftps/swissprot/sprot38.dat",
                      ],
         primary_namespace = "entry",
         extract_info = [
         ("entry", "//entry_name"),
         ("accession", "//%s[@type='accession']" % (Std.dbid.tag,)),
         ],
-        #creator_factory = FlatDB.CreateFlatDB,
+        ##creator_factory = FlatDB.CreateFlatDB,
         )
 
 
Index: Bio/Mindy/compression.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Mindy/compression.py,v
retrieving revision 1.1
diff -u -p -r1.1 compression.py
--- Bio/Mindy/compression.py	2002/01/28 20:55:30	1.1
+++ Bio/Mindy/compression.py	2003/08/12 19:12:22
@@ -1,4 +1,5 @@
-import commands, os
+import commands
+import os
 
 _uncompress_table = {
     ".bz": "bzip2",
@@ -8,21 +9,23 @@ _uncompress_table = {
     ".Z": "compress",
     }
 
-def open_file(filename, mode = "rb"):
+def open(filename, mode="rb"):
     ext = os.path.splitext(filename)[1]
     type = _uncompress_table.get(ext)
     if type is None:
-        return open(filename, mode)
+        return file(filename, mode)
     if type == "gzip":
         import gzip
-        gzip.open(filename, mode)
+        return gzip.open(filename, mode)
     if type == "bzip2":
-        cmd = "bzcat --decompress"
-        cmd += commands.mkarg(filename)
-        return os.popen(cmd, mode)
+        try:
+            import bz2
+        except ImportError:
+            cmd = "bzcat --decompress %s" % commands.mkarg(filename)
+            return os.popen(cmd, mode)
+        return bz2.BZ2File(filename, mode)
     if type == "compress":
-        cmd = "zcat -d"
-        cmd += commands.mkarg(filename)
+        cmd = "zcat -d %s" % commands.mkarg(filename)
         return os.popen(cmd, mode)
     raise AssertionError("What's a %r?" % type)
             
-------------- next part --------------
Index: Bio/expressions//swissprot/sprot38.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/expressions/swissprot/sprot38.py,v
retrieving revision 1.4
diff -u -p -r1.4 sprot38.py
--- Bio/expressions//swissprot/sprot38.py	2002/02/27 07:31:32	1.4
+++ Bio/expressions//swissprot/sprot38.py	2003/08/12 19:13:33
@@ -111,9 +111,9 @@ RN = Martel.Group("RN", Martel.Re("RN   
 
 #--- RP
 
-# occurs once
+# 1 or more
 RP = Simple("RP",  "reference_position")
-
+RP_block = Martel.Group("RP_block", Martel.Rep1(RP))
 
 #--- RC
 
@@ -151,7 +151,7 @@ RL_block = Martel.Group("RL_block", Mart
 
 reference = Martel.Group("reference",
                          RN + \
-                         RP + \
+                         RP_block + \
                          Martel.Opt(RC_block) + \
                          Martel.Opt(RX) + \
                          RA_block + \
From Yves.Bastide at irisa.fr  Tue Aug 12 15:25:14 2003
From: Yves.Bastide at irisa.fr (Yves Bastide)
Date: Sat Mar  5 14:43:26 2005
Subject: [Biopython-dev] Re: [patch] Bio/Mindy
In-Reply-To: <20030808220823.GE47653@evostick.agtec.uga.edu>
References: <3F339B4E.5090304@irisa.fr>
	<20030808220823.GE47653@evostick.agtec.uga.edu>
Message-ID: <3F393F1A.9050703@irisa.fr>

Oops.  <note to="self">Do C-x s before cvs diff.</note>

yves
-------------- next part --------------
Index: Bio/Mindy/BaseDB.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Mindy/BaseDB.py,v
retrieving revision 1.5
diff -u -p -r1.5 BaseDB.py
--- Bio/Mindy/BaseDB.py	2002/12/10 20:56:05	1.5
+++ Bio/Mindy/BaseDB.py	2003/08/12 19:17:44
@@ -3,6 +3,7 @@ import Bio
 import compression
 
 def _int_str(i):
+    # XXX doesn't seem useful
     s = str(i)
     if s[-1:] == "l":
         return s[:-1]
@@ -12,22 +13,22 @@ class WriteDB:
     # Must define 'self.filename_map' mapping from filename -> fileid
     # Must define 'self.fileid_info' mapping from fileid -> (filename,size)
 
-    def add_filename(self, filename, size, fileid_info):
+    def add_filename(self, filename, size):
         fileid = self.filename_map.get(filename, None)
         if fileid is not None:
             return fileid
         s = str(len(self.filename_map))
         self.filename_map[filename] = s  # map from filename -> id
-        assert s not in fileid_info.keys(), "Duplicate entry! %s" % (s,)
+        assert s not in self.fileid_info.keys(), "Duplicate entry! %s" % (s,)
         self.fileid_info[s] = (filename, size)
         return s
 
-    def load(self, filename, builder, fileid_info, record_tag = "record"):
+    def load(self, filename, builder, record_tag="record"):
         formatname = self.formatname
         size = os.path.getsize(filename)
-        filetag = self.add_filename(filename, size, fileid_info)
+        filetag = self.add_filename(filename, size)
 
-        source = compression.open_file(filename, "rb")
+        source = compression.open(filename, "rb")
         if formatname == "unknown":
             formatname = "sequence"
         
@@ -66,7 +67,7 @@ class DictLookup:
     def items(self):
         return [(key, self[key]) for key in self.keys()]
 
-    def get(self, key, default = None):
+    def get(self, key, default=None):
         try:
             return self[key]
         except KeyError:
@@ -97,7 +98,7 @@ class OpenDB(DictLookup):
             if os.path.getsize(filename) != size:
                 raise TypeError(
                     "File %s has changed size from %d to %d bytes!" %
-                    (size, os.path.getsize(filename)))
+                    (filename, size, os.path.getsize(filename)))
 
         self.filename_map = filename_map
         self.fileid_info = fileid_info
Index: Bio/Mindy/BerkeleyDB.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Mindy/BerkeleyDB.py,v
retrieving revision 1.8
diff -u -p -r1.8 BerkeleyDB.py
--- Bio/Mindy/BerkeleyDB.py	2002/12/10 21:55:40	1.8
+++ Bio/Mindy/BerkeleyDB.py	2003/08/12 19:17:44
@@ -1,15 +1,19 @@
+"""Open-Bio BerkeleyDB indexing system for flat-files databanks."""
+
 import os
-from bsddb3 import db
+try:
+    from bsddb import db
+except ImportError:
+    from bsddb3 import db
 import Location
 import BaseDB
 import Bio
 
-_open = open  # rename for internal use -- gets redefined below
-
 INDEX_TYPE = "BerkeleyDB/1"
 
 def create(dbname, primary_namespace, secondary_namespaces,
-           formatname = "unknown"):
+           formatname="unknown"):
+    """BerkeleyDB creator factory"""
     os.mkdir(dbname)
     config_filename = os.path.join(dbname, "config.dat")
     BaseDB.write_config(config_filename = config_filename,
@@ -39,7 +43,7 @@ def create(dbname, primary_namespace, se
     primary_table.close()
     dbenv.close()
 
-    return open(dbname, "rw")
+    return BerkeleyDB(dbname, "rw")
     
 
 class PrimaryNamespace(BaseDB.DictLookup):
@@ -92,7 +96,7 @@ class SecondaryNamespace(BaseDB.DictLook
         return table.keys()
 
 class BerkeleyDB(BaseDB.OpenDB, BaseDB.WriteDB):
-    def __init__(self, dbname, mode = "r"):
+    def __init__(self, dbname, mode="r"):
         if mode not in ("r", "rw"):
             raise TypeError("Unknown mode: %r" % (mode,))
         self.__need_flush = 0
@@ -173,7 +177,7 @@ class BerkeleyDB(BaseDB.OpenDB, BaseDB.W
         [x.close() for x in self.secondary_tables.values()]
         self.dbenv.close()
         self.dbenv = self.primary_table = self.fileid_info = \
-                     self.secondary_tables = self.fileid_info = None
+                     self.secondary_tables = None
         
     def __del__(self):
         if self.dbenv is not None:
@@ -188,4 +192,3 @@ class BerkeleyDB(BaseDB.OpenDB, BaseDB.W
             return SecondaryNamespace(self, key)
 
 open = BerkeleyDB
-
Index: Bio/Mindy/FlatDB.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Mindy/FlatDB.py,v
retrieving revision 1.6
diff -u -p -r1.6 FlatDB.py
--- Bio/Mindy/FlatDB.py	2002/03/01 15:07:21	1.6
+++ Bio/Mindy/FlatDB.py	2003/08/12 19:17:44
@@ -1,9 +1,11 @@
+"""Open-Bio flat indexing system for flat-files databanks."""
 
-import os, bisect
-import BaseDB, Location
+import os
+import bisect
+import BaseDB
+import Location
 import Bio
 
-_open = open
 INDEX_TYPE = "flat/1"
 
 def _parse_primary_table_entry(s):
@@ -11,7 +13,7 @@ def _parse_primary_table_entry(s):
     return name, filetag, long(startpos), long(length)
 
 def _read_primary_table(filename):
-    infile = _open(filename, "rb")
+    infile = file(filename, "rb")
     size = int(infile.read(4))
     table = {}
     while 1:
@@ -36,7 +38,7 @@ def _write_primary_table(filename, prima
                 raise AssertionError(
                     "Primary index record too large for format spec! " +
                     " %s bytes in %r" % (n, s))
-    outfile = _open(filename, "wb")
+    outfile = file(filename, "wb")
     outfile.write("%04d" % n)
     for k, v in info:
         s = "%s\t%s" % (k, v)
@@ -47,7 +49,7 @@ def _parse_secondary_table_entry(s):
     return s.rstrip().split("\t")
 
 def _read_secondary_table(filename):
-    infile = _open(filename, "rb")
+    infile = file(filename, "rb")
     size = int(infile.read(4))
     table = {}
     while 1:
@@ -75,7 +77,7 @@ def _write_secondary_table(filename, tab
                "Secondary index record too large for format spec! " +
                " %s bytes in %r" % (n, s))
     # And write the output
-    outfile = _open(filename, "wb")
+    outfile = file(filename, "wb")
     outfile.write("%04d" % n)
     for k, v in items:
         for x in v:
@@ -127,7 +129,7 @@ class MemoryFlatDB(BaseDB.WriteDB, BaseF
     def __init__(self, dbname):
         self.__in_constructor = 1
         self._need_flush = 0
-        BaseFlatDB.__init__(self, dbname, INDEX_TYPE)
+        BaseFlatDB.__init__(self, dbname)
 
         primary_filename = os.path.join(self.dbname,
                              "key_%s.key" % (self.primary_namespace,) )
@@ -145,7 +147,8 @@ class MemoryFlatDB(BaseDB.WriteDB, BaseF
         if len(key_list) != 1:
             raise TypeError(
                 "Field %s has %d entries but must have only one "
-                "(must be unique)" % (repr(unique), len(key_list)))
+                "(must be unique)" % (repr(self.primary_namespace),
+                                      len(key_list)))
         key = key_list[0]
         if self.primary_table.has_key(key):
             raise TypeError("Field %r = %r already exists; must be unique" %
@@ -227,7 +230,7 @@ class BisectFile:
 
 def _find_entry(filename, wantword):
     size = os.path.getsize(filename)
-    infile = _open(filename, "rb")
+    infile = file(filename, "rb")
 
     bf = BisectFile(infile, size)
     left = bisect.bisect_left(bf, wantword)
@@ -238,7 +241,7 @@ def _find_entry(filename, wantword):
 
 def _find_range(filename, wantword):
     size = os.path.getsize(filename)
-    infile = _open(filename, "rb")
+    infile = file(filename, "rb")
 
     bf = BisectFile(infile, size)
     left = bisect.bisect_left(bf, wantword)
@@ -272,7 +275,7 @@ def _lookup_alias(id_filename, word):
     return primary_keys
 
 def create(dbname, primary_namespace, secondary_namespaces,
-           formatname = "unknown"):
+           formatname="unknown"):
     os.mkdir(dbname)
     config_filename = os.path.join(dbname, "config.dat")
     BaseDB.write_config(config_filename = config_filename,
@@ -297,7 +300,7 @@ def create(dbname, primary_namespace, se
     return open(dbname, "rw")
     
 
-def open(dbname, mode = "r"):
+def open(dbname, mode="r"):
     if mode == "r":
         return DiskFlatDB(dbname)
     elif mode == "rw":
@@ -308,7 +311,7 @@ def open(dbname, mode = "r"):
         raise TypeError("Unknown mode: %r" % (mode,))
 
 def _get_first_words(filename):
-    infile = _open(filename, "rb")
+    infile = file(filename, "rb")
     size = int(infile.read(4))
     data = []
     while 1:
Index: Bio/Mindy/Location.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Mindy/Location.py,v
retrieving revision 1.2
diff -u -p -r1.2 Location.py
--- Bio/Mindy/Location.py	2002/02/26 11:32:26	1.2
+++ Bio/Mindy/Location.py	2003/08/12 19:17:44
@@ -1,6 +1,6 @@
 import compression
 
-class Location:
+class Location(object):
     """Handle for a record (use 'text' to get the record's text)"""
     def __init__(self, namespace, name, filename, startpos, length):
         self.namespace = namespace
@@ -9,26 +9,26 @@ class Location:
         self.startpos = startpos
         self.length = length
     def __repr__(self):
-        return "Location(namespace = %r, name = %r, filename = %r, startpos = %r, length = %r)" % (self.namespace, self.name, self.filename, self.startpos, self.length)
+        return "Location(namespace = %r, name = %r, filename = %r," \
+               " startpos = %r, length = %r)" % \
+               (self.namespace, self.name, self.filename,
+                self.startpos, self.length)
     def __str__(self):
         return "Location(%s:%s at %s: %s, %s)" % \
                (self.namespace, self.name,
                 self.filename,self.startpos, self.length)
-    def __getattr__(self, key):
-        if key == "text":
-            infile = compression.open_file(self.filename)
-            if hasattr(infile, "seek"):
-                infile.seek(self.startpos)
-                return infile.read(self.length)
-            # read 1MB chunks at a time
-            CHUNKSIZE = 1000000
-            count = 0
-            while count + CHUNKSIZE < self.startpos:
-                infile.read(CHUNKSIZE)
-                count += CHUNKSIZE
-            infile.read(self.startpos - count)
+    def get_text(self):
+        infile = compression.open(self.filename)
+        if hasattr(infile, "seek"):
+            infile.seek(self.startpos)
             return infile.read(self.length)
-        elif key == "__members__":
-            return ["text"]
-        raise AttributeError(key)
+        # read 1MiB chunks at a time
+        CHUNKSIZE = 1048576
+        count = 0
+        while count + CHUNKSIZE < self.startpos:
+            infile.read(CHUNKSIZE)
+            count += CHUNKSIZE
+        infile.read(self.startpos - count)
+        return infile.read(self.length)
+    text = property(get_text)
 
Index: Bio/Mindy/SimpleSeqRecord.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Mindy/SimpleSeqRecord.py,v
retrieving revision 1.2
diff -u -p -r1.2 SimpleSeqRecord.py
--- Bio/Mindy/SimpleSeqRecord.py	2002/12/10 20:56:05	1.2
+++ Bio/Mindy/SimpleSeqRecord.py	2003/08/12 19:17:44
@@ -94,8 +94,10 @@ class FixDocumentBuilder(BuildSeqRecord)
 
 # --- convenience functions for indexing
 # you should just use these unless you are doing something fancy
-def create_berkeleydb(files, db_name, indexer = SimpleIndexer()):
+def create_berkeleydb(files, db_name, indexer=None):
     from Bio.Mindy import BerkeleyDB
+    if indexer is None:
+        indexer = SimpleIndexer()
     unique_name = indexer.primary_key_name()
     alias_names = indexer.secondary_key_names()
     creator = BerkeleyDB.create(db_name, unique_name, alias_names)
@@ -104,8 +106,10 @@ def create_berkeleydb(files, db_name, in
         creator.load(filename, builder = builder, fileid_info = {})
     creator.close()
 
-def create_flatdb(files, db_name, indexer = SimpleIndexer()):
+def create_flatdb(files, db_name, indexer=None):
     from Bio.Mindy import FlatDB
+    if indexer is None:
+        indexer = SimpleIndexer()
     unique_name = indexer.primary_key_name()
     alias_names = indexer.secondary_key_names()
     creator = FlatDB.create(db_name, unique_name, alias_names)
Index: Bio/Mindy/XPath.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Mindy/XPath.py,v
retrieving revision 1.3
diff -u -p -r1.3 XPath.py
--- Bio/Mindy/XPath.py	2002/03/01 15:07:21	1.3
+++ Bio/Mindy/XPath.py	2003/08/12 19:17:44
@@ -1,4 +1,5 @@
-import xml.sax, re
+import xml.sax
+import re
 
 from Bio import Std
 
@@ -10,7 +11,7 @@ _pat_tag_re = re.compile(r"""^//(%s)(\[@
                                                    #')  # emacs cruft
 
 
-def parse_simple_xpath(s):
+def _parse_simple_xpath(s):
     # Only supports two formats
     # //tag
     # //tag[@attr="value"]
@@ -32,11 +33,23 @@ def parse_simple_xpath(s):
 def xpath_index(dbname,
                 filenames,
                 primary_namespace,
-                extract_info,  # pair of (data_value, xpath)
-                format = "sequence",
-                record_tag = Std.record.tag,
-                creator_factory = None,
+                extract_info,
+                format="sequence",
+                record_tag=Std.record.tag,
+                creator_factory=None,
                 ):
+    """Index a flat-file databank.
+
+    Arguments:
+    dbname -- databank name
+    filenames -- list of file names; full paths should be used
+    primary_namespace -- primary identifier namespace
+    extract_info -- list of pairs (data_value, xpath)
+    format -- name of the file format (default: "sequence")
+    record_tag -- record tag (default: `Bio.Std.record.tag`)
+    creator_factory -- creator factory (default: BerkeleyDB.create)
+
+    """
     if creator_factory is None:
         import BerkeleyDB
         creator_factory = BerkeleyDB.create
@@ -55,28 +68,31 @@ def xpath_index(dbname,
         raise TypeError("Property %r has no xpath definition" %
                         (primary_namespace,))
 
-    creator = creator_factory(dbname, primary_namespace, data_names)
-    builder = GrabXPathNodes(extract_info)
+    creator = creator_factory(dbname, primary_namespace, data_names,
+                              formatname = format)
+    builder = _GrabXPathNodes(extract_info)
+    fileid_info = {}
     for filename in filenames:
-        creator.load(filename, builder = builder, record_tag = record_tag,
-                     formatname = format)
+        creator.load(filename, builder = builder, record_tag = record_tag)
     creator.close()
 
 
-class GrabXPathNodes(xml.sax.ContentHandler):
+class _GrabXPathNodes(xml.sax.ContentHandler):
     def __init__(self, extractinfo):
+        xml.sax.ContentHandler.__init__(self)
         self._fast_tags = _fast_tags = {}
         for property, xpath in extractinfo:
-            tag, attrs = parse_simple_xpath(xpath)
+            tag, attrs = _parse_simple_xpath(xpath)
             _fast_tags.setdefault(tag, []).append( (attrs, property) )
 
         # for doing the endElement in the correct order,
         # which is opposite to the input order
-        self._rev_tags = _rev_tags = {}
+        _rev_tags = {}
         for k, v in self._fast_tags.items():
             v = v[:]
             v.reverse()
-            self._rev_tags[k] = v
+            _rev_tags[k] = v
+        self._rev_tags = _rev_tags
 
     def uses_tags(self):
         return self._fast_tags.keys()
Index: Bio/Mindy/__init__.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Mindy/__init__.py,v
retrieving revision 1.6
diff -u -p -r1.6 __init__.py
--- Bio/Mindy/__init__.py	2002/03/01 15:07:21	1.6
+++ Bio/Mindy/__init__.py	2003/08/12 19:17:44
@@ -1,9 +1,13 @@
-import os, sys
+import os
 
-_open = open  # rename for internal use -- gets redefined below
+# For python 2.1 compatibility, one can add
+##try:
+##    file
+##except NameError:
+##    file = open
 
-def open(dbname, mode = "r"):
-    text = _open(os.path.join(dbname, "config.dat"), "rb").read()
+def open(dbname, mode="r"):
+    text = file(os.path.join(dbname, "config.dat"), "rb").read()
     line = text.split("\n")[0]
     if line == "index\tBerkeleyDB/1":
         import BerkeleyDB
@@ -18,7 +22,7 @@ def open(dbname, mode = "r"):
 def main():
     from Bio import Std
     import XPath
-    import FlatDB
+    ##import FlatDB
     XPath.xpath_index(
         #dbname = "sprot_flat",
         dbname = "sprot_small",
Index: Bio/Mindy/compression.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Mindy/compression.py,v
retrieving revision 1.1
diff -u -p -r1.1 compression.py
--- Bio/Mindy/compression.py	2002/01/28 20:55:30	1.1
+++ Bio/Mindy/compression.py	2003/08/12 19:17:44
@@ -1,4 +1,5 @@
-import commands, os
+import commands
+import os
 
 _uncompress_table = {
     ".bz": "bzip2",
@@ -8,21 +9,23 @@ _uncompress_table = {
     ".Z": "compress",
     }
 
-def open_file(filename, mode = "rb"):
+def open(filename, mode="rb"):
     ext = os.path.splitext(filename)[1]
     type = _uncompress_table.get(ext)
     if type is None:
-        return open(filename, mode)
+        return file(filename, mode)
     if type == "gzip":
         import gzip
-        gzip.open(filename, mode)
+        return gzip.open(filename, mode)
     if type == "bzip2":
-        cmd = "bzcat --decompress"
-        cmd += commands.mkarg(filename)
-        return os.popen(cmd, mode)
+        try:
+            import bz2
+        except ImportError:
+            cmd = "bzcat --decompress %s" % commands.mkarg(filename)
+            return os.popen(cmd, mode)
+        return bz2.BZ2File(filename, mode)
     if type == "compress":
-        cmd = "zcat -d"
-        cmd += commands.mkarg(filename)
+        cmd = "zcat -d %s" % commands.mkarg(filename)
         return os.popen(cmd, mode)
     raise AssertionError("What's a %r?" % type)
             
From e.bettler at cmbi.kun.nl  Wed Aug 13 17:32:24 2003
From: e.bettler at cmbi.kun.nl (Dr E Bettler)
Date: Sat Mar  5 14:43:26 2005
Subject: [Biopython-dev] reference in an article
Message-ID: <200308132332.24489.e.bettler@cmbi.kun.nl>

Hi,
we developped a project that is using Biopython libraries. In an article, how 
can we referenced Biopython ? just the url ?

thanks,

best regards,

-- 
Dr Emmanuel BETTLER
/-------------------------------/

CMBI
University of Nijmegen
P.O. Box 9010,
6500 GL Nijmegen, the Netherlands
http://www.cmbi.kun.nl/staff/EBettler.shtml

Tel. +31 (0)24 36 53338 (CMBI A-3031)
     +31 (0)24 36 53391 (CMBI's secretary)
Fax. +31 (0)24 36 52977
Mob. +31 (0)6 25 175 619 


From jefftc at stanford.edu  Wed Aug 13 19:12:03 2003
From: jefftc at stanford.edu (Jeffrey Chang)
Date: Sat Mar  5 14:43:26 2005
Subject: [Biopython-dev] reference in an article
In-Reply-To: <200308132332.24489.e.bettler@cmbi.kun.nl>
Message-ID: <8CB67A2E-CDE3-11D7-A091-000A956845CE@stanford.edu>

Yes.  Please cite the URL:
http://www.biopython.org

Jeff


On Wednesday, August 13, 2003, at 02:32  PM, Dr E Bettler wrote:

> Hi,
> we developped a project that is using Biopython libraries. In an 
> article, how
> can we referenced Biopython ? just the url ?
>
> thanks,
>
> best regards,
>
> -- 
> Dr Emmanuel BETTLER
> /-------------------------------/
>
> CMBI
> University of Nijmegen
> P.O. Box 9010,
> 6500 GL Nijmegen, the Netherlands
> http://www.cmbi.kun.nl/staff/EBettler.shtml
>
> Tel. +31 (0)24 36 53338 (CMBI A-3031)
>      +31 (0)24 36 53391 (CMBI's secretary)
> Fax. +31 (0)24 36 52977
> Mob. +31 (0)6 25 175 619
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev@biopython.org
> http://biopython.org/mailman/listinfo/biopython-dev


From bugzilla-daemon at portal.open-bio.org  Mon Aug 18 14:58:11 2003
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sat Mar  5 14:43:26 2005
Subject: [Biopython-dev] [Bug 1492] New: Martel Parser fails on
	Bio.db["protein-genbank-cgi"] entry
Message-ID: <200308181858.h7IIwBwt013428@localhost.localdomain>

http://bugzilla.bioperl.org/show_bug.cgi?id=1492

           Summary: Martel Parser fails on Bio.db["protein-genbank-cgi"]
                    entry
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Martel/Mindy
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: henschel@mpi-cbg.de


# Hi There!

#The following few lines broke on a rather simple parsing, however it looks
#reproducible (using biopython 1.21 on python 2.2.2)
#It works for most other entries I tested.

from Bio import GenBank
from Bio import db

genbank=db["protein-genbank-cgi"]
parser=GenBank.FeatureParser()
h=genbank["3891376"]
res=parser.parse(h)
# Causes Error: Martel.Parser.ParserPositionException: error parsing at or 
#beyond character 4462

>>> print genbank["3891376"].read() # still works though!
# Hope it makes sense to you
# Cheers, Andreas


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Aug 18 15:36:08 2003
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sat Mar  5 14:43:26 2005
Subject: [Biopython-dev] [Bug 1493] New: Failure to load a MySQL database
	using BioSQL 
Message-ID: <200308181936.h7IJa84E013527@localhost.localdomain>

http://bugzilla.bioperl.org/show_bug.cgi?id=1493

           Summary: Failure to load a MySQL database using BioSQL
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: idoerg@burnham.org


Hi,

Following a tutorial example on BioSQL, I ran into the followning dump. 
Seems like the taxon_id table does not have a columns "binomial" or "variant".
I used the table definitions for MySQLdb off the CVS on the MySQLDb site. 

Iddo

>>> gdb = server.new_database("px01")
>>> from Bio import GenBank
>>> parser = GenBank.FeatureParser()
>>> iterator =
GenBank.Iterator(open("/usr/home/iddo/results/anthrax_03/px01_gb"),parser)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IOError: [Errno 2] No such file or directory:
'/usr/home/iddo/results/anthrax_03/px01_gb'
>>> iterator =\
GenBank.Iterator(open("/usr/home/iddo/results/anthrax_03/px01_03.gb"),parser)
>>> gdb.load(iterator)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/iddo/biopy_cvs/biopython/BioSQL/BioSeqDatabase.py", line 337, in load
    db_loader.load_seqrecord(cur_record)
  File "/home/iddo/biopy_cvs/biopython/BioSQL/Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "/home/iddo/biopy_cvs/biopython/BioSQL/Loader.py", line 173, in
_load_bioentry_table
    taxon_id = self._get_taxon_id(record)
  File "/home/iddo/biopy_cvs/biopython/BioSQL/Loader.py", line 107, in _get_taxon_id
    taxa = self.adaptor.execute_and_fetchall(sql, (binomial, variant))
  File "/home/iddo/biopy_cvs/biopython/BioSQL/BioSeqDatabase.py", line 236, in
execute_and_fetchall
    self.cursor.execute(sql, args)
  File "/usr/lib/python2.2/site-packages/MySQLdb/cursors.py", line 95, in execute
    return self._execute(query, args)
  File "/usr/lib/python2.2/site-packages/MySQLdb/cursors.py", line 114, in _execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.2/site-packages/MySQLdb/connections.py", line 33, in
defaulterrorhandler
    raise errorclass, errorvalue
_mysql_exceptions.OperationalError: (1054, "Unknown column 'binomial' in 'where
clause'")
>>>


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From andreas.kuntzagk at mdc-berlin.de  Tue Aug 19 04:46:51 2003
From: andreas.kuntzagk at mdc-berlin.de (Andreas Kuntzagk)
Date: Sat Mar  5 14:43:26 2005
Subject: [Biopython-dev] biopython into PyPi ?
Message-ID: <1061282763.9530.13.camel@sulawesi>

Hi,

do you think it would good to add biopython to the Python Package Index?
( http://www.python.org/pypi )
This would maybe bring more developer/user to biopython ( if this is
wanted.)

Andreas


From mdehoon at ims.u-tokyo.ac.jp  Tue Aug 19 07:16:30 2003
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Sat Mar  5 14:43:26 2005
Subject: [Biopython-dev] biopython into PyPi ?
In-Reply-To: <1061282763.9530.13.camel@sulawesi>
References: <1061282763.9530.13.camel@sulawesi>
Message-ID: <3F42070E.8090408@ims.u-tokyo.ac.jp>

The PyPI page seems to describe smaller projects than Biopython, so 
Biopython may be out of place there. Other large Python projects are 
also not represented there. On the other hand, it wouldn't hurt.
--Michiel.

Andreas Kuntzagk wrote:
> Hi,
> 
> do you think it would good to add biopython to the Python Package Index?
> ( http://www.python.org/pypi )
> This would maybe bring more developer/user to biopython ( if this is
> wanted.)
> 
> Andreas
> 
> 
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev@biopython.org
> http://biopython.org/mailman/listinfo/biopython-dev
> 
> 

-- 
Michiel de Hoon
Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon


From bugzilla-daemon at portal.open-bio.org  Tue Aug 19 17:49:04 2003
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sat Mar  5 14:43:26 2005
Subject: [Biopython-dev] [Bug 1493] Failure to load a MySQL database using
	BioSQL
Message-ID: <200308192149.h7JLn4WP019183@localhost.localdomain>

http://bugzilla.bioperl.org/show_bug.cgi?id=1493

idoerg@burnham.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |major
            Summary|Failure to load a MySQL     |Failure to load a MySQL
                   |database using BioSQL       |database using BioSQL


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Tue Aug 19 18:33:05 2003
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sat Mar  5 14:43:26 2005
Subject: [Biopython-dev] [Bug 1493] Failure to load a MySQL database using
	BioSQL
Message-ID: <200308192233.h7JMX5Ge019394@localhost.localdomain>

http://bugzilla.bioperl.org/show_bug.cgi?id=1493


------- Additional Comments From chapmanb@arches.uga.edu  2003-08-19 18:33 -------
Iddo -- Yup, the Biopython BioSQL code is definitely out of phase with the
current schema. I just plain haven't had time to work on it and get things back
up to date. Anyone else is definitely welcome to step up.

The SQL schemas in the test directory (Tests/BioSQL) are the schemas that they
work with. These are reasonably recent (depending, of course, on your definition
of reasonable) and should do most things for Biopython only use. That's the best
alternative to updating the code that we can offer at the moment.


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From jchang at jeffchang.com  Wed Aug 20 02:18:11 2003
From: jchang at jeffchang.com (Jeffrey Chang)
Date: Sat Mar  5 14:43:26 2005
Subject: [Biopython-dev] biopython into PyPi ?
In-Reply-To: <1061282763.9530.13.camel@sulawesi>
Message-ID: <1344B701-D2D6-11D7-9EA3-000A956845CE@jeffchang.com>

Good idea.  I tried to create an account there, but got an SMTP error.  
Do you (or someone) already have an account, and can maintain the PyPI 
entry?  Otherwise, I will try again later.

Jeff


On Tuesday, August 19, 2003, at 01:46  AM, Andreas Kuntzagk wrote:

> Hi,
>
> do you think it would good to add biopython to the Python Package 
> Index?
> ( http://www.python.org/pypi )
> This would maybe bring more developer/user to biopython ( if this is
> wanted.)
>
> Andreas
>
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev@biopython.org
> http://biopython.org/mailman/listinfo/biopython-dev


From andreas.kuntzagk at mdc-berlin.de  Wed Aug 20 04:17:11 2003
From: andreas.kuntzagk at mdc-berlin.de (Andreas Kuntzagk)
Date: Sat Mar  5 14:43:26 2005
Subject: [Biopython-dev] biopython into PyPi ?
In-Reply-To: <1344B701-D2D6-11D7-9EA3-000A956845CE@jeffchang.com>
References: <1344B701-D2D6-11D7-9EA3-000A956845CE@jeffchang.com>
Message-ID: <1061367382.9525.15.camel@sulawesi>

Am Mit, 2003-08-20 um 08.18 schrieb Jeffrey Chang:
> Good idea.  I tried to create an account there, but got an SMTP error. 

Same with me.
 
> Do you (or someone) already have an account, and can maintain the PyPI 
> entry?  Otherwise, I will try again later.


From idoerg at burnham.org  Mon Aug 25 18:30:06 2003
From: idoerg at burnham.org (Iddo Friedberg)
Date: Sat Mar  5 14:43:26 2005
Subject: [Biopython-dev] Re: [BioPython] Enzyme module
In-Reply-To: <3F4A75EB.4040008@burnham.org>
References: <3F4A75EB.4040008@burnham.org>
Message-ID: <3F4A8DEE.5070805@burnham.org>

In response to myself: I am writing the consumer. Watch this space...

./I


Iddo Friedberg wrote:
> Hi,
> 
> Has anybody ever used the Enzyme module and written some sort of 
> consumer for it? I'd like to do some very basic parsing... and steal 
> some code :)
> 
> Thanks,
> 
> Iddo
> 
> 

-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo


From jchang at jeffchang.com  Mon Aug 25 20:12:42 2003
From: jchang at jeffchang.com (Jeffrey Chang)
Date: Sat Mar  5 14:43:26 2005
Subject: [Biopython-dev] biopython into PyPi ?
In-Reply-To: <1061367382.9525.15.camel@sulawesi>
Message-ID: <02B3A831-D75A-11D7-82B3-000A956845CE@jeffchang.com>

The SMTP problems seem to have gone away, so I've registered Biopython 
under PyPI.  It was extremely easy, once my account was set up:
python setup.py register

It takes the metadata out of the setup.py file!

Jeff


On Wednesday, August 20, 2003, at 01:16  AM, Andreas Kuntzagk wrote:

> Am Mit, 2003-08-20 um 08.18 schrieb Jeffrey Chang:
>> Good idea.  I tried to create an account there, but got an SMTP error.
>
> Same with me.
>
>> Do you (or someone) already have an account, and can maintain the PyPI
>> entry?  Otherwise, I will try again later.
>


From bugzilla-daemon at portal.open-bio.org  Fri Aug 29 05:32:43 2003
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sat Mar  5 14:43:26 2005
Subject: [Biopython-dev] [Bug 1492] Martel Parser fails on
	Bio.db["protein-genbank-cgi"] entry
Message-ID: <200308290932.h7T9Whud002272@localhost.localdomain>

http://bugzilla.bioperl.org/show_bug.cgi?id=1492

Peter.Bienstman@ugent.be changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Additional Comments From Peter.Bienstman@ugent.be  2003-08-29 05:32 -------
Fixed in current CVS (added keywords 'het' and 'heterogen'/) 


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.