From bugzilla-daemon at portal.open-bio.org  Mon Jun  2 04:19:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 04:19:50 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200806020819.m528JoXn006809@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #19 from ibdeno at gmail.com  2008-06-02 04:19 EST -------
Thank you, Peter.

In principle, I don't use that information. I will try then with the XML
parser.

Cheers,


Miguel

(In reply to comment #18)


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Jun  2 04:49:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 04:49:55 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200806020849.m528ntdY008609@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #20 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-02 04:49 EST -------
Marking this bug as fixed.

The original report was about parsing the plain text output, which is fixed -
see comment 12, and Bio/Blast/NCBIStandalone.py CVS revision 1.72.  I have not
added the 2.2.18 plain text file as a unit test since it's over 750 kB.

For the XML output from 2.2.18, as far as I can tell we are not ignoring any
important information from PSI-BLAST, as it is simply not included.  If the
NCBI updates the XML output from blastpgp then we should revisit the XML
parsing.

Thank you Miguel for your report and assistance.



From bugzilla-daemon at portal.open-bio.org  Mon Jun  2 06:37:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 06:37:51 -0400
Subject: [Biopython-dev] [Bug 2503] An error when parsing NCBIWWW Blast
	output
In-Reply-To: <bug-2503-42@http.bugzilla.open-bio.org/>
Message-ID: <200806021037.m52Abpj9019177@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2503


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-02 06:37 EST -------
Dear Prashanth,

Unless you can provide some more information, I'm going to have to close Bug
2503, as you haven't given us enough to go on.

Peter



From bugzilla-daemon at portal.open-bio.org  Mon Jun  2 08:57:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 08:57:20 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To: <bug-1944-42@http.bugzilla.open-bio.org/>
Message-ID: <200806021257.m52CvKt4026676@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944


------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-02 08:57 EST -------
I've added simple __str__ and __repr__ methods to the alignment class in
Bio/Align/Generic.py CVS revision 1.8, which give output like this:

str(a):
DNAAlphabet() alignment with 3 rows and 14 columns
ACGATCAGCTAGCT Alpha
CCGATCAGCTAGCT Beta
ACGATGAGCTAGCT Gamma

repr(a):
<__main__.Alignment instance (3 records of length 14, DNAAlphabet()) at 9e96c2c>

The string output is truncated to show at most 20 rows and 50 columns, which,
allowing for typical identifiers, should still display nicely on a default
terminal.
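
As a rough illustration of the truncation described above, here is a minimal
sketch in Python 3. ToyAlignment, add_sequence and the "..." markers are
assumptions made for this example; this is not the code in
Bio/Align/Generic.py.

```python
class ToyAlignment:
    """Hypothetical stand-in for an alignment; not the Bio.Align.Generic class."""

    MAX_ROWS = 20  # limits quoted in the comment above
    MAX_COLS = 50

    def __init__(self, alphabet_name):
        self.alphabet_name = alphabet_name
        self.rows = []  # list of (identifier, sequence string) pairs

    def add_sequence(self, identifier, sequence):
        self.rows.append((identifier, sequence))

    def __str__(self):
        width = len(self.rows[0][1]) if self.rows else 0
        lines = ["%s alignment with %i rows and %i columns"
                 % (self.alphabet_name, len(self.rows), width)]
        for identifier, sequence in self.rows[:self.MAX_ROWS]:
            if len(sequence) > self.MAX_COLS:
                # Assumed truncation marker; the real output may differ.
                sequence = sequence[:self.MAX_COLS - 3] + "..."
            lines.append("%s %s" % (sequence, identifier))
        if len(self.rows) > self.MAX_ROWS:
            lines.append("...")  # assumed marker for omitted rows
        return "\n".join(lines)


a = ToyAlignment("DNAAlphabet()")
a.add_sequence("Alpha", "ACGATCAGCTAGCT")
a.add_sequence("Beta", "CCGATCAGCTAGCT")
a.add_sequence("Gamma", "ACGATGAGCTAGCT")
print(a)
```

Putting the sequence before the identifier keeps the columns aligned even when
the identifiers differ in length.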

I now intend to update the tutorial, as being able to print an alignment should
make it much easier to explain and get to grips with.

Note that there is still some interesting code in both attachment 732 (the
__getitem__ method) and in attachment 770 (e.g. subclassing list and adding
__len__, __add__, __radd__ etc).



From bugzilla-daemon at portal.open-bio.org  Mon Jun  2 09:26:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 09:26:28 -0400
Subject: [Biopython-dev] [Bug 2507] New: Adding __getitem__ to SeqRecord for
	element access and slicing
Message-ID: <bug-2507-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507

           Summary: Adding __getitem__ to SeqRecord for element access and
                    slicing
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
OtherBugsDependingOnThis: 1944


With a Seq object, you can access individual letters and create sub-sequences
using slicing.  You can even use a stride to reverse the sequence, or select
every third letter.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_seq = Seq('GATCGATGGGCCTATATAGGATCGAAAATCGC', IUPAC.unambiguous_dna)
>>> print my_seq
GATCGATGGGCCTATATAGGATCGAAAATCGC
>>> my_seq
Seq('GATCGATGGGCCTATATAGGATCGAAAATCGC', IUPACUnambiguousDNA())
>>> my_seq[5:10]
Seq('ATGGG', IUPACUnambiguousDNA())
>>> my_seq[::-1]
Seq('CGCTAAAAGCTAGGATATATCCGGGTAGCTAG', IUPACUnambiguousDNA())
>>> my_seq[5]
'A'

Currently, these operations cannot be done with a SeqRecord object.  This
enhancement bug is to allow element access and slicing (perhaps even with a
stride) on SeqRecord objects, where the annotations are taken into
consideration, and preserved as far as reasonably possible.

Looking at the different SeqRecord properties, this is what I think should
happen for creating a sub-sequence:

.id, .name, .description (three strings) - preserve?

Blindly preserving these may not always be meaningful.  For example, if the
description was "Complete plasmid" then it doesn't really apply to a
sub-sequence.  Perhaps we should preserve only the id and name, and set the
description to "sub-sequence"?

.annotations (dictionary) - either preserve or lose?

Some annotation entries will still be valid for a sub-sequence (e.g. "source"
or references).  Others will not (e.g. anything describing its coordinates
within a larger parent sequence).  There is no reliable way to decide on a case
by case basis.

.dbxrefs (list of strings) - preserve?

Any database cross-references would arguably still apply to a sub-sequence or
even a reversed sequence.

.features (list of SeqFeatures) - select only those features still in the new
sub-sequence, and adjust their locations for the new coordinates.  Supporting
strides other than +1 would be complicated!  For simplicity, I would say any
feature only partially within the sub-sequence should be discarded.

In summary, one clearly defined set of actions on creating a sub-sequence could
be to preserve all the annotation data except the SeqFeatures which would be
handled sensibly.
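
The feature handling suggested above could be sketched roughly as follows
(Python 3). ToyFeature and features_for_slice are hypothetical names, and the
0-based, end-exclusive coordinates are an assumption; only a stride of +1 is
covered.

```python
class ToyFeature:
    """Hypothetical feature with 0-based, end-exclusive coordinates."""

    def __init__(self, name, start, end):
        self.name = name
        self.start = start
        self.end = end


def features_for_slice(features, start, stop):
    """Return shifted copies of the features lying wholly inside [start, stop).

    Features only partly inside the slice are discarded, as suggested in
    the text. Strides other than +1 are not handled.
    """
    kept = []
    for feature in features:
        if start <= feature.start and feature.end <= stop:
            kept.append(ToyFeature(feature.name,
                                   feature.start - start,
                                   feature.end - start))
    return kept


features = [ToyFeature("gene_a", 2, 8),
            ToyFeature("gene_b", 6, 15),
            ToyFeature("gene_c", 10, 12)]
sub_features = features_for_slice(features, 5, 20)
for f in sub_features:
    print(f.name, f.start, f.end)
```

In a real __getitem__ the slice bounds would first need normalising (negative
or missing values), for example with slice.indices(len(sequence)).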

[If we later support "per-letter-annotation" in either a Seq or SeqRecord
subclass, then this too should be sliced]

Adding a __getitem__ method to the SeqRecord as outlined above should be
compatible with the suggestion that the SeqRecord subclasses the Seq object
(see bug 2351).

A related point, when accessing single letters, e.g. record[0], should a single
letter string be returned (which lacks any annotation) as currently happens
with the Seq object?

P.S. I'm marking this new enhancement bug as blocking bug 1944.  Once SeqRecord
objects support slicing, this would make annotation-preserving slicing of
alignment objects much more straightforward.



From bugzilla-daemon at portal.open-bio.org  Mon Jun  2 09:26:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 09:26:33 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To: <bug-1944-42@http.bugzilla.open-bio.org/>
Message-ID: <200806021326.m52DQXk2029561@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |2507



From bugzilla-daemon at portal.open-bio.org  Mon Jun  2 10:00:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 10:00:15 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
	element access and slicing
In-Reply-To: <bug-2507-42@http.bugzilla.open-bio.org/>
Message-ID: <200806021400.m52E0FJK032027@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-02 10:00 EST -------
Simple implementation which ignores the features (non-trivial), to be added to
the SeqRecord class in Bio/SeqRecord.py

    def __getitem__(self, index) :
        if isinstance(index, int) :
            #TODO - Should single letters be returned as just
            #strings?  This prevents the inclusion of any annotation.
            #Revisit this once the Seq object is a subclass of string.
            return self.seq[index]
        elif isinstance(index, slice) :
            answer = self.__class__(self.seq[index],
                                    id=self.id,
                                    name=self.name,
                                    description=self.description)
            #COPY the annotation dict and dbxrefs list:
            answer.annotations = dict(self.annotations)
            answer.dbxrefs = self.dbxrefs[:]
            #TODO - select relevant features, and add them with
            #adjusted coordinates.  Take special care with a stride!
            return answer
        raise ValueError("Invalid index")



From bugzilla-daemon at portal.open-bio.org  Mon Jun  2 10:12:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 10:12:29 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
	element access and slicing
In-Reply-To: <bug-2507-42@http.bugzilla.open-bio.org/>
Message-ID: <200806021412.m52ECT86000330@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507


------- Comment #2 from jblanca at btc.upv.es  2008-06-02 10:12 EST -------
Does this mean that SeqRecord would deprecate the .seq attribute? If the .seq
attribute is not removed, slicing could be done in two ways: my_seq[1:100] and
my_seq.seq[1:100].



From biopython at maubp.freeserve.co.uk  Mon Jun  2 10:14:40 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 2 Jun 2008 15:14:40 +0100
Subject: [Biopython-dev] sequence class proposal
In-Reply-To: <1211779470.483a498e18e3e@webmail.upv.es>
References: <320fb6e00805251437n34362f0bm2a323cd1194afaa@mail.gmail.com>
	<1211779470.483a498e18e3e@webmail.upv.es>
Message-ID: <320fb6e00806020714s2c789f61ke676a448e2ec871a@mail.gmail.com>

In reply to Jose, I (Peter) wrote:
>> One of your points seemed to be that the SeqRecord couldn't have a
>> __getitem__ and methods like reverse, complement, etc.  I don't see
>> why it couldn't have these.  Perhaps rather than introducing a whole
>> new class, enhancing the SeqRecord would be a better avenue.

I've filed Bug 2507 to try and show what I had in mind for the
__getitem__ method.
http://bugzilla.open-bio.org/show_bug.cgi?id=2507

Adding further methods for (reverse) complement etc could be done in
much the same way.

Returning to extending Biopython to support per-letter-annotation, I
can see two options:

Right now, the SeqRecord object HAS a Seq object.  If we create a new
RichSeq which subclasses the Seq object to provide
per-letter-annotation, then you could use a SeqRecord where the .seq
property is in fact a RichSeq object.  The SeqRecord class doesn't
need to have any changes made for this to work (assuming the RichSeq
provides the same API as the Seq object).
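
A minimal sketch of this first option (Python 3), assuming toy classes:
ToySeq, ToyRichSeq, ToySeqRecord and the "qualities" annotation are all
hypothetical names for illustration, not existing Biopython classes.

```python
class ToySeq:
    """Hypothetical minimal sequence class (stand-in for Seq)."""

    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return self.__class__(self.data[index])
        return self.data[index]

    def __str__(self):
        return self.data


class ToyRichSeq(ToySeq):
    """Hypothetical subclass adding one per-letter annotation."""

    def __init__(self, data, qualities=None):
        ToySeq.__init__(self, data)
        if qualities is None:
            qualities = [None] * len(data)
        if len(qualities) != len(data):
            raise ValueError("Need one quality value per letter")
        self.qualities = list(qualities)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slice the letters and their annotation together.
            return self.__class__(self.data[index], self.qualities[index])
        return self.data[index]


class ToySeqRecord:
    """Hypothetical record that HAS a sequence object in its .seq attribute."""

    def __init__(self, seq, id="<unknown id>"):
        self.seq = seq
        self.id = id


record = ToySeqRecord(ToyRichSeq("ACGT", [30, 31, 12, 40]), id="read1")
fragment = record.seq[1:3]
print(str(fragment), fragment.qualities)
```

The record class needs no changes here: it only stores whatever object it is
given as .seq, which is the point made above.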

If we make the SeqRecord a subclass of the Seq object, then I would
suggest either RichSeq subclassing SeqRecord subclassing Seq, or
perhaps SeqRecord subclassing RichSeq subclassing Seq.  It depends on
if you think the id/name/description/dbxrefs/etc properties would be
useful in common use cases of the RichSeq object.

It's not going to be possible for all three classes to have the same
__init__ parameters without breaking existing scripts (and only
supporting the lowest common denominator).

Peter

From jblanca at btc.upv.es  Mon Jun  2 15:11:19 2008
From: jblanca at btc.upv.es (Blanca Postigo Jose Miguel)
Date: Mon,  2 Jun 2008 21:11:19 +0200
Subject: [Biopython-dev] Fwd: Re:  sequence class proposal
Message-ID: <1212433879.484445d7a6117@webmail.upv.es>


----- Forwarded message from Blanca Postigo Jose Miguel <jblanca at btc.upv.es> -----
    Date: Mon,  2 Jun 2008 21:08:59 +0200
    From: Blanca Postigo Jose Miguel <jblanca at btc.upv.es>
Reply-To: Blanca Postigo Jose Miguel <jblanca at btc.upv.es>
 Subject: Re: [Biopython-dev] sequence class proposal
      To: Peter <biopython at maubp.freeserve.co.uk>

Quoting Peter <biopython at maubp.freeserve.co.uk>:

> In reply to Jose, I (Peter) wrote:
> >> One of your points seemed to be that the SeqRecord couldn't have a
> >> __getitem__ and methods like reverse, complement, etc.  I don't see
> >> why it couldn't have these.  Perhaps rather than introducing a whole
> >> new class, enhancing the SeqRecord would be a better avenue.
>
> I've filed Bug 2507 to try and show what I had in mind for the
> __getitem__ method.
> http://bugzilla.open-bio.org/show_bug.cgi?id=2507
I think that would be great. I've just added to the bug a question about the
.seq property of SeqRecord.

> Adding further methods for (reverse) complement etc could be done in
> much the same way.
>
> Returning to extending Biopython to support per-letter-annotation, I
> can see two options:
>
> Right now, the SeqRecord object HAS a Seq object.  If we create a new
> RichSeq which subclasses the Seq object to provide
> per-letter-annotation, then you could use a SeqRecord where the .seq
> property is in fact a RichSeq object.  The SeqRecord class doesn't
> need to have any changes made for this to work (assuming the RichSeq
> provides the same API as the Seq object).
Here I had a slightly different idea, but maybe yours is better. Basically, my
RichSeq proposal is just a SeqRecord with slicing and without the seq property.
The problem with the approach you describe is that the RichSeq would hold the
per-letter-annotation, so the SeqRecord would carry the general annotation while
the RichSeq (in the .seq) would carry other features. I would find that confusing.

>
> If we make the SeqRecord a subclass of the Seq object, then I would
> suggest either RichSeq subclassing SeqRecord subclassing Seq, or
> perhaps SeqRecord subclassing RichSeq subclassing Seq.  It depends on
> if you think the id/name/description/dbxrefs/etc properties would be
> useful in common use cases of the RichSeq object.
If SeqRecord is a subclass of Seq, RichSeq is not necessary anymore. That's what
I was proposing. The problem is that current users of SeqRecord would have a
hard time with the new behaviour, because in that case supporting the seq
property would be hard. To avoid that breakage I was proposing to create
RichSeq. RichSeq would be just the SeqRecord that you propose, but it would allow
users to migrate to RichSeq without forcing them to change to a new
SeqRecord behaviour.

>
> It's not going to be possible for all three classes to have the same
> __init__ parameters without breaking existing scripts (and only
> supporting the lowest common denominator).
That's another reason to rename your new proposed SeqRecord to RichSeq.

>
> Peter
>

Jose Blanca

----- End of forwarded message -----

From biopython at maubp.freeserve.co.uk  Mon Jun  2 15:51:30 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 2 Jun 2008 20:51:30 +0100
Subject: [Biopython-dev] Fwd: Re: sequence class proposal
In-Reply-To: <1212433879.484445d7a6117@webmail.upv.es>
References: <1212433879.484445d7a6117@webmail.upv.es>
Message-ID: <320fb6e00806021251q6cc1a7e8p36125c1326ab7a14@mail.gmail.com>

Jose wrote:
> > I've filed Bug 2507 to try and show what I had in mind for the
> > __getitem__ method.
> > http://bugzilla.open-bio.org/show_bug.cgi?id=2507
>
> I think that would be great.

Good :)

Does anyone else want to comment?

>  I've just added to the bug a question about the .seq property of SeqRecord.

http://bugzilla.open-bio.org/show_bug.cgi?id=2507#c2 reads:
> Does this mean that SeqRecord would deprecate the .seq attribute?
> If the .seq attribute is not removed, slicing could be done in two ways:
> my_seq[1:100] and my_seq.seq[1:100].

I was not intending to deprecate the SeqRecord's .seq property at this
time (I think that should happen in preparation for if/when the
SeqRecord becomes a subclass of the Seq object).

With my idea described on bug 2507, given a SeqRecord object my_seq_record:

my_seq_record[1:100] -> another SeqRecord (with annotation)
my_seq_record.seq[1:100] -> just a Seq object (no annotation)
my_seq_record.seq.tostring()[1:100] -> just a string (no annotation or alphabet)
str(my_seq_record.seq)[1:100] -> just a string (no annotation or alphabet)

These trivial examples would all "contain" the same sequence string.
This enhancement could be done right now, and shouldn't impede any
future per-letter-annotation enhancements.
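
To make the different return types concrete, here is a small self-contained
sketch (Python 3). ToySeq and ToySeqRecord are hypothetical stand-ins written
for this example; they only mimic the slicing behaviour described above.

```python
class ToySeq:
    """Hypothetical stand-in for Seq: letters only, no annotation."""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, index):
        if isinstance(index, slice):
            return ToySeq(self.data[index])
        return self.data[index]

    def __str__(self):
        return self.data


class ToySeqRecord:
    """Hypothetical stand-in for SeqRecord with the proposed __getitem__."""

    def __init__(self, seq, id="", annotations=None):
        self.seq = seq
        self.id = id
        self.annotations = dict(annotations or {})

    def __getitem__(self, index):
        if isinstance(index, slice):
            # A slice gives a new record that keeps id and annotations.
            return ToySeqRecord(self.seq[index], self.id, self.annotations)
        # A single index gives just the letter, as with the sequence itself.
        return self.seq[index]


my_seq_record = ToySeqRecord(ToySeq("GATCGATGGGCC"), id="demo",
                             annotations={"source": "example"})
for value in (my_seq_record[1:5],
              my_seq_record.seq[1:5],
              str(my_seq_record.seq)[1:5]):
    print(type(value).__name__)
```

Each of the three results holds the same letters; they differ only in how much
surrounding information travels with them.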

Perhaps per-letter-annotation enhancements could be added to the
SeqRecord class directly... I need to fully digest the discussion on
the BioSQL list, see:
http://lists.open-bio.org/pipermail/biosql-l/2008-May/thread.html

Peter

From mjldehoon at yahoo.com  Mon Jun  2 20:19:59 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 2 Jun 2008 17:19:59 -0700 (PDT)
Subject: [Biopython-dev] Bio.Entrez & Bio.EUtil
In-Reply-To: <320fb6e00805300717v60f0b153i88b5e9a8aee1744c@mail.gmail.com>
Message-ID: <624249.42121.qm@web62408.mail.re1.yahoo.com>

OK I'll double-check. I may not have noticed some missing DTDs if they were downloaded automatically from the internet. I think Biopython should ship the most common DTDs. At least the ones needed for test_Entrez, which probably covers most of the use cases of Bio.Entrez.

--Michiel.

Peter <biopython at maubp.freeserve.co.uk> wrote: On 24 May 2008, Michiel de Hoon wrote:
> Dear all,
>
> I have essentially completed the parser in Bio.Entrez.

The internals of the new design look more complicated to start with,
but I can see how much more general it is than the older versions :)

Should it work starting from an empty DTDs folder - or will we ship
Biopython with most of the current files?  I've had trouble with
Biopython trying to fetch missing DTD files from the internet.  I
think the problem is the NCBI using relative URLs.  The following
quick hack seems to help in Parser.py but only in some cases (because
as listed below, the NCBI have two different base paths):

279,280c279,288
<             warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename)
<             handle = urllib.urlopen(systemId)
---
>             warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % path)
>             if "/" in systemId :
>                 #Assume this is a full path, e.g.
>                 #http://www.ncbi.nlm.nih.gov/entrez/query/DTD/nlmmedline_080101.dtd
>                 handle = urllib.urlopen(systemId)
>             else :
>                 #Its a relative path, and I'm not sure how to best get the base path:
>                 handle = urllib.urlopen("http://www.ncbi.nlm.nih.gov/entrez/query/DTD/"+systemId)

(Also note there seem to be some tab/space issues in this file).
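
One possible way to express the relative/absolute handling from the hack above
is the standard library's urljoin. This is only a sketch: it uses the Python 3
module name (urllib.parse; in Python 2 the same function lives in urlparse),
resolve_dtd_url is a hypothetical helper, and choosing the right base path for
a given DTD remains an open question.

```python
from urllib.parse import urljoin

# One of the two base paths mentioned in this message; which base applies
# to a given DTD is an assumption the caller has to supply.
ENTREZ_DTD_BASE = "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/"


def resolve_dtd_url(system_id, base=ENTREZ_DTD_BASE):
    """Return an absolute URL for a DTD system identifier.

    urljoin leaves absolute URLs untouched and resolves relative ones
    against the base, so no explicit test for "/" is needed.
    """
    return urljoin(base, system_id)


print(resolve_dtd_url("eSearch_020511.dtd"))
print(resolve_dtd_url("http://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.dtd"))
```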

From http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ I've downloaded the
following files using wget:

egquery.dtd
eSearch_020511.dtd
nlmcommon_080101.dtd
pubmed_080101.dtd
eInfo_020511.dtd
eSpell.dtd
nlmmedline_080101.dtd
taxon.dtd
eLink_020511.dtd
eSummary_041029.dtd
nlmmedlinecitation_080101.dtd
uilist.dtd
ePost_020511.dtd
nlmsharedcatcit_080101.dtd

Additionally http://www.ncbi.nlm.nih.gov/dtd/ provided some further
DTD files needed for the test_Entrez.py unit test:

NCBI_GBSeq.dtd
NCBI_GBSeq.mod.dtd
NCBI_Entity.mod.dtd
NCBI_Mim.dtd
NCBI_Mim.mod.dtd

With all the above files, then the unit test file test_Entrez.py
doesn't give any missing DTD warnings - but still has a couple of
failures.

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Jun  3 00:39:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jun 2008 00:39:24 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806030439.m534dOYI021682@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp  2008-06-03 00:39 EST -------
I agree that type checking is a problem.
I am not sure if a specialized function in Bio.File is a good idea. The
question is not if "this object is a file-like object", but "does this object
have the attributes/methods needed". So I would prefer to add checks only for
the required attributes/methods in each of the iterators.



From mjldehoon at yahoo.com  Tue Jun  3 00:33:27 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 2 Jun 2008 21:33:27 -0700 (PDT)
Subject: [Biopython-dev] Bio.Entrez & Bio.EUtil
In-Reply-To: <624249.42121.qm@web62408.mail.re1.yahoo.com>
Message-ID: <112249.61498.qm@web62410.mail.re1.yahoo.com>

I checked but I did not see any missing DTDs. Most of the DTDs in the list you sent are in Biopython's CVS under Bio/Entrez/DTDs, and are included correctly if I do a fresh checkout from CVS. Could you try with a fresh checkout?

--Michiel.




From bugzilla-daemon at portal.open-bio.org  Tue Jun  3 05:16:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jun 2008 05:16:48 -0400
Subject: [Biopython-dev] [Bug 2446] Comments in CT tags cause
	Bio.Sequencing.Ace.ACEParser to fail.
In-Reply-To: <bug-2446-42@http.bugzilla.open-bio.org/>
Message-ID: <200806030916.m539GmwZ001955@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2446


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-03 05:16 EST -------
As pointed out on the mailing list, the test cases attached to this bug have
disappeared (some expiry issue?).  In the meantime, we could probably just
edit the sole existing test case in Tests/Ace/contig1.ace to add a comment to
an existing CT tag.

Looking at this file, for example edit:

CT{
Contig1 repeat phrap 52 53 555456:555432
This is the forst line of comment for c1
and this the second for c1
}

to become:

CT{
Contig1 repeat phrap 52 53 555456:555432
COMMENT{
This is the first line of comment for c1
and this the second for c1}
}

In the short term, we could either ignore the COMMENT tags within a CT tag, or
just treat them as plain text.  Supporting the nested structure would require
changes to the current Record structure.
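
The "treat them as plain text" option could look roughly like the sketch below
(Python 3). parse_ct_block is a hypothetical helper that handles only the
layout of the example above; it is not the Bio.Sequencing.Ace parser and does
not cover other ways a COMMENT tag might be closed.

```python
def parse_ct_block(text):
    """Parse one CT{...} block, flattening a nested COMMENT{...} to text.

    Returns (header_fields, comment_lines). Only the layout shown in the
    example above is handled.
    """
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) < 3 or lines[0] != "CT{" or lines[-1] != "}":
        raise ValueError("Not a CT block")
    header_fields = lines[1].split()
    comment_lines = []
    for line in lines[2:-1]:
        if line == "COMMENT{":
            continue  # opening of the nested tag carries no text
        if line.endswith("}"):
            line = line[:-1].rstrip()  # closing brace of the nested tag
        if line:
            comment_lines.append(line)
    return header_fields, comment_lines


example = """CT{
Contig1 repeat phrap 52 53 555456:555432
COMMENT{
This is the first line of comment for c1
and this the second for c1}
}"""
fields, comments = parse_ct_block(example)
print(fields)
print(comments)
```

With this approach the existing Record structure would stay flat: a CT tag
with or without the COMMENT wrapper yields the same list of comment lines.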



From bugzilla-daemon at portal.open-bio.org  Tue Jun  3 07:46:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jun 2008 07:46:58 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806031146.m53BkwAB009224@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #5 from cracka80 at gmail.com  2008-06-03 07:46 EST -------
(In reply to comment #4)
> I agree that type checking is a problem.
> I am not sure if a specialized function in Bio.File is a good idea. The
> question is not if "this object is a file-like object", but "does this object
> have the attributes/methods needed". So I would prefer to add checks only for
> the required attributes/methods in each of the iterators.
> 

The function I have written does exactly this - it checks for the necessary
attributes and methods for a given object. The iterators would then only need
to call ``File.is_filelike()`` on each object passed into them, rather than a
type checking procedure. This is in accordance with the design pattern "Program
to an 'interface', not an 'implementation'." (Gang of Four). Would you like me
to provide a diff against the current revision of Biopython, with suggested
changes?
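
For readers without the attachment, the idea of checking for an interface
rather than a type can be sketched like this (Python 3). has_methods and
LineSource are hypothetical names for this example; this is not the
File.is_filelike() function referred to above.

```python
import io


def has_methods(obj, names):
    """Return True if obj has a callable attribute for every name given."""
    return all(callable(getattr(obj, name, None)) for name in names)


class LineSource:
    """Minimal object that is not a file but offers readline()."""

    def __init__(self, lines):
        self._lines = list(lines)

    def readline(self):
        return self._lines.pop(0) if self._lines else ""


print(has_methods(io.StringIO("x\n"), ["readline"]))
print(has_methods(LineSource(["x\n"]), ["readline"]))
print(has_methods("a filename", ["readline"]))
```

Passing the list of required names lets each iterator ask only for the methods
it actually uses.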



From bugzilla-daemon at portal.open-bio.org  Tue Jun  3 11:07:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jun 2008 11:07:35 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806031507.m53F7Zm7019694@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp  2008-06-03 11:07 EST -------
Two things:
1) Some of the code that does type checking for file-like-ness seems to be
quite old and possibly outdated (e.g. Gobase.Iterator). We should take this
opportunity to go through these modules and check if they are still useful.
2) Many of these modules (especially the ones that use an "Iterator" class)
would be written differently in modern Python (in particular by making use of a
generator function instead of an Iterator class).

So I'd like to suggest the following:
-) For the modules whose usability is dubious in 2008, let's check on the
mailing list if anybody is still using them. If not, we can simply deprecate
them.
-) For the modules that are still useful, use try/except clauses to check for
the necessary attributes. The current function checks for 'read', 'readline',
'readlines', and '__iter__', whereas the parser probably only needs one of
them. 
-) If possible, I'd prefer to convert to modern Python as much as possible
(though formally that is not within the scope of this bug report).
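
A sketch of what the last two suggestions might look like together (Python 3):
a generator function that needs only readline() and uses try/except to check
for it. iterate_records and the FASTA-like input format are assumptions made
for this example, not existing Biopython code.

```python
import io


def iterate_records(handle):
    """Yield FASTA-like records as (title, sequence) tuples.

    Only handle.readline() is used, so any object providing that method
    works. Because this is a generator, the check below runs when the
    first record is requested, not when the function is called.
    """
    try:
        readline = handle.readline
    except AttributeError:
        raise TypeError("handle must provide a readline() method")
    title = None
    letters = []
    while True:
        line = readline()
        if not line:
            break  # readline() returns an empty string at end of input
        line = line.strip()
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(letters)
            title = line[1:]
            letters = []
        elif line and title is not None:
            letters.append(line)
    if title is not None:
        yield title, "".join(letters)


handle = io.StringIO(">alpha\nACGT\nAC\n>beta\nGGT\n")
for title, sequence in iterate_records(handle):
    print(title, sequence)
```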



From bugzilla-daemon at portal.open-bio.org  Wed Jun  4 15:50:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Jun 2008 15:50:14 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
	element access and slicing
In-Reply-To: <bug-2507-42@http.bugzilla.open-bio.org/>
Message-ID: <200806041950.m54JoEPj029720@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507


------- Comment #3 from jblanca at btc.upv.es  2008-06-04 15:50 EST -------
Created an attachment (id=927)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=927&action=view)
RichSeq proposal

I have coded a sequence class that fulfils the requirements that I would like
to see. It's very similar to SeqRecord, but it is not compatible with it. It
has no seq property, although that can be solved. The problem with SeqRecord is
that it is not possible to create a class with an __init__ compatible with Seq
and SeqRecord at the same time.
This proposed class is just a draft; it needs more work, but I would like to
receive comments about it.
It inherits from MutableSeq so it should be named MutableRichSeq, but it seems
that I'm too lazy to type such a long name. I promise to change the name in a
later version and to create a RichSeq with Seq as parent.
Besides RichSeq, there are two other classes in the attachment, RichFeature and
BioRange, but I will comment on those in another post.
I think that it is quite important to convert Seq and MutableSeq to new-style
classes. What do you think about that? With new-style classes we can use
properties.



From bugzilla-daemon at portal.open-bio.org  Wed Jun  4 16:19:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Jun 2008 16:19:41 -0400
Subject: [Biopython-dev] [Bug 2508] New: NCBIStandalone.blastall: provide
	support for '-F F' and make it safe
Message-ID: <bug-2508-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508

           Summary: NCBIStandalone.blastall:  provide support for '-F F' and
                    make it safe
           Product: Biopython
           Version: 1.44
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: mmokrejs at ribosome.natur.cuni.cz


The local NCBI BLAST masks low-complexity regions by default, using the SEG
algorithm. I do not see an argument to control this in
NCBIStandalone.blastall().

Luckily, NCBIStandalone.blastall() is an unsafe function and does not check
whether I pass multiple arguments in a value expected to be a string or number.
Thus, I can do:

_blast_out, _error_info = NCBIStandalone.blastall('/usr/bin/blastall',
'blastn', blast_db, _blast_file, matrix='IDENTITY -F 0')

but imagine I would have done:

_blast_out, _error_info = NCBIStandalone.blastall('/usr/bin/blastall',
'blastn', blast_db, _blast_file, matrix='IDENTITY -F 0; rm -rf /etc/passwd')

The function should be protected against such attacks, as if it were directly
exposed to web users as a CGI script. I propose a similar defensive strategy
for all functions calling os.system(), os.exec*(), os.popen*(), etc.
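
One possible defensive strategy, sketched under assumptions: build the
command as a list of arguments and run it without a shell, so that each value
reaches the program as exactly one argument. The helper names
build_blastall_argv and run_blastall, and the keyword low_complexity_filter,
are hypothetical and are not part of NCBIStandalone; -p, -d, -i, -M and -F
are the usual blastall options, but check them against your installed
version.

```python
import subprocess


def build_blastall_argv(blastcmd, program, database, infile,
                        matrix=None, low_complexity_filter=None):
    """Return the blastall command as a list, one item per argument."""
    argv = [blastcmd, "-p", program, "-d", database, "-i", infile]
    if matrix is not None:
        argv += ["-M", str(matrix)]
    if low_complexity_filter is not None:
        # blastall expects T or F for the -F option.
        argv += ["-F", "T" if low_complexity_filter else "F"]
    return argv


def run_blastall(argv):
    """Start blastall without a shell and return the process object."""
    # With a list and no shell=True, ";" and similar characters are not
    # interpreted; they are passed through as ordinary argument text.
    return subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)


argv = build_blastall_argv("/usr/bin/blastall", "blastn", "my_db",
                           "query.fasta",
                           matrix="IDENTITY; rm -rf /tmp/x",
                           low_complexity_filter=False)
```

Here the hostile matrix value stays a single (invalid) argument to -M, so
blastall would reject it rather than the shell executing the second command.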



From bugzilla-daemon at portal.open-bio.org  Thu Jun  5 04:52:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 04:52:47 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
	element access and slicing
In-Reply-To: <bug-2507-42@http.bugzilla.open-bio.org/>
Message-ID: <200806050852.m558qlPF031059@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-05 04:52 EST -------
I replied to comment 2 on the mailing list.  I had intended this particular
bugzilla entry (bug 2507) to be very narrow in scope - purely a small backwards
compatible change to the current SeqRecord.

Some of the questions in comment 3 might have fit better on Bug 2351 although
it's getting rather long.  Rather than taking this issue further off topic, I'll
reply on the mailing list again.



From biopython at maubp.freeserve.co.uk  Thu Jun  5 05:17:00 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 5 Jun 2008 10:17:00 +0100
Subject: [Biopython-dev] Fwd: Re: sequence class proposal
In-Reply-To: <320fb6e00806021251q6cc1a7e8p36125c1326ab7a14@mail.gmail.com>
References: <1212433879.484445d7a6117@webmail.upv.es>
	<320fb6e00806021251q6cc1a7e8p36125c1326ab7a14@mail.gmail.com>
Message-ID: <320fb6e00806050217y1c437b01qa7fd21d75a609e8c@mail.gmail.com>

This is in reply to Jose's comment 3 on bug 2507, which was quite broad.
http://bugzilla.open-bio.org/show_bug.cgi?id=2507#c3

> I have coded a sequence class that fullfils the requirements that I
> would like to see. It's very similar to SeqRecord, but it is not compatible
> with it. It has no seq property, although that can be solved. The problem
> with SeqRecord is that it is not possible to create a class with an __init__
> compatible with Seq and SeqRecord at the same time.

Even if one day the SeqRecord is a subclass of the Seq object, there
is no requirement that it have the same __init__ arguments.  In fact,
they have to be different because for a SeqRecord you should also supply
an identifier (and potentially a name, description and other annotation).

> This proposed class is just a draft, it needs more work but I would like to
> receive comments about it.  It inherits from MutableSeq so it should be
> named MutableRichSeq, but it seems that I'm too lazy to such a long name,
> I promise to change the name in a later version and to create a RichSeq
> with Seq as parent.

I agree with you here that when getting a single letter (amino acid or
nucleotide) from a sequence with per-letter-annotation, e.g.
my_sequence[5], it would be very nice to have the
per-letter-annotation like the quality included.  This does mean the
object returned can't just be a single one character string.  However,
because the current Seq and MutableSeq classes return a simple string,
unless we return a subclass of a string, this risks breaking other
people's code.  So, I would conclude that Seq needs to subclass a
string BEFORE we start including support for per-letter-annotation.
Ideally we would have alphabet aware versions of all the string
functions before we made this change (see Bug 2351).

> Besides RichSeq there is in the attachment two other classes, RichFeature
> and BioRange, but I would comment on that in another post.

Your BioRange and RichFeature classes seem somewhat similar to the
current SeqFeature class with its locations (and sub features).

> I think that it is quite important to convert Seq and MutableSeq to newclasses,
> what do you think about that? With the new classes we can use properties.

I have been thinking about deprecating the Seq.data property (and
likewise MutableSeq.data).  The data string (or array) should really be
a private implementation detail, perhaps Seq._data following the
leading-underscore convention for private attributes.  We can then add
property methods to make the Seq.data available (perhaps with a
deprecation warning).
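
A minimal sketch of that idea, assuming a toy Seq class rather than the real
Bio.Seq.Seq; the warning text is made up for illustration. The data are kept
in a private attribute and exposed through a read-only property that warns.

```python
import warnings


class Seq(object):
    """Toy stand-in for Bio.Seq.Seq, for illustration only."""

    def __init__(self, data):
        self._data = data  # private by convention (leading underscore)

    def __str__(self):
        return self._data

    @property
    def data(self):
        """Sequence as a string (deprecated, read only)."""
        warnings.warn("Seq.data is deprecated; use str(my_seq) instead",
                      DeprecationWarning, stacklevel=2)
        return self._data


# Capture the warning so the example shows it was actually issued.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = Seq("ACGT").data
```

Since no setter is defined, assigning to .data raises AttributeError. The
decorator syntax needs Python 2.4 or later; under Python 2.3 the same thing
can be written as data = property(get_data) after defining get_data.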

Peter

From bugzilla-daemon at portal.open-bio.org  Thu Jun  5 05:36:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 05:36:18 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
	element access and slicing
In-Reply-To: <bug-2507-42@http.bugzilla.open-bio.org/>
Message-ID: <200806050936.m559aINS001028@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-05 05:36 EST -------
Created an attachment (id=928)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=928&action=view)
Patch to Bio/SeqRecord.py adding __getitem__ and __len__ and __iter__

Patch based on my comment 1, with addition of __len__ allowing len(my_record)
rather than len(my_record.seq) and an explicit __iter__ method (although this
is not required, it lets us give a doc string).
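
The attached patch is not reproduced here, but the approach can be sketched
roughly as below, assuming a toy SeqRecord that simply delegates to its seq
attribute. This is an illustration, not the contents of attachment 928.

```python
class SeqRecord(object):
    """Toy stand-in for Bio.SeqRecord.SeqRecord; only delegation is shown."""

    def __init__(self, seq, id="<unknown id>"):
        self.seq = seq
        self.id = id

    def __getitem__(self, index):
        """Return a letter (or a slice) of the underlying sequence."""
        return self.seq[index]

    def __len__(self):
        """Return the length of the underlying sequence."""
        return len(self.seq)

    def __iter__(self):
        """Iterate over the letters in the sequence."""
        return iter(self.seq)


# A plain string stands in for a Seq object here.
record = SeqRecord("ACGTAC", id="demo")
letters = [letter for letter in record]
```

Without __iter__, Python would fall back to calling __getitem__ with 0, 1, 2,
... until IndexError, which is why the explicit method is optional.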



From bugzilla-daemon at portal.open-bio.org  Thu Jun  5 06:18:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 06:18:11 -0400
Subject: [Biopython-dev] [Bug 2509] New: Deprecating the .data property of
	the Seq and MutableSeq objects
Message-ID: <bug-2509-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2509

           Summary: Deprecating the .data property of the Seq and MutableSeq
                    objects
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
OtherBugsDependingOnThis: 2351


In anticipation that the Seq and MutableSeq objects will eventually subclass
the python string, their data property is not needed and is confusing.  The
following patch will replace it with new-style class property methods and a
docstring declaring it to be deprecated.

In the case of the Seq object, the sequence should be read only but the user
can currently modify the data property in place.

In the case of the MutableSeq, the fact that it is internally an array of
characters should be a private implementation detail.



From bugzilla-daemon at portal.open-bio.org  Thu Jun  5 06:18:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 06:18:14 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string,
	even subclass string?
In-Reply-To: <bug-2351-42@http.bugzilla.open-bio.org/>
Message-ID: <200806051018.m55AIE7S003198@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2351


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |2509



From bugzilla-daemon at portal.open-bio.org  Thu Jun  5 06:47:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 06:47:43 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
	element access and slicing
In-Reply-To: <bug-2507-42@http.bugzilla.open-bio.org/>
Message-ID: <200806051047.m55AlhBe004755@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-05 06:47 EST -------
Note that adding __len__ has a knock-on effect when dealing with SeqRecord
objects with a zero length sequence - they now evaluate to False rather than
True.

This was an issue for some of the unit tests where "if record" was used rather
than the more explicit "if record is not None".

This change could therefore have unexpected side effects in existing scripts,
however adding __len__ is required if we intend to make the SeqRecord act more
like the Seq object.
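
The side effect can be shown with a small sketch. Record and count_records
are hypothetical names used only for this illustration; the loop stops at
the first record its test rejects, as an old "if record" check would.

```python
class Record(object):
    """Hypothetical record type defining __len__ but no truth-value method."""

    def __init__(self, seq):
        self.seq = seq

    def __len__(self):
        return len(self.seq)


def count_records(records, explicit):
    """Count the records seen before the loop's test decides to stop."""
    count = 0
    for record in records:
        if explicit:
            keep_going = record is not None
        else:
            keep_going = bool(record)  # what "if record:" evaluates
        if not keep_going:
            break
        count += 1
    return count


# The second record has an empty sequence; None marks the end of the data.
records = [Record("ACGT"), Record(""), Record("GG"), None]
implicit_count = count_records(records, explicit=False)
explicit_count = count_records(records, explicit=True)
```

With the implicit test the loop stops early at the empty record, while the
explicit "is not None" test sees all three records.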



From bugzilla-daemon at portal.open-bio.org  Thu Jun  5 07:03:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 07:03:27 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
	for '-F F' and make it safe
In-Reply-To: <bug-2508-42@http.bugzilla.open-bio.org/>
Message-ID: <200806051103.m55B3RUU005472@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-05 07:03 EST -------
You seem to have identified two issues.  Adding support for -F should be fairly
easy.

For the security issue, the caller should be validating their input.  Also if
running from a web-server, the permissions should also be restricted - failing
to do this is asking for trouble.

However, defence in layers would be good.  Would you suggest a simple check for
the ";" character?  What about escaped semi-colons?  Also, this is a
platform-dependent issue.  The ";" separator is specific to Unix shells; at the
Windows command line you would use "&" or "&&" instead.

Do you have a patch in mind?



From bugzilla-daemon at portal.open-bio.org  Thu Jun  5 08:56:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 08:56:21 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
	for '-F F' and make it safe
In-Reply-To: <bug-2508-42@http.bugzilla.open-bio.org/>
Message-ID: <200806051256.m55CuLfC010670@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508


------- Comment #2 from mmokrejs at ribosome.natur.cuni.cz  2008-06-05 08:56 EST -------
For the latter issue, I would use some Python library to escape shell
metacharacters. cgi.escape() doesn't do what I would like. Or cgi.wrap()?
Google search returned some hints:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498202
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/66012
http://e-articles.info/e/a/title/Command-Injection/
https://bugs.gentoo.org/show_bug.cgi?id=187971#c5
https://bugs.gentoo.org/show_bug.cgi?id=187971#c23
http://mail.python.org/pipermail/python-3000/2007-May/007192.html
http://www.owasp.org/index.php/Interpreter_Injection
http://www.velocityreviews.com/forums/t352309-sql-escaping-module.html


One could learn from or even use escaping functions such as MySQLdb.escape()
or MySQLdb.connection.escape_string(), but I don't think that is a complete
solution. I will try to think about it more later.
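
For reference, the Python standard library does have a quoting helper for
POSIX shells: shlex.quote() in Python 3 (pipes.quote() in Python 2). A short
sketch, assuming a command string really must be built; the command line
itself is only an illustration and is never executed here.

```python
import shlex

# Hostile value taken from the original report.
matrix = "IDENTITY -F 0; rm -rf /etc/passwd"

# quote() wraps the value so a POSIX shell treats it as one literal word.
command = "blastall -p blastn -M " + shlex.quote(matrix)

# shlex.split() tokenises the way a POSIX shell would, which lets us check
# that the whole value survives as a single argument.
tokens = shlex.split(command)
```

This only covers POSIX shells; it does not apply to the Windows command
line, so avoiding the shell altogether is probably the more portable fix.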



From bugzilla-daemon at portal.open-bio.org  Thu Jun  5 09:25:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 09:25:43 -0400
Subject: [Biopython-dev] [Bug 2494] _retrieve_taxon in BioSQL.py needs
	urgent optimization
In-Reply-To: <bug-2494-42@http.bugzilla.open-bio.org/>
Message-ID: <200806051325.m55DPhrQ012033@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2494


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-05 09:25 EST -------
I've committed this patch to CVS as part of BioSQL/BioSeq.py revision 1.24

If you could update your installation of Biopython to CVS and test this please
Eric, then I think we can mark this bug as fixed.



From bugzilla-daemon at portal.open-bio.org  Thu Jun  5 09:29:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 09:29:25 -0400
Subject: [Biopython-dev] [Bug 2509] Deprecating the .data property of the
	Seq and MutableSeq objects
In-Reply-To: <bug-2509-42@http.bugzilla.open-bio.org/>
Message-ID: <200806051329.m55DTP30012244@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2509


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-05 09:29 EST -------
Created an attachment (id=929)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=929&action=view)
Patch to Bio/Seq.py

This turns out to be quite a big change, and while the unit tests still pass
more extensive testing would be a good idea.

Alternatively, we could just expose .data as a read-only property, and
switch to ._data (or a string subclass) internally instead.



From bugzilla-daemon at portal.open-bio.org  Thu Jun  5 13:55:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 13:55:02 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806051755.m55Ht2TS024644@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #7 from cracka80 at gmail.com  2008-06-05 13:55 EST -------
I understand your approach that these functions should be converted to modern
Python, but it must also be remembered that Biopython as a whole is Python
2.3-compatible, so care must be taken not to modernise the code too much. I
can't remember when iterators were phased in, but I think it was around 2.2,
so it should be possible.

(In reply to comment #6)
> Two things:
> 1) Some of the code that does type checking for file-like-ness seems to be
> quite old and possibly outdated (e.g. Gobase.Iterator). We should take this
> opportunity  to go through these modules and check if they are still useful.
> 2) Many of these modules (especially the ones that use an "Iterator" class)
> would be written differently in modern Python (in particular by making use of a
> generator function instead of an Iterator class).
> 
> So I'd like to suggest the following:
> -) For the modules whose usability is dubious in 2008, let's check on the
> mailing list if anybody is still using them. If not, we can simply deprecate
> them.
> -) For the modules that are still useful, use try/except clauses to check for
> the necessary attributes. The current function checks for 'read', 'readline',
> 'readlines', and '__iter__', whereas the parser probably only needs one of
> them. 
> -) If possible, I'd prefer to convert to modern Python as much as possible
> (though formally that is not within the scope of this bug report).
> 



From bugzilla-daemon at portal.open-bio.org  Sat Jun  7 04:26:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 7 Jun 2008 04:26:54 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806070826.m578Qsj4019312@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp  2008-06-07 04:26 EST -------
(In reply to comment #7)
> I understand your approach that these functions should be converted to modern
> Python, but it must also be remembered that Biopython as a whole is Python
> 2.3-compatible, so care must be taken not to modernise code too much. I can't
> remember when iterators were phased in, but it should be possible, I think it
> was around 2.2 anyway.
> 
Bio.Blast.NCBIXML already uses generator functions to return iterators, so I
think we are fine as far as compatibility with Python 2.3 and later is
concerned.

I'll ask on the mailing list if Bio.Gobase has any users, to get started.



From mjldehoon at yahoo.com  Sat Jun  7 04:35:05 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 7 Jun 2008 01:35:05 -0700 (PDT)
Subject: [Biopython-dev] Bio.Gobase, anybody?
Message-ID: <844450.31822.qm@web62415.mail.re1.yahoo.com>


Hi everybody,

As part of bug report 2454:
http://bugzilla.open-bio.org/show_bug.cgi?id=2454,
I started looking at the Bio.Gobase module.
This module provides access to the gobase database:
http://megasun.bch.umontreal.ca/gobase/

This module is about seven years old and (AFAICT)
is not actively maintained. We don't have documentation
for this module, but the unit tests suggest that it
parses HTML files from gobase. I am not sure exactly
where the HTML files came from, but I doubt that
after seven years this still works.

So I was wondering:
Does anybody use Bio.Gobase?

If not, I suggest we deprecate it for the next release,
and remove it in some future release.
If there are users, we need to make some (small) changes
to this module (that is what the original bug report
was about).

--Michiel.


From bugzilla-daemon at portal.open-bio.org  Mon Jun  9 08:45:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 9 Jun 2008 08:45:24 -0400
Subject: [Biopython-dev] [Bug 2511] New: setup.py problem with del
	sys.modules["Martel"]
Message-ID: <bug-2511-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2511

           Summary: setup.py problem with del sys.modules["Martel"]
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


I'm currently trying to install Biopython from source (CVS) on a clean Mac OS X
machine, without reportlab, Numeric or mxTextTools.  I've run into a small
issue with "python setup.py build" related to the testing for an existing
Martel distribution (since Martel has been distributed separately from
Biopython before) due to the lack of mxTextTools.

Traceback (most recent call last):
  File "setup.py", line 508, in <module>
    'Bio.PopGen': ['SimCoal/data/*.par'],
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/core.py", line 151, in setup
    dist.run_commands()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/dist.py", line 974, in run_commands
    self.run_command(cmd)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/dist.py", line 994, in run_command
    cmd_obj.run()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/command/build.py", line 112, in run
    self.run_command(cmd_name)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/cmd.py", line 333, in run_command
    self.distribution.run_command(command)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/dist.py", line 994, in run_command
    cmd_obj.run()
  File "setup.py", line 157, in run
    if not is_Martel_installed():
  File "setup.py", line 292, in is_Martel_installed
    del sys.modules["Martel"]   # Delete the old version of Martel.

The function is_Martel_installed() starts by trying to load the bundled
Martel, by calling can_import("Martel").  This is failing with an ImportError
from mxTextTools - and hence the version of the bundled copy of Martel cannot
be determined.  The next line of is_Martel_installed() causes the problem:

del sys.modules["Martel"]

I think this only makes sense if the module could be imported; patch to follow.
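
A rough sketch of the kind of guard meant here. It does not reproduce the
real setup.py code or the patch: can_import below is a simplified
re-implementation and forget_module is a hypothetical helper. In Python,
del sys.modules[name] raises KeyError when the name is absent, so the
removal should either be conditional or tolerate a missing key.

```python
import sys


def can_import(module_name):
    """Return the module if it can be imported, otherwise None."""
    try:
        __import__(module_name)
    except ImportError:
        return None
    return sys.modules[module_name]


def forget_module(module_name):
    """Drop a module from the import cache; return True if it was there."""
    # pop() with a default never raises, unlike del sys.modules[name].
    return sys.modules.pop(module_name, None) is not None


# A module that cannot be imported is simply not in the cache afterwards.
missing = can_import("no_such_module_for_demo")
removed_missing = forget_module("no_such_module_for_demo")

# A module that imports fine can be removed as before.
present = can_import("colorsys")
removed_present = forget_module("colorsys")
```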



From bugzilla-daemon at portal.open-bio.org  Mon Jun  9 08:46:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 9 Jun 2008 08:46:51 -0400
Subject: [Biopython-dev] [Bug 2511] setup.py problem with del
	sys.modules["Martel"]
In-Reply-To: <bug-2511-42@http.bugzilla.open-bio.org/>
Message-ID: <200806091246.m59Ckpts011798@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2511


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-09 08:46 EST -------
Created an attachment (id=930)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=930&action=view)
Patch to setup.py

How does this look, Michiel?



From biopython at maubp.freeserve.co.uk  Tue Jun 10 07:37:42 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 10 Jun 2008 12:37:42 +0100
Subject: [Biopython-dev] Giving the SeqRecord a length? Evaluating it as a
	boolean
Message-ID: <320fb6e00806100437n21e53369p36c85a810007ca19@mail.gmail.com>

Something we've discussed before is making the SeqRecord more like a
Seq object, perhaps even subclassing it.  I've got a patch on Bug 2507
to make some small steps in this direction - accessing elements of the
sequence by indexing the SeqRecord, i.e. letter = my_seq_record[5], or
iterating over the letters in a SeqRecord's sequence.

http://bugzilla.open-bio.org/show_bug.cgi?id=2507

In addition, I would like to give the SeqRecord a length, allowing
len(my_seq_record) rather than len(my_seq_record.seq).  However, this
has a side effect on the evaluation of a SeqRecord as a boolean.
Before, any sequence was True, but if we add the __len__ method then
any SeqRecord with a zero length sequence will evaluate as False.
This is a real issue, for example you can have GenBank files without a
sequence (see our unit test cases).  One example where this is
important is if you are using an iterator via the .next() method and
had been checking for a returned None by using "if record:" (something
some of the older unit tests were doing) you would have to start using
"if record is not None:" instead.

If the old behaviour is desirable (evaluating a SeqRecord as a boolean
is always True), we could implement a __nonzero__ method to preserve
it, see: http://docs.python.org/ref/customization.html
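
A minimal sketch of that option, using a toy SeqRecord rather than the real
class. Python 2 looks for __nonzero__; Python 3 renamed the hook to
__bool__, so the sketch defines both names.

```python
class SeqRecord(object):
    """Toy stand-in: has a length, but is always true as a boolean."""

    def __init__(self, seq):
        self.seq = seq

    def __len__(self):
        return len(self.seq)

    def __bool__(self):
        # Takes priority over __len__ when the object is tested for truth.
        return True

    __nonzero__ = __bool__  # Python 2 spelling of the same hook


empty = SeqRecord("")
empty_length = len(empty)
empty_truth = bool(empty)
```

An empty record then reports a length of zero but still passes an
"if record:" test, which keeps the old behaviour.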

What do people think?  Would adding a __len__ method to the SeqRecord
cause trouble?

Peter

From mjldehoon at yahoo.com  Tue Jun 10 19:17:56 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 10 Jun 2008 16:17:56 -0700 (PDT)
Subject: [Biopython-dev] Giving the SeqRecord a length? Evaluating it as
	a boolean
In-Reply-To: <320fb6e00806100437n21e53369p36c85a810007ca19@mail.gmail.com>
Message-ID: <797428.30617.qm@web62402.mail.re1.yahoo.com>

+1 for adding a __len__ method, with a __nonzero__ method to let all SeqRecord objects evaluate as true.

--Michiel.



From bugzilla-daemon at portal.open-bio.org  Tue Jun 10 19:30:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 10 Jun 2008 19:30:20 -0400
Subject: [Biopython-dev] [Bug 2511] setup.py problem with del
	sys.modules["Martel"]
In-Reply-To: <bug-2511-42@http.bugzilla.open-bio.org/>
Message-ID: <200806102330.m5ANUKfo019481@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2511


------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp  2008-06-10 19:30 EST -------
(In reply to comment #1)
> Created an attachment (id=930)
>  --> (http://bugzilla.open-bio.org/attachment.cgi?id=930&action=view) [details]
> Patch to setup.py
> 
> How does this look Michiel?
> 

That looks fine to me, though eventually I would prefer to get rid of the
dependence on Martel/mxTextTools altogether.



From bugzilla-daemon at portal.open-bio.org  Tue Jun 10 19:42:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 10 Jun 2008 19:42:52 -0400
Subject: [Biopython-dev] [Bug 2511] setup.py problem with del
	sys.modules["Martel"]
In-Reply-To: <bug-2511-42@http.bugzilla.open-bio.org/>
Message-ID: <200806102342.m5ANgqct019925@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2511


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-10 19:42 EST -------
In reply to comment 2, would it make sense for the unit test framework to treat
the mxTextTools (or reportlab, or Numeric) import errors as a missing external
dependency?

In the unit tests we used to "ignore" any tests which failed with an
ImportError, but have now switched to our own MissingExternalDependencyError
exception.

We want to distinguish ImportErrors which are external to Biopython (and
therefore can be considered as missing dependencies) from those internal to
Biopython (perhaps due to refactoring or removal of code - a real unit test
failure).  One way to do this would be for the bits of Biopython that try to
import mxTextTools (or any other external module) to raise
MissingExternalDependencyError (or something that is a subclass of both
MissingExternalDependencyError and the built-in ImportError).
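
The second variant could look roughly like this. The exception defined
first is only a stand-in for Biopython's MissingExternalDependencyError;
MissingExternalImportError and import_external are hypothetical names made
up for this sketch.

```python
class MissingExternalDependencyError(Exception):
    """Stand-in for Biopython's exception of the same name."""


class MissingExternalImportError(MissingExternalDependencyError, ImportError):
    """Hypothetical: a missing third-party module, still an ImportError."""


def import_external(module_name):
    """Import a third-party module, re-raising failures as a missing dependency."""
    try:
        return __import__(module_name)
    except ImportError as err:
        raise MissingExternalImportError(
            "Missing external dependency %s (%s)" % (module_name, err))


# Existing code that catches ImportError keeps working, while the test
# framework can catch the more specific dependency error.
try:
    import_external("no_such_module_for_demo")
    caught = None
except ImportError as err:
    caught = err
```

Deriving from both classes means callers that already handle ImportError do
not need to change, while an ImportError raised from inside Biopython itself
would remain a plain ImportError and show up as a real test failure.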



From bugzilla-daemon at portal.open-bio.org  Wed Jun 11 02:54:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Jun 2008 02:54:32 -0400
Subject: [Biopython-dev] [Bug 2516] New: Make it clear what is numeric and
	what is numpy
Message-ID: <bug-2516-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2516

           Summary: Make it clear what is numeric and what is numpy
           Product: Biopython
           Version: 1.45
          Platform: PC
               URL: http://www.biopython.org/DIST/docs/install/Installation.
                    html
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Documentation
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: mmokrejs at ribosome.natur.cuni.cz


Hi,
  although both packages are from the same source site, numpy is the newer
implementation whereas numeric is the old, deprecated implementation, right?
Why do you say in the installation docs the following?

"The Numerical Python distribution (also known an Numeric or Numpy) is a fast
implementation of arrays and associated array functionality. This is important
for a number of Biopython modules that deal with number processing. The main
web site for Numeric is: http://sourceforge.net/projects/numpy and downloads
are available from:..."

I think this is misleading.

BTW, is numpy-1.1.0 supported?



From bugzilla-daemon at portal.open-bio.org  Wed Jun 11 04:47:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Jun 2008 04:47:32 -0400
Subject: [Biopython-dev] [Bug 2511] setup.py problem with del
	sys.modules["Martel"]
In-Reply-To: <bug-2511-42@http.bugzilla.open-bio.org/>
Message-ID: <200806110847.m5B8lWxd010254@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2511


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-11 04:47 EST -------
Patch checked into CVS as Biopython/setup.py revision 1.133, marking this bug
as fixed.

The issue I raised in comment 3 is still outstanding (external ImportErrors and
the unit tests).  We may want to file a separate bug, or discuss this on the
dev mailing list.



From bugzilla-daemon at portal.open-bio.org  Wed Jun 11 04:53:30 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Jun 2008 04:53:30 -0400
Subject: [Biopython-dev] [Bug 2516] Make it clear what is numeric and what
	is numpy
In-Reply-To: <bug-2516-42@http.bugzilla.open-bio.org/>
Message-ID: <200806110853.m5B8rU2t010552@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2516


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-11 04:53 EST -------
That text is rather out of date - if you are familiar with the history of
Numeric, numarray and numpy you'll know that the old module used with "import
Numeric" was called Numerical Python or NumPy for short. This shorthand was
used in lots of documentation (not just in Biopython). I think the choice to
call the third generation of the array packages numpy has caused a lot of
confusion.

See http://numpy.scipy.org/#older_array

We had updated the Biopython website and other bits of documentation, but had
missed this one.  Thank you for pointing this out.

P.S. Supporting numpy instead of Numeric is Biopython Bug 2251.



From bugzilla-daemon at portal.open-bio.org  Wed Jun 11 05:04:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Jun 2008 05:04:47 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
	element access and slicing
In-Reply-To: <bug-2507-42@http.bugzilla.open-bio.org/>
Message-ID: <200806110904.m5B94li8011303@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-11 05:04 EST -------
I raised the issue of evaluating a SeqRecord as a boolean with a proposal that
would add __len__ but also add __nonzero__ to ensure that any SeqRecord
evaluates as True (even if the sequence is of length zero):
http://lists.open-bio.org/pipermail/biopython-dev/2008-June/003756.html

Michiel was in favour of this:
> +1 for adding a __len__ method, with a __nonzero__ method to let all SeqRecord
> objects evaluate as true.
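
A minimal sketch of that proposal, using toy classes rather than the real
SeqRecord (LengthOnly and Record are hypothetical names). In Python 3 the
hook is spelled __bool__; __nonzero__ is the Python 2 spelling.

```python
class LengthOnly:
    """Toy class with __len__ only: an empty sequence makes it falsy."""

    def __init__(self, seq):
        self.seq = seq

    def __len__(self):
        return len(self.seq)


class Record(LengthOnly):
    """Toy stand-in for a SeqRecord that is always true."""

    def __bool__(self):
        # Truth value no longer falls back on __len__.
        return True

    __nonzero__ = __bool__  # Python 2 name for the same hook
```

With only __len__ defined, "if record:" would silently skip records whose
sequence is empty, which is the surprise the extra method avoids.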

The patch isn't ready yet because it doesn't deal with the
SeqFeature objects.  I think the SeqFeature class needs a _shift(offset) method
to return a copy of itself with its location (and the locations of any
sub-features) adjusted.

I'm still not sure about handling strides, and I am tempted to rule that if a
stride other than one is used then the features of the SeqRecord are lost.



From bugzilla-daemon at portal.open-bio.org  Wed Jun 11 09:57:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Jun 2008 09:57:56 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
	element access and slicing
In-Reply-To: <bug-2507-42@http.bugzilla.open-bio.org/>
Message-ID: <200806111357.m5BDvu1I024400@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #928 is|0                           |1
           obsolete|                            |


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-11 09:57 EST -------
Created an attachment (id=937)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=937&action=view)
Patch to Bio/SeqRecord.py and Bio/SeqFeature.py

This modifies the SeqRecord to give it __getitem__ (supporting sliced
annotations including features), __len__ (to return the length of the
sequence), __nonzero__ (to ensure any SeqRecord evaluates as True regardless of
the length of its sequence) and __iter__ (to explicitly support iteration over
the sequence with a docstring).  As part of this, assorted objects in
SeqFeature.py get a private _shift() method taking an integer offset to return
a self copy with an adjusted location.

Note that slices with a stride (other than one) will result in the features
being lost.  Handling (positive) strides would require complicated
consideration about if an exact location is still present, and if not replacing
it with either a fuzzy position or a range.  Negative strides are worse!

The current set of unit tests seems fine, but additional checks would need to
be added to validate this new behaviour.
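
The slicing behaviour described above can be sketched with toy classes. This
is not the attached patch: Feature and Record are hypothetical stand-ins, and
keeping only features that lie entirely inside the slice is an assumption
made for this example.

```python
class Feature:
    """Toy feature with a simple start/end location (end exclusive)."""

    def __init__(self, start, end, kind):
        self.start, self.end, self.kind = start, end, kind

    def _shift(self, offset):
        """Return a copy with the location moved by an integer offset."""
        return Feature(self.start + offset, self.end + offset, self.kind)


class Record:
    """Toy record: a sequence string plus a list of features."""

    def __init__(self, seq, features=()):
        self.seq = seq
        self.features = list(features)

    def __len__(self):
        return len(self.seq)

    def __iter__(self):
        return iter(self.seq)

    def __getitem__(self, index):
        if isinstance(index, int):
            return self.seq[index]  # single letter
        if not isinstance(index, slice):
            raise TypeError("index must be an integer or a slice")
        start, stop, step = index.indices(len(self.seq))
        sliced = Record(self.seq[index])
        if step == 1:
            # Keep features fully inside the slice, re-based to the new start.
            sliced.features = [
                f._shift(-start)
                for f in self.features
                if start <= f.start and f.end <= stop
            ]
        # Any other stride: the features are dropped, as described above.
        return sliced
```

For example, slicing positions 1 to 6 of a ten-letter record keeps a feature
at 2..5 as 1..4, so it still covers the same letters in the new record.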



From bugzilla-daemon at portal.open-bio.org  Wed Jun 11 11:26:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Jun 2008 11:26:59 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806111526.m5BFQxMw029057@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #9 from mdehoon at ims.u-tokyo.ac.jp  2008-06-11 11:26 EST -------
I "fixed" SwissProt.SProt.Iterator by deprecating it. Instead of
SwissProt.SProt.Iterator, we recommend using Bio.SwissProt.parse and
Bio.SeqIO.parse.

Next on the to-do list is SwissProt.KeyWList.extract_keywords.



From bugzilla-daemon at portal.open-bio.org  Thu Jun 12 10:23:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 12 Jun 2008 10:23:16 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806121423.m5CENG95026678@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp  2008-06-12 10:23 EST -------
SwissProt.KeyWList.extract_keywords could only parse very old SwissProt files.
I deprecated it and wrote a new function "parse" that parses current SwissProt
files. This function does not do the file-like check.

Prosite.Iterator and Prosite.Prodoc.Iterator are next.



From fkauff at biologie.uni-kl.de  Thu Jun 12 10:33:56 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Thu, 12 Jun 2008 16:33:56 +0200
Subject: [Biopython-dev] CVS access and developers web site
In-Reply-To: <320fb6e00805291446x1cebf67bpe3e0818af5b9a7c5@mail.gmail.com>
References: <483E7578.50402@biologie.uni-kl.de>
	<320fb6e00805291446x1cebf67bpe3e0818af5b9a7c5@mail.gmail.com>
Message-ID: <485133D4.2060405@biologie.uni-kl.de>


Peter Cock wrote:
> Hi Frank,
>
> I would try emailing support at helpdesk.open-bio.org using the email
> address associated with your CVS username.  If you've changed email
> address, and you run into problems, I expect Michiel or I could vouch
> for you.
>   
Is somebody monitoring that email address? I got an automated response 
about two weeks ago, and then nothing happened.

> For the website, the wiki usernames are entirely separate and you
> should be able to create a new account if you don't have one already.
> If you want to update the tutorial new HTML and PDF files are loaded
> with each release from the version in CVS.
>   
Thanks Peter, got access to the wiki and updated personal data.

Frank
> Peter
>
> On Thu, May 29, 2008 at 10:20 AM, Frank Kauff <fkauff at biologie.uni-kl.de> wrote:
>   
>> Hi folks,
>>
>> although I've been quiet for a while, I'm still doing some changes to the
>> Nexus parser of biopython from time to time.... I totally lost my passwords
>> to access the repository. Could someone please send me a new password to get
>> write access to cvs? And I would also like to change the information on the
>> biopython developers web site, as they are somewhat outdated.
>> And is this the right place to ask for such things?
>>
>> Thanks!
>>
>> Frank
>>     
>
>   


From bugzilla-daemon at portal.open-bio.org  Thu Jun 12 11:42:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 12 Jun 2008 11:42:58 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806121542.m5CFgw9t029594@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #11 from cracka80 at gmail.com  2008-06-12 11:42 EST -------
Maybe it's a good idea for any parsers/iterators to just use the iterator-like
ability of file handles? Writers would have to function slightly differently,
but since file objects, StringIOs and any other file-like objects must provide
an __iter__ method, it's probably a good idea to take that into consideration
when developing a common interface. In addition, writers could output iterators
or generators, so that they can be chained together to operate on files.
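
A minimal sketch of this suggestion, assuming a toy FASTA-like format chosen
only for illustration; parse_records and write_records are hypothetical
names, not Biopython functions. The parser relies only on iterating over
lines, so an open file, a StringIO or a plain list of lines all work.

```python
import io


def parse_records(handle):
    """Yield (title, sequence) pairs from any iterable of text lines."""
    title, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line.strip())
    if title is not None:
        yield title, "".join(chunks)


def write_records(records, width=60):
    """Yield output lines lazily, so writers can be chained after parsers."""
    for title, seq in records:
        yield ">%s\n" % title
        for i in range(0, len(seq), width):
            yield seq[i:i + width] + "\n"


text = ">a\nACGT\nAC\n>b\nGG\n"
rewrapped = "".join(write_records(parse_records(io.StringIO(text)), width=4))
```

The lines produced by write_records can be passed to the writelines method
of any open file, including sys.stdout.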



From bugzilla-daemon at portal.open-bio.org  Fri Jun 13 12:24:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jun 2008 12:24:29 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806131624.m5DGOTKw025954@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #12 from mdehoon at ims.u-tokyo.ac.jp  2008-06-13 12:24 EST -------
(In reply to comment #11)
> Maybe it's a good idea for any parsers/iterators to just use the iterator-like
> ability of file handles?

In principle, yes. In practice, it's not so easy because many parsers in
Biopython follow the framework in Bio.ParserSupport. These parsers are not
really written to deal with lines pulled one-by-one from a file handle. To
reconcile these two, I pull out data line-by-line from the file handle, store
it in a string, and then call the parser to parse it. This is not ideal, and it
may be a good idea for Biopython at some point to change its parser strategy.
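
The workaround described above can be sketched as follows. This is not the
Bio.ParserSupport code: parse_record_text is a hypothetical whole-record
parser, and the "//" terminator line and the "KEY   value" layout are
assumptions modelled loosely on SwissProt-style flat files.

```python
import io


def parse_record_text(text):
    """Hypothetical parser that needs one complete record as a string."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition("   ")
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


def iterate_records(handle, terminator="//"):
    """Pull lines one by one from the handle and buffer them per record."""
    buffered = []
    for line in handle:
        if line.rstrip() == terminator:
            # A full record has been collected: hand it over as one string.
            yield parse_record_text("".join(buffered))
            buffered = []
        else:
            buffered.append(line)
    if "".join(buffered).strip():
        # Tolerate a final record without a terminator line.
        yield parse_record_text("".join(buffered))


example = "ID   ONE\nKW   Kinase\n//\nID   TWO\n//\n"
records = list(iterate_records(io.StringIO(example)))
```

Only one record is held in memory at a time, but each record is still copied
into a string before it is parsed, which is the cost mentioned above.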

> Writers would have to function slightly differently,
> but since file objects, StringIOs and any other file-like objects must provide
> an __iter__ method, it's probably a good idea to take that into consideration
> when developing a common interface. In addition, writers could output 
> iterators or generators, so that they can be chained together to operate
> on files.
> 
Writers should also be able to just print the record to the screen. I don't
see how that is easily achievable with generators. 



From bugzilla-daemon at portal.open-bio.org  Fri Jun 13 12:27:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jun 2008 12:27:47 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806131627.m5DGRlTE026072@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #13 from mdehoon at ims.u-tokyo.ac.jp  2008-06-13 12:27 EST -------
Medline.Iterator, Prosite.Iterator, and Prosite.Prodoc.Iterator are now fixed.



From bugzilla-daemon at portal.open-bio.org  Fri Jun 13 22:29:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jun 2008 22:29:13 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806140229.m5E2TDdD014417@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #14 from mdehoon at ims.u-tokyo.ac.jp  2008-06-13 22:29 EST -------
I deprecated Bio.Gobase, since no users came forward on the mailing list.

Bio.Rebase is also problematic. It parses HTML from the Rebase database, but it
was written in 2000 and cannot parse current HTML from Rebase (which looks
completely different from the HTML used in 2000).

I'll ask on the mailing list if anybody is willing to update Bio.Rebase.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mjldehoon at yahoo.com  Fri Jun 13 22:34:05 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 13 Jun 2008 19:34:05 -0700 (PDT)
Subject: [Biopython-dev] Bio.Rebase
Message-ID: <237761.5963.qm@web62409.mail.re1.yahoo.com>

Hi everybody,

As part of bug #2454 on Bugzilla, I am looking at the Bio.Rebase module.
This module parses files (in HTML format) from the Rebase database:
http://rebase.neb.com/rebase/rebase.html

Unfortunately, since this module was written (in 2000) the HTML format used by the Rebase database has changed completely. This module is therefore not able to parse current Rebase HTML files.

Is anybody willing to update Bio.Rebase (either by updating the HTML parser, or preferably by writing a parser for plain-text output from Rebase)? If not, I think this module should be deprecated.

--Michiel.


From bugzilla-daemon at portal.open-bio.org  Fri Jun 13 22:50:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jun 2008 22:50:42 -0400
Subject: [Biopython-dev] [Bug 2516] Make it clear what is numeric and what
	is numpy
In-Reply-To: <bug-2516-42@http.bugzilla.open-bio.org/>
Message-ID: <200806140250.m5E2ogvf014920@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2516


------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp  2008-06-13 22:50 EST -------
According to the Numerical Python website, the NumPy documentation will become
freely available on September 1, 2008. That would be a good time to start
thinking seriously about converting from the "old" Numerical Python to the
"new" NumPy 1.1.



From mjldehoon at yahoo.com  Fri Jun 13 22:46:37 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 13 Jun 2008 19:46:37 -0700 (PDT)
Subject: [Biopython-dev] Bio.SCOP maintainer?
Message-ID: <523172.98428.qm@web62402.mail.re1.yahoo.com>

Still looking at Bug 2454
(http://bugzilla.open-bio.org/show_bug.cgi?id=2454).

To fix this bug, I'd like to make some changes to Bio.SCOP.
Is anybody currently maintaining Bio.SCOP? The changes I'd like to make are small, but it would be better to discuss with the Bio.SCOP maintainer (if there is one) so I won't get in their way.

--Michiel.

       
From bugzilla-daemon at portal.open-bio.org  Sat Jun 14 05:52:09 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 14 Jun 2008 05:52:09 -0400
Subject: [Biopython-dev] [Bug 2488] Adding XML parsers to Bio.Entrez
In-Reply-To: <bug-2488-42@http.bugzilla.open-bio.org/>
Message-ID: <200806140952.m5E9q9X9032018@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2488


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp  2008-06-14 05:52 EST -------
We now have parsers for XML returned by Entrez, provided the corresponding DTDs
are available. Bio/Entrez/DTDs contains most (all?) DTDs currently used by
Entrez. If later some DTDs appear to be missing, we can simply add them to
Bio/Entrez/DTDs.



From bugzilla-daemon at portal.open-bio.org  Sat Jun 14 06:29:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 14 Jun 2008 06:29:12 -0400
Subject: [Biopython-dev] [Bug 2516] Make it clear what is numeric and what
	is numpy
In-Reply-To: <bug-2516-42@http.bugzilla.open-bio.org/>
Message-ID: <200806141029.m5EATC64001227@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2516


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp  2008-06-14 06:29 EST -------
Updated the installation instructions (in CVS, at least).



From p.j.a.cock at googlemail.com  Sat Jun 14 18:51:26 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 14 Jun 2008 23:51:26 +0100
Subject: [Biopython-dev] CVS access and developers web site
In-Reply-To: <485133D4.2060405@biologie.uni-kl.de>
References: <483E7578.50402@biologie.uni-kl.de>
	<320fb6e00805291446x1cebf67bpe3e0818af5b9a7c5@mail.gmail.com>
	<485133D4.2060405@biologie.uni-kl.de>
Message-ID: <320fb6e00806141551t56422a98v752e34bbbb38d0aa@mail.gmail.com>

>> Hi Frank,
>>
>> I would try emailing support at helpdesk.open-bio.org using the email
>> address associated with your CVS username.  If you've changed email
>> address, and you run into problems, I expect Michiel or I could vouch
>> for you.
>>
>
> Is somebody monitoring that email address? I got an automated response about
> two weeks ago, and then nothing happened.
>

Maybe someone is on holiday - or they are caught up with BOSC 2008
work?  I can suggest a few specific people at OBF to try and contact
directly if you are still stuck.

In the short term, if there are any urgent fixes you think need to be
checked in, stick them on Bugzilla and I'm sure one of us will be able
to commit them on your behalf.

Peter

From bugzilla-daemon at portal.open-bio.org  Sun Jun 15 03:03:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 15 Jun 2008 03:03:18 -0400
Subject: [Biopython-dev] [Bug 2468] Tutorial needs a fix: Bio.WWW.NCBI
In-Reply-To: <bug-2468-42@http.bugzilla.open-bio.org/>
Message-ID: <200806150703.m5F73IF2007099@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2468


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp  2008-06-15 03:03 EST -------
I added an Examples subsection to the tutorial chapter on Bio.Entrez,
containing the example from section 2.5 and Martin's taxonomy example. With
the Bio.Entrez currently in CVS, finding the lineage works as follows:

>>> from Bio import Entrez
>>> handle = Entrez.esearch(db="Taxonomy", term="Cypripedioideae")
>>> record = Entrez.read(handle)
>>> record["IdList"]
['158330']
>>> handle = Entrez.efetch(db="Taxonomy", id="158330", retmode='xml')
>>> records = Entrez.read(handle)
>>> records[0]['Lineage']
'cellular organisms; Eukaryota; Viridiplantae; Streptophyta; Streptophytina;
 Embryophyta; Tracheophyta; Euphyllophyta; Spermatophyta; Magnoliophyta;
 Liliopsida; Asparagales; Orchidaceae'



From bugzilla-daemon at portal.open-bio.org  Mon Jun 16 15:23:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 16 Jun 2008 15:23:43 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
	element access and slicing
In-Reply-To: <bug-2507-42@http.bugzilla.open-bio.org/>
Message-ID: <200806161923.m5GJNhZw012022@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #937 is|0                           |1
           obsolete|                            |


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-16 15:23 EST -------
Created an attachment (id=942)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=942&action=view)
Patch to Bio/SeqRecord.py and Bio/SeqFeature.py

I've checked in the SeqRecord __len__ and __nonzero__ methods with CVS
Bio/SeqRecord.py revision 1.17

The earlier __getitem__ and __iter__ patch has been updated accordingly.



From bugzilla-daemon at portal.open-bio.org  Mon Jun 16 16:08:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 16 Jun 2008 16:08:00 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To: <bug-1944-42@http.bugzilla.open-bio.org/>
Message-ID: <200806162008.m5GK80bv014002@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944


------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-16 16:07 EST -------
Created an attachment (id=943)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=943&action=view)
Minimal __getitem__ method for generic alignment

This patch just adds a __getitem__ to the alignment which ONLY accepts a single
integer index and returns the corresponding SeqRecord object.  I propose to add
this NOW, as I think even just this is a worthwhile improvement.

This is a natural expectation given the current __iter__ behaviour and the
model of the alignment as a list of SeqRecord objects.  It's also part of the
richer behaviour discussed above, which we can add more easily if/when the
SeqRecord gets a __getitem__ method (bug 2507).

Comments on this particular patch?  Should we add __len__ at the same time,
giving the number of rows in the alignment?
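
A minimal sketch of the behaviour proposed here, using a toy class rather
than the attached patch (Alignment below is a hypothetical stand-in holding
plain strings in place of SeqRecord objects):

```python
class Alignment:
    """Toy alignment modelled as a list of rows."""

    def __init__(self, records):
        self._records = list(records)

    def __iter__(self):
        return iter(self._records)

    def __len__(self):
        # Number of rows, as suggested in the question above.
        return len(self._records)

    def __getitem__(self, index):
        # Deliberately minimal: a single integer row index only.
        if isinstance(index, int):
            return self._records[index]
        raise TypeError("only a single integer row index is supported")


aln = Alignment(["ACGT", "AC-T", "A-GT"])
second_row = aln[1]
```

Rejecting slices and tuples for now leaves room to give them a richer
meaning later without changing what integer indexing returns.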



From jblanca at btc.upv.es  Tue Jun 17 03:35:38 2008
From: jblanca at btc.upv.es (Jose Blanca)
Date: Tue, 17 Jun 2008 09:35:38 +0200
Subject: [Biopython-dev] [BioPython] Ace contig files in Bio.SeqIO or
	Bio.AlignIO
In-Reply-To: <320fb6e00806160701l428584c0i30acac57338b9357@mail.gmail.com>
References: <320fb6e00806160701l428584c0i30acac57338b9357@mail.gmail.com>
Message-ID: <200806170935.38904.jblanca@btc.upv.es>

Hi:
My main use of the Alignment class is to parse Ace files. I've been thinking 
about that problem recently. My proposal to modify SeqRecord was due to this 
problem. I think that the best solution would be to treat the Alignment as a 
sequence. The consensus would be the actual sequence and the aligned reads 
would be features with per-base annotations. I've implemented such a class 
and it works fine for me. In fact the Alignment class is just a wrapper 
around a standard SeqRecord (I name it RichSeq in my implementation).
To do that you just need a SeqRecord with a __getitem__ method. You are 
already proposing that, so that's not a problem.
Padding with spaces is not an option when you're dealing with genome-wide 
alignments; that's one of the problems of the current Alignment class.
If you want I can send my implementation to the list, although it could take a 
while because my home computer is dead.
Best regards,

Jose Blanca

On Monday 16 June 2008 16:01:31 Peter wrote:
> I've recently had to deal with some contig files in the Ace format
> (output by CAP3, but many assembly files will produce this output).
>
> We have a module for parsing Ace files in Biopython,
> Bio.Sequencing.Ace but I was wondering about integrating this into the
> Bio.SeqIO or Bio.AlignIO framework.
> http://www.biopython.org/wiki/SeqIO
> http://www.biopython.org/wiki/AlignIO
>
> I'd like to hear from anyone currently using Ace files, on how they
> tend to treat the data - and if they think a SeqRecord or Alignment
> based representation would be useful.
>
> Each contig in an Ace file could be treated as a SeqRecord using the
> consensus sequence.  The identifiers of each sub-sequence used to
> build the consensus could be stored as database cross-references, or
> perhaps we could store these as SeqFeatures describing which part of
> the consensus they support.  This would then fit into Bio.SeqIO quite
> well.
>
> Alternatively, each contig could be treated as an alignment (with a
> consensus) and integrated into Bio.AlignIO.  One drawback for this is
> doing this with the current generic alignment class would require
> padding the start and/or end of each sequence with gaps in order to
> make every sequence the same length.  However, if we did this (or
> created a more specialised alignment class), the Ace file format would
> then fit into Bio.AlignIO too.
>
> So, Ace users - would either (or both) of the above approaches make
> sense for how you use the Ace contig files?
>
> Thanks
>
> Peter
> _______________________________________________
> BioPython mailing list  -  BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython


-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)

From biopython at maubp.freeserve.co.uk  Tue Jun 17 04:46:22 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Jun 2008 09:46:22 +0100
Subject: [Biopython-dev] [BioPython] Ace contig files in Bio.SeqIO or
	Bio.AlignIO
In-Reply-To: <200806170935.38904.jblanca@btc.upv.es>
References: <320fb6e00806160701l428584c0i30acac57338b9357@mail.gmail.com>
	<200806170935.38904.jblanca@btc.upv.es>
Message-ID: <320fb6e00806170146j6f1843e6hed4166ad62c84423@mail.gmail.com>

On Tue, Jun 17, 2008 at 8:35 AM, Jose Blanca <jblanca at btc.upv.es> wrote:
> Hi:
> My main use of the Alignment class is to parse Ace files. I've been thinking
> about that problem recently. My proposal to modify SeqRecord was due to this
> problem. I think that the best solution would be to treat the Alignment as a
> sequence. The consensus would be the actual sequence and the aligned reads
> would be features with per-base annotations.

So integrating the "ace" format into Bio.SeqIO representing the
consensus sequence of each contig as a SeqRecord would be useful.
Initially I would try and represent the aligned reads as SeqFeature
objects (much like when reading a genome from a GenBank file you get
CDS features with their amino acid translation).

Note that for memory reasons, I would be inclined to scan over the Ace
file in one pass (using the existing Iterator in the
Bio.Sequencing.Ace parser) returning SeqRecords as we go.  As Frank
points out in the code comments, this means we can't easily include
the WA, CT, RT and WR tags found in the Ace file footer.  Do you use
this information Jose?

> I've implemented such a class
> and it works fine for me. In fact the Alignment class is just a wrapper
> around a standard SeqRecord (I name it RichSeq in my implementation).
> To do that you just need a SeqRecord with a __getitem__ method. You have
> already proposing that so that's not a problem.

Your enthusiasm Jose is one of the things motivating me to try and do
more with the Seq and SeqRecord.  Without a third party to offer
feedback, making big changes is risky.

> Padding with spaces is not an option when you're dealing with genomic wide
> alignments, that's one of the problems of the actual Alignment class.

It might make sense to talk about a "Contig Alignment" object/class,
compared to the existing "multiple sequence alignment"  object/class
where all the sequences are the same length.  Ideally these should
provide as similar an API as possible - even if the internals are
different.  One idea is a sub-class of the current alignment class
which stores an offset (>=0) for each supporting read, used when
accessing columns.  Maybe we should check out BioPerl etc for
inspiration?
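
The offset idea above can be sketched in plain Python, with no Biopython dependency. The class name `OffsetAlignment`, its method names, and the use of "-" as the padding character are all assumptions made for illustration; this is not an existing or agreed API.

```python
class OffsetAlignment:
    """Toy contig alignment: each read is stored once, with a start offset."""

    def __init__(self):
        self._reads = []  # list of (name, offset, sequence) tuples

    def add_read(self, name, offset, sequence):
        if offset < 0:
            raise ValueError("offset must be >= 0")
        self._reads.append((name, offset, sequence))

    def get_alignment_length(self):
        # Width of the contig: the furthest end position of any read.
        return max((off + len(seq) for _, off, seq in self._reads), default=0)

    def get_column(self, index, pad="-"):
        """Return one character per read; pad where a read does not reach."""
        column = []
        for _, offset, sequence in self._reads:
            pos = index - offset
            column.append(sequence[pos] if 0 <= pos < len(sequence) else pad)
        return "".join(column)


contig = OffsetAlignment()
contig.add_read("read1", 0, "ACGTAC")
contig.add_read("read2", 3, "TACGG")
contig.add_read("read3", 5, "CGGTT")
```

Because no padding is stored, memory use grows with the total read length rather than with (number of reads) x (contig length).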

> If you want I can send my implementation to the list, although it could take a
> while because I've got my home computer dead.

Good luck with the broken computer - I hope you have an easier time
fixing it / rebuilding it than I did last time this happened to me.

Regards,

Peter

From biopython at maubp.freeserve.co.uk  Tue Jun 17 05:16:29 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Jun 2008 10:16:29 +0100
Subject: [Biopython-dev] Iterating over Ace contig files
Message-ID: <320fb6e00806170216k12ecd88fof60758db1ccec3cf@mail.gmail.com>

Hello Frank,

I wanted to get your opinion on iterating over the Ace file contig by
contig, and what is lost in the WA, CT, RT and WR tags at the end of
the file by doing this.  As large sequencing runs become more common,
iterating over the file in a single pass WITHOUT keeping everything in
memory does seem to be desirable.

Similar past discussions:
http://portal.open-bio.org/pipermail/biopython/2004-February/001825.html
http://portal.open-bio.org/pipermail/biopython/2005-May/002661.html

Would you object to me rewording your module's header comment to say
that the Ace Iterator is NOT deprecated, but rather that it has
certain drawbacks?

[The context for this is my recent thread on the Biopython dev mailing
list about integrating your Bio.Sequencing.Ace parser into Bio.SeqIO
and/or Bio.AlignIO - I've included a little context below.]

Thanks,

Peter

--

Peter wrote:
>> So integrating the "ace" format into Bio.SeqIO representing the
>> consensus sequence of each contig as a SeqRecord would be useful.
>> Initially I would try and represent the aligned reads as SeqFeature
>> objects (much like when reading a genome from a GenBank file you get
>> CDS features with their amino acid translation).
>>
>> Note that for memory reasons, I would be inclined to scan over the Ace
>> file in one pass (using the existing Iterator in the
>> Bio.Sequencing.Ace parser) returning SeqRecords as we go.  As Frank
>> points out in the code comments, this means we can't easily include
>> the WA, CT, RT and WR tags found in the Ace file footer.  Do you use
>> this information Jose?

Jose replied,
> I haven't used the iterator because of the deprecation warning of the code. I
> tried with about 40000 alignments and it worked in a computer with 8 GB of ram.
> If there are more sequences, and there will be with the 454 sequencer, we will
> have trouble reading all at once. I vote for the iterator approach. I have not
> used the information of this tag, but I don't know also what they mean. I've
> been looking for documentation about this format, but I've found none, do you
> have any good ace documentation?

From bugzilla-daemon at portal.open-bio.org  Tue Jun 17 07:23:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Jun 2008 07:23:59 -0400
Subject: [Biopython-dev] [Bug 2520] New: Reading ACE assembly contig files
	in Bio.SeqIO
Message-ID: <bug-2520-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2520

           Summary: Reading ACE assembly contig files in Bio.SeqIO
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


As I suggested on the mailing list, we could use Bio.Sequencing.Ace to parse
ACE assembly files, and then turn each contig into a SeqRecord using the
consensus sequence.

I will attach a basic implementation which only uses the consensus sequence and
its name.  For now this ignores all the meta data and in particular the read
information.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Tue Jun 17 07:29:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Jun 2008 07:29:15 -0400
Subject: [Biopython-dev] [Bug 2520] Reading ACE assembly contig files in
	Bio.SeqIO
In-Reply-To: <bug-2520-42@http.bugzilla.open-bio.org/>
Message-ID: <200806171129.m5HBTFVG026790@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2520


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-17 07:29 EST -------
Created an attachment (id=944)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=944&action=view)
New file Bio/SeqIO/AceIO.py

This new file would be added to Bio.SeqIO in the usual way (updating
Bio/SeqIO/__init__.py to import this module and map the format "ace" to the new
iterator).

Handling different gap characters in Bio.SeqIO (and translating them when
reading and writing files) has not been formalised.  Where possible, converting
them into dashes on loading seems to be a sensible route to take.

Therefore I deliberately map any "*" gap characters in the consensus sequence
into "-" characters, which are used by default in the alphabet class and are
far more commonly used.  The "*" character is typically associated with a stop
codon in protein sequences, which is another reason to avoid using it here.
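
As a rough sketch of the gap mapping described above (the function name `normalise_gaps` is hypothetical, and the attached AceIO.py may well do this differently):

```python
def normalise_gaps(consensus, ace_gap="*", gap="-"):
    """Return the consensus with Ace-style gap characters replaced."""
    # str.replace swaps every occurrence, so the length is unchanged
    # and column positions still line up with the original contig.
    return consensus.replace(ace_gap, gap)


padded = "ACG*TT**A"
dashed = normalise_gaps(padded)
```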


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From fkauff at biologie.uni-kl.de  Tue Jun 17 09:06:34 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Tue, 17 Jun 2008 15:06:34 +0200
Subject: [Biopython-dev] Iterating over Ace contig files
In-Reply-To: <320fb6e00806170216k12ecd88fof60758db1ccec3cf@mail.gmail.com>
References: <320fb6e00806170216k12ecd88fof60758db1ccec3cf@mail.gmail.com>
Message-ID: <4857B6DA.9040309@biologie.uni-kl.de>

Hi Peter,

Makes total sense to me. Feel free to make the changes as you see fit.

Frank


Peter wrote:
> Hello Frank,
>
> I wanted to get your opinion on iterating over the Ace file contig by
> contig, and what is lost in the WA, CT, RT and WR tags at the end of
> the file by doing this.  As large sequencing runs become more common,
> iterating over the file in a single pass WITHOUT keeping everything in
> memory does seem to be desirable.
>
> Similar past discussions:
> http://portal.open-bio.org/pipermail/biopython/2004-February/001825.html
> http://portal.open-bio.org/pipermail/biopython/2005-May/002661.html
>
> Would you object to me rewording your module's header-comment not to
> say that the Ace Iterator is NOT deprecated, but rather that it has
> certain drawbacks.
>
> [The context for this is my recent thread on the Biopython dev mailing
> list about integrating your Bio.Sequencing.Ace parser into Bio.SeqIO
> and/or Bio.AlignIO - I've included a little context below.]
>
> Thanks,
>
> Peter
>
> --
>
> Peter wrote:
>   
>>> So integrating the "ace" format into Bio.SeqIO representing the
>>> consensus sequence of each contig as a SeqRecord would be useful.
>>> Initially I would try and represent the aligned reads as SeqFeature
>>> objects (much like when reading a genome from a GenBank file you get
>>> CDS features with their amino acid translation).
>>>
>>> Note that for memory reasons, I would be inclined to scan over the Ace
>>> file in one pass (using the existing Iterator in the
>>> Bio.Sequencing.Ace parser) returning SeqRecords as we go.  As Frank
>>> points out in the code comments, this means we can't easily include
>>> the WA, CT, RT and WR tags found in the Ace file footer.  Do you use
>>> this information Jose?
>>>       
>
> Jose replied,
>   
>> I haven't used the iterator because of the deprecation warning of the code. I
>> tried with about 40000 alignments and it worked in a computer with 8 GB of ram.
>> I there are more sequences, and there will be with the 454 sequencer, we will
>> have trouble reading all at once. I vote for the iterator approach. I have not
>> used the information of this tag, but I don't know also what they mean. I've
>> been looking for documentation about this format, but I've found none, do you
>> have any good ace documentation?
>>     
>
>   


From biopython at maubp.freeserve.co.uk  Tue Jun 17 09:53:23 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Jun 2008 14:53:23 +0100
Subject: [Biopython-dev] Iterating over Ace contig files
In-Reply-To: <4857B6DA.9040309@biologie.uni-kl.de>
References: <320fb6e00806170216k12ecd88fof60758db1ccec3cf@mail.gmail.com>
	<4857B6DA.9040309@biologie.uni-kl.de>
Message-ID: <320fb6e00806170653g482b104fl739107fcada06dc8@mail.gmail.com>

On Tue, Jun 17, 2008 at 2:06 PM, Frank Kauff <fkauff at biologie.uni-kl.de> wrote:
> Hi Peter,
>
> makes totally sense to me. Feel free to do the changes as you see it fit
>
> Frank

Thanks Frank.

I've checked in some comment changes to both Ace.py and Phd.py, aimed
at both improving the documentation and trying to make epydoc happier
for the automatic API documentation:
http://biopython.org/DIST/docs/api/

Peter

P.S. I also added an __iter__ method to the Ace Iterator (Phd already had one).

From mjldehoon at yahoo.com  Tue Jun 17 10:08:31 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 17 Jun 2008 07:08:31 -0700 (PDT)
Subject: [Biopython-dev] Iterating over Ace contig files
In-Reply-To: <320fb6e00806170653g482b104fl739107fcada06dc8@mail.gmail.com>
Message-ID: <399611.60966.qm@web62415.mail.re1.yahoo.com>

Note that bug #2454 also pertains to the Ace and Phd parsers. If you are modifying the Ace and Phd parsers, can you fix this bug at the same time?

http://bugzilla.open-bio.org/show_bug.cgi?id=2454

--Michiel.

Peter <biopython at maubp.freeserve.co.uk> wrote: On Tue, Jun 17, 2008 at 2:06 PM, Frank Kauff  wrote:
> Hi Peter,
>
> makes totally sense to me. Feel free to do the changes as you see it fit
>
> Frank

Thanks Frank.

I've checked in some comment changes to both Ace.py and Phd.py, aimed
at both improving the documentation and trying and make epydoc happier
for the automatic API documentation:
http://biopython.org/DIST/docs/api/

Peter

P.S. I also added an __iter__ method to the Ace Iterator (Phd already had one).
_______________________________________________
Biopython-dev mailing list
Biopython-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython-dev


From bugzilla-daemon at portal.open-bio.org  Tue Jun 17 10:43:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Jun 2008 10:43:42 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806171443.m5HEhgua005645@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-17 10:43 EST -------
I've removed the strict file-like test in:

Bio/Sequencing/Ace.py revision: 1.12
Bio/Sequencing/Phd.py revision: 1.6

In these cases, the handle is immediately turned into an UndoHandle which will
be able to check for a sufficiently file-like object.

Hopefully that's what you meant Michiel - we could go further and introduce a
parse() function and deprecate the Iterator objects in these modules.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Wed Jun 18 06:34:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 18 Jun 2008 06:34:43 -0400
Subject: [Biopython-dev] [Bug 2503] An error when parsing NCBIWWW Blast
	output
In-Reply-To: <bug-2503-42@http.bugzilla.open-bio.org/>
Message-ID: <200806181034.m5IAYhS1026214@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2503


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-18 06:34 EST -------
I'm closing this bug as "INVALID" due to a lack of information.

If you are still having trouble Prashantha, and can give us some more
information, please re-open this bug.

Thank you.

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Wed Jun 18 07:34:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 18 Jun 2008 07:34:26 -0400
Subject: [Biopython-dev] [Bug 2497] Unit tests do not cover
	Bio.Blast.NCBIWWW.qblast()
In-Reply-To: <bug-2497-42@http.bugzilla.open-bio.org/>
Message-ID: <200806181134.m5IBYQjC032061@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2497


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-18 07:34 EST -------
I checked in a slightly revised version of this as test_NCBI_qblast.py -
marking this bug as fixed.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Wed Jun 18 08:01:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 18 Jun 2008 08:01:11 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
	for '-F F' and make it safe
In-Reply-To: <bug-2508-42@http.bugzilla.open-bio.org/>
Message-ID: <200806181201.m5IC1BxA001255@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-18 08:01 EST -------
Created an attachment (id=946)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=946&action=view)
Patch to Bio/Blast/NCBIStandalone.py and Tests/test_NCBIStandalone.py

Suggested patch for the command injection risk.

Can anyone think of a legitimate reason for a ; or & character in the
parameters of a BLAST command line?  This patch is very simple and will reject
any keyword parameter containing the ; or && characters.
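
A minimal sketch of this kind of check, assuming the parameters arrive as a dictionary of keyword arguments. The function name `check_parameters` is hypothetical and this is not the code in the attached patch:

```python
def check_parameters(params, forbidden=(";", "&&")):
    """Raise ValueError if any parameter value contains a forbidden string."""
    for key, value in params.items():
        text = str(value)  # values may be ints or floats, not just strings
        for bad in forbidden:
            if bad in text:
                raise ValueError(
                    "Rejecting suspicious argument for %s: %r" % (key, text)
                )


# Ordinary BLAST-style parameters pass silently.
check_parameters({"expectation": 0.001, "matrix": "BLOSUM62"})
```

A block list like this is only a stopgap. Passing the arguments as a list to the subprocess module without a shell avoids the problem more thoroughly.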


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk  Wed Jun 18 10:00:56 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 18 Jun 2008 15:00:56 +0100
Subject: [Biopython-dev] SeqRecord to file format as string
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B63C@mail2.exch.c2b2.columbia.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
	<47147341.4020708@maubp.freeserve.co.uk>
	<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
	<4714EB8E.3000700@maubp.freeserve.co.uk>
	<6243BAA9F5E0D24DA41B27997D1FD14402B63C@mail2.exch.c2b2.columbia.edu>
Message-ID: <320fb6e00806180700k327e6913m7ba9c4bdc3421f67@mail.gmail.com>

This is returning to a thread last year, about getting a SeqRecord
into a string in a particular file format (e.g. fasta).  Jared Flatow
had suggested adding a method to the SeqRecord itself.

Jared wrote:
>  > ... To always have to write to a file feels strange, but I see
>  > that it would be messy to go OO since there are so many formats.
>  > However, giving preference to fasta over other formats by making it
>  > innate doesn't seem like such a terrible idea. I do have mixed
>  > feelings about 'bloating' the code which is why I asked, and you have
>  > convinced me that this is not quite appropriate given existing
>  > convention. However the idea would be to put the to_fasta or
>  > to_format method inside the SeqRecord, then to call it from the IO
>  > when needed to actually write to a file, but call it directly when
>  > all that is wanted is a string...
>
> Its debatable isn't it?  I suspect that for most users, when they want a
> record in a particular file format its for writing to a file.  However,
> adding a to_format() method to a SeqRecord makes some sense (suitable for
> sequential file formats only).  This would take a format name and return
> a string, by calling Bio.SeqIO with a StringIO object internally.
>
> Peter

Jared - On reflection, do you think adding a method like this to the
SeqRecord (or even just for the FASTA format) would be useful?

I recently found myself wanting to use this sort of functionality, and
remembered this old thread.  This time I was wondering about using the
method name tostring (matching the name of a Seq object method).  In
order to mimic the Seq object's method, the format would be optional
and when omitted would give the sequence as a string.  Otherwise one
of the lower case strings used in Bio.SeqIO should be supplied.  There
is a sample implementation at the end of this email.

On Wed, Oct 17, 2007 Michiel De Hoon wrote:
> How about the following:
>
> SeqIO.write(sequences, handle, format) returns the properly formatted string
> if handle==None.

I can see the above is simpler than having to supply a StringIO
handle, but it doesn't make the functionality available directly from
the SeqRecord object.  It also complicates the API of the SeqIO module
with a special case.

Peter

--

######################################
For the SeqRecord class, in Bio/SeqRecord.py
######################################
    def tostring(self, format=None) :
        """Returns the record as a string in the specified file format.

        If the file format is omitted (default), the sequence itself is
        returned as a string.

        Otherwise the format should be a lower case string supported by
        Bio.SeqIO, which is used to turn the SeqRecord into a string."""
        if format :
            from StringIO import StringIO
            from Bio import SeqIO
            handle = StringIO()
            SeqIO.write([self], handle, format)
            handle.seek(0)
            return handle.read()
        else :
            #Return the sequence as a string
            return self.seq.tostring()
############################################


From jflatow at northwestern.edu  Wed Jun 18 11:25:18 2008
From: jflatow at northwestern.edu (Jared Flatow)
Date: Wed, 18 Jun 2008 10:25:18 -0500
Subject: [Biopython-dev] SeqRecord to file format as string
In-Reply-To: <4D53AB82-F673-4F4F-BCEC-BA06088E8721@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
	<47147341.4020708@maubp.freeserve.co.uk>
	<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
	<4714EB8E.3000700@maubp.freeserve.co.uk>
	<6243BAA9F5E0D24DA41B27997D1FD14402B63C@mail2.exch.c2b2.columbia.edu>
	<320fb6e00806180700k327e6913m7ba9c4bdc3421f67@mail.gmail.com>
	<4D53AB82-F673-4F4F-BCEC-BA06088E8721@northwestern.edu>
Message-ID: <55567F98-C5F5-4A2F-8542-502F17F485E9@northwestern.edu>

Quick correction:

On Jun 18, 2008, at 10:16 AM, Jared Flatow wrote:

> Hi Peter,
>
> On Jun 18, 2008, at 9:00 AM, Peter wrote:
>
>> Jared - On reflection, do you think adding a method like this to the
>> SeqRecord (or even just for the FASTA format) would be useful?
>
> Yes I still think so. In fact, for sequences, I would say that I  
> pretty much never deal with a format other than FASTA, so even making
> the __str__ method of SeqRecord return the FASTA format as well  
> seems reasonable, though perhaps my use cases are different than  
> others.
>
> However, py3k and 2.6 will make available the functionality  
> described in PEP 3101:
>
> http://www.python.org/dev/peps/pep-3101/
>
> I think it would be best to define some semantics that are  
> compatible with this PEP. This would basically mean using the  
> __format__ method (which could be the same as the tostring method  
> you have defined below). To achieve backward compatibility and/or a  
> more OO interface, tostring could just be an alias for __format__.  
> Thus, instead of calling format(seq_rec, 'fasta') one could call  
> seq_rec.tostring('fasta') and these would be equivalent. The PEP  
> also states that format(seq_rec) should be the same as str(seq_rec).

On second thought it seems like a .format method (similar to the one  
the string class is acquiring) should be used as an alias to  
__format__ (somehow I think tostring should always be the same as  
__str__)

> In short, I think creating methods to return formatted versions of  
> objects (SeqRecords) is a good idea, but most especially if it is  
> done in a way consistent with the language's vision.
>
> Best,
> jared


From bugzilla-daemon at portal.open-bio.org  Wed Jun 18 11:36:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 18 Jun 2008 11:36:48 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806181536.m5IFamvB015695@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #16 from mdehoon at ims.u-tokyo.ac.jp  2008-06-18 11:36 EST -------
(In reply to comment #15)
> I've removed the strict file-like test in:
> 
> Bio/Sequencing/Ace.py revision: 1.12
> Bio/Sequencing/Phd.py revision: 1.6
> 
> In these cases, the handle is immediately turned into an UndoHandle which will
> be able to check for a sufficiently file like object.
> 
> Hopefully that's what you meant Michiel

Actually, I think we should avoid using an UndoHandle altogether, now that
Python has generator functions.

> - we could go further and introduce a
> parse() function and deprecate the Iterator objects in these modules.
> 
That would make things a lot easier. An Iterator class was useful in older
versions of Python, but generator functions provide a cleaner alternative.

In Ace.py, we'd need three functions:

1) read(handle), which returns one record (Contig) read from the handle, and
None otherwise;

2) parse(handle), a generator function returning an iterator over the records;

3) a local function _process_line(line, record)

These functions then look like this:

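def read(handle):
    # Skip ahead to the first contig ('CO') line.
    for line in handle:
        if line[:2]=='CO':
            break
    else:
        # No contig found in the handle.
        return None
    record = Contig()
    # The CO line carries the contig header, so process it as parse() does.
    _process_line(line, record)
    for line in handle:
        if line[:2]=='CO':
            # Start of the next contig; this one is complete.
            break
        _process_line(line, record)
    # Also reached at the end of the handle, so the last contig is not lost.
    return record

def parse(handle):
    record = None
    for line in handle:
        if line[:2]=='CO':
            if record is not None:
                yield record
            record = Contig()
        if record is not None:
            # Ignore any lines before the first contig.
            _process_line(line, record)
    if record is not None:
        # A generator must yield (not return) the final record.
        yield record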

The actual work is done in _process_line.

So we don't need to store the read lines explicitly; this is now taken care of
by the generator function. Hence, we don't need to convert the handle to an
UndoHandle. In addition, handle can now also be a list of lines instead of a
file handle. In this respect, I think Zachary was right in comment #11:

> Maybe it's a good idea for any parsers/iterators to just
> use the iterator-like ability of file handles?

In other words, as long as we can pull lines from the handle, we can parse it.

In Phd.py, it's even simpler. Here, we only need the read() and parse()
function:

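def read(handle):
    record = None
    for line in handle:
        if line.startswith("BEGIN_SEQUENCE"):
            record = Record()
        elif line.startswith("END_SEQUENCE"):
            return record
        elif record is not None:
            # do the actual processing of the other lines here
            pass
    # End of the handle reached without a complete record.
    return None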

def parse(handle):
    while True:
        record = read(handle)
        if not record:
            return
        yield record

Again, we can process each line just as they come along. No UndoHandle, no
Parser, no Consumer, no Scanner needed.
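
To illustrate the point that anything yielding lines can be parsed, here is a small self-contained sketch of the generator approach. It uses a plain dict as a stand-in record, which is an assumption for illustration only and not the Bio.Sequencing.Ace data model:

```python
def parse(handle):
    """Yield one dict per contig from any iterable of lines."""
    record = None
    for line in handle:
        if line.startswith("CO "):
            if record is not None:
                yield record
            # Second field of the CO line is the contig name.
            record = {"name": line.split()[1], "lines": []}
        elif record is not None:
            # Lines before the first CO line (e.g. the AS header) are skipped.
            record["lines"].append(line.rstrip("\n"))
    if record is not None:
        yield record


# A list of strings works just as well as an open file handle.
lines = [
    "AS 2 3\n",
    "CO Contig1 8 2 1 U\n",
    "ACGT*ACG\n",
    "CO Contig2 4 1 1 U\n",
    "TTGA\n",
]
names = [record["name"] for record in parse(lines)]
```

Only one record is held in memory at a time, which is the property wanted for large Ace files.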


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From jflatow at northwestern.edu  Wed Jun 18 11:16:59 2008
From: jflatow at northwestern.edu (Jared Flatow)
Date: Wed, 18 Jun 2008 10:16:59 -0500
Subject: [Biopython-dev] SeqRecord to file format as string
In-Reply-To: <320fb6e00806180700k327e6913m7ba9c4bdc3421f67@mail.gmail.com>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
	<47147341.4020708@maubp.freeserve.co.uk>
	<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
	<4714EB8E.3000700@maubp.freeserve.co.uk>
	<6243BAA9F5E0D24DA41B27997D1FD14402B63C@mail2.exch.c2b2.columbia.edu>
	<320fb6e00806180700k327e6913m7ba9c4bdc3421f67@mail.gmail.com>
Message-ID: <4D53AB82-F673-4F4F-BCEC-BA06088E8721@northwestern.edu>

Hi Peter,

On Jun 18, 2008, at 9:00 AM, Peter wrote:

> Jared - On reflection, do you think adding a method like this to the
> SeqRecord (or even just for the FASTA format) would be useful?

Yes I still think so. In fact, for sequences, I would say that I  
pretty much never deal with a format other than FASTA, so even making
the __str__ method of SeqRecord return the FASTA format as well seems  
reasonable, though perhaps my use cases are different than others.

However, py3k and 2.6 will make available the functionality described  
in PEP 3101:

http://www.python.org/dev/peps/pep-3101/

I think it would be best to define some semantics that are compatible  
with this PEP. This would basically mean using the __format__ method  
(which could be the same as the tostring method you have defined  
below). To achieve backward compatibility and/or a more OO interface,  
tostring could just be an alias for __format__. Thus, instead of  
calling format(seq_rec, 'fasta') one could call  
seq_rec.tostring('fasta') and these would be equivalent. The PEP also  
states that format(seq_rec) should be the same as str(seq_rec).
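
A minimal sketch of the PEP 3101 idea, using a stand-in `Record` class rather than Biopython's SeqRecord. The class, its attributes, and the hand-written FASTA output are assumptions for illustration; a real implementation would presumably delegate to Bio.SeqIO. The `format` method alias follows the suggestion made in the follow-up message in this thread.

```python
class Record:
    """Stand-in for a sequence record; not Biopython's SeqRecord."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __str__(self):
        return self.seq

    def __format__(self, format_spec):
        # PEP 3101: an empty format spec should behave like str().
        if not format_spec:
            return str(self)
        if format_spec == "fasta":
            return ">%s\n%s\n" % (self.id, self.seq)
        raise ValueError("Unknown format %r" % format_spec)

    def format(self, format_spec=""):
        """Method-style alias for the built-in format()."""
        return self.__format__(format_spec)


rec = Record("seq1", "ACGT")
fasta_text = format(rec, "fasta")
```

With `__format__` defined, the record also works inside format strings such as `"{0:fasta}".format(rec)`.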

In short, I think creating methods to return formatted versions of  
objects (SeqRecords) is a good idea, but most especially if it is done  
in a way consistent with the language's vision.

Best,
jared

From yair.benita at gmail.com  Wed Jun 18 13:26:02 2008
From: yair.benita at gmail.com (Yair Benita)
Date: Wed, 18 Jun 2008 13:26:02 -0400
Subject: [Biopython-dev] BioPax parser
Message-ID: <C47EBD6A.1B29A%yair.benita@gmail.com>

Hi Guys,
Does anyone have a biopax parser written in python?
Thanks,
Yair


From biopython at maubp.freeserve.co.uk  Wed Jun 18 13:42:13 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 18 Jun 2008 18:42:13 +0100
Subject: [Biopython-dev] BioPax parser
In-Reply-To: <C47EBD6A.1B29A%yair.benita@gmail.com>
References: <C47EBD6A.1B29A%yair.benita@gmail.com>
Message-ID: <320fb6e00806181042y169f580epbd8c876eb3cb57fa@mail.gmail.com>

On Wed, Jun 18, 2008 at 6:26 PM, Yair Benita <yair.benita at gmail.com> wrote:
> Hi Guys,
> Does anyone have a biopax parser written in python?
> Thanks,
> Yair

I don't know of any (but I haven't searched).  From a quick look on
www.biopax.org they use XML, so you should be able to parse it in
python fairly easily - but I guess some sort of object orientated
representation of the data would be very nice to have.

Peter

From bugzilla-daemon at portal.open-bio.org  Thu Jun 19 06:08:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 19 Jun 2008 06:08:55 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
	for '-F F' and make it safe
In-Reply-To: <bug-2508-42@http.bugzilla.open-bio.org/>
Message-ID: <200806191008.m5JA8t0v016495@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-19 06:08 EST -------
On the issue of the low-complexity filter, that is actually already supported
in NCBIStandalone.blastall(), NCBIStandalone.blastpgp() and
NCBIStandalone.rpsblast() using the optional argument 'filter'.  This is
described in the doc string too, although it doesn't use the phrase "low
complexity" which might be clearer.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Jun 19 06:20:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 19 Jun 2008 06:20:03 -0400
Subject: [Biopython-dev] [Bug 2494] _retrieve_taxon in BioSQL.py needs
	urgent optimization
In-Reply-To: <bug-2494-42@http.bugzilla.open-bio.org/>
Message-ID: <200806191020.m5JAK3OZ017201@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2494


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-19 06:20 EST -------
I'm marking this as fixed now, but if anyone does find an issue with it please
re-open the bug.  Thanks for your work on this Eric.

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Jun 19 06:41:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 19 Jun 2008 06:41:22 -0400
Subject: [Biopython-dev] [Bug 2408] GenBank records do not contain U's
In-Reply-To: <bug-2408-42@http.bugzilla.open-bio.org/>
Message-ID: <200806191041.m5JAfMNK018058@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2408


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-19 06:41 EST -------
Given there were no other opinions voiced on how to handle this, I went ahead
and fixed this in Bio/GenBank/__init__.py CVS revision 1.83

For records from RNA, if the sequence contains T but not U, we will use a DNA
alphabet in the Seq object.
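
The rule above can be sketched as a small helper. This is only an illustration of the decision described in this comment: the name choose_alphabet and its plain string return values are hypothetical, and it is not the code committed to Bio/GenBank/__init__.py.

```python
def choose_alphabet(molecule_type, sequence):
    """Return "DNA" or "RNA" for a nucleotide record (hypothetical helper)."""
    letters = set(sequence.upper())
    if "RNA" in molecule_type.upper():
        # A record labelled as RNA but written with T and no U
        # is treated as DNA, following the rule described above.
        if "T" in letters and "U" not in letters:
            return "DNA"
        return "RNA"
    # Anything not labelled as RNA is treated as DNA in this sketch.
    return "DNA"
```

For example, choose_alphabet("mRNA", "ATGGCT") gives "DNA", while choose_alphabet("mRNA", "AUGGCU") gives "RNA".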

Thanks for raising this Marcin.



From mjldehoon at yahoo.com  Thu Jun 19 09:04:31 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 19 Jun 2008 06:04:31 -0700 (PDT)
Subject: [Biopython-dev] Bio.CDD, anyone?
Message-ID: <14893.84074.qm@web62409.mail.re1.yahoo.com>

Hi everybody,

Bio.CDD is a module with a parser for CDD (NCBI's Conserved Domain Database) records. The parser parses HTML pages from CDD's web site. Since the parser was written about six years ago, the CDD web site has changed considerably. Bio.CDD therefore cannot parse current HTML pages from CDD.

So I am wondering:
1) Is anybody using Bio.CDD?
2) Is anybody willing to update Bio.CDD to handle current HTML?
3) If not, can we deprecate it? There is not much point in having a parser for HTML pages from years ago.

--Michiel.


From biopython at maubp.freeserve.co.uk  Thu Jun 19 09:38:29 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 19 Jun 2008 14:38:29 +0100
Subject: [Biopython-dev] Bio.CDD, anyone?
In-Reply-To: <14893.84074.qm@web62409.mail.re1.yahoo.com>
References: <14893.84074.qm@web62409.mail.re1.yahoo.com>
Message-ID: <320fb6e00806190638y2e3729e1ga66561de0c962700@mail.gmail.com>

> Bio.CDD is a module with a parser for CDD (NCBI's Conserved Domain Database)
> records. The parser parses HTML pages from CDD's web site. Since the parser
> was written about six years ago, the CDD web site has changed considerably.
> Bio.CDD therefore cannot parse current HTML pages from CDD.

A couple of years ago, I wanted to get the CDD domain name and
description and ended up writing my own very simple and crude parser
to extract just this information.  Doing a proper job would mean
extracting lots and lots of fields, e.g.
http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=29475

I wonder if the NCBI make any of this available as XML via Entrez?  I
had a quick look and couldn't find anything.

Peter

From mjldehoon at yahoo.com  Thu Jun 19 09:58:25 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 19 Jun 2008 06:58:25 -0700 (PDT)
Subject: [Biopython-dev] Bio.CDD, anyone?
In-Reply-To: <320fb6e00806190638y2e3729e1ga66561de0c962700@mail.gmail.com>
Message-ID: <352888.20937.qm@web62409.mail.re1.yahoo.com>

> I wonder if the NCBI make any of this available as XML via Entrez?  I
> had a quick look and couldn't find anything.

Actually I already asked this question to NCBI. Their answer was that a subset of the information shown on the web page is available as XML via Entrez's ESummary and EFetch (and thus available from Biopython). The full CDD records are stored as one large file, which is obtainable from NCBI's ftp site, but currently it is not possible to get individual CDD records except in HTML form through the NCBI website.

--Michiel.




From biopython at maubp.freeserve.co.uk  Thu Jun 19 17:08:13 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 19 Jun 2008 22:08:13 +0100
Subject: [Biopython-dev] test_Entrez.py fails on Linux?
Message-ID: <320fb6e00806191408t45a45da8hda0c2fc8a39aae57@mail.gmail.com>

Hi Michiel,

I've just tried the unit tests on a clean checkout on Linux, and there
is a problem with test_Entrez.py (shown below).  I'm pretty sure it
was working for me on Mac OS X this afternoon, so this may be platform
specific.  I haven't been using Biopython on Windows recently so I don't
know if that is working or not.

If you can't reproduce this, let me know and I'll do some investigation
here.  The good news is all the other tests seem fine on Linux (bar
the GFF, dnal and the population genetics tests for which I don't have
the external dependencies installed).

Peter

This is the output I get on python 2.4.3, using 64bit Ubuntu Dapper
Drake (a little old now).

maubp at shuttle2:~/repository/biopython/Tests$ python test_Entrez.py
Test parsing database list returned by EInfo ... ok
Test parsing database info returned by EInfo ... ok
Test parsing XML returned by ESearch from the Journals database ... ok
Test parsing XML returned by ESearch when no items were found ... ok
Test parsing XML returned by ESearch from the Nucleotide database ... ok
Test parsing XML returned by ESearch from PubMed Central ... ok
Test parsing XML returned by ESearch from the Protein database ... ok
Test parsing XML returned by ESearch from PubMed (first test) ... ok
Test parsing XML returned by ESearch from PubMed (second test) ... ok
Test parsing XML returned by ESearch from PubMed (third test) ... ok
Test parsing XML returned by EPost ... ok
Test parsing XML returned by EPost with an invalid id (overflow tag) ... ok
Test parsing XML returned by EPost with incorrect arguments ... ERROR
Test parsing XML returned by ESummary from the Journals database ... ok
Test parsing XML returned by ESummary from the Nucleotide database ... ok
Test parsing XML returned by ESummary from the Protein database ... ok
Test parsing XML returned by ESummary from PubMed ... ok
Test parsing XML returned by ESummary from the Structure database ... ok
Test parsing XML returned by ESummary from the Taxonomy database ... ok
Test parsing XML returned by ESummary from the UniSTS database ... ok
Test parsing XML returned by ESummary with incorrect arguments ... ERROR
Test parsing cancerchromosomes links returned by ELink ... ok
Test parsing medline indexed articles returned by ELink ... ok
Test parsing Nucleotide to Protein links returned by ELink ... ok
Test parsing pubmed links returned by ELink (first test) ... ok
Test parsing pubmed links returned by ELink (second test) ... ok
Test parsing pubmed link returned by ELink (third test) ... ok
Test parsing pubmed links returned by ELink (fourth test) ... ok
Test parsing pubmed links returned by ELink (fifth test) ... ok
Test parsing pubmed links returned by ELink (sixth test) ... ok
Test parsing XML returned by EFetch, Journals database ... ok
Test parsing XML returned by EFetch, Nucleotide database (first test) ... ok
Test parsing XML returned by EFetch, Protein database ... ok
Test parsing XML returned by EFetch, OMIM database ... ok
Test parsing XML returned by EFetch, PubMed database (first test) ... ok
Test parsing XML returned by EFetch, PubMed database (second test) ... ok
Test parsing XML returned by EFetch, Taxonomy database ... ok
Test parsing XML output returned by EGQuery (first test) ... ok
Test parsing XML output returned by EGQuery (second test) ... ok
Test parsing XML output returned by ESpell ... ok

======================================================================
ERROR: Test parsing XML returned by EPost with incorrect arguments
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Entrez.py", line 560, in t_wrong
    assert exception.message=="Wrong DB name"
AttributeError: RuntimeError instance has no attribute 'message'

======================================================================
ERROR: Test parsing XML returned by ESummary with incorrect arguments
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Entrez.py", line 943, in t_wrong
    assert exception.message=="Neither query_key nor id specified"
AttributeError: RuntimeError instance has no attribute 'message'

----------------------------------------------------------------------
Ran 40 tests in 0.471s

FAILED (errors=2)

From biopython at maubp.freeserve.co.uk  Fri Jun 20 05:31:21 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 20 Jun 2008 10:31:21 +0100
Subject: [Biopython-dev] test_Entrez.py fails on Linux?
In-Reply-To: <320fb6e00806191408t45a45da8hda0c2fc8a39aae57@mail.gmail.com>
References: <320fb6e00806191408t45a45da8hda0c2fc8a39aae57@mail.gmail.com>
Message-ID: <320fb6e00806200231y716c5a1ds2495f16a56a15f88@mail.gmail.com>

> Hi Michiel,
>
> I've just tried the unit tests on a clean checkout on Linux, and there
> is a problem with test_Entrez.py (shown below).  I'm pretty sure it
> was working for me on Mac OS X this afternoon, so this may be platform
> specific.  I haven't been using Biopython on Windows recently so I don't
> know if that is working or not.

I've just checked, and on a clean CVS checkout under Mac OS 10.5
Leopard with python 2.5.2, test_Entrez.py passes.

A clean check out last night on 64bit Ubuntu Dapper Drake with python
2.4.3 failed.

So whatever is going wrong is probably OS specific or perhaps python
version specific.

Peter

From bugzilla-daemon at portal.open-bio.org  Fri Jun 20 06:07:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Jun 2008 06:07:59 -0400
Subject: [Biopython-dev] [Bug 2524] New: Handle missing libraries like
	TextTools in run_tests.py
Message-ID: <bug-2524-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2524

           Summary: Handle missing libraries like TextTools in run_tests.py
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Documentation
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


Once upon a time, we treated any ImportError from a unit test as a reason to
skip the test gracefully, as these are *usually* from missing external
dependencies.  This could hide real errors if we had (re)moved a Biopython
module.

We now use the Bio.MissingExternalDependencyError exception, and the unit tests
themselves will raise this for missing command line tools or certain optional
libraries like MySQLdb.

However, the Bio.MissingExternalDependencyError exception does not get raised
when the following commonly used external dependencies are missing:

import TextTools
import Numeric
import reportlab

It is now possible to install Biopython without TextTools and reportlab (and
Numeric?), and make use of a lot of its functionality - but the unit tests give
nasty error messages.

I propose we either:

(a) Add a special case to run_tests.py to catch specific ImportError cases and
skip the test with a suitable message (patch to follow).  Specifically
TextTools, reportlab and Numeric - but potentially other third party libraries
like MySQLdb could be handled too.  This keeps the individual unit tests
simple.

or:

(b) Modify all the tests using these semi-optional libraries to catch the
ImportError and raise MissingExternalDependencyError instead.  As the tests
themselves generally don't directly import the external library this is perhaps
messy.



From bugzilla-daemon at portal.open-bio.org  Fri Jun 20 06:09:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Jun 2008 06:09:37 -0400
Subject: [Biopython-dev] [Bug 2524] Handle missing libraries like TextTools
	in run_tests.py
In-Reply-To: <bug-2524-42@http.bugzilla.open-bio.org/>
Message-ID: <200806201009.m5KA9b98019988@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2524


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-20 06:09 EST -------
Created an attachment (id=948)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=948&action=view)
Patch to Tests/run_tests.py

Adds a hard coded list of known import errors to be treated as missing external
dependencies (i.e. skip the test).

This is implemented as a dict allowing a URL to be given.
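
A rough sketch of that idea follows. It is not the attached patch: the names KNOWN_OPTIONAL_LIBRARIES and run_or_skip are hypothetical, the URLs are placeholders, and a local exception class stands in for Bio.MissingExternalDependencyError so the example is self-contained. It is written for current Python, where ImportError carries a name attribute.

```python
class MissingExternalDependencyError(Exception):
    """Local stand-in for Bio.MissingExternalDependencyError."""


# Hypothetical mapping of optional third-party libraries to a place
# where they can be obtained. The URLs are placeholders only.
KNOWN_OPTIONAL_LIBRARIES = {
    "TextTools": "https://example.org/TextTools",
    "Numeric": "https://example.org/Numeric",
    "reportlab": "https://example.org/reportlab",
}


def run_or_skip(test_function):
    """Run a test, turning known ImportErrors into a 'skip' exception."""
    try:
        test_function()
    except ImportError as error:
        # error.name holds the module that could not be imported.
        name = getattr(error, "name", None)
        if name in KNOWN_OPTIONAL_LIBRARIES:
            raise MissingExternalDependencyError(
                "Install %s (%s) to run this test"
                % (name, KNOWN_OPTIONAL_LIBRARIES[name])
            )
        # Any other ImportError is a real problem and must not be hidden.
        raise
```

An ImportError for a module that is not in the dict (for example a Biopython module that was moved or removed) still propagates, which avoids the problem of hiding real errors.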



From bugzilla-daemon at portal.open-bio.org  Fri Jun 20 06:16:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Jun 2008 06:16:49 -0400
Subject: [Biopython-dev] [Bug 2525] New: The unit tests GUI run_tests.py
	does not track skipped tests
Message-ID: <bug-2525-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2525

           Summary: The unit tests GUI run_tests.py does not track skipped
                    tests
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


Running run_tests.py without the --no-gui command line option counts any
skipped tests as passed (green).  Furthermore, the skipped message is just
printed to the command line (if run from a terminal).

Ideally the test framework would report these skipped tests in the GUI, perhaps
even with a clickable entry (like the failures) to show the message.

[On a personal note, I never use the run_tests.py GUI, and would rather it was
not the default.  If no one likes it, we could just remove the GUI]



From bugzilla-daemon at portal.open-bio.org  Fri Jun 20 08:17:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Jun 2008 08:17:15 -0400
Subject: [Biopython-dev] [Bug 2525] The unit tests GUI run_tests.py does not
	track skipped tests
In-Reply-To: <bug-2525-42@http.bugzilla.open-bio.org/>
Message-ID: <200806201217.m5KCHFoF025054@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2525


------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp  2008-06-20 08:17 EST -------
> [On a personal note, I never use the run_tests.py GUI, and would rather it was
> not the default.  If no one likes it, we could just remove the GUI]
> 
Personally, I don't see the advantage of the GUI, and I can live without it.



From mjldehoon at yahoo.com  Fri Jun 20 08:14:30 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 20 Jun 2008 05:14:30 -0700 (PDT)
Subject: [Biopython-dev] test_Entrez.py fails on Linux?
In-Reply-To: <320fb6e00806200231y716c5a1ds2495f16a56a15f88@mail.gmail.com>
Message-ID: <795994.35527.qm@web62408.mail.re1.yahoo.com>

Hi Peter,

Thanks for letting me know.

It turned out that there were two problems with older Python versions (2.3 and 2.4).
One issue was not in Bio.Entrez but in the test script itself, using a feature that is only available in Python 2.5. This is now fixed in CVS.
The second issue is with Python 2.3: It does not copy data files to the build directory. Then, when you run "python run_tests.py test_Entrez.py" you will get many error messages about missing DTD files. If you run "python test_Entrez.py" instead, the tests are done from the installed Biopython instead of the one in the build directory, and then no errors occur.
I guess the only way to solve this is to modify run_tests.py to skip test_Entrez if Python is version 2.3. Unless somebody else has a better suggestion, I will do that.
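
The traceback in the original report is consistent with the first issue: the message attribute on exceptions was introduced in Python 2.5, so it does not exist in Python 2.4 (it was later removed again in Python 3). One portable way to check the text of an exception is str(exception). The sketch below is an assumption about how such a check could be written, not the fix committed to CVS; post_with_wrong_db is a hypothetical stand-in for the call under test, and the code uses current Python syntax.

```python
def post_with_wrong_db():
    """Hypothetical stand-in for a call that rejects bad input."""
    raise RuntimeError("Wrong DB name")


def error_text(function):
    """Call function and return the text of the RuntimeError it raises."""
    try:
        function()
    except RuntimeError as exception:
        # str(exception) works whether or not a .message attribute exists.
        return str(exception)
    # No exception was raised.
    return None
```

A test can then compare error_text(post_with_wrong_db) with the expected string instead of reading exception.message.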

--Michiel.



From biopython at maubp.freeserve.co.uk  Fri Jun 20 08:43:55 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 20 Jun 2008 13:43:55 +0100
Subject: [Biopython-dev] test_Entrez.py fails on Linux?
In-Reply-To: <795994.35527.qm@web62408.mail.re1.yahoo.com>
References: <320fb6e00806200231y716c5a1ds2495f16a56a15f88@mail.gmail.com>
	<795994.35527.qm@web62408.mail.re1.yahoo.com>
Message-ID: <320fb6e00806200543u62d385fcka3aa9026986549ba@mail.gmail.com>

On Fri, Jun 20, 2008 at 1:14 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Peter,
>
> Thanks for letting me know.
>
> It turned out that there were two problems with older Python versions (2.3 and 2.4).
> One issue was not in Bio.Entrez but in the test script itself, using a
> feature that is only available in Python 2.5. This is now fixed in CVS.

Good work.

> The second issue is with Python 2.3: It does not copy data files to the
> build directory. Then, when you run "python run_tests.py test_Entrez.py"
> you will get many error messages about missing DTD files. If you run
> "python test_Entrez.py" instead, the tests are done from the installed
> Biopython instead of the one in the build directory, and then no errors occur.

I had suspected there was something like this happening on my Windows
machine (which is on python 2.3) but at the time you were still busy
updating the code so I didn't worry about it.

This issue with non-python files in the build directory reminds me of
something Tiago found with his Population Genetics work.  I'd have to
go over the old emails to double check.

> I guess the only way to solve this is to modify run_tests.py to skip
> test_Entrez if Python is version 2.3. Unless somebody else has a better
> suggestion, I will do that.

We could modify setup.py under python 2.3 to make sure these files are
copied.  Is this related to the (reverted) package_data change you
tried recently?

Peter

From biopython at maubp.freeserve.co.uk  Fri Jun 20 09:23:21 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 20 Jun 2008 14:23:21 +0100
Subject: [Biopython-dev] test_Entrez.py fails on Linux?
In-Reply-To: <320fb6e00806200543u62d385fcka3aa9026986549ba@mail.gmail.com>
References: <320fb6e00806200231y716c5a1ds2495f16a56a15f88@mail.gmail.com>
	<795994.35527.qm@web62408.mail.re1.yahoo.com>
	<320fb6e00806200543u62d385fcka3aa9026986549ba@mail.gmail.com>
Message-ID: <320fb6e00806200623n2148b735t1071aa40b0f24a7c@mail.gmail.com>

>> The second issue is with Python 2.3: It does not copy data files to the
>> build directory. Then, when you run "python run_tests.py test_Entrez.py"
>> you will get many error messages about missing DTD files. If you run
>> "python test_Entrez.py" instead, the tests are done from the installed
>> Biopython instead of the one in the build directory, and then no errors occur.
>
> ...
>
> This issue with non-python files in the build directory reminds me of
> something Tiago found with his Population Genetics work.  I'd have to
> go over the old emails to double check.

I was thinking of bug 2375, where Tiago had to add a workaround for
data files not present in the build directory.
http://bugzilla.open-bio.org/show_bug.cgi?id=2375

Peter

From biopython at maubp.freeserve.co.uk  Fri Jun 20 10:42:57 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 20 Jun 2008 15:42:57 +0100
Subject: [Biopython-dev] SeqRecord to file format as string
In-Reply-To: <4D53AB82-F673-4F4F-BCEC-BA06088E8721@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
	<47147341.4020708@maubp.freeserve.co.uk>
	<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
	<4714EB8E.3000700@maubp.freeserve.co.uk>
	<6243BAA9F5E0D24DA41B27997D1FD14402B63C@mail2.exch.c2b2.columbia.edu>
	<320fb6e00806180700k327e6913m7ba9c4bdc3421f67@mail.gmail.com>
	<4D53AB82-F673-4F4F-BCEC-BA06088E8721@northwestern.edu>
Message-ID: <320fb6e00806200742w7e9e57dbt8d0d3362573cf9a@mail.gmail.com>

On Wed, Jun 18, 2008 at 4:16 PM, Jared Flatow <jflatow at northwestern.edu> wrote:
> However, py3k and 2.6 will make available the functionality described in PEP
> 3101:
>
> http://www.python.org/dev/peps/pep-3101/
>
> I think it would be best to define some semantics that are compatible with
> this PEP.

That is interesting - the PEP has been accepted, but I guess we should
wait and see exactly what python 2.6 and 3.0 end up using before
trying to integrate this into the SeqRecord.

> In short, I think creating methods to return formatted versions of objects
> (SeqRecords) is a good idea, but most especially if it is done in a way
> consistent with the language's vision.

That does sound wise - but I'm a little hazy on how exactly PEP-3101
will work in practice for generic complex objects.

Peter

From bugzilla-daemon at portal.open-bio.org  Fri Jun 20 11:01:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Jun 2008 11:01:17 -0400
Subject: [Biopython-dev] [Bug 2526] New: SeqFeature's .id property is not
	preserved in BioSQL
Message-ID: <bug-2526-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2526

           Summary: SeqFeature's .id property is not preserved in BioSQL
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


As per the title, a SeqFeature's .id property is not preserved after a
save/retrieve in BioSQL.

I found this while working on Bug 2235, where my modified "swiss" parser
creates SeqRecord objects with SeqFeature objects which may have their .id set.
Note that in GenBank and EMBL, the SeqFeature objects do not have their id
property set, and so are not affected.

I need to review the BioSQL schema to see if there is a suitable field that
Biopython is ignoring, and if there is, use it.  If not, we can probably use a
tagged qualifier - ideally with the same name as the other Bio* projects.

See also test_BioSQL_SeqIO.py revision 1.17 which includes a workaround to
avoid this limitation.



From jflatow at northwestern.edu  Fri Jun 20 12:16:10 2008
From: jflatow at northwestern.edu (Jared Flatow)
Date: Fri, 20 Jun 2008 11:16:10 -0500
Subject: [Biopython-dev] SeqRecord to file format as string
In-Reply-To: <320fb6e00806200742w7e9e57dbt8d0d3362573cf9a@mail.gmail.com>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
	<47147341.4020708@maubp.freeserve.co.uk>
	<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
	<4714EB8E.3000700@maubp.freeserve.co.uk>
	<6243BAA9F5E0D24DA41B27997D1FD14402B63C@mail2.exch.c2b2.columbia.edu>
	<320fb6e00806180700k327e6913m7ba9c4bdc3421f67@mail.gmail.com>
	<4D53AB82-F673-4F4F-BCEC-BA06088E8721@northwestern.edu>
	<320fb6e00806200742w7e9e57dbt8d0d3362573cf9a@mail.gmail.com>
Message-ID: <0FB6DD30-426C-43F3-BEBE-1728FA1E9D79@northwestern.edu>

On Jun 20, 2008, at 9:42 AM, Peter wrote:

> On Wed, Jun 18, 2008 at 4:16 PM, Jared Flatow <jflatow at northwestern.edu> wrote:
>> However, py3k and 2.6 will make available the functionality described in PEP
>> 3101:
>>
>> http://www.python.org/dev/peps/pep-3101/
>>
>> I think it would be best to define some semantics that are compatible with
>> this PEP.
>
> That is interesting - the PEP has been accepted, but I guess we should
> wait and see exactly what python 2.6 and 3.0 end up using before
> trying to integrate this into the SeqRecord.

I agree, there's a couple of things that may still change, but the  
betas for 2.6 and 3.0 are out and that PEP has been around a while so  
I would say it's pretty much stable. At least as far as how the  
general mechanism will work, I don't believe that is likely to change.

>> In short, I think creating methods to return formatted versions of objects
>> (SeqRecords) is a good idea, but most especially if it is done in a way
>> consistent with the language's vision.
>
> That does sound wise - but I'm a little hazy on how exactly PEP-3101
> will work in practice for generic complex objects.

Yes I had to read it a few times through to understand how exactly it  
will work, here is what I know:

All objects now get the __format__ method which has a signature like  
this:

def __format__(self, format_spec):
	# return a formatted string

The format_spec (format specifier) can be defined by the object, so  
essentially it's totally customizable (if you want to do really crazy  
things there is a Formatter that can be messed with, but we should and  
can avoid this). This object method works like other customizable  
python methods, and there's a corresponding builtin, so calling  
format(obj, "the format specifier") will simply call
obj.__format__("the format specifier"). Thus we can define the
format_spec for a SeqRecord to differentiate between FASTA and  
whatever other formats we want to define.

The string class is also getting a .format method which just calls
the .__format__ method in an OO way instead of using the builtin. We
can do the same thing, and it seems like most use cases will be to
call seq_rec.format('fasta'). This works in all python versions,
although the builtin form format(seq_rec, 'fasta') is only available
in 2.6 and 3.0.

Besides the builtin format, we gain the ability to embed the format  
within other strings. So, using the implementation you provided  
earlier which just returns the underlying Seq as a string if no format  
is specified, we might define the __format__ method like this:

def __format__(self, format_spec=None):
    if format_spec:
        from StringIO import StringIO
        from Bio import SeqIO
        handle = StringIO()
        # Write this record in the requested file format
        SeqIO.write([self], handle, format_spec)
        handle.seek(0)
        return handle.read()
    # No format given: fall back to the plain sequence string
    return str(self)

def __str__(self):
    return str(self.seq)

Now that means I can also embed this in formatted strings, like so:

"this is my sequence: {0}".format(seq_rec)

Or:

"this is my sequence in fasta format: {0:fasta}".format(seq_rec)

All in all, it's pretty much what you'd expect (and the same as what
you had before). There are only a few small benefits we get for doing it
this way (right now), but I don't think we can go wrong using the
__format__ method like it was meant to be used, and who knows what
future use cases this may simplify.
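
To make the mechanism concrete without depending on Biopython, here is a minimal self-contained sketch. The Record class is a toy stand-in for SeqRecord (hypothetical, for illustration only), and the only format it knows is a hand-written FASTA layout rather than anything produced by SeqIO.

```python
class Record:
    """Toy stand-in for a SeqRecord (hypothetical, no Biopython needed)."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __str__(self):
        return self.seq

    def __format__(self, format_spec):
        # An empty format_spec is what "{0}".format(obj) passes in.
        if not format_spec:
            return str(self)
        if format_spec == "fasta":
            return ">%s\n%s\n" % (self.id, self.seq)
        raise ValueError("Unknown format: %r" % format_spec)

    def format(self, format_spec):
        # Method form, usable even where the format() builtin is missing.
        return self.__format__(format_spec)


rec = Record("x1", "ACGT")
plain = "this is my sequence: {0}".format(rec)
fasta = "{0:fasta}".format(rec)
```

Here plain ends with the bare sequence, and fasta is the same text that format(rec, "fasta") and rec.format("fasta") return.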

jared

From bugzilla-daemon at portal.open-bio.org  Sat Jun 21 00:19:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Jun 2008 00:19:59 -0400
Subject: [Biopython-dev] [Bug 2375] Coalescent support through Simcoal2
In-Reply-To: <bug-2375-42@http.bugzilla.open-bio.org/>
Message-ID: <200806210419.m5L4JxfJ001994@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2375


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


------- Comment #22 from mdehoon at ims.u-tokyo.ac.jp  2008-06-21 00:19 EST -------
(In reply to comment #15)
> The solution in Bio/PopGen/SimCoal/__init__.py to find builtin_tpl_dir is not
> so beautiful, but on the other hand I don't see a better way to do it.

I ran into the same problem with Bio/Entrez, which needs a bunch of DTD files
in Bio/Entrez/DTDs/. The attached patch to setup.py modifies the build command
such that the data files are copied to the build directory when running "python
setup.py build". This solves the problem with Bio.Entrez, and should also solve
the problem with Bio/PopGen/SimCoal without using the workaround in
Bio/PopGen/SimCoal/__init__.py. Can you guys try this patch on the platforms
and python versions you have access to? Just to make sure I didn't miss
anything before committing to CVS.

Recently there have been quite a lot of updates to CVS, so you may need to
start from a fresh CVS checkout.



From bugzilla-daemon at portal.open-bio.org  Sat Jun 21 00:21:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Jun 2008 00:21:13 -0400
Subject: [Biopython-dev] [Bug 2375] Coalescent support through Simcoal2
In-Reply-To: <bug-2375-42@http.bugzilla.open-bio.org/>
Message-ID: <200806210421.m5L4LDPg002064@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2375


------- Comment #23 from mdehoon at ims.u-tokyo.ac.jp  2008-06-21 00:21 EST -------
Created an attachment (id=950)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=950&action=view)
Patch to setup.py



From mjldehoon at yahoo.com  Sat Jun 21 01:11:18 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 20 Jun 2008 22:11:18 -0700 (PDT)
Subject: [Biopython-dev] Bio.SCOP
Message-ID: <251322.99482.qm@web62401.mail.re1.yahoo.com>


Bio.SCOP is one of the modules affected by Bug 2454
(http://bugzilla.open-bio.org/show_bug.cgi?id=2454),
which is basically about how Biopython uses file handles.

Bio.SCOP contains parsers for several file
formats used by SCOP. I am using Bio.SCOP.Hie
as an example here, but the same applies to
the other parsers.

The Bio.SCOP parsers define a Parser and an Iterator
class (similar to other older Biopython parsers).
Typical usage is as follows:

>>> from Bio.SCOP import Hie
>>> handle = open("mydatafile.txt")
>>> parser = Hie.Parser()
>>> records = Hie.Iterator(handle, parser)
>>> for record in records:
...     pass  # record is an instance of Bio.SCOP.Hie.Record

Now, in the SCOP file format, each record is on one
line in the data file. So we don't need the Iterator:
>>> from Bio.SCOP import Hie
>>> handle = open("mydatafile.txt")
>>> parser = Hie.Parser()
>>> for line in handle:
...     record = parser.parse(line)
...     # record is an instance of Bio.SCOP.Hie.Record

This solves Bug #2454 (which occurs in the Iterator
class), and is more general than the Iterator class
(e.g., now we can parse a list of lines).

To take this one step further, the Parser class is not
really needed either. Although Parser is a class, we
are not using the functionality of a class (no
inheritance, and the object self is never used). In
essence, the parse() function inside the Parser class
may as well live outside of it.

There are several ways to simplify this module; each
of them essentially amounts to moving the parse()
function:

1) Move the parse() function to the Record class initializer:
>>> from Bio.SCOP import Hie
>>> handle = open("mydatafile.txt")
>>> for line in handle:
...     record = Hie.Record(line)
...     # record is an instance of Bio.SCOP.Hie.Record

2) Move the parse() function outside of the Parser class,
and rename it read() for consistency with other Biopython
parsers:
>>> from Bio.SCOP import Hie
>>> handle = open("mydatafile.txt")
>>> while True:
...     record = Hie.read(handle)
...     if not record: break
...     # record is an instance of Bio.SCOP.Hie.Record

3) Move the parse() function outside of the Parser class,
and use it as a generator function:
>>> from Bio.SCOP import Hie
>>> handle = open("mydatafile.txt")
>>> records = Hie.parse(handle)
>>> for record in records:
...     pass  # record is an instance of Bio.SCOP.Hie.Record
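
As a rough sketch of how options 1) and 3) could fit together (this is
not the existing Bio.SCOP code; the Record class below and its three
tab-separated columns are simplifying assumptions made for illustration
only):

```python
import io


class Record(object):
    """Hypothetical stand-in for a one-line-per-record entry."""

    def __init__(self, line):
        # Assumed layout: three tab-separated columns (id, parent, children).
        sunid, parent, children = line.rstrip("\n").split("\t")
        self.sunid = sunid
        self.parent = parent
        # "-" is assumed to mean "no children".
        self.children = [] if children == "-" else children.split(",")


def parse(lines):
    """Yield one Record per non-empty, non-comment line.

    `lines` can be anything that yields lines when iterated over.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        yield Record(line)


data = "0\t-\t46456,48724\n46456\t0\t-\n"
from_handle = [r.sunid for r in parse(io.StringIO(data))]
from_list = [r.sunid for r in parse(data.splitlines())]
```

Because parse() only iterates over its argument, the same function
accepts a file handle, a StringIO object, or a plain list of lines.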


Comments, suggestions, preferences?

--Michiel.


From bugzilla-daemon at portal.open-bio.org  Sat Jun 21 07:31:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Jun 2008 07:31:14 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806211131.m5LBVEWb019981@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #17 from mdehoon at ims.u-tokyo.ac.jp  2008-06-21 07:31 EST -------
I added a DeprecationWarning to Bio.Rebase.
Next on the to-do list is Bio.SCOP.



From mjldehoon at yahoo.com  Sat Jun 21 07:36:43 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 21 Jun 2008 04:36:43 -0700 (PDT)
Subject: [Biopython-dev] [BioPython]  Bio.CDD, anyone?
In-Reply-To: <485A70B0.1010202@gmail.com>
Message-ID: <195444.96577.qm@web62403.mail.re1.yahoo.com>

As far as I can tell, the test files were created by saving the HTML source code from the CDD web site to a file. As the CDD web site has changed its HTML in the meantime, we cannot reproduce the HTML files used by the Bio.CDD tests.

Unless somebody objects in the next couple of days, I'll add a DeprecationWarning to Bio.CDD.

--Michiel.

Bruce Southey <bsouthey at gmail.com> wrote: Hi,
Do you know how the test files were created? If there is not an easy 
answer then it makes the decision easier.

Anyhow, I vote to remove this module as, in addition to the things 
previously mentioned, it would be far better to support interproscan 
(http://www.ebi.ac.uk/Tools/InterProScan/ ) than just a single tool.

Bruce
_______________________________________________
BioPython mailing list  -  BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython


From bugzilla-daemon at portal.open-bio.org  Sun Jun 22 00:51:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 22 Jun 2008 00:51:58 -0400
Subject: [Biopython-dev] [Bug 2527] New: Bug in NCBIXML.py in
	_end_BlastOutput_version()
Message-ID: <bug-2527-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2527

           Summary: Bug in NCBIXML.py in _end_BlastOutput_version()
           Product: Biopython
           Version: 1.45
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: cdputnam at ucsd.edu


biopython version is from Fedora distribution:
python-biopython-1.45-1.fc7

For a recently run NCBIWWW Blast (following the tutorial at
http://biopython.org/DIST/docs/tutorial/Tutorial.html), I
ran into a problem in parsing by _end_BlastOutput_version
with the version information:

<BlastOutput_version>BLASTP 2.2.18+</BlastOutput_version>


Traceback (most recent call last):
  File "blast2.py", line 7, in <module>
    for blast_record in blast_records:
  File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 577, in
parse
    expat_parser.Parse(text, False)
  File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 98, in
endElement
    eval("self.%s()" % method)
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 216, in
_end_BlastOutput_version
    self._header.date = self._value.split()[2][1:-1]
IndexError: list index out of range

I've worked around this bug for now by commenting out the
offending line and setting the date to an empty string:

    def _end_BlastOutput_version(self):
        """version number of the BLAST engine (e.g., 2.1.2)

        Save this to put on each blast record object
        """
        self._header.version = self._value.split()[1]
        # self._header.date = self._value.split()[2][1:-1]
        self._header.date = ''
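
A less drastic variant of the workaround would keep the date whenever
it is present. The helper below is only a sketch: split_version is a
made-up name, it is not part of NCBIXML.py, and the old-style example
string is my assumption about what earlier BLAST versions wrote.

```python
def split_version(value):
    """Return (version, date) from a BlastOutput_version string.

    The date is '' when there is no bracketed third field.
    """
    parts = value.split()
    version = parts[1]
    # Older output is assumed to look like "BLASTP 2.2.15 [Oct-15-2006]";
    # [1:-1] strips the surrounding square brackets.
    date = parts[2][1:-1] if len(parts) > 2 else ""
    return version, date


old_style = split_version("BLASTP 2.2.15 [Oct-15-2006]")
new_style = split_version("BLASTP 2.2.18+")
```

With the "BLASTP 2.2.18+" string there are only two fields, which is
exactly the case where indexing [2] raises the IndexError above.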



From bugzilla-daemon at portal.open-bio.org  Sun Jun 22 00:52:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 22 Jun 2008 00:52:45 -0400
Subject: [Biopython-dev] [Bug 2527] Bug in NCBIXML.py in
	_end_BlastOutput_version()
In-Reply-To: <bug-2527-42@http.bugzilla.open-bio.org/>
Message-ID: <200806220452.m5M4qjiE029058@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2527


cdputnam at ucsd.edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |cdputnam at ucsd.edu



From bugzilla-daemon at portal.open-bio.org  Sun Jun 22 01:52:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 22 Jun 2008 01:52:05 -0400
Subject: [Biopython-dev] [Bug 2527] Bug in NCBIXML.py in
	_end_BlastOutput_version()
In-Reply-To: <bug-2527-42@http.bugzilla.open-bio.org/>
Message-ID: <200806220552.m5M5q5rQ031580@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2527


------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp  2008-06-22 01:52 EST -------
I believe that this is already fixed in CVS.
Could you try the latest version of Bio/Blast/NCBIXML.py, available at
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/?cvsroot=biopython
and let us know if it fixes the bug?



From bugzilla-daemon at portal.open-bio.org  Mon Jun 23 06:54:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 23 Jun 2008 06:54:22 -0400
Subject: [Biopython-dev] [Bug 2528] New: NCBIStandalone.blastall(): Replace
	os.popen3 with subprocess.Popen
Message-ID: <bug-2528-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2528

           Summary: NCBIStandalone.blastall(): Replace os.popen3 with
                    subprocess.Popen
           Product: Biopython
           Version: 1.45
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: mmokrejs at ribosome.natur.cuni.cz


I already mentioned this on the mailing list a few weeks ago: NCBI Blast
2.2.18 (and, as far as I remember, earlier versions too) does not flush its
output buffers when run under mod_python-3.3.11/apache-2.2.8.

I tried to flush the buffers or disable buffering, but it does not help. In the
end, a working solution is to move to the subprocess module, which was
introduced in Python 2.4 and is meant to replace os.system, os.spawn*,
os.popen* and other functions. The following patch works for me, so the user
receives the blast stdout back in his/her web browser. Somehow, one has to copy
the data into another variable and close the file descriptors used by the
blastall binary.


Unfortunately, still a stale process can be seen in "ps -ef" output:
apache    5382  5323 47 12:31 ?        00:00:04 [blastall] <defunct>

But as I have said, at least the data is not buffered anymore.
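
For reference, the general shape of the subprocess approach is sketched
below. This is not the attached patch: run_and_collect is a made-up
name, and a small Python one-liner stands in for the blastall binary.
A child that is never waited on is the usual cause of <defunct>
entries; whether that is what happens under mod_python is only a guess.

```python
import subprocess
import sys


def run_and_collect(args):
    """Run a command and return (returncode, stdout, stderr) as text."""
    proc = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # communicate() reads both pipes to EOF, closes them, and waits for
    # the child, so the exit status is collected by the parent.
    out, err = proc.communicate()
    return proc.returncode, out, err


# Stand-in for a real blastall command line.
cmd = [sys.executable, "-c", "print('hit 1'); print('hit 2')"]
returncode, out, err = run_and_collect(cmd)
```

Note that this holds the complete output in memory before returning.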



From bugzilla-daemon at portal.open-bio.org  Mon Jun 23 06:55:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 23 Jun 2008 06:55:26 -0400
Subject: [Biopython-dev] [Bug 2528] NCBIStandalone.blastall(): Replace
	os.popen3 with subprocess.Popen
In-Reply-To: <bug-2528-42@http.bugzilla.open-bio.org/>
Message-ID: <200806231055.m5NAtQCC030683@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2528


------- Comment #1 from mmokrejs at ribosome.natur.cuni.cz  2008-06-23 06:55 EST -------
Created an attachment (id=951)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=951&action=view)
NCBIStandalone.py.patch



From bugzilla-daemon at portal.open-bio.org  Mon Jun 23 06:56:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 23 Jun 2008 06:56:00 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
	for '-F F' and make it safe
In-Reply-To: <bug-2508-42@http.bugzilla.open-bio.org/>
Message-ID: <200806231056.m5NAu0or030728@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508


------- Comment #5 from mmokrejs at ribosome.natur.cuni.cz  2008-06-23 06:56 EST -------
(In reply to comment #4)

Yes, the "filter" argument is not clear; please improve the docs in the sources
and on the web. Ideally, I would also propose renaming the argument.

Regarding the patch in comment #3, I think it should be stricter: the blast*
functions should only accept arguments explicitly listed in the function
definition, so no kwargs, etc. But it is a good start. In general, I would
propose providing a general wrapper function to be placed in front of _ALL_
popen3() calls and, in conjunction, replacing the popen3 calls with
subprocess.Popen. See Bug #2528 on NCBIStandalone.blastall(), where there is a
working example of this.
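
To illustrate what I mean by explicitly listed arguments, here is a
minimal sketch. The function name, its parameter names and the list of
allowed programs are made up for this example and are not the current
NCBIStandalone interface; only the -p, -d, -i, -e and -F switches are
taken from the blastall command line.

```python
ALLOWED_PROGRAMS = ("blastp", "blastn", "blastx", "tblastn", "tblastx")


def build_blastall_args(blastcmd, program, database, infile,
                        expectation=None, low_complexity_filter=None):
    """Build an argument list for blastall; no **kwargs are accepted."""
    if program not in ALLOWED_PROGRAMS:
        raise ValueError("Unexpected program name: %r" % (program,))
    args = [blastcmd, "-p", program, "-d", database, "-i", infile]
    if expectation is not None:
        # float() rejects anything that is not a plain number.
        args += ["-e", str(float(expectation))]
    if low_complexity_filter is not None:
        args += ["-F", "T" if low_complexity_filter else "F"]
    return args


args = build_blastall_args("blastall", "blastp", "nr", "query.fa",
                           low_complexity_filter=False)
```

The resulting list can be handed to subprocess.Popen() without
shell=True, so none of the values are interpreted by a shell.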



From bugzilla-daemon at portal.open-bio.org  Mon Jun 23 11:01:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 23 Jun 2008 11:01:17 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806231501.m5NF1Hth014356@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #18 from mdehoon at ims.u-tokyo.ac.jp  2008-06-23 11:01 EST -------
See the discussion on the mailing list:
http://lists.open-bio.org/pipermail/biopython-dev/2008-June/003819.html
for some ideas for Bio.SCOP.



From bugzilla-daemon at portal.open-bio.org  Mon Jun 23 11:16:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 23 Jun 2008 11:16:29 -0400
Subject: [Biopython-dev] [Bug 2527] Bug in NCBIXML.py in
	_end_BlastOutput_version()
In-Reply-To: <bug-2527-42@http.bugzilla.open-bio.org/>
Message-ID: <200806231516.m5NFGTgD015331@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2527


cdputnam at ucsd.edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from cdputnam at ucsd.edu  2008-06-23 11:16 EST -------
The latest NCBIXML.py does fix the problem with Blast version parsing.

Just so you know, I had to comment out two lines in
_end_Hsp_bit_score, similar to the version of the file I already had.
I'm guessing this is a version mismatch with some other file that
I didn't update (I only replaced NCBIXML.py).

The error was:

AttributeError: Description instance has no attribute 'bits'

And the commented version of the function is:

    def _end_Hsp_bit_score(self):
        """bit score of HSP
        """
        self._hsp.bits = float(self._value)
        #if self._descr.bits == None:
        #    self._descr.bits = float(self._value)

Thanks for your help.



From bugzilla-daemon at portal.open-bio.org  Tue Jun 24 05:38:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 24 Jun 2008 05:38:54 -0400
Subject: [Biopython-dev] [Bug 2528] NCBIStandalone.blastall(): Replace
	os.popen3 with subprocess.Popen
In-Reply-To: <bug-2528-42@http.bugzilla.open-bio.org/>
Message-ID: <200806240938.m5O9csKZ032756@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2528


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-24 05:38 EST -------
With this patch we have to wait for the sub-process to finish before we can
read its output.  This is a potential drawback as it delays the parsing. 
Currently we should be able to parse this iteratively as the queries are
processed.

Also, you are loading the entire output into memory (as a list of strings,
which you then turn into a StringIO handle).  This is potentially a very bad
idea, as in extreme cases Blast XML files can be GB in size.
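
For comparison, subprocess can also be read from incrementally. The
following is only a sketch of that idea, not a proposed patch:
iter_output_lines is a made-up name, a Python one-liner stands in for
blastall, and I have not checked whether this avoids the buffering
problem under mod_python.

```python
import subprocess
import sys


def iter_output_lines(args):
    """Yield the child's stdout line by line while it is running."""
    # stderr is left attached to the parent's stderr, so an unread
    # stderr pipe cannot fill up and block the child.
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            universal_newlines=True)
    try:
        for line in proc.stdout:
            yield line
    finally:
        proc.stdout.close()
        proc.wait()  # collect the exit status once we are done reading


# Stand-in for a real blastall command line.
cmd = [sys.executable, "-c", "print('query 1'); print('query 2')"]
lines = [line.rstrip("\n") for line in iter_output_lines(cmd)]
```

A parser that consumes the generator directly never holds more than
the current line in memory.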

I'm not keen on your solution, but I don't know what to suggest for your
original problem, running Blast under mod_python-3.3.11/apache-2.2.8. 

Two minor points: Do you think we can do anything better on Python 2.3?  Did
you intend something similar for blastpgp and rpsblast?



From biopython at maubp.freeserve.co.uk  Tue Jun 24 05:46:19 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 24 Jun 2008 10:46:19 +0100
Subject: [Biopython-dev] Bio.SCOP
In-Reply-To: <251322.99482.qm@web62401.mail.re1.yahoo.com>
References: <251322.99482.qm@web62401.mail.re1.yahoo.com>
Message-ID: <320fb6e00806240246u8afdb6fp51cd31000ebe3d9@mail.gmail.com>

On Sat, Jun 21, 2008 at 6:11 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Bio.SCOP contains parsers for several file
> formats used by SCOP. I am using Bio.SCOP.Hie
> as an example here, but the same applies to
> the other parsers.
>
> The Bio.SCOP parsers define a Parser and a Iterator
> class (similar to other older Biopython parsers).

I would deprecate the Parser and Iterator objects, and introduce a
parse(handle) function to iterate over a file (following our recent
convention) and perhaps a read() function too (taking a handle or a
single line?).

Peter

From bugzilla-daemon at portal.open-bio.org  Tue Jun 24 06:17:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 24 Jun 2008 06:17:41 -0400
Subject: [Biopython-dev] [Bug 2528] NCBIStandalone.blastall(): Replace
	os.popen3 with subprocess.Popen
In-Reply-To: <bug-2528-42@http.bugzilla.open-bio.org/>
Message-ID: <200806241017.m5OAHfdK002192@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2528


------- Comment #3 from mmokrejs at ribosome.natur.cuni.cz  2008-06-24 06:17 EST -------
Hi Peter,
well, I am not very happy with this either, and I do understand your points. I
will try to come up with another solution. It would be best to disable
buffering in popen3(), but I failed to get that working. I will give it some
more thought next week.



From bugzilla-daemon at portal.open-bio.org  Tue Jun 24 06:35:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 24 Jun 2008 06:35:50 -0400
Subject: [Biopython-dev] [Bug 2527] Bug in NCBIXML.py in
	_end_BlastOutput_version()
In-Reply-To: <bug-2527-42@http.bugzilla.open-bio.org/>
Message-ID: <200806241035.m5OAZo3p003784@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2527


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-24 06:35 EST -------
Regarding comment 2, I think you need to update Bio/Blast/Record.py as well.



From bugzilla-daemon at portal.open-bio.org  Tue Jun 24 06:36:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 24 Jun 2008 06:36:18 -0400
Subject: [Biopython-dev] [Bug 2528] NCBIStandalone.blastall(): Replace
	os.popen3 with subprocess.Popen
In-Reply-To: <bug-2528-42@http.bugzilla.open-bio.org/>
Message-ID: <200806241036.m5OAaIIt003857@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2528


------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp  2008-06-24 06:36 EST -------
Is there an easy way to replicate this issue?



From bugzilla-daemon at portal.open-bio.org  Tue Jun 24 07:30:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 24 Jun 2008 07:30:45 -0400
Subject: [Biopython-dev] [Bug 2527] Bug in NCBIXML.py in
	_end_BlastOutput_version()
In-Reply-To: <bug-2527-42@http.bugzilla.open-bio.org/>
Message-ID: <200806241130.m5OBUjYU007159@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2527


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |biopython-
                   |                            |bugzilla at maubp.freeserve.co.
                   |                            |uk


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-24 07:30 EST -------
P.S. This is a duplicate of Bug 2499



From bugzilla-daemon at portal.open-bio.org  Tue Jun 24 09:05:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 24 Jun 2008 09:05:46 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
	for '-F F' and make it safe
In-Reply-To: <bug-2508-42@http.bugzilla.open-bio.org/>
Message-ID: <200806241305.m5OD5jZa012413@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-24 09:05 EST -------
Checking in Tests/test_NCBIStandalone.py new revision: 1.14
Checking in Bio/Blast/NCBIStandalone.py new revision: 1.73

I've checked in my suggested patch, and tried to improve the filter
documentation by including the phrase "low complexity".  It might be worth
passing this suggestion on to the NCBI as their own command line tools just use
the term filter.



From mjldehoon at yahoo.com  Wed Jun 25 10:04:09 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 25 Jun 2008 07:04:09 -0700 (PDT)
Subject: [Biopython-dev] Bio.SCOP.FileIndex
Message-ID: <141582.2274.qm@web62413.mail.re1.yahoo.com>

Hi everybody,

When I was modifying Bio.SCOP, I noticed that Bio.SCOP.FileIndex is flawed if file reading is done via a buffer (which is often the case in Python).

Before we try to fix this, is anybody actually using Bio.SCOP.FileIndex?
If not, I think we should deprecate it instead of trying to fix it.
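
For illustration only (this is not the FileIndex code, and the key
layout is an assumption): one common way an offset index goes wrong is
calling tell() while looping with "for line in handle", because the
read-ahead buffer means the reported position need not belong to the
line just returned. Recording the offset before each readline() call
avoids that.

```python
import io


def build_line_index(handle):
    """Map the first whitespace-separated token of each line to its offset.

    `handle` is assumed to be opened in binary mode, so that offsets
    are plain byte positions.
    """
    index = {}
    while True:
        offset = handle.tell()  # position before the line is read
        line = handle.readline()
        if not line:
            break
        fields = line.split()
        if fields:
            index[fields[0]] = offset
    return index


handle = io.BytesIO(b"d1abc_ first\nd2xyz_ second\n")
index = build_line_index(handle)
handle.seek(index[b"d2xyz_"])
second = handle.readline()
```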

--Michiel.


From bugzilla-daemon at portal.open-bio.org  Wed Jun 25 11:55:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 25 Jun 2008 11:55:58 -0400
Subject: [Biopython-dev] [Bug 2529] New: NCBI BLAST XML parser does not
	support the online blast version 2.2.18+
Message-ID: <bug-2529-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2529

           Summary: NCBI BLAST XML parser does not support the online blast
                    version 2.2.18+
           Product: Biopython
           Version: 1.45
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: lordnapi at gmail.com
         QAContact: lordnapi at gmail.com


Hello,
I have performed a blast search of the PDB database. I am having a problem
parsing the blast result on both Windows and Linux machines. The following four
lines of code give me the same error.  Thanks. Ahmet

>>> from Bio.Blast import NCBIWWW
>>> from Bio.Blast import NCBIXML
>>> results_handle = NCBIWWW.qblast( 'blastp', 'pdb', 'ASFPVEILPFLYLGCAKDSTNLDVLEEFGIKYILNVTPNLPNLFENAGEFKYKQIPISDHWSQNLSQ')
>>> blast_record = NCBIXML.parse( results_handle ).next()



From bugzilla-daemon at portal.open-bio.org  Wed Jun 25 12:09:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 25 Jun 2008 12:09:24 -0400
Subject: [Biopython-dev] [Bug 2528] NCBIStandalone.blastall(): Replace
	os.popen3 with subprocess.Popen
In-Reply-To: <bug-2528-42@http.bugzilla.open-bio.org/>
Message-ID: <200806251609.m5PG9OWX002384@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2528


------- Comment #5 from mmokrejs at ribosome.natur.cuni.cz  2008-06-25 12:09 EST -------
(In reply to comment #4)
> Is there an easy way to replicate this issue?
> 

I believe it is enough to run a blast search under mod_python and try to
display the results on the web; that's all I actually do. On the server the
blastall process does not flush its buffer, so if you attach to the running
process with the strace utility you will see that it has done a write() of some
line which is not yet the last one of the output. The process hangs like this
for ages, until you do "kill -HUP $pid"; then it flushes the write buffer and
exits successfully. This happens with blast 2.2.18 at least.



From bugzilla-daemon at portal.open-bio.org  Wed Jun 25 12:24:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 25 Jun 2008 12:24:45 -0400
Subject: [Biopython-dev] [Bug 2529] NCBI BLAST XML parser does not support
	the online blast version 2.2.18+
In-Reply-To: <bug-2529-42@http.bugzilla.open-bio.org/>
Message-ID: <200806251624.m5PGOjgf003205@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2529


lordnapi at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WORKSFORME


------- Comment #1 from lordnapi at gmail.com  2008-06-25 12:24 EST -------
The problem was caused by the <BlastOutput_version>BLASTP
2.2.18+</BlastOutput_version> element in the XML files not containing a date.
I fixed the problem for myself by changing the _end_BlastOutput_version
function in the Blast/NCBIXML.py file to the following (starts at line 208). I
still don't know whether having the date is important elsewhere.

def _end_BlastOutput_version(self):
    """version number of the BLAST engine (e.g., 2.1.2)
    Save this to put on each blast record object
    """
    self._valuesplit = self._value.split()
    self._header.version = self._valuesplit[1]
    if len(self._valuesplit) > 2 :
        self._header.date = self._value.split()[2][1:-1]
    else:
        self._header.date = ''



From mjldehoon at yahoo.com  Wed Jun 25 20:01:07 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 25 Jun 2008 17:01:07 -0700 (PDT)
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
Message-ID: <254082.68438.qm@web62401.mail.re1.yahoo.com>

Dear all,

Recently NCBI blocked access for a Biopython user who was making 50,000 requests to NCBI at a rate of 18 requests per second during peak hours. This user was using the search_for function in Bio.GenBank, which internally uses Bio.EUtils. Apparently, Bio.EUtils does not follow the 3 seconds sleep rule between requests. NCBI also asked us to send requests for the Entrez E-Utilities to the EUtils web address, and not to the regular NCBI web address. I don't know if Bio.EUtils does that.

Bio.Entrez does use the 3 seconds sleep rule, and the eight E-Utilities functions all make use of the EUtils web address, though it is possible to pass a different web address as one of the arguments. The "query" function, which is not part of the E-Utilities, does use the standard NCBI web address.

To avoid such problems in the future, I'd like to propose the following:
1) Deprecate Bio.EUtils. Its functionality is covered by Bio.Entrez, which (from release 1.46) will have a parser.
Bio.EUtils is currently used by the following modules: 
Bio/config/DBRegistry.py
Bio/dbdefs/fasta.py
Bio/dbdefs/genbank.py
Bio/dbdefs/medline.py
Bio/GenBank/__init__.py
We were already planning to remove Bio.config and Bio.dbdefs, so we'd only have to modify Bio.GenBank.

2) Remove the 'query' function from Bio.Entrez. Anyway accessing NCBI's web site from Python to get HTML back doesn't make a lot of sense.

3) Remove the argument for a user-specified web address to make sure that the E-Utilities address is always used.
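
For illustration, the 3 seconds sleep rule can be enforced with very little code. The sketch below is not the Bio.Entrez implementation; the Throttle class and its arguments are made-up names, and the clock and sleep functions are parameters only so that the behaviour can be checked without actually waiting.

```python
import time


class Throttle(object):
    """Make sure consecutive calls to wait() are at least `delay` seconds apart."""

    def __init__(self, delay=3.0, clock=time.time, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                # Too soon after the previous request: sleep off the rest.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A module would create a single Throttle instance and call its wait() method immediately before every request sent to NCBI, so that no code path can skip the delay.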

--Michiel.


From dalke at dalkescientific.com  Wed Jun 25 21:52:07 2008
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 26 Jun 2008 03:52:07 +0200
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <254082.68438.qm@web62401.mail.re1.yahoo.com>
References: <254082.68438.qm@web62401.mail.re1.yahoo.com>
Message-ID: <635E5251-830F-409C-A2D4-10EA59FA5037@dalkescientific.com>

On Jun 26, 2008, at 2:01 AM, Michiel de Hoon wrote:
> Bio.Entrez does use the 3 seconds sleep rule, and the eight E- 
> Utilities functions all make use of the EUtils web address, though  
> it is possible to pass a different web address as one of the  
> arguments. The "query" function, which is not part of the E- 
> Utilities, does use the standard NCBI web address.

What is the proper EUtils web address?

Entrez/__init__.py uses
   cgi='http://www.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi'
while the documentation at
   http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
claims "Send E-utilities requests to http://eutils.ncbi.nlm.nih.gov",
which I think should be
   "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi"
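
Assuming the eutils host really is the right one, building every
request from one fixed base address could look like the sketch below.
The eutils_url helper and the EUTILS_BASE name are made up for this
example, and it uses the Python 3 location of urlencode (on Python 2
that function lives in urllib).

```python
from urllib.parse import urlencode

# Assumption: the host named in the E-utilities help page quoted above.
EUTILS_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(tool, **params):
    """Return the URL for one E-utility call, e.g. tool="esearch"."""
    # Sorting only makes the parameter order predictable.
    query = urlencode(sorted(params.items()))
    return "%s%s.fcgi?%s" % (EUTILS_BASE, tool, query)


url = eutils_url("esearch", db="pubmed", term="de Hoon[AU]")
```

Since the base address is a module constant rather than an argument,
callers have no way to point a request at the regular NCBI web address.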

> To avoid such problems in the future, I'd like to propose the  
> following:
> 1) Deprecate Bio.EUtils. Its functionality is covered by  
> Bio.Entrez, which (from release 1.46) will have a parser.

I looked over Bio.Entrez and it handles only a subset of what  
Bio.EUtils does.  For example, it doesn't have any support to help  
track WebEnv as it changes over each request, nor support for  
alternate format types.

I would deprecate Bio.EUtils for another reason - there's no maintainer.

> 2) Remove the 'query' function from Bio.Entrez. Anyway accessing  
> NCBI's web site from Python to get HTML back doesn't make a lot of  
> sense.

Okay, now I'm quite confused.  This is functionality that Bio.EUtils  
supports.


 >>> from Bio.EUtils import HistoryClient
 >>> client = HistoryClient.HistoryClient()
 >>> result = client.search("Michiel de Hoon[AU]")
 >>> print result.efetch("text", "docsum").read()

1:  de Hoon M, Hayashizaki Y.
  Deep cap analysis gene expression (CAGE): genome-wide  
identification of
promoters, quantification of their expression, and network inference.
Biotechniques. 2008 Apr;44(5):627-8, 630, 632. Review.
PMID: 18474037 [PubMed - indexed for MEDLINE]

2:  Sierro N, Makita Y, de Hoon M, Nakai K.
  DBTBS: a database of transcriptional regulation in Bacillus  
subtilis containing
upstream intergenic conservation information.
Nucleic Acids Res. 2008 Jan;36(Database issue):D93-6. Epub 2007 Oct 25.
PMID: 17962296 [PubMed - indexed for MEDLINE]

3:  Makita Y, de Hoon MJ, Danchin A.
  Hon-yaku: a biology-driven Bayesian methodology for identifying  
translation
initiation sites in prokaryotes.
BMC Bioinformatics. 2007 Feb 8;8:47.
PMID: 17286872 [PubMed - indexed for MEDLINE]

4:  de Hoon MJ, Makita Y, Nakai K, Miyano S.
  Prediction of transcriptional terminators in Bacillus subtilis and  
related
species.
PLoS Comput Biol. 2005 Aug;1(3):e25. Epub 2005 Aug 12.
PMID: 16110342 [PubMed - indexed for MEDLINE]

5:  de Hoon MJ, Imoto S, Kobayashi K, Ogasawara N, Miyano S.
  Inferring gene regulatory networks from time-ordered gene  
expression data of
Bacillus subtilis using differential equations.
Pac Symp Biocomput. 2003;:17-28.
PMID: 12603014 [PubMed - indexed for MEDLINE]


(The default returns this in XML format.)


 >>> print result.efetch().read(500)
<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st  
January 2008//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ 
pubmed_080101.dtd">
<PubmedArticleSet>
<PubmedArticle>
     <MedlineCitation Owner="NLM" Status="MEDLINE">
         <PMID>18474037</PMID>
         <DateCreated>
             <Year>2008</Year>
             <Month>05</Month>
             <Day>13</Day>
         </DateCreated>
         <DateCompleted>
             <Year>2008</Year>
             <Month>06</Mont


> 3) Remove the argument for a user-specified web address to make  
> sure that always the E-Utilities address is used.

Yes.

				Andrew
				dalke at dalkescientific.com


From bugzilla-daemon at portal.open-bio.org  Thu Jun 26 05:20:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jun 2008 05:20:55 -0400
Subject: [Biopython-dev] [Bug 2529] NCBI BLAST XML parser does not support
	the online blast version 2.2.18+
In-Reply-To: <bug-2529-42@http.bugzilla.open-bio.org/>
Message-ID: <200806260920.m5Q9Ktlt019555@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2529


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|WORKSFORME                  |


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-26 05:20 EST -------
This is a duplicate of Bug 2499, reopening in order to mark this.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Jun 26 05:21:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jun 2008 05:21:38 -0400
Subject: [Biopython-dev] [Bug 2529] NCBI BLAST XML parser does not support
	the online blast version 2.2.18+
In-Reply-To: <bug-2529-42@http.bugzilla.open-bio.org/>
Message-ID: <200806260921.m5Q9Lcp6019606@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2529


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |DUPLICATE


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-26 05:21 EST -------
The fix for the 2.2.18+ XML output is already in CVS, see Bug 2499

*** This bug has been marked as a duplicate of bug 2499 ***


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Jun 26 05:21:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jun 2008 05:21:40 -0400
Subject: [Biopython-dev] [Bug 2499] Bio.Blast.NCBIXML cannot handle XML
	without date in BlastOutput_version
In-Reply-To: <bug-2499-42@http.bugzilla.open-bio.org/>
Message-ID: <200806260921.m5Q9Lebn019619@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2499


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lordnapi at gmail.com


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-26 05:21 EST -------
*** Bug 2529 has been marked as a duplicate of this bug. ***


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk  Thu Jun 26 06:25:38 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jun 2008 11:25:38 +0100
Subject: [Biopython-dev] [BioPython] Fwd: NCBI Abuse Activity with
	BioPython
In-Reply-To: <DA88B268-8CE1-4C71-A685-F7C13978C8BA@dalkescientific.com>
References: <EEEED756EF6626469B10653F74501438018FAABB@NIHCESMLBX15.nih.gov>
	<55F39FBF-3CF0-4192-AFEB-100853FEE8A1@sonsorol.org>
	<DA88B268-8CE1-4C71-A685-F7C13978C8BA@dalkescientific.com>
Message-ID: <320fb6e00806260325m3b92ff8n143141c73a1a60dd@mail.gmail.com>

Andrew wrote:
>
>  I thought I put a rate limiter into the code, but looking at it now I see I
> didn't.  The documentation clearly states that users must follow NCBI's
> recommendations, but who actually reads documentation?
>
>>> *  Send E-utilities requests to http://eutils.ncbi.nlm.nih.gov
>>> <http://eutils.ncbi.nlm.nih.gov/> , not the standard NCBI Web address.
>
> That change was announced on May 21, 2003, and most likely no one on the
> Biopython dev group tracks the EUtils mailing list.  It was also after I
> wrote the code, but to be fair I was subscribed to the utilities list at the
> time and should have caught the change.
>
> I think the correct fix is to this code in ThinClient.py:
>
>    def __init__(self,
>                 opener = None,
>                 tool = TOOL,
>                 email = EMAIL,
>                 baseurl = "http://www.ncbi.nlm.nih.gov/entrez/eutils/"):
>
> Change the baseurl to "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/".  I
> have not tested this.

I've tested that fix, and it seems to be OK with test_EUtils.py and
test_SeqIO_online.py which calls Bio.EUtils via Bio.GenBank, checked
in as Bio/EUtils/ThinClient.py revision 1.6
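
For illustration, a minimal sketch of how request URLs can be built from
the new base URL. Only the base URL comes from this thread; the helper
name build_eutils_url and the example parameters are hypothetical and are
not taken from ThinClient.py.

```python
from urllib.parse import urlencode

# Base URL proposed in this thread for E-utilities requests.
BASEURL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_eutils_url(program, params, baseurl=BASEURL):
    """Return the full URL for one E-utilities program, e.g. 'esearch.fcgi'.

    Hypothetical helper for illustration; not part of Bio.EUtils.
    """
    # Sort the parameters so the resulting URL is deterministic.
    query = urlencode(sorted(params.items()))
    return "%s%s?%s" % (baseurl, program, query)


url = build_eutils_url("esearch.fcgi", {"db": "pubmed", "term": "de Hoon M[AU]"})
print(url)
```

Keeping the base URL in one place means that a future change of server
only needs a one-line edit.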

I'll have a look at your other specific suggestions too.  Thanks for
taking the time to go over this Andrew.

Peter

From p.j.a.cock at googlemail.com  Thu Jun 26 06:47:05 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 26 Jun 2008 11:47:05 +0100
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <635E5251-830F-409C-A2D4-10EA59FA5037@dalkescientific.com>
References: <254082.68438.qm@web62401.mail.re1.yahoo.com>
	<635E5251-830F-409C-A2D4-10EA59FA5037@dalkescientific.com>
Message-ID: <320fb6e00806260347i7655ba6eg490f5003a273a37d@mail.gmail.com>

On Thu, Jun 26, 2008 at 2:52 AM, Andrew Dalke <dalke at dalkescientific.com> wrote:
> On Jun 26, 2008, at 2:01 AM, Michiel de Hoon wrote:
>>
>> Bio.Entrez does use the 3 seconds sleep rule, and the eight E-Utilities
>> functions all make use of the EUtils web address, though it is possible to
>> pass a different web address as one of the arguments. The "query" function,
>> which is not part of the E-Utilities, does use the standard NCBI web
>> address.
>
> What is the proper EUtils web address?
>
> Entrez/__init__.py uses
>  cgi='http://www.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi'
> while the documentation at
>  http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
> claims "Send E-utilities requests to http://eutils.ncbi.nlm.nih.gov",
> which I think should be
> "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi"

Yes, for ePost that is correct:
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/epost_help.html

[On a related note, following Andrew's suggestion, I have updated CVS
to use the new base URL in Bio/EUtils/ThinClient.py]

>> To avoid such problems in the future, I'd like to propose the following:
>> 1) Deprecate Bio.EUtils. Its functionality is covered by Bio.Entrez, which
>> (from release 1.46) will have a parser.
>
> I looked over Bio.Entrez and it handles only a subset of what Bio.EUtils
> does.  For example, it doesn't have any support to help track WebEnv as it
> changes over each request, nor support for alternate format types.

No, Bio.Entrez does not support the WebEnv / history interface.  It
can request data in different format types though, although it will
only parse the XML output.

> I would deprecate Bio.EUtils for another reason - there's no maintainer.

This is a strong reason - although we are still using Bio.EUtils in
Bio.GenBank (and probably in other places too).

>> 2) Remove the 'query' function from Bio.Entrez. Anyway accessing NCBI's
>> web site from Python to get HTML back doesn't make a lot of sense.
>
> Okay, now I'm quite confused.  This is functionality that Bio.EUtils
> supports.

I think Michiel meant getting a handle containing raw HTML isn't very
sensible, and this is what the Bio.Entrez.query() function does.  If
it can only return HTML, then I agree, it's not very useful and could
be removed.

>> 3) Remove the argument for a user-specified web address to make sure that
>> always the E-Utilities address is used.
>
> Yes.
>

Unlike BLAST, where you may have a local webserver, is there any reason
to use a URL other than the NCBI's one?

Peter

From dalke at dalkescientific.com  Thu Jun 26 07:03:19 2008
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 26 Jun 2008 13:03:19 +0200
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <320fb6e00806260347i7655ba6eg490f5003a273a37d@mail.gmail.com>
References: <254082.68438.qm@web62401.mail.re1.yahoo.com>
	<635E5251-830F-409C-A2D4-10EA59FA5037@dalkescientific.com>
	<320fb6e00806260347i7655ba6eg490f5003a273a37d@mail.gmail.com>
Message-ID: <52BDC1F6-52F8-4A42-B738-DFBB119F9C27@dalkescientific.com>

On Jun 26, 2008, at 12:47 PM, Peter Cock wrote:
> I think Michiel meant getting a handle containing raw HTML isn't very
> sensible, and this is what the Bio.Entrez.query() function does.

I meant to point out that supporting the search interface, with
machine-parseable output, is functionality in Bio.EUtils that isn't in
Bio.Entrez.

> Unlike BLAST, where you may have a local webserver, is there any reason
> to use a URL other than the NCBI's one?

I can't think of any.

(I can make up one - setting up a local mock server for tests.  But  
that's not seriously going to happen.)

				Andrew
				dalke at dalkescientific.com


From biopython at maubp.freeserve.co.uk  Thu Jun 26 07:40:54 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jun 2008 12:40:54 +0100
Subject: [Biopython-dev] [BioPython] Fwd: NCBI Abuse Activity with
	BioPython
In-Reply-To: <5CD393BF-D4FB-4700-B7CC-2417C9845010@dalkescientific.com>
References: <EEEED756EF6626469B10653F74501438018FAABB@NIHCESMLBX15.nih.gov>
	<55F39FBF-3CF0-4192-AFEB-100853FEE8A1@sonsorol.org>
	<DA88B268-8CE1-4C71-A685-F7C13978C8BA@dalkescientific.com>
	<320fb6e00806260421g48e5807ei92297b372c330e5b@mail.gmail.com>
	<5CD393BF-D4FB-4700-B7CC-2417C9845010@dalkescientific.com>
Message-ID: <320fb6e00806260440n4a933b60of5a7c8eee4e15a89@mail.gmail.com>

On Thu, Jun 26, 2008 at 12:26 PM, Andrew Dalke wrote:
> On Jun 26, 2008, at 1:21 PM, Peter wrote:
>>
>> Looking over the code, should this wait also be done for the
>> ThinClient's epost() method as well?
>
> Where?  It gets the URL from an instance variable, which is set in the
> constructor.

The ThinClient class is defined in Bio/EUtils/ThinClient.py, and I
have added a 3 second wait to its _get() method.  I think we should
also add the three second wait to the epost() method.  Both methods
will construct their URL using self.baseurl, so they are both going to
hit the same server.

Note that for the implementation, I would probably define a new
_wait() method to check the time since the last call, and call this
_wait() method from both _get() and epost().
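
A minimal sketch of that _wait() idea, written as a small standalone
class rather than as a ThinClient method. The class name RateLimiter and
the injectable clock and sleep arguments are assumptions made so the
example can be run and tested without real delays; the 3 second default
is the rule discussed in this thread.

```python
import time


class RateLimiter:
    """Enforce a minimum delay between successive calls.

    Hypothetical stand-in for the _wait() method discussed above.
    """

    def __init__(self, delay=3.0, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough that calls are at least `delay` apart."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.delay - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Both _get() and epost() would then call wait() on one shared limiter
just before opening their URL, so the delay applies across both kinds of
request.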

>> This complexity is also daunting for anyone else considering taking
>> over the Bio.EUtils code base.
>
> My incomplete rewrite uses elementtree which does reduce some of the
> complexity.  But the NCBI interface is a mess.

I can see why Michiel has kept things simple in Bio.Entrez - this
should cater to most users' needs.

Peter

From mjldehoon at yahoo.com  Thu Jun 26 07:45:45 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 26 Jun 2008 04:45:45 -0700 (PDT)
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <320fb6e00806260347i7655ba6eg490f5003a273a37d@mail.gmail.com>
Message-ID: <402220.93857.qm@web62411.mail.re1.yahoo.com>

> > I would deprecate Bio.EUtils for another reason - there's no maintainer.
This is what I meant. I am sure that we can fix Bio.EUtils for now, but I don't see how we can maintain it in the future. That is why originally we decided to focus on Bio.WWW.NCBI (renamed to Bio.Entrez) instead.

> - although we are still using Bio.EUtils in Bio.GenBank
> (and probably in other places too).

As far as I can tell, Bio.GenBank is currently the only module in which Bio.EUtils is used, not counting modules that themselves have been deprecated. It shouldn't be too complicated to modify Bio.GenBank to use Bio.Entrez instead.

>>> 2) Remove the 'query' function from Bio.Entrez.
>>>  Anyway accessing NCBI's web site from Python
>>> to get HTML back doesn't make a lot of sense.
>
>> Okay, now I'm quite confused.  This is functionality
>> that Bio.EUtils supports.
>
> I think Michiel meant getting a handle containing
> raw HTML isn't very sensible, and this is what the
> Bio.Entrez.query() function does.  If it can only
> return HTML, then I agree, its not very useful and
> could be removed.
That is indeed what I meant. (It is still possible to get raw HTML by using the other EUtilities, for example efetch, but from a scripting language efetch is more likely to be used to get XML or some plain-text output).

--Michiel


From mjldehoon at yahoo.com  Thu Jun 26 08:50:10 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 26 Jun 2008 05:50:10 -0700 (PDT)
Subject: [Biopython-dev] New release
Message-ID: <390323.35893.qm@web62411.mail.re1.yahoo.com>

Hi everybody,

I think we should make a new Biopython release within the next couple of weeks to solve the issues with NCBI and to get the fixed Blast parser out (for output from Blast 2.2.18). There are a few outstanding issues that hopefully can be fixed before the next release:
1) NCBI access from Bio.GenBank
2) Bug #2454 (Iterators can't use file-like objects), which affects a number of parsers in Biopython
3) Martel-based parsers.

From a technical viewpoint, none of these are very complicated. 2) is almost finished.
With respect to 3), a small number of parsers in Biopython are based on Martel (none of the major ones as far as I can tell). For some of these parsers, it is not quite clear if they are still useful. For the remaining ones, it would be nice if they could be rewritten without using Martel -- that would let us get rid of the dependency on mxTextTools.

Any other urgent issues that need to be resolved before a release?

--Michiel.


From biopython at maubp.freeserve.co.uk  Thu Jun 26 08:53:09 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jun 2008 13:53:09 +0100
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <402220.93857.qm@web62411.mail.re1.yahoo.com>
References: <320fb6e00806260347i7655ba6eg490f5003a273a37d@mail.gmail.com>
	<402220.93857.qm@web62411.mail.re1.yahoo.com>
Message-ID: <320fb6e00806260553i4a7c5b2cxe5ae5aa0c80e53d1@mail.gmail.com>

> As far as I can tell, Bio.GenBank is currently the only module in which
> Bio.EUtils is used, not counting modules that themselves have been
> deprecated. It shouldn't be too complicated to modify Bio.GenBank to use
> Bio.Entrez instead.

Looking back at CVS, it used to use Bio.WWW.NCBI once upon a time
(which is now Bio.Entrez), and had explicit rate limiting.  Then four
years ago Brad moved the Bio.GenBank.download_many() and search_for()
functions over to using Bio.EUtils (CVS revision 1.51 of
Bio/GenBank/__init__.py).

Brad also appears to have changed the functionality of
Bio.GenBank.download_many() from a call back mechanism to returning a
handle.  We could still return a handle, but it would require fetching
all the records (perhaps in batches), and concatenating them.  I think
it would make more sense to deprecate the Bio.GenBank.download_many()
function, and direct people to Bio.Entrez.efetch() instead.
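
As a rough sketch of the "fetch in batches and concatenate" option
mentioned above: the function names, the batch size of 100, and the
`fetch` callable are all assumptions for illustration. In real use
`fetch` might wrap a call such as Bio.Entrez.efetch(...).read(); here it
is left abstract so the example runs without network access.

```python
def batched(ids, size):
    """Yield successive lists of at most `size` identifiers."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def download_in_batches(ids, fetch, size=100):
    """Fetch records batch by batch and return them as one string.

    `fetch` is any callable that takes a comma-separated ID string and
    returns the corresponding records as text (hypothetical interface).
    """
    return "".join(fetch(",".join(batch)) for batch in batched(ids, size))
```

This makes one request per batch instead of one per identifier.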

The Bio.GenBank.search_for() still seems somewhat useful, but without
a default limit on the number of returned IDs, this could easily be
abused.  Again, we could deprecate this and direct people to
Bio.Entrez.esearch() instead.

Peter

From mjldehoon at yahoo.com  Thu Jun 26 09:41:24 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 26 Jun 2008 06:41:24 -0700 (PDT)
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <320fb6e00806260553i4a7c5b2cxe5ae5aa0c80e53d1@mail.gmail.com>
Message-ID: <8498.83228.qm@web62412.mail.re1.yahoo.com>

> The Bio.GenBank.search_for() still seems somewhat
> useful, but without a default limit on the number
> of returned IDs, this could easily be abused.
> Again, we could deprecate this and direct people
> to Bio.Entrez.esearch() instead.
As always, I am in favor of deprecating functions whose purpose is dubious.
F

# Using Bio.GenBank
>>> from Bio import GenBank
>>> gi_list = GenBank.search_for("Opuntia AND rpl16")
>>> gi_list
['57240072', '57240071', '6273287', '6273291', '6273290', '6273289', '6273286', '6273285', '6273284']

# Same thing, using Bio.Entrez
>>> from Bio import Entrez
>>> handle = Entrez.esearch(db='nucleotide', term="Opuntia AND rpl16")
>>> record = Entrez.read(handle)
>>> record["IdList"]
['57240072', '57240071', '6273287', '6273291', '6273290', '6273289', '6273286', '6273285', '6273284']



From mjldehoon at yahoo.com  Thu Jun 26 09:51:55 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 26 Jun 2008 06:51:55 -0700 (PDT)
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <320fb6e00806260553i4a7c5b2cxe5ae5aa0c80e53d1@mail.gmail.com>
Message-ID: <597121.15112.qm@web62401.mail.re1.yahoo.com>

[Sorry, hit the send button too soon]

> The Bio.GenBank.search_for() still seems somewhat
> useful, but without a default limit on the number
> of returned IDs, this could easily be abused.
> Again, we could deprecate this and direct people
> to Bio.Entrez.esearch() instead.
As always, I am in favor of deprecating functions whose purpose is dubious.
As an example, this is a GenBank search done via Bio.GenBank and via Bio.Entrez:

# Using Bio.GenBank
>>> from Bio import GenBank
>>> gi_list = GenBank.search_for("Opuntia AND rpl16")
>>> gi_list
['57240072', '57240071', '6273287', '6273291', '6273290', '6273289', '6273286', '6273285', '6273284']

# Same thing, using Bio.Entrez
>>> from Bio import Entrez
>>> handle = Entrez.esearch(db='nucleotide', term="Opuntia AND rpl16")
>>> record = Entrez.read(handle)
>>> record["IdList"]
['57240072', '57240071', '6273287', '6273291', '6273290', '6273289', '6273286', '6273285', '6273284']

I believe that GenBank.search_for automatically takes care of the retmax parameter (the maximum number of ids to return), but I agree that this can be abused easily.
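
To illustrate the retmax concern, a minimal sketch that bounds the
number of requested IDs before ESearch is called. The function name
esearch_params, the default of 20, and the hard limit of 500 are
assumptions made for this example; they are not values used by
Bio.GenBank or recommended by the NCBI.

```python
def esearch_params(term, db="nucleotide", retmax=20, hard_limit=500):
    """Build keyword arguments for an ESearch call with a bounded retmax.

    The default and hard_limit values are illustrative assumptions.
    """
    if retmax < 1:
        raise ValueError("retmax must be at least 1")
    # Never ask for more IDs than the hard limit, whatever the caller says.
    return {"db": db, "term": term, "retmax": min(retmax, hard_limit)}


print(esearch_params("Opuntia AND rpl16", retmax=10000))
```

The resulting dictionary could be passed on as
Entrez.esearch(**esearch_params("Opuntia AND rpl16")), assuming esearch
accepts retmax as a keyword argument.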

> Brad also appears to have changed the functionality of 
> Bio.GenBank.download_many() from a call back mechanism 
> to returning a handle.  We could still return a handle, but it would
> require fetching all the records (perhaps in batches), and
> concatenating them.  I think it would make more sense to deprecate
> the Bio.GenBank.download_many() function, and direct people to
> Bio.Entrez.efetch() instead.

Agree.

Btw, NCBIDictionary definitely needs to go.
From the documentation, continuing the example above:
>>> ncbi_dict = GenBank.NCBIDictionary("nucleotide", "genbank")
>>> gb_record = ncbi_dict[gi_list[0]]
Hence, we're running efetch once for each key separately; this is exactly what NCBI advised against.

--Michiel.


From mjldehoon at yahoo.com  Thu Jun 26 10:01:31 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 26 Jun 2008 07:01:31 -0700 (PDT)
Subject: [Biopython-dev] Bio.ECell, anybody?
Message-ID: <712489.88060.qm@web62410.mail.re1.yahoo.com>

This is one of the Martel-based parsers whose relevance in 2008 is unclear to me.

From the docstring:

Ecell converts the ECell input from spreadsheet format to an intermediate format, described in http://www.e-cell.org/manual/chapter2E.html#3.2. It provides an alternative to the perl script supplied with the Ecell2 distribution at http://bioinformatics.org/project/?group_id=49.

Currently, ECell is at version 3.1.106 (and uses Python as the scripting interface! Yay!). The link to the chapter in the ECell manual is dead.

Is anybody using the Bio.ECell module?

--Michiel


From biopython at maubp.freeserve.co.uk  Thu Jun 26 10:43:10 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jun 2008 15:43:10 +0100
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <597121.15112.qm@web62401.mail.re1.yahoo.com>
References: <320fb6e00806260553i4a7c5b2cxe5ae5aa0c80e53d1@mail.gmail.com>
	<597121.15112.qm@web62401.mail.re1.yahoo.com>
Message-ID: <320fb6e00806260743u3385955dt2be06d7f8122d8e5@mail.gmail.com>

OK then - I will deprecate the Bio.GenBank.search_for() and
Bio.GenBank.download_many() functions, suggesting Bio.Entrez instead.
I will also update the tutorial on this.

On Thu, Jun 26, 2008 at 2:51 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Btw, NCBIDictionary definitely needs to go.
> From the documentation, continuing the example above:
> >>> ncbi_dict = GenBank.NCBIDictionary("nucleotide", "genbank")
> >>> gb_record = ncbi_dict[gi_list[0]]
> Hence, we're running efetch once for each key separately; this is exactly what NCBI advised against.

If the user wants to run an Entrez search and then fetch some/all of
the results, then yes, the NCBI would not want us to do multiple
separate efetch calls by identifier.  Could you prepare an example
using Bio.Entrez with the "history" (WebEnv argument)?

However, if the user has provided the list of GI numbers (e.g. from a
file), there is no existing NCBI search data to refer to, and I don't
see any other option.  So there is a use-case for the
Bio.GenBank.NCBIDictionary class.

Peter

From mjldehoon at yahoo.com  Thu Jun 26 10:49:49 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 26 Jun 2008 07:49:49 -0700 (PDT)
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <320fb6e00806260743u3385955dt2be06d7f8122d8e5@mail.gmail.com>
Message-ID: <525848.21341.qm@web62410.mail.re1.yahoo.com>

--- On Thu, 6/26/08, Peter <biopython at maubp.freeserve.co.uk> wrote:
> However, if the user has provided the list of GI numbers (e.g. from a file), there is no existing NCBI search data to refer to, and I don't see any other option.  So there is a use-case for the Bio.GenBank.NCBIDictionary class.

In that case, the following can be used:
>>> from Bio import Entrez
>>> idlist = ['123','456','453',.....] # a list of GI numbers
>>> ids = ",".join(idlist)
>>> handle = Entrez.efetch(db='nucleotide', id=ids, retmode='xml')
>>> records = Entrez.read(handle)
# records is now a list of records corresponding to '123', '456', '453',...

--Michiel. 


From biopython at maubp.freeserve.co.uk  Thu Jun 26 12:05:36 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jun 2008 17:05:36 +0100
Subject: [Biopython-dev] [BioPython] Fwd: NCBI Abuse Activity with
	BioPython
In-Reply-To: <79693088-0D38-459E-ADEC-FF2757E41912@dalkescientific.com>
References: <EEEED756EF6626469B10653F74501438018FAABB@NIHCESMLBX15.nih.gov>
	<55F39FBF-3CF0-4192-AFEB-100853FEE8A1@sonsorol.org>
	<DA88B268-8CE1-4C71-A685-F7C13978C8BA@dalkescientific.com>
	<320fb6e00806260421g48e5807ei92297b372c330e5b@mail.gmail.com>
	<5CD393BF-D4FB-4700-B7CC-2417C9845010@dalkescientific.com>
	<320fb6e00806260440n4a933b60of5a7c8eee4e15a89@mail.gmail.com>
	<79693088-0D38-459E-ADEC-FF2757E41912@dalkescientific.com>
Message-ID: <320fb6e00806260905i599a53f3v367045d3ee07ffbf@mail.gmail.com>

On Thu, Jun 26, 2008 at 12:48 PM, Andrew Dalke
<dalke at dalkescientific.com> wrote:
>> I think we should
>> also add the three second wait to the epost() method.
>
> I see it now.  Yes, that needs it as well.

Good - I've updated that in CVS, Bio/EUtils/ThinClient.py revision 1.8

>> I can see why Michiel has kept things simple in Bio.Entrez - this
>> should cater to most user's needs.
>
> Sad, but true.  EUtils (the server and the client) offer a lot more than
> what most users need.
>

Agreed.

Thanks again Andrew for your advice on where Bio.EUtils needed
updating - it certainly meant this got dealt with more quickly.

Peter

From biopython at maubp.freeserve.co.uk  Thu Jun 26 13:04:26 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jun 2008 18:04:26 +0100
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <320fb6e00806260743u3385955dt2be06d7f8122d8e5@mail.gmail.com>
References: <320fb6e00806260553i4a7c5b2cxe5ae5aa0c80e53d1@mail.gmail.com>
	<597121.15112.qm@web62401.mail.re1.yahoo.com>
	<320fb6e00806260743u3385955dt2be06d7f8122d8e5@mail.gmail.com>
Message-ID: <320fb6e00806261004r227c3340wf390779f1cc4616b@mail.gmail.com>

Michiel,

I started working on a patch to mark Bio.GenBank.search_for() etc as
deprecated, but on reflection I don't really like the longer code
needed with Bio.Entrez - for example this one liner:

from Bio import GenBank
gi_list = GenBank.search_for("Opuntia AND rpl16")

becomes:

from Bio import Entrez
handle = Entrez.esearch(db='nucleotide', term="Opuntia AND rpl16")
gi_list = Entrez.read(handle)["IdList"]

One idea that might be worth discussing is having variations of the
Entrez.e* functions which will parse the XML and return the results.
i.e. something like this:

def esearch2(...) :
   """Calls ESearch and parses the returned XML."""
   return read(esearch(..., retmode="XML"))

Then we can write,

from Bio import Entrez
gi_list = Entrez.esearch2(db='nucleotide', term="Opuntia AND rpl16")["IdList"]

(An alternative naming convention like a "p" might be nicer)
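
One way to avoid writing a separate esearch2, efetch2, and so on by hand
is a small generic wrapper. This is only a sketch: the name parsed is
hypothetical, and fake_esearch and fake_read below are stand-ins (not the
real Bio.Entrez functions) so that the example runs without network
access.

```python
import io


def parsed(fetch, read):
    """Return a function that calls `fetch` and feeds its handle to `read`."""
    def wrapper(*args, **kwargs):
        handle = fetch(*args, **kwargs)
        try:
            return read(handle)
        finally:
            handle.close()
    return wrapper


# Stand-ins so the sketch runs offline; in real use these would be
# Entrez.esearch and Entrez.read.
def fake_esearch(db, term):
    return io.StringIO("57240072 57240071")


def fake_read(handle):
    return {"IdList": handle.read().split()}


search = parsed(fake_esearch, fake_read)
gi_list = search(db="nucleotide", term="Opuntia AND rpl16")["IdList"]
print(gi_list)
```

The same wrapper could be applied to any handle-returning function whose
output the parser understands.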

My initial plan was to get the search results back as plain text
(retmode='uilist'), thus avoiding parsing the XML.  However, after
reading the Entrez documentation, and some experimentation to confirm
this, I was surprised to find the ESearch will only return XML.  The
NCBI appear to suggest that if you want your search results in another
format use the WebEnv session history, and then ask EFetch to reformat
it (!).  This does work, but means making two internet calls:

from Bio import Entrez
handle = Entrez.esearch(db='nucleotide', term="Opuntia AND rpl16",
usehistory="y")
session = Entrez.read(handle)['WebEnv']
gi_list = Entrez.efetch(db='nucleotide', WebEnv=session, query_key=1,
rettype='uilist').read().split('\n')
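
A small caveat about the final split('\n'): if the text ends with a
newline, the resulting list has a trailing empty string. A minimal
sketch of a more forgiving parse, assuming the uilist output is one
identifier per line (the function name parse_uilist and the sample text
are made up for this example):

```python
def parse_uilist(text):
    """Split uilist output into IDs, dropping blank lines and whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Made-up sample in the assumed one-ID-per-line layout.
sample = "57240072\n57240071\n6273287\n"
print(parse_uilist(sample))
```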

As an aside, do we really have to include the database in the efetch call above?

Peter

From biopython at maubp.freeserve.co.uk  Thu Jun 26 16:32:07 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jun 2008 21:32:07 +0100
Subject: [Biopython-dev] New release
In-Reply-To: <390323.35893.qm@web62411.mail.re1.yahoo.com>
References: <390323.35893.qm@web62411.mail.re1.yahoo.com>
Message-ID: <320fb6e00806261332q408cc02boa7ee4c3342b53e4b@mail.gmail.com>

On Thu, Jun 26, 2008 at 1:50 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
> I think we should make a new Biopython release within the next couple of weeks
> to solve the issues with NCBI and to get the fixed Blast parser out (for output
> from Blast 2.2.18). There are a few outstanding issues that hopefully can be
> fixed before the next release:
> 1) NCBI access from Bio.GenBank
> 2) Bug #2454 (Iterators can't use file-like objects), which affects a number of parsers in Biopython
> 3) Martel-based parsers.

Given the updates to Bio.EUtils to enforce the 3 second rule, the
urgent part of issue (1) is now resolved, and any further refinements
needn't hold up the release.

> From a technical viewpoint, none of these are very complicated. 2) is almost finished.

While there are still outstanding parsers affected by issue (2) (Bug
2454), I don't think this need hold up the release.

> With respect to 3), a small number of parsers in Biopython are based on
> Martel (none of the major ones as far as I can tell). For some of these
> parsers, it is not quite clear if they are still useful. For the remaining ones,
>  it would be nice if they could be rewritten without using Martel -- that would
> let us get rid of the dependency on mxTextTools.

Again, while removing the dependency on mxTextTools is a worthwhile
aim, I don't think this should hold up the release.

> Any other urgent issues that need to be resolved before a release?

There is an AlignInfo alphabet issue I'm currently working on, and
expect to have fixed tomorrow.

Peter

From dalke at dalkescientific.com  Thu Jun 26 17:40:51 2008
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 26 Jun 2008 23:40:51 +0200
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <320fb6e00806261004r227c3340wf390779f1cc4616b@mail.gmail.com>
References: <320fb6e00806260553i4a7c5b2cxe5ae5aa0c80e53d1@mail.gmail.com>
	<597121.15112.qm@web62401.mail.re1.yahoo.com>
	<320fb6e00806260743u3385955dt2be06d7f8122d8e5@mail.gmail.com>
	<320fb6e00806261004r227c3340wf390779f1cc4616b@mail.gmail.com>
Message-ID: <5DF39193-B52A-4EB9-84D3-C9626984DEA8@dalkescientific.com>

On Jun 26, 2008, at 7:04 PM, Peter wrote:
> I started working on a patch to mark Bio.GenBank.search_for() etc as
> deprecated, but on reflection I don't really like the longer code
> needed with Bio.Entrez

> One idea that might be worth discussing is having variations of the
> Entrez.e* functions which will parse the XML and return the results.
> i.e. something like this:
>
> def esearch2(...) :
>    """Calls ESearch and parses the returned XML."""
>    return read(esearch(..., retmode="XML"))

What about calling it "search"?  That is, the one that does  
everything the default way as most people expect is the one which  
doesn't need the prefix?

> My initial plan was to get the search results back as plain text
> (retmode='uilist'), thus avoiding parsing the XML.  However, after
> reading the Entrez documentation, and some experimentation to confirm
> this, I was surprised to find the ESearch will only return XML.  The
> NCBI appear to suggest that if you want your search results in another
> format use the WebEnv session history, and then ask EFetch to reformat
> it (!).  This does work, but means making two internet calls:

That's my memory of it too.

> As an aside, do we really have to include the database in the  
> efetch call above?

Yes.  Or you did 5 years ago.

				Andrew
				dalke at dalkescientific.com
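
The two-call approach discussed above (ESearch with the history enabled, then EFetch to reformat the stored result) can be sketched by building the two request URLs. This is only an illustration using the standard library, not Bio.Entrez itself; the helper names are hypothetical, and the WebEnv and query_key values would in practice be read from the ESearch XML reply.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, email):
    """URL for an ESearch call that stores its results on the history server."""
    params = {"db": db, "term": term, "usehistory": "y", "email": email}
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def efetch_url(db, webenv, query_key, rettype, retmode, email):
    """URL for an EFetch call that reformats a stored search result."""
    params = {
        "db": db,  # the database is given again here, as noted in the thread
        "WebEnv": webenv,
        "query_key": query_key,
        "rettype": rettype,
        "retmode": retmode,
        "email": email,
    }
    return EUTILS + "efetch.fcgi?" + urlencode(params)


# Placeholder values only; a real WebEnv string comes back from ESearch.
print(esearch_url("nucleotide", "opuntia[orgn]", "me@example.org"))
print(efetch_url("nucleotide", "WEBENV_FROM_ESEARCH", "1", "uilist", "text",
                 "me@example.org"))
```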


From biopython at maubp.freeserve.co.uk  Thu Jun 26 17:53:40 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jun 2008 22:53:40 +0100
Subject: [Biopython-dev] New release
In-Reply-To: <320fb6e00806261332q408cc02boa7ee4c3342b53e4b@mail.gmail.com>
References: <390323.35893.qm@web62411.mail.re1.yahoo.com>
	<320fb6e00806261332q408cc02boa7ee4c3342b53e4b@mail.gmail.com>
Message-ID: <320fb6e00806261453l649f4ce3i83a6ed38fec54965@mail.gmail.com>

>> Any other urgent issues that need to be resolved before a release?
>
> There is an AlignInfo alphabet issue I'm currently working on, and
> expect to have fixed tomorrow.

Fixed, I think.  Alphabets can be annoying, especially gapped alphabets!

Peter

From biopython at maubp.freeserve.co.uk  Thu Jun 26 18:05:45 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jun 2008 23:05:45 +0100
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <5DF39193-B52A-4EB9-84D3-C9626984DEA8@dalkescientific.com>
References: <320fb6e00806260553i4a7c5b2cxe5ae5aa0c80e53d1@mail.gmail.com>
	<597121.15112.qm@web62401.mail.re1.yahoo.com>
	<320fb6e00806260743u3385955dt2be06d7f8122d8e5@mail.gmail.com>
	<320fb6e00806261004r227c3340wf390779f1cc4616b@mail.gmail.com>
	<5DF39193-B52A-4EB9-84D3-C9626984DEA8@dalkescientific.com>
Message-ID: <320fb6e00806261505w6e51d168i78987ac109a6f015@mail.gmail.com>

On Thu, Jun 26, 2008 at 10:40 PM, Andrew Dalke
<dalke at dalkescientific.com> wrote:
> On Jun 26, 2008, at 7:04 PM, Peter wrote:
>>
>> I started working on a patch to mark Bio.GenBank.search_for() etc as
>> deprecated, but on reflection I don't really like the longer code
>> needed with Bio.Entrez
>
>> One idea that might be worth discussing is having variations of the
>> Entrez.e* functions which will parse the XML and return the results.
>> i.e. something like this:
>>
>> def esearch2(...) :
>>   """Calls ESearch and parses the returned XML."""
>>   return read(esearch(..., retmode="XML"))
>
> What about calling it "search"?  That is, the one that does everything the
> default way as most people expect is the one which doesn't need the prefix?

I like that idea for the naming :)  What do you think Michiel, as this
is your module?

Peter

From mjldehoon at yahoo.com  Thu Jun 26 19:16:23 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 26 Jun 2008 16:16:23 -0700 (PDT)
Subject: [Biopython-dev] New release
In-Reply-To: <320fb6e00806261332q408cc02boa7ee4c3342b53e4b@mail.gmail.com>
Message-ID: <501202.26872.qm@web62413.mail.re1.yahoo.com>

OK, then let's make a new release as soon as possible, and perhaps another one soon after that. Tentative date is this Sunday, around noon GMT. All biopython unit tests pass (at least, on my machine), so it should be straightforward to build a release.

--Michiel.


From mjldehoon at yahoo.com  Thu Jun 26 19:20:49 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 26 Jun 2008 16:20:49 -0700 (PDT)
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <320fb6e00806261505w6e51d168i78987ac109a6f015@mail.gmail.com>
Message-ID: <900951.88468.qm@web62414.mail.re1.yahoo.com>

There are some other possibilities, for example to use the retout parameter. This parameter lets you choose between XML, HTML, plain text, ... format for the results. We could make the rule that without an explicit value for this parameter, the Bio.Entrez.e* functions return the parsed results.

If we're not sure what to do, I suggest we keep the search_for function in Bio.GenBank for the upcoming release, and take this issue up later.

--Michiel.


From biopython at maubp.freeserve.co.uk  Thu Jun 26 19:45:50 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jun 2008 00:45:50 +0100
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <900951.88468.qm@web62414.mail.re1.yahoo.com>
References: <320fb6e00806261505w6e51d168i78987ac109a6f015@mail.gmail.com>
	<900951.88468.qm@web62414.mail.re1.yahoo.com>
Message-ID: <320fb6e00806261645y1819cddx620d430f34d7e725@mail.gmail.com>

On Fri, Jun 27, 2008 at 12:20 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> There are some other possibilities, for example to use the retout parameter.
> This parameter lets you choose between XML, HTML, plain text, ... format for
> the results.

I'm not sure if it's rettype, retmode or retout - but something like that.

> We could make the rule that without an explicit value for this
> parameter, the Bio.Entrez.e* functions return the parsed results.

Your suggestion to automatically do the parsing when XML format is
requested would prevent the user from parsing the XML themselves (e.g.
using SAX or DOM).  It would also spoil my plan to include some of the
Entrez sequence XML formats in Bio.SeqIO as this would need
Bio.Entrez.efetch(...) to return a handle with XML in it.

> If we're not sure what to do, I suggest we keep the search_for function in
> Bio.GenBank for the upcoming release, and take this issue up later.

That would be expedient.

Peter
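
To make the two positions in this exchange concrete, here is a rough sketch of the rule under discussion: parse the reply when no format is given, and hand back the raw handle when one is requested explicitly. Nothing here is Bio.Entrez code. The name esearch_auto is hypothetical, the function performing the request is passed in so that no network access is needed, and the sample reply uses made-up IDs.

```python
import io
import xml.etree.ElementTree as ET

# Made-up ESearch reply, trimmed to the two fields used below.
SAMPLE = ("<eSearchResult><Count>2</Count>"
          "<IdList><Id>111</Id><Id>222</Id></IdList></eSearchResult>")


def parse_esearch(handle):
    """Pull the count and the ID list out of an ESearch XML reply."""
    root = ET.parse(handle).getroot()
    return {
        "Count": int(root.findtext("Count")),
        "IdList": [element.text for element in root.findall("IdList/Id")],
    }


def esearch_auto(raw_esearch, retmode=None, **params):
    """Return parsed results by default, or the raw handle if a format is given."""
    if retmode is None:
        return parse_esearch(raw_esearch(retmode="xml", **params))
    return raw_esearch(retmode=retmode, **params)


def fake_esearch(**params):
    """Stand-in for the real request; always returns the sample reply."""
    return io.StringIO(SAMPLE)


print(esearch_auto(fake_esearch, db="pubmed", term="biopython"))
print(esearch_auto(fake_esearch, retmode="xml", db="pubmed", term="biopython").read())
```

With this rule a caller who wants to run SAX or DOM over the XML can still do so by asking for it explicitly, at the cost of one function returning two different kinds of object.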

From bugzilla-daemon at portal.open-bio.org  Thu Jun 26 19:47:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jun 2008 19:47:14 -0400
Subject: [Biopython-dev] [Bug 2090] Blast.NCBIStandalone BlastParser fails
	with blastall 2.2.14
In-Reply-To: <bug-2090-42@http.bugzilla.open-bio.org/>
Message-ID: <200806262347.m5QNlESr031036@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2090


------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-26 19:47 EST -------
Created an attachment (id=952)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=952&action=view)
Patch to Bio/Blast/NCBIStandalone.py

This is a very rough attempt at fixing multiquery BLAST output from recent
versions of NCBI BLAST.

It seems to work for the file I tested, but breaks the final part of the unit
test due to the alignments shown as "Flat Query-Anchored with(out) Identities",
described here:

http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/multi_formats.html

See also unit test files bt005 and bt045


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Jun 26 20:37:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jun 2008 20:37:14 -0400
Subject: [Biopython-dev] [Bug 2375] Coalescent support through Simcoal2
In-Reply-To: <bug-2375-42@http.bugzilla.open-bio.org/>
Message-ID: <200806270037.m5R0bEkY000324@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2375


------- Comment #24 from mdehoon at ims.u-tokyo.ac.jp  2008-06-26 20:37 EST -------
I committed my patch to setup.py, as it seems to work fine with Python 2.3,
2.4, and 2.5 on all platforms. Leaving this bug open, since we still need to
remove the workaround in Bio/PopGen/SimCoal/__init__.py.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk  Fri Jun 27 10:12:45 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jun 2008 15:12:45 +0100
Subject: [Biopython-dev] Bio.AlignIO and Bio.Entrez documentation
Message-ID: <320fb6e00806270712w134e1c5cm903b811c55fc60e1@mail.gmail.com>

Hi all,

I've realised that there is quite a lot of new content in the Tutorial
since the last release.

In addition to my new chapter on Bio.AlignIO, Michiel and I have both
spent a good chunk of time on the Bio.Entrez chapter of the tutorial.
Michiel wrote the bulk of this chapter and has updated it to cover the
new XML parser.  I've just been adding information based on the NCBI
guidelines (for example encouraging people to include their email
address in the Entrez calls), and I've just added another section with
an example using the history/webenv for a combined esearch and efetch.

If anyone could spare some time to proof read the tutorial,
concentrating on either or both of these new chapters (and trying the
examples) it would be appreciated.  Those of you with CVS access can
of course check in any little fixes - but if you spot anything
significant it's probably worth discussing first.

Ideally we can fix any little typos before Michiel releases Biopython
1.46 (tentatively this Sunday, around noon GMT).

Peter

P.S. If you'd like to help out and can't read or run LaTeX, let me
know by email and I'll send you the latest edition of the tutorial as
a PDF or HTML file.

From biopython at maubp.freeserve.co.uk  Fri Jun 27 11:42:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jun 2008 16:42:16 +0100
Subject: [Biopython-dev] Removing obsolete bits of the Tutorial
Message-ID: <320fb6e00806270842r6231adfdo8edff7a07a329cdf@mail.gmail.com>

I'm still in documentation mode, and I've just removed bits of
documentation of a few deprecated or obsolete bits of code.

I've just got to the "BioRegistry - automatically finding sequence
sources" section of the tutorial/cookbook, and this either needs major
updating or removing.  First of all since Biopython 1.44, the line
"from Bio import db" had to be "from Bio.config.DBRegistry import db".
 And secondly, given this is all based on Martel parsers, the list of
supported formats is now a lot thinner.

Would anyone object to me removing this section of the
tutorial/cookbook?  We might be able to deprecate it too, but I'm not
sure what side effects that might have so it's a bit risky this close
to a planned release.

Then there is the section on "Parser Design" which focuses on the
scanner/consumer model and lists lots of the events these parsers
(used to) generate.  I don't think any of this is useful, and suspect
that a lot of it is out of date.  Again, should we just remove this
section?

Peter


From mjldehoon at yahoo.com  Fri Jun 27 11:54:13 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 27 Jun 2008 08:54:13 -0700 (PDT)
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <320fb6e00806261645y1819cddx620d430f34d7e725@mail.gmail.com>
Message-ID: <224711.6366.qm@web62411.mail.re1.yahoo.com>

> > We could make the rule that without an explicit value for this
> > parameter, the Bio.Entrez.e* functions return the parsed results.

> Your suggestion to automatically do the parsing when XML format is
> requested would prevent the user from parsing the XML themselves (e.g.
> using SAX or DOM).

Actually I was suggesting to do the parsing only if no format is requested, and to return a handle to XML if XML format is requested.

But from the current examples in the Bio.Entrez chapter in the tutorial, it appears that typically users will have to write some glue code anyway to make optimal use of Bio.Entrez for their purposes. In that case, I suppose that whether or not we return a handle or an object from the Bio.Entrez.e* functions makes little difference.

--Michiel.


From biopython at maubp.freeserve.co.uk  Fri Jun 27 12:06:58 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jun 2008 17:06:58 +0100
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <224711.6366.qm@web62411.mail.re1.yahoo.com>
References: <320fb6e00806261645y1819cddx620d430f34d7e725@mail.gmail.com>
	<224711.6366.qm@web62411.mail.re1.yahoo.com>
Message-ID: <320fb6e00806270906p3d0d3a1dyf78b64bc2f0afa13@mail.gmail.com>

On Fri, Jun 27, 2008 at 4:54 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>> Your suggestion to automatically do the parsing when XML format is
>> requested would prevent the user from parsing the XML themselves (e.g.
>> using SAX or DOM).
>
> Actually I was suggesting to do the parsing only if no format is
> requested, and to return a handle to XML if XML format is requested.

Oh I see.  But determining the format is a complex combination of the
retmode and rettype parameters... quite confusing in its own right!
Especially as there are multiple different XML file formats for the same
result set.
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/efetchlit_help.html

> But from the current examples in the Bio.Entrez chapter in the tutorial, it appears
> that typically users will have to write some glue code anyway to make optimal
> use of Bio.Entrez for their purposes. In that case, I suppose that whether or not
> we return a handle or an object from the Bio.Entrez.e* functions makes little difference.

Fair point.  Certainly the "esearch and efetch" example is relatively
complicated, and having a combined "esearch then parse" function
wouldn't make much difference.

Let's leave this suggestion for the time being (having versions of the
Bio.Entrez functions which include the call to Bio.Entrez.read() to
parse the XML).

Peter

From mjldehoon at yahoo.com  Fri Jun 27 12:01:54 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 27 Jun 2008 09:01:54 -0700 (PDT)
Subject: [Biopython-dev] Removing obsolete bits of the Tutorial
In-Reply-To: <320fb6e00806270842r6231adfdo8edff7a07a329cdf@mail.gmail.com>
Message-ID: <215121.11545.qm@web62405.mail.re1.yahoo.com>


> I've just got to the "BioRegistry - automatically finding sequence
> sources" section of the tutorial/cookbook, and this either needs major
> updating or removing
> ...
> Would anyone object to me removing this section of the
> tutorial/cookbook?

I think it's better to remove it.

> Then there is the section on "Parser Design" which focuses on the
> scanner/consumer model and lists lots of the events these parsers
> (used to) generate.  I don't think any of this is useful, and suspect
> that a lot of it is out of date.  Again, should we just remove this
> section?

That too. Otherwise, we may inadvertently be causing new
Biopython developers to write their parsers using this out of
date parser design, which as far as I know is not being used
in the major Biopython modules.

--Michiel


From mjldehoon at yahoo.com  Fri Jun 27 12:40:13 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 27 Jun 2008 09:40:13 -0700 (PDT)
Subject: [Biopython-dev] Modules to be removed from Biopython
Message-ID: <492634.64872.qm@web62414.mail.re1.yahoo.com>

Hi everybody,

In recent releases, we have been using the rule of thumb to remove all modules from a new Biopython release that were deprecated two releases ago.

For the upcoming release, this means that we will remove the modules that were deprecated in Biopython 1.44. In that release, quite a lot of modules were deprecated; these modules will not appear in Biopython 1.46.

Some of the modules to be removed are relatively simple cases, which I think can be removed without causing any real pain to anybody:

Bio.crc (moved to Bio.SeqUtils.CheckSum)
Bio.Fasta.index_file
Bio.Fasta.Dictionary
Bio.GenBank.index_file
Bio.GenBank.Dictionary
Bio.Geo.Iterator (replaced by Bio.Geo.parse)
Bio.KEGG.Compound.Iterator (replaced by Bio.KEGG.Compound.parse)
Bio.KEGG.Enzyme.Iterator (replaced by Bio.KEGG.Enzyme.parse)
Bio.KEGG.Map.Iterator (replaced by Bio.KEGG.Map.parse)
Bio.lcc (moved to Bio.SeqUtils.lcc)
Bio.MarkupEditor
Bio.Medline.NLMMedlineXML
Bio.Medline.nlmmedline_001211_format
Bio.Medline.nlmmedline_010319_format
Bio.Medline.nlmmedline_011101_format
Bio.Medline.nlmmedline_031101_format
Bio.MultiProc
Bio.SeqIO.FASTA.py
Bio.SeqIO.generic.py

But, there is also a set of interconnected modules where it's not 100% clear if they can be removed without causing some surprises:
Bio.builders
Bio.config
Bio.dbdefs
Bio.formatdefs
Bio.dbdefs
Bio.expressions
Bio.FormatIO
Bio.Std
Bio.StdHandler
It is probably OK to remove these, since we did not get a barrage of complaints from our users after these were deprecated. Personally, I think it is important to keep the code base clean, so I am in favor of removing these (and see if anybody complains; in that case, we can always put these modules back in and make a new release). But I can live with keeping these modules for another release round. If anybody thinks that that would be better, please let us know.

--Michiel
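
As an illustration of the rule of thumb described above, the sketch below shows a deprecation warning of the kind a deprecated module might emit on import, plus a check for whether something deprecated in one release is due for removal in another. Both helper names are hypothetical and this is not taken from the Biopython code base; the release list is only there to drive the example.

```python
import warnings


def warn_deprecated(module_name, deprecated_in, replacement=None):
    """Emit the kind of warning a deprecated module could raise when imported."""
    message = "%s was deprecated in release %s and will be removed in a later release." % (
        module_name, deprecated_in)
    if replacement:
        message += " Please use %s instead." % replacement
    warnings.warn(message, DeprecationWarning, stacklevel=2)


def due_for_removal(deprecated_in, upcoming, releases):
    """Rule of thumb: remove what was deprecated at least two releases ago."""
    return releases.index(upcoming) - releases.index(deprecated_in) >= 2


releases = ["1.43", "1.44", "1.45", "1.46"]
print(due_for_removal("1.44", "1.46", releases))  # deprecated two releases back
print(due_for_removal("1.45", "1.46", releases))  # deprecated only one release back
```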


From biopython at maubp.freeserve.co.uk  Fri Jun 27 12:50:17 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jun 2008 17:50:17 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <492634.64872.qm@web62414.mail.re1.yahoo.com>
References: <492634.64872.qm@web62414.mail.re1.yahoo.com>
Message-ID: <320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com>

On Fri, Jun 27, 2008 at 5:40 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
> In recent releases, we have been using the rule of thumb to remove all
> modules from a new Biopython release that were deprecated two releases ago.

I was wondering if there was a stated policy on this.

> For the upcoming release, this means that we will remove the modules
> that were deprecated in Biopython 1.44. In that release, quite a lot of
> modules were deprecated; these modules will not appear in Biopython 1.46.
>
> Some of the modules to be removed are relatively simple cases, which I
> think can be removed without causing any real pain to anybody:
>
> Bio.crc (moved to Bio.SeqUtils.CheckSum)
> Bio.Fasta.index_file
> Bio.Fasta.Dictionary
> Bio.GenBank.index_file
> Bio.GenBank.Dictionary
> Bio.Geo.Iterator (replaced by Bio.Geo.parse)
> Bio.KEGG.Compound.Iterator (replaced by Bio.KEGG.Compound.parse)
> Bio.KEGG.Enzyme.Iterator (replaced by Bio.KEGG.Enzyme.parse)
> Bio.KEGG.Map.Iterator (replaced by Bio.KEGG.Map.parse)
> Bio.lcc (moved to Bio.SeqUtils.lcc)
> Bio.MarkupEditor
> Bio.Medline.NLMMedlineXML
> Bio.Medline.nlmmedline_001211_format
> Bio.Medline.nlmmedline_010319_format
> Bio.Medline.nlmmedline_011101_format
> Bio.Medline.nlmmedline_031101_format
> Bio.MultiProc
> Bio.SeqIO.FASTA.py
> Bio.SeqIO.generic.py

Those all look fine to remove.  I agree here.

> But, there is also a set of interconnected modules where it's not 100%
> clear if they can be removed without causing some surprises:
> Bio.builders
> Bio.config
> Bio.dbdefs
> Bio.formatdefs
> Bio.dbdefs
> Bio.expressions
> Bio.FormatIO
> Bio.Std
> Bio.StdHandler
> It is probably OK to remove these, since we did not get a barrage of
> complaints from our users after these were deprecated. Personally, I think it is
> important to keep the code base clean, so I am in favor of removing
> these (and see if anybody complains; in that case, we can always put
> these modules back in and make a new release). But I can live with
> keeping these modules for another release round. If anybody thinks
> that that would be better, please let us know.

Given some of these are very interconnected, I would be inclined to leave
them in for one more release.  However I'm content to see them go.  If no
one else has any qualms, then please carry on.

Peter

From biopython at maubp.freeserve.co.uk  Fri Jun 27 12:54:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jun 2008 17:54:16 +0100
Subject: [Biopython-dev] Removing obsolete bits of the Tutorial
In-Reply-To: <215121.11545.qm@web62405.mail.re1.yahoo.com>
References: <320fb6e00806270842r6231adfdo8edff7a07a329cdf@mail.gmail.com>
	<215121.11545.qm@web62405.mail.re1.yahoo.com>
Message-ID: <320fb6e00806270954r4ee7b16fw3210cd77f1708a3@mail.gmail.com>

On Fri, Jun 27, 2008 at 5:01 PM, Michiel de Hoon wrote:
>
>> I've just got to the "BioRegistry - automatically finding sequence
>> sources" section of the tutorial/cookbook, and this either needs major
>> updating or removing
>> ...
>> Would anyone object to me removing this section of the
>> tutorial/cookbook?
>
> I think it's better to remove it.

Gone.

>> Then there is the section on "Parser Design" which focuses on the
>> scanner/consumer model and lists lots of the events these parsers
>> (used to) generate.  I don't think any of this is useful, and suspect
>> that a lot of it is out of date.  Again, should we just remove this
>> section?
>
> That too. Otherwise, we may inadvertently be causing new
> Biopython developers to write their parsers using this out of
> date parser design, which as far as I know is not being used
> in the major Biopython modules.

It's not entirely out of date - don't SAX based XML parsers do
something similar?  And quite a few major modules still follow this
scheme (e.g. Bio.GenBank and Bio.SwissProt).  Anyway, I have removed
most of this section leaving only a short overview.

Peter


From biopython at maubp.freeserve.co.uk  Fri Jun 27 13:49:53 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jun 2008 18:49:53 +0100
Subject: [Biopython-dev] Recent Bio.Nexus updates
Message-ID: <320fb6e00806271049vdfb15co30a05c0a93963aba@mail.gmail.com>

Hi Frank,

I see you've got your CVS access working again - good :)

I wanted to ask you about two of your recent changes to Bio/Nexus/Nexus.py

First of all, you've added a new method export_phylip(), which seems
to be a simple function to record the Nexus object's alignment as a
PHYLIP format alignment.  One point of concern is code duplication
(Bio.AlignIO can write PHYLIP files).  Also, you don't seem to be
following the "spec" strictly, as the taxon names are not cropped to
ten characters, nor are any "illegal" characters dealt with.  More
generally, I wonder if this method is really needed - perhaps instead
a general method to return a Bio.Align.Generic.Alignment object would
be preferable.  This could then be used in conjunction with any of the
alignment formats supported in Bio.AlignIO.

Secondly, you seem to have reverted the alphabet change to
Bio/Nexus/Nexus.py made in revision 1.12 to fix Bug 2380.  Was this
deliberate or just accidental?
http://bugzilla.open-bio.org/show_bug.cgi?id=2380

Thanks,

Peter
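
The two points about the PHYLIP "spec" raised above (names cropped to ten characters, "illegal" characters dealt with) can be sketched as follows. This is neither the Bio.Nexus nor the Bio.AlignIO code; the function names are hypothetical, and the set of reserved characters and the choice of "_" as the replacement are assumptions made for the illustration.

```python
ILLEGAL = "()[]:;,"  # assumed set of characters not allowed in PHYLIP names


def phylip_name(name, width=10):
    """Replace reserved characters, then crop or pad the name to `width`."""
    cleaned = "".join("_" if char in ILLEGAL else char for char in name)
    return cleaned[:width].ljust(width)


def to_phylip(records):
    """Format (name, sequence) pairs as a sequential PHYLIP alignment string."""
    records = list(records)
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("all sequences in an alignment must have the same length")
    names = [phylip_name(name) for name, _ in records]
    if len(set(names)) != len(names):
        raise ValueError("names are not unique after cropping to 10 characters")
    lines = [" %d %d" % (len(records), lengths.pop())]
    lines += [name + seq for name, (_, seq) in zip(names, records)]
    return "\n".join(lines) + "\n"


print(to_phylip([("Homo_sapiens_1", "ACGT"), ("Pan(trog)", "ACGA")]))
```

Cropping can make two different taxon names collide, which is why the sketch refuses to write the file in that case rather than silently producing ambiguous output.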

From biopython at maubp.freeserve.co.uk  Fri Jun 27 17:58:04 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jun 2008 22:58:04 +0100
Subject: [Biopython-dev] [BioPython] Entrez
In-Reply-To: <1214569152.6026.9.camel@ubuntu>
References: <1214494546.6215.3.camel@ubuntu>
	<320fb6e00806260857i619d4947l130791ab8276f992@mail.gmail.com>
	<1214562160.6026.2.camel@ubuntu>
	<320fb6e00806270416x76d8b388mdd79577927001f32@mail.gmail.com>
	<1214569152.6026.9.camel@ubuntu>
Message-ID: <320fb6e00806271458t4e043c39sb664c4346c8a6949@mail.gmail.com>

Just forwarding this to the mailing list - Binbin's problem is
resolved (although I don't know what was wrong originally).

A happy ending :)

Peter

---------- Forwarded message ----------
From: binbin <binbin.liu at umb.no>
Date: Fri, Jun 27, 2008 at 1:19 PM
Subject: Re: [BioPython] Entrez
To: Peter <biopython at maubp.freeserve.co.uk>


I re-installed Biopython 1.45 and now I can import Entrez!

thanks very much!


On 2008-06-27 at 13:16 +0200, Peter wrote:
> On Fri, Jun 27, 2008 at 11:22 AM, binbin <binbin.liu at umb.no> wrote:
> > thank you for answering, i am a beginner of biopython,in the "Biopython
> > Tutorial and Cookbook":
> > 2.5  Connecting with biological databases:
> > this is found
> > "from Bio import Entrez"
> >
> > i tried this but it did not work for me, that is why i asked.
>
> That should have worked if your installation of Biopython 1.45 was successful.
>
> We may be able to work out what is wrong.  What operating system are
> you using, which version of python, and how did you install Biopython?
>
> Regards,
>
> Peter

From biopython at maubp.freeserve.co.uk  Fri Jun 27 18:06:14 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jun 2008 23:06:14 +0100
Subject: [Biopython-dev] [BioPython] Bio.SCOP.FileIndex
In-Reply-To: <141582.2274.qm@web62413.mail.re1.yahoo.com>
References: <141582.2274.qm@web62413.mail.re1.yahoo.com>
Message-ID: <320fb6e00806271506i1af1db34n1aec65605fd6f83c@mail.gmail.com>

On Wed, Jun 25, 2008 at 3:04 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
> When I was modifying Bio.SCOP, I noticed that Bio.SCOP.FileIndex is flawed
> if file reading is done via a buffer (which is often the case in Python).

Are you talking about Bio/SCOP/FileIndex.py?  The whole design seems to be
geared to indexing the position of each record in a file - down to the fact
that it takes a filename rather than a handle. Why does it need "fixing"?

> Before we try to fix this, is anybody actually using Bio.SCOP.FileIndex?
> If not, I think we should deprecate it instead of trying to fix it.

We've deprecated similar functionality in Bio.GenBank, although if I recall
correctly that was because it was using Martel and broke with mxTextTools 3.0,
and therefore fixing it was non-trivial.

If Bio.SCOP.FileIndex is broken, then deprecation seems sensible.

Peter

From mjldehoon at yahoo.com  Fri Jun 27 22:21:53 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 27 Jun 2008 19:21:53 -0700 (PDT)
Subject: [Biopython-dev] [BioPython] Bio.SCOP.FileIndex
In-Reply-To: <320fb6e00806271506i1af1db34n1aec65605fd6f83c@mail.gmail.com>
Message-ID: <216781.61321.qm@web62403.mail.re1.yahoo.com>

--- On Fri, 6/27/08, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Are you talking about Bio/SCOP/FileIndex.py?  The whole design seems to be
> geared to indexing the position of each record in a file - down to the fact
> that it takes a filename rather than a handle. Why does it need "fixing"?

FileIndex pulls out records from the iterator one by one, and then calls .tell() on the file handle to find the starting position of each record. The problem is that (due to buffered reading from the file handle) .tell() does not correspond to the record starting positions.

Taking the essential pieces of FileIndex:

>>> input = open("mydatafile.txt")
>>> while True:
...     next_line = input.next()
...     print input.tell()
... 
8192
8192
8192
8192
8192
...
8192
8192
18432
18432
18432
...

It works because the iterators that are actually used in Bio.SCOP call readline() internally, which reads exactly one line so that .tell() returns the expected answer.
But calling readline() in the iterator is a limitation (e.g., you cannot run it on a list of lines).

Another option is to let FileIndex itself call readline():

class FileIndex(dict):
    def __init__(self, filename, record_gen, key_gen):
        ...
        f = open(filename)
        while True:
            location = f.tell() # position where the next line starts
            line = f.readline()
            self[key] = location # store location
...
    def __getitem__(self, key):
        location = dict.__getitem__(self, key)
        f.seek(location)
        line = f.readline()
        return record_gen(line)

This works, but it means changing how users call FileIndex.
Which is also OK, but before modifying FileIndex it would be good to know if anybody is actually using this functionality.

--Michiel.
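
For anyone who wants to try the readline()-based idea outlined above, here is a self-contained sketch. It is not the Bio.SCOP.FileIndex code: the class name LineIndex is hypothetical, it assumes one record per line, and it opens the file in binary mode so that the stored offsets are plain byte counts.

```python
class LineIndex(dict):
    """Map a key for each line of a file to the byte offset where it starts."""

    def __init__(self, filename, record_gen, key_gen):
        dict.__init__(self)
        self.filename = filename
        self.record_gen = record_gen
        with open(filename, "rb") as f:
            while True:
                location = f.tell()   # offset before the line is read
                line = f.readline()   # readline() keeps tell() in step
                if not line:
                    break
                self[key_gen(record_gen(line))] = location

    def __getitem__(self, key):
        # Look up the stored offset, then re-read just that one line.
        location = dict.__getitem__(self, key)
        with open(self.filename, "rb") as f:
            f.seek(location)
            return self.record_gen(f.readline())
```

Here record_gen turns one raw line into a record and key_gen extracts the key from that record; the offset is taken before each readline() call so that it points at the start of the record rather than its end.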


From mjldehoon at yahoo.com  Fri Jun 27 22:28:48 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 27 Jun 2008 19:28:48 -0700 (PDT)
Subject: [Biopython-dev] Bio.GenBank.NCBIDictionary, Bio.PubMed.Dictionary
Message-ID: <982950.87150.qm@web62409.mail.re1.yahoo.com>

Does anybody have any further objections to deprecating Bio.GenBank.NCBIDictionary and Bio.PubMed.Dictionary? These two classes download records from NCBI one by one, which is exactly what NCBI advised against.

--Michiel
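
As a contrast to downloading records one by one, the sketch below groups IDs into batches and builds one EFetch URL per batch, with the IDs joined by commas. It only constructs URLs and does not contact the NCBI; the helper names and the default batch size of 100 are assumptions for illustration, not values taken from Biopython or from the NCBI guidelines.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batches(ids, size):
    """Yield successive lists of at most `size` IDs."""
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def efetch_batch_urls(db, ids, size=100, **extra):
    """Yield one EFetch URL per batch, with the IDs joined by commas."""
    for chunk in batches(ids, size):
        params = dict(db=db, id=",".join(chunk), **extra)
        yield EFETCH + "?" + urlencode(params)


# Placeholder IDs, batched two at a time to keep the output short.
for url in efetch_batch_urls("pubmed", ["1", "2", "3"], size=2, retmode="xml"):
    print(url)
```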


From bugzilla-daemon at portal.open-bio.org  Sat Jun 28 16:09:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 28 Jun 2008 16:09:44 -0400
Subject: [Biopython-dev] [Bug 2530] New: Bio.Seq.translate() treats invalid
	codons as stops
Message-ID: <bug-2530-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2530

           Summary: Bio.Seq.translate() treats invalid codons as stops
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


The following results are with CVS.  Biopython 1.45 may be different, as I have
recently tweaked the translate function for some less dramatic issues.

I would like Bio.Seq.translate() to raise exceptions on untranslatable codons,
rather than inserting a stop character, e.g. for "N@N" or "TA-".

Currently:

>>> from Bio.Seq import translate
>>> translate("TAA")
'*'
>>> translate("TAG")
'*'
>>> translate("TGA")
'*'
>>> translate("TAC")
'Y'
>>> translate("TAN")
...
Bio.Data.CodonTable.TranslationError: 'TAN'
>>> translate("NNN")
...
Bio.Data.CodonTable.TranslationError: 'TAN'
>>> translate("AAA")
'K'
>>> translate("ANA")
'X'
>>> translate("AXA")
'X'

That is all fine.  However,

>>> translate("A@A")
'*'
>>> translate("A-A")
'*'

These should also raise a TranslationError.  Suggested non-trivial patch to
follow.



From bugzilla-daemon at portal.open-bio.org  Sat Jun 28 16:19:09 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 28 Jun 2008 16:19:09 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
	codons as stops
In-Reply-To: <bug-2530-42@http.bugzilla.open-bio.org/>
Message-ID: <200806282019.m5SKJ9l2011097@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2530


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-28 16:19 EST -------
Created an attachment (id=953)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=953&action=view)
Patch to Bio/Seq.py Bio/Data/CodonTable.py and the test_seq.py unit test

The basic idea of this patch is to include the stop codons in the CodonTable's
forward table dictionary.  Currently, when doing the translation a stop codon
is inserted when the key is undefined (but this also happens for invalid
codons).

Instead, by including the stop codons in the forward table, we can do a single
mapping.  Any KeyError becomes a translation error.
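
The idea can be sketched roughly as follows, in Python 3 and without Biopython. This is only an illustration of the single-mapping approach, not the attached patch: TranslationError, FORWARD_TABLE and translate_strict are hypothetical stand-ins, and the table is deliberately tiny.

```python
class TranslationError(ValueError):
    """Raised when a codon is not present in the table."""


# Deliberately tiny table for illustration; a real table has 64 entries.
FORWARD_TABLE = {
    "AAA": "K", "TAC": "Y", "ATG": "M",
    "TAA": "*", "TAG": "*", "TGA": "*",   # stop codons are ordinary entries
}


def translate_strict(sequence, table=FORWARD_TABLE):
    """Translate codon by codon; any codon missing from the table is an error."""
    sequence = sequence.upper()
    protein = []
    for start in range(0, len(sequence), 3):
        codon = sequence[start:start + 3]
        try:
            protein.append(table[codon])
        except KeyError:
            # Invalid codons no longer fall through to a stop symbol.
            raise TranslationError(codon) from None
    return "".join(protein)
```

With the stops stored in the table, a codon such as "A-A" can no longer be mistaken for a stop, because there is no fallback branch left.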

However, this is a fairly significant change to the existing CodonTable
objects.  They are a strange bunch of objects - with the ambiguous codon
tables being very odd.  I have replaced all of these with a single codon table
which includes all the DNA and RNA codons, including the ambiguous ones.  All
the existing variants of DNA/RNA/Generic and (un)ambiguous CodonTables are
replaced with the single object.  We still have one per NCBI codon table.

I think that the CodonTable could be made simpler still, but I wanted to at
least try and remain API backwards compatible (bar the dictionary change).

Then, I tweaked the Bio.Seq translate method to take advantage of this.

NOTE - We don't have a unit test for Bio.Data.CodonTable or Bio.Translate, so
it would be wise to write one BEFORE committing this patch.  If there are any
other bits of code using Bio.Data.CodonTable they could also be affected.



From biopython at maubp.freeserve.co.uk  Sat Jun 28 16:32:09 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Jun 2008 21:32:09 +0100
Subject: [Biopython-dev] Failing unit tests under Windows
Message-ID: <320fb6e00806281332v44ba6139xd2531c57f53f92e@mail.gmail.com>

I run python 2.3.5 on Windows, and compile from source with MSVC 6.0
(which is a different setup to the one Michiel uses for the builds).
I just thought I should document the unit test oddities I see on this
machine:

test_ProtParam - fails with a single floating point difference, 0.562
versus 0.563.

test_Wise - doesn't fail gracefully due to a problem detecting dnal
http://bugzilla.open-bio.org/show_bug.cgi?id=2469

test_psw - fails due to a "doctest of" versus "Doctest: " string
difference.  This may be due to the different version of python?  We
can probably fix this in run_tests.py

test_KDTree - fails with ImportError: No module named _CKDTree
I do select yes when asked if I want to build Bio.KDTree - does this
work for anyone under Windows?

Peter

From bugzilla-daemon at portal.open-bio.org  Sat Jun 28 16:39:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 28 Jun 2008 16:39:45 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
	codons as stops
In-Reply-To: <bug-2530-42@http.bugzilla.open-bio.org/>
Message-ID: <200806282039.m5SKdjUA011740@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2530


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-28 16:39 EST -------
Actually there is a unit test, test_translate.py - maybe the lower case T
confused me?  The bad news is this unit test fails with my patch, due to the
Bio.Translate module using an incredibly strict check on the alphabet.

I'll try and come up with a less invasive change to Bio.Data.CodonTable which
makes Bio.Translate happy again - but probably not tonight.



From bugzilla-daemon at portal.open-bio.org  Sat Jun 28 21:57:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 28 Jun 2008 21:57:54 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
	codons as stops
In-Reply-To: <bug-2530-42@http.bugzilla.open-bio.org/>
Message-ID: <200806290157.m5T1vshF022329@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2530


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #953 is|0                           |1
           obsolete|                            |


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-28 21:57 EST -------
(From update of attachment 953)
There is an underlying issue in Bio.Data.CodonTable, which is at least
commented:

# These two are WRONG!  I need to get the
# list of ambiguous codons which code for
# the stop codons  XXX

For example, R = A or G, so UAR = UAA or UAG / TAR = TAA or TAG = stop codons.



From bugzilla-daemon at portal.open-bio.org  Sat Jun 28 22:37:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 28 Jun 2008 22:37:01 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
	codons as stops
In-Reply-To: <bug-2530-42@http.bugzilla.open-bio.org/>
Message-ID: <200806290237.m5T2b1Wu023585@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2530


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-28 22:37 EST -------
Created an attachment (id=954)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=954&action=view)
Rough patch to Bio/Data/CodonTable.py

This includes some self testing, but needs further validation before being
trusted.  For example, is it enough to compare just pairs of unambiguous
start/stop codons when generating the set of possible ambiguous start/stop
codons?



From mjldehoon at yahoo.com  Sun Jun 29 02:22:43 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 28 Jun 2008 23:22:43 -0700 (PDT)
Subject: [Biopython-dev] [BioPython] Bio.SCOP.FileIndex
In-Reply-To: <216781.61321.qm@web62403.mail.re1.yahoo.com>
Message-ID: <584421.23968.qm@web62410.mail.re1.yahoo.com>

It turned out that Bio.SCOP.FileIndex was used as a base class in Bio.SCOP.Cla and Bio.SCOP.Raf. Without using Bio.SCOP.FileIndex as a base class, the derived classes in Bio.SCOP.Cla and Bio.SCOP.Raf were easy to fix. So I deprecated Bio.SCOP.FileIndex, while keeping Bio.SCOP's functionality intact by fixing the derived classes.

--Michiel


From bugzilla-daemon at portal.open-bio.org  Sun Jun 29 02:24:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 29 Jun 2008 02:24:42 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806290624.m5T6Og3F029458@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #19 from mdehoon at ims.u-tokyo.ac.jp  2008-06-29 02:24 EST -------
Bio.SCOP is fixed now (added a parse() function as a replacement for the
Iterator class, which is now deprecated).



From bugzilla-daemon at portal.open-bio.org  Sun Jun 29 06:09:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 29 Jun 2008 06:09:25 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
	for '-F F' and make it safe
In-Reply-To: <bug-2508-42@http.bugzilla.open-bio.org/>
Message-ID: <200806291009.m5TA9PfZ021963@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508


------- Comment #7 from mmokrejs at ribosome.natur.cuni.cz  2008-06-29 06:09 EST -------
Quoting from http://www.python.org/dev/peps/pep-0324/

    - No implicit call of /bin/sh.  This means that there is no need
      for escaping dangerous shell meta characters.
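
To illustrate the quoted point, here is a small sketch using the subprocess module that PEP 324 introduced, written for Python 3.5 or later (subprocess.run). The helper run_without_shell is hypothetical and does not call blastall; it only shows that an argument passed in a list reaches the child process untouched, even if it contains shell meta characters.

```python
import subprocess
import sys


def run_without_shell(argument):
    """Run a child Python process that echoes its single argument back."""
    completed = subprocess.run(
        # A list of arguments: no /bin/sh is involved, so nothing is re-parsed.
        [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.argv[1])", argument],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return completed.stdout
```

A string such as "F; echo unsafe" comes back unchanged, whereas the same text pasted into a shell command line would run a second command.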



From bugzilla-daemon at portal.open-bio.org  Sun Jun 29 06:55:04 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 29 Jun 2008 06:55:04 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
	for '-F F' and make it safe
In-Reply-To: <bug-2508-42@http.bugzilla.open-bio.org/>
Message-ID: <200806291055.m5TAt4qX023404@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-29 06:55 EST -------
Hmm.  Another reason to move to Python 2.4+, see also Bug 2480.



From mjldehoon at yahoo.com  Sun Jun 29 07:15:00 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 29 Jun 2008 04:15:00 -0700 (PDT)
Subject: [Biopython-dev] CVS freeze for release 1.46
Message-ID: <799546.26730.qm@web62413.mail.re1.yahoo.com>

Hi everybody,
I will start creating the new release now.
Please don't make any commits to CVS until the new release is out.
Thanks!

--Michiel.


From bugzilla-daemon at portal.open-bio.org  Sun Jun 29 10:35:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 29 Jun 2008 10:35:11 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
	codons as stops
In-Reply-To: <bug-2530-42@http.bugzilla.open-bio.org/>
Message-ID: <200806291435.m5TEZBAh032091@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2530


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #954 is|0                           |1
           obsolete|                            |


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-29 10:35 EST -------
Created an attachment (id=955)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=955&action=view)
Patches Bio/Data/CodonTable.py for ambiguous start/stop codons

This implements the stub function list_ambiguous_codons, and adds a lot of
in-situ asserts which could later be moved to a unit test.

e.g. ['TAG', 'TAA'] -> ['TAG', 'TAA', 'TAR']
     ['UAA', 'UGA'] -> ['UAA', 'UGA', 'URA']

Note that ['TAG', 'TGA'] -> ['TAG', 'TGA']; this does not add 'TRR' as this
could be a stop codon or a coding amino acid.  Thus only two more codons are
added in the following example:

e.g. ['TGA', 'TAA', 'TAG'] -> ['TGA', 'TAA', 'TAG', 'TRA', 'TAR']
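
The rule described here (add an ambiguous codon only if every codon it can stand for is already in the list) can be sketched by brute force as below. This is an assumption-laden illustration in Python 3, not the code in the attachment: list_ambiguous_codons_sketch and AMBIGUOUS_DNA are hypothetical names, and the order of the added codons may differ from the patch.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes for DNA (RNA would use U in place of T).
AMBIGUOUS_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def list_ambiguous_codons_sketch(codons, ambiguous=AMBIGUOUS_DNA):
    """Return codons plus every ambiguous codon that can only mean one of them."""
    known = set(codons)
    extra = []
    for letters in product(sorted(ambiguous), repeat=3):
        candidate = "".join(letters)
        if candidate in known:
            continue
        # All unambiguous codons this candidate could stand for.
        expansions = {
            "".join(choice)
            for choice in product(*(ambiguous[letter] for letter in letters))
        }
        # Keep it only if every expansion is already in the given list.
        if expansions <= known:
            extra.append(candidate)
    return list(codons) + extra
```

Trying all 15 x 15 x 15 candidate codons is wasteful but makes the rule easy to check: 'TRR' is rejected for the three standard stops because one of its expansions, 'TGG', is not a stop codon.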



From mjldehoon at yahoo.com  Sun Jun 29 10:43:25 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 29 Jun 2008 07:43:25 -0700 (PDT)
Subject: [Biopython-dev] New release 1.46
Message-ID: <899008.26338.qm@web62403.mail.re1.yahoo.com>

Hi everybody,

Release 1.46 is essentially done. Feel free to start committing to CVS again.

Currently I am not able to update Biopython's wiki pages. This looks like a problem with the wiki, since I am getting a blank screen without any error message. So I cannot update the website and send out the announcement yet.

--Michiel


From biopython at maubp.freeserve.co.uk  Sun Jun 29 11:09:47 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 29 Jun 2008 16:09:47 +0100
Subject: [Biopython-dev] New release 1.46
In-Reply-To: <899008.26338.qm@web62403.mail.re1.yahoo.com>
References: <899008.26338.qm@web62403.mail.re1.yahoo.com>
Message-ID: <320fb6e00806290809r6ad238d3r3a16dfa145bc0186@mail.gmail.com>

On Sun, Jun 29, 2008 at 3:43 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
> Release 1.46 is essentially done. Feel free to start committing to CVS again.

Well done - I hope you didn't give up your whole weekend for this.

> Currently I am not able to update Biopython's wiki pages. This looks like a problem
> with the wiki, since I am getting a blank screen without any error message. So I
> cannot update the website and send out the announcement yet.

I've been in touch with the OBF about this before.  You'll notice all
the other project pages are down too (check www.biosql.org and
www.bioperl.org for example).  I'm told they have something in place
to automatically reboot the server, so it should fix itself within an
hour or so, but it looks like they haven't resolved the underlying
problem.

I guess this means the new release files themselves are still waiting
on your local machine(s)?  That's a shame.

Peter

From mjldehoon at yahoo.com  Sun Jun 29 11:07:36 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 29 Jun 2008 08:07:36 -0700 (PDT)
Subject: [Biopython-dev] Removing obsolete bits of the Tutorial
In-Reply-To: <320fb6e00806270954r4ee7b16fw3210cd77f1708a3@mail.gmail.com>
Message-ID: <176230.99034.qm@web62415.mail.re1.yahoo.com>


>>> Then there is the section on "Parser Design" which focuses on the
>>> scanner/consumer model and lists lots of the events these parsers
>>> (used to) generate.  I don't think any of this is useful, and suspect
>>> that a lot of it is out of date.  Again, should we just remove this
>>> section?
>>
>> That too. Otherwise, we may inadvertently be causing new
>> Biopython developers to write their parsers using this out of
>> date parser design, which as far as I know is not being used
>> in the major Biopython modules.
>
> It's not entirely out of date - don't SAX based XML parsers do
> something similar?

Yes, but there's a difference:

In an XML file, we need to find out where the XML tags are to be able to parse the file. These tags can appear anywhere in the file.

In flat-file text formats, typically different information is stored in different lines. So finding out where one piece of information ends and another one starts becomes trivial. We just need to pull out the lines one by one, and check whether they are a new piece of information or a continuation of the current piece of information.

Especially for simple formats (e.g. Fasta), using a scanner / consumer model can be unnecessarily complex. But also for more complicated formats, parsing line by line can be entirely straightforward. For example, have a look at Bio/SwissProt/KeyWList.py, which currently contains a line-by-line parser and a scanner/consumer parser (which is deprecated). The former takes 26 lines, the latter more than 100.
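
To make the line-by-line idea concrete, here is a minimal Python 3 sketch for a Fasta-like format. The function parse_fasta is a hypothetical illustration and is not taken from any Biopython module; it accepts any iterable of lines, so it works on a file handle or on a plain list.

```python
def parse_fasta(lines):
    """Yield (title, sequence) pairs from any iterable of lines."""
    title = None
    chunks = []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:]
            chunks = []
        elif title is not None and line:
            # Continuation of the current record.
            chunks.append(line)
    # Emit the last record, if there was one.
    if title is not None:
        yield title, "".join(chunks)
```

Each line is either the start of a new record or a continuation of the current one, so no separate scanner and consumer objects are needed.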

--Michiel.


From biopython at maubp.freeserve.co.uk  Sun Jun 29 11:28:04 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 29 Jun 2008 16:28:04 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com>
References: <492634.64872.qm@web62414.mail.re1.yahoo.com>
	<320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com>
Message-ID: <320fb6e00806290828u7133ee40x8feba14b19c13be8@mail.gmail.com>

> On Fri, Jun 27, 2008 at 5:40 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>> For the upcoming release, this means that we will remove the modules
>> that were deprecated in Biopython 1.44. In that release, quite a lot of
>> modules were deprecated; these modules will not appear in Biopython 1.46.
>>
>> Some of the modules to be removed are relatively simple cases, which I
>> think can be removed without causing any real pain to anybody:
>>
>> ...

I see you removed most of the easy ones before making Biopython 1.46.

Just to let you all know that I've just removed these three:

>> Bio.SeqIO.FASTA.py
>> Bio.SeqIO.generic.py
>> Bio.FormatIO

Peter

From fkauff at biologie.uni-kl.de  Mon Jun 30 04:34:30 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Mon, 30 Jun 2008 10:34:30 +0200
Subject: [Biopython-dev] Recent Bio.Nexus updates
In-Reply-To: <320fb6e00806271049vdfb15co30a05c0a93963aba@mail.gmail.com>
References: <320fb6e00806271049vdfb15co30a05c0a93963aba@mail.gmail.com>
Message-ID: <48689A96.4010805@biologie.uni-kl.de>

Hi Peter and Michiel,

Peter wrote:
> Hi Frank,
>
> I see you've got your CVS access working again - good :)
>
> I wanted to ask you about two of your recent changes to Bio/Nexus/Nexus.py
>
> First of all, you've added a new method export_phylip(), which seems
> to be a simple function to record the Nexus object's alignment as a
> PHYLIP format alignment.  One point of concern is code duplication
> (Bio.AlignIO can write PHYLIP files).  Also, you don't seem to be
> following the "spec" strictly, as the taxon names are not cropped to
> ten characters, nor are any "illegal" characters dealt with.  
True - I ignored this deliberately. I think except for old PHYLIP 
itself, all software I know handles longer taxon names by default. The 
format I used here is sometimes referred to as "relaxed phylip", but as it 
has become the standard for what people call phylip format, I just 
kept it this way.

> More
> generally, I wonder if this method is really needed - perhaps instead
> a general method to return a Bio.Align.Generic.Alignment object would
> be preferable.  This could then be used in conjunction with any of the
> alignment formats supported in Bio.AlignIO.
>   
That is a possibility. I would then vouch for adding support for 
"relaxed phylip" to AlignIO.PhylipIO (which I could easily do with a 
little modification of Nexus.export_phylip() myself)
> Secondly, you seem to have reverted the alphabet change to
> Bio/Nexus/Nexus.py made in revision 1.12 to fix Bug 2380.  Was this
> deliberate or just accidental?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2380
>
>   
Sorry for that. I missed that bug. Thanks for re-fixing it.

Frank
> Thanks,
>
> Peter
>
>   


-- 
J-Prof. Dr. Frank Kauff
Molecular Phylogenetics
FB Biologie, 13/276
TU Kaiserslautern
Postfach 3049
67653 Kaiserslautern

Tel. +49 (0)631 205-2562
Fax. +49 (0)631 205-2998
email: fkauff at biologie.uni-kl.de
skype: frank.kauff


From biopython at maubp.freeserve.co.uk  Mon Jun 30 05:12:17 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Jun 2008 10:12:17 +0100
Subject: [Biopython-dev] Recent Bio.Nexus updates
In-Reply-To: <48689A96.4010805@biologie.uni-kl.de>
References: <320fb6e00806271049vdfb15co30a05c0a93963aba@mail.gmail.com>
	<48689A96.4010805@biologie.uni-kl.de>
Message-ID: <320fb6e00806300212m6b129a17he9dfd7c8af7cbc03@mail.gmail.com>

>> First of all, you've added a new method export_phylip(), which seems
>> to be a simple function to record the Nexus object's alignment as a
>> PHYLIP format alignment.  One point of concern is code duplication
>> (Bio.AlignIO can write PHYLIP files).  Also, you don't seem to be
>> following the "spec" strictly, as the taxon names are not cropped to
>> ten characters, nor are any "illegal" characters dealt with.
>
> True - I ignored this deliberately. I think except for old PHYLIP itself,
> all software I know handles longer taxon names by default. The format I used
> here is sometimes referred to as "relaxed phylip", but as it has become the
> standard for what people call phylip format, I just kept it this way.

Sadly "relaxed phylip" is an even less well defined format!

>> More
>> generally, I wonder if this method is really needed - perhaps instead
>> a general method to return a Bio.Align.Generic.Alignment object would
>> be preferable.  This could then be used in conjunction with any of the
>> alignment formats supported in Bio.AlignIO.
>
> That is a possibility. I would then vouch for adding support for "relaxed
> phylip" to AlignIO.PhylipIO (which I could easily do with a little
> modification of Nexus.export_phylip() myself)

Would you expect spaces to be allowed in the names for "relaxed
phylip" files?  Writing the files is easy - checking that other tools
can understand them is more hassle.  And the flip side of this is
reading assorted versions of "relaxed phylip" is also tricky.  If you
have a collection of various "valid" files (ideally output from or
accepted by mainstream tools) we could use that to put together a test
suite which would define the de-facto standard.  But without that, I
wouldn't be so confident about adding this to Biopython.
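
For readers unfamiliar with the difference, the two naming conventions under discussion can be sketched as below. This is a hedged Python 3 illustration only: write_phylip is a hypothetical helper, not Nexus.export_phylip() or Bio.AlignIO, and the "relaxed" branch encodes just one common convention (full name, spaces replaced by underscores, one separating space), which is an assumption given how loosely that variant is defined.

```python
def write_phylip(records, strict=True):
    """Return sequential PHYLIP text for a list of (name, sequence) pairs."""
    records = list(records)
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("sequences must all have the same length")
    # Header line: number of taxa and number of characters.
    lines = [" %d %d" % (len(records), length)]
    for name, seq in records:
        if strict:
            # Strict: the name field is exactly ten characters wide.
            lines.append(name[:10].ljust(10) + seq)
        else:
            # Relaxed (assumed convention): full name, no spaces, one separator.
            lines.append(name.replace(" ", "_") + " " + seq)
    return "\n".join(lines) + "\n"
```

Note that strict truncation can turn two distinct long names into the same ten-character name, which is one reason the relaxed variant is popular.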

>> Secondly, you seem to have reverted the alphabet change to
>> Bio/Nexus/Nexus.py made in revision 1.12 to fix Bug 2380.  Was this
>> deliberate or just accidental?
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2380
>
> Sorry for that. I missed that bug. Thanks for re-fixing it.

There may be a more elegant way of fixing this.

Peter

From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 06:21:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 06:21:26 -0400
Subject: [Biopython-dev] [Bug 2509] Deprecating the .data property of the
	Seq and MutableSeq objects
In-Reply-To: <bug-2509-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301021.m5UALQVF020449@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2509


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 06:21 EST -------
See also Bug 2351, Make Seq more like a string, even subclass string?



From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 09:35:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 09:35:59 -0400
Subject: [Biopython-dev] [Bug 2531] New: Nexus and fasta parsers have a
	problem with identical taxa names
Message-ID: <bug-2531-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531

           Summary: Nexus and fasta parsers have a problem with identical
                    taxa names
           Product: Biopython
           Version: 1.44
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P4
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: abetanco at staffmail.ed.ac.uk


When identical taxa names are used to identify different sequences, the nexus
and fasta parser will output both taxa names, but output the same sequence for
each of them. 
If it's not possible to store both sequences, maybe it would be better if only
one of the sequences were  written out, so at least it's obvious there's a
problem?



From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 09:48:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 09:48:24 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301348.m5UDmO70030666@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 09:48 EST -------
Which Nexus and Fasta parsers?  There is more than one way to load these file
formats in Biopython - could you show us some sample code please?

You can attach a pair of example input files if it helps.

Thanks.  Peter.



From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 10:21:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 10:21:41 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301421.m5UELfPj000799@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 10:21 EST -------
Can I repeat my request that you upload an example file (by creating an
attachment to this bug) of a FASTA and NEXUS file that doesn't work for you?

Here is a small Nexus file I just created by hand, with repeated taxon
CYS1_DICDI (with almost the same sequence), and then below some example code
using Bio.Nexus to parse it.

==================================
#NEXUS
[TITLE: NoName]

begin data;
dimensions ntax=4 nchar=50;
format interleave datatype=protein   gap=- symbols="FSTNKEYVQMCLAWPHDRIG";

matrix
CYS1_DICDI          -----MKVIL LFVLAVFTVF VSS------- --------RG IPPEEQ---- 
ALEU_HORVU          MAHARVLLLA LAVLATAAVA VASSSSFADS NPIRPVTDRA ASTLESAVLG 
CATH_HUMAN          ------MWAT LPLLCAGAWL LGV------- -PVCGAAELS VNSLEK----
CYS1_DICDI          -----MKVIL LFVLAVFTVF VSS------- --------RG IPPEEQ---X
;
end; 
==================================

Then in python,
>>> filename = ...
>>> handle = open(filename)
>>> from Bio.Nexus import Nexus
>>> n = Nexus.Nexus(handle)
>>> print n.matrix.keys()
['CATH_HUMAN', 'CYS1_DICDI', 'CYS1_DICDI.copy', 'ALEU_HORVU']
>>> n.matrix['CYS1_DICDI']
Seq('-----MKVILLFVLAVFTVFVSS---------------RGIPPEEQ----', IUPACProtein())
>>> n.matrix['CYS1_DICDI.copy']
Seq('-----MKVILLFVLAVFTVFVSS---------------RGIPPEEQ---X', IUPACProtein())

Note that Bio.Nexus has automatically renamed the duplicate entry to
'CYS1_DICDI.copy' and that the two different sequences have been loaded
correctly.
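
The renaming behaviour seen above can be imitated with a few lines of Python 3. This is not the Bio.Nexus implementation: unique_names is a hypothetical helper, and its handling of a third occurrence (appending the suffix again) is an assumption made for the sketch.

```python
def unique_names(names, suffix=".copy"):
    """Return the names in order, adding a suffix to any repeated name."""
    seen = set()
    result = []
    for name in names:
        candidate = name
        # Keep extending the name until it no longer clashes.
        while candidate in seen:
            candidate += suffix
        seen.add(candidate)
        result.append(candidate)
    return result
```

Keeping both records under distinct keys means neither sequence silently overwrites the other in a dictionary keyed by taxon name.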



From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 10:36:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 10:36:06 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301436.m5UEa6WK001525@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #3 from abetanco at staffmail.ed.ac.uk  2008-06-30 10:36 EST -------
Created an attachment (id=956)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=956&action=view)
nexus file

Sorry for the overly complicated nexus file, but I can't seem to reproduce the
bug with a simple example.  In this case, HI99.Line5 is entered twice, and
differs just at three sites (249, 417, and 452).  The result I get at those
three sites is the first sequence duplicated twice. 

                249     417     452
nexus file
HI99.Line5      T       T       A
HI99.Line5      C       C       G
fasta output
HI99.Line5      T       T       A
HI99.Line5      T       T       A


To do the conversion, I used this, which I think is just copied off the
Biopython documentation site:

#! /usr/bin/python


if __name__ == '__main__' :

        from Bio import SeqIO
        import sys

        input_handle = open(sys.argv[1], "rU")
        output_handle = open(sys.argv[1] + ".fas", "w")

        sequences = SeqIO.parse(input_handle, "nexus")
        SeqIO.write(sequences, output_handle, "fasta")

        output_handle.close()
        input_handle.close()



From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 10:52:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 10:52:08 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301452.m5UEq8DN002181@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 10:52 EST -------
Thanks for the example file - I can now reproduce a problem, which is progress.

There is a rather cryptic error message from Bio.SeqIO, due to the fact that
when Bio.Nexus parses the file it doesn't create a matrix.

You can see this by using Bio.Nexus directly:

>>> filename = ...
>>> handle = open(filename)
>>> from Bio.Nexus import Nexus
>>> n = Nexus.Nexus(handle)
>>> n.matrix.keys()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'keys'
>>> n.matrix is None
True

This explains why trying to use Bio.SeqIO gives the following exception:
TypeError: argument of type 'NoneType' is not iterable

So, from my point of view this is good news (joke) as it's not really a problem
in Bio.SeqIO - although I will fix Bio.SeqIO so it fails gracefully.

This seems to be a problem in Bio.Nexus, so it's a job for Frank...

I've got a couple more questions for you:

(1) Where did this file come from?  I'm not an expert on the details of the
Nexus file format, but I am wondering which program wrote this file, as perhaps
it is invalid in some way?

(2) Could we add it to Biopython as an example for our unit tests?  It might be
a bit big as it is, but we could cut it down a little by hand first.

P.S. I have retitled the bug from "Nexus and fasta parsers have a problem with
identical taxa names" to "Bio.Nexus has a problem with identical taxa names".

You don't seem to be parsing in any FASTA files, just trying to write one.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mjldehoon at yahoo.com  Mon Jun 30 10:55:16 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 30 Jun 2008 07:55:16 -0700 (PDT)
Subject: [Biopython-dev] New release
Message-ID: <97693.82874.qm@web62401.mail.re1.yahoo.com>

Sorry, but I still can't edit the Biopython wiki pages, so I can't make the new release available. Can other people edit these pages?

--Michiel.


From biopython at maubp.freeserve.co.uk  Mon Jun 30 10:56:39 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Jun 2008 15:56:39 +0100
Subject: [Biopython-dev] Bug 2531 - Bio.Nexus problem with file with
	repeated id
Message-ID: <320fb6e00806300756l7e9f6fe6sc68cf1884cb2994@mail.gmail.com>

Hi Frank,

Would you be able to take a look at this new report, bug 2531:
http://bugzilla.open-bio.org/show_bug.cgi?id=2531

The reporter Andrea Betancourt says she is using Biopython 1.44, while
I am on CVS (which should be equivalent to Biopython 1.46 for
Bio.Nexus).  Her reported symptoms and what I see are different... but
she has provided a test file to work from.

Thanks,

Peter

From p.j.a.cock at googlemail.com  Mon Jun 30 11:00:22 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 30 Jun 2008 16:00:22 +0100
Subject: [Biopython-dev] New release
In-Reply-To: <97693.82874.qm@web62401.mail.re1.yahoo.com>
References: <97693.82874.qm@web62401.mail.re1.yahoo.com>
Message-ID: <320fb6e00806300800rd74082eqabbd1a2bef66da76@mail.gmail.com>

On Mon, Jun 30, 2008 at 3:55 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Sorry, but I still can't edit the Biopython wiki pages, so I can't make the new
> release available. Can other people edit these pages?

No - as soon as I saw the wiki came back to life last night I tried,
and have tried again today.  I can make changes, view the preview and
differences, but I just get a blank page when I click submit.  I sent
off an email to OBF to alert them in case you hadn't.

I see the Biopython 1.46 files themselves are now online at
http://biopython.org/DIST/ so at least some of the web-server is
running properly :)

We could just do the announcement by email and the news page, and fix
the wiki later.  But it does risk causing a little confusion in the
short term.

Peter

From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 11:36:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 11:36:17 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301536.m5UFaHlo004669@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #5 from abetanco at staffmail.ed.ac.uk  2008-06-30 11:36 EST -------
The file was written by a Windows program called DNAsp
(http://www.ub.es/dnasp/), which is widely used by population geneticists,
which is not to say that it didn't write an invalid file.  But it looked OK to
me, other than the too short taxa names. (Those too short names were inherited
from another program).
I don't mind you using it for the unit tests, but it would be nice if it were
cut down or something, as it is both unwieldy and unpublished data.
A.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 11:38:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 11:38:00 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301538.m5UFc0S4004813@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


fkauff at biologie.uni-kl.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED


------- Comment #6 from fkauff at biologie.uni-kl.de  2008-06-30 11:38 EST -------
Handling a handle works like a charm for me with the attachment provided:

>>> handle=open('eg.nex')
>>> n=Nexus.Nexus(handle)
>>> n.matrix.keys()
['HI99.Line5.copy', 'am', 'HI99.Line1.copy', 'ezo', 'HI99.Line0.copy',
'DI05.Line5.copy', 'DI05.Line0.copy', 'DI05.Line8.copy1', 'DI05.Line1.copy1',
'HI99.Line3.copy', 'HI99.Line1.copy1', 'DI05.Line1.copy', 'DI05.Line9.copy',
'DI05.Line8.copy', 'HI99.Line4.copy', 'vir', 'DI05.Line8', 'DI05.Line9',
'HI99.Line2.copy', 'DI05.Line2', 'DI05.Line3', 'DI05.Line0', 'DI05.Line1',
'DI05.Line6', 'DI05.Line7', 'DI05.Line4', 'DI05.Line5', 'HI99.Line1',
'HI99.Line0', 'HI99.Line3', 'HI99.Line2', 'HI99.Line5', 'HI99.Line4']

However, Nexus.py needs unique taxon names. Non-unique taxon names won't make
much sense in a nexus file imho. If Nexus.py encounters non-unique names, they
are made unique by adding a suffix (.copy, .copy1, ...) to them. Could this
cause problems for SeqIO.NexusIO?
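
To illustrate the renaming just described, here is a minimal sketch. It is not
the actual Nexus.py code: the function name make_unique is made up, and the
numbering after .copy1 is an assumption extrapolated from the keys shown above.

```python
def make_unique(names):
    """Rename repeated names with .copy, .copy1, ... keeping the input order."""
    seen = set()
    unique = []
    for name in names:
        candidate = name
        repeat = 0
        while candidate in seen:
            # First clash gives ".copy"; later clashes give ".copy1", ".copy2", ...
            # (the pattern beyond ".copy1" is assumed, not taken from Nexus.py).
            candidate = name + ".copy" + (str(repeat) if repeat else "")
            repeat += 1
        seen.add(candidate)
        unique.append(candidate)
    return unique


renamed = make_unique(["DI05.Line8", "vir", "DI05.Line8", "DI05.Line8"])
print(renamed)
```

With the three DI05.Line8 entries this gives DI05.Line8, DI05.Line8.copy and
DI05.Line8.copy1, matching the keys in the listing above.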

Frank


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 12:12:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 12:12:29 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301612.m5UGCTnZ006531@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 12:12 EST -------
It looks like I didn't have the latest version of Bio.Nexus on this machine
which may have added to the confusion.  I've just updated to CVS (i.e. almost
exactly Biopython 1.46).  My issue with the matrix being None has gone away. 
Oops.

>>> from Bio.Nexus import Nexus
>>> n = Nexus.Nexus(open('eg.nex'))
>>> n.matrix.keys()
['HI99.Line5.copy', 'am', 'HI99.Line1.copy', 'ezo', 'HI99.Line0.copy',
'DI05.Line5.copy', 'DI05.Line0.copy', 'DI05.Line8.copy1', 'DI05.Line1.copy1',
'HI99.Line3.copy', 'HI99.Line1.copy1', 'DI05.Line1.copy', 'DI05.Line9.copy',
'DI05.Line8.copy', 'HI99.Line4.copy', 'vir', 'DI05.Line8', 'DI05.Line9',
'HI99.Line2.copy', 'DI05.Line2', 'DI05.Line3', 'DI05.Line0', 'DI05.Line1',
'DI05.Line6', 'DI05.Line7', 'DI05.Line4', 'DI05.Line5', 'HI99.Line1',
'HI99.Line0', 'HI99.Line3', 'HI99.Line2', 'HI99.Line5', 'HI99.Line4']
>>> assert [id for id in n.matrix] == n.matrix.keys()
>>> n.matrix['HI99.Line5']
Seq('ATCGATAGCATTGCGG-GGACGACGATGGACATTTGGAAAACGAATATGAAAAT...GAG',
IUPACAmbiguousDNA())
>>> n.matrix['HI99.Line5'][249-1]
'T'
>>> n.matrix['HI99.Line5'][417-1]
'T'
>>> n.matrix['HI99.Line5'][452-1]
'A'
>>> n.matrix['HI99.Line5.copy']
Seq('ATCGATAGCATTGCGGCGGACGACGATGGACATTTGGAAAACGAATATGAAAAT...GAG',
IUPACAmbiguousDNA())
>>> n.matrix['HI99.Line5.copy'][249-1]
'C'
>>> n.matrix['HI99.Line5.copy'][417-1]
'C'
>>> n.matrix['HI99.Line5.copy'][452-1]
'G'

So far this looks good.  However:

>>> n.original_taxon_order
['vir', 'am', 'ezo', 'DI05.Line5', 'DI05.Line1', 'DI05.Line9', 'DI05.Line2',
'DI05.Line3', 'HI99.Line2', 'HI99.Line1', 'HI99.Line5', 'DI05.Line4',
'DI05.Line1', 'DI05.Line7', 'HI99.Line3', 'DI05.Line6', 'DI05.Line8',
'HI99.Line4', 'DI05.Line1', 'HI99.Line1', 'DI05.Line8', 'DI05.Line5',
'HI99.Line2', 'HI99.Line0', 'HI99.Line0', 'HI99.Line5', 'DI05.Line9',
'HI99.Line3', 'DI05.Line0', 'DI05.Line0', 'HI99.Line4', 'HI99.Line1',
'DI05.Line8']

In the Bio.SeqIO code that calls Bio.Nexus, I hadn't realized that Bio.Nexus
kept the un-edited taxon names around.  It is this list of the non-unique
original identifiers that Bio.SeqIO was using, which explains why you end up
with two copies of HI99.Line5.
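
The effect can be reproduced without Biopython. This is only a toy sketch:
plain strings stand in for the Seq objects, holding just the letters at
positions 249, 417 and 452 from the session above, and the two variables
imitate n.matrix and n.original_taxon_order.

```python
# Toy stand-ins for n.matrix and n.original_taxon_order.
matrix = {"HI99.Line5": "TTA", "HI99.Line5.copy": "CCG"}
original_taxon_order = ["HI99.Line5", "HI99.Line5"]

# Looking up by the original (non-unique) names fetches the first entry twice,
# so the sequence stored under "HI99.Line5.copy" is never written out.
records = [(name, matrix[name]) for name in original_taxon_order]
for name, letters in records:
    print(name, letters)
```

Both lines print TTA, which is the symptom in the original report: the FASTA
output had T, T, A twice where the Nexus file had a second row of C, C, G.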

Sorry Frank - I was pointing fingers when it was my own bug after all!


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 12:20:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 12:20:20 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301620.m5UGKK7M007026@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 12:20 EST -------
Frank, 

Looking back, the reason I was using the original_taxon_order list was I wanted
to get the sequences in their original order.  I see now that I can't use the
elements in this list as keys to the matrix because the matrix keys are the
modified taxon names.

Is there any way to get the modified taxon names in the original order?  Other
than looping over original_taxon_order and repeating your naming algorithm?

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 13:07:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 13:07:05 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301707.m5UH75I7009356@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 13:07 EST -------
Created an attachment (id=957)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=957&action=view)
Sample input file

Simple example file without a TAXA block

Second example file to follow


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 13:22:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 13:22:23 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301722.m5UHMNo4010009@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 13:22 EST -------
Created an attachment (id=958)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=958&action=view)
Second example file

Using the first file where there is no TAXA block:

>>> from Bio.Nexus import Nexus
>>> n = Nexus.Nexus(open('dup_names.nex'))
>>> print n.matrix.keys()
['CATH_HUMAN', 'CYS1_DICDI', 'CYS1_DICDI.copy', 'ALEU_HORVU']
>>> print n.original_taxon_order
['CYS1_DICDI', 'ALEU_HORVU', 'CATH_HUMAN', 'CYS1_DICDI.copy']

Then with a TAXA block,

>>> n2 = Nexus.Nexus(open('dup_names2.nex'))
>>> print n2.matrix.keys()
['CATH_HUMAN', 'CYS1_DICDI', 'CYS1_DICDI.copy', 'ALEU_HORVU']
>>> print n2.original_taxon_order
['CYS1_DICDI', 'ALEU_HORVU', 'CATH_HUMAN', 'CYS1_DICDI']

Notice the different behaviour of the original_taxon_order list.  In the first
case it gets the modified names, in the second case it doesn't.

Is this deliberate Frank?  On the other hand, maybe Nexus files without a TAXA
block are rare in real life?  Are they?


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From fkauff at biologie.uni-kl.de  Mon Jun 30 13:10:15 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Mon, 30 Jun 2008 19:10:15 +0200
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a
 problem with identical taxa names
In-Reply-To: <200806301612.m5UGCTnZ006531@portal.open-bio.org>
References: <200806301612.m5UGCTnZ006531@portal.open-bio.org>
Message-ID: <48691377.803@biologie.uni-kl.de>


bugzilla-daemon at portal.open-bio.org wrote:
>
>
> In the Bio.SeqIO code that calls Bio.Nexus, I hadn't realized that Bio.Nexus
> kept the un-edited taxon names around.  It is this list of the non-unique
> original identifiers that Bio.SeqIO was using, which explains why you end up
> with two copies of HI99.Line5.
>
> Sorry Frank - I was pointing fingers when it was my own bug after all!
>
>
> Looking back, the reason I was using the original_taxon_order list was I wanted
> to get the sequences in their original order.  I see now that I can't use the
> elements in this list as keys to the matrix because the matrix keys are the
> modified taxon names.
>
> Is there any way to get the modified taxon names in the original order?  Other
> than looping over original_taxon_order and repeating your naming algorithm?
>   
Actually - this *IS* a bug. All fingers were pointing correctly... 
original_taxon_order was kept just for compatibility, and is the 
same as taxlabels. taxlabels is supposed to have the unique identifiers 
- it just doesn't work correctly with non-unique ids in interleaved data 
sets.
Fix following soon.

Frank

From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 13:28:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 13:28:25 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301728.m5UHSPVk010377@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 13:28 EST -------
Created an attachment (id=959)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=959&action=view)
Tentative patch to Bio/SeqIO/NexusIO.py

This seems to cope with Andrea's real input file and my two hand written ones. 
It works by taking the original_taxon_order lists, and applying the
disambiguation algorithm if needed.  Not very elegant!
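
For readers without access to the attachment, the idea can be sketched roughly
as follows. This is not the patch itself: the helper name
keys_in_original_order is invented, a plain dict stands in for n.matrix, and
the suffix pattern after .copy1 is assumed.

```python
def keys_in_original_order(original_order, matrix):
    """Map each entry of original_order to a matrix key, renaming repeats."""
    times_seen = {}
    keys = []
    for name in original_order:
        count = times_seen.get(name, 0)
        times_seen[name] = count + 1
        if count == 0:
            key = name                             # first occurrence is unchanged
        elif count == 1:
            key = name + ".copy"                   # second occurrence
        else:
            key = "%s.copy%d" % (name, count - 1)  # assumed pattern for later ones
        if key not in matrix:
            raise KeyError("No matrix entry for %r" % key)
        keys.append(key)
    return keys


# The two cases from comment 10 (the sequences themselves are omitted).
matrix = dict.fromkeys(["CATH_HUMAN", "CYS1_DICDI", "CYS1_DICDI.copy", "ALEU_HORVU"])
no_taxa_block = ["CYS1_DICDI", "ALEU_HORVU", "CATH_HUMAN", "CYS1_DICDI.copy"]
with_taxa_block = ["CYS1_DICDI", "ALEU_HORVU", "CATH_HUMAN", "CYS1_DICDI"]
print(keys_in_original_order(no_taxa_block, matrix))
print(keys_in_original_order(with_taxa_block, matrix))
```

Both calls give the same list of keys, so the caller no longer needs to know
whether original_taxon_order already holds the modified names.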


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 15:29:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 15:29:32 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301929.m5UJTWYQ015982@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 15:29 EST -------
Created an attachment (id=960)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=960&action=view)
Suggested patch to Bio/Nexus/Nexus.py

This modifies Bio.Nexus to ensure that the original_taxon_order uses the
original (duplicated) names, resolving the discrepancy I reported in comment
10.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 17:18:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 17:18:48 -0400
Subject: [Biopython-dev] [Bug 2520] Reading ACE assembly contig files in
	Bio.SeqIO
In-Reply-To: <bug-2520-42@http.bugzilla.open-bio.org/>
Message-ID: <200806302118.m5ULImoB021255@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2520


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 17:18 EST -------
Checked into CVS.

We'll need to revisit this once we have a good way of dealing with
per-letter-annotation which would be suitable for the quality scores.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 18:50:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 18:50:01 -0400
Subject: [Biopython-dev] [Bug 2532] New: Using IUPAC alphabets in mixed case
	Seq objects
Message-ID: <bug-2532-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2532

           Summary: Using IUPAC alphabets in mixed case Seq objects
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


Bio.Alphabets.IUPAC defines a number of alphabets with defined lists of valid
letters which are in upper case ONLY.

Bio.Nexus and Bio.Sequencing.Phd create Seq objects which use these alphabets
even with mixed case sequences.

This contradicts how I think the alphabet's .letters property is intended to be
used (although currently this is not enforced by the Seq object).

I suggest either:

(a) Bio.Nexus etc switch to using generic DNA/RNA alphabets for any Seq objects
including lower case letters (or more simply, all Seq objects).

(b) We add lower case and mixed case variants of the alphabet objects, and use
the mixed case IUPAC alphabets in Bio.Nexus etc for the Seq objects.

There is also the option of (c) Extend the existing upper case only IUPAC
alphabets to include lower case too, but I fear this could have unexpected side
effects (e.g. where people are looping over the expected set of letters).
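
The mismatch can be shown with plain strings. This is a sketch only: the
letters are typed out by hand rather than read from the alphabet object, and
fits_alphabet is a made-up helper, not an existing Biopython function.

```python
# Upper case letters of the IUPAC ambiguous DNA code, written out by hand.
IUPAC_AMBIGUOUS_DNA = "GATCRYWSMKHBVDN"


def fits_alphabet(sequence, letters):
    """Return True if every character of sequence appears in letters."""
    return all(char in letters for char in sequence)


mixed = "ACGTacgtNn"
upper_only_ok = fits_alphabet(mixed, IUPAC_AMBIGUOUS_DNA)
after_upper_ok = fits_alphabet(mixed.upper(), IUPAC_AMBIGUOUS_DNA)

# Option (b): a mixed case variant of the same alphabet.
mixed_case_letters = IUPAC_AMBIGUOUS_DNA + IUPAC_AMBIGUOUS_DNA.lower()
mixed_variant_ok = fits_alphabet(mixed, mixed_case_letters)

print(upper_only_ok, after_upper_ok, mixed_variant_ok)
```

The mixed case sequence only passes once it is upper-cased or checked against
a mixed case letter set, which is the gap that options (a) to (c) try to close.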


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 18:51:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 18:51:17 -0400
Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq
	objects
In-Reply-To: <bug-2532-42@http.bugzilla.open-bio.org/>
Message-ID: <200806302251.m5UMpHBf024519@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2532


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 18:51 EST -------
Created an attachment (id=961)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=961&action=view)
Patch to Bio.Sequencing.Phd

This takes the simple route of using a generic DNA alphabet.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Jun  2 08:19:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 04:19:50 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200806020819.m528JoXn006809@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #19 from ibdeno at gmail.com  2008-06-02 04:19 EST -------
Thank you, Peter.

In principle, I don't use that information. I will try then with the XML
parser.

Cheers,


Miguel

(In reply to comment #18)


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jun  2 08:49:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 04:49:55 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200806020849.m528ntdY008609@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #20 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-02 04:49 EST -------
Marking this bug as fixed.

The original report was about parsing the plain text output which is fixed -
see comment 12, and Bio/Blast/NCBIStandalone.py CVS revision 1.72.  I have not
added the 2.2.18 plain text file as a unit test since it's over 750kb.

For the XML output from 2.2.18, as far as I can tell we are not ignoring any
important information from PSI-BLAST, as it is simply not included.  If the
NCBI updates the XML output from blastpgp then we should revisit the XML
parsing.

Thank you Miguel for your report and assistance.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jun  2 10:37:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 06:37:51 -0400
Subject: [Biopython-dev] [Bug 2503] An error when parsing NCBIWWW Blast
	output
In-Reply-To: <bug-2503-42@http.bugzilla.open-bio.org/>
Message-ID: <200806021037.m52Abpj9019177@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2503


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-02 06:37 EST -------
Dear Prashanth,

Unless you can provide some more information, I'm going to have to close Bug
2503, as you haven't given us enough to go on.

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jun  2 12:57:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 08:57:20 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To: <bug-1944-42@http.bugzilla.open-bio.org/>
Message-ID: <200806021257.m52CvKt4026676@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944


------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-02 08:57 EST -------
I've added simple __str__ and __repr__ methods to the alignment class in
Bio/Align/Generic.py CVS revision 1.8, which give output like this:

str(a):
DNAAlphabet() alignment with 3 rows and 14 columns
ACGATCAGCTAGCT Alpha
CCGATCAGCTAGCT Beta
ACGATGAGCTAGCT Gamma

repr(a):
<__main__.Alignment instance (3 records of length 14, DNAAlphabet()) at
9e96c2c>

The string output gets truncated to show a maximum of 20 rows and 50 columns,
which allowing for typical identifiers will still display nicely on a default
terminal.
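
A rough sketch of that kind of truncated output, for illustration. It is not
the code checked into Bio/Align/Generic.py: the function format_alignment is
hypothetical, it takes plain (sequence, identifier) pairs, and marking cut rows
and columns with "..." is an assumption.

```python
def format_alignment(rows, alphabet="DNAAlphabet()", max_rows=20, max_cols=50):
    """Format (sequence, identifier) pairs, one alignment row per line."""
    width = len(rows[0][0]) if rows else 0
    lines = ["%s alignment with %i rows and %i columns"
             % (alphabet, len(rows), width)]
    for seq, name in rows[:max_rows]:
        # Long rows are cut so the sequence part is at most max_cols wide.
        shown = seq if len(seq) <= max_cols else seq[:max_cols - 3] + "..."
        lines.append("%s %s" % (shown, name))
    if len(rows) > max_rows:
        lines.append("...")  # marks rows that were left out
    return "\n".join(lines)


small = [("ACGATCAGCTAGCT", "Alpha"),
         ("CCGATCAGCTAGCT", "Beta"),
         ("ACGATGAGCTAGCT", "Gamma")]
print(format_alignment(small))
```

For the three short rows this prints the same four lines as the str(a)
example above.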

I now intend to update the tutorial, as being able to print an alignment should
make it much easier to explain and get to grips with.

Note that there is still some interesting code in both attachment 732 (the
__getitem__ method) and in attachment 770 (e.g. subclassing list and adding
__len__, __add__, __radd__ etc).


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jun  2 13:26:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 09:26:28 -0400
Subject: [Biopython-dev] [Bug 2507] New: Adding __getitem__ to SeqRecord for
	element access and slicing
Message-ID: <bug-2507-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507

           Summary: Adding __getitem__ to SeqRecord for element access and
                    slicing
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
OtherBugsDependingOnThis: 1944


With a Seq object, you can access individual letters and create sub-sequences
using slicing.  You can even use a stride to reverse the sequence, or select
every third letter.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_seq = Seq('GATCGATGGGCCTATATAGGATCGAAAATCGC', IUPAC.unambiguous_dna)
>>> print my_seq
GATCGATGGGCCTATATAGGATCGAAAATCGC
>>> my_seq
Seq('GATCGATGGGCCTATATAGGATCGAAAATCGC', IUPACUnambiguousDNA())
>>> my_seq[5:10]
Seq('ATGGG', IUPACUnambiguousDNA())
>>> my_seq[::-1]
Seq('CGCTAAAAGCTAGGATATATCCGGGTAGCTAG', IUPACUnambiguousDNA())
>>> my_seq[5]
'A'

Currently, these operations cannot be done with a SeqRecord object.  This
enhancement bug is to allow element access and slicing (perhaps even with a
stride) on SeqRecord objects, where the annotations are taken into
consideration, and preserved as far as reasonably possible.

Looking at the different SeqRecord properties, this is what I think should
happen for creating a sub-sequence:

.id, .name, .description (three strings) - preserve?

Blindly preserving these may not always be meaningful.  For example, if the
description was "Complete plasmid" then it doesn't really apply to a
sub-sequence.  Perhaps we should preserve only the id and name, and set the
description to "sub-sequence"?

.annotations (dictionary) - either preserve or lose?

Some annotation entries will still be valid for a sub-sequence (e.g. "source"
or references).  Others will not (e.g. anything describing its coordinates
within a larger parent sequence).  There is no reliable way to decide on a case
by case basis.

.dbxrefs (list of strings) - preserve?

Any database cross-references would arguably still apply to a sub-sequence or
even a reversed sequence.

.features (list of SeqFeatures) - select only those features still in the new
sub-sequence, and adjust their locations for the new coordinates.  Supporting
strides other than +1 would be complicated!  For simplicity, I would say any
feature only partially within the sub-sequence should be discarded.

In summary, one clearly defined set of actions on creating a sub-sequence could
be to preserve all the annotation data except the SeqFeatures which would be
handled sensibly.
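
The feature handling proposed above (keep only features lying entirely within
the slice, and shift their coordinates) might look like this sketch.  Features
are plain (start, end, label) tuples here rather than SeqFeature objects, the
name slice_features is made up, and only non-negative bounds with a stride of
+1 are considered.

```python
def slice_features(features, start, stop):
    """Keep features fully inside [start, stop) and shift them by -start.

    Each feature is a (feat_start, feat_end, label) tuple using 0-based,
    end-exclusive coordinates on the parent sequence.
    """
    kept = []
    for feat_start, feat_end, label in features:
        if start <= feat_start and feat_end <= stop:
            kept.append((feat_start - start, feat_end - start, label))
        # Features only partially inside the slice are discarded.
    return kept


features = [(0, 10, "promoter"), (12, 30, "gene"), (25, 60, "repeat")]
sliced = slice_features(features, 10, 40)
print(sliced)
```

Only the "gene" feature survives a [10:40] slice, and it moves from 12..30 to
2..20 in the coordinates of the new sub-sequence.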

[If we later support "per-letter-annotation" in either a Seq or SeqRecord
subclass, then this too should be sliced]

Adding a __getitem__ method to the SeqRecord as outlined above should be
compatible with the suggestion that the SeqRecord subclasses the Seq object
(see bug 2351).

A related point, when accessing single letters, e.g. record[0], should a single
letter string be returned (which lacks any annotation) as currently happens
with the Seq object?

P.S. I'm marking this new enhancement bug as blocking bug 1944.  Once SeqRecord
objects support slicing, this would make annotation preserving slicing of
alignment objects much more straightforward.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jun  2 13:26:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 09:26:33 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To: <bug-1944-42@http.bugzilla.open-bio.org/>
Message-ID: <200806021326.m52DQXk2029561@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |2507


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jun  2 14:00:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 10:00:15 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
	element access and slicing
In-Reply-To: <bug-2507-42@http.bugzilla.open-bio.org/>
Message-ID: <200806021400.m52E0FJK032027@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-02 10:00 EST -------
Simple implementation which ignores the features (non-trivial), to be added to
the SeqRecord class in Bio/SeqRecord.py:

    def __getitem__(self, index) :
        if isinstance(index, int) :
            #TODO - Should single letters be returned as just
            #strings?  This prevents the inclusion of any annotation.
            #Revisit this once the Seq object is a subclass of string.
            return self.seq[index]
        elif isinstance(index, slice) :
            answer = self.__class__(self.seq[index],
                                    id=self.id,
                                    name=self.name,
                                    description=self.description)
            #COPY the annotation dict and dbxrefs list:
            answer.annotations = dict(self.annotations)
            answer.dbxrefs = self.dbxrefs[:]
            #TODO - select relevant features, and add them with
            #adjusted coordinates.  Take special care with a stride!
            return answer
        raise ValueError("Invalid index")




From bugzilla-daemon at portal.open-bio.org  Mon Jun  2 14:12:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 10:12:29 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
	element access and slicing
In-Reply-To: <bug-2507-42@http.bugzilla.open-bio.org/>
Message-ID: <200806021412.m52ECT86000330@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507


------- Comment #2 from jblanca at btc.upv.es  2008-06-02 10:12 EST -------
Does this mean that SeqRecord would deprecate the .seq attribute? If the .seq
attribute is not removed, slicing could be used on both, like: my_seq[1:100] and
my_seq.seq[1:100].




From biopython at maubp.freeserve.co.uk  Mon Jun  2 14:14:40 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 2 Jun 2008 15:14:40 +0100
Subject: [Biopython-dev] sequence class proposal
In-Reply-To: <1211779470.483a498e18e3e@webmail.upv.es>
References: <320fb6e00805251437n34362f0bm2a323cd1194afaa@mail.gmail.com>
	<1211779470.483a498e18e3e@webmail.upv.es>
Message-ID: <320fb6e00806020714s2c789f61ke676a448e2ec871a@mail.gmail.com>

In reply to Jose, I (Peter) wrote:
>> One of your points seemed to be that the SeqRecord couldn't have a
>> __getitem__ and methods like reverse, complement, etc.  I don't see
>> why it couldn't have these.  Perhaps rather than introducing a whole
>> new class, enhancing the SeqRecord would be a better avenue.

I've filed Bug 2507 to try and show what I had in mind for the
__getitem__ method.
http://bugzilla.open-bio.org/show_bug.cgi?id=2507

Adding further methods for (reverse) complement etc could be done in
much the same way.

Returning to extending Biopython to support per-letter-annotation, I
can see two options:

Right now, the SeqRecord object HAS a Seq object.  If we create a new
RichSeq which subclasses the Seq object to provide
per-letter-annotation, then you could use a SeqRecord where the .seq
property is in fact a RichSeq object.  The SeqRecord class doesn't
need to have any changes made for this to work (assuming the RichSeq
provides the same API as the Seq object).

If we make the SeqRecord a subclass of the Seq object, then I would
suggest either RichSeq subclassing SeqRecord subclassing Seq, or
perhaps SeqRecord subclassing RichSeq subclassing Seq.  It depends on
if you think the id/name/description/dbxrefs/etc properties would be
useful in common use cases of the RichSeq object.
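
As a rough illustration of the first option (the SeqRecord HAS a sequence,
and that sequence may be a RichSeq), here is a minimal sketch. All three
classes below are hypothetical toy stand-ins written for this example, not
the real Bio.Seq or Bio.SeqRecord code, and the "qualities" attribute is just
one assumed form of per-letter-annotation.

```python
class Seq:
    """Hypothetical minimal sequence type wrapping a string."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, index):
        return self._data[index]

    def __len__(self):
        return len(self._data)


class RichSeq(Seq):
    """Hypothetical Seq subclass holding one quality value per letter."""

    def __init__(self, data, qualities):
        if len(data) != len(qualities):
            raise ValueError("need one quality value per letter")
        Seq.__init__(self, data)
        self.qualities = list(qualities)


class SeqRecord:
    """Option 1: the record HAS a sequence and needs no changes."""

    def __init__(self, seq, id):
        self.seq = seq
        self.id = id


# The same record class works with either kind of sequence object.
plain = SeqRecord(Seq("ACGT"), id="plain")
rich = SeqRecord(RichSeq("ACGT", [30, 31, 32, 33]), id="rich")
```

The point of the sketch is only that the record class stays untouched as long
as the richer sequence keeps the same basic API.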

It's not going to be possible for all three classes to have the same
__init__ parameters without breaking existing scripts (and only
supporting the lowest common denominator).

Peter


From jblanca at btc.upv.es  Mon Jun  2 19:11:19 2008
From: jblanca at btc.upv.es (Blanca Postigo Jose Miguel)
Date: Mon,  2 Jun 2008 21:11:19 +0200
Subject: [Biopython-dev] Fwd: Re:  sequence class proposal
Message-ID: <1212433879.484445d7a6117@webmail.upv.es>


----- Forwarded message from Blanca Postigo Jose Miguel <jblanca at btc.upv.es> -----
    Date: Mon,  2 Jun 2008 21:08:59 +0200
    From: Blanca Postigo Jose Miguel <jblanca at btc.upv.es>
Reply-To: Blanca Postigo Jose Miguel <jblanca at btc.upv.es>
 Subject: Re: [Biopython-dev] sequence class proposal
      To: Peter <biopython at maubp.freeserve.co.uk>

Quoting Peter <biopython at maubp.freeserve.co.uk>:

> In reply to Jose, I (Peter) wrote:
> >> One of your points seemed to be that the SeqRecord couldn't have a
> >> __getitem__ and methods like reverse, complement, etc.  I don't see
> >> why it couldn't have these.  Perhaps rather than introducing a whole
> >> new class, enhancing the SeqRecord would be a better avenue.
>
> I've filed Bug 2507 to try and show what I had in mind for the
> __getitem__ method.
> http://bugzilla.open-bio.org/show_bug.cgi?id=2507
I think that would be great. I've just added to the bug a question about the
.seq property of SeqRecord.

> Adding further methods for (reverse) complement etc could be done in
> much the same way.
>
> Returning to extending Biopython to support per-letter-annotation, I
> can see two options:
>
> Right now, the SeqRecord object HAS a Seq object.  If we create a new
> RichSeq which subclasses the Seq object to provide
> per-letter-annotation, then you could use a SeqRecord where the .seq
> property is in fact a RichSeq object.  The SeqRecord class doesn't
> need to have any changes made for this to work (assuming the RichSeq
> provides the same API as the Seq object).
Here I had a slightly different idea, but maybe yours is better. Basically my
RichSeq proposal is just a RichSeq with slicing and without the seq property.
The problem with the approach that you describe is that the RichSeq should have
the per-letter-annotation, so SeqRecord would have a general annotation and
RichSeq (in the .seq) would have other features. I would find that confusing.

>
> If we make the SeqRecord a subclass of the Seq object, then I would
> suggest either RichSeq subclassing SeqRecord subclassing Seq, or
> perhaps SeqRecord subclassing RichSeq subclassing Seq.  It depends on
> if you think the id/name/description/dbxrefs/etc properties would be
> useful in common use cases of the RichSeq object.
If SeqRecord is a subclass of Seq, RichSeq is not necessary anymore. That's what
I was proposing. The problem is that the current users of SeqRecord would have a
hard time with the new behaviour, because in that case supporting the seq
property would be hard. To avoid that breakage I was proposing to create
RichSeq. RichSeq would be just the SeqRecord that you propose but would allow
the users to migrate to RichSeq without forcing them to change to a new
SeqRecord behaviour.

>
> It's not going to be possible for all three classes to have the same
> __init__ parameters without breaking existing scripts (and only
> supporting the lowest common denominator).
That's another reason to rename your new proposed SeqRecord to RichSeq.

>
> Peter
>

Jose Blanca

-- 
----- End of forwarded message -----


-- 


From biopython at maubp.freeserve.co.uk  Mon Jun  2 19:51:30 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 2 Jun 2008 20:51:30 +0100
Subject: [Biopython-dev] Fwd: Re: sequence class proposal
In-Reply-To: <1212433879.484445d7a6117@webmail.upv.es>
References: <1212433879.484445d7a6117@webmail.upv.es>
Message-ID: <320fb6e00806021251q6cc1a7e8p36125c1326ab7a14@mail.gmail.com>

Jose wrote:
> > I've filed Bug 2507 to try and show what I had in mind for the
> > __getitem__ method.
> > http://bugzilla.open-bio.org/show_bug.cgi?id=2507
>
> I think that would be great.

Good :)

Does anyone else want to comment?

>  I've just added to the bug a question about the .seq property of SeqRecord.

http://bugzilla.open-bio.org/show_bug.cgi?id=2507#c2 reads:
> Does this mean that SeqRecord would deprecate the .seq attribute?
> If the .seq attribute is not removed, slicing could be used on both, like:
> my_seq[1:100] and my_seq.seq[1:100].

I was not intending to deprecate the SeqRecord's .seq property at this
time (I think that should happen in preparation for if/when the
SeqRecord becomes a subclass of the Seq object).

With my idea described on bug 2507, given a SeqRecord object my_seq_record:

my_seq_record[1:100] -> another SeqRecord (with annotation)
my_seq_record.seq[1:100] -> just a Seq object (no annotation)
my_seq_record.seq.tostring()[1:100] -> just a string (no annotation or alphabet)
str(my_seq_record.seq)[1:100] -> just a string (no annotation or alphabet)

These trivial examples would all "contain" the same sequence string.
This enhancement could be done right now, and shouldn't impede any
future per-letter-annotation enhancements.
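
The different return types listed above can be illustrated with a small
stand-in class. This is only a sketch: ToyRecord is a hypothetical substitute
for SeqRecord that stores its sequence as a plain string, so it shows the
idea (slicing the record keeps the annotation, slicing .seq does not) rather
than the real Bio.SeqRecord behaviour.

```python
class ToyRecord:
    """Hypothetical stand-in for SeqRecord; .seq is a plain str here."""

    def __init__(self, seq, id="<unknown id>", annotations=None):
        self.seq = seq
        self.id = id
        self.annotations = dict(annotations or {})

    def __getitem__(self, index):
        if isinstance(index, int):
            # A single letter carries no annotation.
            return self.seq[index]
        if isinstance(index, slice):
            # A slice gives a new record holding a copy of the annotation.
            return self.__class__(self.seq[index], id=self.id,
                                  annotations=self.annotations)
        raise TypeError("Invalid index")


record = ToyRecord("ACGTACGTAC", id="toy1", annotations={"source": "example"})
sub_record = record[1:5]     # another record, annotation kept
sub_seq = record.seq[1:5]    # just the sequence, no annotation
letter = record[0]           # a single letter
```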

Perhaps per-letter-annotation enhancements could be added to the
SeqRecord class directly... I need to fully digest the discussion on
the BioSQL list, see:
http://lists.open-bio.org/pipermail/biosql-l/2008-May/thread.html

Peter


From mjldehoon at yahoo.com  Tue Jun  3 00:19:59 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 2 Jun 2008 17:19:59 -0700 (PDT)
Subject: [Biopython-dev] Bio.Entrez & Bio.EUtil
In-Reply-To: <320fb6e00805300717v60f0b153i88b5e9a8aee1744c@mail.gmail.com>
Message-ID: <624249.42121.qm@web62408.mail.re1.yahoo.com>

OK I'll double-check. I may not have noticed some missing DTDs if they were downloaded automatically from the internet. I think Biopython should ship the most common DTDs. At least the ones needed for test_Entrez, which probably covers most of the use cases of Bio.Entrez.

--Michiel.

Peter <biopython at maubp.freeserve.co.uk> wrote:
On 24 May 2008, Michiel de Hoon wrote:
> Dear all,
>
> I have essentially completed the parser in Bio.Entrez.

The internals of the new design look more complicated to start with,
but I can see how much more general it is than the older versions :)

Should it work starting from an empty DTDs folder - or will we ship
Biopython with most of the current files?  I've had trouble with
Biopython trying to fetch missing DTD files from the internet.  I
think the problem is the NCBI using relative URLs.  The following
quick hack seems to help in Parser.py but only in some cases (because
as listed below, the NCBI have two different base paths):

279,280c279,288
<             warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename)
<             handle = urllib.urlopen(systemId)
---
>             warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % path)
>             if "/" in systemId :
>                 #Assume this is a full path, e.g.
>                 #http://www.ncbi.nlm.nih.gov/entrez/query/DTD/nlmmedline_080101.dtd
>                 handle = urllib.urlopen(systemId)
>             else :
>                 #Its a relative path, and I'm not sure how to best get the base path:
>                 handle = urllib.urlopen("http://www.ncbi.nlm.nih.gov/entrez/query/DTD/"+systemId)

(Also note there seem to be some tab/space issues in this file).
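
One possible way to treat absolute and relative system identifiers uniformly
is to resolve them against a base URL. The sketch below uses the Python 3
standard library (urllib.parse.urljoin) rather than the Python 2 urllib calls
in the diff above. The helper name resolve_dtd_url and the choice of default
base are assumptions made for illustration, and it does not solve the
question of which of the two NCBI base paths applies to a given file.

```python
from urllib.parse import urljoin

# Assumed default base; the NCBI serves DTDs from at least two locations.
DEFAULT_DTD_BASE = "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/"


def resolve_dtd_url(system_id, base=DEFAULT_DTD_BASE):
    """Return an absolute URL for a DTD systemId (hypothetical helper)."""
    # urljoin leaves system_id unchanged when it is already an absolute URL,
    # and appends it to the base when it is a bare file name.
    return urljoin(base, system_id)
```

A caller that knows the second base path could pass it explicitly, for
example resolve_dtd_url("NCBI_GBSeq.dtd", base="http://www.ncbi.nlm.nih.gov/dtd/").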

From http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ I've downloaded the
following files using wget:

egquery.dtd
eSearch_020511.dtd
nlmcommon_080101.dtd
pubmed_080101.dtd
eInfo_020511.dtd
eSpell.dtd
nlmmedline_080101.dtd
taxon.dtd
eLink_020511.dtd
eSummary_041029.dtd
nlmmedlinecitation_080101.dtd
uilist.dtd
ePost_020511.dtd
nlmsharedcatcit_080101.dtd

Additionally http://www.ncbi.nlm.nih.gov/dtd/ provided some further
DTD files needed for the test_Entrez.py unit test:

NCBI_GBSeq.dtd
NCBI_GBSeq.mod.dtd
NCBI_Entity.mod.dtd
NCBI_Mim.dtd
NCBI_Mim.mod.dtd

With all the above files, the unit test file test_Entrez.py
doesn't give any missing DTD warnings - but still has a couple of
failures.

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Jun  3 04:39:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jun 2008 00:39:24 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806030439.m534dOYI021682@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp  2008-06-03 00:39 EST -------
I agree that type checking is a problem.
I am not sure if a specialized function in Bio.File is a good idea. The
question is not if "this object is a file-like object", but "does this object
have the attributes/methods needed". So I would prefer to add checks only for
the required attributes/methods in each of the iterators.




From mjldehoon at yahoo.com  Tue Jun  3 04:33:27 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 2 Jun 2008 21:33:27 -0700 (PDT)
Subject: [Biopython-dev] Bio.Entrez & Bio.EUtil
In-Reply-To: <624249.42121.qm@web62408.mail.re1.yahoo.com>
Message-ID: <112249.61498.qm@web62410.mail.re1.yahoo.com>

I checked but I did not see any missing DTDs. Most of the DTDs in the list you sent are in Biopython's CVS under Bio/Entrez/DTDs, and are included correctly if I do a fresh checkout from CVS. Could you try with a fresh checkout?

--Michiel.



From bugzilla-daemon at portal.open-bio.org  Tue Jun  3 09:16:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jun 2008 05:16:48 -0400
Subject: [Biopython-dev] [Bug 2446] Comments in CT tags cause
	Bio.Sequencing.Ace.ACEParser to fail.
In-Reply-To: <bug-2446-42@http.bugzilla.open-bio.org/>
Message-ID: <200806030916.m539GmwZ001955@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2446


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-03 05:16 EST -------
As pointed out on the mailing list, the test cases attached to this bug have
disappeared (some expiry issue?).  In the mean time, we could probably just
edit the sole existing test case in Tests/Ace/contig1.ace to add a comment to
an existing CT tag.

Looking at this file, for example edit:

CT{
Contig1 repeat phrap 52 53 555456:555432
This is the forst line of comment for c1
and this the second for c1
}

to become:

CT{
Contig1 repeat phrap 52 53 555456:555432
COMMENT{
This is the first line of comment for c1
and this the second for c1}
}

In the short term, we could either ignore the COMMENT tags within a CT tag, or
just treat them as plain text.  Supporting the nested structure would require
changes to the current Record structure.




From bugzilla-daemon at portal.open-bio.org  Tue Jun  3 11:46:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jun 2008 07:46:58 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806031146.m53BkwAB009224@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #5 from cracka80 at gmail.com  2008-06-03 07:46 EST -------
(In reply to comment #4)
> I agree that type checking is a problem.
> I am not sure if a specialized function in Bio.File is a good idea. The
> question is not if "this object is a file-like object", but "does this object
> have the attributes/methods needed". So I would prefer to add checks only for
> the required attributes/methods in each of the iterators.
> 

The function I have written does exactly this - it checks for the necessary
attributes and methods for a given object. The iterators would then only need
to call ``File.is_filelike()`` on each object passed into them, rather than a
type checking procedure. This is in accordance with the design pattern "Program
to an 'interface', not an 'implementation'." (Gang of Four). Would you like me
to provide a diff against the current revision of Biopython, with suggested
changes?
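
For readers following the thread, an attribute-based check of this kind might
look roughly like the sketch below. It is not the function attached to the
bug (that code is not shown here); the name has_methods and its default
argument are assumptions made for illustration.

```python
import io


def has_methods(obj, names=("readline",)):
    """Return True if obj has a callable attribute for every given name."""
    return all(callable(getattr(obj, name, None)) for name in names)


# Any object offering the requested methods passes, whatever its type.
handle_ok = has_methods(io.StringIO("ACGT\n"))
string_ok = has_methods("not a file")
```

Passing only the method names a particular iterator really needs keeps the
check as narrow as comment #4 asks for.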




From bugzilla-daemon at portal.open-bio.org  Tue Jun  3 15:07:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jun 2008 11:07:35 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806031507.m53F7Zm7019694@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp  2008-06-03 11:07 EST -------
Two things:
1) Some of the code that does type checking for file-like-ness seems to be
quite old and possibly outdated (e.g. Gobase.Iterator). We should take this
opportunity  to go through these modules and check if they are still useful.
2) Many of these modules (especially the ones that use an "Iterator" class)
would be written differently in modern Python (in particular by making use of a
generator function instead of an Iterator class).

So I'd like to suggest the following:
-) For the modules whose usability is dubious in 2008, let's check on the
mailing list if anybody is still using them. If not, we can simply deprecate
them.
-) For the modules that are still useful, use try/except clauses to check for
the necessary attributes. The current function checks for 'read', 'readline',
'readlines', and '__iter__', whereas the parser probably only needs one of
them. 
-) If possible, I'd prefer to convert to modern Python as much as possible
(though formally that is not within the scope of this bug report).
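
A minimal sketch of the last two suggestions combined: a generator function
that uses try/except to check for the single method it needs. The function
name iterate_records and the blank-line-separated record format are
assumptions chosen to keep the example short; they are not taken from any of
the Biopython modules under discussion.

```python
import io


def iterate_records(handle):
    """Yield blank-line-separated records from any object with readline()."""
    # Because this is a generator, the check below runs on the first
    # next() call rather than when iterate_records() itself is called.
    try:
        readline = handle.readline
    except AttributeError:
        raise TypeError("handle must provide a readline() method")
    lines = []
    while True:
        line = readline()
        if not line:            # an empty string means end of input
            break
        if line.strip():
            lines.append(line.rstrip("\n"))
        elif lines:             # a blank line closes the current record
            yield lines
            lines = []
    if lines:
        yield lines


records = list(iterate_records(io.StringIO("a\nb\n\nc\n")))
```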




From bugzilla-daemon at portal.open-bio.org  Wed Jun  4 19:50:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Jun 2008 15:50:14 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
	element access and slicing
In-Reply-To: <bug-2507-42@http.bugzilla.open-bio.org/>
Message-ID: <200806041950.m54JoEPj029720@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507


------- Comment #3 from jblanca at btc.upv.es  2008-06-04 15:50 EST -------
Created an attachment (id=927)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=927&action=view)
RichSeq proposal

I have coded a sequence class that fulfils the requirements that I would like
to see. It's very similar to SeqRecord, but it is not compatible with it. It
has no seq property, although that can be solved. The problem with SeqRecord is
that it is not possible to create a class with an __init__ compatible with Seq
and SeqRecord at the same time.
This proposed class is just a draft; it needs more work, but I would like to
receive comments about it.
It inherits from MutableSeq so it should be named MutableRichSeq, but it seems
that I'm too lazy to type such a long name. I promise to change the name in a
later version and to create a RichSeq with Seq as parent.
Besides RichSeq there are two other classes in the attachment, RichFeature and
BioRange, but I will comment on those in another post.
I think that it is quite important to convert Seq and MutableSeq to new-style
classes. What do you think about that? With new-style classes we can use properties.




From bugzilla-daemon at portal.open-bio.org  Wed Jun  4 20:19:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Jun 2008 16:19:41 -0400
Subject: [Biopython-dev] [Bug 2508] New: NCBIStandalone.blastall: provide
	support for '-F F' and make it safe
Message-ID: <bug-2508-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508

           Summary: NCBIStandalone.blastall:  provide support for '-F F' and
                    make it safe
           Product: Biopython
           Version: 1.44
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: mmokrejs at ribosome.natur.cuni.cz


The local NCBI blast by default masks low-complexity regions using the SEG algorithm.
I do not see a variable to affect this in NCBIStandalone.blastall().

Luckily, NCBIStandalone.blastall() is an unsafe function and does not check
whether I pass multiple arguments in a value expected to be a string or number.
Thus, I can do:

_blast_out, _error_info = NCBIStandalone.blastall('/usr/bin/blastall',
'blastn', blast_db, _blast_file, matrix='IDENTITY -F 0')

but imagine I would have done:

_blast_out, _error_info = NCBIStandalone.blastall('/usr/bin/blastall',
'blastn', blast_db, _blast_file, matrix='IDENTITY -F 0; rm -rf /etc/passwd')

The function should be protected against such attacks, as if it were directly
exposed to web users as a CGI script. I propose a similar defensive strategy
for all functions calling os.system(), os.exec*(), os.popen*(), etc.
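
One defensive approach is to build an argument list (never a shell string)
and to validate each value before it is used. The sketch below is only an
illustration: build_blastall_argv, its parameter names and the allowed
character set are assumptions, not the NCBIStandalone API. The flags -p, -d,
-i, -M and -F are the usual blastall options for program, database, query
file, matrix and filtering.

```python
import re

# Assumed whitelist: letters, digits and a few harmless punctuation marks.
# Real filter strings can be more complex; this sketch rejects those.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9_.+-]+")


def build_blastall_argv(blastcmd, program, database, infile,
                        matrix=None, filter_query=None):
    """Build an argument list for blastall (hypothetical helper)."""
    argv = [blastcmd, "-p", program, "-d", database, "-i", infile]
    for flag, value in (("-M", matrix), ("-F", filter_query)):
        if value is None:
            continue
        value = str(value)
        if not _SAFE_VALUE.fullmatch(value):
            raise ValueError("unsafe value for %s: %r" % (flag, value))
        argv.extend([flag, value])
    return argv


argv = build_blastall_argv("/usr/bin/blastall", "blastn", "mydb",
                           "query.fasta", matrix="IDENTITY", filter_query="F")
```

The resulting list can be handed to subprocess.Popen(argv) without
shell=True, so no shell ever interprets the values.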




From bugzilla-daemon at portal.open-bio.org  Thu Jun  5 08:52:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 04:52:47 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
	element access and slicing
In-Reply-To: <bug-2507-42@http.bugzilla.open-bio.org/>
Message-ID: <200806050852.m558qlPF031059@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-05 04:52 EST -------
I replied to comment 2 on the mailing list.  I had intended this particular
bugzilla entry (bug 2507) to be very narrow in scope - purely a small backwards
compatible change to the current SeqRecord.

Some of the questions in comment 3 might have fit better on Bug 2351 although
it's getting rather long.  Rather than taking this issue further off topic, I'll
reply on the mailing list again.




From biopython at maubp.freeserve.co.uk  Thu Jun  5 09:17:00 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 5 Jun 2008 10:17:00 +0100
Subject: [Biopython-dev] Fwd: Re: sequence class proposal
In-Reply-To: <320fb6e00806021251q6cc1a7e8p36125c1326ab7a14@mail.gmail.com>
References: <1212433879.484445d7a6117@webmail.upv.es>
	<320fb6e00806021251q6cc1a7e8p36125c1326ab7a14@mail.gmail.com>
Message-ID: <320fb6e00806050217y1c437b01qa7fd21d75a609e8c@mail.gmail.com>

This is in reply to Jose's comment 3 on bug 2507, which was quite broad.
http://bugzilla.open-bio.org/show_bug.cgi?id=2507#c3

> I have coded a sequence class that fulfils the requirements that I
> would like to see. It's very similar to SeqRecord, but it is not compatible
> with it. It has no seq property, although that can be solved. The problem
> with SeqRecord is that it is not possible to create a class with an __init__
> compatible with Seq and SeqRecord at the same time.

Even if one day the SeqRecord is a subclass of the Seq object, there
is no requirement that it have the same __init__ arguments.  In fact, they
have to be different because for a SeqRecord you should also supply an
identifier (and potentially a name, description and other annotation).

> This proposed class is just a draft, it needs more work but I would like to
> receive comments about it.  It inherits from MutableSeq so it should be
> named MutableRichSeq, but it seems that I'm too lazy to type such a long name,
> I promise to change the name in a later version and to create a RichSeq
> with Seq as parent.

I agree with you here that when getting a single letter (amino acid or
nucleotide) from a sequence with per-letter-annotation, e.g.
my_sequence[5], it would be very nice to have the
per-letter-annotation like the quality included.  This does mean the
object returned can't just be a single one character string.  However,
because the current Seq and MutableSeq classes return a simple string,
unless we return a subclass of a string, this risks breaking other
people's code.  So, I would conclude that Seq needs to subclass a
string BEFORE we start including support for per-letter-annotation.
Ideally we would have alphabet aware versions of all the string
functions before we made this change (see Bug 2351).

> Besides RichSeq there are two other classes in the attachment, RichFeature
> and BioRange, but I will comment on those in another post.

Your BioRange and RichFeature classes seem somewhat similar to the
current SeqFeature class with its locations (and sub features).

> I think that it is quite important to convert Seq and MutableSeq to new-style
> classes. What do you think about that? With new-style classes we can use properties.

I have been thinking about deprecating the Seq.data property (and also
the MutableSeq).  The data string (or array) should really be a
private implementation detail, perhaps Seq._data following the
underscore for private convention.  We can then add property methods
to make the Seq.data available (perhaps with a deprecation warning).
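
As a rough sketch of that idea, a read-only property can expose the private
string while warning about the deprecation. The class below is a hypothetical
toy, not the real Bio.Seq.Seq, and the warning text is an assumption.

```python
import warnings


class Seq:
    """Hypothetical sketch: the string lives in the private _data slot."""

    def __init__(self, data):
        self._data = data

    def __str__(self):
        return self._data

    @property
    def data(self):
        """Deprecated read-only access; use str(my_seq) instead."""
        warnings.warn("Seq.data is deprecated, use str(my_seq) instead",
                      DeprecationWarning, stacklevel=2)
        return self._data
```

Because no setter is defined, assigning to my_seq.data raises an
AttributeError, which also gives the read-only behaviour wanted for Seq.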

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Jun  5 09:36:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 05:36:18 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
	element access and slicing
In-Reply-To: <bug-2507-42@http.bugzilla.open-bio.org/>
Message-ID: <200806050936.m559aINS001028@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-05 05:36 EST -------
Created an attachment (id=928)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=928&action=view)
Patch to Bio/SeqRecord.py adding __getitem__ and __len__ and __iter__

Patch based on my comment 1, with addition of __len__ allowing len(my_record)
rather than len(my_record.seq) and an explicit __iter__ method (although this
is not required, it lets us give a doc string).




From bugzilla-daemon at portal.open-bio.org  Thu Jun  5 10:18:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 06:18:11 -0400
Subject: [Biopython-dev] [Bug 2509] New: Deprecating the .data property of
	the Seq and MutableSeq objects
Message-ID: <bug-2509-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2509

           Summary: Deprecating the .data property of the Seq and MutableSeq
                    objects
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
OtherBugsDependingOnThis: 2351


In anticipation that the Seq and MutableSeq objects will eventually subclass
the Python string, their data property is unnecessary and confusing.  The
following patch will replace it with new-style class property methods and a
docstring declaring it to be deprecated.

In the case of the Seq object, the sequence should be read only but the user
can currently modify the data property in place.

In the case of the MutableSeq, the fact that it is internally an array of
characters should be a private implementation detail.




From bugzilla-daemon at portal.open-bio.org  Thu Jun  5 10:18:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 06:18:14 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string,
	even subclass string?
In-Reply-To: <bug-2351-42@http.bugzilla.open-bio.org/>
Message-ID: <200806051018.m55AIE7S003198@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2351


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |2509


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Jun  5 10:47:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 06:47:43 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
	element access and slicing
In-Reply-To: <bug-2507-42@http.bugzilla.open-bio.org/>
Message-ID: <200806051047.m55AlhBe004755@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-05 06:47 EST -------
Note that adding __len__ has a knock-on effect when dealing with SeqRecord
objects with a zero length sequence - they now evaluate to False rather than
True.

This was an issue for some of the unit tests where "if record" was used rather
than the more explicit "if record is not None".

This change could therefore have unexpected side effects in existing scripts,
however adding __len__ is required if we intend to make the SeqRecord act more
like the Seq object.
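
The side effect can be reproduced with two toy classes. PlainRecord and SizedRecord are hypothetical names used only for this illustration, not Biopython classes.

```python
class PlainRecord(object):
    """No __len__: instances are always true."""

    def __init__(self, seq):
        self.seq = seq


class SizedRecord(PlainRecord):
    """Adds __len__: the truth value now follows the sequence length."""

    def __len__(self):
        return len(self.seq)


empty_plain = PlainRecord("")
empty_sized = SizedRecord("")

print(bool(empty_plain))        # True
print(bool(empty_sized))        # False, because len() is 0
print(empty_sized is not None)  # True, the explicit test still works
```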


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Jun  5 11:03:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 07:03:27 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
	for '-F F' and make it safe
In-Reply-To: <bug-2508-42@http.bugzilla.open-bio.org/>
Message-ID: <200806051103.m55B3RUU005472@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-05 07:03 EST -------
You seem to have identified two issues.  Adding support for -F should be fairly
easy.

For the security issue, the caller should be validating their input.  Also if
running from a web-server, the permissions should also be restricted - failing
to do this is asking for trouble.

However, defence in layers would be good.  Would you suggest a simple check for
the ";" character?  What about escaped semi-colons?  Also, this is a platform
dependent issue.  The ";" character is Unix only.  At the Windows command line
you have to use an &&.

Do you have a patch in mind?


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Jun  5 12:56:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 08:56:21 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
	for '-F F' and make it safe
In-Reply-To: <bug-2508-42@http.bugzilla.open-bio.org/>
Message-ID: <200806051256.m55CuLfC010670@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508


------- Comment #2 from mmokrejs at ribosome.natur.cuni.cz  2008-06-05 08:56 EST -------
For the latter issue, I would use some python library to escape shell
metacharacters. cgi.escape() doesn't do what I would like it to. Or cgi.wrap()?
Google search returned some hints:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498202
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/66012
http://e-articles.info/e/a/title/Command-Injection/
https://bugs.gentoo.org/show_bug.cgi?id=187971#c5
https://bugs.gentoo.org/show_bug.cgi?id=187971#c23
http://mail.python.org/pipermail/python-3000/2007-May/007192.html
http://www.owasp.org/index.php/Interpreter_Injection
http://www.velocityreviews.com/forums/t352309-sql-escaping-module.html


One could learn from or even use escaping functions such as MySQLdb.escape()
or MySQLdb.connection.escape_string() but I don't think it is a complete
solution. I will try to think about it more later.
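
For comparison, here is a sketch of two options that current Python offers: passing the arguments as a list so that no shell is involved, and shlex.quote (Python 3.3 and later) when a single command string cannot be avoided. The helper build_blastall_args and its parameters are hypothetical and are not the NCBIStandalone.blastall API; the blastall option letters used are an assumption about the command line.

```python
import shlex


def build_blastall_args(blastcmd, program, database, infile, filter_query=True):
    """Return an argument list suitable for subprocess.call(args).

    Hypothetical helper. Because no shell parses the list, characters such as
    ';' or '&&' inside a value reach the program as literal text.
    """
    return [blastcmd,
            "-p", program,
            "-d", database,
            "-i", infile,
            "-F", "T" if filter_query else "F"]


args = build_blastall_args("blastall", "blastn", "nt; rm -rf ~", "query.fasta",
                           filter_query=False)

# If one command string is unavoidable, quote every piece (POSIX shells only).
command_line = " ".join(shlex.quote(arg) for arg in args)
print(command_line)
```

The list form sidesteps the platform question, since neither ";" nor "&&" is interpreted at all; the quoting form only helps with POSIX shells.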


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Jun  5 13:25:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 09:25:43 -0400
Subject: [Biopython-dev] [Bug 2494] _retrieve_taxon in BioSQL.py needs
	urgent optimization
In-Reply-To: <bug-2494-42@http.bugzilla.open-bio.org/>
Message-ID: <200806051325.m55DPhrQ012033@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2494


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-05 09:25 EST -------
I've committed this patch to CVS as part of BioSQL/BioSeq.py revision 1.24

If you could update your installation of Biopython to CVS and test this please
Eric, then I think we can mark this bug as fixed.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Jun  5 13:29:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 09:29:25 -0400
Subject: [Biopython-dev] [Bug 2509] Deprecating the .data property of the
	Seq and MutableSeq objects
In-Reply-To: <bug-2509-42@http.bugzilla.open-bio.org/>
Message-ID: <200806051329.m55DTP30012244@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2509


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-05 09:29 EST -------
Created an attachment (id=929)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=929&action=view)
Patch to Bio/Seq.py

This turns out to be quite a big change, and while the unit tests still pass
more extensive testing would be a good idea.

Alternatively, we could just leave .data exposed as a read-only property, and
switch to ._data (or a string subclass) instead.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Jun  5 17:55:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 13:55:02 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806051755.m55Ht2TS024644@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #7 from cracka80 at gmail.com  2008-06-05 13:55 EST -------
I understand your approach that these functions should be converted to modern
Python, but it must also be remembered that Biopython as a whole is Python
2.3-compatible, so care must be taken not to modernise code too much. I can't
remember when iterators were phased in, but it should be possible, I think it
was around 2.2 anyway.

(In reply to comment #6)
> Two things:
> 1) Some of the code that does type checking for file-like-ness seems to be
> quite old and possibly outdated (e.g. Gobase.Iterator). We should take this
> opportunity  to go through these modules and check if they are still useful.
> 2) Many of these modules (especially the ones that use an "Iterator" class)
> would be written differently in modern Python (in particular by making use of a
> generator function instead of an Iterator class).
> 
> So I'd like to suggest the following:
> -) For the modules whose usability is dubious in 2008, let's check on the
> mailing list if anybody is still using them. If not, we can simply deprecate
> them.
> -) For the modules that are still useful, use try/except clauses to check for
> the necessary attributes. The current function checks for 'read', 'readline',
> 'readlines', and '__iter__', whereas the parser probably only needs one of
> them. 
> -) If possible, I'd prefer to convert to modern Python as much as possible
> (though formally that is not within the scope of this bug report).
> 


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Sat Jun  7 08:26:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 7 Jun 2008 04:26:54 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806070826.m578Qsj4019312@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp  2008-06-07 04:26 EST -------
(In reply to comment #7)
> I understand your approach that these functions should be converted to modern
> Python, but it must also be remembered that Biopython as a whole is Python
> 2.3-compatible, so care must be taken not to modernise code too much. I can't
> remember when iterators were phased in, but it should be possible, I think it
> was around 2.2 anyway.
> 
Bio.Blast.NCBIXML already uses generator functions to return iterators, so I
think we are fine as far as compatibility with Python 2.3 and later is
concerned.
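
As an illustration of the generator style, the sketch below replaces an Iterator class with one generator function and asks only that the handle be iterable. The function parse_records and the "//" record terminator are assumptions chosen for the example, not an existing Biopython parser.

```python
from io import StringIO


def parse_records(handle):
    """Yield one record (a list of lines) per '//'-terminated block.

    Hypothetical parser; the '//' terminator is only an example format.
    """
    try:
        lines = iter(handle)  # the only thing required of the handle
    except TypeError:
        raise TypeError("handle must be iterable over lines")
    record = []
    for line in lines:
        line = line.rstrip("\n")
        if line == "//":
            yield record
            record = []
        elif line:
            record.append(line)
    if record:  # trailing record without a terminator
        yield record


for rec in parse_records(StringIO("ID one\n//\nID two\n//\n")):
    print(rec)
```

Because the function relies on iteration alone, a real file, a StringIO or a plain list of lines can all be passed in without a separate file-likeness check.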

I'll ask on the mailing list if Bio.Gobase has any users, to get started.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From mjldehoon at yahoo.com  Sat Jun  7 08:35:05 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 7 Jun 2008 01:35:05 -0700 (PDT)
Subject: [Biopython-dev] Bio.Gobase, anybody?
Message-ID: <844450.31822.qm@web62415.mail.re1.yahoo.com>


Hi everybody,

As part of bug report 2454:
http://bugzilla.open-bio.org/show_bug.cgi?id=2454,
I started looking at the Bio.Gobase module.
This module provides access to the gobase database:
http://megasun.bch.umontreal.ca/gobase/

This module is about seven years old and (AFAICT)
is not actively maintained. We don't have documentation
for this module, but the unit tests suggest that it
parses HTML files from gobase. I am not sure exactly
where the HTML files came from, but I doubt that
after seven years this still works.

So I was wondering:
Does anybody use Bio.Gobase?

If not, I suggest we deprecate it for the next release,
and remove it in some future release.
If there are users, we need to make some (small) changes
to this module (that is what the original bug report
was about).

--Michiel.


From bugzilla-daemon at portal.open-bio.org  Mon Jun  9 12:45:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 9 Jun 2008 08:45:24 -0400
Subject: [Biopython-dev] [Bug 2511] New: setup.py problem with del
	sys.modules["Martel"]
Message-ID: <bug-2511-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2511

           Summary: setup.py problem with del sys.modules["Martel"]
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


I'm currently trying to install Biopython from source (CVS) on a clean Mac OS X
machine, without reportlab, Numeric or mxTextTools.  I've run into a small
issue with "python setup.py build" related to the testing for an existing
Martel distribution (since Martel has been distributed separately from
Biopython before) due to the lack of mxTextTools.

Traceback (most recent call last):
  File "setup.py", line 508, in <module>
    'Bio.PopGen': ['SimCoal/data/*.par'],
  File
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/core.py",
line 151, in setup
    dist.run_commands()
  File
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/dist.py",
line 974, in run_commands
    self.run_command(cmd)
  File
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/dist.py",
line 994, in run_command
    cmd_obj.run()
  File
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/command/build.py",
line 112, in run
    self.run_command(cmd_name)
  File
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/cmd.py",
line 333, in run_command
    self.distribution.run_command(command)
  File
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/dist.py",
line 994, in run_command
    cmd_obj.run()
  File "setup.py", line 157, in run
    if not is_Martel_installed():
  File "setup.py", line 292, in is_Martel_installed
    del sys.modules["Martel"]   # Delete the old version of Martel.

The function  is_Martel_installed() starts by trying to load the bundled
Martel, by calling can_import("Martel").  This is failing with an ImportError
from mxTextTools - and hence the Martel version of the bundled copy cannot be
determined.  The next line of  is_Martel_installed() causes the problem:

del sys.modules["Martel"]

I think this only makes sense if the module could be imported, patch to follow.
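
A rough sketch of the guard, assuming the failure is the KeyError that a bare "del" raises for a module that never reached sys.modules (the traceback above is cut off before the exception line). Both helpers are hypothetical re-implementations for illustration, not the code in setup.py or the patch.

```python
import sys


def can_import(module_name):
    """Return the imported module, or None if the import fails."""
    try:
        return __import__(module_name)
    except ImportError:
        return None


def forget_module(module_name):
    """Drop a module from the cache without failing if it was never loaded."""
    # pop() with a default avoids the KeyError that a bare 'del' would raise.
    return sys.modules.pop(module_name, None)


if can_import("no_such_module_for_this_sketch") is None:
    # Nothing was imported, yet forgetting the name is still harmless.
    forget_module("no_such_module_for_this_sketch")
```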


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jun  9 12:46:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 9 Jun 2008 08:46:51 -0400
Subject: [Biopython-dev] [Bug 2511] setup.py problem with del
	sys.modules["Martel"]
In-Reply-To: <bug-2511-42@http.bugzilla.open-bio.org/>
Message-ID: <200806091246.m59Ckpts011798@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2511


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-09 08:46 EST -------
Created an attachment (id=930)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=930&action=view)
Patch to setup.py

How does this look Michiel?


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From biopython at maubp.freeserve.co.uk  Tue Jun 10 11:37:42 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 10 Jun 2008 12:37:42 +0100
Subject: [Biopython-dev] Giving the SeqRecord a length? Evaluating it as a
	boolean
Message-ID: <320fb6e00806100437n21e53369p36c85a810007ca19@mail.gmail.com>

Something we've discussed before is making the SeqRecord more like a
Seq object, perhaps even subclassing it.  I've got a patch on Bug 2507
to make some small steps in this direction - accessing elements of the
sequence by indexing the SeqRecord, i.e. letter = my_seq_record[5], or
iterating over the letters in a SeqRecord's sequence.

http://bugzilla.open-bio.org/show_bug.cgi?id=2507

In addition, I would like to give the SeqRecord a length, allowing
len(my_seq_record) rather than len(my_seq_record.seq).  However, this
has a side effect on the evaluation of a SeqRecord as a boolean.
Before, any sequence was True, but if we add the __len__ method then
any SeqRecord with a zero length sequence will evaluate as False.
This is a real issue, for example you can have GenBank files without a
sequence (see our unit test cases).  One example where this is
important is if you are using an iterator via the .next() method and
had been checking for a returned None by using "if record:" (something
some of the older unit tests were doing) you would have to start using
"if record is not None:" instead.

If the old behaviour is desirable (evaluating a SeqRecord as a boolean
is always True), we could implement a __nonzero__ method to preserve
it, see: http://docs.python.org/ref/customization.html
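
A small sketch of that combination follows. SketchRecord is a hypothetical stand-in for SeqRecord; note that Python 3 looks for __bool__ rather than __nonzero__, so the sketch defines both names.

```python
class SketchRecord(object):
    """Hypothetical stand-in for SeqRecord."""

    def __init__(self, seq):
        self.seq = seq

    def __len__(self):
        return len(self.seq)

    def __bool__(self):      # Python 3 name
        return True

    __nonzero__ = __bool__   # Python 2 name, as discussed in this thread


record = SketchRecord("")
print(len(record))   # 0
print(bool(record))  # True, the truth value no longer depends on the length
```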

What do people think?  Would adding a __len__ method to the SeqRecord
cause trouble?

Peter


From mjldehoon at yahoo.com  Tue Jun 10 23:17:56 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 10 Jun 2008 16:17:56 -0700 (PDT)
Subject: [Biopython-dev] Giving the SeqRecord a length? Evaluating it as
	a boolean
In-Reply-To: <320fb6e00806100437n21e53369p36c85a810007ca19@mail.gmail.com>
Message-ID: <797428.30617.qm@web62402.mail.re1.yahoo.com>

+1 for adding a __len__ method, with a __nonzero__ method to let all SeqRecord objects evaluate as true.

--Michiel.

Peter <biopython at maubp.freeserve.co.uk> wrote:
Something we've discussed before is making the SeqRecord more like a
Seq object, perhaps even subclassing it.  I've got a patch on Bug 2507
to make some small steps in this direction - accessing elements of the
sequence by indexing the SeqRecord, i.e. letter = my_seq_record[5], or
iterating over the letters in a SeqRecord's sequence.

http://bugzilla.open-bio.org/show_bug.cgi?id=2507

In addition, I would like to give the SeqRecord a length, allowing
len(my_seq_record) rather than len(my_seq_record.seq).  However, this
has a side effect on the evaluation of a SeqRecord as a boolean.
Before, any sequence was True, but if we add the __len__ method then
any SeqRecord with a zero length sequence will evaluate as False.
This is a real issue, for example you can have GenBank files without a
sequence (see our unit test cases).  One example where this is
important is if you are using an iterator via the .next() method and
had been checking for a returned None by using "if record:" (something
some of the older unit tests were doing) you would have to start using
"if record is not None:" instead.

If the old behaviour is desirable (evaluating a SeqRecord as a boolean
is always True), we could implement a __nonzero__ method to preserve
it, see: http://docs.python.org/ref/customization.html

What do people think?  Would adding a __len__ method to the SeqRecord
cause trouble?

Peter
_______________________________________________
Biopython-dev mailing list
Biopython-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython-dev


From bugzilla-daemon at portal.open-bio.org  Tue Jun 10 23:30:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 10 Jun 2008 19:30:20 -0400
Subject: [Biopython-dev] [Bug 2511] setup.py problem with del
	sys.modules["Martel"]
In-Reply-To: <bug-2511-42@http.bugzilla.open-bio.org/>
Message-ID: <200806102330.m5ANUKfo019481@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2511


------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp  2008-06-10 19:30 EST -------
(In reply to comment #1)
> Created an attachment (id=930)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=930&action=view) [details]
> Patch to setup.py
> 
> How does this look Michiel?
> 

That looks fine to me, though eventually I would prefer to get rid of the
dependence on Martel/mxTextTools altogether.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Tue Jun 10 23:42:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 10 Jun 2008 19:42:52 -0400
Subject: [Biopython-dev] [Bug 2511] setup.py problem with del
	sys.modules["Martel"]
In-Reply-To: <bug-2511-42@http.bugzilla.open-bio.org/>
Message-ID: <200806102342.m5ANgqct019925@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2511


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-10 19:42 EST -------
In reply to comment 2, would it make sense for the unit test framework to treat
the mxTextTools (or reportlab, or Numeric) import errors as a missing external
dependency?

In the unit tests we used to "ignore" any tests which failed with an
ImportError, but have now switched to our own MissingExternalDependencyError
exception.

We want to distinguish ImportErrors which are external to Biopython (and
therefore can be considered as missing dependencies) from those internal to
Biopython (perhaps due to refactoring or removal of code - a real unit test
failure).  One way to do this would be in the bits of Biopython that try to
import mxTextTools (or any other module) to raise
MissingExternalDependencyError (or something that is a subclass of both
MissingExternalDependencyError and the built in ImportError).
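
The "subclass of both" option could be sketched as below. MissingExternalDependencyError is redefined here only as a stand-in for the exception used by the unit tests; the names MissingExternalImportError and import_external are hypothetical.

```python
class MissingExternalDependencyError(Exception):
    """Stand-in for the exception used by the unit test framework."""


class MissingExternalImportError(MissingExternalDependencyError, ImportError):
    """Hypothetical: a missing third-party module that is still an ImportError."""


def import_external(name):
    """Import a third-party module, flagging failure as a missing dependency."""
    try:
        return __import__(name)
    except ImportError as err:
        raise MissingExternalImportError(
            "Missing external dependency: %s" % err)


try:
    import_external("no_such_module_for_this_sketch")
except ImportError as err:
    # Existing 'except ImportError' code keeps working, while the test
    # framework can single out the more specific type.
    print(isinstance(err, MissingExternalDependencyError))
```

An ImportError raised anywhere else inside Biopython would not pass through import_external, so it would remain a plain ImportError and count as a real test failure.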


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Jun 11 06:54:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Jun 2008 02:54:32 -0400
Subject: [Biopython-dev] [Bug 2516] New: Make it clear what is numeric and
	what is numpy
Message-ID: <bug-2516-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2516

           Summary: Make it clear what is numeric and what is numpy
           Product: Biopython
           Version: 1.45
          Platform: PC
               URL: http://www.biopython.org/DIST/docs/install/Installation.
                    html
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Documentation
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: mmokrejs at ribosome.natur.cuni.cz


Hi,
  although both packages are from the same source site, numpy is the newer
implementation whereas numeric is the old, deprecated implementation, right?
Why do you say in the installation docs the following?

"The Numerical Python distribution (also known an Numeric or Numpy) is a fast
implementation of arrays and associated array functionality. This is important
for a number of Biopython modules that deal with number processing. The main
web site for Numeric is: http://sourceforge.net/projects/numpy and downloads
are available from:..."

I think it is misleading.

BTW, is numpy-1.1.0 supported?


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Jun 11 08:47:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Jun 2008 04:47:32 -0400
Subject: [Biopython-dev] [Bug 2511] setup.py problem with del
	sys.modules["Martel"]
In-Reply-To: <bug-2511-42@http.bugzilla.open-bio.org/>
Message-ID: <200806110847.m5B8lWxd010254@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2511


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-11 04:47 EST -------
Patch checked into CVS as Biopython/setup.py revision 1.133, marking this bug
as fixed.

The issue I raised in comment 3 is still outstanding (external ImportErrors and
the unit tests).  We may want to file a separate bug, or discuss this on the
dev mailing list.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Jun 11 08:53:30 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Jun 2008 04:53:30 -0400
Subject: [Biopython-dev] [Bug 2516] Make it clear what is numeric and what
	is numpy
In-Reply-To: <bug-2516-42@http.bugzilla.open-bio.org/>
Message-ID: <200806110853.m5B8rU2t010552@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2516


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-11 04:53 EST -------
That text is rather out of date - if you are familiar with the history of
Numeric, numarray and numpy you'll know that the old module used with "import
Numeric" was called Numerical Python or NumPy for short. This shorthand was
used in lots of documentation (not just in Biopython). I think the choice to
call the third generation of the array packages numpy has caused a lot of
confusion.

See http://numpy.scipy.org/#older_array

We had updated the Biopython website and other bits of documentation, but had
missed this one.  Thank you for pointing this out.

P.S. Supporting numpy instead of Numeric is Biopython Bug 2251.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Jun 11 09:04:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Jun 2008 05:04:47 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
	element access and slicing
In-Reply-To: <bug-2507-42@http.bugzilla.open-bio.org/>
Message-ID: <200806110904.m5B94li8011303@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-11 05:04 EST -------
I raised the issue of evaluating a SeqRecord as a boolean with a proposal that
we could add __len__ but also add __nonzero__ to ensure that any SeqRecord
evaluates as True (even if the sequence is of length zero):
http://lists.open-bio.org/pipermail/biopython-dev/2008-June/003756.html

Michiel was in favour of this:
> +1 for adding a __len__ method, with a __nonzero__ method to let all SeqRecord
> objects evaluate as true.

The patch isn't ready yet because, in addition, it doesn't yet deal with the
SeqFeature objects.  I think the SeqFeature class needs a _shift(offset) method
to return a copy of itself with its location (and the locations of any
sub-features) adjusted.

I'm still not sure about handling strides, and I am tempted to rule that if a
stride other than one is used then the features of the SeqRecord are lost.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Jun 11 13:57:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Jun 2008 09:57:56 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
	element access and slicing
In-Reply-To: <bug-2507-42@http.bugzilla.open-bio.org/>
Message-ID: <200806111357.m5BDvu1I024400@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #928 is|0                           |1
           obsolete|                            |


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-11 09:57 EST -------
Created an attachment (id=937)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=937&action=view)
Patch to Bio/SeqRecord.py and Bio/SeqFeature.py

This modifies the SeqRecord to give it __getitem__ (supporting sliced
annotations including features), __len__ (to return the length of the
sequence), __nonzero__ (to ensure any SeqRecord evaluates as True regardless of
the length of its sequence) and __iter__ (to explicitly support iteration over
the sequence with a docstring).  As part of this, assorted objects in
SeqFeature.py get a private _shift() method taking an integer offset to return
a copy of themselves with an adjusted location.

Note that slices with a stride (other than one) will result in the features
being lost.  Handling (positive) strides would require complicated
consideration about if an exact location is still present, and if not replacing
it with either a fuzzy position or a range.  Negative strides are worse!
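
A toy model of the slicing behaviour described above is given below. SketchFeature and SketchRecord are hypothetical classes, the half-open start/end coordinates are an assumption, and so is the rule of keeping only features that lie wholly inside the slice; the attached patch may make different choices.

```python
class SketchFeature(object):
    """Hypothetical feature with a half-open [start, end) location."""

    def __init__(self, start, end, label):
        self.start, self.end, self.label = start, end, label

    def _shift(self, offset):
        # Return a copy with the location moved; the original is untouched.
        return SketchFeature(self.start + offset, self.end + offset, self.label)


class SketchRecord(object):
    """Hypothetical stand-in for SeqRecord."""

    def __init__(self, seq, features=()):
        self.seq = seq
        self.features = list(features)

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if not isinstance(index, slice):
            return self.seq[index]  # single letter
        sub = SketchRecord(self.seq[index])
        start, stop, step = index.indices(len(self.seq))
        if step == 1:
            # Assumption: keep only features lying wholly inside the slice.
            sub.features = [f._shift(-start) for f in self.features
                            if start <= f.start and f.end <= stop]
        return sub  # any other stride: the features are dropped


rec = SketchRecord("ACGTACGTAC", [SketchFeature(2, 5, "gene"),
                                  SketchFeature(7, 10, "tail")])
sub = rec[1:6]
print(sub.seq, [(f.label, f.start, f.end) for f in sub.features])
print(len(rec[::2].features))
```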

The current set of unit tests seem fine, but additional checks would need to be
added to validate this new behaviour.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Jun 11 15:26:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Jun 2008 11:26:59 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806111526.m5BFQxMw029057@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #9 from mdehoon at ims.u-tokyo.ac.jp  2008-06-11 11:26 EST -------
I "fixed" SwissProt.SProt.Iterator by deprecating it. Instead of
SwissProt.SProt.Iterator, we recommend using Bio.SwissProt.parse and
Bio.SeqIO.parse.

Next on the to-do list is SwissProt.KeyWList.extract_keywords.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Jun 12 14:23:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 12 Jun 2008 10:23:16 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806121423.m5CENG95026678@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp  2008-06-12 10:23 EST -------
SwissProt.KeyWList.extract_keywords could only parse very old SwissProt files.
I deprecated it and wrote a new function "parse" that parses current SwissProt
files. This function does not do the file-like check.

Prosite.Iterator and Prosite.Prodoc.Iterator are next.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From fkauff at biologie.uni-kl.de  Thu Jun 12 14:33:56 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Thu, 12 Jun 2008 16:33:56 +0200
Subject: [Biopython-dev] CVS access and developers web site
In-Reply-To: <320fb6e00805291446x1cebf67bpe3e0818af5b9a7c5@mail.gmail.com>
References: <483E7578.50402@biologie.uni-kl.de>
	<320fb6e00805291446x1cebf67bpe3e0818af5b9a7c5@mail.gmail.com>
Message-ID: <485133D4.2060405@biologie.uni-kl.de>


Peter Cock wrote:
> Hi Frank,
>
> I would try emailing support at helpdesk.open-bio.org using the email
> address associated with your CVS username.  If you've changed email
> address, and you run into problems, I expect Michiel or I could vouch
> for you.
>   
Is somebody monitoring that email address? I got an automated response 
about two weeks ago, and then nothing happened.

> For the website, the wiki usernames are entirely separate and you
> should be able to create a new account if you don't have one already.
> If you want to update the tutorial new HTML and PDF files are loaded
> with each release from the version in CVS.
>   
Thanks Peter, got access to the wiki and updated personal data.

Frank
> Peter
>
> On Thu, May 29, 2008 at 10:20 AM, Frank Kauff <fkauff at biologie.uni-kl.de> wrote:
>   
>> Hi folks,
>>
>> although I've been quiet for a while, I'm still doing some changes to the
>> Nexus parser of biopython from time to time.... I totally lost my passwords
>> to access the repository. Could someone please send me a new password to get
>> write access to cvs? And I would also like to change the information on the
>> biopython developers web site, as they are somewhat outdated.
>> And is this the right place to ask for such things?
>>
>> Thanks!
>>
>> Frank
>>     
>
>   


From bugzilla-daemon at portal.open-bio.org  Thu Jun 12 15:42:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 12 Jun 2008 11:42:58 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806121542.m5CFgw9t029594@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #11 from cracka80 at gmail.com  2008-06-12 11:42 EST -------
Maybe it's a good idea for any parsers/iterators to just use the iterator-like
ability of file handles? Writers would have to function slightly differently,
but since file objects, StringIOs and any other file-like objects must provide
an __iter__ method, it's probably a good idea to take that into consideration
when developing a common interface. In addition, writers could output iterators
or generators, so that they can be chained together to operate on files.
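
A minimal sketch of this idea, assuming a toy record format in which each
record ends with a "//" line. The function names iter_records and
format_records are hypothetical and are not part of Biopython. The parser
only requires that the handle can be iterated line by line, and the writer
is a generator that yields output lines rather than writing them itself.

```python
import io


def iter_records(handle):
    """Yield one record (a list of lines) per '//'-terminated block.

    The only thing required of `handle` is that iterating over it gives
    lines, so real files, StringIO objects and plain lists all work.
    """
    record = []
    for line in handle:
        line = line.rstrip("\n")
        if line == "//":
            yield record
            record = []
        else:
            record.append(line)
    if record:
        # Tolerate a final record without a terminator line.
        yield record


def format_records(records):
    """Writer as a generator: yield output lines instead of writing them."""
    for record in records:
        for line in record:
            yield line + "\n"
        yield "//\n"


text = "ID   A\nKW   x\n//\nID   B\n//\n"
records = list(iter_records(io.StringIO(text)))
# Chain parser and writer without building an intermediate list.
roundtrip = "".join(format_records(iter_records(io.StringIO(text))))
```

Because file objects accept any iterable of strings in writelines(), the
output of format_records could be passed to handle.writelines(...) to
write to a file or to sys.stdout.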




From bugzilla-daemon at portal.open-bio.org  Fri Jun 13 16:24:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jun 2008 12:24:29 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806131624.m5DGOTKw025954@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #12 from mdehoon at ims.u-tokyo.ac.jp  2008-06-13 12:24 EST -------
(In reply to comment #11)
> Maybe it's a good idea for any parsers/iterators to just use the iterator-like
> ability of file handles?

In principle, yes. In practice, it's not so easy because many parsers in
Biopython follow the framework in Bio.ParserSupport. These parsers are not
really written to deal with lines pulled one-by-one from a file handle. To
reconcile these two, I pull out data line-by-line from the file handle, store
it in a string, and then call the parser to parse it. This is not ideal, and it
may be a good idea for Biopython at some point to change its parser strategy.
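
The workaround described above could look roughly like the following
sketch. It is not the actual Biopython code: parse_record_text stands in
for a parser that expects a whole record as one string, and the "//"
terminator and the class name RecordIterator are assumptions made for
illustration.

```python
import io


def parse_record_text(text):
    """Stand-in for a whole-record parser that expects a single string."""
    return dict(line.split(None, 1) for line in text.splitlines())


class RecordIterator:
    """Collect lines up to a terminator, then hand one string to the parser."""

    def __init__(self, handle, terminator="//"):
        # iter() is the only thing asked of the handle.
        self._lines = iter(handle)
        self._terminator = terminator

    def __iter__(self):
        return self

    def __next__(self):
        lines = []
        for line in self._lines:
            if line.strip() == self._terminator:
                return parse_record_text("".join(lines))
            lines.append(line)
        if lines:
            # Final record without a terminator line.
            return parse_record_text("".join(lines))
        raise StopIteration


data = "ID   P1\nDE   first\n//\nID   P2\nDE   second\n//\n"
parsed = list(RecordIterator(io.StringIO(data)))
```

The cost of this design is that every record is held in memory as a string
before it is parsed, which is why it is described as not ideal.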

> Writers would have to function slightly differently,
> but since file objects, StringIOs and any other file-like objects must provide
> an __iter__ method, it's probably a good idea to take that into consideration
> when developing a common interface. In addition, writers could output 
> iterators or generators, so that they can be chained together to operate
> on files.
> 
Writers should also be able to just print the record to the screen. I don't
see how that is easily achievable with generators. 




From bugzilla-daemon at portal.open-bio.org  Fri Jun 13 16:27:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jun 2008 12:27:47 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806131627.m5DGRlTE026072@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #13 from mdehoon at ims.u-tokyo.ac.jp  2008-06-13 12:27 EST -------
Medline.Iterator, Prosite.Iterator, and Prosite.Prodoc.Iterator are now fixed.




From bugzilla-daemon at portal.open-bio.org  Sat Jun 14 02:29:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jun 2008 22:29:13 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806140229.m5E2TDdD014417@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #14 from mdehoon at ims.u-tokyo.ac.jp  2008-06-13 22:29 EST -------
I deprecated Bio.Gobase, since no users came forward on the mailing list.

Bio.Rebase is also problematic. It parses HTML from the Rebase database, but it
was written in 2000 and cannot parse current HTML from Rebase (which looks
completely different from the HTML used in 2000).

I'll ask on the mailing list if anybody is willing to update Bio.Rebase.




From mjldehoon at yahoo.com  Sat Jun 14 02:34:05 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 13 Jun 2008 19:34:05 -0700 (PDT)
Subject: [Biopython-dev] Bio.Rebase
Message-ID: <237761.5963.qm@web62409.mail.re1.yahoo.com>

Hi everybody,

As part of bug #2454 on Bugzilla, I am looking at the Bio.Rebase module.
This module parses files (in HTML format) from the Rebase database:
http://rebase.neb.com/rebase/rebase.html

Unfortunately, since this module was written (in 2000) the HTML format used by the Rebase database has changed completely. This module is therefore not able to parse current Rebase HTML files.

Is anybody willing to update Bio.Rebase (either by updating the HTML parser, or preferably by writing a parser for plain-text output from the Rebase database)? If not, I think this module should be deprecated.

--Michiel.


From bugzilla-daemon at portal.open-bio.org  Sat Jun 14 02:50:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jun 2008 22:50:42 -0400
Subject: [Biopython-dev] [Bug 2516] Make it clear what is numeric and what
	is numpy
In-Reply-To: <bug-2516-42@http.bugzilla.open-bio.org/>
Message-ID: <200806140250.m5E2ogvf014920@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2516


------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp  2008-06-13 22:50 EST -------
According to the Numerical Python website, the NumPy documentation will become
freely available on September 1, 2008. That would be a good time to start
thinking seriously about converting from the "old" Numerical Python to the
"new" NumPy 1.1.




From mjldehoon at yahoo.com  Sat Jun 14 02:46:37 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 13 Jun 2008 19:46:37 -0700 (PDT)
Subject: [Biopython-dev] Bio.SCOP maintainer?
Message-ID: <523172.98428.qm@web62402.mail.re1.yahoo.com>

Still looking at Bug 2454
(http://bugzilla.open-bio.org/show_bug.cgi?id=2454).

To fix this bug, I'd like to make some changes to Bio.SCOP.
Is anybody currently maintaining Bio.SCOP? The changes I'd like to make are small, but it would be better to discuss with the Bio.SCOP maintainer (if there is one) so I won't get in their way.

--Michiel.

From bugzilla-daemon at portal.open-bio.org  Sat Jun 14 09:52:09 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 14 Jun 2008 05:52:09 -0400
Subject: [Biopython-dev] [Bug 2488] Adding XML parsers to Bio.Entrez
In-Reply-To: <bug-2488-42@http.bugzilla.open-bio.org/>
Message-ID: <200806140952.m5E9q9X9032018@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2488


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp  2008-06-14 05:52 EST -------
We now have parsers for XML returned by Entrez, provided the corresponding DTDs
are available. Bio/Entrez/DTDs contains most (all?) DTDs currently used by
Entrez. If later some DTDs appear to be missing, we can simply add them to
Bio/Entrez/DTDs.




From bugzilla-daemon at portal.open-bio.org  Sat Jun 14 10:29:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 14 Jun 2008 06:29:12 -0400
Subject: [Biopython-dev] [Bug 2516] Make it clear what is numeric and what
	is numpy
In-Reply-To: <bug-2516-42@http.bugzilla.open-bio.org/>
Message-ID: <200806141029.m5EATC64001227@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2516


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp  2008-06-14 06:29 EST -------
Updated the installation instructions (in CVS, at least).




From p.j.a.cock at googlemail.com  Sat Jun 14 22:51:26 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 14 Jun 2008 23:51:26 +0100
Subject: [Biopython-dev] CVS access and developers web site
In-Reply-To: <485133D4.2060405@biologie.uni-kl.de>
References: <483E7578.50402@biologie.uni-kl.de>
	<320fb6e00805291446x1cebf67bpe3e0818af5b9a7c5@mail.gmail.com>
	<485133D4.2060405@biologie.uni-kl.de>
Message-ID: <320fb6e00806141551t56422a98v752e34bbbb38d0aa@mail.gmail.com>

>> Hi Frank,
>>
>> I would try emailing support at helpdesk.open-bio.org using the email
>> address associated with your CVS username.  If you've changed email
>> address, and you run into problems, I expect Michiel or I could vouch
>> for you.
>>
>
> Is somebody monitoring that email address? I got an automated response about
> two weeks ago, and then nothing happened.
>

Maybe someone is on holiday - or they are caught up with BOSC 2008
work?  I can suggest a few specific people at OBF to try and contact
directly if you are still stuck.

In the short term, if there are any urgent fixes you think need to be
checked in, stick them on Bugzilla and I'm sure one of us will be able
to commit them on your behalf.

Peter


From bugzilla-daemon at portal.open-bio.org  Sun Jun 15 07:03:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 15 Jun 2008 03:03:18 -0400
Subject: [Biopython-dev] [Bug 2468] Tutorial needs a fix: Bio.WWW.NCBI
In-Reply-To: <bug-2468-42@http.bugzilla.open-bio.org/>
Message-ID: <200806150703.m5F73IF2007099@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2468


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp  2008-06-15 03:03 EST -------
I created a subsection "Examples" in the tutorial chapter on Bio.Entrez, and
added the example from section 2.5 and Martin's taxonomy example to it. With
the Bio.Entrez currently in CVS, finding the lineage works as follows:

>>> handle = Entrez.esearch(db="Taxonomy", term="Cypripedioideae")
>>> record = Entrez.read(handle)
>>> record["IdList"]
['158330']
>>> handle = Entrez.efetch(db="Taxonomy", id="158330", retmode='xml')
>>> records = Entrez.read(handle)
>>> records[0]['Lineage']
'cellular organisms; Eukaryota; Viridiplantae; Streptophyta; Streptophytina;
 Embryophyta; Tracheophyta; Euphyllophyta; Spermatophyta; Magnoliophyta;
 Liliopsida; Asparagales; Orchidaceae'




From bugzilla-daemon at portal.open-bio.org  Mon Jun 16 19:23:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 16 Jun 2008 15:23:43 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
	element access and slicing
In-Reply-To: <bug-2507-42@http.bugzilla.open-bio.org/>
Message-ID: <200806161923.m5GJNhZw012022@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #937 is|0                           |1
           obsolete|                            |


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-16 15:23 EST -------
Created an attachment (id=942)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=942&action=view)
Patch to Bio/SeqRecord.py and Bio/SeqFeature.py

I've checked in the SeqRecord __len__ and __nonzero__ methods with CVS
Bio/SeqRecord.py revision 1.17

The earlier __getitem__ and __iter__ patch has been updated accordingly.
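
A hedged sketch of why the two methods might go together. The Record class
below is a toy stand-in, not the real SeqRecord, and the motive is an
assumption since the comment does not state it. In Python, an object that
defines __len__ but no explicit truth method is treated as false when its
length is zero, so a truth method that always returns True keeps existing
"if record:" checks working. Python 3 spells that method __bool__; Python 2
called it __nonzero__.

```python
class Record:
    """Toy stand-in for a sequence record (not the real SeqRecord)."""

    def __init__(self, seq, id="<unknown id>"):
        self.seq = seq
        self.id = id

    def __len__(self):
        # Length of the record is the length of its sequence.
        return len(self.seq)

    def __bool__(self):
        # Without this, a record with an empty sequence would be falsy,
        # because Python falls back to __len__ for truth testing.
        return True

    # Python 2 looked up this name instead of __bool__.
    __nonzero__ = __bool__


empty = Record("")
full = Record("ACGT", id="example")
```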




From bugzilla-daemon at portal.open-bio.org  Mon Jun 16 20:08:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 16 Jun 2008 16:08:00 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To: <bug-1944-42@http.bugzilla.open-bio.org/>
Message-ID: <200806162008.m5GK80bv014002@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944


------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-16 16:07 EST -------
Created an attachment (id=943)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=943&action=view)
Minimal __getitem__ method for generic alignment

This patch just adds a __getitem__ to the alignment which ONLY accepts a single
integer index and returns the corresponding SeqRecord object.  I propose to add
this NOW, as I think even just this is a worthwhile improvement.

This is a natural expectation given the current __iter__ behaviour and the
model of the alignment as a list of SeqRecord objects.  It's also part of the
richer behaviour discussed above, which we can add more easily if/when the
SeqRecord gets a __getitem__ method (bug 2507).

Comments on this particular patch?  Should we add __len__ at the same time
giving the number of rows in the alignments?
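
For illustration, a minimal sketch of the behaviour described, using a toy
Alignment class rather than the attached patch (the attribute name _records
is an assumption). __getitem__ accepts only a single integer and returns
the corresponding row, and __len__ gives the number of rows.

```python
class Alignment:
    """Toy alignment modelled as a list of records (rows)."""

    def __init__(self, records):
        self._records = list(records)

    def __iter__(self):
        # Iterating gives the rows, matching the list-of-records model.
        return iter(self._records)

    def __len__(self):
        # Number of rows, not the number of columns.
        return len(self._records)

    def __getitem__(self, index):
        # Deliberately minimal: slices and tuples are rejected for now.
        if not isinstance(index, int):
            raise TypeError("only a single integer index is supported")
        return self._records[index]


aln = Alignment(["row1", "row2", "row3"])
first = aln[0]
last = aln[-1]
```

Rejecting everything except integers leaves room to give slices and
(row, column) tuples a meaning later without breaking existing callers.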




From jblanca at btc.upv.es  Tue Jun 17 07:35:38 2008
From: jblanca at btc.upv.es (Jose Blanca)
Date: Tue, 17 Jun 2008 09:35:38 +0200
Subject: [Biopython-dev] [BioPython] Ace contig files in Bio.SeqIO or
	Bio.AlignIO
In-Reply-To: <320fb6e00806160701l428584c0i30acac57338b9357@mail.gmail.com>
References: <320fb6e00806160701l428584c0i30acac57338b9357@mail.gmail.com>
Message-ID: <200806170935.38904.jblanca@btc.upv.es>

Hi:
My main use of the Alignment class is to parse Ace files. I've been thinking
about that problem recently. My proposal to modify SeqRecord was due to this
problem. I think that the best solution would be to treat the Alignment as a
sequence. The consensus would be the actual sequence and the aligned reads
would be features with per-base annotations. I've implemented such a class
and it works fine for me. In fact the Alignment class is just a wrapper
around a standard SeqRecord (I name it RichSeq in my implementation).
To do that you just need a SeqRecord with a __getitem__ method. You have
already proposed that, so that's not a problem.
Padding with spaces is not an option when you're dealing with genome-wide
alignments; that's one of the problems of the current Alignment class.
If you want I can send my implementation to the list, although it could take a
while because my home computer is dead.
Best regards,

Jose Blanca

On Monday 16 June 2008 16:01:31 Peter wrote:
> I've recently had to deal with some contig files in the Ace format
> (output by CAP3, but many assembly files will produce this output).
>
> We have a module for parsing Ace files in Biopython,
> Bio.Sequencing.Ace but I was wondering about integrating this into the
> Bio.SeqIO or Bio.AlignIO framework.
> http://www.biopython.org/wiki/SeqIO
> http://www.biopython.org/wiki/AlignIO
>
> I'd like to hear from anyone currently using Ace files, on how they
> tend to treat the data - and if they think a SeqRecord or Alignment
> based representation would be useful.
>
> Each contig in an Ace file could be treated as a SeqRecord using the
> consensus sequence.  The identifiers of each sub-sequence used to
> build the consensus could be stored as database cross-references, or
> perhaps we could store these as SeqFeatures describing which part of
> the consensus they support.  This would then fit into Bio.SeqIO quite
> well.
>
> Alternatively, each contig could be treated as an alignment (with a
> consensus) and integrated into Bio.AlignIO.  One drawback for this is
> doing this with the current generic alignment class would require
> padding the start and/or end of each sequence with gaps in order to
> make every sequence the same length.  However, if we did this (or
> created a more specialised alignment class), the Ace file format would
> then fit into Bio.AlignIO too.
>
> So, Ace users - would either (or both) of the above approaches make
> sense for how you use the Ace contig files?
>
> Thanks
>
> Peter


-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)


From biopython at maubp.freeserve.co.uk  Tue Jun 17 08:46:22 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Jun 2008 09:46:22 +0100
Subject: [Biopython-dev] [BioPython] Ace contig files in Bio.SeqIO or
	Bio.AlignIO
In-Reply-To: <200806170935.38904.jblanca@btc.upv.es>
References: <320fb6e00806160701l428584c0i30acac57338b9357@mail.gmail.com>
	<200806170935.38904.jblanca@btc.upv.es>
Message-ID: <320fb6e00806170146j6f1843e6hed4166ad62c84423@mail.gmail.com>

On Tue, Jun 17, 2008 at 8:35 AM, Jose Blanca <jblanca at btc.upv.es> wrote:
> Hi:
> My main use of the Alignment class is to parse Ace files. I've been thinking
> about that problem recently. My proposal to modify SeqRecord was due to this
> problem. I think that the best solution would be to treat the Alignment as a
> sequence. The consensus would be the actual sequence and the aligned reads
> would be features with per-base annotations.

So integrating the "ace" format into Bio.SeqIO representing the
consensus sequence of each contig as a SeqRecord would be useful.
Initially I would try and represent the aligned reads as SeqFeature
objects (much like when reading a genome from a GenBank file you get
CDS features with their amino acid translation).

Note that for memory reasons, I would be inclined to scan over the Ace
file in one pass (using the existing Iterator in the
Bio.Sequencing.Ace parser) returning SeqRecords as we go.  As Frank
points out in the code comments, this means we can't easily include
the WA, CT, RT and WR tags found in the Ace file footer.  Do you use
this information Jose?

> I've implemented such a class
> and it works fine for me. In fact the Alignment class is just a wrapper
> around a standard SeqRecord (I name it RichSeq in my implementation).
> To do that you just need a SeqRecord with a __getitem__ method. You have
> already proposed that, so that's not a problem.

Your enthusiasm Jose is one of the things motivating me to try and do
more with the Seq and SeqRecord.  Without a third party to offer
feedback, making big changes is risky.

> Padding with spaces is not an option when you're dealing with genome-wide
> alignments; that's one of the problems of the current Alignment class.

It might make sense to talk about a "Contig Alignment" object/class,
compared to the existing "multiple sequence alignment"  object/class
where all the sequences are the same length.  Ideally these should
provide as similar an API as possible - even if the internals are
different.  One idea is a sub-class of the current alignment class
which stores an offset (>=0) for each supporting read, used when
accessing columns.  Maybe we should check out BioPerl etc for
inspiration?
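
A rough sketch of the offset idea, written as a standalone toy class rather
than a real sub-class of the existing alignment class. The method names
get_column and get_alignment_length are chosen to resemble the generic
alignment API, and add_read and everything else here is a hypothetical
design. Each read stores an offset (>= 0) from the start of the contig, and
columns are computed on demand instead of padding every read with gaps.

```python
class ContigAlignment:
    """Toy contig alignment: reads are stored unpadded, with an offset."""

    def __init__(self, gap="-"):
        self._reads = []  # list of (name, offset, sequence) tuples
        self._gap = gap

    def add_read(self, name, sequence, offset=0):
        if offset < 0:
            raise ValueError("offset must be >= 0")
        self._reads.append((name, offset, sequence))

    def __len__(self):
        # Number of supporting reads (rows).
        return len(self._reads)

    def get_alignment_length(self):
        # Right-most position covered by any read.
        return max((off + len(seq) for _, off, seq in self._reads), default=0)

    def get_column(self, col):
        if not 0 <= col < self.get_alignment_length():
            raise IndexError("column index out of range")
        chars = []
        for _, off, seq in self._reads:
            i = col - off  # position within this read, if it covers col
            chars.append(seq[i] if 0 <= i < len(seq) else self._gap)
        return "".join(chars)


contig = ContigAlignment()
contig.add_read("read1", "ACGT", offset=0)
contig.add_read("read2", "GTTA", offset=2)
contig.add_read("read3", "CG", offset=1)
```

Memory use then grows with the total length of the reads rather than with
the number of reads times the contig length.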

> If you want I can send my implementation to the list, although it could take a
> while because my home computer is dead.

Good luck with the broken computer - I hope you have an easier time
fixing it / rebuilding it than I did last time this happened to me.

Regards,

Peter


From biopython at maubp.freeserve.co.uk  Tue Jun 17 09:16:29 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Jun 2008 10:16:29 +0100
Subject: [Biopython-dev] Iterating over Ace contig files
Message-ID: <320fb6e00806170216k12ecd88fof60758db1ccec3cf@mail.gmail.com>

Hello Frank,

I wanted to get your opinion on iterating over the Ace file contig by
contig, and what is lost in the WA, CT, RT and WR tags at the end of
the file by doing this.  As large sequencing runs become more common,
iterating over the file in a single pass WITHOUT keeping everything in
memory does seem to be desirable.

Similar past discussions:
http://portal.open-bio.org/pipermail/biopython/2004-February/001825.html
http://portal.open-bio.org/pipermail/biopython/2005-May/002661.html

Would you object to me rewording your module's header comment to say
that the Ace Iterator is NOT deprecated, but rather that it has
certain drawbacks?

[The context for this is my recent thread on the Biopython dev mailing
list about integrating your Bio.Sequencing.Ace parser into Bio.SeqIO
and/or Bio.AlignIO - I've included a little context below.]

Thanks,

Peter

--

Peter wrote:
>> So integrating the "ace" format into Bio.SeqIO representing the
>> consensus sequence of each contig as a SeqRecord would be useful.
>> Initially I would try and represent the aligned reads as SeqFeature
>> objects (much like when reading a genome from a GenBank file you get
>> CDS features with their amino acid translation).
>>
>> Note that for memory reasons, I would be inclined to scan over the Ace
>> file in one pass (using the existing Iterator in the
>> Bio.Sequencing.Ace parser) returning SeqRecords as we go.  As Frank
>> points out in the code comments, this means we can't easily include
>> the WA, CT, RT and WR tags found in the Ace file footer.  Do you use
>> this information Jose?

Jose replied,
> I haven't used the iterator because of the deprecation warning in the code. I
> tried with about 40000 alignments and it worked on a computer with 8 GB of RAM.
> If there are more sequences, and there will be with the 454 sequencer, we will
> have trouble reading them all at once. I vote for the iterator approach. I have
> not used the information in these tags, and I also don't know what they mean.
> I've been looking for documentation about this format, but I've found none. Do
> you have any good Ace documentation?


From bugzilla-daemon at portal.open-bio.org  Tue Jun 17 11:23:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Jun 2008 07:23:59 -0400
Subject: [Biopython-dev] [Bug 2520] New: Reading ACE assembly contig files
	in Bio.SeqIO
Message-ID: <bug-2520-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2520

           Summary: Reading ACE assembly contig files in Bio.SeqIO
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


As I suggested on the mailing list, we could use Bio.Sequencing.Ace to parse
ACE assembly files, and then turn each contig into a SeqRecord using the
consensus sequence.

I will attach a basic implementation which only uses the consensus sequence and
its name.  For now this ignores all the meta data and in particular the read
information.




From bugzilla-daemon at portal.open-bio.org  Tue Jun 17 11:29:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Jun 2008 07:29:15 -0400
Subject: [Biopython-dev] [Bug 2520] Reading ACE assembly contig files in
	Bio.SeqIO
In-Reply-To: <bug-2520-42@http.bugzilla.open-bio.org/>
Message-ID: <200806171129.m5HBTFVG026790@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2520


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-17 07:29 EST -------
Created an attachment (id=944)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=944&action=view)
New file Bio/SeqIO/AceIO.py

This new file would be added to Bio.SeqIO in the usual way (updating
Bio/SeqIO/__init__.py to import this module and map the format "ace" to the new
iterator).

Handling different gap characters in Bio.SeqIO (and translating them when
reading and writing files) has not been formalised.  Where possible, converting
them into dashes on loading seems to be a sensible route to take.

Therefore I deliberately map any "*" gap characters in the consensus sequence
into "-" characters, which are used by default in the alphabet class and are
far more commonly used.  The "*" character is typically associated with a stop
codon in protein sequences, which is another reason to avoid using it here.
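
As a small illustration of the mapping described above (the helper name
normalise_gaps is hypothetical and is not taken from the attached AceIO.py):

```python
def normalise_gaps(consensus, ace_gap="*", gap="-"):
    """Return the consensus with Ace-style '*' gaps replaced by '-'."""
    return consensus.replace(ace_gap, gap)


converted = normalise_gaps("AC*GT**A")
```

Doing the conversion once on loading means downstream code only ever has
to recognise a single gap character.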




From fkauff at biologie.uni-kl.de  Tue Jun 17 13:06:34 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Tue, 17 Jun 2008 15:06:34 +0200
Subject: [Biopython-dev] Iterating over Ace contig files
In-Reply-To: <320fb6e00806170216k12ecd88fof60758db1ccec3cf@mail.gmail.com>
References: <320fb6e00806170216k12ecd88fof60758db1ccec3cf@mail.gmail.com>
Message-ID: <4857B6DA.9040309@biologie.uni-kl.de>

Hi Peter,

Makes total sense to me. Feel free to make the changes as you see fit.

Frank


Peter wrote:
> Hello Frank,
>
> I wanted to get your opinion on iterating over the Ace file contig by
> contig, and what is lost in the WA, CT, RT and WR tags at the end of
> the file by doing this.  As large sequencing runs become more common,
> iterating over the file in a single pass WITHOUT keeping everything in
> memory does seem to be desirable.
>
> Similar past discussions:
> http://portal.open-bio.org/pipermail/biopython/2004-February/001825.html
> http://portal.open-bio.org/pipermail/biopython/2005-May/002661.html
>
> Would you object to me rewording your module's header comment to say
> that the Ace Iterator is NOT deprecated, but rather that it has
> certain drawbacks?
>
> [The context for this is my recent thread on the Biopython dev mailing
> list about integrating your Bio.Sequencing.Ace parser into Bio.SeqIO
> and/or Bio.AlignIO - I've included a little context below.]
>
> Thanks,
>
> Peter
>
> --
>
> Peter wrote:
>   
>>> So integrating the "ace" format into Bio.SeqIO representing the
>>> consensus sequence of each contig as a SeqRecord would be useful.
>>> Initially I would try and represent the aligned reads as SeqFeature
>>> objects (much like when reading a genome from a GenBank file you get
>>> CDS features with their amino acid translation).
>>>
>>> Note that for memory reasons, I would be inclined to scan over the Ace
>>> file in one pass (using the existing Iterator in the
>>> Bio.Sequencing.Ace parser) returning SeqRecords as we go.  As Frank
>>> points out in the code comments, this means we can't easily include
>>> the WA, CT, RT and WR tags found in the Ace file footer.  Do you use
>>> this information Jose?
>>>       
>
> Jose replied,
>   
>> I haven't used the iterator because of the deprecation warning in the code. I
>> tried with about 40000 alignments and it worked on a computer with 8 GB of RAM.
>> If there are more sequences, and there will be with the 454 sequencer, we will
>> have trouble reading them all at once. I vote for the iterator approach. I have
>> not used the information in these tags, and I also don't know what they mean.
>> I've been looking for documentation about this format, but I've found none. Do
>> you have any good Ace documentation?
>>     
>
>   


From biopython at maubp.freeserve.co.uk  Tue Jun 17 13:53:23 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Jun 2008 14:53:23 +0100
Subject: [Biopython-dev] Iterating over Ace contig files
In-Reply-To: <4857B6DA.9040309@biologie.uni-kl.de>
References: <320fb6e00806170216k12ecd88fof60758db1ccec3cf@mail.gmail.com>
	<4857B6DA.9040309@biologie.uni-kl.de>
Message-ID: <320fb6e00806170653g482b104fl739107fcada06dc8@mail.gmail.com>

On Tue, Jun 17, 2008 at 2:06 PM, Frank Kauff <fkauff at biologie.uni-kl.de> wrote:
> Hi Peter,
>
> Makes total sense to me. Feel free to make the changes as you see fit.
>
> Frank

Thanks Frank.

I've checked in some comment changes to both Ace.py and Phd.py, aimed
at both improving the documentation and trying to make epydoc happier
for the automatic API documentation:
http://biopython.org/DIST/docs/api/

Peter

P.S. I also added an __iter__ method to the Ace Iterator (Phd already had one).


From mjldehoon at yahoo.com  Tue Jun 17 14:08:31 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 17 Jun 2008 07:08:31 -0700 (PDT)
Subject: [Biopython-dev] Iterating over Ace contig files
In-Reply-To: <320fb6e00806170653g482b104fl739107fcada06dc8@mail.gmail.com>
Message-ID: <399611.60966.qm@web62415.mail.re1.yahoo.com>

Note that bug #2454 also pertains to the Ace and Phd parsers. If you are modifying the Ace and Phd parsers, can you fix this bug at the same time?

http://bugzilla.open-bio.org/show_bug.cgi?id=2454

--Michiel.



From bugzilla-daemon at portal.open-bio.org  Tue Jun 17 14:43:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Jun 2008 10:43:42 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806171443.m5HEhgua005645@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-17 10:43 EST -------
I've removed the strict file-like test in:

Bio/Sequencing/Ace.py revision: 1.12
Bio/Sequencing/Phd.py revision: 1.6

In these cases, the handle is immediately turned into an UndoHandle, which will
be able to check for a sufficiently file-like object.

Hopefully that's what you meant, Michiel. We could go further and introduce a
parse() function and deprecate the Iterator objects in these modules.




From bugzilla-daemon at portal.open-bio.org  Wed Jun 18 10:34:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 18 Jun 2008 06:34:43 -0400
Subject: [Biopython-dev] [Bug 2503] An error when parsing NCBIWWW Blast
	output
In-Reply-To: <bug-2503-42@http.bugzilla.open-bio.org/>
Message-ID: <200806181034.m5IAYhS1026214@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2503


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-18 06:34 EST -------
I'm closing this bug as "INVALID" due to a lack of information.

If you are still having trouble Prashantha, and can give us some more
information, please re-open this bug.

Thank you.

Peter




From bugzilla-daemon at portal.open-bio.org  Wed Jun 18 11:34:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 18 Jun 2008 07:34:26 -0400
Subject: [Biopython-dev] [Bug 2497] Unit tests do not cover
	Bio.Blast.NCBIWWW.qblast()
In-Reply-To: <bug-2497-42@http.bugzilla.open-bio.org/>
Message-ID: <200806181134.m5IBYQjC032061@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2497


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-18 07:34 EST -------
I checked in a slightly revised version of this as test_NCBI_qblast.py -
marking this bug as fixed.




From bugzilla-daemon at portal.open-bio.org  Wed Jun 18 12:01:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 18 Jun 2008 08:01:11 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
	for '-F F' and make it safe
In-Reply-To: <bug-2508-42@http.bugzilla.open-bio.org/>
Message-ID: <200806181201.m5IC1BxA001255@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-18 08:01 EST -------
Created an attachment (id=946)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=946&action=view)
Patch to Bio/Blast/NCBIStandalone.py and Tests/test_NCBIStandalone.py

Suggested patch for the command injection risk.

Can anyone think of a legitimate reason for a ; or & character in the
parameters of a BLAST command line?  This patch is very simple and will reject
any keyword parameter containing the ; or && characters.
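
As a rough illustration of the check described above (not the attached patch
itself), a minimal Python sketch might look like this. The function name
check_blast_parameter and the choice of ValueError are assumptions made for
this example only.

```python
def check_blast_parameter(name, value):
    """Reject a keyword parameter value containing ';' or '&&'.

    Hypothetical helper for illustration only; not the attached patch.
    """
    text = str(value)
    for forbidden in (";", "&&"):
        if forbidden in text:
            raise ValueError("Rejecting suspicious value for %s: %r"
                             % (name, text))
    # Safe to place on the command line as far as this simple check goes
    return text
```

A caller would run every keyword value through such a check before building
the command line string.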




From biopython at maubp.freeserve.co.uk  Wed Jun 18 14:00:56 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 18 Jun 2008 15:00:56 +0100
Subject: [Biopython-dev] SeqRecord to file format as string
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B63C@mail2.exch.c2b2.columbia.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
	<47147341.4020708@maubp.freeserve.co.uk>
	<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
	<4714EB8E.3000700@maubp.freeserve.co.uk>
	<6243BAA9F5E0D24DA41B27997D1FD14402B63C@mail2.exch.c2b2.columbia.edu>
Message-ID: <320fb6e00806180700k327e6913m7ba9c4bdc3421f67@mail.gmail.com>

This returns to a thread from last year, about getting a SeqRecord
into a string in a particular file format (e.g. fasta).  Jared Flatow
had suggested adding a method to the SeqRecord itself.

Jared wrote:
>  > ... To always have to write to a file feels strange, but I see
>  > that it would be messy to go OO since there are so many formats.
>  > However, giving preference to fasta over other formats by making it
>  > innate doesn't seem like such a terrible idea. I do have mixed
>  > feelings about 'bloating' the code which is why I asked, and you have
>  > convinced me that this is not quite appropriate given existing
>  > convention. However the idea would be to put the to_fasta or
>  > to_format method inside the SeqRecord, then to call it from the IO
>  > when needed to actually write to a file, but call it directly when
>  > all that is wanted is a string...
>
> It's debatable, isn't it?  I suspect that for most users, when they want a
> record in a particular file format it's for writing to a file.  However,
> adding a to_format() method to a SeqRecord makes some sense (suitable for
> sequential file formats only).  This would take a format name and return
> a string, by calling Bio.SeqIO with a StringIO object internally.
>
> Peter

Jared - On reflection, do you think adding a method like this to the
SeqRecord (or even just for the FASTA format) would be useful?

I recently found myself wanting to use this sort of functionality, and
remembered this old thread.  This time I was wondering about using the
method name tostring (matching the name of a Seq object method).  In
order to mimic the Seq object's method, the format would be optional
and when omitted would give the sequence as a string.  Otherwise one
of the lower case strings used in Bio.SeqIO should be supplied.  There
is a sample implementation at the end of this email.

On Wed, Oct 17, 2007 Michiel De Hoon wrote:
> How about the following:
>
> SeqIO.write(sequences, handle, format) returns the properly formatted string
> if handle==None.

I can see the above is simpler than having to supply a StringIO
handle, but it doesn't make the functionality available directly from
the SeqRecord object.  It also complicates the API of the SeqIO module
with a special case.

Peter

--

######################################
For the SeqRecord class, in Bio/SeqRecord.py
######################################
    def tostring(self, format=None):
        """Return the record as a string in the specified file format.

        If the file format is omitted (default), the sequence itself is
        returned as a string.

        Otherwise the format should be a lower case string supported by
        Bio.SeqIO, which is used to turn the SeqRecord into a string.
        """
        if format:
            from StringIO import StringIO
            from Bio import SeqIO
            handle = StringIO()
            SeqIO.write([self], handle, format)
            # getvalue() returns everything written to the in-memory handle
            return handle.getvalue()
        else:
            # Return the sequence as a string
            return self.seq.tostring()
############################################


From jflatow at northwestern.edu  Wed Jun 18 15:25:18 2008
From: jflatow at northwestern.edu (Jared Flatow)
Date: Wed, 18 Jun 2008 10:25:18 -0500
Subject: [Biopython-dev] SeqRecord to file format as string
In-Reply-To: <4D53AB82-F673-4F4F-BCEC-BA06088E8721@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
	<47147341.4020708@maubp.freeserve.co.uk>
	<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
	<4714EB8E.3000700@maubp.freeserve.co.uk>
	<6243BAA9F5E0D24DA41B27997D1FD14402B63C@mail2.exch.c2b2.columbia.edu>
	<320fb6e00806180700k327e6913m7ba9c4bdc3421f67@mail.gmail.com>
	<4D53AB82-F673-4F4F-BCEC-BA06088E8721@northwestern.edu>
Message-ID: <55567F98-C5F5-4A2F-8542-502F17F485E9@northwestern.edu>

Quick correction:

On Jun 18, 2008, at 10:16 AM, Jared Flatow wrote:

> Hi Peter,
>
> On Jun 18, 2008, at 9:00 AM, Peter wrote:
>
>> Jared - On reflection, do you think adding a method like this to the
>> SeqRecord (or even just for the FASTA format) would be useful?
>
> Yes I still think so. In fact, for sequences, I would say that I  
> pretty much never deal with a format other than FASTA, so even making  
> the __str__ method of SeqRecord return the FASTA format as well  
> seems reasonable, though perhaps my use cases are different than  
> others.
>
> However, py3k and 2.6 will make available the functionality  
> described in PEP 3101:
>
> http://www.python.org/dev/peps/pep-3101/
>
> I think it would be best to define some semantics that are  
> compatible with this PEP. This would basically mean using the  
> __format__ method (which could be the same as the tostring method  
> you have defined below). To achieve backward compatibility and/or a  
> more OO interface, tostring could just be an alias for __format__.  
> Thus, instead of calling format(seq_rec, 'fasta') one could call  
> seq_rec.tostring('fasta') and these would be equivalent. The PEP  
> also states that format(seq_rec) should be the same as str(seq_rec).

On second thought, it seems like a .format method (similar to the one  
the string class is acquiring) should be used as an alias for  
__format__ (somehow I think tostring should always be the same as  
__str__).

> In short, I think creating methods to return formatted versions of  
> objects (SeqRecords) is a good idea, but most especially if it is  
> done in a way consistent with the language's vision.
>
> Best,
> jared


From bugzilla-daemon at portal.open-bio.org  Wed Jun 18 15:36:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 18 Jun 2008 11:36:48 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806181536.m5IFamvB015695@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #16 from mdehoon at ims.u-tokyo.ac.jp  2008-06-18 11:36 EST -------
(In reply to comment #15)
> I've removed the strict file-like test in:
> 
> Bio/Sequencing/Ace.py revision: 1.12
> Bio/Sequencing/Phd.py revision: 1.6
> 
> In these cases, the handle is immediately turned into an UndoHandle which will
> be able to check for a sufficiently file like object.
> 
> Hopefully that's what you meant Michiel

Actually, I think we should avoid using an UndoHandle altogether, now that
Python has generator functions.

> - we could go further and introduce a
> parse() function and deprecate the Iterator objects in these modules.
> 
That would make things a lot easier. An Iterator class was useful in older
versions of Python, but generator functions provide a cleaner alternative.

In Ace.py, we'd need three functions:

1) read(handle), which returns one record (Contig) read from the handle, or
None if the handle contains no record;

2) parse(handle), a generator function returning an iterator over the records;

3) a local function _process_line(line, record)

These functions then look like this:

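def read(handle):
    # Skip ahead to the first contig header line (starting with 'CO')
    for line in handle:
        if line[:2]=='CO':
            break
    else:
        # No contig found in the handle
        return None
    record = Contig()
    # Process the header line too, as parse() does
    _process_line(line, record)
    for line in handle:
        if line[:2]=='CO':
            # Start of the next contig; this record is complete
            break
        _process_line(line, record)
    # Also reached at the end of the file, so the last contig is not lost
    return record

def parse(handle):
    record = None
    for line in handle:
        if line[:2]=='CO':
            if record is not None:
                yield record
            record = Contig()
        if record is not None:
            # Ignore any lines before the first contig header
            _process_line(line, record)
    if record is not None:
        # A generator must yield (not return) the final record
        yield record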

The actual work is done in _process_line.

So we don't need to store the read lines explicitly; this is now taken care of
by the generator function. Hence, we don't need to convert the handle to an
UndoHandle. In addition, handle can now also be a list of lines instead of a
file handle. In this respect, I think Zachary was right in comment #11:

> Maybe it's a good idea for any parsers/iterators to just
> use the iterator-like ability of file handles?

In other words, as long as we can pull lines from the handle, we can parse it.

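In Phd.py, it's even simpler. Here, we only need the read() and parse()
functions:

def read(handle):
    record = None
    for line in handle:
        if line.startswith("BEGIN_SEQUENCE"):
            record = Record()
        elif line.startswith("END_SEQUENCE"):
            return record
        else:
            # do the actual processing of the other lines here
            pass
    # No (complete) record left in the handle
    return None

def parse(handle):
    # Use a single iterator, so that a list of lines is not restarted
    # from the top on every call to read()
    handle = iter(handle)
    while True:
        record = read(handle)
        if record is None:
            return
        yield record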

Again, we can process each line just as they come along. No UndoHandle, no
Parser, no Consumer, no Scanner needed.
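
To make the "list of lines instead of a file handle" point concrete, here is
a small self-contained sketch of the Phd-style read()/parse() pair run on a
plain list. The Record class, its name and lines attributes, and the sample
data are made up for this example; they are not the real Bio.Sequencing.Phd
classes. One detail matters here: a list restarts from the top each time it is
looped over, so the sketch wraps the input in iter() once.

```python
class Record(object):
    """Hypothetical stand-in for the Phd record class."""

    def __init__(self, name):
        self.name = name
        self.lines = []


def read(handle):
    """Return the next record from an iterator of lines, or None."""
    record = None
    for line in handle:
        if line.startswith("BEGIN_SEQUENCE"):
            record = Record(line.split()[1])
        elif line.startswith("END_SEQUENCE"):
            return record
        elif record is not None:
            # Stand-in for the real per-line processing
            record.lines.append(line.rstrip("\n"))
    return None


def parse(handle):
    """Yield records one at a time from a file handle or a list of lines."""
    handle = iter(handle)  # one shared iterator, also for lists
    while True:
        record = read(handle)
        if record is None:
            return
        yield record


lines = [
    "BEGIN_SEQUENCE read1\n",
    "a 30 1\n",
    "END_SEQUENCE\n",
    "BEGIN_SEQUENCE read2\n",
    "c 25 1\n",
    "END_SEQUENCE\n",
]
names = [record.name for record in parse(lines)]
```

The same parse() call works unchanged on an open file, since a file object is
already its own iterator.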




From jflatow at northwestern.edu  Wed Jun 18 15:16:59 2008
From: jflatow at northwestern.edu (Jared Flatow)
Date: Wed, 18 Jun 2008 10:16:59 -0500
Subject: [Biopython-dev] SeqRecord to file format as string
In-Reply-To: <320fb6e00806180700k327e6913m7ba9c4bdc3421f67@mail.gmail.com>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
	<47147341.4020708@maubp.freeserve.co.uk>
	<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
	<4714EB8E.3000700@maubp.freeserve.co.uk>
	<6243BAA9F5E0D24DA41B27997D1FD14402B63C@mail2.exch.c2b2.columbia.edu>
	<320fb6e00806180700k327e6913m7ba9c4bdc3421f67@mail.gmail.com>
Message-ID: <4D53AB82-F673-4F4F-BCEC-BA06088E8721@northwestern.edu>

Hi Peter,

On Jun 18, 2008, at 9:00 AM, Peter wrote:

> Jared - On reflection, do you think adding a method like this to the
> SeqRecord (or even just for the FASTA format) would be useful?

Yes I still think so. In fact, for sequences, I would say that I  
pretty much never deal with a format other than FASTA, so even making  
the __str__ method of SeqRecord return the FASTA format as well seems  
reasonable, though perhaps my use cases are different than others.

However, py3k and 2.6 will make available the functionality described  
in PEP 3101:

http://www.python.org/dev/peps/pep-3101/

I think it would be best to define some semantics that are compatible  
with this PEP. This would basically mean using the __format__ method  
(which could be the same as the tostring method you have defined  
below). To achieve backward compatibility and/or a more OO interface,  
tostring could just be an alias for __format__. Thus, instead of  
calling format(seq_rec, 'fasta') one could call  
seq_rec.tostring('fasta') and these would be equivalent. The PEP also  
states that format(seq_rec) should be the same as str(seq_rec).
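
A minimal sketch of these semantics, using a toy Record class rather than the
real SeqRecord (the class, its attributes and the hand-written FASTA output
are assumptions for illustration; a real implementation would presumably
delegate to Bio.SeqIO):

```python
class Record(object):
    """Toy stand-in for SeqRecord (hypothetical, illustration only)."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __str__(self):
        return "ID: %s\nSeq: %s" % (self.id, self.seq)

    def __format__(self, format_spec=""):
        # PEP 3101: an empty format spec gives the same result as str()
        if not format_spec:
            return str(self)
        if format_spec == "fasta":
            return ">%s\n%s\n" % (self.id, self.seq)
        raise ValueError("Unsupported format: %r" % format_spec)

    # Method-style alias, so rec.tostring('fasta') == format(rec, 'fasta')
    tostring = __format__


rec = Record("demo", "ACGT")
fasta_text = format(rec, "fasta")
```

With this layout, format(rec) and str(rec) agree, and the alias gives the more
OO spelling of the same call.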

In short, I think creating methods to return formatted versions of  
objects (SeqRecords) is a good idea, but most especially if it is done  
in a way consistent with the language's vision.

Best,
jared


From yair.benita at gmail.com  Wed Jun 18 17:26:02 2008
From: yair.benita at gmail.com (Yair Benita)
Date: Wed, 18 Jun 2008 13:26:02 -0400
Subject: [Biopython-dev] BioPax parser
Message-ID: <C47EBD6A.1B29A%yair.benita@gmail.com>

Hi Guys,
Does anyone have a biopax parser written in python?
Thanks,
Yair


From biopython at maubp.freeserve.co.uk  Wed Jun 18 17:42:13 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 18 Jun 2008 18:42:13 +0100
Subject: [Biopython-dev] BioPax parser
In-Reply-To: <C47EBD6A.1B29A%yair.benita@gmail.com>
References: <C47EBD6A.1B29A%yair.benita@gmail.com>
Message-ID: <320fb6e00806181042y169f580epbd8c876eb3cb57fa@mail.gmail.com>

On Wed, Jun 18, 2008 at 6:26 PM, Yair Benita <yair.benita at gmail.com> wrote:
> Hi Guys,
> Does anyone have a biopax parser written in python?
> Thanks,
> Yair

I don't know of any (but I haven't searched).  From a quick look on
www.biopax.org they use XML, so you should be able to parse it in
python fairly easily - but I guess some sort of object orientated
representation of the data would be very nice to have.

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Jun 19 10:08:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 19 Jun 2008 06:08:55 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
	for '-F F' and make it safe
In-Reply-To: <bug-2508-42@http.bugzilla.open-bio.org/>
Message-ID: <200806191008.m5JA8t0v016495@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-19 06:08 EST -------
On the issue of the low-complexity filter, that is actually already supported
in NCBIStandalone.blastall(), NCBIStandalone.blastpgp() and
NCBIStandalone.rpsblast() using the optional argument 'filter'.  This is
described in the doc string too, although it doesn't use the phrase "low
complexity" which might be clearer.




From bugzilla-daemon at portal.open-bio.org  Thu Jun 19 10:20:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 19 Jun 2008 06:20:03 -0400
Subject: [Biopython-dev] [Bug 2494] _retrieve_taxon in BioSQL.py needs
	urgent optimization
In-Reply-To: <bug-2494-42@http.bugzilla.open-bio.org/>
Message-ID: <200806191020.m5JAK3OZ017201@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2494


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-19 06:20 EST -------
I'm marking this as fixed now, but if anyone does find an issue with it please
re-open the bug.  Thanks for your work on this Eric.

Peter




From bugzilla-daemon at portal.open-bio.org  Thu Jun 19 10:41:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 19 Jun 2008 06:41:22 -0400
Subject: [Biopython-dev] [Bug 2408] GenBank records do not contain U's
In-Reply-To: <bug-2408-42@http.bugzilla.open-bio.org/>
Message-ID: <200806191041.m5JAfMNK018058@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2408


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-19 06:41 EST -------
Given there were no other opinions voiced on how to handle this, I went ahead
and fixed this in Bio/GenBank/__init__.py CVS revision 1.83

For records from RNA, if the sequence contains T but not U, we will use a DNA
alphabet in the Seq object.
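
The rule can be sketched as follows. This is only an illustration of the
decision, not the code checked in to Bio/GenBank/__init__.py; the function
name is hypothetical and plain strings stand in for the alphabet objects.

```python
def alphabet_for_rna_record(sequence):
    """Pick 'DNA' or 'RNA' for a record whose molecule type is RNA.

    Hypothetical helper: returns a label, not a Bio.Alphabet object.
    """
    letters = sequence.upper()
    if "T" in letters and "U" not in letters:
        # The RNA record was written with T instead of U
        return "DNA"
    return "RNA"
```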

Thanks for raising this Marcin.




From mjldehoon at yahoo.com  Thu Jun 19 13:04:31 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 19 Jun 2008 06:04:31 -0700 (PDT)
Subject: [Biopython-dev] Bio.CDD, anyone?
Message-ID: <14893.84074.qm@web62409.mail.re1.yahoo.com>

Hi everybody,

Bio.CDD is a module with a parser for CDD (NCBI's Conserved Domain Database) records. The parser parses HTML pages from CDD's web site. Since the parser was written about six years ago, the CDD web site has changed considerably. Bio.CDD therefore cannot parse current HTML pages from CDD.

So I am wondering:
1) Is anybody using Bio.CDD?
2) Is anybody willing to update Bio.CDD to handle current HTML?
3) If not, can we deprecate it? There is not much point in having a parser for HTML pages from years ago.

--Michiel.


From biopython at maubp.freeserve.co.uk  Thu Jun 19 13:38:29 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 19 Jun 2008 14:38:29 +0100
Subject: [Biopython-dev] Bio.CDD, anyone?
In-Reply-To: <14893.84074.qm@web62409.mail.re1.yahoo.com>
References: <14893.84074.qm@web62409.mail.re1.yahoo.com>
Message-ID: <320fb6e00806190638y2e3729e1ga66561de0c962700@mail.gmail.com>

> Bio.CDD is a module with a parser for CDD (NCBI's Conserved Domain Database)
> records. The parser parses HTML pages from CDD's web site. Since the parser
> was written about six years ago, the CDD web site has changed considerably.
> Bio.CDD therefore cannot parse current HTML pages from CDD.

A couple of years ago, I wanted to get the CDD domain name and
description and ended up writing my own very simple and crude parser
to extract just this information.  Doing a proper job would mean
extracting lots and lots of fields, e.g.
http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=29475

I wonder if the NCBI make any of this available as XML via Entrez?  I
had a quick look and couldn't find anything.

Peter


From mjldehoon at yahoo.com  Thu Jun 19 13:58:25 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 19 Jun 2008 06:58:25 -0700 (PDT)
Subject: [Biopython-dev] Bio.CDD, anyone?
In-Reply-To: <320fb6e00806190638y2e3729e1ga66561de0c962700@mail.gmail.com>
Message-ID: <352888.20937.qm@web62409.mail.re1.yahoo.com>

> I wonder if the NCBI make any of this available as XML via Entrez?  I
> had a quick look and couldn't find anything.

Actually I already asked this question to NCBI. Their answer was that a subset of the information shown on the web page is available as XML via Entrez's ESummary and EFetch (and thus available from Biopython). The full CDD records are stored as one large file, which is obtainable from NCBI's ftp site, but currently it is not possible to get individual CDD records except in HTML form through the NCBI website.

--Michiel.




From biopython at maubp.freeserve.co.uk  Thu Jun 19 21:08:13 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 19 Jun 2008 22:08:13 +0100
Subject: [Biopython-dev] test_Entrez.py fails on Linux?
Message-ID: <320fb6e00806191408t45a45da8hda0c2fc8a39aae57@mail.gmail.com>

Hi Michiel,

I've just tried the unit tests on a clean checkout on Linux, and there
is a problem with test_Entrez.py (shown below).  I'm pretty sure it
was working for me on Mac OS X this afternoon, so this may be platform
specific.  I haven't been using Biopython on Windows recently so I don't
know if that is working or not.

If you can't reproduce this, let me know and I'll do some investigation
here.  The good news is all the other tests seem fine on Linux (bar
the GFF, dnal and the population genetics tests for which I don't have
the external dependencies installed).

Peter

This is the output I get on python 2.4.3, using 64bit Ubuntu Dapper
Drake (a little old now).

maubp at shuttle2:~/repository/biopython/Tests$ python test_Entrez.py
Test parsing database list returned by EInfo ... ok
Test parsing database info returned by EInfo ... ok
Test parsing XML returned by ESearch from the Journals database ... ok
Test parsing XML returned by ESearch when no items were found ... ok
Test parsing XML returned by ESearch from the Nucleotide database ... ok
Test parsing XML returned by ESearch from PubMed Central ... ok
Test parsing XML returned by ESearch from the Protein database ... ok
Test parsing XML returned by ESearch from PubMed (first test) ... ok
Test parsing XML returned by ESearch from PubMed (second test) ... ok
Test parsing XML returned by ESearch from PubMed (third test) ... ok
Test parsing XML returned by EPost ... ok
Test parsing XML returned by EPost with an invalid id (overflow tag) ... ok
Test parsing XML returned by EPost with incorrect arguments ... ERROR
Test parsing XML returned by ESummary from the Journals database ... ok
Test parsing XML returned by ESummary from the Nucleotide database ... ok
Test parsing XML returned by ESummary from the Protein database ... ok
Test parsing XML returned by ESummary from PubMed ... ok
Test parsing XML returned by ESummary from the Structure database ... ok
Test parsing XML returned by ESummary from the Taxonomy database ... ok
Test parsing XML returned by ESummary from the UniSTS database ... ok
Test parsing XML returned by ESummary with incorrect arguments ... ERROR
Test parsing cancerchromosomes links returned by ELink ... ok
Test parsing medline indexed articles returned by ELink ... ok
Test parsing Nucleotide to Protein links returned by ELink ... ok
Test parsing pubmed links returned by ELink (first test) ... ok
Test parsing pubmed links returned by ELink (second test) ... ok
Test parsing pubmed link returned by ELink (third test) ... ok
Test parsing pubmed links returned by ELink (fourth test) ... ok
Test parsing pubmed links returned by ELink (fifth test) ... ok
Test parsing pubmed links returned by ELink (sixth test) ... ok
Test parsing XML returned by EFetch, Journals database ... ok
Test parsing XML returned by EFetch, Nucleotide database (first test) ... ok
Test parsing XML returned by EFetch, Protein database ... ok
Test parsing XML returned by EFetch, OMIM database ... ok
Test parsing XML returned by EFetch, PubMed database (first test) ... ok
Test parsing XML returned by EFetch, PubMed database (second test) ... ok
Test parsing XML returned by EFetch, Taxonomy database ... ok
Test parsing XML output returned by EGQuery (first test) ... ok
Test parsing XML output returned by EGQuery (second test) ... ok
Test parsing XML output returned by ESpell ... ok

======================================================================
ERROR: Test parsing XML returned by EPost with incorrect arguments
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Entrez.py", line 560, in t_wrong
    assert exception.message=="Wrong DB name"
AttributeError: RuntimeError instance has no attribute 'message'

======================================================================
ERROR: Test parsing XML returned by ESummary with incorrect arguments
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Entrez.py", line 943, in t_wrong
    assert exception.message=="Neither query_key nor id specified"
AttributeError: RuntimeError instance has no attribute 'message'

----------------------------------------------------------------------
Ran 40 tests in 0.471s

FAILED (errors=2)


From biopython at maubp.freeserve.co.uk  Fri Jun 20 09:31:21 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 20 Jun 2008 10:31:21 +0100
Subject: [Biopython-dev] test_Entrez.py fails on Linux?
In-Reply-To: <320fb6e00806191408t45a45da8hda0c2fc8a39aae57@mail.gmail.com>
References: <320fb6e00806191408t45a45da8hda0c2fc8a39aae57@mail.gmail.com>
Message-ID: <320fb6e00806200231y716c5a1ds2495f16a56a15f88@mail.gmail.com>

> Hi Michiel,
>
> I've just tried the unit tests on a clean checkout on Linux, and there
> is a problem with test_Entrez.py (shown below).  I'm pretty sure it
> was working for me on Mac OS X this afternoon, so this may be platform
> specific.  I haven't been using Biopython on Windows recently so I don't
> know if that is working or not.

I've just checked, and on a clean CVS checkout under Mac OS 10.5
Leopard with python 2.5.2, test_Entrez.py passes.

A clean check out last night on 64bit Ubuntu Dapper Drake with python
2.4.3 failed.

So whatever is going wrong is probably OS specific or perhaps python
version specific.
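
One Python-version difference that would fit the traceback: the message
attribute on exceptions was introduced in Python 2.5 (PEP 352), so it does not
exist under 2.4.3. A sketch of a check that avoids the attribute, assuming the
test only needs the error text (the helper name is made up, and the block uses
current Python syntax):

```python
def error_text(exception):
    """Return the text of an exception without using .message."""
    return str(exception)


try:
    raise RuntimeError("Wrong DB name")
except RuntimeError as exception:
    text = error_text(exception)
```

Comparing str(exception) with the expected string should behave the same on
both Python versions mentioned above.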

Peter


From bugzilla-daemon at portal.open-bio.org  Fri Jun 20 10:07:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Jun 2008 06:07:59 -0400
Subject: [Biopython-dev] [Bug 2524] New: Handle missing libraries like
	TextTools in run_tests.py
Message-ID: <bug-2524-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2524

           Summary: Handle missing libraries like TextTools in run_tests.py
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Documentation
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


Once upon a time, we treated any ImportError from a unit test as a reason to
skip the test gracefully, as these are *usually* from missing external
dependencies.  This could hide real errors if we had (re)moved a Biopython
module.

We now use the Bio.MissingExternalDependencyError exception, and the unit tests
themselves will raise this for missing command line tools or certain optional
libraries like MySQLdb.

However, the Bio.MissingExternalDependencyError exception does not get raised
when the following commonly used external dependencies are missing:

import TextTools
import Numeric
import reportlab

It is now possible to install Biopython without TextTools and reportlab (and
Numeric?), and make use of a lot of its functionality - but the unit tests give
nasty error messages.

I propose we either:

(a) Add a special case to run_tests.py to catch specific ImportError cases and
skip the test with a suitable message (patch to follow).  Specifically
TextTools, reportlab and Numeric - but potentially other third party libraries
like MySQLdb could be handled too.  This keeps the individual unit tests
simple.

or:

(b) Modify all the tests using these semi-optional libraries to catch the
ImportError and raise MissingExternalDependencyError instead.  As the tests
themselves generally don't directly import the external library this is perhaps
messy.
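
A minimal sketch of what option (b) might look like inside a test, under
stated assumptions: the exception class below is a local stand-in for
Bio.MissingExternalDependencyError, and require_module() is a hypothetical
helper, not something that exists in the test suite.

```python
import importlib


class MissingExternalDependencyError(Exception):
    """Local stand-in for Bio.MissingExternalDependencyError (assumption)."""


def require_module(name):
    """Import an optional library, or signal that the test should be skipped.

    Hypothetical helper: it converts the ImportError for a missing optional
    library into the exception that the test runner treats as "skip".
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        raise MissingExternalDependencyError(
            "Install %s if you want to run this test." % name)
```

One caveat with this approach: an ImportError raised deeper inside the
imported library is converted as well, so a genuinely broken import could
still be reported as a skip.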




From bugzilla-daemon at portal.open-bio.org  Fri Jun 20 10:09:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Jun 2008 06:09:37 -0400
Subject: [Biopython-dev] [Bug 2524] Handle missing libraries like TextTools
	in run_tests.py
In-Reply-To: <bug-2524-42@http.bugzilla.open-bio.org/>
Message-ID: <200806201009.m5KA9b98019988@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2524


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-20 06:09 EST -------
Created an attachment (id=948)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=948&action=view)
Patch to Tests/run_tests.py

Adds a hard coded list of known import errors to be treated as missing external
dependencies (i.e. skip the test).

This is implemented as a dict allowing a URL to be given.
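
The dict-based approach described above could be sketched roughly as
follows. This is not the attached patch: KNOWN_OPTIONAL, missing_module()
and run_one_test() are hypothetical names, and the hint strings are
placeholders where the real patch would presumably carry URLs.

```python
import importlib


# Hypothetical table: top-level module name -> where to get it.
# The module names come from the bug report; the hints are placeholders.
KNOWN_OPTIONAL = {
    "TextTools": "mxTextTools (placeholder for a URL)",
    "Numeric": "Numeric (placeholder for a URL)",
    "reportlab": "ReportLab (placeholder for a URL)",
}


def missing_module(error):
    """Return the top-level name of the module an ImportError refers to."""
    name = getattr(error, "name", None)  # set by newer Python versions
    if not name:
        # Fall back to the message text, e.g. "No module named TextTools".
        words = str(error).split()
        name = words[-1].strip("'\"") if words else ""
    return name.split(".")[0]


def run_one_test(test_name, importer=importlib.import_module):
    """Import a test module; return a skip message for known missing libraries."""
    try:
        importer(test_name)
    except ImportError as error:
        library = missing_module(error)
        if library in KNOWN_OPTIONAL:
            return "Skipping %s: requires %s" % (test_name,
                                                 KNOWN_OPTIONAL[library])
        raise  # anything else may be a real bug, e.g. a (re)moved Bio module
    return "ok"
```

Because only the listed names are treated as skips, an ImportError for a
Biopython module that has been moved or removed still surfaces as a failure.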




From bugzilla-daemon at portal.open-bio.org  Fri Jun 20 10:16:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Jun 2008 06:16:49 -0400
Subject: [Biopython-dev] [Bug 2525] New: The unit tests GUI run_tests.py
	does not track skipped tests
Message-ID: <bug-2525-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2525

           Summary: The unit tests GUI run_tests.py does not track skipped
                    tests
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


Running run_tests.py without the --no-gui command line option counts any
skipped tests as passed (green).  Furthermore, the skipped message is just
printed to the command line (if run from a terminal).

Ideally the test framework would report these skipped tests in the GUI, perhaps
even with a clickable entry (like the failures) to show the message.

[On a personal note, I never use the run_tests.py GUI, and would rather it was
not the default.  If no one likes it, we could just remove the GUI]




From bugzilla-daemon at portal.open-bio.org  Fri Jun 20 12:17:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Jun 2008 08:17:15 -0400
Subject: [Biopython-dev] [Bug 2525] The unit tests GUI run_tests.py does not
	track skipped tests
In-Reply-To: <bug-2525-42@http.bugzilla.open-bio.org/>
Message-ID: <200806201217.m5KCHFoF025054@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2525


------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp  2008-06-20 08:17 EST -------
> [On a personal note, I never use the run_tests.py GUI, and would rather it was
> not the default.  If no one likes it, we could just remove the GUI]
> 
Personally, I don't see the advantage of the GUI, and I can live without it.




From mjldehoon at yahoo.com  Fri Jun 20 12:14:30 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 20 Jun 2008 05:14:30 -0700 (PDT)
Subject: [Biopython-dev] test_Entrez.py fails on Linux?
In-Reply-To: <320fb6e00806200231y716c5a1ds2495f16a56a15f88@mail.gmail.com>
Message-ID: <795994.35527.qm@web62408.mail.re1.yahoo.com>

Hi Peter,

Thanks for letting me know.

It turned out that there were two problems with older Python versions (2.3 and 2.4).
One issue was not in Bio.Entrez but in the test script itself, using a feature that is only available in Python 2.5. This is now fixed in CVS.
The second issue is with Python 2.3: It does not copy data files to the build directory. Then, when you run "python run_tests.py test_Entrez.py" you will get many error messages about missing DTD files. If you run "python test_entrez.py" instead, the tests are done from the installed Biopython instead of the one in the build directory, and then no errors occur.
I guess the only way to solve this is to modify run_tests.py to skip test_Entrez if Python is version 2.3. Unless somebody else has a better suggestion, I will do that.
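
A version check of the kind suggested above might look like this minimal
sketch. The function name should_skip() is hypothetical, and how
run_tests.py would actually call it is not shown here.

```python
import sys


def should_skip(test_name, version_info=sys.version_info):
    """Return True for test_Entrez when running under Python 2.3."""
    return test_name == "test_Entrez" and tuple(version_info[:2]) == (2, 3)
```

Passing the version tuple as a parameter keeps the check easy to try out
without needing an old interpreter.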

--Michiel.

Peter <biopython at maubp.freeserve.co.uk> wrote:
> > Hi Michiel,
> >
> > I've just tried the unit tests on a clean checkout on Linux, and there
> > is a problem with test_Entrez.py (shown below).  I'm pretty sure it
> > was working for me on Mac OS X this afternoon, so this may be platform
> > specific.  I haven't been using Biopython on Windows recently so I don't
> > know if that is working or not.
>
> I've just checked, and on a clean CVS checkout under Mac OS 10.5
> Leopard with python 2.5.2, test_Entrez.py passes.
>
> A clean check out last night on 64bit Ubuntu Dapper Drake with python
> 2.4.3 failed.
>
> So whatever is going wrong is probably OS specific or perhaps python
> version specific.
>
> Peter


From biopython at maubp.freeserve.co.uk  Fri Jun 20 12:43:55 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 20 Jun 2008 13:43:55 +0100
Subject: [Biopython-dev] test_Entrez.py fails on Linux?
In-Reply-To: <795994.35527.qm@web62408.mail.re1.yahoo.com>
References: <320fb6e00806200231y716c5a1ds2495f16a56a15f88@mail.gmail.com>
	<795994.35527.qm@web62408.mail.re1.yahoo.com>
Message-ID: <320fb6e00806200543u62d385fcka3aa9026986549ba@mail.gmail.com>

On Fri, Jun 20, 2008 at 1:14 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Peter,
>
> Thanks for letting me know.
>
> It turned out that there were two problems with older Python versions (2.3 and 2.4).
> One issue was not in Bio.Entrez but in the test script itself, using a
> feature that is only available in Python 2.5. This is now fixed in CVS.

Good work.

> The second issue is with Python 2.3: It does not copy data files to the
> build directory. Then, when you run "python run_tests.py test_Entrez.py"
> you will get many error messages about missing DTD files. If you run
> "python test_entrez.py" instead, the tests are done from the installed
> Biopython instead of the one in the build directory, and then no errors occur.

I had suspected there was something like this happening on my Windows
machine (which is on python 2.3) but at the time you were still busy
updating the code so I didn't worry about it.

This issue with non-python files in the build directory reminds me of
something Tiago found with his Population Genetics work.  I'd have to
go over the old emails to double check.

> I guess the only way to solve this is to modify run_tests.py to skip
> test_Entrez if Python is version 2.3. Unless somebody else has a better
> suggestion, I will do that.

We could modify setup.py under python 2.3 to make sure these files are
copied.  Is this related to the (reverted) package_data change you
tried recently?

Peter


From biopython at maubp.freeserve.co.uk  Fri Jun 20 13:23:21 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 20 Jun 2008 14:23:21 +0100
Subject: [Biopython-dev] test_Entrez.py fails on Linux?
In-Reply-To: <320fb6e00806200543u62d385fcka3aa9026986549ba@mail.gmail.com>
References: <320fb6e00806200231y716c5a1ds2495f16a56a15f88@mail.gmail.com>
	<795994.35527.qm@web62408.mail.re1.yahoo.com>
	<320fb6e00806200543u62d385fcka3aa9026986549ba@mail.gmail.com>
Message-ID: <320fb6e00806200623n2148b735t1071aa40b0f24a7c@mail.gmail.com>

>> The second issue is with Python 2.3: It does not copy data files to the
>> build directory. Then, when you run "python run_tests.py test_Entrez.py"
>> you will get many error messages about missing DTD files. If you run
>> "python test_entrez.py" instead, the tests are done from the installed
>> Biopython instead of the one in the build directory, and then no errors occur.
>
> ...
>
> This issue with non-python files in the build directory reminds me of
> something Tiago found with his Population Genetics work.  I'd have to
> go over the old emails to double check.

I was thinking of bug 2375, where Tiago had to add a workaround for
data files not present in the build directory.
http://bugzilla.open-bio.org/show_bug.cgi?id=2375

Peter


From biopython at maubp.freeserve.co.uk  Fri Jun 20 14:42:57 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 20 Jun 2008 15:42:57 +0100
Subject: [Biopython-dev] SeqRecord to file format as string
In-Reply-To: <4D53AB82-F673-4F4F-BCEC-BA06088E8721@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
	<47147341.4020708@maubp.freeserve.co.uk>
	<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
	<4714EB8E.3000700@maubp.freeserve.co.uk>
	<6243BAA9F5E0D24DA41B27997D1FD14402B63C@mail2.exch.c2b2.columbia.edu>
	<320fb6e00806180700k327e6913m7ba9c4bdc3421f67@mail.gmail.com>
	<4D53AB82-F673-4F4F-BCEC-BA06088E8721@northwestern.edu>
Message-ID: <320fb6e00806200742w7e9e57dbt8d0d3362573cf9a@mail.gmail.com>

On Wed, Jun 18, 2008 at 4:16 PM, Jared Flatow <jflatow at northwestern.edu> wrote:
> However, py3k and 2.6 will make available the functionality described in PEP
> 3101:
>
> http://www.python.org/dev/peps/pep-3101/
>
> I think it would be best to define some semantics that are compatible with
> this PEP.

That is interesting - the PEP has been accepted, but I guess we should
wait and see exactly what python 2.6 and 3.0 end up using before
trying to integrate this into the SeqRecord.

> In short, I think creating methods to return formatted versions of objects
> (SeqRecords) is a good idea, but most especially if it is done in a way
> consistent with the language's vision.

That does sound wise - but I'm a little hazy on how exactly PEP-3101
will work in practice for generic complex objects.

Peter


From bugzilla-daemon at portal.open-bio.org  Fri Jun 20 15:01:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Jun 2008 11:01:17 -0400
Subject: [Biopython-dev] [Bug 2526] New: SeqFeature's .id property is not
	preserved in BioSQL
Message-ID: <bug-2526-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2526

           Summary: SeqFeature's .id property is not preserved in BioSQL
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


As per the title, a SeqFeature's .id property is not preserved after a
save/retrieve in BioSQL.

I found this while working on Bug 2235, where my modified "swiss" parser
creates SeqRecord objects with SeqFeature objects which may have their .id set.
Note that in GenBank and EMBL, the SeqFeature objects do not have their id
property set, and so are not affected.

I need to review the BioSQL schema to see if there is a suitable field that
Biopython is ignoring, and if there is, use it.  If not, we can probably use a
tagged qualifier - ideally with the same name as the other Bio* projects.

See also test_BioSQL_SeqIO.py revision 1.17 which includes a workaround to
avoid this limitation.




From jflatow at northwestern.edu  Fri Jun 20 16:16:10 2008
From: jflatow at northwestern.edu (Jared Flatow)
Date: Fri, 20 Jun 2008 11:16:10 -0500
Subject: [Biopython-dev] SeqRecord to file format as string
In-Reply-To: <320fb6e00806200742w7e9e57dbt8d0d3362573cf9a@mail.gmail.com>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
	<47147341.4020708@maubp.freeserve.co.uk>
	<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
	<4714EB8E.3000700@maubp.freeserve.co.uk>
	<6243BAA9F5E0D24DA41B27997D1FD14402B63C@mail2.exch.c2b2.columbia.edu>
	<320fb6e00806180700k327e6913m7ba9c4bdc3421f67@mail.gmail.com>
	<4D53AB82-F673-4F4F-BCEC-BA06088E8721@northwestern.edu>
	<320fb6e00806200742w7e9e57dbt8d0d3362573cf9a@mail.gmail.com>
Message-ID: <0FB6DD30-426C-43F3-BEBE-1728FA1E9D79@northwestern.edu>

On Jun 20, 2008, at 9:42 AM, Peter wrote:

> On Wed, Jun 18, 2008 at 4:16 PM, Jared Flatow <jflatow at northwestern.edu 
> > wrote:
>> However, py3k and 2.6 will make available the functionality  
>> described in PEP
>> 3101:
>>
>> http://www.python.org/dev/peps/pep-3101/
>>
>> I think it would be best to define some semantics that are  
>> compatible with
>> this PEP.
>
> That is interesting - the PEP has been accepted, but I guess we should
> wait and see exactly what python 2.6 and 3.0 end up using before
> trying to integrate this into the SeqRecord.

I agree, there's a couple of things that may still change, but the  
betas for 2.6 and 3.0 are out and that PEP has been around a while so  
I would say it's pretty much stable. At least as far as how the  
general mechanism will work, I don't believe that is likely to change.

>> In short, I think creating methods to return formatted versions of  
>> objects
>> (SeqRecords) is a good idea, but most especially if it is done in a  
>> way
>> consistent with the language's vision.
>
> That does sound wise - but I'm a little hazy on how exactly PEP-3101
> will work in practice for generic complex objects.

Yes I had to read it a few times through to understand how exactly it  
will work, here is what I know:

All objects now get the __format__ method which has a signature like  
this:

def __format__(self, format_spec):
	# return a formatted string

The format_spec (format specifier) can be defined by the object, so  
essentially it's totally customizable (if you want to do really crazy  
things there is a Formatter that can be messed with, but we should and  
can avoid this). This object method works like other customizable  
python methods, and there's a corresponding builtin, so calling  
format(obj, "the format specifier") will simply call  
obj.__format__("the format specifier"). Thus we can define the  
format_spec for a SeqRecord to differentiate between FASTA and  
whatever other formats we want to define.

The string class is also getting a .format method which just calls  
the .__format__ method in an OO way instead of using the builtin. We  
can do the same thing, and it seems like most use cases will be to  
call seq_rec.format('fasta'). All of this works in every python version;  
only the builtin form, format(seq_rec, 'fasta'), needs 2.6 or 3.0.

Besides the builtin format, we gain the ability to embed the format  
within other strings. So, using the implementation you provided  
earlier which just returns the underlying Seq as a string if no format  
is specified, we might define the __format__ method like this:

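def __format__(self, format_spec=None):
    if format_spec:
        # Write this record in the requested file format to an
        # in-memory handle and return the resulting text.
        from StringIO import StringIO
        from Bio import SeqIO
        handle = StringIO()
        SeqIO.write([self], handle, format_spec)
        return handle.getvalue()
    # No format given: fall back to the plain sequence string.
    return str(self)

def __str__(self):
    return str(self.seq)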

Now that means I can also embed this in formatted strings, like so:

"this is my sequence: {0}".format(seq_rec)

Or:

"this is my sequence in fasta format: {0:fasta}".format(seq_rec)

All in all, it's pretty much what you'd expect (and the same as what  
you had before). There are only a few small benefits we get for doing it  
this way (right now), but I don't think we can go wrong using the  
__format__ method like it was meant to be used, and who knows what  
future use cases this may simplify.
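
To make the mechanism concrete without depending on Bio.SeqIO, here is a
self-contained sketch. MiniRecord is a hypothetical stand-in for SeqRecord,
and the hand-written FASTA output is only illustrative; the real method
would presumably delegate to SeqIO.write as in the code above.

```python
class MiniRecord(object):
    """Hypothetical stand-in for SeqRecord, just enough to show __format__."""

    def __init__(self, identifier, seq, description=""):
        self.id = identifier
        self.seq = seq
        self.description = description

    def __str__(self):
        return str(self.seq)

    def format(self, format_spec):
        """Return the record as a string in the named file format."""
        if format_spec == "fasta":
            title = ("%s %s" % (self.id, self.description)).strip()
            return ">%s\n%s\n" % (title, self.seq)
        raise ValueError("Unknown format: %r" % format_spec)

    def __format__(self, format_spec):
        # An empty spec means "no format requested": plain sequence string.
        if format_spec:
            return self.format(format_spec)
        return str(self)


record = MiniRecord("demo1", "ACGT", "made-up example")
plain = "this is my sequence: {0}".format(record)
fasta = "{0:fasta}".format(record)
```

Keeping a plain .format() method next to __format__ means the same code
path serves older interpreters, which lack the format() builtin and
str.format().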

jared


From bugzilla-daemon at portal.open-bio.org  Sat Jun 21 04:19:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Jun 2008 00:19:59 -0400
Subject: [Biopython-dev] [Bug 2375] Coalescent support through Simcoal2
In-Reply-To: <bug-2375-42@http.bugzilla.open-bio.org/>
Message-ID: <200806210419.m5L4JxfJ001994@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2375


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


------- Comment #22 from mdehoon at ims.u-tokyo.ac.jp  2008-06-21 00:19 EST -------
(In reply to comment #15)
> The solution in Bio/PopGen/SimCoal/__init__.py to find builtin_tpl_dir is not
> so beautiful, but on the other hand I don't see a better way to do it.

I ran into the same problem with Bio/Entrez, which needs a bunch of DTD files
in Bio/Entrez/DTDs/. The attached patch to setup.py modifies the build command
such that the data files are copied to the build directory when running "python
setup.py build". This solves the problem with Bio.Entrez, and should also solve
the problem with Bio/PopGen/SimCoal without using the workaround in
Bio/PopGen/SimCoal/__init__.py. Can you guys try this patch on the platforms
and python versions you have access to? Just to make sure I didn't miss
anything before committing to CVS.
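
Independently of how the attached patch hooks into the build command, the
copying step itself can be sketched as below. This is an illustration
only: copy_data_files() is a hypothetical helper, the "*.dtd" pattern is
an assumption about the file names, and nothing here uses distutils.

```python
import fnmatch
import os
import shutil


def copy_data_files(source_dir, build_dir, pattern="*.dtd"):
    """Copy matching data files into build_dir, keeping the sub-directories.

    Returns the sorted list of copied paths, relative to source_dir.
    """
    copied = []
    for root, _dirs, files in os.walk(source_dir):
        for name in fnmatch.filter(files, pattern):
            source = os.path.join(root, name)
            relative = os.path.relpath(source, source_dir)
            target = os.path.join(build_dir, relative)
            target_dir = os.path.dirname(target)
            if not os.path.isdir(target_dir):
                os.makedirs(target_dir)
            shutil.copy(source, target)
            copied.append(relative)
    return sorted(copied)
```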

Recently there have been quite a lot of updates to CVS, so you may need to
start from a fresh CVS checkout.




From bugzilla-daemon at portal.open-bio.org  Sat Jun 21 04:21:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Jun 2008 00:21:13 -0400
Subject: [Biopython-dev] [Bug 2375] Coalescent support through Simcoal2
In-Reply-To: <bug-2375-42@http.bugzilla.open-bio.org/>
Message-ID: <200806210421.m5L4LDPg002064@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2375


------- Comment #23 from mdehoon at ims.u-tokyo.ac.jp  2008-06-21 00:21 EST -------
Created an attachment (id=950)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=950&action=view)
Patch to setup.py




From mjldehoon at yahoo.com  Sat Jun 21 05:11:18 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 20 Jun 2008 22:11:18 -0700 (PDT)
Subject: [Biopython-dev] Bio.SCOP
Message-ID: <251322.99482.qm@web62401.mail.re1.yahoo.com>


Bio.SCOP is one of the modules affected by Bug 2454
(http://bugzilla.open-bio.org/show_bug.cgi?id=2454),
which is basically about how Biopython uses file handles.

Bio.SCOP contains parsers for several file
formats used by SCOP. I am using Bio.SCOP.Hie
as an example here, but the same applies to
the other parsers.

The Bio.SCOP parsers define a Parser and an Iterator
class (similar to other older Biopython parsers).
Typical usage is as follows:

>>> from Bio.SCOP import Hie
>>> handle = open("mydatafile.txt")
>>> parser = Hie.Parser()
>>> records = Hie.Iterator(handle, parser)
>>> for record in records:
...     # record is an instance of Bio.SCOP.Hie.Record

Now, in the SCOP file format, each record is on one
line in the data file. So we don't need the Iterator:
>>> from Bio.SCOP import Hie
>>> handle = open("mydatafile.txt")
>>> parser = Hie.Parser()
>>> for line in handle:
...     record = parser.parse(line)
...     # record is an instance of Bio.SCOP.Hie.Record

This solves Bug #2454 (which occurs in the Iterator
class), and is more general than the Iterator class
(e.g., now we can parse a list of lines).

To take this one step further, the Parser class is not
really needed either. Although Parser is a class, we
are not using the functionality of a class (no
inheritance, and the object self is never used). In
essence, the parse() function inside the Parser class
may as well live outside of it.

There are several ways to simplify this module; each
of them essentially amounts to moving the parse()
function:

1) Move the parse() function to the Record class initializer:
>>> from Bio.SCOP import Hie
>>> handle = open("mydatafile.txt")
>>> for line in handle:
...     record = Hie.Record(line)
...     # record is an instance of Bio.SCOP.Hie.Record

2) Move the parse() function outside of the Parser class,
and rename it read() for consistency with other Biopython
parsers:
>>> from Bio.SCOP import Hie
>>> handle = open("mydatafile.txt")
>>> while True:
...     record = Hie.read(handle)
...     if not record: break
...     # record is an instance of Bio.SCOP.Hie.Record

3) Move the parse() function outside of the Parser class,
and use it as a generator function:
>>> from Bio.SCOP import Hie
>>> handle = open("mydatafile.txt")
>>> records = Hie.parse(handle)
>>> for record in records:
...     # record is an instance of Bio.SCOP.Hie.Record
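
For comparison, options 1 and 3 together could be sketched as follows,
without relying on the current Bio.SCOP code. The Record class here is a
hypothetical stand-in, and the three-column tab-separated layout
(identifier, parent, comma-separated children, with "-" for none) is an
assumption made for the example.

```python
class Record(object):
    """Hypothetical stand-in for Bio.SCOP.Hie.Record."""

    def __init__(self, line):
        # Assumed layout: identifier, parent, comma-separated children,
        # separated by tabs, with "-" meaning "none".
        columns = line.rstrip().split("\t")
        if len(columns) != 3:
            raise ValueError("Expected 3 tab-separated columns: %r" % line)
        sunid, parent, children = columns
        self.sunid = sunid
        self.parent = None if parent == "-" else parent
        self.children = [] if children == "-" else children.split(",")


def parse(handle):
    """Yield one Record per non-blank, non-comment line of any iterable."""
    for line in handle:
        if line.strip() and not line.startswith("#"):
            yield Record(line)


# Any iterable of lines works: an open file, a StringIO, or a plain list.
lines = ["# made-up data\n", "0\t-\t1,2\n", "1\t0\t-\n"]
records = list(parse(lines))
```

Since parse() only iterates over its argument, it never calls
file-specific methods on the handle, which is the kind of problem
Bug 2454 is about.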


Comments, suggestions, preferences?

--Michiel.


From bugzilla-daemon at portal.open-bio.org  Sat Jun 21 11:31:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Jun 2008 07:31:14 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806211131.m5LBVEWb019981@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #17 from mdehoon at ims.u-tokyo.ac.jp  2008-06-21 07:31 EST -------
I added a DeprecationWarning to Bio.Rebase.
Next on the to-do list is Bio.SCOP.




From mjldehoon at yahoo.com  Sat Jun 21 11:36:43 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 21 Jun 2008 04:36:43 -0700 (PDT)
Subject: [Biopython-dev] [BioPython]  Bio.CDD, anyone?
In-Reply-To: <485A70B0.1010202@gmail.com>
Message-ID: <195444.96577.qm@web62403.mail.re1.yahoo.com>

As far as I can tell, the test files were created by saving the HTML source code from the CDD web site to a file. As the CDD web site has changed its HTML in the meantime, we cannot reproduce the HTML files used by the Bio.CDD tests.

Unless somebody objects in the next couple of days, I'll add a DeprecationWarning to Bio.CDD.

--Michiel.

Bruce Southey <bsouthey at gmail.com> wrote:
> Hi,
> Do you know how the test files were created? If there is not an easy
> answer then it makes the decision easier.
>
> Anyhow, I vote to remove this module as, in addition to the things
> previously mentioned, it would be far better to support interproscan
> (http://www.ebi.ac.uk/Tools/InterProScan/ ) than just a single tool.
>
> Bruce


From bugzilla-daemon at portal.open-bio.org  Sun Jun 22 04:51:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 22 Jun 2008 00:51:58 -0400
Subject: [Biopython-dev] [Bug 2527] New: Bug in NCBIXML.py in
	_end_BlastOutput_version()
Message-ID: <bug-2527-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2527

           Summary: Bug in NCBIXML.py in _end_BlastOutput_version()
           Product: Biopython
           Version: 1.45
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: cdputnam at ucsd.edu


biopython version is from Fedora distribution:
python-biopython-1.45-1.fc7

For a recently run NCBIWWW Blast (following the tutorial at
http://biopython.org/DIST/docs/tutorial/Tutorial.html), I
ran into a problem in parsing by _end_BlastOutput_version
with the version information:

<BlastOutput_version>BLASTP 2.2.18+</BlastOutput_version>


Traceback (most recent call last):
  File "blast2.py", line 7, in <module>
    for blast_record in blast_records:
  File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 577, in
parse
    expat_parser.Parse(text, False)
  File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 98, in
endElement
    eval("self.%s()" % method)
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 216, in
_end_BlastOutput_version
    self._header.date = self._value.split()[2][1:-1]
IndexError: list index out of range

I've worked around this bug for now by commenting out the
offending line and setting the date to an empty string:

    def _end_BlastOutput_version(self):
        """version number of the BLAST engine (e.g., 2.1.2)

        Save this to put on each blast record object
        """
        self._header.version = self._value.split()[1]
        # self._header.date = self._value.split()[2][1:-1]
        self._header.date = ''
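
A slightly more defensive variant of the same idea, shown as a standalone
sketch rather than as the fix in CVS: parse_blast_version() is a
hypothetical helper, and the bracketed-date string used in the second
check is an illustrative example of the older style, not copied from real
output.

```python
def parse_blast_version(value):
    """Split a BlastOutput_version string into (version, date).

    The date is returned as an empty string when the third word is absent,
    as in "BLASTP 2.2.18+".
    """
    words = value.split()
    version = words[1] if len(words) > 1 else ""
    # Older output carries a third word in brackets; strip the brackets.
    date = words[2][1:-1] if len(words) > 2 else ""
    return version, date


new_style = parse_blast_version("BLASTP 2.2.18+")
old_style = parse_blast_version("BLASTP 2.2.15 [Oct-15-2006]")
```

This keeps the date when it is present instead of always discarding it.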




From bugzilla-daemon at portal.open-bio.org  Sun Jun 22 04:52:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 22 Jun 2008 00:52:45 -0400
Subject: [Biopython-dev] [Bug 2527] Bug in NCBIXML.py in
	_end_BlastOutput_version()
In-Reply-To: <bug-2527-42@http.bugzilla.open-bio.org/>
Message-ID: <200806220452.m5M4qjiE029058@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2527


cdputnam at ucsd.edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |cdputnam at ucsd.edu




From bugzilla-daemon at portal.open-bio.org  Sun Jun 22 05:52:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 22 Jun 2008 01:52:05 -0400
Subject: [Biopython-dev] [Bug 2527] Bug in NCBIXML.py in
	_end_BlastOutput_version()
In-Reply-To: <bug-2527-42@http.bugzilla.open-bio.org/>
Message-ID: <200806220552.m5M5q5rQ031580@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2527


------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp  2008-06-22 01:52 EST -------
I believe that this is already fixed in CVS.
Could you try the latest version of Bio/Blast/NCBIXML.py, available at
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/?cvsroot=biopython
and let us know if it fixes the bug?




From bugzilla-daemon at portal.open-bio.org  Mon Jun 23 10:54:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 23 Jun 2008 06:54:22 -0400
Subject: [Biopython-dev] [Bug 2528] New: NCBIStandalone.blastall(): Replace
	os.popen3 with subprocess.Popen
Message-ID: <bug-2528-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2528

           Summary: NCBIStandalone.blastall(): Replace os.popen3 with
                    subprocess.Popen
           Product: Biopython
           Version: 1.45
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: mmokrejs at ribosome.natur.cuni.cz


I already mentioned this on the email list a few weeks ago ... NCBI Blast
2.2.18 (and, as far as I remember, earlier versions too) does not flush its
output buffers when run under mod_python-3.3.11/apache-2.2.8.

I tried flushing the buffers and disabling buffering, but neither helps. In the
end, a working solution is to move to the subprocess module, which was
introduced in python 2.4 and is intended to replace os.system, os.spawn*,
os.popen* and other functions. With the following patch the user gets the blast
stdout back in his/her web browser. Somehow, one has to copy the data into
another variable and close the file descriptors used by the blastall binary.


Unfortunately, still a stale process can be seen in "ps -ef" output:
apache    5382  5323 47 12:31 ?        00:00:04 [blastall] <defunct>

But as I have said, at least the data is not buffered anymore.
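
For reference, a minimal subprocess-based sketch is shown below. It is not
the attached patch: run_command() is a hypothetical helper, and the command
used is a harmless stand-in for blastall. A "<defunct>" entry generally
means the parent has not yet waited on the finished child; communicate()
reads both pipes to the end and then waits, so it may help with that too.

```python
import subprocess
import sys


def run_command(arguments):
    """Run a child process and return (stdout, stderr, return code).

    communicate() reads both pipes to the end and then waits for the
    child, so the pipes are closed and the exit status is collected.
    """
    child = subprocess.Popen(arguments,
                             stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE,
                             universal_newlines=True)
    out, err = child.communicate()
    return out, err, child.returncode


# A harmless stand-in command; a real call would pass the blastall
# executable and its options as the argument list.
out, err, code = run_command([sys.executable, "-c", "print('hello')"])
```

Passing the arguments as a list rather than as one shell string also avoids
shell quoting problems with user-supplied values.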




From bugzilla-daemon at portal.open-bio.org  Mon Jun 23 10:55:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 23 Jun 2008 06:55:26 -0400
Subject: [Biopython-dev] [Bug 2528] NCBIStandalone.blastall(): Replace
	os.popen3 with subprocess.Popen
In-Reply-To: <bug-2528-42@http.bugzilla.open-bio.org/>
Message-ID: <200806231055.m5NAtQCC030683@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2528


------- Comment #1 from mmokrejs at ribosome.natur.cuni.cz  2008-06-23 06:55 EST -------
Created an attachment (id=951)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=951&action=view)
NCBIStandalone.py.patch




From bugzilla-daemon at portal.open-bio.org  Mon Jun 23 10:56:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 23 Jun 2008 06:56:00 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
	for '-F F' and make it safe
In-Reply-To: <bug-2508-42@http.bugzilla.open-bio.org/>
Message-ID: <200806231056.m5NAu0or030728@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508


------- Comment #5 from mmokrejs at ribosome.natur.cuni.cz  2008-06-23 06:56 EST -------
(In reply to comment #4)

Yes, the "filter" argument is not clear; please improve the docs in the sources
and on the web. Ideally, I would also propose renaming the argument.

Regarding the patch in comment #3, I think it should be stricter: the blast*
functions should only accept arguments explicitly listed in the function
definition, so no kwargs, etc. But it is a good start. In general, I would
propose a general wrapper function to be placed in front of _ALL_
popen3() calls and, in conjunction, replacing the popen3 calls with
subprocess.Popen. See Bug #2528 on NCBIStandalone.blastall(), where there is a
working example of this.
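
As a rough sketch of what such a strict wrapper could look like (the function
name, the allowed-program list and the low_complexity_filter argument name are
all hypothetical, not the patch from comment #3): only explicitly listed
arguments are accepted, and the command is built as an argv list, so it can be
handed to subprocess.Popen without going through a shell. The -p, -d, -i and -F
options are the usual blastall ones.

```python
ALLOWED_PROGRAMS = ("blastn", "blastp", "blastx", "tblastn", "tblastx")


def build_blastall_argv(blastcmd, program, database, infile,
                        low_complexity_filter=True):
    """Return an argv list for blastall; no **kwargs are accepted."""
    if program not in ALLOWED_PROGRAMS:
        raise ValueError("unsupported program: %r" % (program,))
    for value in (database, infile):
        # A value starting with "-" would be read as another option.
        if str(value).startswith("-"):
            raise ValueError("suspicious argument: %r" % (value,))
    return [
        blastcmd,
        "-p", program,
        "-d", str(database),
        "-i", str(infile),
        "-F", "T" if low_complexity_filter else "F",
    ]
```

The resulting list would be passed as the first argument to subprocess.Popen
with the default shell=False, so none of the values are interpreted by a shell.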




From bugzilla-daemon at portal.open-bio.org  Mon Jun 23 15:01:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 23 Jun 2008 11:01:17 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806231501.m5NF1Hth014356@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #18 from mdehoon at ims.u-tokyo.ac.jp  2008-06-23 11:01 EST -------
See the discussion on the mailing list:
http://lists.open-bio.org/pipermail/biopython-dev/2008-June/003819.html
for some ideas for Bio.SCOP.




From bugzilla-daemon at portal.open-bio.org  Mon Jun 23 15:16:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 23 Jun 2008 11:16:29 -0400
Subject: [Biopython-dev] [Bug 2527] Bug in NCBIXML.py in
	_end_BlastOutput_version()
In-Reply-To: <bug-2527-42@http.bugzilla.open-bio.org/>
Message-ID: <200806231516.m5NFGTgD015331@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2527


cdputnam at ucsd.edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from cdputnam at ucsd.edu  2008-06-23 11:16 EST -------
The latest NCBIXML.py does fix the problem with Blast version parsing.

Just so you know, I had to comment out two lines in
_end_Hsp_bit_score, similar to the version of the file I already had.
I'm guessing this is a version mismatch with some other file that
I didn't update (I only replaced NCBIXML.py).

The error was:

AttributeError: Description instance has no attribute 'bits'

And the commented version of the function is:

    def _end_Hsp_bit_score(self):
        """bit score of HSP
        """
        self._hsp.bits = float(self._value)
        #if self._descr.bits == None:
        #    self._descr.bits = float(self._value)

Thanks for your help.




From bugzilla-daemon at portal.open-bio.org  Tue Jun 24 09:38:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 24 Jun 2008 05:38:54 -0400
Subject: [Biopython-dev] [Bug 2528] NCBIStandalone.blastall(): Replace
	os.popen3 with subprocess.Popen
In-Reply-To: <bug-2528-42@http.bugzilla.open-bio.org/>
Message-ID: <200806240938.m5O9csKZ032756@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2528


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-24 05:38 EST -------
With this patch we have to wait for the sub-process to finish before we can
read its output.  This is a potential drawback as it delays the parsing.
Currently we should be able to parse the output iteratively as the queries are
processed.

Also, you are loading the entire output into memory (as a list of strings,
which you then turn into a StringIO handle).  This is potentially a very bad
idea, as in extreme cases Blast XML files can be GB in size.

I'm not keen on your solution, but I don't know what to suggest for your
original problem, running Blast under mod_python-3.3.11/apache-2.2.8. 

Two minor points: Do you think we can do anything better on Python 2.3?  Did
you intend something similar for blastpgp and rpsblast?
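
To make the two concerns above concrete, here is a minimal sketch of reading a
sub-process lazily, so that a parser can start before the process finishes and
the output is never held in memory as a whole. The function name
iter_stdout_lines is hypothetical, and whether this avoids the mod_python
buffering problem is untested.

```python
import subprocess
import sys


def iter_stdout_lines(argv):
    """Yield the child's stdout line by line, then reap the child."""
    child = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        # stderr is discarded here so that an unread pipe cannot fill up
        # and block the child; a real wrapper might redirect it to a file.
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    try:
        for line in child.stdout:
            yield line
    finally:
        # Runs when the output is exhausted or the generator is closed.
        child.stdout.close()
        child.wait()
```

A caller can loop over iter_stdout_lines(argv) directly; only one line at a
time is held by the generator itself.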




From biopython at maubp.freeserve.co.uk  Tue Jun 24 09:46:19 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 24 Jun 2008 10:46:19 +0100
Subject: [Biopython-dev] Bio.SCOP
In-Reply-To: <251322.99482.qm@web62401.mail.re1.yahoo.com>
References: <251322.99482.qm@web62401.mail.re1.yahoo.com>
Message-ID: <320fb6e00806240246u8afdb6fp51cd31000ebe3d9@mail.gmail.com>

On Sat, Jun 21, 2008 at 6:11 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Bio.SCOP contains parsers for several file
> formats used by SCOP. I am using Bio.SCOP.Hie
> as an example here, but the same applies to
> the other parsers.
>
> The Bio.SCOP parsers define a Parser and a Iterator
> class (similar to other older Biopython parsers).

I would deprecate the Parser and Iterator objects, and introduce a
parse(handle) function to iterate over a file (following our recent
convention) and perhaps a read() function too (taking a handle or a
single line?).
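
A minimal sketch of that convention, under stated assumptions: the Record class
below is a hypothetical stand-in that only keeps the tab-separated fields of a
line, and lines starting with "#" are assumed to be comments. It is not the
actual Bio.SCOP code.

```python
import io


class Record:
    """Hypothetical record: the tab-separated fields of one line."""

    def __init__(self, line):
        self.fields = line.rstrip("\n").split("\t")


def parse(handle):
    """Iterate over a handle, yielding one Record per data line."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        yield Record(line)


def read(handle):
    """Return the only Record in handle; fail unless there is exactly one."""
    iterator = parse(handle)
    try:
        record = next(iterator)
    except StopIteration:
        raise ValueError("no records found")
    for _ in iterator:
        raise ValueError("more than one record found")
    return record
```

Both functions accept any file-like object, for example an io.StringIO handle,
which is the point of moving away from the Parser and Iterator classes.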

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Jun 24 10:17:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 24 Jun 2008 06:17:41 -0400
Subject: [Biopython-dev] [Bug 2528] NCBIStandalone.blastall(): Replace
	os.popen3 with subprocess.Popen
In-Reply-To: <bug-2528-42@http.bugzilla.open-bio.org/>
Message-ID: <200806241017.m5OAHfdK002192@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2528


------- Comment #3 from mmokrejs at ribosome.natur.cuni.cz  2008-06-24 06:17 EST -------
Hi Peter,
 Well, I am not very happy with this either, and I do understand your points. I
will try to come up with another solution. It would be best to disable buffering
in popen3(), but I failed to get it working. I will give it some more thought
next week.




From bugzilla-daemon at portal.open-bio.org  Tue Jun 24 10:35:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 24 Jun 2008 06:35:50 -0400
Subject: [Biopython-dev] [Bug 2527] Bug in NCBIXML.py in
	_end_BlastOutput_version()
In-Reply-To: <bug-2527-42@http.bugzilla.open-bio.org/>
Message-ID: <200806241035.m5OAZo3p003784@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2527


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-24 06:35 EST -------
Regarding comment 2, I think you need to update Bio/Blast/Record.py as well.




From bugzilla-daemon at portal.open-bio.org  Tue Jun 24 10:36:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 24 Jun 2008 06:36:18 -0400
Subject: [Biopython-dev] [Bug 2528] NCBIStandalone.blastall(): Replace
	os.popen3 with subprocess.Popen
In-Reply-To: <bug-2528-42@http.bugzilla.open-bio.org/>
Message-ID: <200806241036.m5OAaIIt003857@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2528


------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp  2008-06-24 06:36 EST -------
Is there an easy way to replicate this issue?




From bugzilla-daemon at portal.open-bio.org  Tue Jun 24 11:30:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 24 Jun 2008 07:30:45 -0400
Subject: [Biopython-dev] [Bug 2527] Bug in NCBIXML.py in
	_end_BlastOutput_version()
In-Reply-To: <bug-2527-42@http.bugzilla.open-bio.org/>
Message-ID: <200806241130.m5OBUjYU007159@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2527


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |biopython-
                   |                            |bugzilla at maubp.freeserve.co.
                   |                            |uk


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-24 07:30 EST -------
P.S. This is a duplicate of Bug 2499




From bugzilla-daemon at portal.open-bio.org  Tue Jun 24 13:05:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 24 Jun 2008 09:05:46 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
	for '-F F' and make it safe
In-Reply-To: <bug-2508-42@http.bugzilla.open-bio.org/>
Message-ID: <200806241305.m5OD5jZa012413@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-24 09:05 EST -------
Checking in Tests/test_NCBIStandalone.py new revision: 1.14
Checking in Bio/Blast/NCBIStandalone.py new revision: 1.73

I've checked in my suggested patch, and tried to improve the filter
documentation by including the phrase "low complexity".  It might be worth
passing this suggestion on to the NCBI as their own command line tools just use
the term filter.




From mjldehoon at yahoo.com  Wed Jun 25 14:04:09 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 25 Jun 2008 07:04:09 -0700 (PDT)
Subject: [Biopython-dev] Bio.SCOP.FileIndex
Message-ID: <141582.2274.qm@web62413.mail.re1.yahoo.com>

Hi everybody,

When I was modifying Bio.SCOP, I noticed that Bio.SCOP.FileIndex is flawed if file reading is done via a buffer (which is often the case in Python).

Before we try to fix this, is anybody actually using Bio.SCOP.FileIndex?
If not, I think we should deprecate it instead of trying to fix it.
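
The failure mode is not spelled out here. One common way an offset index goes
wrong (an assumption, not a diagnosis of Bio.SCOP.FileIndex) is calling tell()
while iterating over the file, because iteration reads ahead into a buffer. The
sketch below avoids that by pairing tell() with readline() on a handle opened
in binary mode; the function name and key_func argument are hypothetical.

```python
import io


def build_offset_index(handle, key_func):
    """Map key_func(line) to the byte offset where that line starts.

    handle must be opened in binary mode and support tell().
    """
    index = {}
    while True:
        offset = handle.tell()    # position before the line is read
        line = handle.readline()  # readline() keeps tell() consistent
        if not line:
            break
        index[key_func(line)] = offset
    return index
```

A record can then be fetched with handle.seek(index[key]) followed by
handle.readline().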

--Michiel.


From bugzilla-daemon at portal.open-bio.org  Wed Jun 25 15:55:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 25 Jun 2008 11:55:58 -0400
Subject: [Biopython-dev] [Bug 2529] New: NCBI BLAST XML parser does not
	support the online blast version 2.2.18+
Message-ID: <bug-2529-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2529

           Summary: NCBI BLAST XML parser does not support the online blast
                    version 2.2.18+
           Product: Biopython
           Version: 1.45
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: lordnapi at gmail.com
         QAContact: lordnapi at gmail.com


Hello,
I have performed a blast search of the PDB database. I am having a problem
parsing the blast result on both Windows and Linux machines. The following four
lines of code give me the same error on both.  Thanks. Ahmet

>>> from Bio.Blast import NCBIWWW
>>> from Bio.Blast import NCBIXML
>>> results_handle = NCBIWWW.qblast( 'blastp', 'pdb', 'ASFPVEILPFLYLGCAKDSTNLDVLEEFGIKYILNVTPNLPNLFENAGEFKYKQIPISDHWSQNLSQ')
>>> blast_record = NCBIXML.parse( results_handle ).next()




From bugzilla-daemon at portal.open-bio.org  Wed Jun 25 16:09:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 25 Jun 2008 12:09:24 -0400
Subject: [Biopython-dev] [Bug 2528] NCBIStandalone.blastall(): Replace
	os.popen3 with subprocess.Popen
In-Reply-To: <bug-2528-42@http.bugzilla.open-bio.org/>
Message-ID: <200806251609.m5PG9OWX002384@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2528


------- Comment #5 from mmokrejs at ribosome.natur.cuni.cz  2008-06-25 12:09 EST -------
(In reply to comment #4)
> Is there an easy way to replicate this issue?
> 

I believe it is enough to run a blast search under mod_python and try to display
the results on the web; that's all I actually do. On the server the blastall
process did not flush its buffer, so if you attach to the running process with
the strace utility you will see that its last write() was of some line that is
not yet the last one of the output. The process hangs like this for ages, until
you do "kill -HUP $pid"; then it flushes the write buffer and exits
successfully. This happens with blast 2.2.18 at least.




From bugzilla-daemon at portal.open-bio.org  Wed Jun 25 16:24:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 25 Jun 2008 12:24:45 -0400
Subject: [Biopython-dev] [Bug 2529] NCBI BLAST XML parser does not support
	the online blast version 2.2.18+
In-Reply-To: <bug-2529-42@http.bugzilla.open-bio.org/>
Message-ID: <200806251624.m5PGOjgf003205@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2529


lordnapi at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WORKSFORME


------- Comment #1 from lordnapi at gmail.com  2008-06-25 12:24 EST -------
The problem was caused by not having a date in <BlastOutput_version>BLASTP
2.2.18+</BlastOutput_version> in the XML files. I fixed the problem for myself
by changing the _end_BlastOutput_version function in the Blast/NCBIXML.py file
to the following (starts at line 208). I still don't know if having the date is
important elsewhere.

def _end_BlastOutput_version(self):
    """version number of the BLAST engine (e.g., 2.1.2)
    Save this to put on each blast record object
    """
    self._valuesplit = self._value.split()
    self._header.version = self._valuesplit[1]
    if len(self._valuesplit) > 2 :
        self._header.date = self._value.split()[2][1:-1]
    else:
        self._header.date = ''
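
The same logic can be tried outside the parser class. The sketch below is a
standalone restatement of the method above; the name split_blast_version is
hypothetical, and the bracketed-date form is assumed to match what older BLAST
versions write.

```python
def split_blast_version(value):
    """Return (version, date) from a BlastOutput_version string."""
    parts = value.split()
    version = parts[1]
    if len(parts) > 2:
        date = parts[2][1:-1]  # drop the surrounding brackets
    else:
        date = ""              # 2.2.18+ output carries no date
    return version, date
```

With this, "BLASTP 2.2.18+" gives an empty date instead of an IndexError, while
a string such as "blastp 2.2.15 [Oct-15-2006]" still yields both parts.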




From mjldehoon at yahoo.com  Thu Jun 26 00:01:07 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 25 Jun 2008 17:01:07 -0700 (PDT)
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
Message-ID: <254082.68438.qm@web62401.mail.re1.yahoo.com>

Dear all,

Recently NCBI blocked access for a Biopython user who was making 50,000 requests to NCBI at a rate of 18 requests per second during peak hours. This user was using the search_for function in Bio.GenBank, which internally uses Bio.EUtils. Apparently, Bio.EUtils does not follow the 3 seconds sleep rule between requests. NCBI also asked us to send requests for the Entrez E-Utilities to the EUtils web address, and not to the regular NCBI web address. I don't know if Bio.EUtils does that.

Bio.Entrez does use the 3 seconds sleep rule, and the eight E-Utilities functions all make use of the EUtils web address, though it is possible to pass a different web address as one of the arguments. The "query" function, which is not part of the E-Utilities, does use the standard NCBI web address.

To avoid such problems in the future, I'd like to propose the following:
1) Deprecate Bio.EUtils. Its functionality is covered by Bio.Entrez, which (from release 1.46) will have a parser.
Bio.EUtils is currently used by the following modules: 
Bio/config/DBRegistry.py
Bio/dbdefs/fasta.py
Bio/dbdefs/genbank.py
Bio/dbdefs/medline.py
Bio/GenBank/__init__.py
We were already planning to remove Bio.config and Bio.dbdefs, so we'd only have to modify Bio.GenBank.

2) Remove the 'query' function from Bio.Entrez. Anyway accessing NCBI's web site from Python to get HTML back doesn't make a lot of sense.

3) Remove the argument for a user-specified web address to make sure that always the E-Utilities address is used.

--Michiel.


From dalke at dalkescientific.com  Thu Jun 26 01:52:07 2008
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 26 Jun 2008 03:52:07 +0200
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <254082.68438.qm@web62401.mail.re1.yahoo.com>
References: <254082.68438.qm@web62401.mail.re1.yahoo.com>
Message-ID: <635E5251-830F-409C-A2D4-10EA59FA5037@dalkescientific.com>

On Jun 26, 2008, at 2:01 AM, Michiel de Hoon wrote:
> Bio.Entrez does use the 3 seconds sleep rule, and the eight
> E-Utilities functions all make use of the EUtils web address, though
> it is possible to pass a different web address as one of the
> arguments. The "query" function, which is not part of the
> E-Utilities, does use the standard NCBI web address.

What is the proper EUtils web address?

Entrez/__init__.py uses
   cgi='http://www.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi'
while the documentation at
   http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
claims "Send E-utilities requests to http://eutils.ncbi.nlm.nih.gov",
which I think should be
   "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi"

> To avoid such problems in the future, I'd like to propose the  
> following:
> 1) Deprecate Bio.EUtils. Its functionality is covered by  
> Bio.Entrez, which (from release 1.46) will have a parser.

I looked over Bio.Entrez and it handles only a subset of what  
Bio.EUtils does.  For example, it doesn't have any support to help  
track WebEnv as it changes over each request, nor support for  
alternate format types.

I would deprecate Bio.EUtils for another reason - there's no maintainer.

> 2) Remove the 'query' function from Bio.Entrez. Anyway accessing  
> NCBI's web site from Python to get HTML back doesn't make a lot of  
> sense.

Okay, now I'm quite confused.  This is functionality that Bio.EUtils  
supports.


 >>> from Bio.EUtils import HistoryClient
 >>> client = HistoryClient.HistoryClient()
 >>> result = client.search("Michiel de Hoon[AU]")
 >>> print result.efetch("text", "docsum").read()

1:  de Hoon M, Hayashizaki Y.
  Deep cap analysis gene expression (CAGE): genome-wide  
identification of
promoters, quantification of their expression, and network inference.
Biotechniques. 2008 Apr;44(5):627-8, 630, 632. Review.
PMID: 18474037 [PubMed - indexed for MEDLINE]

2:  Sierro N, Makita Y, de Hoon M, Nakai K.
  DBTBS: a database of transcriptional regulation in Bacillus  
subtilis containing
upstream intergenic conservation information.
Nucleic Acids Res. 2008 Jan;36(Database issue):D93-6. Epub 2007 Oct 25.
PMID: 17962296 [PubMed - indexed for MEDLINE]

3:  Makita Y, de Hoon MJ, Danchin A.
  Hon-yaku: a biology-driven Bayesian methodology for identifying  
translation
initiation sites in prokaryotes.
BMC Bioinformatics. 2007 Feb 8;8:47.
PMID: 17286872 [PubMed - indexed for MEDLINE]

4:  de Hoon MJ, Makita Y, Nakai K, Miyano S.
  Prediction of transcriptional terminators in Bacillus subtilis and  
related
species.
PLoS Comput Biol. 2005 Aug;1(3):e25. Epub 2005 Aug 12.
PMID: 16110342 [PubMed - indexed for MEDLINE]

5:  de Hoon MJ, Imoto S, Kobayashi K, Ogasawara N, Miyano S.
  Inferring gene regulatory networks from time-ordered gene  
expression data of
Bacillus subtilis using differential equations.
Pac Symp Biocomput. 2003;:17-28.
PMID: 12603014 [PubMed - indexed for MEDLINE]


(The default returns this in XML format.)


 >>> print result.efetch().read(500)
<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2008//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/pubmed_080101.dtd">
<PubmedArticleSet>
<PubmedArticle>
     <MedlineCitation Owner="NLM" Status="MEDLINE">
         <PMID>18474037</PMID>
         <DateCreated>
             <Year>2008</Year>
             <Month>05</Month>
             <Day>13</Day>
         </DateCreated>
         <DateCompleted>
             <Year>2008</Year>
             <Month>06</Mont


> 3) Remove the argument for a user-specified web address to make  
> sure that always the E-Utilities address is used.

Yes.

				Andrew
				dalke at dalkescientific.com


From bugzilla-daemon at portal.open-bio.org  Thu Jun 26 09:20:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jun 2008 05:20:55 -0400
Subject: [Biopython-dev] [Bug 2529] NCBI BLAST XML parser does not support
	the online blast version 2.2.18+
In-Reply-To: <bug-2529-42@http.bugzilla.open-bio.org/>
Message-ID: <200806260920.m5Q9Ktlt019555@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2529


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|WORKSFORME                  |


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-26 05:20 EST -------
This is a duplicate of Bug 2499, reopening in order to mark this.




From bugzilla-daemon at portal.open-bio.org  Thu Jun 26 09:21:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jun 2008 05:21:38 -0400
Subject: [Biopython-dev] [Bug 2529] NCBI BLAST XML parser does not support
	the online blast version 2.2.18+
In-Reply-To: <bug-2529-42@http.bugzilla.open-bio.org/>
Message-ID: <200806260921.m5Q9Lcp6019606@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2529


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |DUPLICATE


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-26 05:21 EST -------
The fix for the 2.2.18+ XML output is already in CVS, see Bug 2499

*** This bug has been marked as a duplicate of bug 2499 ***




From bugzilla-daemon at portal.open-bio.org  Thu Jun 26 09:21:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jun 2008 05:21:40 -0400
Subject: [Biopython-dev] [Bug 2499] Bio.Blast.NCBIXML cannot handle XML
	without date in BlastOutput_version
In-Reply-To: <bug-2499-42@http.bugzilla.open-bio.org/>
Message-ID: <200806260921.m5Q9Lebn019619@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2499


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lordnapi at gmail.com


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-26 05:21 EST -------
*** Bug 2529 has been marked as a duplicate of this bug. ***




From biopython at maubp.freeserve.co.uk  Thu Jun 26 10:25:38 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jun 2008 11:25:38 +0100
Subject: [Biopython-dev] [BioPython] Fwd: NCBI Abuse Activity with
	BioPython
In-Reply-To: <DA88B268-8CE1-4C71-A685-F7C13978C8BA@dalkescientific.com>
References: <EEEED756EF6626469B10653F74501438018FAABB@NIHCESMLBX15.nih.gov>
	<55F39FBF-3CF0-4192-AFEB-100853FEE8A1@sonsorol.org>
	<DA88B268-8CE1-4C71-A685-F7C13978C8BA@dalkescientific.com>
Message-ID: <320fb6e00806260325m3b92ff8n143141c73a1a60dd@mail.gmail.com>

Andrew wrote:
>
>  I thought I put a rate limiter into the code, but looking at it now I see I
> didn't.  The documentation clearly states that users must follow NCBI's
> recommendations, but who actually reads documentation?
>
>>> *  Send E-utilities requests to http://eutils.ncbi.nlm.nih.gov
>>> <http://eutils.ncbi.nlm.nih.gov/> , not the standard NCBI Web address.
>
> That change was announced on May 21, 2003, and most likely no one on the
> Biopython dev group tracks the EUtils mailing list.  It was also after I
> wrote the code, but to be fair I was subscribed to the utilities list at the
> time and should have caught the change.
>
> I think the correct fix is to this code in ThinClient.py:
>
>    def __init__(self,
>                 opener = None,
>                 tool = TOOL,
>                 email = EMAIL,
>                 baseurl = "http://www.ncbi.nlm.nih.gov/entrez/eutils/"):
>
> Change the baseurl to "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/".  I
> have not tested this.

I've tested that fix, and it seems to be OK with test_EUtils.py and
test_SeqIO_online.py, which calls Bio.EUtils via Bio.GenBank. Checked
in as Bio/EUtils/ThinClient.py revision 1.6.

I'll have a look at your other specific suggestions too.  Thanks for
taking the time to go over this Andrew.

Peter


From p.j.a.cock at googlemail.com  Thu Jun 26 10:47:05 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 26 Jun 2008 11:47:05 +0100
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <635E5251-830F-409C-A2D4-10EA59FA5037@dalkescientific.com>
References: <254082.68438.qm@web62401.mail.re1.yahoo.com>
	<635E5251-830F-409C-A2D4-10EA59FA5037@dalkescientific.com>
Message-ID: <320fb6e00806260347i7655ba6eg490f5003a273a37d@mail.gmail.com>

On Thu, Jun 26, 2008 at 2:52 AM, Andrew Dalke <dalke at dalkescientific.com> wrote:
> On Jun 26, 2008, at 2:01 AM, Michiel de Hoon wrote:
>>
>> Bio.Entrez does use the 3 seconds sleep rule, and the eight E-Utilities
>> functions all make use of the EUtils web address, though it is possible to
>> pass a different web address as one of the arguments. The "query" function,
>> which is not part of the E-Utilities, does use the standard NCBI web
>> address.
>
> What is the proper EUtils web address?
>
> Entrez/__init__.py uses
>  cgi='http://www.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi'
> while the documentation at
>  http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
> claims "Send E-utilities requests to http://eutils.ncbi.nlm.nih.gov",
> which I think should be
> "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi"

Yes, for ePost that is correct:
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/epost_help.html

[On a related note, following Andrew's suggestion, I have updated CVS
to use the new base URL in Bio/EUtils/ThinClient.py]

>> To avoid such problems in the future, I'd like to propose the following:
>> 1) Deprecate Bio.EUtils. Its functionality is covered by Bio.Entrez, which
>> (from release 1.46) will have a parser.
>
> I looked over Bio.Entrez and it handles only a subset of what Bio.EUtils
> does.  For example, it doesn't have any support to help track WebEnv as it
> changes over each request, nor support for alternate format types.

No, Bio.Entrez does not support the WebEnv / history interface.  It
can request data in different format types though, although it will
only parse the XML output.

> I would deprecate Bio.EUtils for another reason - there's no maintainer.

This is a strong reason - although we are still using Bio.EUtils in
Bio.GenBank (and probably in other places too).

>> 2) Remove the 'query' function from Bio.Entrez. Anyway accessing NCBI's
>> web site from Python to get HTML back doesn't make a lot of sense.
>
> Okay, now I'm quite confused.  This is functionality that Bio.EUtils
> supports.

I think Michiel meant getting a handle containing raw HTML isn't very
sensible, and this is what the Bio.Entrez.query() function does.  If
it can only return HTML, then I agree, it's not very useful and could
be removed.

>> 3) Remove the argument for a user-specified web address to make sure that
>> always the E-Utilities address is used.
>
> Yes.
>

Unlike BLAST, where you may have a local webserver, is there any reason
to use a URL other than the NCBI's one?

Peter


From dalke at dalkescientific.com  Thu Jun 26 11:03:19 2008
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 26 Jun 2008 13:03:19 +0200
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <320fb6e00806260347i7655ba6eg490f5003a273a37d@mail.gmail.com>
References: <254082.68438.qm@web62401.mail.re1.yahoo.com>
	<635E5251-830F-409C-A2D4-10EA59FA5037@dalkescientific.com>
	<320fb6e00806260347i7655ba6eg490f5003a273a37d@mail.gmail.com>
Message-ID: <52BDC1F6-52F8-4A42-B738-DFBB119F9C27@dalkescientific.com>

On Jun 26, 2008, at 12:47 PM, Peter Cock wrote:
> I think Michiel meant getting a handle containing raw HTML isn't very
> sensible, and this is what the Bio.Entrez.query() function does.

I meant to point out that supporting the search interface, with
machine-parseable output, is functionality in Bio.EUtils that isn't in
Bio.Entrez.

> Unlike BLAST, where you may have a local webserver, is there any reason
> to use a URL other than the NCBI's one?

I can't think of any.

(I can make up one - setting up a local mock server for tests.  But  
that's not seriously going to happen.)

				Andrew
				dalke at dalkescientific.com


From biopython at maubp.freeserve.co.uk  Thu Jun 26 11:40:54 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jun 2008 12:40:54 +0100
Subject: [Biopython-dev] [BioPython] Fwd: NCBI Abuse Activity with
	BioPython
In-Reply-To: <5CD393BF-D4FB-4700-B7CC-2417C9845010@dalkescientific.com>
References: <EEEED756EF6626469B10653F74501438018FAABB@NIHCESMLBX15.nih.gov>
	<55F39FBF-3CF0-4192-AFEB-100853FEE8A1@sonsorol.org>
	<DA88B268-8CE1-4C71-A685-F7C13978C8BA@dalkescientific.com>
	<320fb6e00806260421g48e5807ei92297b372c330e5b@mail.gmail.com>
	<5CD393BF-D4FB-4700-B7CC-2417C9845010@dalkescientific.com>
Message-ID: <320fb6e00806260440n4a933b60of5a7c8eee4e15a89@mail.gmail.com>

On Thu, Jun 26, 2008 at 12:26 PM, Andrew Dalke wrote:
> On Jun 26, 2008, at 1:21 PM, Peter wrote:
>>
>> Looking over the code, should this wait also be done for the
>> ThinClient's epost() method as well?
>
> Where?  It gets the URL from an instance variable, which is set in the
> constructor.

The ThinClient class is defined in Bio/EUtils/ThinClient.py, and I
have added a 3 second wait to its _get() method.  I think we should
also add the three second wait to the epost() method.  Both methods
will construct their URL using self.baseurl, so they are both going to
hit the same server.

Note that for the implementation, I would probably define a new
_wait() method to check the time since the last call, and call this
_wait() method from both _get() and epost().
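
[Editorial sketch] The shared _wait() idea described above could look
roughly like the following.  This is a minimal sketch, not the code that
went into Bio/EUtils/ThinClient.py: the class name ThrottledClient, the
injectable clock and sleep arguments, and the placeholder return values
are all assumptions made for illustration.

```python
import time


class ThrottledClient:
    """Hypothetical stand-in for ThinClient; only the throttling is sketched."""

    def __init__(self, delay=3.0, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay        # minimum gap between requests, in seconds
        self._clock = clock       # injectable so the logic can be tested
        self._sleep = sleep
        self._last_call = None    # time of the previous request, if any

    def _wait(self):
        """Sleep just long enough to keep requests `delay` seconds apart."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.delay - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now

    def _get(self, params):
        self._wait()
        return ("GET", params)    # placeholder for the real HTTP request

    def epost(self, params):
        self._wait()
        return ("POST", params)   # placeholder for the real HTTP request
```

Because both methods go through the same _wait(), a _get() followed
immediately by an epost() is still spaced by the full delay.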

>> This complexity is also daunting for anyone else considering taking
>> over the Bio.EUtils code base.
>
> My incomplete rewrite uses elementtree which does reduce some of the
> complexity.  But the NCBI interface is a mess.

I can see why Michiel has kept things simple in Bio.Entrez - this
should cater to most users' needs.

Peter


From mjldehoon at yahoo.com  Thu Jun 26 11:45:45 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 26 Jun 2008 04:45:45 -0700 (PDT)
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <320fb6e00806260347i7655ba6eg490f5003a273a37d@mail.gmail.com>
Message-ID: <402220.93857.qm@web62411.mail.re1.yahoo.com>

> > I would deprecate Bio.EUtils for another reason - there's no maintainer.
This is what I meant. I am sure that we can fix Bio.EUtils for now, but I don't see how we can maintain it in the future. That is why originally we decided to focus on Bio.WWW.NCBI (renamed to Bio.Entrez) instead.

> - although we are still using Bio.EUtils in Bio.GenBank
> (and probably in other places too).

As far as I can tell, Bio.GenBank is currently the only module in which Bio.EUtils is used, not counting modules that themselves have been deprecated. It shouldn't be too complicated to modify Bio.GenBank to use Bio.Entrez instead.

>>> 2) Remove the 'query' function from Bio.Entrez.
>>>  Anyway accessing NCBI's web site from Python
>>> to get HTML back doesn't make a lot of sense.
>
>> Okay, now I'm quite confused.  This is functionality
>> that Bio.EUtils supports.
>
> I think Michiel meant getting a handle containing
> raw HTML isn't very sensible, and this is what the
> Bio.Entrez.query() function does.  If it can only
> return HTML, then I agree, it's not very useful and
> could be removed.
That is indeed what I meant. (It is still possible to get raw HTML by using the other EUtilities, for example efetch, but from a scripting language efetch is more likely to be used to get XML or some plain-text output).

--Michiel


From mjldehoon at yahoo.com  Thu Jun 26 12:50:10 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 26 Jun 2008 05:50:10 -0700 (PDT)
Subject: [Biopython-dev] New release
Message-ID: <390323.35893.qm@web62411.mail.re1.yahoo.com>

Hi everybody,

I think we should make a new Biopython release within the next couple of weeks to solve the issues with NCBI and to get the fixed Blast parser out (for output from Blast 2.2.18). There are a few outstanding issues that hopefully can be fixed before the next release:
1) NCBI access from Bio.GenBank
2) Bug #2454 (Iterators can't use file-like objects), which affects a number of parsers in Biopython
3) Martel-based parsers.

From a technical viewpoint, none of these are very complicated. 2) is almost finished.
With respect to 3), a small number of parsers in Biopython are based on Martel (none of the major ones as far as I can tell). For some of these parsers, it is not quite clear if they are still useful. For the remaining ones, it would be nice if they could be rewritten without using Martel -- that would let us get rid of the dependency on mxTextTools.

Any other urgent issues that need to be resolved before a release?

--Michiel.


From biopython at maubp.freeserve.co.uk  Thu Jun 26 12:53:09 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jun 2008 13:53:09 +0100
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <402220.93857.qm@web62411.mail.re1.yahoo.com>
References: <320fb6e00806260347i7655ba6eg490f5003a273a37d@mail.gmail.com>
	<402220.93857.qm@web62411.mail.re1.yahoo.com>
Message-ID: <320fb6e00806260553i4a7c5b2cxe5ae5aa0c80e53d1@mail.gmail.com>

> As far as I can tell, Bio.GenBank is currently the only module in which
> Bio.EUtils is used, not counting modules that themselves have been
> deprecated. It shouldn't be too complicated to modify Bio.GenBank to use
> Bio.Entrez instead.

Looking back at CVS, it used to use Bio.WWW.NCBI once upon a time
(which is now Bio.Entrez), and had explicit rate limiting.  Then four
years ago Brad moved the Bio.GenBank.download_many() and search_for()
functions over to using Bio.EUtils (CVS revision 1.51 of
Bio/GenBank/__init__.py).

Brad also appears to have changed the functionality of
Bio.GenBank.download_many() from a call back mechanism to returning a
handle.  We could still return a handle, but it would require fetching
all the records (perhaps in batches), and concatenating them.  I think
it would make more sense to deprecate the Bio.GenBank.download_many()
function, and direct people to Bio.Entrez.efetch() instead.

The Bio.GenBank.search_for() still seems somewhat useful, but without
a default limit on the number of returned IDs, this could easily be
abused.  Again, we could deprecate this and direct people to
Bio.Entrez.esearch() instead.

Peter


From mjldehoon at yahoo.com  Thu Jun 26 13:41:24 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 26 Jun 2008 06:41:24 -0700 (PDT)
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <320fb6e00806260553i4a7c5b2cxe5ae5aa0c80e53d1@mail.gmail.com>
Message-ID: <8498.83228.qm@web62412.mail.re1.yahoo.com>

> The Bio.GenBank.search_for() still seems somewhat
> useful, but without a default limit on the number
> of returned IDs, this could easily be abused.
> Again, we could deprecate this and direct people
> to Bio.Entrez.esearch() instead.
As always, I am in favor of deprecating functions whose purpose is dubious.
F

# Using Bio.GenBank
>>> from Bio import GenBank
>>> gi_list = GenBank.search_for("Opuntia AND rpl16")
>>> gi_list
['57240072', '57240071', '6273287', '6273291', '6273290', '6273289', '6273286', '6273285', '6273284']

# Same thing, using Bio.Entrez
>>> from Bio import Entrez
>>> handle = Entrez.esearch(db='nucleotide', term="Opuntia AND rpl16")
>>> record = Entrez.read(handle)
>>> record["IdList"]
['57240072', '57240071', '6273287', '6273291', '6273290', '6273289', '6273286', '6273285', '6273284']



From mjldehoon at yahoo.com  Thu Jun 26 13:51:55 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 26 Jun 2008 06:51:55 -0700 (PDT)
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <320fb6e00806260553i4a7c5b2cxe5ae5aa0c80e53d1@mail.gmail.com>
Message-ID: <597121.15112.qm@web62401.mail.re1.yahoo.com>

[Sorry, hit the send button too soon]

> The Bio.GenBank.search_for() still seems somewhat
> useful, but without a default limit on the number
> of returned IDs, this could easily be abused.
> Again, we could deprecate this and direct people
> to Bio.Entrez.esearch() instead.
As always, I am in favor of deprecating functions whose purpose is dubious.
As an example, this is a Genbank search done via Bio.GenBank and via Bio.Entrez:

# Using Bio.GenBank
>>> from Bio import GenBank
>>> gi_list = GenBank.search_for("Opuntia AND rpl16")
>>> gi_list
['57240072', '57240071', '6273287', '6273291', '6273290', '6273289', '6273286', '6273285', '6273284']

# Same thing, using Bio.Entrez
>>> from Bio import Entrez
>>> handle = Entrez.esearch(db='nucleotide', term="Opuntia AND rpl16")
>>> record = Entrez.read(handle)
>>> record["IdList"]
['57240072', '57240071', '6273287', '6273291', '6273290', '6273289', '6273286', '6273285', '6273284']

I believe that GenBank.search_for automatically takes care of the retmax parameter (the maximum number of ids to return), but I agree that this can be abused easily.

> Brad also appears to have changed the functionality of 
> Bio.GenBank.download_many() from a call back mechanism 
> to returning a handle.  We could still return a handle, but it would
> require fetching all the records (perhaps in batches), and
> concatenating them.  I think it would make more sense to deprecate
> the Bio.GenBank.download_many() function, and direct people to
> Bio.Entrez.efetch() instead.

Agree.

Btw, NCBIDictionary definitely needs to go.
From the documentation, continuing the example above:
>>> ncbi_dict = GenBank.NCBIDictionary("nucleotide", "genbank")
>>> gb_record = ncbi_dict[gi_list[0]]
Hence, we're running efetch once for each key separately; this is exactly what NCBI advised against.

--Michiel.


From mjldehoon at yahoo.com  Thu Jun 26 14:01:31 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 26 Jun 2008 07:01:31 -0700 (PDT)
Subject: [Biopython-dev] Bio.ECell, anybody?
Message-ID: <712489.88060.qm@web62410.mail.re1.yahoo.com>

This is one of the Martel-based parsers whose relevance in 2008 is unclear to me.

From the docstring:

Ecell converts the ECell input from spreadsheet format to an intermediate format, described in http://www.e-cell.org/manual/chapter2E.html#3.2. It provides an alternative to the perl script supplied with the Ecell2 distribution at http://bioinformatics.org/project/?group_id=49.

Currently, ECell is at version 3.1.106 (and uses Python as the scripting interface! Yay!). The link to the chapter in the ECell manual is dead.

Is anybody using the Bio.ECell module?

--Michiel


From biopython at maubp.freeserve.co.uk  Thu Jun 26 14:43:10 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jun 2008 15:43:10 +0100
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <597121.15112.qm@web62401.mail.re1.yahoo.com>
References: <320fb6e00806260553i4a7c5b2cxe5ae5aa0c80e53d1@mail.gmail.com>
	<597121.15112.qm@web62401.mail.re1.yahoo.com>
Message-ID: <320fb6e00806260743u3385955dt2be06d7f8122d8e5@mail.gmail.com>

OK then - I will deprecate the Bio.GenBank.search_for() and
Bio.GenBank.download_many() functions, suggesting Bio.Entrez instead.
I will also update the tutorial on this.

On Thu, Jun 26, 2008 at 2:51 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Btw, NCBIDictionary definitely needs to go.
> From the documentation, continuing the example above:
>>>> ncbi_dict = GenBank.NCBIDictionary("nucleotide", "genbank")
>>>> gb_record = ncbi_dict[gi_list[0]]
> Hence, we're running efetch once for each key separately; this is exactly what NCBI advised against.

If the user wants to run an Entrez search and then fetch some/all of
the results, then yes, the NCBI would not want us to make multiple
separate efetch calls by identifier.  Could you prepare an example
using Bio.Entrez with the "history" (WebEnv argument)?

However, if the user has provided the list of GI numbers (e.g. from a
file), there is no existing NCBI search data to refer to, and I don't
see any other option.  So there is a use-case for the
Bio.GenBank.NCBIDictionary class.

Peter


From mjldehoon at yahoo.com  Thu Jun 26 14:49:49 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 26 Jun 2008 07:49:49 -0700 (PDT)
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <320fb6e00806260743u3385955dt2be06d7f8122d8e5@mail.gmail.com>
Message-ID: <525848.21341.qm@web62410.mail.re1.yahoo.com>

--- On Thu, 6/26/08, Peter <biopython at maubp.freeserve.co.uk> wrote:
> However, if the user has provided the list of GI numbers (e.g. from a
> file), there is no existing NCBI search data to refer to, and I don't
> see any other option.  So there is a use-case for the
> Bio.GenBank.NCBIDictionary class.

In that case, the following can be used:
>>> from Bio import Entrez
>>> idlist = ['123','456','453',.....] # a list of GI numbers
>>> ids = ",".join(idlist)
>>> handle = Entrez.efetch(db='nucleotide', id=ids, retmode='xml')
>>> records = Entrez.read(handle)
# records is now a list of records corresponding to '123', '456', '453',...
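
[Editorial sketch] For a long list of GI numbers, the same idea can be
applied in batches, so that each efetch call carries many identifiers
rather than one.  The helper below only builds the comma-separated id
strings; its name batched_id_strings and the default batch size of 200
are assumptions for illustration, not part of Bio.Entrez or an NCBI limit
stated in this thread.

```python
def batched_id_strings(idlist, batch_size=200):
    """Yield comma-separated ID strings, at most batch_size IDs each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(idlist), batch_size):
        # Each yielded string is suitable as the id= argument of one request.
        yield ",".join(idlist[start:start + batch_size])
```

Each yielded string would then be passed as the id argument of a single
Entrez.efetch call, with the usual pause between requests.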

--Michiel. 


From biopython at maubp.freeserve.co.uk  Thu Jun 26 16:05:36 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jun 2008 17:05:36 +0100
Subject: [Biopython-dev] [BioPython] Fwd: NCBI Abuse Activity with
	BioPython
In-Reply-To: <79693088-0D38-459E-ADEC-FF2757E41912@dalkescientific.com>
References: <EEEED756EF6626469B10653F74501438018FAABB@NIHCESMLBX15.nih.gov>
	<55F39FBF-3CF0-4192-AFEB-100853FEE8A1@sonsorol.org>
	<DA88B268-8CE1-4C71-A685-F7C13978C8BA@dalkescientific.com>
	<320fb6e00806260421g48e5807ei92297b372c330e5b@mail.gmail.com>
	<5CD393BF-D4FB-4700-B7CC-2417C9845010@dalkescientific.com>
	<320fb6e00806260440n4a933b60of5a7c8eee4e15a89@mail.gmail.com>
	<79693088-0D38-459E-ADEC-FF2757E41912@dalkescientific.com>
Message-ID: <320fb6e00806260905i599a53f3v367045d3ee07ffbf@mail.gmail.com>

On Thu, Jun 26, 2008 at 12:48 PM, Andrew Dalke
<dalke at dalkescientific.com> wrote:
>> I think we should
>> also add the three second wait to the epost() method.
>
> I see it now.  Yes, that needs it as well.

Good - I've updated that in CVS, Bio/EUtils/ThinClient.py revision 1.8

>> I can see why Michiel has kept things simple in Bio.Entrez - this
>> should cater to most users' needs.
>
> Sad, but true.  EUtils (the server and the client) offer a lot more than
> what most users need.
>

Agreed.

Thanks again Andrew for your advice on where Bio.EUtils needed
updating - it certainly meant this got dealt with more quickly.

Peter


From biopython at maubp.freeserve.co.uk  Thu Jun 26 17:04:26 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jun 2008 18:04:26 +0100
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <320fb6e00806260743u3385955dt2be06d7f8122d8e5@mail.gmail.com>
References: <320fb6e00806260553i4a7c5b2cxe5ae5aa0c80e53d1@mail.gmail.com>
	<597121.15112.qm@web62401.mail.re1.yahoo.com>
	<320fb6e00806260743u3385955dt2be06d7f8122d8e5@mail.gmail.com>
Message-ID: <320fb6e00806261004r227c3340wf390779f1cc4616b@mail.gmail.com>

Michiel,

I started working on a patch to mark Bio.GenBank.search_for() etc as
deprecated, but on reflection I don't really like the longer code
needed with Bio.Entrez - for example this one liner:

from Bio import GenBank
gi_list = GenBank.search_for("Opuntia AND rpl16")

becomes:

from Bio import Entrez
handle = Entrez.esearch(db='nucleotide', term="Opuntia AND rpl16")
gi_list = Entrez.read(handle)["IdList"]

One idea that might be worth discussing is having variations of the
Entrez.e* functions which will parse the XML and return the results.
i.e. something like this:

def esearch2(...) :
   """Calls ESearch and parses the returned XML."""
   return read(esearch(..., retmode="XML"))

Then we can write,

from Bio import Entrez
gi_list = Entrez.esearch2(db='nucleotide', term="Opuntia AND rpl16")["IdList"]

(An alternative naming convention like a "p" might be nicer)
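
[Editorial sketch] One way to get such parsed variants without writing
each one by hand is a small wrapper factory.  This is only a sketch of
the idea under discussion: the name parsed is hypothetical, it is not
part of Bio.Entrez, and defaulting retmode to "xml" is an assumption
based on the parser only handling XML.

```python
import functools


def parsed(fetch, read):
    """Return a function that calls fetch(...) and parses the result with read."""
    @functools.wraps(fetch)
    def wrapper(*args, **kwargs):
        # Ask for XML unless the caller chose a format explicitly.
        kwargs.setdefault("retmode", "xml")
        return read(fetch(*args, **kwargs))
    return wrapper
```

With Biopython installed, search = parsed(Entrez.esearch, Entrez.read)
would then allow search(db='nucleotide', term="Opuntia AND rpl16")["IdList"],
while Entrez.esearch itself keeps returning a handle.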

My initial plan was to get the search results back as plain text
(retmode='uilist'), thus avoiding parsing the XML.  However, after
reading the Entrez documentation, and some experimentation to confirm
this, I was surprised to find the ESearch will only return XML.  The
NCBI appear to suggest that if you want your search results in another
format use the WebEnv session history, and then ask EFetch to reformat
it (!).  This does work, but means making two internet calls:

from Bio import Entrez
handle = Entrez.esearch(db='nucleotide', term="Opuntia AND rpl16",
                        usehistory="y")
session = Entrez.read(handle)['WebEnv']
gi_list = Entrez.efetch(db='nucleotide', WebEnv=session, query_key=1,
                        rettype='uilist').read().split('\n')
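
[Editorial sketch] A note on the final split('\n') above: if the returned
text ends with a newline, the resulting list has an empty string as its
last element.  Whether the NCBI output ends that way is an assumption
here; the helper below, with the hypothetical name parse_uilist, simply
guards against it.

```python
def parse_uilist(text):
    """Split 'uilist' plain text into a list of non-empty ID strings."""
    # splitlines() copes with a trailing newline; blank lines are dropped.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

It would be applied to the text read from the efetch handle in place of
the bare split.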

As an aside, do we really have to include the database in the efetch call above?

Peter


From biopython at maubp.freeserve.co.uk  Thu Jun 26 20:32:07 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jun 2008 21:32:07 +0100
Subject: [Biopython-dev] New release
In-Reply-To: <390323.35893.qm@web62411.mail.re1.yahoo.com>
References: <390323.35893.qm@web62411.mail.re1.yahoo.com>
Message-ID: <320fb6e00806261332q408cc02boa7ee4c3342b53e4b@mail.gmail.com>

On Thu, Jun 26, 2008 at 1:50 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
> I think we should make a new Biopython release within the next couple of weeks
> to solve the issues with NCBI and to get the fixed Blast parser out (for output
> from Blast 2.2.18). There are a few outstanding issues that hopefully can be
> fixed before the next release:
> 1) NCBI access from Bio.GenBank
> 2) Bug #2454 (Iterators can't use file-like objects), which affects a number of parsers in Biopython
> 3) Martel-based parsers.

Given the updates to Bio.EUtils to enforce the 3 second rule, the
urgent part of issue (1) is now resolved, and any further refinements
needn't hold up the release.

> From a technical viewpoint, none of these are very complicated. 2) is almost finished.

While there are still outstanding parsers affected by issue (2) (Bug
2454), I don't think this need hold up the release.

> With respect to 3), a small number of parsers in Biopython are based on
> Martel (none of the major ones as far as I can tell). For some of these
> parsers, it is not quite clear if they are still useful. For the remaining ones,
>  it would be nice if they could be rewritten without using Martel -- that would
> let us get rid of the dependency on mxTextTools.

Again, while removing the dependency on mxTextTools is a worthwhile
aim, I don't think this should hold up the release.

> Any other urgent issues that need to be resolved before a release?

There is an AlignInfo alphabet issue I'm currently working on, and
expect to have fixed tomorrow.

Peter


From dalke at dalkescientific.com  Thu Jun 26 21:40:51 2008
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 26 Jun 2008 23:40:51 +0200
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <320fb6e00806261004r227c3340wf390779f1cc4616b@mail.gmail.com>
References: <320fb6e00806260553i4a7c5b2cxe5ae5aa0c80e53d1@mail.gmail.com>
	<597121.15112.qm@web62401.mail.re1.yahoo.com>
	<320fb6e00806260743u3385955dt2be06d7f8122d8e5@mail.gmail.com>
	<320fb6e00806261004r227c3340wf390779f1cc4616b@mail.gmail.com>
Message-ID: <5DF39193-B52A-4EB9-84D3-C9626984DEA8@dalkescientific.com>

On Jun 26, 2008, at 7:04 PM, Peter wrote:
> I started working on a patch to mark Bio.GenBank.search_for() etc as
> deprecated, but on reflection I don't really like the longer code
> needed with Bio.Entrez

> One idea that might be worth discussing is having variations of the
> Entrez.e* functions which will parse the XML and return the results.
> i.e. something like this:
>
> def esearch2(...) :
>    """Calls ESearch and parses the returned XML."""
>    return read(esearch(..., retmode="XML"))

What about calling it "search"?  That is, the one that does  
everything the default way as most people expect is the one which  
doesn't need the prefix?

> My initial plan was to get the search results back as plain text
> (retmode='uilist'), thus avoiding parsing the XML.  However, after
> reading the Entrez documentation, and some experimentation to confirm
> this, I was surprised to find the ESearch will only return XML.  The
> NCBI appear to suggest that if you want your search results in another
> format use the WebEnv session history, and then ask EFetch to reformat
> it (!).  This does work, but means making two internet calls:

That's my memory of it too.

> As an aside, do we really have to include the database in the  
> efetch call above?

Yes.  Or you did 5 years ago.

				Andrew
				dalke at dalkescientific.com


From biopython at maubp.freeserve.co.uk  Thu Jun 26 21:53:40 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jun 2008 22:53:40 +0100
Subject: [Biopython-dev] New release
In-Reply-To: <320fb6e00806261332q408cc02boa7ee4c3342b53e4b@mail.gmail.com>
References: <390323.35893.qm@web62411.mail.re1.yahoo.com>
	<320fb6e00806261332q408cc02boa7ee4c3342b53e4b@mail.gmail.com>
Message-ID: <320fb6e00806261453l649f4ce3i83a6ed38fec54965@mail.gmail.com>

>> Any other urgent issues that need to be resolved before a release?
>
> There is an AlignInfo alphabet issue I'm currently working on, and
> expect to have fixed tomorrow.

Fixed, I think.  Alphabets can be annoying, especially gapped alphabets!

Peter


From biopython at maubp.freeserve.co.uk  Thu Jun 26 22:05:45 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jun 2008 23:05:45 +0100
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <5DF39193-B52A-4EB9-84D3-C9626984DEA8@dalkescientific.com>
References: <320fb6e00806260553i4a7c5b2cxe5ae5aa0c80e53d1@mail.gmail.com>
	<597121.15112.qm@web62401.mail.re1.yahoo.com>
	<320fb6e00806260743u3385955dt2be06d7f8122d8e5@mail.gmail.com>
	<320fb6e00806261004r227c3340wf390779f1cc4616b@mail.gmail.com>
	<5DF39193-B52A-4EB9-84D3-C9626984DEA8@dalkescientific.com>
Message-ID: <320fb6e00806261505w6e51d168i78987ac109a6f015@mail.gmail.com>

On Thu, Jun 26, 2008 at 10:40 PM, Andrew Dalke
<dalke at dalkescientific.com> wrote:
> On Jun 26, 2008, at 7:04 PM, Peter wrote:
>>
>> I started working on a patch to mark Bio.GenBank.search_for() etc as
>> deprecated, but on reflection I don't really like the longer code
>> needed with Bio.Entrez
>
>> One idea that might be worth discussing is having variations of the
>> Entrez.e* functions which will parse the XML and return the results.
>> i.e. something like this:
>>
>> def esearch2(...) :
>>   """Calls ESearch and parses the returned XML."""
>>   return read(esearch(..., retmode="XML"))
>
> What about calling it "search"?  That is, the one that does everything the
> default way as most people expect is the one which doesn't need the prefix?

I like that idea for the naming :)  What do you think Michiel, as this
is your module?

Peter


From mjldehoon at yahoo.com  Thu Jun 26 23:16:23 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 26 Jun 2008 16:16:23 -0700 (PDT)
Subject: [Biopython-dev] New release
In-Reply-To: <320fb6e00806261332q408cc02boa7ee4c3342b53e4b@mail.gmail.com>
Message-ID: <501202.26872.qm@web62413.mail.re1.yahoo.com>

OK, then let's make a new release as soon as possible, and perhaps another one soon after that. Tentative date is this Sunday, around noon GMT. All biopython unit tests pass (at least, on my machine), so it should be straightforward to build a release.

--Michiel.


From mjldehoon at yahoo.com  Thu Jun 26 23:20:49 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 26 Jun 2008 16:20:49 -0700 (PDT)
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <320fb6e00806261505w6e51d168i78987ac109a6f015@mail.gmail.com>
Message-ID: <900951.88468.qm@web62414.mail.re1.yahoo.com>

There are some other possibilities, for example to use the retout parameter. This parameter lets you choose between XML, HTML, plain text, ... format for the results. We could make the rule that without an explicit value for this parameter, the Bio.Entrez.e* functions return the parsed results.

If we're not sure what to do, I suggest we keep the search_for function in Bio.GenBank for the upcoming release, and take this issue up later.

--Michiel.


From biopython at maubp.freeserve.co.uk  Thu Jun 26 23:45:50 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jun 2008 00:45:50 +0100
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <900951.88468.qm@web62414.mail.re1.yahoo.com>
References: <320fb6e00806261505w6e51d168i78987ac109a6f015@mail.gmail.com>
	<900951.88468.qm@web62414.mail.re1.yahoo.com>
Message-ID: <320fb6e00806261645y1819cddx620d430f34d7e725@mail.gmail.com>

On Fri, Jun 27, 2008 at 12:20 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> There are some other possibilities, for example to use the retout parameter.
> This parameter lets you choose between XML, HTML, plain text, ... format for
> the results.

I'm not sure if it's rettype, retmode or retout - but something like that.

> We could make the rule that without an explicit value for this
> parameter, the Bio.Entrez.e* functions return the parsed results.

Your suggestion to automatically do the parsing when XML format is
requested would prevent the user from parsing the XML themselves (e.g.
using SAX or DOM).  It would also spoil my plan to include some of the
Entrez sequence XML formats in Bio.SeqIO as this would need
Bio.Entrez.efetch(...) to return a handle with XML in it.

> If we're not sure what to do, I suggest we keep the search_for function in
> Bio.GenBank for the upcoming release, and take this issue up later.

That would be expedient.

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Jun 26 23:47:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jun 2008 19:47:14 -0400
Subject: [Biopython-dev] [Bug 2090] Blast.NCBIStandalone BlastParser fails
	with blastall 2.2.14
In-Reply-To: <bug-2090-42@http.bugzilla.open-bio.org/>
Message-ID: <200806262347.m5QNlESr031036@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2090


------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-26 19:47 EST -------
Created an attachment (id=952)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=952&action=view)
Patch to Bio/Blast/NCBIStandalone.py

This is a very rough attempt at fixing multiquery BLAST output from recent
versions of NCBI BLAST.

It seems to work for the file I tested, but breaks the final part of the unit
test due to the alignments shown as "Flat Query-Anchored with(out) Identities",
described here:

http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/multi_formats.html

See also unit test files bt005 and bt045




From bugzilla-daemon at portal.open-bio.org  Fri Jun 27 00:37:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jun 2008 20:37:14 -0400
Subject: [Biopython-dev] [Bug 2375] Coalescent support through Simcoal2
In-Reply-To: <bug-2375-42@http.bugzilla.open-bio.org/>
Message-ID: <200806270037.m5R0bEkY000324@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2375


------- Comment #24 from mdehoon at ims.u-tokyo.ac.jp  2008-06-26 20:37 EST -------
I committed my patch to setup.py, as it seems to work fine with Python 2.3,
2.4, and 2.5 on all platforms. Leaving this bug open, since we still need to
remove the workaround in Bio/PopGen/SimCoal/__init__.py.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From biopython at maubp.freeserve.co.uk  Fri Jun 27 14:12:45 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jun 2008 15:12:45 +0100
Subject: [Biopython-dev] Bio.AlignIO and Bio.Entrez documentation
Message-ID: <320fb6e00806270712w134e1c5cm903b811c55fc60e1@mail.gmail.com>

Hi all,

I've realised that there is quite a lot of new content in the Tutorial
since the last release.

In addition to my new chapter on Bio.AlignIO, Michiel and I have both
spent a good chunk of time on the Bio.Entrez chapter of the tutorial.
Michiel wrote the bulk of this chapter and has updated it to cover the
new XML parser.  I've just been adding information based on the NCBI
guidelines (for example encouraging people to include their email
address in the Entrez calls), and I've just added another section with
an example using the history/webenv for a combined esearch and efetch.

If anyone could spare some time to proofread the tutorial,
concentrating on either or both of these new chapters (and trying the
examples) it would be appreciated.  Those of you with CVS access can
of course check in any little fixes - but if you spot anything
significant it's probably worth discussing first.

Ideally we can fix any little typos before Michiel releases Biopython
1.46 (tentatively this Sunday, around noon GMT).

Peter

P.S. If you'd like to help out and can't read or run LaTeX, let me
know by email and I'll send you the latest edition of the tutorial as
a PDF or HTML file.


From biopython at maubp.freeserve.co.uk  Fri Jun 27 15:42:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jun 2008 16:42:16 +0100
Subject: [Biopython-dev] Removing obsolete bits of the Tutorial
Message-ID: <320fb6e00806270842r6231adfdo8edff7a07a329cdf@mail.gmail.com>

I'm still in documentation mode, and I've just removed bits of
documentation of a few deprecated or obsolete bits of code.

I've just got to the "BioRegistry - automatically finding sequence
sources" section of the tutorial/cookbook, and this either needs major
updating or removing.  First of all since Biopython 1.44, the line
"from Bio import db" had to be "from Bio.config.DBRegistry import db".
 And secondly, given this is all based on Martel parsers, the list of
supported formats is now a lot thinner.

Would anyone object to me removing this section of the
tutorial/cookbook?  We might be able to deprecate it too, but I'm not
sure what side effects that might have so it's a bit risky this close
to a planned release.

Then there is the section on "Parser Design" which focuses on the
scanner/consumer model and lists lots of the events these parsers
(used to) generate.  I don't think any of this is useful, and suspect
that a lot of it is out of date.  Again, should we just remove this
section?

Peter


From mjldehoon at yahoo.com  Fri Jun 27 15:54:13 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 27 Jun 2008 08:54:13 -0700 (PDT)
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <320fb6e00806261645y1819cddx620d430f34d7e725@mail.gmail.com>
Message-ID: <224711.6366.qm@web62411.mail.re1.yahoo.com>

> > We could make the rule that without an explicit value for this
> > parameter, the Bio.Entrez.e* functions return the parsed results.

> Your suggestion to automatically do the parsing when XML format is
> requested would prevent the user from parsing the XML themselves (e.g.
> using SAX or DOM).

Actually I was suggesting to do the parsing only if no format is requested, and to return a handle to XML if XML format is requested.

But from the current examples in the Bio.Entrez chapter in the tutorial, it appears that typically users will have to write some glue code anyway to make optimal use of Bio.Entrez for their purposes. In that case, I suppose that whether we return a handle or an object from the Bio.Entrez.e* functions makes little difference.

--Michiel.


From biopython at maubp.freeserve.co.uk  Fri Jun 27 16:06:58 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jun 2008 17:06:58 +0100
Subject: [Biopython-dev] NCBI Abuse activity with Biopython
In-Reply-To: <224711.6366.qm@web62411.mail.re1.yahoo.com>
References: <320fb6e00806261645y1819cddx620d430f34d7e725@mail.gmail.com>
	<224711.6366.qm@web62411.mail.re1.yahoo.com>
Message-ID: <320fb6e00806270906p3d0d3a1dyf78b64bc2f0afa13@mail.gmail.com>

On Fri, Jun 27, 2008 at 4:54 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>> Your suggestion to automatically do the parsing when XML format is
>> requested would prevent the user from parsing the XML themselves (e.g.
>> using SAX or DOM).
>
> Actually I was suggesting to do the parsing only if no format is
> requested, and to return a handle to XML if XML format is requested.

Oh I see.  But determining the format is a complex combination of the
retmode and rettype parameters... quite confusing in its own right!
Especially as there are multiple different XML file formats for the same
result set.
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/efetchlit_help.html

> But from the current examples in the Bio.Entrez chapter in the tutorial, it appears
> that typically users will have to write some glue code anyway to make optimally
> use of Bio.Entrez for their purposes. In that case, I suppose that whether or not
> we return a handle or an object from the Bio.Entrez.e* functions makes little difference.

Fair point.  Certainly the "esearch and efetch" example is relatively
complicated, and having a combined "esearch then parse" function
wouldn't make much difference.

Let's leave this suggestion for the time being (having versions of the
Bio.Entrez functions which include the call to Bio.Entrez.read() to
parse the XML).

Peter


From mjldehoon at yahoo.com  Fri Jun 27 16:01:54 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 27 Jun 2008 09:01:54 -0700 (PDT)
Subject: [Biopython-dev] Removing obsolete bits of the Tutorial
In-Reply-To: <320fb6e00806270842r6231adfdo8edff7a07a329cdf@mail.gmail.com>
Message-ID: <215121.11545.qm@web62405.mail.re1.yahoo.com>


> I've just got to the "BioRegistry - automatically finding sequence
> sources" section of the tutorial/cookbook, and this either needs major
> updating or removing
> ...
> Would anyone object to me removing this section of the
> tutorial/cookbook?

I think it's better to remove it.

> Then there is the section on "Parser Design" which focuses on the
> scanner/consumer model and lists lots of the events these parsers
> (used to) generate.  I don't think any of this is useful, and suspect
> that a lot of it is out of date.  Again, should we just remove this
> section?

That too. Otherwise, we may inadvertently be causing new
Biopython developers to write their parsers using this out of
date parser design, which as far as I know is not being used
in the major Biopython modules.

--Michiel


From mjldehoon at yahoo.com  Fri Jun 27 16:40:13 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 27 Jun 2008 09:40:13 -0700 (PDT)
Subject: [Biopython-dev] Modules to be removed from Biopython
Message-ID: <492634.64872.qm@web62414.mail.re1.yahoo.com>

Hi everybody,

In recent releases, we have been using the rule of thumb to remove all modules from a new Biopython release that were deprecated two releases ago.

For the upcoming release, this means that we will remove the modules that were deprecated in Biopython 1.44. In that release, quite a lot of modules were deprecated; these modules will not appear in Biopython 1.46.

Some of the modules to be removed are relatively simple cases, which I think can be removed without causing any real pain to anybody:

Bio.crc (moved to Bio.SeqUtils.CheckSum)
Bio.Fasta.index_file
Bio.Fasta.Dictionary
Bio.GenBank.index_file
Bio.GenBank.Dictionary
Bio.Geo.Iterator (replaced by Bio.Geo.parse)
Bio.KEGG.Compound.Iterator (replaced by Bio.KEGG.Compound.parse)
Bio.KEGG.Enzyme.Iterator (replaced by Bio.KEGG.Enzyme.parse)
Bio.KEGG.Map.Iterator (replaced by Bio.KEGG.Map.parse)
Bio.lcc (moved to Bio.SeqUtils.lcc)
Bio.MarkupEditor
Bio.Medline.NLMMedlineXML
Bio.Medline.nlmmedline_001211_format
Bio.Medline.nlmmedline_010319_format
Bio.Medline.nlmmedline_011101_format
Bio.Medline.nlmmedline_031101_format
Bio.MultiProc
Bio.SeqIO.FASTA.py
Bio.SeqIO.generic.py

But there is also a set of interconnected modules where it's not 100% clear if they can be removed without causing some surprises:

Bio.builders
Bio.config
Bio.dbdefs
Bio.formatdefs
Bio.expressions
Bio.FormatIO
Bio.Std
Bio.StdHandler

It is probably OK to remove these, since we did not get a barrage of complaints from our users after they were deprecated. Personally, I think it is important to keep the code base clean, so I am in favor of removing these (and see if anybody complains; in that case, we can always put these modules back in and make a new release). But I can live with keeping these modules for another release round. If anybody thinks that that would be better, please let us know.

--Michiel


From biopython at maubp.freeserve.co.uk  Fri Jun 27 16:50:17 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jun 2008 17:50:17 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <492634.64872.qm@web62414.mail.re1.yahoo.com>
References: <492634.64872.qm@web62414.mail.re1.yahoo.com>
Message-ID: <320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com>

On Fri, Jun 27, 2008 at 5:40 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
> In recent releases, we have been using the rule of thumb to remove all
> modules from a new Biopython release that were deprecated two releases ago.

I was wondering if there was a stated policy on this.

> For the upcoming release, this means that we will remove the modules
> that were deprecated in Biopython 1.44. In that release, quite a lot of
> modules were deprecated; these modules will not appear in Biopython 1.46.
>
> Some of the modules to be removed are relatively simple cases, which I
> think can be removed without causing any real pain to anybody:
>
> Bio.crc (moved to Bio.SeqUtils.CheckSum)
> Bio.Fasta.index_file
> Bio.Fasta.Dictionary
> Bio.GenBank.index_file
> Bio.GenBank.Dictionary
> Bio.Geo.Iterator (replaced by Bio.Geo.parse)
> Bio.KEGG.Compound.Iterator (replaced by Bio.KEGG.Compound.parse)
> Bio.KEGG.Enzyme.Iterator (replaced by Bio.KEGG.Enzyme.parse)
> Bio.KEGG.Map.Iterator (replaced by Bio.KEGG.Enzyme.parse)
> Bio.lcc (moved to Bio.SeqUtils.lcc)
> Bio.MarkupEditor
> Bio.Medline.NLMMedlineXML
> Bio.Medline.nlmmedline_001211_format
> Bio.Medline.nlmmedline_010319_format
> Bio.Medline.nlmmedline_011101_format
> Bio.Medline.nlmmedline_031101_format
> Bio.MultiProc
> Bio.SeqIO.FASTA.py
> Bio.SeqIO.generic.py

Those all look fine to remove.  I agree here.

> But, there is also a set of interconnected modules where it's not 100%
> clear if they can be removed without causing some surprises:
> Bio.builders
> Bio.config
> Bio.dbdefs
> Bio.formatdefs
> Bio.dbdefs
> Bio.expressions
> Bio.FormatIO
> Bio.Std
> Bio.StdHandler
> It is probably OK to remove these, since these were deprecated we did
> not get a barrage of complaints from our users. Personally, I think it is
> important to keep the code base clean, so I am in favor of removing
> these (and see if anybody complains; in that case, we can always put
> these modules back in and make a new release). But I can live with
> keeping these modules for another release round. If anybody thinks
> that that would be better, please let us know.

Given some of these are very interconnected, I would be inclined to leave
them in for one more release.  However I'm content to see them go.  If no
one else has any qualms, then please carry on.

Peter


From biopython at maubp.freeserve.co.uk  Fri Jun 27 16:54:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jun 2008 17:54:16 +0100
Subject: [Biopython-dev] Removing obsolete bits of the Tutorial
In-Reply-To: <215121.11545.qm@web62405.mail.re1.yahoo.com>
References: <320fb6e00806270842r6231adfdo8edff7a07a329cdf@mail.gmail.com>
	<215121.11545.qm@web62405.mail.re1.yahoo.com>
Message-ID: <320fb6e00806270954r4ee7b16fw3210cd77f1708a3@mail.gmail.com>

On Fri, Jun 27, 2008 at 5:01 PM, Michiel de Hoon wrote:
>
>> I've just got to the "BioRegistry - automatically finding sequence
>> sources" section of the tutorial/cookbook, and this either needs major
>> updating or removing
>> ...
>> Would anyone object to me removing this section of the
>> tutorial/cookbook?
>
> I think it's better to remove it.

Gone.

>> Then there is the section on "Parser Design" which focuses on the
>> scanner/consumer model and lists lots of the events these parsers
>> (used to) generate.  I don't think any of this is useful, and suspect
>> that a lot of it is out of date.  Again, should we just remove this
>> section?
>
> That too. Otherwise, we may inadvertently be causing new
> Biopython developers to write their parsers using this out of
> date parser design, which as far as I know is not being used
> in the major Biopython modules.

It's not entirely out of date - don't SAX based XML parsers do
something similar?  And quite a few major modules still follow this
scheme (e.g. Bio.GenBank and Bio.SwissProt).  Anyway, I have removed
most of this section leaving only a short overview.

Peter


From biopython at maubp.freeserve.co.uk  Fri Jun 27 17:49:53 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jun 2008 18:49:53 +0100
Subject: [Biopython-dev] Recent Bio.Nexus updates
Message-ID: <320fb6e00806271049vdfb15co30a05c0a93963aba@mail.gmail.com>

Hi Frank,

I see you've got your CVS access working again - good :)

I wanted to ask you about two of your recent changes to Bio/Nexus/Nexus.py

First of all, you've added a new method export_phylip(), which seems
to be a simple function to record the Nexus object's alignment as a
PHYLIP format alignment.  One point of concern is code duplication
(Bio.AlignIO can write PHYLIP files).  Also, you don't seem to be
following the "spec" strictly, as the taxon names are not cropped to
ten characters, nor are any "illegal" characters dealt with.  More
generally, I wonder if this method is really needed - perhaps instead
a general method to return a Bio.Align.Generic.Alignment object would
be preferable.  This could then be used in conjunction with any of the
alignment formats supported in Bio.AlignIO.
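
To illustrate the point about the "spec", here is a minimal sketch of strict PHYLIP name handling. The function name phylip_name is hypothetical (it is not part of Bio.Nexus or Bio.AlignIO), and both the set of "illegal" characters and the choice to replace them with underscores are assumptions made for illustration only:

```python
import re

# Assumption: characters that are unsafe in PHYLIP taxon names
# (parentheses, square brackets, colon, semicolon, comma).
_ILLEGAL = re.compile(r"[()\[\]:;,]")


def phylip_name(name, width=10):
    """Return a fixed-width taxon name suitable for a strict PHYLIP file.

    Hypothetical helper: replaces unsafe characters with underscores,
    then crops or pads the name to exactly `width` characters.
    """
    cleaned = _ILLEGAL.sub("_", name.strip())
    return cleaned[:width].ljust(width)
```

Note that cropping can make two different taxon names identical, so a real implementation would probably also need to check for duplicates.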

Secondly, you seem to have reverted the alphabet change to
Bio/Nexus/Nexus.py made in revision 1.12 to fix Bug 2380.  Was this
deliberate or just accidental?
http://bugzilla.open-bio.org/show_bug.cgi?id=2380

Thanks,

Peter


From biopython at maubp.freeserve.co.uk  Fri Jun 27 21:58:04 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jun 2008 22:58:04 +0100
Subject: [Biopython-dev] [BioPython] Entrez
In-Reply-To: <1214569152.6026.9.camel@ubuntu>
References: <1214494546.6215.3.camel@ubuntu>
	<320fb6e00806260857i619d4947l130791ab8276f992@mail.gmail.com>
	<1214562160.6026.2.camel@ubuntu>
	<320fb6e00806270416x76d8b388mdd79577927001f32@mail.gmail.com>
	<1214569152.6026.9.camel@ubuntu>
Message-ID: <320fb6e00806271458t4e043c39sb664c4346c8a6949@mail.gmail.com>

Just forwarding this to the mailing list - Binbin's problem is
resolved (although I don't know what was wrong originally).

A happy ending :)

Peter

---------- Forwarded message ----------
From: binbin <binbin.liu at umb.no>
Date: Fri, Jun 27, 2008 at 1:19 PM
Subject: Re: [BioPython] Entrez
To: Peter <biopython at maubp.freeserve.co.uk>


i re-install the biopyton1.45 and now i can import Entrez!

thanks very much!


On Fri, 2008-06-27 at 13:16 +0200, Peter wrote:
> On Fri, Jun 27, 2008 at 11:22 AM, binbin <binbin.liu at umb.no> wrote:
> > thank you for answering, i am a beginner of biopython,in the "Biopython
> > Tutorial and Cookbook":
> > 2.5  Connecting with biological databases:
> > this is found
> > "from Bio import Entrez"
> >
> > i tried this but it did work for me, that is why i asked.
>
> That should have worked if your installation of Biopython 1.45 was successful.
>
> We may be able to work out what is wrong.  What operating system are
> you using, which version of python, and how did you install Biopython?
>
> Regards,
>
> Peter


From biopython at maubp.freeserve.co.uk  Fri Jun 27 22:06:14 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jun 2008 23:06:14 +0100
Subject: [Biopython-dev] [BioPython] Bio.SCOP.FileIndex
In-Reply-To: <141582.2274.qm@web62413.mail.re1.yahoo.com>
References: <141582.2274.qm@web62413.mail.re1.yahoo.com>
Message-ID: <320fb6e00806271506i1af1db34n1aec65605fd6f83c@mail.gmail.com>

On Wed, Jun 25, 2008 at 3:04 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
> When I was modifying Bio.SCOP, I noticed that Bio.SCOP.FileIndex is flawed
> if file reading is done via a buffer (which is often the case in Python).

Are you talking about Bio/SCOP/FileIndex.py?  The whole design seems to be
geared to indexing the position of each record in a file - down to the fact
that it takes a filename rather than a handle.  Why does it need "fixing"?

> Before we try to fix this, is anybody actually using Bio.SCOP.FileIndex?
> If not, I think we should deprecate it instead of trying to fix it.

We've deprecated similar functionality in Bio.GenBank, although if I recall
correctly that was because it was using Martel and broke with mxTextTools 3.0,
and therefore fixing it was non-trivial.

If Bio.SCOP.FileIndex is broken, then deprecation seems sensible.

Peter


From mjldehoon at yahoo.com  Sat Jun 28 02:21:53 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 27 Jun 2008 19:21:53 -0700 (PDT)
Subject: [Biopython-dev] [BioPython] Bio.SCOP.FileIndex
In-Reply-To: <320fb6e00806271506i1af1db34n1aec65605fd6f83c@mail.gmail.com>
Message-ID: <216781.61321.qm@web62403.mail.re1.yahoo.com>

--- On Fri, 6/27/08, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Are you talking about Bio/SCOP/FileIndex.py? The whole design seems to be
> geared to indexing the position of record in a file - down to the fact
> that it takes as filename rather than a handle. Why does it need "fixing"?

FileIndex pulls out records from the iterator one by one, and then calls .tell() on the file handle to find the starting position of each record. The problem is that (due to buffered reading from the file handle) .tell() does not correspond to the record starting positions.

Taking the essential pieces of FileIndex:

>>> input = open("mydatafile.txt")
>>> while True:
...     next_line = input.next()
...     print input.tell()
... 
8192
8192
8192
8192
8192
...
8192
8192
18432
18432
18432
...

In practice it works, because the iterators that are actually used in Bio.SCOP call readline() internally, which reads exactly one line so that .tell() returns the expected answer.
But calling readline() in the iterator is a limitation (e.g., you cannot run it on a list of lines).

Another option is to let FileIndex itself call readline():

class FileIndex(dict):
    def __init__(self, filename, record_gen, key_gen):
        ...
        self.record_gen = record_gen
        self.f = f = open(filename)
        while True:
            location = f.tell() # start of the line about to be read
            line = f.readline()
            if not line:
                break
            key = key_gen(record_gen(line))
            self[key] = location # store location
...
    def __getitem__(self, key):
        location = dict.__getitem__(self, key)
        self.f.seek(location)
        line = self.f.readline()
        return self.record_gen(line)

This works, but it means changing how users call FileIndex.
Which is also OK, but before modifying FileIndex it would be good to know if anybody is actually using this functionality.
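
For reference, the readline()-based approach can be sketched as a small self-contained example. This is only an illustration, not the Bio.SCOP code: the names build_line_index and fetch_line are hypothetical, it assumes one record per line, and it opens the file in binary mode so that the offsets are plain byte positions:

```python
def build_line_index(path, key_func):
    """Map key_func(line) to the byte offset at which that line starts."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()  # position before the line is read
            line = handle.readline()  # reads exactly one line
            if not line:
                break
            index[key_func(line.decode("ascii"))] = offset
    return index


def fetch_line(path, index, key):
    """Seek to the stored offset and read back the single line for key."""
    with open(path, "rb") as handle:
        handle.seek(index[key])
        return handle.readline().decode("ascii")
```

Because readline() is called directly rather than via the iterator protocol, tell() is not affected by read-ahead buffering.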

--Michiel.


From mjldehoon at yahoo.com  Sat Jun 28 02:28:48 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 27 Jun 2008 19:28:48 -0700 (PDT)
Subject: [Biopython-dev] Bio.GenBank.NCBIDictionary, Bio.PubMed.Dictionary
Message-ID: <982950.87150.qm@web62409.mail.re1.yahoo.com>

Does anybody have any further objections to deprecating Bio.GenBank.NCBIDictionary and Bio.PubMed.Dictionary? These two classes download records from NCBI one by one, which is exactly what NCBI advised against.

--Michiel


From bugzilla-daemon at portal.open-bio.org  Sat Jun 28 20:09:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 28 Jun 2008 16:09:44 -0400
Subject: [Biopython-dev] [Bug 2530] New: Bio.Seq.translate() treats invalid
	codons as stops
Message-ID: <bug-2530-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2530

           Summary: Bio.Seq.translate() treats invalid codons as stops
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


The following results are with CVS.  Biopython 1.45 may be different, I have
recently tweaked the translate function for some less dramatic issues.

I would like Bio.Seq.translate() to raise exceptions on untranslatable codons,
rather than inserting a stop character.  e.g. for "N@N" or "TA-".

Currently:

>>> from Bio.Seq import translate
>>> translate("TAA")
'*'
>>> translate("TAG")
'*'
>>> translate("TAA")
'*'
>>> translate("TAC")
'Y'
>>> translate("TAN")
...
Bio.Data.CodonTable.TranslationError: 'TAN'
>>> translate("NNN")
...
Bio.Data.CodonTable.TranslationError: 'TAN'
>>> translate("AAA")
'K'
>>> translate("ANA")
'X'
>>> translate("AXA")
'X'

That is all fine.  However,

>>> translate("A@A")
'*'
>>> translate("A-A")
'*'

These should also raise a TranslationError.  Suggested non-trivial patch to
follow.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Sat Jun 28 20:19:09 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 28 Jun 2008 16:19:09 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
	codons as stops
In-Reply-To: <bug-2530-42@http.bugzilla.open-bio.org/>
Message-ID: <200806282019.m5SKJ9l2011097@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2530


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-28 16:19 EST -------
Created an attachment (id=953)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=953&action=view)
Patch to Bio/Seq.py Bio/Data/CodonTable.py and the test_seq.py unit test

The basic idea of this patch is to include the stop codons in the CodonTable's
forward table dictionary.  Currently, when doing the translation a stop codon
is inserted when the key is undefined (but this also happens for invalid
codons).

Instead, by including the stop codons in the forward table, we can do a single
mapping.  Any KeyError becomes a translation error.
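
A minimal sketch of this idea, not the actual patch: FORWARD_TABLE below is a tiny illustrative subset of a codon table, and the TranslationError class and translate function are stand-ins defined here rather than the ones in Bio.Data.CodonTable and Bio.Seq.

```python
class TranslationError(ValueError):
    """Stand-in for Bio.Data.CodonTable.TranslationError."""


# Illustrative subset only; the stop codons are ordinary entries,
# so a failed lookup can only mean an invalid codon.
FORWARD_TABLE = {
    "ATG": "M", "AAA": "K", "TAC": "Y",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def translate(sequence, table=FORWARD_TABLE):
    """Translate codon by codon with a single dictionary lookup each."""
    sequence = sequence.upper()
    amino_acids = []
    for start in range(0, len(sequence), 3):
        codon = sequence[start:start + 3]
        try:
            amino_acids.append(table[codon])
        except KeyError:
            # Any unknown codon (e.g. "A-A") is an error, never a stop.
            raise TranslationError(codon) from None
    return "".join(amino_acids)
```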

However, this is a fairly significant change to the existing CodonTable
objects.  They are a strange bunch of objects - with the ambiguous codon
tables being very odd.  I have replaced all of these with a single codon table
which includes all the DNA and RNA codons, including the ambiguous ones.  All
the existing variants of DNA/RNA/Generic and (un)ambiguous CodonTables are now
replaced with the single object.  We still have one per NCBI codon table.

I think that the CodonTable could be made simpler still, but I wanted to at
least try and remain API backwards compatible (bar the dictionary change).

Then, I tweaked the Bio.Seq translate method to take advantage of this.

NOTE - We don't have a unit test for Bio.Data.CodonTable or Bio.Translate, so
it would be wise to write one BEFORE committing this patch.  If there are any
other bits of code using Bio.Data.CodonTable they could also be affected.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From biopython at maubp.freeserve.co.uk  Sat Jun 28 20:32:09 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Jun 2008 21:32:09 +0100
Subject: [Biopython-dev] Failing unit tests under Windows
Message-ID: <320fb6e00806281332v44ba6139xd2531c57f53f92e@mail.gmail.com>

I run python 2.3.5 on Windows, and compile from source with MSVC 6.0
(which is a different setup to the one Michiel uses for the builds).
I just thought I should document the unit test oddities I see on this
machine:

test_ProtParam - fails with a single floating point difference, 0.562
versus 0.563.

test_Wise - doesn't fail gracefully due to a problem detecting dnal
http://bugzilla.open-bio.org/show_bug.cgi?id=2469

test_psw - fails due to a "doctest of" versus "Doctest: " string
difference.  This may be due to the different version of python?  We
can probably fix this in run_tests.py

test_KDTree - fails with ImportError: No module named _CKDTree
I do select yes when asked if I want to build Bio.KDTree - does this
work for anyone under Windows?

Peter


From bugzilla-daemon at portal.open-bio.org  Sat Jun 28 20:39:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 28 Jun 2008 16:39:45 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
	codons as stops
In-Reply-To: <bug-2530-42@http.bugzilla.open-bio.org/>
Message-ID: <200806282039.m5SKdjUA011740@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2530


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-28 16:39 EST -------
Actually there is a unit test, test_translate.py - maybe the lower case T
confused me?  The bad news is this unit test fails with my patch, due to the
Bio.Translate module using an incredibly strict check on the alphabet.

I'll try and come up with a less invasive change to Bio.Data.CodonTable which
makes Bio.Translate happy again - but probably not tonight.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Sun Jun 29 01:57:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 28 Jun 2008 21:57:54 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
	codons as stops
In-Reply-To: <bug-2530-42@http.bugzilla.open-bio.org/>
Message-ID: <200806290157.m5T1vshF022329@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2530


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #953 is|0                           |1
           obsolete|                            |


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-28 21:57 EST -------
(From update of attachment 953)
There is an underlying issue in Bio.Data.CodonTable, which is at least
commented:

# These two are WRONG!  I need to get the
# list of ambiguous codons which code for
# the stop codons  XXX

For example, R = A or G, so UAR = UAA or UAG / TAR = TAA or TAG = stop codons.
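
One way to check such cases is to expand an ambiguous codon into every unambiguous codon it could represent, and call it a stop only if all of them are stops. The sketch below is an illustration rather than the CodonTable code: is_ambiguous_stop is a hypothetical name, and STOP_CODONS assumes the standard genetic code.

```python
from itertools import product

# IUPAC ambiguous DNA letters and the bases each can stand for.
AMBIGUOUS_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Assumption: stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_ambiguous_stop(codon, stops=STOP_CODONS):
    """True if every unambiguous expansion of codon is a stop codon."""
    choices = [AMBIGUOUS_DNA[base] for base in codon.upper()]
    expansions = {"".join(bases) for bases in product(*choices)}
    return expansions <= set(stops)
```

With these definitions TAR and TRA count as stops, while TRR does not, because it also covers TGG.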


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Sun Jun 29 02:37:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 28 Jun 2008 22:37:01 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
	codons as stops
In-Reply-To: <bug-2530-42@http.bugzilla.open-bio.org/>
Message-ID: <200806290237.m5T2b1Wu023585@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2530


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-28 22:37 EST -------
Created an attachment (id=954)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=954&action=view)
Rough patch to Bio/Data/CodonTable.py

This includes some self testing, but needs further validation before being
trusted.  For example, is it enough to compare just pairs of unambiguous
start/stop codons when generating the set of possible ambiguous start/stop
codons?


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From mjldehoon at yahoo.com  Sun Jun 29 06:22:43 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 28 Jun 2008 23:22:43 -0700 (PDT)
Subject: [Biopython-dev] [BioPython] Bio.SCOP.FileIndex
In-Reply-To: <216781.61321.qm@web62403.mail.re1.yahoo.com>
Message-ID: <584421.23968.qm@web62410.mail.re1.yahoo.com>

It turned out that Bio.SCOP.FileIndex was used as a base class in Bio.SCOP.Cla and Bio.SCOP.Raf. Without using Bio.SCOP.FileIndex as a base class, the derived classes in Bio.SCOP.Cla and Bio.SCOP.Raf were easy to fix. So I deprecated Bio.SCOP.FileIndex, while keeping Bio.SCOP's functionality intact by fixing the derived classes.

--Michiel


From bugzilla-daemon at portal.open-bio.org  Sun Jun 29 06:24:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 29 Jun 2008 02:24:42 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: <bug-2454-42@http.bugzilla.open-bio.org/>
Message-ID: <200806290624.m5T6Og3F029458@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454


------- Comment #19 from mdehoon at ims.u-tokyo.ac.jp  2008-06-29 02:24 EST -------
Bio.SCOP is fixed now (added a parse() function as a replacement for the
Iterator class, which is now deprecated).


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Sun Jun 29 10:09:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 29 Jun 2008 06:09:25 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
	for '-F F' and make it safe
In-Reply-To: <bug-2508-42@http.bugzilla.open-bio.org/>
Message-ID: <200806291009.m5TA9PfZ021963@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508


------- Comment #7 from mmokrejs at ribosome.natur.cuni.cz  2008-06-29 06:09 EST -------
Quoting from http://www.python.org/dev/peps/pep-0324/

    - No implicit call of /bin/sh.  This means that there is no need
      for escaping dangerous shell meta characters.
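
As a rough illustration of that point (a sketch only, using the Python interpreter itself as the child process rather than blastall; run_without_shell is a hypothetical helper name), arguments passed as a list reach the program verbatim, so shell metacharacters are not interpreted:

```python
import subprocess
import sys


def run_without_shell(args):
    """Run a command given as an argument list; no shell is involved."""
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return completed.stdout


# The ';' below is just part of one argument, not a command separator.
output = run_without_shell(
    [sys.executable, "-c", "import sys; print(sys.argv[1])",
     "seqs.fasta; echo injected"]
)
```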


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Sun Jun 29 10:55:04 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 29 Jun 2008 06:55:04 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
	for '-F F' and make it safe
In-Reply-To: <bug-2508-42@http.bugzilla.open-bio.org/>
Message-ID: <200806291055.m5TAt4qX023404@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-29 06:55 EST -------
Hmm.  Another reason to move to Python 2.4+, see also Bug 2480.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From mjldehoon at yahoo.com  Sun Jun 29 11:15:00 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 29 Jun 2008 04:15:00 -0700 (PDT)
Subject: [Biopython-dev] CVS freeze for release 1.46
Message-ID: <799546.26730.qm@web62413.mail.re1.yahoo.com>

Hi everybody,
I will start creating the new release now.
Please don't make any commits to CVS until the new release is out.
Thanks!

--Michiel.


From bugzilla-daemon at portal.open-bio.org  Sun Jun 29 14:35:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 29 Jun 2008 10:35:11 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
	codons as stops
In-Reply-To: <bug-2530-42@http.bugzilla.open-bio.org/>
Message-ID: <200806291435.m5TEZBAh032091@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2530


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #954 is|0                           |1
           obsolete|                            |


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-29 10:35 EST -------
Created an attachment (id=955)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=955&action=view)
Patches Bio/Data/CodonTable.py for ambiguous start/stop codons

This implements the stub function list_ambiguous_codons, and adds a lot of
in-situ asserts which could later be moved to a unit test.

e.g. ['TAG', 'TAA'] -> ['TAG', 'TAA', 'TAR']
     ['UAA', 'UGA'] -> ['UAA', 'UGA', 'URA']

Note that ['TAG', 'TGA'] -> ['TAG', 'TGA']: this does not add 'TRR', as that
could be either a stop codon or a codon for an amino acid.  Thus only two more
codons are added in the following example:

e.g. ['TGA', 'TAA', 'TAG'] -> ['TGA', 'TAA', 'TAG', 'TRA', 'TAR']
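
The rule behind these examples can be sketched as follows. This is not the patch itself, just an illustration under stated assumptions: the function name extend_with_ambiguous and the small AMBIGUOUS_DNA table (a subset of the IUPAC codes) are hypothetical. An ambiguous codon is added only if every unambiguous codon it could stand for is already in the list.

```python
from itertools import product

# Assumed subset of the IUPAC nucleotide ambiguity codes (DNA letters only).
AMBIGUOUS_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "N": "ACGT",
}


def extend_with_ambiguous(codons, ambiguous=AMBIGUOUS_DNA):
    """Return codons plus each ambiguous codon that can only mean one of them."""
    known = set(codons)
    extended = list(codons)
    for letters in product(sorted(ambiguous), repeat=3):
        candidate = "".join(letters)
        if candidate in known:
            continue
        # Expand the candidate into every unambiguous codon it could represent.
        meanings = {
            "".join(p) for p in product(*(ambiguous[c] for c in letters))
        }
        # Only safe to add if all of those meanings are already listed.
        if meanings <= known:
            extended.append(candidate)
    return extended


two_stops = extend_with_ambiguous(["TAG", "TAA"])
mixed = extend_with_ambiguous(["TAG", "TGA"])
three_stops = extend_with_ambiguous(["TGA", "TAA", "TAG"])
```

With this rule 'TRR' is never added for the three standard stop codons, because one of its meanings is 'TGG', which is not in the list.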


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From mjldehoon at yahoo.com  Sun Jun 29 14:43:25 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 29 Jun 2008 07:43:25 -0700 (PDT)
Subject: [Biopython-dev] New release 1.46
Message-ID: <899008.26338.qm@web62403.mail.re1.yahoo.com>

Hi everybody,

Release 1.46 is essentially done. Feel free to start committing to CVS again.

Currently I am not able to update Biopython's wiki pages. This looks like a problem with the wiki, since I am getting a blank screen without any error message. So I cannot update the website and send out the announcement yet.

--Michiel


From biopython at maubp.freeserve.co.uk  Sun Jun 29 15:09:47 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 29 Jun 2008 16:09:47 +0100
Subject: [Biopython-dev] New release 1.46
In-Reply-To: <899008.26338.qm@web62403.mail.re1.yahoo.com>
References: <899008.26338.qm@web62403.mail.re1.yahoo.com>
Message-ID: <320fb6e00806290809r6ad238d3r3a16dfa145bc0186@mail.gmail.com>

On Sun, Jun 29, 2008 at 3:43 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
> Release 1.46 is essentially done. Feel free to start committing to CVS again.

Well done - I hope you didn't give up your whole weekend for this.

> Currently I am not able to update Biopython's wiki pages. This looks like a problem
> with the wiki, since I am getting a blank screen without any error message. So I
> cannot update the website and send out the announcement yet.

I've been in touch with the OBF about this before.  You'll notice all
the other project pages are down too (check www.biosql.org and
www.bioperl.org for example).  I'm told they have something in place
to automatically reboot the server, so it should fix itself within an
hour or so, but it looks like they haven't resolved the underlying
problem.

I guess this means the new release files themselves are still waiting
on your local machine(s)?  That's a shame.

Peter


From mjldehoon at yahoo.com  Sun Jun 29 15:07:36 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 29 Jun 2008 08:07:36 -0700 (PDT)
Subject: [Biopython-dev] Removing obsolete bits of the Tutorial
In-Reply-To: <320fb6e00806270954r4ee7b16fw3210cd77f1708a3@mail.gmail.com>
Message-ID: <176230.99034.qm@web62415.mail.re1.yahoo.com>


>>> Then there is the section on "Parser Design" which focuses on the
>>> scanner/consumer model and lists lots of the events these parsers
>>> (used to) generate.  I don't think any of this is useful, and suspect
>>> that a lot of it is out of date.  Again, should we just remove this
>>> section?
>>
>> That too. Otherwise, we may inadvertently be causing new
>> Biopython developers to write their parsers using this out of
>> date parser design, which as far as I know is not being used
>> in the major Biopython modules.
>
> It's not entirely out of date - don't SAX based XML parsers do
> something similar?

Yes, but there's a difference:

In an XML file, we need to find out where the XML tags are to be able to parse the file. These tags can appear anywhere in the file.

In flat-file text formats, typically different information is stored in different lines. So finding out where one piece of information ends and another one starts becomes trivial. We just need to pull out the lines one by one, and check whether they are a new piece of information or a continuation of the current piece of information.

Especially for simple formats (e.g. Fasta), using a scanner / consumer model can be unnecessarily complex. But even for more complicated formats, parsing line by line can be entirely straightforward. For example, have a look at Bio/SwissProt/KeyWList.py, which currently contains a line-by-line parser and a scanner/consumer parser (which is deprecated). The former takes 26 lines, the latter more than 100.
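
To make the line-by-line idea concrete, here is a rough sketch for a Fasta-like format. It is only an illustration, not the code in Bio/SwissProt/KeyWList.py or any Biopython parser; the function name parse_fasta is hypothetical. Each line is either the start of a new record (it begins with ">") or a continuation of the current one.

```python
import io


def parse_fasta(handle):
    """Yield (title, sequence) pairs from any iterable of text lines."""
    title = None
    chunks = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:].strip()
            chunks = []
        elif title is not None and line:
            # Continuation of the current record's sequence.
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if title is not None:
        yield title, "".join(chunks)


example = io.StringIO(">seq1 first record\nACGT\nACGT\n>seq2\nTTTT\n")
records = list(parse_fasta(example))
```

Since the function only iterates over its argument, it works equally well with an open file, a StringIO object, or a plain list of lines.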

--Michiel.


From biopython at maubp.freeserve.co.uk  Sun Jun 29 15:28:04 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 29 Jun 2008 16:28:04 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com>
References: <492634.64872.qm@web62414.mail.re1.yahoo.com>
	<320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com>
Message-ID: <320fb6e00806290828u7133ee40x8feba14b19c13be8@mail.gmail.com>

> On Fri, Jun 27, 2008 at 5:40 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>> For the upcoming release, this means that we will remove the modules
>> that were deprecated in Biopython 1.44. In that release, quite a lot of
>> modules were deprecated; these modules will not appear in Biopython 1.46.
>>
>> Some of the modules to be removed are relatively simple cases, which I
>> think can be removed without causing any real pain to anybody:
>>
>> ...

I see you removed most of the easy ones before making Biopython 1.46.

Just to let you all know that I've just removed these three:

>> Bio.SeqIO.FASTA.py
>> Bio.SeqIO.generic.py
>> Bio.FormatIO

Peter


From fkauff at biologie.uni-kl.de  Mon Jun 30 08:34:30 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Mon, 30 Jun 2008 10:34:30 +0200
Subject: [Biopython-dev] Recent Bio.Nexus updates
In-Reply-To: <320fb6e00806271049vdfb15co30a05c0a93963aba@mail.gmail.com>
References: <320fb6e00806271049vdfb15co30a05c0a93963aba@mail.gmail.com>
Message-ID: <48689A96.4010805@biologie.uni-kl.de>

Hi Peter and Michiel,

Peter wrote:
> Hi Frank,
>
> I see you've got your CVS access working again - good :)
>
> I wanted to ask you about two of your recent changes to Bio/Nexus/Nexus.py
>
> First of all, you've added a new method export_phylip(), which seems
> to be a simple function to record the Nexus object's alignment as a
> PHYLIP format alignment.  One point of concern is code duplication
> (Bio.AlignIO can write PHYLIP files).  Also, you don't seem to be
> following the "spec" strictly, as the taxon names are not cropped to
> ten characters, nor are any "illegal" characters dealt with.  
True - I ignored this deliberately. I think except for old PHYLIP
itself, all software I know handles longer taxon names by default. The
format I used here is sometimes referred to as "relaxed phylip", but as it
has become the standard for what people call phylip format, I just
kept it this way.

> More
> generally, I wonder if this method is really needed - perhaps instead
> a general method to return a Bio.Align.Generic.Alignment object would
> be preferable.  This could then be used in conjunction with any of the
> alignment formats supported in Bio.AlignIO.
>   
That is a possibility. I would then vouch for adding support for 
"relaxed phylip" to AlignIO.PhylipIO (which I could easily do with a 
little modification of Nexus.export_phylip() myself).
> Secondly, you seem to have reverted the alphabet change to
> Bio/Nexus/Nexus.py made in revision 1.12 to fix Bug 2380.  Was this
> deliberate or just accidental?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2380
>
>   
Sorry for that. I missed that bug. Thanks for re-fixing it.

Frank
> Thanks,
>
> Peter
>
>   


-- 
J-Prof. Dr. Frank Kauff
Molecular Phylogenetics
FB Biologie, 13/276
TU Kaiserslautern
Postfach 3049
67653 Kaiserslautern

Tel. +49 (0)631 205-2562
Fax. +49 (0)631 205-2998
email: fkauff at biologie.uni-kl.de
skype: frank.kauff


From biopython at maubp.freeserve.co.uk  Mon Jun 30 09:12:17 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Jun 2008 10:12:17 +0100
Subject: [Biopython-dev] Recent Bio.Nexus updates
In-Reply-To: <48689A96.4010805@biologie.uni-kl.de>
References: <320fb6e00806271049vdfb15co30a05c0a93963aba@mail.gmail.com>
	<48689A96.4010805@biologie.uni-kl.de>
Message-ID: <320fb6e00806300212m6b129a17he9dfd7c8af7cbc03@mail.gmail.com>

>> First of all, you've added a new method export_phylip(), which seems
>> to be a simple function to record the Nexus object's alignment as a
>> PHYLIP format alignment.  One point of concern is code duplication
>> (Bio.AlignIO can write PHYLIP files).  Also, you don't seem to be
>> following the "spec" strictly, as the taxon names are not cropped to
>> ten characters, nor are any "illegal" characters dealt with.
>
> True - I ignored this deliberately. I think except for old PHYLIP itself,
> all software I know handles longer taxon names by default. The format I used
> here is sometimes referred to as "relaxed phylip", but as it has become the
> standard for what people call phylip format, I just kept it this way.

Sadly "relaxed phylip" is an even less well defined format!

>> More
>> generally, I wonder if this method is really needed - perhaps instead
>> a general method to return a Bio.Align.Generic.Alignment object would
>> be preferable.  This could then be used in conjunction with any of the
>> alignment formats supported in Bio.AlignIO.
>
> That is a possibility. I would then vouch for adding support for "relaxed
> phylip" to AlignIO.PhylipIO (which I could easily do with a little
> modification of Nexus.export_phylip() myself)

Would you expect spaces to be allowed in the names for "relaxed
phylip" files?  Writing the files is easy - checking that other tools
can understand them is more hassle.  And the flip side of this is
reading assorted versions of "relaxed phylip" is also tricky.  If you
have a collection of various "valid" files (ideally output from or
accepted by mainstream tools) we could use that to put together a test
suite which would define the de-facto standard.  But without that, I
wouldn't be so confident about adding this to Biopython.
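
To illustrate the difference under discussion, here is a rough sketch, not Bio.AlignIO or Nexus.export_phylip(). The function write_phylip and its "relaxed" rules are assumptions made for this example only (full-length name, spaces replaced by underscores, one space before the sequence), since as noted the relaxed variant has no agreed definition.

```python
def write_phylip(records, strict=True):
    """Return sequential PHYLIP-style text for (name, sequence) pairs."""
    records = list(records)
    if not records:
        raise ValueError("need at least one record")
    length = len(records[0][1])
    lines = [" %d %d" % (len(records), length)]
    for name, seq in records:
        if len(seq) != length:
            raise ValueError("sequences must all have the same length")
        if strict:
            # Classic PHYLIP: the name occupies exactly ten columns.
            label = name[:10].ljust(10)
        else:
            # Assumed "relaxed" rule: keep the whole name, forbid spaces
            # inside it, and separate it from the sequence with one space.
            label = name.replace(" ", "_") + " "
        lines.append(label + seq)
    return "\n".join(lines) + "\n"


# Hypothetical taxon names that only differ after the tenth character.
records = [("Homo_sapiens_1", "ACGT"), ("Homo_sapiens_2", "ACGA")]
strict_text = write_phylip(records, strict=True)
relaxed_text = write_phylip(records, strict=False)
```

With strict cropping both names collapse to the same ten characters, which is one practical reason people drift towards the relaxed form; the price is that a reader can no longer rely on fixed columns to find where the name ends.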

>> Secondly, you seem to have reverted the alphabet change to
>> Bio/Nexus/Nexus.py made in revision 1.12 to fix Bug 2380.  Was this
>> deliberate or just accidental?
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2380
>
> Sorry for that. I missed that bug. Thanks for re-fixing it.

There may be a more elegant way of fixing this.

Peter


From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 10:21:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 06:21:26 -0400
Subject: [Biopython-dev] [Bug 2509] Deprecating the .data property of the
	Seq and MutableSeq objects
In-Reply-To: <bug-2509-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301021.m5UALQVF020449@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2509


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 06:21 EST -------
See also Bug 2351, Make Seq more like a string, even subclass string?


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 13:35:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 09:35:59 -0400
Subject: [Biopython-dev] [Bug 2531] New: Nexus and fasta parsers have a
	problem with identical taxa names
Message-ID: <bug-2531-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531

           Summary: Nexus and fasta parsers have a problem with identical
                    taxa names
           Product: Biopython
           Version: 1.44
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P4
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: abetanco at staffmail.ed.ac.uk


When identical taxa names are used to identify different sequences, the nexus
and fasta parser will output both taxa names, but output the same sequence for
each of them. 
If it's not possible to store both sequences, maybe it would be better if only
one of the sequences were  written out, so at least it's obvious there's a
problem?


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 13:48:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 09:48:24 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301348.m5UDmO70030666@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 09:48 EST -------
Which Nexus and Fasta parsers?  There is more than one way to load these file
formats in Biopython - could you show us some sample code please?

You can attach a pair of example input files if it helps.

Thanks.  Peter.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 14:21:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 10:21:41 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301421.m5UELfPj000799@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 10:21 EST -------
Can I repeat my request that you upload an example file (by creating an
attachment to this bug) of a FASTA and NEXUS file that doesn't work for you.

Here is a small Nexus file I just created by hand, with repeated taxon
CYS1_DICDI (with almost the same sequence), and then below some example code
using Bio.Nexus to parse it.

==================================
#NEXUS
[TITLE: NoName]

begin data;
dimensions ntax=4 nchar=50;
format interleave datatype=protein   gap=- symbols="FSTNKEYVQMCLAWPHDRIG";

matrix
CYS1_DICDI          -----MKVIL LFVLAVFTVF VSS------- --------RG IPPEEQ---- 
ALEU_HORVU          MAHARVLLLA LAVLATAAVA VASSSSFADS NPIRPVTDRA ASTLESAVLG 
CATH_HUMAN          ------MWAT LPLLCAGAWL LGV------- -PVCGAAELS VNSLEK----
CYS1_DICDI          -----MKVIL LFVLAVFTVF VSS------- --------RG IPPEEQ---X
;
end; 
==================================

Then in python,
>>> filename = ...
>>> handle = open(filename)
>>> from Bio.Nexus import Nexus
>>> n = Nexus.Nexus(handle)
>>> print n.matrix.keys()
['CATH_HUMAN', 'CYS1_DICDI', 'CYS1_DICDI.copy', 'ALEU_HORVU']
>>> n.matrix['CYS1_DICDI']
Seq('-----MKVILLFVLAVFTVFVSS---------------RGIPPEEQ----', IUPACProtein())
>>> n.matrix['CYS1_DICDI.copy']
Seq('-----MKVILLFVLAVFTVFVSS---------------RGIPPEEQ---X', IUPACProtein())

Note that Bio.Nexus has automatically renamed the duplicate entry to
'CYS1_DICDI.copy' and that the two different sequences have been loaded
correctly.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 14:36:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 10:36:06 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301436.m5UEa6WK001525@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #3 from abetanco at staffmail.ed.ac.uk  2008-06-30 10:36 EST -------
Created an attachment (id=956)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=956&action=view)
nexus file

Sorry for the overly complicated nexus file, but I can't seem to reproduce the
bug with a simple example.  In this case, HI99.Line5 is entered twice, and
differs just at three sites (249, 417, and 452).  In the FASTA output, both
entries carry the first sequence's bases at those three sites.

                249     417     452
nexus file
HI99.Line5      T       T       A
HI99.Line5      C       C       G
fasta output
HI99.Line5      T       T       A
HI99.Line5      T       T       A


To do the conversion, I used this, which I think is just copied off the
Biopython documentation site:

#! /usr/bin/python


if __name__ == '__main__' :

        from Bio import SeqIO
        import sys

        input_handle = open(sys.argv[1], "rU")
        output_handle = open(sys.argv[1] + ".fas", "w")

        sequences = SeqIO.parse(input_handle, "nexus")
        SeqIO.write(sequences, output_handle, "fasta")

        output_handle.close()
        input_handle.close()


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 14:52:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 10:52:08 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301452.m5UEq8DN002181@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 10:52 EST -------
Thanks for the example file - I can now reproduce a problem, which is progress.

There is a rather cryptic error message from Bio.SeqIO, due to the fact that
when Bio.Nexus parses the file it doesn't create a matrix.

You can see this by using Bio.Nexus directly:

>>> filename = ...
>>> handle = open(filename)
>>> from Bio.Nexus import Nexus
>>> n = Nexus.Nexus(handle)
>>> n.matrix.keys()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'keys'
>>> n.matrix is None
True

This explains why trying to use Bio.SeqIO gives the following exception:
TypeError: argument of type 'NoneType' is not iterable

So, from my point of view this is good news (joke) as it's not really a problem
in Bio.SeqIO - although I will fix Bio.SeqIO so it fails gracefully.

This seems to be a problem in Bio.Nexus, so it's a job for Frank...

I've got a couple more questions for you:

(1) Where did this file come from?  I'm not an expert on the details of the
Nexus file format, but I am wondering which program wrote this file, as perhaps
it is invalid in some way?

(2) Could we add it to Biopython as an example for our unit tests?  It might be
a bit big as it is, but we could cut it down a little by hand first.

P.S. I have retitled the bug from "Nexus and fasta parsers have a problem with
identical taxa names" to "Bio.Nexus has a problem with identical taxa names".

You don't seem to be parsing in any FASTA files, just trying to write one.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From mjldehoon at yahoo.com  Mon Jun 30 14:55:16 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 30 Jun 2008 07:55:16 -0700 (PDT)
Subject: [Biopython-dev] New release
Message-ID: <97693.82874.qm@web62401.mail.re1.yahoo.com>

Sorry, but I still can't edit the Biopython wiki pages, so I can't make the new release available. Can other people edit these pages?

--Michiel.


From biopython at maubp.freeserve.co.uk  Mon Jun 30 14:56:39 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Jun 2008 15:56:39 +0100
Subject: [Biopython-dev] Bug 2531 - Bio.Nexus problem with file with
	repeated id
Message-ID: <320fb6e00806300756l7e9f6fe6sc68cf1884cb2994@mail.gmail.com>

Hi Frank,

Would you be able to take a look at this new report, bug 2531:
http://bugzilla.open-bio.org/show_bug.cgi?id=2531

The reporter Andrea Betancourt says she is using Biopython 1.44, while
I am on CVS (which should be equivalent to Biopython 1.46 for
Bio.Nexus).  Her reported symptoms and what I see are different... but
she has provided a test file to work from.

Thanks,

Peter


From p.j.a.cock at googlemail.com  Mon Jun 30 15:00:22 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 30 Jun 2008 16:00:22 +0100
Subject: [Biopython-dev] New release
In-Reply-To: <97693.82874.qm@web62401.mail.re1.yahoo.com>
References: <97693.82874.qm@web62401.mail.re1.yahoo.com>
Message-ID: <320fb6e00806300800rd74082eqabbd1a2bef66da76@mail.gmail.com>

On Mon, Jun 30, 2008 at 3:55 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Sorry, but I still can't edit the Biopython wiki pages, so I can't make the new
> release available. Can other people edit these pages?

No - as soon as I saw the wiki came back to life last night I tried,
and have tried again today.  I can make changes, view the preview and
differences, but I just get a blank page when I click submit.  I sent
off an email to OBF to alert them in case you hadn't.

I see the Biopython 1.46 files themselves are now online at
http://biopython.org/DIST/ so at least some of the web-server is
running properly :)

We could just do the announcement by email and the news page, and fix
the wiki later.  But it does risk causing a little confusion in the
short term.

Peter


From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 15:36:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 11:36:17 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301536.m5UFaHlo004669@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #5 from abetanco at staffmail.ed.ac.uk  2008-06-30 11:36 EST -------
The file was written by a Windows program called DNAsp
(http://www.ub.es/dnasp/), which is widely used by population geneticists,
which is not to say that it didn't write an invalid file.  But it looked OK to
me, other than the too short taxa names. (Those too short names were inherited
from another program).
I don't mind you using it for the unit tests, but it would be nice if it were cut
down or something, as it is both unwieldy and unpublished data.
A.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 15:38:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 11:38:00 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301538.m5UFc0S4004813@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


fkauff at biologie.uni-kl.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED


------- Comment #6 from fkauff at biologie.uni-kl.de  2008-06-30 11:38 EST -------
Handling a handle works like a charm for me with the attachment provided:

>>> handle=open('eg.nex')
>>> n=Nexus.Nexus(handle)
>>> n.matrix.keys()
['HI99.Line5.copy', 'am', 'HI99.Line1.copy', 'ezo', 'HI99.Line0.copy',
'DI05.Line5.copy', 'DI05.Line0.copy', 'DI05.Line8.copy1', 'DI05.Line1.copy1',
'HI99.Line3.copy', 'HI99.Line1.copy1', 'DI05.Line1.copy', 'DI05.Line9.copy',
'DI05.Line8.copy', 'HI99.Line4.copy', 'vir', 'DI05.Line8', 'DI05.Line9',
'HI99.Line2.copy', 'DI05.Line2', 'DI05.Line3', 'DI05.Line0', 'DI05.Line1',
'DI05.Line6', 'DI05.Line7', 'DI05.Line4', 'DI05.Line5', 'HI99.Line1',
'HI99.Line0', 'HI99.Line3', 'HI99.Line2', 'HI99.Line5', 'HI99.Line4']

However, Nexus.py needs unique taxon names. Non-unique taxon names won't make
much sense in a nexus file imho. If Nexus.py encounters non-unique names, they
are made unique by adding a suffix (.copy, .copy1, ...) to them. Could this cause
problems for SeqIO.NexusIO?
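
As a rough sketch of that suffix scheme (not the actual Nexus.py code; make_unique is a hypothetical helper and the exact numbering rule is an assumption based on the names seen above): the first repeat of a name gets ".copy", later repeats get ".copy1", ".copy2" and so on, and the input order is preserved.

```python
def make_unique(names):
    """Return names in the same order, with repeats given a .copy suffix."""
    seen = set()
    unique = []
    for name in names:
        candidate = name
        count = 0
        while candidate in seen:
            # First clash gets ".copy", later ones ".copy1", ".copy2", ...
            candidate = name + ".copy" + (str(count) if count else "")
            count += 1
        seen.add(candidate)
        unique.append(candidate)
    return unique


taxa = ["CYS1_DICDI", "ALEU_HORVU", "CATH_HUMAN", "CYS1_DICDI"]
renamed = make_unique(taxa)
tripled = make_unique(["a", "a", "a"])
```

Because the result keeps the original order, a list built this way could be used both as dictionary keys and as the order in which to write the records out.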

Frank


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 16:12:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 12:12:29 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301612.m5UGCTnZ006531@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 12:12 EST -------
It looks like I didn't have the latest version of Bio.Nexus on this machine
which may have added to the confusion.  I've just updated to CVS (i.e. almost
exactly Biopython 1.46).  My issue with the matrix being None has gone away. 
Oops.

>>> from Bio.Nexus import Nexus
>>> n = Nexus.Nexus(open('eg.nex'))
>>> n.matrix.keys()
['HI99.Line5.copy', 'am', 'HI99.Line1.copy', 'ezo', 'HI99.Line0.copy',
'DI05.Line5.copy', 'DI05.Line0.copy', 'DI05.Line8.copy1', 'DI05.Line1.copy1',
'HI99.Line3.copy', 'HI99.Line1.copy1', 'DI05.Line1.copy', 'DI05.Line9.copy',
'DI05.Line8.copy', 'HI99.Line4.copy', 'vir', 'DI05.Line8', 'DI05.Line9',
'HI99.Line2.copy', 'DI05.Line2', 'DI05.Line3', 'DI05.Line0', 'DI05.Line1',
'DI05.Line6', 'DI05.Line7', 'DI05.Line4', 'DI05.Line5', 'HI99.Line1',
'HI99.Line0', 'HI99.Line3', 'HI99.Line2', 'HI99.Line5', 'HI99.Line4']
>>> assert [id for id in n.matrix] == n.matrix.keys()
>>> n.matrix['HI99.Line5']
Seq('ATCGATAGCATTGCGG-GGACGACGATGGACATTTGGAAAACGAATATGAAAAT...GAG',
IUPACAmbiguousDNA())
>>> n.matrix['HI99.Line5'][249-1]
'T'
>>> n.matrix['HI99.Line5'][417-1]
'T'
>>> n.matrix['HI99.Line5'][452-1]
'A'
>>> n.matrix['HI99.Line5.copy']
Seq('ATCGATAGCATTGCGGCGGACGACGATGGACATTTGGAAAACGAATATGAAAAT...GAG',
IUPACAmbiguousDNA())
>>> n.matrix['HI99.Line5.copy'][249-1]
'C'
>>> n.matrix['HI99.Line5.copy'][417-1]
'C'
>>> n.matrix['HI99.Line5.copy'][452-1]
'G'

So far this looks good.  However:

>>> n.original_taxon_order
['vir', 'am', 'ezo', 'DI05.Line5', 'DI05.Line1', 'DI05.Line9', 'DI05.Line2',
'DI05.Line3', 'HI99.Line2', 'HI99.Line1', 'HI99.Line5', 'DI05.Line4',
'DI05.Line1', 'DI05.Line7', 'HI99.Line3', 'DI05.Line6', 'DI05.Line8',
'HI99.Line4', 'DI05.Line1', 'HI99.Line1', 'DI05.Line8', 'DI05.Line5',
'HI99.Line2', 'HI99.Line0', 'HI99.Line0', 'HI99.Line5', 'DI05.Line9',
'HI99.Line3', 'DI05.Line0', 'DI05.Line0', 'HI99.Line4', 'HI99.Line1',
'DI05.Line8']

In the Bio.SeqIO code that calls Bio.Nexus, I hadn't realized that Bio.Nexus
kept the un-edited taxon names around.  It is this list of the non-unique
original identifiers that Bio.SeqIO was using, which explains why you end up
with two copies of HI99.Line5.
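
The effect can be reproduced with plain Python objects. The dictionary and list below are hypothetical stand-ins for n.matrix and n.original_taxon_order, holding only the three differing sites; this is a sketch of the failure mode, not the Bio.SeqIO code.

```python
# Stand-in for n.matrix: keys are the renamed (unique) taxon names.
matrix = {
    "vir": "GGG",
    "HI99.Line5": "TTA",
    "HI99.Line5.copy": "CCG",
}

# Stand-in for n.original_taxon_order: names as they appear in the file.
original_order = ["vir", "HI99.Line5", "HI99.Line5"]


def records_by_original_names(matrix, order):
    """Look each original name up in the matrix (the buggy approach)."""
    return [(name, matrix[name]) for name in order]


buggy = records_by_original_names(matrix, original_order)

# Keys that are never reached when iterating over the original names.
never_visited = set(matrix) - set(original_order)
```

Both 'HI99.Line5' entries in the result carry the first sequence, while the renamed 'HI99.Line5.copy' entry is silently skipped, which matches the symptom in the report.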

Sorry Frank - I was pointing fingers when it was my own bug after all!


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 16:20:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 12:20:20 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301620.m5UGKK7M007026@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 12:20 EST -------
Frank, 

Looking back, the reason I was using the original_taxon_order list was I wanted
to get the sequences in their original order.  I see now that I can't use the
elements in this list as keys to the matrix because the matrix keys are the
modified taxon names.

Is there any way to get the modified taxon names in the original order?  Other
than looping over original_taxon_order and repeating your naming algorithm?

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 17:07:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 13:07:05 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301707.m5UH75I7009356@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 13:07 EST -------
Created an attachment (id=957)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=957&action=view)
Sample input file

Simple example file without a TAXA block

Second example file to follow




From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 17:22:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 13:22:23 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301722.m5UHMNo4010009@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 13:22 EST -------
Created an attachment (id=958)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=958&action=view)
Second example file

Using the first file where there is no TAXA block:

>>> from Bio.Nexus import Nexus
>>> n = Nexus.Nexus(open('dup_names.nex'))
>>> print n.matrix.keys()
['CATH_HUMAN', 'CYS1_DICDI', 'CYS1_DICDI.copy', 'ALEU_HORVU']
>>> print n.original_taxon_order
['CYS1_DICDI', 'ALEU_HORVU', 'CATH_HUMAN', 'CYS1_DICDI.copy']

Then with a TAXA block,

>>> n2 = Nexus.Nexus(open('dup_names2.nex'))
>>> print n2.matrix.keys()
['CATH_HUMAN', 'CYS1_DICDI', 'CYS1_DICDI.copy', 'ALEU_HORVU']
>>> print n2.original_taxon_order
['CYS1_DICDI', 'ALEU_HORVU', 'CATH_HUMAN', 'CYS1_DICDI']

Notice the different behaviour of the original_taxon_order list.  In the first
case it gets the modified names, in the second case it doesn't.
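
For illustration only, a minimal sketch of a check that separates the two
cases. It uses plain lists copied from the sessions above rather than a live
Nexus object, and the helper name order_matches_keys is hypothetical (it is
not part of Bio.Nexus):

```python
# Values copied from the two interactive sessions above.
matrix_keys = ['CATH_HUMAN', 'CYS1_DICDI', 'CYS1_DICDI.copy', 'ALEU_HORVU']
order_without_taxa_block = ['CYS1_DICDI', 'ALEU_HORVU', 'CATH_HUMAN',
                            'CYS1_DICDI.copy']
order_with_taxa_block = ['CYS1_DICDI', 'ALEU_HORVU', 'CATH_HUMAN',
                         'CYS1_DICDI']


def order_matches_keys(order, keys):
    """Return True if every matrix key appears exactly once in order.

    Hypothetical helper: compares the two lists as sorted sequences, so a
    duplicated or missing name makes the comparison fail.
    """
    return sorted(order) == sorted(keys)


print(order_matches_keys(order_without_taxa_block, matrix_keys))  # True
print(order_matches_keys(order_with_taxa_block, matrix_keys))     # False
```

When the check is False, looking up each element of the order list in the
matrix returns one sequence twice and never reaches its renamed copy.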

Is this deliberate, Frank?  On the other hand, maybe Nexus files without a TAXA
block are rare in real life?  Are they?




From fkauff at biologie.uni-kl.de  Mon Jun 30 17:10:15 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Mon, 30 Jun 2008 19:10:15 +0200
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a
 problem with identical taxa names
In-Reply-To: <200806301612.m5UGCTnZ006531@portal.open-bio.org>
References: <200806301612.m5UGCTnZ006531@portal.open-bio.org>
Message-ID: <48691377.803@biologie.uni-kl.de>


bugzilla-daemon at portal.open-bio.org wrote:
>
>
> In the Bio.SeqIO code that calls Bio.Nexus, I hadn't realized that Bio.Nexus
> kept the un-edited taxon names around.  It is this list of the non-unique
> original identifiers that Bio.SeqIO was using, which explains why you end up
> with two copies of HI99.Line5.
>
> Sorry Frank - I was pointing fingers when it was my own bug after all!
>
>
> Looking back, the reason I was using the original_taxon_order list was I wanted
> to get the sequences in their original order.  I see now that I can't use the
> elements in this list as keys to the matrix because the matrix keys are the
> modified taxon names.
>
> Is there any way to get the modified taxon names in the original order?  Other
> than looping over original_taxon_order and repeating your naming algorithm?
>   
Actually, this *IS* a bug. All fingers were pointing correctly...
original_taxon_order was kept just for compatibility, and is the
same as taxlabels. taxlabels is supposed to hold the unique identifiers;
it just doesn't work correctly with non-unique ids in interleaved data
sets.
Fix following soon.

Frank


From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 17:28:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 13:28:25 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301728.m5UHSPVk010377@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 13:28 EST -------
Created an attachment (id=959)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=959&action=view)
Tentative patch to Bio/SeqIO/NexusIO.py

This seems to cope with Andrea's real input file and my two hand-written ones.
It works by taking the original_taxon_order list and applying the
disambiguation algorithm if needed.  Not very elegant!
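
The idea can be sketched roughly as follows. This is not the attached patch:
the function name unique_names_in_order is hypothetical, and only the ".copy"
suffix is taken from the output shown in comment 10. Numbering third and later
copies as ".copy2", ".copy3" is an assumption of this sketch and may differ
from what Bio.Nexus actually does.

```python
def unique_names_in_order(original_names, suffix=".copy"):
    """Return unique names, preserving the original order.

    The first occurrence of a name is kept as-is.  A repeated name gets
    the suffix appended; appending a counter for third and later copies
    is an assumption of this sketch.
    """
    seen = set()
    result = []
    for name in original_names:
        candidate = name
        count = 0
        # Keep trying new candidates until one is unused.
        while candidate in seen:
            count += 1
            if count == 1:
                candidate = name + suffix
            else:
                candidate = "%s%s%d" % (name, suffix, count)
        seen.add(candidate)
        result.append(candidate)
    return result


names = ['CYS1_DICDI', 'ALEU_HORVU', 'CATH_HUMAN', 'CYS1_DICDI']
print(unique_names_in_order(names))
# ['CYS1_DICDI', 'ALEU_HORVU', 'CATH_HUMAN', 'CYS1_DICDI.copy']
```

A list that is already unique comes back unchanged, which is the "if needed"
part: the same call works for files with and without a TAXA block.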




From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 19:29:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 15:29:32 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
	with identical taxa names
In-Reply-To: <bug-2531-42@http.bugzilla.open-bio.org/>
Message-ID: <200806301929.m5UJTWYQ015982@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531


------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 15:29 EST -------
Created an attachment (id=960)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=960&action=view)
Suggested patch to Bio/Nexus/Nexus.py

This modifies Bio.Nexus to ensure that the original_taxon_order uses the
original (duplicated) names, resolving the discrepancy I reported in comment
10.




From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 21:18:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 17:18:48 -0400
Subject: [Biopython-dev] [Bug 2520] Reading ACE assembly contig files in
	Bio.SeqIO
In-Reply-To: <bug-2520-42@http.bugzilla.open-bio.org/>
Message-ID: <200806302118.m5ULImoB021255@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2520


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 17:18 EST -------
Checked into CVS.

We'll need to revisit this once we have a good way of dealing with
per-letter-annotation which would be suitable for the quality scores.




From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 22:50:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 18:50:01 -0400
Subject: [Biopython-dev] [Bug 2532] New: Using IUPAC alphabets in mixed case
	Seq objects
Message-ID: <bug-2532-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2532

           Summary: Using IUPAC alphabets in mixed case Seq objects
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


Bio.Alphabet.IUPAC defines a number of alphabets with defined lists of valid
letters which are in upper case ONLY.

Bio.Nexus and Bio.Sequencing.Phd create Seq objects which use these alphabets
even with mixed case sequences.

This contradicts how I think the alphabet's .letters property is intended to be
used (although currently this is not enforced by the Seq object).

I suggest either:

(a) Bio.Nexus etc switch to using generic DNA/RNA alphabets for any Seq objects
including lower case letters (or more simply, all Seq objects).

(b) We add lower case and mixed case variants of the alphabet objects, and use
the mixed case IUPAC alphabets in Bio.Nexus etc for the Seq objects.

There is also the option of (c) Extend the existing upper case only IUPAC
alphabets to include lower case too, but I fear this could have unexpected side
effects (e.g. where people are looping over the expected set of letters).
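
To make the mismatch concrete, here is a small self-contained sketch. It does
not import Bio.Alphabet; the constant UPPER_CASE_DNA_LETTERS is a stand-in for
an upper case only .letters string, and letters_outside_alphabet is a
hypothetical helper, not an existing Biopython function.

```python
# Stand-in for the .letters string of an upper case only DNA alphabet.
UPPER_CASE_DNA_LETTERS = "GATC"


def letters_outside_alphabet(sequence, letters):
    """Return the sorted characters of sequence that are not in letters."""
    return sorted(set(sequence) - set(letters))


# An upper case sequence is consistent with the alphabet's letters.
print(letters_outside_alphabet("GATTACA", UPPER_CASE_DNA_LETTERS))
# []

# A mixed case sequence is not, although nothing currently enforces this.
print(letters_outside_alphabet("GATtaca", UPPER_CASE_DNA_LETTERS))
# ['a', 'c', 't']

# Option (c) would change what any loop over the letters sees:
# eight letters instead of four in this stand-in example.
extended_letters = UPPER_CASE_DNA_LETTERS + UPPER_CASE_DNA_LETTERS.lower()
print(len(extended_letters))
# 8
```

Code that counts or tabulates per letter by iterating over the alphabet would
silently get extra entries under option (c), which is the side effect I am
worried about.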




From bugzilla-daemon at portal.open-bio.org  Mon Jun 30 22:51:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Jun 2008 18:51:17 -0400
Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq
	objects
In-Reply-To: <bug-2532-42@http.bugzilla.open-bio.org/>
Message-ID: <200806302251.m5UMpHBf024519@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2532


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 18:51 EST -------
Created an attachment (id=961)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=961&action=view)
Patch to Bio.Sequencing.Phd

This takes the simple route of using a generic DNA alphabet.

