From bugzilla-daemon at portal.open-bio.org  Sat Nov  1 00:02:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 1 Nov 2008 00:02:49 -0400
Subject: [Biopython-dev] [Bug 2627] Updated Bio.MarkovModel to remove
	oldnumeric and listfns imports
In-Reply-To: <bug-2627-42@http.bugzilla.open-bio.org/>
Message-ID: <200811010402.mA142nUi010329@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2627


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp  2008-11-01 00:02 EST -------
I made some changes to this patch and committed it to CVS; see MarkovModel.py
revision 1.9.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Sat Nov  1 01:38:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 1 Nov 2008 01:38:41 -0400
Subject: [Biopython-dev] [Bug 2631] Updated Bio.MaxEntropy to remove listfns
	import
In-Reply-To: <bug-2631-42@http.bugzilla.open-bio.org/>
Message-ID: <200811010538.mA15cfGM016656@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2631


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp  2008-11-01 01:38 EST -------
Committed to CVS with some changes; see MaxEntropy.py versions 1.8 and 1.9.
I added your example at the bottom of Bio/MaxEntropy.py.
Next time, please attach a patch instead of the complete new code for a
module. Thanks!



From bugzilla-daemon at portal.open-bio.org  Sat Nov  1 02:59:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 1 Nov 2008 02:59:40 -0400
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811010659.mA16xedF020106@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp  2008-11-01 02:59 EST -------
I committed part of this patch to CVS; see NaiveBayes.py revision 1.9.
Could you check your classify function? It seems to contain some debugging
statements. Also, do we need the classifyprob function?
If you send in a new version of this code, please attach it as a patch to the
current version of NaiveBayes.py in CVS.
Thanks!



From bugzilla-daemon at portal.open-bio.org  Sat Nov  1 17:22:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 1 Nov 2008 17:22:53 -0400
Subject: [Biopython-dev] [Bug 2592] numpy migration for Bio.PDB.Vector
In-Reply-To: <bug-2592-42@http.bugzilla.open-bio.org/>
Message-ID: <200811012122.mA1LMrf6021694@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2592


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-01 17:22 EST -------
Fixed in CVS, see Bio/PDB/Vector.py revision 1.45



From bugzilla-daemon at portal.open-bio.org  Sat Nov  1 18:11:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 1 Nov 2008 18:11:47 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcribe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811012211.mA1MBl3b026482@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #26 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-01 18:11 EST -------
Here is an example of how the updated Seq object might be used (taken from the
new edition of the tutorial in CVS):

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", IUPAC.unambiguous_dna)
>>> coding_dna.translate()
Seq('MAIVMGR*KGAR*', HasStopCodon(IUPACProtein(), '*'))
>>> coding_dna.translate(to_stop=True)
Seq('MAIVMGR', IUPACProtein())

Using the Vertebrate Mitochondrial table instead:

>>> coding_dna.translate(table="Vertebrate Mitochondrial")
Seq('MAIVMGRWKGAR*', HasStopCodon(IUPACProtein(), '*'))
>>> coding_dna.translate(table=2)
Seq('MAIVMGRWKGAR*', HasStopCodon(IUPACProtein(), '*'))
>>> coding_dna.translate(table=2, to_stop=True)
Seq('MAIVMGRWKGAR', IUPACProtein())

As I said in comment 24, the name "to_stop" and its behaviour are taken from
the old (now obsolete) Bio.Translate module.

-------------------------------------------------------------

I'm also considering adding a further boolean argument (see comment
22):

> Validate the first codon is a valid start codon, and translate
> it as M (even if going on the genetic code it would normally be
> say L).  This should be a boolean argument defaulting to False,
> possible names "start", "check_start", "from_start", ...

I would prefer to avoid calling this argument "start" given the existing
meaning associated with "start" and "end" used in python strings (for
specifying a sub-sequence to be translated - discussed earlier on this bug).

This would be especially useful for translating a gene/CDS sequence into
protein where making sure a non-standard start codon is translated as "M" is
non-trivial.
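
A rough sketch of how such a check might behave, written in plain Python rather
than against the Seq object. The function name translate_cds, the check_start
argument name and the hard-coded start codons are assumptions made for
illustration only; the argument name was still undecided at this point.

```python
# Hypothetical sketch, not Biopython code.
# Standard genetic code (NCBI table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
START_CODONS = ("TTG", "CTG", "ATG")  # start codons listed for NCBI table 1


def translate_cds(dna, check_start=False):
    """Translate a DNA string; optionally validate the start and force it to M."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of three")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    protein = [CODON_TABLE[codon] for codon in codons]
    if check_start:
        if not codons or codons[0] not in START_CODONS:
            raise ValueError("first codon is not a valid start codon")
        protein[0] = "M"  # a start codon is read as methionine
    return "".join(protein)


plain = translate_cds("TTGGCCTAG")                      # TTG read as leucine
as_cds = translate_cds("TTGGCCTAG", check_start=True)   # TTG read as a start
```

With the default setting the first codon goes through the table like any other;
with check_start=True it must be a recognised start codon and is emitted as M.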



From bugzilla-daemon at portal.open-bio.org  Mon Nov  3 06:17:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 3 Nov 2008 06:17:59 -0500
Subject: [Biopython-dev] [Bug 2638] New: test_PopGen_SimCoal_nodepend.py
	fails on Windows do to newline issue
Message-ID: <bug-2638-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2638

           Summary: test_PopGen_SimCoal_nodepend.py fails on Windows do to
                    newline issue
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Unit Tests
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


This unit test attempts to regenerate a plain text SimCoal file, and currently
fails on Windows (but passes on Linux and Mac OS X).

Patch to follow.



From bugzilla-daemon at portal.open-bio.org  Mon Nov  3 06:22:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 3 Nov 2008 06:22:16 -0500
Subject: [Biopython-dev] [Bug 2638] test_PopGen_SimCoal_nodepend.py fails on
	Windows do to newline issue
In-Reply-To: <bug-2638-42@http.bugzilla.open-bio.org/>
Message-ID: <200811031122.mA3BMGwX013481@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2638


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-03 06:22 EST -------
Created an attachment (id=1030)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1030&action=view)
Patch to the PopGen/SimCoal/Template.py and the unit test

Looking at the code, rather than using \n to mean a platform-aware new line,
\r\n is used. This doesn't always give a CR LF: on Windows, where \n is already
written out as CR LF, you get CR CR LF instead.

Also, are the template files in CVS as plain text files or binary files?  I
haven't double checked but I think they may be checked in as binary files with
DOS/Windows new lines...

I haven't committed this as I don't have SIMCOAL installed to check there are
no side effects.
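
The CR CR LF effect can be reproduced on any platform with a small sketch. It
assumes a current Python 3 and uses io.TextIOWrapper with newline="\r\n" to
imitate a Windows text-mode file; the helper name bytes_written is made up for
this illustration and is not part of the patch.

```python
import io


def bytes_written(text, newline):
    """Return the raw bytes produced by writing text through a text-mode layer."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # leave the BytesIO alone when the wrapper is discarded
    return data


# newline="\r\n" translates every "\n" on output, as Windows text mode does.
doubled = bytes_written("line\r\n", "\r\n")   # explicit CR plus translated LF
correct = bytes_written("line\n", "\r\n")     # plain "\n" gives a single CR LF
unix = bytes_written("line\n", "\n")          # no translation at all
```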



From bugzilla-daemon at portal.open-bio.org  Mon Nov  3 06:22:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 3 Nov 2008 06:22:53 -0500
Subject: [Biopython-dev] [Bug 2638] test_PopGen_SimCoal_nodepend.py fails on
	Windows, newline issue
In-Reply-To: <bug-2638-42@http.bugzilla.open-bio.org/>
Message-ID: <200811031122.mA3BMr8B013540@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2638


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|test_PopGen_SimCoal_nodepend|test_PopGen_SimCoal_nodepend
                   |.py fails on Windows do to  |.py fails on Windows,
                   |newline issue               |newline issue


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-03 06:22 EST -------
Removed typo in the bug summary (title).



From biopython at maubp.freeserve.co.uk  Mon Nov  3 06:48:06 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 3 Nov 2008 11:48:06 +0000
Subject: [Biopython-dev] New line issues in the source zip or tarballs
In-Reply-To: <320fb6e00809080514u5df6d9dej144c783076cbe467@mail.gmail.com>
References: <320fb6e00809060304h429f1085r301170aa93d4eb73@mail.gmail.com>
	<6d941f120809080442r1797666eu70e35c60353c5462@mail.gmail.com>
	<320fb6e00809080514u5df6d9dej144c783076cbe467@mail.gmail.com>
Message-ID: <320fb6e00811030348vb7b6068v549ebfab9f6ec76b@mail.gmail.com>

On Mon, Sep 8 Peter wrote:
> Tiago wrote:
>> Peter wrote:
>>> In the case of test_PopGen_SimCoal_nodepend.py the failure is
>>> expecting simple.par and simple_100_30.par to be exactly the same size
>>> (in class TemplateTest, line 47).  This is not going to be true
>>> when the input file uses Unix new lines but the generated file uses
>>> Windows new lines.  Perhaps using a simple bit of code to load the
>>> files line by line and compare them would work here?
>>
>>  I am currently at a workshop (I belong to the organization committee, so I
>> don't have much time), but I will try to sort this in the next couple of
>> days.
>
> This new line issue has probably been there since Biopython 1.45
> without anyone else spotting it, so I don't see fixing it as urgent.
> Hopefully we can resolve this for the next release instead.

I've filed Bug 2638 on this with a possible patch.  Could you take a
look at this please?
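
For reference, the line-by-line comparison suggested in the quoted text could
look roughly like the sketch below. It assumes a current Python 3, where the
default text mode reads \n, \r\n and \r all as \n; the function name same_lines
is hypothetical and this is not the code in the attached patch.

```python
def same_lines(path_a, path_b):
    """Compare two text files line by line, ignoring the newline style."""
    # Default text mode applies universal newlines on reading, so a file with
    # Unix line endings and one with Windows line endings compare as equal.
    with open(path_a) as handle_a, open(path_b) as handle_b:
        return handle_a.readlines() == handle_b.readlines()
```

Unlike a file size check, this passes when the two files differ only in their
line endings.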

I just tried installing SIMCOAL2 on my Mac, but failed.  To be fair,
they do only appear to support Linux and Windows...

Thanks

Peter

From biopython at maubp.freeserve.co.uk  Mon Nov  3 07:43:22 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 3 Nov 2008 12:43:22 +0000
Subject: [Biopython-dev] Bio.PopGen and SIMCOAL2 installation
Message-ID: <320fb6e00811030443w4d620c83w64c83fdafb9afa96@mail.gmail.com>

Hi Tiago,

I've just compiled SIMCOAL2 on a Linux machine from
http://cmpg.unibe.ch/software/simcoal2/ (version 2.1.2).  If anyone
else tries this, it required the use of -fpermissive on g++ 4.1.2 to
compile (and gave lots of deprecation warnings, plus some trivial ones
about header files which didn't end with a newline).

The make file specifies the executable name as simcoal2_1_2, however
it does not include an install target, so it is up to the user where
to put the binary (e.g. I used ~/bin/ rather than system wide) and
perhaps what to call it.  The provided pre-compiled binary is also
called simcoal2_1_2.

However, Bio.PopGen.SimCoal.Controller seems to assume the executable
will be called just simcoal2 (or simcoal2.exe on Windows), and thus
fails to detect a binary called simcoal2_1_2.  The unit test however is
more flexible and looks for any binary on the path whose name starts
with simcoal2.  Ideally these two should be consistent.

I can make test_PopGen_SimCoal.py pass by installing SIMCOAL2 as
simcoal2 rather than simcoal2_1_2, but is this a SIMCOAL2 installation
issue or a bug in Bio.PopGen.SimCoal.Controller?  In my experience it
is not normal for a Linux tool to include the full version in the
executable name - using just simcoal2 does make more sense.

Thanks,

Peter
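
To make the difference between the two lookups concrete, here is a minimal
sketch of a prefix search over the path. It is an illustration only, not the
code in Bio.PopGen.SimCoal.Controller or in the unit test; the helper name
find_by_prefix is hypothetical, and for brevity it does not check whether a
match is actually executable.

```python
import os


def find_by_prefix(name_prefix, search_path=None):
    """List files on the search path whose names start with name_prefix."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    matches = []
    for directory in search_path.split(os.pathsep):
        if not os.path.isdir(directory):
            continue  # skip empty or stale path entries
        for filename in sorted(os.listdir(directory)):
            full_path = os.path.join(directory, filename)
            if filename.startswith(name_prefix) and os.path.isfile(full_path):
                matches.append(full_path)
    return matches


def has_exact_name(name, search_path=None):
    """Return True if a file called exactly name (or name.exe) is on the path."""
    wanted = (name, name + ".exe")
    return any(os.path.basename(path) in wanted
               for path in find_by_prefix(name, search_path))
```

A binary installed as simcoal2_1_2 is found by the prefix search but not by the
exact-name check, which is the inconsistency described above.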

From bugzilla-daemon at portal.open-bio.org  Mon Nov  3 12:16:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 3 Nov 2008 12:16:41 -0500
Subject: [Biopython-dev] [Bug 2639] New: SeqRecord.init doesn't check for
	arguments to their types
Message-ID: <bug-2639-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2639

           Summary: SeqRecord.init doesn't check for arguments to their
                    types
           Product: Biopython
           Version: 1.47
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: dalloliogm at gmail.com


SeqRecord doesn't check if description is a string when creating SeqRecord
objects.
This causes an error later, when you print the record in formats
like fasta.

>>> from Bio.Seq import Seq
>>> from Bio.SeqRecord import SeqRecord
>>> sr = SeqRecord(Seq('aaa'), description = [1, 2, 3]) # should give an error here!
>>> print sr.fasta
<type 'exceptions.AttributeError'>: 'list' object has no attribute 'replace'

Looking at the SeqRecord.__init__ code, none of the arguments is checked for its
type.
This is a minor bug, but if you want to solve it, you just have to add some
isinstance() checks in the init function.
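
Something along these lines is what I have in mind. This is only a sketch on a
stand-in class (Record is a made-up name, not the real SeqRecord), written for
a current Python 3 where str is the only string type; the default values are
placeholders as well.

```python
class Record(object):
    """Hypothetical stand-in for SeqRecord, reduced to a few fields."""

    def __init__(self, seq, id="<unknown id>",
                 description="<unknown description>"):
        # Reject non-string text fields at construction time, so the error
        # appears where the mistake was made rather than at output time.
        for label, value in (("id", id), ("description", description)):
            if not isinstance(value, str):
                raise TypeError("%s should be a string, not %s"
                                % (label, type(value).__name__))
        self.seq = seq
        self.id = id
        self.description = description


try:
    Record("aaa", description=[1, 2, 3])
    outcome = "accepted"
except TypeError as error:
    outcome = str(error)
```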



From bugzilla-daemon at portal.open-bio.org  Mon Nov  3 13:47:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 3 Nov 2008 13:47:59 -0500
Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for
	arguments to their types
In-Reply-To: <bug-2639-42@http.bugzilla.open-bio.org/>
Message-ID: <200811031847.mA3IlxuE025247@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2639


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-03 13:47 EST -------
Fixed in CVS, although there is a small chance this will break existing scripts
which relied on the old lax behaviour.

Peter

P.S.
Assuming you are using an unmodified Biopython, the last line of your example
wouldn't work:
>>> print sr.fasta

Try:
>>> print sr.format("fasta")



From bugzilla-daemon at portal.open-bio.org  Mon Nov  3 14:33:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 3 Nov 2008 14:33:39 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811031933.mA3JXdcZ028123@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #4 from bsouthey at gmail.com  2008-11-03 14:33 EST -------
(In reply to comment #3)
> I committed part of this patch to CVS; see NaiveBayes.py revision 1.9.
> Could you check your classify function? It seems to contain some debugging
> statements. Also, do we need the classifyprob function?
> If you send in a new version of this code, please attach it as a patch to the
> current version of NaiveBayes.py in CVS.
> Thanks!
> 

Yes, there is a print statement at the end of the 'classify' function (line 125
of attached file) that should be removed (as with any print statements that are
commented out). These were to check that the values were the same as the
original code. 

The classifyprob function can be dropped with no problems. I just wanted to
return the probability but I also recognize that it is not very useful.


I noticed you are using set (line 145 in the new cvs file) which is not
compatible with Python2.3. How should this be addressed?



From tiagoantao at gmail.com  Mon Nov  3 14:34:36 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 3 Nov 2008 19:34:36 +0000
Subject: [Biopython-dev] Bio.PopGen and SIMCOAL2 installation
In-Reply-To: <320fb6e00811030443w4d620c83w64c83fdafb9afa96@mail.gmail.com>
References: <320fb6e00811030443w4d620c83w64c83fdafb9afa96@mail.gmail.com>
Message-ID: <6d941f120811031134p4c0f1756k5ded879de7555dad@mail.gmail.com>

Hi,

On Mon, Nov 3, 2008 at 12:43 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> However, Bio.PopGen.SimCoal.Controller seems to assume the executable
> will be called just simcoal2 (or simcoal2.exe on Windows), and thus
> fails to detect a binary called simcoal2_1_2.  The unit test however is
> more flexible and looks for any binary on the path whose name starts
> with simcoal2.  Ideally these two should be consistent.

I am aware of this, in fact, this issue is documented in the tutorial
(9.5.2.2). The idea is that the binary should be called simcoal2 as
documented. This can be changed of course. My preference would be to
change just the test code. Is this ok with you?

> I can make test_PopGen_SimCoal.py pass by installing SIMCOAL2 as
> simcoal2 rather than simcoal2_1_2, but is this a SIMCOAL2 installation
> issue or a bug in Bio.PopGen.SimCoal.Controller?  In my experience it
> is not normal for a Linux tool to include the full version in the
> executable name - using just simcoal2 does make more sense.

Agree. And, again, this is documented in the tutorial. I can go ahead
and change the test code (please just confirm).

From tiagoantao at gmail.com  Mon Nov  3 14:56:05 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 3 Nov 2008 19:56:05 +0000
Subject: [Biopython-dev] Statistics in population genetics module - Part
	I
In-Reply-To: <5aa3b3570811030736g7d7a0893x759777252c8d1828@mail.gmail.com>
References: <6d941f120810301658wec8678ald332abb8ddbdf80d@mail.gmail.com>
	<5aa3b3570811030736g7d7a0893x759777252c8d1828@mail.gmail.com>
Message-ID: <6d941f120811031156s2f634c1aq4252b17308ecf24a@mail.gmail.com>

Hi,

On Mon, Nov 3, 2008 at 3:36 PM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
> For how much time do you think a biopython module should be kept compatible
> with older versions, more or less?

That is an interesting discussion. My view is that biopython is fairly
conservative in that regard. I am not saying that I agree/disagree.
There seems to be a certain policy in place, and I respect it. But the
point is: Bio.PopGen has to have the same policy as the rest.

> It will take a long time to develop the module, and it is sure that we will
> make some mistakes. So, what is the best way to proceed? What if we create a

I will try to offer my view about this as soon as possible (in the next days).

> At the moment I am working with a separated git repository for all the
> popgen modules. The problem is that I didn't include all biopython modules
> in the repository, so, if any of my changes breaks something in biopython, I
> won't know it until I'll merge everything with biopython code.

It probably won't break anything as long as you don't change existing
code. If you are only doing your parser I suppose it will be very
easily accepted (don't forget test cases and documentation).
Regarding Statistics we need to discuss it.

> p.s. When python3000 will be released, it will be probably necessary to
> rewrite large portions of biopython, if not creating a 'biopython 2' version
> (I think they were discussing something like this in bioperl's list).

Peter's and Michiel's opinions on this topic are fundamental (they do
most of the work maintaining biopython). But I suppose backward
compatibility is a must.

> I thought that maybe, even if we make some 'mistakes' in this version of
> biopython, we will be able to fix them in a later version.

Mistakes should not break existing code. That is really something we
should try to avoid.

> I think that a good idea would be starting collecting use cases to have an
> idea how many things we'll have to implement in this module.

This might sound elitist, but most people doing population genetics
don't really have any idea of what they should expect from software.
While for the "business of sequences and alignment" there is a large,
mature software community, the same doesn't happen in population
genetics. Or to put it in another way: you don't want to imagine the
type of questions that arrive in my private mailbox ;) .

> I sent that mail to the Open::Bio::I last week, but still haven't received
> many replies... I will send a message to the various Bio.* mailing list in
> the next days.

OBF, in my view, is a bit slow and bureaucratic.
Anyway, I think that anybody's views will gain importance in
proportion to the quantity of code submitted and time devoted to
maintenance of the whole thing.


Tiago

From bugzilla-daemon at portal.open-bio.org  Mon Nov  3 17:58:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 3 Nov 2008 17:58:11 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811032258.mA3MwBoH008744@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-03 17:58 EST -------
(In reply to comment #4)
> 
> I noticed you are using set (line 145 in the new cvs file) which is not
> compatible with Python2.3. How should this be addressed?
> 

I've been using something like this elsewhere in Biopython:

#TODO - Remove this work around once we drop python 2.3 support
try:
    set
except NameError:
    from sets import Set as set

Peter



From biopython at maubp.freeserve.co.uk  Mon Nov  3 18:08:44 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 3 Nov 2008 23:08:44 +0000
Subject: [Biopython-dev] Bio.PopGen and SIMCOAL2 installation
In-Reply-To: <6d941f120811031134p4c0f1756k5ded879de7555dad@mail.gmail.com>
References: <320fb6e00811030443w4d620c83w64c83fdafb9afa96@mail.gmail.com>
	<6d941f120811031134p4c0f1756k5ded879de7555dad@mail.gmail.com>
Message-ID: <320fb6e00811031508xfef548dm1a0673b7dba70567@mail.gmail.com>

On Mon, Nov 3, 2008 at 7:34 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
> Hi,
>
> On Mon, Nov 3, 2008 at 12:43 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> However, Bio.PopGen.SimCoal.Controller seems to assume the executable
>> will be called just simcoal2 (or simcoal2.exe on Windows), and thus
>> fails to detect a binary called simcoal2_1_2.  The unit test however is
>> more flexible and looks for any binary on the path whose name starts
>> with simcoal2.  Ideally these two should be consistent.
>
> I am aware of this, in fact, this issue is documented in the tutorial
> (9.5.2.2). The idea is that the binary should be called simcoal2 as
> documented. This can be changed of course. My preference would be to
> change just the test code. Is this ok with you?
>
>> I can make test_PopGen_SimCoal.py pass by installing SIMCOAL2 as
>> simcoal2 rather than simcoal2_1_2, but is this a SIMCOAL2 installation
>> issue or a bug in Bio.PopGen.SimCoal.Controller?  In my experience it
>> is not normal for a Linux tool to include the full version in the
>> executable name - using just simcoal2 does make more sense.
>
> Agree. And, again, this is documented in the tutorial. I can go ahead
> and change the test code (please just confirm).

I had skimmed over the tutorial, but missed this bit - sorry.
Hopefully anyone interested in using SIMCOAL would have read this more
carefully, but perhaps it could be made more prominent? e.g. try to
include a few more keywords like install/installation and executable
as well as binary (which I did not think to search for at the time).

Let's just change test_PopGen_SimCoal.py to look for simcoal2 (or
simcoal2.exe on Windows) so it is consistent with
Bio.PopGen.SimCoal.Controller, and I would also mention what the
binary should be called in the SimCoalController __init__ docstring.

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 04:31:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 04:31:19 -0500
Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for
	arguments to their types
In-Reply-To: <bug-2639-42@http.bugzilla.open-bio.org/>
Message-ID: <200811040931.mA49VJOT019957@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2639


------- Comment #2 from dalloliogm at gmail.com  2008-11-04 04:31 EST -------
I have tested the cvs code, it seems to work.

Maybe you can allow ids to be integers, also.
If you are afraid of causing problems to older scripts, you could str() the
arguments if they are not strings.



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 04:39:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 04:39:18 -0500
Subject: [Biopython-dev] [Bug 2443] Specifying the alphabet in Bio.SeqIO and
	Bio.AlignIO
In-Reply-To: <bug-2443-42@http.bugzilla.open-bio.org/>
Message-ID: <200811040939.mA49dIQ9021075@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2443


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-04 04:39 EST -------
Marking as fixed - unit tests updated, and the new argument is mentioned in the
tutorial as well.

A more extensive example would be nice, perhaps using Bio.AlignIO with the
Bio.Align.AlignInfo module...



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 05:06:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 05:06:40 -0500
Subject: [Biopython-dev] [Bug 2628] Have Bio.SeqIO.write(...) and
	Bio.AlignIO.write(...) return number of records
In-Reply-To: <bug-2628-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041006.mA4A6eAt024777@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2628


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-04 05:06 EST -------
Patch checked in, marking as fixed.

Checking in Bio/SeqIO/Interfaces.py;
/home/repository/biopython/biopython/Bio/SeqIO/Interfaces.py,v  <-- 
Interfaces.py
new revision: 1.11; previous revision: 1.10
done
Checking in Bio/SeqIO/__init__.py;
/home/repository/biopython/biopython/Bio/SeqIO/__init__.py,v  <--  __init__.py
new revision: 1.44; previous revision: 1.43
done
Checking in Bio/AlignIO/Interfaces.py;
/home/repository/biopython/biopython/Bio/AlignIO/Interfaces.py,v  <-- 
Interfaces.py
new revision: 1.7; previous revision: 1.6
done
Checking in Bio/AlignIO/NexusIO.py;
/home/repository/biopython/biopython/Bio/AlignIO/NexusIO.py,v  <--  NexusIO.py
new revision: 1.7; previous revision: 1.6
done
Checking in Bio/AlignIO/__init__.py;
/home/repository/biopython/biopython/Bio/AlignIO/__init__.py,v  <-- 
__init__.py
new revision: 1.19; previous revision: 1.18
done
Checking in Tests/test_SeqIO.py;
/home/repository/biopython/biopython/Tests/test_SeqIO.py,v  <--  test_SeqIO.py
new revision: 1.44; previous revision: 1.43
done
Checking in Tests/test_AlignIO.py;
/home/repository/biopython/biopython/Tests/test_AlignIO.py,v  <-- 
test_AlignIO.py
new revision: 1.17; previous revision: 1.16
done


Checking in Tutorial.tex;
/home/repository/biopython/biopython/Doc/Tutorial.tex,v  <--  Tutorial.tex
new revision: 1.183; previous revision: 1.182
done
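
The idea behind this change, having a write function report how many records
it wrote, can be illustrated with a small stand-alone sketch. The function
write_fasta below is hypothetical and takes simple (identifier, sequence)
pairs; it is not the Bio.SeqIO implementation, and it assumes a current
Python 3.

```python
import io


def write_fasta(records, handle):
    """Write (identifier, sequence) pairs as FASTA; return how many were written."""
    count = 0
    for identifier, sequence in records:
        handle.write(">%s\n%s\n" % (identifier, sequence))
        count += 1
    return count


# The input may be an iterator with no len(), so the returned count is the
# only cheap way for the caller to learn how many records went out.
handle = io.StringIO()
written = write_fasta(iter([("alpha", "ACGT"), ("beta", "GGCC")]), handle)
```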



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 05:51:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 05:51:23 -0500
Subject: [Biopython-dev] [Bug 2640] New: Proposal: doctest for
	SeqRecord/biopython
Message-ID: <bug-2640-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640

           Summary: Proposal: doctest for SeqRecord/biopython
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P3
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: dalloliogm at gmail.com


I would like to propose using doctest tests in Biopython.
I find them very useful for understanding how a module should be used, and
they can also act as unit tests.

Here is the main documentation for doctest:
- http://www.python.org/doc/2.5.2/lib/module-doctest.html

Usually, you add a _test() function to every module, which calls the doctest
library, and run it when __name__ == '__main__'.

The most significant example is added to the documentation string of every
module/function, and tested with doctest.testmod(); later, you add more tests
in a separate file, and run them with doctest.testfile().



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 05:52:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 05:52:21 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041052.mA4AqLGX028185@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #1 from dalloliogm at gmail.com  2008-11-04 05:52 EST -------
Created an attachment (id=1031)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1031&action=view)
patch to add doctest to SeqRecord.py

Here is a patch to add doctest documentation to Bio/SeqRecord.py.



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 06:23:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 06:23:12 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041123.mA4BNCQ0030388@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #27 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-04 06:23 EST -------
Created an attachment (id=1032)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view)
Patch to Bio/Seq.py to add start codon handling to translation

Patch adds a new boolean argument to the translate method and function, called
"init" (rather than my earlier suggestions like "from_start" or "check_start"
which could be considered misleading).

Docstring:

        init - Boolean, defaults to False.  Should translation check the
               first codon is a valid initiation (start) codon and translate
               it as methionine (M)?  If False, nothing special is done with
               the first codon.


Example usage of the translate function,

>>> from Bio.Seq import translate
>>> translate("TTGAAACCCTAG")
'LKP*'
>>> translate("TTGAAACCCTAG", init=True, to_stop=True)
'MKP'
>>> translate("TTGAAACCCTAG", init=True)
'MKP*'
>>> translate("TTGAAACCCTAG", to_stop=True)
'LKP'

Using the Seq method,

>>> from Bio.Seq import Seq
>>> my_seq = Seq("TTGAAACCCTAG")
>>> my_seq.translate()
Seq('LKP*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> my_seq.translate(init=True, to_stop=True)
Seq('MKP', ExtendedIUPACProtein())
>>> my_seq.translate(init=True)
Seq('MKP*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> my_seq.translate(to_stop=True)
Seq('LKP', ExtendedIUPACProtein())

Comments please.



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 06:23:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 06:23:39 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041123.mA4BNdAS030439@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


dalloliogm at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1031 is|0                           |1
           obsolete|                            |


------- Comment #2 from dalloliogm at gmail.com  2008-11-04 06:23 EST -------
Created an attachment (id=1033)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1033&action=view)
patch to add doctest to SeqRecord.py

This patch may be clearer than the previous one: it adds an example of
adding annotations to a SeqRecord.



From biopython at maubp.freeserve.co.uk  Tue Nov  4 06:36:50 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 4 Nov 2008 11:36:50 +0000
Subject: [Biopython-dev] Preparing for Biopython 1.49 (beta)
Message-ID: <320fb6e00811040336k12a834b9o2fa103b8fabf7ec1@mail.gmail.com>

Dear all,

The Numeric to numpy migration is done now, and we are also looking
good for python 2.6.

After a little off list discussion, it's probably time to prepare the
next release.  However, given the number of changes, and therefore the
higher risk that we've broken something, we'll call this a beta
release.

Are there any bugs or issues people think should block this release?

I would like to check in my initiation/start codon argument patch for
translation (see Bug 2381), but would like a little discussion on this
first (in particular the argument naming).

I'd like to try and do the Biopython 1.49 "beta" release at the end of
this week (with a follow up Biopython 1.49 "final" release say one
week later if needed to deal with any issues from the beta).

If this schedule is realistic, then Tiago should be OK to add his next
set of PopGen code in about two weeks time (for what would become
Biopython 1.50).

Peter

From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 06:48:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 06:48:53 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041148.mA4Bmrag032109@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-04 06:48 EST -------
I think we would need to integrate this into the existing test framework so
that any new doctests are actually used.  For an example of this on a module by
module basis, see test_Wise.py and test_psw.py (although these don't interact
well with our test framework on Python 2.3, see bug 2613).

If a large number of Biopython modules have doctests then a more automated
system could be designed (searching all non-deprecated modules for doctests).



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 07:04:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 07:04:54 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041204.mA4C4sHS000823@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #28 from dalloliogm at gmail.com  2008-11-04 07:04 EST -------
(In reply to comment #27)
> Created an attachment (id=1032)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view)
> Patch to Bio/Seq.py to add start codon handling to translation
> 
> Patch adds a new boolean argument to the translate method and function, called
> "init" (rather than my earlier suggestions like "from_start" or "check_start"
> which could be considered misleading).
> 
> Docstring:
> 
>         init - Boolean, defaults to False.  Should translation check the
>                first codon is a valid initiation (start) codon and translate
>                it as methionine (M)?  If False, nothing special is done with
>                the first codon.

I don't like the name 'init' :( It would be better to use an argument with the
word 'force' in it, e.g. force_has_coding, force_first_position, etc.

If you haven't read the discussion in this bug report, it is not very clear
what happens when init=True, or why.
You should add a description of why this option exists to the docstring.

> 
> Example usage of the translate function,
> 
> >>> from Bio.Seq import translate
> >>> translate("TTGAAACCCTAG")
> 'LKP*'
> >>> translate("TTGAAACCCTAG", init=True, to_stop=True)
> 'MKP'

Without having read the discussion in this bug report, I was expecting an
exception here. Why does it force a methionine into the first position?
It loses the information that there is a Leu in the first position.

> >>> translate("TTGAAACCCTAG", init=True)
> 'MKP*'
> >>> translate("TTGAAACCCTAG", to_stop=True)
> 'LKP'
> 

You could add a check for non-coding codons in the first position:
>>> translate("UAACAGTGCAT")
ExceptionError: Non coding aminoacid in the first position



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 07:28:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 07:28:56 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041228.mA4CSuvT002892@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #29 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-04 07:28 EST -------
(In reply to comment #28)
> (In reply to comment #27)
> > Created an attachment (id=1032)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view)
> > Docstring:
> > 
> >         init - Boolean, defaults to False.  Should translation check the
> >                first codon is a valid initiation (start) codon and translate
> >                it as methionine (M)?  If False, nothing special is done with
> >                the first codon.
> 
> I don't like the name 'init' :( it would be better to use an argument with the
> word 'force' in it. E.g.: force_has_coding, force_first_position, etc..

Maybe - but I don't think force_has_coding, force_first_position are any
clearer, and they are very long.  Do you like "with_start_codon" or
"with_init_codon"?

Note that I used "init" rather than "initiation (codon)" because python already
uses init as shorthand for initiation/initialisation.

> If you didn't have read this discussion in this bug report, it is not very
> clear what happens when init=True and why.

If it had been called "start" or "from_start" or "start_codon" the meaning
wouldn't be clear either - you might expect "start" or "from_start" to take an
integer location, and "start_codon" to take a three letter string.

> You should add a description of why there is this options in the docstring.

OK - That makes sense.

> > 
> > Example usage of the translate function,
> > 
> > >>> from Bio.Seq import translate
> > >>> translate("TTGAAACCCTAG")
> > 'LKP*'
> > >>> translate("TTGAAACCCTAG", init=True, to_stop=True)
> > 'MKP'
> 
> Without having read the discussion in this bug report, I was expecting an
> exception here.. why does it forces a Methionine to be in the first position? 
> It loses the information of a Leu in the first position.

Because if this was a CDS using an alternative start codon of TTG it would be
translated as a methionine and NOT as a leucine (because instead of a typical
tRNA-Leu, an initiation tRNA is used).  This is the whole point of this
optional argument.  If you want TTG translated blindly as L, don't use the init
argument (or set it to False).

See also http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi which
explicitly lists these alternative codons as giving M when used as starts, e.g.

    AAs  = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts = ---M---------------M---------------M----------------------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
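
As a rough illustration of how to read this table, the five rows can be
zipped column by column to recover both the ordinary codon-to-amino-acid
mapping and the set of codons flagged as starts. This is only a sketch; the
variable names are invented here and this is not how Bio.Data.CodonTable is
built.

```python
# Rows copied from the NCBI standard genetic code table quoted above.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STARTS = "---M---------------M---------------M----------------------------"
BASE1 = "TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG"
BASE2 = "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG"
BASE3 = "TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG"

forward_table = {}    # codon -> amino acid when read as an internal codon
start_codons = set()  # codons marked "M" in the Starts row

# Each column of the five rows describes one of the 64 codons.
for aa, start, b1, b2, b3 in zip(AAS, STARTS, BASE1, BASE2, BASE3):
    codon = b1 + b2 + b3
    forward_table[codon] = aa
    if start == "M":
        start_codons.add(codon)

print(sorted(start_codons))   # codons usable as initiation codons
print(forward_table["TTG"])   # what TTG encodes away from the start
```

So TTG appears twice: as L in the AAs row, and as M in the Starts row.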

Peter



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 08:41:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 08:41:51 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041341.mA4DfpYD009210@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp  2008-11-04 08:41 EST -------
I've committed Peter's fix for the set import to CVS.

About the replacement for listfns.contents in the modified NaiveBayes code: Did
you do any timings to compare the new code to the old code? Since
listfns.contents is implemented in C, it may be (much) faster than the
replacement code.
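
One way to answer the timing question is the standard timeit module. The
sketch below assumes, without checking the C source, that listfns.contents
maps each distinct item to its relative frequency. Both functions are
pure-Python stand-ins written for illustration; neither is the listfns code
nor the replacement committed to NaiveBayes.

```python
import random
import timeit


def contents_count_method(items):
    """Relative frequency of each item, using list.count per distinct item."""
    total = float(len(items))
    return dict((item, items.count(item) / total) for item in set(items))


def contents_single_pass(items):
    """Relative frequency of each item, using one pass and a dict of counts."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    total = float(len(items))
    return dict((item, count / total) for item, count in counts.items())


# Fixed seed so repeated runs time the same input.
random.seed(0)
sample = [random.choice("ACGT") for _ in range(10000)]

for func in (contents_count_method, contents_single_pass):
    seconds = timeit.timeit(lambda: func(sample), number=20)
    print("%s: %.4f s for 20 calls" % (func.__name__, seconds))
```

To compare against the C version, the same loop could time the real
listfns.contents on an installation where the extension is compiled.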



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 08:57:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 08:57:48 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041357.mA4Dvm2B010202@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #30 from lpritc at scri.sari.ac.uk  2008-11-04 08:57 EST -------
(In reply to comment #29)
> (In reply to comment #28)
> > (In reply to comment #27)
> > > Created an attachment (id=1032)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view)
> > > Docstring:
> > > 
> > >         init - Boolean, defaults to False.  Should translation check the
> > >                first codon is a valid initiation (start) codon and translate
> > >                it as methionine (M)?  If False, nothing special is done with
> > >                the first codon.
> > 
> > I don't like the name 'init' :( it would be better to use an argument with the
> > word 'force' in it. E.g.: force_has_coding, force_first_position, etc..
> 
> Maybe - but I don't think force_has_coding, force_first_position are any
> clearer, and they are very long.  Do you like "with_start_codon" or
> "with_init_codon"?

I think that there are two key things that are going on as a result of this
setting being True:

1) The first codon (starting at position 0) of the nucleotide sequence is being
checked as a valid initiation codon

2) If it is such a valid codon, the translated aa is Met (because this is what
happens biologically).
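
The two steps above can be sketched in plain Python. This is not the code in
attachment 1032: translate_sketch is a hypothetical stand-in that hard-codes
the standard table, and raising ValueError for an invalid first codon and
ignoring a trailing partial codon are assumptions made here.

```python
from itertools import product

# Standard genetic code in TCAG order (same layout as the NCBI tables).
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODONS = ["".join(bases) for bases in product("TCAG", repeat=3)]
FORWARD_TABLE = dict(zip(_CODONS, _AAS))
START_CODONS = frozenset(["TTG", "CTG", "ATG"])


def translate_sketch(sequence, init=False, to_stop=False):
    """Translate an unambiguous DNA/RNA string (hypothetical helper)."""
    sequence = sequence.upper().replace("U", "T")
    usable = len(sequence) - len(sequence) % 3  # drop a trailing partial codon
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    # Ambiguous letters are not handled; they raise KeyError here.
    protein = [FORWARD_TABLE[codon] for codon in codons]
    if init:
        # 1) Check that the first codon is a valid initiation codon.
        if not codons or codons[0] not in START_CODONS:
            raise ValueError("first codon is not a valid initiation codon")
        # 2) It is read by the initiator tRNA, so report methionine.
        protein[0] = "M"
    result = "".join(protein)
    if to_stop:
        # Keep only the residues before the first stop symbol.
        result = result.split("*", 1)[0]
    return result
```

With this sketch, translate_sketch("TTGAAACCCTAG") gives 'LKP*', while adding
init=True gives 'MKP*', matching the examples in comment #27.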

It's quite a complicated concept, and if we wanted to be completely explicit,
an option called 'assert_first_codon_is_initiation_and_translate_to_met' would
be clear, but would be far too long to be sensible.  Most other shorter options
are either ambiguous, misleading, or ambiguously misleading - largely because
people will assume that the term means what they want it to mean instead of
what it does, as described below:

> If it have been called "start" or "from_start" or "start_codon" the meaning
> isn't clear either - you might "start" or expect "from_start" to take an
> integer location, and start_codon to take a three letter string.

I am not too worried about long arguments, so 'assert_first_codon_init' would
be fine for me (though does this mean that the first codon of the sequence
should be an initiation codon, or that translation should start from the first
initiation codon?), but I see the drive for, and value of, brevity.  If there's
a short, unambiguous option name that you can think of, I'm all for it.  An
option name that is a little cryptic, but not misleading, such as 'init', also
works for me.  I would have to go to the minor effort of typing
help(seq.translate) to find out what it meant, but it's not very much of a
chore.

Also, people learn all kinds of non-standard uses for cryptic terms, all the
time.  For example, what on earth does 'popen3' mean?  Why not
open_pipes_with_stdin_stdout_stderr?  'popen3' is short, unambiguous (if not
immediately obvious), and if you want to know what it means, then help() or a
dip in the documentation will tell you.  I think the same will be true of
'init', so long as no-one is likely to confuse it with some other meaning.

L.



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 08:58:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 08:58:21 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041358.mA4DwLiK010266@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #4 from dalloliogm at gmail.com  2008-11-04 08:58 EST -------
(In reply to comment #3)
> I think we would need to integrate this into the existing test framework so
> that any new doctests are actually used.  For an example of this on a module by
> module basis, see test_Wise.py and test_psw.py (although these don't interact
> well with our test framework on Python 2.3, see bug 2613).
> 
> If a large number of Biopython modules have doctests then a more automated
> system could be designed (searching all non-deprecated modules for doctests).
> 

If you think it would be useful, I can write other doctests for other modules
in the following days.



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 09:44:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 09:44:15 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041444.mA4EiFv5013693@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #31 from dalloliogm at gmail.com  2008-11-04 09:44 EST -------
(In reply to comment #30)
> (In reply to comment #29)
> > (In reply to comment #28)
> > > (In reply to comment #27)
> > > > Created an attachment (id=1032)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view)
> 
> It's quite a complicated concept, and if we wanted to be completely explicit,
> an option called 'assert_first_codon_is_initiation_and_translate_to_met' would
> be clear, but would be far too long to be sensible.  Most other shorter options
> are either ambiguous, misleading, or ambiguously misleading - largely because
> people will assume that the term means what they want it to mean instead of
> what it does, as described below:
> 
> > If it have been called "start" or "from_start" or "start_codon" the meaning
> > isn't clear either - you might "start" or expect "from_start" to take an
> > integer location, and start_codon to take a three letter string.
> 
> I am not too worried about long arguments, so 'assert_first_codon_init' would
> be fine for me (though does this mean that the first codon of the sequence
> should be an initiation codon, or that translation should start from the first
> initiation codon?), but I see the drive for, and value of, brevity.  If there's
> a short, unambiguous option name that you can think of, I'm all for it.  An
> option name that is a little cryptic, but not misleading, such as 'init', also
> works for me. 

When I saw 'init' for the first time, I thought there was some kind of
complicated calculation associated with the translate function, which
init=False was meant to skip in order to get a faster but less accurate
translation.

> I would have to go to the minor effort of typing
> help(seq.translate) to find out what it meant, but it's not very much of a
> chore.


It is also a matter of code readability; I don't think many people would
understand what init is meant for by looking at a script.

If I use this option in one of my scripts, and a colleague reads it, I want to
be sure that he will easily understand that I am forcing the first position
to be a Methionine.
Otherwise, the risk is that he won't properly understand my results.

In which of these examples do you understand that the first position is being
forced to a Methionine?
>>> translate("TTGAAACCCTAG", init=True, to_stop=True)

>>> translate("TTGAAACCCTAG", force_as_translating=True, to_stop=True)

>>> translate("TTGAAACCCTAG", force_methionine=True, to_stop=True)

>>> translate("TTGAAACCCTAG", force_methionine=True, force_stop=True)

>>> translate("TTGAAACCCTAG", alt_start=True, alt_stop=True)

Also, I don't think this option will be used very often. 
So, it shouldn't be a problem if its name is too long to type, and it would be
better if it is easy to understand.


> 
> Also, people learn all kinds of non-standard uses for cryptic terms, all the
> time.  For example, what on earth does 'popen3' mean?  Why not
> open_pipes_with_stdin_stdout_stderr?  'popen3' is short, unambiguous (if not
> immediately obvious),

When I was a python newbie, I really hated the name popen3 :)

> and if you want to know what it means, then help() or a
> dip in the documentation will tell you.  I think the same will be true of
> 'init', so long as no-one is likely to confuse it with some other meaning.
> 
> L.
> 



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 09:45:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 09:45:17 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041445.mA4EjHg4013777@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-04 09:45 EST -------
(In reply to comment #4)
> 
> If you think it would be useful, I can write other doctests for other modules
> in the following days.
> 

I think adding more doctests would be useful, but they MUST get run by our
existing test suite.  Otherwise they'll just be human readable documentation
(which is still nice) but will not get regularly validated.
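
One possible way to get docstring examples run alongside other tests is
doctest.DocTestSuite from the standard library, which wraps a module's
doctests as a unittest suite. The sketch below assumes a test runner that
can execute unittest suites; reverse_string and load_doctests are
hypothetical names, and this says nothing about how run_tests.py would
actually be wired up.

```python
import doctest
import sys
import unittest


def reverse_string(text):
    """Return the characters of text in reverse order.

    >>> reverse_string("ACGT")
    'TGCA'
    """
    return text[::-1]


def load_doctests(module):
    """Wrap every doctest found in module as a unittest test suite."""
    return doctest.DocTestSuite(module)


# Collect the doctests of this file itself and run them like any other tests.
suite = load_doctests(sys.modules[__name__])
result = unittest.TextTestRunner(verbosity=1).run(suite)
print("doctests passed: %s" % result.wasSuccessful())
```

In a real test file the module passed to load_doctests would be the library
module under test (for example Bio.SeqRecord) rather than the test file.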



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 10:39:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 10:39:42 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041539.mA4Fdgc8017798@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #32 from lpritc at scri.sari.ac.uk  2008-11-04 10:39 EST -------
(In reply to comment #31)
> (In reply to comment #30)

> It is also a matter of code readibility; I don't think many people would
> understand that init is meant for that by looking at a script.

True enough, but if someone's already used it, and you don't know what it means
when reading their script, looking it up isn't hard.  What's hard is guessing
which option you need to invoke, and calling help() is one way to do that,
too...

Not that I want to extend this argument to single-letter options with *no*
relevance to their intent ;)

seq.translate(a=True, b='GUG', c=9)

> If I use this option in one of my scripts, and a colleague reads it, I want to
> be sure that he will be easily understand that I am forcing the first position
> to be a Methionine.
> Otherwise, the risk is that he won't understand properly my results.

Maybe put it in a comment-line?   Even if the colleague understands from the
code *that* you've translated an alternative start to a methionine, they may
not understand *why* - and the comment line is essential, then.

> In which of these examples do you understand that the first position is being
> forced to a Methionine?

None are particularly clear, but only one of them doesn't give me the wrong
idea...

> >>> translate("TTGAAACCCTAG", init=True, to_stop=True)

Because I've read this thread (or looked at the docs) - I understand this one
;)

> >>> translate("TTGAAACCCTAG", force_as_translating=True, to_stop=True)

I don't intuitively understand this.  Does it mean that the sequence should be
translatable?

> >>> translate("TTGAAACCCTAG", force_methionine=True, to_stop=True)

Does this mean that the sequence will be translated from the first methionine
the method finds?

> >>> translate("TTGAAACCCTAG", force_methionine=True, force_stop=True)

As above, and does force_stop mean that you add a '*' to the end of the
translation?  Or that you stop at a stop codon?

> >>> translate("TTGAAACCCTAG", alt_start=True, alt_stop=True)

'alt_start' I would think referred to allowing translation from alternative
start codons.  I don't know what alt_stop would mean...

> Also, I don't think this option will be used very often. 

Maybe not.  The first use case that comes to mind is QA on CDS-finding:

# Check if sequence is CDS:
assert candidate_cds.translate(init=True)
# Check if reported CDS start is valid
assert est[37:].translate(init=True)

A second use case is slower in presenting itself...

> So, it shouldn't be a problem if its name is too long to type, and it would be
> better if it is easy to understand.

That's a fair argument, I think.  On the whole, though, I would favour a short,
unambiguous, slightly cryptic name over a very long, unambiguous name, over an
ambiguous name of any length.

> When I was a python newbie, I really hated the name popen3 :)

At least we have subprocess, now. 

L.



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 10:44:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 10:44:47 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041544.mA4FilLH018113@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #6 from dalloliogm at gmail.com  2008-11-04 10:44 EST -------
(In reply to comment #5)
> (In reply to comment #4)
> > 
> > If you think it would be useful, I can write other doctests for other modules
> > in the following days.
> > 
> 
> I think adding more doctests would be useful, but they MUST get run by our
> existing test suite.  Otherwise they'll just be human readable documentation
> (which is still nice) but will not get regularly validated.

There are a few ways to do it, but it is not too difficult to implement.
The easiest thing is to use 'doctest.testmod' in the test files.
For example, you can add to test_SeqRecord.py the following lines:

import doctest 
from Bio import SeqRecord   # import the module, not SeqRecord.SeqRecord
print "testing with doctest..."
(failures, tests) = doctest.testmod(SeqRecord)
if failures == 0:
    print 'ok'
else:
    print 'some test has failed'

or you can launch the '_test' function in every module (see my patch), but this
would require importing doctest multiple times.
>>> SeqRecord._test()


I will write some other doctests in the following days/weeks and post them here
as patches, and you will decide.
Anyway, do you think they will make biopython's documentation nicer? Do you
like them?
Sometimes, doctests make the doc strings a bit messy, so some people don't like
them. 
But it is really a matter of how you write them.



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 11:11:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 11:11:49 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041611.mA4GBnuW020154@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #33 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-04 11:11 EST -------
(In reply to comment #32)
> > In which of these examples do you understand that the first position is
> > being forced to a Methionine?

With my suggested code, you would not just be forcing the first codon to be a
methionine.  You would also be asking for the first codon to be validated as a
start codon (initialisation codon).

> None are particularly clear, but only one of them doesn't give me the wrong
> idea...

In some cases I seem to have guessed different possible meanings for some of
these suggested names - so those are probably unclear.

> > >>> translate("TTGAAACCCTAG", init=True, to_stop=True)
> 
> Because I've read this thread (or looked at the docs) - I understand this one
> ;)

To me this suggests something special is happening with the initialisation of
the translation - but I agree it's not clear what without checking the
documentation.

> > >>> translate("TTGAAACCCTAG", force_as_translating=True, to_stop=True)
> 
> I don't intuitively understand this.  Does it mean that the sequence should be
> translatable?

Ditto - an argument called force_as_translating means nothing to me.  You're
calling a translation method so what can forcing a translation mean?

> > >>> translate("TTGAAACCCTAG", force_methionine=True, to_stop=True)
> 
> Does this mean that the sequence will be translated from the first methionine
> the method finds?

I would have guessed force_methionine would ignore the value of the first three
nucleotides in order to treat them as a methionine (even if they are not a
start codon).

> > >>> translate("TTGAAACCCTAG", force_methionine=True, force_stop=True)
> 
> As above, and does force_stop mean that you add a '*' to the end of the
> translation?  Or that you stop at a stop codon?

Like Leighton, I would be confused by "force_stop".  It could mean add a stop
symbol to the end of the amino acid sequence even if there isn't one there
already.

> > >>> translate("TTGAAACCCTAG", alt_start=True, alt_stop=True)
> 
> 'alt_start' I would think referred to allowing translation from alternative
> start codons.  I don't know what alt_stop would mean...

I think "alt_start" would be misleading for the intended dual functionality.
Consider the typical use case for this option - translating a CDS, which most
of the time will use the typical start codon AUG / ATG (but not always).
We'd want the start codon validated - and it often won't be an alternative
start codon.  So calling the argument "alt_start" is confusing.

> > Also, I don't think this option will be used very often. 
> 
> Maybe not.  The first use case that comes to mind is QA on CDS-finding:
> 
> # Check if sequence is CDS:
> assert candidate_cds.translate(init=True)
> # Check if reported CDS start is valid
> assert est[37:].translate(init=True)
> 
> A second use case is slower in presenting itself...

I think translating a CDS is quite a common task - so a very long argument
would be bad.

Instead of the "init" start codon option in attachment 1032, I'd also be happy
with a single boolean argument which does start codon validation, treats this
as a methionine, checks the sequence is a multiple of three in length, checks
for a final stop codon, and checks for no additional stop codons.  We'd ruled
out calling this "complete", but maybe "cds" would be better?
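
As a rough illustration of the checks such a single boolean option would
perform, here is a minimal sketch in plain Python.  It is not the code from
attachment 1032: the function name translate, the cds argument, and the
hard-coded start codon set (my reading of NCBI table 11) are all assumptions
made for this example.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G). NCBI table 11
# (bacterial) uses the same amino acid assignments.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Assumption: start codons as I understand NCBI table 11; verify before reuse.
START_CODONS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def translate(seq, cds=False):
    """Translate a DNA string; with cds=True, validate it as a complete CDS."""
    seq = seq.upper()
    # Split into whole codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not cds:
        return "".join(CODON_TABLE[codon] for codon in codons)
    if len(seq) % 3 != 0:
        raise ValueError("length is not a multiple of three")
    if not codons or codons[0] not in START_CODONS:
        raise ValueError("first codon is not a valid start codon")
    if CODON_TABLE[codons[-1]] != "*":
        raise ValueError("final codon is not a stop codon")
    body = "".join(CODON_TABLE[codon] for codon in codons[1:-1])
    if "*" in body:
        raise ValueError("extra in-frame stop codon")
    # A validated start codon is always reported as methionine.
    return "M" + body


print(translate("TTGAAACCCTAG"))            # LKP*
print(translate("TTGAAACCCTAG", cds=True))  # MKP
```

Raising an exception on a failed check is one possible design; returning the
plain translation with a warning would be another.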

> > So, it shouldn't be a problem if its name is too long to type, and it would
> > be better if it is easy to understand.
> 
> That's a fair argument, I think.  On the whole, though, I would favour a
> short, unambiguous, slightly cryptic name over a very long, unambiguous
> name, over an ambiguous name of any length.

There is a lot of subjectiveness in argument naming - clearly we have not come
up with a perfect suggestion yet.

Unfortunately "init" can be misunderstood (I'm not 100% sure what you were
trying to say in comment 31, but I think you thought from the name "init" could
be some sort of optional optimisation initialisation).

How about "cds_start" instead of "init"?



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 12:43:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 12:43:53 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041743.mA4Hhrcc026138@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #34 from bsouthey at gmail.com  2008-11-04 12:43 EST -------
(In reply to comment #33)
> (In reply to comment #32)
> > > In which of these examples do you understand that the first position is
> > > being forced to a Methionine?
> 
> With my suggested code, you would not just be forcing the first codon to be a
> methionine.  You would also be asking for the first codon to be validated as a
> start codon (initialisation codon).
> 
> > None are particularly clear, but only one of them doesn't give me the wrong
> > idea...
> 
> In some cases I seem to have guessed different possible meanings for some of
> these suggested names - so those are probably unclear.
> 
> > > >>> translate("TTGAAACCCTAG", init=True, to_stop=True)
> > 
> > Because I've read this thread (or looked at the docs) - I understand this one
> > ;)
> 
> To me this suggests something special is happening with the initialisation of
> the translation - but I agree it's not clear what without checking the
> documentation.
> 
> > > >>> translate("TTGAAACCCTAG", force_as_translating=True, to_stop=True)
> > 
> > I don't intuitively understand this.  Does it mean that the sequence should be
> > translatable?
> 
> Ditto - an argument called force_as_translating means nothing to me.  You're
> calling a translation method so what can forcing a translation mean?
> 
> > > >>> translate("TTGAAACCCTAG", force_methionine=True, to_stop=True)
> > 
> > Does this mean that the sequence will be translated from the first methionine
> > the method finds?
> 
> I would have guessed force_methionine would ignore the value of the first three
> nucleotides in order to treat them as a methionine (even if they are not a
> start codon).
> 
> > > >>> translate("TTGAAACCCTAG", force_methionine=True, force_stop=True)
> > 
> > As above, and does force_stop mean that you add a '*' to the end of the
> > translation?  Or that you stop at a stop codon?
> 
> Like Leighton, I would be confused by "force_stop".  It could mean add a stop
> symbol to the end of the amino acid sequence even if there isn't one there
> already.
> 
> > > >>> translate("TTGAAACCCTAG", alt_start=True, alt_stop=True)
> > 
> > 'alt_start' I would think referred to allowing translation from alternative
> > start codons.  I don't know what alt_stop would mean...
> 
> I think "alt_start" would be misleading for the intended dual functionality. 
> Consider the typical use case for this option - translating a CDS, which most
> of the time will use the typical start codon AUG / ATG (but not always).
> We'd want the start codon validated - and it often won't be an alternative
> start codon.  So calling the argument "alt_start" is confusing.
> 
> > > Also, I don't think this option will be used very often. 
> > 
> > Maybe not.  The first use case that comes to mind is QA on CDS-finding:
> > 
> > # Check if sequence is CDS:
> > assert candidate_cds.translate(init=True)
> > # Check if reported CDS start is valid
> > assert est[37:].translate(init=True)
> > 
> > A second use case is slower in presenting itself...
> 
> I think translating a CDS is quite a common task - so a very long argument
> would be bad.
> 
> Instead of the "init" start codon option in attachment 1032 [details], I'd also be happy
> with a single boolean argument which does start codon validation, treats this
> as a methionine, checks the sequence is a multiple of three in length, checks
> for a final stop codon, and checks for no additional stop codons.  We'd ruled
> out calling this "complete", but maybe "cds" would be better?
> 
> > > So, it shouldn't be a problem if its name is too long to type, and it would
> > > be better if it is easy to understand.
> > 
> > That's a fair argument, I think.  On the whole, though, I would favour a
> > short, unambiguous, slightly cryptic name over a very long, unambiguous
> > name, over an ambiguous name of any length.
> 
> There is a lot of subjectiveness in argument naming - clearly we have not come
> up with a perfect suggestion yet.
> 
> Unfortunately "init" can be misunderstood (I'm not 100% sure what you were
> trying to say in comment 31, but I think you thought from the name "init" could
> be some sort of optional optimisation initialisation).
> 
> How about "cds_start" instead of "init"?
> 


As I think about this and the various comments, I do think that you must apply the
same reasoning to non-standard translation as was applied to the ORF finding
comments. From that I understand that you want a basic translation function so
function arguments like to_stop or cds_start would be inappropriate. Also, even
if it was possible, I do not see that validating all known start codons under
all genetic codes fits here.

Rather I think the various comments reflect various combinations of three major
steps:

1) Identify the region to be translated like NCBI's sequence viewer: range from
'begin' to 'end' to denote the region to be viewed. Under this view, start_from
or begin_at could be the position to start or the first occurrence of a start
codon. Likewise to_end or end_at could be a position or the first occurrence of
a stop codon. I note this also implies a frame, but I think that has a
separate meaning.

2) Having defined the region to be translated, translate that region as defined
by the frame and selected table. A question here is that if region is defined
then should the frame be set to one or not.

3) Address any non-standard codons in the translated sequence. If you are going
to allow non-standard start codons, you also need to handle selenocysteine
(http://en.wikipedia.org/wiki/Selenocysteine) and less so pyrrolysine
(http://en.wikipedia.org/wiki/Pyrrolysine). Technically, you can argue the
table used for translation in 2) should reflect this but I consider it a
separate issue. Also, the occurrence of a stop codon would likewise need to
change.
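
Step 1 can be sketched as a small helper that is kept separate from the
translation itself.  This is only an illustration under stated assumptions:
the name find_region and its arguments are hypothetical, only ATG is treated
as a start codon by default, and only the forward strand is searched.

```python
START_CODONS = {"ATG"}                 # assumption: standard start codon only
STOP_CODONS = {"TAA", "TAG", "TGA"}    # standard table stop codons


def find_region(seq, start_codons=START_CODONS, stop_codons=STOP_CODONS):
    """Return (begin, end) so that seq[begin:end] runs from the first start
    codon through the first in-frame stop codon, or None if there is none."""
    seq = seq.upper()
    for begin in range(len(seq) - 2):
        if seq[begin:begin + 3] in start_codons:
            # Walk codon by codon in the frame set by the start codon.
            for pos in range(begin + 3, len(seq) - 2, 3):
                if seq[pos:pos + 3] in stop_codons:
                    return begin, pos + 3
            return None  # first start codon has no in-frame stop
    return None


region = find_region("CCATGAAATAGGG")
print(region)  # (2, 11), i.e. the slice ATGAAATAG
```

With the region expressed as a plain slice, the frame is implied by
begin % 3, so step 2 could translate seq[begin:end] without a separate frame
argument.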

The non-standard codon usages are rare and I do really question if these are
really part of the Seq object translate function or belong elsewhere. I really
feel that if the user already knows that it is a non-AUG start codon then they
can replace the first amino acid with Met rather than rely on the translate
function. For example, the CDS field in the Genbank record for Mouse
Neuropeptide W (NM_001099664) has:
/exception="alternative start codon"
/note="non-AUG (CUG) translation initiation codon".
So if the user looked at the record then they would know it would need to be
changed.

If some form of the non-standard codons is included I would think some variant
of Leighton's assert idea should be preferred, such as using an
assert_nonstandard argument (or just nonstandard). This would be a string, list
or tuple denoting the changes to be made, such as 'Met1' or 'M1', where the
three or single letter code gives the desired amino acid and the number is the
location within the amino acid sequence to be changed. So Met1 would mean
replacing the amino acid at position one with Methionine (M). But I recognize
this is not sufficient to handle other non-standard cases with stop codons.
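
To make the 'Met1' / 'M1' idea concrete, here is a minimal sketch that applies
such changes to an already translated protein string.  The function name
apply_nonstandard and the small three-letter lookup table are hypothetical,
chosen only for this illustration; nothing like this exists in Bio.Seq.

```python
import re

# Assumption: a deliberately small subset of three-letter codes.
THREE_TO_ONE = {"Met": "M", "Sec": "U", "Pyl": "O"}


def apply_nonstandard(protein, changes):
    """Apply changes such as 'Met1' or 'M1' (1-based positions) to a protein."""
    if isinstance(changes, str):
        changes = [changes]
    residues = list(protein)
    for change in changes:
        match = re.fullmatch(r"([A-Za-z]{3}|[A-Za-z])(\d+)", change)
        if match is None:
            raise ValueError("cannot parse change %r" % change)
        code, position = match.group(1), int(match.group(2))
        if len(code) == 3:
            if code.capitalize() not in THREE_TO_ONE:
                raise ValueError("unknown amino acid code %r" % code)
            amino_acid = THREE_TO_ONE[code.capitalize()]
        else:
            amino_acid = code.upper()
        if not 1 <= position <= len(residues):
            raise ValueError("position %i is outside the sequence" % position)
        residues[position - 1] = amino_acid
    return "".join(residues)


print(apply_nonstandard("VKKMQ", "Met1"))         # MKKMQ
print(apply_nonstandard("VKK", ("M1", "Sec3")))   # MKU
```

As noted above, this only rewrites residues in place, so it cannot express a
stop codon that should be read through.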


Bruce



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 13:28:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 13:28:19 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041828.mA4ISJAd028961@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #35 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-04 13:28 EST -------
(In reply to comment #34)
> As I think about this and the various comments, I do think that you must apply the
> same reasoning to non-standard translation as was applied to the ORF finding
> comments. From that I understand that you want a basic translation function so
> function arguments like to_stop or cds_start would be inappropriate.

There is certainly an argument that the Bio.Seq translate function/methods
should be kept as simple as possible while providing widely useful
functionality.  Perhaps given the lack of immediate agreement we are at that
point already?  Or perhaps this is a reflection of the different types of
organisms people work with and thus the relative frequencies of non-standard
start codons.

> Also, even if it was possible, I do not see that validating all known start
> codons under all genetic codes fits here.

We have the valid start codons in the CodonTable objects derived from the NCBI,
so it is possible to check them.

> ... Address any non-standard codons to the translated sequence. If you are
> going to allow non-standard start codons, you also need to handle
> selenocysteine (http://en.wikipedia.org/wiki/Selenocysteine) and less so
> pyrrolysine (http://en.wikipedia.org/wiki/Pyrrolysine). 

Why?  Non-standard codons are pretty common in prokaryotes and the rules for
translating them are simple (once the start codon is identified).

On the other hand selenocysteine and pyrrolysine are very rare, and we can't
define a computer rule to deal with them - so we don't even try.

> The non-standard codon usages are rare and I do really question if these are
> really part of the Seq object translate function or belong elsewhere. I really
> feel that if the user already knows that it is a non-AUG start codon then they
> can replace the first amino acid with Met rather than rely on the translate
> function. For example, the CDS field in the Genbank record for Mouse
> Neuropeptide W (NM_001099664) has:
> /exception="alternative start codon"
> /note="non-AUG (CUG) translation initiation codon".
> So if the user looked at the record then they would know it would need to be
> changed.

Non-standard start codons are not that rare in prokaryotes (and I would not
expect them to be annotated like your mouse example).  When translating a well
annotated sequence, the location itself should be enough.

[I'm assuming we're not talking about the other meaning of the phrase
"alternative start codons" - where a gene may have multiple valid start codons
giving proteins of different lengths but the same C-terminal region.]

> If some form of the non-standard codons is included I would think some
> variant of Leighton's assert idea should be preferred such as using an
> assert_nonstandard argument (or just nonstandard). This would be a string, 
> list or tuple to denote the changes to be made such as say 'Met1' or 'M1'
> where three or single letter code of the desired amino acid and the number
> is the location within the amino acid sequence to be changed. So Met1 would
> mean changing the amino acid at position one with Methionine (M). But I
> recognize this is not sufficient to handle other non-standard cases with
> stop codons.

I thought Leighton was just proposing another name for a boolean argument which
I had called "init" in attachment 1032.

I'm afraid I don't understand your idea of a complicated list argument.

=============================================================================

Here is a concrete example: there are 481 annotated genes in E. coli K12 with
non-standard start codons, which you might want to translate into proteins.

#Using
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_K12_substr__MG1655/NC_000913.ffn
>>> from Bio import SeqIO
>>> odd = [record for record in SeqIO.parse(open("NC_000913.ffn"),"fasta") \
           if str(record.seq[:3]) != "ATG"]
>>> print "There are %i genes not starting ATG" % len(odd)
There are 481 genes not starting ATG
>>> record = odd[0]
>>> print record.format("fasta")
>ref|NC_000913.2|:5234-5530
GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCA
GCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGAT
AATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACAT
TATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCAT
AAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA

This starts GTG which is a valid bacterial start codon.  I'd like to translate
this and get the actual biologically relevant protein as given in the GenBank
file NC_000913.gbk (maybe with or without the stop symbol at the end).  See:

     CDS             5234..5530
                     /gene="yaaX"
                     /locus_tag="b0005"
                     /codon_start=1
                     /transl_table=11
                     /product="predicted protein"
                     /protein_id="NP_414546.1"
                     /db_xref="ASAP:ABE-0000015"
                     /db_xref="UniProtKB/Swiss-Prot:P75616"
                     /db_xref="GI:16127999"
                     /db_xref="ECOCYC:G6081"
                     /db_xref="EcoGene:EG14384"
                     /db_xref="GeneID:944747"
                     /translation="MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGY
                     YWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR"

Without any non-standard start codon support, my translations start with a V:

>>> print record.seq.translate(table=11)
VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR*
>>> print record.seq.translate(table=11, to_stop=True)
VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR

With this proposed functionality I can obtain the desired results (both with
and without the terminator stop symbol):

>>> print record.seq.translate(table=11, to_stop=True, init=True)
MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR
>>> print record.seq.translate(table=11, init=True)
MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR*

I think that wanting to translate a CDS like this is a fairly common operation.
Perhaps not as common as translation of a partial sequence, or translating
whole genomes or contigs where we want to translate through the stop codons --
but nevertheless, a common need.



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 17:47:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 17:47:02 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811042247.mA4Ml2At014897@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #7 from bsouthey at gmail.com  2008-11-04 17:47 EST -------
(In reply to comment #6)
> I've committed Peter's fix for the set import to CVS.
> 
> About the replacement for listfns.contents in the modified NaiveBayes code: Did
> you do any timings to compare the new code to the old code? Since
> listfns.contents is implemented in C, it may be (much) faster than the
> replacement code.
> 

(Hopefully I created a patch correctly.)

The purpose of listfns.contents() is to compute the frequency of each class and
return it as a dictionary. For what I have looked at (which covers more than
just the listfns.contents function), the difference between the versions is
very small (hundredths of a second).



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 17:48:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 17:48:12 -0500
Subject: [Biopython-dev] [Bug 2631] Updated Bio.MaxEntropy to remove listfns
	import
In-Reply-To: <bug-2631-42@http.bugzilla.open-bio.org/>
Message-ID: <200811042248.mA4MmCiZ015012@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2631


------- Comment #6 from bsouthey at gmail.com  2008-11-04 17:48 EST -------
Created an attachment (id=1036)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1036&action=view)
Patch to NaiveBayes



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 21:33:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 21:33:32 -0500
Subject: [Biopython-dev] [Bug 2631] Updated Bio.MaxEntropy to remove listfns
	import
In-Reply-To: <bug-2631-42@http.bugzilla.open-bio.org/>
Message-ID: <200811050233.mA52XWrB025772@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2631


------- Comment #7 from bsouthey at gmail.com  2008-11-04 21:33 EST -------
(In reply to comment #6)
> Created an attachment (id=1036)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1036&action=view) [details]
> Patch to NaiveBayes
> 

Sorry about this as I do not know how this ended up here. Please just ignore
it.



From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 21:35:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 21:35:53 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811050235.mA52Zr0b025894@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1014 is|0                           |1
           obsolete|                            |


------- Comment #8 from bsouthey at gmail.com  2008-11-04 21:35 EST -------
Created an attachment (id=1037)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1037&action=view)
Patch to update NaiveBayes

Hopefully I got this correct, if not just let me know.



From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 05:24:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 05:24:15 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811051024.mA5AOF60024355@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-05 05:24 EST -------
(In reply to comment #8)
> Created an attachment (id=1037)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1037&action=view) [details]
> Patch to update NaiveBayes
> 
> Hopefully I got this correct, if not just let me know.
> 

At first glance it looks like this patch would remove the Python 2.3 set work
around.  Easily fixed.

Also, I would have called the new get_content_freq function _get_content_freq
(leading underscore denoting private) as this is an implementation detail that
doesn't need to be part of the public API.

I'm curious what your other implementations looked like, as this one does not
look that clear to me at first read:

    p_contents=1.0/len(contents)
    content_freqs={}
    for cval in contents:
        vcount=content_freqs.get(cval,0)+p_contents
        content_freqs.update({cval:vcount})

In particular, why use the dict update method?

Given the possible rounding issues, does doing the rescaling (dividing by the
number of elements) at the start make a big time saving (over dividing each
total at the end)?  I would feel happier with the division at the end (as done
in the listfns code).
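
To illustrate the rounding concern, here is a minimal sketch of the two
orderings side by side.  Neither function is the patch or the listfns code;
the names freq_scaled_first and freq_divided_last are hypothetical, and both
assume a non-empty list of hashable items.

```python
def freq_scaled_first(contents):
    """Add 1/len(contents) for every item (rescaling at the start)."""
    step = 1.0 / len(contents)
    freqs = {}
    for value in contents:
        freqs[value] = freqs.get(value, 0) + step
    return freqs


def freq_divided_last(contents):
    """Count items as integers, then divide each total once at the end."""
    counts = {}
    for value in contents:
        counts[value] = counts.get(value, 0) + 1
    total = float(len(contents))
    return dict((value, count / total) for value, count in counts.items())


labels = ["a"] * 10
print(freq_scaled_first(labels))   # repeated float additions of 0.1
print(freq_divided_last(labels))   # one division: 10 / 10.0
```

Repeated floating point additions can accumulate a small error in the last
digits, whereas the integer counts are exact and each frequency is rounded
only once.  The cost of dividing at the end is one extra pass over the
distinct values.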



From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 07:06:04 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 07:06:04 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811051206.mA5C64Pg030176@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-05 07:06 EST -------
I've updated Bio.Seq, Bio.SeqIO and Bio.AlignIO so my existing docstring
examples can be used with doctest.

Adding code via the __main__ trick to allow each module's test to be run
individually might be worthwhile.
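
For reference, the __main__ trick usually looks something like the sketch
below.  The module and its reverse_complement function are hypothetical
stand-ins, not Biopython code; only doctest.testmod() is the real standard
library call.

```python
"""Hypothetical example module, used only to show the __main__ trick."""
import doctest


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string.

    >>> reverse_complement("ATGC")
    'GCAT'
    """
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def _test():
    """Run this module's doctests and return doctest's (failed, attempted)."""
    return doctest.testmod()


if __name__ == "__main__":
    # Only runs when the file is executed directly, not when it is imported.
    _test()
```

Running the file directly then checks its own docstring examples, while a
normal import is unaffected.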

The rest of this message is a possible "test_docstrings.py" file for our unit
tests, which would require manual updating whenever we want to test an
additional module.  This is probably a neat short term solution while only a
relatively small proportion of Biopython uses doctests.

-----------------------------------------------------------------
#!/usr/bin/env python
# This code is part of the Biopython distribution and governed by its
# license.  Please see the LICENSE file that should have been included
# as part of this package.

import doctest, unittest

from Bio import Seq, SeqRecord, SeqIO, AlignIO
test_modules = [Seq, SeqRecord, SeqIO, AlignIO]

test_suite = unittest.TestSuite((doctest.DocTestSuite(module) \
                                 for module in test_modules))

#Using sys.stdout prevents this working nicely when run from idle:
#runner = unittest.TextTestRunner(sys.stdout, verbosity = 0)

#Using verbosity = 0 means we won't have to regenerate the unit
#test output file used by the run_tests.py framework whenever a
#new module or doctest is added.
runner = unittest.TextTestRunner(verbosity = 0)
runner.run(test_suite)



From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 08:12:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 08:12:28 -0500
Subject: [Biopython-dev] [Bug 2622] Parsing between position locations like
	5933^5934 in GenBank/EMBL files
In-Reply-To: <bug-2622-42@http.bugzilla.open-bio.org/>
Message-ID: <200811051312.mA5DCSYZ004411@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2622


chapmanb at 50mail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #4 from chapmanb at 50mail.com  2008-11-05 08:12 EST -------
Fixed with Bio/GenBank/__init__.py 1.93, Bio/SeqFeature.py 1.14.

Coordinates are now passed correctly with Peter's suggested fix. The empty
slice issue is resolved by adding this as a special case to FeatureLocation
nofuzzy attribute retrieval. For standard retrieval the classes are fully
available to the user and they would need to make the distinction about how
they would like to treat them.



From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 08:14:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 08:14:51 -0500
Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in
	GenBank files
In-Reply-To: <bug-2225-42@http.bugzilla.open-bio.org/>
Message-ID: <200811051314.mA5DEpVe004918@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2225


chapmanb at 50mail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from chapmanb at 50mail.com  2008-11-05 08:14 EST -------
Fixed with Bio/GenBank/__init__.py 1.93, Bio/GenBank/Record.py 1.11 and
Bio/GenBank/Scanner.py 1.24

The PROJECT line is parsed as a list of projects for both SeqIO and Record
based parsing, for consistency. Output of PROJECT line also added.



From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 08:18:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 08:18:22 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811051318.mA5DIMPJ005649@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp  2008-11-05 08:18 EST -------
See
http://coreygoldberg.blogspot.com/2008/07/python-counting-items-in-list.html
for some timings of this operation. I think Bruce's approach is most suitable,
except for the dict update method; I would use
        content_freqs[cval] = content_freqs.get(cval,0)+p_contents
instead. Depending on the contents of the list, sometimes it runs even faster
than the implementation in listfns.
> 
> Given the possible rounding issues, does doing the rescaling (dividing by the
> number of elements) at the start make a big time saving (over dividing each
> total at the end)?  I would feel happier with the division at the end (as done
> in the listfns code).
> 
I think the rescaling at the start is a good thing. If the list contains many
different objects, rescaling at the end can take a long time. Probably that is
not the typical use case here, but on the other hand I don't see a good reason
not to save time here.

Maybe just my nitpicking, but I think the get_content_freq function will be
more readable if we use different variable names inside this function.



From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 08:31:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 08:31:49 -0500
Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in
	GenBank files
In-Reply-To: <bug-2225-42@http.bugzilla.open-bio.org/>
Message-ID: <200811051331.mA5DVnNI007802@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2225


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-05 08:31 EST -------
Do you think we have to worry about multiple project lines, or project entries
spanning multiple lines?

This would require a slight difference to the parsing (to append new project
entries instead of replacing any prior entries), and to the output from the
record object (including line wrapping).

HOWEVER, reading the latest ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt it seems
the PROJECT line will be replaced with a DBLINK line next year.

With that in mind, I would now suggest we parse the PROJECT and/or DBLINK lines
and store them in the record.dbxrefs list (rather than in the annotations).
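
As a rough illustration of the "append rather than replace" idea, here is a
minimal sketch. It is not the GenBank parser code; the function name
collect_dbxrefs and the sample header lines are made up for this example, and
it assumes the keyword sits in the first 12 columns with continuation lines
left blank there.

```python
def collect_dbxrefs(header_lines):
    """Gather PROJECT/DBLINK entries into one list, appending instead of replacing."""
    dbxrefs = []
    current_key = None
    for line in header_lines:
        key, data = line[:12].strip(), line[12:].strip()
        if key:
            current_key = key  # a blank keyword column means a continuation line
        if current_key in ("PROJECT", "DBLINK") and data:
            # One line may hold several whitespace-separated entries.
            dbxrefs.extend(data.split())
    return dbxrefs


# Hypothetical header lines, including a PROJECT entry spanning two lines.
sample = [
    "LOCUS       EXAMPLE",
    "PROJECT     GenomeProject:12345",
    "            GenomeProject:67890",
    "DBLINK      Project:12345",
    "KEYWORDS    .",
]
found = collect_dbxrefs(sample)
```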



From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 08:34:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 08:34:41 -0500
Subject: [Biopython-dev] [Bug 2622] Parsing between position locations like
	5933^5934 in GenBank/EMBL files
In-Reply-To: <bug-2622-42@http.bugzilla.open-bio.org/>
Message-ID: <200811051334.mA5DYfWx008228@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2622


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-05 08:34 EST -------
Hi Brad,

Looking back on this I may have been out by one on the extension calculation,
i.e. I'm not 100% sure position.high.val-position.low.val is appropriate.

I'll try and look at this later...

Peter



From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 11:51:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 11:51:07 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811051651.mA5Gp7R6003323@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #11 from bsouthey at gmail.com  2008-11-05 11:51 EST -------
(In reply to comment #10)
> See
> http://coreygoldberg.blogspot.com/2008/07/python-counting-items-in-list.html
> for some timings of this operation. I think Bruce's approach is most suitable,
> except for the dict update method; I would use
>         content_freqs[cval] = content_freqs.get(cval,0)+p_contents
> instead. Depending on the contents of the list, sometimes it runs even faster
> than the implementation in listfns.

Basically the goal is to find the frequency of each class and store it in a
dictionary with the keys being each class and the value being the frequency. So
you could count up all observations in each class (essentially adding one to
the appropriate class sum) and then divide each count by the total number of
observations - as implemented in the dictget approach. Being more cryptic, we
can avoid the second division by adding one/number of observations instead of
one to the appropriate class sum, as implemented in get_content_freq.
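
The two approaches can be sketched roughly like this. This is only an
illustration: the function names are made up here, and it is not the code from
the patch or from listfns.

```python
def freq_count_then_divide(items):
    """Count each class, then divide every count by the total number of items."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    total = float(len(items))
    return dict((key, count / total) for key, count in counts.items())


def freq_add_fraction(items):
    """Add 1/n per observation, so no division over the classes is needed afterwards."""
    freqs = {}
    if not items:
        return freqs
    step = 1.0 / len(items)
    for item in items:
        freqs[item] = freqs.get(item, 0) + step
    return freqs


observations = [1, 2, 2, 1, 2, 2, 1, 2]
by_count = freq_count_then_divide(observations)
by_fraction = freq_add_fraction(observations)
```

With eight observations the step 1/8 is exact in binary floating point, so both
functions agree exactly here; for other list lengths the second one can differ
in the last digits.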

Thanks for the link, I created a timing code for random lists.

get_content_freq is the one I put in the patch
get_content_freq2 is the modified version
ternary is based on Corey's code, modified to give frequencies rather than counts
dictget uses a dictionary to count and then gets the frequencies
listfns.contents is the Biopython Python version without the C code import
clistfns.contents is the direct import of the Biopython module that uses C code

My system is running 64-bit Fedora Linux with Python 2.5.2. The number of
observations is not important (the difference is very small). I used 1000000
random integers, timed a single call, and then repeated the test 5 times with
1000000 executions each and took the minimum time, i.e.
min(timeit.repeat(5, 1000000)). Also, this function is not called that much in
NaiveBayes, so these are rather extreme cases.
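
For anyone wanting to repeat this kind of measurement, a minimal sketch of the
"best of 5" timing with the standard timeit module follows. The list size and
the number of executions are much smaller than in my runs so that it finishes
quickly, and dictget here is a stand-in written for this example, not the
attached timing code.

```python
import random
import timeit


def dictget(items):
    """Count with a dictionary, then convert the counts to frequencies."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    total = float(len(items))
    return dict((key, count / total) for key, count in counts.items())


random.seed(0)
data = [random.randint(1, 10) for _ in range(1000)]

timer = timeit.Timer(lambda: dictget(data))
# Five repeats of 100 calls each; the minimum is the figure least affected by system load.
best = min(timer.repeat(repeat=5, number=100))
print("dictget best of 5: %g seconds" % best)
```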

Range of ints between one and two:
get_content_freq  once: 1.90734863281e-05  best of 5: 8.11614704132
get_content_freq2 once: 8.10623168945e-06  best of 5: 4.39126110077
ternary file      once: 1.59740447998e-05  best of 5: 9.42879796028
dictget file      once: 1.4066696167e-05  best of 5: 10.468517065
listfns.contents  once: 1.28746032715e-05  best of 5: 7.50778198242
clistfns.contents once: 6.91413879395e-06  best of 5: 2.71360707283


Range of ints between one and ten:
get_content_freq  once: 1.90734863281e-05  best of 5: 7.97784090042
get_content_freq2 once: 7.15255737305e-06  best of 5: 4.21833491325
ternary file      once: 1.69277191162e-05  best of 5: 9.18815684319
dictget file      once: 1.50203704834e-05  best of 5: 10.2242910862
listfns.contents  once: 1.50203704834e-05  best of 5: 7.25569987297
clistfns.contents once: 8.10623168945e-06  best of 5: 2.6411280632

Range of ints between one and one hundred:

get_content_freq  once: 2.00271606445e-05  best of 5: 7.99760317802
get_content_freq2 once: 7.86781311035e-06  best of 5: 4.20446300507
ternary file      once: 1.71661376953e-05  best of 5: 9.26767396927
dictget file      once: 1.4066696167e-05  best of 5: 10.2449028492
listfns.contents  once: 1.4066696167e-05  best of 5: 7.34166693687
clistfns.contents once: 7.15255737305e-06  best of 5: 2.63198709488

So this is not dependent on the number of classes. For the most part these
numbers show more system overhead than major differences between the actual
approaches. Therefore I would clearly go with Michiel's version.


> > 
> > Given the possible rounding issues, does doing the rescaling (dividing by the
> > number of elements) at the start make a big time saving (over dividing each
> > total at the end)?  I would feel happier with the division at the end (as done
> > in the listfns code).
> > 
> I think the rescaling at the start is a good thing. If the list contains many
> different objects, rescaling at the end can take a long time. Probably that is
> not the typical use case here, but on the other hand I don't see a good reason
> not to save time here.

From the two-class case above, the get_content_freq methods result in:
{1: 0.49978999999354606, 2: 0.50020999999354643}
and the others result in:
{1: 0.49979000000000001, 2: 0.50021000000000004}

On my 64-bit Linux system the numerical error is small and within
expectations. It may be worse on a 32-bit system or OS. I really wanted to draw
attention to this because tiny differences can be important (not to mention
people who don't understand enough about numerical precision).
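
A rough way to see where the difference comes from is to compare a single
division at the end with repeated addition of 1/n. This is a sketch written for
this comment, not the timing code; the count 499790 is simply chosen to match
the 0.49979 frequency reported above, and the exact digits printed may vary
between platforms.

```python
n = 1000000
count_of_ones = 499790  # chosen to match the 0.49979 frequency above

# Division at the end: a single rounding step.
divide_at_end = count_of_ones / float(n)

# Rescaling at the start: one rounding step per addition, so errors can accumulate.
step = 1.0 / n
accumulated = 0.0
for _ in range(count_of_ones):
    accumulated += step

difference = abs(accumulated - divide_at_end)
print("divide at end: %r" % divide_at_end)
print("accumulated:   %r" % accumulated)
print("difference:    %g" % difference)
```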

> 
> Maybe just my nitpicking, but I think the get_content_freq function will be
> more readable if we use different variable names inside this function.
> 

Please rename as necessary.

Bruce



From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 12:00:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 12:00:42 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811051700.mA5H0gxV003976@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #12 from bsouthey at gmail.com  2008-11-05 12:00 EST -------
Created an attachment (id=1038)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1038&action=view)
timing different implementations of listfns.content

This is my timing code for different implementations of listfns.content. It
assumes that there is a local version of listfns.py without the import clistfns
statement at the end, and the clistfns function from Bio.



From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 15:30:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 15:30:46 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811052030.mA5KUklP023725@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #36 from bsouthey at gmail.com  2008-11-05 15:30 EST -------
(In reply to comment #35)
Okay, this is what I think of as the main uses for translation. All these can be
easily achieved by the translate arguments table='Standard' and stop_symbol='*'
with very little code. So I do not see any need for any extra arguments except
for convenience. (I have these uses in a file that I will upload after this.)

So really my only issue left is what is the expected behaviour for:
a) to_stop_codon=True if there are no valid stop codons (my understanding of
to_stop). 
b) from_start_codon=True (or init=True etc) if there are no valid start codons


1) Translation in some given forward frame - reverse frames should be obvious.
Looping over these will give all three frames but that could return multiple
Seq objects.

2) Translation between any range of locations. From Peter's example, extracting
the region between 5234 to 5530 in the complete sequence will give the yaaX
gene CDS that can be translated into the protein sequence.

3a) Translate to the first valid stop codon. Perhaps not as expected because it
should respect the frame, so try:
3b) Translate to the first valid stop codon with respect to the selected frame.
3c) Alternatively use the to_stop=True argument of translate. Here translation
is to the first valid stop codon OR the end of the sequence. This second aspect
is not documented.

4a) Start translation at the first start codon. Again, this does not respect
the frame, so try:
4b) Start translation at the first valid start codon with respect to the
selected frame.

In both cases of 4) the very first codon must be checked against the defined
start_codon list in the appropriate CodonTable.

Obviously 3) and 4) should raise exceptions if stop or start codons are not
found because of the specific request to stop or start translation. But, as in
3c), this could be relaxed to include the end of the sequence. I am not sure
what the behaviour should be if there is no valid start codon.

Also some variation of 3a) and 4a) could be used to find possible open reading
frames (from a start codon to stop codon). But this could return more than one
Seq object. 
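
To make uses 1), 2) and 3c) concrete without depending on Bio.Seq, here is a
small self-contained sketch. The translate function below is a toy written for
this comment (standard genetic code only, no ambiguity handling), not the
Biopython method, and the example sequence is invented.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the usual TCAG ordering of the three bases.
CODON_TABLE = dict(
    (b1 + b2 + b3, AMINO_ACIDS[16 * i + 4 * j + k])
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
)


def translate(dna, frame=0, to_stop=False):
    """Translate from forward frame 0, 1 or 2; a trailing partial codon is ignored."""
    protein = []
    for pos in range(frame, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[pos:pos + 3]]
        if to_stop and amino == "*":
            break  # stop at the first in-frame stop codon
        protein.append(amino)
    return "".join(protein)


dna = "CATGGCCTAAGG"
frames = [translate(dna, frame) for frame in range(3)]  # use 1): all forward frames
region = translate(dna[1:10])                           # use 2): a range of locations
upto_stop = translate(dna, frame=1, to_stop=True)       # use 3c): stop codon or end
```

Note that with to_stop=True and no in-frame stop codon this toy simply runs to
the end of the sequence, which is the relaxed behaviour described in 3c).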



From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 15:33:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 15:33:52 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811052033.mA5KXqqJ023824@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #37 from bsouthey at gmail.com  2008-11-05 15:33 EST -------
Created an attachment (id=1039)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1039&action=view)
examples of possible uses of translate



From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 17:12:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 17:12:13 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811052212.mA5MCDhY028649@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #38 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-05 17:12 EST -------
(In reply to comment #36)
> (In reply to comment #35)
> Okay, this is what I think of the main uses for translation.
> All these can be easily achieved by the translate arguments
> table='Standard' and stop_symbol='*' with very little code.
> So I do not see any need for any extra arguments except
> for convenience. (I have these uses in file that I will
> upload after this.)

Most of your examples seem to relate to open reading frame searches, looking
for start/stop codons etc.  I agree this kind of thing isn't needed in the
basic translate method/function.

Doing a CDS translation however is more fiddly due to the methionine at the
start, and I think this warrants another option in the basic translate
method/function.

> So really my only issue left is what is the expected behaviour for:
> a) to_stop_codon=True if there are no valid stop codons (my understanding of
> to_stop).

If you are asking about the current to_stop argument in CVS right now, if there
is no in frame stop codon it will translate all the sequence (to_stop has no
effect).  I've just updated the docstring to make this more explicit (see
Bio/Seq.py CVS revision 1.55).

Do you think "to_stop_codon" is a clearer argument name than "to_stop"?

> b) from_start_codon=True (or init=True etc) if there are no valid start codons

As written in attachment 1032, if the sequence does not start with a valid
start codon an exception is raised.



From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 18:09:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 18:09:01 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811052309.mA5N91aO031273@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #39 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-05 18:09 EST -------
Created an attachment (id=1040)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1040&action=view)
Patch to Bio/Seq.py for complete CDS translation.

(In reply to comment #33)
> Instead of the "init" start codon option in attachment 1032,
> I'd also be happy with a single boolean argument which does
> start codon validation, treats this as a methionine, checks
> the sequence is a multiple of three in length, checks for a
> final stop codon, and checks for no additional stop codons.
> We'd ruled out calling this "complete", but maybe "cds"
> would be better?

This patch adds this functionality via a "complete_cds" boolean argument.

Here is how it could be applied to translate the CDS used as an example in my
comment 35, the yaaX gene in E. coli K12:

>>> from Bio.Seq import Seq
>>> my_cds = Seq("GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCAGCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGATAATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACATTATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCATAAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA")
>>> my_cds.translate(table=11)
Seq('VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HR*',
HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> my_cds.translate(table=11, to_stop=True)
Seq('VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR',
ExtendedIUPACProtein())
>>> my_cds.translate(table=11, complete_cds=True)
Seq('MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR',
ExtendedIUPACProtein())

I would be happy with EITHER of these options, as both can be used to translate
a complete coding sequence:

(1) the "init" argument (under another name, maybe "cds_start"?) illustrated in
attachment 1032.  This would check the start codon is valid AND translate it as
a methionine.

(2) the "complete_cds" argument (perhaps under another name, maybe "cds"?)
illustrated in this patch.  This would check the start codon is valid AND
translate it as a methionine AND check there are a whole number of codons AND
check it ends with a stop codon AND check there are no extra in-frame stop
codons.
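
The checks listed under option (2) can be sketched as a standalone function.
This is not the code in the attached patch: the function name, the error
messages and the default start codon tuple (an illustrative subset, not the
full list from any CodonTable) are assumptions made for this example, and only
the standard amino acid assignments are used.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    (b1 + b2 + b3, AMINO_ACIDS[16 * i + 4 * j + k])
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
)


def translate_complete_cds(dna, start_codons=("ATG", "GTG", "TTG"), stop_symbol="*"):
    """Translate a complete CDS, raising ValueError if any of the checks fails."""
    if len(dna) % 3 != 0:
        raise ValueError("Sequence length is not a multiple of three")
    if dna[:3] not in start_codons:
        raise ValueError("First codon %r is not a start codon" % dna[:3])
    protein = "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))
    if not protein.endswith(stop_symbol):
        raise ValueError("Final codon is not a stop codon")
    if stop_symbol in protein[:-1]:
        raise ValueError("Extra in-frame stop codon found")
    # The start codon is read as methionine and the final stop symbol is dropped.
    return "M" + protein[1:-1]


short_cds = translate_complete_cds("GTGAAATAA")
```

Option (1) would correspond to keeping only the start codon check and the
methionine substitution.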



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 06:14:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:14:07 -0500
Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for
	arguments to their types
In-Reply-To: <bug-2639-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061114.mA6BE7jk002000@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2639


------- Comment #3 from dalloliogm at gmail.com  2008-11-06 06:14 EST -------
Created an attachment (id=1041)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1041&action=view)
add a check for the seq argument in seqrecord, to be a Seq object and not None

This patch adds a check for the seq argument in SeqRecord.
If seq is None (the default), it raises a ValueError exception.
If it is a Seq object, it is saved as self.seq.
If it is another kind of object (string, list, integer), it is converted to a
string, and then used to instantiate a Seq object.
I thought that someone could use an integer (e.g.: 010100010101101) as a
sequence, and in this case, the integer is first converted to a string
(otherwise Seq() would return an error).

Please take care with this patch: I have messed up a bit with cvs and patches :(,
so this patch also contains a doctest example that I have added for myself
(see bug report 2640).



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 06:31:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:31:57 -0500
Subject: [Biopython-dev] [Bug 2643] New: Proposal: fastPhaseOutputIO for
	SeqIO
Message-ID: <bug-2643-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643

           Summary: Proposal: fastPhaseOutputIO for SeqIO
           Product: Biopython
           Version: Not Applicable
          Platform: PC
               URL: http://github.com/dalloliogm/biopython---popgen/tree/master/src/PopGen/Gio/fastPhaseOutputIO.py
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: dalloliogm at gmail.com
                CC: tiagoantao at gmail.com


Hi,
fastPHASE is software for haplotype reconstruction and missing genotype
estimation from population genetic SNP data.
- http://stephenslab.uchicago.edu/software.html
It is commonly used by some population genetics bioinformaticians.

I had to convert the output from a fastPhase run to fasta; so I wrote a module
that reads a fastPhase output file, and returns SeqRecord objects.

fastPhase output contains information about SNPs and genotyping, and would
probably be supported by the PopGen module that is being written for biopython.
However, my module is intended only to read the sequence information
from the output file and to create SeqRecord objects, ignoring any other kind
of information.
So, in the future we could have two fastPhaseOutputIterator-like modules, one
that creates SeqRecord objects, and another to be used in PopGen.

The module has been tested with doctest. I'll attach a file with the tests
along with the module.



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 06:40:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:40:17 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061140.mA6BeHwc003465@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #1 from dalloliogm at gmail.com  2008-11-06 06:40 EST -------
Created an attachment (id=1042)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1042&action=view)
fastPhase output iterator, for SeqIO

If invoked directly, this module tries to call doctest.testfile over a file
called test_fastPhaseOutputIO.py (I will post it in 5 minutes).
You should edit this module to point it to the right file path on your
computer.

This module is intended to be used with SeqIO. You should modify
SeqIO.__init__.py and add it to the _FormatToIterator dictionary.
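
The registration idea is just a lookup from a format name to an iterator
function. The sketch below shows the pattern on its own, without touching
Bio.SeqIO: format_to_iterator stands in for the real _FormatToIterator
dictionary, and toy_iterator, parse and the "fastphase" key are hypothetical
names used only for illustration.

```python
def toy_iterator(handle):
    """Hypothetical stand-in for the real iterator: yields the non-empty lines."""
    for line in handle:
        line = line.strip()
        if line:
            yield line


# Stand-in for the _FormatToIterator dictionary; the key name is an assumption.
format_to_iterator = {"fastphase": toy_iterator}


def parse(handle, format):
    """Look up the iterator registered for this format name and apply it."""
    try:
        iterator = format_to_iterator[format]
    except KeyError:
        raise ValueError("Unknown format %r" % format)
    return iterator(handle)


records = list(parse(["T C", "", "C T"], "fastphase"))
```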

I didn't write a Writer handler, because you are not supposed to create
fastPhaseOutput files manually (even if it could be useful for testing
purposes).

You can see the git history of this module here: 
- http://github.com/dalloliogm/biopython---popgen/tree/master/src/PopGen/Gio/fastPhaseOutputIO.py



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 06:42:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:42:55 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061142.mA6Bgt77003705@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #2 from dalloliogm at gmail.com  2008-11-06 06:42 EST -------
Created an attachment (id=1043)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1043&action=view)
this is a doctest file to test fastPhaseOutputIterator

This file is called by fastPhaseOutputIO, when __name__ == '__main__'



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 06:44:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:44:55 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061144.mA6BitTU003910@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #3 from dalloliogm at gmail.com  2008-11-06 06:44 EST -------
Created an attachment (id=1044)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1044&action=view)
adds fastPhaseOutput support to SeqIO

This patch adds fastPhaseOutput support to SeqIO (not tested).



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 06:50:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:50:39 -0500
Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for
	arguments to their types
In-Reply-To: <bug-2639-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061150.mA6Bod9J004289@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2639


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 06:50 EST -------
(In reply to comment #3)
> Created an attachment (id=1041)
>  --> (http://bugzilla.open-bio.org/attachment.cgi?id=1041&action=view) [details]
> add a check for the seq argument in seqrecord, to be a Seq object and not None
>
> This patch adds a check for the seq argument in SeqRecord.
> If seq is None (by default), it raises a ValueError Exception.
> If it is a Seq objects, it saves it as self.seq.
> If it is another kind of object (string, list, integer), it is converted to a
> string, and then used to instantiate a seq object.

I was deliberately not checking the seq argument.  There are several reasonable
use cases:

* a Seq object (normal) or a subclass of it.
* a MutableSeq object (seems reasonable, note this is not a subclass of Seq)
* None (seems a good way to handle sequence records where we don't know the
sequence - for example some GenBank files).
* a user defined sequence object which implements the Seq API but does not
subclass Seq or MutableSeq (this is more difficult to check).

> I thought that someone could use an integer (e.g.: 010100010101101) as a
> sequence, and in this case, the integer is first converted to a string
> (otherwise Seq() would return an error).

Note that if someone did want to use some weird numerical sequence, then the
SeqRecord object should NOT be trying to do anything special (guessing what is
intended). The user should create a suitable Seq object themselves (ideally
with a numerical alphabet object).  Explicit rather than implicit (Zen of
python).

--

Note that I'm not 100% happy with the type checking we've just added.  See
"duck-typing" and interfaces versus types,
http://www.python.org/doc/2.5.2/tut/node18.html#l2h-46

The checks I've added shouldn't be too constraining - but maybe they should use
interface checking instead (or just revert back to no checking).
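
To show what I mean by interface checking, here is a minimal sketch. It is not
the check currently in SeqRecord; the function name and the choice of required
methods (__len__ and __getitem__) are assumptions for illustration, and MySeq
is a hypothetical user-defined class that does not subclass Seq or MutableSeq.

```python
def looks_like_sequence(obj):
    """Interface check: accept None, or anything offering the methods we rely on."""
    if obj is None:
        return True  # sequence not known, as in some GenBank files
    required = ("__len__", "__getitem__")
    return all(hasattr(obj, name) for name in required)


class MySeq(object):
    """Hypothetical user-defined sequence implementing only part of the Seq API."""

    def __init__(self, data):
        self._data = data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]


accepted = [looks_like_sequence(None), looks_like_sequence(MySeq("ACGT"))]
rejected = looks_like_sequence(42)
```

One consequence of a check like this is that plain strings and lists pass too,
so it is looser than an isinstance test against Seq.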

Any comments from other people?  This should be being CC'd to the dev mailing
list.



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 07:14:04 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 07:14:04 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061214.mA6CE4PD005743@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 07:14 EST -------
Hi Marco,

This looks interesting :)

Could you attach the individual valid sample fastPHASE files as separate
attachments (so they can be integrated into the existing unit tests).  You seem
to have picked very small files in order to use them as doctests; a larger more
realistic example would be better for the unit tests (a few 5kb in size should
be OK - not too big).

Do you have a URL for the file format documentation?  Are they always DNA for
example, or is RNA also possible?

If you want to include a fastPHASE parser in Bio.SeqIO it should ideally cope
with any valid fastPHASE output.  In the doctests you have an example:

... BEGIN GENOTYPES
... Ind1  # subpop. label: 6  (internally 1)
... T
... T C
... Ind2  # subpop. label: 6  (internally 1)
... C
... T
... END GENOTYPES

You're treating this as an error - "Two chromosomes with different length". 
Why isn't it parsed as four short sequences (of different lengths): "T", "TC",
"C", "T"?

Similarly, the final example:

... BEGIN GENOTYPES
... Ind1  # subpop. label: 6  (internally 1)
... T T T T T G A A A C C A A A G A C G C T G C G T C A G C C T G C A A T C T G
... Ind2  # subpop. label: 6  (internally 1)
... C T T T T G C C C T C A A A A G T G C T G T G C C A G T C T A C G G C C T G
... T T T T T G A A A C C A A A G A C G C T T C G T C A G T A T A C G A T C T A
... END GENOTYPES

Again, you raised an error - "Missing sequence in input file".  If this is a
valid file shouldn't it be parsed as three sequences?

On the other hand, are these hand edited files which deliberately break the
rules?  If fastPHASE files SHOULD always come in allele groups (of the same
length), then it would be better to integrate the parser into Bio.AlignIO
giving pairwise alignments (and you would be able to read it via Bio.SeqIO
automatically as well).
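
To illustrate the "parse every line as its own sequence" alternative, here is a
rough sketch. It is based only on the examples quoted above, not on any
fastPHASE format documentation: it assumes that individual header lines contain
a "#" comment and that every other line inside the block is one sequence. The
function name and the "Ind1_1" naming scheme are made up.

```python
def parse_genotypes(lines):
    """Yield (name, sequence) pairs from a BEGIN GENOTYPES ... END GENOTYPES block."""
    in_block = False
    name = None
    count = 0
    for line in lines:
        line = line.strip()
        if line == "BEGIN GENOTYPES":
            in_block = True
        elif line == "END GENOTYPES":
            in_block = False
        elif in_block and line:
            if "#" in line:
                # Assumed: header lines look like "Ind1  # subpop. label: 6 ..."
                name = line.split("#")[0].strip()
                count = 0
            else:
                count += 1
                yield "%s_%i" % (name, count), line.replace(" ", "")


example = [
    "BEGIN GENOTYPES",
    "Ind1  # subpop. label: 6  (internally 1)",
    "T",
    "T C",
    "Ind2  # subpop. label: 6  (internally 1)",
    "C",
    "T",
    "END GENOTYPES",
]
parsed = list(parse_genotypes(example))
```

Read this way, the first doctest example gives four short sequences of
different lengths instead of an error.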

P.S. Your suggested format name "fastPhaseOutput" breaks the lower case rule. 
Would "fastphase" be OK, or is there more than one format?  e.g. an input
format which might be confused with this?

Peter



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 07:21:09 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 07:21:09 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061221.mA6CL9e8006180@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 07:21 EST -------
(In reply to comment #4)
> You seem to have picked very small files in order to use them as
> doctests; a larger more realistic example would be better for the
> unit tests (a few 5kb in size should be OK - not too big).

Sorry - that was a typo.  I meant a few kb in size (5kb should be OK). 

I don't have a feel for the typical size of real fastPHASE output, but a few
interesting real examples (e.g. covering a range of fastPHASE command line
options) would be better than a single large file.

Peter



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 07:25:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 07:25:42 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061225.mA6CPgsn006472@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 07:25 EST -------
P.S.

The module's docstring needs some work - your introduction for this bug might
be a good start.  We should include the URL
http://stephenslab.uchicago.edu/software.html and the reference in the module's
docstring:

Scheet, P and Stephens, M (2006) "A fast and flexible statistical model for
large-scale population genotype data: applications to inferring missing
genotypes and haplotypic phase." Am J Hum Genet 78(4):629-44.



From tiagoantao at gmail.com  Thu Nov  6 08:18:54 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Thu, 6 Nov 2008 13:18:54 +0000
Subject: [Biopython-dev] Preparing for Biopython 1.49 (beta)
In-Reply-To: <320fb6e00811040336k12a834b9o2fa103b8fabf7ec1@mail.gmail.com>
References: <320fb6e00811040336k12a834b9o2fa103b8fabf7ec1@mail.gmail.com>
Message-ID: <6d941f120811060518w388bd471g129aafdaf02381d4@mail.gmail.com>

On Tue, Nov 4, 2008 at 11:36 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> If this schedule is realistic, then Tiago should be OK to add his next
> set of PopGen code in about two weeks time (for what would become
> Biopython 1.50).


I am working on documentation and test cases for LDNe and extra
GenePop support (this is more or less orthogonal to the ongoing
discussion on statistics); the code has been done for weeks. I will start
uploading it as soon as you unfreeze CVS after 1.49.

From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 09:24:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 09:24:12 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061424.mA6EOCcB015073@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #40 from bsouthey at gmail.com  2008-11-06 09:24 EST -------
(In reply to comment #38)
> (In reply to comment #36)
> > (In reply to comment #35)
> > Okay, this is what I think of the main uses for translation.
> > All these can be easily achieved by the translate arguments
> > table='Standard' and stop_symbol='*' with very little code.
> > So I do not see any need for any extra arguments except
> > for convenience. (I have these uses in file that I will
> > upload after this.)
> 
> Most of your examples seem to relate to open reading frame searches, looking
> for start/stop codons etc.  I agree this kind of thing isn't needed in the
> basic translate method/function.
> 
> Doing a CDS translation however is more fiddly due to the methionine at the
> start, and I think this warrants another option in the basic translate
> method/function.
> 
> > So really my only issue left is what is the expected behaviour for:
> > a) to_stop_codon=True if there are no valid stop codons (my understanding of
> > to_stop).
> 
> If you are asking about the current to_stop argument in CVS right now, if there
> is no in frame stop codon it will translate all the sequence (to_stop has no
> effect).  I've just updated the docstring to make this more explicit (see
> Bio/Seq.py CVS revision 1.55).
> 
> Do you think "to_stop_codon" is a clearer argument name than "to_stop"?
> 

I think "to_end", because "end" does mean the end of the translation, whether
due to a stop codon or the end of the sequence.


> > b) from_start_codon=True (or init=True etc) if there are no valid start codons
> 
> As written in attachment 1032 [details], if the sequence does not start with a valid
> start codon an exception is raised.
> 

Okay.



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 09:35:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 09:35:40 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061435.mA6EZe5F015831@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #41 from lpritc at scri.sari.ac.uk  2008-11-06 09:35 EST -------
(In reply to comment #40)
> > If you are asking about the current to_stop argument in CVS right now, if there
> > is no in frame stop codon it will translate all the sequence (to_stop has no
> > effect).  I've just updated the docstring to make this more explicit (see
> > Bio/Seq.py CVS revision 1.55).
> > 
> > Do you think "to_stop_codon" is a clearer argument name than "to_stop"?
> > 
> I think to_end because end does mean the end of the translation due to a stop
> codon or end of a sequence.

I would take 'to_end' to mean 'to the end of the passed sequence, ignoring all
stop codons along the way'.  'to_first_stop' is clearer, to my mind, and even
that leaves out the potential (and hopefully redundant) qualifier 'in-frame' ;)



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 09:46:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 09:46:48 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061446.mA6Ekmfj016554@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #42 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 09:46 EST -------
Peter in comment #38:
>>> If you are asking about the current to_stop argument in CVS right now,
>>> if there is no in frame stop codon it will translate all the sequence
>>> (to_stop has no effect).  I've just updated the docstring to make this
>>> more explicit (see Bio/Seq.py CVS revision 1.55).
>>> 
>>> Do you think "to_stop_codon" is a clearer argument name than "to_stop"?
>>>

Bruce in comment #40:
>> I think to_end because end does mean the end of the translation
>> due to a stop codon or end of a sequence.
>>

Leighton in comment #41:
> I would take 'to_end' to mean 'to the end of the passed sequence,
> ignoring all stop codons along the way'.  'to_first_stop' is
> clearer, to my mind, and even that leaves out the potential (and
> hopefully redundant) qualifier 'in-frame' ;)
> 

I agree with Leighton here, "to_end" sounds like "to the end of the sequence
given".  I quite like "to_first_stop", but it is longer than "to_stop".
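
A minimal sketch of the behaviour being named here, in plain Python rather
than the Bio.Seq code: with to_stop set, translation halts at the first
in-frame stop codon, and if there is no in-frame stop the whole sequence is
translated. The helper name translate_sketch and the hard-coded standard
table are assumptions made for illustration only.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_sketch(dna, to_stop=False):
    """Translate frame 1 of unambiguous DNA (hypothetical helper)."""
    protein = []
    # Ignore any trailing partial codon.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if to_stop and amino_acid == "*":
            break  # first in-frame stop: end here, the stop is not emitted
        protein.append(amino_acid)
    return "".join(protein)


with_stop = translate_sketch("ATGAAATAGGGG", to_stop=True)
no_stop = translate_sketch("ATGAAAGGG", to_stop=True)
```

Whatever the argument ends up being called, the second call shows the case
discussed above: with no in-frame stop codon the flag has no effect.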



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 10:07:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 10:07:06 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061507.mA6F76PK018513@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #43 from bsouthey at gmail.com  2008-11-06 10:07 EST -------
(In reply to comment #39)
> Created an attachment (id=1040)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1040&action=view) [details]
> Patch to Bio/Seq.py for complete CDS translation.
> 
> (In reply to comment #33)
> > Instead of the "init" start codon option in attachment 1032 [details],
> > I'd also be happy with a single boolean argument which does
> > start codon validation, treats this as a methionine, checks
> > the sequence is a multiple of three in length, checks for a
> > final stop codon, and checks for no additional stop codons.
> > We'd ruled out calling this "complete", but maybe "cds"
> > would be better?
> 
> This patch adds this functionality via a "complete_cds" boolean argument.
> 
> Here is how it could be applied to translate the CDS used as an example in my
> comment 35, the yaaX gene in E. coli K12:
> 
> >>> from Bio.Seq import Seq
> >>> my_cds = Seq("GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCAGCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGATAATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACATTATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCATAAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA")
> >>> my_cds.translate(table=11)
> Seq('VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HR*',
> HasStopCodon(ExtendedIUPACProtein(), '*'))
> >>> my_cds.translate(table=11, to_stop=True)
> Seq('VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR',
> ExtendedIUPACProtein())
> >>> my_cds.translate(table=11, complete_cds=True)
> Seq('MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR',
> ExtendedIUPACProtein())
> 
> I would be happy with EITHER of these options, as both can be used to translate
> a complete coding sequence:
> 
> (1) the "init" argument (under another name, maybe "cds_start"?) illustrated in
> attachment 1032 [details].  This would check the start codon is valid AND translate it as
> a methionine.
> 
> (2) the "complete_cds" argument (perhaps under another name, maybe "cds"?)
> illustrated in this patch.  This would check the start codon is valid AND
> translate it as a methionine AND check there are a whole number of codons AND
> check it ends with a stop codon AND check there are no extra in-frame stop
> codons.
> 


I support (1) but strongly disagree with (2) because 'cds' refers to a complete
DNA sequence not just if the sequence starts with M.
http://www.yeastgenome.org/help/glossary.html
"CDS:    CoDing Sequence, region of nucleotides that corresponds to the
sequence of amino acids in the predicted protein. The CDS includes start and
stop codons, therefore coding sequences begin with an "ATG" and end with a stop
codon. In SGD, unexpressed sequences, including the 5'-UTR, the 3'-UTR,
introns, or bases not expressed due to frameshifting, are not included within a
CDS. Note that the CDS does not correspond to the actual mRNA sequence."

However, I do like being able to obtain the translation of the actual CDS -
just not here.

I do not support the name 'init' because of reasons discussed. 

I do not support the name 'cds_start' because of the DNA interpretation and
that many Genbank records include the upstream and downstream non-coding
regions. In such cases, I would have to find the actual start codon, then I
might as well do the translation after that start codon than rely on a check
that might be wrong.

Perhaps some variant of:
a) Similar cases in Python:
has_met or has_met1
get_met or get_met1
b) More direct meaning:
starts_with_methionine, starts_with_met, starts_with_m



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 10:08:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 10:08:17 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061508.mA6F8HRo018696@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #44 from bsouthey at gmail.com  2008-11-06 10:08 EST -------
(In reply to comment #42)
> Peter in comment #38:
> >>> If you are asking about the current to_stop argument in CVS right now,
> >>> if there is no in frame stop codon it will translate all the sequence
> >>> (to_stop has no effect).  I've just updated the docstring to make this
> >>> more explicit (see Bio/Seq.py CVS revision 1.55).
> >>> 
> >>> Do you think "to_stop_codon" is a clearer argument name than "to_stop"?
> >>>
> 
> Bruce in comment #40:
> >> I think to_end because end does mean the end of the translation
> >> due to a stop codon or end of a sequence.
> >>
> 
> Leighton in comment #41:
> > I would take 'to_end' to mean 'to the end of the passed sequence,
> > ignoring all stop codons along the way'.  'to_first_stop' is
> > clearer, to my mind, and even that leaves out the potential (and
> > hopefully redundant) qualifier 'in-frame' ;)
> > 
> 
> I agree with Leighton here, "to_end" sounds like "to the end of the sequence
> given".  I quite like "to_first_stop", but it is longer than "to_stop".
> 

Either is fine with me.



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 10:11:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 10:11:38 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061511.mA6FBcAY019165@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 10:11 EST -------
I've now had a quick look at the fastPHASE documentation, and I have the
impression that the sequences should always come in pairs:

"Output files for inferred haplotypes or imputed genotypes contain two lines
per given diploid individual, with the order of individuals corresponding to
that supplied in the input file."

Assuming the paired sequences are always the same length, this does suggest the
format should be integrated into Bio.AlignIO (giving pairwise alignments)
rather than Bio.SeqIO.
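
As a rough sketch of that pairing idea (plain Python, not Bio.AlignIO code):
consecutive haplotype lines are grouped two at a time and each pair is checked
for equal length. The function name pair_haplotypes is hypothetical, and the
sketch assumes that ID and summary lines have already been stripped and that
alleles are separated by whitespace.

```python
def pair_haplotypes(sequence_lines):
    """Group consecutive haplotype lines into (first, second) pairs.

    Hypothetical helper: assumes only sequence lines are passed in,
    with whitespace-separated alleles on each line.
    """
    rows = [line.split() for line in sequence_lines if line.strip()]
    if len(rows) % 2:
        raise ValueError("Odd number of haplotype lines")
    pairs = []
    for first, second in zip(rows[0::2], rows[1::2]):
        if len(first) != len(second):
            raise ValueError("Paired haplotypes differ in length")
        pairs.append((first, second))
    return pairs


pairs = pair_haplotypes(["T T G A", "C T G A", "T C G A", "T C G C"])
```

Each pair could then be wrapped as a two-row alignment by whichever
alignment class the parser ends up using.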

Have you tried not estimating the haplotypes (by supplying a negative integer
following -H), and does this alter the sequence output?

Finally, could you try the -Z command line argument for the simplified output
format (described as two lines per individual, without "id" lines,
subpopulation labels or summary information from the run)?  Does this have the
sequences?  If so, this may be a more parser-friendly output to handle in
Bio.SeqIO and/or Bio.AlignIO.



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 10:27:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 10:27:07 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061527.mA6FR7TQ021259@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #45 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 10:27 EST -------
(In reply to comment #43)
> (In reply to comment #39)
> > I would be happy with EITHER of these options, as both can be used to
> > translate a complete coding sequence:
> > 
> > (1) the "init" argument (under another name, maybe "cds_start"?)
> > illustrated in attachment 1032.  This would check the start
> > codon is valid AND translate it as a methionine.
> > 
> > (2) the "complete_cds" argument (perhaps under another name, maybe "cds"?)
> > illustrated in this patch.  This would check the start codon is valid AND
> > translate it as a methionine AND check there are a whole number of codons
> > AND check it ends with a stop codon AND check there are no extra in-frame
> > stop codons.
> > 
> 
> 
> I support (1) but strongly disagree with (2) because 'cds' refers to
> a complete DNA sequence not just if the sequence starts with M.
> http://www.yeastgenome.org/help/glossary.html
> "CDS:    CoDing Sequence, region of nucleotides that corresponds to the
> sequence of amino acids in the predicted protein. The CDS includes start and
> stop codons, therefore coding sequences begin with an "ATG" and end with a
> stop codon. In SGD, unexpressed sequences, including the 5'-UTR, the 3'-UTR,
> introns, or bases not expressed due to frameshifting, are not included within
> a CDS. Note that the CDS does not correspond to the actual mRNA sequence."

Starting with that definition but being aware of atypical start codons gives:

"The CDS includes start and stop codons, therefore coding sequences begin with
an "ATG" [or other valid start codon] and end with a stop codon."

This then fits exactly with what I'm doing in the "complete_cds" option
(attachment 1040).  So why the disagreement?
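
For readers without the attachment to hand, the checks listed under option (2)
can be sketched in plain Python as below. This is not the code from attachment
1040: the function name translate_complete_cds is hypothetical, only
unambiguous DNA is handled, and the start codon set is that of NCBI table 11
(the table used for the yaaX example).

```python
from itertools import product

# Standard amino acid assignments in NCBI order; table 11 shares them.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Start codons of NCBI table 11 (bacterial, archaeal and plant plastid).
START_CODONS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def translate_complete_cds(dna):
    """Validate and translate a complete CDS (hypothetical stand-in)."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("Length is not a multiple of three")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    if not codons or codons[0] not in START_CODONS:
        raise ValueError("First codon is not a valid start codon")
    amino_acids = [CODON_TABLE[codon] for codon in codons]
    if amino_acids[-1] != "*":
        raise ValueError("Final codon is not a stop codon")
    if "*" in amino_acids[:-1]:
        raise ValueError("Extra in-frame stop codon")
    # The start codon is read as methionine whatever it normally encodes.
    return "M" + "".join(amino_acids[1:-1])


protein = translate_complete_cds("GTGAAATAA")
```

Here GTG would normally be read as valine, but as a start codon it is
translated as methionine and the final stop is dropped.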

> However, I do like being able to obtain the translation of the actual
> CDS - just not here.

Back in comment 11, I previously mooted having separate methods like
translate_to_stop, and translate_cds - but we currently seem to be leaning
towards one method with some options.

> I do not support the name 'init' because of reasons discussed. 

I think that is settled, "init" is too ambiguous.

> I do not support the name 'cds_start' because of the DNA interpretation and
> that many Genbank records include the upstream and downstream non-coding
> regions. In such cases, I would have to find the actual start codon, then I
> might as well do the translation after that start codon than rely on a check
> that might be wrong.

In such cases, if your sequence might include upstream and downstream
non-coding regions, then you shouldn't be trying to use the "init"/"cds_start"
option (or the "complete_cds" option).  By the nature of your uncertain
dataset, you'll have to do some extra work to find the start/stop.  I don't see
how this is an argument against providing an option useful for when you do know
where the CDS starts (or do already have the CDS).

> Perhaps some variant of:
> a) Similar cases in Python:
> has_met or has_met1
> get_met or get_met1
> b) More direct meaning:
> starts_with_methionine, starts_with_met, starts_with_m
> 

I'd been avoiding names with methionine in them, preferring to focus on
initiation or start codon based names.

I guess "starts_with_met" is OK.  Or maybe "start_met"?



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 10:28:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 10:28:20 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061528.mA6FSKMv021486@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #46 from lpritc at scri.sari.ac.uk  2008-11-06 10:28 EST -------
(In reply to comment #43)

> > (2) the "complete_cds" argument (perhaps under another name, maybe "cds"?)
> > illustrated in this patch.  This would check the start codon is valid AND
> > translate it as a methionine AND check there are a whole number of codons AND
> > check it ends with a stop codon AND check there are no extra in-frame stop
> > codons.

> I support (1) but strongly disagree with (2) because 'cds' refers to a complete
> DNA sequence not just if the sequence starts with M.
> http://www.yeastgenome.org/help/glossary.html
> "CDS:    CoDing Sequence, region of nucleotides that corresponds to the
> sequence of amino acids in the predicted protein. The CDS includes start and
> stop codons, therefore coding sequences begin with an "ATG" and end with a stop
> codon. In SGD, unexpressed sequences, including the 5'-UTR, the 3'-UTR,
> introns, or bases not expressed due to frameshifting, are not included within a
> CDS. Note that the CDS does not correspond to the actual mRNA sequence."

That definition seems to correspond exactly to (2), above; not that web-based
definitions have any particular authority ;)

"Begin with an ATG" is a eukaryote-specific statement; "Begin with a (valid)
start codon" covers this.

"End with a stop codon", implying the *first in-frame* stop codon is the same
in both cases.

Where do you see that they differ?

> I do not support the name 'cds_start' because of the DNA interpretation and
> that many Genbank records include the upstream and downstream non-coding
> regions. In such cases, I would have to find the actual start codon, then I
> might as well do the translation after that start codon than rely on a check
> that might be wrong.

I don't think that the argument is proposed for that particular use-case, which
is why I don't think it's valid, there.  If, say, you knew that the 5' UTR ran
to base 17, then you could check with seq[17:].translate(complete_cds=True) or
some such arrangement - but that's not the problem that's being solved with
that method argument, I think.

> Perhaps some variant of:
> a) Similar cases in Python:
> has_met or has_met1
> get_met or get_met1
> b) More direct meaning:
> starts_with_methionine, starts_with_met, starts_with_m

I quite like this way of checking sequence properties, and would prefer an
is_cds() (or, to be pedantic, is_conceptual_cds()) method that returns a
Boolean, but otherwise implements the sort of behaviour described above.

If you only wanted the conceptual translations of sequences that fit the
criteria for a CDS, then a one-liner to replace

[seq.translate(cds=True) for seq in seqlist]

might be

[seq.translate() for seq in seqlist if seq.is_cds()]

I prefer the second option, for readability, but YMMV.
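
A sketch of what such a Boolean test might look like, written as a plain
function rather than a Seq method. The name is_cds comes from the suggestion
above, but this body is only an assumption: it handles unambiguous DNA and
uses the start codons of NCBI table 11.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Start codons of NCBI table 11; an assumption for this example.
START_CODONS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def is_cds(dna):
    """Return True if dna looks like a complete CDS (hypothetical predicate)."""
    dna = dna.upper()
    if not dna or len(dna) % 3:
        return False
    amino_acids = [CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3)]
    return (
        dna[:3] in START_CODONS           # valid start codon
        and amino_acids[-1] == "*"        # ends with a stop codon
        and "*" not in amino_acids[:-1]   # no earlier in-frame stop
    )


seqlist = ["GTGAAATAA", "AAAAAATAA", "ATGTAAGGGTAA", "ATGCCC"]
valid = [seq for seq in seqlist if is_cds(seq)]
```

Used as a filter, the predicate never raises for a sequence that fails a
check; it simply leaves that sequence out of the result.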



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 11:06:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 11:06:46 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061606.mA6G6kL7028787@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #8 from dalloliogm at gmail.com  2008-11-06 11:06 EST -------
(In reply to comment #4)
> Hi Marco,

Hi!! :)

> This looks interesting :)
> 
> Could you attach the individual valid sample fastPHASE files as separate
> attachments (so they can be integrated into the existing unit tests).  You seem
> to have picked very small files in order to use them as doctests; a larger more
> realistic example would be better for the unit tests (a few 5kb in size should
> be OK - not too big).

ok
Actually I have been using files which come from our laboratory analysis, and I
would first like to ask whether I can include them here, and how.

> Do you have URL for the file format documentation?  

The fastPHASE format seems to be described only in fastPHASE's manual, which is
only accessible after accepting a license agreement.
I could contact the authors of the program to ask them to publish the format
specification publicly. It would be in their interest, as otherwise the format
could be considered non-standard.
I'll let you know.

> Are they always DNA for example, or is RNA also possible?

They should be DNA. In principle they could also be genes, or other kinds of
characters, but this software is designed for the purpose of reconstructing
haplotypes from SNPs/microsatellites.
Maybe Tiago has some more experience with this.

> If you want to include a fastPHASE parser in Bio.SeqIO it should ideally cope
> with any valid fastPHASE output.  In the doctests you have an example:
> 
> ... BEGIN GENOTYPES
> ... Ind1  # subpop. label: 6  (internally 1)
> ... T
> ... T C
> ... Ind2  # subpop. label: 6  (internally 1)
> ... C
> ... T
> ... END GENOTYPES
> You're treating this as an error - "Two chromosomes with different length". 
> Why isn't it parsed as four short sequences (of different lengths): "T", "TC",
> "C", "T"?

You should not have a file in which one chromosome is longer than the other;
instead, you should have a '?' indicating data that the program could
not infer.


> Similarly, the final example:
> 
> ... BEGIN GENOTYPES
> ... Ind1  # subpop. label: 6  (internally 1)
> ... T T T T T G A A A C C A A A G A C G C T G C G T C A G C C T G C A A T C T G
> ... Ind2  # subpop. label: 6  (internally 1)
> ... C T T T T G C C C T C A A A A G T G C T G T G C C A G T C T A C G G C C T G
> ... T T T T T G A A A C C A A A G A C G C T T C G T C A G T A T A C G A T C T A
> ... END GENOTYPES
> 
> Again, you raised an error - "Missing sequence in input file".  If this is a
> valid file shouldn't it be parsed as three sequences?

Because that would mean that one individual has only one chromosome.
It doesn't make sense to run fastPHASE on a haploid individual.


> On the other hand, are these hand edited files which deliberately break the
> rules?  

Yes. Usually you shouldn't have either of the two cases. But I find it useful
when a script tells me if there are weird things in my files (I could have
modified them accidentally).
This could be refactored in a check_fileformat function.
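
One possible shape for such a check, as a sketch only: it reads the
BEGIN GENOTYPES / END GENOTYPES block shown in the doctests above and raises
ValueError for the two situations discussed. The function name
check_fastphase_genotypes is hypothetical, and treating any line containing
"#" as an individual's ID line is an assumption based on those examples, not
on the fastPHASE manual.

```python
def check_fastphase_genotypes(lines):
    """Return {individual: (alleles1, alleles2)} or raise ValueError."""
    records = {}
    current_id = None
    inside = False
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN GENOTYPES":
            inside = True
        elif line == "END GENOTYPES":
            break
        elif inside and line:
            if "#" in line:
                # Assumed ID line, e.g. "Ind1  # subpop. label: 6 ..."
                current_id = line.split("#")[0].strip()
                records[current_id] = []
            elif current_id is None:
                raise ValueError("Sequence line before any individual line")
            else:
                records[current_id].append(line.split())
    for name, haplotypes in records.items():
        if len(haplotypes) != 2:
            raise ValueError(
                "%s: expected 2 sequence lines, found %d"
                % (name, len(haplotypes))
            )
        if len(haplotypes[0]) != len(haplotypes[1]):
            raise ValueError("%s: two chromosomes with different length" % name)
    return {name: tuple(haps) for name, haps in records.items()}


good = [
    "BEGIN GENOTYPES",
    "Ind1  # subpop. label: 6  (internally 1)",
    "T C",
    "T T",
    "END GENOTYPES",
]
genotypes = check_fastphase_genotypes(good)
```

Keeping the validation in one function would let the parser stay strict by
default while still allowing the check to be called on its own.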

> If fastPHASE files SHOULD always come in allele groups (of the same
> length), then it would be better to integrate the parser into Bio.AlignIO
> giving pairwise alignments (and you would be able to read it via Bio.SeqIO
> automatically as well).

This is a good idea, I didn't think of it.
But how should I modify the module to produce AlignIO objects?


> P.S. Your suggested format name "fastPhaseOutput" breaks the lower case rule. 
> Would "fastphase" be OK, or is there more than one format?  e.g. an input
> format which might be confused with this?

I agree. I wasn't sure of Biopython's naming conventions.

> 
> Peter
> 
Scheet and Stephens (2006)



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 11:12:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 11:12:15 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061612.mA6GCFHq029869@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #9 from dalloliogm at gmail.com  2008-11-06 11:12 EST -------
(In reply to comment #7)
> I've now had a quick look at the fastPHASE documentation, and I have the
> impression that the sequences should always come in pairs:

right!

> "Output files for inferred haplotypes or imputed genotypes contain two lines
> per given diploid individual, with the order of individuals corresponding to
> that supplied in the input file."
> 
> Assuming the paired sequences are always the same length, this does suggest the
> format should be integrated into Bio.AlignIO (giving pairwise alignments)
> rather than Bio.SeqIO.


> Have you tried not estimating the haplotypes (by supplying a negative integer
> following -H), and does this alter the sequence output?

I will try it, ok.

> Finally could you try the -Z command line argument for the simplified output
> format (described as two lines per individual, without "id" lines,
> subpopulation labels or summary information from the run).  Does this have the
> sequences?  If so this may be a more parser friendly set of output to parse for
> Bio.SeqIO and/or Bio.AlignIO.

ok, I can try to implement both formats, but for the moment I would
prefer to concentrate on one.



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 12:11:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 12:11:26 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061711.mA6HBQN5007343@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #47 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 12:11 EST -------
(In reply to comment #46)
> If you only wanted the conceptual translations of sequences that fit the
> criteria for a CDS, then a one-liner to replace
> 
> [seq.translate(cds=True) for seq in seqlist]
> 
> might be
> 
> [seq.translate() for seq in seqlist if seq.is_cds()]
> 
> I prefer the second option, for readability, but YMMV.
> 

Note the above wouldn't give you translations starting with methionine, you'd
need something like:

[seq.translate(cds_start=True) for seq in seqlist if seq.is_cds()]

(assuming we call the "init" option "cds_start")

Or, going with the complete_cds option you could build a list of translations
of valid CDSs like this:

proteins = []
for seq in seqlist :
    try :
        proteins.append(seq.translate(complete_cds=True))
    except ValueError :
        #Not a valid CDS, excluded
        pass

Not a one liner, but I think in a real situation you'd want to do something
with the invalid CDSs anyway (even if just logging them).
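
A sketch of that "do something with the invalid CDSs" variant: the loop keeps
each rejected sequence together with the reason and logs it. The names
translate_valid and toy_translate are hypothetical; toy_translate is only a
placeholder for a translate call that raises ValueError, as the proposed
complete_cds=True option would.

```python
import logging

logger = logging.getLogger("cds_sketch")


def translate_valid(seqlist, translate):
    """Translate each sequence, keeping the failures instead of dropping them.

    translate is any callable that raises ValueError for an invalid CDS.
    """
    proteins, rejected = [], []
    for seq in seqlist:
        try:
            proteins.append(translate(seq))
        except ValueError as err:
            # Not a valid CDS: log it and remember why it was excluded.
            logger.warning("Skipping %s: %s", seq, err)
            rejected.append((seq, str(err)))
    return proteins, rejected


def toy_translate(seq):
    """Placeholder translator: only checks for a whole number of codons."""
    if len(seq) % 3:
        raise ValueError("partial codon")
    return "X" * (len(seq) // 3)


proteins, rejected = translate_valid(["ATGAAATAA", "ATGAA"], toy_translate)
```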



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 12:32:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 12:32:52 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061732.mA6HWqE7009337@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #48 from lpritc at scri.sari.ac.uk  2008-11-06 12:32 EST -------
(In reply to comment #47)
> (In reply to comment #46)

> > [seq.translate() for seq in seqlist if seq.is_cds()]
> > 
> > I prefer the second option, for readability, but YMMV.
> 
> Note the above wouldn't give you translations starting with methionine, you'd
> need something like:
> 
> [seq.translate(cds_start=True) for seq in seqlist if seq.is_cds()]
> 
> (assuming we call the "init" option "cds_start")

Fair point... my focus was on putting that filter into the list comprehension.

> Or, going with the complete_cds option you could build a list of translations
> of valid CDSs like this:
> 
> proteins = []
> for seq in seqlist :
>     try :
>         proteins.append(seq.translate(complete_cds=True))
>     except ValueError :
>         #Not a valid CDS, excluded
>         pass
> 
> Not a one liner, but I think in a real situation you'd want to do something
> with the invalid CDSs anyway (even if just logging them).

True enough.  It comes down in part to a preference of style, as the same could
be achieved with

proteins = []
for seq in seqlist :
    if seq.is_cds():
        proteins.append(seq.translate(complete_cds=True))
    else:
        #Not a valid CDS, excluded
        pass

I think the clarity of this arrangement to my eyes comes from 'is/is not a cds'
being - naturally-speaking - a property or attribute of the sequence itself. 
The 'cds_start' argument in your example is then an instruction to treat the
translation as though you have a CDS, and implement some specialised behaviour
that is appropriate under that circumstance, rather than to implement a test
that raises an error if it is failed.  By separating the 'is_cds()' call from
the 'cds_start' argument, you gain the ability to translate the sequence with
either the methionine or the coded amino acid, without losing the test of the
sequence being a CDS.

Of course, using the 'cds_start=True' argument could force a call to
self.is_cds(), anyway.  Your non-one-liner could then be as you originally
wrote:

proteins = []
for seq in seqlist :
    try:
        proteins.append(seq.translate(complete_cds=True))
    except ValueError:
        #Not a valid CDS, excluded
        pass

The two advantages I see to having the is_cds() method as a separate call are
that it separates determining the CDS status of the sequence from translating
it, and that it provides a filter that is more readable than attempting to
translate the sequence to find out if it's a valid CDS.  If the 'cds_start'
argument forces a self.is_cds() test, then the usage can be - I think - exactly
as you've been proposing throughout the thread.
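
To make the comparison concrete, here is a minimal sketch of the two styles
side by side.  ToySeq is a made-up stand-in rather than the real Bio.Seq.Seq,
its is_cds() check is deliberately simplified, and complete_cds is only the
argument name proposed in this thread:

```python
class ToySeq(object):
    """Hypothetical stand-in for Seq; not the Biopython class."""

    STOPS = ("TAA", "TAG", "TGA")  # stop codons of the standard table

    def __init__(self, data):
        self.data = data

    def is_cds(self):
        # Simplified test: ATG start, whole codons, single terminal stop.
        codons = [self.data[i:i + 3] for i in range(0, len(self.data), 3)]
        return (len(self.data) % 3 == 0
                and len(codons) >= 2
                and codons[0] == "ATG"
                and codons[-1] in self.STOPS
                and not any(c in self.STOPS for c in codons[:-1]))

    def translate(self, complete_cds=False):
        # Assumed behaviour: complete_cds=True forces the is_cds() test.
        if complete_cds and not self.is_cds():
            raise ValueError("Not a valid CDS")
        return "<protein from %s>" % self.data  # translation itself omitted


seqlist = [ToySeq("ATGGCCTAA"), ToySeq("GCCGCC"), ToySeq("ATGTAAGCCTAA")]

# Style 1: ask the sequence first.
filtered = [s.translate(complete_cds=True) for s in seqlist if s.is_cds()]

# Style 2: try the translation and catch the failure.
caught = []
for s in seqlist:
    try:
        caught.append(s.translate(complete_cds=True))
    except ValueError:
        pass  # not a valid CDS, excluded

same_result = (filtered == caught)
```

Under these assumptions both styles keep only the first sequence, so the
choice between them really is one of readability.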



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 12:33:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 12:33:12 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061733.mA6HXCuE009403@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 12:33 EST -------
(In reply to comment #8)
> 
> ok
> Actually I have been using files which come from our laboratory analysis,
> and I would like to ask if I include them here and how first.

If you can get permission to include a real example (and it's not too big) that
would be great.  Ideally something with at least three alleles.

> > Do you have URL for the file format documentation?  
> 
> The fastphase format seems to be described only in fastphase's manual,
> which is only accessible after accepting a license agreement.
> I could contact the authors of the program to ask them to publish the format
> specifications publicly. It would be in their interest, as otherwise the
> format could be considered as a not standard.  I'll let you know.

It's not very open, is it :(

Are there any other tools that output this file format?  Do you think the
author might be willing to just add an option to output the sequences in
another format (e.g. FASTA, or better an alignment format designed for more
than one alignment).  This would be a neater solution in the long run (and
would benefit anyone using fastPhase - not just Biopython).

> > Are they always DNA for example, or is RNA also possible?
> 
> They should be DNA, In principle they could be also genes, or other kind of
> characters, but this software is designed for the purpose of reconstructing
> haplotypes from SNPs/microsatellites.
> Maybe Tiago has some more experience in this..

If it is for DNA only, the sequences/alignments returned should ideally specify
a DNA alphabet.

> ...
> Because that would mean that one individual has only a chromosome.
> It doesn't make sense to run fastPhase on an haploid individual.

Is fastPhase only for diploids?  Could it be used with polyploids (e.g.
plants)?

> > On the other hand, are these hand edited files which deliberately break the
> > rules?  
> 
> Yes. Usually you shouldn't have neither of the two cases. But I find it
> useful when a script tells me if there are weird things in my files (I
> could have modified them accidentally).

Yes - negative test cases are good.  However, having them as a doctest made the
docstring rather confusing.

> > If fastPHASE files SHOULD always come in allele groups (of the same
> > length), then it would be better to integrate the parser into Bio.AlignIO
> > giving pairwise alignments (and you would be able to read it via Bio.SeqIO
> > automatically as well).
> 
> This is good idea, I didn't think of it.
> But how should I modify the module to produce AlignIO objects?

Essentially, instead of:

yield record_one
yield record_two

you'd do something like this:

alignment = Alignment(generic_dna)
alignment.add_sequence(id_one, seq_one)
alignment.add_sequence(id_two, seq_two)
yield alignment
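
The same regrouping can be tried without Biopython at all.  The sketch below
is only an illustration: pair_up is a made-up helper, records are plain
(id, sequence) tuples, and each yielded list stands in for the Alignment
object above.

```python
def pair_up(records):
    """Group a flat stream of (id, sequence) tuples into pairs."""
    records = iter(records)
    for first in records:
        try:
            second = next(records)
        except StopIteration:
            raise ValueError("Odd number of records; expected two per individual")
        # Alleles of one individual should have the same length.
        if len(first[1]) != len(second[1]):
            raise ValueError("Alleles differ in length for %s" % first[0])
        yield [first, second]


flat = [("ind1_a", "ACGT"), ("ind1_b", "ACGA"),
        ("ind2_a", "TCGT"), ("ind2_b", "TCGT")]
pairs = list(pair_up(flat))  # one two-row group per individual
```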

> > P.S. Your suggested format name "fastPhaseOutput" breaks the lower case
> > rule.  Would "fastphase" be OK, or is there more than one format?  e.g.
> > an input format which might be confused with this?
> 
> I agree.. I wasn't sure of biopython's naming conventions.
> 

This is written down elsewhere - but the format name is a lowercase string (and
this is enforced in the API), and the same names are used in both SeqIO and
AlignIO. Where possible we use the same name as BioPerl's SeqIO and EMBOSS.

(In reply to comment #9)
> (In reply to comment #7)
> > Finally could you try the -Z command line argument for the simplified output
> > format (described as two lines per individual, without "id" lines,
> > subpopulation labels or summary information from the run).  Does this have
> > the sequences?  If so this may be a more parser friendly set of output to
> > parse for Bio.SeqIO and/or Bio.AlignIO.
> 
> ok, I can try to implement both of the two formats, but for the moment I will
> prefer to concentrate on one.

I was actually thinking the -Z format might be much simpler to deal with (I
didn't mean to suggest supporting both).  On the other hand, the documentation
does say the -Z is "not intended for general use".

Peter



From dalloliogm at gmail.com  Thu Nov  6 13:09:55 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Thu, 6 Nov 2008 19:09:55 +0100
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <200811061733.mA6HXCuE009403@portal.open-bio.org>
References: <bug-2643-42@http.bugzilla.open-bio.org/>
	<200811061733.mA6HXCuE009403@portal.open-bio.org>
Message-ID: <5aa3b3570811061009i29bb2faflb456978dacbf5218@mail.gmail.com>

On Thu, Nov 6, 2008 at 6:33 PM,  <bugzilla-daemon at portal.open-bio.org> wrote:
>
>
>
>
> ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 12:33 EST -------
> (In reply to comment #8)
>>
>> ok
>> Actually I have been using files which come from our laboratory analysis,
>> and I would like to ask if I include them here and how first.
>
> If you can get permission to include a real example (and it's not too big) that
> would be great.  Ideally something with at least three alleles.

ok..

>> > Do you have URL for the file format documentation?
>>
>> The fastphase format seems to be described only in fastphase's manual,
>> which is only accessible after accepting a license agreement.
>> I could contact the authors of the program to ask them to publish the format
>> specifications publicly. It would be in their interest, as otherwise the
>> format could be considered as a not standard.  I'll let you know.
>
> It's not very open, is it :(
>
> Are there any other tools that output this file format?  Do you think the
> author might be willing to just add an option to output the sequences in
> another format (e.g. FASTA, or better an alignment format designed for more
> than one alignment).  This would be a neater solution in the long run (and
> would benefit anyone using fastPhase - not just Biopython).

Not to my knowledge.
Anyway, consider that a fastPhase run could take days for medium/big samples.
In some situations it could be faster to convert its output to FASTA
(or another format) directly, instead of re-calculating the results.

>> > Are they always DNA for example, or is RNA also possible?
>>
>> They should be DNA, In principle they could be also genes, or other kind of
>> characters, but this software is designed for the purpose of reconstructing
>> haplotypes from SNPs/microsatellites.
>> Maybe Tiago has some more experience in this..
>
> If it is for DNA only, the sequences/alignments returned should ideally specify
> a DNA alphabet.

mmm ok...
Basically it could be used also with characters like genes and other
markers.. but in that case, it would not make sense to parse it as a
sequence, so nobody would try to do it.

>> Because that would mean that one individual has only a chromosome.
>> It doesn't make sense to run fastPhase on an haploid individual.
>
> Is fastPhase only for haploids?  Could it be used with polyploidy (e.g.
> plants)?

I think not... It would be another class of problem.
What fastPhase does is try to infer haplotypes from genotype data.

Humans and most eukaryotes are diploid, so they have two copies of
each chromosome; when you genotype markers, for every individual you
get two pieces of information per marker (e.g. 'AC' for a SNP).
Let's say you are studying two SNPs in a single individual: you will
have 'AC' for the first marker, and 'GT' for the second (you already
know that they are on the same chromosome).
You want to know what the haplotypes are, that is, whether the 'A'
from the first SNP is on the same molecule as the 'G' from the second
SNP, and so on.

For example, you could have one chromosome with 'AG' and the other with
'CT', or one chromosome with 'AT' and the other with 'CG'; fastPhase
tries to calculate which is the most likely (I won't be able to
explain all the details properly).
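
The two possibilities in this example can be enumerated with a few lines of
Python.  This is only a toy: possible_phasings is a made-up name, and the
code just lists the candidate haplotype pairs; it does not try to decide
which one is most likely, which is the part fastPhase does.

```python
from itertools import product


def possible_phasings(genotypes):
    """List the distinct haplotype pairs consistent with unphased genotypes.

    genotypes is a list of two-letter strings, one per marker,
    e.g. ['AC', 'GT'].
    """
    phasings = set()
    # For each marker, choose which allele goes on the first chromosome.
    for choice in product((0, 1), repeat=len(genotypes)):
        first = "".join(g[c] for g, c in zip(genotypes, choice))
        second = "".join(g[1 - c] for g, c in zip(genotypes, choice))
        # The two chromosomes are unordered, so sort to drop mirror images.
        phasings.add(tuple(sorted((first, second))))
    return sorted(phasings)


pairs = possible_phasings(["AC", "GT"])
# pairs is [('AG', 'CT'), ('AT', 'CG')]
```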

Moreover, fastPhase (and there are other such programs) can infer missing
genotype data, which is useful when you have big collections of SNPs.

That said, I don't know if it is able to infer haplotypes in polyploid
organisms, but I don't think so, as it would be a different (more
complex) class of problem.
I think the best thing is not to support polyploidy for now; if
someone who uses fastPhase for that comes along, it would be easy to
adapt the module (it would just require adding an option).

>> > On the other hand, are these hand edited files which deliberately break the
>> > rules?
>>
>> Yes. Usually you shouldn't have neither of the two cases. But I find it
>> useful when a script tells me if there are weird things in my files (I
>> could have modified them accidentally).
>
> Yes - negative test cases are good.  However, having them as a doctest made the
> docstring rather confusing.

mmm I know, that doctest could be refactored.
I have started using tests recently... I find it is a lot better.

>
>> > If fastPHASE files SHOULD always come in allele groups (of the same
>> > length), then it would be better to integrate the parser into Bio.AlignIO
>> > giving pairwise alignments (and you would be able to read it via Bio.SeqIO
>> > automatically as well).
>>
>> This is good idea, I didn't think of it.
>> But how should I modify the module to produce AlignIO objects?
>
> Essentially, instead of:
>
> yield record_one
> yield record_two
>
> you'd do something like this:
>
> alignment = Alignment(generic_dna)
> alignment.add_sequence(id_one, seq_one)
> alignment.add_sequence(id_two, seq_two)
> yield alignment

sounds easy :)

>
>> > P.S. Your suggested format name "fastPhaseOutput" breaks the lower case
>> > rule.  Would "fastphase" be OK, or is there more than one format?  e.g.
>> > an input format which might be confused with this?
>>
>> I agree.. I wasn't sure of biopython's naming conventions.
>>
>
> This is written down elsewhere - but the format name is a lowercase string (and
> this is enforced in the API), and the same names are used in both SeqIO and
> AlignIO. Where possible we use the same name as BioPerl's SeqIO and EMBOSS.
>
> (In reply to comment #9)
>> (In reply to comment #7)
>> > Finally could you try the -Z command line argument for the simplified output
>> > format (described as two lines per individual, without "id" lines,
>> > subpopulation labels or summary information from the run).  Does this have
>> > the sequences?  If so this may be a more parser friendly set of output to
>> > parse for Bio.SeqIO and/or Bio.AlignIO.
>>
>> ok, I can try to implement both of the two formats, but for the moment I will
>> prefer to concentrate on one.
>
> I was actually thinking the -Z format might be much simpler to deal with (I
> didn't mean to suggest supporting both).  On the other hand, the documentation
> does say the -Z is "not intended for general use".

The problem is that it could take days to run fastPhase... most of
the time you want the longer format, and then proceed to parse it.
Anyway, it should not be a big problem to implement it (I am just
putting all of that information in SeqRecord.description)

>
> Peter


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it

From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 13:20:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 13:20:20 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061820.mA6IKK31012133@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #49 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 13:20 EST -------
OK - thank you all for your input thus far.  Unfortunately it is clear that we
haven't reached a consensus about translating sequences which begin with the
start codon (or the more special case of translating a CDS sequence).

However, I hope we are all happy with how things look in CVS right now, which
offers a blind translation continuing over any stop codon, and the "to_stop"
option which will terminate translation at the first in frame stop codon:

See
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Seq.py?cvsroot=biopython
for the full code, but in summary:

class Seq(object):
    ...
    def translate(self, table="Standard", stop_symbol="*", to_stop=False):
        """Turns a nucleotide sequence into a protein sequence. New Seq object.

        This method will translate DNA or RNA sequences.

        Trying to translate a protein sequence raises an exception.

        table - Which codon table to use?  This can be either a name
                (string) or an NCBI identifier (integer).  This defaults
                to the "Standard" table.
        stop_symbol - Single character string, what to use for terminators.
                This defaults to the asterisk, "*".
        to_stop - Boolean, defaults to False meaning do a full translation
                continuing on past any stop codons (translated as the
                specified stop_symbol).  If True, translation is terminated
                at the first in frame stop codon (and the stop_symbol is
                not appended to the returned protein sequence).
        ...

With the module level function taking the same arguments:

def translate(sequence, table="Standard", stop_symbol="*", to_stop=False):
    """Translate a nucleotide sequence into amino acids.

    If given a string, returns a new string object.
    Given a Seq or MutableSeq, returns a Seq object with a protein
    alphabet.
    ...

I think everyone is content with the naming of the "to_stop" argument.
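
As a rough illustration of the difference between the two modes, here is a
toy version working on plain strings.  It is not the code in CVS:
toy_translate and the four-codon table are made up for the example, and any
trailing partial codon is simply ignored.

```python
# Tiny excerpt of the standard genetic code, just enough for the example.
TOY_TABLE = {"ATG": "M", "GCC": "A", "AAA": "K", "GGG": "G"}
TOY_STOPS = ("TAA", "TAG", "TGA")


def toy_translate(sequence, stop_symbol="*", to_stop=False):
    """Translate a DNA string codon by codon (illustration only)."""
    protein = []
    for i in range(0, len(sequence) - len(sequence) % 3, 3):
        codon = sequence[i:i + 3]
        if codon in TOY_STOPS:
            if to_stop:
                break  # stop here; the stop symbol is not appended
            protein.append(stop_symbol)
        else:
            protein.append(TOY_TABLE[codon])
    return "".join(protein)


full = toy_translate("ATGGCCTAAAAAGGG")                     # "MA*KG"
truncated = toy_translate("ATGGCCTAAAAAGGG", to_stop=True)  # "MA"
```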

I'm planning to prepare the Biopython 1.49 beta release tomorrow, so I'm
proposing we leave translation like this for Biopython 1.49 (and close this
bug), and revisit translation after that is done (hopefully in less than two
weeks time).  The code in CVS is still a big improvement in terms of writing
object orientated code.

Peter



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 13:34:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 13:34:03 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061834.mA6IY3ra013125@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 13:34 EST -------
Replying to Marco's email on the dev mailing list:

>> Are there any other tools that output this file format?  Do you think the
>> author might be willing to just add an option to output the sequences in
>> another format (e.g. FASTA, or better an alignment format designed for more
>> than one alignment).  This would be a neater solution in the long run (and
>> would benefit anyone using fastPhase - not just Biopython).
>
> Not for my knowledge.
> Anyway, consider that a fastPhase run could take days for medium/big samples.
> In some situations it could be faster to convert its output to fasta
> (or other ones) directly, instead of re-calculating the results.

OK - I had not appreciated the run time involved.  Clearly it would not be
sensible to have to repeat a long analysis just to get the results in another
format (e.g. as FASTA, or the simplified -Z output whatever that looks like).

>> If it is for DNA only, the sequences/alignments returned should ideally
>> specify a DNA alphabet.
>
> mmm ok...
> Basically it could be used also with characters like genes and other
> markers.. but in that case, it would not make sense to parse it as a
> sequence, so nobody would try to do it.

That's interesting, and means assuming DNA wouldn't be safe.  Just use the
single letter alphabet then (rather than defaulting to the completely generic
base alphabet).

>>> Because that would mean that one individual has only a chromosome.
>>> It doesn't make sense to run fastPhase on an haploid individual.
>>
>> Is fastPhase only for haploids?  Could it be used with polyploidy (e.g.
>> plants)?
>
> I think not... It would be another class of problem.
> What fastPhase does, is trying to infer haplotypes from genotype data.

OK - you can probably tell I'm not a population biologist from the questions ;)

>> I was actually thinking the -Z format might be much simpler to deal
>> with (I didn't mean to suggest supporting both).  On the other hand,
>> the documentation does say the -Z is "not intended for general use".
>
> The problem is that it could take days to run a fastPhase... most of
> the times you want the longer format, and then proceed to parse it.
> Anyway, it should not be a big problem to implement it

OK (as I wrote above), I can see now that using the simplified -Z output is not
sensible.

> (I am just putting all of that information in SeqRecord.description)

If we know the meaning of some of these fields, then ideally they should go in
the annotations dictionary, rather than just in the SeqRecord description.

Peter



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 14:00:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 14:00:59 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061900.mA6J0xi3015085@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 14:00 EST -------
I've added a few doctests to SeqRecord.py in CVS revision 1.24, plus the simple
unit test from comment 7 to make sure these get validated as part of the
Biopython test suite.

How does that look to you Marco?  I've kept the __init__ example short, not
doing anything with annotations.

Do you think we should also have the __main__ trick in all modules with
doctests?



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 14:41:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 14:41:44 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061941.mA6JfiHM019925@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #9 from dalloliogm at gmail.com  2008-11-06 14:41 EST -------
(In reply to comment #8)
> I've added a few doctests to SeqRecord.py in CVS revision 1.24, plus the simple
> unit test from comment 7 to make sure these get validated as part of the
> Biopython test suite.
> 
> How does that look to you Marco?  I've kept the __init__ example short, not
> doing anything with annotations.

I think they look ok.. to me, they seem good examples of how to use the module.


> Do you think we should also have the __main__ trick in all modules with
> doctests?

I am not really experienced in managing such big projects... but I think it
could be ok, at least for now.

I would personally keep the __main__ trick in every module, because it would
make it easier to test a single module while you are still writing it.

But to test many modules one after another, the code you posted in #7 is the
way to go.

so... in short, I don't know!! :)



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 15:34:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 15:34:36 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811062034.mA6KYa6b026157@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #50 from bsouthey at gmail.com  2008-11-06 15:34 EST -------
(In reply to comment #48)
> (In reply to comment #47)
> > (In reply to comment #46)
> 
> > > [seq.translate() for seq in seqlist if seq.is_cds()]
> > > 
> > > I prefer the second option, for readability, but YMMV.
> > 
> > Note the above wouldn't give you translations starting with methionine, you'd
> > need something like:
> > 
> > [seq.translate(cds_start=True) for seq in seqlist if seq.is_cds()]
> > 
> > (assuming we call the "init" option "cds_start")
> 
> Fair point... my focus was on putting that filter into the list comprehension.
> 
> > Or, going with the complete_cds option you could build a list of translations
> > of valid CDSs like this:
> > 
> > proteins = []
> > for seq in seqlist :
> >     try :
> >         proteins.append(seq.translate(complete_cds=True))
> >     except ValueError :
> >         #Not a valid CDS, excluded
> >         pass
> > 
> > Not a one liner, but I think in a real situation you'd want to do something
> > with the invalid CDSs anyway (even if just logging them).
> 
> True enough.  It comes down in part to a preference of style, as the same could
> be achieved with
> 
> proteins = []
> for seq in seqlist :
>     if seq.is_cds():
>         proteins.append(seq.translate(complete_cds=True))
>     else:
>         #Not a valid CDS, excluded
>         pass
> 
> I think the clarity of this arrangement to my eyes comes from 'is/is not a cds'
> being - naturally-speaking - a property or attribute of the sequence itself. 
> The 'cds_start' argument in your example is then an instruction to treat the
> translation as though you have a CDS, and implement some specialised behaviour
> that is appropriate under that circumstance, rather than to implement a test
> that raises an error if it is failed.  By separating the 'is_cds()' call from
> the 'cds_start' argument, you gain the ability to translate the sequence with
> either the methionine or the coded amino acid, without losing the test of the
> sequence being a CDS.
> 
> Of course, using the 'cds_start=True' argument could force a call to
> self.is_cds(), anyway.  Your non-one-liner could then be as you originally
> wrote:
> 
> proteins = []
> for seq in seqlist :
>     try:
>         proteins.append(seq.translate(complete_cds=True))
>     except ValueError:
>         #Not a valid CDS, excluded
>         pass
> 
> The two advantages I see to having the is_cds() method as a separate call are
> that it permits separation of the determining the CDS status of the sequence,
> and that it provides a filter that is more readable than attempting to
> translate the sequence to find out if it's a valid CDS.  If the 'cds_start'
> argument forces a self.is_cds() test, then the usage can be - I think - exactly
> as you've been proposing throughout the thread.
> 

The use of 'cds' alone is wrong because a cds refers to DNA, not to translation
or to protein sequences. The use of cds is confusing, or at least vague, until
you determine how it works. Also the test could give the wrong answer, in the
sense that a sequence can be a valid cds (see the GUG initiation in mammalian
NAT1 example at the NCBI link) but not be allowed by the table in
Bio.Data.CodonTable.

I don't object to the purpose, rather I do object to the name. My overriding
issue here is that 'cds_start' does not convey the purpose of this argument,
and the name is likely to remain in the API for some time. One interpretation
that also comes to mind is that it is the location of the start of the cds in
the sequence (cds start at...).

I really feel that the name must clearly reflect that it invokes a test that
the first codon is in the 'start_codon' list (defined by the selected table
from Bio.Data.CodonTable). This is not a check that it is the start of a cds;
rather it is a check for a possible open reading frame (as not all open reading
frames are cds).
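
The test being described could be sketched as follows.  This is a rough
illustration on plain strings: the function name is made up, and the start
codon tuple is typed in by hand for the standard table rather than read from
Bio.Data.CodonTable.

```python
# Start codons of the standard table (NCBI table 1), copied here by hand.
STANDARD_START_CODONS = ("TTG", "CTG", "ATG")


def starts_with_start_codon(sequence, start_codons=STANDARD_START_CODONS):
    """Return True if the first three letters are a listed start codon."""
    return sequence[:3].upper() in start_codons


checks = [(s, starts_with_start_codon(s))
          for s in ("ATGGCC", "CTGGCC", "GTGGCC", "AT")]
# GTG is rejected here because it is not in the list for this table,
# whether or not it is used as an initiator in a real gene.
```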



From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 23:46:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 23:46:08 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811070446.mA74k8Js031975@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #13 from mdehoon at ims.u-tokyo.ac.jp  2008-11-06 23:46 EST -------
(In reply to comment #12)
I have uploaded a fixed version of Bio.NaiveBayes to CVS. Can you check to see
if you're happy with this version?



From sbassi at gmail.com  Fri Nov  7 01:56:48 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Fri, 7 Nov 2008 04:56:48 -0200
Subject: [Biopython-dev] Possible problem with NCBIStandalone.blastall
Message-ID: <b43bf2080811062256i5b6a35a9h5e0cd4b531cb9496@mail.gmail.com>

When I run a command line blast with these parameters:

/root/blast-2.2.18/bin/blastall -p blastn -d /var/www/blast/db/UniVec
-q -5 -G 3 -E 3 -F "m D" -e 700 -Y 1.75e12 -i tmpsq

I find a match (with evalue of 18).
But when I do it from Biopython I can't find any match:

rh, eh = NCBIStandalone.blastall(blast_exe, "blastn", db,
                                     fin, nuc_mismatch='-5',
                                     gap_open = '3',
                                     gap_extend = '3',
                                     search_length = '1.75e12',
                                     expectation='20')

Here is the input sequence:

>C07SpCP042I015.P5A02.R. [Clone-lib=pCLD 04541]
NNNCCCCCCCTCGAGGTCGACNNNNNNNNTAAGCTTGAAATTCTATGATATGCAGTTAGT
TGCTNCTNGTTTAGCATTGGTTGGTTAACTTAAAACCTTTTCCTGCAATAATTATATGGA
TAATATTACTTTACTTNNNNNNNTATTGCCTTCACTAATTTTTAGGATCTATTTTCTGTT
AAATGTTATCTCTTGTTCTTGAGAAGTGCTTTGGAGATCATTTTTCCATCGTATTAACAA
AAAGTGAAATAACTACTTGTGCAATCAGGCTTTTCCTACACCAGGGGATAAGGCAAATAA
ACTATTCACCTCCTTTAATTAGCTCCCCCCCCCCCCCCTCCCCTTCTTTTCTCTTCATTC
CTGANNNANTTAGCTAGTACGCACCATTCAATCAATTATTTCTGTTCCATTTTGTGCTAA
ATATGTTTTCAAATGTTTAATATAGTTCTGAAGACAGCAGTTTAATGTTTTGTCTGGCTA
ACTGCTATTCTAAGCTCATTGTTTCAGCTTGCAGTTTTGCAGCAAAACCTGTCTGCTGTC
CATGAAATCTGGAAGGAATGTAGTAAATTTTACAGTCTCAGCCTTCTATCTCTGAGGAAG
TTTATATGGTCCTTCACGGAGCTGAGAGATCTGAATTCAGCCCACACAGCCTTACAGCAC
ATGGTGAGATTGGCTTTTACGGAAAACTCTTACATTAGTAGAACTGCTGAGGGGAGGTTT
TGTGATTTAAGATTGGATATTCCAGCACCTTCCTCTGGCAATTGGAGTTTCATCGATGTA
TCTGTCGACACCGCGGGTAGCAGCAATTTTGATATGGAAAGACAAAGTCTTGGCAGAAAA
ACA

and here is the database:
ftp://ftp.ncbi.nih.gov/pub/UniVec/UniVec

(I got the parameters from
http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen_docs.html#Parameters)

Best,
SB.


-- 
Vendo isla: http://www.genesdigitales.com/isla/
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Tutorial libre de Python: http://tinyurl.com/2az5d5

"It is pitch black. You are likely to be eaten by a grue." -- Zork

From bugzilla-daemon at portal.open-bio.org  Fri Nov  7 04:37:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 7 Nov 2008 04:37:23 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811070937.mA79bNh9020433@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #51 from lpritc at scri.sari.ac.uk  2008-11-07 04:37 EST -------
Just to perpetuate, what I suggest is (in pseudocode, and with argument names
up for, well, argument):

class Seq:
    [...]
    def startswith_startcodon(self):
        """ Returns True if the first three bases of the sequence
            are a valid start codon in the sequence's codon table,
            returns False otherwise
        """

    def endswith_stopcodon(self):
        """ Returns True if the length of the sequence is a multiple
            of three, and the last three bases are a valid stop codon
            in the sequence's codon table, returns False otherwise
        """

    def is_cds(self):
        """ Returns True if the sequence meets the criteria for a CDS,
            False otherwise.  The criteria are:
            i)   The very first three bases of the sequence are a valid
                 start codon
            ii)  The sequence length is a multiple of three
            iii) The final three bases of the sequence are a valid stop codon
            iv)  There are no in-frame stop codons, other than the final
                 stop codon
        """
        if not self.startswith_startcodon(): return False
        if not self.endswith_stopcodon(): return False
        # Test for in-frame stop codon, return True if none is found,
        # return False otherwise

    def translate(self, [...], assert_cds=False, assert_cds_firstcodon=False):
        """ Returns a new Seq object with the protein translation.
            If assert_cds is True, but the sequence is not a CDS as
            determined by self.is_cds(), then an error is thrown.
            Otherwise, the sequence is translated with the first codon
            read as a methionine, rather than the amino acid which it
            would encode at any other position.
            If assert_cds_firstcodon is True, but the sequence doesn't
            start with a valid start codon, then an error is thrown.
            Otherwise, the sequence is translated with the first codon
            read as a methionine, as above.
        """
        # Translate away as normal, here
        [...]
        if assert_cds:
            if not self.is_cds():
                raise ValueError("WTF? This is no CDS, my good fellow human!")
            # Make the first amino acid of the translated sequence a Met
        if assert_cds_firstcodon:
            if not self.startswith_startcodon():
                raise ValueError("Hey!  Stop playing around, this sequence "
                                 "doesn't start with a start codon")
            # Make the first amino acid of the translated sequence a Met
        # Then continue as normal

This approach provides the following behaviour (assuming things about argument
names that can be thrashed out later)

# I want to translate some nt sequence, and don't care about stops, starts, or
# any other stuff
aaseq = ntseq.translate()
# I want to translate my nt sequence to the first in-frame stop codon, and no
# further
aaseq = ntseq.translate(to_stop=True)
# I want to know if my nt sequence is a (putative) CDS
ntseq.is_cds()
# I want to know if my nt sequence starts with a start codon
ntseq.startswith_startcodon()
# I want to know if my nt sequence ends with an in-frame stop codon
# Note that this is a different question to asking whether there is *any*
# in-frame stop codon
ntseq.endswith_stopcodon()
# I want to translate my nt sequence, which I know is a CDS, 
# but not convert the first codon to a methionine
aaseq = ntseq.translate()
# I want to translate my nt sequence, which I know is a CDS, 
# and convert the first codon to a methionine
aaseq = ntseq.translate(assert_cds=True)
# OK, my sequence isn't a *real* CDS, but it still starts with a valid start
# codon (I checked already with ntseq.startswith_startcodon()), and I'd like to
# convert the first codon as if it was really a CDS.  You don't need to know
# why, I just do.  I'm wacky that way.
aaseq = ntseq.translate(assert_cdsfirstcodon=True)
# I'd like a list of all my sequences that are valid CDS
seqlist = [s for s in myntseqs if s.is_cds()]
# I'd like translations of all my sequences that are valid CDS
tlist1 = [s.translate() for s in seqlist]
tlist2 = [s.translate() for s in myntseqs if s.is_cds()]
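
For concreteness, the three predicates proposed above can be sketched as plain
functions on strings.  This is only an illustration and not Bio.Seq code: the
function names follow the pseudocode, has_internal_stop() is a made-up helper,
and the start and stop codon sets are hard-coded from the NCBI standard table
(table 1) as an assumption, where a real method would look them up in the
sequence's codon table.

```python
# Illustrative sketch only: plain functions on upper-case DNA strings,
# not methods of Bio.Seq.Seq.  Codon sets are hard-coded from the NCBI
# standard genetic code (table 1) as an assumption.
START_CODONS = frozenset(["ATG", "CTG", "TTG"])
STOP_CODONS = frozenset(["TAA", "TAG", "TGA"])


def startswith_startcodon(seq):
    """True if the first three bases are a valid start codon."""
    return seq[:3] in START_CODONS


def endswith_stopcodon(seq):
    """True if the length is a multiple of three and the last codon is a stop."""
    return len(seq) >= 3 and len(seq) % 3 == 0 and seq[-3:] in STOP_CODONS


def has_internal_stop(seq):
    """True if an in-frame stop codon starts before the final three bases."""
    return any(seq[i:i + 3] in STOP_CODONS for i in range(0, len(seq) - 3, 3))


def is_cds(seq):
    """True if seq meets criteria i) to iv) from the is_cds() docstring."""
    return (startswith_startcodon(seq)
            and endswith_stopcodon(seq)
            and not has_internal_stop(seq))
```

With these definitions a list comprehension such as
[s for s in myntseqs if is_cds(s)] behaves like the seqlist example above.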


In terms of nomenclature:

The default behaviour of translate() as Peter proposed: read through in-frame
and translate with the appropriate codon table - is fine in nearly all
circumstances.  Most other circumstances are covered by stopping at the first
in-frame stop codon, which Peter has implemented, and is an option we all seem
to agree on.

Biologically-speaking, this behaviour is not always correct for CDS in
prokaryotes, where alternative start codons may occur a significant minority of
the time.  These will be mistranslated if no provision is made for them.  I
think a useful biological sequence object should at least try to mimic actual
biology, so we should provide an option to handle this.
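
A small sketch of the mistranslation in question, using a stand-alone
translate() function rather than the Seq method.  The codon table is the NCBI
standard one; the START_CODONS set lists only three start codons known from
bacteria (ATG, GTG, TTG) and is an assumption made for the example, as is the
use of assert_cdsfirstcodon as the keyword name.

```python
import itertools

# NCBI standard genetic code, codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip(("".join(c) for c in itertools.product(BASES, repeat=3)), AMINO_ACIDS)
)
# Assumption for this example: a few start codons seen in bacteria.
START_CODONS = frozenset(["ATG", "GTG", "TTG"])


def translate(seq, assert_cdsfirstcodon=False):
    """Translate complete codons; optionally read the first codon as Met."""
    if assert_cdsfirstcodon and seq[:3] not in START_CODONS:
        raise ValueError("sequence does not start with a start codon")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    protein = "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))
    if assert_cdsfirstcodon:
        protein = "M" + protein[1:]
    return protein


plain = translate("GTGAAATAA")                             # GTG read as valine
as_cds = translate("GTGAAATAA", assert_cdsfirstcodon=True)  # GTG read as Met
```

Here plain starts with V while as_cds starts with M, which is the difference
the option is meant to handle.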

We should not assume that a sequence is a CDS unless it is specified by the
user.  It seems reasonable to me that the term 'cds' should occur in any such
argument from the user.

We have at least two options for how to proceed with a CDS: i) we can provide a
strict CDS-type translation, which requires confirmation that the sequence is,
in fact, a CDS; ii) we can provide a weak CDS-type translation, which only
modifies the way the start codon is translated.  In both cases, behaviour is
specific to CDS, and so having 'cds' in the argument name *somewhere* seems
obvious, and entirely reasonable.

I think that 'assert_cds' makes clear that we are asserting that the sequence
is a valid CDS - no internal stops and everything else that comes with that
status.

I think that 'assert_cdsfirstcodon' avoids any ambiguity over the word 'start',
and also conveys that we are asserting that the first (rather than start) codon
has some relationship to a CDS; in this case the relationship is that the first
codon of the sequence meets the criteria for a CDS.  But that's kind of a long
argument name ;)



From bugzilla-daemon at portal.open-bio.org  Fri Nov  7 04:48:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 7 Nov 2008 04:48:18 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811070948.mA79mIRl021035@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #52 from lpritc at scri.sari.ac.uk  2008-11-07 04:48 EST -------
(In reply to comment #50)

> The use of 'cds' alone is wrong because cds refer to DNA not translation and
> not to protein sequences. The use of cds is confusing or at least vague until
> you determine how it works. 

I think that translate() also refers only to nucleotide sequences, and
therefore the association of 'cds' is not inherently confusing on that count. 
I think that it can be an appropriate term in an argument name (see above).

> Also it could be wrong in the sense it is a valid
> cds (see the GUG initiation in mammalian NAT1 example at the NCBI link) just
> not allowed by the table in Bio.Data.CodonTable.

It's up to the user to use the correct codon table for their purpose, I think. 
Otherwise, how would you propose to correct for their error?

> [...] 'cds_start' [...] One interpretation that also
> comes to mind is that it is the location of the start of the cds in the
> sequence (cds start at...).

I agree with this.  It has the potential to be confusing.

> This is not a check that it is the start of a cds
> rather it is a check for a possible open reading frame (as not all open reading
> frames are cds).  

It is true that not all ORFs are CDS (indeed, by far the majority are not). 
However, open reading frames do not have to start with - or even contain - a
start codon.  They just do not contain an in-frame stop codon.  We've been over
this definition before (comment #21).

L.



From biopython at maubp.freeserve.co.uk  Fri Nov  7 05:13:21 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 7 Nov 2008 10:13:21 +0000
Subject: [Biopython-dev] Possible problem with NCBIStandalone.blastall
In-Reply-To: <b43bf2080811062256i5b6a35a9h5e0cd4b531cb9496@mail.gmail.com>
References: <b43bf2080811062256i5b6a35a9h5e0cd4b531cb9496@mail.gmail.com>
Message-ID: <320fb6e00811070213i4aa5955arf233180d6a047de0@mail.gmail.com>

On Fri, Nov 7, 2008 at 6:56 AM, Sebastian Bassi wrote:
> When I run a command line blast with these parameters:
>
> /root/blast-2.2.18/bin/blastall -p blastn -d /var/www/blast/db/UniVec
> -q -5 -G 3 -E 3 -F "m D" -e 700 -Y 1.75e12 -i tmpsq
>
> I find a match (with evalue of 18).
> But when I do it from Biopython I can't find any match:
>
> rh, eh = NCBIStandalone.blastall(blast_exe, "blastn", db,
>                                     fin, nuc_mismatch='-5',
>                                     gap_open = '3',
>                                     gap_extend = '3',
>                                     search_length = '1.75e12',
>                                     expectation='20')

You are not using exactly the same arguments, so it's not surprising
you get different results:

-q -5 => nuc_mismatch = -5 (or as a string)
-G 3 => gap_open = 3 (or as a string)
-E 3 => gap_extend = 3 (or as a string)
-F "m D" => filter="m D" (MISSING!)
-e 700 => expectation=700 (or as a string)
-Y 1.75e12 => search_length = '1.75e12' (or as a float)

Your expectation cut off is more generous in the command line version
(700) than the Biopython version (20), but that wouldn't explain
the difference.  It's probably due to omitting the filter option (-F).
If that doesn't resolve the difference then there is something very
strange going on...
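
To make the correspondence explicit, here is a minimal sketch that turns the
keyword arguments back into blastall command line arguments.  The helper
blastall_args() and the FLAG_FOR_KEYWORD dictionary are hypothetical and are
not part of Bio.Blast.NCBIStandalone; they only encode the mapping listed
above.

```python
# Hypothetical helper, not part of Bio.Blast.NCBIStandalone: it only
# illustrates the flag-to-keyword mapping listed above.
FLAG_FOR_KEYWORD = {
    "nuc_mismatch": "-q",
    "gap_open": "-G",
    "gap_extend": "-E",
    "filter": "-F",
    "expectation": "-e",
    "search_length": "-Y",
}


def blastall_args(program, database, infile, **keywds):
    """Return the blastall argument list equivalent to the keywords."""
    args = ["-p", program, "-d", database, "-i", infile]
    for name in sorted(keywds):
        args.extend([FLAG_FOR_KEYWORD[name], str(keywds[name])])
    return args


# The keyword set that matches the command line quoted above.
args = blastall_args("blastn", "/var/www/blast/db/UniVec", "tmpsq",
                     nuc_mismatch=-5, gap_open=3, gap_extend=3,
                     filter="m D", expectation=700,
                     search_length="1.75e12")
```

Comparing such a list with the original command line is a quick way to spot a
missing option like -F.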

Peter

From bugzilla-daemon at portal.open-bio.org  Fri Nov  7 06:14:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 7 Nov 2008 06:14:13 -0500
Subject: [Biopython-dev] [Bug 2622] Parsing between position locations like
	5933^5934 in GenBank/EMBL files
In-Reply-To: <bug-2622-42@http.bugzilla.open-bio.org/>
Message-ID: <200811071114.mA7BED84026709@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2622


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-07 06:14 EST -------
I've updated CVS to treat a between position like 3^4 (one based counting) as a
zero length slice 3:3.
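
As a quick illustration in plain Python (a string stands in for the sequence,
and between_to_slice() is a hypothetical helper, not the code in CVS), a
between position of adjacent bases maps to an empty slice at the insertion
site.  Locations that wrap around the origin of a circular sequence are not
handled in this sketch.

```python
def between_to_slice(location):
    """Map a one-based between position like '3^4' to (start, end)."""
    left, right = (int(x) for x in location.split("^"))
    if right != left + 1:
        raise ValueError("expected adjacent positions, got %r" % location)
    # The site sits after base number `left`, i.e. at Python index `left`.
    return left, left


seq = "ACGTACGT"
start, end = between_to_slice("3^4")
site = seq[start:end]   # zero-length slice: nothing is selected
before = seq[:start]    # bases 1 to 3
after = seq[end:]       # bases 4 onwards
```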



From bugzilla-daemon at portal.open-bio.org  Fri Nov  7 06:19:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 7 Nov 2008 06:19:12 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811071119.mA7BJCjd027093@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-07 06:19 EST -------
Marking as fixed - I've updated SeqRecord.py in CVS revision 1.25 to call the
doctests via the __main__ trick, with similar changes for Bio.Seq, Bio.SeqIO
and Bio.AlignIO (the latter are complicated due to finding the input files).

Thanks for the encouragement Marco - hopefully this has also made the docstring
documentation more useful, and will also improve the API docs too:
http://biopython.org/DIST/docs/api/ (updated for each release)
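
For anyone unfamiliar with the __main__ trick, the usual pattern looks roughly
like this.  It is a generic sketch with a made-up reverse() function and
_test() helper, and is not copied from SeqRecord.py.

```python
import doctest


def reverse(text):
    """Return text reversed.

    >>> reverse("ACGT")
    'TGCA'
    """
    return text[::-1]


def _test():
    """Run the docstring examples of the module being executed."""
    return doctest.testmod()


if __name__ == "__main__":
    # Only runs when the file is executed directly, not when imported.
    _test()
```

Running the file directly then checks every example in its docstrings, while
importing the module has no side effects.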

Peter



From bugzilla-daemon at portal.open-bio.org  Fri Nov  7 06:52:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 7 Nov 2008 06:52:50 -0500
Subject: [Biopython-dev] [Bug 2613] test_Wise and test_psw fail under Python
	2.3
In-Reply-To: <bug-2613-42@http.bugzilla.open-bio.org/>
Message-ID: <200811071152.mA7BqoKj029425@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2613


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-07 06:52 EST -------
"Fixed" by skipping these tests (and the recently added test_docstrings.py) if
run on Python 2.3.

Python 2.3 doctest uses slightly different formatting.  It also doesn't support
some features like <BLANKLINE>
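
A minimal example of the <BLANKLINE> marker, for reference.  The function
two_lines() is made up for illustration; the marker stands for an empty line
in the expected output, which a doctest could not otherwise express because a
blank line ends the expected output block.

```python
import doctest


def two_lines():
    """Print two lines separated by an empty one.

    >>> two_lines()
    first
    <BLANKLINE>
    second
    """
    print("first\n\nsecond")
```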



From biopython at maubp.freeserve.co.uk  Fri Nov  7 07:32:33 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 7 Nov 2008 12:32:33 +0000
Subject: [Biopython-dev] CVS freeze for Biopython 1.49 (beta)
Message-ID: <320fb6e00811070432x123e806foa06b7f3d94bdb068@mail.gmail.com>

Hi all,

I've been going over a few little things on the unit tests (e.g.
python 2.3's doctest isn't quite the same), and think I am ready to
prepare Biopython 1.49 (beta).

I plan to make the Windows installers for Python 2.3, 2.4 and 2.5
against numpy 1.1.1

Currently there is no Windows version of numpy for python 2.6, so we
won't be able to ship a Windows installer for python 2.6 for Biopython
either.

So, it's CVS freeze time.

Once the beta is out (hopefully later today), we can start using CVS
for documentation updates or fixing any bugs reported in the beta.
Then in about a week's time I hope to do the  Biopython 1.49 "final"
release.

Peter

From bugzilla-daemon at portal.open-bio.org  Fri Nov  7 10:18:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 7 Nov 2008 10:18:47 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811071518.mA7FIlHb012537@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #14 from bsouthey at gmail.com  2008-11-07 10:18 EST -------
(In reply to comment #13)
> (In reply to comment #12)
> I have uploaded a fixed version of Bio.NaiveBayes to CVS. Can you check to see
> if you're happy with this version?
> 

Yes!



From sbassi at gmail.com  Fri Nov  7 11:30:34 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Fri, 7 Nov 2008 14:30:34 -0200
Subject: [Biopython-dev] Possible problem with NCBIStandalone.blastall
In-Reply-To: <320fb6e00811070213i4aa5955arf233180d6a047de0@mail.gmail.com>
References: <b43bf2080811062256i5b6a35a9h5e0cd4b531cb9496@mail.gmail.com>
	<320fb6e00811070213i4aa5955arf233180d6a047de0@mail.gmail.com>
Message-ID: <b43bf2080811070830xb99bd6bv31277968af2152f3@mail.gmail.com>

On Fri, Nov 7, 2008 at 8:13 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> -q -5 =>nuc_mismatch = -5 (or as a string)
> -G 3 => gap_open = 3 (or as a string)
> -E 3 => gap_extend = 3 (or as a string)
> -F "m D" => filter="m D" (MISSING!)

I will try with this.

> -e 700 => expectation=700 (or as a string)
> -Y = 1.75e12 => search_length = '1.75e12' (or as a float)

I used string since I have the biopython version with the bug that
doesn't allow me to enter non iterable values.

> the difference.  Its probably due to omitting the filter option (-F).
> If that doesn't resolve the difference then there is something very
> strange going on...

OK, I will check it and get back with the results.
Thank you.
Best,
SB.

From biopython at maubp.freeserve.co.uk  Fri Nov  7 11:53:58 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 7 Nov 2008 16:53:58 +0000
Subject: [Biopython-dev] CVS freeze for Biopython 1.49 (beta)
In-Reply-To: <320fb6e00811070432x123e806foa06b7f3d94bdb068@mail.gmail.com>
References: <320fb6e00811070432x123e806foa06b7f3d94bdb068@mail.gmail.com>
Message-ID: <320fb6e00811070853w77cd415dn68b1889c09388fb6@mail.gmail.com>

> Once the beta is out (hopefully later today), we can start using CVS
> for documentation updates or fixing any bugs reported in the beta.
> Then in about a week's time I hope to do the  Biopython 1.49 "final"
> release.

OK - Biopython 1.49 beta is done, available on the website now :)

Please don't do any new code checkins for the next week.  Additional
documentation and unit tests should be fine - and any bug fixes after
discussion.

I've done a news post, which I can edit if anyone spots anything wrong
or has suggestions for improvement, but it will be a good basis for the
announcement email:

http://news.open-bio.org/news/2008/11/biopython-149-beta-released/

Peter

From bugzilla-daemon at portal.open-bio.org  Fri Nov  7 11:55:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 7 Nov 2008 11:55:22 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811071655.mA7GtM6F018980@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-07 11:55 EST -------
Grand - this bug seems to be fixed then (and in time for Biopython 1.49 beta).



From bugzilla-daemon at portal.open-bio.org  Sat Nov  8 21:56:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 8 Nov 2008 21:56:59 -0500
Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in
	GenBank files
In-Reply-To: <bug-2225-42@http.bugzilla.open-bio.org/>
Message-ID: <200811090256.mA92uxgL025316@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2225


chapmanb at 50mail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from chapmanb at 50mail.com  2008-11-08 21:56 EST -------
Thanks Peter for the heads up on the future changes. Fixed this with respect to
the offered suggestions with Bio/GenBank/Record.py 1.12; Bio/GenBank/Scanner.py
1.25 and Bio/GenBank/__init__.py 1.95.

I left PROJECT output as shown in our example as it was not clear from the
GenBank documentation whether they would be on multiple or single lines. DBLINK
was output over multiple lines as defined in the documentation. When files with
DBLINKs are released we should include a test case.

For feature parsing, both DBLINK and PROJECT will be stored as dbxrefs as
suggested.



From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 10:04:09 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 10:04:09 -0500
Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in
	GenBank files
In-Reply-To: <bug-2225-42@http.bugzilla.open-bio.org/>
Message-ID: <200811091504.mA9F49hU030667@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2225


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-09 10:04 EST -------
You've got a minor bug in there Brad...

def dblink(self, content):
    """Store DBLINK cross references as dbxrefs in our record object.
    """
    dblinks = [l for l in content.split() if l]
    self.data.dbxrefs.extend(projects)

Should be: self.data.dbxrefs.extend(dblinks)

However, based on the example DBLINK line, we shouldn't be splitting on spaces
at all - for example this transition example for when the PROJECT line and
DBLINK lines are present:

LOCUS       CP000964             5641239 bp    DNA     circular BCT 24-SEP-2008
DEFINITION  Klebsiella pneumoniae 342, complete genome.
ACCESSION   CP000964
VERSION     CP000964.1  GI:206564770
PROJECT     GenomeProject:28471
DBLINK      Project:28471
            Trace Assembly Archive:123456
....

Note that "Trace Assembly Archive:123456" should be a single cross reference. 
I'll attach a patch for CVS in a moment.



From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 10:07:30 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 10:07:30 -0500
Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in
	GenBank files
In-Reply-To: <bug-2225-42@http.bugzilla.open-bio.org/>
Message-ID: <200811091507.mA9F7U0N030977@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2225


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-09 10:07 EST -------
Created an attachment (id=1045)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1045&action=view)
Patch to Bio/GenBank/*.py

This patch against CVS assumes DBLINK lines contain one cross reference per
line.

Also maps "GenomeProject:" to "Project:" so that we'll be consistent when the
NCBI change this as part of the PROJECT line to DBLINK line switch.

Should avoid duplicate entries in the dbxrefs list (especially during the
transition period where both PROJECT and DBLINK lines are used).
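
The behaviour described above can be sketched as follows.  This is not the
attached patch: add_dblinks() is a hypothetical stand-alone function, and it
assumes the DBLINK content arrives as text with one cross reference per line.

```python
def add_dblinks(dbxrefs, content):
    """Append one cross reference per line to dbxrefs, skipping duplicates."""
    for line in content.splitlines():
        ref = line.strip()
        if not ref:
            continue
        # Use the new "Project:" prefix for old style PROJECT entries.
        if ref.startswith("GenomeProject:"):
            ref = "Project:" + ref[len("GenomeProject:"):]
        if ref not in dbxrefs:
            dbxrefs.append(ref)


dbxrefs = []
# Content of the PROJECT line from the example record.
add_dblinks(dbxrefs, "GenomeProject:28471")
# Content of the two DBLINK lines from the same record.
add_dblinks(dbxrefs, "Project:28471\nTrace Assembly Archive:123456")
```

The project reference ends up in the list once, and "Trace Assembly
Archive:123456" stays a single entry despite containing spaces.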



From biopython at maubp.freeserve.co.uk  Sun Nov  9 10:16:50 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 9 Nov 2008 15:16:50 +0000
Subject: [Biopython-dev] Biopython 1.49 beta released
Message-ID: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com>

Dear Biopythoneers,

We are pleased to announce a beta release of Biopython 1.49. There have
been some significant changes since Biopython 1.48 was released two
months ago, which is why we are initially releasing a beta for wider
testing.

As previously announced, the big news is that Biopython now uses NumPy
rather than its precursor Numeric (the original Numerical Python
library).

As in the previous releases, Biopython 1.49 beta supports Python 2.3,
2.4 and 2.5 but should now also work fine on Python 2.6. Please note
that we intend to drop support for Python 2.3 in a couple of releases'
time.

We also have some new functionality, starting with the basic sequence
object (the Seq class) which now has more methods. This encourages a
more object orientated coding style, and makes basic biological
operations like transcription and translation more accessible and
discoverable.

Our BioSQL interface can now optionally fetch the NCBI taxonomy on
demand when loading sequences (via Bio.Entrez) allowing you to
populate the taxon/taxon_name tables gradually. Also, BioSQL should
now work with the psycopg2 driver for PostgreSQL (as well as the older
psycopg driver).

Finally, our old parsing infrastructure (Martel and Bio.Mindy) is now
considered to be deprecated, meaning mxTextTools is no longer required
to use Biopython. This should not affect any of the typically used
parsers (e.g. Bio.SeqIO and Bio.AlignIO).

So, if you are feeling brave and know the risks, please try out
Biopython 1.49 beta, and let us know on the mailing lists if it works,
or more importantly if something doesn't.

We'd also like feedback on the updated Biopython Tutorial and Cookbook:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Source distributions and Windows installers are available from the
Biopython website:
http://biopython.org/wiki/Download

Thanks!

-Peter on behalf of the Biopython developers

P.S. Those of you subscribed to our news feed would have seen this
announcement already.  For RSS links etc, see:
http://biopython.org/wiki/News

From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 11:00:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 11:00:39 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811091600.mA9G0dZ6003494@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #11 from dalloliogm at gmail.com  2008-11-09 11:00 EST -------
(In reply to comment #10)
> Marking as fixed - I've updated SeqRecord.py in CVS revision 1.25 to call the
> doctests via the __main__ trick, with similar changes for Bio.Seq, Bio.SeqIO
> and Bio.AlignIO (the later are complicated due to finding the input files).
> 
> Thanks for the encouragement Marco - hopefully this has also made the docstring
> documentation more useful, and will also improve the API docs too:
> http://biopython.org/DIST/docs/api/ (updated for each release)

Thanks to you!! :)
I am really happy you accepted my patch. 
I'll see if I can contribute something else.
> 
> Peter
> 



From biopython at maubp.freeserve.co.uk  Sun Nov  9 11:10:59 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 9 Nov 2008 16:10:59 +0000
Subject: [Biopython-dev] Sequences and simple plots
In-Reply-To: <C51BB9C7.17C1C%lpritc@scri.ac.uk>
References: <320fb6e00810150709u2aed9855kb8cf91318f287765@mail.gmail.com>
	<C51BB9C7.17C1C%lpritc@scri.ac.uk>
Message-ID: <320fb6e00811090810s342e78f1n3eb45bba051d236f@mail.gmail.com>

Getting back to simpler plot examples using pylab, Andrew Dalke wrote
up some nice examples plotting Kyte & Doolittle hydrophobicities of
protein sequences:

http://www.dalkescientific.com/writings/NBN/plotting.html

Something based on this idea (but probably leaving out most of the
complicated smoothing stuff and labelling the helices) could make a
short and sweet line plot example for the Biopython tutorial.

Peter

From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 12:29:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 12:29:34 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811091729.mA9HTYF1011072@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


dalloliogm at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1042 is|0                           |1
           obsolete|                            |


------- Comment #12 from dalloliogm at gmail.com  2008-11-09 12:29 EST -------
Created an attachment (id=1046)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1046&action=view)
fastPhase output iterator (returns Alignment objects)

This is the rewritten fastphaseoutputIO, which returns an Alignment object
instead of SeqRecord objects.
It can still return SeqRecord objects if a 'ret = seqrecord' parameter is
passed, but Alignment objects are returned by default.

Moreover, I have de-capitalized (.lower()) the name of the function, and added
a link to the fastPhase article in the documentation (although I think the doc
would need more work)



From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 12:30:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 12:30:25 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811091730.mA9HUP6J011190@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


dalloliogm at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1046 is|0                           |1
           obsolete|                            |


------- Comment #13 from dalloliogm at gmail.com  2008-11-09 12:30 EST -------
Created an attachment (id=1047)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1047&action=view)
a doctest file to test fastPhaseOutputIterator



From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 12:34:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 12:34:19 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811091734.mA9HYJ7I011664@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #14 from dalloliogm at gmail.com  2008-11-09 12:34 EST -------
Created an attachment (id=1048)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1048&action=view)
use cases/description for fastphaseoutputIO

This is a collection of use cases/examples about fastPhaseOutputIO.
I thought it could be useful to understand how this module will be used and by
who, or just to remind me why I wrote this module later :)



From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 12:41:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 12:41:26 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811091741.mA9HfQlr012379@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #15 from dalloliogm at gmail.com  2008-11-09 12:41 EST -------
(In reply to comment #10)
> (In reply to comment #8)
> > > If fastPHASE files SHOULD always come in allele groups (of the same
> > > length), then it would be better to integrate the parser into Bio.AlignIO
> > > giving pairwise alignments (and you would be able to read it via Bio.SeqIO
> > > automatically as well).
> > 
> > This is good idea, I didn't think of it.
> > But how should I modify the module to produce AlignIO objects?
> 
> Essentially Instead of:
> 
> yield record_one
> yield record_two
> 
> you'd do something like this:
> 
> alignment = Alignment(generic_dna)
> alignment.add_sequence(id_one, seq_one)
> alignment.add_sequence(id_two, seq_two)
> yield alignment


I have modified the module so that it returns Alignment objects instead of
SeqRecords.
The problem is that Alignment.add_sequence doesn't accept SeqRecord objects
as input; it only takes an id and the sequence.
As a result some information is lost: to be more precise, everything I was
putting in 'description' (subpop. label: 6  (internally 1)) is lost, because
there is no way to store it in the Alignment object.
Moreover, the parser now returns only a single Alignment object per file (I
don't think it is possible to have two fastPHASE outputs in the same file),
because I thought that was the most useful behaviour.
However, I left an option to have SeqRecord objects returned instead of
Alignments (unfortunately I removed them from the doctests :().



From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 12:46:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 12:46:13 -0500
Subject: [Biopython-dev] [Bug 2554] Creating an Alignment from a list of
	SeqRecord objects
In-Reply-To: <bug-2554-42@http.bugzilla.open-bio.org/>
Message-ID: <200811091746.mA9HkDPr012817@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2554


------- Comment #3 from dalloliogm at gmail.com  2008-11-09 12:46 EST -------
(In reply to comment #0)
> It would be nice to be able to supply a list (or iterator) of SeqRecord objects
> when creating an alignment object.  This would also make the
> Bio.SeqIO.to_alignment() function obsolete.

I agree with this request; see
http://bugzilla.open-bio.org/show_bug.cgi?id=2643#c15



From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 12:52:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 12:52:48 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811091752.mA9HqmqQ013518@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #12 from dalloliogm at gmail.com  2008-11-09 12:52 EST -------
Created an attachment (id=1049)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1049&action=view)
add doctests to Bio.Align.Generic.Alignment

This is a patch to add doctest to Bio.Align.Generic.Alignment.
I just wrote it for myself to understand how this class works.. if you think it
could be useful, here it is.



From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 16:35:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 16:35:25 -0500
Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in
	GenBank files
In-Reply-To: <bug-2225-42@http.bugzilla.open-bio.org/>
Message-ID: <200811092135.mA9LZPBG004563@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2225


------- Comment #6 from chapmanb at 50mail.com  2008-11-09 16:35 EST -------
Peter -- thanks for the bug catch and suggestion. Working into the future and
trying to predict if NCBI is going to do what they plan is always fun. Your fix
looks great to me -- commit away and we can close this out. If things are
different when they actually make the change we can always adjust then, but this
looks very sensible.



From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 03:58:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 03:58:52 -0500
Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for
	arguments for their types
In-Reply-To: <bug-2639-42@http.bugzilla.open-bio.org/>
Message-ID: <200811100858.mAA8wq2i007149@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2639


dalloliogm at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|SeqRecord.init doesn't check|SeqRecord.init doesn't check
                   |for arguments to their types|for arguments for their
                   |                            |types


------- Comment #5 from dalloliogm at gmail.com  2008-11-10 03:58 EST -------
(In reply to comment #4)
> (In reply to comment #3)
> > Created an attachment (id=1041)
> >  --> (http://bugzilla.open-bio.org/attachment.cgi?id=1041&action=view)
> > add a check for the seq argument in seqrecord, to be a Seq object and not None
> >
> > This patch adds a check for the seq argument in SeqRecord.
> > If seq is None (by default), it raises a ValueError Exception.
> > If it is a Seq objects, it saves it as self.seq.
> > If it is another kind of object (string, list, integer), it is converted to a
> > string, and then used to instantiate a seq object.
> 
> I was deliberately not checking the seq argument. 

Ok, understood. I hadn't thought of these cases.
However, not having a Seq causes errors that are difficult to understand in
other functions that use SeqRecord.
For example, if you do:

>>> a = SeqRecord(id = '1')
>>> a.format('fasta')

you get the error: 
<type 'exceptions.AttributeError'>: 'NoneType' object has no attribute
'tostring'

This could scare a Biopython newbie; an exception like 'error - current
SeqRecord object doesn't have a Seq' would be better.
What do you think about creating a 'NullSeq' object, which represents a Seq with
no value, and using it as the default for SeqRecord?
Later we could modify other functions like .format and Seq.translate to
intercept these objects and return the right error message.
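
To make the NullSeq idea concrete, here is a minimal sketch. Everything in it
is hypothetical: NullSeq, MissingSequenceError and the small Record class are
stand-ins written for this example, not Biopython classes, and the real
SeqRecord.format() works differently.

```python
# Hypothetical sketch only: none of these classes exist in Biopython.

class MissingSequenceError(ValueError):
    """Raised when an operation needs a sequence the record does not have."""


class NullSeq(object):
    """Placeholder meaning 'no sequence known' (assumed name)."""

    def __len__(self):
        return 0

    def __str__(self):
        # Fail with a readable message instead of an AttributeError on None.
        raise MissingSequenceError(
            "current SeqRecord object doesn't have a Seq")


class Record(object):
    """Tiny stand-in for SeqRecord, just enough to show the default."""

    def __init__(self, seq=None, id="<unknown id>"):
        # Use the placeholder rather than None when no sequence is given.
        self.seq = NullSeq() if seq is None else seq
        self.id = id

    def format_fasta(self):
        # str() on a NullSeq raises MissingSequenceError.
        return ">%s\n%s\n" % (self.id, str(self.seq))
```

With this, Record(id='1').format_fasta() fails with the message above, while a
record holding a real sequence is formatted as usual. Whether a placeholder
object is preferable to None plus an explicit check is a design choice this
sketch does not settle.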


> There are several reasonable
> use cases:
> 
> * a Seq object (normal) or a subclass of it.
> * a MutableSeq object (seems reasonable, note this is not a subclass of Seq)
> * None (seems a good way to handle sequence records where we don't know the
> sequence - for example some GenBank files).
> * a user defined sequence object which implements the Seq API but does not
> subclass Seq or MutableSeq (this is more difficult to check).
> 
> > I thought that someone could use an integer (e.g.: 010100010101101) as a
> > sequence, and in this case, the integer is first converted to a string
> > (otherwise Seq() would return an error).
> 
> Note that if someone did want to use some weird numerical sequence, then the
> SeqRecord object should NOT be trying to do anything special (guessing what is
> intended). The user should create a suitable Seq object themselves (ideally
> with a numerical alphabet object).  Explicit rather than implicit (Zen of
> python).
> 
> --
> 
> Note that I'm not 100% happy with the type checking we've just added.  See
> "duck-typing" and interfaces versus types,
> http://www.python.org/doc/2.5.2/tut/node18.html#l2h-46
> 
> The checks I've added shouldn't be too constraining - but maybe they should use
> using interface checking instead (or just revert back to no checking).
> 
> Any comments from other people?  This should be being CC'd to the dev mailing
> list.
> 



From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 04:09:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 04:09:42 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811100909.mAA99g8S008678@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


dalloliogm at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1043 is|0                           |1
           obsolete|                            |



From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 05:16:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 05:16:14 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811101016.mAAAGERI012974@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-10 05:16 EST -------
(In reply to comment #15)
> I have modified the module so it returns Alignment objects instead of
> SeqRecords.
> The problem is that Alignment.add_sequence doesn't support SeqRecords objects
> as inputs; it only requires an id and the sequence.  This causes that some
> information is lost: to be more precise, everything I was
> putting in 'description' (subpop. label: 6  (internally 1)) is lost, because
> there is not a way to store it in the Alignment object.

Adding a SeqRecord to an alignment would be enhancement request Bug 2553.  I
see you've just spotted enhancement request Bug 2554 which would also solve
this issue nicely. As a short term solution until one of these bugs is
implemented, some of the Bio.AlignIO parsers "cheat" and bypass the public API
to use alignment._records directly (this is just a list of SeqRecord objects).

> Moreover, now the parser only returns a single Alignment object per file (I
> think it is not supposed to be possible to have two fastphase outputs in the
> same file), because I thought it was the most useful thing.

Bio.AlignIO uses generators/iterators just like Bio.SeqIO - so that in general
you can return multiple alignments for use with Bio.AlignIO.parse().  However,
if the file format really does just return one pairwise alignment, then just
yield one alignment (this is what happens with the Nexus file format).

> However, I left an option to have SeqRecord objects returned instead of
> Alignments (unfortunately I removed them from the doctests :().

If you want this as part of Bio.AlignIO / Bio.SeqIO you don't need to do this. 
Once a parser is added to Bio.AlignIO, the file format can also be used from
Bio.SeqIO to get SeqRecord objects (the rows of all the alignments).

Peter



From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 05:45:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 05:45:34 -0500
Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in
	GenBank files
In-Reply-To: <bug-2225-42@http.bugzilla.open-bio.org/>
Message-ID: <200811101045.mAAAjYJ6015314@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2225


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-10 05:45 EST -------
(In reply to comment #3)
> When files with DBLINKs are released we should include a test case.

Definitely.  We might be able to just update an existing test case, like the
one added for between locations.

(In reply to comment #6)
> Peter -- thanks for the bug catch and suggestion. Working into the future
> and trying to predict if NCBI is going to do what they plan is always fun. 

Well - they've got about six months to change their mind ;)

> Your fix looks great to me -- commit away and we can close this out.

Checked in.

> If things are different when the actually make the change we can always
> adjust then but this looks very sensible.

OK.

Thanks!

Peter



From biopython at maubp.freeserve.co.uk  Mon Nov 10 06:28:00 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 10 Nov 2008 11:28:00 +0000
Subject: [Biopython-dev] [BioPython] annotations in an Alignment object
In-Reply-To: <5aa3b3570811100304o4655fe60o4ecabf41e054c211@mail.gmail.com>
References: <5aa3b3570811100304o4655fe60o4ecabf41e054c211@mail.gmail.com>
Message-ID: <320fb6e00811100328j1a565c36t7f3522344e7c95c0@mail.gmail.com>

On Mon, Nov 10, 2008 at 11:04 AM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
> Is there any way to store some annotations in an Alignment object??
> For example: the alignment tool used, its parameters, its version, the
> date, and the nature of the sequence aligned.

Not officially, no.  This is on my mental list of things to do with
the alignment object (after Biopython 1.49 is done).  I've CC'd the
dev-mailing list which is probably a better place to discuss the
details.

If you look at Bio/AlignIO/StockholmIO.py or the
Bio/AlignIO/FastaIO.py code you'll see I've recorded this kind of
information in a private dictionary, i.e. alignment._annotations.
This makes the data available if anyone really needs it, but signals
that this is not part of the public API and is likely to change.

As part of an alignment annotation enhancement, we should try and
establish some agreed standards for naming annotation entries (and
also counting systems).

> I am asking this because I would like to write a module to create
> ldhat input files from an alignment program.
> A ldhat file (http://www.stats.ox.ac.uk/~mcvean/LDhat/instructions.html)
> is very similar to a fasta file; the only difference is that in its
> first line, it contains three numbers, one of which can't always be
> inferred by the data.

Why go to the trouble of making a new Bio.AlignIO module?  For this
example from the LDhat manual, it looks like a FASTA file with an
extra header:

4 10 1
>SampleA
TCCGC??RTT
>SampleB
TACGC??GTA
>SampleC
TC?-CTTGTA
>SampleD
TCC-CTTGTT

Rather than writing support for a whole new file format, wouldn't it
be easier to do something like this:

alignment = ...
number_a = 4
number_b = 10
number_c = 1

handle = open("example.txt","w")
handle.write("%i %i %i\n" % (number_a, number_b, number_c))
handle.write(alignment.format("fasta"))
handle.close()

Peter

From dalloliogm at gmail.com  Mon Nov 10 06:42:31 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Mon, 10 Nov 2008 12:42:31 +0100
Subject: [Biopython-dev] [BioPython] annotations in an Alignment object
In-Reply-To: <320fb6e00811100328j1a565c36t7f3522344e7c95c0@mail.gmail.com>
References: <5aa3b3570811100304o4655fe60o4ecabf41e054c211@mail.gmail.com>
	<320fb6e00811100328j1a565c36t7f3522344e7c95c0@mail.gmail.com>
Message-ID: <5aa3b3570811100342t7c23c0fl2b101be3fd352159@mail.gmail.com>

On Mon, Nov 10, 2008 at 12:28 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Mon, Nov 10, 2008 at 11:04 AM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>> Is there any way to store some annotations in an Alignment object??
>> For example: the alignment tool used, its parameters, its version, the
>> date, and the nature of the sequence aligned.
>
> Not officially, no.  This is on my mental list of things to do with
> the alignment object (after Biopython 1.49 is done).  I've CC'd the
> dev-mailing list which is probably a better place to discuss the
> details.
>
> If you look at Bio/AlignIO/StockholmIO.py or the
> Bio/AlignIO/FastaIO.py code you'll see I've recorded this kind of
> information in a private dictionary, i.e. alignment._annotations.
> This makes the data available if anyone really needs it, but signals
> that this is not part of the public API and is likely to change.
>
> As part of an alignment annotation enhancement, we should try and
> establish some agreed standards for naming annotation entries (and
> also counting systems).

ok... I will use the private dictionary for my own implementation.
Unfortunately I don't have any useful suggestion for this..

>> I am asking this because I would like to write a module to create
>> ldhat input files from an alignment program.
>> A ldhat file (http://www.stats.ox.ac.uk/~mcvean/LDhat/instructions.html)
>> is very similar to a fasta file; the only difference is that in its
>> first line, it contains three numbers, one of which can't always be
>> inferred by the data.
>
> Why go to the trouble of making a new Bio.AlignIO module?  For this
> example from the LDhat manual, it looks like a FASTA file with an
> extra header:

Yeah.. of course :)
Let's say I am simply playing with biopython's code, to better understand it.
Since I am going to use this function many times, I will have to write
a module for it anyway.
The first number in the ldhat file is the number of sequences, the
second is their length, and the third should usually be one in an
alignment object, I suppose.
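
As a rough sketch of that header logic, assuming the first two numbers are
derived from the data and the third is supplied by the caller: ldhat_text is a
hypothetical helper name, and it works on plain (name, sequence) string pairs
rather than on a Biopython Alignment object.

```python
def ldhat_text(records, third_number=1):
    """Return LDhat-style text for a list of (name, sequence) pairs.

    Assumes the header is: number of sequences, sequence length, and a
    third number that the caller must supply (it cannot always be
    inferred from the data).
    """
    if not records:
        raise ValueError("need at least one sequence")
    length = len(records[0][1])
    # All rows must have the same length, as in an alignment.
    for name, seq in records:
        if len(seq) != length:
            raise ValueError("sequence %s has length %i, expected %i"
                             % (name, len(seq), length))
    lines = ["%i %i %i" % (len(records), length, third_number)]
    for name, seq in records:
        lines.append(">%s" % name)
        lines.append(seq)
    return "\n".join(lines) + "\n"
```

For the four samples in the LDhat manual example this gives a first line of
"4 10 1" followed by the FASTA-style records.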

>
> 4 10 1
>>SampleA
> TCCGC??RTT
>>SampleB
> TACGC??GTA
>>SampleC
> TC?-CTTGTA
>>SampleD
> TCC-CTTGTT
>
> Rather than writing support for a whole new file format, wouldn't it
> be easier to do something like this:
>
> alignment = ...
> number_a = 4
> number_b = 10
> number_c = 1
>
> handle = open("example.txt","w")
> handle.write("%i %i %i\n" % (number_a, number_b, number_c))
> handle.write(alignment.format("fasta"))
> handle.close()
>
> Peter
>


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it

From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 06:48:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 06:48:08 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811101148.mAABm8WO019854@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1033 is|0                           |1
           obsolete|                            |


------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-10 06:48 EST -------
(From update of attachment 1033)
Something similar was checked into CVS.



From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 07:02:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 07:02:12 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811101202.mAAC2CV4020912@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1049 is|0                           |1
           obsolete|                            |


------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-10 07:02 EST -------
(From update of attachment 1049)
I've checked in something similar to CVS - thanks Marco.

I've not added a doctest for the format method using "clustal" because I think
the <BLANKLINE> bits make the documentation nasty to read.  Instead I've just
used "fasta" and "phylip".



From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 07:14:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 07:14:28 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811101214.mAACESXB021859@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-10 07:14 EST -------
(In reply to comment #16)
> (In reply to comment #15)
> > I have modified the module so it returns Alignment objects instead of
> > SeqRecords.
> > The problem is that Alignment.add_sequence doesn't support SeqRecords
> > objects as inputs; it only requires an id and the sequence.  This
> > causes that some information is lost: to be more precise, everything
> > I was putting in 'description' (subpop. label: 6  (internally 1)) is
> > lost, because there is not a way to store it in the Alignment object.
> 
> Adding a SeqRecord to an alignment would be enhancement request Bug 2553.  I
> see you've just spotted enhancement request Bug 2554 which would also solve
> this issue nicely. As a short term solution until one of these bugs is
> implemented, some of the Bio.AlignIO parsers "cheat" and bypass the public API
> to use alignment._records directly (this is just a list of SeqRecord objects).

Or, for another approach which at least avoids private properties but instead
makes an assumption that added sequences are always put at the end of the
alignment:

alignment = Alignment(generic_dna)

alignment.add_sequence(id_one, seq_one)
assert alignment[-1].id == id_one
alignment[-1].description = descr_one
alignment[-1].annotations["label"] = label_one
...

alignment.add_sequence(id_two, seq_two)
assert alignment[-1].id == id_two
alignment[-1].description = descr_two
alignment[-1].annotations["label"] = label_two
...
yield alignment

However, I agree with you, the best solution is to pass SeqRecord objects to
the alignment directly (i.e. Bug 2553 and/or Bug 2554).



From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 11:04:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 11:04:06 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811101604.mAAG46Cj008024@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #18 from dalloliogm at gmail.com  2008-11-10 11:04 EST -------
(In reply to comment #17)
> 
> Or, for another approach which at least avoids private properties but instead
> makes an assumption that added sequences are always put at the end of the
> alignment:
> 
> alignment = Alignment(generic_dna)
> 
> alignment.add_sequence(id_one, seq_one)
> assert alignment[-1].id == id_one
> alignment[-1].description = desrc_one
> alignment[-1].annotations["label"] = label_one
> ...
> 
> alignment.add_sequence(id_two, seq_two)
> assert alignment[-1].id == id_two
> alignment[-1].description = desrc_two
> alignment[-1].annotations["label"] = label_two
> ...
> yield alignment
> 

OK!! I ended up using the first method, but I left a comment in the code to
remind me of this alternative.



From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 11:06:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 11:06:49 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811101606.mAAG6nDL008314@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


dalloliogm at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1044 is|0                           |1
           obsolete|                            |


------- Comment #19 from dalloliogm at gmail.com  2008-11-10 11:06 EST -------
Created an attachment (id=1050)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1050&action=view)
fastPhase output iterator (returns an Alignment object with SeqRecords)

This version returns an Alignment object with valid SeqRecord objects, using
the Alignment._records.append trick.



From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 11:07:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 11:07:27 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811101607.mAAG7RLr008403@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


dalloliogm at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1047 is|0                           |1
           obsolete|                            |


------- Comment #20 from dalloliogm at gmail.com  2008-11-10 11:07 EST -------
Created an attachment (id=1051)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1051&action=view)
1047: a doctest file to test fastPhaseOutputIterator

updated for attachment 1050



From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 11:34:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 11:34:34 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811101634.mAAGYYbi010826@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #21 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-10 11:34 EST -------
Hi Marco,

Looking at your example, the important part of the file is this bit:

...
BEGIN GENOTYPES
Ind1  # subpop. label: 6  (internally 1)
T T T T T G A A A C C A A A G A C G C T G C G T C A G C C T G C A A T C T G
T T T T T G C C C C C A A A A G C G C G T C G T C A G T C T A A G A C C T A
Ind2  # subpop. label: 6  (internally 1)
C T T T T G C C C T C A A A A G T G C T G T G C C A G T C T A C G G C C T G
T T T T T G A A A C C A A A G A C G C T T C G T C A G T A T A C G A T C T A
END GENOTYPES

Quoting the manual again, "Output files for inferred haplotypes or imputed
genotypes contain two lines per given diploid individual, with the order of
individuals corresponding to that supplied in the input file."

In this example we have two individuals, Ind1 and Ind2 (presumably with
automatically assigned names).  In a real world example, how many individuals
would you expect to use?  Does it make more sense to return a pairwise
alignment for each individual, rather than one large combined alignment?  One
of the main points for using iterators/generators is they allow us to deal with
very large files by not having to keep everything in memory.  Now I don't have
a feel for what sized files fastPhase could output - maybe a single large
alignment is fine.

i.e. One combined alignment:

IUPACUnambiguousDNA() alignment with 4 rows and 38 columns
TTTTTGAAACCAAAGACGCTGCGTCAGCCTGCAATCTG Ind1_all1
TTTTTGCCCCCAAAAGCGCGTCGTCAGTCTAAGACCTA Ind1_all2
CTTTTGCCCTCAAAAGTGCTGTGCCAGTCTACGGCCTG Ind2_all1
TTTTTGAAACCAAAGACGCTTCGTCAGTATACGATCTA Ind2_all2

versus one pairwise alignment per individual:

IUPACUnambiguousDNA() alignment with 2 rows and 38 columns
TTTTTGAAACCAAAGACGCTGCGTCAGCCTGCAATCTG Ind1_all1
TTTTTGCCCCCAAAAGCGCGTCGTCAGTCTAAGACCTA Ind1_all2

IUPACUnambiguousDNA() alignment with 2 rows and 38 columns
CTTTTGCCCTCAAAAGTGCTGTGCCAGTCTACGGCCTG Ind2_all1
TTTTTGAAACCAAAGACGCTTCGTCAGTATACGATCTA Ind2_all2

I think you'll have to decide this (unless anyone else following this has a
view - Tiago maybe?)

P.S. Have you tried with and without the -n option to automatically name the
individuals?  What happens if the name includes a hash character (#)?  I would
hope fastPhase would treat this as an error, but it could end up in the output
file and confuse the parser.

P.P.S. Based on the examples in the manual, typical output might use lower case
nucleotides (a, t, c, g) or numbers (0, 1).  I presume upper case nucleotides
are also fine, but defaulting to this is a bad idea.  Please default to
Bio.Alphabet.single_letter_alphabet, which seems to be the safest choice (we
shouldn't guess).
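
To illustrate the "one pair per individual" option, here is a minimal
pure-Python sketch of a generator over the GENOTYPES block shown above. It is
not the attached module: iter_genotype_pairs is a hypothetical name, it yields
plain tuples rather than Alignment objects, and it assumes each individual is
exactly one name line followed by two allele lines.

```python
def iter_genotype_pairs(lines):
    """Yield (name, description, allele_1, allele_2) per individual.

    Assumes the layout shown above: a name line (optionally followed by
    '#' and a comment), then two lines of space-separated alleles, all
    between 'BEGIN GENOTYPES' and 'END GENOTYPES'.
    """
    inside = False
    block = []
    for line in lines:
        line = line.strip()
        if line == "BEGIN GENOTYPES":
            inside = True
        elif line == "END GENOTYPES":
            break
        elif inside and line:
            block.append(line)
            if len(block) == 3:
                header, first, second = block
                # Split the name from the comment at the first '#'.
                name, _, description = header.partition("#")
                yield (name.strip(), description.strip(),
                       first.replace(" ", ""), second.replace(" ", ""))
                block = []
    if block:
        raise ValueError("incomplete record for %r" % block[0])
```

Consuming the generator one item at a time keeps only one individual in
memory, while list(iter_genotype_pairs(handle)) gives the combined view, so
the same parser could serve either choice. Note that a '#' inside an
individual's name would be split at the wrong place here, which is exactly the
case raised in the P.S.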



From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 14:19:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 14:19:15 -0500
Subject: [Biopython-dev] [Bug 2649] New: Bio.KDTree expects numpy array with
	dtype="float32" on 64 bit machines.
Message-ID: <bug-2649-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2649

           Summary: Bio.KDTree expects numpy array with dtype="float32" on
                    64 bit machines.
           Product: Biopython
           Version: 1.49b
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: paul at rudin.co.uk


Bio.KDTree expects numpy array with dtype="float32" on 64 bit machines. The
numpy default for floats is "float64" on 64 bit machines and this would seem to
be a more natural and practical choice.



From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 17:25:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 17:25:33 -0500
Subject: [Biopython-dev] [Bug 2651] New: Error from test_GAQueens.py
Message-ID: <bug-2651-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2651

           Summary: Error from test_GAQueens.py
           Product: Biopython
           Version: 1.49b
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


I got this error with Python 2.5, but it is extremely rare. I think I have seen
it before but have never reproduced it. It indicates that the test is triggering
some bugs other than the obvious bug in Seq.py.

======================================================================
ERROR: test_GAQueens                                                  
----------------------------------------------------------------------
Traceback (most recent call last):                                    
  File "run_tests.py", line 125, in runTest                           
    self.runSafeTest()                                                
  File "run_tests.py", line 142, in runSafeTest                       
    cur_test.run_tests([])                                            
  File "test_GAQueens.py", line 42, in run_tests                      
    main(arguments)                                                   
  File "test_GAQueens.py", line 76, in main                           
    evolved_pop = evolver.evolve(queens_solved)                       
  File
"/home/bsouthey/python/biopython-1.49b/build/lib.linux-x86_64-2.5/Bio/GA/Evolver.py",
line 56, in evolve
    self._population = self._selector.select(self._population)                  
  File
"/home/bsouthey/python/biopython-1.49b/build/lib.linux-x86_64-2.5/Bio/GA/Selection/Tournament.py",
line 77, in select
    new_orgs[1])                                                                
  File
"/home/bsouthey/python/biopython-1.49b/build/lib.linux-x86_64-2.5/Bio/GA/Selection/Abstract.py",
line 53, in mutate_and_crossover
    final_org_1 = self._repairer.repair(final_org_1)                            
  File "test_GAQueens.py", line 234, in repair                                  
    duplicated_items = self._get_duplicates(organism.genome)                    
  File "test_GAQueens.py", line 203, in _get_duplicates                         
    if genome.count(item) > 1:                                                  
  File
"/home/bsouthey/python/biopython-1.49b/build/lib.linux-x86_64-2.5/Bio/Seq.py",
line 796, in count                                
    if len(search) == 1 :                                                       
TypeError: object of type 'int' has no len()                                    

----------------------------------------------------------------------


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 18:28:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 18:28:26 -0500
Subject: [Biopython-dev] [Bug 2651] Error from test_GAQueens.py
In-Reply-To: <bug-2651-42@http.bugzilla.open-bio.org/>
Message-ID: <200811102328.mAANSQiJ032135@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2651


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |minor
          Component|Main Distribution           |Unit Tests


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-10 18:28 EST -------
What bug in Seq?  Trying to call the count method with an integer argument
instead of string or another Seq should fail - try it on a string for
comparison:

>>> "123456".count(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: expected a character buffer object

I would agree that the TypeError message could be better, "object of type 'int'
has no len()" is a little misleading.  Are you suggesting that be changed?

Genetic algorithms (with a random seed at least) are non-deterministic - I've
seen some of the GA unit tests fail every so often (but I'm not sure off hand
if it's just test_GAQueens or not).  Rerunning the test will usually be fine.
The traceback looks familiar so it's probably the same issue, but I haven't had
the time or desire to trace through the code to try and work out what is going
wrong.  I would guess it fails far less than 10% of the time, maybe 1% or 2%.
I guess a quick shell script would answer this ;)

Maybe we should catch the error condition and issue a runtime error saying
"Didn't converge" or whatever would be appropriate terminology.  Or
automatically restart the test?  Or, maybe we can solve the unit test failure
by specifying a random seed - that might be a neat solution.
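
To illustrate the random seed idea, here is a minimal sketch.  The
flaky_trial function is a hypothetical stand-in for a stochastic test, and its
2% failure probability is an assumption chosen only to mirror the guess above;
none of this is the Bio.GA code.

```python
import random


def flaky_trial(rng):
    """Hypothetical stochastic test: returns False (a failure) about 2% of the time."""
    return rng.random() >= 0.02


def failure_rate(seed, runs=1000):
    """Run the trial repeatedly with a private, seeded generator."""
    rng = random.Random(seed)  # a fixed seed gives the same sequence every time
    failures = sum(1 for _ in range(runs) if not flaky_trial(rng))
    return failures / runs


# Two runs with the same seed agree exactly, so any failure can be replayed.
print(failure_rate(seed=1) == failure_rate(seed=1))
```

Using a private random.Random instance rather than the module-level functions
keeps the seed from leaking into other tests run in the same process.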

N.B. Refiling under unit tests.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 21:30:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 21:30:46 -0500
Subject: [Biopython-dev] [Bug 2651] Error from test_GAQueens.py
In-Reply-To: <bug-2651-42@http.bugzilla.open-bio.org/>
Message-ID: <200811110230.mAB2Ukq2020297@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2651


------- Comment #2 from bsouthey at gmail.com  2008-11-10 21:30 EST -------
(In reply to comment #1)
> What bug in Seq?  Trying to call the count method with an integer argument
> instead of string or another Seq should fail - try it on a string for
> comparison:
> 
> >>> "123456".count(1)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: expected a character buffer object
> 
> I would agree that the TypeError message could be better, "object of type 'int'
> has no len()" is a little misleading.  Are you suggesting that be changed?

That is an 'obvious' bug (in light of the error) because there is no check
that 'sub' is a string. Using the example from the docstring:
my_mseq = MutableSeq("AAAATGA")
my_mseq.count(1) 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.5/site-packages/Bio/Seq.py", line 722, in count
    if len(search) == 1 :
TypeError: object of type 'int' has no len()

Note that passing a dict or list works, but perhaps it should not. I think you
need to check that 'search' is a string (isinstance(search,basestring)). If
not, then fail with a more informative message.
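
A rough sketch of that kind of check, written in Python 3 syntax (so str
stands in for basestring).  The class name SketchSeq and the wording of the
message are hypothetical; this is not the Bio.Seq code.

```python
class SketchSeq:
    """Hypothetical stand-in for a sequence class wrapping a plain string."""

    def __init__(self, data):
        self.data = data

    def count(self, sub):
        # Reject non-string arguments up front with an explicit message,
        # instead of letting a later len() call fail confusingly.
        if not isinstance(sub, str):
            raise TypeError(
                "count() expects a string, got %s" % type(sub).__name__
            )
        return self.data.count(sub)


seq = SketchSeq("AAAATGA")
print(seq.count("A"))
try:
    seq.count(1)
except TypeError as err:
    print(err)
```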


> 
> Genetic algorithms (with a random seed at least) are non deterministic - I've
> seen some of the GA unit tests fail every so often (but I'm not sure off hand
> if its just test_GAQueens or not).  Rerunning the test will usually be fine. 
> The traceback looks familiar so its probably the same issue, but I haven't had
> the time or desire to trace through the code to try and work out what is going
> wrong.  I would guess it fails far less than 10% of time, but maybe 1% or 2%. 
> I guess a quick shell script would answer this ;)
> 
> Maybe we should catch the error condition and issue a runtime error saying
> "Didn't converge" or whatever would be appropriate terminology.  Or
> automatically restart the test?  Or, maybe we can solve the unit test failure
> by specifying a random seed - that might be a neat solution.
> 
> N.B. Refiling under unit tests.
> 

I agree with doing one or more of these, at least until the source is identified
(hopefully a known case). But I agree that this is not easy to find, and I do
not know of anything that would help.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 05:10:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 05:10:45 -0500
Subject: [Biopython-dev] [Bug 2651] Error from test_GAQueens.py
In-Reply-To: <bug-2651-42@http.bugzilla.open-bio.org/>
Message-ID: <200811111010.mABAAjQq029851@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2651


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-11 05:10 EST -------
(In reply to comment #2)
>(In reply to comment #1)
>> What bug in Seq?  Trying to call the count method with an integer argument
>> instead of string or another Seq should fail - try it on a string for
>> comparison:
>> 
>> >>> "123456".count(1)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in ?
>> TypeError: expected a character buffer object
>> 
>> I would agree that the TypeError message could be better, "object of type
>> 'int' has no len()" is a little misleading.  Are you suggesting that be
>> changed?
> 
> That is an 'obvious' bug (in light of the error) because there is no check for
> that 'sub' is a string. Using the example from the docstring:
> my_mseq = MutableSeq("AAAATGA")
> my_mseq.count(1) 
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib64/python2.5/site-packages/Bio/Seq.py", line 722, in count
>     if len(search) == 1 :
> TypeError: object of type 'int' has no len()
> 
> Note that using a dict or list work but perhaps these should not. I think you
> need to check that 'search' is a string (isinstance(search,basestring)). If
> not, then fail with some more informative message. 

That's done in CVS.

Leaving this bug open to cover the test_GAQueens.py issue.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 06:30:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 06:30:16 -0500
Subject: [Biopython-dev] [Bug 2652] New: Bio.Fasta.Iterator fails with
	IndexError when opening empty fasta files
Message-ID: <bug-2652-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2652

           Summary: Bio.Fasta.Iterator fails with IndexError when opening
                    empty fasta files
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: rjalves at igc.gulbenkian.pt


Instead of an IndexError, better error handling or at least a more explicit
error message would help. At first glance it's not obvious what is causing the
error.

Example:

In [1]: from Bio import Fasta

In [2]: Fasta.Iterator(open("empty.fasta"))
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)

/var/lib/python-support/python2.5/Bio/Fasta/__init__.pyc in __init__(self,
handle, parser, debug)
     65         while True :
     66             line = handle.readline()
---> 67             if line[0] == ">" :
     68                 break
     69             if debug : print "Skipping: " + line

IndexError: string index out of range


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 06:30:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 06:30:45 -0500
Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError
	when opening empty fasta files
In-Reply-To: <bug-2652-42@http.bugzilla.open-bio.org/>
Message-ID: <200811111130.mABBUjf8003203@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2652


rjalves at igc.gulbenkian.pt changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|Not Applicable              |1.45


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 06:55:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 06:55:07 -0500
Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError
	when opening empty fasta files
In-Reply-To: <bug-2652-42@http.bugzilla.open-bio.org/>
Message-ID: <200811111155.mABBt7Hf005132@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2652


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-11 06:55 EST -------
Hi Renato,

This bug in Bio.Fasta with empty files was fixed in Biopython 1.49b, see
Bio/Fasta/__init__.py revision 1.19. 
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Fasta/__init__.py?cvsroot=biopython#rev1.19

I would encourage you to try Biopython 1.49b, but if you have a reason for
running an old version like Biopython 1.45, you could probably update just this
one file instead.  Ask if you would like specific instructions, but essentially
it's a one line change, from:

if line[0] == ">" :

to:

if not line or line[0] == ">" :
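
For illustration, a self-contained sketch of the same guard in Python 3.  The
function name skip_to_first_record is hypothetical, and this is only the
header-skipping loop, not the Bio.Fasta code itself.

```python
from io import StringIO


def skip_to_first_record(handle):
    """Return the first line starting with '>', or '' if the handle has none."""
    while True:
        line = handle.readline()
        # readline() returns '' at end of file, so test for that before
        # indexing; line[0] on an empty string raises IndexError.
        if not line or line[0] == ">":
            return line


print(repr(skip_to_first_record(StringIO(""))))
print(repr(skip_to_first_record(StringIO("comment\n>seq1\nACGT\n"))))
```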

Please note that Bio.Fasta is considered to be obsolete (and was explicitly
documented as such as of Biopython 1.48), and may one day be deprecated. 
However, given this was the main FASTA parsing code in Biopython for some
years, we're not going to deprecate it just yet, so you should be OK continuing
to use Bio.Fasta in old scripts for a while yet.

For new code, we encourage people to use Bio.SeqIO instead, described in the
current tutorial and on the wiki:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
http://biopython.org/wiki/SeqIO

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 07:08:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 07:08:37 -0500
Subject: [Biopython-dev] [Bug 2649] Bio.KDTree expects numpy array with
	dtype="float32" on 64 bit machines.
In-Reply-To: <bug-2649-42@http.bugzilla.open-bio.org/>
Message-ID: <200811111208.mABC8bHw006251@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2649


------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp  2008-11-11 07:08 EST -------
I've uploaded a fixed version to CVS; see KDTree.py and KDTreemodule.c at

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/KDTree/?cvsroot=biopython

Could you try with these files and see if they work for you?


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk  Tue Nov 11 08:02:18 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 11 Nov 2008 13:02:18 +0000
Subject: [Biopython-dev] [BioPython] Cannot __add__ two DBSeq objects
In-Reply-To: <7265d4f0811110439h6c18e111te97d23070565cca2@mail.gmail.com>
References: <7265d4f0811110439h6c18e111te97d23070565cca2@mail.gmail.com>
Message-ID: <320fb6e00811110502y624cf6c1r52c316d61a1f7228@mail.gmail.com>

On Tue, Nov 11, 2008 at 12:39 PM, Cymon Cox <cy at cymon.org> wrote:
> Hi All,
>
> Two DBSeq objects cannot be concatenated, although the DBSeq object inherits
> __add__ from Seq.

Interesting point - not something I'd considered (nor anyone else until now!)

> It tries to init a new DBSeq object rather than returning a Seq object as would be expected.
> ...
> Presumably, DBSeq needs to override Seq.__add__
> (Using CVS as of yesterday...)

Clearly we can't create a new DBSeq object (there wouldn't be any
suitable sequence in the database to point to), and returning a Seq
object is sensible.  We should probably continue this discussion on
the dev mailing list (CC'd).

Either we have the DBSeq override the __add__ method (and __radd__),
or we could make the base Seq class always use new Seq objects in
__add__ etc.  This would affect anyone writing their own Seq
subclass...

On balance, I think you're right and it's DBSeq which needs to be
changed.  Would you like to tackle this, or should I?  We'd also want
to extend the BioSQL unit test to cover adding DBSeq+DBSeq, DBSeq+Seq,
Seq+DBSeq, DBSeq+MutableSeq, MutableSeq+DBSeq, etc.

Peter

From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 09:48:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 09:48:14 -0500
Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError
	when opening empty fasta files
In-Reply-To: <bug-2652-42@http.bugzilla.open-bio.org/>
Message-ID: <200811111448.mABEmEba019180@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2652


------- Comment #2 from rjalves at igc.gulbenkian.pt  2008-11-11 09:48 EST -------
Hi Peter,

I am using the Biopython package from the debian-lenny repository (which is
1.45), I guess they haven't updated in part due to the change to NumPy. I
will check out the svn version then.

As for why I'm using Bio.Fasta, I'm not using it directly.
Bio.SeqUtils.CodonUsage.CodonAdaptationIndex.cai_for_gene() calls it.

Renato


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk  Tue Nov 11 09:53:32 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 11 Nov 2008 14:53:32 +0000
Subject: [Biopython-dev] [BioPython] Cannot __add__ two DBSeq objects
In-Reply-To: <320fb6e00811110502y624cf6c1r52c316d61a1f7228@mail.gmail.com>
References: <7265d4f0811110439h6c18e111te97d23070565cca2@mail.gmail.com>
	<320fb6e00811110502y624cf6c1r52c316d61a1f7228@mail.gmail.com>
Message-ID: <320fb6e00811110653u63e85bc6k572d5fa42ede8280@mail.gmail.com>

On Tue, Nov 11, 2008 at 1:02 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Tue, Nov 11, 2008 at 12:39 PM, Cymon Cox <cy at cymon.org> wrote:
>> Hi All,
>>
>> Two DBSeq objects cannot be concatenated, although the DBSeq object inherits
>> __add__ from Seq.
>
> Interesting point - not something I'd considered (nor anyone else until now!)
>
>> It tries to init a new DBSeq object rather than returning a Seq object as would be expected.
>> ...
>> Presumably, DBSeq needs to override Seq.__add__
>> (Using CVS as of yesterday...)
>
> Clearly we can't create a new DBSeq object (there wouldn't be any
> suitable sequence in the database to point to), and returning a Seq
> object is sensible.  We should probably continue this discussion on
> the dev mailing list (CC'd).

Fixed in CVS by implementing the __add__ and __radd__ methods in the
DBSeq object, and having these simply off load the work to the Seq
class.
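
The general pattern looks roughly like the sketch below.  BaseSeq and LazySeq
are hypothetical stand-ins for Seq and DBSeq, and the assumption that the base
__add__ builds its result via self.__class__ is mine; see the revisions listed
below for the real change.

```python
class BaseSeq:
    """Hypothetical plain sequence holding its data in memory."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        return self.data

    def __add__(self, other):
        # Assumed behaviour: build the result with the caller's own class.
        # That breaks for subclasses whose constructor takes other arguments.
        return self.__class__(str(self) + str(other))

    def __radd__(self, other):
        return self.__class__(str(other) + str(self))


class LazySeq(BaseSeq):
    """Hypothetical database-backed sequence with a different constructor."""

    def __init__(self, fetch, length):
        self._fetch = fetch    # callable returning the text for [start:end]
        self._length = length

    def __str__(self):
        return self._fetch(0, self._length)

    def __add__(self, other):
        # Off-load to a plain BaseSeq so the result needs no database row.
        return BaseSeq(str(self)) + other

    def __radd__(self, other):
        return other + BaseSeq(str(self))


a = LazySeq(lambda start, end: "ACGT"[start:end], 4)
b = LazySeq(lambda start, end: "TTGG"[start:end], 4)
print(type(a + b).__name__, str(a + b))
```

Every combination then yields a plain in-memory object, whichever operand is
the database-backed one.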

See:
BioSQL/BioSeq.py revision: 1.28
Tests/test_BioSQL.py revision: 1.26
Tests/output/test_BioSQL revision: 1.2

Peter

From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 10:28:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 10:28:20 -0500
Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError
	when opening empty fasta files
In-Reply-To: <bug-2652-42@http.bugzilla.open-bio.org/>
Message-ID: <200811111528.mABFSK8A022517@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2652


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-11 10:28 EST -------
(In reply to comment #2)
> I am using the Biopython package from the debian-lenny repository (which is
> 1.45), I guess they haven't updated in part due to the change to the Numpy. I
> will checkout the svn version then.

Debian sid is using Biopython 1.47, I think lenny is just very conservative.

If you don't mind installing NumPy and trying to install Biopython from source,
then you could either try getting the latest Biopython code from CVS, or try
Biopython 1.49 beta which was released just a few days ago.  Ask on the mailing
list if you get stuck.

> As for why I'm using Bio.Fasta, I'm not using it directly.
> Bio.SeqUtils.CodonUsage.CodonAdaptationIndex.cai_for_gene() calls it.

Oh - thanks for that.  I've just updated Bio/SeqUtils/CodonUsage.py to use
Bio.SeqIO instead of Bio.Fasta (plus added a basic check of this module to our
unit tests).

Peter

[Leaving this bug as resolved fixed]


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 10:43:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 10:43:05 -0500
Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError
	when opening empty fasta files
In-Reply-To: <bug-2652-42@http.bugzilla.open-bio.org/>
Message-ID: <200811111543.mABFh5x8023530@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2652


------- Comment #4 from rjalves at igc.gulbenkian.pt  2008-11-11 10:43 EST -------
Thanks, Biopython 1.49b installed without any problems.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 10:43:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 10:43:15 -0500
Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError
	when opening empty fasta files
In-Reply-To: <bug-2652-42@http.bugzilla.open-bio.org/>
Message-ID: <200811111543.mABFhFBp023551@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2652


rjalves at igc.gulbenkian.pt changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 10:46:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 10:46:13 -0500
Subject: [Biopython-dev] [Bug 2653] New: Bio.SeqUtils.CodonUsage is not
	translation table aware
Message-ID: <bug-2653-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2653

           Summary: Bio.SeqUtils.CodonUsage is not translation table aware
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


Looking at Bio/SeqUtils/CodonUsage.py there is a hard coded dictionary
SynonymousCodons, presumably for the standard genetic code.

Ideally Bio.SeqUtils.CodonUsage should support any of the genetic code tables
defined in Bio.Data.CodonTable, perhaps via an optional initiation argument to
the CodonAdaptationIndex object.
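
As a sketch of how the synonymous codon groups could be derived from a codon
table rather than hard coded: the function below inverts a codon to amino acid
mapping.  The function name is hypothetical, and partial_table is only a
six-codon fragment of the standard code typed in for the example; a real
implementation would presumably take the full table from Bio.Data.CodonTable.

```python
from collections import defaultdict


def synonymous_codons(forward_table):
    """Group codons by the amino acid they encode."""
    groups = defaultdict(list)
    for codon, amino_acid in forward_table.items():
        groups[amino_acid].append(codon)
    # Sort each group so the result does not depend on dictionary order.
    return {aa: sorted(codons) for aa, codons in groups.items()}


# Illustrative fragment of the standard genetic code (not a complete table).
partial_table = {
    "TTT": "F", "TTC": "F",
    "AAA": "K", "AAG": "K",
    "ATG": "M", "TGG": "W",
}

print(synonymous_codons(partial_table))
```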


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 13:09:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 13:09:20 -0500
Subject: [Biopython-dev] [Bug 2653] Bio.SeqUtils.CodonUsage is not
	translation table aware
In-Reply-To: <bug-2653-42@http.bugzilla.open-bio.org/>
Message-ID: <200811111809.mABI9KXq004974@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2653


rjalves at igc.gulbenkian.pt changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rjalves at igc.gulbenkian.pt


------- Comment #1 from rjalves at igc.gulbenkian.pt  2008-11-11 13:09 EST -------
Thanks for the heads up Peter.

Also related to the reference codon table used... There is the possibility of a
codon being completely absent in all given sequences. In this case the
CodonAdaptationIndex.generate_index() function fails with a ZeroDivisionError
on line 90.
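
A minimal sketch of one way to guard against this, assuming the index is each
codon's count divided by the highest count among its synonymous codons.  The
function name and the choice to skip unseen amino acids are assumptions for
illustration, not the CodonUsage.py code or a recommendation from the resource
below.

```python
def relative_adaptiveness(counts, synonymous):
    """Map each codon to count / max count within its synonymous group."""
    index = {}
    for codons in synonymous.values():
        best = max(counts.get(codon, 0) for codon in codons)
        if best == 0:
            # No codon for this amino acid was observed at all; skip the
            # group rather than divide by zero.
            continue
        for codon in codons:
            index[codon] = counts.get(codon, 0) / best
    return index


counts = {"TTT": 3, "TTC": 1}  # no lysine codons observed
synonymous = {"F": ["TTT", "TTC"], "K": ["AAA", "AAG"]}
print(relative_adaptiveness(counts, synonymous))
```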

The resource at http://phenotype.biosci.umbc.edu/index.php?page=What_is_CAI
might give some good indications on how to work around this and also other
(improved?) implementations of CAI.

Obviously if you use a different SynonymousCodons table the picture may change.

Renato.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Wed Nov 12 06:14:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 12 Nov 2008 06:14:27 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811121114.mACBER3k002184@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #15 from dalloliogm at gmail.com  2008-11-12 06:14 EST -------
(In reply to comment #13)
> (From update of attachment 1033 [details])
> Something similar was checked into CVS.
> 

I saw the changes now!
ok.. But I would prefer to put the doctest in the main __doc__ of the function
instead of __init__ and __repr__.
This is because otherwise they wouldn't be accessible by the users with the
help function.
Usually you do help(SeqRecord), not help(SeqRecord.__init__).


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Wed Nov 12 06:47:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 12 Nov 2008 06:47:25 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811121147.mACBlP4T005886@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-12 06:47 EST -------
(In reply to comment #15)
> I saw the changes now!

The CVS website is updated once an hour, you can track this on
http://biopython.org/wiki/Tracking_CVS_commits which displays the RSS feed,
http://biopython.open-bio.org/CVS2RSS/biopython.rss (this works great apart
from the links when more than one file is changed).

> ok.. But I would prefer to put the doctest in the main __doc__ of
> the function instead of __init__ and __repr__.
> This is because otherwise they wouldn't be accessible by the users with the
> help function.  Usually you do help(SeqRecord), not help(SeqRecord.__init__).

If you do help(object) it shows you the main docstring followed by all the
methods and their docstrings (including __init__).

On the other hand all the special methods like __init__, __str__, __repr__ etc
are going to be confusing for a beginner.

On balance, a short example in the main docstring (covering __init__) does seem
sensible, and perhaps the __init__ example is then redundant.
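
Something along these lines, as a sketch.  The Record class is a hypothetical
stand-in for SeqRecord, and the helper run_class_doctest is just one way to
run the example with the standard doctest module.

```python
import doctest


class Record:
    """Hypothetical record class with a short example in its main docstring.

    >>> rec = Record("ACGT", id="demo")
    >>> rec.id
    'demo'
    >>> len(rec)
    4
    """

    def __init__(self, seq, id="<unknown id>"):
        self.seq = seq
        self.id = id

    def __len__(self):
        return len(self.seq)


def run_class_doctest(cls):
    """Run only the examples found in the class docstring."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(cls.__doc__, {cls.__name__: cls}, cls.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)  # named tuple with .failed and .attempted


print(run_class_doctest(Record))
```

The example is then the first thing shown by help(Record), and it is still
checked by the doctest machinery.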

Does anyone else want to comment?


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From cymon.cox at googlemail.com  Wed Nov 12 05:57:12 2008
From: cymon.cox at googlemail.com (Cymon Cox)
Date: Wed, 12 Nov 2008 10:57:12 +0000
Subject: [Biopython-dev] BioSQL buglets
Message-ID: <7265d4f0811120257y241f67fl514b77cb03712552@mail.gmail.com>

All,

Selects on the seqfeature_qualifier_value and dbxref tables were not being
ordered by rank. This caused multiple qualifier values to be out of order
which in turn caused the tests to fail - see comment in
http://bugzilla.open-bio.org/show_bug.cgi?id=2616
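
To show why the ORDER BY matters, here is a small sketch using the standard
sqlite3 module.  The single simplified table and its column names are
assumptions made for the example; it is not the BioSQL schema or the patched
query.

```python
import sqlite3


def ordered_values(conn, feature_id):
    """Fetch qualifier values for one feature in rank order."""
    rows = conn.execute(
        'SELECT value FROM qualifier_value WHERE feature_id = ? ORDER BY "rank"',
        (feature_id,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE qualifier_value (feature_id INTEGER, "rank" INTEGER, value TEXT)'
)
# Rows are deliberately inserted out of rank order.
conn.executemany(
    "INSERT INTO qualifier_value VALUES (?, ?, ?)",
    [(1, 2, "second"), (1, 1, "first"), (1, 3, "third")],
)
print(ordered_values(conn, 1))
```

Without an ORDER BY clause SQL makes no promise about row order, so a test
that compares lists can pass on one database backend and fail on another.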

This also solves a TODO in the test_BioSQL_SeqIO.py:

 85 +#TODO - Pin down the "Duplicate entry" IntegrityError from this:
 86 +#    ("genbank",False, 'GenBank/cor6_6.gb', 6),

This test now works and I've generated new output.

In test_BioSQL.py create_database(), postgres returns an error string that
'find's on index 0 when the database doesn't exist. The comparison
therefore needs to be >= 0 rather than >0.
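
For anyone unsure why, a tiny sketch of the find() behaviour.  The function
names are hypothetical, and error_text is a made-up message rather than real
PostgreSQL output.

```python
def contains_buggy(text, phrase):
    # Misses a match at the very start, because find() returns 0 there.
    return text.find(phrase) > 0


def contains_fixed(text, phrase):
    # find() returns -1 only when the phrase is absent.
    return text.find(phrase) >= 0


error_text = "no such database"  # hypothetical message for illustration
print(contains_buggy(error_text, "no such"))
print(contains_fixed(error_text, "no such"))
print("no such" in error_text)  # the "in" operator avoids the index entirely
```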

All tests now pass OK with postgresql/psycopg2.
Patch attached.

Cheers, C.
--
-------------- next part --------------
A non-text attachment was scrubbed...
Name: biosql.patch
Type: text/x-patch
Size: 5105 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20081112/ba4e35b3/attachment.bin>

From bugzilla-daemon at portal.open-bio.org  Wed Nov 12 08:12:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 12 Nov 2008 08:12:24 -0500
Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2
In-Reply-To: <bug-2616-42@http.bugzilla.open-bio.org/>
Message-ID: <200811121312.mACDCOdj011669@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2616


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-12 08:12 EST -------
(In reply to comment #10)
> 
> We still need to sort out the feature qualifiers loss of ordering...
> 

Fixed in CVS with another patch from Cymon (via the mailing list).



From biopython at maubp.freeserve.co.uk  Wed Nov 12 08:13:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 12 Nov 2008 13:13:16 +0000
Subject: [Biopython-dev] BioSQL buglets
In-Reply-To: <7265d4f0811120257y241f67fl514b77cb03712552@mail.gmail.com>
References: <7265d4f0811120257y241f67fl514b77cb03712552@mail.gmail.com>
Message-ID: <320fb6e00811120513p3be878b8pe0c5a48fa3945ff5@mail.gmail.com>

On Wed, Nov 12, 2008 at 10:57 AM, Cymon Cox <cymon.cox at googlemail.com> wrote:
> All,
>
> Selects on the seqfeature_qualifier_value and dbxref tables were not being
> ordered by rank. This caused multiple qualifier values to be out of order
> which in turn caused the tests to fail - see comment in
> http://bugzilla.open-bio.org/show_bug.cgi?id=2616
>
> This also solves a TODO in the test_BioSQL_SeqIO.py:
>
>  85 +#TODO - Pin down the "Duplicate entry" IntegrityError from this:
>  86 +#    ("genbank",False, 'GenBank/cor6_6.gb', 6),
>
> This test now works and I've generated new output.
>
> In test_BioSQL.py create_database(), postgres returns an error string that
> 'find's on index 0 when the database doesn't exist. The comparison
> therefore needs to be >= 0 rather than > 0.
>
> All tests now pass OK with postgresql/psycopg2.
> Patch attached.
>
> Cheers, C.

Excellent - that patch made perfect sense and I've checked it in
(almost as is - I tweaked the find index bit slightly).  Thank you!

At this rate you'll be co-opted as an official maintainer for the
BioSQL module ;)

Peter

P.S. It might have been better to upload the patch to Bug 2616 (or a
new Bug) rather than sending it to everyone on the mailing list.

From bugzilla-daemon at portal.open-bio.org  Wed Nov 12 10:35:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 12 Nov 2008 10:35:54 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811121535.mACFZsMl021458@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #17 from dalloliogm at gmail.com  2008-11-12 10:35 EST -------
(In reply to comment #16)
> (In reply to comment #15)
> > I saw the changes now!
> 
> The CVS website is updated once an hour; you can track this on
> http://biopython.org/wiki/Tracking_CVS_commits which displays the RSS feed,
> http://biopython.open-bio.org/CVS2RSS/biopython.rss (this works great apart
> from the links when more than one file is changed).
> 
> > ok.. But I would prefer to put the doctest in the main __doc__ of
> > the function instead of __init__ and __repr__.
> > This is because otherwise they wouldn't be accessible by the users with the
> > help function.  Usually you do help(SeqRecord), not help(SeqRecord.__init__).
> 
> If you do help(object) it shows you the main docstring followed by all the
> methods and their docstrings (including __init__).
> 
> On the other hand all the special methods like __init__, __str__, __repr__ etc
> are going to be confusing for a beginner.
> 
> On balance, a short example in the main docstring (covering __init__) does seem
> sensible, and perhaps the __init__ example is then redundant.

Well, I was saying that maybe it would be better to move the doctests from
__init__ and __repr__ to the main __doc__ of the module, so that they are
visible to people using help(module).
Moreover, you can test __repr__ and __init__ from there, without having to
repeat the 'from Bio.Align.Generic import Alignment' stuff and similar every
time.


as for a few comments you added in Bio.Align.Generic:

> #A doctest for __repr__ would be nice, but __class__ comes out differently
> #if run via the __main__ trick.

maybe you can use the '+ELLIPSIS' directive 

and about this comment:
#A doctest would be nice, but the <BLANKLINE> stuff is very ugly!
#The "tab" format is possible, but tabs don't seem to work nicely in doctests.

you could use the directive NORMALIZE_WHITESPACE in a similar way.
I am attaching a file just to give you an example of how it could be with
+ELLIPSIS
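
A rough sketch of how the two directives behave. The Alignment class
here is a small stand-in written for this example, not
Bio.Align.Generic.Alignment, and run_examples() is just a helper name
chosen for the sketch.

```python
import doctest


class Alignment(object):
    """Stand-in class used only to demonstrate the doctest directives.

    The module prefix in the repr depends on how the code is run, so it
    is elided with "..." under the ELLIPSIS directive:

    >>> Alignment(3, 10)  # doctest: +ELLIPSIS
    <...Alignment object (3 rows, 10 columns)>

    A tab in the output compares equal to a single space under
    NORMALIZE_WHITESPACE:

    >>> print("id1\\tACGT")  # doctest: +NORMALIZE_WHITESPACE
    id1 ACGT
    """

    def __init__(self, rows, columns):
        self.rows = rows
        self.columns = columns

    def __repr__(self):
        cls = self.__class__
        return "<%s.%s object (%d rows, %d columns)>" % (
            cls.__module__, cls.__name__, self.rows, self.columns)


def run_examples():
    """Run the examples in the class docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(Alignment.__doc__, {"Alignment": Alignment},
                              "Alignment", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)
```

With this the doctest passes whether the class is reported as
__main__.Alignment or under its full module path.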


> Does anyone else want to comment?
> 



From bugzilla-daemon at portal.open-bio.org  Wed Nov 12 10:36:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 12 Nov 2008 10:36:37 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811121536.mACFabdk021517@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #18 from dalloliogm at gmail.com  2008-11-12 10:36 EST -------
Created an attachment (id=1052)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1052&action=view)
example of ellipsis directive

Example of doctest with ellipsis directive to test Alignment.__repr__



From dalloliogm at gmail.com  Wed Nov 12 11:25:47 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Wed, 12 Nov 2008 17:25:47 +0100
Subject: [Biopython-dev] a sequence set object in biopython?
Message-ID: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com>

Hi,
I think it could be useful to add a generic SequenceSet object in biopython.
Such an object would represent a generic set of sequences, and could
have some useful methods like .format('fasta') or
.align('alignment_tool').
Is there something similar available already?
I have noticed that the current Generic.Alignment is very similar to
such an object. However, it would be better to be able to work with a
separate class, because sometimes you want to deal with sequences
that are not aligned.

Some use cases:
- a set of sequences that represents all introns in a particular gene,
on which I want to calculate the conservation of the splicing
regulatory sites.
- all gene sequences in an organism, which I want to convert to EMBL format
- a set of seqs to be aligned or used as input for other tools
etc..
-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it

From bugzilla-daemon at portal.open-bio.org  Wed Nov 12 11:29:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 12 Nov 2008 11:29:07 -0500
Subject: [Biopython-dev] [Bug 2552] Adding alignments
In-Reply-To: <bug-2552-42@http.bugzilla.open-bio.org/>
Message-ID: <200811121629.mACGT7gs025634@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2552


cymon.cox at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |cymon.cox at gmail.com


------- Comment #1 from cymon.cox at gmail.com  2008-11-12 11:29 EST -------
(In reply to comment #0)
> This is related to the very broad alignment bug 1944.
> 
> Given two alignments, it can make sense to talk about adding them together.

Actually, this is a very common procedure in phylogenetic analyses, where
multiple genes/loci are combined into a "super" matrix for a set of taxa.
Although, in this case, adding by column, if a taxon/row/identifier was missing
in a particular (sub-)alignment it would be filled by "-" (missing data) in the
combined matrix.

Anyway, I think this would be a very useful enhancement.
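
As a rough sketch of the column-wise case described above: each
alignment is represented here as a plain dict mapping an identifier to
its aligned string. That representation and the function name are
assumptions for the example, not the Biopython alignment API.

```python
def concatenate_by_column(alignments, missing="-"):
    """Join alignments column-wise, padding absent rows with `missing`."""
    # Collect identifiers in first-seen order across all alignments.
    all_ids = []
    for aln in alignments:
        for seq_id in aln:
            if seq_id not in all_ids:
                all_ids.append(seq_id)

    combined = dict((seq_id, "") for seq_id in all_ids)
    for aln in alignments:
        lengths = set(len(seq) for seq in aln.values())
        if len(lengths) != 1:
            raise ValueError("each alignment needs rows of one common length")
        width = lengths.pop()
        for seq_id in all_ids:
            # A taxon missing from this sub-alignment gets a run of gaps.
            combined[seq_id] += aln.get(seq_id, missing * width)
    return combined


gene1 = {"taxonA": "ACGT", "taxonB": "AC-T"}
gene2 = {"taxonA": "GG", "taxonC": "GA"}
combined = concatenate_by_column([gene1, gene2])
```

Here taxonB is padded for the second gene and taxonC for the first, so
every row in the combined matrix has six columns.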



From biopython at maubp.freeserve.co.uk  Wed Nov 12 12:53:35 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 12 Nov 2008 17:53:35 +0000
Subject: [Biopython-dev] [BioPython] a sequence set object in biopython?
In-Reply-To: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com>
References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com>
Message-ID: <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com>

On Wed, Nov 12, 2008 at 4:25 PM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
> Hi,
> I think it could be useful to add a generic SequenceSet object in biopython.
> Such an object would represent a generic set of sequences, and could
> have some useful methods like .format('fasta') or
> .align('alignment_tool').
> Is there something similar available already?

Given your example to turn the SequenceSet into a FASTA file, then
clearly you are thinking of a collection of SeqRecord objects rather
than just Seq objects.  For this kind of thing I personally just use a
list of SeqRecord objects.

If I want to turn a list of SeqRecord objects into a FASTA file, I can
pass the list to the Bio.SeqIO.write() function.  Once I've made a
FASTA file, I can call an external tool to align them - and then load
them in again using Bio.AlignIO or Bio.SeqIO depending on what I plan
to do next.

> I have noticed that the actual Generic.Alignment is very similar to
> such an object. However, it would be better to be able to work with a
> separated class, because sometimes you want to deal with sequences
> that are not aligned.

Yes, the generic alignment is basically a list of SeqRecord objects
plus some extra functionality like column access.

> Some use cases:
> - a set of sequences that represents all introns in a particular gene,
> on which I want to calculate the conservation of the splicing
> regulatory sites.
> - all genes sequences in an organisms, which I want to convert in EMBL format
> - a set of seqs to be aligned or used as input for other tools
> etc..

All sensible use cases - but all seem to be covered by a simple python
list of SeqRecord objects, or in some cases a list of Seq objects
(e.g. the introns example, as I doubt the introns have names).

Peter

From tiagoantao at gmail.com  Wed Nov 12 13:02:11 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 12 Nov 2008 18:02:11 +0000
Subject: [Biopython-dev] PopGen status and new developments
Message-ID: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com>

Hi,

This is an email with the status of current PopGen developments. In some
points, advice is especially welcome.


A. Platform support

As Peter noticed there is no Simcoal for the Mac. In a couple of weeks
I hope to have access to a Mac in order to try to compile it. In any
case I won't be able to distribute it without getting permission from
the authors, so the problem might remain...
I am now preparing support for LDNe, an application to estimate Ne
(effective population size) from LD. This application is DOS (Windows)
only. Source code is not available to the public (but the app is free
as in free beer). I've had access to the source and compiled a Linux
version; again, I don't know if the author will let me distribute it.
Question: How do people feel about supporting an application like
this? Any strong feelings against?


B. New developments

1. The above LDNe module is fully coded, and being tested by a few
people (not just me). Test code and documentation TBD but easy.
2. Genepop application support (no confusion with file format support,
which is done). Partially done and informally tested. Plan to start
with just partial support.
3. Fstat parser. Coded.


C. Statistics

An interesting ongoing discussion started on statistics. I am behind
on writing a proposal to handle statistical processing (my bad, but I
will have some free time in the next couple of weeks and I will try to
catch up). My existing code on the subject is available on
Github (by Giovanni), but I think it will need some changes (not in the
functionality, but in the architecture).

From biopython at maubp.freeserve.co.uk  Wed Nov 12 13:06:19 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 12 Nov 2008 18:06:19 +0000
Subject: [Biopython-dev] [BioPython] a sequence set object in biopython?
In-Reply-To: <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com>
References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com>
	<320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com>
Message-ID: <320fb6e00811121006mbe32efar2fca638d1a5fe2ef@mail.gmail.com>

On Wed, Nov 12, 2008 at 5:53 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Wed, Nov 12, 2008 at 4:25 PM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>> Hi,
>> I think it could be useful to add a generic SequenceSet object in biopython.
>> Such an object would represent a generic set of sequences, and could
>> have some useful methods like .format('fasta') or
>> .align('alignment_tool').
>> Is there something similar available already?
>
> Given your example to turn the SequenceSet into a FASTA file, then
> clearly you are thinking of a collection of SeqRecord objects rather
> than just Seq objects.  For this kind of thing I personally just use a
> list of SeqRecord objects.
>
> If I want to turn a list of SeqRecord objects into a FASTA file, I can
> pass the list to the Bio.SeqIO.write() function.  Once I've made a
> FASTA file, I can call an external tool to align them - and then load
> them in again using Bio.AlignIO or Bio.SeqIO depending on what I plan
> to do next.

If you really want a list like object with a format method in your
code, how about something like this:

class SeqRecordList(list) :
    """Subclass of the python list, to hold SeqRecord objects only."""
    #TODO - Override the list methods to make sure all the items
    #are indeed SeqRecord objects

    def format(self, format) :
        """Returns a string of all the records in a requested file format.

        The argument format should be any file format supported by
        the Bio.SeqIO.write() function.  This must be a lower case string.
        """
        from Bio import SeqIO
        from StringIO import StringIO
        handle = StringIO()
        SeqIO.write(self, handle, format)
        handle.seek(0)
        return handle.read()

if __name__ == "__main__" :
    print "Loading records..."
    from Bio import SeqIO
    my_list = SeqRecordList(SeqIO.parse(open("ls_orchid.gbk"),"genbank"))
    print len(my_list)
    for format in ["fasta","tab"] :
        print
        print format
        print "="*len(format)
        print my_list.format(format)
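
One possible way to handle the TODO above is to check items as they go
into the list. This is only a sketch: Record is a stand-in so the
example runs without Biopython (in real code item_type would be
Bio.SeqRecord.SeqRecord), and the class name TypedList is made up.

```python
class Record(object):
    """Stand-in for a SeqRecord, holding just an id and a sequence."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


class TypedList(list):
    """List subclass that only accepts instances of item_type."""

    item_type = Record  # assumption: swap in SeqRecord for real use

    def __init__(self, items=()):
        list.__init__(self)
        self.extend(items)

    def _check(self, item):
        if not isinstance(item, self.item_type):
            raise TypeError("expected %s, got %r"
                            % (self.item_type.__name__, item))
        return item

    def append(self, item):
        list.append(self, self._check(item))

    def extend(self, items):
        # Validate everything first so a bad item leaves the list unchanged.
        list.extend(self, [self._check(item) for item in items])

    def insert(self, index, item):
        list.insert(self, index, self._check(item))

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            value = [self._check(item) for item in value]
        else:
            self._check(value)
        list.__setitem__(self, index, value)


records = TypedList([Record("seq1", "ACGT")])
records.append(Record("seq2", "GGCC"))
```

Operators such as + and += are not covered here, so a complete version
would need to override those as well.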


Peter

From biopython at maubp.freeserve.co.uk  Wed Nov 12 13:11:30 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 12 Nov 2008 18:11:30 +0000
Subject: [Biopython-dev] PopGen status and new developments
In-Reply-To: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com>
References: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com>
Message-ID: <320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com>

Tiago Antão wrote:
> A. Platform support
>
> As Peter noticed there is no Simcoal for the Mac. In a couple of weeks
> I hope to have access to a Mac in order to try to compile it. In any
> case I wont be able to distribute it without getting permission from
> the authors, so the problem might remain...
> I am now preparing support for LDNe, an application to estimate Ne
> (effective population size) from LD. This application is Dos(Windows)
> only. Source code is not available to the public (but the app is free
> as free beer). I've had access to the source and compiled a Linux
> version, again, I don't know if the author will let me distribute it.
> Question: How do people feel about supporting an application like
> this? Any strong feelings against?

Assuming the tools are useful, then I have no objection to including
command line wrappers for them in Biopython.

I'm not 100% sure what you meant by "supporting an application like
this", but if you are asking about supporting these cross-platform
ports of the actual command line tools, then I don't see that as
something Biopython should be doing.

Peter


From tiagoantao at gmail.com  Wed Nov 12 13:16:06 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 12 Nov 2008 18:16:06 +0000
Subject: [Biopython-dev] PopGen status and new developments
In-Reply-To: <320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com>
References: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com>
	<320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com>
Message-ID: <6d941f120811121016q17451c83u12b2233eba625944@mail.gmail.com>

On Wed, Nov 12, 2008 at 6:11 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> I'm not 100% sure what you meant by "supporting an application like
> this", but if you are asking about supporting these cross-platform
> ports of the actual command line tools, then I don't see that as
> something Biopython should be doing.


Sorry, I was not clear: I was just asking about supporting
applications that don't have the source available and that don't
support all common platforms (the case of LDNe).

From dalloliogm at gmail.com  Wed Nov 12 13:17:48 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Wed, 12 Nov 2008 19:17:48 +0100
Subject: [Biopython-dev] [BioPython] a sequence set object in biopython?
In-Reply-To: <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com>
References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com>
	<320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com>
Message-ID: <5aa3b3570811121017u72eb7552v94275368cb23cf48@mail.gmail.com>

On Wed, Nov 12, 2008 at 6:53 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Wed, Nov 12, 2008 at 4:25 PM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>> Hi,
>> I think it could be useful to add a generic SequenceSet object in biopython.
>> Such an object would represent a generic set of sequences, and could
>> have some useful methods like .format('fasta') or
>> .align('alignment_tool').
>> Is there something similar available already?
>
> Given your example to turn the SequenceSet into a FASTA file, then
> clearly you are thinking of a collection of SeqRecord objects rather
> than just Seq objects.  For this kind of thing I personally just use a
> list of SeqRecord objects.
>
> If I want to turn a list of SeqRecord objects into a FASTA file, I can
> pass the list to the Bio.SeqIO.write() function.  Once I've made a
> FASTA file, I can call an external tool to align them - and then load
> them in again using Bio.AlignIO or Bio.SeqIO depending on what I plan
> to do next.
>
>> Some use cases:
>> - a set of sequences that represents all introns in a particular gene,
>> on which I want to calculate the conservation of the splicing
>> regulatory sites.
>> - all genes sequences in an organisms, which I want to convert in EMBL format
>> - a set of seqs to be aligned or used as input for other tools
>> etc..
>
> All sensible use cases - but all seem to be covered by a simple python
> list of SeqRecord objects, or in some cases a list of Seq objects
> (e.g. the introns example, as I doubt the introns have names).
>

Not always.
For example, if I have a set of genes in an organism, sometimes I
would need to access only some of them, by their id; so, a
__getitem__ method to make it work as a dictionary could also be
useful.
The fact is that I think that such an object would be so widely used,
that maybe it would be useful to implement it in biopython.
What I would do, honestly, is to create a GenericSeqRecordSet class
from which to derive Alignment, specifying that in an alignment all
the sequences should have the same length. It would not require much
work and it would change the interface.


very tiny little minusculus P.S. If you need help to implement such a
thing or anything else, I can volunteer :).

> Peter
>


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it

From dalloliogm at gmail.com  Wed Nov 12 13:19:50 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Wed, 12 Nov 2008 19:19:50 +0100
Subject: [Biopython-dev] PopGen status and new developments
In-Reply-To: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com>
References: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com>
Message-ID: <5aa3b3570811121019k3a0710f1n2add599ce0b4f56a@mail.gmail.com>

On Wed, Nov 12, 2008 at 7:02 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
> Hi,
>
> This an email with the status of current PopGen developments. In some
> points, advice is especially welcome.

Hi Tiago!!
Have you seen (I thought it wasn't directly related to PopGen so I
didn't tell you directly) this parser for fastPhaseOutput?
- http://bugzilla.open-bio.org/show_bug.cgi?id=2643

>
>
> A. Platform support
>
> As Peter noticed there is no Simcoal for the Mac. In a couple of weeks
> I hope to have access to a Mac in order to try to compile it. In any
> case I wont be able to distribute it without getting permission from
> the authors, so the problem might remain...
> I am now preparing support for LDNe, an application to estimate Ne
> (effective population size) from LD. This application is Dos(Windows)
> only. Source code is not available to the public (but the app is free
> as free beer). I've had access to the source and compiled a Linux
> version, again, I don't know if the author will let me distribute it.
> Question: How do people feel about supporting an application like
> this? Any strong feelings against?
>
>
> B. New developments
>
> 1. The above LDNe module is fully coded, and being tested by a few
> people (not just me). Test code and documentation TBD but easy.
> 2. Genepop application support (no confusion with file format support,
> which is done). Partially done and informally tested. Plan to start
> with just partial support.
> 3. Fstat parser. Coded.
>
>
> C. Statistics
>
> An ongoing interesting discussion started on statistics. I am delayed
> with doing a proposal to handle statistical processing (my bad, but I
> will have some free time in the next couple of weeks and I will try to
> recover). My current existing code on the subject is available on
> Github (by Giovanni), but I think it will need some change (not in the
> functionality, but in the architecture).
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it


From biopython at maubp.freeserve.co.uk  Wed Nov 12 13:36:11 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 12 Nov 2008 18:36:11 +0000
Subject: [Biopython-dev] [BioPython] a sequence set object in biopython?
In-Reply-To: <5aa3b3570811121017u72eb7552v94275368cb23cf48@mail.gmail.com>
References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com>
	<320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com>
	<5aa3b3570811121017u72eb7552v94275368cb23cf48@mail.gmail.com>
Message-ID: <320fb6e00811121036w17e0d2acv6723c751350f1893@mail.gmail.com>

Giovanni Marco Dall'Olio wrote:
>> All sensible use cases - but all seem to be covered by a simple python
>> list of SeqRecord objects, or in some cases a list of Seq objects
>> (e.g. the introns example, as I doubt the introns have names).
>
> Not always.
> For example, if I have a set of genes in an organism, sometimes I
> would need to access only some of them, by their id; so, a
> __getitem__ method to make it work as a dictionary could also be
> useful.

OK, then use a dict of SeqRecords for this, as shown in the tutorial
chapter for Bio.SeqIO and the wiki.  We even have a helper function
Bio.SeqIO.to_dict() to do this and check for duplicate keys.

If you need an order preserving dictionary, there are examples of this
on the net and there is even PEP372 for adding this to python itself:
http://www.python.org/dev/peps/pep-0372/
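
As a sketch of the idea, here is an id-keyed, order-preserving index
that rejects duplicate ids. It uses plain (id, sequence) tuples so it
runs without Biopython; it is not the Bio.SeqIO.to_dict() code, and the
function name is made up. collections.OrderedDict is what PEP 372
eventually added to the standard library.

```python
from collections import OrderedDict


def records_by_id(records):
    """Index (id, sequence) pairs by id, keeping the input order."""
    index = OrderedDict()
    for seq_id, seq in records:
        if seq_id in index:
            # Silently overwriting would hide a problem in the input.
            raise ValueError("Duplicate key %r" % seq_id)
        index[seq_id] = seq
    return index


genes = [("geneB", "ATGC"), ("geneA", "GGCC"), ("geneC", "TTAA")]
by_id = records_by_id(genes)
wanted = [by_id[seq_id] for seq_id in ("geneA", "geneC")]
```

Lookup by id is then ordinary dictionary access, and iterating over
by_id gives the ids back in the order they were loaded.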

> The fact is that I think that such an object would be so widely used,
> that maybe it would be useful to implement it in biopython.
> What I would do, honestly, is to create a GenericSeqRecordSet class
> from which to derive Alignment, specifying that in an alignment all
> the sequences should have the same length. It would not require much
> work and it would change the interface.

I agree that IF we added some sort of "GenericSeqRecordSet class", it
might be sensible for the alignment objects to subclass it -
especially if you want it to behave like a python list primarily.
Note that in python sets are not order preserving.
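
For the sake of discussion, the subclassing idea might look roughly
like this. Both classes are hypothetical sketches built on a python
list of (id, sequence) pairs; the Alignment here is not the existing
Bio.Align.Generic.Alignment and only checks append().

```python
class GenericSeqRecordSet(list):
    """Hypothetical base class: an ordered collection of (id, sequence) pairs."""

    def ids(self):
        return [seq_id for seq_id, _ in self]


class Alignment(GenericSeqRecordSet):
    """Adds the rule that every row has the same length, plus column access."""

    def append(self, record):
        _, seq = record
        if self and len(seq) != len(self[0][1]):
            raise ValueError("sequences in an alignment must be the same length")
        list.append(self, record)

    def column(self, index):
        return "".join(seq[index] for _, seq in self)


unaligned = GenericSeqRecordSet([("intron1", "GTAAGT"), ("intron2", "GTA")])

aligned = Alignment()
aligned.append(("taxonA", "ACGT"))
aligned.append(("taxonB", "AC-T"))
third_column = aligned.column(2)
```

Being a list, the collection keeps insertion order, which a python set
would not.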

> very tiny little minusculus p.s. if you need help for implement such a
> thing or anything else I can volounteer :).

That's good to hear :)

However, we'd have to establish the need for this new object first -
but so far we've only had two people's view so it's too early to form a
consensus.  I don't see a strong reason for adding yet another object,
when the core language provides lists, sets and dict which seem to be
enough.

Peter

From jflatow at gmail.com  Wed Nov 12 13:52:35 2008
From: jflatow at gmail.com (Jared Flatow)
Date: Wed, 12 Nov 2008 12:52:35 -0600
Subject: [Biopython-dev] [BioPython] a sequence set object in biopython?
In-Reply-To: <320fb6e00811121036w17e0d2acv6723c751350f1893@mail.gmail.com>
References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com>
	<320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com>
	<5aa3b3570811121017u72eb7552v94275368cb23cf48@mail.gmail.com>
	<320fb6e00811121036w17e0d2acv6723c751350f1893@mail.gmail.com>
Message-ID: <ACD9FBEC-07B9-43D3-BAA6-CA538F6DC43C@gmail.com>

On Nov 12, 2008, at 12:36 PM, Peter wrote:

> However, we'd have to establish the need for this new object first -
> but so far we've only had two people's view so its too early to form a
> consensus.  I don't see a strong reason for adding yet another object,
> when the core language provides lists, sets and dict which seem to be
> enough.

I totally agree with you Peter, that's what the basic container types  
are for. If someone wants to create a subclass of these containers for  
a specific purpose it is simple enough to do. IMO it's kind of silly to  
try and make sequence specific containers that satisfy everyone's needs.

jared

From bsouthey at gmail.com  Wed Nov 12 13:58:05 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Wed, 12 Nov 2008 12:58:05 -0600
Subject: [Biopython-dev] PopGen status and new developments
In-Reply-To: <320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com>
References: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com>
	<320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com>
Message-ID: <491B273D.9020404@gmail.com>

Peter wrote:
> Tiago Antão wrote:
>   
>> A. Platform support
>>
>> As Peter noticed there is no Simcoal for the Mac. In a couple of weeks
>> I hope to have access to a Mac in order to try to compile it. In any
>> case I wont be able to distribute it without getting permission from
>> the authors, so the problem might remain...
>> I am now preparing support for LDNe, an application to estimate Ne
>> (effective population size) from LD. This application is Dos(Windows)
>> only. Source code is not available to the public (but the app is free
>> as free beer). I've had access to the source and compiled a Linux
>> version, again, I don't know if the author will let me distribute it.
>> Question: How do people feel about supporting an application like
>> this? Any strong feelings against?
>>     
>
> Assuming the tools are useful, then I have no objection to including
> command line wrappers for them in Biopython.
>
> I'm not 100% sure what you meant by "supporting an application like
> this", but if you are asking about supporting these cross-platform
> ports of the actual command line tools, then I don't see that as
> something Biopython should be doing.
>
> Peter
>
Hi,
I do have concerns about usefulness with regards to Biopython.

How widespread is the application?
What platforms is it released for (DOS only, or some version of Windows 
such as XP, Vista or Windows 7)?
Is the application well supported and will it continue to be supported?
Under what terms is the application 'free'?
How does this integrate into your ideas for Popgen?
Would it work like say clustalw where you output something from 
Biopython, run the application and perhaps import something back into 
Biopython?

If the application requires major data formatting then you would have to 
determine if it is easier to support the application or integrate it 
into Biopython. Obviously, the latter requires a clean room 
implementation of the application or the essential algorithm. Also, you 
can only provide the specification and cannot be involved in the actual 
implementation.

Bruce

From tiagoantao at gmail.com  Wed Nov 12 15:09:31 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 12 Nov 2008 20:09:31 +0000
Subject: [Biopython-dev] PopGen status and new developments
In-Reply-To: <491B273D.9020404@gmail.com>
References: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com>
	<320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com>
	<491B273D.9020404@gmail.com>
Message-ID: <6d941f120811121209n75dfb0cfh1fb4e57a98011ed0@mail.gmail.com>

Hi,

On Wed, Nov 12, 2008 at 6:58 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> I do have concerns about usefulness with regards to Biopython.

It is important to notice that having this application support has no
big impact on deployment of biopython. The only visible thing is some
tests reporting that the application doesn't exist. This is different
from adding a dependency on, say, scipy. I don't think that this
imposes any maintenance/installation hurdle at large. I think this is
actually a non-problem at the deployment stage, at least.
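
A rough sketch of how a test can report a missing external tool rather
than fail. The executable name is hypothetical and this is not the
actual Bio.PopGen test code; it only shows the general pattern with the
standard library.

```python
import shutil
import unittest

# Hypothetical executable name, used only for this illustration.
TOOL = "ldne-hypothetical-binary"


def tool_available(name):
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


class ExternalToolTest(unittest.TestCase):

    @unittest.skipUnless(tool_available(TOOL),
                         "%s not found on the PATH, skipping" % TOOL)
    def test_tool_is_callable(self):
        # Only reached when the tool is installed.
        self.assertTrue(tool_available(TOOL))
```

On a machine without the tool the test run shows one skipped test with
the reason given above, and everything else is unaffected.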

> How widespread is the application?

The application is fairly new (genepop, on the other hand is widely
used and old). I cannot answer that question. I know of some people
using it, but it is my small, biased, universe. I would guess that
currently the number is small.

Is there a policy to only support widespread applications?

> What platforms is it released under (DOS only or some version of windows
> version like XP or Vista or Windows 7)?

There is a DOS and Windows frontend. I actually asked the authors for
the code and they gave me access to it. I have compiled a Linux
version, but I don't know if they are going to make it available.

> Is the application well supported and will it continue to be supported?

Regarding current support, I can subjectively say that the authors
answer my queries rather fast. Regarding the future, I don't know.

> Under what terms is the application 'free'?

Much software available in this field is made available without any
regard for licensing issues. This is already the case for the
supported Fdist application (source available, no license).
This is a problem in the field, where people make things available
without much concern for licensing issues. Some people don't care that
much about that, they just "make things available".
So, if there is a policy to only support applications for which there
is a clear license, then this one is out (and some code has to be
removed from the current PopGen module, by the way). I never link the
code in, I just invoke it (these are mostly wrappers), so there should
be no legal issues in any case, I suspect.

There is a chicken and egg problem here that needs to be fought: In
population genetics there is no widespread tradition of making things
open (not because people want closed solutions, but mostly because
people don't think about these issues). There is also less of a coding
tradition than in other areas (people want ready-made solutions; the
coders are relatively few and mostly R based). As an example:
I don't know of many direct users of fdist code, but I know lots of
people who use applications built on top of that code.

By the way, Simcoal is GPL (and there are more examples of open code
in population genetics, of course).

> How does this integrate into your ideas for Popgen?

Very well. I have this stated philosophy, from the beginning, of using
existing applications and not reinvent the wheel. That being said, I
agree that a core statistic implementation should be done (even if
there are alternatives). But, mostly, for now, what is available in
Bio.PopGen are intelligent wrappers.

> Would it work like say clustalw where you output something from Biopython,
> run the application and perhaps import something back into Biopython?

Yep, it accepts genepop files and the output is fully parsed back.
This is still not the case, by the way, with simcoal where the output
is not usable (arlequin is needed to analyze the results). I need to
do an arlequin parser, that would solve the problem.

> If the application requires major data formatting then you would have to

It doesn't require any formatting at all as the de facto standard
format in the area (genepop) is supported and the results are parsed
back.

Tiago

From dalloliogm at gmail.com  Wed Nov 12 19:16:44 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Thu, 13 Nov 2008 01:16:44 +0100
Subject: [Biopython-dev] [BioPython] a sequence set object in biopython?
In-Reply-To: <320fb6e00811121036w17e0d2acv6723c751350f1893@mail.gmail.com>
References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com>
	<320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com>
	<5aa3b3570811121017u72eb7552v94275368cb23cf48@mail.gmail.com>
	<320fb6e00811121036w17e0d2acv6723c751350f1893@mail.gmail.com>
Message-ID: <5aa3b3570811121616u5f95cc8du9f0d91e4743f067f@mail.gmail.com>

On Wed, Nov 12, 2008 at 7:36 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Giovanni Marco Dall'Olio wrote:
>>> All sensible use cases - but all seem to be covered by a simple python
>>> list of SeqRecord objects, or in some cases a list of Seq objects
>>> (e.g. the introns example, as I doubt the introns have names).
>>
>> Not always.
>> For example, if I have a set of genes in an organism, sometimes I
>> would need to access only some of them, by their id; so, a
>> __getattribute__ method to make it work as a dictionary could also be
>> useful.
>
> OK, then use a dict of SeqRecords for this, as shown in the tutorial
> chapter for Bio.SeqIO and the wiki.  We even have a helper function
> Bio.SeqIO.to_dict() to do this and check for duplicate keys.

I would prefer a SeqRecordSet object with a to_dict method :)

> If you need an order preserving dictionary, there are examples of this
> on the net and there is even PEP372 for adding this to python itself:
> http://www.python.org/dev/peps/pep-0372/

>> The fact is that I think that such an object would be so widely used,
>> that maybe it would be useful to implement it in biopython.
>> What I would do, honestly, is to create a GenericSeqRecordSet class
>> from which to derive Alignment, specifying that in an alignment all
>> the sequences should have the same length. It would not require much
>> work and it would change the interface.
>
> I agree that IF we added some sort of "GenericSeqRecordSet class", it
> might be sensible for the alignment objects to subclass it -
> especially if you want it to behave like a python list primarily.

Let's see it from another point of view.
In biopython, if you want to print a set of sequences in fasta format,
you have to do the following:
>>> from Bio.Seq import Seq
>>> from Bio.SeqRecord import SeqRecord
>>> s1 = SeqRecord(Seq('cacacac'))
>>> s2 = SeqRecord(Seq('cacacac'))
>>> seqs = s1, s2
>>> out = ''
>>> for seq in seqs:
...     # a "print seq.format('fasta')" statement won't work properly here, because of blank lines
...     out += seq.format('fasta')
...
>>> print out

On the other side, printing an alignment in fasta format is a lot simpler:
>>> al = Alignment(SingleLetterAlphabet)
>>> al.add_sequence('s1', 'cacaca')
>>> al.add_sequence('s2', 'cacaca')
>>> print al.format('fasta')

I work more often with sets of sequences rather than with alignments.
So, why is it more difficult to print some unrelated sequences in a
certain format than aligned sequences? I would end up using Alignment
objects also for sequences that are not aligned.

I am also thinking about many format parsers.

Wouldn't it be easier:
>>> seqs = Bio.SeqIO.parse(filehandler, 'fasta')
>>> record_dict = seqs.to_dict()

than invoking SeqIO twice?
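
A minimal sketch of the "index records by id" idea discussed above, written
with the standard library only so that it runs without Biopython. The Record
tuple and the records_to_dict helper are hypothetical stand-ins, not
Biopython's API; they only illustrate building a dictionary keyed by id while
rejecting duplicate keys, which is what Bio.SeqIO.to_dict() is described as
doing in this thread.

```python
from collections import namedtuple

# Hypothetical stand-in for a SeqRecord: just an id and a sequence string.
Record = namedtuple("Record", ["id", "seq"])


def records_to_dict(records):
    """Index records by id, raising ValueError on a duplicate id."""
    index = {}
    for record in records:
        if record.id in index:
            raise ValueError("Duplicate key %r" % record.id)
        index[record.id] = record
    return index


genes = [Record("geneA", "CACACAC"), Record("geneB", "GTGTGTG")]
by_id = records_to_dict(genes)
print(by_id["geneB"].seq)
```

Whether this lives in a helper function or as a method on a container class
is exactly the design question being debated; the logic is the same either
way.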


> Note that in python sets are not order preserving.
>
>> very tiny little minuscule p.s. if you need help to implement such a
>> thing or anything else I can volunteer :).
>
> That's good to hear :)
>
> However, we'd have to establish the need for this new object first -
> but so far we've only had two people's views so it's too early to form a
> consensus.  I don't see a strong reason for adding yet another object,
> when the core language provides lists, sets and dict which seem to be
> enough.

Take for example this code you wrote for me before:

> class SeqRecordList(list) :
>    """Subclass of the python list, to hold SeqRecord objects only."""
>    #TODO - Override the list methods to make sure all the items
>    #are indeed SeqRecord objects
>
>    def format(self, format) :
>        """Returns a string of all the records in a requested file format.
>
>        The argument format should be any file format supported by
>        the Bio.SeqIO.write() function.  This must be a lower case string.
>        """
>        from Bio import SeqIO
>        from StringIO import StringIO
>        handle = StringIO()
>        SeqIO.write(self, handle, format)
>        handle.seek(0)
>        return handle.read()

It's very useful, but I don't think a python/biopython newbie would be
able to write it.
That's why I think it should be included.
Last year, I was in another laboratory and I didn't have much
experience with biopython, and I was missing such a kind of object.

> Peter
>

Goodnight!!


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it

From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 02:16:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 02:16:02 -0500
Subject: [Biopython-dev] [Bug 2552] Adding alignments
In-Reply-To: <bug-2552-42@http.bugzilla.open-bio.org/>
Message-ID: <200811130716.mAD7G2pw008200@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2552


------- Comment #2 from fkauff at biologie.uni-kl.de  2008-11-13 02:16 EST -------
The Nexus module in Bio.Nexus has a function (not a method) 'combine' that can
combine Nexus objects. It takes care of missing taxa, taxon sets, etc. Usage is
something like:

nex1=Nexus.Nexus('myfirstalignment.nex')
nex2=Nexus.Nexus('mysecondalignment.nex')
combined=Nexus.combine([('fancyname1',nex1),('fancyname2',nex2)])

It looks fairly straightforward to add this to a SeqRecord object.
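
As a rough illustration of the behaviour described for 'combine' (missing taxa
filled with missing-data characters), here is a standard-library sketch. It is
not the Bio.Nexus implementation: the combine_padded function, the
dict-of-strings representation of an alignment and the taxon names are all
assumptions made for the example.

```python
def combine_padded(alignments, missing="-"):
    """Concatenate alignments column-wise, padding absent taxa.

    Each alignment is a dict mapping taxon name to an aligned string;
    within one alignment all strings must have the same length.
    """
    taxa = []
    for alignment in alignments:
        for name in alignment:
            if name not in taxa:
                taxa.append(name)  # keep first-seen order
    combined = dict((name, "") for name in taxa)
    for alignment in alignments:
        lengths = set(len(seq) for seq in alignment.values())
        if len(lengths) > 1:
            raise ValueError("Sequences in one alignment differ in length")
        width = lengths.pop() if lengths else 0
        for name in taxa:
            # A taxon absent from this alignment gets a run of missing data.
            combined[name] += alignment.get(name, missing * width)
    return combined


gene1 = {"taxonA": "ACGT", "taxonB": "ACGA"}
gene2 = {"taxonA": "TTG", "taxonC": "TTC"}
supermatrix = combine_padded([gene1, gene2])
for name, seq in supermatrix.items():
    print(name, seq)
```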

Cheers,
Frank

(Hi Cymon)


(In reply to comment #1)
> (In reply to comment #0)
> > This is related to the very broad alignment bug 1944.
> > 
> > Given two alignments, it can make sense to talk about adding them together.
> 
> Actually, this is a very common procedure in phylogenetic analyses, where
> multiple genes/loci are combined into a "super" matrix for a set of taxa.
> Although, in this case, adding by column, if a taxon/row/identifier was missing
> in a particular (sub-)alignment it would be filled by "-" (missing data) in the
> combined matrix.
> 
> Anyway, I think this would be a very useful enhancement.
> 


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 05:19:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 05:19:29 -0500
Subject: [Biopython-dev] [Bug 2552] Adding alignments
In-Reply-To: <bug-2552-42@http.bugzilla.open-bio.org/>
Message-ID: <200811131019.mADAJTxs024880@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2552


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-13 05:19 EST -------
(In reply to comment #1)
> (In reply to comment #0)
> > This is related to the very broad alignment bug 1944.
> > 
> > Given two alignments, it can make sense to talk about adding them together.
> 
> Actually, this is a very common procedure in phylogenetic analyses, where
> multiple genes/loci are combined into a "super" matrix for a set of taxa.

This was one of the use cases I originally had in mind here (with hindsight I
should have mentioned this in the original proposal).  Another potential use
for this is in combination with extracting sub-alignments by column (see Bug
2551) - for example to remove some middle region of an alignment by selecting
the two end regions and adding them together, e.g. new_align = align[:,:10] +
align[:,20:] to remove the region from columns 10 to 20.

As described in my original proposal, adding two alignments "by column" would
require they have the same number of rows, and the same IDs (possibly in a
different order - this is not essential as making the user think about their
preferred sort order seems fine to me).

I suppose using any common subset of shared names is also well defined, or
automatically including null sequences for missing entries (as Frank suggested
in comment 2), but I would much prefer to keep any alignment addition simple
and explicit - no "magic".

More generally you could consider adding any two alignments "by column" if they
have the same number of rows, but first we'd have to talk about adding
SeqRecord objects.  This means doing something sensible with the annotation, in
particular the id and name.  I was hoping to avoid this.

Once Biopython 1.49 is out, dealing with this bug is certainly on my todo list,
especially now that we have some positive responses.
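
A minimal sketch of the simple, explicit "no magic" addition described above,
assuming an alignment is represented as a plain dict of id to aligned string.
The add_by_column and columns helpers are hypothetical names for
illustration and are not part of Biopython; the strict check on ids is the
point of the example.

```python
def add_by_column(left, right):
    """Join two alignments column-wise; both must hold the same ids."""
    if set(left) != set(right):
        raise ValueError("Alignments must contain the same ids")
    # Row order follows the left-hand alignment.
    return dict((name, left[name] + right[name]) for name in left)


def columns(alignment, start, stop):
    """Return the sub-alignment covering columns start..stop-1."""
    return dict((name, seq[start:stop]) for name, seq in alignment.items())


align = {
    "Alpha": "AAAAAAAAAACCCCCCCCCCGGGG",
    "Beta": "TTTTTTTTTTCCCCCCCCCCAAAA",
}
# Equivalent in spirit to align[:,:10] + align[:,20:] from the proposal.
new_align = add_by_column(columns(align, 0, 10), columns(align, 20, None))
for name, seq in new_align.items():
    print(name, seq)
```

Raising an error on mismatched ids, rather than padding silently, keeps the
operation predictable and leaves any padding policy to the caller.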


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mjldehoon at yahoo.com  Thu Nov 13 05:27:57 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 13 Nov 2008 02:27:57 -0800 (PST)
Subject: [Biopython-dev] [BioPython] a sequence set object in biopython?
In-Reply-To: <5aa3b3570811121616u5f95cc8du9f0d91e4743f067f@mail.gmail.com>
Message-ID: <25667.98653.qm@web62408.mail.re1.yahoo.com>

Adding new classes to Biopython should be done very carefully ... once they're in, it's difficult to remove them again. In the past, removing classes that turned out to be less than ideal was a real headache. Right now I don't see a clear need for a sequence set object ... read on.

--- On Wed, 11/12/08, Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
> > OK, then use a dict of SeqRecords for this, as shown
> > in the tutorial chapter for Bio.SeqIO and the wiki.
> >  We even have a helper function
> > Bio.SeqIO.to_dict() to do this and check for duplicate
> > keys.
> 
> I would prefer a SeqRecordSet object with a to_dict method

> Wouldn't it be easier:
> >>> seqs = Bio.SeqIO.parse(filehandler,
> 'fasta')
> >>> record_dict = seqs.to_dict()
> 
> than invoking SeqIO twice?

Maybe, yes, but it's just a matter of typing and I don't think that by itself it is a good enough reason for a SeqRecordSet class.

> Let's see it from another point of view.
> In biopython, if you want to print a set of sequences in
> fasta format,
> you have to do the following:
> >>> s1 = SeqRecord(Seq('cacacac'))
> >>> s2 = SeqRecord(Seq('cacacac'))
> >>> seqs = s1, s2
> >>> out = ''
> >>> for seq in seqs:
>         # a "print seq.format('fasta')" statement won't work
>         # properly here, because of blank lines
>         out += seq.format('fasta')
> >>> print out

I don't quite understand why "print seq.format('fasta')" won't work.

> Take for example this code you wrote for me before:
> 
> > class SeqRecordList(list) :
> >    def format(self, format) :
> >        from Bio import SeqIO
> >        from StringIO import StringIO
> >        handle = StringIO()
> >        SeqIO.write(self, handle, format)
> >        handle.seek(0)
> >        return handle.read()
> 
> It's very useful, but I don't think a
> python/biopython newbie would be
> able to write it.

I agree that this is too complicated. What if we redefine SeqIO.write as

def write(self, handle=sys.stdout, format='fasta'):
...

So by default SeqIO.write prints to the screen. Then you can do

SeqIO.write(records)

where records are a list of SeqRecord's.
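
A small sketch of the default-handle idea, assuming records are plain
(id, sequence) pairs and using a hypothetical write_fasta function rather than
the real SeqIO.write. One detail worth noting: a default of
handle=sys.stdout is evaluated once, when the function is defined, so the
sketch uses None and looks up sys.stdout at call time instead.

```python
import io
import sys


def write_fasta(records, handle=None):
    """Write (id, sequence) pairs as FASTA; default to standard output."""
    if handle is None:
        # Looked up at call time, so a redirected sys.stdout is respected.
        handle = sys.stdout
    for identifier, sequence in records:
        handle.write(">%s\n%s\n" % (identifier, sequence))


records = [("Alpha", "ACGT"), ("Beta", "GTGC")]
write_fasta(records)            # prints to the screen
buffer = io.StringIO()
write_fasta(records, buffer)    # or writes to any file-like handle
```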

--Michiel.


From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 06:06:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 06:06:20 -0500
Subject: [Biopython-dev] [Bug 2628] Have Bio.SeqIO.write(...) and
	Bio.AlignIO.write(...) return number of records
In-Reply-To: <bug-2628-42@http.bugzilla.open-bio.org/>
Message-ID: <200811131106.mADB6Ki7030741@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2628


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-13 06:06 EST -------
Note - now that we return the count, this does block a previous suggestion by
Michiel that if the handle were omitted the write function could default to
returning a string (handled via StringIO internally).

I wasn't keen on this idea at the time because it would have given the write
function very different behaviour depending on the arguments.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk  Thu Nov 13 06:11:10 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 13 Nov 2008 11:11:10 +0000
Subject: [Biopython-dev] [BioPython] a sequence set object in biopython?
In-Reply-To: <25667.98653.qm@web62408.mail.re1.yahoo.com>
References: <5aa3b3570811121616u5f95cc8du9f0d91e4743f067f@mail.gmail.com>
	<25667.98653.qm@web62408.mail.re1.yahoo.com>
Message-ID: <320fb6e00811130311t4e813a8fqeb21504fd5696bf1@mail.gmail.com>

Michiel wrote:
>Marco wrote:
>> Take for example this code you [Peter] wrote for me before:
>>
>> > class SeqRecordList(list) :
>> >    def format(self, format) :
>> >        from Bio import SeqIO
>> >        from StringIO import StringIO
>> >        handle = StringIO()
>> >        SeqIO.write(self, handle, format)
>> >        handle.seek(0)
>> >        return handle.read()
>>
>> It's very useful, but I don't think a
>> python/biopython newbie would be
>> able to write it.
>
> I agree that this is too complicated.

This wasn't aimed at a beginner, but rather for Marco if he really
wants to use this kind of object in his own code, or as a basis for
further discussion.

> What if we redefine SeqIO.write as
>
> def write(self, handle=sys.stdout, format='fasta'):
> ...
>
> So by default SeqIO.write prints to the screen. Then you can do
>
> SeqIO.write(records)
>
> where records are a list of SeqRecord's.

We could certainly include something like this in the documentation:

#Just an example to create some records:
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
records = [SeqRecord(Seq("ACGT"),"Alpha"), SeqRecord(Seq("GTGC"),"Beta")]

#One way to "print" records to screen,
import sys
from Bio import SeqIO
SeqIO.write(records, sys.stdout, "fasta")

I'm not so keen on making the handle default to standard out, but this
is nicer than the suggestion you made some time ago that if the handle
were omitted a string be returned (no longer an option since Bug 2628
was committed).

Any other votes for the standard out default?

Peter

From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 06:18:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 06:18:01 -0500
Subject: [Biopython-dev] [Bug 2552] Adding alignments
In-Reply-To: <bug-2552-42@http.bugzilla.open-bio.org/>
Message-ID: <200811131118.mADBI1of031964@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2552


------- Comment #4 from fkauff at biologie.uni-kl.de  2008-11-13 06:18 EST -------
(In reply to comment #3)
>
> > 
> > Actually, this is a very common procedure in phylogenetic analyses, where
> > multiple genes/loci are combined into a "super" matrix for a set of taxa.
> 
> This was one of the use cases I originally had in mind here (with hindsight I
> should have mentioned this in the original proposal).  Another potential use
> for this is in combination with extracting sub-alignments by column (see Bug
> 2551) - for example to remove some middle region of an alignment by selecting
> the two end regions and adding them together, e.g. new_align = align[:,:10] +
> align[:,20:] to remove the region from columns 10 to 20.

Nexus parser can already handle this by rewriting the data set

>> nexobject.write_nexus_data(filename='new.nex',exclude=[range(10,21)],delete=['list','of','taxa','two','delete'])

where the indices of remaining character sets and character partitions get
recalculated.


> 
> As described in my original proposal, adding two alignments "by column" would
> require they have the same number of rows, and the same IDs (possibly in a
> different order - this is not essential as making the user think about their
> preferred sort order seems fine to me).
> 
> I suppose using any common subset of shared names is also well defined, or
> automatically including null sequences for missing entries (as Frank suggested
> in comment 2), but I would much prefer to keep any alignment addition simple
> and explicit - no "magic".
> 

Yes, missing names are given missing character entries

> More generally you could consider adding any two alignments "by column" if they
> have the same number of rows, but first we'd have to talk about adding
> SeqRecord objects.  This means doing something sensible with the annotation, in
> particular the id and name.  I was hoping to avoid this.
> 
> Once Biopython 1.49 is out, dealing with this bug is certainly on my todo list,
> especially now that we have some positive responses.
> 


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 07:14:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 07:14:21 -0500
Subject: [Biopython-dev] [Bug 2654] New: Bio.Blast.NCBIStandalone does not
	support the output file argument
Message-ID: <bug-2654-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2654

           Summary: Bio.Blast.NCBIStandalone does not support the output
                    file argument
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


The NCBI blastall tool defaults to writing its output to standard out, but can
be told to write to a file instead:

  -o  BLAST report Output File [File Out]  Optional

Currently Bio.Blast.NCBIStandalone.blastall() does not support this optional
argument, meaning that a user who wants to save the output must do so manually
from the standard out handle.

This also applies to rpsblast and blastpgp as well.
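
To make the request concrete, here is a hedged sketch of how an optional
output file could be threaded through to the command line. The
build_blastall_args function and its parameter names are hypothetical and are
not the Bio.Blast.NCBIStandalone interface; the -p, -d and -i options are
assumed from the legacy blastall usage, and only -o is taken from the report
above.

```python
def build_blastall_args(program, database, infile, outfile=None):
    """Return an argument list for blastall, adding -o only when asked."""
    args = ["blastall", "-p", program, "-d", database, "-i", infile]
    if outfile is not None:
        # Without -o, blastall writes its report to standard output.
        args.extend(["-o", outfile])
    return args


print(build_blastall_args("blastn", "nt", "query.fasta"))
print(build_blastall_args("blastn", "nt", "query.fasta", "report.txt"))
```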


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From eric.pruitt at gmail.com  Thu Nov 13 08:00:36 2008
From: eric.pruitt at gmail.com (James Pruitt)
Date: Thu, 13 Nov 2008 07:00:36 -0600
Subject: [Biopython-dev] Lowess Smooth Improvement
Message-ID: <171e8a410811130500o71c455f6mda64ab19c138e48f@mail.gmail.com>

I made some changes to the Lowess smoothing method and have written a unit
test for it. On my machine, it runs around 37% faster in my unit tests
compared to the original lowess method, and that is using the numpy.median
function, so it would probably run even faster with the Bio.Cluster median
function. How do I go about proposing my code to be included in Biopython?

-- 
-Jimmy

From biopython at maubp.freeserve.co.uk  Thu Nov 13 08:27:51 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 13 Nov 2008 13:27:51 +0000
Subject: [Biopython-dev] Lowess Smooth Improvement
In-Reply-To: <171e8a410811130500o71c455f6mda64ab19c138e48f@mail.gmail.com>
References: <171e8a410811130500o71c455f6mda64ab19c138e48f@mail.gmail.com>
Message-ID: <320fb6e00811130527m41238780n9fe7f9c6de1a2d0a@mail.gmail.com>

On Thu, Nov 13, 2008 at 1:00 PM, James Pruitt <eric.pruitt at gmail.com> wrote:
> I made some changes to the Lowess smoothing method and have written a unit
> test for it. On my machine, it runs around 37% faster in my unit tests
> compared to the original lowess method, and that is using the numpy.median
> function, so it would probably run even faster with the Bio.Cluster median
> function.

Presumably this is an update for Bio/Statistics/lowess.py?  I'm a
little confused - this code already uses Bio.Cluster.median if it can,
falling back on numpy.median.  Maybe you're working from an older
version of Biopython?

> How do I go about proposing my code to be included in Biopython?

First file an enhancement bug, then once the bug is filed you can
attach a patch against CVS.
If you have any example scripts or unit tests to go with it, even better.

Thanks,

Peter

From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 10:25:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 10:25:56 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811131525.mADFPuvi029137@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #22 from dalloliogm at gmail.com  2008-11-13 10:25 EST -------
Created an attachment (id=1053)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1053&action=view)
test files for fastPhaseOutput

I put the fastPhaseOutput files used in the tests in separate files, as
asked.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 10:59:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 10:59:02 -0500
Subject: [Biopython-dev] [Bug 2655] New: Sorting sub-features in BioSeq.py
	can return corrupted feature
Message-ID: <bug-2655-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2655

           Summary: Sorting sub-features in BioSeq.py can return corrupted
                    feature
           Product: Biopython
           Version: 1.49b
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: cymon.cox at gmail.com


BioSeq.py retrieves SeqFeatures from a BioSQL database and sorts both the
features and any subfeatures. The first sort is superfluous and the second sort
is an error that can lead to a feature being returned corrupted, with the
sub-features in an incorrect order. So I've marked this major...

I've been trying to implement the feature/sub-feature locations test in
test_BioSQL_SeqIO.

Here's my solution (attached as patch1):

"""
        # Compare sub-feature Locations:
        # 
        # BioSQL currently does not store fuzzy locations, but instead stores
        # them as FeatureLocation.nofuzzy_start FeatureLocation.nofuzzy_end.
        # Hence, the old_sub from SeqIO.parse() may have fuzzy locations while
        # new_sub locations from BioSQL will not be fuzzy.
        # The vast majority of cases will be comparisons of ExactPosition
        # class locations, so we'll try that first and catch the exceptions.

        try:
            assert str(old_sub.location) == str(new_sub.location), \
               "%s -> %s" % (str(old_sub.location), str(new_sub.location))
        except AssertionError, e:
            if isinstance(old_sub.location.start, ExactPosition) and \
                isinstance(new_sub.location.start, ExactPosition) and \
                isinstance(old_sub.location.end, ExactPosition) and \
                isinstance(new_sub.location.end, ExactPosition):
                # Its not a problem with fuzzy locations, re-raise 
                raise AssertionError, e
            else:
                #At least one location is fuzzy
                assert old_sub.location.nofuzzy_start == \
                    new_sub.location.nofuzzy_start, \
                    "%s -> %s" % (old_sub.location.nofuzzy_start,
                                  new_sub.location.nofuzzy_start)
                assert old_sub.location.nofuzzy_end == \
                    new_sub.location.nofuzzy_end, \
                    "%s -> %s" % (old_sub.location.nofuzzy_end,
                                  new_sub.location.nofuzzy_end)
"""

This test causes errors in 3 of the test cases:
GenBank/extra_keywords.gb
GenBank/one_of.gb
GFF/NC_001422.gbk

e.g:
Testing loading from genbank format file GenBank/extra_keywords.gb
 - TCCAGGGGATTCACGCGCA...TTG [Gp6GqZ3Q9foPG0HvyXguIGSJN8U] len 154329,
AL138972.1
 - Retrieving by name/display_id 'DMBR25B3',
Traceback (most recent call last):
  File "test_BioSQL_SeqIO.py", line 371, in <module>
    compare_records(record, db_rec)
  File "test_BioSQL_SeqIO.py", line 280, in compare_records
    compare_features(old_f, new_f)
  File "test_BioSQL_SeqIO.py", line 185, in compare_features
    raise AssertionError, e
AssertionError: [153489:154269] -> [40:610]

This is because each of these records has a peculiar join(...)
for the above record:
join(153490..154269,AL121804.2:41..610,

(As an aside, how does the user know that the returned feature location is a join
with a separate accession? How does BioSQL/Biopython deal with this?)

The error is caused by BioSeq.py _retrieve_features() sorting the sub-features
first by sorting on start position:

BioSeq.py:
249                 sub_feature_list.append((start, subfeature))
250             sub_feature_list.sort()
251             feature.sub_features = [sub_feature[1]
252                                     for sub_feature in sub_feature_list]

This is an error because it returns the sub-features out of order. Besides, this
sub-feature sort and the SeqFeature sort are both unnecessary because the
features and sub-features are stored in BioSQL by rank and retrieved by rank, so
they should be in the correct order anyway.
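
The effect can be reproduced with plain tuples. This is only an illustration:
the row layout (rank, start, end, remote accession) is an assumption made for
the example and is not the BioSQL schema or the BioSeq.py code; the
coordinates are the zero-based ones from the failing join shown above.

```python
# (rank, start, end, remote accession or None), stored in join order.
rows = [
    (1, 153489, 154269, None),
    (2, 40, 610, "AL121804.2"),
]

# Sorting on start position puts the remote segment first.
by_start = sorted(rows, key=lambda row: row[1])
# Ordering by rank keeps the segments as written in the join(...).
by_rank = sorted(rows, key=lambda row: row[0])

print([row[1:3] for row in by_start])
print([row[1:3] for row in by_rank])
```

Because the second segment refers to a different accession, its coordinates
are not comparable with those of the first, so start position is not a
meaningful sort key here.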

Attached BioSeq.py patch to remove both sort()'s - patch2

With these patches applied the test_BioSQL_SeqIO and test_BioSQL pass:

[cymon at chara Tests]$ python test_BioSQL_SeqIO.py > test_output
[cymon at chara Tests]$ diff -ruN test_output output/test_BioSQL_SeqIO 
--- test_output 2008-11-13 15:39:20.000000000 +0000
+++ output/test_BioSQL_SeqIO    2008-11-12 13:06:19.000000000 +0000
@@ -1,3 +1,4 @@
+test_BioSQL_SeqIO
 Connecting to database
 Removing existing sub-database 'biosql-seqio-test' (if exists)
 (Re)creating empty sub-database 'biosql-seqio-test'
[cymon at chara Tests]$ python run_tests.py test_BioSQL_SeqIO.py
test_BioSQL_SeqIO ... ok

----------------------------------------------------------------------
Ran 1 test in 15.928s

OK
[cymon at chara Tests]$ python run_tests.py test_BioSQL.py
test_BioSQL ... ok

----------------------------------------------------------------------
Ran 1 test in 25.255s

OK


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 11:00:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 11:00:02 -0500
Subject: [Biopython-dev] [Bug 2655] Sorting sub-features in BioSeq.py can
	return corrupted feature
In-Reply-To: <bug-2655-42@http.bugzilla.open-bio.org/>
Message-ID: <200811131600.mADG02lb002140@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2655


------- Comment #1 from cymon.cox at gmail.com  2008-11-13 11:00 EST -------
Created an attachment (id=1054)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1054&action=view)
patch1 to test_BioSQL_SeqIO


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 11:00:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 11:00:35 -0500
Subject: [Biopython-dev] [Bug 2655] Sorting sub-features in BioSeq.py can
	return corrupted feature
In-Reply-To: <bug-2655-42@http.bugzilla.open-bio.org/>
Message-ID: <200811131600.mADG0Zhi002264@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2655


------- Comment #2 from cymon.cox at gmail.com  2008-11-13 11:00 EST -------
Created an attachment (id=1055)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1055&action=view)
patch2 to BioSQL/BioSeq.py


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 11:28:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 11:28:48 -0500
Subject: [Biopython-dev] [Bug 2655] Sorting sub-features in BioSeq.py can
	return corrupted feature
In-Reply-To: <bug-2655-42@http.bugzilla.open-bio.org/>
Message-ID: <200811131628.mADGSmmf007542@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2655


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-13 11:28 EST -------
Another sensible improvement - checked in with only minor changes (fixed an
assert in the unit test, and removed an old comment about sorting for
subfeatures).

Checking in BioSQL/BioSeq.py;
/home/repository/biopython/biopython/BioSQL/BioSeq.py,v  <--  BioSeq.py
new revision: 1.30; previous revision: 1.29
done
Checking in Tests/test_BioSQL_SeqIO.py;
/home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v  <-- 
test_BioSQL_SeqIO.py
new revision: 1.25; previous revision: 1.24
done

Thanks Cymon,

Peter.



From biopython at maubp.freeserve.co.uk  Thu Nov 13 11:33:43 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 13 Nov 2008 16:33:43 +0000
Subject: [Biopython-dev] Lowess Smooth Improvement
In-Reply-To: <171e8a410811130825x5732bd99o252e26f2bafa8e13@mail.gmail.com>
References: <171e8a410811130500o71c455f6mda64ab19c138e48f@mail.gmail.com>
	<320fb6e00811130527m41238780n9fe7f9c6de1a2d0a@mail.gmail.com>
	<171e8a410811130825x5732bd99o252e26f2bafa8e13@mail.gmail.com>
Message-ID: <320fb6e00811130833y3413eb36p92be13ca0ee1ed9a@mail.gmail.com>

On Thu, Nov 13, 2008 at 4:25 PM, James Pruitt <eric.pruitt at gmail.com> wrote:
> I removed the Bio.Cluster reference because the system the code would run on
> would not have access to it so the code was vestigial but on the version I
> will submit, I reincluded the Bio.Cluster median function. Yes-- this is an
> update for Bio/Statistics/lowess.py

OK - file the enhancement bug, upload the code (ideally as a patch)
and we'll take a look :)

Peter

From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 12:09:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 12:09:37 -0500
Subject: [Biopython-dev] [Bug 2655] Sorting sub-features in BioSeq.py can
	return corrupted feature
In-Reply-To: <bug-2655-42@http.bugzilla.open-bio.org/>
Message-ID: <200811131709.mADH9blO013661@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2655


------- Comment #4 from cymon.cox at gmail.com  2008-11-13 12:09 EST -------
(In reply to comment #3)
> Another sensible improvement - checked in with only minor changes (fixed an
> assert in the unit test,

Thanks Peter :)

> and removed an old comment about sorting for
> subfeatures).

If the comment stays in, you'll need to remove these two lines of nonsense as
well:

test_BioSQL_SeqIO.py:
171         # Hence, the old_sub from SeqIO.parse() will have fuzzy location while
172         # new_sub locations from BioSQL will be fuzzy.

Sorry about that.

C.



From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 12:17:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 12:17:15 -0500
Subject: [Biopython-dev] [Bug 2655] Sorting sub-features in BioSeq.py can
	return corrupted feature
In-Reply-To: <bug-2655-42@http.bugzilla.open-bio.org/>
Message-ID: <200811131717.mADHHFpR015244@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2655


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-13 12:17 EST -------
$ cvs commit -m "Removing two redundant comment lines (see Bug 2655)"
test_BioSQL_SeqIO.py
===========================================
 dev.open-bio.org - Authorized Access Only
===========================================
peterc at dev.open-bio.org's password: 
Checking in test_BioSQL_SeqIO.py;
/home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v  <-- 
test_BioSQL_SeqIO.py
new revision: 1.26; previous revision: 1.25
done



From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 20:23:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 20:23:26 -0500
Subject: [Biopython-dev] [Bug 2657] New: Improved Bio/Statistics/lowess.py
Message-ID: <bug-2657-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657

           Summary: Improved Bio/Statistics/lowess.py
           Product: Biopython
           Version: 1.49b
          Platform: PC
               URL: http://pastebin.ca/1255734
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: eric.pruitt at gmail.com


I noticed that several calculations were repeated when each result could be
computed once, saved in a variable, and reused throughout. Then I realized
that, since the matrix has a static size, it would be faster to just hard-code
the solution of the matrix equation into the function.



From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 04:32:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 04:32:36 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811140932.mAE9Wa1f001445@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #1 from dalloliogm at gmail.com  2008-11-14 04:32 EST -------
ok, but consider that all posts on pastebin disappear after 30 days... You
should add an attachment by clicking on 'Create a New Attachment' from this
page (you can only do that after opening the bug report).

p.s. what about adding a doctest to this module? Just to show an example of
how to run it.
Something like this:
"""
<lowess __doc__ >

    >>> import numpy
    >>> x =  numpy.array([1, 2, 3, 4, 5])
    >>> y = numpy.array([1, 2, 3, 4, 6])
    >>> lowess(x, y)
    expected result
"""

- http://docs.python.org/library/doctest.html
- http://bugzilla.open-bio.org/show_bug.cgi?id=2640
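
A minimal sketch of the doctest idea above. It uses a stand-in function,
because the expected lowess output is not given in this comment: `mean` and
`run_docstring` are hypothetical names for illustration only and are not part
of Bio.Statistics.lowess. The docstring holds one interactive example, and the
helper runs it with the standard doctest module.

```python
import doctest


def mean(values):
    """Return the arithmetic mean of a sequence of numbers.

    >>> mean([1, 2, 3, 4, 6])
    3.2
    """
    return sum(values) / float(len(values))


def run_docstring(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # The globals dict makes the function visible to its own examples.
    test = parser.get_doctest(func.__doc__, {func.__name__: func},
                              func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


result = run_docstring(mean)
print("failed=%d attempted=%d" % (result.failed, result.attempted))
```

For a real module the usual route would be doctest.testmod() on the module
itself; the helper here only keeps the sketch self-contained.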



From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 05:41:31 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 05:41:31 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141041.mAEAfVQO007220@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 05:41 EST -------
Created an attachment (id=1057)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1057&action=view)
The updated lowess.py from http://pastebin.ca/raw/1255734

Attaching James' new file here so it doesn't just expire at pastebin.



From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 06:11:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:11:26 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141111.mAEBBQJm010925@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 06:11 EST -------
I've updated CVS to use standard four space indentation, add a doctest and the
copyright statement etc.

James' code makes two code changes (shown against CVS revision 1.9).

67,68c67,68
<     h = [numpy.sort(abs(x-x[i]))[r] for i in range(n)]
<     w = numpy.clip(abs(([x]-numpy.transpose([x]))/h),0.0,1.0)
---
>     h = [numpy.sort(numpy.abs(x-x[i]))[r] for i in range(n)]
>     w = numpy.clip(numpy.abs(([x]-numpy.transpose([x]))/h),0.0,1.0)

Due to the historic usage "from Numeric import *" this code did once use
Numeric.abs here, so it makes sense to use numpy.abs now.  Probably just an
oversight from the recent Numeric/numpy conversion.  This is another reminder
that using "from XXX import *" is a bad idea.

76,80c76,82
<             b = numpy.array([sum(weights*y), sum(weights*y*x)])
<             A = numpy.array([[sum(weights),   sum(weights*x)],
<                        [sum(weights*x), sum(weights*x*x)]])
<             beta = numpy.linalg.solve(A,b)
<             yest[i] = beta[0] + beta[1]*x[i]
---
>             theta = weights*x
>             b_top = sum(weights*y)
>             b_bot = sum(theta*y)
>             a = sum(weights)
>             b = sum(theta)
>             d = sum(theta*x)
>             yest[i] = (d*b_top-b*b_bot+(a*b_bot-b*b_top)*x[i])/(a*d-b**2)

I can see the point of calculating and caching these:
weights*y
weights*x
sum(weights*x)

Was there a good reason for the name theta for weights*x?

I personally think using an explicit matrix solver is much nicer to read than
that complex hand coded version.  Does it really save much time?

My suggestion is just:
76,78c76,81
<             b = numpy.array([sum(weights*y), sum(weights*y*x)])
<             A = numpy.array([[sum(weights),   sum(weights*x)],
<                        [sum(weights*x), sum(weights*x*x)]])
---
>             weights_x = weights*x
>             weights_y = weights*y
>             sum_weights_x = sum(weights_x)
>             b = numpy.array([sum(weights_y), sum(weights_y*x)])
>             A = numpy.array([[sum(weights),   sum_weights_x],
>                        [sum_weights_x, sum(weights_x*x)]])

However, I'm going to leave this for Michiel to resolve (given he wrote the
code in the first place).
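
For reference, a small sketch of why the two versions discussed above should
agree: the hand-coded expression is the closed-form solution of the same 2x2
weighted least-squares system that numpy.linalg.solve handles. The function
names and the weights below are made up for illustration; this is not the code
in CVS.

```python
import numpy


def fit_with_solver(x, y, weights, xi):
    """Weighted linear fit at xi using an explicit 2x2 solve."""
    weights_x = weights * x
    sum_weights_x = weights_x.sum()
    b = numpy.array([(weights * y).sum(), (weights_x * y).sum()])
    A = numpy.array([[weights.sum(), sum_weights_x],
                     [sum_weights_x, (weights_x * x).sum()]])
    beta = numpy.linalg.solve(A, b)
    return beta[0] + beta[1] * xi


def fit_closed_form(x, y, weights, xi):
    """Same fit, with the 2x2 system solved by hand."""
    theta = weights * x
    b_top = (weights * y).sum()
    b_bot = (theta * y).sum()
    a = weights.sum()
    b = theta.sum()
    d = (theta * x).sum()
    # a*d - b**2 is the determinant of the matrix A used above.
    return (d * b_top - b * b_bot + (a * b_bot - b * b_top) * xi) / (a * d - b ** 2)


x = numpy.array([0.0, 1.0, 4.0, 7.0])
y = numpy.array([0.0, 1.0, 16.0, 49.0])
weights = numpy.array([1.0, 0.8, 0.3, 0.1])  # arbitrary example weights

difference = abs(fit_with_solver(x, y, weights, x[1])
                 - fit_closed_form(x, y, weights, x[1]))
print(difference < 1e-9)
```

Note that the closed form divides by the determinant directly, so it gives no
warning when the system is singular, whereas numpy.linalg.solve raises
LinAlgError in that case.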



From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 06:15:09 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:15:09 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141115.mAEBF9Gi011416@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #4 from eric.pruitt at gmail.com  2008-11-14 06:15 EST -------
Created an attachment (id=1058)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1058&action=view)
Unit test for lowess.py

File will need to have the import statements adjusted for the Biopython
structure.



From p.j.a.cock at googlemail.com  Fri Nov 14 06:18:43 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 14 Nov 2008 11:18:43 +0000
Subject: [Biopython-dev] [BioPython] Problems with Emboss.Primer3
In-Reply-To: <001001c94644$eeaf5c00$1022a8c0@ipkgatersleben.de>
References: <000801c94598$fd183f20$1022a8c0@ipkgatersleben.de>
	<320fb6e00811130643p357092f6y8e6d983a11909003@mail.gmail.com>
	<001001c94644$eeaf5c00$1022a8c0@ipkgatersleben.de>
Message-ID: <320fb6e00811140318s452f9a5aj76eb7d505a98b6ee@mail.gmail.com>

On Fri, Nov 14, 2008 at 10:37 AM, Stefanie Lück
<lueck at ipk-gatersleben.de> wrote:
> Thanks for the hints!
> ...
> It gives as well as at the command line:
>
> "
> Command line:
> eprimer3 -sequence p3input.txt -outfile out.pr3 -target 50,500
> Return code:
> 1
> Errors:
>
>    EMBOSS An error in ajnam.c at line 1991:
>
> EMBOSSWIN environment variable not defined
>
> Messages
>
> "
> Any suggestions?

This doesn't seem to be a Biopython problem, but an EMBOSS
installation or configuration problem.  What version of EMBOSS do you
have?  Maybe try upgrading to version 6?

Peter


From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 06:28:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:28:36 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141128.mAEBSaSb013641@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


eric.pruitt at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |eric.pruitt at gmail.com


------- Comment #5 from eric.pruitt at gmail.com  2008-11-14 06:28 EST -------
(In reply to comment #3)
> I've updated CVS to use standard four space indentation, add a doctest and the
> copyright statement etc.
> 
> James' code makes two code changes (shown against CVS revision 1.9).
> 
> 67,68c67,68
> <     h = [numpy.sort(abs(x-x[i]))[r] for i in range(n)]
> <     w = numpy.clip(abs(([x]-numpy.transpose([x]))/h),0.0,1.0)
> ---
> >     h = [numpy.sort(numpy.abs(x-x[i]))[r] for i in range(n)]
> >     w = numpy.clip(numpy.abs(([x]-numpy.transpose([x]))/h),0.0,1.0)
> 
> Due to the historic usage "from Numeric import *" this code did once use
> Numeric.abs here, so it makes sense to use numpy.abs now.  Probably just an
> oversight from the recent Numeric/numpy conversion.  This is another reminder
> that using "from XXX import *" is a bad idea.
> 
> 76,80c76,82
> <             b = numpy.array([sum(weights*y), sum(weights*y*x)])
> <             A = numpy.array([[sum(weights),   sum(weights*x)],
> <                        [sum(weights*x), sum(weights*x*x)]])
> <             beta = numpy.linalg.solve(A,b)
> <             yest[i] = beta[0] + beta[1]*x[i]
> ---
> >             theta = weights*x
> >             b_top = sum(weights*y)
> >             b_bot = sum(theta*y)
> >             a = sum(weights)
> >             b = sum(theta)
> >             d = sum(theta*x)
> >             yest[i] = (d*b_top-b*b_bot+(a*b_bot-b*b_top)*x[i])/(a*d-b**2)
> 
> I can see the point of calculating and caching these:
> weights*y
> weights*x
> sum(weights*x)
> 
> Was there a good reason for the name theta for weights*x?
> 
> I personally think using an explicit matrix solver is much nicer to read than
> that complex hand coded version.  Does it really save much time?
> 
> My suggestion is just:
> 76,78c76,81
> <             b = numpy.array([sum(weights*y), sum(weights*y*x)])
> <             A = numpy.array([[sum(weights),   sum(weights*x)],
> <                        [sum(weights*x), sum(weights*x*x)]])
> ---
> >             weights_x = weights*x
> >             weights_y = weights*y
> >             sum_weights_x = sum(weights_x)
> >             b = numpy.array([sum(weights_y), sum(weights_y*x)])
> >             A = numpy.array([[sum(weights),   sum_weights_x],
> >                        [sum_weights_x, sum(weights_x*x)]])
> 
> However, I'm going to leave this for Michiel to resolve (given he wrote the
> code in the first place).
> 

Yes, replacing the numpy solver saves quite a bit of time. When I cached the
repeated products so they weren't recalculated every single time, it reduced
unit test time by 17% compared to the original; replacing the numpy solver then
brought it to a net 38% reduction from the original, so a huge difference.
Also, I suggest one change if you all decide to keep numpy. Minor, but just a
suggestion.

>             weights_x = weights*x
>             sum_weights_x = sum(weights_x)
>             b = numpy.array([sum(weights*y), sum(weights_x*y)])
>             A = numpy.array([[sum(weights),   sum_weights_x],
>                        [sum_weights_x, sum(weights_x*x)]])
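
One way to check a timing claim like this on a given machine is the standard
timeit module. The sketch below is only an illustration: the data, the weights
and the function names are made up, and the numbers it prints will vary with
the machine and the numpy version, so no particular speed-up is implied.

```python
import timeit

import numpy

x = numpy.array([0.0, 1.0, 4.0, 7.0])
y = numpy.array([0.0, 1.0, 16.0, 49.0])
weights = numpy.array([1.0, 0.8, 0.3, 0.1])  # arbitrary example weights
xi = x[1]


def with_solver():
    """One local fit using numpy.linalg.solve on the 2x2 system."""
    weights_x = weights * x
    sum_weights_x = weights_x.sum()
    b = numpy.array([(weights * y).sum(), (weights_x * y).sum()])
    A = numpy.array([[weights.sum(), sum_weights_x],
                     [sum_weights_x, (weights_x * x).sum()]])
    beta = numpy.linalg.solve(A, b)
    return beta[0] + beta[1] * xi


def closed_form():
    """The same fit with the 2x2 system solved by hand."""
    theta = weights * x
    b_top = (weights * y).sum()
    b_bot = (theta * y).sum()
    a = weights.sum()
    b = theta.sum()
    d = (theta * x).sum()
    return (d * b_top - b * b_bot + (a * b_bot - b * b_top) * xi) / (a * d - b ** 2)


t_solver = timeit.timeit(with_solver, number=2000)
t_closed = timeit.timeit(closed_form, number=2000)
print("solver: %.4f s, closed form: %.4f s" % (t_solver, t_closed))
```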



From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 06:32:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:32:39 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141132.mAEBWdlC014111@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 06:32 EST -------
(In reply to comment #4)
> Created an attachment (id=1058)
>  --> (http://bugzilla.open-bio.org/attachment.cgi?id=1058&action=view)
> Unit test for lowess.py
> 
> File will need to have the import statements adjusted for the Biopython
> structure.
> 

You're also using scipy and rpy (not Biopython dependencies), so if we wanted
to include these tests they would have to be made conditional on these external
dependencies (so that the test framework knows when it can skip them).  

Removing them effectively leaves one simple test:

from numpy import array
from Bio.Statistics.lowess import lowess

hand_iterations = 1
hand_f = 2./3.
hand_x = array([0.0,1.0,4.0,7.0])
hand_y = array([0.0,1.0,16.0,49.0])
#Was there a typo in the original, 18.85086... versus 18.5086...?
#hand_out = [-1.333391371257, 2.802858739, 18.850860916, 48.302727]
hand_out = [ -1.33338941,   2.80323154,  18.50860916,  48.30274834]
method_out = lowess(hand_x,hand_y,hand_f,hand_iterations)
for a,b in zip(method_out, hand_out) :
    assert abs(a-b) < 0.00001
print "Done"



From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 06:35:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:35:44 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141135.mAEBZiCO014367@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #7 from eric.pruitt at gmail.com  2008-11-14 06:35 EST -------
(In reply to comment #6)
> (In reply to comment #4)
> > Created an attachment (id=1058)
> >  --> (http://bugzilla.open-bio.org/attachment.cgi?id=1058&action=view)
> > Unit test for lowess.py
> > 
> > File will need to have the import statements adjusted for the Biopython
> > structure.
> > 
> 
> You're also using scipy and rpy (not Biopython dependencies), so if we wanted
> to include these tests they would have to be made conditional on these external
> dependencies (so that the test framework knows when it can skip them).  
> 
> Removing them effectively leaves one simple test:
> 
> from numpy import array
> from Bio.Statistics.lowess import lowess
> 
> hand_iterations = 1
> hand_f = 2./3.
> hand_x = array([0.0,1.0,4.0,7.0])
> hand_y = array([0.0,1.0,16.0,49.0])
> #Was there a typo in the original, 18.85086... versus 18.5086...?
> #hand_out = [-1.333391371257, 2.802858739, 18.850860916, 48.302727]
> hand_out = [ -1.33338941,   2.80323154,  18.50860916,  48.30274834]
> method_out = lowess(hand_x,hand_y,hand_f,hand_iterations)
> for a,b in zip(method_out, hand_out) :
>     assert abs(a-b) < 0.00001
> print "Done"
> 

When I did the hand calculations, I used a TI-84+, which uses decimal math,
eliminating the binary rounding error inherent in most Python implementations.
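
To illustrate the kind of binary rounding error meant here, a short sketch
using Python's standard decimal module as a stand-in for decimal arithmetic
(an assumption for illustration; it says nothing about how the calculator
works internally). For a single addition the binary error is tiny, well below
the 0.00001 tolerance used in the test quoted above.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly.
binary_sum = 0.1 + 0.2
binary_error = abs(binary_sum - 0.3)

# Decimal arithmetic represents these values exactly.
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(binary_sum == 0.3)              # False
print(decimal_sum == Decimal("0.3"))  # True
print(binary_error < 1e-15)           # True
```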



From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 06:38:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:38:51 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141138.mAEBcpNd014578@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 06:38 EST -------
(In reply to comment #5)
>> I personally think using an explicit matrix solver is much nicer to read
>> than that complex hand coded version.  Does it really save much time?
>> ...
>> However, I'm going to leave this for Michiel to resolve (given he wrote
>> the code in the first place).
>> 
> 
> Yes, replacing the numpy solver saves quite a bit of time. When I cached the
> repeated products so they weren't recalculated every single time, it reduced
> unit test time by 17% compared to the original; replacing the numpy solver then
> brought it to a net 38% reduction from the original, so a huge difference.

OK - so it's clarity versus what sounds like a big speed difference.

> Also, I suggest one change if you all
> decide to keep numpy. Minor, but just a suggestion.
> 
> >             weights_x = weights*x
> >             sum_weights_x = sum(weights_x)
> >             b = numpy.array([sum(weights*y), sum(weights_x*y)])
> >             A = numpy.array([[sum(weights),   sum_weights_x],
> >                        [sum_weights_x, sum(weights_x*x)]])
> 

I see: in defining b, sum(weights*y*x) can be done as sum(weights_x*y), which
avoids creating the temp variable weights_y = weights*y. That does look better.



From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 06:41:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:41:05 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141141.mAEBf5IS014888@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


eric.pruitt at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|eric.pruitt at gmail.com       |



From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 06:48:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:48:07 -0500
Subject: [Biopython-dev] [Bug 2658] New: 1.49b version of PDB Neighborsearch
	still based on Numeric
Message-ID: <bug-2658-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2658

           Summary: 1.49b version of PDB Neighborsearch still based on
                    Numeric
           Product: Biopython
           Version: 1.49b
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: rbickerton at gmail.com


Using Python 2.5.2, running:

python ./lib/python2.5/site-packages/Bio/PDB/NeighborSearch.py

gives:

Traceback (most recent call last):
  File "./lib/python2.5/site-packages/Bio/PDB/NeighborSearch.py", line 138, in
<module>
    ns=NeighborSearch(al)
  File "./lib/python2.5/site-packages/Bio/PDB/NeighborSearch.py", line 41, in
__init__
    assert(self.coords.typecode()=="f")
AttributeError: 'numpy.ndarray' object has no attribute 'typecode'
Exit 1

A bit of Google digging suggested that .typecode()=="f" is a Numeric/Numarray
method that should be updated to its NumPy equivalent.
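
A possible NumPy equivalent, as a sketch only (this is an assumption about
what the check is meant to verify, namely single-precision floats, and not
necessarily the fix that will be committed). NumPy arrays expose the element
type through the dtype attribute; the array below is a made-up example.

```python
import numpy

# Hypothetical coordinate array stored as single-precision floats.
coords = numpy.zeros((3, 3), dtype=numpy.float32)

# NumPy arrays have no typecode() method; the type lives on .dtype.
has_typecode = hasattr(coords, "typecode")

# Either of these expresses "the elements are 32-bit floats".
assert coords.dtype.char == "f"
assert coords.dtype == numpy.float32

print(has_typecode)
```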



From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 07:06:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 07:06:28 -0500
Subject: [Biopython-dev] [Bug 2658] 1.49b version of PDB Neighborsearch
	still based on Numeric
In-Reply-To: <bug-2658-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141206.mAEC6SEp016723@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2658


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         OS/Version|Mac OS                      |All


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 07:06 EST -------
Yes, that does look like an oversight in the Numeric to NumPy migration.

See also Bug 2649 for a related but different issue in Bio.KDTree



From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 07:18:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 07:18:25 -0500
Subject: [Biopython-dev] [Bug 2634] PAM30 Matrix doesn't work with qblast
In-Reply-To: <bug-2634-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141218.mAECIPRT017833@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2634


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 07:18 EST -------
Hi Nick,

I hope you got your blast to work.

I don't think we have an issue with Biopython itself, so I'm going to close
this bug.  It would be nice to somehow improve the error handling, but that
doesn't look straightforward.

Peter



From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 07:24:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 07:24:16 -0500
Subject: [Biopython-dev] [Bug 2604] test_Restriction failure with Python 2.6
	(also cause error in test_CAPS)
In-Reply-To: <bug-2604-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141224.mAECOGMN018266@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2604


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 07:24 EST -------
I'm going to mark this as fixed given it seems to be OK.

Please reopen this if there are any issues.

Peter



From biopython at maubp.freeserve.co.uk  Fri Nov 14 07:27:23 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 14 Nov 2008 12:27:23 +0000
Subject: [Biopython-dev] Biopython 1.49 beta released
In-Reply-To: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com>
References: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com>
Message-ID: <320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com>

On Sun, Nov 9, 2008 at 3:16 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Dear Biopythoneers,
>
> We are pleased to announce a beta release of Biopython 1.49. There have
> been some significant changes since Biopython 1.48 was released two
> months ago, which is why we are initially releasing a beta for wider
> testing.
>
> As previously announced, the big news is that Biopython now uses NumPy
> rather than its precursor Numeric (the original Numerical Python
> library).

We've had a few Numeric -> NumPy bugs reported,

http://bugzilla.open-bio.org/show_bug.cgi?id=2658
Bug 2658 - Bio.PDB.NeighborSearch

http://bugzilla.open-bio.org/show_bug.cgi?id=2649
Bug 2649 - Bio.KDTree (probably fixed)

I don't think we should release Biopython 1.49 final until these are
resolved - but if there was interest I could put out a second beta.

Peter

From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 08:17:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 08:17:39 -0500
Subject: [Biopython-dev] [Bug 2638] test_PopGen_SimCoal_nodepend.py fails on
	Windows, newline issue
In-Reply-To: <bug-2638-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141317.mAEDHdWo021804@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2638


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 08:17 EST -------
Patch checked in after testing with SIMCOAL2 on Windows XP.

Marking as fixed.



From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 10:16:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 10:16:12 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141516.mAEFGClF031759@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #19 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 10:16 EST -------
I've added a general example doctest to the main docstring for the SeqRecord
object.

Marking as fixed.



From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 10:35:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 10:35:18 -0500
Subject: [Biopython-dev] [Bug 2524] Handle missing libraries like numpy or
	reportlab in run_tests.py
In-Reply-To: <bug-2524-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141535.mAEFZIP8001033@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2524


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 10:35 EST -------
Fixed the numpy test cases (they were getting annoying with python 2.6 on
Windows where numpy isn't yet available).  The reportlab tests already fail
gracefully.

I ended up going down this route:

> (b) Modify all the tests using these semi-optional libraries to catch
> the ImportError and raise MissingExternalDependencyError instead.  As
> the tests themselves generally don't directly import the external
> library this is perhaps messy.
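
For illustration only, option (b) might look roughly like the sketch below.
The exception class here is a local stand-in so the snippet runs without
Biopython installed, and the helper name require_module is hypothetical, not
something from the actual patch.

```python
import importlib


class MissingExternalDependencyError(Exception):
    """Local stand-in for Biopython's exception of the same name (assumption)."""


def require_module(name):
    """Import an optional library, translating ImportError for the test runner."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # The test framework can treat this as "skip", not as a failure.
        raise MissingExternalDependencyError(
            "Install %s if you want to use this test." % name)
```

A test would call require_module("numpy") at the top of the file, so a
missing library is reported as a skipped test rather than an error.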

Marking this bug as fixed.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bsouthey at gmail.com  Fri Nov 14 10:39:00 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Fri, 14 Nov 2008 09:39:00 -0600
Subject: [Biopython-dev] Biopython 1.49 beta released
In-Reply-To: <320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com>
References: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com>
	<320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com>
Message-ID: <491D9B94.9050805@gmail.com>

Peter wrote:
> On Sun, Nov 9, 2008 at 3:16 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>   
>> Dear Biopythoneers,
>>
>> We are pleased to announce a beta release of Biopython 1.49. There have
>> been some significant changes since Biopython 1.48 was released two
>> months ago, which is why we are initially releasing a beta for wider
>> testing.
>>
>> As previously announced, the big news is that Biopython now uses NumPy
>> rather than its precursor Numeric (the original Numerical Python
>> library).
>>     
>
> We've had a few Numeric -> NumPy bugs reported,
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2658
> Bug 2658 - Bio.PDB.Neighborsearch
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2649
> Bug 2649 - Bio.KDTree (probably fixed)
>
> I don't think we should release Biopython 1.49 final until these are
> resolved - but if there was interest I could put out a second beta.
>
> Peter
>
>   
I noticed that Bio.PDB.Neighborsearch is not being tested.

Is there some way to identify which functions are not getting tested?
I know it is considerable effort but it would allow the development of 
tests that at the very least exercise all the Biopython code. (Hopefully 
this is not as bad as the Numpy documentation marathon.)

Bruce

From biopython at maubp.freeserve.co.uk  Fri Nov 14 10:46:34 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 14 Nov 2008 15:46:34 +0000
Subject: [Biopython-dev] Biopython 1.49 beta released
In-Reply-To: <491D9B94.9050805@gmail.com>
References: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com>
	<320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com>
	<491D9B94.9050805@gmail.com>
Message-ID: <320fb6e00811140746m119a040dv778163e0ab034a2@mail.gmail.com>

On Fri, Nov 14, 2008 at 3:39 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Peter wrote:
>> We've had a few Numeric -> NumPy bugs reported,
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2658
>> Bug 2658 - Bio.PDB.Neighborsearch
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2649
>> Bug 2649 - Bio.KDTree (probably fixed)
>>
>> ...
>
> I noticed that Bio.PDB.Neighborsearch is not being tested.
>

The fact that we didn't spot Bug 2658 from the unit tests makes that
very clear ;)

>
> Is there some way to identify which functions are not getting tested?
>

I can't think of an easy way - the best bet might be a quick script to
scan all the unit tests and pull out import lines, and from this build
a list of all modules which have some coverage.  This wouldn't tell us
about how much of each module is tested, but it would be better than
nothing.
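
A minimal sketch of such a scan, assuming the tests live in files named
test_*.py in one directory and that a regular expression over import lines
is good enough (it ignores imports split across lines). The function names
are made up for this example.

```python
import os
import re

# "from Bio.X import Y" / "from Bio import X, Y"
FROM_RE = re.compile(r"^\s*from\s+(Bio(?:\.\w+)*)\s+import\s+(.+)$")
# "import Bio.X, os"
IMPORT_RE = re.compile(r"^\s*import\s+(.+)$")


def imported_modules(source):
    """Return the set of Bio module names mentioned on import lines."""
    found = set()
    for line in source.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = FROM_RE.match(line)
        if match:
            base, names = match.groups()
            if base == "Bio":
                # "from Bio import SeqIO, AlignIO as A" -> Bio.SeqIO, Bio.AlignIO
                for name in re.findall(r"(\w+)(?:\s+as\s+\w+)?", names):
                    found.add("Bio." + name)
            else:
                found.add(base)
            continue
        match = IMPORT_RE.match(line)
        if match:
            for name in re.findall(r"([\w.]+)(?:\s+as\s+[\w.]+)?", match.group(1)):
                if name == "Bio" or name.startswith("Bio."):
                    found.add(name)
    return found


def scan_tests(test_dir):
    """Union of Bio modules imported by every test_*.py file in test_dir."""
    covered = set()
    for filename in sorted(os.listdir(test_dir)):
        if filename.startswith("test_") and filename.endswith(".py"):
            with open(os.path.join(test_dir, filename)) as handle:
                covered |= imported_modules(handle.read())
    return covered
```

This only records that a module is imported somewhere in the tests; it says
nothing about how much of that module is exercised.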

> I know it is considerable effort but it would allow the development of tests
> that at the very least exercise all the Biopython code. (Hopefully this is
> not as bad as the Numpy documentation marathon.)

I've written plenty of tests myself, including for existing modules -
my gut feeling is full test coverage would be quite a marathon.

Compared to the early years of the project, I've probably tried to be
a bit stricter about making sure we have test cases and documentation
before accepting new code.  In some cases this has worked out pretty
well (e.g. Tiago's PopGen stuff is covered in the tutorial and has
unit tests).  In other cases it could put people off contributing
code.

Peter

From biopython at maubp.freeserve.co.uk  Fri Nov 14 12:24:33 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 14 Nov 2008 17:24:33 +0000
Subject: [Biopython-dev] Test coverage
Message-ID: <320fb6e00811140924g26cc0703r2629380540a5b667@mail.gmail.com>

Bruce:
>>
>> Is there some way to identify which functions are not getting tested?
>>

Peter:
> I can't think of an easy way - the best bet might be a quick script to
> scan all the unit tests and pull out import lines, and from this build
> a list of all modules which have some coverage.  This wouldn't tell us
> about how much of each module is tested, but it would be better than
> nothing.

I've done a very crude script to try and answer this, and can point
out a few modules in need of tests:

Bio.Affy
Bio.AlignAce
Bio.EZRetrieve
Bio.Emboss (everything except the primer parsers)
Bio.Encodings (obsolete?)
Bio.FilteredReader (obsolete?)
Bio.MaxEntropy
Bio.NMR
Bio.NaiveBayes
Bio.NetCatch (obsolete?)
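
For anyone wanting to reproduce a list like this, one possible approach
(a sketch, not the script used above) is to list the top-level modules in
the Bio directory and subtract those that some test imports. The set of
covered names is assumed to be collected separately; both function names
are hypothetical.

```python
import os


def top_level_modules(package_dir, package="Bio"):
    """List top-level modules and subpackages found in package_dir."""
    names = set()
    for entry in os.listdir(package_dir):
        path = os.path.join(package_dir, entry)
        if os.path.isdir(path):
            # Only directories with an __init__.py count as subpackages.
            if os.path.exists(os.path.join(path, "__init__.py")):
                names.add(package + "." + entry)
        elif entry.endswith(".py") and entry != "__init__.py":
            names.add(package + "." + entry[:-3])
    return names


def untested(all_modules, covered):
    """Modules with no test importing them or anything beneath them."""
    result = []
    for module in sorted(all_modules):
        prefix = module + "."
        if not any(name == module or name.startswith(prefix)
                   for name in covered):
            result.append(module)
    return result
```

Matching on the prefix means a test importing Bio.PDB.PDBParser counts as
some coverage for Bio.PDB, which is generous but keeps the list short.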

Peter

From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 13:06:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 13:06:49 -0500
Subject: [Biopython-dev] [Bug 2659] New: Typo in tutorial section "2.1
	General overview of what Biopython provides"
Message-ID: <bug-2659-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2659

           Summary: Typo in tutorial section "2.1  General overview of what
                    Biopython provides"
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: trivial
          Priority: P2
         Component: Documentation
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: wilcoxjg at gmail.com


Sentence reads:
 "To me, this can be frustrating since I often WAY to just know the one right
way to do something."

Should be: 
 "To me, this can be frustrating since I often WANT to just know the one right
way to do something."


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 13:16:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 13:16:18 -0500
Subject: [Biopython-dev] [Bug 2660] New: Typo in tutorial section "2.2
	Working with sequences"
Message-ID: <bug-2660-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2660

           Summary: Typo in tutorial section "2.2  Working with sequences"
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P2
         Component: Documentation
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: wilcoxjg at gmail.com


Sentence reads:

"What we have here is a sequence object with a generic alphabet - reflecting
the fact WE HAVE SPECIFIED if this is a DNA or protein sequence (okay, a
protein with a lot of Alanines, Glycines, Cysteines and Threonines!)."

Should read:

"What we have here is a sequence object with a generic alphabet - reflecting
the fact we have NOT specified if this is a DNA or protein sequence (okay, a
protein with a lot of Alanines, Glycines, Cysteines and Threonines!)."


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 13:28:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 13:28:12 -0500
Subject: [Biopython-dev] [Bug 2659] Typo in tutorial section "2.1 General
	overview of what Biopython provides"
In-Reply-To: <bug-2659-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141828.mAEISCmZ013084@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2659


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 13:28 EST -------
Thanks :)

That's fixed in CVS now, see Doc/Tutorial.tex revision 1.185, which you can
view online here (updated every hour):

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/Tutorial.tex?cvsroot=biopython

We'll update the HTML and PDF on the website as part of the next release
(Biopython 1.49).


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 13:34:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 13:34:34 -0500
Subject: [Biopython-dev] [Bug 2661] New: Typo in: "2.3  A usage example"
Message-ID: <bug-2661-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2661

           Summary: Typo in: "2.3  A usage example"
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: trivial
          Priority: P2
         Component: Documentation
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: wilcoxjg at gmail.com


Sentence reads:

"We'll start with sequence parsing in Section 2.4, but the orchids will be
back later on as well - for example WE'LL EXTRA DATA FROM Swiss-Prot from
certain orchid proteins in Section 6.1, search PubMed for papers about orchids
in Section 6.2, extract sequence data from GenBank in Section 6.3.1, and work
with ClustalW multiple sequence alignments of orchid proteins in Section
6.4.1."

Capitalized phrase should contain some modifier like "we'll NEED extra", or
"we'll GET extra".


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 13:34:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 13:34:49 -0500
Subject: [Biopython-dev] [Bug 2660] Typo in tutorial section "2.2 Working
	with sequences"
In-Reply-To: <bug-2660-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141834.mAEIYnm6013826@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2660


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 13:34 EST -------
The tutorial on the website (matching Biopython 1.49b) is fine:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Which version of Biopython are you using (you didn't fill this in on the bug
report), or where are you reading this?

Looking over CVS this text was only like this in Biopython 1.44, so I'm a
little confused.

Thanks,

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 13:38:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 13:38:06 -0500
Subject: [Biopython-dev] [Bug 2661] Typo in: "2.3  A usage example"
In-Reply-To: <bug-2661-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141838.mAEIc6Qo014131@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2661


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 13:38 EST -------
As per Bug 2660, which version of Biopython are you using (you didn't fill this
in on the bug report), or where are you reading this?

This has already been fixed to say "extract" instead of "extra" (but I'm not
going to check exactly when this was corrected).


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 13:40:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 13:40:28 -0500
Subject: [Biopython-dev] [Bug 2660] Typo in tutorial section "2.2 Working
	with sequences"
In-Reply-To: <bug-2660-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141840.mAEIeSsm014238@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2660


wilcoxjg at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|Not Applicable              |1.44


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 13:41:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 13:41:47 -0500
Subject: [Biopython-dev] [Bug 2661] Typo in: "2.3  A usage example"
In-Reply-To: <bug-2661-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141841.mAEIfll7014298@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2661


wilcoxjg at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|Not Applicable              |1.44


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 13:47:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 13:47:28 -0500
Subject: [Biopython-dev] [Bug 2660] Typo in tutorial section "2.2 Working
	with sequences"
In-Reply-To: <bug-2660-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141847.mAEIlS8Y014586@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2660


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 13:47 EST -------
Hi Josh,

If you were reading the tutorial shipped with Biopython 1.44 this makes sense. 
I certainly don't want to put you off reporting any other typos, but if you
find any more please first check against the (almost completely) up to date
version before reporting them:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Note that some of the things covered in the current tutorial will not apply to
Biopython 1.44, which is now a year old.  I'd encourage you to upgrade if
possible.

Thanks,

Peter

P.S. Marking this bug as fixed.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mhampton at d.umn.edu  Fri Nov 14 14:48:42 2008
From: mhampton at d.umn.edu (Marshall Hampton)
Date: Fri, 14 Nov 2008 13:48:42 -0600 (CST)
Subject: [Biopython-dev] coverage of function testing
Message-ID: <Pine.SOC.4.64.0811141338280.4396@ub.d.umn.edu>


Hi,

I noticed some discussion of the coverage and automation of testing for 
functions in biopython, and thought I would suggest folks check out the 
testing and coverage tools in Sage (www.sagemath.org).  Testing of 
functions in Sage is done by testing examples in their docstrings - there 
are comments to opt out of testing or to indicate if they will take a long 
time.  They also have scripts for checking which functions have at least 
one such testable example.  So you can do something like this:


sage -coverage PATH_TO_SAGE/sage/geometry/polyhedra.py

and get

SCORE
/Volumes/D/sage-3.2.alpha0/devel/sage-main/sage/geometry/polyhedra.py:
100% (21 of 21)

to see if anything is untested.
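
To give a flavour of the idea in plain Python (this is not how Sage
implements it, just a sketch using the standard doctest module), one could
check which top-level functions of a module have at least one doctest
example. The names doctest_coverage and score are made up, and the output
format merely imitates the report above.

```python
import doctest
import inspect


def doctest_coverage(module):
    """Split a module's own top-level functions by whether they have doctests."""
    parser = doctest.DocTestParser()
    covered, missing = [], []
    for name, func in inspect.getmembers(module, inspect.isfunction):
        if func.__module__ != module.__name__:
            continue  # skip functions imported from elsewhere
        docstring = inspect.getdoc(func) or ""
        if parser.get_examples(docstring):
            covered.append(name)
        else:
            missing.append(name)
    return covered, missing


def score(module):
    """Format the result as 'percent (covered of total)'."""
    covered, missing = doctest_coverage(module)
    total = len(covered) + len(missing)
    percent = 100 * len(covered) // total if total else 100
    return "%d%% (%d of %d)" % (percent, len(covered), total)
```

Methods on classes are not counted here; a fuller version would need to
walk classes as well.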

Now that biopython is converting to numpy, I will start arguing for its 
inclusion as a standard part of Sage (right now it is an optional 
package).


Cheers,

Marshall Hampton
Integrated Biosciences Program and
Department of Mathematics and Statistics
University of Minnesota, Duluth


From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 15:27:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 15:27:12 -0500
Subject: [Biopython-dev] [Bug 2662] New: Typo in tutorial "Chapter 3
	Sequence objects "
Message-ID: <bug-2662-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2662

           Summary: Typo in tutorial "Chapter 3 Sequence objects "
           Product: Biopython
           Version: 1.49b
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: trivial
          Priority: P2
         Component: Documentation
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: wilcoxjg at gmail.com


Sentence reads:                                                                 

"First of all the Seq object has a slightly different set of METHODS TO A PLAIN
python string (for example, reverse_complement() and translate() methods used
for nucleotide sequences)."

Should be:
"methods THAN a plain python string"


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk  Fri Nov 14 15:29:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 14 Nov 2008 20:29:16 +0000
Subject: [Biopython-dev] coverage of function testing
In-Reply-To: <Pine.SOC.4.64.0811141338280.4396@ub.d.umn.edu>
References: <Pine.SOC.4.64.0811141338280.4396@ub.d.umn.edu>
Message-ID: <320fb6e00811141229j3aa3a7b6ra3a064842e8f007c@mail.gmail.com>

On Fri, Nov 14, 2008 at 7:48 PM, Marshall Hampton <mhampton at d.umn.edu> wrote:
> Hi,
>
> I noticed some discussion of the coverage and automation of testing for
> functions in biopython, and thought I would suggest folks check out the
> testing and coverage tools in Sage (www.sagemath.org).  Testing of functions
> in Sage is done by testing examples in their docstrings - there are comments
> to opt out of testing or to indicate if they will take a long time.  They
> also have scripts for checking which functions have at least one such
> testable example.  So you can do something like this:
>
> sage -coverage PATH_TO_SAGE/sage/geometry/polyhedra.py
>
> and get
>
> SCORE
> /Volumes/D/sage-3.2.alpha0/devel/sage-main/sage/geometry/polyhedra.py:
> 100% (21 of 21)
>
> to see if anything is untested.

That may be worth a go, but there are two sides to this:
(1) Making a list of the code that needs testing (pretty much the same
for any python library)
(2) Working out what is already tested (and here, that means going
over Biopython's test framework which is based on unit test, but also
includes some use of doctests).  This is probably trickier...

> Now that biopython is converting to numpy, I will start arguing for its
> inclusion as a standard part of Sage (right now it is an optional package).

That sounds good - but I have no knowledge of the Sage system and how
they divide things up.

Peter

From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 18:15:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 18:15:57 -0500
Subject: [Biopython-dev] [Bug 2662] Typo in tutorial "Chapter 3 Sequence
	objects "
In-Reply-To: <bug-2662-42@http.bugzilla.open-bio.org/>
Message-ID: <200811142315.mAENFvNc000930@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2662


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 18:15 EST -------
(In reply to comment #0)
> Sentence reads:                                                                 
> 
> "First of all the Seq object has a slightly different set of METHODS TO A
> PLAIN python string (for example, reverse_complement() and translate()
> methods used for nucleotide sequences)."

There's nothing wrong with that (and I got a second opinion on this too).  The
only thing I think that might need changing is adding a comma: "First of all,
the Seq object...".

> Should be:
> "methods THAN a plain python string"

Why exactly?  Are you an American? ;)

There is also the possible option of "... different ... from ...", but that
doesn't flow as nicely here.

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 18:47:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 18:47:16 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811142347.mAENlG5D003824@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #9 from eric.pruitt at gmail.com  2008-11-14 18:47 EST -------
Created an attachment (id=1059)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1059&action=view)
Test for speed comparison

I wrote a short program to compare the speed of the original lowess function to
my version. I thought the way the unit test was written might have affected
results. On my system, the new version ran an average of 15 seconds per test as
opposed to 19 for the old one so not the boost I originally purported but closer
to 27%. Posting the program so someone else can compare it.
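
As a quick check of the arithmetic, the 27% figure appears to be the old
run time relative to the new one; measured the other way round (time saved
relative to the old version) the same numbers give about 21%. The function
names below are only for illustration.

```python
def percent_faster(old_seconds, new_seconds):
    """How much longer the old version takes, relative to the new one."""
    return (old_seconds / new_seconds - 1.0) * 100.0


def percent_time_saved(old_seconds, new_seconds):
    """How much run time is saved, relative to the old version."""
    return (1.0 - new_seconds / old_seconds) * 100.0


# Averages quoted above: 19 seconds (old) versus 15 seconds (new).
faster = percent_faster(19.0, 15.0)
saved = percent_time_saved(19.0, 15.0)
```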


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 21:06:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 21:06:49 -0500
Subject: [Biopython-dev] [Bug 2658] 1.49b version of PDB Neighborsearch
	still based on Numeric
In-Reply-To: <bug-2658-42@http.bugzilla.open-bio.org/>
Message-ID: <200811150206.mAF26nhu013792@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2658


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp  2008-11-14 21:06 EST -------
Fixed in CVS; see Bio/PDB/NeighborSearch.py revision 1.21.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 22:59:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 22:59:22 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811150359.mAF3xM8D020801@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp  2008-11-14 22:59 EST -------
This warning is due to the introduction of Py_ssize_t in Python 2.5. The best
solution for this bug depends on which Python versions will be supported by
Biopython.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 23:04:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 23:04:00 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811150404.mAF4403S021350@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp  2008-11-14 23:04 EST -------
A few comments:

1) Is there a reason to use numpy.abs instead of Python's built-in abs? Timing
these two functions suggests that they are equally fast.
2) I have no objection against James' suggestion to speed up the code. The
original call to numpy.linalg.solve was probably overkill.
3) Can you submit a unit test that does not use scipy and rpy? We should avoid
adding additional dependencies to Biopython.
4) In the long run, I am not sure whether Biopython is the right place for the
lowess function. Probably NumPy or Matplotlib would be better. (that shouldn't
stop us from improving the code here, though).


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Sat Nov 15 02:16:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 15 Nov 2008 02:16:11 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811150716.mAF7GB1r002223@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp  2008-11-15 02:16 EST -------
I have uploaded a fixed version to CVS. Could you try it? Bio/triemodule.c,
revision 1.7.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Sat Nov 15 11:29:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 15 Nov 2008 11:29:53 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811151629.mAFGTrgj008598@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #11 from eric.pruitt at gmail.com  2008-11-15 11:29 EST -------
(In reply to comment #10)
> A few comments:
> 
> 1) Is there a reason to use numpy.abs instead of Python's built-in abs? Timing
> these two functions suggests that they are equally fast.
> 2) I have no objection against James' suggestion to speed up the code. The
> original call to numpy.linalg.solve was probably overkill.
> 3) Can you submit a unit test that does not use scipy and rpy? We should avoid
> adding additional dependencies to Biopython.
> 4) In the long run, I am not sure whether Biopython is the right place for the
> lowess function. Probably NumPy or Matplotlib would be better. (that shouldn't
> stop us from improving the code here, though).
> 

Yes, I only had the scipy and rpy dependencies in my unit test because I wanted
to have something to compare your function to when I was going to first use it
in my code and to make sure it worked after I made changes to it.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Sat Nov 15 12:07:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 15 Nov 2008 12:07:36 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811151707.mAFH7aZM010885@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


eric.pruitt at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1057 is|0                           |1
           obsolete|                            |


------- Comment #12 from eric.pruitt at gmail.com  2008-11-15 12:07 EST -------
Created an attachment (id=1060)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1060&action=view)
Updated lowess.py

Renamed "theta" to a more logical name, "weighted_mul_x." Replaced numpy.abs
with the built-in abs (this actually led to a very slight but measurable
speed increase).


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Sat Nov 15 12:08:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 15 Nov 2008 12:08:15 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811151708.mAFH8F6n010936@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


eric.pruitt at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1058 is|0                           |1
           obsolete|                            |


------- Comment #13 from eric.pruitt at gmail.com  2008-11-15 12:08 EST -------
Created an attachment (id=1061)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1061&action=view)
Unit test for lowess.py removing scipy and rpy dependencies



From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 03:36:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 03:36:32 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811170836.mAH8aWoY027949@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #14 from mdehoon at ims.u-tokyo.ac.jp  2008-11-17 03:36 EST -------
I have uploaded the new code and the unit test with some modifications to CVS.
Could you have a look at it to see if you're happy with the result? I am using
numpy.dot(x,y) instead of sum(x*y) wherever possible; this gave an additional
speedup.



From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 05:33:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 05:33:37 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811171033.mAHAXbbS003922@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-17 05:33 EST -------
I haven't tried this on Linux yet.

===================================

I've just updated to CVS and rebuilt on Windows with mingw32 (gcc 3.4.4 cygming
special), using Python 2.3, 2.4, 2.5 and 2.6 - no warnings from the Bio.Trie
code.  I should have checked for any warnings BEFORE updating to CVS, but
didn't.

===================================

However, on Mac OS X 10.5 "Leopard" I now get a lot of pointer warnings:

building 'Bio.trie' extension
creating build/temp.macosx-10.3-i386-2.5
creating build/temp.macosx-10.3-i386-2.5/Bio
gcc -arch ppc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk
-fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd
-fno-common -dynamic -DNDEBUG -g -O3 -IBio
-I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -c
Bio/triemodule.c -o build/temp.macosx-10.3-i386-2.5/Bio/triemodule.o
Bio/triemodule.c: In function '_write_value_to_handle':
Bio/triemodule.c:498: warning: passing argument 3 of
'PyString_AsStringAndSize' from incompatible pointer type
Bio/triemodule.c: In function '_write_value_to_handle':
Bio/triemodule.c:498: warning: passing argument 3 of
'PyString_AsStringAndSize' from incompatible pointer type
gcc -arch ppc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk
-fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd
-fno-common -dynamic -DNDEBUG -g -O3 -IBio
-I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -c
Bio/trie.c -o build/temp.macosx-10.3-i386-2.5/Bio/trie.o
Bio/trie.c: In function 'Trie_set':
Bio/trie.c:103: warning: pointer targets in passing argument 1 of 'strdup'
differ in signedness
Bio/trie.c:156: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:162: warning: pointer targets in passing argument 1 of 'strncpy'
differ in signedness
Bio/trie.c:162: warning: pointer targets in passing argument 2 of 'strncpy'
differ in signedness
Bio/trie.c:164: warning: pointer targets in passing argument 1 of 'strdup'
differ in signedness
Bio/trie.c: In function 'Trie_set':
Bio/trie.c:103: warning: pointer targets in passing argument 1 of 'strdup'
differ in signedness
Bio/trie.c:156: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:162: warning: pointer targets in passing argument 1 of 'strncpy'
differ in signedness
Bio/trie.c:162: warning: pointer targets in passing argument 2 of 'strncpy'
differ in signedness
Bio/trie.c:164: warning: pointer targets in passing argument 1 of 'strdup'
differ in signedness
Bio/trie.c: In function 'Trie_get':
Bio/trie.c:229: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of 'strncmp'
differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 2 of 'strncmp'
differ in signedness
Bio/trie.c:235: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c: In function '_get_approximate_transition':
Bio/trie.c:268: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:272: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:272: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:284: warning: pointer targets in passing argument 1 of 'strncat'
differ in signedness
Bio/trie.c:284: warning: pointer targets in passing argument 2 of 'strncat'
differ in signedness
Bio/trie.c: In function 'Trie_get':
Bio/trie.c:229: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of 'strncmp'
differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 2 of 'strncmp'
differ in signedness
Bio/trie.c:235: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c: In function '_get_approximate_transition':
Bio/trie.c:268: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:272: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:272: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c: In function '_get_approximate_trie':
Bio/trie.c:353: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:355: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:284: warning: pointer targets in passing argument 1 of 'strncat'
differ in signedness
Bio/trie.c:356: warning: pointer targets in passing argument 1 of 'strcat'
differ in signedness
Bio/trie.c:284: warning: pointer targets in passing argument 2 of 'strncat'
differ in signedness
Bio/trie.c:356: warning: pointer targets in passing argument 2 of 'strcat'
differ in signedness
Bio/trie.c:367: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:369: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c: In function '_get_approximate_trie':
Bio/trie.c:353: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:355: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c: In function 'Trie_has_prefix':
Bio/trie.c:356: warning: pointer targets in passing argument 1 of 'strcat'
differ in signedness
Bio/trie.c:440: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:356: warning: pointer targets in passing argument 2 of 'strcat'
differ in signedness
Bio/trie.c:441: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of 'strncmp'
differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 2 of 'strncmp'
differ in signedness
Bio/trie.c:367: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:369: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c: In function '_iterate_helper':
Bio/trie.c:468: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:470: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:475: warning: pointer targets in passing argument 1 of 'strcat'
differ in signedness
Bio/trie.c:475: warning: pointer targets in passing argument 2 of 'strcat'
differ in signedness
Bio/trie.c: In function 'Trie_has_prefix':
Bio/trie.c:440: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:441: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c: In function '_with_prefix_helper':
Bio/trie.c:521: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of 'strncmp'
differ in signedness
Bio/trie.c:522: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 2 of 'strncmp'
differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of 'strncmp'
differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 2 of 'strncmp'
differ in signedness
Bio/trie.c:530: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:536: warning: pointer targets in passing argument 1 of 'strncat'
differ in signedness
Bio/trie.c:536: warning: pointer targets in passing argument 2 of 'strncat'
differ in signedness
Bio/trie.c: In function '_iterate_helper':
Bio/trie.c:468: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:470: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:475: warning: pointer targets in passing argument 1 of 'strcat'
differ in signedness
Bio/trie.c:475: warning: pointer targets in passing argument 2 of 'strcat'
differ in signedness
Bio/trie.c: In function '_with_prefix_helper':
Bio/trie.c:521: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:522: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c: In function '_serialize_transition':
Bio/trie.c:524: warning: pointer targets in passing argument 1 of 'strncmp'
differ in signedness
Bio/trie.c:621: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 2 of 'strncmp'
differ in signedness
Bio/trie.c:530: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:536: warning: pointer targets in passing argument 1 of 'strncat'
differ in signedness
Bio/trie.c:536: warning: pointer targets in passing argument 2 of 'strncat'
differ in signedness
Bio/trie.c: In function '_serialize_transition':
Bio/trie.c:621: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c: In function '_deserialize_transition':
Bio/trie.c:708: warning: pointer targets in passing argument 1 of 'strdup'
differ in signedness
Bio/trie.c: In function 'test':
Bio/trie.c:752: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:753: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:754: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:755: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:757: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:758: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:759: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c: In function '_deserialize_transition':
Bio/trie.c:708: warning: pointer targets in passing argument 1 of 'strdup'
differ in signedness
Bio/trie.c:760: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:762: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:763: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:765: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:768: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:769: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c: In function 'test':
Bio/trie.c:752: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:753: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:754: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:755: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:757: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:758: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:759: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:760: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:762: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:763: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:765: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:768: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:769: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
gcc -arch i386 -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk -g -bundle
-undefined dynamic_lookup build/temp.macosx-10.3-i386-2.5/Bio/triemodule.o
build/temp.macosx-10.3-i386-2.5/Bio/trie.o -o
build/lib.macosx-10.3-i386-2.5/Bio/trie.so

$ python
Python 2.5.2 (r252:60911, Feb 22 2008, 07:57:53) 
[GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

$ gcc -v
Using built-in specs.
Target: i686-apple-darwin9
Configured with: /var/tmp/gcc/gcc-5465~16/src/configure --disable-checking
-enable-werror --prefix=/usr --mandir=/share/man
--enable-languages=c,objc,c++,obj-c++
--program-transform-name=/^[cg][^.-]*$/s/$/-4.0/
--with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib
--build=i686-apple-darwin9 --with-arch=apple --with-tune=generic
--host=i686-apple-darwin9 --target=i686-apple-darwin9
Thread model: posix
gcc version 4.0.1 (Apple Inc. build 5465)

Note that this gcc is only 4.0.1, while Bruce reported this bug on 4.3.2.

The good news is test_trie.py and test_triefind.py still pass.



From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 05:41:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 05:41:35 -0500
Subject: [Biopython-dev] [Bug 2666] New: Bio.PDB.NeighborSearch self test
	often fails with MemoryError
Message-ID: <bug-2666-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2666

           Summary: Bio.PDB.NeighborSearch self test often fails with
                    MemoryError
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


Running NeighborSearch.py from the Bio/PDB folder of the Biopython source
code (from CVS) does a quick self test.

This is a random test, and sometimes this is fine:

$ python NeighborSearch.py 
Found  1
Found  4
Found  3
Found  2
Found  2
Found  2
Found  3
Found  3
Found  1
Found  5
Found  2
Found  3
Found  2
Found  2
Found  2
Found  6
Found  3
Found  2
Found  3
Found  1

However, about 50% of the time I get something like this:

$ python NeighborSearch.py 
Found  2
Found  1
Found  2
Found  1
Found  1
Found  1
Found  4
Found 
Traceback (most recent call last):
  File "NeighborSearch.py", line 139, in <module>
    print "Found ", len(ns.search_all(5.0))
  File "NeighborSearch.py", line 104, in search_all
    self.kdt.all_search(radius)
  File
"/Users/pjcock/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/KDTree/KDTree.py",
line 198, in all_search
    self.neighbors = self.kdt.neighbor_search(radius)
MemoryError: calculation failed due to lack of memory

I've tried this on a Mac which had over 4 GB of RAM free at the time, so I
don't believe this really is a MemoryError.

I've also tried this on a less powerful Windows machine, which fails in the
same way (it can finish the test, but possibly with a lower success rate).

[As an aside, I'm planning to use this self test to create an actual Biopython
unit test for the Bio.PDB.NeighborSearch module.]



From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 06:42:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 06:42:24 -0500
Subject: [Biopython-dev] [Bug 2666] Bio.PDB.NeighborSearch self test often
	fails with KDTree MemoryError
In-Reply-To: <bug-2666-42@http.bugzilla.open-bio.org/>
Message-ID: <200811171142.mAHBgOD9008929@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2666


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Bio.PDB.NeighborSearch self |Bio.PDB.NeighborSearch self
                   |test often fails with       |test often fails with KDTree
                   |MemoryError                 |MemoryError


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-17 06:42 EST -------
I suspect this is failing when there are NO entries found within the specified
radius.  Changing this line:

print "Found ", len(ns.search_all(5.0))

to use a larger search radius seems to "fix" the test, e.g.

print "Found ", len(ns.search_all(10.0))

Similarly, dropping it to radius 2.0 makes it fail almost every time.  I
suspect something is amiss in the KDTree C code from the traceback.
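
For comparison, here is a brute-force sketch of what an all-pairs radius
search is expected to return. It is not the Bio.KDTree implementation; the
function name neighbor_pairs and the sample coordinates are made up for
illustration. The point is only that an empty result is a legitimate answer
when no two points lie within the radius, not an error condition:

```python
from itertools import combinations


def neighbor_pairs(points, radius):
    """Return index pairs (i, j), i < j, of points at most `radius` apart.

    Brute-force O(n^2) reference; hypothetical helper, not part of Biopython.
    """
    radius_sq = radius * radius
    pairs = []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        # Compare squared distances to avoid an unnecessary square root.
        dist_sq = sum((a - b) ** 2 for a, b in zip(p, q))
        if dist_sq <= radius_sq:
            pairs.append((i, j))
    return pairs


# Three points: the first two are 1.0 apart, the third is far away.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (50.0, 50.0, 50.0)]
print(len(neighbor_pairs(points, 5.0)))  # one pair lies within radius 5.0
print(len(neighbor_pairs(points, 0.5)))  # no pairs: empty list, no exception
```

A reference like this could also serve as the oracle in a unit test, by
comparing its output with the KDTree result on the same random coordinates.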



From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 06:44:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 06:44:45 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811171144.mAHBijrj009171@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp  2008-11-17 06:44 EST -------
(In reply to comment #3)
Yes I know; that is bug #2608.



From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 07:09:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 07:09:15 -0500
Subject: [Biopython-dev] [Bug 2666] Bio.PDB.NeighborSearch self test often
	fails with KDTree MemoryError
In-Reply-To: <bug-2666-42@http.bugzilla.open-bio.org/>
Message-ID: <200811171209.mAHC9FUF010799@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2666


------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp  2008-11-17 07:09 EST -------
I fixed Bio.KDTree and committed it to CVS; please give it a try.



From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 07:14:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 07:14:19 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811171214.mAHCEJa0011060@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-17 07:14 EST -------
(In reply to comment #4)
> (In reply to comment #3)
> Yes I know; that is bug #2608.
> 

Oh.  Sorry - I had seen Bug 2608 but hadn't made the connection.

I've just confirmed Linux with gcc 4.1.2 is still happy.

Over to Bruce to test with gcc 4.3.2 then...



From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 07:25:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 07:25:21 -0500
Subject: [Biopython-dev] [Bug 2666] Bio.PDB.NeighborSearch self test often
	fails with KDTree MemoryError
In-Reply-To: <bug-2666-42@http.bugzilla.open-bio.org/>
Message-ID: <200811171225.mAHCPLmC011729@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2666


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-17 07:25 EST -------
That's fixed it - thanks!

I've also updated test_PDB.py to include a quick test of this code, based on
the Bio/PDB/NeighborSearch.py self test code.



From tiagoantao at gmail.com  Mon Nov 17 08:27:51 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 17 Nov 2008 13:27:51 +0000
Subject: [Biopython-dev] PopGen.Stats
Message-ID: <6d941f120811170527g752c28a7j48b42569c947853d@mail.gmail.com>

After too much thinking and too much delaying (delaying in two
distinct senses: delaying this proposal, and taking more than a year
over the module itself), here is my proposal on how to proceed.

A reminder of a few fundamental points:

1. Statistics is the core of population genetics. Bio.PopGen will
never be relevant without it.
2. The framework should be future proof.
3. The API should be for general use (i.e. not only based on the cases
the developers know of).
4. It is very difficult to have a broad view of how an API like this
can be used (uses vary from population genetics of cancer with
microarrays and lots of data to conservation genetics of species with
a few samples and a small number of loci).

A waterfall approach to development is not only outdated, it would
also be quite counterproductive. So I have no bureaucratic design
document to provide.
My proposal is to choose a bunch of statistics and tests that are
representative of what people might use and implement them. During the
implementation, a reasonable API should take form through refactoring.
Which statistics should be chosen then? What are representative statistics?

I was able to find a list of classifications to start with. This list
took some inspiration from the very good Arlequin manual. Here are the
different dimensions that I found:
1. Intra-population versus inter-population statistics. Say expected
heterozygosity versus Fst.
2. Marker dependent vs marker independent. Say allelic range (for
microsatellites only) versus Fis.
3. Data type: haplotypic, genotypic phase unknown, genotypic phase
known, genotypic dominant, frequency only. Say for expected
heterozygosity frequencies are enough, for observed heterozygosity
genotypic phase unknown data is necessary.
4. Single locus (e.g. allelic richness, ExpHe, Fst) versus multi-locus
(e.g. number of polymorphic sites, LD or EHH).
5. Temporal/longitudinal vs single point in time. Say temporal-Fst versus Fst.
6. Population versus landscape. I suggest abandoning this issue for now.
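
As a small illustration of point 3, the sketch below computes expected
heterozygosity from allele frequencies alone, while observed heterozygosity
needs per-individual genotypes. The function names and data layout are
assumptions for this example only, not a proposed Bio.PopGen API, and the
expected value uses the plain 1 - sum(p_i^2) form without any sample-size
correction:

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared frequencies."""
    return 1.0 - sum(p * p for p in allele_freqs.values())


def observed_heterozygosity(genotypes):
    """Fraction of diploid individuals whose two alleles differ."""
    hets = sum(1 for a, b in genotypes if a != b)
    return hets / float(len(genotypes))


# Frequencies are enough for the expected value...
print(expected_heterozygosity({"A": 0.5, "a": 0.5}))
# ...but the observed value needs the genotypes themselves (phase not needed).
print(observed_heterozygosity([("A", "a"), ("A", "A"), ("a", "a"), ("A", "a")]))
```

The two functions take different input types, which is the kind of
difference the API would have to accommodate.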

So, the idea is to choose a set of statistics that elucidate these
points; with a good subset we will get a feeling for how everything
fits together. We implement them and then iterate until the API "feels
good". A suggested set of statistics:

ExpHz non-temporal, intra, single-locus, marker independent, genotypic
- gametic unk
ObsHz non-temporal, intra, single-locus, independent, genotypic - gametic kn
Fst(CW) non-temporal, inter, single-locus, indep, genotypic - gametic unk
temporal-Fst temporal, intra, single-locus, indep, genotypic - gametic unk
LD(D') non-temporal, intra, multi-locus, indep, haplo/geno
Fk temporal, intra, single-locus, indep, geno
S (polymorphic sites), non-temporal, intra, multi-locus, indep, haplo/geno
Allelic range, nt, intra, single-locus, microsat, haplo/geno
EHH, nt, positional
Tajima D, nt, intra, single-locus, sequence/rflp

There is still the issue of tests (say Hardy-Weinberg deviation), but
that can be thought about while the rest is being done.

The good news is that half of the above is already implemented
(exceptions are allelic range, S, Tajima D, EHH - presented in
increasing order of implementation difficulty).

I propose implementing the remaining ones (I can do that, unless anyone
else wants to give it a try) and then iterating on the API until there
is rough agreement. This can be done on Git (BTW, my username there is
tiagoantao). I propose that the ability to influence policy be roughly
proportional to the time spent coding/effort put in ;) .

PS - I am assuming a sequence is a single locus in my reasoning. Of
course it can be seen (and sometimes is) as a sequence of loci (SNPs).

From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 13:29:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 13:29:08 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811171829.mAHIT8u9006711@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


------- Comment #6 from bsouthey at gmail.com  2008-11-17 13:29 EST -------

> Over to Bruce to test with gcc 4.3.2 then...
> 

Still the same warning for Python 2.5 and 2.6:

Bio/triemodule.c: In function '_write_value_to_handle':
Bio/triemodule.c:498: warning: passing argument 3 of
'PyString_AsStringAndSize' from incompatible pointer type

See PEP 353 (http://www.python.org/dev/peps/pep-0353/), which suggests
including:
#if PY_VERSION_HEX < 0x02050000 && !defined(PY_SSIZE_T_MIN)
typedef int Py_ssize_t;
#define PY_SSIZE_T_MAX INT_MAX
#define PY_SSIZE_T_MIN INT_MIN
#endif

I did not get the warning after I added it to Bio.trie.h (as I thought that
this would be the appropriate location for it) and changed the declaration in
_write_value_to_handle for length to:
Py_ssize_t length;

But while this is fine for Python 2.3 and Python 2.4, I get this error with
Python 2.5 and Python 2.6:

[snip]
test_trie ... ERROR
test_triefind ... ok

======================================================================
ERROR: test_trie
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_trie.py", line 87, in <module>
    trieobj3 = trie.load(h)
ValueError: bad marshal data
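
The size mismatch behind the original warning can be checked from Python
itself. This is a sketch using ctypes, not part of the Biopython sources:
since Python 2.5 (PEP 353), PyString_AsStringAndSize expects a Py_ssize_t*
for its third argument, and on typical 64-bit builds Py_ssize_t is wider
than int, which is why passing an int* draws "incompatible pointer type".
It says nothing about the cause of the "bad marshal data" error above.

```python
import ctypes

int_size = ctypes.sizeof(ctypes.c_int)
# ctypes.c_ssize_t is ctypes' counterpart of the C-level Py_ssize_t.
ssize_size = ctypes.sizeof(ctypes.c_ssize_t)

print("sizeof(int)        = %d" % int_size)
print("sizeof(Py_ssize_t) = %d" % ssize_size)
if ssize_size != int_size:
    print("int* and Py_ssize_t* point at objects of different widths")
```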



From bsouthey at gmail.com  Mon Nov 17 13:35:05 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 17 Nov 2008 12:35:05 -0600
Subject: [Biopython-dev] test_GASelection hangs
In-Reply-To: <bug-2666-42@http.bugzilla.open-bio.org/>
References: <bug-2666-42@http.bugzilla.open-bio.org/>
Message-ID: <4921B959.2080706@gmail.com>

Hi,
I was just running the tests on a very fresh CVS checkout, and under
Python 2.3 the run hung in test_GASelection. Of course, there
was no problem after killing it and rerunning the tests. I think this
also pertains to bug 2651, so I thought I would ask if there is a way to
examine this further before doing anything else.  I understand that
randomization is involved, but the hang does indicate that a more
subtle problem is present.  I would really like to track down the source
of the problem.

Does anyone have any ideas on how I could try to examine this further?

Thanks
Bruce

From biopython at maubp.freeserve.co.uk  Mon Nov 17 13:50:14 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 17 Nov 2008 18:50:14 +0000
Subject: [Biopython-dev] test_GASelection hangs
In-Reply-To: <4921B959.2080706@gmail.com>
References: <bug-2666-42@http.bugzilla.open-bio.org/>
	<4921B959.2080706@gmail.com>
Message-ID: <320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com>

On Mon, Nov 17, 2008 at 6:35 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Hi,
> I was just running the test under a very fresh cvs version and under
> Python2.3 the test was hanging with test_GASelection. Of course, there was
> no problem after killing it and rerunning the test. I think this also
> pertains to bug 2651 so I thought I would ask if there was a way to examine
> this further before doing anything else.  I understand that this is problem
> with randomization involved, but it does indicate a more subtle problem is
> present.  I would really like to track down the source of the problem.
>
> Does anyone have any ideas on how I could try to examine this further?

If you have installed CVS (or indeed any recent version of Biopython,
as the GA stuff hasn't changed recently IIRC), then in the Tests
directory you can just run:

$ python test_GASelection.py

You'll find sometimes it gets stuck.  I tried modifying the file so
that the end reads as follows:

if __name__ == "__main__":
    #sys.exit(run_tests(sys.argv))

    ALL_TESTS = [DiversitySelectionTest, TournamentSelectionTest,
                 RouletteWheelSelectionTest]

    runner = unittest.TextTestRunner(sys.stdout, verbosity = 2)
    test_loader = unittest.TestLoader()
    test_loader.testMethodPrefix = 't_'

    test=ALL_TESTS[1] #Edit me: 0, 1 or 2
    cur_suite = test_loader.loadTestsFromTestCase(test)
    count = 0
    while True :
        count += 1
        print "#"*50, count
        runner.run(cur_suite)

On my machine, DiversitySelectionTest and RouletteWheelSelectionTest
seem safe - the tests just run and run until you interrupt them with
ctrl+c.

However, this clearly gets stuck in TournamentSelectionTest - so we've
narrowed this down a bit.  Reading that bit of code, there is an
apparent risk of an infinite loop if by chance org_1 happens to be the
worst organism in the population.  Perhaps adding a simple counter to
break out of the loop if after 1000 tries org_1 is still the worst -
but I'm not sure what to do then.
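
A rough sketch of the counter idea, for illustration only. The helpers
random_genome and pick_worse_than, the cap MAX_TRIES, and the use of
int(genome) as the fitness are all stand-ins I have assumed here, not the
actual test code:

```python
import random

MAX_TRIES = 1000  # assumed cap, matching the "1000 tries" suggested above


def random_genome(length=3, alphabet="0123"):
    """Hypothetical stand-in for the test's random genome generator."""
    return "".join(random.choice(alphabet) for _ in range(length))


def pick_worse_than(genome, max_tries=MAX_TRIES):
    """Return a random genome with strictly lower fitness than `genome`.

    Gives up and returns None after max_tries attempts, so the search
    cannot hang when `genome` is already the worst possible one.
    """
    target = int(genome)  # assumed fitness: the genome read as an integer
    for _ in range(max_tries):
        candidate = random_genome()
        if int(candidate) < target:
            return candidate
    return None
```

What the caller should do with a None result is still the open question.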

Peter

From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 13:59:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 13:59:26 -0500
Subject: [Biopython-dev] [Bug 2651] Error from test_GAQueens.py
In-Reply-To: <bug-2651-42@http.bugzilla.open-bio.org/>
Message-ID: <200811171859.mAHIxQgZ009193@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2651


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-17 13:59 EST -------
This is a quick hack to help pin-point the problem, assuming you have the CVS
or recent version of Biopython installed, modify the end of test_GAQueens.py as
follows:


if __name__ == "__main__":
    #sys.exit(main(sys.argv))
    count = 0
    while True :
        count +=1
        print "#"*50, count
        run_tests([])


This just repeats the test until it fails:

$ python test_GAQueens.py
...
################################################## 7
Calculating for 5 queens...
Generating an initial population of 1000 organisms...
Evolving the population and searching for a solution...
Traceback (most recent call last):
  File "test_GAQueens.py", line 405, in <module>
    run_tests([])
  File "test_GAQueens.py", line 42, in run_tests
    main(arguments)
  File "test_GAQueens.py", line 76, in main
    evolved_pop = evolver.evolve(queens_solved)
  File
"/Users/xxx/Downloads/Software/biopython-1.49b/build/lib.macosx-10.3-i386-2.5/Bio/GA/Evolver.py",
line 56, in evolve
    self._population = self._selector.select(self._population)
  File
"/Users/xxx/Downloads/Software/biopython-1.49b/build/lib.macosx-10.3-i386-2.5/Bio/GA/Selection/Tournament.py",
line 77, in select
    new_orgs[1])
  File
"/Users/xxx/Downloads/Software/biopython-1.49b/build/lib.macosx-10.3-i386-2.5/Bio/GA/Selection/Abstract.py",
line 53, in mutate_and_crossover
    final_org_1 = self._repairer.repair(final_org_1)
  File "test_GAQueens.py", line 234, in repair
    duplicated_items = self._get_duplicates(organism.genome)
  File "test_GAQueens.py", line 203, in _get_duplicates
    if genome.count(item) > 1:
  File
"/Users/xxx/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Seq.py",
line 886, in count
    raise TypeError("expected a string, Seq or MutableSeq")
TypeError: expected a string, Seq or MutableSeq

i.e. The same traceback as in Bruce's original report (allowing for the update
to the Seq object's count method), but easier to reproduce.



From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 14:18:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 14:18:24 -0500
Subject: [Biopython-dev] [Bug 2651] Error from test_GAQueens.py
In-Reply-To: <bug-2651-42@http.bugzilla.open-bio.org/>
Message-ID: <200811171918.mAHJIO5t010436@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2651


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-17 14:18 EST -------
Solved with Tests/test_GAQueens.py revision 1.3 in CVS.

When test_GAQueens.py was written, the Seq object's count method would accept
an integer argument.  Since Biopython 1.45, or to be exact Bio/Seq.py CVS
revision 1.20 (see Bug 2386), it no longer does.  This change wasn't
deliberate, but it is consistent with a Python string.
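
For comparison, a plain Python string behaves the same way. The snippet
below is only a sketch: count_item is a hypothetical helper showing one
possible workaround (converting the item to a string first), and is not
necessarily what revision 1.3 of the test does.

```python
def count_item(genome, item):
    """Count occurrences of item in a string genome, converting it to str first."""
    return genome.count(str(item))


# A plain string refuses an integer argument to count():
try:
    "0120".count(0)
except TypeError:
    raised = True
else:
    raised = False

print("str.count(int) raised TypeError: %s" % raised)
print("count_item('0120', 0) = %d" % count_item("0120", 0))
```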



From bsouthey at gmail.com  Mon Nov 17 15:03:54 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 17 Nov 2008 14:03:54 -0600
Subject: [Biopython-dev] test_GASelection hangs
In-Reply-To: <320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com>
References: <bug-2666-42@http.bugzilla.open-bio.org/>	
	<4921B959.2080706@gmail.com>
	<320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com>
Message-ID: <4921CE2A.3090606@gmail.com>

Peter wrote:
> On Mon, Nov 17, 2008 at 6:35 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>   
>> Hi,
>> I was just running the test under a very fresh cvs version and under
>> Python2.3 the test was hanging with test_GASelection. Of course, there was
>> no problem after killing it and rerunning the test. I think this also
>> pertains to bug 2651 so I thought I would ask if there was a way to examine
>> this further before doing anything else.  I understand that this is a problem
>> with randomization involved, but it does indicate a more subtle problem is
>> present.  I would really like to track down the source of the problem.
>>
>> Does anyone have any ideas on how I could try to examine this further?
>>     
>
> If you have installed CVS (or indeed any recent version of Biopython,
> as the GA stuff hasn't changed recently IIRC), then in the Tests
> directory you can just run:
>
> $ python test_GASelection.py
>
> You'll find sometimes it gets stuck.  I tried modifying the file so
> that the end reads as follows:
>
> if __name__ == "__main__":
>     #sys.exit(run_tests(sys.argv))
>
>     ALL_TESTS = [DiversitySelectionTest, TournamentSelectionTest,
>                  RouletteWheelSelectionTest]
>
>     runner = unittest.TextTestRunner(sys.stdout, verbosity = 2)
>     test_loader = unittest.TestLoader()
>     test_loader.testMethodPrefix = 't_'
>
>     test=ALL_TESTS[1] #Edit me: 0, 1 or 2
>     cur_suite = test_loader.loadTestsFromTestCase(test)
>     count = 0
>     while True :
>         count += 1
>         print "#"*50, count
>         runner.run(cur_suite)
>
> On my machine, DiversitySelectionTest and RouletteWheelSelectionTest
> seem safe - the tests just run and run until you interrupt them with
> ctrl+c.
>
> However, this clearly gets stuck in TournamentSelectionTest - so we've
> narrowed this down a bit.  Reading that bit of code, there is an
> apparent risk of an infinite loop if by chance org_1 happens to be the
> worst organism in the population.  Perhaps adding a simple counter to
> break out of the loop if after 1000 tries org_1 is still the worst -
> but I'm not sure what to do then.
>
> Peter
>
>   
Hi,
I ran the test multiple times using a bash loop and I think I tracked 
down this specific problem to within the actual test code, specifically 
the function TournamentSelectionTest.t_select_best(). I think this is what 
Peter noticed.

This is how I understand things, which I hope is sufficiently correct to 
explain it.

The test simulates a genome that has 3 locations with the 4 bases coded 
as '0', '1', '2', and '3' for an 'organism'.  (Note that the 3 locations are 
hard coded into the random_genome function.) The fitness of an organism 
is just the integer value of the coded genome, so the first position is 
hundreds, the second is tens and the last is ones.

In TournamentSelectionTest.t_select_best, a second organism is 
simulated that must have a lower fitness than the first. The problem 
arises when the simulated genome of the first organism is '000', 
because its fitness is zero. This creates an infinite loop because the 
line:
            if org_2.fitness < org_1.fitness:
will always be false, yet it must eventually be true to break the loop. 
Since there are only three locations (64 possible genomes), this should 
happen rather frequently.
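
To make the arithmetic concrete, here is a small sketch. It assumes the
fitness really is just int(genome) and that genomes are drawn uniformly at
random; the names ALPHABET, GENOME_LENGTH and fitness are mine, not the
test's.

```python
from itertools import product

ALPHABET = "0123"    # the four coded bases
GENOME_LENGTH = 3    # hard coded in the test, per the description above


def fitness(genome):
    """Assumed fitness: the genome string read as a decimal integer."""
    return int(genome)


# Enumerate every possible genome and look for one that would beat '000'
# in the 'org_2.fitness < org_1.fitness' comparison.
all_genomes = ["".join(p) for p in product(ALPHABET, repeat=GENOME_LENGTH)]
worse_than_000 = [g for g in all_genomes if fitness(g) < fitness("000")]

# Chance that a uniformly random first organism is '000'.
chance_of_000 = 1.0 / len(all_genomes)

print("genomes: %d, worse than '000': %d" % (len(all_genomes), len(worse_than_000)))
```

Under those assumptions nothing can ever be worse than '000', and about one
run in 64 would start from it.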

Is it sufficient to use the condition '<='?
Alternatively, is there some way to fix the genome of the first organism 
rather than using a random one?
For example, instead of calling random_organism(), declare it as, say:
org_1=Organism('100', test_fitness)


Bruce


From biopython at maubp.freeserve.co.uk  Mon Nov 17 16:49:02 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 17 Nov 2008 21:49:02 +0000
Subject: [Biopython-dev] test_GASelection hangs
In-Reply-To: <4921CE2A.3090606@gmail.com>
References: <bug-2666-42@http.bugzilla.open-bio.org/>
	<4921B959.2080706@gmail.com>
	<320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com>
	<4921CE2A.3090606@gmail.com>
Message-ID: <320fb6e00811171349j3bb2757epa7e52e5e55ac0c95@mail.gmail.com>

Bruce wrote:
> Peter wrote:
>> However, this clearly gets stuck in TournamentSelectionTest - so we've
>> narrowed this down a bit.  Reading that bit of code, there is an
>> apparent risk of an infinite loop if by chance org_1 happens to be the
>> worst organism in the population.  Perhaps adding a simple counter to
>> break out of the loop if after 1000 tries org_1 is still the worst -
>> but I'm not sure what to do then.
>>
>> Peter
>
> Hi,
> I ran the test multiple times using a bash loop and I think I tracked down
> this specific problem to within the actual test code, specifically the
> function TournamentSelectionTest.t_select_best(). I think this is what Peter
> noticed.

Yes, this was what I was describing.

> This is how I understand things, which I hope is sufficiently correct to
> explain it.
>
> The test simulates a genome that has 3 locations with the 4 bases coded
> as '0', '1', '2', and '3' for an 'organism'.  (Note that the 3 locations are
> hard coded into the random_genome function.) The fitness of an organism is
> just the integer value of the coded genome, so the first position is
> hundreds, the second is tens and the last is ones.
>
> In TournamentSelectionTest.t_select_best, a second organism is simulated
> that must have a lower fitness than the first. The problem arises when
> the simulated genome of the first organism is '000', because its fitness is
> zero. This creates an infinite loop because the line:
>           if org_2.fitness < org_1.fitness:
> will always be false, yet it must eventually be true to break the loop.
> Since there are only three locations (64 possible genomes), this should
> happen rather frequently.

Yes.

> Is it sufficient to use the condition '<='?

No, I don't think so.  The point of the setup seems to be to look for
a pair of organisms where one is measurably fitter than the other (and
make sure the better one is indeed selected).

> Alternatively, is there some way to fix the genome of the first organism
> rather than using a random one?
> For example, instead of calling random_organism(), declare it as, say:
> org_1=Organism('100', test_fitness)

We could do something like:

#Choose anything except the worst organism, "000",
while True :
    org_1=random_organism()
    if test_fitness(org_1) > 0 : break

[Not tested yet]

This at least is more or less random.
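
Spelled out as a self-contained sketch, with the caveat that the Organism
class below is only a minimal stand-in for the real Bio.GA one, and
genome_fitness, random_organism and make_pair are names assumed for
illustration. The check uses the organism's fitness attribute, on the
assumption that the fitness function takes a genome rather than an
organism:

```python
import random


class Organism(object):
    """Minimal stand-in for the GA organism class (an assumption)."""

    def __init__(self, genome, fitness_calculator):
        self.genome = genome
        self.fitness = fitness_calculator(genome)


def genome_fitness(genome):
    """Assumed fitness function: the genome string read as an integer."""
    return int(genome)


def random_organism():
    """Build an organism with a random 3-letter genome over '0123'."""
    genome = "".join(random.choice("0123") for _ in range(3))
    return Organism(genome, genome_fitness)


def make_pair():
    """Return (org_1, org_2) with org_2 strictly less fit than org_1."""
    # Choose anything except the worst organism, "000".
    while True:
        org_1 = random_organism()
        if org_1.fitness > 0:
            break
    # A strictly worse organism now always exists ("000" at the least),
    # so this second loop finishes with probability one.
    while True:
        org_2 = random_organism()
        if org_2.fitness < org_1.fitness:
            break
    return org_1, org_2
```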

Peter

From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 17:10:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 17:10:27 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811172210.mAHMARax021977@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #15 from eric.pruitt at gmail.com  2008-11-17 17:10 EST -------
(In reply to comment #14)
> I have uploaded the new code and the unit test with some modifications to CVS.
> Could you have a look at it to see if you're happy with the result? I am using
> numpy.dot(x,y) instead of sum(x*y) wherever possible; this gave an additional
> speedup.
> 

That worked really well; I'm happy with the results.



From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 17:22:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 17:22:52 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811172222.mAHMMq6F022720@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-17 17:22 EST -------
(In reply to comment #15)
> 
> That worked really well; I'm happy with the results.
> 

Excellent - thanks James & Michiel!

Marking this bug as fixed.



From bsouthey at gmail.com  Mon Nov 17 17:49:19 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 17 Nov 2008 16:49:19 -0600
Subject: [Biopython-dev] test_GASelection hangs
In-Reply-To: <320fb6e00811171349j3bb2757epa7e52e5e55ac0c95@mail.gmail.com>
References: <bug-2666-42@http.bugzilla.open-bio.org/>	
	<4921B959.2080706@gmail.com>	
	<320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com>	
	<4921CE2A.3090606@gmail.com>
	<320fb6e00811171349j3bb2757epa7e52e5e55ac0c95@mail.gmail.com>
Message-ID: <4921F4EF.4030005@gmail.com>

Peter wrote:
[snip]
>   
>> Alternatively, is there some way to fix the genome of the first organism
>> rather than using a random one?
>> For example, instead of calling random_organism(), declare it as, say:
>> org_1=Organism('100', test_fitness)
>>     
>
> We could do something like:
>
> #Choose anything except the worst organism, "000",
> while True :
>     org_1=random_organism()
>     if test_fitness(org_1) > 0 : break
>   
This needs to be:
if org_1.fitness > 0 : break

Also, when looping the test, I occasionally get
Test not getting an organism already in the new population. ... FAIL
Test basic selection on a small population. ... ok

======================================================================
FAIL: Test not getting an organism already in the new population.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_GASelection.py", line 130, in t_no_retrive_organism
    assert new_org != org, "Got organism already in the new population."
AssertionError: Got organism already in the new population.

I'll try to look at it tomorrow.

Bruce

PS Thanks for fixing test_GAQueens.py; I have not got it to error even 
after running it 10000 times.

From biopython at maubp.freeserve.co.uk  Mon Nov 17 18:18:12 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 17 Nov 2008 23:18:12 +0000
Subject: [Biopython-dev] test_GASelection hangs
In-Reply-To: <4921F4EF.4030005@gmail.com>
References: <bug-2666-42@http.bugzilla.open-bio.org/>
	<4921B959.2080706@gmail.com>
	<320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com>
	<4921CE2A.3090606@gmail.com>
	<320fb6e00811171349j3bb2757epa7e52e5e55ac0c95@mail.gmail.com>
	<4921F4EF.4030005@gmail.com>
Message-ID: <320fb6e00811171518p78a3c25cq527c2ef338692ad2@mail.gmail.com>

> This needs to be:
> if org_1.fitness > 0 : break

Yeah.  I've checked in a fix based on this approach, could you try
test_GASelection.py revision 1.3 just to make sure I've not done
something silly.

> Also, when looping the test, I occasionally get
> Test not getting an organism already in the new population. ... FAIL
> Test basic selection on a small population. ... ok
>
> ======================================================================
> FAIL: Test not getting an organism already in the new population.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>  File "test_GASelection.py", line 130, in t_no_retrive_organism
>   assert new_org != org, "Got organism already in the new population."
> AssertionError: Got organism already in the new population.

Confirmed - when I was just looking for the hanging sub-test, I didn't
spot this.

From my reading of the GA code there is no guarantee that
DiversitySelection will return a completely new organism.  If it has
to generate one at random, there is a small chance it will match
something already in the population.  i.e. the test itself is flawed.
We could try this say 10 times, but even then the test could fail.

I've fixed this in test_GASelection.py revision 1.4 by simply
commenting out the assert in
DiversitySelectionTest.t_no_retrive_organism.  However, maybe the
underlying Bio.GA.Selection.Diversity code could be altered instead to
guarantee this possibly desirable behaviour?
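
Purely as an illustration of what such a guarantee might look like (this is
not how Bio.GA.Selection.Diversity is written, and random_genome and
new_unique_genome are hypothetical names), one could keep drawing until the
genome is not already present, with a cap so it cannot hang when every
genome is taken:

```python
import random


def random_genome(length=3, alphabet="0123"):
    """Hypothetical random genome generator."""
    return "".join(random.choice(alphabet) for _ in range(length))


def new_unique_genome(existing, max_tries=1000):
    """Return a random genome not in `existing`, or None if none was found.

    The cap on attempts avoids an endless loop when the population
    already contains every possible genome.
    """
    seen = set(existing)
    for _ in range(max_tries):
        candidate = random_genome()
        if candidate not in seen:
            return candidate
    return None
```

With only 64 possible genomes a full population is not far-fetched, so the
None case would need handling by the caller.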

Peter

From bugzilla-daemon at portal.open-bio.org  Tue Nov 18 06:13:31 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 06:13:31 -0500
Subject: [Biopython-dev] [Bug 2670] New: Populate seqfeature.display_name
Message-ID: <bug-2670-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2670

           Summary: Populate seqfeature.display_name
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


The seqfeature table has a display_name text field, currently left blank by
Biopython's loader, but is populated by BioPerl.  This field is used in GBrowse
for example: http://gmod.org/wiki/GBrowse

We could use the protein_id, locus_tag, etc depending on what annotation is
available (ideally use the same as BioPerl).
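
A rough sketch of the fallback idea, assuming feature qualifiers are held as
a dictionary mapping each key to a list of strings. The helper name
choose_display_name and the order of preference are assumptions for
illustration; BioPerl's actual order has not been checked here.

```python
# Assumed order of preference; only protein_id and locus_tag are named above,
# "gene" is an extra guess.
PREFERRED_KEYS = ("protein_id", "locus_tag", "gene")


def choose_display_name(qualifiers, default=""):
    """Return the first available qualifier value, or default if none exist."""
    for key in PREFERRED_KEYS:
        values = qualifiers.get(key)
        if values:
            return values[0]
    return default


print(choose_display_name({"locus_tag": ["b0001"], "gene": ["thrL"]}))
```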



From bugzilla-daemon at portal.open-bio.org  Tue Nov 18 10:06:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 10:06:06 -0500
Subject: [Biopython-dev] [Bug 2671] New: Including GenomeDiagram in the main
	Biopython distribution
Message-ID: <bug-2671-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671

           Summary: Including GenomeDiagram in the main Biopython
                    distribution
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: lpritc at scri.sari.ac.uk


Thanks largely to the efforts of Robert Cadena, we have modified GenomeDiagram
so that it plays nicely with the current CVS of Biopython and would like to
propose its inclusion as part of the main distribution.

GenomeDiagram is described in a Bioinformatics publication
(http://dx.doi.org/10.1093/bioinformatics/btk021), and is useful for
construction of circular and linear  images of biological sequence data, with a
specific domain of visualisation of large-scale genomic, comparative genomic
and other data with reference to a single chromosome or other biological
sequence as publication-quality vector graphics.  It's based on the Reportlab
backend, and can be used to produce rastered and streamed image output, too.

The major changes that have been made to the version previously available at
http://bioinf.scri.ac.uk/lp are:

Class names have been changed and no longer have the GD prefix

References to 'colour' have been changed to 'color', but both spellings are
still permitted in function calls, for backwards-compatibility

The default font has been changed to 'Vera', which is shipped with Reportlab,
to avoid some problems with unavailable fonts

Code for wx widgets has been removed, although the Observer/Observable code
remains, allowing user widgets to hook into the code, if that's desirable.

Some test code is included, testing colour translation and the ability to
produce PDF output in circular and linear diagram formats.

Other minor changes to reduce deprecation warnings (those in Reportlab proper
remain, however), and to remove code that caused font issues.

There are known issues, still.  Writing to a raster format, such as PNG, uses
Reportlab's renderPM code, which defaults to using fonts that are not installed
by Reportlab itself, anymore.  This is a Reportlab issue and doesn't affect
production of PDF output, so testing currently only checks the ability to
generate PDF output.



From bugzilla-daemon at portal.open-bio.org  Tue Nov 18 10:12:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 10:12:32 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200811181512.mAIFCWJY023516@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #1 from lpritc at scri.sari.ac.uk  2008-11-18 10:12 EST -------
Created an attachment (id=1063)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1063&action=view)
GenomeDiagram code, ready to drop into Biopython CVS

Contains GenomeDiagram code under Bio.Graphics.GenomeDiagram, and test code
with examples.



From bugzilla-daemon at portal.open-bio.org  Tue Nov 18 10:44:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 10:44:29 -0500
Subject: [Biopython-dev] [Bug 2672] New: test_lowess and test_docstrings
	fail to check if numpy is installed
Message-ID: <bug-2672-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2672

           Summary: test_lowess and test_docstrings fail to check if numpy
                    is installed
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P5
         Component: Unit Tests
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


I used the CVS version with a Python 2.5 installation that does not have NumPy
installed.

Both test_lowess and test_docstrings need to check for the presence of
NumPy, like other tests that require NumPy. These tests should also be skipped
with messages like:
test_kNN ... skipping. Install NumPy if you want to use Bio.kNN. 


======================================================================
ERROR: test_docstrings
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_docstrings.py", line 18, in <module>
    import Bio.Statistics.lowess
  File
"/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/Statistics/lowess.py",
line 23, in <module>
    import numpy
ImportError: No module named numpy

======================================================================
ERROR: test_lowess
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_lowess.py", line 1, in <module>
    from Bio.Statistics.lowess import lowess
  File
"/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/Statistics/lowess.py",
line 23, in <module>
    import numpy
ImportError: No module named numpy



From bugzilla-daemon at portal.open-bio.org  Tue Nov 18 10:56:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 10:56:01 -0500
Subject: [Biopython-dev] [Bug 2672] test_lowess and test_docstrings fail to
	check if numpy is installed
In-Reply-To: <bug-2672-42@http.bugzilla.open-bio.org/>
Message-ID: <200811181556.mAIFu1o1026838@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2672


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-18 10:56 EST -------
I've fixed test_lowess.py with CVS revision 1.2 to check for numpy as in Bug
2534.

For test_docstrings.py, I think we could split this in two:

test_docstrings.py - no numpy dependence
test_docstrings_numpy.py - for modules which need numpy

Or, have some code within test_docstrings.py to adjust the list of tests
according to whether numpy is installed or not.
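
The second option might look roughly like this. It is a sketch only: the
module lists are illustrative (Bio.Statistics.lowess is the one known to
need numpy), and numpy_available and modules_to_test are hypothetical names,
not the code in the test file.

```python
def numpy_available():
    """Return True if numpy can be imported, False otherwise."""
    try:
        import numpy  # noqa: F401 (only checking that the import works)
    except ImportError:
        return False
    return True


# Illustrative lists; the real test file has its own set of modules.
BASE_MODULES = ["Bio.Seq"]
NUMPY_MODULES = ["Bio.Statistics.lowess"]


def modules_to_test(have_numpy=None):
    """Return the module names to check, adding numpy ones only if usable."""
    if have_numpy is None:
        have_numpy = numpy_available()
    modules = list(BASE_MODULES)
    if have_numpy:
        modules.extend(NUMPY_MODULES)
    return modules


print(modules_to_test())
```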



From bugzilla-daemon at portal.open-bio.org  Tue Nov 18 11:05:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 11:05:29 -0500
Subject: [Biopython-dev] [Bug 2672] test_lowess and test_docstrings fail to
	check if numpy is installed
In-Reply-To: <bug-2672-42@http.bugzilla.open-bio.org/>
Message-ID: <200811181605.mAIG5TjK027987@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2672


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-18 11:05 EST -------
(In reply to comment #1)
> For test_docstrings.py, I think we could split this in two:
> 
> test_docstrings.py - no numpy dependence
> test_docstrings_numpy.py - for modules which need numpy
> 
> Or, have some code within test_docstrings.py to adjust the list of tests
> according to whether numpy is installed or not.

I've gone for the second approach, see test_docstrings.py CVS revision 1.6

Marking as fixed.

Thanks Bruce :)



From bugzilla-daemon at portal.open-bio.org  Tue Nov 18 11:08:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 11:08:54 -0500
Subject: [Biopython-dev] [Bug 2607] Gcc "differ in signedness" warning with
	cstringfnsmodule.c
In-Reply-To: <bug-2607-42@http.bugzilla.open-bio.org/>
Message-ID: <200811181608.mAIG8ss2028159@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2607


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-18 11:08 EST -------
Since this bug was filed, we've declared this module obsolete for Biopython
1.49, and assuming we press ahead and deprecate it in Biopython 1.50 then I
don't see any point in fixing this compiler warning.

Marking as "won't fix".



From bugzilla-daemon at portal.open-bio.org  Tue Nov 18 13:35:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 13:35:25 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811181835.mAIIZPgc004892@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-18 13:35 EST -------
(In reply to comment #6)
> Still the same warning for Python 2.5 and 2.6:
> 
> Bio/triemodule.c: In function '_write_value_to_handle':
> Bio/triemodule.c:498: warning: passing argument 3 of
> 'PyString_AsStringAndSize' from incompatible pointer type

It looks like PyString_AsStringAndSize expects a Py_ssize_t length (a type
introduced in Python 2.5), and not just an int length.  Suggested patch:


Index: triemodule.c
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/triemodule.c,v
retrieving revision 1.7
diff -r1.7 triemodule.c
486a487,489
> #if PY_VERSION_HEX >= 0x02050000
>     Py_ssize_t length;
> #else
487a491
> #endif


i.e. in function  _write_value_to_handle, at line 486 replace this:

    int length;

with this:

#if PY_VERSION_HEX >= 0x02050000
    Py_ssize_t length;
#else
    int length;
#endif

This still compiles for me on Python 2.5.2 with gcc 4.0.1 on a Mac.
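
As a quick way to see why the two types are not interchangeable, their sizes
can be compared from Python. This is only a sketch: ctypes.c_ssize_t is used
here as a stand-in for Py_ssize_t, and it needs a Python recent enough to
provide it.

```python
import ctypes

# Size in bytes of a C int and of ssize_t on this platform.
int_size = ctypes.sizeof(ctypes.c_int)
ssize_size = ctypes.sizeof(ctypes.c_ssize_t)

# Where the sizes differ (typically 64 bit platforms), passing an int*
# where a Py_ssize_t* is expected is a real bug, not just a warning.
types_match = (int_size == ssize_size)

print("int: %d bytes, ssize_t: %d bytes, same size: %s"
      % (int_size, ssize_size, types_match))
```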



From bugzilla-daemon at portal.open-bio.org  Tue Nov 18 21:11:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 21:11:34 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811190211.mAJ2BYpO031573@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp  2008-11-18 21:11 EST -------
I've uploaded a slightly different version to CVS (there were more Py_ssize_t /
int issues). Could you try that one? Bio/triemodule.c, revision 1.8. We should
also see if the unit test still passes on 64 bit platforms.



From bugzilla-daemon at portal.open-bio.org  Tue Nov 18 22:08:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 22:08:43 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811190308.mAJ38hkI003686@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


------- Comment #9 from bsouthey at gmail.com  2008-11-18 22:08 EST -------
I quickly built the CVS version and the associated tests passed with the
various Python versions 2.3, 2.4, 2.5 (with and without numpy) and 2.6 on my
system.



From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 03:45:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 03:45:52 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200811190845.mAJ8jqv4023408@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #2 from lpritc at scri.sari.ac.uk  2008-11-19 03:45 EST -------
The copyright/credit section at the top of each file still needs to be changed.



From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 05:14:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 05:14:57 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191014.mAJAEv6m032436@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-19 05:14 EST -------
(In reply to comment #8)
> I've uploaded a slightly different version to CVS (there were more Py_ssize_t
> / int issues). Could you try that one? Bio/triemodule.c, revision 1.8. We
> should also see if the unit test still passes on 64 bit platforms.
> 

CVS version compiles triemodule with no warnings using Python 2.5.2 with gcc
4.0.1 on a Mac.  Unit tests pass.

CVS version compiles triemodule with no warnings using Python 2.5 with gcc
4.1.2 on Linux (i686 so 32 bit).  Unit tests pass.

CVS version compiles triemodule with no warnings using Python 2.4.3 with gcc
3.4.6 on Linux (x86_64 so 64 bit).  Unit tests pass.

It sounds like Bruce has checked all python versions with gcc 4.3.2 on Linux.



From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 07:17:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 07:17:23 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191217.mAJCHN21008817@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #11 from mdehoon at ims.u-tokyo.ac.jp  2008-11-19 07:17 EST -------
I tried several Windows versions and a 64 bit unix platform. Everything seems
to be OK. Closing this bug.



From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 09:38:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 09:38:33 -0500
Subject: [Biopython-dev] [Bug 2674] New: test_kNN: Removal of from numpy
	import *
Message-ID: <bug-2674-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2674

           Summary: test_kNN: Removal of from numpy import *
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Unit Tests
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


This test already contains an 'import numpy' statement to check that numpy is
available, so the additional 'from numpy import *' is not needed. It is
sufficient just to say 'import numpy'.
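
An availability check of this kind can be sketched roughly as follows. This is
only an illustration: require_module and MissingDependencyError are
hypothetical names, not taken from test_kNN.py or from Biopython, and a
standard-library module stands in for numpy so the snippet runs anywhere.

```python
import importlib


class MissingDependencyError(Exception):
    """Hypothetical error meaning 'skip this test, a package is missing'."""


def require_module(name):
    """Import and return the named module, or raise MissingDependencyError."""
    try:
        return importlib.import_module(name)
    except ImportError:
        raise MissingDependencyError(
            "Install %s if you want to run this test." % name
        )


# A standard-library module stands in for numpy so the sketch always runs.
json_module = require_module("json")
print(json_module.__name__)

# A module that does not exist triggers the "missing dependency" path.
try:
    require_module("no_such_module_for_this_demo")
except MissingDependencyError as error:
    print(error)
```

Once the module has been imported this way, a second star import of the same
package adds nothing except extra names in the test's namespace.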



From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 09:39:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 09:39:52 -0500
Subject: [Biopython-dev] [Bug 2674] test_kNN: Removal of from numpy import *
In-Reply-To: <bug-2674-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191439.mAJEdqkH019174@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2674


------- Comment #1 from bsouthey at gmail.com  2008-11-19 09:39 EST -------
Created an attachment (id=1064)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1064&action=view)
patch to change import numpy statement

Just for completeness.



From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 09:42:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 09:42:27 -0500
Subject: [Biopython-dev] [Bug 2675] New: Use import numpy in kNN
Message-ID: <bug-2675-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2675

           Summary: Use import numpy in kNN
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


Replacing the 'from numpy import *' statement with import numpy.
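
A minimal sketch of why the star import is worth replacing. It uses the
standard-library math module as a stand-in for numpy (an assumption made only
so that the snippet runs without extra dependencies); numpy behaves
analogously for names such as sum, any and all. The helper shadowed_builtins
is hypothetical and not part of kNN.py.

```python
import builtins
import math


def shadowed_builtins(module):
    """List public names in module that a star import would bind over builtins."""
    exported = getattr(module, "__all__", None)
    if exported is None:
        # Without __all__, a star import takes every name not starting with "_".
        exported = [name for name in dir(module) if not name.startswith("_")]
    return sorted(name for name in exported if hasattr(builtins, name))


# 'from math import *' would rebind these builtin names in the importing module.
print(shadowed_builtins(math))

# With a plain 'import math' both functions stay available and unambiguous.
print(pow(2, 3))       # builtin pow: integer result
print(math.pow(2, 3))  # math.pow: always a float
```

With the qualified form, a reader can always tell whether a call goes to the
builtin or to the library.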



From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 09:43:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 09:43:12 -0500
Subject: [Biopython-dev] [Bug 2675] Use import numpy in kNN
In-Reply-To: <bug-2675-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191443.mAJEhCXu019472@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2675


------- Comment #1 from bsouthey at gmail.com  2008-11-19 09:43 EST -------
Created an attachment (id=1065)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1065&action=view)
patch to change import numpy statement

Changes the way numpy is imported.



From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 09:53:31 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 09:53:31 -0500
Subject: [Biopython-dev] [Bug 2676] New: LogisticRegression: changed the way
	numpy is imported
Message-ID: <bug-2676-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2676

           Summary: LogisticRegression: changed the way numpy is imported
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


A patch to remove the usage of 'from numpy import *'.



From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 09:54:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 09:54:10 -0500
Subject: [Biopython-dev] [Bug 2676] LogisticRegression: changed the way
	numpy is imported
In-Reply-To: <bug-2676-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191454.mAJEsAeg020318@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2676


------- Comment #1 from bsouthey at gmail.com  2008-11-19 09:54 EST -------
Created an attachment (id=1066)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1066&action=view)
patch to change import numpy statement



From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 10:04:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 10:04:39 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191504.mAJF4diO021040@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


chapmanb at 50mail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |biopython-dev at biopython.org
         AssignedTo|biopython-dev at biopython.org |chapmanb at 50mail.com


------- Comment #3 from chapmanb at 50mail.com  2008-11-19 10:04 EST -------
Leighton;
This is great; thanks for getting it together. I took a look at this last night
and have a couple of quick comments:

- on the licensing front, the current GPL is not compatible with the Biopython
license; it would be nice to have you explicitly say you are okay with
re-licensing this version under the Biopython license
(http://www.biopython.org/DIST/LICENSE)

- Would it be possible to update the GenomeDiagram documentation from here
(http://bioinf.scri.ac.uk/lp/downloads/programs/genomediagram/userguide.pdf) to
reflect the new namespace and class name changes? Mentioning some of the
gotchas you have below, possibly to replace the installation section, would
also be nice.

I would like Peter and anyone else interested to weigh in, but I can work
on getting this in after the next release.



From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 10:13:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 10:13:46 -0500
Subject: [Biopython-dev] [Bug 2674] test_kNN: Removal of from numpy import *
In-Reply-To: <bug-2674-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191513.mAJFDkuO021701@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2674


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-19 10:13 EST -------
Fixed.



From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 10:17:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 10:17:28 -0500
Subject: [Biopython-dev] [Bug 2675] Use import numpy in kNN
In-Reply-To: <bug-2675-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191517.mAJFHSID022021@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2675


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-19 10:17 EST -------
Fixed in CVS,

Thanks.



From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 10:21:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 10:21:41 -0500
Subject: [Biopython-dev] [Bug 2676] LogisticRegression: changed the way
	numpy is imported
In-Reply-To: <bug-2676-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191521.mAJFLf8a022292@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2676


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-19 10:21 EST -------
Fixed in CVS, thanks!



From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 10:29:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 10:29:25 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191529.mAJFTPhW022858@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1062 is|0                           |1
           obsolete|                            |


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-19 10:29 EST -------
(From update of attachment 1062)
This attachment seems to have been removed (or failed to upload?).

See attachment 1063 instead.



From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 10:29:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 10:29:50 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191529.mAJFTon7022928@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-19 10:29 EST -------
(In reply to comment #3)
> 
> I would like Peter and anyone else interested to weigh in, but
> I can work on getting this in after the next release.
> 

I'm all for adding GenomeDiagram to Biopython (as stated on the mailing list).  

I haven't actually looked at this revised code base yet - but as I've used GD
before and know Leighton "in real life" it might be easier for me to shepherd
this into CVS - but the more eyes the better ;)

We might also consider getting Leighton CVS access (provisionally for use with
this module only).

Peter



From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 11:07:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 11:07:24 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191607.mAJG7OcJ025581@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #6 from lpritc at scri.sari.ac.uk  2008-11-19 11:07 EST -------
(In reply to comment #5)
> Leighton;
> This is great; thanks for getting it together. I took a look at this last night
> and have a couple of quick comments:

No problem.  Robert Cadena deserves the bulk of the credit - he made most of
the changes.

> - on the licensing front, the current GPL is not compatible with the Biopython
> license; it would be nice to have you explicitly say you are okay with
> re-licensing this version under the Biopython license
> (http://www.biopython.org/DIST/LICENSE)

I am perfectly happy with re-licensing the GD code under the Biopython license.
 If you need a gpg-signed document to say so, I can provide one ;)

> - Would it be possible to update the GenomeDiagram documentation from here
> (http://bioinf.scri.ac.uk/lp/downloads/programs/genomediagram/userguide.pdf) to
> reflect the new namespace and class name changes? 

Yep - I'll do that, next.

> Mentioning some of the
> gotchas you have below, possibly to replace the installation section, would
> also be nice.

Definitely.  Most of the gotchas are Reportlab-related, but they definitely
have a place under Installation in the docs.

> I would like Peter and anyone else interested to weigh in, but I can work
> on getting this in after the next release.

The more, the merrier... it's not my little baby anymore <sniff> it's out in
the big world ;)



From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 16:49:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 16:49:48 -0500
Subject: [Biopython-dev] [Bug 2677] New: BioSQL seqfeature enhancements
Message-ID: <bug-2677-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677

           Summary: BioSQL seqfeature enhancements
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: cymon.cox at gmail.com


Cleaned-up (sub-)seqFeature locations, and strand. Added location_operator
storage and test. Added remote location storage for sub-features, and test.

I've used the "Sequence Keys" ontology for the location operator and stored the
location operator in the location_qualifier_value table - not sure this is right...

Patches attached.



From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 16:51:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 16:51:53 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200811192151.mAJLprRP024242@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


------- Comment #1 from cymon.cox at gmail.com  2008-11-19 16:51 EST -------
Created an attachment (id=1072)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1072&action=view)
Patch for BioSQL/BioSeq.py and Loader.py



From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 16:52:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 16:52:46 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200811192152.mAJLqk91024384@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


------- Comment #2 from cymon.cox at gmail.com  2008-11-19 16:52 EST -------
Created an attachment (id=1073)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1073&action=view)
Patch for BioSQL test cases



From bugzilla-daemon at portal.open-bio.org  Thu Nov 20 05:17:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 20 Nov 2008 05:17:17 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200811201017.mAKAHHA8027467@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-20 05:17 EST -------
(In reply to comment #0)
> Cleaned-up (sub-)seqFeature locations, and strand. Added location_operator
> storage and test. Added remote location storage for sub-features, and test.
>

Excellent - I see you've removed the naive min/max to find the parent feature's
location when dealing with sub-features.  This should fix the special case
where a feature spans the origin on a circular genome.
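
To make the origin problem concrete, here is a small sketch under stated
assumptions: sub-feature locations are plain (start, end) tuples, the genome
length is made up, and naive_parent_span is a hypothetical helper rather than
the code that was in the BioSQL loader.

```python
def naive_parent_span(sub_locations):
    """Parent span as (smallest start, largest end) of the sub-features."""
    starts = [start for start, end in sub_locations]
    ends = [end for start, end in sub_locations]
    return min(starts), max(ends)


GENOME_LENGTH = 5000  # hypothetical circular genome

# Two parts on the same side of the origin: the naive span is sensible.
ordinary_parts = [(100, 200), (300, 450)]
print(naive_parent_span(ordinary_parts))  # (100, 450)

# Two parts either side of the origin, i.e. join(4900..5000, 0..50).
origin_parts = [(4900, GENOME_LENGTH), (0, 50)]
print(naive_parent_span(origin_parts))  # (0, 5000)

# The feature itself only covers 150 bases, yet the naive span is the genome.
print(sum(end - start for start, end in origin_parts))  # 150
```

The min/max span claims the whole genome for a feature that is only 150 bases
long, which is why the parent location should not be derived this way.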

That should take care of many of my "TODO" entries in test_BioSQL_SeqIO.py :)

>
> I've used the "Sequence Keys" ontology for the location operator and stored
> the location operator in the location_qualifier_value table - not sure this
> is right...
>

I'm not sure off hand either, but would like us to check before committing
this.  In the short term, whatever BioPerl does is "right" as I'm treating
that as the BioSQL reference implementation.

> 
> Patches attached.
>

I've scanned over them quickly, and they look fine.  The comments do help :)



From bugzilla-daemon at portal.open-bio.org  Thu Nov 20 05:53:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 20 Nov 2008 05:53:19 -0500
Subject: [Biopython-dev] [Bug 2662] Typo in tutorial "Chapter 3 Sequence
	objects "
In-Reply-To: <bug-2662-42@http.bugzilla.open-bio.org/>
Message-ID: <200811201053.mAKArJsp029436@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2662


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-20 05:53 EST -------
Unless anyone else wants to weigh in on Josh's side, I'm not going to change
this.  Closing bug - but thanks for reporting it anyway Josh.

Peter



From biopython at maubp.freeserve.co.uk  Thu Nov 20 05:55:57 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 20 Nov 2008 10:55:57 +0000
Subject: [Biopython-dev] Biopython 1.49 beta released
In-Reply-To: <320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com>
References: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com>
	<320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com>
Message-ID: <320fb6e00811200255x5325a7d4kf4d118350a9e7e65@mail.gmail.com>

OK,

Progress since Biopython 1.49 beta was released:

> We've had a few Numeric -> NumPy bugs reported,
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2658
> Bug 2658 - Bio.PDB.Neighborsearch

Fixed.

> http://bugzilla.open-bio.org/show_bug.cgi?id=2649
> Bug 2649 - Bio.KDTree (probably fixed)

No confirmation from the original reporter, but looks OK.

> I don't think we should release Biopython 1.49 final until these are
> resolved - but if there was interest I could put out a second beta.

No-one seems to want a second beta, which saves me some time :)

There have been a few other bugs reported and fixed in the meantime,
right now the only thing I think holding up the release of Biopython
1.49 is:

http://bugzilla.open-bio.org/show_bug.cgi?id=2677
Bug 2677 - BioSQL seqfeature enhancements

Is there anything else?

Peter

From bugzilla-daemon at portal.open-bio.org  Thu Nov 20 09:19:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 20 Nov 2008 09:19:39 -0500
Subject: [Biopython-dev] [Bug 2662] Typo in tutorial "Chapter 3 Sequence
	objects "
In-Reply-To: <bug-2662-42@http.bugzilla.open-bio.org/>
Message-ID: <200811201419.mAKEJcW6011296@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2662


------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp  2008-11-20 09:19 EST -------
I am not a native English speaker, but I do agree with Josh that the original
phrase "... different set of methods TO a plain python string" sounds strange
to me. I would suggest something along the lines of "the set of methods of a
Seq object are slightly different from those of a plain python string."
But again, that may be Double Dutch.



From bugzilla-daemon at portal.open-bio.org  Thu Nov 20 09:34:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 20 Nov 2008 09:34:25 -0500
Subject: [Biopython-dev] [Bug 2662] Typo in tutorial "Chapter 3 Sequence
	objects "
In-Reply-To: <bug-2662-42@http.bugzilla.open-bio.org/>
Message-ID: <200811201434.mAKEYPOh015951@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2662


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-20 09:34 EST -------
(In reply to comment #3)
> I am not a native English speaker, but I do agree with Josh that the original
> phrase "... different set of methods TO a plain python string" sounds strange
> to me.

As a native English speaker I'm happy with this as is, but concede
international usage may vary - and I do want the Tutorial to be as accessible
as possible.

> I would suggest something along the lines of "the set of methods of a
> Seq object are slightly different from those of a plain python string."
> But again, that may be Double Dutch.

I would say a "set of methods" is singular, but the rest of this sentence is
plural.  How about completely rephrasing:

First of all, they have some different methods (for example, Seq objects have
reverse_complement() and translate() methods used for nucleotide sequences).
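
A plain-Python sketch of the difference, for illustration only. It does not
use Biopython: the reverse_complement function below is a hypothetical helper
for unambiguous DNA, not the Seq method itself.

```python
def reverse_complement(dna):
    """Reverse complement of an unambiguous DNA string (hypothetical helper)."""
    complement_table = str.maketrans("ACGTacgt", "TGCAtgca")
    return dna.translate(complement_table)[::-1]


plain = "ATGGCC"

# A plain string has no reverse_complement() method at all ...
print(hasattr(plain, "reverse_complement"))  # False

# ... and its translate() method maps characters; it knows nothing of codons.
print(plain.translate(str.maketrans("T", "U")))  # AUGGCC

# The biological operation has to be supplied separately for plain strings.
print(reverse_complement(plain))  # GGCCAT
```

Note that plain strings do have a translate() method, but it does character
mapping rather than biological translation, which is a good reason to spell
the difference out in the Tutorial.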



From bsouthey at gmail.com  Thu Nov 20 10:09:42 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Thu, 20 Nov 2008 09:09:42 -0600
Subject: [Biopython-dev] Biopython 1.49 beta released
In-Reply-To: <320fb6e00811200255x5325a7d4kf4d118350a9e7e65@mail.gmail.com>
References: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com>	<320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com>
	<320fb6e00811200255x5325a7d4kf4d118350a9e7e65@mail.gmail.com>
Message-ID: <49257DB6.5080902@gmail.com>

Hi,
In connection with Peter's email on the forthcoming release, I was wondering 
what to do about certain modules that do not seem to be used. I started 
to look at the examples that lack test coverage in case one could do 
something for the Biopython 1.49 release. But this should not be a 
reason to delay the release, and the work may stretch beyond it.

Given the potential long term impact and spirit of people who donated 
the code, I was thinking that the release notes could denote which 
modules are unsupported and need some usage feedback.  In future 
releases the use of these modules would raise a warning about being 
unsupported or obsolete. Please note that I am not against any of these 
modules; my only concern is the requirement to maintain them and to develop 
suitable tests.

The possible modules are those that Peter previously mentioned that had 
no tests:

Bio.Affy
Bio.AlignAce
Bio.EZRetrieve
Bio.Emboss (everything except the primer parsers)
Bio.Encodings (obsolete?)
Bio.FilteredReader (obsolete?)
Bio.MaxEntropy
Bio.NMR
Bio.NaiveBayes
Bio.NetCatch (obsolete?)

I think that Bio.MaxEntropy and Bio.NaiveBayes are useful, and I did 
provide an example that is included in the code. However, I am not 
confident enough in these methods to maintain them, mainly due to my lack 
of knowledge.

Similarly for Bio.Affy: I currently work a lot with two-dye systems but 
not Affy. I find that Bio.Affy provides insufficient functionality 
because it really only reads the intensities and misses other 
important information in version 3 of the Affy format. I do recognize that 
it could be a base for Affy support that may be useful for users such as 
the PopGen users who use Affy SNP arrays.

Bruce


Peter wrote:
> OK,
>
> Progress since Biopython 1.49 beta was released:
>
>   
>> We've had a few Numeric -> NumPy bugs reported,
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2658
>> Bug 2658 - Bio.PDB.Neighborsearch
>>     
>
> Fixed.
>
>   
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2649
>> Bug 2649 - Bio.KDTree (probably fixed)
>>     
>
> No confirmation from the original reporter, but looks OK.
>
>   
>> I don't think we should release Biopython 1.49 final until these are
>> resolved - but if there was interest I could put out a second beta.
>>     
>
> No-one seems to want a second beta, which saves me some time :)
>
> There have been a few other bugs reported and fixed in the meantime,
> right now the only thing I think holding up the release of Biopython
> 1.49 is:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2677
> Bug 2677 - BioSQL seqfeature enhancements
>
> Is there anything else?
>
> Peter


From bsouthey at gmail.com  Thu Nov 20 11:26:40 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Thu, 20 Nov 2008 10:26:40 -0600
Subject: [Biopython-dev] Bio.EZRetrieve appears to be obsolete or redunant
Message-ID: <49258FC0.10703@gmail.com>

Hi,
The Bio.EZRetrieve module retrieves a single nucleotide sequence from 
EZRetrieve website:
http://siriusb.umdnj.edu:18080/EZRetrieve/single_r.jsp
It requires a human, rat or mouse nucleic GenBank, UniGene, LocusLink, 
or IMAGE ID. No other genomes are supported.

Although it appears faster than a Bio.GenBank query, I do not see that 
this module provides any functionality beyond that already 
provided by Bio.GenBank and similar. So I think this module is obsolete 
and redundant.

Notes:
1) Obviously LocusLink has been superseded by Entrez Gene.
2) The documented genome builds are from 2003 (eg human BUILD.34 at 11/04/2003) 
and it is not known whether these have been updated since.
3) The start of the sequence is zero. You can use from_='start' instead, 
but then you cannot mix it with a numerical end position.
4) The actual website provides additional information including NCBI 
links (LocusLink and Nucleic) and does base counting.
5) There are other functions provided by the website like multiple 
retrievals.

The website example is for 'homeobox B6 [/Homo sapiens/]':

import Bio.EZRetrieve
seq=Bio.EZRetrieve.retrieve_single('BC014651', 1, 20)
print seq

Gives:
 >BC014651:HOXB6                        
ACCACACCTAGGTCGGAGCA

Bruce


From bugzilla-daemon at portal.open-bio.org  Thu Nov 20 12:05:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 20 Nov 2008 12:05:22 -0500
Subject: [Biopython-dev] [Bug 2678] New: Entrez.esearch does not always
	retrieve or find DTD files
Message-ID: <bug-2678-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678

           Summary: Entrez.esearch does not always retrieve or find DTD
                    files
           Product: Biopython
           Version: 1.49b
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: lpritc at scri.sari.ac.uk


When using Entrez.esearch, I have observed an intermittent failure to recover
DTD files.  These are not being cached on successful search attempts.  It may
be worth including them in the distribution.

Traceback:

/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py:279:
UserWarning: DTD file xhtml1-strict.dtd not found in Biopython installation;
trying to retrieve it from NCBI
  warnings.warn("DTD file %s not found in Biopython installation; trying to
retrieve it from NCBI" % filename)
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py:279:
UserWarning: DTD file xhtml-lat1.ent not found in Biopython installation;
trying to retrieve it from NCBI
  warnings.warn("DTD file %s not found in Biopython installation; trying to
retrieve it from NCBI" % filename)
Traceback (most recent call last):
  File "./get_entrez_ests.py", line 158, in <module>
    main()
  File "./get_entrez_ests.py", line 45, in main
    options.verbose)
  File "./get_entrez_ests.py", line 76, in get_entrez_session
    results = Entrez.read(handle)
  File
"/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/__init__.py",
line 286, in read
    record = handler.run(handle)
  File
"/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py",
line 95, in run
    self.parser.ParseFile(handle)
  File
"/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py",
line 283, in external_entity_ref_handler
    parser.ParseFile(handle)
  File
"/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py",
line 280, in external_entity_ref_handler
    handle = urllib.urlopen(systemId)
  File
"/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py",
line 87, in urlopen
    return opener.open(url)
  File
"/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py",
line 203, in open
    return getattr(self, name)(url)
  File
"/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py",
line 461, in open_file
    return self.open_local_file(url)
  File
"/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py",
line 475, in open_local_file
    raise IOError(e.errno, e.strerror, e.filename)
IOError: [Errno 2] No such file or directory: 'xhtml-lat1.ent'



From biopython at maubp.freeserve.co.uk  Thu Nov 20 12:06:34 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 20 Nov 2008 17:06:34 +0000
Subject: [Biopython-dev] Bio.EZRetrieve appears to be obsolete or
	redunant
In-Reply-To: <49258FC0.10703@gmail.com>
References: <49258FC0.10703@gmail.com>
Message-ID: <320fb6e00811200906p4b8ba2b9jca212a39ec8f972c@mail.gmail.com>

On Thu, Nov 20, 2008 at 4:26 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Hi,
> The Bio.EZRetrieve module retrieves a single nucleotide sequence from
> EZRetrieve website:
> http://siriusb.umdnj.edu:18080/EZRetrieve/single_r.jsp
> It requires a human, rat or mouse nucleic GenBank, UniGene, LocusLink, or
> IMAGE ID. No other genomes are supported.
>
> Although it appears faster than a Bio.GenBank query, I do not see that this
> module provides any functionality beyond that already provided by
> Bio.GenBank and similar. So I think this module is obsolete and redundant.

Note the online bits of Bio.GenBank are considered obsoleted by
Bio.Entrez anyway.  Maybe we should actually deprecate these for
Biopython 1.49...

I would agree that in some ways the Bio.EZRetrieve module is also obsolete
and redundant, see also:
http://lists.open-bio.org/pipermail/biopython-dev/2008-March/003503.html

Unless anyone wants to defend Bio.EZRetrieve, let's ask on the main
list about declaring it obsolete for Biopython 1.49 (documentation
change only) and deprecating it in the next release (adding a warning
only).
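
The second step of that plan (a runtime warning in the following release)
could look roughly like the sketch below. This is illustrative only: the
helper name warn_deprecated and the message wording are hypothetical, not
the actual Biopython code, and a real module would simply issue the warning
once at import time.

```python
import warnings


def warn_deprecated(module_name, alternative):
    """Emit a DeprecationWarning naming a replacement (hypothetical helper)."""
    warnings.warn(
        "%s is deprecated and may be removed in a future release; "
        "consider using %s instead." % (module_name, alternative),
        DeprecationWarning,
        stacklevel=2,
    )


# Capture the warning here so the example shows what a user would see.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # DeprecationWarning is often hidden by default
    warn_deprecated("Bio.EZRetrieve", "Bio.Entrez")

print(caught[0].category.__name__, "-", caught[0].message)
```

Declaring a module obsolete in the documentation first gives users one
release of notice before anything changes at runtime.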

Peter

From bugzilla-daemon at portal.open-bio.org  Thu Nov 20 12:06:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 20 Nov 2008 12:06:37 -0500
Subject: [Biopython-dev] [Bug 2678] Entrez.esearch does not always retrieve
	or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200811201706.mAKH6b1r006648@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


------- Comment #1 from lpritc at scri.sari.ac.uk  2008-11-20 12:06 EST -------
And this time, more usefully, traceback with problem code:

>>> handle = Entrez.einfo()
>>> record = Entrez.read(handle)
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py:279:
UserWarning: DTD file xhtml1-strict.dtd not found in Biopython installation;
trying to retrieve it from NCBI
  warnings.warn("DTD file %s not found in Biopython installation; trying to
retrieve it from NCBI" % filename)
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py:279:
UserWarning: DTD file xhtml-lat1.ent not found in Biopython installation;
trying to retrieve it from NCBI
  warnings.warn("DTD file %s not found in Biopython installation; trying to
retrieve it from NCBI" % filename)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File
"/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/__init__.py",
line 286, in read
    record = handler.run(handle)
  File
"/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py",
line 95, in run
    self.parser.ParseFile(handle)
  File
"/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py",
line 283, in external_entity_ref_handler
    parser.ParseFile(handle)
  File
"/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py",
line 280, in external_entity_ref_handler
    handle = urllib.urlopen(systemId)
  File
"/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py",
line 87, in urlopen
    return opener.open(url)
  File
"/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py",
line 203, in open
    return getattr(self, name)(url)
  File
"/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py",
line 461, in open_file
    return self.open_local_file(url)
  File
"/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py",
line 475, in open_local_file
    raise IOError(e.errno, e.strerror, e.filename)
IOError: [Errno 2] No such file or directory: 'xhtml-lat1.ent'
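
One possible reading of this traceback: the nested entity xhtml-lat1.ent is
referenced by a relative system ID, so handing it straight to
urllib.urlopen makes urllib treat it as a local file. A hedged sketch of
resolving it against the URL of the referencing DTD first, written for
Python 3's urllib.parse. The helper name and the base URL (the usual W3C
location) are assumptions for illustration, not taken from Bio.Entrez.

```python
from urllib.parse import urljoin


def resolve_system_id(base_url, system_id):
    """Return an absolute URL for an external entity (hypothetical helper)."""
    # urljoin leaves an already absolute system_id unchanged.
    return urljoin(base_url, system_id)


base = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"  # assumed location
resolved = resolve_system_id(base, "xhtml-lat1.ent")
print(resolved)
```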



From bugzilla-daemon at portal.open-bio.org  Thu Nov 20 12:07:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 20 Nov 2008 12:07:40 -0500
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200811201707.mAKH7ej9006714@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


lpritc at scri.sari.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Entrez.esearch does not     |Bio.Entrez module does not
                   |always retrieve or find DTD |always retrieve or find DTD
                   |files                       |files



From bugzilla-daemon at portal.open-bio.org  Thu Nov 20 12:14:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 20 Nov 2008 12:14:35 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200811201714.mAKHEZj4007097@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


------- Comment #4 from cymon.cox at gmail.com  2008-11-20 12:14 EST -------
(In reply to comment #3)
> (In reply to comment #0)
> > Ive used the "Sequence Keys" ontology for the location operator and stored
> > loc op in the location_qualifier_value table - not sure this is right...
> >
> 
> I'm not sure off hand either, but would like us to check before committing
> this.  In the short term, what ever BioPerl does is "right" as I'm treating
> that as the BioSQL reference implementation.

I don't read Perl - but I grep'ed through the source and only found one ref to
the location_qualifier_value, and that was in the docs. So maybe they don't
store it there...

Sorry I can't be of more help, C.



From bugzilla-daemon at portal.open-bio.org  Thu Nov 20 17:01:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 20 Nov 2008 17:01:13 -0500
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200811202201.mAKM1Dce030238@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp  2008-11-20 17:01 EST -------
Could you make a list of the missing DTDs? You can add the missing ones to
Bio/Entrez/DTDs and reinstall Biopython. It looks like only xhtml1-strict.dtd
and xhtml-lat1.ent are missing, but after adding these to Bio/Entrez/DTDs you
may find other missing DTDs.
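
For the caching point raised in the original report, a minimal sketch of
"look locally first, otherwise download once and keep a copy" is given
below. It is not the Bio.Entrez parser code: load_dtd, the directory
argument and the injected fetch callable are all hypothetical, and the
fetch step is faked here so that the example runs without network access.

```python
import os
import tempfile


def load_dtd(filename, dtd_dir, fetch):
    """Return DTD bytes, preferring a local copy and caching any download."""
    path = os.path.join(dtd_dir, filename)
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return handle.read()
    data = fetch(filename)  # may raise if the server cannot be reached
    with open(path, "wb") as handle:
        handle.write(data)  # keep a copy so the next call stays local
    return data


calls = []


def fake_fetch(name):
    """Stand-in for a download; records which files were requested."""
    calls.append(name)
    return b"<!-- placeholder DTD content -->"


with tempfile.TemporaryDirectory() as dtd_dir:
    first = load_dtd("xhtml1-strict.dtd", dtd_dir, fake_fetch)
    second = load_dtd("xhtml1-strict.dtd", dtd_dir, fake_fetch)

print(len(calls))  # only the first call needed the fetch
```

A real cache would need a user-writable directory, since the installed
Bio/Entrez/DTDs folder may be read-only.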



From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 03:54:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 03:54:00 -0500
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200811210854.mAL8s0Dt009861@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


------- Comment #3 from lpritc at scri.sari.ac.uk  2008-11-21 03:53 EST -------
(In reply to comment #2)
> Could you make a list of the missing DTDs? You can add the missing ones to
> Bio/Entrez/DTDs and reinstall Biopython. It looks like only xhtml1-strict.dtd
> and xhtml-lat1.ent are missing, but after adding these to Bio/Entrez/DTDs you
> may find other missing DTDs.

I'll add the DTDs that I noted above, but the problem is intermittent and I
haven't seen the issue arise again at all, this morning.  If I see anything
else give an error, I'll make a note here.

This may be something to keep in mind if other, similar errors are reported
from future Entrez searches, but if the problem is the result of excessive
server load, or timeouts, it may not be reliably repeatable.



From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 05:52:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 05:52:17 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200811211052.mALAqHel020569@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-21 05:52 EST -------
(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #0)
> > > Ive used the "Sequence Keys" ontology for the location operator and stored
> > > loc op in the location_qualifier_value table - not sure this is right...
> > >
> > 
> > I'm not sure off hand either, but would like us to check before committing
> > this.  In the short term, what ever BioPerl does is "right" as I'm treating
> > that as the BioSQL reference implementation.
> 
> I don't read Perl - but I grep'ed through the source and only found one ref to
> the location_qualifier_value, and that was in the docs. So maybe they don't
> store it there...
> 
> Sorry I can't be of more help, C.
> 

I tried browsing and searching the BioPerl-db source, but couldn't find the
answer, so I tried the direct route and used their load_seqdatabase.pl script
to import a GenBank file (with at least one join location) and inspected the
tables.

The answer is that location.term_id is always left as NULL, so there is no
ontology to worry about.  Doing something sensible with ontologies (e.g.
support for existing strict ontologies like SO or SOFA) rather than the current
ad-hoc relaxed approach (adding new ontology terms on the fly) taken by BioPerl
and Biopython is a possible future enhancement.
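
To illustrate what "left as NULL" means in practice, here is a small
sqlite3 sketch. The table is a cut-down stand-in for the BioSQL location
table with only a handful of columns, and the values are made up; it is
not the Loader.py code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the BioSQL location table (assumed subset of columns).
conn.execute(
    """CREATE TABLE location (
           location_id INTEGER PRIMARY KEY,
           seqfeature_id INTEGER NOT NULL,
           term_id INTEGER,
           start_pos INTEGER,
           end_pos INTEGER,
           strand INTEGER NOT NULL DEFAULT 0
       )"""
)
# term_id is omitted from the INSERT, so the database stores NULL for it.
conn.execute(
    "INSERT INTO location (seqfeature_id, start_pos, end_pos, strand) "
    "VALUES (?, ?, ?, ?)",
    (1, 100, 250, 1),
)
row = conn.execute(
    "SELECT term_id, start_pos, end_pos, strand FROM location"
).fetchone()
print(row)
```

Python's sqlite3 module returns SQL NULL as None, so the first field of the
row comes back as None.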

I'm going to look at modifying your patch to leave location.term_id as NULL,
with the aim of committing that today and then doing the Biopython 1.49
release.



From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 06:54:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 06:54:18 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200811211154.mALBsIcR025739@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1073 is|0                           |1
           obsolete|                            |



From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 06:59:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 06:59:08 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200811211159.mALBx89Z026099@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1072 is|0                           |1
           obsolete|                            |


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-21 06:59 EST -------
(From update of attachment 1072)
Hi Cymon,

I've just checked in something based on your patches:

Checking in BioSQL/Loader.py;
/home/repository/biopython/biopython/BioSQL/Loader.py,v  <--  Loader.py
new revision: 1.37; previous revision: 1.36
done
Checking in BioSQL/BioSeq.py;
/home/repository/biopython/biopython/BioSQL/BioSeq.py,v  <--  BioSeq.py
new revision: 1.31; previous revision: 1.30
done
Checking in Tests/test_BioSQL_SeqIO.py;
/home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v  <-- 
test_BioSQL_SeqIO.py
new revision: 1.27; previous revision: 1.26
done

This should fix the strand, feature db ref in locations, and importantly the
start/end with sub-features.

I am avoiding the ontology question by leaving location.term_id as NULL
(following BioPerl usage).

I'd like to do the same with location_qualifier_value.term_id but the schema
does not allow NULL here.  Interestingly BioPerl does not seem to use this
table, so I assume they (like Biopython) have been assuming "join".

I think this is still a big improvement, but that the
(sub)feature.location_operator issue could wait.  We'll need to discuss on the
BioSQL mailing list how this should be handled consistently.

Leaving this bug open.



From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 07:04:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 07:04:39 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811211204.mALC4dUW026607@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


dalloliogm at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #1048|application/octet-stream    |text/plain
          mime type|                            |


------- Comment #23 from dalloliogm at gmail.com  2008-11-21 07:04 EST -------
(From update of attachment 1048)
changed mime type



From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 07:18:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 07:18:35 -0500
Subject: [Biopython-dev] [Bug 2662] Typo in tutorial "Chapter 3 Sequence
	objects "
In-Reply-To: <bug-2662-42@http.bugzilla.open-bio.org/>
Message-ID: <200811211218.mALCIZds027946@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2662


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-21 07:18 EST -------
Fixed in CVS revision 1.187 of biopython/Doc/Tutorial.tex by completely
rephrasing to avoid the contentious sentence structure.  See:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/Tutorial.tex?cvsroot=biopython

Now reads:
> There are two important differences between Seq objects and standard
> python strings. First of all, they have different methods. Although
> the Seq object supports many of the same methods as a plain string,
> its translate() method differs by doing biological translation, and
> there are also additional biologically relevant methods like
> reverse_complement(). Secondly, the Seq object has an important
> attribute, alphabet, which is an object describing what the individual
> characters making up the sequence string "mean", and how they should
> be interpreted. For example, is AGTACACTGGT a DNA sequence, or just
> a protein sequence that happens to be rich in Alanines, Glycines,
> Cysteines and Threonines?
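
To make the contrast concrete without depending on any particular Biopython
version, here is a plain-Python sketch. On a str, translate() is only a
character-for-character mapping; that is enough to build a reverse
complement by hand, but it has nothing to do with codons. The function
below is illustrative and is not how the Seq object is implemented.

```python
# Character mapping for unambiguous DNA letters only (illustrative).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Complement each base with str.translate, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


result = reverse_complement("AGTACACTGGT")
print(result)
```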

Peter



From biopython at maubp.freeserve.co.uk  Fri Nov 21 07:38:07 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 21 Nov 2008 12:38:07 +0000
Subject: [Biopython-dev] CVS freeze for Biopython 1.49
Message-ID: <320fb6e00811210438v272d32afta03497a846716df6@mail.gmail.com>

On Nov 20, Peter wrote:
> No-one seems to want a second beta, which saves me some time :)
>
> There have been a few other bugs reported and fixed in the meantime,
> right now the only thing I think holding up the release of Biopython
> 1.49 is:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2677
> Bug 2677 - BioSQL seqfeature enhancements

I've committed most of this bug fix to CVS, I think the remaining
issue can wait until after Biopython 1.49 is out.

> Is there anything else?

If there are no last minute objections, my plan is to do the Biopython
1.49 release this afternoon, hopefully starting after lunch - in about
one hour's time.

Please **consider CVS frozen from now**.  Hopefully I'll have the
build done within the next 12 hours, including the Windows installers.

Once the release is out, we'll give it a few days just in case there
are any issues to force a re-release, and then reopen CVS.  Tiago has
some more PopGen code waiting, and there is also GenomeDiagram to look
forward to (Bug 2671).

Peter

From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 09:46:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 09:46:29 -0500
Subject: [Biopython-dev] [Bug 2680] New: Bio.AlignAce.Parser.py need to
	import string
Message-ID: <bug-2680-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2680

           Summary: Bio.AlignAce.Parser.py need to import string
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: trivial
          Priority: P4
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


The file Bio.AlignAce.Parser.py needs to 'import string' because it uses the
function 'string.atof()'. Also, please note that string.atof() is a deprecated
function (since Python 2.0) but it will not be removed until Python 3.



From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 09:57:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 09:57:47 -0500
Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to import
	string
In-Reply-To: <bug-2680-42@http.bugzilla.open-bio.org/>
Message-ID: <200811211457.mALEvlR5009727@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2680


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-21 09:57 EST -------
This used to work via the "from Bio.ParserSupport import *", as up until
Biopython 1.48 that imported string.
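
The mechanism is easy to reproduce in isolation: a star import copies every
public name from the source module, including modules it happened to
import, so removing an import there can break code elsewhere. The sketch
below uses a throwaway module named fake_support (hypothetical, standing in
for Bio.ParserSupport).

```python
import sys
import types

# Build a throwaway module that imports string, as the support module once did.
support = types.ModuleType("fake_support")
exec("import string\n\ndef helper():\n    return 'ok'\n", support.__dict__)
sys.modules["fake_support"] = support

# A client that relies on the star import picks up "string" as a side effect.
before = {}
exec("from fake_support import *", before)
print("string" in before)

# Once the support module stops importing string, the name silently vanishes.
del support.string
after = {}
exec("from fake_support import *", after)
print("string" in after)
```

Importing string explicitly in the client, or avoiding it altogether,
removes the hidden dependency.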

Fixed in Bio/AlignAce/Parser.py revision 1.4 by importing string (this will be
included in Biopython 1.49).

I'm leaving this bug open as I would rather not use the string module here at
all - probably we can just use float() instead of string.atof() but that can
wait until after Biopython 1.49 is out.



From bsouthey at gmail.com  Fri Nov 21 10:19:22 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Fri, 21 Nov 2008 09:19:22 -0600
Subject: [Biopython-dev] Use of depreciated string functions
Message-ID: <4926D17A.8080101@gmail.com>

Hi,
There are a number of files in Bio that import string. Many of these use 
functions deprecated since Python 2.0 that now have string-method or 
built-in equivalents, mainly string.atof(), string.atoi() and string.join(). 
The only real advantage of modifying these is to remove an import statement, 
because they will not be removed until Python 3.

Perhaps the one exception is in HotRand.py: hex_digit = 
string.hexdigits.find( letter )
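
For reference, the usual replacements look like the sketch below (values
are made up). The old spellings appear only in comments because those
functions no longer exist in Python 3, whereas constants such as
string.hexdigits are still available, so a line like the HotRand.py one
would still need the import.

```python
import string  # still needed, but only for the hexdigits constant

value = float("3.14")             # was: string.atof("3.14")
count = int("42")                 # was: string.atoi("42")
joined = ",".join(["a", "b"])     # was: string.join(["a", "b"], ",")

# string.hexdigits is "0123456789abcdefABCDEF", so find() gives the digit value
# for lowercase letters.
hex_digit = string.hexdigits.find("f")

print(value, count, joined, hex_digit)
```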

There are about 23 unique files that I identified via grep and many have 
more than one usage. While changing these is busy work, please let me 
know if you would like me to create patches for the next version of 
Biopython (ie 1.50) or just ignore this.

Thanks
Bruce

From biopython at maubp.freeserve.co.uk  Fri Nov 21 10:26:52 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 21 Nov 2008 15:26:52 +0000
Subject: [Biopython-dev] Use of depreciated string functions
In-Reply-To: <4926D17A.8080101@gmail.com>
References: <4926D17A.8080101@gmail.com>
Message-ID: <320fb6e00811210726n94e277ex359d93de0855045e@mail.gmail.com>

On Fri, Nov 21, 2008 at 3:19 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Hi,
> There are a number of files in Bio that import string. Many of these use
> deprecated functions (since Version 2) that are now string methods mainly
>  string.atof(), string.atoi()  and string.join(). The only real advantage of
> modifying these is to remove an import statement because these will not be
> removed until Python 3.
>
> Perhaps the one exception is in HotRand.py: hex_digit =
> string.hexdigits.find( letter )
>
> There are about 23 unique files that I identified via grep and many have
> more than one usage. While changing these is busy work, please let me know
> if you would like me to create patches for the next version of Biopython (ie
> 1.50) or just ignore this.

As you say, there isn't much benefit from doing this other than
removing an import and making another small step towards Python 3.0
compatibility.  We have gradually been phasing out "import string"
already, usually when working on a module which used it.

Once I've dealt with Biopython 1.49, I'd be happy to look at a patch
to remove more "import string" usage from non-obsolete, non-deprecated
code.  It would be a little risky doing this to modules without unit
tests, but that's another area you've shown some interest in anyway...

Thanks,

Peter

From bartek at rezolwenta.eu.org  Fri Nov 21 10:32:02 2008
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Fri, 21 Nov 2008 16:32:02 +0100
Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to
	import string
In-Reply-To: <200811211457.mALEvlR5009727@portal.open-bio.org>
References: <bug-2680-42@http.bugzilla.open-bio.org/>
	<200811211457.mALEvlR5009727@portal.open-bio.org>
Message-ID: <8b34ec180811210732o4266a87ey2a4c14a7ddc5ead5@mail.gmail.com>

Hello,

I fixed the bug (changed both uses of string.atof() to float() ), and
committed to CVS, although I cannot close it in Bugzilla (my
dev.open-bio account does not seem to work for bugzilla).


cheers
Bartek Wilczynski

On Fri, Nov 21, 2008 at 3:57 PM,  <bugzilla-daemon at portal.open-bio.org> wrote:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2680
>
>
>
>
>
> ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-21 09:57 EST -------
> This used to work via the "from Bio.ParserSupport import *", as up until
> Biopython 1.48 that imported string.
>
> Fixed in Bio/AlignAce/Parser.py revision 1.4 by importing string (this will be
> included in Biopython 1.49).
>
> I'm leaving this bug open as I would rather not use the string module here at
> all - probably we can just use float() instead of string.atof() but that can
> wait until after Biopython 1.49 is out.


-- 
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433

From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 10:41:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 10:41:54 -0500
Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to import
	string
In-Reply-To: <bug-2680-42@http.bugzilla.open-bio.org/>
Message-ID: <200811211541.mALFfsDM013508@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2680


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-21 10:41 EST -------
Bartek's email:
> Hello,
>
> I fixed the bug (changed both uses of string.atof() to float() ),
> and committed to CVS, although I cannot close it in Bugzilla (my
> dev.open-bio account does not seem to work for bugzilla).
>
> cheers
> Bartek Wilczynski

Marking this as fixed.



From biopython at maubp.freeserve.co.uk  Fri Nov 21 10:45:42 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 21 Nov 2008 15:45:42 +0000
Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to
	import string
In-Reply-To: <8b34ec180811210732o4266a87ey2a4c14a7ddc5ead5@mail.gmail.com>
References: <bug-2680-42@http.bugzilla.open-bio.org/>
	<200811211457.mALEvlR5009727@portal.open-bio.org>
	<8b34ec180811210732o4266a87ey2a4c14a7ddc5ead5@mail.gmail.com>
Message-ID: <320fb6e00811210745yc8e796ei9bc04a2e2cebda8b@mail.gmail.com>

On Fri, Nov 21, 2008 at 3:32 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:
> Hello,
>
> I fixed the bug (changed both uses of string.atof() to float() ), and
> committed to CVS, although I cannot close it in Bugzilla (my
> dev.open-bio account does not seem to work for bugzilla).
>
> cheers
> Bartek Wilczynski

Thanks Bartek,

I was partway through the build process for the Biopython 1.49
release, but I've got that latest Bio/AlignAce/Parser.py file now.
I've closed Bug 2680 - I'm not sure how the permissions work on
Bugzilla exactly...

On a related note - could you write a unit test for Bio.AlignAce please?

Thanks,

Peter

From biopython at maubp.freeserve.co.uk  Fri Nov 21 11:07:00 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 21 Nov 2008 16:07:00 +0000
Subject: [Biopython-dev] Warnings from epydoc
Message-ID: <320fb6e00811210807xed03553x24e3abc571e9f20a@mail.gmail.com>

Hi all,

Something that I could have mentioned when I built the beta is there
are a lot of warnings from epydoc.  Ignoring a few from deprecated
modules etc, there is a whole class as follows:

Warning: Module Bio.KDTree.KDTree is shadowed by a variable with the same name.
Warning: Module Bio.PDB.DSSP is shadowed by a variable with the same name.
Warning: Module Bio.PDB.FragmentMapper is shadowed by a variable with the same name.
Warning: Module Bio.PDB.NeighborSearch is shadowed by a variable with the same name.
Warning: Module Bio.PDB.PDBIO is shadowed by a variable with the same name.
Warning: Module Bio.PDB.PDBList is shadowed by a variable with the same name.
Warning: Module Bio.PDB.PDBParser is shadowed by a variable with the same name.
Warning: Module Bio.PDB.ResidueDepth is shadowed by a variable with the same name.
Warning: Module Bio.PDB.StructureAlignment is shadowed by a variable with the same name.
Warning: Module Bio.PDB.Superimposer is shadowed by a variable with the same name.
Warning: Module Bio.PDB.Vector is shadowed by a variable with the same name.
Warning: Module Bio.PDB.parse_pdb_header is shadowed by a variable with the same name.
Warning: Module Bio.SVDSuperimposer.SVDSuperimposer is shadowed by a variable with the same name.
Warning: Module Bio.SCOP.Residues is shadowed by a variable with the same name.

One visible side effect of this in the epydoc output is these modules
get shown with an apostrophe suffix for disambiguation.
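
One common way a module can end up shadowed by a variable of the same name
is a package __init__.py that imports a class named after its own module.
Whether that is the cause for every module listed above is an assumption.
The sketch below builds a throwaway package (the names shadow_demo and
Widget are hypothetical) to show the effect:

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package "shadow_demo" containing a module "Widget"
# that defines a class "Widget" (all names are hypothetical).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "shadow_demo")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "Widget.py"), "w") as handle:
    handle.write("class Widget(object):\n    pass\n")
with open(os.path.join(pkg_dir, "__init__.py"), "w") as handle:
    # This import rebinds the package attribute "Widget" to the class.
    handle.write("from shadow_demo.Widget import Widget\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
import shadow_demo

# The attribute on the package is now the class, not the submodule...
attribute_is_class = isinstance(shadow_demo.Widget, type)
# ...although the submodule itself is still registered in sys.modules.
module_still_loaded = "shadow_demo.Widget" in sys.modules

print(attribute_is_class, module_still_loaded)
```

After the import, shadow_demo.Widget refers to the class, so a documentation
tool that walks package attributes sees two different objects competing for
one dotted name.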

On another point, I think some of the imports used in Bio.PopGen are
making epydoc unhappy:

+-------------------------------------------------------------------------------------------------
| In /usr/local/lib/python2.5/site-packages/Bio/PopGen/SimCoal/Cache.py:
| Import failed (but source code parsing was successful).
|     Error: ImportError: No module named PopGen.SimCoal.Controller (line 14)
|
+-------------------------------------------------------------------------------------------------
| In /usr/local/lib/python2.5/site-packages/Bio/PopGen/SimCoal/Async.py:
| Import failed (but source code parsing was successful).
|     Error: ImportError: No module named PopGen.SimCoal.Controller (line 16)
|

Taking Bio/PopGen/SimCoal/Cache.py as an example, currently this has:

from PopGen.SimCoal.Controller import SimCoalController
from PopGen import Config

Perhaps this should be changed to either local imports:

from Controller import SimCoalController
import Config

or full imports:

from Bio.PopGen.SimCoal.Controller import SimCoalController
from Bio.PopGen import Config

(Neither tested yet).

I don't know if the current imports have any downsides (apart from
upsetting epydoc), as the current code works and the unit tests pass.
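
To illustrate the "full imports" option, here is a small self-contained
sketch. It builds a temporary package tree under the hypothetical names
demo_bio and PopGenDemo (not the real Bio package) and shows that the full
dotted path resolves when only the parent directory is on sys.path, while
the path without the top-level package name does not. It does not attempt
to explain why the current Biopython imports work.

```python
import importlib
import os
import sys
import tempfile

# Throwaway tree: <root>/demo_bio/PopGenDemo/SimCoal/Controller.py
root = tempfile.mkdtemp()
levels = [
    "demo_bio",
    os.path.join("demo_bio", "PopGenDemo"),
    os.path.join("demo_bio", "PopGenDemo", "SimCoal"),
]
for level in levels:
    os.mkdir(os.path.join(root, level))
    # An empty __init__.py marks each directory as a regular package.
    open(os.path.join(root, level, "__init__.py"), "w").close()
with open(os.path.join(root, levels[-1], "Controller.py"), "w") as handle:
    handle.write("class SimCoalController(object):\n    pass\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# Full import: works because "demo_bio" sits directly on sys.path.
full = importlib.import_module("demo_bio.PopGenDemo.SimCoal.Controller")

# Without the top-level name there is no "PopGenDemo" on sys.path.
try:
    importlib.import_module("PopGenDemo.SimCoal.Controller")
    short_form_works = True
except ImportError:
    short_form_works = False

print(full.SimCoalController.__name__, short_form_works)
```

The second import fails with the same kind of "No module named ..." error
that epydoc reports above.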

Peter

From bsouthey at gmail.com  Fri Nov 21 11:15:29 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Fri, 21 Nov 2008 10:15:29 -0600
Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need
 to	import string
In-Reply-To: <320fb6e00811210745yc8e796ei9bc04a2e2cebda8b@mail.gmail.com>
References: <bug-2680-42@http.bugzilla.open-bio.org/>	<200811211457.mALEvlR5009727@portal.open-bio.org>	<8b34ec180811210732o4266a87ey2a4c14a7ddc5ead5@mail.gmail.com>
	<320fb6e00811210745yc8e796ei9bc04a2e2cebda8b@mail.gmail.com>
Message-ID: <4926DEA1.7020405@gmail.com>

Peter wrote:
> On Fri, Nov 21, 2008 at 3:32 PM, Bartek Wilczynski
> <bartek at rezolwenta.eu.org> wrote:
>   
>> Hello,
>>
>> I fixed the bug (changed both uses of string.atof() to float() ), and
>> committed to CVS, although I cannot close it in Bugzilla (my
>> dev.open-bio account does not seem to work for bugzilla).
>>
>> cheers
>> Bartek Wilczynski
>>     
>
> Thanks Bartek,
>
> I was partway through the build process for the Biopython 1.49
> release, but I've got that latest Bio/AlignAce/Parser.py file now.
> I've closed Bug 2680 - I'm not sure how the permissions work on
> Bugzilla exactly...
>
> On a related note - could you write a unit test for Bio.AlignAce please?
>
> Thanks,
>
> Peter
Hi Bartek,
I have just started working through the code to understand its
functionality, so it would be really great to have tests and a tutorial
section on AlignAce.

So far I know that there need to be at least two tests for AlignAce:
1) Running Bio.AlignAce.AlignAceStandalone
2) Parsing the output from AlignAce

There need to be similar tests for CompareAce.

Also, could you please add the following lines to your AlignAce2004 code 
(I downloaded it from your site yesterday) to standard.h?

#include <limits.h>
#include <string.h>

I needed these to compile AlignAce under Linux with gcc version 4.3.2. I
would also suggest not including the binaries, because they are linked
against old C++ libraries. Running just './AlignACE' gives the error:
./AlignACE: error while loading shared libraries: libstdc++.so.5: cannot open shared object file: No such file or directory

Thanks
Bruce

From biopython at maubp.freeserve.co.uk  Fri Nov 21 11:59:08 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 21 Nov 2008 16:59:08 +0000
Subject: [Biopython-dev] Biopython 1.49 released
Message-ID: <320fb6e00811210859n2d128fd6nc21ad1012e1d93bf@mail.gmail.com>

Dear Biopythoneers,

We are pleased to announce the release of Biopython 1.49. There have
been some significant changes since Biopython 1.48 was released a few
months ago, which is why we initially released a beta for wider
testing. Thank you to all those who tried this and reported the minor
problems uncovered.

As previously announced, the big news is that Biopython now uses NumPy
rather than its precursor Numeric (the original Numerical Python
library).

As in the previous releases, Biopython 1.49 supports Python 2.3, 2.4
and 2.5 but should now also work fine on Python 2.6. Please note that
we intend to drop support for Python 2.3 in a couple of releases' time.

We also have some new functionality, starting with the basic sequence
object (the Seq class) which now has more methods. This encourages a
more object orientated coding style, and makes basic biological
operations like transcription and translation more accessible and
discoverable.

Our BioSQL interface can now optionally fetch the NCBI taxonomy on
demand when loading sequences (via Bio.Entrez) allowing you to
populate the taxon/taxon_name tables gradually. Also, BioSQL should
now work with the psycopg2 driver for PostgreSQL (as well as the older
psycopg driver), and the handling of feature locations has also been
improved.

We've also updated the Biopython Tutorial and Cookbook (also available in PDF).
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Finally, our old parsing infrastructure (Martel and Bio.Mindy) is now
considered to be deprecated, meaning mxTextTools is no longer required
to use Biopython. This should not affect any of the typically used
parsers (e.g. Bio.SeqIO and Bio.AlignIO).

Given there have been more changes than in recent Biopython releases,
please do check your old scripts still work fine, and let us know on
the mailing list or file a bug if there is anything wrong.

Source distributions and Windows installers are available from the
Biopython website:
http://biopython.org/wiki/Download

Thanks!

-Peter on behalf of the Biopython developers

P.S. You may wish to subscribe to our news feed.  For RSS links etc, see:
http://biopython.org/wiki/News

From biopython at maubp.freeserve.co.uk  Fri Nov 21 12:05:46 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 21 Nov 2008 17:05:46 +0000
Subject: [Biopython-dev] CVS freeze for Biopython 1.49
In-Reply-To: <320fb6e00811210438v272d32afta03497a846716df6@mail.gmail.com>
References: <320fb6e00811210438v272d32afta03497a846716df6@mail.gmail.com>
Message-ID: <320fb6e00811210905i4835819bvb4955b05658ef535@mail.gmail.com>

> If there are no last minute objections, my plan is to do the Biopython
> 1.49 release this afternoon, hopefully starting after lunch - in about
> one hour's time.
>
> Please **consider CVS frozen from now**.  Hopefully I'll have the
> build done within the next 12 hours, including the Windows installers.

OK, the release is out.  Thanks everyone!  I haven't sat down and
counted, but it feels like there were more people involved and taking
an interest than for Biopython 1.48, which is great.

> Once the release is out, we'll give it a few days just in case there
> are any issues to force a re-release, and then reopen CVS.

The CVS "freeze" is over, but for the next couple of days, please only
commit small bug fixes and documentation improvements.  Barring any
surprises, we can expect to start looking at adding new code mid next
week:

> Tiago has some more PopGen code waiting, and there is also
> GenomeDiagram to look forward to (Bug 2671).

Have a good weekend,

Regards,

Peter

From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 12:24:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 12:24:55 -0500
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200811211724.mALHOt8x003395@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-21 12:24 EST -------
Looking at the code for the external_entity_ref_handler function in
Bio/Entrez/Parser.py, it doesn't actually attempt to cache missing DTD files.

Would this be a worthwhile enhancement?  We would have to cope with the fact
that the process may not have permissions to write to the DTD directory,
perhaps by falling back on the system temp folder?
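
A minimal sketch of the fallback idea, assuming a helper that does not exist
in Bio.Entrez (the function name choose_dtd_cache_dir and the folder name
dtd_cache_demo are made up for illustration):

```python
import os
import tempfile


def choose_dtd_cache_dir(preferred_dir):
    """Return preferred_dir if it is a writable directory, else a temp fallback."""
    if os.path.isdir(preferred_dir) and os.access(preferred_dir, os.W_OK):
        return preferred_dir
    # Fall back to a folder inside the system temp directory.
    fallback = os.path.join(tempfile.gettempdir(), "dtd_cache_demo")
    os.makedirs(fallback, exist_ok=True)
    return fallback


# A freshly created temp directory is writable, so it is used as-is.
writable = tempfile.mkdtemp()
picked_writable = choose_dtd_cache_dir(writable)

# A directory that does not exist triggers the fallback.
picked_fallback = choose_dtd_cache_dir(os.path.join(writable, "no_such_dir"))

print(picked_writable == writable, os.path.isdir(picked_fallback))
```

A real implementation would also have to decide how long cached files stay
valid and what to do when even the temp folder is not writable.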



From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 14:22:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 14:22:36 -0500
Subject: [Biopython-dev] [Bug 2591] GenBank files misparsed for long
	organism names
In-Reply-To: <bug-2591-42@http.bugzilla.open-bio.org/>
Message-ID: <200811211922.mALJMa8Q011752@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2591


------- Comment #3 from joelb at lanl.gov  2008-11-21 14:22 EST -------
I never heard back from info at genbank, so I found a different contact there and
I just re-sent the problem.  I'll follow up when I hear something.



From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 14:31:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 14:31:26 -0500
Subject: [Biopython-dev] [Bug 2681] New: BioSQL: record annotations
	enhancements
Message-ID: <bug-2681-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681

           Summary: BioSQL: record annotations enhancements
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: cymon.cox at gmail.com


BioSQL storage and retrieval of record annotations. See also bug 2396.


Patch fixes 3 annotations:

1) Fixed date/dates typo.
2) Comments were being stored but not retrieved - fixed with test.
3) A 'reference' annotation, even if an empty list, was being retrieved in a
DBSeqRecord. Fixed so that if there are no references there is no annotation in
DBSeqRecord.

Other annotations:

'date', 'ncbi_taxid', 'gi', and 'contig' are the only annotations we are not
handling correctly in the test suite.

'date' can be ignored if present in DBSeqRecord but absent in SeqRecord because
the current date is entered into table if a date is not present in the record.

Annotation 'ncbi_taxid' will be present in the DBSeqRecords even when not
present in the loaded SeqRecord as they are grabbed from the taxon table. We
can therefore ignore this specific comparison: old record absent, new record
present. Some swiss prot SeqRecords have ncbi_taxid and they are retrieved
correctly by DBSeqRecord. TODO: others have ncbi_taxid that is missing from the
retrieved DBSeqRecord: sp012, sp014, 

Swissprot, fasta, and EMBL SeqRecords don't have a gi annotation, but the
retrieved DBSeqRecords do. If the gi annotation is missing, the loader uses
the 'record_id' (line 522) as the identifier in bioentry, and this is then
pulled back as the gi annotation. So the swissprot, fasta, and embl
DBSeqRecords return the accession as the gi (GenInfo Identifier). I think
this is misleading; annotation 'gi' in the DBSeqRecord should really be named
a more generic 'identifier'...  What to do here?

'contig' is ignored by loader because it's a SeqFeature object. Is there any
reason it couldn't be loaded and retrieved? (record is GenBank/NT_019265.gb)
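
The comparison rules described above can be sketched as follows. This is
only an illustration using plain dictionaries: the function name
compare_annotations and the sample values are made up, and it is not the
code in test_BioSQL_SeqIO.py.

```python
def compare_annotations(old, new):
    """Return annotation keys that differ between old and new records.

    old: annotations of the SeqRecord that was loaded.
    new: annotations of the DBSeqRecord that was retrieved.
    """
    # Keys that may legitimately appear only in the retrieved record:
    # 'date' is filled in on loading, 'ncbi_taxid' comes from the taxon table.
    may_be_added = {"date", "ncbi_taxid"}
    problems = []
    for key in sorted(set(old) | set(new)):
        if key not in old and key in may_be_added:
            continue  # old record absent, new record present: ignore
        if key not in old or key not in new:
            problems.append(key)  # present on one side only
        elif old[key] != new[key]:
            problems.append(key)  # present on both sides but different
    return problems


# Made-up sample annotations for illustration.
old = {"organism": "Homo sapiens", "comment": "example"}
new = {
    "organism": "Homo sapiens",
    "comment": "example",
    "date": "21-NOV-2008",
    "ncbi_taxid": "9606",
    "gi": "P12345",
}
differences = compare_annotations(old, new)
print(differences)
```

With these sample values only 'gi' is reported, which matches the open
question above about the accession being returned as the gi.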



From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 14:32:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 14:32:43 -0500
Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements
In-Reply-To: <bug-2681-42@http.bugzilla.open-bio.org/>
Message-ID: <200811211932.mALJWhXP012653@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681


------- Comment #1 from cymon.cox at gmail.com  2008-11-21 14:32 EST -------
Created an attachment (id=1074)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1074&action=view)
BioSQL patch for enhancements to record annotations



From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 17:41:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 17:41:16 -0500
Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements
In-Reply-To: <bug-2681-42@http.bugzilla.open-bio.org/>
Message-ID: <200811212241.mALMfGT8026797@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-21 17:41 EST -------
(In reply to comment #0)
> 1) Fixed date/dates typo.

Why is it a typo?  Change not checked in.

> 2) Comments were being stored but not retrieved - fixed with test.

Looks good, except for returning an empty list if there were no comments.

> 3) A 'reference' annotation, even if an empty list, was being retrieved in a
> DBSeqRecord. Fixed so that if there are no references there is no annotation
> in DBSeqRecord.

I agree, but preferred a smaller change for this:

Checking in BioSQL/BioSeq.py;
/home/repository/biopython/biopython/BioSQL/BioSeq.py,v  <--  BioSeq.py
new revision: 1.33; previous revision: 1.32
done
Checking in Tests/test_BioSQL_SeqIO.py;
/home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v  <-- 
test_BioSQL_SeqIO.py
new revision: 1.29; previous revision: 1.28
done

This was based closely on your patch, so thank you!  You are making steady
progress through the remaining "TODO" notes I left when writing
test_BioSQL_SeqIO.py :)

> Some swiss prot SeqRecords have ncbi_taxid and they are retrieved
> correctly by DBSeqRecord. TODO: others have ncbi_taxid that is missing
> from the retrieved DBSeqRecord: sp012, sp014, 

Note some swiss prot records may be multi-species, which the BioSQL schema
can't cope with.  Not sure if that applies here.

> Swissprot, fasta, and EMBL SeqRecords don't have a gi annotation, retrieved
> DBSeqRecords do. Loader uses the 'record_id' (line 522) as the identifier in
> bioentry, if the gi annotation is missing, which is pulled as the gi
> annotation.

There probably is something not quite right here.  Are you talking about the
bioentry.identifier entry in the database?  Perhaps an explicit example might
help.  As an aside, I think "gi" (GenInfo Identifier used by NCBI) might be better
stored in the record.dbxrefs, but that could be a parser change...

> 'contig' is ignored by loader because it's a SeqFeature object. Is there any
> reason it couldn't be loaded and retrieved? (record is GenBank/NT_019265.gb)

I couldn't even say off hand how the CONTIG line in that example would be
parsed, let alone how it gets dealt with when loading into BioSQL.



From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 17:42:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 17:42:33 -0500
Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements
In-Reply-To: <bug-2681-42@http.bugzilla.open-bio.org/>
Message-ID: <200811212242.mALMgXAN026914@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-21 17:42 EST -------
P.S. For a little background, see Bug 2396.  Looking back I can see why I
missed the comments annotation at the time (being stored in a different table).



From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 18:47:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 18:47:13 -0500
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200811212347.mALNlDsF030565@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp  2008-11-21 18:47 EST -------
(In reply to comment #4)
> Looking at the code for the external_entity_ref_handler function in
> Bio/Entrez/Parser.py, it doesn't actually attempt to cache missing DTD files.
> 
> Would this be a worthwhile enhancement?  We would have to cope with the fact
> that the process may not have permissions to write to the DTD directory,
> perhaps by falling back on the system temp folder?
> 
I think that there is an easier solution, which is to include all missing DTDs
with the Biopython installation. The number of DTDs is limited; I tried to
identify all of them but apparently I missed some. 



From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 18:49:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 18:49:27 -0500
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200811212349.mALNnRMn030720@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp  2008-11-21 18:49 EST -------
> I'll add the DTDs that I noted above, but the problem is intermittent and I
> haven't seen the issue arise again at all, this morning.  If I see anything
> else give an error, I'll make a note here.
> 
If the DTD is available locally in Bio/Entrez/DTDs, then Bio.Entrez will read
it from there. If not, it tries to download it. This may fail if the servers
are busy. If the needed DTDs are saved in Bio/Entrez/DTDs (and installed when
Biopython is installed), you won't run into this problem.
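
The lookup order described above (local copy first, download only if it is
missing) can be sketched like this. The function open_dtd and the injected
download callable are hypothetical and are not the Bio.Entrez API; the
download is simulated so that the example runs offline.

```python
import os
import tempfile


def open_dtd(filename, local_dir, download):
    """Return DTD text, preferring a local copy over download(filename)."""
    path = os.path.join(local_dir, filename)
    if os.path.exists(path):
        with open(path) as handle:
            return handle.read()
    # No local copy: fall back to the (possibly unreliable) download.
    return download(filename)


# Demo directory with one locally available DTD (made-up content).
local_dir = tempfile.mkdtemp()
with open(os.path.join(local_dir, "example.dtd"), "w") as handle:
    handle.write("<!ELEMENT demo (#PCDATA)>")


def busy_server(filename):
    """Simulated download that always fails, as a busy server might."""
    raise IOError("could not download %s" % filename)


# The local file is found, so the failing download is never called.
local_text = open_dtd("example.dtd", local_dir, busy_server)

# A DTD that is not available locally hits the failing download.
try:
    open_dtd("missing.dtd", local_dir, busy_server)
    missing_failed = False
except IOError:
    missing_failed = True

print(local_text, missing_failed)
```

Shipping every needed DTD in the local directory means the second branch is
never reached, which is the point made above.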



From bugzilla-daemon at portal.open-bio.org  Sun Nov 23 10:16:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 23 Nov 2008 10:16:53 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200811231516.mANFGraa019222@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #7 from dalloliogm at gmail.com  2008-11-23 10:16 EST -------
(In reply to comment #0)

> The major changes that have been made to the version previously available at
> http://bioinf.scri.ac.uk/lp are:

That's a very nice contribution, thank you!!!
This link is wrong, I think you mean
http://bioinf.scri.ac.uk/lp/programs.php#genomediagram



From dalloliogm at gmail.com  Sun Nov 23 12:33:54 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Sun, 23 Nov 2008 18:33:54 +0100
Subject: [Biopython-dev] blog article on GenomeDiagram in Biopython
Message-ID: <5aa3b3570811230933n2de8af3lf31d3c4b962930a3@mail.gmail.com>

Hi people,
I thought that the inclusion of GenomeDiagram in Biopython is such
interesting news that I wrote a blog post on it:
- http://bioinfoblog.it/2008/11/genome-diagrams-included-in-biopython-150/

I have used images from some tutorials without asking, I hope it is
not a problem.
Cheers! :)


On Sun, Nov 23, 2008 at 4:16 PM,  <bugzilla-daemon at portal.open-bio.org> wrote:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2671
>
>

-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it

From mjldehoon at yahoo.com  Mon Nov 24 01:44:13 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 23 Nov 2008 22:44:13 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
Message-ID: <871524.42970.qm@web62403.mail.re1.yahoo.com>

Hi everybody,

Biopython's testing framework is built on top of Python's unit testing framework. Python's unit testing framework makes use of assertion statements to compare the result of a command to the expected result. Biopython uses test scripts that print output to stdout, together with an output file that contains the correct output. After running each test script, it compares the generated output with the correct output to see if the test was successful.

This approach can be useful for modules that deal with different file formats. For example, you can read in a file in one format, write it out in a different format, and compare it with the expected result.

However, more than half of Biopython's tests do not actually make use of this testing framework:

test_BioSQL
test_CAPS
test_Cluster
test_CodonTable
test_Compass
test_Crystal
test_DocSQL
test_EmbossPrimer
test_Entrez
test_Fasta
test_GACrossover
test_GAMutation
test_GAOrganism
test_GAQueens
test_GARepair
test_GASelection
test_GFF
test_GFF2
test_GraphicsChromosome
test_GraphicsDistribution
test_GraphicsGeneral
test_HMMCasino
test_HMMGeneral
test_HotRand
test_KDTree
test_KeyWList
test_LogisticRegression
test_Medline
test_NNExclusiveOr
test_NNGene
test_NNGeneral
test_Pathway
test_PopGen_FDist
test_PopGen_FDist_nodepend
test_PopGen_SimCoal
test_PopGen_SimCoal_nodepend
test_Registry
test_Restriction
test_SCOP_Astral
test_SCOP_Cla
test_SCOP_Des
test_SCOP_Dom
test_SCOP_Hie
test_SCOP_Raf
test_SCOP_Residues
test_SCOP_Scop
test_Wise
test_docstrings
test_kNN
test_lowess
test_psw

These tests have trivial output, for example test_Cluster:

test_Cluster
test_clusterdistance (test_Cluster.TestCluster) ... ok
test_distancematrix_kmedoids (test_Cluster.TestCluster) ... ok
test_kcluster (test_Cluster.TestCluster) ... ok
test_matrix_parse (test_Cluster.TestCluster) ... ok
test_median_mean (test_Cluster.TestCluster) ... ok
test_somcluster (test_Cluster.TestCluster) ... ok
test_treecluster (test_Cluster.TestCluster) ... ok

----------------------------------------------------------------------
Ran 7 tests in 0.015s

OK

I suspect that for many of the remaining tests Biopython's unit testing framework doesn't bring any real advantage, but is used anyway solely because it currently is the standard in Biopython.

Personally, I find Python's unit testing framework easier to understand than Biopython's testing framework. It doesn't need a separate output file, and it is easier to match each line of code with the correct behavior.
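
For readers less familiar with it, here is a minimal sketch of a test
written directly with Python's unittest module. The function median and the
test class are made up for illustration and are not Biopython code. Each
expected result sits next to the call that produces it, and no separate
output file is needed.

```python
import unittest


def median(values):
    """Hypothetical function under test (not Biopython code)."""
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2.0


class TestMedian(unittest.TestCase):
    """Each test method states the expected result as an assertion."""

    def test_odd_length(self):
        self.assertEqual(median([3, 1, 2]), 2)

    def test_even_length(self):
        self.assertEqual(median([4, 1, 3, 2]), 2.5)


# Run the tests explicitly; a standalone script would normally end with
# unittest.main() instead.
suite = unittest.TestLoader().loadTestsFromTestCase(TestMedian)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

The verbose runner prints one "... ok" line per test method, in the same
style as the test_Cluster output shown above.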

I would therefore like to suggest moving from Biopython's testing framework to Python's testing framework. This also relieves us of the task of explaining Biopython's testing framework to contributors, and allows us to make better use of what Python already provides. Comparing output line-by-line, as Biopython's testing framework currently does, can still be used by test scripts that need this functionality.

Comments, suggestions, anybody?

--Michiel.


From dalloliogm at gmail.com  Mon Nov 24 04:04:08 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Mon, 24 Nov 2008 10:04:08 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <871524.42970.qm@web62403.mail.re1.yahoo.com>
References: <871524.42970.qm@web62403.mail.re1.yahoo.com>
Message-ID: <5aa3b3570811240104m1442e5dfkd0c0f92c6fa772f9@mail.gmail.com>

On Mon, Nov 24, 2008 at 7:44 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
> Biopython's testing framework is built on top of Python's unit testing framework. Python's unit testing framework makes use of assertion statements to compare the result of a command to the expected result.

Hi,
I was also proposing to use the doctest framework for some of the
modules, and for enhancing documentation.

- http://bugzilla.open-bio.org/show_bug.cgi?id=2640
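
As a rough idea of what the doctest approach looks like, here is a small
sketch. The function gc_fraction is made up for illustration and is not
taken from Biopython or from the bug report. The examples in the docstring
serve both as documentation and as tests.

```python
import doctest


def gc_fraction(sequence):
    """Return the fraction of G and C letters in a sequence.

    >>> gc_fraction("GGCC")
    1.0
    >>> gc_fraction("ATGC")
    0.5
    >>> gc_fraction("")
    0.0
    """
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / float(len(upper))


# Collect and run the docstring examples explicitly; inside a module one
# would normally just call doctest.testmod().
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(gc_fraction, "gc_fraction", module=False,
                        globs={"gc_fraction": gc_fraction}):
    runner.run(test)

print(runner.failures, runner.tries)
```

If an example in the docstring stops matching the real output, the runner
reports it as a failure, so the documentation cannot silently go stale.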


> Biopython uses test scripts that print output to stdout, together with an output file that contains the
> correct output. After running each test script, it compares the generated output with the correct
> output to see if the test was successful.
>
> This approach can be useful for modules that deal with different file formats. For example, you can read in a file in one format, write it out in a different format, and compare it with the expected result.
>
> However, more than half of Biopython's tests do not actually make use of this testing framework:
>

Do you need help in re-organizing all of these modules?

> test_BioSQL
> test_CAPS
> test_Cluster
> test_CodonTable
> test_Compass
> test_Crystal
> test_DocSQL
> test_EmbossPrimer
> test_Entrez
> test_Fasta
> test_GACrossover
> test_GAMutation
> test_GAOrganism
> test_GAQueens
> test_GARepair
> test_GASelection
> test_GFF
> test_GFF2
> test_GraphicsChromosome
> test_GraphicsDistribution
> test_GraphicsGeneral
> test_HMMCasino
> test_HMMGeneral
> test_HotRand
> test_KDTree
> test_KeyWList
> test_LogisticRegression
> test_Medline
> test_NNExclusiveOr
> test_NNGene
> test_NNGeneral
> test_Pathway
> test_PopGen_FDist
> test_PopGen_FDist_nodepend
> test_PopGen_SimCoal
> test_PopGen_SimCoal_nodepend
> test_Registry
> test_Restriction
> test_SCOP_Astral
> test_SCOP_Cla
> test_SCOP_Des
> test_SCOP_Dom
> test_SCOP_Hie
> test_SCOP_Raf
> test_SCOP_Residues
> test_SCOP_Scop
> test_Wise
> test_docstrings
> test_kNN
> test_lowess
> test_psw
>
> These tests have trivial output, for example test_Cluster:
>
> test_Cluster
> test_clusterdistance (test_Cluster.TestCluster) ... ok
> test_distancematrix_kmedoids (test_Cluster.TestCluster) ... ok
> test_kcluster (test_Cluster.TestCluster) ... ok
> test_matrix_parse (test_Cluster.TestCluster) ... ok
> test_median_mean (test_Cluster.TestCluster) ... ok
> test_somcluster (test_Cluster.TestCluster) ... ok
> test_treecluster (test_Cluster.TestCluster) ... ok
>
> ----------------------------------------------------------------------
> Ran 7 tests in 0.015s
>
> OK
>
> I suspect that for many of the remaining tests Biopython's unit testing framework doesn't bring any real advantage, but is used anyway solely because it currently is the standard in Biopython.
>
> Personally, I find Python's unit testing framework easier to understand than Biopython's testing framework. It doesn't need a separate output file, and it is easier to match each line of code with the correct behavior.
>
> I would therefore like to suggest moving from Biopython's testing framework to Python's testing framework. This also relieves us of the task of explaining Biopython's testing framework to contributors, and allows us to make better use of what Python already provides. Comparing output line-by-line, as Biopython's testing framework currently does, can still be used by test scripts that need this functionality.
>
> Comments, suggestions, anybody?
>
> --Michiel.


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it

From bartek at rezolwenta.eu.org  Mon Nov 24 07:45:52 2008
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 24 Nov 2008 13:45:52 +0100
Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to
	import string
In-Reply-To: <320fb6e00811210745yc8e796ei9bc04a2e2cebda8b@mail.gmail.com>
References: <bug-2680-42@http.bugzilla.open-bio.org/>
	<200811211457.mALEvlR5009727@portal.open-bio.org>
	<8b34ec180811210732o4266a87ey2a4c14a7ddc5ead5@mail.gmail.com>
	<320fb6e00811210745yc8e796ei9bc04a2e2cebda8b@mail.gmail.com>
Message-ID: <8b34ec180811240445w3e6e97d8k38c1740e84372184@mail.gmail.com>

On Fri, Nov 21, 2008 at 4:45 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

>
> On a related note - could you write a unit test for Bio.AlignAce please?
>

Hi Peter,

I do not have much experience with writing unit tests but I would like
to do it (treating it as an opportunity to learn more on unit tests).

There are two issues which are somewhat related to this:
- I have some more code related to sequence motif analysis which I'm
using myself and could contribute as an extension to Bio.AlignAce. If
people are interested in having this in biopython, it would be
sensible to think about refactoring Bio.AlignAce and Bio.MEME which
both provide a Motif class with largely overlapping functionality. I
could do that and at the same time write unit tests for the new
version. For that it would be cool to get input from all current or
potential users of this functionality. I'll think about it a little
and maybe write to biopython-users list.
- The other issue is connected with the type of the tests I should
write. Since Michiel brought this topic up recently, I'd like to know
whether I should do it in the python (doctest) or biopython way.

cheers
Bartek


-- 
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433

From bartek at rezolwenta.eu.org  Mon Nov 24 09:51:12 2008
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 24 Nov 2008 15:51:12 +0100
Subject: [Biopython-dev] Refactoring motif analysis code
Message-ID: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com>

Hello All,

Currently, there are two packages dealing with motif analysis in biopython :
Bio.AlignAce (written by me) and Bio.MEME (written by Jason Hackney).

Both of them are quite old and they were developed independently so
the functionality is largely overlapping.
In particular, the files AlignAce/Motif.py and MEME/Motif.py contain
almost identical functionality useful for
anyone interested in motif analysis or writing a parser for yet
another motif searching tool.

I'd like to change this and create a new library called Bio.Motif,
which would contain:
-Motif class for all general functionality concerning motif objects:
i/o, comparisons, sequence scanning
-AlignAce Parser
-MEME Parser

When this is completed, we could deprecate the AlignAce and MEME
modules. For AlignAce I have most of the code
already written, I need to rewrite portions of MEME parser to work
with different motif implementation (not a major pain).
Then I just need to polish it a bit and provide tests and a short tutorial.

After this rather long intro I'd like to ask about several things:
- Are there many Bio.AlignAce or Bio.MEME users who would be unhappy
about deprecating them?
- Are there any features which people would find valuable in Bio.Motif?
- Both MEME and AlignAce are DNA-oriented. I've never worked on
protein motifs myself, but I'd like to know whether anyone is
interested in using Bio.Motif for that.

Any comments/ideas are welcome

cheers
Bartek

-- 
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433

From dalloliogm at gmail.com  Mon Nov 24 10:25:23 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Mon, 24 Nov 2008 16:25:23 +0100
Subject: [Biopython-dev] Refactoring motif analysis code
In-Reply-To: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com>
References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com>
Message-ID: <5aa3b3570811240725n54f7f624oc1db5fe0b88e3f5a@mail.gmail.com>

On Mon, Nov 24, 2008 at 3:51 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:
> Hello All,
>
> Currently, there are two packages dealing with motif analysis in biopython :
> Bio.AlignAce (written by me) and Bio.MEME (written by Jason Hackney).

Hi, I asked a question about motifs one year ago on this list.
Here is the thread:
- http://lists.open-bio.org/pipermail/biopython/2007-September/003727.html

I would just like to tell you that I have tried the TAMO framework you
suggested to me, and found it very useful.
I am not using it anymore because I don't need it, but I remember that I liked:
- the methods to represent motifs as matrices of frequencies/occurrences, etc.
- the fact that it was easy to create a motif from an alignment of sequences
- the integration it had with this website:
http://weblogo.berkeley.edu/logo.cgi.
I would suggest that you provide integration with this other web
service, which can plot the difference between two sequence
logos: http://www.twosamplelogo.org/examples.html.

Maybe you should contact TAMO's author to ask him if he wants to
contribute, because I remember that the framework was really complete.
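
As a rough illustration of the two features mentioned above (building a
motif from aligned sequences and representing it as a matrix of
occurrences or frequencies), here is a minimal sketch in plain Python.
The function names are hypothetical; this is neither TAMO's interface
nor the proposed Bio.Motif API, and the example sequences are made up.

```python
from collections import Counter


def count_matrix(instances, alphabet="ACGT"):
    """Return one {letter: count} dict per alignment column."""
    if not instances:
        raise ValueError("need at least one sequence")
    length = len(instances[0])
    if any(len(seq) != length for seq in instances):
        raise ValueError("all sequences must have the same length")
    columns = []
    for position in range(length):
        counts = Counter(seq[position].upper() for seq in instances)
        columns.append({letter: counts.get(letter, 0) for letter in alphabet})
    return columns


def frequency_matrix(instances, alphabet="ACGT"):
    """Return the count matrix divided by the number of sequences."""
    total = len(instances)
    return [
        {letter: count / total for letter, count in column.items()}
        for column in count_matrix(instances, alphabet)
    ]


def consensus(instances, alphabet="ACGT"):
    """Return the most frequent letter per column (ties go to alphabet order)."""
    return "".join(
        max(alphabet, key=column.get)
        for column in count_matrix(instances, alphabet)
    )


# Made-up aligned instances, for illustration only.
sites = ["TATAAT", "TATGAT", "TACAAT", "GATAAT"]
print(consensus(sites))
print(frequency_matrix(sites)[0])
```

A real implementation would also need pseudocounts and background
frequencies before such a matrix could be used for sequence scanning.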


>
> Both of them are quite old and they were developed independently so
> the functionality is largely overlapping.
> Particularly the files AlignAce/Motif.py  and MEME/Motif.py contain
> almost identical functionality useful for
> anyone interested in motif analysis or writing a parser for yet
> another motif searching tool.
>
> I'd like to change this and create a new library called Bio.Motif,
> which would contain:
> -Motif class for all general functionality concerning motif objects:
> i/o, comparisons, sequence scanning
> -AlignAce Parser
> -MEME Parser
>
> When this is completed, we could deprecate the AlignAce and MEME
> modules. For AlignAce I have most of the code
> already written, I need to rewrite portions of MEME parser to work
> with different motif implementation (not a major pain).
> Then I just need to polish it a bit and provide tests and a short tutorial.
>
> After this rather long intro I'd like to ask about several things:
> - Are there many Bio.AlignAce or Bio.MEME users who would be unhappy
> about deprecating them?
> - Are there any features which people would find valuable in Bio.Motif
> - Both MEME and AlignAce are DNA-oriented, I've never worked on
> Protein motifs myself, but I'd like to know whether anyone is
> interested in using Bio.Motif for that
>
> Any comments/ideas are welcome
>
> cheers
> Bartek


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it

From bsouthey at gmail.com  Mon Nov 24 10:54:32 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 24 Nov 2008 09:54:32 -0600
Subject: [Biopython-dev] Refactoring motif analysis code
In-Reply-To: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com>
References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com>
Message-ID: <492ACE38.1090301@gmail.com>

Bartek Wilczynski wrote:
> Hello All,
>
> Currently, there are two packages dealing with motif analysis in biopython :
> Bio.AlignAce (written by me) and Bio.MEME (written by Jason Hackney).
>   
Actually I am not that thrilled with the licenses for these packages and
similar packages because these are free only for academic use. To me
this clashes with the spirit of an open-source project, especially a
BSD-licensed one. But if there is a need for such modules then these
modules should be included.

> Both of them are quite old and they were developed independently so
> the functionality is largely overlapping.
> Particularly the files AlignAce/Motif.py  and MEME/Motif.py contain
> almost identical functionality useful for
> anyone interested in motif analysis or writing a parser for yet
> another motif searching tool.
>
> I'd like to change this and create a new library called Bio.Motif,
> which would contain:
> -Motif class for all general functionality concerning motif objects:
> i/o, comparisons, sequence scanning
> -AlignAce Parser
> -MEME Parser
>
>   
While it is only free for academic use, have you seen TAMO?
*TAMO: a flexible, object-oriented framework for analyzing 
transcriptional regulation using DNA-sequence motifs. *
Bioinformatics. 2005 Jul 15;21(14):3164-5. 
<http://bioinformatics.oxfordjournals.org/cgi/content/abstract/21/14/3164>

http://fraenkel.mit.edu/TAMO/


> When this is completed, we could deprecate the AlignAce and MEME
> modules. For AlignAce I have most of the code
> already written, I need to rewrite portions of MEME parser to work
> with different motif implementation (not a major pain).
> Then I just need to polish it a bit and provide tests and a short tutorial.
>
> After this rather long intro I'd like to ask about several things:
> - Are there many Bio.AlignAce or Bio.MEME users who would be unhappy
> about deprecating them?
>   
Well, I am not sure how many used Bio.AlignAce given the Parser.py bug :-)
Based on the CVS, both have been untouched for about three years.

Also, what species are these used for?
One of the AlignAce papers indicates that the base composition was set
for yeast.

> - Are there any features which people would find valuable in Bio.Motif
> - Both MEME and AlignAce are DNA-oriented, I've never worked on
> Protein motifs myself, but I'd like to know whether anyone is
> interested in using Bio.Motif for that
>
> Any comments/ideas are welcome
>
> cheers
> Bartek
>
>   
Personally I would be interested in a general protein motif finding 
module because of my current research. However, I do have a different 
view with respect to the Biopython community as indicated above with the 
licenses.

Bruce

From bsouthey at gmail.com  Mon Nov 24 12:47:21 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 24 Nov 2008 11:47:21 -0600
Subject: [Biopython-dev] Use of depreciated string functions
In-Reply-To: <320fb6e00811210726n94e277ex359d93de0855045e@mail.gmail.com>
References: <4926D17A.8080101@gmail.com>
	<320fb6e00811210726n94e277ex359d93de0855045e@mail.gmail.com>
Message-ID: <492AE8A9.1000406@gmail.com>

Peter wrote:
> On Fri, Nov 21, 2008 at 3:19 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>   
>> Hi,
>> There are a number of files in Bio that import string. Many of these use
>> deprecated functions (since Version 2) that are now string methods, mainly
>>  string.atof(), string.atoi()  and string.join(). The only real advantage of
>> modifying these is to remove an import statement because these will not be
>> removed until Python 3.
>>
>> Perhaps the one exception is in HotRand.py: hex_digit =
>> string.hexdigits.find( letter )
>>
>> There are about 23 unique files that I identified via grep and many have
>> more than one usage. While changing these is busy work, please let me know
>> if you would like me to create patches for the next version of Biopython (ie
>> 1.50) or just ignore this.
>>     
>
> As you say, there isn't much benefit from doing this other than
> removing an import and making another small step towards Python 3.0
> compatibility.  We have gradually been phasing out "import string"
> already, usually when working on a module which used it.
>
> Once I've dealt with Biopython 1.49, I'd be happy to look at a patch
> to remove more "import string" usage from non-obsolete, non-deprecated
> code.  It would be a little risky doing this to modules without unit
> tests, but that's another area you've shown some interest in anyway...
>
> Thanks,
>
> Peter
>
>   
Hi,
I was planning to get started on these, depending on what time I
have available. So just a quick question:
Do you want one bug report per patch per file?

Or just let me know if there is another way.
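
For reference, the kind of change being discussed looks roughly like
the sketch below. The functions parse_fields and hex_value are
hypothetical examples, not code taken from any Biopython module; the
old string-module calls are shown only in comments because they do not
exist in Python 3.

```python
import string


def parse_fields(line):
    """Split a whitespace-separated line into (count, score, label)."""
    fields = line.split()
    count = int(fields[0])        # was: string.atoi(fields[0])
    score = float(fields[1])      # was: string.atof(fields[1])
    label = "-".join(fields[2:])  # was: string.join(fields[2:], "-")
    return count, score, label


def hex_value(letter):
    """Look up a hex digit, in the style of the HotRand.py line quoted above.

    string.hexdigits is a constant rather than a deprecated function,
    so this usage still needs "import string".
    """
    return string.hexdigits.find(letter)


print(parse_fields("7 0.25 A C G"))
print(hex_value("a"))
```

Modules that only use the deprecated functions can drop the import
entirely once the calls are replaced; modules that use constants such
as string.hexdigits cannot.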

Thanks
Bruce

From biopython at maubp.freeserve.co.uk  Mon Nov 24 13:42:08 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 24 Nov 2008 18:42:08 +0000
Subject: [Biopython-dev] Use of depreciated string functions
In-Reply-To: <492AE8A9.1000406@gmail.com>
References: <4926D17A.8080101@gmail.com>
	<320fb6e00811210726n94e277ex359d93de0855045e@mail.gmail.com>
	<492AE8A9.1000406@gmail.com>
Message-ID: <320fb6e00811241042g646ff65fq61d3751537c882b1@mail.gmail.com>

On Mon, Nov 24, 2008 at 5:47 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>> Once I've dealt with Biopython 1.49, I'd be happy to look at a patch
>> to remove more "import string" usage from non-obsolete, non-deprecated
>> code.  It would be a little risky doing this to modules without unit
>> tests, but that's another area you've shown some interest in anyway...
>>
>> Thanks,
>>
>> Peter
>
> Hi,
> I was planning to get started on these depending on what time I have
> available. So just a quick question:
> Do you want one bug report per patch per file?
> Or just let me know if there is another way.

I'd suggest one general bug, and uploading one patch per module - that
way they can be evaluated on a case by case basis (a single huge
multi-file patch would be more difficult, and could become out of
date).

Personally however, I would prioritise more unit test coverage over
this, but on the other hand it's the kind of short task you can handle
when you have the odd spare 10 minutes.  Up to you.

Peter

From bugzilla-daemon at portal.open-bio.org  Mon Nov 24 15:40:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 24 Nov 2008 15:40:49 -0500
Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements
In-Reply-To: <bug-2681-42@http.bugzilla.open-bio.org/>
Message-ID: <200811242040.mAOKenEi002020@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681


------- Comment #4 from cymon.cox at gmail.com  2008-11-24 15:40 EST -------
(In reply to comment #2)
> (In reply to comment #0)
> > 1) Fixed date/dates typo.
> 
> Why is it a typo?  Change not checked in.

The function _load_bioentry_date in Loader.py inserts the annotation 'date', if
present, or the current date if not, into the bioentry_qualifier_value table.
This is pulled by BioSeq.py _retrieve_qualifier_value and stored as the
attribute 'dates'. Hence I considered line 307 in BioSeq.py to be a typo, which
should be 'date' and not 'dates'. Also, because Loader.py handles dates
separately, they should not be handled by the function load_annotations.

> > 2) comments were being stored but not retrieved - fixed with test.
> 
> Looks good, except for returning an empty list if there were no comments.
> 
> > 3) A 'reference' annotation, even if an empty list, was being retrieved in a
> > DBSeqRecord. Fixed so that if there are no references there is no annotation
> > in DBSeqRecord.
> 
> I agree, but preferred a smaller change for this:
> 
> Checking in BioSQL/BioSeq.py;
> /home/repository/biopython/biopython/BioSQL/BioSeq.py,v  <--  BioSeq.py
> new revision: 1.33; previous revision: 1.32
> done
> Checking in Tests/test_BioSQL_SeqIO.py;
> /home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v  <-- 
> test_BioSQL_SeqIO.py
> new revision: 1.29; previous revision: 1.28
> done

Actually, your version of _retrieve_comment never returns comments ;-)

On the wider issue: perhaps it's best if DBSeqRecords always have the same
set of attributes, even if comments and references are empty lists. Trying to
regenerate the attributes present in the loaded SeqRecord is, I think, not the
way to go, and not possible (or at least currently not attempted) for fasta
records. Perhaps we should be coding around the issue in the test suite rather
than changing the attributes of the DBSeqRecord so that it passes the test...
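
One way the test suite could code around this is sketched below. It is
only an illustration under the assumption that annotations behave like
plain dictionaries; the helper names are hypothetical and are not part
of test_BioSQL_SeqIO.py.

```python
def drop_empty_lists(annotations):
    """Return a copy of the annotations without entries that are empty lists."""
    return {key: value for key, value in annotations.items() if value != []}


def same_annotations(original, retrieved):
    """Compare two annotation dicts, treating a missing key and [] as equal."""
    return drop_empty_lists(original) == drop_empty_lists(retrieved)


# Made-up annotations: the retrieved record carries empty placeholders.
loaded = {"organism": "Trifolium repens (white clover)"}
from_db = {
    "organism": "Trifolium repens (white clover)",
    "comment": [],
    "references": [],
}
print(same_annotations(loaded, from_db))
```

With a comparison like this the DBSeqRecord can keep a fixed set of
attributes, and the test no longer depends on whether an empty list is
present or absent.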

> > Some swiss prot SeqRecords have ncbi_taxid and they are retrieved
> > correctly by DBSeqRecord. TODO: others have ncbi_taxid that is missing
> > from the retrieved DBSeqRecord: sp012, sp014, 
> 
> Note some swiss prot records may be multi-species, which the BioSQL schema
> can't cope with.  Not sure if that applies here.

Yep, that's exactly what was causing the problem. Currently the code refuses to
load an ncbi_taxid, which I think is correct, after all which one should be
loaded? Anyway, I'll look into this a bit more...

> > Swissprot, fasta, and EMBL SeqRecords don't have a gi annotation, retrieved
> > DBSeqRecords do. Loader uses the 'record_id' (line 522) as the identifier in
> > bioentry, if the gi annotation is missing, which is pulled as the gi
> > annotation.
> 
> There probably is something not quite right here.  Are you talking about the
> bioentry.identifier entry in the database?  Perhaps an explicit example might
> help.  As an aside, I think "gi" (GeneIndex used by NCBI) might be better
> stored in the record.dbxrefs, but that could be a parser change...

Ah, OK, will look further into this as well...

> > 'contig' is ignored by loader because it's a SeqFeature object. Is there any
> > reason it couldn't be loaded and retrieved? (record is GenBank/NT_019265.gb)
> 
> I couldn't even say off hand how the CONTIG line in that example would be
> parsed, let alone how it gets dealt with when loading into BioSQL.

Well, the parser correctly deals with it as a SeqFeature (with a whole bunch of
sub_features), but it never gets loaded: it's not dealt with at all and falls off
the bottom of the function. I can't see any reason not to load it...

C.



From bugzilla-daemon at portal.open-bio.org  Mon Nov 24 16:40:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 24 Nov 2008 16:40:24 -0500
Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements
In-Reply-To: <bug-2681-42@http.bugzilla.open-bio.org/>
Message-ID: <200811242140.mAOLeO8n008996@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681


------- Comment #5 from cymon.cox at gmail.com  2008-11-24 16:40 EST -------
(In reply to comment #2)
> (In reply to comment #0)
> > Swissprot, fasta, and EMBL SeqRecords don't have a gi annotation, retrieved
> > DBSeqRecords do. Loader uses the 'record_id' (line 522) as the identifier in
> > bioentry, if the gi annotation is missing, which is pulled as the gi
> > annotation.
> 
> There probably is something not quite right here.  Are you talking about the
> bioentry.identifier entry in the database?  Perhaps an explicit example might
> help.  As an aside, I think "gi" (GeneIndex used by NCBI) might be better
> stored in the record.dbxrefs, but that could be a parser change...

The "gi" annotation of a parsed GenBank record refers to this GenInfo
Identifier:

From NCBI: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#GInB
"""
"GenInfo Identifier" sequence identification number, in this case, for the
nucleotide sequence. If a sequence changes in any way, a new GI number will be
assigned. GI sequence identifiers run parallel to the new accession.version
system of sequence identifiers. """

This is stored in bioentry.identifier. However, "gi" values are not present in
swissprot, fasta, and embl records; instead, the following couplet loads the
record.id into the identifier slot:

Loader.py:
 519         if "gi" in record.annotations :
 520             identifier = record.annotations["gi"]
 521         else :
 522             identifier = record.id

But of course, the record.id is not the "gi" - so perhaps the
bioentry.identifier should be left NULL if the "gi" number is missing. Or we
might consider calling the DBSeqRecord attribute "identifier" rather than
"gi"...

Here's an example of an EMBL file where the record.id becomes the "gi":

Testing loading from embl format file EMBL/TRBG361.embl
 - AAACAAACCAAATATGGAT...AAA [jfp/7BKv3jTJAU/4jVMrSftEq20] len 1859, X56734.1
 - Retrieving by name/display_id 'X56734', 
old annos diff: set([])
new annos diff: set(['dates', 'ncbi_taxid', 'gi'])

OLD:
taxonomy = ['Eukaryota', 'Viridiplantae', 'Streptophyta', 'Embryophyta',
'Tracheophyta', 'Spermatophyta', 'Magnoliophyta', 'eudicotyledons', 'core
eudicotyledons', 'rosids', 'eurosids I', 'Fabales', 'Fabaceae',
'Papilionoideae', 'Trifolieae', 'Trifolium']
references = [<Bio.SeqFeature.Reference instance at 0x8e9302c>,
<Bio.SeqFeature.Reference instance at 0x8e931ac>]
accessions = ['X56734', 'S46826']
data_file_division = PLN
organism = Trifolium repens (white clover)
sequence_version = 1
NEW:
dates = ['24-NOV-2008']
ncbi_taxid = 3899
references = [<Bio.SeqFeature.Reference instance at 0x8eced6c>,
<Bio.SeqFeature.Reference instance at 0x8ecedcc>]
accessions = ['X56734', 'S46826']
data_file_division = PLN
taxonomy = ['Trifolium repens (white clover)']
gi = X56734.1
organism = Trifolium repens (white clover)
sequence_version = ['1']
ncbi_taxid: 3899


C.



From bugzilla-daemon at portal.open-bio.org  Mon Nov 24 17:51:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 24 Nov 2008 17:51:37 -0500
Subject: [Biopython-dev] [Bug 2683] New: Modules with unused string modules
Message-ID: <bug-2683-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2683

           Summary: Modules with unused string modules
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: trivial
          Priority: P5
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


This is a trivial general bug for any Biopython modules that import the string
module but do not use it. A different bug will be used for those modules that
actually use any deprecated string functions.

Please attach any similar modules to this report.

AlignAce modules:
Bio/AlignAce/AlignAceStandalone.py
Bio/AlignAce/CompareAceStandalone.py
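
A rough way to find such modules automatically is sketched below using
the standard ast module. It assumes the source can be parsed by the
running Python interpreter (older Python 2-only syntax would fail to
parse), and it only looks at plain name usage, so names that are
re-exported or referenced inside strings would be reported as unused.
The function name unused_imports is hypothetical, and the sample source
is invented for the example.

```python
import ast


def unused_imports(source):
    """Return a sorted list of imported names that are never referenced."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the name "a"; "import a.b as c" binds "c".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":
                    imported.add(alias.asname or alias.name)
    # Every bare name used anywhere, including the "x" in "x.attr".
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


# Invented sample: the source is only parsed, never executed.
sample = (
    "import string\n"
    "import array\n"
    "from Bio.Seq import Seq, MutableSeq\n"
    "data = array.array('d')\n"
)
print(unused_imports(sample))
```

Running this over each file under Bio/ would give a candidate list,
which would still need checking by hand before removing anything.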



From bugzilla-daemon at portal.open-bio.org  Mon Nov 24 18:05:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 24 Nov 2008 18:05:27 -0500
Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements
In-Reply-To: <bug-2681-42@http.bugzilla.open-bio.org/>
Message-ID: <200811242305.mAON5Rs2017499@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681


------- Comment #6 from cymon.cox at gmail.com  2008-11-24 18:05 EST -------
(In reply to comment #4)
> (In reply to comment #2)
> > (In reply to comment #0)
> > > Some swiss prot SeqRecords have ncbi_taxid and they are retrieved
> > > correctly by DBSeqRecord. TODO: others have ncbi_taxid that is missing
> > > from the retrieved DBSeqRecord: sp012, sp014, 
> > 
> > Note some swiss prot records may be multi-species, which the BioSQL schema
> > can't cope with.  Not sure if that applies here.
> 
> Yep, that's exactly what was causing the problem. Currently the code refuses to
> load an ncbi_taxid, which I think is correct, after all which one should be
> loaded? Anyway, I'll look into this a bit more...

So, how best to handle records with multiple taxa:

SwissProt/sp014 has 10 organisms which are currently loaded directly into the
taxon_name table:

biosql_test=# select name, name_class from taxon_name where taxon_id = 94;
 name | name_class
------+------------
 Oryza sativa (Rice), Nicotiana tabacum (Common tobacco) Hordeum vulgare
(Barley), Triticum aestivum (Wheat) Secale cereale (Rye), Zea mays (Maize),
Pisum sativum (Garden pea) Spinacia oleracea (Spinach), Capsicum annuum (Bell
pepper) Mesembryanthemum crys | scientific name
(1 row)

That's clearly not a scientific name...

The record has the ncbi_taxon_ids:
OX   NCBI_TaxID=4530, 4097, 4513, 4565, 4550, 4577, 3888, 3562, 4072, 3544,
OX   3555, 3696;

Which are currently not stored because there is more than one:

Loader.py:
        ncbi_taxon_id = None
        if "ncbi_taxid" in record.annotations:
            # Could be a list of IDs.
            if isinstance(record.annotations["ncbi_taxid"], list):
                if len(record.annotations["ncbi_taxid"]) == 1:
                    ncbi_taxon_id = record.annotations["ncbi_taxid"][0]
            else:
                ncbi_taxon_id = record.annotations["ncbi_taxid"]

BioSQL is clearly not designed to store records from multiple taxa: one
bioentry has one taxon_id. Should biopython be refusing to load such records if
the scientific name is not a binomial? What does perl do? 
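
To make the options concrete, here is a standalone sketch of the
selection logic with an optional strict mode that refuses multi-taxon
records. It is an illustration only: select_taxon_id and its strict
flag are hypothetical and are not part of Loader.py.

```python
def select_taxon_id(annotations, strict=False):
    """Return a single NCBI taxon id, or None if there is no unique one.

    With strict=True a record listing several taxon ids raises
    ValueError instead of being loaded without a taxon id.
    """
    taxid = annotations.get("ncbi_taxid")
    if taxid is None:
        return None
    if not isinstance(taxid, list):
        return taxid
    if len(taxid) == 1:
        return taxid[0]
    if strict and len(taxid) > 1:
        raise ValueError("record lists %d taxon ids" % len(taxid))
    return None


# The first three ids from the sp014 OX line quoted above.
multi = {"ncbi_taxid": ["4530", "4097", "4513"]}
print(select_taxon_id({"ncbi_taxid": ["3899"]}))
print(select_taxon_id(multi))
```

The default branch reproduces the current behaviour (no taxon id is
stored), while strict=True corresponds to refusing the record outright.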

C.



From mjldehoon at yahoo.com  Mon Nov 24 23:08:18 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 24 Nov 2008 20:08:18 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570811240104m1442e5dfkd0c0f92c6fa772f9@mail.gmail.com>
Message-ID: <199296.58154.qm@web62402.mail.re1.yahoo.com>

> > However, more than half of Biopython's tests do
> > not actually make use of this testing framework:
> 
> Do you need help in re-organizing all of these modules?

That would be helpful, but let's see first if there are any objections to my proposal. We'll also have to decide the pathway to change the tests without breaking anything. For the unit tests I listed, the changes should be trivial, but still we need to check if any problems show up.

Thanks!

--Michiel.


From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 09:31:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 09:31:18 -0500
Subject: [Biopython-dev] [Bug 2683] Modules with unused string modules
In-Reply-To: <bug-2683-42@http.bugzilla.open-bio.org/>
Message-ID: <200811251431.mAPEVIYj014396@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2683


------- Comment #1 from bsouthey at gmail.com  2008-11-25 09:31 EST -------
Bio/Crystal/__init__.py imports but does not appear to use the following modules:
array
string
Seq
MutableSeq



From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 09:40:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 09:40:23 -0500
Subject: [Biopython-dev] [Bug 2683] Modules with unused string modules
In-Reply-To: <bug-2683-42@http.bugzilla.open-bio.org/>
Message-ID: <200811251440.mAPEeN8f015160@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2683


------- Comment #2 from barwil at gmail.com  2008-11-25 09:40 EST -------

> AlignAce modules:
> Bio/AlignAce/AlignAceStandalone.py
> Bio/AlignAce/CompareAceStandalone.py
> 

Fixed in CVS now.



From chapmanb at 50mail.com  Tue Nov 25 09:40:41 2008
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 25 Nov 2008 09:40:41 -0500
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <871524.42970.qm@web62403.mail.re1.yahoo.com>
References: <871524.42970.qm@web62403.mail.re1.yahoo.com>
Message-ID: <20081125144041.GC83220@sobchak.mgh.harvard.edu>

Hi Michiel;
Good thoughts on this; my comments are below.

> Biopython's testing framework is built on top of Python's unit testing
> framework. Python's unit testing framework makes use of assertion
> statements to compare the result of a command to the expected result.
> Biopython uses test scripts that print output to stdout, together with
> an output file that contains the correct output. After running each
> test script, it compares the generated output with the correct output
> to see if the test was successful.

Agreed with the distinction between the unit tests and the "dump
lots of text and compare" approach. I've written both and do think
the unit testing/assertion model is more robust since you can go
back and actually get some insight into what someone was thinking
when they wrote an assertion.
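
As a small illustration of the assertion model, here is a sketch using
the standard unittest module. The median function and the test class
are hypothetical examples, not taken from test_Cluster.py; each
assertion states the intended behaviour next to the call it checks.

```python
import unittest


def median(values):
    """Return the median of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("median of an empty sequence")
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


class TestMedian(unittest.TestCase):
    def test_odd_length(self):
        self.assertEqual(median([3, 1, 2]), 2)

    def test_even_length(self):
        self.assertAlmostEqual(median([4, 1, 3, 2]), 2.5)

    def test_empty_input(self):
        with self.assertRaises(ValueError):
            median([])


# Build and run the suite explicitly so the script does not call sys.exit().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMedian)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

With verbosity=2 the runner prints one "... ok" line per test method,
in the same style as the test_Cluster output quoted below.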

> However, more than half of Biopython's tests do not actually make use of this testing framework:
[...]
> These tests have trivial output, for example test_Cluster:
> 
> test_Cluster
> test_clusterdistance (test_Cluster.TestCluster) ... ok
> test_distancematrix_kmedoids (test_Cluster.TestCluster) ... ok
> test_kcluster (test_Cluster.TestCluster) ... ok
> test_matrix_parse (test_Cluster.TestCluster) ... ok
> test_median_mean (test_Cluster.TestCluster) ... ok
> test_somcluster (test_Cluster.TestCluster) ... ok
> test_treecluster (test_Cluster.TestCluster) ... ok

They really do make use of the framework, but at a higher level. I
agree that if you run a single test it makes little difference
whether you use 'run_tests.py test_Cluster' or just run
'test_Cluster.py' directly. However, when you are running all the
tests as is regularly done in development or before pushing releases,
this comparison is important. It will pick out if you get a
line like:

test_clusterdistance (test_Cluster.TestCluster) ... ERROR

instead of the expected ok and report this in the summary for all of
the tests. Otherwise this is likely to get lost in all of the
results.
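
As a rough illustration of that comparison step (a sketch only, not the
actual run_tests.py code; the function name compare_output is
hypothetical), checking generated output against the stored expected
output line by line could look like this:

```python
def compare_output(expected_text, generated_text):
    """Return (line_number, expected, generated) for each mismatching line.

    A line missing from either side is reported as None.
    """
    expected = expected_text.splitlines()
    generated = generated_text.splitlines()
    mismatches = []
    for index in range(max(len(expected), len(generated))):
        exp = expected[index] if index < len(expected) else None
        gen = generated[index] if index < len(generated) else None
        if exp != gen:
            mismatches.append((index + 1, exp, gen))
    return mismatches


expected = (
    "test_Cluster\n"
    "test_clusterdistance (test_Cluster.TestCluster) ... ok\n"
    "test_kcluster (test_Cluster.TestCluster) ... ok\n"
)
generated = expected.replace(
    "test_clusterdistance (test_Cluster.TestCluster) ... ok",
    "test_clusterdistance (test_Cluster.TestCluster) ... ERROR",
)
problems = compare_output(expected, generated)
```

Here problems holds a single entry for line 2, which a summary over all
test scripts can then report.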

> Personally, I find Python's unit testing framework easier to
> understand than Biopython's testing framework. It doesn't need a
> separate output file, and it is easier to match each line of code with
> the correct behavior.
>
> I would therefore like to suggest to move from Biopython's testing
> framework to Python's testing framework. This also relieves us of the
> task of explaining Biopython's testing framework to contributors,
> and allows us to make better use of what Python already provides.
> Comparing output line-by-line, as Biopython's testing framework
> currently does, can still be used by test scripts that need this
> functionality.

Is the testing framework you are proposing different from the unit
tests used in the individual tests? How does your proposed framework
manage the higher-level functionality of checking whether all sub-tests
within one of the test suites pass?

Brad

From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 10:24:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 10:24:33 -0500
Subject: [Biopython-dev] [Bug 2683] Modules with unused string modules
In-Reply-To: <bug-2683-42@http.bugzilla.open-bio.org/>
Message-ID: <200811251524.mAPFOXe2019581@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2683


------- Comment #3 from bsouthey at gmail.com  2008-11-25 10:24 EST -------
Bio/FilteredReader.py imports but does not appear to use the following modules:

os
string
copy
from File import UndoHandle
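
One way to find such imports mechanically is to parse the file and
compare the imported names against the names actually referenced. The
sketch below uses the standard ast module (so it assumes a Python version
that ships it); unused_imports is a hypothetical helper, not an existing
Biopython tool, and it is deliberately rough: it ignores names that are
only mentioned in strings such as __all__, and it skips
"from x import *".

```python
import ast


def unused_imports(source):
    """Return a sorted list of imported names never referenced in source."""
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" binds the top-level name "os".
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":
                    imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            # Attribute access such as os.getcwd() contains a Name node "os".
            used.add(node.id)
    return sorted(imported - used)


example = (
    "import os\n"
    "import string\n"
    "import copy\n"
    "from File import UndoHandle\n"
    "\n"
    "print(os.getcwd())\n"
)
report = unused_imports(example)
```

The source is only parsed, never executed, so the modules named in it do
not need to be importable.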



From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 11:13:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 11:13:01 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200811251613.mAPGD1FG024870@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


------- Comment #7 from cymon.cox at gmail.com  2008-11-25 11:13 EST -------
(In reply to comment #6)
> (From update of attachment 1072 [details])
> I think this is still a big improvement, but that the
> (sub)feature.location_operator issue could wait.  We'll need to discuss on the
> BioSQL mailing list how this should be handled consistently.
> 
> Leaving this bug open.

Further to the "where to put the (sub)feature.location_operator" (eg. "join",
"order") question, this comment appears in the BioPerl MySQL schema for the
location_qualifier_value table:

-- location qualifiers - mainly intended for fuzzies but anything
-- can go in here
-- some controlled vocab terms have slots;

So, this would seem a suitable place to store the attribute.



From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 11:13:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 11:13:07 -0500
Subject: [Biopython-dev] [Bug 2684] New: GenBank/__init__.py: Removing loop
	over string.whitespace
Message-ID: <bug-2684-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2684

           Summary: GenBank/__init__.py: Removing loop over
                    string.whitespace
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


The function '_clean_location' in GenBank/__init__.py uses a 'for' loop over
string.whitespace that removes whitespace from the string. A simpler way is to just
split the string on whitespace and rejoin it as a single line:

location_line=''.join(location_string.split())
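
To show that the two approaches give the same result, here is a small
sketch. clean_with_loop is a hypothetical reconstruction of a loop over
string.whitespace, not a copy of the code in GenBank/__init__.py, and the
sample location string is made up.

```python
import string


def clean_with_loop(location_string):
    """Remove whitespace one character class at a time (reconstruction)."""
    location_line = location_string
    for ws in string.whitespace:
        location_line = location_line.replace(ws, "")
    return location_line


def clean_with_split(location_string):
    """Split on any run of whitespace and rejoin without separators."""
    return "".join(location_string.split())


messy = "join(1..10,\n      20..30,\t40..50)  \r\n"
```

Both functions return "join(1..10,20..30,40..50)" for the sample above.
split() with no argument treats any run of whitespace as one separator,
so no explicit loop is needed.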



From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 11:14:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 11:14:19 -0500
Subject: [Biopython-dev] [Bug 2684] GenBank/__init__.py: Removing loop over
	string.whitespace
In-Reply-To: <bug-2684-42@http.bugzilla.open-bio.org/>
Message-ID: <200811251614.mAPGEJvT025100@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2684


------- Comment #1 from bsouthey at gmail.com  2008-11-25 11:14 EST -------
Created an attachment (id=1083)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1083&action=view)
Removal of unnecessary loop over string.whitespace



From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 11:30:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 11:30:01 -0500
Subject: [Biopython-dev] [Bug 2685] New: HotRand provides an unnecessary
	function to convert hex to integer
Message-ID: <bug-2685-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2685

           Summary: HotRand provides an unnecessary function to convert hex
                    to integer
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


The file Bio/HotRand.py defines the function hex_convert that converts a hex
number to an integer number. This functionality is provided by the builtin
int() with appropriate radix, i.e. 
int(hex_number, 16)

This function could be removed or replaced to avoid using the string module.
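
A short sketch of the equivalence follows. hex_convert_by_hand is a
hypothetical hand-written converter used only for comparison; it is not
the code in Bio/HotRand.py.

```python
def hex_convert_by_hand(hex_str):
    """Hypothetical manual hex-to-integer conversion (illustration only)."""
    digits = "0123456789abcdef"
    value = 0
    for char in hex_str.lower():
        value = value * 16 + digits.index(char)
    return value


# The builtin does the same job and also accepts an optional "0x" prefix.
by_hand = hex_convert_by_hand("ff")
builtin = int("ff", 16)
prefixed = int("0x1A", 16)
```

Both by_hand and builtin are 255 here, and prefixed is 26.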



From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 11:31:09 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 11:31:09 -0500
Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function
	to convert hex to integer
In-Reply-To: <bug-2685-42@http.bugzilla.open-bio.org/>
Message-ID: <200811251631.mAPGV91O027180@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2685


------- Comment #1 from bsouthey at gmail.com  2008-11-25 11:31 EST -------
Created an attachment (id=1084)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1084&action=view)
Replaces hex_convert() with int()



From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 11:52:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 11:52:12 -0500
Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function
	to convert hex to integer
In-Reply-To: <bug-2685-42@http.bugzilla.open-bio.org/>
Message-ID: <200811251652.mAPGqCMt029684@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2685


bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1084 is|0                           |1
           obsolete|                            |


------- Comment #2 from bsouthey at gmail.com  2008-11-25 11:52 EST -------
Created an attachment (id=1085)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1085&action=view)
Messed up the first patch



From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 11:53:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 11:53:41 -0500
Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function
	to convert hex to integer
In-Reply-To: <bug-2685-42@http.bugzilla.open-bio.org/>
Message-ID: <200811251653.mAPGrfPk029811@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2685


bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1085 is|0                           |1
           obsolete|                            |


------- Comment #3 from bsouthey at gmail.com  2008-11-25 11:53 EST -------
Created an attachment (id=1086)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1086&action=view)
Sorry wrong version



From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 13:18:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 13:18:59 -0500
Subject: [Biopython-dev] [Bug 2683] Modules with unused string modules
In-Reply-To: <bug-2683-42@http.bugzilla.open-bio.org/>
Message-ID: <200811251818.mAPIIxQt006109@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2683


------- Comment #4 from bsouthey at gmail.com  2008-11-25 13:18 EST -------
These are the last files that I have found in Bio that import the string module
but do not use it:

IntelliGenetics/__init__.py
IntelliGenetics/intelligenetics_format.py
IntelliGenetics/Record.py
NetCatch.py
SCOP/__init__.py
PDB/PSEA.py (imports upper)



From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 17:18:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 17:18:41 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcribe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811252218.mAPMIfFX029455@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


mmokrejs at ribosome.natur.cuni.cz changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|translate and transcibe     |translate and transcribe
                   |methods for the Seq object  |methods for the Seq object
                   |(in Bio.Seq)                |(in Bio.Seq)


------- Comment #53 from mmokrejs at ribosome.natur.cuni.cz  2008-11-25 17:18 EST -------
(In reply to comment #27)
> Created an attachment (id=1032)
>  --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view) [details]
> Patch to Bio/Seq.py to add start codon handling to translation
> 
> Patch adds a new boolean argument to the translate method and function, called
> "init" (rather than my earlier suggestions like "from_start" or "check_start"
> which could be considered misleading).
> 
> Docstring:
> 
>         init - Boolean, defaults to False.  Should translation check the
>                first codon is a valid initiation (start) codon and translate
>                it as methionine (M)?  If False, nothing special is done with
>                the first codon.

What kind of check is it doing? I think it just forces the first letter to be
'M'.

> 
> 
> Example usage of the translate function,
> 
> >>> from Bio.Seq import translate
> >>> translate("TTGAAACCCTAG")
> 'LKP*'
> >>> translate("TTGAAACCCTAG", init=True, to_stop=True)
> 'MKP'
> >>> translate("TTGAAACCCTAG", init=True)
> 'MKP*'
> >>> translate("TTGAAACCCTAG", to_stop=True)
> 'LKP'

I don't like the "init" argument either. I would call it force_initiator_Met
instead. BTW, non-canonical initiator codon is CUG, where did you find UUG?
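
To illustrate the difference being asked about (checking the first codon
versus simply forcing an 'M'), here is a self-contained sketch. It does
not use Bio.Seq and is not the patch under discussion: the function names
are hypothetical, the codon table is deliberately partial, and
START_CODONS is an assumption (ATG, TTG and CTG, which as far as I know
are the initiation codons listed for NCBI translation table 1).

```python
# Partial codon table covering only the codons used below (illustration).
CODONS = {"ATG": "M", "TTG": "L", "CTG": "L",
          "AAA": "K", "CCC": "P", "TAG": "*"}

# Assumed set of valid initiation codons for this sketch.
START_CODONS = {"ATG", "TTG", "CTG"}


def translate_plain(seq):
    """Translate codon by codon with no special start handling."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def translate_forced(seq):
    """Force the first residue to 'M' without looking at the codon."""
    return "M" + translate_plain(seq)[1:]


def translate_checked(seq):
    """Use 'M' for the first residue only if it is a valid start codon."""
    if seq[:3] not in START_CODONS:
        raise ValueError("First codon %r is not a start codon" % seq[:3])
    return "M" + translate_plain(seq)[1:]
```

With these definitions translate_forced("AAAAAACCCTAG") returns 'MKP*',
while translate_checked raises ValueError for the same input because AAA
is not in the assumed start codon set.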

Sorry, I got overloaded by many other tasks so haven't read any other
follow-ups, I just hit the email from bugzilla by luck.



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 10:57:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 10:57:05 -0500
Subject: [Biopython-dev] [Bug 2688] New: Removal of depreciated string
	functions
Message-ID: <bug-2688-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

           Summary: Removal of depreciated string functions
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P5
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


This is a general bug to remove any deprecated string functions from Biopython
modules. I apologize in advance for the noise this creates, especially due to my
mistakes.
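
For readers unfamiliar with the change, the patches replace functions of
the string module with the equivalent string methods or builtins. The
snippet below is an illustrative sketch, not taken from any of the
patches; the sample line is made up, and the old spellings appear only in
comments.

```python
line = "  atom   1  n   met a   1  \n"

stripped = line.strip()            # was: string.strip(line)
fields = line.split()              # was: string.split(line)
joined = "-".join(fields)          # was: string.join(fields, "-")
shouted = fields[3].upper()        # was: string.upper(fields[3])
serial = int(fields[1])            # was: string.atoi(fields[1])
position = stripped.find("met")    # was: string.find(stripped, "met")
```

None of the new spellings needs "import string".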

I have tested and validated the subsequent patches on my Linux system with
Python versions 2.3, 2.4, 2.5 and 2.6. However, I do recognize that patches may
be in code not used by the tests. 


The following files require importing the string module and are thus excluded
(although deprecated functions may still be used):
Bio/Decode.py - maketrans()
Bio/EUtils/POM.py - maketrans()
Bio/Prosite/Pattern.py - maketrans()
Bio/Seq.py - maketrans()
triefind.py - defines string.punctuation + string.whitespace
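
For context, maketrans() has no string-method equivalent in the Python 2
versions listed above, which is why these files still need the string
module. A minimal sketch of the usage pattern follows; the complement
table is only an illustration and is not the code in any of these files,
and the try/except is an assumption added so that the snippet also runs
on Python 3, where the function lives on str.

```python
try:
    maketrans = str.maketrans      # Python 3
except AttributeError:
    from string import maketrans   # Python 2

# Translation table mapping each base to its complement.
COMPLEMENT = maketrans("ACGTacgt", "TGCAtgca")


def complement(sequence):
    """Return the complement of a DNA string using the table above."""
    return sequence.translate(COMPLEMENT)
```

For example, complement("AAGC") returns "TTCG".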

The following files have alternative reports
GenBank/__init__.py
HotRand.py


The following files are deprecated and are excluded:
Emboss/Primer.py
stringfns.py
MetaTool/__init__.py
MetaTool/metatool_format.py
MetaTool/Record.py
NBRF/__init__.py
Ndb/__init__.py
Transcribe.py


The following files import but do not use the string module 
AlignAce/AlignAceStandalone.py (fixed)
AlignAce/CompareAceStandalone.py (fixed)
Crystal/__init__.py
IntelliGenetics/__init__.py
IntelliGenetics/intelligenetics_format.py
IntelliGenetics/Record.py
NetCatch.py
SCOP/__init__.py


The following files are known to use the string module and have patches:
Align/AlignInfo.py
Blast/ParseBlastTable.py
FSSP/__init__.py
NMR/NOEtools.py
NMR/xpktools.py
PDB/MMCIFParser.py
SubsMat/__init__.py
Blast/Record.py
Compass/__init__.py
Data/CodonTable.py
Eutils/sourcegen.py
Eutils/tests/unittest.py
Fasta/FastaAlign.py
FilteredReader.py
GFF/easy.py
HMM/Utilities.py
Index.py
MEME/Parser.py
NeuralNetwork/Gene/Pattern.py
NeuralNetwork/Gene/Schema.py
Parsers/spark.py
PDB/parse_pdb_header.py
PDB/PDBList.py
PDB/PDBParser.py
PDB/PSEA.py
SCOP/__init__.py
utils.py

I did not see a trivial resolution for the functions in:
SubsMat/FreqTable.py
So I rewrote the functions to avoid using map.



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 10:58:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 10:58:03 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261558.mAQFw3wc029231@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #1 from bsouthey at gmail.com  2008-11-26 10:58 EST -------
Created an attachment (id=1088)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1088&action=view)
Remove deprecated string functions



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 10:59:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 10:59:27 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261559.mAQFxR5t029522@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #2 from bsouthey at gmail.com  2008-11-26 10:59 EST -------
Created an attachment (id=1089)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1089&action=view)
Blast/Record.py patch



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:01:30 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:01:30 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261601.mAQG1U4h029894@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #3 from bsouthey at gmail.com  2008-11-26 11:01 EST -------
Created an attachment (id=1090)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1090&action=view)
Compass/__init__.py deprecated string functions



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:02:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:02:26 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261602.mAQG2Qlx030068@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #4 from bsouthey at gmail.com  2008-11-26 11:02 EST -------
Created an attachment (id=1091)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1091&action=view)
Data/CodonTable.py



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:03:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:03:14 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261603.mAQG3ETM030188@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #5 from bsouthey at gmail.com  2008-11-26 11:03 EST -------
Created an attachment (id=1092)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1092&action=view)
Eutils/sourcegen.py



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:04:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:04:07 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261604.mAQG47K1030328@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #6 from bsouthey at gmail.com  2008-11-26 11:04 EST -------
Created an attachment (id=1093)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1093&action=view)
Eutils/tests/unittest.py



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:05:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:05:14 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261605.mAQG5EUu030457@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #7 from bsouthey at gmail.com  2008-11-26 11:05 EST -------
Created an attachment (id=1094)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1094&action=view)
Fasta/FastaAlign.py



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:06:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:06:35 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261606.mAQG6ZqF030610@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #8 from bsouthey at gmail.com  2008-11-26 11:06 EST -------
Created an attachment (id=1095)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1095&action=view)
FSSP/__init__.py 



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:09:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:09:26 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261609.mAQG9QMf030939@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1095 is|0                           |1
           obsolete|                            |


------- Comment #9 from bsouthey at gmail.com  2008-11-26 11:09 EST -------
Created an attachment (id=1096)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1096&action=view)
FSSP/__init__.py corrected

Got the files in the wrong order.



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:10:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:10:25 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261610.mAQGAP10031066@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #10 from bsouthey at gmail.com  2008-11-26 11:10 EST -------
Created an attachment (id=1097)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1097&action=view)
GFF/easy.py



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:11:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:11:19 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261611.mAQGBJ28031191@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #11 from bsouthey at gmail.com  2008-11-26 11:11 EST -------
Created an attachment (id=1098)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1098&action=view)
HMM/Utilities.py 



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:31:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:31:52 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261631.mAQGVqef001363@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #12 from bsouthey at gmail.com  2008-11-26 11:31 EST -------
Created an attachment (id=1099)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1099&action=view)
Index.py 



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:32:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:32:37 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261632.mAQGWbYF001446@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #13 from bsouthey at gmail.com  2008-11-26 11:32 EST -------
Created an attachment (id=1100)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1100&action=view)
MEME/Parser.py 



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:33:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:33:41 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261633.mAQGXfww001564@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #14 from bsouthey at gmail.com  2008-11-26 11:33 EST -------
Created an attachment (id=1101)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1101&action=view)
NeuralNetwork/Gene/Pattern.py 



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:34:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:34:41 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261634.mAQGYf0u001687@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #15 from bsouthey at gmail.com  2008-11-26 11:34 EST -------
Created an attachment (id=1102)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1102&action=view)
NeuralNetwork/Gene/Schema.py 



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:35:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:35:35 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261635.mAQGZZno001826@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #16 from bsouthey at gmail.com  2008-11-26 11:35 EST -------
Created an attachment (id=1103)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1103&action=view)
NMR/NOEtools.py 



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:36:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:36:19 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261636.mAQGaJXQ001918@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #17 from bsouthey at gmail.com  2008-11-26 11:36 EST -------
Created an attachment (id=1104)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1104&action=view)
NMR/xpktools.py 



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:37:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:37:14 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261637.mAQGbEX0002035@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #18 from bsouthey at gmail.com  2008-11-26 11:37 EST -------
Created an attachment (id=1105)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1105&action=view)
Parsers/spark.py 



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:38:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:38:42 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261638.mAQGcgvH002293@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #19 from bsouthey at gmail.com  2008-11-26 11:38 EST -------
Created an attachment (id=1106)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1106&action=view)
Blast/ParseBlastTable.py 



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:39:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:39:37 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261639.mAQGdbdC002442@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #20 from bsouthey at gmail.com  2008-11-26 11:39 EST -------
Created an attachment (id=1107)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1107&action=view)
PDB/MMCIFParser.py 



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:40:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:40:56 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261640.mAQGeuHm002669@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #21 from bsouthey at gmail.com  2008-11-26 11:40 EST -------
Created an attachment (id=1108)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1108&action=view)
PDB/parse_pdb_header.py 



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:41:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:41:56 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261641.mAQGfuJj002827@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #22 from bsouthey at gmail.com  2008-11-26 11:41 EST -------
Created an attachment (id=1109)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1109&action=view)
PDB/PDBList.py 



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:42:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:42:41 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261642.mAQGgfiH002929@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #23 from bsouthey at gmail.com  2008-11-26 11:42 EST -------
Created an attachment (id=1110)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1110&action=view)
PDB/PDBParser.py 



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:43:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:43:28 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261643.mAQGhSbJ003019@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #24 from bsouthey at gmail.com  2008-11-26 11:43 EST -------
Created an attachment (id=1111)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1111&action=view)
SubsMat/__init__.py 



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:46:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:46:00 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261646.mAQGk0id003484@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #25 from bsouthey at gmail.com  2008-11-26 11:46 EST -------
Created an attachment (id=1112)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1112&action=view)
SubsMat/FreqTable.py 

The two functions involved were rewritten because of the use of map(). 



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:49:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:49:58 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261649.mAQGnwds003938@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #26 from bsouthey at gmail.com  2008-11-26 11:49 EST -------
Created an attachment (id=1113)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1113&action=view)
utils.py



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 11:55:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:55:45 -0500
Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function
	to convert hex to integer
In-Reply-To: <bug-2685-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261655.mAQGtjPA004778@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2685


bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1086 is|0                           |1
           obsolete|                            |


------- Comment #4 from bsouthey at gmail.com  2008-11-26 11:55 EST -------
Created an attachment (id=1115)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1115&action=view)
Modified HotRand.hex_convert() function

Hopefully the last attempt to get the right version as a patch!



From bsouthey at gmail.com  Wed Nov 26 12:10:57 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Wed, 26 Nov 2008 11:10:57 -0600
Subject: [Biopython-dev] Use of depreciated string functions
In-Reply-To: <320fb6e00811241042g646ff65fq61d3751537c882b1@mail.gmail.com>
References: <4926D17A.8080101@gmail.com>	
	<320fb6e00811210726n94e277ex359d93de0855045e@mail.gmail.com>	
	<492AE8A9.1000406@gmail.com>
	<320fb6e00811241042g646ff65fq61d3751537c882b1@mail.gmail.com>
Message-ID: <492D8321.2060301@gmail.com>

Peter wrote:
> On Mon, Nov 24, 2008 at 5:47 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>   
>>> Once I've dealt with Biopython 1.49, I'd be happy to look at a patch
>>> to remove more "import string" usage from non-obsolete, non-deprecated
>>> code.  It would be a little risky doing this to modules without unit
>>> tests, but that's another area you've shown some interest in anyway...
>>>
>>> Thanks,
>>>
>>> Peter
>>>       
>> Hi,
>> I was planning to get started on these, depending on what time I have
>> available. So just a quick question:
>> Do you want one bug report per patch per file?
>> Or just let me know if there is another way.
>>     
>
> I'd suggest one general bug, and uploading one patch per module - that
> way they can be evaluated on a case by case basis (a single huge
> multi-file patch would be more difficult, and could become out of
> date).
>
> Personally however, I would prioritise more unit test coverage over
> this, but on the other hand it's the kind of short task you can handle
> when you have the odd spare 10 minutes.  Up to you.
>
> Peter
>   
Hi,
I have filed Bug 2688 
<http://bugzilla.open-bio.org/show_bug.cgi?id=2688> as a general bug for 
the files in the Bio module that use the deprecated string functions. I 
listed all the files that I identified as importing string, and whether 
or not I provided a patch for each. Bug 2683 
<http://bugzilla.open-bio.org/show_bug.cgi?id=2683> lists those files 
that import string but do not use it.

There is one attachment for each file (excluding mistakes).

In addition, Bugs 2684 
<http://bugzilla.open-bio.org/show_bug.cgi?id=2684> and 2685 
<http://bugzilla.open-bio.org/show_bug.cgi?id=2685> were created because 
these involve rewritten code that was related to this. I probably should 
have created a separate one for SubsMat/FreqTable.py, although the reason 
for that rewrite directly involves the string module.
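
To illustrate the kind of change these patches make, here is a generic
sketch. It is not taken from any of the attached patches, and the sample line
and variable names are made up. Calls to the deprecated functions of the
string module are replaced by the equivalent methods of the string object
itself:

```python
# Deprecated style (Python 2 only; these functions are gone in Python 3):
#   import string
#   fields = string.split(line)
#   name = string.upper(string.strip(fields[0]))
#   joined = string.join(fields, ",")

# Equivalent code using string methods; no "import string" is needed.
line = "  ala 12.5 0.8  "          # hypothetical input line
fields = line.split()              # split on runs of whitespace
name = fields[0].strip().upper()   # strip, then upper-case the first field
joined = ",".join(fields)          # the separator is the object, not an argument
```

The only step that changes shape is join, where the separator becomes the
object the method is called on.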

Regards
Bruce

 
From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 20:23:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 20:23:32 -0500
Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function
	to convert hex to integer
In-Reply-To: <bug-2685-42@http.bugzilla.open-bio.org/>
Message-ID: <200811270123.mAR1NWWu011079@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2685


------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp  2008-11-26 20:23 EST -------
As far as I can tell, the HotRand.hex_convert function is not used any more in
Bio.HotRand or anywhere else in Biopython; its usage was lost in revision 1.3
of Bio.HotRand. So I think that we can simply deprecate this function. If there
are no objections, I'll add a DeprecationWarning and use Bruce's code in the
meantime until the function is removed.
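
A minimal sketch of what such a deprecation could look like follows. This is
neither Bruce's patch nor the code committed to CVS; the function body and the
warning text are assumptions for illustration, using the built-in int() with
base 16 for the conversion:

```python
import warnings


def hex_convert(hex_string):
    """Convert a hexadecimal string to an integer (illustrative stand-in)."""
    # Tell callers the function is going away; stacklevel=2 makes the
    # warning point at the caller's line rather than at this function.
    warnings.warn(
        "hex_convert is deprecated; use int(text, 16) instead.",  # assumed wording
        DeprecationWarning,
        stacklevel=2,
    )
    return int(hex_string, 16)
```

Callers keep working until the function is removed, but see the warning when
DeprecationWarning is enabled in their warning filters.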



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 22:06:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 22:06:59 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811270306.mAR36xuB020451@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1088 is|0                           |1
           obsolete|                            |


------- Comment #27 from mdehoon at ims.u-tokyo.ac.jp  2008-11-26 22:06 EST -------
(From update of attachment 1088)
Committed to CVS.



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 23:16:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 23:16:43 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811270416.mAR4Gh40027250@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1089 is|0                           |1
           obsolete|                            |


------- Comment #28 from mdehoon at ims.u-tokyo.ac.jp  2008-11-26 23:16 EST -------
(From update of attachment 1089)
Committed to CVS



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 23:29:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 23:29:01 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811270429.mAR4T1tn027991@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1090 is|0                           |1
           obsolete|                            |


------- Comment #29 from mdehoon at ims.u-tokyo.ac.jp  2008-11-26 23:29 EST -------
(From update of attachment 1090)
Committed to CVS



From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 23:45:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 23:45:40 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811270445.mAR4jeph029067@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1091 is|0                           |1
           obsolete|                            |


------- Comment #30 from mdehoon at ims.u-tokyo.ac.jp  2008-11-26 23:45 EST -------
(From update of attachment 1091)
Committed to CVS



From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 01:54:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 01:54:12 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811270654.mAR6sC92005762@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1092 is|0                           |1
           obsolete|                            |


------- Comment #31 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 01:54 EST -------
(From update of attachment 1092)
Committed to CVS



From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 04:35:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 04:35:50 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200811270935.mAR9Zoxj019658@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #8 from lpritc at scri.sari.ac.uk  2008-11-27 04:35 EST -------
(In reply to comment #7)
> (In reply to comment #0)
> 
> > The major changes that have been made to the version previously available at
> > http://bioinf.scri.ac.uk/lp are:
> 
> That's a very nice contribution, thank you!!!
> This link is wrong, I think you mean
> http://bioinf.scri.ac.uk/lp/programs.php#genomediagram

Thanks, Marco.

You're absolutely correct - and people ought to be able to navigate there
from the link I gave.  Thanks for posting the accurate link.



From lpritc at scri.ac.uk  Thu Nov 27 04:33:43 2008
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Thu, 27 Nov 2008 09:33:43 +0000
Subject: [Biopython-dev] blog article on GenomeDiagram in Biopython
In-Reply-To: <5aa3b3570811230933n2de8af3lf31d3c4b962930a3@mail.gmail.com>
Message-ID: <C55419F7.19ECC%lpritc@scri.ac.uk>

Thanks Giovanni,

On 23/11/2008 17:33, "Giovanni Marco Dall'Olio" <dalloliogm at gmail.com>
wrote:

> I thought that the inclusion of GenomeDiagram in Biopython is such
> interesting news that I wrote a blog post on it:
> - http://bioinfoblog.it/2008/11/genome-diagrams-included-in-biopython-150/

I left a comment there ;)
 
> I have used images from some tutorials without asking, I hope it is
> not a problem.

No problem at all - I think the old license covered it, and I'm pretty sure
that the Biopython license will, too.  Even if they didn't, as the original
copyright holder, I approve ;)

L.


-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405


______________________________________________________________________
SCRI, Invergowrie, Dundee, DD2 5DA.  
The Scottish Crop Research Institute is a charitable company limited by
guarantee. 
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.


DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views 
expressed by the sender are not necessarily the views of SCRI and its 
subsidiaries.  This email and any files transmitted with it are confidential
to the intended recipient at the e-mail address to which it has been 
addressed.  It may not be disclosed or used by any other than that addressee.
If you are not the intended recipient you are requested to preserve this
confidentiality and you must not use, disclose, copy, print or rely on this 
e-mail in any way. Please notify postmaster at scri.ac.uk quoting the 
name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are 
present in this email, neither the Institute nor the sender accepts any 
responsibility for any viruses, and it is your responsibility to scan
the email and the attachments (if any).
______________________________________________________________________

From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 04:57:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 04:57:00 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcribe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811270957.mAR9v0i0021623@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #54 from lpritc at scri.sari.ac.uk  2008-11-27 04:56 EST -------
(In reply to comment #53)
> (In reply to comment #27)
> > Created an attachment (id=1032)
> >  --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view)
> > Patch to Bio/Seq.py to add start codon handling to translation
> > 
> > Patch adds a new boolean argument to the translate method and function, called
> > "init" (rather than my earlier suggestions like "from_start" or "check_start"
> > which could be considered misleading).

[...]

> I don't like the "init" argument either. I would call it force_initiator_Met
> instead. BTW, the non-canonical initiator codon is CUG; where did you find UUG?

This may clarify things:

From the E. coli K-12 sequencing paper
(http://dx.doi.org/10.1126/science.277.5331.1453):

"The distribution of start codons is as follows: ATG, 3542; GTG, 612; and TTG,
130. There is also one ATT and possibly a CTG"

It's not that unusual an occurrence, and there are a small number of known
alternative start codons.  'Forcing' a Met start imposes the result that the
first codon is a methionine, rather than checking that the first codon *could
be* a methionine.  I prefer the second behaviour.
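
The difference between the two behaviours can be sketched as follows. This is
only an illustration, not the patch attached to this bug: the function name,
the check_start argument and the tiny codon table excerpt are hypothetical,
and the set of start codons is just the three most frequent ones from the
quotation above.

```python
# Start codons from the E. coli K-12 quotation above; a real implementation
# would take these from the selected codon table instead.
START_CODONS = {"ATG", "GTG", "TTG"}

# Tiny excerpt of the standard genetic code, enough for the example.
TABLE = {"ATG": "M", "GTG": "V", "TTG": "L", "GCT": "A"}


def first_residue(codon, table, check_start=False):
    """Translate the first codon of a coding sequence.

    With check_start=True, the codon must be a valid start codon and is
    then translated as methionine; otherwise a ValueError is raised.
    """
    codon = codon.upper()
    if check_start:
        if codon not in START_CODONS:
            raise ValueError("%s is not a valid start codon" % codon)
        return "M"
    return table[codon]
```

With the check enabled, GTG gives M and GCT raises an error; simply forcing a
methionine would return M for GCT as well, hiding the problem.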

L.



From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 05:41:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 05:41:18 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811271041.mARAfITj025395@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1093 is|0                           |1
           obsolete|                            |


------- Comment #32 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 05:41 EST -------
(From update of attachment 1093)
Committed to CVS



From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 05:46:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 05:46:57 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811271046.mARAkv9t025868@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1094 is|0                           |1
           obsolete|                            |


------- Comment #33 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 05:46 EST -------
(From update of attachment 1094)
Committed to CVS



From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 06:08:30 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 06:08:30 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811271108.mARB8U6n027821@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1096 is|0                           |1
           obsolete|                            |


------- Comment #34 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 06:08 EST -------
(From update of attachment 1096)
Fixed in CVS



From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 06:14:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 06:14:18 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811271114.mARBEI5w028329@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1097 is|0                           |1
           obsolete|                            |


------- Comment #35 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 06:14 EST -------
(From update of attachment 1097)
Committed to CVS



From mjldehoon at yahoo.com  Thu Nov 27 08:09:43 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 27 Nov 2008 05:09:43 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <20081125144041.GC83220@sobchak.mgh.harvard.edu>
Message-ID: <45956.75241.qm@web62406.mail.re1.yahoo.com>

> > However, more than half of Biopython's tests do
> > not actually make use of this testing framework:
> > [...]
> > These tests have trivial output, for example
> test_Cluster:
> > 
> > test_Cluster
> > test_clusterdistance (test_Cluster.TestCluster) ... ok
> > test_distancematrix_kmedoids
> > (test_Cluster.TestCluster) ... ok
> > test_kcluster (test_Cluster.TestCluster) ... ok
> > test_matrix_parse (test_Cluster.TestCluster) ... ok
> > test_median_mean (test_Cluster.TestCluster) ... ok
> > test_somcluster (test_Cluster.TestCluster) ... ok
> > test_treecluster (test_Cluster.TestCluster) ... ok
> 
> They really do make use of the framework, but at a higher
> level. I agree that if you run a single test it makes little
> difference whether you use 'run_tests.py test_Cluster' or just
> run 'test_Cluster.py' directly. However, when you are
> running all the tests as is regularly done in development
> or before pushing releases, this comparison is important. It
> will pick out if you get a line like:
> 
> test_clusterdistance (test_Cluster.TestCluster) ... ERROR
> 
> instead of the expected ok and report this in the summary
> for all of the tests. Otherwise this is likely to get lost
> in all of the results.

Actually, I never use the summary produced by run_tests.py. I just check which tests failed, and then fix them one by one by running the individual test scripts.

> > I would therefore like to suggest moving from
> > Biopython's testing framework to Python's testing
> > framework. This also relieves us of the
> > task of explaining Biopython's testing framework
> > to contributors, and allows us to make better use
> > of what Python already provides.
...
> Is the testing framework you are proposing different from
> the unit tests used in the individual tests?

I am proposing to use the regular Python unit testing framework as it is. This means that most Biopython tests do not change at all (or only trivially). The run_tests.py script will need to be modified though to remove the requirement of having an output file for each individual test.

> How does your proposal
> manage the higher level functionality of checking if all sub-tests
> within one of the test suites pass?

If one of the sub-tests fails, Python's unit testing framework will tell us so, though (perhaps) not exactly which sub-test fails. However, that is easy to figure out just by running the individual test script by itself.
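
As a rough sketch of what relying on Python's unittest module alone could look
like (this is not how run_tests.py currently works; the two check functions and
the run_checks helper are made up for illustration), a runner can collect the
result object and see whether the suite passed and which sub-tests did not:

```python
import io
import unittest


def check_sum():
    # Hypothetical sub-test that passes.
    assert sum([1, 2, 3]) == 6


def check_broken():
    # Hypothetical sub-test that fails on purpose.
    assert sum([1, 2, 3]) == 7


def run_checks(functions):
    """Run plain functions as unittest test cases and summarise the result."""
    suite = unittest.TestSuite(unittest.FunctionTestCase(f) for f in functions)
    # Send the usual "... ok" / "... FAIL" lines to a buffer instead of stderr.
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    failed = [test.id() for test, _ in result.failures + result.errors]
    return result.wasSuccessful(), failed


ok, failed = run_checks([check_sum, check_broken])
```

Here ok is False and failed names check_broken, so no expected-output file is
needed to detect the failure.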

--Michiel


From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 08:33:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 08:33:46 -0500
Subject: [Biopython-dev] [Bug 2683] Modules with unused string modules
In-Reply-To: <bug-2683-42@http.bugzilla.open-bio.org/>
Message-ID: <200811271333.mARDXkHx009514@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2683


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 08:33 EST -------
Fixed in CVS, thanks



From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 09:38:04 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 09:38:04 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200811271438.mAREc4IG018238@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #9 from lpritc at scri.sari.ac.uk  2008-11-27 09:38 EST -------
The revised color/colour code in AbstractDrawer.py causes all bar charts in
linear diagrams to be the default colour of light green.  A fixed version of
AbstractDrawer is provided as an attachment.



From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 09:39:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 09:39:37 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200811271439.mAREdbXp018415@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #10 from lpritc at scri.sari.ac.uk  2008-11-27 09:39 EST -------
Created an attachment (id=1121)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1121&action=view)
Revised AbstractDrawer.py

This revision fixes a behaviour where bar charts for linear diagrams cannot be
changed from their default colour.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 20:33:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 20:33:56 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280133.mAS1XuXq002406@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1098 is|0                           |1
           obsolete|                            |


------- Comment #36 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 20:33 EST -------
(From update of attachment 1098)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 20:52:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 20:52:10 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280152.mAS1qAR3003698@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1099 is|0                           |1
           obsolete|                            |


------- Comment #37 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 20:52 EST -------
(From update of attachment 1099)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 21:27:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 21:27:29 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280227.mAS2RTea005795@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #38 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 21:27 EST -------
(From update of attachment 1100)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 21:27:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 21:27:47 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280227.mAS2RlEg005835@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1100 is|0                           |1
           obsolete|                            |


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 21:55:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 21:55:11 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280255.mAS2tBTL007510@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1101 is|0                           |1
           obsolete|                            |


------- Comment #39 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 21:55 EST -------
(From update of attachment 1101)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 22:02:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 22:02:25 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280302.mAS32Pxh008177@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1102 is|0                           |1
           obsolete|                            |


------- Comment #40 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 22:02 EST -------
(From update of attachment 1102)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 23:08:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 23:08:57 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280408.mAS48vaq012054@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1103 is|0                           |1
           obsolete|                            |


------- Comment #41 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 23:08 EST -------
(From update of attachment 1103)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 23:16:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 23:16:29 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280416.mAS4GThb012692@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1104 is|0                           |1
           obsolete|                            |


------- Comment #42 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 23:16 EST -------
(From update of attachment 1104)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 23:22:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 23:22:37 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280422.mAS4MbVR013025@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1105 is|0                           |1
           obsolete|                            |


------- Comment #43 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 23:22 EST -------
(From update of attachment 1105)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 23:50:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 23:50:59 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280450.mAS4oxjC014450@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1106 is|0                           |1
           obsolete|                            |


------- Comment #44 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 23:50 EST -------
(From update of attachment 1106)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 00:07:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 00:07:15 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280507.mAS57F3P015386@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1107 is|0                           |1
           obsolete|                            |


------- Comment #45 from mdehoon at ims.u-tokyo.ac.jp  2008-11-28 00:07 EST -------
(From update of attachment 1107)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 03:48:30 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 03:48:30 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280848.mAS8mUmr028058@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1108 is|0                           |1
           obsolete|                            |


------- Comment #46 from mdehoon at ims.u-tokyo.ac.jp  2008-11-28 03:47 EST -------
(From update of attachment 1108)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 05:07:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 05:07:05 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811281007.mASA751F001103@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1109 is|0                           |1
           obsolete|                            |


------- Comment #47 from mdehoon at ims.u-tokyo.ac.jp  2008-11-28 05:07 EST -------
(From update of attachment 1109)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 05:22:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 05:22:13 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811281022.mASAMDwt002023@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1110 is|0                           |1
           obsolete|                            |


------- Comment #48 from mdehoon at ims.u-tokyo.ac.jp  2008-11-28 05:22 EST -------
(From update of attachment 1110)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 05:29:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 05:29:16 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811281029.mASATGhi002380@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1111 is|0                           |1
           obsolete|                            |


------- Comment #49 from mdehoon at ims.u-tokyo.ac.jp  2008-11-28 05:29 EST -------
(From update of attachment 1111)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 05:29:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 05:29:39 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811281029.mASATdU5002440@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1112 is|0                           |1
           obsolete|                            |


------- Comment #50 from mdehoon at ims.u-tokyo.ac.jp  2008-11-28 05:29 EST -------
(From update of attachment 1112)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 05:30:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 05:30:23 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811281030.mASAUNDX002501@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1113 is|0                           |1
           obsolete|                            |


------- Comment #51 from mdehoon at ims.u-tokyo.ac.jp  2008-11-28 05:30 EST -------
(From update of attachment 1113)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk  Fri Nov 28 06:09:30 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 11:09:30 +0000
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <45956.75241.qm@web62406.mail.re1.yahoo.com>
References: <20081125144041.GC83220@sobchak.mgh.harvard.edu>
	<45956.75241.qm@web62406.mail.re1.yahoo.com>
Message-ID: <320fb6e00811280309w7b5f0fc6m38795c4dc61c8744@mail.gmail.com>

Hello all,

Sorry for not replying earlier - I've been travelling and didn't get
to check my email as often as I had hoped.   I'm going to reply to
several points in this one email...

Marco wrote:
> I was also proposing to use the doctest framework for some of the
> modules, and for enhancing documentation.
> http://bugzilla.open-bio.org/show_bug.cgi?id=2640

As Marco points out, there is also the option of using doctest, which
we're doing in some of the unit tests (e.g. test_wise.py).  I like the
idea of using doctest where we want to include examples in the
docstrings anyway.  Marco wasn't suggesting this, but just to be
clear, I don't think we should use JUST doctest for all our unit
tests.  Many test cases would make misleading documentation, and
having lots and lots of doctest examples would also hide the important
parts of the documentation.  Additionally, doctests using input files
are not straightforward due to path issues.
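
As a rough illustration of a docstring example that doubles as a
test, here is a minimal sketch.  The function reverse_complement_str
and the helper run_docstring_examples are made up for this example
and are not part of Biopython:

```python
import doctest


def reverse_complement_str(dna):
    """Return the reverse complement of an upper-case DNA string.

    >>> reverse_complement_str("ATGC")
    'GCAT'
    >>> reverse_complement_str("")
    ''
    """
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(dna))


def run_docstring_examples():
    """Run the docstring examples above; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Pass the function in explicitly so the examples can call it.
    test = parser.get_doctest(
        reverse_complement_str.__doc__,
        {"reverse_complement_str": reverse_complement_str},
        "reverse_complement_str",
        None,
        0,
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted
```

The two examples read naturally as documentation, which is the case
where doctest fits well.  A long list of edge cases in the same
docstring would bury the explanation, which is the concern raised
above.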

Brad wrote:
> Agreed with the distinction between the unit tests and the "dump
> lots of text and compare" approach. I've written both and do think
> the unit testing/assertion model is more robust since you can go
> back and actually get some insight into what someone was thinking
> when they wrote an assertion.

I have probably written more of the "dump lots of text and compare"
style tests.  I think these have a number of advantages:
(1) Easier for beginners to write a test, you can almost take any
example script and use that.  You don't have to learn the unit test
framework.
(2) Debugging a failing test in IDLE is much easier - using unit tests
you have all that framework between you and the local scope where the
error happens.
(3) For many broad tests, manually setting up the expected output for
an assert is extremely tedious (e.g. parsing sequences and checking
their checksums).

We could discuss a modification to run_tests.py so that if there is no
expected output file output/test_XXX for test_XXX.py we just run
test_XXX.py and check its return value (I think Michiel had previously
suggested something like this).  Perhaps for more robustness, capture
the output and compare it to a predefined list of regular expressions
covering the typical outputs.  For example, looking at
output/test_Cluster, the first line is the test name, but the rest
follows the pattern "test_... ok". I imagine only a few output styles
exist.  With such a change, half the unit tests (e.g. test_Cluster.py)
wouldn't need their output file in CVS (output/test_Cluster).
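
The regular expression idea could be sketched roughly as follows.
This is not code from run_tests.py; the pattern list and the function
name unexpected_lines are assumptions, and the real patterns would
have to be collected from the existing output/test_XXX files:

```python
import re

# Assumed patterns for the "typical" output lines of a unit test script.
EXPECTED_LINE_PATTERNS = [
    re.compile(r"^test_\w+$"),            # first line: the test name
    re.compile(r"^test_.* \.\.\. ok$"),   # unittest style "test_... ok"
    re.compile(r"^\s*$"),                 # blank lines
]


def unexpected_lines(output):
    """Return the lines of output that match none of the known patterns."""
    return [
        line
        for line in output.splitlines()
        if not any(pattern.match(line) for pattern in EXPECTED_LINE_PATTERNS)
    ]
```

A test would then pass if its return value is zero and
unexpected_lines() gives an empty list for the captured output.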

Michiel de Hoon wrote:
> If one of the sub-tests fails, Python's unit testing framework will tell us so,
> though (perhaps) not exactly which sub-test fails. However, that is easy to
> figure out just by running the individual test script by itself.

That won't always work.  Consider intermittent network problems, or
tests using random data - in general it really is worthwhile having
run_tests.py report a little more than just which test_XXX.py module
failed.
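
For what it is worth, the standard unittest result object already
records which individual tests failed, so reporting them need not
require a second run.  A minimal sketch, where failing_test_names and
the two check functions are invented for illustration:

```python
import io
import unittest


def check_addition():
    """Stand-in for a sub-test that passes."""
    assert 1 + 1 == 2


def check_broken():
    """Stand-in for a sub-test that fails."""
    assert 1 + 1 == 3


def failing_test_names(suite):
    """Run a test suite quietly and return the ids of tests that did not pass."""
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    result = runner.run(suite)
    # result.failures and result.errors are lists of (test, traceback) pairs.
    return sorted(test.id() for test, _ in result.failures + result.errors)


example_suite = unittest.TestSuite(
    [unittest.FunctionTestCase(check_addition),
     unittest.FunctionTestCase(check_broken)]
)
```

Calling failing_test_names(example_suite) names the failing sub-test
from the same run that detected it, which matters when the failure is
intermittent.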

Peter

From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 06:53:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 06:53:36 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811281153.mASBra4q008163@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #52 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-28 06:53 EST -------
Although I had offered to look over the patches, it looks like Michiel has
reviewed and committed them all while I was away, so I don't have to ;)

Thank you both!

Can we close this bug now?


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 06:57:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 06:57:35 -0500
Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function
	to convert hex to integer
In-Reply-To: <bug-2685-42@http.bugzilla.open-bio.org/>
Message-ID: <200811281157.mASBvZ6A008475@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2685


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-28 06:57 EST -------
(In reply to comment #5)
> As far as I can tell, the HotRand.hex_convert function is not used any more in
> Bio.HotRand or anywhere else in Biopython; its usage was lost in revision 1.3
> of Bio.HotRand. So I think that we can simply deprecate this function. If there
> are no objections, I'll add a DeprecationWarning and use Bruce's code in the
> mean time until the function is removed.


+1 on this plan.

(I was going to say we should deprecate this function rather than removing it,
but you'd already covered that).
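
For reference, a deprecation of this kind usually looks something
like the sketch below.  The signature shown for hex_convert is an
assumption made for this example, and the body uses the built-in
int(text, 16) rather than the code attached to this bug:

```python
import warnings


def hex_convert(hex_text):
    """Deprecated stand-in: convert a hexadecimal string to an integer."""
    warnings.warn(
        "hex_convert is deprecated; use int(text, 16) instead.",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this function
    )
    return int(hex_text, 16)
```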


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 07:05:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 07:05:14 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200811281205.mASC5EY8009077@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-28 07:05 EST -------
(In reply to comment #7)
> (In reply to comment #6)
> > (From update of attachment 1072 [details] [details])
> > I think this is still a big improvement, but that the
> > (sub)feature.location_operator issue could wait.  We'll
> > need to discuss on the
> > BioSQL mailing list how this should be handled consistently.
> > 
> > Leaving this bug open.
> 
> Further to the "where to put the (sub)feature.location_operator" (eg. "join",
> "order") question, this comment appears in the BioPerl MySQL schema for the
> location_qualifier_value table:
> 
> -- location qualifiers - mainly intended for fuzzies but anything
> -- can go in here
> -- some controlled vocab terms have slots;
> 
> So, this would seem a suitable place to store the attribute.
> 

Yes, but if we record something in the location_qualifier_value table we can't
use a NULL term_id (possibly a schema limitation).  We therefore need to use a
particular ontology, which is where some co-ordination with the other BioSQL
projects is needed (so that we all default to the same ontology).  I'd meant to
send off an email about this to the BioSQL mailing list but didn't get it done
before I had to leave for a trip.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 07:24:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 07:24:19 -0500
Subject: [Biopython-dev] [Bug 2684] GenBank/__init__.py: Removing loop over
	string.whitespace
In-Reply-To: <bug-2684-42@http.bugzilla.open-bio.org/>
Message-ID: <200811281224.mASCOJSg010226@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2684


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-28 07:24 EST -------
Marking as fixed - I've checked in a simplified version of your patch.  See
Bio/GenBank/__init__.py  revision 1.98 in CVS.

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/GenBank/__init__.py?cvsroot=biopython

Thanks Bruce.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk  Fri Nov 28 07:37:04 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 12:37:04 +0000
Subject: [Biopython-dev] [BioPython] PubMed find_related
In-Reply-To: <580790.81356.qm@web62404.mail.re1.yahoo.com>
References: <aa5471510811241418h2a6ca97ai74aab652cdcfdaa3@mail.gmail.com>
	<580790.81356.qm@web62404.mail.re1.yahoo.com>
Message-ID: <320fb6e00811280437w8f9f3d2t84716f7a554b913@mail.gmail.com>

On Tue, Nov 25, 2008 at 4:05 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>>>> from Bio import Entrez
>>>> handle = Entrez.elink(dbfrom='pubmed',id=12345)
>>>> record = Entrez.read(handle)
>
> Feel free to write a section about Entrez.elink for the Biopython documentation :-).
> Currently, this section is almost empty.

This does need a little love, doesn't it?  Here is a slightly longer
example which could form the basis of a tutorial entry:

    >>> from Bio import Entrez
    >>> Entrez.email = "A.N.Other at example.com"
    >>> pmid = "12230038"
    >>> handle = Entrez.elink(dbfrom='pubmed', id=pmid)
    >>> result = Entrez.read(handle)
    >>> for link in result[0]["LinkSetDb"][0]['Link'] :
    ...     print link

The deeply nested nature of the XML results does suggest that a helper
function in Bio.Entrez would be useful here.  Maybe something like:

def find_related(dbfrom, id) :
    #Returns a list of dictionaries containing Score and ID matched
    result = read(elink(dbfrom=dbfrom, id=id))
    return result[0]["LinkSetDb"][0]['Link']

It might make more sense to return just a list of ID strings, but the
score may be interesting.
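
The "just a list of ID strings" variant might look like the sketch
below.  It works on an already parsed result, so it needs no network
access.  The function name related_ids, the "Id" and "Score" key
names, and the sample data are assumptions for illustration only:

```python
def related_ids(elink_result):
    """Return the linked IDs from the first link set as a list of strings."""
    link_sets = elink_result[0]["LinkSetDb"]
    if not link_sets:
        # No related records were reported for this ID.
        return []
    return [link["Id"] for link in link_sets[0]["Link"]]


# Hand-written stand-in for what Entrez.read(Entrez.elink(...)) might return;
# the IDs and scores here are invented.
sample_result = [
    {"LinkSetDb": [{"Link": [{"Id": "111", "Score": "100"},
                             {"Id": "222", "Score": "90"}]}]}
]
```

Returning the dictionaries instead of the bare IDs would keep the
score available, at the cost of a slightly less convenient result.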

Peter

From biopython at maubp.freeserve.co.uk  Fri Nov 28 08:05:38 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 13:05:38 +0000
Subject: [Biopython-dev] Bio.Entrez batched downloads
Message-ID: <320fb6e00811280505m3b065877r99785f306a356aa@mail.gmail.com>

This is returning to a topic we've discussed in the past - the NCBI
Entrez API is quite low level, and the Bio.Entrez module reflects
this.  As a result certain "typical" tasks require more code than one
might expect.  In particular, batched downloads of a large result set.

The tutorial covers using Bio.Entrez.efetch in a loop to download a
result set in a batch, for example writing out a MedLine or FASTA
format file.  This seems like a common need - starting either from a
list of IDs, or better from a history webenv and query_key.  I think
there is a use for a Bio.Entrez.batched_efetch or download_many
function to save people re-implementing their own batched downloader
(even just as a copy and paste from the tutorial).

If the NCBI ever give any explicit guidance on batch sizes then we
can update Biopython centrally - rather than individual scripts
requiring changes everywhere.  We might also be able to include some
basic error checking (e.g. for empty or partial downloads). One catch
is that downloading and concatenating batches as XML files does not
give a valid XML file - but this is safe for MedLine, FASTA, GenBank
etc.  This proposed function could raise an exception if used with XML
to avoid this issue.

In terms of the API for getting the data back, there are several options:
* Take an output handle as an argument (which would be written to as
each batch was downloaded)
* Return a handle - the implementation would be a bit more complicated
as we should avoid holding everything in memory, but would then be
very similar to the existing Bio.Entrez.efetch function in its usage.

Other options which I don't like:
* Take an output filename (less flexible than just taking an output handle)
* Return the data as a string (memory concerns with large downloads)
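
The first option (taking an output handle) could be sketched as
below.  This is only an outline under stated assumptions: the name
batched_download, the fetch_batch callback and the default batch size
of 100 are all invented here, and a real version would call
Bio.Entrez.efetch instead of a callback:

```python
import io


def batched_download(ids, fetch_batch, out_handle, batch_size=100):
    """Fetch records for ids in batches, writing each batch to out_handle.

    fetch_batch is a caller-supplied function (hypothetical) that takes a
    list of IDs and returns the text of those records.  Returns the number
    of IDs processed.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    count = 0
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        data = fetch_batch(batch)
        if not data:
            # Basic error checking: treat an empty batch as a failed download.
            raise IOError("Empty download for IDs %s to %s"
                          % (batch[0], batch[-1]))
        out_handle.write(data)
        count += len(batch)
    return count


# Demonstration with a fake fetcher, so no network access is needed.
def fake_fetch(batch):
    """Pretend to download FASTA records for the given IDs."""
    return "".join(">%s\nACGT\n" % record_id for record_id in batch)


demo_handle = io.StringIO()
demo_count = batched_download(["1", "2", "3", "4", "5"], fake_fetch,
                              demo_handle, batch_size=2)
```

Only one batch is held in memory at a time, which is the main
attraction of writing to a handle rather than returning a string.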

Note that related functions like the deprecated
Bio.PubMed.download_many (and early versions of
Bio.GenBank.download_many) used a complicated function call back
mechanism (which required knowing the file format in advance and
having a parser for it).  This doesn't seem sensible for a generic
function.  Currently Bio.GenBank.download_many (obsolete, soon to be
deprecated) just makes a single call to Bio.Entrez.efetch, regardless
of the number of records / amount of data expected.

Peter

From biopython at maubp.freeserve.co.uk  Fri Nov 28 12:26:45 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 17:26:45 +0000
Subject: [Biopython-dev] Deprecation and removal policy
Message-ID: <320fb6e00811280926v16454fa6t891fcc74e4fa4729@mail.gmail.com>

Back on 27 June 2008, in preparation for what became Biopython 1.47,
Michiel wrote:
> In recent releases, we have been using the rule of thumb to remove all
> modules from a new Biopython release that were deprecated two
> releases ago.

I was thinking that when we made releases about six months apart, this
rule of thumb effectively gave a year's warning.  Recently we've made
releases roughly every three months, which translates to only about
six months warning, so I think we should be a little more restrained
in removing deprecated code in future.

As an example, Bio.EUtils was deprecated in favour of Bio.Entrez in
Release 1.48 (Sept 2008).  Under the old rule of thumb, we could
remove this module from CVS now (as the deprecation was present in
Biopython 1.48 and 1.49).  If we release Biopython 1.50 in January or
February 2009 (for the sake of argument), that means the deprecation
would have been in place for only four or five months - which seems
too rash.

How about a new policy that after adding a deprecation warning,
deprecated modules/functions are kept for at least two public releases
AND at least 12 months (counting from the first release when they are
deprecated - not the date of the CVS change) before being removed?
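
As a rough sketch only, the proposed rule could be written as a check
like the one below.  The function names are hypothetical, and the day of
the month used for the 1.48 release is a placeholder, since only the
month is given above.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


def may_remove(deprecated_on, releases_with_warning, today,
               min_releases=2, min_months=12):
    """True only if BOTH the release count and the age condition hold."""
    old_enough = months_between(deprecated_on, today) >= min_months
    return releases_with_warning >= min_releases and old_enough


# Bio.EUtils: warning first shipped in 1.48 (September 2008; the day is
# a placeholder) and was also present in 1.49, so two releases so far.
eutils_deprecated = date(2008, 9, 1)
too_early = may_remove(eutils_deprecated, 2, date(2009, 2, 1))
# A year on, with a made-up count of four releases carrying the warning.
a_year_on = may_remove(eutils_deprecated, 4, date(2009, 9, 1))
print(too_early, a_year_on)
```

Under the old rule only the release count would be tested, so the first
call would already allow removal.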

Peter

From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 15:10:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 15:10:43 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200811282010.mASKAhuK012846@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-28 15:10 EST -------
(In reply to comment #8)
> Yes, but if we record something in the location_qualifier_value table we can't
> use a NULL term_id (possibly a schema limitation).  We therefore need to use a
> particular ontology, which is where some co-ordination with the other BioSQL
> projects is needed (so that we all default to the same ontology).  I'd meant
> to send off an email about this to the BioSQL mailing list but didn't get it
> done before I had to leave for a trip.

I've started a discussion on the BioSQL mailing list, see this thread:
http://lists.open-bio.org/pipermail/biosql-l/2008-November/001412.html - me
http://lists.open-bio.org/pipermail/biosql-l/2008-November/001414.html -
Richard from BioJava
http://lists.open-bio.org/pipermail/biosql-l/2008-November/001413.html - me
etc.

Cymon - if you haven't already done so, I would encourage you to sign up to the
BioSQL mailing list.

Peter



From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 23:48:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 23:48:46 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811290448.mAT4mkmI008416@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #53 from mdehoon at ims.u-tokyo.ac.jp  2008-11-28 23:48 EST -------
(In reply to comment #52)

> Can we close this bug now?
> 
Not yet, there are a few more things to consider in the original description.



From bugzilla-daemon at portal.open-bio.org  Sat Nov 29 00:01:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 29 Nov 2008 00:01:12 -0500
Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function
	to convert hex to integer
In-Reply-To: <bug-2685-42@http.bugzilla.open-bio.org/>
Message-ID: <200811290501.mAT51ClZ009532@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2685


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #7 from mdehoon at ims.u-tokyo.ac.jp  2008-11-29 00:01 EST -------
I used Bruce's patch and added a DeprecationWarning to the hex_convert
function, and modified the unit test accordingly.



From mjldehoon at yahoo.com  Sat Nov 29 00:13:33 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 28 Nov 2008 21:13:33 -0800 (PST)
Subject: [Biopython-dev] Bio.Entrez batched downloads
In-Reply-To: <320fb6e00811280505m3b065877r99785f306a356aa@mail.gmail.com>
Message-ID: <432417.5854.qm@web62405.mail.re1.yahoo.com>

Sorry, but I am -1 on this. This sounds like software bloat to me.
The reason that the NCBI Entrez API is low level is that they are unable to predict how users will want to use the NCBI Entrez. We as Biopython know little more than NCBI, except that our users want to access NCBI Entrez via Python, so we provide a Python interface to NCBI Entrez. Also, I don't think that the current situation is unsatisfactory. The Bio.Entrez API is extremely simple, and with an example in the tutorial it should be very easy to use; I don't see a problem with copying and pasting from the tutorial, provided that sufficient information is available there.

--Michiel.

--- On Fri, 11/28/08, Peter <biopython at maubp.freeserve.co.uk> wrote:

> From: Peter <biopython at maubp.freeserve.co.uk>
> Subject: [Biopython-dev] Bio.Entrez batched downloads
> To: "BioPython-Dev Mailing List" <biopython-dev at lists.open-bio.org>
> Date: Friday, November 28, 2008, 8:05 AM
> This is returning to a topic we've discussed in the past - the NCBI
> Entrez API is quite low level, and the Bio.Entrez module reflects
> this.  As a result certain "typical" tasks require more code than one
> might expect.  In particular, batched downloads of a large result set.
>
> The tutorial covers using Bio.Entrez.efetch in a loop to download a
> result set in a batch, for example writing out a MedLine or FASTA
> format file.  This seems like a common need - starting either from a
> list of IDs, or better from a history webenv and query_key.  I think
> there is a use for a Bio.Entrez.batched_efetch or download_many
> function to save people re-implementing their own batched downloader
> (even just as a copy and paste from the tutorial).
>
> If the NCBI ever give any explicit guidance on batch sizes then we
> can update Biopython centrally - rather than individual scripts
> requiring changes everywhere.  We might also be able to include some
> basic error checking (e.g. empty or partial downloads).  One catch
> is that downloading and concatenating batches as XML files does not
> give a valid XML file - but this is safe for MedLine, FASTA, GenBank
> etc.  This proposed function could raise an exception if used with XML
> to avoid this issue.
>
> In terms of the API for getting the data back, there are several options:
> * Take an output handle as an argument (which would be written to as
> each batch was downloaded)
> * Return a handle - the implementation would be a bit more complicated
> as we should avoid holding everything in memory, but would then be
> very similar to the existing Bio.Entrez.efetch function in its usage.
>
> Other options which I don't like:
> * Take an output filename (less flexible than just taking an output handle)
> * Return the data as a string (memory concerns with large downloads)
>
> Note that related functions like the deprecated
> Bio.PubMed.download_many (and early versions of
> Bio.GenBank.download_many) used a complicated function call back
> mechanism (which required knowing the file format in advance and
> having a parser for it).  This doesn't seem sensible for a generic
> function.  Currently Bio.GenBank.download_many (obsolete, soon to be
> deprecated) just makes a single call to Bio.Entrez.efetch, regardless
> of the number of records / amount of data expected.
> 
> Peter


From mjldehoon at yahoo.com  Sat Nov 29 00:22:10 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 28 Nov 2008 21:22:10 -0800 (PST)
Subject: [Biopython-dev] [BioPython] PubMed find_related
In-Reply-To: <320fb6e00811280437w8f9f3d2t84716f7a554b913@mail.gmail.com>
Message-ID: <246349.44664.qm@web62404.mail.re1.yahoo.com>

> The deeply nested nature of the XML results does suggest that
> a helper function in Bio.Entrez would be useful here.  Maybe
> something like:
> 
> def find_related(dbfrom, id) :
>     #Returns a list of dictionaries containing Score and ID
>     # matched
>     result = read(elink(dbfrom=dbfrom, id=id))
>     return result[0]["LinkSetDb"][0]['Link']
> 
> It might make more sense to return just a list of ID
> strings, but the score may be interesting.
>

The problem this user encountered was that the DeprecationWarning in the
PubMed.find_related function contained very little information and did not mention that Entrez.elink is the appropriate function to use:

"Find related articles in PubMed, returns an ID list (DEPRECATED).
Please use Bio.Entrez instead as described in the Biopython Tutorial."

and in addition that currently the description of Bio.Entrez.elink in the tutorial is almost empty. Instead of adding a function to Bio.Entrez that helps this particular user, we should improve our documentation to enable all users to use Bio.Entrez appropriately. The set of helper functions to Bio.Entrez that we could write is virtually endless; we should not go down that path.

--Michiel.


From bugzilla-daemon at portal.open-bio.org  Sat Nov 29 01:02:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 29 Nov 2008 01:02:01 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811290602.mAT621Lc012846@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #54 from mdehoon at ims.u-tokyo.ac.jp  2008-11-29 01:02 EST -------
All fixed now; I hope I didn't screw up anything.



From mjldehoon at yahoo.com  Sat Nov 29 02:04:33 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 28 Nov 2008 23:04:33 -0800 (PST)
Subject: [Biopython-dev] [BioPython] PubMed find_related
In-Reply-To: <320fb6e00811280437w8f9f3d2t84716f7a554b913@mail.gmail.com>
Message-ID: <652169.76582.qm@web62406.mail.re1.yahoo.com>

I've expanded your example a bit and added it to the documentation of Entrez.elink. Thanks!

--Michiel.


--- On Fri, 11/28/08, Peter <biopython at maubp.freeserve.co.uk> wrote:

> From: Peter <biopython at maubp.freeserve.co.uk>
> Subject: Re: [BioPython] PubMed find_related
> To: mjldehoon at yahoo.com
> Cc: "BioPython-Dev Mailing List" <biopython-dev at lists.open-bio.org>
> Date: Friday, November 28, 2008, 7:37 AM
> On Tue, Nov 25, 2008 at 4:05 AM, Michiel de Hoon
> <mjldehoon at yahoo.com> wrote:
> >>>> from Bio import Entrez
> >>>> handle = Entrez.elink(dbfrom='pubmed',id=12345)
> >>>> record = Entrez.read(handle)
> >
> > Feel free to write a section about Entrez.elink for the Biopython
> > documentation :-).
> > Currently, this section is almost empty.
>
> This does need a little love, doesn't it.  Here is a slightly longer
> example which could form the basis of a tutorial entry:
>
>     >>> from Bio import Entrez
>     >>> Entrez.email = "A.N.Other at example.com"
>     >>> pmid = "12230038"
>     >>> handle = Entrez.elink(dbfrom='pubmed', id=pmid)
>     >>> result = Entrez.read(handle)
>     >>> for link in result[0]["LinkSetDb"][0]['Link'] :
>     ...     print link
>
> The deeply nested nature of the XML results does suggest that a helper
> function in Bio.Entrez would be useful here.  Maybe something like:
>
> def find_related(dbfrom, id) :
>     #Returns a list of dictionaries containing Score and ID matched
>     result = read(elink(dbfrom=dbfrom, id=id))
>     return result[0]["LinkSetDb"][0]['Link']
>
> It might make more sense to return just a list of ID strings, but the
> score may be interesting.
> 
> Peter


From biopython at maubp.freeserve.co.uk  Sat Nov 29 08:36:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 29 Nov 2008 13:36:16 +0000
Subject: [Biopython-dev] [BioPython] PubMed find_related
In-Reply-To: <246349.44664.qm@web62404.mail.re1.yahoo.com>
References: <320fb6e00811280437w8f9f3d2t84716f7a554b913@mail.gmail.com>
	<246349.44664.qm@web62404.mail.re1.yahoo.com>
Message-ID: <320fb6e00811290536n7fe25b0fxfe78d52b16014a92@mail.gmail.com>

On Sat, Nov 29, 2008 at 5:22 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> The problem this user encountered was that the DeprecationWarning in
> PubMed.find_related function contained very little information and did
> not mention that Entrez.elink is the appropriate function to use:
>
> "Find related articles in PubMed, returns an ID list (DEPRECATED).
> Please use Bio.Entrez instead as described in the Biopython Tutorial."

We could make the deprecation warnings from Bio.PubMed (and the online
bits of Bio.GenBank) a little more explicit about which bits of
Bio.Entrez to use.  I made a start on updating Bio/PubMed.py on my
work computer on Friday, so I'll try to remember to finish this off on
Monday.

> and in addition that currently the description of Bio.Entrez.elink in the
> tutorial is almost empty. Instead of adding a function to Bio.Entrez
> that helps this particular user, we should improve our documentation
> to enable all users to use Bio.Entrez appropriately.

The tutorial update for elink looks good (see below).

> The set of helper functions to Bio.Entrez that we could write is
> virtually endless; we should not go down that path.

I take your point - there are lots of possible helper functions we
could consider.  As long as we cover the typical use cases in the
tutorial that should be enough.

On Sat, Nov 29, 2008 at 7:04 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> I've expanded your example a bit and added it to the documentation of Entrez.elink. Thanks!
>
> --Michiel.

That looks good - and tries to explain the nested result structure too.

Peter

From bsouthey at gmail.com  Sun Nov 30 21:37:05 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Sun, 30 Nov 2008 20:37:05 -0600
Subject: [Biopython-dev] Deprecation and removal policy
In-Reply-To: <320fb6e00811280926v16454fa6t891fcc74e4fa4729@mail.gmail.com>
References: <320fb6e00811280926v16454fa6t891fcc74e4fa4729@mail.gmail.com>
Message-ID: <bbcd77d00811301837qf6e7909x18b09f423c55a800@mail.gmail.com>

On Fri, Nov 28, 2008 at 11:26 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Back on 27 June 2008, in preparation for what became Biopython 1.47,
> Michiel wrote:
>> In recent releases, we have been using the rule of thumb to remove all
>> modules from a new Biopython release that were deprecated two
>> releases ago.
>
> I was thinking that when we made releases about six months apart, this
> rule of thumb effectively gave a year's warning.  Recently we've made
> releases roughly every three months, which translates to only about
> six months' warning, so I think we should be a little more restrained
> in removing deprecated code in future.
>
> As an example, Bio.EUtils was deprecated in favour of Bio.Entrez in
> Release 1.48 (Sept 2008).  Under the old rule of thumb, we could
> remove this module from CVS now (as the deprecation was present in
> Biopython 1.48 and 1.49).  If we release Biopython 1.50 in January or
> February 2009 (for the sake of argument), that means the deprecation
> would have been in place for only four or five months - which seems
> too rash.
>
> How about a new policy that after adding a deprecation warning,
> deprecated modules/functions are kept for at least two public releases
> AND at least 12 months (counting from the first release when they are
> deprecated - not the date of the CVS change) before being removed?
>
> Peter
>

Hi,
Generally I would agree with the idea for code that is under active
development. For certain code that has not really been touched for a
few years except for trivial changes (like removing string functions),
I think 12 months is perhaps too long if it has passed two releases.

Regardless of how it is done, Python 3 will need to be supported (the
final release is due soon) and I do not see a reason to port
deprecated modules or functions just because of some policy.  So I
would add the provision that deprecated code will not be ported to
the Python 3 compatible Biopython branch.

Bruce

From bugzilla-daemon at portal.open-bio.org  Sat Nov  1 06:59:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 1 Nov 2008 02:59:40 -0400
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811010659.mA16xedF020106@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp  2008-11-01 02:59 EST -------
I committed part of this patch to CVS; see NaiveBayes.py revision 1.9.
Could you check your classify function? It seems to contain some debugging
statements. Also, do we need the classifyprob function?
If you send in a new version of this code, please attach it as a patch to the
current version of NaiveBayes.py in CVS.
Thanks!




From bugzilla-daemon at portal.open-bio.org  Sat Nov  1 21:22:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 1 Nov 2008 17:22:53 -0400
Subject: [Biopython-dev] [Bug 2592] numpy migration for Bio.PDB.Vector
In-Reply-To: <bug-2592-42@http.bugzilla.open-bio.org/>
Message-ID: <200811012122.mA1LMrf6021694@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2592


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-01 17:22 EST -------
Fixed in CVS, see Bio/PDB/Vector.py revision 1.45




From bugzilla-daemon at portal.open-bio.org  Sat Nov  1 22:11:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 1 Nov 2008 18:11:47 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811012211.mA1MBl3b026482@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #26 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-01 18:11 EST -------
Here is an example of how the updated Seq object might be used (taken from the
new edition of the tutorial in CVS):

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", IUPAC.unambiguous_dna)
>>> coding_dna.translate()
Seq('MAIVMGR*KGAR*', HasStopCodon(IUPACProtein(), '*'))
>>> coding_dna.translate(to_stop=True)
Seq('MAIVMGR', IUPACProtein())

Using the Vertebrate Mitochondrial table instead:

>>> coding_dna.translate(table="Vertebrate Mitochondrial")
Seq('MAIVMGRWKGAR*', HasStopCodon(IUPACProtein(), '*'))
>>> coding_dna.translate(table=2)
Seq('MAIVMGRWKGAR*', HasStopCodon(IUPACProtein(), '*'))
>>> coding_dna.translate(table=2, to_stop=True)
Seq('MAIVMGRWKGAR', IUPACProtein())

As I said in comment 24, the name "to_stop" and its behaviour are taken from
the old (now obsolete) Bio.Translate module.

-------------------------------------------------------------

I'm also considering adding an additional boolean argument (see comment
22):

> Validate the first codon is a valid start codon, and translate
> it as M (even if going on the genetic code it would normally be
> say L).  This should be a boolean argument defaulting to False,
> possible names "start", "check_start", "from_start", ...

I would prefer to avoid calling this argument "start" given the existing
meaning associated with "start" and "end" used in python strings (for
specifying a sub-sequence to be translated - discussed earlier on this bug).

This would be especially useful for translating a gene/CDS sequence into
protein where making sure a non-standard start codon is translated as "M" is
non-trivial.
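
To make the intended behaviour concrete, here is a toy sketch that does
not use the Seq object at all.  The codon dictionary is a deliberately
tiny excerpt of the standard table, the translate function is a
hypothetical stand-in, and "check_start" is just one of the candidate
argument names listed above.

```python
# Tiny excerpt of the standard genetic code, just enough for the
# demonstration below; a real table has 64 entries.
CODONS = {"ATG": "M", "TTG": "L", "CTG": "L", "GCC": "A", "TGA": "*"}
START_CODONS = ("ATG", "TTG", "CTG")  # start codons in the standard table


def translate(dna, check_start=False):
    """Translate dna codon by codon, optionally forcing a leading M."""
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    protein = [CODONS[codon] for codon in codons]
    if check_start:
        if not codons or codons[0] not in START_CODONS:
            raise ValueError("first codon is not a valid start codon")
        protein[0] = "M"
    return "".join(protein)


plain = translate("TTGGCCTGA")
checked = translate("TTGGCCTGA", check_start=True)
print(plain, checked)
```

With the flag off, the alternative start codon TTG is read as L; with it
on, the same codon is reported as M, and a sequence that does not begin
with a start codon is rejected.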




From bugzilla-daemon at portal.open-bio.org  Mon Nov  3 11:17:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 3 Nov 2008 06:17:59 -0500
Subject: [Biopython-dev] [Bug 2638] New: test_PopGen_SimCoal_nodepend.py
	fails on Windows do to newline issue
Message-ID: <bug-2638-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2638

           Summary: test_PopGen_SimCoal_nodepend.py fails on Windows do to
                    newline issue
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Unit Tests
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


This unit test attempts to regenerate a plain text SimCoal file, and currently
fails on Windows (but passes on Linux and Mac OS X).

Patch to follow.




From bugzilla-daemon at portal.open-bio.org  Mon Nov  3 11:22:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 3 Nov 2008 06:22:16 -0500
Subject: [Biopython-dev] [Bug 2638] test_PopGen_SimCoal_nodepend.py fails on
	Windows do to newline issue
In-Reply-To: <bug-2638-42@http.bugzilla.open-bio.org/>
Message-ID: <200811031122.mA3BMGwX013481@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2638


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-03 06:22 EST -------
Created an attachment (id=1030)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1030&action=view)
Patch to the PopGen/SimCoal/Template.py and the unit test

Looking at the code, \r\n is used rather than \n to mean a platform aware
new line.  This doesn't always give a CR LF: on Windows, where text mode
already expands \n to CR LF, you get CR CR LF instead.
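
The effect can be reproduced on any platform with an in-memory text
stream.  This is only an illustration written for current Python, not the
code from the patch: io.StringIO with newline="\r\n" is used here to
imitate how a text-mode file on Windows translates \n when writing.

```python
import io

# newline="\r\n" makes the stream translate "\n" to "\r\n" on write,
# which mimics a text-mode file on Windows.
windows_like = io.StringIO(newline="\r\n")
windows_like.write("line one\r\n")   # hard-coded CR LF in the data
hardcoded = windows_like.getvalue()

windows_like = io.StringIO(newline="\r\n")
windows_like.write("line two\n")     # plain \n, left to the platform
portable = windows_like.getvalue()

print(repr(hardcoded))
print(repr(portable))
```

Only the \n inside the hard-coded "\r\n" is translated, so the first
write ends in CR CR LF while the second ends in a single CR LF.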

Also, are the template files in CVS as plain text files or binary files?  I
haven't double checked but I think they may be checked in as binary files with
DOS/Windows new lines...

I haven't committed this as I don't have SIMCOAL installed to check there are
no side effects.




From bugzilla-daemon at portal.open-bio.org  Mon Nov  3 11:22:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 3 Nov 2008 06:22:53 -0500
Subject: [Biopython-dev] [Bug 2638] test_PopGen_SimCoal_nodepend.py fails on
	Windows, newline issue
In-Reply-To: <bug-2638-42@http.bugzilla.open-bio.org/>
Message-ID: <200811031122.mA3BMr8B013540@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2638


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|test_PopGen_SimCoal_nodepend|test_PopGen_SimCoal_nodepend
                   |.py fails on Windows do to  |.py fails on Windows,
                   |newline issue               |newline issue


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-03 06:22 EST -------
Removed typo in the bug summary (title).




From biopython at maubp.freeserve.co.uk  Mon Nov  3 11:48:06 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 3 Nov 2008 11:48:06 +0000
Subject: [Biopython-dev] New line issues in the source zip or tarballs
In-Reply-To: <320fb6e00809080514u5df6d9dej144c783076cbe467@mail.gmail.com>
References: <320fb6e00809060304h429f1085r301170aa93d4eb73@mail.gmail.com>
	<6d941f120809080442r1797666eu70e35c60353c5462@mail.gmail.com>
	<320fb6e00809080514u5df6d9dej144c783076cbe467@mail.gmail.com>
Message-ID: <320fb6e00811030348vb7b6068v549ebfab9f6ec76b@mail.gmail.com>

On Mon, Sep 8, Peter wrote:
> Tiago wrote:
>> Peter wrote:
>>> In the case of test_PopGen_SimCoal_nodepend.py the failure is
>>> expecting simple.par and simple_100_30.par to be exactly the same size
>>> (in class TemplateTest, line 47).  This is not going to be true
>>> when the input file uses Unix new lines but the generated file uses
>>> Windows new lines.  Perhaps using a simple bit of code to load the
>>> files line by line and compare them would work here?
>>
>>  I am currently at a workshop (I belong to the organization committee, so I
>> don't have much time), but I will try to sort this in the next couple of
>> days.
>
> This new line issue has probably been there since Biopython 1.45
> without anyone else spotting it, so I don't see fixing it as urgent.
> Hopefully we can resolve this for the next release instead.

I've filed Bug 2638 on this with a possible patch.  Could you take a
look at this please?

I just tried installing SIMCOAL2 on my Mac, but failed.  To be fair,
they do only appear to support Linux and Windows...

Thanks

Peter


From biopython at maubp.freeserve.co.uk  Mon Nov  3 12:43:22 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 3 Nov 2008 12:43:22 +0000
Subject: [Biopython-dev] Bio.PopGen and SIMCOAL2 installation
Message-ID: <320fb6e00811030443w4d620c83w64c83fdafb9afa96@mail.gmail.com>

Hi Tiago,

I've just compiled SIMCOAL2 on a Linux machine from
http://cmpg.unibe.ch/software/simcoal2/ (version 2.1.2).  If anyone
else tries this, it required the use of -fpermissive on g++ 4.1.2 to
compile (and gave lots of deprecation warnings, plus some trivial ones
about header files which didn't end with a newline).

The make file specifies the executable name as simcoal2_1_2, however
it does not include an install target, so it is up to the user where
to put the binary (e.g. I used ~/bin/ rather than system wide) and
perhaps what to call it.  The provided pre-compiled binary is also
called simcoal2_1_2.

However, Bio.PopGen.SimCoal.Controller seems to assume the executable
will be called just simcoal2 (or simcoal2.exe on Windows), and thus
fails to detect a binary called simcoal2_1_2.  The unit test however is
more flexible and looks for any binary on the path whose name starts
with simcoal2.  Ideally these two should be consistent.
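
One way to make the two consistent would be a shared prefix search along
these lines.  This is a hypothetical helper, not the code in either the
Controller or the unit test, and it is simplified: it does not check
whether the file is actually executable.

```python
import os
import tempfile


def find_executable(prefix, directories):
    """Return the path of the first file whose name starts with prefix.

    Directories are searched in the order given and names within each
    directory in sorted order; returns None if nothing matches.
    """
    for directory in directories:
        if not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            if name.startswith(prefix) and os.path.isfile(path):
                return path
    return None


# Demonstration with a throwaway directory standing in for ~/bin.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "simcoal2_1_2"), "w").close()
found = find_executable("simcoal2", [demo_dir])
print(os.path.basename(found))
```

In real use the directory list would presumably come from
os.environ["PATH"].split(os.pathsep).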

I can make test_PopGen_SimCoal.py pass by installing SIMCOAL2 as
simcoal2 rather than simcoal2_1_2, but is this a SIMCOAL2 installation
issue or a bug in Bio.PopGen.SimCoal.Controller?  In my experience it
is not normal for a Linux tool to include the full version in the
executable name - using just simcoal2 does make more sense.

Thanks,

Peter


From bugzilla-daemon at portal.open-bio.org  Mon Nov  3 17:16:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 3 Nov 2008 12:16:41 -0500
Subject: [Biopython-dev] [Bug 2639] New: SeqRecord.init doesn't check for
	arguments to their types
Message-ID: <bug-2639-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2639

           Summary: SeqRecord.init doesn't check for arguments to their
                    types
           Product: Biopython
           Version: 1.47
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: dalloliogm at gmail.com


SeqRecord doesn't check that description is a string when creating SeqRecord
objects.
This causes an error later, when you print the record in a format such as
FASTA.

>>> from Bio.Seq import Seq
>>> from Bio.SeqRecord import SeqRecord
>>> sr = SeqRecord(Seq('aaa'), description = [1, 2, 3]) # should give an error here!
>>> print sr.fasta
<type 'exceptions.AttributeError'>: 'list' object has no attribute 'replace'

Looking at SeqRecord.__init__ code, none of the arguments is checked for its
type. 
This is a minor bug, but if you want to fix it, you just have to add some
isinstance() checks to the init function.
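
A minimal sketch of the kind of check suggested here. The class name
CheckedRecord and its three-argument signature are hypothetical stand-ins, not
the real SeqRecord code. It uses str as the string type, which assumes
Python 3; on the Python 2 used in this report it would be basestring.

```python
class CheckedRecord(object):
    """Hypothetical stand-in for SeqRecord; only the type checks matter here."""

    def __init__(self, seq, id="<unknown id>",
                 description="<unknown description>"):
        # Reject non-string values up front, so the failure points at the
        # caller's mistake rather than at a later formatting step.
        for name, value in (("id", id), ("description", description)):
            if not isinstance(value, str):
                raise TypeError("%s should be a string, not %s"
                                % (name, type(value).__name__))
        self.seq = seq
        self.id = id
        self.description = description
```

With this, passing description=[1, 2, 3] fails when the record is created
instead of when it is printed.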




From bugzilla-daemon at portal.open-bio.org  Mon Nov  3 18:47:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 3 Nov 2008 13:47:59 -0500
Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for
	arguments to their types
In-Reply-To: <bug-2639-42@http.bugzilla.open-bio.org/>
Message-ID: <200811031847.mA3IlxuE025247@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2639


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-03 13:47 EST -------
Fixed in CVS, although there is a small chance this will break existing scripts
which relied on the old lax behaviour.

Peter

P.S.
Assuming you are using an unmodified Biopython, the last line of your example
wouldn't work:
>>> print sr.fasta

Try:
>>> print sr.format("fasta")




From bugzilla-daemon at portal.open-bio.org  Mon Nov  3 19:33:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 3 Nov 2008 14:33:39 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811031933.mA3JXdcZ028123@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #4 from bsouthey at gmail.com  2008-11-03 14:33 EST -------
(In reply to comment #3)
> I committed part of this patch to CVS; see NaiveBayes.py revision 1.9.
> Could you check your classify function? It seems to contain some debugging
> statements. Also, do we need the classifyprob function?
> If you send in a new version of this code, please attach it as a patch to the
> current version of NaiveBayes.py in CVS.
> Thanks!
> 

Yes, there is a print statement at the end of the 'classify' function (line 125
of attached file) that should be removed (as with any print statements that are
commented out). These were to check that the values were the same as the
original code. 

The classifyprob function can be dropped with no problems. I just wanted to
return the probability but I also recognize that it is not very useful.


I noticed you are using set (line 145 in the new cvs file) which is not
compatible with Python2.3. How should this be addressed?




From tiagoantao at gmail.com  Mon Nov  3 19:34:36 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 3 Nov 2008 19:34:36 +0000
Subject: [Biopython-dev] Bio.PopGen and SIMCOAL2 installation
In-Reply-To: <320fb6e00811030443w4d620c83w64c83fdafb9afa96@mail.gmail.com>
References: <320fb6e00811030443w4d620c83w64c83fdafb9afa96@mail.gmail.com>
Message-ID: <6d941f120811031134p4c0f1756k5ded879de7555dad@mail.gmail.com>

Hi,

On Mon, Nov 3, 2008 at 12:43 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> However, Bio.PopGen.SimCoal.Controller seems to assume the executable
> will be called just simcoal2 (or simcoal2.exe on Windows), and thus
> fails to detect a binary called simcoal2_1_2.  The unit test however is
> more flexible and looks for any binary on the path whose name starts
> with simcoal2.  Ideally these two should be consistent.

I am aware of this, in fact, this issue is documented in the tutorial
(9.5.2.2). The idea is that the binary should be called simcoal2 as
documented. This can be changed of course. My preference would be to
change just the test code. Is this ok with you?

> I can make test_PopGen_SimCoal.py pass by installing SIMCOAL2 as
> simcoal2 rather than simcoal2_1_2, but is this a SIMCOAL2 installation
> issue or a bug in Bio.PopGen.SimCoal.Controller?  In my experience it
> is not normal for a Linux tool to include the full version in the
> executable name - using just simcoal2 does make more sense.

Agree. And, again, this is documented in the tutorial. I can go ahead
and change the test code (please just confirm).


From tiagoantao at gmail.com  Mon Nov  3 19:56:05 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 3 Nov 2008 19:56:05 +0000
Subject: [Biopython-dev] Statistics in population genetics module - Part
	I
In-Reply-To: <5aa3b3570811030736g7d7a0893x759777252c8d1828@mail.gmail.com>
References: <6d941f120810301658wec8678ald332abb8ddbdf80d@mail.gmail.com>
	<5aa3b3570811030736g7d7a0893x759777252c8d1828@mail.gmail.com>
Message-ID: <6d941f120811031156s2f634c1aq4252b17308ecf24a@mail.gmail.com>

Hi,

On Mon, Nov 3, 2008 at 3:36 PM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
> For how much time do you think a biopython module should be kept compatible
> with older versions, more or less?

That is an interesting discussion. My view is that biopython is fairly
conservative in that regard. I am not saying that I agree/disagree.
There seems to be a certain policy in place, and I respect it. But the
point is: Bio.PopGen has to have the same policy as the rest.

> It will take a long time to develop the module, and it is sure that we will
> make some mistakes. So, what is the best way to proceed? What if we create a

I will try to offer my view about this as soon as possible (in the next days).

> At the moment I am working with a separated git repository for all the
> popgen modules. The problem is that I didn't include all biopython modules
> in the repository, so, if any of my changes breaks something in biopython, I
> won't know it until I'll merge everything with biopython code.

It probably won't break anything as long as you don't change existing
code. If you are only doing your parser I suppose it will be very
easily accepted (don't forget test cases and documentation).
Regarding Statistics we need to discuss it.

> p.s. When python3000 will be released, it will be probably necessary to
> rewrite large portions of biopython, if not creating a 'biopython 2' version
> (I think they were discussing something like this in bioperl's list).

Peter's and Michiel's opinions on this topic are fundamental (they do
most of the work maintaining biopython). But I suppose backward
compatibility is a must.

> I thought that maybe, even if we make some 'mistakes' in this version of
> biopython, we will be able to fix them in a later version.

Mistakes should not break existing code. That is really something we
should try to avoid.

> I think that a good idea would be starting collecting use cases to have an
> idea how many things we'll have to implement in this module.

This might sound elitist, but most people doing population genetics
don't really have any idea of what they should expect from software.
While for the "business of sequences and alignment" there is a large,
mature software community, the same doesn't happen in population
genetics. Or to put it in another way: you don't want to imagine the
type of questions that arrive in my private mailbox ;) .

> I sent that mail to the Open::Bio::I last week, but still haven't received
> many replies... I will send a message to the various Bio.* mailing list in
> the next days.

OBF, in my view, is a bit slow and bureaucratic.
Anyway, I think that anybody's views will gain importance in
proportion to the quantity of code submitted and time devoted to
maintenance of the whole thing.


Tiago


From bugzilla-daemon at portal.open-bio.org  Mon Nov  3 22:58:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 3 Nov 2008 17:58:11 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811032258.mA3MwBoH008744@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-03 17:58 EST -------
(In reply to comment #4)
> 
> I noticed you are using set (line 145 in the new cvs file) which is not
> compatible with Python2.3. How should this be addressed?
> 

I've been using something like this elsewhere in Biopython:

#TODO - Remove this work around once we drop python 2.3 support
try:
    set
except NameError:
    from sets import Set as set

Peter




From biopython at maubp.freeserve.co.uk  Mon Nov  3 23:08:44 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 3 Nov 2008 23:08:44 +0000
Subject: [Biopython-dev] Bio.PopGen and SIMCOAL2 installation
In-Reply-To: <6d941f120811031134p4c0f1756k5ded879de7555dad@mail.gmail.com>
References: <320fb6e00811030443w4d620c83w64c83fdafb9afa96@mail.gmail.com>
	<6d941f120811031134p4c0f1756k5ded879de7555dad@mail.gmail.com>
Message-ID: <320fb6e00811031508xfef548dm1a0673b7dba70567@mail.gmail.com>

On Mon, Nov 3, 2008 at 7:34 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
> Hi,
>
> On Mon, Nov 3, 2008 at 12:43 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> However, Bio.PopGen.SimCoal.Controller seems to assume the executable
>> will be called just simcoal2 (or simcoal2.exe on Windows), and thus
>> fails to detect a binary called simcoal2_1_2.  The unit test however is
>> more flexible and looks for any binary on the path whose name starts
>> with simcoal2.  Ideally these two should be consistent.
>
> I am aware of this, in fact, this issue is documented in the tutorial
> (9.5.2.2). The idea is that the binary should be called simcoal2 as
> documented. This can be changed of course. My preference would be to
> change just the test code. Is this ok with you?
>
>> I can make test_PopGen_SimCoal.py pass by installing SIMCOAL2 as
>> simcoal2 rather than simcoal2_1_2, but is this a SIMCOAL2 installation
>> issue or a bug in Bio.PopGen.SimCoal.Controller?  In my experience it
>> is not normal for a Linux tool to include the full version in the
>> executable name - using just simcoal2 does make more sense.
>
> Agree. And, again, this is documented in the tutorial. I can go ahead
> and change the test code (please just confirm).

I had skimmed over the tutorial, but missed this bit - sorry.
Hopefully anyone interested in using SIMCOAL would have read this more
carefully, but perhaps it could be made more prominent? e.g. try to
include a few more keywords like install/installation and executable
as well as binary (which I did not think to search for at the time).

Let's just change test_PopGen_SimCoal.py to look for simcoal2 (or
simcoal2.exe on Windows) so it is consistent with
Bio.PopGen.SimCoal.Controller, and I would also mention what the
binary should be called in the SimCoalController __init__ docstring.
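
A sketch of the exact-name lookup described above, as the test could use it.
The helper name find_simcoal2 is hypothetical and is not part of Bio.PopGen;
it only illustrates matching simcoal2 (or simcoal2.exe on Windows) rather than
any name that starts with simcoal2.

```python
import os
import sys


def find_simcoal2(search_dirs=None):
    """Return the full path of the simcoal2 executable, or None if absent."""
    if search_dirs is None:
        search_dirs = os.environ.get("PATH", "").split(os.pathsep)
    # Exact name only, so a binary left as simcoal2_1_2 is not picked up.
    name = "simcoal2.exe" if sys.platform == "win32" else "simcoal2"
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

The test can then be skipped when find_simcoal2() returns None, which is the
same condition under which the controller would fail to find the program.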

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 09:31:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 04:31:19 -0500
Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for
	arguments to their types
In-Reply-To: <bug-2639-42@http.bugzilla.open-bio.org/>
Message-ID: <200811040931.mA49VJOT019957@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2639


------- Comment #2 from dalloliogm at gmail.com  2008-11-04 04:31 EST -------
I have tested the cvs code, it seems to work.

Maybe you could also allow ids to be integers.
If you are afraid of causing problems for older scripts, you could str() the
arguments if they are not strings.




From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 09:39:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 04:39:18 -0500
Subject: [Biopython-dev] [Bug 2443] Specifying the alphabet in Bio.SeqIO and
	Bio.AlignIO
In-Reply-To: <bug-2443-42@http.bugzilla.open-bio.org/>
Message-ID: <200811040939.mA49dIQ9021075@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2443


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-04 04:39 EST -------
Marking as fixed - unit tests updated, and the new argument is mentioned in the
tutorial as well.

A more extensive example would be nice, perhaps using Bio.AlignIO with the
Bio.Align.AlignInfo module...




From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 10:06:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 05:06:40 -0500
Subject: [Biopython-dev] [Bug 2628] Have Bio.SeqIO.write(...) and
	Bio.AlignIO.write(...) return number of records
In-Reply-To: <bug-2628-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041006.mA4A6eAt024777@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2628


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-04 05:06 EST -------
Patch checked in, marking as fixed.

Checking in Bio/SeqIO/Interfaces.py;
/home/repository/biopython/biopython/Bio/SeqIO/Interfaces.py,v  <-- 
Interfaces.py
new revision: 1.11; previous revision: 1.10
done
Checking in Bio/SeqIO/__init__.py;
/home/repository/biopython/biopython/Bio/SeqIO/__init__.py,v  <--  __init__.py
new revision: 1.44; previous revision: 1.43
done
Checking in Bio/AlignIO/Interfaces.py;
/home/repository/biopython/biopython/Bio/AlignIO/Interfaces.py,v  <-- 
Interfaces.py
new revision: 1.7; previous revision: 1.6
done
Checking in Bio/AlignIO/NexusIO.py;
/home/repository/biopython/biopython/Bio/AlignIO/NexusIO.py,v  <--  NexusIO.py
new revision: 1.7; previous revision: 1.6
done
Checking in Bio/AlignIO/__init__.py;
/home/repository/biopython/biopython/Bio/AlignIO/__init__.py,v  <-- 
__init__.py
new revision: 1.19; previous revision: 1.18
done
Checking in Tests/test_SeqIO.py;
/home/repository/biopython/biopython/Tests/test_SeqIO.py,v  <--  test_SeqIO.py
new revision: 1.44; previous revision: 1.43
done
Checking in Tests/test_AlignIO.py;
/home/repository/biopython/biopython/Tests/test_AlignIO.py,v  <-- 
test_AlignIO.py
new revision: 1.17; previous revision: 1.16
done


Checking in Tutorial.tex;
/home/repository/biopython/biopython/Doc/Tutorial.tex,v  <--  Tutorial.tex
new revision: 1.183; previous revision: 1.182
done




From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 10:51:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 05:51:23 -0500
Subject: [Biopython-dev] [Bug 2640] New: Proposal: doctest for
	SeqRecord/biopython
Message-ID: <bug-2640-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640

           Summary: Proposal: doctest for SeqRecord/biopython
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P3
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: dalloliogm at gmail.com


I would like to propose using doctest tests in biopython.
I find them very useful for understanding how a script should be used, and
they can also act as unit tests.

Here is the main documentation for doctest:
- http://www.python.org/doc/2.5.2/lib/module-doctest.html

Usually, you add a _test() function to every module, which calls the doctest
library, and launch it when __name__ == '__main__'.

The most representative example is added to the documentation string of every
module/function, and tested with doctest.testmod(); later, you add more tests
in a separate file, and launch them with doctest.testfile().
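
The pattern described above can be sketched as follows. The function
gc_fraction is a hypothetical example chosen for illustration, not existing
Biopython code; only doctest.testmod() and the _test() convention come from
the proposal.

```python
import doctest


def gc_fraction(seq):
    """Return the fraction of G and C letters in seq.

    >>> gc_fraction("GGCC")
    1.0
    >>> gc_fraction("ATGC")
    0.5
    >>> gc_fraction("")
    0.0
    """
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / float(len(seq))


def _test():
    """Run the doctests embedded in this module's docstrings."""
    return doctest.testmod()


if __name__ == "__main__":
    _test()
```

Running the module as a script prints nothing when every example passes, and
a report of expected versus actual output when one fails.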




From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 10:52:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 05:52:21 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041052.mA4AqLGX028185@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #1 from dalloliogm at gmail.com  2008-11-04 05:52 EST -------
Created an attachment (id=1031)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1031&action=view)
patch to add doctest to SeqRecord.py

Here is a patch to add doctest documentation to Bio/SeqRecord.py




From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 11:23:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 06:23:12 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041123.mA4BNCQ0030388@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #27 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-04 06:23 EST -------
Created an attachment (id=1032)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view)
Patch to Bio/Seq.py to add start codon handling to translation

Patch adds a new boolean argument to the translate method and function, called
"init" (rather than my earlier suggestions like "from_start" or "check_start"
which could be considered misleading).

Docstring:

        init - Boolean, defaults to False.  Should translation check the
               first codon is a valid initiation (start) codon and translate
               it as methionine (M)?  If False, nothing special is done with
               the first codon.


Example usage of the translate function,

>>> from Bio.Seq import translate
>>> translate("TTGAAACCCTAG")
'LKP*'
>>> translate("TTGAAACCCTAG", init=True, to_stop=True)
'MKP'
>>> translate("TTGAAACCCTAG", init=True)
'MKP*'
>>> translate("TTGAAACCCTAG", to_stop=True)
'LKP'

Using the Seq method,

>>> from Bio.Seq import Seq
>>> my_seq = Seq("TTGAAACCCTAG")
>>> my_seq.translate()
Seq('LKP*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> my_seq.translate(init=True, to_stop=True)
Seq('MKP', ExtendedIUPACProtein())
>>> my_seq.translate(init=True)
Seq('MKP*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> my_seq.translate(to_stop=True)
Seq('LKP', ExtendedIUPACProtein())

Comments please.




From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 11:23:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 06:23:39 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041123.mA4BNdAS030439@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


dalloliogm at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1031 is|0                           |1
           obsolete|                            |


------- Comment #2 from dalloliogm at gmail.com  2008-11-04 06:23 EST -------
Created an attachment (id=1033)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1033&action=view)
patch to add doctest to SeqRecord.py

This patch is maybe clearer than the previous one - it adds an example on
adding annotations to a SeqRecord.




From biopython at maubp.freeserve.co.uk  Tue Nov  4 11:36:50 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 4 Nov 2008 11:36:50 +0000
Subject: [Biopython-dev] Preparing for Biopython 1.49 (beta)
Message-ID: <320fb6e00811040336k12a834b9o2fa103b8fabf7ec1@mail.gmail.com>

Dear all,

The Numeric to numpy migration is done now, and we are also looking
good for python 2.6.

After a little off list discussion, it's probably time to prepare the
next release.  However, given the number of changes, and therefore the
higher risk that we've broken something, we'll call this a beta
release.

Are there any bugs or issues people think should block this release?

I would like to check in my initiation/start codon argument patch for
translation (see Bug 2381), but would like a little discussion on this
first (in particular the argument naming).

I'd like to try and do the Biopython 1.49 "beta" release at the end of
this week (with a follow up Biopython 1.49 "final" release say one
week later if needed to deal with any issues from the beta).

If this schedule is realistic, then Tiago should be OK to add his next
set of PopGen code in about two weeks time (for what would become
Biopython 1.50).

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 11:48:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 06:48:53 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041148.mA4Bmrag032109@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-04 06:48 EST -------
I think we would need to integrate this into the existing test framework so
that any new doctests are actually used.  For an example of this on a module by
module basis, see test_Wise.py and test_psw.py (although these don't interact
well with our test framework on Python 2.3, see bug 2613).

If a large number of Biopython modules have doctests then a more automated
system could be designed (searching all non-deprecated modules for doctests).




From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 12:04:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 07:04:54 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041204.mA4C4sHS000823@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #28 from dalloliogm at gmail.com  2008-11-04 07:04 EST -------
(In reply to comment #27)
> Created an attachment (id=1032)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view) [details]
> Patch to Bio/Seq.py to add start codon handling to translation
> 
> Patch adds a new boolean argument to the translate method and function, called
> "init" (rather than my earlier suggestions like "from_start" or "check_start"
> which could be considered misleading).
> 
> Docstring:
> 
>         init - Boolean, defaults to False.  Should translation check the
>                first codon is a valid initiation (start) codon and translate
>                it as methionine (M)?  If False, nothing special is done with
>                the first codon.

I don't like the name 'init' :( it would be better to use an argument with the
word 'force' in it. E.g.: force_has_coding, force_first_position, etc..

If you haven't read the discussion in this bug report, it is not very
clear what happens when init=True and why.
You should add a description of why this option exists to the docstring.

> 
> Example usage of the translate function,
> 
> >>> from Bio.Seq import translate
> >>> translate("TTGAAACCCTAG")
> 'LKP*'
> >>> translate("TTGAAACCCTAG", init=True, to_stop=True)
> 'MKP'

Without having read the discussion in this bug report, I was expecting an
exception here. Why does it force a Methionine to be in the first position?
It loses the information of a Leu in the first position.

> >>> translate("TTGAAACCCTAG", init=True)
> 'MKP*'
> >>> translate("TTGAAACCCTAG", to_stop=True)
> 'LKP'
> 

You could add a check for non coding aminoacids:
>>> translate("UAACAGTGCAT")
ExceptionError: Non coding aminoacid in the first position




From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 12:28:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 07:28:56 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041228.mA4CSuvT002892@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #29 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-04 07:28 EST -------
(In reply to comment #28)
> (In reply to comment #27)
> > Created an attachment (id=1032)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view) [details] [details]
> > Docstring:
> > 
> >         init - Boolean, defaults to False.  Should translation check the
> >                first codon is a valid initiation (start) codon and translate
> >                it as methionine (M)?  If False, nothing special is done with
> >                the first codon.
> 
> I don't like the name 'init' :( it would be better to use an argument with the
> word 'force' in it. E.g.: force_has_coding, force_first_position, etc..

Maybe - but I don't think force_has_coding, force_first_position are any
clearer, and they are very long.  Do you like "with_start_codon" or
"with_init_codon"?

Note that I used "init" rather than "initiation (codon)" because python already
uses init as shorthand for initiation/initialisation.

> If you haven't read the discussion in this bug report, it is not very
> clear what happens when init=True and why.

If it had been called "start" or "from_start" or "start_codon" the meaning
wouldn't be clear either - you might expect "start" or "from_start" to take an
integer location, and start_codon to take a three letter string.

> You should add a description of why this option exists to the docstring.

OK - That makes sense.

> > 
> > Example usage of the translate function,
> > 
> > >>> from Bio.Seq import translate
> > >>> translate("TTGAAACCCTAG")
> > 'LKP*'
> > >>> translate("TTGAAACCCTAG", init=True, to_stop=True)
> > 'MKP'
> 
> Without having read the discussion in this bug report, I was expecting an
> exception here. Why does it force a Methionine to be in the first position?
> It loses the information of a Leu in the first position.

Because if this was a CDS using an alternative start codon of TTG it would be
translated as a methionine and NOT as a leucine (because instead of a typical
tRNA-Leu, an initiation tRNA is used).  This is the whole point of this optional
argument.  If you want TTG translated blindly as L, don't use the init argument
(or set it to False).

See also http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi which
explicitly lists these alternative codons as giving M when used as starts, e.g.

    AAs  = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts = ---M---------------M---------------M----------------------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
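
The table above can be read column by column, and the behaviour under
discussion can be sketched from it. This is an illustration only, not the
attached patch: translate_sketch is a hypothetical name, it handles plain DNA
strings only, and raising ValueError for a first codon that is not a start
codon is an assumption about what "check" should mean.

```python
# The five rows of the NCBI table quoted above, copied as strings.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STARTS = "---M---------------M---------------M----------------------------"
BASE1 = "TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG"
BASE2 = "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG"
BASE3 = "TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG"

# Each column is one codon: three bases, its amino acid, and a start flag.
CODONS = [b1 + b2 + b3 for b1, b2, b3 in zip(BASE1, BASE2, BASE3)]
FORWARD = dict(zip(CODONS, AAS))
START_CODONS = set(c for c, s in zip(CODONS, STARTS) if s == "M")


def translate_sketch(seq, init=False, to_stop=False):
    """Translate a DNA string; 'init' mirrors the argument proposed above."""
    seq = seq.upper()
    protein = []
    # Ignore any trailing partial codon.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if i == 0 and init:
            # Assumption: a first codon that is not a start codon is an error.
            if codon not in START_CODONS:
                raise ValueError("%s is not a start codon" % codon)
            amino = "M"  # the initiator tRNA gives Met, whatever the codon
        else:
            amino = FORWARD[codon]
        if amino == "*" and to_stop:
            break
        protein.append(amino)
    return "".join(protein)
```

With this sketch, translate_sketch("TTGAAACCCTAG") gives 'LKP*' and
translate_sketch("TTGAAACCCTAG", init=True) gives 'MKP*', matching the
examples in comment #27.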

Peter




From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 13:41:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 08:41:51 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041341.mA4DfpYD009210@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp  2008-11-04 08:41 EST -------
I've committed Peter's fix for the set import to CVS.

About the replacement for listfns.contents in the modified NaiveBayes code: Did
you do any timings to compare the new code to the old code? Since
listfns.contents is implemented in C, it may be (much) faster than the
replacement code.
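
One way to get such timings is the standard timeit module. The sketch below
assumes contents() maps each item to its fraction of the list. Both functions
are hypothetical pure-Python stand-ins, so this shows only the method, not the
C version of listfns.contents or the code in the patch.

```python
import timeit


def contents_loop(items):
    """Map each distinct item to its fraction of the list, in one pass."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    total = float(len(items))
    return dict((item, n / total) for item, n in counts.items())


def contents_count(items):
    """Same result via list.count(), which rescans the list per distinct item."""
    total = float(len(items))
    return dict((item, items.count(item) / total) for item in set(items))


def compare(items, repeat=3, number=100):
    """Return the best wall-clock time in seconds for each implementation."""
    timings = {}
    for func in (contents_loop, contents_count):
        timer = timeit.Timer(lambda: func(items))
        timings[func.__name__] = min(timer.repeat(repeat=repeat, number=number))
    return timings


if __name__ == "__main__":
    data = list("ACGT" * 250)
    for name, seconds in sorted(compare(data).items()):
        print("%s: %.4f s" % (name, seconds))
```

Taking the minimum over several repeats reduces the influence of other load
on the machine; the real comparison would put the C function and its
replacement in place of the two stand-ins.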




From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 13:57:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 08:57:48 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041357.mA4Dvm2B010202@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #30 from lpritc at scri.sari.ac.uk  2008-11-04 08:57 EST -------
(In reply to comment #29)
> (In reply to comment #28)
> > (In reply to comment #27)
> > > Created an attachment (id=1032)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view)
> > > Docstring:
> > > 
> > >         init - Boolean, defaults to False.  Should translation check the
> > >                first codon is a valid initiation (start) codon and translate
> > >                it as methionine (M)?  If False, nothing special is done with
> > >                the first codon.
> > 
> > I don't like the name 'init' :( it would be better to use an argument with the
> > word 'force' in it. E.g.: force_has_coding, force_first_position, etc..
> 
> Maybe - but I don't think force_has_coding, force_first_position are any
> clearer, and they are very long.  Do you like "with_start_codon" or
> "with_init_codon"?

I think that there are two key things that are going on as a result of this
setting being True:

1) The first codon (starting at position 0) of the nucleotide sequence is being
checked as a valid initiation codon

2) If it is such a valid codon, the translated aa is Met (because this is what
happens biologically).
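
The two effects can be sketched in plain Python. This is only an illustration
under stated assumptions, not the code in attachment 1032: the translate
function below is a hypothetical stand-alone helper, it hard-codes the NCBI
standard table (id 1) with its start codons TTG, CTG and ATG, and it borrows
the argument names init and to_stop from this thread.

```python
import itertools

# NCBI standard genetic code (table 1), amino acids listed in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(triple) for triple in itertools.product(BASES, repeat=3)]
FORWARD_TABLE = dict(zip(CODONS, AMINO_ACIDS))
START_CODONS = {"TTG", "CTG", "ATG"}


def translate(seq, init=False, to_stop=False):
    """Translate an unambiguous DNA string; trailing partial codons are ignored."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    if init and (not codons or codons[0] not in START_CODONS):
        # Effect 1: the first codon must be a valid initiation codon.
        raise ValueError("First codon %r is not a start codon" % seq[:3])
    residues = [FORWARD_TABLE[codon] for codon in codons]
    if init:
        # Effect 2: a valid initiation codon is translated as methionine.
        residues[0] = "M"
    protein = "".join(residues)
    if to_stop:
        protein = protein.split("*")[0]
    return protein


print(translate("TTGAAACCCTAG"))
print(translate("TTGAAACCCTAG", init=True))
print(translate("TTGAAACCCTAG", init=True, to_stop=True))
```

With init=False the leading TTG is read as an ordinary leucine codon; with
init=True it is first validated and then reported as M.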

It's quite a complicated concept, and if we wanted to be completely explicit,
an option called 'assert_first_codon_is_initiation_and_translate_to_met' would
be clear, but would be far too long to be sensible.  Most other shorter options
are either ambiguous, misleading, or ambiguously misleading - largely because
people will assume that the term means what they want it to mean instead of
what it does, as described below:

> If it had been called "start" or "from_start" or "start_codon" the meaning
> isn't clear either - you might expect "start" or "from_start" to take an
> integer location, and "start_codon" to take a three letter string.

I am not too worried about long arguments, so 'assert_first_codon_init' would
be fine for me (though does this mean that the first codon of the sequence
should be an initiation codon, or that translation should start from the first
initiation codon?), but I see the drive for, and value of, brevity.  If there's
a short, unambiguous option name that you can think of, I'm all for it.  An
option name that is a little cryptic, but not misleading, such as 'init', also
works for me.  I would have to go to the minor effort of typing
help(seq.translate) to find out what it meant, but it's not very much of a
chore.

Also, people learn all kinds of non-standard uses for cryptic terms, all the
time.  For example, what on earth does 'popen3' mean?  Why not
open_pipes_with_stdin_stdout_stderr?  'popen3' is short, unambiguous (if not
immediately obvious), and if you want to know what it means, then help() or a
dip in the documentation will tell you.  I think the same will be true of
'init', so long as no-one is likely to confuse it with some other meaning.

L.




From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 13:58:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 08:58:21 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041358.mA4DwLiK010266@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #4 from dalloliogm at gmail.com  2008-11-04 08:58 EST -------
(In reply to comment #3)
> I think we would need to integrate this into the existing test framework so
> that any new doctests are actually used.  For an example of this on a module by
> module basis, see test_Wise.py and test_psw.py (although these don't interact
> well with our test framework on Python 2.3, see bug 2613).
> 
> If a large number of Biopython modules have doctests then a more automated
> system could be designed (searching all non-deprecated modules for doctests).
> 

If you think it would be useful, I can write other doctests for other modules
in the following days.




From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 14:44:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 09:44:15 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041444.mA4EiFv5013693@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #31 from dalloliogm at gmail.com  2008-11-04 09:44 EST -------
(In reply to comment #30)
> (In reply to comment #29)
> > (In reply to comment #28)
> > > (In reply to comment #27)
> > > > Created an attachment (id=1032)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view)
> 
> It's quite a complicated concept, and if we wanted to be completely explicit,
> an option called 'assert_first_codon_is_initiation_and_translate_to_met' would
> be clear, but would be far too long to be sensible.  Most other shorter options
> are either ambiguous, misleading, or ambiguously misleading - largely because
> people will assume that the term means what they want it to mean instead of
> what it does, as described below:
> 
> > If it had been called "start" or "from_start" or "start_codon" the meaning
> > isn't clear either - you might expect "start" or "from_start" to take an
> > integer location, and "start_codon" to take a three letter string.
> 
> I am not too worried about long arguments, so 'assert_first_codon_init' would
> be fine for me (though does this mean that the first codon of the sequence
> should be an initiation codon, or that translation should start from the first
> initiation codon?), but I see the drive for, and value of, brevity.  If there's
> a short, unambiguous option name that you can think of, I'm all for it.  An
> option name that is a little cryptic, but not misleading, such as 'init', also
> works for me. 

When I saw 'init' for the first time, I thought there was some kind of
complicated calculation associated with the translate function, which
init=False was meant to skip in order to get a faster but less accurate
translation.

> I would have to go to the minor effort of typing
> help(seq.translate) to find out what it meant, but it's not very much of a
> chore.


It is also a matter of code readability; I don't think many people would
understand what init is meant for just by looking at a script.

If I use this option in one of my scripts, and a colleague reads it, I want to
be sure that he will easily understand that I am forcing the first position
to be a Methionine.
Otherwise, the risk is that he won't properly understand my results.

In which of these examples do you understand that the first position is being
forced to a Methionine?
>>> translate("TTGAAACCCTAG", init=True, to_stop=True)

>>> translate("TTGAAACCCTAG", force_as_translating=True, to_stop=True)

>>> translate("TTGAAACCCTAG", force_methionine=True, to_stop=True)

>>> translate("TTGAAACCCTAG", force_methionine=True, force_stop=True)

>>> translate("TTGAAACCCTAG", alt_start=True, alt_stop=True)

Also, I don't think this option will be used very often. 
So, it shouldn't be a problem if its name is too long to type, and it would be
better if it is easy to understand.


> 
> Also, people learn all kinds of non-standard uses for cryptic terms, all the
> time.  For example, what on earth does 'popen3' mean?  Why not
> open_pipes_with_stdin_stdout_stderr?  'popen3' is short, unambiguous (if not
> immediately obvious),

When I was a python newbie, I really hated the name popen3 :)

> and if you want to know what it means, then help() or a
> dip in the documentation will tell you.  I think the same will be true of
> 'init', so long as no-one is likely to confuse it with some other meaning.
> 
> L.
> 




From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 14:45:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 09:45:17 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041445.mA4EjHg4013777@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-04 09:45 EST -------
(In reply to comment #4)
> 
> If you think it would be useful, I can write other doctests for other modules
> in the following days.
> 

I think adding more doctests would be useful, but they MUST get run by our
existing test suite.  Otherwise they'll just be human readable documentation
(which is still nice) but will not get regularly validated.
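
As a sketch of what "run by the test suite" could look like, the standard
library can wrap docstring examples as unittest test cases. Everything named
here is an assumption for illustration: gc_fraction is a made-up function
standing in for a Biopython module, doctest_suite is a hypothetical helper,
and the existing Biopython test framework may need a different hook.

```python
import doctest
import unittest


def gc_fraction(seq):
    """Return the fraction of G and C letters in a sequence.

    >>> gc_fraction("GGCC")
    1.0
    >>> gc_fraction("ATGC")
    0.5
    """
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / float(len(seq))


def doctest_suite(objects):
    """Collect the docstring examples of the given objects into a TestSuite."""
    finder = doctest.DocTestFinder(recurse=False)
    suite = unittest.TestSuite()
    for obj in objects:
        # module=False: do not try to work out which module obj belongs to;
        # the examples only need to see the object itself.
        for test in finder.find(obj, module=False, globs={obj.__name__: obj}):
            if test.examples:
                suite.addTest(doctest.DocTestCase(test))
    return suite


result = unittest.TestResult()
doctest_suite([gc_fraction]).run(result)
print("ran %d doctest(s), %d failure(s)"
      % (result.testsRun, len(result.failures)))
```

A failing docstring example then shows up as an ordinary test failure rather
than only as printed output.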




From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 15:39:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 10:39:42 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041539.mA4Fdgc8017798@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #32 from lpritc at scri.sari.ac.uk  2008-11-04 10:39 EST -------
(In reply to comment #31)
> (In reply to comment #30)

> It is also a matter of code readability; I don't think many people would
> understand what init is meant for just by looking at a script.

True enough, but if someone's already used it, and you don't know what it means
when reading their script, looking it up isn't hard.  What's hard is guessing
which option you need to invoke, and calling help() is one way to do that,
too...

Not that I want to extend this argument to single-letter options with *no*
relevance to their intent ;)

seq.translate(a=True, b='GUG', c=9)

> If I use this option in one of my scripts, and a colleague reads it, I want to
> be sure that he will easily understand that I am forcing the first position
> to be a Methionine.
> Otherwise, the risk is that he won't properly understand my results.

Maybe put it in a comment-line?   Even if the colleague understands from the
code *that* you've translated an alternative start to a methionine, they may
not understand *why* - and the comment line is essential, then.

> In which of these examples do you understand that the first position is being
> forced to a Methionine?

None are particularly clear, but only one of them doesn't give me the wrong
idea...

> >>> translate("TTGAAACCCTAG", init=True, to_stop=True)

Because I've read this thread (or looked at the docs) - I understand this one
;)

> >>> translate("TTGAAACCCTAG", force_as_translating=True, to_stop=True)

I don't intuitively understand this.  Does it mean that the sequence should be
translatable?

> >>> translate("TTGAAACCCTAG", force_methionine=True, to_stop=True)

Does this mean that the sequence will be translated from the first methionine
the method finds?

> >>> translate("TTGAAACCCTAG", force_methionine=True, force_stop=True)

As above, and does force_stop mean that you add a '*' to the end of the
translation?  Or that you stop at a stop codon?

> >>> translate("TTGAAACCCTAG", alt_start=True, alt_stop=True)

'alt_start' I would think referred to allowing translation from alternative
start codons.  I don't know what alt_stop would mean...

> Also, I don't think this option will be used very often. 

Maybe not.  The first use case that comes to mind is QA on CDS-finding:

# Check if sequence is CDS:
assert candidate_cds.translate(init=True)
# Check if reported CDS start is valid
assert est[37:].translate(init=True)

A second use case is slower in presenting itself...

> So, it shouldn't be a problem if its name is too long to type, and it would be
> better if it is easy to understand.

That's a fair argument, I think.  On the whole, though, I would favour a short,
unambiguous, slightly cryptic name over a very long, unambiguous name, over an
ambiguous name of any length.

> When I was a python newbie, I really hated the name popen3 :)

At least we have subprocess, now. 

L.




From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 15:44:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 10:44:47 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041544.mA4FilLH018113@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #6 from dalloliogm at gmail.com  2008-11-04 10:44 EST -------
(In reply to comment #5)
> (In reply to comment #4)
> > 
> > If you think it would be useful, I can write other doctests for other modules
> > in the following days.
> > 
> 
> I think adding more doctests would be useful, but they MUST get run by our
> existing test suite.  Otherwise they'll just be human readable documentation
> (which is still nice) but will not get regularly validated.

There are a few ways to do it, but it is not too difficult to implement.
The easiest thing is to use 'doctest.testmod' in the test files.
For example, you can add to test_SeqRecord.py the following lines:

import doctest 
from Bio import SeqRecord   # import the module, not SeqRecord.SeqRecord
print "testing with doctest..."
(failures, tests) = doctest.testmod(SeqRecord)
if failures == 0:
    print 'ok'
else:
    print 'some test has failed'

or you can launch the '_test' function in every module (see my patch), but this
would require importing doctest multiple times.
>>> SeqRecord._test()


I will write some other doctests in the following days/weeks and post them here
as patches, and you will decide.
Anyway, do you think they will make biopython's documentation nicer? Do you
like them?
Sometimes, doctests make the doc strings a bit messy, so some people don't like
them. 
But it is really a matter of how you write them.




From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 16:11:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 11:11:49 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041611.mA4GBnuW020154@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #33 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-04 11:11 EST -------
(In reply to comment #32)
> > In which of these examples do you understand that the first position is
> > being forced to a Methionine?

With my suggested code, you would not just be forcing the first codon to be a
methionine.  You would also be asking for the first codon to be validated as a
start codon (initialisation codon).

> None are particularly clear, but only one of them doesn't give me the wrong
> idea...

In some cases I seem to have guessed different possible meanings for some of
these suggested names - so those are probably unclear.

> > >>> translate("TTGAAACCCTAG", init=True, to_stop=True)
> 
> Because I've read this thread (or looked at the docs) - I understand this one
> ;)

To me this suggests something special is happening with the initialisation of
the translation - but I agree it's not clear what without checking the
documentation.

> > >>> translate("TTGAAACCCTAG", force_as_translating=True, to_stop=True)
> 
> I don't intuitively understand this.  Does it mean that the sequence should be
> translatable?

Ditto - an argument called force_as_translating means nothing to me.  You're
calling a translation method so what can forcing a translation mean?

> > >>> translate("TTGAAACCCTAG", force_methionine=True, to_stop=True)
> 
> Does this mean that the sequence will be translated from the first methionine
> the method finds?

I would have guessed force_methionine would ignore the value of the first three
nucleotides in order to treat them as a methionine (even if they are not a
start codon).

> > >>> translate("TTGAAACCCTAG", force_methionine=True, force_stop=True)
> 
> As above, and does force_stop mean that you add a '*' to the end of the
> translation?  Or that you stop at a stop codon?

Like Leighton, I would be confused by "force_stop".  It could mean add a stop
symbol to the end of the amino acid sequence even if there isn't one there
already.

> > >>> translate("TTGAAACCCTAG", alt_start=True, alt_stop=True)
> 
> 'alt_start' I would think referred to allowing translation from alternative
> start codons.  I don't know what alt_stop would mean...

I think "alt_start" would be misleading for the intended dual functionality. 
Consider the typical use case for this option - translating a CDS, which most
of the time will use the typical start codon AUG / ATG (but not always).
We'd want the start codon validated - and it often won't be an alternative
start codon.  So calling the argument "alt_start" is confusing.

> > Also, I don't think this option will be used very often. 
> 
> Maybe not.  The first use case that comes to mind is QA on CDS-finding:
> 
> # Check if sequence is CDS:
> assert candidate_cds.translate(init=True)
> # Check if reported CDS start is valid
> assert est[37:].translate(init=True)
> 
> A second use case is slower in presenting itself...

I think translating a CDS is quite a common task - so a very long argument
would be bad.

Instead of the "init" start codon option in attachment 1032, I'd also be happy
with a single boolean argument which does start codon validation, treats this
as a methionine, checks the sequence is a multiple of three in length, checks
for a final stop codon, and checks for no additional stop codons.  We'd ruled
out calling this "complete", but maybe "cds" would be better?
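
The checks listed above can be sketched as a small validator. This is an
illustration only: cds_problems is a hypothetical helper, not part of
Bio.Seq, and the default codon sets are the start and stop codons of NCBI
table 11 (bacterial).

```python
# Start and stop codons of NCBI table 11 (bacterial), DNA alphabet.
BACTERIAL_START_CODONS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_problems(seq, start_codons=BACTERIAL_START_CODONS,
                 stop_codons=STOP_CODONS):
    """Return a list of reasons why seq is not a complete CDS (empty if OK)."""
    seq = seq.upper()
    problems = []
    if len(seq) % 3:
        problems.append("length is not a multiple of three")
    # Split into whole codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons or codons[0] not in start_codons:
        problems.append("first codon is not a start codon")
    if not codons or codons[-1] not in stop_codons:
        problems.append("last codon is not a stop codon")
    if any(codon in stop_codons for codon in codons[:-1]):
        problems.append("in-frame stop codon before the end")
    return problems


print(cds_problems("GTGAAAAAGTAA"))
print(cds_problems("AAATAGCCC"))
```

Returning the list of problems, rather than raising on the first one, is just
one possible design; a single boolean argument on translate would more likely
raise an exception.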

> > So, it shouldn't be a problem if its name is too long to type, and it would
> > be better if it is easy to understand.
> 
> That's a fair argument, I think.  On the whole, though, I would favour a
> short, unambiguous, slightly cryptic name over a very long, unambiguous
> name, over an ambiguous name of any length.

There is a lot of subjectiveness in argument naming - clearly we have not come
up with a perfect suggestion yet.

Unfortunately "init" can be misunderstood (I'm not 100% sure what you were
trying to say in comment 31, but I think you took the name "init" to mean
some sort of optional optimisation initialisation).

How about "cds_start" instead of "init"?




From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 17:43:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 12:43:53 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041743.mA4Hhrcc026138@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #34 from bsouthey at gmail.com  2008-11-04 12:43 EST -------
(In reply to comment #33)
> (In reply to comment #32)
> > > In which of these examples do you understand that the first position is
> > > being forced to a Methionine?
> 
> With my suggested code, you would not just be forcing the first codon to be a
> methionine.  You would also be asking for the first codon to be validated as a
> start codon (initialisation codon).
> 
> > None are particularly clear, but only one of them doesn't give me the wrong
> > idea...
> 
> In some cases I seem to have guessed different possible meanings for some of
> these suggested names - so those are probably unclear.
> 
> > > >>> translate("TTGAAACCCTAG", init=True, to_stop=True)
> > 
> > Because I've read this thread (or looked at the docs) - I understand this one
> > ;)
> 
> To me this suggests something special is happening with the initialisation of
> the translation - but I agree it's not clear what without checking the
> documentation.
> 
> > > >>> translate("TTGAAACCCTAG", force_as_translating=True, to_stop=True)
> > 
> > I don't intuitively understand this.  Does it mean that the sequence should be
> > translatable?
> 
> Ditto - an argument called force_as_translating means nothing to me.  You're
> calling a translation method so what can forcing a translation mean?
> 
> > > >>> translate("TTGAAACCCTAG", force_methionine=True, to_stop=True)
> > 
> > Does this mean that the sequence will be translated from the first methionine
> > the method finds?
> 
> I would have guessed force_methionine would ignore the value of the first three
> nucleotides in order to treat them as a methionine (even if they are not a
> start codon).
> 
> > > >>> translate("TTGAAACCCTAG", force_methionine=True, force_stop=True)
> > 
> > As above, and does force_stop mean that you add a '*' to the end of the
> > translation?  Or that you stop at a stop codon?
> 
> Like Leighton, I would be confused by "force_stop".  It could mean add a stop
> symbol to the end of the amino acid sequence even if there isn't one there
> already.
> 
> > > >>> translate("TTGAAACCCTAG", alt_start=True, alt_stop=True)
> > 
> > 'alt_start' I would think referred to allowing translation from alternative
> > start codons.  I don't know what alt_stop would mean...
> 
> I think "alt_start" would be misleading for the intended dual functionality. 
> Consider the typical use case for this option - translating a CDS, which most
> of the time will use the typical start codon AUG / ATG (but not always).
> We'd want the start codon validated - and it often won't be an alternative
> start codon.  So calling the argument "alt_start" is confusing.
> 
> > > Also, I don't think this option will be used very often. 
> > 
> > Maybe not.  The first use case that comes to mind is QA on CDS-finding:
> > 
> > # Check if sequence is CDS:
> > assert candidate_cds.translate(init=True)
> > # Check if reported CDS start is valid
> > assert est[37:].translate(init=True)
> > 
> > A second use case is slower in presenting itself...
> 
> I think translating a CDS is quite a common task - so a very long argument
> would be bad.
> 
> Instead of the "init" start codon option in attachment 1032, I'd also be happy
> with a single boolean argument which does start codon validation, treats this
> as a methionine, checks the sequence is a multiple of three in length, checks
> for a final stop codon, and checks for no additional stop codons.  We'd ruled
> out calling this "complete", but maybe "cds" would be better?
> 
> > > So, it shouldn't be a problem if its name is too long to type, and it would
> > > be better if it is easy to understand.
> > 
> > That's a fair argument, I think.  On the whole, though, I would favour a
> > short, unambiguous, slightly cryptic name over a very long, unambiguous
> > name, over an ambiguous name of any length.
> 
> There is a lot of subjectiveness in argument naming - clearly we have not come
> up with a perfect suggestion yet.
> 
> Unfortunately "init" can be misunderstood (I'm not 100% sure what you were
> trying to say in comment 31, but I think you took the name "init" to mean
> some sort of optional optimisation initialisation).
> 
> How about "cds_start" instead of "init"?
> 


As I think about this and the various comments, I do think that you must apply
the same reasoning to non-standard translation as was applied to the ORF
finding comments. From that I understand that you want a basic translation
function, so function arguments like to_stop or cds_start would be
inappropriate. Also, even if it were possible, I do not see that validating
all known start codons under all genetic codes fits here.

Rather I think the various comments reflect various combinations of three major
steps:

1) Identify the region to be translated like NCBI's sequence viewer: range from
'begin' to 'end' to denote the region to be viewed. Under this view, start_from
or begin_at could be the position to start or the first occurrence of a start
codon. Likewise to_end or end_at could be a position or the first occurrence of
a stop codon. I note that this also implies a frame, but I think that has a
separate meaning.

2) Having defined the region to be translated, translate that region as defined
by the frame and selected table. A question here is whether the frame should
be set to one when the region is defined.

3) Address any non-standard codons to the translated sequence. If you are going
to allow non-standard start codons, you also need to handle selenocysteine
(http://en.wikipedia.org/wiki/Selenocysteine) and less so pyrrolysine
(http://en.wikipedia.org/wiki/Pyrrolysine). Technically, you can argue the
table used for translation in 2) should reflect this but I consider it a
separate issue. Also, the occurrence of a stop codon would likewise need to
change.

The non-standard codon usages are rare and I really do question whether these
belong in the Seq object translate function or elsewhere. I really
feel that if the user already knows that it is a non-AUG start codon then they
can replace the first amino acid with Met rather than rely on the translate
function. For example, the CDS field in the Genbank record for Mouse
Neuropeptide W (NM_001099664) has:
/exception="alternative start codon"
/note="non-AUG (CUG) translation initiation codon".
So if the user looked at the record then they would know it would need to be
changed.

If some form of the non-standard codons is included I would think some variant
of Leighton's assert idea should be preferred, such as using an
assert_nonstandard argument (or just nonstandard). This would be a string, list
or tuple denoting the changes to be made, such as 'Met1' or 'M1', where the
letters are the three or single letter code of the desired amino acid and the
number is the location within the amino acid sequence to be changed. So Met1
would mean replacing the amino acid at position one with Methionine (M). But I
recognize this is not sufficient to handle other non-standard cases with stop
codons.
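
To make the 'Met1' / 'M1' notation concrete, here is one way it could be
interpreted. This is a sketch under assumptions: apply_nonstandard is a
hypothetical helper that edits an already translated protein string, positions
are counted from one, and only three three-letter codes are mapped to keep the
example short.

```python
import re

# Deliberately tiny three-letter to one-letter map (an assumption for brevity).
THREE_TO_ONE = {"Met": "M", "Sec": "U", "Pyl": "O"}
CHANGE_PATTERN = re.compile(r"^([A-Za-z]{3}|[A-Za-z])(\d+)$")


def apply_nonstandard(protein, changes):
    """Apply changes such as 'Met1' or 'M1' to a protein string."""
    if isinstance(changes, str):
        changes = [changes]
    residues = list(protein)
    for change in changes:
        match = CHANGE_PATTERN.match(change)
        if match is None:
            raise ValueError("Cannot parse change %r" % change)
        code, position = match.group(1), int(match.group(2))
        if len(code) == 1:
            letter = code.upper()
        elif code.capitalize() in THREE_TO_ONE:
            letter = THREE_TO_ONE[code.capitalize()]
        else:
            raise ValueError("Unknown amino acid code %r" % code)
        if not 1 <= position <= len(residues):
            raise ValueError("Position %i is outside the sequence" % position)
        # Positions in the notation are one-based.
        residues[position - 1] = letter
    return "".join(residues)


print(apply_nonstandard("VKKMQSIV", "Met1"))
print(apply_nonstandard("MKCW", ["Sec3"]))
```

As noted above, a position-based substitution like this does not by itself
deal with stop codons that are read through.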


Bruce




From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 18:28:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 13:28:19 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811041828.mA4ISJAd028961@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #35 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-04 13:28 EST -------
(In reply to comment #34)
> As I think about this and the various comments, I do think that you must
> apply the same reasoning to non-standard translation as was applied to the
> ORF finding comments. From that I understand that you want a basic
> translation function so function arguments like to_stop or cds_start would be
> inappropriate.

There is certainly an argument that the Bio.Seq translate function/methods
should be kept as simple as possible while providing widely useful
functionality.  Perhaps given the lack of immediate agreement we are at that
point already?  Or perhaps this is a reflection of the different types of
organisms people work with and thus the relative frequencies of non-standard
start codons.

> Also, even if it was possible, I do not see that validating all known start
> codons under all genetic codes fits here.

We have the valid start codons in the CodonTable objects derived from the NCBI,
so it is possible to check them.

> ... Address any non-standard codons to the translated sequence. If you are
> going to allow non-standard start codons, you also need to handle
> selenocysteine (http://en.wikipedia.org/wiki/Selenocysteine) and less so
> pyrrolysine (http://en.wikipedia.org/wiki/Pyrrolysine). 

Why?  Non-standard start codons are pretty common in prokaryotes and the rules
for translating them are simple (once the start codon is identified).

On the other hand selenocysteine and pyrrolysine are very rare, and we can't
define a computer rule to deal with them - so we don't even try.

> The non-standard codon usages are rare and I really do question whether these
> belong in the Seq object translate function or elsewhere. I really
> feel that if the user already knows that it is a non-AUG start codon then they
> can replace the first amino acid with Met rather than rely on the translate
> function. For example, the CDS field in the Genbank record for Mouse
> Neuropeptide W (NM_001099664) has:
> /exception="alternative start codon"
> /note="non-AUG (CUG) translation initiation codon".
> So if the user looked at the record then they would know it would need to be
> changed.

Non-standard start codons are not that rare in prokaryotes (and I would not
expect them to be annotated like your mouse example).  When translating a well
annotated sequence, the location itself should be enough.

[I'm assuming we're not talking about the other meaning of the phrase
"alternative start codons" - where a gene may have multiple valid start codons
giving proteins of different lengths but the same C-terminal region.]

> If some form of the non-standard codons is included I would think some
> variant of Leighton's assert idea should be preferred, such as using an
> assert_nonstandard argument (or just nonstandard). This would be a string,
> list or tuple denoting the changes to be made, such as 'Met1' or 'M1',
> where the letters are the three or single letter code of the desired amino
> acid and the number is the location within the amino acid sequence to be
> changed. So Met1 would mean replacing the amino acid at position one with
> Methionine (M). But I recognize this is not sufficient to handle other
> non-standard cases with stop codons.

I thought Leighton was just proposing another name for a boolean argument which
I had called "init" in attachment 1032.

I'm afraid I don't understand your idea of a complicated list argument.

=============================================================================

Here is a concrete example: there are 481 annotated genes in E. coli K12 with
non-standard start codons - which you might want to translate into proteins.

#Using
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_K12_substr__MG1655/NC_000913.ffn
>>> from Bio import SeqIO
>>> odd = [record for record in SeqIO.parse(open("NC_000913.ffn"),"fasta") \
           if str(record.seq[:3]) <> "ATG"]
>>> print "There are %i genes not starting ATG" % len(odd)
There are 481 genes not starting ATG
>>> record = odd[0]
>>> print record.format("fasta")
>ref|NC_000913.2|:5234-5530
GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCA
GCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGAT
AATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACAT
TATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCAT
AAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA

This starts GTG which is a valid bacterial start codon.  I'd like to translate
this and get the actual biologically relevant protein as given in the GenBank
file NC_000913.gbk (maybe with or without the stop symbol at the end).  See:

     CDS             5234..5530
                     /gene="yaaX"
                     /locus_tag="b0005"
                     /codon_start=1
                     /transl_table=11
                     /product="predicted protein"
                     /protein_id="NP_414546.1"
                     /db_xref="ASAP:ABE-0000015"
                     /db_xref="UniProtKB/Swiss-Prot:P75616"
                     /db_xref="GI:16127999"
                     /db_xref="ECOCYC:G6081"
                     /db_xref="EcoGene:EG14384"
                     /db_xref="GeneID:944747"
                     /translation="MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGY
                     YWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR"

Without any non-standard start codon support, my translations start with a V:

>>> print record.seq.translate(table=11)
VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR*
>>> print record.seq.translate(table=11, to_stop=True)
VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR

With this proposed functionality I can obtain the desired results (both with
and without the terminator stop symbol):

>>> print record.seq.translate(table=11, to_stop=True, init=True)
MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR
>>> print record.seq.translate(table=11, init=True)
MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR*

I think that wanting to translate a CDS like this is a fairly common operation.
 Perhaps not as common as translation of a partial sequence, or translating
whole genomes or contigs where we want to translate through the stop codons --
but nevertheless, a common need.
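
The behaviour of the proposed "init" option can be sketched in plain Python,
independent of Biopython. This is only an illustration under stated
assumptions: the function name translate_cds_start is hypothetical, the codon
table is built by hand from the NCBI layout, and the start codon set is the
one NCBI lists for table 11. It is not the code in attachment 1032.

```python
# Minimal sketch of start codon handling; not the Bio.Seq implementation.
from itertools import product

BASES = "TCAG"
# Amino acids in NCBI order (first base varies slowest, third base fastest).
# Tables 1 and 11 share this mapping and differ only in their start codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    ("".join(codon), aa) for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
)
# Start codons listed by NCBI for table 11 (bacterial).
START_CODONS_11 = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def translate_cds_start(dna, init=False):
    """Translate codon by codon; with init=True read the start codon as M."""
    dna = dna.upper()
    # Ignore any trailing partial codon.
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    protein = [CODON_TABLE[codon] for codon in codons]
    if init:
        if not codons or codons[0] not in START_CODONS_11:
            raise ValueError("Sequence does not begin with a valid start codon")
        protein[0] = "M"
    return "".join(protein)


# First four codons of the yaaX CDS quoted above, followed by a stop codon.
plain = translate_cds_start("GTGAAAAAGATGTAA")
as_cds = translate_cds_start("GTGAAAAAGATGTAA", init=True)
```

Here plain begins with V while as_cds begins with M, which mirrors the
difference between the two sets of translations shown above.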




From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 22:47:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 17:47:02 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811042247.mA4Ml2At014897@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #7 from bsouthey at gmail.com  2008-11-04 17:47 EST -------
(In reply to comment #6)
> I've committed Peter's fix for the set import to CVS.
> 
> About the replacement for listfns.contents in the modified NaiveBayes code: Did
> you do any timings to compare the new code to the old code? Since
> listfns.contents is implemented in C, it may be (much) faster than the
> replacement code.
> 

(Hopefully I created a patch correctly.)

The purpose of listfns.contents() is to compute the frequency of each class and
return it as a dictionary. There is a difference between the versions, but it
is very small (hundredths of a second) for what I have looked at (which is
more than just the actual listfns.contents function).




From bugzilla-daemon at portal.open-bio.org  Tue Nov  4 22:48:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 17:48:12 -0500
Subject: [Biopython-dev] [Bug 2631] Updated Bio.MaxEntropy to remove listfns
	import
In-Reply-To: <bug-2631-42@http.bugzilla.open-bio.org/>
Message-ID: <200811042248.mA4MmCiZ015012@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2631


------- Comment #6 from bsouthey at gmail.com  2008-11-04 17:48 EST -------
Created an attachment (id=1036)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1036&action=view)
Patch to NaiveBayes




From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 02:33:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 21:33:32 -0500
Subject: [Biopython-dev] [Bug 2631] Updated Bio.MaxEntropy to remove listfns
	import
In-Reply-To: <bug-2631-42@http.bugzilla.open-bio.org/>
Message-ID: <200811050233.mA52XWrB025772@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2631


------- Comment #7 from bsouthey at gmail.com  2008-11-04 21:33 EST -------
(In reply to comment #6)
> Created an attachment (id=1036)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1036&action=view) [details]
> Patch to NaiveBayes
> 

Sorry about this as I do not know how this ended up here. Please just ignore
it.




From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 02:35:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 21:35:53 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811050235.mA52Zr0b025894@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1014 is|0                           |1
           obsolete|                            |


------- Comment #8 from bsouthey at gmail.com  2008-11-04 21:35 EST -------
Created an attachment (id=1037)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1037&action=view)
Patch to update NaiveBayes

Hopefully I got this correct, if not just let me know.




From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 10:24:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 05:24:15 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811051024.mA5AOF60024355@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-05 05:24 EST -------
(In reply to comment #8)
> Created an attachment (id=1037)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1037&action=view) [details]
> Patch to update NaiveBayes
> 
> Hopefully I got this correct, if not just let me know.
> 

At first glance it looks like this patch would remove the Python 2.3 set
workaround.  Easily fixed.

Also, I would have called the new get_content_freq function _get_content_freq
(leading underscore denoting private) as this is an implementation detail that
doesn't need to be part of the public API.

I'm curious what your other implementations looked like, as this one does not
look that clear to me at first read:

    p_contents=1.0/len(contents)
    content_freqs={}
    for cval in contents:
        vcount=content_freqs.get(cval,0)+p_contents
        content_freqs.update({cval:vcount})

In particular, why use the dict update method?

Given the possible rounding issues, does doing the rescaling (dividing by the
number of elements) at the start make a big time saving (over dividing each
total at the end)?  I would feel happier with the division at the end (as done
in the listfns code).
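
To make the two orderings concrete, here is a small sketch of both. The
function names are hypothetical. The first follows the fragment quoted above
but uses plain item assignment rather than dict.update, and the second counts
first and divides the totals at the end.

```python
def freqs_scaled_first(contents):
    """Add 1/len(contents) per item, so no final division is needed."""
    step = 1.0 / len(contents)
    freqs = {}
    for value in contents:
        freqs[value] = freqs.get(value, 0) + step
    return freqs


def freqs_divided_last(contents):
    """Count whole items first, then divide each total once."""
    counts = {}
    for value in contents:
        counts[value] = counts.get(value, 0) + 1
    total = float(len(contents))
    return dict((value, count / total) for value, count in counts.items())


data = [1, 2, 2, 3, 3, 3] * 1000
first = freqs_scaled_first(data)
last = freqs_divided_last(data)
# The two results agree closely, but not necessarily bit for bit.
largest_gap = max(abs(first[key] - last[key]) for key in last)
```

Dividing last performs a single rounding step per class, whereas scaling
first accumulates one rounding step per observation.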




From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 12:06:04 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 07:06:04 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811051206.mA5C64Pg030176@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-05 07:06 EST -------
I've updated Bio.Seq, Bio.SeqIO and Bio.AlignIO so my existing docstring
examples can be used with doctest.

Adding code via the __main__ trick to allow each module's test to be run
individually might be worthwhile.

The rest of this message is a possible "test_docstrings.py" file for our unit
tests, which would require manual updating whenever we want to test an
additional module.  This is probably a neat short-term solution while only a
relatively small proportion of Biopython uses doctests.

-----------------------------------------------------------------
#!/usr/bin/env python
# This code is part of the Biopython distribution and governed by its
# license.  Please see the LICENSE file that should have been included
# as part of this package.

import doctest, unittest

from Bio import Seq, SeqRecord, SeqIO, AlignIO
test_modules = [Seq, SeqRecord, SeqIO, AlignIO]

test_suite = unittest.TestSuite((doctest.DocTestSuite(module) \
                                 for module in test_modules))

#Using sys.stdout prevents this working nicely when run from idle:
#runner = unittest.TextTestRunner(sys.stdout, verbosity = 0)

#Using verbosity = 0 means we won't have to regenerate the unit
#test output file used by the run_tests.py framework whenever a
#new module or doctest is added.
runner = unittest.TextTestRunner(verbosity = 0)
runner.run(test_suite)




From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 13:12:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 08:12:28 -0500
Subject: [Biopython-dev] [Bug 2622] Parsing between position locations like
	5933^5934 in GenBank/EMBL files
In-Reply-To: <bug-2622-42@http.bugzilla.open-bio.org/>
Message-ID: <200811051312.mA5DCSYZ004411@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2622


chapmanb at 50mail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #4 from chapmanb at 50mail.com  2008-11-05 08:12 EST -------
Fixed with Bio/GenBank/__init__.py 1.93, Bio/SeqFeature.py 1.14.

Coordinates are now passed correctly with Peter's suggested fix. The empty
slice issue is resolved by adding this as a special case to FeatureLocation
nofuzzy attribute retrieval. For standard retrieval the classes are fully
available to the user and they would need to make the distinction about how
they would like to treat them.




From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 13:14:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 08:14:51 -0500
Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in
	GenBank files
In-Reply-To: <bug-2225-42@http.bugzilla.open-bio.org/>
Message-ID: <200811051314.mA5DEpVe004918@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2225


chapmanb at 50mail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from chapmanb at 50mail.com  2008-11-05 08:14 EST -------
Fixed with Bio/GenBank/__init__.py 1.93, Bio/GenBank/Record.py 1.11 and
Bio/GenBank/Scanner.py 1.24

The PROJECT line is parsed as a list of projects for both SeqIO and Record
based parsing, for consistency. Output of PROJECT line also added.




From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 13:18:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 08:18:22 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811051318.mA5DIMPJ005649@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp  2008-11-05 08:18 EST -------
See
http://coreygoldberg.blogspot.com/2008/07/python-counting-items-in-list.html
for some timings of this operation. I think Bruce's approach is most suitable,
except for the dict update method; I would use
        content_freqs[cval] = content_freqs.get(cval,0)+p_contents
instead. Depending on the contents of the list, sometimes it runs even faster
than the implementation in listfns.
> 
> Given the possible rounding issues, does doing the rescaling (dividing by the
> number of elements) at the start make a big time saving (over dividing each
> total at the end)?  I would feel happier with the division at the end (as done
> in the listfns code).
> 
I think the rescaling at the start is a good thing. If the list contains many
different objects, rescaling at the end can take a long time. Probably that is
not the typical use case here, but on the other hand I don't see a good reason
not to save time here.

Maybe just my nitpicking, but I think the get_content_freq function will be
more readable if we use different variable names inside this function.




From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 13:31:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 08:31:49 -0500
Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in
	GenBank files
In-Reply-To: <bug-2225-42@http.bugzilla.open-bio.org/>
Message-ID: <200811051331.mA5DVnNI007802@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2225


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-05 08:31 EST -------
Do you think we have to worry about multiple project lines, or project entries
spanning multiple lines?

This would require a slight difference to the parsing (to append new project
entries instead of replacing any prior entries), and to the output from the
record object (including line wrapping).

HOWEVER, reading the latest ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt it seems
the PROJECT line will be replaced with a DBLINK line next year.

With that in mind, I would now suggest we parse the PROJECT and/or DBLINK lines
and store them in the record.dbxrefs list (rather than in the annotations).




From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 13:34:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 08:34:41 -0500
Subject: [Biopython-dev] [Bug 2622] Parsing between position locations like
	5933^5934 in GenBank/EMBL files
In-Reply-To: <bug-2622-42@http.bugzilla.open-bio.org/>
Message-ID: <200811051334.mA5DYfWx008228@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2622


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-05 08:34 EST -------
Hi Brad,

Looking back on this I may have been out by one on the extension calculation,
i.e. I'm not 100% sure position.high.val-position.low.val is appropriate.

I'll try and look at this later...

Peter




From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 16:51:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 11:51:07 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811051651.mA5Gp7R6003323@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #11 from bsouthey at gmail.com  2008-11-05 11:51 EST -------
(In reply to comment #10)
> See
> http://coreygoldberg.blogspot.com/2008/07/python-counting-items-in-list.html
> for some timings of this operation. I think Bruce's approach is most suitable,
> except for the dict update method; I would use
>         content_freqs[cval] = content_freqs.get(cval,0)+p_contents
> instead. Depending on the contents of the list, sometimes it runs even faster
> than the implementation in listfns.

Basically the goal is to find the frequency of each class and store it in a
dictionary with the keys being each class and the value being the frequency. So
you could count up all observations in each class (essentially adding one to
the appropriate class sum) and then divide each count by the total number of
observations - as implemented in the dictget approach. Being more cryptic, we
can avoid the second division by adding one/number of observations instead of
one to the appropriate class sum, as implemented in get_content_freq.

Thanks for the link, I created a timing code for random lists.

get_content_freq is the one I put in the patch
get_content_freq2 is the modified version
ternary is based on Corey's code, modified to give frequencies rather than counts
dictget uses a dictionary to count and then gets the frequencies
listfns.contents is the Biopython Python version without the C code import
clistfns.contents is the direct import of the Biopython module that uses C code

My system is running 64-bit Fedora Linux with Python 2.5.2. The number of
observations is not important (the difference is very small). I used 1000000
random integers, timed a single run, and then repeated the test 5 times with
1000000 executions each and took the minimum time, i.e.
min(timeit.repeat(5, 1000000)). Also, this function is not called that much in
NaiveBayes so these are rather extreme cases.
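
A measurement of this kind can be set up along the following lines. This is
only a sketch: the function under test is a placeholder, and the list size and
repeat counts are chosen so that it finishes quickly. They are not the
settings used for the numbers reported in this message.

```python
import random
import timeit


def content_freqs(contents):
    """Placeholder function under test: frequency of each distinct value."""
    step = 1.0 / len(contents)
    freqs = {}
    for value in contents:
        freqs[value] = freqs.get(value, 0) + step
    return freqs


random.seed(0)  # fixed seed so repeated runs time the same data
data = [random.randint(1, 10) for _ in range(1000)]

# Repeat the whole measurement 5 times, 100 calls each, and keep the minimum,
# which is the figure least affected by other activity on the machine.
timer = timeit.Timer(lambda: content_freqs(data))
best = min(timer.repeat(repeat=5, number=100))
```

Taking the minimum rather than the mean follows the usual advice for timeit,
since slower runs mostly reflect interference from other processes.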

Range of ints between one and two:
get_content_freq  once: 1.90734863281e-05  best of 5: 8.11614704132
get_content_freq2 once: 8.10623168945e-06  best of 5: 4.39126110077
ternary file      once: 1.59740447998e-05  best of 5: 9.42879796028
dictget file      once: 1.4066696167e-05  best of 5: 10.468517065
listfns.contents  once: 1.28746032715e-05  best of 5: 7.50778198242
clistfns.contents once: 6.91413879395e-06  best of 5: 2.71360707283


Range of ints between one and ten:
get_content_freq  once: 1.90734863281e-05  best of 5: 7.97784090042
get_content_freq2 once: 7.15255737305e-06  best of 5: 4.21833491325
ternary file      once: 1.69277191162e-05  best of 5: 9.18815684319
dictget file      once: 1.50203704834e-05  best of 5: 10.2242910862
listfns.contents  once: 1.50203704834e-05  best of 5: 7.25569987297
clistfns.contents once: 8.10623168945e-06  best of 5: 2.6411280632

Range of ints between one and one hundred:

get_content_freq  once: 2.00271606445e-05  best of 5: 7.99760317802
get_content_freq2 once: 7.86781311035e-06  best of 5: 4.20446300507
ternary file      once: 1.71661376953e-05  best of 5: 9.26767396927
dictget file      once: 1.4066696167e-05  best of 5: 10.2449028492
listfns.contents  once: 1.4066696167e-05  best of 5: 7.34166693687
clistfns.contents once: 7.15255737305e-06  best of 5: 2.63198709488

So this is not dependent on the number of classes. For the most part these
numbers show system overheads more than major differences between the actual
approaches. Therefore I would clearly go with Michiel's version.


> > 
> > Given the possible rounding issues, does doing the rescaling (dividing by the
> > number of elements) at the start make a big time saving (over dividing each
> > total at the end)?  I would feel happier with the division at the end (as done
> > in the listfns code).
> > 
> I think the rescaling at the start is a good thing. If the list contains many
> different objects, rescaling at the end can take a long time. Probably that is
> not the typical use case here, but on the other hand I don't see a good reason
> not to save time here.

For the two-class scenario above, the get_content_freq methods result in:
{1: 0.49978999999354606, 2: 0.50020999999354643}
and the others result in:
{1: 0.49979000000000001, 2: 0.50021000000000004}

On my 64-bit Linux system the numerical error is small and within
expectations. It may be worse on a 32-bit system or OS. I really wanted to draw
attention to this because tiny differences can be important (not to mention
people who don't understand enough about numerical precision).
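
This kind of drift is easy to reproduce with nothing but the standard library.
The snippet below is a generic illustration of accumulated rounding error, not
the NaiveBayes code, and the variable names are arbitrary.

```python
import math

# Adding 0.1 ten times does not give exactly 1.0 in binary floating point,
# because each addition rounds the running total.
running_total = 0.0
for _ in range(10):
    running_total += 0.1

# math.fsum tracks the lost low-order bits and returns the correctly
# rounded sum of the same ten values.
compensated_total = math.fsum([0.1] * 10)

# Counting whole items and dividing once involves a single rounding step.
divided_once = 10 / 10.0
```

The gap between running_total and 1.0 is tiny, which is in line with the
differences in the trailing digits shown above.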

> 
> Maybe just my nitpicking, but I think the get_content_freq function will be
> more readable if we use different variable names inside this function.
> 

Please rename as necessary.

Bruce




From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 17:00:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 12:00:42 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811051700.mA5H0gxV003976@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #12 from bsouthey at gmail.com  2008-11-05 12:00 EST -------
Created an attachment (id=1038)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1038&action=view)
timing different implementations of listfns.contents

This is my timing code for different implementations of listfns.contents. It
assumes that there is a local version of listfns.py without the import clistfns
statement at the end, and that clistfns is available from Bio.




From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 20:30:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 15:30:46 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811052030.mA5KUklP023725@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #36 from bsouthey at gmail.com  2008-11-05 15:30 EST -------
(In reply to comment #35)
Okay, this is what I think of the main uses for translation. All these can be
easily achieved by the translate arguments table='Standard' and stop_symbol='*'
with very little code. So I do not see any need for any extra arguments except
for convenience. (I have these uses in a file that I will upload after this.)

So really my only issue left is what is the expected behaviour for:
a) to_stop_codon=True if there are no valid stop codons (my understanding of
to_stop). 
b) from_start_codon=True (or init=True etc) if there are no valid start codons


1) Translation in some given forward frame - reverse frames should be obvious.
Looping over these will give all three frames but that could return multiple
Seq objects.

2) Translation between any range of locations. From Peter's example, extracting
the region between 5234 to 5530 in the complete sequence will give the yaaX
gene CDS that can be translated into the protein sequence.

3a) Translate to the first valid stop codon. Perhaps not as expected because it
should respect the frame, so try:
3b) Translate to the first valid stop codon with respect to the selected frame.
3c) Alternatively use the to_stop=True argument of translate. Here translation
is to the first valid stop codon OR the end of the sequence. This second aspect
is not documented.

4a) Start translation at the first start codon. Again, this does not respect
the frame, so try:
4b) Start translation at the first valid start codon with respect to the
selected frame.

In both cases of 4) the very first codon must be checked against the defined
start_codon list in the appropriate CodonTable.

Obviously 3) and 4) should raise exceptions if stop or start codons are not
found because of the specific request to stop or start translation. But, as in
3c), this could be relaxed to include the end of the sequence. I am not sure
what the behaviour should be if there is no valid start codon.

Also some variation of 3a) and 4a) could be used to find possible open reading
frames (from a start codon to stop codon). But this could return more than one
Seq object. 
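
Points 1) and 3) above come down to how the sequence is cut into codons. The
sketch below works on plain strings. The helper names are hypothetical and the
stop codons are those of the standard table.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard table


def codons_in_frame(dna, frame=0):
    """Split dna into complete codons, starting at offset frame (0, 1 or 2)."""
    dna = dna.upper()
    return [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]


def first_stop_index(dna, frame=0):
    """Index of the first in-frame stop codon, or None if there is none."""
    for index, codon in enumerate(codons_in_frame(dna, frame)):
        if codon in STOP_CODONS:
            return index
    return None


seq = "ATGAAATAGCCC"
frames = [codons_in_frame(seq, frame) for frame in range(3)]
stop_frame0 = first_stop_index(seq, 0)
stop_frame1 = first_stop_index(seq, 1)
```

Returning None when no stop codon is found leaves the caller free to either
raise an exception or translate to the end of the sequence, which is the
choice discussed above.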




From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 20:33:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 15:33:52 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811052033.mA5KXqqJ023824@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #37 from bsouthey at gmail.com  2008-11-05 15:33 EST -------
Created an attachment (id=1039)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1039&action=view)
examples of possible uses of translate


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 22:12:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 17:12:13 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811052212.mA5MCDhY028649@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #38 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-05 17:12 EST -------
(In reply to comment #36)
> (In reply to comment #35)
> Okay, this is what I think of the main uses for translation.
> All these can be easily achieved by the translate arguments
> table='Standard' and stop_symbol='*' with very little code.
> So I do not see any need for any extra arguments except
> for convenience. (I have these uses in a file that I will
> upload after this.)

Most of your examples seem to relate to open reading frame searches, looking
for start/stop codons etc.  I agree this kind of thing isn't needed in the
basic translate method/function.

Doing a CDS translation however is more fiddly due to the methionine at the
start, and I think this warrants another option in the basic translate
method/function.

> So really my only issue left is what is the expected behaviour for:
> a) to_stop_codon=True if there are no valid stop codons (my understanding of
> to_stop).

If you are asking about the current to_stop argument in CVS right now, if there
is no in-frame stop codon it will translate the whole sequence (to_stop has no
effect).  I've just updated the docstring to make this more explicit (see
Bio/Seq.py CVS revision 1.55).

Do you think "to_stop_codon" is a clearer argument name than "to_stop"?

> b) from_start_codon=True (or init=True etc) if there are no valid start codons

As written in attachment 1032, if the sequence does not start with a valid
start codon an exception is raised.




From bugzilla-daemon at portal.open-bio.org  Wed Nov  5 23:09:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 18:09:01 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811052309.mA5N91aO031273@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #39 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-05 18:09 EST -------
Created an attachment (id=1040)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1040&action=view)
Patch to Bio/Seq.py for complete CDS translation.

(In reply to comment #33)
> Instead of the "init" start codon option in attachment 1032,
> I'd also be happy with a single boolean argument which does
> start codon validation, treats this as a methionine, checks
> the sequence is a multiple of three in length, checks for a
> final stop codon, and checks for no additional stop codons.
> We'd ruled out calling this "complete", but maybe "cds"
> would be better?

This patch adds this functionality via a "complete_cds" boolean argument.

Here is how it could be applied to translate the CDS used as an example in my
comment 35, the yaaX gene in E. coli K12:

>>> from Bio.Seq import Seq
>>> my_cds = Seq("GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCAGCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGATAATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACATTATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCATAAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA")
>>> my_cds.translate(table=11)
Seq('VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HR*',
HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> my_cds.translate(table=11, to_stop=True)
Seq('VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR',
ExtendedIUPACProtein())
>>> my_cds.translate(table=11, complete_cds=True)
Seq('MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR',
ExtendedIUPACProtein())

I would be happy with EITHER of these options, as both can be used to translate
a complete coding sequence:

(1) the "init" argument (under another name, maybe "cds_start"?) illustrated in
attachment 1032.  This would check the start codon is valid AND translate it as
a methionine.

(2) the "complete_cds" argument (perhaps under another name, maybe "cds"?)
illustrated in this patch.  This would check the start codon is valid AND
translate it as a methionine AND check there are a whole number of codons AND
check it ends with a stop codon AND check there are no extra in-frame stop
codons.
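
The checks listed under option (2) can be sketched in plain Python.  This is
not the code in attachment 1040: the function name check_complete_cds is
hypothetical, and the start codon set is an assumed, illustrative subset rather
than a full genetic code table.

```python
# Sketch of the checks listed in option (2); NOT the code in attachment 1040.
# The codon sets are illustrative and would normally come from a codon table.
START_CODONS = {"ATG", "GTG", "TTG"}   # assumed subset of valid start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}

def check_complete_cds(dna, starts=START_CODONS, stops=STOP_CODONS):
    """Raise ValueError unless dna looks like a complete CDS; return its codons."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("length %i is not a multiple of three" % len(dna))
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    if not codons or codons[0] not in starts:
        raise ValueError("first codon is not a valid start codon")
    if codons[-1] not in stops:
        raise ValueError("final codon is not a stop codon")
    if any(codon in stops for codon in codons[1:-1]):
        raise ValueError("extra in-frame stop codon found")
    return codons

print(check_complete_cds("GTGAAATAA"))
```

Once the checks pass, a caller could translate codons[1:-1] in the usual way and
prefix the result with "M" for the start codon.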




From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 11:14:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:14:07 -0500
Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for
	arguments to their types
In-Reply-To: <bug-2639-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061114.mA6BE7jk002000@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2639


------- Comment #3 from dalloliogm at gmail.com  2008-11-06 06:14 EST -------
Created an attachment (id=1041)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1041&action=view)
add a check for the seq argument in seqrecord, to be a Seq object and not None

This patch adds a check for the seq argument in SeqRecord.
If seq is None (by default), it raises a ValueError Exception.
If it is a Seq object, it saves it as self.seq.
If it is another kind of object (string, list, integer), it is converted to a
string, and then used to instantiate a seq object.
I thought that someone could use an integer (e.g.: 010100010101101) as a
sequence, and in this case, the integer is first converted to a string
(otherwise Seq() would return an error).

Please take care with this patch: I got a bit tangled up with CVS and patches
:(, so it also contains a doctest example that I added for myself (see bug
report 2640).




From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 11:31:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:31:57 -0500
Subject: [Biopython-dev] [Bug 2643] New: Proposal: fastPhaseOutputIO for
	SeqIO
Message-ID: <bug-2643-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643

           Summary: Proposal: fastPhaseOutputIO for SeqIO
           Product: Biopython
           Version: Not Applicable
          Platform: PC
               URL: http://github.com/dalloliogm/biopython---popgen/tree/master/src/PopGen/Gio/fastPhaseOutputIO.py
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: dalloliogm at gmail.com
                CC: tiagoantao at gmail.com


Hi,
fastPHASE is software for haplotype reconstruction and missing genotype
estimation from population genetic SNP data.
- http://stephenslab.uchicago.edu/software.html
It is commonly used by some population genetics bioinformaticians.

I had to convert the output from a fastPhase run to fasta; so I wrote a module
that reads a fastPhase output file, and returns SeqRecord objects.

fastPhase output contains information about SNPs and genotyping, and will
probably be supported by the PopGen module that is being written for Biopython.
However, my module is intended only to read the sequence information from the
output file and create SeqRecord objects, ignoring everything else.
So, in the future we could have two fastPhaseOutputIterator-like modules: one
that creates SeqRecord objects, and another for use in PopGen.

The module has been tested with doctest. I'll attach a file with the tests
along with the module.




From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 11:40:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:40:17 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061140.mA6BeHwc003465@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #1 from dalloliogm at gmail.com  2008-11-06 06:40 EST -------
Created an attachment (id=1042)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1042&action=view)
fastPhase output iterator, for SeqIO

If invoked directly, this module tries to call doctest.testfile over a file
called test_fastPhaseOutputIO.py (I will post it in 5 minutes).
You should edit this module to point it to the right file path on your
computer.

This module is intended for use with SeqIO. You should modify
Bio/SeqIO/__init__.py and add it to the _FormatToIterator dictionary.
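
As a rough sketch of what that registration amounts to, here is a stand-alone
mock.  The dictionary below only imitates the idea of _FormatToIterator, the
parser is a hypothetical placeholder, and the format name "fastphase" is just an
example key.

```python
import io

def fastphase_iterator(handle):
    """Hypothetical stand-in parser: yield each non-empty line, stripped."""
    for line in handle:
        line = line.strip()
        if line:
            yield line

# Mock registry for illustration only; the real _FormatToIterator
# dictionary is defined inside Bio.SeqIO.
_FormatToIterator = {"fastphase": fastphase_iterator}  # format name is a placeholder

def parse(handle, format):
    """Look up the iterator registered under a format name and call it."""
    if format not in _FormatToIterator:
        raise ValueError("Unknown format %r" % format)
    return _FormatToIterator[format](handle)

records = list(parse(io.StringIO("first\n\nsecond\n"), "fastphase"))
print(records)
```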

I didn't write a Writer handler, because you are not supposed to create
fastPhaseOutput files by hand (although it could be useful for testing
purposes).

You can see the git history of this module here:
- http://github.com/dalloliogm/biopython---popgen/tree/master/src/PopGen/Gio/fastPhaseOutputIO.py




From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 11:42:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:42:55 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061142.mA6Bgt77003705@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #2 from dalloliogm at gmail.com  2008-11-06 06:42 EST -------
Created an attachment (id=1043)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1043&action=view)
this is a doctest file to test fastPhaseOutputIterator

This file is called by fastPhaseOutputIO when __name__ == '__main__'




From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 11:44:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:44:55 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061144.mA6BitTU003910@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #3 from dalloliogm at gmail.com  2008-11-06 06:44 EST -------
Created an attachment (id=1044)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1044&action=view)
adds fastPhaseOutput support to SeqIO

This patch adds fastPhaseOutput support to SeqIO (not tested).




From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 11:50:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:50:39 -0500
Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for
	arguments to their types
In-Reply-To: <bug-2639-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061150.mA6Bod9J004289@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2639


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 06:50 EST -------
(In reply to comment #3)
> Created an attachment (id=1041)
>  --> (http://bugzilla.open-bio.org/attachment.cgi?id=1041&action=view)
> add a check for the seq argument in seqrecord, to be a Seq object and not None
>
> This patch adds a check for the seq argument in SeqRecord.
> If seq is None (by default), it raises a ValueError Exception.
> If it is a Seq object, it saves it as self.seq.
> If it is another kind of object (string, list, integer), it is converted to a
> string, and then used to instantiate a seq object.

I was deliberately not checking the seq argument.  There are several reasonable
use cases:

* a Seq object (normal) or a subclass of it.
* a MutableSeq object (seems reasonable, note this is not a subclass of Seq)
* None (seems a good way to handle sequence records where we don't know the
sequence - for example some GenBank files).
* a user defined sequence object which implements the Seq API but does not
subclass Seq or MutableSeq (this is more difficult to check).

> I thought that someone could use an integer (e.g.: 010100010101101) as a
> sequence, and in this case, the integer is first converted to a string
> (otherwise Seq() would return an error).

Note that if someone did want to use some weird numerical sequence, then the
SeqRecord object should NOT be trying to do anything special (guessing what is
intended). The user should create a suitable Seq object themselves (ideally
with a numerical alphabet object).  Explicit rather than implicit (Zen of
python).
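
A quick illustration of why guessing is risky: round-tripping a string of digits
through int() silently drops the leading zero, so the "sequence" that comes back
is not the one that went in.  The variable names are just for the example.

```python
# Round-tripping a digit "sequence" through int() loses information.
digits = "010100010101101"
as_int = int(digits)          # the leading zero is discarded here
round_trip = str(as_int)

print(len(digits), len(round_trip))   # the lengths differ
print(round_trip == digits)           # False
```

Written as a bare literal with a leading zero, the same digits are read as octal
by Python 2 and rejected as a syntax error by Python 3, which is another reason
to leave the conversion to the user.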

--

Note that I'm not 100% happy with the type checking we've just added.  See
"duck-typing" and interfaces versus types,
http://www.python.org/doc/2.5.2/tut/node18.html#l2h-46

The checks I've added shouldn't be too constraining - but maybe they should use
interface checking instead (or we could just revert to no checking).
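
To make the alternative concrete, here is a minimal sketch of interface
checking.  It is not what is in CVS: the function name, the class MySeq and the
list of required attributes are all assumptions chosen for illustration.

```python
# Hypothetical sketch of interface checking; the attribute list is an assumption.
REQUIRED_SEQ_ATTRIBUTES = ("__len__", "__getitem__", "__str__")

def looks_like_sequence(obj):
    """Return True if obj is None or offers a minimal sequence-like interface."""
    if obj is None:
        return True   # unknown sequence, e.g. some GenBank records
    return all(hasattr(obj, name) for name in REQUIRED_SEQ_ATTRIBUTES)

class MySeq(object):
    """User-defined sequence that subclasses neither Seq nor MutableSeq."""
    def __init__(self, data):
        self._data = data
    def __len__(self):
        return len(self._data)
    def __getitem__(self, index):
        return self._data[index]
    def __str__(self):
        return self._data

print(looks_like_sequence(MySeq("ACGT")))  # accepted without subclassing anything
print(looks_like_sequence(42))             # rejected: an int has no __len__
```

Note that a plain string would also pass this particular check, so a real
implementation would probably need a more specific attribute list.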

Any comments from other people?  This should be being CC'd to the dev mailing
list.




From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 12:14:04 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 07:14:04 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061214.mA6CE4PD005743@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 07:14 EST -------
Hi Marco,

This looks interesting :)

Could you attach the individual valid sample fastPHASE files as separate
attachments (so they can be integrated into the existing unit tests).  You seem
to have picked very small files in order to use them as doctests; a larger more
realistic example would be better for the unit tests (a few 5kb in size should
be OK - not too big).

Do you have a URL for the file format documentation?  Are they always DNA for
example, or is RNA also possible?

If you want to include a fastPHASE parser in Bio.SeqIO it should ideally cope
with any valid fastPHASE output.  In the doctests you have an example:

... BEGIN GENOTYPES
... Ind1  # subpop. label: 6  (internally 1)
... T
... T C
... Ind2  # subpop. label: 6  (internally 1)
... C
... T
... END GENOTYPES

You're treating this as an error - "Two chromosomes with different length". 
Why isn't it parsed as four short sequences (of different lengths): "T", "TC",
"C", "T"?

Similarly, the final example:

... BEGIN GENOTYPES
... Ind1  # subpop. label: 6  (internally 1)
... T T T T T G A A A C C A A A G A C G C T G C G T C A G C C T G C A A T C T G
... Ind2  # subpop. label: 6  (internally 1)
... C T T T T G C C C T C A A A A G T G C T G T G C C A G T C T A C G G C C T G
... T T T T T G A A A C C A A A G A C G C T T C G T C A G T A T A C G A T C T A
... END GENOTYPES

Again, you raised an error - "Missing sequence in input file".  If this is a
valid file shouldn't it be parsed as three sequences?

On the other hand, are these hand edited files which deliberately break the
rules?  If fastPHASE files SHOULD always come in allele groups (of the same
length), then it would be better to integrate the parser into Bio.AlignIO
giving pairwise alignments (and you would be able to read it via Bio.SeqIO
automatically as well).
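
For comparison, a lenient reader for the genotype block might look something
like the sketch below.  This is not Marco's module: the function name is
hypothetical, and it assumes that identifier lines contain a '#' while allele
lines do not.

```python
# Lenient sketch of reading a BEGIN GENOTYPES ... END GENOTYPES block.
# Assumption: identifier lines contain '#', allele lines do not.
def iter_haplotypes(lines):
    """Yield (individual_id, haplotype_string) for each allele line in the block."""
    in_block = False
    current_id = None
    for line in lines:
        line = line.strip()
        if line == "BEGIN GENOTYPES":
            in_block = True
        elif line == "END GENOTYPES":
            in_block = False
        elif in_block and line:
            if "#" in line:
                current_id = line.split("#", 1)[0].strip()
            else:
                # Alleles are space separated; join them into one string.
                yield current_id, "".join(line.split())

example = """BEGIN GENOTYPES
Ind1  # subpop. label: 6  (internally 1)
T
T C
Ind2  # subpop. label: 6  (internally 1)
C
T
END GENOTYPES""".splitlines()

print(list(iter_haplotypes(example)))
```

On the first doctest example this gives four short sequences, two per
individual, and leaves any length or pairing checks to the caller.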

P.S. Your suggested format name "fastPhaseOutput" breaks the lower case rule. 
Would "fastphase" be OK, or is there more than one format?  e.g. an input
format which might be confused with this?

Peter




From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 12:21:09 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 07:21:09 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061221.mA6CL9e8006180@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 07:21 EST -------
(In reply to comment #4)
> You seem to have picked very small files in order to use them as
> doctests; a larger more realistic example would be better for the
> unit tests (a few 5kb in size should be OK - not too big).

Sorry - that was a typo.  I meant a few kb in size (5kb should be OK). 

I don't have a feel for the typical size of real fastPHASE output, but a few
interesting real examples (e.g. covering a range of fastPHASE command line
options) would be better than a single large file.

Peter




From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 12:25:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 07:25:42 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061225.mA6CPgsn006472@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 07:25 EST -------
P.S.

The module's docstring needs some work - your introduction for this bug might
be a good start.  We should include the URL
http://stephenslab.uchicago.edu/software.html and the reference in the module's
docstring:

Scheet, P and Stephens, M (2006) "A fast and flexible statistical model for
large-scale population genotype data: applications to inferring missing
genotypes and haplotypic phase." Am J Hum Genet 78(4):629-44.




From tiagoantao at gmail.com  Thu Nov  6 13:18:54 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Thu, 6 Nov 2008 13:18:54 +0000
Subject: [Biopython-dev] Preparing for Biopython 1.49 (beta)
In-Reply-To: <320fb6e00811040336k12a834b9o2fa103b8fabf7ec1@mail.gmail.com>
References: <320fb6e00811040336k12a834b9o2fa103b8fabf7ec1@mail.gmail.com>
Message-ID: <6d941f120811060518w388bd471g129aafdaf02381d4@mail.gmail.com>

On Tue, Nov 4, 2008 at 11:36 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> If this schedule is realistic, then Tiago should be OK to add his next
> set of PopGen code in about two weeks time (for what would become
> Biopython 1.50).


I am working on documentation and test cases for LDNe and extra
GenePop support (this is more or less orthogonal to the ongoing
discussion on statistics); the code has been done for weeks. I will
start uploading it as soon as you unfreeze CVS after 1.49.


From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 14:24:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 09:24:12 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061424.mA6EOCcB015073@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #40 from bsouthey at gmail.com  2008-11-06 09:24 EST -------
(In reply to comment #38)
> (In reply to comment #36)
> > (In reply to comment #35)
> > Okay, this is what I think of the main uses for translation.
> > All these can be easily achieved by the translate arguments
> > table='Standard' and stop_symbol='*' with very little code.
> > So I do not see any need for any extra arguments except
> > for convenience. (I have these uses in file that I will
> > upload after this.)
> 
> Most of your examples seem to relate to open reading frame searches, looking
> for start/stop codons etc.  I agree this kind of thing isn't needed in the
> basic translate method/function.
> 
> Doing a CDS translation however is more fiddly due to the methionine at the
> start, and I think this warrants another option in the basic translate
> method/function.
> 
> > So really my only issue left is what is the expected behaviour for:
> > a) to_stop_codon=True if there are no valid stop codons (my understanding of
> > to_stop).
> 
> If you are asking about the current to_stop argument in CVS right now, if there
> is no in frame stop codon it will translate all the sequence (to_stop has no
> effect).  I've just updated the docstring to make this more explicit (see
> Bio/Seq.py CVS revision 1.55).
> 
> Do you think "to_stop_codon" is a clearer argument name than "to_stop"?
> 

I think "to_end", because "end" covers the end of the translation due to either
a stop codon or the end of the sequence.


> > b) from_start_codon=True (or init=True etc) if there are no valid start codons
> 
> As written in attachment 1032, if the sequence does not start with a valid
> start codon an exception is raised.
> 

Okay.




From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 14:35:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 09:35:40 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061435.mA6EZe5F015831@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #41 from lpritc at scri.sari.ac.uk  2008-11-06 09:35 EST -------
(In reply to comment #40)
> > If you are asking about the current to_stop argument in CVS right now, if there
> > is no in frame stop codon it will translate all the sequence (to_stop has no
> > effect).  I've just updated the docstring to make this more explicit (see
> > Bio/Seq.py CVS revision 1.55).
> > 
> > Do you think "to_stop_codon" is a clearer argument name than "to_stop"?
> > 
> I think to_end because end does mean the end of the translation due to a stop
> codon or end of a sequence.

I would take 'to_end' to mean 'to the end of the passed sequence, ignoring all
stop codons along the way'.  'to_first_stop' is clearer, to my mind, and even
that leaves out the potential (and hopefully redundant) qualifier 'in-frame' ;)




From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 14:46:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 09:46:48 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061446.mA6Ekmfj016554@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #42 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 09:46 EST -------
Peter in comment #38:
>>> If you are asking about the current to_stop argument in CVS right now,
>>> if there is no in frame stop codon it will translate all the sequence
>>> (to_stop has no effect).  I've just updated the docstring to make this
>>> more explicit (see Bio/Seq.py CVS revision 1.55).
>>> 
>>> Do you think "to_stop_codon" is a clearer argument name than "to_stop"?
>>>

Bruce in comment #40:
>> I think to_end because end does mean the end of the translation
>> due to a stop codon or end of a sequence.
>>

Leighton in comment #41:
> I would take 'to_end' to mean 'to the end of the passed sequence,
> ignoring all stop codons along the way'.  'to_first_stop' is
> clearer, to my mind, and even that leaves out the potential (and
> hopefully redundant) qualifier 'in-frame' ;)
> 

I agree with Leighton here, "to_end" sounds like "to the end of the sequence
given".  I quite like "to_first_stop", but it is longer than "to_stop".




From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 15:07:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 10:07:06 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061507.mA6F76PK018513@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #43 from bsouthey at gmail.com  2008-11-06 10:07 EST -------
(In reply to comment #39)
> Created an attachment (id=1040)
>  --> (http://bugzilla.open-bio.org/attachment.cgi?id=1040&action=view)
> Patch to Bio/Seq.py for complete CDS translation.
> 
> (In reply to comment #33)
> > Instead of the "init" start codon option in attachment 1032,
> > I'd also be happy with a single boolean argument which does
> > start codon validation, treats this as a methionine, checks
> > the sequence is a multiple of three in length, checks for a
> > final stop codon, and checks for no additional stop codons.
> > We'd ruled out calling this "complete", but maybe "cds"
> > would be better?
> 
> This patch adds this functionality via a "complete_cds" boolean argument.
> 
> Here is how it could be applied to translate the CDS used as an example in my
> comment 35, the yaaX gene in E. coli K12:
> 
> >>> from Bio.Seq import Seq
> >>> my_cds = Seq("GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCAGCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGATAATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACATTATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCATAAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA")
> >>> my_cds.translate(table=11)
> Seq('VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HR*',
> HasStopCodon(ExtendedIUPACProtein(), '*'))
> >>> my_cds.translate(table=11, to_stop=True)
> Seq('VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR',
> ExtendedIUPACProtein())
> >>> my_cds.translate(table=11, complete_cds=True)
> Seq('MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR',
> ExtendedIUPACProtein())
> 
> I would be happy with EITHER of these options, as both can be used to translate
> a complete coding sequence:
> 
> (1) the "init" argument (under another name, maybe "cds_start"?) illustrated in
> attachment 1032.  This would check the start codon is valid AND translate it as
> a methionine.
> 
> (2) the "complete_cds" argument (perhaps under another name, maybe "cds"?)
> illustrated in this patch.  This would check the start codon is valid AND
> translate it as a methionine AND check there are a whole number of codons AND
> check it ends with a stop codon AND check there are no extra in-frame stop
> codons.
> 


I support (1) but strongly disagree with (2) because 'cds' refers to a complete
DNA sequence not just if the sequence starts with M.
http://www.yeastgenome.org/help/glossary.html
"CDS:    CoDing Sequence, region of nucleotides that corresponds to the
sequence of amino acids in the predicted protein. The CDS includes start and
stop codons, therefore coding sequences begin with an "ATG" and end with a stop
codon. In SGD, unexpressed sequences, including the 5'-UTR, the 3'-UTR,
introns, or bases not expressed due to frameshifting, are not included within a
CDS. Note that the CDS does not correspond to the actual mRNA sequence."

However, I do like being able to obtain the translation of the actual CDS -
just not here.

I do not support the name 'init' because of reasons discussed. 

I do not support the name 'cds_start' because of the DNA interpretation and
because many GenBank records include the upstream and downstream non-coding
regions. In such cases I would have to find the actual start codon anyway, so I
might as well translate from that start codon rather than rely on a check
that might be wrong.

Perhaps some variant of:
a) Similar cases in Python:
has_met or has_met1
get_met or get_met1
b) More direct meaning:
starts_with_methionine, starts_with_met, starts_with_m




From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 15:08:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 10:08:17 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061508.mA6F8HRo018696@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #44 from bsouthey at gmail.com  2008-11-06 10:08 EST -------
(In reply to comment #42)
> Peter in comment #38:
> >>> If you are asking about the current to_stop argument in CVS right now,
> >>> if there is no in frame stop codon it will translate all the sequence
> >>> (to_stop has no effect).  I've just updated the docstring to make this
> >>> more explicit (see Bio/Seq.py CVS revision 1.55).
> >>> 
> >>> Do you think "to_stop_codon" is a clearer argument name than "to_stop"?
> >>>
> 
> Bruce in comment #40:
> >> I think to_end because end does mean the end of the translation
> >> due to a stop codon or end of a sequence.
> >>
> 
> Leighton in comment #41:
> > I would take 'to_end' to mean 'to the end of the passed sequence,
> > ignoring all stop codons along the way'.  'to_first_stop' is
> > clearer, to my mind, and even that leaves out the potential (and
> > hopefully redundant) qualifier 'in-frame' ;)
> > 
> 
> I agree with Leighton here, "to_end" sounds like "to the end of the sequence
> given".  I quite like "to_first_stop", but it is longer than "to_stop".
> 

Either is fine with me.




From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 15:11:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 10:11:38 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061511.mA6FBcAY019165@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 10:11 EST -------
I've now had a quick look at the fastPHASE documentation, and I have the
impression that the sequences should always come in pairs:

"Output files for inferred haplotypes or imputed genotypes contain two lines
per given diploid individual, with the order of individuals corresponding to
that supplied in the input file."

Assuming the paired sequences are always the same length, this does suggest the
format should be integrated into Bio.AlignIO (giving pairwise alignments)
rather than Bio.SeqIO.
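
If the pairing rule does hold, the grouping step could be sketched roughly as
follows.  The function name pair_haplotypes is hypothetical, and the sketch
works on plain strings rather than on Biopython alignment objects.

```python
# Sketch: group haplotype strings into pairs and insist each pair has equal length.
def pair_haplotypes(haplotypes):
    """Return a list of (first, second) tuples from a flat list of haplotype strings."""
    if len(haplotypes) % 2 != 0:
        raise ValueError("expected two haplotypes per individual")
    pairs = []
    for i in range(0, len(haplotypes), 2):
        first, second = haplotypes[i], haplotypes[i + 1]
        if len(first) != len(second):
            raise ValueError("haplotypes %i and %i differ in length" % (i, i + 1))
        pairs.append((first, second))
    return pairs

print(pair_haplotypes(["TTG", "CTG", "AAA", "AAC"]))
```

Each tuple would then correspond to one pairwise alignment, one per diploid
individual.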

Have you tried not estimating the haplotypes (by supplying a negative integer
following -H), and does this alter the sequence output?

Finally, could you try the -Z command line argument for the simplified output
format (described as two lines per individual, without "id" lines,
subpopulation labels or summary information from the run)?  Does this have the
sequences?  If so, this may be a more parser-friendly output to handle in
Bio.SeqIO and/or Bio.AlignIO.




From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 15:27:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 10:27:07 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061527.mA6FR7TQ021259@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #45 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 10:27 EST -------
(In reply to comment #43)
> (In reply to comment #39)
> > I would be happy with EITHER of these options, as both can be used to
> > translate a complete coding sequence:
> > 
> > (1) the "init" argument (under another name, maybe "cds_start"?)
> > illustrated in attachment 1032.  This would check the start
> > codon is valid AND translate it as a methionine.
> > 
> > (2) the "complete_cds" argument (perhaps under another name, maybe "cds"?)
> > illustrated in this patch.  This would check the start codon is valid AND
> > translate it as a methionine AND check there are a whole number of codons
> > AND check it ends with a stop codon AND check there are no extra in-frame
> > stop codons.
> > 
> 
> 
> I support (1) but strongly disagree with (2) because 'cds' refers to
> a complete DNA sequence not just if the sequence starts with M.
> http://www.yeastgenome.org/help/glossary.html
> "CDS:    CoDing Sequence, region of nucleotides that corresponds to the
> sequence of amino acids in the predicted protein. The CDS includes start and
> stop codons, therefore coding sequences begin with an "ATG" and end with a
> stop codon. In SGD, unexpressed sequences, including the 5'-UTR, the 3'-UTR,
> introns, or bases not expressed due to frameshifting, are not included within
> a CDS. Note that the CDS does not correspond to the actual mRNA sequence."

Starting with that definition but being aware of atypical start codons gives:

"The CDS includes start and stop codons, therefore coding sequences begin with
an "ATG" [or other valid start codon] and end with a stop codon."

This then fits exactly with what I'm doing in the "complete_cds" option
(attachment 1040).  So why the disagreement?
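
To make the list of checks concrete, here is a minimal standalone sketch of
what such a validation could look like.  It is not the code in attachment
1040: the function name check_complete_cds and the hard-coded codon sets
(NCBI standard table only) are assumptions for illustration.

```python
# Hypothetical helper, not part of Bio.Seq.
# Assumption: NCBI standard table (table 1) start and stop codons only.
START_CODONS = {"ATG", "TTG", "CTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_complete_cds(dna):
    """Raise ValueError unless dna looks like a complete CDS."""
    dna = dna.upper()
    # Check there is a whole number of codons.
    if len(dna) % 3 != 0:
        raise ValueError("length %d is not a multiple of three" % len(dna))
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    if len(codons) < 2:
        raise ValueError("need at least a start codon and a stop codon")
    # Check the first codon is a valid start codon.
    if codons[0] not in START_CODONS:
        raise ValueError("%s is not a valid start codon" % codons[0])
    # Check the sequence ends with a stop codon.
    if codons[-1] not in STOP_CODONS:
        raise ValueError("%s is not a stop codon" % codons[-1])
    # Check there are no extra in-frame stop codons.
    for index, codon in enumerate(codons[1:-1], start=1):
        if codon in STOP_CODONS:
            raise ValueError("extra in-frame stop codon at codon %d" % index)
```

A caller would wrap this in try/except ValueError, in the same way as the
"complete_cds" option is meant to be used.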

> However, I do like being able to obtain the translation of the actual
> CDS - just not here.

Back in comment 11, I previously mooted having separate methods like
translate_to_stop, and translate_cds - but we currently seem to be leaning
towards one method with some options.

> I do not support the name 'init' because of reasons discussed. 

I think that is settled, "init" is too ambiguous.

> I do not support the name 'cds_start' because of the DNA interpretation and
> that many Genbank records include the upstream and downstream non-coding
> regions. In such cases, I would have to find the actual start codon, then I
> might as well do the translation after that start codon than rely on a check
> that might be wrong.

In such cases, if your sequence might include upstream and downstream
non-coding regions, then you shouldn't be trying to use the "init"/"cds_start"
option (or the "complete_cds" option).  By the nature of your uncertain
dataset, you'll have to do some extra work to find the start/stop.  I don't see
how this is an argument against providing an option useful for when you do know
where the CDS starts (or do already have the CDS).

> Perhaps some variant of:
> a) Similar cases in Python:
> has_met or has_met1
> get_met or get_met1
> b) More direct meaning:
> starts_with_methionine, starts_with_met, starts_with_m
> 

I'd been avoiding names with methionine in them, preferring to focus on
initiation or start codon based names.

I guess "starts_with_met" is OK.  Or maybe "start_met"?


From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 15:28:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 10:28:20 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061528.mA6FSKMv021486@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #46 from lpritc at scri.sari.ac.uk  2008-11-06 10:28 EST -------
(In reply to comment #43)

> > (2) the "complete_cds" argument (perhaps under another name, maybe "cds"?)
> > illustrated in this patch.  This would check the start codon is valid AND
> > translate it as a methionine AND check there are a whole number of codons AND
> > check it ends with a stop codon AND check there are no extra in-frame stop
> > codons.

> I support (1) but strongly disagree with (2) because 'cds' refers to a complete
> DNA sequence not just if the sequence starts with M.
> http://www.yeastgenome.org/help/glossary.html
> "CDS:    CoDing Sequence, region of nucleotides that corresponds to the
> sequence of amino acids in the predicted protein. The CDS includes start and
> stop codons, therefore coding sequences begin with an "ATG" and end with a stop
> codon. In SGD, unexpressed sequences, including the 5'-UTR, the 3'-UTR,
> introns, or bases not expressed due to frameshifting, are not included within a
> CDS. Note that the CDS does not correspond to the actual mRNA sequence."

That definition seems to correspond exactly to (2), above; not that web-based
definitions have any particular authority ;)

"Begin with an ATG" is a eukaryote-specific statement; "Begin with a (valid)
start codon" covers this.

"End with a stop codon", implying the *first in-frame* stop codon is the same
in both cases.

Where do you see that they differ?

> I do not support the name 'cds_start' because of the DNA interpretation and
> that many Genbank records include the upstream and downstream non-coding
> regions. In such cases, I would have to find the actual start codon, then I
> might as well do the translation after that start codon than rely on a check
> that might be wrong.

I don't think that the argument is proposed for that particular use-case, which
is why I don't think it's valid, there.  If, say, you knew that the 5' UTR ran
to base 17, then you could check with seq[17:].translate(complete_cds=True) or
some such arrangement - but that's not the problem that's being solved with
that method argument, I think.

> Perhaps some variant of:
> a) Similar cases in Python:
> has_met or has_met1
> get_met or get_met1
> b) More direct meaning:
> starts_with_methionine, starts_with_met, starts_with_m

I quite like this way of checking sequence properties, and would prefer an
is_cds() (or, to be pedantic, is_conceptual_cds()) method that returns a
Boolean, but otherwise implements the sort of behaviour described above.

If you only wanted the conceptual translations of sequences that fit the
criteria for a CDS, then a one-liner to replace

[seq.translate(cds=True) for seq in seqlist]

might be

[seq.translate() for seq in seqlist if seq.is_cds()]

I prefer the second option, for readability, but YMMV.


From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 16:06:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 11:06:46 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061606.mA6G6kL7028787@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #8 from dalloliogm at gmail.com  2008-11-06 11:06 EST -------
(In reply to comment #4)
> Hi Marco,

Hi!! :)

> This looks interesting :)
> 
> Could you attach the individual valid sample fastPHASE files as separate
> attachments (so they can be integrated into the existing unit tests).  You seem
> to have picked very small files in order to use them as doctests; a larger more
> realistic example would be better for the unit tests (a few 5kb in size should
> be OK - not too big).

ok
Actually I have been using files which come from our laboratory analysis, and I
would first like to ask whether I can include them here, and how.

> Do you have URL for the file format documentation?  

The fastphase format seems to be described only in fastphase's manual, which is
only accessible after accepting a license agreement.
I could contact the authors of the program to ask them to publish the format
specifications publicly. It would be in their interest, as otherwise the format
could be considered non-standard.
I'll let you know.

> Are they always DNA for example, or is RNA also possible?

They should be DNA. In principle they could also be genes, or other kinds of
characters, but this software is designed for the purpose of reconstructing
haplotypes from SNPs/microsatellites.
Maybe Tiago has some more experience in this..

> If you want to include a fastPHASE parser in Bio.SeqIO it should ideally cope
> with any valid fastPHASE output.  In the doctests you have an example:
> 
> ... BEGIN GENOTYPES
> ... Ind1  # subpop. label: 6  (internally 1)
> ... T
> ... T C
> ... Ind2  # subpop. label: 6  (internally 1)
> ... C
> ... T
> ... END GENOTYPES
> You're treating this as an error - "Two chromosomes with different length". 
> Why isn't it parsed as four short sequences (of different lengths): "T", "TC",
> "C", "T"?

You should not have a file in which one chromosome is longer than the other;
instead, you should have a '?' indicating data that the program could
not infer.


> Similarly, the final example:
> 
> ... BEGIN GENOTYPES
> ... Ind1  # subpop. label: 6  (internally 1)
> ... T T T T T G A A A C C A A A G A C G C T G C G T C A G C C T G C A A T C T G
> ... Ind2  # subpop. label: 6  (internally 1)
> ... C T T T T G C C C T C A A A A G T G C T G T G C C A G T C T A C G G C C T G
> ... T T T T T G A A A C C A A A G A C G C T T C G T C A G T A T A C G A T C T A
> ... END GENOTYPES
> 
> Again, you raised an error - "Missing sequence in input file".  If this is a
> valid file shouldn't it be parsed as three sequences?

Because that would mean that one individual has only one chromosome.
It doesn't make sense to run fastPhase on a haploid individual.


> On the other hand, are these hand edited files which deliberately break the
> rules?  

Yes. Usually you shouldn't have either of the two cases. But I find it useful
when a script tells me if there are weird things in my files (I could have
modified them accidentally).
This could be refactored into a check_fileformat function.
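
A rough sketch of what such a check could look like, using only the layout
shown in the doctest examples above.  The function name
read_genotype_pairs is hypothetical, and the rule "a line containing '#'
is an ID line" is an assumption taken from those examples, not from the
fastPHASE manual.

```python
def read_genotype_pairs(lines):
    """Yield (sample_id, hap_one, hap_two) from a GENOTYPES block.

    Assumption: ID lines contain '#', and each individual is followed
    by exactly two lines of space-separated alleles.
    """
    in_block = False
    sample_id = None
    haplotypes = []
    for line in lines:
        line = line.strip()
        if line == "BEGIN GENOTYPES":
            in_block = True
            continue
        if line == "END GENOTYPES":
            break
        if not in_block or not line:
            continue
        if "#" in line:
            # A new individual starts; the previous one must be complete.
            if sample_id is not None:
                raise ValueError("Missing sequence for %s" % sample_id)
            sample_id = line.split("#")[0].strip()
            continue
        if sample_id is None:
            raise ValueError("Sequence line without an ID line")
        haplotypes.append("".join(line.split()))
        if len(haplotypes) == 2:
            if len(haplotypes[0]) != len(haplotypes[1]):
                raise ValueError("Two chromosomes with different length "
                                 "for %s" % sample_id)
            yield sample_id, haplotypes[0], haplotypes[1]
            sample_id = None
            haplotypes = []
    if sample_id is not None:
        raise ValueError("Missing sequence for %s" % sample_id)
```

Because this is a generator, the errors are only raised once the caller
iterates over it, e.g. with list(read_genotype_pairs(handle)).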

> If fastPHASE files SHOULD always come in allele groups (of the same
> length), then it would be better to integrate the parser into Bio.AlignIO
> giving pairwise alignments (and you would be able to read it via Bio.SeqIO
> automatically as well).

This is a good idea, I didn't think of it.
But how should I modify the module to produce AlignIO objects?


> P.S. Your suggested format name "fastPhaseOutput" breaks the lower case rule. 
> Would "fastphase" be OK, or is there more than one format?  e.g. an input
> format which might be confused with this?

I agree.. I wasn't sure of biopython's naming conventions.

> 
> Peter
> 
Scheet and Stephens (2006)


From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 16:12:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 11:12:15 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061612.mA6GCFHq029869@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #9 from dalloliogm at gmail.com  2008-11-06 11:12 EST -------
(In reply to comment #7)
> I've now had a quick look at the fastPHASE documentation, and I have the
> impression that the sequences should always come in pairs:

right!

> "Output files for inferred haplotypes or imputed genotypes contain two lines
> per given diploid individual, with the order of individuals corresponding to
> that supplied in the input file."
> 
> Assuming the paired sequences are always the same length, this does suggest the
> format should be integrated into Bio.AlignIO (giving pairwise alignments)
> rather than Bio.SeqIO.


> Have you tried not estimating the haplotypes (by supplying a negative integer
> following -H), and does this alter the sequence output?

I will try it, ok.

> Finally could you try the -Z command line argument for the simplified output
> format (described as two lines per individual, without "id" lines,
> subpopulation labels or summary information from the run).  Does this have the
> sequences?  If so this may be a more parser friendly set of output to parse for
> Bio.SeqIO and/or Bio.AlignIO.

OK, I can try to implement both formats, but for the moment I would
prefer to concentrate on one.


From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 17:11:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 12:11:26 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061711.mA6HBQN5007343@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #47 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 12:11 EST -------
(In reply to comment #46)
> If you only wanted the conceptual translations of sequences that fit the
> criteria for a CDS, then a one-liner to replace
> 
> [seq.translate(cds=True) for seq in seqlist]
> 
> might be
> 
> [seq.translate() for seq in seqlist if seq.is_cds()]
> 
> I prefer the second option, for readability, but YMMV.
> 

Note the above wouldn't give you translations starting with methionine, you'd
need something like:

[seq.translate(cds_start=True) for seq in seqlist if seq.is_cds()]

(assuming we call the "init" option "cds_start")

Or, going with the complete_cds option you could build a list of translations
of valid CDSs like this:

proteins = []
for seq in seqlist :
    try :
        proteins.append(seq.translate(complete_cds=True))
    except ValueError :
        #Not a valid CDS, excluded
        pass

Not a one liner, but I think in a real situation you'd want to do something
with the invalid CDSs anyway (even if just logging them).


From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 17:32:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 12:32:52 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061732.mA6HWqE7009337@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #48 from lpritc at scri.sari.ac.uk  2008-11-06 12:32 EST -------
(In reply to comment #47)
> (In reply to comment #46)

> > [seq.translate() for seq in seqlist if seq.is_cds()]
> > 
> > I prefer the second option, for readability, but YMMV.
> 
> Note the above wouldn't give you translations starting with methionine, you'd
> need something like:
> 
> [seq.translate(cds_start=True) for seq in seqlist if seq.is_cds()]
> 
> (assuming we call the "init" option "cds_start")

Fair point... my focus was on putting that filter into the list comprehension.

> Or, going with the complete_cds option you could build a list of translations
> of valid CDSs like this:
> 
> proteins = []
> for seq in seqlist :
>     try :
>         proteins.append(seq.translate(complete_cds=True))
>     except ValueError :
>         #Not a valid CDS, excluded
>         pass
> 
> Not a one liner, but I think in a real situation you'd want to do something
> with the invalid CDSs anyway (even if just logging them).

True enough.  It comes down in part to a preference of style, as the same could
be achieved with

proteins = []
for seq in seqlist :
    if seq.is_cds():
        proteins.append(seq.translate(complete_cds=True))
    else:
        #Not a valid CDS, excluded
        pass

I think the clarity of this arrangement to my eyes comes from 'is/is not a cds'
being - naturally-speaking - a property or attribute of the sequence itself. 
The 'cds_start' argument in your example is then an instruction to treat the
translation as though you have a CDS, and implement some specialised behaviour
that is appropriate under that circumstance, rather than to implement a test
that raises an error if it is failed.  By separating the 'is_cds()' call from
the 'cds_start' argument, you gain the ability to translate the sequence with
either the methionine or the coded amino acid, without losing the test of the
sequence being a CDS.

Of course, using the 'cds_start=True' argument could force a call to
self.is_cds(), anyway.  Your non-one-liner could then be as you originally
wrote:

proteins = []
for seq in seqlist :
    try:
        proteins.append(seq.translate(complete_cds=True))
    except ValueError:
        #Not a valid CDS, excluded
        pass

The two advantages I see to having the is_cds() method as a separate call are
that it separates determining the CDS status of the sequence from translating
it, and that it provides a filter that is more readable than attempting to
translate the sequence to find out if it's a valid CDS.  If the 'cds_start'
argument forces a self.is_cds() test, then the usage can be - I think - exactly
as you've been proposing throughout the thread.


From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 17:33:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 12:33:12 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061733.mA6HXCuE009403@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 12:33 EST -------
(In reply to comment #8)
> 
> ok
> Actually I have been using files which come from our laboratory analysis,
> and I would first like to ask whether I can include them here, and how.

If you can get permission to include a real example (and it's not too big) that
would be great.  Ideally something with at least three alleles.

> > Do you have URL for the file format documentation?  
> 
> The fastphase format seems to be described only in fastphase's manual,
> which is only accessible after accepting a license agreement.
> I could contact the authors of the program to ask them to publish the format
> specifications publicly. It would be in their interest, as otherwise the
> format could be considered non-standard.  I'll let you know.

It's not very open, is it :(

Are there any other tools that output this file format?  Do you think the
author might be willing to just add an option to output the sequences in
another format (e.g. FASTA, or better an alignment format designed for more
than one alignment).  This would be a neater solution in the long run (and
would benefit anyone using fastPhase - not just Biopython).

> > Are they always DNA for example, or is RNA also possible?
> 
> They should be DNA. In principle they could also be genes, or other kinds of
> characters, but this software is designed for the purpose of reconstructing
> haplotypes from SNPs/microsatellites.
> Maybe Tiago has some more experience in this..

If it is for DNA only, the sequences/alignments returned should ideally specify
a DNA alphabet.

> ...
> Because that would mean that one individual has only one chromosome.
> It doesn't make sense to run fastPhase on a haploid individual.

Is fastPhase only for diploids?  Could it be used with polyploidy (e.g.
plants)?

> > On the other hand, are these hand edited files which deliberately break the
> > rules?  
> 
> Yes. Usually you shouldn't have either of the two cases. But I find it
> useful when a script tells me if there are weird things in my files (I
> could have modified them accidentally).

Yes - negative test cases are good.  However, having them as a doctest made the
docstring rather confusing.

> > If fastPHASE files SHOULD always come in allele groups (of the same
> > length), then it would be better to integrate the parser into Bio.AlignIO
> > giving pairwise alignments (and you would be able to read it via Bio.SeqIO
> > automatically as well).
> 
> This is a good idea, I didn't think of it.
> But how should I modify the module to produce AlignIO objects?

Essentially, instead of:

yield record_one
yield record_two

you'd do something like this:

alignment = Alignment(generic_dna)
alignment.add_sequence(id_one, seq_one)
alignment.add_sequence(id_two, seq_two)
yield alignment
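
The grouping step on its own can be sketched without Biopython.  This is
only an illustration: pair_records is a hypothetical name, and records are
assumed to be plain (identifier, sequence) tuples rather than SeqRecord
objects.  Each yielded pair is what would then be loaded into an alignment
object as above.

```python
def pair_records(records):
    """Group a flat iterable of (identifier, sequence) tuples into pairs.

    Raises ValueError if the two members of a pair differ in length,
    or if the final record has no partner.
    """
    pending = None
    for identifier, sequence in records:
        if pending is None:
            # First member of a pair; wait for its partner.
            pending = (identifier, sequence)
            continue
        if len(pending[1]) != len(sequence):
            raise ValueError("%s and %s differ in length"
                             % (pending[0], identifier))
        yield [pending, (identifier, sequence)]
        pending = None
    if pending is not None:
        raise ValueError("%s has no partner sequence" % pending[0])
```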

> > P.S. Your suggested format name "fastPhaseOutput" breaks the lower case
> > rule.  Would "fastphase" be OK, or is there more than one format?  e.g.
> > an input format which might be confused with this?
> 
> I agree.. I wasn't sure of biopython's naming conventions.
> 

This is written down elsewhere - but the format name is a lowercase string (and
this is enforced in the API), and the same names are used in both SeqIO and
AlignIO. Where possible we use the same name as BioPerl's SeqIO and EMBOSS.

(In reply to comment #9)
> (In reply to comment #7)
> > Finally could you try the -Z command line argument for the simplified output
> > format (described as two lines per individual, without "id" lines,
> > subpopulation labels or summary information from the run).  Does this have
> > the sequences?  If so this may be a more parser friendly set of output to
> > parse for Bio.SeqIO and/or Bio.AlignIO.
> 
> OK, I can try to implement both formats, but for the moment I would
> prefer to concentrate on one.

I was actually thinking the -Z format might be much simpler to deal with (I
didn't mean to suggest supporting both).  On the other hand, the documentation
does say the -Z is "not intended for general use".

Peter


From dalloliogm at gmail.com  Thu Nov  6 18:09:55 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Thu, 6 Nov 2008 19:09:55 +0100
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <200811061733.mA6HXCuE009403@portal.open-bio.org>
References: <bug-2643-42@http.bugzilla.open-bio.org/>
	<200811061733.mA6HXCuE009403@portal.open-bio.org>
Message-ID: <5aa3b3570811061009i29bb2faflb456978dacbf5218@mail.gmail.com>

On Thu, Nov 6, 2008 at 6:33 PM,  <bugzilla-daemon at portal.open-bio.org> wrote:
>
>
>
>
> ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 12:33 EST -------
> (In reply to comment #8)
>>
>> ok
>> Actually I have been using files which come from our laboratory analysis,
>> and I would first like to ask whether I can include them here, and how.
>
> If you can get permission to include a real example (and it's not too big) that
> would be great.  Ideally something with at least three alleles.

ok..

>> > Do you have URL for the file format documentation?
>>
>> The fastphase format seems to be described only in fastphase's manual,
>> which is only accessible after accepting a license agreement.
>> I could contact the authors of the program to ask them to publish the format
>> specifications publicly. It would be in their interest, as otherwise the
>> format could be considered non-standard.  I'll let you know.
>
> It's not very open, is it :(
>
> Are there any other tools that output this file format?  Do you think the
> author might be willing to just add an option to output the sequences in
> another format (e.g. FASTA, or better an alignment format designed for more
> than one alignment).  This would be a neater solution in the long run (and
> would benefit anyone using fastPhase - not just Biopython).

Not to my knowledge.
Anyway, consider that a fastPhase run could take days for medium/big samples.
In some situations it could be faster to convert its output to fasta
(or other ones) directly, instead of re-calculating the results.

>> > Are they always DNA for example, or is RNA also possible?
>>
>> They should be DNA. In principle they could also be genes, or other kinds of
>> characters, but this software is designed for the purpose of reconstructing
>> haplotypes from SNPs/microsatellites.
>> Maybe Tiago has some more experience in this..
>
> If it is for DNA only, the sequences/alignments returned should ideally specify
> a DNA alphabet.

mmm ok...
Basically it could also be used with characters like genes and other
markers, but in that case it would not make sense to parse it as a
sequence, so nobody would try to do it.

>> Because that would mean that one individual has only one chromosome.
>> It doesn't make sense to run fastPhase on a haploid individual.
>
> Is fastPhase only for diploids?  Could it be used with polyploidy (e.g.
> plants)?

I think not... It would be another class of problem.
What fastPhase does is try to infer haplotypes from genotype data.

Humans and most eukaryotes are diploid, so they have two copies of
each chromosome; when you genotype markers, for every individual you
get two pieces of information for each marker (e.g. 'AC' for a SNP).
Let's say you are studying two SNPs in a single individual: you will
have 'AC' for the first marker, and 'GT' for the second (you already
know that they are on the same chromosome).
You want to know which are the haplotypes, that is, whether the 'A'
from the first SNP is on the same molecule as the 'G' from the second
SNP, and so on.

For example, you could have a chromosome with 'AG' and the other with
'CT'; or a chromosome with 'AT' and the other with 'CG', and fastPhase
tries to calculate which is the most likely (I won't be able to
explain all the details properly).
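
To illustrate the two-SNP example, here is a small sketch that simply
enumerates the haplotype pairs consistent with a set of unphased
genotypes.  It does not estimate which pair is most likely (that is the
part fastPhase does); the function name possible_phasings is only for
illustration.

```python
from itertools import product


def possible_phasings(genotypes):
    """Return the distinct haplotype pairs consistent with the genotypes.

    genotypes is a list of two-letter strings, one per marker,
    e.g. ["AC", "GT"].  Each result is a sorted (hap_one, hap_two) tuple.
    """
    phasings = set()
    # For each marker, choose which allele goes on the first chromosome;
    # the other allele goes on the second chromosome.
    for choice in product((0, 1), repeat=len(genotypes)):
        first = "".join(g[c] for g, c in zip(genotypes, choice))
        second = "".join(g[1 - c] for g, c in zip(genotypes, choice))
        # Sorting the pair removes mirror-image duplicates.
        phasings.add(tuple(sorted((first, second))))
    return sorted(phasings)


print(possible_phasings(["AC", "GT"]))
# [('AG', 'CT'), ('AT', 'CG')]
```

The number of candidates can double with every additional heterozygous
marker, which is why a statistical method is needed for real data.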

Moreover, fastPhase (there are other programs too) can infer missing
genotype data, which is useful when you have big collections of SNPs.

That said, I don't know if it is able to infer haplotypes in polyploid
organisms, but I don't think so, as it would be a different (more
complex) class of problem.
I thought that the best thing to do is not to support polyploidy;
if someone who uses fastPhase for that comes along, it
would be easy to adapt the module (it would just require adding
an option).

>> > On the other hand, are these hand edited files which deliberately break the
>> > rules?
>>
>> Yes. Usually you shouldn't have either of the two cases. But I find it
>> useful when a script tells me if there are weird things in my files (I
>> could have modified them accidentally).
>
> Yes - negative test cases are good.  However, having them as a doctest made the
> docstring rather confusing.

mmm I know, that doctest could be refactored.
I have started using tests recently... I find it is a lot better.

>
>> > If fastPHASE files SHOULD always come in allele groups (of the same
>> > length), then it would be better to integrate the parser into Bio.AlignIO
>> > giving pairwise alignments (and you would be able to read it via Bio.SeqIO
>> > automatically as well).
>>
>> This is a good idea, I didn't think of it.
>> But how should I modify the module to produce AlignIO objects?
>
> Essentially, instead of:
>
> yield record_one
> yield record_two
>
> you'd do something like this:
>
> alignment = Alignment(generic_dna)
> alignment.add_sequence(id_one, seq_one)
> alignment.add_sequence(id_two, seq_two)
> yield alignment

sounds easy :)

>
>> > P.S. Your suggested format name "fastPhaseOutput" breaks the lower case
>> > rule.  Would "fastphase" be OK, or is there more than one format?  e.g.
>> > an input format which might be confused with this?
>>
>> I agree.. I wasn't sure of biopython's naming conventions.
>>
>
> This is written down elsewhere - but the format name is a lowercase string (and
> this is enforced in the API), and the same names are used in both SeqIO and
> AlignIO. Where possible we use the same name as BioPerl's SeqIO and EMBOSS.
>
> (In reply to comment #9)
>> (In reply to comment #7)
>> > Finally could you try the -Z command line argument for the simplified output
>> > format (described as two lines per individual, without "id" lines,
>> > subpopulation labels or summary information from the run).  Does this have
>> > the sequences?  If so this may be a more parser friendly set of output to
>> > parse for Bio.SeqIO and/or Bio.AlignIO.
>>
>> OK, I can try to implement both formats, but for the moment I would
>> prefer to concentrate on one.
>
> I was actually thinking the -Z format might be much simpler to deal with (I
> didn't mean to suggest supporting both).  On the other hand, the documentation
> does say the -Z is "not intended for general use".

The problem is that a fastPhase run could take days... most of
the time you want the longer format, and then proceed to parse it.
Anyway, it should not be a big problem to implement it (I am just
putting all of that information in SeqRecord.description)

>
> Peter
>
>


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it


From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 18:20:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 13:20:20 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061820.mA6IKK31012133@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #49 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 13:20 EST -------
OK - thank you all for your input thus far.  Unfortunately it is clear that we
haven't reached a consensus about translating sequences which begin with the
start codon (or the more special case of translating a CDS sequence).

However, I hope we are all happy with how things look in CVS right now, which
offers a blind translation continuing over any stop codon, and the "to_stop"
option which will terminate translation at the first in frame stop codon:

See
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Seq.py?cvsroot=biopython
for the full code, but in summary:

class Seq(object):
    ...
    def translate(self, table="Standard", stop_symbol="*", to_stop=False):
        """Turns a nucleotide sequence into a protein sequence. New Seq object.

        This method will translate DNA or RNA sequences.

        Trying to translate a protein sequence raises an exception.

        table - Which codon table to use?  This can be either a name
                (string) or an NCBI identifier (integer).  This defaults
                to the "Standard" table.
        stop_symbol - Single character string, what to use for terminators.
                This defaults to the asterisk, "*".
        to_stop - Boolean, defaults to False meaning do a full translation
                continuing on past any stop codons (translated as the
                specified stop_symbol).  If True, translation is terminated
                at the first in frame stop codon (and the stop_symbol is
                not appended to the returned protein sequence).
        ...

With the module level function taking the same arguments:

def translate(sequence, table="Standard", stop_symbol="*", to_stop=False):
    """Translate a nucleotide sequence into amino acids.

    If given a string, returns a new string object.
    Given a Seq or MutableSeq, returns a Seq object with a protein
    alphabet.
    ...

I think everyone is content with the naming of the "to_stop" argument.
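
To make the difference between the default "blind" translation and the
"to_stop" option concrete, here is a minimal pure-Python sketch.  It is not
the Bio.Seq code: the hard-coded standard codon table, the plain string
input, and the choice to silently ignore a trailing partial codon are all
assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumed table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO))


def translate(nuc, stop_symbol="*", to_stop=False):
    """Translate a DNA or RNA string codon by codon."""
    nuc = nuc.upper().replace("U", "T")
    protein = []
    # Any trailing one or two bases are ignored in this sketch.
    for i in range(0, len(nuc) - len(nuc) % 3, 3):
        amino_acid = CODON_TABLE[nuc[i:i + 3]]
        if amino_acid == "*":
            if to_stop:
                break  # terminate without appending the stop symbol
            amino_acid = stop_symbol
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAAGGG"))                # MA*G
print(translate("ATGGCCTAAGGG", to_stop=True))  # MA
```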

I'm planning to prepare the Biopython 1.49 beta release tomorrow, so I'm
proposing we leave translation like this for Biopython 1.49 (and close this
bug), and revisit translation after that is done (hopefully in less than two
weeks' time).  The code in CVS is still a big improvement in terms of writing
object-oriented code.

Peter




From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 18:34:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 13:34:03 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061834.mA6IY3ra013125@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 13:34 EST -------
Replying to Marco's email on the dev mailing list:

>> Are there any other tools that output this file format?  Do you think the
>> author might be willing to just add an option to output the sequences in
>> another format (e.g. FASTA, or better an alignment format designed for more
>> than one alignment).  This would be a neater solution in the long run (and
>> would benefit anyone using fastPhase - not just Biopython).
>
> Not to my knowledge.
> Anyway, consider that a fastPhase run could take days for medium/big samples.
> In some situations it could be faster to convert its output to fasta
> (or other ones) directly, instead of re-calculating the results.

OK - I had not appreciated the run time involved.  Clearly it would not be
sensible to have to repeat a long analysis just to get the results in another
format (e.g. as FASTA, or the simplified -Z output whatever that looks like).

>> If it is for DNA only, the sequences/alignments returned should ideally
>> specify a DNA alphabet.
>
> mmm ok...
> Basically it could be used also with characters like genes and other
> markers.. but in that case, it would not make sense to parse it as a
> sequence, so nobody would try to do it.

That's interesting, and means assuming DNA wouldn't be safe.  Just use the
single letter alphabet then (rather than defaulting to the completely generic
base alphabet).

>>> Because that would mean that one individual has only one chromosome.
>>> It doesn't make sense to run fastPhase on a haploid individual.
>>
>> Is fastPhase only for diploids?  Could it be used with polyploidy (e.g.
>> plants)?
>
> I think not... It would be another class of problem.
> What fastPhase does is try to infer haplotypes from genotype data.

OK - you can probably tell I'm not a population biologist from the questions ;)

>> I was actually thinking the -Z format might be much simpler to deal
>> with (I didn't mean to suggest supporting both).  On the other hand,
>> the documentation does say the -Z is "not intended for general use".
>
> The problem is that a fastPhase run could take days... most of
> the time you want the longer format, and then proceed to parse it.
> Anyway, it should not be a big problem to implement it

OK (as I wrote above), I can see now that using the simplified -Z output is not
sensible.

> (I am just putting all of that information in SeqRecord.description)

If we know the meaning of some of these fields, then ideally they should go in
the annotations dictionary, rather than just in the SeqRecord description.

Peter




From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 19:00:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 14:00:59 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061900.mA6J0xi3015085@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 14:00 EST -------
I've added a few doctests to SeqRecord.py in CVS revision 1.24, plus the simple
unit test from comment 7 to make sure these get validated as part of the
Biopython test suite.

How does that look to you Marco?  I've kept the __init__ example short, not
doing anything with annotations.

Do you think we should also have the __main__ trick in all modules with
doctests?
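
For reference, the "__main__ trick" is usually written along the following
lines.  This is a generic sketch using the standard doctest module; the
reverse_complement function is a made-up example and is not taken from
SeqRecord.py.

```python
import doctest


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string.

    >>> reverse_complement("ATGC")
    'GCAT'
    """
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


if __name__ == "__main__":
    # Running the module directly executes every docstring example in it.
    doctest.testmod()
```

With this in place, running the module as a script checks its own
docstrings, which is convenient while a single module is being written.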




From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 19:41:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 14:41:44 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811061941.mA6JfiHM019925@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #9 from dalloliogm at gmail.com  2008-11-06 14:41 EST -------
(In reply to comment #8)
> I've added a few doctests to SeqRecord.py in CVS revision 1.24, plus the simple
> unit test from comment 7 to make sure these get validated as part of the
> Biopython test suite.
> 
> How does that look to you Marco?  I've kept the __init__ example short, not
> doing anything with annotations.

I think they look ok.. to me, they seem good examples of how to use the module.


> Do you think we should also have the __main__ trick in all modules with
> doctests?

I am not really experienced in managing such big projects... but I think it
could be ok, at least for now.

I would personally keep the __main__ trick for every module, because it would
make it easier to test a single module while you are still writing it.

But to test many modules in one go, the code you posted in comment #7 is the
way to go.

so... in short, I don't know!! :)




From bugzilla-daemon at portal.open-bio.org  Thu Nov  6 20:34:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 15:34:36 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811062034.mA6KYa6b026157@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #50 from bsouthey at gmail.com  2008-11-06 15:34 EST -------
(In reply to comment #48)
> (In reply to comment #47)
> > (In reply to comment #46)
> 
> > > [seq.translate() for seq in seqlist if seq.is_cds()]
> > > 
> > > I prefer the second option, for readability, but YMMV.
> > 
> > Note the above wouldn't give you translations starting with methionine, you'd
> > need something like:
> > 
> > [seq.translate(cds_start=True) for seq in seqlist if seq.is_cds()]
> > 
> > (assuming we call the "init" option "cds_start")
> 
> Fair point... my focus was on putting that filter into the list comprehension.
> 
> > Or, going with the complete_cds option you could build a list of translations
> > of valid CDSs like this:
> > 
> > proteins = []
> > for seq in seqlist :
> >     try :
> >         proteins.append(seq.translate(complete_cds=True))
> >     except ValueError :
> >         #Not a valid CDS, excluded
> >         pass
> > 
> > Not a one liner, but I think in a real situation you'd want to do something
> > with the invalid CDSs anyway (even if just logging them).
> 
> True enough.  It comes down in part to a preference of style, as the same could
> be achieved with
> 
> proteins = []
> for seq in seqlist :
>     if seq.is_cds():
>         proteins.append(seq.translate(complete_cds=True))
>     else:
>         #Not a valid CDS, excluded
>         pass
> 
> I think the clarity of this arrangement to my eyes comes from 'is/is not a cds'
> being - naturally-speaking - a property or attribute of the sequence itself. 
> The 'cds_start' argument in your example is then an instruction to treat the
> translation as though you have a CDS, and implement some specialised behaviour
> that is appropriate under that circumstance, rather than to implement a test
> that raises an error if it is failed.  By separating the 'is_cds()' call from
> the 'cds_start' argument, you gain the ability to translate the sequence with
> either the methionine or the coded amino acid, without losing the test of the
> sequence being a CDS.
> 
> Of course, using the 'cds_start=True' argument could force a call to
> self.is_cds(), anyway.  Your non-one-liner could then be as you originally
> wrote:
> 
> proteins = []
> for seq in seqlist :
>     try:
>         proteins.append(seq.translate(complete_cds=True))
>     except ValueError:
>         #Not a valid CDS, excluded
>         pass
> 
> The two advantages I see to having the is_cds() method as a separate call are
> that it permits separation of the determining the CDS status of the sequence,
> and that it provides a filter that is more readable than attempting to
> translate the sequence to find out if it's a valid CDS.  If the 'cds_start'
> argument forces a self.is_cds() test, then the usage can be - I think - exactly
> as you've been proposing throughout the thread.
> 

The use of 'cds' alone is wrong because CDS refers to DNA, not to translation
and not to protein sequences. The use of cds is confusing or at least vague
until you determine how it works. Also it could be wrong in the sense that a
sequence can be a valid CDS (see the GUG initiation in mammalian NAT1 example
at the NCBI link) yet not be allowed by the table in Bio.Data.CodonTable.

I don't object to the purpose, rather I do object to the name. My overriding
issue here is that 'cds_start' does not convey the purpose of this argument and
this is likely to remain for some time in the API. One interpretation that also
comes to mind is that it is the location of the start of the cds in the
sequence (cds start at...).

I really feel that the name must clearly reflect that it invokes a test that
the first codon is in the 'start_codons' list (defined by the selected table
from Bio.Data.CodonTable). This is not a check that it is the start of a cds;
rather it is a check for a possible open reading frame (as not all open reading
frames are cds).




From bugzilla-daemon at portal.open-bio.org  Fri Nov  7 04:46:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 23:46:08 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811070446.mA74k8Js031975@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #13 from mdehoon at ims.u-tokyo.ac.jp  2008-11-06 23:46 EST -------
(In reply to comment #12)
I have uploaded a fixed version of Bio.NaiveBayes to CVS. Can you check to see
if you're happy with this version?




From sbassi at gmail.com  Fri Nov  7 06:56:48 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Fri, 7 Nov 2008 04:56:48 -0200
Subject: [Biopython-dev] Possible problem with NCBIStandalone.blastall
Message-ID: <b43bf2080811062256i5b6a35a9h5e0cd4b531cb9496@mail.gmail.com>

When I run a command line blast with these parameters:

/root/blast-2.2.18/bin/blastall -p blastn -d /var/www/blast/db/UniVec
-q -5 -G 3 -E 3 -F "m D" -e 700 -Y 1.75e12 -i tmpsq

I find a match (with evalue of 18).
But when I do it from Biopython I can't find any match:

rh, eh = NCBIStandalone.blastall(blast_exe, "blastn", db,
                                     fin, nuc_mismatch='-5',
                                     gap_open = '3',
                                     gap_extend = '3',
                                     search_length = '1.75e12',
                                     expectation='20')

Here is the input sequence:

>C07SpCP042I015.P5A02.R. [Clone-lib=pCLD 04541]
NNNCCCCCCCTCGAGGTCGACNNNNNNNNTAAGCTTGAAATTCTATGATATGCAGTTAGT
TGCTNCTNGTTTAGCATTGGTTGGTTAACTTAAAACCTTTTCCTGCAATAATTATATGGA
TAATATTACTTTACTTNNNNNNNTATTGCCTTCACTAATTTTTAGGATCTATTTTCTGTT
AAATGTTATCTCTTGTTCTTGAGAAGTGCTTTGGAGATCATTTTTCCATCGTATTAACAA
AAAGTGAAATAACTACTTGTGCAATCAGGCTTTTCCTACACCAGGGGATAAGGCAAATAA
ACTATTCACCTCCTTTAATTAGCTCCCCCCCCCCCCCCTCCCCTTCTTTTCTCTTCATTC
CTGANNNANTTAGCTAGTACGCACCATTCAATCAATTATTTCTGTTCCATTTTGTGCTAA
ATATGTTTTCAAATGTTTAATATAGTTCTGAAGACAGCAGTTTAATGTTTTGTCTGGCTA
ACTGCTATTCTAAGCTCATTGTTTCAGCTTGCAGTTTTGCAGCAAAACCTGTCTGCTGTC
CATGAAATCTGGAAGGAATGTAGTAAATTTTACAGTCTCAGCCTTCTATCTCTGAGGAAG
TTTATATGGTCCTTCACGGAGCTGAGAGATCTGAATTCAGCCCACACAGCCTTACAGCAC
ATGGTGAGATTGGCTTTTACGGAAAACTCTTACATTAGTAGAACTGCTGAGGGGAGGTTT
TGTGATTTAAGATTGGATATTCCAGCACCTTCCTCTGGCAATTGGAGTTTCATCGATGTA
TCTGTCGACACCGCGGGTAGCAGCAATTTTGATATGGAAAGACAAAGTCTTGGCAGAAAA
ACA

and here is the database:
ftp://ftp.ncbi.nih.gov/pub/UniVec/UniVec

(I got the parameter from
http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen_docs.html#Parameters)

Best,
SB.


-- 
Vendo isla: http://www.genesdigitales.com/isla/
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Tutorial libre de Python: http://tinyurl.com/2az5d5

"It is pitch black. You are likely to be eaten by a grue." -- Zork


From bugzilla-daemon at portal.open-bio.org  Fri Nov  7 09:37:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 7 Nov 2008 04:37:23 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811070937.mA79bNh9020433@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #51 from lpritc at scri.sari.ac.uk  2008-11-07 04:37 EST -------
Just to perpetuate, what I suggest is (in pseudocode, and with argument names
up for, well, argument):

class Seq:
    [...]
    def startswith_startcodon(self):
        """ Returns True if the first three bases of the sequence
            are a valid start codon in the sequence's codon table,
            returns False otherwise
        """

    def endswith_stopcodon(self):
        """ Returns True if the length of the sequence is a multiple
            of three, and the last three bases are a valid stop codon
            in the sequence's codon table, returns False otherwise
        """

    def is_cds(self):
        """ Returns True if the sequence meets the criteria for a CDS,
            False otherwise.  The criteria are:
            i)   The very first three bases of the sequence are a valid
                 start codon
            ii)  The sequence length is a multiple of three
            iii) The final three bases of the sequence are a valid stop codon
            iv)  There are no in-frame stop codons, other than the final
                 stop codon
        """
        if not self.startswith_startcodon(): return False
        if not self.endswith_stopcodon(): return False
        # Test for in-frame stop codon, return True if none is found,
        # return False otherwise

    def translate(self, [...], assert_cds=False, assert_cdsfirstcodon=False):
        """ Returns a new Seq object with the protein translation.
            If assert_cds is True, but the sequence is not a CDS as
            determined by self.is_cds(), then an error is raised.
            Otherwise, the sequence is translated with the first codon
            read as a methionine, rather than the amino acid which it
            would encode at any other position.
            If assert_cdsfirstcodon is True, but the sequence doesn't
            start with a valid start codon, then an error is raised.
            Otherwise, the sequence is translated with the first codon
            read as a methionine, as above.
        """
        # Translate away as normal, here
        [...]
        if assert_cds:
            if not self.is_cds():
                raise ValueError("WTF? This is no CDS, my good fellow human!")
            else:
                # Make the first amino acid of the translated sequence a Met
        if assert_cdsfirstcodon:
            if not self.startswith_startcodon():
                raise ValueError("Hey!  Stop playing around, this sequence "
                                 "doesn't start with a start codon")
            else:
                # Make the first amino acid of the translated sequence a Met
        # Then continue as normal

This approach provides the following behaviour (assuming things about argument
names that can be thrashed out later)

# I want to translate some nt sequence, and don't care about stops, starts,
# or any other stuff
aaseq = ntseq.translate()
# I want to translate my nt sequence to the first in-frame stop codon, and
# no further
aaseq = ntseq.translate(to_stop=True)
# I want to know if my nt sequence is a (putative) CDS
ntseq.is_cds()
# I want to know if my nt sequence starts with a start codon
ntseq.startswith_startcodon()
# I want to know if my nt sequence ends with an in-frame stop codon
# Note that this is a different question to asking whether there is *any*
# in-frame stop codon
ntseq.endswith_stopcodon()
# I want to translate my nt sequence, which I know is a CDS,
# but not convert the first codon to a methionine
aaseq = ntseq.translate()
# I want to translate my nt sequence, which I know is a CDS,
# and convert the first codon to a methionine
aaseq = ntseq.translate(assert_cds=True)
# OK, my sequence isn't a *real* CDS, but it still starts with a valid start
# codon (I checked already with ntseq.startswith_startcodon()), and I'd like
# to convert the first codon as if it was really a CDS.  You don't need to
# know why, I just do.  I'm wacky that way.
aaseq = ntseq.translate(assert_cdsfirstcodon=True)
# I'd like a list of all my sequences that are valid CDS
seqlist = [s for s in myntseqs if s.is_cds()]
# I'd like translations of all my sequences that are valid CDS
tlist1 = [s.translate() for s in seqlist]
tlist2 = [s.translate() for s in myntseqs if s.is_cds()]
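
The three predicates in the pseudocode above could be prototyped as plain
functions on strings, roughly as follows.  This is only a sketch: the
START_CODONS and STOP_CODONS sets are hard-coded assumptions standing in for
whatever the sequence's codon table defines, and nothing here is existing
Bio.Seq API.

```python
# Assumed stand-ins for the start/stop codons of the chosen codon table.
START_CODONS = {"ATG", "CTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a sequence into complete codons, reading from the first base."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def startswith_startcodon(seq):
    """True if the first three bases are a start codon."""
    return seq[:3].upper() in START_CODONS


def endswith_stopcodon(seq):
    """True if the length is a multiple of three and the last codon is a stop."""
    return len(seq) % 3 == 0 and seq[-3:].upper() in STOP_CODONS


def is_cds(seq):
    """True if seq starts with a start codon, ends with an in-frame stop
    codon, and has no other in-frame stop codon."""
    if not startswith_startcodon(seq):
        return False
    if not endswith_stopcodon(seq):
        return False
    return not any(codon in STOP_CODONS for codon in codons(seq)[:-1])


print(is_cds("ATGGCCTAA"))     # True
print(is_cds("ATGTAAGCCTAA"))  # False: internal in-frame stop codon
```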


In terms of nomenclature:

The default behaviour of translate() as Peter proposed: read through in-frame
and translate with the appropriate codon table - is fine in nearly all
circumstances.  Most other circumstances are covered by stopping at the first
in-frame stop codon, which Peter has implemented, and is an option we all seem
to agree on.

Biologically-speaking, this behaviour is not always correct for CDS in
prokaryotes, where alternative start codons may occur a significant minority of
the time.  These will be mistranslated if no provision is made for them.  I
think a useful biological sequence object should at least try to mimic actual
biology, so we should provide an option to handle this.
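
A small sketch of the mistranslation in question, in plain Python.  The
first_codon_as_met argument is a hypothetical name used only for this
illustration, the START_CODONS set is an assumed subset of the start codons
seen in bacteria, and the codon table is the standard one written out by
hand rather than taken from Bio.Data.CodonTable.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumed table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO))

# Assumed subset of start codons used in prokaryotes.
START_CODONS = {"ATG", "GTG", "TTG"}


def translate(nuc, first_codon_as_met=False):
    """Blind translation, optionally reading a leading start codon as Met."""
    nuc = nuc.upper()
    codons = [nuc[i:i + 3] for i in range(0, len(nuc) - len(nuc) % 3, 3)]
    protein = [CODON_TABLE[codon] for codon in codons]
    if first_codon_as_met:
        if not codons or codons[0] not in START_CODONS:
            raise ValueError("sequence does not begin with a start codon")
        protein[0] = "M"
    return "".join(protein)


print(translate("GTGAAATAA"))                           # VK*
print(translate("GTGAAATAA", first_codon_as_met=True))  # MK*
```

The blind translation reports a leading valine for the GTG start codon,
whereas the protein made in the cell would begin with a methionine.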

We should not assume that a sequence is a CDS unless it is specified by the
user.  It seems reasonable to me that the term 'cds' should occur in any such
argument from the user.

We have at least two options for how to proceed with a CDS: i) we can provide a
strict CDS-type translation, which requires confirmation that the sequence is,
in fact, a CDS; ii) we can provide a weak CDS-type translation, which only
modifies the way the start codon is translated.  In both cases, behaviour is
specific to CDS, and so having 'cds' in the argument name *somewhere* seems
obvious, and entirely reasonable.

I think that 'assert_cds' makes clear that we are asserting that the sequence
is a valid CDS - no internal stops and everything else that comes with that
status.

I think that 'assert_cdsfirstcodon' avoids any ambiguity over the word 'start',
and also conveys that we are asserting that the first (rather than start) codon
has some relationship to a CDS; in this case the relationship is that the first
codon of the sequence meets the criteria for a CDS.  But that's kind of a long
argument name ;)




From bugzilla-daemon at portal.open-bio.org  Fri Nov  7 09:48:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 7 Nov 2008 04:48:18 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811070948.mA79mIRl021035@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #52 from lpritc at scri.sari.ac.uk  2008-11-07 04:48 EST -------
(In reply to comment #50)

> The use of 'cds' alone is wrong because cds refer to DNA not translation and
> not to protein sequences. The use of cds is confusing or at least vague until
> you determine how it works. 

I think that translate() also refers only to nucleotide sequences, and
therefore the association of 'cds' is not inherently confusing on that count. 
I think that it can be an appropriate term in an argument name (see above).

> Also it could be wrong in the sense it is a valid
> cds (see the GUG initiation in mammalian NAT1 example at the NCBI link) just
> not allowed by the table in Bio.Data.CodonTable.

It's up to the user to use the correct codon table for their purpose, I think. 
Otherwise, how would you propose to correct for their error?

> [...] 'cds_start' [...] One interpretation that also
> comes to mind is that it is the location of the start of the cds in the
> sequence (cds start at...).

I agree with this.  It has the potential to be confusing.

> This is not a check that it is the start of a cds
> rather it is a check for a possible open reading frame (as not all open reading
> frames are cds).  

It is true that not all ORFs are CDS (indeed, by far the majority are not). 
However, open reading frames do not have to start with - or even contain - a
start codon.  They just do not contain an in-frame stop codon.  We've been over
this definition before (comment #21).
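
Under that definition, the open reading frames in a given frame are simply
the stop-free stretches of codons, which can be sketched in a few lines of
plain Python.  The function name and the hard-coded standard stop codons are
assumptions for illustration only.

```python
# Assumed: stop codons of the standard table.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_reading_frames(seq, frame=0):
    """Yield each stretch of codons in the given frame with no stop codon."""
    seq = seq.upper()
    current = []
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            if current:
                yield "".join(current)
            current = []
        else:
            current.append(codon)
    if current:
        yield "".join(current)


# Neither stretch begins with a start codon, yet both are ORFs by this
# definition.
print(list(open_reading_frames("GCCAAATAAGGGTTTTGA")))  # ['GCCAAA', 'GGGTTT']
```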

L.




From biopython at maubp.freeserve.co.uk  Fri Nov  7 10:13:21 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 7 Nov 2008 10:13:21 +0000
Subject: [Biopython-dev] Possible problem with NCBIStandalone.blastall
In-Reply-To: <b43bf2080811062256i5b6a35a9h5e0cd4b531cb9496@mail.gmail.com>
References: <b43bf2080811062256i5b6a35a9h5e0cd4b531cb9496@mail.gmail.com>
Message-ID: <320fb6e00811070213i4aa5955arf233180d6a047de0@mail.gmail.com>

On Fri, Nov 7, 2008 at 6:56 AM, Sebastian Bassi wrote:
> When I run a command line blast with these parameters:
>
> /root/blast-2.2.18/bin/blastall -p blastn -d /var/www/blast/db/UniVec
> -q -5 -G 3 -E 3 -F "m D" -e 700 -Y 1.75e12 -i tmpsq
>
> I find a match (with evalue of 18).
> But when I do it from Biopython I can't find any match:
>
> rh, eh = NCBIStandalone.blastall(blast_exe, "blastn", db,
>                                     fin, nuc_mismatch='-5',
>                                     gap_open = '3',
>                                     gap_extend = '3',
>                                     search_length = '1.75e12',
>                                     expectation='20')

You are not using exactly the same arguments, so it's not surprising
you get different results:

-q -5 => nuc_mismatch = -5 (or as a string)
-G 3 => gap_open = 3 (or as a string)
-E 3 => gap_extend = 3 (or as a string)
-F "m D" => filter="m D" (MISSING!)
-e 700 => expectation=700 (or as a string)
-Y 1.75e12 => search_length = '1.75e12' (or as a float)

Your expectation cut off is more generous in the command line version
(700) than the Biopython version (20), but that wouldn't explain
the difference, since the match you found has an evalue of 18, which is
below both thresholds.  It's probably due to omitting the filter option (-F).
If that doesn't resolve the difference then there is something very
strange going on...
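
A quick way to spot this kind of mismatch is to compare the two sets of
options mechanically.  The sketch below uses only the flag-to-keyword
mapping listed above; the helper functions are hypothetical and are not part
of NCBIStandalone.

```python
import shlex

# Flag-to-keyword mapping as listed above.
FLAG_TO_KEYWORD = {
    "-q": "nuc_mismatch",
    "-G": "gap_open",
    "-E": "gap_extend",
    "-F": "filter",
    "-e": "expectation",
    "-Y": "search_length",
}


def keywords_from_flags(option_string):
    """Turn '-flag value' pairs into a keyword dictionary (hypothetical helper)."""
    tokens = shlex.split(option_string)
    return {FLAG_TO_KEYWORD[flag]: value
            for flag, value in zip(tokens[0::2], tokens[1::2])}


def compare(command_line, python_call):
    """Return keywords missing from the call and those with different values."""
    missing = sorted(k for k in command_line if k not in python_call)
    differing = {k: (command_line[k], str(python_call[k]))
                 for k in command_line
                 if k in python_call and str(python_call[k]) != command_line[k]}
    return missing, differing


command_line = keywords_from_flags('-q -5 -G 3 -E 3 -F "m D" -e 700 -Y 1.75e12')
python_call = dict(nuc_mismatch="-5", gap_open="3", gap_extend="3",
                   search_length="1.75e12", expectation="20")
missing, differing = compare(command_line, python_call)
print(missing)    # ['filter']
print(differing)  # {'expectation': ('700', '20')}
```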

Peter


From bugzilla-daemon at portal.open-bio.org  Fri Nov  7 11:14:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 7 Nov 2008 06:14:13 -0500
Subject: [Biopython-dev] [Bug 2622] Parsing between position locations like
	5933^5934 in GenBank/EMBL files
In-Reply-To: <bug-2622-42@http.bugzilla.open-bio.org/>
Message-ID: <200811071114.mA7BED84026709@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2622


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-07 06:14 EST -------
I've updated CVS to treat a between position like 3^4 (one based counting) as a
zero length slice 3:3.
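
In plain Python terms the conversion looks roughly like this.  The helper
name is hypothetical, and the sketch assumes the two positions are adjacent
(it does not handle a between position spanning the origin of a circular
sequence).

```python
def between_position_to_slice(location):
    """Map a one-based 'n^n+1' between position to a zero-length slice."""
    left, right = (int(part) for part in location.split("^"))
    if right != left + 1:
        raise ValueError("expected adjacent positions, got %r" % location)
    # One-based position n ends at zero-based offset n, so the gap between
    # bases n and n+1 sits at offset n.
    return slice(left, left)


seq = "ACGTAC"
gap = between_position_to_slice("3^4")
print(repr(seq[gap]))                            # ''
print(seq[:gap.start] + "NN" + seq[gap.stop:])   # ACGNNTAC
```

The empty slice selects no bases but still marks where an insertion between
the third and fourth base would go.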




From bugzilla-daemon at portal.open-bio.org  Fri Nov  7 11:19:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 7 Nov 2008 06:19:12 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811071119.mA7BJCjd027093@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-07 06:19 EST -------
Marking as fixed - I've updated SeqRecord.py in CVS revision 1.25 to call the
doctests via the __main__ trick, with similar changes for Bio.Seq, Bio.SeqIO
and Bio.AlignIO (the latter two are complicated due to finding the input
files).

Thanks for the encouragement Marco - hopefully this has also made the docstring
documentation more useful, and will improve the API docs too:
http://biopython.org/DIST/docs/api/ (updated for each release)

Peter




From bugzilla-daemon at portal.open-bio.org  Fri Nov  7 11:52:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 7 Nov 2008 06:52:50 -0500
Subject: [Biopython-dev] [Bug 2613] test_Wise and test_psw fail under Python
	2.3
In-Reply-To: <bug-2613-42@http.bugzilla.open-bio.org/>
Message-ID: <200811071152.mA7BqoKj029425@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2613


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-07 06:52 EST -------
"Fixed" by skipping these tests (and the recently added test_docstrings.py) if
run on Python 2.3.

Python 2.3 doctest uses slightly different formatting.  It also doesn't support
some features like <BLANKLINE>
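
For anyone unfamiliar with it, the marker stands in for an empty line in the
expected output of a docstring example.  A minimal made-up example (not
taken from the Biopython test suite):

```python
import doctest


def two_paragraphs():
    """Return text containing an empty line.

    An empty line would normally end the expected output, so the marker
    is written in its place:

    >>> print(two_paragraphs())
    first
    <BLANKLINE>
    second
    """
    return "first\n\nsecond"


if __name__ == "__main__":
    doctest.testmod()
```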




From biopython at maubp.freeserve.co.uk  Fri Nov  7 12:32:33 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 7 Nov 2008 12:32:33 +0000
Subject: [Biopython-dev] CVS freeze for Biopython 1.49 (beta)
Message-ID: <320fb6e00811070432x123e806foa06b7f3d94bdb068@mail.gmail.com>

Hi all,

I've been going over a few little things on the unit tests (e.g.
python 2.3's doctest isn't quite the same), and think I am ready to
prepare Biopython 1.49 (beta).

I plan to make the Windows installers for Python 2.3, 2.4 and 2.5
against numpy 1.1.1

Currently there is no Windows version of numpy for python 2.6, so we
won't be able to ship a Windows installer for python 2.6 for Biopython
either.

So, it's CVS freeze time.

Once the beta is out (hopefully later today), we can start using CVS
for documentation updates or fixing any bugs reported in the beta.
Then in about a week's time I hope to do the Biopython 1.49 "final"
release.

Peter


From bugzilla-daemon at portal.open-bio.org  Fri Nov  7 15:18:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 7 Nov 2008 10:18:47 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811071518.mA7FIlHb012537@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #14 from bsouthey at gmail.com  2008-11-07 10:18 EST -------
(In reply to comment #13)
> (In reply to comment #12)
> I have uploaded a fixed version of Bio.NaiveBayes to CVS. Can you check to see
> if you're happy with this version?
> 

Yes!




From sbassi at gmail.com  Fri Nov  7 16:30:34 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Fri, 7 Nov 2008 14:30:34 -0200
Subject: [Biopython-dev] Possible problem with NCBIStandalone.blastall
In-Reply-To: <320fb6e00811070213i4aa5955arf233180d6a047de0@mail.gmail.com>
References: <b43bf2080811062256i5b6a35a9h5e0cd4b531cb9496@mail.gmail.com>
	<320fb6e00811070213i4aa5955arf233180d6a047de0@mail.gmail.com>
Message-ID: <b43bf2080811070830xb99bd6bv31277968af2152f3@mail.gmail.com>

On Fri, Nov 7, 2008 at 8:13 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> -q -5 =>nuc_mismatch = -5 (or as a string)
> -G 3 => gap_open = 3 (or as a string)
> -E 3 => gap_extend = 3 (or as a string)
> -F "m D" => filter="m D" (MISSING!)

I will try with this.

> -e 700 => expectation=700 (or as a string)
> -Y = 1.75e12 => search_length = '1.75e12' (or as a float)

I used strings, since I have the Biopython version with the bug that
doesn't allow me to enter non-iterable values.

> the difference.  It's probably due to omitting the filter option (-F).
> If that doesn't resolve the difference then there is something very
> strange going on...

OK, I will check it and get back with the results.
Thank you.
Best,
SB.
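
For reference, the option mapping quoted in this message can be written down
as a small lookup table. This is only a sketch: FLAG_TO_KEYWORD and
keywords_from_flags are hypothetical helpers and are not part of Bio.Blast;
only the keyword names themselves come from the message above.

```python
# Hypothetical helper, NOT part of Biopython. The keyword names are the ones
# quoted in the thread; the table and function are purely illustrative.
FLAG_TO_KEYWORD = {
    "-q": "nuc_mismatch",
    "-G": "gap_open",
    "-E": "gap_extend",
    "-F": "filter",
    "-e": "expectation",
    "-Y": "search_length",
}


def keywords_from_flags(flags):
    """Translate a {flag: value} dict into keyword arguments.

    Every value is converted to a string, which the thread says is an
    accepted way to pass each of these options.
    """
    keywords = {}
    for flag, value in flags.items():
        keywords[FLAG_TO_KEYWORD[flag]] = str(value)
    return keywords


command_line = {"-q": -5, "-G": 3, "-E": 3, "-F": "m D",
                "-e": 700, "-Y": "1.75e12"}
keywords = keywords_from_flags(command_line)
```

Writing the mapping as data makes an omitted option (such as the filter)
easy to spot when comparing against the command line.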


From biopython at maubp.freeserve.co.uk  Fri Nov  7 16:53:58 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 7 Nov 2008 16:53:58 +0000
Subject: [Biopython-dev] CVS freeze for Biopython 1.49 (beta)
In-Reply-To: <320fb6e00811070432x123e806foa06b7f3d94bdb068@mail.gmail.com>
References: <320fb6e00811070432x123e806foa06b7f3d94bdb068@mail.gmail.com>
Message-ID: <320fb6e00811070853w77cd415dn68b1889c09388fb6@mail.gmail.com>

> Once the beta is out (hopefully later today), we can start using CVS
> for documentation updates or fixing any bugs reported in the beta.
> Then in about a week's time I hope to do the  Biopython 1.49 "final"
> release.

OK - Biopython 1.49 beta is done, available on the website now :)

Please don't do any new code checkins for the next week.  Additional
documentation and unit tests should be fine - and any bug fixes after
discussion.

I've done a news post, which I can edit if anyone spots anything wrong
or has suggestions for improvement, but it will be a good basis for the
announcement email:

http://news.open-bio.org/news/2008/11/biopython-149-beta-released/

Peter


From bugzilla-daemon at portal.open-bio.org  Fri Nov  7 16:55:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 7 Nov 2008 11:55:22 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To: <bug-2629-42@http.bugzilla.open-bio.org/>
Message-ID: <200811071655.mA7GtM6F018980@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-07 11:55 EST -------
Grand - this bug seems to be fixed then (and in time for Biopython 1.49 beta).




From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 02:56:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 8 Nov 2008 21:56:59 -0500
Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in
	GenBank files
In-Reply-To: <bug-2225-42@http.bugzilla.open-bio.org/>
Message-ID: <200811090256.mA92uxgL025316@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2225


chapmanb at 50mail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from chapmanb at 50mail.com  2008-11-08 21:56 EST -------
Thanks Peter for the heads up on the future changes. Fixed this with respect to
the offered suggestions with Bio/GenBank/Record.py 1.12; Bio/GenBank/Scanner.py
1.25 and Bio/GenBank/__init__.py 1.95.

I left PROJECT output as shown in our example as it was not clear from the
GenBank documentation whether they would be on multiple or single lines. DBLINK
was output over multiple lines as defined in the documentation. When files with
DBLINKs are released we should include a test case.

For feature parsing, both DBLINK and PROJECT will be stored as dbxrefs as
suggested.




From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 15:04:09 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 10:04:09 -0500
Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in
	GenBank files
In-Reply-To: <bug-2225-42@http.bugzilla.open-bio.org/>
Message-ID: <200811091504.mA9F49hU030667@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2225


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-09 10:04 EST -------
You've got a minor bug in there Brad...

def dblink(self, content):
    """Store DBLINK cross references as dbxrefs in our record object.
    """
    dblinks = [l for l in content.split() if l]
    self.data.dbxrefs.extend(projects)

Should be: self.data.dbxrefs.extend(dblinks)

However, based on the example DBLINK line, we shouldn't be splitting on spaces
at all - for example this transition example for when the PROJECT line and
DBLINK lines are present:

LOCUS       CP000964             5641239 bp    DNA     circular BCT 24-SEP-2008
DEFINITION  Klebsiella pneumoniae 342, complete genome.
ACCESSION   CP000964
VERSION     CP000964.1  GI:206564770
PROJECT     GenomeProject:28471
DBLINK      Project:28471
            Trace Assembly Archive:123456
....

Note that "Trace Assembly Archive:123456" should be a single cross reference. 
I'll attach a patch for CVS in a moment.
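
A minimal sketch of the per-line idea, assuming the DBLINK content reaches the
function with its line breaks intact (the real scanner may deliver it
differently). split_dblink is a hypothetical name, not the method in
Bio/GenBank:

```python
def split_dblink(content):
    """Return one cross reference per non-empty line of a DBLINK block.

    Splitting on line breaks rather than on whitespace keeps a reference
    such as "Trace Assembly Archive:123456" in one piece.
    """
    return [line.strip() for line in content.splitlines() if line.strip()]


# The two DBLINK lines from the example record above.
content = "Project:28471\n            Trace Assembly Archive:123456"
dblinks = split_dblink(content)

# For comparison: splitting on whitespace breaks the second reference apart.
naive = content.split()
```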




From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 15:07:30 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 10:07:30 -0500
Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in
	GenBank files
In-Reply-To: <bug-2225-42@http.bugzilla.open-bio.org/>
Message-ID: <200811091507.mA9F7U0N030977@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2225


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-09 10:07 EST -------
Created an attachment (id=1045)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1045&action=view)
Patch to Bio/GenBank/*.py

This patch against CVS assumes DBLINK lines contain one cross reference per
line.

Also maps "GenomeProject:" to "Project:" so that we'll be consistent when the
NCBI change this as part of the PROJECT line to DBLINK line switch.

Should avoid duplicate entries in the dbxrefs list (especially during the
transition period where both PROJECT and DBLINK lines are used).
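
The behaviour described above could look roughly like the following sketch.
It is not the attached patch; add_dbxref is a hypothetical helper written
only to illustrate the renaming and the duplicate check:

```python
def add_dbxref(dbxrefs, reference):
    """Append a cross reference to dbxrefs unless it is already present.

    "GenomeProject:" prefixes are rewritten to "Project:" first, so a
    PROJECT line and a DBLINK line naming the same project collapse into
    a single entry.
    """
    reference = reference.strip()
    old_prefix = "GenomeProject:"
    if reference.startswith(old_prefix):
        reference = "Project:" + reference[len(old_prefix):]
    if reference not in dbxrefs:
        dbxrefs.append(reference)


dbxrefs = []
for ref in ["GenomeProject:28471",             # from the PROJECT line
            "Project:28471",                   # from the DBLINK line
            "Trace Assembly Archive:123456"]:  # second DBLINK entry
    add_dbxref(dbxrefs, ref)
```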




From biopython at maubp.freeserve.co.uk  Sun Nov  9 15:16:50 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 9 Nov 2008 15:16:50 +0000
Subject: [Biopython-dev] Biopython 1.49 beta released
Message-ID: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com>

Dear Biopythoneers,

We are pleased to announce a beta release of Biopython 1.49. There have
been some significant changes since Biopython 1.48 was released two
months ago, which is why we are initially releasing a beta for wider
testing.

As previously announced, the big news is that Biopython now uses NumPy
rather than its precursor Numeric (the original Numerical Python
library).

As in the previous releases, Biopython 1.49 beta supports Python 2.3,
2.4 and 2.5 but should now also work fine on Python 2.6. Please note
that we intend to drop support for Python 2.3 in a couple of releases'
time.

We also have some new functionality, starting with the basic sequence
object (the Seq class) which now has more methods. This encourages a
more object orientated coding style, and makes basic biological
operations like transcription and translation more accessible and
discoverable.

Our BioSQL interface can now optionally fetch the NCBI taxonomy on
demand when loading sequences (via Bio.Entrez) allowing you to
populate the taxon/taxon_name tables gradually. Also, BioSQL should
now work with the psycopg2 driver for PostgreSQL (as well as the older
psycopg driver).

Finally, our old parsing infrastructure (Martel and Bio.Mindy) is now
considered to be deprecated, meaning mxTextTools is no longer required
to use Biopython. This should not affect any of the typically used
parsers (e.g. Bio.SeqIO and Bio.AlignIO).

So, if you are feeling brave and know the risks, please try out
Biopython 1.49 beta, and let us know on the mailing lists if it works,
or more importantly if something doesn't.

We'd also like feedback on the updated Biopython Tutorial and Cookbook:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Source distributions and Windows installers are available from the
Biopython website:
http://biopython.org/wiki/Download

Thanks!

-Peter on behalf of the Biopython developers

P.S. Those of you subscribed to our news feed would have seen this
announcement already.  For RSS links etc, see:
http://biopython.org/wiki/News


From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 16:00:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 11:00:39 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811091600.mA9G0dZ6003494@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #11 from dalloliogm at gmail.com  2008-11-09 11:00 EST -------
(In reply to comment #10)
> Marking as fixed - I've updated SeqRecord.py in CVS revision 1.25 to call the
> doctests via the __main__ trick, with similar changes for Bio.Seq, Bio.SeqIO
> and Bio.AlignIO (the latter are complicated due to finding the input files).
> 
> Thanks for the encouragement Marco - hopefully this has also made the docstring
> documentation more useful, and will also improve the API docs too:
> http://biopython.org/DIST/docs/api/ (updated for each release)

Thanks to you!! :)
I am really happy you accepted my patch. 
I'll see if I can contribute something else.
> 
> Peter
> 
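
For readers who have not seen the "__main__ trick" mentioned in the quoted
comment, here is a minimal sketch using only the standard library. The
function is a made-up example and is not taken from SeqRecord.py:

```python
import doctest


def reverse_complement(sequence):
    """Return the reverse complement of an unambiguous DNA string.

    >>> reverse_complement("ACGT")
    'ACGT'
    >>> reverse_complement("AAGC")
    'GCTT'
    """
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(sequence))


if __name__ == "__main__":
    # Running the module directly executes every docstring example above;
    # importing the module does not.
    doctest.testmod()
```

The examples then serve as documentation and as a quick self-test of the
module.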




From biopython at maubp.freeserve.co.uk  Sun Nov  9 16:10:59 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 9 Nov 2008 16:10:59 +0000
Subject: [Biopython-dev] Sequences and simple plots
In-Reply-To: <C51BB9C7.17C1C%lpritc@scri.ac.uk>
References: <320fb6e00810150709u2aed9855kb8cf91318f287765@mail.gmail.com>
	<C51BB9C7.17C1C%lpritc@scri.ac.uk>
Message-ID: <320fb6e00811090810s342e78f1n3eb45bba051d236f@mail.gmail.com>

Getting back to simpler plot examples using pylab, Andrew Dalke wrote
up some nice examples plotting Kyte & Doolittle hydrophobicities of
protein sequences:

http://www.dalkescientific.com/writings/NBN/plotting.html

Something based on this idea (but probably leaving out most of the
complicated smoothing stuff and labelling the helices) could make a
short and sweet line plot example for the Biopython tutorial.

Peter


From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 17:29:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 12:29:34 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811091729.mA9HTYF1011072@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


dalloliogm at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1042 is|0                           |1
           obsolete|                            |


------- Comment #12 from dalloliogm at gmail.com  2008-11-09 12:29 EST -------
Created an attachment (id=1046)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1046&action=view)
fastPhase output iterator (returns Alignment objects)

This is the rewritten fastphaseoutputIO, which returns an Alignment object
instead of SeqRecord objects.
It can still return SeqRecord objects if a 'ret = seqrecord' parameter is
passed, but Alignment objects are returned by default.

Moreover, I have de-capitalized (.lower()) the name of the function, and added
a link to the fastPhase article in the documentation (although I think the doc
would need more work).




From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 17:30:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 12:30:25 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811091730.mA9HUP6J011190@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


dalloliogm at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1046 is|0                           |1
           obsolete|                            |


------- Comment #13 from dalloliogm at gmail.com  2008-11-09 12:30 EST -------
Created an attachment (id=1047)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1047&action=view)
a doctest file to test fastPhaseOutputIterator




From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 17:34:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 12:34:19 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811091734.mA9HYJ7I011664@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #14 from dalloliogm at gmail.com  2008-11-09 12:34 EST -------
Created an attachment (id=1048)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1048&action=view)
use cases/description for fastphaseoutputIO

This is a collection of use cases/examples about fastPhaseOutputIO.
I thought it could be useful to understand how this module will be used and by
who, or just to remind me why I wrote this module later :)




From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 17:41:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 12:41:26 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811091741.mA9HfQlr012379@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #15 from dalloliogm at gmail.com  2008-11-09 12:41 EST -------
(In reply to comment #10)
> (In reply to comment #8)
> > > If fastPHASE files SHOULD always come in allele groups (of the same
> > > length), then it would be better to integrate the parser into Bio.AlignIO
> > > giving pairwise alignments (and you would be able to read it via Bio.SeqIO
> > > automatically as well).
> > 
> > This is good idea, I didn't think of it.
> > But how should I modify the module to produce AlignIO objects?
> 
> Essentially Instead of:
> 
> yield record_one
> yield record_two
> 
> you'd do something like this:
> 
> alignment = Alignment(generic_dna)
> alignment.add_sequence(id_one, seq_one)
> alignment.add_sequence(id_two, seq_two)
> yield alignment


I have modified the module so it returns Alignment objects instead of
SeqRecords.
The problem is that Alignment.add_sequence doesn't accept SeqRecord objects
as input; it only takes an id and the sequence.
As a result some information is lost: to be more precise, everything I was
putting in 'description' (subpop. label: 6  (internally 1)) is lost, because
there is no way to store it in the Alignment object.
Moreover, the parser now returns only a single Alignment object per file (I
think it is not supposed to be possible to have two fastphase outputs in the
same file), because I thought that was the most useful behaviour.
However, I left an option to have SeqRecord objects returned instead of
Alignments (unfortunately I removed them from the doctests :().
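
The change from yielding records to yielding alignments can be sketched as
follows. SimpleAlignment is a stand-in written for this example, not the
Biopython Alignment class, and the input format (pairs of (id, sequence)
tuples) is an assumption:

```python
class SimpleAlignment(object):
    """Stand-in for an alignment: an ordered list of (id, sequence) rows."""

    def __init__(self):
        self.rows = []

    def add_sequence(self, identifier, sequence):
        # Like the method discussed above, this takes only an id and a
        # sequence, so there is nowhere to put a description.
        if self.rows and len(sequence) != len(self.rows[0][1]):
            raise ValueError("all rows must have the same length")
        self.rows.append((identifier, sequence))


def iterate_alignments(haplotype_pairs):
    """Yield one alignment per pair of haplotypes, not two loose records."""
    for (id_one, seq_one), (id_two, seq_two) in haplotype_pairs:
        alignment = SimpleAlignment()
        alignment.add_sequence(id_one, seq_one)
        alignment.add_sequence(id_two, seq_two)
        yield alignment


pairs = [(("ind1_a", "ACGT"), ("ind1_b", "ACGA"))]
alignments = list(iterate_alignments(pairs))
```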




From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 17:46:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 12:46:13 -0500
Subject: [Biopython-dev] [Bug 2554] Creating an Alignment from a list of
	SeqRecord objects
In-Reply-To: <bug-2554-42@http.bugzilla.open-bio.org/>
Message-ID: <200811091746.mA9HkDPr012817@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2554


------- Comment #3 from dalloliogm at gmail.com  2008-11-09 12:46 EST -------
(In reply to comment #0)
> It would be nice to be able to supply a list (or iterator) of SeqRecord objects
> when creating an alignment object.  This would also make the
> Bio.SeqIO.to_alignment() function obsolete.

I agree with this request; see
http://bugzilla.open-bio.org/show_bug.cgi?id=2643#c15




From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 17:52:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 12:52:48 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811091752.mA9HqmqQ013518@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #12 from dalloliogm at gmail.com  2008-11-09 12:52 EST -------
Created an attachment (id=1049)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1049&action=view)
add doctests to Bio.Align.Generic.Alignment

This is a patch to add doctest to Bio.Align.Generic.Alignment.
I just wrote it for myself to understand how this class works.. if you think it
could be useful, here it is.




From bugzilla-daemon at portal.open-bio.org  Sun Nov  9 21:35:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 16:35:25 -0500
Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in
	GenBank files
In-Reply-To: <bug-2225-42@http.bugzilla.open-bio.org/>
Message-ID: <200811092135.mA9LZPBG004563@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2225


------- Comment #6 from chapmanb at 50mail.com  2008-11-09 16:35 EST -------
Peter -- thanks for the bug catch and suggestion. Working into the future and
trying to predict if NCBI is going to do what they plan is always fun. Your fix
looks great to me -- commit away and we can close this out. If things are
different when they actually make the change we can always adjust then but this
looks very sensible.




From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 08:58:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 03:58:52 -0500
Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for
	arguments for their types
In-Reply-To: <bug-2639-42@http.bugzilla.open-bio.org/>
Message-ID: <200811100858.mAA8wq2i007149@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2639


dalloliogm at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|SeqRecord.init doesn't check|SeqRecord.init doesn't check
                   |for arguments to their types|for arguments for their
                   |                            |types


------- Comment #5 from dalloliogm at gmail.com  2008-11-10 03:58 EST -------
(In reply to comment #4)
> (In reply to comment #3)
> > Created an attachment (id=1041)
> > --> (http://bugzilla.open-bio.org/attachment.cgi?id=1041&action=view)
> > add a check for the seq argument in seqrecord, to be a Seq object and not None
> >
> > This patch adds a check for the seq argument in SeqRecord.
> > If seq is None (by default), it raises a ValueError Exception.
> > If it is a Seq object, it saves it as self.seq.
> > If it is another kind of object (string, list, integer), it is converted to a
> > string, and then used to instantiate a seq object.
> 
> I was deliberately not checking the seq argument. 

OK, understood. I didn't think of these cases.
However, not having a Seq causes errors that are difficult to understand in
other functions that use SeqRecord.
For example, if you do:

>>> a = SeqRecord(id = '1')
>>> a.format('fasta')

you get the error: 
<type 'exceptions.AttributeError'>: 'NoneType' object has no attribute
'tostring'

This could scare a Biopython newbie; an exception like 'error -
current SeqRecord object doesn't have a Seq' would be better.
What do you think about creating a 'NullSeq' object, which represents a Seq with
no value, and using it as the default for SeqRecord?
Later we could modify other functions like .format and Seq.translate to
intercept these objects and return the right error message.
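
A rough sketch of the kind of error message meant here. Record and
MissingSequenceError are hypothetical stand-ins made up for this example;
they are not Biopython's SeqRecord or any existing exception:

```python
class MissingSequenceError(ValueError):
    """Hypothetical exception for records that have no sequence."""


class Record(object):
    """Minimal stand-in for a sequence record (not Biopython's SeqRecord)."""

    def __init__(self, seq=None, id="<unknown id>"):
        self.seq = seq
        self.id = id

    def format_fasta(self):
        # Check up front, so the user sees what is actually wrong instead
        # of an AttributeError raised on None further down.
        if self.seq is None:
            raise MissingSequenceError(
                "record %r has no sequence, so it cannot be written as FASTA"
                % self.id)
        return ">%s\n%s\n" % (self.id, self.seq)


record = Record(id="1")
try:
    record.format_fasta()
except MissingSequenceError as error:
    message = str(error)
```

This keeps None as a legal value for the sequence and only complains at the
point where a sequence is really needed.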


> There are several reasonable
> use cases:
> 
> * a Seq object (normal) or a subclass of it.
> * a MutableSeq object (seems reasonable, note this is not a subclass of Seq)
> * None (seems a good way to handle sequence records where we don't know the
> sequence - for example some GenBank files).
> * a user defined sequence object which implements the Seq API but does not
> subclass Seq or MutableSeq (this is more difficult to check).
> 
> > I thought that someone could use an integer (e.g.: 010100010101101) as a
> > sequence, and in this case, the integer is first converted to a string
> > (otherwise Seq() would return an error).
> 
> Note that if someone did want to use some weird numerical sequence, then the
> SeqRecord object should NOT be trying to do anything special (guessing what is
> intended). The user should create a suitable Seq object themselves (ideally
> with a numerical alphabet object).  Explicit rather than implicit (Zen of
> python).
> 
> --
> 
> Note that I'm not 100% happy with the type checking we've just added.  See
> "duck-typing" and interfaces versus types,
> http://www.python.org/doc/2.5.2/tut/node18.html#l2h-46
> 
> The checks I've added shouldn't be too constraining - but maybe they should use
> interface checking instead (or just revert back to no checking).
> 
> Any comments from other people?  This should be being CC'd to the dev mailing
> list.
> 




From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 09:09:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 04:09:42 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811100909.mAA99g8S008678@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


dalloliogm at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1043 is|0                           |1
           obsolete|                            |




From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 10:16:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 05:16:14 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811101016.mAAAGERI012974@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-10 05:16 EST -------
(In reply to comment #15)
> I have modified the module so it returns Alignment objects instead of
> SeqRecords.
> The problem is that Alignment.add_sequence doesn't accept SeqRecord objects
> as input; it only takes an id and the sequence.  As a result some
> information is lost: to be more precise, everything I was
> putting in 'description' (subpop. label: 6  (internally 1)) is lost, because
> there is no way to store it in the Alignment object.

Adding a SeqRecord to an alignment would be enhancement request Bug 2553.  I
see you've just spotted enhancement request Bug 2554 which would also solve
this issue nicely. As a short term solution until one of these bugs is
implemented, some of the Bio.AlignIO parsers "cheat" and bypass the public API
to use alignment._records directly (this is just a list of SeqRecord objects).
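
To illustrate the short term workaround, here is a sketch with stand-in
classes. Row and Alignment below are simplified placeholders written for this
example; only the idea of appending to the private _records list comes from
the paragraph above:

```python
class Row(object):
    """Stand-in for a SeqRecord: id, sequence and free-text description."""

    def __init__(self, id, seq, description=""):
        self.id = id
        self.seq = seq
        self.description = description


class Alignment(object):
    """Stand-in for the alignment class; only the parts discussed here."""

    def __init__(self):
        self._records = []  # private list of row objects

    def add_sequence(self, id, seq):
        # Public route: there is no way to pass a description in.
        self._records.append(Row(id, seq))


alignment = Alignment()
alignment.add_sequence("ind1_a", "ACGT")
# Workaround: append a fully annotated row to the private list directly.
alignment._records.append(Row("ind1_b", "ACGA", "subpop. label: 6"))
descriptions = [row.description for row in alignment._records]
```

Since _records is private, code relying on it may break without notice once
one of the enhancement requests is implemented.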

> Moreover, now the parser only returns a single Alignment object per file (I
> think it is not supposed to be possible to have two fastphase outputs in the
> same file), because I thought it was the most useful thing.

Bio.AlignIO uses generators/iterators just like Bio.SeqIO - so that in general
you can return multiple alignments for use with Bio.AlignIO.parse().  However,
if the file format really does just return one pairwise alignment, then just
yield one alignment (this happens with the Nexus file format).

> However, I left an option to have SeqRecord objects returned instead of
> Alignments (unfortunately I removed them from the doctests :().

If you want this as part of Bio.AlignIO / Bio.SeqIO you don't need to do this. 
Once a parser is added to Bio.AlignIO, the file format can also be used from
Bio.SeqIO to get SeqRecord objects (the rows of all the alignments).

Peter




From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 10:45:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 05:45:34 -0500
Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in
	GenBank files
In-Reply-To: <bug-2225-42@http.bugzilla.open-bio.org/>
Message-ID: <200811101045.mAAAjYJ6015314@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2225


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-10 05:45 EST -------
(In reply to comment #3)
> When files with DBLINKs are released we should include a test case.

Definitely.  We might be able to just update an existing test case, like the
one added for between locations.

(In reply to comment #6)
> Peter -- thanks for the bug catch and suggestion. Working into the future
> and trying to predict if NCBI is going to do what they plan is always fun. 

Well - they've got about six months to change their mind ;)

> Your fix looks great to me -- commit away and we can close this out.

Checked in.

> If things are different when they actually make the change we can always
> adjust then but this looks very sensible.

OK.

Thanks!

Peter




From biopython at maubp.freeserve.co.uk  Mon Nov 10 11:28:00 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 10 Nov 2008 11:28:00 +0000
Subject: [Biopython-dev] [BioPython] annotations in an Alignment object
In-Reply-To: <5aa3b3570811100304o4655fe60o4ecabf41e054c211@mail.gmail.com>
References: <5aa3b3570811100304o4655fe60o4ecabf41e054c211@mail.gmail.com>
Message-ID: <320fb6e00811100328j1a565c36t7f3522344e7c95c0@mail.gmail.com>

On Mon, Nov 10, 2008 at 11:04 AM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
> Is there any way to store some annotations in an Alignment object??
> For example: the alignment tool used, its parameters, its version, the
> date, and the nature of the sequence aligned.

Not officially, no.  This is on my mental list of things to do with
the alignment object (after Biopython 1.49 is done).  I've CC'd the
dev-mailing list which is probably a better place to discuss the
details.

If you look at Bio/AlignIO/StockholmIO.py or the
Bio/AlignIO/FastaIO.py code you'll see I've recorded this kind of
information in a private dictionary, i.e. alignment._annotations.
This makes the data available if anyone really needs it, but signals
that this is not part of the public API and is likely to change.

As part of an alignment annotation enhancement, we should try and
establish some agreed standards for naming annotation entries (and
also counting systems).

> I am asking this because I would like to write a module to create
> ldhat input files from an alignment program.
> A ldhat file (http://www.stats.ox.ac.uk/~mcvean/LDhat/instructions.html)
> is very similar to a fasta file; the only difference is that in its
> first line, it contains three numbers, one of which can't always be
> inferred by the data.

Why go to the trouble of making a new Bio.AlignIO module?  For this
example from the LDhat manual, it looks like a FASTA file with an
extra header:

4 10 1
>SampleA
TCCGC??RTT
>SampleB
TACGC??GTA
>SampleC
TC?-CTTGTA
>SampleD
TCC-CTTGTT

Rather than writing support for a whole new file format, wouldn't it
be easier to do something like this:

alignment = ...
number_a = 4
number_b = 10
number_c = 1

handle = open("example.txt","w")
handle.write("%i %i %i\n" % (number_a, number_b, number_c))
handle.write(alignment.format("fasta"))
handle.close()

Peter


From dalloliogm at gmail.com  Mon Nov 10 11:42:31 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Mon, 10 Nov 2008 12:42:31 +0100
Subject: [Biopython-dev] [BioPython] annotations in an Alignment object
In-Reply-To: <320fb6e00811100328j1a565c36t7f3522344e7c95c0@mail.gmail.com>
References: <5aa3b3570811100304o4655fe60o4ecabf41e054c211@mail.gmail.com>
	<320fb6e00811100328j1a565c36t7f3522344e7c95c0@mail.gmail.com>
Message-ID: <5aa3b3570811100342t7c23c0fl2b101be3fd352159@mail.gmail.com>

On Mon, Nov 10, 2008 at 12:28 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Mon, Nov 10, 2008 at 11:04 AM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>> Is there any way to store some annotations in an Alignment object??
>> For example: the alignment tool used, its parameters, its version, the
>> date, and the nature of the sequence aligned.
>
> Not officially, no.  This is on my mental list of things to do with
> the alignment object (after Biopython 1.49 is done).  I've CC'd the
> dev-mailing list which is probably a better place to discuss the
> details.
>
> If you look at Bio/AlignIO/StockholmIO.py or the
> Bio/AlignIO/FastaIO.py code you'll see I've recorded this kind of
> information in a private dictionary, i.e. alignment._annotations.
> This makes the data available if anyone really needs it, but signals
> that this is not part of the public API and is likely to change.
>
> As part of an alignment annotation enhancement, we should try and
> establish some agreed standards for naming annotation entries (and
> also counting systems).

OK, I will use the private dictionary for my own implementation.
Unfortunately I don't have any useful suggestions for this.

>> I am asking this because I would like to write a module to create
>> ldhat input files from an alignment program.
>> A ldhat file (http://www.stats.ox.ac.uk/~mcvean/LDhat/instructions.html)
>> is very similar to a fasta file; the only difference is that in its
>> first line, it contains three numbers, one of which can't always be
>> inferred by the data.
>
> Why go to the trouble of making a new Bio.AlignIO module?  For this
> example from the LDhat manual, it looks like a FASTA file with an
> extra header:

Yeah, of course :)
Let's say I am simply playing with Biopython's code, to understand it better.
Since I am going to use this function many times, I will have to write
a module for it anyway.
The first number in the ldhat file is the number of sequences, the
second is their length, and the third should usually be one for an
alignment object, I suppose.
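
A rough sketch of that idea, taking the description above at face value: the first two header numbers are computed from the data, and the third is passed in by the caller because it cannot be inferred. The function name format_ldhat and its third_number parameter are made up for illustration, and plain (name, sequence) pairs stand in for an alignment so that the snippet runs without Biopython.

```python
def format_ldhat(records, third_number=1):
    """Return LDhat-style text: a three-number header, then FASTA records.

    records is a list of (name, sequence) pairs.  third_number is written
    out unchanged because it cannot be worked out from the sequences.
    """
    records = list(records)
    if not records:
        raise ValueError("need at least one sequence")
    length = len(records[0][1])
    for name, seq in records:
        # An alignment needs every row to have the same length.
        if len(seq) != length:
            raise ValueError("sequence %s has length %i, expected %i"
                             % (name, len(seq), length))
    lines = ["%i %i %i" % (len(records), length, third_number)]
    for name, seq in records:
        lines.append(">%s" % name)
        lines.append(seq)
    return "\n".join(lines) + "\n"


# The four samples from the LDhat manual example quoted in this thread.
example = [
    ("SampleA", "TCCGC??RTT"),
    ("SampleB", "TACGC??GTA"),
    ("SampleC", "TC?-CTTGTA"),
    ("SampleD", "TCC-CTTGTT"),
]
text = format_ldhat(example)
```

For these four samples the header line comes out as "4 10 1", matching the manual's example. Sequences are written on a single line each, which is a simplification; long alignments would probably want line wrapping.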

>
> 4 10 1
>>SampleA
> TCCGC??RTT
>>SampleB
> TACGC??GTA
>>SampleC
> TC?-CTTGTA
>>SampleD
> TCC-CTTGTT
>
> Rather than writing support for a whole new file format, wouldn't it
> be easier to do something like this:
>
> alignment = ...
> number_a = 4
> number_b = 10
> number_c = 1
>
> handle = open("example.txt","w")
> handle.write("%i %i %i\n" % (number_a, number_b, number_c))
> handle.write(alignment.format("fasta"))
> handle.close()
>
> Peter
>


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it


From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 11:48:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 06:48:08 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811101148.mAABm8WO019854@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1033 is|0                           |1
           obsolete|                            |


------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-10 06:48 EST -------
(From update of attachment 1033)
Something similar was checked into CVS.




From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 12:02:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 07:02:12 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811101202.mAAC2CV4020912@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1049 is|0                           |1
           obsolete|                            |


------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-10 07:02 EST -------
(From update of attachment 1049)
I've checked in something similar to CVS - thanks Marco.

I've not added a doctest for the format method using "clustal" because I think
the <BLANKLINE> bits make the documentation nasty to read.  Instead I've used
"fasta" and "phylip" only.




From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 12:14:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 07:14:28 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811101214.mAACESXB021859@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-10 07:14 EST -------
(In reply to comment #16)
> (In reply to comment #15)
> > I have modified the module so it returns Alignment objects instead of
> > SeqRecords.
> > The problem is that Alignment.add_sequence doesn't support SeqRecord
> > objects as inputs; it only accepts an id and the sequence.  As a
> > result some information is lost: to be more precise, everything
> > I was putting in 'description' (subpop. label: 6  (internally 1)) is
> > lost, because there is no way to store it in the Alignment object.
> 
> Adding a SeqRecord to an alignment would be enhancement request Bug 2553.  I
> see you've just spotted enhancement request Bug 2554 which would also solve
> this issue nicely. As a short term solution until one of these bugs is
> implemented, some of the Bio.AlignIO parsers "cheat" and bypass the public API
> to use alignment._records directly (this is just a list of SeqRecord objects).

Or, for another approach which at least avoids private properties but instead
makes an assumption that added sequences are always put at the end of the
alignment:

alignment = Alignment(generic_dna)

alignment.add_sequence(id_one, seq_one)
assert alignment[-1].id == id_one
alignment[-1].description = descr_one
alignment[-1].annotations["label"] = label_one
...

alignment.add_sequence(id_two, seq_two)
assert alignment[-1].id == id_two
alignment[-1].description = descr_two
alignment[-1].annotations["label"] = label_two
...
yield alignment

However, I agree with you, the best solution is to pass SeqRecord objects to
the alignment directly (i.e. Bug 2553 and/or Bug 2554).
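
To make the "add, then annotate the last row" pattern concrete without depending on any particular Biopython version, here is a small sketch using stand-in classes. Record and MiniAlignment are hypothetical and only mimic the two things the pattern relies on (add_sequence appending a row, and indexing with -1); they are not the real Alignment or SeqRecord classes. The helper add_annotated is likewise made up.

```python
class Record(object):
    """Hypothetical stand-in for a SeqRecord."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq
        self.description = ""
        self.annotations = {}


class MiniAlignment(object):
    """Hypothetical stand-in exposing only add_sequence and indexing."""

    def __init__(self):
        self._records = []

    def add_sequence(self, id, seq):
        self._records.append(Record(id, seq))

    def __getitem__(self, index):
        return self._records[index]

    def __len__(self):
        return len(self._records)


def add_annotated(alignment, id, seq, description, label):
    """Add a sequence, then annotate the row that was just appended."""
    alignment.add_sequence(id, seq)
    row = alignment[-1]
    # Guard the assumption that new rows always land at the end.
    if row.id != id:
        raise RuntimeError("add_sequence did not append %r as the last row" % id)
    row.description = description
    row.annotations["label"] = label
    return row


alignment = MiniAlignment()
add_annotated(alignment, "Ind1_all1", "TTTTTG", "subpop. label: 6", 6)
add_annotated(alignment, "Ind1_all2", "TTTTTC", "subpop. label: 6", 6)
```

Wrapping the steps in one helper keeps the id check next to the code that depends on it, so a change in where add_sequence puts new rows would fail loudly rather than annotate the wrong record.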




From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 16:04:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 11:04:06 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811101604.mAAG46Cj008024@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #18 from dalloliogm at gmail.com  2008-11-10 11:04 EST -------
(In reply to comment #17)
> 
> Or, for another approach which at least avoids private properties but instead
> makes an assumption that added sequences are always put at the end of the
> alignment:
> 
> alignment = Alignment(generic_dna)
> 
> alignment.add_sequence(id_one, seq_one)
> assert alignment[-1].id == id_one
> alignment[-1].description = descr_one
> alignment[-1].annotations["label"] = label_one
> ...
> 
> alignment.add_sequence(id_two, seq_two)
> assert alignment[-1].id == id_two
> alignment[-1].description = descr_two
> alignment[-1].annotations["label"] = label_two
> ...
> yield alignment
> 

OK! I ended up using the first method (appending to alignment._records), but I
left a comment in the code to remind me of this alternative.




From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 16:06:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 11:06:49 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811101606.mAAG6nDL008314@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


dalloliogm at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1044 is|0                           |1
           obsolete|                            |


------- Comment #19 from dalloliogm at gmail.com  2008-11-10 11:06 EST -------
Created an attachment (id=1050)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1050&action=view)
fastPhase output iterator (returns an Alignment object with SeqRecords)

This version returns an Alignment object with valid SeqRecord objects, using
the Alignment._records.append trick.




From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 16:07:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 11:07:27 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811101607.mAAG7RLr008403@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


dalloliogm at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1047 is|0                           |1
           obsolete|                            |


------- Comment #20 from dalloliogm at gmail.com  2008-11-10 11:07 EST -------
Created an attachment (id=1051)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1051&action=view)
1047: a doctest file to test fastPhaseOutputIterator

updated for attachment 1050




From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 16:34:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 11:34:34 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811101634.mAAGYYbi010826@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #21 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-10 11:34 EST -------
Hi Marco,

Looking at your example, the important part of the file is this bit:

...
BEGIN GENOTYPES
Ind1  # subpop. label: 6  (internally 1)
T T T T T G A A A C C A A A G A C G C T G C G T C A G C C T G C A A T C T G
T T T T T G C C C C C A A A A G C G C G T C G T C A G T C T A A G A C C T A
Ind2  # subpop. label: 6  (internally 1)
C T T T T G C C C T C A A A A G T G C T G T G C C A G T C T A C G G C C T G
T T T T T G A A A C C A A A G A C G C T T C G T C A G T A T A C G A T C T A
END GENOTYPES

Quoting the manual again, "Output files for inferred haplotypes or imputed
genotypes contain two lines per given diploid individual, with the order of
individuals corresponding to that supplied in the input file."

In this example we have two individuals, Ind1 and Ind2 (presumably with
automatically assigned names).  In a real world example, how many individuals
would you expect to use?  Does it make more sense to return a pairwise
alignment for each individual, rather than one large combined alignment?  One
of the main points for using iterators/generators is they allow us to deal with
very large files by not having to keep everything in memory.  Now I don't have
a feel for what sized files fastPhase could output - maybe a single large
alignment is fine.

i.e. One combined alignment:

IUPACUnambiguousDNA() alignment with 4 rows and 38 columns
TTTTTGAAACCAAAGACGCTGCGTCAGCCTGCAATCTG Ind1_all1
TTTTTGCCCCCAAAAGCGCGTCGTCAGTCTAAGACCTA Ind1_all2
CTTTTGCCCTCAAAAGTGCTGTGCCAGTCTACGGCCTG Ind2_all1
TTTTTGAAACCAAAGACGCTTCGTCAGTATACGATCTA Ind2_all2

versus one pairwise alignment per individual:

IUPACUnambiguousDNA() alignment with 2 rows and 38 columns
TTTTTGAAACCAAAGACGCTGCGTCAGCCTGCAATCTG Ind1_all1
TTTTTGCCCCCAAAAGCGCGTCGTCAGTCTAAGACCTA Ind1_all2

IUPACUnambiguousDNA() alignment with 2 rows and 38 columns
CTTTTGCCCTCAAAAGTGCTGTGCCAGTCTACGGCCTG Ind2_all1
TTTTTGAAACCAAAGACGCTTCGTCAGTATACGATCTA Ind2_all2

I think you'll have to decide this (unless anyone else following this has a
view - Tiago maybe?)

P.S. Have you tried with and without the -n option to automatically name the
individuals?  What happens if the name includes a hash character (#)?  I would
hope fastPhase would treat this as an error, but it could end up in the output
file and confuse the parser.

P.P.S. Based on the examples in the manual, typical output might use lower case
nucleotides (a, t, c, g) or numbers (0, 1).  I presume upper case nucleotides
are also fine, but defaulting to this is a bad idea.  Please default to
Bio.Alphabet.single_letter_alphabet which seems to be the safest choice (we
shouldn't guess).
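
As a sketch of the "one pair per individual" option, the generator below walks the BEGIN GENOTYPES block and yields one individual at a time, so only two sequence lines are held in memory at once. The function name iter_genotype_pairs is made up, the layout is assumed to be exactly as in the excerpt above (a name line followed by two allele lines), and plain tuples are returned rather than Alignment objects so that the snippet runs on its own.

```python
def iter_genotype_pairs(lines):
    """Yield (name, comment, allele_1, allele_2) for each individual.

    Only the text between BEGIN GENOTYPES and END GENOTYPES is read.
    """
    lines = iter(lines)
    for line in lines:
        if line.strip() == "BEGIN GENOTYPES":
            break
    else:
        return  # no genotype block in this input
    while True:
        header = next(lines, None)
        if header is None:
            raise ValueError("missing END GENOTYPES")
        header = header.strip()
        if not header:
            continue
        if header == "END GENOTYPES":
            return
        # Everything after the first '#' is treated as a free-text comment.
        name, _sep, comment = header.partition("#")
        alleles = []
        for _unused in range(2):
            row = next(lines, None)
            if row is None:
                raise ValueError("truncated record for %s" % name.strip())
            # Letters are space separated; join them without changing case.
            alleles.append("".join(row.split()))
        yield name.strip(), comment.strip(), alleles[0], alleles[1]


# Shortened version of the example output (six sites instead of 38).
sample = """\
BEGIN GENOTYPES
Ind1  # subpop. label: 6  (internally 1)
T T T T T G
T T T T T C
Ind2  # subpop. label: 6  (internally 1)
C T T T T G
T T T T T A
END GENOTYPES
"""
pairs = list(iter_genotype_pairs(sample.splitlines()))
```

Two caveats. Splitting on the first '#' means a name that itself contains a hash would be cut short, which is the situation raised in the P.S. The letters are passed through untouched (no upper-casing, no alphabet guess), in line with the P.P.S.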




From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 19:19:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 14:19:15 -0500
Subject: [Biopython-dev] [Bug 2649] New: Bio.KDTree expects numpy array with
	dtype="float32" on 64 bit machines.
Message-ID: <bug-2649-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2649

           Summary: Bio.KDTree expects numpy array with dtype="float32" on
                    64 bit machines.
           Product: Biopython
           Version: 1.49b
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: paul at rudin.co.uk


Bio.KDTree expects numpy array with dtype="float32" on 64 bit machines. The
numpy default for floats is "float64" on 64 bit machines and this would seem to
be a more natural and practical choice.




From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 22:25:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 17:25:33 -0500
Subject: [Biopython-dev] [Bug 2651] New: Error from test_GAQueens.py
Message-ID: <bug-2651-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2651

           Summary: Error from test_GAQueens.py
           Product: Biopython
           Version: 1.49b
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


I got this error with Python 2.5, but it is extremely rare. I think I have seen
it before but have never been able to reproduce it. It suggests that the test is
triggering other bugs besides the obvious one in Seq.py.

======================================================================
ERROR: test_GAQueens
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 142, in runSafeTest
    cur_test.run_tests([])
  File "test_GAQueens.py", line 42, in run_tests
    main(arguments)
  File "test_GAQueens.py", line 76, in main
    evolved_pop = evolver.evolve(queens_solved)
  File "/home/bsouthey/python/biopython-1.49b/build/lib.linux-x86_64-2.5/Bio/GA/Evolver.py", line 56, in evolve
    self._population = self._selector.select(self._population)
  File "/home/bsouthey/python/biopython-1.49b/build/lib.linux-x86_64-2.5/Bio/GA/Selection/Tournament.py", line 77, in select
    new_orgs[1])
  File "/home/bsouthey/python/biopython-1.49b/build/lib.linux-x86_64-2.5/Bio/GA/Selection/Abstract.py", line 53, in mutate_and_crossover
    final_org_1 = self._repairer.repair(final_org_1)
  File "test_GAQueens.py", line 234, in repair
    duplicated_items = self._get_duplicates(organism.genome)
  File "test_GAQueens.py", line 203, in _get_duplicates
    if genome.count(item) > 1:
  File "/home/bsouthey/python/biopython-1.49b/build/lib.linux-x86_64-2.5/Bio/Seq.py", line 796, in count
    if len(search) == 1 :
TypeError: object of type 'int' has no len()

----------------------------------------------------------------------




From bugzilla-daemon at portal.open-bio.org  Mon Nov 10 23:28:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 18:28:26 -0500
Subject: [Biopython-dev] [Bug 2651] Error from test_GAQueens.py
In-Reply-To: <bug-2651-42@http.bugzilla.open-bio.org/>
Message-ID: <200811102328.mAANSQiJ032135@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2651


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |minor
          Component|Main Distribution           |Unit Tests


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-10 18:28 EST -------
What bug in Seq?  Trying to call the count method with an integer argument
instead of string or another Seq should fail - try it on a string for
comparison:

>>> "123456".count(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: expected a character buffer object

I would agree that the TypeError message could be better, "object of type 'int'
has no len()" is a little misleading.  Are you suggesting that be changed?

Genetic algorithms (with a random seed at least) are non-deterministic - I've
seen some of the GA unit tests fail every so often (but I'm not sure offhand
if it's just test_GAQueens or not).  Rerunning the test will usually be fine.
The traceback looks familiar so it's probably the same issue, but I haven't had
the time or desire to trace through the code to try and work out what is going
wrong.  I would guess it fails far less than 10% of the time, maybe 1% or 2%.
I guess a quick shell script would answer this ;)

Maybe we should catch the error condition and issue a runtime error saying
"Didn't converge" or whatever would be appropriate terminology.  Or
automatically restart the test?  Or, maybe we can solve the unit test failure
by specifying a random seed - that might be a neat solution.
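
To illustrate the seeding idea (and the "run it many times and count" idea) in isolation, here is a toy sketch. It uses only the standard library random module; flaky_search is an invented stand-in for a stochastic search and has nothing to do with the actual Bio.GA code or with how test_GAQueens works.

```python
import random


def flaky_search(rng, attempts=20):
    """Toy stochastic search: succeeds if any draw falls below a threshold."""
    return any(rng.random() < 0.1 for _unused in range(attempts))


def failure_rate(runs, seed=None):
    """Run the toy search many times and return the fraction that fail."""
    rng = random.Random(seed)
    failures = sum(1 for _unused in range(runs) if not flaky_search(rng))
    return failures / float(runs)


# With a fixed seed the sequence of draws is the same on every run, so a
# failure either always happens or never happens.
first = failure_rate(1000, seed=42)
second = failure_rate(1000, seed=42)
```

Here first and second are equal because both runs start from the same seed. Passing a private random.Random instance around, rather than seeding the module-level generator, keeps one test's seed from affecting other tests; whether the GA classes can accept such an instance is a separate question.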

N.B. Refiling under unit tests.




From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 02:30:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 21:30:46 -0500
Subject: [Biopython-dev] [Bug 2651] Error from test_GAQueens.py
In-Reply-To: <bug-2651-42@http.bugzilla.open-bio.org/>
Message-ID: <200811110230.mAB2Ukq2020297@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2651


------- Comment #2 from bsouthey at gmail.com  2008-11-10 21:30 EST -------
(In reply to comment #1)
> What bug in Seq?  Trying to call the count method with an integer argument
> instead of string or another Seq should fail - try it on a string for
> comparison:
> 
> >>> "123456".count(1)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: expected a character buffer object
> 
> I would agree that the TypeError message could be better, "object of type 'int'
> has no len()" is a little misleading.  Are you suggesting that be changed?

That is an 'obvious' bug (in light of the error) because there is no check
that 'sub' is a string. Using the example from the docstring:
my_mseq = MutableSeq("AAAATGA")
my_mseq.count(1) 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.5/site-packages/Bio/Seq.py", line 722, in count
    if len(search) == 1 :
TypeError: object of type 'int' has no len()

Note that using a dict or list works, but perhaps these should not. I think you
need to check that 'search' is a string (isinstance(search,basestring)). If
not, then fail with a more informative message.
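
Something along these lines is what I have in mind. This is only a sketch on plain strings: checked_count is a made-up helper, not the Seq.count method, and it tests against str (the Python 3 spelling) rather than basestring. The real method would presumably also need to accept Seq objects.

```python
def checked_count(data, search):
    """Count non-overlapping occurrences of search in the string data."""
    if not isinstance(search, str):
        # Fail early with a message that names the offending type.
        raise TypeError("expected a string to search for, got %s"
                        % type(search).__name__)
    return data.count(search)


total = checked_count("AAAATGA", "A")  # 5
```

Calling checked_count("AAAATGA", 1) then raises a TypeError that mentions 'int' directly, and a list or dict is rejected in the same way instead of slipping through the len() test.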


> 
> Genetic algorithms (with a random seed at least) are non deterministic - I've
> seen some of the GA unit tests fail every so often (but I'm not sure off hand
> if its just test_GAQueens or not).  Rerunning the test will usually be fine. 
> The traceback looks familiar so its probably the same issue, but I haven't had
> the time or desire to trace through the code to try and work out what is going
> wrong.  I would guess it fails far less than 10% of time, but maybe 1% or 2%. 
> I guess a quick shell script would answer this ;)
> 
> Maybe we should catch the error condition and issue a runtime error saying
> "Didn't converge" or whatever would be appropriate terminology.  Or
> automatically restart the test?  Or, maybe we can solve the unit test failure
> by specifying a random seed - that might be a neat solution.
> 
> N.B. Refiling under unit tests.
> 

I agree with doing one or more of these, at least until the source is identified
(hopefully a known case). But I agree that this is not easy to find, and I do
not know of anything that would help.




From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 10:10:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 05:10:45 -0500
Subject: [Biopython-dev] [Bug 2651] Error from test_GAQueens.py
In-Reply-To: <bug-2651-42@http.bugzilla.open-bio.org/>
Message-ID: <200811111010.mABAAjQq029851@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2651


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-11 05:10 EST -------
(In reply to comment #2)
>(In reply to comment #1)
>> What bug in Seq?  Trying to call the count method with an integer argument
>> instead of string or another Seq should fail - try it on a string for
>> comparison:
>> 
>> >>> "123456".count(1)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in ?
>> TypeError: expected a character buffer object
>> 
>> I would agree that the TypeError message could be better, "object of type
>> 'int' has no len()" is a little misleading.  Are you suggesting that be
>> changed?
> 
> That is an 'obvious' bug (in light of the error) because there is no check for
> that 'sub' is a string. Using the example from the docstring:
> my_mseq = MutableSeq("AAAATGA")
> my_mseq.count(1) 
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib64/python2.5/site-packages/Bio/Seq.py", line 722, in count
>     if len(search) == 1 :
> TypeError: object of type 'int' has no len()
> 
> Note that using a dict or list work but perhaps these should not. I think you
> need to check that 'search' is a string (isinstance(search,basestring)). If
> not, then fail with some more informative message. 

That's done in CVS.

Leaving this bug open to cover the test_GAQueens.py issue.




From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 11:30:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 06:30:16 -0500
Subject: [Biopython-dev] [Bug 2652] New: Bio.Fasta.Iterator fails with
	IndexError when opening empty fasta files
Message-ID: <bug-2652-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2652

           Summary: Bio.Fasta.Iterator fails with IndexError when opening
                    empty fasta files
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: rjalves at igc.gulbenkian.pt


Instead of an IndexError, better error handling or at least a more explicit
error message would help. At first glance it is not obvious what is causing
the error.

Example:

In [1]: from Bio import Fasta

In [2]: Fasta.Iterator(open("empty.fasta"))
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)

/var/lib/python-support/python2.5/Bio/Fasta/__init__.pyc in __init__(self,
handle, parser, debug)
     65         while True :
     66             line = handle.readline()
---> 67             if line[0] == ">" :
     68                 break
     69             if debug : print "Skipping: " + line

IndexError: string index out of range




From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 11:30:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 06:30:45 -0500
Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError
	when opening empty fasta files
In-Reply-To: <bug-2652-42@http.bugzilla.open-bio.org/>
Message-ID: <200811111130.mABBUjf8003203@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2652


rjalves at igc.gulbenkian.pt changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|Not Applicable              |1.45




From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 11:55:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 06:55:07 -0500
Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError
	when opening empty fasta files
In-Reply-To: <bug-2652-42@http.bugzilla.open-bio.org/>
Message-ID: <200811111155.mABBt7Hf005132@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2652


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-11 06:55 EST -------
Hi Renato,

This bug in Bio.Fasta with empty files was fixed in Biopython 1.49b, see
Bio/Fasta/__init__.py revision 1.19. 
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Fasta/__init__.py?cvsroot=biopython#rev1.19

I would encourage you to try Biopython 1.49b, but if you have a reason for
running an old version like Biopython 1.45, you could probably update just this
one file instead.  Ask if you would like specific instructions, but essentially
it's a one-line change, from:

if line[0] == ">" :

to:

if not line or line[0] == ">" :
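
The reason the extra test matters is that readline() returns an empty string at end of file, so indexing it with [0] fails. A stand-alone sketch of the same loop, using io.StringIO in place of a real file (the function name skip_to_first_record is just for illustration and is not part of Bio.Fasta):

```python
import io


def skip_to_first_record(handle):
    """Read lines until one starts with '>' or the handle is exhausted.

    Returns the header line, or "" if there is no record at all.
    """
    while True:
        line = handle.readline()
        # readline() gives "" at end of file; check for that before
        # looking at line[0], which would otherwise raise IndexError.
        if not line or line[0] == ">":
            return line


empty = skip_to_first_record(io.StringIO(""))
found = skip_to_first_record(io.StringIO("# comment\n>seq1 demo\nACGT\n"))
```

With the empty handle the loop stops straight away and returns "", while the second call skips the comment line and returns the ">seq1 demo" header. A blank line in the middle of a file is still safe with either version, because it is read as "\n" rather than "".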

Please note that Bio.Fasta is considered to be obsolete (and was explicitly
documented as such as of Biopython 1.48), and may one day be deprecated. 
However, given this was the main FASTA parsing code in Biopython for some
years, we're not going to deprecate it just yet, so you should be OK continuing
to use Bio.Fasta in old scripts for a while yet.

For new code, we encourage people to use Bio.SeqIO instead, described in the
current tutorial and on the wiki:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
http://biopython.org/wiki/SeqIO

Peter




From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 12:08:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 07:08:37 -0500
Subject: [Biopython-dev] [Bug 2649] Bio.KDTree expects numpy array with
	dtype="float32" on 64 bit machines.
In-Reply-To: <bug-2649-42@http.bugzilla.open-bio.org/>
Message-ID: <200811111208.mABC8bHw006251@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2649


------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp  2008-11-11 07:08 EST -------
I've uploaded a fixed version to CVS; see KDTree.py and KDTreemodule.c at

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/KDTree/?cvsroot=biopython

Could you try with these files and see if they work for you?




From biopython at maubp.freeserve.co.uk  Tue Nov 11 13:02:18 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 11 Nov 2008 13:02:18 +0000
Subject: [Biopython-dev] [BioPython] Cannot __add__ two DBSeq objects
In-Reply-To: <7265d4f0811110439h6c18e111te97d23070565cca2@mail.gmail.com>
References: <7265d4f0811110439h6c18e111te97d23070565cca2@mail.gmail.com>
Message-ID: <320fb6e00811110502y624cf6c1r52c316d61a1f7228@mail.gmail.com>

On Tue, Nov 11, 2008 at 12:39 PM, Cymon Cox <cy at cymon.org> wrote:
> Hi All,
>
> Two DBSeq objects cannot be concatenated, although the DBSeq object inherits
> __add__ from Seq.

Interesting point - not something I'd considered (nor anyone else until now!)

> It tries to init a new DBSeq object rather than returning a Seq object as would be expected.
> ...
> Presumably, DBSeq needs to override Seq.__add__
> (Using CVS as of yesterday...)

Clearly we can't create a new DBSeq object (there wouldn't be any
suitable sequence in the database to point to), and returning a Seq
object is sensible.  We should probably continue this discussion on
the dev mailing list (CC'd).

Either we have the DBSeq override the __add__ method (and __radd__),
or we could make the base Seq class always use new Seq objects in
__add__ etc.  This would affect anyone writing their own Seq
subclass...

On balance, I think you're right and it's DBSeq which needs to be
changed.  Would you like to tackle this, or should I?  We'd also want
to extend the BioSQL unit test to cover adding DBSeq+DBSeq, DBSeq+Seq,
Seq+DBSeq, DBSeq+MutableSeq, MutableSeq+DBSeq, etc.

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 14:48:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 09:48:14 -0500
Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError
	when opening empty fasta files
In-Reply-To: <bug-2652-42@http.bugzilla.open-bio.org/>
Message-ID: <200811111448.mABEmEba019180@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2652


------- Comment #2 from rjalves at igc.gulbenkian.pt  2008-11-11 09:48 EST -------
Hi Peter,

I am using the Biopython package from the debian-lenny repository (which is
1.45). I guess they haven't updated in part due to the change to NumPy. I
will check out the svn version then.

As for why I'm using Bio.Fasta, I'm not using it directly.
Bio.SeqUtils.CodonUsage.CodonAdaptationIndex.cai_for_gene() calls it.

Renato




From biopython at maubp.freeserve.co.uk  Tue Nov 11 14:53:32 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 11 Nov 2008 14:53:32 +0000
Subject: [Biopython-dev] [BioPython] Cannot __add__ two DBSeq objects
In-Reply-To: <320fb6e00811110502y624cf6c1r52c316d61a1f7228@mail.gmail.com>
References: <7265d4f0811110439h6c18e111te97d23070565cca2@mail.gmail.com>
	<320fb6e00811110502y624cf6c1r52c316d61a1f7228@mail.gmail.com>
Message-ID: <320fb6e00811110653u63e85bc6k572d5fa42ede8280@mail.gmail.com>

On Tue, Nov 11, 2008 at 1:02 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Tue, Nov 11, 2008 at 12:39 PM, Cymon Cox <cy at cymon.org> wrote:
>> Hi All,
>>
>> Two DBSeq objects cannot be concatenated, although the DBSeq object inherits
>> __add__ from Seq.
>
> Interesting point - not something I'd considered (nor anyone else until now!)
>
>> It tries to init a new DBSeq object rather than returning a Seq object as would be expected.
>> ...
>> Presumably, DBSeq needs to override Seq.__add__
>> (Using CVS as of yesterday...)
>
> Clearly we can't create a new DBSeq object (there wouldn't be any
> suitable sequence in the database to point to), and returning a Seq
> object is sensible.  We should probably continue this discussion on
> the dev mailing list (CC'd).

Fixed in CVS by implementing the __add__ and __radd__ methods in the
DBSeq object, and having these simply offload the work to the Seq
class.

See:
BioSQL/BioSeq.py revision: 1.28
Tests/test_BioSQL.py revision: 1.26
Tests/output/test_BioSQL revision: 1.2
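
A rough sketch of that approach, using simplified stand-in classes rather
than the real Bio.Seq.Seq and BioSQL DBSeq code. The constructor arguments,
the dict used as a fake database, and the toseq helper are assumptions made
for illustration only.

```python
class Seq:
    """Simplified stand-in for Bio.Seq.Seq (illustration only)."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        return self.data

    def __add__(self, other):
        # Builds the result with the instance's own class, which is what
        # goes wrong for a subclass whose constructor takes other arguments.
        return self.__class__(str(self) + str(other))

    def __radd__(self, other):
        return self.__class__(str(other) + str(self))


class DBSeq(Seq):
    """Stand-in for a database-backed sequence (hypothetical constructor)."""

    def __init__(self, primary_id, adaptor):
        self.primary_id = primary_id
        self.adaptor = adaptor  # here just a dict acting as the "database"

    def __str__(self):
        return self.adaptor[self.primary_id]

    def toseq(self):
        # Copy the stored sequence into a plain in-memory Seq.
        return Seq(str(self))

    def __add__(self, other):
        # Offload the work to Seq, so the result is a plain Seq.
        return self.toseq() + other

    def __radd__(self, other):
        return other + self.toseq()
```

With these stand-ins, DBSeq + DBSeq, DBSeq + Seq and Seq + DBSeq all return
plain Seq objects instead of trying to construct a new DBSeq.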

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 15:28:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 10:28:20 -0500
Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError
	when opening empty fasta files
In-Reply-To: <bug-2652-42@http.bugzilla.open-bio.org/>
Message-ID: <200811111528.mABFSK8A022517@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2652


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-11 10:28 EST -------
(In reply to comment #2)
> I am using the Biopython package from the debian-lenny repository (which is
> 1.45), I guess they haven't updated in part due to the change to the Numpy. I
> will checkout the svn version then.

Debian sid is using Biopython 1.47, I think lenny is just very conservative.

If you don't mind installing NumPy and trying to install Biopython from source,
then you could either try getting the latest Biopython code from CVS, or try
Biopython 1.49 beta which was released just a few days ago.  Ask on the mailing
list if you get stuck.

> As for why I'm using Bio.Fasta, I'm not using it directly.
> Bio.SeqUtils.CodonUsage.CodonAdaptationIndex.cai_for_gene() calls it.

Oh - thanks for that.  I've just updated Bio/SeqUtils/CodonUsage.py to use
Bio.SeqIO instead of Bio.Fasta (plus added a basic check of this module to our
unit tests).

Peter

[Leaving this bug as resolved fixed]




From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 15:43:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 10:43:05 -0500
Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError
	when opening empty fasta files
In-Reply-To: <bug-2652-42@http.bugzilla.open-bio.org/>
Message-ID: <200811111543.mABFh5x8023530@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2652


------- Comment #4 from rjalves at igc.gulbenkian.pt  2008-11-11 10:43 EST -------
Thanks, Biopython 1.49b installed without any problems.




From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 15:43:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 10:43:15 -0500
Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError
	when opening empty fasta files
In-Reply-To: <bug-2652-42@http.bugzilla.open-bio.org/>
Message-ID: <200811111543.mABFhFBp023551@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2652


rjalves at igc.gulbenkian.pt changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED




From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 15:46:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 10:46:13 -0500
Subject: [Biopython-dev] [Bug 2653] New: Bio.SeqUtils.CodonUsage is not
	translation table aware
Message-ID: <bug-2653-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2653

           Summary: Bio.SeqUtils.CodonUsage is not translation table aware
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


Looking at Bio/SeqUtils/CodonUsage.py there is a hard coded dictionary
SynonymousCodons, presumably for the standard genetic code.

Ideally Bio.SeqUtils.CodonUsage should support any of the genetic code tables
defined in Bio.Data.CodonTable, perhaps via an optional initiation argument to
the CodonAdaptationIndex object.
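
As a rough sketch of one way this could work: the synonymous codon groups
can be derived from any codon-to-amino-acid mapping (the codon table objects
in Bio.Data.CodonTable carry such a mapping as forward_table). The function
name and the tiny three-codon table below are made up for illustration.

```python
def synonymous_codons(forward_table):
    """Group codons by the amino acid they encode.

    forward_table maps codon -> amino acid letter; the result maps
    amino acid letter -> sorted list of codons.
    """
    groups = {}
    for codon, amino_acid in sorted(forward_table.items()):
        groups.setdefault(amino_acid, []).append(codon)
    return groups


# Hypothetical, deliberately incomplete table used only for this example.
toy_table = {"TGT": "C", "TGC": "C", "ATG": "M"}
toy_groups = synonymous_codons(toy_table)
```

A CodonAdaptationIndex-like object could then take the table as an optional
argument and build its groups this way instead of using a hard coded dict.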




From bugzilla-daemon at portal.open-bio.org  Tue Nov 11 18:09:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 11 Nov 2008 13:09:20 -0500
Subject: [Biopython-dev] [Bug 2653] Bio.SeqUtils.CodonUsage is not
	translation table aware
In-Reply-To: <bug-2653-42@http.bugzilla.open-bio.org/>
Message-ID: <200811111809.mABI9KXq004974@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2653


rjalves at igc.gulbenkian.pt changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rjalves at igc.gulbenkian.pt


------- Comment #1 from rjalves at igc.gulbenkian.pt  2008-11-11 13:09 EST -------
Thanks for the heads up Peter.

Also related to the reference codon table used... There is the possibility of a
codon being completely absent in all given sequences. In this case the
CodonAdaptationIndex.generate_index() function fails with a ZeroDivisionError
on line 90.

The resource at http://phenotype.biosci.umbc.edu/index.php?page=What_is_CAI
might give some good indications on how to work around this and also other
(improved?) implementations of CAI.

Obviously if you use a different SynonymousCodons table the picture may change.
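
To make the failure mode concrete, here is a minimal sketch, not the
Biopython implementation. It assumes the weight of a codon is its count
divided by the highest count among its synonymous codons, so an amino acid
whose codons are all absent would give a division by zero. Replacing zero
counts with a small pseudocount (0.5 here, an assumed value) is one possible
workaround; the function and parameter names are hypothetical.

```python
def relative_adaptiveness(counts, synonymous, pseudocount=0.5):
    """Return {codon: w}, where w = count / highest count among synonyms."""
    index = {}
    for codons in synonymous.values():
        # Replace zero or missing counts with the pseudocount so that the
        # division below cannot fail and no weight is exactly zero.
        adjusted = dict((c, counts.get(c, 0) or pseudocount) for c in codons)
        best = max(adjusted.values())
        for codon, value in adjusted.items():
            index[codon] = value / float(best)
    return index


synonymous = {"CYS": ["TGT", "TGC"], "MET": ["ATG"]}
all_cys_absent = relative_adaptiveness({"ATG": 10}, synonymous)
one_cys_absent = relative_adaptiveness({"TGT": 4, "ATG": 10}, synonymous)
```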

Renato.




From bugzilla-daemon at portal.open-bio.org  Wed Nov 12 11:14:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 12 Nov 2008 06:14:27 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811121114.mACBER3k002184@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #15 from dalloliogm at gmail.com  2008-11-12 06:14 EST -------
(In reply to comment #13)
> (From update of attachment 1033 [details])
> Something similar was checked into CVS.
> 

I saw the changes now!
OK, but I would prefer to put the doctest in the main __doc__ of the function
instead of in __init__ and __repr__.
Otherwise they wouldn't be accessible to users through the help function:
usually you do help(SeqRecord), not help(SeqRecord.__init__).




From bugzilla-daemon at portal.open-bio.org  Wed Nov 12 11:47:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 12 Nov 2008 06:47:25 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811121147.mACBlP4T005886@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-12 06:47 EST -------
(In reply to comment #15)
> I saw the changes now!

The CVS website is updated once an hour; you can track this on
http://biopython.org/wiki/Tracking_CVS_commits which displays the RSS feed,
http://biopython.open-bio.org/CVS2RSS/biopython.rss (this works great apart
from the links when more than one file is changed).

> ok.. But I would prefer to put the doctest in the main __doc__ of
> the function instead of __init__ and __repr__.
> This is because otherwise they wouldn't be accessible by the users with the
> help function.  Usually you do help(SeqRecord), not help(SeqRecord.__init__).

If you do help(object) it shows you the main docstring followed by all the
methods and their docstrings (including __init__).

On the other hand all the special methods like __init__, __str__, __repr__ etc
are going to be confusing for a beginner.

On balance, a short example in the main docstring (covering __init__) does seem
sensible, and perhaps the __init__ example is then redundant.
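
A minimal sketch of what a short example in the main docstring could look
like. Record is a hypothetical class, not the real SeqRecord, and the small
runner function is only there so the example can be checked; help(Record)
would show the class docstring, including the example, first.

```python
import doctest


class Record:
    """Hold an identifier and a sequence string (hypothetical example class).

    >>> rec = Record("id1", "ACGT")
    >>> rec.id
    'id1'
    >>> len(rec.seq)
    4
    """

    def __init__(self, id, seq):
        """Create a Record from an identifier and a sequence string."""
        self.id = id
        self.seq = seq


def run_class_doctest(cls):
    """Run the examples in the class docstring; return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(
        cls.__doc__, {cls.__name__: cls}, cls.__name__, None, 0)
    result = doctest.DocTestRunner(verbose=False).run(test)
    return result.failed, result.attempted
```

Because the class docstring already exercises __init__, a separate example
in the __init__ docstring adds little.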

Does anyone else want to comment?




From cymon.cox at googlemail.com  Wed Nov 12 10:57:12 2008
From: cymon.cox at googlemail.com (Cymon Cox)
Date: Wed, 12 Nov 2008 10:57:12 +0000
Subject: [Biopython-dev] BioSQL buglets
Message-ID: <7265d4f0811120257y241f67fl514b77cb03712552@mail.gmail.com>

All,

Selects on the seqfeature_qualifier_value and dbxref tables were not being
ordered by rank. This caused multiple qualifier values to be out of order
which in turn caused the tests to fail - see comment in
http://bugzilla.open-bio.org/show_bug.cgi?id=2616

This also solves a TODO in the test_BioSQL_SeqIO.py:

 85 +#TODO - Pin down the "Duplicate entry" IntegrityError from this:
 86 +#    ("genbank",False, 'GenBank/cor6_6.gb', 6),

This test now works and I've generated new output.

In test_BioSQL.py create_database(), postgres returns an error string in
which 'find' matches at index 0 when the database doesn't exist. The
comparison therefore needs to be >= 0 rather than > 0.
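
A small sketch of the comparison issue. The error text and the function
name are hypothetical; the real message depends on the database driver.

```python
def error_mentions(message, fragment):
    """Return True if fragment occurs anywhere in message."""
    # str.find() returns -1 when the fragment is absent and 0 when the
    # match starts at the first character, so the test must be >= 0.
    return message.find(fragment) >= 0


# Hypothetical error text with the match right at the start of the string.
message = 'database "biosql_test" does not exist'
buggy = message.find("database") > 0   # misses a match at index 0
fixed = error_mentions(message, "database")
```

The same test can also be written as `fragment in message`, which avoids the
index comparison altogether.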

All tests now pass OK with postgresql/psycopg2.
Patch attached.

Cheers, C.
--
-------------- next part --------------
A non-text attachment was scrubbed...
Name: biosql.patch
Type: text/x-patch
Size: 5105 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20081112/ba4e35b3/attachment-0002.bin>

From bugzilla-daemon at portal.open-bio.org  Wed Nov 12 13:12:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 12 Nov 2008 08:12:24 -0500
Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2
In-Reply-To: <bug-2616-42@http.bugzilla.open-bio.org/>
Message-ID: <200811121312.mACDCOdj011669@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2616


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-12 08:12 EST -------
(In reply to comment #10)
> 
> We still need to sort out the feature qualifiers loss of ordering...
> 

Fixed in CVS with another patch from Cymon (via the mailing list).




From biopython at maubp.freeserve.co.uk  Wed Nov 12 13:13:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 12 Nov 2008 13:13:16 +0000
Subject: [Biopython-dev] BioSQL buglets
In-Reply-To: <7265d4f0811120257y241f67fl514b77cb03712552@mail.gmail.com>
References: <7265d4f0811120257y241f67fl514b77cb03712552@mail.gmail.com>
Message-ID: <320fb6e00811120513p3be878b8pe0c5a48fa3945ff5@mail.gmail.com>

On Wed, Nov 12, 2008 at 10:57 AM, Cymon Cox <cymon.cox at googlemail.com> wrote:
> All,
>
> Selects on the seqfeature_qualifier_value and dbxref tables were not being
> ordered by rank. This caused multiple qualifier values to be out of order
> which in turn caused the tests to fail - see comment in
> http://bugzilla.open-bio.org/show_bug.cgi?id=2616
>
> This also solves a TODO in the test_BioSQL_SeqIO.py:
>
>  85 +#TODO - Pin down the "Duplicate entry" IntegrityError from this:
>  86 +#    ("genbank",False, 'GenBank/cor6_6.gb', 6),
>
> This test now works and I've generated new output.
>
> In test_BioSQL.py create_database(), postgres returns an error string in
> which 'find' matches at index 0 when the database doesn't exist. The
> comparison therefore needs to be >= 0 rather than > 0.
>
> All tests now pass OK with postgresql/psycopg2.
> Patch attached.
>
> Cheers, C.

Excellent - that patch made perfect sense and I've checked it in
(almost as is - I tweaked the find index bit slightly).  Thank you!

At this rate you'll be co-opted as an official maintainer for the
BioSQL module ;)

Peter

P.S. It might have been better to upload the patch to Bug 2616 (or a
new Bug) rather than sending it to everyone on the mailing list.


From bugzilla-daemon at portal.open-bio.org  Wed Nov 12 15:35:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 12 Nov 2008 10:35:54 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811121535.mACFZsMl021458@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #17 from dalloliogm at gmail.com  2008-11-12 10:35 EST -------
(In reply to comment #16)
> (In reply to comment #15)
> > I saw the changes now!
> 
> The CVS website is updated once an hour, you track this on
> http://biopython.org/wiki/Tracking_CVS_commits which displays the RSS feed,
> http://biopython.open-bio.org/CVS2RSS/biopython.rss (this works great apart
> from the links when more than one file is changed).
> 
> > ok.. But I would prefer to put the doctest in the main __doc__ of
> > the function instead of __init__ and __repr__.
> > This is because otherwise they wouldn't be accessible by the users with the
> > help function.  Usually you do help(SeqRecord), not help(SeqRecord.__init__).
> 
> If you do help(object) it shows you the main docstring followed by all the
> methods and their docstrings (including __init__).
> 
> On the other hand all the special methods like __init__, __str__, __repr__ etc
> are going to be confusing for a beginner.
> 
> On balance, a short example in the main docstring (covering __init__) does seem
> sensible, and perhaps the __init__ example is then redundant.

Well, I was saying that maybe it would be better to move the doctests in
__init__ and __repr__ to the main __doc__ of the module,
so that they will be visible to people using help(module).
Moreover, you can test __repr__ and __init__ from there, without having to
repeat the 'from Bio.Align.Generic import Alignment' stuff and similar every
time.


as for a few comments you added in Bio.Align.Generic:

> #A doctest for __repr__ would be nice, but __class__ comes out differently
> #if run via the __main__ trick.

Maybe you can use the '+ELLIPSIS' directive.

And about this comment:

> #A doctest would be nice, but the <BLANKLINE> stuff is very ugly!
> #The "tab" format is possible, but tabs don't seem to work nicely in doctests.

You could use the NORMALIZE_WHITESPACE directive in a similar way.
I am attaching a file just to give you an example of how it could look with
+ELLIPSIS.
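
A minimal sketch of both directives (this is not the attached file, and the
Alignment class here is a hypothetical stand-in for the real
Bio.Align.Generic.Alignment). ELLIPSIS lets "..." stand for the module
prefix and memory address in a repr, and NORMALIZE_WHITESPACE lets a single
space in the expected output match a tab.

```python
import doctest


class Alignment:
    """Hypothetical stand-in class, used only to show the directives.

    The module prefix and the address vary between runs:

    >>> Alignment(["ACGT", "AC-T"])  # doctest: +ELLIPSIS
    <...Alignment instance (2 rows) at 0x...>

    The real output contains a tab between the two columns:

    >>> print(Alignment(["ACGT"]).tabbed("id1"))  # doctest: +NORMALIZE_WHITESPACE
    id1 ACGT
    """

    def __init__(self, rows):
        self.rows = rows

    def __repr__(self):
        cls = self.__class__
        return "<%s.%s instance (%i rows) at 0x%x>" % (
            cls.__module__, cls.__name__, len(self.rows), id(self))

    def tabbed(self, name):
        # Tab separated line: name, then the first row of the alignment.
        return "%s\t%s" % (name, self.rows[0])


def run_class_doctest(cls):
    """Run the examples in the class docstring; return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(
        cls.__doc__, {cls.__name__: cls}, cls.__name__, None, 0)
    result = doctest.DocTestRunner(verbose=False).run(test)
    return result.failed, result.attempted
```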


> Does anyone else want to comment?
> 




From bugzilla-daemon at portal.open-bio.org  Wed Nov 12 15:36:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 12 Nov 2008 10:36:37 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811121536.mACFabdk021517@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #18 from dalloliogm at gmail.com  2008-11-12 10:36 EST -------
Created an attachment (id=1052)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1052&action=view)
example of ellipsis directive

Example of doctest with ellipsis directive to test Alignment.__repr__




From dalloliogm at gmail.com  Wed Nov 12 16:25:47 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Wed, 12 Nov 2008 17:25:47 +0100
Subject: [Biopython-dev] a sequence set object in biopython?
Message-ID: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com>

Hi,
I think it could be useful to add a generic SequenceSet object in biopython.
Such an object would represent a generic set of sequences, and could
have some useful methods like .format('fasta') or
.align('alignment_tool').
Is there something similar available already?
I have noticed that the current Generic.Alignment is very similar to
such an object. However, it would be better to be able to work with a
separate class, because sometimes you want to deal with sequences
that are not aligned.

Some use cases:
- a set of sequences that represents all introns in a particular gene,
on which I want to calculate the conservation of the splicing
regulatory sites.
- all gene sequences in an organism, which I want to convert to EMBL format
- a set of seqs to be aligned or used as input for other tools
etc..
-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it


From bugzilla-daemon at portal.open-bio.org  Wed Nov 12 16:29:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 12 Nov 2008 11:29:07 -0500
Subject: [Biopython-dev] [Bug 2552] Adding alignments
In-Reply-To: <bug-2552-42@http.bugzilla.open-bio.org/>
Message-ID: <200811121629.mACGT7gs025634@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2552


cymon.cox at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |cymon.cox at gmail.com


------- Comment #1 from cymon.cox at gmail.com  2008-11-12 11:29 EST -------
(In reply to comment #0)
> This is related to the very broad alignment bug 1944.
> 
> Given two alignments, it can make sense to talk about adding them together.

Actually, this is a very common procedure in phylogenetic analyses, where
multiple genes/loci are combined into a "super" matrix for a set of taxa.
In this case the alignments are added by column, and if a taxon/row/identifier
is missing from a particular (sub-)alignment, it is filled with "-" (missing
data) in the combined matrix.

Anyway, I think this would be a very useful enhancement.
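
A minimal sketch of this kind of column-wise addition, using plain dicts of
{taxon: aligned string} instead of Biopython alignment objects. The function
name and the data layout are assumptions made for illustration.

```python
def concatenate(alignments, fill="-"):
    """Join alignments column-wise into one {taxon: string} super matrix."""
    # Collect taxa in order of first appearance.
    taxa = []
    for alignment in alignments:
        for name in alignment:
            if name not in taxa:
                taxa.append(name)
    combined = dict((name, "") for name in taxa)
    for alignment in alignments:
        # All rows within one alignment are assumed to have the same length.
        width = len(next(iter(alignment.values()))) if alignment else 0
        for name in taxa:
            # A taxon missing from this alignment is padded with gaps.
            combined[name] += alignment.get(name, fill * width)
    return combined


gene1 = {"taxonA": "ACGT", "taxonB": "AC-T"}
gene2 = {"taxonA": "GG", "taxonC": "GA"}
supermatrix = concatenate([gene1, gene2])
```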




From biopython at maubp.freeserve.co.uk  Wed Nov 12 17:53:35 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 12 Nov 2008 17:53:35 +0000
Subject: [Biopython-dev] [BioPython] a sequence set object in biopython?
In-Reply-To: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com>
References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com>
Message-ID: <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com>

On Wed, Nov 12, 2008 at 4:25 PM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
> Hi,
> I think it could be useful to add a generic SequenceSet object in biopython.
> Such an object would represent a generic set of sequences, and could
> have some useful methods like .format('fasta') or
> .align('alignment_tool').
> Is there something similar available already?

Given your example to turn the SequenceSet into a FASTA file, then
clearly you are thinking of a collection of SeqRecord objects rather
than just Seq objects.  For this kind of thing I personally just use a
list of SeqRecord objects.

If I want to turn a list of SeqRecord objects into a FASTA file, I can
pass the list to the Bio.SeqIO.write() function.  Once I've made a
FASTA file, I can call an external tool to align them - and then load
them in again using Bio.AlignIO or Bio.SeqIO depending on what I plan
to do next.

> I have noticed that the actual Generic.Alignment is very similar to
> such an object. However, it would be better to be able to work with a
> separated class, because sometimes you want to deal with sequences
> that are not aligned.

Yes, the generic alignment is basically a list of SeqRecord objects
plus some extra functionality like column access.

> Some use cases:
> - a set of sequences that represents all introns in a particular gene,
> on which I want to calculate the conservation of the splicing
> regulatory sites.
> - all gene sequences in an organism, which I want to convert to EMBL format
> - a set of seqs to be aligned or used as input for other tools
> etc..

All sensible use cases - but all seem to be covered by a simple python
list of SeqRecord objects, or in some cases a list of Seq objects
(e.g. the introns example, as I doubt the introns have names).

Peter


From tiagoantao at gmail.com  Wed Nov 12 18:02:11 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 12 Nov 2008 18:02:11 +0000
Subject: [Biopython-dev] PopGen status and new developments
Message-ID: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com>

Hi,

This is an email with the status of current PopGen developments. In some
points, advice is especially welcome.


A. Platform support

As Peter noticed there is no Simcoal for the Mac. In a couple of weeks
I hope to have access to a Mac in order to try to compile it. In any
case I won't be able to distribute it without getting permission from
the authors, so the problem might remain...
I am now preparing support for LDNe, an application to estimate Ne
(effective population size) from LD. This application is DOS (Windows)
only. Source code is not available to the public (but the app is free
as in free beer). I've had access to the source and compiled a Linux
version; again, I don't know if the author will let me distribute it.
Question: How do people feel about supporting an application like
this? Any strong feelings against?


B. New developments

1. The above LDNe module is fully coded, and being tested by a few
people (not just me). Test code and documentation TBD but easy.
2. Genepop application support (not to be confused with file format support,
which is done). Partially done and informally tested. I plan to start
with just partial support.
3. Fstat parser. Coded.


C. Statistics

An interesting ongoing discussion has started on statistics. I am behind
on writing a proposal to handle statistical processing (my bad, but I
will have some free time in the next couple of weeks and I will try to
catch up). My existing code on the subject is available on
Github (by Giovanni), but I think it will need some changes (not in the
functionality, but in the architecture).


From biopython at maubp.freeserve.co.uk  Wed Nov 12 18:06:19 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 12 Nov 2008 18:06:19 +0000
Subject: [Biopython-dev] [BioPython] a sequence set object in biopython?
In-Reply-To: <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com>
References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com>
	<320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com>
Message-ID: <320fb6e00811121006mbe32efar2fca638d1a5fe2ef@mail.gmail.com>

On Wed, Nov 12, 2008 at 5:53 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Wed, Nov 12, 2008 at 4:25 PM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>> Hi,
>> I think it could be useful to add a generic SequenceSet object in biopython.
>> Such an object would represent a generic set of sequences, and could
>> have some useful methods like .format('fasta') or
>> .align('alignment_tool').
>> Is there something similar available already?
>
> Given your example to turn the SequenceSet into a FASTA file, then
> clearly you are thinking of a collection of SeqRecord objects rather
> than just Seq objects.  For this kind of thing I personally just use a
> list of SeqRecord objects.
>
> If I want to turn a list of SeqRecord objects into a FASTA file, I can
> pass the list to the Bio.SeqIO.write() function.  Once I've made a
> FASTA file, I can call an external tool to align them - and then load
> them in again using Bio.AlignIO or Bio.SeqIO depending on what I plan
> to do next.

If you really want a list like object with a format method in your
code, how about something like this:

class SeqRecordList(list) :
    """Subclass of the python list, to hold SeqRecord objects only."""
    #TODO - Override the list methods to make sure all the items
    #are indeed SeqRecord objects

    def format(self, format) :
        """Returns a string of all the records in a requested file format.

        The argument format should be any file format supported by
        the Bio.SeqIO.write() function.  This must be a lower case string.
        """
        from Bio import SeqIO
        from StringIO import StringIO
        handle = StringIO()
        SeqIO.write(self, handle, format)
        handle.seek(0)
        return handle.read()

if __name__ == "__main__" :
    print "Loading records..."
    from Bio import SeqIO
    my_list = SeqRecordList(SeqIO.parse(open("ls_orchid.gbk"),"genbank"))
    print len(my_list)
    for format in ["fasta","tab"] :
        print
        print format
        print "="*len(format)
        print my_list.format(format)
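
The TODO above (rejecting items that are not SeqRecord objects) could be approached along these lines. This is only a sketch: Record is a hypothetical stand-in so the example runs without Biopython, and CheckedRecordList is an illustrative name, not an existing class. In real code item_type would presumably be Bio.SeqRecord.SeqRecord.

```python
class Record(object):
    """Hypothetical stand-in for Bio.SeqRecord.SeqRecord."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


class CheckedRecordList(list):
    """List that only accepts instances of item_type (illustrative sketch)."""

    item_type = Record

    def __init__(self, items=()):
        list.__init__(self)
        self.extend(items)

    def _check(self, item):
        # Reject anything that is not an instance of the expected record type.
        if not isinstance(item, self.item_type):
            raise TypeError("expected %s, got %r"
                            % (self.item_type.__name__, item))
        return item

    def append(self, item):
        list.append(self, self._check(item))

    def extend(self, items):
        list.extend(self, [self._check(item) for item in items])

    def insert(self, index, item):
        list.insert(self, index, self._check(item))

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            value = [self._check(item) for item in value]
        else:
            self._check(value)
        list.__setitem__(self, index, value)


records = CheckedRecordList([Record("Alpha", "ACGT")])
records.append(Record("Beta", "GTGC"))
```

The sketch does not cover every route into a list (for example + and +=), so it shows the idea rather than a complete guard.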


Peter


From biopython at maubp.freeserve.co.uk  Wed Nov 12 18:11:30 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 12 Nov 2008 18:11:30 +0000
Subject: [Biopython-dev] PopGen status and new developments
In-Reply-To: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com>
References: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com>
Message-ID: <320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com>

Tiago Antão wrote:
> A. Platform support
>
> As Peter noticed there is no Simcoal for the Mac. In a couple of weeks
> I hope to have access to a Mac in order to try to compile it. In any
> case I won't be able to distribute it without getting permission from
> the authors, so the problem might remain...
> I am now preparing support for LDNe, an application to estimate Ne
> (effective population size) from LD. This application is Dos(Windows)
> only. Source code is not available to the public (but the app is free
> as free beer). I've had access to the source and compiled a Linux
> version, again, I don't know if the author will let me distribute it.
> Question: How do people feel about supporting an application like
> this? Any strong feelings against?

Assuming the tools are useful, then I have no objection to including
command line wrappers for them in Biopython.

I'm not 100% sure what you meant by "supporting an application like
this", but if you are asking about supporting these cross-platform
ports of the actual command line tools, then I don't see that as
something Biopython should be doing.

Peter


From tiagoantao at gmail.com  Wed Nov 12 18:16:06 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 12 Nov 2008 18:16:06 +0000
Subject: [Biopython-dev] PopGen status and new developments
In-Reply-To: <320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com>
References: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com>
	<320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com>
Message-ID: <6d941f120811121016q17451c83u12b2233eba625944@mail.gmail.com>

On Wed, Nov 12, 2008 at 6:11 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> I'm not 100% sure what you meant by "supporting an application like
> this", but if you are asking about supporting these cross-platform
> ports of the actual command line tools, then I don't see that as
> something Biopython should be doing.


Sorry, I was not clear: I was just asking about supporting
applications that don't have the source available and that don't
support all common platforms (the case of LDNe).


From dalloliogm at gmail.com  Wed Nov 12 18:17:48 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Wed, 12 Nov 2008 19:17:48 +0100
Subject: [Biopython-dev] [BioPython] a sequence set object in biopython?
In-Reply-To: <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com>
References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com>
	<320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com>
Message-ID: <5aa3b3570811121017u72eb7552v94275368cb23cf48@mail.gmail.com>

On Wed, Nov 12, 2008 at 6:53 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Wed, Nov 12, 2008 at 4:25 PM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>> Hi,
>> I think it could be useful to add a generic SequenceSet object in biopython.
>> Such an object would represent a generic set of sequences, and could
>> have some useful methods like .format('fasta') or
>> .align('alignment_tool').
>> Is there something similar available already?
>
> Given your example to turn the SequenceSet into a FASTA file, then
> clearly you are thinking of a collection of SeqRecord objects rather
> than just Seq objects.  For this kind of thing I personally just use a
> list of SeqRecord objects.
>
> If I want to turn a list of SeqRecord objects into a FASTA file, I can
> pass the list to the Bio.SeqIO.write() function.  Once I've made a
> FASTA file, I can call an external tool to align them - and then load
> them in again using Bio.AlignIO or Bio.SeqIO depending on what I plan
> to do next.
>
>> Some use cases:
>> - a set of sequences that represents all introns in a particular gene,
>> on which I want to calculate the conservation of the splicing
>> regulatory sites.
>> - all genes sequences in an organisms, which I want to convert in EMBL format
>> - a set of seqs to be aligned or used as input for other tools
>> etc..
>
> All sensible use cases - but all seem to be covered by a simple python
> list of SeqRecord objects, or in some cases a list of Seq objects
> (e.g. the introns example, as I doubt the introns have names).
>

Not always.
For example, if I have a set of genes in an organism, sometimes I
need to access only some of them, by their id; so, a
__getattribute__ method to make it work as a dictionary could also be
useful.
The fact is that I think such an object would be so widely used
that it might be useful to implement it in biopython.
What I would do, honestly, is create a GenericSeqRecordSet class
from which to derive Alignment, specifying that in an alignment all
the sequences should have the same length. It would not require much
work and it would change the interface.
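
A minimal sketch of the kind of container described above, for illustration only. RecordCollection is a hypothetical name, and records are modelled as plain (id, sequence) tuples so the example runs without Biopython. Note that in Python the obj[key] syntax is served by __getitem__ rather than __getattribute__, so that is the hook used here.

```python
class RecordCollection(object):
    """Hypothetical container: keeps insertion order and allows lookup
    either by position or by record id."""

    def __init__(self, records=()):
        self._records = []
        self._index = {}  # maps id -> position in self._records
        for record in records:
            self.add(record)

    def add(self, record):
        record_id, _sequence = record
        if record_id in self._index:
            raise ValueError("duplicate id: %r" % (record_id,))
        self._index[record_id] = len(self._records)
        self._records.append(record)

    def __len__(self):
        return len(self._records)

    def __iter__(self):
        return iter(self._records)

    def __getitem__(self, key):
        # Integers index by position; any other key is treated as an id.
        if isinstance(key, int):
            return self._records[key]
        return self._records[self._index[key]]


genes = RecordCollection([("geneA", "ATGAAA"), ("geneB", "ATGCCC")])
print(genes["geneB"])  # lookup by id
print(genes[0])        # lookup by position
```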


Very tiny little minuscule P.S.: if you need help implementing such a
thing, or anything else, I can volunteer :).

> Peter
>


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it


From dalloliogm at gmail.com  Wed Nov 12 18:19:50 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Wed, 12 Nov 2008 19:19:50 +0100
Subject: [Biopython-dev] PopGen status and new developments
In-Reply-To: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com>
References: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com>
Message-ID: <5aa3b3570811121019k3a0710f1n2add599ce0b4f56a@mail.gmail.com>

On Wed, Nov 12, 2008 at 7:02 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
> Hi,
>
> This an email with the status of current PopGen developments. In some
> points, advice is especially welcome.

Hi Tiago!!
Have you seen this parser for fastPhaseOutput? (I thought it wasn't
directly related to PopGen, so I didn't tell you directly.)
- http://bugzilla.open-bio.org/show_bug.cgi?id=2643

>
>
> A. Platform support
>
> As Peter noticed there is no Simcoal for the Mac. In a couple of weeks
> I hope to have access to a Mac in order to try to compile it. In any
> case I won't be able to distribute it without getting permission from
> the authors, so the problem might remain...
> I am now preparing support for LDNe, an application to estimate Ne
> (effective population size) from LD. This application is Dos(Windows)
> only. Source code is not available to the public (but the app is free
> as free beer). I've had access to the source and compiled a Linux
> version, again, I don't know if the author will let me distribute it.
> Question: How do people feel about supporting an application like
> this? Any strong feelings against?
>
>
> B. New developments
>
> 1. The above LDNe module is fully coded, and being tested by a few
> people (not just me). Test code and documentation TBD but easy.
> 2. Genepop application support (no confusion with file format support,
> which is done). Partially done and informally tested. Plan to start
> with just partial support.
> 3. Fstat parser. Coded.
>
>
> C. Statistics
>
> An ongoing interesting discussion started on statistics. I am delayed
> with doing a proposal to handle statistical processing (my bad, but I
> will have some free time in the next couple of weeks and I will try to
> recover). My current existing code on the subject is available on
> Github (by Giovanni), but I think it will need some change (not in the
> functionality, but in the architecture).
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it


From biopython at maubp.freeserve.co.uk  Wed Nov 12 18:36:11 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 12 Nov 2008 18:36:11 +0000
Subject: [Biopython-dev] [BioPython] a sequence set object in biopython?
In-Reply-To: <5aa3b3570811121017u72eb7552v94275368cb23cf48@mail.gmail.com>
References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com>
	<320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com>
	<5aa3b3570811121017u72eb7552v94275368cb23cf48@mail.gmail.com>
Message-ID: <320fb6e00811121036w17e0d2acv6723c751350f1893@mail.gmail.com>

Giovanni Marco Dall'Olio wrote:
>> All sensible use cases - but all seem to be covered by a simple python
>> list of SeqRecord objects, or in some cases a list of Seq objects
>> (e.g. the introns example, as I doubt the introns have names).
>
> Not always.
> For example, if I have a set of genes in an organism, sometimes I
> need to access only some of them, by their id; so, a
> __getattribute__ method to make it work as a dictionary could also be
> useful.

OK, then use a dict of SeqRecords for this, as shown in the tutorial
chapter for Bio.SeqIO and the wiki.  We even have a helper function
Bio.SeqIO.to_dict() to do this and check for duplicate keys.

If you need an order preserving dictionary, there are examples of this
on the net and there is even PEP372 for adding this to python itself:
http://www.python.org/dev/peps/pep-0372/
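
For readers coming to this thread later: PEP 372 was accepted, and the resulting class is collections.OrderedDict (Python 2.7 and 3.1 onwards). The sketch below combines it with the kind of duplicate-key check described above for Bio.SeqIO.to_dict(). The helper name to_ordered_dict and the use of (id, sequence) tuples as records are assumptions made so the example runs without Biopython.

```python
from collections import OrderedDict


def to_ordered_dict(records, key_function=lambda record: record[0]):
    """Map key -> record, keeping input order and rejecting duplicate keys.

    Hypothetical helper; records here are (id, sequence) tuples.
    """
    result = OrderedDict()
    for record in records:
        key = key_function(record)
        if key in result:
            raise ValueError("Duplicate key %r" % (key,))
        result[key] = record
    return result


records = [("geneB", "ATGCCC"), ("geneA", "ATGAAA")]
by_id = to_ordered_dict(records)
print(list(by_id))  # keys come back in input order: ['geneB', 'geneA']
```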

> The fact is that I think that such an object would be so widely used,
> that maybe it would be useful to implement it in biopython.
> What I would do, honestly, is to create a GenericSeqRecordSet class
> from which to derive Alignment, specifying that in an alignment all
> the sequences should have the same lenght. It would not require much
> work and it would change the interface.

I agree that IF we added some sort of "GenericSeqRecordSet class", it
might be sensible for the alignment objects to subclass it -
especially if you want it to behave like a python list primarily.
Note that in python sets are not order preserving.

> Very tiny little minuscule P.S.: if you need help implementing such a
> thing, or anything else, I can volunteer :).

That's good to hear :)

However, we'd have to establish the need for this new object first -
but so far we've only had two people's views so it's too early to form a
consensus.  I don't see a strong reason for adding yet another object,
when the core language provides lists, sets and dict which seem to be
enough.

Peter


From jflatow at gmail.com  Wed Nov 12 18:52:35 2008
From: jflatow at gmail.com (Jared Flatow)
Date: Wed, 12 Nov 2008 12:52:35 -0600
Subject: [Biopython-dev] [BioPython] a sequence set object in biopython?
In-Reply-To: <320fb6e00811121036w17e0d2acv6723c751350f1893@mail.gmail.com>
References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com>
	<320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com>
	<5aa3b3570811121017u72eb7552v94275368cb23cf48@mail.gmail.com>
	<320fb6e00811121036w17e0d2acv6723c751350f1893@mail.gmail.com>
Message-ID: <ACD9FBEC-07B9-43D3-BAA6-CA538F6DC43C@gmail.com>

On Nov 12, 2008, at 12:36 PM, Peter wrote:

> However, we'd have to establish the need for this new object first -
> but so far we've only had two people's views so it's too early to form a
> consensus.  I don't see a strong reason for adding yet another object,
> when the core language provides lists, sets and dict which seem to be
> enough.

I totally agree with you Peter, that's what the basic container types
are for. If someone wants to create a subclass of these containers for
a specific purpose it is simple enough to do. IMO it's kind of silly to
try and make sequence specific containers that satisfy everyone's needs.

jared


From bsouthey at gmail.com  Wed Nov 12 18:58:05 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Wed, 12 Nov 2008 12:58:05 -0600
Subject: [Biopython-dev] PopGen status and new developments
In-Reply-To: <320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com>
References: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com>
	<320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com>
Message-ID: <491B273D.9020404@gmail.com>

Peter wrote:
> Tiago Antão wrote:
>   
>> A. Platform support
>>
>> As Peter noticed there is no Simcoal for the Mac. In a couple of weeks
>> I hope to have access to a Mac in order to try to compile it. In any
>> case I won't be able to distribute it without getting permission from
>> the authors, so the problem might remain...
>> I am now preparing support for LDNe, an application to estimate Ne
>> (effective population size) from LD. This application is Dos(Windows)
>> only. Source code is not available to the public (but the app is free
>> as free beer). I've had access to the source and compiled a Linux
>> version, again, I don't know if the author will let me distribute it.
>> Question: How do people feel about supporting an application like
>> this? Any strong feelings against?
>>     
>
> Assuming the tools are useful, then I have no objection to including
> command line wrappers for them in Biopython.
>
> I'm not 100% sure what you meant by "supporting an application like
> this", but if you are asking about supporting these cross-platform
> ports of the actual command line tools, then I don't see that as
> something Biopython should be doing.
>
> Peter
>
Hi,
I do have concerns about usefulness with regards to Biopython.

How widespread is the application?
What platforms is it released for (DOS only, or some version of Windows
such as XP, Vista or Windows 7)?
Is the application well supported and will it continue to be supported?
Under what terms is the application 'free'?
How does this integrate into your ideas for Popgen?
Would it work like say clustalw where you output something from 
Biopython, run the application and perhaps import something back into 
Biopython?

If the application requires major data formatting then you would have to
determine if it is easier to support the application or to integrate its
functionality into Biopython. Obviously, the latter requires a clean room
implementation of the application or the essential algorithm. Also, you
can only provide the specification and cannot be involved in the actual
implementation.

Bruce


From tiagoantao at gmail.com  Wed Nov 12 20:09:31 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 12 Nov 2008 20:09:31 +0000
Subject: [Biopython-dev] PopGen status and new developments
In-Reply-To: <491B273D.9020404@gmail.com>
References: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com>
	<320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com>
	<491B273D.9020404@gmail.com>
Message-ID: <6d941f120811121209n75dfb0cfh1fb4e57a98011ed0@mail.gmail.com>

Hi,

On Wed, Nov 12, 2008 at 6:58 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> I do have concerns about usefulness with regards to Biopython.

It is important to note that supporting this application has no
big impact on the deployment of biopython. The only visible effect is some
tests reporting that the application doesn't exist. This is different
from adding a dependency on, say, scipy. I don't think that this
imposes any maintenance/installation hurdle at large. I think this is
actually a non-problem at the deployment stage, at least.

> How widespread is the application?

The application is fairly new (genepop, on the other hand is widely
used and old). I cannot answer that question. I know of some people
using it, but it is my small, biased, universe. I would guess that
currently the number is small.

Is there a policy to only support widespread applications?

> What platforms is it released under (DOS only or some version of windows
> version like XP or Vista or Windows 7)?

There is a Dos and Windows frontend. I actually asked the authors for
the code and they gave me access to it. I have compiled a Linux
version, but I don't know if they are going to make it available.

> Is the application well supported and will it continue to be supported?

Regarding current support, I can subjectively say that the authors
answer my queries rather fast. Regarding the future, I don't know.

> Under what terms is the application 'free'?

Much software in this field is made available without any
regard for licensing issues. This is already the case for the
supported Fdist application (source available, no license).
This is a problem in the field: some people don't care that
much about licensing, they just "make things available".
So, if there is a policy to only support applications for which there
is a clear license, then this one is out (and some code has to be
removed from the current PopGen module, by the way). I never link the
code in, I just invoke it (these are mostly wrappers), so there should
be no legal issues in any case, I suspect.

There is a chicken and egg problem here that needs to be fought: in
population genetics there is no widespread tradition of making things
open (not because people want closed solutions, but mostly because
people don't think about these issues). There is also less of a tradition
of coding than in other areas (people want ready-made solutions; the
people who code are relatively few and mostly R based). As an example:
I don't know of many direct users of the fdist code, but I know lots of
people who use applications built on top of that code.

By the way, Simcoal is GPL (and there are more examples of open code
in population genetics, of course).

> How does this integrate into your ideas for Popgen?

Very well. My stated philosophy, from the beginning, has been to use
existing applications and not reinvent the wheel. That being said, I
agree that a core statistics implementation should be done (even if
there are alternatives). But, mostly, for now, what is available in
Bio.PopGen are intelligent wrappers.

> Would it work like say clustalw where you output something from Biopython,
> run the application and perhaps import something back into Biopython?

Yep, it accepts genepop files and the output is fully parsed back.
This is still not the case, by the way, with simcoal, where the output
is not directly usable (arlequin is needed to analyze the results). I
need to write an arlequin parser; that would solve the problem.

> If the application requires major data formatting then you would have to

It doesn't require any formatting at all as the de facto standard
format in the area (genepop) is supported and the results are parsed
back.

Tiago


From dalloliogm at gmail.com  Thu Nov 13 00:16:44 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Thu, 13 Nov 2008 01:16:44 +0100
Subject: [Biopython-dev] [BioPython] a sequence set object in biopython?
In-Reply-To: <320fb6e00811121036w17e0d2acv6723c751350f1893@mail.gmail.com>
References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com>
	<320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com>
	<5aa3b3570811121017u72eb7552v94275368cb23cf48@mail.gmail.com>
	<320fb6e00811121036w17e0d2acv6723c751350f1893@mail.gmail.com>
Message-ID: <5aa3b3570811121616u5f95cc8du9f0d91e4743f067f@mail.gmail.com>

On Wed, Nov 12, 2008 at 7:36 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Giovanni Marco Dall'Olio wrote:
>>> All sensible use cases - but all seem to be covered by a simple python
>>> list of SeqRecord objects, or in some cases a list of Seq objects
>>> (e.g. the introns example, as I doubt the introns have names).
>>
>> Not always.
>> For example, if I have a set of genes in an organism, sometimes I
>> need to access only some of them, by their id; so, a
>> __getattribute__ method to make it work as a dictionary could also be
>> useful.
>
> OK, then use a dict of SeqRecords for this, as shown in the tutorial
> chapter for Bio.SeqIO and the wiki.  We even have a helper function
> Bio.SeqIO.to_dict() to do this and check for duplicate keys.

I would prefer a SeqRecordSet object with a to_dict method :)

> If you need an order preserving dictionary, there are examples of this
> on the net and there is even PEP372 for adding this to python itself:
> http://www.python.org/dev/peps/pep-0372/

>> The fact is that I think that such an object would be so widely used,
>> that maybe it would be useful to implement it in biopython.
>> What I would do, honestly, is to create a GenericSeqRecordSet class
>> from which to derive Alignment, specifying that in an alignment all
>> the sequences should have the same length. It would not require much
>> work and it would change the interface.
>
> I agree that IF we added some sort of "GenericSeqRecordSet class", it
> might be sensible for the alignment objects to subclass it -
> especially if you want it to behave like a python list primarily.

Let's see it from another point of view.
In biopython, if you want to print a set of sequences in fasta format,
you have to do the following:
>>> from Bio.Seq import Seq
>>> from Bio.SeqRecord import SeqRecord
>>> s1 = SeqRecord(Seq('cacacac'))
>>> s2 = SeqRecord(Seq('cacacac'))
>>> seqs = s1, s2
>>> out = ''
>>> for seq in seqs:
...     # a "print seq.format('fasta')" statement won't work properly here, because of blank lines
...     out += seq.format('fasta')
...
>>> print out

On the other hand, printing an alignment in fasta format is a lot simpler:
>>> al = Alignment(SingleLetterAlphabet)
>>> al.add_sequence('s1', 'cacaca')
>>> al.add_sequence('s2', 'cacaca')
>>> print al.format('fasta')

I work more often with sets of sequences than with alignments.
So why is it more difficult to print some unrelated sequences in a
certain format than aligned sequences? I would end up using Alignment
objects for sequences that are not aligned, too.

I am also thinking about many format parsers.

Wouldn't it be easier:
>>> seqs = Bio.SeqIO.parse(filehandler, 'fasta')
>>> record_dict = seqs.to_dict()

than invoking SeqIO twice?


> Note that in python sets are not order preserving.
>
>> Very tiny little minuscule P.S.: if you need help implementing such a
>> thing, or anything else, I can volunteer :).
>
> That's good to hear :)
>
> However, we'd have to establish the need for this new object first -
> but so far we've only had two people's views so it's too early to form a
> consensus.  I don't see a strong reason for adding yet another object,
> when the core language provides lists, sets and dict which seem to be
> enough.

Take for example this code you wrote for me before:

> class SeqRecordList(list) :
>    """Subclass of the python list, to hold SeqRecord objects only."""
>    #TODO - Override the list methods to make sure all the items
>    #are indeed SeqRecord objects
>
>    def format(self, format) :
>        """Returns a string of all the records in a requested file format.
>
>        The argument format should be any file format supported by
>        the Bio.SeqIO.write() function.  This must be a lower case string.
>        """
>        from Bio import SeqIO
>        from StringIO import StringIO
>        handle = StringIO()
>        SeqIO.write(self, handle, format)
>        handle.seek(0)
>        return handle.read()

It's very useful, but I don't think a python/biopython newbie would be
able to write it.
That's why I think it should be included.
Last year, I was in another laboratory and didn't have much
experience with biopython, and I missed having this kind of object.

> Peter
>

Goodnight!!


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it


From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 07:16:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 02:16:02 -0500
Subject: [Biopython-dev] [Bug 2552] Adding alignments
In-Reply-To: <bug-2552-42@http.bugzilla.open-bio.org/>
Message-ID: <200811130716.mAD7G2pw008200@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2552


------- Comment #2 from fkauff at biologie.uni-kl.de  2008-11-13 02:16 EST -------
The Nexus module in Bio.Nexus has a function (not a method) 'combine' that can
combine Nexus objects. It takes care of missing taxa, taxon sets, etc. Usage is
something like:

nex1=Nexus.Nexus('myfirstalignment.nex')
nex2=Nexus.Nexus('mysecondalignment.nex')
combined=Nexus.combine([('fancyname1',nex1),('fancyname2',nex2)])

It looks fairly straightforward to add this to a SeqRecord object.
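
To make the idea concrete, here is a small sketch of combining alignments by column while padding taxa that are missing from one block. It is not how Bio.Nexus.combine is implemented. The function name combine_by_taxon, the dict-of-strings representation and the default fill character are all assumptions made for illustration.

```python
def combine_by_taxon(alignments, missing="-"):
    """Concatenate alignments column-wise, padding taxa absent from a block.

    Each alignment is a dict mapping taxon name -> aligned sequence string.
    Hypothetical helper, for illustration only.
    """
    taxa = []
    for alignment in alignments:
        for taxon in alignment:
            if taxon not in taxa:
                taxa.append(taxon)  # remember taxa in first-seen order
    combined = dict((taxon, "") for taxon in taxa)
    for alignment in alignments:
        lengths = set(len(seq) for seq in alignment.values())
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must share a length")
        width = lengths.pop()
        for taxon in taxa:
            # Taxa missing from this block get a run of the fill character.
            combined[taxon] += alignment.get(taxon, missing * width)
    return combined


gene1 = {"human": "ACGT", "mouse": "ACGA"}
gene2 = {"human": "GG", "chimp": "GC"}
supermatrix = combine_by_taxon([gene1, gene2])
```

Here supermatrix["mouse"] ends in two fill characters because mouse is absent from gene2, and supermatrix["chimp"] starts with four because chimp is absent from gene1.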

Cheers,
Frank

(Hi Cymon)


(In reply to comment #1)
> (In reply to comment #0)
> > This is related to the very broad alignment bug 1944.
> > 
> > Given two alignments, it can make sense to talk about adding them together.
> 
> Actually, this is a very common procedure in phylogenetic analyses, where
> multiple genes/loci are combined into a "super" matrix for a set of taxa.
> Although, in this case, adding by column, if a taxon/row/identifier was missing
> in a particular (sub-)alignment it would be filled by "-" (missing data) in the
> combined matrix.
> 
> Anyway, I think this would be a very useful enhancement.
> 


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 10:19:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 05:19:29 -0500
Subject: [Biopython-dev] [Bug 2552] Adding alignments
In-Reply-To: <bug-2552-42@http.bugzilla.open-bio.org/>
Message-ID: <200811131019.mADAJTxs024880@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2552


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-13 05:19 EST -------
(In reply to comment #1)
> (In reply to comment #0)
> > This is related to the very broad alignment bug 1944.
> > 
> > Given two alignments, it can make sense to talk about adding them together.
> 
> Actually, this is a very common procedure in phylogenetic analyses, where
> multiple genes/loci are combined into a "super" matrix for a set of taxa.

This was one of the use cases I originally had in mind here (with hindsight I
should have mentioned this in the original proposal).  Another potential use
for this is in combination with extracting sub-alignments by column (see Bug
2551) - for example to remove some middle region of an alignment by selecting
the two end regions and adding them together, e.g. new_align = align[:,:10] +
align[:,20:] to remove the region from columns 10 to 20.

As described in my original proposal, adding two alignments "by column" would
require they have the same number of rows, and the same IDs (possibly in a
different order - this is not essential as making the user think about their
preferred sort order seems fine to me).

I suppose using any common subset of shared names is also well defined, or
automatically including null sequences for missing entries (as Frank suggested
in comment 2), but I would much prefer to keep any alignment addition simple
and explicit - no "magic".
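
A sketch of the strict, "no magic" addition described above, assuming for illustration that an alignment is just a list of (id, sequence) tuples. The names slice_columns and add_by_column are hypothetical, and this is not the Alignment API. Rows must have the same ids in the same order, otherwise the addition is refused.

```python
def slice_columns(alignment, start, stop):
    """Return the sub-alignment covering columns start:stop."""
    return [(row_id, seq[start:stop]) for row_id, seq in alignment]


def add_by_column(left, right):
    """Join two alignments side by side; ids must match row for row."""
    left_ids = [row_id for row_id, _ in left]
    right_ids = [row_id for row_id, _ in right]
    if left_ids != right_ids:
        raise ValueError("row ids differ: %r vs %r" % (left_ids, right_ids))
    return [(row_id, left_seq + right_seq)
            for (row_id, left_seq), (_, right_seq) in zip(left, right)]


align = [("s1", "ACGTACGTAC"), ("s2", "ACGAACGAAC")]
# Drop columns 3 to 5 by adding the two flanking regions together.
trimmed = add_by_column(slice_columns(align, 0, 3),
                        slice_columns(align, 6, None))
```

Refusing mismatched ids outright leaves any reordering or padding to the caller, which keeps the behaviour explicit.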

More generally you could consider adding any two alignments "by column" if they
have the same number of rows, but first we'd have to talk about adding
SeqRecord objects.  This means doing something sensible with the annotation, in
particular the id and name.  I was hoping to avoid this.

Once Biopython 1.49 is out, dealing with this bug is certainly on my todo list,
especially now that we have some positive responses.




From mjldehoon at yahoo.com  Thu Nov 13 10:27:57 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 13 Nov 2008 02:27:57 -0800 (PST)
Subject: [Biopython-dev] [BioPython] a sequence set object in biopython?
In-Reply-To: <5aa3b3570811121616u5f95cc8du9f0d91e4743f067f@mail.gmail.com>
Message-ID: <25667.98653.qm@web62408.mail.re1.yahoo.com>

Adding new classes to Biopython should be done very carefully ... once they're in, it's difficult to remove them again. In the past, removing classes that turned out to be less than ideal was a real headache. Right now I don't see a clear need for a sequence set object ... read on.

--- On Wed, 11/12/08, Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
> > OK, then use a dict of SeqRecords for this, as shown
> > in the tutorial chapter for Bio.SeqIO and the wiki.
> >  We even have a helper function
> > Bio.SeqIO.to_dict() to do this and check for duplicate
> > keys.
> 
> I would prefer a SeqRecordSet object with a to_dict method

> Wouldn't it be easier:
> >>> seqs = Bio.SeqIO.parse(filehandler, 'fasta')
> >>> record_dict = seqs.to_dict()
> 
> than invoking SeqIO twice?

Maybe, yes, but it's just a matter of typing and I don't think that by itself it is a good enough reason for a SeqRecordSet class.

> Let's see it from another point of view.
> In biopython, if you want to print a set of sequences in fasta format,
> you have to do the following:
> >>> s1 = SeqRecord(Seq('cacacac'))
> >>> s2 = SeqRecord(Seq('cacacac'))
> >>> seqs = s1, s2
> >>> out = ''
> >>> for seq in seqs:
>         # a "print seq.format('fasta')" statement won't work
>         # properly here, because of blank lines
>         out += seq.format('fasta')
> >>> print out

I don't quite understand why "print seq.format('fasta')" won't work.
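
One likely explanation for the "blank lines" remark, assuming that the string returned by seq.format('fasta') already ends with a newline: print then adds a second one. The sketch below is written for Python 3 and uses hard-coded strings in place of real SeqRecord output, so it only illustrates the mechanism.

```python
import io

# Stand-ins for what seq.format('fasta') is assumed to return.
chunks = [">s1\ncacacac\n", ">s2\ncacacac\n"]

printed = io.StringIO()
for chunk in chunks:
    print(chunk, file=printed)  # print appends its own "\n" after each chunk

written = io.StringIO()
for chunk in chunks:
    written.write(chunk)        # write adds nothing

print(repr(printed.getvalue()))  # blank line after every record
print(repr(written.getvalue()))  # records follow each other directly
```

In the Python 2 used elsewhere in this thread, sys.stdout.write(seq.format('fasta')) would avoid the extra line in the same way.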

> Take for example this code you wrote for me before:
> 
> > class SeqRecordList(list) :
> >    def format(self, format) :
> >        from Bio import SeqIO
> >        from StringIO import StringIO
> >        handle = StringIO()
> >        SeqIO.write(self, handle, format)
> >        handle.seek(0)
> >        return handle.read()
> 
> It's very useful, but I don't think a
> python/biopython newbie would be
> able to write it.

I agree that this is too complicated. What if we redefine SeqIO.write as

def write(sequences, handle=sys.stdout, format='fasta'):
...

So by default SeqIO.write prints to the screen. Then you can do

SeqIO.write(records)

where records are a list of SeqRecord's.
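
A sketch of what such a default could look like. It is a hypothetical write_fasta helper working on (id, sequence) tuples, not the real SeqIO.write. One design detail: writing handle=sys.stdout in the signature binds whatever sys.stdout is when the function is defined, so the sketch uses None as the default and looks up sys.stdout at call time instead.

```python
import io
import sys


def write_fasta(records, handle=None):
    """Write (id, sequence) pairs as FASTA; return how many were written.

    Hypothetical helper for illustration only.
    """
    if handle is None:
        handle = sys.stdout  # looked up at call time, so redirection works
    count = 0
    for record_id, sequence in records:
        handle.write(">%s\n%s\n" % (record_id, sequence))
        count += 1
    return count


records = [("Alpha", "ACGT"), ("Beta", "GTGC")]
write_fasta(records)                    # no handle given: goes to the screen
buffer = io.StringIO()
written = write_fasta(records, buffer)  # explicit handle
```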

--Michiel.


From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 11:06:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 06:06:20 -0500
Subject: [Biopython-dev] [Bug 2628] Have Bio.SeqIO.write(...) and
	Bio.AlignIO.write(...) return number of records
In-Reply-To: <bug-2628-42@http.bugzilla.open-bio.org/>
Message-ID: <200811131106.mADB6Ki7030741@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2628


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-13 06:06 EST -------
Note - now that we return the count, this does block a previous suggestion by
Michiel that if the handle were omitted the write function could default to
returning a string (handled via StringIO internally).

I wasn't keen on this idea at the time because it would have given the write
function very different behaviour depending on the arguments.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From biopython at maubp.freeserve.co.uk  Thu Nov 13 11:11:10 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 13 Nov 2008 11:11:10 +0000
Subject: [Biopython-dev] [BioPython] a sequence set object in biopython?
In-Reply-To: <25667.98653.qm@web62408.mail.re1.yahoo.com>
References: <5aa3b3570811121616u5f95cc8du9f0d91e4743f067f@mail.gmail.com>
	<25667.98653.qm@web62408.mail.re1.yahoo.com>
Message-ID: <320fb6e00811130311t4e813a8fqeb21504fd5696bf1@mail.gmail.com>

Michiel wrote:
>Marco wrote:
>> Take for example this code you [Peter] wrote for me before:
>>
>> > class SeqRecordList(list) :
>> >    def format(self, format) :
>> >        from Bio import SeqIO
>> >        from StringIO import StringIO
>> >        handle = StringIO()
>> >        SeqIO.write(self, handle, format)
>> >        handle.seek(0)
>> >        return handle.read()
>>
>> It's very useful, but I don't think a
>> python/biopython newbie would be
>> able to write it.
>
> I agree that this is too complicated.

This wasn't aimed at a beginner, but rather for Marco if he really
wants to use this kind of object in his own code, or as a basis for
further discussion.

> What if we redefine SeqIO.write as
>
> def write(self, handle=sys.stdout, format='fasta'):
> ...
>
> So by default SeqIO.write prints to the screen. Then you can do
>
> SeqIO.write(records)
>
> where records are a list of SeqRecord's.

We could certainly include something like this in the documentation:

#Just an example to create some records:
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
records = [SeqRecord(Seq("ACGT"),"Alpha"), SeqRecord(Seq("GTGC"),"Beta")]

#One way to "print" records to screen,
import sys
from Bio import SeqIO
SeqIO.write(records, sys.stdout, "fasta")

I'm not so keen on making the handle default to standard out, but this
is nicer than the suggestion you made some time ago that if the handle
were omitted a string be returned (no longer an option since Bug 2628
was committed).
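
To make the trade-off concrete, here is a minimal sketch (not Biopython
code) of a write function whose handle defaults to standard output and
which returns the record count, as discussed on Bug 2628. The name
write_fasta and the (id, sequence) tuples are hypothetical stand-ins for
SeqIO.write and SeqRecord objects.

```python
import io
import sys


def write_fasta(records, handle=None):
    """Write (id, sequence) pairs as FASTA and return how many were written.

    Hypothetical stand-in for SeqIO.write, used only to illustrate the
    proposed default. If handle is None, output goes to sys.stdout.
    """
    if handle is None:
        # Look up sys.stdout at call time rather than in the signature,
        # so a later redirection of standard output is respected.
        handle = sys.stdout
    count = 0
    for identifier, sequence in records:
        handle.write(">%s\n%s\n" % (identifier, sequence))
        count += 1
    return count


records = [("Alpha", "ACGT"), ("Beta", "GTGC")]

# Default behaviour: "print" the records to the screen.
write_fasta(records)

# A string is still available by passing an in-memory handle explicitly.
buffer = io.StringIO()
n = write_fasta(records, buffer)
text = buffer.getvalue()
```

Because the return value is always the count, the function behaves the
same way whichever handle is used; getting a string stays an explicit
choice made by the caller.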

Any other votes for the standard out default?

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 11:18:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 06:18:01 -0500
Subject: [Biopython-dev] [Bug 2552] Adding alignments
In-Reply-To: <bug-2552-42@http.bugzilla.open-bio.org/>
Message-ID: <200811131118.mADBI1of031964@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2552


------- Comment #4 from fkauff at biologie.uni-kl.de  2008-11-13 06:18 EST -------
(In reply to comment #3)
>
> > 
> > Actually, this is a very common procedure in phylogenetic analyses, where
> > multiple genes/loci are combined into a "super" matrix for a set of taxa.
> 
> This was one of the use cases I originally had in mind here (with hindsight I
> should have mentioned this in the original proposal).  Another potential use
> for this is in combination with extracting sub-alignments by column (see Bug
> 2551) - for example to remove some middle region of an alignment by selecting
> the two end regions and adding them together, e.g. new_align = align[:,:10] +
> align[:,20:] to remove the region from columns 10 to 20.

The Nexus parser can already handle this by rewriting the data set:

>> nexobject.write_nexus_data(filename='new.nex',exclude=[range(10,21)],delete=['list','of','taxa','two','delete'])

where the indices of remaining character sets and character partitions get
recalculated.
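
As a rough illustration of the slicing idea quoted above, the sketch below
removes a block of columns from a toy alignment held as a plain dictionary
of strings. The function drop_columns and the taxon names are hypothetical;
this is neither the Nexus parser nor the proposed alignment API.

```python
def drop_columns(alignment, start, end):
    """Return a copy of the alignment without columns start..end-1 (0-based).

    alignment maps a taxon name to its aligned sequence string.
    """
    return {name: seq[:start] + seq[end:] for name, seq in alignment.items()}


alignment = {
    "taxon1": "AAAAAAAAAACCCCCCCCCCGGGGG",
    "taxon2": "TTTTTTTTTTCCCCCCCCCCAAAAA",
}

# Equivalent in spirit to align[:, :10] + align[:, 20:]
trimmed = drop_columns(alignment, 10, 20)
```

Note that Python slices are half-open, so this drops indices 10 through 19,
whereas range(10,21) lists the values 10 through 20.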


> 
> As described in my original proposal, adding two alignments "by column" would
> require they have the same number of rows, and the same IDs (possibly in a
> different order - this is not essential as making the user think about their
> preferred sort order seems fine to me).
> 
> I suppose using any common subset of shared names is also well defined, or
> automatically including null sequences for missing entries (as Frank suggested
> in comment 2), but I would much prefer to keep any alignment addition simple
> and explicit - no "magic".
> 

Yes, missing names are given missing character entries
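
A minimal sketch of that padding behaviour, using plain dictionaries rather
than Nexus or alignment objects. The function concatenate and the "?"
missing-data symbol are assumptions made for illustration only.

```python
def concatenate(first, second, missing="?"):
    """Join two alignments column-wise, padding taxa absent from either one.

    Each alignment maps a taxon name to its aligned sequence string; all
    sequences within one alignment are assumed to have the same length.
    """
    len_first = len(next(iter(first.values())))
    len_second = len(next(iter(second.values())))
    # Keep the order of the first alignment, then append any new taxa.
    names = list(first) + [name for name in second if name not in first]
    return {
        name: first.get(name, missing * len_first)
        + second.get(name, missing * len_second)
        for name in names
    }


gene1 = {"A": "ACGT", "B": "ACGA"}
gene2 = {"B": "TT", "C": "TG"}
supermatrix = concatenate(gene1, gene2)
```

This is the "automatic" variant; the stricter alternative preferred in the
quoted text would instead raise an error when the two sets of names differ.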

> More generally you could consider adding any two alignments "by column" if they
> have the same number of rows, but first we'd have to talk about adding
> SeqRecord objects.  This means doing something sensible with the annotation, in
> particular the id and name.  I was hoping to avoid this.
> 
> Once Biopython 1.49 is out, dealing with this bug is certainly on my todo list,
> especially now that we have some positive responses.
> 




From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 12:14:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 07:14:21 -0500
Subject: [Biopython-dev] [Bug 2654] New: Bio.Blast.NCBIStandalone does not
	support the output file argument
Message-ID: <bug-2654-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2654

           Summary: Bio.Blast.NCBIStandalone does not support the output
                    file argument
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


The NCBI blastall tool defaults to writing its output to standard out, but can
be told to write to a file instead:

  -o  BLAST report Output File [File Out]  Optional

Currently Bio.Blast.NCBIStandalone.blastall() does not support this optional
argument, meaning that if the user wants to save the output they must do this
manually from the standard out handle.

This also applies to rpsblast and blastpgp as well.
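
For reference, saving the output "manually" might look something like the
sketch below, which uses the modern Python 3 subprocess module rather than
Bio.Blast.NCBIStandalone. The helper run_and_save is hypothetical, and the
demo command is a placeholder Python one-liner standing in for a real
blastall invocation.

```python
import os
import subprocess
import sys
import tempfile


def run_and_save(command, output_path):
    """Run command, writing whatever it prints on standard output to a file."""
    with open(output_path, "wb") as handle:
        result = subprocess.run(command, stdout=handle)
    return result.returncode


# Placeholder command; a real call would pass the blastall arguments instead.
demo_command = [sys.executable, "-c", "print('placeholder report')"]
output_path = os.path.join(tempfile.mkdtemp(), "report.txt")
return_code = run_and_save(demo_command, output_path)
```

Supporting the tool's own -o argument would avoid routing the whole report
through Python at all.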




From eric.pruitt at gmail.com  Thu Nov 13 13:00:36 2008
From: eric.pruitt at gmail.com (James Pruitt)
Date: Thu, 13 Nov 2008 07:00:36 -0600
Subject: [Biopython-dev] Lowess Smooth Improvement
Message-ID: <171e8a410811130500o71c455f6mda64ab19c138e48f@mail.gmail.com>

I made some changes to the Lowess smoothing method and have written a unit
test for it. On my machine, it runs around 37% faster in my unit tests
compared to the original lowess method, and that is using the numpy.median
function, so it would probably run even faster with the Bio.Cluster median
function. How do I go about proposing my code for inclusion in Biopython?

-- 
-Jimmy


From biopython at maubp.freeserve.co.uk  Thu Nov 13 13:27:51 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 13 Nov 2008 13:27:51 +0000
Subject: [Biopython-dev] Lowess Smooth Improvement
In-Reply-To: <171e8a410811130500o71c455f6mda64ab19c138e48f@mail.gmail.com>
References: <171e8a410811130500o71c455f6mda64ab19c138e48f@mail.gmail.com>
Message-ID: <320fb6e00811130527m41238780n9fe7f9c6de1a2d0a@mail.gmail.com>

On Thu, Nov 13, 2008 at 1:00 PM, James Pruitt <eric.pruitt at gmail.com> wrote:
> I made some changes to the Lowess smoothing method as well as written a unit
> test for it. On my machine, it runs around 37% faster in my unit tests
> compared to the original lowess method and that is using the numpy.median
> function so it would probably run even faster with the Bio.Cluster median
> function.

Presumably this is an update for Bio/Statistics/lowess.py?  I'm a
little confused - this code already uses Bio.Cluster.median if it can,
falling back on numpy.median.  Maybe you're working from an older
version of Biopython?

> How do I go about proposing my code to be included in Bio.Python?

First file an enhancement bug, then once the bug is filed you can
attach a patch against CVS.
If you have any example scripts or unit tests to go with it, even better.

Thanks,

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 15:25:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 10:25:56 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811131525.mADFPuvi029137@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


------- Comment #22 from dalloliogm at gmail.com  2008-11-13 10:25 EST -------
Created an attachment (id=1053)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1053&action=view)
test files for fastPhaseOutput

I put the fastPhaseoutput files, used in the tests, in separated files, as
asked.




From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 15:59:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 10:59:02 -0500
Subject: [Biopython-dev] [Bug 2655] New: Sorting sub-features in BioSeq.py
	can return corrupted feature
Message-ID: <bug-2655-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2655

           Summary: Sorting sub-features in BioSeq.py can return corrupted
                    feature
           Product: Biopython
           Version: 1.49b
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: cymon.cox at gmail.com


BioSeq.py retrieves SeqFeatures from a BioSQL database and sorts both the
features and any sub-features. The first sort is superfluous and the second sort
is an error that can lead to a feature being returned corrupted, with the
sub-features in an incorrect order. So I've marked this major...

I've been trying to implement the feature/sub-feature locations test in
test_BioSQL_SeqIO.

Here's my solution (attached as patch1):

"""
        # Compare sub-feature Locations:
        # 
        # BioSQL currently does not store fuzzy locations, but instead stores
        # them as FeatureLocation.nofuzzy_start FeatureLocation.nofuzzy_end.
        # Hence, the old_sub from SeqIO.parse() will have fuzzy location while
        # new_sub locations from BioSQL will be fuzzy.
        # The vast majority of cases will be comparisons of ExactPosition
        # class locations, so we'll try that first and catch the exceptions.

        try:
            assert str(old_sub.location) == str(new_sub.location), \
               "%s -> %s" % (str(old_sub.location), str(new_sub.location))
        except AssertionError, e:
            if isinstance(old_sub.location.start, ExactPosition) and \
                isinstance(new_sub.location.start, ExactPosition) and \
                isinstance(old_sub.location.end, ExactPosition) and \
                isinstance(new_sub.location.end, ExactPosition):
                # Its not a problem with fuzzy locations, re-raise 
                raise AssertionError, e
            else:
                #At least one location is fuzzy
                assert old_sub.location.nofuzzy_start == \
                    new_sub.location.nofuzzy_start, \
                    "%s -> %s" % (old_sub.location.nofuzzy_start,
                                  new_sub.location.nofuzzy_start)
                assert old_sub.location.nofuzzy_end == \
                    new_sub.location.nofuzzy_end, \
                    "%s -> %s" % (old_sub.location.nofuzzy_end,
                                  new_sub.location.nofuzzy_end)
"""

This test causes errors in 3 of the test cases:
GenBank/extra_keywords.gb
GenBank/one_of.gb
GFF/NC_001422.gbk

e.g:
Testing loading from genbank format file GenBank/extra_keywords.gb
 - TCCAGGGGATTCACGCGCA...TTG [Gp6GqZ3Q9foPG0HvyXguIGSJN8U] len 154329,
AL138972.1
 - Retrieving by name/display_id 'DMBR25B3',
Traceback (most recent call last):
  File "test_BioSQL_SeqIO.py", line 371, in <module>
    compare_records(record, db_rec)
  File "test_BioSQL_SeqIO.py", line 280, in compare_records
    compare_features(old_f, new_f)
  File "test_BioSQL_SeqIO.py", line 185, in compare_features
    raise AssertionError, e
AssertionError: [153489:154269] -> [40:610]

This is because each of these records has a peculiar join(...) location.
For the above record:
join(153490..154269,AL121804.2:41..610,

(As an aside: how does the user know that the returned feature location is a
join with a separate accession? How does BioSQL/Biopython deal with this?)

The error is caused by _retrieve_features() in BioSeq.py sorting the
sub-features by start position:

BioSeq.py:
249                 sub_feature_list.append((start, subfeature))
250             sub_feature_list.sort()
251             feature.sub_features = [sub_feature[1]
252                                     for sub_feature in sub_feature_list]

This is an error because it returns the sub-features out of order. Besides, this
sub-feature sort and the SeqFeature sort are both unnecessary because the
features and sub-features are stored in BioSQL by rank and retrieved by rank,
so they should be in the correct order anyway.

Attached BioSeq.py patch to remove both sort()'s - patch2
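
The ordering problem described above can be reproduced without a database.
In this sketch the tuples are hypothetical stand-ins for the two parts of
the join, using the start/end values from the traceback; they are not
SeqFeature objects.

```python
# (rank, start, end, accession) for the two parts of the join, in stored order.
# None means the part lies on the record itself rather than another accession.
sub_features = [
    (1, 153489, 154269, None),
    (2, 40, 610, "AL121804.2"),
]

# What the old code effectively did: pair each part with its start and sort.
by_start = [part for _, part in sorted((part[1], part) for part in sub_features)]

# Keeping the stored rank order preserves the join as written in the file.
by_rank = sorted(sub_features, key=lambda part: part[0])

ranks_after_start_sort = [part[0] for part in by_start]
ranks_after_rank_sort = [part[0] for part in by_rank]
```

Sorting on start moves the remote part (start 40) in front of the local
part (start 153489), which is the mismatch reported by the assertion.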

With these patches applied, test_BioSQL_SeqIO and test_BioSQL both pass:

[cymon at chara Tests]$ python test_BioSQL_SeqIO.py > test_output
[cymon at chara Tests]$ diff -ruN test_output output/test_BioSQL_SeqIO 
--- test_output 2008-11-13 15:39:20.000000000 +0000
+++ output/test_BioSQL_SeqIO    2008-11-12 13:06:19.000000000 +0000
@@ -1,3 +1,4 @@
+test_BioSQL_SeqIO
 Connecting to database
 Removing existing sub-database 'biosql-seqio-test' (if exists)
 (Re)creating empty sub-database 'biosql-seqio-test'
[cymon at chara Tests]$ python run_tests.py test_BioSQL_SeqIO.py
test_BioSQL_SeqIO ... ok

----------------------------------------------------------------------
Ran 1 test in 15.928s

OK
[cymon at chara Tests]$ python run_tests.py test_BioSQL.py
test_BioSQL ... ok

----------------------------------------------------------------------
Ran 1 test in 25.255s

OK




From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 16:00:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 11:00:02 -0500
Subject: [Biopython-dev] [Bug 2655] Sorting sub-features in BioSeq.py can
	return corrupted feature
In-Reply-To: <bug-2655-42@http.bugzilla.open-bio.org/>
Message-ID: <200811131600.mADG02lb002140@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2655


------- Comment #1 from cymon.cox at gmail.com  2008-11-13 11:00 EST -------
Created an attachment (id=1054)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1054&action=view)
patch1 to test_BioSQL_SeqIO




From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 16:00:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 11:00:35 -0500
Subject: [Biopython-dev] [Bug 2655] Sorting sub-features in BioSeq.py can
	return corrupted feature
In-Reply-To: <bug-2655-42@http.bugzilla.open-bio.org/>
Message-ID: <200811131600.mADG0Zhi002264@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2655


------- Comment #2 from cymon.cox at gmail.com  2008-11-13 11:00 EST -------
Created an attachment (id=1055)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1055&action=view)
patch2 to BioSQL/BioSeq.py




From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 16:28:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 11:28:48 -0500
Subject: [Biopython-dev] [Bug 2655] Sorting sub-features in BioSeq.py can
	return corrupted feature
In-Reply-To: <bug-2655-42@http.bugzilla.open-bio.org/>
Message-ID: <200811131628.mADGSmmf007542@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2655


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-13 11:28 EST -------
Another sensible improvement - checked in with only minor changes (fixed an
assert in the unit test, and removed an old comment about sorting for
subfeatures).

Checking in BioSQL/BioSeq.py;
/home/repository/biopython/biopython/BioSQL/BioSeq.py,v  <--  BioSeq.py
new revision: 1.30; previous revision: 1.29
done
Checking in Tests/test_BioSQL_SeqIO.py;
/home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v  <-- 
test_BioSQL_SeqIO.py
new revision: 1.25; previous revision: 1.24
done

Thanks Cymon,

Peter.




From biopython at maubp.freeserve.co.uk  Thu Nov 13 16:33:43 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 13 Nov 2008 16:33:43 +0000
Subject: [Biopython-dev] Lowess Smooth Improvement
In-Reply-To: <171e8a410811130825x5732bd99o252e26f2bafa8e13@mail.gmail.com>
References: <171e8a410811130500o71c455f6mda64ab19c138e48f@mail.gmail.com>
	<320fb6e00811130527m41238780n9fe7f9c6de1a2d0a@mail.gmail.com>
	<171e8a410811130825x5732bd99o252e26f2bafa8e13@mail.gmail.com>
Message-ID: <320fb6e00811130833y3413eb36p92be13ca0ee1ed9a@mail.gmail.com>

On Thu, Nov 13, 2008 at 4:25 PM, James Pruitt <eric.pruitt at gmail.com> wrote:
> I removed the Bio.Cluster reference because the system the code would run on
> would not have access to it, so the code was vestigial, but in the version I
> will submit I have re-included the Bio.Cluster median function. Yes, this is an
> update for Bio/Statistics/lowess.py

OK - file the enhancement bug, upload the code (ideally as a patch)
and we'll take a look :)

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 17:09:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 12:09:37 -0500
Subject: [Biopython-dev] [Bug 2655] Sorting sub-features in BioSeq.py can
	return corrupted feature
In-Reply-To: <bug-2655-42@http.bugzilla.open-bio.org/>
Message-ID: <200811131709.mADH9blO013661@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2655


------- Comment #4 from cymon.cox at gmail.com  2008-11-13 12:09 EST -------
(In reply to comment #3)
> Another sensible improvement - checked in with only minor changes (fixed an
> assert in the unit test,

Thanks Peter :)

> and removed an old comment about sorting for
> subfeatures).

If the comment stays in, you'll need to remove these two lines of nonsense as
well:

test_BioSQL_SeqIO.py:
171         # Hence, the old_sub from SeqIO.parse() will have fuzzy location while
172         # new_sub locations from BioSQL will be fuzzy.

Sorry about that.

C.




From bugzilla-daemon at portal.open-bio.org  Thu Nov 13 17:17:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 12:17:15 -0500
Subject: [Biopython-dev] [Bug 2655] Sorting sub-features in BioSeq.py can
	return corrupted feature
In-Reply-To: <bug-2655-42@http.bugzilla.open-bio.org/>
Message-ID: <200811131717.mADHHFpR015244@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2655


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-13 12:17 EST -------
$ cvs commit -m "Removing two redundant comment lines (see Bug 2655)"
test_BioSQL_SeqIO.py
===========================================
 dev.open-bio.org - Authorized Access Only
===========================================
peterc at dev.open-bio.org's password: 
Checking in test_BioSQL_SeqIO.py;
/home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v  <-- 
test_BioSQL_SeqIO.py
new revision: 1.26; previous revision: 1.25
done




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 01:23:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 20:23:26 -0500
Subject: [Biopython-dev] [Bug 2657] New: Improved Bio/Statistics/lowess.py
Message-ID: <bug-2657-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657

           Summary: Improved Bio/Statistics/lowess.py
           Product: Biopython
           Version: 1.49b
          Platform: PC
               URL: http://pastebin.ca/1255734
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: eric.pruitt at gmail.com


I noticed several calculations were done repeatedly when the results could be
saved in variables and reused throughout. Then I realized that, since the
matrix has a static size, it would be faster to just hard code solving the
matrix into the function.




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 09:32:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 04:32:36 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811140932.mAE9Wa1f001445@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #1 from dalloliogm at gmail.com  2008-11-14 04:32 EST -------
ok, but consider that all posts on pastebin disappear after 30 days... You
should add an attachment by clicking on 'Create a New Attachment' from this
page (you can only do that after opening the bug report).

p.s. what about adding a doctest to this module? Just to show an example of
how to run it.
Something like this:
"""
<lowess __doc__ >

    >>> import numpy
    >>> x =  numpy.array([1, 2, 3, 4, 5])
    >>> y = numpy.array([1, 2, 3, 4, 6])
    >>> lowess(x, y)
    expected result
"""

- http://docs.python.org/library/doctest.html
- http://bugzilla.open-bio.org/show_bug.cgi?id=2640
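
For anyone unfamiliar with doctests, here is a small self-contained sketch
of the pattern. The function weighted_mean is a hypothetical example chosen
because its output is easy to verify by hand; it is not part of lowess.py,
and the expected lowess output above is deliberately left unfilled.

```python
def weighted_mean(values, weights):
    """Return the weighted mean of values.

    The example below doubles as a test: the doctest module runs the
    line after ">>>" and compares the result with the text that follows.

    >>> weighted_mean([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])
    2.25
    """
    total = sum(weight * value for value, weight in zip(values, weights))
    return total / sum(weights)
```

Saved in a file, this can be checked with "python -m doctest file.py",
which prints nothing if every example passes.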




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 10:41:31 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 05:41:31 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141041.mAEAfVQO007220@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 05:41 EST -------
Created an attachment (id=1057)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1057&action=view)
The updated lowess.py from http://pastebin.ca/raw/1255734

Attaching James' new file here so it doesn't just expire at pastebin.




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 11:11:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:11:26 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141111.mAEBBQJm010925@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 06:11 EST -------
I've updated CVS to use standard four space indentation, add a doctest and the
copyright statement etc.

James' code makes two code changes (shown against CVS revision 1.9).

67,68c67,68
<     h = [numpy.sort(abs(x-x[i]))[r] for i in range(n)]
<     w = numpy.clip(abs(([x]-numpy.transpose([x]))/h),0.0,1.0)
---
>     h = [numpy.sort(numpy.abs(x-x[i]))[r] for i in range(n)]
>     w = numpy.clip(numpy.abs(([x]-numpy.transpose([x]))/h),0.0,1.0)

Due to the historic usage "from Numeric import *" this code did once use
Numeric.abs here, so it makes sense to use numpy.abs now.  Probably just an
oversight from the recent Numeric/numpy conversion.  This is another reminder
that using "from XXX import *" is a bad idea.
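
A small sketch of why star imports make this kind of slip easy to miss.
The standard library math module is used here purely as a stand-in for
Numeric/numpy (an assumption for illustration), since it also exports a
name that shadows a builtin.

```python
import builtins

# Run the star import in its own namespace so the effect is easy to inspect.
namespace = {}
exec("from math import *", namespace)

# The bare name "pow" now refers to math.pow inside that namespace,
# not to the builtin function of the same name.
builtin_result = builtins.pow(2, 3)        # builtin pow keeps integers as int
shadowed_result = namespace["pow"](2, 3)   # math.pow always returns a float

same_function = namespace["pow"] is builtins.pow
```

Nothing at the call site shows which function a bare name refers to, so
removing or changing the star import silently changes behaviour.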

76,80c76,82
<             b = numpy.array([sum(weights*y), sum(weights*y*x)])
<             A = numpy.array([[sum(weights),   sum(weights*x)],
<                        [sum(weights*x), sum(weights*x*x)]])
<             beta = numpy.linalg.solve(A,b)
<             yest[i] = beta[0] + beta[1]*x[i]
---
>             theta = weights*x
>             b_top = sum(weights*y)
>             b_bot = sum(theta*y)
>             a = sum(weights)
>             b = sum(theta)
>             d = sum(theta*x)
>             yest[i] = (d*b_top-b*b_bot+(a*b_bot-b*b_top)*x[i])/(a*d-b**2)

I can see the point of calculating and caching these:
weights*y
weights*x
sum(weights*x)

Was there a good reason for the name theta for weights*x?

I personally think using an explicit matrix solver is much nicer to read than
that complex hand coded version.  Does it really save much time?

My suggestion is just:
76,78c76,81
<             b = numpy.array([sum(weights*y), sum(weights*y*x)])
<             A = numpy.array([[sum(weights),   sum(weights*x)],
<                        [sum(weights*x), sum(weights*x*x)]])
---
>             weights_x = weights*x
>             weights_y = weights*y
>             sum_weights_x = sum(weights_x)
>             b = numpy.array([sum(weights_y), sum(weights_y*x)])
>             A = numpy.array([[sum(weights),   sum_weights_x],
>                        [sum_weights_x, sum(weights_x*x)]])

However, I'm going to leave this for Michiel to resolve (given he wrote the
code in the first place).
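
For readers comparing the two versions: the hand-coded expression in James'
patch appears to be Cramer's rule applied to the same 2x2 system that
numpy.linalg.solve handles. The pure-Python sketch below uses the variable
names from the patch (a, b, d, b_top, b_bot); the function name local_fit
and the test data are hypothetical.

```python
def local_fit(x, y, weights, x_i):
    """Weighted straight-line fit evaluated at x_i, solved in closed form.

    Solves [[a, b], [b, d]] * [beta0, beta1] = [b_top, b_bot] by
    Cramer's rule instead of calling a matrix solver.
    """
    a = sum(weights)
    b = sum(w * xv for w, xv in zip(weights, x))
    d = sum(w * xv * xv for w, xv in zip(weights, x))
    b_top = sum(w * yv for w, yv in zip(weights, y))
    b_bot = sum(w * xv * yv for w, xv, yv in zip(weights, x, y))
    det = a * d - b ** 2
    beta0 = (d * b_top - b * b_bot) / det
    beta1 = (a * b_bot - b * b_top) / det
    return beta0 + beta1 * x_i


# Points lying exactly on y = 2x + 1, so any weighted fit should recover it.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * xv + 1.0 for xv in x]
weights = [0.5, 1.0, 2.0, 1.0, 0.5]
estimate = local_fit(x, y, weights, 3.0)
```

Writing beta0 and beta1 out separately keeps the closed form readable
while still avoiding the solver call.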




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 11:15:09 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:15:09 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141115.mAEBF9Gi011416@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #4 from eric.pruitt at gmail.com  2008-11-14 06:15 EST -------
Created an attachment (id=1058)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1058&action=view)
Unit test for lowess.py

The file will need to have its import statements adjusted for the Biopython
structure.




From p.j.a.cock at googlemail.com  Fri Nov 14 11:18:43 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 14 Nov 2008 11:18:43 +0000
Subject: [Biopython-dev] [BioPython] Problems with Emboss.Primer3
In-Reply-To: <001001c94644$eeaf5c00$1022a8c0@ipkgatersleben.de>
References: <000801c94598$fd183f20$1022a8c0@ipkgatersleben.de>
	<320fb6e00811130643p357092f6y8e6d983a11909003@mail.gmail.com>
	<001001c94644$eeaf5c00$1022a8c0@ipkgatersleben.de>
Message-ID: <320fb6e00811140318s452f9a5aj76eb7d505a98b6ee@mail.gmail.com>

On Fri, Nov 14, 2008 at 10:37 AM, Stefanie Lück
<lueck at ipk-gatersleben.de> wrote:
> Thanks for the hints!
> ...
> It gives as well as at the command line:
>
> "
> Command line:
> eprimer3 -sequence p3input.txt -outfile out.pr3 -target 50,500
> Return code:
> 1
> Errors:
>
>    EMBOSS An error in ajnam.c at line 1991:
>
> EMBOSSWIN environment variable not defined
>
> Messages
>
> "
> Any suggestions?

This doesn't seem to be a Biopython problem, but an EMBOSS
installation or configuration problem.  What version of EMBOSS do you
have?  Maybe try upgrading to version 6?

Peter


From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 11:28:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:28:36 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141128.mAEBSaSb013641@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


eric.pruitt at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |eric.pruitt at gmail.com


------- Comment #5 from eric.pruitt at gmail.com  2008-11-14 06:28 EST -------
(In reply to comment #3)
> I've updated CVS to use standard four space indentation, add a doctest and the
> copyright statement etc.
> 
> James' code makes two code changes (shown against CVS revision 1.9).
> 
> 67,68c67,68
> <     h = [numpy.sort(abs(x-x[i]))[r] for i in range(n)]
> <     w = numpy.clip(abs(([x]-numpy.transpose([x]))/h),0.0,1.0)
> ---
> >     h = [numpy.sort(numpy.abs(x-x[i]))[r] for i in range(n)]
> >     w = numpy.clip(numpy.abs(([x]-numpy.transpose([x]))/h),0.0,1.0)
> 
> Due to the historic usage "from Numeric import *" this code did once use
> Numeric.abs here, so it makes sense to use numpy.abs now.  Probably just an
> oversight from the recent Numeric/numpy conversion.  This is another reminder
> that using "from XXX import *" is a bad idea.
> 
> 76,80c76,82
> <             b = numpy.array([sum(weights*y), sum(weights*y*x)])
> <             A = numpy.array([[sum(weights),   sum(weights*x)],
> <                        [sum(weights*x), sum(weights*x*x)]])
> <             beta = numpy.linalg.solve(A,b)
> <             yest[i] = beta[0] + beta[1]*x[i]
> ---
> >             theta = weights*x
> >             b_top = sum(weights*y)
> >             b_bot = sum(theta*y)
> >             a = sum(weights)
> >             b = sum(theta)
> >             d = sum(theta*x)
> >             yest[i] = (d*b_top-b*b_bot+(a*b_bot-b*b_top)*x[i])/(a*d-b**2)
> 
> I can see the point of calculating and caching these:
> weights*y
> weights*x
> sum(weights*x)
> 
> Was there a good reason for the name theta for weights*x?
> 
> I personally think using an explicit matrix solver is much nicer to read than
> that complex hand coded version.  Does it really save much time?
> 
> My suggestion is just:
> 76,78c76,81
> <             b = numpy.array([sum(weights*y), sum(weights*y*x)])
> <             A = numpy.array([[sum(weights),   sum(weights*x)],
> <                        [sum(weights*x), sum(weights*x*x)]])
> ---
> >             weights_x = weights*x
> >             weights_y = weights*y
> >             sum_weights_x = sum(weights_x)
> >             b = numpy.array([sum(weights_y), sum(weights_y*x)])
> >             A = numpy.array([[sum(weights),   sum_weights_x],
> >                        [sum_weights_x, sum(weights_x*x)]])
> 
> However, I'm going to leave this for Michiel to resolve (given he wrote the
> code in the first place).
> 

Yes, replacing numpy saves quite a bit of time. When I cached the intermediate
values so they weren't recalculated every single time, it reduced unit test
time by 17% compared to the original; replacing numpy then reduced it by a net
38% from the original, so a huge difference. Also, I suggest changing
something if you all decided to keep numpy. Minor but just a suggestion.

>             weights_x = weights*x
>             sum_weights_x = sum(weights_x)
>             b = numpy.array([sum(weights*y), sum(weights_x*y)])
>             A = numpy.array([[sum(weights),   sum_weights_x],
>                        [sum_weights_x, sum(weights_x*x)]])




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 11:32:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:32:39 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141132.mAEBWdlC014111@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 06:32 EST -------
(In reply to comment #4)
> Created an attachment (id=1058)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1058&action=view) [details]
> Unit test for lowess.py
> 
> File will need to have the import statements adjusted for the Biopython
> structure.
> 

You're also using scipy and rpy (not Biopython dependencies), so if we wanted
to include these tests they would have to be made conditional on these external
dependencies (so that the test framework knows when it can skip them).  

Removing them effectively leaves one simple test:

from numpy import array
from Bio.Statistics.lowess import lowess

hand_iterations = 1
hand_f = 2./3.
hand_x = array([0.0,1.0,4.0,7.0])
hand_y = array([0.0,1.0,16.0,49.0])
#Was there a typo in the original, 18.85086... versus 18.5086...?
#hand_out = [-1.333391371257, 2.802858739, 18.850860916, 48.302727]
hand_out = [ -1.33338941,   2.80323154,  18.50860916,  48.30274834]
method_out = lowess(hand_x,hand_y,hand_f,hand_iterations)
for a,b in zip(method_out, hand_out) :
    assert abs(a-b) < 0.00001
print "Done"




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 11:35:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:35:44 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141135.mAEBZiCO014367@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #7 from eric.pruitt at gmail.com  2008-11-14 06:35 EST -------
(In reply to comment #6)
> (In reply to comment #4)
> > Created an attachment (id=1058)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1058&action=view) [details] [details]
> > Unit test for lowess.py
> > 
> > File will need to have the import statements adjusted for the Biopython
> > structure.
> > 
> 
> You're also using scipy and rpy (not Biopython dependencies), so if we wanted
> to include these tests they would have to be made conditional on these external
> dependencies (so that the test framework knows when it can skip them).  
> 
> Removing them effectively leaves one simple test:
> 
> from numpy import array
> from Bio.Statistics.lowess import lowess
> 
> hand_iterations = 1
> hand_f = 2./3.
> hand_x = array([0.0,1.0,4.0,7.0])
> hand_y = array([0.0,1.0,16.0,49.0])
> #Was there a typo in the original, 18.85086... versus 18.5086...?
> #hand_out = [-1.333391371257, 2.802858739, 18.850860916, 48.302727]
> hand_out = [ -1.33338941,   2.80323154,  18.50860916,  48.30274834]
> method_out = lowess(hand_x,hand_y,hand_f,hand_iterations)
> for a,b in zip(method_out, hand_out) :
>     assert abs(a-b) < 0.00001
> print "Done"
> 

When I did the hand calculations, I used a TI-84+, which uses decimal math,
eliminating the binary rounding error inherent in most Python implementations.
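
For a sense of scale, here is a minimal sketch (not part of the attached test)
comparing binary floats with Python's decimal module. The 0.1 + 0.2 sum is
only an illustrative case, chosen because it is not exactly representable in
binary:

```python
from decimal import Decimal

# Binary floating point: 0.1 and 0.2 are stored as nearby binary fractions.
binary_sum = 0.1 + 0.2
binary_error = abs(binary_sum - 0.3)

# Decimal arithmetic: the same sum is exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_exact = decimal_sum == Decimal("0.3")

# The binary rounding error is many orders of magnitude below the
# 0.00001 tolerance used in the test above.
within_tolerance = binary_error < 0.00001
```
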




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 11:38:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:38:51 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141138.mAEBcpNd014578@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 06:38 EST -------
(In reply to comment #5)
>> I personally think using an explicit matrix solver is much nicer to read
>> than that complex hand coded version.  Does it really save much time?
>> ...
>> However, I'm going to leave this for Michiel to resolve (given he wrote
>> the code in the first place).
>> 
> 
> Yes, replacing numpy saves quite a bit of time. When I cached the intermediate
> values so they weren't recalculated every single time, it reduced unit test
> time by 17% compared to the original; replacing numpy then reduced it by a net
> 38% from the original, so a huge difference.

OK - so it's clarity versus what sounds like a big speed difference.

> Also, I suggest changing something if you all
> decided to keep numpy. Minor but just a suggestion.
> 
> >             weights_x = weights*x
> >             sum_weights_x = sum(weights_x)
> >             b = numpy.array([sum(weights*y), sum(weights_x*y)])
> >             A = numpy.array([[sum(weights),   sum_weights_x],
> >                        [sum_weights_x, sum(weights_x*x)]])
> 

I see: in defining b, sum(weights*y*x) can be done as sum(weights_x*y), which
avoids creating the temp variable weights_y = weights*y. That does look better.
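
For anyone comparing the two versions: the hand-coded expression is Cramer's
rule applied to the same 2x2 system that numpy.linalg.solve(A, b) solves. A
minimal pure-Python sketch, reusing the variable names from the patch; the
function name and the sample data are only for illustration and are not from
lowess.py:

```python
def local_linear_estimate(x, y, weights, xi):
    """Weighted straight-line fit evaluated at xi via the 2x2 normal equations."""
    theta = [w * xv for w, xv in zip(weights, x)]       # weights*x
    a = sum(weights)                                    # sum(weights)
    b = sum(theta)                                      # sum(weights*x)
    d = sum(t * xv for t, xv in zip(theta, x))          # sum(weights*x*x)
    b_top = sum(w * yv for w, yv in zip(weights, y))    # sum(weights*y)
    b_bot = sum(t * yv for t, yv in zip(theta, y))      # sum(weights*x*y)
    det = a * d - b ** 2                                # determinant of A
    beta0 = (d * b_top - b * b_bot) / det               # intercept
    beta1 = (a * b_bot - b * b_top) / det               # slope
    return beta0 + beta1 * xi, (beta0, beta1)


# Illustrative data lying exactly on y = 2x + 1, so any positive weights
# should recover intercept 1 and slope 2.
x = [0.0, 1.0, 4.0, 7.0]
y = [1.0, 3.0, 9.0, 15.0]
weights = [1.0, 0.5, 0.25, 1.0]
estimate, (beta0, beta1) = local_linear_estimate(x, y, weights, 4.0)
```

Expanding beta0 + beta1*x[i] over the common denominator a*d - b**2 gives the
single-line yest[i] expression in the patch.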




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 11:41:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:41:05 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141141.mAEBf5IS014888@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


eric.pruitt at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|eric.pruitt at gmail.com       |




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 11:48:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:48:07 -0500
Subject: [Biopython-dev] [Bug 2658] New: 1.49b version of PDB Neighborsearch
	still based on Numeric
Message-ID: <bug-2658-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2658

           Summary: 1.49b version of PDB Neighborsearch still based on
                    Numeric
           Product: Biopython
           Version: 1.49b
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: rbickerton at gmail.com


Using python 2.52, running:

python ./lib/python2.5/site-packages/Bio/PDB/NeighborSearch.py

gives:

Traceback (most recent call last):
  File "./lib/python2.5/site-packages/Bio/PDB/NeighborSearch.py", line 138, in
<module>
    ns=NeighborSearch(al)
  File "./lib/python2.5/site-packages/Bio/PDB/NeighborSearch.py", line 41, in
__init__
    assert(self.coords.typecode()=="f")
AttributeError: 'numpy.ndarray' object has no attribute 'typecode'
Exit 1

A bit of Google digging suggested that .typecode()=="f" is a Numarray function
that should be updated to its NumPy equivalent.
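
A minimal sketch of one possible NumPy spelling, assuming the assertion is
meant to check for single-precision floats. The helper name is hypothetical
and this is not necessarily the fix that will be committed:

```python
import numpy


def is_single_precision(coords):
    """NumPy counterpart of the old coords.typecode() == "f" check."""
    # dtype.char is the one-letter type code; "f" means 32-bit float.
    return coords.dtype.char == "f"


coords = numpy.zeros((4, 3), dtype="f")       # array of 32-bit floats
doubles = numpy.zeros((4, 3), dtype="d")      # array of 64-bit floats
```

Comparing coords.dtype == numpy.float32 should be an equivalent test.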




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 12:06:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 07:06:28 -0500
Subject: [Biopython-dev] [Bug 2658] 1.49b version of PDB Neighborsearch
	still based on Numeric
In-Reply-To: <bug-2658-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141206.mAEC6SEp016723@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2658


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         OS/Version|Mac OS                      |All


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 07:06 EST -------
Yes, that does look like an oversight in the Numeric to NumPy migration.

See also Bug 2649 for a related but different issue in Bio.KDTree




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 12:18:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 07:18:25 -0500
Subject: [Biopython-dev] [Bug 2634] PAM30 Matrix doesn't work with qblast
In-Reply-To: <bug-2634-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141218.mAECIPRT017833@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2634


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 07:18 EST -------
Hi Nick,

I hope you got your blast to work.

I don't think we have an issue with Biopython itself, so I'm going to close
this bug.  It would be nice to somehow improve the error handling, but that
doesn't look straightforward.

Peter




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 12:24:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 07:24:16 -0500
Subject: [Biopython-dev] [Bug 2604] test_Restriction failure with Python 2.6
	(also cause error in test_CAPS)
In-Reply-To: <bug-2604-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141224.mAECOGMN018266@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2604


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 07:24 EST -------
I'm going to mark this as fixed given it seems to be OK.

Please reopen this if there are any issues.

Peter




From biopython at maubp.freeserve.co.uk  Fri Nov 14 12:27:23 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 14 Nov 2008 12:27:23 +0000
Subject: [Biopython-dev] Biopython 1.49 beta released
In-Reply-To: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com>
References: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com>
Message-ID: <320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com>

On Sun, Nov 9, 2008 at 3:16 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Dear Biopythoneers,
>
> We are pleased to announce a beta release of Biopython 1.49. There have
> been some significant changes since Biopython 1.48 was released two
> months ago, which is why we are initially releasing a beta for wider
> testing.
>
> As previously announced, the big news is that Biopython now uses NumPy
> rather than its precursor Numeric (the original Numerical Python
> library).

We've had a few Numeric -> NumPy bugs reported,

http://bugzilla.open-bio.org/show_bug.cgi?id=2658
Bug 2658 - Bio.PDB.Neighborsearch

http://bugzilla.open-bio.org/show_bug.cgi?id=2649
Bug 2649 - Bio.KDTree (probably fixed)

I don't think we should release Biopython 1.49 final until these are
resolved - but if there was interest I could put out a second beta.

Peter


From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 13:17:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 08:17:39 -0500
Subject: [Biopython-dev] [Bug 2638] test_PopGen_SimCoal_nodepend.py fails on
	Windows, newline issue
In-Reply-To: <bug-2638-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141317.mAEDHdWo021804@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2638


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 08:17 EST -------
Patch checked in after testing with SIMCOAL2 on Windows XP.

Marking as fixed.




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 15:16:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 10:16:12 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To: <bug-2640-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141516.mAEFGClF031759@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #19 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 10:16 EST -------
I've added a general example doctest to the main docstring for the SeqRecord
object.

Marking as fixed.




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 15:35:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 10:35:18 -0500
Subject: [Biopython-dev] [Bug 2524] Handle missing libraries like numpy or
	reportlab in run_tests.py
In-Reply-To: <bug-2524-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141535.mAEFZIP8001033@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2524


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 10:35 EST -------
Fixed the numpy test cases (they were getting annoying with python 2.6 on
Windows where numpy isn't yet available).  The reportlab tests already fail
gracefully.

I ended up going down this route:

> (b) Modify all the tests using these semi-optional libraries to catch
> the ImportError and raise MissingExternalDependencyError instead.  As
> the tests themselves generally don't directly import the external
> library this is perhaps messy.

Marking this bug as fixed.
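
The pattern in option (b) can be sketched as follows. This is only an
illustration: the exception class here is a local stand-in named after the
one mentioned above, and require() and run_test() are hypothetical helpers,
not the actual run_tests.py code:

```python
import importlib


class MissingExternalDependencyError(Exception):
    """Local stand-in for the exception a test raises when it cannot run."""


def require(module_name):
    """Import a semi-optional library, converting ImportError into a skip signal."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        raise MissingExternalDependencyError(
            "Install %s if you want to run this test." % module_name)


def run_test(test_func):
    """Run one test, reporting a missing dependency as a skip, not a failure."""
    try:
        test_func()
    except MissingExternalDependencyError as err:
        return "skipped: %s" % err
    return "ok"


present = run_test(lambda: require("json"))
missing = run_test(lambda: require("no_such_module_for_this_sketch"))
```
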




From bsouthey at gmail.com  Fri Nov 14 15:39:00 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Fri, 14 Nov 2008 09:39:00 -0600
Subject: [Biopython-dev] Biopython 1.49 beta released
In-Reply-To: <320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com>
References: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com>
	<320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com>
Message-ID: <491D9B94.9050805@gmail.com>

Peter wrote:
> On Sun, Nov 9, 2008 at 3:16 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>   
>> Dear Biopythoneers,
>>
>> We are pleased to announce a beta release of Biopython 1.49. There have
>> been some significant changes since Biopython 1.48 was released two
>> months ago, which is why we are initially releasing a beta for wider
>> testing.
>>
>> As previously announced, the big news is that Biopython now uses NumPy
>> rather than its precursor Numeric (the original Numerical Python
>> library).
>>     
>
> We've had a few Numeric -> NumPy bugs reported,
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2658
> Bug 2658 - Bio.PDB.Neighborsearch
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2649
> Bug 2649 - Bio.KDTree (probably fixed)
>
> I don't think we should release Biopython 1.49 final until these are
> resolved - but if there was interest I could put out a second beta.
>
> Peter
>
>   
I noticed that Bio.PDB.Neighborsearch is not being tested.

Is there some way to identify which functions are not getting tested?
I know it is considerable effort but it would allow the development of 
tests that at the very least exercise all the Biopython code. (Hopefully 
this is not as bad as the Numpy documentation marathon.)

Bruce


From biopython at maubp.freeserve.co.uk  Fri Nov 14 15:46:34 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 14 Nov 2008 15:46:34 +0000
Subject: [Biopython-dev] Biopython 1.49 beta released
In-Reply-To: <491D9B94.9050805@gmail.com>
References: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com>
	<320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com>
	<491D9B94.9050805@gmail.com>
Message-ID: <320fb6e00811140746m119a040dv778163e0ab034a2@mail.gmail.com>

On Fri, Nov 14, 2008 at 3:39 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Peter wrote:
>> We've had a few Numeric -> NumPy bugs reported,
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2658
>> Bug 2658 - Bio.PDB.Neighborsearch
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2649
>> Bug 2649 - Bio.KDTree (probably fixed)
>>
>> ...
>
> I noticed that Bio.PDB.Neighborsearch is not being tested.
>

The fact that we didn't spot Bug 2658 from the unit tests makes that
very clear ;)

>
> Is there some way to identify which functions are not getting tested?
>

I can't think of an easy way - the best bet might be a quick script to
scan all the unit tests and pull out import lines, and from this build
a list of all modules which have some coverage.  This wouldn't tell us
about how much of each module is tested, but it would be better than
nothing.
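
A rough sketch of what such a script could look like, working on the text of
each test file. The regular expressions and function names are only an
illustration; they ignore multi-line imports and will also match import
lines inside comments or strings:

```python
import re

# Crude patterns for "from Bio... import ..." and "import Bio..." lines.
FROM_IMPORT = re.compile(r"^\s*from\s+(Bio[\w.]*)\s+import\s+(.+)$", re.MULTILINE)
PLAIN_IMPORT = re.compile(r"^\s*import\s+(Bio[\w.]*)", re.MULTILINE)


def imported_bio_modules(source):
    """Return the set of Bio modules named on import lines in source text."""
    found = set()
    for module, names in FROM_IMPORT.findall(source):
        if module == "Bio":
            # "from Bio import SeqIO, Seq as S" names the submodules directly.
            for name in names.split(","):
                found.add("Bio." + name.split(" as ")[0].strip())
        else:
            found.add(module)
    found.update(PLAIN_IMPORT.findall(source))
    return found


def untested(all_modules, test_sources):
    """List modules that no test source imports, in sorted order."""
    covered = set()
    for source in test_sources:
        covered |= imported_bio_modules(source)
    return sorted(set(all_modules) - covered)


example = ("from Bio import SeqIO, Seq as S\n"
           "from Bio.PDB import PDBParser\n"
           "import Bio.KDTree\n")
missing = untested(["Bio.PDB", "Bio.Affy", "Bio.KDTree"], [example])
```

In practice the test_sources list would be built by reading each file in the
Tests directory.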

> I know it is considerable effort but it would allow the development of tests
> that at the very least exercise all the Biopython code. (Hopefully this is
> not as bad as the Numpy documentation marathon.)

I've written plenty of tests myself, including for existing modules -
my gut feeling is full test coverage would be quite a marathon.

Compared to the early years of the project, I've probably tried to be
a bit stricter about making sure we have test cases and documentation
before accepting new code.  In some cases this has worked out pretty
well (e.g. Tiago's PopGen stuff is covered in the tutorial and has
unit tests).  In other cases it could put people off contributing
code.

Peter


From biopython at maubp.freeserve.co.uk  Fri Nov 14 17:24:33 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 14 Nov 2008 17:24:33 +0000
Subject: [Biopython-dev] Test coverage
Message-ID: <320fb6e00811140924g26cc0703r2629380540a5b667@mail.gmail.com>

Bruce:
>>
>> Is there some way to identify which functions are not getting tested?
>>

Peter:
> I can't think of an easy way - the best bet might be a quick script to
> scan all the unit tests and pull out import lines, and from this build
> a list of all modules which have some coverage.  This wouldn't tell us
> about how much of each module is tested, but it would be better than
> nothing.

I've done a very crude script to try and answer this, and can point
out a few modules in need of tests:

Bio.Affy
Bio.AlignAce
Bio.EZRetrieve
Bio.Emboss (everything except the primer parsers)
Bio.Encodings (obsolete?)
Bio.FilteredReader (obsolete?)
Bio.MaxEntropy
Bio.NMR
Bio.NaiveBayes
Bio.NetCatch (obsolete?)

Peter


From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 18:06:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 13:06:49 -0500
Subject: [Biopython-dev] [Bug 2659] New: Typo in tutorial section "2.1
	General overview of what Biopython provides"
Message-ID: <bug-2659-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2659

           Summary: Typo in tutorial section "2.1  General overview of what
                    Biopython provides"
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: trivial
          Priority: P2
         Component: Documentation
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: wilcoxjg at gmail.com


Sentence reads:
 "To me, this can be frustrating since I often WAY to just know the one right
way to do something."

Should be: 
 "To me, this can be frustrating since I often WANT to just know the one right
way to do something."




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 18:16:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 13:16:18 -0500
Subject: [Biopython-dev] [Bug 2660] New: Typo in tutorial section "2.2
	Working with sequences"
Message-ID: <bug-2660-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2660

           Summary: Typo in tutorial section "2.2  Working with sequences"
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P2
         Component: Documentation
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: wilcoxjg at gmail.com


Sentence reads:

"What we have here is a sequence object with a generic alphabet - reflecting
the fact WE HAVE SPECIFIED if this is a DNA or protein sequence (okay, a
protein with a lot of Alanines, Glycines, Cysteines and Threonines!)."

Should read:

"What we have here is a sequence object with a generic alphabet - reflecting
the fact we have NOT specified if this is a DNA or protein sequence (okay, a
protein with a lot of Alanines, Glycines, Cysteines and Threonines!)."




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 18:28:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 13:28:12 -0500
Subject: [Biopython-dev] [Bug 2659] Typo in tutorial section "2.1 General
	overview of what Biopython provides"
In-Reply-To: <bug-2659-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141828.mAEISCmZ013084@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2659


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 13:28 EST -------
Thanks :)

That's fixed in CVS now, see Doc/Tutorial.tex revision 1.185, which you can
view online here (updated every hour):

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/Tutorial.tex?cvsroot=biopython

We'll update the HTML and PDF on the website as part of the next release
(Biopython 1.49).




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 18:34:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 13:34:34 -0500
Subject: [Biopython-dev] [Bug 2661] New: Typo in: "2.3  A usage example"
Message-ID: <bug-2661-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2661

           Summary: Typo in: "2.3  A usage example"
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: trivial
          Priority: P2
         Component: Documentation
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: wilcoxjg at gmail.com


Sentence reads:

"We'll start with sequence parsing in Section 2.4, but the orchids will be
back later on as well - for example WE'LL EXTRA DATA FROM Swiss-Prot from
certain orchid proteins in Section 6.1, search PubMed for papers about orchids
in Section 6.2, extract sequence data from GenBank in Section 6.3.1, and work
with ClustalW multiple sequence alignments of orchid proteins in Section
6.4.1."

Capitalized phrase should contain some modifier like "we'll NEED extra", or
"we'll GET extra".




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 18:34:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 13:34:49 -0500
Subject: [Biopython-dev] [Bug 2660] Typo in tutorial section "2.2 Working
	with sequences"
In-Reply-To: <bug-2660-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141834.mAEIYnm6013826@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2660


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 13:34 EST -------
The tutorial on the website (matching Biopython 1.49b) is fine:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Which version of Biopython are you using (you didn't fill this in on the bug
report), or where are you reading this?

Looking over CVS this text was only like this in Biopython 1.44, so I'm a
little confused.

Thanks,

Peter




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 18:38:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 13:38:06 -0500
Subject: [Biopython-dev] [Bug 2661] Typo in: "2.3  A usage example"
In-Reply-To: <bug-2661-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141838.mAEIc6Qo014131@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2661


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 13:38 EST -------
As per Bug 2660, which version of Biopython are you using (you didn't fill this
in on the bug report), or where are you reading this?

This has already been fixed to say "extract" instead of "extra" (but I'm not
going to check exactly when this was corrected).




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 18:40:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 13:40:28 -0500
Subject: [Biopython-dev] [Bug 2660] Typo in tutorial section "2.2 Working
	with sequences"
In-Reply-To: <bug-2660-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141840.mAEIeSsm014238@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2660


wilcoxjg at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|Not Applicable              |1.44




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 18:41:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 13:41:47 -0500
Subject: [Biopython-dev] [Bug 2661] Typo in: "2.3  A usage example"
In-Reply-To: <bug-2661-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141841.mAEIfll7014298@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2661


wilcoxjg at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|Not Applicable              |1.44




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 18:47:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 13:47:28 -0500
Subject: [Biopython-dev] [Bug 2660] Typo in tutorial section "2.2 Working
	with sequences"
In-Reply-To: <bug-2660-42@http.bugzilla.open-bio.org/>
Message-ID: <200811141847.mAEIlS8Y014586@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2660


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 13:47 EST -------
Hi Josh,

If you were reading the tutorial shipped with Biopython 1.44 this makes sense. 
I certainly don't want to put you off reporting any other typos, but if you
find any more please first check against the (almost completely) up to date
version before reporting them:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Note that some of the things covered in the current tutorial will not apply to
Biopython 1.44, which is now a year old.  I'd encourage you to upgrade if
possible.

Thanks,

Peter

P.S. Marking this bug as fixed.




From mhampton at d.umn.edu  Fri Nov 14 19:48:42 2008
From: mhampton at d.umn.edu (Marshall Hampton)
Date: Fri, 14 Nov 2008 13:48:42 -0600 (CST)
Subject: [Biopython-dev] coverage of function testing
Message-ID: <Pine.SOC.4.64.0811141338280.4396@ub.d.umn.edu>


Hi,

I noticed some discussion of the coverage and automation of testing for 
functions in biopython, and thought I would suggest folks check out the 
testing and coverage tools in Sage (www.sagemath.org).  Testing of 
functions in Sage is done by testing examples in their docstrings - there 
are comments to opt out of testing or to indicate if they will take a long 
time.  They also have scripts for checking which functions have at least 
one such testable example.  So you can do something like this:


sage -coverage PATH_TO_SAGE/sage/geometry/polyhedra.py

and get

SCORE
/Volumes/D/sage-3.2.alpha0/devel/sage-main/sage/geometry/polyhedra.py:
100% (21 of 21)

to see if anything is untested.
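
For anyone who wants to try the same idea without Sage, here is a rough
sketch using only the Python standard library. It is not Sage's coverage
script: the function name doctest_coverage and the in-memory "demo" module
are made up for illustration, and it only checks whether a docstring
contains at least one ">>>" example.

```python
import doctest
import inspect
import types


def doctest_coverage(module):
    """Return (tested, total, missing) for a module's public functions."""
    parser = doctest.DocTestParser()
    tested = 0
    missing = []
    for name, func in inspect.getmembers(module, inspect.isfunction):
        # Skip private helpers and functions imported from elsewhere.
        if name.startswith("_") or func.__module__ != module.__name__:
            continue
        # A function counts as tested if its docstring has a ">>>" example.
        if parser.get_examples(inspect.getdoc(func) or ""):
            tested += 1
        else:
            missing.append(name)
    return tested, tested + len(missing), missing


# Hypothetical two-function module, built in memory for the demonstration.
demo = types.ModuleType("demo")
exec('''
def documented(x):
    """Double x.

    >>> documented(2)
    4
    """
    return 2 * x

def undocumented(x):
    return x
''', demo.__dict__)

tested, total, missing = doctest_coverage(demo)
print("%d of %d functions have doctests; missing: %s" % (tested, total, missing))
```

Pointing the same function at a real module would give a per-module score
similar in spirit to the Sage output above.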

Now that biopython is converting to numpy, I will start arguing for its 
inclusion as a standard part of Sage (right now it is an optional 
package).


Cheers,

Marshall Hampton
Integrated Biosciences Program and
Department of Mathematics and Statistics
University of Minnesota, Duluth


From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 20:27:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 15:27:12 -0500
Subject: [Biopython-dev] [Bug 2662] New: Typo in tutorial "Chapter 3
	Sequence objects "
Message-ID: <bug-2662-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2662

           Summary: Typo in tutorial "Chapter 3 Sequence objects "
           Product: Biopython
           Version: 1.49b
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: trivial
          Priority: P2
         Component: Documentation
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: wilcoxjg at gmail.com


Sentence reads:                                                                 

"First of all the Seq object has a slightly different set of METHODS TO A PLAIN
python string (for example, reverse_complement() and translate() methods used
for nucleotide sequences)."

Should be:
"methods THAN a plain python string"




From biopython at maubp.freeserve.co.uk  Fri Nov 14 20:29:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 14 Nov 2008 20:29:16 +0000
Subject: [Biopython-dev] coverage of function testing
In-Reply-To: <Pine.SOC.4.64.0811141338280.4396@ub.d.umn.edu>
References: <Pine.SOC.4.64.0811141338280.4396@ub.d.umn.edu>
Message-ID: <320fb6e00811141229j3aa3a7b6ra3a064842e8f007c@mail.gmail.com>

On Fri, Nov 14, 2008 at 7:48 PM, Marshall Hampton <mhampton at d.umn.edu> wrote:
> Hi,
>
> I noticed some discussion of the coverage and automation of testing for
> functions in biopython, and thought I would suggest folks check out the
> testing and coverage tools in Sage (www.sagemath.org).  Testing of functions
> in Sage is done by testing examples in their docstrings - there are comments
> to opt out of testing or to indicate if they will take a long time.  They
> also have scripts for checking which functions have at least one such
> testable example.  So you can do something like this:
>
> sage -coverage PATH_TO_SAGE/sage/geometry/polyhedra.py
>
> and get
>
> SCORE
> /Volumes/D/sage-3.2.alpha0/devel/sage-main/sage/geometry/polyhedra.py:
> 100% (21 of 21)
>
> to see if anything is untested.

That may be worth a go, but there are two sides to this:
(1) Making a list of the code that needs testing (pretty much the same
for any python library)
(2) Working out what is already tested (and here, that means going
over Biopython's test framework which is based on unit test, but also
includes some use of doctests).  This is probably trickier...

> Now that biopython is converting to numpy, I will start arguing for its
> inclusion as a standard part of Sage (right now it is an optional package).

That sounds good - but I have no knowledge of the Sage system and how
they divide things up.

Peter


From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 23:15:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 18:15:57 -0500
Subject: [Biopython-dev] [Bug 2662] Typo in tutorial "Chapter 3 Sequence
	objects "
In-Reply-To: <bug-2662-42@http.bugzilla.open-bio.org/>
Message-ID: <200811142315.mAENFvNc000930@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2662


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 18:15 EST -------
(In reply to comment #0)
> Sentence reads:                                                                 
> 
> "First of all the Seq object has a slightly different set of METHODS TO A
> PLAIN python string (for example, reverse_complement() and translate()
> methods used for nucleotide sequences)."

There's nothing wrong with that (and I got a second opinion on this too).  The
only thing I think that might need changing is adding a comma: "First of all,
the Seq object...".

> Should be:
> "methods THAN a plain python string"

Why exactly?  Are you an American? ;)

There is also the possible option of "... different ... from ...", but that
doesn't flow as nicely here.

Peter




From bugzilla-daemon at portal.open-bio.org  Fri Nov 14 23:47:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 18:47:16 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811142347.mAENlG5D003824@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #9 from eric.pruitt at gmail.com  2008-11-14 18:47 EST -------
Created an attachment (id=1059)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1059&action=view)
Test for speed comparison

I wrote a short program to compare the speed of the original lowess function to
my version. I thought the way the unit test was written might have affected the
results. On my system, the new version averaged 15 seconds per test as opposed
to 19 for the old one, so not the boost I originally purported but closer to
27%. I am posting the program so someone else can compare.
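
The comment does not say how the 27% was derived. A minimal sketch of the
arithmetic, assuming it is the ratio of the old time to the new time, with
the reduction in run time shown alongside for comparison:

```python
old_seconds = 19.0  # average per test, original lowess
new_seconds = 15.0  # average per test, revised lowess

# Assumed reading of "27%": the old version takes this much longer.
speedup = old_seconds / new_seconds - 1.0

# Alternative reading: fraction of the old run time that is saved.
time_saved = 1.0 - new_seconds / old_seconds

print("old version takes %.0f%% longer" % (100 * speedup))
print("new version saves %.0f%% of the run time" % (100 * time_saved))
```

The two figures differ because they use different baselines (the new time
versus the old time).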




From bugzilla-daemon at portal.open-bio.org  Sat Nov 15 02:06:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 21:06:49 -0500
Subject: [Biopython-dev] [Bug 2658] 1.49b version of PDB Neighborsearch
	still based on Numeric
In-Reply-To: <bug-2658-42@http.bugzilla.open-bio.org/>
Message-ID: <200811150206.mAF26nhu013792@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2658


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp  2008-11-14 21:06 EST -------
Fixed in CVS; see Bio/PDB/NeighborSearch.py revision 1.21.




From bugzilla-daemon at portal.open-bio.org  Sat Nov 15 03:59:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 22:59:22 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811150359.mAF3xM8D020801@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp  2008-11-14 22:59 EST -------
This warning is due to the introduction of Py_ssize_t in Python 2.5. The best
solution for this bug depends on which Python versions will be supported by
Biopython.




From bugzilla-daemon at portal.open-bio.org  Sat Nov 15 04:04:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 23:04:00 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811150404.mAF4403S021350@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp  2008-11-14 23:04 EST -------
A few comments:

1) Is there a reason to use numpy.abs instead of Python's built-in abs? Timing
these two functions suggests that they are equally fast.
2) I have no objection against James' suggestion to speed up the code. The
original call to numpy.linalg.solve was probably overkill.
3) Can you submit a unit test that does not use scipy and rpy? We should avoid
adding additional dependencies to Biopython.
4) In the long run, I am not sure whether Biopython is the right place for the
lowess function. Probably NumPy or Matplotlib would be better. (that shouldn't
stop us from improving the code here, though).
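
For point 1, a timing comparison along these lines can be repeated with the
timeit module. This is only a sketch: the array size and repeat count are
arbitrary choices, not the ones used for the measurement mentioned above,
and the numbers will vary from machine to machine.

```python
import timeit

import numpy

# Arbitrary test data; lowess.py works on arrays of residuals like this.
x = numpy.linspace(-1.0, 1.0, 1000)

# On a NumPy array the built-in abs() dispatches to the array's own
# __abs__ method, so both calls return the same array.
same_result = numpy.array_equal(abs(x), numpy.abs(x))

builtin_time = timeit.timeit(lambda: abs(x), number=1000)
numpy_time = timeit.timeit(lambda: numpy.abs(x), number=1000)

print("results identical: %s" % same_result)
print("built-in abs: %.4f s, numpy.abs: %.4f s" % (builtin_time, numpy_time))
```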




From bugzilla-daemon at portal.open-bio.org  Sat Nov 15 07:16:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 15 Nov 2008 02:16:11 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811150716.mAF7GB1r002223@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp  2008-11-15 02:16 EST -------
I have uploaded a fixed version to CVS. Could you try it? Bio/triemodule.c,
revision 1.7.




From bugzilla-daemon at portal.open-bio.org  Sat Nov 15 16:29:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 15 Nov 2008 11:29:53 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811151629.mAFGTrgj008598@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #11 from eric.pruitt at gmail.com  2008-11-15 11:29 EST -------
(In reply to comment #10)
> A few comments:
> 
> 1) Is there a reason to use numpy.abs instead of Python's built-in abs? Timing
> these two functions suggests that they are equally fast.
> 2) I have no objection against James' suggestion to speed up the code. The
> original call to numpy.linalg.solve was probably overkill.
> 3) Can you submit a unit test that does not use scipy and rpy? We should avoid
> adding additional dependencies to Biopython.
> 4) In the long run, I am not sure whether Biopython is the right place for the
> lowess function. Probably NumPy or Matplotlib would be better. (that shouldn't
> stop us from improving the code here, though).
> 

Yes, I only had the scipy and rpy dependencies in my unit test because I wanted
to have something to compare your function to when I was going to first use it
in my code and to make sure it worked after I made changes to it.




From bugzilla-daemon at portal.open-bio.org  Sat Nov 15 17:07:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 15 Nov 2008 12:07:36 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811151707.mAFH7aZM010885@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


eric.pruitt at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1057 is|0                           |1
           obsolete|                            |


------- Comment #12 from eric.pruitt at gmail.com  2008-11-15 12:07 EST -------
Created an attachment (id=1060)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1060&action=view)
Updated lowess.py

Renamed "theta" to a more logical name, "weighted_mul_x". Replaced numpy.abs
with the built-in abs (this actually led to a very slight but measurable speed
increase).




From bugzilla-daemon at portal.open-bio.org  Sat Nov 15 17:08:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 15 Nov 2008 12:08:15 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811151708.mAFH8F6n010936@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


eric.pruitt at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1058 is|0                           |1
           obsolete|                            |


------- Comment #13 from eric.pruitt at gmail.com  2008-11-15 12:08 EST -------
Created an attachment (id=1061)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1061&action=view)
Unit test for lowess.py removing scipy and rpy dependencies




From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 08:36:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 03:36:32 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811170836.mAH8aWoY027949@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #14 from mdehoon at ims.u-tokyo.ac.jp  2008-11-17 03:36 EST -------
I have uploaded the new code and the unit test with some modifications to CVS.
Could you have a look at it to see if you're happy with the result? I am using
numpy.dot(x,y) instead of sum(x*y) wherever possible; this gave an additional
speedup.
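
To illustrate the substitution for one-dimensional arrays, here is a small
sketch. The arrays are made-up values, not data from lowess.py, and
numpy.sum stands in for whichever sum the original code called.

```python
import numpy

# Made-up one-dimensional arrays for illustration.
x = numpy.array([1.0, 2.0, 3.0])
y = numpy.array([4.0, 5.0, 6.0])

# Elementwise product followed by a sum: builds a temporary array first.
elementwise = numpy.sum(x * y)

# The inner product gives the same number without the temporary array.
dotted = numpy.dot(x, y)

print("sum(x*y) = %s, dot(x,y) = %s" % (elementwise, dotted))
```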




From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 10:33:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 05:33:37 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811171033.mAHAXbbS003922@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-17 05:33 EST -------
I haven't tried this on Linux yet.

===================================

I've just updated to CVS and rebuilt on Windows with mingw32 (gcc 3.4.4 cygming
special), using Python 2.3, 2.4, 2.5 and 2.6 - no warnings from the Bio.Trie
code.  I should have checked for any warnings BEFORE updating to CVS, but
didn't.

===================================

However, on Mac OS X 10.5 "Leopard" I now get a lot of pointer warnings:

building 'Bio.trie' extension
creating build/temp.macosx-10.3-i386-2.5
creating build/temp.macosx-10.3-i386-2.5/Bio
gcc -arch ppc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk
-fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd
-fno-common -dynamic -DNDEBUG -g -O3 -IBio
-I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -c
Bio/triemodule.c -o build/temp.macosx-10.3-i386-2.5/Bio/triemodule.o
Bio/triemodule.c: In function '_write_value_to_handle':
Bio/triemodule.c:498: warning: passing argument 3 of
'PyString_AsStringAndSize' from incompatible pointer type
Bio/triemodule.c: In function '_write_value_to_handle':
Bio/triemodule.c:498: warning: passing argument 3 of
'PyString_AsStringAndSize' from incompatible pointer type
gcc -arch ppc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk
-fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd
-fno-common -dynamic -DNDEBUG -g -O3 -IBio
-I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -c
Bio/trie.c -o build/temp.macosx-10.3-i386-2.5/Bio/trie.o
Bio/trie.c: In function 'Trie_set':
Bio/trie.c:103: warning: pointer targets in passing argument 1 of 'strdup'
differ in signedness
Bio/trie.c:156: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:162: warning: pointer targets in passing argument 1 of 'strncpy'
differ in signedness
Bio/trie.c:162: warning: pointer targets in passing argument 2 of 'strncpy'
differ in signedness
Bio/trie.c:164: warning: pointer targets in passing argument 1 of 'strdup'
differ in signedness
Bio/trie.c: In function 'Trie_set':
Bio/trie.c:103: warning: pointer targets in passing argument 1 of 'strdup'
differ in signedness
Bio/trie.c:156: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:162: warning: pointer targets in passing argument 1 of 'strncpy'
differ in signedness
Bio/trie.c:162: warning: pointer targets in passing argument 2 of 'strncpy'
differ in signedness
Bio/trie.c:164: warning: pointer targets in passing argument 1 of 'strdup'
differ in signedness
Bio/trie.c: In function 'Trie_get':
Bio/trie.c:229: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of 'strncmp'
differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 2 of 'strncmp'
differ in signedness
Bio/trie.c:235: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c: In function '_get_approximate_transition':
Bio/trie.c:268: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:272: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:272: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:284: warning: pointer targets in passing argument 1 of 'strncat'
differ in signedness
Bio/trie.c:284: warning: pointer targets in passing argument 2 of 'strncat'
differ in signedness
Bio/trie.c: In function 'Trie_get':
Bio/trie.c:229: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of 'strncmp'
differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 2 of 'strncmp'
differ in signedness
Bio/trie.c:235: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c: In function '_get_approximate_transition':
Bio/trie.c:268: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:272: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:272: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c: In function '_get_approximate_trie':
Bio/trie.c:353: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:355: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:284: warning: pointer targets in passing argument 1 of 'strncat'
differ in signedness
Bio/trie.c:356: warning: pointer targets in passing argument 1 of 'strcat'
differ in signedness
Bio/trie.c:284: warning: pointer targets in passing argument 2 of 'strncat'
differ in signedness
Bio/trie.c:356: warning: pointer targets in passing argument 2 of 'strcat'
differ in signedness
Bio/trie.c:367: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:369: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c: In function '_get_approximate_trie':
Bio/trie.c:353: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:355: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c: In function 'Trie_has_prefix':
Bio/trie.c:356: warning: pointer targets in passing argument 1 of 'strcat'
differ in signedness
Bio/trie.c:440: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:356: warning: pointer targets in passing argument 2 of 'strcat'
differ in signedness
Bio/trie.c:441: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of 'strncmp'
differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 2 of 'strncmp'
differ in signedness
Bio/trie.c:367: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:369: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c: In function '_iterate_helper':
Bio/trie.c:468: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:470: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:475: warning: pointer targets in passing argument 1 of 'strcat'
differ in signedness
Bio/trie.c:475: warning: pointer targets in passing argument 2 of 'strcat'
differ in signedness
Bio/trie.c: In function 'Trie_has_prefix':
Bio/trie.c:440: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:441: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c: In function '_with_prefix_helper':
Bio/trie.c:521: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of 'strncmp'
differ in signedness
Bio/trie.c:522: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 2 of 'strncmp'
differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of 'strncmp'
differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 2 of 'strncmp'
differ in signedness
Bio/trie.c:530: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:536: warning: pointer targets in passing argument 1 of 'strncat'
differ in signedness
Bio/trie.c:536: warning: pointer targets in passing argument 2 of 'strncat'
differ in signedness
Bio/trie.c: In function '_iterate_helper':
Bio/trie.c:468: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:470: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:475: warning: pointer targets in passing argument 1 of 'strcat'
differ in signedness
Bio/trie.c:475: warning: pointer targets in passing argument 2 of 'strcat'
differ in signedness
Bio/trie.c: In function '_with_prefix_helper':
Bio/trie.c:521: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:522: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c: In function '_serialize_transition':
Bio/trie.c:524: warning: pointer targets in passing argument 1 of 'strncmp'
differ in signedness
Bio/trie.c:621: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 2 of 'strncmp'
differ in signedness
Bio/trie.c:530: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c:536: warning: pointer targets in passing argument 1 of 'strncat'
differ in signedness
Bio/trie.c:536: warning: pointer targets in passing argument 2 of 'strncat'
differ in signedness
Bio/trie.c: In function '_serialize_transition':
Bio/trie.c:621: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness
Bio/trie.c: In function '_deserialize_transition':
Bio/trie.c:708: warning: pointer targets in passing argument 1 of 'strdup'
differ in signedness
Bio/trie.c: In function 'test':
Bio/trie.c:752: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:753: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:754: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:755: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:757: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:758: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:759: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c: In function '_deserialize_transition':
Bio/trie.c:708: warning: pointer targets in passing argument 1 of 'strdup'
differ in signedness
Bio/trie.c:760: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:762: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:763: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:765: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:768: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:769: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c: In function 'test':
Bio/trie.c:752: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:753: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:754: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:755: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:757: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:758: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:759: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:760: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:762: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:763: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:765: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
Bio/trie.c:768: warning: pointer targets in passing argument 2 of
'Trie_set' differ in signedness
Bio/trie.c:769: warning: pointer targets in passing argument 2 of
'Trie_get' differ in signedness
gcc -arch i386 -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk -g -bundle
-undefined dynamic_lookup build/temp.macosx-10.3-i386-2.5/Bio/triemodule.o
build/temp.macosx-10.3-i386-2.5/Bio/trie.o -o
build/lib.macosx-10.3-i386-2.5/Bio/trie.so

$ python
Python 2.5.2 (r252:60911, Feb 22 2008, 07:57:53) 
[GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

$ gcc -v
Using built-in specs.
Target: i686-apple-darwin9
Configured with: /var/tmp/gcc/gcc-5465~16/src/configure --disable-checking
-enable-werror --prefix=/usr --mandir=/share/man
--enable-languages=c,objc,c++,obj-c++
--program-transform-name=/^[cg][^.-]*$/s/$/-4.0/
--with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib
--build=i686-apple-darwin9 --with-arch=apple --with-tune=generic
--host=i686-apple-darwin9 --target=i686-apple-darwin9
Thread model: posix
gcc version 4.0.1 (Apple Inc. build 5465)

Note that this gcc is only 4.0.1, while Bruce reported this bug on 4.3.2.

The good news is test_trie.py and test_triefind.py still pass.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 10:41:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 05:41:35 -0500
Subject: [Biopython-dev] [Bug 2666] New: Bio.PDB.NeighborSearch self test
	often fails with MemoryError
Message-ID: <bug-2666-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2666

           Summary: Bio.PDB.NeighborSearch self test often fails with
                    MemoryError
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


From the Biopython source code (from CVS), in the Bio/PDB folder, running
NeighborSearch.py does a quick self test.

This is a random test, and sometimes this is fine:

$ python NeighborSearch.py 
Found  1
Found  4
Found  3
Found  2
Found  2
Found  2
Found  3
Found  3
Found  1
Found  5
Found  2
Found  3
Found  2
Found  2
Found  2
Found  6
Found  3
Found  2
Found  3
Found  1

However, about 50% of the time I get something like this:

$ python NeighborSearch.py 
Found  2
Found  1
Found  2
Found  1
Found  1
Found  1
Found  4
Found 
Traceback (most recent call last):
  File "NeighborSearch.py", line 139, in <module>
    print "Found ", len(ns.search_all(5.0))
  File "NeighborSearch.py", line 104, in search_all
    self.kdt.all_search(radius)
  File
"/Users/pjcock/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/KDTree/KDTree.py",
line 198, in all_search
    self.neighbors = self.kdt.neighbor_search(radius)
MemoryError: calculation failed due to lack of memory

I've tried this on a Mac which had over 4GB of RAM free at the time, so I don't
believe this really is a MemoryError.

I've also tried this on a less powerful Windows machine, which fails in the
same way (it can finish the test, but possibly with a lower success rate).

[As an aside, I'm planning to use this self test to create an actual Biopython
unit test for the Bio.PDB.NeighborSearch module.]




From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 11:42:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 06:42:24 -0500
Subject: [Biopython-dev] [Bug 2666] Bio.PDB.NeighborSearch self test often
	fails with KDTree MemoryError
In-Reply-To: <bug-2666-42@http.bugzilla.open-bio.org/>
Message-ID: <200811171142.mAHBgOD9008929@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2666


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Bio.PDB.NeighborSearch self |Bio.PDB.NeighborSearch self
                   |test often fails with       |test often fails with KDTree
                   |MemoryError                 |MemoryError


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-17 06:42 EST -------
I suspect this is failing when there are NO entries found within the specified
radius.  Changing this line:

print "Found ", len(ns.search_all(5.0))

to use a larger search radius seems to "fix" the test, e.g.

print "Found ", len(ns.search_all(10.0))

Similarly, dropping it to radius 2.0 makes it fail almost every time.  From
the traceback, I suspect something is amiss in the KDTree C code.
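
For reference, the behaviour one would expect from an all-pairs radius
search can be sketched with a brute-force check. This is only an
illustration in current Python 3, not the Bio.KDTree or Bio.PDB API: the
helper name brute_force_all_pairs, the box size and the number of points
are all made up here. The point is that a search which finds nothing
should return an empty result rather than raise an error.

```python
import itertools
import math
import random


def brute_force_all_pairs(points, radius):
    """Return index pairs (i, j), i < j, whose points lie within `radius`.

    Hypothetical reference helper, not part of Bio.KDTree or Bio.PDB.
    """
    pairs = []
    for (i, p), (j, q) in itertools.combinations(enumerate(points), 2):
        if math.dist(p, q) <= radius:
            pairs.append((i, j))
    return pairs


# 20 random coordinates in an arbitrary 100 x 100 x 100 box.
rng = random.Random(0)
points = [tuple(rng.uniform(0.0, 100.0) for _ in range(3)) for _ in range(20)]

# A tiny radius finds nothing: the correct answer is an empty list.
no_hits = brute_force_all_pairs(points, 1e-9)
# A radius larger than the box diagonal finds every pair.
all_hits = brute_force_all_pairs(points, 200.0)

print("Found ", len(no_hits))
print("Found ", len(all_hits))
```

A unit test could compare the KDTree result against such a brute-force
answer for several radii, including one small enough to give no hits.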




From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 11:44:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 06:44:45 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811171144.mAHBijrj009171@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp  2008-11-17 06:44 EST -------
(In reply to comment #3)
Yes I know; that is bug #2608.




From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 12:09:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 07:09:15 -0500
Subject: [Biopython-dev] [Bug 2666] Bio.PDB.NeighborSearch self test often
	fails with KDTree MemoryError
In-Reply-To: <bug-2666-42@http.bugzilla.open-bio.org/>
Message-ID: <200811171209.mAHC9FUF010799@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2666


------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp  2008-11-17 07:09 EST -------
I fixed Bio.KDTree and committed it to CVS; please give it a try.




From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 12:14:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 07:14:19 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811171214.mAHCEJa0011060@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-17 07:14 EST -------
(In reply to comment #4)
> (In reply to comment #3)
> Yes I know; that is bug #2608.
> 

Oh.  Sorry - I had seen Bug 2608 but hadn't made the connection.

I've just confirmed Linux with gcc 4.1.2 is still happy.

Over to Bruce to test with gcc 4.3.2 then...




From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 12:25:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 07:25:21 -0500
Subject: [Biopython-dev] [Bug 2666] Bio.PDB.NeighborSearch self test often
	fails with KDTree MemoryError
In-Reply-To: <bug-2666-42@http.bugzilla.open-bio.org/>
Message-ID: <200811171225.mAHCPLmC011729@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2666


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-17 07:25 EST -------
That's fixed it - thanks!

I've also updated test_PDB.py to include a quick test of this code, based on
the Bio/PDB/NeighborSearch.py self test code.




From tiagoantao at gmail.com  Mon Nov 17 13:27:51 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 17 Nov 2008 13:27:51 +0000
Subject: [Biopython-dev] PopGen.Stats
Message-ID: <6d941f120811170527g752c28a7j48b42569c947853d@mail.gmail.com>

After too much thinking and too much delaying (delaying in two
distinct senses: delaying this proposal, and taking more than a year
over the module), here is my proposal on how to proceed.

A reminder of a few fundamental points:

1. Statistics is the core of population genetics. Bio.PopGen will
never be relevant without it.
2. The framework should be future proof.
3. The API should be for general use (i.e. not only based on the cases
the developers know of).
4. It is very difficult to have a broad view of how an API like this
can be used (uses vary from population genetics of cancer with
microarrays and lots of data to conservation genetics of species with
a few samples and a small number of loci).

A waterfall approach to development is not only outdated, it would
also be quite counterproductive. So I have no bureaucratic design
document to provide.
My proposal is to choose a bunch of statistics and tests that are
representative of what people might use and implement them. During the
implementation, a reasonable API should take form through refactoring.
Which statistics should be chosen then? What are representative statistics?

I was able to find a list of classifications to start with. This list
took some inspiration from the very good Arlequin manual. Here are the
different dimensions that I found:
1. Intra-population versus inter-population statistics. Say expected
heterozygosity versus Fst.
2. Marker dependent vs marker independent. Say allelic range (for
microsatellites only) versus Fis.
3. Data type: haplotypic, genotypic phase unknown, genotypic phase
known, genotypic dominant, frequency only. Say for expected
heterozygosity frequencies are enough, for observed heterozygosity
genotypic phase unknown data is necessary.
4. Single locus (e.g. allelic richness, ExpHe, Fst) versus multi-loci
(e.g. number of polymorphic sites, LD or EHH).
5. Temporal/longitudinal vs single point in time. Say temporal-Fst versus Fst.
6. Population versus landscape. I suggest we abandon this issue for now.
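
To make the data type dimension (point 3) concrete, here is a minimal
sketch in Python 3. It is not the Bio.PopGen API; the function names are
invented for illustration, and the expected heterozygosity is the plain
1 - sum(p_i^2) form without any sample size correction. It shows that
expected heterozygosity can be computed from allele frequencies alone,
while observed heterozygosity needs the genotypes themselves.

```python
from collections import Counter


def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity, 1 - sum(p_i ** 2), from allele frequencies."""
    return 1.0 - sum(p * p for p in allele_freqs)


def observed_heterozygosity(genotypes):
    """Fraction of diploid individuals carrying two different alleles."""
    heterozygotes = sum(1 for a, b in genotypes if a != b)
    return heterozygotes / len(genotypes)


def allele_frequencies(genotypes):
    """Allele frequencies counted over all gene copies in the sample."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return [n / total for n in counts.values()]


# Four diploid individuals at one locus (phase is irrelevant here).
genotypes = [("A", "A"), ("A", "B"), ("B", "B"), ("A", "B")]

freqs = allele_frequencies(genotypes)        # frequency-only data
exp_he = expected_heterozygosity(freqs)      # needs frequencies only
obs_he = observed_heterozygosity(genotypes)  # needs genotypic data
```

An API that accepts frequency-only input could therefore offer the first
statistic but not the second, which is the kind of constraint the
classification above is meant to expose.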

So, the idea is to choose a set of statistics that illustrate these
points; with a good subset we will get a feeling for how everything
fits together. We implement them and then iterate until the API "feels
good". A suggestion of statistics:

ExpHz non-temporal, intra, single-locus, marker independent, genotypic
- gametic unk
ObsHz non-temporal, intra, single-locus, independent, genotypic - gametic kn
Fst(CW) non-temporal, inter, single-locus, indep, genotypic - gametic unk
temporal-Fst temporal, intra, single-locus, indep, genotypic - gametic unk
LD(D') non-temporal, intra, multi-locus, indep, haplo/geno
Fk temporal, intra, single-locus, indep, geno
S (polymorphic sites), non-temporal, intra, multi-locus, indep, haplo/geno
Allelic range, nt, intra, single-locus, microsat, haplo/geno
EHH, nt, positional
Tajima D, nt, intra, single-locus, sequence/rflp

There is still the issue of tests (say Hardy-Weinberg deviation), but
that can be thought about while the rest is being done.

The good news is that half of the above is already implemented
(exceptions are allelic range, S, Tajima D, EHH - presented in
increasing order of implementation difficulty).

I propose implementing the remainder (I can do that, unless anyone else
wants to give it a try) and then iterating on the API until there is
rough agreement. This can be done on GIT (BTW, my username there is
tiagoantao). I propose that the ability to influence policy be roughly
proportional to the time spent coding/effort done ;) .

PS - I am assuming a sequence is a single locus in my reasoning. Of
course it can be seen (and sometimes is) as a sequence of loci (SNPs).


From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 18:29:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 13:29:08 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811171829.mAHIT8u9006711@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


------- Comment #6 from bsouthey at gmail.com  2008-11-17 13:29 EST -------

> Over to Bruce to test with gcc 4.3.2 then...
> 

Still the same warning for Python 2.5 and 2.6:

Bio/triemodule.c: In function '_write_value_to_handle':
Bio/triemodule.c:498: warning: passing argument 3 of
'PyString_AsStringAndSize' from incompatible pointer type

See PEP 353 (http://www.python.org/dev/peps/pep-0353/), which suggests
including:
#if PY_VERSION_HEX < 0x02050000 && !defined(PY_SSIZE_T_MIN)
typedef int Py_ssize_t;
#define PY_SSIZE_T_MAX INT_MAX
#define PY_SSIZE_T_MIN INT_MIN
#endif

I did not get the warning after I added it to Bio.trie.h (as I thought that
this would be the appropriate location for it) and changed the declaration in
_write_value_to_handle for length to:
Py_ssize_t length;

But while this is fine for Python 2.3 and Python 2.4, I get the following
error with Python 2.5 and Python 2.6:

[snip]
test_trie ... ERROR
test_triefind ... ok

======================================================================
ERROR: test_trie
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_trie.py", line 87, in <module>
    trieobj3 = trie.load(h)
ValueError: bad marshal data




From bsouthey at gmail.com  Mon Nov 17 18:35:05 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 17 Nov 2008 12:35:05 -0600
Subject: [Biopython-dev] test_GASelection hangs
In-Reply-To: <bug-2666-42@http.bugzilla.open-bio.org/>
References: <bug-2666-42@http.bugzilla.open-bio.org/>
Message-ID: <4921B959.2080706@gmail.com>

Hi,
I was just running the test under a very fresh cvs version and under 
Python2.3 the test was hanging with test_GASelection. Of course, there 
was no problem after killing it and rerunning the test. I think this 
also pertains to bug 2651 so I thought I would ask if there was a way to 
examine this further before doing anything else.  I understand that this 
is a problem with randomization involved, but it does indicate a more 
subtle problem is present.  I would really like to track down the source 
of the problem.

Does anyone have any ideas on how I could try to examine this further?

Thanks
Bruce


From biopython at maubp.freeserve.co.uk  Mon Nov 17 18:50:14 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 17 Nov 2008 18:50:14 +0000
Subject: [Biopython-dev] test_GASelection hangs
In-Reply-To: <4921B959.2080706@gmail.com>
References: <bug-2666-42@http.bugzilla.open-bio.org/>
	<4921B959.2080706@gmail.com>
Message-ID: <320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com>

On Mon, Nov 17, 2008 at 6:35 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Hi,
> I was just running the test under a very fresh cvs version and under
> Python2.3 the test was hanging with test_GASelection. Of course, there was
> no problem after killing it and rerunning the test. I think this also
> pertains to bug 2651 so I thought I would ask if there was a way to examine
> this further before doing anything else.  I understand that this is a problem
> with randomization involved, but it does indicate a more subtle problem is
> present.  I would really like to track down the source of the problem.
>
> Does anyone have any ideas on how I could try to examine this further?

If you have installed CVS (or indeed any recent version of Biopython,
as the GA stuff hasn't changed recently IIRC), then in the Tests
directory you can just run:

$ python test_GASelection.py

You'll find sometimes it gets stuck.  I tried modifying the file so
that the end reads as follows:

if __name__ == "__main__":
    #sys.exit(run_tests(sys.argv))

    ALL_TESTS = [DiversitySelectionTest, TournamentSelectionTest,
                 RouletteWheelSelectionTest]

    runner = unittest.TextTestRunner(sys.stdout, verbosity = 2)
    test_loader = unittest.TestLoader()
    test_loader.testMethodPrefix = 't_'

    test=ALL_TESTS[1] #Edit me: 0, 1 or 2
    cur_suite = test_loader.loadTestsFromTestCase(test)
    count = 0
    while True :
        count += 1
        print "#"*50, count
        runner.run(cur_suite)

On my machine, DiversitySelectionTest and RouletteWheelSelectionTest
seem safe - the tests just run and run until you interrupt them with
ctrl+c.

However, this clearly gets stuck in TournamentSelectionTest - so we've
narrowed this down a bit.  Reading that bit of code, there is an
apparent risk of an infinite loop if by chance org_1 happens to be the
worst organism in the population.  Perhaps adding a simple counter to
break out of the loop if after 1000 tries org_1 is still the worst -
but I'm not sure what to do then.

Peter


From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 18:59:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 13:59:26 -0500
Subject: [Biopython-dev] [Bug 2651] Error from test_GAQueens.py
In-Reply-To: <bug-2651-42@http.bugzilla.open-bio.org/>
Message-ID: <200811171859.mAHIxQgZ009193@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2651


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-17 13:59 EST -------
This is a quick hack to help pinpoint the problem. Assuming you have the CVS
or a recent version of Biopython installed, modify the end of test_GAQueens.py
as follows:


if __name__ == "__main__":
    #sys.exit(main(sys.argv))
    count = 0
    while True :
        count +=1
        print "#"*50, count
        run_tests([])


This just repeats the test until it fails:

$ python test_GAQueens.py
...
################################################## 7
Calculating for 5 queens...
Generating an initial population of 1000 organisms...
Evolving the population and searching for a solution...
Traceback (most recent call last):
  File "test_GAQueens.py", line 405, in <module>
    run_tests([])
  File "test_GAQueens.py", line 42, in run_tests
    main(arguments)
  File "test_GAQueens.py", line 76, in main
    evolved_pop = evolver.evolve(queens_solved)
  File
"/Users/xxx/Downloads/Software/biopython-1.49b/build/lib.macosx-10.3-i386-2.5/Bio/GA/Evolver.py",
line 56, in evolve
    self._population = self._selector.select(self._population)
  File
"/Users/xxx/Downloads/Software/biopython-1.49b/build/lib.macosx-10.3-i386-2.5/Bio/GA/Selection/Tournament.py",
line 77, in select
    new_orgs[1])
  File
"/Users/xxx/Downloads/Software/biopython-1.49b/build/lib.macosx-10.3-i386-2.5/Bio/GA/Selection/Abstract.py",
line 53, in mutate_and_crossover
    final_org_1 = self._repairer.repair(final_org_1)
  File "test_GAQueens.py", line 234, in repair
    duplicated_items = self._get_duplicates(organism.genome)
  File "test_GAQueens.py", line 203, in _get_duplicates
    if genome.count(item) > 1:
  File
"/Users/xxx/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Seq.py",
line 886, in count
    raise TypeError("expected a string, Seq or MutableSeq")
TypeError: expected a string, Seq or MutableSeq

i.e. The same traceback as in Bruce's original report (allowing for the update
to the Seq object's count method), but easier to reproduce.




From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 19:18:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 14:18:24 -0500
Subject: [Biopython-dev] [Bug 2651] Error from test_GAQueens.py
In-Reply-To: <bug-2651-42@http.bugzilla.open-bio.org/>
Message-ID: <200811171918.mAHJIO5t010436@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2651


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-17 14:18 EST -------
Solved with Tests/test_GAQueens.py revision 1.3 in CVS.

When test_GAQueens.py was written, a Seq object's count method would accept
an integer argument.  Since Biopython 1.45, or to be exact Bio/Seq.py CVS
revision 1.20 (see Bug 2386), the Seq object's count method will not accept
an integer argument.  This wasn't deliberate, but is consistent with a
Python string.
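
The string behaviour referred to here is easy to check with plain Python
strings. The snippet below uses only the built-in str type, not the Seq
object, and count_or_error is a helper invented for this illustration.

```python
def count_or_error(sequence, item):
    """Return sequence.count(item), or a short message if it raises TypeError."""
    try:
        return sequence.count(item)
    except TypeError as err:
        return "TypeError: %s" % err


genome = "0123012"

# Counting a one-character string works as expected.
as_string = count_or_error(genome, "0")

# Counting an integer is rejected by str.count with a TypeError.
as_integer = count_or_error(genome, 0)

print(as_string)
print(as_integer)
```

So code that stores integers in a genome and then calls count() with an
integer has to convert the item to a string first, or use a container
that holds integers.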




From bsouthey at gmail.com  Mon Nov 17 20:03:54 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 17 Nov 2008 14:03:54 -0600
Subject: [Biopython-dev] test_GASelection hangs
In-Reply-To: <320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com>
References: <bug-2666-42@http.bugzilla.open-bio.org/>	
	<4921B959.2080706@gmail.com>
	<320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com>
Message-ID: <4921CE2A.3090606@gmail.com>

Peter wrote:
> On Mon, Nov 17, 2008 at 6:35 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>   
>> Hi,
>> I was just running the test under a very fresh cvs version and under
>> Python2.3 the test was hanging with test_GASelection. Of course, there was
>> no problem after killing it and rerunning the test. I think this also
>> pertains to bug 2651 so I thought I would ask if there was a way to examine
>> this further before doing anything else.  I understand that this is a problem
>> with randomization involved, but it does indicate a more subtle problem is
>> present.  I would really like to track down the source of the problem.
>>
>> Does anyone have any ideas on how I could try to examine this further?
>>     
>
> If you have installed CVS (or indeed any recent version of Biopython,
> as the GA stuff hasn't changed recently IIRC), then in the Tests
> directory you can just run:
>
> $ python test_GASelection.py
>
> You'll find sometimes it gets stuck.  I tried modifying the file so
> that the end reads as follows:
>
> if __name__ == "__main__":
>     #sys.exit(run_tests(sys.argv))
>
>     ALL_TESTS = [DiversitySelectionTest, TournamentSelectionTest,
>                  RouletteWheelSelectionTest]
>
>     runner = unittest.TextTestRunner(sys.stdout, verbosity = 2)
>     test_loader = unittest.TestLoader()
>     test_loader.testMethodPrefix = 't_'
>
>     test=ALL_TESTS[1] #Edit me: 0, 1 or 2
>     cur_suite = test_loader.loadTestsFromTestCase(test)
>     count = 0
>     while True :
>         count += 1
>         print "#"*50, count
>         runner.run(cur_suite)
>
> On my machine, DiversitySelectionTest and RouletteWheelSelectionTest
> seem safe - the tests just run and run until you interrupt them with
> ctrl+c.
>
> However, this clearly gets stuck in TournamentSelectionTest - so we've
> narrowed this down a bit.  Reading that bit of code, there is an
> apparent risk of an infinite loop if by chance org_1 happens to be the
> worst organism in the population.  Perhaps adding a simple counter to
> break out of the loop if after 1000 tries org_1 is still the worst -
> but I'm not sure what to do then.
>
> Peter
>
>   
Hi,
I ran the test multiple times using a bash loop and I think I tracked
down this specific problem to within the actual test code, specifically
the function TournamentSelectionTest.t_select_best(). I think this is
what Peter noticed.

This is how I understand things, which I hope is sufficiently correct
to explain it.

The test simulates a genome that has 3 locations with the 4 bases coded
as '0', '1', '2', and '3' for an 'organism'.  (Note the 3 locations is
hard coded into the random_genome function.) The fitness of an organism
is just the integer value of the coded bases, so the first position is
hundreds, the second is tens and the last is ones.

In TournamentSelectionTest.t_select_best, a second organism is
simulated that must have a lower fitness than the first. The problem
comes when the simulated genome of the first organism is '000',
because its fitness is zero. This creates an infinite loop because the
condition in the line:
            if org_2.fitness < org_1.fitness:
will always be false, but it must eventually be true to break the loop.
Given that there are only three locations, this should happen rather
frequently.

Is it sufficient to use the condition '<='?
Alternatively, is there some way to fix the genome of the first organism
rather than using a random one?
For example, instead of calling random_organism(), declare it as, say:
org_1=Organism('100', test_fitness)
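
As a rough check of how often this can happen, the setup described above
can be enumerated in a few lines of Python 3. This is a stand-alone
sketch, not the code in test_GASelection.py: the names genome_fitness,
random_genome and find_less_fit are made up for illustration, and the
max_tries guard is one possible form of a retry counter.

```python
import itertools
import random

BASES = "0123"
GENOME_LENGTH = 3


def genome_fitness(genome):
    """Fitness as described above: the genome read as a decimal integer."""
    return int(genome)


def random_genome(rng):
    """Draw each of the three positions uniformly from the four bases."""
    return "".join(rng.choice(BASES) for _ in range(GENOME_LENGTH))


# Enumerate all 4 ** 3 = 64 genomes and find those nothing can be less fit than.
all_genomes = ["".join(g) for g in itertools.product(BASES, repeat=GENOME_LENGTH)]
stuck_genomes = [g for g in all_genomes if genome_fitness(g) == 0]
hang_probability = len(stuck_genomes) / len(all_genomes)


def find_less_fit(fitness_1, rng, max_tries=1000):
    """Look for a genome with fitness below fitness_1, giving up after max_tries."""
    for _ in range(max_tries):
        candidate = random_genome(rng)
        if genome_fitness(candidate) < fitness_1:
            return candidate
    return None  # without the guard, this case would loop forever


print(stuck_genomes, hang_probability)
print(find_less_fit(0, random.Random(1)))
```

Only '000' has fitness zero, so with uniform random genomes the first
organism is unbeatable in 1 draw out of 64, roughly 1.6% of runs.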


Bruce


From biopython at maubp.freeserve.co.uk  Mon Nov 17 21:49:02 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 17 Nov 2008 21:49:02 +0000
Subject: [Biopython-dev] test_GASelection hangs
In-Reply-To: <4921CE2A.3090606@gmail.com>
References: <bug-2666-42@http.bugzilla.open-bio.org/>
	<4921B959.2080706@gmail.com>
	<320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com>
	<4921CE2A.3090606@gmail.com>
Message-ID: <320fb6e00811171349j3bb2757epa7e52e5e55ac0c95@mail.gmail.com>

Bruce wrote:
> Peter wrote:
>> However, this clearly gets stuck in TournamentSelectionTest - so we've
>> narrowed this down a bit.  Reading that bit of code, there is an
>> apparent risk of an infinite loop if by chance org_1 happens to be the
>> worst organism in the population.  Perhaps adding a simple counter to
>> break out of the loop if after 1000 tries org_1 is still the worst -
>> but I'm not sure what to do then.
>>
>> Peter
>
> Hi,
> I ran the test multiple times using a bash loop and I think I tracked down
> this specific problem to within the actual test code, specifically the
> function TournamentSelectionTest.t_select_best(). I think this is what Peter
> noticed.

Yes, this was what I was describing.

> This is how I understand things, which I hope is sufficiently correct to
> explain it.
>
> The test simulates a genome that has 3 locations with the 4 bases coded
> as '0', '1', '2', and '3' for an 'organism'.  (Note the 3 locations is hard
> coded into the random_genome function.) The fitness of an organism is just
> the integer value of the coded bases, so the first position is hundreds,
> the second is tens and the last is ones.
>
> In TournamentSelectionTest.t_select_best, a second organism is simulated
> that must have a lower fitness than the first. The problem comes when
> the simulated genome of the first organism is '000', because its fitness
> is zero. This creates an infinite loop because the condition in the line:
>           if org_2.fitness < org_1.fitness:
> will always be false, but it must eventually be true to break the loop.
> Given that there are only three locations, this should happen rather
> frequently.

Yes.

> Is it sufficient to use the condition '<='?

No, I don't think so.  The point of the setup seems to be to look for
a pair of organisms where one is measurably fitter than the other (and
make sure the better one is indeed selected).

> Alternatively, is there some way to fix the genome of the first organism
> rather than using a random one?
> For example, instead of calling random_organism(), declare it as, say:
> org_1=Organism('100', test_fitness)

We could do something like:

#Choose anything except the worst organism, "000",
while True :
    org_1=random_organism()
    if test_fitness(org_1) > 0 : break

[Not tested yet]

This at least is more or less random.
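
A self-contained sketch of this idea, written for Python 3 and equally
untested against the real test file: the Organism class below is a
minimal stand-in rather than Bio.GA.Organism, and pick_unequal_pair is a
made-up name. It checks the organism's fitness attribute, rejects a
first organism with fitness zero, and then draws a second organism
until one with strictly lower fitness turns up.

```python
import random

BASES = "0123"


class Organism:
    """Minimal stand-in for illustration only (not Bio.GA.Organism)."""

    def __init__(self, genome):
        self.genome = genome
        self.fitness = int(genome)  # first position hundreds, then tens, ones


def random_organism(rng):
    """Random three-position genome over the bases '0' to '3'."""
    return Organism("".join(rng.choice(BASES) for _ in range(3)))


def pick_unequal_pair(rng):
    """Return (org_1, org_2) with org_2 strictly less fit than org_1."""
    # Choose anything except the worst organism, "000".
    while True:
        org_1 = random_organism(rng)
        if org_1.fitness > 0:
            break
    # "000" is always less fit than org_1, so this loop ends with probability 1.
    while True:
        org_2 = random_organism(rng)
        if org_2.fitness < org_1.fitness:
            break
    return org_1, org_2


org_1, org_2 = pick_unequal_pair(random.Random(2008))
print(org_1.genome, org_2.genome)
```

The first organism is still random over the 63 remaining genomes, so the
test keeps most of its variety while losing the one case that cannot
terminate.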

Peter


From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 22:10:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 17:10:27 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811172210.mAHMARax021977@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


------- Comment #15 from eric.pruitt at gmail.com  2008-11-17 17:10 EST -------
(In reply to comment #14)
> I have uploaded the new code and the unit test with some modifications to CVS.
> Could you have a look at it to see if you're happy with the result? I am using
> numpy.dot(x,y) instead of sum(x*y) whereever possible; this gave an additional
> speedup.
> 

That worked really well; I'm happy with the results.




From bugzilla-daemon at portal.open-bio.org  Mon Nov 17 22:22:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 17:22:52 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To: <bug-2657-42@http.bugzilla.open-bio.org/>
Message-ID: <200811172222.mAHMMq6F022720@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-17 17:22 EST -------
(In reply to comment #15)
> 
> That worked really well; I'm happy with the results.
> 

Excellent - thanks James & Michiel!

Marking this bug as fixed.




From bsouthey at gmail.com  Mon Nov 17 22:49:19 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 17 Nov 2008 16:49:19 -0600
Subject: [Biopython-dev] test_GASelection hangs
In-Reply-To: <320fb6e00811171349j3bb2757epa7e52e5e55ac0c95@mail.gmail.com>
References: <bug-2666-42@http.bugzilla.open-bio.org/>	
	<4921B959.2080706@gmail.com>	
	<320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com>	
	<4921CE2A.3090606@gmail.com>
	<320fb6e00811171349j3bb2757epa7e52e5e55ac0c95@mail.gmail.com>
Message-ID: <4921F4EF.4030005@gmail.com>

Peter wrote:
[snip]
>   
>> Alternatively, is there some way to fix the genome of the first organism
>> rather than a random one?
>> For example, instead of the random_organism() declare it as say:
>> org_1=Organism('100', test_fitness)
>>     
>
> We could do something like:
>
> #Choose anything except the worst organism, "000",
> while True :
>     org_1=random_organism()
>     if test_fitness(org_1) > 0 : break
>   
This needs to be:
if org_1.fitness > 0 : break

Also, when looping the test, I occasionally get
Test not getting an organism already in the new population. ... FAIL
Test basic selection on a small population. ... ok

======================================================================
FAIL: Test not getting an organism already in the new population.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_GASelection.py", line 130, in t_no_retrive_organism
    assert new_org != org, "Got organism already in the new population."
AssertionError: Got organism already in the new population.

I'll try to look at it tomorrow.

Bruce

PS thanks for fixing test_GAQueens.py, as I have not got it to error even
when running it 10000 times.


From biopython at maubp.freeserve.co.uk  Mon Nov 17 23:18:12 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 17 Nov 2008 23:18:12 +0000
Subject: [Biopython-dev] test_GASelection hangs
In-Reply-To: <4921F4EF.4030005@gmail.com>
References: <bug-2666-42@http.bugzilla.open-bio.org/>
	<4921B959.2080706@gmail.com>
	<320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com>
	<4921CE2A.3090606@gmail.com>
	<320fb6e00811171349j3bb2757epa7e52e5e55ac0c95@mail.gmail.com>
	<4921F4EF.4030005@gmail.com>
Message-ID: <320fb6e00811171518p78a3c25cq527c2ef338692ad2@mail.gmail.com>

> This needs to be:
> if org_1.fitness > 0 : break

Yeah.  I've checked in a fix based on this approach; could you try
test_GASelection.py revision 1.3, just to make sure I've not done
something silly?

> Also, when looping the test, I occasionally get
> Test not getting an organism already in the new population. ... FAIL
> Test basic selection on a small population. ... ok
>
> ======================================================================
> FAIL: Test not getting an organism already in the new population.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>  File "test_GASelection.py", line 130, in t_no_retrive_organism
>   assert new_org != org, "Got organism already in the new population."
> AssertionError: Got organism already in the new population.

Confirmed - when I was just looking for the hanging sub-test, I didn't
spot this.

>From my reading of the GA code there is no guarantee that
DiversitySelection will return a completely new organism.  If it has
to generate one at random, there is a small chance it will match
something already in the population.  i.e. the test itself is flawed.
We could try this, say, 10 times, but even then the test could fail.

I've fixed this in test_GASelection.py revision 1.4 by simply
commenting out the assert in
DiversitySelectionTest.t_no_retrive_organism.  However, maybe the
underlying Bio.GA.Selection.Diversity code could be altered instead to
guarantee this possibly desirable behaviour?
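
As a rough sketch of what such a guarantee could look like: keep drawing
until the genome is not already present. The helpers random_genome and
novel_genome are hypothetical names made up for this example, not part of
Bio.GA.Selection.Diversity, and the cap on attempts is an arbitrary choice.

```python
import random


def random_genome(length, rng=random):
    """Return a random genome of the given length over "0" and "1"."""
    return "".join(rng.choice("01") for _ in range(length))


def novel_genome(existing, length, rng=random, max_tries=1000):
    """Draw random genomes until one is not already in `existing`.

    Raises ValueError if max_tries draws in a row all collide, which is
    what happens when every possible genome is already taken.
    """
    taken = set(existing)
    for _ in range(max_tries):
        genome = random_genome(length, rng)
        if genome not in taken:
            return genome
    raise ValueError("no unused genome found in %d tries" % max_tries)
```

With a retry like this the assert in the test would hold by construction,
at the cost of a possible exception once the genome space is exhausted.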

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Nov 18 11:13:31 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 06:13:31 -0500
Subject: [Biopython-dev] [Bug 2670] New: Populate seqfeature.display_name
Message-ID: <bug-2670-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2670

           Summary: Populate seqfeature.display_name
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


The seqfeature table has a display_name text field, which is currently left
blank by Biopython's loader but is populated by BioPerl.  This field is used in
GBrowse, for example: http://gmod.org/wiki/GBrowse

We could use the protein_id, locus_tag, etc depending on what annotation is
available (ideally use the same as BioPerl).
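
A rough sketch of how such a fallback could work. The order of qualifier
keys is a guess (including the extra "gene" key), not BioPerl's actual
rule, and pick_display_name is a hypothetical helper rather than part of
the BioSQL loader. It assumes qualifiers are held as a dict mapping each
key to a list of strings, as in SeqFeature.qualifiers.

```python
def pick_display_name(qualifiers, keys=("protein_id", "locus_tag", "gene")):
    """Return the first value of the first qualifier key that is present.

    `qualifiers` maps qualifier names to lists of strings. Returns None
    when none of the candidate keys has a value, leaving the field blank.
    """
    for key in keys:
        values = qualifiers.get(key)
        if values:
            return values[0]
    return None
```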


From bugzilla-daemon at portal.open-bio.org  Tue Nov 18 15:06:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 10:06:06 -0500
Subject: [Biopython-dev] [Bug 2671] New: Including GenomeDiagram in the main
	Biopython distribution
Message-ID: <bug-2671-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671

           Summary: Including GenomeDiagram in the main Biopython
                    distribution
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: lpritc at scri.sari.ac.uk


Thanks largely to the efforts of Robert Cadena, we have modified GenomeDiagram
so that it plays nicely with the current CVS of Biopython and would like to
propose its inclusion as part of the main distribution.

GenomeDiagram is described in a Bioinformatics publication
(http://dx.doi.org/10.1093/bioinformatics/btk021).  It is useful for
constructing circular and linear images of biological sequence data as
publication-quality vector graphics, and is aimed specifically at visualising
large-scale genomic, comparative genomic and other data with reference to a
single chromosome or other biological sequence.  It's based on the Reportlab
backend, and can be used to produce rastered and streamed image output, too.

The major changes that have been made to the version previously available at
http://bioinf.scri.ac.uk/lp are:

Class names have been changed and no longer have the GD prefix

References to 'colour' have been changed to 'color', but both spellings are
still permitted in function calls, for backwards-compatibility

The default font has been changed to 'Vera', which is shipped with Reportlab,
to avoid some problems with unavailable fonts

Code for wx widgets has been removed, although the Observer/Observable code
remains, allowing user widgets to hook into the code, if that's desirable.

Some test code is included, testing colour translation and the ability to
produce PDF output in circular and linear diagram formats.

Other minor changes to reduce deprecation warnings (those in Reportlab proper
remain, however), and to remove code that caused font issues.

There are still known issues.  Writing to a raster format, such as PNG, uses
Reportlab's renderPM code, which defaults to using fonts that are no longer
installed by Reportlab itself.  This is a Reportlab issue and doesn't affect
production of PDF output, so testing currently only checks the ability to
generate PDF output.
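
To illustrate the 'colour'/'color' backwards-compatibility mentioned in the
list of changes, here is one way a function can accept either spelling.
This is only a sketch of the general idea; resolve_color is a hypothetical
helper and is not taken from the GenomeDiagram code.

```python
def resolve_color(default=None, **kwargs):
    """Return the value passed as either color=... or colour=...

    Falls back to `default` when neither spelling is given, and refuses
    calls that supply both, since it would be unclear which one wins.
    """
    if "color" in kwargs and "colour" in kwargs:
        raise TypeError("give either 'color' or 'colour', not both")
    if "colour" in kwargs:
        return kwargs["colour"]
    return kwargs.get("color", default)
```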


From bugzilla-daemon at portal.open-bio.org  Tue Nov 18 15:12:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 10:12:32 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200811181512.mAIFCWJY023516@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #1 from lpritc at scri.sari.ac.uk  2008-11-18 10:12 EST -------
Created an attachment (id=1063)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1063&action=view)
GenomeDiagram code, ready to drop into Biopython CVS

Contains GenomeDiagram code under Bio.Graphics.GenomeDiagram, and test code
with examples.


From bugzilla-daemon at portal.open-bio.org  Tue Nov 18 15:44:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 10:44:29 -0500
Subject: [Biopython-dev] [Bug 2672] New: test_lowess and test_docstrings
	fail to check if numpy is installed
Message-ID: <bug-2672-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2672

           Summary: test_lowess and test_docstrings fail to check if numpy
                    is installed
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P5
         Component: Unit Tests
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


I used the CVS version with a Python 2.5 installation that does not have NumPy
installed.

Both test_lowess and test_docstrings need to check for the presence of
NumPy, like other tests that require NumPy. These tests should also be skipped
with messages like:
test_kNN ... skipping. Install NumPy if you want to use Bio.kNN.


======================================================================
ERROR: test_docstrings
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_docstrings.py", line 18, in <module>
    import Bio.Statistics.lowess
  File
"/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/Statistics/lowess.py",
line 23, in <module>
    import numpy
ImportError: No module named numpy

======================================================================
ERROR: test_lowess
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_lowess.py", line 1, in <module>
    from Bio.Statistics.lowess import lowess
  File
"/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/Statistics/lowess.py",
line 23, in <module>
    import numpy
ImportError: No module named numpy
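
The kind of check being asked for might look roughly like the sketch below.
It assumes the test runner reports a dedicated exception as a skip rather
than an error; the exception class here is a local stand-in defined for the
example, and require_module is a hypothetical helper, not existing test code.

```python
class MissingExternalDependencyError(Exception):
    """Local stand-in for an exception a test runner could treat as 'skip'."""


def require_module(name, used_by):
    """Import and return the top-level module `name`.

    If the import fails, raise MissingExternalDependencyError with a
    message naming the code that needs the module, instead of letting
    the bare ImportError surface as a test error.
    """
    try:
        return __import__(name)
    except ImportError:
        raise MissingExternalDependencyError(
            "Install %s if you want to use %s." % (name, used_by))
```

A test module would call this once at the top, for example
require_module("numpy", "Bio.Statistics.lowess"), before importing the code
under test.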


From bugzilla-daemon at portal.open-bio.org  Tue Nov 18 15:56:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 10:56:01 -0500
Subject: [Biopython-dev] [Bug 2672] test_lowess and test_docstrings fail to
	check if numpy is installed
In-Reply-To: <bug-2672-42@http.bugzilla.open-bio.org/>
Message-ID: <200811181556.mAIFu1o1026838@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2672


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-18 10:56 EST -------
I've fixed test_lowess.py with CVS revision 1.2 to check for numpy as in Bug
2534

For test_docstring.py, I think we could split this in two:

test_docstring.py - no numpy dependence
test_docstring_numpy.py - for modules which need numpy

Or, have some code within test_docstring.py to adjust the list of tests
according to whether numpy is installed.
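
A minimal sketch of that second option, assuming the test keeps a plain list
of module names to check. The function name and its arguments are made up
for this example and are not taken from the actual test file.

```python
import importlib.util


def docstring_modules(always, needs_numpy, probe="numpy"):
    """Return the module names whose docstrings should be tested.

    `always` lists modules with no optional dependency; `needs_numpy`
    lists modules that are only added when `probe` can be found.
    """
    modules = list(always)
    # find_spec returns None for a top-level module that is not installed.
    if importlib.util.find_spec(probe) is not None:
        modules.extend(needs_numpy)
    return modules
```

For instance, docstring_modules(["Bio.Seq"], ["Bio.Statistics.lowess"])
would include the lowess module only on machines where numpy is found.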


From bugzilla-daemon at portal.open-bio.org  Tue Nov 18 16:05:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 11:05:29 -0500
Subject: [Biopython-dev] [Bug 2672] test_lowess and test_docstrings fail to
	check if numpy is installed
In-Reply-To: <bug-2672-42@http.bugzilla.open-bio.org/>
Message-ID: <200811181605.mAIG5TjK027987@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2672


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-18 11:05 EST -------
(In reply to comment #1)
> For test_docstring.py, I think we could split this in two:
> 
> test_docstring.py - no numpy dependence
> test_docstring_numpy.py - for modules which need numpy
> 
> Or, have some code within test_docstring.py to adjust the list of tests
> according to whether numpy is installed.

I've gone for the second approach, see test_docstring.py CVS revision 1.6

Marking as fixed.

Thanks Bruce :)


From bugzilla-daemon at portal.open-bio.org  Tue Nov 18 16:08:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 11:08:54 -0500
Subject: [Biopython-dev] [Bug 2607] Gcc "differ in signedness" warning with
	cstringfnsmodule.c
In-Reply-To: <bug-2607-42@http.bugzilla.open-bio.org/>
Message-ID: <200811181608.mAIG8ss2028159@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2607


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-18 11:08 EST -------
Since this bug was filed, we've declared this module obsolete for Biopython
1.49, and assuming we press ahead and deprecate it in Biopython 1.50 then I
don't see any point in fixing this compiler warning.

Marking as "won't fix".


From bugzilla-daemon at portal.open-bio.org  Tue Nov 18 18:35:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 13:35:25 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811181835.mAIIZPgc004892@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-18 13:35 EST -------
(In reply to comment #6)
> Still the same warning for Python 2.5 and 2.6:
> 
> Bio/triemodule.c: In function '_write_value_to_handle':
> Bio/triemodule.c:498: warning: passing argument 3 of
> 'PyString_AsStringAndSize' from incompatible pointer type

It looks like PyString_AsStringAndSize will expect a Py_ssize_t length, and not
just an int length.  Suggested patch:


Index: triemodule.c
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/triemodule.c,v
retrieving revision 1.7
diff -r1.7 triemodule.c
486a487,489
> #if PY_VERSION_HEX >= 0x02050000
>     Py_ssize_t length;
> #else
487a491
> #endif


i.e. in function _write_value_to_handle, at line 486 replace this:

    int length;

with this:

#if PY_VERSION_HEX >= 0x02050000
    Py_ssize_t length;
#else
    int length;
#endif

This still compiles for me on Python 2.5.2 with gcc 4.0.1 on a Mac.


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 02:11:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 21:11:34 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811190211.mAJ2BYpO031573@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp  2008-11-18 21:11 EST -------
I've uploaded a slightly different version to CVS (there were more Py_ssize_t /
int issues). Could you try that one? Bio/triemodule.c, revision 1.8. We should
also see if the unit test still passes on 64 bit platforms.


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 03:08:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 22:08:43 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811190308.mAJ38hkI003686@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


------- Comment #9 from bsouthey at gmail.com  2008-11-18 22:08 EST -------
I quickly built the CVS version, and the associated tests passed with
Python versions 2.3, 2.4, 2.5 (with and without numpy) and 2.6 on my
system.


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 08:45:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 03:45:52 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200811190845.mAJ8jqv4023408@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #2 from lpritc at scri.sari.ac.uk  2008-11-19 03:45 EST -------
The copyright/credit section at the top of each file still needs to be changed.


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 10:14:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 05:14:57 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191014.mAJAEv6m032436@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-19 05:14 EST -------
(In reply to comment #8)
> I've uploaded a slightly different version to CVS (there were more Py_ssize_t
> / int issues). Could you try that one? Bio/triemodule.c, revision 1.8. We
> should also see if the unit test still passes on 64 bit platforms.
> 

CVS version compiles triemodule with no warnings using Python 2.5.2 with gcc
4.0.1 on a Mac.  Unit tests pass.

CVS version compiles triemodule with no warnings using Python 2.5 with gcc
4.1.2 on Linux (i686 so 32 bit).  Unit tests pass.

CVS version compiles triemodule with no warnings using Python 2.4.3 with gcc
3.4.6 on Linux (x86_64 so 64 bit).  Unit tests pass.

It sounds like Bruce has checked all Python versions with gcc 4.3.2 on Linux.


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 12:17:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 07:17:23 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from
	incompatible pointer type' warning with triemodule.c
In-Reply-To: <bug-2609-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191217.mAJCHN21008817@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #11 from mdehoon at ims.u-tokyo.ac.jp  2008-11-19 07:17 EST -------
I tried several Windows versions and a 64 bit unix platform. Everything seems
to be OK. Closing this bug.


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 14:38:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 09:38:33 -0500
Subject: [Biopython-dev] [Bug 2674] New: test_kNN: Removal of from numpy
	import *
Message-ID: <bug-2674-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2674

           Summary: test_kNN: Removal of from numpy import *
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Unit Tests
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


This test already contains an 'import numpy' statement to check that numpy is
available. Therefore it is sufficient just to say 'import numpy', without the
additional 'from numpy import *'.
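
One reason to prefer a plain import is that a star import can silently
replace builtin names. The sketch below lists which builtins a given module
would shadow; shadowed_builtins is a hypothetical helper written for this
note, and it is demonstrated on the standard library's os module so that it
runs without numpy.

```python
import builtins
import importlib


def shadowed_builtins(module_name):
    """Return the sorted builtin names that 'from <module> import *' would replace."""
    module = importlib.import_module(module_name)
    exported = getattr(module, "__all__", None)
    if exported is None:
        # Without __all__, a star import takes every public name.
        exported = [name for name in dir(module) if not name.startswith("_")]
    return sorted(name for name in exported if hasattr(builtins, name))


# 'from os import *' replaces the builtin open() with os.open().
print(shadowed_builtins("os"))
```

On a machine with NumPy installed, shadowed_builtins("numpy") should list
names such as sum, min and max.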


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 14:39:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 09:39:52 -0500
Subject: [Biopython-dev] [Bug 2674] test_kNN: Removal of from numpy import *
In-Reply-To: <bug-2674-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191439.mAJEdqkH019174@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2674


------- Comment #1 from bsouthey at gmail.com  2008-11-19 09:39 EST -------
Created an attachment (id=1064)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1064&action=view)
patch to change import numpy statement

Just for completeness.


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 14:42:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 09:42:27 -0500
Subject: [Biopython-dev] [Bug 2675] New: Use import numpy in kNN
Message-ID: <bug-2675-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2675

           Summary: Use import numpy in kNN
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


Replace the 'from numpy import *' statement with 'import numpy'.


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 14:43:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 09:43:12 -0500
Subject: [Biopython-dev] [Bug 2675] Use import numpy in kNN
In-Reply-To: <bug-2675-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191443.mAJEhCXu019472@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2675


------- Comment #1 from bsouthey at gmail.com  2008-11-19 09:43 EST -------
Created an attachment (id=1065)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1065&action=view)
patch to change import numpy statement

Changes the way numpy is imported.


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 14:53:31 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 09:53:31 -0500
Subject: [Biopython-dev] [Bug 2676] New: LogisticRegression: changed the way
	numpy is imported
Message-ID: <bug-2676-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2676

           Summary: LogisticRegression: changed the way numpy is imported
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


A patch to remove the usage of 'from numpy import *'.


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 14:54:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 09:54:10 -0500
Subject: [Biopython-dev] [Bug 2676] LogisticRegression: changed the way
	numpy is imported
In-Reply-To: <bug-2676-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191454.mAJEsAeg020318@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2676


------- Comment #1 from bsouthey at gmail.com  2008-11-19 09:54 EST -------
Created an attachment (id=1066)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1066&action=view)
patch to change import numpy statement


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 15:04:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 10:04:39 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191504.mAJF4diO021040@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


chapmanb at 50mail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |biopython-dev at biopython.org
         AssignedTo|biopython-dev at biopython.org |chapmanb at 50mail.com


------- Comment #3 from chapmanb at 50mail.com  2008-11-19 10:04 EST -------
Leighton;
This is great; thanks for getting it together. I took a look at this last night
and have a couple of quick comments:

- on the licensing front, the current GPL is not compatible with the Biopython
license; it would be nice to have you explicitly say you are okay with
re-licensing this version under the Biopython license
(http://www.biopython.org/DIST/LICENSE)

- Would it be possible to update the GenomeDiagram documentation from here
(http://bioinf.scri.ac.uk/lp/downloads/programs/genomediagram/userguide.pdf) to
reflect the new namespace and class name changes? Mentioning some of the
gotchas you have below, possibly to replace the installation section, would
also be nice.

I would like Peter and anyone else interested to weigh in, but I can work
on getting this in after the next release.


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 15:13:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 10:13:46 -0500
Subject: [Biopython-dev] [Bug 2674] test_kNN: Removal of from numpy import *
In-Reply-To: <bug-2674-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191513.mAJFDkuO021701@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2674


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-19 10:13 EST -------
Fixed.


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 15:17:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 10:17:28 -0500
Subject: [Biopython-dev] [Bug 2675] Use import numpy in kNN
In-Reply-To: <bug-2675-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191517.mAJFHSID022021@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2675


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-19 10:17 EST -------
Fixed in CVS,

Thanks.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 15:21:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 10:21:41 -0500
Subject: [Biopython-dev] [Bug 2676] LogisticRegression: changed the way
	numpy is imported
In-Reply-To: <bug-2676-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191521.mAJFLf8a022292@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2676


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-19 10:21 EST -------
Fixed in CVS, thanks!


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 15:29:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 10:29:25 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191529.mAJFTPhW022858@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1062 is|0                           |1
           obsolete|                            |


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-19 10:29 EST -------
(From update of attachment 1062)
This attachment seems to have been removed (or failed to upload?).

See attachment 1063 instead.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 15:29:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 10:29:50 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191529.mAJFTon7022928@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-19 10:29 EST -------
(In reply to comment #3)
> 
> I would like Peter and anyone else interested to weigh in, but
> I can work on getting this in after the next release.
> 

I'm all for adding GenomeDiagram to Biopython (as stated on the mailing list).  

I haven't actually looked at this revised code base yet - but as I've used GD
before and know Leighton "in real life" it might be easier for me to shepherd
this into CVS - but the more eyes the better ;)

We might also consider getting Leighton CVS access (provisionally, for use with
this module only).

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 16:07:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 11:07:24 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200811191607.mAJG7OcJ025581@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #6 from lpritc at scri.sari.ac.uk  2008-11-19 11:07 EST -------
(In reply to comment #5)
> Leighton;
> This is great; thanks for getting it together. I took a look at this last night
> and have a couple of quick comments:

No problem.  Robert Cadena deserves the bulk of the credit - he made most of
the changes.

> - on the licensing front, the current GPL is not compatible with the Biopython
> license; it would be nice to have you explicitly say you are okay with
> re-licensing this version under the Biopython license
> (http://www.biopython.org/DIST/LICENSE)

I am perfectly happy with re-licensing the GD code under the Biopython license.
If you need a gpg-signed document to say so, I can provide one ;)

> - Would it be possible to update the GenomeDiagram documentation from here
> (http://bioinf.scri.ac.uk/lp/downloads/programs/genomediagram/userguide.pdf) to
> reflect the new namespace and class name changes? 

Yep - I'll do that, next.

> Mentioning some of the
> gotchas you have below, possibly to replace the installation section, would
> also be nice.

Definitely.  Most of the gotchas are Reportlab-related, but they definitely
have a place under Installation in the docs.

> I would like Peter and anyone else interested to weigh in, but I can work
> on getting this in after the next release.

The more, the merrier... it's not my little baby anymore <sniff> it's out in
the big world ;)


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 21:49:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 16:49:48 -0500
Subject: [Biopython-dev] [Bug 2677] New: BioSQL seqfeature enhancements
Message-ID: <bug-2677-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677

           Summary: BioSQL seqfeature enhancements
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: cymon.cox at gmail.com


Cleaned-up (sub-)seqFeature locations, and strand. Added location_operator
storage and test. Added remote location storage for sub-features, and test.

I've used the "Sequence Keys" ontology for the location operator and stored the
location operator in the location_qualifier_value table - not sure this is right...

Patches attached.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 21:51:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 16:51:53 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200811192151.mAJLprRP024242@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


------- Comment #1 from cymon.cox at gmail.com  2008-11-19 16:51 EST -------
Created an attachment (id=1072)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1072&action=view)
Patch for BioSQL/BioSeq.py and Loader.py


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Nov 19 21:52:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 19 Nov 2008 16:52:46 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200811192152.mAJLqk91024384@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


------- Comment #2 from cymon.cox at gmail.com  2008-11-19 16:52 EST -------
Created an attachment (id=1073)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1073&action=view)
Patch for BioSQL test cases


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Nov 20 10:17:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 20 Nov 2008 05:17:17 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200811201017.mAKAHHA8027467@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-20 05:17 EST -------
(In reply to comment #0)
> Cleaned-up (sub-)seqFeature locations, and strand. Added location_operator
> storage and test. Added remote location storage for sub-features, and test.
>

Excellent - I see you've removed the naive min/max to find the parent feature's
location when dealing with sub-features.  This should fix the special case
where a feature spans the origin on a circular genome.
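
To illustrate the special case, here is a minimal sketch in plain Python. It
is not the BioSQL loader code: the coordinates, the genome length and the
helper name naive_parent_span are made up for illustration. It shows why
taking min/max over the sub-feature locations goes wrong for a feature that
spans the origin of a circular genome:

```python
# Hypothetical example: a feature on a 10,000 bp circular genome that
# spans the origin, stored as two sub-features (0-based, end-exclusive).
GENOME_LENGTH = 10000
sub_features = [(9800, 10000), (0, 150)]


def naive_parent_span(parts):
    """Parent location as (min start, max end) over the sub-features."""
    starts = [start for start, end in parts]
    ends = [end for start, end in parts]
    return min(starts), max(ends)


# Number of bases actually covered by the feature.
true_length = sum(end - start for start, end in sub_features)

# The naive span runs from the lowest start to the highest end.
naive_start, naive_end = naive_parent_span(sub_features)
naive_length = naive_end - naive_start

print("true length: %i bp" % true_length)
print("naive span: %i..%i (%i bp)" % (naive_start, naive_end, naive_length))
```

With these made-up numbers the feature covers 350 bp, but the naive parent
span stretches over the entire genome.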

That should take care of many of my "TODO" entries in test_BioSQL_SeqIO.py :)

>
> I've used the "Sequence Keys" ontology for the location operator and stored
> loc op in the location_qualifier_value table - not sure this is right...
>

I'm not sure off hand either, but would like us to check before committing
this.  In the short term, whatever BioPerl does is "right" as I'm treating
that as the BioSQL reference implementation.

> 
> Patches attached.
>

I've scanned over them quickly, and they look fine.  The comments do help :)


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Nov 20 10:53:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 20 Nov 2008 05:53:19 -0500
Subject: [Biopython-dev] [Bug 2662] Typo in tutorial "Chapter 3 Sequence
	objects "
In-Reply-To: <bug-2662-42@http.bugzilla.open-bio.org/>
Message-ID: <200811201053.mAKArJsp029436@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2662


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-20 05:53 EST -------
Unless anyone else wants to weigh in on Josh's side, I'm not going to change
this.  Closing bug - but thanks for reporting it anyway Josh.

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From biopython at maubp.freeserve.co.uk  Thu Nov 20 10:55:57 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 20 Nov 2008 10:55:57 +0000
Subject: [Biopython-dev] Biopython 1.49 beta released
In-Reply-To: <320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com>
References: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com>
	<320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com>
Message-ID: <320fb6e00811200255x5325a7d4kf4d118350a9e7e65@mail.gmail.com>

OK,

Progress since Biopython 1.49 beta was released:

> We've had a few Numeric -> NumPy bugs reported,
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2658
> Bug 2658 - Bio.PDB.Neighborsearch

Fixed.

> http://bugzilla.open-bio.org/show_bug.cgi?id=2649
> Bug 2649 - Bio.KDTree (probably fixed)

No confirmation from the original reporter, but looks OK.

> I don't think we should release Biopython 1.49 final until these are
> resolved - but if there was interest I could put out a second beta.

No-one seems to want a second beta, which saves me some time :)

There have been a few other bugs reported and fixed in the meantime,
right now the only thing I think holding up the release of Biopython
1.49 is:

http://bugzilla.open-bio.org/show_bug.cgi?id=2677
Bug 2677 - BioSQL seqfeature enhancements

Is there anything else?

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Nov 20 14:19:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 20 Nov 2008 09:19:39 -0500
Subject: [Biopython-dev] [Bug 2662] Typo in tutorial "Chapter 3 Sequence
	objects "
In-Reply-To: <bug-2662-42@http.bugzilla.open-bio.org/>
Message-ID: <200811201419.mAKEJcW6011296@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2662


------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp  2008-11-20 09:19 EST -------
I am not a native English speaker, but I do agree with Josh that the original
phrase "... different set of methods TO a plain python string" sounds strange
to me. I would suggest something along the lines of "the set of methods of a
Seq object are slightly different from those of a plain python string."
But again, that may be Double Dutch.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Nov 20 14:34:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 20 Nov 2008 09:34:25 -0500
Subject: [Biopython-dev] [Bug 2662] Typo in tutorial "Chapter 3 Sequence
	objects "
In-Reply-To: <bug-2662-42@http.bugzilla.open-bio.org/>
Message-ID: <200811201434.mAKEYPOh015951@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2662


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-20 09:34 EST -------
(In reply to comment #3)
> I am not a native English speaker, but I do agree with Josh that the original
> phrase "... different set of methods TO a plain python string" sounds strange
> to me.

As a native English speaker I'm happy with this as is, but concede
international usage may vary - and I do want the Tutorial to be as accessible
as possible.

> I would suggest something along the lines of "the set of methods of a
> Seq object are slightly different from those of a plain python string."
> But again, that may be Double Dutch.

I would say a "set of methods" is singular, but the rest of this sentence is
plural.  How about completely rephrasing:

First of all, they have some different methods (for example, Seq objects have
reverse_complement() and translate() methods used for nucleotide sequences).


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bsouthey at gmail.com  Thu Nov 20 15:09:42 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Thu, 20 Nov 2008 09:09:42 -0600
Subject: [Biopython-dev] Biopython 1.49 beta released
In-Reply-To: <320fb6e00811200255x5325a7d4kf4d118350a9e7e65@mail.gmail.com>
References: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com>	<320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com>
	<320fb6e00811200255x5325a7d4kf4d118350a9e7e65@mail.gmail.com>
Message-ID: <49257DB6.5080902@gmail.com>

Hi,
In connection with Peter's email on the forthcoming release, I was wondering
what to do about certain modules that do not seem to be used. I started
to look at the examples that lack test coverage in case one could do
something for the Biopython 1.49 release. But this should not provide
any reason to delay the release and may stretch beyond it.

Given the potential long term impact and spirit of people who donated 
the code, I was thinking that the release notes could denote which 
modules are unsupported and need some usage feedback.  In future 
releases the use of these modules would raise a warning about being 
unsupported or obsolete. Please note that I am not against any of these
modules; my only concern is the requirement to maintain them and develop
suitable tests.

The possible modules are those that Peter previously mentioned that had 
no tests:

Bio.Affy
Bio.AlignAce
Bio.EZRetrieve
Bio.Emboss (everything except the primer parsers)
Bio.Encodings (obsolete?)
Bio.FilteredReader (obsolete?)
Bio.MaxEntropy
Bio.NMR
Bio.NaiveBayes
Bio.NetCatch (obsolete?)

I think that Bio.MaxEntropy and Bio.NaiveBayes are useful and I did
provide an example that is included in the code. However, I am not
confident enough in these methods to maintain them, mainly due to my
lack of knowledge.

Similarly for Bio.Affy: I currently work a lot with two-dye systems but
not Affy. I find that Bio.Affy provides insufficient functionality
because it really only reads the intensities and misses other
important information in version 3 of the Affy format. I do recognize that
it could be a base for Affy functionality that may be useful for users such
as the PopGen users who use Affy SNP arrays.

Bruce


Peter wrote:
> OK,
>
> Progress since Biopython 1.49 beta was released:
>
>   
>> We've had a few Numeric -> NumPy bugs reported,
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2658
>> Bug 2658 - Bio.PDB.Neighborsearch
>>     
>
> Fixed.
>
>   
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2649
>> Bug 2649 - Bio.KDTree (probably fixed)
>>     
>
> No confirmation from the original reporter, but looks OK.
>
>   
>> I don't think we should release Biopython 1.49 final until these are
>> resolved - but if there was interest I could put out a second beta.
>>     
>
> No-one seems to want a second beta, which saves me some time :)
>
> There have been a few other bugs reported and fixed in the meantime,
> right now the only thing I think holding up the release of Biopython
> 1.49 is:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2677
> Bug 2677 - BioSQL seqfeature enhancements
>
> Is there anything else?
>
> Peter


From bsouthey at gmail.com  Thu Nov 20 16:26:40 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Thu, 20 Nov 2008 10:26:40 -0600
Subject: [Biopython-dev] Bio.EZRetrieve appears to be obsolete or redunant
Message-ID: <49258FC0.10703@gmail.com>

Hi,
The Bio.EZRetrieve module retrieves a single nucleotide sequence from
the EZRetrieve website:
http://siriusb.umdnj.edu:18080/EZRetrieve/single_r.jsp
It requires a human, rat or mouse nucleic GenBank, UniGene, LocusLink, 
or IMAGE ID. No other genomes are supported.

Although it appears faster than a Bio.GenBank query, I do not see that
this module provides any special functionality beyond that already
provided by Bio.GenBank and similar. So I think this module is obsolete
and redundant.

Notes:
1) Obviously LocusLink has been superseded by Entrez Gene.
2) The documented genome builds are from 2003 (e.g. human BUILD.34 at
11/04/2003), but it is not known whether these have been updated since.
3) The start of the sequence is zero. You can use from_='start' instead,
but then you cannot mix it with a numerical end position.
4) The actual website provides additional information including NCBI 
links (LocusLink and Nucleic) and does base counting.
5) There are other functions provided by the website like multiple 
retrievals.

The website example is for 'homeobox B6 [/Homo sapiens/]':

import Bio.EZRetrieve
seq=Bio.EZRetrieve.retrieve_single('BC014651', 1, 20)
print seq

Gives:
 >BC014651:HOXB6                        
ACCACACCTAGGTCGGAGCA

Bruce


From bugzilla-daemon at portal.open-bio.org  Thu Nov 20 17:05:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 20 Nov 2008 12:05:22 -0500
Subject: [Biopython-dev] [Bug 2678] New: Entrez.esearch does not always
	retrieve or find DTD files
Message-ID: <bug-2678-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678

           Summary: Entrez.esearch does not always retrieve or find DTD
                    files
           Product: Biopython
           Version: 1.49b
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: lpritc at scri.sari.ac.uk


When using Entrez.esearch, I have observed an intermittent failure to recover
DTD files.  These are not being cached on successful search attempts.  It may
be worth including them in the distribution.
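
As a rough illustration of the caching idea, here is a minimal sketch of a
"local copy first, download and cache otherwise" lookup. This is not the code
in Bio/Entrez/Parser.py; the function name open_dtd, the cache directory and
the fetch callable (which stands in for the network download) are all
assumptions made for this example:

```python
import os
import tempfile


def open_dtd(filename, dtd_dir, fetch):
    """Return the contents of a DTD, preferring a local copy.

    filename -- bare DTD file name, e.g. "xhtml1-strict.dtd"
    dtd_dir  -- directory holding local copies (hypothetical layout)
    fetch    -- callable taking the file name and returning its contents
                as bytes; stands in for the real download
    """
    path = os.path.join(dtd_dir, filename)
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return handle.read()
    data = fetch(filename)
    # Cache the download so later calls do not need the network.
    with open(path, "wb") as handle:
        handle.write(data)
    return data


# Usage with a fake downloader that records how often it is called.
calls = []


def fake_fetch(name):
    calls.append(name)
    return b"<!-- placeholder DTD -->"


cache_dir = tempfile.mkdtemp()
first = open_dtd("xhtml1-strict.dtd", cache_dir, fake_fetch)
second = open_dtd("xhtml1-strict.dtd", cache_dir, fake_fetch)
print("downloads needed: %i" % len(calls))
```

The second call is served from the cached file, so only one download is
recorded. Shipping the files with the distribution, as suggested above, would
avoid even that first download.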

Traceback:

/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py:279: UserWarning: DTD file xhtml1-strict.dtd not found in Biopython installation; trying to retrieve it from NCBI
  warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename)
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py:279: UserWarning: DTD file xhtml-lat1.ent not found in Biopython installation; trying to retrieve it from NCBI
  warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename)
Traceback (most recent call last):
  File "./get_entrez_ests.py", line 158, in <module>
    main()
  File "./get_entrez_ests.py", line 45, in main
    options.verbose)
  File "./get_entrez_ests.py", line 76, in get_entrez_session
    results = Entrez.read(handle)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/__init__.py", line 286, in read
    record = handler.run(handle)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 95, in run
    self.parser.ParseFile(handle)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 283, in external_entity_ref_handler
    parser.ParseFile(handle)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 280, in external_entity_ref_handler
    handle = urllib.urlopen(systemId)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 203, in open
    return getattr(self, name)(url)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 461, in open_file
    return self.open_local_file(url)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 475, in open_local_file
    raise IOError(e.errno, e.strerror, e.filename)
IOError: [Errno 2] No such file or directory: 'xhtml-lat1.ent'


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From biopython at maubp.freeserve.co.uk  Thu Nov 20 17:06:34 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 20 Nov 2008 17:06:34 +0000
Subject: [Biopython-dev] Bio.EZRetrieve appears to be obsolete or
	redunant
In-Reply-To: <49258FC0.10703@gmail.com>
References: <49258FC0.10703@gmail.com>
Message-ID: <320fb6e00811200906p4b8ba2b9jca212a39ec8f972c@mail.gmail.com>

On Thu, Nov 20, 2008 at 4:26 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Hi,
> The Bio.EZRetrieve module retrieves a single nucleotide sequence from
> EZRetrieve website:
> http://siriusb.umdnj.edu:18080/EZRetrieve/single_r.jsp
> It requires a human, rat or mouse nucleic GenBank, UniGene, LocusLink, or
> IMAGE ID. No other genomes are supported.
>
> Although it appears faster than a Bio.GenBank query, I do not see that this
> module provides any special functionality than that already provided by
> Bio.GenBank and similar. So I think this module is obsolete and redundant.

Note the online bits of Bio.GenBank are considered obsoleted by
Bio.Entrez anyway.  Maybe we should actually deprecate these for
Biopython 1.49...

I would agree that in some ways the Bio.EZRetrieve module is also obsolete
and redundant, see also:
http://lists.open-bio.org/pipermail/biopython-dev/2008-March/003503.html

Unless anyone wants to defend Bio.EZRetrieve, let's ask on the main
list about declaring it obsolete for Biopython 1.49 (documentation
change only) and deprecating it in the next release (adding a warning
only).
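
For reference, "adding a warning only" could look roughly like the sketch
below, which uses the standard library warnings module. The helper name
warn_deprecated, the message wording and the suggested replacement are
assumptions for illustration, not what was eventually committed:

```python
import warnings


def warn_deprecated(module_name, replacement):
    """Emit the kind of warning a deprecated module could raise on import."""
    warnings.warn(
        "%s is deprecated and may be removed in a future release; "
        "please consider %s instead." % (module_name, replacement),
        DeprecationWarning,
        stacklevel=2,
    )


# Demonstrate that the warning is raised and can be observed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_deprecated("Bio.EZRetrieve", "Bio.Entrez")

print(caught[0].category.__name__)
print(caught[0].message)
```

In a real module the warnings.warn call would sit at the top of the module
itself, so that importing it triggers the message while the existing
functions keep working.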

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Nov 20 17:06:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 20 Nov 2008 12:06:37 -0500
Subject: [Biopython-dev] [Bug 2678] Entrez.esearch does not always retrieve
	or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200811201706.mAKH6b1r006648@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


------- Comment #1 from lpritc at scri.sari.ac.uk  2008-11-20 12:06 EST -------
And this time, more usefully, traceback with problem code:

>>> handle = Entrez.einfo()
>>> record = Entrez.read(handle)
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py:279: UserWarning: DTD file xhtml1-strict.dtd not found in Biopython installation; trying to retrieve it from NCBI
  warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename)
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py:279: UserWarning: DTD file xhtml-lat1.ent not found in Biopython installation; trying to retrieve it from NCBI
  warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/__init__.py", line 286, in read
    record = handler.run(handle)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 95, in run
    self.parser.ParseFile(handle)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 283, in external_entity_ref_handler
    parser.ParseFile(handle)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 280, in external_entity_ref_handler
    handle = urllib.urlopen(systemId)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 203, in open
    return getattr(self, name)(url)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 461, in open_file
    return self.open_local_file(url)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 475, in open_local_file
    raise IOError(e.errno, e.strerror, e.filename)
IOError: [Errno 2] No such file or directory: 'xhtml-lat1.ent'


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Nov 20 17:07:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 20 Nov 2008 12:07:40 -0500
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200811201707.mAKH7ej9006714@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


lpritc at scri.sari.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Entrez.esearch does not     |Bio.Entrez module does not
                   |always retrieve or find DTD |always retrieve or find DTD
                   |files                       |files


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Nov 20 17:14:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 20 Nov 2008 12:14:35 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200811201714.mAKHEZj4007097@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


------- Comment #4 from cymon.cox at gmail.com  2008-11-20 12:14 EST -------
(In reply to comment #3)
> (In reply to comment #0)
> > I've used the "Sequence Keys" ontology for the location operator and stored
> > loc op in the location_qualifier_value table - not sure this is right...
> >
> 
> I'm not sure off hand either, but would like us to check before committing
> this.  In the short term, whatever BioPerl does is "right" as I'm treating
> that as the BioSQL reference implementation.

I don't read Perl - but I grep'ed through the source and only found one ref to
the location_qualifier_value, and that was in the docs. So maybe they don't
store it there...

Sorry I can't be of more help, C.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Nov 20 22:01:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 20 Nov 2008 17:01:13 -0500
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200811202201.mAKM1Dce030238@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp  2008-11-20 17:01 EST -------
Could you make a list of the missing DTDs? You can add the missing ones to
Bio/Entrez/DTDs and reinstall Biopython. It looks like only xhtml1-strict.dtd
and xhtml-lat1.ent are missing, but after adding these to Bio/Entrez/DTDs you
may find other missing DTDs.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 08:54:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 03:54:00 -0500
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200811210854.mAL8s0Dt009861@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


------- Comment #3 from lpritc at scri.sari.ac.uk  2008-11-21 03:53 EST -------
(In reply to comment #2)
> Could you make a list of the missing DTDs? You can add the missing ones to
> Bio/Entrez/DTDs and reinstall Biopython. It looks like only xhtml1-strict.dtd
> and xhtml-lat1.ent are missing, but after adding these to Bio/Entrez/DTDs you
> may find other missing DTDs.

I'll add the DTDs that I noted above, but the problem is intermittent and I
haven't seen the issue arise again at all, this morning.  If I see anything
else give an error, I'll make a note here.

This may be something to keep in mind if other, similar errors are reported
from future Entrez searches, but if the problem is the result of excessive
server load, or timeouts, it may not be reliably repeatable.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 10:52:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 05:52:17 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200811211052.mALAqHel020569@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-21 05:52 EST -------
(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #0)
> > > I've used the "Sequence Keys" ontology for the location operator and stored
> > > loc op in the location_qualifier_value table - not sure this is right...
> > >
> > 
> > I'm not sure off hand either, but would like us to check before committing
> > this.  In the short term, whatever BioPerl does is "right" as I'm treating
> > that as the BioSQL reference implementation.
> 
> I don't read Perl - but I grep'ed through the source and only found one ref to
> the location_qualifier_value, and that was in the docs. So maybe they don't
> store it there...
> 
> Sorry I can't be of more help, C.
> 

I tried browsing and searching the BioPerl-db source, but couldn't find the
answer, so I tried the direct route and used their load_seqdatabase.pl script
to import a GenBank file (with at least one join location) and inspected the
tables.

The answer is that location.term_id is always left as NULL, so there is no
ontology to worry about.  Doing something sensible with ontologies (e.g.
support for existing strict ontologies like SO or SOFA) rather than the current
ad-hoc relaxed approach (adding new ontology terms on the fly) taken by BioPerl
and Biopython is a possible future enhancement.

I'm going to look at modifying your patch to leave location.term_id as NULL,
with the aim of committing that today and then doing the Biopython 1.49
release.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 11:54:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 06:54:18 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200811211154.mALBsIcR025739@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1073 is|0                           |1
           obsolete|                            |


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 11:59:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 06:59:08 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200811211159.mALBx89Z026099@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1072 is|0                           |1
           obsolete|                            |


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-21 06:59 EST -------
(From update of attachment 1072)
Hi Cymon,

I've just checked in something based on your patches:

Checking in BioSQL/Loader.py;
/home/repository/biopython/biopython/BioSQL/Loader.py,v  <--  Loader.py
new revision: 1.37; previous revision: 1.36
done
Checking in BioSQL/BioSeq.py;
/home/repository/biopython/biopython/BioSQL/BioSeq.py,v  <--  BioSeq.py
new revision: 1.31; previous revision: 1.30
done
Checking in Tests/test_BioSQL_SeqIO.py;
/home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v  <-- 
test_BioSQL_SeqIO.py
new revision: 1.27; previous revision: 1.26
done

This should fix the strand, feature db ref in locations, and importantly the
start/end with sub-features.

I am avoiding the ontology question by leaving location.term_id as NULL
(following BioPerl usage).

I'd like to do the same with location_qualifier_value.term_id but the schema
does not allow NULL here.  Interestingly BioPerl does not seem to use this
table, so I assume they (like Biopython) have been assuming "join".
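
The difference between the two term_id columns can be illustrated with a
minimal sketch. It assumes an in-memory SQLite database and a heavily
simplified two-table stand-in for the schema (only the table and column names
come from the discussion above; everything else is hypothetical and is not the
real BioSQL DDL):

```python
import sqlite3

# Simplified stand-in tables: location.term_id is nullable, while
# location_qualifier_value.term_id is declared NOT NULL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE location (location_id INTEGER PRIMARY KEY, term_id INTEGER)"
)
conn.execute(
    "CREATE TABLE location_qualifier_value ("
    "location_id INTEGER NOT NULL, term_id INTEGER NOT NULL, value TEXT)"
)

# Leaving location.term_id as NULL is accepted.
conn.execute("INSERT INTO location (location_id, term_id) VALUES (1, NULL)")
stored = conn.execute("SELECT term_id FROM location").fetchone()[0]

# The same approach fails for location_qualifier_value.term_id.
try:
    conn.execute(
        "INSERT INTO location_qualifier_value VALUES (1, NULL, 'join')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

In this sketch the first insert succeeds and the second is rejected by the NOT
NULL constraint, which is why the qualifier table cannot simply follow the
"leave it as NULL" approach.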

I think this is still a big improvement, but that the
(sub)feature.location_operator issue could wait.  We'll need to discuss on the
BioSQL mailing list how this should be handled consistently.

Leaving this bug open.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 12:04:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 07:04:39 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To: <bug-2643-42@http.bugzilla.open-bio.org/>
Message-ID: <200811211204.mALC4dUW026607@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643


dalloliogm at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #1048|application/octet-stream    |text/plain
          mime type|                            |


------- Comment #23 from dalloliogm at gmail.com  2008-11-21 07:04 EST -------
(From update of attachment 1048)
changed mime type


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 12:18:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 07:18:35 -0500
Subject: [Biopython-dev] [Bug 2662] Typo in tutorial "Chapter 3 Sequence
	objects "
In-Reply-To: <bug-2662-42@http.bugzilla.open-bio.org/>
Message-ID: <200811211218.mALCIZds027946@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2662


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-21 07:18 EST -------
Fixed in CVS revision 1.187 of biopython/Doc/Tutorial.tex by completely
rephrasing to avoid the contentious sentence structure.  See:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/Tutorial.tex?cvsroot=biopython

Now reads:
> There are two important differences between Seq objects and standard
> python strings. First of all, they have different methods. Although
> the Seq object supports many of the same methods as a plain string,
> its translate() method differs by doing biological translation, and
> there are also additional biologically relevant methods like
> reverse_complement(). Secondly, the Seq object has an important
> attribute, alphabet, which is an object describing what the individual
> characters making up the sequence string "mean", and how they should
> be interpreted. For example, is AGTACACTGGT a DNA sequence, or just
> a protein sequence that happens to be rich in Alanines, Glycines,
> Cysteines and Threonines?

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From biopython at maubp.freeserve.co.uk  Fri Nov 21 12:38:07 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 21 Nov 2008 12:38:07 +0000
Subject: [Biopython-dev] CVS freeze for Biopython 1.49
Message-ID: <320fb6e00811210438v272d32afta03497a846716df6@mail.gmail.com>

On Nov 20, Peter wrote:
> No-one seems to want a second beta, which saves me some time :)
>
> There have been a few other bugs reported and fixed in the meantime,
> right now the only thing I think holding up the release of Biopython
> 1.49 is:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2677
> Bug 2677 - BioSQL seqfeature enhancements

I've committed most of this bug fix to CVS, I think the remaining
issue can wait until after Biopython 1.49 is out.

> Is there anything else?

If there are no last minute objections, my plan is to do the Biopython
1.49 release this afternoon, hopefully starting after lunch - in about
one hour's time.

Please **consider CVS frozen from now**.  Hopefully I'll have the
build done within the next 12 hours, including the Windows installers.

Once the release is out, we'll give it a few days just in case there
are any issues to force a re-release, and then reopen CVS.  Tiago has
some more PopGen code waiting, and there is also GenomeDiagram to look
forward to (Bug 2671).

Peter


From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 14:46:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 09:46:29 -0500
Subject: [Biopython-dev] [Bug 2680] New: Bio.AlignAce.Parser.py need to
	import string
Message-ID: <bug-2680-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2680

           Summary: Bio.AlignAce.Parser.py need to import string
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: trivial
          Priority: P4
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


The file Bio.AlignAce.Parser.py needs to 'import string' because it uses the
function 'string.atof()'. Also, please note that string.atof() is a deprecated
function (since Python 2.0) but it will not get removed until Python 3.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 14:57:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 09:57:47 -0500
Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to import
	string
In-Reply-To: <bug-2680-42@http.bugzilla.open-bio.org/>
Message-ID: <200811211457.mALEvlR5009727@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2680


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-21 09:57 EST -------
This used to work via the "from Bio.ParserSupport import *", as up until
Biopython 1.48 that imported string.

Fixed in Bio/AlignAce/Parser.py revision 1.4 by importing string (this will be
included in Biopython 1.49).

I'm leaving this bug open as I would rather not use the string module here at
all - probably we can just use float() instead of string.atof() but that can
wait until after Biopython 1.49 is out.
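
As a minimal sketch of the suggested replacement (the function name and the
sample input are hypothetical, not taken from Bio/AlignAce/Parser.py), the
built-in float() can do the conversion without importing the string module:

```python
def parse_score(text):
    """Convert a numeric field to a float, as string.atof() used to do."""
    # float() tolerates surrounding whitespace, so no strip() is needed.
    return float(text)


score = parse_score(" 12.5\n")
```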


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bsouthey at gmail.com  Fri Nov 21 15:19:22 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Fri, 21 Nov 2008 09:19:22 -0600
Subject: [Biopython-dev] Use of depreciated string functions
Message-ID: <4926D17A.8080101@gmail.com>

Hi,
There are a number of files in Bio that import string. Many of these use 
deprecated functions (since Python 2.0) that now have built-in or string 
method equivalents, mainly string.atof(), string.atoi() and string.join(). The only real 
advantage of modifying these is to remove an import statement because 
these will not be removed until Python 3.

Perhaps the one exception is in HotRand.py: hex_digit = 
string.hexdigits.find( letter )

There are about 23 unique files that I identified via grep and many have 
more than one usage. While changing these is busy work, please let me 
know if you would like me to create patches for the next version of 
Biopython (ie 1.50) or just ignore this.
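
A rough sketch of the kind of change involved, using made-up values rather
than code from any Biopython module:

```python
import string  # only needed for the hexdigits constant

# int() replaces string.atoi()
count = int("42")

# The str.join() method replaces string.join(words, " ")
line = " ".join(["A", "C", "G"])

# Constants such as string.hexdigits are not deprecated, which is why
# the HotRand.py usage would still need the import.
hex_digit = string.hexdigits.find("f")
```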

Thanks
Bruce


From biopython at maubp.freeserve.co.uk  Fri Nov 21 15:26:52 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 21 Nov 2008 15:26:52 +0000
Subject: [Biopython-dev] Use of depreciated string functions
In-Reply-To: <4926D17A.8080101@gmail.com>
References: <4926D17A.8080101@gmail.com>
Message-ID: <320fb6e00811210726n94e277ex359d93de0855045e@mail.gmail.com>

On Fri, Nov 21, 2008 at 3:19 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Hi,
> There are a number of files in Bio that import string. Many of these use
> deprecated functions (since Python 2.0) that now have built-in or string
> method equivalents, mainly string.atof(), string.atoi() and string.join().
> The only real advantage of modifying these is to remove an import statement
> because these will not be removed until Python 3.
>
> Perhaps the one exception is in HotRand.py: hex_digit =
> string.hexdigits.find( letter )
>
> There are about 23 unique files that I identified via grep and many have
> more than one usage. While changing these is busy work, please let me know
> if you would like me to create patches for the next version of Biopython (ie
> 1.50) or just ignore this.

As you say, there isn't much benefit from doing this other than
removing an import and making another small step towards Python 3.0
compatibility.  We have gradually been phasing out "import string"
already, usually when working on a module which used it.

Once I've dealt with Biopython 1.49, I'd be happy to look at a patch
to remove more "import string" usage from non-obsolete, non-deprecated
code.  It would be a little risky doing this to modules without unit
tests, but that's another area you've shown some interest in anyway...

Thanks,

Peter


From bartek at rezolwenta.eu.org  Fri Nov 21 15:32:02 2008
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Fri, 21 Nov 2008 16:32:02 +0100
Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to
	import string
In-Reply-To: <200811211457.mALEvlR5009727@portal.open-bio.org>
References: <bug-2680-42@http.bugzilla.open-bio.org/>
	<200811211457.mALEvlR5009727@portal.open-bio.org>
Message-ID: <8b34ec180811210732o4266a87ey2a4c14a7ddc5ead5@mail.gmail.com>

Hello,

I fixed the bug (changed both uses of string.atof() to float() ), and
committed to CVS, although I cannot close it in Bugzilla (my
dev.open-bio account does not seem to work for bugzilla).


cheers
Bartek Wilczynski

On Fri, Nov 21, 2008 at 3:57 PM,  <bugzilla-daemon at portal.open-bio.org> wrote:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2680
>
>
>
>
>
> ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-21 09:57 EST -------
> This used to work via the "from Bio.ParserSupport import *", as up until
> Biopython 1.48 that imported string.
>
> Fixed in Bio/AlignAce/Parser.py revision 1.4 by importing string (this will be
> included in Biopython 1.49).
>
> I'm leaving this bug open as I would rather not use the string module here at
> all - probably we can just use float() instead of string.atof() but that can
> wait until after Biopython 1.49 is out.
>
>
> --
> Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You are the assignee for the bug, or are watching the assignee.
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
>


-- 
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433


From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 15:41:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 10:41:54 -0500
Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to import
	string
In-Reply-To: <bug-2680-42@http.bugzilla.open-bio.org/>
Message-ID: <200811211541.mALFfsDM013508@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2680


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-21 10:41 EST -------
Bartek's email:
> Hello,
>
> I fixed the bug (changed both uses of string.atof() to float() ),
> and committed to CVS, although I cannot close it in Bugzilla (my
> dev.open-bio account does not seem to work for bugzilla).
>
> cheers
> Bartek Wilczynski

Marking this as fixed.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From biopython at maubp.freeserve.co.uk  Fri Nov 21 15:45:42 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 21 Nov 2008 15:45:42 +0000
Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to
	import string
In-Reply-To: <8b34ec180811210732o4266a87ey2a4c14a7ddc5ead5@mail.gmail.com>
References: <bug-2680-42@http.bugzilla.open-bio.org/>
	<200811211457.mALEvlR5009727@portal.open-bio.org>
	<8b34ec180811210732o4266a87ey2a4c14a7ddc5ead5@mail.gmail.com>
Message-ID: <320fb6e00811210745yc8e796ei9bc04a2e2cebda8b@mail.gmail.com>

On Fri, Nov 21, 2008 at 3:32 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:
> Hello,
>
> I fixed the bug (changed both uses of string.atof() to float() ), and
> committed to CVS, although I cannot close it in Bugzilla (my
> dev.open-bio account does not seem to work for bugzilla).
>
> cheers
> Bartek Wilczynski

Thanks Bartek,

I was partway through the build process for the Biopython 1.49
release, but I've got that latest Bio/AlignAce/Parser.py file now.
I've closed Bug 2680 - I'm not sure how the permissions work on
Bugzilla exactly...

On a related note - could you write a unit test for Bio.AlignAce please?

Thanks,

Peter


From biopython at maubp.freeserve.co.uk  Fri Nov 21 16:07:00 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 21 Nov 2008 16:07:00 +0000
Subject: [Biopython-dev] Warnings from epydoc
Message-ID: <320fb6e00811210807xed03553x24e3abc571e9f20a@mail.gmail.com>

Hi all,

Something that I could have mentioned when I built the beta is there
are a lot of warnings from epydoc.  Ignoring a few from deprecated
modules etc, there is a whole class as follows:

Warning: Module Bio.KDTree.KDTree is shadowed by a variable with the same name.
Warning: Module Bio.PDB.DSSP is shadowed by a variable with the same name.
Warning: Module Bio.PDB.FragmentMapper is shadowed by a variable with
the same name.
Warning: Module Bio.PDB.NeighborSearch is shadowed by a variable with
the same name.
Warning: Module Bio.PDB.PDBIO is shadowed by a variable with the same name.
Warning: Module Bio.PDB.PDBList is shadowed by a variable with the same name.
Warning: Module Bio.PDB.PDBParser is shadowed by a variable with the same name.
Warning: Module Bio.PDB.ResidueDepth is shadowed by a variable with
the same name.
Warning: Module Bio.PDB.StructureAlignment is shadowed by a variable
with the same name.
Warning: Module Bio.PDB.Superimposer is shadowed by a variable with
the same name.
Warning: Module Bio.PDB.Vector is shadowed by a variable with the same name.
Warning: Module Bio.PDB.parse_pdb_header is shadowed by a variable
with the same name.
Warning: Module Bio.SVDSuperimposer.SVDSuperimposer is shadowed by a
variable with the same name.
Warning: Module Bio.SCOP.Residues is shadowed by a variable with the same name.

One visible side effect of this in the epydoc output is these modules
get shown with an apostrophe suffix for disambiguation.

On another point, I think some of the imports used in Bio.PopGen are
making epydoc unhappy:

+-------------------------------------------------------------------------------------------------
| In /usr/local/lib/python2.5/site-packages/Bio/PopGen/SimCoal/Cache.py:
| Import failed (but source code parsing was successful).
|     Error: ImportError: No module named PopGen.SimCoal.Controller (line 14)
|
+-------------------------------------------------------------------------------------------------
| In /usr/local/lib/python2.5/site-packages/Bio/PopGen/SimCoal/Async.py:
| Import failed (but source code parsing was successful).
|     Error: ImportError: No module named PopGen.SimCoal.Controller (line 16)
|

Taking Bio/PopGen/SimCoal/Cache.py as an example, currently this has:

from PopGen.SimCoal.Controller import SimCoalController
from PopGen import Config

Perhaps this should be changed to either local imports:

from Controller import SimCoalController
import Config

or full imports:

from Bio.PopGen.SimCoal.Controller import SimCoalController
from Bio.PopGen import Config

(Neither tested yet).

I don't know if the current imports have any downsides (apart from
upsetting epydoc), as the current code works and the unit tests pass.
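
The difference between the two import styles can be sketched with a throwaway
package built in a temporary directory. All names here (demo_bio, popgen,
controller, cache_full, cache_partial) are hypothetical stand-ins, not the
real Bio.PopGen layout, and the sketch says nothing about how epydoc itself
resolves imports:

```python
import importlib
import os
import sys
import tempfile


def make_package(root):
    """Create demo_bio/popgen with two modules that import a sibling."""
    top = os.path.join(root, "demo_bio")
    pkg = os.path.join(top, "popgen")
    os.makedirs(pkg)
    for directory in (top, pkg):
        open(os.path.join(directory, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "controller.py"), "w") as handle:
        handle.write("class Controller(object):\n    pass\n")
    # Full import: names the package from its top level.
    with open(os.path.join(pkg, "cache_full.py"), "w") as handle:
        handle.write("from demo_bio.popgen.controller import Controller\n")
    # Partial import: omits the top-level package name.
    with open(os.path.join(pkg, "cache_partial.py"), "w") as handle:
        handle.write("from popgen.controller import Controller\n")


root = tempfile.mkdtemp()
make_package(root)
sys.path.insert(0, root)
importlib.invalidate_caches()

full = importlib.import_module("demo_bio.popgen.cache_full")
full_ok = hasattr(full, "Controller")

try:
    importlib.import_module("demo_bio.popgen.cache_partial")
    partial_ok = True
except ImportError:
    partial_ok = False
```

With only the parent of demo_bio on sys.path, the full import resolves while
the partial one raises ImportError, which matches the shape of the
"No module named PopGen.SimCoal.Controller" message above.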

Peter


From bsouthey at gmail.com  Fri Nov 21 16:15:29 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Fri, 21 Nov 2008 10:15:29 -0600
Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need
 to	import string
In-Reply-To: <320fb6e00811210745yc8e796ei9bc04a2e2cebda8b@mail.gmail.com>
References: <bug-2680-42@http.bugzilla.open-bio.org/>	<200811211457.mALEvlR5009727@portal.open-bio.org>	<8b34ec180811210732o4266a87ey2a4c14a7ddc5ead5@mail.gmail.com>
	<320fb6e00811210745yc8e796ei9bc04a2e2cebda8b@mail.gmail.com>
Message-ID: <4926DEA1.7020405@gmail.com>

Peter wrote:
> On Fri, Nov 21, 2008 at 3:32 PM, Bartek Wilczynski
> <bartek at rezolwenta.eu.org> wrote:
>   
>> Hello,
>>
>> I fixed the bug (changed both uses of string.atof() to float() ), and
>> committed to CVS, although I cannot close it in Bugzilla (my
>> dev.open-bio account does not seem to work for bugzilla).
>>
>> cheers
>> Bartek Wilczynski
>>     
>
> Thanks Bartek,
>
> I was partway through the build process for the Biopython 1.49
> release, but I've got that latest Bio/AlignAce/Parser.py file now.
> I've closed Bug 2680 - I'm not sure how the permissions work on
> Bugzilla exactly...
>
> On a related note - could you write a unit test for Bio.AlignAce please?
>
> Thanks,
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
>   
Hi Bartek,
I have just started working through the functionality of the 
code, so it would be really great to have tests and a tutorial section on 
AlignAce.

So far I know that there needs to be at least two tests for AlignAce:
1) Running  Bio.AlignAce.AlignAceStandalone
2) Parsing the output  from AlignAce

There needs to be similar tests for CompareAce.

Also, could you please add the following lines to your AlignAce2004 code 
(I downloaded it from your site yesterday) to standard.h?

#include <limits.h>
#include <string.h>

I needed these to compile AlignAce under Linux with gcc version 4.3.2. I 
would also suggest not including binaries because they are linked 
against old C++ libraries. Running just './AlignACE' gives the error:
./AlignACE: error while loading shared libraries: libstdc++.so.5: cannot 
open shared object file: No such file or directory

Thanks
Bruce


From biopython at maubp.freeserve.co.uk  Fri Nov 21 16:59:08 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 21 Nov 2008 16:59:08 +0000
Subject: [Biopython-dev] Biopython 1.49 released
Message-ID: <320fb6e00811210859n2d128fd6nc21ad1012e1d93bf@mail.gmail.com>

Dear Biopythoneers,

We are pleased to announce the release of Biopython 1.49. There have
been some significant changes since Biopython 1.48 was released a few
months ago, which is why we initially released a beta for wider
testing. Thank you to all those who tried this and reported the minor
problems uncovered.

As previously announced, the big news is that Biopython now uses NumPy
rather than its precursor Numeric (the original Numerical Python
library).

As in the previous releases, Biopython 1.49 supports Python 2.3, 2.4
and 2.5 but should now also work fine on Python 2.6. Please note that
we intend to drop support for Python 2.3 in a couple of releases' time.

We also have some new functionality, starting with the basic sequence
object (the Seq class) which now has more methods. This encourages a
more object orientated coding style, and makes basic biological
operations like transcription and translation more accessible and
discoverable.

Our BioSQL interface can now optionally fetch the NCBI taxonomy on
demand when loading sequences (via Bio.Entrez) allowing you to
populate the taxon/taxon_name tables gradually. Also, BioSQL should
now work with the psycopg2 driver for PostgreSQL (as well as the older
psycopg driver), and the handling of feature locations has also been
improved.

We've also updated the Biopython Tutorial and Cookbook (also available in PDF).
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Finally, our old parsing infrastructure (Martel and Bio.Mindy) is now
considered to be deprecated, meaning mxTextTools is no longer required
to use Biopython. This should not affect any of the typically used
parsers (e.g. Bio.SeqIO and Bio.AlignIO).

Given there have been more changes than in recent Biopython releases,
please do check your old scripts still work fine, and let us know on
the mailing list or file a bug if there is anything wrong.

Source distributions and Windows installers are available from the
Biopython website:
http://biopython.org/wiki/Download

Thanks!

-Peter on behalf of the Biopython developers

P.S. You may wish to subscribe to our news feed.  For RSS links etc, see:
http://biopython.org/wiki/News


From biopython at maubp.freeserve.co.uk  Fri Nov 21 17:05:46 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 21 Nov 2008 17:05:46 +0000
Subject: [Biopython-dev] CVS freeze for Biopython 1.49
In-Reply-To: <320fb6e00811210438v272d32afta03497a846716df6@mail.gmail.com>
References: <320fb6e00811210438v272d32afta03497a846716df6@mail.gmail.com>
Message-ID: <320fb6e00811210905i4835819bvb4955b05658ef535@mail.gmail.com>

> If there are no last minute objections, my plan is to do the Biopython
> 1.49 release this afternoon, hopefully starting after lunch - in about
> one hour's time.
>
> Please **consider CVS frozen from now**.  Hopefully I'll have the
> build done within the next 12 hours, including the Windows installers.

OK, the release is out.  Thanks everyone!  I haven't sat down and
counted, but it feels like there were more people involved and taking
an interest than for Biopython 1.48, which is great.

> Once the release is out, we'll give it a few days just in case there
> are any issues to force a re-release, and then reopen CVS.

The CVS "freeze" is over, but for the next couple of days, please only
commit small bug fixes and documentation improvements.  Barring any
surprises, we can expect to start looking at adding new code mid next
week:

> Tiago has some more PopGen code waiting, and there is also
> GenomeDiagram to look forward to (Bug 2671).

Have a good weekend,

Regards,

Peter


From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 17:24:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 12:24:55 -0500
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200811211724.mALHOt8x003395@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-21 12:24 EST -------
Looking at the code for the external_entity_ref_handler function in
Bio/Entrez/Parser.py, it doesn't actually attempt to cache missing DTD files.

Would this be a worthwhile enhancement?  We would have to cope with the fact
that the process may not have permissions to write to the DTD directory,
perhaps by falling back on the system temp folder?
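
One possible shape for that fallback, as a sketch only: the function names
below are hypothetical and this is not what Bio/Entrez/Parser.py currently
does.

```python
import os
import tempfile


def choose_dtd_cache_dir(preferred_dir):
    """Return preferred_dir if it is a writable directory, else the temp folder."""
    if os.path.isdir(preferred_dir) and os.access(preferred_dir, os.W_OK):
        return preferred_dir
    # No permission (or no such directory): fall back on the system temp folder.
    return tempfile.gettempdir()


def cache_dtd(filename, content, preferred_dir):
    """Write the downloaded DTD bytes to the chosen directory; return the path."""
    target_dir = choose_dtd_cache_dir(preferred_dir)
    path = os.path.join(target_dir, os.path.basename(filename))
    with open(path, "wb") as handle:
        handle.write(content)
    return path
```

A parser using something like this would also need to search both locations
when looking for a DTD, since files cached in the temp folder would not appear
in the installed DTD directory.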


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 19:22:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 14:22:36 -0500
Subject: [Biopython-dev] [Bug 2591] GenBank files misparsed for long
	organism names
In-Reply-To: <bug-2591-42@http.bugzilla.open-bio.org/>
Message-ID: <200811211922.mALJMa8Q011752@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2591


------- Comment #3 from joelb at lanl.gov  2008-11-21 14:22 EST -------
I never heard back from info at genbank, so I found a different contact there and
I just re-sent the problem.  I'll follow up when I hear something.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 19:31:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 14:31:26 -0500
Subject: [Biopython-dev] [Bug 2681] New: BioSQL: record annotations
	enhancements
Message-ID: <bug-2681-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681

           Summary: BioSQL: record annotations enhancements
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: cymon.cox at gmail.com


BioSQL storage and retrieval of record annotations. See also bug 2396.


Patch fixes 3 annotations:

1) Fixed date/dates typo.
2) Comments were being stored but not retrieved - fixed with test.
3) A 'reference' annotation, even if an empty list, was being retrieved in a
DBSeqRecord. Fixed so that if there are no references there is no annotation in
DBSeqRecord.

Other annotations:

'date', 'ncbi_taxid', 'gi', and 'contig' are the only annotations we are not
handling correctly in the test suite.

'date' can be ignored if present in DBSeqRecord but absent in SeqRecord because
the current date is entered into table if a date is not present in the record.

Annotation 'ncbi_taxid' will be present in the DBSeqRecords even when not
present in the loaded SeqRecord as they are grabbed from the taxon table. We
can therefore ignore this specific comparison: old record absent, new record
present. Some swiss prot SeqRecords have ncbi_taxid and they are retrieved
correctly by DBSeqRecord. TODO: others have ncbi_taxid that is missing from the
retrieved DBSeqRecord: sp012, sp014, 
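
The "old record absent, new record present" rule for 'date' and 'ncbi_taxid'
could be sketched roughly as follows. This is an illustration with plain
dictionaries, not the code in test_BioSQL_SeqIO.py, and the function name is
hypothetical:

```python
def compare_annotations(old, new, may_be_added=("date", "ncbi_taxid")):
    """Return the sorted annotation keys that do not survive a round trip.

    `old` is the annotations dict of the loaded SeqRecord and `new` that of
    the retrieved DBSeqRecord. Keys listed in `may_be_added` are allowed to
    be absent from `old` but present in `new`.
    """
    problems = []
    for key in sorted(set(old) | set(new)):
        if key in old and key in new:
            if old[key] != new[key]:
                problems.append(key)  # value changed in the database
        elif key in new and key in may_be_added:
            continue  # filled in by the loader or the taxon table
        else:
            problems.append(key)  # lost on retrieval, or added unexpectedly
    return problems
```

With this rule a 'gi' annotation that appears only in the retrieved record is
still reported, which matches the open question about 'gi' in this report.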

Swissprot, fasta, and EMBL SeqRecords don't have a gi annotation, retrieved
DBSeqRecords do. Loader uses the 'record_id' (line 522) as the identifier in
bioentry, if the gi annotation is missing, which is pulled as the gi
annotation. So the swissprot, fasta, and embl DBSeqRecords return the accession
as the gi (GenInfo Identifier). I think this is misleading; annotation 'gi' in
the DBSeqRecord should really be named a more generic 'identifier'...  What to
do here?

'contig' is ignored by loader because it's a SeqFeature object. Is there any
reason it couldn't be loaded and retrieved? (record is GenBank/NT_019265.gb)




From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 19:32:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 14:32:43 -0500
Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements
In-Reply-To: <bug-2681-42@http.bugzilla.open-bio.org/>
Message-ID: <200811211932.mALJWhXP012653@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681


------- Comment #1 from cymon.cox at gmail.com  2008-11-21 14:32 EST -------
Created an attachment (id=1074)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1074&action=view)
BioSQL patch for enhancements to record annotations




From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 22:41:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 17:41:16 -0500
Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements
In-Reply-To: <bug-2681-42@http.bugzilla.open-bio.org/>
Message-ID: <200811212241.mALMfGT8026797@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-21 17:41 EST -------
(In reply to comment #0)
> 1) Fixed date/dates typo.

Why is it a typo?  Change not checked in.

> 2) Comments were being stored but not retrieved - fixed with test.

Looks good, except for returning an empty list if there were no comments.

> 3) A 'reference' annotation, even if an empty list, was being retrieved in a
> DBSeqRecord. Fixed so that if there are no references there is no annotation
> in DBSeqRecord.

I agree, but preferred a smaller change for this:

Checking in BioSQL/BioSeq.py;
/home/repository/biopython/biopython/BioSQL/BioSeq.py,v  <--  BioSeq.py
new revision: 1.33; previous revision: 1.32
done
Checking in Tests/test_BioSQL_SeqIO.py;
/home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v  <-- 
test_BioSQL_SeqIO.py
new revision: 1.29; previous revision: 1.28
done

This was based closely on your patch, so thank you!  You are making steady
progress through the remaining "TODO" notes I left when writing
test_BioSQL_SeqIO.py :)

> Some swiss prot SeqRecords have ncbi_taxid and they are retrieved
> correctly by DBSeqRecord. TODO: others have ncbi_taxid that is missing
> from the retrieved DBSeqRecord: sp012, sp014, 

Note some swiss prot records may be multi-species, which the BioSQL schema
can't cope with.  Not sure if that applies here.

> Swissprot, fasta, and EMBL SeqRecords dont have a gi annotation, retrieved
> DBSeqRecords do. Loader uses the 'record_id' (line 522) as the identifier in
> bioentry, if the gi annotation is missing, which is pulled as the gi
> annotation.

There probably is something not quite right here.  Are you talking about the
bioentry.identifier entry in the database?  Perhaps an explicit example might
help.  As an aside, I think "gi" (the GenInfo Identifier used by NCBI) might be
better stored in the record.dbxrefs, but that could be a parser change...

> 'contig' is ignored by loader because it's a SeqFeature object. Is there any
> reason it couldnt be loaded and retrieved? (record is GenBank/NT_019265.gb)

I couldn't even say off hand how the CONTIG line in that example would be
parsed, let alone how it gets dealt with when loading into BioSQL.




From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 22:42:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 17:42:33 -0500
Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements
In-Reply-To: <bug-2681-42@http.bugzilla.open-bio.org/>
Message-ID: <200811212242.mALMgXAN026914@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-21 17:42 EST -------
P.S. For a little background, see Bug 2396.  Looking back I can see why I
missed the comments annotation at the time (being stored in a different table).




From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 23:47:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 18:47:13 -0500
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200811212347.mALNlDsF030565@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp  2008-11-21 18:47 EST -------
(In reply to comment #4)
> Looking at the code for the external_entity_ref_handler function in
> Bio/Entrez/Parser.py, it doesn't actually attempt to cache missing DTD files.
> 
> Would this be a worthwhile enhancement?  We would have to cope with the fact
> that the process may not have permissions to write to the DTD directory,
> perhaps by falling back on the system temp folder?
> 
I think that there is an easier solution, which is to include all missing DTDs
with the Biopython installation. The number of DTDs is limited; I tried to
identify all of them but apparently I missed some. 




From bugzilla-daemon at portal.open-bio.org  Fri Nov 21 23:49:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 18:49:27 -0500
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200811212349.mALNnRMn030720@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp  2008-11-21 18:49 EST -------
> I'll add the DTDs that I noted above, but the problem is intermittent and I
> haven't seen the issue arise again at all, this morning.  If I see anything
> else give an error, I'll make a note here.
> 
If the DTD is available locally in Bio/Entrez/DTDs, then Bio.Entrez will read
it from there. If not, it tries to download it. This may fail if the servers
are busy. If the needed DTDs are saved in Bio/Entrez/DTDs (and installed when
Biopython is installed), you won't run into this problem.




From bugzilla-daemon at portal.open-bio.org  Sun Nov 23 15:16:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 23 Nov 2008 10:16:53 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200811231516.mANFGraa019222@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #7 from dalloliogm at gmail.com  2008-11-23 10:16 EST -------
(In reply to comment #0)

> The major changes that have been made to the version previously available at
> http://bioinf.scri.ac.uk/lp are:

That's a very nice contribution, thank you!!!
This link is wrong, I think you mean
http://bioinf.scri.ac.uk/lp/programs.php#genomediagram


> 


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From dalloliogm at gmail.com  Sun Nov 23 17:33:54 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Sun, 23 Nov 2008 18:33:54 +0100
Subject: [Biopython-dev] blog article on GenomeDiagram in Biopython
Message-ID: <5aa3b3570811230933n2de8af3lf31d3c4b962930a3@mail.gmail.com>

Hi people,
I thought that the inclusion of GenomeDiagram in Biopython is such
interesting news that I wrote a blog post on it:
- http://bioinfoblog.it/2008/11/genome-diagrams-included-in-biopython-150/

I have used images from some tutorials without asking, I hope it is
not a problem.
Cheers! :)


On Sun, Nov 23, 2008 at 4:16 PM,  <bugzilla-daemon at portal.open-bio.org> wrote:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2671
>
>

-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it


From mjldehoon at yahoo.com  Mon Nov 24 06:44:13 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 23 Nov 2008 22:44:13 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
Message-ID: <871524.42970.qm@web62403.mail.re1.yahoo.com>

Hi everybody,

Biopython's testing framework is built on top of Python's unit testing framework. Python's unit testing framework makes use of assertion statements to compare the result of a command to the expected result. Biopython uses test scripts that print output to stdout, together with an output file that contains the correct output. After running each test script, it compares the generated output with the correct output to see if the test was successful.

This approach can be useful for modules that deal with different file formats. For example, you can read in a file in one format, write it out in a different format, and compare it with the expected result.

However, more than half of Biopython's tests do not actually make use of this testing framework:

test_BioSQL
test_CAPS
test_Cluster
test_CodonTable
test_Compass
test_Crystal
test_DocSQL
test_EmbossPrimer
test_Entrez
test_Fasta
test_GACrossover
test_GAMutation
test_GAOrganism
test_GAQueens
test_GARepair
test_GASelection
test_GFF
test_GFF2
test_GraphicsChromosome
test_GraphicsDistribution
test_GraphicsGeneral
test_HMMCasino
test_HMMGeneral
test_HotRand
test_KDTree
test_KeyWList
test_LogisticRegression
test_Medline
test_NNExclusiveOr
test_NNGene
test_NNGeneral
test_Pathway
test_PopGen_FDist
test_PopGen_FDist_nodepend
test_PopGen_SimCoal
test_PopGen_SimCoal_nodepend
test_Registry
test_Restriction
test_SCOP_Astral
test_SCOP_Cla
test_SCOP_Des
test_SCOP_Dom
test_SCOP_Hie
test_SCOP_Raf
test_SCOP_Residues
test_SCOP_Scop
test_Wise
test_docstrings
test_kNN
test_lowess
test_psw

These tests have trivial output, for example test_Cluster:

test_Cluster
test_clusterdistance (test_Cluster.TestCluster) ... ok
test_distancematrix_kmedoids (test_Cluster.TestCluster) ... ok
test_kcluster (test_Cluster.TestCluster) ... ok
test_matrix_parse (test_Cluster.TestCluster) ... ok
test_median_mean (test_Cluster.TestCluster) ... ok
test_somcluster (test_Cluster.TestCluster) ... ok
test_treecluster (test_Cluster.TestCluster) ... ok

----------------------------------------------------------------------
Ran 7 tests in 0.015s

OK

I suspect that for many of the remaining tests Biopython's unit testing framework doesn't bring any real advantage, but is used anyway solely because it currently is the standard in Biopython.

Personally, I find Python's unit testing framework easier to understand than Biopython's testing framework. It doesn't need a separate output file, and it is easier to match each line of code with the correct behavior.
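
As a rough illustration of the two styles described above (the function under test and the expected output are made up for this sketch, and are not taken from Biopython's test suite):

```python
import io
import unittest
from contextlib import redirect_stdout


def reverse_complement(seq):
    """Toy function under test (hypothetical, not a Biopython API)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


# Style 1: print-and-compare. The script only prints; the captured output
# is then compared with a stored copy of the correct output.
def script_style_test():
    print(reverse_complement("AACG"))
    print(reverse_complement("TTTT"))


EXPECTED_OUTPUT = "CGTT\nAAAA\n"  # would normally live in a separate file


def run_print_and_compare():
    """Run the script-style test and report whether the output matched."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        script_style_test()
    return buffer.getvalue() == EXPECTED_OUTPUT


# Style 2: unittest. Each expectation sits next to the call it checks,
# so no separate output file is needed.
class ReverseComplementTest(unittest.TestCase):
    def test_mixed(self):
        self.assertEqual(reverse_complement("AACG"), "CGTT")

    def test_homopolymer(self):
        self.assertEqual(reverse_complement("TTTT"), "AAAA")
```

In the first style a failure only says that the output differs somewhere; in the second the failing assertion names the call and the expected value.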

I would therefore like to suggest moving from Biopython's testing framework to Python's testing framework. This also relieves us of the task of explaining Biopython's testing framework to contributors, and allows us to make better use of what Python already provides. Comparing output line-by-line, as Biopython's testing framework currently does, can still be used by test scripts that need this functionality.

Comments, suggestions, anybody?

--Michiel.


From dalloliogm at gmail.com  Mon Nov 24 09:04:08 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Mon, 24 Nov 2008 10:04:08 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <871524.42970.qm@web62403.mail.re1.yahoo.com>
References: <871524.42970.qm@web62403.mail.re1.yahoo.com>
Message-ID: <5aa3b3570811240104m1442e5dfkd0c0f92c6fa772f9@mail.gmail.com>

On Mon, Nov 24, 2008 at 7:44 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
> Biopython's testing framework is built on top of Python's unit testing framework. Python's unit testing framework makes use of assertion statements to compare the result of a command to the expected result.

Hi,
I was also proposing to use the doctest framework for some of the
modules, and for enhancing documentation.

- http://bugzilla.open-bio.org/show_bug.cgi?id=2640
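
A small sketch of what a doctest looks like, assuming a made-up helper
function (gc_content and run_examples are hypothetical names, not existing
Biopython functions). The examples in the docstring serve as both
documentation and test:

```python
import doctest


def gc_content(seq):
    """Return the fraction of G and C letters in a sequence.

    >>> gc_content("GGCC")
    1.0
    >>> gc_content("ATGC")
    0.5
    >>> gc_content("")
    0.0
    """
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def run_examples(func):
    """Run the docstring examples of `func`; return (failed, attempted)."""
    runner = doctest.DocTestRunner(verbose=False)
    finder = doctest.DocTestFinder()
    # Make the function visible under its own name inside the examples.
    for test in finder.find(func, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)
```

Calling run_examples(gc_content) executes the three examples and reports how
many of them failed.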


> Biopython uses test scripts that print output to stdout, together with an output file that contains the
> correct output. After running each test script, it compares the generated output with the correct
> output to see if the test was successful.
>
> This approach can be useful for modules that deal with different file formats. For example, you can read in a file in one format, write it out in a different format, and compare it with the expected result.
>
> However, more than half of Biopython's tests do not actually make use of this testing framework:
>

Do you need help in re-organizing all of these modules?

> test_BioSQL
> test_CAPS
> test_Cluster
> test_CodonTable
> test_Compass
> test_Crystal
> test_DocSQL
> test_EmbossPrimer
> test_Entrez
> test_Fasta
> test_GACrossover
> test_GAMutation
> test_GAOrganism
> test_GAQueens
> test_GARepair
> test_GASelection
> test_GFF
> test_GFF2
> test_GraphicsChromosome
> test_GraphicsDistribution
> test_GraphicsGeneral
> test_HMMCasino
> test_HMMGeneral
> test_HotRand
> test_KDTree
> test_KeyWList
> test_LogisticRegression
> test_Medline
> test_NNExclusiveOr
> test_NNGene
> test_NNGeneral
> test_Pathway
> test_PopGen_FDist
> test_PopGen_FDist_nodepend
> test_PopGen_SimCoal
> test_PopGen_SimCoal_nodepend
> test_Registry
> test_Restriction
> test_SCOP_Astral
> test_SCOP_Cla
> test_SCOP_Des
> test_SCOP_Dom
> test_SCOP_Hie
> test_SCOP_Raf
> test_SCOP_Residues
> test_SCOP_Scop
> test_Wise
> test_docstrings
> test_kNN
> test_lowess
> test_psw
>
> These tests have trivial output, for example test_Cluster:
>
> test_Cluster
> test_clusterdistance (test_Cluster.TestCluster) ... ok
> test_distancematrix_kmedoids (test_Cluster.TestCluster) ... ok
> test_kcluster (test_Cluster.TestCluster) ... ok
> test_matrix_parse (test_Cluster.TestCluster) ... ok
> test_median_mean (test_Cluster.TestCluster) ... ok
> test_somcluster (test_Cluster.TestCluster) ... ok
> test_treecluster (test_Cluster.TestCluster) ... ok
>
> ----------------------------------------------------------------------
> Ran 7 tests in 0.015s
>
> OK
>
> I suspect that for many of the remaining tests Biopython's unit testing framework doesn't bring any real advantage, but is used anyway solely because it currently is the standard in Biopython.
>
> Personally, I find Python's unit testing framework easier to understand than Biopython's testing framework. It doesn't need a separate output file, and it is easier to match each line of code with the correct behavior.
>
> I would therefore like to suggest to move from Biopython's testing framework to Python's testing framework. This also relieves us of the task of explaining Biopython's testing framework to contributors, and allows us to make better use of what Python already provides. Comparing output line-by-line, as Biopython's testing framework currently does, can still be used by test scripts that need this functionality.
>
> Comments, suggestions, anybody?
>
> --Michiel.
>


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it


From bartek at rezolwenta.eu.org  Mon Nov 24 12:45:52 2008
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 24 Nov 2008 13:45:52 +0100
Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to
	import string
In-Reply-To: <320fb6e00811210745yc8e796ei9bc04a2e2cebda8b@mail.gmail.com>
References: <bug-2680-42@http.bugzilla.open-bio.org/>
	<200811211457.mALEvlR5009727@portal.open-bio.org>
	<8b34ec180811210732o4266a87ey2a4c14a7ddc5ead5@mail.gmail.com>
	<320fb6e00811210745yc8e796ei9bc04a2e2cebda8b@mail.gmail.com>
Message-ID: <8b34ec180811240445w3e6e97d8k38c1740e84372184@mail.gmail.com>

On Fri, Nov 21, 2008 at 4:45 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

>
> On a related note - could you write a unit test for Bio.AlignAce please?
>

Hi Peter,

I do not have much experience with writing unit tests but I would like
to do it (treating it as an opportunity to learn more on unit tests).

There are two issues which are somewhat related to this:
- I have some more code related to sequence motif analysis which I'm
using myself and could contribute as an extension to Bio.AlignAce. If
people are interested in having this in biopython, it would be
sensible to think about refactoring Bio.AlignAce and Bio.MEME which
both provide a Motif class with largely overlapping functionality. I
could do that and at the same time write unit tests for the new
version. For that it would be cool to get input from all current or
potential users of this functionality. I'll think about it a little
and maybe write to biopython-users list.
- The other issue is connected with the type of the tests I should
write. Since Michiel brought this topic up recently, I'd like to know
whether I should do it in the python (doctest) or biopython way.

cheers
Bartek


-- 
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433


From bartek at rezolwenta.eu.org  Mon Nov 24 14:51:12 2008
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 24 Nov 2008 15:51:12 +0100
Subject: [Biopython-dev] Refactoring motif analysis code
Message-ID: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com>

Hello All,

Currently, there are two packages dealing with motif analysis in biopython :
Bio.AlignAce (written by me) and Bio.MEME (written by Jason Hackney).

Both of them are quite old and they were developed independently so
the functionality is largely overlapping.
Particularly the files AlignAce/Motif.py  and MEME/Motif.py contain
almost identical functionality useful for
anyone interested in motif analysis or writing a parser for yet
another motif searching tool.

I'd like to change this and create a new library called Bio.Motif,
which would contain:
-Motif class for all general functionality concerning motif objects:
i/o, comparisons, sequence scanning
-AlignAce Parser
-MEME Parser
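
To make the proposal a little more concrete, here is a very rough sketch of
what a general motif object might offer. The class name SimpleMotif and its
methods are hypothetical placeholders, not the planned Bio.Motif API, and
scanning is reduced to exact matches against the known instances:

```python
from collections import Counter


class SimpleMotif:
    """Hypothetical motif built from equal-length DNA instances."""

    alphabet = "ACGT"

    def __init__(self, instances):
        instances = [instance.upper() for instance in instances]
        if not instances:
            raise ValueError("at least one instance is required")
        if len({len(instance) for instance in instances}) != 1:
            raise ValueError("instances must have equal length")
        self.instances = instances
        self.length = len(instances[0])

    def counts(self):
        """Return one {letter: count} dict per motif position."""
        columns = []
        for position in range(self.length):
            column = Counter(inst[position] for inst in self.instances)
            columns.append({letter: column.get(letter, 0)
                            for letter in self.alphabet})
        return columns

    def consensus(self):
        """Return the most frequent letter per position (ties: A, C, G, T)."""
        return "".join(max(self.alphabet, key=column.get)
                       for column in self.counts())

    def scan(self, sequence):
        """Return start positions where a known instance occurs exactly."""
        sequence = sequence.upper()
        known = set(self.instances)
        last_start = len(sequence) - self.length
        return [start for start in range(last_start + 1)
                if sequence[start:start + self.length] in known]
```

Parsers for AlignAce or MEME output would then only need to produce such
objects, leaving comparison and scanning code in one place.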

When this is completed, we could deprecate the AlignAce and MEME
modules. For AlignAce I have most of the code
already written, I need to rewrite portions of MEME parser to work
with different motif implementation (not a major pain).
Then I just need to polish it a bit and provide tests and a short tutorial.

After this rather long intro I'd like to ask about several things:
- Are there many Bio.AlignAce or Bio.MEME users who would be unhappy
about deprecating them?
- Are there any features which people would find valuable in Bio.Motif
- Both MEME and AlignAce are DNA-oriented, I've never worked on
Protein motifs myself, but I'd like to know whether anyone is
interested in using Bio.Motif for that

Any comments/ideas are welcome

cheers
Bartek

-- 
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433


From dalloliogm at gmail.com  Mon Nov 24 15:25:23 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Mon, 24 Nov 2008 16:25:23 +0100
Subject: [Biopython-dev] Refactoring motif analysis code
In-Reply-To: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com>
References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com>
Message-ID: <5aa3b3570811240725n54f7f624oc1db5fe0b88e3f5a@mail.gmail.com>

On Mon, Nov 24, 2008 at 3:51 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:
> Hello All,
>
> Currently, there are two packages dealing with motif analysis in biopython :
> Bio.AlignAce (written by me) and Bio.MEME (written by Jason Hackney).

Hi, I asked a question about motifs one year ago on this list.
Here is the thread:
- http://lists.open-bio.org/pipermail/biopython/2007-September/003727.html

I would just like to tell you that I have tried the TAMO framework you
suggested to me, and found it very useful.
I am not using it anymore because I don't need it, but I remember that I liked:
- the methods to represent motifs as matrices of frequencies/occurrences etc.
- the fact that it was easy to create a motif from an alignment of sequences
- the integration it had with this website:
http://weblogo.berkeley.edu/logo.cgi.
I would suggest that you provide integration with this other web
service, which makes it possible to plot the difference between two sequence
logos: http://www.twosamplelogo.org/examples.html.

Maybe you should contact TAMO's author to ask him if he wants to
contribute, because I remember that its framework was really complete.


>
> Both of them are quite old and they were developed independently so
> the functionality is largely overlapping.
> Particularly the files AlignAce/Motif.py  and MEME/Motif.py contain
> almost identical functionality useful for
> anyone interested in motif analysis or writing a parser for yet
> another motif searching tool.
>
> I'd like to change this and create a new library called Bio.Motif,
> which would contain:
> -Motif class for all general functionality concerning motif objects:
> i/o, comparisons, sequence scanning
> -AlignAce Parser
> -MEME Parser
>
> When this is completed, we could deprecate the AlignAce and MEME
> modules. For AlignAce I have most of the code
> already written, I need to rewrite portions of MEME parser to work
> with different motif implementation (not a major pain).
> Then I just need to polish it a bit and provide tests and a short tutorial.
>
> After this rather long intro I'd like to ask about several things:
> - Are there many Bio.AlignAce or Bio.MEME users who would be unhappy
> about deprecating them?
> - Are there any features which people would find valuable in Bio.Motif
> - Both MEME and AlignAce are DNA-oriented, I've never worked on
> Protein motifs myself, but I'd like to know whether anyone is
> interested in using Bio.Motif for that
>
> Any comments/ideas are welcome
>
> cheers
> Bartek
>
> --
> Bartek Wilczynski
> ==================
> Postdoctoral fellow
> EMBL, Furlong group
> Meyerhoffstrasse 1,
> 69012 Heidelberg,
> Germany
> tel: +49 6221 387 8433


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it


From bsouthey at gmail.com  Mon Nov 24 15:54:32 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 24 Nov 2008 09:54:32 -0600
Subject: [Biopython-dev] Refactoring motif analysis code
In-Reply-To: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com>
References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com>
Message-ID: <492ACE38.1090301@gmail.com>

Bartek Wilczynski wrote:
> Hello All,
>
> Currently, there are two packages dealing with motif analysis in biopython :
> Bio.AlignAce (written by me) and Bio.MEME (written by Jason Hackney).
>   
Actually I am not that thrilled with the licenses for these packages and 
similar packages because these are free only for academic use. To me 
this clashes with the spirit of an open-sourced project especially a 
BSD-licensed one. But if there is a need for such modules then these 
modules should be included.

> Both of them are quite old and they were developed independently so
> the functionality is largely overlapping.
> Particularly the files AlignAce/Motif.py  and MEME/Motif.py contain
> almost identical functionality useful for
> anyone interested in motif analysis or writing a parser for yet
> another motif searching tool.
>
> I'd like to change this and create a new library called Bio.Motif,
> which would contain:
> -Motif class for all general functionality concerning motif objects:
> i/o, comparisons, sequence scanning
> -AlignAce Parser
> -MEME Parser
>
>   
While it is only free for academic use, have you seen TAMO?
*TAMO: a flexible, object-oriented framework for analyzing 
transcriptional regulation using DNA-sequence motifs. *
Bioinformatics. 2005 Jul 15;21(14):3164-5. 
<http://bioinformatics.oxfordjournals.org/cgi/content/abstract/21/14/3164>

http://fraenkel.mit.edu/TAMO/


> When this is completed, we could deprecate the AlignAce and MEME
> modules. For AlignAce I have most of the code
> already written, I need to rewrite portions of MEME parser to work
> with different motif implementation (not a major pain).
> Then I just need to polish it a bit and provide tests and a short tutorial.
>
> After this rather long intro I'd like to ask about several things:
> - Are there many Bio.AlignAce or Bio.MEME users who would be unhappy
> about deprecating them?
>   
Well, I am not sure how many used Bio.AlignAce given the Parser.py bug :-)
Based on the CVS, both have been untouched for about three years.

Also, what species are these used for?
One of the AlignAce papers indicates that the base composition was set 
for yeast.

> - Are there any features which people would find valuable in Bio.Motif
> - Both MEME and AlignAce are DNA-oriented, I've never worked on
> Protein motifs myself, but I'd like to know whether anyone is
> interested in using Bio.Motif for that
>
> Any comments/ideas are welcome
>
> cheers
> Bartek
>
>   
Personally I would be interested in a general protein motif finding 
module because of my current research. However, I do have a different 
view with respect to the Biopython community as indicated above with the 
licenses.

Bruce


From bsouthey at gmail.com  Mon Nov 24 17:47:21 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 24 Nov 2008 11:47:21 -0600
Subject: [Biopython-dev] Use of depreciated string functions
In-Reply-To: <320fb6e00811210726n94e277ex359d93de0855045e@mail.gmail.com>
References: <4926D17A.8080101@gmail.com>
	<320fb6e00811210726n94e277ex359d93de0855045e@mail.gmail.com>
Message-ID: <492AE8A9.1000406@gmail.com>

Peter wrote:
> On Fri, Nov 21, 2008 at 3:19 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>   
>> Hi,
>> There are a number of files in Bio that import string. Many of these use
>> deprecated functions (since Version 2) that are now string methods mainly
>>  string.atof(), string.atoi()  and string.join(). The only real advantage of
>> modifying these is to remove an import statement because these will not be
>> removed until Python 3.
>>
>> Perhaps the one exception is in HotRand.py: hex_digit =
>> string.hexdigits.find( letter )
>>
>> There are about 23 unique files that I identified via grep and many have
>> more than one usage. While changing these is busy work, please let me know
>> if you would like me to create patches for the next version of Biopython (ie
>> 1.50) or just ignore this.
>>     
>
> As you say, there isn't much benefit from doing this other than
> removing an import and making another small step towards Python 3.0
> compatibility.  We have gradually been phasing out "import string"
> already, usually when working on a module which used it.
>
> Once I've dealt with Biopython 1.49, I'd be happy to look at a patch
> to remove more "import string" usage from non-obsolete, non-deprecated
> code.  It would be a little risky doing this to modules without unit
> tests, but that's another area you've shown some interest in anyway...
>
> Thanks,
>
> Peter
>
>   
Hi,
I was planning to get started on these depending on what time I 
have available. So just a quick question:
Do you want one bug report per patch per file?

Or just let me know if there is another way.
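
For reference, the kind of replacement under discussion looks roughly like
this (the values are made up for illustration; the old string module
functions appear only in the comment):

```python
import string

# Deprecated string module functions and their replacements:
#   string.atof(text)        ->  float(text)
#   string.atoi(text)        ->  int(text)
#   string.join(words, sep)  ->  sep.join(words)

value = float("3.14")
count = int("42")
line = "\t".join(["chr1", "100", "200"])

# Constants such as string.hexdigits are not deprecated, so a lookup in the
# style of the HotRand.py line quoted above still needs the import.
hex_digit = string.hexdigits.find("f")
```

Only modules whose remaining uses are all of the first kind can drop the
import entirely.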

Thanks
Bruce


From biopython at maubp.freeserve.co.uk  Mon Nov 24 18:42:08 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 24 Nov 2008 18:42:08 +0000
Subject: [Biopython-dev] Use of depreciated string functions
In-Reply-To: <492AE8A9.1000406@gmail.com>
References: <4926D17A.8080101@gmail.com>
	<320fb6e00811210726n94e277ex359d93de0855045e@mail.gmail.com>
	<492AE8A9.1000406@gmail.com>
Message-ID: <320fb6e00811241042g646ff65fq61d3751537c882b1@mail.gmail.com>

On Mon, Nov 24, 2008 at 5:47 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>> Once I've dealt with Biopython 1.49, I'd be happy to look at a patch
>> to remove more "import string" usage from non-obsolete, non-deprecated
>> code.  It would be a little risky doing this to modules without unit
>> tests, but that's another area you've shown some interest in anyway...
>>
>> Thanks,
>>
>> Peter
>
> Hi,
> I was planning to get started on these, depending on what time I have
> available. So just a quick question:
> Do you want one bug report per patch per file?
> Or just let me know if there is another way.

I'd suggest one general bug, and uploading one patch per module - that
way they can be evaluated on a case by case basis (a single huge
multi-file patch would be more difficult, and could become out of
date).

Personally however, I would prioritise more unit test coverage over
this, but on the other hand it's the kind of short task you can handle
when you have the odd spare 10 minutes.  Up to you.

Peter


From bugzilla-daemon at portal.open-bio.org  Mon Nov 24 20:40:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 24 Nov 2008 15:40:49 -0500
Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements
In-Reply-To: <bug-2681-42@http.bugzilla.open-bio.org/>
Message-ID: <200811242040.mAOKenEi002020@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681


------- Comment #4 from cymon.cox at gmail.com  2008-11-24 15:40 EST -------
(In reply to comment #2)
> (In reply to comment #0)
> > 1) Fixed date/dates typo.
> 
> Why is it a typo?  Change not checked in.

The function _load_bioentry_date in Loader.py inserts the annotation 'date', if
present, or the current date if not, into the bioentry_qualifier_value table.
This is pulled by BioSeq.py _retrieve_qualifier_value and stored as the
attribute 'dates'. Hence I considered line 307 in BioSeq.py to be a typo, which
should be 'date' and not 'dates'. Also, because Loader.py handles dates
separately, they should not be handled by the function load_annotations.

> > 2) comments were being stored but not retrieved - fixed with test.
> 
> Looks good, except for returning an empty list if there were no comments.
> 
> > 3) A 'reference' annotation, even if an empty list, was being retrieved in a
> > DBSeqRecord. Fixed so that if there are no references there is no annotation
> > in DBSeqRecord.
> 
> I agree, but preferred a smaller change for this:
> 
> Checking in BioSQL/BioSeq.py;
> /home/repository/biopython/biopython/BioSQL/BioSeq.py,v  <--  BioSeq.py
> new revision: 1.33; previous revision: 1.32
> done
> Checking in Tests/test_BioSQL_SeqIO.py;
> /home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v  <-- 
> test_BioSQL_SeqIO.py
> new revision: 1.29; previous revision: 1.28
> done

Actually, your version of _retrieve_comment never returns comments ;-)

On the wider issue: perhaps it's best if DBSeqRecords always have the same
set of attributes, even if comments and references are empty lists. Trying to
regenerate the attributes present in the loaded SeqRecord is, I think, not the
way to go, and not possible (or at least currently not attempted) for fasta
records. Perhaps we should be coding around the issue in the test suite rather
than changing the attributes of the DBSeqRecord so that it passes the test...
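
A minimal sketch of what "coding around the issue in the test suite" could
look like, assuming annotations are plain dictionaries. The helper name
annotations_equivalent and its ignore argument are hypothetical and do not
exist in Biopython; the idea is only that empty lists and database-generated
keys are skipped when comparing a loaded record with a retrieved one.

```python
def annotations_equivalent(old, new, ignore=()):
    """Compare two annotation dicts, skipping ignored keys and empty values.

    Hypothetical test helper, not part of Biopython.
    """
    def normalise(annotations):
        # Drop keys the caller wants ignored and values that carry no data.
        return {
            key: value
            for key, value in annotations.items()
            if key not in ignore and value not in ([], None, "")
        }

    return normalise(old) == normalise(new)


loaded = {"organism": "Trifolium repens (white clover)", "references": []}
retrieved = {
    "organism": "Trifolium repens (white clover)",
    "comment": [],
    "dates": ["24-NOV-2008"],
}
same = annotations_equivalent(loaded, retrieved, ignore=("dates",))
```

With this approach the DBSeqRecord could keep a fixed set of attributes
while the round-trip test still passes.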

> > Some swiss prot SeqRecords have ncbi_taxid and they are retrieved
> > correctly by DBSeqRecord. TODO: others have ncbi_taxid that is missing
> > from the retrieved DBSeqRecord: sp012, sp014, 
> 
> Note some swiss prot records may be multi-species, which the BioSQL schema
> can't cope with.  Not sure if that applies here.

Yep, that's exactly what was causing the problem. Currently the code refuses to
load an ncbi_taxid, which I think is correct, after all which one should be
loaded? Anyway, I'll look into this a bit more...

> > Swissprot, fasta, and EMBL SeqRecords don't have a gi annotation, retrieved
> > DBSeqRecords do. Loader uses the 'record_id' (line 522) as the identifier in
> > bioentry, if the gi annotation is missing, which is pulled as the gi
> > annotation.
> 
> There probably is something not quite right here.  Are you talking about the
> bioentry.identifier entry in the database?  Perhaps an explicit example might
> help.  As an aside, I think "gi" (GeneIndex used by NCBI) might be better
> stored in the record.dbxrefs, but that could be a parser change...

Ah, OK, will look further into this as well...

> > 'contig' is ignored by loader because it's a SeqFeature object. Is there any
> > reason it couldn't be loaded and retrieved? (record is GenBank/NT_019265.gb)
> 
> I couldn't even say off hand how the CONTIG line in that example would be
> parsed, let alone how it gets dealt with when loading into BioSQL.

Well, the parser correctly deals with it as a SeqFeature (with a whole bunch of
sub_features), but it never gets loaded: it's not dealt with at all and falls
off the bottom of the function. I can't see any reason not to load it...

C.




From bugzilla-daemon at portal.open-bio.org  Mon Nov 24 21:40:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 24 Nov 2008 16:40:24 -0500
Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements
In-Reply-To: <bug-2681-42@http.bugzilla.open-bio.org/>
Message-ID: <200811242140.mAOLeO8n008996@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681


------- Comment #5 from cymon.cox at gmail.com  2008-11-24 16:40 EST -------
(In reply to comment #2)
> (In reply to comment #0)
> > Swissprot, fasta, and EMBL SeqRecords don't have a gi annotation, retrieved
> > DBSeqRecords do. Loader uses the 'record_id' (line 522) as the identifier in
> > bioentry, if the gi annotation is missing, which is pulled as the gi
> > annotation.
> 
> There probably is something not quite right here.  Are you talking about the
> bioentry.identifier entry in the database?  Perhaps an explicit example might
> help.  As an aside, I think "gi" (GeneIndex used by NCBI) might be better
> stored in the record.dbxrefs, but that could be a parser change...

The "gi" annotation of a parsed GenBank record refers to this GenInfo
Identifier:

From NCBI: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#GInB
"""
"GenInfo Identifier" sequence identification number, in this case, for the
nucleotide sequence. If a sequence changes in any way, a new GI number will be
assigned. GI sequence identifiers run parallel to the new accession.version
system of sequence identifiers. """

This is stored in bioentry.identifier. However, "gi"s are not present in
swissprot, fasta, and embl records; instead, the following couplet loads the
record.id into the identifier slot:

Loader.py:
 519         if "gi" in record.annotations :
 520             identifier = record.annotations["gi"]
 521         else :
 522             identifier = record.id

But of course, the record.id is not the "gi" - so perhaps the
bioentry.identifier should be left NULL if the "gi" number is missing. Or we
might consider calling the DBSeqRecord attribute "identifier" rather than
"gi"...

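The first alternative (leaving bioentry.identifier NULL when there is no
"gi") can be sketched as a small stand-alone function. This is only an
illustration under assumptions: choose_identifier and its fall_back_to_id
flag are hypothetical names, not code from Loader.py, and None is taken to
map to SQL NULL.

```python
def choose_identifier(annotations, record_id, fall_back_to_id=True):
    """Return the value to store in bioentry.identifier.

    Hypothetical helper. With fall_back_to_id=True it mirrors the couplet
    quoted above; with False it returns None when no "gi" is present.
    """
    if "gi" in annotations:
        return annotations["gi"]
    return record_id if fall_back_to_id else None


current = choose_identifier({}, "X56734.1")
proposed = choose_identifier({}, "X56734.1", fall_back_to_id=False)
```

Here current is "X56734.1" (the record.id ends up reported as the "gi"),
whereas proposed is None.
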
Here's an example of an EMBL file where the record.id becomes the "gi":

Testing loading from embl format file EMBL/TRBG361.embl
 - AAACAAACCAAATATGGAT...AAA [jfp/7BKv3jTJAU/4jVMrSftEq20] len 1859, X56734.1
 - Retrieving by name/display_id 'X56734', 
old annos diff: set([])
new annos diff: set(['dates', 'ncbi_taxid', 'gi'])

OLD:
taxonomy = ['Eukaryota', 'Viridiplantae', 'Streptophyta', 'Embryophyta',
'Tracheophyta', 'Spermatophyta', 'Magnoliophyta', 'eudicotyledons', 'core
eudicotyledons', 'rosids', 'eurosids I', 'Fabales', 'Fabaceae',
'Papilionoideae', 'Trifolieae', 'Trifolium']
references = [<Bio.SeqFeature.Reference instance at 0x8e9302c>,
<Bio.SeqFeature.Reference instance at 0x8e931ac>]
accessions = ['X56734', 'S46826']
data_file_division = PLN
organism = Trifolium repens (white clover)
sequence_version = 1
NEW:
dates = ['24-NOV-2008']
ncbi_taxid = 3899
references = [<Bio.SeqFeature.Reference instance at 0x8eced6c>,
<Bio.SeqFeature.Reference instance at 0x8ecedcc>]
accessions = ['X56734', 'S46826']
data_file_division = PLN
taxonomy = ['Trifolium repens (white clover)']
gi = X56734.1
organism = Trifolium repens (white clover)
sequence_version = ['1']
ncbi_taxid: 3899


C.




From bugzilla-daemon at portal.open-bio.org  Mon Nov 24 22:51:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 24 Nov 2008 17:51:37 -0500
Subject: [Biopython-dev] [Bug 2683] New: Modules with unused string modules
Message-ID: <bug-2683-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2683

           Summary: Modules with unused string modules
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: trivial
          Priority: P5
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


This is a trivial general bug for any Biopython modules that import the string
module but do not use it. A different bug will be used for those modules that
actually use any deprecated string functions.

Please attach any similar modules to this report.

AlignAce modules:
Bio/AlignAce/AlignAceStandalone.py
Bio/AlignAce/CompareAceStandalone.py
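
One way to find such modules automatically is to walk the syntax tree and
compare imported names with the names actually referenced. The sketch below
is an assumption about how this could be done, not the method used for this
report (which was grep); unused_imports is a hypothetical name, and
ast.parse only accepts source that is valid for the interpreter running it.

```python
import ast


def unused_imports(source):
    """Return a sorted list of imported names never referenced in source."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" binds the top-level name "os".
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports cannot be checked this way
                    imported.add(alias.asname or alias.name)
    # string.join(...) appears as an Attribute whose value is Name("string"),
    # so collecting Name nodes is enough to see module usage.
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


example = "import os\nimport string\nfrom copy import deepcopy\nprint(os.sep)\n"
report = unused_imports(example)
```

For the example source, report lists deepcopy and string but not os. Names
used only inside strings or via __all__ would be reported as unused, so the
result still needs a manual check.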




From bugzilla-daemon at portal.open-bio.org  Mon Nov 24 23:05:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 24 Nov 2008 18:05:27 -0500
Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements
In-Reply-To: <bug-2681-42@http.bugzilla.open-bio.org/>
Message-ID: <200811242305.mAON5Rs2017499@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681


------- Comment #6 from cymon.cox at gmail.com  2008-11-24 18:05 EST -------
(In reply to comment #4)
> (In reply to comment #2)
> > (In reply to comment #0)
> > > Some swiss prot SeqRecords have ncbi_taxid and they are retrieved
> > > correctly by DBSeqRecord. TODO: others have ncbi_taxid that is missing
> > > from the retrieved DBSeqRecord: sp012, sp014, 
> > 
> > Note some swiss prot records may be multi-species, which the BioSQL schema
> > can't cope with.  Not sure if that applies here.
> 
> Yep, that's exactly what was causing the problem. Currently the code refuses to
> load an ncbi_taxid, which I think is correct, after all which one should be
> loaded? Anyway, I'll look into this a bit more...

So, how best to handle records with multiple taxa:

SwissProt/sp014 has 10 organisms which are currently loaded directly into the
taxon_name table:

biosql_test=# select name, name_class from taxon_name where taxon_id = 94;
 name | name_class
------------------------------------------------------------------------------
 Oryza sativa (Rice), Nicotiana tabacum (Common tobacco) Hordeum vulgare
(Barley), Triticum aestivum (Wheat) Secale cereale (Rye), Zea mays (Maize),
Pisum sativum (Garden pea) Spinacia oleracea (Spinach), Capsicum annuum (Bell
pepper) Mesembryanthemum crys | scientific name
(1 row)

That's clearly not a scientific name...

The record has the ncbi_taxon_ids:
OX   NCBI_TaxID=4530, 4097, 4513, 4565, 4550, 4577, 3888, 3562, 4072, 3544,
OX   3555, 3696;

Which are currently not stored because there is more than one:

Loader.py:
 150         ncbi_taxon_id = None
 151         if "ncbi_taxid" in record.annotations :
 152             #Could be a list of IDs.
 153             if isinstance(record.annotations["ncbi_taxid"],list) :
 154                 if len(record.annotations["ncbi_taxid"])==1 :
 155                     ncbi_taxon_id = record.annotations["ncbi_taxid"][0]
 156             else :
 157                 ncbi_taxon_id = record.annotations["ncbi_taxid"]

BioSQL is clearly not designed to store records from multiple taxa: one
bioentry has one taxon_id. Should biopython be refusing to load such records if
the scientific name is not a binomial? What does perl do? 
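
The behaviour of the Loader.py excerpt above can be restated as a small
stand-alone function, which makes the "more than one ID means no ID" rule
easier to see. The name single_ncbi_taxon_id is hypothetical; the logic is
meant to match the quoted lines 150-157 and nothing more.

```python
def single_ncbi_taxon_id(annotations):
    """Return the NCBI taxon ID only when it is unambiguous, else None."""
    value = annotations.get("ncbi_taxid")
    if isinstance(value, list):
        # A list is accepted only if it holds exactly one ID.
        return value[0] if len(value) == 1 else None
    # A bare value is passed through; a missing key gives None.
    return value


sp014_like = {"ncbi_taxid": ["4530", "4097", "4513"]}
single = {"ncbi_taxid": ["3899"]}
```

So single_ncbi_taxon_id(sp014_like) gives None while
single_ncbi_taxon_id(single) gives "3899".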

C.




From mjldehoon at yahoo.com  Tue Nov 25 04:08:18 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 24 Nov 2008 20:08:18 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570811240104m1442e5dfkd0c0f92c6fa772f9@mail.gmail.com>
Message-ID: <199296.58154.qm@web62402.mail.re1.yahoo.com>

> > However, more than half of Biopython's tests do
> > not actually make use of this testing framework:
> 
> Do you need help in re-organizing all of these modules?

That would be helpful, but let's see first if there are any objections to my proposal. We'll also have to decide the pathway to change the tests without breaking anything. For the unit tests I listed, the changes should be trivial, but still we need to check if any problems show up.

Thanks!

--Michiel.


From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 14:31:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 09:31:18 -0500
Subject: [Biopython-dev] [Bug 2683] Modules with unused string modules
In-Reply-To: <bug-2683-42@http.bugzilla.open-bio.org/>
Message-ID: <200811251431.mAPEVIYj014396@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2683


------- Comment #1 from bsouthey at gmail.com  2008-11-25 09:31 EST -------
Bio/Crystal/__init__.py imports but does not appear to use the following modules:
array
string
Seq
MutableSeq




From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 14:40:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 09:40:23 -0500
Subject: [Biopython-dev] [Bug 2683] Modules with unused string modules
In-Reply-To: <bug-2683-42@http.bugzilla.open-bio.org/>
Message-ID: <200811251440.mAPEeN8f015160@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2683


------- Comment #2 from barwil at gmail.com  2008-11-25 09:40 EST -------

> AlignAce modules:
> Bio/AlignAce/AlignAceStandalone.py
> Bio/AlignAce/CompareAceStandalone.py
> 

Fixed in CVS now.




From chapmanb at 50mail.com  Tue Nov 25 14:40:41 2008
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 25 Nov 2008 09:40:41 -0500
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <871524.42970.qm@web62403.mail.re1.yahoo.com>
References: <871524.42970.qm@web62403.mail.re1.yahoo.com>
Message-ID: <20081125144041.GC83220@sobchak.mgh.harvard.edu>

Hi Michiel;
Good thoughts on this; my comments are below.

> Biopython's testing framework is built on top of Python's unit testing
> framework. Python's unit testing framework makes use of assertion
> statements to compare the result of a command to the expected result.
> Biopython uses test scripts that print output to stdout, together with
> an output file that contains the correct output. After running each
> test script, it compares the generated output with the correct output
> to see if the test was successful.

Agreed with the distinction between the unit tests and the "dump
lots of text and compare" approach. I've written both and do think
the unit testing/assertion model is more robust since you can go
back and actually get some insight into what someone was thinking
when they wrote an assertion.

> However, more than half of Biopython's tests do not actually make use of this testing framework:
[...]
> These tests have trivial output, for example test_Cluster:
> 
> test_Cluster
> test_clusterdistance (test_Cluster.TestCluster) ... ok
> test_distancematrix_kmedoids (test_Cluster.TestCluster) ... ok
> test_kcluster (test_Cluster.TestCluster) ... ok
> test_matrix_parse (test_Cluster.TestCluster) ... ok
> test_median_mean (test_Cluster.TestCluster) ... ok
> test_somcluster (test_Cluster.TestCluster) ... ok
> test_treecluster (test_Cluster.TestCluster) ... ok

They really do make use of the framework, but at a higher level. I
agree that if you run a single test it makes little difference
whether you use 'run_tests.py test_Cluster' or just run
'test_Cluster.py' directly. However, when you are running all the
tests, as is regularly done in development or before pushing releases,
this comparison is important. It will pick out if you get a
line like:

test_clusterdistance (test_Cluster.TestCluster) ... ERROR

instead of the expected ok and report this in the summary for all of
the tests. Otherwise this is likely to get lost in all of the
results.
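
The comparison step described here can be sketched in a few lines. This is
not the code in run_tests.py; first_mismatch is a hypothetical helper that
only illustrates comparing generated output with the stored expected output
line by line.

```python
from itertools import zip_longest


def first_mismatch(expected_text, actual_text):
    """Return (line_number, expected, actual) for the first difference.

    Returns None when the two outputs are identical. A missing line on
    either side is reported as None.
    """
    pairs = zip_longest(expected_text.splitlines(), actual_text.splitlines())
    for number, (want, got) in enumerate(pairs, start=1):
        if want != got:
            return number, want, got
    return None


expected = (
    "test_Cluster\n"
    "test_clusterdistance (test_Cluster.TestCluster) ... ok\n"
)
actual = expected.replace("... ok", "... ERROR")
difference = first_mismatch(expected, actual)
```

Here difference points at line 2, where "ok" was expected and "ERROR" was
produced.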

> Personally, I find Python's unit testing framework easier to
> understand than Biopython's testing framework. It doesn't need a
> separate output file, and it is easier to match each line of code with
> the correct behavior.
>
> I would therefore like to suggest to move from Biopython's testing
> framework to Python's testing framework. This also relieves us of the
> task of explaining Biopython's testing framework to contributors,
> and allows us to make better use of what Python already provides.
> Comparing output line-by-line, as Biopython's testing framework
> currently does, can still be used by test scripts that need this
> functionality.

Is the testing framework you are proposing different from the unit
tests used in the individual tests? How does your proposal manage the
higher level functionality of checking if all sub-tests within one
of the test suites pass?
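
For reference, Python's unittest module can itself collect several test
cases into one suite and report an overall result. The sketch below shows
that mechanism only; it is not necessarily what is being proposed, and
TestArithmetic and run_all are made-up names standing in for real test
modules.

```python
import io
import unittest


class TestArithmetic(unittest.TestCase):
    """Stand-in for a real test case such as test_Cluster.TestCluster."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)

    def test_division(self):
        self.assertAlmostEqual(1.0 / 4, 0.25)


def run_all(test_cases):
    """Run several TestCase classes as one suite and return the TestResult."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite(
        loader.loadTestsFromTestCase(case) for case in test_cases
    )
    # Capture the per-test report instead of writing it to stderr.
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=2)
    return runner.run(suite)


result = run_all([TestArithmetic])
all_passed = result.wasSuccessful()
```

The returned result carries the number of tests run plus the lists of
failures and errors, so a summary across all sub-tests is available without
comparing text output.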

Brad


From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 15:24:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 10:24:33 -0500
Subject: [Biopython-dev] [Bug 2683] Modules with unused string modules
In-Reply-To: <bug-2683-42@http.bugzilla.open-bio.org/>
Message-ID: <200811251524.mAPFOXe2019581@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2683


------- Comment #3 from bsouthey at gmail.com  2008-11-25 10:24 EST -------
Bio/FilteredReader.py imports but does not appear to use the following modules:

os
string
copy
from File import UndoHandle




From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 16:13:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 11:13:01 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200811251613.mAPGD1FG024870@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


------- Comment #7 from cymon.cox at gmail.com  2008-11-25 11:13 EST -------
(In reply to comment #6)
> (From update of attachment 1072 [details])
> I think this is still a big improvement, but that the
> (sub)feature.location_operator issue could wait.  We'll need to discuss on the
> BioSQL mailing list how this should be handled consistently.
> 
> Leaving this bug open.

Further to the "where to put the (sub)feature.location_operator" (eg. "join",
"order") question, this comment appears in the BioPerl MySQL schema for the
location_qualifier_value table:

-- location qualifiers - mainly intended for fuzzies but anything
-- can go in here
-- some controlled vocab terms have slots;

So, this would seem a suitable place to store the attribute.




From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 16:13:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 11:13:07 -0500
Subject: [Biopython-dev] [Bug 2684] New: GenBank/__init__.py: Removing loop
	over string.whitespace
Message-ID: <bug-2684-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2684

           Summary: GenBank/__init__.py: Removing loop over
                    string.whitespace
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


The function '_clean_location' in GenBank/__init__.py uses a 'for' loop over
string.whitespace that removes whitespace from the string. A simpler way is to just
split the string on whitespace and rejoin it as a single line:

location_line=''.join(location_string.split())
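
The two approaches can be compared side by side. The loop version below is
a reconstruction from the description above, not a copy of _clean_location,
and both function names are hypothetical.

```python
import string


def clean_with_loop(location_string):
    """Remove whitespace by replacing each whitespace character in turn."""
    location_line = location_string
    for ws in string.whitespace:
        location_line = location_line.replace(ws, "")
    return location_line


def clean_with_split(location_string):
    """Remove whitespace by splitting on it and rejoining the pieces."""
    return "".join(location_string.split())


raw = "join(1..10,\n                     20..30)"
cleaned = clean_with_split(raw)
```

Both functions turn raw into "join(1..10,20..30)". For input containing
only ASCII whitespace, as expected in a GenBank location line, the results
should be identical.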




From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 16:14:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 11:14:19 -0500
Subject: [Biopython-dev] [Bug 2684] GenBank/__init__.py: Removing loop over
	string.whitespace
In-Reply-To: <bug-2684-42@http.bugzilla.open-bio.org/>
Message-ID: <200811251614.mAPGEJvT025100@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2684


------- Comment #1 from bsouthey at gmail.com  2008-11-25 11:14 EST -------
Created an attachment (id=1083)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1083&action=view)
Removal of unnecessary loop over string.whitespace




From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 16:30:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 11:30:01 -0500
Subject: [Biopython-dev] [Bug 2685] New: HotRand provides an unnecessary
	function to convert hex to integer
Message-ID: <bug-2685-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2685

           Summary: HotRand provides an unnecessary function to convert hex
                    to integer
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


The file Bio/HotRand.py defines the function hex_convert that converts a hex
number to an integer number. This functionality is provided by the builtin
int() with appropriate radix, i.e. 
int(hex_number, 16)

This function could be removed or replaced to avoid using the string module.
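
A short comparison of the two approaches. The hand-written converter is a
reconstruction based on the string.hexdigits.find() idiom, not the actual
hex_convert from Bio/HotRand.py, and hex_convert_sketch is a hypothetical
name.

```python
import string


def hex_convert_sketch(text):
    """Convert a hex string to an integer one digit at a time."""
    value = 0
    for letter in text.lower():
        # Position in "0123456789abcdef..." is the digit's value.
        digit = string.hexdigits.find(letter)
        if digit < 0:
            raise ValueError("not a hex digit: %r" % letter)
        value = value * 16 + digit
    return value


by_hand = hex_convert_sketch("1aF")
builtin = int("1aF", 16)
```

Both by_hand and builtin are 431, so the builtin call can stand in for the
loop without importing the string module.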




From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 16:31:09 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 11:31:09 -0500
Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function
	to convert hex to integer
In-Reply-To: <bug-2685-42@http.bugzilla.open-bio.org/>
Message-ID: <200811251631.mAPGV91O027180@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2685


------- Comment #1 from bsouthey at gmail.com  2008-11-25 11:31 EST -------
Created an attachment (id=1084)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1084&action=view)
Replaces hex_convert() with int()




From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 16:52:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 11:52:12 -0500
Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function
	to convert hex to integer
In-Reply-To: <bug-2685-42@http.bugzilla.open-bio.org/>
Message-ID: <200811251652.mAPGqCMt029684@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2685


bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1084 is|0                           |1
           obsolete|                            |


------- Comment #2 from bsouthey at gmail.com  2008-11-25 11:52 EST -------
Created an attachment (id=1085)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1085&action=view)
Messed up the first patch




From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 16:53:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 11:53:41 -0500
Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function
	to convert hex to integer
In-Reply-To: <bug-2685-42@http.bugzilla.open-bio.org/>
Message-ID: <200811251653.mAPGrfPk029811@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2685


bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1085 is|0                           |1
           obsolete|                            |


------- Comment #3 from bsouthey at gmail.com  2008-11-25 11:53 EST -------
Created an attachment (id=1086)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1086&action=view)
Sorry wrong version




From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 18:18:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 13:18:59 -0500
Subject: [Biopython-dev] [Bug 2683] Modules with unused string modules
In-Reply-To: <bug-2683-42@http.bugzilla.open-bio.org/>
Message-ID: <200811251818.mAPIIxQt006109@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2683


------- Comment #4 from bsouthey at gmail.com  2008-11-25 13:18 EST -------
These are the last files that I have found in Bio that import the string module
but do not use it:

IntelliGenetics/__init__.py
IntelliGenetics/intelligenetics_format.py
IntelliGenetics/Record.py
NetCatch.py
SCOP/__init__.py
PDB/PSEA.py (imports upper)




From bugzilla-daemon at portal.open-bio.org  Tue Nov 25 22:18:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 25 Nov 2008 17:18:41 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcribe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811252218.mAPMIfFX029455@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


mmokrejs at ribosome.natur.cuni.cz changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|translate and transcibe     |translate and transcribe
                   |methods for the Seq object  |methods for the Seq object
                   |(in Bio.Seq)                |(in Bio.Seq)


------- Comment #53 from mmokrejs at ribosome.natur.cuni.cz  2008-11-25 17:18 EST -------
(In reply to comment #27)
> Created an attachment (id=1032)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view) [details]
> Patch to Bio/Seq.py to add start codon handling to translation
> 
> Patch adds a new boolean argument to the translate method and function, called
> "init" (rather than my earlier suggestions like "from_start" or "check_start"
> which could be considered misleading).
> 
> Docstring:
> 
>         init - Boolean, defaults to False.  Should translation check the
>                first codon is a valid initiation (start) codon and translate
>                it as methionine (M)?  If False, nothing special is done with
>                the first codon.

What kind of check is it doing? I think it just forces the first letter to be
'M'.

> 
> 
> Example usage of the translate function,
> 
> >>> from Bio.Seq import translate
> >>> translate("TTGAAACCCTAG")
> 'LKP*'
> >>> translate("TTGAAACCCTAG", init=True, to_stop=True)
> 'MKP'
> >>> translate("TTGAAACCCTAG", init=True)
> 'MKP*'
> >>> translate("TTGAAACCCTAG", to_stop=True)
> 'LKP'

I don't like the "init" argument either. I would call it force_initiator_Met
instead. BTW, the non-canonical initiator codon is CUG; where did you find UUG?
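
The difference between checking the first codon and simply forcing 'M' can
be sketched without Biopython. Everything here is an assumption made for
illustration: the codon table is a tiny subset, translate_sketch is not the
patched Bio.Seq.translate, the strict flag is not part of the quoted patch,
and the start codon set (TTG, CTG, ATG) is the set listed for NCBI
translation table 1.

```python
# Tiny subset of the standard genetic code, enough for the example above.
CODONS = {"TTG": "L", "CTG": "L", "ATG": "M", "AAA": "K", "CCC": "P", "TAG": "*"}
START_CODONS = {"TTG", "CTG", "ATG"}


def translate_sketch(sequence, init=False, to_stop=False, strict=False):
    """Translate a DNA string, optionally treating the first codon as a start.

    init=True replaces the first amino acid with 'M'; strict=True
    additionally raises ValueError if that codon is not a start codon.
    """
    codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
    protein = []
    for index, codon in enumerate(codons):
        amino = CODONS[codon]
        if index == 0 and init:
            if strict and codon not in START_CODONS:
                raise ValueError("%s is not a start codon" % codon)
            amino = "M"
        if amino == "*" and to_stop:
            break
        protein.append(amino)
    return "".join(protein)


forced = translate_sketch("TTGAAACCCTAG", init=True, to_stop=True)
plain = translate_sketch("TTGAAACCCTAG")
```

Here forced is "MKP" and plain is "LKP*", matching the quoted example. With
strict=True a first codon such as AAA raises ValueError instead of being
silently turned into 'M'.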

Sorry, I got overloaded by many other tasks so haven't read any other
follow-ups, I just hit the email from bugzilla by luck.




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 15:57:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 10:57:05 -0500
Subject: [Biopython-dev] [Bug 2688] New: Removal of depreciated string
	functions
Message-ID: <bug-2688-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

           Summary: Removal of depreciated string functions
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P5
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


This is a general bug to remove any deprecated string functions from Biopython
modules. I apologize in advance for the noise this creates, especially due to my
mistakes.
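
The kind of substitution involved can be sketched as follows. The functions
are hypothetical and not taken from any Biopython module; the comments name
the string module call that each builtin or method replaces.

```python
def parse_fields(line):
    """Parse a 'name count score' line without the string module."""
    name, count, score = line.split()      # was string.split(line)
    return name, int(count), float(score)  # was string.atoi / string.atof


def format_fields(fields):
    """Join values with tabs using the str method."""
    return "\t".join(str(field) for field in fields)  # was string.join(...)


parsed = parse_fields("gene1 3 0.5")
line = format_fields(["gene1", 3, 0.5])
```

Here parsed is ("gene1", 3, 0.5) and line is the same three values separated
by tabs.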

I have tested and validated the subsequent patches on my Linux system with
Python versions 2.3, 2.4, 2.5 and 2.6. However, I do recognize that patches may
be in code not used by the tests. 


The following files still need to import the string module and are thus excluded
(although deprecated functions may still be used in them):
Bio/Decode.py - maketrans()
Bio/EUtils/POM.py - maketrans()
Bio/Prosite/Pattern.py - maketrans()
Bio/Seq.py - maketrans()
triefind.py - defines string.punctuation + string.whitespace
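
For readers unfamiliar with the change, the kind of rewrite involved can be
sketched as follows. The sample data and variable names are hypothetical and are
not taken from any of the patches. The old module-level calls are shown only as
comments, since they no longer exist in Python 3; the method forms work on all
the Python versions listed above. The last two lines show why maketrans is a
special case: under Python 2 it was only available as string.maketrans, whereas
Python 3 provides str.maketrans, which is what this sketch uses.

```python
# Old style (Python 2 only; these module functions were removed in Python 3):
#   import string
#   fields = string.split(string.strip(line), "\t")
#   name = string.upper(fields[0])
#   joined = string.join(fields, ",")
line = "  abc\tdef\tghi \n"  # hypothetical sample input

# Equivalent string methods, no "import string" required:
fields = line.strip().split("\t")
name = fields[0].upper()
joined = ",".join(fields)

# maketrans had no method equivalent in Python 2 (string.maketrans was needed);
# in Python 3 it is a static method on str.
table = str.maketrans("ACGT", "TGCA")
complement = "GATTACA".translate(table)
```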

The following files are covered by separate bug reports:
GenBank/__init__.py
HotRand.py


The following files are deprecated and are excluded:
Emboss/Primer.py
stringfns.py
MetaTool/__init__.py
MetaTool/metatool_format.py
MetaTool/Record.py
NBRF/__init__.py
Ndb/__init__.py
Transcribe.py


The following files import but do not use the string module:
AlignAce/AlignAceStandalone.py (fixed)
AlignAce/CompareAceStandalone.py (fixed)
Crystal/__init__.py
IntelliGenetics/__init__.py
IntelliGenetics/intelligenetics_format.py
IntelliGenetics/Record.py
NetCatch.py
SCOP/__init__.py


The following files are known to use the string module and have patches:
Align/AlignInfo.py
Blast/ParseBlastTable.py
FSSP/__init__.py
NMR/NOEtools.py
NMR/xpktools.py
PDB/MMCIFParser.py
SubsMat/__init__.py
Blast/Record.py
Compass/__init__.py
Data/CodonTable.py
Eutils/sourcegen.py
Eutils/tests/unittest.py
Fasta/FastaAlign.py
FilteredReader.py
GFF/easy.py
HMM/Utilities.py
Index.py
MEME/Parser.py
NeuralNetwork/Gene/Pattern.py
NeuralNetwork/Gene/Schema.py
Parsers/spark.py
PDB/parse_pdb_header.py
PDB/PDBList.py
PDB/PDBParser.py
PDB/PSEA.py
SCOP/__init__.py
utils.py

I did not see a trivial resolution for the functions in:
SubsMat/FreqTable.py
So I rewrote the functions to avoid using map.
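
As an illustration only (the actual FreqTable code and patch are not reproduced
in this report), a map() call that takes a string module function as its
argument has no one-for-one method replacement, so one option is to rewrite it
as a list comprehension. The input lines and variable names below are
hypothetical.

```python
# Hypothetical input; not taken from SubsMat/FreqTable.py.
lines = ["A 0.25\n", "C 0.25\n", "G 0.30\n", "T 0.20\n"]

# Old pattern (Python 2 only):
#   rows = map(string.split, map(string.strip, lines))
# Rewritten without map() or the string module:
rows = [line.strip().split() for line in lines]

# Build a letter -> frequency dictionary from the parsed rows.
freq = dict((letter, float(value)) for letter, value in rows)
```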




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 15:58:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 10:58:03 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261558.mAQFw3wc029231@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #1 from bsouthey at gmail.com  2008-11-26 10:58 EST -------
Created an attachment (id=1088)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1088&action=view)
Remove depreciated string functions




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 15:59:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 10:59:27 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261559.mAQFxR5t029522@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #2 from bsouthey at gmail.com  2008-11-26 10:59 EST -------
Created an attachment (id=1089)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1089&action=view)
Blast/Record.py patch




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:01:30 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:01:30 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261601.mAQG1U4h029894@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #3 from bsouthey at gmail.com  2008-11-26 11:01 EST -------
Created an attachment (id=1090)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1090&action=view)
Compass/__init__.py depreciated string functions




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:02:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:02:26 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261602.mAQG2Qlx030068@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #4 from bsouthey at gmail.com  2008-11-26 11:02 EST -------
Created an attachment (id=1091)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1091&action=view)
Data/CodonTable.py




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:03:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:03:14 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261603.mAQG3ETM030188@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #5 from bsouthey at gmail.com  2008-11-26 11:03 EST -------
Created an attachment (id=1092)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1092&action=view)
Eutils/sourcegen.py




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:04:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:04:07 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261604.mAQG47K1030328@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #6 from bsouthey at gmail.com  2008-11-26 11:04 EST -------
Created an attachment (id=1093)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1093&action=view)
Eutils/tests/unittest.py




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:05:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:05:14 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261605.mAQG5EUu030457@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #7 from bsouthey at gmail.com  2008-11-26 11:05 EST -------
Created an attachment (id=1094)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1094&action=view)
Fasta/FastaAlign.py




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:06:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:06:35 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261606.mAQG6ZqF030610@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #8 from bsouthey at gmail.com  2008-11-26 11:06 EST -------
Created an attachment (id=1095)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1095&action=view)
FSSP/__init__.py 




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:09:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:09:26 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261609.mAQG9QMf030939@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1095 is|0                           |1
           obsolete|                            |


------- Comment #9 from bsouthey at gmail.com  2008-11-26 11:09 EST -------
Created an attachment (id=1096)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1096&action=view)
FSSP/__init__.py corrected

Got the files in the wrong order.




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:10:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:10:25 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261610.mAQGAP10031066@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #10 from bsouthey at gmail.com  2008-11-26 11:10 EST -------
Created an attachment (id=1097)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1097&action=view)
GFF/easy.py




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:11:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:11:19 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261611.mAQGBJ28031191@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #11 from bsouthey at gmail.com  2008-11-26 11:11 EST -------
Created an attachment (id=1098)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1098&action=view)
HMM/Utilities.py 




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:31:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:31:52 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261631.mAQGVqef001363@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #12 from bsouthey at gmail.com  2008-11-26 11:31 EST -------
Created an attachment (id=1099)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1099&action=view)
Index.py 




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:32:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:32:37 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261632.mAQGWbYF001446@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #13 from bsouthey at gmail.com  2008-11-26 11:32 EST -------
Created an attachment (id=1100)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1100&action=view)
MEME/Parser.py 




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:33:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:33:41 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261633.mAQGXfww001564@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #14 from bsouthey at gmail.com  2008-11-26 11:33 EST -------
Created an attachment (id=1101)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1101&action=view)
NeuralNetwork/Gene/Pattern.py 




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:34:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:34:41 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261634.mAQGYf0u001687@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #15 from bsouthey at gmail.com  2008-11-26 11:34 EST -------
Created an attachment (id=1102)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1102&action=view)
NeuralNetwork/Gene/Schema.py 




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:35:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:35:35 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261635.mAQGZZno001826@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #16 from bsouthey at gmail.com  2008-11-26 11:35 EST -------
Created an attachment (id=1103)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1103&action=view)
NMR/NOEtools.py 




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:36:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:36:19 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261636.mAQGaJXQ001918@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #17 from bsouthey at gmail.com  2008-11-26 11:36 EST -------
Created an attachment (id=1104)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1104&action=view)
NMR/xpktools.py 




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:37:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:37:14 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261637.mAQGbEX0002035@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #18 from bsouthey at gmail.com  2008-11-26 11:37 EST -------
Created an attachment (id=1105)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1105&action=view)
Parsers/spark.py 




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:38:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:38:42 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261638.mAQGcgvH002293@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #19 from bsouthey at gmail.com  2008-11-26 11:38 EST -------
Created an attachment (id=1106)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1106&action=view)
Blast/ParseBlastTable.py 




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:39:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:39:37 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261639.mAQGdbdC002442@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #20 from bsouthey at gmail.com  2008-11-26 11:39 EST -------
Created an attachment (id=1107)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1107&action=view)
PDB/MMCIFParser.py 




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:40:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:40:56 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261640.mAQGeuHm002669@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #21 from bsouthey at gmail.com  2008-11-26 11:40 EST -------
Created an attachment (id=1108)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1108&action=view)
PDB/parse_pdb_header.py 




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:41:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:41:56 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261641.mAQGfuJj002827@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #22 from bsouthey at gmail.com  2008-11-26 11:41 EST -------
Created an attachment (id=1109)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1109&action=view)
PDB/PDBList.py 




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:42:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:42:41 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261642.mAQGgfiH002929@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #23 from bsouthey at gmail.com  2008-11-26 11:42 EST -------
Created an attachment (id=1110)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1110&action=view)
PDB/PDBParser.py 




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:43:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:43:28 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261643.mAQGhSbJ003019@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #24 from bsouthey at gmail.com  2008-11-26 11:43 EST -------
Created an attachment (id=1111)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1111&action=view)
SubsMat/__init__.py 




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:46:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:46:00 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261646.mAQGk0id003484@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #25 from bsouthey at gmail.com  2008-11-26 11:46 EST -------
Created an attachment (id=1112)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1112&action=view)
SubsMat/FreqTable.py 

The two functions involved were rewritten because of the use of map(). 




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:49:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:49:58 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261649.mAQGnwds003938@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #26 from bsouthey at gmail.com  2008-11-26 11:49 EST -------
Created an attachment (id=1113)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1113&action=view)
utils.py




From bugzilla-daemon at portal.open-bio.org  Wed Nov 26 16:55:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:55:45 -0500
Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function
	to convert hex to integer
In-Reply-To: <bug-2685-42@http.bugzilla.open-bio.org/>
Message-ID: <200811261655.mAQGtjPA004778@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2685


bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1086 is|0                           |1
           obsolete|                            |


------- Comment #4 from bsouthey at gmail.com  2008-11-26 11:55 EST -------
Created an attachment (id=1115)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1115&action=view)
Modified HotRand.hex_convert() function

Hopefully the last attempt to get the right version as a patch!




From bsouthey at gmail.com  Wed Nov 26 17:10:57 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Wed, 26 Nov 2008 11:10:57 -0600
Subject: [Biopython-dev] Use of depreciated string functions
In-Reply-To: <320fb6e00811241042g646ff65fq61d3751537c882b1@mail.gmail.com>
References: <4926D17A.8080101@gmail.com>	
	<320fb6e00811210726n94e277ex359d93de0855045e@mail.gmail.com>	
	<492AE8A9.1000406@gmail.com>
	<320fb6e00811241042g646ff65fq61d3751537c882b1@mail.gmail.com>
Message-ID: <492D8321.2060301@gmail.com>

Peter wrote:
> On Mon, Nov 24, 2008 at 5:47 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>   
>>> Once I've dealt with Biopython 1.49, I'd be happy to look at a patch
>>> to remove more "import string" usage from non-obsolete, non-deprecated
>>> code.  It would be a little risky doing this to modules without unit
>>> tests, but that's another area you've shown some interest in anyway...
>>>
>>> Thanks,
>>>
>>> Peter
>>>       
>> Hi,
>> I was planning to get started on these, depending on what time I have
>> available. So just a quick question:
>> Do you want one bug report per patch per file?
>> Or just let me know if there is another way.
>>     
>
> I'd suggest one general bug, and uploading one patch per module - that
> way they can be evaluated on a case by case basis (a single huge
> multi-file patch would be more difficult, and could become out of
> date).
>
> Personally, however, I would prioritise more unit test coverage over
> this, but on the other hand it's the kind of short task you can handle
> when you have the odd spare 10 minutes.  Up to you.
>
> Peter
>   
Hi,
I have filed Bug 2688
<http://bugzilla.open-bio.org/show_bug.cgi?id=2688> as a general bug for
the files in the Bio module that use the deprecated string functions. I
listed all the files that I identified as importing string, and whether
or not I provided a patch for each. Bug 2683
<http://bugzilla.open-bio.org/show_bug.cgi?id=2683> lists those files
that import string but do not use it.

There is one attachment for each file (excluding mistakes).

In addition, Bugs 2684
<http://bugzilla.open-bio.org/show_bug.cgi?id=2684> and 2685
<http://bugzilla.open-bio.org/show_bug.cgi?id=2685> were created because
they involve rewritten code related to this work. I probably should
have created a separate one for SubsMat/FreqTable.py, although the reason
for rewriting it directly involves the string module.

Regards
Bruce

 
From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 01:23:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 20:23:32 -0500
Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function
	to convert hex to integer
In-Reply-To: <bug-2685-42@http.bugzilla.open-bio.org/>
Message-ID: <200811270123.mAR1NWWu011079@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2685


------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp  2008-11-26 20:23 EST -------
As far as I can tell, the HotRand.hex_convert function is no longer used in
Bio.HotRand or anywhere else in Biopython; its usage was lost in revision 1.3
of Bio.HotRand. So I think that we can simply deprecate this function. If there
are no objections, I'll add a DeprecationWarning and use Bruce's code in the
meantime until the function is removed.
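
A rough sketch of what such a deprecation could look like is given here. Neither
the existing function body nor Bruce's patch is shown in this thread, so the
signature, the warning text and the use of int(text, 16) for the conversion are
all assumptions made for illustration.

```python
import warnings


def hex_convert(hex_text):
    """Hypothetical stand-in for HotRand.hex_convert: hex string to int."""
    # Warn callers that the function is scheduled for removal.
    # stacklevel=2 makes the warning point at the caller's line.
    warnings.warn("hex_convert is deprecated; use int(text, 16) instead.",
                  DeprecationWarning, stacklevel=2)
    # Assumed conversion: the built-in int() accepts a base argument.
    return int(hex_text, 16)
```

Existing callers keep working and get the same result, but see a warning until
the function is finally removed.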




From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 03:06:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 22:06:59 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811270306.mAR36xuB020451@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1088 is|0                           |1
           obsolete|                            |


------- Comment #27 from mdehoon at ims.u-tokyo.ac.jp  2008-11-26 22:06 EST -------
(From update of attachment 1088)
Committed to CVS.




From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 04:16:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 23:16:43 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811270416.mAR4Gh40027250@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1089 is|0                           |1
           obsolete|                            |


------- Comment #28 from mdehoon at ims.u-tokyo.ac.jp  2008-11-26 23:16 EST -------
(From update of attachment 1089)
Committed to CVS




From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 04:29:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 23:29:01 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811270429.mAR4T1tn027991@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1090 is|0                           |1
           obsolete|                            |


------- Comment #29 from mdehoon at ims.u-tokyo.ac.jp  2008-11-26 23:29 EST -------
(From update of attachment 1090)
Committed to CVS




From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 04:45:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 23:45:40 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811270445.mAR4jeph029067@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1091 is|0                           |1
           obsolete|                            |


------- Comment #30 from mdehoon at ims.u-tokyo.ac.jp  2008-11-26 23:45 EST -------
(From update of attachment 1091)
Committed to CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 06:54:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 01:54:12 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811270654.mAR6sC92005762@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1092 is|0                           |1
           obsolete|                            |


------- Comment #31 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 01:54 EST -------
(From update of attachment 1092)
Committed to CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 09:35:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 04:35:50 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200811270935.mAR9Zoxj019658@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #8 from lpritc at scri.sari.ac.uk  2008-11-27 04:35 EST -------
(In reply to comment #7)
> (In reply to comment #0)
> 
> > The major changes that have been made to the version previously available at
> > http://bioinf.scri.ac.uk/lp are:
> 
> That's a very nice contribution, thank you!!!
> This link is wrong, I think you mean
> http://bioinf.scri.ac.uk/lp/programs.php#genomediagram

Thanks, Marco.

You're absolutely correct - and people ought to be able to navigate to there
from the link I gave.  Thanks for posting the accurate link.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From lpritc at scri.ac.uk  Thu Nov 27 09:33:43 2008
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Thu, 27 Nov 2008 09:33:43 +0000
Subject: [Biopython-dev] blog article on GenomeDiagram in Biopython
In-Reply-To: <5aa3b3570811230933n2de8af3lf31d3c4b962930a3@mail.gmail.com>
Message-ID: <C55419F7.19ECC%lpritc@scri.ac.uk>

Thanks Giovanni,

On 23/11/2008 17:33, "Giovanni Marco Dall'Olio" <dalloliogm at gmail.com>
wrote:

> I thought that the inclusion of GenomeDiagrams in biopython is such an
> interesting news, that I wrote a blog post on it:
> - http://bioinfoblog.it/2008/11/genome-diagrams-included-in-biopython-150/

I left a comment there ;)
 
> I have used images from some tutorials without asking, I hope it is
> not a problem.

No problem at all - I think the old license covered it, and I'm pretty sure
that the Biopython license will, too.  Even if they didn't, as the original
copyright holder, I approve ;)

L.


-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405


______________________________________________________________________
SCRI, Invergowrie, Dundee, DD2 5DA.  
The Scottish Crop Research Institute is a charitable company limited by
guarantee. 
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.


DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views
expressed by the sender are not necessarily the views of SCRI and its
subsidiaries.  This email and any files transmitted with it are
confidential to the intended recipient at the e-mail address to which it
has been addressed.  It may not be disclosed or used by any other than
that addressee.
If you are not the intended recipient you are requested to preserve this
confidentiality and you must not use, disclose, copy, print or rely on
this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the
name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are 
present in this email, neither the Institute nor the sender accepts any 
responsibility for any viruses, and it is your responsibility to scan
the email and the attachments (if any).
______________________________________________________________________


From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 09:57:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 04:57:00 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcribe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200811270957.mAR9v0i0021623@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #54 from lpritc at scri.sari.ac.uk  2008-11-27 04:56 EST -------
(In reply to comment #53)
> (In reply to comment #27)
> > Created an attachment (id=1032)
> >  --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view)
> > Patch to Bio/Seq.py to add start codon handling to translation
> > 
> > Patch adds a new boolean argument to the translate method and function, called
> > "init" (rather than my earlier suggestions like "from_start" or "check_start"
> > which could be considered misleading).

[...]

> I don't like the "init" argument either. I would call it force_initiator_Met
> instead. BTW, the non-canonical initiator codon is CUG; where did you find UUG?

This may clarify things:

From the E. coli K-12 sequencing paper
(http://dx.doi.org/10.1126/science.277.5331.1453):

"The distribution of start codons is as follows: ATG, 3542; GTG, 612; and TTG,
130. There is also one ATT and possibly a CTG"

It's not that unusual an occurrence, and there are a small number of known
alternative start codons.  'Forcing' a Met start imposes the result that the
first codon is a methionine, rather than checking that the first codon *could
be* a methionine.  I prefer the second behaviour.
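
A minimal sketch of the two behaviours, in plain Python rather than the actual
Bio.Seq patch.  The function names and the START_CODONS set are hypothetical;
the set only lists the three common codons from the counts quoted above.

```python
# Hypothetical helpers for illustration only; not the Bio.Seq API.
# The codon set follows the E. coli K-12 counts quoted above (ATG, GTG, TTG).
START_CODONS = frozenset(["ATG", "GTG", "TTG"])


def could_start(dna, start_codons=START_CODONS):
    """Return True if the first codon is one of the accepted start codons."""
    return dna[:3].upper() in start_codons


def force_met(protein):
    """'Forcing': make the first residue M whatever the first codon was."""
    return "M" + protein[1:]


def checked_met(dna, protein, start_codons=START_CODONS):
    """'Checking': use M only if the first codon could be a start codon."""
    if not could_start(dna, start_codons):
        raise ValueError("first codon %r is not a recognised start codon"
                         % dna[:3])
    return "M" + protein[1:]
```

With these definitions a GTG-initiated sequence passes the check and gets a
leading M, whereas forcing would also relabel a sequence that starts with,
say, CCC.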

L.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 10:41:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 05:41:18 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811271041.mARAfITj025395@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1093 is|0                           |1
           obsolete|                            |


------- Comment #32 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 05:41 EST -------
(From update of attachment 1093)
Committed to CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 10:46:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 05:46:57 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811271046.mARAkv9t025868@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1094 is|0                           |1
           obsolete|                            |


------- Comment #33 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 05:46 EST -------
(From update of attachment 1094)
Committed to CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 11:08:30 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 06:08:30 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811271108.mARB8U6n027821@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1096 is|0                           |1
           obsolete|                            |


------- Comment #34 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 06:08 EST -------
(From update of attachment 1096)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 11:14:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 06:14:18 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811271114.mARBEI5w028329@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1097 is|0                           |1
           obsolete|                            |


------- Comment #35 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 06:14 EST -------
(From update of attachment 1097)
Committed to CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From mjldehoon at yahoo.com  Thu Nov 27 13:09:43 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 27 Nov 2008 05:09:43 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <20081125144041.GC83220@sobchak.mgh.harvard.edu>
Message-ID: <45956.75241.qm@web62406.mail.re1.yahoo.com>

> > However, more than half of Biopython's tests do
> > not actually make use of this testing framework:
> > [...]
> > These tests have trivial output, for example test_Cluster:
> > 
> > test_Cluster
> > test_clusterdistance (test_Cluster.TestCluster) ... ok
> > test_distancematrix_kmedoids (test_Cluster.TestCluster) ... ok
> > test_kcluster (test_Cluster.TestCluster) ... ok
> > test_matrix_parse (test_Cluster.TestCluster) ... ok
> > test_median_mean (test_Cluster.TestCluster) ... ok
> > test_somcluster (test_Cluster.TestCluster) ... ok
> > test_treecluster (test_Cluster.TestCluster) ... ok
> 
> They really do make use of the framework, but at a higher
> level. I agree that if you run a single test it makes little
> difference whether you use 'run_tests.py test_Cluster' or just
> run 'test_Cluster.py' directly. However, when you are
> running all the tests, as is regularly done in development
> or before pushing releases, this comparison is important. It
> will pick out if you get a line like:
> 
> test_clusterdistance (test_Cluster.TestCluster) ... ERROR
> 
> instead of the expected ok and report this in the summary
> for all of the tests. Otherwise this is likely to get lost
> in all of the results.

Actually, I never use the summary produced by run_tests.py. I just check which tests failed, and then fix them one by one by running the individual test scripts.

> > I would therefore like to suggest to move from
> > Biopython's testing framework to Python's testing
> > framework. This also relieves us of the
> > task of explaining Biopython's testing framework
> > to contributors, and allows us to make better use
> > of what Python already provides.
...
> Is the testing framework you are proposing different from
> the unit tests used in the individual tests?

I am proposing to use the regular Python unit testing framework as it is. This means that most Biopython tests do not change at all (or only trivially). The run_tests.py script will need to be modified though to remove the requirement of having an output file for each individual test.

> How does your proposed framework
> manage the higher level functionality of checking if all sub-tests
> within one of the test suites pass?

If one of the sub-tests fails, Python's unit testing framework will tell us so, though (perhaps) not exactly which sub-test fails. However, that is easy to figure out just by running the individual test script by itself.
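
As a rough sketch of what the standard unittest module records about a run: the TestResult object returned by the runner keeps lists of failures and errors, so the names of failing sub-tests can be collected without comparing against an expected output file. ExampleCase, make_example_case and run_and_summarise are hypothetical names used for illustration; none of this is taken from run_tests.py.

```python
import io
import unittest


def make_example_case():
    """Build a hypothetical stand-in for a test class such as TestCluster."""

    class ExampleCase(unittest.TestCase):
        def test_passes(self):
            self.assertEqual(1 + 1, 2)

        def test_fails(self):
            # Deliberately wrong, so that the run reports one failure.
            self.assertEqual(1 + 1, 3)

    return ExampleCase


def run_and_summarise(test_case_class):
    """Run one TestCase class; return (all_passed, ids of failed sub-tests)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case_class)
    stream = io.StringIO()  # keep the per-test report out of the console
    runner = unittest.TextTestRunner(stream=stream, verbosity=2)
    result = runner.run(suite)
    # failures and errors are lists of (test, traceback_text) pairs
    failed = [test.id() for test, _ in result.failures + result.errors]
    return result.wasSuccessful(), failed


ok, failed = run_and_summarise(make_example_case())
```

Here ok is False and failed holds the id of the test_fails method, which is the kind of per-module summary a revised run_tests.py could print.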

--Michiel


From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 13:33:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 08:33:46 -0500
Subject: [Biopython-dev] [Bug 2683] Modules with unused string modules
In-Reply-To: <bug-2683-42@http.bugzilla.open-bio.org/>
Message-ID: <200811271333.mARDXkHx009514@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2683


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 08:33 EST -------
Fixed in CVS, thanks


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 14:38:04 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 09:38:04 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200811271438.mAREc4IG018238@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #9 from lpritc at scri.sari.ac.uk  2008-11-27 09:38 EST -------
The revised color/colour code in AbstractDrawer.py causes all bar charts in
linear diagrams to be the default colour of light green.  A fixed version of
AbstractDrawer is provided as an attachment.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From bugzilla-daemon at portal.open-bio.org  Thu Nov 27 14:39:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 09:39:37 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200811271439.mAREdbXp018415@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #10 from lpritc at scri.sari.ac.uk  2008-11-27 09:39 EST -------
Created an attachment (id=1121)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1121&action=view)
Revised AbstractDrawer.py

This revision fixes a behaviour where bar charts for linear diagrams cannot be
changed from their default colour.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 01:33:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 20:33:56 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280133.mAS1XuXq002406@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1098 is|0                           |1
           obsolete|                            |


------- Comment #36 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 20:33 EST -------
(From update of attachment 1098)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 01:52:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 20:52:10 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280152.mAS1qAR3003698@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1099 is|0                           |1
           obsolete|                            |


------- Comment #37 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 20:52 EST -------
(From update of attachment 1099)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 02:27:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 21:27:29 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280227.mAS2RTea005795@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #38 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 21:27 EST -------
(From update of attachment 1100)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 02:27:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 21:27:47 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280227.mAS2RlEg005835@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1100 is|0                           |1
           obsolete|                            |


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 02:55:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 21:55:11 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280255.mAS2tBTL007510@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1101 is|0                           |1
           obsolete|                            |


------- Comment #39 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 21:55 EST -------
(From update of attachment 1101)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 03:02:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 22:02:25 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280302.mAS32Pxh008177@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1102 is|0                           |1
           obsolete|                            |


------- Comment #40 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 22:02 EST -------
(From update of attachment 1102)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 04:08:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 23:08:57 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280408.mAS48vaq012054@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1103 is|0                           |1
           obsolete|                            |


------- Comment #41 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 23:08 EST -------
(From update of attachment 1103)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 04:16:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 23:16:29 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280416.mAS4GThb012692@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1104 is|0                           |1
           obsolete|                            |


------- Comment #42 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 23:16 EST -------
(From update of attachment 1104)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 04:22:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 23:22:37 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280422.mAS4MbVR013025@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1105 is|0                           |1
           obsolete|                            |


------- Comment #43 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 23:22 EST -------
(From update of attachment 1105)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 04:50:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 23:50:59 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280450.mAS4oxjC014450@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1106 is|0                           |1
           obsolete|                            |


------- Comment #44 from mdehoon at ims.u-tokyo.ac.jp  2008-11-27 23:50 EST -------
(From update of attachment 1106)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 05:07:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 00:07:15 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280507.mAS57F3P015386@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1107 is|0                           |1
           obsolete|                            |


------- Comment #45 from mdehoon at ims.u-tokyo.ac.jp  2008-11-28 00:07 EST -------
(From update of attachment 1107)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 08:48:30 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 03:48:30 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811280848.mAS8mUmr028058@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1108 is|0                           |1
           obsolete|                            |


------- Comment #46 from mdehoon at ims.u-tokyo.ac.jp  2008-11-28 03:47 EST -------
(From update of attachment 1108)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 10:07:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 05:07:05 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811281007.mASA751F001103@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1109 is|0                           |1
           obsolete|                            |


------- Comment #47 from mdehoon at ims.u-tokyo.ac.jp  2008-11-28 05:07 EST -------
(From update of attachment 1109)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 10:22:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 05:22:13 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811281022.mASAMDwt002023@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1110 is|0                           |1
           obsolete|                            |


------- Comment #48 from mdehoon at ims.u-tokyo.ac.jp  2008-11-28 05:22 EST -------
(From update of attachment 1110)
Fixed in CVS


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 10:29:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 05:29:16 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811281029.mASATGhi002380@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1111 is|0                           |1
           obsolete|                            |


------- Comment #49 from mdehoon at ims.u-tokyo.ac.jp  2008-11-28 05:29 EST -------
(From update of attachment 1111)
Fixed in CVS




From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 10:29:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 05:29:39 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811281029.mASATdU5002440@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1112 is|0                           |1
           obsolete|                            |


------- Comment #50 from mdehoon at ims.u-tokyo.ac.jp  2008-11-28 05:29 EST -------
(From update of attachment 1112)
Fixed in CVS




From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 10:30:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 05:30:23 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811281030.mASAUNDX002501@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1113 is|0                           |1
           obsolete|                            |


------- Comment #51 from mdehoon at ims.u-tokyo.ac.jp  2008-11-28 05:30 EST -------
(From update of attachment 1113)
Fixed in CVS




From biopython at maubp.freeserve.co.uk  Fri Nov 28 11:09:30 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 11:09:30 +0000
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <45956.75241.qm@web62406.mail.re1.yahoo.com>
References: <20081125144041.GC83220@sobchak.mgh.harvard.edu>
	<45956.75241.qm@web62406.mail.re1.yahoo.com>
Message-ID: <320fb6e00811280309w7b5f0fc6m38795c4dc61c8744@mail.gmail.com>

Hello all,

Sorry for not replying earlier - I've been travelling and didn't get
to check my email as often as I had hoped.   I'm going to reply to
several points in this one email...

Marco wrote:
> I was also proposing to use the doctest framework for some of the
> modules, and for enhancing documentation.
> http://bugzilla.open-bio.org/show_bug.cgi?id=2640

As Marco points out, there is also the option of using doctest, which
we're doing in some of the unit tests (e.g. test_wise.py).  I like the
idea of using doctest where we want to include examples in the
docstrings anyway.  Marco wasn't suggesting this, but just to be
clear, I don't think we should use JUST doctest for all our unit
tests.  Many test cases would make misleading documentation, and
having lots and lots of doctest examples would also hide the important
parts of the documentation.  Additionally, doctests using input files
are not straightforward due to path issues.

Brad wrote:
> Agreed with the distinction between the unit tests and the "dump
> lots of text and compare" approach. I've written both and do think
> the unit testing/assertion model is more robust since you can go
> back and actually get some insight into what someone was thinking
> when they wrote an assertion.

I have probably written more of the "dump lots of text and compare"
style tests.  I think these have a number of advantages:
(1) Easier for beginners to write a test: you can take almost any
example script and use that.  You don't have to learn the unit test
framework.
(2) Debugging a failing test in IDLE is much easier - using unit tests
you have all that framework between you and the local scope where the
error happens.
(3) For many broad tests, manually setting up the expected output for
an assert is extremely tedious (e.g. parsing sequences and checking
their checksums).

We could discuss a modification to run_tests.py so that if there is no
expected output file output/test_XXX for test_XXX.py we just run
test_XXX.py and check its return value (I think Michiel had previously
suggested something like this).  Perhaps for more robustness, capture
the output and compare it to a predefined list of regular expressions
covering the typical outputs.  For example, looking at
output/test_Cluster, the first line is the test name, but the rest follows
the pattern "test_... ok". I imagine only a few output styles exist.
With such a change, half the unit tests (e.g. test_Cluster.py)
wouldn't need their output file in CVS (output/test_Cluster).
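
As a rough sketch of the regular expression idea (the function name,
the two patterns and the sample output in the usage note are
assumptions for illustration, not existing run_tests.py code):

```python
import re

# Hypothetical patterns for the "typical" output of a unittest-style test.
# The first line is the test name; the remaining lines look like "test_... ok".
TYPICAL_LINES = [
    re.compile(r"^test_\w+$"),
    re.compile(r"^test_.* ok$"),
]


def output_looks_ok(output):
    """Return True if every non-blank line matches one of the typical patterns."""
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if not any(pattern.match(line) for pattern in TYPICAL_LINES):
            return False
    return True
```

With this, captured output such as "test_Cluster" followed by
"test_median ... ok" would be accepted, while a line ending in "FAIL"
would be rejected, so no output/test_XXX file would be needed for
tests of this style.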

Michiel de Hoon wrote:
> If one of the sub-tests fails, Python's unit testing framework will tell us so,
> though (perhaps) not exactly which sub-test fails. However, that is easy to
> figure out just by running the individual test script by itself.

That won't always work.  Consider intermittent network problems, or
tests using random data - in general it really is worthwhile having
run_tests.py report a little more than just which test_XXX.py module
failed.

Peter


From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 11:53:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 06:53:36 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811281153.mASBra4q008163@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #52 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-28 06:53 EST -------
Although I had offered to look over the patches, it looks like Michiel has
reviewed and committed them all while I was away, so I don't have to ;)

Thank you both!

Can we close this bug now?




From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 11:57:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 06:57:35 -0500
Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function
	to convert hex to integer
In-Reply-To: <bug-2685-42@http.bugzilla.open-bio.org/>
Message-ID: <200811281157.mASBvZ6A008475@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2685


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-28 06:57 EST -------
(In reply to comment #5)
> As far as I can tell, the HotRand.hex_convert function is not used any more in
> Bio.HotRand or anywhere else in Biopython; its usage was lost in revision 1.3
> of Bio.HotRand. So I think that we can simply deprecate this function. If there
> are no objections, I'll add a DeprecationWarning and use Bruce's code in the
> mean time until the function is removed.


+1 on this plan.

(I was going to say we should deprecate this function rather than removing it,
but you'd already covered that).




From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 12:05:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 07:05:14 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200811281205.mASC5EY8009077@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-28 07:05 EST -------
(In reply to comment #7)
> (In reply to comment #6)
> > (From update of attachment 1072 [details] [details])
> > I think this is still a big improvement, but that the
> > (sub)feature.location_operator issue could wait.  We'll
> > need to discuss on the
> > BioSQL mailing list how this should be handled consistently.
> > 
> > Leaving this bug open.
> 
> Further to the "where to put the (sub)feature.location_operator" (eg. "join",
> "order") question, this comment appears in the BioPerl MySQL schema for the
> location_qualifier_value table:
> 
> -- location qualifiers - mainly intended for fuzzies but anything
> -- can go in here
> -- some controlled vocab terms have slots;
> 
> So, this would seem a suitable place to store the attribute.
> 

Yes, but if we record something in the location_qualifier_value table we can't
use a NULL term_id (possibly a schema limitation).  We therefore need to use a
particular ontology, which is where some co-ordination with the other BioSQL
projects is needed (so that we all default to the same ontology).  I'd meant to
send off an email about this to the BioSQL mailing list but didn't get it done
before I had to leave for a trip.




From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 12:24:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 07:24:19 -0500
Subject: [Biopython-dev] [Bug 2684] GenBank/__init__.py: Removing loop over
	string.whitespace
In-Reply-To: <bug-2684-42@http.bugzilla.open-bio.org/>
Message-ID: <200811281224.mASCOJSg010226@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2684


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-28 07:24 EST -------
Marking as fixed - I've checked in a simplified version of your patch.  See
Bio/GenBank/__init__.py  revision 1.98 in CVS.

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/GenBank/__init__.py?cvsroot=biopython

Thanks Bruce.




From biopython at maubp.freeserve.co.uk  Fri Nov 28 12:37:04 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 12:37:04 +0000
Subject: [Biopython-dev] [BioPython] PubMed find_related
In-Reply-To: <580790.81356.qm@web62404.mail.re1.yahoo.com>
References: <aa5471510811241418h2a6ca97ai74aab652cdcfdaa3@mail.gmail.com>
	<580790.81356.qm@web62404.mail.re1.yahoo.com>
Message-ID: <320fb6e00811280437w8f9f3d2t84716f7a554b913@mail.gmail.com>

On Tue, Nov 25, 2008 at 4:05 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>>>> from Bio import Entrez
>>>> handle = Entrez.elink(dbfrom='pubmed',id=12345)
>>>> record = Entrez.read(handle)
>
> Feel free to write a section about Entrez.elink for the Biopython documentation :-).
> Currently, this section is almost empty.

This does need a little love, doesn't it.  Here is a slightly longer
example which could form the basis of a tutorial entry:

    >>> from Bio import Entrez
    >>> Entrez.email = "A.N.Other at example.com"
    >>> pmid = "12230038"
    >>> handle = Entrez.elink(dbfrom='pubmed', id=pmid)
    >>> result = Entrez.read(handle)
    >>> for link in result[0]["LinkSetDb"][0]['Link'] :
    ...     print link

The deeply nested nature of the XML results does suggest that a helper
function in Bio.Entrez would be useful here.  Maybe something like:

def find_related(dbfrom, id) :
    #Returns a list of dictionaries containing Score and ID matched
    result = read(elink(dbfrom=dbfrom, id=id))
    return result[0]["LinkSetDb"][0]['Link']

It might make more sense to return just a list of ID strings, but the
score may be interesting.
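
For comparison, the "just a list of ID strings" variant might look
roughly like this.  It is only a sketch: the function name link_ids is
made up, and it assumes each entry under 'Link' is a dictionary with
an "Id" key, as in the parsed elink results.  It works on an already
parsed result, so it needs no network access:

```python
def link_ids(result):
    """Return only the ID strings from a parsed elink result.

    Assumes the nested layout result[0]["LinkSetDb"][0]["Link"], where
    each link is a dictionary with an "Id" key (and possibly a "Score").
    """
    links = result[0]["LinkSetDb"][0]["Link"]
    return [link["Id"] for link in links]


# Hand-built stand-in for Entrez.read(handle); the IDs and scores are made up.
sample = [{"LinkSetDb": [{"Link": [{"Id": "111", "Score": "100"},
                                   {"Id": "222", "Score": "90"}]}]}]
ids = link_ids(sample)
```

The scores are simply dropped here, which is the trade-off mentioned
above.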

Peter


From biopython at maubp.freeserve.co.uk  Fri Nov 28 13:05:38 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 13:05:38 +0000
Subject: [Biopython-dev] Bio.Entrez batched downloads
Message-ID: <320fb6e00811280505m3b065877r99785f306a356aa@mail.gmail.com>

This is returning to a topic we've discussed in the past - the NCBI
Entrez API is quite low level, and the Bio.Entrez module reflects
this.  As a result certain "typical" tasks require more code than one
might expect.  In particular, batched downloads of a large result set.

The tutorial covers using Bio.Entrez.efetch in a loop to download a
result set in a batch, for example writing out a MedLine or FASTA
format file.  This seems like a common need - starting either from a
list of IDs, or better from a history webenv and query_key.  I think
there is a use for a Bio.Entrez.batched_efetch or download_many
function to save people re-implementing their own batched downloader
(even just as a copy and paste from the tutorial).

If the NCBI ever gives any explicit guidance on batch sizes then we
can update Biopython centrally - rather than individual scripts
requiring changes everywhere.  We might also be able to include some
basic error checking too (e.g. empty or partial downloads). One catch
is that downloading and concatenating batches as XML files does not
give a valid XML file - but this is safe for MedLine, FASTA, GenBank
etc.  This proposed function could raise an exception if used with XML
to avoid this issue.

In terms of the API for getting the data back, there are several options
* Take an output handle as an argument (which would be written to as
each batch was downloaded)
* Return a handle - the implementation would be a bit more complicated
as we should avoid holding everything in memory, but would then be
very similar to the existing Bio.Entrez.efetch function in its usage.

Other options which I don't like:
* Take an output filename (less flexible than just taking an output handle)
* Return the data as a string (memory concerns with large downloads)

Note that related functions like the deprecated
Bio.PubMed.download_many (and early versions of
Bio.GenBank.download_many) used a complicated function call back
mechanism (which required knowing the file format in advance and
having a parser for it).  This doesn't seem sensible for a generic
function.  Currently Bio.GenBank.download_many (obsolete, soon to be
deprecated) just makes a single call to Bio.Entrez.efetch, regardless
of the number of records / amount of data expected.
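
To make the first option (taking an output handle) concrete, here is a
minimal sketch.  The name batched_download, its arguments and the
default batch size are assumptions for discussion, not an existing
Bio.Entrez function.  The fetch argument is any callable accepting
retstart and retmax keyword arguments and returning a handle, so the
loop can be tried without network access:

```python
def batched_download(fetch, count, out_handle, batch_size=100):
    """Write count records to out_handle, fetching batch_size at a time.

    fetch is a callable taking retstart and retmax keyword arguments and
    returning a handle with read() and close() methods.  With Bio.Entrez
    it could be a small wrapper around Entrez.efetch, for example:

        lambda **kw: Entrez.efetch(db="pubmed", rettype="medline",
                                   webenv=webenv, query_key=query_key, **kw)
    """
    for start in range(0, count, batch_size):
        # The last batch may be smaller than batch_size.
        handle = fetch(retstart=start, retmax=min(batch_size, count - start))
        data = handle.read()
        handle.close()
        if not data:
            # Basic error checking: an empty batch suggests a failed download.
            raise ValueError("Empty download for records from %i" % start)
        out_handle.write(data)
```

Only one batch is held in memory at a time, which addresses the memory
concern with returning the data as a string.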

Peter


From biopython at maubp.freeserve.co.uk  Fri Nov 28 17:26:45 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 17:26:45 +0000
Subject: [Biopython-dev] Deprecation and removal policy
Message-ID: <320fb6e00811280926v16454fa6t891fcc74e4fa4729@mail.gmail.com>

Back on 27 June 2008, in preparation for what became Biopython 1.47,
Michiel wrote:
> In recent releases, we have been using the rule of thumb to remove all
> modules from a new Biopython release that were deprecated two
> releases ago.

I was thinking that when we made releases about six months apart, this
rule of thumb effectively gave a year's warning.  Recently we've made
releases roughly every three months, which translates to only about
six months warning, so I think we should be a little more restrained
in removing deprecated code in future.

As an example, Bio.EUtils was deprecated in favour of Bio.Entrez in
Release 1.48 (Sept 2008).  Under the old rule of thumb, we could
remove this module from CVS now (as the deprecation was present in
Biopython 1.48 and 1.49).  If we release Biopython 1.50 in January or
February 2009 (for the sake of argument), that means the deprecation
would have been in place for only four or five months - which seems
too rash.

How about a new policy that after adding a deprecation warning,
deprecated modules/functions are kept for at least two public releases
AND at least 12 months (counting from the first release when they are
deprecated - not the date of the CVS change) before being removed?
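
Written as a check, the proposed rule might look like this.  It is a
sketch only: the function name and arguments are made up, and the
exact day used for the 1.48 release in the example is a placeholder.

```python
from datetime import date


def may_remove(first_deprecated_release, releases_with_warning, today):
    """Apply the proposed policy: at least two releases AND at least 12 months.

    first_deprecated_release is the date of the first public release that
    carried the deprecation (not the date of the CVS change), and
    releases_with_warning is how many public releases have carried it.
    """
    months = (today.year - first_deprecated_release.year) * 12
    months += today.month - first_deprecated_release.month
    if today.day < first_deprecated_release.day:
        months -= 1  # the current month is not yet complete
    return releases_with_warning >= 2 and months >= 12


# Bio.EUtils example: deprecated in 1.48 (Sept 2008, day is a placeholder),
# warning present in 1.48 and 1.49, hypothetical 1.50 in February 2009.
eutils_ok_feb = may_remove(date(2008, 9, 1), 2, date(2009, 2, 1))
eutils_ok_sep = may_remove(date(2008, 9, 1), 2, date(2009, 9, 1))
```

Under this rule Bio.EUtils would stay in a February 2009 release and
become removable from September 2009 onwards.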

Peter


From bugzilla-daemon at portal.open-bio.org  Fri Nov 28 20:10:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 15:10:43 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200811282010.mASKAhuK012846@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-28 15:10 EST -------
(In reply to comment #8)
> Yes, but if we record something in the location_qualifier_value table we can't
> use a NULL term_id (possibly a schema limitation).  We therefore need to use a
> particular ontology, which is where some co-ordination with the other BioSQL
> projects is needed (so that we all default to the same ontology).  I'd meant
> to send of an email about this to the BioSQL mailing list but didn't get it
> done before I had to leave for a trip.

I've started a discussion on the BioSQL mailing list, see this thread:
http://lists.open-bio.org/pipermail/biosql-l/2008-November/001412.html - me
http://lists.open-bio.org/pipermail/biosql-l/2008-November/001414.html -
Richard from BioJava
http://lists.open-bio.org/pipermail/biosql-l/2008-November/001413.html - me
etc.

Cymon - if you haven't already done so, I would encourage you to sign up to the
BioSQL mailing list.

Peter




From bugzilla-daemon at portal.open-bio.org  Sat Nov 29 04:48:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 23:48:46 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811290448.mAT4mkmI008416@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


------- Comment #53 from mdehoon at ims.u-tokyo.ac.jp  2008-11-28 23:48 EST -------
(In reply to comment #52)

> Can we close this bug now?
> 
Not yet, there are a few more things to consider in the original description.




From bugzilla-daemon at portal.open-bio.org  Sat Nov 29 05:01:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 29 Nov 2008 00:01:12 -0500
Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function
	to convert hex to integer
In-Reply-To: <bug-2685-42@http.bugzilla.open-bio.org/>
Message-ID: <200811290501.mAT51ClZ009532@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2685


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #7 from mdehoon at ims.u-tokyo.ac.jp  2008-11-29 00:01 EST -------
I used Bruce's patch and added a DeprecationWarning to the hex_convert
function, and modified the unit test accordingly.




From mjldehoon at yahoo.com  Sat Nov 29 05:13:33 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 28 Nov 2008 21:13:33 -0800 (PST)
Subject: [Biopython-dev] Bio.Entrez batched downloads
In-Reply-To: <320fb6e00811280505m3b065877r99785f306a356aa@mail.gmail.com>
Message-ID: <432417.5854.qm@web62405.mail.re1.yahoo.com>

Sorry, but I am -1 on this. This sounds like software bloat to me.
The reason that the NCBI Entrez API is low level is that they are unable to predict how users will want to use the NCBI Entrez. We as Biopython know little more than NCBI, except that our users want to access NCBI Entrez via Python, so we provide a Python interface to NCBI Entrez. Also, I don't think that the current situation is unsatisfactory. The Bio.Entrez API is extremely simple, and with an example in the tutorial it should be very easy to use; I don't see a problem with copying and pasting from the tutorial, provided that sufficient information is available there.

--Michiel.

--- On Fri, 11/28/08, Peter <biopython at maubp.freeserve.co.uk> wrote:

> From: Peter <biopython at maubp.freeserve.co.uk>
> Subject: [Biopython-dev] Bio.Entrez batched downloads
> To: "BioPython-Dev Mailing List" <biopython-dev at lists.open-bio.org>
> Date: Friday, November 28, 2008, 8:05 AM
> This is returning to a topic we've discussed in the past
> - the NCBI
> Entrez API is quite low level, and the Bio.Entrez module
> reflects
> this.  As a result certain "typical" tasks
> require more code than one
> might expect.  In particular, batched downloads of a large
> result set.
> 
> The tutorial covers using Bio.Entrez.efetch in a loop to
> download a
> result set in a batch, for example writing out a MedLine or
> FASTA
> format file.  This seems like a common need - starting
> either from a
> list of IDs, or better from a history webenv and query_key.
>  I think
> there is a use for a Bio.Entrez.batched_efetch or
> download_many
> function to save people re-implementing their own batched
> downloader
> (even just as a copy and paste from the tutorial).
> 
> If the NCBI every give any explicit guidance on batch sizes
> then we
> can update Biopython centrally - rather than individual
> scripts
> requiring changes everywhere.  We might also be able to
> include some
> basic error checking to (e.g. empty or partial downloads).
> One catch
> is that downloading and concatenating batches as XML files
> does not
> give a valid XML file - but this is safe for MedLine,
> FASTA, GenBank
> etc.  This proposed function could raise an exception if
> used with XML
> to avoid this issue.
> 
> In terms of the API for getting the data back, there are
> several options
> * Take an output handle as an argument (which would be
> written to as
> each batch was downloaded)
> * Return a handle - the implementation would be a bit more
> complicated
> as we should avoid holding everything in memory, but would
> then be
> very similar to the existing Bio.Entrez.efetch function in
> its usage.
> 
> Other options which I don't like:
> * Take an output filename (less flexible than just taking
> an output handle)
> * Return the data as a string (memory concerns with large
> downloads)
> 
> Note that related functions like the deprecated
> Bio.PubMed.download_many (and early versions of
> Bio.GenBank.download_many) used a complicated function call
> back
> mechanism (which required knowing the file format in
> advance and
> having a parser for it).  This doesn't seem sensible
> for a generic
> function.  Currently Bio.GenBank.download_many (obsolete,
> soon to be
> deprecated) just makes a single call to Bio.Entrez.efetch,
> regardless
> of the number of records / amount of data expected.
> 
> Peter


From mjldehoon at yahoo.com  Sat Nov 29 05:22:10 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 28 Nov 2008 21:22:10 -0800 (PST)
Subject: [Biopython-dev] [BioPython] PubMed find_related
In-Reply-To: <320fb6e00811280437w8f9f3d2t84716f7a554b913@mail.gmail.com>
Message-ID: <246349.44664.qm@web62404.mail.re1.yahoo.com>

> The deeply nested nature of the XML results do suggest that
> a helper function in Bio.Entrez would be useful here.  Maybe
> something like:
> 
> def find_related(dbfrom, id) :
>     #Returns a list of dictionaries containing Score and ID
>     # matched
>     result = read(elink(dbfrom=dbfrom, id=id))
>     return result[0]["LinkSetDb"][0]['Link']
> 
> It might make more sense to return just a list of ID
> strings, but the score may be interesting.
>

The problem this user encountered was that the DeprecationWarning in the
PubMed.find_related function contained very little information and did not mention that Entrez.elink is the appropriate function to use:

"Find related articles in PubMed, returns an ID list (DEPRECATED).
Please use Bio.Entrez instead as described in the Biopython Tutorial."

and in addition that currently the description of Bio.Entrez.elink in the tutorial is almost empty. Instead of adding a function to Bio.Entrez that helps this particular user, we should improve our documentation to enable all users to use Bio.Entrez appropriately. The set of helper functions to Bio.Entrez that we could write is virtually endless; we should not go down that path.

--Michiel.


From bugzilla-daemon at portal.open-bio.org  Sat Nov 29 06:02:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 29 Nov 2008 01:02:01 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To: <bug-2688-42@http.bugzilla.open-bio.org/>
Message-ID: <200811290602.mAT621Lc012846@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #54 from mdehoon at ims.u-tokyo.ac.jp  2008-11-29 01:02 EST -------
All fixed now; I hope I didn't screw up anything.




From mjldehoon at yahoo.com  Sat Nov 29 07:04:33 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 28 Nov 2008 23:04:33 -0800 (PST)
Subject: [Biopython-dev] [BioPython] PubMed find_related
In-Reply-To: <320fb6e00811280437w8f9f3d2t84716f7a554b913@mail.gmail.com>
Message-ID: <652169.76582.qm@web62406.mail.re1.yahoo.com>

I've expanded your example a bit and added it to the documentation of Entrez.elink. Thanks!

--Michiel.


--- On Fri, 11/28/08, Peter <biopython at maubp.freeserve.co.uk> wrote:

> From: Peter <biopython at maubp.freeserve.co.uk>
> Subject: Re: [BioPython] PubMed find_related
> To: mjldehoon at yahoo.com
> Cc: "BioPython-Dev Mailing List" <biopython-dev at lists.open-bio.org>
> Date: Friday, November 28, 2008, 7:37 AM
> On Tue, Nov 25, 2008 at 4:05 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> >>>> from Bio import Entrez
> >>>> handle = Entrez.elink(dbfrom='pubmed',id=12345)
> >>>> record = Entrez.read(handle)
> >
> > Feel free to write a section about Entrez.elink for the Biopython documentation :-).
> > Currently, this section is almost empty.
> 
> This does need a little love, doesn't it.  Here is a slightly longer
> example which could form the basis of a tutorial entry:
> 
>     >>> from Bio import Entrez
>     >>> Entrez.email = "A.N.Other at example.com"
>     >>> pmid = "12230038"
>     >>> handle = Entrez.elink(dbfrom='pubmed', id=pmid)
>     >>> result = Entrez.read(handle)
>     >>> for link in result[0]["LinkSetDb"][0]['Link'] :
>     ...     print link
> 
> The deeply nested nature of the XML results do suggest that a helper
> function in Bio.Entrez would be useful here.  Maybe something like:
> 
> def find_related(dbfrom, id) :
>     #Returns a list of dictionaries containing Score and ID matched
>     result = read(elink(dbfrom=dbfrom, id=id))
>     return result[0]["LinkSetDb"][0]['Link']
> 
> It might make more sense to return just a list of ID strings, but the
> score may be interesting.
> 
> Peter


From biopython at maubp.freeserve.co.uk  Sat Nov 29 13:36:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 29 Nov 2008 13:36:16 +0000
Subject: [Biopython-dev] [BioPython] PubMed find_related
In-Reply-To: <246349.44664.qm@web62404.mail.re1.yahoo.com>
References: <320fb6e00811280437w8f9f3d2t84716f7a554b913@mail.gmail.com>
	<246349.44664.qm@web62404.mail.re1.yahoo.com>
Message-ID: <320fb6e00811290536n7fe25b0fxfe78d52b16014a92@mail.gmail.com>

On Sat, Nov 29, 2008 at 5:22 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> The problem this user encountered was that the DeprecationWarning in
> PubMed.find_related function contained very little information and did
> not mention that Entrez.elink is the appropriate function to use:
>
> "Find related articles in PubMed, returns an ID list (DEPRECATED).
> Please use Bio.Entrez instead as described in the Biopython Tutorial."

We could make the deprecation warnings from Bio.PubMed (and the online
bits of Bio.GenBank) a little more explicit about which bits of
Bio.Entrez to use.  I made a start on updating Bio/PubMed.py on my
work computer on Friday, so I'll try to remember to finish this off on
Monday.
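
Something along these lines is what I have in mind for a more
explicit message.  This is a sketch: the function below is only a
stand-in that does no searching, and the wording of the message is a
suggestion rather than what is in CVS.

```python
import warnings


def find_related(pmid):
    """Stand-in for a deprecated function; it only issues the warning."""
    warnings.warn(
        "Bio.PubMed.find_related is deprecated; please use "
        "Bio.Entrez.elink instead, as described in the Biopython Tutorial.",
        DeprecationWarning,
        stacklevel=2,  # report the caller's line, not this one
    )
    return []
```

Naming the replacement function in the message means the user does
not have to go hunting through the tutorial first.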

> and in addition that currently the description of Bio.Entrez.elink in the
> tutorial is almost empty. Instead of adding a function to Bio.Entrez
> that helps this particular user, we should improve our documentation
> to enable all users to use Bio.Entrez appropriately.

The tutorial update for elink looks good (see below).

> The set of helper functions to Bio.Entrez that we could write is
> virtually endless; we should not go down that path.

I take your point - there are lots of possible helper functions we
could consider.  As long as we cover the typical use cases in the
tutorial that should be enough.

On Sat, Nov 29, 2008 at 7:04 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> I've expanded your example a bit and added it to the documentation of Entrez.elink. Thanks!
>
> --Michiel.

That looks good - and tries to explain the nested result structure too.

Peter