From p.j.a.cock at googlemail.com  Mon Jan  7 13:55:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 7 Jan 2013 18:55:25 +0000
Subject: [Biopython-dev] Dropping Python 2.5 and Jython 2.5 support?
In-Reply-To: <CAKVJ-_60vgH_T8eqrfYKBScyRVNUApmFM32_4D+uTLABLy19Ng@mail.gmail.com>
References: <CAKVJ-_60vgH_T8eqrfYKBScyRVNUApmFM32_4D+uTLABLy19Ng@mail.gmail.com>
Message-ID: <CAKVJ-_5M55Lky6qg+DohH-QQ1JBBqueCZ1GCzcsYscQ1YSu6UQ@mail.gmail.com>

On Mon, Oct 22, 2012 at 6:17 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Dear Biopythoneers,
>
> Would anyone object to us preparing to drop support for Python 2.5 and
> Jython 2.5, perhaps after the next Biopython release?
>
> To reassure those of you using Jython, we'd wait until Jython 2.7 is out
> first. Jython 2.7 is already in alpha, and brings support for C Python 2.7
> language features.
>
> Thanks,
>
> Peter

Hello all,

Having recently back-ported some Python 3 code with a C
extension to Python 2.6 and 2.7, I can now more clearly
appreciate the benefits dropping Python 2.5 support has for
writing code for both Python 2 and 3 - and am keen to be
able to exploit this for Biopython.
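Here is a small illustration of the kind of syntax involved. Python 2.6, 2.7 and 3.x all accept it, but Python 2.5 either rejects it or needs a __future__ import for it: the with statement, the "except ... as" form, and b"..." bytes literals. The helper name first_line_number is hypothetical and is not Biopython code.

```python
import io


def first_line_number(data, default=None):
    """Hypothetical helper: read an integer from the first line of some bytes."""
    # Python 2.6+: 'with' needs no "from __future__ import with_statement"
    with io.BytesIO(data) as handle:
        line = handle.readline().strip()
    try:
        return int(line)
    except ValueError as err:  # Python 2.6+: 'except ... as' syntax
        print("Could not parse %r (%s)" % (line, err))
        return default


print(first_line_number(b"42\nrest"))  # Python 2.6+: b"..." bytes literal
```

The print call passes a single argument, so it behaves the same on Python 2 without a print_function import.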

Given no major objections to the email I sent round in October
last year (thank you for your input, Nathan), we will press ahead
with phasing out support for Python 2.5, provisionally supporting
it in the forthcoming Biopython 1.61 and at least one more release
(which would mean Biopython 1.62 due Summer 2013).

https://github.com/biopython/biopython/commit/3f17f75b320fb6624d332809ef07314bab97477c

My only significant concern is for Jython users, since this will also
mean dropping support for Jython 2.5 (which implements the
Python 2.5 language). The replacement Jython 2.7 is still only
at the alpha release stage.

Regards,

Peter

From kai.blin at biotech.uni-tuebingen.de  Tue Jan  8 05:28:31 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Tue, 08 Jan 2013 11:28:31 +0100
Subject: [Biopython-dev] More relaxed parsing of wonky GenBank files
Message-ID: <50EBF4CF.9080901@biotech.uni-tuebingen.de>

Hi folks,

I've recently pushed into production use a new version of my software
that uses BioPython parsers instead of our own hand-written parsers.

One big thing we noticed is that BioPython is waaay more picky as to
what a proper GenBank file is supposed to look like. Sadly, many of
our users seem to be creating their GenBank files with programs that
only have a rough understanding of what the file format is supposed to
look like. Most of the invalid input can safely be ignored, and I
would propose to extend the GenBank parser to cope with the most
common errors I'm seeing in day to day use.

I'm happy to provide the patches, but before starting this work I'd
like to make sure that they would be acceptable in principle. So, is there any
reason to blow up in our users' faces rather than try to cope with
invalid input?

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From mjldehoon at yahoo.com  Tue Jan  8 06:11:46 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 8 Jan 2013 03:11:46 -0800 (PST)
Subject: [Biopython-dev] More relaxed parsing of wonky GenBank files
In-Reply-To: <50EBF4CF.9080901@biotech.uni-tuebingen.de>
Message-ID: <1357643506.32308.YahooMailClassic@web164005.mail.gq1.yahoo.com>

Entrez.parse has a "validate" argument to allow parsing of XML files that contain tags that are not represented in the corresponding DTD. If validate==True, the parser raises an exception when it encounters a tag that is missing from the DTD. If False, the parser ignores such tags.
Maybe SeqIO.parse could have a similar "validate" argument?
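Here is a minimal sketch of how such a switch could behave. It uses a toy "KEY value" line format rather than real GenBank or XML, and parse_records, KNOWN_KEYS and UnknownKeyError are hypothetical names that are not part of Bio.Entrez or Bio.SeqIO. With validate=True, an unexpected key raises an exception. With validate=False, the key is skipped.

```python
KNOWN_KEYS = {"LOCUS", "DEFINITION", "ACCESSION"}  # hypothetical "schema"


class UnknownKeyError(ValueError):
    """Raised in validating mode when a key is not in the schema."""


def parse_records(lines, validate=True):
    """Hypothetical parser for 'KEY value' lines with a validate switch."""
    record = {}
    for line in lines:
        key, _, value = line.partition(" ")
        if key not in KNOWN_KEYS:
            if validate:
                raise UnknownKeyError("Unexpected key %r" % key)
            continue  # permissive mode: silently ignore the unknown key
        record[key] = value.strip()
    return record


lines = ["LOCUS AB000001", "FOO bar", "ACCESSION AB000001"]
print(parse_records(lines, validate=False))
```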

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Tue Jan  8 08:27:20 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 8 Jan 2013 13:27:20 +0000
Subject: [Biopython-dev] More relaxed parsing of wonky GenBank files
In-Reply-To: <50EBF4CF.9080901@biotech.uni-tuebingen.de>
References: <50EBF4CF.9080901@biotech.uni-tuebingen.de>
Message-ID: <CAKVJ-_5nr3n46RPbtazb4LLyFUzPzRgxjtOdbrQRK3w-c-tWQA@mail.gmail.com>

On Tuesday, January 8, 2013, Kai Blin wrote:

> Hi folks,
>
> I've recently pushed into production use a new version of my software
> that uses BioPython parsers instead of our own hand-written parsers.
>
> One big thing we noticed is that BioPython is waaay more picky as to
> what a proper GenBank file is supposed to look like. Sadly, many of
> our users seem to be creating their GenBank files with programs that
> only have a rough understanding of what the file format is supposed to
> look like. Most of the invalid input can safely be ignored, and I
> would propose to extend the GenBank parser to cope with the most
> common errors I'm seeing in day to day use.
>
> I'm happy to provide the patches, but before starting this work I'd
> like to make sure that they would be acceptable in principle. So, is there any
> reason to blow up in our users' faces rather than try to cope with
> invalid input?
>
> Cheers,
> Kai
>

We already try to be tolerant, and issue warnings where it seems
safe to accept a broken file (e.g. an unrecognised first line, or a mismatch
between the length given in the first line and the actual sequence), but in
these cases not all the mis-formed data will or can be parsed.
Sometimes a file is broken to the point it is unwise to attempt
to parse it any further and an exception is the best course
of action.

Clearly you've found a whole load more dodgy files. If you
can work out which buggy tools are producing them, please
do try and report the issues to the tool authors. I know that
BioEdit is one source, but maintenance of that popular
free Windows tool stopped many years ago.

If you can prepare some (small) example files illustrating the
rule-breaking files (for testing), and with patches too if you like,
I will certainly review them for inclusion.

Note that if the user wants an exception, they can use the warnings
module to catch and upgrade our parser warnings. As Michiel
pointed out, other bits of Biopython have an explicit validation
or strict mode like the Entrez and PDB parsers. In the case of
the PDB parser this just toggles between issuing warnings and
raising exceptions. I'm not sure if the GenBank (and any other
SeqIO parsers) need a validate/permissive option given this
can already be achieved with the warnings module. After all,
broken GenBank files should be in the minority.
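The warnings approach needs nothing beyond the standard library. In this sketch, ParserWarning and check_length are hypothetical stand-ins for a parser's own warning class and for one of its consistency checks (here, the declared-versus-actual length mismatch mentioned earlier). The parser itself only warns. A caller who wants strict behaviour turns that warning category into an exception with warnings.simplefilter("error", ...) inside warnings.catch_warnings(), which restores the previous filters on exit.

```python
import warnings


class ParserWarning(UserWarning):
    """Hypothetical stand-in for a parser's own warning category."""


def check_length(declared, sequence):
    """Warn, rather than fail, if the declared length disagrees with the data."""
    if declared != len(sequence):
        warnings.warn(
            "Declared length %i but found %i letters" % (declared, len(sequence)),
            ParserWarning,
        )
    return sequence


# Permissive (default) use: a warning is issued and parsing carries on.
# Here it is recorded so that it can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ParserWarning)
    seq = check_length(10, "ACGT")

# Strict use, chosen by the caller: this warning category becomes an exception.
strict_raised = False
with warnings.catch_warnings():
    warnings.simplefilter("error", ParserWarning)
    try:
        check_length(10, "ACGT")
    except ParserWarning as err:
        strict_raised = True
        print("Strict mode: %s" % err)
```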

(My understanding of the Entrez setting is also about dealing
with missing DTD files and cases where the NCBI has a
bug and their XML and DTD disagree.)

Peter

From kai.blin at biotech.uni-tuebingen.de  Tue Jan  8 08:55:42 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Tue, 08 Jan 2013 14:55:42 +0100
Subject: [Biopython-dev] More relaxed parsing of wonky GenBank files
In-Reply-To: <CAKVJ-_5nr3n46RPbtazb4LLyFUzPzRgxjtOdbrQRK3w-c-tWQA@mail.gmail.com>
References: <50EBF4CF.9080901@biotech.uni-tuebingen.de>
	<CAKVJ-_5nr3n46RPbtazb4LLyFUzPzRgxjtOdbrQRK3w-c-tWQA@mail.gmail.com>
Message-ID: <50EC255E.5040904@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2013-01-08 14:27, Peter Cock wrote:

> We already try to be tolerant, and issue warnings where it seems 
> safe to accept a broken file (e.g. an unrecognised first line, or a mismatch
> between the length given in the first line and the actual sequence), but in
> these cases not all the mis-formed data will or can be parsed. 
> Sometimes a file is broken to the point it is unwise to attempt to
> parse it any further and an exception is the best course of
> action.

Yeah, I started looking into the code and realized that it already
tries to handle a lot of special cases.

> Clearly you've found a whole load more dodgy files. If you can work
> out which buggy tools are producing them, please do try and report
> the issues to the tool authors. I know that BioEdit is one source,
> but maintenance of that popular free Windows tool stopped many
> years ago.

Unfortunately I often have no way to contact the uploaders of the
broken sequence files, unless they chose to provide an email address.

> If you can prepare some (small) example files illustrating the 
> rule-breaking files (for testing), and with patches too if you
> like, I will certainly review them for inclusion.

The two most common things I saw in the last week are single record
files without the '//' end-of-record marker, and files where the
sequence lines are indented by one space more than expected (my
favourite).
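Here is a minimal sketch of how both quirks could be tolerated when reading the sequence lines that follow ORIGIN. It is not Biopython's GenBank scanner. The function read_origin_sequence is hypothetical and assumes the usual layout, with a position number followed by blocks of bases. Splitting on whitespace instead of slicing fixed columns makes the extra indentation harmless. Reaching the end of the input without a '//' line produces a warning rather than an exception.

```python
import warnings


def read_origin_sequence(lines):
    """Hypothetical tolerant reader for the lines after an ORIGIN line.

    Each line is split on whitespace rather than sliced at fixed columns,
    so extra indentation is harmless. A missing '//' terminator at the end
    of the input triggers a warning instead of an exception.
    """
    chunks = []
    for line in lines:
        if line.strip() == "//":
            return "".join(chunks).upper()
        words = line.split()
        if not words:
            continue  # skip blank lines
        # The first word should be the position number; the rest is sequence.
        if words[0].isdigit():
            words = words[1:]
        chunks.extend(words)
    warnings.warn("Record ended without a '//' terminator")
    return "".join(chunks).upper()


# Sequence lines indented one space more than usual, and no '//' at the end:
wonky = [
    "         1 gatcctccat atacaacggt atctccacct caggtttaga tctcaacaac ggaaccattg",
    "        61 ccgacatgag",
]
print(read_origin_sequence(wonky))
```

The trade-off is that whitespace splitting is more forgiving than column slicing, but it will also silently accept some other layout errors.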

I've added two sample files for these issues; I'm currently working on
patches that make them pass the tests.

Thanks for the comments. I'll push to my github fork once I've got
something.

Cheers,
Kai

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iQEcBAEBAgAGBQJQ7CVeAAoJEKM5lwBiwTTPGCYIANAkOxKtNPkclw66aCBWCaAH
Uz6zyCk8DTomGOy1fnBoPKI3R+tn73+8XNe6RknFDb6NL/uMD1bR4mTHi1yuHT24
7XSJp+j1JeIamMSs6hLAf4s/HIE2YoEriOe8I6lUAa2I//rxsKf2PcS7y/4Ax6XP
K/PUPODVanTCKFrpOIh2DS92lXvMJqI+cpZQ7k1ioaL+6iM9uqi9iRiV9H69Dci5
9bubA98+XvG1cnBISoQTHXpU1p1uiKU1CLxyWdl+9GTq4dCxTkeKDQvxoOd8JH/P
ksJPXyYY5u41KrDFpIMNJZpvr0PawLHcUGePKXDEvAt7wvmfDxN92xcVYsUP9w4=
=9u/w
-----END PGP SIGNATURE-----

From kai.blin at biotech.uni-tuebingen.de  Tue Jan  8 09:36:03 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Tue, 08 Jan 2013 15:36:03 +0100
Subject: [Biopython-dev] More relaxed parsing of wonky GenBank files
In-Reply-To: <50EC255E.5040904@biotech.uni-tuebingen.de>
References: <50EBF4CF.9080901@biotech.uni-tuebingen.de>
	<CAKVJ-_5nr3n46RPbtazb4LLyFUzPzRgxjtOdbrQRK3w-c-tWQA@mail.gmail.com>
	<50EC255E.5040904@biotech.uni-tuebingen.de>
Message-ID: <50EC2ED3.8000401@biotech.uni-tuebingen.de>

On 2013-01-08 14:55, Kai Blin wrote:

> Thanks for the comments. I'll push to my github fork once I've got 
> something.

Pull request is at https://github.com/biopython/biopython/pull/145

Cheers,
Kai


From redmine at redmine.open-bio.org  Wed Jan  9 17:58:25 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 9 Jan 2013 22:58:25 +0000
Subject: [Biopython-dev] [Biopython - Bug #3403] (New) PDBList fails to
	download large PDB structures
Message-ID: <redmine.issue-3403.20130109225825@redmine.open-bio.org>


Issue #3403 has been reported by David Cain.

----------------------------------------
Bug #3403: PDBList fails to download large PDB structures
https://redmine.open-bio.org/issues/3403

Author: David Cain
Status: New
Priority: High
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: https://github.com/DavidCain/biopython/tree/fix_pdb_dl


The current @PDBList@ module will often fail to download large PDB files.

<pre>
>>> from Bio.PDB import PDBList
>>> pdbl = PDBList()
>>> pdbl.retrieve_pdb_file("1hgg")
Downloading PDB structure '1hgg'...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/Bio/PDB/PDBList.py", line 247, in retrieve_pdb_file
    out.writelines(gz.read())
  File "/usr/lib/python2.7/gzip.py", line 249, in read
    self._read(readsize)
  File "/usr/lib/python2.7/gzip.py", line 303, in _read
    self._read_eof()
  File "/usr/lib/python2.7/gzip.py", line 342, in _read_eof
    hex(self.crc)))
IOError: CRC check failed 0x21d7a5f7 != 0x4b5eabb6L
>>>
</pre>

The source of this problem is that the entire gzipped file must be read into memory before it's written to disk locally. With large archives, the local file can be truncated prematurely, which causes gzip to crash on extraction.
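Here is a minimal sketch of a chunked alternative using only the standard library. It assumes the compressed file has already been saved locally, and the file names are hypothetical. shutil.copyfileobj copies in fixed-size chunks, so memory use does not grow with the size of the archive. The gzip module still checks the stream as it reads, so a truncated archive still raises an error instead of silently yielding a short file.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_to_file(src_path, dest_path, chunk_size=64 * 1024):
    """Decompress src_path into dest_path without holding it all in memory."""
    with gzip.open(src_path, "rb") as source, open(dest_path, "wb") as target:
        # Copies at most chunk_size bytes at a time.
        shutil.copyfileobj(source, target, chunk_size)
    return dest_path


# Demonstration with a throwaway archive (hypothetical file names).
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "pdb1abc.ent.gz")
with gzip.open(archive, "wb") as handle:
    handle.write(b"HEADER    EXAMPLE\n" * 10000)
result = gunzip_to_file(archive, os.path.join(workdir, "pdb1abc.ent"))
print(os.path.getsize(result))
```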

I fixed this issue on my "GitHub branch":https://github.com/DavidCain/biopython/tree/fix_pdb_dl, which I've made a pull request for.


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Wed Jan  9 18:08:28 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 9 Jan 2013 23:08:28 +0000
Subject: [Biopython-dev] [Biopython - Bug #3403] PDBList fails to download
	large PDB structures
References: <redmine.issue-3403.20130109225825@redmine.open-bio.org>
Message-ID: <redmine.journal-15062.20130109230828@redmine.open-bio.org>


Issue #3403 has been updated by David Cain.


(Pull request "here":https://github.com/biopython/biopython/pull/146)


From p.j.a.cock at googlemail.com  Wed Jan  9 18:55:13 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 9 Jan 2013 23:55:13 +0000
Subject: [Biopython-dev] Fwd: [biopython] Fix broken downloading of large
	PDB structures (#146)
In-Reply-To: <biopython/biopython/pull/146@github.com>
References: <biopython/biopython/pull/146@github.com>
Message-ID: <CAKVJ-_6zi6LVva0uvWjm=ooHiho5MAR=r0Cgnxi64yG2h0fmJA@mail.gmail.com>

FYI

---------- Forwarded message ----------
From: David Cain <notifications at github.com>
Date: Wed, Jan 9, 2013 at 10:59 PM
Subject: [biopython] Fix broken downloading of large PDB structures (#146)
To: biopython/biopython <biopython at noreply.github.com>


Summary of changes

   - Fix failure to download large PDB files
   - Use with statements for safer file I/O
   - Remove obsolete parameters
   - PEP 8 changes, update documentation

Failure to download large PDB files

(See: Redmine bug #3403 <https://redmine.open-bio.org/issues/3403>)

The current PDBList module will often fail to download large PDB files.

>>> from Bio.PDB import PDBList
>>> pdbl = PDBList()
>>> pdbl.retrieve_pdb_file("1hgg")
...
IOError: CRC check failed 0x21d7a5f7 != 0x4b5eabb6L
>>>

The source of this problem is that the entire gzipped file must be read
into memory before it's written to disk locally.

Instead of this memory-intensive approach, I changed the downloading to
use urllib.urlretrieve, which is more readable and far more efficient.
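Here is a minimal sketch of that approach in Python 3. There the function is urllib.request.urlretrieve (Python 2 had urllib.urlretrieve), and the Python 3 documentation lists it under its legacy interface. It writes the response to disk block by block and can call a reporthook with (block count, block size, total size). To stay runnable offline, the example downloads a file:// URL pointing at a temporary file instead of a real PDB server address. The name download is a hypothetical helper, not PDBList code.

```python
import os
import pathlib
import tempfile
from urllib.request import urlretrieve  # Python 2: from urllib import urlretrieve


def download(url, dest_path):
    """Save url to dest_path block by block, reporting progress."""
    def report(block_count, block_size, total_size):
        # Called by urlretrieve once before the first block and after each block.
        if total_size > 0:
            done = min(block_count * block_size, total_size)
            print("\r%i of %i bytes" % (done, total_size), end="")

    filename, _headers = urlretrieve(url, dest_path, reporthook=report)
    print()
    return filename


# Offline demonstration: a file:// URL stands in for the real PDB server URL.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "remote.ent.gz")
with open(source, "wb") as handle:
    handle.write(os.urandom(200000))
local = download(pathlib.Path(source).as_uri(), os.path.join(workdir, "local.ent.gz"))
print(os.path.getsize(local))
```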

Obsolete parameters

The long-obsolete parameters to retrieve_pdb_file() have been
removed. Formerly, the function allowed the user to specify compression
and/or a system utility to perform decompression. But all archives are
now gzipped, and PDBList uses Python's gzip module to decompress
archives. These parameters have been obsolete for over a year (they were
marked deprecated with commit
7ebf6e9 <https://github.com/biopython/biopython/commit/7ebf6e9ecb>).
------------------------------
You can merge this Pull Request by running

  git pull https://github.com/DavidCain/biopython fix_pdb_dl

Or view, comment on, or merge it at:

  https://github.com/biopython/biopython/pull/146

Commit Summary

   - Use urlretrieve to smartly download PDB archives
   - Use 'with' statement for safer file I/O
   - Collapse unwieldy if-else structure
   - PEP8 fixes within retrieve_pdb_file
   - Remove deprecated parameters
   - Update with clarifying comments
   - PEP8 fixes, updated comments for file
   - Use urlretrieve in other instance of save to disk

File Changes

   - *M* Bio/PDB/PDBList.py (217)

Patch Links:

   - https://github.com/biopython/biopython/pull/146.patch
   - https://github.com/biopython/biopython/pull/146.diff

From mjldehoon at yahoo.com  Thu Jan 10 04:21:34 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 10 Jan 2013 01:21:34 -0800 (PST)
Subject: [Biopython-dev] Bio._utils iterlen not needed
Message-ID: <1357809694.20781.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Dear all,

As far as I can tell the iterlen function in Bio._utils is not needed.
Simply calling len(items) does exactly what iterlen does, and is much faster too.

For the other functions, are they important enough to warrant a separate module? From our previous experience in Biopython, these kinds of utility modules tend to be underused. This is because the functions are simple and therefore easy to replicate, and often they do not do exactly what is needed in a particular module. Similar utility modules in Biopython in the past were forgotten after a while, and then deprecated and removed.

Best,
-Michiel.

From p.j.a.cock at googlemail.com  Thu Jan 10 08:03:50 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 10 Jan 2013 13:03:50 +0000
Subject: [Biopython-dev] Bio._utils iterlen not needed
In-Reply-To: <1357809694.20781.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <1357809694.20781.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_4yzp-TDJ9y_ATYYoD+6vhkhj7-Xm9dW86N_ELSeQGkrQ@mail.gmail.com>

On Thu, Jan 10, 2013 at 9:21 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> Dear all,
>
> As far as I can tell the iterlen function in Bio._utils is not needed.
> Simply calling len(items) does exactly what iterlen does, and is much faster too.

No, the raison d'être for iterlen is that you can't use len on an iterator, e.g.

>>> len(iter("abcde"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'iterator' has no len()

>>> from Bio._utils import iterlen
>>> iterlen(iter("abcde"))
5

Perhaps the function needs a little more documentation...
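Here is one way to write such a helper. It is a sketch, not the actual Bio._utils code, and count_items is a hypothetical name. It counts by consuming the iterator, so memory use stays constant, unlike len(list(...)). The catch is that the items cannot be iterated over again afterwards.

```python
def count_items(iterator):
    """Count items by consuming an iterator (hypothetical name).

    Uses constant memory, unlike len(list(iterator)), and returns 0
    for an empty iterator.
    """
    return sum(1 for _ in iterator)


letters = iter("abcde")
print(count_items(letters))  # 5
print(count_items(letters))  # 0: the iterator is now exhausted
```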

> For the other functions, are they important enough to warrant
> a separate module? From our previous experience in Biopython,
> these kinds of utility modules tend to be underused. This is
> because the functions are simple and therefore easy to
> replicate, and often they do not do exactly what is needed
> in a particular module. Similar utility modules in Biopython
> in the past were forgotten after a while, and then deprecated
> and removed.

Note that Bio._utils has a leading underscore - these are
therefore a 'private' API which we don't have to worry about
maintaining and deprecating etc in the same way as a public
API. We don't expect end users to use this module ;)

The functions here were originally helper functions used in
Bio.Phylo which are now also used in Bio.SearchIO - I think
a shared private module like this is a good compromise
between code duplication and top level modules.

Peter


From mjldehoon at yahoo.com  Thu Jan 10 12:24:14 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 10 Jan 2013 09:24:14 -0800 (PST)
Subject: [Biopython-dev] Bio._utils iterlen not needed
In-Reply-To: <CAKVJ-_4yzp-TDJ9y_ATYYoD+6vhkhj7-Xm9dW86N_ELSeQGkrQ@mail.gmail.com>
Message-ID: <1357838654.1021.YahooMailClassic@web164001.mail.gq1.yahoo.com>

--- On Thu, 1/10/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > Simply calling len(items) does exactly what iterlen
> > does, and is much faster too.
>
> No, the raison d'être for iterlen is that you can't use len
> on an iterator, e.g.
>
> >>> len(iter("abcde"))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: object of type 'iterator' has no len()
> 
You're right. Actually it depends on the iterator. For example,
len(xrange(100)) works (xrange also returns an iterator). I guess in general an iterator can't have a len() function because it's not clear that the iterator will ever end.

That said, currently the iterlen function is used in only one place, in Bio/Phylo/BaseTree.py as follows:

    def count_terminals(self):
        return _utils.iterlen(self.find_clades(terminal=True))

But here you could simply have

    def count_terminals(self):
        clades = self.find_clades(terminal=True)
        count = 0
        for clade in clades:
            count += 1
        return count

I don't see why we need a function iterlen for this, and if we do have such a function, why it should be in Bio._utils.

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Thu Jan 10 16:16:12 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 10 Jan 2013 21:16:12 +0000
Subject: [Biopython-dev] Bio._utils iterlen not needed
In-Reply-To: <1357838654.1021.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <CAKVJ-_4yzp-TDJ9y_ATYYoD+6vhkhj7-Xm9dW86N_ELSeQGkrQ@mail.gmail.com>
	<1357838654.1021.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_7P2SX0ctkbPf-zLxwTwANjwKPxMtKLgakymTYcoooM-Q@mail.gmail.com>

On Thu, Jan 10, 2013 at 5:24 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> --- On Thu, 1/10/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> > Simply calling len(items) does exactly what iterlen
>> > does, and is much faster too.
>>
>> No, the raison d'être for iterlen is that you can't use len
>> on an iterator, e.g.
>>
>> >>> len(iter("abcde"))
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: object of type 'iterator' has no len()
>
> You're right. Actually it depends on the iterator. For example,
> len(xrange(100)) works (xrange also returns an iterator). I guess
> in general an iterator can't have a len() function because it's not
> clear that the iterator will ever end.

Good point - I didn't know xrange defined __len__, and you are
right in general - other iterator objects could also do that:

https://github.com/biopython/biopython/commit/57ae89cdedbc1e18495ffb615a3a1d2c9feb0296
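Here is a sketch of the len-first idea. It is an assumption about what the linked commit does, not a copy of it, and iter_length is a hypothetical name. One nuance: Python 2's xrange, like Python 3's range, is strictly a lazy sequence rather than an iterator. That is why it supports len(), while the iterator obtained by calling iter() on it does not.

```python
def iter_length(items):
    """Return len(items) if supported, otherwise count by iterating.

    Hypothetical helper: counting consumes a one-shot iterator.
    """
    try:
        return len(items)
    except TypeError:
        return sum(1 for _ in items)


print(iter_length(range(100)))        # range knows its length: 100
print(iter_length(iter(range(100))))  # its iterator does not, so we count: 100
```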

> That said, currently the iterlen function is used in only one place,
> in Bio/Phylo/BaseTree.py as follows:

True. I hadn't checked that - I assumed it was used more
than once. If there are no other natural places where it would
make sense then yes, it might as well be done inline once,
and Bio._utils.iterlen could be removed.

When written, iterlen was in private module Bio.Phylo._sugar
(CC'ing Eric) which Bow moved to Bio._utils as he wanted to
use some of it in SearchIO.

Peter


From eric.talevich at gmail.com  Thu Jan 10 16:50:45 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Thu, 10 Jan 2013 16:50:45 -0500
Subject: [Biopython-dev] Bio._utils iterlen not needed
In-Reply-To: <CAKVJ-_7P2SX0ctkbPf-zLxwTwANjwKPxMtKLgakymTYcoooM-Q@mail.gmail.com>
References: <CAKVJ-_4yzp-TDJ9y_ATYYoD+6vhkhj7-Xm9dW86N_ELSeQGkrQ@mail.gmail.com>
	<1357838654.1021.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAKVJ-_7P2SX0ctkbPf-zLxwTwANjwKPxMtKLgakymTYcoooM-Q@mail.gmail.com>
Message-ID: <CAMC681kYPV=Z74-o2f14guYBPhnyAv7DAuGdrrtt1NLNQUOMxQ@mail.gmail.com>

On Thu, Jan 10, 2013 at 4:16 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Thu, Jan 10, 2013 at 5:24 PM, Michiel de Hoon <mjldehoon at yahoo.com>
> wrote:
> > That said, currently the iterlen function is used in only one place,
> > in Bio/Phylo/BaseTree.py as follows:
>
> True. I hadn't checked that - I assumed it was used more
> than once. If there are no other natural places where it would
> make sense then yes, it might as well be done inline once,
> and Bio._utils.iterlen could be removed.
>
> When written, iterlen was in private module Bio.Phylo._sugar
> (CC'ing Eric) which Bow moved to Bio._utils as he wanted to
> use some of it in SearchIO.
>

That's all true. I created _sugar.py during GSoC 2009 for utility code that
Bio.Phylo needed, but wasn't related to trees in any way -- similar to
Bow's thinking. I probably meant to get rid of the module entirely after
the grand merge (hence the note at the top of _sugar.py to keep the file as
small as possible). IIRC, I made it a separate function while testing
whether "enumerate" or "cnt += 1" would be faster.
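For anyone wanting to repeat that comparison, here is a timeit sketch. A plain generator stands in for find_clades(terminal=True), which is an assumption. Timings depend on the machine and the Python version, so no winner is claimed here. Note that an enumerate-based count needs an explicit starting value, or it fails on empty input.

```python
import timeit


def count_with_counter(items):
    count = 0
    for _ in items:
        count += 1
    return count


def count_with_enumerate(items):
    count = 0  # stays 0 if items is empty
    for count, _ in enumerate(items, 1):
        pass
    return count


def count_with_sum(items):
    return sum(1 for _ in items)


for func in (count_with_counter, count_with_enumerate, count_with_sum):
    # Each call gets a fresh generator, standing in for find_clades(terminal=True).
    seconds = timeit.timeit(lambda: func(x for x in range(10000)), number=200)
    print("%-22s %.3f s" % (func.__name__, seconds))
```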

I have no objections to getting rid of the function now.

-E

From mjldehoon at yahoo.com  Fri Jan 11 07:36:15 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 11 Jan 2013 04:36:15 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
Message-ID: <1357907775.13851.YahooMailClassic@web164001.mail.gq1.yahoo.com>

Hi everybody,

Bio.ParserSupport has had a PendingDeprecationWarning since Biopython 1.59, so we may consider upgrading this to a BiopythonDeprecationWarning in Biopython 1.61 before removing Bio.ParserSupport. The only tricky point is that then we would also have to upgrade the PendingDeprecationWarning in Bio/Blast/NCBIStandalone.py to a BiopythonDeprecationWarning, as that code relies on Bio.ParserSupport. Bio.Blast.NCBIStandalone has had this PendingDeprecationWarning since Biopython release 1.56.

Any objections? This may help give Bow's Bio.SearchIO module some more prominence.
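Here is a sketch of the two deprecation stages, using a local stand-in class. The function warn_module_deprecated and the module names are hypothetical. PendingDeprecationWarning is ignored by Python's default warning filters, which is why a pending deprecation is easy to miss. A class derived directly from Warning, which is how Biopython defines BiopythonDeprecationWarning, is shown by default.

```python
import warnings


class ProjectDeprecationWarning(Warning):
    """Local stand-in for a project-specific deprecation warning.

    Deriving from Warning rather than DeprecationWarning means Python's
    default filters do not hide it.
    """


def warn_module_deprecated(old, new, pending=True):
    """Hypothetical helper a module could call when it is imported."""
    if pending:
        category = PendingDeprecationWarning  # stage 1: hidden by default
    else:
        category = ProjectDeprecationWarning  # stage 2: shown by default
    warnings.warn(
        "%s is deprecated; please use %s instead" % (old, new),
        category,
        stacklevel=2,
    )


# Stage 1 is easy to overlook; stage 2 is shown to users on import.
warn_module_deprecated("example.old_parser", "example.new_parser", pending=True)
warn_module_deprecated("example.old_parser", "example.new_parser", pending=False)
```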

On a related point, the fact that we are deprecating Bio.ParserSupport (which was a painful process) suggests that having a new module Bio._utils with a set of generic utility functions is not a good idea.

Best,
-Michiel.

From p.j.a.cock at googlemail.com  Fri Jan 11 10:33:05 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 11 Jan 2013 15:33:05 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1357907775.13851.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <1357907775.13851.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_41GC2oRy2n8ycmc2m=kPp4_E1eJKHky0NBAUCG1iZe9A@mail.gmail.com>

On Fri, Jan 11, 2013 at 12:36 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
> Bio.ParserSupport has had a PendingDeprecationWarning since Biopython
> 1.59, so we may consider upgrading this to a BiopythonDeprecationWarning in
> Biopython 1.61 before removing Bio.ParserSupport. The only tricky point is
> that then we would also have to upgrade the PendingDeprecationWarning in
> Bio/Blast/NCBIStandalone.py to a BiopythonDeprecationWarning, as that code
> relies on Bio.ParserSupport. Bio.Blast.NCBIStandalone has had this
> PendingDeprecationWarning since Biopython release 1.56.
>
> Any objections? This may help giving Bow's Bio.SearchIO module some more
> prominence.

Bow's SearchIO is using Bio.Blast.NCBIStandalone to handle plain text,
https://github.com/biopython/biopython/blob/master/Bio/SearchIO/BlastIO/blast_text.py

We'd discussed a new parser targeting just the plain text from BLAST+
(and if not too different maybe the final legacy BLAST release), which
should be less diverse than the current range of BLAST quirks built up
over the years.

> On a related point, the fact that we are deprecating Bio.ParserSupport
> (which was a painful process) suggests that having a new module Bio._utils
> with a set of generic utility functions is not a good idea.

That's why Bio._utils is a private module - we can drop/change/etc
this without worrying about breaking other people's code. The issue
with Bio.ParserSupport is it was a public API.

Regards,

Peter

From w.arindrarto at gmail.com  Sun Jan 13 10:22:13 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Sun, 13 Jan 2013 16:22:13 +0100
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CAKVJ-_41GC2oRy2n8ycmc2m=kPp4_E1eJKHky0NBAUCG1iZe9A@mail.gmail.com>
References: <1357907775.13851.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAKVJ-_41GC2oRy2n8ycmc2m=kPp4_E1eJKHky0NBAUCG1iZe9A@mail.gmail.com>
Message-ID: <CADEGkF6+rQtV_XK6Y40UJ9bn+52Ed9ZbuF5N46pUUTjJjq9c1g@mail.gmail.com>

Hi everyone,

>> Bio.ParserSupport has had a PendingDeprecationWarning since Biopython
>> 1.59, so we may consider upgrading this to a BiopythonDeprecationWarning in
>> Biopython 1.61 before removing Bio.ParserSupport. The only tricky point is
>> that then we would also have to upgrade the PendingDeprecationWarning in
>> Bio/Blast/NCBIStandalone.py to a BiopythonDeprecationWarning, as that code
>> relies on Bio.ParserSupport. Bio.Blast.NCBIStandalone has had this
>> PendingDeprecationWarning since Biopython release 1.56.
>>
>> Any objections? This may help giving Bow's Bio.SearchIO module some more
>> prominence.
>
> Bow's SearchIO is using Bio.Blast.NCBIStandalone to handle plain text,
> https://github.com/biopython/biopython/blob/master/Bio/SearchIO/BlastIO/blast_text.py
>
> We'd discussed a new parser targeting just the plain text from BLAST+
> (and if not too different maybe the final legacy BLAST release), which
> should be less diverse that the current range of BLAST quirks built up
> over the years.

Yes. Until such a parser is ready, Bio.ParserSupport is still needed.
We may still deprecate it from the visible / public namespace and move
it into a private module, though. If we are also deprecating
Bio.Blast, then moving Bio.Blast.NCBIStandalone into a private module
as well seems like an OK fix for the time being.

regards,
Bow

From p.j.a.cock at googlemail.com  Tue Jan 15 10:28:07 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 15 Jan 2013 15:28:07 +0000
Subject: [Biopython-dev] buildbot issue on Python 3.1 - stdout?
In-Reply-To: <CADEGkF6Sk0N7-2Ygay7FD_hGP-ZXyhKYkpXdp=qPy9mg++_WxQ@mail.gmail.com>
References: <CAKVJ-_4HJ-Qze2UwFtnU8MkHQc3dBL0t=aJW=wdJ08aOSt8gUA@mail.gmail.com>
	<CAKVJ-_4qi0txaYWXv8axJHf_WJJc7uZiRLdo3MBx_5BtSZrR6w@mail.gmail.com>
	<CADEGkF41Fu8BzuBh_3DfRSF5SS6C8UecU7F-TXTgnd-Md44Kcw@mail.gmail.com>
	<CAKVJ-_5SjfRFiiKSatU9ds8b5ESdUTexa3TP=k+W=TPmtHoTfA@mail.gmail.com>
	<CADEGkF6Sk0N7-2Ygay7FD_hGP-ZXyhKYkpXdp=qPy9mg++_WxQ@mail.gmail.com>
Message-ID: <CAKVJ-_4J1np7P7DYfSiWGKFYNzztLOveGFGwo6QuhtpjQpovKg@mail.gmail.com>

On Fri, Dec 14, 2012 at 12:48 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi everyone,
>
>>> It's reproducible in my machine: Arch Linux 64 bit running
>>> Python3.1.5. Haven't figured out a fix yet, but trying to see if I
>>> can.
>>
>> Great. We haven't really proved this is down to a change in
>> either Python 3.1.4 or 3.1.5 but it does look likely.
>
> It's reproduced in my local 3.1.4 installation. Seems like an unfixed
> bug that went through to 3.1.5.

Regarding this issue with test_Emboss.py,
AttributeError: '_io.FileIO' object has no attribute 'read1'
http://lists.open-bio.org/pipermail/biopython-dev/2012-December/010156.html

I've now tried downgrading Python 3.1 on this machine, and it does
seem to be a problem under Python 3.1.4 and 3.1.5 but not 3.1.3.
For now I have simply left this buildslave running 3.1.3 instead. I
will also downgrade Python 3.1 on the second 64 bit Linux server.

That should take care of the annoying buildbot failures (and the
daily email I've been getting). This thread may help someone else
with a similar issue, but I don't feel inclined to try and explore in
any more depth what exactly is going wrong under Python 3.1.4
and 3.1.5, and if there is a Python bug we should report.

Regards,

Peter

From kai.blin at biotech.uni-tuebingen.de  Tue Jan 15 10:54:45 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Tue, 15 Jan 2013 16:54:45 +0100
Subject: [Biopython-dev] More 'fun' with GenBank
Message-ID: <50F57BC5.7020607@biotech.uni-tuebingen.de>

Hi folks,

as people are hitting my web service with all sorts of wonky GenBank
files, I've stumbled over another one that throws the GenBank parser off
track.

The culprit is a SeqFeature with a location line like:

     CDS             join(complement(4093..4338),complement(3876..4011),
                     complement(3655..3809),complement(3284..3585),
                     complement(2421..2813),complement(2057..2303))

Now, the way I read the GenBank spec, this is not a valid location line,
but should instead be a complement() of a join(). Unfortunately, the NCBI
seems to disagree with its own specs, and put the record into their
Nucleotide database as CABT02000004, which means that for all practical
purposes, it _is_ a valid GenBank file and the parser should cope.

The parser looks at this location and creates a feature on the -1
strand, from 4092:2303. This is caused by the feature location
calculation on
https://github.com/biopython/biopython/blob/master/Bio/GenBank/__init__.py#L1049
and the lines after.

In short, we do
            s = cur_feature.sub_features[0].location.start
            e = cur_feature.sub_features[-1].location.end
            cur_feature.location = SeqFeature.FeatureLocation(s, e, strand)

And when the join() looks like the record I'm dealing with, this is
clearly the wrong way around.

I decided to fix this by sorting the subfeatures by start,end
coordinates, and that fixes this issue for me.
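
The difference between the two calculations can be reproduced without Biopython. Part and overall_span below are hypothetical stand-ins for the sub-feature locations and the code quoted above, using the CABT02000004 coordinates converted from 1-based a..b to 0-based (a - 1, b):

```python
from collections import namedtuple

# Hypothetical stand-in for a sub-feature location (0-based start, end-exclusive)
Part = namedtuple("Part", "start end")


def overall_span(parts, sort_parts=True):
    """Overall (start, end) of a join, taken from the first and last parts."""
    if sort_parts:
        parts = sorted(parts, key=lambda p: (p.start, p.end))
    return parts[0].start, parts[-1].end


# join(complement(4093..4338),...,complement(2057..2303)) from CABT02000004
cabt = [Part(4092, 4338), Part(3875, 4011), Part(3654, 3809),
        Part(3283, 3585), Part(2420, 2813), Part(2056, 2303)]

print(overall_span(cabt, sort_parts=False))  # (4092, 2303): the inverted span
print(overall_span(cabt))                    # (2056, 4338)
```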

Unfortunately, this also breaks an existing test, the extra_keywords.gb
test.
https://github.com/biopython/biopython/blob/master/Tests/GenBank/extra_keywords.gb#L647
has a feature that has a location of

     CDS             join(153490..154269,AL121804.2:41..610,
                     AL121804.2:672..1487)

Here, we probably do want the feature from 153489:1487, even though I'm
not sure how useful such a location really is.

So I decided to fix this by sorting the subfeatures first on their ref,
and then on start, end.

This again breaks a test, this time in one_of.gb
https://github.com/biopython/biopython/blob/master/Tests/GenBank/one_of.gb#L39
where the location line is

     CDS             join(2201..2479,U18267.1:120..246,U18268.1:130..288,
                     U18270.1:4691..4788,U18269.1:82..>128)

Here, the U18270.1 record seems to come before the U18269.1 record.

Now, we're again spanning a feature across multiple contigs, none of which
are accessible to the extract() function as far as I'm aware.
Sorting the locations by start, end (and maybe ref first) at least fixes
the case that CABT02000004 breaks on, where we do have a chance of getting
extract() to work.
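
Sorting on the reference first can be sketched the same way. The key function ref_start_end is hypothetical; it treats a missing ref as an empty string so that the local sequence sorts first (an assumption that also avoids comparing None with str on Python 3), and it takes the fuzzy >128 end as 128.

```python
from collections import namedtuple

# Hypothetical sub-location: 0-based start/end plus an optional other-record ref
Part = namedtuple("Part", "start end ref")


def ref_start_end(part):
    # None (this record) sorts before any accession in this sketch
    return (part.ref or "", part.start, part.end)


# extra_keywords.gb: join(153490..154269,AL121804.2:41..610,AL121804.2:672..1487)
extra = [Part(153489, 154269, None), Part(40, 610, "AL121804.2"),
         Part(671, 1487, "AL121804.2")]
# one_of.gb: join(2201..2479,U18267.1:120..246,U18268.1:130..288,
#                 U18270.1:4691..4788,U18269.1:82..>128)
one_of = [Part(2200, 2479, None), Part(119, 246, "U18267.1"),
          Part(129, 288, "U18268.1"), Part(4690, 4788, "U18270.1"),
          Part(81, 128, "U18269.1")]

ordered = sorted(extra, key=ref_start_end)
print(ordered[0].start, ordered[-1].end)  # 153489 1487
# Sorting swaps U18270.1 and U18269.1, which is what breaks the one_of.gb test
print([p.ref for p in sorted(one_of, key=ref_start_end)])
```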

The attached patch is my proposed change, but I wanted to get some
feedback first before opening a bug and/or submitting a pull request.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-GenBank-Sort-subfeatures-by-ref-and-start-end-positi.patch
Type: text/x-patch
Size: 9059 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20130115/f7c0bb7d/attachment.bin>

From p.j.a.cock at googlemail.com  Tue Jan 15 11:41:32 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 15 Jan 2013 16:41:32 +0000
Subject: [Biopython-dev] More 'fun' with GenBank
In-Reply-To: <50F57BC5.7020607@biotech.uni-tuebingen.de>
References: <50F57BC5.7020607@biotech.uni-tuebingen.de>
Message-ID: <CAKVJ-_6HpOr+ph9o8Dygu6O0LM=6wHV8wD1tHsZA3CrO32B37Q@mail.gmail.com>

On Tue, Jan 15, 2013 at 3:54 PM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
> Hi folks,
>
> as people are hitting my web service with all sorts of wonky GenBank
> files, I've stumbled over another one that throws the GenBank parser off
> track.
>
> The culprit is a SeqFeature with a location line like:
>
>      CDS             join(complement(4093..4338),complement(3876..4011),
>                      complement(3655..3809),complement(3284..3585),
>                      complement(2421..2813),complement(2057..2303))
>
> Now, the way I read the GenBank spec, this is not a valid location line,
> but should instead be a complement() of joins(). Unfortunately, the NCBI
> seems to disagree with its own specs, and put the record into their
> Nucleotide database as CABT02000004, which means that by all practical
> purposes, it _is_ a valid GenBank file and the parser should cope.

That should work - for a while GenBank and EMBL didn't agree about
joins on the complement strand, one did complement(join(a..b,c..d))
and the other join(complement(c..d),complement(a..b)), notice the
order of the sub-regions flips.
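
Why the order flips can be checked in plain Python on a made-up toy sequence (the helper names are illustrative only): reverse-complementing a join gives the same result as joining the reverse-complemented parts in reverse order.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s):
    return s.translate(_COMPLEMENT)[::-1]


def extract(seq, spans):
    """Concatenate 1-based inclusive a..b spans, as in a join()."""
    return "".join(seq[a - 1:b] for a, b in spans)


seq = "AACCGGTTAGCT"  # made-up toy sequence
a_b, c_d = (1, 4), (7, 10)

# complement(join(a..b,c..d))
nested = reverse_complement(extract(seq, [a_b, c_d]))
# join(complement(c..d),complement(a..b)): note the swapped order
flipped = "".join(reverse_complement(extract(seq, [span])) for span in [c_d, a_b])

print(nested, flipped, nested == flipped)  # CTAAGGTT CTAAGGTT True
```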

> The parser looks at this location and creates a feature on the -1
> strand, from 4092:2303. This is caused by by the feature location
> calculation on
> https://github.com/biopython/biopython/blob/master/Bio/GenBank/__init__.py#L1049
> and the lines after.
>
> In short, we do
>             s = cur_feature.sub_features[0].location.start
>             e = cur_feature.sub_features[-1].location.end
>             cur_feature.location = SeqFeature.FeatureLocation(s, e, strand)

For join feature locations, the sub-feature locations should be fine
but the overall feature location is a bit weird/broken for negative
and mixed strands.

This was one of the things the re-factoring on this branch aimed to
fix, https://github.com/peterjc/biopython/tree/f_loc4/
http://lists.open-bio.org/pipermail/biopython-dev/2012-July/009803.html

I was intending to bring this up again after the next release (which
could be later this month or February 2013), but perhaps it would
be worth doing now?

Peter

From arklenna at gmail.com  Tue Jan 15 12:19:48 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Tue, 15 Jan 2013 12:19:48 -0500
Subject: [Biopython-dev] More 'fun' with GenBank
In-Reply-To: <CAKVJ-_6HpOr+ph9o8Dygu6O0LM=6wHV8wD1tHsZA3CrO32B37Q@mail.gmail.com>
References: <50F57BC5.7020607@biotech.uni-tuebingen.de>
	<CAKVJ-_6HpOr+ph9o8Dygu6O0LM=6wHV8wD1tHsZA3CrO32B37Q@mail.gmail.com>
Message-ID: <CALfq9tK0ZA3wRZCPQ4DHyOd8+n2raAFb6z3Zf-nkZrUBLAy+8Q@mail.gmail.com>

+1 for f_loc4. The FeatureLocation/CompoundLocation classes will hopefully
make handling joins and other GenBank operators a little more logical. Not
to mention my CoordinateMapper is based on this branch!

Lenna


On Tue, Jan 15, 2013 at 11:41 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Tue, Jan 15, 2013 at 3:54 PM, Kai Blin
> <kai.blin at biotech.uni-tuebingen.de> wrote:
> > Hi folks,
> >
> > as people are hitting my web service with all sorts of wonky GenBank
> > files, I've stumbled over another one that throws the GenBank parser off
> > track.
> >
> > The culprit is a SeqFeature with a location line like:
> >
> >      CDS             join(complement(4093..4338),complement(3876..4011),
> >                      complement(3655..3809),complement(3284..3585),
> >                      complement(2421..2813),complement(2057..2303))
> >
> > Now, the way I read the GenBank spec, this is not a valid location line,
> > but should instead be a complement() of joins(). Unfortunately, the NCBI
> > seems to disagree with its own specs, and put the record into their
> > Nucleotide database as CABT02000004, which means that by all practical
> > purposes, it _is_ a valid GenBank file and the parser should cope.
>
> That should work - for a while GenBank and EMBL didn't agree about
> joins on the complement strand, one did complement(join(a..b,c..d))
> and the other join(complement(c..d),complement(a..b)), notice the
> order of the sub-regions flips.
>
> > The parser looks at this location and creates a feature on the -1
> > strand, from 4092:2303. This is caused by by the feature location
> > calculation on
> > https://github.com/biopython/biopython/blob/master/Bio/GenBank/__init__.py#L1049
> > and the lines after.
> >
> > In short, we do
> >             s = cur_feature.sub_features[0].location.start
> >             e = cur_feature.sub_features[-1].location.end
> >             cur_feature.location = SeqFeature.FeatureLocation(s, e, strand)
>
> For join feature locations, the sub-feature locations should be fine
> but the overall feature location is a bit weird/broken for negative
> and mixed strands.
>
> This was one of the things the re-factoring on this branch aimed to
> fix, https://github.com/peterjc/biopython/tree/f_loc4/
> http://lists.open-bio.org/pipermail/biopython-dev/2012-July/009803.html
>
> I was intending to bring this up again after the next release (which
> could be later this month or February 2012), but perhaps it would
> be worth doing now?
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>

From p.j.a.cock at googlemail.com  Tue Jan 15 14:03:51 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 15 Jan 2013 19:03:51 +0000
Subject: [Biopython-dev] More 'fun' with GenBank
In-Reply-To: <CALfq9tK0ZA3wRZCPQ4DHyOd8+n2raAFb6z3Zf-nkZrUBLAy+8Q@mail.gmail.com>
References: <50F57BC5.7020607@biotech.uni-tuebingen.de>
	<CAKVJ-_6HpOr+ph9o8Dygu6O0LM=6wHV8wD1tHsZA3CrO32B37Q@mail.gmail.com>
	<CALfq9tK0ZA3wRZCPQ4DHyOd8+n2raAFb6z3Zf-nkZrUBLAy+8Q@mail.gmail.com>
Message-ID: <CAKVJ-_77kdYzXv_Q_KVqy9jWSNJSgU+PWdVB-DzxdF8TKwUAGg@mail.gmail.com>

On Tue, Jan 15, 2013 at 5:19 PM, Lenna Peterson <arklenna at gmail.com> wrote:
> +1 for f_loc4. The FeatureLocation/CompoundLocation classes will hopefully
> make handling joins and other GenBank operators a little more logical. Not
> to mention my CoordinateMapper is based on this branch!
>
> Lenna

It will need a bit of work to rebase (some of the PEP8 changes have
touched the same lines of code), but I will try and do that this week.

Peter

From antony.lee at berkeley.edu  Tue Jan 15 16:45:19 2013
From: antony.lee at berkeley.edu (Antony Lee)
Date: Tue, 15 Jan 2013 13:45:19 -0800
Subject: [Biopython-dev] Circular sequences
Message-ID: <20130115214519.GC8511@gmail.com>

Hi all,

While working on a (more sane?) rewrite of the Restriction library
(https://github.com/biopython/biopython/pull/148), I found the need
to add a circular/linear attribute to sequence objects (just as the
currently existing Restriction library does).  So I quickly added such
a class, independently of whatever Biopython currently provides.  But
it seems like the module would be better integrated in the rest of
Biopython if it used Bio.Seq.Seq instead.

I saw that CircularSeqs have already been discussed on the mailing
list, and the main issue was with indexing and slicing.  So here are my
thoughts about how such an object should behave.  Assume a circular seq
s of length 10.  Simple indexing works modulo 10 (and negative indices
work identically).  Methods that return one or more indices return the
indices modulo 10.  Slicing with both ends defined (i.e. s[x:y(:z)])
wraps as many times as needed around the sequence if y >= x, and makes at
most one complete cycle if y < x (i.e. add len(s) as many times as
needed to y to make it bigger than x, and stop there).  Slicing with one
or both ends undefined (i.e. s[:], s[x:], s[:y]) raises an IndexError
(because, well, I read s[x:] as "return the elements of s starting from
the x'th until the end"... but there is no such end.).  (A second option
would be to return an infinite iterable for s[x:], but that doesn't take
care of s[:y] anyways, not to mention the bugs that may appear from
that.)
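
A minimal sketch of these indexing and slicing rules on a hypothetical string-backed CircularSeq (not Bio.Seq.Seq). It returns plain strings to stay self-contained and only handles positive steps.

```python
class CircularSeq:
    """Hypothetical sketch of the proposed circular indexing rules."""

    def __init__(self, data):
        if not data:
            raise ValueError("a circular sequence needs at least one letter")
        self._data = str(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        n = len(self._data)
        if not isinstance(index, slice):
            # s[i] works modulo len(s); Python's % maps negatives into range
            return self._data[index % n]
        start, stop, step = index.start, index.stop, index.step
        if start is None or stop is None:
            # there is no "end" to run to on a circle
            raise IndexError("open-ended slices are undefined for circular sequences")
        step = 1 if step is None else step
        if step <= 0:
            raise ValueError("this sketch only handles positive steps")
        if stop < start:
            # add len(s) to stop just enough times to exceed start (at most one turn)
            stop += n * ((start - stop) // n + 1)
        # when stop >= start, range() keeps wrapping as many times as needed
        return "".join(self._data[i % n] for i in range(start, stop, step))


s = CircularSeq("0123456789")
print(s[12], s[-1])  # 2 9
print(s[8:12])       # 8901
print(s[8:2])        # 8901 (y < x: stop becomes 12)
print(s[5:25])       # 56789012345678901234 (wraps twice)
```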

A few other issues were addressed in the previous thread.  I think that
adding CircularSeqs does not make sense at all (so __add__ raises a
ValueError), and translation can either check for the presence of a stop
codon and raise ValueError otherwise, or return an infinite iterator.

Another thing that may be useful for a restriction analysis library is a
good way to represent a dsDNA sequence with some overhangs.  Any
thoughts?

Antony

From kai.blin at biotech.uni-tuebingen.de  Wed Jan 16 03:28:06 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Wed, 16 Jan 2013 09:28:06 +0100
Subject: [Biopython-dev] More 'fun' with GenBank
In-Reply-To: <CAKVJ-_77kdYzXv_Q_KVqy9jWSNJSgU+PWdVB-DzxdF8TKwUAGg@mail.gmail.com>
References: <50F57BC5.7020607@biotech.uni-tuebingen.de>
	<CAKVJ-_6HpOr+ph9o8Dygu6O0LM=6wHV8wD1tHsZA3CrO32B37Q@mail.gmail.com>
	<CALfq9tK0ZA3wRZCPQ4DHyOd8+n2raAFb6z3Zf-nkZrUBLAy+8Q@mail.gmail.com>
	<CAKVJ-_77kdYzXv_Q_KVqy9jWSNJSgU+PWdVB-DzxdF8TKwUAGg@mail.gmail.com>
Message-ID: <50F66496.8000109@biotech.uni-tuebingen.de>

On 2013-01-15 20:03, Peter Cock wrote:

Hi Peter,

> It will need a bit of work to rebase (some of the PEP8 changes have
> touched the same lines of code), but I will try and do that this week.

Your f_loc4 branch certainly fixes the problem I'm seeing. Is there
anything I can do to help with getting it merged? I'm happy to give a
closer look at the rebase conflicts coming up during the merge if you
don't mind me asking the occasional question if I can't work out reasons
for a code change from the commit messages.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From Markus.Piotrowski at ruhr-uni-bochum.de  Wed Jan 16 04:42:54 2013
From: Markus.Piotrowski at ruhr-uni-bochum.de (Markus Piotrowski)
Date: 16 Jan 2013 10:42:54 +0100
Subject: [Biopython-dev] Circular sequences
In-Reply-To: <20130115214519.GC8511@gmail.com>
References: <20130115214519.GC8511@gmail.com>
Message-ID: <50F6761E.9000606@ruhr-uni-bochum.de>

Am 15.01.2013 22:45, schrieb Antony Lee:
> needed to y to make it bigger than x, and stop there).  Slicing with one
> or both ends undefined (ie. s[:], s[x:], s[:y]) raises an IndexError
> (because, well, I read s[x:] as "return the elements of s starting from
> the x'th until the end"... but there is no such end.).  (A second option
> would be to return an infinite iterable for s[x:], but that doesn't take
> care of s[:y] anyways, not to mention the bugs that may appear from
> that.)

Another possibility, which makes some biological sense (thinking of
restriction), would be that s[x:] (or s[:y]) returns a linear sequence
starting at x and ending with x-1 (or ending with y and starting at y+1).
Thus, s[x:] would mean 'cut my circle at x and return the linear sequence
starting at x'.

Markus


From p.j.a.cock at googlemail.com  Wed Jan 16 05:24:13 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 16 Jan 2013 10:24:13 +0000
Subject: [Biopython-dev] Circular sequences
In-Reply-To: <50F6761E.9000606@ruhr-uni-bochum.de>
References: <20130115214519.GC8511@gmail.com>
	<50F6761E.9000606@ruhr-uni-bochum.de>
Message-ID: <CAKVJ-_5U+fYJY+aaQ+L5SaQTwQ+WT-O8_5O4zpLJv+FSJ00u7w@mail.gmail.com>

For those that missed it last time, I think the most recent in depth
discussion about circular sequences and slicing was here:

http://lists.open-bio.org/pipermail/biopython/2011-March/007075.html
...
http://lists.open-bio.org/pipermail/biopython/2011-March/007085.html

On Wed, Jan 16, 2013 at 9:42 AM, Markus Piotrowski
<Markus.Piotrowski at ruhr-uni-bochum.de> wrote:
> Am 15.01.2013 22:45, schrieb Antony Lee:
>
>> needed to y to make it bigger than x, and stop there).  Slicing with one
>> or both ends undefined (ie. s[:], s[x:], s[:y]) raises an IndexError
>> (because, well, I read s[x:] as "return the elements of s starting from
>> the x'th until the end"... but there is no such end.).  (A second option
>> would be to return an infinite iterable for s[x:], but that doesn't take
>> care of s[:y] anyways, not to mention the bugs that may appear from
>> that.)
>
>
> Another possibility, which makes some biological sense (thinking on
> restriction), would be that
> s[x:] (or s[:y]) returns a linear sequence starting at x and ending with x-1
> (or ending with y and starting at y+1). Thus, s[x:] would mean 'cut my
> circle at x and return the linear sequence starting at x'.

That's exactly the kind of behaviour which would make me nervous
given in general the Biopython sequence objects mimic Python strings.
There are many examples where that 'extra' sequence would be
unexpected. For instance, writing out line wrapped sequence data.

I would prefer an explicit method like 'cut' on a circular sequence
object returning a full length linear sequence. Similarly a 'roll' or
'rotate' method could shift the origin to a new coordinate.
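
A minimal sketch of these explicit methods on a hypothetical string-backed class. The names cut and roll come from the message above; the exact semantics (cut(x) opens the circle just before position x, which is what Markus proposed for s[x:]) are assumptions.

```python
class CircularDNA:
    """Hypothetical circular sequence with explicit cut/roll methods."""

    def __init__(self, data):
        if not data:
            raise ValueError("a circular sequence needs at least one letter")
        self.data = str(data)

    def __len__(self):
        return len(self.data)

    def roll(self, k):
        """Return a new circular sequence whose origin is moved to position k."""
        k %= len(self.data)  # any integer shift, including negatives, is valid
        return CircularDNA(self.data[k:] + self.data[:k])

    def cut(self, x):
        """Open the circle just before position x; return the full-length linear string."""
        return self.roll(x).data


plasmid = CircularDNA("ACGTTGCA")
print(plasmid.cut(3))         # TTGCAACG
print(plasmid.roll(-1).data)  # AACGTTGC
```

Because both methods return new objects, ordinary slicing on the result keeps plain string semantics.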

One simple solution to the complexities of the slice behaviour is
the practical one: They act like Python strings, basically all we
would be adding would be an 'is circular' flag and some logic about
how to propagate that flag in operations like addition and slicing.
If we went that route it might still be possible to make the find and
'in' functionality origin aware... but that may just cause trouble.

This would solve where to store if a sequence is circular (e.g. when
reading GenBank and EMBL files - or for handling restriction
enzyme digests), but other than that not add much utility.
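
A minimal sketch of this "string plus flag" approach. FlaggedSeq is a hypothetical name, and the two propagation rules it encodes are assumptions, not settled behaviour: any slice is linear, and adding a circular sequence raises ValueError (as Antony suggested).

```python
class FlaggedSeq:
    """Hypothetical string-like sequence carrying an is-circular flag."""

    def __init__(self, data, circular=False):
        self.data = str(data)
        self.circular = circular

    def __len__(self):
        return len(self.data)

    def __str__(self):
        return self.data

    def __getitem__(self, index):
        if isinstance(index, slice):
            # ordinary Python string slicing; the result is always linear
            return FlaggedSeq(self.data[index], circular=False)
        return self.data[index]

    def __add__(self, other):
        # refuse to join anything onto (or into) a circular sequence
        if self.circular or getattr(other, "circular", False):
            raise ValueError("cannot add a circular sequence")
        return FlaggedSeq(self.data + str(other))


ring = FlaggedSeq("ACGTAC", circular=True)
piece = ring[1:4]
print(str(piece), piece.circular)     # CGT False
print(str(piece + FlaggedSeq("AA")))  # CGTAA
```

Whether s[:] on a circular sequence should keep the flag is a judgment call; this sketch drops it.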

Thoughts?

Peter

From antony.lee at berkeley.edu  Wed Jan 16 14:09:32 2013
From: antony.lee at berkeley.edu (Antony Lee)
Date: Wed, 16 Jan 2013 11:09:32 -0800
Subject: [Biopython-dev] Circular sequences
In-Reply-To: <CAKVJ-_5U+fYJY+aaQ+L5SaQTwQ+WT-O8_5O4zpLJv+FSJ00u7w@mail.gmail.com>
References: <20130115214519.GC8511@gmail.com>
	<50F6761E.9000606@ruhr-uni-bochum.de>
	<CAKVJ-_5U+fYJY+aaQ+L5SaQTwQ+WT-O8_5O4zpLJv+FSJ00u7w@mail.gmail.com>
Message-ID: <20130116190932.GA1962@gmail.com>

I think the proposed behaviour makes biological sense (now s[x:] and
s[:y] mean "cut the sequence before x (or before y) and keep the
downstream (or upstream) sequence, whatever it is").  But I understand
Peter's concerns as well.  A quick grep showed me around 400 instances
of "[:" showing up in the current code base, and as many ":]", and most
of them seem to be related to string (as opposed to sequence) processing
so checking these may not be impossible (though not very fun of course),
but this won't protect against future mis-uses of sequence indexing.

So I think methods such as cut and roll are fine too (and go back to
raising IndexError when either or both ends of the slice are None).  Now
it would be the responsibility of sequence-consuming functions to start
by .cut()ting the sequence before slicing it.

find and __contains__ can be implemented easily (though perhaps
inelegantly) by changing "foo in circular(bar)" into "foo in linear(bar)
+ linear(bar)[:len(foo)-1]" (which is essentially what is done in both
Restriction libraries, the old and the new one).
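
The same trick can be sketched as plain functions (the names are hypothetical). Because the extended string adds only len(foo) - 1 letters, any match it finds must start before len(bar), so the returned index is already a valid position on the circle.

```python
def circular_find(bar, foo):
    """Start of foo in circular bar, or -1 (matches may span the origin)."""
    if not foo:
        return 0
    if len(foo) > len(bar):
        return -1  # sketch: ignore matches longer than one full turn
    # linear(bar) + linear(bar)[:len(foo)-1], as described above
    extended = bar + bar[:len(foo) - 1]
    return extended.find(foo)


def circular_contains(bar, foo):
    return circular_find(bar, foo) != -1


print(circular_find("AACGTT", "TTAA"))  # 4, spans the origin
print("TTAA" in "AACGTT")               # False for the linear string
```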

Finally let me say that right now I don't use most of the rest
of Biopython (and don't really think I'll use most of it in the near
future) so I care little about whether this specific feature gets
integrated or not; however I do think it is needed in a proper
restriction analysis library.  Indeed, one could say that we just have
to add a "circular=True|False" keyword argument to methods such as
search and catalyze, but that is not enough to distinguish e.g. if a
circular plasmid is digested once or not at all (of course, one can
check separately but what I mean there is that circularity is a natural
"output" of the functions, not just input).
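
To illustrate circularity as an output, here is a minimal top-strand-only sketch. digest_circular is hypothetical and is not the Bio.Restriction API. Zero cuts leave one circular molecule, one cut gives a single full-length linear fragment, and k cuts give k linear fragments.

```python
def digest_circular(seq, site, cut_offset):
    """Cut a circular sequence at every occurrence of site.

    Returns a list of (fragment, is_circular) pairs.  cut_offset is where
    the enzyme cuts, counted from the first letter of the site.
    """
    n = len(seq)
    # search a wrapped copy so sites spanning the origin are found
    wrapped = seq + seq[:len(site) - 1]
    cuts = sorted({(i + cut_offset) % n
                   for i in range(n) if wrapped.startswith(site, i)})
    if not cuts:
        # uncut: the plasmid is still one circular molecule
        return [(seq, True)]
    doubled = seq + seq
    ends = cuts[1:] + [cuts[0] + n]
    return [(doubled[a:b], False) for a, b in zip(cuts, ends)]


# EcoRI-like site G^AATTC, cut after the first letter
print(digest_circular("AAAAAA", "GAATTC", 1))      # [('AAAAAA', True)]
print(digest_circular("ATTCAAAAGA", "GAATTC", 1))  # [('AATTCAAAAG', False)]
```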

Antony

On Wed, Jan 16, 2013 at 10:24:13AM +0000, Peter Cock wrote:
> For those that missed it last time, I think the most recent in depth
> discussion about circular sequences and slicing was here:
> 
> http://lists.open-bio.org/pipermail/biopython/2011-March/007075.html
> ...
> http://lists.open-bio.org/pipermail/biopython/2011-March/007085.html
> 
> On Wed, Jan 16, 2013 at 9:42 AM, Markus Piotrowski
> <Markus.Piotrowski at ruhr-uni-bochum.de> wrote:
> > Am 15.01.2013 22:45, schrieb Antony Lee:
> >
> >> needed to y to make it bigger than x, and stop there).  Slicing with one
> >> or both ends undefined (ie. s[:], s[x:], s[:y]) raises an IndexError
> >> (because, well, I read s[x:] as "return the elements of s starting from
> >> the x'th until the end"... but there is no such end.).  (A second option
> >> would be to return an infinite iterable for s[x:], but that doesn't take
> >> care of s[:y] anyways, not to mention the bugs that may appear from
> >> that.)
> >
> >
> > Another possibility, which makes some biological sense (thinking on
> > restriction), would be that
> > s[x:] (or s[:y]) returns a linear sequence starting at x and ending with x-1
> > (or ending with y and starting at y+1). Thus, s[x:] would mean 'cut my
> > circle at x and return the linear sequence starting at x'.
> 
> That's exactly the kind of behaviour which would make me nervous
> given in general the Biopython sequence objects mimic Python strings.
> There are many examples where that 'extra' sequence would be
> unexpected. For instance, writing out line wrapped sequence data.
> 
> I would prefer an explicit method like 'cut' on a circular sequence
> object returning a full length linear sequence. Similarly a 'roll' or
> 'rotate' method could shift the origin to a new coordinate.
> 
> One simple solution to the complexities of the slice behaviour is
> the practical one: They act like Python strings, basically all we
> would be adding would an 'is circular' flag and some logic about
> how to propagate that flag in operations like addition and slicing.
> If we went that route it might still be possible to make the find and
> 'in' functionality origin aware... but that may just cause trouble.
> 
> This would solve where to store if a sequence is circular (e.g. when
> reading GenBank and EMBL files - or for handling restriction
> enzyme digests), but other than that not add much utility.
> 
> Thoughts?
> 
> Peter

From redmine at redmine.open-bio.org  Fri Jan 18 04:43:26 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Fri, 18 Jan 2013 09:43:26 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15065.20130118094326@redmine.open-bio.org>


Issue #3395 has been updated by Michiel de Hoon.


Michał, can you confirm that the fixed Bio.trie works for you? Then we can close this bug report.

----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


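Imagine I have a Biopython trie:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'w')
tr = trie.trie()
# fill in the trie
trie.save(f, tr)  # pass the trie object tr, not the trie module
f.close()  # finish the gzip stream before reading it back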

Now /tmp/trie.dat.gz is about 50MB. Let's try to read it:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'r')
tr = trie.load(f)

Unfortunately I'm getting a meaningless error saying:
"loading failed for some reason"

Any hints?


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Fri Jan 18 10:17:43 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Fri, 18 Jan 2013 15:17:43 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15066.20130118151743@redmine.open-bio.org>


Issue #3395 has been updated by Michał Nowotka.


Can you just give me two more weeks? I need some time to evaluate it.


From eric.talevich at gmail.com  Fri Jan 18 20:20:11 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Fri, 18 Jan 2013 20:20:11 -0500
Subject: [Biopython-dev] Support for NeXML and RDF trees in Bio.Phylo
In-Reply-To: <CAAzEd5Bz5xvc2Bz80Ru+FbUbJK-WnAjfvLv70SfkPZup89NGRQ@mail.gmail.com>
References: <CAAzEd5AvRgkr=UYmqwHPH+cBYXCS+5yLHs=bHjCDxN1rY_aGFg@mail.gmail.com>
	<CAMC681=OrHJmfEbxWz=8-qzo2rEVJaqFeqgihiAMVi6No7GBCw@mail.gmail.com>
	<CAAzEd5Bz5xvc2Bz80Ru+FbUbJK-WnAjfvLv70SfkPZup89NGRQ@mail.gmail.com>
Message-ID: <CAMC681ndsaK0J0iR==O7djsG1KQxdbp6TWd7sgGDVySP2OHuSA@mail.gmail.com>

On Fri, Dec 28, 2012 at 10:50 AM, Ben Morris <ben at bendmorris.com> wrote:

> On Tue, Dec 25, 2012 at 2:18 AM, Eric Talevich <eric.talevich at gmail.com>
> wrote:
> >
> > On Mon, Dec 24, 2012 at 8:58 AM, Ben Morris <ben at bendmorris.com> wrote:
> >>
> >> Hi all,
> >>
> >> I've implemented support for two new phylogenetic tree formats: NeXML and
> >> RDF (conforming to the Comparative Data Analysis Ontology).
> >>
> >> I noticed that NeXML support was planned, but I didn't see anyone working
> >> on it on GitHub and the feature request hadn't been updated in about a
> >> year, so I went ahead and implemented a simple version. At first I tried
> >> the generateDS.py approach, but the generated writer doesn't give very much
> >> control over the output, so I ended up writing my own parser/writer using
> >> ElementTree.
> >>
> >> As for the RDF/CDAO format, AFAIK this is not a format that's supported
> by
> >> any other phylogenetic libraries, so I'm not sure how useful this is to
> >> everyone else. It provides a simple, standards-compliant format that
> can be
> >> imported to a triple store and supports annotation. We'll be using it at
> >> NESCent so I wanted to make it available to everyone else as well. The
> >> parser and writer require the Redlands Python bindings.
> >>
> >> The code is available in my fork of Biopython,
> >>
> >>     https://github.com/bendmorris/biopython
> >>
> >> under branches "cdao" and "nexml." I'd love to get everyone's thoughts
> and
> >> see if these contributions would be a good fit for the Biopython
> project.
> >
> >
> >
> > Thanks for letting us know! I'll try it out soonish. Looking at the code
> on your nexml branch, I have a few comments:
> >
> > - The parser uses ElementTree.parse rather than iterparse, so in its
> current state it would not be able to parse massive files (those larger
> than available RAM). Worth fixing eventually?
>
> Great point. I rewrote it to use iterparse instead.
>
> > - The parser creates Newick.Tree and Newick.Clade objects, which is
> nearly correct in my opinion. I would suggest subclassing BaseTree.Tree and
> BaseTree.Clade to create NeXML-specific Tree and Clade classes, even if you
> don't have any additional attributes to attach to those classes at the
> moment. (These would go in a new file NeXML.py, similar to PhyloXML.py and
> PhyloXMLIO.py.)
>
> Went ahead and did this as well.
>

Thanks! Sorry for the slow pace; I'm in the midst of a dissertation.


> > - The 'confidence' or 'confidences' attribute isn't used (e.g. for
> > bootstrap support values). Does NeXML define it?
>
> Not that I'm aware of, but I'm not sure. I searched
> http://nexml.org/nexml/html/doc/schema-1/ and didn't find anything.
> I'm going to ask some people who know more about this than I do.
>

I would like for Bio.Phylo's I/O modules to be able to successfully
round-trip a file from Newick to phyloXML to NeXML and back to Newick
without losing support values. I found these two examples of how to add
this data to a NeXML document by referencing CDAO:
https://www.nescent.org/wg_evoinfo/NeXML_Test_Files#Bootstraps_represented_using_the_.22meta.22_tag
https://www.nescent.org/wg_evoinfo/NeXML_Test_Files#Bootstraps_represented_without_new_tags_or_elements

That's the standard way to store bootstrap support values in NeXML (Hilmar
confirms). How do your NeXML and CDAO modules interact, if at all? Would
the CDAO modules be useful for properly supporting NeXML metadata like
support/confidence values, or would it be simpler to just hard-code the few
tags we're specifically interested in?

Relatedly, those look like good test files. I see you've started writing
NeXML unit tests already; if you would like help with any of this, just let
me know.

-Eric

From mjldehoon at yahoo.com  Sun Jan 20 02:30:24 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 19 Jan 2013 23:30:24 -0800 (PST)
Subject: [Biopython-dev] Bio.Motif update
Message-ID: <1358667024.24762.YahooMailClassic@web164004.mail.gq1.yahoo.com>

Dear all,

As we discussed previously, I've been going over Bio.Motif to update it and make its usage more explicit. I'm pretty much done. Although I have been uploading my changes to the main Biopython GitHub repository, this does not mean that these changes are final; comments and suggestions are welcome.

In many cases, there is a difference in the syntax between the old Bio.Motif and the new Bio.Motif. For example, motif.consensus is a method in the old Bio.Motif, but a property in the new Bio.Motif.
While I tried to put PendingDeprecationWarnings on all changes consistently, there may be some corner cases that I missed.

For this reason, and also to make the documentation more understandable, it may be better to put the new Bio.Motif code in a module Bio.motifs, to put the old Bio.Motif code back into Bio.Motif (so that Bio.Motif in release 1.61 will be identical to Bio.Motif in release 1.60), and (assuming that we are happy with the new Bio.motifs module) to put a PendingDeprecationWarning on Bio.Motif as a whole. Then the documentation will have one chapter on Bio.Motif and one chapter on Bio.motifs, and there will be one set of tests for Bio.Motif and one set of tests for Bio.motifs.
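A package-wide PendingDeprecationWarning of the kind proposed here is usually a single warnings.warn call at the top level of the package's __init__.py. The sketch below wraps it in a hypothetical helper, warn_pending_deprecation, so it can be exercised directly; the message wording is an assumption.

```python
import warnings


def warn_pending_deprecation(old_name, new_name):
    """Hypothetical helper: emit a PendingDeprecationWarning for old_name.

    Calling this at the top level of a package's __init__.py makes the
    warning fire when the package is first imported (modules are cached,
    so top-level code runs once per process).
    """
    warnings.warn(
        "%s is pending deprecation; please use %s instead." % (old_name, new_name),
        PendingDeprecationWarning,
        stacklevel=2,
    )


# PendingDeprecationWarning is ignored by default, so enable it to see it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_pending_deprecation("Bio.Motif", "Bio.motifs")

print(caught[0].category.__name__, caught[0].message)
```

Because Python hides PendingDeprecationWarning by default, users would typically only see it when running with warnings enabled (for example python -W default), which suits a gentle first stage before a full deprecation.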

Any objections to creating a separate Bio.motifs module?

Here you can find the relevant chapter in the current documentation on the new Bio.Motif:

http://biopython.org/DIST/docs/tutorial/Tutorial-dev.html#htoc190

Best,
-Michiel

From p.j.a.cock at googlemail.com  Sun Jan 20 14:03:45 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 20 Jan 2013 19:03:45 +0000
Subject: [Biopython-dev] More 'fun' with GenBank
In-Reply-To: <50F66496.8000109@biotech.uni-tuebingen.de>
References: <50F57BC5.7020607@biotech.uni-tuebingen.de>
	<CAKVJ-_6HpOr+ph9o8Dygu6O0LM=6wHV8wD1tHsZA3CrO32B37Q@mail.gmail.com>
	<CALfq9tK0ZA3wRZCPQ4DHyOd8+n2raAFb6z3Zf-nkZrUBLAy+8Q@mail.gmail.com>
	<CAKVJ-_77kdYzXv_Q_KVqy9jWSNJSgU+PWdVB-DzxdF8TKwUAGg@mail.gmail.com>
	<50F66496.8000109@biotech.uni-tuebingen.de>
Message-ID: <CAKVJ-_5Tj+POzmYLvHx_nScjE6x9A-HgQPRQ_Ec_Bu1VGGjH6Q@mail.gmail.com>

On Wed, Jan 16, 2013 at 8:28 AM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
> On 2013-01-15 20:03, Peter Cock wrote:
>
> Hi Peter,
>
>> It will need a bit of work to rebase (some of the PEP8 changes have
>> touched the same lines of code), but I will try and do that this week.
>
> Your f_loc4 branch certainly fixes the problem I'm seeing. Is there
> anything I can do to help with getting it merged? I'm happy to give a
> closer look at the rebase conflicts coming up during the merge if you
> don't mind me asking the occasional question if I can't work out reasons
> for a code change from the commit messages.
>
> Cheers,
> Kai

I've done the rebase - all the tests still pass so if I missed anything
it should just be minor:

https://github.com/peterjc/biopython/commits/f_loc4 (old)
https://github.com/peterjc/biopython/commits/f_loc5 (rebased)

Kai - would you mind retesting with f_loc5 (the rebased branch)?

Everyone - does it seem sensible to include this now, ready for
the upcoming release (*)? Or perhaps just after the release?

Peter

(*) See other thread about Bio.Motif, which I think is all we need
to address before doing the release:
http://lists.open-bio.org/pipermail/biopython-dev/2013-January/010235.html

From bartek at rezolwenta.eu.org  Sun Jan 20 17:34:42 2013
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Sun, 20 Jan 2013 23:34:42 +0100
Subject: [Biopython-dev] Bio.Motif update
In-Reply-To: <1358667024.24762.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <1358667024.24762.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID: <CABHxouWi3xNub97t3vP1hPAQTFTbMa4qdFnq3FLTRnA39t4uWA@mail.gmail.com>

Hi,

great job Michiel! It looks very nice overall. Since code using the new
library will need to be changed anyway, I would vote for changing the
namespace, but given that the user base of Bio.Motif was quite limited,
I think keeping the name as is wouldn't cause major problems either.

best
Bartek

On Sun, Jan 20, 2013 at 8:30 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Dear all,
>
> As we discussed previously, I've been going over Bio.Motif to update it and make its usage more explicit. I'm pretty much done. While I have been uploading my changes to the main biopython github repository, this does not mean that these changes are final; comments and suggestions for changes are welcome.
>
> In many cases, there is a difference in the syntax between the old Bio.Motif and the new Bio.Motif. For example, motif.consensus is a method in the old Bio.Motif, but a property in the new Bio.Motif.
> While I tried to put PendingDeprecationWarnings on all changes consistently, there may be some corner cases that I missed.
>
> For this reason, and also to make the documentation more understandable, it may be better to put the new Bio.Motif code in a module Bio.motifs, to put the old Bio.Motif code back into Bio.Motif (so that Bio.Motif in release 1.61 will be identical to the Bio.Motif in release 1.60), and (assuming that we are happy with the new Bio.motifs modules) put a PendingDeprecationWarning on Bio.Motif as a whole. Then in the documentation we'll have one chapter on Bio.Motif and one chapter on Bio.motifs. Also we'll have one set of tests for Bio.Motif, and one set of tests for Bio.motifs.
>
> Any objections to creating a separate Bio.motifs module?
>
> Here you can find the relevant chapter in the current documentation on the new Bio.Motif:
>
> http://biopython.org/DIST/docs/tutorial/Tutorial-dev.html#htoc190
>
> Best,
> -Michiel
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


-- 
Bartek Wilczynski


From kai.blin at biotech.uni-tuebingen.de  Mon Jan 21 04:49:31 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Mon, 21 Jan 2013 10:49:31 +0100
Subject: [Biopython-dev] More 'fun' with GenBank
In-Reply-To: <CAKVJ-_5Tj+POzmYLvHx_nScjE6x9A-HgQPRQ_Ec_Bu1VGGjH6Q@mail.gmail.com>
References: <50F57BC5.7020607@biotech.uni-tuebingen.de>
	<CAKVJ-_6HpOr+ph9o8Dygu6O0LM=6wHV8wD1tHsZA3CrO32B37Q@mail.gmail.com>
	<CALfq9tK0ZA3wRZCPQ4DHyOd8+n2raAFb6z3Zf-nkZrUBLAy+8Q@mail.gmail.com>
	<CAKVJ-_77kdYzXv_Q_KVqy9jWSNJSgU+PWdVB-DzxdF8TKwUAGg@mail.gmail.com>
	<50F66496.8000109@biotech.uni-tuebingen.de>
	<CAKVJ-_5Tj+POzmYLvHx_nScjE6x9A-HgQPRQ_Ec_Bu1VGGjH6Q@mail.gmail.com>
Message-ID: <50FD0F2B.1080606@biotech.uni-tuebingen.de>


On 2013-01-20 20:03, Peter Cock wrote:

> Kai - would you mind retesting with f_loc5 (the rebased branch)?

The location of the feature that caused trouble for me still looks
correct. I'm currently running some more sequences, but I'm pretty
confident that the code will work just fine. The tests I added to the
genbank parser code for all the problem cases I had pass, after all. :)

> Everyone - does it seem sensible to include this now, ready for the
> upcoming release (*)? Or perhaps just after the release?

I'd prefer having this in the next release if possible, but of course
if the release after that is coming up within a reasonable time frame,
that would work as well.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From redmine at redmine.open-bio.org  Tue Jan 22 21:30:31 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 23 Jan 2013 02:30:31 +0000
Subject: [Biopython-dev] [Biopython - Bug #3403] (Closed) PDBList fails to
	download large PDB structures
References: <redmine.issue-3403.20130109225825@redmine.open-bio.org>
Message-ID: <redmine.journal-15068.20130123023031@redmine.open-bio.org>


Issue #3403 has been updated by Eric Talevich.

Status changed from New to Closed
% Done changed from 0 to 100

Fixed by David Cain. Thanks!
https://github.com/biopython/biopython/pull/146

First commit in the series here:
https://github.com/biopython/biopython/commit/7282e80ed6a65a10c5c624b2a7ec787656437a15
----------------------------------------
Bug #3403: PDBList fails to download large PDB structures
https://redmine.open-bio.org/issues/3403

Author: David Cain
Status: Closed
Priority: High
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: https://github.com/DavidCain/biopython/tree/fix_pdb_dl


The current @PDBList@ module will often fail to download large PDB files.

<pre>
>>> from Bio.PDB import PDBList
>>> pdbl = PDBList()
>>> pdbl.retrieve_pdb_file("1hgg")
Downloading PDB structure '1hgg'...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/Bio/PDB/PDBList.py", line 247, in retrieve_pdb_file
    out.writelines(gz.read())
  File "/usr/lib/python2.7/gzip.py", line 249, in read
    self._read(readsize)
  File "/usr/lib/python2.7/gzip.py", line 303, in _read
    self._read_eof()
  File "/usr/lib/python2.7/gzip.py", line 342, in _read_eof
    hex(self.crc)))
IOError: CRC check failed 0x21d7a5f7 != 0x4b5eabb6L
>>>
</pre>

The source of this problem is that the entire gzipped file must be read into memory before it's written to disk locally. With large archives, the local file can be truncated prematurely, which makes gzip fail its CRC check on extraction.

I fixed this issue on my "GitHub branch":https://github.com/DavidCain/biopython/tree/fix_pdb_dl, which I've made a pull request for.
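The chunked approach the report describes can be sketched with the standard library alone: decompress a fixed number of bytes at a time instead of a single gz.read() of the whole archive. This is a minimal illustration, not David Cain's actual patch; gunzip_stream is a hypothetical name, and it does not by itself repair a truncated download (such a file will still fail gzip's integrity checks).

```python
import gzip
import io


def gunzip_stream(src, dst, chunk_size=64 * 1024):
    """Copy the decompressed contents of gzip file object src into dst.

    Reads at most chunk_size decompressed bytes at a time, so memory use
    stays roughly constant regardless of the archive size. Returns the
    number of bytes written.
    """
    written = 0
    with gzip.GzipFile(fileobj=src, mode="rb") as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
    return written


# In-memory demo; real code would pass files opened with open(path, "rb") / open(path, "wb").
payload = b"HEADER    EXAMPLE\n" * 10000
out = io.BytesIO()
print(gunzip_stream(io.BytesIO(gzip.compress(payload)), out))  # 180000
```

The standard library's shutil.copyfileobj(gz, out) does the same chunked loop; the explicit version just makes the chunking visible.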




From mjldehoon at yahoo.com  Sat Jan 26 23:45:46 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 26 Jan 2013 20:45:46 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
Message-ID: <1359261946.16561.YahooMailClassic@web164001.mail.gq1.yahoo.com>

[This message previously got lost in cyberspace. Sending it again.]

--- On Fri, 1/11/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Bow's SearchIO is using Bio.Blast.NCBIStandalone to handle
> plain text,
> https://github.com/biopython/biopython/blob/master/Bio/SearchIO/BlastIO/blast_text.py

OK then let's keep Bio.ParserSupport as is for now.

> That's why Bio._utils is a private module - we can
> drop/change/etc this without worrying about breaking
> other people's code. The issue with Bio.ParserSupport
> is it was a public API.

Its API being public was not the problem -- we have deprecated and removed lots of public modules over the years.

The problem with Bio.ParserSupport was twofold. First, it ended up making parsers more complex and difficult to understand for people not familiar with Bio.ParserSupport, in particular for newcomers and users trying to fix a bug. So Bio.ParserSupport never made us really happy. As a case in point, Bio._utils was created rather than reusing the code in Bio.ParserSupport.

The second problem was that many modules were using bits and pieces of Bio.ParserSupport, so we could not drop or change Bio.ParserSupport easily. Bio.ParserSupport has been officially obsolete but not deprecated for years.

> That's why Bio._utils is a private module - we can
> drop/change/etc this without worrying about breaking
> other people's code.

Let's drop it.

Being a private module doesn't make it "free", though. It clutters up the code base. This is particularly true for top-level modules.

Best,
-Michiel.

From mjldehoon at yahoo.com  Sat Jan 26 23:46:47 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 26 Jan 2013 20:46:47 -0800 (PST)
Subject: [Biopython-dev] Bio.Motif update
In-Reply-To: <CABHxouWi3xNub97t3vP1hPAQTFTbMa4qdFnq3FLTRnA39t4uWA@mail.gmail.com>
Message-ID: <1359262007.25151.YahooMailClassic@web164002.mail.gq1.yahoo.com>

OK, thanks! I separated Bio.Motif into Bio.Motif (essentially the same as in Biopython release 1.60) and Bio.motifs (the new code).

Best,
-Michiel.


From w.arindrarto at gmail.com  Sun Jan 27 05:52:15 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Sun, 27 Jan 2013 11:52:15 +0100
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359261946.16561.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <1359261946.16561.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID: <CADEGkF6VnZE2gjfhm4sQAR2ecYm3Hjwpu8zmscPcgp_aHtQ8zA@mail.gmail.com>

Hi Michiel, everyone,

>> That's why Bio._utils is a private module - we can
>> drop/change/etc this without worrying about breaking
>> other people's code. The issue with Bio.ParserSupport
>> is it was a public API.
>
> Its API being public was not the problem -- we have deprecated and removed lots of public modules over the years.
>
> The problem with Bio.ParserSupport was twofold. First, it ended up making parsers more complex and difficult to understand for people not familiar with Bio.ParserSupport, in particular for newcomers and users trying to fix a bug. So Bio.ParserSupport never made us really happy. As a case in point, Bio._utils was created rather than reusing the code in Bio.ParserSupport.
>
> The second problem was that many modules were using bits and pieces of Bio.ParserSupport, so we could not drop or change Bio.ParserSupport easily. Bio.ParserSupport has been officially obsolete but not deprecated for years.
>
>> That's why Bio._utils is a private module - we can
>> drop/change/etc this without worrying about breaking
>> other people's code.
>
> Let's drop it.

My initial intention of refactoring and adding some new code to
Bio._utils was to reduce code repetition. I intended it (and perhaps
we should make it explicit in its docstrings) to be a collection of
small, useful functions that may be used in various cases.

Some examples inside include several string-formatting functions, each
of them independent of the other. There's also a general function for
running doctests
(https://github.com/biopython/biopython/blob/master/Bio/_utils.py#L100),
which was written because there was a lot of repetitive code in
different submodules basically doing the same thing (looking up the
test directory, running the test). I feel quite strongly that this
doctest function is required by many current (and future) modules
across Biopython, so it makes sense to refactor it out into a root
namespace.
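A shared helper of the kind described here (switch to the test directory, run the module's doctests, switch back) might look roughly like the sketch below. It is hypothetical, not the actual function in Bio/_utils.py: run_doctests_in is an invented name, and the caller is assumed to pass the test directory explicitly.

```python
import doctest
import os


def run_doctests_in(module, test_dir):
    """Run module's doctests with test_dir as the working directory.

    Doctests that open example files by relative path need this. The
    previous working directory is restored even if a doctest raises.
    Returns doctest's TestResults(failed, attempted).
    """
    old_dir = os.getcwd()
    os.chdir(test_dir)
    try:
        return doctest.testmod(module)
    finally:
        os.chdir(old_dir)
```

A module could then end with an if __name__ == "__main__": block that calls something like run_doctests_in(sys.modules[__name__], "Tests"), where "Tests" is an assumed directory name.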

All of this seems different from Bio.ParserSupport, which attempts to
be a single, general solution for writing new parsers (only parsers). Given
the wildly incoherent nature of different file output formats, it's
not surprising that Bio.ParserSupport's code base has to be quite
complicated to accommodate all of them. Naturally it has many related
parts and functions, and understanding them all is much harder than
understanding the small functions in Bio._utils (in my experience).

So for now, I think it is still ok if we use Bio._utils. Perhaps, in
light of this discussion, we should make it explicitly clear that it's
only for containing general, small, utility functions instead of
containing one 'support framework' (e.g. ParserSupport) to avoid
future unhappiness.

Cheers,
Bow


From eric.talevich at gmail.com  Mon Jan 28 00:59:14 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Mon, 28 Jan 2013 00:59:14 -0500
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CADEGkF6VnZE2gjfhm4sQAR2ecYm3Hjwpu8zmscPcgp_aHtQ8zA@mail.gmail.com>
References: <1359261946.16561.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CADEGkF6VnZE2gjfhm4sQAR2ecYm3Hjwpu8zmscPcgp_aHtQ8zA@mail.gmail.com>
Message-ID: <CAMC681m7kbjsRnAGZLDO4u_+RjjUoo3Jd7MTjWOsA8kiyJHqJA@mail.gmail.com>

On Sun, Jan 27, 2013 at 5:52 AM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:

> Hi Michiel, everyone,
>
> >> That's why Bio._utils is a private module - we can
> >> drop/change/etc this without worrying about breaking
> >> other people's code. The issue with Bio.ParserSupport
> >> is it was a public API.
> >
> > Its API being public was not the problem -- we have deprecated and
> removed lots of public modules over the years.
> >
> > The problem with Bio.ParserSupport was twofold. First, it ended up
> making parsers more complex and difficult to understand for people not
> familiar with Bio.ParserSupport, in particular for newcomers and users
> trying to fix a bug. So Bio.ParserSupport never made us really happy. As a
> case in point, Bio._utils was created rather than reusing the code in
> Bio.ParserSupport.
> >
> > The second problem was that many modules were using bits and pieces of
> Bio.ParserSupport, so we could not drop or change Bio.ParserSupport easily.
> Bio.ParserSupport has been officially obsolete but not deprecated for years.
> >
> >> That's why Bio._utils is a private module - we can
> >> drop/change/etc this without worrying about breaking
> >> other people's code.
> >
> > Let's drop it.
>
> My initial intention of refactoring and adding some new code to
> Bio._utils was to reduce code repetition. I intended it (and perhaps
> we should make it explicit in its docstrings) to be a collection of
> small, useful functions that may be used in various cases.
>
> Some examples inside include several string-formatting functions, each
> of them independent of the other. There's also a general function for
> running doctests
> (https://github.com/biopython/biopython/blob/master/Bio/_utils.py#L100),
> which was written because there was a lot of repetitive code in
> different submodules basically doing the same thing (looking up the
> test directory, running the test). I feel quite strongly that this
> doctest function is required by many current (and future modules)
> across Biopython, so it makes sense to refactor them out into a root
> namespace.
>

Interesting discussion.

It's worth considering why some functions are being used in multiple parts
of the code base. In some cases there are essentially shortcomings in the
Python standard library or issues with
cross-platform/cross-implementation/backward compatibility that would
require us to use *exactly* the same code each time a certain recurring
problem is encountered. The Bio._py3k and Bio.File modules make sense for
this reason, I think, and before we deprecated Py2.4 it would have been
helpful to have shared code for importing ElementTree (both the uniprot-xml
and phyloXML parsers used the same half-page tangle of attempted imports).
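For illustration, a shared ElementTree import of the kind mentioned here can be reduced today to one small try/except block kept in a single place (for example a private module such as Bio/_etree.py; that name is invented for this sketch). cElementTree was deprecated in Python 3.3 and removed in 3.9; since 3.3 the plain ElementTree module uses its C accelerator automatically when available.

```python
try:
    # Separate C implementation, only present on older Pythons (removed in 3.9)
    from xml.etree import cElementTree as ElementTree
except ImportError:
    # Python 3.3+: this module already uses the C accelerator when available
    from xml.etree import ElementTree

root = ElementTree.fromstring("<phyloxml><phylogeny/></phyloxml>")
print(root.tag, [child.tag for child in root])
```

Parsers would then import ElementTree from the shared module rather than each repeating the fallback chain.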

So, maybe the doctest helpers should go in a new module specific to that
topic.

In other cases there's a recurring need in separate modules, but (a) the
solution is short and simple enough to write from scratch wherever it's
needed, so the duplication isn't enough of a maintenance concern to offset
the convenience of having all the relevant code in one place; and/or (b) the
needs of different modules aren't exactly the same, merely similar, which
leads to a proliferation of options in the shared function even though a
simpler implementation would have worked for any given module.

The point is that just as there's a maintenance cost to having duplicated
code in multiple places, there's a maintenance cost to having dependencies
between multiple modules even within the same project, and the value of a
new module ought to be greater than the cost it imposes.

Best,
Eric

From mjldehoon at yahoo.com  Mon Jan 28 09:58:58 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 28 Jan 2013 06:58:58 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CADEGkF6VnZE2gjfhm4sQAR2ecYm3Hjwpu8zmscPcgp_aHtQ8zA@mail.gmail.com>
Message-ID: <1359385138.84799.YahooMailClassic@web164002.mail.gq1.yahoo.com>

Hi Bow,

--- On Sun, 1/27/13, Wibowo Arindrarto <w.arindrarto at gmail.com> wrote:
> All of this seems different from Bio.ParserSupport, which
> attempts to be a one-single solution for writing new parsers
> (only parsers). Given the wildly incoherent nature of different
> file output formats, it's not surprising that Bio.ParserSupport's
> code base has to be quite complicated to accommodate all of them.
> Naturally it has many related parts and functions, and understanding
> them all is much harder than to understand the small functions in
> Bio._utils (in my experience).

It's not just Bio.ParserSupport; previously we also had Bio/listfns.py; Bio/mathfns.py; Bio/stringfns.py; their C versions; and Bio/csupport.c. These all contained small utility functions. But in the end we dropped them.

Btw, was Bio._utils ever discussed on the mailing list? If yes, I apologize for missing this discussion and raising these issues now.

Best,

-Michiel.

From p.j.a.cock at googlemail.com  Mon Jan 28 10:10:29 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 28 Jan 2013 15:10:29 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359385138.84799.YahooMailClassic@web164002.mail.gq1.yahoo.com>
References: <CADEGkF6VnZE2gjfhm4sQAR2ecYm3Hjwpu8zmscPcgp_aHtQ8zA@mail.gmail.com>
	<1359385138.84799.YahooMailClassic@web164002.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_4q+QcrLGBshNe7fo5GxqnuBMunt1=iKERXCRD3e3vxww@mail.gmail.com>

On Mon, Jan 28, 2013 at 2:58 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> Btw, was Bio._utils ever discussed on the mailing list? If yes, I
> apologize for missing this discussion and raising these issues now.

I think only on the pull request - I'll have a look at the GitHub
settings as ideally at the minimum new pull requests should
perhaps be CC'd to the dev list?

Peter

From p.j.a.cock at googlemail.com  Mon Jan 28 10:17:19 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 28 Jan 2013 15:17:19 +0000
Subject: [Biopython-dev] Sending pull requests to the mailing list
Message-ID: <CAKVJ-_5uPrrzq7WN=x9s7NWjX5Q8E0OYBwKA9Pz_M=GncpMncg@mail.gmail.com>

Retitling thread,

On Mon, Jan 28, 2013 at 3:10 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Mon, Jan 28, 2013 at 2:58 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>>
>> Btw, was Bio._utils ever discussed on the mailing list? If yes, I
>> apologize for missing this discussion and raising these issues now.
>
> I think only on the pull request - I'll have a look at the GitHub
> settings as ideally at the minimum new pull requests should
> perhaps be CC'd to the dev list?

According to https://help.github.com/articles/using-pull-requests

"Everyone that can push to the base repository will receive an
 email notification and see the new pull request in their
 dashboard the next time they log in."

I think you can also choose to get emails under your own profile
settings. There doesn't seem to be any email notification settings
under the Biopython organisation account on GitHub.

If there is an easy way to have GitHub email new pull requests to
the biopython-dev mailing list, I've overlooked it. There might be an
API-based solution... or a simple email client forwarding rule?
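An API-based approach could be sketched as a small polling script run periodically (for example from cron; that is an assumption). The endpoint GET /repos/{owner}/{repo}/pulls is part of GitHub's public REST API. The helper names, the summary format, and the idea of tracking already-seen pull request numbers are hypothetical, and actually sending the email to the list is left out.

```python
import json
import urllib.request

PULLS_URL = "https://api.github.com/repos/{owner}/{repo}/pulls?state=open"


def fetch_open_pulls(owner, repo):
    """Return open pull requests as decoded JSON (requires network access)."""
    url = PULLS_URL.format(owner=owner, repo=repo)
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def new_pull_summaries(pulls, seen_numbers):
    """One summary line per pull request whose number is not in seen_numbers."""
    return [
        "PR #%d by %s: %s <%s>"
        % (pr["number"], pr["user"]["login"], pr["title"], pr["html_url"])
        for pr in pulls
        if pr["number"] not in seen_numbers
    ]


# Example (needs network):
#   lines = new_pull_summaries(fetch_open_pulls("biopython", "biopython"), seen)
```

Keeping the network call separate from the filtering makes the filtering easy to test offline; the set of seen numbers would need to be persisted between runs, for instance in a small file.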

Peter

From w.arindrarto at gmail.com  Mon Jan 28 12:19:51 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Mon, 28 Jan 2013 18:19:51 +0100
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359385138.84799.YahooMailClassic@web164002.mail.gq1.yahoo.com>
References: <CADEGkF6VnZE2gjfhm4sQAR2ecYm3Hjwpu8zmscPcgp_aHtQ8zA@mail.gmail.com>
	<1359385138.84799.YahooMailClassic@web164002.mail.gq1.yahoo.com>
Message-ID: <CADEGkF6r0--bP8Nr+eSpjQgxUCnfd29UMXOSyzh0LMxf1xFi-g@mail.gmail.com>

Hi everyone,

> --- On Sun, 1/27/13, Wibowo Arindrarto <w.arindrarto at gmail.com> wrote:
>> All of this seems different from Bio.ParserSupport, which
>> attempts to be a one-single solution for writing new parsers
>> (only parsers). Given the wildly incoherent nature of different
>> file output formats, it's not surprising that Bio.ParserSupport's
>> code base has to be quite complicated to accommodate all of them.
>> Naturally it has many related parts and functions, and understanding
>> them all is much harder than to understand the small functions in
>> Bio._utils (in my experience).
>
> It's not just Bio.ParserSupport; previously we also had Bio/listfns.py; Bio/mathfns.py; Bio/stringfns.py; their C versions; and Bio/csupport.c. These all contained small utility functions. But in the end we dropped them.

Hm... in this case (and in light of Eric's points as well), it may be OK
to drop the string formatting functions in Bio._utils. They are used
in Bio.Phylo and Bio.SearchIO for now. In Bio.SearchIO they are used
in multiple submodules, however, so I am still leaning towards putting them
at least in Bio.SearchIO's main directory. They were originally in
Bio.SearchIO._utils, after all.

As for the doctest-related functions, do you propose to move them to a
specific doctest-related module as well?

>> Btw, was Bio._utils ever discussed on the mailing list? If yes, I
>> apologize for missing this discussion and raising these issues now.
>
> I think only on the pull request - I'll have a look at the GitHub
> settings as ideally at the minimum new pull requests should
> perhaps be CC'd to the dev list?

Indeed, I did submit a pull request, but it was not forwarded to or
discussed on the mailing list. This is the pull request, for reference:
https://github.com/biopython/biopython/pull/140. As for dev mailing
list notifications, I personally agree, given that the number of pull
requests received still seems manageable. Is it possible to receive just
the initial email announcing a new pull request, though?

So far, I've been 'watching' the repository and getting emails from
there; perhaps the organization needs to 'watch' the repo to get
notifications as well?

Best,
Bow

From redmine at redmine.open-bio.org  Mon Jan 28 17:20:54 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 28 Jan 2013 22:20:54 +0000
Subject: [Biopython-dev] [Biopython - Bug #2776] Bio.pairwise2 returns
	non-optimal alignment in at least some cases
References: <redmine.issue-2776.20090302102253@redmine.open-bio.org>
Message-ID: <redmine.journal-15069.20130128222054@redmine.open-bio.org>


Issue #2776 has been updated by Peter Cock.


In the opinion of Bryan Lunt, commenting on another issue on GitHub:
https://github.com/biopython/biopython/pull/149

"Bug" 2776 is not a bug, it is a feature.

I hand-edited a datafile for EMBOSS programs and tried the EMBOSS "needle" program with (a homomorphism of) the same sequences. It behaves the same as pairwise2.

The point is that for there to be gaps they have to be flanked by matches, except on the ends, so what the original bug report asks for is not something these algorithms will ever produce anyway.
----------------------------------------
Bug #2776: Bio.pairwise2 returns non-optimal alignment in at least some cases
https://redmine.open-bio.org/issues/2776

Author: Klaus Kopec
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 1.49
URL: 


At least in some cases, Bio.pairwise2 returns an alignment that is not the one with the highest score for the input parameters. This occurs in localXX and globalXX.

However, I have only encountered the problem with large mismatch penalties (which I use because I need mismatch-free alignments).

Simple example (the bug also occurred for longer sequences):
>>> from Bio import pairwise2
>>> sequence1 = 'GKG'
>>> sequence2 = 'GWG'
>>> A = pairwise2.align.globalms(sequence1, sequence2, 5, -100, -5, -5)[0]
>>> A[0]
'GKG--'
>>> A[1]
'--GWG'
>>> A[2]
-15.0

whereas
'GK-G'
'G-WG'

would get a score of 0 (two matches at +5 each and two single-residue gaps at -5 each).
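Both scores quoted above can be double-checked by hand. The following is a minimal sketch, not part of Bio.pairwise2, that scores a finished global alignment with the same parameters as the globalms call (match 5, mismatch -100, gap open -5, gap extend -5). It assumes each run of gap characters is charged the open penalty for its first column and the extend penalty for every further column, with end gaps penalized like internal ones; the function name score_alignment is hypothetical.

```python
def score_alignment(top, bottom, match, mismatch, gap_open, gap_extend):
    """Score two aligned strings of equal length with affine gap penalties.

    Hypothetical helper, not a Bio.pairwise2 function. A gap run costs
    gap_open for its first column and gap_extend for each further column;
    end gaps are scored like internal gaps. Columns where both strings
    hold "-" are not expected.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    score = 0
    gap_in_top = gap_in_bottom = False  # are we currently inside a gap run?
    for a, b in zip(top, bottom):
        if a == "-":
            score += gap_extend if gap_in_top else gap_open
            gap_in_top, gap_in_bottom = True, False
        elif b == "-":
            score += gap_extend if gap_in_bottom else gap_open
            gap_in_top, gap_in_bottom = False, True
        else:
            score += match if a == b else mismatch
            gap_in_top = gap_in_bottom = False
    return score


print(score_alignment("GKG--", "--GWG", 5, -100, -5, -5))  # -15
print(score_alignment("GK-G", "G-WG", 5, -100, -5, -5))    # 0
```

Note that the second alignment puts a gap in each sequence side by side (K against a gap, then a gap against W), which is the kind of gap not flanked by matches that Bryan Lunt's comment says these algorithms do not produce.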


System: Kubuntu 8.10 64Bit, Python 2.6.1, Biopython 1.49 (my pairwise2.py is identical to the current CVS version of it)




From mjldehoon at yahoo.com  Tue Jan 29 04:43:59 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 29 Jan 2013 01:43:59 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CAKVJ-_4q+QcrLGBshNe7fo5GxqnuBMunt1=iKERXCRD3e3vxww@mail.gmail.com>
Message-ID: <1359452639.95165.YahooMailClassic@web164005.mail.gq1.yahoo.com>

I'd prefer if developers first write to the dev mailing list if they want to make any major changes, or changes that affect Biopython overall. It can be hard to understand the implications just from looking at a pull request, and there may be so many pull requests that the important ones may be missed anyway.

Best,
-Michiel.

--- On Mon, 1/28/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> From: Peter Cock <p.j.a.cock at googlemail.com>
> Subject: Re: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Cc: "Wibowo Arindrarto" <w.arindrarto at gmail.com>, "BioPython-Dev Mailing List" <biopython-dev at biopython.org>
> Date: Monday, January 28, 2013, 10:10 AM
> On Mon, Jan 28, 2013 at 2:58 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> >
> > Btw, was Bio._utils ever discussed on the mailing list? If yes, I
> > apologize for missing this discussion and raising these issues now.
>
> I think only on the pull request - I'll have a look at the GitHub
> settings as ideally at the minimum new pull requests should
> perhaps be CC'd to the dev list?
> 
> Peter
> 

From mjldehoon at yahoo.com  Tue Jan 29 04:54:01 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 29 Jan 2013 01:54:01 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CADEGkF6r0--bP8Nr+eSpjQgxUCnfd29UMXOSyzh0LMxf1xFi-g@mail.gmail.com>
Message-ID: <1359453241.43038.YahooMailClassic@web164004.mail.gq1.yahoo.com>

--- On Mon, 1/28/13, Wibowo Arindrarto <w.arindrarto at gmail.com> wrote:
> Hm... in this case (and in light of Eric's points as well), it
> may be OK to drop the string formatting functions from Bio._utils.
> They are used in Bio.Phylo and Bio.SearchIO for now. In Bio.SearchIO,
> however, they are used in multiple submodules, so I am still leaning
> towards putting them at least in Bio.SearchIO's main directory. They were
> originally in Bio.SearchIO._utils, after all.

I think it's OK to have a _utils submodule inside Bio.SearchIO. Since you are developing and maintaining that module, to a large degree it's up to you how you want to organize your code. For the same reason, for Bio.Phylo it's better to discuss with Eric Talevich first to see what he thinks.

> As for the doctest-related functions, do you propose to move
> them to a specific doctest-related module as well?

For the doctest-related functions, we first need to understand what the purpose is, before deciding how to implement it (and in what module the code should be).

Best,
-Michiel.

From p.j.a.cock at googlemail.com  Tue Jan 29 05:23:43 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 29 Jan 2013 10:23:43 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359452639.95165.YahooMailClassic@web164005.mail.gq1.yahoo.com>
References: <CAKVJ-_4q+QcrLGBshNe7fo5GxqnuBMunt1=iKERXCRD3e3vxww@mail.gmail.com>
	<1359452639.95165.YahooMailClassic@web164005.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_7+hQ1OHDd-tWWVYbLzVgbRpe9wkyL2ZPnatYcdake1uw@mail.gmail.com>

On Tue, Jan 29, 2013 at 9:43 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> I'd prefer if developers first write to the dev mailing list if they want to make
> any major changes, or changes that affect Biopython overall. It can be hard
> to understand the implications just from looking at a pull request, and there
> may be so many pull requests that the important ones may be missed anyway.

Certainly a good policy, which I have tried to follow.

In this case, since it was just moving a small amount of private API
code, I didn't consider it major.

Peter

From p.j.a.cock at googlemail.com  Tue Jan 29 05:29:30 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 29 Jan 2013 10:29:30 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359453241.43038.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <CADEGkF6r0--bP8Nr+eSpjQgxUCnfd29UMXOSyzh0LMxf1xFi-g@mail.gmail.com>
	<1359453241.43038.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_5iKKx4h5OyGLOpmvfuNbDHgCa_Kx9po2st+oan_ZMR=g@mail.gmail.com>

On Tue, Jan 29, 2013 at 9:54 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
>> As for the doctest-related functions, do you propose to move
>> them to a specific doctest-related module as well?
>
> For the doctest-related functions, we first need to understand
> what the purpose is, before deciding how to implement it (and
> in what module the code should be).

When editing doctests, it is convenient to be able to run them on
the current file, e.g.

~/biopython $ emacs Bio/SeqRecord.py
~/biopython $ python Bio/SeqRecord.py

Or,

~/biopython/Bio $ emacs SeqRecord.py
~/biopython/Bio $ python SeqRecord.py

To do that, many of our modules had a repeated bit of code at
the bottom, which has now been moved to a shared function in
Bio/_utils.py, resulting in much less boilerplate code, e.g.

https://github.com/biopython/biopython/commit/8b59d89bb4e282192ddee751e24ceef4afa63528

Bow had initially done this for the doctests in Bio.SearchIO,
but I agreed it made sense to do this elsewhere too.

Peter

From w.arindrarto at gmail.com  Tue Jan 29 06:05:19 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 29 Jan 2013 12:05:19 +0100
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359453241.43038.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <CADEGkF6r0--bP8Nr+eSpjQgxUCnfd29UMXOSyzh0LMxf1xFi-g@mail.gmail.com>
	<1359453241.43038.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID: <CADEGkF4LL=0_5uCtMMsBveM-ce1XkxPDcTASMhAUEXKiYFLY9A@mail.gmail.com>

Hi Michiel, everyone,

>>> I'd prefer if developers first write to the dev mailing list if they want to make any major changes, or changes that affect Biopython overall. It can be hard to understand the implications just from looking at a pull request, and there may be so many pull requests that the important ones may be missed anyway.

>>> I think it's OK to have a _utils submodule inside Bio.SearchIO. Since you are developing and maintaining that module, to a large degree it's up to you how you want to organize your code. For the same reason, for Bio.Phylo it's better to discuss with Eric Talevich first to see what he thinks.

Noted. I'm sorry that this is causing more headaches than it solves.
I'll be sure to notify the dev-mailing list for other similar changes.

>>> As for the doctest-related functions, do you propose to move
>>> them to a specific doctest-related module as well?
>>
>> For the doctest-related functions, we first need to understand what the purpose is, before deciding how to implement it (and in what module the code should be).
>
> When editing doctests, it is convenient to be able to run them on
> the current file, e.g.
>
> ~/biopython $ emacs Bio/SeqRecord.py
> ~/biopython $ python Bio/SeqRecord.py
>
> Or,
>
> ~/biopython/Bio $ emacs SeqRecord.py
> ~/biopython/Bio $ python SeqRecord.py
>
> To do that, many of our modules had a repeated bit of code at
> the bottom, which has now been moved to a shared function in
> Bio/_utils.py, resulting in much less boilerplate code, e.g.
>
> https://github.com/biopython/biopython/commit/8b59d89bb4e282192ddee751e24ceef4afa63528
>
> Bow had initially done this for the doctests in Bio.SearchIO,
> but I agreed it made sense to do this elsewhere too.

Indeed, the doctest functions are two small, simple functions that make
it easier to run doctests. The first one looks up the test directory
(our Tests directory) and the second one simply executes the doctests.
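A minimal sketch of what such a pair of helpers could look like follows. It is not the actual Bio/_utils.py code: the names find_tests_dir and run_doctests_from_tests_dir are hypothetical, and it assumes a source checkout in which the Bio and Tests folders sit side by side in the same top-level directory.

```python
import doctest
import os


def find_tests_dir(start_dir=None):
    """Return the absolute path of the Tests folder of a Biopython checkout.

    Hypothetical helper: walks upwards from start_dir (default: the current
    directory) until it finds a folder containing both "Bio" and "Tests".
    """
    current = os.path.abspath(start_dir or os.curdir)
    while True:
        if (os.path.isdir(os.path.join(current, "Bio"))
                and os.path.isdir(os.path.join(current, "Tests"))):
            return os.path.join(current, "Tests")
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without a match
            raise ValueError("Not inside a Biopython source tree: %r"
                             % os.path.abspath(start_dir or os.curdir))
        current = parent


def run_doctests_from_tests_dir(start_dir=None, **kwargs):
    """Run doctest.testmod() with the Tests folder as working directory.

    Hypothetical helper: keyword arguments are passed on to doctest.testmod(),
    and the original working directory is restored afterwards.
    """
    tests_dir = find_tests_dir(start_dir)
    old_dir = os.getcwd()
    os.chdir(tests_dir)
    try:
        return doctest.testmod(**kwargs)
    finally:
        os.chdir(old_dir)
```

A module would then end with an if __name__ == "__main__": block calling run_doctests_from_tests_dir(), so that running python Bio/SeqRecord.py executes its doctests with relative example paths resolved against Tests. Restoring the working directory in a finally clause is a choice made in this sketch so that an unexpected exception cannot leave the process in the wrong folder.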

Best,
Bow


From p.j.a.cock at googlemail.com  Tue Jan 29 10:46:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 29 Jan 2013 15:46:25 +0000
Subject: [Biopython-dev] Bio.Motif update
In-Reply-To: <1359262007.25151.YahooMailClassic@web164002.mail.gq1.yahoo.com>
References: <CABHxouWi3xNub97t3vP1hPAQTFTbMa4qdFnq3FLTRnA39t4uWA@mail.gmail.com>
	<1359262007.25151.YahooMailClassic@web164002.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_5X3eoXEnqD7yfTGFW1Saxr7rMe-WcbCofmCqdu_yq6KA@mail.gmail.com>

On Sun, Jan 27, 2013 at 4:46 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> OK, thanks! I separated Bio.Motif into Bio.Motif (essentially the same
> as in Biopython release 1.60) and Bio.motifs (the new code).

We need to say something about this in the NEWS file too.

I think it would make sense to add a PendingDeprecationWarning
to Bio.Motif now. Also, if you feel the new Bio.motifs API isn't quite
settled yet, adding the new BiopythonExperimentalWarning to that
makes sense.
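As a rough illustration of the two warnings being suggested, here is a minimal sketch using only the standard warnings module. In Biopython the calls would sit at the top level of the package's __init__.py so that they fire on import; here they are wrapped in functions so they can be exercised directly. ExperimentalWarningStandIn is a hypothetical placeholder for the BiopythonExperimentalWarning class mentioned above, and the message wording is invented.

```python
import warnings


class ExperimentalWarningStandIn(Warning):
    """Hypothetical placeholder for a Biopython 'experimental code' warning."""


def warn_pending_deprecation(old_module, new_module):
    # PendingDeprecationWarning is ignored by Python's default filters,
    # so most users only see it when warnings are explicitly enabled.
    warnings.warn(
        "%s will probably be deprecated in a future release; "
        "please consider using %s instead." % (old_module, new_module),
        PendingDeprecationWarning,
        stacklevel=2,
    )


def warn_experimental(module_name):
    # Signals that the API is usable but may still change between releases.
    warnings.warn(
        "%s is experimental and its API may still change." % module_name,
        ExperimentalWarningStandIn,
        stacklevel=2,
    )
```

Because the pending-deprecation category is silent by default, it gives a gentle first step before a full DeprecationWarning in a later release.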

What do you think?

(And once this is settled, I think we can schedule the release)

Regards,

Peter

From p.j.a.cock at googlemail.com  Tue Jan 29 12:10:50 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 29 Jan 2013 17:10:50 +0000
Subject: [Biopython-dev] Namespace for online resources?
Message-ID: <CAKVJ-_62TQS9-eswnkNRtaMQGri3cDSCNgRFyqwjVofgrg+9rA@mail.gmail.com>

Hello all,

We used to have Bio.WWW for assorted online tools, but that
was deprecated some time back. Is there a case for bringing it
back, or something similar like Bio.WebTools as suggested by
Kevin Murray on this pull request?:

https://github.com/biopython/biopython/pull/132

In this case, since this is to fetch Arabidopsis sequence via
an accession number, perhaps Bio.SeqUtils might be better?
(As an aside, recall we've talked about merging Bio.Seq* at
some point).

Thoughts?

Peter

From w.arindrarto at gmail.com  Tue Jan 29 14:52:42 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 29 Jan 2013 20:52:42 +0100
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <CAKVJ-_62TQS9-eswnkNRtaMQGri3cDSCNgRFyqwjVofgrg+9rA@mail.gmail.com>
References: <CAKVJ-_62TQS9-eswnkNRtaMQGri3cDSCNgRFyqwjVofgrg+9rA@mail.gmail.com>
Message-ID: <CADEGkF5QQpvt5svMMsfM=sGW+zavQq-mpAtBu=Twf3CX5+rDKg@mail.gmail.com>

Hi everyone,

> We used to have Bio.WWW for assorted online tools, but that
> was deprecated some time back. Is there a case for bringing it
> back, or something similar like Bio.WebTools as suggested by
> Kevin Murray on this pull request?:
>
> https://github.com/biopython/biopython/pull/132
>
> In this case, since this is to fetch Arabidopsis sequence via
> an accession number, perhaps Bio.SeqUtils might be better?
> (As an aside, recall we've talked about merging Bio.Seq* at
> some point).

Why was Bio.WWW deprecated in the first place?

Personally, I would prefer to have all online database access
centralized in one place, if possible. It makes for a less-cluttered
root namespace and may be more intuitive in most cases. I do notice
that for cases like Bio.Entrez, sometimes we need to only parse the
data locally since it has been downloaded previously (hence no online
access). To do this task, Bio.www (basically the centralized online
module) may not be the most intuitive place to look in, for most
people, although an argument can be made that we are still parsing
data whose format is specific for an online resource.

However, looking at the way we are doing this now (with the current
codebase placing Entrez access and parsing in Bio.Entrez; similarly
for Bio.ExPASy) locating the module in Bio.TAIR (or Bio.tair? PEP-8
compliance?) looks more consistent. If we are to create a new module
for online access (e.g. Bio.webtools, Bio.www) for Bio.TAIR, for
consistency we may have to juggle Entrez and ExPASy around as well,
right?

Putting Bio.TAIR in Bio.SeqUtils doesn't seem... right to me. My
impression is that SeqUtils is supposed to be for functions acting on
sequence strings (or Seq objects) and nothing else. After all, we can
also retrieve GenBank sequences with Biopython, but that functionality
is separated into its own module, Bio.Entrez, not Bio.SeqUtils.
Just my two cents :),
Bow

From arklenna at gmail.com  Tue Jan 29 15:05:15 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Tue, 29 Jan 2013 15:05:15 -0500
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <CADEGkF5QQpvt5svMMsfM=sGW+zavQq-mpAtBu=Twf3CX5+rDKg@mail.gmail.com>
References: <CAKVJ-_62TQS9-eswnkNRtaMQGri3cDSCNgRFyqwjVofgrg+9rA@mail.gmail.com>
	<CADEGkF5QQpvt5svMMsfM=sGW+zavQq-mpAtBu=Twf3CX5+rDKg@mail.gmail.com>
Message-ID: <CALfq9t+w=_9dEMFDUZdpJNRMKOxL_+PkmHKCm=Wf2o-KA0XabA@mail.gmail.com>

I agree with Bow that centralizing all online database access makes sense.
It would also simplify the testing process (i.e. anything that requires a
network connection goes into the web namespace and can be skipped when
testing offline).
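The testing benefit can be sketched with the standard unittest module. This is a minimal sketch under stated assumptions: the environment variable BIOPYTHON_OFFLINE and the decorator requires_internet are hypothetical names, and Biopython's own test runner may handle offline testing differently.

```python
import os
import unittest


def offline_requested():
    """Return True if the (hypothetical) BIOPYTHON_OFFLINE variable is "1"."""
    return os.environ.get("BIOPYTHON_OFFLINE") == "1"


def requires_internet(test_item):
    """Decorator marking a test method or TestCase class as needing the network.

    The condition is evaluated when the decorator is applied, i.e. when the
    test module is imported, so the variable must be set before that.
    """
    return unittest.skipIf(offline_requested(), "network access disabled")(test_item)


class OnlineExampleTests(unittest.TestCase):
    @requires_internet
    def test_remote_lookup(self):
        # A real test would contact an online resource here.
        self.assertTrue(True)
```

With everything online living in one namespace, a runner could apply such a check to that whole group of test modules at once.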

In situations like Entrez, the network access portion could be separated
out and put into the web namespace under the same name:

    import Bio.www.Entrez  # for downloading the data
    import Bio.Entrez  # for parsing/using the downloaded data

Cheers,

Lenna


On Tue, Jan 29, 2013 at 2:52 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:

> Hi everyone,
>
> > We used to have Bio.WWW for assorted online tools, but that
> > was deprecated some time back. Is there a case for bringing it
> > back, or something similar like Bio.WebTools as suggested by
> > Kevin Murray on this pull request?:
> >
> > https://github.com/biopython/biopython/pull/132
> >
> > In this case, since this is to fetch Arabidopsis sequence via
> > an accession number, perhaps Bio.SeqUtils might be better?
> > (As an aside, recall we've talked about merging Bio.Seq* at
> > some point).
>
> Why was Bio.WWW deprecated in the first place?
>
> Personally, I would prefer to have all online database access
> centralized in one place, if possible. It makes for a less-cluttered
> root namespace and may be more intuitive in most cases. I do notice
> that for cases like Bio.Entrez, sometimes we need to only parse the
> data locally since it has been downloaded previously (hence no online
> access). To do this task, Bio.www (basically the centralized online
> module) may not be the most intuitive place to look in, for most
> people, although an argument can be made that we are still parsing
> data whose format is specific for an online resource.
>
> However, looking at the way we are doing this now (with the current
> codebase placing Entrez access and parsing in Bio.Entrez; similarly
> for Bio.ExPASy) locating the module in Bio.TAIR (or Bio.tair? PEP-8
> compliance?) looks more consistent. If we are to create a new module
> for online access (e.g. Bio.webtools, Bio.www) for Bio.TAIR, for
> consistency we may have to juggle Entrez and ExPASy around as well,
> right?
>
> Putting Bio.TAIR in Bio.SeqUtils doesn't seem... right to me. My
> impression is that SeqUtils is supposed to be for functions acting on
> sequence strings (or Seq objects) and nothing else. After all, we can
> also retrieve GenBank sequences with Biopython, but that functionality
> is separated into its own module, Bio.Entrez, not Bio.SeqUtils.
> Just my two cents :),
> Bow
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>

From p.j.a.cock at googlemail.com  Tue Jan 29 16:03:59 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 29 Jan 2013 21:03:59 +0000
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <CADEGkF5QQpvt5svMMsfM=sGW+zavQq-mpAtBu=Twf3CX5+rDKg@mail.gmail.com>
References: <CAKVJ-_62TQS9-eswnkNRtaMQGri3cDSCNgRFyqwjVofgrg+9rA@mail.gmail.com>
	<CADEGkF5QQpvt5svMMsfM=sGW+zavQq-mpAtBu=Twf3CX5+rDKg@mail.gmail.com>
Message-ID: <CAKVJ-_75uoDU_4chu8WpcrD_zRjCjwF6Qa6AjadKvBy-321UWw@mail.gmail.com>

On Tue, Jan 29, 2013 at 7:52 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi everyone,
>
> Why was Bio.WWW deprecated in the first place?
>

The flippant answer is everything under Bio.WWW was moved
or deprecated:
http://lists.open-bio.org/pipermail/biopython-dev/2008-July/004059.html

I'm trying to identify the discussions prior to that covering the moves:

Bio.WWW.ExPASy -> Bio.ExPASy
Bio.WWW.InterPro -> Bio.InterPro
Bio.WWW.NCBI -> Bio.Entrez
Bio.WWW.SCOP -> Bio.SCOP

Peter

From p.j.a.cock at googlemail.com  Tue Jan 29 16:11:29 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 29 Jan 2013 21:11:29 +0000
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <CAKVJ-_75uoDU_4chu8WpcrD_zRjCjwF6Qa6AjadKvBy-321UWw@mail.gmail.com>
References: <CAKVJ-_62TQS9-eswnkNRtaMQGri3cDSCNgRFyqwjVofgrg+9rA@mail.gmail.com>
	<CADEGkF5QQpvt5svMMsfM=sGW+zavQq-mpAtBu=Twf3CX5+rDKg@mail.gmail.com>
	<CAKVJ-_75uoDU_4chu8WpcrD_zRjCjwF6Qa6AjadKvBy-321UWw@mail.gmail.com>
Message-ID: <CAKVJ-_7pCQj4C=8MXXq=8Cr6e1rNNDuNinCJqQDhADpV_muMSg@mail.gmail.com>

On Tue, Jan 29, 2013 at 9:03 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Jan 29, 2013 at 7:52 PM, Wibowo Arindrarto
> <w.arindrarto at gmail.com> wrote:
>> Hi everyone,
>>
>> Why was Bio.WWW deprecated in the first place?
>>
>
> The flippant answer is everything under Bio.WWW was moved
> or deprecated:
> http://lists.open-bio.org/pipermail/biopython-dev/2008-July/004059.html
>
> I'm trying to identify the discussions prior to that covering the moves:
>
> Bio.WWW.ExPASy -> Bio.ExPASy
> Bio.WWW.InterPro -> Bio.InterPro
> Bio.WWW.NCBI -> Bio.Entrez
> Bio.WWW.SCOP -> Bio.SCOP

Probably this thread,
http://lists.open-bio.org/pipermail/biopython-dev/2007-November/003241.html

Also a bit more background on the NCBI Entrez side:
http://lists.open-bio.org/pipermail/biopython-dev/2008-February/003423.html

Peter

From natemsutton at yahoo.com  Tue Jan 29 16:22:57 2013
From: natemsutton at yahoo.com (Nate Sutton)
Date: Tue, 29 Jan 2013 13:22:57 -0800 (PST)
Subject: [Biopython-dev] New BioPython member
Message-ID: <1359494577.29159.YahooMailNeo@web122606.mail.ne1.yahoo.com>

Dear all,

I just recently joined the BioPython developers group and am
looking forward to contributing to BioPython! I have worked for a while
in programming, genetics, and biology, and have an M.S. in Biomedical
Informatics. After talking with some fellow contributors I have decided
to try working on https://redmine.open-bio.org/issues/3360, but I will
also work on writing some documentation on examples from the cookbook,
especially if I get stuck on the bug. If anyone wants to work on the
same things, I'd be glad to hear from you; I may be slow on the work
because I am still learning Python after coming from other languages.

-Nate

From mjldehoon at yahoo.com  Tue Jan 29 21:00:32 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 29 Jan 2013 18:00:32 -0800 (PST)
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <CAKVJ-_7pCQj4C=8MXXq=8Cr6e1rNNDuNinCJqQDhADpV_muMSg@mail.gmail.com>
Message-ID: <1359511232.14591.YahooMailClassic@web164002.mail.gq1.yahoo.com>

Bio.WWW was one of those modules that seemed like a good idea at first, but then failed to gain general acceptance. There are three problems with Bio.WWW:

1) From the module name, it's not clear what you would find in it. For example, if you want to access the Entrez database, would you first look in Bio.Entrez or in Bio.WWW? Similarly for TAIR: Would you look for it in Bio.TAIR, or in Bio.WWW?

2) The modules in Bio.WWW don't have much to do with each other, except that they access the internet. But any given user probably is mainly interested in Entrez, or ExPASy, or some other database, not in all of them at the same time.

3) The flip side of this is that a user accessing e.g. ExPASy would have to import both Bio.WWW and Bio.ExPASy to be able to use ExPASy. Doctests get more complicated also, as they would span more than one module. Here is an example from Bio.Entrez that accesses the database, and then parses the results:
>>> from Bio import Entrez
>>> Entrez.email = "Your.Name.Here at example.org"
>>> handle = Entrez.einfo() # or esearch, efetch, ...
>>> record = Entrez.read(handle)
>>> handle.close()

The ultimate question is whether we organize the code in Biopython by its functionality from a user's perspective, or by the kind of things it does. Almost all of Biopython is organized according to the former. For example, we don't have a Bio.Parsers module for all the parsers; similarly, we don't have Bio.WWW for internet access.

Best,
-Michiel.


--- On Tue, 1/29/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> From: Peter Cock <p.j.a.cock at googlemail.com>
> Subject: Re: [Biopython-dev] Namespace for online resources?
> To: "Wibowo Arindrarto" <w.arindrarto at gmail.com>
> Cc: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
> Date: Tuesday, January 29, 2013, 4:11 PM
> On Tue, Jan 29, 2013 at 9:03 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > On Tue, Jan 29, 2013 at 7:52 PM, Wibowo Arindrarto
> > <w.arindrarto at gmail.com> wrote:
> >> Hi everyone,
> >>
> >> Why was Bio.WWW deprecated in the first place?
> >>
> >
> > The flippant answer is everything under Bio.WWW was moved
> > or deprecated:
> > http://lists.open-bio.org/pipermail/biopython-dev/2008-July/004059.html
> >
> > I'm trying to identify the discussions prior to that covering the moves:
> >
> > Bio.WWW.ExPASy -> Bio.ExPASy
> > Bio.WWW.InterPro -> Bio.InterPro
> > Bio.WWW.NCBI -> Bio.Entrez
> > Bio.WWW.SCOP -> Bio.SCOP
> 
> Probably this thread,
> http://lists.open-bio.org/pipermail/biopython-dev/2007-November/003241.html
> 
> Also a bit more background on the NCBI Entrez side:
> http://lists.open-bio.org/pipermail/biopython-dev/2008-February/003423.html
> 
> Peter
> 

From kjwu at ucsd.edu  Tue Jan 29 21:09:42 2013
From: kjwu at ucsd.edu (Kevin Wu)
Date: Tue, 29 Jan 2013 18:09:42 -0800
Subject: [Biopython-dev] Trie with_prefix doesn't work as expected
Message-ID: <CAEe6yUEX_xbawFkP74-yKO+eTKo_mAU9SGO-+0PoRwU0mA8=vw@mail.gmail.com>

Hi All,

I'm attempting to use the trie implementation in Biopython to develop a
suffix trie. I'm using the with_prefix function to find all keys which
start with a sequence; however, the function doesn't return the values I
expect. I tested it with the canonical example "banana" and am a bit
confused.

from Bio.trie import trie
t = trie()
s = "BANANA"
for i in range(len(s)):  # insert all suffixes into trie
    t[s[i:]] = i

t.with_prefix("NA")  # this works as expected
# returns ['NA', 'NANA']

t.with_prefix("AN")  # this doesn't work as expected
# returns ['AN', 'ANNA'], neither of which was inserted as a key
# expected output: ["ANANA", "ANA"]
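For comparison, the expected answer can be reproduced with a minimal pure-Python trie. This is an illustrative sketch built from nested dictionaries, not the C implementation behind Bio.trie; the class name SimpleTrie is hypothetical and only the two operations used above are provided.

```python
class SimpleTrie(object):
    """Minimal dict-of-dicts trie (illustrative sketch, not Bio.trie)."""

    _END = object()  # sentinel key marking "a stored key ends at this node"

    def __init__(self):
        self._root = {}

    def __setitem__(self, key, value):
        node = self._root
        for ch in key:
            node = node.setdefault(ch, {})
        node[self._END] = value

    def with_prefix(self, prefix):
        """Return every stored key that starts with prefix (in no fixed order)."""
        node = self._root
        for ch in prefix:  # walk down to the node representing the prefix
            if ch not in node:
                return []
            node = node[ch]
        found = []
        stack = [(node, prefix)]  # depth-first search below that node
        while stack:
            current, key_so_far = stack.pop()
            for ch, child in current.items():
                if ch is self._END:
                    found.append(key_so_far)
                else:
                    stack.append((child, key_so_far + ch))
        return found


t = SimpleTrie()
s = "BANANA"
for i in range(len(s)):  # insert all suffixes, as in the example above
    t[s[i:]] = i

print(sorted(t.with_prefix("NA")))  # ['NA', 'NANA']
print(sorted(t.with_prefix("AN")))  # ['ANA', 'ANANA']
```

Since a prefix query can only return keys that were actually inserted, a result containing 'AN' or 'ANNA' points to a problem in the trie itself rather than in the calling code.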

Can anyone clarify my confusion or confirm this bug? I'm on Biopython 1.60,
Linux Mint 64-bit.

Thanks!
Kevin

From mjldehoon at yahoo.com  Tue Jan 29 21:29:09 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 29 Jan 2013 18:29:09 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CADEGkF4LL=0_5uCtMMsBveM-ce1XkxPDcTASMhAUEXKiYFLY9A@mail.gmail.com>
Message-ID: <1359512949.16659.YahooMailClassic@web164002.mail.gq1.yahoo.com>

Hi Bow,

Thanks for the explanation.

> Indeed, the doctest functions are two small, simple
> functions that make it easier to run doctests. The first
> one looks up the test directory (our Tests directory) and
> the second one simply executes the doctests.

The point of looking up the test directory is to find the example input files, right?
Have a look at Bio/Align/Applications/_Mafft.py.
Its doctest uses the complete path to the example input file:

https://github.com/biopython/biopython/commit/32a6beb1e039fa614398a7dee1c031466e8e42ed#Bio/Align/Applications/_Mafft.py

I like this solution better, since it's more straightforward, it doesn't need a new module, and also allows the user to run the example without having to figure out where the input file is located.

Best,
-Michiel.

From k.d.murray.91 at gmail.com  Tue Jan 29 22:37:46 2013
From: k.d.murray.91 at gmail.com (Kevin Murray)
Date: Wed, 30 Jan 2013 14:37:46 +1100
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <1359511232.14591.YahooMailClassic@web164002.mail.gq1.yahoo.com>
References: <CAKVJ-_7pCQj4C=8MXXq=8Cr6e1rNNDuNinCJqQDhADpV_muMSg@mail.gmail.com>
	<1359511232.14591.YahooMailClassic@web164002.mail.gq1.yahoo.com>
Message-ID: <CAH80STVHwLdBY5Ov4CuSBth4W2=ytRYHq2MB47=tdAQTfN66eg@mail.gmail.com>

Hi all,

Essentially, I agree with everything Bow and Lenna have said. If all
web-based tools are in a single root-level package, then with appropriate
documentation I think users should know where to find any function. People
are at least going to know if their required module interfaces with some
website.

I guess the problem is that moving all the web stuff into one package will
break a lot of code, which leads me back to my original idea of just copying
where stuff like TogoWS and ExPASy is located, i.e. sticking TAIR in the
root level directory.

Peter and Michiel, do you think that Lenna's suggestion is workable? Would
it make sense to go all in and simultaneously refactor parsers into
Bio.parse, Bio.*IO into Bio.io.*, etc.? Perhaps this could be delayed
until the next major release (or form the beginnings of a biopython2
branch?).

Cheers,
Kevin Murray


On 30 January 2013 13:00, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> Bio.WWW was one of those modules that seemed like a good idea at first, but
> then failed to gain general acceptance. There are three problems with Bio.WWW:
>
> 1) From the module name, it's not clear what you would find in it. For
> example, if you want to access the Entrez database, would you first look in
> Bio.Entrez or in Bio.WWW? Similarly for TAIR: Would you look for it in
> Bio.TAIR, or in Bio.WWW?
>
> 2) The modules in Bio.WWW don't have much to do with each other, except
> that they access the internet. But any given user probably is mainly
> interested in Entrez, or ExPASy, or some other database, not in all of them
> at the same time.
>
> 3) The flip side of this is that a user accessing e.g. ExPASy would have
> to import both Bio.WWW and Bio.ExPASy to be able to use ExPASy. Doctests
> get more complicated also, as they would span more than one module. Here is
> an example from Bio.Entrez that accesses the database, and then parses the
> results:
> >>> from Bio import Entrez
> >>> Entrez.email = "Your.Name.Here at example.org"
> >>> handle = Entrez.einfo() # or esearch, efetch, ...
> >>> record = Entrez.read(handle)
> >>> handle.close()
>
> The ultimate question is whether we organize the code in Biopython by
> its functionality from a user's perspective, or by the kind of things it
> does. Almost all of Biopython is organized according to the former. For
> example, we don't have a Bio.Parsers module for all the parsers; similarly,
> we don't have Bio.WWW for internet access.
>
> Best,
> -Michiel.
>
>
> --- On Tue, 1/29/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> > From: Peter Cock <p.j.a.cock at googlemail.com>
> > Subject: Re: [Biopython-dev] Namespace for online resources?
> > To: "Wibowo Arindrarto" <w.arindrarto at gmail.com>
> > Cc: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
> > Date: Tuesday, January 29, 2013, 4:11 PM
> > On Tue, Jan 29, 2013 at 9:03 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > > On Tue, Jan 29, 2013 at 7:52 PM, Wibowo Arindrarto
> > > <w.arindrarto at gmail.com> wrote:
> > >> Hi everyone,
> > >>
> > >> Why was Bio.WWW deprecated in the first place?
> > >>
> > >
> > > The flippant answer is everything under Bio.WWW was moved
> > > or deprecated:
> > > http://lists.open-bio.org/pipermail/biopython-dev/2008-July/004059.html
> > >
> > > I'm trying to identify the discussions prior to that covering the moves:
> > >
> > > Bio.WWW.ExPASy -> Bio.ExPASy
> > > Bio.WWW.InterPro -> Bio.InterPro
> > > Bio.WWW.NCBI -> Bio.Entrez
> > > Bio.WWW.SCOP -> Bio.SCOP
> >
> > Probably this thread,
> > http://lists.open-bio.org/pipermail/biopython-dev/2007-November/003241.html
> >
> > Also a bit more background on the NCBI Entrez side:
> > http://lists.open-bio.org/pipermail/biopython-dev/2008-February/003423.html
> >
> > Peter
>

From p.j.a.cock at googlemail.com  Wed Jan 30 03:52:24 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 30 Jan 2013 08:52:24 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359512949.16659.YahooMailClassic@web164002.mail.gq1.yahoo.com>
References: <CADEGkF4LL=0_5uCtMMsBveM-ce1XkxPDcTASMhAUEXKiYFLY9A@mail.gmail.com>
	<1359512949.16659.YahooMailClassic@web164002.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_5LZhd+s1+-E=TCuRzEEx-BDUFYqs6sGDm8aDgQsr+d6Q@mail.gmail.com>

On Wed, Jan 30, 2013 at 2:29 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Bow,
>
> Thanks for the explanation.
>
>> Indeed, the doctests functions are two simple small
>> functions to make it easier to run doctests. The first
>> one looks up the test directory (our Tests directory) and
>> the second one simply executes the doctest.
>
> The point of looking up the test directory is to find the
> example input files, right?

Yes. Most of the code is working out where our Test
directory is, without that it is just two lines:

import doctest
doctest.testmod()

> Have a look at Bio/Align/Applications/_Mafft.py.
> Its doctest uses the complete path to the example input file:
>
> https://github.com/biopython/biopython/commit/32a6beb1e039fa614398a7dee1c031466e8e42ed#Bio/Align/Applications/_Mafft.py
>
> I like this solution better, since it's more straightforward, it doesn't
> need a new module, and also allows the user to run the example
> without having to figure out where the input file is located.

That's a special case - the file being referred to isn't used
other than to print out a command line string. So it is fine.

The doctests we're talking about typically are for parsing,
and they need to find the file. In order to run via the main
test suite (run_tests.py) we can assume we are in the
Biopython Tests folder and therefore use relative paths.

Those relative paths won't work if trying to run the doctests
via the __name__ trick, thus the path magic which seemed
sensible to put in one place only.

We can of course remove these __name__ trick conveniences,
they are only intended to make life easier for us developers
when editing the doctests of a module. But I think it is worth
having as a private function somewhere in the code base.
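For illustration, the kind of "path magic" being discussed can be sketched as follows. This is a hypothetical sketch, not the actual code in Bio/_utils.py. It assumes a source checkout laid out as biopython/Bio/... next to biopython/Tests/. The helper walks up from the module's own file until it finds a Tests folder, switches there, runs the doctests, and then restores the original working directory. The names `find_tests_dir` and `run_doctests_from_tests_dir` are made up for this example.

```python
import doctest
import os
import tempfile


def find_tests_dir(module_file, marker="Tests"):
    """Return the nearest folder named marker, walking up from module_file.

    Hypothetical helper: assumes a source checkout laid out like
    biopython/Bio/... alongside biopython/Tests/.
    """
    here = os.path.dirname(os.path.abspath(module_file))
    while True:
        candidate = os.path.join(here, marker)
        if os.path.isdir(candidate):
            return candidate
        parent = os.path.dirname(here)
        if parent == here:  # reached the filesystem root without a match
            raise ValueError("No %r folder above %s" % (marker, module_file))
        here = parent


def run_doctests_from_tests_dir(module_file):
    """Run the __main__ module's doctests with Tests/ as working directory."""
    previous = os.getcwd()
    os.chdir(find_tests_dir(module_file))
    try:
        return doctest.testmod()
    finally:
        os.chdir(previous)  # leave the caller where it started


if __name__ == "__main__":
    # Build a throwaway checkout: <tmp>/Bio/Align/fake.py and <tmp>/Tests/
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "Tests"))
    os.makedirs(os.path.join(root, "Bio", "Align"))
    fake_module = os.path.join(root, "Bio", "Align", "fake.py")
    print(find_tests_dir(fake_module) == os.path.join(root, "Tests"))  # True
```

A module would then end with `if __name__ == "__main__": run_doctests_from_tests_dir(__file__)`. That call works from ~/biopython, ~/biopython/Bio or ~/biopython/Tests alike.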

Regards,

Peter

From p.j.a.cock at googlemail.com  Wed Jan 30 04:31:31 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 30 Jan 2013 09:31:31 +0000
Subject: [Biopython-dev] New BioPython member
In-Reply-To: <1359494577.29159.YahooMailNeo@web122606.mail.ne1.yahoo.com>
References: <1359494577.29159.YahooMailNeo@web122606.mail.ne1.yahoo.com>
Message-ID: <CAKVJ-_4HQC5V59V=jy4cpkqAAs-8FxtbAsjph3tWXwRAMFAMyQ@mail.gmail.com>

On Tue, Jan 29, 2013 at 9:22 PM, Nate Sutton <natemsutton at yahoo.com> wrote:
> Dear all,
>
> I just recently joined the BioPython developers group and am
> looking forward to contributing to BioPython! I have worked for a while
> in programming, genetics, and biology and have an M.S. in Biomedical
> Informatics. After talking with some fellow contributors I have decided
> to try working on https://redmine.open-bio.org/issues/3360 but I will
> also work on writing some documentation on examples from the cookbook,
> especially if I am stuck on the bug. If anyone wants to work on the same
> things, I'd be glad to hear that; I may be slow on the work because I am
> still learning Python after coming from other languages.
>
> -Nate

Hi Nate, and welcome.

Eric is in charge of the Bio.Phylo module, but within that the
command line application wrappers under Bio.Phylo.Applications
follow a pattern used elsewhere in Biopython.

To add a wrapper for fasttree http://www.microbesonline.org/fasttree/
have a look at the existing wrappers for PHYML and RAXML, defined in
Bio/Phylo/Applications/_Phyml.py and Bio/Phylo/Applications/_Raxml.py
(leading underscores mean private modules in Python), which are
exposed to the user via Bio/Phylo/Applications/__init__.py

In this case, I'd suggest putting the new wrapper in a new file,
Bio/Phylo/Applications/_Fasttree.py

Other similar wrappers exist under Bio.Emboss, Bio.Align, etc.
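The core idea of these wrappers is that parameters go in as keyword arguments and a command line string comes out via `str()`. The sketch below is a deliberately stripped-down, hypothetical stand-in for that pattern. The existing wrappers, as far as I know, build on the `Bio.Application` base classes and would be the model to follow for the real thing. Here only the FastTree `-nt` switch (nucleotide alignment) and a positional input file are modelled; the class name and keyword names are assumptions for illustration.

```python
class FastTreeCommandline(object):
    """Hypothetical, minimal stand-in for a Biopython-style wrapper.

    Only mimics the pattern: keyword parameters in, command line out.
    """

    def __init__(self, cmd="FastTree", input_file=None, nt=False):
        self.cmd = cmd                # name of the executable on the PATH
        self.input_file = input_file  # alignment file, e.g. FASTA
        self.nt = nt                  # True for nucleotide alignments

    def __str__(self):
        if self.input_file is None:
            raise ValueError("An input alignment file is required")
        parts = [self.cmd]
        if self.nt:
            parts.append("-nt")
        parts.append(self.input_file)
        return " ".join(parts)


cline = FastTreeCommandline(input_file="example.fasta", nt=True)
print(cline)  # FastTree -nt example.fasta
```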

Don't be shy about asking for guidance on this, or on git and GitHub.
Ultimately what I'm hoping you'll be able to do is take a fork (personal
copy of the repository) on GitHub, create a new fasttree branch,
commit your enhancements, and make a pull request. If that's
all too much for now, simply writing the new file and letting us
do the git side would be fine.

Regards,

Peter


From p.j.a.cock at googlemail.com  Wed Jan 30 04:42:23 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 30 Jan 2013 09:42:23 +0000
Subject: [Biopython-dev] Trie with_prefix doesn't work as expected
In-Reply-To: <CAEe6yUEX_xbawFkP74-yKO+eTKo_mAU9SGO-+0PoRwU0mA8=vw@mail.gmail.com>
References: <CAEe6yUEX_xbawFkP74-yKO+eTKo_mAU9SGO-+0PoRwU0mA8=vw@mail.gmail.com>
Message-ID: <CAKVJ-_5XvS899Oi7=3oDMW713oHnz6nNvEx5C-rRMUa=aashvQ@mail.gmail.com>

On Wed, Jan 30, 2013 at 2:09 AM, Kevin Wu <kjwu at ucsd.edu> wrote:
> Hi All,
>
> I'm attempting to use the trie implementation in biopython to develop a
> suffix trie. I'm using the with_prefix function to find all keys which
> start with a sequence, however, the function doesn't return values that I
> expect. I tested it with the canonical example "banana" and am a bit
> confused.
>
> from Bio.trie import trie
> t = trie()
> s = "BANANA"
> for i in range(len(s)):  # insert all suffixes into trie
>     t[s[i:]] = i
>
> t.with_prefix("NA")  # this works as expected
>>> ['NA', 'NANA']
>
> t.with_prefix("AN")
>>> ['AN', 'ANNA']  # this doesn't work as expected
>                            # expected output: ["ANANA", "ANA"]
>
> Can anyone clarify my confusion or confirm this bug? I'm on Biopython 1.60,
> Linux Mint 64-bit.

There is certainly something odd happening. I'm testing with the
current code in git (pre-Biopython 1.61) under Mac OS X.

>>> from Bio.trie import trie
>>> t = trie()
>>> s = "BANANA"
>>> for i in range(len(s)):  # insert all suffixes into trie
...     t[s[i:]] = i
...     print "%s -> %i" % (s[i:], i)
...     assert t[s[i:]] == i
...
BANANA -> 0
ANANA -> 1
NANA -> 2
ANA -> 3
NA -> 4
A -> 5
>>> t.values()
[5, 3, 1, 0, 4, 2]
>>> t.keys()
['A', 'ANA', 'ANANA', 'BANANA', 'NA', 'NANA']

These look fine:

>>> t.with_prefix("NA")
['NA', 'NANA']
>>> t.with_prefix("A")
['A', 'ANA', 'ANANA']
>>> t.with_prefix("ANA")
['ANA', 'ANANA']

As you point out, this example seems wrong:

>>> t.with_prefix("AN")
['AN', 'ANNA']

The value 'ANNA' shouldn't be in the trie.
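To pin down the expected answer, a brute-force reference is useful: a plain dictionary of suffixes plus a linear scan. This is a sketch for comparison only, not a trie and not the Bio.trie API. `suffix_table` and `with_prefix` are hypothetical helper names, and the code is written for Python 3.

```python
def suffix_table(text):
    """Map every suffix of text to its start offset, like the loop above."""
    return {text[i:]: i for i in range(len(text))}


def with_prefix(keys, prefix):
    """Brute-force reference: all stored keys starting with prefix, sorted."""
    return sorted(key for key in keys if key.startswith(prefix))


table = suffix_table("BANANA")
print(with_prefix(table, "NA"))  # ['NA', 'NANA']
print(with_prefix(table, "AN"))  # ['ANA', 'ANANA']
```

Each query is O(number of keys), so it is only suitable as a test oracle. It is no substitute for the trie itself.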

Peter

From mjldehoon at yahoo.com  Wed Jan 30 05:20:53 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 30 Jan 2013 02:20:53 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CAKVJ-_5LZhd+s1+-E=TCuRzEEx-BDUFYqs6sGDm8aDgQsr+d6Q@mail.gmail.com>
Message-ID: <1359541253.85968.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Hi Peter,

--- On Wed, 1/30/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Those relative paths won't work if trying to run the
> doctests via the __name__ trick, thus the path magic which
> seemed sensible to put in one place only.

In which case won't they work? I tried this on SeqRecord.py, and as far as I can tell, the relative paths work fine also when running the doctests from the __name__=="__main__" block, both on Unix and Windows.

Best,
-Michiel

From p.j.a.cock at googlemail.com  Wed Jan 30 06:42:21 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 30 Jan 2013 11:42:21 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359541253.85968.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <CAKVJ-_5LZhd+s1+-E=TCuRzEEx-BDUFYqs6sGDm8aDgQsr+d6Q@mail.gmail.com>
	<1359541253.85968.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_7_RdLfTP4iAGwieR-dmFSRLx_euO0Xx-qk8cRzBsNzOg@mail.gmail.com>

On Wed, Jan 30, 2013 at 10:20 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Peter,
>
> --- On Wed, 1/30/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> Those relative paths won't work if trying to run the
>> doctests via the __name__ trick, thus the path magic which
>> seemed sensible to put in one place only.
>
> In which case won't they work? I tried this on SeqRecord.py,
> and as far as I can tell, the relative paths work fine also when
> running the doctests from the __name__=="__main__" block,
> both on Unix and Windows.

Yes, no path magic works IF you are in the Tests folder, e.g.

~/biopython/Tests $ emacs ../Bio/SeqRecord.py
~/biopython/Tests $ python ../Bio/SeqRecord.py

However for anything like the following convenient alternatives
to work and run the doctests, you need some path magic:

~/biopython $ emacs Bio/SeqRecord.py
~/biopython $ python Bio/SeqRecord.py

Or,

~/biopython/Bio $ emacs SeqRecord.py
~/biopython/Bio $ python SeqRecord.py

I felt having a central convenience function to make that work
was worthwhile in order to make working on doctests easier
without code duplication. I would accept that this alone does
not justify a whole module or file like Bio/_utils.py

If you feel strongly about this, we can remove the function
run_doctest from Bio/_utils.py (it does after all serve no
real purpose in the installed library code), and just require
the current directory be the test folder.

Would you like me to make that change?

Regards,

Peter

From mjldehoon at yahoo.com  Wed Jan 30 07:10:17 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 30 Jan 2013 04:10:17 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CAKVJ-_7_RdLfTP4iAGwieR-dmFSRLx_euO0Xx-qk8cRzBsNzOg@mail.gmail.com>
Message-ID: <1359547817.36972.YahooMailClassic@web164001.mail.gq1.yahoo.com>

Hi Peter,

--- On Wed, 1/30/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> However for anything like the following convenient
> alternatives to work and run the doctests, you need
> some path magic:
> ~/biopython $ emacs Bio/SeqRecord.py
> ~/biopython $ python Bio/SeqRecord.py

Here I agree.
> Or,
> 
> ~/biopython/Bio $ emacs SeqRecord.py
> ~/biopython/Bio $ python SeqRecord.py
> 
Well I was thinking that the doctests in SeqRecord.py could use a relative path to the Tests directory, e.g. ../Tests/Quality/solexa_faked.fastq.
But I agree that this will fail again for any script in submodules.

Still I would think that there is a better way to do this, and I doubt that we are the first ones who want to access test files with doctests. I can write a short message to comp.lang.python to see if anybody has any suggestions.

Best,
-Michiel.

From arklenna at gmail.com  Wed Jan 30 12:10:40 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Wed, 30 Jan 2013 12:10:40 -0500
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <1359511232.14591.YahooMailClassic@web164002.mail.gq1.yahoo.com>
References: <CAKVJ-_7pCQj4C=8MXXq=8Cr6e1rNNDuNinCJqQDhADpV_muMSg@mail.gmail.com>
	<1359511232.14591.YahooMailClassic@web164002.mail.gq1.yahoo.com>
Message-ID: <CALfq9t++Co8TVunxk0J9JopMtmgvktiV3LKfAzU=7m=RhdMBFg@mail.gmail.com>

Michiel,

You raise an excellent point that separating the modules in this way will
complicate doctests.

Regarding point (2), is your primary concern namespace clutter or importing
efficiency?

I still maintain that the category of internet access is more fundamental
than the category of parsers. For point (1), if every database is accessed
using a WWW submodule, a user will know to look there.

Obviously moving everything would be a lot of work...

Cheers,

Lenna


On Tue, Jan 29, 2013 at 9:00 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> Bio.WWW was one of those modules that seem a good idea at first, but then
> failed to gain general acceptance. There are three problems with Bio.WWW:
>
> 1) From the module name, it's not clear what you would find in it. For
> example, if you want to access the Entrez database, would you first look in
> Bio.Entrez or in Bio.WWW? Similarly for TAIR: Would you look for it in
> Bio.TAIR, or in Bio.WWW?
>
> 2) The modules in Bio.WWW don't have much to do with each other, except
> that they access the internet. But any given user probably is mainly
> interested in Entrez, or ExPASy, or some other database, not in all of them
> at the same time.
>
> 3) The flip side of this is that a user accessing e.g. ExPASy would have
> to import both Bio.WWW and Bio.ExPASy to be able to use ExPASy. Doctests
> get more complicated also, as they would span more than one module. Here is
> an example from Bio.Entrez that accesses the database, and then parses the
> results:
> >>> from Bio import Entrez
> >>> Entrez.email = "Your.Name.Here at example.org"
> >>> handle = Entrez.einfo() # or esearch, efetch, ...
> >>> record = Entrez.read(handle)
> >>> handle.close()
>
> The ultimate question is whether we organize the code in Biopython by
> their functionality from a user perspective, or by the kind of things they
> do? Almost all of Biopython is organized according to the former. For
> example, we don't have a Bio.Parsers module for all the parsers; similarly,
> we don't have Bio.WWW for internet access.
>
> Best,
> -Michiel.
>
>
> --- On Tue, 1/29/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> > From: Peter Cock <p.j.a.cock at googlemail.com>
> > Subject: Re: [Biopython-dev] Namespace for online resources?
> > To: "Wibowo Arindrarto" <w.arindrarto at gmail.com>
> > Cc: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
> > Date: Tuesday, January 29, 2013, 4:11 PM
> > On Tue, Jan 29, 2013 at 9:03 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > > On Tue, Jan 29, 2013 at 7:52 PM, Wibowo Arindrarto <w.arindrarto at gmail.com> wrote:
> > >> Hi everyone,
> > >>
> > >> Why was Bio.WWW deprecated in the first place?
> > >>
> > >
> > > The flippant answer is everything under Bio.WWW was moved
> > > or deprecated:
> > >
> > > http://lists.open-bio.org/pipermail/biopython-dev/2008-July/004059.html
> > >
> > > I'm trying to identify the discussions prior to that covering the moves:
> > >
> > > Bio.WWW.ExPASy -> Bio.ExPASy
> > > Bio.WWW.InterPro -> Bio.InterPro
> > > Bio.WWW.NCBI -> Bio.Entrez
> > > Bio.WWW.SCOP -> Bio.SCOP
> >
> > Probably this thread,
> >
> > http://lists.open-bio.org/pipermail/biopython-dev/2007-November/003241.html
> >
> > Also a bit more background on the NCBI Entrez side:
> >
> > http://lists.open-bio.org/pipermail/biopython-dev/2008-February/003423.html
> >
> > Peter
>

From w.arindrarto at gmail.com  Wed Jan 30 12:20:39 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 30 Jan 2013 18:20:39 +0100
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <1359511232.14591.YahooMailClassic@web164002.mail.gq1.yahoo.com>
References: <CAKVJ-_7pCQj4C=8MXXq=8Cr6e1rNNDuNinCJqQDhADpV_muMSg@mail.gmail.com>
	<1359511232.14591.YahooMailClassic@web164002.mail.gq1.yahoo.com>
Message-ID: <CADEGkF7u7O=ZzwuJ8ZQyySofWoQpFAKGS247usr_mJK2dZcJMA@mail.gmail.com>

Hi everyone,

Peter, thanks for the links to the archives, I'm starting to get a
grip on why Bio.WWW was deprecated in the first place.

Michiel, thanks for the explanation. My responses are below.

My reply is a bit long, so in the interest of brevity, I'll say first
that I'm in favor of putting TAIR in Bio.TAIR now, for practical
reasons and consistency with similar modules. But I do still have some
slight objections to this approach.

> Bio.WWW was one of those modules that seem a good idea at first, but then failed to gain general acceptance. There are three problems with Bio.WWW:
>
> 1) From the module name, it's not clear what you would find in it. For example, if you want to access the Entrez database, would you first look in Bio.Entrez or in Bio.WWW? Similarly for TAIR: Would you look for it in Bio.TAIR, or in Bio.WWW?

This seems to be a naming issue, but it does not invalidate the idea
of having one central place for online access. I'll continue to refer
to this module as Bio.WWW here, but there may be other more suitable
names, such as Bio.remotedb, Bio.remote.db, Bio.www.db (or something
else) which would make the module a more intuitive place to look in,
right?

> 2) The modules in Bio.WWW don't have much to do with each other, except that they access the internet. But any given user probably is mainly interested in Entrez, or ExPASy, or some other database, not in all of them at the same time.

We could put a note in the documentation about this, right? If we are
worried about loading unnecessary modules, we can keep the __init__.py
in Bio.WWW empty, and have Entrez, ExPASy, and the others inside
Bio.WWW.
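The effect of an empty `__init__.py` can be checked directly. In the sketch below, importing one submodule of a package loads only the package and that submodule; its siblings are never imported. It builds a throwaway, hypothetical package (default name `fakewww`, with `entrez` and `expasy` submodules) in a temporary folder and is written for Python 3. None of these names are real Biopython modules.

```python
import importlib
import os
import sys
import tempfile


def build_package(root, name="fakewww"):
    """Write a hypothetical package with an empty __init__.py and two modules."""
    pkg = os.path.join(root, name)
    os.mkdir(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()  # deliberately empty
    for service in ("entrez", "expasy"):
        with open(os.path.join(pkg, service + ".py"), "w") as handle:
            handle.write("SERVICE = %r\n" % service)
    return pkg


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    build_package(root)
    sys.path.insert(0, root)
    importlib.invalidate_caches()  # make the import system see the new folder
    entrez = importlib.import_module("fakewww.entrez")
    print(entrez.SERVICE)  # entrez
    # The empty __init__.py imported nothing else, so the sibling never loaded.
    print("fakewww.expasy" in sys.modules)  # False
```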

> 3) The flip side of this is that a user accessing e.g. ExPASy would have to import both Bio.WWW and Bio.ExPASy to be able to use ExPASy. Doctests get more complicated also, as they would span more than one module. Here is an example from Bio.Entrez that accesses the database, and then parses the results:
>>>> from Bio import Entrez
>>>> Entrez.email = "Your.Name.Here at example.org"
>>>> handle = Entrez.einfo() # or esearch, efetch, ...
>>>> record = Entrez.read(handle)
>>>> handle.close()

Since ExPASy's formats may be specific to them, I was thinking their
parsers should also go in Bio.WWW (in this case, Bio.WWW.ExPASy).

Note that at the moment we also have cases where the database entry
retriever and parser lie in different submodules of the code (e.g.
fetching FASTA records with Bio.Entrez and parsing them with Bio.SeqIO).
This is OK in my opinion, however, as FASTA is a widely used format not
exclusive to Entrez. But for exclusive formats like ExPASy's or
Entrez's, it makes sense to keep them in the same module as their
database entry retriever.

> The ultimate question is whether we organize the code in Biopython by their functionality from a user perspective, or by the kind of things they do? Almost all of Biopython is organized according to the former. For example, we don't have a Bio.Parsers module for all the parsers; similarly, we don't have Bio.WWW for internet access.

Hmm... those two points are not necessarily mutually exclusive, right? I
think having a centralized module for online access still makes for a
functional grouping based on a user's perspective.

In the parsers' case, it makes sense to organize them the way we do now
as there are so many parsers. But for online access, I think it's
still manageable to put them in one directory. Just to throw the idea
around, we may also have subdirectories for different kinds of online
access (e.g. Bio.www.db for online database access, Bio.www.app for
online tools access like NCBI BLAST or HMMER).

This is not something urgent, but maybe worth thinking about and discussing :).

Cheers,
Bow


From mjldehoon at yahoo.com  Thu Jan 31 06:03:12 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 31 Jan 2013 03:03:12 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CADEGkF4LL=0_5uCtMMsBveM-ce1XkxPDcTASMhAUEXKiYFLY9A@mail.gmail.com>
Message-ID: <1359630192.62870.YahooMailClassic@web164001.mail.gq1.yahoo.com>

Dear all,

[Michiel wrote:]
> Still I would think that there is a better way to do this,
> and I doubt that we are the first ones who want to access
> test files with doctests. I can write a short message to
> comp.lang.python to see if anybody has any suggestions.

So I started writing a message to comp.lang.python, and while reading the doctest documentation to make my message understandable I realized that we can solve our problem by using the setUp and tearDown arguments to doctest.DocTestSuite. Then we put the test files in the same directory as the module we want to test, and use setUp/tearDown to let the unittest switch to this directory when needed.

This has the added benefit that the example files are easier to find for users who want to try out a doctest example.

Perhaps we'll still run into some issues if we try to implement this, but it seems a step in the right direction.
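To make the setUp/tearDown idea concrete, here is a sketch using the standard library's `doctest.DocTestSuite`, which passes `setUp` and `tearDown` through to each test case. The setUp hook records the current working directory and moves into a data folder. The tearDown hook moves back afterwards, so a doctest can refer to its example file by a bare relative path. The module, file name and `make_suite` helper are hypothetical, a temporary folder stands in for the real data location, and the code is written for Python 3.

```python
import doctest
import os
import tempfile
import types
import unittest

# Hypothetical module docstring whose example opens a file by a *relative*
# path, just as parser doctests refer to their example input files.
EXAMPLE_DOC = '''
>>> with open("example.fasta") as handle:
...     print(handle.readline().strip())
>seq1
'''


def make_suite(module, data_dir):
    """Wrap a module's doctests so each one runs inside data_dir."""
    saved = {}

    def set_up(test):
        # Called before each DocTest: remember where we were, then move.
        saved["cwd"] = os.getcwd()
        os.chdir(data_dir)

    def tear_down(test):
        # Always return to the original working directory afterwards.
        os.chdir(saved["cwd"])

    return doctest.DocTestSuite(module, setUp=set_up, tearDown=tear_down)


if __name__ == "__main__":
    data_dir = tempfile.mkdtemp()  # stands in for the folder with test files
    with open(os.path.join(data_dir, "example.fasta"), "w") as handle:
        handle.write(">seq1\nACGT\n")
    module = types.ModuleType("fake_parser", EXAMPLE_DOC)
    module.__file__ = "fake_parser.py"  # DocTestSuite reports this filename
    unittest.TextTestRunner(verbosity=2).run(make_suite(module, data_dir))
```

The same hooks work whether `data_dir` is the module's own folder or Tests/. Only the path handed to `make_suite` changes.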

Best,
-Michiel.

From p.j.a.cock at googlemail.com  Thu Jan 31 06:38:43 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 31 Jan 2013 11:38:43 +0000
Subject: [Biopython-dev] Trie with_prefix doesn't work as expected
In-Reply-To: <CAKVJ-_5XvS899Oi7=3oDMW713oHnz6nNvEx5C-rRMUa=aashvQ@mail.gmail.com>
References: <CAEe6yUEX_xbawFkP74-yKO+eTKo_mAU9SGO-+0PoRwU0mA8=vw@mail.gmail.com>
	<CAKVJ-_5XvS899Oi7=3oDMW713oHnz6nNvEx5C-rRMUa=aashvQ@mail.gmail.com>
Message-ID: <CAKVJ-_7i4+MY6OSYkm+_tqbV_ndwCvmGc=nW0gMG9PnoEobyGA@mail.gmail.com>

On Wed, Jan 30, 2013 at 9:42 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Wed, Jan 30, 2013 at 2:09 AM, Kevin Wu <kjwu at ucsd.edu> wrote:
>> Hi All,
>>
>> I'm attempting to use the trie implementation in biopython to develop a
>> suffix trie. I'm using the with_prefix function to find all keys which
>> start with a sequence, however, the function doesn't return values that I
>> expect. I tested it with the canonical example "banana" and am a bit
>> confused.
>>
>> from Bio.trie import trie
>> t = trie()
>> s = "BANANA"
>> for i in range(len(s)):  # insert all suffixes into trie
>>     t[s[i:]] = i
>>
>> t.with_prefix("NA")  # this works as expected
>>>> ['NA', 'NANA']
>>
>> t.with_prefix("AN")
>>>> ['AN', 'ANNA']  # this doesn't work as expected
>>                            # expected output: ["ANANA", "ANA"]
>>
>> Can anyone clarify my confusion or confirm this bug? I'm on Biopython 1.60,
>> Linux Mint 64-bit.
>
> There is certainly something odd happening. I'm testing with the
> current code in git (pre-Biopython 1.61) under Mac OS X.
>
>>>> from Bio.trie import trie
>>>> t = trie()
>>>> s = "BANANA"
>>>> for i in range(len(s)):  # insert all suffixes into trie
> ...     t[s[i:]] = i
> ...     print "%s -> %i" % (s[i:], i)
> ...     assert t[s[i:]] == i
> ...
> BANANA -> 0
> ANANA -> 1
> NANA -> 2
> ANA -> 3
> NA -> 4
> A -> 5
>>>> t.values()
> [5, 3, 1, 0, 4, 2]
>>>> t.keys()
> ['A', 'ANA', 'ANANA', 'BANANA', 'NA', 'NANA']
>
> These look fine:
>
>>>> t.with_prefix("NA")
> ['NA', 'NANA']
>>>> t.with_prefix("A")
> ['A', 'ANA', 'ANANA']
>>>> t.with_prefix("ANA")
> ['ANA', 'ANANA']
>
> As you point out, this example seems wrong:
>
>>>> t.with_prefix("AN")
> ['AN', 'ANNA']
>
> The value 'ANNA' shouldn't be in the trie.
>
> Peter

Thanks to Jeff Chang for a very speedy fix (sent as an attachment off list),
which I have applied to the repository:
https://github.com/biopython/biopython/commit/cd7cc7174fd4b0607381e9c58f6ae0d17cca8f74

I've also added a unit test based on Kevin's example:
https://github.com/biopython/biopython/commit/efc289c8fe2e78ad12481973e42554fa40f2ea0a

Thank you for reporting this Kevin.

Peter

P.S. Nice to hear from you again Jeff :)

I think your last commit was before we moved from CVS to git, please
let us know if you want commit access on github.

From p.j.a.cock at googlemail.com  Thu Jan 31 06:43:44 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 31 Jan 2013 11:43:44 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359630192.62870.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <CADEGkF4LL=0_5uCtMMsBveM-ce1XkxPDcTASMhAUEXKiYFLY9A@mail.gmail.com>
	<1359630192.62870.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_5Z3cGsKY8xAi_PCjPdwALPV3qg2wfihd9o1BOQE48eYQ@mail.gmail.com>

On Thu, Jan 31, 2013 at 11:03 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Dear all,
>
> [Michiel wrote:]
>> Still I would think that there is a better way to do this,
>> and I doubt that we are the first ones who want to access
>> test files with doctests. I can write a short message to
>> comp.lang.python to see if anybody has any suggestions.
>
> So I started writing a message to comp.lang.python, and while reading
> the doctest documentation to make my message understandable I
> realized that we can solve our problem by using the setUp and tearDown
> arguments to doctest.DocTestSuite. Then we put the test files in the same
> directory as the module we want to test, and use setUp/tearDown to let
> the unittest switch to this directory when needed.
>
> This has the added benefit that the example files are easier to find
> for users who want to try out a doctest example.
>
> Perhaps we'll still run into some issues if we try to implement this, but
> it seems a step in the right direction.

I don't follow what you are suggesting here. Are you suggesting putting
test files under Bio/* as well/instead or under Tests/* ?

Peter

From mjldehoon at yahoo.com  Thu Jan 31 08:46:47 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 31 Jan 2013 05:46:47 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CAKVJ-_5Z3cGsKY8xAi_PCjPdwALPV3qg2wfihd9o1BOQE48eYQ@mail.gmail.com>
Message-ID: <1359640007.58576.YahooMailClassic@web164005.mail.gq1.yahoo.com>

> I don't follow what you are suggesting here. Are you
> suggesting putting test files under Bio/* as well/instead
> or under Tests/* ?

Well the key point is that if we run the doctests from the Tests directory (with run_tests.py), we can change directory to the directory containing the module whose doctests we want to test. Then, if "python somemodule.py" can find the test files, then so can run_tests.py. We'd just need to make sure that the relative paths in somemodule.py are correct with respect to the directory in which somemodule.py resides.

But keep in mind that the unit tests in Tests and the doctests in the modules have different functions. The purpose of the unit tests is to test the Biopython code; the purpose of the doctests is to make sure the docstring examples work. So one could argue that the heavy test files should go under Tests, while simple test files just for the docstring examples should go under Bio/SomeModule.

Best,
-Michiel.

--- On Thu, 1/31/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> From: Peter Cock <p.j.a.cock at googlemail.com>
> Subject: Re: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Cc: "Wibowo Arindrarto" <w.arindrarto at gmail.com>, "BioPython-Dev Mailing List" <biopython-dev at biopython.org>
> Date: Thursday, January 31, 2013, 6:43 AM
> On Thu, Jan 31, 2013 at 11:03 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> > Dear all,
> >
> > [Michiel wrote:]
> >> Still I would think that there is a better way to do this,
> >> and I doubt that we are the first ones who want to access
> >> test files with doctests. I can write a short message to
> >> comp.lang.python to see if anybody has any suggestions.
> >
> > So I started writing a message to comp.lang.python, and while reading
> > the doctest documentation to make my message understandable I
> > realized that we can solve our problem by using the setUp and tearDown
> > arguments to doctest.DocTestSuite. Then we put the test files in the same
> > directory as the module we want to test, and use setUp/tearDown to let
> > the unittest switch to this directory when needed.
> >
> > This has the added benefit that the example files are easier to find
> > for users who want to try out a doctest example.
> >
> > Perhaps we'll still run into some issues if we try to implement this, but
> > it seems a step in the right direction.
> 
> I don't follow what you are suggesting here. Are you suggesting putting
> test files under Bio/* as well/instead or under Tests/* ?
> 
> Peter
> 

From p.j.a.cock at googlemail.com  Thu Jan 31 09:26:50 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 31 Jan 2013 14:26:50 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359640007.58576.YahooMailClassic@web164005.mail.gq1.yahoo.com>
References: <CAKVJ-_5Z3cGsKY8xAi_PCjPdwALPV3qg2wfihd9o1BOQE48eYQ@mail.gmail.com>
	<1359640007.58576.YahooMailClassic@web164005.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_5c=FmHibGh++thBzPsrAYv+_fnjEYgHwBy74Zpvkf-Cw@mail.gmail.com>

On Thu, Jan 31, 2013 at 1:46 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>> I don't follow what you are suggesting here. Are you
>> suggesting putting test files under Bio/* as well/instead
>> or under Tests/* ?
>
> Well the key point is that if we run the doctests from the Tests directory
> (with run_tests.py), we can change directory to the directory containing
> the module whose doctests we want to test. Then, if "python somemodule.py"
> can find the test files, then so can run_tests.py. We'd just need to make
> sure that the relative paths in somemodule.py are correct with respect to
> the directory in which somemodule.py resides.

I can see how that would work - put all the path changing magic into
run_tests.py (before running the doctest for Bio/x/y/z.py change to
the directory Bio/x/y and so on), and have the Bio/x/y/z.py doctests
assume they will be run from Bio/x/y only.
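On the run_tests.py side, that directory switch could be a small context manager wrapped around each module's doctests. This is a hypothetical sketch (`in_module_dir` is a made-up name, written for Python 3). It is not existing run_tests.py code. It changes into the folder holding the given module file and always changes back, even if a test raises.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def in_module_dir(module_file):
    """Temporarily make module_file's own folder the working directory.

    Relative paths in that module's doctests then resolve against e.g.
    Bio/x/y for Bio/x/y/z.py, wherever the runner itself was started.
    """
    previous = os.getcwd()
    os.chdir(os.path.dirname(os.path.abspath(module_file)))
    try:
        yield
    finally:
        os.chdir(previous)


if __name__ == "__main__":
    # Fake Bio/x/y/z.py with a sibling data file, standing in for a module.
    root = tempfile.mkdtemp()
    module_dir = os.path.join(root, "Bio", "x", "y")
    os.makedirs(module_dir)
    with open(os.path.join(module_dir, "example.txt"), "w") as handle:
        handle.write("hello\n")
    with in_module_dir(os.path.join(module_dir, "z.py")):
        # Inside the block a bare relative path finds the sibling file.
        print(os.path.exists("example.txt"))  # True
```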

> But keep in mind that the unit tests in Tests and the doctests in the modules
> have different functions. The purpose of the unit tests is to test the Biopython
> code; the purpose of the doctests is to make sure the docstring examples work.

Of course.

> So one could argue that the heavy test files should go under Tests, while
> simple test files just for the docstring examples should go under Bio/SomeModule.

Many of the unittests and doctests currently use the same example files.

However, my main objection is that I don't like the idea of putting test files
under Bio/* - I feel it should be the source code only (bar some special
cases like data files). There are probably packaging guidelines about this
somewhere... but I can't find anything immediately.

Regards,

Peter

From mjldehoon at yahoo.com  Thu Jan 31 10:33:35 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 31 Jan 2013 07:33:35 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
Message-ID: <1359646415.80564.YahooMailClassic@web164006.mail.gq1.yahoo.com>

--- On Thu, 1/31/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> However, my main objection is that I don't like the idea of
> putting test files under Bio/* 

I'm OK with using the setUp and tearDown arguments to doctest.DocTestSuite to do the directory magic, but keeping the test files under Tests/.

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Thu Jan 31 10:47:18 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 31 Jan 2013 15:47:18 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359646415.80564.YahooMailClassic@web164006.mail.gq1.yahoo.com>
References: <1359646415.80564.YahooMailClassic@web164006.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_6RR-gZ+oCmW+N2hBnutvk7OvwxiFDym_y5NZ+KGi0Sow@mail.gmail.com>

On Thu, Jan 31, 2013 at 3:33 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> --- On Thu, 1/31/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> However, my main objection is that I don't like the idea of
>> putting test files under Bio/*
>
> I'm OK with using the setUp and tearDown arguments to
> doctest.DocTestSuite to do the directory magic, but keeping the test files
> under Tests/.

As a more elegant version of the Bio._utils.run_doctest() function?

Peter

From kai.blin at biotech.uni-tuebingen.de  Tue Jan  8 10:28:31 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Tue, 08 Jan 2013 11:28:31 +0100
Subject: [Biopython-dev] More relaxed parsing of wonky GenBank files
Message-ID: <50EBF4CF.9080901@biotech.uni-tuebingen.de>

Hi folks,

I've recently pushed into production use a new version of my software
that uses BioPython parsers instead of our own hand-written parsers.

One big thing we noticed is that BioPython is waaay more picky as to
what a proper GenBank file is supposed to look like. Sadly, many of
our users seem to be creating their GenBank files with programs that
only have a rough understanding of what the file format is supposed to
look like. Most of the invalid input can safely be ignored, and I
would propose to extend the GenBank parser to cope with the most
common errors I'm seeing in day to day use.

I'm happy to provide the patches, but before starting this work I'd
like to make sure that they would be acceptable in principle. So, any
reason to rather blow up in our users' faces than to try and cope with
invalid input?

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From mjldehoon at yahoo.com  Tue Jan  8 11:11:46 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 8 Jan 2013 03:11:46 -0800 (PST)
Subject: [Biopython-dev] More relaxed parsing of wonky GenBank files
In-Reply-To: <50EBF4CF.9080901@biotech.uni-tuebingen.de>
Message-ID: <1357643506.32308.YahooMailClassic@web164005.mail.gq1.yahoo.com>

Entrez.parse has a "validate" argument to allow parsing of XML files that contain tags that are not represented in the corresponding DTD. If validate==True, the parser raises an exception when it finds a tag that is missing from the DTD; if False, the parser ignores such tags.
Maybe SeqIO.parse could have a similar "validate" argument?
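To make the suggestion concrete, here is a minimal sketch of such a switch on a toy record splitter. `split_records`, `ParserWarning` and the `validate` keyword are hypothetical names for illustration, not an existing Bio.SeqIO API. In validate mode a problem raises an exception. Otherwise it is reported as a warning and parsing carries on.

```python
import warnings


class ParserWarning(UserWarning):
    """Hypothetical warning class issued when validate=False."""


def split_records(lines, validate=True):
    """Split GenBank-style lines into records at '//' terminator lines.

    With validate=True a final record lacking its '//' terminator raises
    ValueError; with validate=False a ParserWarning is issued instead and
    the trailing lines are kept as the last record.
    """
    records, current = [], []
    for line in lines:
        if line.strip() == "//":
            records.append(current)
            current = []
        else:
            current.append(line)
    # Anything left over (other than blank lines) is an unterminated record
    if any(line.strip() for line in current):
        message = "Final record has no '//' terminator"
        if validate:
            raise ValueError(message)
        warnings.warn(message, ParserWarning)
        records.append(current)
    return records
```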

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Tue Jan  8 13:27:20 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 8 Jan 2013 13:27:20 +0000
Subject: [Biopython-dev] More relaxed parsing of wonky GenBank files
In-Reply-To: <50EBF4CF.9080901@biotech.uni-tuebingen.de>
References: <50EBF4CF.9080901@biotech.uni-tuebingen.de>
Message-ID: <CAKVJ-_5nr3n46RPbtazb4LLyFUzPzRgxjtOdbrQRK3w-c-tWQA@mail.gmail.com>

On Tuesday, January 8, 2013, Kai Blin wrote:

> Hi folks,
>
> I've recently pushed into production use a new version of my software
> that uses BioPython parsers instead of our own hand-written parsers.
>
> One big thing we noticed is that BioPython is waaay more picky as to
> what a proper GenBank file is supposed to look like. Sadly, many of
> our users seem to be creating their GenBank files with programs that
> only have a rough understanding of what the file format is supposed to
> look like. Most of the invalid input can safely be ignored, and I
> would propose to extend the GenBank parser to cope with the most
> common errors I'm seeing in day to day use.
>
> I'm happy to provide the patches, but before starting this work I'd
> like to make sure that they would be acceptable in principle. So, any
> reason to rather blow up in our users' faces than to try and cope with
> invalid input?
>
> Cheers,
> Kai
>

We already try to be tolerant, and issue warnings where it seems
safe to take a broken file (e.g. Unrecognised first line, mismatch
between length given in first line and actual sequence), but in
these cases not all the mis-formed data will or can be parsed.
Sometimes a file is broken to the point it is unwise to attempt
to parse it any further and an exception is the best course
of action.

Clearly you've found a whole load more dodgy files. If you
can work out which buggy tools are producing them, please
do try and report the issues to the tool authors. I know that
BioEdit is one source, but maintenance of that popular
free Windows tool stopped many years ago.

If you can prepare some (small) example files illustrating the
rule-breaking files (for testing), and with patches too if you like,
I will certainly review them for inclusion.

Note if the user wants an exception, they can use the warnings
module to catch and upgrade our parser warnings. As Michiel
pointed out, other bits of Biopython have an explicit validation
or strict mode like the Entrez and PDB parsers. In the case of
the PDB parser this just toggles between issuing warnings and
raising exceptions. I'm not sure if the GenBank (and any other
SeqIO parsers) need a validate/permissive option given this
can already be achieved with the warnings module. After all,
broken GenBank files should be in the minority.
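For reference, the warnings-module approach looks roughly like this. It is a self-contained sketch. `ParserWarning`, `parse_with_length` and `call_strictly` are hypothetical stand-ins, and real code would filter the parser's own warning class instead. The parser only warns, and the caller decides whether that warning should become an exception.

```python
import warnings


class ParserWarning(UserWarning):
    """Stand-in for a parser's warning class (hypothetical name)."""


def parse_with_length(declared_length, sequence):
    """Toy parser that warns, rather than fails, on a length mismatch."""
    if declared_length != len(sequence):
        warnings.warn(
            "Declared length %i but found %i bases"
            % (declared_length, len(sequence)),
            ParserWarning,
        )
    return sequence


def call_strictly(func, *args, **kwargs):
    """Run func with ParserWarning upgraded to an exception."""
    with warnings.catch_warnings():
        # The "error" action turns matching warnings into raised exceptions
        warnings.simplefilter("error", ParserWarning)
        return func(*args, **kwargs)
```

Outside `call_strictly` the default behaviour (warn and carry on) is unchanged, so permissive and strict use can coexist in the same program.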

(My understanding of the Entrez setting is also about dealing
with missing DTD files and cases where the NCBI has a
bug and their XML and DTD disagree.)

Peter


From kai.blin at biotech.uni-tuebingen.de  Tue Jan  8 13:55:42 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Tue, 08 Jan 2013 14:55:42 +0100
Subject: [Biopython-dev] More relaxed parsing of wonky GenBank files
In-Reply-To: <CAKVJ-_5nr3n46RPbtazb4LLyFUzPzRgxjtOdbrQRK3w-c-tWQA@mail.gmail.com>
References: <50EBF4CF.9080901@biotech.uni-tuebingen.de>
	<CAKVJ-_5nr3n46RPbtazb4LLyFUzPzRgxjtOdbrQRK3w-c-tWQA@mail.gmail.com>
Message-ID: <50EC255E.5040904@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2013-01-08 14:27, Peter Cock wrote:

> We already try to be tolerant, and issue warnings where it seems 
> safe to take a broken file (e.g. Unrecognised first line, mismatch 
> between length given in first line and actual sequence), but in 
> these cases not all the mis-formed data will or can be parsed. 
> Sometimes a file is broken to the point it is unwise to attempt to
> parse it any further and an exception is the best course of
> action.

Yeah, I started looking into the code and realized that it already
tries to handle a lot of special cases.

> Clearly you've found a whole load more dodgy files. If you can work
> out which buggy tools are producing them, please do try and report
> the issues to the tool authors. I know that BioEdit is one source,
> but maintenance of that popular free Windows tool stopped many
> years ago.

Unfortunately I often have no way to contact the uploaders of the
broken sequence files, unless they chose to provide an email address.

> If you can prepare some (small) example files illustrating the 
> rule-breaking files (for testing), and with patches too if you
> like, I will certainly review them for inclusion.

The two most common things I saw in the last week are single record
files without the '//' end-of-record marker, and files where the
sequence lines are indented by one space more than expected (my
favourite).
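As an illustration of why the extra indentation need not be fatal, here is a hypothetical tolerant reader for the sequence part of a record. It is not Biopython's parser code. A reader that relies on fixed column positions would presumably be thrown off by the extra space, whereas splitting each line on whitespace is indifferent to it. Running out of input without a '//' line simply ends the record.

```python
def read_origin_sequence(lines):
    """Collect sequence letters from a GenBank ORIGIN block, tolerantly.

    Hypothetical helper: extra leading spaces on sequence lines are
    ignored, and reaching the end of input without '//' simply ends
    the record.
    """
    parts = []
    in_origin = False
    for line in lines:
        stripped = line.strip()
        if not in_origin:
            in_origin = stripped.startswith("ORIGIN")
            continue
        if stripped == "//":
            break
        # A sequence line is a base-count number followed by blocks of
        # letters; splitting on whitespace makes indentation irrelevant
        fields = stripped.split()
        if fields and fields[0].isdigit():
            parts.extend(fields[1:])
    return "".join(parts).upper()
```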

I've added two sample files for these issues and am currently working on
patches that make them pass the tests.

Thanks for the comments. I'll push to my github fork once I've got
something.

Cheers,
Kai

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iQEcBAEBAgAGBQJQ7CVeAAoJEKM5lwBiwTTPGCYIANAkOxKtNPkclw66aCBWCaAH
Uz6zyCk8DTomGOy1fnBoPKI3R+tn73+8XNe6RknFDb6NL/uMD1bR4mTHi1yuHT24
7XSJp+j1JeIamMSs6hLAf4s/HIE2YoEriOe8I6lUAa2I//rxsKf2PcS7y/4Ax6XP
K/PUPODVanTCKFrpOIh2DS92lXvMJqI+cpZQ7k1ioaL+6iM9uqi9iRiV9H69Dci5
9bubA98+XvG1cnBISoQTHXpU1p1uiKU1CLxyWdl+9GTq4dCxTkeKDQvxoOd8JH/P
ksJPXyYY5u41KrDFpIMNJZpvr0PawLHcUGePKXDEvAt7wvmfDxN92xcVYsUP9w4=
=9u/w
-----END PGP SIGNATURE-----


From kai.blin at biotech.uni-tuebingen.de  Tue Jan  8 14:36:03 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Tue, 08 Jan 2013 15:36:03 +0100
Subject: [Biopython-dev] More relaxed parsing of wonky GenBank files
In-Reply-To: <50EC255E.5040904@biotech.uni-tuebingen.de>
References: <50EBF4CF.9080901@biotech.uni-tuebingen.de>
	<CAKVJ-_5nr3n46RPbtazb4LLyFUzPzRgxjtOdbrQRK3w-c-tWQA@mail.gmail.com>
	<50EC255E.5040904@biotech.uni-tuebingen.de>
Message-ID: <50EC2ED3.8000401@biotech.uni-tuebingen.de>

On 2013-01-08 14:55, Kai Blin wrote:

> Thanks for the comments. I'll push to my github fork once I've got 
> something.

Pull request is at https://github.com/biopython/biopython/pull/145

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From redmine at redmine.open-bio.org  Wed Jan  9 22:58:25 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 9 Jan 2013 22:58:25 +0000
Subject: [Biopython-dev] [Biopython - Bug #3403] (New) PDBList fails to
	download large PDB structures
Message-ID: <redmine.issue-3403.20130109225825@redmine.open-bio.org>


Issue #3403 has been reported by David Cain.

----------------------------------------
Bug #3403: PDBList fails to download large PDB structures
https://redmine.open-bio.org/issues/3403

Author: David Cain
Status: New
Priority: High
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: https://github.com/DavidCain/biopython/tree/fix_pdb_dl


The current @PDBList@ module will often fail to download large PDB files.

<pre>
>>> from Bio.PDB import PDBList
>>> pdbl = PDBList()
>>> pdbl.retrieve_pdb_file("1hgg")
Downloading PDB structure '1hgg'...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/Bio/PDB/PDBList.py", line 247, in retrieve_pdb_file
    out.writelines(gz.read())
  File "/usr/lib/python2.7/gzip.py", line 249, in read
    self._read(readsize)
  File "/usr/lib/python2.7/gzip.py", line 303, in _read
    self._read_eof()
  File "/usr/lib/python2.7/gzip.py", line 342, in _read_eof
    hex(self.crc)))
IOError: CRC check failed 0x21d7a5f7 != 0x4b5eabb6L
>>>
</pre>

The source of this problem is that the entire gzipped file must be read into memory before it's written to disk locally. With large archives, the local file can be truncated prematurely, which causes gzip to crash on extraction.

I fixed this issue on my "GitHub branch":https://github.com/DavidCain/biopython/tree/fix_pdb_dl, which I've made a pull request for.


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Wed Jan  9 23:08:28 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 9 Jan 2013 23:08:28 +0000
Subject: [Biopython-dev] [Biopython - Bug #3403] PDBList fails to download
	large PDB structures
References: <redmine.issue-3403.20130109225825@redmine.open-bio.org>
Message-ID: <redmine.journal-15062.20130109230828@redmine.open-bio.org>


Issue #3403 has been updated by David Cain.


(Pull request "here":https://github.com/biopython/biopython/pull/146)
----------------------------------------
Bug #3403: PDBList fails to download large PDB structures
https://redmine.open-bio.org/issues/3403

Author: David Cain
Status: New
Priority: High
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: https://github.com/DavidCain/biopython/tree/fix_pdb_dl


From p.j.a.cock at googlemail.com  Wed Jan  9 23:55:13 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 9 Jan 2013 23:55:13 +0000
Subject: [Biopython-dev] Fwd: [biopython] Fix broken downloading of large
	PDB structures (#146)
In-Reply-To: <biopython/biopython/pull/146@github.com>
References: <biopython/biopython/pull/146@github.com>
Message-ID: <CAKVJ-_6zi6LVva0uvWjm=ooHiho5MAR=r0Cgnxi64yG2h0fmJA@mail.gmail.com>

FYI

---------- Forwarded message ----------
From: David Cain <notifications at github.com>
Date: Wed, Jan 9, 2013 at 10:59 PM
Subject: [biopython] Fix broken downloading of large PDB structures (#146)
To: biopython/biopython <biopython at noreply.github.com>


Summary of changes

   - Fix failure to download large PDB files
   - Use with statements for safer file I/O
   - Remove obsolete parameters
   - PEP 8 changes, update documentation

Failure to download large PDB files

(See: Redmine bug #3403 <https://redmine.open-bio.org/issues/3403>)

The current PDBList module will often fail to download large PDB files.

>>> from Bio.PDB import PDBList
>>> pdbl = PDBList()
>>> pdbl.retrieve_pdb_file("1hgg")
...
IOError: CRC check failed 0x21d7a5f7 != 0x4b5eabb6L
>>>

The source of this problem is that the entire gzipped file must be read
into memory before it's written to disk locally.

Instead of this memory-intensive approach, I changed the downloading to
use urllib.urlretrieve, which is more readable and far more efficient.
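The streaming approach can be sketched as below. It is a minimal illustration, not the code from the pull request. `fetch_and_decompress` and `gunzip_file` are hypothetical names. The sketch uses the Python 3 spelling `urllib.request.urlretrieve`; in Python 2 the same function is `urllib.urlretrieve`. Both the download and the decompression move data in fixed-size chunks, so memory use does not grow with the size of the structure.

```python
import gzip
import shutil
from urllib.request import urlretrieve


def gunzip_file(gz_path, out_path):
    """Decompress gz_path to out_path without loading it all into memory."""
    # copyfileobj reads and writes in fixed-size chunks
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def fetch_and_decompress(url, gz_path, out_path):
    """Download url to gz_path on disk, then decompress it to out_path."""
    # urlretrieve writes the response to disk block by block
    urlretrieve(url, gz_path)
    gunzip_file(gz_path, out_path)
```

The `with` statement closes both files even if decompression fails part way, in line with the "safer file I/O" change listed above.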

Obsolete parameters

The long-obsolete parameters to retrieve_pdb_file() have been
removed. Formerly, the function allowed the user to specify compression
and/or a system utility to perform decompression. But all archives are
now gzipped, and PDBList uses Python's gzip module to decompress
archives. These parameters have been obsolete for over a year (they were
marked deprecated with commit
7ebf6e9 <https://github.com/biopython/biopython/commit/7ebf6e9ecb>).
------------------------------
You can merge this Pull Request by running

  git pull https://github.com/DavidCain/biopython fix_pdb_dl

Or view, comment on, or merge it at:

  https://github.com/biopython/biopython/pull/146
Commit Summary

   - Use urlretrieve to smartly download PDB archives
   - Use 'with' statement for safer file I/O
   - Collapse unwieldy if-else structure
   - PEP8 fixes within retrieve_pdb_file
   - Remove deprecated parameters
   - Update with clarifying comments
   - PEP8 fixes, updated comments for file
   - Use urlretrieve in other instance of save to disk

File Changes

   - *M* Bio/PDB/PDBList.py (217)

Patch Links:

   - https://github.com/biopython/biopython/pull/146.patch
   - https://github.com/biopython/biopython/pull/146.diff


From mjldehoon at yahoo.com  Thu Jan 10 09:21:34 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 10 Jan 2013 01:21:34 -0800 (PST)
Subject: [Biopython-dev] Bio._utils iterlen not needed
Message-ID: <1357809694.20781.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Dear all,

As far as I can tell the iterlen function in Bio._utils is not needed.
Simply calling len(items) does exactly what iterlen does, and is much faster too.

For the other functions, are they important enough to warrant a separate module? From our previous experience in Biopython, these kinds of utility modules tend to be underused. This is because the functions are simple and therefore easy to replicate, and often they do not do exactly what is needed in a particular module. Similar utility modules in Biopython in the past were forgotten after a while, and then deprecated and removed.

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Thu Jan 10 13:03:50 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 10 Jan 2013 13:03:50 +0000
Subject: [Biopython-dev] Bio._utils iterlen not needed
In-Reply-To: <1357809694.20781.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <1357809694.20781.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_4yzp-TDJ9y_ATYYoD+6vhkhj7-Xm9dW86N_ELSeQGkrQ@mail.gmail.com>

On Thu, Jan 10, 2013 at 9:21 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> Dear all,
>
> As far as I can tell the iterlen function in Bio._utils is not needed.
> Simply calling len(items) does exactly what iterlen does, and is much faster too.

No, the raison d'être for iterlen is that you can't use len on an iterator, e.g.

>>> len(iter("abcde"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'iterator' has no len()

>>> from Bio._utils import iterlen
>>> iterlen(iter("abcde"))
5

Perhaps the function needs a little more documentation...
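For readers wondering what such a function amounts to, here is a minimal sketch. It is not necessarily the code in Bio._utils. It counts by consuming the iterator, which is why it works where len() does not.

```python
def iterlen(items):
    """Return the number of items by consuming an iterable.

    Sketch only: a generator expression counts elements one by one, so
    it works on iterators that have no len(), but it exhausts them.
    """
    return sum(1 for _ in items)
```

Note the side effect: after `iterlen(it)` the iterator `it` is exhausted, so the count cannot be followed by a second pass over the same iterator.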

> For the other functions, are they important enough to warrant
> a separate module? From our previous experience in Biopython,
> these kinds of utility modules tend to be underused. This is
> because the functions are simple and therefore easy to
> replicate, and often they do not do exactly what is needed
> in a particular module. Similar utility modules in Biopython
> in the past were forgotten after a while, and then deprecated
> and removed.

Note that Bio._utils has a leading underscore - these are
therefore a 'private' API which we don't have to worry about
maintaining and deprecating in the same way as a public
API. We don't expect end users to use this module ;)

The functions here were originally helper functions used in
Bio.Phylo which are now also used in Bio.SearchIO - I think
a shared private module like this is a good compromise
between code duplication and top level modules.

Peter


From mjldehoon at yahoo.com  Thu Jan 10 17:24:14 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 10 Jan 2013 09:24:14 -0800 (PST)
Subject: [Biopython-dev] Bio._utils iterlen not needed
In-Reply-To: <CAKVJ-_4yzp-TDJ9y_ATYYoD+6vhkhj7-Xm9dW86N_ELSeQGkrQ@mail.gmail.com>
Message-ID: <1357838654.1021.YahooMailClassic@web164001.mail.gq1.yahoo.com>

--- On Thu, 1/10/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > Simply calling len(items) does exactly what iterlen
> does, and is much faster too.
> 
> No, the raison d'être for iterlen is that you can't use len
> on an iterator, e.g.
> 
> >>> len(iter("abcde"))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: object of type 'iterator' has no len()
> 
You're right. Actually it depends on the iterator. For example,
len(xrange(100)) works (xrange also returns an iterator). I guess in general an iterator can't have a len() function because it's not clear that the iterator will ever end.

That said, currently the iterlen function is used in only one place, in Bio/Phylo/BaseTree.py as follows:

    def count_terminals(self):
        return _utils.iterlen(self.find_clades(terminal=True))

But here you could simply have

    def count_terminals(self):
        clades = self.find_clades(terminal=True)
        count = 0
        for clade in clades:
            count += 1
        return count

I don't see why we need a function iterlen for this, and if we do have such a function, why it should be in Bio._utils.

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Thu Jan 10 21:16:12 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 10 Jan 2013 21:16:12 +0000
Subject: [Biopython-dev] Bio._utils iterlen not needed
In-Reply-To: <1357838654.1021.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <CAKVJ-_4yzp-TDJ9y_ATYYoD+6vhkhj7-Xm9dW86N_ELSeQGkrQ@mail.gmail.com>
	<1357838654.1021.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_7P2SX0ctkbPf-zLxwTwANjwKPxMtKLgakymTYcoooM-Q@mail.gmail.com>

On Thu, Jan 10, 2013 at 5:24 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> --- On Thu, 1/10/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> > Simply calling len(items) does exactly what iterlen
>> > does, and is much faster too.
>>
>> No, the raison d'être for iterlen is that you can't use len
>> on an iterator, e.g.
>>
>> >>> len(iter("abcde"))
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: object of type 'iterator' has no len()
>
> You're right. Actually it depends on the iterator. For example,
> len(xrange(100)) works (xrange also returns an iterator). I guess
> in general an iterator can't have a len() function because it's not
> clear that the iterator will ever end.

Good point - I didn't know xrange defined __len__, and you are
right in general - other iterator objects could also do that:

https://github.com/biopython/biopython/commit/57ae89cdedbc1e18495ffb615a3a1d2c9feb0296
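One way to take advantage of this is sketched below. It is not necessarily what the linked commit does. Strictly speaking, xrange (range in Python 3) returns a lazily evaluated sequence rather than an iterator, which is why it can support len(). The sketch tries len() first and only falls back to counting, which consumes the input, when the object has no length.

```python
def iterlen(items):
    """Count items, using len() when the object supports it.

    Lazily evaluated sequences such as xrange (range in Python 3) define
    __len__, so they can be measured without iterating; plain iterators
    do not, so we fall back to counting (which consumes them).
    """
    try:
        return len(items)
    except TypeError:
        return sum(1 for _ in items)
```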

> That said, currently the iterlen function is used in only one place,
> in Bio/Phylo/BaseTree.py as follows:

True. I hadn't checked that - I assumed it was used more
than once. If there are no other natural places where it would
make sense then yes, it might as well be done in line once,
and Bio._utils.iterlen could be removed.

When written, iterlen was in private module Bio.Phylo._sugar
(CC'ing Eric) which Bow moved to Bio._utils as he wanted to
use some of it in SearchIO.

Peter


From eric.talevich at gmail.com  Thu Jan 10 21:50:45 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Thu, 10 Jan 2013 16:50:45 -0500
Subject: [Biopython-dev] Bio._utils iterlen not needed
In-Reply-To: <CAKVJ-_7P2SX0ctkbPf-zLxwTwANjwKPxMtKLgakymTYcoooM-Q@mail.gmail.com>
References: <CAKVJ-_4yzp-TDJ9y_ATYYoD+6vhkhj7-Xm9dW86N_ELSeQGkrQ@mail.gmail.com>
	<1357838654.1021.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAKVJ-_7P2SX0ctkbPf-zLxwTwANjwKPxMtKLgakymTYcoooM-Q@mail.gmail.com>
Message-ID: <CAMC681kYPV=Z74-o2f14guYBPhnyAv7DAuGdrrtt1NLNQUOMxQ@mail.gmail.com>

On Thu, Jan 10, 2013 at 4:16 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Thu, Jan 10, 2013 at 5:24 PM, Michiel de Hoon <mjldehoon at yahoo.com>
> wrote:
> > That said, currently the iterlen function is used in only one place,
> > in Bio/Phylo/BaseTree.py as follows:
>
> True. I hadn't checked that - I assumed it was used more
> than once. If there are no other natural places where it would
> make sense then yes, it might as well be done in line once,
> and Bio._utils.iterlen could be removed.
>
> When written, iterlen was in private module Bio.Phylo._sugar
> (CC'ing Eric) which Bow moved to Bio._utils as he wanted to
> use some of it in SearchIO.
>

That's all true. I created _sugar.py during GSoC 2009 for utility code that
Bio.Phylo needed, but wasn't related to trees in any way -- similar to
Bow's thinking. I probably meant to get rid of the module entirely after
the grand merge (hence the note at the top of _sugar.py to keep the file as
small as possible). IIRC, I made it a separate function while testing
whether "enumerate" or "cnt += 1" would be faster.
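For anyone curious about that comparison, here is a small benchmark sketch. The three function names are made up for illustration. The printed timings will vary by machine and Python version, so no particular winner is claimed here.

```python
import timeit


def count_with_increment(items):
    """Count by hand with an explicit counter."""
    count = 0
    for _ in items:
        count += 1
    return count


def count_with_enumerate(items):
    """Count by letting enumerate (starting at 1) track the position."""
    count = 0
    for count, _ in enumerate(items, 1):
        pass
    return count


def count_with_sum(items):
    """Count by summing 1 for every item."""
    return sum(1 for _ in items)


if __name__ == "__main__":
    # A fresh iterator is built for every timed call
    for func in (count_with_increment, count_with_enumerate, count_with_sum):
        seconds = timeit.timeit(lambda: func(iter(range(10000))), number=200)
        print("%s: %.3f s" % (func.__name__, seconds))
```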

I have no objections to getting rid of the function now.

-E


From mjldehoon at yahoo.com  Fri Jan 11 12:36:15 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 11 Jan 2013 04:36:15 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
Message-ID: <1357907775.13851.YahooMailClassic@web164001.mail.gq1.yahoo.com>

Hi everybody,

Bio.ParserSupport has had a PendingDeprecationWarning since Biopython 1.59, so we may consider upgrading this to a BiopythonDeprecationWarning in Biopython 1.61 before removing Bio.ParserSupport. The only tricky point is that then we would also have to upgrade the PendingDeprecationWarning in Bio/Blast/NCBIStandalone.py to a BiopythonDeprecationWarning, as that code relies on Bio.ParserSupport. Bio.Blast.NCBIStandalone has had this PendingDeprecationWarning since Biopython release 1.56.
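The practical difference between the two stages is visibility. Python's default warning filters ignore PendingDeprecationWarning, so most users never see it. A minimal sketch of the upgrade follows. It uses a stand-in class, assumed to mirror Biopython's own, which derives from Warning rather than DeprecationWarning, and `warn_deprecated` is a hypothetical helper name.

```python
import warnings


class BiopythonDeprecationWarning(Warning):
    """Stand-in for the deprecation warning class defined in Bio/__init__.py.

    Because it derives from Warning rather than DeprecationWarning or
    PendingDeprecationWarning, Python's default warning filters show it.
    """


def warn_deprecated(module_name):
    """Issue the import-time warning for a module being phased out."""
    warnings.warn(
        "%s is deprecated, and will be removed in a future release "
        "of Biopython." % module_name,
        BiopythonDeprecationWarning,
        stacklevel=2,
    )


# At the top of e.g. Bio/ParserSupport.py, a call like
# warn_deprecated("Bio.ParserSupport") would take the place of the
# existing warnings.warn(..., PendingDeprecationWarning).
```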

Any objections? This may help give Bow's Bio.SearchIO module some more prominence.

On a related point, the fact that we are deprecating Bio.ParserSupport (which was a painful process) suggests that having a new module Bio._utils with a set of generic utility functions is not a good idea.

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Fri Jan 11 15:33:05 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 11 Jan 2013 15:33:05 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1357907775.13851.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <1357907775.13851.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_41GC2oRy2n8ycmc2m=kPp4_E1eJKHky0NBAUCG1iZe9A@mail.gmail.com>

On Fri, Jan 11, 2013 at 12:36 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
> Bio.ParserSupport has had a PendingDeprecationWarning since Biopython
> 1.59, so we may consider upgrading this to a BiopythonDeprecationWarning in
> Biopython 1.61 before removing Bio.ParserSupport. The only tricky point is
> that then we would also have to upgrade the PendingDeprecationWarning in
> Bio/Blast/NCBIStandalone.py to a BiopythonDeprecationWarning, as that code
> relies on Bio.ParserSupport. Bio.Blast.NCBIStandalone has had this
> PendingDeprecationWarning since Biopython release 1.56.
>
> Any objections? This may help give Bow's Bio.SearchIO module some more
> prominence.

Bow's SearchIO is using Bio.Blast.NCBIStandalone to handle plain text,
https://github.com/biopython/biopython/blob/master/Bio/SearchIO/BlastIO/blast_text.py

We'd discussed a new parser targeting just the plain text from BLAST+
(and if not too different maybe the final legacy BLAST release), which
should be less diverse than the current range of BLAST quirks built up
over the years.

> On a related point, the fact that we are deprecating Bio.ParserSupport
> (which was a painful process) suggests that having a new module Bio._utils
> with a set of generic utility functions is not a good idea.

That's why Bio._utils is a private module - we can drop/change/etc
this without worrying about breaking other people's code. The issue
with Bio.ParserSupport is it was a public API.

Regards,

Peter


From w.arindrarto at gmail.com  Sun Jan 13 15:22:13 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Sun, 13 Jan 2013 16:22:13 +0100
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CAKVJ-_41GC2oRy2n8ycmc2m=kPp4_E1eJKHky0NBAUCG1iZe9A@mail.gmail.com>
References: <1357907775.13851.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAKVJ-_41GC2oRy2n8ycmc2m=kPp4_E1eJKHky0NBAUCG1iZe9A@mail.gmail.com>
Message-ID: <CADEGkF6+rQtV_XK6Y40UJ9bn+52Ed9ZbuF5N46pUUTjJjq9c1g@mail.gmail.com>

Hi everyone,

>> Bio.ParserSupport has had a PendingDeprecationWarning since Biopython
>> 1.59, so we may consider upgrading this to a BiopythonDeprecationWarning in
>> Biopython 1.61 before removing Bio.ParserSupport. The only tricky point is
>> that then we would also have to upgrade the PendingDeprecationWarning in
>> Bio/Blast/NCBIStandalone.py to a BiopythonDeprecationWarning, as that code
>> relies on Bio.ParserSupport. Bio.Blast.NCBIStandalone has had this
>> PendingDeprecationWarning since Biopython release 1.56.
>>
>> Any objections? This may help give Bow's Bio.SearchIO module some more
>> prominence.
>
> Bow's SearchIO is using Bio.Blast.NCBIStandalone to handle plain text,
> https://github.com/biopython/biopython/blob/master/Bio/SearchIO/BlastIO/blast_text.py
>
> We'd discussed a new parser targeting just the plain text from BLAST+
> (and if not too different maybe the final legacy BLAST release), which
> should be less diverse than the current range of BLAST quirks built up
> over the years.

Yes. Until such a parser is ready, Bio.ParserSupport is still needed.
We may still deprecate it from the visible / public namespace and move
it into a private module, though. If we are also deprecating
Bio.Blast, then moving Bio.Blast.NCBIStandalone into a private module
as well seems like an ok fix for the time being.

regards,
Bow


From p.j.a.cock at googlemail.com  Tue Jan 15 15:28:07 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 15 Jan 2013 15:28:07 +0000
Subject: [Biopython-dev] buildbot issue on Python 3.1 - stdout?
In-Reply-To: <CADEGkF6Sk0N7-2Ygay7FD_hGP-ZXyhKYkpXdp=qPy9mg++_WxQ@mail.gmail.com>
References: <CAKVJ-_4HJ-Qze2UwFtnU8MkHQc3dBL0t=aJW=wdJ08aOSt8gUA@mail.gmail.com>
	<CAKVJ-_4qi0txaYWXv8axJHf_WJJc7uZiRLdo3MBx_5BtSZrR6w@mail.gmail.com>
	<CADEGkF41Fu8BzuBh_3DfRSF5SS6C8UecU7F-TXTgnd-Md44Kcw@mail.gmail.com>
	<CAKVJ-_5SjfRFiiKSatU9ds8b5ESdUTexa3TP=k+W=TPmtHoTfA@mail.gmail.com>
	<CADEGkF6Sk0N7-2Ygay7FD_hGP-ZXyhKYkpXdp=qPy9mg++_WxQ@mail.gmail.com>
Message-ID: <CAKVJ-_4J1np7P7DYfSiWGKFYNzztLOveGFGwo6QuhtpjQpovKg@mail.gmail.com>

On Fri, Dec 14, 2012 at 12:48 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi everyone,
>
>>> It's reproducible in my machine: Arch Linux 64 bit running
>>> Python3.1.5. Haven't figured out a fix yet, but trying to see if I
>>> can.
>>
>> Great. We haven't really proved this is down to a change in
>> either Python 3.1.4 or 3.1.5 but it does look likely.
>
> It's reproduced in my local 3.1.4 installation. Seems like an unfixed
> bug that went through to 3.1.5.

Regarding this issue with test_Emboss.py,
AttributeError: '_io.FileIO' object has no attribute 'read1'
http://lists.open-bio.org/pipermail/biopython-dev/2012-December/010156.html

I've now tried downgrading Python 3.1 on this machine, and it does
seem to be a problem under Python 3.1.4 and 3.1.5 but not 3.1.3.
For now I have simply left this buildslave running 3.1.3 instead. I
will also downgrade Python 3.1 on the second 64 bit Linux server.

That should take care of the annoying buildbot failures (and the
daily email I've been getting). This thread may help someone else
with a similar issue, but I don't feel inclined to try and explore in
any more depth what exactly is going wrong under Python 3.1.4
and 3.1.5, and if there is a Python bug we should report.
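For anyone hitting the same `AttributeError`, some general Python 3 `io` background may help (this is not a diagnosis of the test_Emboss.py failure, which was never tracked down): `read1()` is defined on buffered streams (`io.BufferedIOBase`), not on raw `io.FileIO` objects, so code that needs `read1()` can wrap a raw stream in `io.BufferedReader`. The helper name `ensure_read1` is hypothetical.

```python
import io
import os
import tempfile


def ensure_read1(stream):
    """Return a binary stream that supports read1(), wrapping raw streams if needed.

    read1() belongs to the buffered-stream API (io.BufferedIOBase); raw
    streams such as io.FileIO do not provide it.
    """
    if hasattr(stream, "read1"):
        return stream
    return io.BufferedReader(stream)


# Demonstration with a temporary file opened unbuffered (an io.FileIO object).
fd, path = tempfile.mkstemp()
os.write(fd, b"EMBOSS output")
os.close(fd)
raw = open(path, "rb", buffering=0)
print(hasattr(raw, "read1"))  # False
buffered = ensure_read1(raw)
print(buffered.read1(6))      # b'EMBOSS'
buffered.close()              # also closes the underlying raw file
os.remove(path)
```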

Regards,

Peter


From kai.blin at biotech.uni-tuebingen.de  Tue Jan 15 15:54:45 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Tue, 15 Jan 2013 16:54:45 +0100
Subject: [Biopython-dev] More 'fun' with GenBank
Message-ID: <50F57BC5.7020607@biotech.uni-tuebingen.de>

Hi folks,

as people are hitting my web service with all sorts of wonky GenBank
files, I've stumbled over another one that throws the GenBank parser off
track.

The culprit is a SeqFeature with a location line like:

     CDS             join(complement(4093..4338),complement(3876..4011),
                     complement(3655..3809),complement(3284..3585),
                     complement(2421..2813),complement(2057..2303))

Now, the way I read the GenBank spec, this is not a valid location line,
but should instead be a complement() of a join(). Unfortunately, the NCBI
seems to disagree with its own specs, and put the record into their
Nucleotide database as CABT02000004, which means that for all practical
purposes, it _is_ a valid GenBank file and the parser should cope.

The parser looks at this location and creates a feature on the -1
strand, from 4092:2303. This is caused by the feature location
calculation on
https://github.com/biopython/biopython/blob/master/Bio/GenBank/__init__.py#L1049
and the lines after.

In short, we do
            s = cur_feature.sub_features[0].location.start
            e = cur_feature.sub_features[-1].location.end
            cur_feature.location = SeqFeature.FeatureLocation(s, e, strand)

And when the join() looks like the record I'm dealing with, this is
clearly the wrong way around.
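To make the problem concrete, here is a plain-Python sketch that uses `(start, end)` tuples in Biopython's 0-based, end-exclusive convention in place of SeqFeature objects (an assumption so the snippet runs standalone). Taking the start of the first part and the end of the last reproduces the backwards 4092:2303 span for the CABT02000004 CDS. Taking the minimum start and maximum end gives the enclosing region, which is only meaningful when all parts lie on the same record.

```python
# Parts of join(complement(4093..4338),complement(3876..4011),...,complement(2057..2303)),
# converted from 1-based inclusive GenBank coordinates to 0-based half-open
# (start, end) pairs, in the order they are listed in the join.
PARTS = [(4092, 4338), (3875, 4011), (3654, 3809),
         (3283, 3585), (2420, 2813), (2056, 2303)]


def naive_span(parts):
    """What the parser currently does: first part's start, last part's end."""
    return parts[0][0], parts[-1][1]


def enclosing_span(parts):
    """Smallest region containing every part, whatever order they are listed in."""
    return min(start for start, _ in parts), max(end for _, end in parts)


print(naive_span(PARTS))      # (4092, 2303): start > end, i.e. backwards
print(enclosing_span(PARTS))  # (2056, 4338)
```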

I decided to fix this by sorting the subfeatures by start,end
coordinates, and that fixes this issue for me.

Unfortunately, this also breaks an existing test, the extra_keywords.gb
test.
https://github.com/biopython/biopython/blob/master/Tests/GenBank/extra_keywords.gb#L647
has a feature that has a location of

     CDS             join(153490..154269,AL121804.2:41..610,
                     AL121804.2:672..1487)

Here, we probably do want the feature from 153489:1487, even though I'm
not sure how useful such a location really is.

So I decided to fix this by sorting the subfeatures first on their ref,
and then on start, end.

This again breaks a test, this time in one_of.gb
https://github.com/biopython/biopython/blob/master/Tests/GenBank/one_of.gb#L39
where the location line is

     CDS             join(2201..2479,U18267.1:120..246,U18268.1:130..288,
                     U18270.1:4691..4788,U18269.1:82..>128)

Here, the U18270.1 record seems to come before the U18269.1 record.

Now, we're again spanning a feature into multiple contigs, none of which
are accessible to the extract() function as far as I'm aware.
Sorting the locations by start, end (and maybe ref first) at least fixes
the CABT02000004 case, where we do have a chance of getting
extract() to work.
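The effect of the two sort orders on the two test files can be sketched with `(ref, start, end)` tuples, where `ref` is `None` for parts on the record itself. This is a hypothetical stand-in for sorting sub-features, not the code in the attached patch (which was scrubbed from the archive). Coordinates are converted to 0-based half-open, and the fuzzy `>128` end is treated as 128.

```python
def sort_key_start_end(part):
    _, start, end = part
    return (start, end)


def sort_key_ref_start_end(part):
    ref, start, end = part
    # Parts on the record itself (ref None) sort before parts on other records.
    return (ref or "", start, end)


def first_start_last_end(parts):
    return parts[0][1], parts[-1][2]


# extra_keywords.gb: join(153490..154269,AL121804.2:41..610,AL121804.2:672..1487)
EXTRA_KEYWORDS = [(None, 153489, 154269),
                  ("AL121804.2", 40, 610),
                  ("AL121804.2", 671, 1487)]

# one_of.gb: join(2201..2479,U18267.1:120..246,U18268.1:130..288,
#                 U18270.1:4691..4788,U18269.1:82..>128)
ONE_OF = [(None, 2200, 2479),
          ("U18267.1", 119, 246),
          ("U18268.1", 129, 288),
          ("U18270.1", 4690, 4788),
          ("U18269.1", 81, 128)]

print(first_start_last_end(sorted(EXTRA_KEYWORDS, key=sort_key_start_end)))      # (40, 154269)
print(first_start_last_end(sorted(EXTRA_KEYWORDS, key=sort_key_ref_start_end)))  # (153489, 1487)
print([ref for ref, _, _ in sorted(ONE_OF, key=sort_key_ref_start_end)])
# [None, 'U18267.1', 'U18268.1', 'U18269.1', 'U18270.1']: U18269.1 moves ahead of U18270.1
```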

The attached patch is my proposed change, but I wanted to get some
feedback first before opening a bug and/or submitting a pull request.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-GenBank-Sort-subfeatures-by-ref-and-start-end-positi.patch
Type: text/x-patch
Size: 9059 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20130115/f7c0bb7d/attachment-0002.bin>

From p.j.a.cock at googlemail.com  Tue Jan 15 16:41:32 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 15 Jan 2013 16:41:32 +0000
Subject: [Biopython-dev] More 'fun' with GenBank
In-Reply-To: <50F57BC5.7020607@biotech.uni-tuebingen.de>
References: <50F57BC5.7020607@biotech.uni-tuebingen.de>
Message-ID: <CAKVJ-_6HpOr+ph9o8Dygu6O0LM=6wHV8wD1tHsZA3CrO32B37Q@mail.gmail.com>

On Tue, Jan 15, 2013 at 3:54 PM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
> Hi folks,
>
> as people are hitting my web service with all sorts of wonky GenBank
> files, I've stumbled over another one that throws the GenBank parser off
> track.
>
> The culprit is a SeqFeature with a location line like:
>
>      CDS             join(complement(4093..4338),complement(3876..4011),
>                      complement(3655..3809),complement(3284..3585),
>                      complement(2421..2813),complement(2057..2303))
>
> Now, the way I read the GenBank spec, this is not a valid location line,
> but should instead be a complement() of a join(). Unfortunately, the NCBI
> seems to disagree with its own specs, and put the record into their
> Nucleotide database as CABT02000004, which means that for all practical
> purposes, it _is_ a valid GenBank file and the parser should cope.

That should work - for a while GenBank and EMBL didn't agree about
joins on the complement strand, one did complement(join(a..b,c..d))
and the other join(complement(c..d),complement(a..b)), notice the
order of the sub-regions flips.
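A small sketch of why both notations describe the same spliced sequence: extracting `complement(join(a..b,c..d))` reverse complements the concatenated parts, while `join(complement(c..d),complement(a..b))` reverse complements each part and concatenates them in the listed (flipped) order. Plain strings and 0-based half-open `(start, end)` tuples stand in for Biopython's `Seq` and location objects; the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extract_complement_of_join(seq, parts):
    """complement(join(a..b,c..d)): join the parts, then reverse complement."""
    return reverse_complement("".join(seq[start:end] for start, end in parts))


def extract_join_of_complements(seq, parts):
    """join(complement(c..d),complement(a..b)): reverse complement each part,
    then concatenate them in the order listed."""
    return "".join(reverse_complement(seq[start:end]) for start, end in parts)


genome = "AAACCCGGGTTTACGT"
# a..b = 1..4 and c..d = 7..10 in GenBank's 1-based inclusive coordinates
a_b, c_d = (0, 4), (6, 10)
print(extract_complement_of_join(genome, [a_b, c_d]))   # ACCCGTTT
print(extract_join_of_complements(genome, [c_d, a_b]))  # ACCCGTTT
```

Listing the complemented parts in the unflipped order, `[a_b, c_d]`, would give `GTTTACCC` instead, which is why the sub-region order matters.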

> The parser looks at this location and creates a feature on the -1
> strand, from 4092:2303. This is caused by the feature location
> calculation on
> https://github.com/biopython/biopython/blob/master/Bio/GenBank/__init__.py#L1049
> and the lines after.
>
> In short, we do
>             s = cur_feature.sub_features[0].location.start
>             e = cur_feature.sub_features[-1].location.end
>             cur_feature.location = SeqFeature.FeatureLocation(s, e, strand)

For join feature locations, the sub-feature locations should be fine
but the overall feature location is a bit weird/broken for negative
and mixed strands.

This was one of the things the re-factoring on this branch aimed to
fix, https://github.com/peterjc/biopython/tree/f_loc4/
http://lists.open-bio.org/pipermail/biopython-dev/2012-July/009803.html

I was intending to bring this up again after the next release (which
could be later this month or February 2013), but perhaps it would
be worth doing now?

Peter


From arklenna at gmail.com  Tue Jan 15 17:19:48 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Tue, 15 Jan 2013 12:19:48 -0500
Subject: [Biopython-dev] More 'fun' with GenBank
In-Reply-To: <CAKVJ-_6HpOr+ph9o8Dygu6O0LM=6wHV8wD1tHsZA3CrO32B37Q@mail.gmail.com>
References: <50F57BC5.7020607@biotech.uni-tuebingen.de>
	<CAKVJ-_6HpOr+ph9o8Dygu6O0LM=6wHV8wD1tHsZA3CrO32B37Q@mail.gmail.com>
Message-ID: <CALfq9tK0ZA3wRZCPQ4DHyOd8+n2raAFb6z3Zf-nkZrUBLAy+8Q@mail.gmail.com>

+1 for f_loc4. The FeatureLocation/CompoundLocation classes will hopefully
make handling joins and other GenBank operators a little more logical. Not
to mention my CoordinateMapper is based on this branch!

Lenna


On Tue, Jan 15, 2013 at 11:41 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Tue, Jan 15, 2013 at 3:54 PM, Kai Blin
> <kai.blin at biotech.uni-tuebingen.de> wrote:
> > Hi folks,
> >
> > as people are hitting my web service with all sorts of wonky GenBank
> > files, I've stumbled over another one that throws the GenBank parser off
> > track.
> >
> > The culprit is a SeqFeature with a location line like:
> >
> >      CDS             join(complement(4093..4338),complement(3876..4011),
> >                      complement(3655..3809),complement(3284..3585),
> >                      complement(2421..2813),complement(2057..2303))
> >
> > Now, the way I read the GenBank spec, this is not a valid location line,
> > but should instead be a complement() of a join(). Unfortunately, the NCBI
> > seems to disagree with its own specs, and put the record into their
> > Nucleotide database as CABT02000004, which means that for all practical
> > purposes, it _is_ a valid GenBank file and the parser should cope.
>
> That should work - for a while GenBank and EMBL didn't agree about
> joins on the complement strand, one did complement(join(a..b,c..d))
> and the other join(complement(c..d),complement(a..b)), notice the
> order of the sub-regions flips.
>
> > The parser looks at this location and creates a feature on the -1
> > strand, from 4092:2303. This is caused by the feature location
> > calculation on
> > https://github.com/biopython/biopython/blob/master/Bio/GenBank/__init__.py#L1049
> > and the lines after.
> >
> > In short, we do
> >             s = cur_feature.sub_features[0].location.start
> >             e = cur_feature.sub_features[-1].location.end
> >             cur_feature.location = SeqFeature.FeatureLocation(s, e, strand)
>
> For join feature locations, the sub-feature locations should be fine
> but the overall feature location is a bit weird/broken for negative
> and mixed strands.
>
> This was one of the things the re-factoring on this branch aimed to
> fix, https://github.com/peterjc/biopython/tree/f_loc4/
> http://lists.open-bio.org/pipermail/biopython-dev/2012-July/009803.html
>
> I was intending to bring this up again after the next release (which
> could be later this month or February 2013), but perhaps it would
> be worth doing now?
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


From p.j.a.cock at googlemail.com  Tue Jan 15 19:03:51 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 15 Jan 2013 19:03:51 +0000
Subject: [Biopython-dev] More 'fun' with GenBank
In-Reply-To: <CALfq9tK0ZA3wRZCPQ4DHyOd8+n2raAFb6z3Zf-nkZrUBLAy+8Q@mail.gmail.com>
References: <50F57BC5.7020607@biotech.uni-tuebingen.de>
	<CAKVJ-_6HpOr+ph9o8Dygu6O0LM=6wHV8wD1tHsZA3CrO32B37Q@mail.gmail.com>
	<CALfq9tK0ZA3wRZCPQ4DHyOd8+n2raAFb6z3Zf-nkZrUBLAy+8Q@mail.gmail.com>
Message-ID: <CAKVJ-_77kdYzXv_Q_KVqy9jWSNJSgU+PWdVB-DzxdF8TKwUAGg@mail.gmail.com>

On Tue, Jan 15, 2013 at 5:19 PM, Lenna Peterson <arklenna at gmail.com> wrote:
> +1 for f_loc4. The FeatureLocation/CompoundLocation classes will hopefully
> make handling joins and other GenBank operators a little more logical. Not
> to mention my CoordinateMapper is based on this branch!
>
> Lenna

It will need a bit of work to rebase (some of the PEP8 changes have
touched the same lines of code), but I will try and do that this week.

Peter


From antony.lee at berkeley.edu  Tue Jan 15 21:45:19 2013
From: antony.lee at berkeley.edu (Antony Lee)
Date: Tue, 15 Jan 2013 13:45:19 -0800
Subject: [Biopython-dev] Circular sequences
Message-ID: <20130115214519.GC8511@gmail.com>

Hi all,

While working on a (more sane?) rewrite of the Restriction library
(https://github.com/biopython/biopython/pull/148), I found the need
to add a circular/linear attribute to sequence objects (just as the
currently existing Restriction library does).  So I quickly added such
a class, independently of whatever Biopython currently provides.  But
it seems like the module would be better integrated in the rest of
Biopython if it used Bio.Seq.Seq instead.

I saw that CircularSeqs have already been discussed on the mailing
list, and the main issue was with indexing and slicing.  So here are my
thoughts about how such an object should behave.  Assume a circular seq
s of length 10.  Simple indexing works modulo 10 (and negative indices
work identically).  Methods that return one or more indices return the
indices modulo 10.  Slicing with both ends defined (i.e. s[x:y(:z)])
wraps as many times as needed around the sequence if y >= x, and makes at
most one complete cycle if y < x (i.e. add len(s) as many times as
needed to y to make it bigger than x, and stop there).  Slicing with one
or both ends undefined (ie. s[:], s[x:], s[:y]) raises an IndexError
(because, well, I read s[x:] as "return the elements of s starting from
the x'th until the end"... but there is no such end.).  (A second option
would be to return an infinite iterable for s[x:], but that doesn't take
care of s[:y] anyways, not to mention the bugs that may appear from
that.)
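These indexing and slicing rules can be pinned down in a short sketch. `CircularSeq` here is a hypothetical class built on a plain string, not `Bio.Seq.Seq`. Returning plain strings from slices and rejecting non-positive steps are assumptions, since the proposal does not cover them.

```python
class CircularSeq(object):
    """Hypothetical circular sequence following the proposed indexing rules."""

    def __init__(self, data):
        if not data:
            raise ValueError("a circular sequence cannot be empty")
        self._data = str(data)

    def __len__(self):
        return len(self._data)

    def __str__(self):
        return self._data

    def __add__(self, other):
        # Per the proposal, adding circular sequences makes no sense.
        raise ValueError("adding circular sequences is not defined")

    def __getitem__(self, index):
        n = len(self._data)
        if not isinstance(index, slice):
            # Simple indexing (including negative indices) works modulo len(s).
            return self._data[index % n]
        if index.start is None or index.stop is None:
            # s[:], s[x:] and s[:y] have no natural end on a circle.
            raise IndexError("open-ended slices are not defined on a circular sequence")
        step = 1 if index.step is None else index.step
        if step <= 0:
            # Assumption: reverse steps are not covered by the proposal.
            raise ValueError("only positive slice steps are supported in this sketch")
        start, stop = index.start, index.stop
        if stop < start:
            # Add len(s) to stop as many times as needed to pass start,
            # so the slice covers at most one complete cycle.
            stop += ((start - stop) // n + 1) * n
        # Otherwise wrap around as many times as the slice asks for.
        return "".join(self._data[i % n] for i in range(start, stop, step))


s = CircularSeq("ACGTACGTAA")
print(s[12])    # G
print(s[8:12])  # AAAC
print(s[8:2])   # AAAC (stop advanced from 2 to 12)
```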

A few other issues were addressed in the previous thread.  I think that
adding CircularSeqs does not make sense at all (so __add__ raises a
ValueError), and translation can either check for the presence of a stop
codon and raise ValueError otherwise, or return an infinite iterator.

Another thing that may be useful for a restriction analysis library is a
good way to represent a dsDNA sequence with some overhangs.  Any
thoughts?

Antony


From kai.blin at biotech.uni-tuebingen.de  Wed Jan 16 08:28:06 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Wed, 16 Jan 2013 09:28:06 +0100
Subject: [Biopython-dev] More 'fun' with GenBank
In-Reply-To: <CAKVJ-_77kdYzXv_Q_KVqy9jWSNJSgU+PWdVB-DzxdF8TKwUAGg@mail.gmail.com>
References: <50F57BC5.7020607@biotech.uni-tuebingen.de>
	<CAKVJ-_6HpOr+ph9o8Dygu6O0LM=6wHV8wD1tHsZA3CrO32B37Q@mail.gmail.com>
	<CALfq9tK0ZA3wRZCPQ4DHyOd8+n2raAFb6z3Zf-nkZrUBLAy+8Q@mail.gmail.com>
	<CAKVJ-_77kdYzXv_Q_KVqy9jWSNJSgU+PWdVB-DzxdF8TKwUAGg@mail.gmail.com>
Message-ID: <50F66496.8000109@biotech.uni-tuebingen.de>

On 2013-01-15 20:03, Peter Cock wrote:

Hi Peter,

> It will need a bit of work to rebase (some of the PEP8 changes have
> touched the same lines of code), but I will try and do that this week.

Your f_loc4 branch certainly fixes the problem I'm seeing. Is there
anything I can do to help with getting it merged? I'm happy to take a
closer look at the rebase conflicts coming up during the merge if you
don't mind me asking the occasional question if I can't work out reasons
for a code change from the commit messages.

Cheers,
Kai



From Markus.Piotrowski at ruhr-uni-bochum.de  Wed Jan 16 09:42:54 2013
From: Markus.Piotrowski at ruhr-uni-bochum.de (Markus Piotrowski)
Date: 16 Jan 2013 10:42:54 +0100
Subject: [Biopython-dev] Circular sequences
In-Reply-To: <20130115214519.GC8511@gmail.com>
References: <20130115214519.GC8511@gmail.com>
Message-ID: <50F6761E.9000606@ruhr-uni-bochum.de>

Am 15.01.2013 22:45, schrieb Antony Lee:
> needed to y to make it bigger than x, and stop there).  Slicing with one
> or both ends undefined (ie. s[:], s[x:], s[:y]) raises an IndexError
> (because, well, I read s[x:] as "return the elements of s starting from
> the x'th until the end"... but there is no such end.).  (A second option
> would be to return an infinite iterable for s[x:], but that doesn't take
> care of s[:y] anyways, not to mention the bugs that may appear from
> that.)

Another possibility, which makes some biological sense (thinking on 
restriction), would be that
s[x:] (or s[:y]) returns a linear sequence starting at x and ending with 
x-1 (or ending with y and starting at y+1). Thus, s[x:] would mean 'cut 
my circle at x and return the linear sequence starting at x'.

Markus


From p.j.a.cock at googlemail.com  Wed Jan 16 10:24:13 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 16 Jan 2013 10:24:13 +0000
Subject: [Biopython-dev] Circular sequences
In-Reply-To: <50F6761E.9000606@ruhr-uni-bochum.de>
References: <20130115214519.GC8511@gmail.com>
	<50F6761E.9000606@ruhr-uni-bochum.de>
Message-ID: <CAKVJ-_5U+fYJY+aaQ+L5SaQTwQ+WT-O8_5O4zpLJv+FSJ00u7w@mail.gmail.com>

For those that missed it last time, I think the most recent in depth
discussion about circular sequences and slicing was here:

http://lists.open-bio.org/pipermail/biopython/2011-March/007075.html
...
http://lists.open-bio.org/pipermail/biopython/2011-March/007085.html

On Wed, Jan 16, 2013 at 9:42 AM, Markus Piotrowski
<Markus.Piotrowski at ruhr-uni-bochum.de> wrote:
> Am 15.01.2013 22:45, schrieb Antony Lee:
>
>> needed to y to make it bigger than x, and stop there).  Slicing with one
>> or both ends undefined (ie. s[:], s[x:], s[:y]) raises an IndexError
>> (because, well, I read s[x:] as "return the elements of s starting from
>> the x'th until the end"... but there is no such end.).  (A second option
>> would be to return an infinite iterable for s[x:], but that doesn't take
>> care of s[:y] anyways, not to mention the bugs that may appear from
>> that.)
>
>
> Another possibility, which makes some biological sense (thinking on
> restriction), would be that
> s[x:] (or s[:y]) returns a linear sequence starting at x and ending with x-1
> (or ending with y and starting at y+1). Thus, s[x:] would mean 'cut my
> circle at x and return the linear sequence starting at x'.

That's exactly the kind of behaviour which would make me nervous
given in general the Biopython sequence objects mimic Python strings.
There are many examples where that 'extra' sequence would be
unexpected. For instance, writing out line wrapped sequence data.

I would prefer an explicit method like 'cut' on a circular sequence
object returning a full length linear sequence. Similarly a 'roll' or
'rotate' method could shift the origin to a new coordinate.
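An explicit-method version might look like the following sketch. `Circular`, `cut` and `rotate` are hypothetical names taken from the discussion, not an existing Biopython API, and a plain string stands in for `Seq`. Both methods perform the same rotation; the difference is whether the result is still marked as circular.

```python
class Circular(object):
    """Hypothetical wrapper marking a sequence (a plain str here) as circular."""

    def __init__(self, seq):
        self.seq = seq

    def _rotated(self, position):
        if not self.seq:
            return self.seq
        position %= len(self.seq)
        return self.seq[position:] + self.seq[:position]

    def cut(self, position):
        """Open the circle just before `position`; returns a full-length linear str."""
        return self._rotated(position)

    def rotate(self, position):
        """Move the origin to `position`; the result is still circular."""
        return Circular(self._rotated(position))


plasmid = Circular("ACGTTT")
print(plasmid.cut(2))         # GTTTAC
print(plasmid.rotate(2).seq)  # GTTTAC, but still wrapped as Circular
```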

One simple solution to the complexities of the slice behaviour is
the practical one: They act like Python strings, basically all we
would be adding would be an 'is circular' flag and some logic about
how to propagate that flag in operations like addition and slicing.
If we went that route it might still be possible to make the find and
'in' functionality origin aware... but that may just cause trouble.

This would solve where to store if a sequence is circular (e.g. when
reading GenBank and EMBL files - or for handling restriction
enzyme digests), but other than that not add much utility.

Thoughts?

Peter


From antony.lee at berkeley.edu  Wed Jan 16 19:09:32 2013
From: antony.lee at berkeley.edu (Antony Lee)
Date: Wed, 16 Jan 2013 11:09:32 -0800
Subject: [Biopython-dev] Circular sequences
In-Reply-To: <CAKVJ-_5U+fYJY+aaQ+L5SaQTwQ+WT-O8_5O4zpLJv+FSJ00u7w@mail.gmail.com>
References: <20130115214519.GC8511@gmail.com>
	<50F6761E.9000606@ruhr-uni-bochum.de>
	<CAKVJ-_5U+fYJY+aaQ+L5SaQTwQ+WT-O8_5O4zpLJv+FSJ00u7w@mail.gmail.com>
Message-ID: <20130116190932.GA1962@gmail.com>

I think the proposed behaviour makes biological sense (now s[x:] and
s[:y] mean "cut the sequence before x (or before y) and keep the
downstream (or upstream) sequence, whatever it is").  But I understand
Peter's concerns as well.  A quick grep showed me around 400 instances
of "[:" showing up in the current code base, and as many ":]", and most
of them seem to be related to string (as opposed to sequence) processing
so checking these may not be impossible (though not very fun of course),
but this won't protect against future misuses of sequence indexing.

So I think methods such as cut and roll are fine too (and go back to
raising IndexError when either or both ends of the slice are None).  Now
it would be the responsibility of sequence-consuming functions to start
by .cut()ting the sequence before slicing it.

find and __contains__ can be implemented easily (though perhaps
inelegantly) by changing "foo in circular(bar)" into "foo in linear(bar)
+ linear(bar)[:len(foo)-1]" (which is essentially what is done in both
Restriction libraries, the old and the new one).
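The same trick as a small function, as a sketch (the name `circular_find` is hypothetical). Because the extended text is only `len(pattern) - 1` letters longer than the sequence, any match must start before `len(seq)`, so the returned index is already a valid position on the circle. An empty pattern needs a guard, since `seq[:len(pattern) - 1]` would then become `seq[:-1]`.

```python
def circular_find(pattern, seq):
    """Find `pattern` in a circular `seq`, allowing matches across the origin.

    Extends the linear sequence with its first len(pattern) - 1 letters.
    Returns the start index (always < len(seq)) or -1.
    Assumes len(pattern) <= len(seq).
    """
    if not pattern:
        return 0
    extended = seq + seq[:len(pattern) - 1]
    return extended.find(pattern)


def circular_contains(pattern, seq):
    return circular_find(pattern, seq) != -1


print(circular_find("TAC", "ACGT"))  # 3: the match spans the origin
print("TAC" in "ACGT")               # False for the linear sequence
```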

Finally let me say that right now I don't use most of the rest
of Biopython (and don't really think I'll use most of it in the near
future) so I care little about whether this specific feature gets
integrated or not; however I do think it is needed in a proper
restriction analysis library.  Indeed, one could say that we just have
to add a "circular=True|False" keyword argument to methods such as
search and catalyze, but that is not enough to distinguish e.g. if a
circular plasmid is digested once or not at all (of course, one can
check separately but what I mean there is that circularity is a natural
"output" of the functions, not just input).

Antony
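The point that circularity is an output as well as an input can be illustrated with a toy digest. Cutting a circular molecule one or more times yields as many linear fragments as there are cuts, cutting a linear one yields one more, and an uncut circle stays circular. `digest` is a hypothetical helper taking 0-based cut positions (cut just before that index), not the Bio.Restriction API.

```python
def digest(seq, cuts, circular=False):
    """Cut `seq` just before each 0-based position in `cuts`.

    Returns a list of (fragment, is_circular) pairs. Assumes seq is non-empty.
    """
    n = len(seq)
    if circular:
        positions = sorted(set(c % n for c in cuts))
        if not positions:
            # Never cut: the molecule is still a circle.
            return [(seq, True)]
        # The first cut opens the circle; re-index the sequence from there.
        first = positions[0]
        seq = seq[first:] + seq[:first]
        positions = [p - first for p in positions[1:]]
    else:
        # Cuts at either end of a linear molecule do not create new fragments.
        positions = sorted(set(c for c in cuts if 0 < c < n))
    bounds = [0] + positions + [n]
    return [(seq[a:b], False) for a, b in zip(bounds, bounds[1:])]


plasmid = "AAAACCCCGGGG"
print(digest(plasmid, [], circular=True))    # [('AAAACCCCGGGG', True)]
print(digest(plasmid, [4], circular=True))   # [('CCCCGGGGAAAA', False)]
print(digest(plasmid, [4], circular=False))  # [('AAAA', False), ('CCCCGGGG', False)]
```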

On Wed, Jan 16, 2013 at 10:24:13AM +0000, Peter Cock wrote:
> For those that missed it last time, I think the most recent in depth
> discussion about circular sequences and slicing was here:
> 
> http://lists.open-bio.org/pipermail/biopython/2011-March/007075.html
> ...
> http://lists.open-bio.org/pipermail/biopython/2011-March/007085.html
> 
> On Wed, Jan 16, 2013 at 9:42 AM, Markus Piotrowski
> <Markus.Piotrowski at ruhr-uni-bochum.de> wrote:
> > Am 15.01.2013 22:45, schrieb Antony Lee:
> >
> >> needed to y to make it bigger than x, and stop there).  Slicing with one
> >> or both ends undefined (ie. s[:], s[x:], s[:y]) raises an IndexError
> >> (because, well, I read s[x:] as "return the elements of s starting from
> >> the x'th until the end"... but there is no such end.).  (A second option
> >> would be to return an infinite iterable for s[x:], but that doesn't take
> >> care of s[:y] anyways, not to mention the bugs that may appear from
> >> that.)
> >
> >
> > Another possibility, which makes some biological sense (thinking on
> > restriction), would be that
> > s[x:] (or s[:y]) returns a linear sequence starting at x and ending with x-1
> > (or ending with y and starting at y+1). Thus, s[x:] would mean 'cut my
> > circle at x and return the linear sequence starting at x'.
> 
> That's exactly the kind of behaviour which would make me nervous
> given in general the Biopython sequence objects mimic Python strings.
> There are many examples where that 'extra' sequence would be
> unexpected. For instance, writing out line wrapped sequence data.
> 
> I would prefer an explicit method like 'cut' on a circular sequence
> object returning a full length linear sequence. Similarly a 'roll' or
> 'rotate' method could shift the origin to a new coordinate.
> 
> One simple solution to the complexities of the slice behaviour is
> the practical one: They act like Python strings, basically all we
> would be adding would be an 'is circular' flag and some logic about
> how to propagate that flag in operations like addition and slicing.
> If we went that route it might still be possible to make the find and
> 'in' functionality origin aware... but that may just cause trouble.
> 
> This would solve where to store if a sequence is circular (e.g. when
> reading GenBank and EMBL files - or for handling restriction
> enzyme digests), but other than that not add much utility.
> 
> Thoughts?
> 
> Peter


From redmine at redmine.open-bio.org  Fri Jan 18 09:43:26 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Fri, 18 Jan 2013 09:43:26 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15065.20130118094326@redmine.open-bio.org>


Issue #3395 has been updated by Michiel de Hoon.


Michał, can you confirm that the fixed Bio.trie works for you? Then we can close this bug report.

----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


Imagine I have a Biopython trie:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'w')
tr = trie.trie()
# fill in the trie
trie.save(f, tr)  # save the trie object tr (not the trie module)
f.close()         # close the gzip stream so all data is flushed to disk

Now /tmp/trie.dat.gz is about 50 MB. Let's try to read it:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'r')
tr = trie.load(f)
f.close()

Unfortunately I'm getting a meaningless error saying:
"loading failed for some reason"

Any hints?


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Fri Jan 18 15:17:43 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Fri, 18 Jan 2013 15:17:43 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15066.20130118151743@redmine.open-bio.org>


Issue #3395 has been updated by Michał Nowotka.


Can you just give me two more weeks? I need some time to evaluate it.


From eric.talevich at gmail.com  Sat Jan 19 01:20:11 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Fri, 18 Jan 2013 20:20:11 -0500
Subject: [Biopython-dev] Support for NeXML and RDF trees in Bio.Phylo
In-Reply-To: <CAAzEd5Bz5xvc2Bz80Ru+FbUbJK-WnAjfvLv70SfkPZup89NGRQ@mail.gmail.com>
References: <CAAzEd5AvRgkr=UYmqwHPH+cBYXCS+5yLHs=bHjCDxN1rY_aGFg@mail.gmail.com>
	<CAMC681=OrHJmfEbxWz=8-qzo2rEVJaqFeqgihiAMVi6No7GBCw@mail.gmail.com>
	<CAAzEd5Bz5xvc2Bz80Ru+FbUbJK-WnAjfvLv70SfkPZup89NGRQ@mail.gmail.com>
Message-ID: <CAMC681ndsaK0J0iR==O7djsG1KQxdbp6TWd7sgGDVySP2OHuSA@mail.gmail.com>

On Fri, Dec 28, 2012 at 10:50 AM, Ben Morris <ben at bendmorris.com> wrote:

> On Tue, Dec 25, 2012 at 2:18 AM, Eric Talevich <eric.talevich at gmail.com> wrote:
> >
> > On Mon, Dec 24, 2012 at 8:58 AM, Ben Morris <ben at bendmorris.com> wrote:
> >>
> >> Hi all,
> >>
> >> I've implemented support for two new phylogenetic tree formats: NeXML and
> >> RDF (conforming to the Comparative Data Analysis Ontology).
> >>
> >> I noticed that NeXML support was planned, but I didn't see anyone working
> >> on it on GitHub and the feature request hadn't been updated in about a
> >> year, so I went ahead and implemented a simple version. At first I tried
> >> the generateDS.py approach, but the generated writer doesn't give very much
> >> control over the output, so I ended up writing my own parser/writer using
> >> ElementTree.
> >>
> >> As for the RDF/CDAO format, AFAIK this is not a format that's supported by
> >> any other phylogenetic libraries, so I'm not sure how useful this is to
> >> everyone else. It provides a simple, standards-compliant format that can be
> >> imported to a triple store and supports annotation. We'll be using it at
> >> NESCent so I wanted to make it available to everyone else as well. The
> >> parser and writer require the Redland Python bindings.
> >>
> >> The code is available in my fork of Biopython,
> >>
> >>     https://github.com/bendmorris/biopython
> >>
> >> under branches "cdao" and "nexml." I'd love to get everyone's thoughts
> and
> >> see if these contributions would be a good fit for the Biopython
> project.
> >
> >
> >
> > Thanks for letting us know! I'll try it out soonish. Looking at the code
> > on your nexml branch, I have a few comments:
> >
> > - The parser uses ElementTree.parse rather than iterparse, so in its
> > current state it would not be able to parse massive files (those larger
> > than available RAM). Worth fixing eventually?
>
> Great point. I rewrote it to use iterparse instead.
>
> > - The parser creates Newick.Tree and Newick.Clade objects, which is
> > nearly correct in my opinion. I would suggest subclassing BaseTree.Tree and
> > BaseTree.Clade to create NeXML-specific Tree and Clade classes, even if you
> > don't have any additional attributes to attach to those classes at the
> > moment. (These would go in a new file NeXML.py, similar to PhyloXML.py and
> > PhyloXMLIO.py.)
>
> Went ahead and did this as well.
>

Thanks! Sorry for the pace of this, I'm in the midst of a dissertation.
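For readers following the parse-versus-iterparse point above, here is a minimal sketch of the streaming pattern using only the standard library's xml.etree.ElementTree.iterparse. It is not Ben's parser. The function names, the per-tree node/edge summary and the cut-down NeXML fragment are assumptions for illustration; real NeXML trees also carry attributes such as xsi:type and links to OTUs.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an ElementTree "{namespace}" prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def iter_tree_sizes(handle):
    """Yield (tree_id, node_count, edge_count) for each <tree> element.

    ET.iterparse reports an element once its end tag has been read, so the
    whole <tree> is available at that point. Clearing it afterwards lets its
    children be garbage-collected instead of keeping the full document.
    """
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if local_name(elem.tag) != "tree":
            continue
        children = [local_name(child.tag) for child in elem]
        yield elem.get("id"), children.count("node"), children.count("edge")
        elem.clear()  # drop the finished subtree's children and attributes


# Simplified, hypothetical NeXML-like input (not a complete NeXML document).
EXAMPLE = b"""<?xml version="1.0"?>
<nex:nexml xmlns:nex="http://www.nexml.org/2009" xmlns="http://www.nexml.org/2009">
  <trees id="Trees1" otus="Otus1">
    <tree id="tree1">
      <node id="n1" root="true"/>
      <node id="n2"/>
      <node id="n3"/>
      <edge id="e1" source="n1" target="n2"/>
      <edge id="e2" source="n1" target="n3"/>
    </tree>
    <tree id="tree2">
      <node id="m1" root="true"/>
      <node id="m2"/>
      <edge id="f1" source="m1" target="m2"/>
    </tree>
  </trees>
</nex:nexml>
"""

for tree_id, n_nodes, n_edges in iter_tree_sizes(io.BytesIO(EXAMPLE)):
    print(tree_id, n_nodes, n_edges)
# tree1 3 2
# tree2 2 1
```

Memory then stays roughly proportional to the largest single tree rather than the whole file. The emptied tree elements themselves remain attached to their parent, which is usually negligible.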


> > - The 'confidence' or 'confidences' attribute isn't used (e.g. for
> > bootstrap support values). Does NeXML define it?
>
> Not that I'm aware of, but I'm not sure. I searched
> http://nexml.org/nexml/html/doc/schema-1/ and didn't find anything.
> I'm going to ask some people who know more about this than I do.
>

I would like for Bio.Phylo's I/O modules to be able to successfully
round-trip a file from Newick to phyloXML to NeXML and back to Newick
without losing support values. I found these two examples of how to add
this data to a NeXML document by referencing CDAO:
https://www.nescent.org/wg_evoinfo/NeXML_Test_Files#Bootstraps_represented_using_the_.22meta.22_tag
https://www.nescent.org/wg_evoinfo/NeXML_Test_Files#Bootstraps_represented_without_new_tags_or_elements

That's the standard way to store bootstrap supports in NeXML (Hilmar
confirms). How do your NeXML and CDAO modules interact, if at all? Would
the CDAO modules be useful to properly support NeXML metadata like
support/confidence values, or would it be simpler to just hard-code the few
tags we're specifically interested in?

Relatedly, those look like good test files. I see you've started writing
NeXML unit tests already; if you would like help with any of this, just let
me know.

-Eric


From mjldehoon at yahoo.com  Sun Jan 20 07:30:24 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 19 Jan 2013 23:30:24 -0800 (PST)
Subject: [Biopython-dev] Bio.Motif update
Message-ID: <1358667024.24762.YahooMailClassic@web164004.mail.gq1.yahoo.com>

Dear all,

As we discussed previously, I've been going over Bio.Motif to update it and make its usage more explicit. I'm pretty much done. While I have been uploading my changes to the main biopython github repository, this does not mean that these changes are final; comments and suggestions for changes are welcome.

In many cases, there is a difference in the syntax between the old Bio.Motif and the new Bio.Motif. For example, motif.consensus is a method in the old Bio.Motif, but a property in the new Bio.Motif.
While I tried to put PendingDeprecationWarnings on all changes consistently, there may be some corner cases that I missed.
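To make the syntax change concrete: code written for the old module calls motif.consensus(), while the new module uses attribute access, motif.consensus. The sketch below uses a hypothetical SketchMotif class, not the actual Bio.motifs code, to show a consensus exposed as a read-only property computed from equal-length aligned instances. The seven example sites are illustrative only.

```python
from collections import Counter


class SketchMotif:
    """Hypothetical motif built from equal-length aligned instances."""

    def __init__(self, instances):
        if len({len(s) for s in instances}) != 1:
            raise ValueError("instances must be non-empty and of equal length")
        self.instances = [str(s) for s in instances]

    @property
    def consensus(self):
        # Most frequent letter in each column; on a tie, Counter.most_common
        # keeps the letter that was encountered first in that column.
        return "".join(
            Counter(column).most_common(1)[0][0]
            for column in zip(*self.instances)
        )


m = SketchMotif(["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"])
print(m.consensus)  # TACGC  (attribute access, no call)
```

With this design, calling m.consensus() raises TypeError because the property returns a plain string. That is why code written against the old method form has to be updated.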

For this reason, and also to make the documentation more understandable, it may be better to put the new Bio.Motif code in a module Bio.motifs, to put the old Bio.Motif code back into Bio.Motif (so that Bio.Motif in release 1.61 will be identical to the Bio.Motif in release 1.60), and (assuming that we are happy with the new Bio.motifs modules) put a PendingDeprecationWarning on Bio.Motif as a whole. Then in the documentation we'll have one chapter on Bio.Motif and one chapter on Bio.motifs. Also we'll have one set of tests for Bio.Motif, and one set of tests for Bio.motifs.

Any objections to creating a separate Bio.motifs module?

Here you can find the relevant chapter in the current documentation on the new Bio.Motif:

http://biopython.org/DIST/docs/tutorial/Tutorial-dev.html#htoc190

Best,
-Michiel


From p.j.a.cock at googlemail.com  Sun Jan 20 19:03:45 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 20 Jan 2013 19:03:45 +0000
Subject: [Biopython-dev] More 'fun' with GenBank
In-Reply-To: <50F66496.8000109@biotech.uni-tuebingen.de>
References: <50F57BC5.7020607@biotech.uni-tuebingen.de>
	<CAKVJ-_6HpOr+ph9o8Dygu6O0LM=6wHV8wD1tHsZA3CrO32B37Q@mail.gmail.com>
	<CALfq9tK0ZA3wRZCPQ4DHyOd8+n2raAFb6z3Zf-nkZrUBLAy+8Q@mail.gmail.com>
	<CAKVJ-_77kdYzXv_Q_KVqy9jWSNJSgU+PWdVB-DzxdF8TKwUAGg@mail.gmail.com>
	<50F66496.8000109@biotech.uni-tuebingen.de>
Message-ID: <CAKVJ-_5Tj+POzmYLvHx_nScjE6x9A-HgQPRQ_Ec_Bu1VGGjH6Q@mail.gmail.com>

On Wed, Jan 16, 2013 at 8:28 AM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
> On 2013-01-15 20:03, Peter Cock wrote:
>
> Hi Peter,
>
>> It will need a bit of work to rebase (some of the PEP8 changes have
>> touched the same lines of code), but I will try and do that this week.
>
> Your f_loc4 branch certainly fixes the problem I'm seeing. Is there
> anything I can do to help with getting it merged? I'm happy to give a
> closer look at the rebase conflicts coming up during the merge if you
> don't mind me asking the occasional question if I can't work out reasons
> for a code change from the commit messages.
>
> Cheers,
> Kai

I've done the rebase - all the tests still pass so if I missed anything
it should just be minor:

https://github.com/peterjc/biopython/commits/f_loc4 (old)
https://github.com/peterjc/biopython/commits/f_loc5 (rebased)

Kai - would you mind retesting with f_loc5 (the rebased branch)?

Everyone - does it seem sensible to include this now, ready for
the upcoming release (*)? Or perhaps just after the release?

Peter

(*) See other thread about Bio.Motif, which I think is all we need
to address before doing the release:
http://lists.open-bio.org/pipermail/biopython-dev/2013-January/010235.html


From bartek at rezolwenta.eu.org  Sun Jan 20 22:34:42 2013
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Sun, 20 Jan 2013 23:34:42 +0100
Subject: [Biopython-dev] Bio.Motif update
In-Reply-To: <1358667024.24762.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <1358667024.24762.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID: <CABHxouWi3xNub97t3vP1hPAQTFTbMa4qdFnq3FLTRnA39t4uWA@mail.gmail.com>

Hi,

great job Michiel! It looks very nice overall. As the code that will
be using the new library needs to be changed, I would vote for the
change in the namespace, but given that the userbase of the Bio.Motif
was quite limited, I think it wouldn't cause major problems to keep
the name as is.

best
Bartek

On Sun, Jan 20, 2013 at 8:30 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Dear all,
>
> As we discussed previously, I've been going over Bio.Motif to update it and make its usage more explicit. I'm pretty much done. While I have been uploading my changes to the main biopython github repository, this does not mean that these changes are final; comments and suggestions for changes are welcome.
>
> In many cases, there is a difference in the syntax between the old Bio.Motif and the new Bio.Motif. For example, motif.consensus is a method in the old Bio.Motif, but a property in the new Bio.Motif.
> While I tried to put PendingDeprecationWarnings on all changes consistently, there may be some corner cases that I missed.
>
> For this reason, and also to make the documentation more understandable, it may be better to put the new Bio.Motif code in a module Bio.motifs, to put the old Bio.Motif code back into Bio.Motif (so that Bio.Motif in release 1.61 will be identical to the Bio.Motif in release 1.60), and (assuming that we are happy with the new Bio.motifs modules) put a PendingDeprecationWarning on Bio.Motif as a whole. Then in the documentation we'll have one chapter on Bio.Motif and one chapter on Bio.motifs. Also we'll have one set of tests for Bio.Motif, and one set of tests for Bio.motifs.
>
> Any objections to creating a separate Bio.motifs module?
>
> Here you can find the relevant chapter in the current documentation on the new Bio.Motif:
>
> http://biopython.org/DIST/docs/tutorial/Tutorial-dev.html#htoc190
>
> Best,
> -Michiel
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


-- 
Bartek Wilczynski


From kai.blin at biotech.uni-tuebingen.de  Mon Jan 21 09:49:31 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Mon, 21 Jan 2013 10:49:31 +0100
Subject: [Biopython-dev] More 'fun' with GenBank
In-Reply-To: <CAKVJ-_5Tj+POzmYLvHx_nScjE6x9A-HgQPRQ_Ec_Bu1VGGjH6Q@mail.gmail.com>
References: <50F57BC5.7020607@biotech.uni-tuebingen.de>
	<CAKVJ-_6HpOr+ph9o8Dygu6O0LM=6wHV8wD1tHsZA3CrO32B37Q@mail.gmail.com>
	<CALfq9tK0ZA3wRZCPQ4DHyOd8+n2raAFb6z3Zf-nkZrUBLAy+8Q@mail.gmail.com>
	<CAKVJ-_77kdYzXv_Q_KVqy9jWSNJSgU+PWdVB-DzxdF8TKwUAGg@mail.gmail.com>
	<50F66496.8000109@biotech.uni-tuebingen.de>
	<CAKVJ-_5Tj+POzmYLvHx_nScjE6x9A-HgQPRQ_Ec_Bu1VGGjH6Q@mail.gmail.com>
Message-ID: <50FD0F2B.1080606@biotech.uni-tuebingen.de>


On 2013-01-20 20:03, Peter Cock wrote:

> Kai - would you mind retesting with f_loc5 (the rebased branch)?

The location of the feature that caused trouble for me still looks
correct. I'm currently running some more sequences, but I'm pretty
confident that the code will work just fine. The tests I added to the
GenBank parser code for all the problem cases I had pass, after all. :)

> Everyone - does it seem sensible to include this now, ready for the
> upcoming release (*)? Or perhaps just after the release?

I'd prefer having this in the next release if possible, but of course
if the release after that is coming up within a reasonable time frame,
that would work as well.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From redmine at redmine.open-bio.org  Wed Jan 23 02:30:31 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 23 Jan 2013 02:30:31 +0000
Subject: [Biopython-dev] [Biopython - Bug #3403] (Closed) PDBList fails to
	download large PDB structures
References: <redmine.issue-3403.20130109225825@redmine.open-bio.org>
Message-ID: <redmine.journal-15068.20130123023031@redmine.open-bio.org>


Issue #3403 has been updated by Eric Talevich.

Status changed from New to Closed
% Done changed from 0 to 100

Fixed by David Cain. Thanks!
https://github.com/biopython/biopython/pull/146

First commit in the series here:
https://github.com/biopython/biopython/commit/7282e80ed6a65a10c5c624b2a7ec787656437a15
----------------------------------------
Bug #3403: PDBList fails to download large PDB structures
https://redmine.open-bio.org/issues/3403

Author: David Cain
Status: Closed
Priority: High
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: https://github.com/DavidCain/biopython/tree/fix_pdb_dl


The current @PDBList@ module will often fail to download large PDB files.

<pre>
>>> from Bio.PDB import PDBList
>>> pdbl = PDBList()
>>> pdbl.retrieve_pdb_file("1hgg")
Downloading PDB structure '1hgg'...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/Bio/PDB/PDBList.py", line 247, in retrieve_pdb_file
    out.writelines(gz.read())
  File "/usr/lib/python2.7/gzip.py", line 249, in read
    self._read(readsize)
  File "/usr/lib/python2.7/gzip.py", line 303, in _read
    self._read_eof()
  File "/usr/lib/python2.7/gzip.py", line 342, in _read_eof
    hex(self.crc)))
IOError: CRC check failed 0x21d7a5f7 != 0x4b5eabb6L
>>>
</pre>

The source of this problem is that the entire gzipped file is read into memory before it is written to disk locally. With large archives, the local file can be truncated prematurely, which makes gzip's CRC check fail on extraction (as in the traceback above).

I fixed this issue on my "GitHub branch":https://github.com/DavidCain/biopython/tree/fix_pdb_dl, which I've made a pull request for.
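A minimal sketch of the general streaming approach follows. It is not the code from pull request 146. The idea is to copy the compressed download to disk in fixed-size chunks, then decompress it chunk by chunk, so that neither step holds the whole archive in memory. The sketch uses Python 3's urllib.request plus the standard gzip and shutil modules. The function names and the 64 KiB chunk size are assumptions, and no PDB server URL is given here.

```python
import gzip
import shutil
from urllib.request import urlopen

CHUNK_SIZE = 64 * 1024  # assumed buffer size; any moderate value works


def download_to_file(url, dest_path):
    """Stream a remote file to disk without reading it all into memory."""
    with urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, CHUNK_SIZE)


def gunzip_to_file(gz_path, dest_path):
    """Decompress gz_path into dest_path one chunk at a time.

    gzip verifies the CRC and length when it reaches the end of the stream,
    so a truncated or corrupted download still raises an error here
    (EOFError or an OSError subclass) instead of being silently accepted.
    """
    with gzip.open(gz_path, "rb") as gz, open(dest_path, "wb") as out:
        shutil.copyfileobj(gz, out, CHUNK_SIZE)
```

Usage would be download_to_file(url, local_gz) followed by gunzip_to_file(local_gz, local_pdb), where url, local_gz and local_pdb are placeholders for the caller's own values.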




From mjldehoon at yahoo.com  Sun Jan 27 04:45:46 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 26 Jan 2013 20:45:46 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
Message-ID: <1359261946.16561.YahooMailClassic@web164001.mail.gq1.yahoo.com>

[This message previously got lost in cyberspace. Sending it again.]

--- On Fri, 1/11/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Bow's SearchIO is using Bio.Blast.NCBIStandalone to handle
> plain text,
> https://github.com/biopython/biopython/blob/master/Bio/SearchIO/BlastIO/blast_text.py

OK then let's keep Bio.ParserSupport as is for now.

> That's why Bio._utils is a private module - we can
> drop/change/etc this without worrying about breaking
> other people's code. The issue with Bio.ParserSupport
> is it was a public API.

Its API being public was not the problem -- we have deprecated and removed lots of public modules over the years.

The problem with Bio.ParserSupport was twofold. First, it ended up making parsers more complex and difficult to understand for people not familiar with Bio.ParserSupport, in particular for newcomers and users trying to fix a bug. So Bio.ParserSupport never made us really happy. As a case in point, Bio._utils was created rather than reusing the code in Bio.ParserSupport.

The second problem was that many modules were using bits and pieces of Bio.ParserSupport, so we could not drop or change Bio.ParserSupport easily. Bio.ParserSupport has been officially obsolete but not deprecated for years.

> That's why Bio._utils is a private module - we can
> drop/change/etc this without worrying about breaking
> other people's code.

Let's drop it.

Its being a private module doesn't make it "free". It clutters up the code base. This is particularly true for top-level modules.

Best,
-Michiel.


From mjldehoon at yahoo.com  Sun Jan 27 04:46:47 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 26 Jan 2013 20:46:47 -0800 (PST)
Subject: [Biopython-dev] Bio.Motif update
In-Reply-To: <CABHxouWi3xNub97t3vP1hPAQTFTbMa4qdFnq3FLTRnA39t4uWA@mail.gmail.com>
Message-ID: <1359262007.25151.YahooMailClassic@web164002.mail.gq1.yahoo.com>

OK, thanks! I separated Bio.Motif into Bio.Motif (essentially the same as in Biopython release 1.60) and Bio.motifs (the new code).

Best,
-Michiel.



From w.arindrarto at gmail.com  Sun Jan 27 10:52:15 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Sun, 27 Jan 2013 11:52:15 +0100
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359261946.16561.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <1359261946.16561.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID: <CADEGkF6VnZE2gjfhm4sQAR2ecYm3Hjwpu8zmscPcgp_aHtQ8zA@mail.gmail.com>

Hi Michiel, everyone,

>> That's why Bio._utils is a private module - we can
>> drop/change/etc this without worrying about breaking
>> other people's code. The issue with Bio.ParserSupport
>> is it was a public API.
>
> Its API being public was not the problem -- we have deprecated and removed lots of public modules over the years.
>
> The problem with Bio.ParserSupport was twofold. First, it ended up making parsers more complex and difficult to understand for people not familiar with Bio.ParserSupport, in particular for newcomers and users trying to fix a bug. So Bio.ParserSupport never made us really happy. As a case in point, Bio._utils was created rather than reusing the code in Bio.ParserSupport.
>
> The second problem was that many modules were using bits and pieces of Bio.ParserSupport, so we could not drop or change Bio.ParserSupport easily. Bio.ParserSupport has been officially obsolete but not deprecated for years.
>
>> That's why Bio._utils is a private module - we can
>> drop/change/etc this without worrying about breaking
>> other people's code.
>
> Let's drop it.

My initial intention of refactoring and adding some new code to
Bio._utils was to reduce code repetition. I intended it (and perhaps
we should make it explicit in its docstrings) to be a collection of
small, useful functions that may be used in various cases.

Some examples inside include several string-formatting functions, each
of them independent of the other. There's also a general function for
running doctests
(https://github.com/biopython/biopython/blob/master/Bio/_utils.py#L100),
which was written because there was a lot of repetitive code in
different submodules basically doing the same thing (looking up the
test directory, running the test). I feel quite strongly that this
doctest function is required by many current (and future) modules
across Biopython, so it makes sense to refactor them out into a root
namespace.
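A minimal sketch of the kind of shared doctest helper described above follows. It is hypothetical: the names and signature are not those in Bio/_utils.py. It changes into a given Tests directory, so that doctests which open example files by relative path can find them. It then runs doctest.testmod on the module and always restores the original working directory, even if a test run raises.

```python
import contextlib
import doctest
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current directory, restoring it afterwards."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


def run_module_doctests(module, tests_dir, verbose=False):
    """Hypothetical shared helper: run a module's doctests from tests_dir.

    Doctests that open example files by relative path only pass when run
    from the directory holding those files.
    """
    if not os.path.isdir(tests_dir):
        raise ValueError("Tests directory not found: %r" % tests_dir)
    with working_directory(tests_dir):
        return doctest.testmod(module, verbose=verbose)
```

A module could call this from its `if __name__ == "__main__":` block, passing `sys.modules[__name__]` and the path to the Tests directory. The returned TestResults tuple reports how many examples failed and how many were attempted.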

All of this seems different from Bio.ParserSupport, which attempts to
be a single solution for writing new parsers (and only parsers). Given
the wildly incoherent nature of different file output formats, it's
not surprising that Bio.ParserSupport's code base has to be quite
complicated to accommodate all of them. Naturally it has many related
parts and functions, and understanding them all is much harder than
understanding the small functions in Bio._utils (in my experience).

So for now, I think it is still ok if we use Bio._utils. Perhaps, in
light of this discussion, we should make it explicitly clear that it's
only for containing general, small, utility functions instead of
containing one 'support framework' (e.g. ParserSupport) to avoid
future unhappiness.

Cheers,
Bow


From eric.talevich at gmail.com  Mon Jan 28 05:59:14 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Mon, 28 Jan 2013 00:59:14 -0500
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CADEGkF6VnZE2gjfhm4sQAR2ecYm3Hjwpu8zmscPcgp_aHtQ8zA@mail.gmail.com>
References: <1359261946.16561.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CADEGkF6VnZE2gjfhm4sQAR2ecYm3Hjwpu8zmscPcgp_aHtQ8zA@mail.gmail.com>
Message-ID: <CAMC681m7kbjsRnAGZLDO4u_+RjjUoo3Jd7MTjWOsA8kiyJHqJA@mail.gmail.com>

On Sun, Jan 27, 2013 at 5:52 AM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:

> Hi Michiel, everyone,
>
> >> That's why Bio._utils is a private module - we can
> >> drop/change/etc this without worrying about breaking
> >> other people's code. The issue with Bio.ParserSupport
> >> is it was a public API.
> >
> > Its API being public was not the problem -- we have deprecated and
> > removed lots of public modules over the years.
> >
> > The problem with Bio.ParserSupport was twofold. First, it ended up
> > making parsers more complex and difficult to understand for people not
> > familiar with Bio.ParserSupport, in particular for newcomers and users
> > trying to fix a bug. So Bio.ParserSupport never made us really happy. As a
> > case in point, Bio._utils was created rather than reusing the code in
> > Bio.ParserSupport.
> >
> > The second problem was that many modules were using bits and pieces of
> > Bio.ParserSupport, so we could not drop or change Bio.ParserSupport easily.
> > Bio.ParserSupport has been officially obsolete but not deprecated for years.
> >
> >> That's why Bio._utils is a private module - we can
> >> drop/change/etc this without worrying about breaking
> >> other people's code.
> >
> > Let's drop it.
>
> My initial intention of refactoring and adding some new code to
> Bio._utils was to reduce code repetition. I intended it (and perhaps
> we should make it explicit in its docstrings) to be a collection of
> small, useful functions that may be used in various cases.
>
> Some examples inside include several string-formatting functions, each
> of them independent of the other. There's also a general function for
> running doctests
> (https://github.com/biopython/biopython/blob/master/Bio/_utils.py#L100),
> which was written because there was a lot of repetitive code in
> different submodules basically doing the same thing (looking up the
> test directory, running the test). I feel quite strongly that this
> doctest function is required by many current (and future modules)
> across Biopython, so it makes sense to refactor them out into a root
> namespace.
>

Interesting discussion.

It's worth considering why some functions are being used in multiple parts
of the code base. In some cases there are essentially shortcomings in the
Python standard library or issues with
cross-platform/cross-implementation/backward compatibility that would
require us to use *exactly* the same code each time a certain recurring
problem is encountered. The Bio._py3k and Bio.File modules make sense for
this reason, I think, and before we deprecated Py2.4 it would have been
helpful to have shared code for importing ElementTree (both the uniprot-xml
and phyloXML parsers used the same half-page tangle of attempted imports).
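The shared-import idea can be sketched in a few lines. The pattern below is a common one, not the exact code from the uniprot-xml or phyloXML parsers. It prefers the C-accelerated cElementTree where that module exists and otherwise falls back to the standard ElementTree module. On Python 3.9 and later, xml.etree.cElementTree no longer exists and the fallback branch is taken. The module name in the comment is a hypothetical placeholder.

```python
# Hypothetical shared module (a name such as Bio/_elementtree.py is assumed).
try:
    # C implementation: xml.etree.cElementTree exists on Python 2.5-2.7
    # and on Python 3 up to 3.8.
    from xml.etree import cElementTree as ElementTree
except ImportError:
    # Standard module; from Python 3.3 on it uses the C accelerator
    # automatically, and cElementTree was removed in Python 3.9.
    from xml.etree import ElementTree

__all__ = ["ElementTree"]
```

Each parser would then import ElementTree from the shared module instead of repeating the try/except chain.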

So, maybe the doctest helpers should go in a new module specific to that
topic.

In other cases there's a recurring need in separate modules, but (a) it's
short and simple enough to write the solution from scratch each time where
it's needed, and so isn't enough of a maintenance concern to offset the
convenience of having all the relevant code in one place; and/or (b) the
needs of different modules aren't exactly the same, merely similar, leading
to a proliferation of options in the shared function, when a simpler
implementation would have worked for any given module.

The point is that just as there's a maintenance cost to having duplicated
code in multiple places, there's a maintenance cost to having dependencies
between multiple modules even within the same project, and the value of a
new module ought to be greater than the cost it imposes.

Best,
Eric


From mjldehoon at yahoo.com  Mon Jan 28 14:58:58 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 28 Jan 2013 06:58:58 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CADEGkF6VnZE2gjfhm4sQAR2ecYm3Hjwpu8zmscPcgp_aHtQ8zA@mail.gmail.com>
Message-ID: <1359385138.84799.YahooMailClassic@web164002.mail.gq1.yahoo.com>

Hi Bow,

--- On Sun, 1/27/13, Wibowo Arindrarto <w.arindrarto at gmail.com> wrote:
> All of this seems different from Bio.ParserSupport, which
> attempts to be a one-single solution for writing new parsers
> (only parsers). Given the wildly incoherent nature of different
> file output formats, it's not surprising that Bio.ParserSupport's
> code base has to be quite complicated to accommodate all of them.
> Naturally it has many related parts and functions, and understanding
> them all is much harder than to understand the small functions in
> Bio._utils (in my experience).

It's not just Bio.ParserSupport; previously we also had Bio/listfns.py; Bio/mathfns.py; Bio/stringfns.py; their C versions; and Bio/csupport.c. These all contained small utility functions. But in the end we dropped them.

Btw, was Bio._utils ever discussed on the mailing list? If yes, I apologize for missing this discussion and raising these issues now.

Best,

-Michiel.


From p.j.a.cock at googlemail.com  Mon Jan 28 15:10:29 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 28 Jan 2013 15:10:29 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359385138.84799.YahooMailClassic@web164002.mail.gq1.yahoo.com>
References: <CADEGkF6VnZE2gjfhm4sQAR2ecYm3Hjwpu8zmscPcgp_aHtQ8zA@mail.gmail.com>
	<1359385138.84799.YahooMailClassic@web164002.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_4q+QcrLGBshNe7fo5GxqnuBMunt1=iKERXCRD3e3vxww@mail.gmail.com>

On Mon, Jan 28, 2013 at 2:58 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> Btw, was Bio._utils ever discussed on the mailing list? If yes, I
> apologize for missing this discussion and raising these issues now.

I think only on the pull request - I'll have a look at the GitHub
settings as ideally at the minimum new pull requests should
perhaps be CC'd to the dev list?

Peter


From p.j.a.cock at googlemail.com  Mon Jan 28 15:17:19 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 28 Jan 2013 15:17:19 +0000
Subject: [Biopython-dev] Sending pull requests to the mailing list
Message-ID: <CAKVJ-_5uPrrzq7WN=x9s7NWjX5Q8E0OYBwKA9Pz_M=GncpMncg@mail.gmail.com>

Retitling thread,

On Mon, Jan 28, 2013 at 3:10 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Mon, Jan 28, 2013 at 2:58 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>>
>> Btw, was Bio._utils ever discussed on the mailing list? If yes, I
>> apologize for missing this discussion and raising these issues now.
>
> I think only on the pull request - I'll have a look at the GitHub
> settings as ideally at the minimum new pull requests should
> perhaps be CC'd to the dev list?

According to https://help.github.com/articles/using-pull-requests

"Everyone that can push to the base repository will receive an
 email notification and see the new pull request in their
 dashboard the next time they log in."

I think you can also choose to get emails under your own profile
settings. There doesn't seem to be any email notification settings
under the Biopython organisation account on GitHub.

If there is an easy way to have GitHub email new pull requests to
the biopython-dev mailing list, I've overlooked it. There might be an
API based solution... or a simple email client forwarding rule?
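One possible "API based solution" is sketched below under stated assumptions. It polls the public GitHub REST endpoint GET /repos/{owner}/{repo}/pulls?state=open and builds a plain-text summary of pull requests not seen before, which a scheduled job could then mail to the list. Only the number, title, html_url and user.login fields of each pull request object are read. The polling, the "seen" bookkeeping and the function names are hypothetical, only the first page of results is fetched, and sending the email is left out.

```python
import json
from urllib.request import Request, urlopen

API_URL = "https://api.github.com/repos/{owner}/{repo}/pulls?state=open"


def fetch_open_pulls(owner, repo):
    """Return the decoded JSON list of open pull requests (first page only)."""
    request = Request(API_URL.format(owner=owner, repo=repo),
                      headers={"Accept": "application/vnd.github+json"})
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def new_pull_summary(pulls, seen_numbers):
    """Format one entry per pull request whose number is not in seen_numbers."""
    lines = []
    for pr in sorted(pulls, key=lambda p: p["number"]):
        if pr["number"] in seen_numbers:
            continue
        lines.append("#%d %s (%s)\n    %s" % (
            pr["number"], pr["title"], pr["user"]["login"], pr["html_url"]))
    return "\n".join(lines)
```

For example, new_pull_summary(fetch_open_pulls("biopython", "biopython"), already_mailed) would return the text for any pull requests not yet announced. Here already_mailed is a set of numbers that the caller keeps between runs.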

Peter


From w.arindrarto at gmail.com  Mon Jan 28 17:19:51 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Mon, 28 Jan 2013 18:19:51 +0100
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359385138.84799.YahooMailClassic@web164002.mail.gq1.yahoo.com>
References: <CADEGkF6VnZE2gjfhm4sQAR2ecYm3Hjwpu8zmscPcgp_aHtQ8zA@mail.gmail.com>
	<1359385138.84799.YahooMailClassic@web164002.mail.gq1.yahoo.com>
Message-ID: <CADEGkF6r0--bP8Nr+eSpjQgxUCnfd29UMXOSyzh0LMxf1xFi-g@mail.gmail.com>

Hi everyone,

> --- On Sun, 1/27/13, Wibowo Arindrarto <w.arindrarto at gmail.com> wrote:
>> All of this seems different from Bio.ParserSupport, which
>> attempts to be a one-single solution for writing new parsers
>> (only parsers). Given the wildly incoherent nature of different
>> file output formats, it's not surprising that Bio.ParserSupport's
>> code base has to be quite complicated to accommodate all of them.
>> Naturally it has many related parts and functions, and understanding
>> them all is much harder than to understand the small functions in
>> Bio._utils (in my experience).
>
> It's not just Bio.ParserSupport; previously we also had Bio/listfns.py; Bio/mathfns.py; Bio/stringfns.py; their C versions; and Bio/csupport.c. These all contained small utility functions. But in the end we dropped them.

Hm... in this case (and in light of Eric's points as well), it may be OK
to drop the string formatting functions in Bio._utils. They are used
in Bio.Phylo and Bio.SearchIO for now. In Bio.SearchIO they are used
in multiple submodules, however, so I am still leaning towards putting them
at least in Bio.SearchIO's main directory. They were originally in
Bio.SearchIO._utils, after all.

As for the doctest-related functions, do you propose to move them to a
specific doctest-related module as well?

>> Btw, was Bio._utils ever discussed on the mailing list? If yes, I
>> apologize for missing this discussion and raising these issues now.
>
> I think only on the pull request - I'll have a look at the GitHub
> settings as ideally at the minimum new pull requests should
> perhaps be CC'd to the dev list?

Indeed, I did submit a pull request, but it was not forwarded to or
discussed on the mailing list. This is the pull request, for reference:
https://github.com/biopython/biopython/pull/140. As for notifying the dev
mailing list, I personally agree, given that the number of pull requests
received still seems manageable. Is it possible to receive just the
initial email announcing a new pull request, though?

So far, I've been 'watching' the repository and getting emails from
there; perhaps the organization needs to 'watch' the repo to get
notifications as well?

Best,
Bow


From redmine at redmine.open-bio.org  Mon Jan 28 22:20:54 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 28 Jan 2013 22:20:54 +0000
Subject: [Biopython-dev] [Biopython - Bug #2776] Bio.pairwise2 returns
	non-optimal alignment in at least some cases
References: <redmine.issue-2776.20090302102253@redmine.open-bio.org>
Message-ID: <redmine.journal-15069.20130128222054@redmine.open-bio.org>


Issue #2776 has been updated by Peter Cock.


In the opinion of Bryan Lunt, from a comment on another issue on GitHub:
https://github.com/biopython/biopython/pull/149

"Bug" 2776 is not a bug, it is a feature.

I hand-edited a datafile for EMBOSS programs and tried the EMBOSS "needle" program with (a homomorphism of) the same sequences. It behaves the same as pairwise2.

The point is that for there to be gaps they have to be flanked by matches, except on the ends, so what the original bug report asks for is not something these algorithms will ever produce anyway.
----------------------------------------
Bug #2776: Bio.pairwise2 returns non-optimal alignment in at least some cases
https://redmine.open-bio.org/issues/2776

Author: Klaus Kopec
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 1.49
URL: 


At least in some cases, Bio.pairwise2 returns an alignment that is not the one with the highest score for the input parameters. This occurs in localXX and globalXX.

However, I have only encountered the problem with large mismatch penalties (which I use because I need mismatch-free alignments).

Simple example (the bug also occurred for longer sequences):
>>> from Bio import pairwise2
>>> sequence1 = 'GKG'
>>> sequence2 = 'GWG'
>>> A = pairwise2.align.globalms(sequence1, sequence2, 5, -100, -5, -5)[0]
>>> A[0]
'GKG--'
>>> A[1]
'--GWG'
>>> A[2]
-15.0

whereas
'GK-G'
'G-WG'

would get a score of 0


System: Kubuntu 8.10 64Bit, Python 2.6.1, Biopython 1.49 (my pairwise2.py is identical to the current CVS version of it)
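To make the arithmetic in this report easy to check, here is a minimal Python sketch that re-scores a finished alignment column by column. It assumes affine gap scoring in which the first column of a gap costs `gap_open` and each further column of the same gap costs `gap_extend`, with end gaps penalised like internal ones (which is what the report's numbers imply). `score_alignment` is a hypothetical helper written for illustration; it is not part of Bio.pairwise2.

```python
def score_alignment(a, b, match=5, mismatch=-100, gap_open=-5, gap_extend=-5):
    """Score two aligned strings of equal length, with gaps written as '-'.

    Hypothetical helper. The first column of a gap costs gap_open, every
    further column of the same gap costs gap_extend, and end gaps are
    penalised exactly like internal gaps.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    in_gap_a = in_gap_b = False  # are we currently inside a gap in a / in b?
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            raise ValueError("column with a gap in both sequences")
        if x == "-":
            score += gap_extend if in_gap_a else gap_open
            in_gap_a, in_gap_b = True, False
        elif y == "-":
            score += gap_extend if in_gap_b else gap_open
            in_gap_a, in_gap_b = False, True
        else:
            score += match if x == y else mismatch
            in_gap_a = in_gap_b = False
    return score


if __name__ == "__main__":
    print(score_alignment("GKG--", "--GWG"))  # -15
    print(score_alignment("GK-G", "G-WG"))  # 0
    print(score_alignment("GKG", "GWG"))  # -90
```

With the parameters passed to `globalms` above, this gives -15 for the alignment that was returned, 0 for the one the reporter expected, and -90 for the gap-free alignment with a K/W mismatch.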


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From mjldehoon at yahoo.com  Tue Jan 29 09:43:59 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 29 Jan 2013 01:43:59 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CAKVJ-_4q+QcrLGBshNe7fo5GxqnuBMunt1=iKERXCRD3e3vxww@mail.gmail.com>
Message-ID: <1359452639.95165.YahooMailClassic@web164005.mail.gq1.yahoo.com>

I'd prefer if developers first write to the dev mailing list if they want to make any major changes, or changes that affect Biopython overall. It can be hard to understand the implications just from looking at a pull request, and there may be so many pull requests that the important ones may be missed anyway.

Best,
-Michiel.

--- On Mon, 1/28/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> From: Peter Cock <p.j.a.cock at googlemail.com>
> Subject: Re: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Cc: "Wibowo Arindrarto" <w.arindrarto at gmail.com>, "BioPython-Dev Mailing List" <biopython-dev at biopython.org>
> Date: Monday, January 28, 2013, 10:10 AM
> On Mon, Jan 28, 2013 at 2:58 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> >
> > Btw, was Bio._utils ever discussed on the mailing list? If yes, I
> > apologize for missing this discussion and raising these issues now.
> 
> I think only on the pull request - I'll have a look at the GitHub
> settings as ideally at the minimum new pull requests should
> perhaps be CC'd to the dev list?
> 
> Peter
> 


From mjldehoon at yahoo.com  Tue Jan 29 09:54:01 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 29 Jan 2013 01:54:01 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CADEGkF6r0--bP8Nr+eSpjQgxUCnfd29UMXOSyzh0LMxf1xFi-g@mail.gmail.com>
Message-ID: <1359453241.43038.YahooMailClassic@web164004.mail.gq1.yahoo.com>

--- On Mon, 1/28/13, Wibowo Arindrarto <w.arindrarto at gmail.com> wrote:
> Hm..in this case (and in light of Eric's points as well), it
> may be ok to drop the string formatting functions in Bio._utils.
> They are used in Bio.Phylo and Bio.SearchIO for now. In Bio.SearchIO
> they are used in multiple submodules, however, so I am still leaning
> on putting them at least on Bio.SearchIO's main directory. They were
> originally in Bio.SearchIO._utils, after all.

I think it's OK to have a _utils submodule inside Bio.SearchIO. Since you are developing and maintaining that module, to a large degree it's up to you how you want to organize your code. For the same reason, for Bio.Phylo it's better to discuss with Eric Talevich first to see what he thinks.

> As for the doctest-related functions, do you propose to move
> them to a specific doctest-related module as well?

For the doctest-related functions, we first need to understand what the purpose is, before deciding how to implement it (and in what module the code should be).

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Tue Jan 29 10:23:43 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 29 Jan 2013 10:23:43 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359452639.95165.YahooMailClassic@web164005.mail.gq1.yahoo.com>
References: <CAKVJ-_4q+QcrLGBshNe7fo5GxqnuBMunt1=iKERXCRD3e3vxww@mail.gmail.com>
	<1359452639.95165.YahooMailClassic@web164005.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_7+hQ1OHDd-tWWVYbLzVgbRpe9wkyL2ZPnatYcdake1uw@mail.gmail.com>

On Tue, Jan 29, 2013 at 9:43 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> I'd prefer if developers first write to the dev mailing list if they want to make
> any major changes, or changes that affect Biopython overall. It can be hard
> to understand the implications just from looking at a pull request, and there
> may be so many pull requests that the important ones may be missed anyway.

Certainly a good policy, which I have tried to follow.

In this case, since it was just moving a small amount of private API code,
I didn't consider it major.

Peter


From p.j.a.cock at googlemail.com  Tue Jan 29 10:29:30 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 29 Jan 2013 10:29:30 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359453241.43038.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <CADEGkF6r0--bP8Nr+eSpjQgxUCnfd29UMXOSyzh0LMxf1xFi-g@mail.gmail.com>
	<1359453241.43038.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_5iKKx4h5OyGLOpmvfuNbDHgCa_Kx9po2st+oan_ZMR=g@mail.gmail.com>

On Tue, Jan 29, 2013 at 9:54 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
>> As for the doctest-related functions, do you propose to move
>> them to a specific doctest-related module as well?
>
> For the doctest-related functions, we first need to understand
> what the purpose is, before deciding how to implement it (and
> in what module the code should be).

When editing doctests, it is convenient to be able to run them on
the current file, e.g.

~/biopython $ emacs Bio/SeqRecord.py
~/biopython $ python Bio/SeqRecord.py

Or,

~/biopython/Bio $ emacs SeqRecord.py
~/biopython/Bio $ python SeqRecord.py

To do that, many of our modules had a repeated bit of code at
the bottom, now moved to a shared function in Bio/_utils.py,
resulting in much less boilerplate code, e.g.

https://github.com/biopython/biopython/commit/8b59d89bb4e282192ddee751e24ceef4afa63528

Bow had initially done this for the doctests in Bio.SearchIO,
but I agreed it makes sense to do this elsewhere.

Peter


From w.arindrarto at gmail.com  Tue Jan 29 11:05:19 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 29 Jan 2013 12:05:19 +0100
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359453241.43038.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <CADEGkF6r0--bP8Nr+eSpjQgxUCnfd29UMXOSyzh0LMxf1xFi-g@mail.gmail.com>
	<1359453241.43038.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID: <CADEGkF4LL=0_5uCtMMsBveM-ce1XkxPDcTASMhAUEXKiYFLY9A@mail.gmail.com>

Hi Michiel, everyone,

>>> I'd prefer if developers first write to the dev mailing list if they want to make any major changes, or changes that affect Biopython overall. It can be hard to understand the implications just from looking at a pull request, and there may be so many pull requests that the important ones may be missed anyway.

>>> I think it's OK to have a _utils submodule inside Bio.SearchIO. Since you are developing and maintaining that module, to a large degree it's up to you how you want to organize your code. For the same reason, for Bio.Phylo it's better to discuss with Eric Talevich first to see what he thinks.

Noted. I'm sorry that this is causing more headaches than it solves.
I'll be sure to notify the dev-mailing list for other similar changes.

>>> As for the doctest-related functions, do you propose to move
>>> them to a specific doctest-related module as well?
>>
>> For the doctest-related functions, we first need to understand what the purpose is, before deciding how to implement it (and in what module the code should be).
>
> When editing doctests, it is convenient to be able to run them on
> the current file, e.g.
>
> ~/biopython $ emacs Bio/SeqRecord.py
> ~/biopython $ python Bio/SeqRecord.py
>
> Or,
>
> ~/biopython/Bio $ emacs SeqRecord.py
> ~/biopython/Bio $ python SeqRecord.py
>
> To do that, many of our modules had a repeated bit of code at
> the bottom, now moved to a shared function in Bio/_utils.py
> resulting in a lot less boiler plate code, e.g.
>
> https://github.com/biopython/biopython/commit/8b59d89bb4e282192ddee751e24ceef4afa63528
>
> Bow had initially done this for the doctests in Bio.SearchIO,
> but I agreed it make sense to do this elsewhere.

Indeed, the doctest functions are two small, simple functions to make
it easier to run doctests. The first one looks up the test directory
(our Tests directory) and the second one simply executes the doctests.

Best,
Bow


From p.j.a.cock at googlemail.com  Tue Jan 29 15:46:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 29 Jan 2013 15:46:25 +0000
Subject: [Biopython-dev] Bio.Motif update
In-Reply-To: <1359262007.25151.YahooMailClassic@web164002.mail.gq1.yahoo.com>
References: <CABHxouWi3xNub97t3vP1hPAQTFTbMa4qdFnq3FLTRnA39t4uWA@mail.gmail.com>
	<1359262007.25151.YahooMailClassic@web164002.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_5X3eoXEnqD7yfTGFW1Saxr7rMe-WcbCofmCqdu_yq6KA@mail.gmail.com>

On Sun, Jan 27, 2013 at 4:46 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> OK, thanks! I separated Bio.Motif into Bio.Motif (essentially the same
> as in Biopython release 1.60) and Bio.motifs (the new code).

We need to say something about this in the NEWS file too.

I think it would make sense to add a PendingDeprecationWarning
to Bio.Motif now. Also, if you feel the new Bio.motifs API isn't quite
settled yet, adding the new BiopythonExperimentalWarning to that
makes sense.
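A minimal sketch of what the two proposed warnings could look like when the modules are imported. `warn_pending_deprecation` and `warn_experimental` are hypothetical helpers, and `BiopythonExperimentalWarning` is defined locally as a stand-in so the example runs on its own; in Biopython itself the existing class would be imported rather than redefined.

```python
import warnings


class BiopythonExperimentalWarning(Warning):
    """Local stand-in for Biopython's class of the same name."""


def warn_pending_deprecation(old_module, new_module):
    """Emit the kind of warning proposed for Bio.Motif at import time."""
    warnings.warn(
        "%s is likely to be deprecated; please consider using %s instead."
        % (old_module, new_module),
        PendingDeprecationWarning,
        stacklevel=2,
    )


def warn_experimental(module_name):
    """Emit the kind of warning proposed for the new Bio.motifs API."""
    warnings.warn(
        "%s is experimental and its API may still change." % module_name,
        BiopythonExperimentalWarning,
        stacklevel=2,
    )


if __name__ == "__main__":
    # PendingDeprecationWarning is ignored by Python's default filters,
    # so force every warning to be shown for this demonstration.
    warnings.simplefilter("always")
    warn_pending_deprecation("Bio.Motif", "Bio.motifs")
    warn_experimental("Bio.motifs")
```

One design consequence worth noting: because PendingDeprecationWarning is hidden by default, most users would only notice it if they enable warnings (for example with `python -W always`), which makes it a gentle first step before a full DeprecationWarning.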

What do you think?

(And once this is settled, I think we can schedule the release)

Regards,

Peter


From p.j.a.cock at googlemail.com  Tue Jan 29 17:10:50 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 29 Jan 2013 17:10:50 +0000
Subject: [Biopython-dev] Namespace for online resources?
Message-ID: <CAKVJ-_62TQS9-eswnkNRtaMQGri3cDSCNgRFyqwjVofgrg+9rA@mail.gmail.com>

Hello all,

We used to have Bio.WWW for assorted online tools, but that
was deprecated some time back. Is there a case for bringing it
back, or something similar like Bio.WebTools as suggested by
Kevin Murray on this pull request?:

https://github.com/biopython/biopython/pull/132

In this case, since this is to fetch Arabidopsis sequence via
an accession number, perhaps Bio.SeqUtils might be better?
(As an aside, recall we've talked about merging Bio.Seq* at
some point).

Thoughts?

Peter


From w.arindrarto at gmail.com  Tue Jan 29 19:52:42 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 29 Jan 2013 20:52:42 +0100
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <CAKVJ-_62TQS9-eswnkNRtaMQGri3cDSCNgRFyqwjVofgrg+9rA@mail.gmail.com>
References: <CAKVJ-_62TQS9-eswnkNRtaMQGri3cDSCNgRFyqwjVofgrg+9rA@mail.gmail.com>
Message-ID: <CADEGkF5QQpvt5svMMsfM=sGW+zavQq-mpAtBu=Twf3CX5+rDKg@mail.gmail.com>

Hi everyone,

> We used to have Bio.WWW for assorted online tools, but that
> was deprecated some time back. Is there a case for bringing it
> back, or something similar like Bio.WebTools as suggested by
> Kevin Murray on this pull request?:
>
> https://github.com/biopython/biopython/pull/132
>
> In this case, since this is to fetch Arabidopsis sequence via
> an accession number, perhaps Bio.SeqUtils might be better?
> (As an aside, recall we've talked about merging Bio.Seq* at
> some point).

Why was Bio.WWW deprecated in the first place?

Personally, I would prefer to have all online database access
centralized in one place, if possible. It makes for a less-cluttered
root namespace and may be more intuitive in most cases. I do notice
that for cases like Bio.Entrez, sometimes we need to only parse the
data locally since it has been downloaded previously (hence no online
access). To do this task, Bio.www (basically the centralized online
module) may not be the most intuitive place to look in, for most
people, although an argument can be made that we are still parsing
data whose format is specific for an online resource.

However, looking at the way we are doing this now (with the current
codebase placing Entrez access and parsing in Bio.Entrez; similarly
for Bio.ExPASy) locating the module in Bio.TAIR (or Bio.tair? PEP-8
compliance?) looks more consistent. If we are to create a new module
for online access (e.g. Bio.webtools, Bio.www) for Bio.TAIR, for
consistency we may have to juggle Entrez and ExPASy around as well,
right?

Putting Bio.TAIR in Bio.SeqUtils doesn't seem... right to me. My
impression is that SeqUtils is supposed to be for functions acting on
sequence strings (or Seq objects) and nothing else. After all, we can
also retrieve GenBank sequences with Biopython, but that functionality
lives in its own module, Bio.Entrez, not in Bio.SeqUtils.

Just my two cents :),
Bow


From arklenna at gmail.com  Tue Jan 29 20:05:15 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Tue, 29 Jan 2013 15:05:15 -0500
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <CADEGkF5QQpvt5svMMsfM=sGW+zavQq-mpAtBu=Twf3CX5+rDKg@mail.gmail.com>
References: <CAKVJ-_62TQS9-eswnkNRtaMQGri3cDSCNgRFyqwjVofgrg+9rA@mail.gmail.com>
	<CADEGkF5QQpvt5svMMsfM=sGW+zavQq-mpAtBu=Twf3CX5+rDKg@mail.gmail.com>
Message-ID: <CALfq9t+w=_9dEMFDUZdpJNRMKOxL_+PkmHKCm=Wf2o-KA0XabA@mail.gmail.com>

I agree with Bow that centralizing all online database access makes sense.
It would also simplify the testing process (i.e. anything that requires a
network connection goes into the web namespace and can be skipped when
testing offline).

In situations like Entrez, the network access portion could be separated
out and put into the web namespace under the same name:

    import Bio.www.Entrez  # for downloading the data
    import Bio.Entrez  # for parsing/using the downloaded data
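The testing benefit described above can be sketched in a few lines of Python: if all network-dependent code lived under one package, an offline test run could decide what to skip from the module name alone. Everything here is hypothetical: `Bio.www` is only the namespace proposed in this thread, and `needs_network` and `runnable_modules` are illustrative helpers, not part of Biopython's test suite.

```python
ONLINE_NAMESPACE = "Bio.www"  # hypothetical package name from the proposal


def needs_network(module_name):
    """True if module_name is the online namespace itself or lives inside it."""
    return (module_name == ONLINE_NAMESPACE
            or module_name.startswith(ONLINE_NAMESPACE + "."))


def runnable_modules(module_names, online):
    """Keep every module when online; otherwise drop the online namespace."""
    return [name for name in module_names if online or not needs_network(name)]


if __name__ == "__main__":
    modules = ["Bio.Entrez", "Bio.www.Entrez", "Bio.SeqIO"]
    print(runnable_modules(modules, online=False))  # ['Bio.Entrez', 'Bio.SeqIO']
```

Matching on `ONLINE_NAMESPACE + "."` rather than a bare prefix avoids accidentally treating an unrelated name such as `Bio.wwwx` as online code.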

Cheers,

Lenna


On Tue, Jan 29, 2013 at 2:52 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com>wrote:

> Hi everyone,
>
> > We used to have Bio.WWW for assorted online tools, but that
> > was deprecated some time back. Is there a case for bringing it
> > back, or something similar like Bio.WebTools as suggested by
> > Kevin Murray on this pull request?:
> >
> > https://github.com/biopython/biopython/pull/132
> >
> > In this case, since this is to fetch Arabidopsis sequence via
> > an accession number, perhaps Bio.SeqUtils might be better?
> > (As an aside, recall we've talked about merging Bio.Seq* at
> > some point).
>
> Why was Bio.WWW deprecated in the first place?
>
> Personally, I would prefer to have all online database access
> centralized in one place, if possible. It makes for a less-cluttered
> root namespace and may be more intuitive in most cases. I do notice
> that for cases like Bio.Entrez, sometimes we need to only parse the
> data locally since it has been downloaded previously (hence no online
> access). To do this task, Bio.www (basically the centralized online
> module) may not be the most intuitive place to look in, for most
> people, although an argument can be made that we are still parsing
> data whose format is specific for an online resource.
>
> However, looking at the way we are doing this now (with the current
> codebase placing Entrez access and parsing in Bio.Entrez; similarly
> for Bio.ExPASy) locating the module in Bio.TAIR (or Bio.tair? PEP-8
> compliance?) looks more consistent. If we are to create a new module
> for online access (e.g. Bio.webtools. Bio.www) for Bio.TAIR, for
> consistency we may have to juggle Entrez and ExPASy around as well,
> right?
>
> Putting Bio.TAIR in Bio.SeqUtils doesn't seem..right to me. My
> impression is that SeqUtils is supposed to be for functions acting on
> sequence strings (or Seq objects) and nothing else. After all, we can
> also retrieve GenBank sequences from Biopython but that functionality
> is separated on its own Bio.Entrez not Bio.SeqUtils.
> .
> Just my two cents :),
> Bow
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


From p.j.a.cock at googlemail.com  Tue Jan 29 21:03:59 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 29 Jan 2013 21:03:59 +0000
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <CADEGkF5QQpvt5svMMsfM=sGW+zavQq-mpAtBu=Twf3CX5+rDKg@mail.gmail.com>
References: <CAKVJ-_62TQS9-eswnkNRtaMQGri3cDSCNgRFyqwjVofgrg+9rA@mail.gmail.com>
	<CADEGkF5QQpvt5svMMsfM=sGW+zavQq-mpAtBu=Twf3CX5+rDKg@mail.gmail.com>
Message-ID: <CAKVJ-_75uoDU_4chu8WpcrD_zRjCjwF6Qa6AjadKvBy-321UWw@mail.gmail.com>

On Tue, Jan 29, 2013 at 7:52 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi everyone,
>
> Why was Bio.WWW deprecated in the first place?
>

The flippant answer is everything under Bio.WWW was moved
or deprecated:
http://lists.open-bio.org/pipermail/biopython-dev/2008-July/004059.html

I'm trying to identify the discussions prior to that covering the moves:

Bio.WWW.ExPASy -> Bio.ExPASy
Bio.WWW.InterPro -> Bio.InterPro
Bio.WWW.NCBI -> Bio.Entrez
Bio.WWW.SCOP -> Bio.SCOP

Peter


From p.j.a.cock at googlemail.com  Tue Jan 29 21:11:29 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 29 Jan 2013 21:11:29 +0000
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <CAKVJ-_75uoDU_4chu8WpcrD_zRjCjwF6Qa6AjadKvBy-321UWw@mail.gmail.com>
References: <CAKVJ-_62TQS9-eswnkNRtaMQGri3cDSCNgRFyqwjVofgrg+9rA@mail.gmail.com>
	<CADEGkF5QQpvt5svMMsfM=sGW+zavQq-mpAtBu=Twf3CX5+rDKg@mail.gmail.com>
	<CAKVJ-_75uoDU_4chu8WpcrD_zRjCjwF6Qa6AjadKvBy-321UWw@mail.gmail.com>
Message-ID: <CAKVJ-_7pCQj4C=8MXXq=8Cr6e1rNNDuNinCJqQDhADpV_muMSg@mail.gmail.com>

On Tue, Jan 29, 2013 at 9:03 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Jan 29, 2013 at 7:52 PM, Wibowo Arindrarto
> <w.arindrarto at gmail.com> wrote:
>> Hi everyone,
>>
>> Why was Bio.WWW deprecated in the first place?
>>
>
> The flippant answer is everything under Bio.WWW was moved
> or deprecated:
> http://lists.open-bio.org/pipermail/biopython-dev/2008-July/004059.html
>
> I'm trying to identify the discussions prior to that covering the moves:
>
> Bio.WWW.ExPASy -> Bio.ExPASy
> Bio.WWW.InterPro -> Bio.InterPro
> Bio.WWW.NCBI -> Bio.Entrez
> Bio.WWW.SCOP -> Bio.SCOP

Probably this thread,
http://lists.open-bio.org/pipermail/biopython-dev/2007-November/003241.html

Also a bit more background on the NCBI Entrez side:
http://lists.open-bio.org/pipermail/biopython-dev/2008-February/003423.html

Peter


From natemsutton at yahoo.com  Tue Jan 29 21:22:57 2013
From: natemsutton at yahoo.com (Nate Sutton)
Date: Tue, 29 Jan 2013 13:22:57 -0800 (PST)
Subject: [Biopython-dev] New BioPython member
Message-ID: <1359494577.29159.YahooMailNeo@web122606.mail.ne1.yahoo.com>

Dear all,

I just recently joined the BioPython developers group and am looking
forward to contributing to BioPython! I have worked for a while in
programming, genetics, and biology, and have an M.S. in Biomedical
Informatics. After talking with some fellow contributors, I have decided
to try working on https://redmine.open-bio.org/issues/3360, but I will
also work on writing some documentation on examples from the cookbook,
especially if I get stuck on the bug. If anyone wants to work on the
same things, I'd be glad to hear that. I may be slow on the work
because I am still learning Python after coming from other languages.

-Nate


From mjldehoon at yahoo.com  Wed Jan 30 02:00:32 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 29 Jan 2013 18:00:32 -0800 (PST)
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <CAKVJ-_7pCQj4C=8MXXq=8Cr6e1rNNDuNinCJqQDhADpV_muMSg@mail.gmail.com>
Message-ID: <1359511232.14591.YahooMailClassic@web164002.mail.gq1.yahoo.com>

Bio.WWW was one of those modules that seemed like a good idea at first, but then failed to gain general acceptance. There are three problems with Bio.WWW:

1) From the module name, it's not clear what you would find in it. For example, if you want to access the Entrez database, would you first look in Bio.Entrez or in Bio.WWW? Similarly for TAIR: Would you look for it in Bio.TAIR, or in Bio.WWW?

2) The modules in Bio.WWW don't have much to do with each other, except that they access the internet. But any given user probably is mainly interested in Entrez, or ExPASy, or some other database, not in all of them at the same time.

3) The flip side of this is that a user accessing e.g. ExPASy would have to import both Bio.WWW and Bio.ExPASy to be able to use ExPASy. Doctests get more complicated also, as they would span more than one module. Here is an example from Bio.Entrez that accesses the database, and then parses the results:
>>> from Bio import Entrez
>>> Entrez.email = "Your.Name.Here at example.org"
>>> handle = Entrez.einfo() # or esearch, efetch, ...
>>> record = Entrez.read(handle)
>>> handle.close()

The ultimate question is whether we organize the code in Biopython by functionality from a user perspective, or by the kind of things the code does. Almost all of Biopython is organized according to the former. For example, we don't have a Bio.Parsers module for all the parsers; similarly, we don't have Bio.WWW for internet access.

Best,
-Michiel.




From kjwu at ucsd.edu  Wed Jan 30 02:09:42 2013
From: kjwu at ucsd.edu (Kevin Wu)
Date: Tue, 29 Jan 2013 18:09:42 -0800
Subject: [Biopython-dev] Trie with_prefix doesn't work as expected
Message-ID: <CAEe6yUEX_xbawFkP74-yKO+eTKo_mAU9SGO-+0PoRwU0mA8=vw@mail.gmail.com>

Hi All,

I'm attempting to use the trie implementation in Biopython to develop a
suffix trie. I'm using the with_prefix function to find all keys which
start with a given sequence; however, the function doesn't return the
values I expect. I tested it with the canonical example "banana" and am
a bit confused.

from Bio.trie import trie
t = trie()
s = "BANANA"
for i in range(len(s)):  # insert all suffixes into trie
    t[s[i:]] = i

t.with_prefix("NA")  # this works as expected
>> ['NA', 'NANA']

t.with_prefix("AN")  # this doesn't work as expected
>> ['AN', 'ANNA']
# expected output: ["ANANA", "ANA"]

Can anyone clarify my confusion or confirm this bug? I'm on Biopython 1.60,
Linux Mint 64-bit.
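For comparison, the behaviour expected here can be sketched with a small pure-Python trie built from nested dictionaries. `SuffixTrie` is a hypothetical class written for illustration; it is not Bio.trie and does not reproduce its C implementation, only the intended semantics of returning every stored key that starts with a given prefix. Results are sorted so the output is deterministic, whereas Bio.trie may return keys in a different order.

```python
class SuffixTrie:
    """Minimal dict-of-dicts trie; hypothetical, not the Bio.trie API."""

    _END = object()  # sentinel key meaning "a stored key ends at this node"

    def __init__(self):
        self._root = {}

    def __setitem__(self, key, value):
        node = self._root
        for char in key:
            node = node.setdefault(char, {})
        node[self._END] = value

    def with_prefix(self, prefix):
        """Return every stored key that starts with prefix, sorted."""
        node = self._root
        for char in prefix:
            if char not in node:
                return []
            node = node[char]
        keys = []
        stack = [(node, prefix)]  # depth-first walk below the prefix node
        while stack:
            current, path = stack.pop()
            for char, child in current.items():
                if char is self._END:
                    keys.append(path)  # here child is the stored value
                else:
                    stack.append((child, path + char))
        return sorted(keys)


if __name__ == "__main__":
    t = SuffixTrie()
    s = "BANANA"
    for i in range(len(s)):  # insert all suffixes into the trie
        t[s[i:]] = i
    print(t.with_prefix("NA"))  # ['NA', 'NANA']
    print(t.with_prefix("AN"))  # ['ANA', 'ANANA']
```

Since 'AN' and 'ANNA' are not suffixes of "BANANA" at all, a correct prefix lookup should never return them, which supports reading the reported output as a bug rather than a misunderstanding.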

Thanks!
Kevin


From mjldehoon at yahoo.com  Wed Jan 30 02:29:09 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 29 Jan 2013 18:29:09 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CADEGkF4LL=0_5uCtMMsBveM-ce1XkxPDcTASMhAUEXKiYFLY9A@mail.gmail.com>
Message-ID: <1359512949.16659.YahooMailClassic@web164002.mail.gq1.yahoo.com>

Hi Bow,

Thanks for the explanation.

> Indeed, the doctests functions are two simple small
> functions to make it easier to run doctests. The first
> one looks up the test directory (our Tests directory) and
> the second one simply executes the doctest.

The point of looking up the test directory is to find the example input files, right?
Have a look at Bio/Align/Applications/_Mafft.py.
Its doctest uses the complete path to the example input file:

https://github.com/biopython/biopython/commit/32a6beb1e039fa614398a7dee1c031466e8e42ed#Bio/Align/Applications/_Mafft.py

I like this solution better, since it's more straightforward, it doesn't need a new module, and also allows the user to run the example without having to figure out where the input file is located.

Best,
-Michiel.


From k.d.murray.91 at gmail.com  Wed Jan 30 03:37:46 2013
From: k.d.murray.91 at gmail.com (Kevin Murray)
Date: Wed, 30 Jan 2013 14:37:46 +1100
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <1359511232.14591.YahooMailClassic@web164002.mail.gq1.yahoo.com>
References: <CAKVJ-_7pCQj4C=8MXXq=8Cr6e1rNNDuNinCJqQDhADpV_muMSg@mail.gmail.com>
	<1359511232.14591.YahooMailClassic@web164002.mail.gq1.yahoo.com>
Message-ID: <CAH80STVHwLdBY5Ov4CuSBth4W2=ytRYHq2MB47=tdAQTfN66eg@mail.gmail.com>

Hi all,

Essentially, I agree with everything Bow and Lenna have said. If all
web-based tools are in a single root-level package, then with appropriate
documentation I think users should know where to find any function. People
are at least going to know if their required module interfaces with some
website.

I guess the problem is that moving all the web stuff into one package will
break a lot of code, which leads me back to my original idea of just copying
where stuff like TogoWS and ExPASy is located, i.e. sticking TAIR in the
root-level directory.

Peter and Michiel, do you think that Lenna's suggestion is workable? Would
it make sense to go all in and simultaneously refactor parsers into
Bio.parse, Bio.*IO into Bio.io.*, etc.? Perhaps this could be delayed
until the next major release (or form the beginnings of a biopython2
branch?).

Cheers,
Kevin Murray




From p.j.a.cock at googlemail.com  Wed Jan 30 08:52:24 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 30 Jan 2013 08:52:24 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359512949.16659.YahooMailClassic@web164002.mail.gq1.yahoo.com>
References: <CADEGkF4LL=0_5uCtMMsBveM-ce1XkxPDcTASMhAUEXKiYFLY9A@mail.gmail.com>
	<1359512949.16659.YahooMailClassic@web164002.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_5LZhd+s1+-E=TCuRzEEx-BDUFYqs6sGDm8aDgQsr+d6Q@mail.gmail.com>

On Wed, Jan 30, 2013 at 2:29 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Bow,
>
> Thanks for the explanation.
>
>> Indeed, the doctests functions are two simple small
>> functions to make it easier to run doctests. The first
>> one looks up the test directory (our Tests directory) and
>> the second one simply executes the doctest.
>
> The point of looking up the test directory is to find the
> example input files, right?

Yes. Most of the code is working out where our Test
directory is, without that it is just two lines:

import doctest
doctest.testmod()

> Have a look at Bio/Align/Applications/_Mafft.py.
> Its doctest uses the complete path to the example input file:
>
> https://github.com/biopython/biopython/commit/32a6beb1e039fa614398a7dee1c031466e8e42ed#Bio/Align/Applications/_Mafft.py
>
> I like this solution better, since it's more straightforward, it doesn't
> need a new module, and also allows the user to run the example
> without having to figure out where the input file is located.

That's a special case - the file being referred to isn't used
other than to print out a command line string. So it is fine.

The doctests we're talking about typically are for parsing,
and they need to find the file. In order to run via the main
test suite (run_tests.py) we can assume we are in the
Biopython Tests folder and therefore use relative paths.

Those relative paths won't work if trying to run the doctests
via the __name__ trick, thus the path magic which seemed
sensible to put in one place only.
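For readers unfamiliar with the "path magic" being discussed, here is a minimal sketch of what such a helper could look like. It is not the actual code in Bio/_utils.py; the names find_tests_dir and run_doctest_in_tests_dir are hypothetical, and it assumes the Tests folder sits in some parent directory of the module being edited, as in a Biopython source checkout.

```python
import doctest
import os


def find_tests_dir(module_file, tests_name="Tests"):
    """Walk up from module_file's folder until a child folder named tests_name exists.

    Hypothetical helper: assumes a source checkout laid out like
    biopython/Bio/... next to biopython/Tests/.
    """
    here = os.path.dirname(os.path.abspath(module_file))
    while True:
        candidate = os.path.join(here, tests_name)
        if os.path.isdir(candidate):
            return candidate
        parent = os.path.dirname(here)
        if parent == here:  # reached the filesystem root without finding it
            raise ValueError("No %r folder above %s" % (tests_name, module_file))
        here = parent


def run_doctest_in_tests_dir(module_file, module=None, verbose=False):
    """Run a module's doctests with the Tests folder as the working directory.

    module=None makes doctest.testmod() test the __main__ module, which is
    what you get when running e.g. "python Bio/SeqRecord.py".
    """
    tests_dir = find_tests_dir(module_file)
    old_dir = os.getcwd()
    os.chdir(tests_dir)  # relative paths in the doctests now resolve against Tests/
    try:
        return doctest.testmod(module, verbose=verbose)
    finally:
        os.chdir(old_dir)  # always restore the caller's working directory
```

A module would then end with `if __name__ == "__main__": run_doctest_in_tests_dir(__file__)`, which behaves the same whether it is started from ~/biopython, ~/biopython/Bio or ~/biopython/Tests.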

We can of course remove these __name__ trick conveniences,
they are only intended to make life easier for us developers
when editing the doctests of a module. But I think it is worth
having as a private function somewhere in the code base.

Regards,

Peter


From p.j.a.cock at googlemail.com  Wed Jan 30 09:31:31 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 30 Jan 2013 09:31:31 +0000
Subject: [Biopython-dev] New BioPython member
In-Reply-To: <1359494577.29159.YahooMailNeo@web122606.mail.ne1.yahoo.com>
References: <1359494577.29159.YahooMailNeo@web122606.mail.ne1.yahoo.com>
Message-ID: <CAKVJ-_4HQC5V59V=jy4cpkqAAs-8FxtbAsjph3tWXwRAMFAMyQ@mail.gmail.com>

On Tue, Jan 29, 2013 at 9:22 PM, Nate Sutton <natemsutton at yahoo.com> wrote:
> Dear all,
>
> I just recently joined the BioPython developers group and am
> looking forward to contributing to BioPython!  I have worked for a while
> in programming, genetics, and biology and have an M.S. in Biomedical
> Informatics.  After talking with some fellow contributors I have decided
> to try working on https://redmine.open-bio.org/issues/3360 but I will
> also work on writing some documentation on examples from the cookbook,
> especially if I am stuck on the bug.  If anyone wants to work on the same
> things, I'd be glad to hear that. I may be slow on the work because I am
> still learning Python after coming from other languages.
>
> -Nate

Hi Nate, and welcome.

Eric is in charge of the Bio.Phylo module, but within that the
command line application wrappers under Bio.Phylo.Applications
follow a pattern used elsewhere in Biopython.

To add a wrapper for fasttree http://www.microbesonline.org/fasttree/
have a look at the existing wrappers for PHYML and RAXML, defined in
Bio/Phylo/Applications/_Phyml.py and Bio/Phylo/Applications/_Raxml.py
(leading underscores mean private modules in Python), which are
exposed to the user via Bio/Phylo/Applications/__init__.py

In this case, I'd suggest putting the new wrapper in a new file,
Bio/Phylo/Applications/_Fasttree.py

Other similar wrappers exist under Bio.Emboss, Bio.Align, etc.
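The shared pattern behind those wrappers is a class that declares the tool's parameters once and renders them into a command line string. In Biopython the real wrappers subclass Bio.Application.AbstractCommandline with lists of _Option, _Switch and _Argument objects; the toy below deliberately avoids Biopython, so it only mirrors the idea. All class names here are hypothetical, and the FastTree flags (-nt, -gtr, -out) are assumptions based on FastTree's usage text that should be checked against its documentation.

```python
import shlex


class CommandlineSketch:
    """Toy version of the wrapper pattern: declared parameters -> command string.

    Each entry in `parameters` is (python_name, flag, takes_value); a flag of
    None marks a positional argument. Entries are emitted in declaration order.
    """

    parameters = ()

    def __init__(self, cmd, **kwargs):
        known = {name for name, _flag, _takes_value in self.parameters}
        unknown = set(kwargs) - known
        if unknown:
            raise ValueError("Unknown option(s) for %s: %s" % (cmd, ", ".join(sorted(unknown))))
        self.program_name = cmd
        self.values = dict(kwargs)

    def __str__(self):
        parts = [self.program_name]
        for name, flag, takes_value in self.parameters:
            if name not in self.values:
                continue  # unset parameters are simply left out
            value = self.values[name]
            if flag is None:
                parts.append(str(value))  # positional, e.g. the input file
            elif takes_value:
                parts.extend([flag, str(value)])  # option followed by its value
            elif value:
                parts.append(flag)  # boolean switch, only emitted when True
        # Quote each piece so file names containing spaces survive a shell.
        return " ".join(shlex.quote(part) for part in parts)


class FastTreeCommandlineSketch(CommandlineSketch):
    """Hypothetical FastTree wrapper; the flags are assumptions to verify."""

    parameters = (
        ("nt", "-nt", False),    # treat the alignment as nucleotides
        ("gtr", "-gtr", False),  # GTR model (nucleotide alignments)
        ("out", "-out", True),   # write the tree to this file, not stdout
        ("input", None, True),   # alignment file, positional and last
    )

    def __init__(self, cmd="FastTree", **kwargs):
        super().__init__(cmd, **kwargs)
```

For example, `str(FastTreeCommandlineSketch(input="aln.fasta", nt=True))` gives `FastTree -nt aln.fasta`. Keeping every flag declared in one place is what makes this style of wrapper easy to extend with new options later.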

Don't be shy about asking for guidance on this, or git and github.
Ultimately what I'm hoping you'll be able to do is take a fork (a personal
copy of the repository) on GitHub, create a new fasttree branch,
commit your enhancements, and make a pull request. If that's
all too much for now, simply writing the new file and letting us
do the git side would be fine.

Regards,

Peter


From p.j.a.cock at googlemail.com  Wed Jan 30 09:42:23 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 30 Jan 2013 09:42:23 +0000
Subject: [Biopython-dev] Trie with_prefix doesn't work as expected
In-Reply-To: <CAEe6yUEX_xbawFkP74-yKO+eTKo_mAU9SGO-+0PoRwU0mA8=vw@mail.gmail.com>
References: <CAEe6yUEX_xbawFkP74-yKO+eTKo_mAU9SGO-+0PoRwU0mA8=vw@mail.gmail.com>
Message-ID: <CAKVJ-_5XvS899Oi7=3oDMW713oHnz6nNvEx5C-rRMUa=aashvQ@mail.gmail.com>

On Wed, Jan 30, 2013 at 2:09 AM, Kevin Wu <kjwu at ucsd.edu> wrote:
> Hi All,
>
> I'm attempting to use the trie implementation in biopython to develop a
> suffix trie. I'm using the with_prefix function to find all keys which
> start with a sequence, however, the function doesn't return values that I
> expect. I tested it with the canonical example "banana" and am a bit
> confused.
>
> from Bio.trie import trie
> t = trie()
> s = "BANANA"
> for i in range(len(s)):  # insert all suffixes into trie
>     t[s[i:]] = i
>
> t.with_prefix("NA")  # this works as expected
>>> ['NA', 'NANA']
>
> t.with_prefix("AN")
>>> ['AN', 'ANNA']  # this doesn't work as expected
>                            # expected output: ["ANANA", "ANA"]
>
> Can anyone clarify my confusion or confirm this bug? I'm on Biopython 1.60,
> Linux Mint 64-bit.

There is certainly something odd happening. I'm testing with the
current code in git (pre-Biopython 1.61) under Mac OS X.

>>> from Bio.trie import trie
>>> t = trie()
>>> s = "BANANA"
>>> for i in range(len(s)):  # insert all suffixes into trie
...     t[s[i:]] = i
...     print "%s -> %i" % (s[i:], i)
...     assert t[s[i:]] == i
...
BANANA -> 0
ANANA -> 1
NANA -> 2
ANA -> 3
NA -> 4
A -> 5
>>> t.values()
[5, 3, 1, 0, 4, 2]
>>> t.keys()
['A', 'ANA', 'ANANA', 'BANANA', 'NA', 'NANA']

These look fine:

>>> t.with_prefix("NA")
['NA', 'NANA']
>>> t.with_prefix("A")
['A', 'ANA', 'ANANA']
>>> t.with_prefix("ANA")
['ANA', 'ANANA']

As you point out, this example seems wrong:

>>> t.with_prefix("AN")
['AN', 'ANNA']

The value 'ANNA' shouldn't be in the trie.
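To pin down what with_prefix ought to return, a tiny pure-Python trie can serve as a reference model. This is only a sketch, not Bio.trie (which is implemented in C), and the class name SimpleTrie is hypothetical. For the suffixes of BANANA it returns ['ANA', 'ANANA'] for the prefix AN, matching Kevin's expectation.

```python
class SimpleTrie:
    """Dict-of-dicts trie used as a reference model for with_prefix."""

    _END = object()  # sentinel key meaning "a stored key ends at this node"

    def __init__(self):
        self._root = {}

    def __setitem__(self, key, value):
        node = self._root
        for ch in key:
            node = node.setdefault(ch, {})
        node[self._END] = value

    def __getitem__(self, key):
        node = self._root
        for ch in key:
            node = node[ch]  # KeyError if no stored key has this prefix
        return node[self._END]  # KeyError if the prefix exists but is not a key

    def with_prefix(self, prefix):
        """Return the sorted list of stored keys starting with prefix."""
        # Walk down to the node for the prefix; no such node means no matches.
        node = self._root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        # Collect every stored key in the subtree below that node.
        found = []
        stack = [(node, prefix)]
        while stack:
            current, path = stack.pop()
            for label, child in current.items():
                if label is self._END:
                    found.append(path)
                else:
                    stack.append((child, path + label))
        return sorted(found)
```

A unit test can then compare any trie implementation against the brute-force answer `sorted(k for k in keys if k.startswith(prefix))` for a range of prefixes.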

Peter


From mjldehoon at yahoo.com  Wed Jan 30 10:20:53 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 30 Jan 2013 02:20:53 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CAKVJ-_5LZhd+s1+-E=TCuRzEEx-BDUFYqs6sGDm8aDgQsr+d6Q@mail.gmail.com>
Message-ID: <1359541253.85968.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Hi Peter,

--- On Wed, 1/30/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Those relative paths won't work if trying to run the
> doctests via the __name__ trick, thus the path magic which
> seemed sensible to put in one place only.

In which case won't they work? I tried this on SeqRecord.py, and as far as I can tell, the relative paths work fine also when running the doctests from the __name__=="__main__" block, both on Unix and Windows.

Best,
-Michiel


From p.j.a.cock at googlemail.com  Wed Jan 30 11:42:21 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 30 Jan 2013 11:42:21 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359541253.85968.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <CAKVJ-_5LZhd+s1+-E=TCuRzEEx-BDUFYqs6sGDm8aDgQsr+d6Q@mail.gmail.com>
	<1359541253.85968.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_7_RdLfTP4iAGwieR-dmFSRLx_euO0Xx-qk8cRzBsNzOg@mail.gmail.com>

On Wed, Jan 30, 2013 at 10:20 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Peter,
>
> --- On Wed, 1/30/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> Those relative paths won't work if trying to run the
>> doctests via the __name__ trick, thus the path magic which
>> seemed sensible to put in one place only.
>
> In which case won't they work? I tried this on SeqRecord.py,
> and as far as I can tell, the relative paths work fine also when
> running the doctests from the __name__=="__main__" block,
> both on Unix and Windows.

Yes, it works without any path magic IF you are in the Tests folder, e.g.

~/biopython/Tests $ emacs ../Bio/SeqRecord.py
~/biopython/Tests $ python ../Bio/SeqRecord.py

However for anything like the following convenient alternatives
to work and run the doctests, you need some path magic:

~/biopython $ emacs Bio/SeqRecord.py
~/biopython $ python Bio/SeqRecord.py

Or,

~/biopython/Bio $ emacs SeqRecord.py
~/biopython/Bio $ python SeqRecord.py

I felt having a central convenience function to make that work
was worthwhile in order to make working on doctests easier
without code duplication. I would accept that this alone does
not justify a whole module or file like Bio/_utils.py.

If you feel strongly about this, we can remove the function
run_doctest from Bio/_utils.py (it does after all serve no
real purpose in the installed library code), and just require
the current directory be the test folder.

Would you like me to make that change?

Regards,

Peter


From mjldehoon at yahoo.com  Wed Jan 30 12:10:17 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 30 Jan 2013 04:10:17 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CAKVJ-_7_RdLfTP4iAGwieR-dmFSRLx_euO0Xx-qk8cRzBsNzOg@mail.gmail.com>
Message-ID: <1359547817.36972.YahooMailClassic@web164001.mail.gq1.yahoo.com>

Hi Peter,

--- On Wed, 1/30/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> However for anything like the following convenient
> alternatives to work and run the doctests, you need
> some path magic:
> ~/biopython $ emacs Bio/SeqRecord.py
> ~/biopython $ python Bio/SeqRecord.py

Here I agree.

> Or,
> 
> ~/biopython/Bio $ emacs SeqRecord.py
> ~/biopython/Bio $ python SeqRecord.py
> 
Well I was thinking that the doctests in SeqRecord.py could use a relative path to the Tests directory, e.g. ../Tests/Quality/solexa_faked.fastq.
But I agree that this will fail again for any script in submodules.

Still I would think that there is a better way to do this, and I doubt that we are the first ones who want to access test files with doctests. I can write a short message to comp.lang.python to see if anybody has any suggestions.

Best,
-Michiel.


From arklenna at gmail.com  Wed Jan 30 17:10:40 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Wed, 30 Jan 2013 12:10:40 -0500
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <1359511232.14591.YahooMailClassic@web164002.mail.gq1.yahoo.com>
References: <CAKVJ-_7pCQj4C=8MXXq=8Cr6e1rNNDuNinCJqQDhADpV_muMSg@mail.gmail.com>
	<1359511232.14591.YahooMailClassic@web164002.mail.gq1.yahoo.com>
Message-ID: <CALfq9t++Co8TVunxk0J9JopMtmgvktiV3LKfAzU=7m=RhdMBFg@mail.gmail.com>

Michiel,

You raise an excellent point that separating the modules in this way will
complicate doctests.

Regarding point (2), is your primary concern namespace clutter or importing
efficiency?

I still maintain that the category of internet access is more fundamental
than the category of parsers. For point (1), if every database is accessed
using a WWW submodule, a user will know to look there.

Obviously moving everything would be a lot of work...

Cheers,

Lenna


On Tue, Jan 29, 2013 at 9:00 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> Bio.WWW was one of those modules that seemed a good idea at first, but then
> failed to gain general acceptance. There are three problems with Bio.WWW:
>
> 1) From the module name, it's not clear what you would find in it. For
> example, if you want to access the Entrez database, would you first look in
> Bio.Entrez or in Bio.WWW? Similarly for TAIR: Would you look for it in
> Bio.TAIR, or in Bio.WWW?
>
> 2) The modules in Bio.WWW don't have much to do with each other, except
> that they access the internet. But any given user probably is mainly
> interested in Entrez, or ExPASy, or some other database, not in all of them
> at the same time.
>
> 3) The flip side of this is that a user accessing e.g. ExPASy would have
> to import both Bio.WWW and Bio.ExPASy to be able to use ExPASy. Doctests
> get more complicated also, as they would span more than one module. Here is
> an example from Bio.Entrez that accesses the database, and then parses the
> results:
> >>> from Bio import Entrez
> >>> Entrez.email = "Your.Name.Here at example.org"
> >>> handle = Entrez.einfo() # or esearch, efetch, ...
> >>> record = Entrez.read(handle)
> >>> handle.close()
>
> The ultimate question is whether we organize the code in Biopython by
> their functionality from a user perspective, or by the kind of things they
> do? Almost all of Biopython is organized according to the former. For
> example, we don't have a Bio.Parsers module for all the parsers; similarly,
> we don't have Bio.WWW for internet access.
>
> Best,
> -Michiel.
>
>
> --- On Tue, 1/29/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> > From: Peter Cock <p.j.a.cock at googlemail.com>
> > Subject: Re: [Biopython-dev] Namespace for online resources?
> > To: "Wibowo Arindrarto" <w.arindrarto at gmail.com>
> > Cc: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
> > Date: Tuesday, January 29, 2013, 4:11 PM
> > On Tue, Jan 29, 2013 at 9:03 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > > On Tue, Jan 29, 2013 at 7:52 PM, Wibowo Arindrarto
> > > <w.arindrarto at gmail.com> wrote:
> > >> Hi everyone,
> > >>
> > >> Why was Bio.WWW deprecated in the first place?
> > >>
> > >
> > > The flippant answer is everything under Bio.WWW was moved
> > > or deprecated:
> > >
> > > http://lists.open-bio.org/pipermail/biopython-dev/2008-July/004059.html
> > >
> > > I'm trying to identify the discussions prior to that covering the moves:
> > >
> > > Bio.WWW.ExPASy -> Bio.ExPASy
> > > Bio.WWW.InterPro -> Bio.InterPro
> > > Bio.WWW.NCBI -> Bio.Entrez
> > > Bio.WWW.SCOP -> Bio.SCOP
> >
> > Probably this thread,
> >
> > http://lists.open-bio.org/pipermail/biopython-dev/2007-November/003241.html
> >
> > Also a bit more background on the NCBI Entrez side:
> >
> > http://lists.open-bio.org/pipermail/biopython-dev/2008-February/003423.html
> >
> > Peter


From w.arindrarto at gmail.com  Wed Jan 30 17:20:39 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 30 Jan 2013 18:20:39 +0100
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <1359511232.14591.YahooMailClassic@web164002.mail.gq1.yahoo.com>
References: <CAKVJ-_7pCQj4C=8MXXq=8Cr6e1rNNDuNinCJqQDhADpV_muMSg@mail.gmail.com>
	<1359511232.14591.YahooMailClassic@web164002.mail.gq1.yahoo.com>
Message-ID: <CADEGkF7u7O=ZzwuJ8ZQyySofWoQpFAKGS247usr_mJK2dZcJMA@mail.gmail.com>

Hi everyone,

Peter, thanks for the links to the archives, I'm starting to get a
grip on why Bio.WWW was deprecated in the first place.

Michiel, thanks for the explanation. My responses are below.

My reply is a bit long, so in the interest of brevity, I'll say first
that I'm in favor of putting TAIR in Bio.TAIR now, for practical
reasons and consistency with similar modules. But I do still have some
slight objections to this approach.

> Bio.WWW was one of those modules that seemed a good idea at first, but then failed to gain general acceptance. There are three problems with Bio.WWW:
>
> 1) From the module name, it's not clear what you would find in it. For example, if you want to access the Entrez database, would you first look in Bio.Entrez or in Bio.WWW? Similarly for TAIR: Would you look for it in Bio.TAIR, or in Bio.WWW?

This seems to be a naming issue, but it does not invalidate the idea
of having one central place for online access. I'll continue to refer
to this module as Bio.WWW here, but there may be other more suitable
names, such as Bio.remotedb, Bio.remote.db, Bio.www.db (or something
else), which would make the module a more intuitive place to look in,
right?

> 2) The modules in Bio.WWW don't have much to do with each other, except that they access the internet. But any given user probably is mainly interested in Entrez, or ExPASy, or some other database, not in all of them at the same time.

We could add a note to the documentation about this, right? If we are
worried about loading unnecessary modules, we can keep the __init__.py
in Bio.WWW empty, and have Entrez, ExPASy, and the others as submodules
inside Bio.WWW.

> 3) The flip side of this is that a user accessing e.g. ExPASy would have to import both Bio.WWW and Bio.ExPASy to be able to use ExPASy. Doctests get more complicated also, as they would span more than one module. Here is an example from Bio.Entrez that accesses the database, and then parses the results:
>>>> from Bio import Entrez
>>>> Entrez.email = "Your.Name.Here at example.org"
>>>> handle = Entrez.einfo() # or esearch, efetch, ...
>>>> record = Entrez.read(handle)
>>>> handle.close()

Since ExPASy's formats may be specific to them, I was thinking their
parsers should also go in Bio.WWW (in this case, Bio.WWW.ExPASy).

Note that at the moment we also have cases where the database entry
retriever and parser lie in different submodules of the code (e.g.
fetching FASTA records with Bio.Entrez and parsing them with Bio.SeqIO).
This is OK in my opinion, however, as FASTA is a widely used format not
exclusive to Entrez. But for exclusive formats like ExPASy's or
Entrez's, it makes sense to keep them in the same module as their
database entry retriever.

> The ultimate question is whether we organize the code in Biopython by their functionality from a user perspective, or by the kind of things they do? Almost all of Biopython is organized according to the former. For example, we don't have a Bio.Parsers module for all the parsers; similarly, we don't have Bio.WWW for internet access.

Hmm... those two points are not necessarily mutually exclusive, right? I
think having a centralized module for online access still makes for a
functional grouping based on a user's perspective.

In the parser's case, it makes sense to organize it the way we do now
as there are so many parsers. But for online access, I think it's
still manageable to put them in one directory. Just to throw the idea
around, we may also have subdirectories for different kinds of online
access (e.g. Bio.www.db for online database access, Bio.www.app for
online tools access like NCBI BLAST or HMMER).

This is not something urgent, but maybe worth thinking about / discussing :).

Cheers,
Bow


From mjldehoon at yahoo.com  Thu Jan 31 11:03:12 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 31 Jan 2013 03:03:12 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CADEGkF4LL=0_5uCtMMsBveM-ce1XkxPDcTASMhAUEXKiYFLY9A@mail.gmail.com>
Message-ID: <1359630192.62870.YahooMailClassic@web164001.mail.gq1.yahoo.com>

Dear all,

[Michiel wrote:]
> Still I would think that there is a better way to do this,
> and I doubt that we are the first ones who want to access
> test files with doctests. I can write a short message to
> comp.lang.python to see if anybody has any suggestions.

So I started writing a message to comp.lang.python, and while reading the doctest documentation to make my message understandable I realized that we can solve our problem by using the setUp and tearDown arguments to doctest.DocTestSuite. Then we put the test files in the same directory as the module we want to test, and use setUp/tearDown to let the unittest switch to this directory when needed.

This has the added benefit that the example files are easier to find for users who want to try out a doctest example.

Perhaps we'll still run into some issues if we try to implement this, but it seems a step in the right direction.
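A minimal sketch of that idea, using the setUp and tearDown arguments that doctest.DocTestSuite does accept. Each callable receives the doctest.DocTest being run, so the previous working directory can be stashed in its globs and restored afterwards. The helper names make_chdir_fixtures and doctest_suite_in_dir are hypothetical; target_dir could be the module's own folder (os.path.dirname(module.__file__)) or the Tests folder, whichever layout is settled on.

```python
import doctest
import os


def make_chdir_fixtures(target_dir):
    """Return (setUp, tearDown) callables for doctest.DocTestSuite.

    doctest calls both with the doctest.DocTest being run; its globs dict
    is a convenient place to remember where we started.
    """

    def set_up(test):
        test.globs["_saved_cwd"] = os.getcwd()
        os.chdir(target_dir)  # relative paths in the examples resolve here

    def tear_down(test):
        os.chdir(test.globs.pop("_saved_cwd"))  # back to the original folder

    return set_up, tear_down


def doctest_suite_in_dir(module, target_dir):
    """Hypothetical helper: a unittest suite of module's doctests run from target_dir."""
    set_up, tear_down = make_chdir_fixtures(target_dir)
    return doctest.DocTestSuite(module, setUp=set_up, tearDown=tear_down)
```

With something like this, run_tests.py could build one suite per module and never needs the current directory to be anything in particular when it starts.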

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Thu Jan 31 11:38:43 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 31 Jan 2013 11:38:43 +0000
Subject: [Biopython-dev] Trie with_prefix doesn't work as expected
In-Reply-To: <CAKVJ-_5XvS899Oi7=3oDMW713oHnz6nNvEx5C-rRMUa=aashvQ@mail.gmail.com>
References: <CAEe6yUEX_xbawFkP74-yKO+eTKo_mAU9SGO-+0PoRwU0mA8=vw@mail.gmail.com>
	<CAKVJ-_5XvS899Oi7=3oDMW713oHnz6nNvEx5C-rRMUa=aashvQ@mail.gmail.com>
Message-ID: <CAKVJ-_7i4+MY6OSYkm+_tqbV_ndwCvmGc=nW0gMG9PnoEobyGA@mail.gmail.com>

On Wed, Jan 30, 2013 at 9:42 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Wed, Jan 30, 2013 at 2:09 AM, Kevin Wu <kjwu at ucsd.edu> wrote:
>> Hi All,
>>
>> I'm attempting to use the trie implementation in biopython to develop a
>> suffix trie. I'm using the with_prefix function to find all keys which
>> start with a sequence, however, the function doesn't return values that I
>> expect. I tested it with the canonical example "banana" and am a bit
>> confused.
>>
>> from Bio.trie import trie
>> t = trie()
>> s = "BANANA"
>> for i in range(len(s)):  # insert all suffixes into trie
>>     t[s[i:]] = i
>>
>> t.with_prefix("NA")  # this works as expected
>>>> ['NA', 'NANA']
>>
>> t.with_prefix("AN")
>>>> ['AN', 'ANNA']  # this doesn't work as expected
>>                            # expected output: ["ANANA", "ANA"]
>>
>> Can anyone clarify my confusion or confirm this bug? I'm on Biopython 1.60,
>> Linux Mint 64-bit.
>
> There is certainly something odd happening. I'm testing with the
> current code in git (pre-Biopython 1.61) under Mac OS X.
>
>>>> from Bio.trie import trie
>>>> t = trie()
>>>> s = "BANANA"
>>>> for i in range(len(s)):  # insert all suffixes into trie
> ...     t[s[i:]] = i
> ...     print "%s -> %i" % (s[i:], i)
> ...     assert t[s[i:]] == i
> ...
> BANANA -> 0
> ANANA -> 1
> NANA -> 2
> ANA -> 3
> NA -> 4
> A -> 5
>>>> t.values()
> [5, 3, 1, 0, 4, 2]
>>>> t.keys()
> ['A', 'ANA', 'ANANA', 'BANANA', 'NA', 'NANA']
>
> These look fine:
>
>>>> t.with_prefix("NA")
> ['NA', 'NANA']
>>>> t.with_prefix("A")
> ['A', 'ANA', 'ANANA']
>>>> t.with_prefix("ANA")
> ['ANA', 'ANANA']
>
> As you point out, this example seems wrong:
>
>>>> t.with_prefix("AN")
> ['AN', 'ANNA']
>
> The value 'ANNA' shouldn't be in the trie.
>
> Peter

Thanks to Jeff Chang for a very speedy fix (sent as an attachment off list),
which I have applied to the repository:
https://github.com/biopython/biopython/commit/cd7cc7174fd4b0607381e9c58f6ae0d17cca8f74

I've also added a unit test based on Kevin's example:
https://github.com/biopython/biopython/commit/efc289c8fe2e78ad12481973e42554fa40f2ea0a

Thank you for reporting this Kevin.

Peter

P.S. Nice to hear from you again Jeff :)

I think your last commit was before we moved from CVS to git, please
let us know if you want commit access on github.


From p.j.a.cock at googlemail.com  Thu Jan 31 11:43:44 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 31 Jan 2013 11:43:44 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359630192.62870.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <CADEGkF4LL=0_5uCtMMsBveM-ce1XkxPDcTASMhAUEXKiYFLY9A@mail.gmail.com>
	<1359630192.62870.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_5Z3cGsKY8xAi_PCjPdwALPV3qg2wfihd9o1BOQE48eYQ@mail.gmail.com>

On Thu, Jan 31, 2013 at 11:03 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Dear all,
>
> [Michiel wrote:]
>> Still I would think that there is a better way to do this,
>> and I doubt that we are the first ones who want to access
>> test files with doctests. I can write a short message to
>> comp.lang.python to see if anybody has any suggestions.
>
> So I started writing a message to comp.lang.python, and while reading
> the doctest documentation to make my message understandable I
> realized that we can solve our problem by using the setUp and tearDown
> arguments to doctest.DocTestSuite. Then we put the test files in the same
> directory as the module we want to test, and use setUp/tearDown to let
> the unittest switch to this directory when needed.
>
> This has the added benefit that the example files are easier to find
> for users who want to try out a doctest example.
>
> Perhaps we'll still run into some issues if we try to implement this, but
> it seems a step in the right direction.

I don't follow what you are suggesting here. Are you suggesting putting
test files under Bio/* as well/instead or under Tests/* ?

Peter


From mjldehoon at yahoo.com  Thu Jan 31 13:46:47 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 31 Jan 2013 05:46:47 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CAKVJ-_5Z3cGsKY8xAi_PCjPdwALPV3qg2wfihd9o1BOQE48eYQ@mail.gmail.com>
Message-ID: <1359640007.58576.YahooMailClassic@web164005.mail.gq1.yahoo.com>

> I don't follow what you are suggesting here. Are you
> suggesting putting test files under Bio/* as well/instead
> or under Tests/* ?

Well the key point is that if we run the doctests from the Tests directory (with run_tests.py), we can change directory to the directory containing the module whose doctests we want to test. Then, if "python somemodule.py" can find the test files, then so can run_tests.py. We'd just need to make sure that the relative paths in somemodule.py are correct with respect to the directory in which somemodule.py resides.

But keep in mind that the unit tests in Tests and the doctests in the modules have different functions. The purpose of the unit tests is to test the Biopython code; the purpose of the doctests is to make sure the docstring examples work. So one could argue that the heavy test files should go under Tests, while simple test files just for the docstring examples should go under Bio/SomeModule.

Best,
-Michiel.

--- On Thu, 1/31/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> From: Peter Cock <p.j.a.cock at googlemail.com>
> Subject: Re: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Cc: "Wibowo Arindrarto" <w.arindrarto at gmail.com>, "BioPython-Dev Mailing List" <biopython-dev at biopython.org>
> Date: Thursday, January 31, 2013, 6:43 AM
> On Thu, Jan 31, 2013 at 11:03 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> > Dear all,
> >
> > [Michiel wrote:]
> >> Still I would think that there is a better way to do this,
> >> and I doubt that we are the first ones who want to access
> >> test files with doctests. I can write a short message to
> >> comp.lang.python to see if anybody has any suggestions.
> >
> > So I started writing a message to comp.lang.python, and while reading
> > the doctest documentation to make my message understandable I
> > realized that we can solve our problem by using the setUp and tearDown
> > arguments to doctest.DocTestSuite. Then we put the test files in the same
> > directory as the module we want to test, and use setUp/tearDown to let
> > the unittest switch to this directory when needed.
> >
> > This has the added benefit that the example files are easier to find
> > for users who want to try out a doctest example.
> >
> > Perhaps we'll still run into some issues if we try to implement this, but
> > it seems a step in the right direction.
>
> I don't follow what you are suggesting here. Are you suggesting putting
> test files under Bio/* as well/instead or under Tests/* ?
> 
> Peter
> 


From p.j.a.cock at googlemail.com  Thu Jan 31 14:26:50 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 31 Jan 2013 14:26:50 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359640007.58576.YahooMailClassic@web164005.mail.gq1.yahoo.com>
References: <CAKVJ-_5Z3cGsKY8xAi_PCjPdwALPV3qg2wfihd9o1BOQE48eYQ@mail.gmail.com>
	<1359640007.58576.YahooMailClassic@web164005.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_5c=FmHibGh++thBzPsrAYv+_fnjEYgHwBy74Zpvkf-Cw@mail.gmail.com>

On Thu, Jan 31, 2013 at 1:46 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>> I don't follow what you are suggesting here. Are you
>> suggesting putting test files under Bio/* as well/instead
>> or under Tests/* ?
>
> Well the key point is that if we run the doctests from the Tests directory
> (with run_tests.py), we can change directory to the directory containing
> the module whose doctests we want to test. Then, if "python somemodule.py"
> can find the test files, then so can run_tests.py. We'd just need to make
> sure that the relative paths in somemodule.py are correct with respect to
> the directory in which somemodule.py resides.

I can see how that would work - put all the path changing magic into
run_tests.py (before running the doctest for Bio/x/y/z.py change to
the directory Bio/x/y and so on), and have the Bio/x/y/z.py doctests
assume they will be run from Bio/x/y only.

> But keep in mind that the unit tests in Tests and the doctests in the modules
> have different functions. The purpose of the unit tests is to test the Biopython
> code; the purpose of the doctests is to make sure the docstring examples work.

Of course.

> So one could argue that the heavy test files should go under Tests, while
> simple test files just for the docstring examples should go under Bio/SomeModule.

Many of the unittests and doctests currently use the same example files.

However, my main objection is that I don't like the idea of putting test files
under Bio/* - I feel it should be the source code only (bar some special
cases like data files). There are probably packaging guidelines about this
somewhere... but I can't find anything immediately.

Regards,

Peter


From mjldehoon at yahoo.com  Thu Jan 31 15:33:35 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 31 Jan 2013 07:33:35 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
Message-ID: <1359646415.80564.YahooMailClassic@web164006.mail.gq1.yahoo.com>

--- On Thu, 1/31/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> However, my main objection is that I don't like the idea of
> putting test files under Bio/* 

I'm OK with using the setUp and tearDown arguments to doctest.DocTestSuite to do the directory magic, but keeping the test files under Tests/.

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Thu Jan 31 15:47:18 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 31 Jan 2013 15:47:18 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359646415.80564.YahooMailClassic@web164006.mail.gq1.yahoo.com>
References: <1359646415.80564.YahooMailClassic@web164006.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_6RR-gZ+oCmW+N2hBnutvk7OvwxiFDym_y5NZ+KGi0Sow@mail.gmail.com>

On Thu, Jan 31, 2013 at 3:33 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> --- On Thu, 1/31/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> However, my main objection is that I don't like the idea of
>> putting test files under Bio/*
>
> I'm OK with using the setUp and tearDown arguments to
> doctest.DocTestSuite to do the directory magic, but keeping the test files
> under Tests/.

As a more elegant version of the Bio._utils.run_doctest() function?

Peter