From b.invergo at gmail.com  Mon Oct  1 05:52:04 2012
From: b.invergo at gmail.com (Brandon Invergo)
Date: Mon, 01 Oct 2012 11:52:04 +0200
Subject: [Biopython-dev] PAML test problems under Python 3.3.0
In-Reply-To: <CAKVJ-_4DCG=_d097D=M5Ld1AthCVmZ50qixL4HR7OLOK68ZkuQ@mail.gmail.com>
References: <CAKVJ-_4DCG=_d097D=M5Ld1AthCVmZ50qixL4HR7OLOK68ZkuQ@mail.gmail.com>
Message-ID: <87k3vazfi3.fsf@invergo.net>

Yes, no problem, I can take a look at it. I'm completely swamped at the
moment, though, so I might have to put it off for a couple of days. If
it's an emergency, let me know.

-brandon


Peter Cock <p.j.a.cock at googlemail.com> writes:

> Hi Brandon (et al),
>
> Could you have a look at the PAML unit tests under Python 3.3 please?
> I see a mix of failures and 'blocking' under a self-compiled Python 3.3.0
> on Mac OS X 10.8 (Mountain Lion):
>
> $ python3 test_PAML_yn00.py
> testAlignmentExists (__main__.ModTest) ... ok
> testAlignmentFileIsValid (__main__.ModTest) ... FAIL
> testAlignmentSpecified (__main__.ModTest) ... ok
> testCtlFileExistsOnRead (__main__.ModTest) ... ok
> testCtlFileExistsOnRun (__main__.ModTest) ... ok
> testCtlFileValidOnRead (__main__.ModTest) ... ERROR
> testCtlFileValidOnRun (__main__.ModTest) ... ok
> testOptionExists (__main__.ModTest) ... ok
> testOutputFileSpecified (__main__.ModTest) ... ok
> testOutputFileValid (__main__.ModTest) ... ok
> testParseAllVersions (__main__.ModTest) ... ok
> testResultsExist (__main__.ModTest) ... ok
> testResultsParsable (__main__.ModTest) ... ok
> testResultsValid (__main__.ModTest) ... ^C
>
> $ python3 test_PAML_codeml.py
> testAlignmentExists (__main__.ModTest) ... ok
> testAlignmentFileIsValid (__main__.ModTest) ... FAIL
> testAlignmentSpecified (__main__.ModTest) ... ok
> testCtlFileExistsOnRead (__main__.ModTest) ... ok
> testCtlFileExistsOnRun (__main__.ModTest) ... ok
> testCtlFileValidOnRead (__main__.ModTest) ... ERROR
> testCtlFileValidOnRun (__main__.ModTest) ... ok
> testOptionExists (__main__.ModTest) ... ok
> testOutputFileSpecified (__main__.ModTest) ... ok
> testOutputFileValid (__main__.ModTest) ... ok
> testPamlErrorsCaught (__main__.ModTest) ... ok
> testParseAA (__main__.ModTest) ... ok
> testParseAAPairwise (__main__.ModTest) ... ok
> testParseAllNSsites (__main__.ModTest) ... ok
> testParseBranchSiteA (__main__.ModTest) ... ok
> testParseCladeModelC (__main__.ModTest) ... ok
> testParseFreeRatio (__main__.ModTest) ... ok
> testParseNSsite3 (__main__.ModTest) ... ok
> testParseNgene2Mgene02 (__main__.ModTest) ... ok
> testParseNgene2Mgene1 (__main__.ModTest) ... ok
> testParseNgene2Mgene34 (__main__.ModTest) ... ok
> testParsePairwise (__main__.ModTest) ... ok
> testParseSEs (__main__.ModTest) ... ok
> testResultsExist (__main__.ModTest) ... ok
> testResultsParsable (__main__.ModTest) ... ok
> testResultsValid (__main__.ModTest) ... ^C
>
> $ python3 test_PAML_baseml.py
> testAlignmentExists (__main__.ModTest) ... ok
> testAlignmentFileIsValid (__main__.ModTest) ... FAIL
> testAlignmentSpecified (__main__.ModTest) ... ok
> testCtlFileExistsOnRead (__main__.ModTest) ... ok
> testCtlFileExistsOnRun (__main__.ModTest) ... ok
> testCtlFileValidOnRead (__main__.ModTest) ... ERROR
> testCtlFileValidOnRun (__main__.ModTest) ... ok
> testOptionExists (__main__.ModTest) ... ok
> testOutputFileSpecified (__main__.ModTest) ... ok
> testOutputFileValid (__main__.ModTest) ... ok
> testPamlErrorsCaught (__main__.ModTest) ... ok
> testParseAllVersions (__main__.ModTest) ... ok
> testParseAlpha1Rho1 (__main__.ModTest) ... ok
> testParseModel (__main__.ModTest) ... ok
> testParseNhomo (__main__.ModTest) ... ok
> testParseSEs (__main__.ModTest) ... ok
> testResultsExist (__main__.ModTest) ... ok
> testResultsParsable (__main__.ModTest) ... ok
> testResultsValid (__main__.ModTest) ... ^C
>
> If you've not tried this before, the procedure I'm using is:
>
> $ python3 setup.py build
> $ cd build/py3.3/Tests
> $ python3 test_PAML_baseml.py
> etc
>
> The key point is that to run the tests directly (rather than
> just via 'python3 setup.py test') you must change
> directory to the 2to3 converted folder under the build
> folder.
>
> By commenting out the test methods which seem to be
> blocking, it seems some of the failures are to do with
> exception handling. I've not dug any further into this.
>
> Thanks,
>
> Peter

From bjoern at gruenings.eu  Mon Oct  1 17:44:10 2012
From: bjoern at gruenings.eu (Björn Grüning)
Date: Mon, 01 Oct 2012 23:44:10 +0200
Subject: [Biopython-dev] [Patch] Genbank Parser
In-Reply-To: <CAKVJ-_5nekcTBYejUTVV6VvjV+mB0WV0eoEWKytGZOTmgfmw1g@mail.gmail.com>
References: <1348837402.21455.1.camel@threonin>
	<CAKVJ-_5nekcTBYejUTVV6VvjV+mB0WV0eoEWKytGZOTmgfmw1g@mail.gmail.com>
Message-ID: <1349127850.19730.11.camel@threonin>

Hi Peter,

> >
> > the tbl2asn tool from the NCBI creates GenBank files that do not have a
> > version number. Unfortunately that version number is used to fill
> > consumer.data.id.
> > I implemented the following fall-back:
> > If there is no version information available, then it takes the
> > consumer.data.name for the consumer.data.id. Does that make sense?
> >
> > Thanks!
> > Bjoern
> 
> Can you share some example output from tbl2asn that shows
> this problem? Ideally something small we could include as a
> unit test.

Please find attached a small, stripped version of such a GenBank file.

Thanks,
Bjoern

> Thanks,
> 
> Peter

-------------- next part --------------
A non-text attachment was scrubbed...
Name: tbl1asn_output.gb
Type: application/x-gameboy-rom
Size: 5090 bytes
Desc: 
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20121001/1d6940cf/attachment.bin>

From p.j.a.cock at googlemail.com  Thu Oct  4 05:11:01 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 4 Oct 2012 10:11:01 +0100
Subject: [Biopython-dev] [Patch] Genbank Parser
In-Reply-To: <1349127850.19730.11.camel@threonin>
References: <1348837402.21455.1.camel@threonin>
	<CAKVJ-_5nekcTBYejUTVV6VvjV+mB0WV0eoEWKytGZOTmgfmw1g@mail.gmail.com>
	<1349127850.19730.11.camel@threonin>
Message-ID: <CAKVJ-_5Bb_QEAVmTZz_oHkKXbSBe2g86=ekVZ+Xtt326bbJQLQ@mail.gmail.com>

On Mon, Oct 1, 2012 at 10:44 PM, Björn Grüning <bjoern at gruenings.eu> wrote:
> Hi Peter,
>
>> >
>> > the tbl2asn tool from the NCBI creates GenBank files that do not have a
>> > version number. Unfortunately that version number is used to fill
>> > consumer.data.id.
>> > I implemented the following fall-back:
>> > If there is no version information available, then it takes the
>> > consumer.data.name for the consumer.data.id. Does that make sense?
>> >
>> > Thanks!
>> > Bjoern
>>
>> Can you share some example output from tbl2asn that shows
>> this problem? Ideally something small we could include as a
>> unit test.
>
> Please find attached a small, stripped version of such a GenBank file.
>
> Thanks,
> Bjoern

$ python
Python 2.7.2 (default, Jun 20 2012, 16:23:33)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import SeqIO
>>> r = SeqIO.read("tbl1asn_output.gb", "gb")
/Library/Python/2.7/site-packages/Bio/GenBank/__init__.py:1158:
BiopythonParserWarning: Expected sequence length 300246, found 2220
().
  BiopythonParserWarning)
>>> r.id
''
>>> r.name
'Seq1'
>>> r.description
'Glarea strain lozoyensis.'
>>> quit()

That warning is because this test file has only the start of the sequence
present, yet the LOCUS line still gives the original length.

$ head tbl1asn_output.gb
LOCUS       Seq1                  300246 bp    DNA     linear       10-MAY-2012
DEFINITION  Glarea strain lozoyensis.
ACCESSION
VERSION
KEYWORDS    .
SOURCE      Glarea
  ORGANISM  Glarea
            Unclassified.
REFERENCE   1
  AUTHORS   Test

I didn't use your patch - looking over the code, it was already intended
that if there was no record.id that record.name would be used. Sadly
this was a bit too strict about None versus an empty string, fixed:
https://github.com/biopython/biopython/commit/e67d22e4b4f344a5a3c15b6e939c82f58986d87f
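
The None-versus-empty-string issue can be illustrated with a minimal sketch. The function name `pick_record_id` and its arguments are hypothetical and used only for illustration; this is not the code from the commit above.

```python
def pick_record_id(version_id, name):
    """Return the record id, falling back to the name when no id is present.

    Hypothetical helper for illustration only.
    """
    # A check such as "version_id is None" would miss the empty string that
    # results from a blank VERSION line. Testing truthiness treats None and ""
    # alike, so both fall back to the name.
    if not version_id:
        return name
    return version_id
```

With the example file above, a blank VERSION line plus the LOCUS name `Seq1` would give `pick_record_id("", "Seq1") == "Seq1"`.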

Thanks for your help,

Peter


From chapmanb at 50mail.com  Thu Oct  4 21:02:06 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 04 Oct 2012 21:02:06 -0400
Subject: [Biopython-dev] TAIR/AGI support
In-Reply-To: <CAH80STVrvSnxp4JkgrZoywMQqiMg8t=nJtTcGnNggCe4k-Y4aQ@mail.gmail.com>
References: <CAH80STXOOUjqYcQ82C2C25-gACyzwx0D4-VD+CMTes90CdZbnw@mail.gmail.com>
	<87txvcx9ls.fsf@fastmail.fm>
	<CAH80STVrvSnxp4JkgrZoywMQqiMg8t=nJtTcGnNggCe4k-Y4aQ@mail.gmail.com>
Message-ID: <874nm9g29d.fsf@fastmail.fm>


Kevin;
Thanks for making this available. This looks like a great start and
seems like it would be a nice starting place for folks dealing with
Arabidopsis data. A couple of thoughts which you've essentially already
covered:

- Could you build up a small test suite that fits into the testing
  framework:

  http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc246

  You're probably the best person to pick some disparate IDs that exercise
  different components and try to catch any edge cases.

- Additional interfaces that help folks do more than get sequence are a
  great idea. The ideas you've proposed below sound perfect.

- Provide some documentation on the Cookbook for common use cases with
  Biopython + your module. This will help motivate the addition and also
  help folks test it out on their data.
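
As a rough idea of what a small test module for the first point might look like, here is a sketch using the standard `unittest` module. The helper `is_valid_agi` and the pattern `AGI_PATTERN` are hypothetical and not taken from Kevin's module; the pattern assumes AGI locus codes of the form `AT`, a chromosome code (1-5, C or M), `G`, and five digits.

```python
import re
import unittest

# Assumed AGI locus pattern: "AT", chromosome code (1-5, C or M), "G", five digits.
AGI_PATTERN = re.compile(r"^AT[1-5CM]G\d{5}$", re.IGNORECASE)


def is_valid_agi(identifier):
    """Return True if the string looks like an AGI locus code (hypothetical helper)."""
    return bool(AGI_PATTERN.match(identifier.strip()))


class AGIFormatTests(unittest.TestCase):
    """Checks a few disparate identifiers, including edge cases."""

    def test_nuclear_locus(self):
        self.assertTrue(is_valid_agi("AT1G01010"))

    def test_organelle_loci(self):
        # Chloroplast and mitochondrial loci use C and M as the chromosome code.
        self.assertTrue(is_valid_agi("ATCG00010"))
        self.assertTrue(is_valid_agi("ATMG00010"))

    def test_rejects_malformed(self):
        self.assertFalse(is_valid_agi("AT6G01010"))  # no chromosome 6
        self.assertFalse(is_valid_agi("AT1G0101"))   # only four digits
```

A file like this can be run with `python -m unittest` from the directory that contains it.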

Thanks again for making this available,
Brad


> Hi Brad,
>
> My TAIR/AGI script is on github here:
> https://github.com/kdmurray91/biopython/blob/master/Bio/TAIR/__init__.py
>
> I got it to work directly from TAIR's website, however it has not been
> rigorously tested. I plan on implementing the process as i described in my
> previous email, whereby it fetches the Genbank record from TOGOws or via
> NCBI's Efetch (using biopython's interfaces of course). I will keep you all
> posted.
>
> To the list in general, I'm open to suggestions on what to work on next?
>
>
> Regards
> Kevin Murray
>
>
> On 6 September 2012 10:45, Brad Chapman <chapmanb at 50mail.com> wrote:
>
>>
>> Kevin;
>> Thanks for the e-mail and offers of code. Always happy to have other
>> folks involved with the project.
>>
>> > What's the status of TAIR AGIs in BioPython (I can see no mention of
>> them,
>> > or support for them)? I've written a brief module which allows a user to
>> > query NCBI with a TAIR AGI, returning a Seq object (via Efetch). Is there
>> > any interest in including such functionality in BioPython?
>>
>> Is the code available on GitHub to get a better sense of all the
>> functionality it supports? Do you have an idea where it would fit best?
>> As a tair submodule inside of Bio.Entrez, or somewhere else?
>>
>> > More generally, are there any particular areas of BioPython development
>> > which could use an extra pair of hands?
>>
>> Following the mailing list for discussions on current projects is the
>> best way to get a sense of what different folks are working on. The
>> issue tracker also has open issues and features that could use attention
>> if anything there strikes your fancy:
>>
>> https://redmine.open-bio.org/projects/biopython
>>
>> Hope this helps,
>> Brad
>>
>>

From tiagoantao at gmail.com  Fri Oct  5 23:21:50 2012
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 5 Oct 2012 20:21:50 -0700
Subject: [Biopython-dev] Away Re: buildbot failure in Biopython on Windows
	XP - Python 2.5
Message-ID: <CAA9RGEPgJabH5mPrOB5M-AVx4-jrCM2SjwAgWUhg0Gb97vPAgw@mail.gmail.com>

I am currently away from the office. I will respond as soon as I return.

Regards,
Tiago

-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin

From chris.mit7 at gmail.com  Sun Oct  7 22:48:20 2012
From: chris.mit7 at gmail.com (Chris Mitchell)
Date: Sun, 7 Oct 2012 22:48:20 -0400
Subject: [Biopython-dev] Proteomics/Mass Spec in Biopython
Message-ID: <CAK_U6OBpNCYoSuq70wAokoqn78T8p3CAFgw+TTNt-ebdTGVj6Q@mail.gmail.com>

Hi everyone,

I recall some time ago there was an email about getting some mass spec
functionality within BioPython.  I started a BioPython branch to
incorporate some iterators for common file types.  Of note, there is an
iterator for .msf files created by Proteome Discoverer, which thankfully is
light-years faster than using PD (and much more forgiving on memory...).

It's located here:
https://github.com/chrismit/biopython/tree/Proteomics

It's following along the progression of my spectra viewer, which is hosted
on the same repository (anyone using Linux might want to take a look at it;
I couldn't find a spectra viewer I liked for Linux). As I generalize
more of the methods within that program I'll be adding them to the
BioPython branch. Also, I'll be putting in some methods to take care of
other common tasks such as FDR calculation from the input files.

I'd love to hear if anyone else wants to join up on this branch or provide
suggestions.

Chris

From redmine at redmine.open-bio.org  Wed Oct 10 09:02:23 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 10 Oct 2012 13:02:23 +0000
Subject: [Biopython-dev] [Biopython - Bug #3386] (New) NewickIO parse_tree
	is slow
Message-ID: <redmine.issue-3386.20121010130223@redmine.open-bio.org>


Issue #3386 has been reported by Aleksey Kladov.

----------------------------------------
Bug #3386: NewickIO parse_tree is slow
https://redmine.open-bio.org/issues/3386

Author: Aleksey Kladov
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


In the file NewickIO.py, the Parser class's _parse_subtree method seems to be inefficient in time and space. In fact, its running time is quadratic with respect to the size of the input, while it could be linear. The problem is that each symbol is read many (up to O(len(text))) times, for example here

<pre>
        for posn in range(1, close_posn):
            if text[posn] == '(':
                plevel += 1
            elif text[posn] == ')':
                plevel -= 1
            elif text[posn] == ',' and plevel == 0:
                subtrees.append(text[prev:posn])
                prev = posn + 1
</pre>

or here

<pre>
comment_start = text.find(NODECOMMENT_START)
</pre>

Also, _parse_subtree relies heavily on slices and strips of strings, which gives quadratic memory consumption.

Here is my dirty patched implementation. It's incomplete in many senses, I wrote it only to prove that parsing can be done faster.

For an unrooted binary tree with 15000 leaves it runs in 1 second, compared to 13 seconds for the current implementation.

<pre>
    def _parse_tree(self, text, rooted):
        """Parses the text representation into a Tree object."""
        # XXX Pass **kwargs along from Parser.parse?
        return Newick.Tree(root=self._parse_subtree_fast(text)[0], rooted=rooted)

    def _parse_subtree_fast(self, text):
        id = re.compile(r'[A-Za-z0-9_]+')
        children = []
        if text.startswith('('):
            text = text[1:]
            while True:
                child, text = self._parse_subtree_fast(text)
                children.append(child)
                if text.startswith(','):
                    text = text[1:]
                else:
                    text = text[1:]
                    break
        m = re.match(id, text)
        if m:
            clade = self._parse_tag(m.group())
            text = text[m.end():]
        else:
            clade = Newick.Clade(comment=None)
        clade.clades = children
        return clade, text
</pre>
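
The linear-time idea can also be sketched without any string slicing, by walking over the tokens once and keeping an explicit stack. This is a minimal sketch under stated assumptions, not the NewickIO implementation: the names `TOKEN` and `parse_newick` are hypothetical, the tree is returned as nested `(label, children)` tuples rather than Newick.Clade objects, and comments and quoted labels are not handled (a branch length such as `A:0.1` simply stays part of the label).

```python
import re

# One token is either a structural character or a run of label characters.
TOKEN = re.compile(r"[(),;]|[^(),;\s]+")


def parse_newick(text):
    """Parse a Newick string into nested (label, children) tuples in one pass."""
    stack = []   # child lists of the clades that are still open
    kids = []    # children of the clade currently being completed
    label = ""
    for match in TOKEN.finditer(text):
        tok = match.group()
        if tok == "(":
            stack.append([])                  # open a new internal clade
        elif tok == ",":
            stack[-1].append((label, kids))   # finish a sibling
            label, kids = "", []
        elif tok == ")":
            stack[-1].append((label, kids))   # finish the last sibling
            label, kids = "", stack.pop()     # its parent is now being completed
        elif tok == ";":
            break
        else:
            label = tok
    return (label, kids)
```

Each character is visited once by `finditer`, and the explicit stack avoids hitting the recursion limit on deep, unbalanced trees.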

PS. I don't know if someone really needs to parse huge trees with BioPython, but I need this feature for a couple of http://rosalind.info problems


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From kjwu at ucsd.edu  Wed Oct 10 17:27:19 2012
From: kjwu at ucsd.edu (Kevin Wu)
Date: Wed, 10 Oct 2012 14:27:19 -0700
Subject: [Biopython-dev] KEGG API Wrapper
Message-ID: <CAEe6yUE61E=ekS0zFGN-cUDw0-0+ExB-PGDwdXLMYgbQBPUnAA@mail.gmail.com>

Hi,

I've written a simple wrapper on top of KEGG's new REST API (
http://www.kegg.jp/kegg/docs/keggapi.html). The main functionality of this
module is that it can detect some invalid queries based on KEGG's defined
rules. I've implemented each of the examples given in the API docs as tests
as well. Here's a quick example of its usage.

The api call to http://rest.kegg.jp/get/hsa:10458+ece:Z5100/aaseq can be
done using the wrapper as:
KEGG.query("get", ["hsa:10458", "ece:Z5100"], "aaseq")
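
To make the mapping between the call and the URL concrete, here is a minimal sketch of how such a URL could be assembled. The function `build_query_url` is hypothetical and is not part of the wrapper; it only mirrors the URL layout shown above and performs no validation or network access.

```python
KEGG_REST = "http://rest.kegg.jp"


def build_query_url(operation, entries, option=None):
    """Assemble a KEGG REST URL (hypothetical helper, no validation)."""
    # Multiple database entries are joined with "+", as in the example URL.
    parts = [KEGG_REST, operation, "+".join(entries)]
    if option:
        parts.append(option)
    return "/".join(parts)


url = build_query_url("get", ["hsa:10458", "ece:Z5100"], "aaseq")
```

Here `url` is the same address as the example API call above.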

Querying the api works well with the current parsers written for KEGG
formats. Let me know if there are issues or if it's useful enough to be
merged into Biopython!

https://github.com/kevinwuhoo/biopython

Thanks!
Kevin

From mjldehoon at yahoo.com  Sat Oct 13 07:38:04 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 13 Oct 2012 04:38:04 -0700 (PDT)
Subject: [Biopython-dev] KEGG API Wrapper
In-Reply-To: <CAEe6yUE61E=ekS0zFGN-cUDw0-0+ExB-PGDwdXLMYgbQBPUnAA@mail.gmail.com>
Message-ID: <1350128284.4997.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Hi Kevin,

It would be great to have better KEGG support in Biopython, so I think that this is useful and could in principle be merged into Biopython. But before we do so, is there some documentation for your code (or even better, for the Bio.KEGG module as a whole)? Then it's easier to see how this module can be used, and to discuss any modifications.

Thanks for your contribution!
-Michiel.

--- On Wed, 10/10/12, Kevin Wu <kjwu at ucsd.edu> wrote:

> From: Kevin Wu <kjwu at ucsd.edu>
> Subject: [Biopython-dev] KEGG API Wrapper
> To: Biopython-dev at lists.open-bio.org
> Date: Wednesday, October 10, 2012, 5:27 PM
> Hi,
> 
> I've written a simple wrapper on top of KEGG's new REST API
> (
> http://www.kegg.jp/kegg/docs/keggapi.html). The main
> functionality of this
> module is that it can detect some invalid queries based on
> KEGG's defined
> rules. I've implemented each of the examples given on the
> api docs as tests
> as well. Here's a quick example of its usage.
> 
> The api call to http://rest.kegg.jp/get/hsa:10458+ece:Z5100/aaseq can
> be
> done using the wrapper as:
> KEGG.query("get", ["hsa:10458", "ece:Z5100"], "aaseq")
> 
> Querying the api works well with the current parsers written
> for KEGG
> formats. Let me know if there are issues or if it's useful
> enough to be
> merged into Biopython!
> 
> https://github.com/kevinwuhoo/biopython
> 
> Thanks!
> Kevin
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
> 

From chapmanb at 50mail.com  Mon Oct 15 11:02:12 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 15 Oct 2012 11:02:12 -0400
Subject: [Biopython-dev] BOSC/Broad Interoperability Hackathon: potential
	dates
Message-ID: <87ipabeq2z.fsf@fastmail.fm>


Hi all;
Open Bio regularly organizes hackathon coding sessions in conjunction
with the Bioinformatics Open Source Conference. The goal is to get
together biologists writing open source code, provide a room and
internet, and encourage fun collaborative coding. We've had successful
two day Codefests the past three years:

http://www.open-bio.org/wiki/Codefest_2012

This year, the Broad Institute kindly offered to host a two day
Hackathon in Boston during April. We've proposed three sets of dates:

April 4-5th, Thursday and Friday before Bio-IT
April 7-8th, Sunday and Monday before Bio-IT
April 22-23rd, Monday and Tuesday

If you have interest in attending, please fill out this Doodle poll to
let us know which dates work best:

http://doodle.com/aapy694g43e6ya4f

If you can find funds for travel and hotel (or are local to Boston), the
event is free and everyone is welcome. As we finalize dates, we'll send
around additional details. Thanks everyone,
Brad

From k.d.murray.91 at gmail.com  Mon Oct 15 23:49:22 2012
From: k.d.murray.91 at gmail.com (Kevin Murray)
Date: Tue, 16 Oct 2012 14:49:22 +1100
Subject: [Biopython-dev] TAIR/AGI support
In-Reply-To: <874nm9g29d.fsf@fastmail.fm>
References: <CAH80STXOOUjqYcQ82C2C25-gACyzwx0D4-VD+CMTes90CdZbnw@mail.gmail.com>
	<87txvcx9ls.fsf@fastmail.fm>
	<CAH80STVrvSnxp4JkgrZoywMQqiMg8t=nJtTcGnNggCe4k-Y4aQ@mail.gmail.com>
	<874nm9g29d.fsf@fastmail.fm>
Message-ID: <CAH80STXQNyPWYgk0mEWApd45Da1gmDHg05QXBmGjfkXeksc0EA@mail.gmail.com>

Brad,

I shall work on this as time permits, and get back to you all when complete.
Cheers,

Regards
Kevin Murray


On 5 October 2012 11:02, Brad Chapman <chapmanb at 50mail.com> wrote:

>
> Kevin;
> Thanks for making this available. This looks like a great start and
> seems like it would be a nice starting place for folks dealing with
> Arabidopsis data. A couple of thoughts which you've essentially already
> covered:
>
> - Could you build up a small test suite that fits into the testing
>   framework:
>
>   http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc246
>
>   You're probably the best person to pick some disparate IDs that exercise
>   different components and try to catch any edge cases.
>
> - Additional interfaces that help folks do more than get sequence are a
>   great idea. The ideas you've proposed below sound perfect.
>
> - Provide some documentation on the Cookbook for common use cases with
>   Biopython + your module. This will help motivate the addition and also
>   help folks test it out on their data.
>
> Thanks again for making this available,
> Brad
>
>
> > Hi Brad,
> >
> > My TAIR/AGI script is on github here:
> > https://github.com/kdmurray91/biopython/blob/master/Bio/TAIR/__init__.py
> >
> > I got it to work directly from TAIR's website, however it has not been
> > rigorously tested. I plan on implementing the process as i described in
> my
> > previous email, whereby it fetches the Genbank record from TOGOws or via
> > NCBI's Efetch (using biopython's interfaces of course). I will keep you
> all
> > posted.
> >
> > To the list in general, I'm open to suggestions on what to work on next?
> >
> >
> > Regards
> > Kevin Murray
> >
> >
> > On 6 September 2012 10:45, Brad Chapman <chapmanb at 50mail.com> wrote:
> >
> >>
> >> Kevin;
> >> Thanks for the e-mail and offers of code. Always happy to have other
> >> folks involved with the project.
> >>
> >> > What's the status of TAIR AGIs in BioPython (I can see no mention of
> >> them,
> >> > or support for them)? I've written a brief module which allows a user
> to
> >> > query NCBI with a TAIR AGI, returning a Seq object (via Efetch). Is
> there
> >> > any interest in including such functionality in BioPython?
> >>
> >> Is the code available on GitHub to get a better sense of all the
> >> functionality it supports? Do you have an idea where it would fit best?
> >> As a tair submodule inside of Bio.Entrez, or somewhere else?
> >>
> >> > More generally, are there any particular areas of BioPython
> development
> >> > which could use an extra pair of hands?
> >>
> >> Following the mailing list for discussions on current projects is the
> >> best way to get a sense of what different folks are working on. The
> >> issue tracker also has open issues and features that could use attention
> >> if anything there strikes your fancy:
> >>
> >> https://redmine.open-bio.org/projects/biopython
> >>
> >> Hope this helps,
> >> Brad
> >>
> >>
>

From zcharlop at mail.rockefeller.edu  Tue Oct 16 19:55:26 2012
From: zcharlop at mail.rockefeller.edu (Zachary Charlop-Powers)
Date: Tue, 16 Oct 2012 23:55:26 +0000
Subject: [Biopython-dev] KEGG API Wrapper
In-Reply-To: <1350128284.4997.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <1350128284.4997.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <C45C27DAD51E2647959A053EBBF42C86089480@RUMBX2.rockefeller.edu>

Kevin,
Michiel,


I just tested Kevin's code for a few simple queries and it worked great. I have always liked KEGG's organization of data and really appreciate this RESTful interface to their data; in some ways I think it is easier to use the web interfaces for KEGG than it is for NCBI. Plus the KEGG coverage of metabolic networks is awesome. I found the examples in Kevin's test script to be fairly self-explanatory, but a simple, spelled-out example in the Tutorial would be nice.

One thought, though, is that you can retrieve MANY different types of data from the KEGG REST API, which means that the user will probably have to parse the data themselves. Data retrieved with "list" can be lists of genes, compounds, or organisms, and after a cursory look these are each formatted differently. The same is true of the 'find' command. So I think you were right to leave out parsers, because I think they will be a moving target highly dependent on the query.

Thank You Kevin,
zach cp


On Oct 13, 2012, at 7:38 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

Hi Kevin,

It would be great to have better KEGG support in Biopython, so I think that this is useful and could in principle be merged into Biopython. But before we do so, is there some documentation for your code (or even better, for the Bio.KEGG module as a whole)? Then it's easier to see how this module can be used, and to discuss any modifications.

Thanks for your contribution!
-Michiel.

--- On Wed, 10/10/12, Kevin Wu <kjwu at ucsd.edu> wrote:

From: Kevin Wu <kjwu at ucsd.edu>
Subject: [Biopython-dev] KEGG API Wrapper
To: Biopython-dev at lists.open-bio.org
Date: Wednesday, October 10, 2012, 5:27 PM
Hi,

I've written a simple wrapper on top of KEGG's new REST API
(
http://www.kegg.jp/kegg/docs/keggapi.html). The main
functionality of this
module is that it can detect some invalid queries based on
KEGG's defined
rules. I've implemented each of the examples given on the
api docs as tests
as well. Here's a quick example of its usage.

The api call to http://rest.kegg.jp/get/hsa:10458+ece:Z5100/aaseq can
be
done using the wrapper as:
KEGG.query("get", ["hsa:10458", "ece:Z5100"], "aaseq")

Querying the api works well with the current parsers written
for KEGG
formats. Let me know if there are issues or if it's useful
enough to be
merged into Biopython!

https://github.com/kevinwuhoo/biopython

Thanks!
Kevin

Zach Charlop-Powers
Post-Doctoral Fellow
Laboratory of Genetically Encoded Small Molecules
Rockefeller University

zcharlop at rockefeller.edu


From p.j.a.cock at googlemail.com  Wed Oct 17 07:09:07 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Oct 2012 12:09:07 +0100
Subject: [Biopython-dev] KEGG API Wrapper
In-Reply-To: <C45C27DAD51E2647959A053EBBF42C86089480@RUMBX2.rockefeller.edu>
References: <1350128284.4997.YahooMailClassic@web164003.mail.gq1.yahoo.com>
	<C45C27DAD51E2647959A053EBBF42C86089480@RUMBX2.rockefeller.edu>
Message-ID: <CAKVJ-_7Ao1gdtF2_-7GH89qWGtseLVuJ4beB9bUpun5DLwcQsA@mail.gmail.com>

On Wed, Oct 17, 2012 at 12:55 AM, Zachary Charlop-Powers
<zcharlop at mail.rockefeller.edu> wrote:
> Kevin,
> Michiel,
>
> I just tested Kevin's code for a few simple queries and it worked great. I
> have always liked KEGG's organization of data and really appreciate this
> RESTful interface to their data; in some ways I think it is easier to use the
> web interfaces for KEGG than it is for NCBI. Plus the KEGG coverage of
> metabolic networks is awesome. I found the examples in Kevin's test script
> to be fairly self-explanatory, but a simple, spelled-out example in the
> Tutorial would be nice.
>
> One thought, though, is that you can retrieve MANY different types of data
> from the KEGG REST API, which means that the user will probably have to
> parse the data themselves. Data retrieved with "list" can be lists of
> genes, compounds, or organisms, and after a cursory look these are each
> formatted differently. The same is true of the 'find' command. So I think you
> were right to leave out parsers, because I think they will be a moving target
> highly dependent on the query.
>
> Thank You Kevin,
> zach cp

Good point about decoupling the web API wrapper and the parsers -
how the Bio.Entrez module and Bio.TogoWS handle this is to return
handles for web results, which you can then parse with an appropriate
parser (e.g. SeqIO for GenBank files, Medline parser, etc).

Note that this is a little more fiddly under Python 3 due to the text
mode distinction between unicode and binary... just something to
keep in the back of your mind.
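
A minimal sketch of the Python 3 wrinkle, using only the standard library: web responses arrive as bytes, while text parsers expect a handle that yields str. The helper name `as_text_handle` and the UTF-8 default are assumptions for illustration, and `io.BytesIO` stands in for a real network response.

```python
import io


def as_text_handle(binary_handle, encoding="utf-8"):
    """Wrap a binary handle so that reading from it yields str (hypothetical helper)."""
    return io.TextIOWrapper(binary_handle, encoding=encoding)


# BytesIO stands in for the binary handle a web request would return.
raw = io.BytesIO(b">seq1\nACGT\n")
handle = as_text_handle(raw)
lines = handle.read().splitlines()
```

The resulting text handle can then be passed to whichever parser suits the returned format.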

Peter

From redmine at redmine.open-bio.org  Wed Oct 17 09:27:18 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 17 Oct 2012 13:27:18 +0000
Subject: [Biopython-dev] [Biopython - Bug #3387] (New) Generic per column
	annotation from stockholm alignment are not stored in
	alignment object
Message-ID: <redmine.issue-3387.20121017132718@redmine.open-bio.org>


Issue #3387 has been reported by saverio vicario.

----------------------------------------
Bug #3387: Generic per column annotation from stockholm alignment are not stored in alignment object
https://redmine.open-bio.org/issues/3387

Author: saverio vicario
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


Stockholm format includes 4 types of annotations
#=GF <feature> <Generic per-File annotation, free text>
#=GC <feature> <Generic per-Column annotation, exactly 1 char per column>
#=GS <seqname> <feature> <Generic per-Sequence annotation, free text>
#=GR <seqname> <feature> <Generic per-Sequence AND per-Column markup, exactly 1 char per column>
GC and GF annotations are not picked up by AlignIO and are not supported in Bio.Align.MultipleSeqAlignment because no annotation is available at the alignment level. In fact, Bio.Align.MultipleSeqAlignment.annotations and Bio.Align.MultipleSeqAlignment.letter_annotations do not exist, only Bio.Align.MultipleSeqAlignment._annotations, which is generated from the annotations and letter_annotations of the individual records.

GC annotation in Stockholm files contains the quality score of the sites (columns of the alignment), which is quite an important parameter for deciding whether or not to trim the sites.
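
As an illustration of what collecting the per-column lines involves, here is a minimal sketch that gathers `#=GC` markup from the lines of a Stockholm file. The function name `read_gc_annotations` is hypothetical and this is not AlignIO code; it assumes the markup contains no whitespace and concatenates the pieces of interleaved blocks in file order.

```python
def read_gc_annotations(lines):
    """Collect #=GC per-column markup into a {feature: markup} dictionary."""
    columns = {}
    for line in lines:
        if not line.startswith("#=GC"):
            continue
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # malformed line without markup; skipped in this sketch
        _, feature, markup = fields
        # Interleaved files repeat the feature once per block, so append.
        columns[feature] = columns.get(feature, "") + markup.strip()
    return columns
```

Each value ends up with one character per alignment column, ready to be stored alongside the alignment.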


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Wed Oct 17 09:36:24 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 17 Oct 2012 13:36:24 +0000
Subject: [Biopython-dev] [Biopython - Bug #3387] Generic per column
	annotation from stockholm alignment are not stored in
	alignment object
References: <redmine.issue-3387.20121017132718@redmine.open-bio.org>
Message-ID: <redmine.journal-14973.20121017133624@redmine.open-bio.org>


Issue #3387 has been updated by Peter Cock.


The underlying alignment class would need a per-column-annotation dictionary (as well as an annotations dictionary, also on the TODO list), to match the per-letter-annotation and annotations dictionaries of the SeqRecord.

Parsing this and putting it in alignment._letter_annotation (dictionary as a private variable) would be a reasonable short term hack if you'd like to work on that.
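
As a rough illustration of that short-term approach, the sketch below collects the #=GC lines of a Stockholm file into a plain dictionary keyed by feature name. It is only a sketch under stated assumptions: the function name read_gc_annotations and the sample alignment are made up for this example, nothing here uses or modifies StockholmIO, and attaching the result to a private attribute of the alignment is left to the reader.

```python
def read_gc_annotations(lines):
    """Collect '#=GC <feature> <markup>' lines into a dict.

    Hypothetical helper, not part of Biopython. Markup for the same
    feature is concatenated so interleaved (multi-block) files work.
    """
    gc = {}
    for line in lines:
        if not line.startswith("#=GC "):
            continue
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # malformed line: a feature name without markup
        _, feature, markup = parts
        gc[feature] = gc.get(feature, "") + markup.strip()
    return gc


# Made-up sample data, one markup character per alignment column.
SAMPLE = """# STOCKHOLM 1.0
seq1 ACGU-ACGU
seq2 ACGUUACGU
#=GC SS_cons <<<<.>>>>
//
"""

column_annotations = read_gc_annotations(SAMPLE.splitlines())
```

Keeping the values as strings with one character per column mirrors how per-letter annotation strings are usually sliced, so trimming columns can be a plain string slice.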
----------------------------------------
Bug #3387: Generic per column annotation from stockholm alignment are not stored in alignment object
https://redmine.open-bio.org/issues/3387

Author: saverio vicario
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


From redmine at redmine.open-bio.org  Wed Oct 17 09:39:25 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 17 Oct 2012 13:39:25 +0000
Subject: [Biopython-dev] [Biopython - Feature #3388] (New) add annotation
	and letter_annotations attributed for
	Bio.Align.MultipleSeqAlignment. object
Message-ID: <redmine.issue-3388.20121017133925@redmine.open-bio.org>


Issue #3388 has been reported by saverio vicario.

----------------------------------------
Feature #3388: add annotation and letter_annotations attributed for Bio.Align.MultipleSeqAlignment. object
https://redmine.open-bio.org/issues/3388

Author: saverio vicario
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


At the moment I cannot add annotation at the alignment level. Annotations could be useful for tracking information linked to the loci (e.g. the name of a domain), while letter annotations could be useful for tracking the quality score of the alignment, or whether the sites belong to a given character set.
In particular, when two alignments are merged it would be useful to track the boundary of the merge.
For example, when merging an alignment a with 10 sites and an alignment b with 5 sites, the letter_annotations would be as follows:

{'locus1': '111111111100000', 'locus2': '000000000011111'}
This could also be useful for annotating the three codon positions:
{'pos1': '1001001001', 'pos2': '0100100100', 'pos3': '0010010010'}

If letter_annotations were supported, the annotation could be kept across merging and splitting of the alignment.
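
The masks described above can be sketched in plain Python. This is only an illustration of the idea: the helper names (merge_masks, codon_position_masks, slice_masks) are hypothetical and no such alignment-level attribute exists in Bio.Align.MultipleSeqAlignment at the time of writing.

```python
def merge_masks(len_a, len_b, name_a="locus1", name_b="locus2"):
    """Per-column masks marking which source alignment each site came from."""
    return {
        name_a: "1" * len_a + "0" * len_b,
        name_b: "0" * len_a + "1" * len_b,
    }


def codon_position_masks(n_sites):
    """Per-column masks for first, second and third codon positions."""
    masks = {}
    for offset in range(3):
        masks["pos%d" % (offset + 1)] = "".join(
            "1" if column % 3 == offset else "0" for column in range(n_sites)
        )
    return masks


def slice_masks(masks, start, end):
    """Slice every mask the same way the alignment columns are sliced."""
    return dict((name, mask[start:end]) for name, mask in masks.items())


merged = merge_masks(10, 5)
codons = codon_position_masks(10)
second_locus_only = slice_masks(merged, 10, 15)
```

Because every mask has exactly one character per column, splitting the merged alignment back at the boundary only needs the same slice applied to each mask.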




From redmine at redmine.open-bio.org  Wed Oct 17 11:00:15 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 17 Oct 2012 15:00:15 +0000
Subject: [Biopython-dev] [Biopython - Bug #3387] Generic per column
	annotation from stockholm alignment are not stored in
	alignment object
References: <redmine.issue-3387.20121017132718@redmine.open-bio.org>
Message-ID: <redmine.journal-14974.20121017150015@redmine.open-bio.org>


Issue #3387 has been updated by Peter Cock.


Depends on issue #3388, add annotation and letter_annotations attributed to Bio.Align.MultipleSeqAlignment object
https://redmine.open-bio.org/issues/3388
----------------------------------------
Bug #3387: Generic per column annotation from stockholm alignment are not stored in alignment object
https://redmine.open-bio.org/issues/3387

Author: saverio vicario
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


From redmine at redmine.open-bio.org  Thu Oct 18 07:02:49 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 18 Oct 2012 11:02:49 +0000
Subject: [Biopython-dev] [Biopython - Bug #3387] Generic per column
	annotation from stockholm alignment are not stored in
	alignment object
References: <redmine.issue-3387.20121017132718@redmine.open-bio.org>
Message-ID: <redmine.journal-14975.20121018110249@redmine.open-bio.org>


Issue #3387 has been updated by saverio vicario.

File diff_StockholmIO.py added
File StockholmIO.py added

This is my proposed patch for StockholmIO.
Attached you will find the new StockholmIO.py and a diff file against the old one.
To highlight the new comments further, I start each of them with #SV.

In summary, the patch implements the new attribute _letter_annotations for Bio.Align.MultipleSeqAlignment. The iterator stores the GC features in it, and the writer writes the GC features after all the sequence records, as stated in http://sonnhammer.sbc.su.se/Stockholm.html.

I added a new dictionary for GC and GF features using the PFAM standard; it is used in the writing phase to write only legitimate PFAM attributes. The only addition to the PFAM standard is the GC feature "RF", which is added by the HMMer3.0 software to indicate which sites were originally present in the profile used to generate the alignment.

I do not use the PFAM standard dictionary to translate the GF and GR attributes of alignment._annotations or the GC attributes in alignment._letter_annotations, as is done in the SeqRecord, for consistency with the decision originally taken for the GR attributes in alignment._annotations.

----------------------------------------
Bug #3387: Generic per column annotation from stockholm alignment are not stored in alignment object
https://redmine.open-bio.org/issues/3387

Author: saverio vicario
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


From p.j.a.cock at googlemail.com  Thu Oct 18 14:33:04 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 18 Oct 2012 19:33:04 +0100
Subject: [Biopython-dev] PyPy 1.8 support?
Message-ID: <CAKVJ-_61pBGwbFRaYB9UqWmtozZpZ_JStYdaKfzArGZn29RQ6w@mail.gmail.com>

Hello all,

We currently run the test suite against both PyPy 1.8 and
1.9 on Linux via the TravisCI.org continuous integration
testing service.

Is anyone actually using Biopython under PyPy 1.8?

If not, I intend to drop automated testing under PyPy 1.8
and focus just on PyPy 1.9 instead.

(Automated testing under C Python 2.5, 2.6, 2.7, 3.1 and
3.2 etc will continue - I'm hoping to add Python 3.3 as well)

Thanks,

Peter

From ben at benfulton.net  Thu Oct 18 23:16:45 2012
From: ben at benfulton.net (Ben Fulton)
Date: Thu, 18 Oct 2012 23:16:45 -0400
Subject: [Biopython-dev] Contributing startup
Message-ID: <CA+ijMsk_dCk0w+MGiAtzzE8rAqWAZ4BzDxwCA7yniF1CS-o4TQ@mail.gmail.com>

Hi,

I was looking for some introductory tickets or other methods to familiarize
myself with the Biopython codebase. I saw some suggestions on the wiki to
improve unit test coverage or to add additional file formats, which sounds
fine - are there particular areas of code that lack coverage, or file
formats that are particularly wanted? Or would it be better to look over
the issue tracker and try to identify some smallish issues?

Thanks for any suggestions.

Ben Fulton

From p.j.a.cock at googlemail.com  Fri Oct 19 03:52:19 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 19 Oct 2012 08:52:19 +0100
Subject: [Biopython-dev] PyPy 1.8 support?
In-Reply-To: <CAKVJ-_61pBGwbFRaYB9UqWmtozZpZ_JStYdaKfzArGZn29RQ6w@mail.gmail.com>
References: <CAKVJ-_61pBGwbFRaYB9UqWmtozZpZ_JStYdaKfzArGZn29RQ6w@mail.gmail.com>
Message-ID: <CAKVJ-_57=R6aMQSxndyVGtJtZ1O8_Q2kF2BPmTK-GyKVKhR_PA@mail.gmail.com>

On Thu, Oct 18, 2012 at 7:33 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hello all,
>
> We currently run the test suite against both PyPy 1.8 and
> 1.9 on Linux via the TravisCI.org continuous integration
> testing service.
>
> Is anyone actually using Biopython under PyPy 1.8?
>
> If not, I intend to drop automated testing under PyPy 1.8
> and focus just on PyPy 1.9 instead.

Done on TravisCI, but easy to revert:
https://github.com/biopython/biopython/commit/126c944812730df4677c8fa2f63abc29ddd084bb

One reason was the previous build failed due to a timeout
fetching PyPy for a custom install. Now we use the TravisCI
provided PyPy which should avoid that issue.

(It still happens for Jython sometimes).

Peter

From p.j.a.cock at googlemail.com  Fri Oct 19 04:26:35 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 19 Oct 2012 09:26:35 +0100
Subject: [Biopython-dev] Contributing startup
In-Reply-To: <CA+ijMsk_dCk0w+MGiAtzzE8rAqWAZ4BzDxwCA7yniF1CS-o4TQ@mail.gmail.com>
References: <CA+ijMsk_dCk0w+MGiAtzzE8rAqWAZ4BzDxwCA7yniF1CS-o4TQ@mail.gmail.com>
Message-ID: <CAKVJ-_6w14q3-6nq1QSs_yHXONh+CZDWk4YCbELrGfs6g8D3ug@mail.gmail.com>

On Fri, Oct 19, 2012 at 4:16 AM, Ben Fulton <ben at benfulton.net> wrote:
> Hi,
>
> I was looking for some introductory tickets or other methods to familiarize
> myself with the Biopython codebase. I saw some suggestions on the wiki to
> improve unit test coverage or to add additional file formats, which sounds
> fine - are there particular areas of code that lack coverage, or file
> formats that are particularly wanted? Or would it be better to look over
> the issue tracker and try to identify some smallish issues?
>
> Thanks for any suggestions.
>
> Ben Fulton

Hi Ben,

Welcome - more volunteer developers willing to help is always nice.

You asked about test coverage, and while I could guess about things,
what might be most interesting would be to try and measure this
using something like coverage or figleaf:
http://nedbatchelder.com/code/coverage/
http://darcs.idyll.org/~t/projects/figleaf/doc/

Another general area would be improving our support under
Python 3.

In terms of specific modules, is there anything in particular which
seems like a good match with your work/research interests?

Regards,

Peter

From p.j.a.cock at googlemail.com  Mon Oct 22 12:43:07 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 22 Oct 2012 17:43:07 +0100
Subject: [Biopython-dev] Low level string based FASTA parser
Message-ID: <CAKVJ-_7XXJqby4HBPAv7P-=fVBcKC98+ev+upB4Cd-6xmjw31A@mail.gmail.com>

Hello all,

Something I've wanted/needed recently was a low-level FASTA
iterating parser which just returns tuples of strings (without the
overhead of Bio.SeqIO building SeqRecords).

We don't currently have such a thing, so I have added one to the
SeqIO Fasta module (mirroring the low level string-tuple parser
for FASTQ files) with some associated unit tests and refactoring
(separate commits):

https://github.com/biopython/biopython/commit/751fe39765ca6ba60e517b3b4657718fd48f7817

Does anyone have any views on the name of this new
function, currently SimpleFastaParser, used as follows:

    >>> from Bio.SeqIO.FastaIO import SimpleFastaParser
    >>> with open("Fasta/dups.fasta") as handle:
    ...     for values in SimpleFastaParser(handle):
    ...         print(values)
    ('alpha', 'ACGTA')
    ('beta', 'CGTC')
    ('gamma', 'CCGCC')
    ('alpha (again - this is a duplicate entry to test the indexing code)', 'ACGTA')
    ('delta', 'CGCGC')

The capitalisation style is consistent with other functions in
SeqIO, but not with PEP8.
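
To make the idea of a string-tuple parser concrete, here is a minimal sketch of how such a generator can work. It is not the code from the commit above: the name fasta_title_seq_pairs and the sample data are invented for illustration, and details such as how blank lines or leading junk are treated are assumptions.

```python
import io


def fasta_title_seq_pairs(handle):
    """Yield (title, sequence) string tuples from a FASTA file handle.

    Hypothetical stand-in for a low-level parser; no SeqRecord objects
    are built, and only one record is held in memory at a time.
    """
    title = None
    chunks = []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:]
            chunks = []
        elif title is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line.replace(" ", ""))
    if title is not None:
        yield title, "".join(chunks)


# Made-up sample data with one wrapped sequence.
sample = io.StringIO(u">alpha\nACG\nTA\n>beta\nCGTC\n")
pairs = list(fasta_title_seq_pairs(sample))
```

Since it is a generator, callers that only need counts or lengths can iterate without ever loading the whole file, which is the contrast with quick_FASTA_reader mentioned below.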

Peter

P.S. I've also updated the legacy function quick_FASTA_reader
in Bio.SeqUtils to use this. Since it loads the whole dataset into
memory, if no one objects I would like to deprecate this old function.

From p.j.a.cock at googlemail.com  Mon Oct 22 13:08:47 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 22 Oct 2012 18:08:47 +0100
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_4PV3VMx5pju65578gq8TSN936T5ePH_cjhtUQcrECHYg@mail.gmail.com>
References: <CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<1346926418.97489.YahooMailClassic@web164004.mail.gq1.yahoo.com>
	<CAMC681n6=UuotEUdxGVEWDK4vPGd3=4O0yW82UQ3upTNMfy1iw@mail.gmail.com>
	<CAKVJ-_6rTsfqphX6i+YGA8ijLN+04kP+Gxk=BjwWCcXJtF97Vg@mail.gmail.com>
	<CAKVJ-_7-KXVZ96bHLG6XD88zcN9rPvnTf7yQ0E6J1jhb_5yx+g@mail.gmail.com>
	<CAKVJ-_6U0PrsTWM8sMPgsSX8cnfTandTGKz5j829K8so7whPgA@mail.gmail.com>
	<CAKVJ-_4PV3VMx5pju65578gq8TSN936T5ePH_cjhtUQcrECHYg@mail.gmail.com>
Message-ID: <CAKVJ-_5ZGQrj_xt4b5a7s7TrZ=7B-PXCJfBykHYNidE2L991jg@mail.gmail.com>

On Fri, Sep 28, 2012 at 11:50 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Sep 20, 2012 at 10:08 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> On Sun, Sep 16, 2012 at 1:34 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>>>
>>>> I guess we need to have a little hack with the 2to3 library and
>>>> try defining our own custom fixer for the imports...
>>>
>>> I've made a start at this - the easy part seems to work :)
>>>
>>> https://github.com/peterjc/biopython/commits/py3lower
>>>
>>> ...
>
> The code to do this lower case name mangling remains
> a quite spaghetti like mess in do2to3.py but it now works
> enough to pass the test suite (with some but not all 3rd
> party dependencies installed) under Linux and my Mac
> OS X machine (where like Windows I have a case
> insensitive file system).
>
> ...
>
> So this idea to adopt PEP8 lower case module names
> as part of supporting Python 3 appears to be technically
> viable.

Has anyone else tried this branch yet? Has the lower case
module names under Python 3 idea grown on anyone?
I think it makes sense in terms of a long term vision - I do
expect to be primarily working under Python 3 within a
couple of years.

It occurs to me we can make a partial step in this direction
with moving to a directory for Bio.Seq, since this could be
Bio.seq instead. For example, we talked about something
like this:

Bio.Seq -> Bio.seq
Bio.SeqRecord -> Bio.seq.record
Bio.SeqFeature -> Bio.seq.feature
Bio.SeqUtils -> Bio.seq.utils
Bio.SearchIO -> Bio.seq.search

I'm not 100% sure where the Bio.SeqIO top level functions
would belong, either directly under Bio.seq or Bio.seq.record
might work too.

We can have imports setup so that all the classes etc
are only defined once, e.g. Bio/seq/__init__.py could
initially just contain 'from Bio.Seq import *' and so on.

(We'd commit to maintaining the old namespace for
at least as long as our standard deprecation cycle,
longer ideally).
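
The "defined only once" point can be checked with a small self-contained sketch. The module names legacy_Seq and new_seq are stand-ins invented for this example (they play the roles of Bio.Seq and Bio/seq/__init__.py), and the modules are built in memory rather than on disk purely to keep the example runnable on its own.

```python
import sys
import types

# Stand-in for the existing module (think Bio.Seq). Hypothetical name.
legacy = types.ModuleType("legacy_Seq")


class Seq(object):
    def __init__(self, data):
        self.data = data


legacy.Seq = Seq
sys.modules["legacy_Seq"] = legacy
del Seq  # from here on the class is only reachable through the modules

# Stand-in for the new package's __init__.py, which only re-exports.
shim = types.ModuleType("new_seq")
exec("from legacy_Seq import *", shim.__dict__)
sys.modules["new_seq"] = shim

from legacy_Seq import Seq as OldSeq
from new_seq import Seq as NewSeq

same_class = OldSeq is NewSeq
```

Because both names point at one class object, isinstance checks and pickles keep working whichever namespace a script imports from during the deprecation period.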

Peter

From p.j.a.cock at googlemail.com  Mon Oct 22 13:17:34 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 22 Oct 2012 18:17:34 +0100
Subject: [Biopython-dev] Dropping Python 2.5 and Jython 2.5 support?
Message-ID: <CAKVJ-_60vgH_T8eqrfYKBScyRVNUApmFM32_4D+uTLABLy19Ng@mail.gmail.com>

Dear Biopythoneers,

Would anyone object to us preparing to drop support for Python 2.5 and
Jython 2.5, perhaps after the next Biopython release?

To reassure those of you using Jython, we'd wait until Jython 2.7 is out
first. Jython 2.7 is already in alpha, and brings support for C Python 2.7
language features.

Thanks,

Peter

From eric.talevich at gmail.com  Mon Oct 22 17:53:55 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Mon, 22 Oct 2012 17:53:55 -0400
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_5ZGQrj_xt4b5a7s7TrZ=7B-PXCJfBykHYNidE2L991jg@mail.gmail.com>
References: <CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<1346926418.97489.YahooMailClassic@web164004.mail.gq1.yahoo.com>
	<CAMC681n6=UuotEUdxGVEWDK4vPGd3=4O0yW82UQ3upTNMfy1iw@mail.gmail.com>
	<CAKVJ-_6rTsfqphX6i+YGA8ijLN+04kP+Gxk=BjwWCcXJtF97Vg@mail.gmail.com>
	<CAKVJ-_7-KXVZ96bHLG6XD88zcN9rPvnTf7yQ0E6J1jhb_5yx+g@mail.gmail.com>
	<CAKVJ-_6U0PrsTWM8sMPgsSX8cnfTandTGKz5j829K8so7whPgA@mail.gmail.com>
	<CAKVJ-_4PV3VMx5pju65578gq8TSN936T5ePH_cjhtUQcrECHYg@mail.gmail.com>
	<CAKVJ-_5ZGQrj_xt4b5a7s7TrZ=7B-PXCJfBykHYNidE2L991jg@mail.gmail.com>
Message-ID: <CAMC681=XpBsqO3ohLbDAPyqODCtXy614k=C_f9XfJaqn6xBUhg@mail.gmail.com>

On Mon, Oct 22, 2012 at 1:08 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Fri, Sep 28, 2012 at 11:50 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > On Thu, Sep 20, 2012 at 10:08 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> >> On Sun, Sep 16, 2012 at 1:34 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> >>>>
> >>>> I guess we need to have a little hack with the 2to3 library and
> >>>> try defining our own custom fixer for the imports...
> >>>
> >>> I've made a start at this - the easy part seems to work :)
> >>>
> >>> https://github.com/peterjc/biopython/commits/py3lower
> >>>
> >>> ...
> >
> > The code to do this lower case name mangling remains
> > a quite spaghetti like mess in do2to3.py but it now works
> > enough to pass the test suite (with some but not all 3rd
> > party dependencies installed) under Linux and my Mac
> > OS X machine (where like Windows I have a case
> > insensitive file system).
> >
> > ...
> >
> > So this idea to adopt PEP8 lower case module names
> > as part of supporting Python 3 appears to be technically
> > viable.
>
> Has anyone else tried this branch yet? Has the lower case
> module names under Python 3 idea grown on anyone?
> I think it makes sense in terms of a long term vision - I do
> expect to be primarily working under Python 3 within a
> couple of years.
>
> It occurs to me we can make a partial step in this direction
> with moving to a directory for Bio.Seq, since this could be
> Bio.seq instead. For example, we talked about something
> like this:
>
> Bio.Seq -> Bio.seq
> Bio.SeqRecord -> Bio.seq.record
> Bio.SeqFeature -> Bio.seq.feature
> Bio.SeqUtils -> Bio.seq.utils
> Bio.SearchIO -> Bio.seq.search
>
> I'm not 100% sure where the Bio.SeqIO top level functions
> would belong, either directly under Bio.seq or Bio.seq.record
> might work too.
>


Personally, I've used the variable name "seq" an awful lot, so I'm wary of
using "seq" as a module name. However, reasonable coding style could make
this easy to avoid if we have a "seq" module containing all of Seq,
SeqRecord and SeqFeature (maybe even Alphabet), and "sequtil" containing
standalone functions.

Result:

# Everything you need to build a new sequence record, but not much else
from Bio.seq import Seq, SeqRecord, SeqFeature

# Working with sequence strings
from Bio import sequtil

It also seems reasonable to treat molecular sequences as the implied core
object type at the top-level namespace. From that viewpoint, Bio.Search
would mean sequence search, as everything else is typically tucked away in
a sub-module like PDB (pdb?), Motif (motif), or Phylo (phylo); then it's
also fine to keep seqio and alignio directly under the Bio namespace.

(Given a clean slate, I'd prefer "from Bio import Seq, SeqRecord, SeqFeature",
but since those are already module names it would be brutal to make that
transition now.)


> We can have imports setup so that all the classes etc
> are only defined once, e.g. Bio/seq/__init__.py could
> initially just contain 'from Bio.Seq import *' and so on.
>
>
Sounds cool. We'll need to watch out for the PDB module, where classes and
modules have identical names, and the class names are imported to shadow
the module names at import time.

-Eric

From p.j.a.cock at googlemail.com  Mon Oct 22 18:59:21 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 22 Oct 2012 23:59:21 +0100
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAMC681=XpBsqO3ohLbDAPyqODCtXy614k=C_f9XfJaqn6xBUhg@mail.gmail.com>
References: <CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<1346926418.97489.YahooMailClassic@web164004.mail.gq1.yahoo.com>
	<CAMC681n6=UuotEUdxGVEWDK4vPGd3=4O0yW82UQ3upTNMfy1iw@mail.gmail.com>
	<CAKVJ-_6rTsfqphX6i+YGA8ijLN+04kP+Gxk=BjwWCcXJtF97Vg@mail.gmail.com>
	<CAKVJ-_7-KXVZ96bHLG6XD88zcN9rPvnTf7yQ0E6J1jhb_5yx+g@mail.gmail.com>
	<CAKVJ-_6U0PrsTWM8sMPgsSX8cnfTandTGKz5j829K8so7whPgA@mail.gmail.com>
	<CAKVJ-_4PV3VMx5pju65578gq8TSN936T5ePH_cjhtUQcrECHYg@mail.gmail.com>
	<CAKVJ-_5ZGQrj_xt4b5a7s7TrZ=7B-PXCJfBykHYNidE2L991jg@mail.gmail.com>
	<CAMC681=XpBsqO3ohLbDAPyqODCtXy614k=C_f9XfJaqn6xBUhg@mail.gmail.com>
Message-ID: <CAKVJ-_7spK5YYSZsoU1jqocYv2TPyCtHqUFokDae1esqfDbgTA@mail.gmail.com>

On Mon, Oct 22, 2012 at 10:53 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
>
> Personally, I've used the variable name "seq" an awful lot, so I'm wary of
> using "seq" as a module name. However, reasonable coding style could make
> this easy to avoid if we have a "seq" module containing all of Seq,
> SeqRecord and SeqFeature (maybe even Alphabet), and "sequtil" containing
> standalone functions.
>
> Result:
>
> # Everything you need to build a new sequence record, but not much else
> from Bio.seq import Seq, SeqRecord, SeqFeature

I'd been picturing:

from Bio.seq import Seq
from Bio.seq.record import SeqRecord
from Bio.seq.feature import SeqFeature

but you're right, those three classes could all be exposed at the level
of Bio.seq (while still having the SeqRecord defined in the file
Bio/seq/record.py and SeqFeature etc in Bio/seq/feature.py) for
convenience.

> # Working with sequence strings
> from Bio import sequtil

If you mean strings rather than Seq objects, currently Bio.SeqUtils
should mostly work on Seq or strings. It is kind of an odds and ends
module, rather than deliberately focusing on sequences as strings.

> It also seems reasonable to treat molecular sequences as the implied core
> object type at the top-level namespace. From that viewpoint, Bio.Search
> would mean sequence search, as everything else is typically tucked away in a
> sub-module like PDB (pdb?), Motif (motif), or Phylo (phylo); then it's also
> fine to keep seqio and alignio directly under the Bio namespace.

Having sequence stuff collected under Bio.Seq or Bio.seq (or bio.seq
if we go with the lower case plan for Python 3) seems more organised.
It also keeps the import times down for people not working with
sequences (e.g. a script using clustering or PDB files).

> (Given a clean slate, I'd prefer "from Bio import Seq, SeqRecord, SeqFeature", but
> since those are already module names it would be brutal to make that
> transition now.)

That isn't a good plan anyway in terms of polluting the namespace
and loading things into memory for anyone not working with sequences.

>> We can have imports setup so that all the classes etc
>> are only defined once, e.g. Bio/seq/__init__.py could
>> initially just contain 'from Bio.Seq import *' and so on.
>>
>
> Sounds cool. We'll need to watch out for the PDB module, where classes and
> modules have identical names, and the class names are imported to shadow the
> module names at import time.

The shadowing was one of the gotchas in the auto-conversion
of all the module names to lower case - but solvable. Adopting
lower case module names has the bonus of fixing this in the long
term.
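
For anyone who has not run into the shadowing gotcha, the following sketch reproduces it with a throwaway package written to a temporary directory. The package name demo_pkg and the class Atom are invented for the example; it only mimics the pattern described for the PDB module and does not import Bio.PDB.

```python
import importlib
import inspect
import os
import sys
import tempfile

# Build a tiny package where a module and the class inside it share a name.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "Atom.py"), "w") as handle:
    handle.write("class Atom(object):\n    pass\n")
with open(os.path.join(pkg_dir, "__init__.py"), "w") as handle:
    # Importing the class into the package namespace replaces the
    # attribute that would otherwise refer to the submodule.
    handle.write("from demo_pkg.Atom import Atom\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
import demo_pkg

attribute_is_class = inspect.isclass(demo_pkg.Atom)
module_still_loaded = inspect.ismodule(sys.modules["demo_pkg.Atom"])
```

With a lower case module name (atom.py holding class Atom) the two names no longer collide, which is the long-term fix referred to above.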

Peter

From kjwu at ucsd.edu  Wed Oct 24 18:38:04 2012
From: kjwu at ucsd.edu (Kevin Wu)
Date: Wed, 24 Oct 2012 15:38:04 -0700
Subject: [Biopython-dev] KEGG API Wrapper
In-Reply-To: <CAKVJ-_7Ao1gdtF2_-7GH89qWGtseLVuJ4beB9bUpun5DLwcQsA@mail.gmail.com>
References: <1350128284.4997.YahooMailClassic@web164003.mail.gq1.yahoo.com>
	<C45C27DAD51E2647959A053EBBF42C86089480@RUMBX2.rockefeller.edu>
	<CAKVJ-_7Ao1gdtF2_-7GH89qWGtseLVuJ4beB9bUpun5DLwcQsA@mail.gmail.com>
Message-ID: <CAEe6yUEbiK3tFdvx1hEGE2==QR7Pab2HcvL6x-CqOivWCB9=sg@mail.gmail.com>

Hi All,

Thanks for the comments. I've written a bit of documentation on the entire
KEGG module and have attached the relevant pages to this email. There
didn't seem to be an appropriate place for examples, so I just added a new
chapter. I've also committed the updated file to github.

I did leave out the parsers because the current parsers only cover a small
portion of the possible responses from the API. Also, I'm not confident
that some of the parsers correctly retrieve all the fields. However, I've
written a really general parser that does a rough job of retrieving fields
when a database format is returned, since I find myself reusing the code
for all database formats. It's possible to modify this to correctly account
for the different fields, but it would probably take a bit of work to
figure each field out manually. Otherwise it also parses the
tsv/flat file returned.
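
For readers wondering what a "really general" field parser might look like, here is a rough sketch. It is not the parser from the branch: the function name parse_flat_record is hypothetical, the sample record is made up for illustration, and the layout it relies on (a keyword in the first 12 columns, indented continuation lines, and a "///" terminator) is an assumption about the KEGG flat-file style.

```python
def parse_flat_record(lines, key_width=12):
    """Group the lines of one flat-file record by their field keyword.

    Hypothetical helper. Returns a dict mapping each keyword to the
    list of value strings found under it, without interpreting them.
    """
    record = {}
    key = None
    for line in lines:
        if line.startswith("///"):
            break  # assumed end-of-record marker
        if not line.strip():
            continue
        keyword = line[:key_width].strip()
        if keyword:
            key = keyword  # a new field starts on this line
        if key is None:
            continue  # continuation line before any keyword: skip it
        record.setdefault(key, []).append(line[key_width:].strip())
    return record


# Made-up sample record laid out with a 12-column keyword field.
sample_lines = [
    "ENTRY".ljust(12) + "C00031",
    "NAME".ljust(12) + "D-Glucose;",
    " " * 12 + "Grape sugar",
    "FORMULA".ljust(12) + "C6H12O6",
    "///",
]
fields = parse_flat_record(sample_lines)
```

Leaving every value as a list of raw strings keeps the parser independent of which database was queried; field-specific handling could then be layered on top for the fields that matter.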

Also, @zach, thanks for checking it out and testing it!

Thanks All!
Kevin

On Wed, Oct 17, 2012 at 4:09 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Wed, Oct 17, 2012 at 12:55 AM, Zachary Charlop-Powers
> <zcharlop at mail.rockefeller.edu> wrote:
> > Kevin,
> > Michiel,
> >
> > I just tested Kevin's code for a few simple queries and it worked great. I
> > have always liked KEGG's organization of data and really appreciate this
> > RESTful interface to their data; in some ways I think it is easier to use
> > the web interfaces for KEGG than it is for NCBI. Plus the KEGG coverage of
> > metabolic networks is awesome. I found the examples in Kevin's test script
> > to be fairly self-explanatory but a simple spelled-out example in the
> > Tutorial would be nice.
> >
> > One thought, though, is that you can retrieve MANY different types of data
> > from the KEGG REST API - which means that the user will probably have to
> > parse the data his/herself. Data retrieved with "list" can return lists of
> > genes or compounds or organisms, and after a cursory look these are each
> > formatted differently. This is also true of the 'find' command. So I think
> > you were right to leave out parsers because I think they will be a moving
> > target highly dependent on the query.
> >
> > Thank You Kevin,
> > zach cp
>
> Good point about decoupling the web API wrapper and the parsers -
> how the Bio.Entrez module and Bio.TogoWS handle this is to return
> handles for web results, which you can then parse with an appropriate
> parser (e.g. SeqIO for GenBank files, Medline parser, etc).
>
> Note that this is a little more fiddly under Python 3 due to the text
> mode distinction between unicode and binary... just something to
> keep in the back of your mind.
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: KEGG documentation.pdf
Type: application/pdf
Size: 128597 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20121024/3f7b7063/attachment-0001.pdf>

From cmccoy at fhcrc.org  Thu Oct 25 17:36:44 2012
From: cmccoy at fhcrc.org (Connor McCoy)
Date: Thu, 25 Oct 2012 14:36:44 -0700
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
	automated programs
Message-ID: <CAChfGK0C=-facUZeq_Aqd4LS=NFCE8iBmtcTt=OmL7-L62PUew@mail.gmail.com>

Hello,

About a year ago, pip support came up on the list:

http://biopython.org/pipermail/biopython-dev/2011-October/009234.html

I remember this being resolved, but when I try to install biopython with
pip, it fails:

    $ testenv/bin/pip install biopython

    Downloading/unpacking biopython
      Running setup.py egg_info for package biopython

        warning: no previously-included files matching '.cvsignore' found under directory '*'
        warning: no previously-included files matching '*.pyc' found under directory '*'
    Installing collected packages: biopython
      Running setup.py install for biopython

        Numerical Python (NumPy) is not installed.

        This package is required for many Biopython features.  Please install
        it before you install Biopython. You can install Biopython anyway, but
        anything dependent on NumPy will not work. If you do this, and later
        install NumPy, you should then re-install Biopython.

        You can find NumPy at http://numpy.scipy.org

        Complete output from command /home/cmccoy/development/seqmagick/testenv/bin/python -c "import setuptools;__file__='/home/cmccoy/development/seqmagick/testenv/build/biopython/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /tmp/pip-wc___H-record/install-record.txt --install-headers /home/cmccoy/development/seqmagick/testenv/include/site/python2.7:
        running install

    Numerical Python (NumPy) is not installed.

    This package is required for many Biopython features.  Please install
    it before you install Biopython. You can install Biopython anyway, but
    anything dependent on NumPy will not work. If you do this, and later
    install NumPy, you should then re-install Biopython.

    You can find NumPy at http://numpy.scipy.org

    ----------------------------------------
    Command /home/cmccoy/development/seqmagick/testenv/bin/python -c "import setuptools;__file__='/home/cmccoy/development/seqmagick/testenv/build/biopython/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /tmp/pip-wc___H-record/install-record.txt --install-headers /home/cmccoy/development/seqmagick/testenv/include/site/python2.7 failed with error code 255 in /home/cmccoy/development/seqmagick/testenv/build/biopython
    Storing complete log in /home/cmccoy/.pip/pip.log


Same for libraries which list biopython in `install_requires`.

Does anyone know of a way around this?

Thanks,
Connor

-- 
Connor McCoy
Fred Hutchinson Cancer Research Center
1100 Fairview Ave N.
Seattle, WA 98109-1924
cmccoy at fhcrc.org

From mjldehoon at yahoo.com  Thu Oct 25 22:52:42 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 25 Oct 2012 19:52:42 -0700 (PDT)
Subject: [Biopython-dev] KEGG API Wrapper
In-Reply-To: <CAEe6yUEbiK3tFdvx1hEGE2==QR7Pab2HcvL6x-CqOivWCB9=sg@mail.gmail.com>
Message-ID: <1351219962.39081.YahooMailClassic@web164002.mail.gq1.yahoo.com>

Hi Kevin,

Thanks for the documentation! That makes everything a lot clearer.
Overall I like the querying code and I think we should add it to Biopython.

I have a bunch of comments on the KEGG module, some on the existing code and some on the new querying code; see below. Most of these are trivial; some may need further discussion. Perhaps you could let us know which of these comments you can address, and which ones you want to skip for now?

Once we have converged on the querying code and the documentation, I think we can import your version of the KEGG module into the main Biopython repository and add your chapter on KEGG to the main documentation, and continue from there on the parsers and the unit tests.

Many thanks!
-Michiel.


About the querying code:
----------------------------------

I would replace KEGG.query("list", ...), KEGG.query("find", ...), KEGG.query("conv", ...), KEGG.query("link", ...), KEGG.query("info", ...), and KEGG.query("get", ...) by the functions KEGG.list, KEGG.find, KEGG.conv, KEGG.link, KEGG.info, and KEGG.get.

For list, find, conv, link, and info, instead of going through KEGG.generic_parser, I would return the result directly as a Python list.
In contrast, KEGG.get should return the handle to the results, not the data itself. So the _q function, instead of
    ...
    resp = urllib2.urlopen(req)
    data = resp.read()
    return query_url, data
should have
    ...
    resp = urllib2.urlopen(req)
    return resp
Then the user can decide whether to parse the data on the fly with Bio.KEGG, or read the data line by line and pick up what they are interested in, or get all data from the handle and save it in a file. Note that resp will have a .url attribute that contains the url, so you won't need the ret_url keyword.
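
The handle-returning design described above might look roughly like the sketch below. It is written for Python 3 (urllib.request rather than urllib2). The base URL constant, the function names kegg_url, kegg_get and kegg_list, and the injectable "opener" argument are assumptions made for illustration; they are not taken from Kevin's branch.

```python
from urllib.request import urlopen

# Assumed base URL of the KEGG REST service (not taken from the branch).
KEGG_BASE = "http://rest.kegg.jp"


def kegg_url(operation, *args):
    """Build the request URL, e.g. kegg_url("get", "hsa:10458")."""
    return "/".join([KEGG_BASE, operation] + list(args))


def kegg_get(*args, opener=urlopen):
    """Return the open handle; the caller decides how to read or parse it."""
    return opener(kegg_url("get", *args))


def kegg_list(*args, opener=urlopen):
    """Return a tab-separated 'list' response as a list of field lists."""
    handle = opener(kegg_url("list", *args))
    try:
        text = handle.read()
    finally:
        handle.close()
    if isinstance(text, bytes):  # urlopen yields bytes under Python 3
        text = text.decode("utf-8")
    return [line.split("\t") for line in text.splitlines() if line]
```

With the default opener, the object returned by kegg_get is the urlopen response, so its .url attribute is available to the caller and no separate return value for the URL is needed.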


About the parsers:
------------------------

I think that we should drop generic_parser. For list, find, conv, link, and info, parsing is trivial and can be done by the respective functions directly. For get, we already have an appropriate parser for some databases (compound, map, and enzyme), and it should be easy to add parsers for the other databases.

For all parsers in Biopython, there is the question of whether the record should store information in attributes (as is currently done in Bio.KEGG), or whether the record should inherit from a dictionary and store information under keys in the dictionary. Personally I have a preference for a dictionary, since that allows us to use exactly the same keys in the dictionary as are used in the file (e.g., we can use "CLASS" as a key, while we cannot use .class as an attribute since it is a reserved word, so we use .classname instead). But other Biopython developers may not agree with me, and to some extent it depends on personal preference.
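
To make the dictionary option concrete, here is a minimal sketch. The class name Record and the example values are illustrative assumptions only; this is not the current Bio.KEGG record class.

```python
class Record(dict):
    """Hypothetical KEGG record keyed by the field names used in the flat file."""


record = Record()
record["ENTRY"] = "EC 1.1.1.1"          # illustrative value
record["CLASS"] = ["Oxidoreductases"]   # 'record.class' would not even parse

# All the usual dictionary operations come for free.
fields = sorted(record)
```

The price is that typos in key names only show up at run time as a KeyError, whereas attribute names can at least be documented on the class.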

The parsers miss some key words. The ones I noticed are ALL_REAC, REFERENCE, and ORTHOLOGY. Probably we'll find more once we extend the unit tests.

Remove the ';' at the end of each term in record.classname.

Convert record.genes to a dictionary for each organism. So instead of
[('HSA', ['5236', '55276']), ('PTR', ['456908', '461162']), ('PON', ['100190836', '100438793']), ('MCC', ['100424648', '699401']...
have
{'HSA': ['5236', '55276'], 'PTR': ['456908', '461162'], 'PON': ['100190836', '100438793'], 'MCC': ['100424648', '699401'], ...

Also for record.dblinks, record.disease, record.structures, use a dictionary.
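
As a sketch of that conversion (the helper name genes_to_dict is made up for this example, and merging repeated organism codes is an assumption about what we would want):

```python
def genes_to_dict(pairs):
    """Turn [(organism, [gene ids]), ...] into {organism: [gene ids]}.

    If an organism code appears more than once, its gene lists are merged
    rather than overwritten.
    """
    result = {}
    for organism, gene_ids in pairs:
        result.setdefault(organism, []).extend(gene_ids)
    return result


genes = [
    ("HSA", ["5236", "55276"]),
    ("PTR", ["456908", "461162"]),
]
genes_by_organism = genes_to_dict(genes)
```

If organism codes are guaranteed to be unique, plain dict(genes) does the same job.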

In record.pathway, all entries start with 'PATH'. Perhaps we should check with KEGG whether there could be anything other than 'PATH' there; otherwise I don't see the reason why it's there. Assuming that there could be something different there, I would also use a dictionary with 'PATH' as the key.

In record.reaction, some chemical names can be very long and extend over multiple lines. In such cases, the continuation line starts with a '$'. The parser should remove the '$' and join the two lines.
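
A rough sketch of the joining step follows. The function name is hypothetical, and two details are assumptions: leading whitespace before the '$' is ignored, and the pieces are joined without inserting a space. Both would need checking against real KEGG files.

```python
def join_continuations(lines, marker="$"):
    """Merge lines that start with the continuation marker into the previous line."""
    joined = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(marker) and joined:
            # Drop the marker and glue the rest onto the preceding line.
            joined[-1] += stripped[len(marker):]
        else:
            joined.append(stripped)
    return joined
```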

About the tests:
--------------------

We should update the data files in Tests/KEGG. This will fix some "bugs" in these data files.

We should switch test_KEGG.py to the unit test framework.

We should do some more extensive testing to make sure we are not missing some key words.

About the documentation:
---------------------------------
It's great that we now have some documentation.

On page 233, I would suggest replacing "id_" with "accession" or something else, since the underscore in "id_" may look funky to new users.

Also, it may be better not to reuse variable names (e.g. "pathway" is used in three different ways in the example). That is OK in general, of course, but in this example it would be clearer to distinguish the different usages of this variable from each other.

For repair_genes, you can use a set instead of a list throughout.



From kai.blin at biotech.uni-tuebingen.de  Fri Oct 26 04:35:56 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 26 Oct 2012 10:35:56 +0200
Subject: [Biopython-dev] Status of SearchIO
Message-ID: <508A4B6C.6020801@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi folks,

In the summer, I've written a HMMer2 parser based on Bow's SearchIO
code. I'm finally getting around to continue work on the project I
needed this parser for, and I'm trying to get my code up-to-date.

I notice that Bow's code hasn't hit the biopython master tree yet, and
also doesn't rebase cleanly on top of it. A merge gives a couple of
merge conflicts, but seems manageable. However, I'd prefer to stick to
the upstream sources instead of maintaining my own branch containing
Bow's SearchIO code merged to master.

What's the chance of this happening any time soon, and can I help?

Cheers,
Kai

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQiktsAAoJEKM5lwBiwTTPuDMH/33PGo/zLpBGw+dKIBXZ9b9L
opaoI5uUsj4XzWU1A8u50BXFqa6ogwUWeZFaA2j25nQgEClWA5TFdHAJM4urTTgD
pM2g2rsL/yLSrVifM95c2IcRW2z7dunccpJDd6cc82BRpqqgGWrkNo7OSUk/exP3
DbfooBw66Scxt+6o6S9zEH4IY5giuDOGzwQm195TCaZ/x/8/y1F8Ub/8Aporbj47
eJgZmEKzh0k8KePKOdyCmnt/d/bDGplFSvgqXET6Q0jmVAG44lAU679UPCmNiuJr
VZD2SMRKy+Buy3TjJjQCeUEm+awN4T2LnPLDJgJkvRHjl6G+M9aljsuL78uCp9g=
=1Nrt
-----END PGP SIGNATURE-----

From p.j.a.cock at googlemail.com  Fri Oct 26 05:21:50 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 10:21:50 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <508A4B6C.6020801@biotech.uni-tuebingen.de>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
Message-ID: <CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>

On Fri, Oct 26, 2012 at 9:35 AM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi folks,
>
> In the summer, I've written a HMMer2 parser based on Bow's SearchIO
> code. I'm finally getting around to continue work on the project I
> needed this parser for, and I'm trying to get my code up-to-date.
>
> I notice that Bow's code hasn't hit the biopython master tree yet, and
> also doesn't rebase cleanly on top of it. A merge gives a couple of
> merge conflicts, but seems manageable. However, I'd prefer to stick to
> the upstream sources instead of maintaining my own branch containing
> Bow's SearchIO code merged to master.
>
> What's the chance of this happening any time soon, and can I help?
>
> Cheers,
> Kai

I'm not sure where the merge conflict is - Bow can probably help
and confirm you're looking at the appropriate branch.

What would help is comments on the name space ideas in this
thread, since one major point we need to settle ASAP is where
in the namespace SearchIO would go (since it probably won't
just stay as Bio.SearchIO as it is on the branch):

http://lists.open-bio.org/pipermail/biopython-dev/2012-September/009910.html
...
http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html
...

Peter

From w.arindrarto at gmail.com  Fri Oct 26 05:33:35 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 26 Oct 2012 11:33:35 +0200
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
	<CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
Message-ID: <CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>

Hi Kai, Peter,

For the merge conflict, which branch are you using? Can you point to
specific commits that cause the conflicts? I haven't tried merging /
rebasing my own branch to the current master myself ~ so knowing this
should help the process as well.

And suggestions are still welcome for the namespace :). Bio.SearchIO is
the current one, but we have other alternatives (the most recent one being
Bio.seq.search, following the Bio.Seq -> Bio.seq namespace change).

Also, I think there are still some issues that need to be dealt with before
we put SearchIO into master, notably with Bio.BLAST module. If not the
official deprecation notice, at least the tutorial has to be updated
(let Bio.BLAST readers know about the plan with SearchIO). I've written a
short tutorial here: http://bow.web.id/biopython/Tutorial.html. This is
still a draft, but you can already see that there are some obvious overlaps
between Bio.BLAST and Bio.SearchIO, which is confusing to new readers.

regards,
Bow

On Fri, Oct 26, 2012 at 11:21 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Fri, Oct 26, 2012 at 9:35 AM, Kai Blin
> <kai.blin at biotech.uni-tuebingen.de> wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Hi folks,
> >
> > In the summer, I've written a HMMer2 parser based on Bow's SearchIO
> > code. I'm finally getting around to continue work on the project I
> > needed this parser for, and I'm trying to get my code up-to-date.
> >
> > I notice that Bow's code hasn't hit the biopython master tree yet, and
> > also doesn't rebase cleanly on top of it. A merge gives a couple of
> > merge conflicts, but seems manageable. However, I'd prefer to stick to
> > the upstream sources instead of maintaining my own branch containing
> > Bow's SearchIO code merged to master.
> >
> > What's the chance of this happening any time soon, and can I help?
> >
> > Cheers,
> > Kai
>
> I'm not sure where the merge conflict is - Bow can probably help
> and confirm you're looking at the appropriate branch.
>
> What would help is comments on the name space ideas in this
> thread, since one major point we need to settle ASAP is where
> in the namespace SearchIO would go (since it probably won't
> just stay as Bio.SearchIO as it is on the branch):
>
>
> http://lists.open-bio.org/pipermail/biopython-dev/2012-September/009910.html
> ...
> http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html
> ...
>
> Peter

From p.j.a.cock at googlemail.com  Fri Oct 26 05:43:28 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 10:43:28 +0100
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <CAChfGK0C=-facUZeq_Aqd4LS=NFCE8iBmtcTt=OmL7-L62PUew@mail.gmail.com>
References: <CAChfGK0C=-facUZeq_Aqd4LS=NFCE8iBmtcTt=OmL7-L62PUew@mail.gmail.com>
Message-ID: <CAKVJ-_7fDh8QTzAZRQcsH5uZndt+a3N0uDEkpuzsrau=J3aLhA@mail.gmail.com>

On Thu, Oct 25, 2012 at 10:36 PM, Connor McCoy <cmccoy at fhcrc.org> wrote:
> Hello,
>
> About a year ago, pip support came up on the list:
>
> http://biopython.org/pipermail/biopython-dev/2011-October/009234.html
>
> I remember this being resolved, but when I try to install biopython with
> pip, it fails:
>
>     $ testenv/bin/pip install biopython
>
>     Downloading/unpacking biopython
>       Running setup.py egg_info for package biopython
>
>         warning: no previously-included files matching '.cvsignore' found under directory '*'
>         warning: no previously-included files matching '*.pyc' found under directory '*'
>     Installing collected packages: biopython
>       Running setup.py install for biopython
>
>         Numerical Python (NumPy) is not installed.
>
>         This package is required for many Biopython features.  Please install
>         it before you install Biopython. You can install Biopython anyway, but
>         anything dependent on NumPy will not work. If you do this, and later
>         install NumPy, you should then re-install Biopython.
>
>         You can find NumPy at http://numpy.scipy.org
>
>         Complete output from command /home/cmccoy/development/seqmagick/testenv/bin/python -c "import setuptools;__file__='/home/cmccoy/development/seqmagick/testenv/build/biopython/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /tmp/pip-wc___H-record/install-record.txt --install-headers /home/cmccoy/development/seqmagick/testenv/include/site/python2.7:
>         running install
>
>     Numerical Python (NumPy) is not installed.
>
>     This package is required for many Biopython features.  Please install
>     it before you install Biopython. You can install Biopython anyway, but
>     anything dependent on NumPy will not work. If you do this, and later
>     install NumPy, you should then re-install Biopython.
>
>     You can find NumPy at http://numpy.scipy.org
>
>     ----------------------------------------
>     Command /home/cmccoy/development/seqmagick/testenv/bin/python -c "import setuptools;__file__='/home/cmccoy/development/seqmagick/testenv/build/biopython/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /tmp/pip-wc___H-record/install-record.txt --install-headers /home/cmccoy/development/seqmagick/testenv/include/site/python2.7 failed with error code 255 in /home/cmccoy/development/seqmagick/testenv/build/biopython
>     Storing complete log in /home/cmccoy/.pip/pip.log
>
>
> Same for libraries which list biopython in `install_requires`.
>
> Does anyone know of a way around this?
>
> Thanks,
> Connor

Hi Connor,

This is probably a question for Brad - I don't use pip.

Was it sitting stalled at the prompt from Biopython's setup.py?
"Do you want to continue this installation? (y/N)" or from pip?
i.e. What was at the end of the complete log?

In terms of a quick workaround, what we use under TravisCI
(where most of the targets don't have numpy installed) is
piping a yes on stdin, e.g.

$ /usr/bin/yes | python setup.py install
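
If the install is driven from a Python script rather than a shell, much the same effect can be had by feeding answers on stdin. This is only a sketch: the helper names and the number of answers supplied are arbitrary choices for the example, not something Biopython provides.

```python
import subprocess
import sys


def setup_install_command():
    """Command line for 'python setup.py install' using the current interpreter."""
    return [sys.executable, "setup.py", "install"]


def run_answering_yes(command, answers=10):
    """Run command, supplying a few 'y' lines on stdin for any prompts."""
    return subprocess.run(
        command,
        input="y\n" * answers,
        universal_newlines=True,   # text mode for stdin/stdout
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )


# From the unpacked Biopython source directory one would call:
#     result = run_answering_yes(setup_install_command())
```

Unlike /usr/bin/yes this supplies only a fixed number of answers, which should be plenty for a single prompt.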

Peter

From p.j.a.cock at googlemail.com  Fri Oct 26 06:31:06 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 11:31:06 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <508A6535.6070507@biotech.uni-tuebingen.de>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
	<CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
	<CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
	<508A6535.6070507@biotech.uni-tuebingen.de>
Message-ID: <CAKVJ-_6pvuFqCc+EJVgP-GAnAN6XQ83Y7tqAE8NKU1j95qsEnA@mail.gmail.com>

On Fri, Oct 26, 2012 at 11:25 AM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
>> Also, I think there are still some issues that need to be dealt
>> with before we put SearchIO into master, notably with Bio.BLAST
>> module. If not the official deprecation notice, at least the
>> tutorial has to be updated (let Bio.BLAST readers know about the
>> plan with SearchIO). I've written a short tutorial here:
>> http://bow.web.id/biopython/Tutorial.html. This is still a draft,
>> but you can already see that there are some obvious overlaps
>> between Bio.BLAST and Bio.SearchIO, which is confusing to new
>> readers.
>
> Personally I wouldn't let this consideration block the inclusion of a
> module as useful like that. Of course I need this code, so I'm biased.

I'm also OK with merging the code before updating the Tutorial
chapter on BLAST (which would probably become a broader
chapter on BLAST and other tools using SearchIO). As discussed
before, the long term aim would be to remove Bio.BLAST.

> I'll have to read up on the namespace discussion. While I see the
> benefit of using PEP8 names, intuitively I don't like bio.seq.search
> much. Then again, I started my life in Bio* with BioPerl, and like the
> pretty similar module layout BioPython has so far.

Yeah - the current naming of SeqIO and AlignIO was directly
inspired by BioPerl, and gave us the working name of SearchIO.

Peter

From kai.blin at biotech.uni-tuebingen.de  Fri Oct 26 06:25:57 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 26 Oct 2012 12:25:57 +0200
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
	<CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
	<CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
Message-ID: <508A6535.6070507@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-10-26 11:33, Wibowo Arindrarto wrote:
> Hi Kai, Peter,
> 
> For the merge conflict, which branch are you using? Can you point
> to specific commits that cause the conflicts? I haven't tried
> merging / rebasing my own branch to the current master myself ~ so
> knowing this should help the process as well.

For merging, I think I had to change
.travis.yml
setup.py
and Tests/run_tests.py

.travis.yml and setup.py mainly had whitespace changes in comments, so
I just went with the version from master on those changes. As I said,
nothing really huge.

https://github.com/kblin/biopython/tree/searchio-merge is the merged tree.

The rebase had a number of things, I just gave up on that.

> Also, I think there are still some issues that need to be dealt
> with before we put SearchIO into master, notably with Bio.BLAST
> module. If not the official deprecation notice, at least the
> tutorial has to be updated (let Bio.BLAST readers know about the
> plan with SearchIO). I've written a short tutorial here:
> http://bow.web.id/biopython/Tutorial.html. This is still a draft, 
> but you can already see that there are some obvious overlaps
> between Bio.BLAST and Bio.SearchIO, which is confusing to new
> readers.

Personally I wouldn't let this consideration block the inclusion of a
module as useful like that. Of course I need this code, so I'm biased.

I'll have to read up on the namespace discussion. While I see the
benefit of using PEP8 names, intuitively I don't like bio.seq.search
much. Then again, I started my life in Bio* with BioPerl, and like the
pretty similar module layout BioPython has so far.

Cheers,
Kai

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQimU1AAoJEKM5lwBiwTTPLUsH/i1C1jWmSgjk3PZSOo2kpn4l
sGfonyZ7UcyOyM1RYMOc9xaJwevyGJbxVpdmhzIsCr8WZ2++uTgqwOKHROw84bu4
BfVTovUD3mNUK3kGEemOQQal8HyjTZozRFmPgQpSSTOOgQE964kA7mm2HJH9sNx9
NHUKj+dk7UwmbzETl2Q0/1lmxdptOVCTyQvwMzleCX4dwgdGumyrNiBQmBLerAKV
CRW8cVmVPKkVUokuzWpt6LPZIoUxMz5RVmTJktOX0fpg79ULfXQucByrGtGQbiSR
JMWGrK5yCliSz1WqV8r/Tx0VfPmEeiZFyzZb5KiAFE88sJK85cbFgUBegUTDZSU=
=372O
-----END PGP SIGNATURE-----

From w.arindrarto at gmail.com  Fri Oct 26 06:38:50 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 26 Oct 2012 12:38:50 +0200
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CAKVJ-_6pvuFqCc+EJVgP-GAnAN6XQ83Y7tqAE8NKU1j95qsEnA@mail.gmail.com>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
	<CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
	<CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
	<508A6535.6070507@biotech.uni-tuebingen.de>
	<CAKVJ-_6pvuFqCc+EJVgP-GAnAN6XQ83Y7tqAE8NKU1j95qsEnA@mail.gmail.com>
Message-ID: <CADEGkF5J0pHBoNaB1xKfQzwNYYzuKverV+1zt3EiEXVA0dEQKg@mail.gmail.com>

> >> Also, I think there are still some issues that need to be dealt
> >> with before we put SearchIO into master, notably with Bio.BLAST
> >> module. If not the official deprecation notice, at least the
> >> tutorial has to be updated (let Bio.BLAST readers know about the
> >> plan with SearchIO). I've written a short tutorial here:
> >> http://bow.web.id/biopython/Tutorial.html. This is still a draft,
> >> but you can already see that there are some obvious overlaps
> >> between Bio.BLAST and Bio.SearchIO, which is confusing to new
> >> readers.
> >
> > Personally I wouldn't let this consideration block the inclusion of a
> > module as useful like that. Of course I need this code, so I'm biased.
>
> I'm also OK with merging the code before updating the Tutorial
> chapter on BLAST (which would probably become a broader
> chapter on BLAST and other tools using SearchIO). As discussed
> before, the long term aim would be to remove Bio.BLAST.

Ah, ok then :). There are other things I'm still working on at the
moment (BLAST plain text writer, details about migrating from
Bio.Blast), but I consider these to be less urgent than the tutorial.
If everyone is ok with merging, then I'm good too :). I suppose we are
going to use the 'beta' new feature warning here, right?

> > I'll have to read up on the namespace discussion. While I see the
> > benefit of using PEP8 names, intuitively I don't like bio.seq.search
> > much. Then again, I started my life in Bio* with BioPerl, and like the
> > pretty similar module layout BioPython has so far.
>
> Yeah - the current naming of SeqIO and AlignIO was directly
> inspired by BioPerl, and gave us the working name of SearchIO.
>
> Peter

Reaching a unanimous decision on name preference seems difficult :/.
We now have:

1. Bio.seq.search (in line with the namespace change)
2. Bio.seqsearch (top-level module, separate from Bio.seq. This used
to be Bio.SeqSearch, now adjusted for PEP8 compliance)
3. Bio.search (same reasoning + explanation like Bio.seqsearch).
4. Bio.SearchIO / Bio.searchio
5. Bio.psearch (p for pairwise)

Any other suggestions? Should we put it to a vote?

regards,
Bowo

From p.j.a.cock at googlemail.com  Fri Oct 26 06:51:32 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 11:51:32 +0100
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <508A694B.7030800@biotech.uni-tuebingen.de>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
Message-ID: <CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>

On Fri, Oct 26, 2012 at 11:43 AM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
>
> Hi folks,
>
> I realize I'm late to this party, but I was asked to give an opinion
> in the SearchIO thread.
>
> On 2012-09-06 09:06, Peter Cock wrote:
>> For single user machines, where the single user has only a small
>> collection of scripts this isn't such an issue. For any shared
>> server, or user with lots of Biopython scripts (some of which may
>> have been written by different people), you would be forced into a
>> mass change at one go.
>>
>> You would also have considerable hassle later on with any attempt
>> to re-run old scripts.
>
> In my opinion, this is where python virtualenv [1] can really make
> life easier, and I'd recommend this for running old library versions
> anyway.
>
> I'd rather do the correct change now, for every version of python, and
> explain to people how to set up virtualenvs for their older scripts.

I don't think this is practical - you'd have a *lot* of explaining to do
for all the users who'd be bitten by such a big non-backward
compatible change (and associated systems administrators).

Indirectly it sounds like you like the lower case name idea - what
do you think about making this switch under Python 3? (This will
only inconvenience the relatively small number of early adopters
already trying Biopython under Python 3 - but it would be another
bump for people transitioning from Python 2 to 3).

Peter

From p.j.a.cock at googlemail.com  Fri Oct 26 06:57:16 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 11:57:16 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CADEGkF5J0pHBoNaB1xKfQzwNYYzuKverV+1zt3EiEXVA0dEQKg@mail.gmail.com>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
	<CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
	<CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
	<508A6535.6070507@biotech.uni-tuebingen.de>
	<CAKVJ-_6pvuFqCc+EJVgP-GAnAN6XQ83Y7tqAE8NKU1j95qsEnA@mail.gmail.com>
	<CADEGkF5J0pHBoNaB1xKfQzwNYYzuKverV+1zt3EiEXVA0dEQKg@mail.gmail.com>
Message-ID: <CAKVJ-_6Yaa0-xBbw5TgqMny9LbwpTJXG2X_dE2=ybcP_GFRvAg@mail.gmail.com>

On Fri, Oct 26, 2012 at 11:38 AM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
>> >> Also, I think there are still some issues that need to be dealt
>> >> with before we put SearchIO into master, notably with the Bio.BLAST
>> >> module. If not the official deprecation notice, at least the
>> >> tutorial has to be updated (let Bio.BLAST readers know about the
>> >> plan with SearchIO). I've written a short tutorial here:
>> >> http://bow.web.id/biopython/Tutorial.html. This is still a draft,
>> >> but you can already see that there are some obvious overlaps
>> >> between Bio.BLAST and Bio.SearchIO, which is confusing to new
>> >> readers.
>> >
>> > Personally I wouldn't let this consideration block the inclusion of a
>> > module as useful as that. Of course I need this code, so I'm biased.
>>
>> I'm also OK with merging the code before updating the Tutorial
>> chapter on BLAST (which would probably become a broader
>> chapter on BLAST and other tools using SearchIO). As discussed
>> before, the long term aim would be to remove Bio.BLAST.
>
> Ah, ok then :). There are other things I'm still working on at the
> moment (BLAST plain text writer, details about migrating from
> Bio.Blast), but I consider these to be less urgent than the tutorial.
> If everyone is ok for merging, then I'm good too :). I suppose we are
> going to use the 'beta' new feature warning here, right?

Yes to the 'beta' warning. I'd like to get some wider testing with
community feedback on the API, while giving us the option to
change it before declaring it stable.
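
A minimal sketch of how such a 'beta' warning could work, using only
the standard warnings module. The class name BetaFeatureWarning and
the helper warn_beta are hypothetical names chosen for illustration,
not the names used in the actual commit:

```python
import warnings


class BetaFeatureWarning(Warning):
    """Hypothetical category for code whose API may still change."""


def warn_beta(module_name):
    """Tell the caller that `module_name` is not yet declared stable."""
    warnings.warn(
        "%s is a 'beta' module; its API may change before it is "
        "declared stable." % module_name,
        BetaFeatureWarning,
        stacklevel=2,  # point at the importing code, not this helper
    )
```

With a dedicated category, anyone happy to track API changes could
silence it with warnings.simplefilter("ignore", BetaFeatureWarning).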

>> > I'll have to read up on the namespace discussion. While I see the
>> > benefit of using PEP8 names, intuitively I don't like bio.seq.search
>> > much. Then again, I started my life in Bio* with BioPerl, and like the
>> > pretty similar module layout BioPython has so far.
>>
>> Yeah - the current naming of SeqIO and AlignIO was directly
>> inspired by BioPerl, and gave the working name of SearchIO.
>>
>> Peter
>
> Reaching a unanimous decision on name preference seems difficult :/.
> We now have:
>
> 1. Bio.seq.search (in line with the namespace change)
> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used
> to be Bio.SeqSearch, now adjusted for PEP8 compliance)
> 3. Bio.search (same reasoning + explanation like Bio.seqsearch).
> 4. Bio.SearchIO / Bio.searchio
> 5. Bio.psearch (p for pairwise)
>
> Any other suggestions? Should we put it to a vote?

I'd like a consensus first on the larger question of should we
adopt lower case module names automatically under Python 3.
In that case, option (1) above would be bio.seq.search under
Python 3, and so on.

Peter

From kai.blin at biotech.uni-tuebingen.de  Fri Oct 26 06:43:23 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 26 Oct 2012 12:43:23 +0200
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
Message-ID: <508A694B.7030800@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-09-06 09:06, Peter Cock wrote:

Hi folks,

I realize I'm late to this party, but I was asked to give an opinion
in the SearchIO thread.

> For single user machines, where the single user has only a small
> collection of scripts this isn't such an issue. For any shared
> server, or user with lots of Biopython scripts (some of which may
> have been written by different people), you would be forced into a
> mass change at one go.
> 
> You would also have considerable hassle later on with any attempt
> to re-run old scripts.

In my opinion, this is where python virtualenv [1] can really make
life easier, and I'd recommend this for running old library versions
anyway.

I'd rather do the correct change now, for every version of python, and
explain to people how to set up virtualenvs for their older scripts.
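
As a rough sketch of what setting up such an environment involves,
assuming the standard library's venv module (Python 3.3+) as a
stand-in for the virtualenv tool linked below. The function name
make_legacy_env is hypothetical, and installing a pinned old release
into the environment is left as a separate step:

```python
import os
import venv


def make_legacy_env(env_dir):
    """Create an isolated environment for scripts tied to an old library."""
    # with_pip=False keeps this step offline; an old Biopython release
    # would then be installed into env_dir separately.
    venv.create(env_dir, with_pip=False)
    # pyvenv.cfg marks the directory as a virtual environment.
    return os.path.join(env_dir, "pyvenv.cfg")
```

Old scripts would then be run with the interpreter inside env_dir
rather than the system-wide one.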

Cheers,
Kai

[1] http://pypi.python.org/pypi/virtualenv

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQimlLAAoJEKM5lwBiwTTPsswIAMnEn4AT8xrfsq3xzkbB6tS2
y5FkLAb11xDP5PpttA+5qDXmnmJuMFqYq8FsSnJnpVq+ZGSAkswFC1prqQp57LdG
V+EVZtf/HDzepbrVgNYe272nTPlc6cxjmtjWJca19fg8gKI97ryUiji/bbOfgjgM
cnGHeUYkGmrcWrI8ergOS/5qLi3Z6S6t+uJezPT3DkbSm8oiOVAuPrIv6MziX69W
QrKF3Edf4s1Do4URSVfZI1qVUEGFaLZMYvZ8/TMgDI2CAQLo0r2OxylrjJxcuqIB
nORFTdwFMD7npDLkyG5U4eWZpfAV9A4RHNTybhpb7RgdVHifnoivA0nIAhsIAWE=
=3VH6
-----END PGP SIGNATURE-----

From kai.blin at biotech.uni-tuebingen.de  Fri Oct 26 08:21:21 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 26 Oct 2012 14:21:21 +0200
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
Message-ID: <508A8041.2020203@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-10-26 12:51, Peter Cock wrote:

Hi Peter,

> Indirectly it sounds like you like the lower case name idea - what 
> do you think about making this switch under Python 3? (This will 
> only inconvenience the relatively small number of early adopters 
> already trying Biopython under Python 3 - but it would be another 
> bump for people transitioning from Python 2 to 3).

Actually, as someone who has to switch between BioPython and BioPerl a
lot, I'd personally prefer if both libraries stayed as close as
possible in their structure. In my opinion, the ability to easily
switch between languages while using the Bio* libraries is one of the
biggest features. As far as I understand we're just changing module
names here, so all that'd be different would be the import lines.

After reading through this thread, I got the impression that there was
a general agreement on switching to PEP8-compatible names eventually,
and the remaining question was how to best do that.

I haven't played with Python 3 much yet, but I have the impression
that switching to it likely is going to be painful anyway. Even if the
module renaming makes the transition a bit more painful, at least
you've only got to go through the pain once.

Assuming the translations between the 2.x and 3.x names can be done
automatically by the conversion script, this sounds like a good idea.
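
A minimal sketch of what such an automatic translation could look
like for import lines, assuming the lowercase names bio.seqrecord and
bio.seqio floated in this thread. The RENAMES table and the function
rewrite_import are hypothetical, and a real conversion script would
need to cover many more import styles:

```python
import re

# Hypothetical old-name to new-name table, based on names from this thread.
RENAMES = {"Bio.SeqRecord": "bio.seqrecord", "Bio.SeqIO": "bio.seqio"}


def rewrite_import(line):
    """Rewrite one import line (no trailing newline) to lowercase names."""
    # Form 1: "from Bio.SeqRecord import SeqRecord"
    match = re.match(r"^(\s*)from (Bio\.\w+) import (.+)$", line)
    if match and match.group(2) in RENAMES:
        indent, module, names = match.groups()
        return "%sfrom %s import %s" % (indent, RENAMES[module], names)
    # Form 2: "from Bio import SeqIO"; alias keeps the old name usable.
    match = re.match(r"^(\s*)from Bio import (\w+)$", line)
    if match and "Bio." + match.group(2) in RENAMES:
        indent, name = match.groups()
        return "%sfrom bio import %s as %s" % (indent, name.lower(), name)
    # Anything else is left untouched.
    return line
```

Aliasing the lowercase module back to its old name means the rest of
the script would not need to change, only the import lines.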

Cheers,
Kai

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQioBBAAoJEKM5lwBiwTTPhxYIALTM1TQvOcE6upSFOCrfA0Uh
irgvsQi77JfWvDsvGnOk74+ZQDDM2KGGAR3s9QBPdjRtaXhxSvdSxlXq3sdTNsXh
VjbhEkeW6J3NzVSYbwK3U/mP0D9Xs6ihvnne06Nn7qjH+TLGm2x78cPM5SvjUcL3
QHiHda0wW479J9ZyKhmDTsCXqpX96uH3sjLiKZfs3KJbZ79j20BBWJqWypDuIUb7
DmtY/sngRsqs16yJL1Q35LXskOlCYsHOmJmkXg3Umr8gKOSw5nCEszhatXS3Oygo
Pv8F7exvoEfNHg1IQtmEFycou9k5IaGVsZoRhCE6YvUCJH4Zfz4eOUTD323AzT4=
=UPdn
-----END PGP SIGNATURE-----

From p.j.a.cock at googlemail.com  Fri Oct 26 08:42:25 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 13:42:25 +0100
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <508A8041.2020203@biotech.uni-tuebingen.de>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
Message-ID: <CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>

On Fri, Oct 26, 2012 at 1:21 PM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 2012-10-26 12:51, Peter Cock wrote:
>
> Hi Peter,
>
>> Indirectly it sounds like you like the lower case name idea - what
>> do you think about making this switch under Python 3? (This will
>> only inconvenience the relatively small number of early adopters
>> already trying Biopython under Python 3 - but it would be another
>> bump for people transitioning from Python 2 to 3).
>
> Actually, as someone who has to switch between BioPython and BioPerl a
> lot, I'd personally prefer if both libraries stayed as close as
> possible in their structure. In my opinion, the ability to easily
> switch between languages while using the Bio* libraries is one of the
> biggest features. As far as I understand we're just changing module
> names here, so all that'd be different would be the import lines.
>
> After reading through this thread, I got the impression that there was
> a general agreement on switching to PEP8-compatible names eventually,
> and the remaining question was how to best do that.

Yes - hindered by the fact that due to file system limitations we can't
have multiple capitalisations of a given module at the same time.
Ideally we'd like to use bio.* as the namespace, and making this
switch as part of moving to Python 3 is one way to do that.

My personal preference is for a new lowercase namespace like
biopy.* or biopython.* which can co-exist with Bio.* during a
transition period. However, this did not seem popular.

> I haven't played with Python 3 much yet, but I have the impression
> that switching to it likely is going to be painful anyway. Even if the
> module renaming makes the transition a bit more painful, at least
> you've only got to go through the pain once.
>
> Assuming the translations between the 2.x and 3.x names can be done
> automatically by the conversion script, this sounds like a good idea.

That was my thinking - but it does go against the general advice
to library authors in that API changes from Python 2.x to 3.x are
discouraged.

We can of course stick with Bio.* as it is (which I believe is Brad's
favoured option). And I'm OK with this - it is the simplest option
(and doesn't prevent us doing some more minor changes if we
want to, such as reorganising all the Bio.SeqXXXX modules
under one directory).

Perhaps a blog post & email to the announcement mailing list
soliciting feedback on this proposal is the best way forward,
perhaps with a web-survey form? e.g.

(1) Keep the namespace as 'Bio'

(2) Keep the namespace as 'Bio' on Python 2,
but adopt all lowercase module names on Python 3.

(3) Move to a new all lowercase namespace like 'biopy'
(anything except 'bio'), allowing the current 'Bio' namespace
to continue to be available as well during a transition period.

And the most disruptive option:

(4) Switch to an all lowercase namespace 'bio', which
cannot in general co-exist with the old 'Bio' namespace
(perhaps bumping the version number to 2.0.0?). This
would break legacy scripts, which would need to be
updated, e.g.:

from Bio.SeqRecord import SeqRecord
from Bio import SeqIO

could be replaced by:

try:
    #Biopython 1.x uses Bio.*
    from Bio.SeqRecord import SeqRecord
    from Bio import SeqIO
except ImportError:


This would mean under Windows and most Mac install
you cannot have both
you (and all other users of the machine) m
must be remove

Regards,

Peter

From p.j.a.cock at googlemail.com  Fri Oct 26 08:43:36 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 13:43:36 +0100
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
	<CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
Message-ID: <CAKVJ-_49AKymz+kB=1vfU2NX6WcuKaeODnH9m1h2OXt2FjqMTQ@mail.gmail.com>

Arg - accidentally tabbed to the send button while trying to indent
sample code...

On Fri, Oct 26, 2012 at 1:42 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> Perhaps a blog post & email to the announcement mailing list
> soliciting feedback on this proposal is the best way forward,
> perhaps with a web-survey form? e.g.
>
> (1) Keep the namespace as 'Bio'
>
> (2) Keep the namespace as 'Bio' on Python 2,
> but adopt all lowercase module names on Python 3.
>
> (3) Move to a new all lowercase namespace like 'biopy'
> (anything except 'bio'), allowing the current 'Bio' namespace
> to continue to be available as well during a transition period.
>
> And the most disruptive option:
>
> (4) Switch to an all lowercase namespace 'bio', which
> cannot in general co-exist with the old 'Bio' namespace
> (perhaps bumping the version number to 2.0.0?). This
> would break legacy scripts, which would need to be
> updated, e.g.:
>
> from Bio.SeqRecord import SeqRecord
> from Bio import SeqIO
>
> could be replaced by:


try:
     #Biopython 1.x uses Bio.*
     from Bio.SeqRecord import SeqRecord
     from Bio import SeqIO
except ImportError:

>
>
>
>
> This would mean under Windows and most Mac install
> you cannot have both
> you (and all other users of the machine) m
> must be remove
>
> Regards,
>
> Peter

From p.j.a.cock at googlemail.com  Fri Oct 26 08:50:23 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 13:50:23 +0100
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_49AKymz+kB=1vfU2NX6WcuKaeODnH9m1h2OXt2FjqMTQ@mail.gmail.com>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
	<CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
	<CAKVJ-_49AKymz+kB=1vfU2NX6WcuKaeODnH9m1h2OXt2FjqMTQ@mail.gmail.com>
Message-ID: <CAKVJ-_484A0E4-cHYE2XT7FtDp04b8BW_QA89NTdKHNHskPWMw@mail.gmail.com>

On Fri, Oct 26, 2012 at 1:43 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Arg - accidentally tabbed to the send button while trying to indent
> sample code...

Has something changed on GoogleMail's keyboard handling?
Either that or I'm having a bad typing day... my apologies for
the two extra emails.

To continue:

Perhaps a blog post & email to the announcement mailing list
soliciting feedback on this proposal is the best way forward,
perhaps with a web-survey form? e.g.

(1) Keep the namespace as 'Bio'

(2) Keep the namespace as 'Bio' on Python 2,
but adopt all lowercase module names on Python 3.

(3) Move to a new all lowercase namespace like 'biopy'
(anything except 'bio'), allowing the current 'Bio' namespace
to continue to be available as well during a transition period.

And the most disruptive option:

(4) Switch to an all lowercase namespace 'bio', which
cannot in general co-exist with the old 'Bio' namespace
(perhaps bumping the version number to 2.0.0?). This
would break legacy scripts, which would need to be
updated, e.g.:

from Bio.SeqRecord import SeqRecord
from Bio import SeqIO

could be replaced by:

try:
    # Biopython 1.x uses Bio.*
    from Bio.SeqRecord import SeqRecord
    from Bio import SeqIO
except ImportError:
    # Fall back to the new lowercase module names
    from bio.seqrecord import SeqRecord
    from bio import seqio as SeqIO

Users on Windows and most Mac users might find updating
Biopython complicated during this transition due to the
change in case of the folder names. For anyone installing
from source this might require manual removal of the old
folders (I ran into this kind of issue while trying the lower
case naming under Python 3).

Potentially under Linux (and any Mac using a case sensitive
file system) an old Biopython install using Bio/ and the newer
Biopython using bio/ could co-exist... we would have to look
at that.
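
Whether Bio/ and bio/ could sit side by side depends on the file
system holding the install. A small sketch of one way to probe that
from Python, using only the standard library (the function name
is_case_sensitive is hypothetical):

```python
import os
import tempfile


def is_case_sensitive(directory):
    """Return True if the file system holding `directory` distinguishes case."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        os.mkdir(os.path.join(tmp, "Bio"))
        # On a case-insensitive file system, "bio" resolves to the "Bio"
        # folder just created, so the lowercase path appears to exist.
        return not os.path.exists(os.path.join(tmp, "bio"))
```

This would be pointed at the site-packages directory in question; a
False result means the old and new folder names would collide there.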

Regards,

Peter

From kai.blin at biotech.uni-tuebingen.de  Fri Oct 26 09:34:12 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 26 Oct 2012 15:34:12 +0200
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
	<CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
Message-ID: <508A9154.8020507@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-10-26 14:42, Peter Cock wrote:

> My personal preference is for a new lowercase namespace like 
> biopy.* or biopython.* which can co-exist with Bio.* during a 
> transition period. However, this did not seem popular.

That'd still mean older scripts would break after the transition
period, and we'll end up encoding the language name in the module,
which seems a bit silly.

Having said that, I see the least amount of pain for BioPython users
going that route, with the possibly larger maintenance headache for
BioPython developers.

I think this is one of these "what color do we paint the bikeshed"
discussions, where there really isn't any objectively superior solution.

> That was my thinking - but it does go against the general advice to
> library authors in that API changes from Python 2.x to 3.x are 
> discouraged.

Right, but from dealing with the python folks on Freenode IRC, I
gather that many of them assume the switch from Python 2.x to 3.x is a
very low-impact change for code authors. I tend to disagree there. :)

> We can of course stick with Bio.* as it is (which I believe is
> Brad's favoured option). And I'm OK with this - it is the simplest
> option (and doesn't prevent us doing some more minor changes if we 
> want to, such as reorganising all the Bio.SeqXXXX modules under one
> directory).

As I said, strong feeling of a bikeshed discussion here. :)

> Perhaps a blog post & email to the announcement mailing list 
> soliciting feedback on this proposal is the best way forward, 
> perhaps with a web-survey form? e.g.

To be honest, I don't care that much about which solution is decided
on, as long as the decision is made soon. I've got some programs that
need the HMMer2 parser that I've added to Bow's SearchIO code, and I'm
hoping to get that into BioPython soon instead of having to ship with
a custom BioPython for publication.

Cheers,
Kai

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQipFTAAoJEKM5lwBiwTTP4nkIAI5TegXeWy6b8FoPmq46XPzz
iVh6g0t37xAJ9Aat3aE5vDklF7yqEwcVPKxFkj2Nd2MLaDqhfnuldE9pEqbPmZfl
eQptF5JXTAlw/YKAPFzTyFSIlKv3wiuTiGeTxKJtXewOkgEu6VwzNgjPnCYhamaT
Nda7NQEA6mlmaH7ABwO1mLLObk7i90oqVNDIuhnOAAA1ZrVnnQ4QHRupbiLZVd3d
3od3JVM4h+ZT5AL12Lts9lAdrc94MVri5i0P1VSQEnAQV/LJ5uoT2a4l2DRFM35R
NR501X7ubTQPrK8ATveTWaCYYcn/XMnS7dEpvSWsxFR8oM+69LxF3UVtH2ShfDs=
=Teym
-----END PGP SIGNATURE-----

From eric.talevich at gmail.com  Fri Oct 26 11:19:23 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Fri, 26 Oct 2012 11:19:23 -0400
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CADEGkF5J0pHBoNaB1xKfQzwNYYzuKverV+1zt3EiEXVA0dEQKg@mail.gmail.com>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
	<CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
	<CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
	<508A6535.6070507@biotech.uni-tuebingen.de>
	<CAKVJ-_6pvuFqCc+EJVgP-GAnAN6XQ83Y7tqAE8NKU1j95qsEnA@mail.gmail.com>
	<CADEGkF5J0pHBoNaB1xKfQzwNYYzuKverV+1zt3EiEXVA0dEQKg@mail.gmail.com>
Message-ID: <CAMC681mXGXR05+g=wMfMcYB40oySoe2aomRRQvx5Y4doXFF3TA@mail.gmail.com>

On Fri, Oct 26, 2012 at 6:38 AM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:

> > >> Also, I think there are still some issues that need to be dealt
> > >> with before we put SearchIO into master, notably with the Bio.BLAST
> > >> module. If not the official deprecation notice, at least the
> > >> tutorial has to be updated (let Bio.BLAST readers know about the
> > >> plan with SearchIO). I've written a short tutorial here:
> > >> http://bow.web.id/biopython/Tutorial.html. This is still a draft,
> > >> but you can already see that there are some obvious overlaps
> > >> between Bio.BLAST and Bio.SearchIO, which is confusing to new
> > >> readers.
> > >
> > > Personally I wouldn't let this consideration block the inclusion of a
> > > module as useful as that. Of course I need this code, so I'm biased.
> >
> > I'm also OK with merging the code before updating the Tutorial
> > chapter on BLAST (which would probably become a broader
> > chapter on BLAST and other tools using SearchIO). As discussed
> > before, the long term aim would be to remove Bio.BLAST.
>

Bio.Blast does contain some features beyond parsing the output of BLAST...


> > > I'll have to read up on the namespace discussion. While I see the
> > > benefit of using PEP8 names, intuitively I don't like bio.seq.search
> > > much. Then again, I started my life in Bio* with BioPerl, and like the
> > > pretty similar module layout BioPython has so far.
> >
> > Yeah - the current naming of SeqIO and AlignIO was directly
> > inspired by BioPerl, and gave the working name of SearchIO.
> >
> > Peter
>
> Reaching a unanimous decision on name preference seems difficult :/.
> We now have:
>
> 1. Bio.seq.search (in line with the namespace change)
> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used
> to be Bio.SeqSearch, now adjusted for PEP8 compliance)
> 3. Bio.search (same reasoning + explanation like Bio.seqsearch).
> 4. Bio.SearchIO / Bio.searchio
> 5. Bio.psearch (p for pairwise)
>
> Any other suggestions? Should we put it to a vote?
>
> regards,
> Bowo
>
>
If it's down to a vote, I would vote to merge this branch as Bio.SearchIO,
and perhaps lowercase it to Bio.searchio or biopy.searchio in the Py3
lowercase branch.

Rationale: We already follow BioPerl with SeqIO and AlignIO, and it seems
to help users. It's also Google-friendly.

-Eric

From p.j.a.cock at googlemail.com  Fri Oct 26 11:42:18 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 16:42:18 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CAMC681mXGXR05+g=wMfMcYB40oySoe2aomRRQvx5Y4doXFF3TA@mail.gmail.com>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
	<CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
	<CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
	<508A6535.6070507@biotech.uni-tuebingen.de>
	<CAKVJ-_6pvuFqCc+EJVgP-GAnAN6XQ83Y7tqAE8NKU1j95qsEnA@mail.gmail.com>
	<CADEGkF5J0pHBoNaB1xKfQzwNYYzuKverV+1zt3EiEXVA0dEQKg@mail.gmail.com>
	<CAMC681mXGXR05+g=wMfMcYB40oySoe2aomRRQvx5Y4doXFF3TA@mail.gmail.com>
Message-ID: <CAKVJ-_5JqiejShzbJemR2M9Qk_5KL88ayGbh3zGwWoski8DHPA@mail.gmail.com>

On Fri, Oct 26, 2012 at 4:19 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
>
> Bio.Blast does contain some features beyond parsing the output of BLAST...
>

Also wrappers to call the tools, and the online search.
Easy enough.

>> Reaching a unanimous decision on name preference seems difficult :/.
>> We now have:
>>
>> 1. Bio.seq.search (in line with the namespace change)
>> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used
>> to be Bio.SeqSearch, now adjusted for PEP8 compliance)
>> 3. Bio.search (same reasoning + explanation like Bio.seqsearch).
>> 4. Bio.SearchIO / Bio.searchio
>> 5. Bio.psearch (p for pairwise)
>>
>> Any other suggestions? Should we put it to a vote?
>>
>> regards,
>> Bowo
>>
>
> If it's down to a vote, I would vote to merge this branch as Bio.SearchIO,
> and perhaps lowercase it to Bio.searchio or biopy.searchio in the Py3
> lowercase branch.
>
> Rationale: We already follow BioPerl with SeqIO and AlignIO, and it
> seems to help users. It's also Google-friendly.

I like Bio.SearchIO for those reasons too. Perhaps that is the
most popular name?

Peter

From mjldehoon at yahoo.com  Fri Oct 26 11:58:04 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 26 Oct 2012 08:58:04 -0700 (PDT)
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CAKVJ-_5JqiejShzbJemR2M9Qk_5KL88ayGbh3zGwWoski8DHPA@mail.gmail.com>
Message-ID: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com>

> 1. Bio.seq.search (in line with the namespace change)
> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used
>    to be Bio.SeqSearch, now adjusted for PEP8 compliance)
> 3. Bio.search (same reasoning + explanation like Bio.seqsearch).
> 4. Bio.SearchIO / Bio.searchio
> 5. Bio.psearch (p for pairwise)

> If it's down to a vote, I would vote to merge this branch as
> Bio.SearchIO, and perhaps lowercase it to Bio.searchio or
> biopy.searchio in the Py3 lowercase branch.
> 
> Rationale: We already follow BioPerl with SeqIO and AlignIO, and it
> seems to help users. It's also Google-friendly.

I would vote for Bio.seq.search.
I don't like Bio.SearchIO much because a) it doesn't tell you clearly
what the module is about; and b) I think it is a mistake to have
Bio.SeqIO separate from Bio.Seq, and Bio.AlignIO separate from
Bio.Align, because in both cases the two modules conceptually deal
with the same thing. We don't have Bio.Cluster and Bio.ClusterIO,
Bio.Entrez and Bio.EntrezIO, Bio.Motif and Bio.MotifIO; why should
Bio.Seq and Bio.Align be different?

-Michiel.

From p.j.a.cock at googlemail.com  Fri Oct 26 12:14:22 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 17:14:22 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <CAKVJ-_5JqiejShzbJemR2M9Qk_5KL88ayGbh3zGwWoski8DHPA@mail.gmail.com>
	<1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_7XP6+30=rWfryObgJYKDXO9gts7hcUmTZfrCQbEUEbEQ@mail.gmail.com>

On Fri, Oct 26, 2012 at 4:58 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>> 1. Bio.seq.search (in line with the namespace change)
>> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used
>>    to be Bio.SeqSearch, now adjusted for PEP8 compliance)
>> 3. Bio.search (same reasoning + explanation like Bio.seqsearch).
>> 4. Bio.SearchIO / Bio.searchio
>> 5. Bio.psearch (p for pairwise)
>
>> If it's down to a vote, I would vote to merge this branch as
>> Bio.SearchIO, and perhaps lowercase it to Bio.searchio or
>> biopy.searchio in the Py3 lowercase branch.
>>
>> Rationale: We already follow BioPerl with SeqIO and AlignIO, and it
>> seems to help users. It's also Google-friendly.
>
> I would vote for Bio.seq.search.

And would you support moving other existing Bio.SeqXXX modules
under Bio.seq.* as for example outlined here?:
http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html
If so then I think we should go with that plan.

> I don't like Bio.SearchIO much because a) it doesn't tell you clearly
> what the module is about; and b) I think it is a mistake to have
> Bio.SeqIO separate from Bio.Seq, and Bio.AlignIO separate from
> Bio.Align, because in both cases the two modules conceptually deal
> with the same thing. We don't have Bio.Cluster and Bio.ClusterIO,
> Bio.Entrez and Bio.EntrezIO, Bio.Motif and Bio.MotifIO; why should
> Bio.Seq and Bio.Align be different?

After all, not everyone was exposed to BioPerl before Biopython ;)

Peter

From p.j.a.cock at googlemail.com  Fri Oct 26 17:19:28 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 22:19:28 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CAKVJ-_7XP6+30=rWfryObgJYKDXO9gts7hcUmTZfrCQbEUEbEQ@mail.gmail.com>
References: <CAKVJ-_5JqiejShzbJemR2M9Qk_5KL88ayGbh3zGwWoski8DHPA@mail.gmail.com>
	<1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAKVJ-_7XP6+30=rWfryObgJYKDXO9gts7hcUmTZfrCQbEUEbEQ@mail.gmail.com>
Message-ID: <CAKVJ-_6XT6EO9rJaJ9RJWiE7XTvtAGe4Fq54wfNmCt+Vj57gzg@mail.gmail.com>

On Fri, Oct 26, 2012 at 5:14 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Fri, Oct 26, 2012 at 4:58 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>>> 1. Bio.seq.search (in line with the namespace change)
>>> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used
>>>    to be Bio.SeqSearch, now adjusted for PEP8 compliance)
>>> 3. Bio.search (same reasoning + explanation like Bio.seqsearch).
>>> 4. Bio.SearchIO / Bio.searchio
>>> 5. Bio.psearch (p for pairwise)
>>
>>> If it's down to a vote, I would vote to merge this branch as
>>> Bio.SearchIO, and perhaps lowercase it to Bio.searchio or
>>> biopy.searchio in the Py3 lowercase branch.
>>>
>>> Rationale: We already follow BioPerl with SeqIO and AlignIO, and it
>>> seems to help users. It's also Google-friendly.
>>
>> I would vote for Bio.seq.search.
>
> And would you support moving other existing Bio.SeqXXX
> modules under Bio.seq.* as for example outlined here?:
> http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html
> If so then I think we should go with that plan.

I have started exploring that idea on this new branch,
https://github.com/peterjc/biopython/tree/bioseq

Does anyone object to me applying the first commit to the master
branch (defining the previously discussed new warning for 'beta' code)?
https://github.com/peterjc/biopython/commit/97485d5dcf2620f7664ae46a7897c1203847538d

Note that introducing Bio.seq now (and any relocations under this)
can (I believe) still be combined with the lower-case modules under
Python 3 idea as well. This just requires the public classes and
functions defined under Bio.Seq.* remain mirrored under Bio.seq.*
(this means assorted Seq objects and some functions like translate).
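
A minimal sketch of the mirroring idea, assuming it simply means that the
same objects stay importable under both the old and the new module name.
The module names demo_Seq and demo_seq and the placeholder translate
function are hypothetical stand-ins, not Biopython code:

```python
import sys
import types

# Hypothetical stand-in for the existing module (think Bio.Seq).
legacy = types.ModuleType("demo_Seq")


def translate(sequence):
    """Placeholder for a public function such as translate."""
    return "translated:" + sequence


legacy.translate = translate
legacy.__all__ = ["translate"]

# Hypothetical stand-in for the new lower-case module (think Bio.seq),
# which re-exports the very same objects rather than copies of them.
mirror = types.ModuleType("demo_seq")
for name in legacy.__all__:
    setattr(mirror, name, getattr(legacy, name))

# Register both so ordinary import statements can find them.
sys.modules[legacy.__name__] = legacy
sys.modules[mirror.__name__] = mirror

from demo_Seq import translate as old_translate
from demo_seq import translate as new_translate
```

Because both names point at one function object, isinstance checks and
monkey-patching behave the same whichever import line a user wrote.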

Peter

From w.arindrarto at gmail.com  Fri Oct 26 18:43:45 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Sat, 27 Oct 2012 00:43:45 +0200
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CAKVJ-_6XT6EO9rJaJ9RJWiE7XTvtAGe4Fq54wfNmCt+Vj57gzg@mail.gmail.com>
References: <CAKVJ-_5JqiejShzbJemR2M9Qk_5KL88ayGbh3zGwWoski8DHPA@mail.gmail.com>
	<1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAKVJ-_7XP6+30=rWfryObgJYKDXO9gts7hcUmTZfrCQbEUEbEQ@mail.gmail.com>
	<CAKVJ-_6XT6EO9rJaJ9RJWiE7XTvtAGe4Fq54wfNmCt+Vj57gzg@mail.gmail.com>
Message-ID: <CADEGkF7GLd6eEzXec-5W_U4s9F2THe0wknyYpb2ptSY6oFH1aQ@mail.gmail.com>

>>>> 1. Bio.seq.search (in line with the namespace change)
>>>> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used
>>>>    to be Bio.SeqSearch, now adjusted for PEP8 compliance)
>>>> 3. Bio.search (same reasoning + explanation like Bio.seqsearch).
>>>> 4. Bio.SearchIO / Bio.searchio
>>>> 5. Bio.psearch (p for pairwise)
>>>
>>>> If it's down to a vote, I would vote to merge this branch as
>>>> Bio.SearchIO, and perhaps lowercase it to Bio.searchio or
>>>> biopy.searchio in the Py3 lowercase branch.
>>>>
>>>> Rationale: We already follow BioPerl with SeqIO and AlignIO, and it
>>>> seems to help users. It's also Google-friendly.
>>>
>>> I would vote for Bio.seq.search.
>>
>> And would you support moving other existing Bio.SeqXXX
>> modules under Bio.seq.* as for example outlined here?:
>> http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html
>> If so then I think we should go with that plan.
>
> I have started exploring that idea on this new branch,
> https://github.com/peterjc/biopython/tree/bioseq
>
> Does anyone object to me applying the first commit to the master
> branch (defining the previously discussed new warning for 'beta' code)?
> https://github.com/peterjc/biopython/commit/97485d5dcf2620f7664ae46a7897c1203847538d

No objection from me for the commit :).

But I have some concerns for the SearchIO naming. I like Bio.seqsearch
best at the moment. Bio.seq.search is good, but I understand that
Bio.SearchIO will eventually contain app wrappers and code for remote
searches as well. Putting it three levels-deep doesn't feel nice to
me. As comparisons, submodules with similar features (Bio.Phylo, and
possibly Bio.AlignIO, if in the future it will be merged with
alignment app wrappers and the alignment object model) are available
under Bio.

> Note that introducing Bio.seq now (and any relocations under this)
> can (I believe) still be combined with the lower-case modules under
> Python 3 idea as well. This just requires the public classes and
> functions defined under Bio.Seq.* remain mirrored under Bio.seq.*
> (this means assorted Seq objects and some functions like translate).
>
> Peter

regards,
Bow

From p.j.a.cock at googlemail.com  Fri Oct 26 20:54:47 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 27 Oct 2012 01:54:47 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CADEGkF7GLd6eEzXec-5W_U4s9F2THe0wknyYpb2ptSY6oFH1aQ@mail.gmail.com>
References: <CAKVJ-_5JqiejShzbJemR2M9Qk_5KL88ayGbh3zGwWoski8DHPA@mail.gmail.com>
	<1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAKVJ-_7XP6+30=rWfryObgJYKDXO9gts7hcUmTZfrCQbEUEbEQ@mail.gmail.com>
	<CAKVJ-_6XT6EO9rJaJ9RJWiE7XTvtAGe4Fq54wfNmCt+Vj57gzg@mail.gmail.com>
	<CADEGkF7GLd6eEzXec-5W_U4s9F2THe0wknyYpb2ptSY6oFH1aQ@mail.gmail.com>
Message-ID: <CAKVJ-_7jGGqNH9LK6aosTCnJznEN4oaNB+6w6m2P02WGz3czSw@mail.gmail.com>

On Fri, Oct 26, 2012 at 11:43 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
>Peter wrote:
>> I have started exploring that idea on this new branch,
>> https://github.com/peterjc/biopython/tree/bioseq
>>
>> Does anyone object to me applying the first commit to the master
>> branch (defining the previously discussed new warning for 'beta' code)?
>> https://github.com/peterjc/biopython/commit/97485d5dcf2620f7664ae46a7897c1203847538d
>
> No objection from me for the commit :).
>
> But I have some concerns for the SearchIO naming. I like Bio.seqsearch
> best at the moment. Bio.seq.search is good, but I understand that
> Bio.SearchIO will eventually contain app wrappers and code for remote
> searches as well. Putting it three levels-deep doesn't feel nice to
> me. As comparisons, submodules with similar features (Bio.Phylo, and
> possibly Bio.AlignIO, if in the future it will be merged with
> alignment app wrappers and the alignment object model) are available
> under Bio.

I think we'd get used to the nested namespace pretty quickly, and
this really only affects the import line anyway, e.g. something like
this isn't so bad as long as we document this:

from Bio.seq.search.apps import BlatCommandLine

If the namespace nesting bothers you, then you might not like
my thoughts for how to combine Bio.Align and Bio.AlignIO
(since we can't use Bio.align due to the folder name clash on
case-insensitive platforms): I was wondering about using
Bio.seq.align for this, which again is a bit nested but would
make it a sister module to Bio.seq.search (aka SearchIO)
and Bio.seq.record (which could include the former SeqIO
code as well as the SeqRecord class).

Peter

From eric.talevich at gmail.com  Sat Oct 27 00:03:46 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sat, 27 Oct 2012 00:03:46 -0400
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CAKVJ-_7jGGqNH9LK6aosTCnJznEN4oaNB+6w6m2P02WGz3czSw@mail.gmail.com>
References: <CAKVJ-_5JqiejShzbJemR2M9Qk_5KL88ayGbh3zGwWoski8DHPA@mail.gmail.com>
	<1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAKVJ-_7XP6+30=rWfryObgJYKDXO9gts7hcUmTZfrCQbEUEbEQ@mail.gmail.com>
	<CAKVJ-_6XT6EO9rJaJ9RJWiE7XTvtAGe4Fq54wfNmCt+Vj57gzg@mail.gmail.com>
	<CADEGkF7GLd6eEzXec-5W_U4s9F2THe0wknyYpb2ptSY6oFH1aQ@mail.gmail.com>
	<CAKVJ-_7jGGqNH9LK6aosTCnJznEN4oaNB+6w6m2P02WGz3czSw@mail.gmail.com>
Message-ID: <CAMC681m5bXWPt48EH2Yrhu9b2F2kUTQVkkmRAJeONsJ--0GJjA@mail.gmail.com>

On Fri, Oct 26, 2012 at 8:54 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

>
> If the namespace nesting bothers you, then you might not like
> my thoughts for how to combine Bio.Align and Bio.AlignIO
> (since we can't use Bio.align due to the folder name clash on
> case-insensitive platforms): I was wondering about using
> Bio.seq.align for this, which again is a bit nested but would
> make it a sister module to Bio.seq.search (aka SearchIO)
> and Bio.seq.record (which could include the former SeqIO
> code as well as the SeqRecord class).
>
>
Does that mean we'd have read, write, convert, etc. under Bio.seq.record?
This is how that API would look:

from Bio.seq import record
for rec in record.parse("example.fa", "fasta"): ...

As opposed to:

# Minor change
from Bio import seqio
for record in seqio.parse(...): ...

# Make sure we get those relative imports right!
from Bio.seq import io
for record in io.parse(...): ...

# Slight cognitive distance, but maybe worth it
from Bio import seq
for record in seq.parse(...): ...


Also: Technically, Bio.Motif operates on multiple sequence alignments, so
it could be moved to Bio.seq.align.motif. (Not entirely trolling here, just
pointing out possible consequences.)

-Eric

From w.arindrarto at gmail.com  Sat Oct 27 01:55:27 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Sat, 27 Oct 2012 07:55:27 +0200
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CAMC681m5bXWPt48EH2Yrhu9b2F2kUTQVkkmRAJeONsJ--0GJjA@mail.gmail.com>
References: <CAKVJ-_5JqiejShzbJemR2M9Qk_5KL88ayGbh3zGwWoski8DHPA@mail.gmail.com>
	<1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAKVJ-_7XP6+30=rWfryObgJYKDXO9gts7hcUmTZfrCQbEUEbEQ@mail.gmail.com>
	<CAKVJ-_6XT6EO9rJaJ9RJWiE7XTvtAGe4Fq54wfNmCt+Vj57gzg@mail.gmail.com>
	<CADEGkF7GLd6eEzXec-5W_U4s9F2THe0wknyYpb2ptSY6oFH1aQ@mail.gmail.com>
	<CAKVJ-_7jGGqNH9LK6aosTCnJznEN4oaNB+6w6m2P02WGz3czSw@mail.gmail.com>
	<CAMC681m5bXWPt48EH2Yrhu9b2F2kUTQVkkmRAJeONsJ--0GJjA@mail.gmail.com>
Message-ID: <CADEGkF4nOFCGkiLvpre2KOeWbjf_zFgH2kpCavxwc_pOCEUq6g@mail.gmail.com>

>> If the namespace nesting bothers you, then you might not like
>> my thoughts for how to combine Bio.Align and Bio.AlignIO
>> (since we can't use Bio.align due to the folder name clash on
>> case-insensitive platforms): I was wondering about using
>> Bio.seq.align for this, which again is a bit nested but would
>> make it a sister module to Bio.seq.search (aka SearchIO)
>> and Bio.seq.record (which could include the former SeqIO
>> code as well as the SeqRecord class).
>>
> Does that mean we'd have read, write, convert, etc. under Bio.seq.record?
> This is how that API would look:
>
> from Bio.seq import record
> for rec in record.parse("example.fa", "fasta"): ...
>
> As opposed to:
>
> # Minor change
> from Bio import seqio
> for record in seqio.parse(...)
>
> # Make sure we get those relative imports right!
> from Bio.seq import io
> for record in io.parse(...)
>
> # Slight cognitive distance, but maybe worth it
> from Bio import seq
> for record in seq.parse(...)
>
>
> Also: Technically, Bio.Motif operates on multiple sequence alignments, so it
> could be moved to Bio.seq.align.motif. (Not entirely trolling here, just
> pointing out possible consequences.)
>
> -Eric

What bothers me other than it being hidden is also the inconsistency
(comparing it to the current namespace). However, if there is also a
plan to merge sequence-related submodules under Bio.seq, it feels
better and I'm ok with it. Still hidden, but we'll have more
consistency and the root namespace will have less clutter.

So it would look like this (with previously mentioned examples):

Bio.SearchIO -> Bio.seq.search
Bio.AlignIO -> Bio.seq.align
Bio.Motif -> Bio.seq.motif
Bio.SeqIO -> Bio.seq (or merge with Bio.SeqRecord into Bio.seq.record)
Bio.SeqRecord -> Bio.seq.record
Bio.SeqUtils -> Bio.seq.utils
Bio.SeqFeature -> Bio.seq.feature

Also maybe:
Bio.Alphabet -> Bio.seq.alphabet
Bio.Restriction  -> Bio.seq.restriction or Bio.seq.utils.restriction

And Eric is right, we may go further with Bio.seq.align.motif, but I
think nesting sequence-related modules under Bio.seq is the furthest
we should go. I personally find it the most intuitive.

regards,
Bow

From mjldehoon at yahoo.com  Sat Oct 27 06:46:10 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 27 Oct 2012 03:46:10 -0700 (PDT)
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CAKVJ-_7XP6+30=rWfryObgJYKDXO9gts7hcUmTZfrCQbEUEbEQ@mail.gmail.com>
Message-ID: <1351334770.89984.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Hi everybody,

--- On Fri, 10/26/12, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> And would you support moving other existing Bio.SeqXXX
> modules under Bio.seq.* as for example outlined here?:
> http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html

Yes that looks good to me.

> I'm not 100% sure where the Bio.SeqIO top level functions
> would belong, either directly under Bio.seq or Bio.seq.record
> might work too.

I would prefer to have the top-level functions directly under Bio.seq, since they will be used a lot.

Best,
-Michiel.

From mjldehoon at yahoo.com  Sat Oct 27 06:47:43 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 27 Oct 2012 03:47:43 -0700 (PDT)
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CADEGkF4nOFCGkiLvpre2KOeWbjf_zFgH2kpCavxwc_pOCEUq6g@mail.gmail.com>
Message-ID: <1351334863.39503.YahooMailClassic@web164006.mail.gq1.yahoo.com>

--- On Sat, 10/27/12, Wibowo Arindrarto <w.arindrarto at gmail.com> wrote:
> And Eric is right, we may go further with Bio.seq.align.motif, but I
> think nesting sequence-related modules under Bio.seq is the furthest
> we should go. I personally find it the most intuitive.

I agree. And according to the Zen of Python, flat is better than nested.

Best,
-Michiel.

From bartek at rezolwenta.eu.org  Sat Oct 27 08:55:12 2012
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Sat, 27 Oct 2012 14:55:12 +0200
Subject: [Biopython-dev] Parsing TRANSFAC matrices with Bio.Motif
In-Reply-To: <1351339617.3402.YahooMailClassic@web164005.mail.gq1.yahoo.com>
References: <CABHxouWroakKxFvTQK-5y=FvOeXc_7bLHNnCYnz3wgAup_c_jg@mail.gmail.com>
	<1351339617.3402.YahooMailClassic@web164005.mail.gq1.yahoo.com>
Message-ID: <CABHxouUk5jPrhP8w-KTvuDhJCeguEVXR=4JO-dbbetZr9q5BjA@mail.gmail.com>

Hi Michiel,

On Sat, Oct 27, 2012 at 2:06 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> Actually I was thinking about the suggestions for Bio.Motif I made earlier (see http://lists.open-bio.org/pipermail/biopython-dev/2012-September/009946.html). Right now they are just ideas, so I haven't implemented them yet. You mentioned in your reply last month:
>
>> I'll try to come up with a more thought through and longer response
>> later in the week...
>

Absolutely. It's just that I had quite a crazy time lately (time spent
writing proposals and other such stuff...) and I didn't really think
too much about Bio.Motif.

> So I was wondering if you have any additional comments on these suggestions, or if I can go ahead and start implementing.
>

I'm sorry if my inactivity has slowed things down. I'll try to be more
constructive this time.

I think one thing that is clear is that Bio.Motif could use some code
optimization, especially in the area of PWM searching. Honestly, I
don't think there will be a time in the foreseeable future when I'll
do it, so if you feel like implementing better code for PWM
handling/searching, I'll be happy to do some code review or testing.

There are a few things I think would be good to keep:
- possibility to invoke motif.pwm_search(...) without worrying about
the fact that it is actually carried out by some specialized class
- possibility to determine motif thresholds based on fpr or fnr as
currently implemented in Bio.Motif.Thresholds module
- possibility to convert count based motifs to PWM based motifs
without much fuss...

All of these things are not really in conflict with your idea of
moving the PWM related code to the special class, so if you want to do
that, go ahead.
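
A minimal sketch of how the first and third points above could coexist with
a separate PWM class. The class names, the pseudocount of 1, the uniform
background and the log2-odds scoring are all assumptions made for
illustration; this is not the existing Bio.Motif code or its API:

```python
import math


class PositionWeightMatrix(object):
    """Hypothetical helper holding log-odds scores per motif column."""

    def __init__(self, counts, alphabet="ACGT", pseudocount=1.0):
        self.alphabet = alphabet
        self.length = len(counts[alphabet[0]])
        background = 1.0 / len(alphabet)  # assumed uniform background
        self.log_odds = []
        for i in range(self.length):
            total = sum(counts[letter][i] for letter in alphabet)
            total += pseudocount * len(alphabet)
            column = {}
            for letter in alphabet:
                freq = (counts[letter][i] + pseudocount) / total
                column[letter] = math.log(freq / background, 2)
            self.log_odds.append(column)

    def score(self, window):
        """Sum the per-column log-odds for one window of the sequence."""
        return sum(col[letter] for col, letter in zip(self.log_odds, window))

    def search(self, sequence, threshold=0.0):
        """Yield (position, score) for windows scoring at or above threshold."""
        for pos in range(len(sequence) - self.length + 1):
            value = self.score(sequence[pos:pos + self.length])
            if value >= threshold:
                yield pos, value


class Motif(object):
    """Hypothetical count-based motif that hides the PWM class from callers."""

    def __init__(self, counts):
        self.counts = counts
        self._pwm = None

    def pwm(self):
        # Convert counts to a PWM lazily, the first time it is needed.
        if self._pwm is None:
            self._pwm = PositionWeightMatrix(self.counts)
        return self._pwm

    def pwm_search(self, sequence, threshold=0.0):
        # Callers never need to know a specialized class does the work.
        return self.pwm().search(sequence, threshold)


counts = {"A": [4, 0, 0], "C": [0, 4, 0], "G": [0, 0, 4], "T": [0, 0, 0]}
hits = list(Motif(counts).pwm_search("TTACGTT", threshold=3.0))
```

With these toy counts the only window above the threshold is the one
starting at position 2 ("ACG"). Threshold selection by fpr or fnr is left
out of the sketch.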

If you also have trouble finding time to implement these improvements,
I could try to recruit some master student from our department to do
that. But if you have time to do the implementation yourself, it will
probably be better and faster that way.

best
Bartek

-- 
Bartek Wilczynski


From mjldehoon at yahoo.com  Sat Oct 27 22:47:15 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 27 Oct 2012 19:47:15 -0700 (PDT)
Subject: [Biopython-dev] Parsing TRANSFAC matrices with Bio.Motif
In-Reply-To: <CABHxouUk5jPrhP8w-KTvuDhJCeguEVXR=4JO-dbbetZr9q5BjA@mail.gmail.com>
Message-ID: <1351392435.42713.YahooMailClassic@web164006.mail.gq1.yahoo.com>

Hi Bartek,

OK, thanks!
I'll go ahead with the implementation then, and write an update to the mailing list again so people can have a look at it.

Best,
-Michiel.


From chapmanb at 50mail.com  Sun Oct 28 14:55:31 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sun, 28 Oct 2012 14:55:31 -0400
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
	automated programs
In-Reply-To: <CAChfGK0C=-facUZeq_Aqd4LS=NFCE8iBmtcTt=OmL7-L62PUew@mail.gmail.com>
References: <CAChfGK0C=-facUZeq_Aqd4LS=NFCE8iBmtcTt=OmL7-L62PUew@mail.gmail.com>
Message-ID: <87sj8ys9y4.fsf@fastmail.fm>


Connor;

> I remember this being resolved, but when I try to install biopython with
> pip, it fails:

Thanks for the report. It looks like the command line options pip uses
to call setup.py changed a bit, so the hack we have in place is no
longer working. I pushed a fix for this:

https://github.com/chapmanb/biopython/commit/e05a355e3e9825c44c4a9b3bdfdda25c9a92c9c4

which seems to resolve the issue and hopefully make it more robust going
forward. Could you confirm it works on your system:

$ cd /tmp
$ git clone git://github.com/chapmanb/biopython.git
$ sudo pip install /tmp/biopython

If so, I'll push this into the main repo for the next release. Thanks
again for letting us know about the problem,
Brad

From chapmanb at 50mail.com  Sun Oct 28 15:02:54 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sun, 28 Oct 2012 15:02:54 -0400
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
	<CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
Message-ID: <87pq42s9lt.fsf@fastmail.fm>


Peter and all;
Interesting discussion on the module path issues. I'm agreed with
everyone that it would be nice to be pep8 compliant. However, my vote
would be to stick with our traditional namespace to avoid widespread
breakage. The changes everyone is proposing are nice, but not nice
enough to deal with introducing an incompatible version and the
documentation and help fallout from that.

If everyone wants to go down the module name path, it would be worth
investing in a biopython1to2 script that automatically handles the
renamings for folks.
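
A rough sketch of what the core of such a script might look like. The
rename table is copied from the proposals in the SearchIO thread, none of
which has been agreed; the function name rewrite_imports is hypothetical.
It only rewrites dotted paths on import lines, so "from Bio import SeqIO"
style imports would need separate handling:

```python
import re

# Old -> proposed new module paths (proposals only, nothing is decided).
RENAMES = {
    "Bio.SearchIO": "Bio.seq.search",
    "Bio.AlignIO": "Bio.seq.align",
    "Bio.SeqRecord": "Bio.seq.record",
    "Bio.SeqUtils": "Bio.seq.utils",
    "Bio.SeqFeature": "Bio.seq.feature",
}

# Longest names first so a shorter name can never shadow a longer one.
_PATTERN = re.compile(
    r"\b(%s)\b"
    % "|".join(re.escape(key) for key in sorted(RENAMES, key=len, reverse=True))
)


def rewrite_imports(source):
    """Rewrite dotted module paths on import lines; leave other lines alone."""
    out = []
    for line in source.splitlines(True):
        if line.lstrip().startswith(("import ", "from ")):
            line = _PATTERN.sub(lambda match: RENAMES[match.group(1)], line)
        out.append(line)
    return "".join(out)


example = (
    "from Bio.SeqRecord import SeqRecord\n"
    "label = 'Bio.SeqRecord'\n"
    "import Bio.AlignIO\n"
)
converted = rewrite_imports(example)
```

Restricting the substitution to import lines keeps string literals and
comments untouched, at the cost of missing attribute-style uses such as
Bio.AlignIO.read(...) further down a file.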

Just my 2 cents,
Brad

From p.j.a.cock at googlemail.com  Mon Oct 29 04:15:59 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 29 Oct 2012 08:15:59 +0000
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <87pq42s9lt.fsf@fastmail.fm>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
	<CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
	<87pq42s9lt.fsf@fastmail.fm>
Message-ID: <CAKVJ-_4GzU+5vMXd1XLvycV=tK6xcgMoSA53cjNmYC4fGoPM6w@mail.gmail.com>

On Sunday, October 28, 2012, Brad Chapman wrote:

>
> Peter and all;
> Interesting discussion on the module path issues. I'm agreed with
> everyone that it would be nice to be pep8 compliant. However, my vote
> would be to stick with our traditional namespace to avoid widespread
> breakage. The changes everyone is proposing are nice, but not nice
> enough to deal with introducing an incompatible version and the
> documentation and help fallout from that.
>
> If everyone wants to go down the module name path, it would be worth
> investing in a biopython1to2 script that automatically handles the
> renamings for folks.
>
> Just my 2 cents,
> Brad
>

Hi Brad,

In the case of Bow's SearchIO code, what would you prefer?
e.g. Bio.SearchIO as it is now on his branch?

Peter

From kai.blin at biotech.uni-tuebingen.de  Mon Oct 29 06:26:03 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Mon, 29 Oct 2012 11:26:03 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
	<CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
	<CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
Message-ID: <508E59BB.1050705@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-10-26 11:33, Wibowo Arindrarto wrote:

Hi Bow, Peter,

> For the merge conflict, which branch are you using? Can you point
> to specific commits that cause the conflicts? I haven't tried
> merging / rebasing my own branch to the current master myself ~ so
> knowing this should help the process as well.

Disregarding the namespace discussion, I needed to get a reasonable
branch to get my HMMer2 parser up-to-date in. As I said last week I
tried rebasing Bow's searchio branch and had a bunch of merge conflicts.

I've retried the rebase today, and most of the merge conflicts are
actually pretty trivial and mostly around the question where the code
gets its OrderedDict from for python versions < 2.7.
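
For illustration, one common way to settle that question is a single
guarded import that every module shares. The fallback class below is a
deliberately minimal stand-in written for this sketch; it is an assumption,
not the fallback either branch actually uses:

```python
try:
    # In the standard library from Python 2.7 onwards.
    from collections import OrderedDict
except ImportError:
    class OrderedDict(dict):
        """Minimal stand-in that only remembers key insertion order."""

        def __init__(self):
            dict.__init__(self)
            self._order = []

        def __setitem__(self, key, value):
            if key not in self:
                self._order.append(key)
            dict.__setitem__(self, key, value)

        def __delitem__(self, key):
            dict.__delitem__(self, key)
            self._order.remove(key)

        def __iter__(self):
            return iter(self._order)

        def keys(self):
            return list(self._order)


hits_by_id = OrderedDict()
hits_by_id["hit_b"] = 1
hits_by_id["hit_a"] = 2
ordered_ids = list(hits_by_id)
```

Keeping the try/except in one place means a rebase only has to resolve the
import once rather than in every parser module.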

I've pushed the rebased patchset to
https://github.com/kblin/biopython/tree/searchio-rebase if anybody
wants to have a look. With the last patch fixing an error I seem to
have introduced during merge conflict resolution, the SearchIO tests
pass on that branch.

Cheers,
Kai

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQjlm7AAoJEKM5lwBiwTTPFe8IAMMLmM2kQmb9vOSCuNjbcIfJ
HqzzvLaw8Eo44uEb0zmxhuJwPoPZpdZIWCNM1t3LpynaE3mHawLcrYJTT/R1YxkS
udBHvMlU6h76J93NITWCzFZ7HHlMMrbzyPel7rifWXbv5xpG2BREpmr1V7lKmbH7
XbInPsVP0PjySFlCQb3219M+IZ4fA+ViYSBlQeXs91G1YzMVo6nkDcs+FkDG8mJt
Qg2u4Bhrxaf3qQKNuQzb2AHJ4KpnEkYsTI2FUJfHaulNfN6w9HwsEgyvM6hVqONP
4aIYlsbSlLjbGG3sdliibPJy5A+8AnkNSFlAHydL+FgBVmPqo3Xe0O5buTdz3Vs=
=prZo
-----END PGP SIGNATURE-----

From cmccoy at fhcrc.org  Mon Oct 29 11:24:45 2012
From: cmccoy at fhcrc.org (Connor McCoy)
Date: Mon, 29 Oct 2012 08:24:45 -0700
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <87sj8ys9y4.fsf@fastmail.fm>
References: <CAChfGK0C=-facUZeq_Aqd4LS=NFCE8iBmtcTt=OmL7-L62PUew@mail.gmail.com>
	<87sj8ys9y4.fsf@fastmail.fm>
Message-ID: <CAChfGK3jP-1vvHCOn7+HC8omhUNyMJMVvq369f=4H307SrO-yg@mail.gmail.com>

Hi Brad,

Thank you so much for the quick reply.  I just got a chance to test this,
and it seems to be working again.

Best,
Connor

On Sun, Oct 28, 2012 at 11:55 AM, Brad Chapman <chapmanb at 50mail.com> wrote:

>
> Connor;
>
> > I remember this being resolved, but when I try to install biopython with
> > pip, it fails:
>
> Thanks for the report. It looks like the command line options pip uses
> to call setup.py changed a bit, so the hack we have in place is no
> longer working. I pushed a fix for this:
>
>
> https://github.com/chapmanb/biopython/commit/e05a355e3e9825c44c4a9b3bdfdda25c9a92c9c4
>
> which seems to resolve the issue and hopefully make it more robust going
> forward. Could you confirm it works on your system:
>
> $ cd /tmp
> $ git clone git://github.com/chapmanb/biopython.git
> $ sudo pip install /tmp/biopython
>
> If so, I'll push this into the main repo for the next release. Thanks
> again for letting us know about the problem,
> Brad
>


-- 
Connor McCoy
Fred Hutchinson Cancer Research Center
1100 Fairview Ave N.
Seattle, WA 98109-1924
cmccoy at fhcrc.org

From chapmanb at 50mail.com  Mon Oct 29 13:54:30 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 29 Oct 2012 13:54:30 -0400
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_4GzU+5vMXd1XLvycV=tK6xcgMoSA53cjNmYC4fGoPM6w@mail.gmail.com>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
	<CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
	<87pq42s9lt.fsf@fastmail.fm>
	<CAKVJ-_4GzU+5vMXd1XLvycV=tK6xcgMoSA53cjNmYC4fGoPM6w@mail.gmail.com>
Message-ID: <874nldqi3t.fsf@fastmail.fm>


Peter;

> In the case of Bow's SearchIO code, what would you prefer?
> e.g. Bio.SearchIO as it is now on his branch?

I like plain ol' Search the best but don't have a strong preference. I'm
terrible at naming things so trust everyone's judgment on this.

Brad

From w.arindrarto at gmail.com  Mon Oct 29 16:11:09 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Mon, 29 Oct 2012 21:11:09 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <508E59BB.1050705@biotech.uni-tuebingen.de>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
	<CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
	<CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
	<508E59BB.1050705@biotech.uni-tuebingen.de>
Message-ID: <CADEGkF7LtqRbD_wUA5F782Wc=_hmSXhrQ8NjeZ9tC2AWW=qiXw@mail.gmail.com>

Hi Kai,

> > For the merge conflict, which branch are you using? Can you point
> > to specific commits that cause the conflicts? I haven't tried
> > merging / rebasing my own branch to the current master myself ~ so
> > knowing this should help the process as well.
>
> Disregarding the namespace discussion, I needed to get a reasonable
> branch to get my HMMer2 parser up-to-date in. As I said last week I
> tried rebasing Bow's searchio branch and had a bunch of merge conflicts.
>
> I've retried the rebase today, and most of the merge conflicts are
> actually pretty trivial and mostly around the question where the code
> gets its OrderedDict from for python versions < 2.7.
>
> I've pushed the rebased patchset to
> https://github.com/kblin/biopython/tree/searchio-rebase if anybody
> wants to have a look. With the last patch fixing an error I seem to
> have introduced during merge conflict resolution, the SearchIO tests
> pass on that branch.

Thanks for doing the rebase :)! I just checked it and everything looks
fine; all unit tests + doctests pass.

On another note, I was wondering about how to combine this rebased
branch with my local branch. Is there a simple way to apply the
changes in the rebased branch to my local working searchio branch or
should I just switch to a local checkout of the rebased branch?

regards,
Bow

From kai.blin at biotech.uni-tuebingen.de  Mon Oct 29 16:43:49 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Mon, 29 Oct 2012 21:43:49 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
Message-ID: <508EEA85.6060906@biotech.uni-tuebingen.de>

Hi Bow,

I've been looking closer at the SearchIO API changes introduced in
August. I think there still is a design problem with the object model,
at least when looking at how this affects the hmmer3 parser (and affects
the hmmer2 parsing as well).

Possibly I'm not seeing the big picture here, so let me explain what I'm
seeing, and then you can tell me what I missed. :)

So, the hmmer2 and hmmer3 file format basically looks like this

# header
# ...
# ...

information about the query

list of hits

list of hsps

(alignments for hsps)

(some statistics)
//

Now, when parsing this file line-wise, you obviously run into the hits
first. However, with the new API, you can't create a Hit object without
knowing the HSPs, but you haven't read them yet.

To work around this, you need to create a fake hit object
(https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/HmmerIO/hmmer3_text.py#L201).
Then, in the loop that creates the fake hit objects, one of the exit
conditions then parses the HSP entries and then replaces the fake hit
objects by "real" Hit objects.
(https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/HmmerIO/hmmer3_text.py#L188)

By the way, that code is a bit misleading. Took me a while to notice the
switch of the list's contents. Anyway, back to business.

So basically you need to create two hit objects for every hit you're
looking at. What's the advantage of forcing Hsp objects to be passed to
the Hit constructor? Just to make sure your Hit objects have a valid Hsp
at some later point?
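
To make the question concrete, here is a small sketch of an alternative to
placeholder objects: keep the hit table as plain rows and only build Hit
objects once the HSPs have been read. The Hit and HSP classes and the
build_hits function are hypothetical stand-ins that merely imitate the
"HSPs required at construction" rule; they are not the SearchIO classes:

```python
class HSP(object):
    """Hypothetical stand-in for an HSP."""

    def __init__(self, hit_id, score):
        self.hit_id = hit_id
        self.score = score


class Hit(object):
    """Hypothetical stand-in that requires its HSPs up front."""

    def __init__(self, hit_id, hsps, **attrs):
        if not hsps:
            raise ValueError("Hit %r needs at least one HSP" % hit_id)
        self.id = hit_id
        self.hsps = list(hsps)
        self.attrs = attrs


def build_hits(hit_rows, hsp_rows):
    """Combine rows from the hit table with rows from the later HSP table.

    hit_rows: (hit_id, evalue) tuples in file order.
    hsp_rows: (hit_id, score) tuples, read after the hit table.
    """
    hsps_by_hit = {}
    for hit_id, score in hsp_rows:
        hsps_by_hit.setdefault(hit_id, []).append(HSP(hit_id, score))
    hits = []
    for hit_id, evalue in hit_rows:
        hsps = hsps_by_hit.get(hit_id)
        if hsps:  # hits without any HSP rows are skipped in this sketch
            hits.append(Hit(hit_id, hsps, evalue=evalue))
    return hits


hits = build_hits(
    [("a", 1e-5), ("b", 0.1)],
    [("b", 3.0), ("a", 10.0), ("a", 7.0)],
)
```

This builds each Hit exactly once, but it does mean the parser holds the
raw hit rows in memory until the HSP section has been consumed.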

I'm aware that I'm just looking at the SearchIO design from the
perspective of the hmmer2 parser, but I'd like to understand the reasons
for the API being the way it currently is.

Hope you can shed some light on this,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From kai.blin at biotech.uni-tuebingen.de  Mon Oct 29 16:47:11 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Mon, 29 Oct 2012 21:47:11 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CADEGkF7LtqRbD_wUA5F782Wc=_hmSXhrQ8NjeZ9tC2AWW=qiXw@mail.gmail.com>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
	<CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
	<CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
	<508E59BB.1050705@biotech.uni-tuebingen.de>
	<CADEGkF7LtqRbD_wUA5F782Wc=_hmSXhrQ8NjeZ9tC2AWW=qiXw@mail.gmail.com>
Message-ID: <508EEB4F.7050607@biotech.uni-tuebingen.de>

On 2012-10-29 21:11, Wibowo Arindrarto wrote:

Hi Bow,

> On another note, I was wondering about how to combine this rebased
> branch with my local branch. Is there a simple way to apply the
> changes in the rebased branch to my local working searchio branch or
> should I just switch to a local checkout of the rebased branch?

Well, you could rebase your local changes on top of the rebased branch. :)
Or, depending on how many changes you have in your local branch, check
out the rebased branch and then git cherry-pick your changes on top of
the rebased branch.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From w.arindrarto at gmail.com  Mon Oct 29 18:55:19 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Mon, 29 Oct 2012 23:55:19 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <508EEA85.6060906@biotech.uni-tuebingen.de>
References: <508EEA85.6060906@biotech.uni-tuebingen.de>
Message-ID: <CADEGkF69XmC-ShWHpyhAcJBY0ZuUCAjXzQhE=FwCxCMXaUrFng@mail.gmail.com>

Hi Kai,

Thanks for the input & comments! I made the API change mainly because
I want to keep the SearchIO object hierarchy more consistent, i.e.
there should be as few places as possible to make changes that break
the model.

There are several attributes that should remain the same between a
single QueryResult object and the Hits, HSPs, and HSPFragments it
contains. For now, these attributes are the ID (both query and hit ID)
and description (also for both query and hit). In the old API, each
object in the object model hierarchy stores these values as its own
attribute. For example, to store the ID of the Hit object, the old API
has the 'id' attribute in the Hit object, a 'hit_id' attribute in all
HSP objects it contains, and a 'hit_id' attribute in every HSPFragment
contained by each HSP in the Hit. I see this as unnecessary
duplication and a possible source of confusion, since these
attributes are completely decoupled from one another even though they
mean the same thing.

The new API stores these values only at the innermost object in
the hierarchy (the HSPFragment), reducing duplication and possible
sources of inconsistency. When you access the attributes from
objects other than the HSPFragment, a getter retrieves the value from
one of the contained HSPFragment objects, after ensuring that all
HSPFragment objects contain the same value of the attribute
(https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/_utils.py#L99).
Similarly, when you set the attribute, a setter applies the new value
to all HSPFragment objects contained
(https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/_utils.py#L106).
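
The getter/setter behaviour described above can be sketched roughly as
follows. This is only an illustration under assumptions: the class names
mirror the object model, but the code is a hypothetical stand-in and not
the implementation in _utils.py:

```python
class HSPFragment:
    """Hypothetical innermost object; the only place hit_id is stored."""

    def __init__(self, hit_id):
        self.hit_id = hit_id


class HSP:
    """Hypothetical container; hit_id is derived from its fragments."""

    def __init__(self, fragments):
        if not fragments:
            raise ValueError("an HSP needs at least one HSPFragment")
        self.fragments = list(fragments)

    @property
    def hit_id(self):
        # Read the value from the fragments, checking that they all agree.
        ids = {frag.hit_id for frag in self.fragments}
        if len(ids) != 1:
            raise ValueError("inconsistent hit_id values: %r" % sorted(ids))
        return ids.pop()

    @hit_id.setter
    def hit_id(self, value):
        # Writing through the container updates every fragment at once.
        for frag in self.fragments:
            frag.hit_id = value


hsp = HSP([HSPFragment("hit1"), HSPFragment("hit1")])
hsp.hit_id = "hit2"  # cascades to both fragments
```

Changing a single fragment directly would make the getter raise, which
is the inconsistency the design tries to surface.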

This allows you to keep the values consistent across the hierarchy, so
long as the change is done at the highest level possible (e.g.
changing the hit ID in the HSP object will break consistency, but
changing hit ID through the Hit object will update the hit_id
attribute value across all HSPs it contains). Conceptually, this is
also closer to the real 'Hit' object we're modeling since we always
need at least one HSP to declare a database entry as a Hit.

The HMMER parser's update is partially influenced by this API change,
as you've seen. In the previous version
(https://github.com/bow/biopython/blob/12fbe05c5e17f7a356ab672358b2698612aa8cad/Bio/SearchIO/HmmerIO/hmmertext.py),
the HMMER parser has several ugly bits (e.g. it sets the hit
description in more than one place, a possible source of error). After
changing the API to force the creation of Hits with HSPs, these kinds
of duplications are eliminated. I personally also feel that using the
new API allows me (sometimes forces me) to improve the other format's
parsers in a similar way.

It's unfortunate that the HMMER text parser is made a little difficult
to understand, due to the way HMMER arranges the text output format.
And I admit I didn't do any performance benchmark for the HMMER text
parser when I made the change (I suspected one extra dictionary per
Hit object should not decrease performance that much. Of course, if
the change proves to cause severe performance penalties, then yes, we
should look into it again.).

But for now, I think these are acceptable tradeoffs, if it means the
object model becomes more consistent and the other format parsers
improved as well.

Hope that helps :).

regards,
Bow

P.S. As for the misleading part, yes, I admit that maybe a different
name should be used to note that the contents of the list differ.




From kai.blin at biotech.uni-tuebingen.de  Tue Oct 30 03:35:40 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Tue, 30 Oct 2012 08:35:40 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <CADEGkF69XmC-ShWHpyhAcJBY0ZuUCAjXzQhE=FwCxCMXaUrFng@mail.gmail.com>
References: <508EEA85.6060906@biotech.uni-tuebingen.de>
	<CADEGkF69XmC-ShWHpyhAcJBY0ZuUCAjXzQhE=FwCxCMXaUrFng@mail.gmail.com>
Message-ID: <508F834C.6010404@biotech.uni-tuebingen.de>

On 2012-10-29 23:55, Wibowo Arindrarto wrote:

Hi Bow,

> Thanks for the input & comments! I made the API change mainly because
> I want to keep the SearchIO object hierarchy more consistent, i.e.
> there should be as few places as possible to make changes that break
> the model.

Thanks for the explanation.

...

> This allows you to keep the values consistent across the hierarchy, so
> long as the change is done at the highest level possible (e.g.
> changing the hit ID in the HSP object will break consistency, but
> changing hit ID through the Hit object will update the hit_id
> attribute value across all HSPs it contains). Conceptually, this is
> also closer to the real 'Hit' object we're modeling since we always
> need at least one HSP to declare a database entry as a Hit.

I see. I didn't think about the programmatic side of things. I see the
advantage of having only one attribute there and of keeping it consistent.

> The HMMER parser's update is partially influenced by this API change,
> as you've seen. In the previous version
> (https://github.com/bow/biopython/blob/12fbe05c5e17f7a356ab672358b2698612aa8cad/Bio/SearchIO/HmmerIO/hmmertext.py),
> the HMMER parser has several ugly bits (e.g. it sets the hit
> description in more than one place, a possible source of error). After
> changing the API to force the creation of Hits with HSPs, these kinds
> of duplications are eliminated. I personally also feel that using the
> new API allows me (sometimes forces me) to improve the other format's
> parsers in a similar way.

Arguably, the more human-readable the file you need to parse, the less
readable the parser tends to be. ;) I think the old parser was a more
straightforward piece of code.

> It's unfortunate that the HMMER text parser is made a little difficult
> to understand, due to the way HMMER arranges the text output format.
> And I admit I didn't do any performance benchmark for the HMMER text
> parser when I made the change (I suspected one extra dictionary per
> Hit object should not decrease performance that much. Of course, if
> the change proves to cause severe performance penalties, then yes, we
> should look into it again.).

I'm not talking about performance here; performance likely isn't a
problem. I'm saying that you're conceptually creating the Hit object
twice. Even the comment in line 200 says so. :)

[snip]
            # create the hit object
            hit_attrs = {
                'id': row[8],
                'query_id': qid,
                'evalue': float(row[0]),
                'bitscore': float(row[1]),
                'bias': float(row[2]),
                # row[3:6] is not parsed, since the info is available
                # at the HSP level
                'domain_exp_num': float(row[6]),
                'domain_obs_num': int(row[7]),
                'description': row[9],
                'is_included': is_included,
            }
            hit_list.append(hit_attrs)
[snip]

I'm mainly wondering why at this position, I can't just create the Hit
object already, and then later set the HSPs. You could do this via a
setter function that validates the IDs are identical if you want to make
sure you're not shooting yourself in the foot there.
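
A rough sketch of the alternative suggested above: a Hit that can be
created first and receives its HSPs later through a validating setter.
The classes are hypothetical stand-ins and do not reflect the real
Bio.SearchIO API:

```python
class HSP:
    """Hypothetical stand-in for an HSP; not the Bio.SearchIO class."""

    def __init__(self, hit_id, score):
        self.hit_id = hit_id
        self.score = score


class Hit:
    """Hypothetical Hit that can exist before its HSPs are known."""

    def __init__(self, hit_id, description=""):
        self.id = hit_id
        self.description = description
        self._hsps = []

    @property
    def hsps(self):
        # Return a tuple so callers cannot bypass the validating setter.
        return tuple(self._hsps)

    @hsps.setter
    def hsps(self, new_hsps):
        new_hsps = list(new_hsps)
        for hsp in new_hsps:
            # Refuse HSPs that belong to a different hit.
            if hsp.hit_id != self.id:
                raise ValueError(
                    "HSP for %r cannot be added to Hit %r" % (hsp.hit_id, self.id)
                )
        self._hsps = new_hsps


hit = Hit("GATase_2", "Glutamine amidotransferases class-II")
hit.hsps = [HSP("GATase_2", 731.8)]
```

The trade-off is that such a Hit can temporarily exist with no HSPs at
all, which is the state the current API rules out.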

> But for now, I think these are acceptable tradeoffs, if it means the
> object model becomes more consistent and the other format parsers
> improved as well.

I haven't looked into the other parsers, so I'll take your word on that.
I can of course take the same detour of creating a placeholder hit
object for the first pass and then when I've parsed the HSPs create the
real Hit object. If this makes all the other parsers more readable at
the cost of some obscurity in the hmmer text parsers, well, so be it.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From p.j.a.cock at googlemail.com  Tue Oct 30 06:59:44 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 30 Oct 2012 10:59:44 +0000
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CADEGkF7GLd6eEzXec-5W_U4s9F2THe0wknyYpb2ptSY6oFH1aQ@mail.gmail.com>
References: <CAKVJ-_5JqiejShzbJemR2M9Qk_5KL88ayGbh3zGwWoski8DHPA@mail.gmail.com>
	<1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAKVJ-_7XP6+30=rWfryObgJYKDXO9gts7hcUmTZfrCQbEUEbEQ@mail.gmail.com>
	<CAKVJ-_6XT6EO9rJaJ9RJWiE7XTvtAGe4Fq54wfNmCt+Vj57gzg@mail.gmail.com>
	<CADEGkF7GLd6eEzXec-5W_U4s9F2THe0wknyYpb2ptSY6oFH1aQ@mail.gmail.com>
Message-ID: <CAKVJ-_5FFHa25QLE+O6BaURTc6+1ZLQh0rc15iMHfeMbJS_dgA@mail.gmail.com>

On Fri, Oct 26, 2012 at 11:43 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
>>
>> I have started exploring that idea on this new branch,
>> https://github.com/peterjc/biopython/tree/bioseq
>>
>> Does anyone object to me applying the first commit to the master
>> branch (defining the previously discussed new warning for 'beta' code)?
>> https://github.com/peterjc/biopython/commit/97485d5dcf2620f7664ae46a7897c1203847538d
>
> No objection from me for the commit :).
>

Done, commit adding Bio.BiopythonExperimentalWarning cherry-picked
to the master,

https://github.com/biopython/biopython/commit/52ac4383b12335ebcdcb8ea52eec8d23ac28b5e2

Peter

From p.j.a.cock at googlemail.com  Tue Oct 30 07:03:07 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 30 Oct 2012 11:03:07 +0000
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <874nldqi3t.fsf@fastmail.fm>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
	<CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
	<87pq42s9lt.fsf@fastmail.fm>
	<CAKVJ-_4GzU+5vMXd1XLvycV=tK6xcgMoSA53cjNmYC4fGoPM6w@mail.gmail.com>
	<874nldqi3t.fsf@fastmail.fm>
Message-ID: <CAKVJ-_6uVs=VE6boPAHgPTHgrBS-Q9UrGL+_63V6Go=ch-8oEw@mail.gmail.com>

On Mon, Oct 29, 2012 at 5:54 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter;
>
>> In the case of Bow's SearchIO code, what would you prefer?
>> e.g. Bio.SearchIO as it is now on his branch?
>
> I like plain ol' Search the best but don't have a strong preference. I'm
> terrible at naming things so trust everyone's judgment on this.
>
> Brad

Since we have no clear consensus, I propose we add Bow's code
as Bio.SearchIO (which is how it is written right now), with the new
BiopythonExperimentalWarning in place (to alert people that it may
change in the next release). We can then rename or move it at a
later date. This will make it easier for people to test the code, and
also suggest further changes or additions (e.g. Kai's HMMER work).
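
As an illustration of how such a warning category typically behaves, here
is a small sketch using only the standard library. ExperimentalWarning is
a hypothetical stand-in for Bio.BiopythonExperimentalWarning, and
load_experimental_module is an invented placeholder for importing the
new code:

```python
import warnings


class ExperimentalWarning(Warning):
    """Hypothetical stand-in for Bio.BiopythonExperimentalWarning."""


def load_experimental_module():
    # Placeholder for what an experimental module might do at import time.
    warnings.warn(
        "This code is experimental and may change in the next release.",
        ExperimentalWarning,
    )
    return "loaded"


# By default the warning is visible; here it is captured for inspection.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = load_experimental_module()

# Users who accept the risk can silence just this one category.
with warnings.catch_warnings(record=True) as silenced:
    warnings.simplefilter("ignore", ExperimentalWarning)
    load_experimental_module()
```

A dedicated category lets testers opt out of the message without hiding
unrelated warnings.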

If and when we agree a consolidation of the Bio.SeqXXX
modules, then Bio.SearchIO could move too. If this happens
before any public release as Bio.SearchIO so much the better.

Adopting lower case module names under Python 3 is also a
separate issue.

Peter

From kai.blin at biotech.uni-tuebingen.de  Tue Oct 30 10:17:38 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Tue, 30 Oct 2012 15:17:38 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <508EEA85.6060906@biotech.uni-tuebingen.de>
References: <508EEA85.6060906@biotech.uni-tuebingen.de>
Message-ID: <508FE182.3040202@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-10-29 21:43, Kai Blin wrote:
Hi Bow,

one more thing:

Hmmer2 has the concept of an accession number in the result. Is there
an attribute for that in the QueryResult object that I'm missing, or do
we want a new attribute for that? Would "accession" be a good name?

Cheers,
Kai

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQj+GCAAoJEKM5lwBiwTTPaT4IAJb+Xs7sMPpQH4SwUQarItyP
Cg0UYLQNRtKBlyhNpipCbz7BWfqxd8fU0GsYSCVF275fDuBLUa337A6psRzefkWa
84cC7uHmOdcmhyeCipdAs5Jtouxf7ReGuQ+m3/SsW0pRfMHOuZamKw+5+oETnisM
DiHJUv6iKMHCpXrVWpofcKywqb1uqpxdhTp9F1gy+v6rVGKMI4r/fW5mRQZVxC3s
aQdhubCHoN+LUEo/OUKIF6cNeHWLMBToENdYlBhk62gLeSX5bxyhog21pzD+HTYf
5u4rPC2ikVR7iGQ9QPsvW7r7lqpDgoxFbnDYzcsAa+bNYd6+ENs+MAePb8Va2Dg=
=Luz9
-----END PGP SIGNATURE-----

From kai.blin at biotech.uni-tuebingen.de  Tue Oct 30 11:54:50 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Tue, 30 Oct 2012 16:54:50 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <508F834C.6010404@biotech.uni-tuebingen.de>
References: <508EEA85.6060906@biotech.uni-tuebingen.de>
	<CADEGkF69XmC-ShWHpyhAcJBY0ZuUCAjXzQhE=FwCxCMXaUrFng@mail.gmail.com>
	<508F834C.6010404@biotech.uni-tuebingen.de>
Message-ID: <508FF84A.2020802@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-10-30 08:35, Kai Blin wrote:

Hi Bow,

> I'm mainly wondering why at this position, I can't just create the
> Hit object already, and then later set the HSPs. You could do this
> via a setter function that validates the IDs are identical if you
> want to make sure you're not shooting yourself in the foot there.

I've just stumbled over a case where not being able to pre-create Hit
objects really bites me.

See the attached hmmpfam output. You'll notice that the domain table
is not in the order of the hit table. As I'd like to preserve the
order of the hit table, the current setup of the API forces me to
either repeatedly parse the domain annotations until I find the
correct domain annotations for my hit, or to create the hits in the
order of the domain annotation table and then reshuffle them to make
sure they're in the order of the hit table.

If I could just create "empty" hit objects when parsing the hit table,
I could easily preserve the order of the hits but still add the hsps
as I parse them.
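
A small sketch of that idea, using a few rows taken from the attached
hmmpfam output. The Hit class is a hypothetical stand-in that allows an
empty HSP list, which the current API does not, and the sketch relies on
dicts keeping insertion order (guaranteed from Python 3.7 on; older
versions would need collections.OrderedDict):

```python
class Hit:
    """Hypothetical Hit that may start out without any HSPs."""

    def __init__(self, hit_id):
        self.id = hit_id
        self.hsps = []


# Model names in hit-table order (first four rows of the attachment).
hit_table = ["Glu_synthase", "GATase_2", "Glu_syn_central", "GXGXG"]

# (model, seq-f, seq-t, score) in domain-table order, which differs.
domain_table = [
    ("GATase_2", 34, 404, 731.8),
    ("Glu_syn_central", 478, 773, 649.1),
    ("Glu_synthase", 650, 676, 1.3),
    ("Glu_synthase", 830, 1216, 857.3),
    ("GXGXG", 1290, 1485, 367.3),
]

# Create the "empty" hits while reading the hit table.
hits = {model: Hit(model) for model in hit_table}

# Attach each domain to its hit as the domain table is read.
for model, seq_from, seq_to, score in domain_table:
    hits[model].hsps.append((seq_from, seq_to, score))

# The hits come out in hit-table order without any reshuffling.
ordered_hits = list(hits.values())
```

The domain table is read once, and the hit order never has to be
reconstructed afterwards.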

Cheers,
Kai

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQj/hKAAoJEKM5lwBiwTTPWTYH/2miexrfxolw9J0tOCSHXFYn
eNEzLcIM8ZHUoBCL1fsS/9166VH8D8HpyZCgTQwsSt9BUhQbjkwTmyfmP9wr0QDp
80IbxqWkMAJmDv3Q1RxbVVmD8TTfY6AwezQuwnYb8EFJDD7wvcJOJgJEqlp6zZu1
K/fJNYOXt2GekcXkrOMO1jGkzzpiwBs1uhhpYH9LxMAHPW3vnfTf4/tVSRPOKWRr
IXtxRnLSSurmZP4DYNm1ys4NykY6cO6zPOWxJIiI1lBLR7AVaKNK1bZ75m2D7/Mr
Y4FjnIlqaCFuNwiYPSNWQvTHOIj/VF/nRSWAVRRCqYZoYaDuZa25rb3Fo5RHMC8=
=Lerj
-----END PGP SIGNATURE-----
-------------- next part --------------
hmmpfam - search one or more sequences against HMM database
HMMER 2.3.2 (Oct 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 ../Shared/Pfam_fs
Sequence file:            single_porphyra_AA.fa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query sequence: gi|90819130|dbj|BAE92499.1|
Accession:      [none]
Description:    glutamate synthase [Porphyra yezoensis]

Scores for sequence family classification (score includes all domains):
Model           Description                             Score    E-value  N 
--------        -----------                             -----    ------- ---
Glu_synthase    Conserved region in glutamate synthas   858.6   3.6e-255   2
GATase_2        Glutamine amidotransferases class-II    731.8   3.9e-226   1
Glu_syn_central Glutamate synthase central domain       649.1   7.9e-213   1
GXGXG           GXGXG motif                             367.3   2.7e-107   1
HdeA            hns-dependent expression protein A (H     9.6      0.015   1
GDC-P           Glycine cleavage system P-protein         7.1      0.086   1
Cache_1         Cache domain                              7.0       0.14   1
IBN_N           Importin-beta N-terminal domain           8.2       0.17   1
DUF1200         Protein of unknown function (DUF1200)     6.7       0.42   1
cobW            CobW/HypB/UreG, nucleotide-binding do     5.1       0.45   1
PUF             Pumilio-family RNA binding repeat         6.5       0.47   1
Arch_flagellin  Archaebacterial flagellin                 4.1       0.66   1
FMN_dh          FMN-dependent dehydrogenase               3.2       0.89   1
RNA_pol_Rpb2_4  RNA polymerase Rpb2, domain 4             4.6        1.4   1
DUF477          Domain of unknown function (DUF477)       3.8        1.7   1
FRG1            FRG1-like family                          0.2        1.7   1
DUF1393         Protein of unknown function (DUF1393)     3.1          2   1
tRNA_anti       OB-fold nucleic acid binding domain       4.9          2   1
SelT            Selenoprotein T                           3.1        2.2   1
RNase_PH_C      3' exoribonuclease family, domain 2       4.2        2.3   1
Pencillinase_R  Penicillinase repressor                   3.9        2.5   1
Hormone_4       Neurohypophysial hormones, N-terminal     4.4        2.5   1
DSRB            Dextransucrase DSRB                       2.7        2.7   1
FtsK_SpoIIIE    FtsK/SpoIIIE family                       2.6        3.1   1
UBA             UBA/TS-N domain                           4.2        3.1   1
DUF1981         Domain of unknown function (DUF1981)      3.6        3.3   1
Gla             Vitamin K-dependent carboxylation/gam     4.0        3.5   1
Scm3            Centromere protein Scm3                   2.2        3.5   1
Ribosomal_S6    Ribosomal protein S6                      3.3        3.7   1
Cystatin        Cystatin domain                           2.4        3.9   1
Phage_prot_Gp6  Phage portal protein, SPP1 Gp6-like       1.0          4   1
DUF1976         Domain of unknown function (DUF1976)     -1.5        4.3   1
DUF37           Domain of unknown function DUF37          3.0        4.5   1
Flavodoxin_NdrI NrdI Flavodoxin like                      2.1        4.6   1
Bac_rhodopsin   Bacteriorhodopsin                         0.9        4.9   1
Nitro_FeMo-Co   Dinitrogenase iron-molybdenum cofacto     2.1        5.3   1
MoCF_biosynth   Probable molybdopterin binding domain     1.3        5.6   1
PaaA_PaaC       Phenylacetic acid catabolic protein       0.4        5.6   1
Albicidin_res   Albicidin resistance domain               1.7        5.7   1
DUF1514         Protein of unknown function (DUF1514)     3.5        5.7   1
T5orf172        T5orf172 domain                           2.0        6.1   1
Nup133_N        Nup133 N terminal like                   -0.6        6.5   1
BicD            Microtubule-associated protein Bicaud    -1.6        6.8   1
Sel1            Sel1 repeat                               2.5          7   1
CAP_C           DE   Adenylate cyclase associated (CA     1.3        7.4   1
Colicin         Colicin pore forming domain               1.4        7.5   1
MADF_DNA_bdg    Alcohol dehydrogenase transcription f     1.8        8.2   1
DUF258          Protein of unknown function, DUF258       0.3        8.3   1
PspB            Phage shock protein B                     0.4        8.4   1
GspM            General secretion pathway, M protein      1.0        8.6   1
Coq4            Coenzyme Q (ubiquinone) biosynthesis     -0.3        9.1   1
P22_AR_N        P22_AR N-terminal domain                 -0.2        9.5   1
C1_2            C1 domain                                 1.1        9.6   1
Phage_Mu_P      Bacteriophage Mu P protein               -0.4         10   1

Parsed for domains:
Model           Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
--------        ------- ----- -----    ----- -----      -----  -------
GATase_2          1/1      34   404 ..     1   385 []   731.8 3.9e-226
FRG1              1/1      88   107 ..   151   173 ..     0.2      1.7
C1_2              1/1     191   210 ..     9    27 ..     1.1      9.6
MADF_DNA_bdg      1/1     235   261 ..    57    95 .]     1.8      8.2
PaaA_PaaC         1/1     258   269 ..     1    13 [.     0.4      5.6
Albicidin_res     1/1     274   289 ..    50    65 ..     1.7      5.7
UBA               1/1     311   331 ..    18    38 .]     4.2      3.1
Gla               1/1     342   357 ..    27    42 .]     4.0      3.5
RNA_pol_Rpb2_4    1/1     369   381 ..     1    13 [.     4.6      1.4
MoCF_biosynth     1/1     371   396 ..    23    49 ..     1.3      5.6
DUF1200           1/1     389   401 ..     1    13 [.     6.7     0.42
Nup133_N          1/1     397   419 ..   475   498 .]    -0.6      6.5
DUF1976           1/1     428   448 ..  1296  1319 .]    -1.5      4.3
Bac_rhodopsin     1/1     445   472 ..   219   250 .]     0.9      4.9
Coq4              1/1     459   481 ..    60    82 ..    -0.3      9.1
Glu_syn_central   1/1     478   773 ..     1   301 []   649.1 7.9e-213
Flavodoxin_NdrI   1/1     488   497 ..   122   131 .]     2.1      4.6
P22_AR_N          1/1     524   541 ..   110   126 .]    -0.2      9.5
Cache_1           1/1     537   557 ..     1    23 [.     7.0     0.14
Glu_synthase      1/2     650   676 ..   297   323 ..     1.3        3
HdeA              1/1     727   749 ..    58    79 .]     9.6    0.015
Sel1              1/1     729   745 ..    32    49 .]     2.5        7
DUF1981           1/1     765   787 ..    62    88 .]     3.6      3.3
tRNA_anti         1/1     818   839 ..    54    85 .]     4.9        2
Cystatin          1/1     826   859 ..     1    38 [.     2.4      3.9
RNase_PH_C        1/1     827   846 ..    64    84 .]     4.2      2.3
Glu_synthase      2/2     830  1216 ..     1   412 []   857.3   9e-255
DUF258            1/1     839   860 ..   282   305 .]     0.3      8.3
Pencillinase_R    1/1     856   894 ..    84   118 .]     3.9      2.5
SelT              1/1     872   885 ..    96   111 .]     3.1      2.2
Nitro_FeMo-Co     1/1     879   897 ..    87   105 .]     2.1      5.3
DUF37             1/1     927   934 ..    61    68 .]     3.0      4.5
Scm3              1/1     953   963 ..   103   113 .]     2.2      3.5
cobW              1/1    1038  1058 ..   202   222 .]     5.1     0.45
Arch_flagellin    1/1    1050  1072 ..   197   219 .]     4.1     0.66
DUF1393           1/1    1055  1068 ..     1    14 [.     3.1        2
FtsK_SpoIIIE      1/1    1107  1143 ..   163   198 ..     2.6      3.1
FMN_dh            1/1    1109  1148 ..   291   330 ..     3.2     0.89
DSRB              1/1    1120  1134 ..     1    16 [.     2.7      2.7
Phage_Mu_P        1/1    1122  1131 ..     1    10 [.    -0.4       10
Hormone_4         1/1    1168  1176 ..     1     9 []     4.4      2.5
GDC-P             1/1    1205  1225 ..    10    30 ..     7.1    0.086
PspB              1/1    1268  1276 ..     1     9 [.     0.4      8.4
T5orf172          1/1    1271  1293 ..    35    58 ..     2.0      6.1
CAP_C             1/1    1283  1292 ..   161   170 .]     1.3      7.4
GXGXG             1/1    1290  1485 ..     1   228 []   367.3 2.7e-107
DUF1514           1/1    1453  1469 ..    50    66 .]     3.5      5.7
Colicin           1/1    1456  1467 ..   192   203 .]     1.4      7.5
Ribosomal_S6      1/1    1461  1481 ..    16    36 ..     3.3      3.7
BicD              1/1    1465  1481 ..     1    17 [.    -1.6      6.8
PUF               1/1    1470  1486 ..    19    35 .]     6.5     0.47
DUF477            1/1    1472  1495 ..     1    24 [.     3.8      1.7
Phage_prot_Gp6    1/1    1479  1492 ..     1    14 [.     1.0        4
IBN_N             1/1    1498  1516 ..     1    20 [.     8.2     0.17
GspM              1/1    1506  1520 ..     1    15 [.     1.0      8.6

Alignments of top-scoring domains:
GATase_2: domain 1 of 1, from 34 to 404: score 731.8, E = 3.9e-226
                CS    EEEEEEEEETSSHSBHHHHHHHHHHHHHGGGGSSCSTTSSCECEEEE
                   *->CGvlGfiAhikgkpshkivedaleaLerLeHRGavgADgktGDGAGI
                      CGv GfiA+ ++ ++hkiv +aleaL+++eHRGa++AD ++GDGAGI
  gi|9081913    34    CGV-GFIADVNNVANHKIVVQALEALTCMEHRGACSADRDSGDGAGI 79   

                CS EEECTCCCHHHHHHHCT----S GC-EEEEEEE-SSHHHHHHHHHHHHHH
                   ltqiPdgFFrevakelGieLpe.gqYAVGmvFLPqdelaraearkifEki
                    t+iP+++F++  ++++i++ ++   +VGm+FLP   l+    + i+E +
  gi|9081913    80 TTAIPWNLFQKSLQNQNIKFEQnDSVGVGMLFLPAHKLKES--KLIIETV 127  

                CS HHHTT-EEEEEEE--B-GGGS-HHHHHC--EEEEEEEE-TT--HHHHHHC
                   aeeeGLeVLGWReVPvnnsvLGetAlatePvIeQvFvgapsgdgedfErr
                   ++ee+Le++GWR VP+  +vLG++A  + P++eQvF+ +++ +++ +E++
  gi|9081913   128 LKEENLEIIGWRLVPTVQEVLGKQAYLNKPHVEQVFCKSSNLSKDRLEQQ 177  

                CS EEEEECHSCHHHHTHHH.    BEEEEEESSEEEEEECC-GGGHHHHBHG
                   LyviRkrieksivaenvn....fYiCSLSsrTIVYKGMLtseQLgqFYpD
                   L+++Rk+iek+i+  + +  ++fYiCSLS++TIVYKGM++s++LgqFY+D
  gi|9081913   178 LFLVRKKIEKYIGINGKDwaheFYICSLSCYTIVYKGMMRSAVLGQFYQD 227  

                CS GGSTTEEBSEEEEEECESSSSSCTGGGSSCEEECCCTTCEEEEEEEEETT
                   LqderfeSalAivHsRFSTNTfPsWplAQPfRVnslwgggivlAHNGEIN
                   L++++++S++Ai+H+RFSTNT+P+WplAQP+R         ++ HNGEIN
  gi|9081913   228 LYHSEYTSSFAIYHRRFSTNTMPKWPLAQPMR---------FVSHNGEIN 268  

                CS THHHHHHHHHHTSCCCSSTTCGHHHHCC-SSS-TTSCHHHHHHHHHHHHH
                   TlrgNrnwMraRegvlksplFgddldkLkPIvneggSDSaalDnvlEllv
                   Tl gN nwM++Re +l+s++++d++++LkPI n+++SDSa+lD ++Ell+
  gi|9081913   269 TLLGNLNWMQSREPLLQSKVWKDRIHELKPITNKDNSDSANLDAAVELLI 318  

                CS HTT--HHHHHHHHS----TT-GGGTST-HHHHHHHHHHHHHHCCHCCEEE
                   raGRslpeAlMMlIPEAWqnnpdmdkdrpekraFYeylsglmEPWDGPAa
                   ++GRs++eAlM+l+PEA+qn+pd   +++e+ +FYey+sgl+EPWDGPA+
  gi|9081913   319 ASGRSPEEALMILVPEAFQNQPDFA-NNTEISDFYEYYSGLQEPWDGPAL 367  

                CS EEEETSSEEEEEEETTTSCESEEEEEEEEEE.TTEEEEEESSC   
                   lvftDGryavgAtLDRNGLTRPaRygiTrdldkDglvvvaSEa<-*
                   +vft+G++ +gAtLDRNGL RPaRy+iT    kD+lv+v+SE+   
  gi|9081913   368 VVFTNGKV-IGATLDRNGL-RPARYVIT----KDNLVIVSSES    404  

FRG1: domain 1 of 1, from 88 to 107: score 0.2, E = 1.7
                   *->FQkfKvDLqdrklrinekDkkel<-*
                      FQk+   Lq+  +  +++D+ ++   
  gi|9081913    88    FQKS---LQNQNIKFEQNDSVGV    107  

C1_2: domain 1 of 1, from 191 to 210: score 1.1, E = 9.6
                   *->idgfyg...fYsCkkccddftl<-*
                      i+g+++ ++fY C+  c  +t+   
  gi|9081913   191    INGKDWaheFYICSLSC--YTI    210  

MADF_DNA_bdg: domain 1 of 1, from 235 to 261: score 1.8, E = 8.2
                   *->drYrrelrkirqgnsegsstgsgesykskWryyeelsFL<-*
                      +++  ++r+               ++ +kW+++  ++F    
  gi|9081913   235    SSFAIYHRRFS------------TNTMPKWPLAQPMRFV    261  

PaaA_PaaC: domain 1 of 1, from 258 to 269: score 0.4, E = 5.6
                CS    X............   
                   *->MYnFvEHGGvint<-*
                      M  Fv H G int   
  gi|9081913   258    M-RFVSHNGEINT    269  

Albicidin_res: domain 1 of 1, from 274 to 289: score 1.7, E = 5.7
                   *->LrlmharEPsLrkgtG<-*
                      L+ m+ rEP L+ +++   
  gi|9081913   274    LNWMQSREPLLQSKVW    289  

UBA: domain 1 of 1, from 311 to 331: score 4.2, E = 3.1
                CS    HHHHHHHHHTTT-HHHHHHHH   
                   *->eeakkALeatngnverAvewL<-*
                      ++a++ L a++ ++e+A+++L   
  gi|9081913   311    DAAVELLIASGRSPEEALMIL    331  

Gla: domain 1 of 1, from 342 to 357: score 4.0, E = 3.5
                CS    CSSHHHHHHHHHHCTC   
                   *->fednegtkefwrkYfg<-*
                      f++n+++  f++ Y g   
  gi|9081913   342    FANNTEISDFYEYYSG    357  

RNA_pol_Rpb2_4: domain 1 of 1, from 369 to 381: score 4.6, E = 1.4
                CS    EEETTEEEEEESS   
                   *->VYvNGklvGthrn<-*
                      V+ NGk++G + +   
  gi|9081913   369    VFTNGKVIGATLD    381  

MoCF_biosynth: domain 1 of 1, from 371 to 396: score 1.3, E = 5.6
                CS    CHHHHHHHHHHHTTTCEEEEEEEE-SS   
                   *->tNgpmLaalLresaGaevirygiVpDd<-*
                      tNg+ + a L +  G  ++ry+i +D+   
  gi|9081913   371    TNGKVIGATLDR-NGLRPARYVITKDN    396  

DUF1200: domain 1 of 1, from 389 to 401: score 6.7, E = 0.42
                   *->kYvltedtLlIks<-*
                      +Yv+t+d L+I+s   
  gi|9081913   389    RYVITKDNLVIVS    401  

Nup133_N: domain 1 of 1, from 397 to 419: score -0.6, E = 6.5
                   *->lylltrnsGvvrIeHaleedstne<-*
                      l++ + +sGvv++e +  + s  +   
  gi|9081913   397    LVIVSSESGVVQVE-PGNVKSKGR    419  

DUF1976: domain 1 of 1, from 428 to 448: score -1.5, E = 4.3
                   *->VsvYiyFkevtdnksLsEysVtyk<-*
                      V++++   ++++nk ++  sVt k   
  gi|9081913   428    VDIFS--HKILNNKEIK-TSVTTK    448  

Bac_rhodopsin: domain 1 of 1, from 445 to 472: score 0.9, E = 4.9
                CS    HHHHHHHHHHHHHHHHHCHHHTC---------   
                   *->vvAKVgFgfilLrsravlertvavgsalaage<-*
                      v++K+++g +l ++r++le  +   + l+++    
  gi|9081913   445    VTTKIPYGELLTDARQILE--HK--PFLSDQQ    472  

Coq4: domain 1 of 1, from 459 to 481: score -0.3, E = 9.1
                   *->rrILkEkPRissetldlkkLrkL<-*
                      r+IL  kP  s  ++d kkL +L   
  gi|9081913   459    RQILEHKPFLSDQQVDIKKLMQL    481  

Glu_syn_central: domain 1 of 1, from 478 to 773: score 649.1, E = 7.9e-213
                CS    HHHHHHCTT--HHHHHCTCHHHHHHSS--EE-S---S--CCC-SS--
                   *->llrrQkAFGYTyEdvelvllPMAetGkEalGSMGdDtPLAVLSekpr
                      l+++Q+AFGYT+Edvelv+++MA+++kE++++MGdD+PL +LSek++
  gi|9081913   478    LMQLQTAFGYTNEDVELVIEHMASQAKEPTFCMGDDIPLSILSEKSH 524  

                CS -GGGCEEE----SSS----TTTTGGG-B--EEES--S-TTS-SGGGC-CE
                   lLYdYFKQlFAQVTNPPIDPIREelVMSLetylGpegNlLeptpeqarrl
                   +LYdYFKQ+FAQVTNP+IDP+RE+lVMSL+ ++G+++NlL+  p+ a+++
  gi|9081913   525 ILYDYFKQRFAQVTNPAIDPLRESLVMSLAIQIGHKSNLLDDQPTLAKHI 574  

                CS EESSSB--HHHHHH.HHHH....CCCCEEEEESEEESTTSTTCHHHHHHH
                   kLesPILsnselekmlknidairegfkaatIditFdveeGvdgLeaaLdr
                   kLesP+++++el++ + +     +++++  I+++F  e+G++ ++  + +
  gi|9081913   575 KLESPVINEGELNA-IFE-----SKLSCIRINTLFQLEDGPKNFKQQIQQ 618  

                CS HHHHHHHHHHCT-SEEEEESTCG--CTTEEE--HHHHHHHHHHHHHCTT-
                   lceeAeeAirsGaniivLSDRndildeervaIPaLLAvGAVHhHLIrkgL
                   lce A++Ai +G ni+vLSD+n+ ld+e+v+IP+LLAvGAVHhHLI kgL
  gi|9081913   619 LCENASQAILDGNNILVLSDKNNSLDSEKVSIPPLLAVGAVHHHLINKGL 668  

                CS CCC-EEEEEESS--SHHHHHHHHCTT-SEEEEHCCHHHHHHHHCCCCCCC
                   RtkvslvVETGEaREvHHFAvLiGYGAsAInPYLAyETirdWWlirrGll
                   R+ +s+ VET++++++HHFA+LiGYGAsAI+PYLA+ET r+WW + ++++
  gi|9081913   669 RQEASILVETAQCWSTHHFACLIGYGASAICPYLAFETARHWWSNPKTKM 718  

                CS CHTTTS- T--HHHHHHHHHHHHHHHHHHHHHCTT--BHHHHCCS--EEE
                   lmskGkl.elsleeavkNYrkAiekGlLKIMSKMGISTlqSYrGAQIFEA
                   lmskG+l++++++ea++NY+kA+e+GlLKI+SKMGIS+l+SY+GAQIFE+
  gi|9081913   719 LMSKGRLpACNIQEAQANYKKAVEAGLLKILSKMGISLLSSYHGAQIFEI 768  

                CS SSB-H   
                   vGLsk<-*
                   +GL++   
  gi|9081913   769 LGLGS    773  

Flavodoxin_NdrI: domain 1 of 1, from 488 to 497: score 2.1, E = 4.6
                CS    -HHHHHHHHH   
                   *->TneDVerVrk<-*
                      TneDVe V +   
  gi|9081913   488    TNEDVELVIE    497  

P22_AR_N: domain 1 of 1, from 524 to 541: score -0.2, E = 9.5
                   *->dVLydYWtrkGkAv..NPR<-*
                      ++LydY+  + +A  +NP+   
  gi|9081913   524    HILYDYFK-QRFAQvtNPA    541  

Cache_1: domain 1 of 1, from 537 to 557: score 7.0, E = 0.14
                   *->wTePYvdaalktgdlViTiaqPv<-*
                      +T+P++d +  +++lV ++a+++   
  gi|9081913   537    VTNPAIDPL--RESLVMSLAIQI    557  

Glu_synthase: domain 1 of 2, from 650 to 676: score 1.3, E = 3
                CS    --HHHHHHHHHHHHHCTT-CCCSEEEE   
                   *->lPwelgLaevhqtLvengLRdrVsLia<-*
                      +P  l++ +vh  L++ gLR + s+ +   
  gi|9081913   650    IPPLLAVGAVHHHLINKGLRQEASILV    676  

HdeA: domain 1 of 1, from 727 to 749: score 9.6, E = 0.015
                   *->ACk.QdkkAsFkdKvkaEldKvk<-*
                      AC  Q+ +A++k+ v+a l K+    
  gi|9081913   727    ACNiQEAQANYKKAVEAGLLKIL    749  

Sel1: domain 1 of 1, from 729 to 745: score 2.5, E = 7
                CS    .HHH.HHHHHHHHHHTT-   
                   *->DyekeAlkwyekAAeqGn<-*
                      ++++ A + y+kA e+G    
  gi|9081913   729    NIQE-AQANYKKAVEAGL    745  

DUF1981: domain 1 of 1, from 765 to 787: score 3.6, E = 3.3
                   *->iFgvltlaakeesesivklAfqiid.qi<-*
                      iF++l+l++       v+lAf+ +++qi   
  gi|9081913   765    IFEILGLGSEV-----VNLAFKGTTsQI    787  

tRNA_anti: domain 1 of 1, from 818 to 839: score 4.9, E = 2
                CS    EEEEEEETTSSTSTCTCTT..EEEEEEEEEEE   
                   *->tGkvkkrpggeqNnlkTGeKAlelvveeievl<-*
                      +G v+ rpgge          ++++ +e+      
  gi|9081913   818    YGFVQYRPGGE----------YHINNPEMSKA    839  

Cystatin: domain 1 of 1, from 826 to 859: score 2.4, E = 3.9
                CS    ECEEEEET.STSHHHHHHHHHHHHHHHHHSSSSEEEEE   
                   *->GglspvdpNendpevqealdfAlakyNeksndnylfel<-*
                      Gg   +++    pe  +al+ A+  yN +  +ny++ l   
  gi|9081913   826    GGEYHINN----PEMSKALHQAVRGYNPEYYNNYQSLL    859  

RNase_PH_C: domain 1 of 1, from 827 to 846: score 4.2, E = 2.3
                CS    SSSS.B.HHHHHHHHHHHHHH   
                   *->GkgnglteelleealelAkeg<-*
                      G +++++ +++ +al++A+ g   
  gi|9081913   827    G-EYHINNPEMSKALHQAVRG    846  

Glu_synthase: domain 2 of 2, from 830 to 1216: score 857.3, E = 9e-255
                CS    -SS-HHHHHHHHHHHHC--T-HHHHHHHHHHHHTS.-S-SGGGGEEE
                   *->hrnepeviktlqkavqvpveskpsydkYreplnertpigalrdlLef
                      h n+pe++k l++av+    +   y +Y+ +l +r p++alrdlL++
  gi|9081913   830    HINNPEMSKALHQAVRG--YNPEYYNNYQSLLQNR-PPTALRDLLKL 873  

                CS --SS--......--GGGS--HHHHHTTEEEEEB-CTTC-HHHHHHHHHHH
                   kyaeepldtdkiipieevepaleikkrfctgaMSyGALSeeAheALAiAm
                    ++++p      i+i+eve+++ i + fctg+MS+GALS+e+he+LAiAm
  gi|9081913   874 QSNRAP------ISIDEVESIEDILQKFCTGGMSLGALSRETHETLAIAM 917  

                CS HHCT-EEEETTT---GGGCSB-TTS-T S BTTSTT--S--TT-B---SE
                   nriGtksNtGEGGedperlkpaadlds.G.SpTlpHLkGLqnednarSAI
                   nriG+ksN+GEGGedp r+k + d++s+G+Sp lpHLkGL+n+d+a+SAI
  gi|9081913   918 NRIGGKSNSGEGGEDPVRFKILNDVNSsGtSPLLPHLKGLKNGDTASSAI 967  

                CS EEE-TT-TT--............HHHHCC-SEEEEE---TTSTTT--EE-
                   kQvASGRFGVtkRnGefWeefkRseYLvnAdalEIKiAQGAKPGeGGhLP
                   kQ+ASGRFGVt            +eYL+nA++lEIKiAQGAKPGeGG+LP
  gi|9081913   968 KQIASGRFGVT------------PEYLMNAKQLEIKIAQGAKPGEGGQLP 1005 

                CS GGG--HHHHHHHTS-TT--EE--SS-TT-SSHHHHHHHHHHHHHH-.TTS
                   GeKVspeIAriRnstPGvgliSPpPHHDIysiEDLaqLIydLkeindpkA
                   G+K+sp+IA +R ++PGv liSPpPHHDIysiEDL+qLI+dL++in pkA
  gi|9081913  1006 GKKISPYIATLRKCKPGVPLISPPPHHDIYSIEDLSQLIFDLHQIN-PKA 1054 

                CS EEEEEEE-STTHHHHHHH...HHHTT-SEEEEE-TT---SSEECCHHHHC
                   pisVKLVsehgvgtiaaGhmqvakAnADiIlIdGhdGGTGASpktsikha
                   +isVKLVse g+gtiaaG   vak+nADiI+I+GhdGGTGASp++sikha
  gi|9081913  1055 KISVKLVSEIGIGTIAAG---VAKGNADIIQISGHDGGTGASPLSSIKHA 1101 

                CS ---HHHHHHHHHHHHHCTT-CCCSEEEEESS--SHHHHHHHHHCT-SEEE
                   GlPwelgLaevhqtLvengLRdrVsLiadGGLrTGaDVakAaaLGAdavg
                   G PwelgL+evhq+L en+LRdrV+L++dGGLrTG D+++Aa++GA+++g
  gi|9081913  1102 GSPWELGLSEVHQLLAENQLRDRVTLRVDGGLRTGSDIVLAAIMGAEEFG 1151 

                CS -SHHHHHHCT--S---CCCT--TTSSS---CCHH..CT----HHHHHHHH
                   iGTaaLiAlGCimaRvCHtntCPvGvATQDPeLrKrlkfegaperVvNyf
                   +GT+a+iA+GCimaR+CHtn+CPvGvATQ++eLr   +f g+pe +vN+f
  gi|9081913  1152 FGTVAMIATGCIMARICHTNKCPVGVATQREELR--ARFSGVPEALVNFF 1199 

                CS HHHHHHHHHHHHHHT-S   
                   iflaeEvrellaqlGfr<-*
                   +f+  Evre+la+lG++   
  gi|9081913  1200 LFIGNEVREILASLGYK    1216 

DUF258: domain 1 of 1, from 839 to 860: score 0.3, E = 8.3
                CS    HHHHHHHCTSS-HHHHHHHHHHHH   
                   *->AVkaAveeGeIseeRYesYlklle<-*
                      A+ +Av    +++e Y++Y+ ll+   
  gi|9081913   839    ALHQAVR--GYNPEYYNNYQSLLQ    860  

Pencillinase_R: domain 1 of 1, from 856 to 894: score 3.9, E = 2.5
                CS    XXXXXXXXXXXXXXXXXXX    XXXXXXXXXXXXXXXX   
                   *->drlfggsvgalvanfleee....klSeddieeLrelLde<-*
                      + l++++++ ++ ++l+ ++++ ++S d++e ++++L++   
  gi|9081913   856    QSLLQNRPPTALRDLLKLQsnraPISIDEVESIEDILQK    894  

SelT: domain 1 of 1, from 872 to 885: score 3.1, E = 2.2
                   *->KLqtGrvYAPPtpqEL<-*
                      KLq++r   P++++E+   
  gi|9081913   872    KLQSNRA--PISIDEV    885  

Nitro_FeMo-Co: domain 1 of 1, from 879 to 897: score 2.1, E = 5.3
                CS    EEE-TTSSBHHHHHHHHHC   
                   *->pikagegetieeaiealqe<-*
                      pi   e e+ie+ + ++ +   
  gi|9081913   879    PISIDEVESIEDILQKFCT    897  

DUF37: domain 1 of 1, from 927 to 934: score 3.0, E = 4.5
                   *->hpGGyDPV<-*
                      ++GG DPV   
  gi|9081913   927    GEGGEDPV    934  

Scm3: domain 1 of 1, from 953 to 963: score 2.2, E = 3.5
                   *->HLraLeteddi<-*
                      HL++L+++d++   
  gi|9081913   953    HLKGLKNGDTA    963  

cobW: domain 1 of 1, from 1038 to 1058: score 5.1, E = 0.45
                CS    ...HHHHHHHHHH-SSS-EEE   
                   *->adlekleadlrrlnpeapiip<-*
                      +dl++l+ dl+++np+a+i     
  gi|9081913  1038    EDLSQLIFDLHQINPKAKISV    1058 

Arch_flagellin: domain 1 of 1, from 1050 to 1072: score 4.1, E = 0.66
                   *->inpstkvrgeVvpenGapgtief<-*
                      inp  k+++++v+e+G+ ++      
  gi|9081913  1050    INPKAKISVKLVSEIGIGTIAAG    1072 

DUF1393: domain 1 of 1, from 1055 to 1068: score 3.1, E = 2
                   *->klSvKtVVAiGIGA<-*
                      k+SvK V  iGIG+   
  gi|9081913  1055    KISVKLVSEIGIGT    1068 

FtsK_SpoIIIE: domain 1 of 1, from 1107 to 1143: score 2.6, E = 3.1
                   *->lviDnydeLaeenlL.ervtsLknqGlsygvhvmata<-*
                      l++ + ++L +en+L++rvt+ + +Gl +g +++++a   
  gi|9081913  1107    LGLSEVHQLLAENQLrDRVTLRVDGGLRTGSDIVLAA    1143 

FMN_dh: domain 1 of 1, from 1109 to 1148: score 3.2, E = 0.89
                CS    HHHHHHHHHCHHTTTSSEEEEESS-SSHHHHHHHHHHTSS   
                   *->LpeVvPIlkeaAvkgdieVllDgGvRRGtDVlKALALGAr<-*
                      L eV  +l e  + +++   +DgG R+G+D++ A  +GA+   
  gi|9081913  1109    LSEVHQLLAENQLRDRVTLRVDGGLRTGSDIVLAAIMGAE    1148 

DSRB: domain 1 of 1, from 1120 to 1134: score 2.7, E = 2.7
                   *->mKvndrvtvKtDGgpR<-*
                       ++ drvt + DGg R   
  gi|9081913  1120    -QLRDRVTLRVDGGLR    1134 

Phage_Mu_P: domain 1 of 1, from 1122 to 1131: score -0.4, E = 10
                   *->sntVtLrvgG<-*
                       ++VtLrv+G   
  gi|9081913  1122    RDRVTLRVDG    1131 

Hormone_4: domain 1 of 1, from 1168 to 1176: score 4.4, E = 2.5
                CS    X-TT--TT-   
                   *->CyirnCPrG<-*
                      C  + CP+G   
  gi|9081913  1168    CHTNKCPVG    1176 

GDC-P: domain 1 of 1, from 1205 to 1225: score 7.1, E = 0.086
                   *->eqqeMLstiGlssLddLidat<-*
                      e++e+L+++G++sLdd ++++   
  gi|9081913  1205    EVREILASLGYKSLDDITGQN    1225 

PspB: domain 1 of 1, from 1268 to 1276: score 0.4, E = 8.4
                   *->MsaffLagP<-*
                      M+ ++La+P   
  gi|9081913  1268    MDDDILAIP    1276 

T5orf172: domain 1 of 1, from 1271 to 1293: score 2.0, E = 6.1
                   *->dvvalievedaraklEklLHkrFk<-*
                      d+ a+ ev++a  klE+++ k+Fk   
  gi|9081913  1271    DILAIPEVSNAI-KLETEITKHFK    1293 

CAP_C: domain 1 of 1, from 1283 to 1292: score 1.3, E = 7.4
                CS    EEEEEE----   
                   *->KLvTevveha<-*
                      KL+Te++ h    
  gi|9081913  1283    KLETEITKHF    1292 

GXGXG: domain 1 of 1, from 1290 to 1485: score 367.3, E = 2.7e-107
                CS    EEEEE-TT--STTHHHHHHHHHHCTTTS.S-TTCEEEEEEEEE-TTT
                   *->keeaiiNtdrlvgtrlsgeiakkygeegalpkdtgkivfnGsAGqsf
                      k+++i Nt+r+vgtrlsg iak yg+ g + k+ +k++f+GsAGqsf
  gi|9081913  1290    KHFKIANTNRTVGTRLSGIIAKNYGNTG-F-KGLIKLNFYGSAGQSF 1334 

                CS TTT-BTTEEEEEEEEE-S.TTTTT-ECCEEEEE--TT-.......SS-GG
                   GafmagGvtLeleGdAnddyvGkgmsGGeIvikgnagdpvGnnMdageyv
                   Gaf+a+G++L l+G+And yvGkgm+GG+Ivi+++ag         +e +
  gi|9081913  1335 GAFLASGINLKLMGEAND-YVGKGMNGGSIVIVPPAGT-------IYEDN 1376 

                CS GSEEC-SSTTTT--CEEEEESSEE-TTTTTT-.....CCEEEEESEB.-S
                   gnviaGNtclyGatGGkifiaGdAGerfgvrnkayKdsgatiVveGvaGd
                   ++vi+GNtclyGatGG++f++G+AGerf+vrn     s a+ VveGv Gd
  gi|9081913  1377 NQVIIGNTCLYGATGGYLFAQGQAGERFAVRN-----SLAESVVEGV-GD 1420 

                CS STTTT-EEEEEEESS-B-SSBTTT--CCEEEEE-TTS.......THHHHB
                   hggEYMtGGtivVlGdaGrnvGagMtGGiaYvlgeiedfsyMiatlpgkv
                   h++EYMtGG+ivVlG+aGrnvGagMtGG+aY+l+e+e        + ++v
  gi|9081913  1421 HACEYMTGGVIVVLGKAGRNVGAGMTGGLAYFLDEDE-------NFIDRV 1463 

                CS -CCCEEEE...ES-S......CCHHHHHHHH   
                   nleiVeledlkrievkrkklLpegekqlkel<-*
                   n+eiV+ +   r+ +      ++ge+qlk+l   
  gi|9081913  1464 NSEIVKIQ---RVIT------KAGEEQLKNL    1485 

DUF1514: domain 1 of 1, from 1453 to 1469: score 3.5, E = 5.7
                   *->LeeyrieveRikkevkk<-*
                      L e+++ ++R++ e+ k   
  gi|9081913  1453    LDEDENFIDRVNSEIVK    1469 

Colicin: domain 1 of 1, from 1456 to 1467: score 1.4, E = 7.5
                CS    SHHHHHHHHHCH   
                   *->DdkfveklNkli<-*
                      D++f++ +N +i   
  gi|9081913  1456    DENFIDRVNSEI    1467 

Ribosomal_S6: domain 1 of 1, from 1461 to 1481: score 3.3, E = 3.7
                CS    CCHHHHHHHHHHHHHCTT-EE   
                   *->EqvkqeiekYqkvLtnngAei<-*
                      ++v++ei k+q+v+t++g+e+   
  gi|9081913  1461    DRVNSEIVKIQRVITKAGEEQ    1481 

BicD: domain 1 of 1, from 1465 to 1481: score -1.6, E = 6.8
                   *->gqaysnqrkvAkdGeer<-*
                       + +++qr+ +k Gee+   
  gi|9081913  1465    SEIVKIQRVITKAGEEQ    1481 

PUF: domain 1 of 1, from 1470 to 1486: score 6.5, E = 0.47
                   *->lQkllevateeqkqlil<-*
                      +Q+++++a+eeq ++++   
  gi|9081913  1470    IQRVITKAGEEQLKNLI    1486 

DUF477: domain 1 of 1, from 1472 to 1495: score 3.8, E = 1.7
                   *->gtLspserarLeqalaalEqktga<-*
                      ++++++  ++L   ++  ++ktg+   
  gi|9081913  1472    RVITKAGEEQLKNLIENHAAKTGS    1495 

Phage_prot_Gp6: domain 1 of 1, from 1479 to 1492: score 1.0, E = 4
                   *->eEmikkFidkHklr<-*
                      eE +k++i+ H+++   
  gi|9081913  1479    EEQLKNLIENHAAK    1492 

IBN_N: domain 1 of 1, from 1498 to 1516: score 8.2, E = 0.17
                CS    HHHHHHHHHCCTHHCHHHHH   
                   *->AEkqLeqlekqklPgfllaL<-*
                      A++ Le+++++ lP+f++ +   
  gi|9081913  1498    AHTILEKWNSY-LPQFWQVV    1516 

GspM: domain 1 of 1, from 1506 to 1520: score 1.0, E = 8.6
                CS    XXXXXXXXXXXXXXX   
                   *->mneLqawWqgrspRE<-*
                      ++ L ++Wq ++p+E   
  gi|9081913  1506    NSYLPQFWQVVPPSE    1520 

//

From etal at uga.edu  Tue Oct 30 13:21:25 2012
From: etal at uga.edu (Eric Talevich)
Date: Tue, 30 Oct 2012 13:21:25 -0400
Subject: [Biopython-dev] Fwd: Pull Request: MafIO.py
In-Reply-To: <5aa8ce85a0ec41b5b817cdc5105bfdfb@BLUPRD0210HT004.namprd02.prod.outlook.com>
References: <5aa8ce85a0ec41b5b817cdc5105bfdfb@BLUPRD0210HT004.namprd02.prod.outlook.com>
Message-ID: <CAMC681mrW5KrjZb32tUHDm5bBHQfosNZHM1yQaN4Ac9YjVHS3A@mail.gmail.com>

---------- Forwarded message ----------
From: Nick Loman <n.j.loman at bham.ac.uk>
Date: Tue, Oct 30, 2012 at 6:34 AM
Subject: Pull Request: MafIO.py


 Hi there

 Thanks for the MafIO branch. In order to get it to read MAF files produced
by Mugsy (mugsy.sourceforge.net) I had to make the following change:

diff --git a/Bio/AlignIO/MafIO.py b/Bio/AlignIO/MafIO.py
index 6eda0ca..4bb1407 100644
--- a/Bio/AlignIO/MafIO.py
+++ b/Bio/AlignIO/MafIO.py
@@ -178,7 +178,7 @@ def MafIterator(handle, seq_count = None, alphabet = single_letter_alphabet):

             annotations = dict([x.split("=") for x in line.strip().split()[1:]])

-            if len([x for x in annotations.keys() if x not in ("score", "pass")]) > 0:
+            if len([x for x in annotations.keys() if x not in ("score", "pass", "label", "mult")]) > 0:
                 raise ValueError("Error parsing alignment - invalid key in 'a' line")
         elif line.startswith("#"):
             # ignore comments
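
For readers who want to try the check outside the parser, here is a minimal standalone sketch of the same idea. The helper name parse_a_line and the constant ALLOWED_A_KEYS are hypothetical and are not part of MafIO.py; the sample line in the usage note is made up for illustration and is not taken from real Mugsy output.

```python
# Keys accepted on a MAF 'a' line. "label" and "mult" are the two
# keys the patch adds; the constant name is hypothetical.
ALLOWED_A_KEYS = frozenset(("score", "pass", "label", "mult"))


def parse_a_line(line):
    """Hypothetical helper: parse 'a key=value ...' into a dict of strings."""
    fields = line.strip().split()
    if not fields or fields[0] != "a":
        raise ValueError("Not an 'a' line: %r" % line)
    # Split on the first '=' only, so values may themselves contain '='.
    annotations = dict(x.split("=", 1) for x in fields[1:])
    unknown = sorted(set(annotations) - ALLOWED_A_KEYS)
    if unknown:
        raise ValueError(
            "Error parsing alignment - invalid key in 'a' line: %s"
            % ", ".join(unknown)
        )
    return annotations
```

Calling parse_a_line("a score=0 label=1 mult=2") returns a dict with the three keys, while a line carrying any other key raises ValueError, as in the patched code.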


 My Python fork is a bit confusing right now so hope you don't mind me
sending this pull request via email!

 Cheers

 Nick

From w.arindrarto at gmail.com  Tue Oct 30 20:09:41 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 31 Oct 2012 01:09:41 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <508FE182.3040202@biotech.uni-tuebingen.de>
References: <508EEA85.6060906@biotech.uni-tuebingen.de>
	<508FE182.3040202@biotech.uni-tuebingen.de>
Message-ID: <CADEGkF6tBqaYUTuX26MtLuq+sncu_=zdo8P-+yfg4Nn11huo_Q@mail.gmail.com>

Hi Kai,

> one more thing:
>
> Hmmer2 has the concept of an accession number in the result. Is there
> an attribute for that in the QueryResult object that I'm missing or do
> we want a new attribute for that. Would "accession" be a good name?
>
> Cheers,
> Kai

I've used '.acc' for accession number properties in the current HMMER3
and BLAST parsers, but this choice was arbitrary. '.accession' is a
good name. I didn't use it because I like shorter names better, but
then again it may be unclear at times.

Does anyone have preference between '.acc' or '.accession'? If not, I
can change the current '.acc' into '.accession'.

cheers,
Bow

From w.arindrarto at gmail.com  Tue Oct 30 20:19:30 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 31 Oct 2012 01:19:30 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <508FF84A.2020802@biotech.uni-tuebingen.de>
References: <508EEA85.6060906@biotech.uni-tuebingen.de>
	<CADEGkF69XmC-ShWHpyhAcJBY0ZuUCAjXzQhE=FwCxCMXaUrFng@mail.gmail.com>
	<508F834C.6010404@biotech.uni-tuebingen.de>
	<508FF84A.2020802@biotech.uni-tuebingen.de>
Message-ID: <CADEGkF4kdFwBwHfHJ-ZF42zijKdAoVTiGqTpJEqVu-9JnNS4mQ@mail.gmail.com>

Hi Kai,

> I've just stumbled over a case where not being able to pre-create Hit
> objects really bites me.
>
> See the attached hmmpfam output. You'll notice that the domain table
> is not in the order of the hit table. As I'd like to preserve the
> order of the hit table, the current setup of the API forces me to
> either repeatedly parse the domain annotations until I find the
> correct domain annotations for my hit, or to create the hits in the
> order of the domain annotation table and then reshuffle them to make
> sure they're in the order of the hit table.
>
> If I could just create "empty" hit objects when parsing the hit table,
> I could easily preserve the order of the hits but still add the hsps
> as I parse them.

Hmm..

This is a problem :/. I didn't expect any format to have this kind of ordering.

I'll see what I can do with the current API limitation. We may need to
change it back to not requiring any HSPs for Hit. In any case, I'll
see what needs to be done first and get back asap.

cheers,
Bow
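
One way to picture the ordering problem is the sketch below: the hit table fixes the order up front, and domain rows are attached as they are encountered. This is plain Python, not the SearchIO API; group_domains_by_hit is a hypothetical helper, and the names and coordinates in the usage note are illustrative only.

```python
from collections import OrderedDict


def group_domains_by_hit(hit_ids, domain_rows):
    """Hypothetical helper: map hit_id -> list of domains, in hit-table order.

    hit_ids     -- hit identifiers in the order of the hit table
    domain_rows -- (hit_id, domain) pairs in the order of the domain table
    """
    # Create the "empty" containers first so the hit-table order is kept.
    grouped = OrderedDict((hit_id, []) for hit_id in hit_ids)
    for hit_id, domain in domain_rows:
        # A domain whose hit is missing from the hit table raises KeyError.
        grouped[hit_id].append(domain)
    return grouped
```

For example, with hit_ids ["Glu_synthase", "Glu_syn_central"] and domain rows listed as ("Glu_syn_central", (478, 773)), ("Glu_synthase", (650, 676)), ("Glu_synthase", (830, 1216)), the result keeps Glu_synthase first and holds both of its domains. Real Hit objects could then be built from each entry once all domains are known, which would avoid both the repeated parsing and the reshuffling described above.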

From mjldehoon at yahoo.com  Tue Oct 30 21:12:18 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 30 Oct 2012 18:12:18 -0700 (PDT)
Subject: [Biopython-dev] Working with the new SearchIO API
Message-ID: <1351645938.62302.BPMail_high_noncarrier@web164001.mail.gq1.yahoo.com>


>Does anyone have preference between '.acc' or '.accession'? If not, I
>can change the current '.acc' into '.accession'.

I would prefer .accession for clarity.
Best,
-Michiel


From andrewscz at gmail.com  Wed Oct 31 14:10:48 2012
From: andrewscz at gmail.com (Andrew Sczesnak)
Date: Wed, 31 Oct 2012 11:10:48 -0700
Subject: [Biopython-dev] Pull Request: MafIO.py
In-Reply-To: <mailman.1.1351699203.6679.biopython-dev@lists.open-bio.org>
References: <mailman.1.1351699203.6679.biopython-dev@lists.open-bio.org>
Message-ID: <01027F16-EBA0-41A2-B1F5-D0E128B0B08E@gmail.com>

Nick,

Can you provide a snippet of a file from mugsy for the unit tests?

Thanks,
Andrew

On Oct 31, 2012, at 9:00 AM, biopython-dev-request at lists.open-bio.org wrote:

> From: Nick Loman <n.j.loman at bham.ac.uk>
> Date: Tue, Oct 30, 2012 at 6:34 AM
> Subject: Pull Request: MafIO.py
> 
> 
> Hi there
> 
> Thanks for the MafIO branch. In order to get it to read MAF files produced
> by Mugsy (mugsy.sourceforge.net) I had to make the following change:
> 
> diff --git a/Bio/AlignIO/MafIO.py b/Bio/AlignIO/MafIO.py
> index 6eda0ca..4bb1407 100644
> --- a/Bio/AlignIO/MafIO.py
> +++ b/Bio/AlignIO/MafIO.py
> @@ -178,7 +178,7 @@ def MafIterator(handle, seq_count = None, alphabet =
> single_letter_alphabet):
> 
>              annotations = dict([x.split("=") for x in
> line.strip().split()[1:]])
> 
> -            if len([x for x in annotations.keys() if x not in ("score",
> "pass")]) > 0:
> +            if len([x for x in annotations.keys() if x not in ("score",
> "pass", "label", "mult")]) > 0:
>                 raise ValueError("Error parsing alignment - invalid key in
> 'a' line")
>         elif line.startswith("#"):
>             # ignore comments
> 
> 
> My Python fork is a bit confusing right now so hope you don't mind me
> sending this pull request via email!
> 
> Cheers
> 
> Nick

From redmine at redmine.open-bio.org  Wed Oct 31 15:09:57 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 31 Oct 2012 19:09:57 +0000
Subject: [Biopython-dev] [Biopython - Bug #3297] newline added in quated
	features
References: <redmine.issue-3297.20110926204742@redmine.open-bio.org>
Message-ID: <redmine.journal-14991.20121031190957@redmine.open-bio.org>


Issue #3297 has been updated by Chris Fields.

Assignee changed from Bioperl Guts to Biopython Dev Mailing List

Changing default assignee.
----------------------------------------
Bug #3297: newline added in quated features
https://redmine.open-bio.org/issues/3297

Author: Jesse van Dam
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: 
Target version: 
URL: 


Note: sorry for the duplicate reporting, did not notice the makeup of the bug reporting system

When I have a feature line like the following (which spans multiple lines) in a GenBank file

<pre>
                     /product="Glutamate synthase [NADPH] small chain (EC 1.4.1
                     .13)"

</pre>

Then a space/newline is added between 1.4.1 and .13 in the result, so printing the feature with the following code
<pre>
  print(source[0].qualifiers["product"])
</pre>

It will print (with an unwanted space)
<pre>
Glutamate synthase [NADPH] small chain (EC 1.4.1 .13)
</pre>

I changed the following in scanner.py to fix this problem:
<pre>
                    elif value[0]=='"':
                        #Quoted...
                        if value[-1]!='"' or value!='"':
                            #No closing quote on the first line...
                            while value[-1] != '"':
-                               value += "\n" + iterator.next() 
+                               value += iterator.next() 
                        else:
                            #One single line (quoted)
                            assert value == '"'
                            if self.debug : print "Quoted line %s:%s" % (key, value)
                        #DO NOT remove the quotes...
                        qualifiers.append((key,value))

</pre>
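
The effect of the one-line change can be reproduced with a small standalone sketch. The function read_quoted_value below is hypothetical and only mimics the loop shown above; it is not the scanner code itself, and it ignores escaped quotes.

```python
def read_quoted_value(first_line, more_lines, joiner="\n"):
    """Hypothetical stand-in for the quoted-qualifier loop.

    Appends continuation lines until the value ends with a closing
    double quote, inserting `joiner` between the pieces.
    """
    value = first_line
    lines = iter(more_lines)
    # A lone '"' is an opening quote without a closing one.
    while len(value) < 2 or not value.endswith('"'):
        value += joiner + next(lines)
    return value


first = '"Glutamate synthase [NADPH] small chain (EC 1.4.1'
rest = ['.13)"']

with_newline = read_quoted_value(first, rest, joiner="\n")
without_newline = read_quoted_value(first, rest, joiner="")
```

Here with_newline contains "1.4.1\n.13" while without_newline contains "1.4.1.13". Note that joining with an empty string would also merge two words whenever the file was wrapped at a space, so the change as proposed trades one artifact for another.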


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From b.invergo at gmail.com  Mon Oct  1 09:52:04 2012
From: b.invergo at gmail.com (Brandon Invergo)
Date: Mon, 01 Oct 2012 11:52:04 +0200
Subject: [Biopython-dev] PAML test problems under Python 3.3.0
In-Reply-To: <CAKVJ-_4DCG=_d097D=M5Ld1AthCVmZ50qixL4HR7OLOK68ZkuQ@mail.gmail.com>
References: <CAKVJ-_4DCG=_d097D=M5Ld1AthCVmZ50qixL4HR7OLOK68ZkuQ@mail.gmail.com>
Message-ID: <87k3vazfi3.fsf@invergo.net>

Yes no problem, I can take a look at it. I'm completely swamped at the
moment, though, so I might have to put it off for a couple of days. If
it's an emergency, let me know.

-brandon


Peter Cock <p.j.a.cock at googlemail.com> writes:

> Hi Brandon (et al),
>
> Could you have a look at the PAML unit tests under Python 3.3 please?
> I see a mix of failures and 'blocking' under a self-compiled Python 3.3.0
> on Mac OS X 10.8 (Mountain Lion):
>
> $ python3 test_PAML_yn00.py
> testAlignmentExists (__main__.ModTest) ... ok
> testAlignmentFileIsValid (__main__.ModTest) ... FAIL
> testAlignmentSpecified (__main__.ModTest) ... ok
> testCtlFileExistsOnRead (__main__.ModTest) ... ok
> testCtlFileExistsOnRun (__main__.ModTest) ... ok
> testCtlFileValidOnRead (__main__.ModTest) ... ERROR
> testCtlFileValidOnRun (__main__.ModTest) ... ok
> testOptionExists (__main__.ModTest) ... ok
> testOutputFileSpecified (__main__.ModTest) ... ok
> testOutputFileValid (__main__.ModTest) ... ok
> testParseAllVersions (__main__.ModTest) ... ok
> testResultsExist (__main__.ModTest) ... ok
> testResultsParsable (__main__.ModTest) ... ok
> testResultsValid (__main__.ModTest) ... ^C
>
> $ python3 test_PAML_codeml.py
> testAlignmentExists (__main__.ModTest) ... ok
> testAlignmentFileIsValid (__main__.ModTest) ... FAIL
> testAlignmentSpecified (__main__.ModTest) ... ok
> testCtlFileExistsOnRead (__main__.ModTest) ... ok
> testCtlFileExistsOnRun (__main__.ModTest) ... ok
> testCtlFileValidOnRead (__main__.ModTest) ... ERROR
> testCtlFileValidOnRun (__main__.ModTest) ... ok
> testOptionExists (__main__.ModTest) ... ok
> testOutputFileSpecified (__main__.ModTest) ... ok
> testOutputFileValid (__main__.ModTest) ... ok
> testPamlErrorsCaught (__main__.ModTest) ... ok
> testParseAA (__main__.ModTest) ... ok
> testParseAAPairwise (__main__.ModTest) ... ok
> testParseAllNSsites (__main__.ModTest) ... ok
> testParseBranchSiteA (__main__.ModTest) ... ok
> testParseCladeModelC (__main__.ModTest) ... ok
> testParseFreeRatio (__main__.ModTest) ... ok
> testParseNSsite3 (__main__.ModTest) ... ok
> testParseNgene2Mgene02 (__main__.ModTest) ... ok
> testParseNgene2Mgene1 (__main__.ModTest) ... ok
> testParseNgene2Mgene34 (__main__.ModTest) ... ok
> testParsePairwise (__main__.ModTest) ... ok
> testParseSEs (__main__.ModTest) ... ok
> testResultsExist (__main__.ModTest) ... ok
> testResultsParsable (__main__.ModTest) ... ok
> testResultsValid (__main__.ModTest) ... ^C
>
> $ python3 test_PAML_baseml.py
> testAlignmentExists (__main__.ModTest) ... ok
> testAlignmentFileIsValid (__main__.ModTest) ... FAIL
> testAlignmentSpecified (__main__.ModTest) ... ok
> testCtlFileExistsOnRead (__main__.ModTest) ... ok
> testCtlFileExistsOnRun (__main__.ModTest) ... ok
> testCtlFileValidOnRead (__main__.ModTest) ... ERROR
> testCtlFileValidOnRun (__main__.ModTest) ... ok
> testOptionExists (__main__.ModTest) ... ok
> testOutputFileSpecified (__main__.ModTest) ... ok
> testOutputFileValid (__main__.ModTest) ... ok
> testPamlErrorsCaught (__main__.ModTest) ... ok
> testParseAllVersions (__main__.ModTest) ... ok
> testParseAlpha1Rho1 (__main__.ModTest) ... ok
> testParseModel (__main__.ModTest) ... ok
> testParseNhomo (__main__.ModTest) ... ok
> testParseSEs (__main__.ModTest) ... ok
> testResultsExist (__main__.ModTest) ... ok
> testResultsParsable (__main__.ModTest) ... ok
> testResultsValid (__main__.ModTest) ... ^C
>
> If you've not tried this before, the procedure I'm using is:
>
> $ python3 setup.py build
> $ cd build/py3.3/Tests
> $ python3 test_PAML_baseml.py
> etc
>
> The key point is that to run the tests directly (rather than
> just via 'python3 setup.py test') you must change
> directory to the 2to3 converted folder under the build
> folder.
>
> By commenting out the test methods which seem to be
> blocking, it seems some of the failures are to do with
> exception handling. I've not dug any further into this.
>
> Thanks,
>
> Peter


From bjoern at gruenings.eu  Mon Oct  1 21:44:10 2012
From: bjoern at gruenings.eu (Björn Grüning)
Date: Mon, 01 Oct 2012 23:44:10 +0200
Subject: [Biopython-dev] [Patch] Genbank Parser
In-Reply-To: <CAKVJ-_5nekcTBYejUTVV6VvjV+mB0WV0eoEWKytGZOTmgfmw1g@mail.gmail.com>
References: <1348837402.21455.1.camel@threonin>
	<CAKVJ-_5nekcTBYejUTVV6VvjV+mB0WV0eoEWKytGZOTmgfmw1g@mail.gmail.com>
Message-ID: <1349127850.19730.11.camel@threonin>

Hi Peter,

> >
> > the tbl2asn tool from the ncbi creates genbank files that did not have a
> > version number. Unfortunately that version number is used to fill
> > consumer.data.id.
> > I implemented the following fall-back:
> > If there is no version information available then it takes the
> > consumer.data.name for the consumer.data.id. Does that make sense?
> >
> > Thanks!
> > Bjoern
> 
> Can you share some example output from tbl2asn that shows
> this problem? Ideally something small we could include as a
> unit test.

Please find attached a small, stripped version of such a GenBank file.

Thanks,
Bjoern

> Thanks,
> 
> Peter

-------------- next part --------------
A non-text attachment was scrubbed...
Name: tbl1asn_output.gb
Type: application/x-gameboy-rom
Size: 5090 bytes
Desc: 
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20121001/1d6940cf/attachment-0002.bin>

From p.j.a.cock at googlemail.com  Thu Oct  4 09:11:01 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 4 Oct 2012 10:11:01 +0100
Subject: [Biopython-dev] [Patch] Genbank Parser
In-Reply-To: <1349127850.19730.11.camel@threonin>
References: <1348837402.21455.1.camel@threonin>
	<CAKVJ-_5nekcTBYejUTVV6VvjV+mB0WV0eoEWKytGZOTmgfmw1g@mail.gmail.com>
	<1349127850.19730.11.camel@threonin>
Message-ID: <CAKVJ-_5Bb_QEAVmTZz_oHkKXbSBe2g86=ekVZ+Xtt326bbJQLQ@mail.gmail.com>

On Mon, Oct 1, 2012 at 10:44 PM, Björn Grüning <bjoern at gruenings.eu> wrote:
> Hi Peter,
>
>> >
>> > the tbl2asn tool from the NCBI creates GenBank files that do not have a
>> > version number. Unfortunately that version number is used to fill
>> > consumer.data.id.
>> > I implemented the following fall-back:
>> > If there is no version information available then it takes the
>> > consumer.data.name for the consumer.data.id. Does that make sense?
>> >
>> > Thanks!
>> > Bjoern
>>
>> Can you share some example output from tbl2asn that shows
>> this problem? Ideally something small we could include as a
>> unit test.
>
> Please find attached a small, stripped version of such a GenBank file.
>
> Thanks,
> Bjoern

$ python
Python 2.7.2 (default, Jun 20 2012, 16:23:33)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import SeqIO
>>> r = SeqIO.read("tbl1asn_output.gb", "gb")
/Library/Python/2.7/site-packages/Bio/GenBank/__init__.py:1158:
BiopythonParserWarning: Expected sequence length 300246, found 2220
().
  BiopythonParserWarning)
>>> r.id
''
>>> r.name
'Seq1'
>>> r.description
'Glarea strain lozoyensis.'
>>> quit()

That warning is because this test file has only the start of the sequence
present, yet the LOCUS line still gives the original length.

$ head tbl1asn_output.gb
LOCUS       Seq1                  300246 bp    DNA     linear       10-MAY-2012
DEFINITION  Glarea strain lozoyensis.
ACCESSION
VERSION
KEYWORDS    .
SOURCE      Glarea
  ORGANISM  Glarea
            Unclassified.
REFERENCE   1
  AUTHORS   Test

I didn't use your patch - looking over the code, it was already intended
that if there was no record.id that record.name would be used. Sadly
this was a bit too strict about None versus an empty string, fixed:
https://github.com/biopython/biopython/commit/e67d22e4b4f344a5a3c15b6e939c82f58986d87f
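
The None-versus-empty-string issue can be illustrated roughly as follows. This is a hypothetical sketch, not the code from the commit above; the helper name choose_record_id is made up for the example.

```python
def choose_record_id(version_id, name):
    """Return the record id, falling back to the name when no id is given.

    Hypothetical helper for illustration only; it is not the Biopython code.
    """
    # A strict "version_id is None" test misses the empty string that an
    # empty VERSION line produces; a truthiness test treats both as missing.
    if not version_id:
        return name
    return version_id


print(choose_record_id("", "Seq1"))            # empty VERSION line
print(choose_record_id(None, "Seq1"))          # no VERSION line at all
print(choose_record_id("AB000001.1", "Seq1"))  # made-up versioned id
```

With a check like this, the tbl2asn file shown above would get 'Seq1' as its id instead of an empty string.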

Thanks for your help,

Peter


From chapmanb at 50mail.com  Fri Oct  5 01:02:06 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 04 Oct 2012 21:02:06 -0400
Subject: [Biopython-dev] TAIR/AGI support
In-Reply-To: <CAH80STVrvSnxp4JkgrZoywMQqiMg8t=nJtTcGnNggCe4k-Y4aQ@mail.gmail.com>
References: <CAH80STXOOUjqYcQ82C2C25-gACyzwx0D4-VD+CMTes90CdZbnw@mail.gmail.com>
	<87txvcx9ls.fsf@fastmail.fm>
	<CAH80STVrvSnxp4JkgrZoywMQqiMg8t=nJtTcGnNggCe4k-Y4aQ@mail.gmail.com>
Message-ID: <874nm9g29d.fsf@fastmail.fm>


Kevin;
Thanks for making this available. This looks like a great start and
seems like it would be a nice starting place for folks dealing with
Arabidopsis data. A couple of thoughts which you've essentially already
covered:

- Could you build up a small test suite that fits into the testing
  framework:

  http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc246

  You're probably the best person to pick some disparate IDs that exercise
  different components and try to catch any edge cases.

- Additional interfaces that help folks do more than get sequence are a
  great idea. The ideas you've proposed below sound perfect.

- Provide some documentation on the Cookbook for common use cases with
  Biopython + your module. This will help motivate the addition and also
  help folks test it out on their data.

Thanks again for making this available,
Brad


> Hi Brad,
>
> My TAIR/AGI script is on github here:
> https://github.com/kdmurray91/biopython/blob/master/Bio/TAIR/__init__.py
>
> I got it to work directly from TAIR's website, however it has not been
> rigorously tested. I plan on implementing the process as i described in my
> previous email, whereby it fetches the Genbank record from TOGOws or via
> NCBI's Efetch (using biopython's interfaces of course). I will keep you all
> posted.
>
> To the list in general, I'm open to suggestions on what to work on next?
>
>
> Regards
> Kevin Murray
>
>
> On 6 September 2012 10:45, Brad Chapman <chapmanb at 50mail.com> wrote:
>
>>
>> Kevin;
>> Thanks for the e-mail and offers of code. Always happy to have other
>> folks involved with the project.
>>
>> > What's the status of TAIR AGIs in BioPython (I can see no mention of
>> them,
>> > or support for them)? I've written a brief module which allows a user to
>> > query NCBI with a TAIR AGI, returning a Seq object (via Efetch). Is there
>> > any interest in including such functionality in BioPython?
>>
>> Is the code available on GitHub to get a better sense of all the
>> functionality it supports? Do you have an idea where it would fit best?
>> As a tair submodule inside of Bio.Entrez, or somewhere else?
>>
>> > More generally, are there any particular areas of BioPython development
>> > which could use an extra pair of hands?
>>
>> Following the mailing list for discussions on current projects is the
>> best way to get a sense of what different folks are working on. The
>> issue tracker also has open issues and features that could use attention
>> if anything there strikes your fancy:
>>
>> https://redmine.open-bio.org/projects/biopython
>>
>> Hope this helps,
>> Brad
>>
>>


From tiagoantao at gmail.com  Sat Oct  6 03:21:50 2012
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 5 Oct 2012 20:21:50 -0700
Subject: [Biopython-dev] Away Re: buildbot failure in Biopython on Windows
	XP - Python 2.5
Message-ID: <CAA9RGEPgJabH5mPrOB5M-AVx4-jrCM2SjwAgWUhg0Gb97vPAgw@mail.gmail.com>

I am currently away from the office. I will respond as soon as I return.

Regards,
Tiago

-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin


From chris.mit7 at gmail.com  Mon Oct  8 02:48:20 2012
From: chris.mit7 at gmail.com (Chris Mitchell)
Date: Sun, 7 Oct 2012 22:48:20 -0400
Subject: [Biopython-dev] Proteomics/Mass Spec in Biopython
Message-ID: <CAK_U6OBpNCYoSuq70wAokoqn78T8p3CAFgw+TTNt-ebdTGVj6Q@mail.gmail.com>

Hi everyone,

I recall some time ago there was an email about getting some mass spec
functionality within BioPython.  I started a BioPython branch to
incorporate some iterators for common file types.  Of note, there is an
iterator for .msf files created by Proteome Discoverer, which thankfully is
light-years faster than using PD (and much more forgiving on memory...).

It's located here:
https://github.com/chrismit/biopython/tree/Proteomics

It follows the progress of my spectra viewer, which is hosted on the
same repository (anyone using Linux might want to take a look; I couldn't
find a spectra viewer I liked for Linux).  As I generalize
more of the methods within that program I'll be adding them to the
BioPython branch.  Also, I'll be putting in some methods to take care of
other common tasks such as FDR calculation from the input files.

I'd love to hear if anyone else wants to join up on this branch or provide
suggestions.

Chris


From redmine at redmine.open-bio.org  Wed Oct 10 13:02:23 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 10 Oct 2012 13:02:23 +0000
Subject: [Biopython-dev] [Biopython - Bug #3386] (New) NewickIO parse_tree
	is slow
Message-ID: <redmine.issue-3386.20121010130223@redmine.open-bio.org>


Issue #3386 has been reported by Aleksey Kladov.

----------------------------------------
Bug #3386: NewickIO parse_tree is slow
https://redmine.open-bio.org/issues/3386

Author: Aleksey Kladov
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


In the file NewickIO.py, the Parser method _parse_subtree seems to be inefficient in time and space. In fact, its running time is quadratic with respect to the size of the input, while it could be linear. The problem is that each symbol is read many (up to O(len(text))) times, for example here

<pre>
        for posn in range(1, close_posn):
            if text[posn] == '(':
                plevel += 1
            elif text[posn] == ')':
                plevel -= 1
            elif text[posn] == ',' and plevel == 0:
                subtrees.append(text[prev:posn])
                prev = posn + 1
</pre>

or here

<pre>
comment_start = text.find(NODECOMMENT_START)
</pre>

Also, _parse_subtree relies heavily on slices and strips of strings, which gives quadratic memory consumption.

Here is my dirty patched implementation. It's incomplete in many senses; I wrote it only to prove that parsing can be done faster.

For an unrooted binary tree with 15000 leaves it runs in 1 second, compared to 13 seconds for the current implementation.

<pre>
    def _parse_tree(self, text, rooted):
        """Parses the text representation into a Tree object."""
        # XXX Pass **kwargs along from Parser.parse?
        return Newick.Tree(root=self._parse_subtree_fast(text)[0], rooted=rooted)

    def _parse_subtree_fast(self, text):
        id = re.compile(r'[A-Za-z0-9_]+')
        children = []
        if text.startswith('('):
            text = text[1:]
            while True:
                child, text = self._parse_subtree_fast(text)
                children.append(child)
                if text.startswith(','):
                    text = text[1:]
                else:
                    text = text[1:]
                    break
        m = re.match(id, text)
        if m:
            clade = self._parse_tag(m.group())
            text = text[m.end():]
        else:
            clade = Newick.Clade(comment=None)
        clade.clades = children
        return clade, text
</pre>

PS. I don't know if anyone really needs to parse huge trees with BioPython, but I need this feature for a couple of http://rosalind.info problems
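
For comparison, here is a minimal standalone sketch of the same recursive idea that tracks an index into the text instead of re-slicing the string, so each character is examined only once. It is an illustration under simplifying assumptions, not the NewickIO code: the Node class and the function names are made up, and only plain labels are handled (no branch lengths, comments or quoted names).

```python
import re

# Characters allowed in a label here match the pattern used in the patch above.
LABEL = re.compile(r"[A-Za-z0-9_]+")


class Node:
    """Minimal stand-in for a clade; not the Bio.Phylo Newick.Clade class."""

    def __init__(self, name=None, children=None):
        self.name = name
        self.children = children or []


def parse_newick(text):
    """Parse a simplified Newick string (labels only, no branch lengths)."""
    root, pos = _parse_subtree(text, 0)
    if text[pos:pos + 1] != ";":
        raise ValueError("expected ';' at position %d" % pos)
    return root


def _parse_subtree(text, pos):
    """Return (node, next position) for the subtree starting at pos."""
    children = []
    if text.startswith("(", pos):
        pos += 1  # skip '('
        while True:
            child, pos = _parse_subtree(text, pos)
            children.append(child)
            if text.startswith(",", pos):
                pos += 1  # another sibling follows
            elif text.startswith(")", pos):
                pos += 1  # end of this group of children
                break
            else:
                raise ValueError("expected ',' or ')' at position %d" % pos)
    match = LABEL.match(text, pos)  # match at pos without copying the text
    name = None
    if match:
        name = match.group()
        pos = match.end()
    return Node(name, children), pos


def leaf_names(node):
    """Collect leaf labels from left to right."""
    if not node.children:
        return [node.name]
    names = []
    for child in node.children:
        names.extend(leaf_names(child))
    return names


tree = parse_newick("((A,B)ab,(C,D)cd)root;")
print(leaf_names(tree))
```

Because it is recursive, a very deep (caterpillar-like) tree could still hit Python's recursion limit; an explicit stack would avoid that.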


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From kjwu at ucsd.edu  Wed Oct 10 21:27:19 2012
From: kjwu at ucsd.edu (Kevin Wu)
Date: Wed, 10 Oct 2012 14:27:19 -0700
Subject: [Biopython-dev] KEGG API Wrapper
Message-ID: <CAEe6yUE61E=ekS0zFGN-cUDw0-0+ExB-PGDwdXLMYgbQBPUnAA@mail.gmail.com>

Hi,

I've written a simple wrapper on top of KEGG's new REST API (
http://www.kegg.jp/kegg/docs/keggapi.html). The main functionality of this
module is that it can detect some invalid queries based on KEGG's defined
rules. I've implemented each of the examples given in the API docs as tests
as well. Here's a quick example of its usage.

The api call to http://rest.kegg.jp/get/hsa:10458+ece:Z5100/aaseq can be
done using the wrapper as:
KEGG.query("get", ["hsa:10458", "ece:Z5100"], "aaseq")
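
As a rough sketch of how such a call could map onto the URL (an assumption for illustration, not the wrapper's actual implementation; build_kegg_url is a made-up name and no validation of KEGG's query rules is attempted):

```python
KEGG_REST_BASE = "http://rest.kegg.jp"  # base URL taken from the example above


def build_kegg_url(operation, identifiers, option=None):
    """Assemble a KEGG REST URL; hypothetical helper, not the wrapper's code."""
    if isinstance(identifiers, str):
        identifiers = [identifiers]
    # Multiple entry identifiers are joined with '+', as in the URL above.
    parts = [KEGG_REST_BASE, operation, "+".join(identifiers)]
    if option:
        parts.append(option)
    return "/".join(parts)


print(build_kegg_url("get", ["hsa:10458", "ece:Z5100"], "aaseq"))
```

This prints the same URL as the one given above.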

Querying the api works well with the current parsers written for KEGG
formats. Let me know if there are issues or if it's useful enough to be
merged into Biopython!

https://github.com/kevinwuhoo/biopython

Thanks!
Kevin


From mjldehoon at yahoo.com  Sat Oct 13 11:38:04 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 13 Oct 2012 04:38:04 -0700 (PDT)
Subject: [Biopython-dev] KEGG API Wrapper
In-Reply-To: <CAEe6yUE61E=ekS0zFGN-cUDw0-0+ExB-PGDwdXLMYgbQBPUnAA@mail.gmail.com>
Message-ID: <1350128284.4997.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Hi Kevin,

It would be great to have better KEGG support in Biopython, so I think that this is useful and could in principle be merged into Biopython. But before we do so, is there some documentation for your code (or even better, for the Bio.KEGG module as a whole)? Then it's easier to see how this module can be used, and to discuss any modifications.

Thanks for your contribution!
-Michiel.

--- On Wed, 10/10/12, Kevin Wu <kjwu at ucsd.edu> wrote:

> From: Kevin Wu <kjwu at ucsd.edu>
> Subject: [Biopython-dev] KEGG API Wrapper
> To: Biopython-dev at lists.open-bio.org
> Date: Wednesday, October 10, 2012, 5:27 PM
> Hi,
> 
> I've written a simple wrapper on top of KEGG's new REST API
> (
> http://www.kegg.jp/kegg/docs/keggapi.html). The main
> functionality of this
> module is that it can detect some invalid queries based on
> kegg's defined
> rules. I've implemented each of the examples given on the
> api docs as tests
> as well. Here's a quick example of its usage.
> 
> The api call to http://rest.kegg.jp/get/hsa:10458+ece:Z5100/aaseq can
> be
> done using the wrapper as:
> KEGG.query("get", ["hsa:10458", "ece:Z5100"], "aaseq")
> 
> Querying the api works well with the current parsers written
> for KEGG
> formats. Let me know if there are issues or if it's useful
> enough to be
> merged into Biopython!
> 
> https://github.com/kevinwuhoo/biopython
> 
> Thanks!
> Kevin
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
> 


From chapmanb at 50mail.com  Mon Oct 15 15:02:12 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 15 Oct 2012 11:02:12 -0400
Subject: [Biopython-dev] BOSC/Broad Interoperability Hackathon: potential
	dates
Message-ID: <87ipabeq2z.fsf@fastmail.fm>


Hi all;
Open Bio regularly organizes hackathon coding sessions in conjunction
with the Bioinformatics Open Source Conference. The goal is to get
together biologists writing open source code, provide a room and
internet, and encourage fun collaborative coding. We've had successful
two day Codefests the past three years:

http://www.open-bio.org/wiki/Codefest_2012

This year, the Broad Institute kindly offered to host a two day
Hackathon in Boston during April. We've proposed three sets of dates:

April 4-5th, Thursday and Friday before Bio-IT
April 7-8th, Sunday and Monday before Bio-IT
April 22-23rd, Monday and Tuesday

If you have interest in attending, please fill out this Doodle poll to
let us know which dates work best:

http://doodle.com/aapy694g43e6ya4f

If you can find funds for travel and hotel (or are local to Boston), the
event is free and everyone is welcome. As we finalize dates, we'll send
around additional details. Thanks everyone,
Brad


From k.d.murray.91 at gmail.com  Tue Oct 16 03:49:22 2012
From: k.d.murray.91 at gmail.com (Kevin Murray)
Date: Tue, 16 Oct 2012 14:49:22 +1100
Subject: [Biopython-dev] TAIR/AGI support
In-Reply-To: <874nm9g29d.fsf@fastmail.fm>
References: <CAH80STXOOUjqYcQ82C2C25-gACyzwx0D4-VD+CMTes90CdZbnw@mail.gmail.com>
	<87txvcx9ls.fsf@fastmail.fm>
	<CAH80STVrvSnxp4JkgrZoywMQqiMg8t=nJtTcGnNggCe4k-Y4aQ@mail.gmail.com>
	<874nm9g29d.fsf@fastmail.fm>
Message-ID: <CAH80STXQNyPWYgk0mEWApd45Da1gmDHg05QXBmGjfkXeksc0EA@mail.gmail.com>

Brad,

I shall work on this as time permits, and get back to you all when complete.
Cheers,

Regards
Kevin Murray


On 5 October 2012 11:02, Brad Chapman <chapmanb at 50mail.com> wrote:

>
> Kevin;
> Thanks for making this available. This looks like a great start and
> seems like it would be a nice starting place for folks dealing with
> Arabidopsis data. A couple of thoughts which you've essentially already
> covered:
>
> - Could you build up a small test suite that fits into the testing
>   framework:
>
>   http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc246
>
>   You're probably the best person to pick some disparate IDs that exercise
>   different components and try to catch any edge cases.
>
> - Additional interfaces that help folks do more than get sequence are a
>   great idea. The ideas you've proposed below sound perfect.
>
> - Provide some documentation on the Cookbook for common use cases with
>   Biopython + your module. This will help motivate the addition and also
>   help folks test it out on their data.
>
> Thanks again for making this available,
> Brad
>
>
> > Hi Brad,
> >
> > My TAIR/AGI script is on github here:
> > https://github.com/kdmurray91/biopython/blob/master/Bio/TAIR/__init__.py
> >
> > I got it to work directly from TAIR's website, however it has not been
> > rigorously tested. I plan on implementing the process as i described in
> my
> > previous email, whereby it fetches the Genbank record from TOGOws or via
> > NCBI's Efetch (using biopython's interfaces of course). I will keep you
> all
> > posted.
> >
> > To the list in general, I'm open to suggestions on what to work on next?
> >
> >
> > Regards
> > Kevin Murray
> >
> >
> > On 6 September 2012 10:45, Brad Chapman <chapmanb at 50mail.com> wrote:
> >
> >>
> >> Kevin;
> >> Thanks for the e-mail and offers of code. Always happy to have other
> >> folks involved with the project.
> >>
> >> > What's the status of TAIR AGIs in BioPython (I can see no mention of
> >> them,
> >> > or support for them)? I've written a brief module which allows a user
> to
> >> > query NCBI with a TAIR AGI, returning a Seq object (via Efetch). Is
> there
> >> > any interest in including such functionality in BioPython?
> >>
> >> Is the code available on GitHub to get a better sense of all the
> >> functionality it supports? Do you have an idea where it would fit best?
> >> As a tair submodule inside of Bio.Entrez, or somewhere else?
> >>
> >> > More generally, are there any particular areas of BioPython
> development
> >> > which could use an extra pair of hands?
> >>
> >> Following the mailing list for discussions on current projects is the
> >> best way to get a sense of what different folks are working on. The
> >> issue tracker also has open issues and features that could use attention
> >> if anything there strikes your fancy:
> >>
> >> https://redmine.open-bio.org/projects/biopython
> >>
> >> Hope this helps,
> >> Brad
> >>
> >>
>


From zcharlop at mail.rockefeller.edu  Tue Oct 16 23:55:26 2012
From: zcharlop at mail.rockefeller.edu (Zachary Charlop-Powers)
Date: Tue, 16 Oct 2012 23:55:26 +0000
Subject: [Biopython-dev] KEGG API Wrapper
In-Reply-To: <1350128284.4997.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <1350128284.4997.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <C45C27DAD51E2647959A053EBBF42C86089480@RUMBX2.rockefeller.edu>

Kevin,
Michiel,


I just tested Kevin's code for a few simple queries and it worked great. I have always liked KEGG's organization of data and really appreciate this RESTful interface to their data; in some ways I think it is easier to use the web interfaces for KEGG than it is for NCBI. Plus the KEGG coverage of metabolic networks is awesome.  I found the examples in Kevin's test script to be fairly self-explanatory but a simple, spelled-out example in the Tutorial would be nice.

One thought, though, is that you can retrieve MANY different types of data from the KEGG REST API - which means that the user will probably have to parse the data themselves. Data retrieved with "list" can return lists of genes or compounds or organisms, and after a cursory look these are each formatted differently. This is also true of the 'find' command. So I think you were right to leave out parsers because I think they will be a moving target highly dependent on the query.

Thank You Kevin,
zach cp


On Oct 13, 2012, at 7:38 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

Hi Kevin,

It would be great to have better KEGG support in Biopython, so I think that this is useful and could in principle be merged into Biopython. But before we do so, is there some documentation for your code (or even better, for the Bio.KEGG module as a whole)? Then it's easier to see how this module can be used, and to discuss any modifications.

Thanks for your contribution!
-Michiel.

--- On Wed, 10/10/12, Kevin Wu <kjwu at ucsd.edu> wrote:

From: Kevin Wu <kjwu at ucsd.edu>
Subject: [Biopython-dev] KEGG API Wrapper
To: Biopython-dev at lists.open-bio.org
Date: Wednesday, October 10, 2012, 5:27 PM
Hi,

I've written a simple wrapper on top of KEGG's new REST API
(
http://www.kegg.jp/kegg/docs/keggapi.html). The main
functionality of this
module is that it can detect some invalid queries based on
kegg's defined
rules. I've implemented each of the examples given on the
api docs as tests
as well. Here's a quick example of its usage.

The api call to http://rest.kegg.jp/get/hsa:10458+ece:Z5100/aaseq can
be
done using the wrapper as:
KEGG.query("get", ["hsa:10458", "ece:Z5100"], "aaseq")

Querying the api works well with the current parsers written
for KEGG
formats. Let me know if there are issues or if it's useful
enough to be
merged into Biopython!

https://github.com/kevinwuhoo/biopython

Thanks!
Kevin

Zach Charlop-Powers
Post-Doctoral Fellow
Laboratory of Genetically Encoded Small Molecules
Rockefeller University

zcharlop at rockefeller.edu


From p.j.a.cock at googlemail.com  Wed Oct 17 11:09:07 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Oct 2012 12:09:07 +0100
Subject: [Biopython-dev] KEGG API Wrapper
In-Reply-To: <C45C27DAD51E2647959A053EBBF42C86089480@RUMBX2.rockefeller.edu>
References: <1350128284.4997.YahooMailClassic@web164003.mail.gq1.yahoo.com>
	<C45C27DAD51E2647959A053EBBF42C86089480@RUMBX2.rockefeller.edu>
Message-ID: <CAKVJ-_7Ao1gdtF2_-7GH89qWGtseLVuJ4beB9bUpun5DLwcQsA@mail.gmail.com>

On Wed, Oct 17, 2012 at 12:55 AM, Zachary Charlop-Powers
<zcharlop at mail.rockefeller.edu> wrote:
> Kevin,
> Michiel,
>
> I just tested Kevin's code for a few simple queries and it worked great. I
> have always liked KEGG's organization of data and really appreciate this
> RESTful interface to their data; in some ways I think it is easier to use the
> web interfaces for KEGG than it is for NCBI. Plus the KEGG coverage of
> metabolic networks is awesome.  I found the examples in Kevin's test script
> to be fairly self-explanatory but a simple, spelled-out example in the
> Tutorial would be nice.
>
> One thought, though, is that you can retrieve MANY different types of data
> from the KEGG REST API - which means that the user will probably have to
> parse the data themselves. Data retrieved with "list" can return lists of
> genes or compounds or organisms, and after a cursory look these are each
> formatted differently. This is also true of the 'find' command. So I think you
> were right to leave out parsers because I think they will be a moving target
> highly dependent on the query.
>
> Thank You Kevin,
> zach cp

Good point about decoupling the web API wrapper and the parsers -
how the Bio.Entrez module and Bio.TogoWS handle this is to return
handles for web results, which you can then parse with an appropriate
parser (e.g. SeqIO for GenBank files, Medline parser, etc).

Note that this is a little more fiddly under Python 3 due to the text
mode distinction between unicode and binary... just something to
keep in the back of your mind.
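
To make the handle idea concrete, here is a small standard-library-only sketch. Everything in it is an assumption for illustration: the helper names are made up, the UTF-8 encoding is a guess about the server, the FASTA parser is a toy stand-in for a real parser such as SeqIO, and the "response" is a fake in-memory object with placeholder sequences rather than real KEGG output.

```python
import io


def as_text_handle(binary_handle, encoding="utf-8"):
    """Wrap a binary handle (such as a urlopen() response) as a text handle."""
    return io.TextIOWrapper(binary_handle, encoding=encoding)


def parse_fasta(handle):
    """Yield (title, sequence) pairs from a text handle; deliberately minimal."""
    title, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


# Stand-in for a web response: under Python 3 the network layer yields bytes.
fake_response = io.BytesIO(
    b">hsa:10458 example\nMSLS\nRSEE\n>ece:Z5100 example\nMLNG\n"
)
records = list(parse_fasta(as_text_handle(fake_response)))
print(records)
```

The web wrapper only has to hand back a handle; which parser to apply is left to the caller, who knows what kind of data was requested.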

Peter


From redmine at redmine.open-bio.org  Wed Oct 17 13:27:18 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 17 Oct 2012 13:27:18 +0000
Subject: [Biopython-dev] [Biopython - Bug #3387] (New) Generic per column
	annotation from stockholm alignment are not stored in
	alignment object
Message-ID: <redmine.issue-3387.20121017132718@redmine.open-bio.org>


Issue #3387 has been reported by saverio vicario.

----------------------------------------
Bug #3387: Generic per column annotation from stockholm alignment are not stored in alignment object
https://redmine.open-bio.org/issues/3387

Author: saverio vicario
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


Stockholm format includes 4 types of annotations
#=GF <feature> <Generic per-File annotation, free text>
#=GC <feature> <Generic per-Column annotation, exactly 1 char per column>
#=GS <seqname> <feature> <Generic per-Sequence annotation, free text>
#=GR <seqname> <feature> <Generic per-Sequence AND per-Column markup, exactly 1 char per column>
GC and GF annotations are not picked up by AlignIO and are not supported in Bio.Align.MultipleSeqAlignment because no annotation is available at the alignment level. In fact, Bio.Align.MultipleSeqAlignment.annotations and Bio.Align.MultipleSeqAlignment.letter_annotations do not exist; there is only Bio.Align.MultipleSeqAlignment._annotations, which is generated from the annotations and letter_annotations of the individual records.

GC annotation in Stockholm contains the quality score of the sites (columns of the alignment), which is an important parameter when deciding whether to trim sites.
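
A minimal sketch of collecting the #=GC lines, independent of AlignIO. The function name read_gc_annotations and the tiny example alignment are made up for illustration; this is not how Biopython parses Stockholm files.

```python
def read_gc_annotations(lines):
    """Collect '#=GC <feature> <markup>' lines into a dict keyed by feature."""
    per_column = {}
    for line in lines:
        if line.startswith("#=GC "):
            # Split into the tag, the feature name and the markup string.
            _, feature, markup = line.rstrip("\n").split(None, 2)
            # Interleaved Stockholm files repeat a feature once per block,
            # so markup for the same feature is concatenated.
            per_column[feature] = per_column.get(feature, "") + markup.strip()
    return per_column


example = """# STOCKHOLM 1.0
seq1          ACDEFGHIKL
seq2          ACDE-GHIKL
#=GC SS_cons  HHHHHH..EE
//
""".splitlines()

print(read_gc_annotations(example))
```

Each value has exactly one character per alignment column, so it could be stored alongside the alignment and sliced together with it.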




From redmine at redmine.open-bio.org  Wed Oct 17 13:36:24 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 17 Oct 2012 13:36:24 +0000
Subject: [Biopython-dev] [Biopython - Bug #3387] Generic per column
	annotation from stockholm alignment are not stored in
	alignment object
References: <redmine.issue-3387.20121017132718@redmine.open-bio.org>
Message-ID: <redmine.journal-14973.20121017133624@redmine.open-bio.org>


Issue #3387 has been updated by Peter Cock.


The underlying alignment class would need a per-column-annotation dictionary (as well as an annotations dictionary, also on the TODO list), to match the per-letter-annotation and annotations dictionaries of the SeqRecord.

Parsing this and putting it in alignment._letter_annotation (dictionary as a private variable) would be a reasonable short term hack if you'd like to work on that.
----------------------------------------
Bug #3387: Generic per column annotation from stockholm alignment are not stored in alignment object
https://redmine.open-bio.org/issues/3387

Author: saverio vicario
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


Stockholm format includes 4 types of annotations
#=GF <feature> <Generic per-File annotation, free text>
#=GC <feature> <Generic per-Column annotation, exactly 1 char per column>
#=GS <seqname> <feature> <Generic per-Sequence annotation, free text>
#=GR <seqname> <feature> <Generic per-Sequence AND per-Column markup, exactly 1 char per column>
GC and GF annotations are not picked up by AlignIO and are not supported in Bio.Align.MultipleSeqAlignment because no annotation is available at the alignment level. In fact, Bio.Align.MultipleSeqAlignment.annotations and Bio.Align.MultipleSeqAlignment.letter_annotations do not exist; there is only Bio.Align.MultipleSeqAlignment._annotations, which is generated from the annotations and letter_annotations of the individual records.

GC annotation in Stockholm contains the quality score of the sites (columns of the alignment), which is an important parameter when deciding whether to trim sites.




From redmine at redmine.open-bio.org  Wed Oct 17 13:39:25 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 17 Oct 2012 13:39:25 +0000
Subject: [Biopython-dev] [Biopython - Feature #3388] (New) add annotation
	and letter_annotations attributed for
	Bio.Align.MultipleSeqAlignment. object
Message-ID: <redmine.issue-3388.20121017133925@redmine.open-bio.org>


Issue #3388 has been reported by saverio vicario.

----------------------------------------
Feature #3388: add annotation and letter_annotations attributed for Bio.Align.MultipleSeqAlignment. object
https://redmine.open-bio.org/issues/3388

Author: saverio vicario
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


At the moment I cannot add annotation at the alignment level. Annotations could be useful for tracking info linked to the loci (e.g. name of domain), while letter annotations could be useful to track quality scores of the alignment or whether the sites belong to a given character set.
In particular, when two alignments are merged it would be useful if the boundary of the merge were tracked.
For example, for the merge of an alignment a with 10 sites and an alignment b with 5 sites, the letter_annotations would be as follows:

{locus1:'111111111100000',locus2:'000000000011111'}
This could also be useful to annotate the 3 positions of codons:
{pos1:'1001001001',pos2:'0100100100', pos3:'0010010010'}

If letter_annotations were supported, the annotation could be kept across merging and splitting of the alignment.
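
The two masks above could be generated along these lines. This is only a sketch of the proposed annotation content; the function names are made up, nothing here is an existing MultipleSeqAlignment API, and it assumes a Python version where dictionaries keep insertion order.

```python
def locus_masks(lengths):
    """Build one '0'/'1' mask per locus for an alignment merged end to end.

    lengths maps locus name -> number of columns, in merge order.
    """
    total = sum(lengths.values())
    masks, start = {}, 0
    for name, size in lengths.items():
        masks[name] = "0" * start + "1" * size + "0" * (total - start - size)
        start += size
    return masks


def codon_position_masks(n_columns):
    """Build masks marking the 1st, 2nd and 3rd codon positions."""
    return {
        "pos%d" % (offset + 1): "".join(
            "1" if column % 3 == offset else "0" for column in range(n_columns)
        )
        for offset in range(3)
    }


print(locus_masks({"locus1": 10, "locus2": 5}))
print(codon_position_masks(10))
```

Slicing every mask with the same column range as the alignment would keep the annotation consistent when the alignment is split again.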




From redmine at redmine.open-bio.org  Wed Oct 17 15:00:15 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 17 Oct 2012 15:00:15 +0000
Subject: [Biopython-dev] [Biopython - Bug #3387] Generic per column
	annotation from stockholm alignment are not stored in
	alignment object
References: <redmine.issue-3387.20121017132718@redmine.open-bio.org>
Message-ID: <redmine.journal-14974.20121017150015@redmine.open-bio.org>


Issue #3387 has been updated by Peter Cock.


Depends on issue #3388, add annotation and letter_annotations attributed to Bio.Align.MultipleSeqAlignment object
https://redmine.open-bio.org/issues/3388
----------------------------------------
Bug #3387: Generic per column annotation from stockholm alignment are not stored in alignment object
https://redmine.open-bio.org/issues/3387

Author: saverio vicario
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


The Stockholm format includes 4 types of annotation:
#=GF <feature> <Generic per-File annotation, free text>
#=GC <feature> <Generic per-Column annotation, exactly 1 char per column>
#=GS <seqname> <feature> <Generic per-Sequence annotation, free text>
#=GR <seqname> <feature> <Generic per-Sequence AND per-Column markup, exactly 1 char per column>
GC and GF annotations are not picked up by AlignIO and are not supported in Bio.Align.MultipleSeqAlignment, because no annotation is available at the alignment level. In fact, Bio.Align.MultipleSeqAlignment.annotations and Bio.Align.MultipleSeqAlignment.letter_annotations do not exist; there is only Bio.Align.MultipleSeqAlignment._annotations, which is generated from the annotations and letter_annotations of the individual records.

GC annotation in Stockholm files contains the quality score of the sites (columns of the alignment), which is an important parameter when deciding whether or not to trim sites.
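
To make the four markup types concrete, here is a minimal sketch of how the lines could be sorted into per-file, per-column, per-sequence and per-residue containers. It is plain Python and does not use AlignIO; the function name collect_stockholm_markup and the layout of the returned dictionary are assumptions for illustration, not the existing StockholmIO behaviour.

```python
def collect_stockholm_markup(lines):
    """Sort Stockholm '#=' markup lines by type (illustrative sketch only)."""
    markup = {"GF": {}, "GC": {}, "GS": {}, "GR": {}}
    for line in lines:
        line = line.rstrip()
        if not line.startswith("#="):
            continue  # sequence lines, the header and '//' are ignored here
        kind = line[2:4]
        if kind in ("GF", "GC"):
            fields = line.split(None, 2)
            if len(fields) < 3:
                continue
            feature, text = fields[1], fields[2]
            if kind == "GF":
                # Free text; the same feature may occur on several lines.
                markup["GF"].setdefault(feature, []).append(text)
            else:
                # One character per column; blocks are concatenated.
                markup["GC"][feature] = markup["GC"].get(feature, "") + text
        elif kind in ("GS", "GR"):
            fields = line.split(None, 3)
            if len(fields) < 4:
                continue
            seqname, feature, text = fields[1], fields[2], fields[3]
            per_seq = markup[kind].setdefault(seqname, {})
            if kind == "GS":
                per_seq.setdefault(feature, []).append(text)
            else:
                per_seq[feature] = per_seq.get(feature, "") + text
    return markup


example_lines = [
    "# STOCKHOLM 1.0",
    "#=GF ID Example",
    "#=GS seq1 AC P12345",
    "seq1 ACGT",
    "#=GR seq1 SS HHHH",
    "seq2 AC-T",
    "#=GC SS_cons HHHH",
    "//",
]
markup = collect_stockholm_markup(example_lines)
```

In this sketch the GC and GF entries are the ones that have no home on the individual records, which is why an alignment-level container would be needed.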




From redmine at redmine.open-bio.org  Thu Oct 18 11:02:49 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 18 Oct 2012 11:02:49 +0000
Subject: [Biopython-dev] [Biopython - Bug #3387] Generic per column
	annotation from stockholm alignment are not stored in
	alignment object
References: <redmine.issue-3387.20121017132718@redmine.open-bio.org>
Message-ID: <redmine.journal-14975.20121018110249@redmine.open-bio.org>


Issue #3387 has been updated by saverio vicario.

File diff_StockholmIO.py added
File StockholmIO.py added

This is my proposed patch for StockholmIO.
Attached you will find the new StockholmIO.py and a diff file against the old one.
To highlight the new comments, I start each of them with #SV.

In summary, the patch implements the new attribute _letter_annotations for Bio.Align.MultipleSeqAlignment. The iterator stores the GC features in it, and the writer writes the GC features after all the sequence records, as stated in http://sonnhammer.sbc.su.se/Stockholm.html.

I added a new dictionary for GC and GF features using the PFAM standard; it is used in the writing phase so that only legitimate PFAM attributes are written. The only addition to the PFAM standard is the GC feature "RF", which is added by the HMMER 3.0 programs to indicate which sites were originally present in the profile used to generate the alignment.

I do not use the dictionary of the PFAM standard to translate the GF and GR attributes in alignment._annotations or the GC attributes in alignment._letter_annotations, as is done for the SeqRecord, for consistency with the decision originally taken for GR attributes in alignment._annotations.
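
The writing step described above could look roughly like the following sketch. It is not the attached patch: the function name format_gc_lines is invented here, and ALLOWED_GC_FEATURES is only a small illustrative subset, not the full PFAM list.

```python
# Illustrative subset only (assumption); the real patch uses a fuller
# dictionary of PFAM features plus "RF".
ALLOWED_GC_FEATURES = {"SS_cons", "SA_cons", "RF"}


def format_gc_lines(letter_annotations, allowed=ALLOWED_GC_FEATURES):
    """Return '#=GC' lines for the allowed per-column features only."""
    lines = []
    for feature in sorted(letter_annotations):
        if feature in allowed:
            lines.append("#=GC %s %s" % (feature, letter_annotations[feature]))
    return lines


gc_lines = format_gc_lines(
    {"RF": "xx.x", "my_custom": "0101", "SS_cons": "HH.H"}
)
```

Here the made-up feature "my_custom" is silently dropped, which mirrors the idea of writing only recognised attributes; a real writer would emit these lines after the sequence records.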


From p.j.a.cock at googlemail.com  Thu Oct 18 18:33:04 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 18 Oct 2012 19:33:04 +0100
Subject: [Biopython-dev] PyPy 1.8 support?
Message-ID: <CAKVJ-_61pBGwbFRaYB9UqWmtozZpZ_JStYdaKfzArGZn29RQ6w@mail.gmail.com>

Hello all,

We currently run the test suite against both PyPy 1.8 and
1.9 on Linux via the TravisCI.org continuous integration
testing service.

Is anyone actually using Biopython under PyPy 1.8?

If not, I intend to drop automated testing under PyPy 1.8
and focus just on PyPy 1.9 instead.

(Automated testing under C Python 2.5, 2.6, 2.7, 3.1 and
3.2 etc will continue - I'm hoping to add Python 3.3 as well)

Thanks,

Peter


From ben at benfulton.net  Fri Oct 19 03:16:45 2012
From: ben at benfulton.net (Ben Fulton)
Date: Thu, 18 Oct 2012 23:16:45 -0400
Subject: [Biopython-dev] Contributing startup
Message-ID: <CA+ijMsk_dCk0w+MGiAtzzE8rAqWAZ4BzDxwCA7yniF1CS-o4TQ@mail.gmail.com>

Hi,

I was looking for some introductory tickets or other methods to familiarize
myself with the Biopython codebase. I saw some suggestions on the wiki to
improve unit test coverage or to add additional file formats, which sounds
fine - are there particular areas of code that lack coverage, or file
formats that are particularly wanted? Or would it be better to look over
the issue tracker and try to identify some smallish issues?

Thanks for any suggestions.

Ben Fulton


From p.j.a.cock at googlemail.com  Fri Oct 19 07:52:19 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 19 Oct 2012 08:52:19 +0100
Subject: [Biopython-dev] PyPy 1.8 support?
In-Reply-To: <CAKVJ-_61pBGwbFRaYB9UqWmtozZpZ_JStYdaKfzArGZn29RQ6w@mail.gmail.com>
References: <CAKVJ-_61pBGwbFRaYB9UqWmtozZpZ_JStYdaKfzArGZn29RQ6w@mail.gmail.com>
Message-ID: <CAKVJ-_57=R6aMQSxndyVGtJtZ1O8_Q2kF2BPmTK-GyKVKhR_PA@mail.gmail.com>

On Thu, Oct 18, 2012 at 7:33 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hello all,
>
> We currently run the test suite against both PyPy 1.8 and
> 1.9 on Linux via the TravisCI.org continuous integration
> testing service.
>
> Is anyone actually using Biopython under PyPy 1.8?
>
> If not, I intend to drop automated testing under PyPy 1.8
> and focus just on PyPy 1.9 instead.

Done on TravisCI, but easy to revert:
https://github.com/biopython/biopython/commit/126c944812730df4677c8fa2f63abc29ddd084bb

One reason was the previous build failed due to a timeout
fetching PyPy for a custom install. Now we use the TravisCI
provided PyPy which should avoid that issue.

(It still happens for Jython sometimes).

Peter


From p.j.a.cock at googlemail.com  Fri Oct 19 08:26:35 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 19 Oct 2012 09:26:35 +0100
Subject: [Biopython-dev] Contributing startup
In-Reply-To: <CA+ijMsk_dCk0w+MGiAtzzE8rAqWAZ4BzDxwCA7yniF1CS-o4TQ@mail.gmail.com>
References: <CA+ijMsk_dCk0w+MGiAtzzE8rAqWAZ4BzDxwCA7yniF1CS-o4TQ@mail.gmail.com>
Message-ID: <CAKVJ-_6w14q3-6nq1QSs_yHXONh+CZDWk4YCbELrGfs6g8D3ug@mail.gmail.com>

On Fri, Oct 19, 2012 at 4:16 AM, Ben Fulton <ben at benfulton.net> wrote:
> Hi,
>
> I was looking for some introductory tickets or other methods to familiarize
> myself with the Biopython codebase. I saw some suggestions on the wiki to
> improve unit test coverage or to add additional file formats, which sounds
> fine - are there particular areas of code that lack coverage, or file
> formats that are particularly wanted? Or would it be better to look over
> the issue tracker and try to identify some smallish issues?
>
> Thanks for any suggestions.
>
> Ben Fulton

Hi Ben,

Welcome - more volunteer developers willing to help is always nice.

You asked about test coverage, and while I could guess, what might
be most interesting would be to try to measure this using something
like coverage or figleaf:
http://nedbatchelder.com/code/coverage/
http://darcs.idyll.org/~t/projects/figleaf/doc/

Another general area would be improving our support under
Python 3.

In terms of specific modules, is there anything in particular which
seems like a good match with your work/research interests?

Regards,

Peter


From p.j.a.cock at googlemail.com  Mon Oct 22 16:43:07 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 22 Oct 2012 17:43:07 +0100
Subject: [Biopython-dev] Low level string based FASTA parser
Message-ID: <CAKVJ-_7XXJqby4HBPAv7P-=fVBcKC98+ev+upB4Cd-6xmjw31A@mail.gmail.com>

Hello all,

Something I've wanted/needed recently was a low-level FASTA
iterating parser which just returns tuples of strings (without the
overhead of Bio.SeqIO building SeqRecords).

We don't currently have such a thing, so I have added one to the
SeqIO Fasta module (mirroring the low level string-tuple parser
for FASTQ files) with some associated unit tests and refactoring
(separate commits):

https://github.com/biopython/biopython/commit/751fe39765ca6ba60e517b3b4657718fd48f7817

Does anyone have any views on the name of this new
function, currently SimpleFastaParser, used as follows:

    >>> from Bio.SeqIO.FastaIO import SimpleFastaParser
    >>> with open("Fasta/dups.fasta") as handle:
    ...     for values in SimpleFastaParser(handle):
    ...         print(values)
    ('alpha', 'ACGTA')
    ('beta', 'CGTC')
    ('gamma', 'CCGCC')
    ('alpha (again - this is a duplicate entry to test the indexing code)', 'ACGTA')
    ('delta', 'CGCGC')

The capitalisation style is consistent with other functions in
SeqIO, but not with PEP8.
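
For anyone who wants the general idea without reading the commit, a low-level string-tuple FASTA iterator can be sketched in a few lines of plain Python. This is only an illustration of the technique, not the code that was committed; the name simple_fasta_tuples is invented here.

```python
import io


def simple_fasta_tuples(handle):
    """Yield (title, sequence) string tuples from a FASTA handle."""
    title = None
    chunks = []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if title is not None:
                # A new header closes the previous record.
                yield title, "".join(chunks)
            title = line[1:]
            chunks = []
        elif title is not None:
            # Sequence lines are joined; text before the first '>' is skipped.
            chunks.append(line.replace(" ", ""))
    if title is not None:
        yield title, "".join(chunks)


example = io.StringIO(">alpha\nACG\nTA\n>beta\nCGTC\n")
records = list(simple_fasta_tuples(example))
```

Because it is a generator, only one record is held in memory at a time.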

Peter

P.S. I've also updated the legacy function quick_FASTA_reader
in Bio.SeqUtils to use this. Since it loads the whole dataset into
memory, if no one objects I would like to deprecate this old function.


From p.j.a.cock at googlemail.com  Mon Oct 22 17:08:47 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 22 Oct 2012 18:08:47 +0100
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_4PV3VMx5pju65578gq8TSN936T5ePH_cjhtUQcrECHYg@mail.gmail.com>
References: <CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<1346926418.97489.YahooMailClassic@web164004.mail.gq1.yahoo.com>
	<CAMC681n6=UuotEUdxGVEWDK4vPGd3=4O0yW82UQ3upTNMfy1iw@mail.gmail.com>
	<CAKVJ-_6rTsfqphX6i+YGA8ijLN+04kP+Gxk=BjwWCcXJtF97Vg@mail.gmail.com>
	<CAKVJ-_7-KXVZ96bHLG6XD88zcN9rPvnTf7yQ0E6J1jhb_5yx+g@mail.gmail.com>
	<CAKVJ-_6U0PrsTWM8sMPgsSX8cnfTandTGKz5j829K8so7whPgA@mail.gmail.com>
	<CAKVJ-_4PV3VMx5pju65578gq8TSN936T5ePH_cjhtUQcrECHYg@mail.gmail.com>
Message-ID: <CAKVJ-_5ZGQrj_xt4b5a7s7TrZ=7B-PXCJfBykHYNidE2L991jg@mail.gmail.com>

On Fri, Sep 28, 2012 at 11:50 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Sep 20, 2012 at 10:08 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> On Sun, Sep 16, 2012 at 1:34 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>>>
>>>> I guess we need to have a little hack with the 2to3 library and
>>>> try defining our own custom fixer for the imports...
>>>
>>> I've made a start at this - the easy part seems to work :)
>>>
>>> https://github.com/peterjc/biopython/commits/py3lower
>>>
>>> ...
>
> The code to do this lower case name mangling remains
> a quite spaghetti like mess in do2to3.py but it now works
> enough to pass the test suite (with some but not all 3rd
> party dependencies installed) under Linux and my Mac
> OS X machine (where like Windows I have a case
> insensitive file system).
>
> ...
>
> So this idea to adopt PEP8 lower case module names
> as part of supporting Python 3 appears to be technically
> viable.

Has anyone else tried this branch yet? Has the lower case
module names under Python 3 idea grown on anyone?
I think it makes sense in terms of a long term vision - I do
expect to be primarily working under Python 3 within a
couple of years.

It occurs to me we can make a partial step in this direction
with moving to a directory for Bio.Seq, since this could be
Bio.seq instead. For example, we talked about something
like this:

Bio.Seq -> Bio.seq
Bio.SeqRecord -> Bio.seq.record
Bio.SeqFeature -> Bio.seq.feature
Bio.SeqUtils -> Bio.seq.utils
Bio.SearchIO -> Bio.seq.search

I'm not 100% sure where the Bio.SeqIO top level functions
would belong, either directly under Bio.seq or Bio.seq.record
might work too.

We can have imports set up so that all the classes etc
are only defined once, e.g. Bio/seq/__init__.py could
initially just contain 'from Bio.Seq import *' and so on.

(We'd commit to maintaining the old namespace for
at least as long as our standard deprecation cycle,
longer ideally).
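
To show that a star import really does leave a single class definition, here is a small self-contained sketch. It builds two throwaway modules in memory instead of touching Biopython; the module names demo_Seq_old and demo_seq_new are made up for the demonstration.

```python
import sys
import types

# Stand-in for the existing mixed case module (hypothetical name).
old_module = types.ModuleType("demo_Seq_old")


class Seq(object):
    """Stand-in for a class defined in the old namespace."""

    def __init__(self, data):
        self.data = data


old_module.Seq = Seq
old_module.__all__ = ["Seq"]
sys.modules["demo_Seq_old"] = old_module

# Stand-in for the new lower case module, whose entire body is the
# star import, as an __init__.py like the one suggested above would be.
new_module = types.ModuleType("demo_seq_new")
exec("from demo_Seq_old import *", new_module.__dict__)
sys.modules["demo_seq_new"] = new_module

from demo_Seq_old import Seq as OldSeq
from demo_seq_new import Seq as NewSeq

same_class = OldSeq is NewSeq
```

Both names refer to the same class object, so isinstance checks keep working whichever namespace a script imports from.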

Peter


From p.j.a.cock at googlemail.com  Mon Oct 22 17:17:34 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 22 Oct 2012 18:17:34 +0100
Subject: [Biopython-dev] Dropping Python 2.5 and Jython 2.5 support?
Message-ID: <CAKVJ-_60vgH_T8eqrfYKBScyRVNUApmFM32_4D+uTLABLy19Ng@mail.gmail.com>

Dear Biopythoneers,

Would anyone object to us preparing to drop support for Python 2.5 and
Jython 2.5, perhaps after the next Biopython release?

To reassure those of you using Jython, we'd wait until Jython 2.7 is out
first. Jython 2.7 is already in alpha, and brings support for C Python 2.7
language features.

Thanks,

Peter


From eric.talevich at gmail.com  Mon Oct 22 21:53:55 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Mon, 22 Oct 2012 17:53:55 -0400
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_5ZGQrj_xt4b5a7s7TrZ=7B-PXCJfBykHYNidE2L991jg@mail.gmail.com>
References: <CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<1346926418.97489.YahooMailClassic@web164004.mail.gq1.yahoo.com>
	<CAMC681n6=UuotEUdxGVEWDK4vPGd3=4O0yW82UQ3upTNMfy1iw@mail.gmail.com>
	<CAKVJ-_6rTsfqphX6i+YGA8ijLN+04kP+Gxk=BjwWCcXJtF97Vg@mail.gmail.com>
	<CAKVJ-_7-KXVZ96bHLG6XD88zcN9rPvnTf7yQ0E6J1jhb_5yx+g@mail.gmail.com>
	<CAKVJ-_6U0PrsTWM8sMPgsSX8cnfTandTGKz5j829K8so7whPgA@mail.gmail.com>
	<CAKVJ-_4PV3VMx5pju65578gq8TSN936T5ePH_cjhtUQcrECHYg@mail.gmail.com>
	<CAKVJ-_5ZGQrj_xt4b5a7s7TrZ=7B-PXCJfBykHYNidE2L991jg@mail.gmail.com>
Message-ID: <CAMC681=XpBsqO3ohLbDAPyqODCtXy614k=C_f9XfJaqn6xBUhg@mail.gmail.com>

On Mon, Oct 22, 2012 at 1:08 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Fri, Sep 28, 2012 at 11:50 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > On Thu, Sep 20, 2012 at 10:08 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> >> On Sun, Sep 16, 2012 at 1:34 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> >>>>
> >>>> I guess we need to have a little hack with the 2to3 library and
> >>>> try defining our own custom fixer for the imports...
> >>>
> >>> I've made a start at this - the easy part seems to work :)
> >>>
> >>> https://github.com/peterjc/biopython/commits/py3lower
> >>>
> >>> ...
> >
> > The code to do this lower case name mangling remains
> > a quite spaghetti like mess in do2to3.py but it now works
> > enough to pass the test suite (with some but not all 3rd
> > party dependencies installed) under Linux and my Mac
> > OS X machine (where like Windows I have a case
> > insensitive file system).
> >
> > ...
> >
> > So this idea to adopt PEP8 lower case module names
> > as part of supporting Python 3 appears to be technically
> > viable.
>
> Has anyone else tried this branch yet? Has the lower case
> module names under Python 3 idea grown on anyone?
> I think it makes sense in terms of a long term vision - I do
> expect to be primarily working under Python 3 within a
> couple of years.
>
> It occurs to me we can make a partial step in this direction
> with moving to a directory for Bio.Seq, since this could be
> Bio.seq instead. For example, we talked about something
> like this:
>
> Bio.Seq -> Bio.seq
> Bio.SeqRecord -> Bio.seq.record
> Bio.SeqFeature -> Bio.seq.feature
> Bio.SeqUtils -> Bio.seq.utils
> Bio.SearchIO -> Bio.seq.search
>
> I'm not 100% sure where the Bio.SeqIO top level functions
> would belong, either directly under Bio.seq or Bio.seq.record
> might work too.
>


Personally, I've used the variable name "seq" an awful lot, so I'm wary of
using "seq" as a module name. However, reasonable coding style could make
this easy to avoid if we have a "seq" module containing all of Seq,
SeqRecord and SeqFeature (maybe even Alphabet), and "sequtil" containing
standalone functions.

Result:

# Everything you need to build a new sequence record, but not much else
from Bio.seq import Seq, SeqRecord, SeqFeature

# Working with sequence strings
from Bio import sequtil

It also seems reasonable to treat molecular sequences as the implied core
object type at the top-level namespace. From that viewpoint, Bio.Search
would mean sequence search, as everything else is typically tucked away in
a sub-module like PDB (pdb?), Motif (motif), or Phylo (phylo); then it's
also fine to keep seqio and alignio directly under the Bio namespace.

(Given a clean slate, I'd prefer "from Bio import Seq, SeqRecord, SeqFeature",
but since those are already module names it would be brutal to make that
transition now.)


> We can have imports setup so that all the classes etc
> are only defined once, e.g. Bio/seq/__init__.py could
> initially just contain 'from Bio.Seq import *' and so on.
>
>
Sounds cool. We'll need to watch out for the PDB module, where classes and
modules have identical names, and the class names are imported to shadow
the module names at import time.
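
For readers who have not run into this, the shadowing can be reproduced with in-memory modules. The sketch below does not import Bio.PDB; demo_pdb and Residue are stand-in names, and the two assignments imitate what the import system and a "from ... import ..." line in the package __init__ would do.

```python
import sys
import types

package = types.ModuleType("demo_pdb")
package.__path__ = []  # mark the stand-in as a package
module = types.ModuleType("demo_pdb.Residue")


class Residue(object):
    """Stand-in class defined inside the module of the same name."""


module.Residue = Residue
sys.modules["demo_pdb"] = package
sys.modules["demo_pdb.Residue"] = module

# Importing the submodule binds it as an attribute of the package...
package.Residue = module
# ...and "from demo_pdb.Residue import Residue" in the package __init__
# then rebinds the same name to the class, hiding the module.
package.Residue = module.Residue

import demo_pdb

shadowed_is_class = isinstance(demo_pdb.Residue, type)
module_still_loaded = sys.modules["demo_pdb.Residue"] is module
```

The module is still in sys.modules, but the attribute on the package now points at the class, which is the case any automatic renaming has to handle.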

-Eric


From p.j.a.cock at googlemail.com  Mon Oct 22 22:59:21 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 22 Oct 2012 23:59:21 +0100
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAMC681=XpBsqO3ohLbDAPyqODCtXy614k=C_f9XfJaqn6xBUhg@mail.gmail.com>
References: <CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<1346926418.97489.YahooMailClassic@web164004.mail.gq1.yahoo.com>
	<CAMC681n6=UuotEUdxGVEWDK4vPGd3=4O0yW82UQ3upTNMfy1iw@mail.gmail.com>
	<CAKVJ-_6rTsfqphX6i+YGA8ijLN+04kP+Gxk=BjwWCcXJtF97Vg@mail.gmail.com>
	<CAKVJ-_7-KXVZ96bHLG6XD88zcN9rPvnTf7yQ0E6J1jhb_5yx+g@mail.gmail.com>
	<CAKVJ-_6U0PrsTWM8sMPgsSX8cnfTandTGKz5j829K8so7whPgA@mail.gmail.com>
	<CAKVJ-_4PV3VMx5pju65578gq8TSN936T5ePH_cjhtUQcrECHYg@mail.gmail.com>
	<CAKVJ-_5ZGQrj_xt4b5a7s7TrZ=7B-PXCJfBykHYNidE2L991jg@mail.gmail.com>
	<CAMC681=XpBsqO3ohLbDAPyqODCtXy614k=C_f9XfJaqn6xBUhg@mail.gmail.com>
Message-ID: <CAKVJ-_7spK5YYSZsoU1jqocYv2TPyCtHqUFokDae1esqfDbgTA@mail.gmail.com>

On Mon, Oct 22, 2012 at 10:53 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
>
> Personally, I've used the variable name "seq" an awful lot, so I'm wary of
> using "seq" as a module name. However, reasonable coding style could make
> this easy to avoid if we have a "seq" module containing all of Seq,
> SeqRecord and SeqFeature (maybe even Alphabet), and "sequtil" containing
> standalone functions.
>
> Result:
>
> # Everything you need to build a new sequence record, but not much else
> from Bio.seq import Seq, SeqRecord, SeqFeature

I'd been picturing:

from Bio.seq import Seq
from Bio.seq.record import SeqRecord
from Bio.seq.feature import SeqFeature

but you're right, those three classes could all be exposed at the level
of Bio.seq (while still having the SeqRecord defined in the file
Bio/seq/record.py and SeqFeature etc in Bio/seq/feature.py) for
convenience.

> # Working with sequence strings
> from Bio import sequtil

If you mean strings rather than Seq objects, currently Bio.SeqUtils
should mostly work on Seq objects or strings. It is kind of an odds and
ends module, rather than deliberately focusing on sequences as strings.

> It also seems reasonable to treat molecular sequences as the implied core
> object type at the top-level namespace. From that viewpoint, Bio.Search
> would mean sequence search, as everything else is typically tucked away in a
> sub-module like PDB (pdb?), Motif (motif), or Phylo (phylo); then it's also
> fine to keep seqio and alignio directly under the Bio namespace.

Having sequence stuff collected under Bio.Seq or Bio.seq (or bio.seq
if we go with the lower case plan for Python 3) seems more organised.
It also keeps the import times down for people not working with
sequences (e.g. a script using clustering or PDB files).

> (Given a clean, I'd prefer "from Bio import Seq, SeqRecord, SeqFeature", but
> since those are already module names it would be brutal to make that
> transition now.)

That isn't a good plan anyway in terms of polluting the namespace
and loading things into memory for anyone not working with sequences.

>> We can have imports setup so that all the classes etc
>> are only defined once, e.g. Bio/seq/__init__.py could
>> initially just contain 'from Bio.Seq import *' and so on.
>>
>
> Sounds cool. We'll need to watch out for the PDB module, where classes and
> modules have identical names, and the class names are imported to shadow the
> module names at import time.

The shadowing was one of the gotchas in the auto-conversion
of all the module names to lower case - but solvable. Adopting
lower case module names has the bonus of fixing this in the long
term.

Peter


From kjwu at ucsd.edu  Wed Oct 24 22:38:04 2012
From: kjwu at ucsd.edu (Kevin Wu)
Date: Wed, 24 Oct 2012 15:38:04 -0700
Subject: [Biopython-dev] KEGG API Wrapper
In-Reply-To: <CAKVJ-_7Ao1gdtF2_-7GH89qWGtseLVuJ4beB9bUpun5DLwcQsA@mail.gmail.com>
References: <1350128284.4997.YahooMailClassic@web164003.mail.gq1.yahoo.com>
	<C45C27DAD51E2647959A053EBBF42C86089480@RUMBX2.rockefeller.edu>
	<CAKVJ-_7Ao1gdtF2_-7GH89qWGtseLVuJ4beB9bUpun5DLwcQsA@mail.gmail.com>
Message-ID: <CAEe6yUEbiK3tFdvx1hEGE2==QR7Pab2HcvL6x-CqOivWCB9=sg@mail.gmail.com>

Hi All,

Thanks for the comments, I've written a bit of documentation on the entire
KEGG module and have attached those relevant pages to the email. There
didn't seem like an appropriate place for examples, so I just added a new
chapter. I've also committed the updated file to github.

I did leave out the parsers because the current parsers only
cover a small portion of the possible responses from the API. Also, I'm not
confident that some of the parsers correctly retrieve all the fields.
However, I've written a very general parser that does a rough job of
retrieving fields when a database format is returned, since I find myself
reusing the code for all database formats. It's possible to modify this to
correctly account for the different fields, but it would probably take a bit
of work to figure each field out manually. Otherwise it also parses the
tsv/flat file returned.

Also, @zach, thanks for checking it out and testing it!

Thanks All!
Kevin

On Wed, Oct 17, 2012 at 4:09 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Wed, Oct 17, 2012 at 12:55 AM, Zachary Charlop-Powers
> <zcharlop at mail.rockefeller.edu> wrote:
> > Kevin,
> > Michiel,
> >
> > I just tested Kevin's code for a few simple queries and it worked great. I
> > have always liked KEGG's organization of data and really appreciate this
> > RESTful interface to their data; in some ways I think it easier to use the
> > web interfaces for KEGG than it is for NCBI. Plus the KEGG coverage of
> > metabolic networks is awesome.  I found the examples in Kevin's test script
> > to be fairly self-explanatory but a simple-spelled out example in the
> > Tutorial would be nice.
> >
> > One thought, though, is that you can retrieve MANY different types of data
> > from the KEGG Rest API - which means that the user will probably have to
> > parse the data his/herself. Data retrieved with "list" can return lists of
> > genes or compounds or organism and after a  cursory look  these are each
> > formatted differently. Also true with the 'find' command. So I think you
> > were right to leave out parsers because i think they will be a moving target
> > highly dependent on the query.
> >
> > Thank You Kevin,
> > zach cp
>
> Good point about decoupling the web API wrapper and the parsers -
> how the Bio.Entrez module and Bio.TogoWS handle this is to return
> handles for web results, which you can then parse with an appropriate
> parser (e.g. SeqIO for GenBank files, Medline parser, etc).
>
> Note that this is a little more fiddly under Python 3 due to the text
> mode distinction between unicode and binary... just something to
> keep in the back of your mind.
>
> Peter
-------------- next part --------------
A non-text attachment was scrubbed...
Name: KEGG documentation.pdf
Type: application/pdf
Size: 128597 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20121024/3f7b7063/attachment-0002.pdf>

From cmccoy at fhcrc.org  Thu Oct 25 21:36:44 2012
From: cmccoy at fhcrc.org (Connor McCoy)
Date: Thu, 25 Oct 2012 14:36:44 -0700
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
	automated programs
Message-ID: <CAChfGK0C=-facUZeq_Aqd4LS=NFCE8iBmtcTt=OmL7-L62PUew@mail.gmail.com>

Hello,

About a year ago, pip support came up on the list:

http://biopython.org/pipermail/biopython-dev/2011-October/009234.html

I remember this being resolved, but when I try to install biopython with
pip, it fails:

    $ testenv/bin/pip install biopython

    Downloading/unpacking biopython
      Running setup.py egg_info for package biopython

        warning: no previously-included files matching '.cvsignore' found under directory '*'
        warning: no previously-included files matching '*.pyc' found under directory '*'
    Installing collected packages: biopython
      Running setup.py install for biopython

        Numerical Python (NumPy) is not installed.

        This package is required for many Biopython features.  Please install
        it before you install Biopython. You can install Biopython anyway, but
        anything dependent on NumPy will not work. If you do this, and later
        install NumPy, you should then re-install Biopython.

        You can find NumPy at http://numpy.scipy.org

        Complete output from command /home/cmccoy/development/seqmagick/testenv/bin/python -c "import setuptools;__file__='/home/cmccoy/development/seqmagick/testenv/build/biopython/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /tmp/pip-wc___H-record/install-record.txt --install-headers /home/cmccoy/development/seqmagick/testenv/include/site/python2.7:
        running install

    Numerical Python (NumPy) is not installed.

    This package is required for many Biopython features.  Please install
    it before you install Biopython. You can install Biopython anyway, but
    anything dependent on NumPy will not work. If you do this, and later
    install NumPy, you should then re-install Biopython.

    You can find NumPy at http://numpy.scipy.org

    ----------------------------------------
    Command /home/cmccoy/development/seqmagick/testenv/bin/python -c "import setuptools;__file__='/home/cmccoy/development/seqmagick/testenv/build/biopython/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /tmp/pip-wc___H-record/install-record.txt --install-headers /home/cmccoy/development/seqmagick/testenv/include/site/python2.7 failed with error code 255 in /home/cmccoy/development/seqmagick/testenv/build/biopython
    Storing complete log in /home/cmccoy/.pip/pip.log


Same for libraries which list biopython in `install_requires`.

Does anyone know of a way around this?

Thanks,
Connor

-- 
Connor McCoy
Fred Hutchinson Cancer Research Center
1100 Fairview Ave N.
Seattle, WA 98109-1924
cmccoy at fhcrc.org


From mjldehoon at yahoo.com  Fri Oct 26 02:52:42 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 25 Oct 2012 19:52:42 -0700 (PDT)
Subject: [Biopython-dev] KEGG API Wrapper
In-Reply-To: <CAEe6yUEbiK3tFdvx1hEGE2==QR7Pab2HcvL6x-CqOivWCB9=sg@mail.gmail.com>
Message-ID: <1351219962.39081.YahooMailClassic@web164002.mail.gq1.yahoo.com>

Hi Kevin,

Thanks for the documentation! That makes everything a lot clearer.
Overall I like the querying code and I think we should add it to Biopython.

I have a bunch of comments on the KEGG module, some on the existing code and some on the new querying code, see below. Most of these are trivial; some may need further discussion. Could you perhaps let us know which of these comments you can address, and which ones you want to skip for now?

Once we have converged on the querying code and the documentation, I think we can import your version of the KEGG module into the main Biopython repository and add your chapter on KEGG to the main documentation, and continue from there on the parsers and the unit tests.

Many thanks!
-Michiel.


About the querying code:
----------------------------------

I would replace KEGG.query("list", ...), KEGG.query("find", ...), KEGG.query("conv", ...), KEGG.query("link", ...), KEGG.query("info", ...), and KEGG.query("get", ...) by the functions KEGG.list, KEGG.find, KEGG.conv, KEGG.link, KEGG.info, and KEGG.get.

For list, find, conv, link, and info, instead of going through KEGG.generic_parser, I would return the result directly as a Python list.
In contrast, KEGG.get should return the handle to the results, not the data itself. So the _q function, instead of
    ...
    resp = urllib2.urlopen(req)
    data = resp.read()
    return query_url, data
have
    ...
    resp = urllib2.urlopen(req)
    return resp
Then the user can decide whether to parse the data on the fly with Bio.KEGG, or read the data line by line and pick up what they are interested in, or to get all data from the handle and save it in a file. Note that resp will have a .url attribute that contains the url, so you won't need the ret_url keyword.
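
A minimal sketch of the handle-returning approach described above, written for Python 3 (urllib.request rather than urllib2). The base URL, the function names kegg_get and kegg_list, and the opener parameter are assumptions made for illustration; this is not the actual Bio.KEGG code:

```python
from urllib.request import urlopen

# Assumed base URL of the KEGG REST service (illustrative).
KEGG_BASE = "http://rest.kegg.jp"


def _build_url(operation, *args):
    """Join the operation and its arguments into a REST-style URL."""
    return "/".join([KEGG_BASE, operation] + list(args))


def kegg_get(entry, opener=urlopen):
    """Return the open handle; the caller decides how to consume it."""
    return opener(_build_url("get", entry))


def kegg_list(database, opener=urlopen):
    """Read the whole response and return it as a list of lines."""
    handle = opener(_build_url("list", database))
    try:
        text = handle.read().decode("utf-8")
    finally:
        handle.close()
    return text.splitlines()
```

With this shape the query URL is available as the .url attribute of the handle returned by kegg_get, so a ret_url keyword is not needed. The opener argument is only there so the functions can be exercised without network access.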


About the parsers:
------------------------

I think that we should drop generic_parser. For list, find, conv, link, and info, parsing is trivial and can be done by the respective functions directly. For get, we already have an appropriate parser for some databases (compound, map, and enzyme), and it should be easy to add parsers for the other databases.

For all parsers in Biopython, there is the question whether the record should store information in attributes (as is currently done in Bio.KEGG), or alternatively if the record should inherit from a dictionary and store information in keys in the dictionary. Personally I have a preference for a dictionary, since that allows us to use the exact same keys in the dictionary as is used in the file (e.g., we can use "CLASS" as a key, while we cannot use .class as an attribute since it is a reserved word, so we use .classname instead). But other Biopython developers may not agree with me, and to some extent it depends on personal preference. 
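
To illustrate the dictionary option, here is a small sketch. The Record class and the field values are hypothetical and only show why file keywords such as "CLASS" fit more naturally as keys than as attributes:

```python
class Record(dict):
    """Hypothetical record type that stores fields under the file's own keys."""


record = Record()
record["ENTRY"] = "1.1.1.1"            # illustrative value
record["CLASS"] = ["Oxidoreductases"]  # illustrative value

# "CLASS" works as a key, whereas `record.class` would be a syntax error
# because `class` is a reserved word in Python.
class_terms = record["CLASS"]
```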

The parsers miss some key words. The ones I noticed are ALL_REAC, REFERENCE, and ORTHOLOGY. Probably we'll find more once we extend the unit tests.

Remove the ';' at the end of each term in record.classname.
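
A sketch of that clean-up, using made-up terms (the real contents of record.classname depend on the entry being parsed):

```python
# Hypothetical classname terms, each carrying the trailing ';' from the file.
classname = ["Oxidoreductases;", "Acting on the CH-OH group of donors;"]

# rstrip(";") only removes semicolons at the end, so terms without one are unchanged.
cleaned = [term.rstrip(";") for term in classname]
```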

Convert record.genes to a dictionary for each organism. So instead of
[('HSA', ['5236', '55276']), ('PTR', ['456908', '461162']), ('PON', ['100190836', '100438793']), ('MCC', ['100424648', '699401']...
have
{'HSA': ['5236', '55276'], 'PTR': ['456908', '461162'], 'PON': ['100190836', '100438793'], 'MCC': ['100424648', '699401'], ...
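
Since the current value is already a list of (organism, gene list) pairs, the conversion can be sketched with a plain dict() call. The variable names here are illustrative:

```python
# The list-of-tuples form, using the first entries from the example above.
genes = [
    ("HSA", ["5236", "55276"]),
    ("PTR", ["456908", "461162"]),
    ("PON", ["100190836", "100438793"]),
    ("MCC", ["100424648", "699401"]),
]

# dict() accepts an iterable of (key, value) pairs directly.
genes_by_organism = dict(genes)
```

This assumes each organism code appears only once in the list; if a code could repeat, the later entry would silently replace the earlier one.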

Also for record.dblinks, record.disease, record.structures, use a dictionary.

In record.pathway, all entries start with 'PATH'. Perhaps we should check with KEGG whether there could be anything other than 'PATH' there; otherwise I don't see the reason why it's there. Assuming that there could be something different there, I would also use a dictionary with 'PATH' as the key.

In record.reaction, some chemical names can be very long and extend over multiple lines. In such cases, the continuation line starts with a '$'. The parser should remove the '$' and join the two lines.
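
One way the joining step could look, as a sketch. The function name is hypothetical, and it assumes the two pieces are concatenated with nothing in between once the '$' is dropped:

```python
def join_continuations(lines):
    """Merge lines that start with '$' into the line that precedes them."""
    joined = []
    for line in lines:
        if line.startswith("$") and joined:
            # Drop the '$' marker and append the rest to the previous line.
            joined[-1] += line[1:]
        else:
            joined.append(line)
    return joined
```

A '$' line with no preceding line is kept as-is here rather than raising an error; whether that is the right behaviour would depend on what the data files actually contain.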

About the tests:
--------------------

We should update the data files in Tests/KEGG. This will fix some "bugs" in these data files.

We should switch test_KEGG.py to the unit test framework.

We should do some more extensive testing to make sure we are not missing some key words.

About the documentation:
---------------------------------
It's great that we now have some documentation.

On page 233, I would suggest replacing "id_" with "accession" or something else, since the underscore in "id_" may look funky to new users.


Also it may be better not to reuse variable names (e.g. "pathway" is used in three different ways in the example). It's OK of course in general, but for this example it may be more clear to distinguish the different usages of this variable from each other.

For repair_genes, you can use a set instead of a list throughout.


--- On Wed, 10/24/12, Kevin Wu <kjwu at ucsd.edu> wrote:

From: Kevin Wu <kjwu at ucsd.edu>
Subject: Re: [Biopython-dev] KEGG API Wrapper
To: "Peter Cock" <p.j.a.cock at googlemail.com>, "Zachary Charlop-Powers" <zcharlop at mail.rockefeller.edu>, "Michiel de Hoon" <mjldehoon at yahoo.com>
Cc: Biopython-dev at lists.open-bio.org
Date: Wednesday, October 24, 2012, 6:38 PM

Hi All,
Thanks for the comments. I've written a bit of documentation on the entire KEGG module and have attached the relevant pages to this email. There didn't seem to be an appropriate place for examples, so I just added a new chapter. I've also committed the updated file to github.


I did leave out the parsers because the current parsers only cover a small portion of the possible responses from the API. Also, I'm not confident that some of the parsers correctly retrieve all the fields. However, I've written a really general parser that does a rough job of retrieving fields when a database format is returned, since I find myself reusing the code for all database formats. It's possible to modify this to correctly account for the different fields, but it would probably take a bit of work to figure each field out manually. Otherwise it also parses the tsv/flat file returned.


Also, @zach, thanks for checking it out and testing it!

Thanks All!
Kevin

On Wed, Oct 17, 2012 at 4:09 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:


On Wed, Oct 17, 2012 at 12:55 AM, Zachary Charlop-Powers
<zcharlop at mail.rockefeller.edu> wrote:
> Kevin,
> Michiel,
>
> I just tested Kevin's code for a few simple queries and it worked great. I
> have always liked KEGG's organization of data and really appreciate this
> RESTful interface to their data; in some ways I think it easier to use the
> web interfaces for KEGG than it is for NCBI. Plus the KEGG coverage of
> metabolic networks is awesome. I found the examples in Kevin's test script
> to be fairly self-explanatory but a simple-spelled out example in the
> Tutorial would be nice.
>
> One thought, though, is that you can retrieve MANY different types of data
> from the KEGG Rest API - which means that the user will probably have to
> parse the data his/herself. Data retrieved with "list" can return lists of
> genes or compounds or organism and after a cursory look these are each
> formatted differently. Also true with the 'find' command. So I think you
> were right to leave out parsers because i think they will be a moving target
> highly dependent on the query.
>
> Thank You Kevin,
> zach cp

Good point about decoupling the web API wrapper and the parsers -
how the Bio.Entrez module and Bio.TogoWS handle this is to return
handles for web results, which you can then parse with an appropriate
parser (e.g. SeqIO for GenBank files, Medline parser, etc).

Note that this is a little more fiddly under Python 3 due to the text
mode distinction between unicode and binary... just something to
keep in the back of your mind.

Peter
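
A sketch of the Python 3 point above: network handles yield bytes, while text parsers generally expect str, so one option is to wrap the handle before parsing. The helper name, the UTF-8 default, and the sample data are assumptions for illustration:

```python
import io


def as_text_handle(binary_handle, encoding="utf-8"):
    """Wrap a bytes handle so that iterating over it yields str lines."""
    return io.TextIOWrapper(binary_handle, encoding=encoding)


# Stand-in for a web response: an in-memory bytes handle.
raw = io.BytesIO(b"ENTRY       C00001\nNAME        H2O\n")
lines = [line.rstrip("\n") for line in as_text_handle(raw)]
```

The wrapped handle can then be passed to whichever parser suits the data, in the same way as a file opened in text mode.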

_______________________________________________
Biopython-dev mailing list
Biopython-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython-dev


From kai.blin at biotech.uni-tuebingen.de  Fri Oct 26 08:35:56 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 26 Oct 2012 10:35:56 +0200
Subject: [Biopython-dev] Status of SearchIO
Message-ID: <508A4B6C.6020801@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi folks,

In the summer, I've written a HMMer2 parser based on Bow's SearchIO
code. I'm finally getting around to continue work on the project I
needed this parser for, and I'm trying to get my code up-to-date.

I notice that Bow's code hasn't hit the biopython master tree yet, and
also doesn't rebase cleanly on top of it. A merge gives a couple of
merge conflicts, but seems manageable. However, I'd prefer to stick to
the upstream sources instead of maintaining my own branch containing
Bow's SearchIO code merged to master.

What's the chance of this happening any time soon, and can I help?

Cheers,
Kai

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQiktsAAoJEKM5lwBiwTTPuDMH/33PGo/zLpBGw+dKIBXZ9b9L
opaoI5uUsj4XzWU1A8u50BXFqa6ogwUWeZFaA2j25nQgEClWA5TFdHAJM4urTTgD
pM2g2rsL/yLSrVifM95c2IcRW2z7dunccpJDd6cc82BRpqqgGWrkNo7OSUk/exP3
DbfooBw66Scxt+6o6S9zEH4IY5giuDOGzwQm195TCaZ/x/8/y1F8Ub/8Aporbj47
eJgZmEKzh0k8KePKOdyCmnt/d/bDGplFSvgqXET6Q0jmVAG44lAU679UPCmNiuJr
VZD2SMRKy+Buy3TjJjQCeUEm+awN4T2LnPLDJgJkvRHjl6G+M9aljsuL78uCp9g=
=1Nrt
-----END PGP SIGNATURE-----


From p.j.a.cock at googlemail.com  Fri Oct 26 09:21:50 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 10:21:50 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <508A4B6C.6020801@biotech.uni-tuebingen.de>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
Message-ID: <CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>

On Fri, Oct 26, 2012 at 9:35 AM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi folks,
>
> In the summer, I've written a HMMer2 parser based on Bow's SearchIO
> code. I'm finally getting around to continue work on the project I
> needed this parser for, and I'm trying to get my code up-to-date.
>
> I notice that Bow's code hasn't hit the biopython master tree yet, and
> also doesn't rebase cleanly on top of it. A merge gives a couple of
> merge conflicts, but seems manageable. However, I'd prefer to stick to
> the upstream sources instead of maintaining my own branch containing
> Bow's SearchIO code merged to master.
>
> What's the chance of this happening any time soon, and can I help?
>
> Cheers,
> Kai

I'm not sure where the merge conflict is - Bow can probably help
and confirm you're looking at the appropriate branch.

What would help is comments on the name space ideas in this
thread, since one major point we need to settle ASAP is where
in the namespace SearchIO would go (since it probably won't
just stay as Bio.SearchIO as it is on the branch):

http://lists.open-bio.org/pipermail/biopython-dev/2012-September/009910.html
...
http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html
...

Peter


From w.arindrarto at gmail.com  Fri Oct 26 09:33:35 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 26 Oct 2012 11:33:35 +0200
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
	<CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
Message-ID: <CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>

Hi Kai, Peter,

For the merge conflict, which branch are you using? Can you point to
specific commits that cause the conflicts? I haven't tried merging /
rebasing my own branch to the current master myself ~ so knowing this
should help the process as well.

Suggestions are still welcome for the namespace :). Bio.SearchIO is
the current one, but we have other alternatives (the most recent one being
Bio.seq.search, following the Bio.Seq -> Bio.seq namespace change).

Also, I think there are still some issues that need to be dealt with before
we put SearchIO into master, notably with the Bio.BLAST module. If not the
official deprecation notice, at least the tutorial has to be updated
(to let Bio.BLAST readers know about the plan with SearchIO). I've written a
short tutorial here: http://bow.web.id/biopython/Tutorial.html. This is
still a draft, but you can already see that there are some obvious overlaps
between Bio.BLAST and Bio.SearchIO, which is confusing to new readers.

regards,
Bow

On Fri, Oct 26, 2012 at 11:21 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Fri, Oct 26, 2012 at 9:35 AM, Kai Blin
> <kai.blin at biotech.uni-tuebingen.de> wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Hi folks,
> >
> > In the summer, I've written a HMMer2 parser based on Bow's SearchIO
> > code. I'm finally getting around to continue work on the project I
> > needed this parser for, and I'm trying to get my code up-to-date.
> >
> > I notice that Bow's code hasn't hit the biopython master tree yet, and
> > also doesn't rebase cleanly on top of it. A merge gives a couple of
> > merge conflicts, but seems manageable. However, I'd prefer to stick to
> > the upstream sources instead of maintaining my own branch containing
> > Bow's SearchIO code merged to master.
> >
> > What's the chance of this happening any time soon, and can I help?
> >
> > Cheers,
> > Kai
>
> I'm not sure where the merge conflict is - Bow can probably help
> and confirm you're looking at the appropriate branch.
>
> What would help is comments on the name space ideas in this
> thread, since one major point we need to settle ASAP is where
> in the namespace SearchIO would go (since it probably won't
> just stay as Bio.SearchIO as it is on the branch):
>
>
> http://lists.open-bio.org/pipermail/biopython-dev/2012-September/009910.html
> ...
> http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html
> ...
>
> Peter


From p.j.a.cock at googlemail.com  Fri Oct 26 09:43:28 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 10:43:28 +0100
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <CAChfGK0C=-facUZeq_Aqd4LS=NFCE8iBmtcTt=OmL7-L62PUew@mail.gmail.com>
References: <CAChfGK0C=-facUZeq_Aqd4LS=NFCE8iBmtcTt=OmL7-L62PUew@mail.gmail.com>
Message-ID: <CAKVJ-_7fDh8QTzAZRQcsH5uZndt+a3N0uDEkpuzsrau=J3aLhA@mail.gmail.com>

On Thu, Oct 25, 2012 at 10:36 PM, Connor McCoy <cmccoy at fhcrc.org> wrote:
> Hello,
>
> About a year ago, pip support came up on the list:
>
> http://biopython.org/pipermail/biopython-dev/2011-October/009234.html
>
> I remember this being resolved, but when I try to install biopython with
> pip, it fails:
>
>     $ testenv/bin/pip install biopython
>
>     Downloading/unpacking biopython
>       Running setup.py egg_info for package biopython
>
>         warning: no previously-included files matching '.cvsignore' found
> under directory '*'
>         warning: no previously-included files matching '*.pyc' found under
> directory '*'
>     Installing collected packages: biopython
>       Running setup.py install for biopython
>
>         Numerical Python (NumPy) is not installed.
>
>         This package is required for many Biopython features.  Please
> install
>         it before you install Biopython. You can install Biopython anyway,
> but
>         anything dependent on NumPy will not work. If you do this, and later
>         install NumPy, you should then re-install Biopython.
>
>         You can find NumPy at http://numpy.scipy.org
>
>         Complete output from command
> /home/cmccoy/development/seqmagick/testenv/bin/python -c "import
> setuptools;__file__='/home/cmccoy/development/seqmagick/testenv/build/biopython/set
>     up.py';exec(compile(open(__file__).read().replace('\r\n', '\n'),
> __file__, 'exec'))" install --single-version-externally-managed --record
> /tmp/pip-wc___H-record/install-record.txt -
>     -install-headers
> /home/cmccoy/development/seqmagick/testenv/include/site/python2.7:
>         running install
>
>
>
>     Numerical Python (NumPy) is not installed.
>
>
>
>     This package is required for many Biopython features.  Please install
>
>     it before you install Biopython. You can install Biopython anyway, but
>
>     anything dependent on NumPy will not work. If you do this, and later
>
>     install NumPy, you should then re-install Biopython.
>
>
>
>     You can find NumPy at http://numpy.scipy.org
>
>
>
>     ----------------------------------------
>     Command /home/cmccoy/development/seqmagick/testenv/bin/python -c
> "import
> setuptools;__file__='/home/cmccoy/development/seqmagick/testenv/build/biopython/setup.py';exec(compile(open(
>     __file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install
> --single-version-externally-managed --record
> /tmp/pip-wc___H-record/install-record.txt --install-headers /home/cm
>     ccoy/development/seqmagick/testenv/include/site/python2.7 failed with
> error code 255 in /home/cmccoy/development/seqmagick/testenv/build/biopython
>     Storing complete log in /home/cmccoy/.pip/pip.log
>
>
> Same for libraries which list biopython in `install_requires`.
>
> Does anyone know of a way around this?
>
> Thanks,
> Connor

Hi Connor,

This is probably a question for Brad - I don't use pip.

Was it sitting stalled at the prompt from Biopython's setup.py?
"Do you want to continue this installation? (y/N)" or from pip?
i.e. What was at the end of the complete log?

In terms of a quick workaround, what we use under TravisCI
(where most of the targets don't have numpy installed) is
piping a yes on stdin, e.g.

$ /usr/bin/yes | python setup.py install
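
For callers that drive the install from Python rather than a shell, a roughly equivalent sketch is shown below. The helper name is hypothetical, it needs Python 3.7 or later for the text and capture_output arguments, and unlike /usr/bin/yes it sends only a single "y" line, so it assumes there is exactly one prompt to answer:

```python
import subprocess
import sys


def run_answering_yes(command):
    """Run `command`, sending a single "y" line on its standard input."""
    return subprocess.run(
        command,
        input="y\n",          # answers one y/N prompt
        text=True,            # pass and receive str rather than bytes
        capture_output=True,  # keep stdout/stderr for later inspection
    )


# Intended use (not executed here):
# run_answering_yes([sys.executable, "setup.py", "install"])
```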

Peter


From p.j.a.cock at googlemail.com  Fri Oct 26 10:31:06 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 11:31:06 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <508A6535.6070507@biotech.uni-tuebingen.de>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
	<CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
	<CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
	<508A6535.6070507@biotech.uni-tuebingen.de>
Message-ID: <CAKVJ-_6pvuFqCc+EJVgP-GAnAN6XQ83Y7tqAE8NKU1j95qsEnA@mail.gmail.com>

On Fri, Oct 26, 2012 at 11:25 AM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
>> Also, I think there are still some issues that need to be dealt
>> with before we put SearchIO into master, notably with Bio.BLAST
>> module. If not the official deprecation notice, at least the the
>> tutorial has to be updated (let Bio.BLAST readers know about the
>> plan with SearchIO). I've written a short tutorial here:
>> http://bow.web.id/biopython/Tutorial.html. This is still a draft,
>> but you can already see that there are some obvious overlaps
>> between Bio.BLAST and Bio.SearchIO, which is confusing to new
>> readers.
>
> Personally I wouldn't let this consideration block the inclusion of a
> module as useful like that. Of course I need this code, so I'm biased.

I'm also OK with merging the code before updating the Tutorial
chapter on BLAST (which would probably become a broader
chapter on BLAST and other tools using SearchIO). As discussed
before, the long term aim would be to remove Bio.BLAST.

> I'll have to read up on the namespace discussion. While I see the
> benefit of using PEP8 names, intuitively I don't like bio.seq.search
> much. Then again, I started my life in Bio* with BioPerl, and like the
> pretty similar module layout BioPython has so far.

Yeah - the current naming of SeqIO and AlignIO was directly
inspired by BioPerl, and gave rise to the working name of SearchIO.

Peter


From kai.blin at biotech.uni-tuebingen.de  Fri Oct 26 10:25:57 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 26 Oct 2012 12:25:57 +0200
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
	<CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
	<CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
Message-ID: <508A6535.6070507@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-10-26 11:33, Wibowo Arindrarto wrote:
> Hi Kai, Peter,
> 
> For the merge conflict, which branch are you using? Can you point
> to specific commits that cause the conflicts? I haven't tried
> merging / rebasing my own branch to the current master myself ~ so
> knowing this should help the process as well.

For merging, I think I had to change
.travis.yml
setup.py
and Tests/run_tests.py

.travis.yml and setup.py mainly had whitespace changes in comments, so
I just went with the version from master on those changes. As I said,
nothing really huge.

https://github.com/kblin/biopython/tree/searchio-merge is the merged tree.

The rebase had a number of things, I just gave up on that.

> Also, I think there are still some issues that need to be dealt
> with before we put SearchIO into master, notably with Bio.BLAST
> module. If not the official deprecation notice, at least the the
> tutorial has to be updated (let Bio.BLAST readers know about the
> plan with SearchIO). I've written a short tutorial here:
> http://bow.web.id/biopython/Tutorial.html. This is still a draft, 
> but you can already see that there are some obvious overlaps
> between Bio.BLAST and Bio.SearchIO, which is confusing to new
> readers.

Personally I wouldn't let this consideration block the inclusion of a
module as useful as that. Of course I need this code, so I'm biased.

I'll have to read up on the namespace discussion. While I see the
benefit of using PEP8 names, intuitively I don't like bio.seq.search
much. Then again, I started my life in Bio* with BioPerl, and like the
pretty similar module layout BioPython has so far.

Cheers,
Kai

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQimU1AAoJEKM5lwBiwTTPLUsH/i1C1jWmSgjk3PZSOo2kpn4l
sGfonyZ7UcyOyM1RYMOc9xaJwevyGJbxVpdmhzIsCr8WZ2++uTgqwOKHROw84bu4
BfVTovUD3mNUK3kGEemOQQal8HyjTZozRFmPgQpSSTOOgQE964kA7mm2HJH9sNx9
NHUKj+dk7UwmbzETl2Q0/1lmxdptOVCTyQvwMzleCX4dwgdGumyrNiBQmBLerAKV
CRW8cVmVPKkVUokuzWpt6LPZIoUxMz5RVmTJktOX0fpg79ULfXQucByrGtGQbiSR
JMWGrK5yCliSz1WqV8r/Tx0VfPmEeiZFyzZb5KiAFE88sJK85cbFgUBegUTDZSU=
=372O
-----END PGP SIGNATURE-----


From w.arindrarto at gmail.com  Fri Oct 26 10:38:50 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 26 Oct 2012 12:38:50 +0200
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CAKVJ-_6pvuFqCc+EJVgP-GAnAN6XQ83Y7tqAE8NKU1j95qsEnA@mail.gmail.com>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
	<CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
	<CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
	<508A6535.6070507@biotech.uni-tuebingen.de>
	<CAKVJ-_6pvuFqCc+EJVgP-GAnAN6XQ83Y7tqAE8NKU1j95qsEnA@mail.gmail.com>
Message-ID: <CADEGkF5J0pHBoNaB1xKfQzwNYYzuKverV+1zt3EiEXVA0dEQKg@mail.gmail.com>

> >> Also, I think there are still some issues that need to be dealt
> >> with before we put SearchIO into master, notably with Bio.BLAST
> >> module. If not the official deprecation notice, at least the the
> >> tutorial has to be updated (let Bio.BLAST readers know about the
> >> plan with SearchIO). I've written a short tutorial here:
> >> http://bow.web.id/biopython/Tutorial.html. This is still a draft,
> >> but you can already see that there are some obvious overlaps
> >> between Bio.BLAST and Bio.SearchIO, which is confusing to new
> >> readers.
> >
> > Personally I wouldn't let this consideration block the inclusion of a
> > module as useful like that. Of course I need this code, so I'm biased.
>
> I'm also OK with merging the code before updating the Tutorial
> chapter on BLAST (which would probably become a broader
> chapter on BLAST and other tools using SearchIO). As discussed
> before, the long term aim would be to remove Bio.BLAST.

Ah, ok then :). There are other things I'm still working on at the
moment (BLAST plain text writer, details about migrating from
Bio.Blast), but I consider these to be less urgent than the tutorial.
If everyone is ok for merging, then I'm good too :). I suppose we are
going to use the 'beta' new feature warning here, right?

> > I'll have to read up on the namespace discussion. While I see the
> > benefit of using PEP8 names, intuitively I don't like bio.seq.search
> > much. Then again, I started my life in Bio* with BioPerl, and like the
> > pretty similar module layout BioPython has so far.
>
> Yeah - the current naming of SeqIO and AlignIO was directly
> inspired by BioPerl, and give the working name of SearchIO.
>
> Peter

Reaching a unanimous decision on name preference seems difficult :/.
We now have:

1. Bio.seq.search (in line with the namespace change)
2. Bio.seqsearch (top-level module, separate from Bio.seq. This used
to be Bio.SeqSearch, now adjusted for PEP8 compliance)
3. Bio.search (same reasoning + explanation like Bio.seqsearch).
4. Bio.SearchIO / Bio.searchio
5. Bio.psearch (p for pairwise)

Any other suggestions? Should we put it to a vote?

regards,
Bowo


From p.j.a.cock at googlemail.com  Fri Oct 26 10:51:32 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 11:51:32 +0100
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <508A694B.7030800@biotech.uni-tuebingen.de>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
Message-ID: <CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>

On Fri, Oct 26, 2012 at 11:43 AM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
>
> Hi folks,
>
> I realize I'm late to this party, but I was asked to give an opinion
> in the SearchIO thread.
>
> On 2012-09-06 09:06, Peter Cock wrote:
>> For single user machines, where the single user has only a small
>> collection of scripts this isn't such an issue. For any shared
>> server, or user with lots of Biopython scripts (some of which may
>> have been written by different people), you would be forced into a
>> mass change at one go.
>>
>> You would also have considerable hassle later on with any attempt
>> to re-run old scripts.
>
> In my opinion, this is where python virtualenv [1] can really make
> life easier, and I'd recommend this for running old library versions
> anyway.
>
> I'd rather do the correct change now, for every version of python, and
> explain to people how to set up virtualenvs for their older scripts.

I don't think this is practical - you'd have a *lot* of explaining to do
for all the users who'd be bitten by such a big non-backward
compatible change (and associated systems administrators).

Indirectly it sounds like you like the lower case name idea - what
do you think about making this switch under Python 3? (This will
only inconvenience the relatively small number of early adopters
already trying Biopython under Python 3 - but it would be another
bump for people transitioning from Python 2 to 3).
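
For scripts that need to run against either spelling during such a transition, one possible shim is sketched here. The helper import_first is illustrative and not part of Biopython, and the lower-case module name in the comment is only the hypothetical outcome of this proposal:

```python
import importlib


def import_first(*names):
    """Return the first module in `names` that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(
        "none of these modules could be imported: %s" % ", ".join(names)
    )


# Hypothetical use: SeqIO = import_first("bio.seqio", "Bio.SeqIO")
```

Note that this also skips a module that exists but fails on one of its own imports, which may hide real errors; it is a convenience for the transition period rather than a general recommendation.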

Peter


From p.j.a.cock at googlemail.com  Fri Oct 26 10:57:16 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 11:57:16 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CADEGkF5J0pHBoNaB1xKfQzwNYYzuKverV+1zt3EiEXVA0dEQKg@mail.gmail.com>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
	<CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
	<CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
	<508A6535.6070507@biotech.uni-tuebingen.de>
	<CAKVJ-_6pvuFqCc+EJVgP-GAnAN6XQ83Y7tqAE8NKU1j95qsEnA@mail.gmail.com>
	<CADEGkF5J0pHBoNaB1xKfQzwNYYzuKverV+1zt3EiEXVA0dEQKg@mail.gmail.com>
Message-ID: <CAKVJ-_6Yaa0-xBbw5TgqMny9LbwpTJXG2X_dE2=ybcP_GFRvAg@mail.gmail.com>

On Fri, Oct 26, 2012 at 11:38 AM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
>>> Also, I think there are still some issues that need to be dealt
>>
>> >> with before we put SearchIO into master, notably with Bio.BLAST
>> >> module. If not the official deprecation notice, at least the the
>> >> tutorial has to be updated (let Bio.BLAST readers know about the
>> >> plan with SearchIO). I've written a short tutorial here:
>> >> http://bow.web.id/biopython/Tutorial.html. This is still a draft,
>> >> but you can already see that there are some obvious overlaps
>> >> between Bio.BLAST and Bio.SearchIO, which is confusing to new
>> >> readers.
>> >
>> > Personally I wouldn't let this consideration block the inclusion of a
>> > module as useful like that. Of course I need this code, so I'm biased.
>>
>> I'm also OK with merging the code before updating the Tutorial
>> chapter on BLAST (which would probably become a broader
>> chapter on BLAST and other tools using SearchIO). As discussed
>> before, the long term aim would be to remove Bio.BLAST.
>
> Ah, ok then :). There are other things I'm still working on at the
> moment (BLAST plain text writer, details about migrating from
> Bio.Blast), but I consider these to be less urgent than the tutorial.
> If everyone is ok for merging, then I'm good too :). I suppose we are
> going to use the 'beta' new feature warning here, right?

Yes to the 'beta' warning. I'd like to get some wider testing with
community feedback on the API, while giving us the option to
change it before declaring it stable.

>> > I'll have to read up on the namespace discussion. While I see the
>> > benefit of using PEP8 names, intuitively I don't like bio.seq.search
>> > much. Then again, I started my life in Bio* with BioPerl, and like the
>> > pretty similar module layout BioPython has so far.
>>
>> Yeah - the current naming of SeqIO and AlignIO was directly
>> inspired by BioPerl, and give the working name of SearchIO.
>>
>> Peter
>
> Reaching a unanimous decision on name preference seems difficult :/.
> We now have:
>
> 1. Bio.seq.search (in line with the namespace change)
> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used
> to be Bio.SeqSearch, now adjusted for PEP8 compliance)
> 3. Bio.search (same reasoning + explanation like Bio.seqsearch).
> 4. Bio.SearchIO / Bio.searchio
> 5. Bio.psearch (p for pairwise)
>
> Any other suggestions? Should we put it to a vote?

I'd like a consensus first on the larger question of whether we should
adopt lower case module names automatically under Python 3.
In that case, option (1) above would be bio.seq.search under
Python 3, and so on.

Peter


From kai.blin at biotech.uni-tuebingen.de  Fri Oct 26 10:43:23 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 26 Oct 2012 12:43:23 +0200
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
Message-ID: <508A694B.7030800@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-09-06 09:06, Peter Cock wrote:

Hi folks,

I realize I'm late to this party, but I was asked to give an opinion
in the SearchIO thread.

> For single user machines, where the single user has only a small
> collection of scripts this isn't such an issue. For any shared
> server, or user with lots of Biopython scripts (some of which may
> have been written by different people), you would be forced into a
> mass change at one go.
> 
> You would also have considerable hassle later on with any attempt
> to re-run old scripts.

In my opinion, this is where python virtualenv [1] can really make
life easier, and I'd recommend this for running old library versions
anyway.

I'd rather do the correct change now, for every version of python, and
explain to people how to set up virtualenvs for their older scripts.

Cheers,
Kai

[1] http://pypi.python.org/pypi/virtualenv

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQimlLAAoJEKM5lwBiwTTPsswIAMnEn4AT8xrfsq3xzkbB6tS2
y5FkLAb11xDP5PpttA+5qDXmnmJuMFqYq8FsSnJnpVq+ZGSAkswFC1prqQp57LdG
V+EVZtf/HDzepbrVgNYe272nTPlc6cxjmtjWJca19fg8gKI97ryUiji/bbOfgjgM
cnGHeUYkGmrcWrI8ergOS/5qLi3Z6S6t+uJezPT3DkbSm8oiOVAuPrIv6MziX69W
QrKF3Edf4s1Do4URSVfZI1qVUEGFaLZMYvZ8/TMgDI2CAQLo0r2OxylrjJxcuqIB
nORFTdwFMD7npDLkyG5U4eWZpfAV9A4RHNTybhpb7RgdVHifnoivA0nIAhsIAWE=
=3VH6
-----END PGP SIGNATURE-----


From kai.blin at biotech.uni-tuebingen.de  Fri Oct 26 12:21:21 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 26 Oct 2012 14:21:21 +0200
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
Message-ID: <508A8041.2020203@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-10-26 12:51, Peter Cock wrote:

Hi Peter,

> Indirectly it sounds like you like the lower case name idea - what 
> do you think about making this switch under Python 3? (This will 
> only inconvenience the relatively small number of early adopters 
> already trying Biopython under Python 3 - but it would be another 
> bump for people transitioning from Python 2 to 3).

Actually, as someone who has to switch between BioPython and BioPerl a
lot, I'd personally prefer if both libraries stayed as close as
possible in their structure. In my opinion, the ability to easily
switch between languages while using the Bio* libraries is one of the
biggest features. As far as I understand we're just changing module
names here, so all that'd be different would be the import lines.

After reading through this thread, I got the impression that there was
a general agreement on switching to PEP8-compatible names eventually,
and the remaining question was how to best do that.

I haven't played with Python 3 much yet, but I have the impression
that switching to it likely is going to be painful anyway. Even if the
module renaming makes the transition a bit more painful, at least
you've only got to go through the pain once.

Assuming the translations between the 2.x and 3.x names can be done
automatically by the conversion script, this sounds like a good idea.
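
A rough sketch of what such an automatic translation could look like,
assuming a simple lookup table of old and new module paths. The table
below is hypothetical, and this is not how 2to3 itself is implemented:

```python
import re

# Hypothetical mapping from current module paths to lowercase ones.
RENAMES = {
    "Bio.SeqRecord": "bio.seqrecord",
    "Bio.SeqIO": "bio.seqio",
}

# Matches the leading "from X" or "import X" part of an import line.
_IMPORT = re.compile(r"^(\s*(?:from|import)\s+)([\w.]+)")


def rewrite_import(line):
    """Rewrite the module path of one import line if it appears in RENAMES."""
    match = _IMPORT.match(line)
    if not match:
        return line
    prefix, module = match.groups()
    return prefix + RENAMES.get(module, module) + line[match.end():]
```

Lines such as "from Bio import SeqIO" name the module after the import
keyword, so they would need extra handling beyond this sketch.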

Cheers,
Kai

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQioBBAAoJEKM5lwBiwTTPhxYIALTM1TQvOcE6upSFOCrfA0Uh
irgvsQi77JfWvDsvGnOk74+ZQDDM2KGGAR3s9QBPdjRtaXhxSvdSxlXq3sdTNsXh
VjbhEkeW6J3NzVSYbwK3U/mP0D9Xs6ihvnne06Nn7qjH+TLGm2x78cPM5SvjUcL3
QHiHda0wW479J9ZyKhmDTsCXqpX96uH3sjLiKZfs3KJbZ79j20BBWJqWypDuIUb7
DmtY/sngRsqs16yJL1Q35LXskOlCYsHOmJmkXg3Umr8gKOSw5nCEszhatXS3Oygo
Pv8F7exvoEfNHg1IQtmEFycou9k5IaGVsZoRhCE6YvUCJH4Zfz4eOUTD323AzT4=
=UPdn
-----END PGP SIGNATURE-----


From p.j.a.cock at googlemail.com  Fri Oct 26 12:42:25 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 13:42:25 +0100
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <508A8041.2020203@biotech.uni-tuebingen.de>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
Message-ID: <CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>

On Fri, Oct 26, 2012 at 1:21 PM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 2012-10-26 12:51, Peter Cock wrote:
>
> Hi Peter,
>
>> Indirectly it sounds like you like the lower case name idea - what
>> do you think about making this switch under Python 3? (This will
>> only inconvenience the relatively small number of early adopters
>> already trying Biopython under Python 3 - but it would be another
>> bump for people transitioning from Python 2 to 3).
>
> Actually, as someone who has to switch between BioPython and BioPerl a
> lot, I'd personally prefer if both libraries stayed as close as
> possible in their structure. In my opinion, the ability to easily
> switch between languages while using the Bio* libraries is one of the
> biggest features. As far as I understand we're just changing module
> names here, so all that'd be different would be the import lines.
>
> After reading through this thread, I got the impression that there was
> a general agreement on switching to PEP8-compatible names eventually,
> and the remaining question was how to best do that.

Yes - hindered by the fact that due to file system limitations we can't
have multiple capitalisations of a given module at the same time.
Ideally we'd like to use bio.* as the namespace, and making this
switch as part of moving to Python 3 is one way to do that.

My personal preference is for a new lowercase namespace like
biopy.* or biopython.* which can co-exist with Bio.* during a
transition period. However, this did not seem popular.

> I haven't played with Python 3 much yet, but I have the impression
> that switching to it likely is going to be painful anyway. Even if the
> module renaming makes the transition a bit more painful, at least
> you've only got to go through the pain once.
>
> Assuming the translations between the 2.x and 3.x names can be done
> automatically by the conversion script, this sounds like a good idea.

That was my thinking - but it does go against the general advice
to library authors in that API changes from Python 2.x to 3.x are
discouraged.

We can of course stick with Bio.* as it is (which I believe is Brad's
favoured option). And I'm OK with this - it is the simplest option
(and doesn't prevent us doing some more minor changes if we
want to, such as reorganising all the Bio.SeqXXXX modules
under one directory).

Perhaps a blog post & email to the announcement mailing list
soliciting feedback on this proposal is the best way forward,
perhaps with a web-survey form? e.g.

(1) Keep the namespace as 'Bio'

(2) Keep the namespace as 'Bio' on Python 2,
but adopt all lowercase module names on Python 3.

(3) Move to a new all lowercase namespace like 'biopy'
(anything except 'bio'), allowing the current 'Bio' namespace
to continue to be available as well during a transition period.

And the most disruptive option:

(4) Switch to an all lowercase namespace 'bio', which
cannot in general co-exist with the old 'Bio' namespace
(perhaps bumping the version number to 2.0.0?). This
would break legacy scripts, which would need to be
updated, e.g.:

from Bio.SeqRecord import SeqRecord
from Bio import SeqIO

could be replaced by:

try:
    #Biopython 1.x uses Bio.*
    from Bio.SeqRecord import SeqRecord
    from Bio import SeqIO
except ImportError:


This would mean under Windows and most Mac install
you cannot have both
you (and all other users of the machine) m
must be remove

Regards,

Peter


From p.j.a.cock at googlemail.com  Fri Oct 26 12:43:36 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 13:43:36 +0100
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
	<CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
Message-ID: <CAKVJ-_49AKymz+kB=1vfU2NX6WcuKaeODnH9m1h2OXt2FjqMTQ@mail.gmail.com>

Arg - accidentally tabbed to the send button while trying to indent
sample code...

On Fri, Oct 26, 2012 at 1:42 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> Perhaps a blog post & email to the announcement mailing list
> soliciting feedback on this proposal is the best way forward,
> perhaps with a web-survey form? e.g.
>
> (1) Keep the namespace as 'Bio'
>
> (2) Keep the namespace as 'Bio' on Python 2,
> but adopt all lowercase module names on Python 3.
>
> (3) Move to a new all lowercase namespace like 'biopy'
> (anything except 'bio'), allowing the current 'Bio' namespace
> to continue to be available as well during a transition period.
>
> And the most disruptive option:
>
> (4) Switch to an all lowercase namespace 'bio', which
> cannot in general co-exist with the old 'Bio' namespace
> (perhaps bumping the version number to 2.0.0?). This
> would break legacy scripts, which would need to be
> updated, e.g.:
>
> from Bio.SeqRecord import SeqRecord
> from Bio import SeqIO
>
> could be replaced by:


try:
     #Biopython 1.x uses Bio.*
     from Bio.SeqRecord import SeqRecord
     from Bio import SeqIO
except ImportError:

>
>
>
>
> This would mean under Windows and most Mac install
> you cannot have both
> you (and all other users of the machine) m
> must be remove
>
> Regards,
>
> Peter


From p.j.a.cock at googlemail.com  Fri Oct 26 12:50:23 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 13:50:23 +0100
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_49AKymz+kB=1vfU2NX6WcuKaeODnH9m1h2OXt2FjqMTQ@mail.gmail.com>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
	<CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
	<CAKVJ-_49AKymz+kB=1vfU2NX6WcuKaeODnH9m1h2OXt2FjqMTQ@mail.gmail.com>
Message-ID: <CAKVJ-_484A0E4-cHYE2XT7FtDp04b8BW_QA89NTdKHNHskPWMw@mail.gmail.com>

On Fri, Oct 26, 2012 at 1:43 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Arg - accidentally tabbed to the send button while trying to indent
> sample code...

Has something changed on GoogleMail's keyboard handling?
Either that or I'm having a bad typing day... my apologies for
the two extra emails.

To continue:

Perhaps a blog post & email to the announcement mailing list
soliciting feedback on this proposal is the best way forward,
perhaps with a web-survey form? e.g.

(1) Keep the namespace as 'Bio'

(2) Keep the namespace as 'Bio' on Python 2,
but adopt all lowercase module names on Python 3.

(3) Move to a new all lowercase namespace like 'biopy'
(anything except 'bio'), allowing the current 'Bio' namespace
to continue to be available as well during a transition period.

And the most disruptive option:

(4) Switch to an all lowercase namespace 'bio', which
cannot in general co-exist with the old 'Bio' namespace
(perhaps bumping the version number to 2.0.0?). This
would break legacy scripts, which would need to be
updated, e.g.:

from Bio.SeqRecord import SeqRecord
from Bio import SeqIO

could be replaced by:

try:
    #Biopython 1.x uses Bio.*
    from Bio.SeqRecord import SeqRecord
    from Bio import SeqIO
except ImportError:
    #Try the new lowercase module names,
    from bio.seqrecord import SeqRecord
    from bio import seqio as SeqIO

Users on Windows and most Mac users might find updating
Biopython complicated during this transition due to the
change in case of the folder names. For anyone installing
from source this might require manual removal of the old
folders (I ran into this kind of issue while trying the lower
case naming under Python 3).

Potentially under Linux (and any Mac using a case sensitive
file system) an old Biopython install using Bio/ and the newer
Biopython using bio/ could co-exist... we would have to look
at that.
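
A minimal sketch of how one might check whether a given install location
could hold both Bio/ and bio/, assuming it is enough to probe the file
system with two names that differ only in case. The function name is
illustrative only:

```python
import os
import tempfile


def is_case_sensitive(directory):
    """Return True if 'directory' treats names differing only in case as distinct."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        # Create "Bio" and see whether "bio" resolves to the same entry.
        open(os.path.join(tmp, "Bio"), "w").close()
        return not os.path.exists(os.path.join(tmp, "bio"))


# Probe the default temporary directory as an example location.
result = is_case_sensitive(tempfile.gettempdir())
```

On a typical Windows or default Mac file system this would be expected to
report False, in which case the two folder names cannot co-exist.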

Regards,

Peter


From kai.blin at biotech.uni-tuebingen.de  Fri Oct 26 13:34:12 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 26 Oct 2012 15:34:12 +0200
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
	<CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
Message-ID: <508A9154.8020507@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-10-26 14:42, Peter Cock wrote:

> My personal preference is for a new lowercase namespace like 
> biopy.* or biopython.* which can co-exist with Bio.* during a 
> transition period. However, this did not seem popular.

That'd still mean older scripts would break after the transition
period, and we'll end up encoding the language name in the module,
which seems a bit silly.

Having said that, I see the least amount of pain for BioPython users
going that route, with the possibly larger maintenance headache for
BioPython developers.

I think this is one of these "what color do we paint the bikeshed"
discussions, where there really isn't any objectively superior solution.

> That was my thinking - but it does go against the general advice to
> library authors in that API changes from Python 2.x to 3.x are 
> discouraged.

Right, but from dealing with the python folks on Freenode IRC, I
gather that many of them assume the switch from Python 2.x to 3.x is a
very low-impact change for code authors. I tend to disagree there. :)

> We can of course stick with Bio.* as it is (which I believe is
> Brad's favoured option). And I'm OK with this - it is the simplest
> option (and doesn't prevent us doing some more minor changes if we 
> want to, such as reorganising all the Bio.SeqXXXX modules under one
> directory).

As I said, strong feeling of a bikeshed discussion here. :)

> Perhaps a blog post & email to the announcement mailing list 
> soliciting feedback on this proposal is the best way forward, 
> perhaps with a web-survey form? e.g.

To be honest, I don't care that much about which solution is decided
on, as long as the decision is made soon. I've got some programs that
need the HMMer2 parser that I've added to Bow's SearchIO code, and I'm
hoping to get that into BioPython soon instead of having to ship with
a custom BioPython for publication.

Cheers,
Kai

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQipFTAAoJEKM5lwBiwTTP4nkIAI5TegXeWy6b8FoPmq46XPzz
iVh6g0t37xAJ9Aat3aE5vDklF7yqEwcVPKxFkj2Nd2MLaDqhfnuldE9pEqbPmZfl
eQptF5JXTAlw/YKAPFzTyFSIlKv3wiuTiGeTxKJtXewOkgEu6VwzNgjPnCYhamaT
Nda7NQEA6mlmaH7ABwO1mLLObk7i90oqVNDIuhnOAAA1ZrVnnQ4QHRupbiLZVd3d
3od3JVM4h+ZT5AL12Lts9lAdrc94MVri5i0P1VSQEnAQV/LJ5uoT2a4l2DRFM35R
NR501X7ubTQPrK8ATveTWaCYYcn/XMnS7dEpvSWsxFR8oM+69LxF3UVtH2ShfDs=
=Teym
-----END PGP SIGNATURE-----


From eric.talevich at gmail.com  Fri Oct 26 15:19:23 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Fri, 26 Oct 2012 11:19:23 -0400
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CADEGkF5J0pHBoNaB1xKfQzwNYYzuKverV+1zt3EiEXVA0dEQKg@mail.gmail.com>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
	<CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
	<CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
	<508A6535.6070507@biotech.uni-tuebingen.de>
	<CAKVJ-_6pvuFqCc+EJVgP-GAnAN6XQ83Y7tqAE8NKU1j95qsEnA@mail.gmail.com>
	<CADEGkF5J0pHBoNaB1xKfQzwNYYzuKverV+1zt3EiEXVA0dEQKg@mail.gmail.com>
Message-ID: <CAMC681mXGXR05+g=wMfMcYB40oySoe2aomRRQvx5Y4doXFF3TA@mail.gmail.com>

On Fri, Oct 26, 2012 at 6:38 AM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:

> > >> Also, I think there are still some issues that need to be dealt
> > >> with before we put SearchIO into master, notably with Bio.BLAST
> > >> module. If not the official deprecation notice, at least the
> > >> tutorial has to be updated (let Bio.BLAST readers know about the
> > >> plan with SearchIO). I've written a short tutorial here:
> > >> http://bow.web.id/biopython/Tutorial.html. This is still a draft,
> > >> but you can already see that there are some obvious overlaps
> > >> between Bio.BLAST and Bio.SearchIO, which is confusing to new
> > >> readers.
> > >
> > > Personally I wouldn't let this consideration block the inclusion of a
> > > module as useful like that. Of course I need this code, so I'm biased.
> >
> > I'm also OK with merging the code before updating the Tutorial
> > chapter on BLAST (which would probably become a broader
> > chapter on BLAST and other tools using SearchIO). As discussed
> > before, the long term aim would be to remove Bio.BLAST.
>

Bio.Blast does contain some features beyond parsing the output of BLAST...


> > > I'll have to read up on the namespace discussion. While I see the
> > > benefit of using PEP8 names, intuitively I don't like bio.seq.search
> > > much. Then again, I started my life in Bio* with BioPerl, and like the
> > > pretty similar module layout BioPython has so far.
> >
> > Yeah - the current naming of SeqIO and AlignIO was directly
> > inspired by BioPerl, and gave the working name of SearchIO.
> >
> > Peter
>
> Reaching a unanimous decision on name preference seems difficult :/.
> We now have:
>
> 1. Bio.seq.search (in line with the namespace change)
> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used
> to be Bio.SeqSearch, now adjusted for PEP8 compliance)
> 3. Bio.search (same reasoning + explanation like Bio.seqsearch).
> 4. Bio.SearchIO / Bio.searchio
> 5. Bio.psearch (p for pairwise)
>
> Any other suggestions? Should we put it to a vote?
>
> regards,
> Bowo
>
>
If it's down to a vote, I would vote to merge this branch as Bio.SearchIO,
and perhaps lowercase it to Bio.searchio or biopy.searchio in the Py3
lowercase branch.

Rationale: We already follow BioPerl with SeqIO and AlignIO, and it seems
to help users. It's also Google-friendly.

-Eric


From p.j.a.cock at googlemail.com  Fri Oct 26 15:42:18 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 16:42:18 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CAMC681mXGXR05+g=wMfMcYB40oySoe2aomRRQvx5Y4doXFF3TA@mail.gmail.com>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
	<CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
	<CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
	<508A6535.6070507@biotech.uni-tuebingen.de>
	<CAKVJ-_6pvuFqCc+EJVgP-GAnAN6XQ83Y7tqAE8NKU1j95qsEnA@mail.gmail.com>
	<CADEGkF5J0pHBoNaB1xKfQzwNYYzuKverV+1zt3EiEXVA0dEQKg@mail.gmail.com>
	<CAMC681mXGXR05+g=wMfMcYB40oySoe2aomRRQvx5Y4doXFF3TA@mail.gmail.com>
Message-ID: <CAKVJ-_5JqiejShzbJemR2M9Qk_5KL88ayGbh3zGwWoski8DHPA@mail.gmail.com>

On Fri, Oct 26, 2012 at 4:19 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
>
> Bio.Blast does contain some features beyond parsing the output of BLAST...
>

Also wrappers to call the tools, and the online search.
Easy enough.

>> Reaching a unanimous decision on name preference seems difficult :/.
>> We now have:
>>
>> 1. Bio.seq.search (in line with the namespace change)
>> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used
>> to be Bio.SeqSearch, now adjusted for PEP8 compliance)
>> 3. Bio.search (same reasoning + explanation like Bio.seqsearch).
>> 4. Bio.SearchIO / Bio.searchio
>> 5. Bio.psearch (p for pairwise)
>>
>> Any other suggestions? Should we put it to a vote?
>>
>> regards,
>> Bowo
>>
>
> If it's down to a vote, I would vote to merge this branch as Bio.SearchIO,
> and perhaps lowercase it to Bio.searchio or biopy.searchio in the Py3
> lowercase branch.
>
> Rationale: We already follow BioPerl with SeqIO and AlignIO, and it
> seems to help users. It's also Google-friendly.

I like Bio.SearchIO for those reasons too. Perhaps that is the
most popular name?

Peter


From mjldehoon at yahoo.com  Fri Oct 26 15:58:04 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 26 Oct 2012 08:58:04 -0700 (PDT)
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CAKVJ-_5JqiejShzbJemR2M9Qk_5KL88ayGbh3zGwWoski8DHPA@mail.gmail.com>
Message-ID: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com>

> 1. Bio.seq.search (in line with the namespace change)
> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used
>    to be Bio.SeqSearch, now adjusted for PEP8 compliance)
> 3. Bio.search (same reasoning + explanation like Bio.seqsearch).
> 4. Bio.SearchIO / Bio.searchio
> 5. Bio.psearch (p for pairwise)

> If it's down to a vote, I would vote to merge this branch as
> Bio.SearchIO, and perhaps lowercase it to Bio.searchio or
> biopy.searchio in the Py3 lowercase branch.
> 
> Rationale: We already follow BioPerl with SeqIO and AlignIO, and it
> seems to help users. It's also Google-friendly.

I would vote for Bio.seq.search.
I don't like Bio.SearchIO much because a) it doesn't tell you clearly what the module is about; and b) I think it is a mistake to have Bio.SeqIO separate from Bio.Seq, and Bio.AlignIO separate from Bio.Align, because in both cases the two modules conceptually deal with the same thing. We don't have Bio.Cluster and Bio.ClusterIO, Bio.Entrez and Bio.EntrezIO, Bio.Motif and Bio.MotifIO; why should Bio.Seq and Bio.Align be different?

-Michiel.


From p.j.a.cock at googlemail.com  Fri Oct 26 16:14:22 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 17:14:22 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <CAKVJ-_5JqiejShzbJemR2M9Qk_5KL88ayGbh3zGwWoski8DHPA@mail.gmail.com>
	<1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_7XP6+30=rWfryObgJYKDXO9gts7hcUmTZfrCQbEUEbEQ@mail.gmail.com>

On Fri, Oct 26, 2012 at 4:58 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>> 1. Bio.seq.search (in line with the namespace change)
>> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used
>>    to be Bio.SeqSearch, now adjusted for PEP8 compliance)
>> 3. Bio.search (same reasoning + explanation like Bio.seqsearch).
>> 4. Bio.SearchIO / Bio.searchio
>> 5. Bio.psearch (p for pairwise)
>
>> If it's down to a vote, I would vote to merge this branch as
>> Bio.SearchIO, and perhaps lowercase it to Bio.searchio or
>> biopy.searchio in the Py3 lowercase branch.
>>
>> Rationale: We already follow BioPerl with SeqIO and AlignIO, and it
>> seems to help users. It's also Google-friendly.
>
> I would vote for Bio.seq.search.

And would you support moving other existing Bio.SeqXXX modules
under Bio.seq.* as for example outlined here?:
http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html
If so then I think we should go with that plan.

> I don't like Bio.SearchIO much because a) it doesn't tell you clearly
> what the module is about; and b) I think it is a mistake to have
> Bio.SeqIO separate from Bio.Seq, and Bio.AlignIO separate from
> Bio.Align, because in both cases the two modules conceptually deal
> with the same thing. We don't have Bio.Cluster and Bio.ClusterIO,
> Bio.Entrez and Bio.EntrezIO, Bio.Motif and Bio.MotifIO; why should
> Bio.Seq and Bio.Align be different?

After all, not everyone was exposed to BioPerl before Biopython ;)

Peter


From p.j.a.cock at googlemail.com  Fri Oct 26 21:19:28 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 22:19:28 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CAKVJ-_7XP6+30=rWfryObgJYKDXO9gts7hcUmTZfrCQbEUEbEQ@mail.gmail.com>
References: <CAKVJ-_5JqiejShzbJemR2M9Qk_5KL88ayGbh3zGwWoski8DHPA@mail.gmail.com>
	<1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAKVJ-_7XP6+30=rWfryObgJYKDXO9gts7hcUmTZfrCQbEUEbEQ@mail.gmail.com>
Message-ID: <CAKVJ-_6XT6EO9rJaJ9RJWiE7XTvtAGe4Fq54wfNmCt+Vj57gzg@mail.gmail.com>

On Fri, Oct 26, 2012 at 5:14 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Fri, Oct 26, 2012 at 4:58 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>>> 1. Bio.seq.search (in line with the namespace change)
>>> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used
>>>    to be Bio.SeqSearch, now adjusted for PEP8 compliance)
>>> 3. Bio.search (same reasoning + explanation like Bio.seqsearch).
>>> 4. Bio.SearchIO / Bio.searchio
>>> 5. Bio.psearch (p for pairwise)
>>
>>> If it's down to a vote, I would vote to merge this branch as
>>> Bio.SearchIO, and perhaps lowercase it to Bio.searchio or
>>> biopy.searchio in the Py3 lowercase branch.
>>>
>>> Rationale: We already follow BioPerl with SeqIO and AlignIO, and it
>>> seems to help users. It's also Google-friendly.
>>
>> I would vote for Bio.seq.search.
>
> And would you support moving other existing Bio.SeqXXX
> modules under Bio.seq.* as for example outlined here?:
> http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html
> If so then I think we should go with that plan.

I have started exploring that idea on this new branch,
https://github.com/peterjc/biopython/tree/bioseq

Does anyone object to me applying the first commit to the master
branch (defining the previously discussed new warning for 'beta' code)?
https://github.com/peterjc/biopython/commit/97485d5dcf2620f7664ae46a7897c1203847538d

Note that introducing Bio.seq now (and any relocations under this)
can (I believe) still be combined with the lower-case modules under
Python 3 idea as well. This just requires the public classes and
functions defined under Bio.Seq.* remain mirrored under Bio.seq.*
(this means assorted Seq objects and some functions like translate).
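
A minimal sketch of that kind of mirroring, using stand-in modules so it
runs without Biopython. The helper name and the placeholder Seq class and
translate function are illustrative assumptions, not the actual code on
the branch:

```python
import types


def mirror_public_names(source, target):
    """Copy every public attribute of 'source' onto 'target'.

    Uses source.__all__ when defined, otherwise every name without a
    leading underscore. Returns the sorted list of mirrored names.
    """
    names = getattr(source, "__all__", None)
    if names is None:
        names = [name for name in vars(source) if not name.startswith("_")]
    for name in names:
        setattr(target, name, getattr(source, name))
    return sorted(names)


# Stand-in modules: one plays the existing module, one the relocated one.
old_module = types.ModuleType("OldSeq")
new_module = types.ModuleType("newseq")


class Seq(object):
    """Placeholder for a public class."""


old_module.Seq = Seq
old_module.translate = lambda text: text  # placeholder function
old_module._private = "not mirrored"

mirrored = mirror_public_names(old_module, new_module)
```

Because the same objects are bound under both names, isinstance checks
keep working whichever import path a script used.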

Peter


From w.arindrarto at gmail.com  Fri Oct 26 22:43:45 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Sat, 27 Oct 2012 00:43:45 +0200
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CAKVJ-_6XT6EO9rJaJ9RJWiE7XTvtAGe4Fq54wfNmCt+Vj57gzg@mail.gmail.com>
References: <CAKVJ-_5JqiejShzbJemR2M9Qk_5KL88ayGbh3zGwWoski8DHPA@mail.gmail.com>
	<1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAKVJ-_7XP6+30=rWfryObgJYKDXO9gts7hcUmTZfrCQbEUEbEQ@mail.gmail.com>
	<CAKVJ-_6XT6EO9rJaJ9RJWiE7XTvtAGe4Fq54wfNmCt+Vj57gzg@mail.gmail.com>
Message-ID: <CADEGkF7GLd6eEzXec-5W_U4s9F2THe0wknyYpb2ptSY6oFH1aQ@mail.gmail.com>

>>>> 1. Bio.seq.search (in line with the namespace change)
>>>> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used
>>>>    to be Bio.SeqSearch, now adjusted for PEP8 compliance)
>>>> 3. Bio.search (same reasoning + explanation like Bio.seqsearch).
>>>> 4. Bio.SearchIO / Bio.searchio
>>>> 5. Bio.psearch (p for pairwise)
>>>
>>>> If it's down to a vote, I would vote to merge this branch as
>>>> Bio.SearchIO, and perhaps lowercase it to Bio.searchio or
>>>> biopy.searchio in the Py3 lowercase branch.
>>>>
>>>> Rationale: We already follow BioPerl with SeqIO and AlignIO, and it
>>>> seems to help users. It's also Google-friendly.
>>>
>>> I would vote for Bio.seq.search.
>>
>> And would you support moving other existing Bio.SeqXXX
>> modules under Bio.seq.* as for example outlined here?:
>> http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html
>> If so then I think we should go with that plan.
>
> I have started exploring that idea on this new branch,
> https://github.com/peterjc/biopython/tree/bioseq
>
> Does anyone object to me applying the first commit to the master
> branch (defining the previously discussed new warning for 'beta' code)?
> https://github.com/peterjc/biopython/commit/97485d5dcf2620f7664ae46a7897c1203847538d

No objection from me for the commit :).

But I have some concerns for the SearchIO naming. I like Bio.seqsearch
best at the moment. Bio.seq.search is good, but I understand that
Bio.SearchIO will eventually contain app wrappers and code for remote
searches as well. Putting it three levels-deep doesn't feel nice to
me. As comparisons, submodules with similar features (Bio.Phylo, and
possibly Bio.AlignIO, if in the future it will be merged with
alignment app wrappers and the alignment object model) are available
under Bio.

> Note that introducing Bio.seq now (and any relocations under this)
> can (I believe) still be combined with the lower-case modules under
> Python 3 idea as well. This just requires the public classes and
> functions defined under Bio.Seq.* remain mirrored under Bio.seq.*
> (this means assorted Seq objects and some functions like translate).
>
> Peter

regards,
Bow


From p.j.a.cock at googlemail.com  Sat Oct 27 00:54:47 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 27 Oct 2012 01:54:47 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CADEGkF7GLd6eEzXec-5W_U4s9F2THe0wknyYpb2ptSY6oFH1aQ@mail.gmail.com>
References: <CAKVJ-_5JqiejShzbJemR2M9Qk_5KL88ayGbh3zGwWoski8DHPA@mail.gmail.com>
	<1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAKVJ-_7XP6+30=rWfryObgJYKDXO9gts7hcUmTZfrCQbEUEbEQ@mail.gmail.com>
	<CAKVJ-_6XT6EO9rJaJ9RJWiE7XTvtAGe4Fq54wfNmCt+Vj57gzg@mail.gmail.com>
	<CADEGkF7GLd6eEzXec-5W_U4s9F2THe0wknyYpb2ptSY6oFH1aQ@mail.gmail.com>
Message-ID: <CAKVJ-_7jGGqNH9LK6aosTCnJznEN4oaNB+6w6m2P02WGz3czSw@mail.gmail.com>

On Fri, Oct 26, 2012 at 11:43 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
>Peter wrote:
>> I have started exploring that idea on this new branch,
>> https://github.com/peterjc/biopython/tree/bioseq
>>
>> Does anyone object to me applying the first commit to the master
>> branch (defining the previously discussed new warning for 'beta' code)?
>> https://github.com/peterjc/biopython/commit/97485d5dcf2620f7664ae46a7897c1203847538d
>
> No objection from me for the commit :).
>
> But I have some concerns for the SearchIO naming. I like Bio.seqsearch
> best at the moment. Bio.seq.search is good, but I understand that
> Bio.SearchIO will eventually contain app wrappers and code for remote
> searches as well. Putting it three levels-deep doesn't feel nice to
> me. As comparisons, submodules with similar features (Bio.Phylo, and
> possibly Bio.AlignIO, if in the future it will be merged with
> alignment app wrappers and the alignment object model) are available
> under Bio.

I think we'd get used to the nested namespace pretty quickly, and
this really only affects the import line anyway, e.g. something like
this isn't so bad as long as we document this:

from Bio.seq.search.apps import BlatCommandLine

If the namespace nesting bothers you, then you might not like
my thoughts for how to combine Bio.Align and Bio.AlignIO
(since we can't use Bio.align due to the folder name clash on
case-insensitive platforms): I was wondering about using
Bio.seq.align for this, which again is a bit nested but would
make it a sister module to Bio.seq.search (aka SearchIO)
and Bio.seq.record (which could include the former SeqIO
code as well as the SeqRecord class).

Peter


From eric.talevich at gmail.com  Sat Oct 27 04:03:46 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sat, 27 Oct 2012 00:03:46 -0400
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CAKVJ-_7jGGqNH9LK6aosTCnJznEN4oaNB+6w6m2P02WGz3czSw@mail.gmail.com>
References: <CAKVJ-_5JqiejShzbJemR2M9Qk_5KL88ayGbh3zGwWoski8DHPA@mail.gmail.com>
	<1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAKVJ-_7XP6+30=rWfryObgJYKDXO9gts7hcUmTZfrCQbEUEbEQ@mail.gmail.com>
	<CAKVJ-_6XT6EO9rJaJ9RJWiE7XTvtAGe4Fq54wfNmCt+Vj57gzg@mail.gmail.com>
	<CADEGkF7GLd6eEzXec-5W_U4s9F2THe0wknyYpb2ptSY6oFH1aQ@mail.gmail.com>
	<CAKVJ-_7jGGqNH9LK6aosTCnJznEN4oaNB+6w6m2P02WGz3czSw@mail.gmail.com>
Message-ID: <CAMC681m5bXWPt48EH2Yrhu9b2F2kUTQVkkmRAJeONsJ--0GJjA@mail.gmail.com>

On Fri, Oct 26, 2012 at 8:54 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

>
> If the namespace nesting bothers you, then you might not like
> my thoughts for how to combine Bio.Align and Bio.AlignIO
> (since we can't use Bio.align due to the folder name clash on
> case-insensitive platforms): I was wondering about using
> Bio.seq.align for this, which again is a bit nested but would
> make it a sister module to Bio.seq.search (aka SearchIO)
> and Bio.seq.record (which could include the former SeqIO
> code as well as the SeqRecord class).
>
>
Does that mean we'd have read, write, convert, etc. under Bio.seq.record?
This is how that API would look:

from Bio.seq import record
for rec in record.parse("example.fa", "fasta"): ...

As opposed to:

# Minor change
from Bio import seqio
for record in seqio.parse(...)

# Make sure we get those relative imports right!
from Bio.seq import io
for record in io.parse(...)

# Slight cognitive distance, but maybe worth it
from Bio import seq
for record in seq.parse(...)


Also: Technically, Bio.Motif operates on multiple sequence alignments, so
it could be moved to Bio.seq.align.motif. (Not entirely trolling here, just
pointing out possible consequences.)

-Eric


From w.arindrarto at gmail.com  Sat Oct 27 05:55:27 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Sat, 27 Oct 2012 07:55:27 +0200
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CAMC681m5bXWPt48EH2Yrhu9b2F2kUTQVkkmRAJeONsJ--0GJjA@mail.gmail.com>
References: <CAKVJ-_5JqiejShzbJemR2M9Qk_5KL88ayGbh3zGwWoski8DHPA@mail.gmail.com>
	<1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAKVJ-_7XP6+30=rWfryObgJYKDXO9gts7hcUmTZfrCQbEUEbEQ@mail.gmail.com>
	<CAKVJ-_6XT6EO9rJaJ9RJWiE7XTvtAGe4Fq54wfNmCt+Vj57gzg@mail.gmail.com>
	<CADEGkF7GLd6eEzXec-5W_U4s9F2THe0wknyYpb2ptSY6oFH1aQ@mail.gmail.com>
	<CAKVJ-_7jGGqNH9LK6aosTCnJznEN4oaNB+6w6m2P02WGz3czSw@mail.gmail.com>
	<CAMC681m5bXWPt48EH2Yrhu9b2F2kUTQVkkmRAJeONsJ--0GJjA@mail.gmail.com>
Message-ID: <CADEGkF4nOFCGkiLvpre2KOeWbjf_zFgH2kpCavxwc_pOCEUq6g@mail.gmail.com>

>> If the namespace nesting bothers you, then you might not like
>> my thoughts for how to combine Bio.Align and Bio.AlignIO
>> (since we can't use Bio.align due to the folder name clash on
>> case-insensitive platforms): I was wondering about using
>> Bio.seq.align for this, which again is a bit nested but would
>> make it a sister module to Bio.seq.search (aka SearchIO)
>> and Bio.seq.record (which could include the former SeqIO
>> code as well as the SeqRecord class).
>>
> Does that mean we'd have read, write, convert, etc. under Bio.seq.record?
> This is how that API would look:
>
> from Bio.seq import record
> for rec in record.parse("example.fa", "fasta"): ...
>
> As opposed to:
>
> # Minor change
> from Bio import seqio
> for record in seqio.parse(...)
>
> # Make sure we get those relative imports right!
> from Bio.seq import io
> for record in io.parse(...)
>
> # Slight cognitive distance, but maybe worth it
> from Bio import seq
> for record in seq.parse(...)
>
>
> Also: Technically, Bio.Motif operates on multiple sequence alignments, so it
> could be moved to Bio.seq.align.motif. (Not entirely trolling here, just
> pointing out possible consequences.)
>
> -Eric

What bothers me other than it being hidden is also the inconsistency
(comparing it to the current namespace). However, if there is also a
plan to merge sequence-related submodules under Bio.seq, it feels
better and I'm ok with it. Still hidden, but we'll have more
consistency and the root namespace will have less clutter.

So it would look like this (with previously mentioned examples):

Bio.SearchIO -> Bio.seq.search
Bio.AlignIO -> Bio.seq.align
Bio.Motif -> Bio.seq.motif
Bio.SeqIO -> Bio.seq (or merge with Bio.SeqRecord into Bio.seq.record)
Bio.SeqRecord -> Bio.seq.record
Bio.SeqUtils -> Bio.seq.utils
Bio.SeqFeature -> Bio.seq.feature

Also maybe:
Bio.Alphabet -> Bio.seq.alphabet
Bio.Restriction  -> Bio.seq.restriction or Bio.seq.utils.restriction

And Eric is right, we may go further with Bio.seq.align.motif, but I
think nesting sequence-related modules under Bio.seq is the furthest
we should go. I personally find it the most intuitive.

regards,
Bow


From mjldehoon at yahoo.com  Sat Oct 27 10:46:10 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 27 Oct 2012 03:46:10 -0700 (PDT)
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CAKVJ-_7XP6+30=rWfryObgJYKDXO9gts7hcUmTZfrCQbEUEbEQ@mail.gmail.com>
Message-ID: <1351334770.89984.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Hi everybody,

--- On Fri, 10/26/12, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> And would you support moving other existing Bio.SeqXXX
> modules under Bio.seq.* as for example outlined here?:
> http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html

Yes that looks good to me.

> I'm not 100% sure where the Bio.SeqIO top level functions
> would belong, either directly under Bio.seq or Bio.seq.record
> might work too.

I would prefer to have the top-level functions directly under Bio.seq, since they will be used a lot.

Best,
-Michiel.


From mjldehoon at yahoo.com  Sat Oct 27 10:47:43 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 27 Oct 2012 03:47:43 -0700 (PDT)
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CADEGkF4nOFCGkiLvpre2KOeWbjf_zFgH2kpCavxwc_pOCEUq6g@mail.gmail.com>
Message-ID: <1351334863.39503.YahooMailClassic@web164006.mail.gq1.yahoo.com>

--- On Sat, 10/27/12, Wibowo Arindrarto <w.arindrarto at gmail.com> wrote:
> And Eric is right, we may go further with Bio.seq.align.motif, but I
> think nesting sequence-related modules under Bio.seq is the furthest
> we should go. I personally find it the most intuitive.

I agree. And according to the Zen of Python, flat is better than nested.

Best,
-Michiel.


From bartek at rezolwenta.eu.org  Sat Oct 27 12:55:12 2012
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Sat, 27 Oct 2012 14:55:12 +0200
Subject: [Biopython-dev] Parsing TRANSFAC matrices with Bio.Motif
In-Reply-To: <1351339617.3402.YahooMailClassic@web164005.mail.gq1.yahoo.com>
References: <CABHxouWroakKxFvTQK-5y=FvOeXc_7bLHNnCYnz3wgAup_c_jg@mail.gmail.com>
	<1351339617.3402.YahooMailClassic@web164005.mail.gq1.yahoo.com>
Message-ID: <CABHxouUk5jPrhP8w-KTvuDhJCeguEVXR=4JO-dbbetZr9q5BjA@mail.gmail.com>

Hi Michiel,

On Sat, Oct 27, 2012 at 2:06 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> Actually I was thinking about the suggestions for Bio.Motif I made earlier (see http://lists.open-bio.org/pipermail/biopython-dev/2012-September/009946.html). Right now they are just ideas, so I haven't implemented them yet. You mentioned in your reply last month:
>
>> I'll try to come up with a more thought through and longer response
>> later in the week...
>

Absolutely. It's just that I had quite a crazy time lately (time spent
writing proposals and other such stuff...) and I didn't really think
too much about Bio.Motif.

> So I was wondering if you have any additional comments on these suggestions, or if I can go ahead and start implementing.
>

I'm sorry if my inactivity has slowed things down. I'll try to be more
constructive this time.

I think one thing that is clear is that Bio.Motif could use some code
optimization, especially in the area of PWM searching. Honestly, I
don't think there will be a time in the foreseeable future when I'll
do it, so if you feel like implementing better code for PWM
handling/searching, I'll be happy to do some code review or testing.

There are a few things I think would be good to keep:
- possibility to invoke motif.pwm_search(...) without worrying about
the fact that it is actually carried out by some specialized class
- possibility to determine motif thresholds based on fpr or fnr as
currently implemented in Bio.Motif.Thresholds module
- possibility to convert count based motifs to PWM based motifs
without much fuss...

All of these things are not really in conflict with your idea of
moving the PWM related code to the special class, so if you want to do
that, go ahead.
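
For illustration only, the third point above (converting a count-based motif to a PWM-based one) can be sketched in plain Python. This is not the Bio.Motif API: the function name counts_to_pwm, the pseudocount of 0.5, and the uniform background are all assumptions made for the example.

```python
import math


def counts_to_pwm(counts, pseudocount=0.5, background=None):
    """Convert per-position letter counts to log2-odds scores.

    counts: list of dicts, one per motif position, mapping letter -> count.
    Hypothetical helper for illustration; not part of Bio.Motif.
    """
    letters = sorted(counts[0])
    if background is None:
        # Assumed uniform background over the letters seen in the counts.
        background = {letter: 1.0 / len(letters) for letter in letters}
    pwm = []
    for column in counts:
        # Pseudocounts keep zero counts from producing log(0).
        total = sum(column.values()) + pseudocount * len(letters)
        pwm.append({
            letter: math.log2(
                (column[letter] + pseudocount) / total / background[letter]
            )
            for letter in letters
        })
    return pwm


counts = [
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 2, "C": 2, "G": 2, "T": 2},
]
pwm = counts_to_pwm(counts)
```

A column with equal counts for every letter scores zero everywhere under a uniform background, while a column dominated by one letter scores that letter positively and the others negatively.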

If you also have trouble finding time to implement these improvements,
I could try to recruit some master student from our department to do
that. But if you have time to do the implementation yourself, it will
probably be better and faster that way.

best
Bartek

-- 
Bartek Wilczynski


From mjldehoon at yahoo.com  Sun Oct 28 02:47:15 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 27 Oct 2012 19:47:15 -0700 (PDT)
Subject: [Biopython-dev] Parsing TRANSFAC matrices with Bio.Motif
In-Reply-To: <CABHxouUk5jPrhP8w-KTvuDhJCeguEVXR=4JO-dbbetZr9q5BjA@mail.gmail.com>
Message-ID: <1351392435.42713.YahooMailClassic@web164006.mail.gq1.yahoo.com>

Hi Bartek,

OK, thanks!
I'll go ahead with the implementation then, and write an update to the mailing list again so people can have a look at it.

Best,
-Michiel.



From chapmanb at 50mail.com  Sun Oct 28 18:55:31 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sun, 28 Oct 2012 14:55:31 -0400
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
	automated programs
In-Reply-To: <CAChfGK0C=-facUZeq_Aqd4LS=NFCE8iBmtcTt=OmL7-L62PUew@mail.gmail.com>
References: <CAChfGK0C=-facUZeq_Aqd4LS=NFCE8iBmtcTt=OmL7-L62PUew@mail.gmail.com>
Message-ID: <87sj8ys9y4.fsf@fastmail.fm>


Connor;

> I remember this being resolved, but when I try to install biopython with
> pip, it fails:

Thanks for the report. It looks like the command line options pip uses
to call setup.py changed a bit, so the hack we have in place is no
longer working. I pushed a fix for this:

https://github.com/chapmanb/biopython/commit/e05a355e3e9825c44c4a9b3bdfdda25c9a92c9c4

which seems to resolve the issue and hopefully make it more robust going
forward. Could you confirm it works on your system:

$ cd /tmp
$ git clone git://github.com/chapmanb/biopython.git
$ sudo pip install /tmp/biopython

If so, I'll push this into the main repo for the next release. Thanks
again for letting us know about the problem,
Brad


From chapmanb at 50mail.com  Sun Oct 28 19:02:54 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sun, 28 Oct 2012 15:02:54 -0400
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
	<CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
Message-ID: <87pq42s9lt.fsf@fastmail.fm>


Peter and all;
Interesting discussion on the module path issues. I agree with
everyone that it would be nice to be pep8 compliant. However, my vote
would be to stick with our traditional namespace to avoid widespread
breakage. The changes everyone is proposing are nice, but not nice
enough to deal with introducing an incompatible version and the
documentation and help fallout from that.

If everyone wants to go down the module name path, it would be worth
investing in a biopython1to2 script that automatically handles the
renamings for folks.
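
A minimal sketch of what such a renaming helper might do, assuming the old-to-new mapping floated in the "Status of SearchIO" thread. The mapping is only a proposal from that discussion, and rewrite_imports is a hypothetical name, not an existing Biopython tool. A real tool would presumably work on the syntax tree, as 2to3 does, rather than on regular expressions.

```python
import re

# Proposed renames taken from the list discussion; nothing here is final.
RENAMES = {
    "Bio.SearchIO": "Bio.seq.search",
    "Bio.AlignIO": "Bio.seq.align",
    "Bio.SeqRecord": "Bio.seq.record",
    "Bio.SeqUtils": "Bio.seq.utils",
    "Bio.SeqFeature": "Bio.seq.feature",
}

# Match only whole dotted names: not preceded by a word character or a dot,
# and not followed by a word character. Longest names are tried first.
_PATTERN = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(old) for old in sorted(RENAMES, key=len, reverse=True))
    + r")(?!\w)"
)


def rewrite_imports(source):
    """Return source with fully qualified old module names replaced."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)


example = "from Bio.SeqUtils import GC\nimport Bio.SearchIO\n"
converted = rewrite_imports(example)
```

This only handles fully qualified names. The common form "from Bio import SeqIO" is left untouched, which is one reason a regular-expression approach would not be enough on its own.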

Just my 2 cents,
Brad


From p.j.a.cock at googlemail.com  Mon Oct 29 08:15:59 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 29 Oct 2012 08:15:59 +0000
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <87pq42s9lt.fsf@fastmail.fm>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
	<CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
	<87pq42s9lt.fsf@fastmail.fm>
Message-ID: <CAKVJ-_4GzU+5vMXd1XLvycV=tK6xcgMoSA53cjNmYC4fGoPM6w@mail.gmail.com>

On Sunday, October 28, 2012, Brad Chapman wrote:

>
> Peter and all;
> Interesting discussion on the module path issues. I agree with
> everyone that it would be nice to be pep8 compliant. However, my vote
> would be to stick with our traditional namespace to avoid widespread
> breakage. The changes everyone is proposing are nice, but not nice
> enough to deal with introducing an incompatible version and the
> documentation and help fallout from that.
>
> If everyone wants to go down the module name path, it would be worth
> investing in a biopython1to2 script that automatically handles the
> renamings for folks.
>
> Just my 2 cents,
> Brad
>

Hi Brad,

In the case of Bow's SearchIO code, what would you prefer?
e.g. Bio.SearchIO as it is now on his branch?

Peter


From kai.blin at biotech.uni-tuebingen.de  Mon Oct 29 10:26:03 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Mon, 29 Oct 2012 11:26:03 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
	<CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
	<CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
Message-ID: <508E59BB.1050705@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-10-26 11:33, Wibowo Arindrarto wrote:

Hi Bow, Peter,

> For the merge conflict, which branch are you using? Can you point
> to specific commits that cause the conflicts? I haven't tried
> merging / rebasing my own branch to the current master myself ~ so
> knowing this should help the process as well.

Disregarding the namespace discussion, I needed to get a reasonable
branch to get my HMMer2 parser up-to-date in. As I said last week I
tried rebasing Bow's searchio branch and had a bunch of merge conflicts.

I've retried the rebase today, and most of the merge conflicts are
actually pretty trivial and mostly around the question where the code
gets its OrderedDict from for python versions < 2.7.

I've pushed the rebased patchset to
https://github.com/kblin/biopython/tree/searchio-rebase if anybody
wants to have a look. With the last patch fixing an error I seem to
have introduced during merge conflict resolution, the SearchIO tests
pass on that branch.

Cheers,
Kai

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQjlm7AAoJEKM5lwBiwTTPFe8IAMMLmM2kQmb9vOSCuNjbcIfJ
HqzzvLaw8Eo44uEb0zmxhuJwPoPZpdZIWCNM1t3LpynaE3mHawLcrYJTT/R1YxkS
udBHvMlU6h76J93NITWCzFZ7HHlMMrbzyPel7rifWXbv5xpG2BREpmr1V7lKmbH7
XbInPsVP0PjySFlCQb3219M+IZ4fA+ViYSBlQeXs91G1YzMVo6nkDcs+FkDG8mJt
Qg2u4Bhrxaf3qQKNuQzb2AHJ4KpnEkYsTI2FUJfHaulNfN6w9HwsEgyvM6hVqONP
4aIYlsbSlLjbGG3sdliibPJy5A+8AnkNSFlAHydL+FgBVmPqo3Xe0O5buTdz3Vs=
=prZo
-----END PGP SIGNATURE-----


From cmccoy at fhcrc.org  Mon Oct 29 15:24:45 2012
From: cmccoy at fhcrc.org (Connor McCoy)
Date: Mon, 29 Oct 2012 08:24:45 -0700
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <87sj8ys9y4.fsf@fastmail.fm>
References: <CAChfGK0C=-facUZeq_Aqd4LS=NFCE8iBmtcTt=OmL7-L62PUew@mail.gmail.com>
	<87sj8ys9y4.fsf@fastmail.fm>
Message-ID: <CAChfGK3jP-1vvHCOn7+HC8omhUNyMJMVvq369f=4H307SrO-yg@mail.gmail.com>

Hi Brad,

Thank you so much for the quick reply.  I just got a chance to test this,
and it seems to be working again.

Best,
Connor

On Sun, Oct 28, 2012 at 11:55 AM, Brad Chapman <chapmanb at 50mail.com> wrote:

>
> Connor;
>
> > I remember this being resolved, but when I try to install biopython with
> > pip, it fails:
>
> Thanks for the report. It looks like the command line options pip uses
> to call setup.py changed a bit, so the hack we have in place is no
> longer working. I pushed a fix for this:
>
>
> https://github.com/chapmanb/biopython/commit/e05a355e3e9825c44c4a9b3bdfdda25c9a92c9c4
>
> which seems to resolve the issue and hopefully make it more robust going
> forward. Could you confirm it works on your system:
>
> $ cd /tmp
> $ git clone git://github.com/chapmanb/biopython.git
> $ sudo pip install /tmp/biopython
>
> If so, I'll push this into the main repo for the next release. Thanks
> again for letting us know about the problem,
> Brad
>


-- 
Connor McCoy
Fred Hutchinson Cancer Research Center
1100 Fairview Ave N.
Seattle, WA 98109-1924
cmccoy at fhcrc.org


From chapmanb at 50mail.com  Mon Oct 29 17:54:30 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 29 Oct 2012 13:54:30 -0400
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_4GzU+5vMXd1XLvycV=tK6xcgMoSA53cjNmYC4fGoPM6w@mail.gmail.com>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
	<CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
	<87pq42s9lt.fsf@fastmail.fm>
	<CAKVJ-_4GzU+5vMXd1XLvycV=tK6xcgMoSA53cjNmYC4fGoPM6w@mail.gmail.com>
Message-ID: <874nldqi3t.fsf@fastmail.fm>


Peter;

> In the case of Bow's SearchIO code, what would you prefer?
> e.g. Bio.SearchIO as it is now on his branch?

I like plain ol' Search the best but don't have a strong preference. I'm
terrible at naming things so trust everyone's judgment on this.

Brad


From w.arindrarto at gmail.com  Mon Oct 29 20:11:09 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Mon, 29 Oct 2012 21:11:09 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <508E59BB.1050705@biotech.uni-tuebingen.de>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
	<CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
	<CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
	<508E59BB.1050705@biotech.uni-tuebingen.de>
Message-ID: <CADEGkF7LtqRbD_wUA5F782Wc=_hmSXhrQ8NjeZ9tC2AWW=qiXw@mail.gmail.com>

Hi Kai,

> > For the merge conflict, which branch are you using? Can you point
> > to specific commits that cause the conflicts? I haven't tried
> > merging / rebasing my own branch to the current master myself ~ so
> > knowing this should help the process as well.
>
> Disregarding the namespace discussion, I needed to get a reasonable
> branch to get my HMMer2 parser up-to-date in. As I said last week I
> tried rebasing Bow's searchio branch and had a bunch of merge conflicts.
>
> I've retried the rebase today, and most of the merge conflicts are
> actually pretty trivial and mostly around the question where the code
> gets its OrderedDict from for python versions < 2.7.
>
> I've pushed the rebased patchset to
> https://github.com/kblin/biopython/tree/searchio-rebase if anybody
> wants to have a look. With the last patch fixing an error I seem to
> have introduced during merge conflict resolution, the SearchIO tests
> pass on that branch.

Thanks for doing the rebase :)! I just checked it and everything looks
fine; all unit tests + doctests pass.

On another note, I was wondering about how to combine this rebased
branch with my local branch. Is there a simple way to apply the
changes in the rebased branch to my local working searchio branch or
should I just switch to a local checkout of the rebased branch?

regards,
Bow


From kai.blin at biotech.uni-tuebingen.de  Mon Oct 29 20:43:49 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Mon, 29 Oct 2012 21:43:49 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
Message-ID: <508EEA85.6060906@biotech.uni-tuebingen.de>

Hi Bow,

I've been looking closer at the SearchIO API changes introduced in
August. I think there still is a design problem with the object model,
at least when looking at how this affects the hmmer3 parser (and affects
the hmmer2 parsing as well).

Possibly I'm not seeing the big picture here, so let me explain what I'm
seeing, and then you can tell me what I missed. :)

So, the hmmer2 and hmmer3 file format basically looks like this

# header
# ...
# ...

information about the query

list of hits

list of hsps

(alignments for hsps)

(some statistics)
//

Now, when parsing this file line-wise, you obviously run into the hits
first. However, with the new API, you can't create a Hit object without
knowing the HSPs, but you haven't read them yet.

To work around this, you need to create a fake hit object
(https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/HmmerIO/hmmer3_text.py#L201).
In the loop that creates the fake hit objects, one of the exit
conditions parses the HSP entries and then replaces the fake hit
objects with "real" Hit objects.
(https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/HmmerIO/hmmer3_text.py#L188)

By the way, that code is a bit misleading. Took me a while to notice the
switch of the list's contents. Anyway, back to business.

So basically you need to create two hit objects for every hit you're
looking at. What's the advantage of forcing Hsp objects to be passed to
the Hit constructor? Just to make sure your Hit objects have a valid Hsp
at some later point?
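
To make the workaround concrete, here is a simplified sketch of the pattern described above. It does not use the real SearchIO classes: Hit and parse_block are stand-ins invented for the example, and the only assumption carried over is that a Hit cannot be constructed without its HSPs. The hit table rows are held as plain dictionaries until the HSP section has been read.

```python
class Hit:
    """Stand-in for a hit object that requires at least one HSP."""

    def __init__(self, hit_id, description, hsps):
        if not hsps:
            raise ValueError("a Hit needs at least one HSP")
        self.id = hit_id
        self.description = description
        self.hsps = list(hsps)


def parse_block(hit_rows, hsp_rows):
    """Build Hit objects once both sections of a result block are read.

    hit_rows: (hit_id, description) tuples, in file order.
    hsp_rows: (hit_id, hsp) tuples, read later in the file.
    """
    # First pass: remember hit-level data without creating Hit objects.
    pending = {}
    order = []
    for hit_id, description in hit_rows:
        pending[hit_id] = {"description": description, "hsps": []}
        order.append(hit_id)
    # Second pass: attach each HSP to the placeholder of its hit.
    for hit_id, hsp in hsp_rows:
        pending[hit_id]["hsps"].append(hsp)
    # Only now are the real objects constructed, in the original order.
    return [
        Hit(hit_id, pending[hit_id]["description"], pending[hit_id]["hsps"])
        for hit_id in order
    ]


hits = parse_block(
    [("h1", "first hit"), ("h2", "second hit")],
    [("h1", "hsp-a"), ("h2", "hsp-b"), ("h1", "hsp-c")],
)
```

The placeholder dictionaries play the role of the "fake" hits, so each Hit is still constructed only once, at the cost of holding the hit table in memory until the HSPs arrive.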

I'm aware that I'm just looking at the SearchIO design from the
perspective of the hmmer2 parser, but I'd like to understand the reasons
for the API being the way it currently is.

Hope you can shed some light on this,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From kai.blin at biotech.uni-tuebingen.de  Mon Oct 29 20:47:11 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Mon, 29 Oct 2012 21:47:11 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CADEGkF7LtqRbD_wUA5F782Wc=_hmSXhrQ8NjeZ9tC2AWW=qiXw@mail.gmail.com>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
	<CAKVJ-_4Y8G8wTYFqZ0uMCnRXbb1GSia+QLR8fHb4ea2Opu9=dw@mail.gmail.com>
	<CADEGkF5zp4Kd2OyQchw16JByNTsjc5LBrG=js8APSB8XFa7i=g@mail.gmail.com>
	<508E59BB.1050705@biotech.uni-tuebingen.de>
	<CADEGkF7LtqRbD_wUA5F782Wc=_hmSXhrQ8NjeZ9tC2AWW=qiXw@mail.gmail.com>
Message-ID: <508EEB4F.7050607@biotech.uni-tuebingen.de>

On 2012-10-29 21:11, Wibowo Arindrarto wrote:

Hi Bow,

> On another note, I was wondering about how to combine this rebased
> branch with my local branch. Is there a simple way to apply the
> changes in the rebased branch to my local working searchio branch or
> should I just switch to a local checkout of the rebased branch?

Well, you could rebase your local changes on top of the rebased branch. :)
Or, depending on how many changes you have in your local branch, check
out the rebased branch and then git cherry-pick your changes on top of
the rebased branch.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From w.arindrarto at gmail.com  Mon Oct 29 22:55:19 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Mon, 29 Oct 2012 23:55:19 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <508EEA85.6060906@biotech.uni-tuebingen.de>
References: <508EEA85.6060906@biotech.uni-tuebingen.de>
Message-ID: <CADEGkF69XmC-ShWHpyhAcJBY0ZuUCAjXzQhE=FwCxCMXaUrFng@mail.gmail.com>

Hi Kai,

Thanks for the input & comments! I made the API change mainly because
I want to keep the SearchIO object hierarchy more consistent, i.e.
there should be as few places as possible to make changes that break
the model.

There are several attributes that should remain the same between a
single QueryResult object and the Hits, HSPs, and HSPFragments it
contains. For now, these attributes are the ID (both query and hit ID)
and description (also for both query and hit). In the old API, each
object in the object model hierarchy stores these values as its own
attribute. For example, to store the ID of the Hit object, the old API
has the 'id' attribute in the Hit object, 'hit_id' attribute in all
HSP objects it contains, and 'hit_id' attributes in all HSPFragments
contained by each HSP in the Hit. I see this as unnecessary
duplication and a possible source of confusion, since these
attributes are completely decoupled from one another even though they
mean the same thing.

The new API stores these values only at the innermost object in
the hierarchy (the HSPFragment), reducing duplication and possible
sources of inconsistency. When you access the attributes from
objects other than the HSPFragment, a getter retrieves the value from
one of the contained HSPFragment objects, after ensuring that all of
them contain the same value of the attribute
(https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/_utils.py#L99).
Similarly, when you set the attribute, a setter applies the new value
to all HSPFragment objects contained
(https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/_utils.py#L106).
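
As a rough illustration of the getter/setter idea (not the actual code in _utils.py, whose details are not reproduced here), the sketch below uses stand-in Fragment and Hit classes. The class names and the flat fragment list are assumptions made for the example.

```python
class Fragment:
    """Stand-in for the innermost object, the only place the ID is stored."""

    def __init__(self, hit_id):
        self.hit_id = hit_id


class Hit:
    """Stand-in container that delegates its ID to its fragments."""

    def __init__(self, fragments):
        if not fragments:
            raise ValueError("a Hit needs at least one fragment")
        self.fragments = list(fragments)

    @property
    def id(self):
        # Read the value from the fragments, checking that they all agree.
        ids = {frag.hit_id for frag in self.fragments}
        if len(ids) != 1:
            raise ValueError("fragments disagree on hit_id: %r" % sorted(ids))
        return ids.pop()

    @id.setter
    def id(self, value):
        # Push the new value down to every contained fragment.
        for frag in self.fragments:
            frag.hit_id = value


hit = Hit([Fragment("gi|1"), Fragment("gi|1")])
original_id = hit.id
hit.id = "gi|2"
```

Because the container holds no copy of its own, changing the ID through the container keeps every fragment in step, while changing a single fragment directly is detected the next time the container's ID is read.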

This allows you to keep the values consistent across the hierarchy, so
long as the change is done at the highest level possible (e.g.
changing the hit ID in the HSP object will break consistency, but
changing hit ID through the Hit object will update the hit_id
attribute value across all HSPs it contains). Conceptually, this is
also closer to the real 'Hit' object we're modeling since we always
need at least one HSP to declare a database entry as a Hit.

The HMMER parser's update is partially influenced by this API change,
as you've seen. In the previous version
(https://github.com/bow/biopython/blob/12fbe05c5e17f7a356ab672358b2698612aa8cad/Bio/SearchIO/HmmerIO/hmmertext.py),
the HMMER parser has several ugly bits (e.g. it sets the hit
description in more than one place, a possible source of error). After
changing the API to force the creation of Hits with HSPs, these kinds
of duplications are eliminated. I personally also feel that using the
new API allows me (sometimes forces me) to improve the other formats'
parsers in a similar way.

It's unfortunate that the HMMER text parser is made a little difficult
to understand, due to the way HMMER arranges the text output format.
And I admit I didn't do any performance benchmark for the HMMER text
parser when I made the change. (I suspected one extra dictionary per
Hit object should not decrease performance that much. Of course, if
the change proves to cause severe performance penalties, then yes, we
should look into it again.)

But for now, I think these are acceptable tradeoffs, if it means the
object model becomes more consistent and the other format parsers
are improved as well.

Hope that helps :).

regards,
Bow

P.S. As for the misleading part, yes, I admit that maybe a different
name should be used to note that the contents of the list differ.


On Mon, Oct 29, 2012 at 9:43 PM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
> Hi Bow,
>
> I've been looking closer at the SearchIO API changes introduced in
> August. I think there still is a design problem with the object model,
> at least when looking at how this affects the hmmer3 parser (and affects
> the hmmer2 parsing as well).
>
> Possibly I'm not seeing the big picture here, so let me explain what I'm
> seeing, and then you can tell me what I missed. :)
>
> So, the hmmer2 and hmmer3 file format basically looks like this
>
> # header
> # ...
> # ...
>
> information about the query
>
> list of hits
>
> list of hsps
>
> (alignments for hsps)
>
> (some statistics)
> //
>
> Now, when parsing this file line-wise, you obviously run into the hits
> first. However, with the new API, you can't create a Hit object without
> knowing the HSPs, but you haven't read them yet.
>
> To work around this, you need to create a fake hit object
> (https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/HmmerIO/hmmer3_text.py#L201).
> Then, in the loop that creates the fake hit objects, one of the exit
> conditions parses the HSP entries and replaces the fake hit
> objects with "real" Hit objects.
> (https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/HmmerIO/hmmer3_text.py#L188)
>
> By the way, that code is a bit misleading. Took me a while to notice the
> switch of the list's contents. Anyway, back to business.
>
> So basically you need to create two hit objects for every hit you're
> looking at. What's the advantage of forcing Hsp objects to be passed to
> the Hit constructor? Just to make sure your Hit objects have a valid Hsp
> at some later point?
>
> I'm aware that I'm just looking at the SearchIO design from the
> perspective of the hmmer2 parser, but I'd like to understand the reasons
> for the API being the way it currently is.
>
> Hope you can shed some light on this,
> Kai
>
> --
> Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
> D-72076 Tübingen                        Fax :   ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From kai.blin at biotech.uni-tuebingen.de  Tue Oct 30 07:35:40 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Tue, 30 Oct 2012 08:35:40 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <CADEGkF69XmC-ShWHpyhAcJBY0ZuUCAjXzQhE=FwCxCMXaUrFng@mail.gmail.com>
References: <508EEA85.6060906@biotech.uni-tuebingen.de>
	<CADEGkF69XmC-ShWHpyhAcJBY0ZuUCAjXzQhE=FwCxCMXaUrFng@mail.gmail.com>
Message-ID: <508F834C.6010404@biotech.uni-tuebingen.de>

On 2012-10-29 23:55, Wibowo Arindrarto wrote:

Hi Bow,

> Thanks for the input & comments! I made the API change mainly because
> I want to keep the SearchIO object hierarchy more consistent, i.e.
> there should be as few places as possible to make changes that break
> the model.

Thanks for the explanation.

...

> This allows you to keep the values consistent across the hierarchy, so
> long as the change is done at the highest level possible (e.g.
> changing the hit ID in the HSP object will break consistency, but
> changing hit ID through the Hit object will update the hit_id
> attribute value across all HSPs it contains). Conceptually, this is
> also closer to the real 'Hit' object we're modeling since we always
> need at least one HSP to declare a database entry as a Hit.

I see. I didn't think about the programmatic side of things. I see the
advantage of having only one attribute there and of keeping it consistent.

> The HMMER parser's update is partially influenced by this API change,
> as you've seen. In the previous version
> (https://github.com/bow/biopython/blob/12fbe05c5e17f7a356ab672358b2698612aa8cad/Bio/SearchIO/HmmerIO/hmmertext.py),
> the HMMER parser has several ugly bits (e.g. it sets the hit
> description in more than one place, a possible source of error). After
> changing the API to force the creation of Hits with HSPs, these kinds
> of duplications are eliminated. I personally also feel that using the
> new API allows me (sometimes forces me) to improve the other formats'
> parsers in a similar way.

Arguably, the more human-readable the file you need to parse, the less
readable the parser tends to be. ;) I think the old parser was a more
straightforward piece of code.

> It's unfortunate that the HMMER text parser is made a little difficult
> to understand, due to the way HMMER arranges the text output format.
> And I admit I didn't do any performance benchmark for the HMMER text
> parser when I made the change (I suspected one extra dictionary per
> Hit object should not decrease performance that much. Of course, if
> the change proves to cause severe performance penalties, then yes, we
> should look into it again.).

I'm not talking about performance here; performance likely isn't a
problem. I'm saying that you're conceptually creating the Hit object
twice. Even the comment in line 200 says so. :)

[snip]
            # create the hit object
            hit_attrs = {
                'id': row[8],
                'query_id': qid,
                'evalue': float(row[0]),
                'bitscore': float(row[1]),
                'bias': float(row[2]),
                # row[3:6] is not parsed, since the info is available
                # at the HSP level
                'domain_exp_num': float(row[6]),
                'domain_obs_num': int(row[7]),
                'description': row[9],
                'is_included': is_included,
            }
            hit_list.append(hit_attrs)
[snip]

I'm mainly wondering why at this position, I can't just create the Hit
object already, and then later set the HSPs. You could do this via a
setter function that validates the IDs are identical if you want to make
sure you're not shooting yourself in the foot there.
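
A minimal sketch of what I mean, assuming a Hit that can start out
without HSPs. The classes below are simplified stand-ins written for
this mail, not the current SearchIO classes, and the attribute names
are only illustrative.

```python
class HSP:
    """Simplified stand-in: an HSP that remembers which hit it belongs to."""

    def __init__(self, hit_id, score):
        self.hit_id = hit_id
        self.score = score


class Hit:
    """Simplified stand-in: created from the hit table, HSPs attached later."""

    def __init__(self, id, **attrs):
        self.id = id
        self.attrs = attrs
        self._hsps = []

    @property
    def hsps(self):
        return list(self._hsps)

    @hsps.setter
    def hsps(self, hsps):
        hsps = list(hsps)
        # Validate before storing, so a failed assignment changes nothing.
        for hsp in hsps:
            if hsp.hit_id != self.id:
                raise ValueError(
                    "HSP belongs to %r, not to hit %r" % (hsp.hit_id, self.id)
                )
        self._hsps = hsps


# First pass: the hit table row is enough to create the Hit.
hit = Hit("GATase_2", evalue=3.9e-226, bitscore=731.8)

# Second pass: attach the HSPs once the domain table has been read.
hit.hsps = [HSP("GATase_2", 731.8)]
```

The cost is that a Hit may temporarily exist without any HSP, which is
exactly the state the current API rules out.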

> But for now, I think these are acceptable tradeoffs, if it means the
> object model becomes more consistent and the other format parsers
> are improved as well.

I haven't looked into the other parsers, so I'll take your word on that.
I can of course take the same detour of creating a placeholder hit
object for the first pass and then when I've parsed the HSPs create the
real Hit object. If this makes all the other parsers more readable at
the cost of some obscurity in the hmmer text parsers, well, so be it.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From p.j.a.cock at googlemail.com  Tue Oct 30 10:59:44 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 30 Oct 2012 10:59:44 +0000
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <CADEGkF7GLd6eEzXec-5W_U4s9F2THe0wknyYpb2ptSY6oFH1aQ@mail.gmail.com>
References: <CAKVJ-_5JqiejShzbJemR2M9Qk_5KL88ayGbh3zGwWoski8DHPA@mail.gmail.com>
	<1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAKVJ-_7XP6+30=rWfryObgJYKDXO9gts7hcUmTZfrCQbEUEbEQ@mail.gmail.com>
	<CAKVJ-_6XT6EO9rJaJ9RJWiE7XTvtAGe4Fq54wfNmCt+Vj57gzg@mail.gmail.com>
	<CADEGkF7GLd6eEzXec-5W_U4s9F2THe0wknyYpb2ptSY6oFH1aQ@mail.gmail.com>
Message-ID: <CAKVJ-_5FFHa25QLE+O6BaURTc6+1ZLQh0rc15iMHfeMbJS_dgA@mail.gmail.com>

On Fri, Oct 26, 2012 at 11:43 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
>>
>> I have started exploring that idea on this new branch,
>> https://github.com/peterjc/biopython/tree/bioseq
>>
>> Does anyone object to me applying the first commit to the master
>> branch (defining the previously discussed new warning for 'beta' code)?
>> https://github.com/peterjc/biopython/commit/97485d5dcf2620f7664ae46a7897c1203847538d
>
> No objection from me for the commit :).
>

Done, commit adding Bio.BiopythonExperimentalWarning cherry-picked
to the master,

https://github.com/biopython/biopython/commit/52ac4383b12335ebcdcb8ea52eec8d23ac28b5e2

Peter


From p.j.a.cock at googlemail.com  Tue Oct 30 11:03:07 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 30 Oct 2012 11:03:07 +0000
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <874nldqi3t.fsf@fastmail.fm>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
	<CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
	<87pq42s9lt.fsf@fastmail.fm>
	<CAKVJ-_4GzU+5vMXd1XLvycV=tK6xcgMoSA53cjNmYC4fGoPM6w@mail.gmail.com>
	<874nldqi3t.fsf@fastmail.fm>
Message-ID: <CAKVJ-_6uVs=VE6boPAHgPTHgrBS-Q9UrGL+_63V6Go=ch-8oEw@mail.gmail.com>

On Mon, Oct 29, 2012 at 5:54 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter;
>
>> In the case of Bow's SearchIO code, what would you prefer?
>> e.g. Bio.SearchIO as it is now on his branch?
>
> I like plain ol' Search the best but don't have a strong preference. I'm
> terrible at naming things so trust everyone's judgment on this.
>
> Brad

Since we have no clear consensus, I propose we add Bow's code
as Bio.SearchIO (which is how it is written right now), with the new
BiopythonExperimentalWarning in place (to alert people that it may
change in the next release). We can then rename or move it at a
later date. This will make it easier for people to test the code, and
also suggest further changes or additions (e.g. Kai's HMMER work).
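
As a rough sketch of how such a warning works from the user's side,
assuming only the standard library: the class below is a local stand-in
for Bio.BiopythonExperimentalWarning, and `load_experimental_module` is
a made-up function that plays the role of importing the new module.

```python
import warnings


class ExperimentalWarning(Warning):
    """Local stand-in for Bio.BiopythonExperimentalWarning."""


def load_experimental_module():
    # Hypothetical: what an experimental module could do at import time.
    warnings.warn(
        "This module is experimental and may change in the next release.",
        ExperimentalWarning,
    )
    return "loaded"


# Record the warning instead of printing it, to show that it is raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = load_experimental_module()

# Users who accept the risk can switch this one category off.
with warnings.catch_warnings(record=True) as silenced:
    warnings.simplefilter("ignore", ExperimentalWarning)
    load_experimental_module()
```

Because the warning has its own category, it can be filtered without
hiding any other warnings.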

If and when we agree on a consolidation of the Bio.SeqXXX
modules, then Bio.SearchIO could move too. If this happens
before any public release as Bio.SearchIO, so much the better.

Adopting lower case module names under Python 3 is also a
separate issue.

Peter


From kai.blin at biotech.uni-tuebingen.de  Tue Oct 30 14:17:38 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Tue, 30 Oct 2012 15:17:38 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <508EEA85.6060906@biotech.uni-tuebingen.de>
References: <508EEA85.6060906@biotech.uni-tuebingen.de>
Message-ID: <508FE182.3040202@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-10-29 21:43, Kai Blin wrote:
Hi Bow,

one more thing:

Hmmer2 has the concept of an accession number in the result. Is there
an attribute for that in the QueryResult object that I'm missing, or do
we want a new attribute for that? Would "accession" be a good name?

Cheers,
Kai

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQj+GCAAoJEKM5lwBiwTTPaT4IAJb+Xs7sMPpQH4SwUQarItyP
Cg0UYLQNRtKBlyhNpipCbz7BWfqxd8fU0GsYSCVF275fDuBLUa337A6psRzefkWa
84cC7uHmOdcmhyeCipdAs5Jtouxf7ReGuQ+m3/SsW0pRfMHOuZamKw+5+oETnisM
DiHJUv6iKMHCpXrVWpofcKywqb1uqpxdhTp9F1gy+v6rVGKMI4r/fW5mRQZVxC3s
aQdhubCHoN+LUEo/OUKIF6cNeHWLMBToENdYlBhk62gLeSX5bxyhog21pzD+HTYf
5u4rPC2ikVR7iGQ9QPsvW7r7lqpDgoxFbnDYzcsAa+bNYd6+ENs+MAePb8Va2Dg=
=Luz9
-----END PGP SIGNATURE-----


From kai.blin at biotech.uni-tuebingen.de  Tue Oct 30 15:54:50 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Tue, 30 Oct 2012 16:54:50 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <508F834C.6010404@biotech.uni-tuebingen.de>
References: <508EEA85.6060906@biotech.uni-tuebingen.de>
	<CADEGkF69XmC-ShWHpyhAcJBY0ZuUCAjXzQhE=FwCxCMXaUrFng@mail.gmail.com>
	<508F834C.6010404@biotech.uni-tuebingen.de>
Message-ID: <508FF84A.2020802@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-10-30 08:35, Kai Blin wrote:

Hi Bow,

> I'm mainly wondering why at this position, I can't just create the
> Hit object already, and then later set the HSPs. You could do this
> via a setter function that validates the IDs are identical if you
> want to make sure you're not shooting yourself in the foot there.

I've just stumbled over a case where not being able to pre-create Hit
objects really bites me.

See the attached hmmpfam output. You'll notice that the domain table
is not in the order of the hit table. As I'd like to preserve the
order of the hit table, the current setup of the API forces me to
either repeatedly parse the domain annotations until I find the
correct domain annotations for my hit, or to create the hits in the
order of the domain annotation table and then reshuffle them to make
sure they're in the order of the hit table.

If I could just create "empty" hit objects when parsing the hit table,
I could easily preserve the order of the hits but still add the hsps
as I parse them.
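
Roughly what I have in mind, as a sketch with plain Python containers
instead of the real SearchIO objects. The rows are the first three hits
of the attached output and their entries in the domain table; the
variable names are made up for this example.

```python
from collections import OrderedDict

# Models in the order of the hit table.
hit_table = ["Glu_synthase", "GATase_2", "Glu_syn_central"]

# (model, seq-f, seq-t) rows in the order of the domain table.
domain_table = [
    ("GATase_2", 34, 404),
    ("Glu_syn_central", 478, 773),
    ("Glu_synthase", 650, 676),
    ("Glu_synthase", 830, 1216),
]

# First pass: one "empty" placeholder per hit, in hit table order.
hits = OrderedDict((model, []) for model in hit_table)

# Second pass: attach each domain to its hit as it is parsed.
for model, seq_from, seq_to in domain_table:
    hits[model].append((seq_from, seq_to))

hit_order = list(hits)
```

The hits stay in hit table order and each one ends up with its domains,
without re-reading the domain table or reshuffling anything afterwards.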

Cheers,
Kai

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQj/hKAAoJEKM5lwBiwTTPWTYH/2miexrfxolw9J0tOCSHXFYn
eNEzLcIM8ZHUoBCL1fsS/9166VH8D8HpyZCgTQwsSt9BUhQbjkwTmyfmP9wr0QDp
80IbxqWkMAJmDv3Q1RxbVVmD8TTfY6AwezQuwnYb8EFJDD7wvcJOJgJEqlp6zZu1
K/fJNYOXt2GekcXkrOMO1jGkzzpiwBs1uhhpYH9LxMAHPW3vnfTf4/tVSRPOKWRr
IXtxRnLSSurmZP4DYNm1ys4NykY6cO6zPOWxJIiI1lBLR7AVaKNK1bZ75m2D7/Mr
Y4FjnIlqaCFuNwiYPSNWQvTHOIj/VF/nRSWAVRRCqYZoYaDuZa25rb3Fo5RHMC8=
=Lerj
-----END PGP SIGNATURE-----
-------------- next part --------------
hmmpfam - search one or more sequences against HMM database
HMMER 2.3.2 (Oct 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 ../Shared/Pfam_fs
Sequence file:            single_porphyra_AA.fa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query sequence: gi|90819130|dbj|BAE92499.1|
Accession:      [none]
Description:    glutamate synthase [Porphyra yezoensis]

Scores for sequence family classification (score includes all domains):
Model           Description                             Score    E-value  N 
--------        -----------                             -----    ------- ---
Glu_synthase    Conserved region in glutamate synthas   858.6   3.6e-255   2
GATase_2        Glutamine amidotransferases class-II    731.8   3.9e-226   1
Glu_syn_central Glutamate synthase central domain       649.1   7.9e-213   1
GXGXG           GXGXG motif                             367.3   2.7e-107   1
HdeA            hns-dependent expression protein A (H     9.6      0.015   1
GDC-P           Glycine cleavage system P-protein         7.1      0.086   1
Cache_1         Cache domain                              7.0       0.14   1
IBN_N           Importin-beta N-terminal domain           8.2       0.17   1
DUF1200         Protein of unknown function (DUF1200)     6.7       0.42   1
cobW            CobW/HypB/UreG, nucleotide-binding do     5.1       0.45   1
PUF             Pumilio-family RNA binding repeat         6.5       0.47   1
Arch_flagellin  Archaebacterial flagellin                 4.1       0.66   1
FMN_dh          FMN-dependent dehydrogenase               3.2       0.89   1
RNA_pol_Rpb2_4  RNA polymerase Rpb2, domain 4             4.6        1.4   1
DUF477          Domain of unknown function (DUF477)       3.8        1.7   1
FRG1            FRG1-like family                          0.2        1.7   1
DUF1393         Protein of unknown function (DUF1393)     3.1          2   1
tRNA_anti       OB-fold nucleic acid binding domain       4.9          2   1
SelT            Selenoprotein T                           3.1        2.2   1
RNase_PH_C      3' exoribonuclease family, domain 2       4.2        2.3   1
Pencillinase_R  Penicillinase repressor                   3.9        2.5   1
Hormone_4       Neurohypophysial hormones, N-terminal     4.4        2.5   1
DSRB            Dextransucrase DSRB                       2.7        2.7   1
FtsK_SpoIIIE    FtsK/SpoIIIE family                       2.6        3.1   1
UBA             UBA/TS-N domain                           4.2        3.1   1
DUF1981         Domain of unknown function (DUF1981)      3.6        3.3   1
Gla             Vitamin K-dependent carboxylation/gam     4.0        3.5   1
Scm3            Centromere protein Scm3                   2.2        3.5   1
Ribosomal_S6    Ribosomal protein S6                      3.3        3.7   1
Cystatin        Cystatin domain                           2.4        3.9   1
Phage_prot_Gp6  Phage portal protein, SPP1 Gp6-like       1.0          4   1
DUF1976         Domain of unknown function (DUF1976)     -1.5        4.3   1
DUF37           Domain of unknown function DUF37          3.0        4.5   1
Flavodoxin_NdrI NrdI Flavodoxin like                      2.1        4.6   1
Bac_rhodopsin   Bacteriorhodopsin                         0.9        4.9   1
Nitro_FeMo-Co   Dinitrogenase iron-molybdenum cofacto     2.1        5.3   1
MoCF_biosynth   Probable molybdopterin binding domain     1.3        5.6   1
PaaA_PaaC       Phenylacetic acid catabolic protein       0.4        5.6   1
Albicidin_res   Albicidin resistance domain               1.7        5.7   1
DUF1514         Protein of unknown function (DUF1514)     3.5        5.7   1
T5orf172        T5orf172 domain                           2.0        6.1   1
Nup133_N        Nup133 N terminal like                   -0.6        6.5   1
BicD            Microtubule-associated protein Bicaud    -1.6        6.8   1
Sel1            Sel1 repeat                               2.5          7   1
CAP_C           DE   Adenylate cyclase associated (CA     1.3        7.4   1
Colicin         Colicin pore forming domain               1.4        7.5   1
MADF_DNA_bdg    Alcohol dehydrogenase transcription f     1.8        8.2   1
DUF258          Protein of unknown function, DUF258       0.3        8.3   1
PspB            Phage shock protein B                     0.4        8.4   1
GspM            General secretion pathway, M protein      1.0        8.6   1
Coq4            Coenzyme Q (ubiquinone) biosynthesis     -0.3        9.1   1
P22_AR_N        P22_AR N-terminal domain                 -0.2        9.5   1
C1_2            C1 domain                                 1.1        9.6   1
Phage_Mu_P      Bacteriophage Mu P protein               -0.4         10   1

Parsed for domains:
Model           Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
--------        ------- ----- -----    ----- -----      -----  -------
GATase_2          1/1      34   404 ..     1   385 []   731.8 3.9e-226
FRG1              1/1      88   107 ..   151   173 ..     0.2      1.7
C1_2              1/1     191   210 ..     9    27 ..     1.1      9.6
MADF_DNA_bdg      1/1     235   261 ..    57    95 .]     1.8      8.2
PaaA_PaaC         1/1     258   269 ..     1    13 [.     0.4      5.6
Albicidin_res     1/1     274   289 ..    50    65 ..     1.7      5.7
UBA               1/1     311   331 ..    18    38 .]     4.2      3.1
Gla               1/1     342   357 ..    27    42 .]     4.0      3.5
RNA_pol_Rpb2_4    1/1     369   381 ..     1    13 [.     4.6      1.4
MoCF_biosynth     1/1     371   396 ..    23    49 ..     1.3      5.6
DUF1200           1/1     389   401 ..     1    13 [.     6.7     0.42
Nup133_N          1/1     397   419 ..   475   498 .]    -0.6      6.5
DUF1976           1/1     428   448 ..  1296  1319 .]    -1.5      4.3
Bac_rhodopsin     1/1     445   472 ..   219   250 .]     0.9      4.9
Coq4              1/1     459   481 ..    60    82 ..    -0.3      9.1
Glu_syn_central   1/1     478   773 ..     1   301 []   649.1 7.9e-213
Flavodoxin_NdrI   1/1     488   497 ..   122   131 .]     2.1      4.6
P22_AR_N          1/1     524   541 ..   110   126 .]    -0.2      9.5
Cache_1           1/1     537   557 ..     1    23 [.     7.0     0.14
Glu_synthase      1/2     650   676 ..   297   323 ..     1.3        3
HdeA              1/1     727   749 ..    58    79 .]     9.6    0.015
Sel1              1/1     729   745 ..    32    49 .]     2.5        7
DUF1981           1/1     765   787 ..    62    88 .]     3.6      3.3
tRNA_anti         1/1     818   839 ..    54    85 .]     4.9        2
Cystatin          1/1     826   859 ..     1    38 [.     2.4      3.9
RNase_PH_C        1/1     827   846 ..    64    84 .]     4.2      2.3
Glu_synthase      2/2     830  1216 ..     1   412 []   857.3   9e-255
DUF258            1/1     839   860 ..   282   305 .]     0.3      8.3
Pencillinase_R    1/1     856   894 ..    84   118 .]     3.9      2.5
SelT              1/1     872   885 ..    96   111 .]     3.1      2.2
Nitro_FeMo-Co     1/1     879   897 ..    87   105 .]     2.1      5.3
DUF37             1/1     927   934 ..    61    68 .]     3.0      4.5
Scm3              1/1     953   963 ..   103   113 .]     2.2      3.5
cobW              1/1    1038  1058 ..   202   222 .]     5.1     0.45
Arch_flagellin    1/1    1050  1072 ..   197   219 .]     4.1     0.66
DUF1393           1/1    1055  1068 ..     1    14 [.     3.1        2
FtsK_SpoIIIE      1/1    1107  1143 ..   163   198 ..     2.6      3.1
FMN_dh            1/1    1109  1148 ..   291   330 ..     3.2     0.89
DSRB              1/1    1120  1134 ..     1    16 [.     2.7      2.7
Phage_Mu_P        1/1    1122  1131 ..     1    10 [.    -0.4       10
Hormone_4         1/1    1168  1176 ..     1     9 []     4.4      2.5
GDC-P             1/1    1205  1225 ..    10    30 ..     7.1    0.086
PspB              1/1    1268  1276 ..     1     9 [.     0.4      8.4
T5orf172          1/1    1271  1293 ..    35    58 ..     2.0      6.1
CAP_C             1/1    1283  1292 ..   161   170 .]     1.3      7.4
GXGXG             1/1    1290  1485 ..     1   228 []   367.3 2.7e-107
DUF1514           1/1    1453  1469 ..    50    66 .]     3.5      5.7
Colicin           1/1    1456  1467 ..   192   203 .]     1.4      7.5
Ribosomal_S6      1/1    1461  1481 ..    16    36 ..     3.3      3.7
BicD              1/1    1465  1481 ..     1    17 [.    -1.6      6.8
PUF               1/1    1470  1486 ..    19    35 .]     6.5     0.47
DUF477            1/1    1472  1495 ..     1    24 [.     3.8      1.7
Phage_prot_Gp6    1/1    1479  1492 ..     1    14 [.     1.0        4
IBN_N             1/1    1498  1516 ..     1    20 [.     8.2     0.17
GspM              1/1    1506  1520 ..     1    15 [.     1.0      8.6

Alignments of top-scoring domains:
GATase_2: domain 1 of 1, from 34 to 404: score 731.8, E = 3.9e-226
                CS    EEEEEEEEETSSHSBHHHHHHHHHHHHHGGGGSSCSTTSSCECEEEE
                   *->CGvlGfiAhikgkpshkivedaleaLerLeHRGavgADgktGDGAGI
                      CGv GfiA+ ++ ++hkiv +aleaL+++eHRGa++AD ++GDGAGI
  gi|9081913    34    CGV-GFIADVNNVANHKIVVQALEALTCMEHRGACSADRDSGDGAGI 79   

                CS EEECTCCCHHHHHHHCT----S GC-EEEEEEE-SSHHHHHHHHHHHHHH
                   ltqiPdgFFrevakelGieLpe.gqYAVGmvFLPqdelaraearkifEki
                    t+iP+++F++  ++++i++ ++   +VGm+FLP   l+    + i+E +
  gi|9081913    80 TTAIPWNLFQKSLQNQNIKFEQnDSVGVGMLFLPAHKLKES--KLIIETV 127  

                CS HHHTT-EEEEEEE--B-GGGS-HHHHHC--EEEEEEEE-TT--HHHHHHC
                   aeeeGLeVLGWReVPvnnsvLGetAlatePvIeQvFvgapsgdgedfErr
                   ++ee+Le++GWR VP+  +vLG++A  + P++eQvF+ +++ +++ +E++
  gi|9081913   128 LKEENLEIIGWRLVPTVQEVLGKQAYLNKPHVEQVFCKSSNLSKDRLEQQ 177  

                CS EEEEECHSCHHHHTHHH.    BEEEEEESSEEEEEECC-GGGHHHHBHG
                   LyviRkrieksivaenvn....fYiCSLSsrTIVYKGMLtseQLgqFYpD
                   L+++Rk+iek+i+  + +  ++fYiCSLS++TIVYKGM++s++LgqFY+D
  gi|9081913   178 LFLVRKKIEKYIGINGKDwaheFYICSLSCYTIVYKGMMRSAVLGQFYQD 227  

                CS GGSTTEEBSEEEEEECESSSSSCTGGGSSCEEECCCTTCEEEEEEEEETT
                   LqderfeSalAivHsRFSTNTfPsWplAQPfRVnslwgggivlAHNGEIN
                   L++++++S++Ai+H+RFSTNT+P+WplAQP+R         ++ HNGEIN
  gi|9081913   228 LYHSEYTSSFAIYHRRFSTNTMPKWPLAQPMR---------FVSHNGEIN 268  

                CS THHHHHHHHHHTSCCCSSTTCGHHHHCC-SSS-TTSCHHHHHHHHHHHHH
                   TlrgNrnwMraRegvlksplFgddldkLkPIvneggSDSaalDnvlEllv
                   Tl gN nwM++Re +l+s++++d++++LkPI n+++SDSa+lD ++Ell+
  gi|9081913   269 TLLGNLNWMQSREPLLQSKVWKDRIHELKPITNKDNSDSANLDAAVELLI 318  

                CS HTT--HHHHHHHHS----TT-GGGTST-HHHHHHHHHHHHHHCCHCCEEE
                   raGRslpeAlMMlIPEAWqnnpdmdkdrpekraFYeylsglmEPWDGPAa
                   ++GRs++eAlM+l+PEA+qn+pd   +++e+ +FYey+sgl+EPWDGPA+
  gi|9081913   319 ASGRSPEEALMILVPEAFQNQPDFA-NNTEISDFYEYYSGLQEPWDGPAL 367  

                CS EEEETSSEEEEEEETTTSCESEEEEEEEEEE.TTEEEEEESSC   
                   lvftDGryavgAtLDRNGLTRPaRygiTrdldkDglvvvaSEa<-*
                   +vft+G++ +gAtLDRNGL RPaRy+iT    kD+lv+v+SE+   
  gi|9081913   368 VVFTNGKV-IGATLDRNGL-RPARYVIT----KDNLVIVSSES    404  

FRG1: domain 1 of 1, from 88 to 107: score 0.2, E = 1.7
                   *->FQkfKvDLqdrklrinekDkkel<-*
                      FQk+   Lq+  +  +++D+ ++   
  gi|9081913    88    FQKS---LQNQNIKFEQNDSVGV    107  

C1_2: domain 1 of 1, from 191 to 210: score 1.1, E = 9.6
                   *->idgfyg...fYsCkkccddftl<-*
                      i+g+++ ++fY C+  c  +t+   
  gi|9081913   191    INGKDWaheFYICSLSC--YTI    210  

MADF_DNA_bdg: domain 1 of 1, from 235 to 261: score 1.8, E = 8.2
                   *->drYrrelrkirqgnsegsstgsgesykskWryyeelsFL<-*
                      +++  ++r+               ++ +kW+++  ++F    
  gi|9081913   235    SSFAIYHRRFS------------TNTMPKWPLAQPMRFV    261  

PaaA_PaaC: domain 1 of 1, from 258 to 269: score 0.4, E = 5.6
                CS    X............   
                   *->MYnFvEHGGvint<-*
                      M  Fv H G int   
  gi|9081913   258    M-RFVSHNGEINT    269  

Albicidin_res: domain 1 of 1, from 274 to 289: score 1.7, E = 5.7
                   *->LrlmharEPsLrkgtG<-*
                      L+ m+ rEP L+ +++   
  gi|9081913   274    LNWMQSREPLLQSKVW    289  

UBA: domain 1 of 1, from 311 to 331: score 4.2, E = 3.1
                CS    HHHHHHHHHTTT-HHHHHHHH   
                   *->eeakkALeatngnverAvewL<-*
                      ++a++ L a++ ++e+A+++L   
  gi|9081913   311    DAAVELLIASGRSPEEALMIL    331  

Gla: domain 1 of 1, from 342 to 357: score 4.0, E = 3.5
                CS    CSSHHHHHHHHHHCTC   
                   *->fednegtkefwrkYfg<-*
                      f++n+++  f++ Y g   
  gi|9081913   342    FANNTEISDFYEYYSG    357  

RNA_pol_Rpb2_4: domain 1 of 1, from 369 to 381: score 4.6, E = 1.4
                CS    EEETTEEEEEESS   
                   *->VYvNGklvGthrn<-*
                      V+ NGk++G + +   
  gi|9081913   369    VFTNGKVIGATLD    381  

MoCF_biosynth: domain 1 of 1, from 371 to 396: score 1.3, E = 5.6
                CS    CHHHHHHHHHHHTTTCEEEEEEEE-SS   
                   *->tNgpmLaalLresaGaevirygiVpDd<-*
                      tNg+ + a L +  G  ++ry+i +D+   
  gi|9081913   371    TNGKVIGATLDR-NGLRPARYVITKDN    396  

DUF1200: domain 1 of 1, from 389 to 401: score 6.7, E = 0.42
                   *->kYvltedtLlIks<-*
                      +Yv+t+d L+I+s   
  gi|9081913   389    RYVITKDNLVIVS    401  

Nup133_N: domain 1 of 1, from 397 to 419: score -0.6, E = 6.5
                   *->lylltrnsGvvrIeHaleedstne<-*
                      l++ + +sGvv++e +  + s  +   
  gi|9081913   397    LVIVSSESGVVQVE-PGNVKSKGR    419  

DUF1976: domain 1 of 1, from 428 to 448: score -1.5, E = 4.3
                   *->VsvYiyFkevtdnksLsEysVtyk<-*
                      V++++   ++++nk ++  sVt k   
  gi|9081913   428    VDIFS--HKILNNKEIK-TSVTTK    448  

Bac_rhodopsin: domain 1 of 1, from 445 to 472: score 0.9, E = 4.9
                CS    HHHHHHHHHHHHHHHHHCHHHTC---------   
                   *->vvAKVgFgfilLrsravlertvavgsalaage<-*
                      v++K+++g +l ++r++le  +   + l+++    
  gi|9081913   445    VTTKIPYGELLTDARQILE--HK--PFLSDQQ    472  

Coq4: domain 1 of 1, from 459 to 481: score -0.3, E = 9.1
                   *->rrILkEkPRissetldlkkLrkL<-*
                      r+IL  kP  s  ++d kkL +L   
  gi|9081913   459    RQILEHKPFLSDQQVDIKKLMQL    481  

Glu_syn_central: domain 1 of 1, from 478 to 773: score 649.1, E = 7.9e-213
                CS    HHHHHHCTT--HHHHHCTCHHHHHHSS--EE-S---S--CCC-SS--
                   *->llrrQkAFGYTyEdvelvllPMAetGkEalGSMGdDtPLAVLSekpr
                      l+++Q+AFGYT+Edvelv+++MA+++kE++++MGdD+PL +LSek++
  gi|9081913   478    LMQLQTAFGYTNEDVELVIEHMASQAKEPTFCMGDDIPLSILSEKSH 524  

                CS -GGGCEEE----SSS----TTTTGGG-B--EEES--S-TTS-SGGGC-CE
                   lLYdYFKQlFAQVTNPPIDPIREelVMSLetylGpegNlLeptpeqarrl
                   +LYdYFKQ+FAQVTNP+IDP+RE+lVMSL+ ++G+++NlL+  p+ a+++
  gi|9081913   525 ILYDYFKQRFAQVTNPAIDPLRESLVMSLAIQIGHKSNLLDDQPTLAKHI 574  

                CS EESSSB--HHHHHH.HHHH....CCCCEEEEESEEESTTSTTCHHHHHHH
                   kLesPILsnselekmlknidairegfkaatIditFdveeGvdgLeaaLdr
                   kLesP+++++el++ + +     +++++  I+++F  e+G++ ++  + +
  gi|9081913   575 KLESPVINEGELNA-IFE-----SKLSCIRINTLFQLEDGPKNFKQQIQQ 618  

                CS HHHHHHHHHHCT-SEEEEESTCG--CTTEEE--HHHHHHHHHHHHHCTT-
                   lceeAeeAirsGaniivLSDRndildeervaIPaLLAvGAVHhHLIrkgL
                   lce A++Ai +G ni+vLSD+n+ ld+e+v+IP+LLAvGAVHhHLI kgL
  gi|9081913   619 LCENASQAILDGNNILVLSDKNNSLDSEKVSIPPLLAVGAVHHHLINKGL 668  

                CS CCC-EEEEEESS--SHHHHHHHHCTT-SEEEEHCCHHHHHHHHCCCCCCC
                   RtkvslvVETGEaREvHHFAvLiGYGAsAInPYLAyETirdWWlirrGll
                   R+ +s+ VET++++++HHFA+LiGYGAsAI+PYLA+ET r+WW + ++++
  gi|9081913   669 RQEASILVETAQCWSTHHFACLIGYGASAICPYLAFETARHWWSNPKTKM 718  

                CS CHTTTS- T--HHHHHHHHHHHHHHHHHHHHHCTT--BHHHHCCS--EEE
                   lmskGkl.elsleeavkNYrkAiekGlLKIMSKMGISTlqSYrGAQIFEA
                   lmskG+l++++++ea++NY+kA+e+GlLKI+SKMGIS+l+SY+GAQIFE+
  gi|9081913   719 LMSKGRLpACNIQEAQANYKKAVEAGLLKILSKMGISLLSSYHGAQIFEI 768  

                CS SSB-H   
                   vGLsk<-*
                   +GL++   
  gi|9081913   769 LGLGS    773  

Flavodoxin_NdrI: domain 1 of 1, from 488 to 497: score 2.1, E = 4.6
                CS    -HHHHHHHHH   
                   *->TneDVerVrk<-*
                      TneDVe V +   
  gi|9081913   488    TNEDVELVIE    497  

P22_AR_N: domain 1 of 1, from 524 to 541: score -0.2, E = 9.5
                   *->dVLydYWtrkGkAv..NPR<-*
                      ++LydY+  + +A  +NP+   
  gi|9081913   524    HILYDYFK-QRFAQvtNPA    541  

Cache_1: domain 1 of 1, from 537 to 557: score 7.0, E = 0.14
                   *->wTePYvdaalktgdlViTiaqPv<-*
                      +T+P++d +  +++lV ++a+++   
  gi|9081913   537    VTNPAIDPL--RESLVMSLAIQI    557  

Glu_synthase: domain 1 of 2, from 650 to 676: score 1.3, E = 3
                CS    --HHHHHHHHHHHHHCTT-CCCSEEEE   
                   *->lPwelgLaevhqtLvengLRdrVsLia<-*
                      +P  l++ +vh  L++ gLR + s+ +   
  gi|9081913   650    IPPLLAVGAVHHHLINKGLRQEASILV    676  

HdeA: domain 1 of 1, from 727 to 749: score 9.6, E = 0.015
                   *->ACk.QdkkAsFkdKvkaEldKvk<-*
                      AC  Q+ +A++k+ v+a l K+    
  gi|9081913   727    ACNiQEAQANYKKAVEAGLLKIL    749  

Sel1: domain 1 of 1, from 729 to 745: score 2.5, E = 7
                CS    .HHH.HHHHHHHHHHTT-   
                   *->DyekeAlkwyekAAeqGn<-*
                      ++++ A + y+kA e+G    
  gi|9081913   729    NIQE-AQANYKKAVEAGL    745  

DUF1981: domain 1 of 1, from 765 to 787: score 3.6, E = 3.3
                   *->iFgvltlaakeesesivklAfqiid.qi<-*
                      iF++l+l++       v+lAf+ +++qi   
  gi|9081913   765    IFEILGLGSEV-----VNLAFKGTTsQI    787  

tRNA_anti: domain 1 of 1, from 818 to 839: score 4.9, E = 2
                CS    EEEEEEETTSSTSTCTCTT..EEEEEEEEEEE   
                   *->tGkvkkrpggeqNnlkTGeKAlelvveeievl<-*
                      +G v+ rpgge          ++++ +e+      
  gi|9081913   818    YGFVQYRPGGE----------YHINNPEMSKA    839  

Cystatin: domain 1 of 1, from 826 to 859: score 2.4, E = 3.9
                CS    ECEEEEET.STSHHHHHHHHHHHHHHHHHSSSSEEEEE   
                   *->GglspvdpNendpevqealdfAlakyNeksndnylfel<-*
                      Gg   +++    pe  +al+ A+  yN +  +ny++ l   
  gi|9081913   826    GGEYHINN----PEMSKALHQAVRGYNPEYYNNYQSLL    859  

RNase_PH_C: domain 1 of 1, from 827 to 846: score 4.2, E = 2.3
                CS    SSSS.B.HHHHHHHHHHHHHH   
                   *->GkgnglteelleealelAkeg<-*
                      G +++++ +++ +al++A+ g   
  gi|9081913   827    G-EYHINNPEMSKALHQAVRG    846  

Glu_synthase: domain 2 of 2, from 830 to 1216: score 857.3, E = 9e-255
                CS    -SS-HHHHHHHHHHHHC--T-HHHHHHHHHHHHTS.-S-SGGGGEEE
                   *->hrnepeviktlqkavqvpveskpsydkYreplnertpigalrdlLef
                      h n+pe++k l++av+    +   y +Y+ +l +r p++alrdlL++
  gi|9081913   830    HINNPEMSKALHQAVRG--YNPEYYNNYQSLLQNR-PPTALRDLLKL 873  

                CS --SS--......--GGGS--HHHHHTTEEEEEB-CTTC-HHHHHHHHHHH
                   kyaeepldtdkiipieevepaleikkrfctgaMSyGALSeeAheALAiAm
                    ++++p      i+i+eve+++ i + fctg+MS+GALS+e+he+LAiAm
  gi|9081913   874 QSNRAP------ISIDEVESIEDILQKFCTGGMSLGALSRETHETLAIAM 917  

                CS HHCT-EEEETTT---GGGCSB-TTS-T S BTTSTT--S--TT-B---SE
                   nriGtksNtGEGGedperlkpaadlds.G.SpTlpHLkGLqnednarSAI
                   nriG+ksN+GEGGedp r+k + d++s+G+Sp lpHLkGL+n+d+a+SAI
  gi|9081913   918 NRIGGKSNSGEGGEDPVRFKILNDVNSsGtSPLLPHLKGLKNGDTASSAI 967  

                CS EEE-TT-TT--............HHHHCC-SEEEEE---TTSTTT--EE-
                   kQvASGRFGVtkRnGefWeefkRseYLvnAdalEIKiAQGAKPGeGGhLP
                   kQ+ASGRFGVt            +eYL+nA++lEIKiAQGAKPGeGG+LP
  gi|9081913   968 KQIASGRFGVT------------PEYLMNAKQLEIKIAQGAKPGEGGQLP 1005 

                CS GGG--HHHHHHHTS-TT--EE--SS-TT-SSHHHHHHHHHHHHHH-.TTS
                   GeKVspeIAriRnstPGvgliSPpPHHDIysiEDLaqLIydLkeindpkA
                   G+K+sp+IA +R ++PGv liSPpPHHDIysiEDL+qLI+dL++in pkA
  gi|9081913  1006 GKKISPYIATLRKCKPGVPLISPPPHHDIYSIEDLSQLIFDLHQIN-PKA 1054 

                CS EEEEEEE-STTHHHHHHH...HHHTT-SEEEEE-TT---SSEECCHHHHC
                   pisVKLVsehgvgtiaaGhmqvakAnADiIlIdGhdGGTGASpktsikha
                   +isVKLVse g+gtiaaG   vak+nADiI+I+GhdGGTGASp++sikha
  gi|9081913  1055 KISVKLVSEIGIGTIAAG---VAKGNADIIQISGHDGGTGASPLSSIKHA 1101 

                CS ---HHHHHHHHHHHHHCTT-CCCSEEEEESS--SHHHHHHHHHCT-SEEE
                   GlPwelgLaevhqtLvengLRdrVsLiadGGLrTGaDVakAaaLGAdavg
                   G PwelgL+evhq+L en+LRdrV+L++dGGLrTG D+++Aa++GA+++g
  gi|9081913  1102 GSPWELGLSEVHQLLAENQLRDRVTLRVDGGLRTGSDIVLAAIMGAEEFG 1151 

                CS -SHHHHHHCT--S---CCCT--TTSSS---CCHH..CT----HHHHHHHH
                   iGTaaLiAlGCimaRvCHtntCPvGvATQDPeLrKrlkfegaperVvNyf
                   +GT+a+iA+GCimaR+CHtn+CPvGvATQ++eLr   +f g+pe +vN+f
  gi|9081913  1152 FGTVAMIATGCIMARICHTNKCPVGVATQREELR--ARFSGVPEALVNFF 1199 

                CS HHHHHHHHHHHHHHT-S   
                   iflaeEvrellaqlGfr<-*
                   +f+  Evre+la+lG++   
  gi|9081913  1200 LFIGNEVREILASLGYK    1216 

DUF258: domain 1 of 1, from 839 to 860: score 0.3, E = 8.3
                CS    HHHHHHHCTSS-HHHHHHHHHHHH   
                   *->AVkaAveeGeIseeRYesYlklle<-*
                      A+ +Av    +++e Y++Y+ ll+   
  gi|9081913   839    ALHQAVR--GYNPEYYNNYQSLLQ    860  

Pencillinase_R: domain 1 of 1, from 856 to 894: score 3.9, E = 2.5
                CS    XXXXXXXXXXXXXXXXXXX    XXXXXXXXXXXXXXXX   
                   *->drlfggsvgalvanfleee....klSeddieeLrelLde<-*
                      + l++++++ ++ ++l+ ++++ ++S d++e ++++L++   
  gi|9081913   856    QSLLQNRPPTALRDLLKLQsnraPISIDEVESIEDILQK    894  

SelT: domain 1 of 1, from 872 to 885: score 3.1, E = 2.2
                   *->KLqtGrvYAPPtpqEL<-*
                      KLq++r   P++++E+   
  gi|9081913   872    KLQSNRA--PISIDEV    885  

Nitro_FeMo-Co: domain 1 of 1, from 879 to 897: score 2.1, E = 5.3
                CS    EEE-TTSSBHHHHHHHHHC   
                   *->pikagegetieeaiealqe<-*
                      pi   e e+ie+ + ++ +   
  gi|9081913   879    PISIDEVESIEDILQKFCT    897  

DUF37: domain 1 of 1, from 927 to 934: score 3.0, E = 4.5
                   *->hpGGyDPV<-*
                      ++GG DPV   
  gi|9081913   927    GEGGEDPV    934  

Scm3: domain 1 of 1, from 953 to 963: score 2.2, E = 3.5
                   *->HLraLeteddi<-*
                      HL++L+++d++   
  gi|9081913   953    HLKGLKNGDTA    963  

cobW: domain 1 of 1, from 1038 to 1058: score 5.1, E = 0.45
                CS    ...HHHHHHHHHH-SSS-EEE   
                   *->adlekleadlrrlnpeapiip<-*
                      +dl++l+ dl+++np+a+i     
  gi|9081913  1038    EDLSQLIFDLHQINPKAKISV    1058 

Arch_flagellin: domain 1 of 1, from 1050 to 1072: score 4.1, E = 0.66
                   *->inpstkvrgeVvpenGapgtief<-*
                      inp  k+++++v+e+G+ ++      
  gi|9081913  1050    INPKAKISVKLVSEIGIGTIAAG    1072 

DUF1393: domain 1 of 1, from 1055 to 1068: score 3.1, E = 2
                   *->klSvKtVVAiGIGA<-*
                      k+SvK V  iGIG+   
  gi|9081913  1055    KISVKLVSEIGIGT    1068 

FtsK_SpoIIIE: domain 1 of 1, from 1107 to 1143: score 2.6, E = 3.1
                   *->lviDnydeLaeenlL.ervtsLknqGlsygvhvmata<-*
                      l++ + ++L +en+L++rvt+ + +Gl +g +++++a   
  gi|9081913  1107    LGLSEVHQLLAENQLrDRVTLRVDGGLRTGSDIVLAA    1143 

FMN_dh: domain 1 of 1, from 1109 to 1148: score 3.2, E = 0.89
                CS    HHHHHHHHHCHHTTTSSEEEEESS-SSHHHHHHHHHHTSS   
                   *->LpeVvPIlkeaAvkgdieVllDgGvRRGtDVlKALALGAr<-*
                      L eV  +l e  + +++   +DgG R+G+D++ A  +GA+   
  gi|9081913  1109    LSEVHQLLAENQLRDRVTLRVDGGLRTGSDIVLAAIMGAE    1148 

DSRB: domain 1 of 1, from 1120 to 1134: score 2.7, E = 2.7
                   *->mKvndrvtvKtDGgpR<-*
                       ++ drvt + DGg R   
  gi|9081913  1120    -QLRDRVTLRVDGGLR    1134 

Phage_Mu_P: domain 1 of 1, from 1122 to 1131: score -0.4, E = 10
                   *->sntVtLrvgG<-*
                       ++VtLrv+G   
  gi|9081913  1122    RDRVTLRVDG    1131 

Hormone_4: domain 1 of 1, from 1168 to 1176: score 4.4, E = 2.5
                CS    X-TT--TT-   
                   *->CyirnCPrG<-*
                      C  + CP+G   
  gi|9081913  1168    CHTNKCPVG    1176 

GDC-P: domain 1 of 1, from 1205 to 1225: score 7.1, E = 0.086
                   *->eqqeMLstiGlssLddLidat<-*
                      e++e+L+++G++sLdd ++++   
  gi|9081913  1205    EVREILASLGYKSLDDITGQN    1225 

PspB: domain 1 of 1, from 1268 to 1276: score 0.4, E = 8.4
                   *->MsaffLagP<-*
                      M+ ++La+P   
  gi|9081913  1268    MDDDILAIP    1276 

T5orf172: domain 1 of 1, from 1271 to 1293: score 2.0, E = 6.1
                   *->dvvalievedaraklEklLHkrFk<-*
                      d+ a+ ev++a  klE+++ k+Fk   
  gi|9081913  1271    DILAIPEVSNAI-KLETEITKHFK    1293 

CAP_C: domain 1 of 1, from 1283 to 1292: score 1.3, E = 7.4
                CS    EEEEEE----   
                   *->KLvTevveha<-*
                      KL+Te++ h    
  gi|9081913  1283    KLETEITKHF    1292 

GXGXG: domain 1 of 1, from 1290 to 1485: score 367.3, E = 2.7e-107
                CS    EEEEE-TT--STTHHHHHHHHHHCTTTS.S-TTCEEEEEEEEE-TTT
                   *->keeaiiNtdrlvgtrlsgeiakkygeegalpkdtgkivfnGsAGqsf
                      k+++i Nt+r+vgtrlsg iak yg+ g + k+ +k++f+GsAGqsf
  gi|9081913  1290    KHFKIANTNRTVGTRLSGIIAKNYGNTG-F-KGLIKLNFYGSAGQSF 1334 

                CS TTT-BTTEEEEEEEEE-S.TTTTT-ECCEEEEE--TT-.......SS-GG
                   GafmagGvtLeleGdAnddyvGkgmsGGeIvikgnagdpvGnnMdageyv
                   Gaf+a+G++L l+G+And yvGkgm+GG+Ivi+++ag         +e +
  gi|9081913  1335 GAFLASGINLKLMGEAND-YVGKGMNGGSIVIVPPAGT-------IYEDN 1376 

                CS GSEEC-SSTTTT--CEEEEESSEE-TTTTTT-.....CCEEEEESEB.-S
                   gnviaGNtclyGatGGkifiaGdAGerfgvrnkayKdsgatiVveGvaGd
                   ++vi+GNtclyGatGG++f++G+AGerf+vrn     s a+ VveGv Gd
  gi|9081913  1377 NQVIIGNTCLYGATGGYLFAQGQAGERFAVRN-----SLAESVVEGV-GD 1420 

                CS STTTT-EEEEEEESS-B-SSBTTT--CCEEEEE-TTS.......THHHHB
                   hggEYMtGGtivVlGdaGrnvGagMtGGiaYvlgeiedfsyMiatlpgkv
                   h++EYMtGG+ivVlG+aGrnvGagMtGG+aY+l+e+e        + ++v
  gi|9081913  1421 HACEYMTGGVIVVLGKAGRNVGAGMTGGLAYFLDEDE-------NFIDRV 1463 

                CS -CCCEEEE...ES-S......CCHHHHHHHH   
                   nleiVeledlkrievkrkklLpegekqlkel<-*
                   n+eiV+ +   r+ +      ++ge+qlk+l   
  gi|9081913  1464 NSEIVKIQ---RVIT------KAGEEQLKNL    1485 

DUF1514: domain 1 of 1, from 1453 to 1469: score 3.5, E = 5.7
                   *->LeeyrieveRikkevkk<-*
                      L e+++ ++R++ e+ k   
  gi|9081913  1453    LDEDENFIDRVNSEIVK    1469 

Colicin: domain 1 of 1, from 1456 to 1467: score 1.4, E = 7.5
                CS    SHHHHHHHHHCH   
                   *->DdkfveklNkli<-*
                      D++f++ +N +i   
  gi|9081913  1456    DENFIDRVNSEI    1467 

Ribosomal_S6: domain 1 of 1, from 1461 to 1481: score 3.3, E = 3.7
                CS    CCHHHHHHHHHHHHHCTT-EE   
                   *->EqvkqeiekYqkvLtnngAei<-*
                      ++v++ei k+q+v+t++g+e+   
  gi|9081913  1461    DRVNSEIVKIQRVITKAGEEQ    1481 

BicD: domain 1 of 1, from 1465 to 1481: score -1.6, E = 6.8
                   *->gqaysnqrkvAkdGeer<-*
                       + +++qr+ +k Gee+   
  gi|9081913  1465    SEIVKIQRVITKAGEEQ    1481 

PUF: domain 1 of 1, from 1470 to 1486: score 6.5, E = 0.47
                   *->lQkllevateeqkqlil<-*
                      +Q+++++a+eeq ++++   
  gi|9081913  1470    IQRVITKAGEEQLKNLI    1486 

DUF477: domain 1 of 1, from 1472 to 1495: score 3.8, E = 1.7
                   *->gtLspserarLeqalaalEqktga<-*
                      ++++++  ++L   ++  ++ktg+   
  gi|9081913  1472    RVITKAGEEQLKNLIENHAAKTGS    1495 

Phage_prot_Gp6: domain 1 of 1, from 1479 to 1492: score 1.0, E = 4
                   *->eEmikkFidkHklr<-*
                      eE +k++i+ H+++   
  gi|9081913  1479    EEQLKNLIENHAAK    1492 

IBN_N: domain 1 of 1, from 1498 to 1516: score 8.2, E = 0.17
                CS    HHHHHHHHHCCTHHCHHHHH   
                   *->AEkqLeqlekqklPgfllaL<-*
                      A++ Le+++++ lP+f++ +   
  gi|9081913  1498    AHTILEKWNSY-LPQFWQVV    1516 

GspM: domain 1 of 1, from 1506 to 1520: score 1.0, E = 8.6
                CS    XXXXXXXXXXXXXXX   
                   *->mneLqawWqgrspRE<-*
                      ++ L ++Wq ++p+E   
  gi|9081913  1506    NSYLPQFWQVVPPSE    1520 

//

From etal at uga.edu  Tue Oct 30 17:21:25 2012
From: etal at uga.edu (Eric Talevich)
Date: Tue, 30 Oct 2012 13:21:25 -0400
Subject: [Biopython-dev] Fwd: Pull Request: MafIO.py
In-Reply-To: <5aa8ce85a0ec41b5b817cdc5105bfdfb@BLUPRD0210HT004.namprd02.prod.outlook.com>
References: <5aa8ce85a0ec41b5b817cdc5105bfdfb@BLUPRD0210HT004.namprd02.prod.outlook.com>
Message-ID: <CAMC681mrW5KrjZb32tUHDm5bBHQfosNZHM1yQaN4Ac9YjVHS3A@mail.gmail.com>

---------- Forwarded message ----------
From: Nick Loman <n.j.loman at bham.ac.uk>
Date: Tue, Oct 30, 2012 at 6:34 AM
Subject: Pull Request: MafIO.py


 Hi there

 Thanks for the MafIO branch. In order to get it to read MAF files produced
by Mugsy (mugsy.sourceforge.net) I had to make the following change:

diff --git a/Bio/AlignIO/MafIO.py b/Bio/AlignIO/MafIO.py
index 6eda0ca..4bb1407 100644
--- a/Bio/AlignIO/MafIO.py
+++ b/Bio/AlignIO/MafIO.py
@@ -178,7 +178,7 @@ def MafIterator(handle, seq_count = None, alphabet = single_letter_alphabet):

             annotations = dict([x.split("=") for x in line.strip().split()[1:]])

-            if len([x for x in annotations.keys() if x not in ("score", "pass")]) > 0:
+            if len([x for x in annotations.keys() if x not in ("score", "pass", "label", "mult")]) > 0:
                 raise ValueError("Error parsing alignment - invalid key in 'a' line")
         elif line.startswith("#"):
             # ignore comments


 My Python fork is a bit confusing right now so hope you don't mind me
sending this pull request via email!

 Cheers

 Nick
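
For illustration, the check touched by the patch above can be sketched as a
standalone function. This is only a sketch under assumptions: parse_a_line
and ALLOWED_A_LINE_KEYS are hypothetical names that do not exist in
Bio/AlignIO/MafIO.py, and the sample 'a' line further down is made up
rather than copied from a real Mugsy file.

```python
# Hypothetical helper for illustration; not part of Bio.AlignIO.MafIO.
# Keys accepted on a MAF 'a' line after the proposed change.
ALLOWED_A_LINE_KEYS = frozenset(["score", "pass", "label", "mult"])


def parse_a_line(line):
    """Parse the key=value annotations of a MAF 'a' line into a dict."""
    fields = line.strip().split()
    if not fields or fields[0] != "a":
        raise ValueError("Not an 'a' line: %r" % line)
    # Split each field once on '=' so that values may themselves contain '='.
    annotations = dict(field.split("=", 1) for field in fields[1:])
    unknown = sorted(set(annotations) - ALLOWED_A_LINE_KEYS)
    if unknown:
        raise ValueError(
            "Error parsing alignment - invalid key in 'a' line: %s"
            % ", ".join(unknown)
        )
    return annotations


# Made-up example line carrying the two extra keys the patch allows.
example = parse_a_line("a score=0 label=1 mult=3")
```

Keeping the allowed keys in one named set means a later format variant only
needs a one-line change, and the error message can name the offending key.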


From w.arindrarto at gmail.com  Wed Oct 31 00:09:41 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 31 Oct 2012 01:09:41 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <508FE182.3040202@biotech.uni-tuebingen.de>
References: <508EEA85.6060906@biotech.uni-tuebingen.de>
	<508FE182.3040202@biotech.uni-tuebingen.de>
Message-ID: <CADEGkF6tBqaYUTuX26MtLuq+sncu_=zdo8P-+yfg4Nn11huo_Q@mail.gmail.com>

Hi Kai,

> one more thing:
>
> Hmmer2 has the concept of an accession number in the result. Is there
> an attribute for that in the QueryResult object that I'm missing or do
> we want a new attribute for that. Would "accession" be a good name?
>
> Cheers,
> Kai

I've used '.acc' for accession number properties in the current HMMER3
and BLAST parsers, but this choice was arbitrary. '.accession' is a
good name. I didn't use it because I like shorter names better, but
then again it may be unclear at times.

Does anyone have a preference between '.acc' and '.accession'? If not, I
can change the current '.acc' into '.accession'.

cheers,
Bow


From w.arindrarto at gmail.com  Wed Oct 31 00:19:30 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 31 Oct 2012 01:19:30 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <508FF84A.2020802@biotech.uni-tuebingen.de>
References: <508EEA85.6060906@biotech.uni-tuebingen.de>
	<CADEGkF69XmC-ShWHpyhAcJBY0ZuUCAjXzQhE=FwCxCMXaUrFng@mail.gmail.com>
	<508F834C.6010404@biotech.uni-tuebingen.de>
	<508FF84A.2020802@biotech.uni-tuebingen.de>
Message-ID: <CADEGkF4kdFwBwHfHJ-ZF42zijKdAoVTiGqTpJEqVu-9JnNS4mQ@mail.gmail.com>

Hi Kai,

> I've just stumbled over a case where not being able to pre-create Hit
> objects really bites me.
>
> See the attached hmmpfam output. You'll notice that the domain table
> is not in the order of the hit table. As I'd like to preserve the
> order of the hit table, the current setup of the API forces me to
> either repeatedly parse the domain annotations until I find the
> correct domain annotations for my hit, or to create the hits in the
> order of the domain annotation table and then reshuffle them to make
> sure they're in the order of the hit table.
>
> If I could just create "empty" hit objects when parsing the hit table,
> I could easily preserve the order of the hits but still add the hsps
> as I parse them.

Hmm..

This is a problem :/. I didn't expect any format to have this kind of ordering.

I'll see what I can do with the current API limitation. We may need to
change it back to not requiring any HSPs for Hit. In any case, I'll
see what needs to be done first and get back asap.

cheers,
Bow
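
For illustration, the two-pass idea Kai describes (create empty hits while
reading the hit table, then attach domains as they are parsed) can be
sketched in plain Python. This is a sketch under assumptions:
group_domains_by_hit is a hypothetical name, it does not use any SearchIO
classes, and the IDs and domain tuples in the example are made-up
placeholders.

```python
from collections import OrderedDict


def group_domains_by_hit(hit_ids, domain_rows):
    """Attach domains to hits while preserving hit-table order.

    hit_ids: hit IDs in the order they appear in the hit table.
    domain_rows: (hit_id, domain) pairs in domain-table order.
    """
    # First pass: one empty placeholder per hit, in hit-table order.
    hits = OrderedDict((hit_id, []) for hit_id in hit_ids)
    # Second pass: fill the placeholders in whatever order domains arrive.
    for hit_id, domain in domain_rows:
        if hit_id not in hits:
            raise ValueError("Domain refers to unknown hit %r" % hit_id)
        hits[hit_id].append(domain)
    return hits


# Placeholder data: the domain table is ordered differently from the hit table.
hit_table = ["famA", "famB", "famC"]
domain_table = [("famC", (10, 40)), ("famA", (5, 25)), ("famA", (60, 90))]
grouped = group_domains_by_hit(hit_table, domain_table)
```

A hit that ends up with an empty list here corresponds to the "empty" hit
object discussed above; whether Hit should be allowed to exist without HSPs
is the open API question.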


From mjldehoon at yahoo.com  Wed Oct 31 01:12:18 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 30 Oct 2012 18:12:18 -0700 (PDT)
Subject: [Biopython-dev] Working with the new SearchIO API
Message-ID: <1351645938.62302.BPMail_high_noncarrier@web164001.mail.gq1.yahoo.com>


>Does anyone have preference between '.acc' or '.accession'? If not, I
>can change the current '.acc' into '.accession'.

I would prefer .accession for clarity.
Best,
-Michiel


From andrewscz at gmail.com  Wed Oct 31 18:10:48 2012
From: andrewscz at gmail.com (Andrew Sczesnak)
Date: Wed, 31 Oct 2012 11:10:48 -0700
Subject: [Biopython-dev] Pull Request: MafIO.py
In-Reply-To: <mailman.1.1351699203.6679.biopython-dev@lists.open-bio.org>
References: <mailman.1.1351699203.6679.biopython-dev@lists.open-bio.org>
Message-ID: <01027F16-EBA0-41A2-B1F5-D0E128B0B08E@gmail.com>

Nick,

Can you provide a snippet of a file from mugsy for the unit tests?

Thanks,
Andrew

On Oct 31, 2012, at 9:00 AM, biopython-dev-request at lists.open-bio.org wrote:

> From: Nick Loman <n.j.loman at bham.ac.uk>
> Date: Tue, Oct 30, 2012 at 6:34 AM
> Subject: Pull Request: MafIO.py
> 
> 
> Hi there
> 
> Thanks for the MafIO branch. In order to get it to read MAF files produced
> by Mugsy (mugsy.sourceforge.net) I had to make the following change:
> 
> diff --git a/Bio/AlignIO/MafIO.py b/Bio/AlignIO/MafIO.py
> index 6eda0ca..4bb1407 100644
> --- a/Bio/AlignIO/MafIO.py
> +++ b/Bio/AlignIO/MafIO.py
> @@ -178,7 +178,7 @@ def MafIterator(handle, seq_count = None, alphabet =
> single_letter_alphabet):
> 
>              annotations = dict([x.split("=") for x in
> line.strip().split()[1:]])
> 
> -            if len([x for x in annotations.keys() if x not in ("score",
> "pass")]) > 0:
> +            if len([x for x in annotations.keys() if x not in ("score",
> "pass", "label", "mult")]) > 0:
>                 raise ValueError("Error parsing alignment - invalid key in
> 'a' line")
>         elif line.startswith("#"):
>             # ignore comments
> 
> 
> My Python fork is a bit confusing right now so hope you don't mind me
> sending this pull request via email!
> 
> Cheers
> 
> Nick


From redmine at redmine.open-bio.org  Wed Oct 31 19:09:57 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 31 Oct 2012 19:09:57 +0000
Subject: [Biopython-dev] [Biopython - Bug #3297] newline added in quated
	features
References: <redmine.issue-3297.20110926204742@redmine.open-bio.org>
Message-ID: <redmine.journal-14991.20121031190957@redmine.open-bio.org>


Issue #3297 has been updated by Chris Fields.

Assignee changed from Bioperl Guts to Biopython Dev Mailing List

Changing default assignee.
----------------------------------------
Bug #3297: newline added in quated features
https://redmine.open-bio.org/issues/3297

Author: Jesse van Dam
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: 
Target version: 
URL: 


Note: sorry for the duplicate report; I did not notice how the bug reporting system was set up.

When I have a feature line like the following (spanning multiple lines) in a GenBank file:

<pre>
                     /product="Glutamate synthase [NADPH] small chain (EC 1.4.1
                     .13)"

</pre>

then a space/newline is added between 1.4.1 and .13 in the result, so printing the feature with the following code
<pre>
  print(source[0].qualifiers["product"])
</pre>

prints the value with an unwanted space:
<pre>
Glutamate synthase [NADPH] small chain (EC 1.4.1 .13)
</pre>

I changed the following in scanner.py to fix this problem:
<pre>
                    elif value[0]=='"':
                        #Quoted...
                        if value[-1]!='"' or value!='"':
                            #No closing quote on the first line...
                            while value[-1] != '"':
-                               value += "\n" + iterator.next() 
+                               value += iterator.next() 
                        else:
                            #One single line (quoted)
                            assert value == '"'
                            if self.debug : print "Quoted line %s:%s" % (key, value)
                        #DO NOT remove the quotes...
                        qualifiers.append((key,value))

</pre>
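
For illustration, the effect of the separator can be sketched outside the
scanner. This is a sketch under assumptions: join_quoted_value is a
hypothetical helper, not the code in scanner.py, and it only models
appending continuation lines until the closing quote is seen.

```python
def join_quoted_value(first_line, continuation_lines, separator=""):
    """Append continuation lines until the value ends with a closing quote.

    separator is inserted at each line join: "" mirrors the change proposed
    above, "\n" mirrors the behaviour before the change.
    """
    value = first_line
    remaining = iter(continuation_lines)
    # A lone '"' is an opening quote only, so require more than one character.
    while not (len(value) > 1 and value.endswith('"')):
        try:
            value += separator + next(remaining)
        except StopIteration:
            raise ValueError("Unterminated quoted value: %r" % value)
    return value


first = '"Glutamate synthase [NADPH] small chain (EC 1.4.1'
rest = ['.13)"']
joined_plain = join_quoted_value(first, rest)
joined_newline = join_quoted_value(first, rest, separator="\n")
```

With an empty separator the EC number comes out as 1.4.1.13. One caveat to
check before adopting it: if a value is wrapped at a space and the file does
not keep that space, an empty separator would fuse the two words on either
side of the line break.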


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org