From b.invergo at gmail.com Mon Oct 1 05:52:04 2012 From: b.invergo at gmail.com (Brandon Invergo) Date: Mon, 01 Oct 2012 11:52:04 +0200 Subject: [Biopython-dev] PAML test problems under Python 3.3.0 In-Reply-To: References: Message-ID: <87k3vazfi3.fsf@invergo.net> Yes no problem, I can take a look at it. I'm completely swamped at the moment, though, so I might have to put it off for a couple of days. If it's an emergency, let me know. -brandon Peter Cock writes: > Hi Brandon (et al), > > Could you have a look at the PAML unit tests under Python 3.3 please? > I see a mix of failures and 'blocking' under a self-compiled Python 3.3.0 > on Mac OS X 10.8 (Mountain Lion): > > $ python3 test_PAML_yn00.py > testAlignmentExists (__main__.ModTest) ... ok > testAlignmentFileIsValid (__main__.ModTest) ... FAIL > testAlignmentSpecified (__main__.ModTest) ... ok > testCtlFileExistsOnRead (__main__.ModTest) ... ok > testCtlFileExistsOnRun (__main__.ModTest) ... ok > testCtlFileValidOnRead (__main__.ModTest) ... ERROR > testCtlFileValidOnRun (__main__.ModTest) ... ok > testOptionExists (__main__.ModTest) ... ok > testOutputFileSpecified (__main__.ModTest) ... ok > testOutputFileValid (__main__.ModTest) ... ok > testParseAllVersions (__main__.ModTest) ... ok > testResultsExist (__main__.ModTest) ... ok > testResultsParsable (__main__.ModTest) ... ok > testResultsValid (__main__.ModTest) ... ^C > > $ python3 test_PAML_codeml.py > testAlignmentExists (__main__.ModTest) ... ok > testAlignmentFileIsValid (__main__.ModTest) ... FAIL > testAlignmentSpecified (__main__.ModTest) ... ok > testCtlFileExistsOnRead (__main__.ModTest) ... ok > testCtlFileExistsOnRun (__main__.ModTest) ... ok > testCtlFileValidOnRead (__main__.ModTest) ... ERROR > testCtlFileValidOnRun (__main__.ModTest) ... ok > testOptionExists (__main__.ModTest) ... ok > testOutputFileSpecified (__main__.ModTest) ... ok > testOutputFileValid (__main__.ModTest) ... ok > testPamlErrorsCaught (__main__.ModTest) ... 
ok > testParseAA (__main__.ModTest) ... ok > testParseAAPairwise (__main__.ModTest) ... ok > testParseAllNSsites (__main__.ModTest) ... ok > testParseBranchSiteA (__main__.ModTest) ... ok > testParseCladeModelC (__main__.ModTest) ... ok > testParseFreeRatio (__main__.ModTest) ... ok > testParseNSsite3 (__main__.ModTest) ... ok > testParseNgene2Mgene02 (__main__.ModTest) ... ok > testParseNgene2Mgene1 (__main__.ModTest) ... ok > testParseNgene2Mgene34 (__main__.ModTest) ... ok > testParsePairwise (__main__.ModTest) ... ok > testParseSEs (__main__.ModTest) ... ok > testResultsExist (__main__.ModTest) ... ok > testResultsParsable (__main__.ModTest) ... ok > testResultsValid (__main__.ModTest) ... ^C > > $ python3 test_PAML_baseml.py > testAlignmentExists (__main__.ModTest) ... ok > testAlignmentFileIsValid (__main__.ModTest) ... FAIL > testAlignmentSpecified (__main__.ModTest) ... ok > testCtlFileExistsOnRead (__main__.ModTest) ... ok > testCtlFileExistsOnRun (__main__.ModTest) ... ok > testCtlFileValidOnRead (__main__.ModTest) ... ERROR > testCtlFileValidOnRun (__main__.ModTest) ... ok > testOptionExists (__main__.ModTest) ... ok > testOutputFileSpecified (__main__.ModTest) ... ok > testOutputFileValid (__main__.ModTest) ... ok > testPamlErrorsCaught (__main__.ModTest) ... ok > testParseAllVersions (__main__.ModTest) ... ok > testParseAlpha1Rho1 (__main__.ModTest) ... ok > testParseModel (__main__.ModTest) ... ok > testParseNhomo (__main__.ModTest) ... ok > testParseSEs (__main__.ModTest) ... ok > testResultsExist (__main__.ModTest) ... ok > testResultsParsable (__main__.ModTest) ... ok > testResultsValid (__main__.ModTest) ... ^C > > If you've not tried this before, the procedure I'm using is: > > $ python3 setup.py build > $ cd build/py3.3/Tests > $ python3 test_PAML_baseml.py > etc > > The key point is to run the tests directly (rather than > just via 'python3 setup.py test') you must change > director to the 2to3 converted folder under the build > folder. 
> > By commenting out the test methods which seem to > blocking, it seems some of the failures are to do with > exception handling. I've not dug any further into this. > > Thanks, > > Peter From bjoern at gruenings.eu Mon Oct 1 17:44:10 2012 From: bjoern at gruenings.eu (=?ISO-8859-1?Q?Bj=F6rn_Gr=FCning?=) Date: Mon, 01 Oct 2012 23:44:10 +0200 Subject: [Biopython-dev] [Patch] Genbank Parser In-Reply-To: References: <1348837402.21455.1.camel@threonin> Message-ID: <1349127850.19730.11.camel@threonin> Hi Peter, > > > > the tbl2asn tool from the ncbi creates genbank files that did not have a > > version number. Unfortunately that version number is used to fill > > consumer.data.id. > > I implemented the following fall-back: > > If there is no version information available than it takes the > > consumer.data.name for the consumer.data.id. Does that makes sense? > > > > Thanks! > > Bjoern > > Can you share some example output from tbl2asn that shows > this problem? Ideally something small we could include as a > unit test. please find attached a small, stripped version of such an genbank file. Thanks, Bjoern > Thanks, > > Peter -------------- next part -------------- A non-text attachment was scrubbed... Name: tbl1asn_output.gb Type: application/x-gameboy-rom Size: 5090 bytes Desc: URL: From p.j.a.cock at googlemail.com Thu Oct 4 05:11:01 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 4 Oct 2012 10:11:01 +0100 Subject: [Biopython-dev] [Patch] Genbank Parser In-Reply-To: <1349127850.19730.11.camel@threonin> References: <1348837402.21455.1.camel@threonin> <1349127850.19730.11.camel@threonin> Message-ID: On Mon, Oct 1, 2012 at 10:44 PM, Bj?rn Gr?ning wrote: > Hi Peter, > >> > >> > the tbl2asn tool from the ncbi creates genbank files that did not have a >> > version number. Unfortunately that version number is used to fill >> > consumer.data.id. 
>> > I implemented the following fall-back: >> > If there is no version information available than it takes the >> > consumer.data.name for the consumer.data.id. Does that makes sense? >> > >> > Thanks! >> > Bjoern >> >> Can you share some example output from tbl2asn that shows >> this problem? Ideally something small we could include as a >> unit test. > > please find attached a small, stripped version of such an genbank file. > > Thanks, > Bjoern $ python Python 2.7.2 (default, Jun 20 2012, 16:23:33) [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> from Bio import SeqIO >>> r = SeqIO.read("tbl1asn_output.gb", "gb") /Library/Python/2.7/site-packages/Bio/GenBank/__init__.py:1158: BiopythonParserWarning: Expected sequence length 300246, found 2220 (). BiopythonParserWarning) >>> r.id '' >>> r.name 'Seq1' >>> r.description 'Glarea strain lozoyensis.' >>> quit() That warning is because this test file has only the start of the sequence present, yet the LOCUS line still gives the original length. $ head tbl1asn_output.gb LOCUS Seq1 300246 bp DNA linear 10-MAY-2012 DEFINITION Glarea strain lozoyensis. ACCESSION VERSION KEYWORDS . SOURCE Glarea ORGANISM Glarea Unclassified. REFERENCE 1 AUTHORS Test I didn't use your patch - looking over the code, it was already intended that if there was no record.id that record.name would be used. Sadly this was a bit too strict about None versus an empty string, fixed: https://github.com/biopython/biopython/commit/e67d22e4b4f344a5a3c15b6e939c82f58986d87f Thanks for your help, Peter From chapmanb at 50mail.com Thu Oct 4 21:02:06 2012 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 04 Oct 2012 21:02:06 -0400 Subject: [Biopython-dev] TAIR/AGI support In-Reply-To: References: <87txvcx9ls.fsf@fastmail.fm> Message-ID: <874nm9g29d.fsf@fastmail.fm> Kevin; Thanks for making this available. 
This looks like a great start and seems like it would be a nice starting place for folks dealing with Arabidopsis data. A couple of thoughts which you've essentially already covered: - Could you build up a small test suite that fits into the testing framework: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc246 Your probably the best person to pick some disparate IDs that exercise different components and try to catch any edge cases. - Additional interfaces that help folks do more than get sequence are a great idea. The ideas you've proposed below sound perfect. - Provide some documentation on the Cookbook for common use cases with Biopython + your module. This will help motivate the addition and also help folks test it out on their data. Thanks again for making this available, Brad > Hi Brad, > > My TAIR/AGI script is on github here: > https://github.com/kdmurray91/biopython/blob/master/Bio/TAIR/__init__.py > > I got it to work directly from TAIR's website, however it has not been > rigorously tested. I plan on implementing the process as i described in my > previous email, whereby it fetches the Genbank record from TOGOws or via > NCBI's Efetch (using biopython's interfaces of course). I will keep you all > posted. > > To the list in general, I'm open to suggestions on what to work on next? > > > Regards > Kevin Murray > > > On 6 September 2012 10:45, Brad Chapman wrote: > >> >> Kevin; >> Thanks for the e-mail and offers of code. Always happy to have other >> folks involved with the project. >> >> > What's the status of TAIR AGIs in BioPython (I can see no mention of >> them, >> > or support for them)? I've written a brief module which allows a user to >> > query NCBI with a TAIR AGI, returning a Seq object (via Efetch). Is there >> > any interest in including such functionality in BioPython? >> >> Is the code available on GitHub to get a better sense of all the >> functionality it supports? Do you have an idea where it would fit best? 
>> As a tair submodule inside of Bio.Entrez, or somewhere else? >> >> > More generally, are there any particular areas of BioPython development >> > which could use an extra pair of hands? >> >> Following the mailing list for discussions on current projects is the >> best way to get a sense of what different folks are working on. The >> issue tracker also has open issues and features that could use attention >> if anything there strikes your fancy: >> >> https://redmine.open-bio.org/projects/biopython >> >> Hope this helps, >> Brad >> >> From tiagoantao at gmail.com Fri Oct 5 23:21:50 2012 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Fri, 5 Oct 2012 20:21:50 -0700 Subject: [Biopython-dev] Away Re: buildbot failure in Biopython on Windows XP - Python 2.5 Message-ID: I am currently away from office. I will respond back on as soon as I retunr. Regards, Tiago -- "Liberty for wolves is death to the lambs" - Isaiah Berlin From chris.mit7 at gmail.com Sun Oct 7 22:48:20 2012 From: chris.mit7 at gmail.com (Chris Mitchell) Date: Sun, 7 Oct 2012 22:48:20 -0400 Subject: [Biopython-dev] Proteomics/Mass Spec in Biopython Message-ID: Hi everyone, I recall some time ago there was an email about getting some mass spec functionality within BioPython. I started a BioPython branch to incorporate some iterators for common file types. Of note, there is an iterator for .msf files created by Proteome Discoverer, which thankfully is light-years faster than using PD (and much more forgiving on memory...). It's located here: https://github.com/chrismit/biopython/tree/Proteomics It's following along the progression of my spectra viewer, which is hosted on the same repository (which, for anyone using linux might want to look at; I couldn't find a spectra viewer I liked for linux.). As I generalize more of the methods within that program I'll be adding them to the BioPython branch. 
Also, I'll be putting in some methods to take care of other common tasks such as FDR calculation from the input files. I'd love to hear if anyone else wants to join up on this branch or provide suggestions. Chris From redmine at redmine.open-bio.org Wed Oct 10 09:02:23 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Wed, 10 Oct 2012 13:02:23 +0000 Subject: [Biopython-dev] [Biopython - Bug #3386] (New) NewickIO parse_tree is slow Message-ID: Issue #3386 has been reported by Aleksey Kladov. ---------------------------------------- Bug #3386: NewickIO parse_tree is slow https://redmine.open-bio.org/issues/3386 Author: Aleksey Kladov Status: New Priority: Normal Assignee: Category: Target version: URL: In the file NewickIO.py, the method _parse_subtree of class Parser seems to be inefficient in both time and space. In fact, its running time is quadratic with respect to the size of the input, while it could be linear. The problem is that each symbol is read many times (up to O(len(text))), for example here
        for posn in range(1, close_posn):
            if text[posn] == '(':
                plevel += 1
            elif text[posn] == ')':
                plevel -= 1
            elif text[posn] == ',' and plevel == 0:
                subtrees.append(text[prev:posn])
                prev = posn + 1
or here
comment_start = text.find(NODECOMMENT_START)
Also, _parse_subtree relies heavily on slicing and stripping strings, which gives quadratic memory consumption. Here is my quick-and-dirty patched implementation. It is incomplete in many respects; I wrote it only to prove that parsing can be done faster. For an unrooted binary tree with 15000 leaves it runs in 1 second, compared to 13 seconds with the current implementation.
    def _parse_tree(self, text, rooted):
        """Parses the text representation into a Tree object."""
        # XXX Pass **kwargs along from Parser.parse?
        return Newick.Tree(root=self._parse_subtree_fast(text)[0], rooted=rooted)

    def _parse_subtree_fast(self, text):
        """Parse one subtree; return (clade, remaining unparsed text)."""
        # Assumes "import re" at module level.
        label_re = re.compile(r'[A-Za-z0-9_]+')
        children = []
        if text.startswith('('):
            text = text[1:]
            while True:
                child, text = self._parse_subtree_fast(text)
                children.append(child)
                if text.startswith(','):
                    # Another sibling follows.
                    text = text[1:]
                else:
                    # Assumed to be the closing ')'; not checked.
                    text = text[1:]
                    break
        # Optional label following the (possibly empty) list of children.
        m = label_re.match(text)
        if m:
            clade = self._parse_tag(m.group())
            text = text[m.end():]
        else:
            clade = Newick.Clade(comment=None)
        clade.clades = children
        return clade, text
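For illustration only (this is not Biopython's NewickIO code): the patch above still re-slices text after every token, so each step copies the remaining string, and it recurses once per nesting level. A minimal sketch of a single-pass alternative follows, assuming a simplified Newick dialect without bracketed comments or quoted labels. Node and parse_newick are hypothetical names, not part of Bio.Phylo.

```python
import re


class Node:
    """Hypothetical, simplified stand-in for a clade (not Bio.Phylo's Clade)."""

    def __init__(self, name=None, branch_length=None):
        self.name = name
        self.branch_length = branch_length
        self.children = []


# One token per match: a structural character, or a run of label characters.
_TOKEN = re.compile(r"[(),;]|[^(),;]+")


def parse_newick(text):
    """Parse a Newick string in one left-to-right pass, without slicing text."""
    root = Node()
    current = root
    stack = []  # ancestors of `current`; explicit stack avoids deep recursion
    for match in _TOKEN.finditer(text):
        tok = match.group()
        if tok == "(":
            # Open the first child of the current node and descend into it.
            child = Node()
            current.children.append(child)
            stack.append(current)
            current = child
        elif tok == ",":
            # Start a new sibling under the same parent.
            if not stack:
                raise ValueError("comma outside parentheses")
            current = Node()
            stack[-1].children.append(current)
        elif tok == ")":
            # Close the child list and return to the parent.
            if not stack:
                raise ValueError("unbalanced parentheses")
            current = stack.pop()
        elif tok == ";":
            break
        elif tok.strip():
            # Label token: "name", "name:length" or ":length".
            name, _, length = tok.strip().partition(":")
            current.name = name or None
            current.branch_length = float(length) if length else None
    if stack:
        raise ValueError("unbalanced parentheses")
    return root


tree = parse_newick("((A:0.1,B:0.2)AB:0.3,C:0.4)root;")
print([child.name for child in tree.children])  # ['AB', 'C']
```

Each character is visited once by the tokenizer, and the only copies made are the short label tokens, so time and memory should grow linearly with the input under the assumptions above.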
PS. I don't know if someone really needs to parse huge trees with BioPython, but I need this feature for a couple of http://rosalind.info problems ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From kjwu at ucsd.edu Wed Oct 10 17:27:19 2012 From: kjwu at ucsd.edu (Kevin Wu) Date: Wed, 10 Oct 2012 14:27:19 -0700 Subject: [Biopython-dev] KEGG API Wrapper Message-ID: Hi, I've written a simple wrapper on top of KEGG's new REST API ( http://www.kegg.jp/kegg/docs/keggapi.html). The main functionality of this module is that it can detect some invalid queries based on KEGG's defined rules. I've implemented each of the examples given in the API docs as tests as well. Here's a quick example of its usage. The API call to http://rest.kegg.jp/get/hsa:10458+ece:Z5100/aaseq can be done using the wrapper as: KEGG.query("get", ["hsa:10458", "ece:Z5100"], "aaseq") Querying the API works well with the current parsers written for KEGG formats. Let me know if there are issues or if it's useful enough to be merged into Biopython! https://github.com/kevinwuhoo/biopython Thanks! Kevin From mjldehoon at yahoo.com Sat Oct 13 07:38:04 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 13 Oct 2012 04:38:04 -0700 (PDT) Subject: [Biopython-dev] KEGG API Wrapper In-Reply-To: Message-ID: <1350128284.4997.YahooMailClassic@web164003.mail.gq1.yahoo.com> Hi Kevin, It would be great to have better KEGG support in Biopython, so I think that this is useful and could in principle be merged into Biopython. But before we do so, is there some documentation for your code (or even better, for the Bio.KEGG module as a whole)? Then it's easier to see how this module can be used, and to discuss any modifications.
Thanks for your contribution! -Michiel. --- On Wed, 10/10/12, Kevin Wu wrote: > From: Kevin Wu > Subject: [Biopython-dev] KEGG API Wrapper > To: Biopython-dev at lists.open-bio.org > Date: Wednesday, October 10, 2012, 5:27 PM > Hi, > > I've written a simple wrapper on top of KEGG's new REST API > ( > http://www.kegg.jp/kegg/docs/keggapi.html). The main > functionality of this > module is that can detect some invalid queries based on > kegg's defined > rules. I've implemented each of the examples given on the > api docs as tests > as well. Here's a quick example of its usage. > > The api call to http://rest.kegg.jp/get/hsa:10458+ece:Z5100/aaseq can > be > done using the wrapper as: > KEGG.query("get", ["hsa:10458", "ece:Z5100"], "aaseq") > > Querying the api works well with the current parsers written > for KEGG > formats. Let me know if there are issues or if it's useful > enough to be > merged into Biopython! > > https://github.com/kevinwuhoo/biopython > > Thanks! > Kevin > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From chapmanb at 50mail.com Mon Oct 15 11:02:12 2012 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 15 Oct 2012 11:02:12 -0400 Subject: [Biopython-dev] BOSC/Broad Interoperability Hackathon: potential dates Message-ID: <87ipabeq2z.fsf@fastmail.fm> Hi all; Open Bio regularly organizes hackathon coding sessions in conjunction with the Bioinformatics Open Source Conference. The goal is to get together biologists writing open source code, provide a room and internet, and encourage fun collaborative coding. We've had successful two day Codefests the past three years: http://www.open-bio.org/wiki/Codefest_2012 This year, the Broad Institute kindly offered to host a two day Hackathon in Boston during April. 
We've proposed three sets of dates: April 4-5th, Thursday and Friday before Bio-IT April 7-8th, Sunday and Monday before Bio-IT April 22-23rd, Monday and Tuesday If you have interest in attending, please fill out this Doodle poll to let us know which dates work best: http://doodle.com/aapy694g43e6ya4f If you can find funds for travel and hotel (or are local to Boston), the event is free and everyone is welcome. As we finalize dates, we'll send around additional details. Thanks everyone, Brad From k.d.murray.91 at gmail.com Mon Oct 15 23:49:22 2012 From: k.d.murray.91 at gmail.com (Kevin Murray) Date: Tue, 16 Oct 2012 14:49:22 +1100 Subject: [Biopython-dev] TAIR/AGI support In-Reply-To: <874nm9g29d.fsf@fastmail.fm> References: <87txvcx9ls.fsf@fastmail.fm> <874nm9g29d.fsf@fastmail.fm> Message-ID: Brad, I shall work on this as time permits, and get back to you all when complete. Cheers, Regards Kevin Murray On 5 October 2012 11:02, Brad Chapman wrote: > > Kevin; > Thanks for making this available. This looks like a great start and > seems like it would be a nice starting place for folks dealing with > Arabidopsis data. A couple of thoughts which you've essentially already > covered: > > - Could you build up a small test suite that fits into the testing > framework: > > http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc246 > > Your probably the best person to pick some disparate IDs that exercise > different components and try to catch any edge cases. > > - Additional interfaces that help folks do more than get sequence are a > great idea. The ideas you've proposed below sound perfect. > > - Provide some documentation on the Cookbook for common use cases with > Biopython + your module. This will help motivate the addition and also > help folks test it out on their data. 
> > Thanks again for making this available, > Brad > > > > Hi Brad, > > > > My TAIR/AGI script is on github here: > > https://github.com/kdmurray91/biopython/blob/master/Bio/TAIR/__init__.py > > > > I got it to work directly from TAIR's website, however it has not been > > rigorously tested. I plan on implementing the process as i described in > my > > previous email, whereby it fetches the Genbank record from TOGOws or via > > NCBI's Efetch (using biopython's interfaces of course). I will keep you > all > > posted. > > > > To the list in general, I'm open to suggestions on what to work on next? > > > > > > Regards > > Kevin Murray > > > > > > On 6 September 2012 10:45, Brad Chapman wrote: > > > >> > >> Kevin; > >> Thanks for the e-mail and offers of code. Always happy to have other > >> folks involved with the project. > >> > >> > What's the status of TAIR AGIs in BioPython (I can see no mention of > >> them, > >> > or support for them)? I've written a brief module which allows a user > to > >> > query NCBI with a TAIR AGI, returning a Seq object (via Efetch). Is > there > >> > any interest in including such functionality in BioPython? > >> > >> Is the code available on GitHub to get a better sense of all the > >> functionality it supports? Do you have an idea where it would fit best? > >> As a tair submodule inside of Bio.Entrez, or somewhere else? > >> > >> > More generally, are there any particular areas of BioPython > development > >> > which could use an extra pair of hands? > >> > >> Following the mailing list for discussions on current projects is the > >> best way to get a sense of what different folks are working on. 
The > >> issue tracker also has open issues and features that could use attention > >> if anything there strikes your fancy: > >> > >> https://redmine.open-bio.org/projects/biopython > >> > >> Hope this helps, > >> Brad > >> > >> > From zcharlop at mail.rockefeller.edu Tue Oct 16 19:55:26 2012 From: zcharlop at mail.rockefeller.edu (Zachary Charlop-Powers) Date: Tue, 16 Oct 2012 23:55:26 +0000 Subject: [Biopython-dev] KEGG API Wrapper In-Reply-To: <1350128284.4997.YahooMailClassic@web164003.mail.gq1.yahoo.com> References: <1350128284.4997.YahooMailClassic@web164003.mail.gq1.yahoo.com> Message-ID: Kevin, Michiel, I just tested Kevin's code for a few simple queries and it worked great. I have always liked KEGG's organization of data and really appreciate this RESTful interface to their data; in some ways I think it easier to use the web interfaces for KEGG than it is for NCBI. Plus the KEGG coverage of metabolic networks is awesome. I found the examples in Kevin's test script to be fairly self-explanatory but a simple-spelled out example in the Tutorial would be nice. One thought, though, is that you can retrieve MANY different types of data from the KEGG Rest API - which means that the user will probably have to parse the data his/herself. Data retrieved with "list" can return lists of genes or compounds or organism and after a cursory look these are each formatted differently. Also true with the 'find' command. So I think you were right to leave out parsers because i think they will be a moving target highly dependent on the query. Thank You Kevin, zach cp On Oct 13, 2012, at 7:38 AM, Michiel de Hoon > wrote: Hi Kevin, It would be great to have better KEGG support in Biopython, so I think that this is useful and could in principle be merged into Biopython. But before we do so, is there some documentation for your code (or even better, for the Bio.KEGG module as a whole)? Then it's easier to see how this module can be used, and to discuss any modifications. 
Thanks for your contribution! -Michiel. --- On Wed, 10/10/12, Kevin Wu > wrote: From: Kevin Wu > Subject: [Biopython-dev] KEGG API Wrapper To: Biopython-dev at lists.open-bio.org Date: Wednesday, October 10, 2012, 5:27 PM Hi, I've written a simple wrapper on top of KEGG's new REST API ( http://www.kegg.jp/kegg/docs/keggapi.html). The main functionality of this module is that can detect some invalid queries based on kegg's defined rules. I've implemented each of the examples given on the api docs as tests as well. Here's a quick example of its usage. The api call to http://rest.kegg.jp/get/hsa:10458+ece:Z5100/aaseq can be done using the wrapper as: KEGG.query("get", ["hsa:10458", "ece:Z5100"], "aaseq") Querying the api works well with the current parsers written for KEGG formats. Let me know if there are issues or if it's useful enough to be merged into Biopython! https://github.com/kevinwuhoo/biopython Thanks! Kevin _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev Zach Charlop-Powers Post-Doctoral Fellow Laboratory of Genetically Encoded Small Molecules Rockefeller University zcharlop at rockefeller.edu From p.j.a.cock at googlemail.com Wed Oct 17 07:09:07 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 17 Oct 2012 12:09:07 +0100 Subject: [Biopython-dev] KEGG API Wrapper In-Reply-To: References: <1350128284.4997.YahooMailClassic@web164003.mail.gq1.yahoo.com> Message-ID: On Wed, Oct 17, 2012 at 12:55 AM, Zachary Charlop-Powers wrote: > Kevin, > Michiel, > > I just tested Kevin's code for a few simple queries and it worked great. 
I > have always liked KEGG's organization of data and really appreciate this > RESTful interface to their data; in some ways I think it easier to use the > web interfaces for KEGG than it is for NCBI. Plus the KEGG coverage of > metabolic networks is awesome. I found the examples in Kevin's test script > to be fairly self-explanatory but a simple-spelled out example in the > Tutorial would be nice. > > One thought, though, is that you can retrieve MANY different types of data > from the KEGG Rest API - which means that the user will probably have to > parse the data his/herself. Data retrieved with "list" can return lists of > genes or compounds or organism and after a cursory look these are each > formatted differently. Also true with the 'find' command. So I think you > were right to leave out parsers because i think they will be a moving target > highly dependent on the query. > > Thank You Kevin, > zach cp Good point about decoupling the web API wrapper and the parsers - how the Bio.Entrez module and Bio.TogoWS handle this is to return handles for web results, which you can then parse with an appropriate parser (e.g. SeqIO for GenBank files, Medline parser, etc). Note that this is a little more fiddly under Python 3 due to the text mode distinction between unicode and binary... just something to keep in the back of your mind. Peter From redmine at redmine.open-bio.org Wed Oct 17 09:27:18 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Wed, 17 Oct 2012 13:27:18 +0000 Subject: [Biopython-dev] [Biopython - Bug #3387] (New) Generic per column annotation from stockholm alignment are not stored in alignment object Message-ID: Issue #3387 has been reported by saverio vicario. 
---------------------------------------- Bug #3387: Generic per column annotation from stockholm alignment are not stored in alignment object https://redmine.open-bio.org/issues/3387 Author: saverio vicario Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Stockholm format includes 4 types of annotations #=GF #=GC #=GS #=GR GC and GF annotation are not pickup by AlignIO and not supported in Bio.Align.MultipleSeqAlignment because no annotation is available at alignment level. In fact Bio.Align.MultipleSeqAlignment.annotations or Bio.Align.MultipleSeqAlignment.letter_annotations do not exist, only Bio.Align.MultipleSeqAlignment._annotations that is generated from the single records annotations and letter_annotations. GC annotation in stockholm contain the quality score of the sites (columns of the alignment) that is a quite important parameters to decide if to trim the sites or not. ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org
From redmine at redmine.open-bio.org Wed Oct 17 09:36:24 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Wed, 17 Oct 2012 13:36:24 +0000 Subject: [Biopython-dev] [Biopython - Bug #3387] Generic per column annotation from stockholm alignment are not stored in alignment object References: Message-ID: Issue #3387 has been updated by Peter Cock. The underlying alignment class would need a per-column-annotation dictionary (as well as an annotations dictionary, also on the TODO list), to match the per-letter-annotation and annotations dictionaries of the SeqRecord. Parsing this and putting it in alignment._letter_annotation (dictionary as a private variable) would be a reasonable short term hack if you'd like to work on that.
---------------------------------------- Bug #3387: Generic per column annotation from stockholm alignment are not stored in alignment object https://redmine.open-bio.org/issues/3387 Author: saverio vicario Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Stockholm format includes 4 types of annotations #=GF #=GC #=GS #=GR GC and GF annotation are not pickup by AlignIO and not supported in Bio.Align.MultipleSeqAlignment because no annotation is available at alignment level. In fact Bio.Align.MultipleSeqAlignment.annotations or Bio.Align.MultipleSeqAlignment.letter_annotations do not exist, only Bio.Align.MultipleSeqAlignment._annotations that is generated from the single records annotations and letter_annotations. GC annotation in stockholm contain the quality score of the sites (columns of the alignment) that is a quite important parameters to decide if to trim the sites or not. -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Wed Oct 17 09:39:25 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Wed, 17 Oct 2012 13:39:25 +0000 Subject: [Biopython-dev] [Biopython - Feature #3388] (New) add annotation and letter_annotations attributed for Bio.Align.MultipleSeqAlignment. object Message-ID: Issue #3388 has been reported by saverio vicario. ---------------------------------------- Feature #3388: add annotation and letter_annotations attributed for Bio.Align.MultipleSeqAlignment. object https://redmine.open-bio.org/issues/3388 Author: saverio vicario Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: At the moment I could not add annotation at alignment level. 
Annotation could be useful for tracking info linked to the loci (i.e. name of domain), while letter annotation could be useful to track the quality score of the alignment or whether the sites belong to a given character set. In particular, when two alignments are merged it would be useful that the boundary of the merge is tracked. For example, in the letter annotation of the merge of an alignment a with 10 sites and b with 5 sites, the letter_annotations would be as follows: {locus1:'111111111100000',locus2:'000000000011111'}. This could also be useful to annotate the 3 positions of codons: {pos1:'1001001001',pos2:'0100100100', pos3:'0010010010'}. If letter_annotations were supported, the annotation could be kept across merging and splitting of the alignment.
From redmine at redmine.open-bio.org Wed Oct 17 11:00:15 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Wed, 17 Oct 2012 15:00:15 +0000 Subject: [Biopython-dev] [Biopython - Bug #3387] Generic per column annotation from stockholm alignment are not stored in alignment object References: Message-ID: Issue #3387 has been updated by Peter Cock. Depends on issue #3388, add annotation and letter_annotations attributed to Bio.Align.MultipleSeqAlignment object https://redmine.open-bio.org/issues/3388
From redmine at redmine.open-bio.org Thu Oct 18 07:02:49 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 18 Oct 2012 11:02:49 +0000 Subject: [Biopython-dev] [Biopython - Bug #3387] Generic per column annotation from stockholm alignment are not stored in alignment object References: Message-ID: Issue #3387 has been updated by saverio vicario. File diff_StockholmIO.py added File StockholmIO.py added This is my proposed patch for StockholmIO. Attached you will find the new StockholmIO.py and a diff file against the old one. To highlight the new comments further, I start each of them with #SV. In summary, the patch implements the new attribute _letter_annotations for Bio.Align.MultipleSeqAlignment and stores the GC features in it in the iterator, while the writer writes the GC features after all sequence records, as stated in http://sonnhammer.sbc.su.se/Stockholm.html. I added a new dictionary for GC and GF features using the PFAM standard, and it is used in the writing phase to write only legitimate PFAM attributes. The only addition to the PFAM standard is the GC feature "RF", which is added by HMMer3.0 software to indicate which sites were originally present in the profile used to generate the alignment.
I do not use the dictionary of PFAM standard to translate the GF, GR attributes of alignment._annotations or the GC attributes in alignment._letter_annotations as is done in the seqRecord for consistency with decision taken originally with GR attributes in alignment._annotations
From p.j.a.cock at googlemail.com Thu Oct 18 14:33:04 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 18 Oct 2012 19:33:04 +0100 Subject: [Biopython-dev] PyPy 1.8 support? Message-ID: Hello all, We currently run the test suite against both PyPy 1.8 and 1.9 on Linux via the TravisCI.org continuous integration testing service. Is anyone actually using Biopython under PyPy 1.8? If not, I intend to drop automated testing under PyPy 1.8 and focus just on PyPy 1.9 instead.
(Automated testing under C Python 2.5, 2.6, 2.7, 3.1 and 3.2 etc will continue - I'm hoping to add Python 3.3 as well) Thanks, Peter From ben at benfulton.net Thu Oct 18 23:16:45 2012 From: ben at benfulton.net (Ben Fulton) Date: Thu, 18 Oct 2012 23:16:45 -0400 Subject: [Biopython-dev] Contributing startup Message-ID: Hi, I was looking for some introductory tickets or other methods to familiarize myself with the Biopython codebase. I saw some suggestions on the wiki to improve unit test coverage or to add additional file formats, which sounds fine - are there particular areas of code that lack coverage, or file formats that are particularly wanted? Or would it be better to look over the issue tracker and try to identify some smallish issues? Thanks for any suggestions. Ben Fulton From p.j.a.cock at googlemail.com Fri Oct 19 03:52:19 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 19 Oct 2012 08:52:19 +0100 Subject: [Biopython-dev] PyPy 1.8 support? In-Reply-To: References: Message-ID: On Thu, Oct 18, 2012 at 7:33 PM, Peter Cock wrote: > Hello all, > > We currently run the test suite against both PyPy 1.8 and > 1.9 on Linux via the TravisCI.org continuous integration > testing service. > > Is anyone actually using Biopython under PyPy 1.8? > > If not, I intend to drop automated testing under PyPy 1.8 > and focus just on PyPy 1.9 instead. Done on TravisCI, but easy to revert: https://github.com/biopython/biopython/commit/126c944812730df4677c8fa2f63abc29ddd084bb One reason was the previous build failed due to a timeout fetching PyPy for a custom install. Now we use the TravisCI provided PyPy which should avoid that issue. (It still happens for Jython sometimes). 
Peter From p.j.a.cock at googlemail.com Fri Oct 19 04:26:35 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 19 Oct 2012 09:26:35 +0100 Subject: [Biopython-dev] Contributing startup In-Reply-To: References: Message-ID: On Fri, Oct 19, 2012 at 4:16 AM, Ben Fulton wrote: > Hi, > > I was looking for some introductory tickets or other methods to familiarize > myself with the Biopython codebase. I saw some suggestions on the wiki to > improve unit test coverage or to add additional file formats, which sounds > fine - are there particular areas of code that lack coverage, or file > formats that are particularly wanted? Or would it be better to look over > the issue tracker and try to identify some smallish issues? > > Thanks for any suggestions. > > Ben Fulton Hi Ben, Welcome - more volunteer developers willing to help is always nice. You asked about test coverage, and while I could guess about things what might be most interesting would be to try and measure this using something like coverage or figleaf: http://nedbatchelder.com/code/coverage/ http://darcs.idyll.org/~t/projects/figleaf/doc/ Another general area would be improving our support under Python 3. In terms of specific modules, is there anything in particular which seems like a good match with your work/research interests? Regards, Peter From p.j.a.cock at googlemail.com Mon Oct 22 12:43:07 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 22 Oct 2012 17:43:07 +0100 Subject: [Biopython-dev] Low level string based FASTA parser Message-ID: Hello all, Something I've wanted/needed recently was a low-level FASTA iterating parser which just returns tuples of strings (without the overhead of Bio.SeqIO building SeqRecords). 
We don't currently have such a thing, so I have added one to the SeqIO Fasta module (mirroring the low level string-tuple parser for FASTQ files) with some associated unit tests and refactoring (separate commits): https://github.com/biopython/biopython/commit/751fe39765ca6ba60e517b3b4657718fd48f7817 Does anyone have any views on the name of this new function, currently SimpleFastaParser, used as follows: >>> from Bio.SeqIO.FastaIO import SimpleFastaParser >>> with open("Fasta/dups.fasta") as handle: ... for values in SimpleFastaParser(handle): ... print values ('alpha', 'ACGTA') ('beta', 'CGTC') ('gamma', 'CCGCC') ('alpha (again - this is a duplicate entry to test the indexing code)', 'ACGTA') ('delta', 'CGCGC') The capitalisation style is consistent with other functions in SeqIO, but not with PEP8. Peter P.S. I've also updated the legacy function quick_FASTA_reader in Bio.SeqUtils to use this. Since it loads the whole dataset into memory, if no one objects I would like to deprecate this old function. From p.j.a.cock at googlemail.com Mon Oct 22 13:08:47 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 22 Oct 2012 18:08:47 +0100 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: References: <1346926418.97489.YahooMailClassic@web164004.mail.gq1.yahoo.com> Message-ID: On Fri, Sep 28, 2012 at 11:50 AM, Peter Cock wrote: > On Thu, Sep 20, 2012 at 10:08 AM, Peter Cock wrote: >> On Sun, Sep 16, 2012 at 1:34 PM, Peter Cock wrote: >>>> >>>> I guess we need to have a little hack with the 2to3 library and >>>> try defining our own custom fixer for the imports... >>> >>> I've made a start at this - the easy part seems to work :) >>> >>> https://github.com/peterjc/biopython/commits/py3lower >>> >>> ... 
> > The code to do this lower case name mangling remains > a quite spaghetti like mess in do2to3.py but it now works > enough to pass the test suite (with some but not all 3rd > party dependencies installed) under Linux and my Mac > OS X machine (where like Windows I have a case > insensitive file system). > > ... > > So this idea to adopt PEP8 lower case module names > as part of supporting Python 3 appears to be technically > viable. Has anyone else tried this branch yet? Has the lower case module names under Python 3 idea grown on anyone? I think it makes sense in terms of a long term vision - I do expect to be primarily working under Python 3 within a couple of years. It occurs to me we can make a partial step in this direction with moving to a directory for Bio.Seq, since this could be Bio.seq instead. For example, we talked about something like this: Bio.Seq -> Bio.seq Bio.SeqRecord -> Bio.seq.record Bio.SeqFeature -> Bio.seq.feature Bio.SeqUtils -> Bio.seq.utils Bio.SearchIO -> Bio.seq.search I'm not 100% sure where the Bio.SeqIO top level functions would belong, either directly under Bio.seq or Bio.seq.record might work too. We can have imports setup so that all the classes etc are only defined once, e.g. Bio/seq/__init__.py could initially just contain 'from Bio.Seq import *' and so on. (We'd commit to maintaining the old namespace for at least as long as our standard deprecation cycle, longer ideally). Peter From p.j.a.cock at googlemail.com Mon Oct 22 13:17:34 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 22 Oct 2012 18:17:34 +0100 Subject: [Biopython-dev] Dropping Python 2.5 and Jython 2.5 support? Message-ID: Dear Biopythoneers, Would anyone object to us preparing to drop support for Python 2.5 and Jython 2.5, perhaps after the next Biopython release? To reassure those of you using Jython, we'd wait until Jython 2.7 is out first. Jython 2.7 is already in alpha, and brings support for C Python 2.7 language features. 
Thanks, Peter From eric.talevich at gmail.com Mon Oct 22 17:53:55 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 22 Oct 2012 17:53:55 -0400 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: References: <1346926418.97489.YahooMailClassic@web164004.mail.gq1.yahoo.com> Message-ID: On Mon, Oct 22, 2012 at 1:08 PM, Peter Cock wrote: > On Fri, Sep 28, 2012 at 11:50 AM, Peter Cock > wrote: > > On Thu, Sep 20, 2012 at 10:08 AM, Peter Cock > wrote: > >> On Sun, Sep 16, 2012 at 1:34 PM, Peter Cock > wrote: > >>>> > >>>> I guess we need to have a little hack with the 2to3 library and > >>>> try defining our own custom fixer for the imports... > >>> > >>> I've made a start at this - the easy part seems to work :) > >>> > >>> https://github.com/peterjc/biopython/commits/py3lower > >>> > >>> ... > > > > The code to do this lower case name mangling remains > > a quite spaghetti like mess in do2to3.py but it now works > > enough to pass the test suite (with some but not all 3rd > > party dependencies installed) under Linux and my Mac > > OS X machine (where like Windows I have a case > > insensitive file system). > > > > ... > > > > So this idea to adopt PEP8 lower case module names > > as part of supporting Python 3 appears to be technically > > viable. > > Has anyone else tried this branch yet? Has the lower case > module names under Python 3 idea grown on anyone? > I think it makes sense in terms of a long term vision - I do > expect to be primarily working under Python 3 within a > couple of years. > > It occurs to me we can make a partial step in this direction > with moving to a directory for Bio.Seq, since this could be > Bio.seq instead. 
For example, we talked about something > like this: > > Bio.Seq -> Bio.seq > Bio.SeqRecord -> Bio.seq.record > Bio.SeqFeature -> Bio.seq.feature > Bio.SeqUtils -> Bio.seq.utils > Bio.SearchIO -> Bio.seq.search > > I'm not 100% sure where the Bio.SeqIO top level functions > would belong, either directly under Bio.seq or Bio.seq.record > might work too. > Personally, I've used the variable name "seq" an awful lot, so I'm wary of using "seq" as a module name. However, reasonable coding style could make this easy to avoid if we have a "seq" module containing all of Seq, SeqRecord and SeqFeature (maybe even Alphabet), and "sequtil" containing standalone functions. Result: # Everything you need to build a new sequence record, but not much else from Bio.seq import Seq, SeqRecord, SeqFeature # Working with sequence strings from Bio import sequtil It also seems reasonable to treat molecular sequences as the implied core object type at the top-level namespace. From that viewpoint, Bio.Search would mean sequence search, as everything else is typically tucked away in a sub-module like PDB (pdb?), Motif (motif), or Phylo (phylo); then it's also fine to keep seqio and alignio directly under the Bio namespace. (Given a clean, I'd prefer "from Bio import Seq, SeqRecord, SeqFeature", but since those are already module names it would be brutal to make that transition now.) > We can have imports setup so that all the classes etc > are only defined once, e.g. Bio/seq/__init__.py could > initially just contain 'from Bio.Seq import *' and so on. > > Sounds cool. We'll need to watch out for the PDB module, where classes and modules have identical names, and the class names are imported to shadow the module names at import time. -Eric From p.j.a.cock at googlemail.com Mon Oct 22 18:59:21 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 22 Oct 2012 23:59:21 +0100 Subject: [Biopython-dev] PEP8 lower case module names? 
In-Reply-To: References: <1346926418.97489.YahooMailClassic@web164004.mail.gq1.yahoo.com> Message-ID:

On Mon, Oct 22, 2012 at 10:53 PM, Eric Talevich wrote:
>
> Personally, I've used the variable name "seq" an awful lot, so I'm wary of
> using "seq" as a module name. However, reasonable coding style could make
> this easy to avoid if we have a "seq" module containing all of Seq,
> SeqRecord and SeqFeature (maybe even Alphabet), and "sequtil" containing
> standalone functions.
>
> Result:
>
> # Everything you need to build a new sequence record, but not much else
> from Bio.seq import Seq, SeqRecord, SeqFeature

I'd been picturing:

from Bio.seq import Seq
from Bio.seq.record import SeqRecord
from Bio.seq.feature import SeqFeature

but you're right, those three classes could all be exposed at the level of Bio.seq (while still having the SeqRecord defined in the file Bio/seq/record.py and SeqFeature etc in Bio/seq/feature.py) for convenience.

> # Working with sequence strings
> from Bio import sequtil

If you mean strings rather than Seq objects, currently Bio.SeqUtils should mostly work on Seq or strings. It is kind of an odds and ends module, rather than deliberately focusing on sequences as strings.

> It also seems reasonable to treat molecular sequences as the implied core
> object type at the top-level namespace. From that viewpoint, Bio.Search
> would mean sequence search, as everything else is typically tucked away in a
> sub-module like PDB (pdb?), Motif (motif), or Phylo (phylo); then it's also
> fine to keep seqio and alignio directly under the Bio namespace.

Having sequence stuff collected under Bio.Seq or Bio.seq (or bio.seq if we go with the lower case plan for Python 3) seems more organised. It also keeps the import times down for people not working with sequences (e.g. a script using clustering or PDB files).
> (Given a clean, I'd prefer "from Bio import Seq, SeqRecord, SeqFeature", but > since those are already module names it would be brutal to make that > transition now.) That isn't a good plan anyway in terms of polluting the namespace and loading things into memory for anyone not working with sequences. >> We can have imports setup so that all the classes etc >> are only defined once, e.g. Bio/seq/__init__.py could >> initially just contain 'from Bio.Seq import *' and so on. >> > > Sounds cool. We'll need to watch out for the PDB module, where classes and > modules have identical names, and the class names are imported to shadow the > module names at import time. The shadowing was one of the gotchas in the auto-conversion of all the module names to lower case - but solvable. Adopting lower case module names has the bonus of fixing this in the long term. Peter From kjwu at ucsd.edu Wed Oct 24 18:38:04 2012 From: kjwu at ucsd.edu (Kevin Wu) Date: Wed, 24 Oct 2012 15:38:04 -0700 Subject: [Biopython-dev] KEGG API Wrapper In-Reply-To: References: <1350128284.4997.YahooMailClassic@web164003.mail.gq1.yahoo.com> Message-ID: Hi All, Thanks for the comments, I've written a bit of documentation on the entire KEGG module and have attached those relevant pages to the email. There didn't seem like an appropriate place for examples, so I just added a new chapter. I've also committed the updated file to github. I did leave out the parsers due to the fact that the current parsers only cover a small portion of possible responses from the api. Also, I'm not confident that the some of the parsers correctly retrieves all the fields. However, I've written a really general parser that does a rough job of retrieving fields if it's a database format returned since I find myself reusing the code for all database formats. It's possible to modify this to correctly account for the different fields, but would probably take a bit of work to manually figure each field out. 
Otherwise it also parses the tsv/flat file returned. Also, @zach, thanks for checking it out and testing it! Thanks All! Kevin On Wed, Oct 17, 2012 at 4:09 AM, Peter Cock wrote: > On Wed, Oct 17, 2012 at 12:55 AM, Zachary Charlop-Powers > wrote: > > Kevin, > > Michiel, > > > > I just tested Kevin's code for a few simple queries and it worked great. > I > > have always liked KEGG's organization of data and really appreciate this > > RESTful interface to their data; in some ways I think it easier to use > the > > web interfaces for KEGG than it is for NCBI. Plus the KEGG coverage of > > metabolic networks is awesome. I found the examples in Kevin's test > script > > to be fairly self-explanatory but a simple-spelled out example in the > > Tutorial would be nice. > > > > One thought, though, is that you can retrieve MANY different types of > data > > from the KEGG Rest API - which means that the user will probably have to > > parse the data his/herself. Data retrieved with "list" can return lists > of > > genes or compounds or organism and after a cursory look these are each > > formatted differently. Also true with the 'find' command. So I think you > > were right to leave out parsers because i think they will be a moving > target > > highly dependent on the query. > > > > Thank You Kevin, > > zach cp > > Good point about decoupling the web API wrapper and the parsers - > how the Bio.Entrez module and Bio.TogoWS handle this is to return > handles for web results, which you can then parse with an appropriate > parser (e.g. SeqIO for GenBank files, Medline parser, etc). > > Note that this is a little more fiddly under Python 3 due to the text > mode distinction between unicode and binary... just something to > keep in the back of your mind. 
> > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -------------- next part -------------- A non-text attachment was scrubbed... Name: KEGG documentation.pdf Type: application/pdf Size: 128597 bytes Desc: not available URL: From cmccoy at fhcrc.org Thu Oct 25 17:36:44 2012 From: cmccoy at fhcrc.org (Connor McCoy) Date: Thu, 25 Oct 2012 14:36:44 -0700 Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs Message-ID: Hello, About a year ago, pip support came up on the list: http://biopython.org/pipermail/biopython-dev/2011-October/009234.html I remember this being resolved, but when I try to install biopython with pip, it fails: $ testenv/bin/pip install biopython Downloading/unpacking biopython Running setup.py egg_info for package biopython warning: no previously-included files matching '.cvsignore' found under directory '*' warning: no previously-included files matching '*.pyc' found under directory '*' Installing collected packages: biopython Running setup.py install for biopython Numerical Python (NumPy) is not installed. This package is required for many Biopython features. Please install it before you install Biopython. You can install Biopython anyway, but anything dependent on NumPy will not work. If you do this, and later install NumPy, you should then re-install Biopython. 
You can find NumPy at http://numpy.scipy.org Complete output from command /home/cmccoy/development/seqmagick/testenv/bin/python -c "import setuptools;__file__='/home/cmccoy/development/seqmagick/testenv/build/biopython/set up.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /tmp/pip-wc___H-record/install-record.txt - -install-headers /home/cmccoy/development/seqmagick/testenv/include/site/python2.7: running install Numerical Python (NumPy) is not installed. This package is required for many Biopython features. Please install it before you install Biopython. You can install Biopython anyway, but anything dependent on NumPy will not work. If you do this, and later install NumPy, you should then re-install Biopython. You can find NumPy at http://numpy.scipy.org ---------------------------------------- Command /home/cmccoy/development/seqmagick/testenv/bin/python -c "import setuptools;__file__='/home/cmccoy/development/seqmagick/testenv/build/biopython/setup.py';exec(compile(open( __file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /tmp/pip-wc___H-record/install-record.txt --install-headers /home/cm ccoy/development/seqmagick/testenv/include/site/python2.7 failed with error code 255 in /home/cmccoy/development/seqmagick/testenv/build/biopython Storing complete log in /home/cmccoy/.pip/pip.log Same for libraries which list biopython in `install_requires`. Does anyone know of a way around this? Thanks, Connor -- Connor McCoy Fred Hutchinson Cancer Research Center 1100 Fairview Ave N. 
Seattle, WA 98109-1924 cmccoy at fhcrc.org

From mjldehoon at yahoo.com Thu Oct 25 22:52:42 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 25 Oct 2012 19:52:42 -0700 (PDT) Subject: [Biopython-dev] KEGG API Wrapper In-Reply-To: Message-ID: <1351219962.39081.YahooMailClassic@web164002.mail.gq1.yahoo.com>

Hi Kevin,

Thanks for the documentation! That makes everything a lot clearer. Overall I like the querying code and I think we should add it to Biopython. I have a bunch of comments on the KEGG module, some on the existing code and some on the new querying code, see below. Most of these are trivial; some may need some further discussion. Perhaps you could let us know which of these comments you can address, and which ones you want to skip for now? Once we have converged with regards to the querying code and the documentation, I think we can import your version of the KEGG module into the main Biopython repository and add your chapter on KEGG to the main documentation, and continue from there on the parsers and the unit tests.

Many thanks!
-Michiel.

About the querying code:
----------------------------------
I would replace KEGG.query("list", KEGG.query("find", KEGG.query("conv", KEGG.query("link", KEGG.query("info", KEGG.query("get" by the functions KEGG.list, KEGG.find, KEGG.conv, KEGG.link, KEGG.info, and KEGG.get.

For list, find, conv, link, and info, instead of going through KEGG.generic_parser, I would return the result directly as a Python list. In contrast, KEGG.get should return the handle to the results, not the data itself. So the _q function would, instead of

    ...
    resp = urllib2.urlopen(req)
    data = resp.read()
    return query_url, data

have

    ...
    resp = urllib2.urlopen(req)
    return resp

Then the user can decide whether to parse the data on the fly with Bio.KEGG, or read the data line by line and pick up what they are interested in, or to get all data from the handle and save it in a file.
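The split suggested above, where "get" hands back an open handle while the list-style calls return a Python list, could look roughly like the sketch below. This is only an illustration under stated assumptions: it uses Python 3's urllib.request rather than the urllib2 module in Kevin's code, the base URL http://rest.kegg.jp is assumed, and the names kegg_url, kegg_get and kegg_list are hypothetical rather than part of Bio.KEGG.

```python
from urllib.request import urlopen

# Assumed base URL of the KEGG REST service.
KEGG_REST = "http://rest.kegg.jp"


def kegg_url(operation, *args):
    """Build a REST URL by joining the operation and its arguments."""
    return "/".join([KEGG_REST, operation] + list(args))


def kegg_get(entry):
    """Return an open handle for a 'get' query (hypothetical helper).

    The caller decides whether to parse it, iterate over it line by
    line, or dump it to a file. The handle keeps the queried address
    in its .url attribute.
    """
    return urlopen(kegg_url("get", entry))


def kegg_list(database, organism=None):
    """Return the lines of a 'list' query as a Python list of strings."""
    args = [database] if organism is None else [database, organism]
    handle = urlopen(kegg_url("list", *args))
    try:
        # The response yields bytes, so decode each line before returning.
        return [line.decode("utf-8").rstrip("\n") for line in handle]
    finally:
        handle.close()
```

Nothing in the block touches the network until kegg_get or kegg_list is called, for instance kegg_list("pathway", "hsa"). Returning the handle from kegg_get keeps the wrapper decoupled from any particular parser, which is the same pattern Peter describes for Bio.Entrez and Bio.TogoWS.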
Note that resp will have a .url attribute that contains the url, so you won't need the ret_url keyword.

About the parsers:
------------------------
I think that we should drop generic_parser. For list, find, conv, link, and info, parsing is trivial and can be done by the respective functions directly. For get, we already have an appropriate parser for some databases (compound, map, and enzyme), but it's easy to add parsers for the other databases.

For all parsers in Biopython, there is the question whether the record should store information in attributes (as is currently done in Bio.KEGG), or alternatively if the record should inherit from a dictionary and store information in keys in the dictionary. Personally I have a preference for a dictionary, since that allows us to use the exact same keys in the dictionary as is used in the file (e.g., we can use "CLASS" as a key, while we cannot use .class as an attribute since it is a reserved word, so we use .classname instead). But other Biopython developers may not agree with me, and to some extent it depends on personal preference.

The parsers miss some key words. The ones I noticed are ALL_REAC, REFERENCE, and ORTHOLOGY. Probably we'll find more once we extend the unit tests.

Remove the ';' at the end of each term in record.classname.

Convert record.genes to a dictionary for each organism. So instead of

    [('HSA', ['5236', '55276']), ('PTR', ['456908', '461162']), ('PON', ['100190836', '100438793']), ('MCC', ['100424648', '699401']...

have

    {'HSA': ['5236', '55276'], 'PTR': ['456908', '461162'], 'PON': ['100190836', '100438793'], 'MCC': ['100424648', '699401'], ...

Also for record.dblinks, record.disease, record.structures, use a dictionary.

In record.pathway, all entries start with 'PATH'. Perhaps we should check with KEGG if there could be anything else than 'PATH' there, otherwise I don't see the reason why it's there.
Assuming that there could be something different there, I would also use a dictionary with 'PATH' as the key.

In record.reaction, some chemical names can be very long and extend over multiple lines. In such cases, the continuation line starts with a '$'. The parser should remove the '$' and join the two lines.

About the tests:
--------------------
We should update the data files in Tests/KEGG. This will fix some "bugs" in these data files. We should switch test_KEGG.py to the unit test framework. We should do some more extensive testing to make sure we are not missing some key words.

About the documentation:
---------------------------------
It's great that we now have some documentation. On page 233, I would suggest to replace the "id_" by "accession" or something else, since the underscore in "id_" may look funky to new users. Also it may be better not to reuse variable names (e.g. "pathway" is used in three different ways in the example). It's OK of course in general, but for this example it may be more clear to distinguish the different usages of this variable from each other. For repair_genes, you can use a set instead of a list throughout.

--- On Wed, 10/24/12, Kevin Wu wrote: From: Kevin Wu Subject: Re: [Biopython-dev] KEGG API Wrapper To: "Peter Cock" , "Zachary Charlop-Powers" , "Michiel de Hoon" Cc: Biopython-dev at lists.open-bio.org Date: Wednesday, October 24, 2012, 6:38 PM

Hi All, Thanks for the comments, I've written a bit of documentation on the entire KEGG module and have attached those relevant pages to the email. There didn't seem like an appropriate place for examples, so I just added a new chapter. I've also committed the updated file to github. I did leave out the parsers due to the fact that the current parsers only cover a small portion of possible responses from the api. Also, I'm not confident that the some of the parsers correctly retrieves all the fields.
However, I've written a really general parser that does a rough job of retrieving fields if it's a database format returned since I find myself reusing the code for all database formats. It's possible to modify this to correctly account for the different fields, but would probably take a bit of work to manually figure each field out. Otherwise it also parses the tsv/flat file returned. Also, @zach, thanks for checking it out and testing it! Thanks All! Kevin On Wed, Oct 17, 2012 at 4:09 AM, Peter Cock wrote: On Wed, Oct 17, 2012 at 12:55 AM, Zachary Charlop-Powers wrote: > Kevin, > Michiel, > > I just tested Kevin's code for a few simple queries and it worked great. I > have always liked KEGG's organization of data and really appreciate this > RESTful interface to their data; in some ways I think it easier to use the > web interfaces for KEGG than it is for NCBI. Plus the KEGG coverage of > metabolic networks is awesome. I found the examples in Kevin's test script > to be fairly self-explanatory but a simple spelled-out example in the > Tutorial would be nice. > > One thought, though, is that you can retrieve MANY different types of data > from the KEGG Rest API - which means that the user will probably have to > parse the data his/herself. Data retrieved with "list" can return lists of > genes or compounds or organism and after a cursory look these are each > formatted differently. Also true with the 'find' command. So I think you > were right to leave out parsers because I think they will be a moving target > highly dependent on the query. > > Thank You Kevin, > zach cp Good point about decoupling the web API wrapper and the parsers - how the Bio.Entrez module and Bio.TogoWS handle this is to return handles for web results, which you can then parse with an appropriate parser (e.g. SeqIO for GenBank files, Medline parser, etc). Note that this is a little more fiddly under Python 3 due to the text mode distinction between unicode and binary...
just something to keep in the back of your mind. Peter _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From kai.blin at biotech.uni-tuebingen.de Fri Oct 26 04:35:56 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Fri, 26 Oct 2012 10:35:56 +0200 Subject: [Biopython-dev] Status of SearchIO Message-ID: <508A4B6C.6020801@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi folks, In the summer, I've written a HMMer2 parser based on Bow's SearchIO code. I'm finally getting around to continue work on the project I needed this parser for, and I'm trying to get my code up-to-date. I notice that Bow's code hasn't hit the biopython master tree yet, and also doesn't rebase cleanly on top of it. A merge gives a couple of merge conflicts, but seems manageable. However, I'd prefer to stick to the upstream sources instead of maintaining my own branch containing Bow's SearchIO code merged to master. What's the chance of this happening any time soon, and can I help? Cheers, Kai - -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://www.enigmail.net/ iQEcBAEBAgAGBQJQiktsAAoJEKM5lwBiwTTPuDMH/33PGo/zLpBGw+dKIBXZ9b9L opaoI5uUsj4XzWU1A8u50BXFqa6ogwUWeZFaA2j25nQgEClWA5TFdHAJM4urTTgD pM2g2rsL/yLSrVifM95c2IcRW2z7dunccpJDd6cc82BRpqqgGWrkNo7OSUk/exP3 DbfooBw66Scxt+6o6S9zEH4IY5giuDOGzwQm195TCaZ/x/8/y1F8Ub/8Aporbj47 eJgZmEKzh0k8KePKOdyCmnt/d/bDGplFSvgqXET6Q0jmVAG44lAU679UPCmNiuJr VZD2SMRKy+Buy3TjJjQCeUEm+awN4T2LnPLDJgJkvRHjl6G+M9aljsuL78uCp9g= =1Nrt -----END PGP SIGNATURE----- From p.j.a.cock at googlemail.com Fri Oct 26 05:21:50 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 10:21:50 +0100 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: <508A4B6C.6020801@biotech.uni-tuebingen.de> References: <508A4B6C.6020801@biotech.uni-tuebingen.de> Message-ID: On Fri, Oct 26, 2012 at 9:35 AM, Kai Blin wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi folks, > > In the summer, I've written a HMMer2 parser based on Bow's SearchIO > code. I'm finally getting around to continue work on the project I > needed this parser for, and I'm trying to get my code up-to-date. > > I notice that Bow's code hasn't hit the biopython master tree yet, and > also doesn't rebase cleanly on top of it. A merge gives a couple of > merge conflicts, but seems manageable. However, I'd prefer to stick to > the upstream sources instead of maintaining my own branch containing > Bow's SearchIO code merged to master. > > What's the chance of this happening any time soon, and can I help?
> > Cheers, > Kai I'm not sure where the merge conflict is - Bow can probably help and confirm you're looking at the appropriate branch. What would help is comments on the name space ideas in this thread, since one major point we need to settle ASAP is where in the namespace SearchIO would go (since it probably won't just stay as Bio.SearchIO as it is on the branch): http://lists.open-bio.org/pipermail/biopython-dev/2012-September/009910.html ... http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html ... Peter From w.arindrarto at gmail.com Fri Oct 26 05:33:35 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 26 Oct 2012 11:33:35 +0200 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <508A4B6C.6020801@biotech.uni-tuebingen.de> Message-ID: Hi Kai, Peter, For the merge conflict, which branch are you using? Can you point to specific commits that cause the conflicts? I haven't tried merging / rebasing my own branch to the current master myself ~ so knowing this should help the process as well. And suggestions are still welcomed for the namespace :). Bio.SearchIO is the current one, but we have other alternatives (the most recent one being Bio.seq.search; following the change in Bio.Seq -> Bio.seq namespace change). Also, I think there are still some issues that need to be dealt with before we put SearchIO into master, notably with Bio.BLAST module. If not the official deprecation notice, at least the the tutorial has to be updated (let Bio.BLAST readers know about the plan with SearchIO). I've written a short tutorial here: http://bow.web.id/biopython/Tutorial.html. This is still a draft, but you can already see that there are some obvious overlaps between Bio.BLAST and Bio.SearchIO, which is confusing to new readers. 
regards, Bow On Fri, Oct 26, 2012 at 11:21 AM, Peter Cock wrote: > On Fri, Oct 26, 2012 at 9:35 AM, Kai Blin > wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > > Hash: SHA1 > > > > Hi folks, > > > > In the summer, I've written a HMMer2 parser based on Bow's SearchIO > > code. I'm finally getting around to continue work on the project I > > needed this parser for, and I'm trying to get my code up-to-date. > > > > I notice that Bow's code hasn't hit the biopython master tree yet, and > > also doesn't rebase cleanly on top of it. A merge gives a couple of > > merge conflicts, but seems manageable. However, I'd prefer to stick to > > the upstream sources instead of maintaining my own branch containing > > Bow's SearchIO code merged to master. > > > > What's the chance of this happening any time soon, and can I help? > > > > Cheers, > > Kai > > I'm not sure where the merge conflict is - Bow can probably help > and confirm you're looking at the appropriate branch. > > What would help is comments on the name space ideas in this > thread, since one major point we need to settle ASAP is where > in the namespace SearchIO would go (since it probably won't > just stay as Bio.SearchIO as it is on the branch): > > > http://lists.open-bio.org/pipermail/biopython-dev/2012-September/009910.html > ... > http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html > ... 
> > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From p.j.a.cock at googlemail.com Fri Oct 26 05:43:28 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 10:43:28 +0100 Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs In-Reply-To: References: Message-ID: On Thu, Oct 25, 2012 at 10:36 PM, Connor McCoy wrote: > Hello, > > About a year ago, pip support came up on the list: > > http://biopython.org/pipermail/biopython-dev/2011-October/009234.html > > I remember this being resolved, but when I try to install biopython with > pip, it fails: > > $ testenv/bin/pip install biopython > > Downloading/unpacking biopython > Running setup.py egg_info for package biopython > > warning: no previously-included files matching '.cvsignore' found > under directory '*' > warning: no previously-included files matching '*.pyc' found under > directory '*' > Installing collected packages: biopython > Running setup.py install for biopython > > Numerical Python (NumPy) is not installed. > > This package is required for many Biopython features. Please > install > it before you install Biopython. You can install Biopython anyway, > but > anything dependent on NumPy will not work. If you do this, and later > install NumPy, you should then re-install Biopython. 
> > You can find NumPy at http://numpy.scipy.org > > Complete output from command > /home/cmccoy/development/seqmagick/testenv/bin/python -c "import > setuptools;__file__='/home/cmccoy/development/seqmagick/testenv/build/biopython/set > up.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), > __file__, 'exec'))" install --single-version-externally-managed --record > /tmp/pip-wc___H-record/install-record.txt - > -install-headers > /home/cmccoy/development/seqmagick/testenv/include/site/python2.7: > running install > > > > Numerical Python (NumPy) is not installed. > > > > This package is required for many Biopython features. Please install > > it before you install Biopython. You can install Biopython anyway, but > > anything dependent on NumPy will not work. If you do this, and later > > install NumPy, you should then re-install Biopython. > > > > You can find NumPy at http://numpy.scipy.org > > > > ---------------------------------------- > Command /home/cmccoy/development/seqmagick/testenv/bin/python -c > "import > setuptools;__file__='/home/cmccoy/development/seqmagick/testenv/build/biopython/setup.py';exec(compile(open( > __file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install > --single-version-externally-managed --record > /tmp/pip-wc___H-record/install-record.txt --install-headers /home/cm > ccoy/development/seqmagick/testenv/include/site/python2.7 failed with > error code 255 in /home/cmccoy/development/seqmagick/testenv/build/biopython > Storing complete log in /home/cmccoy/.pip/pip.log > > > Same for libraries which list biopython in `install_requires`. > > Does anyone know of a way around this? > > Thanks, > Connor Hi Connor, This is probably a question for Brad - I don't use pip. Was it sitting stalled at the prompt from Biopython's setup.py? "Do you want to continue this installation? (y/N)" or from pip? i.e. What was at the end of the complete log? 
In terms of a quick workaround, what we use under TravisCI (where most of the targets don't have numpy installed) is piping a yes on stdin, e.g. $ /usr/bin/yes | python setup.py install Peter From p.j.a.cock at googlemail.com Fri Oct 26 06:31:06 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 11:31:06 +0100 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: <508A6535.6070507@biotech.uni-tuebingen.de> References: <508A4B6C.6020801@biotech.uni-tuebingen.de> <508A6535.6070507@biotech.uni-tuebingen.de> Message-ID: On Fri, Oct 26, 2012 at 11:25 AM, Kai Blin wrote: >> Also, I think there are still some issues that need to be dealt >> with before we put SearchIO into master, notably with Bio.BLAST >> module. If not the official deprecation notice, at least the the >> tutorial has to be updated (let Bio.BLAST readers know about the >> plan with SearchIO). I've written a short tutorial here: >> http://bow.web.id/biopython/Tutorial.html. This is still a draft, >> but you can already see that there are some obvious overlaps >> between Bio.BLAST and Bio.SearchIO, which is confusing to new >> readers. > > Personally I wouldn't let this consideration block the inclusion of a > module as useful like that. Of course I need this code, so I'm biased. I'm also OK with merging the code before updating the Tutorial chapter on BLAST (which would probably become a broader chapter on BLAST and other tools using SearchIO). As discussed before, the long term aim would be to remove Bio.BLAST. > I'll have to read up on the namespace discussion. While I see the > benefit of using PEP8 names, intuitively I don't like bio.seq.search > much. Then again, I started my life in Bio* with BioPerl, and like the > pretty similar module layout BioPython has so far. Yeah - the current naming of SeqIO and AlignIO was directly inspired by BioPerl, and give the working name of SearchIO. 
Peter From kai.blin at biotech.uni-tuebingen.de Fri Oct 26 06:25:57 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Fri, 26 Oct 2012 12:25:57 +0200 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <508A4B6C.6020801@biotech.uni-tuebingen.de> Message-ID: <508A6535.6070507@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2012-10-26 11:33, Wibowo Arindrarto wrote: > Hi Kai, Peter, > > For the merge conflict, which branch are you using? Can you point > to specific commits that cause the conflicts? I haven't tried > merging / rebasing my own branch to the current master myself ~ so > knowing this should help the process as well. For merging, I think I had to change .travis.yml setup.py and Tests/run_tests.py .travis.yml and setup.py mainly had whitespace changes in comments, so I just went with the version from master on those changes. As I said, nothing really huge. https://github.com/kblin/biopython/tree/searchio-merge is the merged tree. The rebase had a number of things, I just gave up on that. > Also, I think there are still some issues that need to be dealt > with before we put SearchIO into master, notably with Bio.BLAST > module. If not the official deprecation notice, at least the the > tutorial has to be updated (let Bio.BLAST readers know about the > plan with SearchIO). I've written a short tutorial here: > http://bow.web.id/biopython/Tutorial.html. This is still a draft, > but you can already see that there are some obvious overlaps > between Bio.BLAST and Bio.SearchIO, which is confusing to new > readers. Personally I wouldn't let this consideration block the inclusion of a module as useful like that. Of course I need this code, so I'm biased. I'll have to read up on the namespace discussion. While I see the benefit of using PEP8 names, intuitively I don't like bio.seq.search much. Then again, I started my life in Bio* with BioPerl, and like the pretty similar module layout BioPython has so far. 
Cheers, Kai - -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://www.enigmail.net/ iQEcBAEBAgAGBQJQimU1AAoJEKM5lwBiwTTPLUsH/i1C1jWmSgjk3PZSOo2kpn4l sGfonyZ7UcyOyM1RYMOc9xaJwevyGJbxVpdmhzIsCr8WZ2++uTgqwOKHROw84bu4 BfVTovUD3mNUK3kGEemOQQal8HyjTZozRFmPgQpSSTOOgQE964kA7mm2HJH9sNx9 NHUKj+dk7UwmbzETl2Q0/1lmxdptOVCTyQvwMzleCX4dwgdGumyrNiBQmBLerAKV CRW8cVmVPKkVUokuzWpt6LPZIoUxMz5RVmTJktOX0fpg79ULfXQucByrGtGQbiSR JMWGrK5yCliSz1WqV8r/Tx0VfPmEeiZFyzZb5KiAFE88sJK85cbFgUBegUTDZSU= =372O -----END PGP SIGNATURE----- From w.arindrarto at gmail.com Fri Oct 26 06:38:50 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 26 Oct 2012 12:38:50 +0200 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <508A4B6C.6020801@biotech.uni-tuebingen.de> <508A6535.6070507@biotech.uni-tuebingen.de> Message-ID: >> Also, I think there are still some issues that need to be dealt > > >> with before we put SearchIO into master, notably with Bio.BLAST > >> module. If not the official deprecation notice, at least the the > >> tutorial has to be updated (let Bio.BLAST readers know about the > >> plan with SearchIO). I've written a short tutorial here: > >> http://bow.web.id/biopython/Tutorial.html. This is still a draft, > >> but you can already see that there are some obvious overlaps > >> between Bio.BLAST and Bio.SearchIO, which is confusing to new > >> readers. > > > > Personally I wouldn't let this consideration block the inclusion of a > > module as useful like that. Of course I need this code, so I'm biased.
> > I'm also OK with merging the code before updating the Tutorial > chapter on BLAST (which would probably become a broader > chapter on BLAST and other tools using SearchIO). As discussed > before, the long term aim would be to remove Bio.BLAST. Ah, ok then :). There are other things I'm still working on at the moment (BLAST plain text writer, details about migrating from Bio.Blast), but I consider these to be less urgent than the tutorial. If everyone is ok for merging, then I'm good too :). I suppose we are going to use the 'beta' new feature warning here, right? > > I'll have to read up on the namespace discussion. While I see the > > benefit of using PEP8 names, intuitively I don't like bio.seq.search > > much. Then again, I started my life in Bio* with BioPerl, and like the > > pretty similar module layout BioPython has so far. > > Yeah - the current naming of SeqIO and AlignIO was directly > inspired by BioPerl, and give the working name of SearchIO. > > Peter Reaching a unanimous decision on name preference seems difficult :/. We now have: 1. Bio.seq.search (in line with the namespace change) 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used to be Bio.SeqSearch, now adjusted for PEP8 compliance) 3. Bio.search (same reasoning + explanation like Bio.seqsearch). 4. Bio.SearchIO / Bio.searchio 5. Bio.psearch (p for pairwise) Any other suggestions? Should we put it to a vote? regards, Bowo From p.j.a.cock at googlemail.com Fri Oct 26 06:51:32 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 11:51:32 +0100 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: <508A694B.7030800@biotech.uni-tuebingen.de> References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> Message-ID: On Fri, Oct 26, 2012 at 11:43 AM, Kai Blin wrote: > > Hi folks, > > I realize I'm late to this party, but I was asked to give an opinion > in the SearchIO thread. 
> > On 2012-09-06 09:06, Peter Cock wrote: >> For single user machines, where the single user has only a small >> collection of scripts this isn't such an issue. For any shared >> server, or user with lots of Biopython scripts (some of which may >> have been written by different people), you would be forced into a >> mass change at one go. >> >> You would also have considerable hassle later on with any attempt >> to re-run old scripts. > > In my opinion, this is where python virtualenv [1] can really make > life easier, and I'd recommend this for running old library versions > anyway. > > I'd rather do the correct change now, for every version of python, and > explain to people how to set up virtualenvs for their older scripts. I don't think this is practical - you'd have a *lot* of explaining to do for all the users who'd be bitten by such a big non-backward compatible change (and associated systems administrators). Indirectly it sounds like you like the lower case name idea - what do you think about making this switch under Python 3? (This will only inconvenience the relatively small number of early adopters already trying Biopython under Python 3 - but it would be another bump for people transitioning from Python 2 to 3). Peter From p.j.a.cock at googlemail.com Fri Oct 26 06:57:16 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 11:57:16 +0100 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <508A4B6C.6020801@biotech.uni-tuebingen.de> <508A6535.6070507@biotech.uni-tuebingen.de> Message-ID: On Fri, Oct 26, 2012 at 11:38 AM, Wibowo Arindrarto wrote: >>> Also, I think there are still some issues that need to be dealt >> >> >> with before we put SearchIO into master, notably with Bio.BLAST >> >> module. If not the official deprecation notice, at least the the >> >> tutorial has to be updated (let Bio.BLAST readers know about the >> >> plan with SearchIO). 
I've written a short tutorial here: >> >> http://bow.web.id/biopython/Tutorial.html. This is still a draft, >> >> but you can already see that there are some obvious overlaps >> >> between Bio.BLAST and Bio.SearchIO, which is confusing to new >> >> readers. >> > >> > Personally I wouldn't let this consideration block the inclusion of a >> > module as useful like that. Of course I need this code, so I'm biased. >> >> I'm also OK with merging the code before updating the Tutorial >> chapter on BLAST (which would probably become a broader >> chapter on BLAST and other tools using SearchIO). As discussed >> before, the long term aim would be to remove Bio.BLAST. > > Ah, ok then :). There are other things I'm still working on at the > moment (BLAST plain text writer, details about migrating from > Bio.Blast), but I consider these to be less urgent than the tutorial. > If everyone is ok for merging, then I'm good too :). I suppose we are > going to use the 'beta' new feature warning here, right? Yes to the 'beta' warning. I'd like to get some wider testing with community feedback on the API, while giving us the option to change it before declaring it stable. >> > I'll have to read up on the namespace discussion. While I see the >> > benefit of using PEP8 names, intuitively I don't like bio.seq.search >> > much. Then again, I started my life in Bio* with BioPerl, and like the >> > pretty similar module layout BioPython has so far. >> >> Yeah - the current naming of SeqIO and AlignIO was directly >> inspired by BioPerl, and give the working name of SearchIO. >> >> Peter > > Reaching a unanimous decision on name preference seems difficult :/. > We now have: > > 1. Bio.seq.search (in line with the namespace change) > 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used > to be Bio.SeqSearch, now adjusted for PEP8 compliance) > 3. Bio.search (same reasoning + explanation like Bio.seqsearch). > 4. Bio.SearchIO / Bio.searchio > 5. 
Bio.psearch (p for pairwise) > > Any other suggestions? Should we put it to a vote? I'd like a consensus first on the larger question of should we adopt lower case module names automatically under Python 3. In that case, option (1) about would be bio.seq.search under Python 3, and so on. Peter From kai.blin at biotech.uni-tuebingen.de Fri Oct 26 06:43:23 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Fri, 26 Oct 2012 12:43:23 +0200 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> Message-ID: <508A694B.7030800@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2012-09-06 09:06, Peter Cock wrote: Hi folks, I realize I'm late to this party, but I was asked to give an opinion in the SearchIO thread. > For single user machines, where the single user has only a small > collection of scripts this isn't such an issue. For any shared > server, or user with lots of Biopython scripts (some of which may > have been written by different people), you would be forced into a > mass change at one go. > > You would also have considerable hassle later on with any attempt > to re-run old scripts. In my opinion, this is where python virtualenv [1] can really make life easier, and I'd recommend this for running old library versions anyway. I'd rather do the correct change now, for every version of python, and explain to people how to set up virtualenvs for their older scripts. Cheers, Kai [1] http://pypi.python.org/pypi/virtualenv - -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://www.enigmail.net/ iQEcBAEBAgAGBQJQimlLAAoJEKM5lwBiwTTPsswIAMnEn4AT8xrfsq3xzkbB6tS2 y5FkLAb11xDP5PpttA+5qDXmnmJuMFqYq8FsSnJnpVq+ZGSAkswFC1prqQp57LdG V+EVZtf/HDzepbrVgNYe272nTPlc6cxjmtjWJca19fg8gKI97ryUiji/bbOfgjgM cnGHeUYkGmrcWrI8ergOS/5qLi3Z6S6t+uJezPT3DkbSm8oiOVAuPrIv6MziX69W QrKF3Edf4s1Do4URSVfZI1qVUEGFaLZMYvZ8/TMgDI2CAQLo0r2OxylrjJxcuqIB nORFTdwFMD7npDLkyG5U4eWZpfAV9A4RHNTybhpb7RgdVHifnoivA0nIAhsIAWE= =3VH6 -----END PGP SIGNATURE----- From kai.blin at biotech.uni-tuebingen.de Fri Oct 26 08:21:21 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Fri, 26 Oct 2012 14:21:21 +0200 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> Message-ID: <508A8041.2020203@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2012-10-26 12:51, Peter Cock wrote: Hi Peter, > Indirectly it sounds like you like the lower case name idea - what > do you think about making this switch under Python 3? (This will > only inconvenience the relatively small number of early adopters > already trying Biopython under Python 3 - but it would be another > bump for people transitioning from Python 2 to 3). Actually, as someone who has to switch between BioPython and BioPerl a lot, I'd personally prefer if both libraries stayed as close as possible in their structure. In my opinion, the ability to easily switch between languages while using the Bio* libraries is one of the biggest features.
As far as I understand we're just changing module names here, so all that'd be different would be the import lines. After reading through this thread, I got the impression that there was a general agreement on switching to PEP8-compatible names eventually, and the remaining question was how to best do that. I haven't played with Python 3 much yet, but I have the impression that switching to it likely is going to be painful anyway. Even if the module renaming makes the transition a bit more painful, at least you've only got to go through the pain once. Assuming the translations between the 2.x and 3.x names can be done automatically by the conversion script, this sounds like a good idea. Cheers, Kai - -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://www.enigmail.net/ iQEcBAEBAgAGBQJQioBBAAoJEKM5lwBiwTTPhxYIALTM1TQvOcE6upSFOCrfA0Uh irgvsQi77JfWvDsvGnOk74+ZQDDM2KGGAR3s9QBPdjRtaXhxSvdSxlXq3sdTNsXh VjbhEkeW6J3NzVSYbwK3U/mP0D9Xs6ihvnne06Nn7qjH+TLGm2x78cPM5SvjUcL3 QHiHda0wW479J9ZyKhmDTsCXqpX96uH3sjLiKZfs3KJbZ79j20BBWJqWypDuIUb7 DmtY/sngRsqs16yJL1Q35LXskOlCYsHOmJmkXg3Umr8gKOSw5nCEszhatXS3Oygo Pv8F7exvoEfNHg1IQtmEFycou9k5IaGVsZoRhCE6YvUCJH4Zfz4eOUTD323AzT4= =UPdn -----END PGP SIGNATURE----- From p.j.a.cock at googlemail.com Fri Oct 26 08:42:25 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 13:42:25 +0100 Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <508A8041.2020203@biotech.uni-tuebingen.de> References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> Message-ID: On Fri, Oct 26, 2012 at 1:21 PM, Kai Blin wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 2012-10-26 12:51, Peter Cock wrote: > > Hi Peter, > >> Indirectly it sounds like you like the lower case name idea - what >> do you think about making this switch under Python 3? (This will >> only inconvenience the relatively small number of early adopters >> already trying Biopython under Python 3 - but it would be another >> bump for people transitioning from Python 2 to 3). > > Actually, as someone who has to switch between BioPython and BioPerl a > lot, I'd personally prefer if both libraries stayed as close as > possible in their structure. In my opinion, the ability to easily > switch between languages while using the Bio* libraries is one of the > biggest features. As far as I understand we're just changing module > names here, so all that'd be different would be the import lines. > > After reading thought this thread, I got the impression that there was > a general agreement on switching to PEP8-compatible names eventually, > and the remaining question was how to best do that. Yes - hindered by the fact that due to file system limitations we can't have multiple capitalisations of a given module at the same time. Ideally we'd like to use bio.* as the namespace, and make this switch as part of moving to Python 3 is one way to do that. My personal preference is for a new lowercase namespace like biopy.* or biopython.* which can co-exist with Bio.* during a transition period. However, this did not seem popular. > I haven't played with Python 3 much yet, but I have the impression > that switching to it likely is going to be painful anyway. 
Even if the > module renaming makes the transition a bit more painful, at least > you've only got to go through the pain once. > > Assuming the translations between the 2.x and 3.x names can be done > automatically by the conversion script, this sounds like a good idea. That was my thinking - but it does go against the general advice to library authors in that API changes from Python 2.x to 3.x are discouraged. We can of course stick with Bio.* as it is (which I believe is Brad's favoured option). And I'm OK with this - it is the simplest option (and doesn't prevent us doing some more minor changes if we want to, such as reorganising all the Bio.SeqXXXX modules under one directory). Perhaps a blog post & email to the announcement mailing list soliciting feedback on this proposal is the best way forward, perhaps with a web-survey form? e.g. (1) Keep the namespace as 'Bio' (2) Keep the namespace as 'Bio' on Python 2, but adopt all lowercase module names on Python 3. (3) Move to a new all lowercase namespace like 'biopy' (anything except 'bio'), allowing the current 'Bio' namespace to continue to be available as well during a transition period. And the most disruptive option: (4) Switch to an all lowercase namespace 'bio', which cannot in general co-exist with the old 'Bio' namespace (perhaps bumping the version number to 2.0.0?). This would break legacy scripts, which would need to be updated, e.g.: from Bio.SeqRecord import SeqRecord from Bio import SeqIO could be replaced by: try: #Biopython 1.x uses Bio.* from Bio.SeqRecord import SeqRecord from Bio import SeqIO except ImportError: This would mean under Windows and most Mac install you cannot have both you (and all other users of the machine) m must be remove Regards, Peter From p.j.a.cock at googlemail.com Fri Oct 26 08:43:36 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 13:43:36 +0100 Subject: [Biopython-dev] PEP8 lower case module names? 
In-Reply-To: References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> Message-ID: Arg - accidentally tabbed to the send button while trying to indent sample code... On Fri, Oct 26, 2012 at 1:42 PM, Peter Cock wrote: > > Perhaps a blog post & email to the announcement mailing list > soliciting feedback on this proposal is the best way forward, > perhaps with a web-survey form? e.g. > > (1) Keep the namespace as 'Bio' > > (2) Keep the namespace as 'Bio' on Python 2, > but adopt all lowercase module names on Python 3. > > (3) Move to a new all lowercase namespace like 'biopy' > (anything except 'bio'), allowing the current 'Bio' namespace > to continue to be available as well during a transition period. > > And the most disruptive option: > > (4) Switch to an all lowercase namespace 'bio', which > cannot in general co-exist with the old 'Bio' namespace > (perhaps bumping the version number to 2.0.0?). This > would break legacy scripts, which would need to be > updated, e.g.: > > from Bio.SeqRecord import SeqRecord > from Bio import SeqIO > > could be replaced by: try: #Biopython 1.x uses Bio.* from Bio.SeqRecord import SeqRecord from Bio import SeqIO except ImportError: > > > > > This would mean under Windows and most Mac install > you cannot have both > you (and all other users of the machine) m > must be remove > > Regards, > > Peter From p.j.a.cock at googlemail.com Fri Oct 26 08:50:23 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 13:50:23 +0100 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> Message-ID: On Fri, Oct 26, 2012 at 1:43 PM, Peter Cock wrote: > Arg - accidentally tabbed to the send button while trying to indent > sample code...
Has something changed on GoogleMail's keyboard handling? Either that or I'm having a bad typing day... my apologies for the two extra emails. To continue: Perhaps a blog post & email to the announcement mailing list soliciting feedback on this proposal is the best way forward, perhaps with a web-survey form? e.g. (1) Keep the namespace as 'Bio' (2) Keep the namespace as 'Bio' on Python 2, but adopt all lowercase module names on Python 3. (3) Move to a new all lowercase namespace like 'biopy' (anything except 'bio'), allowing the current 'Bio' namespace to continue to be available as well during a transition period. And the most disruptive option: (4) Switch to an all lowercase namespace 'bio', which cannot in general co-exist with the old 'Bio' namespace (perhaps bumping the version number to 2.0.0?). This would break legacy scripts, which would need to be updated, e.g.: from Bio.SeqRecord import SeqRecord from Bio import SeqIO could be replaced by: try: #Biopython 1.x uses Bio.* from Bio.SeqRecord import SeqRecord from Bio import SeqIO except ImportError: #Try the new lowercase module names, from bio.seqrecord import SeqRecord from bio import seqio as SeqIO Users on Windows and most Mac users might find updating Biopython complicated during this transition due to the change in case of the folder names. For anyone installing from source this might require manual removal of the old folders (I ran into this kind of issue while trying the lower case naming under Python 3). Potentially under Linux (and any Mac using a case sensitive file system) an old Biopython install using Bio/ and the newer Biopython using bio/ could co-exist... we would have to look at that. Regards, Peter From kai.blin at biotech.uni-tuebingen.de Fri Oct 26 09:34:12 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Fri, 26 Oct 2012 15:34:12 +0200 Subject: [Biopython-dev] PEP8 lower case module names? 
In-Reply-To: References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> Message-ID: <508A9154.8020507@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2012-10-26 14:42, Peter Cock wrote: > My personal preference is for a new lowercase namespace like > biopy.* or biopython.* which can co-exist with Bio.* during a > transition period. However, this did not seem popular. That'd still mean older scripts would break after the transition period, and we'll end up encoding the language name in the module, which seems a bit silly. Having said that, I see the least amount of pain for BioPython users going that route, with the possibly larger maintenance headache for BioPython developers. I think this is one of these "what color do we paint the bikeshed" discussions, where there really isn't any objectively superior solution. > That was my thinking - but it does go against the general advice to > library authors in that API changes from Python 2.x to 3.x are > discouraged. Right, but from dealing with the python folks on Freenode IRC, I gather that many of them assume the switch from Python 2.x to 3.x is a very low-impact change for code authors. I tend to disagree there. :) > We can of course stick with Bio.* as it is (which I believe is > Brad's favoured option). And I'm OK with this - it is the simplest > option (and doesn't prevent us doing some more minor changes if we > want to, such as reorganising all the Bio.SeqXXXX modules under one > directory). As I said, strong feeling of a bikeshed discussion here. :) > Perhaps a blog post & email to the announcement mailing list > soliciting feedback on this proposal is the best way forward, > perhaps with a web-survey form? e.g. To be honest, I don't care that much about which solution is decided on, as long as the decision is made soon. 
I've got some programs that need the HMMer2 parser that I've added to Bow's SearchIO code, and I'm hoping to get that into BioPython soon instead of having to ship with a custom BioPython for publication. Cheers, Kai - -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://www.enigmail.net/ iQEcBAEBAgAGBQJQipFTAAoJEKM5lwBiwTTP4nkIAI5TegXeWy6b8FoPmq46XPzz iVh6g0t37xAJ9Aat3aE5vDklF7yqEwcVPKxFkj2Nd2MLaDqhfnuldE9pEqbPmZfl eQptF5JXTAlw/YKAPFzTyFSIlKv3wiuTiGeTxKJtXewOkgEu6VwzNgjPnCYhamaT Nda7NQEA6mlmaH7ABwO1mLLObk7i90oqVNDIuhnOAAA1ZrVnnQ4QHRupbiLZVd3d 3od3JVM4h+ZT5AL12Lts9lAdrc94MVri5i0P1VSQEnAQV/LJ5uoT2a4l2DRFM35R NR501X7ubTQPrK8ATveTWaCYYcn/XMnS7dEpvSWsxFR8oM+69LxF3UVtH2ShfDs= =Teym -----END PGP SIGNATURE----- From eric.talevich at gmail.com Fri Oct 26 11:19:23 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 26 Oct 2012 11:19:23 -0400 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <508A4B6C.6020801@biotech.uni-tuebingen.de> <508A6535.6070507@biotech.uni-tuebingen.de> Message-ID: On Fri, Oct 26, 2012 at 6:38 AM, Wibowo Arindrarto wrote: > >> Also, I think there are still some issues that need to be dealt > > > > >> with before we put SearchIO into master, notably with Bio.BLAST > > >> module. If not the official deprecation notice, at least the > > >> tutorial has to be updated (let Bio.BLAST readers know about the > > >> plan with SearchIO). I've written a short tutorial here: > > >> http://bow.web.id/biopython/Tutorial.html.
This is still a draft, > > >> but you can already see that there are some obvious overlaps > > >> between Bio.BLAST and Bio.SearchIO, which is confusing to new > > >> readers. > > > > > > Personally I wouldn't let this consideration block the inclusion of a > > > module as useful like that. Of course I need this code, so I'm biased. > > > > I'm also OK with merging the code before updating the Tutorial > > chapter on BLAST (which would probably become a broader > > chapter on BLAST and other tools using SearchIO). As discussed > > before, the long term aim would be to remove Bio.BLAST. > Bio.Blast does contain some features beyond parsing the output of BLAST... > > I'll have to read up on the namespace discussion. While I see the > > > benefit of using PEP8 names, intuitively I don't like bio.seq.search > > > much. Then again, I started my life in Bio* with BioPerl, and like the > > > pretty similar module layout BioPython has so far. > > > > Yeah - the current naming of SeqIO and AlignIO was directly > > inspired by BioPerl, and give the working name of SearchIO. > > > > Peter > > Reaching a unanimous decision on name preference seems difficult :/. > We now have: > > 1. Bio.seq.search (in line with the namespace change) > 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used > to be Bio.SeqSearch, now adjusted for PEP8 compliance) > 3. Bio.search (same reasoning + explanation like Bio.seqsearch). > 4. Bio.SearchIO / Bio.searchio > 5. Bio.psearch (p for pairwise) > > Any other suggestions? Should we put it to a vote? > > regards, > Bowo > > If it's down to a vote, I would vote to merge this branch as Bio.SearchIO, and perhaps lowercase it to Bio.searchio or biopy.searchio in the Py3 lowercase branch. Rationale: We already follow BioPerl with SeqIO and AlignIO, and it seems to help users. It's also Google-friendly. 
-Eric From p.j.a.cock at googlemail.com Fri Oct 26 11:42:18 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 16:42:18 +0100 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <508A4B6C.6020801@biotech.uni-tuebingen.de> <508A6535.6070507@biotech.uni-tuebingen.de> Message-ID: On Fri, Oct 26, 2012 at 4:19 PM, Eric Talevich wrote: > > Bio.Blast does contain some features beyond parsing the output of BLAST... > Also wrappers to call the tools, and the online search. Easy enough. >> Reaching a unanimous decision on name preference seems difficult :/. >> We now have: >> >> 1. Bio.seq.search (in line with the namespace change) >> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used >> to be Bio.SeqSearch, now adjusted for PEP8 compliance) >> 3. Bio.search (same reasoning + explanation like Bio.seqsearch). >> 4. Bio.SearchIO / Bio.searchio >> 5. Bio.psearch (p for pairwise) >> >> Any other suggestions? Should we put it to a vote? >> >> regards, >> Bowo >> > > If it's down to a vote, I would vote to merge this branch as Bio.SearchIO, > and perhaps lowercase it to Bio.searchio or biopy.searchio in the Py3 > lowercase branch. > > Rationale: We already follow BioPerl with SeqIO and AlignIO, and it > seems to help users. It's also Google-friendly. I like Bio.SearchIO for those reasons too. Perhaps that is the most popular name? Peter From mjldehoon at yahoo.com Fri Oct 26 11:58:04 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 26 Oct 2012 08:58:04 -0700 (PDT) Subject: [Biopython-dev] Status of SearchIO In-Reply-To: Message-ID: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com> > 1. Bio.seq.search (in line with the namespace change) > 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used > to be Bio.SeqSearch, now adjusted for PEP8 compliance) > 3. Bio.search (same reasoning + explanation like Bio.seqsearch). > 4. Bio.SearchIO / Bio.searchio > 5. 
Bio.psearch (p for pairwise) > If it's down to a vote, I would vote to merge this branch as > Bio.SearchIO, and perhaps lowercase it to Bio.searchio or > biopy.searchio in the Py3 lowercase branch. > > Rationale: We already follow BioPerl with SeqIO and AlignIO, and it > seems to help users. It's also Google-friendly. I would vote for Bio.seq.search. I don't like Bio.SearchIO much because a) it doesn't tell you clearly what the module is about; and b) I think it is a mistake to have Bio.SeqIO separate from Bio.Seq, and Bio.AlignIO separate from Bio.Align, because in both cases the two modules conceptually deal with the same thing. We don't have Bio.Cluster and Bio.ClusterIO, Bio.Entrez and Bio.EntrezIO, Bio.Motif and Bio.MotifIO; why should Bio.Seq and Bio.Align be different? -Michiel. From p.j.a.cock at googlemail.com Fri Oct 26 12:14:22 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 17:14:22 +0100 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com> References: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com> Message-ID: On Fri, Oct 26, 2012 at 4:58 PM, Michiel de Hoon wrote: >> 1. Bio.seq.search (in line with the namespace change) >> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used >> to be Bio.SeqSearch, now adjusted for PEP8 compliance) >> 3. Bio.search (same reasoning + explanation like Bio.seqsearch). >> 4. Bio.SearchIO / Bio.searchio >> 5. Bio.psearch (p for pairwise) > >> If it's down to a vote, I would vote to merge this branch as >> Bio.SearchIO, and perhaps lowercase it to Bio.searchio or >> biopy.searchio in the Py3 lowercase branch. >> >> Rationale: We already follow BioPerl with SeqIO and AlignIO, and it >> seems to help users. It's also Google-friendly. > > I would vote for Bio.seq.search.
And would you support moving other existing Bio.SeqXXX modules under Bio.seq.* as for example outlined here?: http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html If so then I think we should go with that plan. > I don't like Bio.SearchIO much because a) it doesn't tell you clearly > what the module is about; and b) I think it it is a mistake to have > Bio.SeqIO separate from Bio.Seq, and Bio.AlignIO separate from > Bio.Align, because in both cases the two modules conceptually deal > with the same thing. We don't have Bio.Cluster and Bio.ClusterIO, > Bio.Entrez and Bio.EntrezIO, Bio.Motif and Bio.MotifIO; why should > Bio.Seq and Bio.Align be different? After all, not everyone was exposed to BioPerl before Biopython ;) Peter From p.j.a.cock at googlemail.com Fri Oct 26 17:19:28 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 22:19:28 +0100 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com> Message-ID: On Fri, Oct 26, 2012 at 5:14 PM, Peter Cock wrote: > On Fri, Oct 26, 2012 at 4:58 PM, Michiel de Hoon wrote: >>> 1. Bio.seq.search (in line with the namespace change) >>> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used >>> to be Bio.SeqSearch, now adjusted for PEP8 compliance) >>> 3. Bio.search (same reasoning + explanation like Bio.seqsearch). >>> 4. Bio.SearchIO / Bio.searchio >>> 5. Bio.psearch (p for pairwise) >> >>> If it's down to a vote, I would vote to merge this branch as >>> Bio.SearchIO, and perhaps lowercase it to Bio.searchio or >>> biopy.searchio in the Py3 lowercase branch. >>> >>> Rationale: We already follow BioPerl with SeqIO and AlignIO, and it >>> seems to help users. It's also Google-friendly. >> >> I would vote for Bio.seq.search. 
> > And would you support moving other existing Bio.SeqXXX > modules under Bio.seq.* as for example outlined here?: > http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html > If so then I think we should go with that plan. I have started exploring that idea on this new branch, https://github.com/peterjc/biopython/tree/bioseq Does anyone object to me applying the first commit to the master branch (defining the previously discussed new warning for 'beta' code)? https://github.com/peterjc/biopython/commit/97485d5dcf2620f7664ae46a7897c1203847538d Note that introducing Bio.seq now (and any relocations under this) can (I believe) still be combined with the lower-case modules under Python 3 idea as well. This just requires the public classes and functions defined under Bio.Seq.* remains mirrored under Bio.Seq.* (this means assorted Seq objects and some functions like translate). Peter From w.arindrarto at gmail.com Fri Oct 26 18:43:45 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Sat, 27 Oct 2012 00:43:45 +0200 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com> Message-ID: >>> 1. Bio.seq.search (in line with the namespace change) >>>> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used >>>> to be Bio.SeqSearch, now adjusted for PEP8 compliance) >>>> 3. Bio.search (same reasoning + explanation like Bio.seqsearch). >>>> 4. Bio.SearchIO / Bio.searchio >>>> 5. Bio.psearch (p for pairwise) >>> >>>> If it's down to a vote, I would vote to merge this branch as >>>> Bio.SearchIO, and perhaps lowercase it to Bio.searchio or >>>> biopy.searchio in the Py3 lowercase branch. >>>> >>>> Rationale: We already follow BioPerl with SeqIO and AlignIO, and it >>>> seems to help users. It's also Google-friendly. >>> >>> I would vote for Bio.seq.search. 
>> >> And would you support moving other existing Bio.SeqXXX >> modules under Bio.seq.* as for example outlined here?: >> http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html >> If so then I think we should go with that plan. > > I have started exploring that idea on this new branch, > https://github.com/peterjc/biopython/tree/bioseq > > Does anyone object to me applying the first commit to the master > branch (defining the previously discussed new warning for 'beta' code)? > https://github.com/peterjc/biopython/commit/97485d5dcf2620f7664ae46a7897c1203847538d No objection from me for the commit :). But I have some concerns for the SearchIO naming. I like Bio.seqsearch best at the moment. Bio.seq.search is good, but I understand that Bio.SearchIO will eventually contain app wrappers and code for remote searches as well. Putting it three levels-deep doesn't feel nice to me. As comparisons, submodules with similar features (Bio.Phylo, and possibly Bio.AlignIO, if in the future it will be merged with alignment app wrappers and the alignment object model) are available under Bio. > Note that introducing Bio.seq now (and any relocations under this) > can (I believe) still be combined with the lower-case modules under > Python 3 idea as well. This just requires the public classes and > functions defined under Bio.Seq.* remains mirrored under Bio.Seq.* > (this means assorted Seq objects and some functions like translate). 
> > Peter regards, Bow From p.j.a.cock at googlemail.com Fri Oct 26 20:54:47 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 27 Oct 2012 01:54:47 +0100 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com> Message-ID: On Fri, Oct 26, 2012 at 11:43 PM, Wibowo Arindrarto wrote: >Peter wrote: >> I have started exploring that idea on this new branch, >> https://github.com/peterjc/biopython/tree/bioseq >> >> Does anyone object to me applying the first commit to the master >> branch (defining the previously discussed new warning for 'beta' code)? >> https://github.com/peterjc/biopython/commit/97485d5dcf2620f7664ae46a7897c1203847538d > > No objection from me for the commit :). > > But I have some concerns for the SearchIO naming. I like Bio.seqsearch > best at the moment. Bio.seq.search is good, but I understand that > Bio.SearchIO will eventually contain app wrappers and code for remote > searches as well. Putting it three levels-deep doesn't feel nice to > me. As comparisons, submodules with similar features (Bio.Phylo, and > possibly Bio.AlignIO, if in the future it will be merged with > alignment app wrappers and the alignment object model) are available > under Bio. I think we'd get used to the nested namespace pretty quickly, and this really only affects the import line anyway, e.g. something like this isn't so bad as long as we document this: from Bio.seq.search.apps import BlatCommandLine If the namespace nesting bothers you, then you might not like my thoughts for how to combine Bio.Align and Bio.AlignIO (since we can't use Bio.align due to the folder name clash on case-insensitive platforms): I was wondering about using Bio.seq.align for this, which again is a bit nested but would make it a sister module to Bio.seq.search (aka SearchIO) and Bio.seq.record (which could include the former SeqIO code as well as the SeqRecord class).
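On keeping old and new import paths working side by side during such a relocation: one possible mechanism, sketched here with standard-library calls only, is to register a single module object under two names in sys.modules. The module names below (legacy_seqio, relocated_seq) and the dummy parse function are placeholders for illustration, not real Biopython modules or anything agreed in this thread.

```python
import sys
import types


def alias_module(old_name, new_name):
    """Make an already imported module importable under a second name."""
    sys.modules[new_name] = sys.modules[old_name]


# Stand-in for an existing module (placeholder name, not real Biopython).
legacy = types.ModuleType("legacy_seqio")
legacy.parse = lambda handle, fmt: iter(())  # dummy function for the demo
sys.modules["legacy_seqio"] = legacy

# Expose the very same module object under the relocated name.
alias_module("legacy_seqio", "relocated_seq")

import relocated_seq  # found via sys.modules, no file lookup needed

# Both names now refer to one object, so old scripts keep working.
print(relocated_seq.parse is legacy.parse)
```

This only shows the aliasing itself; it does not address the folder name clash on case-insensitive file systems discussed above.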
Peter From eric.talevich at gmail.com Sat Oct 27 00:03:46 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 27 Oct 2012 00:03:46 -0400 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com> Message-ID: On Fri, Oct 26, 2012 at 8:54 PM, Peter Cock wrote: > > If the namespace nesting bothers you, then you might not like > my thoughts for how to combine Bio.Align and Bio.AlignIO > (since we can't use Bio.align due to the folder name clash on > case incentive platforms): I was wondering about using > Bio.seq.align for this, which again is a bit nested but would > make it a system module to Bio.seq.search (aka SearchIO) > and Bio.seq.record (which could include the former SeqIO > code as well as the SeqRecord class). > > Does that mean we'd have read, write, convert, etc. under Bio.seq.record? This is how that API would look: from Bio.seq import record for rec in record.parse("example.fa", "fasta"): ... As opposed to: # Minor change from Bio import seqio for record in seqio.parse(...) # Make sure we get those relative imports right! from Bio.seq import io for record in io.parse(...) # Slight cognitive distance, but maybe worth it from Bio import seq for record in seq.parse(...) Also: Technically, Bio.Motif operates on multiple sequence alignments, so it could be moved to Bio.seq.align.motif. (Not entirely trolling here, just pointing out possible consequences.) 
-Eric From w.arindrarto at gmail.com Sat Oct 27 01:55:27 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Sat, 27 Oct 2012 07:55:27 +0200 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com> Message-ID: >> If the namespace nesting bothers you, then you might not like >> my thoughts for how to combine Bio.Align and Bio.AlignIO >> (since we can't use Bio.align due to the folder name clash on >> case incentive platforms): I was wondering about using >> Bio.seq.align for this, which again is a bit nested but would >> make it a system module to Bio.seq.search (aka SearchIO) >> and Bio.seq.record (which could include the former SeqIO >> code as well as the SeqRecord class). >> > Does that mean we'd have read, write, convert, etc. under Bio.seq.record? > This is how that API would look: > > from Bio.seq import record > for rec in record.parse("example.fa", "fasta"): ... > > As opposed to: > > # Minor change > from Bio import seqio > for record in seqio.parse(...) > > # Make sure we get those relative imports right! > from Bio.seq import io > for record in io.parse(...) > > # Slight cognitive distance, but maybe worth it > from Bio import seq > for record in seq.parse(...) > > > Also: Technically, Bio.Motif operates on multiple sequence alignments, so it > could be moved to Bio.seq.align.motif. (Not entirely trolling here, just > pointing out possible consequences.) > > -Eric What bothers me other than it being hidden is also the inconsistency (comparing it to the current namespace). However, if there is also a plan to merge sequence-related submodules under Bio.seq, it feels better and I'm ok with it. Still hidden, but we'll have more consistency and the root namespace will have less clutter. 
So it would look like this (with previously mentioned examples): Bio.SearchIO -> Bio.seq.search Bio.AlignIO -> Bio.seq.align Bio.Motif -> Bio.seq.motif Bio.SeqIO -> Bio.seq (or merge with Bio.SeqRecord into Bio.seq.record) Bio.SeqRecord -> Bio.seq.record Bio.SeqUtils -> Bio.seq.utils Bio.SeqFeature -> Bio.seq.feature Also maybe: Bio.Alphabet -> Bio.seq.alphabet Bio.Restriction -> Bio.seq.restriction or Bio.seq.utils.restriction And Eric is right, we may go further with Bio.seq.align.motif, but I think nesting sequence-related modules under Bio.seq is the furthest we should go. I personally find it the most intuitive. regards, Bow From mjldehoon at yahoo.com Sat Oct 27 06:46:10 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 27 Oct 2012 03:46:10 -0700 (PDT) Subject: [Biopython-dev] Status of SearchIO In-Reply-To: Message-ID: <1351334770.89984.YahooMailClassic@web164003.mail.gq1.yahoo.com> Hi everybody, --- On Fri, 10/26/12, Peter Cock wrote: > And would you support moving other existing Bio.SeqXXX > modules under Bio.seq.* as for example outlined here?: > http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html Yes that looks good to me. > I'm not 100% sure where the Bio.SeqIO top level functions > would belong, either directly under Bio.seq or Bio.seq.record > might work too. I would prefer to have the top-level functions directly under Bio.seq, since they will be used a lot. Best, -Michiel. From mjldehoon at yahoo.com Sat Oct 27 06:47:43 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 27 Oct 2012 03:47:43 -0700 (PDT) Subject: [Biopython-dev] Status of SearchIO In-Reply-To: Message-ID: <1351334863.39503.YahooMailClassic@web164006.mail.gq1.yahoo.com> --- On Sat, 10/27/12, Wibowo Arindrarto wrote: > And Eric is right, we may go further with Bio.seq.align.motif, but I > think nesting sequence-related modules under Bio.seq is the furthest > we should go. I personally find it the most intuitive. I agree. 
And according to the Zen of Python, flat is better than nested. Best, -Michiel. From bartek at rezolwenta.eu.org Sat Oct 27 08:55:12 2012 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Sat, 27 Oct 2012 14:55:12 +0200 Subject: [Biopython-dev] Parsing TRANSFAC matrices with Bio.Motif In-Reply-To: <1351339617.3402.YahooMailClassic@web164005.mail.gq1.yahoo.com> References: <1351339617.3402.YahooMailClassic@web164005.mail.gq1.yahoo.com> Message-ID: Hi Michiel, On Sat, Oct 27, 2012 at 2:06 PM, Michiel de Hoon wrote: > Actually I was thinking about the suggestions for Bio.Motif I made earlier (see http://lists.open-bio.org/pipermail/biopython-dev/2012-September/009946.html). Right now they are just ideas, so I haven't implemented them yet. You mentioned in your reply last month: > >> I'll try to come up with a more thought through and longer response >> later in the week... > Absolutely. It's just that I had quite a crazy time lately (time spent writing proposals and other such stuff...) and I didn't really think too much about Bio.Motif. > So I was wondering if you have any additional comments on these suggestions, or if I can go ahead and start implementing. > I'm sorry if my inactivity has slowed things down. I'll try to be more constructive this time. I think that one thing that is clear is that Bio.Motif could use some code optimization, especially in the area of PWM searching. Honestly, I don't think that there will be a time in the foreseeable future that I'll do it, so if you feel like implementing better code for PWM handling/searching I'll be happy to do some code review or testing. There are a few things I think would be good to keep: - possibility to invoke motif.pwm_search(...)
without worrying about the fact that it is actually carried out by some specialized class - possibility to determine motif thresholds based on fpr or fnr as currently implemented in Bio.Motif.Thresholds module - possibility to convert count based motifs to PWM based motifs without much fuss... All of these things are not really in conflict with your idea of moving the PWM related code to the special class, so if you want to do that, go ahead. If you also have trouble finding time to implement these improvements, I could try to recruit some master student from our department to do that. But if you have time to do the implementation yourself, it will probably be better and faster that way. best Bartek -- Bartek Wilczynski From mjldehoon at yahoo.com Sat Oct 27 22:47:15 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 27 Oct 2012 19:47:15 -0700 (PDT) Subject: [Biopython-dev] Parsing TRANSFAC matrices with Bio.Motif In-Reply-To: Message-ID: <1351392435.42713.YahooMailClassic@web164006.mail.gq1.yahoo.com> Hi Bartek, OK, thanks! I'll go ahead with the implementation then, and write an update to the mailing list again so people can have a look at it. Best, -Michiel. --- On Sat, 10/27/12, Bartek Wilczynski wrote: > From: Bartek Wilczynski > Subject: Re: [Biopython-dev] Parsing TRANSFAC matrices with Bio.Motif > To: "Michiel de Hoon" > Cc: "BioPython-Dev" > Date: Saturday, October 27, 2012, 8:55 AM > Hi Michiel, > > On Sat, Oct 27, 2012 at 2:06 PM, Michiel de Hoon > wrote: > > > Actually I was thinking about the suggestions for > Bio.Motif I made earlier (see http://lists.open-bio.org/pipermail/biopython-dev/2012-September/009946.html). > Right now they are just ideas, so I haven't implemented them > yet. You mentioned in your reply last month: > > > >> I'll try to come up with a more thought through and > longer response > >> later in the week... > > > > Absolutely. 
It's just that I had quite a crazy time lately > (time spent > writing proposals and other such stuff...) and I didn't > really think > too much about Bio.Motif. > > > So I was wondering if you have any additional comments > on these suggestions, or if I can go ahead and start > implementing. > > > > I'm sorry if my inactivity has slowed things down. I'll try > to be more > constructive this time. > > I think that one thing clear is the Bio.Motif could use some > code > optimization, especially in the area of PWM searching. > Honestly, I > don't think that there will be a time in a forseeable future > that I'll > do it, so if you feel like implementing a better code for > PWM > handling/searching I'll be happy to do some code review or > testing. > > There are a few things I think would be good to keep: > - possibility to invoke motif.pwm_search(...) without > worrying about > the fact that it is actually carried out by some specialized > class > - possibility to determine motif thresholds based on fpr or > fnr as > currently implemented in Bio.Motif.Thresholds module > - possibility to convert count based motifs to PWM based > motifs > without much fuss... > > All of these things are not really in conflict with your > idea of > moving the PWM related code to the special class, so if you > want to do > that, go ahead. > > If you also have trouble finding time to implement these > improvements, > I could try to recruit some master student from our > department to do > that. But if you have time to do the implementation > yourself, it will > probably be better and faster that way. 
> > best > Bartek > > -- > Bartek Wilczynski > From chapmanb at 50mail.com Sun Oct 28 14:55:31 2012 From: chapmanb at 50mail.com (Brad Chapman) Date: Sun, 28 Oct 2012 14:55:31 -0400 Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs In-Reply-To: References: Message-ID: <87sj8ys9y4.fsf@fastmail.fm> Connor; > I remember this being resolved, but when I try to install biopython with > pip, it fails: Thanks for the report. It looks like the command line options pip uses to call setup.py changed a bit, so the hack we have in place is no longer working. I pushed a fix for this: https://github.com/chapmanb/biopython/commit/e05a355e3e9825c44c4a9b3bdfdda25c9a92c9c4 which seems to resolve the issue and hopefully make it more robust going forward. Could you confirm it works on your system: $ cd /tmp $ git clone git://github.com/chapmanb/biopython.git $ sudo pip install /tmp/biopython If so, I'll push this into the main repo for the next release. Thanks again for letting us know about the problem, Brad From chapmanb at 50mail.com Sun Oct 28 15:02:54 2012 From: chapmanb at 50mail.com (Brad Chapman) Date: Sun, 28 Oct 2012 15:02:54 -0400 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> Message-ID: <87pq42s9lt.fsf@fastmail.fm> Peter and all; Interesting discussion on the module path issues. I'm agreed with everyone that it would be nice to be pep8 compliant. However, my vote would be to stick with our traditional namespace to avoid widespread breakage. The changes everyone is proposing are nice, but not nice enough to deal with introducing an incompatible version and the documentation and help fallout from that. If everyone wants to go down the module name path, it would be worth investing in a biopython1to2 script that automatically handles the renamings for folks. 
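As a rough illustration of what such a renaming script might do, here is a minimal sketch. The old-to-new mapping is hypothetical, taken from names floated on this list rather than from any agreed plan, and the function name rewrite_imports is likewise made up for the example. It only rewrites dotted module names on import lines and deliberately leaves everything else alone.

```python
import re

# Hypothetical old -> new module names, taken from proposals on this list;
# nothing here has been agreed or implemented.
RENAMES = {
    "Bio.SeqRecord": "Bio.seq.record",
    "Bio.SeqUtils": "Bio.seq.utils",
    "Bio.SeqFeature": "Bio.seq.feature",
    "Bio.SearchIO": "Bio.seq.search",
}

# Try longer names first so a longer dotted name wins over a shorter prefix.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(RENAMES, key=len, reverse=True))
    + r")\b"
)


def rewrite_imports(source):
    """Return source with known module names replaced on import lines only."""
    out = []
    for line in source.splitlines(True):  # True keeps the line endings
        stripped = line.lstrip()
        if stripped.startswith(("import ", "from ")):
            line = _PATTERN.sub(lambda m: RENAMES[m.group(1)], line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    print(rewrite_imports("from Bio.SeqRecord import SeqRecord\n"), end="")
```

A real tool would also need to handle the "from Bio import SeqIO" form, aliased imports, and fully qualified uses in the body of a script, which this sketch ignores.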
Just my 2 cents, Brad From p.j.a.cock at googlemail.com Mon Oct 29 04:15:59 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 29 Oct 2012 08:15:59 +0000 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: <87pq42s9lt.fsf@fastmail.fm> References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> <87pq42s9lt.fsf@fastmail.fm> Message-ID: On Sunday, October 28, 2012, Brad Chapman wrote: > > Peter and all; > Interesting discussion on the module path issues. I'm agreed with > everyone that it would be nice to be pep8 compliant. However, my vote > would be to stick with our traditional namespace to avoid widespread > breakage. The changes everyone is proposing are nice, but not nice > enough to deal with introducing an incompatible version and the > documentation and help fallout from that. > > If everyone wants to go down the module name path, it would be worth > investing in a biopython1to2 script that automatically handles the > renamings for folks. > > Just my 2 cents, > Brad > Hi Brad, In the case of Bow's SearchIO code, what would you prefer? e.g. Bio.SearchIO as it is now on his branch? Peter From kai.blin at biotech.uni-tuebingen.de Mon Oct 29 06:26:03 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Mon, 29 Oct 2012 11:26:03 +0100 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <508A4B6C.6020801@biotech.uni-tuebingen.de> Message-ID: <508E59BB.1050705@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2012-10-26 11:33, Wibowo Arindrarto wrote: Hi Bow, Peter, > For the merge conflict, which branch are you using? Can you point > to specific commits that cause the conflicts? I haven't tried > merging / rebasing my own branch to the current master myself ~ so > knowing this should help the process as well. 
Disregarding the namespace discussion, I needed a reasonable branch
on which to bring my HMMer2 parser up to date. As I said last week I
tried rebasing Bow's searchio branch and had a bunch of merge conflicts.

I've retried the rebase today, and most of the merge conflicts are
actually pretty trivial and mostly concern where the code gets its
OrderedDict from for Python versions < 2.7.

I've pushed the rebased patchset to
https://github.com/kblin/biopython/tree/searchio-rebase if anybody
wants to have a look. With the last patch fixing an error I seem to
have introduced during merge conflict resolution, the SearchIO tests
pass on that branch.

Cheers,
Kai

- --
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax   : ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQjlm7AAoJEKM5lwBiwTTPFe8IAMMLmM2kQmb9vOSCuNjbcIfJ
HqzzvLaw8Eo44uEb0zmxhuJwPoPZpdZIWCNM1t3LpynaE3mHawLcrYJTT/R1YxkS
udBHvMlU6h76J93NITWCzFZ7HHlMMrbzyPel7rifWXbv5xpG2BREpmr1V7lKmbH7
XbInPsVP0PjySFlCQb3219M+IZ4fA+ViYSBlQeXs91G1YzMVo6nkDcs+FkDG8mJt
Qg2u4Bhrxaf3qQKNuQzb2AHJ4KpnEkYsTI2FUJfHaulNfN6w9HwsEgyvM6hVqONP
4aIYlsbSlLjbGG3sdliibPJy5A+8AnkNSFlAHydL+FgBVmPqo3Xe0O5buTdz3Vs=
=prZo
-----END PGP SIGNATURE-----

From cmccoy at fhcrc.org  Mon Oct 29 11:24:45 2012
From: cmccoy at fhcrc.org (Connor McCoy)
Date: Mon, 29 Oct 2012 08:24:45 -0700
Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs
In-Reply-To: <87sj8ys9y4.fsf@fastmail.fm>
References: <87sj8ys9y4.fsf@fastmail.fm>
Message-ID:

Hi Brad,

Thank you so much for the quick reply. I just got a chance to test this,
and it seems to be working again.
Best, Connor On Sun, Oct 28, 2012 at 11:55 AM, Brad Chapman wrote: > > Connor; > > > I remember this being resolved, but when I try to install biopython with > > pip, it fails: > > Thanks for the report. It looks like the command line options pip uses > to call setup.py changed a bit, so the hack we have in place is no > longer working. I pushed a fix for this: > > > https://github.com/chapmanb/biopython/commit/e05a355e3e9825c44c4a9b3bdfdda25c9a92c9c4 > > which seems to resolve the issue and hopefully make it more robust going > forward. Could you confirm it works on your system: > > $ cd /tmp > $ git clone git://github.com/chapmanb/biopython.git > $ sudo pip install /tmp/biopython > > If so, I'll push this into the main repo for the next release. Thanks > again for letting us know about the problem, > Brad > -- Connor McCoy Fred Hutchinson Cancer Research Center 1100 Fairview Ave N. Seattle, WA 98109-1924 cmccoy at fhcrc.org From chapmanb at 50mail.com Mon Oct 29 13:54:30 2012 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 29 Oct 2012 13:54:30 -0400 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> <87pq42s9lt.fsf@fastmail.fm> Message-ID: <874nldqi3t.fsf@fastmail.fm> Peter; > In the case of Bow's SearchIO code, what would you prefer? > e.g. Bio.SearchIO as it is now on his branch? I like plain ol' Search the best but don't have a strong preference. I'm terrible at naming things so trust everyone's judgment on this. 
Brad From w.arindrarto at gmail.com Mon Oct 29 16:11:09 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Mon, 29 Oct 2012 21:11:09 +0100 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: <508E59BB.1050705@biotech.uni-tuebingen.de> References: <508A4B6C.6020801@biotech.uni-tuebingen.de> <508E59BB.1050705@biotech.uni-tuebingen.de> Message-ID: Hi Kai, > > For the merge conflict, which branch are you using? Can you point > > to specific commits that cause the conflicts? I haven't tried > > merging / rebasing my own branch to the current master myself ~ so > > knowing this should help the process as well. > > Disregarding the namespace discussion, I needed to get a reasonable > branch to get my HMMer2 parser up-to-date in. As I said last week I > tried rebasing Bow's searchio branch and had a bunch of merge conflicts. > > I've retried the rebase today, and most of the merge conflicts are > actually pretty trivial and mostly around the question where the code > gets it's OrderedDict from for python versions < 2.7. > > I've pushed the rebased patchset to > https://github.com/kblin/biopython/tree/searchio-rebase if anybody > wants to have a look. With the last patch fixing an error I seem to > have introduced during merge conflict resolution, the SearchIO tests > pass on that branch. Thanks for doing the rebase :)! I just checked it and everything looks fine; all unit tests + doctests pass. On another note, I was wondering about how to combine this rebased branch with my local branch. Is there a simple way to apply the changes in the rebased branch to my local working searchio branch or should I just switch to a local checkout of the rebased branch? 
regards, Bow From kai.blin at biotech.uni-tuebingen.de Mon Oct 29 16:43:49 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Mon, 29 Oct 2012 21:43:49 +0100 Subject: [Biopython-dev] Working with the new SearchIO API Message-ID: <508EEA85.6060906@biotech.uni-tuebingen.de> Hi Bow, I've been looking closer at the SearchIO API changes introduced in August. I think there still is a design problem with the object model, at least when looking at how this affects the hmmer3 parser (and affects the hmmer2 parsing as well). Possibly I'm not seeing the big picture here, so let me explain what I'm seeing, and then you can tell me what I missed. :) So, the hmmer2 and hmmer3 file format basically looks like this # header # ... # ... information about the query list of hits list of hsps (alignments for hsps) (some statistics) // Now, when parsing this file line-wise, you obviously run into the hits first. However, with the new API, you can't create a Hit object without knowing the HSPs, but you haven't read them yet. To work around this, you need to create a fake hit object (https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/HmmerIO/hmmer3_text.py#L201). Then, in the loop that creates the fake hit objects, one of the exit conditions then parses the HSP entries and then replaces the fake hit objects by "real" Hit objects. (https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/HmmerIO/hmmer3_text.py#L188) By the way, that code is a bit misleading. Took me a while to notice the switch of the list's contents. Anyway, back to business. So basically you need to create two hit objects for every hit you're looking at. What's the advantage of forcing Hsp objects to be passed to the Hit constructor? Just to make sure your Hit objects have a valid Hsp at some later point? I'm aware that I'm just looking at the SearchIO design from the perspective of the hmmer2 parser, but I'd like to understand the reasons for the API being the way it currently is. 
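For readers following along, the two-pass workaround described above can be sketched roughly as below. This is only an illustration under stated assumptions: the HSP and Hit classes and the parse() function are hypothetical stand-ins written for this note, not the Bio.SearchIO classes or the hmmer3_text.py code, and the rows are made-up values.

```python
class HSP:
    """Hypothetical stand-in for an HSP (not the Bio.SearchIO class)."""

    def __init__(self, hit_id, score):
        self.hit_id = hit_id
        self.score = score


class Hit:
    """Hypothetical Hit that, like the new API, cannot be built without HSPs."""

    def __init__(self, hsps, **attrs):
        if not hsps:
            raise ValueError("a Hit needs at least one HSP")
        self.hsps = list(hsps)
        self.id = hsps[0].hit_id
        self.attrs = attrs


def parse(hit_rows, hsp_rows):
    # Pass 1: the hit table is read first, so only placeholder dicts can be kept.
    hit_list = [{"id": hit_id, "evalue": evalue} for hit_id, evalue in hit_rows]

    # Pass 2: the HSP rows are read afterwards and grouped by hit ID.
    hsps_by_hit = {}
    for hit_id, score in hsp_rows:
        hsps_by_hit.setdefault(hit_id, []).append(HSP(hit_id, score))

    # The list's contents are then switched from placeholders to real Hits.
    for index, attrs in enumerate(hit_list):
        hit_list[index] = Hit(hsps_by_hit[attrs["id"]], evalue=attrs["evalue"])
    return hit_list


hits = parse(
    hit_rows=[("modelA", 1e-5), ("modelB", 0.2)],
    hsp_rows=[("modelA", 50.0), ("modelB", 3.0), ("modelA", 12.5)],
)
print([(hit.id, len(hit.hsps)) for hit in hits])
```

The same list holds dicts during the first pass and Hit objects after the second, which is the switch of contents that is easy to miss when reading such a parser.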
Hope you can shed some light on this,
Kai

--
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax   : ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From kai.blin at biotech.uni-tuebingen.de  Mon Oct 29 16:47:11 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Mon, 29 Oct 2012 21:47:11 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To:
References: <508A4B6C.6020801@biotech.uni-tuebingen.de> <508E59BB.1050705@biotech.uni-tuebingen.de>
Message-ID: <508EEB4F.7050607@biotech.uni-tuebingen.de>

On 2012-10-29 21:11, Wibowo Arindrarto wrote:

Hi Bow,

> On another note, I was wondering about how to combine this rebased
> branch with my local branch. Is there a simple way to apply the
> changes in the rebased branch to my local working searchio branch or
> should I just switch to a local checkout of the rebased branch?

Well, you could rebase your local changes on top of the rebased branch. :)
Or, depending on how many changes you have in your local branch, check
out the rebased branch and then git cherry-pick your changes on top of
the rebased branch.

Cheers,
Kai

--
Dipl.-Inform.
Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax   : ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From w.arindrarto at gmail.com  Mon Oct 29 18:55:19 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Mon, 29 Oct 2012 23:55:19 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <508EEA85.6060906@biotech.uni-tuebingen.de>
References: <508EEA85.6060906@biotech.uni-tuebingen.de>
Message-ID:

Hi Kai,

Thanks for the input & comments! I made the API change mainly because
I want to keep the SearchIO object hierarchy more consistent, i.e.
there should be as few places as possible to make changes that break
the model.

There are several attributes that should remain the same between a
single QueryResult object and the Hits, HSPs, and HSPFragments it
contains. For now, these attributes are the ID (both query and hit ID)
and description (also for both query and hit).

In the old API, each object in the object model hierarchy stores these
values as its own attribute. For example, to store the ID of the Hit
object, the old API has the 'id' attribute in the Hit object, a
'hit_id' attribute in every HSP object it contains, and a 'hit_id'
attribute in every HSPFragment contained by each HSP in the Hit. I see
this as unnecessary duplication and a possible source of confusion,
since these attributes are completely decoupled from one another even
though they mean the same thing.

The new API stores these values only at the innermost object in the
hierarchy (the HSPFragment), reducing duplication and possible sources
of inconsistency.
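As a rough illustration of storing a value only at the innermost level and exposing it through the containers, here is a minimal sketch. It assumes nothing about the real implementation in Bio/SearchIO/_utils.py: the three classes below are hypothetical and only mimic the idea of a getter that checks that all fragments agree and a setter that pushes the new value down to every fragment.

```python
class HSPFragment:
    """Hypothetical innermost object: the only place hit_id is stored."""

    def __init__(self, hit_id):
        self.hit_id = hit_id


class HSP:
    """Hypothetical container of fragments; holds no hit_id of its own."""

    def __init__(self, fragments):
        self.fragments = list(fragments)

    @property
    def hit_id(self):
        # Read the value from the fragments, insisting that they all agree.
        ids = {fragment.hit_id for fragment in self.fragments}
        if len(ids) != 1:
            raise ValueError("inconsistent hit_id values: %r" % sorted(ids))
        return ids.pop()

    @hit_id.setter
    def hit_id(self, value):
        # Push the new value down to every contained fragment.
        for fragment in self.fragments:
            fragment.hit_id = value


class Hit:
    """Hypothetical container of HSPs; its id is derived from them."""

    def __init__(self, hsps):
        self.hsps = list(hsps)

    @property
    def id(self):
        ids = {hsp.hit_id for hsp in self.hsps}
        if len(ids) != 1:
            raise ValueError("inconsistent hit_id values: %r" % sorted(ids))
        return ids.pop()

    @id.setter
    def id(self, value):
        for hsp in self.hsps:
            hsp.hit_id = value


hit = Hit([HSP([HSPFragment("old"), HSPFragment("old")]), HSP([HSPFragment("old")])])
hit.id = "new"  # set at the top level: every fragment is updated
print(hit.id, [f.hit_id for hsp in hit.hsps for f in hsp.fragments])
```

In this sketch, setting the ID on a single HSP instead of on the Hit leaves the other HSPs untouched, so the next read of hit.id raises ValueError rather than silently returning one of two different values.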
When you access the attributes from objects other than the
HSPFragment, a getter retrieves the value from one of the contained
HSPFragment objects, after ensuring that all HSPFragments contain the
same value of the attribute
(https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/_utils.py#L99).
Similarly, when you set the attribute, a setter applies the new value
to all HSPFragment objects contained
(https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/_utils.py#L106).

This allows you to keep the values consistent across the hierarchy, so
long as the change is done at the highest level possible (e.g.
changing the hit ID in the HSP object will break consistency, but
changing hit ID through the Hit object will update the hit_id
attribute value across all HSPs it contains). Conceptually, this is
also closer to the real 'Hit' object we're modeling since we always
need at least one HSP to declare a database entry as a Hit.

The HMMER parser's update is partially influenced by this API change,
as you've seen. In the previous version
(https://github.com/bow/biopython/blob/12fbe05c5e17f7a356ab672358b2698612aa8cad/Bio/SearchIO/HmmerIO/hmmertext.py),
the HMMER parser has several ugly bits (e.g. it sets the hit
description in more than one place, a possible source of error). After
changing the API to force the creation of Hits with HSPs, these kinds
of duplications are eliminated. I personally also feel that using the
new API allows me (sometimes forces me) to improve the other formats'
parsers in a similar way.

It's unfortunate that the HMMER text parser is made a little difficult
to understand, due to the way HMMER arranges the text output format.
And I admit I didn't do any performance benchmark for the HMMER text
parser when I made the change (I suspected one extra dictionary per
Hit object should not decrease performance that much. Of course, if
the change proves to cause severe performance penalties, then yes, we
should look into it again.).
But for now, I think these are acceptable tradeoffs, if it means the object model becomes more consistent and the other format parsers improved as well. Hope that helps :). regards, Bow P.S. As for the misleading part, yes, I admit that maybe a different name should be used to note that the contents of the list differ. On Mon, Oct 29, 2012 at 9:43 PM, Kai Blin wrote: > Hi Bow, > > I've been looking closer at the SearchIO API changes introduced in > August. I think there still is a design problem with the object model, > at least when looking at how this affects the hmmer3 parser (and affects > the hmmer2 parsing as well). > > Possibly I'm not seeing the big picture here, so let me explain what I'm > seeing, and then you can tell me what I missed. :) > > So, the hmmer2 and hmmer3 file format basically looks like this > > # header > # ... > # ... > > information about the query > > list of hits > > list of hsps > > (alignments for hsps) > > (some statistics) > // > > Now, when parsing this file line-wise, you obviously run into the hits > first. However, with the new API, you can't create a Hit object without > knowing the HSPs, but you haven't read them yet. > > To work around this, you need to create a fake hit object > (https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/HmmerIO/hmmer3_text.py#L201). > Then, in the loop that creates the fake hit objects, one of the exit > conditions then parses the HSP entries and then replaces the fake hit > objects by "real" Hit objects. > (https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/HmmerIO/hmmer3_text.py#L188) > > By the way, that code is a bit misleading. Took me a while to notice the > switch of the list's contents. Anyway, back to business. > > So basically you need to create two hit objects for every hit you're > looking at. What's the advantage of forcing Hsp objects to be passed to > the Hit constructor? Just to make sure your Hit objects have a valid Hsp > at some later point? 
> > I'm aware that I'm just looking at the SearchIO design from the > perspective of the hmmer2 parser, but I'd like to understand the reasons > for the API being the way it currently is. > > Hope you can shed some light on this, > Kai > > -- > Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de > Institute for Microbiology and Infection Medicine > Division of Microbiology/Biotechnology > Eberhard-Karls-University of T?bingen > Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 > D-72076 T?bingen Fax : ++49 7071 29-5979 > Deutschland > Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From kai.blin at biotech.uni-tuebingen.de Tue Oct 30 03:35:40 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Tue, 30 Oct 2012 08:35:40 +0100 Subject: [Biopython-dev] Working with the new SearchIO API In-Reply-To: References: <508EEA85.6060906@biotech.uni-tuebingen.de> Message-ID: <508F834C.6010404@biotech.uni-tuebingen.de> On 2012-10-29 23:55, Wibowo Arindrarto wrote: Hi Bow, > Thanks for the input & comments! I made the API change mainly because > I want to keep the SearchIO object hierarchy more consistent, i.e. > there should be as few places as possible to make changes that break > the model. Thanks for the explanation. ... > This allows you to keep the values consistent across the hierarchy, so > long as the change is done at the highest level possible (e.g. > changing the hit ID in the HSP object will break consistency, but > changing hit ID through the Hit object will update the hit_id > attribute value across all HSPs it contains). Conceptually, this is > also closer to the real 'Hit' object we're modeling since we always > need at least one HSP to declare a database entry as a Hit. I see. I didn't think about the programmatic side of things. I see the advantage of having only one attribute there and of keeping it consistent. > The HMMER parser's update is partially influenced by this API change, > as you've seen. 
In the previous version
> (https://github.com/bow/biopython/blob/12fbe05c5e17f7a356ab672358b2698612aa8cad/Bio/SearchIO/HmmerIO/hmmertext.py),
> the HMMER parser has several ugly bits (e.g. it sets the hit
> description in more than one place, a possible source of error). After
> changing the API to force the creation of Hits with HSPs, these kinds
> of duplications are eliminated. I personally also feel that using the
> new API allows me (sometimes forces me) to improve the other format's
> parsers in a similar way.

Arguably, the more human-readable the file you need to parse, the less
readable the parser tends to be. ;) I think the old parser was a more
straightforward piece of code.

> It's unfortunate that the HMMER text parser is made a little difficult
> to understand, due to the way HMMER arranges the text output format.
> And I admit I didn't do any performance benchmark for the HMMER text
> parser when I made the change (I suspected one extra dictionary per
> Hit object should not decrease performance that much. Of course, if
> the change proves to cause severe performance penalties, then yes, we
> should look into it again.).

I'm not talking about performance here, performance likely isn't a
problem. I'm saying that you're conceptually creating the Hit object
twice. Even the comment in line 200 says so. :)

[snip]
    # create the hit object
    hit_attrs = {
        'id': row[8],
        'query_id': qid,
        'evalue': float(row[0]),
        'bitscore': float(row[1]),
        'bias': float(row[2]),
        # row[3:6] is not parsed, since the info is available
        # at the HSP level
        'domain_exp_num': float(row[6]),
        'domain_obs_num': int(row[7]),
        'description': row[9],
        'is_included': is_included,
    }
    hit_list.append(hit_attrs)
[snip]

I'm mainly wondering why at this position, I can't just create the Hit
object already, and then later set the HSPs. You could do this via a
setter function that validates the IDs are identical if you want to
make sure you're not shooting yourself in the foot there.
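The alternative suggested above, creating the Hit as soon as its table row is read and attaching HSPs later through a validating method, could look roughly like the sketch below. The class and method names (Hit, HSP, append) are hypothetical and chosen for this note; they are not the Bio.SearchIO API, and the rows are invented.

```python
class HSP:
    """Hypothetical stand-in for an HSP (not the Bio.SearchIO class)."""

    def __init__(self, hit_id, score):
        self.hit_id = hit_id
        self.score = score


class Hit:
    """Hypothetical Hit that may start out empty and gain HSPs later."""

    def __init__(self, id, description=""):
        self.id = id
        self.description = description
        self._hsps = []

    @property
    def hsps(self):
        # Expose a read-only view so HSPs can only be added via append().
        return tuple(self._hsps)

    def append(self, hsp):
        # Refuse HSPs that belong to a different hit.
        if hsp.hit_id != self.id:
            raise ValueError(
                "HSP for %r cannot be added to Hit %r" % (hsp.hit_id, self.id)
            )
        self._hsps.append(hsp)


# Hits are created in the order of the hit table ...
hits = {}
for hit_id, description in [("modelA", "first model"), ("modelB", "second model")]:
    hits[hit_id] = Hit(hit_id, description)

# ... and HSP rows can then be attached in whatever order they are read.
for hit_id, score in [("modelB", 1.3), ("modelA", 731.8), ("modelB", 857.3)]:
    hits[hit_id].append(HSP(hit_id, score))

print([(hit.id, [hsp.score for hsp in hit.hsps]) for hit in hits.values()])
```

The trade-off in this design is that a Hit with zero HSPs can exist for a while, so any "at least one HSP" guarantee would have to be checked when parsing of the query finishes rather than in the constructor.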
> But for now, I think these are acceptable tradeoffs, if it means the > object model becomes more consistent and the other format parsers > improved as well. I haven't looked into the other parsers, so I'll take your word on that. I can of course take the same detour of creating a placeholder hit object for the first pass and then when I've parsed the HSPs create the real Hit object. If this makes all the other parsers more readable at the cost of some obscurity in the hmmer text parsers, well, so be it. Cheers, Kai -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-University of T?bingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 T?bingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From p.j.a.cock at googlemail.com Tue Oct 30 06:59:44 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 30 Oct 2012 10:59:44 +0000 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com> Message-ID: On Fri, Oct 26, 2012 at 11:43 PM, Wibowo Arindrarto wrote: >> >> I have started exploring that idea on this new branch, >> https://github.com/peterjc/biopython/tree/bioseq >> >> Does anyone object to me applying the first commit to the master >> branch (defining the previously discussed new warning for 'beta' code)? >> https://github.com/peterjc/biopython/commit/97485d5dcf2620f7664ae46a7897c1203847538d > > No objection from me for the commit :). > Done, commit adding Bio.BiopythonExperimentalWarning cherry-picked to the master, https://github.com/biopython/biopython/commit/52ac4383b12335ebcdcb8ea52eec8d23ac28b5e2 Peter From p.j.a.cock at googlemail.com Tue Oct 30 07:03:07 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 30 Oct 2012 11:03:07 +0000 Subject: [Biopython-dev] PEP8 lower case module names? 
In-Reply-To: <874nldqi3t.fsf@fastmail.fm>
References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> <87pq42s9lt.fsf@fastmail.fm> <874nldqi3t.fsf@fastmail.fm>
Message-ID:

On Mon, Oct 29, 2012 at 5:54 PM, Brad Chapman wrote:
>
> Peter;
>
>> In the case of Bow's SearchIO code, what would you prefer?
>> e.g. Bio.SearchIO as it is now on his branch?
>
> I like plain ol' Search the best but don't have a strong preference. I'm
> terrible at naming things so trust everyone's judgment on this.
>
> Brad

Since we have no clear consensus, I propose we add Bow's code as
Bio.SearchIO (which is how it is written right now), with the new
BiopythonExperimentalWarning in place (to alert people that it may
change in the next release). We can then rename or move it at a
later date.

This will make it easier for people to test the code, and also
suggest further changes or additions (e.g. Kai's HMMER work).

If and when we agree on a consolidation of the Bio.SeqXXX modules,
then Bio.SearchIO could move too. If this happens before any public
release as Bio.SearchIO, so much the better.

Adopting lower case module names under Python 3 is also a
separate issue.

Peter

From kai.blin at biotech.uni-tuebingen.de  Tue Oct 30 10:17:38 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Tue, 30 Oct 2012 15:17:38 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <508EEA85.6060906@biotech.uni-tuebingen.de>
References: <508EEA85.6060906@biotech.uni-tuebingen.de>
Message-ID: <508FE182.3040202@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-10-29 21:43, Kai Blin wrote:

Hi Bow,

one more thing: Hmmer2 has the concept of an accession number in the
result. Is there an attribute for that in the QueryResult object that
I'm missing, or do we want a new attribute for that? Would "accession"
be a good name?

Cheers,
Kai

- --
Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universit?t T?bingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 T?bingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://www.enigmail.net/ iQEcBAEBAgAGBQJQj+GCAAoJEKM5lwBiwTTPaT4IAJb+Xs7sMPpQH4SwUQarItyP Cg0UYLQNRtKBlyhNpipCbz7BWfqxd8fU0GsYSCVF275fDuBLUa337A6psRzefkWa 84cC7uHmOdcmhyeCipdAs5Jtouxf7ReGuQ+m3/SsW0pRfMHOuZamKw+5+oETnisM DiHJUv6iKMHCpXrVWpofcKywqb1uqpxdhTp9F1gy+v6rVGKMI4r/fW5mRQZVxC3s aQdhubCHoN+LUEo/OUKIF6cNeHWLMBToENdYlBhk62gLeSX5bxyhog21pzD+HTYf 5u4rPC2ikVR7iGQ9QPsvW7r7lqpDgoxFbnDYzcsAa+bNYd6+ENs+MAePb8Va2Dg= =Luz9 -----END PGP SIGNATURE----- From kai.blin at biotech.uni-tuebingen.de Tue Oct 30 11:54:50 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Tue, 30 Oct 2012 16:54:50 +0100 Subject: [Biopython-dev] Working with the new SearchIO API In-Reply-To: <508F834C.6010404@biotech.uni-tuebingen.de> References: <508EEA85.6060906@biotech.uni-tuebingen.de> <508F834C.6010404@biotech.uni-tuebingen.de> Message-ID: <508FF84A.2020802@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2012-10-30 08:35, Kai Blin wrote: Hi Bow, > I'm mainly wondering why at this position, I can't just create the > Hit object already, and then later set the HSPs. You could do this > via a setter function that validates the IDs are identical if you > want to make sure you're not shooting yourself in the foot there. I've just stumbled over a case where not being able to pre-create Hit objects really bites me. See the attached hmmpfam output. You'll notice that the domain table is not in the order of the hit table. 
As I'd like to preserve the order of the hit table, the current setup of the API forces me to either repeatedly parse the domain annotations until I find the correct domain annotations for my hit, or to create the hits in the order of the domain annotation table and then reshuffle them to make sure they're in the order of the hit table. If I could just create "empty" hit objects when parsing the hit table, I could easily preserve the order of the hits but still add the hsps as I parse them. Cheers, Kai - -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universit?t T?bingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 T?bingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://www.enigmail.net/ iQEcBAEBAgAGBQJQj/hKAAoJEKM5lwBiwTTPWTYH/2miexrfxolw9J0tOCSHXFYn eNEzLcIM8ZHUoBCL1fsS/9166VH8D8HpyZCgTQwsSt9BUhQbjkwTmyfmP9wr0QDp 80IbxqWkMAJmDv3Q1RxbVVmD8TTfY6AwezQuwnYb8EFJDD7wvcJOJgJEqlp6zZu1 K/fJNYOXt2GekcXkrOMO1jGkzzpiwBs1uhhpYH9LxMAHPW3vnfTf4/tVSRPOKWRr IXtxRnLSSurmZP4DYNm1ys4NykY6cO6zPOWxJIiI1lBLR7AVaKNK1bZ75m2D7/Mr Y4FjnIlqaCFuNwiYPSNWQvTHOIj/VF/nRSWAVRRCqYZoYaDuZa25rb3Fo5RHMC8= =Lerj -----END PGP SIGNATURE----- -------------- next part -------------- hmmpfam - search one or more sequences against HMM database HMMER 2.3.2 (Oct 2003) Copyright (C) 1992-2003 HHMI/Washington University School of Medicine Freely distributed under the GNU General Public License (GPL) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - HMM file: ../Shared/Pfam_fs Sequence file: single_porphyra_AA.fa - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Query sequence: gi|90819130|dbj|BAE92499.1| Accession: [none] Description: glutamate synthase [Porphyra yezoensis] Scores for sequence 
family classification (score includes all domains):
Model           Description                            Score  E-value    N
--------        -----------                            -----  -------  ---
Glu_synthase    Conserved region in glutamate synthas  858.6  3.6e-255   2
GATase_2        Glutamine amidotransferases class-II   731.8  3.9e-226   1
Glu_syn_central Glutamate synthase central domain      649.1  7.9e-213   1
GXGXG           GXGXG motif                            367.3  2.7e-107   1
HdeA            hns-dependent expression protein A (H    9.6  0.015      1
GDC-P           Glycine cleavage system P-protein        7.1  0.086      1
Cache_1         Cache domain                             7.0  0.14       1
IBN_N           Importin-beta N-terminal domain          8.2  0.17       1
DUF1200         Protein of unknown function (DUF1200)    6.7  0.42       1
cobW            CobW/HypB/UreG, nucleotide-binding do    5.1  0.45       1
PUF             Pumilio-family RNA binding repeat        6.5  0.47       1
Arch_flagellin  Archaebacterial flagellin                4.1  0.66       1
FMN_dh          FMN-dependent dehydrogenase              3.2  0.89       1
RNA_pol_Rpb2_4  RNA polymerase Rpb2, domain 4            4.6  1.4        1
DUF477          Domain of unknown function (DUF477)      3.8  1.7        1
FRG1            FRG1-like family                         0.2  1.7        1
DUF1393         Protein of unknown function (DUF1393)    3.1  2          1
tRNA_anti       OB-fold nucleic acid binding domain      4.9  2          1
SelT            Selenoprotein T                          3.1  2.2        1
RNase_PH_C      3' exoribonuclease family, domain 2      4.2  2.3        1
Pencillinase_R  Penicillinase repressor                  3.9  2.5        1
Hormone_4       Neurohypophysial hormones, N-terminal    4.4  2.5        1
DSRB            Dextransucrase DSRB                      2.7  2.7        1
FtsK_SpoIIIE    FtsK/SpoIIIE family                      2.6  3.1        1
UBA             UBA/TS-N domain                          4.2  3.1        1
DUF1981         Domain of unknown function (DUF1981)     3.6  3.3        1
Gla             Vitamin K-dependent carboxylation/gam    4.0  3.5        1
Scm3            Centromere protein Scm3                  2.2  3.5        1
Ribosomal_S6    Ribosomal protein S6                     3.3  3.7        1
Cystatin        Cystatin domain                          2.4  3.9        1
Phage_prot_Gp6  Phage portal protein, SPP1 Gp6-like      1.0  4          1
DUF1976         Domain of unknown function (DUF1976)    -1.5  4.3        1
DUF37           Domain of unknown function DUF37         3.0  4.5        1
Flavodoxin_NdrI NrdI Flavodoxin like                     2.1  4.6        1
Bac_rhodopsin   Bacteriorhodopsin                        0.9  4.9        1
Nitro_FeMo-Co   Dinitrogenase iron-molybdenum cofacto    2.1  5.3        1
MoCF_biosynth   Probable molybdopterin binding domain    1.3  5.6        1
PaaA_PaaC       Phenylacetic acid catabolic protein      0.4  5.6        1
Albicidin_res   Albicidin resistance domain              1.7  5.7        1
DUF1514         Protein of unknown function (DUF1514)    3.5  5.7        1
T5orf172        T5orf172 domain                          2.0  6.1        1
Nup133_N        Nup133 N terminal like                  -0.6  6.5        1
BicD            Microtubule-associated protein Bicaud   -1.6  6.8        1
Sel1            Sel1 repeat                              2.5  7          1
CAP_C           DE Adenylate cyclase associated (CA      1.3  7.4        1
Colicin         Colicin pore forming domain              1.4  7.5        1
MADF_DNA_bdg    Alcohol dehydrogenase transcription f    1.8  8.2        1
DUF258          Protein of unknown function, DUF258      0.3  8.3        1
PspB            Phage shock protein B                    0.4  8.4        1
GspM            General secretion pathway, M protein     1.0  8.6        1
Coq4            Coenzyme Q (ubiquinone) biosynthesis    -0.3  9.1        1
P22_AR_N        P22_AR N-terminal domain                -0.2  9.5        1
C1_2            C1 domain                                1.1  9.6        1
Phage_Mu_P      Bacteriophage Mu P protein              -0.4  10         1

Parsed for domains:
Model           Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
--------        ------- ----- -----    ----- -----      -----  -------
GATase_2          1/1      34   404 ..     1   385 []   731.8  3.9e-226
FRG1              1/1      88   107 ..   151   173 ..     0.2  1.7
C1_2              1/1     191   210 ..     9    27 ..     1.1  9.6
MADF_DNA_bdg      1/1     235   261 ..    57    95 .]     1.8  8.2
PaaA_PaaC         1/1     258   269 ..     1    13 [.     0.4  5.6
Albicidin_res     1/1     274   289 ..    50    65 ..     1.7  5.7
UBA               1/1     311   331 ..    18    38 .]     4.2  3.1
Gla               1/1     342   357 ..    27    42 .]     4.0  3.5
RNA_pol_Rpb2_4    1/1     369   381 ..     1    13 [.     4.6  1.4
MoCF_biosynth     1/1     371   396 ..    23    49 ..     1.3  5.6
DUF1200           1/1     389   401 ..     1    13 [.     6.7  0.42
Nup133_N          1/1     397   419 ..   475   498 .]    -0.6  6.5
DUF1976           1/1     428   448 ..  1296  1319 .]    -1.5  4.3
Bac_rhodopsin     1/1     445   472 ..   219   250 .]     0.9  4.9
Coq4              1/1     459   481 ..    60    82 ..    -0.3  9.1
Glu_syn_central   1/1     478   773 ..     1   301 []   649.1  7.9e-213
Flavodoxin_NdrI   1/1     488   497 ..   122   131 .]     2.1  4.6
P22_AR_N          1/1     524   541 ..   110   126 .]    -0.2  9.5
Cache_1           1/1     537   557 ..     1    23 [.     7.0  0.14
Glu_synthase      1/2     650   676 ..   297   323 ..     1.3  3
HdeA              1/1     727   749 ..    58    79 .]     9.6  0.015
Sel1              1/1     729   745 ..    32    49 .]     2.5  7
DUF1981           1/1     765   787 ..    62    88 .]     3.6  3.3
tRNA_anti         1/1     818   839 ..    54    85 .]     4.9  2
Cystatin          1/1     826   859 ..     1    38 [.     2.4  3.9
RNase_PH_C        1/1     827   846 ..    64    84 .]     4.2  2.3
Glu_synthase      2/2     830  1216 ..     1   412 []   857.3  9e-255
DUF258            1/1     839   860 ..   282   305 .]     0.3  8.3
Pencillinase_R    1/1     856   894 ..    84   118 .]     3.9  2.5
SelT              1/1     872   885 ..    96   111 .]     3.1  2.2
Nitro_FeMo-Co     1/1     879   897 ..    87   105 .]     2.1  5.3
DUF37             1/1     927   934 ..    61    68 .]     3.0  4.5
Scm3              1/1     953   963 ..   103   113 .]     2.2  3.5
cobW              1/1    1038  1058 ..   202   222 .]     5.1  0.45
Arch_flagellin    1/1    1050  1072 ..   197   219 .]     4.1  0.66
DUF1393           1/1    1055  1068 ..     1    14 [.     3.1  2
FtsK_SpoIIIE      1/1    1107  1143 ..   163   198 ..     2.6  3.1
FMN_dh            1/1    1109  1148 ..   291   330 ..     3.2  0.89
DSRB              1/1    1120  1134 ..     1    16 [.     2.7  2.7
Phage_Mu_P        1/1    1122  1131 ..     1    10 [.    -0.4  10
Hormone_4         1/1    1168  1176 ..     1     9 []     4.4  2.5
GDC-P             1/1    1205  1225 ..    10    30 ..     7.1  0.086
PspB              1/1    1268  1276 ..     1     9 [.     0.4  8.4
T5orf172          1/1    1271  1293 ..    35    58 ..     2.0  6.1
CAP_C             1/1    1283  1292 ..   161   170 .]     1.3  7.4
GXGXG             1/1    1290  1485 ..     1   228 []   367.3  2.7e-107
DUF1514           1/1    1453  1469 ..    50    66 .]     3.5  5.7
Colicin           1/1    1456  1467 ..   192   203 .]     1.4  7.5
Ribosomal_S6      1/1    1461  1481 ..    16    36 ..     3.3  3.7
BicD              1/1    1465  1481 ..     1    17 [.    -1.6  6.8
PUF               1/1    1470  1486 ..    19    35 .]     6.5  0.47
DUF477            1/1    1472  1495 ..     1    24 [.     3.8  1.7
Phage_prot_Gp6    1/1    1479  1492 ..     1    14 [.     1.0  4
IBN_N             1/1    1498  1516 ..     1    20 [.     8.2  0.17
GspM              1/1    1506  1520 ..     1    15 [.
1.0 8.6 Alignments of top-scoring domains: GATase_2: domain 1 of 1, from 34 to 404: score 731.8, E = 3.9e-226 CS EEEEEEEEETSSHSBHHHHHHHHHHHHHGGGGSSCSTTSSCECEEEE *->CGvlGfiAhikgkpshkivedaleaLerLeHRGavgADgktGDGAGI CGv GfiA+ ++ ++hkiv +aleaL+++eHRGa++AD ++GDGAGI gi|9081913 34 CGV-GFIADVNNVANHKIVVQALEALTCMEHRGACSADRDSGDGAGI 79 CS EEECTCCCHHHHHHHCT----S GC-EEEEEEE-SSHHHHHHHHHHHHHH ltqiPdgFFrevakelGieLpe.gqYAVGmvFLPqdelaraearkifEki t+iP+++F++ ++++i++ ++ +VGm+FLP l+ + i+E + gi|9081913 80 TTAIPWNLFQKSLQNQNIKFEQnDSVGVGMLFLPAHKLKES--KLIIETV 127 CS HHHTT-EEEEEEE--B-GGGS-HHHHHC--EEEEEEEE-TT--HHHHHHC aeeeGLeVLGWReVPvnnsvLGetAlatePvIeQvFvgapsgdgedfErr ++ee+Le++GWR VP+ +vLG++A + P++eQvF+ +++ +++ +E++ gi|9081913 128 LKEENLEIIGWRLVPTVQEVLGKQAYLNKPHVEQVFCKSSNLSKDRLEQQ 177 CS EEEEECHSCHHHHTHHH. BEEEEEESSEEEEEECC-GGGHHHHBHG LyviRkrieksivaenvn....fYiCSLSsrTIVYKGMLtseQLgqFYpD L+++Rk+iek+i+ + + ++fYiCSLS++TIVYKGM++s++LgqFY+D gi|9081913 178 LFLVRKKIEKYIGINGKDwaheFYICSLSCYTIVYKGMMRSAVLGQFYQD 227 CS GGSTTEEBSEEEEEECESSSSSCTGGGSSCEEECCCTTCEEEEEEEEETT LqderfeSalAivHsRFSTNTfPsWplAQPfRVnslwgggivlAHNGEIN L++++++S++Ai+H+RFSTNT+P+WplAQP+R ++ HNGEIN gi|9081913 228 LYHSEYTSSFAIYHRRFSTNTMPKWPLAQPMR---------FVSHNGEIN 268 CS THHHHHHHHHHTSCCCSSTTCGHHHHCC-SSS-TTSCHHHHHHHHHHHHH TlrgNrnwMraRegvlksplFgddldkLkPIvneggSDSaalDnvlEllv Tl gN nwM++Re +l+s++++d++++LkPI n+++SDSa+lD ++Ell+ gi|9081913 269 TLLGNLNWMQSREPLLQSKVWKDRIHELKPITNKDNSDSANLDAAVELLI 318 CS HTT--HHHHHHHHS----TT-GGGTST-HHHHHHHHHHHHHHCCHCCEEE raGRslpeAlMMlIPEAWqnnpdmdkdrpekraFYeylsglmEPWDGPAa ++GRs++eAlM+l+PEA+qn+pd +++e+ +FYey+sgl+EPWDGPA+ gi|9081913 319 ASGRSPEEALMILVPEAFQNQPDFA-NNTEISDFYEYYSGLQEPWDGPAL 367 CS EEEETSSEEEEEEETTTSCESEEEEEEEEEE.TTEEEEEESSC lvftDGryavgAtLDRNGLTRPaRygiTrdldkDglvvvaSEa<-* +vft+G++ +gAtLDRNGL RPaRy+iT kD+lv+v+SE+ gi|9081913 368 VVFTNGKV-IGATLDRNGL-RPARYVIT----KDNLVIVSSES 404 FRG1: domain 1 of 1, from 88 to 107: score 0.2, E = 1.7 *->FQkfKvDLqdrklrinekDkkel<-* FQk+ Lq+ + +++D+ ++ gi|9081913 88 FQKS---LQNQNIKFEQNDSVGV 107 
C1_2: domain 1 of 1, from 191 to 210: score 1.1, E = 9.6 *->idgfyg...fYsCkkccddftl<-* i+g+++ ++fY C+ c +t+ gi|9081913 191 INGKDWaheFYICSLSC--YTI 210 MADF_DNA_bdg: domain 1 of 1, from 235 to 261: score 1.8, E = 8.2 *->drYrrelrkirqgnsegsstgsgesykskWryyeelsFL<-* +++ ++r+ ++ +kW+++ ++F gi|9081913 235 SSFAIYHRRFS------------TNTMPKWPLAQPMRFV 261 PaaA_PaaC: domain 1 of 1, from 258 to 269: score 0.4, E = 5.6 CS X............ *->MYnFvEHGGvint<-* M Fv H G int gi|9081913 258 M-RFVSHNGEINT 269 Albicidin_res: domain 1 of 1, from 274 to 289: score 1.7, E = 5.7 *->LrlmharEPsLrkgtG<-* L+ m+ rEP L+ +++ gi|9081913 274 LNWMQSREPLLQSKVW 289 UBA: domain 1 of 1, from 311 to 331: score 4.2, E = 3.1 CS HHHHHHHHHTTT-HHHHHHHH *->eeakkALeatngnverAvewL<-* ++a++ L a++ ++e+A+++L gi|9081913 311 DAAVELLIASGRSPEEALMIL 331 Gla: domain 1 of 1, from 342 to 357: score 4.0, E = 3.5 CS CSSHHHHHHHHHHCTC *->fednegtkefwrkYfg<-* f++n+++ f++ Y g gi|9081913 342 FANNTEISDFYEYYSG 357 RNA_pol_Rpb2_4: domain 1 of 1, from 369 to 381: score 4.6, E = 1.4 CS EEETTEEEEEESS *->VYvNGklvGthrn<-* V+ NGk++G + + gi|9081913 369 VFTNGKVIGATLD 381 MoCF_biosynth: domain 1 of 1, from 371 to 396: score 1.3, E = 5.6 CS CHHHHHHHHHHHTTTCEEEEEEEE-SS *->tNgpmLaalLresaGaevirygiVpDd<-* tNg+ + a L + G ++ry+i +D+ gi|9081913 371 TNGKVIGATLDR-NGLRPARYVITKDN 396 DUF1200: domain 1 of 1, from 389 to 401: score 6.7, E = 0.42 *->kYvltedtLlIks<-* +Yv+t+d L+I+s gi|9081913 389 RYVITKDNLVIVS 401 Nup133_N: domain 1 of 1, from 397 to 419: score -0.6, E = 6.5 *->lylltrnsGvvrIeHaleedstne<-* l++ + +sGvv++e + + s + gi|9081913 397 LVIVSSESGVVQVE-PGNVKSKGR 419 DUF1976: domain 1 of 1, from 428 to 448: score -1.5, E = 4.3 *->VsvYiyFkevtdnksLsEysVtyk<-* V++++ ++++nk ++ sVt k gi|9081913 428 VDIFS--HKILNNKEIK-TSVTTK 448 Bac_rhodopsin: domain 1 of 1, from 445 to 472: score 0.9, E = 4.9 CS HHHHHHHHHHHHHHHHHCHHHTC--------- *->vvAKVgFgfilLrsravlertvavgsalaage<-* v++K+++g +l ++r++le + + l+++ gi|9081913 445 VTTKIPYGELLTDARQILE--HK--PFLSDQQ 472 Coq4: domain 1 of 1, 
from 459 to 481: score -0.3, E = 9.1 *->rrILkEkPRissetldlkkLrkL<-* r+IL kP s ++d kkL +L gi|9081913 459 RQILEHKPFLSDQQVDIKKLMQL 481 Glu_syn_central: domain 1 of 1, from 478 to 773: score 649.1, E = 7.9e-213 CS HHHHHHCTT--HHHHHCTCHHHHHHSS--EE-S---S--CCC-SS-- *->llrrQkAFGYTyEdvelvllPMAetGkEalGSMGdDtPLAVLSekpr l+++Q+AFGYT+Edvelv+++MA+++kE++++MGdD+PL +LSek++ gi|9081913 478 LMQLQTAFGYTNEDVELVIEHMASQAKEPTFCMGDDIPLSILSEKSH 524 CS -GGGCEEE----SSS----TTTTGGG-B--EEES--S-TTS-SGGGC-CE lLYdYFKQlFAQVTNPPIDPIREelVMSLetylGpegNlLeptpeqarrl +LYdYFKQ+FAQVTNP+IDP+RE+lVMSL+ ++G+++NlL+ p+ a+++ gi|9081913 525 ILYDYFKQRFAQVTNPAIDPLRESLVMSLAIQIGHKSNLLDDQPTLAKHI 574 CS EESSSB--HHHHHH.HHHH....CCCCEEEEESEEESTTSTTCHHHHHHH kLesPILsnselekmlknidairegfkaatIditFdveeGvdgLeaaLdr kLesP+++++el++ + + +++++ I+++F e+G++ ++ + + gi|9081913 575 KLESPVINEGELNA-IFE-----SKLSCIRINTLFQLEDGPKNFKQQIQQ 618 CS HHHHHHHHHHCT-SEEEEESTCG--CTTEEE--HHHHHHHHHHHHHCTT- lceeAeeAirsGaniivLSDRndildeervaIPaLLAvGAVHhHLIrkgL lce A++Ai +G ni+vLSD+n+ ld+e+v+IP+LLAvGAVHhHLI kgL gi|9081913 619 LCENASQAILDGNNILVLSDKNNSLDSEKVSIPPLLAVGAVHHHLINKGL 668 CS CCC-EEEEEESS--SHHHHHHHHCTT-SEEEEHCCHHHHHHHHCCCCCCC RtkvslvVETGEaREvHHFAvLiGYGAsAInPYLAyETirdWWlirrGll R+ +s+ VET++++++HHFA+LiGYGAsAI+PYLA+ET r+WW + ++++ gi|9081913 669 RQEASILVETAQCWSTHHFACLIGYGASAICPYLAFETARHWWSNPKTKM 718 CS CHTTTS- T--HHHHHHHHHHHHHHHHHHHHHCTT--BHHHHCCS--EEE lmskGkl.elsleeavkNYrkAiekGlLKIMSKMGISTlqSYrGAQIFEA lmskG+l++++++ea++NY+kA+e+GlLKI+SKMGIS+l+SY+GAQIFE+ gi|9081913 719 LMSKGRLpACNIQEAQANYKKAVEAGLLKILSKMGISLLSSYHGAQIFEI 768 CS SSB-H vGLsk<-* +GL++ gi|9081913 769 LGLGS 773 Flavodoxin_NdrI: domain 1 of 1, from 488 to 497: score 2.1, E = 4.6 CS -HHHHHHHHH *->TneDVerVrk<-* TneDVe V + gi|9081913 488 TNEDVELVIE 497 P22_AR_N: domain 1 of 1, from 524 to 541: score -0.2, E = 9.5 *->dVLydYWtrkGkAv..NPR<-* ++LydY+ + +A +NP+ gi|9081913 524 HILYDYFK-QRFAQvtNPA 541 Cache_1: domain 1 of 1, from 537 to 557: score 7.0, E = 0.14 *->wTePYvdaalktgdlViTiaqPv<-* +T+P++d + +++lV ++a+++ 
gi|9081913 537 VTNPAIDPL--RESLVMSLAIQI 557 Glu_synthase: domain 1 of 2, from 650 to 676: score 1.3, E = 3 CS --HHHHHHHHHHHHHCTT-CCCSEEEE *->lPwelgLaevhqtLvengLRdrVsLia<-* +P l++ +vh L++ gLR + s+ + gi|9081913 650 IPPLLAVGAVHHHLINKGLRQEASILV 676 HdeA: domain 1 of 1, from 727 to 749: score 9.6, E = 0.015 *->ACk.QdkkAsFkdKvkaEldKvk<-* AC Q+ +A++k+ v+a l K+ gi|9081913 727 ACNiQEAQANYKKAVEAGLLKIL 749 Sel1: domain 1 of 1, from 729 to 745: score 2.5, E = 7 CS .HHH.HHHHHHHHHHTT- *->DyekeAlkwyekAAeqGn<-* ++++ A + y+kA e+G gi|9081913 729 NIQE-AQANYKKAVEAGL 745 DUF1981: domain 1 of 1, from 765 to 787: score 3.6, E = 3.3 *->iFgvltlaakeesesivklAfqiid.qi<-* iF++l+l++ v+lAf+ +++qi gi|9081913 765 IFEILGLGSEV-----VNLAFKGTTsQI 787 tRNA_anti: domain 1 of 1, from 818 to 839: score 4.9, E = 2 CS EEEEEEETTSSTSTCTCTT..EEEEEEEEEEE *->tGkvkkrpggeqNnlkTGeKAlelvveeievl<-* +G v+ rpgge ++++ +e+ gi|9081913 818 YGFVQYRPGGE----------YHINNPEMSKA 839 Cystatin: domain 1 of 1, from 826 to 859: score 2.4, E = 3.9 CS ECEEEEET.STSHHHHHHHHHHHHHHHHHSSSSEEEEE *->GglspvdpNendpevqealdfAlakyNeksndnylfel<-* Gg +++ pe +al+ A+ yN + +ny++ l gi|9081913 826 GGEYHINN----PEMSKALHQAVRGYNPEYYNNYQSLL 859 RNase_PH_C: domain 1 of 1, from 827 to 846: score 4.2, E = 2.3 CS SSSS.B.HHHHHHHHHHHHHH *->GkgnglteelleealelAkeg<-* G +++++ +++ +al++A+ g gi|9081913 827 G-EYHINNPEMSKALHQAVRG 846 Glu_synthase: domain 2 of 2, from 830 to 1216: score 857.3, E = 9e-255 CS -SS-HHHHHHHHHHHHC--T-HHHHHHHHHHHHTS.-S-SGGGGEEE *->hrnepeviktlqkavqvpveskpsydkYreplnertpigalrdlLef h n+pe++k l++av+ + y +Y+ +l +r p++alrdlL++ gi|9081913 830 HINNPEMSKALHQAVRG--YNPEYYNNYQSLLQNR-PPTALRDLLKL 873 CS --SS--......--GGGS--HHHHHTTEEEEEB-CTTC-HHHHHHHHHHH kyaeepldtdkiipieevepaleikkrfctgaMSyGALSeeAheALAiAm ++++p i+i+eve+++ i + fctg+MS+GALS+e+he+LAiAm gi|9081913 874 QSNRAP------ISIDEVESIEDILQKFCTGGMSLGALSRETHETLAIAM 917 CS HHCT-EEEETTT---GGGCSB-TTS-T S BTTSTT--S--TT-B---SE nriGtksNtGEGGedperlkpaadlds.G.SpTlpHLkGLqnednarSAI nriG+ksN+GEGGedp r+k + d++s+G+Sp 
lpHLkGL+n+d+a+SAI gi|9081913 918 NRIGGKSNSGEGGEDPVRFKILNDVNSsGtSPLLPHLKGLKNGDTASSAI 967 CS EEE-TT-TT--............HHHHCC-SEEEEE---TTSTTT--EE- kQvASGRFGVtkRnGefWeefkRseYLvnAdalEIKiAQGAKPGeGGhLP kQ+ASGRFGVt +eYL+nA++lEIKiAQGAKPGeGG+LP gi|9081913 968 KQIASGRFGVT------------PEYLMNAKQLEIKIAQGAKPGEGGQLP 1005 CS GGG--HHHHHHHTS-TT--EE--SS-TT-SSHHHHHHHHHHHHHH-.TTS GeKVspeIAriRnstPGvgliSPpPHHDIysiEDLaqLIydLkeindpkA G+K+sp+IA +R ++PGv liSPpPHHDIysiEDL+qLI+dL++in pkA gi|9081913 1006 GKKISPYIATLRKCKPGVPLISPPPHHDIYSIEDLSQLIFDLHQIN-PKA 1054 CS EEEEEEE-STTHHHHHHH...HHHTT-SEEEEE-TT---SSEECCHHHHC pisVKLVsehgvgtiaaGhmqvakAnADiIlIdGhdGGTGASpktsikha +isVKLVse g+gtiaaG vak+nADiI+I+GhdGGTGASp++sikha gi|9081913 1055 KISVKLVSEIGIGTIAAG---VAKGNADIIQISGHDGGTGASPLSSIKHA 1101 CS ---HHHHHHHHHHHHHCTT-CCCSEEEEESS--SHHHHHHHHHCT-SEEE GlPwelgLaevhqtLvengLRdrVsLiadGGLrTGaDVakAaaLGAdavg G PwelgL+evhq+L en+LRdrV+L++dGGLrTG D+++Aa++GA+++g gi|9081913 1102 GSPWELGLSEVHQLLAENQLRDRVTLRVDGGLRTGSDIVLAAIMGAEEFG 1151 CS -SHHHHHHCT--S---CCCT--TTSSS---CCHH..CT----HHHHHHHH iGTaaLiAlGCimaRvCHtntCPvGvATQDPeLrKrlkfegaperVvNyf +GT+a+iA+GCimaR+CHtn+CPvGvATQ++eLr +f g+pe +vN+f gi|9081913 1152 FGTVAMIATGCIMARICHTNKCPVGVATQREELR--ARFSGVPEALVNFF 1199 CS HHHHHHHHHHHHHHT-S iflaeEvrellaqlGfr<-* +f+ Evre+la+lG++ gi|9081913 1200 LFIGNEVREILASLGYK 1216 DUF258: domain 1 of 1, from 839 to 860: score 0.3, E = 8.3 CS HHHHHHHCTSS-HHHHHHHHHHHH *->AVkaAveeGeIseeRYesYlklle<-* A+ +Av +++e Y++Y+ ll+ gi|9081913 839 ALHQAVR--GYNPEYYNNYQSLLQ 860 Pencillinase_R: domain 1 of 1, from 856 to 894: score 3.9, E = 2.5 CS XXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX *->drlfggsvgalvanfleee....klSeddieeLrelLde<-* + l++++++ ++ ++l+ ++++ ++S d++e ++++L++ gi|9081913 856 QSLLQNRPPTALRDLLKLQsnraPISIDEVESIEDILQK 894 SelT: domain 1 of 1, from 872 to 885: score 3.1, E = 2.2 *->KLqtGrvYAPPtpqEL<-* KLq++r P++++E+ gi|9081913 872 KLQSNRA--PISIDEV 885 Nitro_FeMo-Co: domain 1 of 1, from 879 to 897: score 2.1, E = 5.3 CS EEE-TTSSBHHHHHHHHHC *->pikagegetieeaiealqe<-* pi e 
e+ie+ + ++ + gi|9081913 879 PISIDEVESIEDILQKFCT 897 DUF37: domain 1 of 1, from 927 to 934: score 3.0, E = 4.5 *->hpGGyDPV<-* ++GG DPV gi|9081913 927 GEGGEDPV 934 Scm3: domain 1 of 1, from 953 to 963: score 2.2, E = 3.5 *->HLraLeteddi<-* HL++L+++d++ gi|9081913 953 HLKGLKNGDTA 963 cobW: domain 1 of 1, from 1038 to 1058: score 5.1, E = 0.45 CS ...HHHHHHHHHH-SSS-EEE *->adlekleadlrrlnpeapiip<-* +dl++l+ dl+++np+a+i gi|9081913 1038 EDLSQLIFDLHQINPKAKISV 1058 Arch_flagellin: domain 1 of 1, from 1050 to 1072: score 4.1, E = 0.66 *->inpstkvrgeVvpenGapgtief<-* inp k+++++v+e+G+ ++ gi|9081913 1050 INPKAKISVKLVSEIGIGTIAAG 1072 DUF1393: domain 1 of 1, from 1055 to 1068: score 3.1, E = 2 *->klSvKtVVAiGIGA<-* k+SvK V iGIG+ gi|9081913 1055 KISVKLVSEIGIGT 1068 FtsK_SpoIIIE: domain 1 of 1, from 1107 to 1143: score 2.6, E = 3.1 *->lviDnydeLaeenlL.ervtsLknqGlsygvhvmata<-* l++ + ++L +en+L++rvt+ + +Gl +g +++++a gi|9081913 1107 LGLSEVHQLLAENQLrDRVTLRVDGGLRTGSDIVLAA 1143 FMN_dh: domain 1 of 1, from 1109 to 1148: score 3.2, E = 0.89 CS HHHHHHHHHCHHTTTSSEEEEESS-SSHHHHHHHHHHTSS *->LpeVvPIlkeaAvkgdieVllDgGvRRGtDVlKALALGAr<-* L eV +l e + +++ +DgG R+G+D++ A +GA+ gi|9081913 1109 LSEVHQLLAENQLRDRVTLRVDGGLRTGSDIVLAAIMGAE 1148 DSRB: domain 1 of 1, from 1120 to 1134: score 2.7, E = 2.7 *->mKvndrvtvKtDGgpR<-* ++ drvt + DGg R gi|9081913 1120 -QLRDRVTLRVDGGLR 1134 Phage_Mu_P: domain 1 of 1, from 1122 to 1131: score -0.4, E = 10 *->sntVtLrvgG<-* ++VtLrv+G gi|9081913 1122 RDRVTLRVDG 1131 Hormone_4: domain 1 of 1, from 1168 to 1176: score 4.4, E = 2.5 CS X-TT--TT- *->CyirnCPrG<-* C + CP+G gi|9081913 1168 CHTNKCPVG 1176 GDC-P: domain 1 of 1, from 1205 to 1225: score 7.1, E = 0.086 *->eqqeMLstiGlssLddLidat<-* e++e+L+++G++sLdd ++++ gi|9081913 1205 EVREILASLGYKSLDDITGQN 1225 PspB: domain 1 of 1, from 1268 to 1276: score 0.4, E = 8.4 *->MsaffLagP<-* M+ ++La+P gi|9081913 1268 MDDDILAIP 1276 T5orf172: domain 1 of 1, from 1271 to 1293: score 2.0, E = 6.1 *->dvvalievedaraklEklLHkrFk<-* d+ a+ ev++a klE+++ k+Fk 
gi|9081913 1271 DILAIPEVSNAI-KLETEITKHFK 1293 CAP_C: domain 1 of 1, from 1283 to 1292: score 1.3, E = 7.4 CS EEEEEE---- *->KLvTevveha<-* KL+Te++ h gi|9081913 1283 KLETEITKHF 1292 GXGXG: domain 1 of 1, from 1290 to 1485: score 367.3, E = 2.7e-107 CS EEEEE-TT--STTHHHHHHHHHHCTTTS.S-TTCEEEEEEEEE-TTT *->keeaiiNtdrlvgtrlsgeiakkygeegalpkdtgkivfnGsAGqsf k+++i Nt+r+vgtrlsg iak yg+ g + k+ +k++f+GsAGqsf gi|9081913 1290 KHFKIANTNRTVGTRLSGIIAKNYGNTG-F-KGLIKLNFYGSAGQSF 1334 CS TTT-BTTEEEEEEEEE-S.TTTTT-ECCEEEEE--TT-.......SS-GG GafmagGvtLeleGdAnddyvGkgmsGGeIvikgnagdpvGnnMdageyv Gaf+a+G++L l+G+And yvGkgm+GG+Ivi+++ag +e + gi|9081913 1335 GAFLASGINLKLMGEAND-YVGKGMNGGSIVIVPPAGT-------IYEDN 1376 CS GSEEC-SSTTTT--CEEEEESSEE-TTTTTT-.....CCEEEEESEB.-S gnviaGNtclyGatGGkifiaGdAGerfgvrnkayKdsgatiVveGvaGd ++vi+GNtclyGatGG++f++G+AGerf+vrn s a+ VveGv Gd gi|9081913 1377 NQVIIGNTCLYGATGGYLFAQGQAGERFAVRN-----SLAESVVEGV-GD 1420 CS STTTT-EEEEEEESS-B-SSBTTT--CCEEEEE-TTS.......THHHHB hggEYMtGGtivVlGdaGrnvGagMtGGiaYvlgeiedfsyMiatlpgkv h++EYMtGG+ivVlG+aGrnvGagMtGG+aY+l+e+e + ++v gi|9081913 1421 HACEYMTGGVIVVLGKAGRNVGAGMTGGLAYFLDEDE-------NFIDRV 1463 CS -CCCEEEE...ES-S......CCHHHHHHHH nleiVeledlkrievkrkklLpegekqlkel<-* n+eiV+ + r+ + ++ge+qlk+l gi|9081913 1464 NSEIVKIQ---RVIT------KAGEEQLKNL 1485 DUF1514: domain 1 of 1, from 1453 to 1469: score 3.5, E = 5.7 *->LeeyrieveRikkevkk<-* L e+++ ++R++ e+ k gi|9081913 1453 LDEDENFIDRVNSEIVK 1469 Colicin: domain 1 of 1, from 1456 to 1467: score 1.4, E = 7.5 CS SHHHHHHHHHCH *->DdkfveklNkli<-* D++f++ +N +i gi|9081913 1456 DENFIDRVNSEI 1467 Ribosomal_S6: domain 1 of 1, from 1461 to 1481: score 3.3, E = 3.7 CS CCHHHHHHHHHHHHHCTT-EE *->EqvkqeiekYqkvLtnngAei<-* ++v++ei k+q+v+t++g+e+ gi|9081913 1461 DRVNSEIVKIQRVITKAGEEQ 1481 BicD: domain 1 of 1, from 1465 to 1481: score -1.6, E = 6.8 *->gqaysnqrkvAkdGeer<-* + +++qr+ +k Gee+ gi|9081913 1465 SEIVKIQRVITKAGEEQ 1481 PUF: domain 1 of 1, from 1470 to 1486: score 6.5, E = 0.47 *->lQkllevateeqkqlil<-* +Q+++++a+eeq ++++ 
gi|9081913 1470 IQRVITKAGEEQLKNLI 1486 DUF477: domain 1 of 1, from 1472 to 1495: score 3.8, E = 1.7 *->gtLspserarLeqalaalEqktga<-* ++++++ ++L ++ ++ktg+ gi|9081913 1472 RVITKAGEEQLKNLIENHAAKTGS 1495 Phage_prot_Gp6: domain 1 of 1, from 1479 to 1492: score 1.0, E = 4 *->eEmikkFidkHklr<-* eE +k++i+ H+++ gi|9081913 1479 EEQLKNLIENHAAK 1492 IBN_N: domain 1 of 1, from 1498 to 1516: score 8.2, E = 0.17 CS HHHHHHHHHCCTHHCHHHHH *->AEkqLeqlekqklPgfllaL<-* A++ Le+++++ lP+f++ + gi|9081913 1498 AHTILEKWNSY-LPQFWQVV 1516 GspM: domain 1 of 1, from 1506 to 1520: score 1.0, E = 8.6 CS XXXXXXXXXXXXXXX *->mneLqawWqgrspRE<-* ++ L ++Wq ++p+E gi|9081913 1506 NSYLPQFWQVVPPSE 1520 // From etal at uga.edu Tue Oct 30 13:21:25 2012 From: etal at uga.edu (Eric Talevich) Date: Tue, 30 Oct 2012 13:21:25 -0400 Subject: [Biopython-dev] Fwd: Pull Request: MafIO.py In-Reply-To: <5aa8ce85a0ec41b5b817cdc5105bfdfb@BLUPRD0210HT004.namprd02.prod.outlook.com> References: <5aa8ce85a0ec41b5b817cdc5105bfdfb@BLUPRD0210HT004.namprd02.prod.outlook.com> Message-ID: ---------- Forwarded message ---------- From: Nick Loman Date: Tue, Oct 30, 2012 at 6:34 AM Subject: Pull Request: MafIO.py Hi there Thanks for the MafIO branch. 
In order to get it to read MAF files produced by Mugsy (mugsy.sourceforge.net) I had to make the following change:

diff --git a/Bio/AlignIO/MafIO.py b/Bio/AlignIO/MafIO.py
index 6eda0ca..4bb1407 100644
--- a/Bio/AlignIO/MafIO.py
+++ b/Bio/AlignIO/MafIO.py
@@ -178,7 +178,7 @@ def MafIterator(handle, seq_count = None, alphabet = single_letter_alphabet):

             annotations = dict([x.split("=") for x in line.strip().split()[1:]])

-            if len([x for x in annotations.keys() if x not in ("score", "pass")]) > 0:
+            if len([x for x in annotations.keys() if x not in ("score", "pass", "label", "mult")]) > 0:
                 raise ValueError("Error parsing alignment - invalid key in 'a' line")
         elif line.startswith("#"):
             # ignore comments

My Python fork is a bit confusing right now so hope you don't mind me sending this pull request via email!

Cheers

Nick

From w.arindrarto at gmail.com Tue Oct 30 20:09:41 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 31 Oct 2012 01:09:41 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <508FE182.3040202@biotech.uni-tuebingen.de>
References: <508EEA85.6060906@biotech.uni-tuebingen.de> <508FE182.3040202@biotech.uni-tuebingen.de>
Message-ID:

Hi Kai,

> one more thing:
>
> Hmmer2 has the concept of an accession number in the result. Is there
> an attribute for that in the QueryResult object that I'm missing or do
> we want a new attribute for that. Would "accession" be a good name?
>
> Cheers,
> Kai

I've used '.acc' for accession number properties in the current HMMER3 and BLAST parsers, but this choice was arbitrary. '.accession' is a good name. I didn't use it because I like shorter names better, but then again it may be unclear at times.

Does anyone have a preference between '.acc' and '.accession'? If not, I can change the current '.acc' into '.accession'.

cheers,
Bow

From w.arindrarto at gmail.com Tue Oct 30 20:19:30 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 31 Oct 2012 01:19:30 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <508FF84A.2020802@biotech.uni-tuebingen.de>
References: <508EEA85.6060906@biotech.uni-tuebingen.de> <508F834C.6010404@biotech.uni-tuebingen.de> <508FF84A.2020802@biotech.uni-tuebingen.de>
Message-ID:

Hi Kai,

> I've just stumbled over a case where not being able to pre-create Hit
> objects really bites me.
>
> See the attached hmmpfam output. You'll notice that the domain table
> is not in the order of the hit table. As I'd like to preserve the
> order of the hit table, the current setup of the API forces me to
> either repeatedly parse the domain annotations until I find the
> correct domain annotations for my hit, or to create the hits in the
> order of the domain annotation table and then reshuffle them to make
> sure they're in the order of the hit table.
>
> If I could just create "empty" hit objects when parsing the hit table,
> I could easily preserve the order of the hits but still add the hsps
> as I parse them.

Hmm.. This is a problem :/. I didn't expect any format to have this kind of ordering. I'll see what I can do with the current API limitation. We may need to change it back to not requiring any HSPs for Hit.

In any case, I'll see what needs to be done first and get back asap.

cheers,
Bow

From mjldehoon at yahoo.com Tue Oct 30 21:12:18 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 30 Oct 2012 18:12:18 -0700 (PDT)
Subject: [Biopython-dev] Working with the new SearchIO API
Message-ID: <1351645938.62302.BPMail_high_noncarrier@web164001.mail.gq1.yahoo.com>

>Does anyone have preference between '.acc' or '.accession'? If not, I
>can change the current '.acc' into '.accession'.

I would prefer .accession for clarity.

Best,
-Michiel

From andrewscz at gmail.com Wed Oct 31 14:10:48 2012
From: andrewscz at gmail.com (Andrew Sczesnak)
Date: Wed, 31 Oct 2012 11:10:48 -0700
Subject: [Biopython-dev] Pull Request: MafIO.py
In-Reply-To:
References:
Message-ID: <01027F16-EBA0-41A2-B1F5-D0E128B0B08E@gmail.com>

Nick,

Can you provide a snippet of a file from mugsy for the unit tests?

Thanks,
Andrew

On Oct 31, 2012, at 9:00 AM, biopython-dev-request at lists.open-bio.org wrote:

> From: Nick Loman
> Date: Tue, Oct 30, 2012 at 6:34 AM
> Subject: Pull Request: MafIO.py
>
>
> Hi there
>
> Thanks for the MafIO branch. In order to get it to read MAF files produced
> by Mugsy (mugsy.sourceforge.net) I had to make the following change:
>
> diff --git a/Bio/AlignIO/MafIO.py b/Bio/AlignIO/MafIO.py
> index 6eda0ca..4bb1407 100644
> --- a/Bio/AlignIO/MafIO.py
> +++ b/Bio/AlignIO/MafIO.py
> @@ -178,7 +178,7 @@ def MafIterator(handle, seq_count = None, alphabet =
> single_letter_alphabet):
>
> annotations = dict([x.split("=") for x in
> line.strip().split()[1:]])
>
> - if len([x for x in annotations.keys() if x not in ("score",
> "pass")]) > 0:
> + if len([x for x in annotations.keys() if x not in ("score",
> "pass", "label", "mult")]) > 0:
> raise ValueError("Error parsing alignment - invalid key in
> 'a' line")
> elif line.startswith("#"):
> # ignore comments
>
>
> My Python fork is a bit confusing right now so hope you don't mind me
> sending this pull request via email!
>
> Cheers
>
> Nick

From redmine at redmine.open-bio.org Wed Oct 31 15:09:57 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 31 Oct 2012 19:09:57 +0000
Subject: [Biopython-dev] [Biopython - Bug #3297] newline added in quated features
References:
Message-ID:

Issue #3297 has been updated by Chris Fields.

Assignee changed from Bioperl Guts to Biopython Dev Mailing List

Changing default assignee.
----------------------------------------
Bug #3297: newline added in quated features
https://redmine.open-bio.org/issues/3297

Author: Jesse van Dam
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category:
Target version:
URL:

Note: sorry for the duplicate report, I did not notice the makeup of the bug reporting system.

When I have a feature qualifier line like this in a GenBank file (it spans multiple lines):
                     /product="Glutamate synthase [NADPH] small chain (EC 1.4.1
                     .13)"

Then a space/newline is added between 1.4.1 and .13 in the parsed result, so printing the qualifier with the following code
  print(source[0].qualifiers["product"])
gives this output (with an unwanted space):
Glutamate synthase [NADPH] small chain (EC 1.4.1 .13)
I changed the following in scanner.py to fix this problem:
                    elif value[0]=='"':
                        #Quoted...
                        if value[-1]!='"' or value!='"':
                            #No closing quote on the first line...
                            while value[-1] != '"':
-                               value += "\n" + iterator.next() 
+                               value += iterator.next() 
                        else:
                            #One single line (quoted)
                            assert value == '"'
                            if self.debug : print "Quoted line %s:%s" % (key, value)
                        #DO NOT remove the quotes...
                        qualifiers.append((key,value))
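
The change above can be illustrated outside the parser with a minimal sketch. The helper name join_quoted_value and its separator argument are hypothetical and are not part of Bio.GenBank; the sketch only mirrors the loop quoted in the report, assuming the continuation lines have already been stripped of their leading whitespace.

```python
def join_quoted_value(first_line, remaining_lines, separator=""):
    """Collect a quoted qualifier value that may span several lines.

    Hypothetical helper for illustration only, not a Biopython API.
    `separator` is what gets placed between wrapped lines: "" reproduces
    the proposed fix, "\n" reproduces the behaviour the report complains about.
    """
    value = first_line
    lines = iter(remaining_lines)
    # A lone '"' is an opening quote with no content yet, so keep reading;
    # otherwise read until the accumulated value ends with a closing quote.
    while value == '"' or not value.endswith('"'):
        value += separator + next(lines)
    return value


first = '"Glutamate synthase [NADPH] small chain (EC 1.4.1'
rest = ['.13)"']

print(join_quoted_value(first, rest))
# "Glutamate synthase [NADPH] small chain (EC 1.4.1.13)"
print(repr(join_quoted_value(first, rest, separator="\n")))
```

Note that joining with an empty string only suits wraps that fall inside a token, as in the EC number here. When a line was wrapped at a space between two words, the same join would glue those words together, so a real fix probably needs to distinguish the two cases.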

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org

From b.invergo at gmail.com Mon Oct 1 09:52:04 2012
From: b.invergo at gmail.com (Brandon Invergo)
Date: Mon, 01 Oct 2012 11:52:04 +0200
Subject: [Biopython-dev] PAML test problems under Python 3.3.0
In-Reply-To:
References:
Message-ID: <87k3vazfi3.fsf@invergo.net>

Yes no problem, I can take a look at it. I'm completely swamped at the moment, though, so I might have to put it off for a couple of days. If it's an emergency, let me know.

-brandon

Peter Cock writes:

> Hi Brandon (et al),
>
> Could you have a look at the PAML unit tests under Python 3.3 please?
> I see a mix of failures and 'blocking' under a self-compiled Python 3.3.0
> on Mac OS X 10.8 (Mountain Lion):
>
> $ python3 test_PAML_yn00.py
> testAlignmentExists (__main__.ModTest) ... ok
> testAlignmentFileIsValid (__main__.ModTest) ... FAIL
> testAlignmentSpecified (__main__.ModTest) ... ok
> testCtlFileExistsOnRead (__main__.ModTest) ... ok
> testCtlFileExistsOnRun (__main__.ModTest) ... ok
> testCtlFileValidOnRead (__main__.ModTest) ... ERROR
> testCtlFileValidOnRun (__main__.ModTest) ... ok
> testOptionExists (__main__.ModTest) ... ok
> testOutputFileSpecified (__main__.ModTest) ... ok
> testOutputFileValid (__main__.ModTest) ... ok
> testParseAllVersions (__main__.ModTest) ... ok
> testResultsExist (__main__.ModTest) ... ok
> testResultsParsable (__main__.ModTest) ... ok
> testResultsValid (__main__.ModTest) ... ^C
>
> $ python3 test_PAML_codeml.py
> testAlignmentExists (__main__.ModTest) ... ok
> testAlignmentFileIsValid (__main__.ModTest) ... FAIL
> testAlignmentSpecified (__main__.ModTest) ... ok
> testCtlFileExistsOnRead (__main__.ModTest) ... ok
> testCtlFileExistsOnRun (__main__.ModTest) ... ok
> testCtlFileValidOnRead (__main__.ModTest) ... ERROR
> testCtlFileValidOnRun (__main__.ModTest) ... ok
> testOptionExists (__main__.ModTest) ... ok
> testOutputFileSpecified (__main__.ModTest) ... ok
> testOutputFileValid (__main__.ModTest) ... ok
> testPamlErrorsCaught (__main__.ModTest) ... ok
> testParseAA (__main__.ModTest) ... ok
> testParseAAPairwise (__main__.ModTest) ... ok
> testParseAllNSsites (__main__.ModTest) ... ok
> testParseBranchSiteA (__main__.ModTest) ... ok
> testParseCladeModelC (__main__.ModTest) ... ok
> testParseFreeRatio (__main__.ModTest) ... ok
> testParseNSsite3 (__main__.ModTest) ... ok
> testParseNgene2Mgene02 (__main__.ModTest) ... ok
> testParseNgene2Mgene1 (__main__.ModTest) ... ok
> testParseNgene2Mgene34 (__main__.ModTest) ... ok
> testParsePairwise (__main__.ModTest) ... ok
> testParseSEs (__main__.ModTest) ... ok
> testResultsExist (__main__.ModTest) ... ok
> testResultsParsable (__main__.ModTest) ... ok
> testResultsValid (__main__.ModTest) ... ^C
>
> $ python3 test_PAML_baseml.py
> testAlignmentExists (__main__.ModTest) ... ok
> testAlignmentFileIsValid (__main__.ModTest) ... FAIL
> testAlignmentSpecified (__main__.ModTest) ... ok
> testCtlFileExistsOnRead (__main__.ModTest) ... ok
> testCtlFileExistsOnRun (__main__.ModTest) ... ok
> testCtlFileValidOnRead (__main__.ModTest) ... ERROR
> testCtlFileValidOnRun (__main__.ModTest) ... ok
> testOptionExists (__main__.ModTest) ... ok
> testOutputFileSpecified (__main__.ModTest) ... ok
> testOutputFileValid (__main__.ModTest) ... ok
> testPamlErrorsCaught (__main__.ModTest) ... ok
> testParseAllVersions (__main__.ModTest) ... ok
> testParseAlpha1Rho1 (__main__.ModTest) ... ok
> testParseModel (__main__.ModTest) ... ok
> testParseNhomo (__main__.ModTest) ... ok
> testParseSEs (__main__.ModTest) ... ok
> testResultsExist (__main__.ModTest) ... ok
> testResultsParsable (__main__.ModTest) ... ok
> testResultsValid (__main__.ModTest) ... ^C
>
> If you've not tried this before, the procedure I'm using is:
>
> $ python3 setup.py build
> $ cd build/py3.3/Tests
> $ python3 test_PAML_baseml.py
> etc
>
> The key point is that to run the tests directly (rather than
> just via 'python3 setup.py test') you must change
> directory to the 2to3 converted folder under the build
> folder.
>
> By commenting out the test methods which seem to be
> blocking, it seems some of the failures are to do with
> exception handling. I've not dug any further into this.
>
> Thanks,
>
> Peter

From bjoern at gruenings.eu Mon Oct 1 21:44:10 2012
From: bjoern at gruenings.eu (Björn Grüning)
Date: Mon, 01 Oct 2012 23:44:10 +0200
Subject: [Biopython-dev] [Patch] Genbank Parser
In-Reply-To:
References: <1348837402.21455.1.camel@threonin>
Message-ID: <1349127850.19730.11.camel@threonin>

Hi Peter,

> >
> > the tbl2asn tool from the ncbi creates genbank files that did not have a
> > version number. Unfortunately that version number is used to fill
> > consumer.data.id.
> > I implemented the following fall-back:
> > If there is no version information available than it takes the
> > consumer.data.name for the consumer.data.id. Does that makes sense?
> >
> > Thanks!
> > Bjoern
>
> Can you share some example output from tbl2asn that shows
> this problem? Ideally something small we could include as a
> unit test.

Please find attached a small, stripped version of such a GenBank file.

Thanks,
Bjoern

> Thanks,
>
> Peter

-------------- next part --------------
A non-text attachment was scrubbed...
Name: tbl1asn_output.gb
Type: application/x-gameboy-rom
Size: 5090 bytes
Desc:
URL:

From p.j.a.cock at googlemail.com Thu Oct 4 09:11:01 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 4 Oct 2012 10:11:01 +0100
Subject: [Biopython-dev] [Patch] Genbank Parser
In-Reply-To: <1349127850.19730.11.camel@threonin>
References: <1348837402.21455.1.camel@threonin> <1349127850.19730.11.camel@threonin>
Message-ID:

On Mon, Oct 1, 2012 at 10:44 PM, Björn Grüning wrote:
> Hi Peter,
>
>> >
>> > the tbl2asn tool from the ncbi creates genbank files that did not have a
>> > version number. Unfortunately that version number is used to fill
>> > consumer.data.id.
>> > I implemented the following fall-back:
>> > If there is no version information available than it takes the
>> > consumer.data.name for the consumer.data.id. Does that makes sense?
>> >
>> > Thanks!
>> > Bjoern
>>
>> Can you share some example output from tbl2asn that shows
>> this problem? Ideally something small we could include as a
>> unit test.
>
> please find attached a small, stripped version of such an genbank file.
>
> Thanks,
> Bjoern

$ python
Python 2.7.2 (default, Jun 20 2012, 16:23:33)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import SeqIO
>>> r = SeqIO.read("tbl1asn_output.gb", "gb")
/Library/Python/2.7/site-packages/Bio/GenBank/__init__.py:1158: BiopythonParserWarning: Expected sequence length 300246, found 2220 ().
  BiopythonParserWarning)
>>> r.id
''
>>> r.name
'Seq1'
>>> r.description
'Glarea strain lozoyensis.'
>>> quit()

That warning is because this test file has only the start of the sequence present, yet the LOCUS line still gives the original length.

$ head tbl1asn_output.gb
LOCUS       Seq1  300246 bp  DNA  linear  10-MAY-2012
DEFINITION  Glarea strain lozoyensis.
ACCESSION
VERSION
KEYWORDS    .
SOURCE      Glarea
  ORGANISM  Glarea
            Unclassified.
REFERENCE   1
  AUTHORS   Test

I didn't use your patch - looking over the code, it was already intended that if there was no record.id then record.name would be used. Sadly this was a bit too strict about None versus an empty string, fixed:

https://github.com/biopython/biopython/commit/e67d22e4b4f344a5a3c15b6e939c82f58986d87f

Thanks for your help,

Peter

From chapmanb at 50mail.com Fri Oct 5 01:02:06 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 04 Oct 2012 21:02:06 -0400
Subject: [Biopython-dev] TAIR/AGI support
In-Reply-To:
References: <87txvcx9ls.fsf@fastmail.fm>
Message-ID: <874nm9g29d.fsf@fastmail.fm>

Kevin;
Thanks for making this available. This looks like a great start and seems like it would be a nice starting place for folks dealing with Arabidopsis data. A couple of thoughts which you've essentially already covered:

- Could you build up a small test suite that fits into the testing framework:

  http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc246

  You're probably the best person to pick some disparate IDs that exercise different components and try to catch any edge cases.

- Additional interfaces that help folks do more than get sequence are a great idea. The ideas you've proposed below sound perfect.

- Provide some documentation on the Cookbook for common use cases with Biopython + your module. This will help motivate the addition and also help folks test it out on their data.

Thanks again for making this available,
Brad

> Hi Brad,
>
> My TAIR/AGI script is on github here:
> https://github.com/kdmurray91/biopython/blob/master/Bio/TAIR/__init__.py
>
> I got it to work directly from TAIR's website, however it has not been
> rigorously tested. I plan on implementing the process as i described in my
> previous email, whereby it fetches the Genbank record from TOGOws or via
> NCBI's Efetch (using biopython's interfaces of course). I will keep you all
> posted.
>
> To the list in general, I'm open to suggestions on what to work on next?
> > > Regards > Kevin Murray > > > On 6 September 2012 10:45, Brad Chapman wrote: > >> >> Kevin; >> Thanks for the e-mail and offers of code. Always happy to have other >> folks involved with the project. >> >> > What's the status of TAIR AGIs in BioPython (I can see no mention of >> them, >> > or support for them)? I've written a brief module which allows a user to >> > query NCBI with a TAIR AGI, returning a Seq object (via Efetch). Is there >> > any interest in including such functionality in BioPython? >> >> Is the code available on GitHub to get a better sense of all the >> functionality it supports? Do you have an idea where it would fit best? >> As a tair submodule inside of Bio.Entrez, or somewhere else? >> >> > More generally, are there any particular areas of BioPython development >> > which could use an extra pair of hands? >> >> Following the mailing list for discussions on current projects is the >> best way to get a sense of what different folks are working on. The >> issue tracker also has open issues and features that could use attention >> if anything there strikes your fancy: >> >> https://redmine.open-bio.org/projects/biopython >> >> Hope this helps, >> Brad >> >> From tiagoantao at gmail.com Sat Oct 6 03:21:50 2012 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Fri, 5 Oct 2012 20:21:50 -0700 Subject: [Biopython-dev] Away Re: buildbot failure in Biopython on Windows XP - Python 2.5 Message-ID: I am currently away from office. I will respond back on as soon as I retunr. Regards, Tiago -- "Liberty for wolves is death to the lambs" - Isaiah Berlin From chris.mit7 at gmail.com Mon Oct 8 02:48:20 2012 From: chris.mit7 at gmail.com (Chris Mitchell) Date: Sun, 7 Oct 2012 22:48:20 -0400 Subject: [Biopython-dev] Proteomics/Mass Spec in Biopython Message-ID: Hi everyone, I recall some time ago there was an email about getting some mass spec functionality within BioPython. 
I started a BioPython branch to incorporate some iterators for common
file types. Of note, there is an iterator for .msf files created by
Proteome Discoverer, which thankfully is light-years faster than using PD
(and much more forgiving on memory...). It's located here:

https://github.com/chrismit/biopython/tree/Proteomics

It's following along the progression of my spectra viewer, which is hosted
on the same repository (anyone using Linux might want to take a look; I
couldn't find a spectra viewer I liked for Linux). As I generalize more of
the methods within that program I'll be adding them to the BioPython
branch. Also, I'll be putting in some methods to take care of other common
tasks such as FDR calculation from the input files. I'd love to hear if
anyone else wants to join up on this branch or provide suggestions.

Chris

From redmine at redmine.open-bio.org Wed Oct 10 13:02:23 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 10 Oct 2012 13:02:23 +0000
Subject: [Biopython-dev] [Biopython - Bug #3386] (New) NewickIO parse_tree is slow
Message-ID:

Issue #3386 has been reported by Aleksey Kladov.

----------------------------------------
Bug #3386: NewickIO parse_tree is slow
https://redmine.open-bio.org/issues/3386

Author: Aleksey Kladov
Status: New
Priority: Normal
Assignee:
Category:
Target version:
URL:

In the file NewickIO.py, the method _parse_subtree of class Parser seems to
be inefficient in time and space. In fact, its running time is quadratic
with respect to the size of the input, while it could be linear. The problem
is that each symbol is read many (up to O(len(text))) times, for example here
        for posn in range(1, close_posn):
            if text[posn] == '(':
                plevel += 1
            elif text[posn] == ')':
                plevel -= 1
            elif text[posn] == ',' and plevel == 0:
                subtrees.append(text[prev:posn])
                prev = posn + 1
or here
comment_start = text.find(NODECOMMENT_START)
Also, _parse_subtree relies heavily on slicing and stripping strings, which gives quadratic memory consumption. Here is my dirty patched implementation. It's incomplete in many senses; I wrote it only to prove that parsing can be done faster. For an unrooted binary tree with 15000 leaves it runs in 1 second, compared to 13 seconds for the current implementation.
    def _parse_tree(self, text, rooted):
        """Parses the text representation into a Tree object."""
        # XXX Pass **kwargs along from Parser.parse?
        return Newick.Tree(root=self._parse_subtree_fast(text)[0], rooted=rooted)

    def _parse_subtree_fast(self, text):
        # Recursive descent: returns (clade, unparsed remainder of text).
        # Labels are limited to letters, digits and underscores here.
        label_re = re.compile(r'[A-Za-z0-9_]+')
        children = []
        if text.startswith('('):
            text = text[1:]  # consume '('
            while True:
                child, text = self._parse_subtree_fast(text)
                children.append(child)
                if text.startswith(','):
                    text = text[1:]  # consume ',' and parse the next sibling
                else:
                    text = text[1:]  # consume ')' and leave the loop
                    break
        m = label_re.match(text)
        if m:
            clade = self._parse_tag(m.group())
            text = text[m.end():]
        else:
            clade = Newick.Clade(comment=None)
        clade.clades = children
        return clade, text
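
(A minimal sketch, not part of the original report, of how the same single-pass idea might avoid string slicing altogether. Each `text[1:]` in the patch above still copies the remaining text, whereas tracking an integer position and an explicit stack reads each character once. The `Node` class and the `parse_topology` function are hypothetical stand-ins, not Biopython API, and branch lengths, comments and quoted labels are deliberately ignored.)

```python
import re


class Node(object):
    """Hypothetical stand-in for Newick.Clade: just a name and children."""

    def __init__(self, name=None):
        self.name = name
        self.children = []


_LABEL = re.compile(r'[A-Za-z0-9_]+')


def parse_topology(text):
    """Parse a Newick topology such as '((A,B)AB,C)root;' in one pass."""
    root = Node()
    stack = []      # ancestors of the node currently being filled in
    current = root
    pos = 0
    n = len(text)
    while pos < n:
        ch = text[pos]
        if ch == '(':
            # Open a new level: first child of the current node.
            child = Node()
            current.children.append(child)
            stack.append(current)
            current = child
            pos += 1
        elif ch == ',':
            # Next sibling under the same parent.
            if not stack:
                raise ValueError("Comma outside parentheses at %d" % pos)
            sibling = Node()
            stack[-1].children.append(sibling)
            current = sibling
            pos += 1
        elif ch == ')':
            # Close the level; a following label names the parent.
            if not stack:
                raise ValueError("Unmatched ')' at %d" % pos)
            current = stack.pop()
            pos += 1
        elif ch == ';':
            break
        elif ch.isspace():
            pos += 1
        else:
            # Match a label in place, without slicing the input.
            m = _LABEL.match(text, pos)
            if m is None:
                raise ValueError("Unexpected %r at %d" % (ch, pos))
            current.name = m.group()
            pos = m.end()
    if stack:
        raise ValueError("Unbalanced parentheses")
    return root
```

Using an explicit stack rather than recursion also sidesteps Python's recursion limit on deeply nested trees; whether that matters for real data is a separate question.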
PS. I don't know if someone really needs to parse huge trees with BioPython, but I need this feature for couple of http://rosalind.info problems ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From kjwu at ucsd.edu Wed Oct 10 21:27:19 2012 From: kjwu at ucsd.edu (Kevin Wu) Date: Wed, 10 Oct 2012 14:27:19 -0700 Subject: [Biopython-dev] KEGG API Wrapper Message-ID: Hi, I've written a simple wrapper on top of KEGG's new REST API ( http://www.kegg.jp/kegg/docs/keggapi.html). The main functionality of this module is that can detect some invalid queries based on kegg's defined rules. I've implemented each of the examples given on the api docs as tests as well. Here's a quick example of its usage. The api call to http://rest.kegg.jp/get/hsa:10458+ece:Z5100/aaseq can be done using the wrapper as: KEGG.query("get", ["hsa:10458", "ece:Z5100"], "aaseq") Querying the api works well with the current parsers written for KEGG formats. Let me know if there are issues or if it's useful enough to be merged into Biopython! https://github.com/kevinwuhoo/biopython Thanks! Kevin From mjldehoon at yahoo.com Sat Oct 13 11:38:04 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 13 Oct 2012 04:38:04 -0700 (PDT) Subject: [Biopython-dev] KEGG API Wrapper In-Reply-To: Message-ID: <1350128284.4997.YahooMailClassic@web164003.mail.gq1.yahoo.com> Hi Kevin, It would be great to have better KEGG support in Biopython, so I think that this is useful and could in principle be merged into Biopython. But before we do so, is there some documentation for your code (or even better, for the Bio.KEGG module as a whole)? Then it's easier to see how this module can be used, and to discuss any modifications. 
Thanks for your contribution! -Michiel. --- On Wed, 10/10/12, Kevin Wu wrote: > From: Kevin Wu > Subject: [Biopython-dev] KEGG API Wrapper > To: Biopython-dev at lists.open-bio.org > Date: Wednesday, October 10, 2012, 5:27 PM > Hi, > > I've written a simple wrapper on top of KEGG's new REST API > ( > http://www.kegg.jp/kegg/docs/keggapi.html). The main > functionality of this > module is that can detect some invalid queries based on > kegg's defined > rules. I've implemented each of the examples given on the > api docs as tests > as well. Here's a quick example of its usage. > > The api call to http://rest.kegg.jp/get/hsa:10458+ece:Z5100/aaseq can > be > done using the wrapper as: > KEGG.query("get", ["hsa:10458", "ece:Z5100"], "aaseq") > > Querying the api works well with the current parsers written > for KEGG > formats. Let me know if there are issues or if it's useful > enough to be > merged into Biopython! > > https://github.com/kevinwuhoo/biopython > > Thanks! > Kevin > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From chapmanb at 50mail.com Mon Oct 15 15:02:12 2012 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 15 Oct 2012 11:02:12 -0400 Subject: [Biopython-dev] BOSC/Broad Interoperability Hackathon: potential dates Message-ID: <87ipabeq2z.fsf@fastmail.fm> Hi all; Open Bio regularly organizes hackathon coding sessions in conjunction with the Bioinformatics Open Source Conference. The goal is to get together biologists writing open source code, provide a room and internet, and encourage fun collaborative coding. We've had successful two day Codefests the past three years: http://www.open-bio.org/wiki/Codefest_2012 This year, the Broad Institute kindly offered to host a two day Hackathon in Boston during April. 
We've proposed three sets of dates: April 4-5th, Thursday and Friday before Bio-IT April 7-8th, Sunday and Monday before Bio-IT April 22-23rd, Monday and Tuesday If you have interest in attending, please fill out this Doodle poll to let us know which dates work best: http://doodle.com/aapy694g43e6ya4f If you can find funds for travel and hotel (or are local to Boston), the event is free and everyone is welcome. As we finalize dates, we'll send around additional details. Thanks everyone, Brad From k.d.murray.91 at gmail.com Tue Oct 16 03:49:22 2012 From: k.d.murray.91 at gmail.com (Kevin Murray) Date: Tue, 16 Oct 2012 14:49:22 +1100 Subject: [Biopython-dev] TAIR/AGI support In-Reply-To: <874nm9g29d.fsf@fastmail.fm> References: <87txvcx9ls.fsf@fastmail.fm> <874nm9g29d.fsf@fastmail.fm> Message-ID: Brad, I shall work on this as time permits, and get back to you all when complete. Cheers, Regards Kevin Murray On 5 October 2012 11:02, Brad Chapman wrote: > > Kevin; > Thanks for making this available. This looks like a great start and > seems like it would be a nice starting place for folks dealing with > Arabidopsis data. A couple of thoughts which you've essentially already > covered: > > - Could you build up a small test suite that fits into the testing > framework: > > http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc246 > > Your probably the best person to pick some disparate IDs that exercise > different components and try to catch any edge cases. > > - Additional interfaces that help folks do more than get sequence are a > great idea. The ideas you've proposed below sound perfect. > > - Provide some documentation on the Cookbook for common use cases with > Biopython + your module. This will help motivate the addition and also > help folks test it out on their data. 
> > Thanks again for making this available, > Brad > > > > Hi Brad, > > > > My TAIR/AGI script is on github here: > > https://github.com/kdmurray91/biopython/blob/master/Bio/TAIR/__init__.py > > > > I got it to work directly from TAIR's website, however it has not been > > rigorously tested. I plan on implementing the process as i described in > my > > previous email, whereby it fetches the Genbank record from TOGOws or via > > NCBI's Efetch (using biopython's interfaces of course). I will keep you > all > > posted. > > > > To the list in general, I'm open to suggestions on what to work on next? > > > > > > Regards > > Kevin Murray > > > > > > On 6 September 2012 10:45, Brad Chapman wrote: > > > >> > >> Kevin; > >> Thanks for the e-mail and offers of code. Always happy to have other > >> folks involved with the project. > >> > >> > What's the status of TAIR AGIs in BioPython (I can see no mention of > >> them, > >> > or support for them)? I've written a brief module which allows a user > to > >> > query NCBI with a TAIR AGI, returning a Seq object (via Efetch). Is > there > >> > any interest in including such functionality in BioPython? > >> > >> Is the code available on GitHub to get a better sense of all the > >> functionality it supports? Do you have an idea where it would fit best? > >> As a tair submodule inside of Bio.Entrez, or somewhere else? > >> > >> > More generally, are there any particular areas of BioPython > development > >> > which could use an extra pair of hands? > >> > >> Following the mailing list for discussions on current projects is the > >> best way to get a sense of what different folks are working on. 
The > >> issue tracker also has open issues and features that could use attention > >> if anything there strikes your fancy: > >> > >> https://redmine.open-bio.org/projects/biopython > >> > >> Hope this helps, > >> Brad > >> > >> > From zcharlop at mail.rockefeller.edu Tue Oct 16 23:55:26 2012 From: zcharlop at mail.rockefeller.edu (Zachary Charlop-Powers) Date: Tue, 16 Oct 2012 23:55:26 +0000 Subject: [Biopython-dev] KEGG API Wrapper In-Reply-To: <1350128284.4997.YahooMailClassic@web164003.mail.gq1.yahoo.com> References: <1350128284.4997.YahooMailClassic@web164003.mail.gq1.yahoo.com> Message-ID: Kevin, Michiel, I just tested Kevin's code for a few simple queries and it worked great. I have always liked KEGG's organization of data and really appreciate this RESTful interface to their data; in some ways I think it easier to use the web interfaces for KEGG than it is for NCBI. Plus the KEGG coverage of metabolic networks is awesome. I found the examples in Kevin's test script to be fairly self-explanatory but a simple-spelled out example in the Tutorial would be nice. One thought, though, is that you can retrieve MANY different types of data from the KEGG Rest API - which means that the user will probably have to parse the data his/herself. Data retrieved with "list" can return lists of genes or compounds or organism and after a cursory look these are each formatted differently. Also true with the 'find' command. So I think you were right to leave out parsers because i think they will be a moving target highly dependent on the query. Thank You Kevin, zach cp On Oct 13, 2012, at 7:38 AM, Michiel de Hoon > wrote: Hi Kevin, It would be great to have better KEGG support in Biopython, so I think that this is useful and could in principle be merged into Biopython. But before we do so, is there some documentation for your code (or even better, for the Bio.KEGG module as a whole)? Then it's easier to see how this module can be used, and to discuss any modifications. 
Thanks for your contribution! -Michiel. --- On Wed, 10/10/12, Kevin Wu > wrote: From: Kevin Wu > Subject: [Biopython-dev] KEGG API Wrapper To: Biopython-dev at lists.open-bio.org Date: Wednesday, October 10, 2012, 5:27 PM Hi, I've written a simple wrapper on top of KEGG's new REST API ( http://www.kegg.jp/kegg/docs/keggapi.html). The main functionality of this module is that can detect some invalid queries based on kegg's defined rules. I've implemented each of the examples given on the api docs as tests as well. Here's a quick example of its usage. The api call to http://rest.kegg.jp/get/hsa:10458+ece:Z5100/aaseq can be done using the wrapper as: KEGG.query("get", ["hsa:10458", "ece:Z5100"], "aaseq") Querying the api works well with the current parsers written for KEGG formats. Let me know if there are issues or if it's useful enough to be merged into Biopython! https://github.com/kevinwuhoo/biopython Thanks! Kevin _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev Zach Charlop-Powers Post-Doctoral Fellow Laboratory of Genetically Encoded Small Molecules Rockefeller University zcharlop at rockefeller.edu From p.j.a.cock at googlemail.com Wed Oct 17 11:09:07 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 17 Oct 2012 12:09:07 +0100 Subject: [Biopython-dev] KEGG API Wrapper In-Reply-To: References: <1350128284.4997.YahooMailClassic@web164003.mail.gq1.yahoo.com> Message-ID: On Wed, Oct 17, 2012 at 12:55 AM, Zachary Charlop-Powers wrote: > Kevin, > Michiel, > > I just tested Kevin's code for a few simple queries and it worked great. 
I > have always liked KEGG's organization of data and really appreciate this > RESTful interface to their data; in some ways I think it easier to use the > web interfaces for KEGG than it is for NCBI. Plus the KEGG coverage of > metabolic networks is awesome. I found the examples in Kevin's test script > to be fairly self-explanatory but a simple-spelled out example in the > Tutorial would be nice. > > One thought, though, is that you can retrieve MANY different types of data > from the KEGG Rest API - which means that the user will probably have to > parse the data his/herself. Data retrieved with "list" can return lists of > genes or compounds or organism and after a cursory look these are each > formatted differently. Also true with the 'find' command. So I think you > were right to leave out parsers because i think they will be a moving target > highly dependent on the query. > > Thank You Kevin, > zach cp Good point about decoupling the web API wrapper and the parsers - how the Bio.Entrez module and Bio.TogoWS handle this is to return handles for web results, which you can then parse with an appropriate parser (e.g. SeqIO for GenBank files, Medline parser, etc). Note that this is a little more fiddly under Python 3 due to the text mode distinction between unicode and binary... just something to keep in the back of your mind. Peter From redmine at redmine.open-bio.org Wed Oct 17 13:27:18 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Wed, 17 Oct 2012 13:27:18 +0000 Subject: [Biopython-dev] [Biopython - Bug #3387] (New) Generic per column annotation from stockholm alignment are not stored in alignment object Message-ID: Issue #3387 has been reported by saverio vicario. 
---------------------------------------- Bug #3387: Generic per column annotation from stockholm alignment are not stored in alignment object https://redmine.open-bio.org/issues/3387 Author: saverio vicario Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Stockholm format includes 4 types of annotations #=GF #=GC #=GS #=GR GC and GF annotation are not pickup by AlignIO and not supported in Bio.Align.MultipleSeqAlignment because no annotation is available at alignment level. In fact Bio.Align.MultipleSeqAlignment.annotations or Bio.Align.MultipleSeqAlignment.letter_annotations do not exist, only Bio.Align.MultipleSeqAlignment._annotations that is generated from the single records annotations and letter_annotations. GC annotation in stockholm contain the quality score of the sites (columns of the alignment) that is a quite important parameters to decide if to trim the sites or not. ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org
From redmine at redmine.open-bio.org Wed Oct 17 13:36:24 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Wed, 17 Oct 2012 13:36:24 +0000 Subject: [Biopython-dev] [Biopython - Bug #3387] Generic per column annotation from stockholm alignment are not stored in alignment object References: Message-ID: Issue #3387 has been updated by Peter Cock. The underlying alignment class would need a per-column-annotation dictionary (as well as an annotations dictionary, also on the TODO list), to match the per-letter-annotation and annotations dictionaries of the SeqRecord. Parsing this and putting it in alignment._letter_annotation (dictionary as a private variable) would be a reasonable short term hack if you'd like to work on that.
---------------------------------------- Bug #3387: Generic per column annotation from stockholm alignment are not stored in alignment object https://redmine.open-bio.org/issues/3387 Author: saverio vicario Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Stockholm format includes 4 types of annotations #=GF #=GC #=GS #=GR GC and GF annotation are not pickup by AlignIO and not supported in Bio.Align.MultipleSeqAlignment because no annotation is available at alignment level. In fact Bio.Align.MultipleSeqAlignment.annotations or Bio.Align.MultipleSeqAlignment.letter_annotations do not exist, only Bio.Align.MultipleSeqAlignment._annotations that is generated from the single records annotations and letter_annotations. GC annotation in stockholm contain the quality score of the sites (columns of the alignment) that is a quite important parameters to decide if to trim the sites or not. -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Wed Oct 17 13:39:25 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Wed, 17 Oct 2012 13:39:25 +0000 Subject: [Biopython-dev] [Biopython - Feature #3388] (New) add annotation and letter_annotations attributed for Bio.Align.MultipleSeqAlignment. object Message-ID: Issue #3388 has been reported by saverio vicario. ---------------------------------------- Feature #3388: add annotation and letter_annotations attributed for Bio.Align.MultipleSeqAlignment. object https://redmine.open-bio.org/issues/3388 Author: saverio vicario Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: At the moment I could not add annotation at alignment level. 
Annotations could be useful for tracking info linked to the loci ( i.e. name of domain), while letter annotations could be useful to track the quality score of the alignment or whether the sites belong to a given character set. In particular, when two alignments are merged it would be useful if the boundary of the merge were tracked. For example, for the merge of an alignment a with 10 sites and b with 5 sites, the letter_annotations would be as follows: {locus1:'111111111100000',locus2:'000000000011111'} This could also be useful to annotate the 3 positions of codons: {pos1:'1001001001',pos2:'0100100100', pos3:'0010010010'} If this letter_annotation were supported, the annotation could be kept across merging and splitting of the alignment -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org
From redmine at redmine.open-bio.org Wed Oct 17 15:00:15 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Wed, 17 Oct 2012 15:00:15 +0000 Subject: [Biopython-dev] [Biopython - Bug #3387] Generic per column annotation from stockholm alignment are not stored in alignment object References: Message-ID: Issue #3387 has been updated by Peter Cock. Depends on issue #3388, add annotation and letter_annotations attributed to Bio.Align.MultipleSeqAlignment object https://redmine.open-bio.org/issues/3388 ---------------------------------------- Bug #3387: Generic per column annotation from stockholm alignment are not stored in alignment object https://redmine.open-bio.org/issues/3387 Author: saverio vicario Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Stockholm format includes 4 types of annotations #=GF #=GC #=GS #=GR GC and GF annotation are not pickup by AlignIO and not supported in Bio.Align.MultipleSeqAlignment because no annotation is available at alignment level.
In fact Bio.Align.MultipleSeqAlignment.annotations or Bio.Align.MultipleSeqAlignment.letter_annotations do not exist, only Bio.Align.MultipleSeqAlignment._annotations that is generated from the single records annotations and letter_annotations. GC annotation in stockholm contain the quality score of the sites (columns of the alignment) that is a quite important parameters to decide if to trim the sites or not. -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Thu Oct 18 11:02:49 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 18 Oct 2012 11:02:49 +0000 Subject: [Biopython-dev] [Biopython - Bug #3387] Generic per column annotation from stockholm alignment are not stored in alignment object References: Message-ID: Issue #3387 has been updated by saverio vicario. File diff_StockholmIO.py added File StockholmIO.py added This is my proposal of patch for StockholmIO. Attached you will find the new StockholmIO.py and a diff file with the old one. To highlight further the new comments I start the comment by #SV In summary the patch implement the new attribute _letter_annotations for Bio.Align.MultipleSeqAlignment and store the GC features within, in the iterator while in the writer write the GC features after all sequence record as stated in http://sonnhammer.sbc.su.se/Stockholm.html. I added a new dictionary for GC and GF features using PFAM standard and it is used in the writing phase to write only PFAM legitimate attributes. The only addition to PFAM standard is the GC features "RF" that is add by HMMer3.0 softwares to indicates what sites where originally present in the profile used to generate the alignment. 
I do not use the dictionary of PFAM standard to translate the GF, GR attributes of alignment._annotations or the GC attributes in alignment._letter_annotations as is done in the seqRecord for consistency with decision taken originally with GR attributes in alignment._annotations ---------------------------------------- Bug #3387: Generic per column annotation from stockholm alignment are not stored in alignment object https://redmine.open-bio.org/issues/3387 Author: saverio vicario Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Stockholm format includes 4 types of annotations #=GF #=GC #=GS #=GR GC and GF annotation are not pickup by AlignIO and not supported in Bio.Align.MultipleSeqAlignment because no annotation is available at alignment level. In fact Bio.Align.MultipleSeqAlignment.annotations or Bio.Align.MultipleSeqAlignment.letter_annotations do not exist, only Bio.Align.MultipleSeqAlignment._annotations that is generated from the single records annotations and letter_annotations. GC annotation in stockholm contain the quality score of the sites (columns of the alignment) that is a quite important parameters to decide if to trim the sites or not. -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Thu Oct 18 18:33:04 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 18 Oct 2012 19:33:04 +0100 Subject: [Biopython-dev] PyPy 1.8 support? Message-ID: Hello all, We currently run the test suite against both PyPy 1.8 and 1.9 on Linux via the TravisCI.org continuous integration testing service. Is anyone actually using Biopython under PyPy 1.8? If not, I intend to drop automated testing under PyPy 1.8 and focus just on PyPy 1.9 instead. 
(Automated testing under C Python 2.5, 2.6, 2.7, 3.1 and 3.2 etc will continue - I'm hoping to add Python 3.3 as well) Thanks, Peter From ben at benfulton.net Fri Oct 19 03:16:45 2012 From: ben at benfulton.net (Ben Fulton) Date: Thu, 18 Oct 2012 23:16:45 -0400 Subject: [Biopython-dev] Contributing startup Message-ID: Hi, I was looking for some introductory tickets or other methods to familiarize myself with the Biopython codebase. I saw some suggestions on the wiki to improve unit test coverage or to add additional file formats, which sounds fine - are there particular areas of code that lack coverage, or file formats that are particularly wanted? Or would it be better to look over the issue tracker and try to identify some smallish issues? Thanks for any suggestions. Ben Fulton From p.j.a.cock at googlemail.com Fri Oct 19 07:52:19 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 19 Oct 2012 08:52:19 +0100 Subject: [Biopython-dev] PyPy 1.8 support? In-Reply-To: References: Message-ID: On Thu, Oct 18, 2012 at 7:33 PM, Peter Cock wrote: > Hello all, > > We currently run the test suite against both PyPy 1.8 and > 1.9 on Linux via the TravisCI.org continuous integration > testing service. > > Is anyone actually using Biopython under PyPy 1.8? > > If not, I intend to drop automated testing under PyPy 1.8 > and focus just on PyPy 1.9 instead. Done on TravisCI, but easy to revert: https://github.com/biopython/biopython/commit/126c944812730df4677c8fa2f63abc29ddd084bb One reason was the previous build failed due to a timeout fetching PyPy for a custom install. Now we use the TravisCI provided PyPy which should avoid that issue. (It still happens for Jython sometimes). 
Peter From p.j.a.cock at googlemail.com Fri Oct 19 08:26:35 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 19 Oct 2012 09:26:35 +0100 Subject: [Biopython-dev] Contributing startup In-Reply-To: References: Message-ID: On Fri, Oct 19, 2012 at 4:16 AM, Ben Fulton wrote: > Hi, > > I was looking for some introductory tickets or other methods to familiarize > myself with the Biopython codebase. I saw some suggestions on the wiki to > improve unit test coverage or to add additional file formats, which sounds > fine - are there particular areas of code that lack coverage, or file > formats that are particularly wanted? Or would it be better to look over > the issue tracker and try to identify some smallish issues? > > Thanks for any suggestions. > > Ben Fulton Hi Ben, Welcome - more volunteer developers willing to help is always nice. You asked about test coverage, and while I could guess about things what might be most interesting would be to try and measure this using something like coverage or figleaf: http://nedbatchelder.com/code/coverage/ http://darcs.idyll.org/~t/projects/figleaf/doc/ Another general area would be improving our support under Python 3. In terms of specific modules, is there anything in particular which seems like a good match with your work/research interests? Regards, Peter From p.j.a.cock at googlemail.com Mon Oct 22 16:43:07 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 22 Oct 2012 17:43:07 +0100 Subject: [Biopython-dev] Low level string based FASTA parser Message-ID: Hello all, Something I've wanted/needed recently was a low-level FASTA iterating parser which just returns tuples of strings (without the overhead of Bio.SeqIO building SeqRecords). 
We don't currently have such a thing, so I have added one to the SeqIO Fasta module (mirroring the low level string-tuple parser for FASTQ files) with some associated unit tests and refactoring (separate commits): https://github.com/biopython/biopython/commit/751fe39765ca6ba60e517b3b4657718fd48f7817 Does anyone have any views on the name of this new function, currently SimpleFastaParser, used as follows: >>> from Bio.SeqIO.FastaIO import SimpleFastaParser >>> with open("Fasta/dups.fasta") as handle: ... for values in SimpleFastaParser(handle): ... print values ('alpha', 'ACGTA') ('beta', 'CGTC') ('gamma', 'CCGCC') ('alpha (again - this is a duplicate entry to test the indexing code)', 'ACGTA') ('delta', 'CGCGC') The capitalisation style is consistent with other functions in SeqIO, but not with PEP8. Peter P.S. I've also updated the legacy function quick_FASTA_reader in Bio.SeqUtils to use this. Since it loads the whole dataset into memory, if no one objects I would like to deprecate this old function. From p.j.a.cock at googlemail.com Mon Oct 22 17:08:47 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 22 Oct 2012 18:08:47 +0100 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: References: <1346926418.97489.YahooMailClassic@web164004.mail.gq1.yahoo.com> Message-ID: On Fri, Sep 28, 2012 at 11:50 AM, Peter Cock wrote: > On Thu, Sep 20, 2012 at 10:08 AM, Peter Cock wrote: >> On Sun, Sep 16, 2012 at 1:34 PM, Peter Cock wrote: >>>> >>>> I guess we need to have a little hack with the 2to3 library and >>>> try defining our own custom fixer for the imports... >>> >>> I've made a start at this - the easy part seems to work :) >>> >>> https://github.com/peterjc/biopython/commits/py3lower >>> >>> ... 
> > The code to do this lower case name mangling remains > a quite spaghetti like mess in do2to3.py but it now works > enough to pass the test suite (with some but not all 3rd > party dependencies installed) under Linux and my Mac > OS X machine (where like Windows I have a case > insensitive file system). > > ... > > So this idea to adopt PEP8 lower case module names > as part of supporting Python 3 appears to be technically > viable. Has anyone else tried this branch yet? Has the lower case module names under Python 3 idea grown on anyone? I think it makes sense in terms of a long term vision - I do expect to be primarily working under Python 3 within a couple of years. It occurs to me we can make a partial step in this direction with moving to a directory for Bio.Seq, since this could be Bio.seq instead. For example, we talked about something like this: Bio.Seq -> Bio.seq Bio.SeqRecord -> Bio.seq.record Bio.SeqFeature -> Bio.seq.feature Bio.SeqUtils -> Bio.seq.utils Bio.SearchIO -> Bio.seq.search I'm not 100% sure where the Bio.SeqIO top level functions would belong, either directly under Bio.seq or Bio.seq.record might work too. We can have imports setup so that all the classes etc are only defined once, e.g. Bio/seq/__init__.py could initially just contain 'from Bio.Seq import *' and so on. (We'd commit to maintaining the old namespace for at least as long as our standard deprecation cycle, longer ideally). Peter From p.j.a.cock at googlemail.com Mon Oct 22 17:17:34 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 22 Oct 2012 18:17:34 +0100 Subject: [Biopython-dev] Dropping Python 2.5 and Jython 2.5 support? Message-ID: Dear Biopythoneers, Would anyone object to us preparing to drop support for Python 2.5 and Jython 2.5, perhaps after the next Biopython release? To reassure those of you using Jython, we'd wait until Jython 2.7 is out first. Jython 2.7 is already in alpha, and brings support for C Python 2.7 language features. 
Thanks, Peter From eric.talevich at gmail.com Mon Oct 22 21:53:55 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 22 Oct 2012 17:53:55 -0400 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: References: <1346926418.97489.YahooMailClassic@web164004.mail.gq1.yahoo.com> Message-ID: On Mon, Oct 22, 2012 at 1:08 PM, Peter Cock wrote: > On Fri, Sep 28, 2012 at 11:50 AM, Peter Cock > wrote: > > On Thu, Sep 20, 2012 at 10:08 AM, Peter Cock > wrote: > >> On Sun, Sep 16, 2012 at 1:34 PM, Peter Cock > wrote: > >>>> > >>>> I guess we need to have a little hack with the 2to3 library and > >>>> try defining our own custom fixer for the imports... > >>> > >>> I've made a start at this - the easy part seems to work :) > >>> > >>> https://github.com/peterjc/biopython/commits/py3lower > >>> > >>> ... > > > > The code to do this lower case name mangling remains > > a quite spaghetti like mess in do2to3.py but it now works > > enough to pass the test suite (with some but not all 3rd > > party dependencies installed) under Linux and my Mac > > OS X machine (where like Windows I have a case > > insensitive file system). > > > > ... > > > > So this idea to adopt PEP8 lower case module names > > as part of supporting Python 3 appears to be technically > > viable. > > Has anyone else tried this branch yet? Has the lower case > module names under Python 3 idea grown on anyone? > I think it makes sense in terms of a long term vision - I do > expect to be primarily working under Python 3 within a > couple of years. > > It occurs to me we can make a partial step in this direction > with moving to a directory for Bio.Seq, since this could be > Bio.seq instead. 
For example, we talked about something > like this: > > Bio.Seq -> Bio.seq > Bio.SeqRecord -> Bio.seq.record > Bio.SeqFeature -> Bio.seq.feature > Bio.SeqUtils -> Bio.seq.utils > Bio.SearchIO -> Bio.seq.search > > I'm not 100% sure where the Bio.SeqIO top level functions > would belong, either directly under Bio.seq or Bio.seq.record > might work too. > Personally, I've used the variable name "seq" an awful lot, so I'm wary of using "seq" as a module name. However, reasonable coding style could make this easy to avoid if we have a "seq" module containing all of Seq, SeqRecord and SeqFeature (maybe even Alphabet), and "sequtil" containing standalone functions. Result: # Everything you need to build a new sequence record, but not much else from Bio.seq import Seq, SeqRecord, SeqFeature # Working with sequence strings from Bio import sequtil It also seems reasonable to treat molecular sequences as the implied core object type at the top-level namespace. From that viewpoint, Bio.Search would mean sequence search, as everything else is typically tucked away in a sub-module like PDB (pdb?), Motif (motif), or Phylo (phylo); then it's also fine to keep seqio and alignio directly under the Bio namespace. (Given a clean, I'd prefer "from Bio import Seq, SeqRecord, SeqFeature", but since those are already module names it would be brutal to make that transition now.) > We can have imports setup so that all the classes etc > are only defined once, e.g. Bio/seq/__init__.py could > initially just contain 'from Bio.Seq import *' and so on. > > Sounds cool. We'll need to watch out for the PDB module, where classes and modules have identical names, and the class names are imported to shadow the module names at import time. -Eric From p.j.a.cock at googlemail.com Mon Oct 22 22:59:21 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 22 Oct 2012 23:59:21 +0100 Subject: [Biopython-dev] PEP8 lower case module names? 
In-Reply-To: References: <1346926418.97489.YahooMailClassic@web164004.mail.gq1.yahoo.com> Message-ID:

On Mon, Oct 22, 2012 at 10:53 PM, Eric Talevich wrote:
>
> Personally, I've used the variable name "seq" an awful lot, so I'm wary of
> using "seq" as a module name. However, reasonable coding style could make
> this easy to avoid if we have a "seq" module containing all of Seq,
> SeqRecord and SeqFeature (maybe even Alphabet), and "sequtil" containing
> standalone functions.
>
> Result:
>
> # Everything you need to build a new sequence record, but not much else
> from Bio.seq import Seq, SeqRecord, SeqFeature

I'd been picturing:

from Bio.seq import Seq
from Bio.seq.record import SeqRecord
from Bio.seq.feature import SeqFeature

but you're right, those three classes could all be exposed at the level of Bio.seq (while still having the SeqRecord defined in the file Bio/seq/record.py and SeqFeature etc in Bio/seq/feature.py) for convenience.

> # Working with sequence strings
> from Bio import sequtil

If you mean strings rather than Seq objects, currently Bio.SeqUtils should mostly work on Seq objects or strings. It is kind of an odds and ends module, rather than deliberately focusing on sequences as strings.

> It also seems reasonable to treat molecular sequences as the implied core
> object type at the top-level namespace. From that viewpoint, Bio.Search
> would mean sequence search, as everything else is typically tucked away in a
> sub-module like PDB (pdb?), Motif (motif), or Phylo (phylo); then it's also
> fine to keep seqio and alignio directly under the Bio namespace.

Having sequence stuff collected under Bio.Seq or Bio.seq (or bio.seq if we go with the lower case plan for Python 3) seems more organised. It also keeps the import times down for people not working with sequences (e.g. a script using clustering or PDB files).
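The layout discussed in this message (classes defined in submodules, then re-exported one level up so that both the short and the long import work) can be sketched as follows. This is only an illustration under stated assumptions: the package name `seqdemo` and the empty class bodies are hypothetical stand-ins, not the actual Biopython layout, and the sketch assumes Python 3.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical package name and file contents, for illustration only.
PKG = "seqdemo"
FILES = {
    # The package __init__ defines Seq itself and re-exports the other classes.
    "__init__.py": (
        "class Seq(object):\n"
        "    def __init__(self, data):\n"
        "        self.data = data\n"
        "\n"
        "from .record import SeqRecord\n"
        "from .feature import SeqFeature\n"
    ),
    # Each class is still defined exactly once, in its own submodule.
    "record.py": "class SeqRecord(object):\n    pass\n",
    "feature.py": "class SeqFeature(object):\n    pass\n",
}


def build_demo_package(root):
    """Write the throwaway demo package under the directory 'root'."""
    pkg_dir = os.path.join(root, PKG)
    os.mkdir(pkg_dir)
    for name, source in FILES.items():
        with open(os.path.join(pkg_dir, name), "w") as handle:
            handle.write(source)


root = tempfile.mkdtemp()
build_demo_package(root)
sys.path.insert(0, root)
importlib.invalidate_caches()

# Short form: everything exposed at the package level.
from seqdemo import Seq, SeqRecord, SeqFeature
# Long form: the same class, imported from the submodule that defines it.
from seqdemo.record import SeqRecord as SeqRecordLong

print(SeqRecord is SeqRecordLong)
```

Both import styles resolve to the same class object, so existing code using the long form would keep working if the short form were added later.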
> (Given a clean, I'd prefer "from Bio import Seq, SeqRecord, SeqFeature", but > since those are already module names it would be brutal to make that > transition now.) That isn't a good plan anyway in terms of polluting the namespace and loading things into memory for anyone not working with sequences. >> We can have imports setup so that all the classes etc >> are only defined once, e.g. Bio/seq/__init__.py could >> initially just contain 'from Bio.Seq import *' and so on. >> > > Sounds cool. We'll need to watch out for the PDB module, where classes and > modules have identical names, and the class names are imported to shadow the > module names at import time. The shadowing was one of the gotchas in the auto-conversion of all the module names to lower case - but solvable. Adopting lower case module names has the bonus of fixing this in the long term. Peter From kjwu at ucsd.edu Wed Oct 24 22:38:04 2012 From: kjwu at ucsd.edu (Kevin Wu) Date: Wed, 24 Oct 2012 15:38:04 -0700 Subject: [Biopython-dev] KEGG API Wrapper In-Reply-To: References: <1350128284.4997.YahooMailClassic@web164003.mail.gq1.yahoo.com> Message-ID: Hi All, Thanks for the comments, I've written a bit of documentation on the entire KEGG module and have attached those relevant pages to the email. There didn't seem like an appropriate place for examples, so I just added a new chapter. I've also committed the updated file to github. I did leave out the parsers due to the fact that the current parsers only cover a small portion of possible responses from the api. Also, I'm not confident that the some of the parsers correctly retrieves all the fields. However, I've written a really general parser that does a rough job of retrieving fields if it's a database format returned since I find myself reusing the code for all database formats. It's possible to modify this to correctly account for the different fields, but would probably take a bit of work to manually figure each field out. 
Otherwise it also parses the tsv/flat file returned. Also, @zach, thanks for checking it out and testing it! Thanks All! Kevin On Wed, Oct 17, 2012 at 4:09 AM, Peter Cock wrote: > On Wed, Oct 17, 2012 at 12:55 AM, Zachary Charlop-Powers > wrote: > > Kevin, > > Michiel, > > > > I just tested Kevin's code for a few simple queries and it worked great. > I > > have always liked KEGG's organization of data and really appreciate this > > RESTful interface to their data; in some ways I think it easier to use > the > > web interfaces for KEGG than it is for NCBI. Plus the KEGG coverage of > > metabolic networks is awesome. I found the examples in Kevin's test > script > > to be fairly self-explanatory but a simple-spelled out example in the > > Tutorial would be nice. > > > > One thought, though, is that you can retrieve MANY different types of > data > > from the KEGG Rest API - which means that the user will probably have to > > parse the data his/herself. Data retrieved with "list" can return lists > of > > genes or compounds or organism and after a cursory look these are each > > formatted differently. Also true with the 'find' command. So I think you > > were right to leave out parsers because i think they will be a moving > target > > highly dependent on the query. > > > > Thank You Kevin, > > zach cp > > Good point about decoupling the web API wrapper and the parsers - > how the Bio.Entrez module and Bio.TogoWS handle this is to return > handles for web results, which you can then parse with an appropriate > parser (e.g. SeqIO for GenBank files, Medline parser, etc). > > Note that this is a little more fiddly under Python 3 due to the text > mode distinction between unicode and binary... just something to > keep in the back of your mind. 
> > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -------------- next part -------------- A non-text attachment was scrubbed... Name: KEGG documentation.pdf Type: application/pdf Size: 128597 bytes Desc: not available URL: From cmccoy at fhcrc.org Thu Oct 25 21:36:44 2012 From: cmccoy at fhcrc.org (Connor McCoy) Date: Thu, 25 Oct 2012 14:36:44 -0700 Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs Message-ID: Hello, About a year ago, pip support came up on the list: http://biopython.org/pipermail/biopython-dev/2011-October/009234.html I remember this being resolved, but when I try to install biopython with pip, it fails: $ testenv/bin/pip install biopython Downloading/unpacking biopython Running setup.py egg_info for package biopython warning: no previously-included files matching '.cvsignore' found under directory '*' warning: no previously-included files matching '*.pyc' found under directory '*' Installing collected packages: biopython Running setup.py install for biopython Numerical Python (NumPy) is not installed. This package is required for many Biopython features. Please install it before you install Biopython. You can install Biopython anyway, but anything dependent on NumPy will not work. If you do this, and later install NumPy, you should then re-install Biopython. 
You can find NumPy at http://numpy.scipy.org

Complete output from command /home/cmccoy/development/seqmagick/testenv/bin/python -c "import setuptools;__file__='/home/cmccoy/development/seqmagick/testenv/build/biopython/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /tmp/pip-wc___H-record/install-record.txt --install-headers /home/cmccoy/development/seqmagick/testenv/include/site/python2.7:
running install

Numerical Python (NumPy) is not installed.

This package is required for many Biopython features. Please install it before you install Biopython. You can install Biopython anyway, but anything dependent on NumPy will not work. If you do this, and later install NumPy, you should then re-install Biopython.

You can find NumPy at http://numpy.scipy.org

----------------------------------------
Command /home/cmccoy/development/seqmagick/testenv/bin/python -c "import setuptools;__file__='/home/cmccoy/development/seqmagick/testenv/build/biopython/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /tmp/pip-wc___H-record/install-record.txt --install-headers /home/cmccoy/development/seqmagick/testenv/include/site/python2.7 failed with error code 255 in /home/cmccoy/development/seqmagick/testenv/build/biopython
Storing complete log in /home/cmccoy/.pip/pip.log

Same for libraries which list biopython in `install_requires`.

Does anyone know of a way around this?

Thanks,
Connor

--
Connor McCoy
Fred Hutchinson Cancer Research Center
1100 Fairview Ave N.
Seattle, WA 98109-1924
cmccoy at fhcrc.org

From mjldehoon at yahoo.com Fri Oct 26 02:52:42 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 25 Oct 2012 19:52:42 -0700 (PDT)
Subject: [Biopython-dev] KEGG API Wrapper
In-Reply-To:
Message-ID: <1351219962.39081.YahooMailClassic@web164002.mail.gq1.yahoo.com>

Hi Kevin,

Thanks for the documentation! That makes everything a lot clearer. Overall I like the querying code and I think we should add it to Biopython. I have a bunch of comments on the KEGG module, some on the existing code and some on the new querying code, see below. Most of these are trivial; some may need some further discussion. Perhaps could you let us know which of these comments you can address, and which ones you want to skip for now? Once we converged with regards to the querying code and the documentation, I think we can import your version of the KEGG module into the main Biopython repository and add your chapter on KEGG to the main documentation, and continue from there on the parsers and the unit tests.

Many thanks!
-Michiel.

About the querying code:
----------------------------------

I would replace
KEGG.query("list",
KEGG.query("find",
KEGG.query("conv",
KEGG.query("link",
KEGG.query("info",
KEGG.query("get"
by the functions KEGG.list, KEGG.find, KEGG.conv, KEGG.link, KEGG.info, and KEGG.get.

For list, find, conv, link, and info, instead of going through KEGG.generic_parser, I would return the result directly as a Python list.

In contrast, KEGG.get should return the handle to the results, not the data itself. So the _q function, instead of

    ...
    resp = urllib2.urlopen(req)
    data = resp.read()
    return query_url, data

have

    ...
    resp = urllib2.urlopen(req)
    return resp

Then the user can decide whether to parse the data on the fly with Bio.KEGG, or read the data line by line and pick up what they are interested in, or to get all data from the handle and save it in a file.
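The handle-returning design suggested above can be sketched as follows. This is a minimal illustration and not the code under review: the helper names (`kegg_url`, `kegg_get`, `kegg_list`, `fake_opener`) are hypothetical, the base URL is an assumption that is not stated in this thread, and a fake opener stands in for `urllib2.urlopen` (`urllib.request.urlopen` under Python 3) so that the sketch runs offline. It assumes Python 3.

```python
import io

# Assumed base URL for the KEGG REST service; not stated in the thread.
KEGG_BASE = "http://rest.kegg.jp"


def kegg_url(operation, *args):
    """Build the URL for a KEGG REST operation such as 'list' or 'get'."""
    return "/".join([KEGG_BASE, operation] + list(args))


def kegg_get(entry, opener):
    """Return an open handle; the caller decides how to read or parse it."""
    return opener(kegg_url("get", entry))


def kegg_list(database, opener):
    """Return the small tabular 'list' result directly as a Python list."""
    handle = opener(kegg_url("list", database))
    try:
        # One row per line, columns separated by tabs; skip blank lines.
        return [line.rstrip("\n").split("\t") for line in handle if line.strip()]
    finally:
        handle.close()


class FakeResponse(io.StringIO):
    """In-memory stand-in for a response object that carries a .url."""
    url = None


def fake_opener(url):
    """Offline replacement for a real URL opener, with canned text."""
    handle = FakeResponse("path:map00010\tGlycolysis\npath:map00020\tCitrate cycle\n")
    handle.url = url
    return handle


handle = kegg_get("hsa:5236", fake_opener)
print(handle.url)
rows = kegg_list("pathway", fake_opener)
print(rows)
```

Passing the opener in as an argument is only a convenience for testing here; a real wrapper would presumably call the URL opener directly, and under Python 3 would also need to decide whether the handle yields text or bytes.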
Note that resp will have a .url attribute that contains the url, so you won't need the ret_url keyword. About the parsers: ------------------------ I think that we should drop generic_parser. For link, find, conv, link, and info, parsing is trivial and can be done by the respective functions directly. For get, we already have an appropriate parser for some databases (compound, map, and enzyme), but it's easy to add parsers for the other databases. For all parsers in Biopython, there is the question whether the record should store information in attributes (as is currently done in Bio.KEGG), or alternatively if the record should inherit from a dictionary and store information in keys in the dictionary. Personally I have a preference for a dictionary, since that allows us to use the exact same keys in the dictionary as is used in the file (e.g., we can use "CLASS" as a key, while we cannot use .class as an attribute since it is a reserved word, so we use .classname instead). But other Biopython developers may not agree with me, and to some extent it depends on personal preference. The parsers miss some key words. The ones I noticed are ALL_REAC, REFERENCE, and ORTHOLOGY. Probably we'll find more once we extend the unit tests. Remove the ';' at the end of each term in record.classname. Convert record.genes to a dictionary for each organism. So instead of [('HSA', ['5236', '55276']), ('PTR', ['456908', '461162']), ('PON', ['100190836', '100438793']), ('MCC', ['100424648', '699401']... have {'HSA': ['5236', '55276'], 'PTR': ['456908', '461162'], 'PON': ['100190836', '100438793'], 'MCC': ['100424648', '699401'], ... Also for record.dblinks, record.disease, record.structures, use a dictionary. In record.pathway, all entries start with 'PATH'. Perhaps we should check with KEGG if there could be anything else than 'PATH' there, otherwise I don't see the reason why it's there. 
Assuming that there could be something different there, I would also use a dictionary with 'PATH' as the key. In record.reaction, some chemical names can be very long and extend over multiple lines. In such cases, the continuation line starts with a '$'. The parser should remove the '$' and join the two lines. About the tests: -------------------- We should update the data files in Tests/KEGG. This will fix some "bugs" in these data files. We should switch test_KEGG.py to the unit test framework. We should do some more extensive testing to make sure we are not missing some key words. About the documentation: --------------------------------- It's great that we now have some documentation. On page 233, I would suggest to replace the "id_" by "accession" or something else, since the underscore in "id_" may look funky to new users. Also it may be better not to reuse variable names (e.g. "pathway" is used in three different ways in the example). It's OK of course in general, but for this example it may be more clear to distinguish the different usages of this variable from each other. For repair_genes, you can use a set instead of a list throughout.
From kai.blin at biotech.uni-tuebingen.de Fri Oct 26 08:35:56 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Fri, 26 Oct 2012 10:35:56 +0200 Subject: [Biopython-dev] Status of SearchIO Message-ID: <508A4B6C.6020801@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi folks, In the summer, I've written a HMMer2 parser based on Bow's SearchIO code. I'm finally getting around to continue work on the project I needed this parser for, and I'm trying to get my code up-to-date. I notice that Bow's code hasn't hit the biopython master tree yet, and also doesn't rebase cleanly on top of it. A merge gives a couple of merge conflicts, but seems manageable. However, I'd prefer to stick to the upstream sources instead of maintaining my own branch containing Bow's SearchIO code merged to master. What's the chance of this happening any time soon, and can I help? Cheers, Kai - -- Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28 Phone : ++49 7071 29-78841
D-72076 Tübingen Fax : ++49 7071 29-5979
Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQiktsAAoJEKM5lwBiwTTPuDMH/33PGo/zLpBGw+dKIBXZ9b9L
opaoI5uUsj4XzWU1A8u50BXFqa6ogwUWeZFaA2j25nQgEClWA5TFdHAJM4urTTgD
pM2g2rsL/yLSrVifM95c2IcRW2z7dunccpJDd6cc82BRpqqgGWrkNo7OSUk/exP3
DbfooBw66Scxt+6o6S9zEH4IY5giuDOGzwQm195TCaZ/x/8/y1F8Ub/8Aporbj47
eJgZmEKzh0k8KePKOdyCmnt/d/bDGplFSvgqXET6Q0jmVAG44lAU679UPCmNiuJr
VZD2SMRKy+Buy3TjJjQCeUEm+awN4T2LnPLDJgJkvRHjl6G+M9aljsuL78uCp9g=
=1Nrt
-----END PGP SIGNATURE-----

From p.j.a.cock at googlemail.com Fri Oct 26 09:21:50 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Oct 2012 10:21:50 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To: <508A4B6C.6020801@biotech.uni-tuebingen.de>
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
Message-ID:

On Fri, Oct 26, 2012 at 9:35 AM, Kai Blin wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi folks,
>
> In the summer, I've written a HMMer2 parser based on Bow's SearchIO
> code. I'm finally getting around to continue work on the project I
> needed this parser for, and I'm trying to get my code up-to-date.
>
> I notice that Bow's code hasn't hit the biopython master tree yet, and
> also doesn't rebase cleanly on top of it. A merge gives a couple of
> merge conflicts, but seems manageable. However, I'd prefer to stick to
> the upstream sources instead of maintaining my own branch containing
> Bow's SearchIO code merged to master.
>
> What's the chance of this happening any time soon, and can I help?
>
> Cheers,
> Kai

I'm not sure where the merge conflict is - Bow can probably help and confirm you're looking at the appropriate branch.

What would help is comments on the namespace ideas in this thread, since one major point we need to settle ASAP is where in the namespace SearchIO would go (since it probably won't just stay as Bio.SearchIO as it is on the branch):

http://lists.open-bio.org/pipermail/biopython-dev/2012-September/009910.html
...
http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html
...

Peter

From w.arindrarto at gmail.com Fri Oct 26 09:33:35 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 26 Oct 2012 11:33:35 +0200
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To:
References: <508A4B6C.6020801@biotech.uni-tuebingen.de>
Message-ID:

Hi Kai, Peter,

For the merge conflict, which branch are you using? Can you point to specific commits that cause the conflicts? I haven't tried merging / rebasing my own branch to the current master myself, so knowing this should help the process as well.

And suggestions are still welcome for the namespace :). Bio.SearchIO is the current one, but we have other alternatives (the most recent one being Bio.seq.search, following the Bio.Seq -> Bio.seq namespace change).

Also, I think there are still some issues that need to be dealt with before we put SearchIO into master, notably with the Bio.BLAST module. If not an official deprecation notice, at least the tutorial has to be updated (to let Bio.BLAST readers know about the plan with SearchIO). I've written a short tutorial here: http://bow.web.id/biopython/Tutorial.html. This is still a draft, but you can already see that there are some obvious overlaps between Bio.BLAST and Bio.SearchIO, which is confusing to new readers.
regards, Bow On Fri, Oct 26, 2012 at 11:21 AM, Peter Cock wrote: > On Fri, Oct 26, 2012 at 9:35 AM, Kai Blin > wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > > Hash: SHA1 > > > > Hi folks, > > > > In the summer, I've written a HMMer2 parser based on Bow's SearchIO > > code. I'm finally getting around to continue work on the project I > > needed this parser for, and I'm trying to get my code up-to-date. > > > > I notice that Bow's code hasn't hit the biopython master tree yet, and > > also doesn't rebase cleanly on top of it. A merge gives a couple of > > merge conflicts, but seems manageable. However, I'd prefer to stick to > > the upstream sources instead of maintaining my own branch containing > > Bow's SearchIO code merged to master. > > > > What's the chance of this happening any time soon, and can I help? > > > > Cheers, > > Kai > > I'm not sure where the merge conflict is - Bow can probably help > and confirm you're looking at the appropriate branch. > > What would help is comments on the name space ideas in this > thread, since one major point we need to settle ASAP is where > in the namespace SearchIO would go (since it probably won't > just stay as Bio.SearchIO as it is on the branch): > > > http://lists.open-bio.org/pipermail/biopython-dev/2012-September/009910.html > ... > http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html > ... 
> > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From p.j.a.cock at googlemail.com Fri Oct 26 09:43:28 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 10:43:28 +0100 Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs In-Reply-To: References: Message-ID: On Thu, Oct 25, 2012 at 10:36 PM, Connor McCoy wrote: > Hello, > > About a year ago, pip support came up on the list: > > http://biopython.org/pipermail/biopython-dev/2011-October/009234.html > > I remember this being resolved, but when I try to install biopython with > pip, it fails: > > $ testenv/bin/pip install biopython > > Downloading/unpacking biopython > Running setup.py egg_info for package biopython > > warning: no previously-included files matching '.cvsignore' found > under directory '*' > warning: no previously-included files matching '*.pyc' found under > directory '*' > Installing collected packages: biopython > Running setup.py install for biopython > > Numerical Python (NumPy) is not installed. > > This package is required for many Biopython features. Please > install > it before you install Biopython. You can install Biopython anyway, > but > anything dependent on NumPy will not work. If you do this, and later > install NumPy, you should then re-install Biopython. 
> > You can find NumPy at http://numpy.scipy.org > > Complete output from command > /home/cmccoy/development/seqmagick/testenv/bin/python -c "import > setuptools;__file__='/home/cmccoy/development/seqmagick/testenv/build/biopython/set > up.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), > __file__, 'exec'))" install --single-version-externally-managed --record > /tmp/pip-wc___H-record/install-record.txt - > -install-headers > /home/cmccoy/development/seqmagick/testenv/include/site/python2.7: > running install > > > > Numerical Python (NumPy) is not installed. > > > > This package is required for many Biopython features. Please install > > it before you install Biopython. You can install Biopython anyway, but > > anything dependent on NumPy will not work. If you do this, and later > > install NumPy, you should then re-install Biopython. > > > > You can find NumPy at http://numpy.scipy.org > > > > ---------------------------------------- > Command /home/cmccoy/development/seqmagick/testenv/bin/python -c > "import > setuptools;__file__='/home/cmccoy/development/seqmagick/testenv/build/biopython/setup.py';exec(compile(open( > __file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install > --single-version-externally-managed --record > /tmp/pip-wc___H-record/install-record.txt --install-headers /home/cm > ccoy/development/seqmagick/testenv/include/site/python2.7 failed with > error code 255 in /home/cmccoy/development/seqmagick/testenv/build/biopython > Storing complete log in /home/cmccoy/.pip/pip.log > > > Same for libraries which list biopython in `install_requires`. > > Does anyone know of a way around this? > > Thanks, > Connor Hi Connor, This is probably a question for Brad - I don't use pip. Was it sitting stalled at the prompt from Biopython's setup.py? "Do you want to continue this installation? (y/N)" or from pip? i.e. What was at the end of the complete log? 
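For context on why an automated install can stall or abort at a prompt like this, here is a minimal sketch of a yes/no confirmation that reads from standard input. It is not Biopython's actual setup.py code: the helper name `confirm`, its `stream` argument, and its fall-back to "no" when no input is available are all assumptions made purely for illustration.

```python
import io
import sys


def confirm(question, stream=None, default=False):
    """Ask a yes/no question and return True for yes, False for no.

    Hypothetical helper, not taken from Biopython's setup.py. Reads one
    line from ``stream`` (standard input by default). If no input is
    available (end-of-file, as can happen when an installer runs the
    script without a terminal) or the line is blank, ``default`` is
    returned.
    """
    if stream is None:
        stream = sys.stdin
    sys.stdout.write("%s (y/N) " % question)
    sys.stdout.flush()
    answer = stream.readline()
    if not answer:
        # EOF: nobody is there to answer, so fall back to the default.
        return default
    answer = answer.strip().lower()
    if not answer:
        # A bare newline also means "use the default".
        return default
    return answer.startswith("y")


if __name__ == "__main__":
    question = "Do you want to continue this installation?"
    # Simulate a "y" arriving on a pipe, then an empty (closed) stdin.
    print(confirm(question, io.StringIO("y\n")))
    print(confirm(question, io.StringIO("")))
```

Under these assumptions, input arriving on a pipe answers the question without a terminal, while a run with no input at all falls back to "no".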
In terms of a quick workaround, what we use under TravisCI (where most of the targets don't have numpy installed) is piping a yes on stdin, e.g. $ /usr/bin/yes | python setup.py install Peter From p.j.a.cock at googlemail.com Fri Oct 26 10:31:06 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 11:31:06 +0100 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: <508A6535.6070507@biotech.uni-tuebingen.de> References: <508A4B6C.6020801@biotech.uni-tuebingen.de> <508A6535.6070507@biotech.uni-tuebingen.de> Message-ID: On Fri, Oct 26, 2012 at 11:25 AM, Kai Blin wrote: >> Also, I think there are still some issues that need to be dealt >> with before we put SearchIO into master, notably with Bio.BLAST >> module. If not the official deprecation notice, at least the the >> tutorial has to be updated (let Bio.BLAST readers know about the >> plan with SearchIO). I've written a short tutorial here: >> http://bow.web.id/biopython/Tutorial.html. This is still a draft, >> but you can already see that there are some obvious overlaps >> between Bio.BLAST and Bio.SearchIO, which is confusing to new >> readers. > > Personally I wouldn't let this consideration block the inclusion of a > module as useful like that. Of course I need this code, so I'm biased. I'm also OK with merging the code before updating the Tutorial chapter on BLAST (which would probably become a broader chapter on BLAST and other tools using SearchIO). As discussed before, the long term aim would be to remove Bio.BLAST. > I'll have to read up on the namespace discussion. While I see the > benefit of using PEP8 names, intuitively I don't like bio.seq.search > much. Then again, I started my life in Bio* with BioPerl, and like the > pretty similar module layout BioPython has so far. Yeah - the current naming of SeqIO and AlignIO was directly inspired by BioPerl, and give the working name of SearchIO. 
Peter From kai.blin at biotech.uni-tuebingen.de Fri Oct 26 10:25:57 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Fri, 26 Oct 2012 12:25:57 +0200 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <508A4B6C.6020801@biotech.uni-tuebingen.de> Message-ID: <508A6535.6070507@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2012-10-26 11:33, Wibowo Arindrarto wrote: > Hi Kai, Peter, > > For the merge conflict, which branch are you using? Can you point > to specific commits that cause the conflicts? I haven't tried > merging / rebasing my own branch to the current master myself ~ so > knowing this should help the process as well. For merging, I think I had to change .travis.yml setup.py and Tests/run_tests.py .travis.yml and setup.py mainly had whitespace changes in comments, so I just went with the version from master on those changes. As I said, nothing really huge. https://github.com/kblin/biopython/tree/searchio-merge is the merged tree. The rebase had a number of things, I just gave up on that. > Also, I think there are still some issues that need to be dealt > with before we put SearchIO into master, notably with Bio.BLAST > module. If not the official deprecation notice, at least the the > tutorial has to be updated (let Bio.BLAST readers know about the > plan with SearchIO). I've written a short tutorial here: > http://bow.web.id/biopython/Tutorial.html. This is still a draft, > but you can already see that there are some obvious overlaps > between Bio.BLAST and Bio.SearchIO, which is confusing to new > readers. Personally I wouldn't let this consideration block the inclusion of a module as useful like that. Of course I need this code, so I'm biased. I'll have to read up on the namespace discussion. While I see the benefit of using PEP8 names, intuitively I don't like bio.seq.search much. Then again, I started my life in Bio* with BioPerl, and like the pretty similar module layout BioPython has so far. 
Cheers, Kai - -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://www.enigmail.net/ iQEcBAEBAgAGBQJQimU1AAoJEKM5lwBiwTTPLUsH/i1C1jWmSgjk3PZSOo2kpn4l sGfonyZ7UcyOyM1RYMOc9xaJwevyGJbxVpdmhzIsCr8WZ2++uTgqwOKHROw84bu4 BfVTovUD3mNUK3kGEemOQQal8HyjTZozRFmPgQpSSTOOgQE964kA7mm2HJH9sNx9 NHUKj+dk7UwmbzETl2Q0/1lmxdptOVCTyQvwMzleCX4dwgdGumyrNiBQmBLerAKV CRW8cVmVPKkVUokuzWpt6LPZIoUxMz5RVmTJktOX0fpg79ULfXQucByrGtGQbiSR JMWGrK5yCliSz1WqV8r/Tx0VfPmEeiZFyzZb5KiAFE88sJK85cbFgUBegUTDZSU= =372O -----END PGP SIGNATURE----- From w.arindrarto at gmail.com Fri Oct 26 10:38:50 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 26 Oct 2012 12:38:50 +0200 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <508A4B6C.6020801@biotech.uni-tuebingen.de> <508A6535.6070507@biotech.uni-tuebingen.de> Message-ID: >> Also, I think there are still some issues that need to be dealt > > >> with before we put SearchIO into master, notably with Bio.BLAST > >> module. If not the official deprecation notice, at least the the > >> tutorial has to be updated (let Bio.BLAST readers know about the > >> plan with SearchIO). I've written a short tutorial here: > >> http://bow.web.id/biopython/Tutorial.html. This is still a draft, > >> but you can already see that there are some obvious overlaps > >> between Bio.BLAST and Bio.SearchIO, which is confusing to new > >> readers. > > > > Personally I wouldn't let this consideration block the inclusion of a > > module as useful like that. Of course I need this code, so I'm biased.
> > I'm also OK with merging the code before updating the Tutorial > chapter on BLAST (which would probably become a broader > chapter on BLAST and other tools using SearchIO). As discussed > before, the long term aim would be to remove Bio.BLAST. Ah, ok then :). There are other things I'm still working on at the moment (BLAST plain text writer, details about migrating from Bio.Blast), but I consider these to be less urgent than the tutorial. If everyone is ok for merging, then I'm good too :). I suppose we are going to use the 'beta' new feature warning here, right? > > I'll have to read up on the namespace discussion. While I see the > > benefit of using PEP8 names, intuitively I don't like bio.seq.search > > much. Then again, I started my life in Bio* with BioPerl, and like the > > pretty similar module layout BioPython has so far. > > Yeah - the current naming of SeqIO and AlignIO was directly > inspired by BioPerl, and give the working name of SearchIO. > > Peter Reaching a unanimous decision on name preference seems difficult :/. We now have: 1. Bio.seq.search (in line with the namespace change) 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used to be Bio.SeqSearch, now adjusted for PEP8 compliance) 3. Bio.search (same reasoning + explanation like Bio.seqsearch). 4. Bio.SearchIO / Bio.searchio 5. Bio.psearch (p for pairwise) Any other suggestions? Should we put it to a vote? regards, Bowo From p.j.a.cock at googlemail.com Fri Oct 26 10:51:32 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 11:51:32 +0100 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: <508A694B.7030800@biotech.uni-tuebingen.de> References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> Message-ID: On Fri, Oct 26, 2012 at 11:43 AM, Kai Blin wrote: > > Hi folks, > > I realize I'm late to this party, but I was asked to give an opinion > in the SearchIO thread. 
> > On 2012-09-06 09:06, Peter Cock wrote: >> For single user machines, where the single user has only a small >> collection of scripts this isn't such an issue. For any shared >> server, or user with lots of Biopython scripts (some of which may >> have been written by different people), you would be forced into a >> mass change at one go. >> >> You would also have considerable hassle later on with any attempt >> to re-run old scripts. > > In my opinion, this is where python virtualenv [1] can really make > life easier, and I'd recommend this for running old library versions > anyway. > > I'd rather do the correct change now, for every version of python, and > explain to people how to set up virtualenvs for their older scripts. I don't think this is practical - you'd have a *lot* of explaining to do for all the users who'd be bitten by such a big non-backward compatible change (and associated systems administrators). Indirectly it sounds like you like the lower case name idea - what do you think about making this switch under Python 3? (This will only inconvenience the relatively small number of early adopters already trying Biopython under Python 3 - but it would be another bump for people transitioning from Python 2 to 3). Peter From p.j.a.cock at googlemail.com Fri Oct 26 10:57:16 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 11:57:16 +0100 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <508A4B6C.6020801@biotech.uni-tuebingen.de> <508A6535.6070507@biotech.uni-tuebingen.de> Message-ID: On Fri, Oct 26, 2012 at 11:38 AM, Wibowo Arindrarto wrote: >>> Also, I think there are still some issues that need to be dealt >> >> >> with before we put SearchIO into master, notably with Bio.BLAST >> >> module. If not the official deprecation notice, at least the the >> >> tutorial has to be updated (let Bio.BLAST readers know about the >> >> plan with SearchIO). 
I've written a short tutorial here: >> >> http://bow.web.id/biopython/Tutorial.html. This is still a draft, >> >> but you can already see that there are some obvious overlaps >> >> between Bio.BLAST and Bio.SearchIO, which is confusing to new >> >> readers. >> > >> > Personally I wouldn't let this consideration block the inclusion of a >> > module as useful like that. Of course I need this code, so I'm biased. >> >> I'm also OK with merging the code before updating the Tutorial >> chapter on BLAST (which would probably become a broader >> chapter on BLAST and other tools using SearchIO). As discussed >> before, the long term aim would be to remove Bio.BLAST. > > Ah, ok then :). There are other things I'm still working on at the > moment (BLAST plain text writer, details about migrating from > Bio.Blast), but I consider these to be less urgent than the tutorial. > If everyone is ok for merging, then I'm good too :). I suppose we are > going to use the 'beta' new feature warning here, right? Yes to the 'beta' warning. I'd like to get some wider testing with community feedback on the API, while giving us the option to change it before declaring it stable. >> > I'll have to read up on the namespace discussion. While I see the >> > benefit of using PEP8 names, intuitively I don't like bio.seq.search >> > much. Then again, I started my life in Bio* with BioPerl, and like the >> > pretty similar module layout BioPython has so far. >> >> Yeah - the current naming of SeqIO and AlignIO was directly >> inspired by BioPerl, and give the working name of SearchIO. >> >> Peter > > Reaching a unanimous decision on name preference seems difficult :/. > We now have: > > 1. Bio.seq.search (in line with the namespace change) > 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used > to be Bio.SeqSearch, now adjusted for PEP8 compliance) > 3. Bio.search (same reasoning + explanation like Bio.seqsearch). > 4. Bio.SearchIO / Bio.searchio > 5. 
Bio.psearch (p for pairwise) > > Any other suggestions? Should we put it to a vote? I'd like a consensus first on the larger question of whether we should adopt lower case module names automatically under Python 3. In that case, option (1) above would be bio.seq.search under Python 3, and so on. Peter From kai.blin at biotech.uni-tuebingen.de Fri Oct 26 10:43:23 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Fri, 26 Oct 2012 12:43:23 +0200 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> Message-ID: <508A694B.7030800@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2012-09-06 09:06, Peter Cock wrote: Hi folks, I realize I'm late to this party, but I was asked to give an opinion in the SearchIO thread. > For single user machines, where the single user has only a small > collection of scripts this isn't such an issue. For any shared > server, or user with lots of Biopython scripts (some of which may > have been written by different people), you would be forced into a > mass change at one go. > > You would also have considerable hassle later on with any attempt > to re-run old scripts. In my opinion, this is where python virtualenv [1] can really make life easier, and I'd recommend this for running old library versions anyway. I'd rather do the correct change now, for every version of python, and explain to people how to set up virtualenvs for their older scripts. Cheers, Kai [1] http://pypi.python.org/pypi/virtualenv - -- Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://www.enigmail.net/ iQEcBAEBAgAGBQJQimlLAAoJEKM5lwBiwTTPsswIAMnEn4AT8xrfsq3xzkbB6tS2 y5FkLAb11xDP5PpttA+5qDXmnmJuMFqYq8FsSnJnpVq+ZGSAkswFC1prqQp57LdG V+EVZtf/HDzepbrVgNYe272nTPlc6cxjmtjWJca19fg8gKI97ryUiji/bbOfgjgM cnGHeUYkGmrcWrI8ergOS/5qLi3Z6S6t+uJezPT3DkbSm8oiOVAuPrIv6MziX69W QrKF3Edf4s1Do4URSVfZI1qVUEGFaLZMYvZ8/TMgDI2CAQLo0r2OxylrjJxcuqIB nORFTdwFMD7npDLkyG5U4eWZpfAV9A4RHNTybhpb7RgdVHifnoivA0nIAhsIAWE= =3VH6 -----END PGP SIGNATURE----- From kai.blin at biotech.uni-tuebingen.de Fri Oct 26 12:21:21 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Fri, 26 Oct 2012 14:21:21 +0200 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> Message-ID: <508A8041.2020203@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2012-10-26 12:51, Peter Cock wrote: Hi Peter, > Indirectly it sounds like you like the lower case name idea - what > do you think about making this switch under Python 3? (This will > only inconvenience the relatively small number of early adopters > already trying Biopython under Python 3 - but it would be another > bump for people transitioning from Python 2 to 3). Actually, as someone who has to switch between BioPython and BioPerl a lot, I'd personally prefer if both libraries stayed as close as possible in their structure. In my opinion, the ability to easily switch between languages while using the Bio* libraries is one of the biggest features.
As far as I understand we're just changing module names here, so all that'd be different would be the import lines. After reading through this thread, I got the impression that there was a general agreement on switching to PEP8-compatible names eventually, and the remaining question was how to best do that. I haven't played with Python 3 much yet, but I have the impression that switching to it likely is going to be painful anyway. Even if the module renaming makes the transition a bit more painful, at least you've only got to go through the pain once. Assuming the translations between the 2.x and 3.x names can be done automatically by the conversion script, this sounds like a good idea. Cheers, Kai - -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://www.enigmail.net/ iQEcBAEBAgAGBQJQioBBAAoJEKM5lwBiwTTPhxYIALTM1TQvOcE6upSFOCrfA0Uh irgvsQi77JfWvDsvGnOk74+ZQDDM2KGGAR3s9QBPdjRtaXhxSvdSxlXq3sdTNsXh VjbhEkeW6J3NzVSYbwK3U/mP0D9Xs6ihvnne06Nn7qjH+TLGm2x78cPM5SvjUcL3 QHiHda0wW479J9ZyKhmDTsCXqpX96uH3sjLiKZfs3KJbZ79j20BBWJqWypDuIUb7 DmtY/sngRsqs16yJL1Q35LXskOlCYsHOmJmkXg3Umr8gKOSw5nCEszhatXS3Oygo Pv8F7exvoEfNHg1IQtmEFycou9k5IaGVsZoRhCE6YvUCJH4Zfz4eOUTD323AzT4= =UPdn -----END PGP SIGNATURE----- From p.j.a.cock at googlemail.com Fri Oct 26 12:42:25 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 13:42:25 +0100 Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <508A8041.2020203@biotech.uni-tuebingen.de> References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> Message-ID: On Fri, Oct 26, 2012 at 1:21 PM, Kai Blin wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 2012-10-26 12:51, Peter Cock wrote: > > Hi Peter, > >> Indirectly it sounds like you like the lower case name idea - what >> do you think about making this switch under Python 3? (This will >> only inconvenience the relatively small number of early adopters >> already trying Biopython under Python 3 - but it would be another >> bump for people transitioning from Python 2 to 3). > > Actually, as someone who has to switch between BioPython and BioPerl a > lot, I'd personally prefer if both libraries stayed as close as > possible in their structure. In my opinion, the ability to easily > switch between languages while using the Bio* libraries is one of the > biggest features. As far as I understand we're just changing module > names here, so all that'd be different would be the import lines. > > After reading thought this thread, I got the impression that there was > a general agreement on switching to PEP8-compatible names eventually, > and the remaining question was how to best do that. Yes - hindered by the fact that due to file system limitations we can't have multiple capitalisations of a given module at the same time. Ideally we'd like to use bio.* as the namespace, and make this switch as part of moving to Python 3 is one way to do that. My personal preference is for a new lowercase namespace like biopy.* or biopython.* which can co-exist with Bio.* during a transition period. However, this did not seem popular. > I haven't played with Python 3 much yet, but I have the impression > that switching to it likely is going to be painful anyway. 
Even if the > module renaming makes the transition a bit more painful, at least > you've only got to go through the pain once. > > Assuming the translations between the 2.x and 3.x names can be done > automatically by the conversion script, this sounds like a good idea. That was my thinking - but it does go against the general advice to library authors in that API changes from Python 2.x to 3.x are discouraged. We can of course stick with Bio.* as it is (which I believe is Brad's favoured option). And I'm OK with this - it is the simplest option (and doesn't prevent us doing some more minor changes if we want to, such as reorganising all the Bio.SeqXXXX modules under one directory). Perhaps a blog post & email to the announcement mailing list soliciting feedback on this proposal is the best way forward, perhaps with a web-survey form? e.g. (1) Keep the namespace as 'Bio' (2) Keep the namespace as 'Bio' on Python 2, but adopt all lowercase module names on Python 3. (3) Move to a new all lowercase namespace like 'biopy' (anything except 'bio'), allowing the current 'Bio' namespace to continue to be available as well during a transition period. And the most disruptive option: (4) Switch to an all lowercase namespace 'bio', which cannot in general co-exist with the old 'Bio' namespace (perhaps bumping the version number to 2.0.0?). This would break legacy scripts, which would need to be updated, e.g.: from Bio.SeqRecord import SeqRecord from Bio import SeqIO could be replaced by: try: #Biopython 1.x uses Bio.* from Bio.SeqRecord import SeqRecord from Bio import SeqIO except ImportError: This would mean under Windows and most Mac install you cannot have both you (and all other users of the machine) m must be remove Regards, Peter From p.j.a.cock at googlemail.com Fri Oct 26 12:43:36 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 13:43:36 +0100 Subject: [Biopython-dev] PEP8 lower case module names? 
In-Reply-To: References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> Message-ID: Arg - accidentally tabbed to the send button while trying to indent sample code... On Fri, Oct 26, 2012 at 1:42 PM, Peter Cock wrote: > > Perhaps a blog post & email to the announcement mailing list > soliciting feedback on this proposal is the best way forward, > perhaps with a web-survey form? e.g. > > (1) Keep the namespace as 'Bio' > > (2) Keep the namespace as 'Bio' on Python 2, > but adopt all lowercase module names on Python 3. > > (3) Move to a new all lowercase namespace like 'biopy' > (anything except 'bio'), allowing the current 'Bio' namespace > to continue to be available as well during a transition period. > > And the most disruptive option: > > (4) Switch to an all lowercase namespace 'bio', which > cannot in general co-exist with the old 'Bio' namespace > (perhaps bumping the version number to 2.0.0?). This > would break legacy scripts, which would need to be > updated, e.g.: > > from Bio.SeqRecord import SeqRecord > from Bio import SeqIO > > could be replaced by: try: #Biopython 1.x uses Bio.* from Bio.SeqRecord import SeqRecord from Bio import SeqIO except ImportError: > > > > > This would mean under Windows and most Mac install > you cannot have both > you (and all other users of the machine) m > must be remove > > Regards, > > Peter From p.j.a.cock at googlemail.com Fri Oct 26 12:50:23 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 13:50:23 +0100 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> Message-ID: On Fri, Oct 26, 2012 at 1:43 PM, Peter Cock wrote: > Arg - accidentally tabbed to the send button while trying to indent > sample code...
Has something changed on GoogleMail's keyboard handling? Either that or
I'm having a bad typing day... my apologies for the two extra emails.

To continue:

Perhaps a blog post & email to the announcement mailing list
soliciting feedback on this proposal is the best way forward,
perhaps with a web-survey form? e.g.

(1) Keep the namespace as 'Bio'

(2) Keep the namespace as 'Bio' on Python 2,
but adopt all lowercase module names on Python 3.

(3) Move to a new all lowercase namespace like 'biopy'
(anything except 'bio'), allowing the current 'Bio' namespace
to continue to be available as well during a transition period.

And the most disruptive option:

(4) Switch to an all lowercase namespace 'bio', which
cannot in general co-exist with the old 'Bio' namespace
(perhaps bumping the version number to 2.0.0?). This
would break legacy scripts, which would need to be
updated, e.g.:

    from Bio.SeqRecord import SeqRecord
    from Bio import SeqIO

could be replaced by:

    try:
        # Biopython 1.x uses Bio.*
        from Bio.SeqRecord import SeqRecord
        from Bio import SeqIO
    except ImportError:
        # Try the new lowercase module names
        from bio.seqrecord import SeqRecord
        from bio import seqio as SeqIO

Users on Windows and most Mac users might find updating
Biopython complicated during this transition due to the
change in case of the folder names. For anyone installing
from source this might require manual removal of the old
folders (I ran into this kind of issue while trying the lower
case naming under Python 3).

Potentially under Linux (and any Mac using a case sensitive
file system) an old Biopython install using Bio/ and the newer
Biopython using bio/ could co-exist... we would have to look
at that.

Regards,

Peter

From kai.blin at biotech.uni-tuebingen.de Fri Oct 26 13:34:12 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Fri, 26 Oct 2012 15:34:12 +0200 Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> Message-ID: <508A9154.8020507@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2012-10-26 14:42, Peter Cock wrote: > My personal preference is for a new lowercase namespace like > biopy.* or biopython.* which can co-exist with Bio.* during a > transition period. However, this did not seem popular. That'd still mean older scripts would break after the transition period, and we'll end up encoding the language name in the module, which seems a bit silly. Having said that, I see the least amount of pain for BioPython users going that route, with the possibly larger maintenance headache for BioPython developers. I think this is one of these "what color do we paint the bikeshed" discussions, where there really isn't any objectively superior solution. > That was my thinking - but it does go against the general advice to > library authors in that API changes from Python 2.x to 3.x are > discouraged. Right, but from dealing with the python folks on Freenode IRC, I gather that many of them assume the switch from Python 2.x to 3.x is a very low-impact change for code authors. I tend to disagree there. :) > We can of course stick with Bio.* as it is (which I believe is > Brad's favoured option). And I'm OK with this - it is the simplest > option (and doesn't prevent us doing some more minor changes if we > want to, such as reorganising all the Bio.SeqXXXX modules under one > directory). As I said, strong feeling of a bikeshed discussion here. :) > Perhaps a blog post & email to the announcement mailing list > soliciting feedback on this proposal is the best way forward, > perhaps with a web-survey form? e.g. To be honest, I don't care that much about which solution is decided on, as long as the decision is made soon. 
I've got some programs that need the HMMer2 parser that I've added to Bow's SearchIO code, and I'm hoping to get that into BioPython soon instead of having to ship with a custom BioPython for publication. Cheers, Kai - -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://www.enigmail.net/ iQEcBAEBAgAGBQJQipFTAAoJEKM5lwBiwTTP4nkIAI5TegXeWy6b8FoPmq46XPzz iVh6g0t37xAJ9Aat3aE5vDklF7yqEwcVPKxFkj2Nd2MLaDqhfnuldE9pEqbPmZfl eQptF5JXTAlw/YKAPFzTyFSIlKv3wiuTiGeTxKJtXewOkgEu6VwzNgjPnCYhamaT Nda7NQEA6mlmaH7ABwO1mLLObk7i90oqVNDIuhnOAAA1ZrVnnQ4QHRupbiLZVd3d 3od3JVM4h+ZT5AL12Lts9lAdrc94MVri5i0P1VSQEnAQV/LJ5uoT2a4l2DRFM35R NR501X7ubTQPrK8ATveTWaCYYcn/XMnS7dEpvSWsxFR8oM+69LxF3UVtH2ShfDs= =Teym -----END PGP SIGNATURE----- From eric.talevich at gmail.com Fri Oct 26 15:19:23 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 26 Oct 2012 11:19:23 -0400 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <508A4B6C.6020801@biotech.uni-tuebingen.de> <508A6535.6070507@biotech.uni-tuebingen.de> Message-ID: On Fri, Oct 26, 2012 at 6:38 AM, Wibowo Arindrarto wrote: > >> Also, I think there are still some issues that need to be dealt > > > > >> with before we put SearchIO into master, notably with Bio.BLAST > > >> module. If not the official deprecation notice, at least the the > > >> tutorial has to be updated (let Bio.BLAST readers know about the > > >> plan with SearchIO). I've written a short tutorial here: > > >> http://bow.web.id/biopython/Tutorial.html.
This is still a draft, > > >> but you can already see that there are some obvious overlaps > > >> between Bio.BLAST and Bio.SearchIO, which is confusing to new > > >> readers. > > > > > > Personally I wouldn't let this consideration block the inclusion of a > > > module as useful like that. Of course I need this code, so I'm biased. > > > > I'm also OK with merging the code before updating the Tutorial > > chapter on BLAST (which would probably become a broader > > chapter on BLAST and other tools using SearchIO). As discussed > > before, the long term aim would be to remove Bio.BLAST. > Bio.Blast does contain some features beyond parsing the output of BLAST... > > I'll have to read up on the namespace discussion. While I see the > > > benefit of using PEP8 names, intuitively I don't like bio.seq.search > > > much. Then again, I started my life in Bio* with BioPerl, and like the > > > pretty similar module layout BioPython has so far. > > > > Yeah - the current naming of SeqIO and AlignIO was directly > > inspired by BioPerl, and give the working name of SearchIO. > > > > Peter > > Reaching a unanimous decision on name preference seems difficult :/. > We now have: > > 1. Bio.seq.search (in line with the namespace change) > 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used > to be Bio.SeqSearch, now adjusted for PEP8 compliance) > 3. Bio.search (same reasoning + explanation like Bio.seqsearch). > 4. Bio.SearchIO / Bio.searchio > 5. Bio.psearch (p for pairwise) > > Any other suggestions? Should we put it to a vote? > > regards, > Bowo > > If it's down to a vote, I would vote to merge this branch as Bio.SearchIO, and perhaps lowercase it to Bio.searchio or biopy.searchio in the Py3 lowercase branch. Rationale: We already follow BioPerl with SeqIO and AlignIO, and it seems to help users. It's also Google-friendly. 
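If the module did end up as Bio.SearchIO on one branch and under a lowercase name on another, calling code could try each candidate name in turn. The following is only a sketch under that assumption: `import_first` is a hypothetical helper, not part of Biopython, and the lowercase names are just the candidates floated in this thread, not modules known to exist.

```python
import importlib


def import_first(*names):
    """Return the first module in ``names`` that can be imported.

    Hypothetical helper for illustration only. Raises ImportError
    listing every candidate if none of them can be imported.
    """
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This candidate is unavailable; try the next one.
            continue
    raise ImportError("none of these modules could be imported: %s"
                      % ", ".join(names))


if __name__ == "__main__":
    # Stand-in demonstration using standard library names so that it
    # runs without Biopython installed: the first candidate does not
    # exist, so the second one is returned.
    module = import_first("no_such_module_xyz", "json")
    print(module.__name__)
    # With Biopython, the same pattern might read (names assumed):
    #     SearchIO = import_first("Bio.SearchIO", "bio.searchio")
```

The design choice here is to keep a single, stable local name (such as `SearchIO`) in user scripts regardless of which spelling the installed package uses.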
-Eric From p.j.a.cock at googlemail.com Fri Oct 26 15:42:18 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 16:42:18 +0100 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <508A4B6C.6020801@biotech.uni-tuebingen.de> <508A6535.6070507@biotech.uni-tuebingen.de> Message-ID: On Fri, Oct 26, 2012 at 4:19 PM, Eric Talevich wrote: > > Bio.Blast does contain some features beyond parsing the output of BLAST... > Also wrappers to call the tools, and the online search. Easy enough. >> Reaching a unanimous decision on name preference seems difficult :/. >> We now have: >> >> 1. Bio.seq.search (in line with the namespace change) >> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used >> to be Bio.SeqSearch, now adjusted for PEP8 compliance) >> 3. Bio.search (same reasoning + explanation like Bio.seqsearch). >> 4. Bio.SearchIO / Bio.searchio >> 5. Bio.psearch (p for pairwise) >> >> Any other suggestions? Should we put it to a vote? >> >> regards, >> Bowo >> > > If it's down to a vote, I would vote to merge this branch as Bio.SearchIO, > and perhaps lowercase it to Bio.searchio or biopy.searchio in the Py3 > lowercase branch. > > Rationale: We already follow BioPerl with SeqIO and AlignIO, and it > seems to help users. It's also Google-friendly. I like Bio.SearchIO for those reasons too. Perhaps that is the most popular name? Peter From mjldehoon at yahoo.com Fri Oct 26 15:58:04 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 26 Oct 2012 08:58:04 -0700 (PDT) Subject: [Biopython-dev] Status of SearchIO In-Reply-To: Message-ID: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com> > 1. Bio.seq.search (in line with the namespace change) > 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used > to be Bio.SeqSearch, now adjusted for PEP8 compliance) > 3. Bio.search (same reasoning + explanation like Bio.seqsearch). > 4. Bio.SearchIO / Bio.searchio > 5. 
Bio.psearch (p for pairwise) > If it's down to a vote, I would vote to merge this branch as > Bio.SearchIO, and perhaps lowercase it to Bio.searchio or > biopy.searchio in the Py3 lowercase branch. > > Rationale: We already follow BioPerl with SeqIO and AlignIO, and it > seems to help users. It's also Google-friendly. I would vote for Bio.seq.search. I don't like Bio.SearchIO much because a) it doesn't tell you clearly what the module is about; and b) I think it is a mistake to have Bio.SeqIO separate from Bio.Seq, and Bio.AlignIO separate from Bio.Align, because in both cases the two modules conceptually deal with the same thing. We don't have Bio.Cluster and Bio.ClusterIO, Bio.Entrez and Bio.EntrezIO, Bio.Motif and Bio.MotifIO; why should Bio.Seq and Bio.Align be different? -Michiel. From p.j.a.cock at googlemail.com Fri Oct 26 16:14:22 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 17:14:22 +0100 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com> References: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com> Message-ID: On Fri, Oct 26, 2012 at 4:58 PM, Michiel de Hoon wrote: >> 1. Bio.seq.search (in line with the namespace change) >> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used >> to be Bio.SeqSearch, now adjusted for PEP8 compliance) >> 3. Bio.search (same reasoning + explanation like Bio.seqsearch). >> 4. Bio.SearchIO / Bio.searchio >> 5. Bio.psearch (p for pairwise) > >> If it's down to a vote, I would vote to merge this branch as >> Bio.SearchIO, and perhaps lowercase it to Bio.searchio or >> biopy.searchio in the Py3 lowercase branch. >> >> Rationale: We already follow BioPerl with SeqIO and AlignIO, and it >> seems to help users. It's also Google-friendly. > > I would vote for Bio.seq.search.
And would you support moving other existing Bio.SeqXXX modules under Bio.seq.* as for example outlined here?: http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html If so then I think we should go with that plan. > I don't like Bio.SearchIO much because a) it doesn't tell you clearly > what the module is about; and b) I think it it is a mistake to have > Bio.SeqIO separate from Bio.Seq, and Bio.AlignIO separate from > Bio.Align, because in both cases the two modules conceptually deal > with the same thing. We don't have Bio.Cluster and Bio.ClusterIO, > Bio.Entrez and Bio.EntrezIO, Bio.Motif and Bio.MotifIO; why should > Bio.Seq and Bio.Align be different? After all, not everyone was exposed to BioPerl before Biopython ;) Peter From p.j.a.cock at googlemail.com Fri Oct 26 21:19:28 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Oct 2012 22:19:28 +0100 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com> Message-ID: On Fri, Oct 26, 2012 at 5:14 PM, Peter Cock wrote: > On Fri, Oct 26, 2012 at 4:58 PM, Michiel de Hoon wrote: >>> 1. Bio.seq.search (in line with the namespace change) >>> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used >>> to be Bio.SeqSearch, now adjusted for PEP8 compliance) >>> 3. Bio.search (same reasoning + explanation like Bio.seqsearch). >>> 4. Bio.SearchIO / Bio.searchio >>> 5. Bio.psearch (p for pairwise) >> >>> If it's down to a vote, I would vote to merge this branch as >>> Bio.SearchIO, and perhaps lowercase it to Bio.searchio or >>> biopy.searchio in the Py3 lowercase branch. >>> >>> Rationale: We already follow BioPerl with SeqIO and AlignIO, and it >>> seems to help users. It's also Google-friendly. >> >> I would vote for Bio.seq.search. 
> > And would you support moving other existing Bio.SeqXXX > modules under Bio.seq.* as for example outlined here?: > http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html > If so then I think we should go with that plan. I have started exploring that idea on this new branch, https://github.com/peterjc/biopython/tree/bioseq Does anyone object to me applying the first commit to the master branch (defining the previously discussed new warning for 'beta' code)? https://github.com/peterjc/biopython/commit/97485d5dcf2620f7664ae46a7897c1203847538d Note that introducing Bio.seq now (and any relocations under this) can (I believe) still be combined with the lower-case modules under Python 3 idea as well. This just requires the public classes and functions defined under Bio.Seq.* remains mirrored under Bio.Seq.* (this means assorted Seq objects and some functions like translate). Peter From w.arindrarto at gmail.com Fri Oct 26 22:43:45 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Sat, 27 Oct 2012 00:43:45 +0200 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com> Message-ID: >>> 1. Bio.seq.search (in line with the namespace change) >>>> 2. Bio.seqsearch (top-level module, separate from Bio.seq. This used >>>> to be Bio.SeqSearch, now adjusted for PEP8 compliance) >>>> 3. Bio.search (same reasoning + explanation like Bio.seqsearch). >>>> 4. Bio.SearchIO / Bio.searchio >>>> 5. Bio.psearch (p for pairwise) >>> >>>> If it's down to a vote, I would vote to merge this branch as >>>> Bio.SearchIO, and perhaps lowercase it to Bio.searchio or >>>> biopy.searchio in the Py3 lowercase branch. >>>> >>>> Rationale: We already follow BioPerl with SeqIO and AlignIO, and it >>>> seems to help users. It's also Google-friendly. >>> >>> I would vote for Bio.seq.search. 
>> >> And would you support moving other existing Bio.SeqXXX >> modules under Bio.seq.* as for example outlined here?: >> http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html >> If so then I think we should go with that plan. > > I have started exploring that idea on this new branch, > https://github.com/peterjc/biopython/tree/bioseq > > Does anyone object to me applying the first commit to the master > branch (defining the previously discussed new warning for 'beta' code)? > https://github.com/peterjc/biopython/commit/97485d5dcf2620f7664ae46a7897c1203847538d No objection from me for the commit :). But I have some concerns for the SearchIO naming. I like Bio.seqsearch best at the moment. Bio.seq.search is good, but I understand that Bio.SearchIO will eventually contain app wrappers and code for remote searches as well. Putting it three levels-deep doesn't feel nice to me. As comparisons, submodules with similar features (Bio.Phylo, and possibly Bio.AlignIO, if in the future it will be merged with alignment app wrappers and the alignment object model) are available under Bio. > Note that introducing Bio.seq now (and any relocations under this) > can (I believe) still be combined with the lower-case modules under > Python 3 idea as well. This just requires the public classes and > functions defined under Bio.Seq.* remains mirrored under Bio.Seq.* > (this means assorted Seq objects and some functions like translate). 
> > Peter regards, Bow From p.j.a.cock at googlemail.com Sat Oct 27 00:54:47 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 27 Oct 2012 01:54:47 +0100 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com> Message-ID: On Fri, Oct 26, 2012 at 11:43 PM, Wibowo Arindrarto wrote: >Peter wrote: >> I have started exploring that idea on this new branch, >> https://github.com/peterjc/biopython/tree/bioseq >> >> Does anyone object to me applying the first commit to the master >> branch (defining the previously discussed new warning for 'beta' code)? >> https://github.com/peterjc/biopython/commit/97485d5dcf2620f7664ae46a7897c1203847538d > > No objection from me for the commit :). > > But I have some concerns for the SearchIO naming. I like Bio.seqsearch > best at the moment. Bio.seq.search is good, but I understand that > Bio.SearchIO will eventually contain app wrappers and code for remote > searches as well. Putting it three levels-deep doesn't feel nice to > me. As comparisons, submodules with similar features (Bio.Phylo, and > possibly Bio.AlignIO, if in the future it will be merged with > alignment app wrappers and the alignment object model) are available > under Bio. I think we'd get used to the nested namespace pretty quickly, and this really only affects the import line anyway, e.g. something like this isn't so bad as long as we document this: from Bio.seq.search.apps import BlatCommandLine If the namespace nesting bothers you, then you might not like my thoughts for how to combine Bio.Align and Bio.AlignIO (since we can't use Bio.align due to the folder name clash on case-insensitive platforms): I was wondering about using Bio.seq.align for this, which again is a bit nested but would make it a sister module to Bio.seq.search (aka SearchIO) and Bio.seq.record (which could include the former SeqIO code as well as the SeqRecord class).
Peter From eric.talevich at gmail.com Sat Oct 27 04:03:46 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 27 Oct 2012 00:03:46 -0400 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com> Message-ID: On Fri, Oct 26, 2012 at 8:54 PM, Peter Cock wrote: > > If the namespace nesting bothers you, then you might not like > my thoughts for how to combine Bio.Align and Bio.AlignIO > (since we can't use Bio.align due to the folder name clash on > case-insensitive platforms): I was wondering about using > Bio.seq.align for this, which again is a bit nested but would > make it a sister module to Bio.seq.search (aka SearchIO) > and Bio.seq.record (which could include the former SeqIO > code as well as the SeqRecord class). > > Does that mean we'd have read, write, convert, etc. under Bio.seq.record? This is how that API would look:

    from Bio.seq import record
    for rec in record.parse("example.fa", "fasta"): ...

As opposed to:

    # Minor change
    from Bio import seqio
    for record in seqio.parse(...): ...

    # Make sure we get those relative imports right!
    from Bio.seq import io
    for record in io.parse(...): ...

    # Slight cognitive distance, but maybe worth it
    from Bio import seq
    for record in seq.parse(...): ...

Also: Technically, Bio.Motif operates on multiple sequence alignments, so it could be moved to Bio.seq.align.motif. (Not entirely trolling here, just pointing out possible consequences.)
-Eric From w.arindrarto at gmail.com Sat Oct 27 05:55:27 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Sat, 27 Oct 2012 07:55:27 +0200 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com> Message-ID: >> If the namespace nesting bothers you, then you might not like >> my thoughts for how to combine Bio.Align and Bio.AlignIO >> (since we can't use Bio.align due to the folder name clash on >> case incentive platforms): I was wondering about using >> Bio.seq.align for this, which again is a bit nested but would >> make it a system module to Bio.seq.search (aka SearchIO) >> and Bio.seq.record (which could include the former SeqIO >> code as well as the SeqRecord class). >> > Does that mean we'd have read, write, convert, etc. under Bio.seq.record? > This is how that API would look: > > from Bio.seq import record > for rec in record.parse("example.fa", "fasta"): ... > > As opposed to: > > # Minor change > from Bio import seqio > for record in seqio.parse(...) > > # Make sure we get those relative imports right! > from Bio.seq import io > for record in io.parse(...) > > # Slight cognitive distance, but maybe worth it > from Bio import seq > for record in seq.parse(...) > > > Also: Technically, Bio.Motif operates on multiple sequence alignments, so it > could be moved to Bio.seq.align.motif. (Not entirely trolling here, just > pointing out possible consequences.) > > -Eric What bothers me other than it being hidden is also the inconsistency (comparing it to the current namespace). However, if there is also a plan to merge sequence-related submodules under Bio.seq, it feels better and I'm ok with it. Still hidden, but we'll have more consistency and the root namespace will have less clutter. 
So it would look like this (with previously mentioned examples): Bio.SearchIO -> Bio.seq.search Bio.AlignIO -> Bio.seq.align Bio.Motif -> Bio.seq.motif Bio.SeqIO -> Bio.seq (or merge with Bio.SeqRecord into Bio.seq.record) Bio.SeqRecord -> Bio.seq.record Bio.SeqUtils -> Bio.seq.utils Bio.SeqFeature -> Bio.seq.feature Also maybe: Bio.Alphabet -> Bio.seq.alphabet Bio.Restriction -> Bio.seq.restriction or Bio.seq.utils.restriction And Eric is right, we may go further with Bio.seq.align.motif, but I think nesting sequence-related modules under Bio.seq is the furthest we should go. I personally find it the most intuitive. regards, Bow From mjldehoon at yahoo.com Sat Oct 27 10:46:10 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 27 Oct 2012 03:46:10 -0700 (PDT) Subject: [Biopython-dev] Status of SearchIO In-Reply-To: Message-ID: <1351334770.89984.YahooMailClassic@web164003.mail.gq1.yahoo.com> Hi everybody, --- On Fri, 10/26/12, Peter Cock wrote: > And would you support moving other existing Bio.SeqXXX > modules under Bio.seq.* as for example outlined here?: > http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html Yes that looks good to me. > I'm not 100% sure where the Bio.SeqIO top level functions > would belong, either directly under Bio.seq or Bio.seq.record > might work too. I would prefer to have the top-level functions directly under Bio.seq, since they will be used a lot. Best, -Michiel. From mjldehoon at yahoo.com Sat Oct 27 10:47:43 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 27 Oct 2012 03:47:43 -0700 (PDT) Subject: [Biopython-dev] Status of SearchIO In-Reply-To: Message-ID: <1351334863.39503.YahooMailClassic@web164006.mail.gq1.yahoo.com> --- On Sat, 10/27/12, Wibowo Arindrarto wrote: > And Eric is right, we may go further with Bio.seq.align.motif, but I > think nesting sequence-related modules under Bio.seq is the furthest > we should go. I personally find it the most intuitive. I agree. 
And according to the Zen of Python, flat is better than nested. Best, -Michiel. From bartek at rezolwenta.eu.org Sat Oct 27 12:55:12 2012 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Sat, 27 Oct 2012 14:55:12 +0200 Subject: [Biopython-dev] Parsing TRANSFAC matrices with Bio.Motif In-Reply-To: <1351339617.3402.YahooMailClassic@web164005.mail.gq1.yahoo.com> References: <1351339617.3402.YahooMailClassic@web164005.mail.gq1.yahoo.com> Message-ID: Hi Michiel, On Sat, Oct 27, 2012 at 2:06 PM, Michiel de Hoon wrote: > Actually I was thinking about the suggestions for Bio.Motif I made earlier (see http://lists.open-bio.org/pipermail/biopython-dev/2012-September/009946.html). Right now they are just ideas, so I haven't implemented them yet. You mentioned in your reply last month: > >> I'll try to come up with a more thought through and longer response >> later in the week... > Absolutely. It's just that I had quite a crazy time lately (time spent writing proposals and other such stuff...) and I didn't really think too much about Bio.Motif. > So I was wondering if you have any additional comments on these suggestions, or if I can go ahead and start implementing. > I'm sorry if my inactivity has slowed things down. I'll try to be more constructive this time. I think one thing that is clear is that Bio.Motif could use some code optimization, especially in the area of PWM searching. Honestly, I don't think that there will be a time in the foreseeable future that I'll do it, so if you feel like implementing better code for PWM handling/searching I'll be happy to do some code review or testing. There are a few things I think would be good to keep: - possibility to invoke motif.pwm_search(...)
without worrying about the fact that it is actually carried out by some specialized class - possibility to determine motif thresholds based on fpr or fnr as currently implemented in Bio.Motif.Thresholds module - possibility to convert count based motifs to PWM based motifs without much fuss... All of these things are not really in conflict with your idea of moving the PWM related code to the special class, so if you want to do that, go ahead. If you also have trouble finding time to implement these improvements, I could try to recruit some master student from our department to do that. But if you have time to do the implementation yourself, it will probably be better and faster that way. best Bartek -- Bartek Wilczynski From mjldehoon at yahoo.com Sun Oct 28 02:47:15 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 27 Oct 2012 19:47:15 -0700 (PDT) Subject: [Biopython-dev] Parsing TRANSFAC matrices with Bio.Motif In-Reply-To: Message-ID: <1351392435.42713.YahooMailClassic@web164006.mail.gq1.yahoo.com> Hi Bartek, OK, thanks! I'll go ahead with the implementation then, and write an update to the mailing list again so people can have a look at it. Best, -Michiel. --- On Sat, 10/27/12, Bartek Wilczynski wrote: > From: Bartek Wilczynski > Subject: Re: [Biopython-dev] Parsing TRANSFAC matrices with Bio.Motif > To: "Michiel de Hoon" > Cc: "BioPython-Dev" > Date: Saturday, October 27, 2012, 8:55 AM > Hi Michiel, > > On Sat, Oct 27, 2012 at 2:06 PM, Michiel de Hoon > wrote: > > > Actually I was thinking about the suggestions for > Bio.Motif I made earlier (see http://lists.open-bio.org/pipermail/biopython-dev/2012-September/009946.html). > Right now they are just ideas, so I haven't implemented them > yet. You mentioned in your reply last month: > > > >> I'll try to come up with a more thought through and > longer response > >> later in the week... > > > > Absolutely. 
It's just that I had quite a crazy time lately > (time spent > writing proposals and other such stuff...) and I didn't > really think > too much about Bio.Motif. > > > So I was wondering if you have any additional comments > on these suggestions, or if I can go ahead and start > implementing. > > > > I'm sorry if my inactivity has slowed things down. I'll try > to be more > constructive this time. > > I think that one thing clear is the Bio.Motif could use some > code > optimization, especially in the area of PWM searching. > Honestly, I > don't think that there will be a time in a forseeable future > that I'll > do it, so if you feel like implementing a better code for > PWM > handling/searching I'll be happy to do some code review or > testing. > > There are a few things I think would be good to keep: > - possibility to invoke motif.pwm_search(...) without > worrying about > the fact that it is actually carried out by some specialized > class > - possibility to determine motif thresholds based on fpr or > fnr as > currently implemented in Bio.Motif.Thresholds module > - possibility to convert count based motifs to PWM based > motifs > without much fuss... > > All of these things are not really in conflict with your > idea of > moving the PWM related code to the special class, so if you > want to do > that, go ahead. > > If you also have trouble finding time to implement these > improvements, > I could try to recruit some master student from our > department to do > that. But if you have time to do the implementation > yourself, it will > probably be better and faster that way. 
> > best > Bartek > > -- > Bartek Wilczynski > From chapmanb at 50mail.com Sun Oct 28 18:55:31 2012 From: chapmanb at 50mail.com (Brad Chapman) Date: Sun, 28 Oct 2012 14:55:31 -0400 Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs In-Reply-To: References: Message-ID: <87sj8ys9y4.fsf@fastmail.fm> Connor; > I remember this being resolved, but when I try to install biopython with > pip, it fails: Thanks for the report. It looks like the command line options pip uses to call setup.py changed a bit, so the hack we have in place is no longer working. I pushed a fix for this: https://github.com/chapmanb/biopython/commit/e05a355e3e9825c44c4a9b3bdfdda25c9a92c9c4 which seems to resolve the issue and hopefully make it more robust going forward. Could you confirm it works on your system: $ cd /tmp $ git clone git://github.com/chapmanb/biopython.git $ sudo pip install /tmp/biopython If so, I'll push this into the main repo for the next release. Thanks again for letting us know about the problem, Brad From chapmanb at 50mail.com Sun Oct 28 19:02:54 2012 From: chapmanb at 50mail.com (Brad Chapman) Date: Sun, 28 Oct 2012 15:02:54 -0400 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> Message-ID: <87pq42s9lt.fsf@fastmail.fm> Peter and all; Interesting discussion on the module path issues. I'm agreed with everyone that it would be nice to be pep8 compliant. However, my vote would be to stick with our traditional namespace to avoid widespread breakage. The changes everyone is proposing are nice, but not nice enough to deal with introducing an incompatible version and the documentation and help fallout from that. If everyone wants to go down the module name path, it would be worth investing in a biopython1to2 script that automatically handles the renamings for folks. 
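To make the biopython1to2 idea a little more concrete, here is a minimal sketch of what the core of such a script might look like. Everything in it is an assumption for illustration: the rename table simply reuses a few names floated in this thread (none of which had been agreed on), and `rewrite_dotted_names` is an invented helper, not part of Biopython. It only rewrites fully dotted module names in source text and would not catch imports of the form `from Bio import SeqIO`.

```python
import re

# Hypothetical old -> new module names, borrowed from proposals in this
# thread. None of these renames were agreed on; they are placeholders.
RENAMES = {
    "Bio.SeqRecord": "Bio.seq.record",
    "Bio.SeqUtils": "Bio.seq.utils",
    "Bio.SeqFeature": "Bio.seq.feature",
    "Bio.SearchIO": "Bio.seq.search",
    "Bio.AlignIO": "Bio.seq.align",
}

# Try longer names first so a shorter key can never shadow a longer one.
_ALTERNATION = "|".join(
    re.escape(old) for old in sorted(RENAMES, key=len, reverse=True)
)

# The lookbehind rejects names that are part of a longer dotted path or
# identifier (e.g. MyBio.SeqUtils); the lookahead rejects longer module
# names that merely start with a known one (e.g. Bio.SeqUtilsExtra).
_PATTERN = re.compile(r"(?<![\w.])(" + _ALTERNATION + r")(?!\w)")


def rewrite_dotted_names(source):
    """Return source text with each known old module name replaced."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)


if __name__ == "__main__":
    print(rewrite_dotted_names("from Bio.SeqUtils import GC"))
```

A real tool would more likely work on the parsed syntax tree, the way 2to3 does, so that strings and comments are left alone; the regular expression here is only meant to show the shape of the mapping.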
Just my 2 cents, Brad From p.j.a.cock at googlemail.com Mon Oct 29 08:15:59 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 29 Oct 2012 08:15:59 +0000 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: <87pq42s9lt.fsf@fastmail.fm> References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> <87pq42s9lt.fsf@fastmail.fm> Message-ID: On Sunday, October 28, 2012, Brad Chapman wrote: > > Peter and all; > Interesting discussion on the module path issues. I'm agreed with > everyone that it would be nice to be pep8 compliant. However, my vote > would be to stick with our traditional namespace to avoid widespread > breakage. The changes everyone is proposing are nice, but not nice > enough to deal with introducing an incompatible version and the > documentation and help fallout from that. > > If everyone wants to go down the module name path, it would be worth > investing in a biopython1to2 script that automatically handles the > renamings for folks. > > Just my 2 cents, > Brad > Hi Brad, In the case of Bow's SearchIO code, what would you prefer? e.g. Bio.SearchIO as it is now on his branch? Peter From kai.blin at biotech.uni-tuebingen.de Mon Oct 29 10:26:03 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Mon, 29 Oct 2012 11:26:03 +0100 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <508A4B6C.6020801@biotech.uni-tuebingen.de> Message-ID: <508E59BB.1050705@biotech.uni-tuebingen.de> On 2012-10-26 11:33, Wibowo Arindrarto wrote: Hi Bow, Peter, > For the merge conflict, which branch are you using? Can you point > to specific commits that cause the conflicts? I haven't tried > merging / rebasing my own branch to the current master myself ~ so > knowing this should help the process as well.
Disregarding the namespace discussion, I needed a reasonable branch on which to bring my HMMer2 parser up to date. As I said last week I tried rebasing Bow's searchio branch and had a bunch of merge conflicts. I've retried the rebase today, and most of the merge conflicts are actually pretty trivial and mostly around the question of where the code gets its OrderedDict from for Python versions < 2.7. I've pushed the rebased patchset to https://github.com/kblin/biopython/tree/searchio-rebase if anybody wants to have a look. With the last patch fixing an error I seem to have introduced during merge conflict resolution, the SearchIO tests pass on that branch. Cheers, Kai -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From cmccoy at fhcrc.org Mon Oct 29 15:24:45 2012 From: cmccoy at fhcrc.org (Connor McCoy) Date: Mon, 29 Oct 2012 08:24:45 -0700 Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs In-Reply-To: <87sj8ys9y4.fsf@fastmail.fm> References: <87sj8ys9y4.fsf@fastmail.fm> Message-ID: Hi Brad, Thank you so much for the quick reply. I just got a chance to test this, and it seems to be working again.
Best, Connor On Sun, Oct 28, 2012 at 11:55 AM, Brad Chapman wrote: > > Connor; > > > I remember this being resolved, but when I try to install biopython with > > pip, it fails: > > Thanks for the report. It looks like the command line options pip uses > to call setup.py changed a bit, so the hack we have in place is no > longer working. I pushed a fix for this: > > > https://github.com/chapmanb/biopython/commit/e05a355e3e9825c44c4a9b3bdfdda25c9a92c9c4 > > which seems to resolve the issue and hopefully make it more robust going > forward. Could you confirm it works on your system: > > $ cd /tmp > $ git clone git://github.com/chapmanb/biopython.git > $ sudo pip install /tmp/biopython > > If so, I'll push this into the main repo for the next release. Thanks > again for letting us know about the problem, > Brad > -- Connor McCoy Fred Hutchinson Cancer Research Center 1100 Fairview Ave N. Seattle, WA 98109-1924 cmccoy at fhcrc.org From chapmanb at 50mail.com Mon Oct 29 17:54:30 2012 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 29 Oct 2012 13:54:30 -0400 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> <87pq42s9lt.fsf@fastmail.fm> Message-ID: <874nldqi3t.fsf@fastmail.fm> Peter; > In the case of Bow's SearchIO code, what would you prefer? > e.g. Bio.SearchIO as it is now on his branch? I like plain ol' Search the best but don't have a strong preference. I'm terrible at naming things so trust everyone's judgment on this. 
Brad From w.arindrarto at gmail.com Mon Oct 29 20:11:09 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Mon, 29 Oct 2012 21:11:09 +0100 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: <508E59BB.1050705@biotech.uni-tuebingen.de> References: <508A4B6C.6020801@biotech.uni-tuebingen.de> <508E59BB.1050705@biotech.uni-tuebingen.de> Message-ID: Hi Kai, > > For the merge conflict, which branch are you using? Can you point > > to specific commits that cause the conflicts? I haven't tried > > merging / rebasing my own branch to the current master myself ~ so > > knowing this should help the process as well. > > Disregarding the namespace discussion, I needed to get a reasonable > branch to get my HMMer2 parser up-to-date in. As I said last week I > tried rebasing Bow's searchio branch and had a bunch of merge conflicts. > > I've retried the rebase today, and most of the merge conflicts are > actually pretty trivial and mostly around the question where the code > gets it's OrderedDict from for python versions < 2.7. > > I've pushed the rebased patchset to > https://github.com/kblin/biopython/tree/searchio-rebase if anybody > wants to have a look. With the last patch fixing an error I seem to > have introduced during merge conflict resolution, the SearchIO tests > pass on that branch. Thanks for doing the rebase :)! I just checked it and everything looks fine; all unit tests + doctests pass. On another note, I was wondering about how to combine this rebased branch with my local branch. Is there a simple way to apply the changes in the rebased branch to my local working searchio branch or should I just switch to a local checkout of the rebased branch? 
regards, Bow From kai.blin at biotech.uni-tuebingen.de Mon Oct 29 20:43:49 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Mon, 29 Oct 2012 21:43:49 +0100 Subject: [Biopython-dev] Working with the new SearchIO API Message-ID: <508EEA85.6060906@biotech.uni-tuebingen.de> Hi Bow, I've been looking closer at the SearchIO API changes introduced in August. I think there still is a design problem with the object model, at least when looking at how this affects the hmmer3 parser (and affects the hmmer2 parsing as well). Possibly I'm not seeing the big picture here, so let me explain what I'm seeing, and then you can tell me what I missed. :) So, the hmmer2 and hmmer3 file format basically looks like this:

    # header
    # ...
    # ...
    information about the query
    list of hits
    list of hsps
    (alignments for hsps)
    (some statistics)
    //

Now, when parsing this file line-wise, you obviously run into the hits first. However, with the new API, you can't create a Hit object without knowing the HSPs, but you haven't read them yet. To work around this, you need to create a fake hit object (https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/HmmerIO/hmmer3_text.py#L201). Then, in the loop that creates the fake hit objects, one of the exit conditions parses the HSP entries and replaces the fake hit objects with "real" Hit objects. (https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/HmmerIO/hmmer3_text.py#L188) By the way, that code is a bit misleading. Took me a while to notice the switch of the list's contents. Anyway, back to business. So basically you need to create two hit objects for every hit you're looking at. What's the advantage of forcing Hsp objects to be passed to the Hit constructor? Just to make sure your Hit objects have a valid Hsp at some later point? I'm aware that I'm just looking at the SearchIO design from the perspective of the hmmer2 parser, but I'd like to understand the reasons for the API being the way it currently is.
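For readers following along, the parsing order described above can be handled without placeholder objects by keeping the hit table as plain data until the HSPs have been read. The following is only a sketch under stated assumptions: `Hit` here is a hypothetical stand-in class, not the SearchIO `Hit`, and the dictionary keys `id` and `hit_id` are invented for the example.

```python
from collections import OrderedDict


class Hit(object):
    """Hypothetical stand-in for a Hit that requires its HSPs up front."""

    def __init__(self, hit_id, hsps):
        if not hsps:
            raise ValueError("Hit %r needs at least one HSP" % hit_id)
        self.id = hit_id
        self.hsps = list(hsps)


def build_hits(hit_rows, hsp_rows):
    """Build Hit objects only after all HSP rows are known.

    hit_rows: dicts parsed from the hit table, each with an 'id' key.
    hsp_rows: dicts parsed from the HSP table, each with a 'hit_id' key.
    """
    # First pass: remember the hits as plain data, keeping file order.
    pending = OrderedDict((row["id"], []) for row in hit_rows)
    # Second pass: attach each HSP row to the hit it belongs to.
    for hsp in hsp_rows:
        pending[hsp["hit_id"]].append(hsp)
    # Only now construct the real objects, one per hit.
    return [Hit(hit_id, hsps) for hit_id, hsps in pending.items()]


hits = build_hits(
    [{"id": "a"}, {"id": "b"}],
    [{"hit_id": "b"}, {"hit_id": "a"}, {"hit_id": "a"}],
)
```

The trade-off is that a hit listed in the table without any HSP raises an error in this sketch, so a real parser would have to decide how to treat such entries.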
Hope you can shed some light on this,
Kai

--
Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28 Phone : ++49 7071 29-78841
D-72076 Tübingen Fax : ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From kai.blin at biotech.uni-tuebingen.de Mon Oct 29 20:47:11 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Mon, 29 Oct 2012 21:47:11 +0100
Subject: [Biopython-dev] Status of SearchIO
In-Reply-To:
References: <508A4B6C.6020801@biotech.uni-tuebingen.de> <508E59BB.1050705@biotech.uni-tuebingen.de>
Message-ID: <508EEB4F.7050607@biotech.uni-tuebingen.de>

On 2012-10-29 21:11, Wibowo Arindrarto wrote:

Hi Bow,

> On another note, I was wondering about how to combine this rebased
> branch with my local branch. Is there a simple way to apply the
> changes in the rebased branch to my local working searchio branch or
> should I just switch to a local checkout of the rebased branch?

Well, you could rebase your local changes on top of the rebased branch.
:) Or, depending on how many changes you have in your local branch,
check out the rebased branch and then git cherry-pick your changes on
top of the rebased branch.

Cheers,
Kai

--
Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28 Phone : ++49 7071 29-78841
D-72076 Tübingen Fax : ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From w.arindrarto at gmail.com Mon Oct 29 22:55:19 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Mon, 29 Oct 2012 23:55:19 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <508EEA85.6060906@biotech.uni-tuebingen.de>
References: <508EEA85.6060906@biotech.uni-tuebingen.de>
Message-ID:

Hi Kai,

Thanks for the input & comments! I made the API change mainly because
I want to keep the SearchIO object hierarchy more consistent, i.e.
there should be as few places as possible to make changes that break
the model.

There are several attributes that should remain the same between a
single QueryResult object and the Hits, HSPs, and HSPFragments it
contains. For now, these attributes are the ID (both query and hit ID)
and description (also for both query and hit).

In the old API, each object in the object model hierarchy stores these
values as its own attribute. For example, to store the ID of the Hit
object, the old API has the 'id' attribute in the Hit object, 'hit_id'
attribute in all HSP objects it contains, and 'hit_id' attributes in
all HSPFragment objects contained by each HSP in the Hit. I see this as
unnecessary duplication and a possible source of confusion, since these
attributes are completely decoupled from one another even though they
mean the same thing.

The new API stores these values only at the innermost object in the
hierarchy (the HSPFragment), reducing duplications and possible sources
of inconsistencies.
When you access the attributes from objects other than the HSPFragment,
a getter retrieves it from one of the contained HSPFragment objects,
after ensuring that all HSPFragment objects contain the same value of
the attribute
(https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/_utils.py#L99).
Similarly, when you set the attribute, a setter applies the new value
to all HSPFragment objects contained
(https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/_utils.py#L106).

This allows you to keep the values consistent across the hierarchy, so
long as the change is done at the highest level possible (e.g.
changing the hit ID in the HSP object will break consistency, but
changing hit ID through the Hit object will update the hit_id
attribute value across all HSPs it contains). Conceptually, this is
also closer to the real 'Hit' object we're modeling since we always
need at least one HSP to declare a database entry as a Hit.

The HMMER parser's update is partially influenced by this API change,
as you've seen. In the previous version
(https://github.com/bow/biopython/blob/12fbe05c5e17f7a356ab672358b2698612aa8cad/Bio/SearchIO/HmmerIO/hmmertext.py),
the HMMER parser has several ugly bits (e.g. it sets the hit
description in more than one place, a possible source of error). After
changing the API to force the creation of Hits with HSPs, these kinds
of duplications are eliminated. I personally also feel that using the
new API allows me (sometimes forces me) to improve the other format's
parsers in a similar way.

It's unfortunate that the HMMER text parser is made a little difficult
to understand, due to the way HMMER arranges the text output format.
And I admit I didn't do any performance benchmark for the HMMER text
parser when I made the change (I suspected one extra dictionary per
Hit object should not decrease performance that much. Of course, if
the change proves to cause severe performance penalties, then yes, we
should look into it again.).
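The getter/setter cascade described above could be sketched roughly as
follows. This is only an illustration under stated assumptions: the
cascading() helper, the _items attribute, and the three small classes are
hypothetical names made up for this sketch and are not copied from
Bio/SearchIO/_utils.py.

```python
# Hypothetical illustration of a cascading attribute; names are made up.
class HSPFragment:
    def __init__(self, hit_id):
        # The innermost object is the only place the value is stored.
        self.hit_id = hit_id


def cascading(attr):
    """Return a property that reads and writes `attr` on all contained items."""
    def getter(self):
        values = {getattr(item, attr) for item in self._items}
        if len(values) != 1:
            raise ValueError("inconsistent or missing %r values" % attr)
        return values.pop()

    def setter(self, value):
        for item in self._items:
            setattr(item, attr, value)

    return property(getter, setter)


class HSP:
    hit_id = cascading("hit_id")

    def __init__(self, fragments):
        self._items = list(fragments)


class Hit:
    # Reading Hit.id asks every HSP, which in turn asks its fragments.
    id = cascading("hit_id")

    def __init__(self, hsps):
        self._items = list(hsps)
```

Setting the ID through the Hit updates every fragment below it, while
setting it on a single HSP leaves its siblings untouched, so the next read
through the Hit raises ValueError. That matches the "change it at the
highest level possible" caveat above.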
But for now, I think these are acceptable tradeoffs, if it means the object model becomes more consistent and the other format parsers improved as well. Hope that helps :). regards, Bow P.S. As for the misleading part, yes, I admit that maybe a different name should be used to note that the contents of the list differ. On Mon, Oct 29, 2012 at 9:43 PM, Kai Blin wrote: > Hi Bow, > > I've been looking closer at the SearchIO API changes introduced in > August. I think there still is a design problem with the object model, > at least when looking at how this affects the hmmer3 parser (and affects > the hmmer2 parsing as well). > > Possibly I'm not seeing the big picture here, so let me explain what I'm > seeing, and then you can tell me what I missed. :) > > So, the hmmer2 and hmmer3 file format basically looks like this > > # header > # ... > # ... > > information about the query > > list of hits > > list of hsps > > (alignments for hsps) > > (some statistics) > // > > Now, when parsing this file line-wise, you obviously run into the hits > first. However, with the new API, you can't create a Hit object without > knowing the HSPs, but you haven't read them yet. > > To work around this, you need to create a fake hit object > (https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/HmmerIO/hmmer3_text.py#L201). > Then, in the loop that creates the fake hit objects, one of the exit > conditions then parses the HSP entries and then replaces the fake hit > objects by "real" Hit objects. > (https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/HmmerIO/hmmer3_text.py#L188) > > By the way, that code is a bit misleading. Took me a while to notice the > switch of the list's contents. Anyway, back to business. > > So basically you need to create two hit objects for every hit you're > looking at. What's the advantage of forcing Hsp objects to be passed to > the Hit constructor? Just to make sure your Hit objects have a valid Hsp > at some later point? 
> > I'm aware that I'm just looking at the SearchIO design from the > perspective of the hmmer2 parser, but I'd like to understand the reasons > for the API being the way it currently is. > > Hope you can shed some light on this, > Kai > > -- > Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de > Institute for Microbiology and Infection Medicine > Division of Microbiology/Biotechnology > Eberhard-Karls-University of T?bingen > Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 > D-72076 T?bingen Fax : ++49 7071 29-5979 > Deutschland > Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From kai.blin at biotech.uni-tuebingen.de Tue Oct 30 07:35:40 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Tue, 30 Oct 2012 08:35:40 +0100 Subject: [Biopython-dev] Working with the new SearchIO API In-Reply-To: References: <508EEA85.6060906@biotech.uni-tuebingen.de> Message-ID: <508F834C.6010404@biotech.uni-tuebingen.de> On 2012-10-29 23:55, Wibowo Arindrarto wrote: Hi Bow, > Thanks for the input & comments! I made the API change mainly because > I want to keep the SearchIO object hierarchy more consistent, i.e. > there should be as few places as possible to make changes that break > the model. Thanks for the explanation. ... > This allows you to keep the values consistent across the hierarchy, so > long as the change is done at the highest level possible (e.g. > changing the hit ID in the HSP object will break consistency, but > changing hit ID through the Hit object will update the hit_id > attribute value across all HSPs it contains). Conceptually, this is > also closer to the real 'Hit' object we're modeling since we always > need at least one HSP to declare a database entry as a Hit. I see. I didn't think about the programmatic side of things. I see the advantage of having only one attribute there and of keeping it consistent. > The HMMER parser's update is partially influenced by this API change, > as you've seen. 
In the previous version
> (https://github.com/bow/biopython/blob/12fbe05c5e17f7a356ab672358b2698612aa8cad/Bio/SearchIO/HmmerIO/hmmertext.py),
> the HMMER parser has several ugly bits (e.g. it sets the hit
> description in more than one place, a possible source of error). After
> changing the API to force the creation of Hits with HSPs, these kinds
> of duplications are eliminated. I personally also feel that using the
> new API allows me (sometimes forces me) to improve the other format's
> parsers in a similar way.

Arguably, the more human-readable the file you need to parse, the less
readable the parser tends to be. ;) I think the old parser was a more
straightforward piece of code.

> It's unfortunate that the HMMER text parser is made a little difficult
> to understand, due to the way HMMER arranges the text output format.
> And I admit I didn't do any performance benchmark for the HMMER text
> parser when I made the change (I suspected one extra dictionary per
> Hit object should not decrease performance that much. Of course, if
> the change proves to cause severe performance penalties, then yes, we
> should look into it again.).

I'm not talking about performance here, performance likely isn't a
problem. I'm saying that you're conceptually creating the Hit object
twice. Even the comment in line 200 says so. :)

[snip]
    # create the hit object
    hit_attrs = {
        'id': row[8],
        'query_id': qid,
        'evalue': float(row[0]),
        'bitscore': float(row[1]),
        'bias': float(row[2]),
        # row[3:6] is not parsed, since the info is available
        # at the HSP level
        'domain_exp_num': float(row[6]),
        'domain_obs_num': int(row[7]),
        'description': row[9],
        'is_included': is_included,
    }
    hit_list.append(hit_attrs)
[snip]

I'm mainly wondering why at this position, I can't just create the Hit
object already, and then later set the HSPs. You could do this via a
setter function that validates the IDs are identical if you want to
make sure you're not shooting yourself in the foot there.
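A minimal sketch of that alternative, assuming a Hit that may start out
empty and an add_hsp() method that checks the IDs. Both classes and the
add_hsp() name are hypothetical and exist only for this illustration; they
are not part of the SearchIO branch.

```python
from collections import OrderedDict


# Hypothetical classes for illustration; not the SearchIO object model.
class HSP:
    def __init__(self, hit_id, score):
        self.hit_id = hit_id
        self.score = score


class Hit:
    def __init__(self, id, description=""):
        self.id = id
        self.description = description
        self.hsps = []  # filled in later, once the HSP table is parsed

    def add_hsp(self, hsp):
        # Validate on the way in, so a mismatched HSP cannot sneak into a Hit.
        if hsp.hit_id != self.id:
            raise ValueError("HSP belongs to %r, not to hit %r"
                             % (hsp.hit_id, self.id))
        self.hsps.append(hsp)


def build_hits(hit_rows, hsp_rows):
    """Create Hits while reading the hit table, attach HSPs afterwards."""
    hits = OrderedDict()  # keeps the order of the hit table
    for hit_id, description in hit_rows:
        hits[hit_id] = Hit(hit_id, description)
    for hit_id, score in hsp_rows:
        hits[hit_id].add_hsp(HSP(hit_id, score))
    return list(hits.values())
```

With this shape each hit is created exactly once, and the check in
add_hsp() provides the ID validation mentioned above. The price is that a
Hit can exist without any HSP for a while, which is the state the current
constructor is designed to rule out.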
> But for now, I think these are acceptable tradeoffs, if it means the > object model becomes more consistent and the other format parsers > improved as well. I haven't looked into the other parsers, so I'll take your word on that. I can of course take the same detour of creating a placeholder hit object for the first pass and then when I've parsed the HSPs create the real Hit object. If this makes all the other parsers more readable at the cost of some obscurity in the hmmer text parsers, well, so be it. Cheers, Kai -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-University of T?bingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 T?bingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From p.j.a.cock at googlemail.com Tue Oct 30 10:59:44 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 30 Oct 2012 10:59:44 +0000 Subject: [Biopython-dev] Status of SearchIO In-Reply-To: References: <1351267084.92577.YahooMailClassic@web164001.mail.gq1.yahoo.com> Message-ID: On Fri, Oct 26, 2012 at 11:43 PM, Wibowo Arindrarto wrote: >> >> I have started exploring that idea on this new branch, >> https://github.com/peterjc/biopython/tree/bioseq >> >> Does anyone object to me applying the first commit to the master >> branch (defining the previously discussed new warning for 'beta' code)? >> https://github.com/peterjc/biopython/commit/97485d5dcf2620f7664ae46a7897c1203847538d > > No objection from me for the commit :). > Done, commit adding Bio.BiopythonExperimentalWarning cherry-picked to the master, https://github.com/biopython/biopython/commit/52ac4383b12335ebcdcb8ea52eec8d23ac28b5e2 Peter From p.j.a.cock at googlemail.com Tue Oct 30 11:03:07 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 30 Oct 2012 11:03:07 +0000 Subject: [Biopython-dev] PEP8 lower case module names? 
In-Reply-To: <874nldqi3t.fsf@fastmail.fm>
References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> <87pq42s9lt.fsf@fastmail.fm> <874nldqi3t.fsf@fastmail.fm>
Message-ID:

On Mon, Oct 29, 2012 at 5:54 PM, Brad Chapman wrote:
>
> Peter;
>
>> In the case of Bow's SearchIO code, what would you prefer?
>> e.g. Bio.SearchIO as it is now on his branch?
>
> I like plain ol' Search the best but don't have a strong preference. I'm
> terrible at naming things so trust everyone's judgment on this.
>
> Brad

Since we have no clear consensus, I propose we add Bow's code as
Bio.SearchIO (which is how it is written right now), with the new
BiopythonExperimentalWarning in place (to alert people that it may
change in the next release). We can then rename or move it at a later
date.

This will make it easier for people to test the code, and also suggest
further changes or additions (e.g. Kai's HMMER work).

If and when we agree on a consolidation of the Bio.SeqXXX modules, then
Bio.SearchIO could move too. If this happens before any public release
as Bio.SearchIO, so much the better.

Adopting lower case module names under Python 3 is also a separate
issue.

Peter

From kai.blin at biotech.uni-tuebingen.de Tue Oct 30 14:17:38 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Tue, 30 Oct 2012 15:17:38 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <508EEA85.6060906@biotech.uni-tuebingen.de>
References: <508EEA85.6060906@biotech.uni-tuebingen.de>
Message-ID: <508FE182.3040202@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-10-29 21:43, Kai Blin wrote:

Hi Bow,

one more thing: Hmmer2 has the concept of an accession number in the
result. Is there an attribute for that in the QueryResult object that
I'm missing, or do we want a new attribute for that? Would "accession"
be a good name?

Cheers,
Kai

- --
Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universit?t T?bingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 T?bingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://www.enigmail.net/ iQEcBAEBAgAGBQJQj+GCAAoJEKM5lwBiwTTPaT4IAJb+Xs7sMPpQH4SwUQarItyP Cg0UYLQNRtKBlyhNpipCbz7BWfqxd8fU0GsYSCVF275fDuBLUa337A6psRzefkWa 84cC7uHmOdcmhyeCipdAs5Jtouxf7ReGuQ+m3/SsW0pRfMHOuZamKw+5+oETnisM DiHJUv6iKMHCpXrVWpofcKywqb1uqpxdhTp9F1gy+v6rVGKMI4r/fW5mRQZVxC3s aQdhubCHoN+LUEo/OUKIF6cNeHWLMBToENdYlBhk62gLeSX5bxyhog21pzD+HTYf 5u4rPC2ikVR7iGQ9QPsvW7r7lqpDgoxFbnDYzcsAa+bNYd6+ENs+MAePb8Va2Dg= =Luz9 -----END PGP SIGNATURE----- From kai.blin at biotech.uni-tuebingen.de Tue Oct 30 15:54:50 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Tue, 30 Oct 2012 16:54:50 +0100 Subject: [Biopython-dev] Working with the new SearchIO API In-Reply-To: <508F834C.6010404@biotech.uni-tuebingen.de> References: <508EEA85.6060906@biotech.uni-tuebingen.de> <508F834C.6010404@biotech.uni-tuebingen.de> Message-ID: <508FF84A.2020802@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2012-10-30 08:35, Kai Blin wrote: Hi Bow, > I'm mainly wondering why at this position, I can't just create the > Hit object already, and then later set the HSPs. You could do this > via a setter function that validates the IDs are identical if you > want to make sure you're not shooting yourself in the foot there. I've just stumbled over a case where not being able to pre-create Hit objects really bites me. See the attached hmmpfam output. You'll notice that the domain table is not in the order of the hit table. 
As I'd like to preserve the order of the hit table, the current setup of the API forces me to either repeatedly parse the domain annotations until I find the correct domain annotations for my hit, or to create the hits in the order of the domain annotation table and then reshuffle them to make sure they're in the order of the hit table. If I could just create "empty" hit objects when parsing the hit table, I could easily preserve the order of the hits but still add the hsps as I parse them. Cheers, Kai - -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universit?t T?bingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 T?bingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://www.enigmail.net/ iQEcBAEBAgAGBQJQj/hKAAoJEKM5lwBiwTTPWTYH/2miexrfxolw9J0tOCSHXFYn eNEzLcIM8ZHUoBCL1fsS/9166VH8D8HpyZCgTQwsSt9BUhQbjkwTmyfmP9wr0QDp 80IbxqWkMAJmDv3Q1RxbVVmD8TTfY6AwezQuwnYb8EFJDD7wvcJOJgJEqlp6zZu1 K/fJNYOXt2GekcXkrOMO1jGkzzpiwBs1uhhpYH9LxMAHPW3vnfTf4/tVSRPOKWRr IXtxRnLSSurmZP4DYNm1ys4NykY6cO6zPOWxJIiI1lBLR7AVaKNK1bZ75m2D7/Mr Y4FjnIlqaCFuNwiYPSNWQvTHOIj/VF/nRSWAVRRCqYZoYaDuZa25rb3Fo5RHMC8= =Lerj -----END PGP SIGNATURE----- -------------- next part -------------- hmmpfam - search one or more sequences against HMM database HMMER 2.3.2 (Oct 2003) Copyright (C) 1992-2003 HHMI/Washington University School of Medicine Freely distributed under the GNU General Public License (GPL) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - HMM file: ../Shared/Pfam_fs Sequence file: single_porphyra_AA.fa - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Query sequence: gi|90819130|dbj|BAE92499.1| Accession: [none] Description: glutamate synthase [Porphyra yezoensis] Scores for sequence 
family classification (score includes all domains): Model Description Score E-value N -------- ----------- ----- ------- --- Glu_synthase Conserved region in glutamate synthas 858.6 3.6e-255 2 GATase_2 Glutamine amidotransferases class-II 731.8 3.9e-226 1 Glu_syn_central Glutamate synthase central domain 649.1 7.9e-213 1 GXGXG GXGXG motif 367.3 2.7e-107 1 HdeA hns-dependent expression protein A (H 9.6 0.015 1 GDC-P Glycine cleavage system P-protein 7.1 0.086 1 Cache_1 Cache domain 7.0 0.14 1 IBN_N Importin-beta N-terminal domain 8.2 0.17 1 DUF1200 Protein of unknown function (DUF1200) 6.7 0.42 1 cobW CobW/HypB/UreG, nucleotide-binding do 5.1 0.45 1 PUF Pumilio-family RNA binding repeat 6.5 0.47 1 Arch_flagellin Archaebacterial flagellin 4.1 0.66 1 FMN_dh FMN-dependent dehydrogenase 3.2 0.89 1 RNA_pol_Rpb2_4 RNA polymerase Rpb2, domain 4 4.6 1.4 1 DUF477 Domain of unknown function (DUF477) 3.8 1.7 1 FRG1 FRG1-like family 0.2 1.7 1 DUF1393 Protein of unknown function (DUF1393) 3.1 2 1 tRNA_anti OB-fold nucleic acid binding domain 4.9 2 1 SelT Selenoprotein T 3.1 2.2 1 RNase_PH_C 3' exoribonuclease family, domain 2 4.2 2.3 1 Pencillinase_R Penicillinase repressor 3.9 2.5 1 Hormone_4 Neurohypophysial hormones, N-terminal 4.4 2.5 1 DSRB Dextransucrase DSRB 2.7 2.7 1 FtsK_SpoIIIE FtsK/SpoIIIE family 2.6 3.1 1 UBA UBA/TS-N domain 4.2 3.1 1 DUF1981 Domain of unknown function (DUF1981) 3.6 3.3 1 Gla Vitamin K-dependent carboxylation/gam 4.0 3.5 1 Scm3 Centromere protein Scm3 2.2 3.5 1 Ribosomal_S6 Ribosomal protein S6 3.3 3.7 1 Cystatin Cystatin domain 2.4 3.9 1 Phage_prot_Gp6 Phage portal protein, SPP1 Gp6-like 1.0 4 1 DUF1976 Domain of unknown function (DUF1976) -1.5 4.3 1 DUF37 Domain of unknown function DUF37 3.0 4.5 1 Flavodoxin_NdrI NrdI Flavodoxin like 2.1 4.6 1 Bac_rhodopsin Bacteriorhodopsin 0.9 4.9 1 Nitro_FeMo-Co Dinitrogenase iron-molybdenum cofacto 2.1 5.3 1 MoCF_biosynth Probable molybdopterin binding domain 1.3 5.6 1 PaaA_PaaC Phenylacetic acid catabolic 
protein 0.4 5.6 1 Albicidin_res Albicidin resistance domain 1.7 5.7 1 DUF1514 Protein of unknown function (DUF1514) 3.5 5.7 1 T5orf172 T5orf172 domain 2.0 6.1 1 Nup133_N Nup133 N terminal like -0.6 6.5 1 BicD Microtubule-associated protein Bicaud -1.6 6.8 1 Sel1 Sel1 repeat 2.5 7 1 CAP_C DE Adenylate cyclase associated (CA 1.3 7.4 1 Colicin Colicin pore forming domain 1.4 7.5 1 MADF_DNA_bdg Alcohol dehydrogenase transcription f 1.8 8.2 1 DUF258 Protein of unknown function, DUF258 0.3 8.3 1 PspB Phage shock protein B 0.4 8.4 1 GspM General secretion pathway, M protein 1.0 8.6 1 Coq4 Coenzyme Q (ubiquinone) biosynthesis -0.3 9.1 1 P22_AR_N P22_AR N-terminal domain -0.2 9.5 1 C1_2 C1 domain 1.1 9.6 1 Phage_Mu_P Bacteriophage Mu P protein -0.4 10 1 Parsed for domains: Model Domain seq-f seq-t hmm-f hmm-t score E-value -------- ------- ----- ----- ----- ----- ----- ------- GATase_2 1/1 34 404 .. 1 385 [] 731.8 3.9e-226 FRG1 1/1 88 107 .. 151 173 .. 0.2 1.7 C1_2 1/1 191 210 .. 9 27 .. 1.1 9.6 MADF_DNA_bdg 1/1 235 261 .. 57 95 .] 1.8 8.2 PaaA_PaaC 1/1 258 269 .. 1 13 [. 0.4 5.6 Albicidin_res 1/1 274 289 .. 50 65 .. 1.7 5.7 UBA 1/1 311 331 .. 18 38 .] 4.2 3.1 Gla 1/1 342 357 .. 27 42 .] 4.0 3.5 RNA_pol_Rpb2_4 1/1 369 381 .. 1 13 [. 4.6 1.4 MoCF_biosynth 1/1 371 396 .. 23 49 .. 1.3 5.6 DUF1200 1/1 389 401 .. 1 13 [. 6.7 0.42 Nup133_N 1/1 397 419 .. 475 498 .] -0.6 6.5 DUF1976 1/1 428 448 .. 1296 1319 .] -1.5 4.3 Bac_rhodopsin 1/1 445 472 .. 219 250 .] 0.9 4.9 Coq4 1/1 459 481 .. 60 82 .. -0.3 9.1 Glu_syn_central 1/1 478 773 .. 1 301 [] 649.1 7.9e-213 Flavodoxin_NdrI 1/1 488 497 .. 122 131 .] 2.1 4.6 P22_AR_N 1/1 524 541 .. 110 126 .] -0.2 9.5 Cache_1 1/1 537 557 .. 1 23 [. 7.0 0.14 Glu_synthase 1/2 650 676 .. 297 323 .. 1.3 3 HdeA 1/1 727 749 .. 58 79 .] 9.6 0.015 Sel1 1/1 729 745 .. 32 49 .] 2.5 7 DUF1981 1/1 765 787 .. 62 88 .] 3.6 3.3 tRNA_anti 1/1 818 839 .. 54 85 .] 4.9 2 Cystatin 1/1 826 859 .. 1 38 [. 2.4 3.9 RNase_PH_C 1/1 827 846 .. 64 84 .] 
4.2 2.3 Glu_synthase 2/2 830 1216 .. 1 412 [] 857.3 9e-255 DUF258 1/1 839 860 .. 282 305 .] 0.3 8.3 Pencillinase_R 1/1 856 894 .. 84 118 .] 3.9 2.5 SelT 1/1 872 885 .. 96 111 .] 3.1 2.2 Nitro_FeMo-Co 1/1 879 897 .. 87 105 .] 2.1 5.3 DUF37 1/1 927 934 .. 61 68 .] 3.0 4.5 Scm3 1/1 953 963 .. 103 113 .] 2.2 3.5 cobW 1/1 1038 1058 .. 202 222 .] 5.1 0.45 Arch_flagellin 1/1 1050 1072 .. 197 219 .] 4.1 0.66 DUF1393 1/1 1055 1068 .. 1 14 [. 3.1 2 FtsK_SpoIIIE 1/1 1107 1143 .. 163 198 .. 2.6 3.1 FMN_dh 1/1 1109 1148 .. 291 330 .. 3.2 0.89 DSRB 1/1 1120 1134 .. 1 16 [. 2.7 2.7 Phage_Mu_P 1/1 1122 1131 .. 1 10 [. -0.4 10 Hormone_4 1/1 1168 1176 .. 1 9 [] 4.4 2.5 GDC-P 1/1 1205 1225 .. 10 30 .. 7.1 0.086 PspB 1/1 1268 1276 .. 1 9 [. 0.4 8.4 T5orf172 1/1 1271 1293 .. 35 58 .. 2.0 6.1 CAP_C 1/1 1283 1292 .. 161 170 .] 1.3 7.4 GXGXG 1/1 1290 1485 .. 1 228 [] 367.3 2.7e-107 DUF1514 1/1 1453 1469 .. 50 66 .] 3.5 5.7 Colicin 1/1 1456 1467 .. 192 203 .] 1.4 7.5 Ribosomal_S6 1/1 1461 1481 .. 16 36 .. 3.3 3.7 BicD 1/1 1465 1481 .. 1 17 [. -1.6 6.8 PUF 1/1 1470 1486 .. 19 35 .] 6.5 0.47 DUF477 1/1 1472 1495 .. 1 24 [. 3.8 1.7 Phage_prot_Gp6 1/1 1479 1492 .. 1 14 [. 1.0 4 IBN_N 1/1 1498 1516 .. 1 20 [. 8.2 0.17 GspM 1/1 1506 1520 .. 1 15 [. 
1.0 8.6 Alignments of top-scoring domains: GATase_2: domain 1 of 1, from 34 to 404: score 731.8, E = 3.9e-226 CS EEEEEEEEETSSHSBHHHHHHHHHHHHHGGGGSSCSTTSSCECEEEE *->CGvlGfiAhikgkpshkivedaleaLerLeHRGavgADgktGDGAGI CGv GfiA+ ++ ++hkiv +aleaL+++eHRGa++AD ++GDGAGI gi|9081913 34 CGV-GFIADVNNVANHKIVVQALEALTCMEHRGACSADRDSGDGAGI 79 CS EEECTCCCHHHHHHHCT----S GC-EEEEEEE-SSHHHHHHHHHHHHHH ltqiPdgFFrevakelGieLpe.gqYAVGmvFLPqdelaraearkifEki t+iP+++F++ ++++i++ ++ +VGm+FLP l+ + i+E + gi|9081913 80 TTAIPWNLFQKSLQNQNIKFEQnDSVGVGMLFLPAHKLKES--KLIIETV 127 CS HHHTT-EEEEEEE--B-GGGS-HHHHHC--EEEEEEEE-TT--HHHHHHC aeeeGLeVLGWReVPvnnsvLGetAlatePvIeQvFvgapsgdgedfErr ++ee+Le++GWR VP+ +vLG++A + P++eQvF+ +++ +++ +E++ gi|9081913 128 LKEENLEIIGWRLVPTVQEVLGKQAYLNKPHVEQVFCKSSNLSKDRLEQQ 177 CS EEEEECHSCHHHHTHHH. BEEEEEESSEEEEEECC-GGGHHHHBHG LyviRkrieksivaenvn....fYiCSLSsrTIVYKGMLtseQLgqFYpD L+++Rk+iek+i+ + + ++fYiCSLS++TIVYKGM++s++LgqFY+D gi|9081913 178 LFLVRKKIEKYIGINGKDwaheFYICSLSCYTIVYKGMMRSAVLGQFYQD 227 CS GGSTTEEBSEEEEEECESSSSSCTGGGSSCEEECCCTTCEEEEEEEEETT LqderfeSalAivHsRFSTNTfPsWplAQPfRVnslwgggivlAHNGEIN L++++++S++Ai+H+RFSTNT+P+WplAQP+R ++ HNGEIN gi|9081913 228 LYHSEYTSSFAIYHRRFSTNTMPKWPLAQPMR---------FVSHNGEIN 268 CS THHHHHHHHHHTSCCCSSTTCGHHHHCC-SSS-TTSCHHHHHHHHHHHHH TlrgNrnwMraRegvlksplFgddldkLkPIvneggSDSaalDnvlEllv Tl gN nwM++Re +l+s++++d++++LkPI n+++SDSa+lD ++Ell+ gi|9081913 269 TLLGNLNWMQSREPLLQSKVWKDRIHELKPITNKDNSDSANLDAAVELLI 318 CS HTT--HHHHHHHHS----TT-GGGTST-HHHHHHHHHHHHHHCCHCCEEE raGRslpeAlMMlIPEAWqnnpdmdkdrpekraFYeylsglmEPWDGPAa ++GRs++eAlM+l+PEA+qn+pd +++e+ +FYey+sgl+EPWDGPA+ gi|9081913 319 ASGRSPEEALMILVPEAFQNQPDFA-NNTEISDFYEYYSGLQEPWDGPAL 367 CS EEEETSSEEEEEEETTTSCESEEEEEEEEEE.TTEEEEEESSC lvftDGryavgAtLDRNGLTRPaRygiTrdldkDglvvvaSEa<-* +vft+G++ +gAtLDRNGL RPaRy+iT kD+lv+v+SE+ gi|9081913 368 VVFTNGKV-IGATLDRNGL-RPARYVIT----KDNLVIVSSES 404 FRG1: domain 1 of 1, from 88 to 107: score 0.2, E = 1.7 *->FQkfKvDLqdrklrinekDkkel<-* FQk+ Lq+ + +++D+ ++ gi|9081913 88 FQKS---LQNQNIKFEQNDSVGV 107 
C1_2: domain 1 of 1, from 191 to 210: score 1.1, E = 9.6 *->idgfyg...fYsCkkccddftl<-* i+g+++ ++fY C+ c +t+ gi|9081913 191 INGKDWaheFYICSLSC--YTI 210 MADF_DNA_bdg: domain 1 of 1, from 235 to 261: score 1.8, E = 8.2 *->drYrrelrkirqgnsegsstgsgesykskWryyeelsFL<-* +++ ++r+ ++ +kW+++ ++F gi|9081913 235 SSFAIYHRRFS------------TNTMPKWPLAQPMRFV 261 PaaA_PaaC: domain 1 of 1, from 258 to 269: score 0.4, E = 5.6 CS X............ *->MYnFvEHGGvint<-* M Fv H G int gi|9081913 258 M-RFVSHNGEINT 269 Albicidin_res: domain 1 of 1, from 274 to 289: score 1.7, E = 5.7 *->LrlmharEPsLrkgtG<-* L+ m+ rEP L+ +++ gi|9081913 274 LNWMQSREPLLQSKVW 289 UBA: domain 1 of 1, from 311 to 331: score 4.2, E = 3.1 CS HHHHHHHHHTTT-HHHHHHHH *->eeakkALeatngnverAvewL<-* ++a++ L a++ ++e+A+++L gi|9081913 311 DAAVELLIASGRSPEEALMIL 331 Gla: domain 1 of 1, from 342 to 357: score 4.0, E = 3.5 CS CSSHHHHHHHHHHCTC *->fednegtkefwrkYfg<-* f++n+++ f++ Y g gi|9081913 342 FANNTEISDFYEYYSG 357 RNA_pol_Rpb2_4: domain 1 of 1, from 369 to 381: score 4.6, E = 1.4 CS EEETTEEEEEESS *->VYvNGklvGthrn<-* V+ NGk++G + + gi|9081913 369 VFTNGKVIGATLD 381 MoCF_biosynth: domain 1 of 1, from 371 to 396: score 1.3, E = 5.6 CS CHHHHHHHHHHHTTTCEEEEEEEE-SS *->tNgpmLaalLresaGaevirygiVpDd<-* tNg+ + a L + G ++ry+i +D+ gi|9081913 371 TNGKVIGATLDR-NGLRPARYVITKDN 396 DUF1200: domain 1 of 1, from 389 to 401: score 6.7, E = 0.42 *->kYvltedtLlIks<-* +Yv+t+d L+I+s gi|9081913 389 RYVITKDNLVIVS 401 Nup133_N: domain 1 of 1, from 397 to 419: score -0.6, E = 6.5 *->lylltrnsGvvrIeHaleedstne<-* l++ + +sGvv++e + + s + gi|9081913 397 LVIVSSESGVVQVE-PGNVKSKGR 419 DUF1976: domain 1 of 1, from 428 to 448: score -1.5, E = 4.3 *->VsvYiyFkevtdnksLsEysVtyk<-* V++++ ++++nk ++ sVt k gi|9081913 428 VDIFS--HKILNNKEIK-TSVTTK 448 Bac_rhodopsin: domain 1 of 1, from 445 to 472: score 0.9, E = 4.9 CS HHHHHHHHHHHHHHHHHCHHHTC--------- *->vvAKVgFgfilLrsravlertvavgsalaage<-* v++K+++g +l ++r++le + + l+++ gi|9081913 445 VTTKIPYGELLTDARQILE--HK--PFLSDQQ 472 Coq4: domain 1 of 1, 
from 459 to 481: score -0.3, E = 9.1 *->rrILkEkPRissetldlkkLrkL<-* r+IL kP s ++d kkL +L gi|9081913 459 RQILEHKPFLSDQQVDIKKLMQL 481 Glu_syn_central: domain 1 of 1, from 478 to 773: score 649.1, E = 7.9e-213 CS HHHHHHCTT--HHHHHCTCHHHHHHSS--EE-S---S--CCC-SS-- *->llrrQkAFGYTyEdvelvllPMAetGkEalGSMGdDtPLAVLSekpr l+++Q+AFGYT+Edvelv+++MA+++kE++++MGdD+PL +LSek++ gi|9081913 478 LMQLQTAFGYTNEDVELVIEHMASQAKEPTFCMGDDIPLSILSEKSH 524 CS -GGGCEEE----SSS----TTTTGGG-B--EEES--S-TTS-SGGGC-CE lLYdYFKQlFAQVTNPPIDPIREelVMSLetylGpegNlLeptpeqarrl +LYdYFKQ+FAQVTNP+IDP+RE+lVMSL+ ++G+++NlL+ p+ a+++ gi|9081913 525 ILYDYFKQRFAQVTNPAIDPLRESLVMSLAIQIGHKSNLLDDQPTLAKHI 574 CS EESSSB--HHHHHH.HHHH....CCCCEEEEESEEESTTSTTCHHHHHHH kLesPILsnselekmlknidairegfkaatIditFdveeGvdgLeaaLdr kLesP+++++el++ + + +++++ I+++F e+G++ ++ + + gi|9081913 575 KLESPVINEGELNA-IFE-----SKLSCIRINTLFQLEDGPKNFKQQIQQ 618 CS HHHHHHHHHHCT-SEEEEESTCG--CTTEEE--HHHHHHHHHHHHHCTT- lceeAeeAirsGaniivLSDRndildeervaIPaLLAvGAVHhHLIrkgL lce A++Ai +G ni+vLSD+n+ ld+e+v+IP+LLAvGAVHhHLI kgL gi|9081913 619 LCENASQAILDGNNILVLSDKNNSLDSEKVSIPPLLAVGAVHHHLINKGL 668 CS CCC-EEEEEESS--SHHHHHHHHCTT-SEEEEHCCHHHHHHHHCCCCCCC RtkvslvVETGEaREvHHFAvLiGYGAsAInPYLAyETirdWWlirrGll R+ +s+ VET++++++HHFA+LiGYGAsAI+PYLA+ET r+WW + ++++ gi|9081913 669 RQEASILVETAQCWSTHHFACLIGYGASAICPYLAFETARHWWSNPKTKM 718 CS CHTTTS- T--HHHHHHHHHHHHHHHHHHHHHCTT--BHHHHCCS--EEE lmskGkl.elsleeavkNYrkAiekGlLKIMSKMGISTlqSYrGAQIFEA lmskG+l++++++ea++NY+kA+e+GlLKI+SKMGIS+l+SY+GAQIFE+ gi|9081913 719 LMSKGRLpACNIQEAQANYKKAVEAGLLKILSKMGISLLSSYHGAQIFEI 768 CS SSB-H vGLsk<-* +GL++ gi|9081913 769 LGLGS 773 Flavodoxin_NdrI: domain 1 of 1, from 488 to 497: score 2.1, E = 4.6 CS -HHHHHHHHH *->TneDVerVrk<-* TneDVe V + gi|9081913 488 TNEDVELVIE 497 P22_AR_N: domain 1 of 1, from 524 to 541: score -0.2, E = 9.5 *->dVLydYWtrkGkAv..NPR<-* ++LydY+ + +A +NP+ gi|9081913 524 HILYDYFK-QRFAQvtNPA 541 Cache_1: domain 1 of 1, from 537 to 557: score 7.0, E = 0.14 *->wTePYvdaalktgdlViTiaqPv<-* +T+P++d + +++lV ++a+++ 
gi|9081913 537 VTNPAIDPL--RESLVMSLAIQI 557 Glu_synthase: domain 1 of 2, from 650 to 676: score 1.3, E = 3 CS --HHHHHHHHHHHHHCTT-CCCSEEEE *->lPwelgLaevhqtLvengLRdrVsLia<-* +P l++ +vh L++ gLR + s+ + gi|9081913 650 IPPLLAVGAVHHHLINKGLRQEASILV 676 HdeA: domain 1 of 1, from 727 to 749: score 9.6, E = 0.015 *->ACk.QdkkAsFkdKvkaEldKvk<-* AC Q+ +A++k+ v+a l K+ gi|9081913 727 ACNiQEAQANYKKAVEAGLLKIL 749 Sel1: domain 1 of 1, from 729 to 745: score 2.5, E = 7 CS .HHH.HHHHHHHHHHTT- *->DyekeAlkwyekAAeqGn<-* ++++ A + y+kA e+G gi|9081913 729 NIQE-AQANYKKAVEAGL 745 DUF1981: domain 1 of 1, from 765 to 787: score 3.6, E = 3.3 *->iFgvltlaakeesesivklAfqiid.qi<-* iF++l+l++ v+lAf+ +++qi gi|9081913 765 IFEILGLGSEV-----VNLAFKGTTsQI 787 tRNA_anti: domain 1 of 1, from 818 to 839: score 4.9, E = 2 CS EEEEEEETTSSTSTCTCTT..EEEEEEEEEEE *->tGkvkkrpggeqNnlkTGeKAlelvveeievl<-* +G v+ rpgge ++++ +e+ gi|9081913 818 YGFVQYRPGGE----------YHINNPEMSKA 839 Cystatin: domain 1 of 1, from 826 to 859: score 2.4, E = 3.9 CS ECEEEEET.STSHHHHHHHHHHHHHHHHHSSSSEEEEE *->GglspvdpNendpevqealdfAlakyNeksndnylfel<-* Gg +++ pe +al+ A+ yN + +ny++ l gi|9081913 826 GGEYHINN----PEMSKALHQAVRGYNPEYYNNYQSLL 859 RNase_PH_C: domain 1 of 1, from 827 to 846: score 4.2, E = 2.3 CS SSSS.B.HHHHHHHHHHHHHH *->GkgnglteelleealelAkeg<-* G +++++ +++ +al++A+ g gi|9081913 827 G-EYHINNPEMSKALHQAVRG 846 Glu_synthase: domain 2 of 2, from 830 to 1216: score 857.3, E = 9e-255 CS -SS-HHHHHHHHHHHHC--T-HHHHHHHHHHHHTS.-S-SGGGGEEE *->hrnepeviktlqkavqvpveskpsydkYreplnertpigalrdlLef h n+pe++k l++av+ + y +Y+ +l +r p++alrdlL++ gi|9081913 830 HINNPEMSKALHQAVRG--YNPEYYNNYQSLLQNR-PPTALRDLLKL 873 CS --SS--......--GGGS--HHHHHTTEEEEEB-CTTC-HHHHHHHHHHH kyaeepldtdkiipieevepaleikkrfctgaMSyGALSeeAheALAiAm ++++p i+i+eve+++ i + fctg+MS+GALS+e+he+LAiAm gi|9081913 874 QSNRAP------ISIDEVESIEDILQKFCTGGMSLGALSRETHETLAIAM 917 CS HHCT-EEEETTT---GGGCSB-TTS-T S BTTSTT--S--TT-B---SE nriGtksNtGEGGedperlkpaadlds.G.SpTlpHLkGLqnednarSAI nriG+ksN+GEGGedp r+k + d++s+G+Sp 
lpHLkGL+n+d+a+SAI gi|9081913 918 NRIGGKSNSGEGGEDPVRFKILNDVNSsGtSPLLPHLKGLKNGDTASSAI 967 CS EEE-TT-TT--............HHHHCC-SEEEEE---TTSTTT--EE- kQvASGRFGVtkRnGefWeefkRseYLvnAdalEIKiAQGAKPGeGGhLP kQ+ASGRFGVt +eYL+nA++lEIKiAQGAKPGeGG+LP gi|9081913 968 KQIASGRFGVT------------PEYLMNAKQLEIKIAQGAKPGEGGQLP 1005 CS GGG--HHHHHHHTS-TT--EE--SS-TT-SSHHHHHHHHHHHHHH-.TTS GeKVspeIAriRnstPGvgliSPpPHHDIysiEDLaqLIydLkeindpkA G+K+sp+IA +R ++PGv liSPpPHHDIysiEDL+qLI+dL++in pkA gi|9081913 1006 GKKISPYIATLRKCKPGVPLISPPPHHDIYSIEDLSQLIFDLHQIN-PKA 1054 CS EEEEEEE-STTHHHHHHH...HHHTT-SEEEEE-TT---SSEECCHHHHC pisVKLVsehgvgtiaaGhmqvakAnADiIlIdGhdGGTGASpktsikha +isVKLVse g+gtiaaG vak+nADiI+I+GhdGGTGASp++sikha gi|9081913 1055 KISVKLVSEIGIGTIAAG---VAKGNADIIQISGHDGGTGASPLSSIKHA 1101 CS ---HHHHHHHHHHHHHCTT-CCCSEEEEESS--SHHHHHHHHHCT-SEEE GlPwelgLaevhqtLvengLRdrVsLiadGGLrTGaDVakAaaLGAdavg G PwelgL+evhq+L en+LRdrV+L++dGGLrTG D+++Aa++GA+++g gi|9081913 1102 GSPWELGLSEVHQLLAENQLRDRVTLRVDGGLRTGSDIVLAAIMGAEEFG 1151 CS -SHHHHHHCT--S---CCCT--TTSSS---CCHH..CT----HHHHHHHH iGTaaLiAlGCimaRvCHtntCPvGvATQDPeLrKrlkfegaperVvNyf +GT+a+iA+GCimaR+CHtn+CPvGvATQ++eLr +f g+pe +vN+f gi|9081913 1152 FGTVAMIATGCIMARICHTNKCPVGVATQREELR--ARFSGVPEALVNFF 1199 CS HHHHHHHHHHHHHHT-S iflaeEvrellaqlGfr<-* +f+ Evre+la+lG++ gi|9081913 1200 LFIGNEVREILASLGYK 1216 DUF258: domain 1 of 1, from 839 to 860: score 0.3, E = 8.3 CS HHHHHHHCTSS-HHHHHHHHHHHH *->AVkaAveeGeIseeRYesYlklle<-* A+ +Av +++e Y++Y+ ll+ gi|9081913 839 ALHQAVR--GYNPEYYNNYQSLLQ 860 Pencillinase_R: domain 1 of 1, from 856 to 894: score 3.9, E = 2.5 CS XXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX *->drlfggsvgalvanfleee....klSeddieeLrelLde<-* + l++++++ ++ ++l+ ++++ ++S d++e ++++L++ gi|9081913 856 QSLLQNRPPTALRDLLKLQsnraPISIDEVESIEDILQK 894 SelT: domain 1 of 1, from 872 to 885: score 3.1, E = 2.2 *->KLqtGrvYAPPtpqEL<-* KLq++r P++++E+ gi|9081913 872 KLQSNRA--PISIDEV 885 Nitro_FeMo-Co: domain 1 of 1, from 879 to 897: score 2.1, E = 5.3 CS EEE-TTSSBHHHHHHHHHC *->pikagegetieeaiealqe<-* pi e 
e+ie+ + ++ + gi|9081913 879 PISIDEVESIEDILQKFCT 897 DUF37: domain 1 of 1, from 927 to 934: score 3.0, E = 4.5 *->hpGGyDPV<-* ++GG DPV gi|9081913 927 GEGGEDPV 934 Scm3: domain 1 of 1, from 953 to 963: score 2.2, E = 3.5 *->HLraLeteddi<-* HL++L+++d++ gi|9081913 953 HLKGLKNGDTA 963 cobW: domain 1 of 1, from 1038 to 1058: score 5.1, E = 0.45 CS ...HHHHHHHHHH-SSS-EEE *->adlekleadlrrlnpeapiip<-* +dl++l+ dl+++np+a+i gi|9081913 1038 EDLSQLIFDLHQINPKAKISV 1058 Arch_flagellin: domain 1 of 1, from 1050 to 1072: score 4.1, E = 0.66 *->inpstkvrgeVvpenGapgtief<-* inp k+++++v+e+G+ ++ gi|9081913 1050 INPKAKISVKLVSEIGIGTIAAG 1072 DUF1393: domain 1 of 1, from 1055 to 1068: score 3.1, E = 2 *->klSvKtVVAiGIGA<-* k+SvK V iGIG+ gi|9081913 1055 KISVKLVSEIGIGT 1068 FtsK_SpoIIIE: domain 1 of 1, from 1107 to 1143: score 2.6, E = 3.1 *->lviDnydeLaeenlL.ervtsLknqGlsygvhvmata<-* l++ + ++L +en+L++rvt+ + +Gl +g +++++a gi|9081913 1107 LGLSEVHQLLAENQLrDRVTLRVDGGLRTGSDIVLAA 1143 FMN_dh: domain 1 of 1, from 1109 to 1148: score 3.2, E = 0.89 CS HHHHHHHHHCHHTTTSSEEEEESS-SSHHHHHHHHHHTSS *->LpeVvPIlkeaAvkgdieVllDgGvRRGtDVlKALALGAr<-* L eV +l e + +++ +DgG R+G+D++ A +GA+ gi|9081913 1109 LSEVHQLLAENQLRDRVTLRVDGGLRTGSDIVLAAIMGAE 1148 DSRB: domain 1 of 1, from 1120 to 1134: score 2.7, E = 2.7 *->mKvndrvtvKtDGgpR<-* ++ drvt + DGg R gi|9081913 1120 -QLRDRVTLRVDGGLR 1134 Phage_Mu_P: domain 1 of 1, from 1122 to 1131: score -0.4, E = 10 *->sntVtLrvgG<-* ++VtLrv+G gi|9081913 1122 RDRVTLRVDG 1131 Hormone_4: domain 1 of 1, from 1168 to 1176: score 4.4, E = 2.5 CS X-TT--TT- *->CyirnCPrG<-* C + CP+G gi|9081913 1168 CHTNKCPVG 1176 GDC-P: domain 1 of 1, from 1205 to 1225: score 7.1, E = 0.086 *->eqqeMLstiGlssLddLidat<-* e++e+L+++G++sLdd ++++ gi|9081913 1205 EVREILASLGYKSLDDITGQN 1225 PspB: domain 1 of 1, from 1268 to 1276: score 0.4, E = 8.4 *->MsaffLagP<-* M+ ++La+P gi|9081913 1268 MDDDILAIP 1276 T5orf172: domain 1 of 1, from 1271 to 1293: score 2.0, E = 6.1 *->dvvalievedaraklEklLHkrFk<-* d+ a+ ev++a klE+++ k+Fk 
gi|9081913 1271 DILAIPEVSNAI-KLETEITKHFK 1293 CAP_C: domain 1 of 1, from 1283 to 1292: score 1.3, E = 7.4 CS EEEEEE---- *->KLvTevveha<-* KL+Te++ h gi|9081913 1283 KLETEITKHF 1292 GXGXG: domain 1 of 1, from 1290 to 1485: score 367.3, E = 2.7e-107 CS EEEEE-TT--STTHHHHHHHHHHCTTTS.S-TTCEEEEEEEEE-TTT *->keeaiiNtdrlvgtrlsgeiakkygeegalpkdtgkivfnGsAGqsf k+++i Nt+r+vgtrlsg iak yg+ g + k+ +k++f+GsAGqsf gi|9081913 1290 KHFKIANTNRTVGTRLSGIIAKNYGNTG-F-KGLIKLNFYGSAGQSF 1334 CS TTT-BTTEEEEEEEEE-S.TTTTT-ECCEEEEE--TT-.......SS-GG GafmagGvtLeleGdAnddyvGkgmsGGeIvikgnagdpvGnnMdageyv Gaf+a+G++L l+G+And yvGkgm+GG+Ivi+++ag +e + gi|9081913 1335 GAFLASGINLKLMGEAND-YVGKGMNGGSIVIVPPAGT-------IYEDN 1376 CS GSEEC-SSTTTT--CEEEEESSEE-TTTTTT-.....CCEEEEESEB.-S gnviaGNtclyGatGGkifiaGdAGerfgvrnkayKdsgatiVveGvaGd ++vi+GNtclyGatGG++f++G+AGerf+vrn s a+ VveGv Gd gi|9081913 1377 NQVIIGNTCLYGATGGYLFAQGQAGERFAVRN-----SLAESVVEGV-GD 1420 CS STTTT-EEEEEEESS-B-SSBTTT--CCEEEEE-TTS.......THHHHB hggEYMtGGtivVlGdaGrnvGagMtGGiaYvlgeiedfsyMiatlpgkv h++EYMtGG+ivVlG+aGrnvGagMtGG+aY+l+e+e + ++v gi|9081913 1421 HACEYMTGGVIVVLGKAGRNVGAGMTGGLAYFLDEDE-------NFIDRV 1463 CS -CCCEEEE...ES-S......CCHHHHHHHH nleiVeledlkrievkrkklLpegekqlkel<-* n+eiV+ + r+ + ++ge+qlk+l gi|9081913 1464 NSEIVKIQ---RVIT------KAGEEQLKNL 1485 DUF1514: domain 1 of 1, from 1453 to 1469: score 3.5, E = 5.7 *->LeeyrieveRikkevkk<-* L e+++ ++R++ e+ k gi|9081913 1453 LDEDENFIDRVNSEIVK 1469 Colicin: domain 1 of 1, from 1456 to 1467: score 1.4, E = 7.5 CS SHHHHHHHHHCH *->DdkfveklNkli<-* D++f++ +N +i gi|9081913 1456 DENFIDRVNSEI 1467 Ribosomal_S6: domain 1 of 1, from 1461 to 1481: score 3.3, E = 3.7 CS CCHHHHHHHHHHHHHCTT-EE *->EqvkqeiekYqkvLtnngAei<-* ++v++ei k+q+v+t++g+e+ gi|9081913 1461 DRVNSEIVKIQRVITKAGEEQ 1481 BicD: domain 1 of 1, from 1465 to 1481: score -1.6, E = 6.8 *->gqaysnqrkvAkdGeer<-* + +++qr+ +k Gee+ gi|9081913 1465 SEIVKIQRVITKAGEEQ 1481 PUF: domain 1 of 1, from 1470 to 1486: score 6.5, E = 0.47 *->lQkllevateeqkqlil<-* +Q+++++a+eeq ++++ 
gi|9081913 1470 IQRVITKAGEEQLKNLI 1486 DUF477: domain 1 of 1, from 1472 to 1495: score 3.8, E = 1.7 *->gtLspserarLeqalaalEqktga<-* ++++++ ++L ++ ++ktg+ gi|9081913 1472 RVITKAGEEQLKNLIENHAAKTGS 1495 Phage_prot_Gp6: domain 1 of 1, from 1479 to 1492: score 1.0, E = 4 *->eEmikkFidkHklr<-* eE +k++i+ H+++ gi|9081913 1479 EEQLKNLIENHAAK 1492 IBN_N: domain 1 of 1, from 1498 to 1516: score 8.2, E = 0.17 CS HHHHHHHHHCCTHHCHHHHH *->AEkqLeqlekqklPgfllaL<-* A++ Le+++++ lP+f++ + gi|9081913 1498 AHTILEKWNSY-LPQFWQVV 1516 GspM: domain 1 of 1, from 1506 to 1520: score 1.0, E = 8.6 CS XXXXXXXXXXXXXXX *->mneLqawWqgrspRE<-* ++ L ++Wq ++p+E gi|9081913 1506 NSYLPQFWQVVPPSE 1520 // From etal at uga.edu Tue Oct 30 17:21:25 2012 From: etal at uga.edu (Eric Talevich) Date: Tue, 30 Oct 2012 13:21:25 -0400 Subject: [Biopython-dev] Fwd: Pull Request: MafIO.py In-Reply-To: <5aa8ce85a0ec41b5b817cdc5105bfdfb@BLUPRD0210HT004.namprd02.prod.outlook.com> References: <5aa8ce85a0ec41b5b817cdc5105bfdfb@BLUPRD0210HT004.namprd02.prod.outlook.com> Message-ID: ---------- Forwarded message ---------- From: Nick Loman Date: Tue, Oct 30, 2012 at 6:34 AM Subject: Pull Request: MafIO.py Hi there Thanks for the MafIO branch. 
In order to get it to read MAF files produced by Mugsy (mugsy.sourceforge.net) I had to make the following change:

diff --git a/Bio/AlignIO/MafIO.py b/Bio/AlignIO/MafIO.py
index 6eda0ca..4bb1407 100644
--- a/Bio/AlignIO/MafIO.py
+++ b/Bio/AlignIO/MafIO.py
@@ -178,7 +178,7 @@ def MafIterator(handle, seq_count = None, alphabet = single_letter_alphabet):

             annotations = dict([x.split("=") for x in line.strip().split()[1:]])

-            if len([x for x in annotations.keys() if x not in ("score", "pass")]) > 0:
+            if len([x for x in annotations.keys() if x not in ("score", "pass", "label", "mult")]) > 0:
                 raise ValueError("Error parsing alignment - invalid key in 'a' line")
         elif line.startswith("#"):
             # ignore comments

My Python fork is a bit confusing right now so hope you don't mind me sending this pull request via email!

Cheers

Nick

From w.arindrarto at gmail.com Wed Oct 31 00:09:41 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 31 Oct 2012 01:09:41 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <508FE182.3040202@biotech.uni-tuebingen.de>
References: <508EEA85.6060906@biotech.uni-tuebingen.de> <508FE182.3040202@biotech.uni-tuebingen.de>
Message-ID:

Hi Kai,

> one more thing:
>
> Hmmer2 has the concept of an accession number in the result. Is there
> an attribute for that in the QueryResult object that I'm missing or do
> we want a new attribute for that. Would "accession" be a good name?
>
> Cheers,
> Kai

I've used '.acc' for accession number properties in the current HMMER3 and BLAST parsers, but this choice was arbitrary. '.accession' is a good name. I didn't use it because I like shorter names better, but then again it may be unclear at times.

Does anyone have preference between '.acc' or '.accession'? If not, I can change the current '.acc' into '.accession'.

cheers,
Bow

From w.arindrarto at gmail.com Wed Oct 31 00:19:30 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 31 Oct 2012 01:19:30 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <508FF84A.2020802@biotech.uni-tuebingen.de>
References: <508EEA85.6060906@biotech.uni-tuebingen.de> <508F834C.6010404@biotech.uni-tuebingen.de> <508FF84A.2020802@biotech.uni-tuebingen.de>
Message-ID:

Hi Kai,

> I've just stumbled over a case where not being able to pre-create Hit
> objects really bites me.
>
> See the attached hmmpfam output. You'll notice that the domain table
> is not in the order of the hit table. As I'd like to preserve the
> order of the hit table, the current setup of the API forces me to
> either repeatedly parse the domain annotations until I find the
> correct domain annotations for my hit, or to create the hits in the
> order of the domain annotation table and then reshuffle them to make
> sure they're in the order of the hit table.
>
> If I could just create "empty" hit objects when parsing the hit table,
> I could easily preserve the order of the hits but still add the hsps
> as I parse them.

Hmm.. This is a problem :/. I didn't expect any format to have this kind of ordering. I'll see what I can do with the current API limitation. We may need to change it back to not requiring any HSPs for Hit.

In any case, I'll see what needs to be done first and get back asap.

cheers,
Bow

From mjldehoon at yahoo.com Wed Oct 31 01:12:18 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 30 Oct 2012 18:12:18 -0700 (PDT)
Subject: [Biopython-dev] Working with the new SearchIO API
Message-ID: <1351645938.62302.BPMail_high_noncarrier@web164001.mail.gq1.yahoo.com>

>Does anyone have preference between '.acc' or '.accession'? If not, I
>can change the current '.acc' into '.accession'.

I would prefer .accession for clarity.

Best,
-Michiel

From andrewscz at gmail.com Wed Oct 31 18:10:48 2012
From: andrewscz at gmail.com (Andrew Sczesnak)
Date: Wed, 31 Oct 2012 11:10:48 -0700
Subject: [Biopython-dev] Pull Request: MafIO.py
In-Reply-To:
References:
Message-ID: <01027F16-EBA0-41A2-B1F5-D0E128B0B08E@gmail.com>

Nick,

Can you provide a snippet of a file from mugsy for the unit tests?

Thanks,
Andrew

On Oct 31, 2012, at 9:00 AM, biopython-dev-request at lists.open-bio.org wrote:

> From: Nick Loman
> Date: Tue, Oct 30, 2012 at 6:34 AM
> Subject: Pull Request: MafIO.py
>
>
> Hi there
>
> Thanks for the MafIO branch. In order to get it to read MAF files produced
> by Mugsy (mugsy.sourceforge.net) I had to make the following change:
>
> diff --git a/Bio/AlignIO/MafIO.py b/Bio/AlignIO/MafIO.py
> index 6eda0ca..4bb1407 100644
> --- a/Bio/AlignIO/MafIO.py
> +++ b/Bio/AlignIO/MafIO.py
> @@ -178,7 +178,7 @@ def MafIterator(handle, seq_count = None, alphabet =
> single_letter_alphabet):
>
> annotations = dict([x.split("=") for x in
> line.strip().split()[1:]])
>
> - if len([x for x in annotations.keys() if x not in ("score",
> "pass")]) > 0:
> + if len([x for x in annotations.keys() if x not in ("score",
> "pass", "label", "mult")]) > 0:
> raise ValueError("Error parsing alignment - invalid key in
> 'a' line")
> elif line.startswith("#"):
> # ignore comments
>
>
> My Python fork is a bit confusing right now so hope you don't mind me
> sending this pull request via email!
>
> Cheers
>
> Nick

From redmine at redmine.open-bio.org Wed Oct 31 19:09:57 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 31 Oct 2012 19:09:57 +0000
Subject: [Biopython-dev] [Biopython - Bug #3297] newline added in quated features
References:
Message-ID:

Issue #3297 has been updated by Chris Fields.

Assignee changed from Bioperl Guts to Biopython Dev Mailing List

Changing default assignee.
----------------------------------------
Bug #3297: newline added in quated features
https://redmine.open-bio.org/issues/3297

Author: Jesse van Dam
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category:
Target version:
URL:

Note: sorry for the duplicate reporting, I did not notice the makeup of the bug reporting system.

When I have a feature line like the following (which spans multiple lines) in a GenBank file
                     /product="Glutamate synthase [NADPH] small chain (EC 1.4.1
                     .13)"

then a space/newline is added between 1.4.1 and .13 in the result, so when printing the feature with the following code
  print(source[0].qualifiers["product"])
it will print (with an unwanted space)
Glutamate synthase [NADPH] small chain (EC 1.4.1 .13)
I changed the following in scanner.py to fix this problem:
                    elif value[0]=='"':
                        #Quoted...
                        if value[-1]!='"' or value!='"':
                            #No closing quote on the first line...
                            while value[-1] != '"':
-                               value += "\n" + iterator.next() 
+                               value += iterator.next() 
                        else:
                            #One single line (quoted)
                            assert value == '"'
                            if self.debug : print "Quoted line %s:%s" % (key, value)
                        #DO NOT remove the quotes...
                        qualifiers.append((key,value))
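
The effect of the one-line change above can be sketched outside the parser. This is only an illustration under stated assumptions: join_qualifier_lines and its joiner argument are hypothetical names, not part of Bio.GenBank, and the sketch uses Python 3 syntax rather than the iterator.next() style of the quoted code. It joins the wrapped lines of one quoted qualifier value, either with a newline (the behaviour being reported) or with nothing (the proposed fix).

```python
def join_qualifier_lines(lines, joiner=""):
    """Join the wrapped lines of one quoted qualifier value.

    Hypothetical helper for illustration only; not part of Bio.GenBank.
    """
    lines = iter(lines)
    value = next(lines).strip()
    # Keep reading continuation lines until the closing quote appears.
    # The length check covers a first line that is only an opening quote.
    while len(value) < 2 or not value.endswith('"'):
        value += joiner + next(lines).strip()
    return value


# The wrapped qualifier value from the report, one list item per file line.
wrapped = [
    '"Glutamate synthase [NADPH] small chain (EC 1.4.1',
    '.13)"',
]

# Reported behaviour: a newline ends up inside the EC number.
print(repr(join_qualifier_lines(wrapped, joiner="\n")))
# Proposed fix: the EC number is rejoined as 1.4.1.13.
print(repr(join_qualifier_lines(wrapped, joiner="")))
```

One thing to keep in mind with this sketch: joining with an empty string also fuses two words if the line happened to be wrapped at a space, so a value such as '"Glutamate synthase' followed by 'small chain"' would come out as "synthasesmall". Whether that matters depends on how the file was wrapped.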

-- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org