From p.j.a.cock at googlemail.com Mon Jan 7 13:55:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 7 Jan 2013 18:55:25 +0000
Subject: [Biopython] Dropping Python 2.5 and Jython 2.5 support?
In-Reply-To:
References:
Message-ID:

On Mon, Oct 22, 2012 at 6:17 PM, Peter Cock wrote:
> Dear Biopythoneers,
>
> Would anyone object to us preparing to drop support for Python 2.5 and
> Jython 2.5, perhaps after the next Biopython release?
>
> To reassure those of you using Jython, we'd wait until Jython 2.7 is out
> first. Jython 2.7 is already in alpha, and brings support for C Python 2.7
> language features.
>
> Thanks,
>
> Peter

Hello all,

Having recently back-ported some Python 3 code with a C extension to Python 2.6 and 2.7, I can now more clearly appreciate the benefits dropping Python 2.5 support has for writing code for both Python 2 and 3 - and am keen to be able to exploit this for Biopython.

Given no major objections to the email I sent round in October last year (thank you for your input Nathan), we will press ahead with phasing out support for Python 2.5, provisionally supporting it in the forthcoming Biopython 1.61 and at least one more release (which would mean Biopython 1.62 due Summer 2013).

https://github.com/biopython/biopython/commit/3f17f75b320fb6624d332809ef07314bab97477c

My only significant concern is for Jython users, since this will also mean dropping support for Jython 2.5 (which implements the Python 2.5 language). The replacement Jython 2.7 is still only at the alpha release stage.

Regards,

Peter

From thomas.girke at ucr.edu Wed Jan 9 16:10:05 2013
From: thomas.girke at ucr.edu (Thomas Girke)
Date: Wed, 9 Jan 2013 13:10:05 -0800
Subject: [Biopython] Bioinformatics Position Opening
Message-ID: <20130109211005.GA3819@dhcp-138-23-59-108.dyn.ucr.edu>

Dear List,

Below is an announcement for a Ph.D. level bioinformatics position at UCR.
It is a long-term position with a competitive salary in a vibrant research environment with cutting edge high-performance compute and genomics facilities. Application instructions are given in the announcement. Potential candidates are welcome to email me their questions about this position directly, e.g. before or after submitting a formal application.

Best,

Thomas

--
Thomas Girke
Associate Professor of Bioinformatics
Institute for Integrative Genome Biology (IIGB)
1207F Genomics Building
University of California
Riverside, CA 92521

E-mail: thomas.girke at ucr.edu
Ph: 951-905-5232
Fax: 951-827-5155

POSITION ANNOUNCEMENT

POSITION
The Institute for Integrative Genome Biology (IIGB) at the University of California, Riverside is seeking a Ph.D. level bioinformatician to manage its bioinformatics research activities and computing facility.

TITLE/RANK
Bioinformatics Facility Director. Salary will be competitive and commensurate with accomplishments.

LOCATION
University of California, Riverside.

BACKGROUND
Successful candidates will join an innovative and multidisciplinary Institute for Integrative Genome Biology (IIGB) that connects theoretical and experimental researchers from different departments in Life, Physical and Mathematical Sciences, Medicine, Engineering and various campus based Centers. The IIGB is organized around a 10,000 sq.ft. suite of Instrumentation Facilities that serve as a centralized, shared-use resource for faculty, staff and students, offering advanced tools in bioinformatics, microscopy, proteomics and genomics. Its bioinformatic component is equipped with an extensive high-performance compute (HPC) infrastructure.

QUALIFICATIONS
Applicants must have a Ph.D.
from a recognized university in bioinformatics; or combined degrees in computer science and a biological science; or a degree in computer science combined with relevant experience in biological science; or a degree in a biological science combined with relevant experience in computer science. The successful candidate will have at least 2 years of professional hands-on experience with next generation sequence data analysis, scientific data programming and high-performance computing. A strong publication record of bioinformatics research in collaboration with experimental biologists is expected.

Another requirement is several years of professional experience with common programming languages/environments. This includes at least one statistical programming environment (preferentially R), one or more general-purpose scripting languages (e.g. Python, Perl or Ruby), experience with web development frameworks and relational database design. Several years of computational research experience using HPC systems will be beneficial. The incumbent should also have experience with the analysis of modern biological data sets, such as microarrays, next generation sequence data (e.g. genotyping, RNA profiling, de novo assemblies), phylogenetics and/or molecular dynamics simulations.

RESPONSIBILITIES
The Bioinformatics Facility Director manages IIGB's computational infrastructure jointly with its bioinformatics staff, including an HPC/Linux systems administrator, one or more programmers and student workers. The incumbent will be required to provide data analysis support for collaborative research activities, make findings available through presentations, contribute as a team member to scientific publications, and participate in the preparation of joint grant applications and reports. The teaching expectations include the development of a state-of-the-art workshop program on large-scale data analysis and programming.
Participation in collaborative equipment grants will be another core responsibility to secure future growth of the facility's computing resources.

TO APPLY
Review of applications will continue until the position is filled. Interested individuals should: (1) submit a curriculum vitae, (2) provide a statement of research interests, and (3) arrange to have three letters of reference sent on their behalf. All information should be emailed to: thomas.girke at ucr.edu

WEBSITE
http://facility.bioinformatics.ucr.edu/position-opening

From mictadlo at gmail.com Fri Jan 25 08:27:52 2013
From: mictadlo at gmail.com (Mic)
Date: Fri, 25 Jan 2013 23:27:52 +1000
Subject: [Biopython] goatools
Message-ID:

Hello,

Is goatools still the most up to date Python package?

Any idea how the files in goatools / data / on Github were created?

How would it be possible to use plot_go_term.py (from goatools) to create input files for Cytoscape or VisANT?

http://www.cytoscape.org/
http://visant.bu.edu/

Thank you in advance.

Mic

From steffen_moeller at gmx.de Fri Jan 25 10:11:28 2013
From: steffen_moeller at gmx.de (Steffen Möller)
Date: Fri, 25 Jan 2013 16:11:28 +0100
Subject: [Biopython] Debian Med Sprint in Kiel, Germany 23rd/24th of February
Message-ID: <20130125151128.187040@gmx.net>

Dear all,

We have our annual Debian/Ubuntu/Bio-Linux sprint on Bioinformatics again next month. Every year there are a few individuals more peripheral to the distribution attending, which usually helps us to develop our community further in some way. Anybody from BioPython interested in joining, please read through

http://wiki.debian.org/DebianMed/Meeting/Kiel2013

and just email me or add yourself. There is not anything particular that I expect from the BioPython community, except for more and better ideas on how to develop research on and with tools in computational biology further.

Registration is free. Accommodation and travel are not.
Cheers,

Steffen

From alejandro.0317 at gmail.com Fri Jan 25 20:16:42 2013
From: alejandro.0317 at gmail.com (Cristian Alejandro Rojas)
Date: Fri, 25 Jan 2013 20:16:42 -0500
Subject: [Biopython] Issue with Bio.Entrez and protein in Biopython 1.60
Message-ID:

Hi all.

I'm having an issue using Bio.Entrez to search a protein. I'm doing this:

>>> handle=Entrez.esearch(db="protein", term="insulin AND homo")
>>> record=Entrez.read(handle)
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/lib/pymodules/python2.7/Bio/Entrez/__init__.py", line 351, in read
    record = handler.read(handle)
  File "/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py", line 169, in read
    self.parser.ParseFile(handle)
  File "/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py", line 307, in endElementHandler
    raise RuntimeError(value)
RuntimeError: Search Backend failed: Database is not supported: protein

I'm having an issue with einfo() too, check this:

>>> handler=Entrez.einfo(db="protein")
>>> record=Entrez.read(handler)
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/lib/pymodules/python2.7/Bio/Entrez/__init__.py", line 351, in read
    record = handler.read(handle)
  File "/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py", line 169, in read
    self.parser.ParseFile(handle)
  File "/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py", line 285, in startElementHandler
    raise ValidationError(name)
Bio.Entrez.Parser.ValidationError: Failed to find tag 'Build' in the DTD.
To skip all tags that are not represented in the DTD, please call
Bio.Entrez.read or Bio.Entrez.parse with validate=False.

Why is the protein database not supported? Can somebody help me with this issue?

Greetings!

--
*Cristian Alejandro Rojas Quintero*
*Estudiante Ingeniería de Sistemas*
*Universidad Distrital Francisco José de Caldas*
Bogotá
- Colombia

From nicolas.joannin at gmail.com Sat Jan 26 07:42:19 2013
From: nicolas.joannin at gmail.com (Nicolas Joannin)
Date: Sat, 26 Jan 2013 21:42:19 +0900
Subject: [Biopython] Bio.Entrez.epost error with Python 3.2
Message-ID:

Hello everyone,

I am having trouble with using Bio.Entrez.epost: see details below.
I have tried converting the variables to bytes, but everything I've tried gives the same error message.

I'm guessing that this might be a bug in Biopython when used with Python 3.
If not, could you please tell me where I got this wrong, and how I can fix it?

Best regards,
Nicolas

Here are the details:
I have tried the following:

- post_h=Entrez.epost("nuccore",id="160418,160351")
- post_h=Entrez.epost(b"nuccore",id=b"160418,160351")
- post_h=Entrez.epost("nuccore".encode("utf-8"),id="160418,160351".encode("utf-8"))

>>> post_h=Entrez.epost("nuccore",id="160418,160351")
Traceback (most recent call last):
  File "", line 1, in
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/Bio/Entrez/__init__.py", line 97, in epost
    return _open(cgi, variables, post=True)
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/Bio/Entrez/__init__.py", line 436, in _open
    handle = urllib.request.urlopen(cgi, data=options)
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 138, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 367, in open
    req = meth(req)
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 1066, in do_request_
    raise TypeError("POST data should be bytes"
TypeError: POST data should be bytes or an iterable of bytes. It cannot be str.

Nicolas Joannin, Ph.D.
Bioinformatics Center
Kyoto University, Uji campus, Japan

From mjldehoon at yahoo.com Sat Jan 26 22:28:29 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 26 Jan 2013 19:28:29 -0800 (PST)
Subject: [Biopython] Issue with Bio.Entrez and protein in Biopython 1.60
In-Reply-To:
Message-ID: <1359257309.3476.YahooMailClassic@web164004.mail.gq1.yahoo.com>

Hi Cristian,

--- On Fri, 1/25/13, Cristian Alejandro Rojas wrote:
> I'm having an issue using Bio.Entrez to search a protein. I'm doing this:
>
> >>> handle=Entrez.esearch(db="protein", term="insulin AND homo")
> >>> record=Entrez.read(handle)
> Traceback (most recent call last):

It works for me now, so this may have been a temporary glitch at the E-Utilities:

>>> from Bio import Entrez
>>> handle=Entrez.esearch(db="protein", term="insulin AND homo")
>>> record = Entrez.read(handle)
>>> print record
{u'Count': '3956', u'RetMax': '20', u'IdList': ['443497968', '443497970', '443428106', '443428104', '443428107', '443428105', '83700231', '21361212', '4505143', '66472382', '443287675', '443287677', '419636284', '375298744', '341940804', '341940253', '332278248', '317373577', '317373571', '317373494'], u'TranslationStack': [{u'Count': '32050', u'Field': 'All Fields', u'Term': 'insulin[All Fields]', u'Explode': 'N'}, {u'Count': '0', u'Field': 'Organism', u'Term': '"Homo"[Organism]', u'Explode': 'N'}, {u'Count': '10253279', u'Field': 'All Fields', u'Term': 'homo[All Fields]', u'Explode': 'N'}, 'OR', 'GROUP', 'AND'], u'TranslationSet': [{u'To': '"Homo"[Organism] OR homo[All Fields]', u'From': 'homo'}], u'RetStart': '0', u'QueryTranslation': 'insulin[All Fields] AND ("Homo"[Organism] OR homo[All Fields])'}

> I'm having an issue with einfo() too, check this:
>
> >>> handler=Entrez.einfo(db="protein")
> >>> record=Entrez.read(handler)
> Traceback (most recent call last):
>   File "", line 1, in
>   File "/usr/lib/pymodules/python2.7/Bio/Entrez/__init__.py", line 351, in read
>     record = handler.read(handle)
>   File "/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py", line 169, in read
>     self.parser.ParseFile(handle)
>   File "/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py", line 285, in startElementHandler
>     raise ValidationError(name)
> Bio.Entrez.Parser.ValidationError: Failed to find tag 'Build' in the DTD.
> To skip all tags that are not represented in the DTD, please call
> Bio.Entrez.read or Bio.Entrez.parse with validate=False.
>

This error message means exactly what it says. To see what the E-Utilities returns, try

>>> handle = Entrez.einfo(db="protein")
>>> print handle.read()
protein
Protein
Protein sequence record
Build130126-0031m.1
...

If you look at the DTD file at http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eInfo_020511.dtd, you'll see that "Build" is not mentioned anywhere. But it is present in the XML file. The error message tells you that the XML file is not consistent with its DTD, but that you can ignore such tags by using validate=False:

>>> handle = Entrez.einfo(db="protein")
>>> record = Entrez.read(handle, validate=False)
>>> print record
{u'DbInfo': {u'Count': '73259352', u'LastUpdate': '2013/01/26 08:18', u'MenuName': 'Protein', u'Description': 'Protein sequence record', u'LinkList': [{u'DbTo': 'bioproject', u'Menu': 'BioProject Links', u'Name': 'protein_bioproject', u'Description': 'Proteins related to BioProjects'}, {u'DbTo': 'biosystems', u'Menu': 'BioSystem Links', u'Name': 'protein_biosystems', u'Description': 'Pathways and other biosystems containing the current prot...

Best,
-Michiel.

From mjldehoon at yahoo.com Sat Jan 26 23:06:10 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 26 Jan 2013 20:06:10 -0800 (PST)
Subject: [Biopython] Bio.Entrez.epost error with Python 3.2
In-Reply-To:
Message-ID: <1359259570.36220.YahooMailClassic@web164003.mail.gq1.yahoo.com>

For some reason Entrez.epost switched from post=False to post=True in the call to _open in Bio.Entrez.
I don't know why; the other Entrez functions all use post=False, and if I use post=False in Entrez.epost it seems to work fine. The switch from post=False to post=True was made in this commit; Peter, do you remember why we switched to post=True?

https://github.com/biopython/biopython/commit/c928057b6c811c8959daf806ee6159eb09e0928f#Bio/Entrez/__init__.py

Best,
-Michiel.

From mjldehoon at yahoo.com Sat Jan 26 23:41:54 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 26 Jan 2013 20:41:54 -0800 (PST)
Subject: [Biopython] Bio.Entrez.epost error with Python 3.2
In-Reply-To: <1359259570.36220.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <1359261714.79007.YahooMailClassic@web164006.mail.gq1.yahoo.com>

Looking at this some more, I found this on the mailing list explaining why we are using post=True:

http://lists.open-bio.org/pipermail/biopython/2009-May/005152.html

This page provides some explanation on urllib.parse.urlencode in Python3:

http://docs.python.org/3/library/urllib.request.html#urllib-examples

Best,
-Michiel.

From alejandro.0317 at gmail.com Sun Jan 27 01:18:17 2013
From: alejandro.0317 at gmail.com (Cristian Alejandro Rojas Quintero)
Date: Sun, 27 Jan 2013 01:18:17 -0500
Subject: [Biopython] Issue with Bio.Entrez and protein in Biopython 1.60
In-Reply-To: <1359257309.3476.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <1359257309.3476.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID: <5104C6A9.5070507@gmail.com>

Hi Michiel,

It was not a glitch at E-utilities. I had to download and compile Biopython from the official website (previously I was using the Biopython from the Ubuntu repositories). After this it works perfectly.

Thank you

From alejandro.0317 at gmail.com Sun Jan 27 01:28:35 2013
From: alejandro.0317 at gmail.com (Cristian Alejandro Rojas Quintero)
Date: Sun, 27 Jan 2013 01:28:35 -0500
Subject: [Biopython] Biopython 1.60 and ExPASy Swiss-Prot search functions
Message-ID: <5104C913.7000300@gmail.com>

Hi all,

I'm trying to search Swiss-Prot records on ExPASy with sprot_search_ful and sprot_search_de, but it appears that the ExPASy server has been moved. I tried to read the help from the modules and all that I can find is that the cgi url is "http://www.expasy.ch/cgi-bin/sprot-search-ful" or "http://www.expasy.ch/cgi-bin/sprot-search-de". When I access those URLs with my web browser, an error message appears indicating that the site to search is www.uniprot.org. Maybe the functions won't work anymore because the server has been moved?
Or am I doing something bad?

These are the steps to reproduce the error.

>>> handle=ExPASy.sprot_search_ful("Insulin")
>>> print handle.read()

As you can see the status code for http request was 302.

BTW: The release date of the new server was Jan 9, 2013

Thank you.

From nicolas.joannin at gmail.com Sun Jan 27 02:34:48 2013
From: nicolas.joannin at gmail.com (Nicolas Joannin)
Date: Sun, 27 Jan 2013 16:34:48 +0900
Subject: [Biopython] Bio.Entrez.epost error with Python 3.2
In-Reply-To: <1359261714.79007.YahooMailClassic@web164006.mail.gq1.yahoo.com>
References: <1359259570.36220.YahooMailClassic@web164003.mail.gq1.yahoo.com> <1359261714.79007.YahooMailClassic@web164006.mail.gq1.yahoo.com>
Message-ID:

Hi Michiel,

Thank you for your help!
I'm still a bit confused about how to work around this problem...
I tried adding a post=False, and different variants of this, in the epost call, but I still get the same error. So, I'm guessing that it is actually in the __init__.py file that I should modify it, but I don't know if that is what you are suggesting or not (considering your second email).

I am actually trying to retrieve a very large dataset from Genbank (all records for a specific species, e.g. ~90000 records). I initially tried with a simple esearch+usehistory and then using efetch to retrieve the results in batches of 500 records, but I encountered a strange error in xml format (that actually didn't break the process) with this information: "Unable to obtain query #1".

Googling the problem, I found this thread:
http://comments.gmane.org/gmane.comp.python.bio.general/6962

Their solution involves using epost, which brings us to my initial question :)

I'd really appreciate if you could let me know:

1. How best to work around the epost problem...
2. If you know of any alternate solution to the "Unable to obtain query #1" error

Thanks in advance!

Best regards,
Nicolas

Nicolas Joannin, Ph.D.
Bioinformatics Center
Kyoto University, Uji campus, Japan

From p.j.a.cock at googlemail.com Sun Jan 27 14:04:09 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 27 Jan 2013 19:04:09 +0000
Subject: [Biopython] Bio.Entrez.epost error with Python 3.2
In-Reply-To: <1359261714.79007.YahooMailClassic@web164006.mail.gq1.yahoo.com>
References: <1359259570.36220.YahooMailClassic@web164003.mail.gq1.yahoo.com> <1359261714.79007.YahooMailClassic@web164006.mail.gq1.yahoo.com>
Message-ID:

On Sun, Jan 27, 2013 at 4:41 AM, Michiel de Hoon wrote:
> Looking at this some more, I found this on the mailing list explaining why we are using post=True:
>
> http://lists.open-bio.org/pipermail/biopython/2009-May/005152.html

Yes, we use post (as the name epost suggests) to upload a long list of IDs without the long URL limitations faced if using an HTTP get.

> This page provides some explanation on urllib.parse.urlencode in Python3:
>
> http://docs.python.org/3/library/urllib.request.html#urllib-examples
>

Does this mean we have a subtle Python 2 vs 3 problem with epost? Time for another unit test in test_Entrez_online.py which currently only tests einfo and efetch - we should have esearch, epost, espell and esummary in there too I think.
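
For anyone following the thread, here is a minimal sketch of the Python 3 behaviour behind the "POST data should be bytes" traceback. It is only an illustration and not the code in Bio/Entrez/__init__.py: the helper name build_post_request is hypothetical, and the epost URL is an assumption about the endpoint being called. The point is that urllib.parse.urlencode returns a str, so under Python 3 the result has to be encoded to bytes before it is used as POST data.

```python
import urllib.parse
import urllib.request


def build_post_request(url, params):
    """Hypothetical helper (not part of Bio.Entrez): build a POST request.

    urlencode() returns str; Python 3's urllib rejects str POST data,
    so the encoded query is converted to bytes explicitly.
    """
    data = urllib.parse.urlencode(params).encode("ascii")
    return urllib.request.Request(url, data=data)


# Assumed endpoint URL, used here only to construct the request object.
# Nothing is sent over the network in this sketch.
req = build_post_request(
    "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi",
    {"db": "nuccore", "id": "160418,160351"},
)
print(req.get_method())  # "POST", because data is not None
print(type(req.data))
```

Passing the resulting request to urllib.request.urlopen would then perform the upload. Encoding the individual arguments to bytes before calling epost, as tried earlier in the thread, would not be expected to help, since the str is produced later by urlencode.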
Peter From p.j.a.cock at googlemail.com Sun Jan 27 14:07:11 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 27 Jan 2013 19:07:11 +0000 Subject: [Biopython] Issue with Bio.Entrez and protein in Biopython 1.60 In-Reply-To: <5104C6A9.5070507@gmail.com> References: <1359257309.3476.YahooMailClassic@web164004.mail.gq1.yahoo.com> <5104C6A9.5070507@gmail.com> Message-ID: On Sun, Jan 27, 2013 at 6:18 AM, Cristian Alejandro Rojas Quintero wrote: > Hi Michiel, > > It was not a glitch at E-utilities, I had to download and compile Biopython > from official web (previously was the Biopython from Ubuntu repositories). > After of this it works perfectly. > > Thank you If you have Biopython 1.60 from the Ubuntu repository, and replaced it with Biopython 1.60 compiled from source, that shouldn't have made any difference. I think a temporary problem at the NCBI is more likely as Michiel suggested. But either way, it is working now :) Peter From p.j.a.cock at googlemail.com Sun Jan 27 14:15:02 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 27 Jan 2013 19:15:02 +0000 Subject: [Biopython] Biopython 1.60 and ExPASy Swiss-Prot search functions In-Reply-To: <5104C913.7000300@gmail.com> References: <5104C913.7000300@gmail.com> Message-ID: On Sun, Jan 27, 2013 at 6:28 AM, Cristian Alejandro Rojas Quintero wrote: > Hi all, > > I'm trying to make a search of Swiss-Prot records on ExPASy with > sprot_search_ful and sprot_search_de , But appears like the ExPASy server > has been moved, I tried to read the help from the modules and all that I can > find is that the cgi url is "http://www.expasy.ch/cgi-bin/sprot-search-ful" > or "http://www.expasy.ch/cgi-bin/sprot-search-de". I'm trying to access in > that URLs with my web browser but appears a error message indicating that > the site to search is www.uniprot.org. Maybe the functions won't work > anymore because the server has been moved? > > Or am I doing something bad? 
>
> These are the steps to reproduce the error.
> >>> handle=ExPASy.sprot_search_ful("Insulin")
> >>> print handle.read()
> As you can see the status code for the HTTP request was 302. BTW: The release
> date of the new server was Jan 9, 2013
>
> Thank you.

It looks like ExPASy have retired some more URLs - they did this and broke our get_sprot_raw function three years ago:

https://github.com/biopython/biopython/commit/6689bf8657d9515965d63f9c77e6348233472046

If you can work out what the new URLs are this should be easy to fix - last time they had a table at http://www.expasy.ch/expasy_urls.html but that page doesn't exist any more :(

In this case, the HTTP 302 redirect tells us to use:
http://www.uniprot.org/uniprot?query=Insulin&S=on

Would you like to try updating Bio/ExPASy/__init__.py (and making a pull request on GitHub)?

Regards,

Peter

From email2ants at gmail.com Tue Jan 29 16:20:37 2013
From: email2ants at gmail.com (Anthony Underwood)
Date: Tue, 29 Jan 2013 21:20:37 +0000
Subject: [Biopython] Local blast error on mac
Message-ID:

Running local BLAST+ through Biopython on a Mac I get the following error:

Traceback (most recent call last):
  File "local_blast.py", line 31, in <module>
    local_blast_function.local_blast_function('gene_id_sequences.fasta', 'gene_id_blast_output.txt')
  File "/Users/sophia/legionella/scripts/local_blast_function.py", line 12, in local_blast_function
    stdout, stderr = blastp_cline()
  File "/Users/sophia/.pythonbrew/pythons/Python-2.7/lib/python2.7/site-packages/biopython-1.60-py2.7-macosx-10.6-x86_64.egg/Bio/Application/__init__.py", line 437, in __call__
    stdout_str, stderr_str)
Bio.Application.ApplicationError: Command '/usr/local/ncbi/blast/bin/blastp -out gene_id_blast_output.txt -outfmt 5 -query gene_id_sequences.fasta -db /Volumes/DataRAID/blast_databases/uniprot_sprot -evalue 0.001' returned non-zero exit status -11

If I run the command on its own

"/usr/local/ncbi/blast/bin/blastp -out gene_id_blast_output.txt -outfmt 5 -query gene_id_sequences.fasta
-db /Volumes/DataRAID/blast_databases/uniprot_sprot -evalue 0.001"

This runs fine and it produces XML output. However through biopython it exits with a strange -11 error. I've tried debugging the biopython code but to no avail - no stderr is produced.

Does anybody have any suggestions as to what I might try?

Thanks

Anthony

From p.j.a.cock at googlemail.com Tue Jan 29 16:33:41 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 29 Jan 2013 21:33:41 +0000
Subject: [Biopython] Local blast error on mac
In-Reply-To:
References:
Message-ID:

On Tue, Jan 29, 2013 at 9:20 PM, Anthony Underwood wrote:
> Running local blast+ through biopython on a mac I get the following error
>
> Traceback (most recent call last):
> ...
> Bio.Application.ApplicationError: Command '/usr/local/ncbi/blast/bin/blastp -out gene_id_blast_output.txt -outfmt 5 -query gene_id_sequences.fasta -db /Volumes/DataRAID/blast_databases/uniprot_sprot -evalue 0.001' returned non-zero exit status -11

See http://docs.python.org/2/library/subprocess.html
"A negative value -N indicates that the child was terminated by signal N (Unix only)."

This means blastp died with signal 11, typically a segmentation fault.

> If I run the command on its own
>
> "/usr/local/ncbi/blast/bin/blastp -out gene_id_blast_output.txt -outfmt 5 -query gene_id_sequences.fasta -db /Volumes/DataRAID/blast_databases/uniprot_sprot -evalue 0.001"
>
> This runs fine and it produces XML output.

Very strange. There is nothing odd there with slashes, spaces or quotes (the usual suspects).

Is there anything else happening in the Python code which might interact with BLAST? For example changing directory, changing environment variables, or keeping a handle open to one of the files BLAST is using?

You can check that by writing a tiny Python script which just sets up the BLAST wrapper and exec

How long does it take to run outside Python?
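To make the subprocess rule quoted above concrete, here is a small sketch (Python 3, standard library only) that turns an exit status such as -11 into a readable description. The function name describe_exit is hypothetical, and the signal names come from the local platform's signal table, so the exact name for a given number is an assumption about a Unix-like system.

```python
import signal


def describe_exit(returncode):
    """Describe a subprocess return code (hypothetical helper).

    On Unix, a negative value -N means the child was terminated by signal N.
    """
    if returncode < 0:
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return "terminated by signal %d (%s)" % (signum, name)
    return "exited with status %d" % returncode


# The exit status reported in the ApplicationError above.
print(describe_exit(-11))
```

On Linux and macOS signal 11 is SIGSEGV, which matches the segmentation fault interpretation given in the reply.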
Is there enough time to monitor the task list when run via Python, and perhaps get some clue like tight memory or something? Other than that, I'm not sure off hand what to suggest. Peter From email2ants at gmail.com Tue Jan 29 16:38:40 2013 From: email2ants at gmail.com (Anthony Underwood) Date: Tue, 29 Jan 2013 21:38:40 +0000 Subject: [Biopython] Local blast error on mac In-Reply-To: References: Message-ID: <83845DAA-FA5C-4DC2-8EFF-3EC1B6A53039@gmail.com> Thanks for the quick reply Peter. It's possible that the query file was opened to write the sequence to and then not closed (the code is on my work computer so I can not check right now). I will try changing this and report back. Thanks again, Anthony On 29 Jan 2013, at 21:33, Peter Cock wrote: > On Tue, Jan 29, 2013 at 9:20 PM, Anthony Underwood wrote: >> Running local blast+ through biopython on a mac I get the following error >> >> Traceback (most recent call last): >> ... >> Bio.Application.ApplicationError: Command '/usr/local/ncbi/blast/bin/blastp -out gene_id_blast_output.txt -outfmt 5 -query gene_id_sequences.fasta -db /Volumes/DataRAID/blast_databases/uniprot_sprot -evalue 0.001' returned non-zero exit status -11 > > See http://docs.python.org/2/library/subprocess.html > "A negative value -N indicates that the child was terminated by signal > N (Unix only)." > > This means blastp died with signal 11, typically a segmentation fault. > >> If I run the command on its own >> >> "/usr/local/ncbi/blast/bin/blastp -out gene_id_blast_output.txt -outfmt 5 -query gene_id_sequences.fasta -db /Volumes/DataRAID/blast_databases/uniprot_sprot -evalue 0.001" >> >> This runs fine and it produces XML output. > > Very strange. There is nothing odd there with slashes, spaces or quotes > (the usual suspects). > > Is there anything else happening in the Python code which might > interact with BLAST? For example changing directory, changing > environment variables, or keeping a handle open to one of the files > BLAST is using? 
> > You can check that by writing a tiny Python script which just > sets up the BLAST wrapper and exec > > How long does it take to run outside Python? Is there enough time > to monitor the task list when run via Python, and perhaps get some > clue like tight memory or something? > > Other than that, I'm not sure off hand what to suggest. > > Peter From petra.kubincova at gmail.com Wed Jan 30 16:50:50 2013 From: petra.kubincova at gmail.com (=?ISO-8859-1?Q?Petra_Kubincov=E1?=) Date: Wed, 30 Jan 2013 22:50:50 +0100 Subject: [Biopython] Fwd: Bug in bgzf module In-Reply-To: References: Message-ID: Hello, recently I have installed and tried out biopython 1.60, especially bgzf module. I have discovered that calling method "tell()" of object of bgzf.BgzfWriter type raises this error: Traceback (most recent call last): File "", line 1, in File "/usr/local/lib/python2.6/dist-packages/Bio/bgzf.py", line 743, in tell return make_virtual_offset(self.handle.tell(), len(self._buffer)) AttributeError: 'BgzfWriter' object has no attribute 'handle' I've checked out source code of bgzf module. Source code of error-raising method "tell()" looks like this: def tell(self): """Returns a BGZF 64-bit virtual offset.""" return make_virtual_offset(self.handle.tell(), len(self._buffer)) The problem is that BgzfWriter does not have variable called "handle", only "_handle". So (IMHO) all that needs to be done to fix this bug is change "self.handle.tell()" to "self._handle.tell()". Cheers, Petra Kubincova From p.j.a.cock at googlemail.com Wed Jan 30 17:12:39 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 30 Jan 2013 22:12:39 +0000 Subject: [Biopython] Fwd: Bug in bgzf module In-Reply-To: References: Message-ID: On Wed, Jan 30, 2013 at 9:50 PM, Petra Kubincov? wrote: > Hello, > > recently I have installed and tried out biopython 1.60, especially bgzf > module. 
I have discovered that calling method "tell()" of object of > bgzf.BgzfWriter type raises this error: > > Traceback (most recent call last): > File "", line 1, in > File "/usr/local/lib/python2.6/dist-packages/Bio/bgzf.py", line 743, in > tell > return make_virtual_offset(self.handle.tell(), len(self._buffer)) > AttributeError: 'BgzfWriter' object has no attribute 'handle' > > I've checked out source code of bgzf module. Source code of error-raising > method "tell()" looks like this: > > def tell(self): > """Returns a BGZF 64-bit virtual offset.""" > return make_virtual_offset(self.handle.tell(), len(self._buffer)) > > The problem is that BgzfWriter does not have variable called "handle", only > "_handle". So (IMHO) all that needs to be done to fix this bug is change > "self.handle.tell()" to "self._handle.tell()". > > Cheers, > Petra Kubincova Hi Petra, It is nice to know people are using this relatively new code :) Thank you for pointing that out, that is the correct fix: https://github.com/biopython/biopython/commit/2a1be2f3e9b731fa05cc4ad7a01a67866155827c We should add a unit test for this too, in Tests/test_bgzf.py - is that something you'd like to try and do? If not I'll try to remember to add something myself. The reason I implemented the BgzfWriter tell method was for the use-case of writing a BGZF compressed file while also recording an index. I'm curious if that's what you are doing, or if you had another purpose? Thanks, Peter From petra.kubincova at gmail.com Thu Jan 31 17:57:29 2013 From: petra.kubincova at gmail.com (=?ISO-8859-1?Q?Petra_Kubincov=E1?=) Date: Thu, 31 Jan 2013 23:57:29 +0100 Subject: [Biopython] Fwd: Bug in bgzf module In-Reply-To: References: Message-ID: Hi Peter, well, I don't have much experience with unit tests but I will try to come up with something. :) I'll let you know if I won't succeed. And yes, recording an index is exactly the thing I need to do. 
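For readers unfamiliar with the index use-case mentioned in this thread: a BGZF virtual offset packs two numbers into one 64-bit integer, namely the file offset at which a compressed block starts (upper 48 bits) and the position within that block's decompressed data (lower 16 bits). The sketch below is a stand-alone illustration with hypothetical function names; it is not the code from Bio.bgzf.

```python
def pack_virtual_offset(block_start_offset, within_block_offset):
    """Combine a block start and an in-block position into one integer."""
    if not 0 <= within_block_offset < 2 ** 16:
        raise ValueError("within-block offset must fit in 16 bits")
    if not 0 <= block_start_offset < 2 ** 48:
        raise ValueError("block start offset must fit in 48 bits")
    return (block_start_offset << 16) | within_block_offset


def unpack_virtual_offset(virtual_offset):
    """Split a virtual offset back into (block start, in-block position)."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


# A record starting 10 bytes into the block that begins at file offset 100000.
voffset = pack_virtual_offset(100000, 10)
print(voffset, unpack_virtual_offset(voffset))
```

Recording such a value for each record while writing is what makes it possible to jump straight back to that record later without decompressing the whole file.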
(I am currently working on interval mapping tool for multiple whole-genome alignments, where I need to read .maf file, write preprocessed data into a compressed file and then work just with index for the compressed file and the compressed file itself to do the mapping.) Have a nice day, Petra On Wed, Jan 30, 2013 at 11:12 PM, Peter Cock wrote: > On Wed, Jan 30, 2013 at 9:50 PM, Petra Kubincov? > wrote: > > Hello, > > > > recently I have installed and tried out biopython 1.60, especially bgzf > > module. I have discovered that calling method "tell()" of object of > > bgzf.BgzfWriter type raises this error: > > > > Traceback (most recent call last): > > File "", line 1, in > > File "/usr/local/lib/python2.6/dist-packages/Bio/bgzf.py", line 743, in > > tell > > return make_virtual_offset(self.handle.tell(), len(self._buffer)) > > AttributeError: 'BgzfWriter' object has no attribute 'handle' > > > > I've checked out source code of bgzf module. Source code of error-raising > > method "tell()" looks like this: > > > > def tell(self): > > """Returns a BGZF 64-bit virtual offset.""" > > return make_virtual_offset(self.handle.tell(), len(self._buffer)) > > > > The problem is that BgzfWriter does not have variable called "handle", > only > > "_handle". So (IMHO) all that needs to be done to fix this bug is change > > "self.handle.tell()" to "self._handle.tell()". > > > > Cheers, > > Petra Kubincova > > Hi Petra, > > It is nice to know people are using this relatively new code :) > > Thank you for pointing that out, that is the correct fix: > > https://github.com/biopython/biopython/commit/2a1be2f3e9b731fa05cc4ad7a01a67866155827c > > We should add a unit test for this too, in Tests/test_bgzf.py - > is that something you'd like to try and do? If not I'll try to > remember to add something myself. > > The reason I implemented the BgzfWriter tell method was for > the use-case of writing a BGZF compressed file while also > recording an index. 
I'm curious if that's what you are doing, > or if you had another purpose? > > Thanks, > > Peter > From p.j.a.cock at googlemail.com Mon Jan 7 18:55:25 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 7 Jan 2013 18:55:25 +0000 Subject: [Biopython] Dropping Python 2.5 and Jython 2.5 support? In-Reply-To: References: Message-ID: On Mon, Oct 22, 2012 at 6:17 PM, Peter Cock wrote: > Dear Biopythoneers, > > Would anyone object to us preparing to drop support for Python 2.5 and > Jython 2.5, perhaps after the next Biopython release? > > To reassure those of you using Jython, we'd wait until Jython 2.7 is out > first. Jython 2.7 is already in alpha, and brings support for C Python 2.7 > language features. > > Thanks, > > Peter Hello all, Having recently back-ported some Python 3 code with a C extension to Python 2.6 and 2.7, I can now more clearly appreciate the benefits dropping Python 2.5 support has for writing code for both Python 2 and 3 - and am keen to be able to exploit this for Biopython. Given no major objections to the email I sent round in October last year (thank you for your input Nathan), we will press ahead with phasing out support for Python 2.5, provisionally supporting it in the forthcoming Biopython 1.61 and at least one more release (which would mean Biopython 1.62 due Summer 2013). https://github.com/biopython/biopython/commit/3f17f75b320fb6624d332809ef07314bab97477c My only significant concern is for Jython users, since this will also mean dropping support for Jython 2.5 (which implements the Python 2.5 language). The replacement Jython 2.7 is still only at the alpha release stage. Regards, Peter From thomas.girke at ucr.edu Wed Jan 9 21:10:05 2013 From: thomas.girke at ucr.edu (Thomas Girke) Date: Wed, 9 Jan 2013 13:10:05 -0800 Subject: [Biopython] Bioinformatics Position Opening Message-ID: <20130109211005.GA3819@dhcp-138-23-59-108.dyn.ucr.edu> Dear List, Below is an announcement for a Ph.D. level bioinformatics position at UCR. 
It is a long-term position with a competitive salary in a vibrant research environment with cutting edge high-performance compute and genomics facilities. Application instructions are given in the announcement. Potential candidates are welcome to email me their questions about this position directly, e.g. prior or after submitting a formal application. Best, Thomas -- Thomas Girke Associate Professor of Bioinformatics Institute for Integrative Genome Biology (IIGB) 1207F Genomics Building University of California Riverside, CA 92521 E-mail: thomas.girke at ucr.edu Ph: 951-905-5232 Fax: 951-827-5155 POSITION ANNOUNCEMENT POSITION The Institute for Integrative Genome Biology (IIGB) at the University of California, Riverside is seeking a Ph.D. level bioinformatician to manage its bioinformatics research activities and computing facility. TITLE/RANK Bioinformatics Facility Director. Salary will be competitive and commensurate with accomplishments. LOCATION University of California, Riverside. BACKGROUND Successful candidates will join an innovative and multidisciplinary Institute for Integrative Genome Biology (IIGB) that connects theoretical and experimental researchers from different departments in Life, Physical and Mathematical Sciences, Medicine, Engineering and various campus based Centers. The IIGB is organized around a 10,000 sq.ft. suite of Instrumentation Facilities that serve as a centralized, shared-use resource for faculty, staff and students, offering advanced tools in bioinformatics, microscopy, proteomics and genomics. Its bioinformatic component is equipped with an extensive high-performance compute (HPC) infrastructure. QUALIFICATIONS Applicants must have a Ph.D. 
from a recognized university in bioinformatics; or combined degrees in computer science and a biological science; or a degree in either computer science combined with relevant experience in biological science; or a degree in a biological science combined with relevant experience in computer science. The successful candidate will have at least 2 years of professional hands-on experience with next generation sequence data analysis, scientific data programming and high-performance computing. A strong publication record of bioinformatics research in collaboration with experimental biologists is expected. Another requirement is several years of professional experience with common programming languages/environments. This includes at least one statistical programming environment (preferentially R), one or more general-purpose scripting languages (e.g. Python, Perl or Ruby), experience with web development frameworks and relational database design. Several years of computational research experience using HPC systems will be beneficial. The incumbent should also have experience with the analysis of modern biological data sets, such as microarrays, next generation sequence data (e.g. genotyping, RNA profiling, de novo assemblies), phylogenetics and/or molecular dynamics simulations. RESPONSIBILITIES The Bioinformatics Facility Director manages IIGB's computational infrastructure jointly with its bioinformatics staff, including an HPC/Linux systems administrator, one or more programmers and students workers. The incumbent will be required to provide data analysis support to collaborative research activities and make available findings through presentations and contribute as team member to scientific publications as well as participate in the preparation of joint grant applications and reports. The teaching expectations include the development of a state-of-the-art workshop program on large-scale data analysis and programming. 
Participation in collaborative equipment grants will be another core responsibility to secure future growth of the facility's computing resources.

TO APPLY
Review of applications will continue until the position is filled. Interested individuals should: (1) submit a curriculum vitae, (2) provide a statement of research interests, and (3) arrange to have three letters of reference sent on their behalf. All information should be emailed to: thomas.girke at ucr.edu

WEBSITE
http://facility.bioinformatics.ucr.edu/position-opening

From mictadlo at gmail.com Fri Jan 25 13:27:52 2013
From: mictadlo at gmail.com (Mic)
Date: Fri, 25 Jan 2013 23:27:52 +1000
Subject: [Biopython] goatools
Message-ID:

Hello,
Is goatools still the most up-to-date Python package? Any idea how the files in goatools/data/ on GitHub were created?

How would it be possible to use plot_go_term.py (from goatools) to create input files for Cytoscape or VisANT?
http://www.cytoscape.org/
http://visant.bu.edu/

Thank you in advance.

Mic

From steffen_moeller at gmx.de Fri Jan 25 15:11:28 2013
From: steffen_moeller at gmx.de (=?iso-8859-1?Q?=22Steffen_M=F6ller=22?=)
Date: Fri, 25 Jan 2013 16:11:28 +0100
Subject: [Biopython] Debian Med Sprint in Kiel, Germany 23rd/24th of February
Message-ID: <20130125151128.187040@gmx.net>

Dear all,

We have our annual Debian/Ubuntu/Bio-Linux sprint on Bioinformatics again next month. Every year there are a few individuals more peripheral to the distribution attending, which usually helps us to develop our community further in some way. Anybody from Biopython interested in joining, please read through http://wiki.debian.org/DebianMed/Meeting/Kiel2013 and just email me or add yourself.

There is nothing in particular that I expect from the Biopython community, except for more and better ideas on how to develop research on and with tools in computational biology further.

Registration is free. Accommodation and travel are not.
Cheers,

Steffen

From alejandro.0317 at gmail.com Sat Jan 26 01:16:42 2013
From: alejandro.0317 at gmail.com (Cristian Alejandro Rojas)
Date: Fri, 25 Jan 2013 20:16:42 -0500
Subject: [Biopython] Issue with Bio.Entrez and protein in Biopython 1.60
Message-ID:

Hi all.

I'm having an issue using Bio.Entrez to search for a protein. I'm doing this:

>>> handle=Entrez.esearch(db="protein", term="insulin AND homo")
>>> record=Entrez.read(handle)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/Bio/Entrez/__init__.py", line 351, in read
    record = handler.read(handle)
  File "/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py", line 169, in read
    self.parser.ParseFile(handle)
  File "/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py", line 307, in endElementHandler
    raise RuntimeError(value)
RuntimeError: Search Backend failed: Database is not supported: protein

I'm having an issue with einfo() too, look at this:

>>> handler=Entrez.einfo(db="protein")
>>> record=Entrez.read(handler)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/Bio/Entrez/__init__.py", line 351, in read
    record = handler.read(handle)
  File "/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py", line 169, in read
    self.parser.ParseFile(handle)
  File "/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py", line 285, in startElementHandler
    raise ValidationError(name)
Bio.Entrez.Parser.ValidationError: Failed to find tag 'Build' in the DTD. To skip all tags that are not represented in the DTD, please call Bio.Entrez.read or Bio.Entrez.parse with validate=False.

Why is the protein database not supported? Can somebody help me with this issue?

Greetings!

--
*Cristian Alejandro Rojas Quintero*
*Systems Engineering Student*
*Universidad Distrital Francisco José de Caldas*
Bogotá
- Colombia From nicolas.joannin at gmail.com Sat Jan 26 12:42:19 2013 From: nicolas.joannin at gmail.com (Nicolas Joannin) Date: Sat, 26 Jan 2013 21:42:19 +0900 Subject: [Biopython] Bio.Entrez.epost error with Python 3.2 Message-ID: Hello everyone, I am having trouble with using Bio.Entrez.epost: see details below. I have tried converting the variables to bytes, but everything I've tried gives the same error message. I'm guessing that this might be a bug in biopython when used with Python 3. If not, could you please tell me where I got this wrong, and how I can fix it? Best regards, Nicolas Here are the details: I have tried the following: - post_h=Entrez.epost("nuccore",id="160418,160351") - post_h=Entrez.epost(b"nuccore",id=b"160418,160351") - post_h=Entrez.epost("nuccore".encode("utf-8"),id="160418,160351".encode("utf-8") >>> post_h=Entrez.epost("nuccore",id="160418,160351") Traceback (most recent call last): File "", line 1, in File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/Bio/Entrez/__init__.py", line 97, in epost return _open(cgi, variables, post=True) File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/Bio/Entrez/__init__.py", line 436, in _open handle = urllib.request.urlopen(cgi, data=options) File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 138, in urlopen return opener.open(url, data, timeout) File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 367, in open req = meth(req) File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 1066, in do_request_ raise TypeError("POST data should be bytes" TypeError: POST data should be bytes or an iterable of bytes. It cannot be str. Nicolas Joannin, Ph.D. 
Bioinformatics Center Kyoto University, Uji campus, Japan From mjldehoon at yahoo.com Sun Jan 27 03:28:29 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 26 Jan 2013 19:28:29 -0800 (PST) Subject: [Biopython] Issue with Bio.Entrez and protein in Biopython 1.60 In-Reply-To: Message-ID: <1359257309.3476.YahooMailClassic@web164004.mail.gq1.yahoo.com> Hi Cristian, --- On Fri, 1/25/13, Cristian Alejandro Rojas wrote: > I'm having a issue using Bio.Entrez to search a protein. I'm > doing? this: > > >>> handle=Entrez.esearch(db="protein", > term="insulin AND homo") > >>> record=Entrez.read(handle) > Traceback (most recent call last): It works for me now, so this may have been a temporary glitch at the E-Utilities: >>> from Bio import Entrez >>> handle=Entrez.esearch(db="protein", term="insulin AND homo") >>> record = Entrez.read(handle) >>> print record {u'Count': '3956', u'RetMax': '20', u'IdList': ['443497968', '443497970', '443428106', '443428104', '443428107', '443428105', '83700231', '21361212', '4505143', '66472382', '443287675', '443287677', '419636284', '375298744', '341940804', '341940253', '332278248', '317373577', '317373571', '317373494'], u'TranslationStack': [{u'Count': '32050', u'Field': 'All Fields', u'Term': 'insulin[All Fields]', u'Explode': 'N'}, {u'Count': '0', u'Field': 'Organism', u'Term': '"Homo"[Organism]', u'Explode': 'N'}, {u'Count': '10253279', u'Field': 'All Fields', u'Term': 'homo[All Fields]', u'Explode': 'N'}, 'OR', 'GROUP', 'AND'], u'TranslationSet': [{u'To': '"Homo"[Organism] OR homo[All Fields]', u'From': 'homo'}], u'RetStart': '0', u'QueryTranslation': 'insulin[All Fields] AND ("Homo"[Organism] OR homo[All Fields])'} > I'm having a issue with einfo() too, check at this: > > >>> handler=Entrez.einfo(db="protein") > >>> record=Entrez.read(handler) > Traceback (most recent call last): > ? File "", line 1, in > ? File > "/usr/lib/pymodules/python2.7/Bio/Entrez/__init__.py", line > 351, in > read > ? ? 
record = handler.read(handle)
>   File "/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py", line 169, in read
>     self.parser.ParseFile(handle)
>   File "/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py", line 285, in startElementHandler
>     raise ValidationError(name)
> Bio.Entrez.Parser.ValidationError: Failed to find tag 'Build' in the DTD.
> To skip all tags that are not represented in the DTD, please call
> Bio.Entrez.read or Bio.Entrez.parse with validate=False.

This error message means exactly what it says. To see what the E-Utilities return, try

>>> handle = Entrez.einfo(db="protein")
>>> print handle.read()
protein Protein Protein sequence record Build130126-0031m.1 ...

If you look at the DTD file at http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eInfo_020511.dtd, you'll see that "Build" is not mentioned anywhere. But it is present in the XML file. The error message tells you that the XML file is not consistent with its DTD, but that you can ignore such tags by using validate=False:

>>> handle = Entrez.einfo(db="protein")
>>> record = Entrez.read(handle, validate=False)
>>> print record
{u'DbInfo': {u'Count': '73259352', u'LastUpdate': '2013/01/26 08:18', u'MenuName': 'Protein', u'Description': 'Protein sequence record', u'LinkList': [{u'DbTo': 'bioproject', u'Menu': 'BioProject Links', u'Name': 'protein_bioproject', u'Description': 'Proteins related to BioProjects'}, {u'DbTo': 'biosystems', u'Menu': 'BioSystem Links', u'Name': 'protein_biosystems', u'Description': 'Pathways and other biosystems containing the current prot...

Best,
-Michiel.

From mjldehoon at yahoo.com Sun Jan 27 04:06:10 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 26 Jan 2013 20:06:10 -0800 (PST)
Subject: [Biopython] Bio.Entrez.epost error with Python 3.2
In-Reply-To:
Message-ID: <1359259570.36220.YahooMailClassic@web164003.mail.gq1.yahoo.com>

For some reason Entrez.epost switched from post=False to post=True in the call to _open in Bio.Entrez.
I don't know why; the other Entrez functions all use post=False, and if I
use post=False in Entrez.epost it seems to work fine. The switch from
post=False to post=True was made in this commit; Peter, do you remember why
we switched to post=True?

https://github.com/biopython/biopython/commit/c928057b6c811c8959daf806ee6159eb09e0928f#Bio/Entrez/__init__.py

Best,
-Michiel.

--- On Sat, 1/26/13, Nicolas Joannin wrote:

> From: Nicolas Joannin
> Subject: [Biopython] Bio.Entrez.epost error with Python 3.2
> To: "Biopython Mailing List"
> Date: Saturday, January 26, 2013, 7:42 AM
>
> Hello everyone,
>
> I am having trouble with using Bio.Entrez.epost: see details below.
> I have tried converting the variables to bytes, but everything I've tried
> gives the same error message.
>
> I'm guessing that this might be a bug in biopython when used with Python 3.
> If not, could you please tell me where I got this wrong, and how I can fix
> it?
>
> Best regards,
> Nicolas
>
>
> Here are the details:
> I have tried the following:
>
>    - post_h=Entrez.epost("nuccore",id="160418,160351")
>    - post_h=Entrez.epost(b"nuccore",id=b"160418,160351")
>    - post_h=Entrez.epost("nuccore".encode("utf-8"),id="160418,160351".encode("utf-8"))
>
>
> >>> post_h=Entrez.epost("nuccore",id="160418,160351")
> Traceback (most recent call last):
>   File "", line 1, in
>   File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/Bio/Entrez/__init__.py", line 97, in epost
>     return _open(cgi, variables, post=True)
>   File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/Bio/Entrez/__init__.py", line 436, in _open
>     handle = urllib.request.urlopen(cgi, data=options)
>   File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 138, in urlopen
>     return opener.open(url, data, timeout)
>   File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 367, in open
>     req = meth(req)
>   File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 1066, in do_request_
>     raise TypeError("POST data should be bytes"
> TypeError: POST data should be bytes or an iterable of bytes. It cannot be
> str.
>
>
> Nicolas Joannin, Ph.D.
> Bioinformatics Center
> Kyoto University, Uji campus, Japan
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

From mjldehoon at yahoo.com Sun Jan 27 04:41:54 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 26 Jan 2013 20:41:54 -0800 (PST)
Subject: [Biopython] Bio.Entrez.epost error with Python 3.2
In-Reply-To: <1359259570.36220.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <1359261714.79007.YahooMailClassic@web164006.mail.gq1.yahoo.com>

Looking at this some more, I found this on the mailing list explaining why
we are using post=True:

http://lists.open-bio.org/pipermail/biopython/2009-May/005152.html

This page provides some explanation on urllib.parse.urlencode in Python3:

http://docs.python.org/3/library/urllib.request.html#urllib-examples

Best,
-Michiel.

From alejandro.0317 at gmail.com Sun Jan 27 06:18:17 2013
From: alejandro.0317 at gmail.com (Cristian Alejandro Rojas Quintero)
Date: Sun, 27 Jan 2013 01:18:17 -0500
Subject: [Biopython] Issue with Bio.Entrez and protein in Biopython 1.60
In-Reply-To: <1359257309.3476.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <1359257309.3476.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID: <5104C6A9.5070507@gmail.com>

Hi Michiel,

It was not a glitch at E-utilities; I had to download and compile Biopython
from the official website (previously I was using the Biopython from the
Ubuntu repositories). After this it works perfectly.

Thank you

On 26/01/13 22:28, Michiel de Hoon wrote:
> Hi Cristian,
>
> --- On Fri, 1/25/13, Cristian Alejandro Rojas wrote:
>> I'm having an issue using Bio.Entrez to search a protein.
>> I'm doing this:
>>
>> >>> handle=Entrez.esearch(db="protein", term="insulin AND homo")
>> >>> record=Entrez.read(handle)
>> Traceback (most recent call last):
>
> It works for me now, so this may have been a temporary glitch at the
> E-Utilities:
>
> >>> from Bio import Entrez
> >>> handle=Entrez.esearch(db="protein", term="insulin AND homo")
> >>> record = Entrez.read(handle)
> >>> print record
> {u'Count': '3956', u'RetMax': '20', u'IdList': ['443497968', '443497970', '443428106', '443428104', '443428107', '443428105', '83700231', '21361212', '4505143', '66472382', '443287675', '443287677', '419636284', '375298744', '341940804', '341940253', '332278248', '317373577', '317373571', '317373494'], u'TranslationStack': [{u'Count': '32050', u'Field': 'All Fields', u'Term': 'insulin[All Fields]', u'Explode': 'N'}, {u'Count': '0', u'Field': 'Organism', u'Term': '"Homo"[Organism]', u'Explode': 'N'}, {u'Count': '10253279', u'Field': 'All Fields', u'Term': 'homo[All Fields]', u'Explode': 'N'}, 'OR', 'GROUP', 'AND'], u'TranslationSet': [{u'To': '"Homo"[Organism] OR homo[All Fields]', u'From': 'homo'}], u'RetStart': '0', u'QueryTranslation': 'insulin[All Fields] AND ("Homo"[Organism] OR homo[All Fields])'}
>
>> I'm having an issue with einfo() too, look at this:
>>
>> >>> handler=Entrez.einfo(db="protein")
>> >>> record=Entrez.read(handler)
>> Traceback (most recent call last):
>>   File "", line 1, in
>>   File "/usr/lib/pymodules/python2.7/Bio/Entrez/__init__.py", line 351, in read
>>     record = handler.read(handle)
>>   File "/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py", line 169, in read
>>     self.parser.ParseFile(handle)
>>   File "/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py", line 285, in startElementHandler
>>     raise ValidationError(name)
>> Bio.Entrez.Parser.ValidationError: Failed to find tag 'Build' in the DTD.
>> To skip all tags that are not represented in the DTD, please call
>> Bio.Entrez.read or Bio.Entrez.parse with validate=False.
>>
>
> This error message means exactly what it says. To see what the E-Utilities
> returns, try
>
> >>> handle = Entrez.einfo(db="protein")
> >>> print handle.read()
> <eInfoResult>
> <DbInfo>
> <DbName>protein</DbName>
> <MenuName>Protein</MenuName>
> <Description>Protein sequence record</Description>
> <Build>Build130126-0031m.1</Build>
> ...
>
> If you look at the DTD file at
> http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eInfo_020511.dtd, you'll see
> that "Build" is not mentioned anywhere. But it is present in the XML file.
> The error message tells you that the XML file is not consistent with its
> DTD, but that you can ignore such tags by using validate=False:
>
> >>> handle = Entrez.einfo(db="protein")
> >>> record = Entrez.read(handle, validate=False)
> >>> print record
> {u'DbInfo': {u'Count': '73259352', u'LastUpdate': '2013/01/26 08:18', u'MenuName': 'Protein', u'Description': 'Protein sequence record', u'LinkList': [{u'DbTo': 'bioproject', u'Menu': 'BioProject Links', u'Name': 'protein_bioproject', u'Description': 'Proteins related to BioProjects'}, {u'DbTo': 'biosystems', u'Menu': 'BioSystem Links', u'Name': 'protein_biosystems', u'Description': 'Pathways and other biosystems containing the current prot...
>
> Best,
> -Michiel.

From alejandro.0317 at gmail.com Sun Jan 27 06:28:35 2013
From: alejandro.0317 at gmail.com (Cristian Alejandro Rojas Quintero)
Date: Sun, 27 Jan 2013 01:28:35 -0500
Subject: [Biopython] Biopython 1.60 and ExPASy Swiss-Prot search functions
Message-ID: <5104C913.7000300@gmail.com>

Hi all,

I'm trying to search Swiss-Prot records on ExPASy with sprot_search_ful and
sprot_search_de, but it appears that the ExPASy server has been moved. I
tried to read the help from the modules, and all that I can find is that the
CGI URL is "http://www.expasy.ch/cgi-bin/sprot-search-ful" or
"http://www.expasy.ch/cgi-bin/sprot-search-de". When I try to access those
URLs with my web browser, an error message appears indicating that the site
to search is www.uniprot.org. Maybe the functions won't work anymore because
the server has been moved?

Or am I doing something wrong?

These are the steps to reproduce the error.

>>> handle=ExPASy.sprot_search_ful("Insulin")
>>> print handle.read()

As you can see, the status code for the HTTP request was 302. BTW: the
release date of the new server was Jan 9, 2013.

Thank you.

From nicolas.joannin at gmail.com Sun Jan 27 07:34:48 2013
From: nicolas.joannin at gmail.com (Nicolas Joannin)
Date: Sun, 27 Jan 2013 16:34:48 +0900
Subject: [Biopython] Bio.Entrez.epost error with Python 3.2
In-Reply-To: <1359261714.79007.YahooMailClassic@web164006.mail.gq1.yahoo.com>
References: <1359259570.36220.YahooMailClassic@web164003.mail.gq1.yahoo.com> <1359261714.79007.YahooMailClassic@web164006.mail.gq1.yahoo.com>
Message-ID:

Hi Michiel,

Thank you for your help!

I'm still a bit confused about how to work around this problem...
I tried adding a post=False, and different variants of this, in the epost
call, but I still get the same error. So, I'm guessing that it is actually
in the __init__.py file that I should modify it, but I don't know if that is
what you are suggesting or not (considering your second email).

I am actually trying to retrieve a very large dataset from Genbank (all
records for a specific species, e.g. ~90000 records). I initially tried with
a simple esearch+usehistory and then using efetch to retrieve the results in
batches of 500 records, but I encountered a strange error in xml format
(that actually didn't break the process) with this information: "Unable to
obtain query #1".
Googling the problem, I found this thread:
http://comments.gmane.org/gmane.comp.python.bio.general/6962
Their solution involves using epost, which brings us to my initial
question :)

I'd really appreciate if you could let me know:

   1. How best to work around the epost problem...
   2. If you know of any alternate solution to the "Unable to obtain query
   #1" error

Thanks in advance!

Best regards,
Nicolas

Nicolas Joannin, Ph.D.
Bioinformatics Center
Kyoto University, Uji campus, Japan

From p.j.a.cock at googlemail.com Sun Jan 27 19:04:09 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 27 Jan 2013 19:04:09 +0000
Subject: [Biopython] Bio.Entrez.epost error with Python 3.2
In-Reply-To: <1359261714.79007.YahooMailClassic@web164006.mail.gq1.yahoo.com>
References: <1359259570.36220.YahooMailClassic@web164003.mail.gq1.yahoo.com> <1359261714.79007.YahooMailClassic@web164006.mail.gq1.yahoo.com>
Message-ID:

On Sun, Jan 27, 2013 at 4:41 AM, Michiel de Hoon wrote:
> Looking at this some more, I found this on the mailing list explaining why
> we are using post=True:
>
> http://lists.open-bio.org/pipermail/biopython/2009-May/005152.html

Yes, we use post (as the name epost suggests) to upload a long list of
IDs without the long URL limitations faced if using an HTTP get.

> This page provides some explanation on urllib.parse.urlencode in Python3:
>
> http://docs.python.org/3/library/urllib.request.html#urllib-examples
>

Does this mean we have a subtle Python 2 vs 3 problem with epost?
Time for another unit test in test_Entrez_online.py which currently
only tests einfo and efetch - we should have esearch, epost, espell
and esummary in there too I think.
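
To illustrate the Python 2 vs 3 difference discussed in this thread: in
Python 3, urllib.parse.urlencode returns a str, while urllib.request.urlopen
expects POST data as bytes, which is what the TypeError in the traceback
complains about. Below is a minimal sketch of the encoding step, assuming
plain ASCII parameters. The name encode_post_data is a hypothetical helper
for illustration and is not part of Bio.Entrez; nothing is sent over the
network.

```python
from urllib.parse import urlencode


def encode_post_data(params):
    """Return URL-encoded form data as bytes, suitable for a POST body.

    Hypothetical helper for illustration only; not part of Bio.Entrez.
    """
    # urlencode() builds a str such as "db=...&id=..."; Python 3's urlopen()
    # rejects str for its data argument, so encode the result to bytes.
    return urlencode(params).encode("ascii")


# Same parameters as in the failing epost call reported in this thread.
post_body = encode_post_data({"db": "nuccore", "id": "160418,160351"})
print(type(post_body).__name__, post_body)
```

A bytes object like this could then be passed as the data argument of
urllib.request.urlopen. Whether the conversion belongs inside _open is a
design question for the maintainers.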

Peter

From p.j.a.cock at googlemail.com Sun Jan 27 19:07:11 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 27 Jan 2013 19:07:11 +0000
Subject: [Biopython] Issue with Bio.Entrez and protein in Biopython 1.60
In-Reply-To: <5104C6A9.5070507@gmail.com>
References: <1359257309.3476.YahooMailClassic@web164004.mail.gq1.yahoo.com> <5104C6A9.5070507@gmail.com>
Message-ID:

On Sun, Jan 27, 2013 at 6:18 AM, Cristian Alejandro Rojas Quintero wrote:
> Hi Michiel,
>
> It was not a glitch at E-utilities; I had to download and compile Biopython
> from the official website (previously I was using the Biopython from the
> Ubuntu repositories). After this it works perfectly.
>
> Thank you

If you had Biopython 1.60 from the Ubuntu repository, and replaced
it with Biopython 1.60 compiled from source, that shouldn't have
made any difference. I think a temporary problem at the NCBI is
more likely as Michiel suggested.

But either way, it is working now :)

Peter

From p.j.a.cock at googlemail.com Sun Jan 27 19:15:02 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 27 Jan 2013 19:15:02 +0000
Subject: [Biopython] Biopython 1.60 and ExPASy Swiss-Prot search functions
In-Reply-To: <5104C913.7000300@gmail.com>
References: <5104C913.7000300@gmail.com>
Message-ID:

It looks like ExPASy have retired some more URLs - they did this
and broke our get_sprot_raw function three years ago:
https://github.com/biopython/biopython/commit/6689bf8657d9515965d63f9c77e6348233472046

If you can work out what the new URLs are this should be easy
to fix - last time they had a table at
http://www.expasy.ch/expasy_urls.html but that page doesn't
exist any more :(

In this case, the HTTP 302 redirect tells us to use:
http://www.uniprot.org/uniprot?query=Insulin&S=on

Would you like to try updating Bio/ExPASy/__init__.py (and
making a pull request on Github)?

Regards,

Peter

From email2ants at gmail.com Tue Jan 29 21:20:37 2013
From: email2ants at gmail.com (Anthony Underwood)
Date: Tue, 29 Jan 2013 21:20:37 +0000
Subject: [Biopython] Local blast error on mac
Message-ID:

Running local blast+ through biopython on a mac I get the following error

Traceback (most recent call last):
  File "local_blast.py", line 31, in <module>
    local_blast_function.local_blast_function('gene_id_sequences.fasta', 'gene_id_blast_output.txt')
  File "/Users/sophia/legionella/scripts/local_blast_function.py", line 12, in local_blast_function
    stdout, stderr = blastp_cline()
  File "/Users/sophia/.pythonbrew/pythons/Python-2.7/lib/python2.7/site-packages/biopython-1.60-py2.7-macosx-10.6-x86_64.egg/Bio/Application/__init__.py", line 437, in __call__
    stdout_str, stderr_str)
Bio.Application.ApplicationError: Command '/usr/local/ncbi/blast/bin/blastp -out gene_id_blast_output.txt -outfmt 5 -query gene_id_sequences.fasta -db /Volumes/DataRAID/blast_databases/uniprot_sprot -evalue 0.001' returned non-zero exit status -11

If I run the command on its own

"/usr/local/ncbi/blast/bin/blastp -out gene_id_blast_output.txt -outfmt 5 -query gene_id_sequences.fasta -db /Volumes/DataRAID/blast_databases/uniprot_sprot -evalue 0.001"

This runs fine and it produces XML output. However through biopython it
exits with a strange -11 error. I've tried debugging the biopython code but
to no avail; no stderr is produced.

Does anybody have any suggestions as to what I might try?

Thanks

Anthony

From p.j.a.cock at googlemail.com Tue Jan 29 21:33:41 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 29 Jan 2013 21:33:41 +0000
Subject: [Biopython] Local blast error on mac
In-Reply-To:
References:
Message-ID:

On Tue, Jan 29, 2013 at 9:20 PM, Anthony Underwood wrote:
> Running local blast+ through biopython on a mac I get the following error
>
> Traceback (most recent call last):
> ...
> Bio.Application.ApplicationError: Command '/usr/local/ncbi/blast/bin/blastp -out gene_id_blast_output.txt -outfmt 5 -query gene_id_sequences.fasta -db /Volumes/DataRAID/blast_databases/uniprot_sprot -evalue 0.001' returned non-zero exit status -11

See http://docs.python.org/2/library/subprocess.html
"A negative value -N indicates that the child was terminated by signal
N (Unix only)."

This means blastp died with signal 11, typically a segmentation fault.

> If I run the command on its own
>
> "/usr/local/ncbi/blast/bin/blastp -out gene_id_blast_output.txt -outfmt 5 -query gene_id_sequences.fasta -db /Volumes/DataRAID/blast_databases/uniprot_sprot -evalue 0.001"
>
> This runs fine and it produces XML output.

Very strange. There is nothing odd there with slashes, spaces or quotes
(the usual suspects).

Is there anything else happening in the Python code which might
interact with BLAST? For example changing directory, changing
environment variables, or keeping a handle open to one of the files
BLAST is using?

You can check that by writing a tiny Python script which just
sets up the BLAST wrapper and exec

How long does it take to run outside Python?
Is there enough time to monitor the task list when run via Python, and
perhaps get some clue like tight memory or something?

Other than that, I'm not sure off hand what to suggest.

Peter

From email2ants at gmail.com Tue Jan 29 21:38:40 2013
From: email2ants at gmail.com (Anthony Underwood)
Date: Tue, 29 Jan 2013 21:38:40 +0000
Subject: [Biopython] Local blast error on mac
In-Reply-To:
References:
Message-ID: <83845DAA-FA5C-4DC2-8EFF-3EC1B6A53039@gmail.com>

Thanks for the quick reply Peter.

It's possible that the query file was opened to write the sequence to and
then not closed (the code is on my work computer so I cannot check right
now). I will try changing this and report back.

Thanks again,

Anthony

From petra.kubincova at gmail.com Wed Jan 30 21:50:50 2013
From: petra.kubincova at gmail.com (Petra Kubincová)
Date: Wed, 30 Jan 2013 22:50:50 +0100
Subject: [Biopython] Fwd: Bug in bgzf module
In-Reply-To:
References:
Message-ID:

Hello,

recently I have installed and tried out biopython 1.60, especially the bgzf
module. I have discovered that calling the method "tell()" of an object of
bgzf.BgzfWriter type raises this error:

Traceback (most recent call last):
  File "", line 1, in
  File "/usr/local/lib/python2.6/dist-packages/Bio/bgzf.py", line 743, in tell
    return make_virtual_offset(self.handle.tell(), len(self._buffer))
AttributeError: 'BgzfWriter' object has no attribute 'handle'

I've checked out the source code of the bgzf module. The source code of the
error-raising method "tell()" looks like this:

    def tell(self):
        """Returns a BGZF 64-bit virtual offset."""
        return make_virtual_offset(self.handle.tell(), len(self._buffer))

The problem is that BgzfWriter does not have a variable called "handle",
only "_handle". So (IMHO) all that needs to be done to fix this bug is to
change "self.handle.tell()" to "self._handle.tell()".

Cheers,
Petra Kubincova

From p.j.a.cock at googlemail.com Wed Jan 30 22:12:39 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 30 Jan 2013 22:12:39 +0000
Subject: [Biopython] Fwd: Bug in bgzf module
In-Reply-To:
References:
Message-ID:

Hi Petra,

It is nice to know people are using this relatively new code :)

Thank you for pointing that out, that is the correct fix:

https://github.com/biopython/biopython/commit/2a1be2f3e9b731fa05cc4ad7a01a67866155827c

We should add a unit test for this too, in Tests/test_bgzf.py -
is that something you'd like to try and do? If not I'll try to
remember to add something myself.

The reason I implemented the BgzfWriter tell method was for
the use-case of writing a BGZF compressed file while also
recording an index. I'm curious if that's what you are doing,
or if you had another purpose?

Thanks,

Peter

From petra.kubincova at gmail.com Thu Jan 31 22:57:29 2013
From: petra.kubincova at gmail.com (Petra Kubincová)
Date: Thu, 31 Jan 2013 23:57:29 +0100
Subject: [Biopython] Fwd: Bug in bgzf module
In-Reply-To:
References:
Message-ID:

Hi Peter,

well, I don't have much experience with unit tests but I will try to come up
with something. :) I'll let you know if I don't succeed.

And yes, recording an index is exactly the thing I need to do. (I am
currently working on an interval mapping tool for multiple whole-genome
alignments, where I need to read a .maf file, write preprocessed data into a
compressed file and then work with just the index for the compressed file
and the compressed file itself to do the mapping.)

Have a nice day,
Petra
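
For readers unfamiliar with the value that tell() returns in this thread: a
BGZF virtual offset packs the start of a compressed block in the file (upper
48 bits) together with the position inside that block once decompressed
(lower 16 bits). The following is a standalone sketch of that arithmetic,
assuming the layout described in the SAM/BAM specification. It re-implements
the idea for illustration rather than reproducing the code in Bio/bgzf.py,
and the record names and offsets are made-up sample values.

```python
def make_virtual_offset(block_start_offset, within_block_offset):
    """Combine a block start and a within-block offset into one integer."""
    if not 0 <= within_block_offset < 2 ** 16:
        raise ValueError("within-block offset must fit in 16 bits")
    if not 0 <= block_start_offset < 2 ** 48:
        raise ValueError("block start offset must fit in 48 bits")
    return (block_start_offset << 16) | within_block_offset


def split_virtual_offset(virtual_offset):
    """Inverse of make_virtual_offset: return (block_start, within_block)."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


# Recording an index while writing: remember the virtual offset at which
# each record starts (names and offsets below are made-up sample values).
index = {
    "record_a": make_virtual_offset(0, 0),
    "record_b": make_virtual_offset(0, 1200),
    "record_c": make_virtual_offset(18327, 45),
}
for name, voffset in sorted(index.items()):
    print(name, voffset, split_virtual_offset(voffset))
```

With such an index, a reader can jump to the compressed block holding a
record and then skip to the record inside the decompressed block, which is
the use-case Peter describes for the writer's tell method.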