From mjldehoon at yahoo.com Sun Aug 1 11:14:23 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 1 Aug 2010 08:14:23 -0700 (PDT) Subject: [Biopython-dev] Python 3 and encoding for online resources In-Reply-To: Message-ID: <467239.37480.qm@web62408.mail.re1.yahoo.com> According to this post: http://stackoverflow.com/questions/1179305/expat-parsing-in-python-3 we need only one parser which always parses a byte stream. Bio.Entrez uses File.UndoHandle but just to look for potential errors in the first few lines when opening the Entrez url, which in my opinion we shouldn't be doing anyway since it's the parser's job to decide whether the input is well-formed. So I'd suggest to not use File.UndoHandle (at all), make sure our parser works with Python 3 byte streams, and ask users to open any downloaded Entrez XML files in binary mode. Is there a Biopython version (in trunk or otherwise) that is ready for Python 3? If so, I can have a look at the parser to see if it handles byte streams correctly. --Michiel. --- On Tue, 7/27/10, Peter wrote: > From: Peter > Subject: [Biopython-dev] Python 3 and encoding for online resources > To: "Biopython-Dev Mailing List" > Date: Tuesday, July 27, 2010, 9:23 AM > Hi all, > > One of the remaining (pure python) problems with Biopython > under Python 3 relates to parsing online resources like > the > NCBI Entrez API or even Bio.ExPASy.get_sprot_raw(). > See for example test_SeqIO_online.py for a failure. > > In Python 2, urlopen from urllib or urllib2 would give a > string handle. In python 3, you get a bytes handle (not > a unicode handle and choosing the encoding is tricky): > http://docs.python.org/py3k/library/urllib.request.html > > In the case of resources like the NCBI and ExPASy we > should be able to assume an encoding (maybe UTF-8 > or Latin) for all the plain text output, while from > XML/HTML > there are ways for the data to specify this itself.
> > I think we may need to transform the urllib bytes handle > into > a unicode string handle for parsing. One option would be > to > extend the Bio.File.UndoHandle class (or invent a > subclass) > which applies the decoding. This seems simple since > Bio.Entrez and Bio.ExPASy already use this class. > > Another option (which I suggested on the Bio.SeqIO.index() > thread [1]) would be to extend our parsers to cope with > both > byte and unicode handles. That could be a lot of work > though... > > Thoughts? > > Peter > > [1] http://lists.open-bio.org/pipermail/biopython-dev/2010-July/008004.html > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From biopython at maubp.freeserve.co.uk Sun Aug 1 13:54:03 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 1 Aug 2010 18:54:03 +0100 Subject: [Biopython-dev] Python 3 and encoding for online resources In-Reply-To: <467239.37480.qm@web62408.mail.re1.yahoo.com> References: <467239.37480.qm@web62408.mail.re1.yahoo.com> Message-ID: On Sun, Aug 1, 2010 at 4:14 PM, Michiel de Hoon wrote: > According to this post: > > http://stackoverflow.com/questions/1179305/expat-parsing-in-python-3 > > we need only one parser which always parses a byte stream. > Bio.Entrez uses File.UndoHandle but just to look for potential > errors in the first few lines when opening the Entrez url, which > in my opinion we shouldn't be doing anyway since it's the > parser's job to decide whether the input is well-formed. > So I'd suggest to not use File.UndoHandle (at all), ... I disagree. The NCBI return multiple different file formats, so there are multiple different parsers that may get an error page. Given the NCBI return HTML error pages regardless of what format the request was (XML, plain text, etc), I think we have to look for errors before giving the data to the parser. 
But that can be done using byte strings just as easily as with unicode strings. > make sure our parser works with Python 3 byte streams, and > ask users to open any downloaded Entrez XML files in binary > mode. That sounds workable. > Is there a Biopython version (in trunk or otherwise) that is ready > for Python 3? If so, I can have a look at the parser to see if it > handles byte streams correctly. The trunk itself -- after running 2to3 on it (as described in the README file). Or if you just want to grab some code for a quick play, I have a branch where I've been doing this on a semi-regular basis: http://github.com/peterjc/biopython/tree/auto2to3 Note that we are keeping the trunk as Python 2 code, which can make life interesting. (Another option would be a Python 3 branch, but we'd then need to manually keep things in sync.) To make life a little easier, we are probably going to need some python 3 compatibility functions (like bytes as unicode, unicode as bytes - see the NumPy project for other possible examples), which we are currently doing on a module by module basis. Here I'm thinking specifically of some of the things required in Bio/SeqIO/SffIO.py, but there are other python 3 hacks we may want to standardise. For the C code (which we haven't looked at yet, setup.py is ignoring the extensions on Python 3 for now) we should be able to use the normal #ifdef approach. Again, we can learn a lot from looking at NumPy here. Peter From mjldehoon at yahoo.com Mon Aug 2 09:50:47 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Mon, 2 Aug 2010 06:50:47 -0700 (PDT) Subject: [Biopython-dev] Python 3 and encoding for online resources Message-ID: <932397.16483.qm@web62408.mail.re1.yahoo.com> > Or if you just want to grab some code for a quick play, >I have a branch where I've been doing this on a > semi-regular basis: > > http://github.com/peterjc/biopython/tree/auto2to3 Thanks! I used this branch to test the Bio.Entrez and Bio.SwissProt parsers.
The Bio.Entrez Parser works as is; the Bio.SwissProt parser is really easy to fix (just convert each line into a plain string inside the _read function in Bio.SwissProt.__init__). Perhaps we can do something similar for the other test_SeqIO_online.py failures (the ones appearing in Bio/SeqIO/FastaIO.py)?

> > So I'd suggest to not use File.UndoHandle (at all),
> ...
> I disagree. The NCBI return multiple different file
> formats, so there are multiple different parsers that may get
> an error page.
>
> Given the NCBI return HTML error pages regardless of what
> format the request was (XML, plain text, etc), I think we
> have to look for errors before giving the data to the
> parser.

Part of the problem solves itself when we change to Python 3. In Python 3, urllib.request.urlopen raises a urllib.error.HTTPError in cases where urllib.urlopen in Python 2 raises no exception:

mdehoon:~/Software/biopython2to3/peterjc-biopython-06c2ea6 $ python
Python 2.7 (r27:82500, Jul 19 2010, 00:08:00)
[GCC 4.0.1 (Apple Computer, Inc. build 5370)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> urllib.urlopen("http://www.biopython.org/somethingimadeup")
>
>>>
mdehoon:~/Software/biopython2to3/peterjc-biopython-06c2ea6 $ python3
Python 3.1.2 (r312:79360M, Mar 24 2010, 01:33:18)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request
>>> urllib.request.urlopen("http://www.biopython.org/somethingimadeup")
Traceback (most recent call last):
  File "", line 1, in
  File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/urllib/request.py", line 121, in urlopen
    return _opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/urllib/request.py", line 355, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/urllib/request.py", line 467, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/urllib/request.py", line 393, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/urllib/request.py", line 327, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/urllib/request.py", line 475, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
>>>

which means that we can catch at least some errors without having to actually read from the handle. A 414 Request-URI Too Large is also being caught. In this sense, urllib in Python 3 behaves as urllib2 in Python 2. I don't know though how to go about checking whether all HTTP errors we check for in Bio.Entrez are being caught (anybody know a magical way to trigger a particular HTTP error?). Nevertheless, this avoids having to go through a File.UndoHandle, and is safer than checking the HTML / text response from NCBI (at least the "download dataset is empty" response from NCBI has already changed). So I would suggest to switch from urllib to urllib2 in Bio.Entrez and catch any HTTP errors (urllib2 is translated appropriately by 2to3), and to handle any bytes/utf8/ascii conversion inside the parser (as in Bio.SwissProt).

--Michiel.
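Michiel's suggestion of doing the bytes/text conversion inside the parser (as in the Bio.SwissProt fix) can be sketched as a small line-normalising generator that the parser loops over. This is a minimal sketch, not Biopython's actual code: `iter_text_lines` and `read_swissprot_ids` are hypothetical names, and the Latin-1 default is an assumption (it can decode any byte value, whereas the thread only says UTF-8 or Latin are plausible for NCBI/ExPASy plain text).

```python
import io


def iter_text_lines(handle, encoding="latin-1"):
    """Yield str lines from a handle opened in text or binary mode.

    Hypothetical helper: on Python 3, network handles and files opened
    with "rb" yield bytes, while files opened with "r" yield str.
    The Latin-1 default is an assumption; it never fails to decode,
    but would mis-render genuinely UTF-8 data.
    """
    for line in handle:
        if isinstance(line, bytes):
            line = line.decode(encoding)
        yield line


def read_swissprot_ids(handle):
    """Toy parser (hypothetical): collect the entry name from each ID line."""
    ids = []
    for line in iter_text_lines(handle):
        if line.startswith("ID   "):
            ids.append(line[5:].split()[0])
    return ids


if __name__ == "__main__":
    text = "ID   TEST_HUMAN     Reviewed;  10 AA.\n//\n"
    # The same parser accepts a text handle and a bytes handle.
    print(read_swissprot_ids(io.StringIO(text)))
    print(read_swissprot_ids(io.BytesIO(text.encode("ascii"))))
```

Peter's alternative of converting the whole handle rather than each line could instead wrap a buffered binary stream (such as `io.BytesIO`) in `io.TextIOWrapper(handle, encoding=...)`; the per-line approach has the advantage that the parser does not need to know which kind of handle it was given.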
--- On Sun, 8/1/10, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] Python 3 and encoding for online resources > To: "Michiel de Hoon" > Cc: "Biopython-Dev Mailing List" > Date: Sunday, August 1, 2010, 1:54 PM > On Sun, Aug 1, 2010 at 4:14 PM, > Michiel de Hoon > wrote: > > According to this post: > > > > http://stackoverflow.com/questions/1179305/expat-parsing-in-python-3 > > > > we need only one parser which always parses a byte > stream. > > Bio.Entrez uses File.UndoHandle but just to look for > potential > > errors in the first few lines when opening the Entrez > url, which > > in my opinion we shouldn't be doing anyway since it's > the > > parser's job to decide whether the input is > well-formed. > > So I'd suggest to not use File.UndoHandle (at all), > ... > > I disagree. The NCBI return multiple different file > formats, so > there are multiple different parsers that may get an error > page. > Given the NCBI return HTML error pages regardless of what > format the request was (XML, plain text, etc), I think we > have to look for errors before giving the data to the > parser. > But that can be done using byte strings just as easily as > with > unicode strings. > > > make sure our parser works with Python 3 byte streams, > and > > ask users to open any downloaded Entrez XML files in > binary > > mode. > > That sounds workable. > > > Is there a Biopython version (in trunk or otherwise) > that is ready > > for Python 3? If so, I can have a look at the parser > to see if it > > handles byte streams correctly. > > The trunk itself -- after running 2to3 on it (as described > in the > README file). 
Or if you just want to grab some code for a > quick > play, I have a branch where I've been doing this on a > semi-regular > basis: > > http://github.com/peterjc/biopython/tree/auto2to3 > > Note that we are keeping the trunk as Python 2 code, which > can make life interesting. (Another option would be a > Python > 3 branch, but we'd then need to manually keep things in > sync.) > To make life a little easier, we are probably going to need > some > python 3 compatibility functions (like bytes as unicode, > unicode > as bytes - see the NumPy project for other possible > examples), > which we are currently doing on a module by module basis. > Here I'm thinking specifically of some of the things > required in > Bio/SeqIO/SffIO.py, but there are other python 3 hacks we > may > want to standardise. > > For the C code (which we haven't looked at yet, setup.py > is > ignoring the extensions on Python 3 for now) we should be > able to use the normal #ifdef approach. Again, we can > learn > a lot from looking at NumPy here. > > Peter > From biopython at maubp.freeserve.co.uk Mon Aug 2 10:04:49 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 2 Aug 2010 15:04:49 +0100 Subject: [Biopython-dev] Python 3 and encoding for online resources In-Reply-To: <932397.16483.qm@web62408.mail.re1.yahoo.com> References: <932397.16483.qm@web62408.mail.re1.yahoo.com> Message-ID: On Mon, Aug 2, 2010 at 2:50 PM, Michiel de Hoon wrote: >> Or if you just want to grab some code for a quick play, >>I have a branch where I've been doing this on a >> semi-regular basis: >> >> http://github.com/peterjc/biopython/tree/auto2to3 > > Thanks! I used this branch to test the Bio.Entrez and Bio.SwissProt parsers. > The Bio.Entrez Parser works as is; the Bio.SwissProt parser is really easy to > fix (just convert each line into a plain string inside the _read function in > Bio.SwissProt.__init__).
Perhaps we can do something similar for the other > test_SeqIO_online.py failures (the ones appearing in Bio/SeqIO/FastaIO.py)? Maybe (replied in more detail below) >> > So I'd suggest to not use File.UndoHandle (at all), >> ... >> I disagree. The NCBI return multiple different file >> formats, so there are multiple different parsers that may get >> an error page. >> >> Given the NCBI return HTML error pages regardless of what >> format the request was (XML, plain text, etc), I think we >> have to look for errors before giving the data to the >> parser. > > Part of the problem solves itself when we change to Python 3. In Python > 3, urllib.request.urlopen raises a urllib.error.HTTPError in cases where > urllib.urlopen in Python 2 raises no exception: > > ... > > So I would suggest to switch from urllib to urllib2 in Bio.Entrez and catch > any HTTP errors (urllib2 is translated appropriately by 2to3), That sounds very sensible. > ... and to handle any bytes/utf8/ascii conversion inside the parser > (as in Bio.SwissProt). i.e. Make the SwissProt, FASTA, etc parsers cope with unicode string handles (default from open on Python 3) and bytes handles (network handles or from file open in binary mode)? 
I think this is probably a worthwhile thing to do in any case, especially for the indexing code, see: http://lists.open-bio.org/pipermail/biopython-dev/2010-July/008011.html Peter From bugzilla-daemon at portal.open-bio.org Mon Aug 2 10:21:40 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 2 Aug 2010 10:21:40 -0400 Subject: [Biopython-dev] [Bug 3119] Bio.Nexus can't parse file from Prank 100701 (1st July 2010) In-Reply-To: Message-ID: <201008021421.o72ELesu027221@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3119 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-02 10:21 EST ------- Ari has released PRANK v100802 (2 August 2010) which fixes the NEXUS output problems identified (unquoted taxa names containing punctuation, extra comma in translate block). With Frank's small fix for the tree, we can now parse the latest PRANK output http://github.com/biopython/biopython/commit/f4b0007d29fdd878e4cc326b12e63e833e246ce4 Marking as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Mon Aug 2 13:22:44 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 2 Aug 2010 18:22:44 +0100 Subject: [Biopython-dev] EMBOSS SAM/BAM parser and reverse strand reads Message-ID: Hi all, One of my immediate questions on learning that EMBOSS 6.3.1 had SAM/BAM support was how it handled reads mapped to the reverse strand: http://lists.open-bio.org/pipermail/emboss-dev/2010-July/000656.html > What do you do about the strand issue? SAM/BAM stored reads > which map onto the reverse strand in reverse complement. 
If > you want to get back to the original orientation for output as > FASTQ you must apply the reverse complement (plus reverse > the quality scores too of course).

As I suspected, currently EMBOSS ignores this and gives the sequence and quality string as it is stored in the SAM/BAM file. Here are three consecutive entries from the example SAM file, http://pysam.googlecode.com/hg/tests/ex1.sam.gz

...
EAS54_65:6:115:538:276 163 chr1 209 99 35M = 360 186 TATTTGTAATGAAAACTATATTTATGCTATTCAGT <<<<<<<<;<<<;;<<<;<:<<<:<<<<<<;;;7; MF:i:18 Aq:i:75 NM:i:0 UQ:i:0 H0:i:1 H1:i:0
EAS219_FC30151:7:51:1429:1043 83 chr1 209 99 35M = 59 -185 TATTTGTAATGAAAACTATATTTATGCTATTCAGT 9<5<<<<<<<<<<<<<9<<<9<<<<<<<<<<<<<< MF:i:18 Aq:i:68 NM:i:0 UQ:i:0 H0:i:1 H1:i:0
EAS114_30:1:176:168:513 163 chr1 210 99 35M = 410 235 ATTTGTAATGAAAACTATATTTATGCTATTCAGTT <<<<;<<<<<<<<<<<<<<<<<<<:&<<<<:;0;; MF:i:18 Aq:i:71 NM:i:0 UQ:i:0 H0:i:1 H1:i:0
...

The middle read of this triple, EAS219_FC30151:7:51:1429:1043, maps to chr1 on the reverse strand - we know this from the flag value 83. Note 83 = 1 + 2 + 16 + 64, or in hex, 0x53 = 0x40 + 0x10 + 0x02 + 0x01. Referring to the SAM/BAM specification,

0x01 - read is paired
0x02 - read is in a proper pair
0x10 - mapped to reverse strand
0x40 - first read in the pair

This is the FASTQ output via seqret from SAM or BAM using EMBOSS 6.3.1 with the previously discussed patches:

@EAS54_65:6:115:538:276
TATTTGTAATGAAAACTATATTTATGCTATTCAGT
+
<<<<<<<<;<<<;;<<<;<:<<<:<<<<<<;;;7;
@EAS219_FC30151:7:51:1429:1043
TATTTGTAATGAAAACTATATTTATGCTATTCAGT
+
9<5<<<<<<<<<<<<<9<<<9<<<<<<<<<<<<<<
@EAS114_30:1:176:168:513
ATTTGTAATGAAAACTATATTTATGCTATTCAGTT
+
<<<<;<<<<<<<<<<<<<<<<<<<:&<<<<:;0;;

Notice that all three read sequence and quality strings match the SAM file. On the other hand, this is from my experimental branch for Biopython, converting SAM/BAM to FASTQ:

...
@EAS54_65:6:115:538:276/2
TATTTGTAATGAAAACTATATTTATGCTATTCAGT
+
<<<<<<<<;<<<;;<<<;<:<<<:<<<<<<;;;7;
@EAS219_FC30151:7:51:1429:1043/1
ACTGAATAGCATAAATATAGTTTTCATTACAAATA
+
<<<<<<<<<<<<<<9<<<9<<<<<<<<<<<<<5<9
@EAS114_30:1:176:168:513/2
ATTTGTAATGAAAACTATATTTATGCTATTCAGTT
+
<<<<;<<<<<<<<<<<<<<<<<<<:&<<<<:;0;;
...

Ignore for the moment the fact that I'm adding /1 and /2 suffixes to the read names for the first and second (forward and reverse) reads in a pair. Notice that for the second read (which is mapped to the reverse strand) I am deliberately returning the reverse complement of the sequence, with the quality string reversed.

I'd like to propose that EMBOSS also invert the sequence for those reads mapped to the reverse strand. This is essential for the use case of converting SAM/BAM to get the *original* unmapped reads. This applies regardless of the output format: FASTA, FASTQ, or unaligned SAM/BAM (since for now EMBOSS does not output aligned SAM/BAM).

Given I think there are problems with the SAM/BAM parsing in EMBOSS 6.3.1 which will require a patch or point release anyway, I don't think we need to worry about this change breaking backwards compatibility (as long as this is done as part of the first bug fix update). However, this isn't my decision of course ;)

To elaborate, the reason I am acutely aware of this issue is that it has bitten me already. I had some (large) SAM/BAM files from a collaborator for paired end transcriptome data mapped onto a draft genome. Due to the file sizes we didn't want to transfer the original FASTQ files over the internet as well. When I wanted to remap the reads to a different reference, I instead extracted the reads from the SAM/BAM files. Initially I converted from SAM to FASTQ using sed (and also in Python as a check) without being aware of the reverse strand issue...

There could be some valid reasons the current EMBOSS behaviour is useful - but right now I can't think of any. Any suggestions?

Regards,

Peter C.
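The flag arithmetic and the reverse-strand fix proposed above can be sketched in plain Python. This is a minimal sketch of the idea, not the code in Peter's experimental Biopython branch or in EMBOSS: `original_orientation` and `sam_to_fastq` are hypothetical names, only the flag bits discussed here are handled (0x10 reverse strand, 0x40/0x80 first/second read of a pair), and the input is assumed to be a tab-separated SAM alignment line as in the SAM specification.

```python
# Complement table for A/C/G/T/N in either case (assumption: other IUPAC
# ambiguity codes are simply left unchanged by str.translate).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

FLAG_REVERSE = 0x10  # read mapped to the reverse strand
FLAG_FIRST = 0x40    # first read in the pair
FLAG_SECOND = 0x80   # second read in the pair


def original_orientation(flag, seq, qual):
    """Undo the reverse complement SAM/BAM applies to reverse-strand reads."""
    if flag & FLAG_REVERSE:
        seq = seq.translate(_COMPLEMENT)[::-1]  # reverse complement
        qual = qual[::-1]                       # qualities are just reversed
    return seq, qual


def sam_to_fastq(sam_line):
    """Convert one SAM alignment line to a FASTQ record (hypothetical helper)."""
    fields = sam_line.rstrip("\n").split("\t")
    # Columns: QNAME, FLAG, ..., SEQ (10th column), QUAL (11th column)
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & FLAG_FIRST:
        name += "/1"
    elif flag & FLAG_SECOND:
        name += "/2"
    seq, qual = original_orientation(flag, seq, qual)
    return "@%s\n%s\n+\n%s\n" % (name, seq, qual)
```

For the middle record above (flag 83 = 0x53, so 0x10 and 0x40 are set), this should give the name suffix `/1`, the sequence ACTGAATAGCATAAATATAGTTTTCATTACAAATA and the reversed quality string, matching the Biopython branch output; records with flag 163 (0x80 set, 0x10 clear) are passed through unchanged apart from the `/2` suffix.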
From bugzilla-daemon at portal.open-bio.org Mon Aug 2 14:12:57 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 2 Aug 2010 14:12:57 -0400 Subject: [Biopython-dev] [Bug 3127] New: SeqIO.write appends text to fasta comments Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3127 Summary: SeqIO.write appends text to fasta comments Product: Biopython Version: 1.54 Platform: PC OS/Version: Windows XP Status: NEW Severity: minor Priority: P2 Component: Other AssignedTo: biopython-dev at biopython.org ReportedBy: jared.ackers at smithsdetection.com When using the following SeqIO command: SeqIO.write(SeqIO.parse("file.txt", "tab"), "file.fas", "fasta") SeqIO will append the text " " to every sequence ID in the output file. The input file has two tab-delimited columns, the first with a (custom) sequence ID and the second with the corresponding sequence. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Aug 2 14:50:50 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 2 Aug 2010 14:50:50 -0400 Subject: [Biopython-dev] [Bug 3127] Set SeqRecord description in SeqIO "tab" parser In-Reply-To: Message-ID: <201008021850.o72IoopW009476@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3127 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|SeqIO.write appends text to |Set SeqRecord description in |fasta comments |SeqIO "tab" parser ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-02 14:50 EST ------- The problem isn't really in Bio.SeqIO.write(), it is with the SeqRecord default and/or the "tab" parser. Retitling bug... 
In the "tab" file format there is no description, so you are getting the SeqRecord's default description. We'd recently talked about making this just an empty string, alternatively and with less risk the "tab" parser could set the description to "" explicitly. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Aug 3 10:07:40 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 3 Aug 2010 15:07:40 +0100 Subject: [Biopython-dev] Python 3 and encoding for online resources In-Reply-To: References: <932397.16483.qm@web62408.mail.re1.yahoo.com> Message-ID: Peter wrote: >Michiel wrote: >> So I would suggest to switch from urllib to urllib2 in Bio.Entrez and catch >> any HTTP errors (urllib2 is translated appropriately by 2to3), > > That sounds very sensible. > Hi Michiel, I see you've switched from urllib to urllib2, but you also removed all the NCBI specific error handling (which it turns out would need to be updated). I just tried a simple history example and if you deliberately use a wrong webenv you get an HTML error page back (from memory and the comments in our code it used to be a plain text error page):

Error occurred: Unable to obtain query #1


  • db=pubmed
  • query_key=1
  • report=medline
  • dispstart=0
  • dispmax=10
  • mode=text
  • WebEnv=wrong

pmfetch need params:

  • (id=NNNNNN[,NNNN,etc]) or (query_key=NNN, where NNN - number in the history, 0 - clipboard content for current database)
  • db=db_name (mandatory)
  • report=[docsum, brief, abstract, citation, medline, asn.1, mlasn1, uilist, sgml, gen] (Optional; default is asn.1)
  • mode=[html, file, text, asn.1, xml] (Optional; default is html)
  • dispstart - first element to display, from 0 to count - 1, (Optional; default is 0)
  • dispmax - number of items to display (Optional; default is all elements, from dispstart)

  • See help. The old code could handle this just by looking for "Error occurred". Anyway, this demonstrates that we can't just assume any error will be handled by the NCBI as an HTTP error code and thus get turned into an exception automatically by urllib2. In this particular case, one might argue the NCBI should use HTTP status code 400 Bad Request. I think we should write some online tests for Bio.Entrez including error conditions like this. In a related example, I'm trying to add a sleep statement between my ESearch and EFetch calls in order to let the session time out. I'll post back once I know what it does - but I'll be pleasantly surprised if they do something like HTTP status code 410 Gone; I'm expecting another HTML error page. Regards, Peter From mjldehoon at yahoo.com Tue Aug 3 11:44:49 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 3 Aug 2010 08:44:49 -0700 (PDT) Subject: [Biopython-dev] Python 3 and encoding for online resources In-Reply-To: Message-ID: <987321.1607.qm@web62405.mail.re1.yahoo.com> Have you tried looking at handle.info(), where handle is the handle returned by urllib.urlopen()? Another candidate is handle.getcode(). Otherwise, we could try to contact NCBI to see if their error messages can be returned in a standard format, or at least in a format consistent with the request. Otherwise, we can also consider not to parse the HTML error message; the SeqIO/Entrez parsers will notice a format problem and raise an exception anyway. --Michiel. --- On Tue, 8/3/10, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] Python 3 and encoding for online resources > To: "Michiel de Hoon" > Cc: "Biopython-Dev Mailing List" > Date: Tuesday, August 3, 2010, 10:07 AM > Peter wrote: > >Michiel wrote: > >> So I would suggest to switch from urllib to > urllib2 in Bio.Entrez and catch > >> any HTTP errors (urllib2 is translated > appropriately by 2to3), > > > > That sounds very sensible.
> > > > Hi Michiel, > > I see you've switched from urllib to urllib2, but you also > removed all > the NCBI specific error handling (which it turns out would > need to be > updated). > > I just tried a simple history example and if you > deliberately use a > wrong webenv you get an HTML error page back (from memory > and the comments in our code it used to be a plain text > error page): > > > >

    Error occurred: Unable to obtain query > #1


    • db=pubmed
    • query_key=1
    • report=medline
    • dispstart=0
    • dispmax=10
    • mode=text
    • WebEnv=wrong
    pmfetch need > params:

  • (id=NNNNNN[,NNNN,etc]) or (query_key=NNN, where > NNN - number in > the history, 0 - clipboard content for current > database)
  • db=db_name (mandatory)
  • report=[docsum, brief, abstract, citation, > medline, asn.1, mlasn1, > uilist, sgml, gen] (Optional; default is asn.1)
  • mode=[html, file, text, asn.1, xml] (Optional; > default is html)
  • dispstart - first element to display, from 0 to > count - 1, > (Optional; default is 0)
  • dispmax - number of items to display (Optional; > default is all > elements, from dispstart)
    See help. > > > The old code could handle this just by looking for "Error > occurred". > > Anyway, this demonstrates that we can't just assume any > error will > be handled by the NCBI as an HTTP error code and thus get > turned into an exception automatically by urllib2. In this > particular > case, one might argue the NCBI should use HTTP status code > 400 Bad Request. > > I think we should write some online tests for Bio.Entrez > including error conditions like this. > > In a related example, I'm trying to add a sleep statement > between > my ESearch and EFetch calls in order to let the session time > out. > I'll post back once I know what it does - but I'll be > pleasantly > surprised if they do something like HTTP status code 410 > Gone; > I'm expecting another HTML error page. > > Regards, > > Peter > From biopython at maubp.freeserve.co.uk Tue Aug 3 12:16:44 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 3 Aug 2010 17:16:44 +0100 Subject: [Biopython-dev] Python 3 and encoding for online resources In-Reply-To: <987321.1607.qm@web62405.mail.re1.yahoo.com> References: <987321.1607.qm@web62405.mail.re1.yahoo.com> Message-ID: On Tue, Aug 3, 2010 at 4:44 PM, Michiel de Hoon wrote: > Have you tried looking at handle.info(), where handle is the handle > returned by urllib.urlopen()? Another candidate is handle.getcode(). In the case of using the history support with a bad webenv, we get an HTML error page with HTTP status code 200 (OK) which explains why urllib doesn't raise an exception (sample example in previous email). In the case of using the history support with an invalid integer query key, we get an HTML error page with HTTP status code 200 (OK), e.g.

    Error occurred: Unable to obtain query #123456789

    ... In the case of using the history support with a non-integer query key, we also get an HTML error page with HTTP status code 200 (OK), e.g.

    Error occurred: NCBI C++ Exception: Error: CORELIB(CStringException::eConvert) "/pubmed_gen/rbuild/version/20100419.1/entrez/c++/src/corelib/ncbistr.cpp", line 666: ncbi::NStr::StringToInt8() --- Cannot convert string 'wrong' to Int8 (m_Pos = 0)

    ... It puzzles me that they are still using HTTP status code 200 (OK) here. > Otherwise, we could try to contact NCBI to see if their error messages > can be returned in a standard format, or at least in a format consistent > with the request. This is definitely worth trying. Additionally we should also ask them about making more use of HTTP error codes like 400 when serving an error page. Would you like to email the NCBI Entrez team about this (and CC me please)? > Otherwise, we can also consider not to parse the HTML error message; > the SeqIO/Entrez parsers will notice a format problem and raise an > exception anyway. As things stand with the NCBI returning 200 (OK) HTML error messages I'm not comfortable with this. It will break the use case of a batch download script which writes the data direct to disk without parsing it (or giving it to another tool as input). I believe the earlier we can catch any NCBI error messages the better, even if it does require some messy peeping at the data via a buffered handle. Thanks, Peter From mjldehoon at yahoo.com Wed Aug 4 05:19:45 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 4 Aug 2010 02:19:45 -0700 (PDT) Subject: [Biopython-dev] Python 3 and encoding for online resources In-Reply-To: Message-ID: <764813.13018.qm@web62401.mail.re1.yahoo.com> Can you give an example script where you get an HTML error page? In the cases I've tried, the metadata revealed that an error had occurred, even if urllib2.urlopen didn't raise an HTTP error but returned a handle to XML containing the error message. --Michiel. --- On Tue, 8/3/10, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] Python 3 and encoding for online resources > To: "Michiel de Hoon" > Cc: "Biopython-Dev Mailing List" > Date: Tuesday, August 3, 2010, 12:16 PM > On Tue, Aug 3, 2010 at 4:44 PM, > Michiel de Hoon > wrote: > > Have you tried looking at handle.info(), where handle > is the handle > > returned by urllib.urlopen()?
Another candidate is > handle.getcode(). > > In the case of using the history support with a bad webenv, > we get an > HTML error page with HTTP status code 200 (OK) which > explains > why urllib doesn't raise an exception (sample example in > previous email). > > In the case of using the history support with an invalid > integer query key, > we get an HTML error page with HTTP status code 200 (OK), > e.g. > > > >

    Error occurred: Unable to obtain query > #123456789

    > ... > > > > In the case of using the history support with a non-integer > query key, > we also get an HTML error page with HTTP status code 200 > (OK), e.g. > > > >

    Error occurred: NCBI C++ Exception: > Error: > CORELIB(CStringException::eConvert) > "/pubmed_gen/rbuild/version/20100419.1/entrez/c++/src/corelib/ncbistr.cpp", > line 666: ncbi::NStr::StringToInt8() --- Cannot convert > string 'wrong' > to Int8 (m_Pos = 0) >

    > ... > > > > It puzzles me that they are still using HTTP status code > 200 (OK) here. > > > Otherwise, we could try to contact NCBI to see if > their error messages > > can be returned in a standard format, or at least in a > format consistent > > with the request. > > This is definitely worth trying. Additionally we should > also ask them about > making more use of HTTP error codes like 400 when serving > an error page. > > Would you like to email the NCBI Entrez team about this > (and CC me > please)? > > > Otherwise, we can also consider not to parse the HTML > error message; > > the SeqIO/Entrez parsers will notice a format problem > and raise an > > exception anyway. > > As things stand with the NCBI returning 200 (OK) HTML error > messages > I'm not comfortable with this. It will break the use case > of a batch > download script which writes the data direct to disk > without parsing it > (or giving it to another tool as input). I believe the > earlier we can catch > any NCBI error messages the better, even if it does require > some messy > peeping at the data via an buffered handle. > > Thanks, > > Peter > From mjldehoon at yahoo.com Wed Aug 4 09:29:04 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 4 Aug 2010 06:29:04 -0700 (PDT) Subject: [Biopython-dev] Python 3 and encoding for online resources In-Reply-To: Message-ID: <891819.37186.qm@web62408.mail.re1.yahoo.com> --- On Wed, 8/4/10, Peter wrote: > > In the cases I've tried, the metadata revealed that an > error had occurred, > > even if urllib2.urlopen didn't raise an HTTP error but > returned a handle to > > XML containing the error message. > > What meta data? > I was looking at handle.info(), where handle is the handle returned by urllib2.urlopen. But in your example, the information in handle.info() did not reveal a difference between a successful search and an unsuccessful one, so anyway this won't work in general. 
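Peter's approach of catching NCBI error pages before the data reaches any parser (or is written to disk) could look roughly like the Python 3 sketch below. It is a minimal sketch, not Biopython code: the helper name `check_ncbi_handle`, the 1024-byte peek size, and the assumption that an error page begins with `<html` or `<!DOCTYPE html` are illustrative guesses based on the HTML error pages quoted in this thread. It expects the binary handle that Python 3's `urllib.request.urlopen()` returns.

```python
import io

# Assumption: an NCBI error page starts with one of these prefixes
# (compared case-insensitively after leading whitespace is stripped).
HTML_PREFIXES = (b"<html", b"<!doctype html")


def check_ncbi_handle(handle, peek_size=1024):
    """Peek at a binary handle and raise ValueError if it holds an HTML page.

    Otherwise return a new binary handle that replays the peeked bytes
    followed by the rest of the stream, so nothing is lost for the caller.
    This simple version reads the whole response into memory.
    """
    head = handle.read(peek_size)
    if head.lstrip().lower().startswith(HTML_PREFIXES):
        raise ValueError("Server returned an HTML page: %r" % head[:200])
    return io.BytesIO(head + handle.read())
```

With a live request this would wrap the `urlopen()` result, e.g. `handle = check_ncbi_handle(urlopen(url))`; a text-based parser could then be given `io.TextIOWrapper(handle, encoding="utf-8")`, where UTF-8 is itself an assumption about the NCBI output. It would not catch an error returned as XML, such as the epost "Invalid db name specified" reply, which still has to be detected by the XML parser.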
Btw, this is an error message nowadays returned by epost:

    >>> handle = Entrez.epost(db="nothing")
    >>> handle.read()
    '\n\n\n\tInvalid db name specified: nothing\n\n'
    >>>

Previously, the same request gave a clean error message in XML format (see epost2.xml in Tests/Entrez). --Michiel. From n.j.loman at bham.ac.uk Wed Aug 4 10:48:53 2010 From: n.j.loman at bham.ac.uk (Nick Loman) Date: Wed, 04 Aug 2010 15:48:53 +0100 Subject: [Biopython-dev] Newbler ACE file to SAM? Message-ID: <4C597DD5.9060604@bham.ac.uk> Hi biopython-developers, Has anyone written any code to convert ACE files (Newbler ACE, in particular) to SAM format? I've seen a little bit of discussion on this subject in various places: http://seqanswers.com/forums/showthread.php?t=5138 http://biostar.stackexchange.com/questions/1828/how-to-convert-newbler-output-or-ace-to-sam-format It seems that a quick way of doing this would involve Biopython's support for reading ACE and (perhaps) PySam's support for writing SAM files (http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/contents.html) The reason I would prefer to convert ACE files rather than the 454PairAlign.txt file is that this would mean support for both de novo assemblies and mapping projects. I am not particularly au fait with the SAM format but can't see why that shouldn't work. If no-one has started writing something I would be happy to give it a go but equally if someone has I'd be more than happy to try it out :) Cheers Nick From bioinformed at gmail.com Wed Aug 4 12:13:09 2010 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Wed, 4 Aug 2010 12:13:09 -0400 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: <4C597DD5.9060604@bham.ac.uk> References: <4C597DD5.9060604@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 10:48 AM, Nick Loman wrote: > Hi biopython-developers, > > Has anyone written any code to convert ACE files (Newbler ACE, in > particular) to SAM format?
> > Hi Nick, I have a converter that uses the 454PairAlign.txt format to convert to SAM/BAM as part of the GLU package (http://code.google.com/p/glu-genetics). Their ACE files are a bit problematic, though I do not remember the exact reasons offhand. I'll revisit the issue, since the alignment records are only half of the conversion: most folks also want untrimmed reads and quality scores. Aside from the input format, the only difficulty with my converter is the dozen or so annoying prerequisite packages to install before using it (Python, HDF5, pytables, numpy, scipy, ply, biopython, etc. etc.). I also know that the Roche/454 folks are adding SAM/BAM support to a future version of Newbler, but I wouldn't expect to see that for at least a few more months. -Kevin From n.j.loman at bham.ac.uk Wed Aug 4 12:18:40 2010 From: n.j.loman at bham.ac.uk (Nick Loman) Date: Wed, 04 Aug 2010 17:18:40 +0100 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> Message-ID: <4C5992E0.8040904@bham.ac.uk> Kevin Jacobs wrote: > I have a converter that uses the 454PairAlign.txt format to convert to > SAM/BAM as part of the GLU package > (http://code.google.com/p/glu-genetics). Their ACE files are a bit > problematic, though I do not remember the exact reasons offhand. I'll > revisit the issue, since the alignment records are only half of the > conversion, since most folks also want untrimmed reads and quality scores. > > Aside from the input format, the only difficulty with my converter are > the dozen or so annoying pre-requisite packages to install to use it > (Python, HDF5, pytables, numpy, scipy, ply, biopython, etc. etc.). Hi Kevin Thanks for your email. I was aware of glu-genetics and will give it a whirl.
The main reason I wanted an ACE file converter is that I mistakenly thought that the de novo component of Newbler wouldn't produce the 454PairAlign.txt file, but reading the manual I see that file can be produced by supplying the -p (or -pt, for tab-delimited output) option. But as you say, getting quality scores would be useful, so I would be interested to know any progress you might make with an ACE converter. Cheers Nick From bioinformed at gmail.com Wed Aug 4 12:23:23 2010 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Wed, 4 Aug 2010 12:23:23 -0400 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: <4C5992E0.8040904@bham.ac.uk> References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 12:18 PM, Nick Loman wrote: > Kevin Jacobs wrote: > >> I have a converter that uses the 454PairAlign.txt format to convert to >> SAM/BAM as part of the GLU package (http://code.google.com/p/glu-genetics). >> Their ACE files are a bit problematic, though I do not remember the exact >> reasons offhand. I'll revisit the issue, since the alignment records are >> only half of the conversion, since most folks also want untrimmed reads and >> quality scores. >> >> Aside from the input format, the only difficulty with my converter are the >> dozen or so annoying pre-requisite packages to install to use it (Python, >> HDF5, pytables, numpy, scipy, ply, biopython, etc. etc.). >> > Hi Kevin > > Thanks for your email. I was aware of glu-genetics and will give it a > whirl. The main reason I wanted an ACE file converter is that I mistakenly > thought that the de novo component of Newbler won't produce the > 454PairAlign.txt file, but reading the manual I see that file can be > produced by supplying the -p (or -pt, for tab delimited output) option. But > as you say, getting quality scores would be useful so I would be interested > to know any progress you might make with an ACE converter.
> > Hi Nick, I may have misled you-- I use the 454PairAlign.txt and SFF files together to generate SAM/BAM files with full untrimmed read data and quality values. My recollection was that the Newbler ACE files contained only the consensus sequence and not the individual reads, which is why I didn't go down that road. I routinely use "-noace" so I'm quickly realigning a small dataset to generate an example ACE file to verify this. If I am incorrect and alignment information is indeed available from the ACE files, I'll happily add support for them to my converter. -Kevin From biopython at maubp.freeserve.co.uk Wed Aug 4 12:24:18 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 4 Aug 2010 17:24:18 +0100 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: <4C597DD5.9060604@bham.ac.uk> References: <4C597DD5.9060604@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 3:48 PM, Nick Loman wrote: > Hi biopython-developers, > > Has anyone written any code to convert ACE files (Newbler ACE, in > particular) to SAM format? > > I've seen a little bit of discussion on this subject in various places: > http://seqanswers.com/forums/showthread.php?t=5138 > http://biostar.stackexchange.com/questions/1828/how-to-convert-newbler-output-or-ace-to-sam-format > > It seems that a quick way of doing this would involve Biopython's support > for reading ACE and (perhaps) PySam's support for writing SAM files > (http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/contents.html) > > The reason I would prefer to convert ACE files to the 454PairAlign.txt file > as this would mean support for both de novo assemblies as well as mapping > projects. I am not particularly au fait with the SAM format but can't see > why that shouldn't work.
> > If no-one has started writing something I would be happy to give it a go but > equally if someone has I'd be more than happy to try it out :) > > Cheers > > Nick I've done ACE to SAM as an experiment, but haven't given the paired-end stuff much testing (my usual assembly viewer Tablet doesn't support paired reads yet). Would you like the script? As you suggested, it uses Biopython's ACE parser but writes SAM output directly since it is very simple. Peter From biopython at maubp.freeserve.co.uk Wed Aug 4 12:27:59 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 4 Aug 2010 17:27:59 +0100 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 5:24 PM, Peter wrote: > > I've done ACE to SAM as an experiment, but haven't given the paired end > stuff much testing (my usual assembly viewer Tablet doesn't support paired > reads yet). Would you like the script? As you suggested it uses Biopython's > ACE parser but write SAM output directly since it is very simple. > Sorry - I spoke too quickly. I wrote a MAF (MIRA assembly format) to SAM converter as an experiment, not an ACE to SAM converter. The reason for this was that ACE files don't have the read qualities, but MAF files do. Peter From n.j.loman at bham.ac.uk Wed Aug 4 12:32:44 2010 From: n.j.loman at bham.ac.uk (Nick Loman) Date: Wed, 04 Aug 2010 17:32:44 +0100 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> Message-ID: <4C59962C.2010105@bham.ac.uk> Kevin Jacobs wrote: > I may have mislead you-- I use the 454PairAlign.txt and SFF files > together to generate SAM/BAM files with full untrimmed read data and > quality values. My recollection was that the Newbler ACE files > contained only the consensus sequence and not the individuals reads, > which is why I didn't go down that road.
I routinely use "-noace" so > I'm quickly realigning a small dataset to generate an example ACE file > to verify this. If I am incorrect and alignment information is indeed > available from the ACE files, I'll happily add support for them to my > converter. Hi Kevin I'm pretty sure the ACE files contain the individual reads (or at the least, the trimmed, aligned portions of them) because this is the file one uses in Consed/Tablet to view an assembly. We may of course be talking at cross-purposes! Cheers Nick From n.j.loman at bham.ac.uk Wed Aug 4 12:35:51 2010 From: n.j.loman at bham.ac.uk (Nick Loman) Date: Wed, 04 Aug 2010 17:35:51 +0100 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> Message-ID: <4C5996E7.8010807@bham.ac.uk> Peter wrote: > Sorry - I spoke to quickly. I wrote a MAF (MIRA assembly format) to SAM > converter as an experiment, not an ACE to SAM converter. The reason > for this was ACE files don't have the read qualities, but MAF files do. > Hi Peter Ah, I see - perhaps you could post this script somewhere just for educational purposes? I guess I should be able to get my work done by using Kevin's glu-genetics script and making my Newbler assemblies output the 454PairAlign.txt file using the -p option. Not sure if there's scope for tighter integration to Biopython but will leave that to you experts! Cheers Nick From bioinformed at gmail.com Wed Aug 4 13:00:50 2010 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Wed, 4 Aug 2010 13:00:50 -0400 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: <4C5996E7.8010807@bham.ac.uk> References: <4C597DD5.9060604@bham.ac.uk> <4C5996E7.8010807@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 12:35 PM, Nick Loman wrote: > Peter wrote: > >> Sorry - I spoke to quickly. I wrote a MAF (MIRA assembly format) to SAM >> converter as an experiment, not an ACE to SAM converter. 
The reason >> for this was ACE files don't have the read qualities, but MAF files do. >> >> > [...] > Not sure if there's scope for tighter integration to Biopython but will > leave that to you experts! > > Unlike much of my other code, there is no dependency on pysam, so there is no reason why biopython couldn't adopt my converter -- I'd certainly be happy to donate it. I'm just not sure if there is a good place for it. -Kevin From biopython at maubp.freeserve.co.uk Wed Aug 4 13:01:36 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 4 Aug 2010 18:01:36 +0100 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: <4C59962C.2010105@bham.ac.uk> References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 5:32 PM, Nick Loman wrote: > > Kevin Jacobs wrote: >> >> I may have mislead you-- I use the 454PairAlign.txt and SFF files together >> to generate SAM/BAM files with full untrimmed read data and quality values. >> My recollection was that the Newbler ACE files contained only the consensus >> sequence and not the individuals reads, which is why I didn't go down that >> road. I routinely use "-noace" so I'm quickly realigning a small dataset to >> generate an example ACE file to verify this. If I am incorrect and >> alignment information is indeed available from the ACE files, I'll happily >> add support for them to my converter. > > Hi Kevin > > I'm pretty sure the ACE files contain the individual reads (or at the least, > the trimmed, aligned portions of them) because this is the file one uses in > Consed/Tablet to view an assembly. Yes, they do. But ACE files lack the quality scores for the reads (they just have quality scores for the consensus) which are required for SAM or BAM. You'd have to insert dummy values or get them from another file - Kevin says he takes them from the SFF file. > > We may of course be talking at cross-purposes!
> Maybe :) Peter From biopython at maubp.freeserve.co.uk Wed Aug 4 13:03:19 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 4 Aug 2010 18:03:19 +0100 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> <4C5996E7.8010807@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 6:00 PM, Kevin Jacobs wrote: > On Wed, Aug 4, 2010 at 12:35 PM, Nick Loman wrote: >> [...] >> Not sure if there's scope for tighter integration to Biopython but will >> leave that to you experts! >> >> > Unlike much of my other code, there is no dependency on pysam, so there is > no reason why biopython couldn't adopt my converter -- I'd certainly be > happy to donate it. I'm just not sure if there is a good place for it. Is it a stand alone script using Biopython to parse ACE and SFF? Maybe we can include it in the scripts folder - at the very least you could add a link to it on http://www.biopython.org/wiki/Scriptcentral Peter From n.j.loman at bham.ac.uk Wed Aug 4 13:05:25 2010 From: n.j.loman at bham.ac.uk (Nick Loman) Date: Wed, 04 Aug 2010 18:05:25 +0100 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> Message-ID: <4C599DD5.2010308@bham.ac.uk> Peter wrote: > Yes, they do. But ACE files lack the quality scores for the reads (they > just have quality scores for the consensus) which are required for SAM > or BAM. You'd have to insert dummy values or get them from another > file - Kevin says he takes them from the SFF file. > Hi Peter Right, that makes sense. In which case it should be possible to convert ACE (when accompanied by the SFF files), as an alternative to using 454PairAlign.txt as the input file.
Certainly the ACE file contains the unique 454 read identifiers that would make it possible to pull read qualities from the SFF, although you would have to watch out for the Newbler partial read alignments (READ_ID.NtoN style) Cheers Nick From bioinformed at gmail.com Wed Aug 4 13:07:36 2010 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Wed, 4 Aug 2010 13:07:36 -0400 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> <4C5996E7.8010807@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 1:03 PM, Peter wrote: > On Wed, Aug 4, 2010 at 6:00 PM, Kevin Jacobs > wrote: > > On Wed, Aug 4, 2010 at 12:35 PM, Nick Loman > wrote: > >> [...] > >> Not sure if there's scope for tighter integration to Biopython but will > >> leave that to you experts! > >> > >> > > Unlike much of my other code, there is no dependency on pysam, so there > is > > no reason why biopython couldn't adopt my converter -- I'd certainly be > > happy to donate it. I'm just not sure if there is a good place for it. > > Is it a stand alone script using Biopython to parse ACE and SFF? > Maybe we can include it in the scripts folder - at the very least you could > add a link to it on http://www.biopython.org/wiki/Scriptcentral > > The code is here: http://code.google.com/p/glu-genetics/source/browse/glu/modules/seq/Newbler2SAM.py There are some fairly simple dependencies on GLU libraries and an optional Cython accelerator for CIGAR and NM computation, but otherwise it is fairly easy to make it stand alone. -Kevin From biopython at maubp.freeserve.co.uk Wed Aug 4 13:10:51 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 4 Aug 2010 18:10:51 +0100 Subject: [Biopython-dev] Newbler ACE file to SAM? 
In-Reply-To: <4C599DD5.2010308@bham.ac.uk> References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> <4C599DD5.2010308@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 6:05 PM, Nick Loman wrote: > Peter wrote: >> >> Yes, they do. But ACE files lack the quality scores for the reads (they >> just have quality scores for the consensus) which are required for SAM >> or BAM. You'd have to insert dummy values or get them from another >> file - Kevin says he takes them from the SFF file. >> > > Hi Peter > > Right, that makes sense. In which case it should be possible to convert ACE > (when accompanied by the SFF files), as an alternative to using > 454PairAlign.txt as the input file. Certainly the ACE file contains the > unique 454 read identifiers that would make it possible to pull read > qualities from the SFF, although you would have to watch out for the Newbler > partial read alignments (READ_ID.NtoN style) > > Cheers > > Nick Or ask your Roche representatives to implement SAM/BAM output? Mind you, as the MIRA3 author has pointed out, SAM/BAM isn't so great for de novo assemblies. It doesn't even store the contig seq ;) Peter From bioinformed at gmail.com Wed Aug 4 13:11:28 2010 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Wed, 4 Aug 2010 13:11:28 -0400 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: <4C599DD5.2010308@bham.ac.uk> References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> <4C599DD5.2010308@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 1:05 PM, Nick Loman wrote: > Peter wrote: > >> Yes, they do. But ACE files lack the quality scores for the reads (they >> just have quality scores for the consensus) which are required for SAM >> or BAM. You'd have to insert dummy values or get them from another >> file - Kevin says he takes them from the SFF file. >> >> > > Right, that makes sense. 
In which case it should be possible to convert ACE > (when accompanied by the SFF files), as an alternative to using > 454PairAlign.txt as the input file. Certainly the ACE file contains the > unique 454 read identifiers that would make it possible to pull read > qualities from the SFF, although you would have to watch out for the Newbler > partial read alignments (READ_ID.NtoN style) > > In that case, I'm happy to add support for those ACE files. My code already handles the NtoN trimming from the 454PairAlign files. If you'd like to send me (off list) a small example ACE file, I can likely have it working very quickly. -Kevin From bioinformed at gmail.com Wed Aug 4 13:12:02 2010 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Wed, 4 Aug 2010 13:12:02 -0400 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> <4C599DD5.2010308@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 1:10 PM, Peter wrote: > Or ask your Roche representatives to implement SAM/BAM output? > Mind you, as the MIRA3 author has pointed out, SAM/BAM isn't so > great for de novo assemblies. It doesn't even store the contig seq ;) > > They're working on it. :) -Kevin From biopython at maubp.freeserve.co.uk Wed Aug 4 13:24:37 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 4 Aug 2010 18:24:37 +0100 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> <4C599DD5.2010308@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 6:12 PM, Kevin Jacobs wrote: > On Wed, Aug 4, 2010 at 1:10 PM, Peter wrote: >> >> Or ask your Roche representatives to implement SAM/BAM output? >> Mind you, as the MIRA3 author has pointed out, SAM/BAM isn't so >> great for de novo assemblies. It doesn't even store the contig seq ;) >> > > They're working on it. 
:) > -Kevin > Do you mean Roche are working on SAM/BAM output? If so, that's good news. If you mean the SAM/BAM format, I'm on the samtools-devel list where they are discussing several specification improvements/additions, but I don't recall anything specifically aimed at de novo assemblies. Peter From bioinformed at gmail.com Wed Aug 4 13:32:22 2010 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Wed, 4 Aug 2010 13:32:22 -0400 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> <4C599DD5.2010308@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 1:24 PM, Peter wrote: > On Wed, Aug 4, 2010 at 6:12 PM, Kevin Jacobs wrote: > > On Wed, Aug 4, 2010 at 1:10 PM, Peter wrote: > >> > >> Or ask your Roche representatives to implement SAM/BAM output? > >> Mind you, as the MIRA3 author has pointed out, SAM/BAM isn't so > >> great for de novo assemblies. It doesn't even store the contig seq ;) > >> > > > > They're working on it. :) > > -Kevin > > > > Do you mean Roche are working on SAM/BAM output? If so, that's good news. Yes, I believe that Roche is working on SAM/BAM support for a future version of Newbler. As of a few weeks ago, when I last spoke to them, they were just gathering information and had yet to start on the implementation. I'd expect to see this feature with Newbler 2.6 (supporting the same features as 2.5 on the FLX, which was for the Jr only) or more likely with the subsequent 2.7 release. In other words, I'd be surprised to see it in the coming weeks or few months. > If you mean the SAM/BAM format, I'm on the samtools-devel list where > they are discussing several specification improvements/additions, but > I don't recall anything specifically aimed at de novo assemblies. > > I didn't see the Roche folks in that discussion, but I'll look again. As far as I know, they're not looking to change or alter the spec, but I could easily be wrong.
I do keep in contact with a few of the folks at Roche, but have no deep insight into their future plans. -Kevin From bugzilla-daemon at portal.open-bio.org Wed Aug 4 14:00:37 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Aug 2010 14:00:37 -0400 Subject: [Biopython-dev] [Bug 3130] New: Broken links in Documentation to NCBI Blast Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3130 Summary: Broken links in Documentation to NCBI Blast Product: Biopython Version: Not Applicable Platform: All URL: http://www.biopython.org/DIST/docs/tutorial/Tutorial.html OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Documentation AssignedTo: biopython-dev at biopython.org ReportedBy: mphillip at vt.edu The links to NCBI Blast in 7.1 of the Biopython Tutorial and Cookbook, http://www.biopython.org/DIST/docs/tutorial/Tutorial.html, are broken (they lead to HTTP 404 pages): http://www.ncbi.nlm.nih.gov/BLAST/blast_program.html should possibly be http://www.ncbi.nlm.nih.gov/BLAST/blast_program.shtml http://www.ncbi.nlm.nih.gov/BLAST/blast_databases.html should possibly be http://www.ncbi.nlm.nih.gov/BLAST/blast_databases.shtml -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bioinformed at gmail.com Wed Aug 4 16:41:06 2010 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Wed, 4 Aug 2010 16:41:06 -0400 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: <4C59962C.2010105@bham.ac.uk> References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 12:32 PM, Nick Loman wrote: > I'm pretty sure the ACE files contain the individual reads (or at the > least, the trimmed, aligned portions of them) because this is the file one > uses in Consed/Tablet to view an assembly.
We may of course be talking at > cross-purposes! > > Hi Nick, I've reviewed the Newbler ACE files and re-discovered the reason why they weren't ideal in the first place: the alignment records in Newbler's output are gapped based on a pseudo-multiple-alignment of all of the reads to the reference, not a standard pairwise alignment. So there is no easy way to tell which gaps in each read were introduced by the pairwise alignment and which are artifacts of the multi-way alignment. This means I'd need to re-compute the alignment to the reference, but that should be relatively easy with a round of the standard Smith-Waterman algorithm, since the aligned start position is known. In other words, it is technically possible to use Newbler's ACE files, but it really is simpler and easier to use the 454PairAlign.txt results, all the more so because the 454PairAlign.txt files are often vastly smaller than 454Contig.ace files. On the other hand, it should be easy to adapt my scripts to convert non-Newbler ACE files to SAM/BAM provided that the reads are gapped for pairwise alignment. It has been so long since I've used consed/phred/phrap that I don't remember if this is how it is normally done. -Kevin From bioinformed at gmail.com Wed Aug 4 16:44:25 2010 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Wed, 4 Aug 2010 16:44:25 -0400 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 4:41 PM, Kevin Jacobs < bioinformed at gmail.com> wrote: > On Wed, Aug 4, 2010 at 12:32 PM, Nick Loman wrote: > >> I'm pretty sure the ACE files contain the individual reads (or at the >> least, the trimmed, aligned portions of them) because this is the file one >> uses in Consed/Tablet to view an assembly. We may of course be talking at >> cross-purposes!
>> >> > I've reviewed the Newbler ACE files and re-discovered the reason why they > weren't ideal in the first place: > Never mind-- I didn't realize the consensus sequence was gapped, so it is then trivial to recover the original pairwise alignments. I'll have a version of my Newbler2SAM module that can process ACE files shortly. -Kevin From bugzilla-daemon at portal.open-bio.org Wed Aug 4 17:14:30 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Aug 2010 17:14:30 -0400 Subject: [Biopython-dev] [Bug 3130] Broken links in Documentation to NCBI Blast In-Reply-To: Message-ID: <201008042114.o74LEUCs030000@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3130 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-04 17:14 EST ------- Confirmed. I wonder why the NCBI changed this without putting redirects in place?
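Kevin's remark in the Newbler ACE thread, that a gapped (padded) consensus makes each read's pairwise alignment easy to recover for SAM output, can be illustrated with the short Python sketch below. It is not taken from his Newbler2SAM module: the function names are hypothetical, and it assumes the read and the matching slice of the consensus are equal-length strings using `*` as the pad character, as ACE files do. It ignores clipping, quality scores and the NtoN partial-read names. Columns padded in both sequences are dropped, a read base opposite a consensus pad becomes an insertion (`I`), a read pad opposite a consensus base becomes a deletion (`D`), and NM counts mismatches plus inserted and deleted bases.

```python
def padded_to_cigar(consensus, read, pad="*"):
    """Return (cigar, nm) for a read aligned against a padded consensus slice."""
    if len(consensus) != len(read):
        raise ValueError("padded consensus and read slices must be equal length")
    runs = []  # [operation, length] pairs, merged as we go
    nm = 0
    for c, r in zip(consensus.upper(), read.upper()):
        if c == pad and r == pad:
            continue  # pad in both: the column vanishes in unpadded coordinates
        if c == pad:
            op = "I"  # read base with no consensus base: insertion
            nm += 1
        elif r == pad:
            op = "D"  # consensus base skipped by the read: deletion
            nm += 1
        else:
            op = "M"  # aligned column, match or mismatch
            if c != r:
                nm += 1
        if runs and runs[-1][0] == op:
            runs[-1][1] += 1
        else:
            runs.append([op, 1])
    return "".join("%d%s" % (n, op) for op, n in runs), nm


def unpadded_position(padded_consensus, padded_start, pad="*"):
    """Convert a 1-based padded consensus coordinate to a 1-based unpadded one."""
    return len(padded_consensus[:padded_start - 1].replace(pad, "")) + 1
```

For example, a consensus slice `ACG*T` against the read `ACGGT` gives `3M1I1M` with NM 1. In a real converter the padded read and its start on the padded consensus would come from Biopython's `Bio.Sequencing.Ace` records; that wiring is omitted here, and the attribute names should be checked against the Biopython version in use.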
From bugzilla-daemon at portal.open-bio.org Fri Aug 6 15:05:30 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Aug 2010 15:05:30 -0400 Subject: [Biopython-dev] [Bug 3109] Record class in Bio.SCOP.Cla has hierarchy member as list instead of dictionary In-Reply-To: Message-ID: <201008061905.o76J5UTt014104@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3109 jeffrey.finkelstein at gmail.com changed:

    What                          |Removed |Added
    ----------------------------------------------
    Attachment #1522 is obsolete  |0       |1

------- Comment #2 from jeffrey.finkelstein at gmail.com 2010-08-06 15:05 EST ------- Created an attachment (id=1538) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1538&action=view) Fix bug #3109: replace Bio.SCOP.Cla.Record.hierarchy list with dictionary Updated patch for bug #3109, without the removal of trailing newlines
From bugzilla-daemon at portal.open-bio.org Fri Aug 6 15:51:40 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Aug 2010 15:51:40 -0400 Subject: [Biopython-dev] [Bug 3109] Record class in Bio.SCOP.Cla has hierarchy member as list instead of dictionary In-Reply-To: Message-ID: <201008061951.o76JpesH015503@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3109 jeffrey.finkelstein at gmail.com changed:

    What    |Removed |Added
    URL     |http://github.com/jfinkels/biopython/commit/6d2257dd0c46abdf1ecd14b8bc660e32a205630a |http://github.com/jfinkels/biopython/commit/e52255f06c08aad1556eb518e2e0da8f030765ff
    Version |1.54b |1.54

------- Comment #3 from jeffrey.finkelstein at gmail.com 2010-08-06 15:51 EST ------- (In reply to comment #2) > Created an attachment (id=1538) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1538&action=view) [details] > Fix bug #3109: replace Bio.SCOP.Cla.Record.hierarchy list with dictionary > > Updated patch for bug #3109, without removed trailing newlines > I have updated the patch for this bug so that it no longer removes the trailing newlines. I will open a new bug for that change. The fix can be found at my personal GitHub fork of Biopython: http://github.com/jfinkels/biopython/commit/e52255f06c08aad1556eb518e2e0da8f030765ff Here is a before/after demonstration of usage of the code which this patch affects.
Currently, to select, for example, all PDB chains in superfamily 50156, one must use something similar to the following code:

    import Bio.SCOP.Cla

    SCOP_CLA_FILE = 'dir.cla.scop.txt_1.75'

    records = []
    with open(SCOP_CLA_FILE, 'r') as f:
        for record in Bio.SCOP.Cla.parse(f):
            # hierarchy is currently a list of (key, sunid) pairs
            for key, value in record.hierarchy:
                if key == 'sf' and value == 50156:
                    records.append(record)
    print([record.residues.pdbid for record in records])

With this patch, the hierarchy key/value pairs can be accessed like a dictionary:

    import Bio.SCOP.Cla

    SCOP_CLA_FILE = 'dir.cla.scop.txt_1.75'

    with open(SCOP_CLA_FILE, 'r') as f:
        # hierarchy is now a dict keyed by 'cl', 'cf', 'sf', 'fa', ...
        records = [record for record in Bio.SCOP.Cla.parse(f)
                   if record.hierarchy['sf'] == 50156]
    print([record.residues.pdbid for record in records])

The benefit is greater with more complex selections of sets of chains (for example, to select all families within a superfamily). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From jeffrey.finkelstein at gmail.com Fri Aug 6 16:09:31 2010 From: jeffrey.finkelstein at gmail.com (Jeffrey Finkelstein) Date: Fri, 6 Aug 2010 16:09:31 -0400 Subject: [Biopython-dev] Bug #3109: Record class in Bio.SCOP.Cla has hierarchy member as list instead of dictionary Message-ID: I have submitted a bug report and a patch for a current feature of the Bio.SCOP.Cla module that makes using it somewhat difficult. Specifically, the Bio.SCOP.Cla.Record class has a "hierarchy" member which is currently a list, but should be a dictionary, according to the SCOP parseable classification file format "specification" (it is an informal specification) here: http://scop.mrc-lmb.cam.ac.uk/scop/release-notes.html#scop-parseable-files. For a usage example, see the comments I have made at: http://bugzilla.open-bio.org/show_bug.cgi?id=3109 I have CC'ed the original author of the module.
Gavin (or anyone else), do you have any objections to this change? Jeffrey From updates at feedmyinbox.com Sat Aug 7 03:14:21 2010 From: updates at feedmyinbox.com (Feed My Inbox) Date: Sat, 7 Aug 2010 03:14:21 -0400 Subject: [Biopython-dev] 8/7 newest questions tagged biopython - Stack Overflow Message-ID: ==================== 1. How do I parse data in a table using Biopython? ==================== August 6, 2010 at 5:53 AM Hello, I want to screen a particular column in a table using biopython. I want to parse the table and retain only entries not having "empty spaces" in a particular column. Please any ideas? http://stackoverflow.com/questions/3422677/how-do-i-parse-data-in-a-table-using-biopython -------------------- ==================== Source: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest From thomasvangurp at gmail.com Sun Aug 8 05:55:11 2010 From: thomasvangurp at gmail.com (Thomas van Gurp) Date: Sun, 8 Aug 2010 11:55:11 +0200 Subject: [Biopython-dev] unsubscribe Message-ID: 2010/8/5 > Send Biopython-dev mailing list submissions to > biopython-dev at lists.open-bio.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://lists.open-bio.org/mailman/listinfo/biopython-dev > or, via email, send a message with subject or body 'help' to > biopython-dev-request at lists.open-bio.org > > You can reach the person managing the list at > biopython-dev-owner at lists.open-bio.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Biopython-dev digest..." > > > Today's Topics: > > 1.
Re: Newbler ACE file to SAM? > (Kevin Jacobs ) > 2. [Bug 3130] New: Broken links in Documentation to NCBI Blast > (bugzilla-daemon at portal.open-bio.org) > 3. Re: Newbler ACE file to SAM? > (Kevin Jacobs ) > 4. Re: Newbler ACE file to SAM? > (Kevin Jacobs ) > 5. [Bug 3130] Broken links in Documentation to NCBI Blast > (bugzilla-daemon at portal.open-bio.org) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 4 Aug 2010 13:32:22 -0400 > From: "Kevin Jacobs " > Subject: Re: [Biopython-dev] Newbler ACE file to SAM? > To: Peter > Cc: "biopython-dev at biopython.org" > Message-ID: > > > > Content-Type: text/plain; charset=ISO-8859-1 > > On Wed, Aug 4, 2010 at 1:24 PM, Peter >wrote: > > > On Wed, Aug 4, 2010 at 6:12 PM, Kevin Jacobs wrote: > > > On Wed, Aug 4, 2010 at 1:10 PM, Peter wrote: > > >> > > >> Or ask your Roche representatives to implement SAM/BAM output? > > >> Mind you, as the MIRA3 author has pointed out, SAM/BAM isn't so > > >> great for de novo assemblies. It doesn't even store the contig seq ;) > > >> > > > > > > They're working on it. :) > > > -Kevin > > > > > > > Do you mean Roche are working on SAM/BAM output? If so, that's good news. > > > > Yes, I believe that Roche is working on SAM/BAM support for a future > version > of Newbler. As of a few weeks ago, when I last spoke to them, they were > just gathering information and had yet to start on the implementation. I'd > expect to see this feature with Newbler 2.6 (supporting the same features > as > 2.5 on the FLX, which was for the Jr only) or more likely with the > subsequent 2.7 release. In other words, I'd be surprised to see it in the > coming weeks or few months. > > > > If you mean the SAM/BAM format, I'm on the samtools-devel list where > > they are discussing several specification improvements/additions, but > > I don't recall anything specifically aimed at de novo assemblies. 
> > > > > I didn't see the Roche folks in that discussion, but I'll look again. As > far as I know, they're not looking to change or alter the spec, but I could > easily be wrong. I do keep in contact with a few of the folks at Roche, > but > have no deep insight into their future plans. > > -Kevin > > > ------------------------------ > > Message: 2 > Date: Wed, 4 Aug 2010 14:00:37 -0400 > From: bugzilla-daemon at portal.open-bio.org > Subject: [Biopython-dev] [Bug 3130] New: Broken links in Documentation > to NCBI Blast > To: biopython-dev at biopython.org > Message-ID: > > http://bugzilla.open-bio.org/show_bug.cgi?id=3130 > > Summary: Broken links in Documentation to NCBI Blast > Product: Biopython > Version: Not Applicable > Platform: All > URL: > http://www.biopython.org/DIST/docs/tutorial/Tutorial.html > OS/Version: All > Status: NEW > Severity: normal > Priority: P2 > Component: Documentation > AssignedTo: biopython-dev at biopython.org > ReportedBy: mphillip at vt.edu > > > The links to NCBI Blast in 7.1 of the Biopython Tutorial and Cookbook, > http://www.biopython.org/DIST/docs/tutorial/Tutorial.html, are broken > (they > lead to HTTP 404 pages): > > http://www.ncbi.nlm.nih.gov/BLAST/blast_program.html should possibly be > http://www.ncbi.nlm.nih.gov/BLAST/blast_program.shtml > > http://www.ncbi.nlm.nih.gov/BLAST/blast_databases.html should possibly be > http://www.ncbi.nlm.nih.gov/BLAST/blast_databases.shtml > > > -- > Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are the assignee for the bug, or are watching the assignee. > > > ------------------------------ > > Message: 3 > Date: Wed, 4 Aug 2010 16:41:06 -0400 > From: "Kevin Jacobs " > Subject: Re: [Biopython-dev] Newbler ACE file to SAM?
> To: Nick Loman > Cc: "biopython-dev at biopython.org" > Message-ID: > > > > Content-Type: text/plain; charset=windows-1252 > > On Wed, Aug 4, 2010 at 12:32 PM, Nick Loman wrote: > > > I'm pretty sure the ACE files contain the individual reads (or at the > > least, the trimmed, aligned portions of them) because this is the file > one > > uses in Consed/Tablet to view an assembly. We may of course be talking at > > cross-purposes! > > > > > Hi Nick, > > I've reviewed the Newbler ACE files and re-discovered the reason why they > weren't ideal in the first place: the alignment records in Newbler's output > are gapped based on a pseudo-multiple-alignment of all of the reads to the > reference, not a standard pairwise alignment. So there is no easy way to > differentiate which gaps in each read were introduced as part of the > pairwise alignment or as artifacts of the multi-way alignment. This means > I'd need to re-compute the alignment to the reference, but should be > relatively easy since the aligned start position is known using a round of > the standard Smith-Waterman algorithm. > > In other words, it is technically possible to use Newbler's ACE files, but > it really is simpler and easier to use the 454PairAlign.txt results. More > so because the 454PairAlign.txt files are often vastly smaller than > 454Contig.ace files. > > On the other hand, it should be easy to adapt my scripts to convert > non-Newbler ACE files to SAM/BAM provided that the reads are gapped for > pairwise alignment. It has been so long since I've used consed/phred/phrap > that I don't remember if this is how it is normally done. > > -Kevin > > > > ------------------------------ > > Message: 4 > Date: Wed, 4 Aug 2010 16:44:25 -0400 > From: "Kevin Jacobs " > Subject: Re: [Biopython-dev] Newbler ACE file to SAM?
> To: Nick Loman > Cc: "biopython-dev at biopython.org" > Message-ID: > > Content-Type: text/plain; charset=ISO-8859-1 > > On Wed, Aug 4, 2010 at 4:41 PM, Kevin Jacobs < > bioinformed at gmail.com> wrote: > > > On Wed, Aug 4, 2010 at 12:32 PM, Nick Loman > wrote: > > > >> I'm pretty sure the ACE files contain the individual reads (or at the > >> least, the trimmed, aligned portions of them) because this is the file > one > >> uses in Consed/Tablet to view an assembly. We may of course be talking > at > >> cross-purposes! > >> > >> > > I've reviewed the Newbler ACE files and re-discovered the reason why they > > weren't ideal in the first place: > > > > Never mind-- I didn't realize the consensus sequence was gapped, so it is > then trivial to recover the original pairwise alignments. I'll have a > version of my Newbler2SAM module that can process ACE files shortly. > > -Kevin > > > ------------------------------ > > Message: 5 > Date: Wed, 4 Aug 2010 17:14:30 -0400 > From: bugzilla-daemon at portal.open-bio.org > Subject: [Biopython-dev] [Bug 3130] Broken links in Documentation to > NCBI Blast > To: biopython-dev at biopython.org > Message-ID: <201008042114.o74LEUCs030000 at portal.open-bio.org> > > http://bugzilla.open-bio.org/show_bug.cgi?id=3130 > > > > > > ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-04 17:14 EST ------- > Confirmed. > > I wonder why the NCBI changed this without putting redirects in place? > > > -- > Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are the assignee for the bug, or are watching the assignee.
> > > ------------------------------ > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > End of Biopython-dev Digest, Vol 91, Issue 7 > ******************************************** > -- Kind regards, Thomas van Gurp From biopython at maubp.freeserve.co.uk Wed Aug 11 06:29:32 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Aug 2010 11:29:32 +0100 Subject: [Biopython-dev] Bug #3109: Record class in Bio.SCOP.Cla has hierarchy member as list instead of dictionary In-Reply-To: References: Message-ID: On Fri, Aug 6, 2010 at 9:09 PM, Jeffrey Finkelstein wrote: > I have submitted a bug report and a patch for a current feature of the > Bio.SCOP.Cla module that makes using it somewhat difficult. Specifically, > the Bio.Scop.Cla.Record class has a "hierarchy" member which is currently a > list, but should be a dictionary, according to the SCOP parseable > classification file format "specification" (it is an informal specification) > here: > http://scop.mrc-lmb.cam.ac.uk/scop/release-notes.html#scop-parseable-files. > > For a usage example, see the comments I have made at: > http://bugzilla.open-bio.org/show_bug.cgi?id=3109 > > I have CC'ed the original author of the module. Gavin (or anyone else), do > you have any objections to this change? > > Jeffrey As I commented on the bug, I'm happy with this change in principle, except for the fact it breaks backwards compatibility.
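Peter's backwards-compatibility concern can be made concrete. The following is a minimal sketch in plain Python. It uses types.SimpleNamespace stand-ins instead of real Bio.SCOP.Cla records and assumes, as in the patch, that hierarchy becomes a dict keyed by two-letter SCOP codes such as 'sf' and 'fa'. The helper names are hypothetical.

The sketch shows that the old list-of-pairs loop does not raise an error against a dict. Iterating a dict yields only its keys, and each two-letter key unpacks into two characters, so the loop silently matches nothing. The sketch also shows the family-within-superfamily selection Jeffrey mentions.

```python
from types import SimpleNamespace


def old_style_match(record, sf_sunid):
    """Pre-patch idiom, written for hierarchy as a list of (key, sunid) pairs."""
    for key, value in record.hierarchy:
        if key == "sf" and value == sf_sunid:
            return True
    return False


def new_style_match(record, sf_sunid):
    """Dict idiom; .get() avoids a KeyError if a level is missing."""
    return record.hierarchy.get("sf") == sf_sunid


def families_in_superfamily(records, sf_sunid):
    """Return the set of family sunids ('fa') found under one superfamily."""
    return {r.hierarchy["fa"] for r in records if new_style_match(r, sf_sunid)}


# Stand-in records; apart from superfamily 50156 the sunids are made up.
as_list = SimpleNamespace(hierarchy=[("cl", 1), ("sf", 50156), ("fa", 10)])
records = [
    SimpleNamespace(hierarchy={"cl": 1, "sf": 50156, "fa": 10}),
    SimpleNamespace(hierarchy={"cl": 1, "sf": 50156, "fa": 11}),
    SimpleNamespace(hierarchy={"cl": 2, "sf": 99, "fa": 12}),
]

print(old_style_match(as_list, 50156))     # True: the list layout works
print(old_style_match(records[0], 50156))  # False: key "sf" unpacks to ("s", "f")
print(sorted(families_in_superfamily(records, 50156)))  # [10, 11]
```

Because the old loop fails silently rather than raising, existing scripts would quietly return empty selections after the change. That may be worth flagging when asking on the main list.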
If we ask on the main list and there are no objections, then the code proposed looks fine: http://github.com/jfinkels/biopython/commit/e52255f06c08aad1556eb518e2e0da8f030765ff Peter From biopython at maubp.freeserve.co.uk Thu Aug 12 12:37:14 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 12 Aug 2010 17:37:14 +0100 Subject: [Biopython-dev] test_PhyloXML.py on Python 3 Message-ID: Hi Eric (et al), Is test_PhyloXML.py working for you under Python 3? I'm getting the following (both with and without the 2to3 --nofix=long option):

$ python3 test_PhyloXML.py
test_clade_getitem (__main__.MethodTests)
Clade.__getitem__: get sub-clades by extended indexing. ... ERROR
test_clade_to_phylogeny (__main__.MethodTests)
Convert a Clade object to a new Phylogeny. ... ERROR
...
Traceback (most recent call last):
  File "test_PhyloXML.py", line 571, in test_phylo
    'test_Taxonomy', 'test_Uri',
  File "test_PhyloXML.py", line 504, in _rewrite_and_call
    phx = PhyloXMLIO.read(infile)
  File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 105, in read
    return Parser(file).read()
  File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 298, in __init__
    event, root = next(context)
  File "", line 59, in __iter__
TypeError: invalid event tuple

----------------------------------------------------------------------
Ran 47 tests in 0.015s

All the sub-tests in test_PhyloXML.py are failing the same way.

From memory this was working recently.
Peter From bugzilla-daemon at portal.open-bio.org Thu Aug 12 13:16:07 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Aug 2010 13:16:07 -0400 Subject: [Biopython-dev] [Bug 2790] Genepop parser creates a full representation of the file on memory In-Reply-To: Message-ID: <201008121716.o7CHG7A8021694@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2790 ------- Comment #1 from jeffrey.finkelstein at gmail.com 2010-08-12 13:16 EST ------- The original reporter of this bug has provided a "large" FileParser: http://github.com/biopython/biopython/commit/507839a8868f9d35dc73e6195947019e3ac7fe6b Is there a reason not to use this memory-saving generator method as the only file parser? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Aug 12 13:23:27 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Aug 2010 13:23:27 -0400 Subject: [Biopython-dev] [Bug 2790] Genepop parser creates a full representation of the file on memory In-Reply-To: Message-ID: <201008121723.o7CHNRqO021882@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2790 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-12 13:23 EST ------- Hi Jeffrey, I'd guess Tiago is concerned with backwards compatibility, or that the all in memory case is very useful for typical smaller analyses. [If it wasn't clear, Tiago is the module owner for Bio.PopGen] Tiago, can we mark this enhancement as fixed now? Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
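The trade-off discussed for bug 2790 is between a generator that yields one record at a time and a parser that builds the whole file in memory. The following is a simplified, hypothetical Genepop-like reader that sketches both approaches; it is not the Bio.PopGen.GenePop API. It assumes the following layout:

- Lines before the first 'Pop' line are header (title and locus names).
- Each 'Pop' line starts a new population.
- Each individual looks like 'name, genotype genotype ...'.

```python
def iter_individuals(lines):
    """Yield (population_index, name, genotypes), one individual at a time.

    Only the current line is held in memory, so this scales to large files.
    """
    pop = -1
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.upper() == "POP":
            pop += 1  # a new population block starts
            continue
        if pop < 0:
            continue  # header: title and locus names before the first Pop
        name, _, genotypes = line.partition(",")
        yield pop, name.strip(), genotypes.split()


def read_all(lines):
    """In-memory variant: build the full list up front (random access, more RAM)."""
    return list(iter_individuals(lines))


example = [
    "Example title",
    "Locus1, Locus2",
    "Pop",
    "a1, 0101 0202",
    "a2, 0102 0202",
    "Pop",
    "b1, 0101 0101",
]
for record in iter_individuals(example):
    print(record)
```

Wrapping the generator in list() gives the in-memory variant, so a single parsing loop can back both interfaces. This sketch says nothing about the relative speed of the two approaches.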
From eric.talevich at gmail.com Thu Aug 12 21:24:25 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Thu, 12 Aug 2010 21:24:25 -0400 Subject: [Biopython-dev] test_PhyloXML.py on Python 3 In-Reply-To: References: Message-ID: On Thu, Aug 12, 2010 at 12:37 PM, Peter wrote: > Hi Eric (et al), > > Is test_PhyloXML.py working for you under Python 3? > > I'm getting the following (both with and without the 2to3 --nofix=long > option): > > $ python3 test_PhyloXML.py > ... > File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", > line 298, in __init__ > event, root = next(context) > File "", line 59, in __iter__ > TypeError: invalid event tuple > > ---------------------------------------------------------------------- > Ran 47 tests in 0.015s > > All the sub-tests in test_PhyloXML.py are failing the same way. > > From memory this was working recently. > > Yeah, it was... it's fixed now/again. This is the issue with passing byte/unicode strings to cElementTree in Python 3. I had a check for Python versions 3.0.0 through 3.1.1, where we need to import ElementTree instead of cElementTree. Apparently Python 3.1.2 still has the bug. -Eric From bugzilla-daemon at portal.open-bio.org Fri Aug 13 05:12:25 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Aug 2010 05:12:25 -0400 Subject: [Biopython-dev] [Bug 2790] Genepop parser creates a full representation of the file on memory In-Reply-To: Message-ID: <201008130912.o7D9CPKN004338@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2790 tiagoantao at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from tiagoantao at gmail.com 2010-08-13 05:12 EST ------- The in-memory parser seems to be orders of magnitude faster than the non-memory one. Therefore it might make sense to maintain both.
Also for retro-compatibility. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Fri Aug 13 06:29:23 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 13 Aug 2010 11:29:23 +0100 Subject: [Biopython-dev] test_PhyloXML.py on Python 3 In-Reply-To: References: Message-ID: On Fri, Aug 13, 2010 at 2:24 AM, Eric Talevich wrote: > On Thu, Aug 12, 2010 at 12:37 PM, Peter wrote: > >> Hi Eric (et al), >> >> Is test_PhyloXML.py working for you under Python 3? >> >> I'm getting the following (both with and without the 2to3 --nofix=long >> option): >> >> $ python3 test_PhyloXML.py >> ... >> File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", >> line 298, in __init__ >> event, root = next(context) >> File "", line 59, in __iter__ >> TypeError: invalid event tuple >> >> ---------------------------------------------------------------------- >> Ran 47 tests in 0.015s >> >> All the sub-tests in test_PhyloXML.py are failing the same way. >> >> From memory this was working recently. >> >> > Yeah, it was... it's fixed now/again. > > This is the issue with passing byte/unicode strings to cElementTree in > Python 3. I had a check for Python versions 3.0.0 through 3.1.1, where we > need to import ElementTree instead of cElementTree. Apparently Python 3.1.2 > still has the bug. > > -Eric Yep - much better. However, I'm still seeing four failures with Python 3.1.2 which appear to be related to float/int/long conversion: ERROR: test_made (__main__.WriterTests) Round-trip parsing and serialization of made_up.xml.
---------------------------------------------------------------------- Traceback (most recent call last): File "test_PhyloXML.py", line 550, in test_made (TreeTests, ['test_Confidence', 'test_Polygon']), File "test_PhyloXML.py", line 512, in _rewrite_and_call getattr(inst, test)() File "test_PhyloXML.py", line 360, in test_Polygon tree = PhyloXMLIO.read(EX_MADE).phylogenies[1] File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 105, in read return Parser(file).read() File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 314, in read phylogeny = self._parse_phylogeny(elem) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 356, in _parse_phylogeny phylogeny.root = self._parse_clade(elem) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 402, in _parse_clade clade.clades.append(self._parse_clade(elem)) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 422, in _parse_clade getattr(self, tag)(elem)) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 553, in distribution polygons=_get_children_as(elem, 'polygon', self.polygon)) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 195, in _get_children_as parent.findall(_ns(tag))] File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 194, in return [construct(child) for child in File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 595, in polygon points=_get_children_as(elem, 'point', self.point)) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 195, in _get_children_as parent.findall(_ns(tag))] File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 194, in return [construct(child) for child in File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 589, in point _get_child_text(elem, 'long', float), File 
"/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 187, in _get_child_text return construct(child.text) ValueError: could not convert string to float: ====================================================================== ERROR: test_phylo (__main__.WriterTests) Round-trip parsing and serialization of phyloxml_examples.xml. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_PhyloXML.py", line 571, in test_phylo 'test_Taxonomy', 'test_Uri', File "test_PhyloXML.py", line 512, in _rewrite_and_call getattr(inst, test)() File "test_PhyloXML.py", line 176, in test_Phyloxml phx = PhyloXMLIO.read(EX_PHYLO) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 105, in read return Parser(file).read() File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 314, in read phylogeny = self._parse_phylogeny(elem) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 356, in _parse_phylogeny phylogeny.root = self._parse_clade(elem) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 402, in _parse_clade clade.clades.append(self._parse_clade(elem)) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 402, in _parse_clade clade.clades.append(self._parse_clade(elem)) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 422, in _parse_clade getattr(self, tag)(elem)) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 552, in distribution points=_get_children_as(elem, 'point', self.point), File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 195, in _get_children_as parent.findall(_ns(tag))] File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 194, in return [construct(child) for child in File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 589, in point _get_child_text(elem, 'long', 
float), File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 187, in _get_child_text return construct(child.text) ValueError: could not convert string to float: ====================================================================== FAIL: test_Distribution (__main__.TreeTests) Instantiation of Distribution objects. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_PhyloXML.py", line 322, in test_Distribution self.assertEqual(point.long, longi) AssertionError: != 8.769303 ====================================================================== FAIL: test_Polygon (__main__.TreeTests) Instantiation of Polygon objects. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_PhyloXML.py", line 378, in test_Polygon self.assertEqual(point.long, longi) AssertionError: != 8.769303 ---------------------------------------------------------------------- From bugzilla-daemon at portal.open-bio.org Fri Aug 13 12:14:08 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Aug 2010 12:14:08 -0400 Subject: [Biopython-dev] [Bug 3130] Broken links in Documentation to NCBI Blast In-Reply-To: Message-ID: <201008131614.o7DGE8we020360@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3130 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-13 12:14 EST ------- Updated, http://github.com/biopython/biopython/commit/9049dd7f424b27c3a386533bfdf8f0e423091e3b Thanks -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
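In the test_PhyloXML.py thread, Eric's workaround is to import ElementTree instead of cElementTree on the affected Python 3 releases, which amounts to a version-gated import. The following is a minimal sketch of that idea, not the code in Bio/Phylo/PhyloXMLIO.py. It makes two assumptions:

- The upper bound of 3.1.2 rests only on his report that 3.1.2 still has the bug.
- The ImportError fallback is needed because xml.etree.cElementTree was removed in Python 3.9.

```python
import sys

# Assumed range of affected releases: Eric reports 3.0.0-3.1.1, and 3.1.2 as well.
_CET_BUG_FIRST = (3, 0, 0)
_CET_BUG_LAST = (3, 1, 2)


def has_cet_bytes_bug(version_info=sys.version_info):
    """True if this Python falls in the (assumed) range where cElementTree misbehaves."""
    return _CET_BUG_FIRST <= tuple(version_info[:3]) <= _CET_BUG_LAST


if has_cet_bytes_bug():
    # Pure-Python ElementTree sidesteps the byte/unicode problem on these releases.
    from xml.etree import ElementTree
else:
    try:
        # C-accelerated module on older Pythons.
        from xml.etree import cElementTree as ElementTree
    except ImportError:
        # cElementTree is gone in 3.9+; plain ElementTree uses the C code itself.
        from xml.etree import ElementTree
```

Either branch binds the same name, so the parsing code below the import does not need to know which module it got.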
From bugzilla-daemon at portal.open-bio.org Fri Aug 13 12:15:13 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Aug 2010 12:15:13 -0400 Subject: [Biopython-dev] [Bug 3102] Error converting sff into fastq In-Reply-To: Message-ID: <201008131615.o7DGFDTf020499@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3102 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-13 12:15 EST ------- I'm closing this bug as INVALID on the assumption it was a corrupt SFF file. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Aug 13 12:50:06 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Aug 2010 12:50:06 -0400 Subject: [Biopython-dev] [Bug 3100] Bio.PDB.ResidueDepth distance calculation error In-Reply-To: Message-ID: <201008131650.o7DGo6F2021556@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3100 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-13 12:50 EST ------- Hi Andres, Installing MSMS was a pain, still not sure what the best way to deal with the atmtypenumbers file is for installation. And I found the nawk versus awk bug: http://mgldev.scripps.edu/pipermail/msms/2006q1/000006.html But yes, I can confirm the bug and the fix. 
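For context on bug 3100: the residue-depth code measures how far an atom lies from the molecular surface that MSMS computes. That distance is the smallest Euclidean distance from the atom's coordinate to any surface vertex. The following is a minimal pure-Python sketch of that quantity. surface_distance is a hypothetical name rather than Biopython's min_dist, and the sketch does not reproduce the specific bug or its fix.

```python
import math


def surface_distance(coord, surface):
    """Smallest Euclidean distance from coord (x, y, z) to any vertex in surface.

    The squared distance is computed separately for each vertex; only the
    minimum needs a square root, because sqrt is monotonic.
    """
    if not surface:
        raise ValueError("surface must contain at least one vertex")
    best = min(
        sum((a - b) ** 2 for a, b in zip(coord, vertex)) for vertex in surface
    )
    return math.sqrt(best)


vertices = [(3.0, 4.0, 0.0), (1.0, 2.0, 2.0), (10.0, 0.0, 0.0)]
print(surface_distance((0.0, 0.0, 0.0), vertices))  # 3.0
```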
Fix checked in: http://github.com/biopython/biopython/commit/aaac859df5e8d6a6b3a1304ad8b5c7c6163c4433 Thank you for your contribution, Peter P.S. Your example's import statements were incomplete, e.g.

    from Bio.PDB import PDBParser
    from Bio.PDB.ResidueDepth import get_surface, min_dist

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Aug 13 12:59:42 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Aug 2010 12:59:42 -0400 Subject: [Biopython-dev] [Bug 3127] Set SeqRecord description in SeqIO "tab" parser In-Reply-To: Message-ID: <201008131659.o7DGxgvO021809@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3127 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-13 12:59 EST ------- Fixed on trunk: http://github.com/biopython/biopython/commit/f83e80c82d804b50b290fd42aa4d4d2a3d664363 Thanks for the feedback, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Aug 13 13:02:01 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Aug 2010 13:02:01 -0400 Subject: [Biopython-dev] [Bug 3118] isinstance should use basestring for detecting string type In-Reply-To: Message-ID: <201008131702.o7DH21Zl021947@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3118 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-13 13:02 EST ------- I'm marking this as fixed, although some of the other cases may need to be looked at as part of the Python 3 work (where byte strings/unicode can be more of an issue). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Aug 13 13:52:49 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Aug 2010 13:52:49 -0400 Subject: [Biopython-dev] [Bug 3096] PPBuilder build_peptides bugs In-Reply-To: Message-ID: <201008131752.o7DHqnnq023520@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3096 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-13 13:52 EST ------- Hi Siong, I've been going over your example again (and adding some doctests to Bio/PDB/Polypeptide.py as well). It seems to me that in order to show this "bug" you have had to override the builder class' private _accept() method. If in doing so you break the default build_peptides() method, then you should probably also override that too. Can you show a problem without subclassing the builder object? 
There may be scope for enhancement, but you haven't convinced me there is a bug here. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Fri Aug 13 14:18:04 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 13 Aug 2010 19:18:04 +0100 Subject: [Biopython-dev] Release plans for Biopython 1.55 (beta)? Message-ID: Hi all, We've probably clocked up enough bug fixes and additions to justify a new release (even though there are still things waiting which look close to being ready, e.g. uniprot-xml and imgt parsing). Alternatively, should we delay while we work to get some more of the new stuff tested and merged? Regarding Python 2.7 support, it all looks fine. However, for the Windows installers we are waiting on an official NumPy installer for Python 2.7 (i.e. NumPy 1.5 which is due shortly I understand). I'm aware that in the course of the Python 3 work so far, we've touched quite a lot of the code - and some areas are still not fully covered by the unit tests. With that in mind, I think a beta release would be a prudent thing to do - primarily in the hope of end users spotting any issues which the unit tests have not revealed. Does doing a beta release some time next week sound like a good plan, with the official release say a week or two later? Are there any blocker issues we should be addressing first? e.g. test_NCBI_BLAST_tools.py fails with the latest BLAST+, we need to update the application wrappers as the NCBI have changed a few of the switches. 
Regards, Peter From bugzilla-daemon at portal.open-bio.org Fri Aug 13 18:23:24 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Aug 2010 18:23:24 -0400 Subject: [Biopython-dev] [Bug 3096] PPBuilder build_peptides bugs In-Reply-To: Message-ID: <201008132223.o7DMNOcJ018254@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3096 skong at zymeworks.com changed: What |Removed |Added ---------------------------------------------------------------------------- Version|Not Applicable |1.53 ------- Comment #3 from skong at zymeworks.com 2010-08-13 18:23 EST ------- Hi Peter, I managed to reproduce the problem without modifying _accept(). DIAGNOSTIC SCRIPT:

from Bio.PDB.PDBParser import PDBParser
from Bio.PDB.Polypeptide import PPBuilder, is_aa

def extract_peptides(model):
    """Extracts the peptides from a model.

    Returns a list of peptide sequences (as strings)."""
    output = []
    for peptide in PPBuilder().build_peptides(model):
        seq = str(peptide.get_sequence())
        output.append(seq)
    return output

if __name__ == '__main__':
    pdb = open('chopped_pdb1bfe_noca.ent')
    st = PDBParser().get_structure('', pdb)
    seqa = extract_peptides(st)
    print('no ca seq all')
    print(seqa)

PDB FILE: chopped_pdb1bfe_noca.ent ATOM 85 N ILE A 316 37.386 71.217 31.070 1.00 36.97 N ATOM 86 CA ILE A 316 38.311 71.290 29.949 1.00 33.71 C ATOM 87 C ILE A 316 37.634 72.103 28.862 1.00 33.93 C ATOM 88 O ILE A 316 36.415 72.216 28.839 1.00 36.46 O ATOM 89 CB ILE A 316 38.651 69.876 29.404 1.00 35.79 C ATOM 90 CG1 ILE A 316 39.331 69.049 30.501 1.00 36.78 C ATOM 91 CG2 ILE A 316 39.572 69.979 28.187 1.00 37.71 C ATOM 92 CD1 ILE A 316 39.881 67.724 30.023 1.00 39.20 C ATOM 93 N HIS A 317 38.425 72.679 27.969 1.00 35.61 N ATOM 94 CA HIS A 317 37.880 73.473 26.881 1.00 37.92 C ATOM 95 C HIS A 317 38.360 72.928 25.540 1.00 37.79 C ATOM 96 O HIS A 317 39.463 73.240 25.094 1.00 37.44 O ATOM 97 CB HIS A 317 38.303 74.930 27.052 1.00 35.19 C ATOM 98 CG HIS A 317 37.888 75.519
28.363 1.00 35.76 C ATOM 99 ND1 HIS A 317 36.611 75.981 28.602 1.00 37.74 N ATOM 100 CD2 HIS A 317 38.575 75.701 29.516 1.00 37.59 C ATOM 101 CE1 HIS A 317 36.529 76.420 29.844 1.00 38.74 C ATOM 102 NE2 HIS A 317 37.706 76.262 30.421 1.00 36.76 N ATOM 103 N ARG A 318 37.527 72.109 24.905 1.00 38.78 N ATOM 104 CA ARG A 318 37.884 71.512 23.627 1.00 42.04 C ATOM 105 C ARG A 318 38.469 72.559 22.699 1.00 45.14 C ATOM 106 O ARG A 318 39.592 72.425 22.205 1.00 42.05 O ATOM 107 CB ARG A 318 36.657 70.880 22.967 1.00 42.93 C ATOM 108 CG ARG A 318 36.934 70.321 21.576 1.00 38.60 C ATOM 109 CD ARG A 318 35.654 70.038 20.821 1.00 35.39 C ATOM 110 NE ARG A 318 34.624 69.538 21.724 1.00 34.96 N ATOM 111 CZ ARG A 318 34.539 68.278 22.141 1.00 31.51 C ATOM 112 NH1 ARG A 318 35.419 67.373 21.736 1.00 25.19 N ATOM 113 NH2 ARG A 318 33.579 67.929 22.983 1.00 29.10 N ATOM 114 N XLY A 319 37.690 73.604 22.461 1.00 49.96 N ATOM 115 CX XLY A 319 38.138 74.668 21.592 1.00 55.53 C ATOM 116 C XLY A 319 38.459 74.219 20.180 1.00 58.85 C ATOM 117 O XLY A 319 37.583 73.766 19.440 1.00 58.98 O ATOM 118 N SER A 320 39.734 74.334 19.823 1.00 61.64 N ATOM 119 CA SER A 320 40.219 73.992 18.493 1.00 63.16 C ATOM 120 C SER A 320 40.212 72.517 18.110 1.00 65.27 C ATOM 121 O SER A 320 39.558 72.127 17.145 1.00 65.12 O ATOM 122 CB SER A 320 41.634 74.542 18.316 1.00 65.36 C ATOM 123 OG SER A 320 42.124 74.255 17.019 1.00 72.05 O ATOM 124 N THR A 321 40.955 71.702 18.853 1.00 67.43 N ATOM 125 CA THR A 321 41.049 70.274 18.562 1.00 67.73 C ATOM 126 C THR A 321 40.220 69.430 19.529 1.00 66.41 C ATOM 127 O THR A 321 39.244 69.917 20.095 1.00 70.21 O ATOM 128 CB THR A 321 42.517 69.810 18.620 1.00 70.22 C ATOM 129 OG1 THR A 321 42.613 68.453 18.169 1.00 77.03 O ATOM 130 CG2 THR A 321 43.049 69.915 20.045 1.00 72.07 C ATOM 131 N GLY A 322 40.608 68.168 19.707 1.00 61.22 N ATOM 132 CA GLY A 322 39.892 67.286 20.614 1.00 53.23 C ATOM 133 C GLY A 322 40.037 67.705 22.065 1.00 48.00 C ATOM 134 O GLY A 322 
40.138 68.892 22.372 1.00 50.41 O ATOM 135 N LEU A 323 40.044 66.734 22.968 1.00 41.92 N ATOM 136 CA LEU A 323 40.190 67.033 24.385 1.00 35.58 C ATOM 137 C LEU A 323 41.613 66.738 24.874 1.00 31.41 C ATOM 138 O LEU A 323 41.932 66.921 26.046 1.00 30.47 O ATOM 139 CB LEU A 323 39.160 66.240 25.191 1.00 35.76 C ATOM 140 CG LEU A 323 37.716 66.576 24.802 1.00 39.50 C ATOM 141 CD1 LEU A 323 36.733 65.796 25.670 1.00 38.15 C ATOM 142 CD2 LEU A 323 37.493 68.074 24.955 1.00 38.58 C The output peptides should be ['IHR', 'STGL'], not ['IHRXTGL'] as in the current version. Residue XLY A 319 (the X in the fourth position) should not be included, since it doesn't have a CA atom. Instead, the current version includes it and drops the 'S' that follows it, due to the same bug. One can get the right result using the patch provided before. Whether or not _accept is modified, the bug remains. Also, the user should not be expected to modify the build_peptides() method whenever PPBuilder._accept is modified, since the accept variable in build_peptides isn't really a local (private) variable: in line 277 this variable accept is taken from self._accept of PPBuilder. http://www.biopython.org/DIST/docs/api/Bio.PDB.Polypeptide-pysrc.html 277 accept=self._accept On a side note, the "aa_only" optional input variable for build_peptides() and its comments are very misleading (@param aa_only: if 1, the residue needs to be a standard AA). "aa_only" is meant as a flag that tells peptide_builder to start filtering out residues that are not to be accepted, and it is turned on by default. Without modifying _accept of PeptideBuilder, only residues with a "CA" atom are accepted (lines 250-264), not standard amino acids as the comment states. In other words, without modifying _accept in PeptideBuilder, non-standard amino acids will still be accepted and included in the peptides built.
Only when overriding the _accept method of PeptideBuilder (as I did before) would build_peptides() not include non-standard amino acids. I suggest renaming "aa_only" to something more sensible like "filter_aa". http://www.biopython.org/DIST/docs/api/Bio.PDB.Polypeptide-pysrc.html 266 - def build_peptides(self, entity, aa_only=1): 273 @param aa_only: if 1, the residue needs to be a standard AA 274 @type aa_only: int -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From anaryin at gmail.com Fri Aug 13 19:44:40 2010 From: anaryin at gmail.com (João Rodrigues) Date: Sat, 14 Aug 2010 00:44:40 +0100 Subject: [Biopython-dev] GSOC Bio.PDB Project - Final Summary Message-ID: Dear all, The results of the GSOC 2010 project are in the wiki page: http://biopython.org/wiki/GSOC2010_Joao I also started writing a Struct page regarding that same module: http://biopython.org/wiki/Struct Comments appreciated :) I will be maintaining these features in the future and adding some others as well. Best regards to all! João [...] Rodrigues @ http://doeidoei.wordpress.org From mjldehoon at yahoo.com Fri Aug 13 22:23:29 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 13 Aug 2010 19:23:29 -0700 (PDT) Subject: [Biopython-dev] Release plans for Biopython 1.55 (beta)? In-Reply-To: Message-ID: <816847.43152.qm@web62406.mail.re1.yahoo.com> I'm OK with a new release, provided we can fix the test errors. I have been looking at the Blast parsers (as discussed previously) but this turned out to be more difficult than expected; a new release should not wait for it. --Michiel. --- On Fri, 8/13/10, Peter wrote: > From: Peter > Subject: [Biopython-dev] Release plans for Biopython 1.55 (beta)?
> To: "Biopython-Dev Mailing List" > Date: Friday, August 13, 2010, 2:18 PM > Hi all, > > We've probably clocked up enough bug fixes and additions to > justify a > new release (even though there are still things waiting > which look close > to being ready, e.g. uniprot-xml and imgt parsing). > Alternatively, should > we delay while we work to get some more of the new stuff > tested and > merged? > > Regarding Python 2.7 support, it all looks fine. However, > for the Windows > installers we are waiting on an official NumPy installer > for Python 2.7 > (i.e. NumPy 1.5 which is due shortly I understand). > > I'm aware that in the course of the Python 3 work so far, > we've touched > quite a lot of the code - and some areas are still not > fully covered by the > unit tests. With that in mind, I think a beta release would > be a prudent > thing to do - primarily in the hope of end users spotting > any issues which > the unit tests have not revealed. > > Does doing a beta release some time next week sound like a > good plan, > with the official release say a week or two later? Are > there any blocker > issues we should be addressing first? > > e.g. test_NCBI_BLAST_tools.py fails with the latest BLAST+, > we > need to update the application wrappers as the NCBI have > changed > a few of the switches. > > Regards, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From biopython at maubp.freeserve.co.uk Mon Aug 16 09:10:30 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Aug 2010 14:10:30 +0100 Subject: [Biopython-dev] Release plans for Biopython 1.55 (beta)? In-Reply-To: <816847.43152.qm@web62406.mail.re1.yahoo.com> References: <816847.43152.qm@web62406.mail.re1.yahoo.com> Message-ID: On Sat, Aug 14, 2010 at 3:23 AM, Michiel de Hoon wrote: > I'm OK with a new release, provided we can fix the test errors. 
I've sorted out test_NCBI_BLAST_tools.py. BLAST 2.2.23+ added -off_diagonal_range to blastn, but more puzzlingly appears to have removed -gapextend, -gapopen, -xdrop_gap, and -xdrop_gap_final from tblastx. This might be something to double check with the NCBI. Peter From chapmanb at 50mail.com Mon Aug 16 09:23:41 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 16 Aug 2010 09:23:41 -0400 Subject: [Biopython-dev] Release plans for Biopython 1.55 (beta)? In-Reply-To: References: Message-ID: <20100816132341.GF23299@sobchak.mgh.harvard.edu> Peter; > I'm aware that in the course of the Python 3 work so far, we've touched > quite a lot of the code - and some areas are still not fully covered by the > unit tests. With that in mind, I think a beta release would be a prudent > thing to do - primarily in the hope of end users spotting any issues which > the unit tests have not revealed. How is the Python 3 stuff looking? Perception wise, it would be nice to be able to make a release with a statement like: All of the non-C extension code works on Python 3 using 2to3.
Are we at all close to something like that? Otherwise, your other plans all sound good. Brad From chapmanb at 50mail.com Mon Aug 16 09:27:16 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 16 Aug 2010 09:27:16 -0400 Subject: [Biopython-dev] GSOC Bio.PDB Project - Final Summary In-Reply-To: References: Message-ID: <20100816132716.GG23299@sobchak.mgh.harvard.edu> João; > The results of the GSOC 2010 project are in the wiki page: > http://biopython.org/wiki/GSOC2010_Joao > > I also started writing a Struct page regarding that same module: > http://biopython.org/wiki/Struct > > Comments appreciated :) I will be maintaining these features in the future > and adding some others as well. This looks great; tons of really useful additions. What are your thoughts on getting this integrated into the main trunk? How disruptive is it to existing PDB code? Are there any back-compatibility issues? Thanks for all the hard work this summer and looking forward to seeing it get included.
Brad From biopython at maubp.freeserve.co.uk Mon Aug 16 09:47:30 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Aug 2010 14:47:30 +0100 Subject: [Biopython-dev] Bio.PDB on Python 3 Message-ID: Hi all, A while back I installed NumPy from their svn under Python 3, so that I could test more of Biopython. I hadn't really looked at Bio.PDB until recently because test_PDB.py depended on Bio.KDTree which needs some C code to be compiled (which we haven't tried yet). I recently added a few doctests to Bio/PDB/Polypeptide.py which showed a problem with the code using "next" as a variable name. This is a built-in function on Python 3, taking the place of the next method on iterator objects. That's fixed now: http://github.com/biopython/biopython/commit/1eb48feb5520094bf7f0177be804a953024e6938 In order to test more of Bio.PDB under Python 3, I have just split test_PDB.py into two, creating a small test_PDB_KDtree.py file for the neighbour search functionality which requires the C code. This has revealed there are at least two issues with Bio.PDB to be addressed (see below). Peter ====================================================================== ERROR: test_1_warnings (__main__.A_ExceptionTest) Check warnings: Parse a flawed PDB file in permissive mode.
---------------------------------------------------------------------- Traceback (most recent call last): File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 200, in _parse_coordinates fullname, serial_number, element) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", line 232, in init_atom residue.add(atom) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/Residue.py", line 82, in add "Atom %s defined twice in residue %s" % (atom_id, self)) Bio.PDB.PDBExceptions.PDBConstructionException: Atom N defined twice in residue During handling of the above exception, another exception occurred: Traceback (most recent call last): File "test_PDB.py", line 57, in test_1_warnings p.get_structure("example", "PDB/a_structure.pdb") File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 64, in get_structure self._parse(file.readlines()) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 84, in _parse self.trailer=self._parse_coordinates(coords_trailer) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 202, in _parse_coordinates self._handle_PDB_exception(message, global_line_counter) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 256, in _handle_PDB_exception % message, PDBConstructionWarning) File "test_PDB.py", line 53, in showwarning all_warns.append(*args[0]) TypeError: append() argument after * must be a sequence, not PDBConstructionWarning ====================================================================== ERROR: test_ExposureCN (__main__.Exposure) HSExposureCN. 
---------------------------------------------------------------------- Traceback (most recent call last): File "test_PDB.py", line 612, in setUp structure=PDBParser(PERMISSIVE=True).get_structure('X', pdb_filename) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 64, in get_structure self._parse(file.readlines()) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 84, in _parse self.trailer=self._parse_coordinates(coords_trailer) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 200, in _parse_coordinates fullname, serial_number, element) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", line 185, in init_atom duplicate_atom=residue[name] TypeError: 'DisorderedResidue' object is not subscriptable ====================================================================== ERROR: test_HSExposureCA (__main__.Exposure) HSExposureCA. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_PDB.py", line 612, in setUp structure=PDBParser(PERMISSIVE=True).get_structure('X', pdb_filename) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 64, in get_structure self._parse(file.readlines()) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 84, in _parse self.trailer=self._parse_coordinates(coords_trailer) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 200, in _parse_coordinates fullname, serial_number, element) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", line 185, in init_atom duplicate_atom=residue[name] TypeError: 'DisorderedResidue' object is not subscriptable ====================================================================== ERROR: test_HSExposureCB (__main__.Exposure) HSExposureCB. 
---------------------------------------------------------------------- Traceback (most recent call last): File "test_PDB.py", line 612, in setUp structure=PDBParser(PERMISSIVE=True).get_structure('X', pdb_filename) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 64, in get_structure self._parse(file.readlines()) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 84, in _parse self.trailer=self._parse_coordinates(coords_trailer) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 200, in _parse_coordinates fullname, serial_number, element) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", line 185, in init_atom duplicate_atom=residue[name] TypeError: 'DisorderedResidue' object is not subscriptable ====================================================================== ERROR: test_c_n (__main__.ParseTest) Extract polypeptides using C-N. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_PDB.py", line 138, in setUp self.structure = p.get_structure("example", "PDB/a_structure.pdb") File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 64, in get_structure self._parse(file.readlines()) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 84, in _parse self.trailer=self._parse_coordinates(coords_trailer) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 200, in _parse_coordinates fullname, serial_number, element) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", line 185, in init_atom duplicate_atom=residue[name] TypeError: 'DisorderedResidue' object is not subscriptable ====================================================================== ERROR: test_ca_ca (__main__.ParseTest) Extract polypeptides using CA-CA. 
---------------------------------------------------------------------- Traceback (most recent call last): File "test_PDB.py", line 138, in setUp self.structure = p.get_structure("example", "PDB/a_structure.pdb") File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 64, in get_structure self._parse(file.readlines()) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 84, in _parse self.trailer=self._parse_coordinates(coords_trailer) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 200, in _parse_coordinates fullname, serial_number, element) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", line 185, in init_atom duplicate_atom=residue[name] TypeError: 'DisorderedResidue' object is not subscriptable ====================================================================== ERROR: test_details (__main__.ParseTest) Verify details of the parsed example PDB file. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_PDB.py", line 138, in setUp self.structure = p.get_structure("example", "PDB/a_structure.pdb") File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 64, in get_structure self._parse(file.readlines()) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 84, in _parse self.trailer=self._parse_coordinates(coords_trailer) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 200, in _parse_coordinates fullname, serial_number, element) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", line 185, in init_atom duplicate_atom=residue[name] TypeError: 'DisorderedResidue' object is not subscriptable ====================================================================== ERROR: test_structure (__main__.ParseTest) Verify the structure of the parsed example PDB file. 
---------------------------------------------------------------------- Traceback (most recent call last): File "test_PDB.py", line 138, in setUp self.structure = p.get_structure("example", "PDB/a_structure.pdb") File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 64, in get_structure self._parse(file.readlines()) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 84, in _parse self.trailer=self._parse_coordinates(coords_trailer) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 200, in _parse_coordinates fullname, serial_number, element) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", line 185, in init_atom duplicate_atom=residue[name] TypeError: 'DisorderedResidue' object is not subscriptable ---------------------------------------------------------------------- Ran 14 tests in 1.205s FAILED (errors=8) From biopython at maubp.freeserve.co.uk Mon Aug 16 09:48:25 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Aug 2010 14:48:25 +0100 Subject: [Biopython-dev] Release plans for Biopython 1.55 (beta)? In-Reply-To: <20100816132341.GF23299@sobchak.mgh.harvard.edu> References: <20100816132341.GF23299@sobchak.mgh.harvard.edu> Message-ID: On Mon, Aug 16, 2010 at 2:23 PM, Brad Chapman wrote: > Peter; > >> I'm aware that in the course of the Python 3 work so far, we've touched >> quite a lot of the code - and some areas are still not fully covered by the >> unit tests. With that in mind, I think a beta release would be a prudent >> thing to do - primarily in the hope of end users spotting any issues which >> the unit tests have not revealed. > > How is the Python 3 stuff looking? Perception wise, it would be nice to be > able to make a release with a statement like: All of the non-C > extension code works on Python 3 using 2to3. Are we at all close to > something like that? > > Otherwise, your other plans all sound good. 
> Brad Hi Brad, I think it is still premature to make any claims about Python 3 support (even though ignoring the C and NumPy code most stuff works). Issues like binary versus text mode for handles (bytes vs unicode) and the associated speed issues are something in particular which will need some thought (and benchmarks to guide us).
See also: http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008143.html http://lists.open-bio.org/pipermail/biopython-dev/2010-July/008011.html http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008159.html Peter From eric.talevich at gmail.com Mon Aug 16 12:22:53 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 16 Aug 2010 12:22:53 -0400 Subject: [Biopython-dev] Release plans for Biopython 1.55 (beta)? In-Reply-To: References: Message-ID: On Fri, Aug 13, 2010 at 2:18 PM, Peter wrote: > Hi all, > > We've probably clocked up enough bug fixes and additions to justify a > new release (even though there are still things waiting which look close > to being ready, e.g. uniprot-xml and imgt parsing). Alternatively, should > we delay while we work to get some more of the new stuff tested and > merged? > I'd be happy with another release within the next couple weeks, provided I can fix the Py3 bugs you've turned up in Bio.Phylo. There are some recent fixes and improvements to Bio.Phylo, e.g. root_with_outgroup, that I think make the module more useful. I'd like to take another crack at documentation before the release, though -- at least put the example from my BOSC talk into the tutorial. > Does doing a beta release some time next week sound like a good plan, > with the official release say a week or two later? Are there any blocker > issues we should be addressing first? > Just the Bio.Phylo bugs and documentation, as usual. -Eric From biopython at maubp.freeserve.co.uk Mon Aug 16 12:32:26 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Aug 2010 17:32:26 +0100 Subject: [Biopython-dev] Release plans for Biopython 1.55 (beta)?
In-Reply-To: References: Message-ID: On Mon, Aug 16, 2010 at 5:22 PM, Eric Talevich wrote: > On Fri, Aug 13, 2010 at 2:18 PM, Peter wrote: > >> Hi all, >> >> We've probably clocked up enough bug fixes and additions to justify a >> new release (even though there are still things waiting which look close >> to being ready, e.g. uniprot-xml and imgt parsing). Alternatively, should >> we delay while we work to get some more of the new stuff tested and >> merged? >> > > I'd be happy with another release within the next couple weeks, provided I > can fix the Py3 bugs you've turned up in Bio.Phylo. i.e. http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008143.html If you can look at this Py3 issue early this week I'll wait before doing the beta. The real point of the beta is to see if we broke anything on Python 2 without realising it ;) > There are some recent fixes and improvements Bio.Phylo, e.g. > root_with_outgroup, that I think make the module more useful. > > I'd like to take another crack at documentation before the release, though > -- at least put the example from my BOSC talk into the tutorial. That would be great. >> Does doing a beta release some time next week sound like a good plan, >> with the official release say a week or two later? Are there any blocker >> issues we should be addressing first? > > Just the Bio.Phylo bugs and documentation, as usual. Yeah, releases are a good trigger for people updating documentation ;) The docs /could/ be done after the beta is released... depends on your schedule really. Peter From eric.talevich at gmail.com Mon Aug 16 21:59:27 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 16 Aug 2010 21:59:27 -0400 Subject: [Biopython-dev] test_PhyloXML.py on Python 3 In-Reply-To: References: Message-ID: On Fri, Aug 13, 2010 at 6:29 AM, Peter wrote: > > Yep - much better. 
However, I'm still seeing four failures with Python > 3.1.2 > which appear to be related to float/int/long conversion: > > > ERROR: test_made (__main__.WriterTests) > Round-trip parsing and serialization of made_up.xml. > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "test_PhyloXML.py", line 550, in test_made > (TreeTests, ['test_Confidence', 'test_Polygon']), [...] > File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", > line 589, in point > _get_child_text(elem, 'long', float), > File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", > line 187, in _get_child_text > return construct(child.text) > ValueError: could not convert string to float: > > ====================================================================== > ERROR: test_phylo (__main__.WriterTests) > Round-trip parsing and serialization of phyloxml_examples.xml. > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "test_PhyloXML.py", line 571, in test_phylo > 'test_Taxonomy', 'test_Uri', > [...] > ValueError: could not convert string to float: > > ====================================================================== > FAIL: test_Distribution (__main__.TreeTests) > Instantiation of Distribution objects. > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "test_PhyloXML.py", line 322, in test_Distribution > self.assertEqual(point.long, longi) > AssertionError: != 8.769303 > > ====================================================================== > FAIL: test_Polygon (__main__.TreeTests) > Instantiation of Polygon objects. 
> ---------------------------------------------------------------------- > Traceback (most recent call last): > File "test_PhyloXML.py", line 378, in test_Polygon > self.assertEqual(point.long, longi) > AssertionError: != 8.769303 > > ---------------------------------------------------------------------- > I can't seem to replicate these errors. Are they still occurring on your auto2to3 branch? From a clean master branch, I did:

git checkout -b three
2to3 -w -x long Bio/
2to3 -w BioSQL/ Tests/ setup.py
python3 setup.py build
sudo python3 setup.py install
cd Tests/
python3 test_Phylo.py
python3 test_PhyloXML.py

I'm using the 2to3 packaged with Python 2.7 from python.org, testing with the Python 3.1.2 packaged for Ubuntu 10.04. Any ideas? Thanks, Eric From biopython at maubp.freeserve.co.uk Tue Aug 17 07:25:40 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Aug 2010 12:25:40 +0100 Subject: [Biopython-dev] test_PhyloXML.py on Python 3 In-Reply-To: References: Message-ID: On Tue, Aug 17, 2010 at 2:59 AM, Eric Talevich wrote: > > I can't seem to replicate these errors. Are they still occurring on your > auto2to3 branch? > Yes - but I think this is down to it not being a clean branch - see below. > From a clean master branch, I did: > > git checkout -b three > 2to3 -w -x long Bio/ > 2to3 -w BioSQL/ Tests/ setup.py > python3 setup.py build > sudo python3 setup.py install > cd Tests/ > python3 test_Phylo.py > python3 test_PhyloXML.py It doesn't matter for testing Bio.Phylo, but you shouldn't need to convert setup.py, and you haven't converted the doctests. This is what we have in the README file:

$ 2to3 --nofix=long --no-diffs -n -w Bio BioSQL Tests Scripts Doc/examples
$ 2to3 --nofix=long --no-diffs -n -w -d Bio BioSQL Tests Scripts Doc/examples

I've been running that with the addition of -j 7 to speed it up. The good news is that using the above 2to3 run on a clean branch fixes test_PhyloXML.py, e.g.
git reset --hard
git checkout master
git checkout -b three
2to3 -j 7 --nofix=long --no-diffs -n -w Bio BioSQL Tests Scripts Doc/examples
2to3 -j 7 --nofix=long --no-diffs -n -w -d Bio BioSQL Tests Scripts Doc/examples
python3 setup.py install --prefix=$HOME
cd Tests
python3 test_Phylo.py
python3 test_PhyloXML.py

The resulting code is a little different from my auto2to3 branch - all to do with long/int changes. I think my script was keeping the unwanted fixes to Bio/Phylo/PhyloXML.py as well as the useful fixes made to these files:

* Bio/SeqIO/InsdcIO.py - testing for isinstance of int or long
* BioSQL/Loader.py - testing for isinstance of int or long
* Bio/Prosite/Prodoc.py - using long(handle.tell())
* Bio/Prosite/__init__.py - using long(handle.tell())

The bad news is that running 2to3 on a clean branch with the long fixes disabled breaks a few things where we need int/long fixes, e.g. test_BioSQL.py via Bio/SeqIO/InsdcIO.py. I think we have three options:

(1) Manually fix the int/long issues in the four files listed, and continue to use 2to3 with the long fixer disabled. The long(handle.tell()) call can just be handle.tell() as far as I can see (at least on Python 3), and Bio.Prosite is deprecated anyway. For the other issue we can add an is_int_or_long function in our Python 3 compatibility library, which I think I can do without trouble.

(2) Manually fix PhyloXML to avoid long for longitude (with a deprecation period etc), and go back to using 2to3 with default settings. A bit of a pain.

(3) Don't change the code, but run 2to3 in default mode for most code, while disabling the long fixer for Bio/Phylo - this will require scripting.

I think option (1) makes most sense. Peter P.S. I also need to look at my auto2to3 script again to prevent auto merges. The simple answer is to create a clean branch each time (perhaps deleting or replacing the old conversions)... the auto2to3 branch was for testing purposes anyway so I don't mind deleting it.
From n.j.loman at bham.ac.uk Tue Aug 17 09:59:36 2010 From: n.j.loman at bham.ac.uk (Nick Loman) Date: Tue, 17 Aug 2010 14:59:36 +0100 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> Message-ID: <4C6A95C8.1040902@bham.ac.uk> Kevin Jacobs wrote: > Never mind-- I didn't realized the consensus sequence was gapped, so > it is then trivial to recover the original pairwise alignments. I'll > have a version of my Newbler2SAM module that can process ACE files > shortly. Hi Kevin I was wondering how you got on with the ACE to SAM converter? I now realise that the 454PairAlign.txt produced by Newbler when run in de novo assembly mode is not much use to me, as the alignments reported in this file are strictly pairwise between reads and don't relate back to the assembled contigs. So an ACE file parser would be extremely helpful at this point. Cheers Nick. From biopython at maubp.freeserve.co.uk Tue Aug 17 11:43:17 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Aug 2010 16:43:17 +0100 Subject: [Biopython-dev] test_PhyloXML.py on Python 3 In-Reply-To: References: Message-ID: On Tue, Aug 17, 2010 at 12:25 PM, Peter wrote: > > P.S. > I also need to look at my auto2to3 script again to prevent auto merges. > The simple answer is to create a clean branch each time (perhaps > deleting or replacing the old conversions)... the auto2to3 branch was > for testing purposes anyway so I don't mind deleting it. > Yet again, I am wishing git still supported "theirs" as a merge strategy, which would I think be exactly what I want to use in this situation. 
I think I've fixed my script now; notice that the auto2to3 branch now clearly has unconverted instances of long present: http://github.com/peterjc/biopython/commit/4773377dcda10ee3511fea7fabc196fc8d6251ed This branch is (according to a git diff) identical to the one I created from a clean branch as described earlier. Peter From biopython at maubp.freeserve.co.uk Tue Aug 17 11:47:39 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Aug 2010 16:47:39 +0100 Subject: [Biopython-dev] test_PhyloXML.py on Python 3 In-Reply-To: References: Message-ID: On Tue, Aug 17, 2010 at 12:25 PM, Peter wrote: > > The resulting code is a little different from my auto2to3 branch - all to do > with long/int changes. I think my script was keeping the unwanted fixes to > Bio/Phylo/PhyloXML.py as well as the useful fixes made to these files: > > * Bio/SeqIO/InsdcIO.py - testing for isinstance of int or long > * BioSQL/Loader.py - testing for isinstance of int or long > * Bio/Prosite/Prodoc.py - using long(handle.tell()) > * Bio/Prosite/__init__.py - using long(handle.tell()) And also: * Bio/SwissProt/SProt.py - using long(handle.tell()) * Bio/Blast/NCBIStandalone.py - using long(float(...)) I still think all these can be fixed to work without needing the 2to3 long fixer. Peter From bioinformed at gmail.com Tue Aug 17 12:27:55 2010 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Tue, 17 Aug 2010 12:27:55 -0400 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: <4C6A95C8.1040902@bham.ac.uk> References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> <4C6A95C8.1040902@bham.ac.uk> Message-ID: On Tue, Aug 17, 2010 at 9:59 AM, Nick Loman wrote: > Kevin Jacobs wrote: > >> Never mind-- I didn't realized the consensus sequence was gapped, so it is >> then trivial to recover the original pairwise alignments. I'll have a >> version of my Newbler2SAM module that can process ACE files shortly.
>> > Hi Kevin > > I was wondering how you got on with the ACE to SAM converter? > > I now realise that the 454PairAlign.txt produced by Newbler when run in de > novo assembly mode is not much use to me, as the alignments reported in this > file are strictly pairwise between reads and don't relate back to the > assembled contigs. So an ACE file parser would be extremely helpful at this > point. > > Hi Nick, I'm stuck with the ACE conversion for exactly the same reason. The consensus and reads are gapped for multiple alignments so that there are no mismatches at all. I will have to recompute the Smith-Waterman alignments of each read against the ungapped consensus in order to produce SAM/BAM output. I'm surprised that the pairwise alignments for the de novo assembly are so problematic. My understanding was that they were pairwise against the consensus contigs and would be exactly what you'd want for SAM/BAM. Unfortunately, I'm mainly dealing with only human data and don't have any direct examples to know for sure. I can re-process some of our EBV data with the de novo aligner and see what can be done. -Kevin From biopython at maubp.freeserve.co.uk Tue Aug 17 12:37:52 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Aug 2010 17:37:52 +0100 Subject: [Biopython-dev] test_PhyloXML.py on Python 3 In-Reply-To: References: Message-ID: On Tue, Aug 17, 2010 at 4:47 PM, Peter wrote: > On Tue, Aug 17, 2010 at 12:25 PM, Peter wrote: >> >> The resulting code is a little different from my auto2to3 branch - all to do >> with long/int changes.
I think my script was keeping the unwanted fixes to >> Bio/Phylo/PhyloXML.py as well as the useful fixes made to these files: >> >> * Bio/SeqIO/InsdcIO.py - testing for isinstance of int or long >> * BioSQL/Loader.py - testing for isinstance of int or long >> * Bio/Prosite/Prodoc.py - using long(handle.tell()) >> * Bio/Prosite/__init__.py - using long(handle.tell()) > > And also: > * Bio/SwissProt/SProt.py - using long(handle.tell()) > * Bio/Blast/NCBIStandalone.py - using long(float(...)) > > I still think all these can be fixed to work without needing the 2to3 long > fixer. Handling testing for int/long is done, http://github.com/biopython/biopython/commit/ddaf587afd02aa7214e53647c48e4089555e7efb And I replaced the use of long in Bio/Blast/NCBIStandalone.py with int(), http://github.com/biopython/biopython/commit/e095370184fc2ab50b37bbd86667f762ca825107 The other three uses of long I identified can probably be solved neatly like this:

try:
    end = long(handle.tell())
except NameError:
    #Python 3 where 2to3 long fixer was disabled
    end = handle.tell()

Peter From n.j.loman at bham.ac.uk Tue Aug 17 12:35:45 2010 From: n.j.loman at bham.ac.uk (Nick Loman) Date: Tue, 17 Aug 2010 17:35:45 +0100 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> <4C6A95C8.1040902@bham.ac.uk> Message-ID: <4C6ABA61.1000307@bham.ac.uk> Kevin Jacobs wrote: > I'm stuck with the ACE conversion for exactly the same reason. The > consensus and reads are gapped for multiple alignments so that there > are no mismatches at all. I will have to recompute the Smith-Waterman > alignments of each read against the ungapped consensus in order to > produce SAM/BAM output. I'm surprised that the > pairwise alignments for the de novo assembly are so problematic. My > understanding was they they were pairwise against the consensus > contigs and would be exactly what you'd want for SAM/BAM.
> Unfortunately, I'm mainly dealing with only human data and don't have > any direct examples to know for sure. I can re-process some of our > EBV data with the de novo aligner and see what can be done. Hi Kevin I was expecting it to be similar to the gsMapper output but it isn't. When you supply -pt to gsAssembler (to specify 454PairAlign.txt should be output) then each pair in the file relates to reads from the original SFF files, not the contigs. I guess this makes sense as it probably represents a stage of the de novo assembly process (an all against all pairwise comparison on the reads). I guess I can get around this by running gsMapper against the assembly using the SFF files as a second stage, and then using Newbler2SAM on this instead, but I was kind of hoping to avoid this (as I would expect it to give slightly different results). Another possible workaround is using GAP5 from the Staden package - I understand it can read ACE and output SAM. Cheers, Nick From biopython at maubp.freeserve.co.uk Tue Aug 17 15:41:57 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Aug 2010 20:41:57 +0100 Subject: [Biopython-dev] test_PhyloXML.py on Python 3 In-Reply-To: References: Message-ID: On Tue, Aug 17, 2010 at 5:37 PM, Peter wrote: > > The other three uses of long I identified can probably be solved > neatly like this: >
> try:
>     end = long(handle.tell())
> except NameError:
>     #Python 3 where 2to3 long fixer was disabled
>     end = handle.tell()
>
On closer inspection, probably we can just remove the long(): http://github.com/biopython/biopython/commit/e11eb52413e5fe78619c4cf5511a4db1319931fa Michiel - is this OK?
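The argument for dropping long() (which Michiel confirms in his reply) is that handle.tell() already returns an integer type, and on Python 3 that is a plain int. Below is a minimal sketch of tell()-based offset indexing, assuming flat-file records that end with a "//" line, as in SwissProt and Prosite files. The record_offsets function is hypothetical and is not the Bio.Prosite or Bio.SwissProt indexing code:

```python
import io


def record_offsets(handle):
    """Yield (start, end) byte offsets for records ending in a '//' line.

    The offsets come straight from handle.tell(); on Python 3 these are
    already ordinary int objects, so no long() conversion is needed.
    """
    start = handle.tell()
    while True:
        line = handle.readline()
        if not line:
            break  # end of file; any trailing partial record is ignored
        if line.startswith(b"//"):
            end = handle.tell()
            yield start, end
            start = end


data = b"ID   A\n//\nID   B\n//\n"
offsets = list(record_offsets(io.BytesIO(data)))
print(offsets)  # [(0, 10), (10, 20)]
```

A stored (start, end) pair can later be used with handle.seek(start) followed by handle.read(end - start) to fetch a single record without re-reading the whole file.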
This was indexing code you wrote to replace the Mindy stuff as I recall: http://github.com/biopython/biopython/commit/f17fb613cccd15e28f9f98709742f48d87ae27d4 Files: * Bio/Prosite/__init__.py * Bio/Prosite/Prodoc.py * Bio/SwissProt/SProt.py Peter From mjldehoon at yahoo.com Wed Aug 18 08:55:25 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 18 Aug 2010 05:55:25 -0700 (PDT) Subject: [Biopython-dev] test_PhyloXML.py on Python 3 In-Reply-To: Message-ID: <119273.97006.qm@web62406.mail.re1.yahoo.com> Since handle.tell() returns a long integer, I agree that we can remove the long(). --Michiel. --- On Tue, 8/17/10, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] test_PhyloXML.py on Python 3 > To: "Eric Talevich" , "Michiel de Hoon" > Cc: "Biopython-Dev Mailing List" > Date: Tuesday, August 17, 2010, 3:41 PM > On Tue, Aug 17, 2010 at 5:37 PM, > Peter wrote: > > > > The other three uses of long I identified can probably > be solved > > neatly like this: > > > > try: > > end = long(handle.tell()) > > except NameError: > > #Python 3 where 2to3 long fixer was disabled > > end = handle.tell() > > > > On closer inspection, probably we can just remove the > long(): > > http://github.com/biopython/biopython/commit/e11eb52413e5fe78619c4cf5511a4db1319931fa > > Michiel - is this OK?
This was indexing code you wrote to > replace the Mindy stuff as I recall: > > http://github.com/biopython/biopython/commit/f17fb613cccd15e28f9f98709742f48d87ae27d4 > > Files: > > * Bio/Prosite/__init__.py > * Bio/Prosite/Prodoc.py > * Bio/SwissProt/SProt.py > > Peter > From biopython at maubp.freeserve.co.uk Wed Aug 18 09:03:45 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Aug 2010 14:03:45 +0100 Subject: [Biopython-dev] test_PhyloXML.py on Python 3 In-Reply-To: <119273.97006.qm@web62406.mail.re1.yahoo.com> References: <119273.97006.qm@web62406.mail.re1.yahoo.com> Message-ID: On Wed, Aug 18, 2010 at 1:55 PM, Michiel de Hoon wrote: > > Since handle.tell() returns a long integer, I agree that we can remove the long(). > Great. That should mean the only long issue remaining is in phyloXML code, which just requires that the 2to3 script be run without the long fixer (since we are using long as shorthand for longitude, not the variable type). Peter From biopython at maubp.freeserve.co.uk Wed Aug 18 09:34:36 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Aug 2010 14:34:36 +0100 Subject: [Biopython-dev] Release plans for Biopython 1.55 (beta)? In-Reply-To: References: Message-ID: On Mon, Aug 16, 2010 at 5:32 PM, Peter wrote: > On Mon, Aug 16, 2010 at 5:22 PM, Eric Talevich wrote: >> >> I'd be happy with another release within the next couple weeks, provided I >> can fix the Py3 bugs you've turned up in Bio.Phylo. >> > > i.e. > http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008143.html > > If you can look at this Py3 issue early this week I'll wait before doing the > beta. The real point of the beta is to see if we broke anything on Python 2 > without realising it ;) With the long issues apparently sorted, I'm going to try and do the beta this afternoon (i.e. in the next few hours).
Currently I'm running the unit tests on Windows with Python 2.4 to 2.7; so far it all looks fine. You can expect a "trunk freeze" email shortly for the actual release process. Peter From biopython at maubp.freeserve.co.uk Wed Aug 18 10:41:11 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Aug 2010 15:41:11 +0100 Subject: [Biopython-dev] Trunk freeze for Biopython 1.55 (beta) Message-ID: Hi all, Please don't commit anything to the master branch until further notice. I have started doing the Biopython 1.55 (beta) release as we discussed, http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008150.html Thanks, Peter From biopython at maubp.freeserve.co.uk Wed Aug 18 11:40:40 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Aug 2010 16:40:40 +0100 Subject: [Biopython-dev] Trunk freeze for Biopython 1.55 (beta) In-Reply-To: References: Message-ID: On Wed, Aug 18, 2010 at 3:41 PM, Peter wrote: > Hi all, > > Please don't commit anything to the master branch until further notice. > I have started doing the Biopython 1.55 (beta) release as we discussed, > http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008150.html > > Thanks, > > Peter OK, the source code bundles and Windows installers are up. If anyone on the dev list has the chance to download and test these now, that would be great. Note that I have not included an installer for Python 2.7 (yet). I'm waiting for official Windows installers for NumPy on Python 2.7; this will be NumPy 1.5, which should be out soon, as they already have a beta. I'm hoping that will be ready by the time we want to formally release Biopython 1.55.
Still to do: * Update API docs with epydoc * News page entry & email announcement Peter From biopython at maubp.freeserve.co.uk Wed Aug 18 12:16:43 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Aug 2010 17:16:43 +0100 Subject: [Biopython-dev] Trunk freeze for Biopython 1.55 (beta) In-Reply-To: References: Message-ID: On Wed, Aug 18, 2010 at 4:40 PM, Peter wrote: > Still to do: > * Update API docs with epydoc Done: http://biopython.org/DIST/docs/api/ This turned up two trivial epytext formatting issues, which I have fixed: http://github.com/biopython/biopython/commit/bfa3754b469c740f27d291698e59d1379beaf14b http://github.com/biopython/biopython/commit/2d0d54999ab881343ce47733e388e5c84c5125bf This does mean the API docs are two commits ahead of the tag and the code in the downloads ;) Peter From biopython at maubp.freeserve.co.uk Wed Aug 18 17:03:28 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Aug 2010 22:03:28 +0100 Subject: [Biopython-dev] Trunk freeze for Biopython 1.55 (beta) In-Reply-To: References: Message-ID: On Wed, Aug 18, 2010 at 4:40 PM, Peter wrote: > > OK, the source code bundles and Windows installers are up. If anyone > on the dev list has the chance to download and test these now, that > would be great. > > Note that I have not included an installer for Python 2.7 (yet). I'm waiting > for official Windows installers for NumPy on Python 2.7, this will be > NumPy 1.5 which should soon as they already have a beta out. I'm > hoping that will be ready by the time we want to formally release > Biopython 1.55. > > Still to do: > * Update API docs with epydoc > * News page entry & email announcement > As mentioned earlier, epydoc is done, and I've also just done a news post: http://news.open-bio.org/news/2010/08/biopython-1-55-beta-released/ If there are any typos or other suggestions for improvement, please tell us. We can edit that page - and then turn it into an email to send out. 
This means the "trunk freeze" is over, but for the next week or so, until we do the official release, let's focus on documentation and any bug fixes. [Keep new feature work only on branches please.] Thanks, Peter From biopython at maubp.freeserve.co.uk Thu Aug 19 11:43:56 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 19 Aug 2010 16:43:56 +0100 Subject: [Biopython-dev] Biopython 1.55 beta released Message-ID: Dear Biopythoneers, We've just released a beta of Biopython 1.55 for user testing, as announced on the news server (which has RSS and atom feeds) and on twitter: http://news.open-bio.org/news/2010/08/biopython-1-55-beta-released/ http://twitter.com/biopython Since Biopython 1.54 was released three months ago, we've made a good start on work for Python 3 support (via the 2to3 script), but as a side effect of this we've had to update quite a lot of the older parts of the library. Although the unit tests are all fine, there is a small but real chance that we've accidentally broken things - which is why we're doing this beta release. In terms of new features, the most noticeable highlight is that the command line tool application wrapper classes are now executable, which should make it much easier to call external tools. This is described in the updated documentation. http://biopython.org/DIST/docs/tutorial/Tutorial.html http://biopython.org/DIST/docs/tutorial/Tutorial.pdf Note we are phasing out support for Python 2.4. We will continue to support it for at least one further release (i.e. Biopython 1.56). This could be delayed given feedback from our users (e.g. if this proves to be a problem in combination with other libraries or a popular Linux distribution).
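The announcement says the command line tool wrapper classes are now executable. As a rough illustration of that idea only, here is a toy wrapper that builds an argument list and runs it when the object is called. The ToyCommandline class is hypothetical and is not Biopython's wrapper API; the real behaviour is described in the Tutorial linked in the announcement:

```python
import subprocess
import sys


class ToyCommandline(object):
    """Hypothetical wrapper: build a command line, then call the object to run it."""

    def __init__(self, program, *args):
        self.program = program
        self.args = list(args)

    def __str__(self):
        # The command line as a single string, handy for logging.
        return " ".join([self.program] + self.args)

    def __call__(self):
        """Run the tool and return (stdout, stderr); raise if it fails."""
        result = subprocess.run([self.program] + self.args,
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                universal_newlines=True)
        if result.returncode != 0:
            raise RuntimeError("%s returned %i: %s"
                               % (self, result.returncode, result.stderr))
        return result.stdout, result.stderr


# Use the running Python interpreter as a stand-in "external tool".
cline = ToyCommandline(sys.executable, "-c", "print('hello')")
stdout, stderr = cline()
print(repr(stdout))  # 'hello\n'
```

Passing the arguments as a list rather than one shell string avoids quoting problems with file names that contain spaces.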
(At least) 10 people have contributed to this release (so far), including 5 new people - thank you all: * Andres Colubri (first contribution) * Carlos Rios Vera (first contribution) * Claude Paroz (first contribution) * Eric Talevich * Frank Kauff * Joao Rodrigues (first contribution) * Konstantin Okonechnikov (first contribution) * Michiel de Hoon * Peter Cock * Tiago Antao Source distributions and Windows installers are available from the downloads page on the Biopython website: http://www.biopython.org/wiki/Download Feedback is welcome through the mailing lists (or bugzilla), especially if you find something that doesn't work. Thank you, Peter From mjldehoon at yahoo.com Sat Aug 21 02:30:36 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 20 Aug 2010 23:30:36 -0700 (PDT) Subject: [Biopython-dev] Obsolete code Message-ID: <984537.40520.qm@web62408.mail.re1.yahoo.com> Dear all, The classes and modules listed below were declared obsolete in Biopython 1.54 or earlier, but do not yet raise a deprecation warning. Most of this functionality moved to a different module or was implemented differently. I suggest we add a DeprecationWarning to each of these before Biopython 1.55 final. The only tricky one is the .data property of Seq classes in Bio.Seq. Still, it might be good to add a DeprecationWarning there to make people aware that this property is obsolete. Any objections? --Michiel. 
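Michiel's proposal above is to make each obsolete name raise a DeprecationWarning, and he singles out the .data property of the Seq classes as the tricky case. Here is a minimal sketch of the usual warnings.warn pattern for an obsolete property. It uses a hypothetical ToySeq class rather than Biopython's real Seq object, and the warning message is an assumption:

```python
import warnings


class ToySeq(object):
    """Hypothetical stand-in for a sequence class with an obsolete attribute."""

    def __init__(self, text):
        self._text = text

    def __str__(self):
        return self._text

    @property
    def data(self):
        """Obsolete accessor kept for backwards compatibility."""
        # stacklevel=2 points the warning at the caller's line, not this one.
        warnings.warn("ToySeq.data is obsolete, use str(seq) instead.",
                      DeprecationWarning, stacklevel=2)
        return self._text


seq = ToySeq("ACGT")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # DeprecationWarning may be filtered out by default
    value = seq.data
print(value, caught[0].category.__name__)  # ACGT DeprecationWarning
```

For a whole obsolete module, the usual approach is the same warnings.warn call placed at module level, so that it fires when the module is first imported.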
Bio.CelFile.CelParser Bio.CelFile.CelScanner Bio.CelFile.CelConsumer Bio.CelFile.CelRecord Bio.Align.MultipleSeqAlignment.get_column Bio.Align.Generic.Alignment Bio.Align.Generic.Alignment.get_seq_by_num Bio.AlignAce.Parser Bio.Blast.Applications.FastacmdCommandline Bio.Blast.Applications.BlastallCommandline Bio.Blast.Applications.BlastpgpCommandline Bio.Blast.Applications.RpsBlastCommandline Bio.Blast.NCBIStandalone.blastall Bio.Blast.NCBIStandalone.blastpgp Bio.Blast.NCBIStandalone.rpsblast (Bio.Blast.NCBIStandalone has been declared obsolete, but I guess some people may still be using the Blast plain-text output parser.) Bio.Clustalw Bio.Compass._Scanner Bio.Compass._Consumer Bio.Compass.RecordParser Bio.Compass.Iterator Bio.Emboss.Applications.EProtDistCommandline Bio.Emboss.Applications.ENeighborCommandline Bio.Emboss.Applications.EProtParsCommandline Bio.Emboss.Applications.EConsenseCommandline Bio.Emboss.Applications.ESeqBootCommandline Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer.xcentre Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer._set_xcentre Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer.ycentre Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer._set_ycentre Bio.Graphics.GenomeDiagram._Graph.centre Bio.Graphics.GenomeDiagram._Graph._set_centre Bio.Motif.Parsers.AlignAce.AlignAceConsumer Bio.Motif.Parsers.AlignAce.AlignAceParser Bio.Motif.Parsers.AlignAce.AlignAceScanner Bio.Motif.Parsers.AlignAce.CompareAceScanner Bio.Motif.Parsers.AlignAce.CompareAceConsumer Bio.Motif.Parsers.MEME.MEMEParser Bio.Motif.Parsers.MEME._MEMEScanner Bio.Motif.Parsers.MEME._MEMEConsumer Bio.Motif.Parsers.MEME._MASTConsumer Bio.Motif.Parsers.MEME.MASTParser Bio.Motif.Parsers.MEME._MASTScanner Bio.Motif.Parsers.MEME.MASTRecord Bio.PopGen.FDist.RecordParser Bio.PopGen.FDist._Scanner Bio.PopGen.FDist._RecordConsumer Bio.Seq.Seq.data Bio.SeqUtils.GC_Frame Bio.SeqUtils.fasta_uniqids Bio.SeqUtils.apply_on_multi_fasta 
Bio.SeqUtils.quicker_apply_on_multi_fasta Bio.UniGene.UnigeneSequenceRecord Bio.UniGene.UnigeneProtsimRecord Bio.UniGene.UnigeneSTSRecord Bio.UniGene.UnigeneRecord Bio.UniGene._RecordConsumer Bio.UniGene._Scanner Bio.UniGene.RecordParser Bio.UniGene.Iterator From mjldehoon at yahoo.com Sat Aug 21 02:56:57 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 20 Aug 2010 23:56:57 -0700 (PDT) Subject: [Biopython-dev] Deprecated code Message-ID: <207402.89678.qm@web62407.mail.re1.yahoo.com> Dear all, Below are the modules and functions that were deprecated (with a DeprecationWarning) in Biopython 1.51 or earlier, which was released on August 17, 2009. Since that is more than one year (and more than two releases) ago, we can remove these from Biopython. Any objections? If not, I'll also send this list to the user mailing list before removing them. --Michiel. Bio.Align.FormatConvert Bio.Emboss.Applications.PrimerSearchCommandline.set_parameter Bio.Entrez.efetch: rettype="genbank" option Bio.Fasta Bio.SCOP.Dom.Parser Bio.SwissProt.SProt Bio.Transcribe Bio.Translate BioSQL.BioSeqDatabase.open_database: driver="psycopg" option From bartek at rezolwenta.eu.org Sat Aug 21 04:56:52 2010 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Sat, 21 Aug 2010 10:56:52 +0200 Subject: [Biopython-dev] Obsolete code In-Reply-To: <984537.40520.qm@web62408.mail.re1.yahoo.com> References: <984537.40520.qm@web62408.mail.re1.yahoo.com> Message-ID: Hi, Great job. I just have a small comment on the Bio.AlignAce.Parser module. It is a part of the Bio.AlignAce "package" which was already deprecated (see Bio.AlignAce.__init__.py). I think either there is no need to put an extra deprecation warning in Bio.AlignAce.Parser, or we should put it in all submodules of Bio.AlignAce (like Bio.AlignAce.Motif, etc.). As for deprecating the old parsers in Bio.Motif, I've just looked through the code in Motif.__init__.py and realized that it was still using the old implementations...
This small patch below should fix it (I don't want to push onto the main branch now, as it's frozen). Cheers Bartek

diff --git a/Bio/Motif/__init__.py b/Bio/Motif/__init__.py
index 4c7c19a..9ca0200 100644
--- a/Bio/Motif/__init__.py
+++ b/Bio/Motif/__init__.py
@@ -10,12 +10,12 @@ as well as methods for motif comparisons and motif searching in sequences.
 It also inlcudes functionality for parsing AlignACE and MEME programs
 """
 from _Motif import Motif
-from Parsers.AlignAce import AlignAceParser, CompareAceParser
-from Parsers.MEME import MEMEParser,MASTParser
+import Parsers.AlignAce
+import Parsers.MEME
 from Thresholds import ScoreDistribution
 
-_parsers={"AlignAce":AlignAceParser,
-          "MEME":MEMEParser
+_parsers={"AlignAce":Parsers.AlignAce.read,
+          "MEME":Parsers.MEME.read
           }
 
 def _from_pfm(handle):
@@ -75,7 +75,7 @@ def parse(handle,format):
         else: #we have a proper reader
             yield reader(handle)
     else: # we have a proper reader
-        for m in parser().parse(handle).motifs:
+        for m in parser(handle).motifs:
             yield m
 
 def read(handle,format):

On Sat, Aug 21, 2010 at 8:30 AM, Michiel de Hoon wrote: > Dear all, > > The classes and modules listed below were declared obsolete in Biopython > 1.54 or earlier, but do not yet raise a deprecation warning. Most of this > functionality moved to a different module or was implemented differently. I > suggest we add a DeprecationWarning to each of these before Biopython 1.55 > final. The only tricky one is the .data property of Seq classes in Bio.Seq. > Still, it might be good to add a DeprecationWarning there to make people > aware that this property is obsolete. > > Any objections? > > --Michiel.
> > Bio.CelFile.CelParser > Bio.CelFile.CelScanner > Bio.CelFile.CelConsumer > Bio.CelFile.CelRecord > > Bio.Align.MultipleSeqAlignment.get_column > Bio.Align.Generic.Alignment > Bio.Align.Generic.Alignment.get_seq_by_num > > Bio.AlignAce.Parser > > Bio.Blast.Applications.FastacmdCommandline > Bio.Blast.Applications.BlastallCommandline > Bio.Blast.Applications.BlastpgpCommandline > Bio.Blast.Applications.RpsBlastCommandline > Bio.Blast.NCBIStandalone.blastall > Bio.Blast.NCBIStandalone.blastpgp > Bio.Blast.NCBIStandalone.rpsblast > (Bio.Blast.NCBIStandalone has been declared obsolete, but I guess some > people may still be using the Blast plain-text output parser.) > > Bio.Clustalw > > Bio.Compass._Scanner > Bio.Compass._Consumer > Bio.Compass.RecordParser > Bio.Compass.Iterator > > Bio.Emboss.Applications.EProtDistCommandline > Bio.Emboss.Applications.ENeighborCommandline > Bio.Emboss.Applications.EProtParsCommandline > Bio.Emboss.Applications.EConsenseCommandline > Bio.Emboss.Applications.ESeqBootCommandline > > Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer.xcentre > Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer._set_xcentre > Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer.ycentre > Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer._set_ycentre > Bio.Graphics.GenomeDiagram._Graph.centre > Bio.Graphics.GenomeDiagram._Graph._set_centre > > Bio.Motif.Parsers.AlignAce.AlignAceConsumer > Bio.Motif.Parsers.AlignAce.AlignAceParser > Bio.Motif.Parsers.AlignAce.AlignAceScanner > Bio.Motif.Parsers.AlignAce.CompareAceScanner > Bio.Motif.Parsers.AlignAce.CompareAceConsumer > > Bio.Motif.Parsers.MEME.MEMEParser > Bio.Motif.Parsers.MEME._MEMEScanner > Bio.Motif.Parsers.MEME._MEMEConsumer > Bio.Motif.Parsers.MEME._MASTConsumer > Bio.Motif.Parsers.MEME.MASTParser > Bio.Motif.Parsers.MEME._MASTScanner > Bio.Motif.Parsers.MEME.MASTRecord > > Bio.PopGen.FDist.RecordParser > Bio.PopGen.FDist._Scanner > Bio.PopGen.FDist._RecordConsumer > > 
Bio.Seq.Seq.data > > Bio.SeqUtils.GC_Frame > Bio.SeqUtils.fasta_uniqids > Bio.SeqUtils.apply_on_multi_fasta > Bio.SeqUtils.quicker_apply_on_multi_fasta > > Bio.UniGene.UnigeneSequenceRecord > Bio.UniGene.UnigeneProtsimRecord > Bio.UniGene.UnigeneSTSRecord > Bio.UniGene.UnigeneRecord > Bio.UniGene._RecordConsumer > Bio.UniGene._Scanner > Bio.UniGene.RecordParser > Bio.UniGene.Iterator > > > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From tiagoantao at gmail.com Sat Aug 21 07:33:55 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Sat, 21 Aug 2010 12:33:55 +0100 Subject: [Biopython-dev] Obsolete code In-Reply-To: <984537.40520.qm@web62408.mail.re1.yahoo.com> References: <984537.40520.qm@web62408.mail.re1.yahoo.com> Message-ID: On Sat, Aug 21, 2010 at 7:30 AM, Michiel de Hoon wrote: > Bio.PopGen.FDist.RecordParser > Bio.PopGen.FDist._Scanner > Bio.PopGen.FDist._RecordConsumer I think I know everybody that uses this code (of course, surprises sometimes happen...) and I am very convinced that all is upgraded. Please go ahead. I intend to remove it in 1.56 (my branch on git does not have it anymore). From p.j.a.cock at googlemail.com Sun Aug 22 18:11:41 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 22 Aug 2010 23:11:41 +0100 Subject: [Biopython-dev] Deprecated code In-Reply-To: <207402.89678.qm@web62407.mail.re1.yahoo.com> References: <207402.89678.qm@web62407.mail.re1.yahoo.com> Message-ID: On Saturday, August 21, 2010, Michiel de Hoon wrote: > > Dear all, > > Below are the modules and functions that were deprecated (with a DeprecationWarning) in Biopython 1.51 or earlier, which was released on August 17, 2009. 
Since that is more than one year (and more than two releases) ago, we can remove these from Biopython. Any objections? If not, I'll send this list also to the user mailing list before removing them. > > --Michiel. > Since we didn't do this in the beta, I'd say leave these for Biopython 1.55 (about one week's time - so far so good) but then they can go. An email to the mail list would be sensible too. Peter From p.j.a.cock at googlemail.com Sun Aug 22 18:26:53 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 22 Aug 2010 23:26:53 +0100 Subject: [Biopython-dev] Obsolete code In-Reply-To: <984537.40520.qm@web62408.mail.re1.yahoo.com> References: <984537.40520.qm@web62408.mail.re1.yahoo.com> Message-ID: On Saturday, August 21, 2010, Michiel de Hoon wrote: > Dear all, > > The classes and modules listed below were declared obsolete in Biopython 1.54 or earlier, but do not yet raise a deprecation warning. Most of this functionality moved to a different module or was implemented differently. I suggest we add a DeprecationWarning to each of these before Biopython 1.55 final. The only tricky one is the .data property of Seq classes in Bio.Seq. Still, it might be good to add a DeprecationWarning there to make people aware that this property is obsolete. > > Any objections? I'd be cautious about the Seq object given it will be particularly widely used. I've commented on a few cases below, but in general yes deprecation of things previously marked as obsolete is sensible house keeping. > > --Michiel. > > Bio.CelFile.CelParser > Bio.CelFile.CelScanner > Bio.CelFile.CelConsumer > Bio.CelFile.CelRecord > Ok > Bio.Align.MultipleSeqAlignment.get_column > Bio.Align.Generic.Alignment > Bio.Align.Generic.Alignment.get_seq_by_num > I'd like to make the new alignment object a bit more user friendly before we deprecate these bits. 
> Bio.AlignAce.Parser Ok > > Bio.Blast.Applications.FastacmdCommandline > Bio.Blast.Applications.BlastallCommandline > Bio.Blast.Applications.BlastpgpCommandline > Bio.Blast.Applications.RpsBlastCommandline > Bio.Blast.NCBIStandalone.blastall > Bio.Blast.NCBIStandalone.blastpgp > Bio.Blast.NCBIStandalone.rpsblast The NCBI are still supporting "legacy" BLAST so it is probably a bit too early to deprecate these wrappers. Maybe I'm being cautious but I'd leave this until the next release in three months time or so. > (Bio.Blast.NCBIStandalone has been declared obsolete, but I guess some people may still be using the Blast plain-text output parser.) Deprecation seems premature - the code is still useful and probably widely used (I use it myself sometimes). Maybe once your new BLAST parsing framework is ready Michiel? > > Bio.Clustalw Ok > > Bio.Compass._Scanner > Bio.Compass._Consumer > Bio.Compass.RecordParser > Bio.Compass.Iterator Ok > Bio.Emboss.Applications.EProtDistCommandline > Bio.Emboss.Applications.ENeighborCommandline > Bio.Emboss.Applications.EProtParsCommandline > Bio.Emboss.Applications.EConsenseCommandline > Bio.Emboss.Applications. > Bio.Align.FormatConvert > Bio.Emboss.Applications.PrimerSearchCommandline.set_parameter > Bio.Entrez.efetch: rettype="genbank" option Should be fine. > Bio.Fasta Here my instinct is to be cautious given Bio.Fasta used to be such a widely used module. > Bio.SCOP.Dom.Parser > Bio.SwissProt.SProt > Bio.Transcribe > Bio.Translate > BioSQL.BioSeqDatabase.open_database: driver="psycopg" option Ok > > Bio.SeqUtils.GC_Frame > Bio.SeqUtils.fasta_uniqids > Bio.SeqUtils.apply_on_multi_fasta > Bio.SeqUtils.quicker_apply_on_multi_fasta > > Bio.UniGene.UnigeneSequenceRecord > Bio.UniGene.UnigeneProtsimRecord > Bio.UniGene.UnigeneSTSRecord > Bio.UniGene.UnigeneRecord > Bio.UniGene._RecordConsumer > Bio.UniGene._Scanner > Bio.UniGene.RecordParser > Bio.UniGene.Iterator > Ok Apologies for brevity and any typos, this was sent from an iPod. 
Peter From biopython at maubp.freeserve.co.uk Tue Aug 24 07:56:57 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Aug 2010 12:56:57 +0100 Subject: [Biopython-dev] IMGT parser (modified EMBL format), Message-ID: Hi all, The IMGT is the international ImMunoGeneTics information system, a global reference in immunogenetics and immunoinformatics. They have a sequence database, genome database, structure database, and monoclonal antibodies database. The IMGT use a variant of the EMBL flat file format with longer feature indents: http://imgt.cines.fr/download/LIGM-DB/userman_doc.html http://imgt.cines.fr/download/LIGM-DB/ftable_doc.html http://www.ebi.ac.uk/imgt/hla/docs/manual.html Uri and I have been working on extending the SeqIO EMBL/GenBank parser and writer to support IMGT files too. This uncovered a number of data formatting issues (e.g. wrong sequence length in ID line, partial feature locations) and Uri has been liaising with the IMGT curators to address these. With their latest (Aug 2010) release, we can now parse the whole file without errors: http://imgt.cines.fr/download/LIGM-DB/imgt.dat.Z I think this code is now ready to merge - comments welcome: http://github.com/peterjc/biopython/commits/seqio-imgt Potentially we could even include this in Biopython 1.55, although it would be more cautious not to add any new features between the beta and the final release... Peter From biopython at maubp.freeserve.co.uk Tue Aug 24 08:06:15 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Aug 2010 13:06:15 +0100 Subject: [Biopython-dev] Bio.PDB on Python 3 In-Reply-To: References: Message-ID: On Mon, Aug 16, 2010 at 2:47 PM, Peter wrote: > Hi all, > > A while back I installed NumPy from their svn under Python 3, so that I > could test more of Biopython. I hadn't really looked at Bio.PDB until > recently because test_PDB.py depended on Bio.KDTree which needs > some C code to be compiled (which we haven't tried yet).
> > I recently added a few doctests to Bio/PDB/Polypeptide.py which > showed a problem with the code using "next" as a variable name. > This is a built-in function on Python 3, taking the place of the next > method on iterator objects. That's fixed now: > > http://github.com/biopython/biopython/commit/1eb48feb5520094bf7f0177be804a953024e6938 > > In order to test more of Bio.PDB under Python 3, I have just split > test_PDB.py into two, creating a small test_PDB_KDtree.py file > for the neighbour search functionality which requires the C code. > > This has revealed there are at least two issues with Bio.PDB to be > addressed (see below). > > Peter > > > ====================================================================== > ERROR: test_1_warnings (__main__.A_ExceptionTest) > Check warnings: Parse a flawed PDB file in permissive mode. > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", > line 200, in _parse_coordinates > fullname, serial_number, element) > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", > line 232, in init_atom > residue.add(atom) > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/Residue.py", > line 82, in add > "Atom %s defined twice in residue %s" % (atom_id, self)) > Bio.PDB.PDBExceptions.PDBConstructionException: Atom N defined twice > in residue > > During handling of the above exception, another exception occurred: > > Traceback (most recent call last): > File "test_PDB.py", line 57, in test_1_warnings > p.get_structure("example", "PDB/a_structure.pdb") > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", > line 64, in get_structure > self._parse(file.readlines()) > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", > line 84, in _parse > self.trailer=self._parse_coordinates(coords_trailer)
> File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", > line 202, in _parse_coordinates > self._handle_PDB_exception(message, global_line_counter) > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", > line 256, in _handle_PDB_exception > % message, PDBConstructionWarning) > File "test_PDB.py", line 53, in showwarning > all_warns.append(*args[0]) > TypeError: append() argument after * must be a sequence, not > PDBConstructionWarning Eric fixed this one, thanks: http://github.com/biopython/biopython/commit/f4917021cbb8a4ed4cc72dc50a2abf0066da7131 > ====================================================================== > ERROR: test_ExposureCN (__main__.Exposure) > HSExposureCN. > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "test_PDB.py", line 612, in setUp > structure=PDBParser(PERMISSIVE=True).get_structure('X', pdb_filename) > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", > line 64, in get_structure > self._parse(file.readlines()) > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", > line 84, in _parse > self.trailer=self._parse_coordinates(coords_trailer) > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", > line 200, in _parse_coordinates > fullname, serial_number, element) > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", > line 185, in init_atom > duplicate_atom=residue[name] > TypeError: 'DisorderedResidue' object is not subscriptable This and the others like it remain. I haven't looked into what is wrong.
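For the first failure, the traceback shows the test's own showwarning hook doing all_warns.append(*args[0]). On Python 3 the first argument passed to showwarning is the warning instance itself, not a sequence, so the * unpacking fails. Eric's commit is the actual fix; what follows is only a sketch of one way to collect warnings in a test that behaves the same on Python 2.6+ and Python 3: use warnings.catch_warnings(record=True) instead of replacing showwarning. ToyConstructionWarning and parse_flawed_file are hypothetical stand-ins for PDBConstructionWarning and the permissive-mode parse.

```python
import warnings


class ToyConstructionWarning(Warning):
    """Hypothetical stand-in for Bio.PDB's PDBConstructionWarning."""


def parse_flawed_file():
    """Hypothetical stand-in for a permissive parse that warns twice."""
    warnings.warn("Atom N defined twice in residue", ToyConstructionWarning)
    warnings.warn("Atom CA defined twice in residue", ToyConstructionWarning)
    return "structure"


def collect_warnings(func):
    """Run func() and return (result, list of warning messages as text)."""
    with warnings.catch_warnings(record=True) as caught:
        # Ensure repeated warnings are recorded rather than filtered out.
        warnings.simplefilter("always")
        result = func()
    # Each recorded item's .message is the warning instance itself,
    # so str() gives its text; no need to unpack it as a sequence.
    return result, [str(w.message) for w in caught]


structure, messages = collect_warnings(parse_flawed_file)
print(messages)
```

Because catch_warnings restores the previous filters and showwarning on exit, the test does not need to undo any global patching itself.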
Peter From laserson at mit.edu Tue Aug 24 10:35:01 2010 From: laserson at mit.edu (Uri Laserson) Date: Tue, 24 Aug 2010 10:35:01 -0400 Subject: [Biopython-dev] IMGT parser (modified EMBL format), In-Reply-To: References: Message-ID: Hi all, I would obviously prefer it to go into the distribution as soon as it is possible, but I don't want to mess with the releases. The IMGT people said they'll put a news announcement on their site and a link to Biopython once the code is in the official release. Uri On Tue, Aug 24, 2010 at 07:56, Peter wrote: > Hi all, > > The IMGT is the international ImMunoGeneTics information system, a global > reference in immunogenetics and immunoinformatics. They have a sequence > database, genome database, structure database, and monoclonal antibodies > database. > > The IMGT use a variant of the EMBL flat file format with longer feature > indents: > http://imgt.cines.fr/download/LIGM-DB/userman_doc.html > http://imgt.cines.fr/download/LIGM-DB/ftable_doc.html > http://www.ebi.ac.uk/imgt/hla/docs/manual.html > > Uri and I have been working on extending the SeqIO EMBL/GenBank parser > and writer to support IMGT files too. This uncovered a number of data > formatting > issues (e.g. wrong sequence length in ID line, partial feature > locations) and Uri > has been liaising with the IMGT curators to address these. With their > latest > (Aug 2010) release, we can now parse the whole file without errors: > http://imgt.cines.fr/download/LIGM-DB/imgt.dat.Z > > I think this code is now ready to merge - comments welcome: > http://github.com/peterjc/biopython/commits/seqio-imgt > > Potentially we could even include this in Biopython 1.55, although it would > be more cautious not to add any new features between the beta and the > final release...
> > Peter > -- Uri Laserson Graduate Student, Biomedical Engineering Harvard-MIT Division of Health Sciences and Technology M +1 917 742 8019 laserson at mit.edu From biopython at maubp.freeserve.co.uk Tue Aug 24 12:30:47 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Aug 2010 17:30:47 +0100 Subject: [Biopython-dev] Fwd: [blast-announce] Correction: BLAST 2.2.24 release announcement In-Reply-To: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov> References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov> Message-ID: Hi all, The NCBI have just released a new version of BLAST+ (see below). I've just updated the existing BLAST+ application wrappers for the minor changes made in BLAST 2.2.24+. Something potentially quite useful in this release is the blast_formatter command for turning ASN.1 BLAST+ output (using -outfmt 11) into any of the other output formats. i.e. if you are not sure what output format will be most useful (e.g. plain text, XML, tabular) and rerunning the BLAST is slow, the NCBI now let you run the BLAST once and save it as ASN.1, then convert this to any other format on demand using blast_formatter (which should be fast). We should write a command line wrapper for this new tool... Peter ---------- Forwarded message ---------- From: mcginnis Date: Tue, Aug 24, 2010 at 4:46 PM Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement To: NLM/NCBI List blast-announce A new version of the stand-alone applications is available.
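Until Biopython has a proper wrapper for blast_formatter, calling it through subprocess is one stop-gap. The following is a hypothetical sketch: the helper names are made up, and the -archive, -outfmt and -out option names are assumptions that should be checked against blast_formatter -help and the BLAST+ user manual for the installed version. The argument list is built separately from the call, so it can be checked without BLAST+ installed.

```python
import subprocess


def blast_formatter_args(archive, outfmt, out, exe="blast_formatter"):
    """Build the argument list for a blast_formatter call (hypothetical helper).

    Assumes the BLAST+ options -archive, -outfmt and -out; verify these
    with `blast_formatter -help` for your BLAST+ version.
    """
    return [exe, "-archive", archive, "-outfmt", str(outfmt), "-out", out]


def run_blast_formatter(archive, outfmt, out, exe="blast_formatter"):
    """Run blast_formatter, raising CalledProcessError on a non-zero exit."""
    subprocess.check_call(blast_formatter_args(archive, outfmt, out, exe))


# Typical workflow: run the search once, saving the ASN.1 archive, e.g.
#   blastp -query q.fasta -db nr -outfmt 11 -out hits.asn
# and later convert it on demand, here to XML (-outfmt 5):
args = blast_formatter_args("hits.asn", 5, "hits.xml")
print(" ".join(args))
```

Passing a list (rather than a single shell string) to subprocess avoids quoting problems with file names containing spaces.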
Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ This release includes a number of bug fixes as well as new features for the BLAST+ applications: * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter (see BLAST+ user manual) * Added the blast_formatter application (see BLAST+ user manual) * Added support for translated subject soft masking in the BLAST databases * Added support for the BLAST Trace-back operations (btop) output format * Added command line options to blastdbcmd for listing available BLAST databases * Improved performance of formatting of remote BLAST searches * Use a consistent exit code for out of memory conditions * Fixed bug in indexed megablast with multiple space-separated BLAST databases * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb * Fixed Windows installer for 64-bit installations BLAST+ applications, as well as the legacy C applications (e.g. blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download From mjldehoon at yahoo.com Tue Aug 24 21:11:56 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 24 Aug 2010 18:11:56 -0700 (PDT) Subject: [Biopython-dev] Obsolete code In-Reply-To: Message-ID: <274902.34856.qm@web62407.mail.re1.yahoo.com> Tiago, Bartek, Peter, Thanks for your comments. Peter, is the main branch still frozen? I'd like to make these changes on Saturday Japanese time, so Friday night in Europe/US. > I'd be cautious about the Seq object given it will be > particularly widely used. I've commented on a few cases > below, but in general yes deprecation of things previously > marked as obsolete is sensible house keeping. For the Seq object, I think it's good to have a DeprecationWarning to make users aware of the changes. For example, I was not aware that Seq.data is obsolete.
> > Bio.Align.MultipleSeqAlignment.get_column > > Bio.Align.Generic.Alignment > > Bio.Align.Generic.Alignment.get_seq_by_num > > I'd like to make the new alignment object a bit more user > friendly before we deprecate these bits. OK I won't touch these then. > > Bio.Blast.Applications.FastacmdCommandline > > Bio.Blast.Applications.BlastallCommandline > > Bio.Blast.Applications.BlastpgpCommandline > > Bio.Blast.Applications.RpsBlastCommandline > > Bio.Blast.NCBIStandalone.blastall > > Bio.Blast.NCBIStandalone.blastpgp > > Bio.Blast.NCBIStandalone.rpsblast > > The NCBI are still supporting "legacy" BLAST so it is > probably a bit too early to deprecate these wrappers. > Maybe I'm being cautious but I'd leave this until the > next release in three months time or so. OK. > (Bio.Blast.NCBIStandalone has been declared obsolete, > but I guess some people may still be using the Blast > plain-text output parser.) > > Deprecation seems premature - the code is still useful and > probably widely used (I use it myself sometimes). Maybe once your > new BLAST parsing framework is ready Michiel? OK. > > Bio.Fasta > > Here my instinct is to be cautious given Bio.Fasta used to > be such a widely used module. Here also I think we should make our users aware of the changes, especially because it used to be widely used. --Michiel. From mjldehoon at yahoo.com Tue Aug 24 21:15:04 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 24 Aug 2010 18:15:04 -0700 (PDT) Subject: [Biopython-dev] Deprecated code In-Reply-To: Message-ID: <435550.4986.qm@web62403.mail.re1.yahoo.com> Well, here I think that we should be brave and remove the deprecated code, unless if any of the users are actively using it. If we leave deprecated code in too long then Biopython becomes a mess. --Michiel. 
--- On Sun, 8/22/10, Peter Cock wrote: > From: Peter Cock > Subject: Re: [Biopython-dev] Deprecated code > To: "Michiel de Hoon" > Cc: "biopython-dev at biopython.org" > Date: Sunday, August 22, 2010, 6:11 PM > On Saturday, August 21, 2010, Michiel > de Hoon > wrote: > > > > Dear all, > > > > Below are the modules and functions that were > deprecated (with a DeprecationWarning) in Biopython 1.51 or > earlier, which was released on August 17, 2009. Since that > is more than one year (and more than two releases) ago, we > can remove these from Biopython. Any objections? If not, > I'll send this list also to the user mailing list before > removing them. > > > > --Michiel. > > > > Since we didn't do this in the beta, I'd say leave these > for Biopython > 1.55 (about one week's time - so far so good) but then they > can go. An > email to the mail list would be sensible too. > > Peter > From updates at feedmyinbox.com Wed Aug 25 03:12:57 2010 From: updates at feedmyinbox.com (Feed My Inbox) Date: Wed, 25 Aug 2010 03:12:57 -0400 Subject: [Biopython-dev] 8/25 biopython Questions - BioStar Message-ID: // Restriction for automated primer design // August 24, 2010 at 12:39 PM http://biostar.stackexchange.com/questions/2239/restriction-for-automated-primer-design Hi, I am pretty new to BioPython, but I am trying to write a script that would allow the user to input a fasta-file and the multiple cloning site of the target vector. In my script e.g. pQCXIN is the MCS of a vector. I successfully feed digest the insert and get a list of enzymes that does not cut the insert. However, the list lost the order of the restriction sites defined in the beginning of my code (probably because dictionaries are not ordered??). But the order is essential for my primer design step, as I would obviously like to add a 5' RE to my 5' primer... 
so basically, how do I get the "no_cutters" list in the same order as the input at the beginning of my code? Any help or suggestions would be appreciated. This is my code so far:

    from Bio.Seq import Seq
    from Bio.Alphabet.IUPAC import IUPACAmbiguousDNA
    from Bio.Restriction import *
    from Bio import SeqIO

    ##Define the MCS of several vectors this list will get populated while in use##
    pQCXIN = RestrictionBatch([NotI,AgeI,BsiWI,PacI,BamHI,EcoRI])
    pUC19 = RestrictionBatch([HindIII,SphI,PstI,SalI,XbaI,BamHI,SmaI,KpnI,SacI,EcoRI])

    ##prompt for vector and insert##
    sequence=raw_input('''select your sequence file in FASTA format: ''')
    vector=raw_input('''select your vector: ''')
    #print vector

    for seq in SeqIO.parse(sequence, "fasta"):
        Digest = Analysis(eval(vector), seq.seq, linear=True)
        print seq.id
        #Digest.print_as('map')
        #print Digest.print_that()
        no_cutters = list(Digest.without_site())
        print no_cutters[1].site

From biopython at maubp.freeserve.co.uk Wed Aug 25 04:29:11 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Aug 2010 09:29:11 +0100 Subject: [Biopython-dev] Obsolete code In-Reply-To: <274902.34856.qm@web62407.mail.re1.yahoo.com> References: <274902.34856.qm@web62407.mail.re1.yahoo.com> Message-ID: On Wed, Aug 25, 2010 at 2:11 AM, Michiel de Hoon wrote: > Tiago, Bartek, Peter, > > Thanks for your comments. > > Peter, is the main branch still frozen? I'd like to make these > changes on Saturday Japanese time, so Friday night in Europe/US. No - bug fixes, documentation are fine.
>> I'd be cautious about the Seq object given it will be >> particularly widely used. I've commented on a few cases >> below, but in general yes deprecation of things previously >> marked as obsolete is sensible house keeping. > > For the Seq object, I think it's good to have a DeprecationWarning > to make users aware of the changes. For example, I was not > aware that Seq.data is obsolete. I guess we have to do it at some point, so OK. >> > Bio.Align.MultipleSeqAlignment.get_column >> > Bio.Align.Generic.Alignment >> > Bio.Align.Generic.Alignment.get_seq_by_num >> >> I'd like to make the new alignment object a bit more user >> friendly before we deprecate these bits. > > OK I won't touch these then. > >> > Bio.Blast.Applications.FastacmdCommandline >> > Bio.Blast.Applications.BlastallCommandline >> > Bio.Blast.Applications.BlastpgpCommandline >> > Bio.Blast.Applications.RpsBlastCommandline >> > Bio.Blast.NCBIStandalone.blastall >> > Bio.Blast.NCBIStandalone.blastpgp >> > Bio.Blast.NCBIStandalone.rpsblast >> >> The NCBI are still supporting "legacy" BLAST so it is >> probably a bit too early to deprecate these wrappers. >> Maybe I'm being cautious but I'd leave this until the >> next release in three months time or so. > > OK. > >> (Bio.Blast.NCBIStandalone has been declared obsolete, >> but I guess some people may still be using the Blast >> plain-text output parser.) >> >> Deprecation seems premature - the code is still useful and >> probably widely used (I use it myself sometimes). Maybe >> once your new BLAST parsing framework is ready Michiel? > > OK. > >> > Bio.Fasta >> >> Here my instinct is to be cautious given Bio.Fasta used to >> be such a widely used module. > > Here also I think we should make our users aware of the > changes, especially because it used to be widely used. As you pointed out on the other thread, Bio.Fasta has already been declared deprecated. 
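A minimal sketch of the kind of DeprecationWarning being discussed for Seq.data: the obsolete attribute becomes a property that warns and then still returns the data, so existing scripts keep working while users are told about the change. ToySeq is a hypothetical class, not Biopython's Seq, and str(seq) is assumed here to be the recommended replacement.

```python
import warnings


class ToySeq(object):
    """Hypothetical minimal sequence class, not Biopython's Seq."""

    def __init__(self, data):
        self._data = data

    def __str__(self):
        return self._data

    @property
    def data(self):
        # Obsolete accessor: still works, but points callers at str().
        # stacklevel=2 attributes the warning to the caller's line.
        warnings.warn("ToySeq.data is deprecated; use str(seq) instead.",
                      DeprecationWarning, stacklevel=2)
        return self._data


with warnings.catch_warnings(record=True) as caught:
    # DeprecationWarning may be hidden by default, so show it explicitly.
    warnings.simplefilter("always")
    value = ToySeq("ACGT").data

print(value, caught[0].category.__name__)
```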
Peter From p.j.a.cock at googlemail.com Wed Aug 25 04:33:44 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 25 Aug 2010 09:33:44 +0100 Subject: [Biopython-dev] Deprecated code In-Reply-To: <435550.4986.qm@web62403.mail.re1.yahoo.com> References: <435550.4986.qm@web62403.mail.re1.yahoo.com> Message-ID: Michiel wrote: >>> Below are the modules and functions that were >>> deprecated (with a DeprecationWarning) in Biopython 1.51 or >>> earlier, which was released on August 17, 2009. Since that >>> is more than one year (and more than two releases) ago, we >>> can remove these from Biopython. Any objections? If not, >>> I'll send this list also to the user mailing list before >>> removing them. Peter wrote: >> Since we didn't do this in the beta, I'd say leave these for >> Biopython 1.55 (about one week's time - so far so good) >> but then they can go. An email to the mail list would be >> sensible too. Michiel de Hoon wrote: > Well, here I think that we should be brave and remove the > deprecated code, unless if any of the users are actively using > it. If we leave deprecated code in too long then Biopython > becomes a mess. OK then - send out a warning mail on the mail list, and if there are no objections you can remove these deprecated modules at the end of the week as you'd suggested. I'll then aim to do the Biopython 1.55 final release early next week. How's that? Peter From mjldehoon at yahoo.com Wed Aug 25 09:54:02 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 25 Aug 2010 06:54:02 -0700 (PDT) Subject: [Biopython-dev] Deprecated code In-Reply-To: Message-ID: <766351.23054.qm@web62403.mail.re1.yahoo.com> --- On Wed, 8/25/10, Peter Cock wrote: > OK then - send out a warning mail on the mail list, and if > there are no objections you can remove these deprecated modules > at the end of the week as you'd suggested. I'll then aim to > do the Biopython 1.55 final release early next week. How's > that? Sounds good! 
I just sent out a warning mail to the user mailing list. Best, --Michiel. From bugzilla-daemon at portal.open-bio.org Thu Aug 26 09:13:21 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 26 Aug 2010 09:13:21 -0400 Subject: [Biopython-dev] [Bug 3096] PPBuilder build_peptides bugs In-Reply-To: Message-ID: <201008261313.o7QDDLlY001012@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3096 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-26 09:13 EST ------- (In reply to comment #3) > Hi Peter, > > I managed to reproduce the problem without modifying _accept(). > Excellent - that should help. > > The output peptides should be: ['IHR','STGL'] not ['IHRXTGL'] in the current > version... > I agree that ['IHRXTGL'] is definitely wrong (you have convinced me this is a real bug). Chain A has residues: ILE, HIS, ARG, XLY, SER, THR, GLY, LEU. Sensible results are therefore ['IHRXSTGL'] if we include XLY as a modified amino acid, or ['IHR', 'STGL'] if we exclude XLY (which we probably should). Was XLY just an artificial example for this bug report? Looking at the original PDB file for 1BFE, it is a modified GLY where you have switched CA (alpha carbon) to the non-standard CX. > Residue XLY A 319 or X in the fourth position should not be included > since it doesn't have a CA atom. Instead the current version includes it and > removes the 'S' next to it, due to the same bug. One can get the right version > using the patch provided before. > > Whether the _accept is modified or not the bug remains. Also the user should > not be expected to also modify build_peptides() method whenever PPBuilder > _accept is modified since the accept variable in build_peptides isn't really a > local (private) variable: In line 277 this variable accept is referenced from > self.accept of PPBuilder.
> > http://www.biopython.org/DIST/docs/api/Bio.PDB.Polypeptide-pysrc.html > 277 accept=self._accept I'm assuming you mean the line "accept=self._accept" in the build_peptides method of the _PPBuilder class in Bio/PDB/Polypeptide.py (the line numbers have changed). If so, all that does is define a local variable within the scope of that method - it does not expose the method in any way. I don't understand what you mean here. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From tiagoantao at gmail.com Thu Aug 26 06:43:38 2010 From: tiagoantao at gmail.com (Tiago Antão) Date: Thu, 26 Aug 2010 11:43:38 +0100 Subject: [Biopython-dev] GTF (T not F) Message-ID: Hi, I've been noticing that there has been some work with GFF files around here. I've done a parser for GTF files ( http://mblab.wustl.edu/GTF22.html ) and I was wondering if someone would find interest in it? My knowledge of use cases of GTF/GFF is quite limited. I've done this to support reading Ensembl data in the context of supporting my work with HapMap datasets (The related project is this: http://popgen.eu/soft/interPop/ ) , but I really do not know the "big picture" of use cases. Anyway, I would be willing to donate the code if there is interest, and also to adapt it to support more general use cases. The code is available here: http://bazaar.launchpad.net/~tiagoantao/interpopula/trunk/annotate/head%3A/src/interPopula/Ensembl/GTF.py But as you will notice it is wrapped in lots of SQL stuff (which would have to be removed/adapted). I could remove my SQL fluff and just produce a simple parser if somebody would tell me how the design should be done to support more general use cases. The format is not very complex, anyway. Tiago -- "If you want to get laid, go to college. If you want an education, go to the library."
- Frank Zappa From biopython at maubp.freeserve.co.uk Thu Aug 26 09:52:16 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 26 Aug 2010 14:52:16 +0100 Subject: [Biopython-dev] GTF (T not F) In-Reply-To: References: Message-ID: 2010/8/26 Tiago Antão : > Hi, > > I've been noticing that there has been some work with GFF files around here. > I've done a parser for GTF files ( http://mblab.wustl.edu/GTF22.html ) > and I was wondering if someone would find interest in it? > My knowledge of use cases of GTF/GFF is quite limited. I think Brad can comment, but as I understand it GTF is part of the GFF family, and he was going to support this as well as vague GFF and GFF3. Peter From bugzilla-daemon at portal.open-bio.org Thu Aug 26 12:30:00 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 26 Aug 2010 12:30:00 -0400 Subject: [Biopython-dev] [Bug 3096] PPBuilder build_peptides bugs In-Reply-To: Message-ID: <201008261630.o7QGU07U009778@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3096 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-26 12:30 EST ------- Hi Siong, Can you test this branch? I've made a change based on your suggestion: http://github.com/peterjc/biopython/tree/bug3096 Currently there is just this one commit: http://github.com/peterjc/biopython/commit/d65d2f4dfbedffa2847db0a37984c354586b4cb8 If you don't have git installed, or are not familiar with it, you can just download the modified file Bio/PDB/Polypeptide.py from here: http://github.com/peterjc/biopython/raw/d65d2f4dfbedffa2847db0a37984c354586b4cb8/Bio/PDB/Polypeptide.py Thanks, Peter
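The behaviour agreed in bug 3096, where a residue rejected by the acceptance test (here XLY, which has no CA atom) ends the current peptide instead of being merged into it, can be sketched without Bio.PDB. This is not the patch on the bug3096 branch: residues are modelled as hypothetical (one-letter code, has_CA) tuples, and unlike the real peptide builders this sketch does not check the peptide bond distance between neighbouring residues.

```python
def split_into_peptides(residues, accept):
    """Split a residue list into runs of consecutive accepted residues.

    residues: list of (one_letter_code, has_ca) tuples (hypothetical model).
    accept: function deciding whether a residue may be part of a peptide.
    Real peptide builders also check backbone connectivity; this does not.
    """
    peptides = []
    current = []
    for residue in residues:
        if accept(residue):
            current.append(residue[0])
        else:
            # A rejected residue closes the current peptide. It is not
            # included itself, and the next accepted residue starts a new run.
            if current:
                peptides.append("".join(current))
            current = []
    if current:
        peptides.append("".join(current))
    return peptides


def has_alpha_carbon(residue):
    """Acceptance test: keep only residues that have a CA atom."""
    return residue[1]


# Chain A from the bug report: ILE HIS ARG XLY SER THR GLY LEU.
chain_a = [("I", True), ("H", True), ("R", True), ("X", False),
           ("S", True), ("T", True), ("G", True), ("L", True)]
print(split_into_peptides(chain_a, has_alpha_carbon))  # ['IHR', 'STGL']
```

Passing the acceptance test in as a function keeps the splitting logic independent of how "acceptable" is defined, which is the concern raised about _accept in the bug discussion.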
From biopython at maubp.freeserve.co.uk Thu Aug 26 13:37:06 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 26 Aug 2010 18:37:06 +0100 Subject: [Biopython-dev] Trunk freeze for Biopython 1.55 (beta) In-Reply-To: References: Message-ID: On Wed, Aug 18, 2010 at 10:03 PM, Peter wrote: > > As mentioned earlier, epydoc is done, and I've also just done a news post: > http://news.open-bio.org/news/2010/08/biopython-1-55-beta-released/ > > If there are any typos or other suggestions for improvement, please tell > us. We can edit that page - and then turn it into an email to send out. > > This means the "trunk freeze" is over, but for the next week or so when > we'll do the official release, let's focus on documentation and any bug fixes. > [Keep new feature work only on branches please.] > As discussed here, the plan is to do the final release on Monday or Tuesday (30 or 31 August 2010), after a few deprecations/removals are done: http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008194.html http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008196.html Peter From bugzilla-daemon at portal.open-bio.org Thu Aug 26 13:38:19 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 26 Aug 2010 13:38:19 -0400 Subject: [Biopython-dev] [Bug 3109] Record class in Bio.SCOP.Cla has hierarchy member as list instead of dictionary In-Reply-To: Message-ID: <201008261738.o7QHcJZP012135@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3109 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-26 13:38 EST ------- After Biopython 1.55 final is out I'll look at merging this. Thanks.
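For readers following the GTF thread: per the GTF 2.2 description at http://mblab.wustl.edu/GTF22.html, each data line has nine tab-separated columns, the last holding key "value"; attribute pairs. What follows is a minimal, hypothetical sketch of parsing one such line with the standard library only. It is neither Tiago's parser nor Brad's GFF parser, and it ignores comment lines and attribute values that contain semicolons.

```python
def parse_gtf_line(line):
    """Parse one GTF 2.2 data line into a dict (minimal sketch).

    Assumes nine tab-separated columns with attributes written as
    key "value"; pairs. Comment lines and values containing ';' are
    not handled.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        raise ValueError("Expected 9 tab-separated columns, got %i" % len(fields))
    seqname, source, feature, start, end, score, strand, frame = fields[:8]
    attributes = {}
    for item in fields[8].split(";"):
        item = item.strip()
        if not item:
            continue  # the attribute column normally ends with ';'
        key, value = item.split(None, 1)
        attributes[key] = value.strip('"')
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),  # 1-based, inclusive
        "end": int(end),
        "score": None if score == "." else float(score),
        "strand": strand,
        "frame": None if frame == "." else int(frame),
        "attributes": attributes,
    }


# Made-up example line (not real Ensembl data).
example = 'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";'
record = parse_gtf_line(example)
print(record["feature"], record["start"], record["end"], record["attributes"])
```

A general-purpose parser would return richer objects (Brad's GFF parser returns Biopython SeqFeature objects); a flat dict just keeps the sketch self-contained.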
From chapmanb at 50mail.com Fri Aug 27 08:06:57 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 27 Aug 2010 08:06:57 -0400 Subject: [Biopython-dev] GTF (T not F) In-Reply-To: References: Message-ID: <20100827120657.GC23299@sobchak.mgh.harvard.edu> Tiago; > I've been noticing that there has been some work with GFF files around here. > I've done a parser for GTF files ( http://mblab.wustl.edu/GTF22.html ) > and I was wondering if someone would find interest in it? The GFF parser should parse the GTF variant as well: http://github.com/chapmanb/bcbb/tree/master/gff/ If it is having trouble on any specific files please send them along and I'll be happy to have a look. > My knowledge of use cases of GTF/GFF is quite limited. I've done this > to support reading Ensembl data in the context of supporting my work > with HapMap datasets (The related project is this: > http://popgen.eu/soft/interPop/ ) , but I really do not know the "big > picture" of use cases. This looks like you've specialized the extraction to this particular type of GFF, which could be useful for folks dealing with the same specific files you are. The GFF parser is more general and returns Biopython SeqFeature objects, so you could use it to actually do the parse part, and then provide your specific extraction and storage on top of that. Brad From tiagoantao at gmail.com Fri Aug 27 08:33:35 2010 From: tiagoantao at gmail.com (Tiago Antão) Date: Fri, 27 Aug 2010 13:33:35 +0100 Subject: [Biopython-dev] GTF (T not F) In-Reply-To: <20100827120657.GC23299@sobchak.mgh.harvard.edu> References: <20100827120657.GC23299@sobchak.mgh.harvard.edu> Message-ID: 2010/8/27 Brad Chapman : > Tiago; > >> I've been noticing that there has been some work with GFF files around here. >> I've done a parser for GTF files ( http://mblab.wustl.edu/GTF22.html ) >> and I was wondering if someone would find interest in it? > > The GFF parser should parse the GTF variant as well: OK then.
I really did not know if there was any GTF support. I have a specific case with a target use case (help processing HapMap data). When your code is included in biopython, I think I will just deprecate mine in favour of using your more general solution. In this case I see no good reason to maintain 2 separate implementations (and my core functionality is really HapMap related). Tiago From mjldehoon at yahoo.com Fri Aug 27 21:41:04 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 27 Aug 2010 18:41:04 -0700 (PDT) Subject: [Biopython-dev] Test suite failure Message-ID: <829004.66089.qm@web62403.mail.re1.yahoo.com> Dear all, I am getting the errors below when running the Biopython tests on Mac OS X. This is with blast+ 2.2.23. The blast+ 2.2.24 installer fails on Mac OS X, so I don't know if the same problem would occur with that version. --Michiel ====================================================================== ERROR: test_blastn (test_NCBI_BLAST_tools.CheckCompleteArgList) Check all blastn arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 220, in test_blastn self.check("blastn", Applications.NcbiblastnCommandline) File "test_NCBI_BLAST_tools.py", line 204, in check ",".join(sorted(missing)))) MissingExternalDependencyError: BLAST+ and Biopython out of sync. Your version of the NCBI BLAST+ tool blastn does not match what we are expecting. Please update your copy of Biopython, or report this issue if you are already using the latest version. 
(Exta args: -seqidlist; Missing: ) ====================================================================== ERROR: test_blastp (test_NCBI_BLAST_tools.CheckCompleteArgList) Check all blastp arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 216, in test_blastp self.check("blastp", Applications.NcbiblastpCommandline) File "test_NCBI_BLAST_tools.py", line 204, in check ",".join(sorted(missing)))) MissingExternalDependencyError: BLAST+ and Biopython out of sync. Your version of the NCBI BLAST+ tool blastp does not match what we are expecting. Please update your copy of Biopython, or report this issue if you are already using the latest version. (Exta args: -seqidlist; Missing: ) ====================================================================== ERROR: test_blastx (test_NCBI_BLAST_tools.CheckCompleteArgList) Check all blastx arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 212, in test_blastx self.check("blastx", Applications.NcbiblastxCommandline) File "test_NCBI_BLAST_tools.py", line 204, in check ",".join(sorted(missing)))) MissingExternalDependencyError: BLAST+ and Biopython out of sync. Your version of the NCBI BLAST+ tool blastx does not match what we are expecting. Please update your copy of Biopython, or report this issue if you are already using the latest version. 
(Exta args: -seqidlist; Missing: ) ====================================================================== ERROR: test_psiblast (test_NCBI_BLAST_tools.CheckCompleteArgList) Check all psiblast arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 232, in test_psiblast self.check("psiblast", Applications.NcbipsiblastCommandline) File "test_NCBI_BLAST_tools.py", line 204, in check ",".join(sorted(missing)))) MissingExternalDependencyError: BLAST+ and Biopython out of sync. Your version of the NCBI BLAST+ tool psiblast does not match what we are expecting. Please update your copy of Biopython, or report this issue if you are already using the latest version. (Exta args: -seqidlist; Missing: ) ====================================================================== ERROR: test_rpsblast (test_NCBI_BLAST_tools.CheckCompleteArgList) Check all rpsblast arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 236, in test_rpsblast self.check("rpsblast", Applications.NcbirpsblastCommandline) File "test_NCBI_BLAST_tools.py", line 204, in check ",".join(sorted(missing)))) MissingExternalDependencyError: BLAST+ and Biopython out of sync. Your version of the NCBI BLAST+ tool rpsblast does not match what we are expecting. Please update your copy of Biopython, or report this issue if you are already using the latest version. 
(Exta args: -seqidlist; Missing: ) ====================================================================== ERROR: test_rpstblastn (test_NCBI_BLAST_tools.CheckCompleteArgList) Check all rpstblastn arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 240, in test_rpstblastn self.check("rpstblastn", Applications.NcbirpstblastnCommandline) File "test_NCBI_BLAST_tools.py", line 204, in check ",".join(sorted(missing)))) MissingExternalDependencyError: BLAST+ and Biopython out of sync. Your version of the NCBI BLAST+ tool rpstblastn does not match what we are expecting. Please update your copy of Biopython, or report this issue if you are already using the latest version. (Exta args: -seqidlist; Missing: ) ====================================================================== ERROR: test_tblastn (test_NCBI_BLAST_tools.CheckCompleteArgList) Check all tblastn arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 228, in test_tblastn self.check("tblastn", Applications.NcbitblastnCommandline) File "test_NCBI_BLAST_tools.py", line 204, in check ",".join(sorted(missing)))) MissingExternalDependencyError: BLAST+ and Biopython out of sync. Your version of the NCBI BLAST+ tool tblastn does not match what we are expecting. Please update your copy of Biopython, or report this issue if you are already using the latest version. 
(Exta args: -db_soft_mask,-seqidlist; Missing: ) ====================================================================== ERROR: test_tblastx (test_NCBI_BLAST_tools.CheckCompleteArgList) Check all tblastx arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 224, in test_tblastx self.check("tblastx", Applications.NcbitblastxCommandline) File "test_NCBI_BLAST_tools.py", line 204, in check ",".join(sorted(missing)))) MissingExternalDependencyError: BLAST+ and Biopython out of sync. Your version of the NCBI BLAST+ tool tblastx does not match what we are expecting. Please update your copy of Biopython, or report this issue if you are already using the latest version. (Exta args: -db_soft_mask,-seqidlist; Missing: ) ---------------------------------------------------------------------- From mjldehoon at yahoo.com Fri Aug 27 22:53:19 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 27 Aug 2010 19:53:19 -0700 (PDT) Subject: [Biopython-dev] Obsolete code In-Reply-To: Message-ID: <992835.52804.qm@web62406.mail.re1.yahoo.com> I applied your patch to Bio.Motif, and added DeprecationWarnings to each of the submodules of Bio.AlignAce (without them, importing one of the submodules directly did not issue the DeprecationWarning in Bio.AlignAce.__init__). Thanks, --Michiel. --- On Sat, 8/21/10, Bartek Wilczynski wrote: From: Bartek Wilczynski Subject: Re: [Biopython-dev] Obsolete code To: "Michiel de Hoon" Cc: biopython-dev at biopython.org Date: Saturday, August 21, 2010, 4:56 AM Hi, Great job. I just have a small comment on the Bio.AlignAce.Parser module. It is a part of the Bio.AlignAce "package" which was already deprecated? (see Bio.AlignAce.__init__.py). I think either there is no need to put an extra deprecation warning in Bio.AlignAce.Parser, or we should put it in all submodules of Bio.AlignAce (like Bio.AlignAce.Motif, etc.).
As for deprecating the old parsers in Bio.Motif, I've just looked through the code in Motif.__init__.py and realized that it was still using the old implementations... This small patch below should fix it (I don't want to push onto the main branch now, as it's frozen). Cheers Bartek
diff --git a/Bio/Motif/__init__.py b/Bio/Motif/__init__.py
index 4c7c19a..9ca0200 100644
--- a/Bio/Motif/__init__.py
+++ b/Bio/Motif/__init__.py
@@ -10,12 +10,12 @@ as well as methods for motif comparisons and motif searching in sequences.
 It also inlcudes functionality for parsing AlignACE and MEME programs
 """
 from _Motif import Motif
-from Parsers.AlignAce import AlignAceParser, CompareAceParser
-from Parsers.MEME import MEMEParser,MASTParser
+import Parsers.AlignAce
+import Parsers.MEME
 from Thresholds import ScoreDistribution
 
-_parsers={"AlignAce":AlignAceParser,
-          "MEME":MEMEParser
+_parsers={"AlignAce":Parsers.AlignAce.read,
+          "MEME":Parsers.MEME.read
           }
 
 def _from_pfm(handle):
@@ -75,7 +75,7 @@ def parse(handle,format):
         else: #we have a proper reader
             yield reader(handle)
     else: # we have a proper reader
-        for m in parser().parse(handle).motifs:
+        for m in parser(handle).motifs:
             yield m
 
 def read(handle,format):
On Sat, Aug 21, 2010 at 8:30 AM, Michiel de Hoon wrote: Dear all, The classes and modules listed below were declared obsolete in Biopython 1.54 or earlier, but do not yet raise a deprecation warning. Most of this functionality moved to a different module or was implemented differently. I suggest we add a DeprecationWarning to each of these before Biopython 1.55 final. The only tricky one is the .data property of Seq classes in Bio.Seq. Still, it might be good to add a DeprecationWarning there to make people aware that this property is obsolete. Any objections? --Michiel.
Bio.CelFile.CelParser Bio.CelFile.CelScanner Bio.CelFile.CelConsumer Bio.CelFile.CelRecord Bio.Align.MultipleSeqAlignment.get_column Bio.Align.Generic.Alignment Bio.Align.Generic.Alignment.get_seq_by_num Bio.AlignAce.Parser Bio.Blast.Applications.FastacmdCommandline Bio.Blast.Applications.BlastallCommandline Bio.Blast.Applications.BlastpgpCommandline Bio.Blast.Applications.RpsBlastCommandline Bio.Blast.NCBIStandalone.blastall Bio.Blast.NCBIStandalone.blastpgp Bio.Blast.NCBIStandalone.rpsblast (Bio.Blast.NCBIStandalone has been declared obsolete, but I guess some people may still be using the Blast plain-text output parser.) Bio.Clustalw Bio.Compass._Scanner Bio.Compass._Consumer Bio.Compass.RecordParser Bio.Compass.Iterator Bio.Emboss.Applications.EProtDistCommandline Bio.Emboss.Applications.ENeighborCommandline Bio.Emboss.Applications.EProtParsCommandline Bio.Emboss.Applications.EConsenseCommandline Bio.Emboss.Applications.ESeqBootCommandline Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer.xcentre Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer._set_xcentre Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer.ycentre Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer._set_ycentre Bio.Graphics.GenomeDiagram._Graph.centre Bio.Graphics.GenomeDiagram._Graph._set_centre Bio.Motif.Parsers.AlignAce.AlignAceConsumer Bio.Motif.Parsers.AlignAce.AlignAceParser Bio.Motif.Parsers.AlignAce.AlignAceScanner Bio.Motif.Parsers.AlignAce.CompareAceScanner Bio.Motif.Parsers.AlignAce.CompareAceConsumer Bio.Motif.Parsers.MEME.MEMEParser Bio.Motif.Parsers.MEME._MEMEScanner Bio.Motif.Parsers.MEME._MEMEConsumer Bio.Motif.Parsers.MEME._MASTConsumer Bio.Motif.Parsers.MEME.MASTParser Bio.Motif.Parsers.MEME._MASTScanner Bio.Motif.Parsers.MEME.MASTRecord Bio.PopGen.FDist.RecordParser Bio.PopGen.FDist._Scanner Bio.PopGen.FDist._RecordConsumer Bio.Seq.Seq.data Bio.SeqUtils.GC_Frame Bio.SeqUtils.fasta_uniqids Bio.SeqUtils.apply_on_multi_fasta 
Bio.SeqUtils.quicker_apply_on_multi_fasta Bio.UniGene.UnigeneSequenceRecord Bio.UniGene.UnigeneProtsimRecord Bio.UniGene.UnigeneSTSRecord Bio.UniGene.UnigeneRecord Bio.UniGene._RecordConsumer Bio.UniGene._Scanner Bio.UniGene.RecordParser Bio.UniGene.Iterator _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From mjldehoon at yahoo.com Fri Aug 27 23:21:26 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 27 Aug 2010 20:21:26 -0700 (PDT) Subject: [Biopython-dev] Obsolete code Message-ID: <380077.17575.qm@web62401.mail.re1.yahoo.com> > (without them, importing one of the submodules directly did not issue > the DeprecationWarning in Bio.AlignAce.__init__). I take that back ... it turned out that Python 2.7 silences DeprecationWarnings by default. These warnings can be switched on by the -Wd flag when starting Python. For Biopython 1.56, we should replace DeprecationWarnings by a Biopython-specific warning class. --Michiel.
From biopython at maubp.freeserve.co.uk Sat Aug 28 07:33:50 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 28 Aug 2010 12:33:50 +0100 Subject: [Biopython-dev] Test suite failure In-Reply-To: <829004.66089.qm@web62403.mail.re1.yahoo.com> References: <829004.66089.qm@web62403.mail.re1.yahoo.com> Message-ID: On Sat, Aug 28, 2010 at 2:41 AM, Michiel de Hoon wrote: > Dear all, > > I am getting the errors below when running the Biopython tests on > Mac OS X. This is with blast+ 2.2.23. The blast+ 2.2.24 installer > fails on Mac OS X, so I don't know if the same problem would > occur with that version. > > --Michiel My fault, that's a new argument added in 2.2.24+ which shouldn't be expected on older versions. I'll fix that (probably Monday).
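Peter's fix amounts to making the expected option list depend on the installed BLAST+ version. A minimal sketch of that idea; NEW_OPTIONS and both functions are hypothetical names, not the code in test_NCBI_BLAST_tools.py, and the only version fact they encode is the one stated here (-seqidlist is new in 2.2.24+).

```python
# Hypothetical table: option -> first BLAST+ release that offers it.
# Only -seqidlist is listed, because the thread says it is new in 2.2.24+;
# -db_soft_mask (also reported for tblastn/tblastx) is left out since the
# thread does not say when it was added.
NEW_OPTIONS = {"-seqidlist": (2, 2, 24)}


def parse_version(text):
    """Turn a BLAST+ version string such as '2.2.24+' into (2, 2, 24)."""
    return tuple(int(part) for part in text.strip().rstrip("+").split("."))


def extra_wrapper_args(wrapper_args, tool_args, tool_version):
    """Options the Python wrapper defines but the installed tool lacks.

    Options introduced after tool_version are skipped, so an older
    BLAST+ install is not reported as out of sync for lacking them.
    """
    installed = parse_version(tool_version)
    extra = set(wrapper_args) - set(tool_args)
    return sorted(
        arg for arg in extra
        if NEW_OPTIONS.get(arg, (0,)) <= installed  # (0,) means "always expected"
    )


if __name__ == "__main__":
    wrapper = ["-query", "-db", "-seqidlist"]
    older_tool = ["-query", "-db"]  # what an older blastn -help might list
    print(extra_wrapper_args(wrapper, older_tool, "2.2.23+"))  # []
    print(extra_wrapper_args(wrapper, older_tool, "2.2.24+"))  # ['-seqidlist']
```

Comparing versions as integer tuples avoids the string-comparison trap where "2.2.9" would sort after "2.2.24".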
Peter From mjldehoon at yahoo.com Sat Aug 28 08:21:31 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 28 Aug 2010 05:21:31 -0700 (PDT) Subject: [Biopython-dev] Deprecated code In-Reply-To: <207402.89678.qm@web62407.mail.re1.yahoo.com> Message-ID: <198514.66980.qm@web62407.mail.re1.yahoo.com> I finished adding the deprecation warnings and removing deprecated code as discussed, except for these three: Bio.Transcribe Bio.Translate BioSQL.BioSeqDatabase.open_database: driver="psycopg" option For the last one, I wasn't sure how to appropriately remove this option from the code. Maybe somebody more familiar with BioSQL can take care of this? For Bio.Transcribe and Bio.Translate, it turned out that Bio/Encodings/IUPACEncoding.py still uses these modules. I don't know if Bio.Encodings.IUPACEncoding is still being used. It's only imported from Bio.Alphabet.IUPAC, but it doesn't seem to be actually used there. Bio.Encodings itself is not being imported anywhere in Biopython. Can we declare Bio.Encodings obsolete? Or can we just remove this module together with Bio.Transcribe, Bio.Translate? --Michiel. --- On Sat, 8/21/10, Michiel de Hoon wrote: > From: Michiel de Hoon > Subject: [Biopython-dev] Deprecated code > To: biopython-dev at biopython.org > Date: Saturday, August 21, 2010, 2:56 AM > Dear all, > > Below are the modules and functions that were deprecated > (with a DeprecationWarning) in Biopython 1.51 or earlier, > which was released on August 17, 2009. Since that is more > than one year (and more than two releases) ago, we can > remove these from Biopython. Any objections? If not, I'll > send this list also to the user mailing list before removing > them. > > --Michiel.
> > Bio.Align.FormatConvert > Bio.Emboss.Applications.PrimerSearchCommandline.set_parameter > Bio.Entrez.efetch: rettype="genbank" option > Bio.Fasta > Bio.SCOP.Dom.Parser > Bio.SwissProt.SProt > Bio.Transcribe > Bio.Translate > BioSQL.BioSeqDatabase.open_database: driver="psycopg" > option From biopython at maubp.freeserve.co.uk Sat Aug 28 09:18:14 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 28 Aug 2010 14:18:14 +0100 Subject: [Biopython-dev] Deprecated code In-Reply-To: <198514.66980.qm@web62407.mail.re1.yahoo.com> References: <207402.89678.qm@web62407.mail.re1.yahoo.com> <198514.66980.qm@web62407.mail.re1.yahoo.com> Message-ID: On Sat, Aug 28, 2010 at 1:21 PM, Michiel de Hoon wrote: > I finished adding the deprecation warnings and removing deprecated > code as discussed, except for these three: > > Bio.Transcribe > Bio.Translate > BioSQL.BioSeqDatabase.open_database: driver="psycopg" option > > For last one, I wasn't sure how to appropriately remove this option from > the code. Maybe somebody more familiar with BioSQL can take care of this? I could do that - or maybe Cymon if he has time. > For Bio.Transcribe and Bio.Translate, it turned out that Bio/Encodings > /IUPACEncoding.py still use these modules. I don't know if > Bio.Encodings.IUPACEncoding is still being used. It's only imported > from Bio.Alphabet.IUPAC, but it doesn't seem to be actually used there. > Bio.Encodings itself is not being imported anywhere in Biopython. > > Can we declare Bio.Encodings obsolete? Or can we just remove this > module together with Bio.Transcribe, Bio.Translate?
This is also tied in with Bio.PropertyManager and thus Bio.utils - it is probably best to mark these and Bio.Encodings as obsolete, leave Bio.Transcribe and Bio.Translate as deprecated, and review this after Biopython 1.55 is out. Peter From mjldehoon at yahoo.com Sat Aug 28 10:19:53 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 28 Aug 2010 07:19:53 -0700 (PDT) Subject: [Biopython-dev] Deprecated code In-Reply-To: Message-ID: <258810.52757.qm@web62408.mail.re1.yahoo.com> --- On Sat, 8/28/10, Peter wrote: > This is also tied in with Bio.PropertyManager and thus > Bio.utils - it is probably best to mark these and Bio.Encodings > as obsolete, leave Bio.Transcribe and Bio.Translate as deprecated, > and review this after Biopython 1.55 is out. OK, done. I also applied Nathan's suggested fix for Bio.Entrez. --Michiel. From biopython at maubp.freeserve.co.uk Sat Aug 28 10:36:40 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 28 Aug 2010 15:36:40 +0100 Subject: [Biopython-dev] Test suite failure In-Reply-To: References: <829004.66089.qm@web62403.mail.re1.yahoo.com> Message-ID: On Sat, Aug 28, 2010 at 12:33 PM, Peter wrote: > On Sat, Aug 28, 2010 at 2:41 AM, Michiel de Hoon wrote: >> Dear all, >> >> I am getting the errors below when running the Biopython tests on >> Mac OS X. This is with blast+ 2.2.23. The blast+ 2.2.24 installer >> fails on Mac OS X, so I don't know if the same problem would >> occur with that version. >> >> --Michiel > > My fault, that's a new argument added in 2.2.24+ which shouldn't > be expected on older versions. I'll fix that (probably Monday). 
> Done, http://github.com/biopython/biopython/commit/176c277deca23657980001813f8f5315b52eb679 Peter From biopython at maubp.freeserve.co.uk Mon Aug 30 09:34:59 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 30 Aug 2010 14:34:59 +0100 Subject: [Biopython-dev] Trunk freeze for Biopython 1.55 Message-ID: On Thu, Aug 26, 2010 at 6:37 PM, Peter wrote: > On Wed, Aug 18, 2010 at 10:03 PM, Peter wrote: >> >> As mentioned earlier, epydoc is done, and I've also just done a news post: >> http://news.open-bio.org/news/2010/08/biopython-1-55-beta-released/ >> >> If there are any typos or other suggestions for improvement, please tell >> us. We can edit that page - and then turn it into an email to send out. >> >> This means the "trunk freeze" is over, but for the next week or so when >> we'll do the official release, let's focus on documentation and any bug fixes. >> [Keep new feature work only on branches please.] >> > > As discussed here, the plan is to do the final release on Monday or Tuesday > (30 or 31 August 2010), after a few deprecations/removals are done: > http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008194.html > http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008196.html > Hi all, Those deprecations have been done, and the BLAST+ unit test tweaked. Are there any further issues we need to address before doing the final release? Please speak up soon, otherwise I'll do the release tonight or tomorrow as planned. 
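On the deprecations: Michiel noted earlier in the thread that Python 2.7 hides DeprecationWarning by default (python -Wd turns it back on) and suggested a Biopython-specific warning class for 1.56. A minimal sketch of that idea, assuming the class derives from Warning rather than DeprecationWarning so the default filters do not hide it; ProjectDeprecationWarning and old_parser are illustrative names, not Biopython's actual API.

```python
import warnings


class ProjectDeprecationWarning(Warning):
    """Illustrative project-specific deprecation category.

    It derives from Warning, not DeprecationWarning, so the default filter
    that ignores DeprecationWarning does not apply to it.
    """


def old_parser(handle):
    """Stand-in for an obsolete function that is scheduled for removal."""
    warnings.warn(
        "old_parser is deprecated; please use the new read() function instead",
        ProjectDeprecationWarning,
        stacklevel=2,  # point the message at the caller's line, not this one
    )
    return handle


if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning in this block
        old_parser("dummy")
    print(caught[0].category.__name__, caught[0].message)
```

Users who want to silence it can still filter on the class, e.g. warnings.simplefilter("ignore", ProjectDeprecationWarning).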
Thanks, Peter From bugzilla-daemon at portal.open-bio.org Mon Aug 30 13:26:41 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 30 Aug 2010 13:26:41 -0400 Subject: [Biopython-dev] [Bug 3134] New: to_networkx returns weird stuff Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3134 Summary: to_networkx returns weird stuff Product: Biopython Version: 1.55b Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: john at nurfuerspam.de Hi, I tried to read http://www.phylosoft.org/archaeopteryx/examples/data/multiple_supports.xml and convert it using to_networkx(). Strangely, all nodes in the resulting graph are named Clade, and when using networkx.write_dot() I get a file with a single clade node, although the number of nodes in the graph object is correct. Also using networkx.to_agraph() does not help.
from Bio import Phylo
import networkx
tree = Phylo.read("multiple_support.xml", "phyloxml")
tree = Phylo.to_networkx(tree)
print(set(tree.nodes()))
print(tree.number_of_nodes())
networkx.write_dot(tree, "test.dot")
tree = networkx.to_agraph(tree)
tree.draw("tree.pdf", prog="dot")
For http://www.phylosoft.org/archaeopteryx/examples/data/bcl_2.xml I get a star tree with a single Clade node in the center and leaves labeled by gene names. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From cy at cymon.org Mon Aug 30 13:36:22 2010 From: cy at cymon.org (Cymon Cox) Date: Mon, 30 Aug 2010 18:36:22 +0100 Subject: [Biopython-dev] BioSQL test code on Postgres Message-ID: Hi Folks, The current test code in test_BioSQL.py fails on PostgreSQL: ERROR: Check list, keys, length etc ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/cymon/git/biopython-github-master/Tests/test_BioSQL.py", line 187, in test_get_db_items del db["non-existant-name"] File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 519, in __delitem__ if key not in self: File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 539, in __contains__ (self.dbid, value))[0]) File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 423, in execute_and_fetch_col0 self.execute(sql, args or ()) File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 404, in execute self.dbutils.execute(self.cursor, sql, args) File "/home/cymon/git/biopython-github-master/BioSQL/DBUtils.py", line 33, in execute cursor.execute(sql, args or ()) DataError: invalid input syntax for integer: "non-existant-name" LINE 1: ...M bioentry WHERE biodatabase_id=1 AND bioentry_id=E'non-exis... This is because, when trying to delete a bioentry_id that is a string (i.e. "non-existant-name", line 188 of test_BioSQL.py), PostgreSQL throws an error rather than returning a long (0 or 1) as SQLite does (and presumably MySQL; I haven't tried it). Should we be type checking in __delitem__ (line 517) in BioSeqDatabase.py so that trying to delete a bioentry_id that is a string raises an appropriate error? Otherwise the BioSQL tests pass on PostgreSQL.
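One possible answer to the type-checking question is to reject non-integer keys in Python before any SQL runs, which also makes deletion of a missing key raise KeyError the way a dict does. The sketch below uses the standard-library sqlite3 module as a stand-in for PostgreSQL; BioentryIds is a hypothetical class (not BioSQL's), the bioentry table is cut down to the two columns the queries need, and the placeholders follow sqlite3's ? style (psycopg2 uses %s). In both cases the driver binds the parameter values itself, so the placeholders are written without surrounding quotes.

```python
import sqlite3


class BioentryIds:
    """Hypothetical dict-like view of the bioentry_id values in one database.

    Not BioSQL's real class: a small stand-in showing where a type check
    could sit so that non-integer keys never reach the SQL server.
    """

    def __init__(self, conn, dbid):
        self.conn = conn
        self.dbid = dbid

    def __contains__(self, key):
        # bioentry_id is an integer column. PostgreSQL rejects a string
        # there with DataError, while SQLite simply finds no match, so
        # answer False for non-integers before running any SQL.
        if isinstance(key, bool) or not isinstance(key, int):
            return False
        count = self.conn.execute(
            "SELECT COUNT(*) FROM bioentry WHERE biodatabase_id = ? AND bioentry_id = ?",
            (self.dbid, key),
        ).fetchone()[0]
        return count > 0

    def __delitem__(self, key):
        if key not in self:
            raise KeyError(key)  # same behaviour as deleting a missing dict key
        self.conn.execute(
            "DELETE FROM bioentry WHERE biodatabase_id = ? AND bioentry_id = ?",
            (self.dbid, key),
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bioentry (bioentry_id INTEGER, biodatabase_id INTEGER)")
    conn.execute("INSERT INTO bioentry VALUES (1, 1)")
    ids = BioentryIds(conn, dbid=1)
    print(1 in ids, "non-existant-name" in ids)
```

Raising TypeError for non-integer keys would be an equally defensible choice; KeyError keeps the "behaves like a dictionary" goal discussed in this thread.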
The default DBDRIVER PostgreSQL driver in setup.py should be changed to "psycopg2". Cheers, Cymon From bugzilla-daemon at portal.open-bio.org Mon Aug 30 14:01:52 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 30 Aug 2010 14:01:52 -0400 Subject: [Biopython-dev] [Bug 3134] to_networkx returns weird stuff In-Reply-To: Message-ID: <201008301801.o7UI1qaS024296@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3134 ------- Comment #1 from eric.talevich at gmail.com 2010-08-30 14:01 EST ------- (In reply to comment #0) > Hi, > > I tried to read > http://www.phylosoft.org/archaeopteryx/examples/data/multiple_supports.xml > and convert it using to_networkx(). Strangely, all nodes in the resulting graph > are named Clade, and when using networkx.write_dot() I get a file with a single > clade node, although the number of nodes in the graph object is correct. Hello, Yes, that's how it works. The exported graph uses Clade objects as nodes, and the string representation of unnamed nodes is just the name of the class, Clade. This should still work OK for most NetworkX operations, but not the Graphviz-based ones. The Graphviz-based operations in NetworkX convert the nodes to strings, and then assume all identical strings refer to the same node, so you'll get a star graph whenever the internal nodes are unnamed. For drawing, try Biopython's Phylo.draw_graphviz instead -- it handles this naming issue safely:
>>> Phylo.draw_graphviz(my_tree, prog='neato')
You can fix the naming issue yourself by assigning unique names to all the internal clades:
>>> for i, clade in enumerate(my_tree.find_clades()):
...     if not clade.name:
...         clade.name = "Clade_%d" % i
Then networkx.write_dot should work better. Or, if you want to do something else involving Graphviz layout, you can look at the source for Phylo.draw_graphviz in the file Bio/Phylo/_utils.py.
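The collapse Eric describes can be reproduced without NetworkX or Graphviz. A stdlib sketch with a hypothetical stand-in Clade class (not Bio.Phylo's): keying nodes by str() merges all unnamed internal nodes into one, and the renaming loop from the comment above, with find_clades() replaced by a simple preorder walk, restores one key per node. Named leaves keep their names, since only clades without a name are touched.

```python
class Clade:
    """Tiny stand-in for a tree node (hypothetical; not Bio.Phylo's Clade)."""

    def __init__(self, name=None, clades=()):
        self.name = name
        self.clades = list(clades)

    def __str__(self):
        # As described above: an unnamed node prints as its class name.
        return self.name or type(self).__name__


def preorder(clade):
    """Yield every clade, parent before children."""
    yield clade
    for child in clade.clades:
        yield from preorder(child)


def edges(clade):
    """Yield (parent, child) pairs for the whole tree."""
    for child in clade.clades:
        yield clade, child
        yield from edges(child)


def nodes_seen_by_string(root):
    """Count nodes the way a serializer keyed on str(node) would see them."""
    return len({str(node) for pair in edges(root) for node in pair})


def name_unnamed_clades(root, template="Clade_%d"):
    """Give each unnamed clade a unique name, mirroring Eric's loop."""
    for i, clade in enumerate(preorder(root)):
        if not clade.name:
            clade.name = template % i


def example_tree():
    return Clade(clades=[Clade(clades=[Clade("A"), Clade("B")]),
                         Clade(clades=[Clade("C"), Clade("D")])])


if __name__ == "__main__":
    tree = example_tree()
    print(nodes_seen_by_string(tree))  # prints 5: three unnamed nodes merge
    name_unnamed_clades(tree)
    print(nodes_seen_by_string(tree))  # prints 7: one key per clade
```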
Is there anything else you'd like to see built into Bio.Phylo to make these operations easier? Thanks, -Eric -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Mon Aug 30 14:21:43 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 30 Aug 2010 19:21:43 +0100 Subject: [Biopython-dev] BioSQL test code on Postgres In-Reply-To: References: Message-ID: On Mon, Aug 30, 2010 at 6:36 PM, Cymon Cox wrote: > Hi Folks, > > The current test code in test_BioSQL.py fails on PostgreSQL; > > ERROR: Check list, keys, length etc > ---------------------------------------------------------------------- > Traceback (most recent call last): >  File "/home/cymon/git/biopython-github-master/Tests/test_BioSQL.py", line > 187, in test_get_db_items >    del db["non-existant-name"] >  File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", > line 519, in __delitem__ >    if key not in self: >  File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", > line 539, in __contains__ >    (self.dbid, value))[0]) >  File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", > line 423, in execute_and_fetch_col0 >    self.execute(sql, args or ()) >  File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", > line 404, in execute >    self.dbutils.execute(self.cursor, sql, args) >  File "/home/cymon/git/biopython-github-master/BioSQL/DBUtils.py", line 33, > in execute >    cursor.execute(sql, args or ()) > DataError: invalid input syntax for integer: "non-existant-name" > LINE 1: ...M bioentry WHERE biodatabase_id=1 AND bioentry_id=E'non-exis... > > Because when trying to delete a bioentry_id that is a string type, ie.
> "non-existant-name" (line 188 on test_BioSQL.py),  postgres throws an error > rather than returning a long (0,1) as in sqlite (and presumably MySQL (I > havent tried it)). This (test_get_db_items) is one of the unit tests added since Biopython 1.54, while I was working on making the BioSQL objects act more like dictionaries. I think the SQL statements for the __contains__ method (and others added recently) may need single quotes round the %s placeholders. Does that work? > Should we be type checking in __delitem__ (line 517) in BioSeqDatabase.py so > that trying to delete a bioentry_id that is a string throws an appropriate > error? > > Otherwise the BioSQL tests pass on PostGreSQL. > > The default DBDRIVER PostgreSQL driver in setup.py should be changed to > "pyscopg2" > > Cheers, Cymon Thanks, Peter From bugzilla-daemon at portal.open-bio.org Mon Aug 30 16:23:20 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 30 Aug 2010 16:23:20 -0400 Subject: [Biopython-dev] [Bug 3134] to_networkx returns weird stuff In-Reply-To: Message-ID: <201008302023.o7UKNK7v029152@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3134 ------- Comment #2 from john at nurfuerspam.de 2010-08-30 16:23 EST ------- thanx for the quick response! the problem is that the standard way using pylab produces ugly squares instead of arrowheads in the final layout. but more importantly, I want to perform complex graph operations on the tree using networkx and use Bio.Phylo really just as a means of parsing ;-) I think that when providing a function like to_networkx, it should behave in a manner the user of networkx expects. Why not just use a unique hashable identifier like integers as standard string representation for ALL nodes, and use graphviz'/networkx' label attribute for any name label the node might have?
Using the string representation of labeled leafs as identifiers in networkx is also dangerous, since they will be used as identifiers in graphviz and are subject to a number of restrictions (no whitespace etc.) I'd propose the following: in Clade, __repr__() should return the name of the node, if it has one, or a unique identifier like id() (the memory address) wrapped in double quotes to make it a valid graphviz identifier:
def __repr__(self):
    # Named clades use their name; unnamed ones get a quoted id() for Graphviz
    if self.name is not None:
        return self.name
    return '"%s"' % id(self)
your workaround by manually relabeling the clades also assigns the identifiers to the leafs, but there of course I want the species/gene label ;-) cheers, john -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Aug 30 21:43:17 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 30 Aug 2010 21:43:17 -0400 Subject: [Biopython-dev] [Bug 3134] to_networkx returns weird stuff In-Reply-To: Message-ID: <201008310143.o7V1hHQH005250@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3134 ------- Comment #3 from eric.talevich at gmail.com 2010-08-30 21:43 EST ------- (In reply to comment #2) > thanx for the quick response! > > the problem is that the standard way using pylab produces ugly squares instead > of arrow head in the final layout. True. Do you know a way to fix that from NetworkX/matplotlib, or is that the whole reason you're exporting to Graphviz? > but more importantly, I want to perform > complex graph operations on the tree using networkx and use Bio.Phylo really > just as a means of parsing ;-) Great, that's what it's there for. :) > I think that when providing a function like to_networkx, it should behave in a > manner the user of networkx expects.
Why not just use a unique hashable > identifier like integers as standard string representation for ALL nodes, and > use graphviz'/networkx' label attribute for any name label the node might have? OK, but wouldn't you want to be able to retrieve all of the original clade's data from any node in a networkx graph? Currently, the arrangement is: - Clade objects are the hashable object used for keys - Given a node in a networkx graph produced by to_networkx, you can uniquely locate that clade in the original tree using the tree.find_* methods -- it's still a valid target, and duplicate names aren't a problem - Other clade attributes, like taxonomy and bootstrap values, are also still available on the node - Serializing the graph nodes for Graphviz goes haywire, so we provide draw_graphviz as a workaround I think you're suggesting: - Use id(clade) or some arbitrary unique integer as keys - Attach the clade name, if available, to the networkx node as a label... right? How would I do this? - To keep other clade attributes with the node, maybe add them to the optional dictionary associated with each node, like we already do for branch colors and widths - At some point, generate a lookup table to associate the graph nodes' unique integer identifiers with the original clade objects -- or at least make this possible through another function - Serializing for Graphviz will work cleanly > Using the string representation of labeled leafs as identifiers in networkx is > also dangerous, since they will be used as identifiers in graphviz and underly > a number of restrictions (no whitespace etc.) Indeed, and as you've seen, the strings need to be unique. One alternative is to mimic Python's default repr() style for representing complex classes: '' But then, switching to the string name where clades do have the 'name' attribute set would be inconsistent. 
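John's proposal, as Eric restates it above (arbitrary unique integers as node keys, clade names attached as labels, and a lookup table back to the original clade objects), can be sketched without NetworkX installed. In this sketch the `Clade` class is a hypothetical stand-in for Bio.Phylo's clade objects that models only `name` and `clades`, and `tree_to_graph_data` is an illustrative helper, not an existing Bio.Phylo function.

```python
import itertools


class Clade:
    """Hypothetical stand-in for a Bio.Phylo clade: only a name and children."""

    def __init__(self, name=None, clades=None):
        self.name = name
        self.clades = clades or []


def tree_to_graph_data(root):
    """Return (edges, labels, lookup) using integer node keys.

    edges  -- list of (parent_key, child_key) tuples
    labels -- dict of node key -> clade name ("" for unnamed clades)
    lookup -- dict of node key -> original clade object
    """
    counter = itertools.count()
    edges, labels, lookup = [], {}, {}

    def visit(clade):
        key = next(counter)  # unique, hashable, and safe as a Graphviz node ID
        lookup[key] = clade
        labels[key] = clade.name if clade.name is not None else ""
        for child in clade.clades:
            edges.append((key, visit(child)))
        return key

    visit(root)
    return edges, labels, lookup


tree = Clade(clades=[Clade("A"), Clade(clades=[Clade("B"), Clade("C")])])
edges, labels, lookup = tree_to_graph_data(tree)
```

With NetworkX available, the graph could then be built by calling `add_node(key, label=labels[key])` for each key and `add_edges_from(edges)`, while `lookup[key]` recovers the full clade (with attributes such as branch lengths) that the integer key no longer carries. The trade-off Eric raises remains: the keys are only meaningful alongside the lookup table. The recursive walk is fine for typical trees but could hit Python's recursion limit on extremely deep ones.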
> I'd propose the following: in Clade, __repr__() should return the name of the > node, if it has one, or a unique identifier like id() (the memory adress) with > an additional "..." around them to make it a valid graphviz identifier.. > > def __repr__(self): > if self.name != None: > return self.name > else: > return "\""+str(id(self))+"\"" Remember that the NetworkX labels don't necessarily need to be the same as the string representation of clades in Bio.Phylo -- it's just convenient if they match. So __repr__ could be: While your function could be used to create labels in to_networkx. Thanks for your help, Eric -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From cy at cymon.org Tue Aug 31 05:11:08 2010 From: cy at cymon.org (Cymon Cox) Date: Tue, 31 Aug 2010 10:11:08 +0100 Subject: [Biopython-dev] BioSQL test code on Postgres In-Reply-To: References: Message-ID: Hi Peter, On 30 August 2010 19:21, Peter wrote: > On Mon, Aug 30, 2010 at 6:36 PM, Cymon Cox wrote: > > Hi Folks, > > > > The current test code in test_BioSQL.py fails on PostgreSQL; > > > > ERROR: Check list, keys, length etc > > ---------------------------------------------------------------------- > > Traceback (most recent call last): > > File "/home/cymon/git/biopython-github-master/Tests/test_BioSQL.py", > line > > 187, in test_get_db_items > > del db["non-existant-name"] > > File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", > > line 519, in __delitem__ > > if key not in self: > > File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", > > line 539, in __contains__ > > (self.dbid, value))[0]) > > File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", > > line 423, in execute_and_fetch_col0 > > self.execute(sql, args or ()) > > File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", 
> > line 404, in execute > > self.dbutils.execute(self.cursor, sql, args) > > File "/home/cymon/git/biopython-github-master/BioSQL/DBUtils.py", line > 33, > > in execute > > cursor.execute(sql, args or ()) > > DataError: invalid input syntax for integer: "non-existant-name" > > LINE 1: ...M bioentry WHERE biodatabase_id=1 AND > bioentry_id=E'non-exis... > > > > Because when trying to delete a bioentry_id that is a string type, ie. > > "non-existant-name" (line 188 on test_BioSQL.py), postgres throws an > error > > rather than returning a long (0,1) as in sqlite (and presumably MySQL (I > > havent tried it)). > > This (test_get_db_items) is once the unit tests added since Biopython 1.54, > while I was working on making the BioSQL objects act more like > dictionaries. > I think the SQL statements for the __contains__ method (and others added > recently) may need single quotes round the %s placeholders. Does that work? > Nope. The bioentry_id parameter is already being passed as a string - psycopg automatically converts python objects into SQL literals (see http://initd.org/psycopg/docs/usage.html#the-problem-with-the-query-parameters ). Here is the same error using the psql interface: biosqldb=# select count(bioentry_id) from bioentry where biodatabase_id=1 and bioentry_id='non-existant'; ERROR: invalid input syntax for integer: "non-existant" LINE 1: ...m bioentry where biodatabase_id=1 and bioentry_id='non-exist... 
biosqldb=# \d bioentry;
Table "public.bioentry"
Column | Type | Modifiers
----------------+------------------------+-------------------------------------------------------
bioentry_id | integer | not null default nextval('bioentry_pk_seq'::regclass)
Cheers, Cymon From biopython at maubp.freeserve.co.uk Tue Aug 31 06:38:37 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 31 Aug 2010 11:38:37 +0100 Subject: [Biopython-dev] BioSQL test code on Postgres In-Reply-To: References: Message-ID: On Tue, Aug 31, 2010 at 10:11 AM, Cymon Cox wrote: > Hi Peter, > > Nope. The bioentry_id parameter is already being passed as a string - > psycopg automatically converts python objects into SQL literals (see > http://initd.org/psycopg/docs/usage.html#the-problem-with-the-query-parameters > ). > > Here is the same error using the psql interface: > > biosqldb=# select count(bioentry_id) from bioentry where biodatabase_id=1 > and bioentry_id='non-existant'; > ERROR: invalid input syntax for integer: "non-existant" > ... > > Cheers, Cymon I think I get it now - the bioentry_id is an integer (in all the schemas), and PostgreSQL throws an error due to the type mismatch (we are comparing it to a string) while MySQL and SQLite just return no matches. How's this?: http://github.com/biopython/biopython/commit/050963bd3bbd6653101306eed9aab6c629cf9375 Peter From cy at cymon.org Tue Aug 31 06:43:23 2010 From: cy at cymon.org (Cymon Cox) Date: Tue, 31 Aug 2010 11:43:23 +0100 Subject: [Biopython-dev] BioSQL test code on Postgres In-Reply-To: References: Message-ID: Hi P., On 31 August 2010 11:38, Peter wrote: > On Tue, Aug 31, 2010 at 10:11 AM, Cymon Cox wrote: > > Hi Peter, > > > > Nope. The bioentry_id parameter is already being passed as a string - > > psycopg automatically converts python objects into SQL literals (see > > > http://initd.org/psycopg/docs/usage.html#the-problem-with-the-query-parameters > > ).
> > > > Here is the same error using the psql interface: > > > > biosqldb=# select count(bioentry_id) from bioentry where biodatabase_id=1 > > and bioentry_id='non-existant'; > > ERROR: invalid input syntax for integer: "non-existant" > > ... > > > > Cheers, Cymon > > I think I get it now - the bioentry_id is an integer (in all the schemas), > and PostgreSQL throws an error due to the type mismatch (we are > comparing it to a string) while MySQL and SQLite just return no > matches. How's this?: > > > http://github.com/biopython/biopython/commit/050963bd3bbd6653101306eed9aab6c629cf9375 > Sure - nice and simple. Or catching the exceptions, this'll work:
diff --git a/BioSQL/BioSeqDatabase.py b/BioSQL/BioSeqDatabase.py
index 45c0774..57d6ab9 100644
--- a/BioSQL/BioSeqDatabase.py
+++ b/BioSQL/BioSeqDatabase.py
@@ -533,9 +533,15 @@ class BioSeqDatabase:
         """Check if a primary (internal) id is this namespace (sub database)."""
         sql = "SELECT COUNT(bioentry_id) FROM bioentry " + \
               "WHERE biodatabase_id=%s AND bioentry_id=%s;"
-        return bool(self.adaptor.execute_and_fetch_col0(sql,
-                                                        (self.dbid, value))[0])
-
+        try:
+            return bool(self.adaptor.execute_and_fetch_col0(sql,(self.dbid, value))[0])
+        except (self.adaptor.conn.DataError,
+                self.adaptor.conn.DatabaseError), e:
+            if "invalid input syntax for integer" in e.__str__():
+                return False
+            else:
+                raise
+
     def __iter__(self):
         """Iterate over ids (which may not be meaningful outside this database)."""
         #TODO - Iterate over the cursor, much more efficient
With either correction, the test will pass with the PyGreSQL driver as well. Cheers, C. From biopython at maubp.freeserve.co.uk Tue Aug 31 07:07:33 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 31 Aug 2010 12:07:33 +0100 Subject: [Biopython-dev] BioSQL test code on Postgres In-Reply-To: References: Message-ID: On Tue, Aug 31, 2010 at 11:43 AM, Cymon Cox wrote: > Hi P., > > Sure - nice and simple. > > Or catching the exceptions, this'll work: > ...
> > With either correction, the test will pass with the PyGreSQL driver as well. > > Cheers, C. The exception approach looks fragile due to the error message check, so let's go with my commit - as you say, nice and simple. Thanks for checking this :) Peter From biopython at maubp.freeserve.co.uk Tue Aug 31 13:06:15 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 31 Aug 2010 18:06:15 +0100 Subject: [Biopython-dev] Trunk freeze for Biopython 1.55 In-Reply-To: References: Message-ID: On Mon, Aug 30, 2010 at 2:34 PM, Peter wrote: > > Hi all, > > Those deprecations have been done, and the BLAST+ unit test tweaked. > Are there any further issues we need to address before doing the final > release? Please speak up soon, otherwise I'll do the release tonight or > tomorrow as planned. We've sorted out the BioSQL on PostgreSQL problem now: http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008215.html I'm starting the release process now - just as NumPy 1.5 is released (their first release to support Python 2.7) so I should be able to do Windows installers for Biopython on Python 2.7 :) Peter From biopython at maubp.freeserve.co.uk Tue Aug 31 14:14:41 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 31 Aug 2010 19:14:41 +0100 Subject: [Biopython-dev] Trunk freeze for Biopython 1.55 In-Reply-To: References: Message-ID: On Tue, Aug 31, 2010 at 6:06 PM, Peter wrote: > > I'm starting the release process now - just as NumPy 1.5 is released (their > first release to support Python 2.7) so I should be able to do Windows > installers for Biopython on Python 2.7 :) > Binaries are up - Brad, could you do a basic sanity test then upload to PyPI please. I really should sort out an account on there for myself... I'll write up the announcement in an hour or two's time (other things to attend to first), unless anyone else would like to do it?
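The BioSQL problem sorted out here comes down to comparing the integer `bioentry_id` column with a non-numeric string: PostgreSQL rejects the query, while MySQL and SQLite just return no matches. One minimal way to guard a `__contains__`-style lookup is to coerce the key to an integer first and report "not present" if that fails. This is not the code from Peter's commit (which is not reproduced in the thread). It uses an in-memory SQLite table with a cut-down, assumed `bioentry` schema, and SQLite's `?` placeholders instead of the `%s` style used with the MySQL and PostgreSQL drivers. `contains_bioentry` is a hypothetical name.

```python
import sqlite3


def contains_bioentry(conn, dbid, value):
    """Return True if value is the bioentry_id of an entry in sub-database dbid."""
    # bioentry_id is an INTEGER column; PostgreSQL raises an error when it is
    # compared with a non-numeric string, so reject such keys before querying.
    try:
        bioentry_id = int(value)
    except (TypeError, ValueError):
        return False
    cursor = conn.execute(
        "SELECT COUNT(bioentry_id) FROM bioentry"
        " WHERE biodatabase_id=? AND bioentry_id=?",
        (dbid, bioentry_id),
    )
    return bool(cursor.fetchone()[0])


# Cut-down stand-in for the BioSQL bioentry table (assumed, for illustration)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, biodatabase_id INTEGER)")
conn.execute("INSERT INTO bioentry VALUES (1, 1)")
print(contains_bioentry(conn, 1, 1))                    # True
print(contains_bioentry(conn, 1, "non-existant-name"))  # False
```

Note that `int()` also accepts numeric strings such as `"1"` and truncates floats; a stricter variant could test `isinstance(value, int)` instead, depending on how callers are expected to pass keys.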
Peter From chapmanb at 50mail.com Tue Aug 31 14:33:34 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 31 Aug 2010 14:33:34 -0400 Subject: [Biopython-dev] Trunk freeze for Biopython 1.55 In-Reply-To: References: Message-ID: <20100831183334.GA31194@sobchak.mgh.harvard.edu> Peter; > > I'm starting the release process now - just as NumPy 1.5 is released (their > > first release to support Python 2.7) so I should be able to do Windows > > installers for Biopython on Python 2.7 :) Awesome. Thanks as always for all the hard work getting this together. Great to see a new release, and nice timing with NumPy. > Binaries are up - Brad, could you do a basic sanity test then upload to > PyPI please. I really should sort out an account on there for myself... Done. It's dead easy to do, and if you want to set up an account on PyPI and send me your username I can add you as an owner so you can upload them in the future if you want. Thanks again, Brad From p.j.a.cock at googlemail.com Tue Aug 31 19:00:37 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 1 Sep 2010 00:00:37 +0100 Subject: [Biopython-dev] Biopython 1.55 released Message-ID: Dear Biopythoneers, After the beta earlier this month (thank you to everyone who helped test this), we've just released Biopython 1.55. For full details see: http://news.open-bio.org/news/2010/08/biopython-1-55-released/ Note we are phasing out support for Python 2.4. We will continue to support it for at least one further release (i.e. Biopython 1.56). This could be delayed given feedback from our users (e.g. if this proves to be a problem in combination with other libraries or a popular Linux distribution). (At least) 12 people have contributed to this release, including 6 new people -
thank you all:
* Andres Colubri (first contribution)
* Carlos Rios Vera (first contribution)
* Claude Paroz (first contribution)
* Cymon Cox
* Eric Talevich
* Frank Kauff
* Joao Rodrigues (first contribution)
* Konstantin Okonechnikov (first contribution)
* Michiel de Hoon
* Nathan Edwards (first contribution)
* Peter Cock
* Tiago Antao
Source distributions and Windows installers are available from the downloads page on the Biopython website: http://www.biopython.org/wiki/Download As usual, feedback is most welcome on the mailing lists (or bugzilla). Regards, Peter P.S. You can follow Biopython on Twitter, http://twitter.com/biopython From mjldehoon at yahoo.com Sun Aug 1 15:14:23 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 1 Aug 2010 08:14:23 -0700 (PDT) Subject: [Biopython-dev] Python 3 and encoding for online resources In-Reply-To: Message-ID: <467239.37480.qm@web62408.mail.re1.yahoo.com> According to this post: http://stackoverflow.com/questions/1179305/expat-parsing-in-python-3 we need only one parser which always parses a byte stream. Bio.Entrez uses File.UndoHandle but just to look for potential errors in the first few lines when opening the Entrez url, which in my opinion we shouldn't be doing anyway since it's the parser's job to decide whether the input is well-formed. So I'd suggest to not use File.UndoHandle (at all), make sure our parser works with Python 3 byte streams, and ask users to open any downloaded Entrez XML files in binary mode. Is there a Biopython version (in trunk or otherwise) that is ready for Python 3? If so, I can have a look at the parser to see if it handles byte streams correctly. --Michiel.
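Michiel's point, that one XML parser can always be fed a byte stream, can be illustrated with the standard library's expat bindings: `ParseFile` pulls bytes from the handle and expat decodes them itself, using the encoding declared in the XML header (UTF-8 if none is given). This is a minimal sketch rather than the Bio.Entrez parser; `element_names` and the small eSearch-like sample document are made up for illustration.

```python
import io
import xml.parsers.expat


def element_names(handle):
    """Return the start-tag names, in document order, from a binary XML handle."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # Called once per start tag; attrs is a dict of attribute names to values
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    # ParseFile calls handle.read(n) repeatedly and expects bytes back
    parser.ParseFile(handle)
    return names


sample = io.BytesIO(
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<eSearchResult><Count>2</Count></eSearchResult>\n"
)
print(element_names(sample))  # prints ['eSearchResult', 'Count']
```

For a downloaded file this means opening it with `open(path, "rb")`; a text-mode handle is rejected by `ParseFile` on Python 3, because its `read()` returns `str` rather than `bytes`.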
--- On Tue, 7/27/10, Peter wrote: > From: Peter > Subject: [Biopython-dev] Python 3 and encoding for online resources > To: "Biopython-Dev Mailing List" > Date: Tuesday, July 27, 2010, 9:23 AM > Hi all, > > One of the remaining (pure python) problems with Biopython > under Python 3 relates to parsing online resources like > the > NCBI Entrez API or even Bio.ExPASy.get_sprot_raw(). > See for example test_SeqIO_online.py for a failure. > > In Python 2, urlopen from urlib or urllib2 would give a > string handle. In python 3, you get a bytes handle (not > a unicode handle and choosing the encoding is tricky): > http://docs.python.org/py3k/library/urllib.request.html > > In the case of resources like the NCBI and ExPASy we > should be able to assume an encoding (maybe UTF-8 > or Latin) for all the plain text output, while from > XML/HTML > there are ways for the data to specify this itself. > > I think we may need to transform the urllib bytes handle > into > a unicode string handle for parsing. One option would be > to > extend the Bio.File.UndoHandle class (or invent a > subclass) > which applies the decoding. This seems simple since > Bio.Entrez and Bio.ExPASy already use this class. > > Another option (which I suggested on the Bio.SeqIO.index() > thread [1]) would be to extend our parsers to cope with > both > byte and unicode handles. That could be a lot of work > though... > > Thoughts? 
> > Peter > > [1] http://lists.open-bio.org/pipermail/biopython-dev/2010-July/008004.html > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From biopython at maubp.freeserve.co.uk Sun Aug 1 17:54:03 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 1 Aug 2010 18:54:03 +0100 Subject: [Biopython-dev] Python 3 and encoding for online resources In-Reply-To: <467239.37480.qm@web62408.mail.re1.yahoo.com> References: <467239.37480.qm@web62408.mail.re1.yahoo.com> Message-ID: On Sun, Aug 1, 2010 at 4:14 PM, Michiel de Hoon wrote: > According to this post: > > http://stackoverflow.com/questions/1179305/expat-parsing-in-python-3 > > we need only one parser which always parses a byte stream. > Bio.Entrez uses File.UndoHandle but just to look for potential > errors in the first few lines when opening the Entrez url, which > in my opinion we shouldn't be doing anyway since it's the > parser's job to decide whether the input is well-formed. > So I'd suggest to not use File.UndoHandle (at all), ... I disagree. The NCBI return multiple different file formats, so there are multiple different parsers that may get an error page. Given the NCBI return HTML error pages regardless of what format the request was (XML, plain text, etc), I think we have to look for errors before giving the data to the parser. But that can be done using byte strings just as easily as with unicode strings. > make sure our parser works with Python 3 byte streams, and > ask users to open any downloaded Entrez XML files in binary > mode. That sounds workable. > Is there a Biopython version (in trunk or otherwise) that is ready > for Python 3? If so, I can have a look at the parser to see if it > handles byte streams correctly. The trunk itself -- after running 2to3 on it (as described in the README file). 
Or if you just want to grab some code for a quick play, I have a branch where I've been doing this on a semi-regular basis: http://github.com/peterjc/biopython/tree/auto2to3 Note that we are keeping the trunk as Python 2 code, which can make life interesting (another option would be a Python 3 branch, but we'd then need to manually keep things in sync). To make life a little easier, we are probably going to need some python 3 compatibility functions (like bytes as unicode, unicode as bytes - see the NumPy project for other possible examples), which we are currently doing on a module by module basis. Here I'm thinking specifically of some of the things required in Bio/SeqIO/SffIO.py, but there are other python 3 hacks we may want to standardise. For the C code (which we haven't looked at yet, setup.py is ignoring the extensions on Python 3 for now) we should be able to use the normal #ifdef approach. Again, we can learn a lot from looking at NumPy here. Peter From mjldehoon at yahoo.com Mon Aug 2 13:50:47 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Mon, 2 Aug 2010 06:50:47 -0700 (PDT) Subject: [Biopython-dev] Python 3 and encoding for online resources Message-ID: <932397.16483.qm@web62408.mail.re1.yahoo.com> > Or if you just want to grab some code for a quick play, >I have a branch where I've been doing this on a > semi-regular basis: > > http://github.com/peterjc/biopython/tree/auto2to3 Thanks! I used this branch to test the Bio.Entrez and Bio.SwissProt parsers. The Bio.Entrez Parser works as is; the Bio.SwissProt parser is really easy to fix (just convert each line into a plain string inside the _read function in Bio.SwissProt.__init__). Perhaps we can do something similar for the other test_SeqIO_online.py failures (the ones appearing in Bio/SeqIO/FastaIO.py)? > > So I'd suggest to not use File.UndoHandle (at all), > ... > I disagree.
The NCBI return multiple different file > formats, so there are multiple different parsers that may get > an error page. > > Given the NCBI return HTML error pages regardless of what > format the request was (XML, plain text, etc), I think we > have to look for errors before giving the data to the > parser. Part of the problem solves itself when we change to Python 3. In Python 3, urllib.request.urlopen raises a urllib.error.HTTPError in cases where urllib.urlopen in Python 2 raises no exception: mdehoon:~/Software/biopython2to3/peterjc-biopython-06c2ea6 $ python Python 2.7 (r27:82500, Jul 19 2010, 00:08:00) [GCC 4.0.1 (Apple Computer, Inc. build 5370)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import urllib >>> urllib.urlopen("http://www.biopython.org/somethingimadeup") > >>> mdehoon:~/Software/biopython2to3/peterjc-biopython-06c2ea6 $ python3 Python 3.1.2 (r312:79360M, Mar 24 2010, 01:33:18) [GCC 4.0.1 (Apple Inc. build 5493)] on darwin Type "help", "copyright", "credits" or "license" for more information. 
>>> import urllib.request >>> urllib.request.urlopen("http://www.biopython.org/somethingimadeup") Traceback (most recent call last): File "", line 1, in File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/urllib/request.py", line 121, in urlopen return _opener.open(url, data, timeout) File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/urllib/request.py", line 355, in open response = meth(req, response) File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/urllib/request.py", line 467, in http_response 'http', request, response, code, msg, hdrs) File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/urllib/request.py", line 393, in error return self._call_chain(*args) File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/urllib/request.py", line 327, in _call_chain result = func(*args) File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/urllib/request.py", line 475, in http_error_default raise HTTPError(req.full_url, code, msg, hdrs, fp) urllib.error.HTTPError: HTTP Error 404: Not Found >>> which means that we can catch at least some errors without having to actually read from the handle. A 414 Request-URI Too Large is also being caught. In this sense, urllib in Python 3 behaves as urllib2 in Python 2. I don't know though how to go about checking whether all HTTP errors we check for in Bio.Entrez are being caught (does anybody know a magical way to trigger a particular HTTP error?). Nevertheless, this avoids having to go through a File.UndoHandle, and is safer than checking the HTML / text response from NCBI (at least the "download dataset is empty" response from NCBI has already changed). So I would suggest to switch from urllib to urllib2 in Bio.Entrez and catch any HTTP errors (urllib2 is translated appropriately by 2to3), and to handle any bytes/utf8/ascii conversion inside the parser (as in Bio.SwissProt). --Michiel.
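The two positions in this exchange can be combined on a bytes handle. First, let `urlopen` raise `HTTPError` for error statuses, as Michiel shows for Python 3. Second, for error pages that might still arrive with a normal status (Peter's concern about NCBI's HTML error pages), peek at the first line as bytes before handing the data to a parser. This is a sketch under assumptions: `open_url`, `PeekableBytesHandle` and `check_not_html` are hypothetical names (the wrapper is not Bio.File.UndoHandle), and treating a leading `<html` or `<!doctype html` as an error page is a heuristic, not documented NCBI behaviour. The `opener` parameter exists only so the example can be exercised without network access.

```python
import io
import urllib.error
import urllib.request


def open_url(url, opener=urllib.request.urlopen):
    """Open url, turning HTTP error statuses into an OSError with a readable message."""
    try:
        return opener(url)
    except urllib.error.HTTPError as err:
        # err.code is the HTTP status (e.g. 404 or 414); err.reason is the message
        raise OSError("Request to %s failed with HTTP %d: %s"
                      % (url, err.code, err.reason)) from err


class PeekableBytesHandle:
    """Minimal binary handle wrapper that can look at a line without consuming it."""

    def __init__(self, handle):
        self._handle = handle
        self._saved = []  # lines read from the handle but not yet returned

    def peekline(self):
        if not self._saved:
            self._saved.append(self._handle.readline())
        return self._saved[0]

    def readline(self):
        if self._saved:
            return self._saved.pop(0)
        return self._handle.readline()

    def read(self):
        data = b"".join(self._saved) + self._handle.read()
        self._saved = []
        return data


def check_not_html(handle):
    """Raise ValueError if a PeekableBytesHandle starts like an HTML page."""
    start = handle.peekline().lstrip().lower()
    if start.startswith(b"<!doctype html") or start.startswith(b"<html"):
        raise ValueError("Server returned an HTML page instead of the requested data")
```

In use, this would look like `handle = PeekableBytesHandle(open_url(url))` followed by `check_not_html(handle)`, after which the untouched bytes can go to whichever parser matches the requested format. The check works on bytes, so no decoding decision has to be made before the parser sees the data.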
--- On Sun, 8/1/10, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] Python 3 and encoding for online resources > To: "Michiel de Hoon" > Cc: "Biopython-Dev Mailing List" > Date: Sunday, August 1, 2010, 1:54 PM > On Sun, Aug 1, 2010 at 4:14 PM, > Michiel de Hoon > wrote: > > According to this post: > > > > http://stackoverflow.com/questions/1179305/expat-parsing-in-python-3 > > > > we need only one parser which always parses a byte > stream. > > Bio.Entrez uses File.UndoHandle but just to look for > potential > > errors in the first few lines when opening the Entrez > url, which > > in my opinion we shouldn't be doing anyway since it's > the > > parser's job to decide whether the input is > well-formed. > > So I'd suggest to not use File.UndoHandle (at all), > ... > > I disagree. The NCBI return multiple different file > formats, so > there are multiple different parsers that may get an error > page. > Given the NCBI return HTML error pages regardless of what > format the request was (XML, plain text, etc), I think we > have to look for errors before giving the data to the > parser. > But that can be done using byte strings just as easily as > with > unicode strings. > > > make sure our parser works with Python 3 byte streams, > and > > ask users to open any downloaded Entrez XML files in > binary > > mode. > > That sounds workable. > > > Is there a Biopython version (in trunk or otherwise) > that is ready > > for Python 3? If so, I can have a look at the parser > to see if it > > handles byte streams correctly. > > The trunk itself -- after running 2to3 on it (as described > in the > README file). 
Or if you just want to grab some code for a > quick > play, I have a branch where I've been doing this on a > semi-regular > basis: > > http://github.com/peterjc/biopython/tree/auto2to3 > > Note that we are keeping the trunk as Python 2 code, which > can make like interesting (Another option would be a > Python > 3 branch, but we'd then need to manually keep things in > sync). > To make life a little easier, we are probably going to need > some > python 3 compatibility functions (like bytes as unicode, > unicode > as bytes - see the NumPy project for other possible > examples), > which we are currently doing on a module by module basis. > Here I'm thinking specifically of some of the things > required in > Bio/SeqIO/SffIO.py, but there are other python 3 hacks we > may > want to standardise. > > For the C code (which we haven't looked at yet, setup,py > is > ignoring the extensions on Python 3 for now) we should be > able to use the normal #ifdef approach. Again, we can > learn > a lot from looking at NumPy here. > > Peter > From biopython at maubp.freeserve.co.uk Mon Aug 2 14:04:49 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 2 Aug 2010 15:04:49 +0100 Subject: [Biopython-dev] Python 3 and encoding for online resources In-Reply-To: <932397.16483.qm@web62408.mail.re1.yahoo.com> References: <932397.16483.qm@web62408.mail.re1.yahoo.com> Message-ID: On Mon, Aug 2, 2010 at 2:50 PM, Michiel de Hoon wrote: >> Or if you just want to grab some code for a quick play, >>I have a branch where I've been doing this on a >> semi-regular basis: >> >> http://github.com/peterjc/biopython/tree/auto2to3 > > Thanks! I used this branch to test the Bio.Entrez and Bio.SwissProt parsers. > The Bio.Entrez Parser works as is; the Bio.SwissProt parser is really easy to > fix (just convert each line into a plain string inside the _read function in > Bio.SwissProt.__init__). 
Perhaps we can do something similar for the other > test_SeqIO_online.py failures (the ones appearing in Bio/SeqIO/FastaIO.py)? Maybe (replied in more detail below) >> > So I'd suggest to not use File.UndoHandle (at all), >> ... >> I disagree. The NCBI return multiple different file >> formats, so there are multiple different parsers that may get >> an error page. >> >> Given the NCBI return HTML error pages regardless of what >> format the request was (XML, plain text, etc), I think we >> have to look for errors before giving the data to the >> parser. > > Part of the problem solves itself when we change to Python 3. In Python > 3, urllib.request.urlopen raises a urllib.error.HTTPError in cases where > urllib.urlopen in Python 2 raises no exception: > > ... > > So I would suggest to switch from urllib to urllib2 in Bio.Entrez and catch > any HTTP errors (urllib2 is translated appropriately by 2to3), That sounds very sensible. > ... and to handle any bytes/utf8/ascii conversion inside the parser > (as in Bio.SwissProt). i.e. Make the SwissProt, FASTA, etc parsers cope with unicode string handles (default from open on Python 3) and bytes handles (network handles or from file open in binary mode)? 
I think this is probably a worthwhile thing to do in any case, especially for the indexing code, see: http://lists.open-bio.org/pipermail/biopython-dev/2010-July/008011.html Peter From bugzilla-daemon at portal.open-bio.org Mon Aug 2 14:21:40 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 2 Aug 2010 10:21:40 -0400 Subject: [Biopython-dev] [Bug 3119] Bio.Nexus can't parse file from Prank 100701 (1st July 2010) In-Reply-To: Message-ID: <201008021421.o72ELesu027221@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3119 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-02 10:21 EST ------- Ari has released PRANK v100802 (2 August 2010) which fixes the NEXUS output problems identified (unquoted taxa names containing punctuation, extra comma in translate block). With Frank's small fix for the tree, we can now parse the latest PRANK output http://github.com/biopython/biopython/commit/f4b0007d29fdd878e4cc326b12e63e833e246ce4 Marking as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Mon Aug 2 17:22:44 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 2 Aug 2010 18:22:44 +0100 Subject: [Biopython-dev] EMBOSS SAM/BAM parser and reverse strand reads Message-ID: Hi all, One of my immediate questions on learning that EMBOSS 6.3.1 had SAM/BAM support was how it handled reads mapped to the reverse strand: http://lists.open-bio.org/pipermail/emboss-dev/2010-July/000656.html > What do you do about the strand issue? SAM/BAM stored reads > which map onto the reverse strand in reverse complement. 
If > you want to get back to the original orientation for output as > FASTQ you must apply the reverse complement (plus reverse > the quality scores too of course). As I suspected, currently EMBOSS ignores this and gives the sequence and quality string as it is stored in the SAM/BAM file. Here are three consecutive entries from the example SAM file, http://pysam.googlecode.com/hg/tests/ex1.sam.gz ... EAS54_65:6:115:538:276 163 chr1 209 99 35M = 360 186 TATTTGTAATGAAAACTATATTTATGCTATTCAGT <<<<<<<<;<<<;;<<<;<:<<<:<<<<<<;;;7; MF:i:18 Aq:i:75 NM:i:0 UQ:i:0 H0:i:1 H1:i:0 EAS219_FC30151:7:51:1429:1043 83 chr1 209 99 35M = 59 -185 TATTTGTAATGAAAACTATATTTATGCTATTCAGT 9<5<<<<<<<<<<<<<9<<<9<<<<<<<<<<<<<< MF:i:18 Aq:i:68 NM:i:0 UQ:i:0 H0:i:1 H1:i:0 EAS114_30:1:176:168:513 163 chr1 210 99 35M = 410 235 ATTTGTAATGAAAACTATATTTATGCTATTCAGTT <<<<;<<<<<<<<<<<<<<<<<<<:&<<<<:;0;; MF:i:18 Aq:i:71 NM:i:0 UQ:i:0 H0:i:1 H1:i:0 ... The middle read of this triple, EAS219_FC30151:7:51:1429:1043, maps to chr1 on the reverse strand - we known this from the flag value 83. Note 83 = 1 + 2 + 16 + 64, or in hex, 0x53 = 0x40 + 0x10 + 0x02 + 0x01. Referring to the SAM/BAM specification, 0x01 - read is paired 0x02 - read is in a proper pair 0x10 - mapped to reverse strand 0x40 - first read in the pair This is the FASTQ output via seqret from SAM or BAM using EMBOSS 6.3.1 with the previously discussed patches: @EAS54_65:6:115:538:276 TATTTGTAATGAAAACTATATTTATGCTATTCAGT + <<<<<<<<;<<<;;<<<;<:<<<:<<<<<<;;;7; @EAS219_FC30151:7:51:1429:1043 TATTTGTAATGAAAACTATATTTATGCTATTCAGT + 9<5<<<<<<<<<<<<<9<<<9<<<<<<<<<<<<<< @EAS114_30:1:176:168:513 ATTTGTAATGAAAACTATATTTATGCTATTCAGTT + <<<<;<<<<<<<<<<<<<<<<<<<:&<<<<:;0;; Notice that all three read sequence and quality strings match the SAM file. On the other hand, this is from my experimental branch for Biopython, converting SAM/BAM to FASTQ: ... 
@EAS54_65:6:115:538:276/2
TATTTGTAATGAAAACTATATTTATGCTATTCAGT
+
<<<<<<<<;<<<;;<<<;<:<<<:<<<<<<;;;7;
@EAS219_FC30151:7:51:1429:1043/1
ACTGAATAGCATAAATATAGTTTTCATTACAAATA
+
<<<<<<<<<<<<<<9<<<9<<<<<<<<<<<<<5<9
@EAS114_30:1:176:168:513/2
ATTTGTAATGAAAACTATATTTATGCTATTCAGTT
+
<<<<;<<<<<<<<<<<<<<<<<<<:&<<<<:;0;;
...

Ignore for the moment the fact that I'm adding /1 and /2 suffixes to the read names for the first and second (forward and reverse) reads in a pair. Notice that for the middle read (which is mapped to the reverse strand) I am deliberately returning the reverse complement of the sequence, with the quality string reversed.

I'd like to propose that EMBOSS also reverse complement the sequence (and reverse the quality string) for those reads mapped to the reverse strand. This is essential for the use case of converting SAM/BAM to get the *original* unmapped reads. This applies regardless of the output format: FASTA, FASTQ, or unaligned SAM/BAM (since for now EMBOSS does not output aligned SAM/BAM).

Given I think there are problems with the SAM/BAM parsing in EMBOSS 6.3.1 which will require a patch or point release anyway, I don't think we need to worry about this change breaking backwards compatibility (as long as this is done as part of the first bug fix update). However, this isn't my decision of course ;)

To elaborate, the reason I am acutely aware of this issue is that it has bitten me already. I had some (large) SAM/BAM files from a collaborator for paired end transcriptome data mapped onto a draft genome. Due to the file sizes we didn't want to transfer the original FASTQ files over the internet as well. When I wanted to remap the reads to a different reference, I instead extracted the reads from the SAM/BAM files. Initially I converted from SAM to FASTQ using sed (and also in Python as a check) without being aware of the reverse strand issue...

There could be some valid reasons the current EMBOSS behaviour is useful, but right now I can't think of any. Any suggestions?

Regards,

Peter C.
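For illustration, here is a minimal Python sketch of the conversion argued for above: decode the FLAG bits and, for reads with 0x10 set, reverse complement the sequence and reverse the quality string before writing FASTQ. It assumes tab-separated SAM text lines (the archive shows spaces) and plain ACGTN sequences; the function names are hypothetical, and this is not the code from the experimental Biopython branch.

```python
# SAM FLAG bits discussed above (values from the SAM/BAM specification).
FLAG_PAIRED = 0x01       # read is paired
FLAG_PROPER_PAIR = 0x02  # read is in a proper pair
FLAG_REVERSE = 0x10      # read is mapped to the reverse strand
FLAG_FIRST = 0x40        # first read in the pair
FLAG_SECOND = 0x80       # second read in the pair

# Complement table for plain DNA letters (IUPAC ambiguity codes not handled).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def original_read(name, flag, seq, qual):
    """Return (name, seq, qual) in the read's original sequencing orientation.

    SAM/BAM stores reads mapped to the reverse strand as their reverse
    complement, so undo that: reverse complement the sequence and reverse
    the quality string. The /1 and /2 suffixes mimic the branch output above.
    """
    if flag & FLAG_FIRST:
        name += "/1"
    elif flag & FLAG_SECOND:
        name += "/2"
    if flag & FLAG_REVERSE:
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    return name, seq, qual


def sam_line_to_fastq(line):
    """Convert one tab-separated SAM alignment line into a FASTQ record."""
    fields = line.rstrip("\n").split("\t")
    # Mandatory SAM columns used here: QNAME=0, FLAG=1, SEQ=9, QUAL=10.
    name, seq, qual = original_read(fields[0], int(fields[1]), fields[9], fields[10])
    return "@%s\n%s\n+\n%s\n" % (name, seq, qual)
```

Applied to the middle record above (flag 83), this gives the name suffix /1 and the sequence ACTGAATAGCATAAATATAGTTTTCATTACAAATA, matching the Biopython branch output; reads without the 0x10 bit pass through unchanged.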
From bugzilla-daemon at portal.open-bio.org Mon Aug 2 18:12:57 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Aug 2010 14:12:57 -0400
Subject: [Biopython-dev] [Bug 3127] New: SeqIO.write appends text to fasta comments
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=3127

Summary: SeqIO.write appends text to fasta comments
Product: Biopython
Version: 1.54
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: minor
Priority: P2
Component: Other
AssignedTo: biopython-dev at biopython.org
ReportedBy: jared.ackers at smithsdetection.com

When using the following SeqIO command:

    SeqIO.write(SeqIO.parse("file.txt", "tab"), "file.fas", "fasta")

SeqIO will append the text " " to every sequence ID in the output file. The input file has two tab-delimited columns, the first with a (custom) sequence ID and the second with the corresponding sequence.

From bugzilla-daemon at portal.open-bio.org Mon Aug 2 18:50:50 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Aug 2010 14:50:50 -0400
Subject: [Biopython-dev] [Bug 3127] Set SeqRecord description in SeqIO "tab" parser
In-Reply-To:
Message-ID: <201008021850.o72IoopW009476@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3127

biopython-bugzilla at maubp.freeserve.co.uk changed:

What    |Removed                     |Added
----------------------------------------------------------------------------
Summary |SeqIO.write appends text to |Set SeqRecord description in
        |fasta comments              |SeqIO "tab" parser

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-02 14:50 EST -------
The problem isn't really in Bio.SeqIO.write(); it is with the SeqRecord default and/or the "tab" parser. Retitling bug...
In the "tab" file format there is no description, so you are getting the SeqRecord's default description. We'd recently talked about making this just an empty string; alternatively, and with less risk, the "tab" parser could set the description to "" explicitly.

From biopython at maubp.freeserve.co.uk Tue Aug 3 14:07:40 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 3 Aug 2010 15:07:40 +0100
Subject: [Biopython-dev] Python 3 and encoding for online resources
In-Reply-To:
References: <932397.16483.qm@web62408.mail.re1.yahoo.com>
Message-ID:

Peter wrote:
>Michiel wrote:
>> So I would suggest to switch from urllib to urllib2 in Bio.Entrez and catch
>> any HTTP errors (urllib2 is translated appropriately by 2to3),
>
> That sounds very sensible.
>

Hi Michiel,

I see you've switched from urllib to urllib2, but you also removed all the NCBI-specific error handling (which it turns out would need to be updated).

I just tried a simple history example and if you deliberately use a wrong webenv you get an HTML error page back (from memory and the comments in our code, it used to be a plain text error page):

    Error occurred: Unable to obtain query #1


    • db=pubmed
    • query_key=1
    • report=medline
    • dispstart=0
    • dispmax=10
    • mode=text
    • WebEnv=wrong

    pmfetch need params:

  • (id=NNNNNN[,NNNN,etc]) or (query_key=NNN, where NNN - number in the history, 0 - clipboard content for current database)
  • db=db_name (mandatory)
  • report=[docsum, brief, abstract, citation, medline, asn.1, mlasn1, uilist, sgml, gen] (Optional; default is asn.1)
  • mode=[html, file, text, asn.1, xml] (Optional; default is html)
  • dispstart - first element to display, from 0 to count - 1, (Optional; default is 0)
  • dispmax - number of items to display (Optional; default is all elements, from dispstart)

  • See help.

The old code could handle this just by looking for "Error occurred".

Anyway, this demonstrates that we can't just assume any error will be handled by the NCBI as an HTTP error code and thus get turned into an exception automatically by urllib2. In this particular case, one might argue the NCBI should use HTTP status code 400 Bad Request.

I think we should write some online tests for Bio.Entrez including error conditions like this.

In a related example, I'm trying out a sleep statement between my ESearch and EFetch calls in order to let the session time out. I'll post back once I know what it does, but I'll be pleasantly surprised if they do something like HTTP status code 410 Gone; I'm expecting another HTML error page.

Regards,

Peter

From mjldehoon at yahoo.com Tue Aug 3 15:44:49 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 3 Aug 2010 08:44:49 -0700 (PDT)
Subject: [Biopython-dev] Python 3 and encoding for online resources
In-Reply-To:
Message-ID: <987321.1607.qm@web62405.mail.re1.yahoo.com>

Have you tried looking at handle.info(), where handle is the handle returned by urllib.urlopen()? Another candidate is handle.getcode().

Otherwise, we could try to contact NCBI to see if their error messages can be returned in a standard format, or at least in a format consistent with the request.

Otherwise, we can also consider not to parse the HTML error message; the SeqIO/Entrez parsers will notice a format problem and raise an exception anyway.

--Michiel.

From biopython at maubp.freeserve.co.uk Tue Aug 3 16:16:44 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 3 Aug 2010 17:16:44 +0100
Subject: [Biopython-dev] Python 3 and encoding for online resources
In-Reply-To: <987321.1607.qm@web62405.mail.re1.yahoo.com>
References: <987321.1607.qm@web62405.mail.re1.yahoo.com>
Message-ID:

On Tue, Aug 3, 2010 at 4:44 PM, Michiel de Hoon wrote:
> Have you tried looking at handle.info(), where handle is the handle
> returned by urllib.urlopen()? Another candidate is handle.getcode().

In the case of using the history support with a bad webenv, we get an HTML error page with HTTP status code 200 (OK), which explains why urllib doesn't raise an exception (see the example in my previous email).

In the case of using the history support with an invalid integer query key, we get an HTML error page with HTTP status code 200 (OK), e.g.

    Error occurred: Unable to obtain query #123456789

    ...

In the case of using the history support with a non-integer query key, we also get an HTML error page with HTTP status code 200 (OK), e.g.

    Error occurred: NCBI C++ Exception: Error: CORELIB(CStringException::eConvert) "/pubmed_gen/rbuild/version/20100419.1/entrez/c++/src/corelib/ncbistr.cpp", line 666: ncbi::NStr::StringToInt8() --- Cannot convert string 'wrong' to Int8 (m_Pos = 0)

    ...

It puzzles me that they are still using HTTP status code 200 (OK) here.

> Otherwise, we could try to contact NCBI to see if their error messages
> can be returned in a standard format, or at least in a format consistent
> with the request.

This is definitely worth trying. Additionally we should also ask them about making more use of HTTP error codes like 400 when serving an error page.

Would you like to email the NCBI Entrez team about this (and CC me please)?

> Otherwise, we can also consider not to parse the HTML error message;
> the SeqIO/Entrez parsers will notice a format problem and raise an
> exception anyway.

As things stand, with the NCBI returning 200 (OK) HTML error messages, I'm not comfortable with this. It will break the use case of a batch download script which writes the data direct to disk without parsing it (or giving it to another tool as input). I believe the earlier we can catch any NCBI error messages the better, even if it does require some messy peeping at the data via a buffered handle.

Thanks,

Peter

From mjldehoon at yahoo.com Wed Aug 4 09:19:45 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 4 Aug 2010 02:19:45 -0700 (PDT)
Subject: [Biopython-dev] Python 3 and encoding for online resources
In-Reply-To:
Message-ID: <764813.13018.qm@web62401.mail.re1.yahoo.com>

Can you give an example script where you get an HTML error page? In the cases I've tried, the metadata revealed that an error had occurred, even if urllib2.urlopen didn't raise an HTTP error but returned a handle to XML containing the error message.

--Michiel.

From mjldehoon at yahoo.com Wed Aug 4 13:29:04 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 4 Aug 2010 06:29:04 -0700 (PDT)
Subject: [Biopython-dev] Python 3 and encoding for online resources
In-Reply-To:
Message-ID: <891819.37186.qm@web62408.mail.re1.yahoo.com>

--- On Wed, 8/4/10, Peter wrote:
> > In the cases I've tried, the metadata revealed that an error had occurred,
> > even if urllib2.urlopen didn't raise an HTTP error but returned a handle to
> > XML containing the error message.
>
> What meta data?
>

I was looking at handle.info(), where handle is the handle returned by urllib2.urlopen. But in your example, the information in handle.info() did not reveal a difference between a successful search and an unsuccessful one, so anyway this won't work in general.
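A minimal Python 3 sketch of the early check argued for in this thread: read the first couple of kilobytes of the response, raise if they contain the "Error occurred" marker seen on the NCBI HTML error pages quoted above, and otherwise return a handle that replays those bytes, so the data can still be parsed or written straight to disk. Because these pages arrive with HTTP status 200 (OK), neither a status check nor handle.info() would catch them. The class and function names, the 2048-byte window, and reliance on that marker text are all assumptions for illustration, not the Bio.Entrez API.

```python
import io

# Marker seen at the top of the NCBI HTML error pages quoted in this thread;
# an assumption that may not hold for every NCBI error page.
ERROR_MARKER = b"Error occurred"


class NCBIErrorPage(ValueError):
    """Hypothetical exception for an HTML error page served with HTTP 200."""


class _ReplayRaw(io.RawIOBase):
    """Raw binary stream that replays already-read bytes, then the rest."""

    def __init__(self, head, handle):
        self._head = head
        self._handle = handle

    def readable(self):
        return True

    def readinto(self, buffer):
        if self._head:
            # Serve the peeked bytes first so no data is lost.
            n = min(len(buffer), len(self._head))
            buffer[:n] = self._head[:n]
            self._head = self._head[n:]
            return n
        data = self._handle.read(len(buffer))
        buffer[:len(data)] = data
        return len(data)

    def close(self):
        self._handle.close()
        super().close()


def check_for_ncbi_error(handle, peek_size=2048):
    """Peek at the start of a binary handle and raise if it is an error page.

    Returns a buffered handle positioned at the very start of the data.
    """
    head = handle.read(peek_size)
    if ERROR_MARKER in head:
        raise NCBIErrorPage(head.decode("latin-1"))
    return io.BufferedReader(_ReplayRaw(head, handle))
```

Replaying the peeked bytes plays the same role as the File.UndoHandle discussed earlier in the thread, but it works on the bytes handle that Python 3's urllib returns, leaving any decoding to the caller.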
Btw, this is an error message nowadays returned by epost:

>>> handle = Entrez.epost(db="nothing")
>>> handle.read()
'\n\n\n\tInvalid db name specified: nothing\n\n'
>>>

Previously, the same request gave a clean error message in XML format (see epost2.xml in Tests/Entrez).

--Michiel.

From n.j.loman at bham.ac.uk Wed Aug 4 14:48:53 2010
From: n.j.loman at bham.ac.uk (Nick Loman)
Date: Wed, 04 Aug 2010 15:48:53 +0100
Subject: [Biopython-dev] Newbler ACE file to SAM?
Message-ID: <4C597DD5.9060604@bham.ac.uk>

Hi biopython-developers,

Has anyone written any code to convert ACE files (Newbler ACE, in particular) to SAM format?

I've seen a little bit of discussion on this subject in various places:
http://seqanswers.com/forums/showthread.php?t=5138
http://biostar.stackexchange.com/questions/1828/how-to-convert-newbler-output-or-ace-to-sam-format

It seems that a quick way of doing this would involve Biopython's support for reading ACE and (perhaps) PySam's support for writing SAM files (http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/contents.html)

The reason I would prefer to convert ACE files rather than the 454PairAlign.txt file is that this would mean support for both de novo assemblies and mapping projects. I am not particularly au fait with the SAM format but can't see why that shouldn't work.

If no-one has started writing something I would be happy to give it a go, but equally if someone has I'd be more than happy to try it out :)

Cheers

Nick

From bioinformed at gmail.com Wed Aug 4 16:13:09 2010
From: bioinformed at gmail.com (Kevin Jacobs )
Date: Wed, 4 Aug 2010 12:13:09 -0400
Subject: [Biopython-dev] Newbler ACE file to SAM?
In-Reply-To: <4C597DD5.9060604@bham.ac.uk>
References: <4C597DD5.9060604@bham.ac.uk>
Message-ID:

On Wed, Aug 4, 2010 at 10:48 AM, Nick Loman wrote:
> Hi biopython-developers,
>
> Has anyone written any code to convert ACE files (Newbler ACE, in
> particular) to SAM format?
>
>

Hi Nick,

I have a converter that uses the 454PairAlign.txt format to convert to SAM/BAM as part of the GLU package (http://code.google.com/p/glu-genetics). Their ACE files are a bit problematic, though I do not remember the exact reasons offhand. I'll revisit the issue, since the alignment records are only half of the conversion: most folks also want untrimmed reads and quality scores.

Aside from the input format, the only difficulty with my converter is the dozen or so annoying prerequisite packages to install to use it (Python, HDF5, pytables, numpy, scipy, ply, biopython, etc. etc.).

I also know that the Roche/454 folks are adding SAM/BAM support to a future version of Newbler, but I wouldn't expect to see that for at least a few more months.

-Kevin

From n.j.loman at bham.ac.uk Wed Aug 4 16:18:40 2010
From: n.j.loman at bham.ac.uk (Nick Loman)
Date: Wed, 04 Aug 2010 17:18:40 +0100
Subject: [Biopython-dev] Newbler ACE file to SAM?
In-Reply-To:
References: <4C597DD5.9060604@bham.ac.uk>
Message-ID: <4C5992E0.8040904@bham.ac.uk>

Kevin Jacobs wrote:
> I have a converter that uses the 454PairAlign.txt format to convert to
> SAM/BAM as part of the GLU package
> (http://code.google.com/p/glu-genetics). Their ACE files are a bit
> problematic, though I do not remember the exact reasons offhand. I'll
> revisit the issue, since the alignment records are only half of the
> conversion, since most folks also want untrimmed reads and quality scores.
>
> Aside from the input format, the only difficulty with my converter are
> the dozen or so annoying pre-requisite packages to install to use it
> (Python, HDF5, pytables, numpy, scipy, ply, biopython, etc. etc.).

Hi Kevin

Thanks for your email. I was aware of glu-genetics and will give it a whirl.
The main reason I wanted an ACE file converter is that I mistakenly thought that the de novo component of Newbler won't produce the 454PairAlign.txt file, but reading the manual I see that file can be produced by supplying the -p (or -pt, for tab delimited output) option. But as you say, getting quality scores would be useful so I would be interested to know any progress you might make with an ACE converter. Cheers Nick From bioinformed at gmail.com Wed Aug 4 16:23:23 2010 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Wed, 4 Aug 2010 12:23:23 -0400 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: <4C5992E0.8040904@bham.ac.uk> References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 12:18 PM, Nick Loman wrote: > Kevin Jacobs wrote: > >> I have a converter that uses the 454PairAlign.txt format to convert to >> SAM/BAM as part of the GLU package (http://code.google.com/p/glu-genetics). >> Their ACE files are a bit problematic, though I do not remember the exact >> reasons offhand. I'll revisit the issue, since the alignment records are >> only half of the conversion, since most folks also want untrimmed reads and >> quality scores. >> >> Aside from the input format, the only difficulty with my converter are the >> dozen or so annoying pre-requisite packages to install to use it (Python, >> HDF5, pytables, numpy, scipy, ply, biopython, etc. etc.). >> > Hi Kevin > > Thanks for your email. I was aware of glu-genetics and will give it a > whirl. The main reason I wanted an ACE file converter is that I mistakenly > thought that the de novo component of Newbler won't produce the > 454PairAlign.txt file, but reading the manual I see that file can be > produced by supplying the -p (or -pt, for tab delimited output) option. But > as you say, getting quality scores would be useful so I would be interested > to know any progress you might make with an ACE converter. 
>
>

Hi Nick,

I may have misled you-- I use the 454PairAlign.txt and SFF files together to generate SAM/BAM files with full untrimmed read data and quality values. My recollection was that the Newbler ACE files contained only the consensus sequence and not the individual reads, which is why I didn't go down that road. I routinely use "-noace" so I'm quickly realigning a small dataset to generate an example ACE file to verify this. If I am incorrect and alignment information is indeed available from the ACE files, I'll happily add support for them to my converter.

-Kevin

From biopython at maubp.freeserve.co.uk Wed Aug 4 16:24:18 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 4 Aug 2010 17:24:18 +0100
Subject: [Biopython-dev] Newbler ACE file to SAM?
In-Reply-To: <4C597DD5.9060604@bham.ac.uk>
References: <4C597DD5.9060604@bham.ac.uk>
Message-ID:

On Wed, Aug 4, 2010 at 3:48 PM, Nick Loman wrote:
> Hi biopython-developers,
>
> Has anyone written any code to convert ACE files (Newbler ACE, in
> particular) to SAM format?
>
> I've seen a little bit of discussion on this subject in various places:
> http://seqanswers.com/forums/showthread.php?t=5138
> http://biostar.stackexchange.com/questions/1828/how-to-convert-newbler-output-or-ace-to-sam-format
>
> It seems that a quick way of doing this would involve Biopython's support
> for reading ACE and (perhaps) PySam's support for writing SAM files
> (http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/contents.html)
>
> The reason I would prefer to convert ACE files to the 454PairAlign.txt file
> as this would mean support for both de novo assemblies as well as mapping
> projects. I am not particularly au fait with the SAM format but can't see
> why that shouldn't work.
>
> If no-one has started writing something I would be happy to give it a go but
> equally if someone has I'd be more than happy to try it out :)
>
> Cheers
>
> Nick

I've done ACE to SAM as an experiment, but haven't given the paired end stuff much testing (my usual assembly viewer Tablet doesn't support paired reads yet). Would you like the script? As you suggested, it uses Biopython's ACE parser but writes SAM output directly since it is very simple.

Peter

From biopython at maubp.freeserve.co.uk Wed Aug 4 16:27:59 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 4 Aug 2010 17:27:59 +0100
Subject: [Biopython-dev] Newbler ACE file to SAM?
In-Reply-To:
References: <4C597DD5.9060604@bham.ac.uk>
Message-ID:

On Wed, Aug 4, 2010 at 5:24 PM, Peter wrote:
>
> I've done ACE to SAM as an experiment, but haven't given the paired end
> stuff much testing (my usual assembly viewer Tablet doesn't support paired
> reads yet). Would you like the script? As you suggested it uses Biopython's
> ACE parser but write SAM output directly since it is very simple.
>

Sorry - I spoke too quickly. I wrote a MAF (MIRA assembly format) to SAM converter as an experiment, not an ACE to SAM converter. The reason for this was that ACE files don't have the read qualities, but MAF files do.

Peter

From n.j.loman at bham.ac.uk Wed Aug 4 16:32:44 2010
From: n.j.loman at bham.ac.uk (Nick Loman)
Date: Wed, 04 Aug 2010 17:32:44 +0100
Subject: [Biopython-dev] Newbler ACE file to SAM?
In-Reply-To:
References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk>
Message-ID: <4C59962C.2010105@bham.ac.uk>

Kevin Jacobs wrote:
> I may have mislead you-- I use the 454PairAlign.txt and SFF files
> together to generate SAM/BAM files with full untrimmed read data and
> quality values. My recollection was that the Newbler ACE files
> contained only the consensus sequence and not the individuals reads,
> which is why I didn't go down that road.
I routinely use "-noace" so > I'm quickly realigning a small dataset to generate an example ACE file > to verify this. If I am incorrect and alignment information is indeed > available from the ACE files, I'll happily add support for them to my > converter. Hi Kevin I'm pretty sure the ACE files contain the individual reads (or at the least, the trimmed, aligned portions of them) because this is the file one uses in Consed/Tablet to view an assembly. We may of course be talking at cross-purposes! Cheers Nick From n.j.loman at bham.ac.uk Wed Aug 4 16:35:51 2010 From: n.j.loman at bham.ac.uk (Nick Loman) Date: Wed, 04 Aug 2010 17:35:51 +0100 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> Message-ID: <4C5996E7.8010807@bham.ac.uk> Peter wrote: > Sorry - I spoke to quickly. I wrote a MAF (MIRA assembly format) to SAM > converter as an experiment, not an ACE to SAM converter. The reason > for this was ACE files don't have the read qualities, but MAF files do. > Hi Peter Ah, I see - perhaps you could post this script somewhere just for educational purposes? I guess I should be able to get my work done by using Kevin's glu-genetics script and making my Newbler assemblies output the 454PairAlign.txt file using the -p option. Not sure if there's scope for tighter integration to Biopython but will leave that to you experts! Cheers Nick From bioinformed at gmail.com Wed Aug 4 17:00:50 2010 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Wed, 4 Aug 2010 13:00:50 -0400 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: <4C5996E7.8010807@bham.ac.uk> References: <4C597DD5.9060604@bham.ac.uk> <4C5996E7.8010807@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 12:35 PM, Nick Loman wrote: > Peter wrote: > >> Sorry - I spoke to quickly. I wrote a MAF (MIRA assembly format) to SAM >> converter as an experiment, not an ACE to SAM converter. 
>> The reason for this was ACE files don't have the read qualities, but MAF files do.
>>
>>
> [...]
> Not sure if there's scope for tighter integration to Biopython but will
> leave that to you experts!
>
>

Unlike much of my other code, there is no dependency on pysam, so there is no reason why biopython couldn't adopt my converter -- I'd certainly be happy to donate it. I'm just not sure if there is a good place for it.

-Kevin

From biopython at maubp.freeserve.co.uk Wed Aug 4 17:01:36 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 4 Aug 2010 18:01:36 +0100
Subject: [Biopython-dev] Newbler ACE file to SAM?
In-Reply-To: <4C59962C.2010105@bham.ac.uk>
References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk>
Message-ID:

On Wed, Aug 4, 2010 at 5:32 PM, Nick Loman wrote:
>
> Kevin Jacobs wrote:
>>
>> I may have mislead you-- I use the 454PairAlign.txt and SFF files together
>> to generate SAM/BAM files with full untrimmed read data and quality values.
>> My recollection was that the Newbler ACE files contained only the consensus
>> sequence and not the individuals reads, which is why I didn't go down that
>> road. I routinely use "-noace" so I'm quickly realigning a small dataset to
>> generate an example ACE file to verify this. If I am incorrect and
>> alignment information is indeed available from the ACE files, I'll happily
>> add support for them to my converter.
>
> Hi Kevin
>
> I'm pretty sure the ACE files contain the individual reads (or at the least,
> the trimmed, aligned portions of them) because this is the file one uses in
> Consed/Tablet to view an assembly.

Yes, they do. But ACE files lack the quality scores for the reads (they just have quality scores for the consensus) which are required for SAM or BAM. You'd have to insert dummy values or get them from another file - Kevin says he takes them from the SFF file.

>
> We may of course be talking at cross-purposes!
>

Maybe :)

Peter

From biopython at maubp.freeserve.co.uk Wed Aug 4 17:03:19 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 4 Aug 2010 18:03:19 +0100
Subject: [Biopython-dev] Newbler ACE file to SAM?
In-Reply-To:
References: <4C597DD5.9060604@bham.ac.uk> <4C5996E7.8010807@bham.ac.uk>
Message-ID:

On Wed, Aug 4, 2010 at 6:00 PM, Kevin Jacobs wrote:
> On Wed, Aug 4, 2010 at 12:35 PM, Nick Loman wrote:
>> [...]
>> Not sure if there's scope for tighter integration to Biopython but will
>> leave that to you experts!
>>
>>
> Unlike much of my other code, there is no dependency on pysam, so there is
> no reason why biopython couldn't adopt my converter -- I'd certainly be
> happy to donate it. I'm just not sure if there is a good place for it.

Is it a standalone script using Biopython to parse ACE and SFF? Maybe we can include it in the scripts folder - at the very least you could add a link to it on http://www.biopython.org/wiki/Scriptcentral

Peter

From n.j.loman at bham.ac.uk Wed Aug 4 17:05:25 2010
From: n.j.loman at bham.ac.uk (Nick Loman)
Date: Wed, 04 Aug 2010 18:05:25 +0100
Subject: [Biopython-dev] Newbler ACE file to SAM?
In-Reply-To:
References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk>
Message-ID: <4C599DD5.2010308@bham.ac.uk>

Peter wrote:
> Yes, they do. But ACE files lack the quality scores for the reads (they
> just have quality scores for the consensus) which are required for SAM
> or BAM. You'd have to insert dummy values or get them from another
> file - Kevin says he takes them from the SFF file.
>

Hi Peter

Right, that makes sense. In that case it should be possible to convert ACE (when accompanied by the SFF files), as an alternative to using 454PairAlign.txt as the input file.
Certainly the ACE file contains the unique 454 read identifiers that would make it possible to pull read qualities from the SFF, although you would have to watch out for the Newbler partial read alignments (READ_ID.NtoN style) Cheers Nick From bioinformed at gmail.com Wed Aug 4 17:07:36 2010 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Wed, 4 Aug 2010 13:07:36 -0400 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> <4C5996E7.8010807@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 1:03 PM, Peter wrote: > On Wed, Aug 4, 2010 at 6:00 PM, Kevin Jacobs > wrote: > > On Wed, Aug 4, 2010 at 12:35 PM, Nick Loman > wrote: > >> [...] > >> Not sure if there's scope for tighter integration to Biopython but will > >> leave that to you experts! > >> > >> > > Unlike much of my other code, there is no dependency on pysam, so there > is > > no reason why biopython couldn't adopt my converter -- I'd certainly be > > happy to donate it. I'm just not sure if there is a good place for it. > > Is it a stand alone script using Biopython to parse ACE and SFF? > Maybe we can include it in the scripts folder - at the very least you could > add a link to it on http://www.biopython.org/wiki/Scriptcentral > > The code is here: http://code.google.com/p/glu-genetics/source/browse/glu/modules/seq/Newbler2SAM.py There are some fairly simple dependencies on GLU libraries and an optional Cython accelerator for CIGAR and NM computation, but otherwise it is fairly easy to make it stand alone. -Kevin From biopython at maubp.freeserve.co.uk Wed Aug 4 17:10:51 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 4 Aug 2010 18:10:51 +0100 Subject: [Biopython-dev] Newbler ACE file to SAM? 
In-Reply-To: <4C599DD5.2010308@bham.ac.uk> References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> <4C599DD5.2010308@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 6:05 PM, Nick Loman wrote: > Peter wrote: >> >> Yes, they do. But ACE files lack the quality scores for the reads (they >> just have quality scores for the consensus) which are required for SAM >> or BAM. You'd have to insert dummy values or get them from another >> file - Kevin says he takes them from the SFF file. >> > > Hi Peter > > Right, that makes sense. In which case it should be possible to convert ACE > (when accompanied by the SFF files), as an alternative to using > 454PairAlign.txt as the input file. Certainly the ACE file contains the > unique 454 read identifiers that would make it possible to pull read > qualities from the SFF, although you would have to watch out for the Newbler > partial read alignments (READ_ID.NtoN style) > > Cheers > > Nick Or ask your Roche representatives to implement SAM/BAM output? Mind you, as the MIRA3 author has pointed out, SAM/BAM isn't so great for de novo assemblies. It doesn't even store the contig seq ;) Peter From bioinformed at gmail.com Wed Aug 4 17:11:28 2010 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Wed, 4 Aug 2010 13:11:28 -0400 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: <4C599DD5.2010308@bham.ac.uk> References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> <4C599DD5.2010308@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 1:05 PM, Nick Loman wrote: > Peter wrote: > >> Yes, they do. But ACE files lack the quality scores for the reads (they >> just have quality scores for the consensus) which are required for SAM >> or BAM. You'd have to insert dummy values or get them from another >> file - Kevin says he takes them from the SFF file. >> >> > > Right, that makes sense. 
In which case it should be possible to convert ACE > (when accompanied by the SFF files), as an alternative to using > 454PairAlign.txt as the input file. Certainly the ACE file contains the > unique 454 read identifiers that would make it possible to pull read > qualities from the SFF, although you would have to watch out for the Newbler > partial read alignments (READ_ID.NtoN style) > > In that case, I'm happy to add support for those ACE files. My code already handles the NtoN trimming from the 454PairAlign files. If you'd like to send me (off list) a small example ACE file, I can likely have it working very quickly. -Kevin From bioinformed at gmail.com Wed Aug 4 17:12:02 2010 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Wed, 4 Aug 2010 13:12:02 -0400 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> <4C599DD5.2010308@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 1:10 PM, Peter wrote: > Or ask your Roche representatives to implement SAM/BAM output? > Mind you, as the MIRA3 author has pointed out, SAM/BAM isn't so > great for de novo assemblies. It doesn't even store the contig seq ;) > > They're working on it. :) -Kevin From biopython at maubp.freeserve.co.uk Wed Aug 4 17:24:37 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 4 Aug 2010 18:24:37 +0100 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> <4C599DD5.2010308@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 6:12 PM, Kevin Jacobs wrote: > On Wed, Aug 4, 2010 at 1:10 PM, Peter wrote: >> >> Or ask your Roche representatives to implement SAM/BAM output? >> Mind you, as the MIRA3 author has pointed out, SAM/BAM isn't so >> great for de novo assemblies. It doesn't even store the contig seq ;) >> > > They're working on it. 
:) > -Kevin > Do you mean Roche are working on SAM/BAM output? If so, that's good news. If you mean the SAM/BAM format, I'm on the samtools-devel list where they are discussing several specification improvements/additions, but I don't recall anything specifically aimed at de novo assemblies. Peter From bioinformed at gmail.com Wed Aug 4 17:32:22 2010 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Wed, 4 Aug 2010 13:32:22 -0400 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> <4C599DD5.2010308@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 1:24 PM, Peter wrote: > On Wed, Aug 4, 2010 at 6:12 PM, Kevin Jacobs wrote: > > On Wed, Aug 4, 2010 at 1:10 PM, Peter wrote: > >> > >> Or ask your Roche representatives to implement SAM/BAM output? > >> Mind you, as the MIRA3 author has pointed out, SAM/BAM isn't so > >> great for de novo assemblies. It doesn't even store the contig seq ;) > >> > > > > They're working on it. :) > > -Kevin > > > > Do you mean Roche are working on SAM/BAM output? If so, that's good news. Yes, I believe that Roche is working on SAM/BAM support for a future version of Newbler. As of a few weeks ago, when I last spoke to them, they were just gathering information and had yet to start on the implementation. I'd expect to see this feature with Newbler 2.6 (supporting the same features as 2.5 on the FLX, which was for the Jr only) or more likely with the subsequent 2.7 release. In other words, I'd be surprised to see it in the coming weeks or few months. > If you mean the SAM/BAM format, I'm on the samtools-devel list where > they are discussing several specification improvements/additions, but > I don't recall anything specifically aimed at de novo assemblies. > > I didn't see the Roche folks in that discussion, but I'll look again. As far as I know, they're not looking to change or alter the spec, but I could easily be wrong.
I do keep in contact with a few of the folks at Roche, but have no deep insight into their future plans. -Kevin From bugzilla-daemon at portal.open-bio.org Wed Aug 4 18:00:37 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Aug 2010 14:00:37 -0400 Subject: [Biopython-dev] [Bug 3130] New: Broken links in Documentation to NCBI Blast Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3130 Summary: Broken links in Documentation to NCBI Blast Product: Biopython Version: Not Applicable Platform: All URL: http://www.biopython.org/DIST/docs/tutorial/Tutorial.html OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Documentation AssignedTo: biopython-dev at biopython.org ReportedBy: mphillip at vt.edu The links to NCBI Blast in 7.1 of the Biopython Tutorial and Cookbook, http://www.biopython.org/DIST/docs/tutorial/Tutorial.html, are broken (they lead to HTTP 404 pages): http://www.ncbi.nlm.nih.gov/BLAST/blast_program.html should possibly be http://www.ncbi.nlm.nih.gov/BLAST/blast_program.shtml http://www.ncbi.nlm.nih.gov/BLAST/blast_databases.html should possibly be http://www.ncbi.nlm.nih.gov/BLAST/blast_databases.shtml -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bioinformed at gmail.com Wed Aug 4 20:41:06 2010 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Wed, 4 Aug 2010 16:41:06 -0400 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: <4C59962C.2010105@bham.ac.uk> References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 12:32 PM, Nick Loman wrote: > I'm pretty sure the ACE files contain the individual reads (or at the > least, the trimmed, aligned portions of them) because this is the file one > uses in Consed/Tablet to view an assembly.
We may of course be talking at > cross-purposes! > > Hi Nick, I've reviewed the Newbler ACE files and re-discovered the reason why they weren't ideal in the first place: the alignment records in Newbler's output are gapped based on a pseudo-multiple-alignment of all of the reads to the reference, not a standard pairwise alignment. So there is no easy way to differentiate which gaps in each read were introduced as part of the pairwise alignment or as artifacts of the multi-way alignment. This means I'd need to re-compute the alignment to the reference, but that should be relatively easy with a round of the standard Smith-Waterman algorithm, since the aligned start position is known. In other words, it is technically possible to use Newbler's ACE files, but it really is simpler and easier to use the 454PairAlign.txt results. More so because the 454PairAlign.txt files are often vastly smaller than 454Contig.ace files. On the other hand, it should be easy to adapt my scripts to convert non-Newbler ACE files to SAM/BAM provided that the reads are gapped for pairwise alignment. It has been so long since I've used consed/phred/phrap that I don't remember if this is how it is normally done. -Kevin From bioinformed at gmail.com Wed Aug 4 20:44:25 2010 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Wed, 4 Aug 2010 16:44:25 -0400 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> Message-ID: On Wed, Aug 4, 2010 at 4:41 PM, Kevin Jacobs < bioinformed at gmail.com> wrote: > On Wed, Aug 4, 2010 at 12:32 PM, Nick Loman wrote: > >> I'm pretty sure the ACE files contain the individual reads (or at the >> least, the trimmed, aligned portions of them) because this is the file one >> uses in Consed/Tablet to view an assembly. We may of course be talking at >> cross-purposes!
>> >> > I've reviewed the Newbler ACE files and re-discovered the reason why they > weren't ideal in the first place: > Never mind, I didn't realize the consensus sequence was gapped, so it is then trivial to recover the original pairwise alignments. I'll have a version of my Newbler2SAM module that can process ACE files shortly. -Kevin From bugzilla-daemon at portal.open-bio.org Wed Aug 4 21:14:30 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Aug 2010 17:14:30 -0400 Subject: [Biopython-dev] [Bug 3130] Broken links in Documentation to NCBI Blast In-Reply-To: Message-ID: <201008042114.o74LEUCs030000@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3130 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-04 17:14 EST ------- Confirmed. I wonder why the NCBI changed this without putting redirects in place? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
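Kevin's observation that the Newbler consensus is itself padded is what makes the pairwise alignment recoverable: columns where both the consensus and the read carry a pad can simply be dropped. The following is a minimal sketch of that idea, not Kevin's Newbler2SAM code. It assumes `*` as the pad character (as in ACE files), a read sequence already clipped to its aligned region, and a known 0-based padded start column on the consensus. It also computes SAM's NM tag (mismatches plus inserted and deleted bases).

```python
def padded_to_cigar(padded_consensus, padded_read, offset):
    """Return (pos, cigar, nm) for one read from a padded ACE-style alignment.

    padded_consensus: consensus sequence with '*' pads
    padded_read: aligned, clipped part of the read with '*' pads
    offset: 0-based column in padded_consensus where the read starts
    pos is the 1-based SAM POS on the unpadded consensus.
    """
    ops = []  # run-length list of [operation, length]
    nm = 0
    for i, read_base in enumerate(padded_read):
        cons_base = padded_consensus[offset + i]
        if cons_base == "*" and read_base == "*":
            continue  # pad shared with other reads: not part of this pairwise alignment
        if cons_base == "*":
            op = "I"  # base present in the read only
            nm += 1
        elif read_base == "*":
            op = "D"  # base present in the consensus only
            nm += 1
        else:
            op = "M"  # aligned column, match or mismatch
            if cons_base.upper() != read_base.upper():
                nm += 1
        if ops and ops[-1][0] == op:
            ops[-1][1] += 1
        else:
            ops.append([op, 1])
    cigar = "".join("%d%s" % (length, op) for op, length in ops)
    # Count real (non-pad) consensus bases before the read starts.
    pos = len(padded_consensus[:offset].replace("*", "")) + 1
    return pos, cigar, nm


print(padded_to_cigar("ACGTACGT", "ACCT*CGT", 0))  # (1, '4M1D3M', 2)
```

Soft clipping of the unaligned read ends and the read qualities are left out here; both would be needed for a complete SAM record.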
From bugzilla-daemon at portal.open-bio.org Fri Aug 6 19:05:30 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Aug 2010 15:05:30 -0400 Subject: [Biopython-dev] [Bug 3109] Record class in Bio.SCOP.Cla has hierarchy member as list instead of dictionary In-Reply-To: Message-ID: <201008061905.o76J5UTt014104@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3109 jeffrey.finkelstein at gmail.com changed:

What                         | Removed | Added
----------------------------------------------------------------------------
Attachment #1522 is obsolete | 0       | 1

------- Comment #2 from jeffrey.finkelstein at gmail.com 2010-08-06 15:05 EST ------- Created an attachment (id=1538) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1538&action=view) Fix bug #3109: replace Bio.SCOP.Cla.Record.hierarchy list with dictionary Updated patch for bug #3109, without removed trailing newlines -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
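Bug 3109 is about exposing the last column of SCOP's parseable classification file (dir.cla.scop.txt) as a mapping rather than a list of pairs. The sketch below does this with the standard library only, so it does not depend on Bio.SCOP.Cla or the proposed patch. It assumes the tab-separated six-column layout (sid, PDB ID, residues, sccs, sunid, hierarchy) described in the SCOP release notes, and the sample line in the test is illustrative. As an example of a more complex selection, it collects every family within one superfamily.

```python
def parse_hierarchy(field):
    """Turn 'cl=46456,cf=46457,...' into {'cl': 46456, 'cf': 46457, ...}."""
    hierarchy = {}
    for item in field.split(","):
        key, value = item.split("=")
        hierarchy[key] = int(value)
    return hierarchy


def parse_cla_line(line):
    """Split one non-comment dir.cla.scop.txt line into a small dict."""
    sid, pdbid, residues, sccs, sunid, hierarchy = line.rstrip("\n").split("\t")
    return {
        "sid": sid,
        "pdbid": pdbid,
        "residues": residues,
        "sccs": sccs,
        "sunid": int(sunid),
        "hierarchy": parse_hierarchy(hierarchy),
    }


def families_in_superfamily(lines, sf_sunid):
    """Return the set of family sunids ('fa') under one superfamily ('sf')."""
    families = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        record = parse_cla_line(line)
        if record["hierarchy"].get("sf") == sf_sunid:
            families.add(record["hierarchy"]["fa"])
    return families
```

With a dictionary, each level is a single lookup such as `record["hierarchy"]["sf"]` instead of a scan over `(key, value)` pairs.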
From bugzilla-daemon at portal.open-bio.org Fri Aug 6 19:51:40 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Aug 2010 15:51:40 -0400 Subject: [Biopython-dev] [Bug 3109] Record class in Bio.SCOP.Cla has hierarchy member as list instead of dictionary In-Reply-To: Message-ID: <201008061951.o76JpesH015503@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3109 jeffrey.finkelstein at gmail.com changed:

What    | Removed | Added
----------------------------------------------------------------------------
URL     | http://github.com/jfinkels/biopython/commit/6d2257dd0c46abdf1ecd14b8bc660e32a205630a | http://github.com/jfinkels/biopython/commit/e52255f06c08aad1556eb518e2e0da8f030765ff
Version | 1.54b   | 1.54

------- Comment #3 from jeffrey.finkelstein at gmail.com 2010-08-06 15:51 EST ------- (In reply to comment #2) > Created an attachment (id=1538) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1538&action=view) [details] > Fix bug #3109: replace Bio.SCOP.Cla.Record.hierarchy list with dictionary > > Updated patch for bug #3109, without removed trailing newlines > I have updated the patch for this bug so that it no longer removes the trailing newlines. I will open a new bug for that change. The fix can be found at my personal github fork of Biopython: http://github.com/jfinkels/biopython/commit/e52255f06c08aad1556eb518e2e0da8f030765ff Here is a before/after demonstration of usage of the code which this patch affects.
Currently, to select, for example, all PDB chains in superfamily 50156, one must use something similar to the following code:

    import Bio.SCOP.Cla

    SCOP_CLA_FILE = 'dir.cla.scop.txt_1.75'

    records = []
    with open(SCOP_CLA_FILE, 'r') as f:
        for record in Bio.SCOP.Cla.parse(f):
            # hierarchy is a list of (key, value) pairs, so it must be scanned
            for key, value in record.hierarchy:
                if key == 'sf' and value == 50156:
                    records.append(record)
    print([record.residues.pdbid for record in records])

With this patch, the hierarchy key/value pairs can be accessed like a dictionary:

    import Bio.SCOP.Cla

    SCOP_CLA_FILE = 'dir.cla.scop.txt_1.75'

    with open(SCOP_CLA_FILE, 'r') as f:
        # hierarchy is a dict, so the superfamily is a direct lookup
        records = [record for record in Bio.SCOP.Cla.parse(f)
                   if record.hierarchy['sf'] == 50156]
    print([record.residues.pdbid for record in records])

The benefit is greater with more complex selections of sets of chains (for example, to select all families within a superfamily). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From jeffrey.finkelstein at gmail.com Fri Aug 6 20:09:31 2010 From: jeffrey.finkelstein at gmail.com (Jeffrey Finkelstein) Date: Fri, 6 Aug 2010 16:09:31 -0400 Subject: [Biopython-dev] Bug #3109: Record class in Bio.SCOP.Cla has hierarchy member as list instead of dictionary Message-ID: I have submitted a bug report and a patch for a current feature of the Bio.SCOP.Cla module that makes using it somewhat difficult. Specifically, the Bio.SCOP.Cla.Record class has a "hierarchy" member which is currently a list, but should be a dictionary, according to the SCOP parseable classification file format "specification" (it is an informal specification) here: http://scop.mrc-lmb.cam.ac.uk/scop/release-notes.html#scop-parseable-files. For a usage example, see the comments I have made at: http://bugzilla.open-bio.org/show_bug.cgi?id=3109 I have CC'ed the original author of the module.
Gavin (or anyone else), do you have any objections to this change? Jeffrey From updates at feedmyinbox.com Sat Aug 7 07:14:21 2010 From: updates at feedmyinbox.com (Feed My Inbox) Date: Sat, 7 Aug 2010 03:14:21 -0400 Subject: [Biopython-dev] 8/7 newest questions tagged biopython - Stack Overflow Message-ID: ==================== 1. How do I parse data in a table using Biopython? ==================== August 6, 2010 at 5:53 AM Hello, I want to screen a particular column in a table using biopython. I want to parse the table and retain only entries not having "empty spaces" in a particular column. Please any ideas? http://stackoverflow.com/questions/3422677/how-do-i-parse-data-in-a-table-using-biopython -------------------- ==================== Source: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest This email was sent to biopython-dev at lists.open-bio.org. Account Login: https://www.feedmyinbox.com/members/login/ Don't want to receive this feed any longer? Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/444425/3d4c1dce7770d369dc1ad81e140923e46ae95832/ -------------------- This email was carefully delivered by FeedMyInbox.com. 230 Franklin Road Suite 814 Franklin, TN 37064 From thomasvangurp at gmail.com Sun Aug 8 09:55:11 2010 From: thomasvangurp at gmail.com (Thomas van Gurp) Date: Sun, 8 Aug 2010 11:55:11 +0200 Subject: [Biopython-dev] unsubscribe Message-ID: -- Kind regards, Thomas van Gurp From biopython at maubp.freeserve.co.uk Wed Aug 11 10:29:32 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Aug 2010 11:29:32 +0100 Subject: [Biopython-dev] Bug #3109: Record class in Bio.SCOP.Cla has hierarchy member as list instead of dictionary In-Reply-To: References: Message-ID: On Fri, Aug 6, 2010 at 9:09 PM, Jeffrey Finkelstein wrote: > I have submitted a bug report and a patch for a current feature of the > Bio.SCOP.Cla module that makes using it somewhat difficult. Specifically, > the Bio.Scop.Cla.Record class has a "hierarchy" member which is currently a > list, but should be a dictionary, according to the SCOP parseable > classification file format "specification" (it is an informal specification) > here: > http://scop.mrc-lmb.cam.ac.uk/scop/release-notes.html#scop-parseable-files. > > For a usage example, see the comments I have made at: > http://bugzilla.open-bio.org/show_bug.cgi?id=3109 > > I have CC'ed the original author of the module. Gavin (or anyone else), do > you have any objections to this change? > > Jeffrey As I commented on the bug, I'm happy with this change in principle, except for the fact it breaks backwards compatibility.
If we ask on the main list and there are no objections, then the code proposed looks fine: http://github.com/jfinkels/biopython/commit/e52255f06c08aad1556eb518e2e0da8f030765ff Peter From biopython at maubp.freeserve.co.uk Thu Aug 12 16:37:14 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 12 Aug 2010 17:37:14 +0100 Subject: [Biopython-dev] test_PhyloXML.py on Python 3 Message-ID: Hi Eric (et al), Is test_PhyloXML.py working for you under Python 3? I'm getting the following (both with and without the 2to3 --nofix=long option): $ python3 test_PhyloXML.py test_clade_getitem (__main__.MethodTests) Clade.__getitem__: get sub-clades by extended indexing. ... ERROR test_clade_to_phylogeny (__main__.MethodTests) Convert a Clade object to a new Phylogeny. ... ERROR ... Traceback (most recent call last): File "test_PhyloXML.py", line 571, in test_phylo 'test_Taxonomy', 'test_Uri', File "test_PhyloXML.py", line 504, in _rewrite_and_call phx = PhyloXMLIO.read(infile) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 105, in read return Parser(file).read() File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 298, in __init__ event, root = next(context) File "", line 59, in __iter__ TypeError: invalid event tuple ---------------------------------------------------------------------- Ran 47 tests in 0.015s All the sub-tests in test_PhyloXML.py are failing the same way. From memory this was working recently.
Peter From bugzilla-daemon at portal.open-bio.org Thu Aug 12 17:16:07 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Aug 2010 13:16:07 -0400 Subject: [Biopython-dev] [Bug 2790] Genepop parser creates a full representation of the file on memory In-Reply-To: Message-ID: <201008121716.o7CHG7A8021694@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2790 ------- Comment #1 from jeffrey.finkelstein at gmail.com 2010-08-12 13:16 EST ------- The original reporter of this bug has provided a "large" FileParser: http://github.com/biopython/biopython/commit/507839a8868f9d35dc73e6195947019e3ac7fe6b Is there a reason not to use this memory-saving generator method as the only file parser? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Aug 12 17:23:27 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Aug 2010 13:23:27 -0400 Subject: [Biopython-dev] [Bug 2790] Genepop parser creates a full representation of the file on memory In-Reply-To: Message-ID: <201008121723.o7CHNRqO021882@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2790 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-12 13:23 EST ------- Hi Jeffrey, I'd guess Tiago is concerned with backwards compatibility, or that the all in memory case is very useful for typical smaller analyses. [If it wasn't clear, Tiago is the module owner for Bio.PopGen] Tiago, can we mark this enhancement as fixed now? Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
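Bug 2790 contrasts a parser that builds the whole GenePop file in memory with a generator that yields one individual at a time. The sketch below only illustrates that trade-off on a deliberately simplified GenePop-like layout: a title line, one locus name per line, `Pop` separators, then `name, allele allele ...` lines. It is not Bio.PopGen's API, it skips the locus names, and it does not handle every variant of the real format (for example, all locus names on one comma-separated line).

```python
def iter_individuals(lines):
    """Lazily yield (pop_index, name, alleles) from simplified GenePop-like lines."""
    lines = iter(lines)
    if next(lines, None) is None:
        return  # empty input: no title line
    pop_index = -1
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if text.upper() == "POP":
            pop_index += 1  # each 'Pop' line starts a new population
            continue
        if pop_index < 0:
            continue  # locus name lines before the first 'Pop'
        name, genotypes = text.split(",", 1)
        yield pop_index, name.strip(), genotypes.split()


def read_all(lines):
    """In-memory counterpart: materialise every individual in a list."""
    return list(iter_individuals(lines))
```

`iter_individuals` keeps only one individual in memory at a time, while `read_all` holds the whole file. Which is preferable depends on file size and on how often the data are re-scanned.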
From eric.talevich at gmail.com Fri Aug 13 01:24:25 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Thu, 12 Aug 2010 21:24:25 -0400 Subject: [Biopython-dev] test_PhyloXML.py on Python 3 In-Reply-To: References: Message-ID: On Thu, Aug 12, 2010 at 12:37 PM, Peter wrote: > Hi Eric (et al), > > Is test_PhyloXML.py working for you under Python 3? > > I'm getting the following (both with and without the 2to3 --nofix=long > option): > > $ python3 test_PhyloXML.py > ... > File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", > line 298, in __init__ > event, root = next(context) > File "", line 59, in __iter__ > TypeError: invalid event tuple > > ---------------------------------------------------------------------- > Ran 47 tests in 0.015s > > All the sub-tests in test_PhyloXML.py are failing the same way. > > From memory this was working recently. > > Yeah, it was... it's fixed now/again. This is the issue with passing byte/unicode strings to cElementTree in Python 3. I had a check for Python versions 3.0.0 through 3.1.1, where we need to import ElementTree instead of cElementTree. Apparently Python 3.1.2 still has the bug. -Eric From bugzilla-daemon at portal.open-bio.org Fri Aug 13 09:12:25 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Aug 2010 05:12:25 -0400 Subject: [Biopython-dev] [Bug 2790] Genepop parser creates a full representation of the file on memory In-Reply-To: Message-ID: <201008130912.o7D9CPKN004338@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2790 tiagoantao at gmail.com changed:

What       | Removed | Added
----------------------------------------------------------------------------
Status     | NEW     | RESOLVED
Resolution |         | FIXED

------- Comment #3 from tiagoantao at gmail.com 2010-08-13 05:12 EST ------- The in-memory parser seems to be orders of magnitude faster than the non-memory one. Therefore it might make sense to maintain both.
Also for retro-compatibility. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Fri Aug 13 10:29:23 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 13 Aug 2010 11:29:23 +0100 Subject: [Biopython-dev] test_PhyloXML.py on Python 3 In-Reply-To: References: Message-ID: On Fri, Aug 13, 2010 at 2:24 AM, Eric Talevich wrote: > On Thu, Aug 12, 2010 at 12:37 PM, Peter wrote: > >> Hi Eric (et al), >> >> Is test_PhyloXML.py working for you under Python 3? >> >> I'm getting the following (both with and without the 2to3 --nofix=long >> option): >> >> $ python3 test_PhyloXML.py >> ... >> File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", >> line 298, in __init__ >> event, root = next(context) >> File "", line 59, in __iter__ >> TypeError: invalid event tuple >> >> ---------------------------------------------------------------------- >> Ran 47 tests in 0.015s >> >> All the sub-tests in test_PhyloXML.py are failing the same way. >> >> From memory this was working recently. >> >> > Yeah, it was... it's fixed now/again. > > This is the issue with passing byte/unicode strings to cElementTree in > Python 3. I had a check for Python versions 3.0.0 through 3.1.1, where we > need to import ElementTree instead of cElementTree. Apparently Python 3.1.2 > still has the bug. > > -Eric Yep - much better. However, I'm still seeing four failures with Python 3.1.2 which appear to be related to float/int/long conversion: ERROR: test_made (__main__.WriterTests) Round-trip parsing and serialization of made_up.xml.
---------------------------------------------------------------------- Traceback (most recent call last): File "test_PhyloXML.py", line 550, in test_made (TreeTests, ['test_Confidence', 'test_Polygon']), File "test_PhyloXML.py", line 512, in _rewrite_and_call getattr(inst, test)() File "test_PhyloXML.py", line 360, in test_Polygon tree = PhyloXMLIO.read(EX_MADE).phylogenies[1] File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 105, in read return Parser(file).read() File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 314, in read phylogeny = self._parse_phylogeny(elem) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 356, in _parse_phylogeny phylogeny.root = self._parse_clade(elem) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 402, in _parse_clade clade.clades.append(self._parse_clade(elem)) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 422, in _parse_clade getattr(self, tag)(elem)) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 553, in distribution polygons=_get_children_as(elem, 'polygon', self.polygon)) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 195, in _get_children_as parent.findall(_ns(tag))] File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 194, in return [construct(child) for child in File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 595, in polygon points=_get_children_as(elem, 'point', self.point)) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 195, in _get_children_as parent.findall(_ns(tag))] File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 194, in return [construct(child) for child in File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 589, in point _get_child_text(elem, 'long', float), File 
"/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 187, in _get_child_text return construct(child.text) ValueError: could not convert string to float: ====================================================================== ERROR: test_phylo (__main__.WriterTests) Round-trip parsing and serialization of phyloxml_examples.xml. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_PhyloXML.py", line 571, in test_phylo 'test_Taxonomy', 'test_Uri', File "test_PhyloXML.py", line 512, in _rewrite_and_call getattr(inst, test)() File "test_PhyloXML.py", line 176, in test_Phyloxml phx = PhyloXMLIO.read(EX_PHYLO) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 105, in read return Parser(file).read() File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 314, in read phylogeny = self._parse_phylogeny(elem) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 356, in _parse_phylogeny phylogeny.root = self._parse_clade(elem) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 402, in _parse_clade clade.clades.append(self._parse_clade(elem)) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 402, in _parse_clade clade.clades.append(self._parse_clade(elem)) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 422, in _parse_clade getattr(self, tag)(elem)) File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 552, in distribution points=_get_children_as(elem, 'point', self.point), File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 195, in _get_children_as parent.findall(_ns(tag))] File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 194, in return [construct(child) for child in File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 589, in point _get_child_text(elem, 'long', 
float), File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", line 187, in _get_child_text return construct(child.text) ValueError: could not convert string to float: ====================================================================== FAIL: test_Distribution (__main__.TreeTests) Instantiation of Distribution objects. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_PhyloXML.py", line 322, in test_Distribution self.assertEqual(point.long, longi) AssertionError: != 8.769303 ====================================================================== FAIL: test_Polygon (__main__.TreeTests) Instantiation of Polygon objects. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_PhyloXML.py", line 378, in test_Polygon self.assertEqual(point.long, longi) AssertionError: != 8.769303 ---------------------------------------------------------------------- From bugzilla-daemon at portal.open-bio.org Fri Aug 13 16:14:08 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Aug 2010 12:14:08 -0400 Subject: [Biopython-dev] [Bug 3130] Broken links in Documentation to NCBI Blast In-Reply-To: Message-ID: <201008131614.o7DGE8we020360@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3130 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-13 12:14 EST ------- Updated, http://github.com/biopython/biopython/commit/9049dd7f424b27c3a386533bfdf8f0e423091e3b Thanks -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Aug 13 16:15:13 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Aug 2010 12:15:13 -0400 Subject: [Biopython-dev] [Bug 3102] Error converting sff into fastq In-Reply-To: Message-ID: <201008131615.o7DGFDTf020499@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3102 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-13 12:15 EST ------- I'm closing this bug as INVALID on the assumption it was a corrupt SFF file. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Aug 13 16:50:06 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Aug 2010 12:50:06 -0400 Subject: [Biopython-dev] [Bug 3100] Bio.PDB.ResidueDepth distance calculation error In-Reply-To: Message-ID: <201008131650.o7DGo6F2021556@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3100 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-13 12:50 EST ------- Hi Andres, Installing MSMS was a pain; I'm still not sure what the best way is to deal with the atmtypenumbers file during installation. I also ran into the nawk versus awk bug: http://mgldev.scripps.edu/pipermail/msms/2006q1/000006.html But yes, I can confirm the bug and the fix.
Fix checked in: http://github.com/biopython/biopython/commit/aaac859df5e8d6a6b3a1304ad8b5c7c6163c4433 Thank you for your contribution, Peter P.S. Your example's import statements were incomplete, e.g.
from Bio.PDB import PDBParser
from Bio.PDB.ResidueDepth import get_surface, min_dist
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Aug 13 16:59:42 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Aug 2010 12:59:42 -0400 Subject: [Biopython-dev] [Bug 3127] Set SeqRecord description in SeqIO "tab" parser In-Reply-To: Message-ID: <201008131659.o7DGxgvO021809@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3127 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-13 12:59 EST ------- Fixed on trunk: http://github.com/biopython/biopython/commit/f83e80c82d804b50b290fd42aa4d4d2a3d664363 Thanks for the feedback, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
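Bug 3100 above reports a distance calculation error in Bio.PDB.ResidueDepth, and Peter's P.S. imports `get_surface` and `min_dist` from that module. The pure-Python sketch below is a reference for the quantity a `min_dist(coord, surface)` call is expected to return: the smallest Euclidean distance from one coordinate to any point of a surface. It is not the Biopython implementation and does not show what the actual bug was. Representing the surface as a list of (x, y, z) tuples is an assumption made for illustration.

```python
import math


def min_dist(coord, surface):
    """Return the smallest Euclidean distance from coord to any surface point.

    coord is an (x, y, z) triple; surface is an iterable of such triples,
    standing in (by assumption) for the surface vertices MSMS produces.
    """
    # Compare squared distances, then take a single square root at the end.
    best_sq = min(sum((c - s) ** 2 for c, s in zip(coord, vertex))
                  for vertex in surface)
    return math.sqrt(best_sq)


if __name__ == "__main__":
    surface = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]
    print(min_dist((3.0, 4.0, 12.0), surface))  # nearest vertex is (3, 4, 0): 12.0
```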
From bugzilla-daemon at portal.open-bio.org Fri Aug 13 17:02:01 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Aug 2010 13:02:01 -0400 Subject: [Biopython-dev] [Bug 3118] isinstance should use basestring for detecting string type In-Reply-To: Message-ID: <201008131702.o7DH21Zl021947@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3118 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-13 13:02 EST ------- I'm marking this as fixed, although some of the other cases may need to be looked at as part of the Python 3 work (where byte strings/unicode can be more of an issue). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Aug 13 17:52:49 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Aug 2010 13:52:49 -0400 Subject: [Biopython-dev] [Bug 3096] PPBuilder build_peptides bugs In-Reply-To: Message-ID: <201008131752.o7DHqnnq023520@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3096 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-13 13:52 EST ------- Hi Siong, I've been going over your example again (and adding some doctests to Bio/PDB/Polypeptide.py as well). It seems to me that in order to show this "bug" you have had to override the builder class' private _accept() method. If in doing so you break the default build_peptides() method, then you should probably also override that too. Can you show a problem without subclassing the builder object? 
There may be scope for enhancement, but you haven't convinced me there is a bug here. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Fri Aug 13 18:18:04 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 13 Aug 2010 19:18:04 +0100 Subject: [Biopython-dev] Release plans for Biopython 1.55 (beta)? Message-ID: Hi all, We've probably clocked up enough bug fixes and additions to justify a new release (even though there are still things waiting which look close to being ready, e.g. uniprot-xml and imgt parsing). Alternatively, should we delay while we work to get some more of the new stuff tested and merged? Regarding Python 2.7 support, it all looks fine. However, for the Windows installers we are waiting on an official NumPy installer for Python 2.7 (i.e. NumPy 1.5 which is due shortly I understand). I'm aware that in the course of the Python 3 work so far, we've touched quite a lot of the code - and some areas are still not fully covered by the unit tests. With that in mind, I think a beta release would be a prudent thing to do - primarily in the hope of end users spotting any issues which the unit tests have not revealed. Does doing a beta release some time next week sound like a good plan, with the official release say a week or two later? Are there any blocker issues we should be addressing first? e.g. test_NCBI_BLAST_tools.py fails with the latest BLAST+, we need to update the application wrappers as the NCBI have changed a few of the switches. 
Regards, Peter From bugzilla-daemon at portal.open-bio.org Fri Aug 13 22:23:24 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Aug 2010 18:23:24 -0400 Subject: [Biopython-dev] [Bug 3096] PPBuilder build_peptides bugs In-Reply-To: Message-ID: <201008132223.o7DMNOcJ018254@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3096 skong at zymeworks.com changed: What |Removed |Added ---------------------------------------------------------------------------- Version|Not Applicable |1.53 ------- Comment #3 from skong at zymeworks.com 2010-08-13 18:23 EST ------- Hi Peter, I managed to reproduce the problem without modifying _accept().

DIAGNOSTIC SCRIPT:

from Bio.PDB.PDBParser import PDBParser
from Bio.PDB.Polypeptide import PPBuilder

def extract_peptides(model):
    """Extracts the peptides from a model.

    Returns a list of the peptide sequences as strings."""
    output = []
    for peptide in PPBuilder().build_peptides(model):
        seq = str(peptide.get_sequence())
        output.append(seq)
    return output

if __name__ == '__main__':
    with open('chopped_pdb1bfe_noca.ent') as pdb:
        st = PDBParser().get_structure('', pdb)
    seqa = extract_peptides(st)
    print('no ca seq all')
    print(seqa)

PDB FILE: chopped_pdb1bfe_noca.ent

ATOM     85  N   ILE A 316      37.386  71.217  31.070  1.00 36.97           N
ATOM     86  CA  ILE A 316      38.311  71.290  29.949  1.00 33.71           C
ATOM     87  C   ILE A 316      37.634  72.103  28.862  1.00 33.93           C
ATOM     88  O   ILE A 316      36.415  72.216  28.839  1.00 36.46           O
ATOM     89  CB  ILE A 316      38.651  69.876  29.404  1.00 35.79           C
ATOM     90  CG1 ILE A 316      39.331  69.049  30.501  1.00 36.78           C
ATOM     91  CG2 ILE A 316      39.572  69.979  28.187  1.00 37.71           C
ATOM     92  CD1 ILE A 316      39.881  67.724  30.023  1.00 39.20           C
ATOM     93  N   HIS A 317      38.425  72.679  27.969  1.00 35.61           N
ATOM     94  CA  HIS A 317      37.880  73.473  26.881  1.00 37.92           C
ATOM     95  C   HIS A 317      38.360  72.928  25.540  1.00 37.79           C
ATOM     96  O   HIS A 317      39.463  73.240  25.094  1.00 37.44           O
ATOM     97  CB  HIS A 317      38.303  74.930  27.052  1.00 35.19           C
ATOM     98  CG  HIS A 317      37.888  75.519  28.363  1.00 35.76           C
ATOM     99  ND1 HIS A 317      36.611  75.981  28.602  1.00 37.74           N
ATOM    100  CD2 HIS A 317      38.575  75.701  29.516  1.00 37.59           C
ATOM    101  CE1 HIS A 317      36.529  76.420  29.844  1.00 38.74           C
ATOM    102  NE2 HIS A 317      37.706  76.262  30.421  1.00 36.76           N
ATOM    103  N   ARG A 318      37.527  72.109  24.905  1.00 38.78           N
ATOM    104  CA  ARG A 318      37.884  71.512  23.627  1.00 42.04           C
ATOM    105  C   ARG A 318      38.469  72.559  22.699  1.00 45.14           C
ATOM    106  O   ARG A 318      39.592  72.425  22.205  1.00 42.05           O
ATOM    107  CB  ARG A 318      36.657  70.880  22.967  1.00 42.93           C
ATOM    108  CG  ARG A 318      36.934  70.321  21.576  1.00 38.60           C
ATOM    109  CD  ARG A 318      35.654  70.038  20.821  1.00 35.39           C
ATOM    110  NE  ARG A 318      34.624  69.538  21.724  1.00 34.96           N
ATOM    111  CZ  ARG A 318      34.539  68.278  22.141  1.00 31.51           C
ATOM    112  NH1 ARG A 318      35.419  67.373  21.736  1.00 25.19           N
ATOM    113  NH2 ARG A 318      33.579  67.929  22.983  1.00 29.10           N
ATOM    114  N   XLY A 319      37.690  73.604  22.461  1.00 49.96           N
ATOM    115  CX  XLY A 319      38.138  74.668  21.592  1.00 55.53           C
ATOM    116  C   XLY A 319      38.459  74.219  20.180  1.00 58.85           C
ATOM    117  O   XLY A 319      37.583  73.766  19.440  1.00 58.98           O
ATOM    118  N   SER A 320      39.734  74.334  19.823  1.00 61.64           N
ATOM    119  CA  SER A 320      40.219  73.992  18.493  1.00 63.16           C
ATOM    120  C   SER A 320      40.212  72.517  18.110  1.00 65.27           C
ATOM    121  O   SER A 320      39.558  72.127  17.145  1.00 65.12           O
ATOM    122  CB  SER A 320      41.634  74.542  18.316  1.00 65.36           C
ATOM    123  OG  SER A 320      42.124  74.255  17.019  1.00 72.05           O
ATOM    124  N   THR A 321      40.955  71.702  18.853  1.00 67.43           N
ATOM    125  CA  THR A 321      41.049  70.274  18.562  1.00 67.73           C
ATOM    126  C   THR A 321      40.220  69.430  19.529  1.00 66.41           C
ATOM    127  O   THR A 321      39.244  69.917  20.095  1.00 70.21           O
ATOM    128  CB  THR A 321      42.517  69.810  18.620  1.00 70.22           C
ATOM    129  OG1 THR A 321      42.613  68.453  18.169  1.00 77.03           O
ATOM    130  CG2 THR A 321      43.049  69.915  20.045  1.00 72.07           C
ATOM    131  N   GLY A 322      40.608  68.168  19.707  1.00 61.22           N
ATOM    132  CA  GLY A 322      39.892  67.286  20.614  1.00 53.23           C
ATOM    133  C   GLY A 322      40.037  67.705  22.065  1.00 48.00           C
ATOM    134  O   GLY A 322      40.138  68.892  22.372  1.00 50.41           O
ATOM    135  N   LEU A 323      40.044  66.734  22.968  1.00 41.92           N
ATOM    136  CA  LEU A 323      40.190  67.033  24.385  1.00 35.58           C
ATOM    137  C   LEU A 323      41.613  66.738  24.874  1.00 31.41           C
ATOM    138  O   LEU A 323      41.932  66.921  26.046  1.00 30.47           O
ATOM    139  CB  LEU A 323      39.160  66.240  25.191  1.00 35.76           C
ATOM    140  CG  LEU A 323      37.716  66.576  24.802  1.00 39.50           C
ATOM    141  CD1 LEU A 323      36.733  65.796  25.670  1.00 38.15           C
ATOM    142  CD2 LEU A 323      37.493  68.074  24.955  1.00 38.58           C

The output peptides should be ['IHR', 'STGL'], not ['IHRXTGL'] as in the current version. Residue XLY A 319 (the X in the fourth position) should not be included, since it doesn't have a CA atom. Instead, the current version includes it and removes the 'S' next to it, due to the same bug. One can get the right result using the patch provided before. Whether or not _accept is modified, the bug remains. Also, the user should not be expected to modify the build_peptides() method as well whenever PPBuilder's _accept is modified, since the accept variable in build_peptides isn't really a local (private) variable: in line 277 this variable accept is taken from self._accept of PPBuilder. http://www.biopython.org/DIST/docs/api/Bio.PDB.Polypeptide-pysrc.html

277 accept=self._accept

On a side note, the "aa_only" optional input variable for build_peptides() and its comments are very misleading (@param aa_only: if 1, the residue needs to be a standard AA). "aa_only" is meant as a flag that tells the peptide builder to start filtering out residues that are not to be accepted, and by default it is turned on. Without modifying _accept of PeptideBuilder, only residues with a "CA" atom are accepted (lines 250-264), not standard amino acids as the comment states. In other words, without modifying _accept in PeptideBuilder, non-standard amino acids will still be accepted and included in the peptides built.
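The rule skong describes can be made concrete with a simplified, hypothetical model; this is not PPBuilder's code. A residue without a CA atom is excluded and splits the chain, so the example residues give ['IHR', 'STGL'] rather than ['IHRXTGL']. Residues are represented as (one-letter code, atom names) pairs purely for illustration. Real peptide building also checks the C-N (or, for the CA-based builder, CA-CA) distance between neighbours, and this sketch deliberately leaves that out.

```python
def split_peptides(residues):
    """Split a chain into peptide sequences, breaking at residues without CA.

    residues: list of (one_letter_code, atom_names) pairs in chain order.
    Hypothetical simplification: consecutive accepted residues are treated
    as bonded; no C-N or CA-CA distance check is made.
    """
    peptides = []
    current = []
    for code, atom_names in residues:
        if "CA" in atom_names:
            current.append(code)
        else:
            # Rejected residue: close the running peptide and drop this
            # residue, leaving its neighbours untouched.
            if current:
                peptides.append("".join(current))
            current = []
    if current:
        peptides.append("".join(current))
    return peptides


if __name__ == "__main__":
    backbone = {"N", "CA", "C", "O"}
    chain = [("I", backbone), ("H", backbone), ("R", backbone),
             ("X", {"N", "CX", "C", "O"}),  # XLY A 319: no CA atom
             ("S", backbone), ("T", backbone), ("G", backbone), ("L", backbone)]
    print(split_peptides(chain))  # ['IHR', 'STGL']
```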
Only when overriding the _accept method of PeptideBuilder (as I did before) would build_peptides() not include non-standard amino acids. I suggest renaming "aa_only" to something more sensible like "filter_aa". http://www.biopython.org/DIST/docs/api/Bio.PDB.Polypeptide-pysrc.html

266 - def build_peptides(self, entity, aa_only=1):
273 @param aa_only: if 1, the residue needs to be a standard AA
274 @type aa_only: int

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From anaryin at gmail.com Fri Aug 13 23:44:40 2010 From: anaryin at gmail.com (João Rodrigues) Date: Sat, 14 Aug 2010 00:44:40 +0100 Subject: [Biopython-dev] GSOC Bio.PDB Project - Final Summary Message-ID: Dear all, The results of the GSOC 2010 project are in the wiki page: http://biopython.org/wiki/GSOC2010_Joao I also started writing a Struct page regarding that same module: http://biopython.org/wiki/Struct Comments appreciated :) I will be maintaining these features in the future and adding some others as well. Best regards to all! João [...] Rodrigues @ http://doeidoei.wordpress.org From mjldehoon at yahoo.com Sat Aug 14 02:23:29 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 13 Aug 2010 19:23:29 -0700 (PDT) Subject: [Biopython-dev] Release plans for Biopython 1.55 (beta)? In-Reply-To: Message-ID: <816847.43152.qm@web62406.mail.re1.yahoo.com> I'm OK with a new release, provided we can fix the test errors. I have been looking at the Blast parsers (as discussed previously) but this turned out to be more difficult than expected; a new release should not wait for it. --Michiel. --- On Fri, 8/13/10, Peter wrote: > From: Peter > Subject: [Biopython-dev] Release plans for Biopython 1.55 (beta)?
> To: "Biopython-Dev Mailing List" > Date: Friday, August 13, 2010, 2:18 PM > Hi all, > > We've probably clocked up enough bug fixes and additions to > justify a > new release (even though there are still things waiting > which look close > to being ready, e.g. uniprot-xml and imgt parsing). > Alternatively, should > we delay while we work to get some more of the new stuff > tested and > merged? > > Regarding Python 2.7 support, it all looks fine. However, > for the Windows > installers we are waiting on an official NumPy installer > for Python 2.7 > (i.e. NumPy 1.5 which is due shortly I understand). > > I'm aware that in the course of the Python 3 work so far, > we've touched > quite a lot of the code - and some areas are still not > fully covered by the > unit tests. With that in mind, I think a beta release would > be a prudent > thing to do - primarily in the hope of end users spotting > any issues which > the unit tests have not revealed. > > Does doing a beta release some time next week sound like a > good plan, > with the official release say a week or two later? Are > there any blocker > issues we should be addressing first? > > e.g. test_NCBI_BLAST_tools.py fails with the latest BLAST+, > we > need to update the application wrappers as the NCBI have > changed > a few of the switches. > > Regards, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From biopython at maubp.freeserve.co.uk Mon Aug 16 13:10:30 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Aug 2010 14:10:30 +0100 Subject: [Biopython-dev] Release plans for Biopython 1.55 (beta)? In-Reply-To: <816847.43152.qm@web62406.mail.re1.yahoo.com> References: <816847.43152.qm@web62406.mail.re1.yahoo.com> Message-ID: On Sat, Aug 14, 2010 at 3:23 AM, Michiel de Hoon wrote: > I'm OK with a new release, provided we can fix the test errors. 
I've sorted out test_NCBI_BLAST_tools.py: BLAST 2.2.23+ added -off_diagonal_range to blastn, but more puzzlingly appears to have removed -gapextend, -gapopen, -xdrop_gap, and -xdrop_gap_final from tblastx. This might be something to double-check with the NCBI. Peter From chapmanb at 50mail.com Mon Aug 16 13:23:41 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 16 Aug 2010 09:23:41 -0400 Subject: [Biopython-dev] Release plans for Biopython 1.55 (beta)? In-Reply-To: References: Message-ID: <20100816132341.GF23299@sobchak.mgh.harvard.edu> Peter; > I'm aware that in the course of the Python 3 work so far, we've touched > quite a lot of the code - and some areas are still not fully covered by the > unit tests. With that in mind, I think a beta release would be a prudent > thing to do - primarily in the hope of end users spotting any issues which > the unit tests have not revealed. How is the Python 3 stuff looking? Perception-wise, it would be nice to be able to make a release with a statement like: All of the non-C extension code works on Python 3 using 2to3. Are we at all close to something like that? Otherwise, your other plans all sound good. Brad From chapmanb at 50mail.com Mon Aug 16 13:27:16 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 16 Aug 2010 09:27:16 -0400 Subject: [Biopython-dev] GSOC Bio.PDB Project - Final Summary In-Reply-To: References: Message-ID: <20100816132716.GG23299@sobchak.mgh.harvard.edu> João; > The results of the GSOC 2010 project are in the wiki page: > http://biopython.org/wiki/GSOC2010_Joao > > I also started writing a Struct page regarding that same module: > http://biopython.org/wiki/Struct > > Comments appreciated :) I will be maintaining these features in the future > and adding some others as well. This looks great; tons of really useful additions. What are your thoughts on getting this integrated into the main trunk? How disruptive is it to existing PDB code? Are there any back-compatibility issues? Thanks for all the hard work this summer and looking forward to seeing it get included.
Brad From biopython at maubp.freeserve.co.uk Mon Aug 16 13:47:30 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Aug 2010 14:47:30 +0100 Subject: [Biopython-dev] Bio.PDB on Python 3 Message-ID: Hi all, A while back I installed NumPy from their svn under Python 3, so that I could test more of Biopython. I hadn't really looked at Bio.PDB until recently because test_PDB.py depended on Bio.KDTree which needs some C code to be compiled (which we haven't tried yet). I recently added a few doctests to Bio/PDB/Polypeptide.py which showed a problem with the code using "next" as a variable name. This is a built-in function on Python 3, taking the place of the next method on iterator objects. That's fixed now: http://github.com/biopython/biopython/commit/1eb48feb5520094bf7f0177be804a953024e6938 In order to test more of Bio.PDB under Python 3, I have just split test_PDB.py into two, creating a small test_PDB_KDtree.py file for the neighbour search functionality which requires the C code. This has revealed there are at least two issues with Bio.PDB to be addressed (see below). Peter ====================================================================== ERROR: test_1_warnings (__main__.A_ExceptionTest) Check warnings: Parse a flawed PDB file in permissive mode.
---------------------------------------------------------------------- Traceback (most recent call last): File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 200, in _parse_coordinates fullname, serial_number, element) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", line 232, in init_atom residue.add(atom) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/Residue.py", line 82, in add "Atom %s defined twice in residue %s" % (atom_id, self)) Bio.PDB.PDBExceptions.PDBConstructionException: Atom N defined twice in residue During handling of the above exception, another exception occurred: Traceback (most recent call last): File "test_PDB.py", line 57, in test_1_warnings p.get_structure("example", "PDB/a_structure.pdb") File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 64, in get_structure self._parse(file.readlines()) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 84, in _parse self.trailer=self._parse_coordinates(coords_trailer) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 202, in _parse_coordinates self._handle_PDB_exception(message, global_line_counter) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 256, in _handle_PDB_exception % message, PDBConstructionWarning) File "test_PDB.py", line 53, in showwarning all_warns.append(*args[0]) TypeError: append() argument after * must be a sequence, not PDBConstructionWarning ====================================================================== ERROR: test_ExposureCN (__main__.Exposure) HSExposureCN. 
---------------------------------------------------------------------- Traceback (most recent call last): File "test_PDB.py", line 612, in setUp structure=PDBParser(PERMISSIVE=True).get_structure('X', pdb_filename) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 64, in get_structure self._parse(file.readlines()) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 84, in _parse self.trailer=self._parse_coordinates(coords_trailer) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 200, in _parse_coordinates fullname, serial_number, element) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", line 185, in init_atom duplicate_atom=residue[name] TypeError: 'DisorderedResidue' object is not subscriptable ====================================================================== ERROR: test_HSExposureCA (__main__.Exposure) HSExposureCA. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_PDB.py", line 612, in setUp structure=PDBParser(PERMISSIVE=True).get_structure('X', pdb_filename) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 64, in get_structure self._parse(file.readlines()) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 84, in _parse self.trailer=self._parse_coordinates(coords_trailer) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 200, in _parse_coordinates fullname, serial_number, element) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", line 185, in init_atom duplicate_atom=residue[name] TypeError: 'DisorderedResidue' object is not subscriptable ====================================================================== ERROR: test_HSExposureCB (__main__.Exposure) HSExposureCB. 
---------------------------------------------------------------------- Traceback (most recent call last): File "test_PDB.py", line 612, in setUp structure=PDBParser(PERMISSIVE=True).get_structure('X', pdb_filename) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 64, in get_structure self._parse(file.readlines()) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 84, in _parse self.trailer=self._parse_coordinates(coords_trailer) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 200, in _parse_coordinates fullname, serial_number, element) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", line 185, in init_atom duplicate_atom=residue[name] TypeError: 'DisorderedResidue' object is not subscriptable ====================================================================== ERROR: test_c_n (__main__.ParseTest) Extract polypeptides using C-N. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_PDB.py", line 138, in setUp self.structure = p.get_structure("example", "PDB/a_structure.pdb") File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 64, in get_structure self._parse(file.readlines()) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 84, in _parse self.trailer=self._parse_coordinates(coords_trailer) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 200, in _parse_coordinates fullname, serial_number, element) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", line 185, in init_atom duplicate_atom=residue[name] TypeError: 'DisorderedResidue' object is not subscriptable ====================================================================== ERROR: test_ca_ca (__main__.ParseTest) Extract polypeptides using CA-CA. 
---------------------------------------------------------------------- Traceback (most recent call last): File "test_PDB.py", line 138, in setUp self.structure = p.get_structure("example", "PDB/a_structure.pdb") File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 64, in get_structure self._parse(file.readlines()) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 84, in _parse self.trailer=self._parse_coordinates(coords_trailer) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 200, in _parse_coordinates fullname, serial_number, element) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", line 185, in init_atom duplicate_atom=residue[name] TypeError: 'DisorderedResidue' object is not subscriptable ====================================================================== ERROR: test_details (__main__.ParseTest) Verify details of the parsed example PDB file. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_PDB.py", line 138, in setUp self.structure = p.get_structure("example", "PDB/a_structure.pdb") File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 64, in get_structure self._parse(file.readlines()) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 84, in _parse self.trailer=self._parse_coordinates(coords_trailer) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 200, in _parse_coordinates fullname, serial_number, element) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", line 185, in init_atom duplicate_atom=residue[name] TypeError: 'DisorderedResidue' object is not subscriptable ====================================================================== ERROR: test_structure (__main__.ParseTest) Verify the structure of the parsed example PDB file. 
---------------------------------------------------------------------- Traceback (most recent call last): File "test_PDB.py", line 138, in setUp self.structure = p.get_structure("example", "PDB/a_structure.pdb") File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 64, in get_structure self._parse(file.readlines()) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 84, in _parse self.trailer=self._parse_coordinates(coords_trailer) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", line 200, in _parse_coordinates fullname, serial_number, element) File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", line 185, in init_atom duplicate_atom=residue[name] TypeError: 'DisorderedResidue' object is not subscriptable ---------------------------------------------------------------------- Ran 14 tests in 1.205s FAILED (errors=8) From biopython at maubp.freeserve.co.uk Mon Aug 16 13:48:25 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Aug 2010 14:48:25 +0100 Subject: [Biopython-dev] Release plans for Biopython 1.55 (beta)? In-Reply-To: <20100816132341.GF23299@sobchak.mgh.harvard.edu> References: <20100816132341.GF23299@sobchak.mgh.harvard.edu> Message-ID: On Mon, Aug 16, 2010 at 2:23 PM, Brad Chapman wrote: > Peter; > >> I'm aware that in the course of the Python 3 work so far, we've touched >> quite a lot of the code - and some areas are still not fully covered by the >> unit tests. With that in mind, I think a beta release would be a prudent >> thing to do - primarily in the hope of end users spotting any issues which >> the unit tests have not revealed. > > How is the Python 3 stuff looking? Perception wise, it would be nice to be > able to make a release with a statement like: All of the non-C > extension code works on Python 3 using 2to3. Are we at all close to > something like that? > > Otherwise, your other plans all sound good. 
> Brad Hi Brad, I think it is still premature to make any claims about Python 3 support (even though ignoring the C and NumPy code most stuff works). Issues like binary versus text mode for handles (bytes vs unicode) and the associated speed issues are something in particular which will need some thought (and benchmarks to guide us). See also: http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008143.html http://lists.open-bio.org/pipermail/biopython-dev/2010-July/008011.html http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008159.html Peter
From eric.talevich at gmail.com Mon Aug 16 16:22:53 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 16 Aug 2010 12:22:53 -0400 Subject: [Biopython-dev] Release plans for Biopython 1.55 (beta)? In-Reply-To: References: Message-ID: On Fri, Aug 13, 2010 at 2:18 PM, Peter wrote: > Hi all, > > We've probably clocked up enough bug fixes and additions to justify a > new release (even though there are still things waiting which look close > to being ready, e.g. uniprot-xml and imgt parsing). Alternatively, should > we delay while we work to get some more of the new stuff tested and > merged? > I'd be happy with another release within the next couple weeks, provided I can fix the Py3 bugs you've turned up in Bio.Phylo. There are some recent fixes and improvements Bio.Phylo, e.g. root_with_outgroup, that I think make the module more useful. I'd like to take another crack at documentation before the release, though -- at least put the example from my BOSC talk into the tutorial. > Does doing a beta release some time next week sound like a good plan, > with the official release say a week or two later? Are there any blocker > issues we should be addressing first? > Just the Bio.Phylo bugs and documentation, as usual. -Eric From biopython at maubp.freeserve.co.uk Mon Aug 16 16:32:26 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Aug 2010 17:32:26 +0100 Subject: [Biopython-dev] Release plans for Biopython 1.55 (beta)?
In-Reply-To: References: Message-ID: On Mon, Aug 16, 2010 at 5:22 PM, Eric Talevich wrote: > On Fri, Aug 13, 2010 at 2:18 PM, Peter wrote: > >> Hi all, >> >> We've probably clocked up enough bug fixes and additions to justify a >> new release (even though there are still things waiting which look close >> to being ready, e.g. uniprot-xml and imgt parsing). Alternatively, should >> we delay while we work to get some more of the new stuff tested and >> merged? >> > > I'd be happy with another release within the next couple weeks, provided I > can fix the Py3 bugs you've turned up in Bio.Phylo. i.e. http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008143.html If you can look at this Py3 issue early this week I'll wait before doing the beta. The real point of the beta is to see if we broke anything on Python 2 without realising it ;) > There are some recent fixes and improvements Bio.Phylo, e.g. > root_with_outgroup, that I think make the module more useful. > > I'd like to take another crack at documentation before the release, though > -- at least put the example from my BOSC talk into the tutorial. That would be great. >> Does doing a beta release some time next week sound like a good plan, >> with the official release say a week or two later? Are there any blocker >> issues we should be addressing first? > > Just the Bio.Phylo bugs and documentation, as usual. Yeah, releases are a good trigger for people updating documentation ;) The docs /could/ be done after the beta is released... depends on your schedule really. Peter From eric.talevich at gmail.com Tue Aug 17 01:59:27 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 16 Aug 2010 21:59:27 -0400 Subject: [Biopython-dev] test_PhyloXML.py on Python 3 In-Reply-To: References: Message-ID: On Fri, Aug 13, 2010 at 6:29 AM, Peter wrote: > > Yep - much better. 
However, I'm still seeing four failures with Python > 3.1.2 > which appear to be related to float/int/long conversion: > > > ERROR: test_made (__main__.WriterTests) > Round-trip parsing and serialization of made_up.xml. > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "test_PhyloXML.py", line 550, in test_made > (TreeTests, ['test_Confidence', 'test_Polygon']), [...] > File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", > line 589, in point > _get_child_text(elem, 'long', float), > File "/home/xxx/lib/python3.1/site-packages/Bio/Phylo/PhyloXMLIO.py", > line 187, in _get_child_text > return construct(child.text) > ValueError: could not convert string to float: > > ====================================================================== > ERROR: test_phylo (__main__.WriterTests) > Round-trip parsing and serialization of phyloxml_examples.xml. > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "test_PhyloXML.py", line 571, in test_phylo > 'test_Taxonomy', 'test_Uri', > [...] > ValueError: could not convert string to float: > > ====================================================================== > FAIL: test_Distribution (__main__.TreeTests) > Instantiation of Distribution objects. > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "test_PhyloXML.py", line 322, in test_Distribution > self.assertEqual(point.long, longi) > AssertionError: != 8.769303 > > ====================================================================== > FAIL: test_Polygon (__main__.TreeTests) > Instantiation of Polygon objects. 
> ---------------------------------------------------------------------- > Traceback (most recent call last): > File "test_PhyloXML.py", line 378, in test_Polygon > self.assertEqual(point.long, longi) > AssertionError: != 8.769303 > > ---------------------------------------------------------------------- > I can't seem to replicate these errors. Are they still occurring on your auto2to3 branch? From a clean master branch, I did:

git checkout -b three
2to3 -w -x long Bio/
2to3 -w BioSQL/ Tests/ setup.py
python3 setup.py build
sudo python3 setup.py install
cd Tests/
python3 test_Phylo.py
python3 test_PhyloXML.py

I'm using the 2to3 packaged with Python 2.7 from python.org, testing with the Python 3.1.2 packaged for Ubuntu 10.04. Any ideas? Thanks, Eric From biopython at maubp.freeserve.co.uk Tue Aug 17 11:25:40 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Aug 2010 12:25:40 +0100 Subject: [Biopython-dev] test_PhyloXML.py on Python 3 In-Reply-To: References: Message-ID: On Tue, Aug 17, 2010 at 2:59 AM, Eric Talevich wrote: > > I can't seem to replicate these errors. Are they still occurring on your > auto2to3 branch? > Yes - but I think this is down to it not being a clean branch - see below. > From a clean master branch, I did: > > git checkout -b three > 2to3 -w -x long Bio/ > 2to3 -w BioSQL/ Tests/ setup.py > python3 setup.py build > sudo python3 setup.py install > cd Tests/ > python3 test_Phylo.py > python3 test_PhyloXML.py It doesn't matter for testing Bio.Phylo, but you shouldn't need to convert setup.py, and you haven't converted the doctests. This is what we have in the README file:

$ 2to3 --nofix=long --no-diffs -n -w Bio BioSQL Tests Scripts Doc/examples
$ 2to3 --nofix=long --no-diffs -n -w -d Bio BioSQL Tests Scripts Doc/examples

I've been running that with the addition of -j 7 to speed it up. The good news is using the above 2to3 run on a clean branch fixes test_PhyloXML.py, e.g.
git reset --hard
git checkout master
git checkout -b three
2to3 -j 7 --nofix=long --no-diffs -n -w Bio BioSQL Tests Scripts Doc/examples
2to3 -j 7 --nofix=long --no-diffs -n -w -d Bio BioSQL Tests Scripts Doc/examples
python3 setup.py install --prefix=$HOME
cd Tests
python3 test_Phylo.py
python3 test_PhyloXML.py

The resulting code is a little different from my auto2to3 branch - all to do with long/int changes. I think my script was keeping the unwanted fixes to Bio/Phylo/PhyloXML.py as well as the useful fixes made to these files:

* Bio/SeqIO/InsdcIO.py - testing for isinstance of int or long
* BioSQL/Loader.py - testing for isinstance of int or long
* Bio/Prosite/Prodoc.py - using long(handle.tell())
* Bio/Prosite/__init__.py - using long(handle.tell())

The bad news is running 2to3 on a clean branch with the long fixes disabled breaks a few things where we need int/long fixes, e.g. test_BioSQL.py via Bio/SeqIO/InsdcIO.py.

I think we have three options:

(1) Manually fix the int/long issues in the four files listed, and continue to use 2to3 with the long fixer disabled. The long(handle.tell()) call can just be handle.tell() as far as I can see (at least on Python 3), and Bio.Prosite is deprecated anyway. For the other issue we can add an is_int_or_long function in our Python 3 compatibility library, which I think I can do without trouble.

(2) Manually fix PhyloXML to avoid long for longitude (with a deprecation period etc), and go back to using 2to3 with default settings. A bit of a pain.

(3) Don't change the code, but run 2to3 in default mode for most code, while disabling the long fixer for Bio/Phylo - this will require scripting.

I think option (1) makes most sense. Peter

P.S. I also need to look at my auto2to3 script again to prevent auto merges. The simple answer is to create a clean branch each time (perhaps deleting or replacing the old conversions)... the auto2to3 branch was for testing purposes anyway so I don't mind deleting it.
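Option (1) proposes an `is_int_or_long` helper for a Python 3 compatibility library, and notes that `long(handle.tell())` can become plain `handle.tell()`. A minimal sketch of both ideas, assuming nothing about where Biopython actually put the helper: the `_INTEGER_TYPES` name, the module layout and the toy record data below are illustrative only.

```python
import io

# Build the tuple of integer types once. On Python 2 this is (int, long);
# on Python 3 (or after 2to3 with --nofix=long) the name long is undefined,
# so evaluating it raises NameError and we fall back to int alone.
try:
    _INTEGER_TYPES = (int, long)  # Python 2 only
except NameError:
    _INTEGER_TYPES = (int,)


def is_int_or_long(value):
    """Return True if value is an int (or a long on Python 2).

    Hypothetical helper sketched from the option (1) discussion.
    """
    return isinstance(value, _INTEGER_TYPES)


# handle.tell() already returns an integer offset, so no long() wrapper is
# needed when recording where each record starts (toy SwissProt-like data).
handle = io.BytesIO(b"ID   FIRST\n//\nID   SECOND\n//\n")
offsets = []
while True:
    start = handle.tell()
    line = handle.readline()
    if not line:
        break
    if line.startswith(b"ID "):
        offsets.append(start)

print(offsets, all(is_int_or_long(o) for o in offsets))  # [0, 14] True
```

Catching `NameError` once at import time keeps the per-call check to a single `isinstance`, whichever Python version is running.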
From n.j.loman at bham.ac.uk Tue Aug 17 13:59:36 2010 From: n.j.loman at bham.ac.uk (Nick Loman) Date: Tue, 17 Aug 2010 14:59:36 +0100 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> Message-ID: <4C6A95C8.1040902@bham.ac.uk> Kevin Jacobs wrote: > Never mind-- I didn't realized the consensus sequence was gapped, so > it is then trivial to recover the original pairwise alignments. I'll > have a version of my Newbler2SAM module that can process ACE files > shortly. Hi Kevin I was wondering how you got on with the ACE to SAM converter? I now realise that the 454PairAlign.txt produced by Newbler when run in de novo assembly mode is not much use to me, as the alignments reported in this file are strictly pairwise between reads and don't relate back to the assembled contigs. So an ACE file parser would be extremely helpful at this point. Cheers Nick. From biopython at maubp.freeserve.co.uk Tue Aug 17 15:43:17 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Aug 2010 16:43:17 +0100 Subject: [Biopython-dev] test_PhyloXML.py on Python 3 In-Reply-To: References: Message-ID: On Tue, Aug 17, 2010 at 12:25 PM, Peter wrote: > > P.S. > I also need to look at my auto2to3 script again to prevent auto merges. > The simple answer is to create a clean branch each time (perhaps > deleting or replacing the old conversions)... the auto2to3 branch was > for testing purposes anyway so I don't mind deleting it. > Yet again, I am wishing git still supported "theirs" as a merge strategy, which would I think be exactly what I want to use in this situation. 
I think I've fixed my script now; notice that the auto2to3 branch now clearly has unconverted instances of long present: http://github.com/peterjc/biopython/commit/4773377dcda10ee3511fea7fabc196fc8d6251ed This branch is (according to a git diff) identical to the one I created from a clean branch as described earlier. Peter From biopython at maubp.freeserve.co.uk Tue Aug 17 15:47:39 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Aug 2010 16:47:39 +0100 Subject: [Biopython-dev] test_PhyloXML.py on Python 3 In-Reply-To: References: Message-ID: On Tue, Aug 17, 2010 at 12:25 PM, Peter wrote: > > The resulting code is a little different from my auto2to3 branch - all to do > with long/int changes. I think my script was keeping the unwanted fixes to > Bio/Phylo/PhyloXML.py as well as the useful fixes made to these files: > > * Bio/SeqIO/InsdcIO.py - testing for isinstance of int or long > * BioSQL/Loader.py - testing for isinstance of int or long > * Bio/Prosite/Prodoc.py - using long(handle.tell()) > * Bio/Prosite/__init__.py - using long(handle.tell()) And also: * Bio/SwissProt/SProt.py - using long(handle.tell()) * Bio/Blast/NCBIStandalone.py - using long(float(...)) I still think all these can be fixed to work without needing the 2to3 long fixer. Peter From bioinformed at gmail.com Tue Aug 17 16:27:55 2010 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Tue, 17 Aug 2010 12:27:55 -0400 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: <4C6A95C8.1040902@bham.ac.uk> References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> <4C6A95C8.1040902@bham.ac.uk> Message-ID: On Tue, Aug 17, 2010 at 9:59 AM, Nick Loman wrote: > Kevin Jacobs wrote: > >> Never mind-- I didn't realized the consensus sequence was gapped, so it is >> then trivial to recover the original pairwise alignments. I'll have a >> version of my Newbler2SAM module that can process ACE files shortly.
>> > Hi Kevin > > I was wondering how you got on with the ACE to SAM converter? > > I now realise that the 454PairAlign.txt produced by Newbler when run in de > novo assembly mode is not much use to me, as the alignments reported in this > file are strictly pairwise between reads and don't relate back to the > assembled contigs. So an ACE file parser would be extremely helpful at this > point. > > Hi Nick, I'm stuck with the ACE conversion for exactly the same reason. The consensus and reads are gapped for multiple alignments so that there are no mismatches at all. I will have to recompute the Smith-Waterman alignments of each read against the ungapped consensus in order to produce SAM/BAM output. I'm surprised that the pairwise alignments for the de novo assembly are so problematic. My understanding was that they were pairwise against the consensus contigs and would be exactly what you'd want for SAM/BAM. Unfortunately, I'm mainly dealing with only human data and don't have any direct examples to know for sure. I can re-process some of our EBV data with the de novo aligner and see what can be done. -Kevin From biopython at maubp.freeserve.co.uk Tue Aug 17 16:37:52 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Aug 2010 17:37:52 +0100 Subject: [Biopython-dev] test_PhyloXML.py on Python 3 In-Reply-To: References: Message-ID: On Tue, Aug 17, 2010 at 4:47 PM, Peter wrote: > On Tue, Aug 17, 2010 at 12:25 PM, Peter wrote: >> >> The resulting code is a little different from my auto2to3 branch - all to do >> with long/int changes.
>> I think my script was keeping the unwanted fixes to >> Bio/Phylo/PhyloXML.py as well as the useful fixes made to these files: >> >> * Bio/SeqIO/InsdcIO.py - testing for isinstance of int or long >> * BioSQL/Loader.py - testing for isinstance of int or long >> * Bio/Prosite/Prodoc.py - using long(handle.tell()) >> * Bio/Prosite/__init__.py - using long(handle.tell()) > > And also: > * Bio/SwissProt/SProt.py - using long(handle.tell()) > * Bio/Blast/NCBIStandalone.py - using long(float(...)) > > I still think all these can be fixed to work without needing the 2to3 long > fixer. Handling testing for int/long is done, http://github.com/biopython/biopython/commit/ddaf587afd02aa7214e53647c48e4089555e7efb And I replaced the use of long in Bio/Blast/NCBIStandalone.py with int(), http://github.com/biopython/biopython/commit/e095370184fc2ab50b37bbd86667f762ca825107 The other three uses of long I identified can probably be solved neatly like this:

try:
    end = long(handle.tell())
except NameError:
    #Python 3 where 2to3 long fixer was disabled
    end = handle.tell()

Peter From n.j.loman at bham.ac.uk Tue Aug 17 16:35:45 2010 From: n.j.loman at bham.ac.uk (Nick Loman) Date: Tue, 17 Aug 2010 17:35:45 +0100 Subject: [Biopython-dev] Newbler ACE file to SAM? In-Reply-To: References: <4C597DD5.9060604@bham.ac.uk> <4C5992E0.8040904@bham.ac.uk> <4C59962C.2010105@bham.ac.uk> <4C6A95C8.1040902@bham.ac.uk> Message-ID: <4C6ABA61.1000307@bham.ac.uk> Kevin Jacobs wrote: > I'm stuck with the ACE conversion for exactly the same reason. The > consensus and reads are gapped for multiple alignments so that there > are no mismatches at all. I will have to recompute the Smith-Waterman > alignments of each read against the ungapped consensus in order to > produce SAM/BAM output. I'm surprised that the > pairwise alignments for the de novo assembly are so problematic. My > understanding was they they were pairwise against the consensus > contigs and would be exactly what you'd want for SAM/BAM.
> Unfortunately, I'm mainly dealing with only human data and don't have > any direct examples to know for sure. I can re-process some of our > EBV data with the de novo aligner and see what can be done. Hi Kevin I was expecting it to be similar to the gsMapper output but it isn't. When you supply -pt to gsAssembler (to specify 454PairAlign.txt should be output) then each pair in the file relates to reads from the original SFF files, not the contigs. I guess this makes sense as it probably represents a stage of the de novo assembly process (an all against all pairwise comparison on the reads). I guess I can get around this by running gsMapper against the assembly using the SFF files as a second stage, and then using Newbler2SAM on this instead, but I was kind of hoping to avoid this (as I would expect it to give slightly different results). Another possible workaround is potentially using GAP5 from the Staden package - I understand it can read ACE and output SAM. Cheers, Nick From biopython at maubp.freeserve.co.uk Tue Aug 17 19:41:57 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Aug 2010 20:41:57 +0100 Subject: [Biopython-dev] test_PhyloXML.py on Python 3 In-Reply-To: References: Message-ID: On Tue, Aug 17, 2010 at 5:37 PM, Peter wrote: > > The other three uses of long I identified can probably be solved > neatly like this:
>
> try:
>     end = long(handle.tell())
> except NameError:
>     #Python 3 where 2to3 long fixer was disabled
>     end = handle.tell()
>
On closer inspection, probably we can just remove the long(): http://github.com/biopython/biopython/commit/e11eb52413e5fe78619c4cf5511a4db1319931fa Michiel - is this OK?
This was indexing code you wrote to replace the Mindy stuff as I recall: http://github.com/biopython/biopython/commit/f17fb613cccd15e28f9f98709742f48d87ae27d4 http://github.com/biopython/biopython/commit/f17fb613cccd15e28f9f98709742f48d87ae27d4 Files: * Bio/Prosite/__init__.py * Bio/Prosite/Prodoc.py * Bio/SwissProt/SProt.py Peter From mjldehoon at yahoo.com Wed Aug 18 12:55:25 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 18 Aug 2010 05:55:25 -0700 (PDT) Subject: [Biopython-dev] test_PhyloXML.py on Python 3 In-Reply-To: Message-ID: <119273.97006.qm@web62406.mail.re1.yahoo.com> Since handle.tell() returns a long integer, I agree that we can remove the long(). --Michiel. --- On Tue, 8/17/10, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] test_PhyloXML.py on Python 3 > To: "Eric Talevich" , "Michiel de Hoon" > Cc: "Biopython-Dev Mailing List" > Date: Tuesday, August 17, 2010, 3:41 PM > On Tue, Aug 17, 2010 at 5:37 PM, > Peter wrote: > > > > The other three uses of long I identified can probably > be solved > > neatly like this:
> >
> > try:
> >     end = long(handle.tell())
> > except NameError:
> >     #Python 3 where 2to3 long fixer was disabled
> >     end = handle.tell()
> >
> > On closer inspection, probably we can just remove the > long(): > > http://github.com/biopython/biopython/commit/e11eb52413e5fe78619c4cf5511a4db1319931fa > > Michiel - is this OK?
This was indexing code you wrote to > replace the Mindy stuff as I recall: > > http://github.com/biopython/biopython/commit/f17fb613cccd15e28f9f98709742f48d87ae27d4 > > http://github.com/biopython/biopython/commit/f17fb613cccd15e28f9f98709742f48d87ae27d4 > > Files: > > * Bio/Prosite/__init__.py > * Bio/Prosite/Prodoc.py > * Bio/SwissProt/SProt.py > > Peter > From biopython at maubp.freeserve.co.uk Wed Aug 18 13:03:45 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Aug 2010 14:03:45 +0100 Subject: [Biopython-dev] test_PhyloXML.py on Python 3 In-Reply-To: <119273.97006.qm@web62406.mail.re1.yahoo.com> References: <119273.97006.qm@web62406.mail.re1.yahoo.com> Message-ID: On Wed, Aug 18, 2010 at 1:55 PM, Michiel de Hoon wrote: > > Since handle.tell() returns a long integer, I agree that we can remove the long(). > Great. That should mean the only long issue remaining is in phyloXML code, which just requires the 2to3 script is run without the long fixer (since we are using long as shorthand for longitude not the variable type). Peter From biopython at maubp.freeserve.co.uk Wed Aug 18 13:34:36 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Aug 2010 14:34:36 +0100 Subject: [Biopython-dev] Release plans for Biopython 1.55 (beta)? In-Reply-To: References: Message-ID: On Mon, Aug 16, 2010 at 5:32 PM, Peter wrote: > On Mon, Aug 16, 2010 at 5:22 PM, Eric Talevich wrote: >> >> I'd be happy with another release within the next couple weeks, provided I >> can fix the Py3 bugs you've turned up in Bio.Phylo. >> > > i.e. > http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008143.html > > If you can look at this Py3 issue early this week I'll wait before doing the > beta. The real point of the beta is to see if we broke anything on Python 2 > without realising it ;) With the long issues apparently sorted, I'm going to try and do the beta this afternoon (i.e. in the next few hours). 
Currently I'm running the unit tests on Windows with Python 2.4 to 2.7; so far it all looks fine. You can expect a "trunk freeze" email shortly for the actual release process. Peter From biopython at maubp.freeserve.co.uk Wed Aug 18 14:41:11 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Aug 2010 15:41:11 +0100 Subject: [Biopython-dev] Trunk freeze for Biopython 1.55 (beta) Message-ID: Hi all, Please don't commit anything to the master branch until further notice. I have started doing the Biopython 1.55 (beta) release as we discussed, http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008150.html Thanks, Peter From biopython at maubp.freeserve.co.uk Wed Aug 18 15:40:40 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Aug 2010 16:40:40 +0100 Subject: [Biopython-dev] Trunk freeze for Biopython 1.55 (beta) In-Reply-To: References: Message-ID: On Wed, Aug 18, 2010 at 3:41 PM, Peter wrote: > Hi all, > > Please don't commit anything to the master branch until further notice. > I have started doing the Biopython 1.55 (beta) release as we discussed, > http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008150.html > > Thanks, > > Peter OK, the source code bundles and Windows installers are up. If anyone on the dev list has the chance to download and test these now, that would be great. Note that I have not included an installer for Python 2.7 (yet). I'm waiting for official Windows installers for NumPy on Python 2.7; this will be NumPy 1.5, which should be released soon as they already have a beta out. I'm hoping that will be ready by the time we want to formally release Biopython 1.55.
Still to do: * Update API docs with epydoc * News page entry & email announcement Peter From biopython at maubp.freeserve.co.uk Wed Aug 18 16:16:43 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Aug 2010 17:16:43 +0100 Subject: [Biopython-dev] Trunk freeze for Biopython 1.55 (beta) In-Reply-To: References: Message-ID: On Wed, Aug 18, 2010 at 4:40 PM, Peter wrote: > Still to do: > * Update API docs with epydoc Done: http://biopython.org/DIST/docs/api/ This turned up two trivial epytext formatting issues, which I have fixed: http://github.com/biopython/biopython/commit/bfa3754b469c740f27d291698e59d1379beaf14b http://github.com/biopython/biopython/commit/2d0d54999ab881343ce47733e388e5c84c5125bf This does mean the API docs are two commits ahead of the tag and the code in the downloads ;) Peter From biopython at maubp.freeserve.co.uk Wed Aug 18 21:03:28 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Aug 2010 22:03:28 +0100 Subject: [Biopython-dev] Trunk freeze for Biopython 1.55 (beta) In-Reply-To: References: Message-ID: On Wed, Aug 18, 2010 at 4:40 PM, Peter wrote: > > OK, the source code bundles and Windows installers are up. If anyone > on the dev list has the chance to download and test these now, that > would be great. > > Note that I have not included an installer for Python 2.7 (yet). I'm waiting > for official Windows installers for NumPy on Python 2.7, this will be > NumPy 1.5 which should soon as they already have a beta out. I'm > hoping that will be ready by the time we want to formally release > Biopython 1.55. > > Still to do: > * Update API docs with epydoc > * News page entry & email announcement > As mentioned earlier, epydoc is done, and I've also just done a news post: http://news.open-bio.org/news/2010/08/biopython-1-55-beta-released/ If there are any typos or other suggestions for improvement, please tell us. We can edit that page - and then turn it into an email to send out. 
This means the "trunk freeze" is over, but for the next week or so when we'll do the official release, let's focus on documentation and any bug fixes. [Keep new feature work only on branches please.] Thanks, Peter From biopython at maubp.freeserve.co.uk Thu Aug 19 15:43:56 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 19 Aug 2010 16:43:56 +0100 Subject: [Biopython-dev] Biopython 1.55 beta released Message-ID: Dear Biopythoneers, We've just released a beta of Biopython 1.55 for user testing, as announced on the news server (which has RSS and atom feeds) and on twitter: http://news.open-bio.org/news/2010/08/biopython-1-55-beta-released/ http://twitter.com/biopython Since Biopython 1.54 was released three months ago, we've made a good start on work for Python 3 support (via the 2to3 script), but as a side effect of this we've had to update quite a lot of the older parts of the library. Although the unit tests are all fine, there is a small but real chance that we've accidentally broken things - which is why we're doing this beta release. In terms of new features, the most noticeable highlight is that the command line tool application wrapper classes are now executable, which should make it much easier to call external tools. This is described in the updated documentation. http://biopython.org/DIST/docs/tutorial/Tutorial.html http://biopython.org/DIST/docs/tutorial/Tutorial.pdf Note we are phasing out support for Python 2.4. We will continue to support it for at least one further release (i.e. Biopython 1.56). This could be delayed given feedback from our users (e.g. if this proves to be a problem in combination with other libraries or a popular Linux distribution).
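The announcement says the command line wrapper classes are now executable. The class below is a hypothetical stand-in, not Biopython's `Bio.Application` code; it sketches the general idea under that assumption. The object builds an argument list, renders it with `str()`, and when called runs the program via `subprocess`, returning the captured stdout and stderr and raising an error on a non-zero exit status. The running Python interpreter plays the part of the external tool, so the sketch is self-contained.

```python
import subprocess
import sys


class ToolCommandline(object):
    """Hypothetical minimal command line wrapper that can be called directly."""

    def __init__(self, program, *args):
        self.program = program
        self.args = list(args)

    def _argv(self):
        # Argument list passed to subprocess (no shell involved).
        return [self.program] + [str(a) for a in self.args]

    def __str__(self):
        # Human-readable command line (no shell quoting attempted).
        return " ".join(self._argv())

    def __call__(self):
        # Run the tool, capture both streams as text, and fail loudly.
        proc = subprocess.Popen(self._argv(), stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True)
        stdout, stderr = proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError("%s returned %i: %s"
                               % (self, proc.returncode, stderr))
        return stdout, stderr


# Usage: the current Python interpreter stands in for an external tool.
cline = ToolCommandline(sys.executable, "-c", "print('hello')")
stdout, stderr = cline()
```

Keeping `str(cline)` and `cline()` on one object means the same wrapper can either print a command for a shell script or run it directly.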
(At least) 10 people have contributed to this release (so far), including 5 new people - thank you all: * Andres Colubri (first contribution) * Carlos Rios Vera (first contribution) * Claude Paroz (first contribution) * Eric Talevich * Frank Kauff * Joao Rodrigues (first contribution) * Konstantin Okonechnikov (first contribution) * Michiel de Hoon * Peter Cock * Tiago Antao Source distributions and Windows installers are available from the downloads page on the Biopython website: http://www.biopython.org/wiki/Download Feedback is welcome through the mailing lists (or bugzilla), especially if you find something that doesn't work. Thank you, Peter From mjldehoon at yahoo.com Sat Aug 21 06:30:36 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 20 Aug 2010 23:30:36 -0700 (PDT) Subject: [Biopython-dev] Obsolete code Message-ID: <984537.40520.qm@web62408.mail.re1.yahoo.com> Dear all, The classes and modules listed below were declared obsolete in Biopython 1.54 or earlier, but do not yet raise a deprecation warning. Most of this functionality moved to a different module or was implemented differently. I suggest we add a DeprecationWarning to each of these before Biopython 1.55 final. The only tricky one is the .data property of Seq classes in Bio.Seq. Still, it might be good to add a DeprecationWarning there to make people aware that this property is obsolete. Any objections? --Michiel. 
Bio.CelFile.CelParser Bio.CelFile.CelScanner Bio.CelFile.CelConsumer Bio.CelFile.CelRecord Bio.Align.MultipleSeqAlignment.get_column Bio.Align.Generic.Alignment Bio.Align.Generic.Alignment.get_seq_by_num Bio.AlignAce.Parser Bio.Blast.Applications.FastacmdCommandline Bio.Blast.Applications.BlastallCommandline Bio.Blast.Applications.BlastpgpCommandline Bio.Blast.Applications.RpsBlastCommandline Bio.Blast.NCBIStandalone.blastall Bio.Blast.NCBIStandalone.blastpgp Bio.Blast.NCBIStandalone.rpsblast (Bio.Blast.NCBIStandalone has been declared obsolete, but I guess some people may still be using the Blast plain-text output parser.) Bio.Clustalw Bio.Compass._Scanner Bio.Compass._Consumer Bio.Compass.RecordParser Bio.Compass.Iterator Bio.Emboss.Applications.EProtDistCommandline Bio.Emboss.Applications.ENeighborCommandline Bio.Emboss.Applications.EProtParsCommandline Bio.Emboss.Applications.EConsenseCommandline Bio.Emboss.Applications.ESeqBootCommandline Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer.xcentre Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer._set_xcentre Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer.ycentre Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer._set_ycentre Bio.Graphics.GenomeDiagram._Graph.centre Bio.Graphics.GenomeDiagram._Graph._set_centre Bio.Motif.Parsers.AlignAce.AlignAceConsumer Bio.Motif.Parsers.AlignAce.AlignAceParser Bio.Motif.Parsers.AlignAce.AlignAceScanner Bio.Motif.Parsers.AlignAce.CompareAceScanner Bio.Motif.Parsers.AlignAce.CompareAceConsumer Bio.Motif.Parsers.MEME.MEMEParser Bio.Motif.Parsers.MEME._MEMEScanner Bio.Motif.Parsers.MEME._MEMEConsumer Bio.Motif.Parsers.MEME._MASTConsumer Bio.Motif.Parsers.MEME.MASTParser Bio.Motif.Parsers.MEME._MASTScanner Bio.Motif.Parsers.MEME.MASTRecord Bio.PopGen.FDist.RecordParser Bio.PopGen.FDist._Scanner Bio.PopGen.FDist._RecordConsumer Bio.Seq.Seq.data Bio.SeqUtils.GC_Frame Bio.SeqUtils.fasta_uniqids Bio.SeqUtils.apply_on_multi_fasta 
Bio.SeqUtils.quicker_apply_on_multi_fasta Bio.UniGene.UnigeneSequenceRecord Bio.UniGene.UnigeneProtsimRecord Bio.UniGene.UnigeneSTSRecord Bio.UniGene.UnigeneRecord Bio.UniGene._RecordConsumer Bio.UniGene._Scanner Bio.UniGene.RecordParser Bio.UniGene.Iterator From mjldehoon at yahoo.com Sat Aug 21 06:56:57 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 20 Aug 2010 23:56:57 -0700 (PDT) Subject: [Biopython-dev] Deprecated code Message-ID: <207402.89678.qm@web62407.mail.re1.yahoo.com> Dear all, Below are the modules and functions that were deprecated (with a DeprecationWarning) in Biopython 1.51 or earlier, which was released on August 17, 2009. Since that is more than one year (and more than two releases) ago, we can remove these from Biopython. Any objections? If not, I'll send this list also to the user mailing list before removing them. --Michiel. Bio.Align.FormatConvert Bio.Emboss.Applications.PrimerSearchCommandline.set_parameter Bio.Entrez.efetch: rettype="genbank" option Bio.Fasta Bio.SCOP.Dom.Parser Bio.SwissProt.SProt Bio.Transcribe Bio.Translate BioSQL.BioSeqDatabase.open_database: driver="psycopg" option From bartek at rezolwenta.eu.org Sat Aug 21 08:56:52 2010 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Sat, 21 Aug 2010 10:56:52 +0200 Subject: [Biopython-dev] Obsolete code In-Reply-To: <984537.40520.qm@web62408.mail.re1.yahoo.com> References: <984537.40520.qm@web62408.mail.re1.yahoo.com> Message-ID: Hi, Great job. I just have a small comment on the Bio.AlignAce.Parser module. It is a part of the Bio.AlignAce "package" which was already deprecated (see Bio.AlignAce.__init__.py). I think either there is no need to put an extra deprecation warning in Bio.AlignAce.Parser, or we should put it in all submodules of Bio.AlignAce (like Bio.AlignAce.Motif, etc.). As for deprecating the old parsers in Bio.Motif, I've just looked through the code in Motif.__init__.py and realized that it was still using the old implementations...
This small patch below should fix it (I don't want to push onto the main branch now, as it's frozen). Cheers Bartek diff --git a/Bio/Motif/__init__.py b/Bio/Motif/__init__.py index 4c7c19a..9ca0200 100644 --- a/Bio/Motif/__init__.py +++ b/Bio/Motif/__init__.py @@ -10,12 +10,12 @@ as well as methods for motif comparisons and motif searching in sequences. It also inlcudes functionality for parsing AlignACE and MEME programs """ from _Motif import Motif -from Parsers.AlignAce import AlignAceParser, CompareAceParser -from Parsers.MEME import MEMEParser,MASTParser +import Parsers.AlignAce +import Parsers.MEME from Thresholds import ScoreDistribution -_parsers={"AlignAce":AlignAceParser, - "MEME":MEMEParser +_parsers={"AlignAce":Parsers.AlignAce.read, + "MEME":Parsers.MEME.read } def _from_pfm(handle): @@ -75,7 +75,7 @@ def parse(handle,format): else: #we have a proper reader yield reader(handle) else: # we have a proper reader - for m in parser().parse(handle).motifs: + for m in parser(handle).motifs: yield m def read(handle,format): On Sat, Aug 21, 2010 at 8:30 AM, Michiel de Hoon wrote: > Dear all, > > The classes and modules listed below were declared obsolete in Biopython > 1.54 or earlier, but do not yet raise a deprecation warning. Most of this > functionality moved to a different module or was implemented differently. I > suggest we add a DeprecationWarning to each of these before Biopython 1.55 > final. The only tricky one is the .data property of Seq classes in Bio.Seq. > Still, it might be good to add a DeprecationWarning there to make people > aware that this property is obsolete. > > Any objections? > > --Michiel. 
> > Bio.CelFile.CelParser > Bio.CelFile.CelScanner > Bio.CelFile.CelConsumer > Bio.CelFile.CelRecord > > Bio.Align.MultipleSeqAlignment.get_column > Bio.Align.Generic.Alignment > Bio.Align.Generic.Alignment.get_seq_by_num > > Bio.AlignAce.Parser > > Bio.Blast.Applications.FastacmdCommandline > Bio.Blast.Applications.BlastallCommandline > Bio.Blast.Applications.BlastpgpCommandline > Bio.Blast.Applications.RpsBlastCommandline > Bio.Blast.NCBIStandalone.blastall > Bio.Blast.NCBIStandalone.blastpgp > Bio.Blast.NCBIStandalone.rpsblast > (Bio.Blast.NCBIStandalone has been declared obsolete, but I guess some > people may still be using the Blast plain-text output parser.) > > Bio.Clustalw > > Bio.Compass._Scanner > Bio.Compass._Consumer > Bio.Compass.RecordParser > Bio.Compass.Iterator > > Bio.Emboss.Applications.EProtDistCommandline > Bio.Emboss.Applications.ENeighborCommandline > Bio.Emboss.Applications.EProtParsCommandline > Bio.Emboss.Applications.EConsenseCommandline > Bio.Emboss.Applications.ESeqBootCommandline > > Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer.xcentre > Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer._set_xcentre > Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer.ycentre > Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer._set_ycentre > Bio.Graphics.GenomeDiagram._Graph.centre > Bio.Graphics.GenomeDiagram._Graph._set_centre > > Bio.Motif.Parsers.AlignAce.AlignAceConsumer > Bio.Motif.Parsers.AlignAce.AlignAceParser > Bio.Motif.Parsers.AlignAce.AlignAceScanner > Bio.Motif.Parsers.AlignAce.CompareAceScanner > Bio.Motif.Parsers.AlignAce.CompareAceConsumer > > Bio.Motif.Parsers.MEME.MEMEParser > Bio.Motif.Parsers.MEME._MEMEScanner > Bio.Motif.Parsers.MEME._MEMEConsumer > Bio.Motif.Parsers.MEME._MASTConsumer > Bio.Motif.Parsers.MEME.MASTParser > Bio.Motif.Parsers.MEME._MASTScanner > Bio.Motif.Parsers.MEME.MASTRecord > > Bio.PopGen.FDist.RecordParser > Bio.PopGen.FDist._Scanner > Bio.PopGen.FDist._RecordConsumer > > 
Bio.Seq.Seq.data > > Bio.SeqUtils.GC_Frame > Bio.SeqUtils.fasta_uniqids > Bio.SeqUtils.apply_on_multi_fasta > Bio.SeqUtils.quicker_apply_on_multi_fasta > > Bio.UniGene.UnigeneSequenceRecord > Bio.UniGene.UnigeneProtsimRecord > Bio.UniGene.UnigeneSTSRecord > Bio.UniGene.UnigeneRecord > Bio.UniGene._RecordConsumer > Bio.UniGene._Scanner > Bio.UniGene.RecordParser > Bio.UniGene.Iterator > > > > > > -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From tiagoantao at gmail.com Sat Aug 21 11:33:55 2010 From: tiagoantao at gmail.com (Tiago Antão) Date: Sat, 21 Aug 2010 12:33:55 +0100 Subject: [Biopython-dev] Obsolete code In-Reply-To: <984537.40520.qm@web62408.mail.re1.yahoo.com> References: <984537.40520.qm@web62408.mail.re1.yahoo.com> Message-ID: On Sat, Aug 21, 2010 at 7:30 AM, Michiel de Hoon wrote: > Bio.PopGen.FDist.RecordParser > Bio.PopGen.FDist._Scanner > Bio.PopGen.FDist._RecordConsumer I think I know everybody that uses this code (of course, surprises sometimes happen...) and I am very convinced that all is upgraded. Please go ahead. I intend to remove it in 1.56 (my branch on git does not have it anymore). From p.j.a.cock at googlemail.com Sun Aug 22 22:11:41 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 22 Aug 2010 23:11:41 +0100 Subject: [Biopython-dev] Deprecated code In-Reply-To: <207402.89678.qm@web62407.mail.re1.yahoo.com> References: <207402.89678.qm@web62407.mail.re1.yahoo.com> Message-ID: On Saturday, August 21, 2010, Michiel de Hoon wrote: > > Dear all, > > Below are the modules and functions that were deprecated (with a DeprecationWarning) in Biopython 1.51 or earlier, which was released on August 17, 2009.
Since that is more than one year (and more than two releases) ago, we can remove these from Biopython. Any objections? If not, I'll send this list also to the user mailing list before removing them. > > --Michiel. > Since we didn't do this in the beta, I'd say leave these for Biopython 1.55 (about one week's time - so far so good) but then they can go. An email to the mail list would be sensible too. Peter From p.j.a.cock at googlemail.com Sun Aug 22 22:26:53 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 22 Aug 2010 23:26:53 +0100 Subject: [Biopython-dev] Obsolete code In-Reply-To: <984537.40520.qm@web62408.mail.re1.yahoo.com> References: <984537.40520.qm@web62408.mail.re1.yahoo.com> Message-ID: On Saturday, August 21, 2010, Michiel de Hoon wrote: > Dear all, > > The classes and modules listed below were declared obsolete in Biopython 1.54 or earlier, but do not yet raise a deprecation warning. Most of this functionality moved to a different module or was implemented differently. I suggest we add a DeprecationWarning to each of these before Biopython 1.55 final. The only tricky one is the .data property of Seq classes in Bio.Seq. Still, it might be good to add a DeprecationWarning there to make people aware that this property is obsolete. > > Any objections? I'd be cautious about the Seq object given it will be particularly widely used. I've commented on a few cases below, but in general yes deprecation of things previously marked as obsolete is sensible house keeping. > > --Michiel. > > Bio.CelFile.CelParser > Bio.CelFile.CelScanner > Bio.CelFile.CelConsumer > Bio.CelFile.CelRecord > Ok > Bio.Align.MultipleSeqAlignment.get_column > Bio.Align.Generic.Alignment > Bio.Align.Generic.Alignment.get_seq_by_num > I'd like to make the new alignment object a bit more user friendly before we deprecate these bits. 
> Bio.AlignAce.Parser Ok > > Bio.Blast.Applications.FastacmdCommandline > Bio.Blast.Applications.BlastallCommandline > Bio.Blast.Applications.BlastpgpCommandline > Bio.Blast.Applications.RpsBlastCommandline > Bio.Blast.NCBIStandalone.blastall > Bio.Blast.NCBIStandalone.blastpgp > Bio.Blast.NCBIStandalone.rpsblast The NCBI are still supporting "legacy" BLAST so it is probably a bit too early to deprecate these wrappers. Maybe I'm being cautious but I'd leave this until the next release in three months time or so. > (Bio.Blast.NCBIStandalone has been declared obsolete, but I guess some people may still be using the Blast plain-text output parser.) Deprecation seems premature - the code is still useful and probably widely used (I use it myself sometimes). Maybe once your new BLAST parsing framework is ready Michiel? > > Bio.Clustalw Ok > > Bio.Compass._Scanner > Bio.Compass._Consumer > Bio.Compass.RecordParser > Bio.Compass.Iterator Ok > Bio.Emboss.Applications.EProtDistCommandline > Bio.Emboss.Applications.ENeighborCommandline > Bio.Emboss.Applications.EProtParsCommandline > Bio.Emboss.Applications.EConsenseCommandline > Bio.Emboss.Applications. > Bio.Align.FormatConvert > Bio.Emboss.Applications.PrimerSearchCommandline.set_parameter > Bio.Entrez.efetch: rettype="genbank" option Should be fine. > Bio.Fasta Here my instinct is to be cautious given Bio.Fasta used to be such a widely used module. > Bio.SCOP.Dom.Parser > Bio.SwissProt.SProt > Bio.Transcribe > Bio.Translate > BioSQL.BioSeqDatabase.open_database: driver="psycopg" option Ok > > Bio.SeqUtils.GC_Frame > Bio.SeqUtils.fasta_uniqids > Bio.SeqUtils.apply_on_multi_fasta > Bio.SeqUtils.quicker_apply_on_multi_fasta > > Bio.UniGene.UnigeneSequenceRecord > Bio.UniGene.UnigeneProtsimRecord > Bio.UniGene.UnigeneSTSRecord > Bio.UniGene.UnigeneRecord > Bio.UniGene._RecordConsumer > Bio.UniGene._Scanner > Bio.UniGene.RecordParser > Bio.UniGene.Iterator > Ok Apologies for brevity and any typos, this was sent from an iPod. 
Peter From biopython at maubp.freeserve.co.uk Tue Aug 24 11:56:57 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Aug 2010 12:56:57 +0100 Subject: [Biopython-dev] IMGT parser (modified EMBL format), Message-ID: Hi all, The IMGT is the international ImMunoGeneTics information system, a global reference in immunogenetics and immunoinformatics. They have a sequence databases, genome database, structure database, and monoclonal antibodies database. The IMGT use a variant of the EMBL flat file format with longer feature indents: http://imgt.cines.fr/download/LIGM-DB/userman_doc.html http://imgt.cines.fr/download/LIGM-DB/ftable_doc.html http://www.ebi.ac.uk/imgt/hla/docs/manual.html Uri and I have been working on extending the SeqIO EMBL/GenBank parser and writer to support IMGT files too. This uncovered a number of data formatting issues (e.g. wrong sequence length in ID line, partial feature locations) and Uri has been liaising with the IMGT curators to address these. With their latest (Aug 2010) release, we can now parse the whole file without errors: http://imgt.cines.fr/download/LIGM-DB/imgt.dat.Z I think this code is now ready to merge - comments welcome: http://github.com/peterjc/biopython/commits/seqio-imgt Potentially we could even include this in Biopython 1.55, although it would be more cautious not to add any new features between the beta and the final release... Peter From biopython at maubp.freeserve.co.uk Tue Aug 24 12:06:15 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Aug 2010 13:06:15 +0100 Subject: [Biopython-dev] Bio.PDB on Python 3 In-Reply-To: References: Message-ID: On Mon, Aug 16, 2010 at 2:47 PM, Peter wrote: > Hi all, > > A while back I installed NumPy from their svn under Python 3, so that I > could test more of Biopython. I hadn't really looked at Bio.PDB until > recently because test_PDB.py depended on Bio.KDTree which needs > some C code to be compiled (which we haven't tried yet). 
> > I recently added a few doctests to Bio/PDB/Polypeptide.py which > showed a problem with the code using "next" as a variable name. > This is a built in function on Python 3, taking the place of the next > method on iterator objects. That's fixed now: > > http://github.com/biopython/biopython/commit/1eb48feb5520094bf7f0177be804a953024e6938 > > In order to test more of Bio.PDB under Python 3, I have just split > test_PDB.py into two, creating a small test_PDB_KDtree.py file > for the neighbour search functionality which requires the C code. > > This has revealed there are at least two issues with Bio.PDB to be > addressed (see below). > > Peter > > > ====================================================================== > ERROR: test_1_warnings (__main__.A_ExceptionTest) > Check warnings: Parse a flawed PDB file in permissive mode. > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", > line 200, in _parse_coordinates > fullname, serial_number, element) > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", > line 232, in init_atom > residue.add(atom) > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/Residue.py", > line 82, in add > "Atom %s defined twice in residue %s" % (atom_id, self)) > Bio.PDB.PDBExceptions.PDBConstructionException: Atom N defined twice > in residue > > During handling of the above exception, another exception occurred: > > Traceback (most recent call last): > File "test_PDB.py", line 57, in test_1_warnings > p.get_structure("example", "PDB/a_structure.pdb") > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", > line 64, in get_structure > self._parse(file.readlines()) > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", > line 84, in _parse > self.trailer=self._parse_coordinates(coords_trailer)
> File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", > line 202, in _parse_coordinates > self._handle_PDB_exception(message, global_line_counter) > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", > line 256, in _handle_PDB_exception > % message, PDBConstructionWarning) > File "test_PDB.py", line 53, in showwarning > all_warns.append(*args[0]) > TypeError: append() argument after * must be a sequence, not > PDBConstructionWarning Eric fixes this one, thanks: http://github.com/biopython/biopython/commit/f4917021cbb8a4ed4cc72dc50a2abf0066da7131 > ====================================================================== > ERROR: test_ExposureCN (__main__.Exposure) > HSExposureCN. > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "test_PDB.py", line 612, in setUp > structure=PDBParser(PERMISSIVE=True).get_structure('X', pdb_filename) > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", > line 64, in get_structure > self._parse(file.readlines()) > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", > line 84, in _parse > self.trailer=self._parse_coordinates(coords_trailer) > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", > line 200, in _parse_coordinates > fullname, serial_number, element) > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", > line 185, in init_atom > duplicate_atom=residue[name] > TypeError: 'DisorderedResidue' object is not subscriptable This and the others like it remain. I haven't looked into what is wrong.
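The first traceback comes from a test that overrides `warnings.showwarning` and stores each warning with `all_warns.append(*args[0])`. The first argument passed to `showwarning` is the warning instance itself, so unpacking it with `*` fails on Python 3. It presumably worked on Python 2, where exception instances could be indexed like their `args` tuple. The sketch below shows one alternative way to collect warnings in a test, using the standard library's `warnings.catch_warnings(record=True)`. It is not necessarily what Eric's commit does. `ExampleConstructionWarning` and `parse_permissively` are hypothetical stand-ins for `PDBConstructionWarning` and `PDBParser`.

```python
import warnings
from typing import List


class ExampleConstructionWarning(UserWarning):
    """Hypothetical stand-in for a parser's 'problem found in permissive mode' warning."""


def parse_permissively(lines: List[str]) -> List[str]:
    """Hypothetical parser: warns about duplicate records instead of raising."""
    seen = set()
    kept = []
    for number, line in enumerate(lines, start=1):
        if line in seen:
            warnings.warn(
                "Duplicate record %r at line %i" % (line, number),
                ExampleConstructionWarning,
            )
            continue
        seen.add(line)
        kept.append(line)
    return kept


def collect_warnings(lines: List[str]) -> List[str]:
    """Run the parser and return the text of every construction warning it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # Make sure repeated warnings are not suppressed by the default filters.
        warnings.simplefilter("always")
        parse_permissively(lines)
    # Each entry is a WarningMessage; .message holds the warning instance itself.
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, ExampleConstructionWarning)
    ]
```

The `catch_warnings` context manager restores the global warning state on exit. This avoids the test patching `showwarning` and having to undo it.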
Peter From laserson at mit.edu Tue Aug 24 14:35:01 2010 From: laserson at mit.edu (Uri Laserson) Date: Tue, 24 Aug 2010 10:35:01 -0400 Subject: [Biopython-dev] IMGT parser (modified EMBL format), In-Reply-To: References: Message-ID: Hi all, I would obviously prefer it to go into the distribution as soon as it is possible, but I don't want to mess with the releases. The IMGT people said they'll put a news announcement on their site and a link to biopython once the code is in the official release. Uri On Tue, Aug 24, 2010 at 07:56, Peter wrote: > Hi all, > > The IMGT is the international ImMunoGeneTics information system, a global > reference in immunogenetics and immunoinformatics. They have a sequence > databases, genome database, structure database, and monoclonal antibodies > database. > > The IMGT use a variant of the EMBL flat file format with longer feature > indents: > http://imgt.cines.fr/download/LIGM-DB/userman_doc.html > http://imgt.cines.fr/download/LIGM-DB/ftable_doc.html > http://www.ebi.ac.uk/imgt/hla/docs/manual.html > > Uri and I have been working on extending the SeqIO EMBL/GenBank parser > and writer to support IMGT files too. This uncovered a number of data > formatting > issues (e.g. wrong sequence length in ID line, partial feature > locations) and Uri > has been liaising with the IMGT curators to address these. With their > latest > (Aug 2010) release, we can now parse the whole file without errors: > http://imgt.cines.fr/download/LIGM-DB/imgt.dat.Z > > I think this code is now ready to merge - comments welcome: > http://github.com/peterjc/biopython/commits/seqio-imgt > > Potentially we could even include this in Biopython 1.55, although it would > be more cautious not to add any new features between the beta and the > final release... 
> > Peter > -- Uri Laserson Graduate Student, Biomedical Engineering Harvard-MIT Division of Health Sciences and Technology M +1 917 742 8019 laserson at mit.edu From biopython at maubp.freeserve.co.uk Tue Aug 24 16:30:47 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Aug 2010 17:30:47 +0100 Subject: [Biopython-dev] Fwd: [blast-announce] Correction: BLAST 2.2.24 release announcement In-Reply-To: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov> References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov> Message-ID: Hi all, The NCBI have just released a new version of BLAST+ (see below). I've just updated the existing BLAST+ application wrappers for the minor changes made in BLAST 2.2.24+. Something potentially quite useful in this release is the blast_formatter command for turning ASN.1 BLAST+ output (using -outfmt 11) into any of the other output formats. i.e. If you are not sure what output format will be most useful (e.g. plain text, XML, tabular) and rerunning the BLAST is slow, the NCBI now let you run the BLAST once and save it as ASN.1, then convert this to any other format on demand using blast_formatter (which should be fast). We should write a command line wrapper for this new tool... Peter ---------- Forwarded message ---------- From: mcginnis Date: Tue, Aug 24, 2010 at 4:46 PM Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement To: NLM/NCBI List blast-announce A new version of the stand-alone applications is available.
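Peter suggests writing a command line wrapper for blast_formatter. The following is a minimal sketch, assuming blast_formatter accepts `-archive`, `-outfmt` and `-out`. Check `blast_formatter -help` for your BLAST+ version. The function names are hypothetical and are not a Biopython API. Only the argument builder is exercised without BLAST+ installed. Running the tool requires blast_formatter on the PATH.

```python
import subprocess
from typing import List, Optional


def blast_formatter_args(archive: str, outfmt: int, out: Optional[str] = None,
                         program: str = "blast_formatter") -> List[str]:
    """Build a command line that re-formats a BLAST archive (-outfmt 11) file."""
    args = [program, "-archive", archive, "-outfmt", str(outfmt)]
    if out is not None:
        args += ["-out", out]
    return args


def run_blast_formatter(archive: str, outfmt: int, out: Optional[str] = None) -> str:
    """Run blast_formatter and return anything it wrote to stdout."""
    result = subprocess.run(
        blast_formatter_args(archive, outfmt, out),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Assumes an archive was produced earlier with something like:
    #   blastp -query query.fasta -db nr -outfmt 11 -out results.asn
    print(" ".join(blast_formatter_args("results.asn", 5, "results.xml")))
```

The archive is produced once, so switching between pairwise text (0), XML (5) and tabular (6) output re-runs only the formatting step, not the search.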
Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ This release includes a number of bug fixes as well as new features for the BLAST+ applications: * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter (see BLAST+ user manual) * Added the blast_formatter application (see BLAST+ user manual) * Added support for translated subject soft masking in the BLAST databases * Added support for the BLAST Trace-back operations (btop) output format * Added command line options to blastdbcmd for listing available BLAST databases * Improved performance of formatting of remote BLAST searches * Use a consistent exit code for out of memory conditions * Fixed bug in indexed megablast with multiple space-separated BLAST databases * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb * Fixed Windows installer for 64-bit installations BLAST+ applications, as well as the legacy C applications (e.g. blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download From mjldehoon at yahoo.com Wed Aug 25 01:11:56 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 24 Aug 2010 18:11:56 -0700 (PDT) Subject: [Biopython-dev] Obsolete code In-Reply-To: Message-ID: <274902.34856.qm@web62407.mail.re1.yahoo.com> Tiago, Bartek, Peter, Thanks for your comments. Peter, is the main branch still frozen? I'd like to make these changes on Saturday Japanese time, so Friday night in Europe/US. > I'd be cautious about the Seq object given it will be > particularly widely used. I've commented on a few cases > below, but in general yes deprecation of things previously > marked as obsolete is sensible house keeping. For the Seq object, I think it's good to have a DeprecationWarning to make users aware of the changes. For example, I was not aware that Seq.data is obsolete.
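Michiel proposes adding a DeprecationWarning to the obsolete `.data` property. The sketch below shows one way a property can warn on access while still returning the value, so existing code keeps working. It uses a hypothetical `ExampleSeq` class, not Biopython's actual `Seq` code. Depending on the Python version and active warnings filters, DeprecationWarning may be hidden by default. Running with `python -W default` makes it visible.

```python
import warnings


class ExampleSeq:
    """Hypothetical sequence class (not Bio.Seq.Seq) with an obsolete .data attribute."""

    def __init__(self, data: str) -> None:
        self._data = data

    def __str__(self) -> str:
        return self._data

    @property
    def data(self) -> str:
        """Obsolete: use str(seq) instead."""
        # stacklevel=2 attributes the warning to the caller's line, not this property.
        warnings.warn(
            "ExampleSeq.data is deprecated; use str(seq) instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._data
```

Because the old attribute still works, users get a release cycle of warnings before the property is removed.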
> > Bio.Align.MultipleSeqAlignment.get_column > > Bio.Align.Generic.Alignment > > Bio.Align.Generic.Alignment.get_seq_by_num > > I'd like to make the new alignment object a bit more user > friendly before we deprecate these bits. OK I won't touch these then. > > Bio.Blast.Applications.FastacmdCommandline > > Bio.Blast.Applications.BlastallCommandline > > Bio.Blast.Applications.BlastpgpCommandline > > Bio.Blast.Applications.RpsBlastCommandline > > Bio.Blast.NCBIStandalone.blastall > > Bio.Blast.NCBIStandalone.blastpgp > > Bio.Blast.NCBIStandalone.rpsblast > > The NCBI are still supporting "legacy" BLAST so it is > probably a bit too early to deprecate these wrappers. > Maybe I'm being cautious but I'd leave this until the > next release in three months time or so. OK. > (Bio.Blast.NCBIStandalone has been declared obsolete, > but I guess some people may still be using the Blast > plain-text output parser.) > > Deprecation seems premature - the code is still useful and > probably widely used (I use it myself sometimes). Maybe once your > new BLAST parsing framework is ready Michiel? OK. > > Bio.Fasta > > Here my instinct is to be cautious given Bio.Fasta used to > be such a widely used module. Here also I think we should make our users aware of the changes, especially because it used to be widely used. --Michiel. From mjldehoon at yahoo.com Wed Aug 25 01:15:04 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 24 Aug 2010 18:15:04 -0700 (PDT) Subject: [Biopython-dev] Deprecated code In-Reply-To: Message-ID: <435550.4986.qm@web62403.mail.re1.yahoo.com> Well, here I think that we should be brave and remove the deprecated code, unless if any of the users are actively using it. If we leave deprecated code in too long then Biopython becomes a mess. --Michiel. 
--- On Sun, 8/22/10, Peter Cock wrote: > From: Peter Cock > Subject: Re: [Biopython-dev] Deprecated code > To: "Michiel de Hoon" > Cc: "biopython-dev at biopython.org" > Date: Sunday, August 22, 2010, 6:11 PM > On Saturday, August 21, 2010, Michiel > de Hoon > wrote: > > > > Dear all, > > > > Below are the modules and functions that were > deprecated (with a DeprecationWarning) in Biopython 1.51 or > earlier, which was released on August 17, 2009. Since that > is more than one year (and more than two releases) ago, we > can remove these from Biopython. Any objections? If not, > I'll send this list also to the user mailing list before > removing them. > > > > --Michiel. > > > > Since we didn't do this in the beta, I'd say leave these > for Biopython > 1.55 (about one week's time - so far so good) but then they > can go. An > email to the mail list would be sensible too. > > Peter > From updates at feedmyinbox.com Wed Aug 25 07:12:57 2010 From: updates at feedmyinbox.com (Feed My Inbox) Date: Wed, 25 Aug 2010 03:12:57 -0400 Subject: [Biopython-dev] 8/25 biopython Questions - BioStar Message-ID: // Restriction for automated primer design // August 24, 2010 at 12:39 PM http://biostar.stackexchange.com/questions/2239/restriction-for-automated-primer-design Hi, I am pretty new to BioPython, but I am trying to write a script that would allow the user to input a fasta-file and the multiple cloning site of the target vector. In my script e.g. pQCXIN is the MCS of a vector. I successfully feed digest the insert and get a list of enzymes that does not cut the insert. However, the list lost the order of the restriction sites defined in the beginning of my code (probably because dictionaries are not ordered??). But the order is essential for my primer design step, as I would obviously like to add a 5' RE to my 5' primer... 
so basically, how do I get the "no_cutter" as a list that has the same order as the input in the beginning of my code Any help, suggestions would be appreciated This is my code so far... from Bio.Seq import Seq from Bio.Alphabet.IUPAC import IUPACAmbiguousDNA from Bio.Restriction import * from Bio import SeqIO ##Define the MCS of several vectors this list will get populated while in use## pQCXIN = RestrictionBatch([NotI,AgeI,BsiWI,PacI,BamHI,EcoRI]) pUC19 = RestrictionBatch([HindIII,SphI,PstI,SalI,XbaI,BamHI,SmaI,KpnI,SacI,EcoRI]) ##prompt for vector and insert## sequence=raw_input('''select your sequence file in FASTA format: ''') vector=raw_input('''select your vector: ''') #print vector for seq in SeqIO.parse(sequence, "fasta"): Digest = Analysis(eval(vector), seq.seq, linear=True) print seq.id #Digest.print_as('map') #print Digest.print_that() no_cutters = list(Digest.without_site()) print no_cutters[1].site From biopython at maubp.freeserve.co.uk Wed Aug 25 08:29:11 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Aug 2010 09:29:11 +0100 Subject: [Biopython-dev] Obsolete code In-Reply-To: <274902.34856.qm@web62407.mail.re1.yahoo.com> References: <274902.34856.qm@web62407.mail.re1.yahoo.com> Message-ID: On Wed, Aug 25, 2010 at 2:11 AM, Michiel de Hoon wrote: > Tiago, Bartek, Peter, > > Thanks for your comments. > > Peter, is the main branch still frozen? I'd like to make these > changes on Saturday Japanese time, so Friday night in Europe/US. No - bug fixes, documentation are fine.
>> I'd be cautious about the Seq object given it will be >> particularly widely used. I've commented on a few cases >> below, but in general yes deprecation of things previously >> marked as obsolete is sensible house keeping. > > For the Seq object, I think it's good to have a DeprecationWarning > to make users aware of the changes. For example, I was not > aware that Seq.data is obsolete. I guess we have to do it at some point, so OK. >> > Bio.Align.MultipleSeqAlignment.get_column >> > Bio.Align.Generic.Alignment >> > Bio.Align.Generic.Alignment.get_seq_by_num >> >> I'd like to make the new alignment object a bit more user >> friendly before we deprecate these bits. > > OK I won't touch these then. > >> > Bio.Blast.Applications.FastacmdCommandline >> > Bio.Blast.Applications.BlastallCommandline >> > Bio.Blast.Applications.BlastpgpCommandline >> > Bio.Blast.Applications.RpsBlastCommandline >> > Bio.Blast.NCBIStandalone.blastall >> > Bio.Blast.NCBIStandalone.blastpgp >> > Bio.Blast.NCBIStandalone.rpsblast >> >> The NCBI are still supporting "legacy" BLAST so it is >> probably a bit too early to deprecate these wrappers. >> Maybe I'm being cautious but I'd leave this until the >> next release in three months time or so. > > OK. > >> (Bio.Blast.NCBIStandalone has been declared obsolete, >> but I guess some people may still be using the Blast >> plain-text output parser.) >> >> Deprecation seems premature - the code is still useful and >> probably widely used (I use it myself sometimes). Maybe >> once your new BLAST parsing framework is ready Michiel? > > OK. > >> > Bio.Fasta >> >> Here my instinct is to be cautious given Bio.Fasta used to >> be such a widely used module. > > Here also I think we should make our users aware of the > changes, especially because it used to be widely used. As you pointed out on the other thread, Bio.Fasta has already been declared deprecated. 
Peter From p.j.a.cock at googlemail.com Wed Aug 25 08:33:44 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 25 Aug 2010 09:33:44 +0100 Subject: [Biopython-dev] Deprecated code In-Reply-To: <435550.4986.qm@web62403.mail.re1.yahoo.com> References: <435550.4986.qm@web62403.mail.re1.yahoo.com> Message-ID: Michiel wrote: >>> Below are the modules and functions that were >>> deprecated (with a DeprecationWarning) in Biopython 1.51 or >>> earlier, which was released on August 17, 2009. Since that >>> is more than one year (and more than two releases) ago, we >>> can remove these from Biopython. Any objections? If not, >>> I'll send this list also to the user mailing list before >>> removing them. Peter wrote: >> Since we didn't do this in the beta, I'd say leave these for >> Biopython 1.55 (about one week's time - so far so good) >> but then they can go. An email to the mail list would be >> sensible too. Michiel de Hoon wrote: > Well, here I think that we should be brave and remove the > deprecated code, unless if any of the users are actively using > it. If we leave deprecated code in too long then Biopython > becomes a mess. OK then - send out a warning mail on the mail list, and if there are no objections you can remove these deprecated modules at the end of the week as you'd suggested. I'll then aim to do the Biopython 1.55 final release early next week. How's that? Peter From mjldehoon at yahoo.com Wed Aug 25 13:54:02 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 25 Aug 2010 06:54:02 -0700 (PDT) Subject: [Biopython-dev] Deprecated code In-Reply-To: Message-ID: <766351.23054.qm@web62403.mail.re1.yahoo.com> --- On Wed, 8/25/10, Peter Cock wrote: > OK then - send out a warning mail on the mail list, and if > there are no objections you can remove these deprecated modules > at the end of the week as you'd suggested. I'll then aim to > do the Biopython 1.55 final release early next week. How's > that? Sounds good! 
I just sent out a warning mail to the user mailing list. Best, --Michiel. From bugzilla-daemon at portal.open-bio.org Thu Aug 26 13:13:21 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 26 Aug 2010 09:13:21 -0400 Subject: [Biopython-dev] [Bug 3096] PPBuilder build_peptides bugs In-Reply-To: Message-ID: <201008261313.o7QDDLlY001012@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3096 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-26 09:13 EST ------- (In reply to comment #3) > Hi Peter, > > I managed to reproduce the problem without modifying _accept(). > Excellent - that should help. > > The output peptides should be: ['IHR','STGL'] not ['IHRXTGL'] in the current > version... > I agree that ['IHRXTGL'] is definitely wrong (you have convinced me this is a real bug). Chain A has residues: ILE, HIS, ARG, XLY, SER, THR, GLY, LEU. Sensible results are therefore ['IHRXSTGL'] if we include XLY as a modified amino acid, or ['IHR', 'STGL'] if we exclude XLY (which we probably should). Was XLY just an artificial example for this bug report? Looking at the original PDB file for 1BFE, it is a modified GLY where you have switched CA (alpha carbon) to the non-standard CX. > Residue XLY A 319 or X in the fourth position should not be included > since it doesn't have a CA atom. Instead the current version includes it and > removes the 'S' next to it, due to the same bug. One can get the right version > using the patch provided before. > > Whether the _accept is modified or not the bug remains. Also the user should > not be expected to also modify build_peptides() method whenever PPBuilder > _accept is modified since the accept variable in build_peptides isn't really a > local (private) variable: In line 277 this variable accept is referenced from > self.accept of PPBuilder.
> > http://www.biopython.org/DIST/docs/api/Bio.PDB.Polypeptide-pysrc.html > 277 accept=self._accept I'm assuming you mean the line "accept=self._accept" in the build_peptides method of the _PPBuilder class in Bio/PDB/Polypeptide.py (the line numbers have changed). If so, all that does is define a local variable within the scope of that method - it does not expose the method in any way. I don't understand what you mean here. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From tiagoantao at gmail.com Thu Aug 26 10:43:38 2010 From: tiagoantao at gmail.com (Tiago Antão) Date: Thu, 26 Aug 2010 11:43:38 +0100 Subject: [Biopython-dev] GTF (T not F) Message-ID: Hi, I've been noticing that there has been some work with GFF files around here. I've done a parser for GTF files ( http://mblab.wustl.edu/GTF22.html ) and I was wondering if someone would find interest in it? My knowledge of use cases of GTF/GFF is quite limited. I've done this to support reading Ensembl data in the context of supporting my work with HapMap datasets (the related project is http://popgen.eu/soft/interPop/), but I really do not know the "big picture" of use cases. Anyway, I would be willing to donate the code if there is interest, and also to adapt it to support more general use cases. The code is available here http://bazaar.launchpad.net/~tiagoantao/interpopula/trunk/annotate/head%3A/src/interPopula/Ensembl/GTF.py But as you will notice it is wrapped in lots of SQL stuff (which would have to be removed/adapted). I could remove my SQL fluff and just produce a simple parser if somebody would tell me how the design should be done to support more general use cases. The format is not very complex, anyway. Tiago -- "If you want to get laid, go to college. If you want an education, go to the library."
- Frank Zappa From biopython at maubp.freeserve.co.uk Thu Aug 26 13:52:16 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 26 Aug 2010 14:52:16 +0100 Subject: [Biopython-dev] GTF (T not F) In-Reply-To: References: Message-ID: 2010/8/26 Tiago Antão : > Hi, > > I've been noticing that there has been some work with GFF files around here. > I've done a parser for GTF files ( http://mblab.wustl.edu/GTF22.html ) > and I was wondering if someone would find interest in it? > My knowledge of use cases of GTF/GFF is quite limited. I think Brad can comment, but as I understand it GTF is part of the GFF family, and he was going to support this as well as vague GFF and GFF3. Peter From bugzilla-daemon at portal.open-bio.org Thu Aug 26 16:30:00 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 26 Aug 2010 12:30:00 -0400 Subject: [Biopython-dev] [Bug 3096] PPBuilder build_peptides bugs In-Reply-To: Message-ID: <201008261630.o7QGU07U009778@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3096 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-26 12:30 EST ------- Hi Siong, Can you test this branch? I've made a change based on your suggestion: http://github.com/peterjc/biopython/tree/bug3096 Currently there is just this one commit: http://github.com/peterjc/biopython/commit/d65d2f4dfbedffa2847db0a37984c354586b4cb8 If you don't have git installed, or are not familiar with it, you can just download the modified file Bio/PDB/Polypeptide.py from here: http://github.com/peterjc/biopython/raw/d65d2f4dfbedffa2847db0a37984c354586b4cb8/Bio/PDB/Polypeptide.py Thanks, Peter
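The splitting rule discussed in bug 3096 can be illustrated with a simplified, library-free sketch. It is not the Bio.PDB.Polypeptide code: residues are assumed to be plain (residue name, set of atom names) pairs, accept() and build_peptides() here are hypothetical stand-ins, and the peptide-bond distance check a real polypeptide builder performs is left out. A residue is accepted only if it is one of the 20 standard amino acids and has a CA atom, so XLY (the modified GLY whose CA was renamed CX) closes the current peptide instead of being merged into it.

```python
# Hypothetical residue representation: (three-letter name, set of atom names).
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def accept(residue):
    """Accept a residue only if it is a standard amino acid with a CA atom."""
    name, atoms = residue
    return name in THREE_TO_ONE and "CA" in atoms


def build_peptides(residues):
    """Split residues into runs of consecutive accepted residues.

    A rejected residue only ends the current run; it is never added to a
    peptide and never consumes its neighbour. (No C-N distance check here.)
    """
    peptides = []
    current = []
    for residue in residues:
        if accept(residue):
            current.append(THREE_TO_ONE[residue[0]])
        else:
            if current:
                peptides.append("".join(current))
            current = []
    if current:
        peptides.append("".join(current))
    return peptides


backbone = {"N", "CA", "C", "O"}
chain_a = [
    ("ILE", backbone), ("HIS", backbone), ("ARG", backbone),
    ("XLY", {"N", "CX", "C", "O"}),  # modified GLY: CA renamed to CX
    ("SER", backbone), ("THR", backbone), ("GLY", backbone),
    ("LEU", backbone),
]
print(build_peptides(chain_a))  # ['IHR', 'STGL']
```

Because rejected residues only close the current run, SER survives at the start of the second peptide, giving ['IHR', 'STGL'] rather than 'IHRXTGL'.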
From biopython at maubp.freeserve.co.uk Thu Aug 26 17:37:06 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 26 Aug 2010 18:37:06 +0100 Subject: [Biopython-dev] Trunk freeze for Biopython 1.55 (beta) In-Reply-To: References: Message-ID: On Wed, Aug 18, 2010 at 10:03 PM, Peter wrote: > > As mentioned earlier, epydoc is done, and I've also just done a news post: > http://news.open-bio.org/news/2010/08/biopython-1-55-beta-released/ > > If there are any typos or other suggestions for improvement, please tell > us. We can edit that page - and then turn it into an email to send out. > > This means the "trunk freeze" is over, but for the next week or so when > we'll do the official release, let's focus on documentation and any bug fixes. > [Keep new feature work only on branches please.] > As discussed here, the plan is to do the final release on Monday or Tuesday (30 or 31 August 2010), after a few deprecations/removals are done: http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008194.html http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008196.html Peter From bugzilla-daemon at portal.open-bio.org Thu Aug 26 17:38:19 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 26 Aug 2010 13:38:19 -0400 Subject: [Biopython-dev] [Bug 3109] Record class in Bio.SCOP.Cla has hierarchy member as list instead of dictionary In-Reply-To: Message-ID: <201008261738.o7QHcJZP012135@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3109 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-26 13:38 EST ------- After Biopython 1.55 final is out I'll look at merging this. Thanks.
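Bug 3109 is about how the last column of a SCOP classification (cla) record is stored. As a rough sketch, assuming the usual tab-separated dir.cla.scop.txt layout where that column reads like cl=46456,cf=46457,...,px=14982, a dictionary keyed by level code lets callers look up a level directly instead of scanning a list of pairs. The helper parse_cla_hierarchy() and the example line are illustrative only, not Biopython's implementation or API.

```python
def parse_cla_hierarchy(field):
    """Turn 'cl=46456,cf=46457,...' into {'cl': 46456, 'cf': 46457, ...}.

    Hypothetical helper: keys are SCOP level codes, values are sunids.
    """
    hierarchy = {}
    for item in field.split(","):
        key, value = item.split("=")
        hierarchy[key] = int(value)
    return hierarchy


# Example line in the assumed tab-separated cla layout:
# sid, PDB id, residues, sccs, sunid, hierarchy
line = ("d1dlwa_\t1dlw\tA:\ta.1.1.1\t14982\t"
        "cl=46456,cf=46457,sf=46458,fa=46459,dm=46460,sp=46461,px=14982")
sid, pdbid, residues, sccs, sunid, hierarchy_field = line.split("\t")
hierarchy = parse_cla_hierarchy(hierarchy_field)
print(hierarchy["sf"])  # 46458
```

A list of (level, sunid) pairs keeps the file's ordering but forces a linear search for every lookup. Since each level code appears once per line, and the levels follow a fixed sequence (cl, cf, sf, fa, dm, sp, px), a dictionary loses nothing a caller would normally need.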
From chapmanb at 50mail.com Fri Aug 27 12:06:57 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 27 Aug 2010 08:06:57 -0400 Subject: [Biopython-dev] GTF (T not F) In-Reply-To: References: Message-ID: <20100827120657.GC23299@sobchak.mgh.harvard.edu> Tiago; > I've been noticing that there has been some work with GFF files around here. > I've done a parser for GTF files ( http://mblab.wustl.edu/GTF22.html ) > and I was wondering if someone would find interest in it? The GFF parser should parse the GTF variant as well: http://github.com/chapmanb/bcbb/tree/master/gff/ If it is having trouble on any specific files please send them along and I'll be happy to have a look. > My knowledge of use cases of GTF/GFF is quite limited. I've done this > to support reading Ensembl data in the context of supporting my work > with HapMap datasets (The related project is this: > http://popgen.eu/soft/interPop/ ) , but I really do not know the "big > picture" of use cases. This looks like you've specialized the extraction to this particular type of GFF, which could be useful for folks dealing with the same specific files you are. The GFF parser is more general and returns Biopython SeqFeature objects, so you could use it to actually do the parse part, and then provide your specific extraction and storage on top of that. Brad From tiagoantao at gmail.com Fri Aug 27 12:33:35 2010 From: tiagoantao at gmail.com (Tiago Antão) Date: Fri, 27 Aug 2010 13:33:35 +0100 Subject: [Biopython-dev] GTF (T not F) In-Reply-To: <20100827120657.GC23299@sobchak.mgh.harvard.edu> References: <20100827120657.GC23299@sobchak.mgh.harvard.edu> Message-ID: 2010/8/27 Brad Chapman : > Tiago; > >> I've been noticing that there has been some work with GFF files around here. >> I've done a parser for GTF files ( http://mblab.wustl.edu/GTF22.html ) >> and I was wondering if someone would find interest in it? > > The GFF parser should parse the GTF variant as well: OK then.
I really did not know if there was any GTF support. I have a specific case with a target use case (help processing HapMap data). When your code is included in biopython, I think I will just deprecate mine in favour of using your more general solution. In this case I see no good reason to maintain 2 separate implementations (and my core functionality is really HapMap related). Tiago From mjldehoon at yahoo.com Sat Aug 28 01:41:04 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 27 Aug 2010 18:41:04 -0700 (PDT) Subject: [Biopython-dev] Test suite failure Message-ID: <829004.66089.qm@web62403.mail.re1.yahoo.com> Dear all, I am getting the errors below when running the Biopython tests on Mac OS X. This is with blast+ 2.2.23. The blast+ 2.2.24 installer fails on Mac OS X, so I don't know if the same problem would occur with that version. --Michiel ====================================================================== ERROR: test_blastn (test_NCBI_BLAST_tools.CheckCompleteArgList) Check all blastn arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 220, in test_blastn self.check("blastn", Applications.NcbiblastnCommandline) File "test_NCBI_BLAST_tools.py", line 204, in check ",".join(sorted(missing)))) MissingExternalDependencyError: BLAST+ and Biopython out of sync. Your version of the NCBI BLAST+ tool blastn does not match what we are expecting. Please update your copy of Biopython, or report this issue if you are already using the latest version. 
(Exta args: -seqidlist; Missing: ) ====================================================================== ERROR: test_blastp (test_NCBI_BLAST_tools.CheckCompleteArgList) Check all blastp arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 216, in test_blastp self.check("blastp", Applications.NcbiblastpCommandline) File "test_NCBI_BLAST_tools.py", line 204, in check ",".join(sorted(missing)))) MissingExternalDependencyError: BLAST+ and Biopython out of sync. Your version of the NCBI BLAST+ tool blastp does not match what we are expecting. Please update your copy of Biopython, or report this issue if you are already using the latest version. (Exta args: -seqidlist; Missing: ) ====================================================================== ERROR: test_blastx (test_NCBI_BLAST_tools.CheckCompleteArgList) Check all blastx arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 212, in test_blastx self.check("blastx", Applications.NcbiblastxCommandline) File "test_NCBI_BLAST_tools.py", line 204, in check ",".join(sorted(missing)))) MissingExternalDependencyError: BLAST+ and Biopython out of sync. Your version of the NCBI BLAST+ tool blastx does not match what we are expecting. Please update your copy of Biopython, or report this issue if you are already using the latest version. 
(Exta args: -seqidlist; Missing: ) ====================================================================== ERROR: test_psiblast (test_NCBI_BLAST_tools.CheckCompleteArgList) Check all psiblast arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 232, in test_psiblast self.check("psiblast", Applications.NcbipsiblastCommandline) File "test_NCBI_BLAST_tools.py", line 204, in check ",".join(sorted(missing)))) MissingExternalDependencyError: BLAST+ and Biopython out of sync. Your version of the NCBI BLAST+ tool psiblast does not match what we are expecting. Please update your copy of Biopython, or report this issue if you are already using the latest version. (Exta args: -seqidlist; Missing: ) ====================================================================== ERROR: test_rpsblast (test_NCBI_BLAST_tools.CheckCompleteArgList) Check all rpsblast arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 236, in test_rpsblast self.check("rpsblast", Applications.NcbirpsblastCommandline) File "test_NCBI_BLAST_tools.py", line 204, in check ",".join(sorted(missing)))) MissingExternalDependencyError: BLAST+ and Biopython out of sync. Your version of the NCBI BLAST+ tool rpsblast does not match what we are expecting. Please update your copy of Biopython, or report this issue if you are already using the latest version. 
(Exta args: -seqidlist; Missing: ) ====================================================================== ERROR: test_rpstblastn (test_NCBI_BLAST_tools.CheckCompleteArgList) Check all rpstblastn arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 240, in test_rpstblastn self.check("rpstblastn", Applications.NcbirpstblastnCommandline) File "test_NCBI_BLAST_tools.py", line 204, in check ",".join(sorted(missing)))) MissingExternalDependencyError: BLAST+ and Biopython out of sync. Your version of the NCBI BLAST+ tool rpstblastn does not match what we are expecting. Please update your copy of Biopython, or report this issue if you are already using the latest version. (Exta args: -seqidlist; Missing: ) ====================================================================== ERROR: test_tblastn (test_NCBI_BLAST_tools.CheckCompleteArgList) Check all tblastn arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 228, in test_tblastn self.check("tblastn", Applications.NcbitblastnCommandline) File "test_NCBI_BLAST_tools.py", line 204, in check ",".join(sorted(missing)))) MissingExternalDependencyError: BLAST+ and Biopython out of sync. Your version of the NCBI BLAST+ tool tblastn does not match what we are expecting. Please update your copy of Biopython, or report this issue if you are already using the latest version. 
(Exta args: -db_soft_mask,-seqidlist; Missing: ) ====================================================================== ERROR: test_tblastx (test_NCBI_BLAST_tools.CheckCompleteArgList) Check all tblastx arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 224, in test_tblastx self.check("tblastx", Applications.NcbitblastxCommandline) File "test_NCBI_BLAST_tools.py", line 204, in check ",".join(sorted(missing)))) MissingExternalDependencyError: BLAST+ and Biopython out of sync. Your version of the NCBI BLAST+ tool tblastx does not match what we are expecting. Please update your copy of Biopython, or report this issue if you are already using the latest version. (Exta args: -db_soft_mask,-seqidlist; Missing: ) ---------------------------------------------------------------------- From mjldehoon at yahoo.com Sat Aug 28 02:53:19 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 27 Aug 2010 19:53:19 -0700 (PDT) Subject: [Biopython-dev] Obsolete code In-Reply-To: Message-ID: <992835.52804.qm@web62406.mail.re1.yahoo.com> I applied your patch to Bio.Motif, and added DeprecationWarnings to each of the submodules of Bio.AlignAce (without them, importing one of the submodules directly did not issue the DeprecationWarning in Bio.AlignAce.__init__). Thanks, --Michiel. --- On Sat, 8/21/10, Bartek Wilczynski wrote: From: Bartek Wilczynski Subject: Re: [Biopython-dev] Obsolete code To: "Michiel de Hoon" Cc: biopython-dev at biopython.org Date: Saturday, August 21, 2010, 4:56 AM Hi, Great job. I just have a small comment on the Bio.AlignAce.Parser module. It is a part of the Bio.AlignAce "package" which was already deprecated (see Bio.AlignAce.__init__.py). I think either there is no need to put an extra deprecation warning in Bio.AlignAce.Parser, or we should put it in all submodules of Bio.AlignAce (like Bio.AlignAce.Motif, etc.).
As for deprecating the old parsers in Bio.Motif, I've just looked through the code in Motif.__init__.py and realized that it was still using the old implementations... This small patch below should fix it (I don't want to push onto the main branch now, as it's frozen). Cheers Bartek
diff --git a/Bio/Motif/__init__.py b/Bio/Motif/__init__.py
index 4c7c19a..9ca0200 100644
--- a/Bio/Motif/__init__.py
+++ b/Bio/Motif/__init__.py
@@ -10,12 +10,12 @@ as well as methods for motif comparisons and motif searching in sequences.
 It also inlcudes functionality for parsing AlignACE and MEME programs
 """
 from _Motif import Motif
-from Parsers.AlignAce import AlignAceParser, CompareAceParser
-from Parsers.MEME import MEMEParser,MASTParser
+import Parsers.AlignAce
+import Parsers.MEME
 from Thresholds import ScoreDistribution
 
-_parsers={"AlignAce":AlignAceParser,
-          "MEME":MEMEParser
+_parsers={"AlignAce":Parsers.AlignAce.read,
+          "MEME":Parsers.MEME.read
           }
 
 def _from_pfm(handle):
@@ -75,7 +75,7 @@ def parse(handle,format):
         else: #we have a proper reader
             yield reader(handle)
     else: # we have a proper reader
-        for m in parser().parse(handle).motifs:
+        for m in parser(handle).motifs:
             yield m
 
 def read(handle,format):
On Sat, Aug 21, 2010 at 8:30 AM, Michiel de Hoon wrote: Dear all, The classes and modules listed below were declared obsolete in Biopython 1.54 or earlier, but do not yet raise a deprecation warning. Most of this functionality moved to a different module or was implemented differently. I suggest we add a DeprecationWarning to each of these before Biopython 1.55 final. The only tricky one is the .data property of Seq classes in Bio.Seq. Still, it might be good to add a DeprecationWarning there to make people aware that this property is obsolete. Any objections? --Michiel.
Bio.CelFile.CelParser Bio.CelFile.CelScanner Bio.CelFile.CelConsumer Bio.CelFile.CelRecord Bio.Align.MultipleSeqAlignment.get_column Bio.Align.Generic.Alignment Bio.Align.Generic.Alignment.get_seq_by_num Bio.AlignAce.Parser Bio.Blast.Applications.FastacmdCommandline Bio.Blast.Applications.BlastallCommandline Bio.Blast.Applications.BlastpgpCommandline Bio.Blast.Applications.RpsBlastCommandline Bio.Blast.NCBIStandalone.blastall Bio.Blast.NCBIStandalone.blastpgp Bio.Blast.NCBIStandalone.rpsblast (Bio.Blast.NCBIStandalone has been declared obsolete, but I guess some people may still be using the Blast plain-text output parser.) Bio.Clustalw Bio.Compass._Scanner Bio.Compass._Consumer Bio.Compass.RecordParser Bio.Compass.Iterator Bio.Emboss.Applications.EProtDistCommandline Bio.Emboss.Applications.ENeighborCommandline Bio.Emboss.Applications.EProtParsCommandline Bio.Emboss.Applications.EConsenseCommandline Bio.Emboss.Applications.ESeqBootCommandline Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer.xcentre Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer._set_xcentre Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer.ycentre Bio.Graphics.GenomeDiagram._AbstractDrawer.AbstractDrawer._set_ycentre Bio.Graphics.GenomeDiagram._Graph.centre Bio.Graphics.GenomeDiagram._Graph._set_centre Bio.Motif.Parsers.AlignAce.AlignAceConsumer Bio.Motif.Parsers.AlignAce.AlignAceParser Bio.Motif.Parsers.AlignAce.AlignAceScanner Bio.Motif.Parsers.AlignAce.CompareAceScanner Bio.Motif.Parsers.AlignAce.CompareAceConsumer Bio.Motif.Parsers.MEME.MEMEParser Bio.Motif.Parsers.MEME._MEMEScanner Bio.Motif.Parsers.MEME._MEMEConsumer Bio.Motif.Parsers.MEME._MASTConsumer Bio.Motif.Parsers.MEME.MASTParser Bio.Motif.Parsers.MEME._MASTScanner Bio.Motif.Parsers.MEME.MASTRecord Bio.PopGen.FDist.RecordParser Bio.PopGen.FDist._Scanner Bio.PopGen.FDist._RecordConsumer Bio.Seq.Seq.data Bio.SeqUtils.GC_Frame Bio.SeqUtils.fasta_uniqids Bio.SeqUtils.apply_on_multi_fasta 
Bio.SeqUtils.quicker_apply_on_multi_fasta Bio.UniGene.UnigeneSequenceRecord Bio.UniGene.UnigeneProtsimRecord Bio.UniGene.UnigeneSTSRecord Bio.UniGene.UnigeneRecord Bio.UniGene._RecordConsumer Bio.UniGene._Scanner Bio.UniGene.RecordParser Bio.UniGene.Iterator -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From mjldehoon at yahoo.com Sat Aug 28 03:21:26 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 27 Aug 2010 20:21:26 -0700 (PDT) Subject: [Biopython-dev] Obsolete code Message-ID: <380077.17575.qm@web62401.mail.re1.yahoo.com> > (without them, importing one of the submodules directly did not issue > the DeprecationWarning in Bio.AlignAce.__init__). I take that back ... it turned out that Python 2.7 silences DeprecationWarnings by default. These warnings can be switched on by the -Wd flag when starting Python. For Biopython 1.56, we should replace DeprecationWarnings by a Biopython-specific warning class. --Michiel. From biopython at maubp.freeserve.co.uk Sat Aug 28 11:33:50 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 28 Aug 2010 12:33:50 +0100 Subject: [Biopython-dev] Test suite failure In-Reply-To: <829004.66089.qm@web62403.mail.re1.yahoo.com> References: <829004.66089.qm@web62403.mail.re1.yahoo.com> Message-ID: On Sat, Aug 28, 2010 at 2:41 AM, Michiel de Hoon wrote: > Dear all, > > I am getting the errors below when running the Biopython tests on > Mac OS X. This is with blast+ 2.2.23. The blast+ 2.2.24 installer > fails on Mac OS X, so I don't know if the same problem would > occur with that version. > > --Michiel My fault, that's a new argument added in 2.2.24+ which shouldn't be expected on older versions. I'll fix that (probably Monday).
Peter From mjldehoon at yahoo.com Sat Aug 28 12:21:31 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 28 Aug 2010 05:21:31 -0700 (PDT) Subject: [Biopython-dev] Deprecated code In-Reply-To: <207402.89678.qm@web62407.mail.re1.yahoo.com> Message-ID: <198514.66980.qm@web62407.mail.re1.yahoo.com> I finished adding the deprecation warnings and removing deprecated code as discussed, except for these three: Bio.Transcribe Bio.Translate BioSQL.BioSeqDatabase.open_database: driver="psycopg" option For the last one, I wasn't sure how to appropriately remove this option from the code. Maybe somebody more familiar with BioSQL can take care of this? For Bio.Transcribe and Bio.Translate, it turned out that Bio/Encodings/IUPACEncoding.py still uses these modules. I don't know if Bio.Encodings.IUPACEncoding is still being used. It's only imported from Bio.Alphabet.IUPAC, but it doesn't seem to be actually used there. Bio.Encodings itself is not being imported anywhere in Biopython. Can we declare Bio.Encodings obsolete? Or can we just remove this module together with Bio.Transcribe, Bio.Translate? --Michiel. --- On Sat, 8/21/10, Michiel de Hoon wrote: > From: Michiel de Hoon > Subject: [Biopython-dev] Deprecated code > To: biopython-dev at biopython.org > Date: Saturday, August 21, 2010, 2:56 AM > Dear all, > > Below are the modules and functions that were deprecated > (with a DeprecationWarning) in Biopython 1.51 or earlier, > which was released on August 17, 2009. Since that is more > than one year (and more than two releases) ago, we can > remove these from Biopython. Any objections? If not, I'll > send this list also to the user mailing list before removing > them. > > --Michiel.
>
> Bio.Align.FormatConvert
> Bio.Emboss.Applications.PrimerSearchCommandline.set_parameter
> Bio.Entrez.efetch: rettype="genbank" option
> Bio.Fasta
> Bio.SCOP.Dom.Parser
> Bio.SwissProt.SProt
> Bio.Transcribe
> Bio.Translate
> BioSQL.BioSeqDatabase.open_database: driver="psycopg" option

From biopython at maubp.freeserve.co.uk Sat Aug 28 13:18:14 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Aug 2010 14:18:14 +0100
Subject: [Biopython-dev] Deprecated code
In-Reply-To: <198514.66980.qm@web62407.mail.re1.yahoo.com>
References: <207402.89678.qm@web62407.mail.re1.yahoo.com> <198514.66980.qm@web62407.mail.re1.yahoo.com>
Message-ID:

On Sat, Aug 28, 2010 at 1:21 PM, Michiel de Hoon wrote:
> I finished adding the deprecation warnings and removing deprecated
> code as discussed, except for these three:
>
> Bio.Transcribe
> Bio.Translate
> BioSQL.BioSeqDatabase.open_database: driver="psycopg" option
>
> For last one, I wasn't sure how to appropriately remove this option from
> the code. Maybe somebody more familiar with BioSQL can take care of this?

I could do that - or maybe Cymon if he has time.

> For Bio.Transcribe and Bio.Translate, it turned out that Bio/Encodings
> /IUPACEncoding.py still use these modules. I don't know if
> Bio.Encodings.IUPACEncoding is still being used. It's only imported
> from Bio.Alphabet.IUPAC, but it doesn't seem to be actually used there.
> Bio.Encodings itself is not being imported anywhere in Biopython.
>
> Can we declare Bio.Encodings obsolete? Or can we just remove this
> module together with Bio.Transcribe, Bio.Translate?
This is also tied in with Bio.PropertyManager and thus Bio.utils - it is probably best to mark these and Bio.Encodings as obsolete, leave Bio.Transcribe and Bio.Translate as deprecated, and review this after Biopython 1.55 is out.

Peter

From mjldehoon at yahoo.com Sat Aug 28 14:19:53 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 28 Aug 2010 07:19:53 -0700 (PDT)
Subject: [Biopython-dev] Deprecated code
In-Reply-To:
Message-ID: <258810.52757.qm@web62408.mail.re1.yahoo.com>

--- On Sat, 8/28/10, Peter wrote:

> This is also tied in with Bio.PropertyManager and thus
> Bio.utils - it is probably best to mark these and Bio.Encodings
> as obsolete, leave Bio.Transcribe and Bio.Translate as deprecated,
> and review this after Biopython 1.55 is out.

OK, done. I also applied Nathan's suggested fix for Bio.Entrez.

--Michiel.

From biopython at maubp.freeserve.co.uk Sat Aug 28 14:36:40 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Aug 2010 15:36:40 +0100
Subject: [Biopython-dev] Test suite failure
In-Reply-To:
References: <829004.66089.qm@web62403.mail.re1.yahoo.com>
Message-ID:

On Sat, Aug 28, 2010 at 12:33 PM, Peter wrote:
> On Sat, Aug 28, 2010 at 2:41 AM, Michiel de Hoon wrote:
>> Dear all,
>>
>> I am getting the errors below when running the Biopython tests on
>> Mac OS X. This is with blast+ 2.2.23. The blast+ 2.2.24 installer
>> fails on Mac OS X, so I don't know if the same problem would
>> occur with that version.
>>
>> --Michiel
>
> My fault, that's a new argument added in 2.2.24+ which shouldn't
> be expected on older versions. I'll fix that (probably Monday).
>

Done, http://github.com/biopython/biopython/commit/176c277deca23657980001813f8f5315b52eb679

Peter

From biopython at maubp.freeserve.co.uk Mon Aug 30 13:34:59 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Aug 2010 14:34:59 +0100
Subject: [Biopython-dev] Trunk freeze for Biopython 1.55
Message-ID:

On Thu, Aug 26, 2010 at 6:37 PM, Peter wrote:
> On Wed, Aug 18, 2010 at 10:03 PM, Peter wrote:
>>
>> As mentioned earlier, epydoc is done, and I've also just done a news post:
>> http://news.open-bio.org/news/2010/08/biopython-1-55-beta-released/
>>
>> If there are any typos or other suggestions for improvement, please tell
>> us. We can edit that page - and then turn it into an email to send out.
>>
>> This means the "trunk freeze" is over, but for the next week or so when
>> we'll do the official release, let's focus on documentation and any bug fixes.
>> [Keep new feature work only on branches please.]
>>
>
> As discussed here, the plan is to do the final release on Monday or Tuesday
> (30 or 31 August 2010), after a few deprecations/removals are done:
> http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008194.html
> http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008196.html
>

Hi all,

Those deprecations have been done, and the BLAST+ unit test tweaked.
Are there any further issues we need to address before doing the final
release? Please speak up soon, otherwise I'll do the release tonight or
tomorrow as planned.
Thanks,

Peter

From bugzilla-daemon at portal.open-bio.org Mon Aug 30 17:26:41 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Aug 2010 13:26:41 -0400
Subject: [Biopython-dev] [Bug 3134] New: to_networkx returns weird stuff
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=3134

           Summary: to_networkx returns weird stuff
           Product: Biopython
           Version: 1.55b
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: john at nurfuerspam.de

Hi,

I tried to read http://www.phylosoft.org/archaeopteryx/examples/data/multiple_supports.xml and convert it using to_networkx(). Strangely, all nodes in the resulting graph are named Clade, and when using networkx.write_dot() I get a file with a single clade node, although the number of nodes in the graph object is correct. Using networkx.to_agraph() does not help either.

from Bio import Phylo
import networkx

tree = Phylo.read("multiple_support.xml", "phyloxml")
tree = Phylo.to_networkx(tree)
print(set(tree.nodes()))
print(tree.number_of_nodes())
networkx.write_dot(tree, "test.dot")
tree = networkx.to_agraph(tree)
tree.draw("tree.pdf", prog="dot")

For http://www.phylosoft.org/archaeopteryx/examples/data/bcl_2.xml I get a star tree with a single Clade node in the center and leaves labeled by gene names.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
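The collapse described in this report can be reproduced without Biopython, NetworkX or Graphviz. The sketch below uses a hypothetical `ToyClade` class whose `str()` falls back to the class name when a node is unnamed. It then mimics an exporter that identifies nodes only by their string form. The distinct node objects are all still there, but every unnamed internal node merges into one string key.

```python
class ToyClade:
    """Hypothetical stand-in for a tree node; str() falls back to the class name."""

    def __init__(self, name=None, clades=()):
        self.name = name
        self.clades = list(clades)

    def __str__(self):
        return self.name if self.name else type(self).__name__


def edges(clade):
    """Yield (parent, child) object pairs for every branch below `clade`."""
    for child in clade.clades:
        yield clade, child
        yield from edges(child)


def string_keyed_edges(root):
    """Mimic an exporter that identifies each node only by str(node)."""
    return {(str(parent), str(child)) for parent, child in edges(root)}


tree = ToyClade(clades=[
    ToyClade(clades=[ToyClade("A"), ToyClade("B")]),
    ToyClade(clades=[ToyClade("C"), ToyClade("D")]),
])
object_nodes = {node for edge in edges(tree) for node in edge}
string_nodes = {node for edge in string_keyed_edges(tree) for node in edge}
print(len(object_nodes), len(string_nodes))  # 7 distinct objects, only 5 distinct strings
```

After the string conversion, all four leaves hang off a single `ToyClade` node, which also gains a self-loop. This is the same star shape described above for bcl_2.xml.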
From cy at cymon.org Mon Aug 30 17:36:22 2010
From: cy at cymon.org (Cymon Cox)
Date: Mon, 30 Aug 2010 18:36:22 +0100
Subject: [Biopython-dev] BioSQL test code on Postgres
Message-ID:

Hi Folks,

The current test code in test_BioSQL.py fails on PostgreSQL:

ERROR: Check list, keys, length etc
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/cymon/git/biopython-github-master/Tests/test_BioSQL.py", line 187, in test_get_db_items
    del db["non-existant-name"]
  File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 519, in __delitem__
    if key not in self:
  File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 539, in __contains__
    (self.dbid, value))[0])
  File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 423, in execute_and_fetch_col0
    self.execute(sql, args or ())
  File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 404, in execute
    self.dbutils.execute(self.cursor, sql, args)
  File "/home/cymon/git/biopython-github-master/BioSQL/DBUtils.py", line 33, in execute
    cursor.execute(sql, args or ())
DataError: invalid input syntax for integer: "non-existant-name"
LINE 1: ...M bioentry WHERE biodatabase_id=1 AND bioentry_id=E'non-exis...

This happens because, when trying to delete a bioentry_id given as a string, i.e. "non-existant-name" (line 188 of test_BioSQL.py), PostgreSQL throws an error rather than returning a long (0 or 1) as SQLite does (and presumably MySQL, though I haven't tried it).

Should we be type checking in __delitem__ (line 517) in BioSeqDatabase.py so that trying to delete a bioentry_id that is a string throws an appropriate error?

Otherwise the BioSQL tests pass on PostgreSQL.
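One way to address the type-checking question is to reject non-integer keys before any SQL is sent, so that every backend behaves the same. The sketch below shows the idea using the standard-library `sqlite3` module and a toy `bioentry` table. The function name and the two-column schema are assumptions made for illustration; this is not BioSQL's code. Note also that `sqlite3` uses `?` placeholders, whereas psycopg uses `%s`.

```python
import sqlite3


def contains_bioentry(conn, dbid, value):
    """Return True if `value` is a bioentry_id in the sub-database `dbid`.

    bioentry_id is an integer column, so a non-integer key can never match.
    Checking the type first avoids sending a query that some backends
    (PostgreSQL, per the report above) reject with an error rather than
    simply counting zero rows.
    """
    if not isinstance(value, int):
        return False
    cursor = conn.execute(
        "SELECT COUNT(bioentry_id) FROM bioentry "
        "WHERE biodatabase_id=? AND bioentry_id=?",
        (dbid, value),
    )
    return bool(cursor.fetchone()[0])


# Toy table with just the two columns the query needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, biodatabase_id INTEGER)")
conn.execute("INSERT INTO bioentry VALUES (1, 1)")
print(contains_bioentry(conn, 1, 1))                    # True
print(contains_bioentry(conn, 1, 2))                    # False
print(contains_bioentry(conn, 1, "non-existant-name"))  # False, no query is run
```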
The default DBDRIVER PostgreSQL driver in setup.py should be changed to "psycopg2".

Cheers, Cymon

From bugzilla-daemon at portal.open-bio.org Mon Aug 30 18:01:52 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Aug 2010 14:01:52 -0400
Subject: [Biopython-dev] [Bug 3134] to_networkx returns weird stuff
In-Reply-To:
Message-ID: <201008301801.o7UI1qaS024296@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3134

------- Comment #1 from eric.talevich at gmail.com 2010-08-30 14:01 EST -------
(In reply to comment #0)
> Hi,
>
> I tried to read
> http://www.phylosoft.org/archaeopteryx/examples/data/multiple_supports.xml
> and convert it using to_networkx(). Strangely, all nodes in the resulting graph
> are named Clade, and when using networkx.write_dot() I get a file with a single
> clade node, although the number of nodes in the graph object is correct.

Hello,

Yes, that's how it works. The exported graph uses Clade objects as nodes, and the string representation of unnamed nodes is just the name of the class, Clade. This should still work OK for most NetworkX operations, but not the Graphviz-based ones. The Graphviz-based operations in NetworkX convert the nodes to strings, and then assume all identical strings refer to the same node, so you'll get a star graph whenever the internal nodes are unnamed.

For drawing, try Biopython's Phylo.draw_graphviz instead -- it handles this naming issue safely:

>>> Phylo.draw_graphviz(my_tree, prog='neato')

You can fix the naming issue yourself by assigning unique names to all the internal clades:

>>> for i, clade in enumerate(my_tree.find_clades()):
...     if not clade.name:
...         clade.name = "Clade_%d" % i
...

Then networkx.write_dot should work better. Or, if you want to do something else involving Graphviz layout, you can look at the source for Phylo.draw_graphviz in the file Bio/Phylo/_utils.py.
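Here is a self-contained sketch of the renaming workaround above. It uses a hypothetical `ToyClade` class whose `find_clades()` is a plain preorder traversal; that is an assumption standing in for Bio.Phylo's method called without filters. Only clades whose `name` is empty are renamed, so leaves that already carry a species or gene name keep it.

```python
class ToyClade:
    """Hypothetical stand-in for a Bio.Phylo clade."""

    def __init__(self, name=None, clades=()):
        self.name = name
        self.clades = list(clades)

    def find_clades(self):
        """Yield this clade and all of its descendants in preorder."""
        yield self
        for child in self.clades:
            yield from child.find_clades()


def name_unnamed_clades(root, prefix="Clade_"):
    """Give every unnamed clade a name based on its preorder position."""
    for i, clade in enumerate(root.find_clades()):
        if not clade.name:
            clade.name = "%s%d" % (prefix, i)
    return root


tree = ToyClade(clades=[
    ToyClade(clades=[ToyClade("A"), ToyClade("B")]),
    ToyClade(clades=[ToyClade("C"), ToyClade("D")]),
])
name_unnamed_clades(tree)
print([clade.name for clade in tree.find_clades()])
# ['Clade_0', 'Clade_1', 'A', 'B', 'Clade_4', 'C', 'D']
```

The generated names are unique only as long as no existing name happens to match the `Clade_<n>` pattern. A different prefix can be passed if that is a risk.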
Is there anything else you'd like to see built into Bio.Phylo to make these operations easier?

Thanks,
-Eric

From biopython at maubp.freeserve.co.uk Mon Aug 30 18:21:43 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Aug 2010 19:21:43 +0100
Subject: [Biopython-dev] BioSQL test code on Postgres
In-Reply-To:
References:
Message-ID:

On Mon, Aug 30, 2010 at 6:36 PM, Cymon Cox wrote:
> Hi Folks,
>
> The current test code in test_BioSQL.py fails on PostgreSQL;
>
> ERROR: Check list, keys, length etc
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/cymon/git/biopython-github-master/Tests/test_BioSQL.py", line 187, in test_get_db_items
>     del db["non-existant-name"]
>   File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 519, in __delitem__
>     if key not in self:
>   File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 539, in __contains__
>     (self.dbid, value))[0])
>   File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 423, in execute_and_fetch_col0
>     self.execute(sql, args or ())
>   File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 404, in execute
>     self.dbutils.execute(self.cursor, sql, args)
>   File "/home/cymon/git/biopython-github-master/BioSQL/DBUtils.py", line 33, in execute
>     cursor.execute(sql, args or ())
> DataError: invalid input syntax for integer: "non-existant-name"
> LINE 1: ...M bioentry WHERE biodatabase_id=1 AND bioentry_id=E'non-exis...
>
> Because when trying to delete a bioentry_id that is a string type, ie.
> "non-existant-name" (line 188 on test_BioSQL.py), postgres throws an error
> rather than returning a long (0,1) as in sqlite (and presumably MySQL (I
> havent tried it)).

This (test_get_db_items) is one of the unit tests added since Biopython 1.54, while I was working on making the BioSQL objects act more like dictionaries. I think the SQL statements for the __contains__ method (and others added recently) may need single quotes round the %s placeholders. Does that work?

> Should we be type checking in __delitem__ (line 517) in BioSeqDatabase.py so
> that trying to delete a bioentry_id that is a string throws an appropriate
> error?
>
> Otherwise the BioSQL tests pass on PostGreSQL.
>
> The default DBDRIVER PostgreSQL driver in setup.py should be changed to
> "pyscopg2"
>
> Cheers, Cymon

Thanks,

Peter

From bugzilla-daemon at portal.open-bio.org Mon Aug 30 20:23:20 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Aug 2010 16:23:20 -0400
Subject: [Biopython-dev] [Bug 3134] to_networkx returns weird stuff
In-Reply-To:
Message-ID: <201008302023.o7UKNK7v029152@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3134

------- Comment #2 from john at nurfuerspam.de 2010-08-30 16:23 EST -------
thanx for the quick response!

the problem is that the standard way using pylab produces ugly squares instead of arrowheads in the final layout. but more importantly, I want to perform complex graph operations on the tree using networkx and use Bio.Phylo really just as a means of parsing ;-)

I think that when providing a function like to_networkx, it should behave in a manner the user of networkx expects.
Using the string representation of labeled leafs as identifiers in networkx is also dangerous, since they will be used as identifiers in graphviz and underly a number of restrictions (no whitespace etc.) I'd propose the following: in Clade, __repr__() should return the name of the node, if it has one, or a unique identifier like id() (the memory adress) with an additional "..." around them to make it a valid graphviz identifier.. def __repr__(self): if self.name != None: return self.name else: return "\""+str(id(self))+"\"" your workaround by manually relabeling the clades also assigns the identifiers to the leafs, but there of course I want the species/gene label ;-) cheers, john -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Aug 31 01:43:17 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 30 Aug 2010 21:43:17 -0400 Subject: [Biopython-dev] [Bug 3134] to_networkx returns weird stuff In-Reply-To: Message-ID: <201008310143.o7V1hHQH005250@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3134 ------- Comment #3 from eric.talevich at gmail.com 2010-08-30 21:43 EST ------- (In reply to comment #2) > thanx for the quick response! > > the problem is that the standard way using pylab produces ugly squares instead > of arrow head in the final layout. True. Do you know a way to fix that from NetworkX/matplotlib, or is that the whole reason you're exporting to Graphviz? > but more importantly, I want to perform > complex graph operations on the tree using networkx and use Bio.Phylo really > just as a means of parsing ;-) Great, that's what it's there for. :) > I think that when providing a function like to_networkx, it should behave in a > manner the user of networkx expects. 
> Why not just use a unique hashable
> identifier like integers as standard string representation for ALL nodes, and
> use graphviz'/networkx' label attribute for any name label the node might have?

OK, but wouldn't you want to be able to retrieve all of the original clade's data from any node in a networkx graph? Currently, the arrangement is:

- Clade objects are the hashable objects used for keys
- Given a node in a networkx graph produced by to_networkx, you can uniquely locate that clade in the original tree using the tree.find_* methods -- it's still a valid target, and duplicate names aren't a problem
- Other clade attributes, like taxonomy and bootstrap values, are also still available on the node
- Serializing the graph nodes for Graphviz goes haywire, so we provide draw_graphviz as a workaround

I think you're suggesting:

- Use id(clade) or some arbitrary unique integer as keys
- Attach the clade name, if available, to the networkx node as a label... right? How would I do this?
- To keep other clade attributes with the node, maybe add them to the optional dictionary associated with each node, like we already do for branch colors and widths
- At some point, generate a lookup table to associate the graph nodes' unique integer identifiers with the original clade objects -- or at least make this possible through another function
- Serializing for Graphviz will work cleanly

> Using the string representation of labeled leafs as identifiers in networkx is
> also dangerous, since they will be used as identifiers in graphviz and underly
> a number of restrictions (no whitespace etc.)

Indeed, and as you've seen, the strings need to be unique. One alternative is to mimic Python's default repr() style for representing complex classes: '' But then, switching to the string name where clades do have the 'name' attribute set would be inconsistent.
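Here is a rough, library-free sketch of the scheme outlined above. Every clade gets an arbitrary unique integer key, its name (if any) becomes a `label` attribute, and a lookup table maps keys back to the original clade objects. `ToyClade`, `to_keyed_graph` and `to_dot` are hypothetical names, and this is not how `to_networkx` is implemented. With NetworkX, the same data could be loaded into a graph through `add_node(key, label=...)` and `add_edge(parent, child)`.

```python
class ToyClade:
    """Hypothetical stand-in for a Bio.Phylo clade."""

    def __init__(self, name=None, clades=()):
        self.name = name
        self.clades = list(clades)


def to_keyed_graph(root):
    """Return (node_attrs, edges, lookup), keying every clade by a unique integer."""
    node_attrs, edges, lookup = {}, [], {}

    def visit(clade):
        key = len(lookup)  # next unused integer, assigned in preorder
        lookup[key] = clade
        node_attrs[key] = {"label": clade.name or ""}
        for child in clade.clades:
            edges.append((key, visit(child)))
        return key

    visit(root)
    return node_attrs, edges, lookup


def to_dot(node_attrs, edges):
    """Serialize to Graphviz DOT; bare integers are always valid node IDs."""
    lines = ["digraph tree {"]
    for key, attrs in sorted(node_attrs.items()):
        label = attrs["label"].replace('"', '\\"')  # escape quotes inside labels
        lines.append('  %d [label="%s"];' % (key, label))
    for parent, child in edges:
        lines.append("  %d -> %d;" % (parent, child))
    lines.append("}")
    return "\n".join(lines)


tree = ToyClade(clades=[
    ToyClade("Homo sapiens"),
    ToyClade(clades=[ToyClade("Mus musculus"), ToyClade("Rattus norvegicus")]),
])
node_attrs, edges, lookup = to_keyed_graph(tree)
print(to_dot(node_attrs, edges))
print(lookup[1].name)  # Homo sapiens
```

Because node identity lives in the integer key, duplicate or missing names no longer merge nodes. Because labels are quoted, names containing whitespace such as "Homo sapiens" pass through to Graphviz intact.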
> I'd propose the following: in Clade, __repr__() should return the name of the
> node, if it has one, or a unique identifier like id() (the memory adress) with
> an additional "..." around them to make it a valid graphviz identifier..
>
> def __repr__(self):
>     if self.name != None:
>         return self.name
>     else:
>         return "\""+str(id(self))+"\""

Remember that the NetworkX labels don't necessarily need to be the same as the string representation of clades in Bio.Phylo -- it's just convenient if they match. So __repr__ could be: While your function could be used to create labels in to_networkx.

Thanks for your help,
Eric

From cy at cymon.org Tue Aug 31 09:11:08 2010
From: cy at cymon.org (Cymon Cox)
Date: Tue, 31 Aug 2010 10:11:08 +0100
Subject: [Biopython-dev] BioSQL test code on Postgres
In-Reply-To:
References:
Message-ID:

Hi Peter,

On 30 August 2010 19:21, Peter wrote:

> On Mon, Aug 30, 2010 at 6:36 PM, Cymon Cox wrote:
> > Hi Folks,
> >
> > The current test code in test_BioSQL.py fails on PostgreSQL;
> >
> > ERROR: Check list, keys, length etc
> > ----------------------------------------------------------------------
> > Traceback (most recent call last):
> >   File "/home/cymon/git/biopython-github-master/Tests/test_BioSQL.py",
> > line 187, in test_get_db_items
> >     del db["non-existant-name"]
> >   File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py",
> > line 519, in __delitem__
> >     if key not in self:
> >   File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py",
> > line 539, in __contains__
> >     (self.dbid, value))[0])
> >   File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py",
> > line 423, in execute_and_fetch_col0
> >     self.execute(sql, args or ())
> >   File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py",
> > line 404, in execute
> >     self.dbutils.execute(self.cursor, sql, args)
> >   File "/home/cymon/git/biopython-github-master/BioSQL/DBUtils.py",
> > line 33, in execute
> >     cursor.execute(sql, args or ())
> > DataError: invalid input syntax for integer: "non-existant-name"
> > LINE 1: ...M bioentry WHERE biodatabase_id=1 AND bioentry_id=E'non-exis...
> >
> > Because when trying to delete a bioentry_id that is a string type, ie.
> > "non-existant-name" (line 188 on test_BioSQL.py), postgres throws an error
> > rather than returning a long (0,1) as in sqlite (and presumably MySQL (I
> > havent tried it)).
>
> This (test_get_db_items) is once the unit tests added since Biopython 1.54,
> while I was working on making the BioSQL objects act more like
> dictionaries.
> I think the SQL statements for the __contains__ method (and others added
> recently) may need single quotes round the %s placeholders. Does that work?
>

Nope. The bioentry_id parameter is already being passed as a string - psycopg automatically converts python objects into SQL literals (see http://initd.org/psycopg/docs/usage.html#the-problem-with-the-query-parameters ).

Here is the same error using the psql interface:

biosqldb=# select count(bioentry_id) from bioentry where biodatabase_id=1 and bioentry_id='non-existant';
ERROR: invalid input syntax for integer: "non-existant"
LINE 1: ...m bioentry where biodatabase_id=1 and bioentry_id='non-exist...
biosqldb=# \d bioentry;
                                      Table "public.bioentry"
     Column     |          Type          |                       Modifiers
----------------+------------------------+-------------------------------------------------------
 bioentry_id    | integer                | not null default nextval('bioentry_pk_seq'::regclass)

Cheers, Cymon

From biopython at maubp.freeserve.co.uk Tue Aug 31 10:38:37 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Aug 2010 11:38:37 +0100
Subject: [Biopython-dev] BioSQL test code on Postgres
In-Reply-To:
References:
Message-ID:

On Tue, Aug 31, 2010 at 10:11 AM, Cymon Cox wrote:
> Hi Peter,
>
> Nope. The bioentry_id parameter is already being passed as a string -
> psycopg automatically converts python objects into SQL literals (see
> http://initd.org/psycopg/docs/usage.html#the-problem-with-the-query-parameters
> ).
>
> Here is the same error using the psql interface:
>
> biosqldb=# select count(bioentry_id) from bioentry where biodatabase_id=1
> and bioentry_id='non-existant';
> ERROR: invalid input syntax for integer: "non-existant"
> ...
>
> Cheers, Cymon

I think I get it now - the bioentry_id is an integer (in all the schemas), and PostgreSQL throws an error due to the type mismatch (we are comparing it to a string) while MySQL and SQLite just return no matches. How's this?:

http://github.com/biopython/biopython/commit/050963bd3bbd6653101306eed9aab6c629cf9375

Peter

From cy at cymon.org Tue Aug 31 10:43:23 2010
From: cy at cymon.org (Cymon Cox)
Date: Tue, 31 Aug 2010 11:43:23 +0100
Subject: [Biopython-dev] BioSQL test code on Postgres
In-Reply-To:
References:
Message-ID:

Hi P.,

On 31 August 2010 11:38, Peter wrote:

> On Tue, Aug 31, 2010 at 10:11 AM, Cymon Cox wrote:
> > Hi Peter,
> >
> > Nope. The bioentry_id parameter is already being passed as a string -
> > psycopg automatically converts python objects into SQL literals (see
> >
> http://initd.org/psycopg/docs/usage.html#the-problem-with-the-query-parameters
> > ).
> >
> > Here is the same error using the psql interface:
> >
> > biosqldb=# select count(bioentry_id) from bioentry where biodatabase_id=1
> > and bioentry_id='non-existant';
> > ERROR: invalid input syntax for integer: "non-existant"
> > ...
> >
> > Cheers, Cymon
>
> I think I get it now - the bioentry_id is an integer (in all the schemas),
> and PostgreSQL throws an error due to the type mismatch (we are
> comparing it to a string) while MySQL and SQLite just return no
> matches. How's this?:
>
> http://github.com/biopython/biopython/commit/050963bd3bbd6653101306eed9aab6c629cf9375
>

Sure - nice and simple.

Or catching the exceptions, this'll work:

diff --git a/BioSQL/BioSeqDatabase.py b/BioSQL/BioSeqDatabase.py
index 45c0774..57d6ab9 100644
--- a/BioSQL/BioSeqDatabase.py
+++ b/BioSQL/BioSeqDatabase.py
@@ -533,9 +533,15 @@ class BioSeqDatabase:
         """Check if a primary (internal) id is this namespace (sub database)."""
         sql = "SELECT COUNT(bioentry_id) FROM bioentry " + \
               "WHERE biodatabase_id=%s AND bioentry_id=%s;"
-        return bool(self.adaptor.execute_and_fetch_col0(sql,
-                                                        (self.dbid, value))[0])
-
+        try:
+            return bool(self.adaptor.execute_and_fetch_col0(sql,(self.dbid, value))[0])
+        except (self.adaptor.conn.DataError,
+                self.adaptor.conn.DatabaseError), e:
+            if "invalid input syntax for integer" in e.__str__():
+                return False
+            else:
+                raise
+
     def __iter__(self):
         """Iterate over ids (which may not be meaningful outside this database)."""
         #TODO - Iterate over the cursor, much more efficient

With either correction, the test will pass with the PyGreSQL driver as well.

Cheers, C.

From biopython at maubp.freeserve.co.uk Tue Aug 31 11:07:33 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Aug 2010 12:07:33 +0100
Subject: [Biopython-dev] BioSQL test code on Postgres
In-Reply-To:
References:
Message-ID:

On Tue, Aug 31, 2010 at 11:43 AM, Cymon Cox wrote:
> Hi P.,
>
> Sure - nice and simple.
>
> Or catching the exceptions, this'll work:
> ...
>
> With either correction, the test will pass with the PyGreSQL driver as well.
>
> Cheers, C.

The exception approach looks fragile due to the error message check, so let's go with my commit - as you say, nice and simple.

Thanks for checking this :)

Peter

From biopython at maubp.freeserve.co.uk Tue Aug 31 17:06:15 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Aug 2010 18:06:15 +0100
Subject: [Biopython-dev] Trunk freeze for Biopython 1.55
In-Reply-To:
References:
Message-ID:

On Mon, Aug 30, 2010 at 2:34 PM, Peter wrote:
>
> Hi all,
>
> Those deprecations have been done, and the BLAST+ unit test tweaked.
> Are there any further issues we need to address before doing the final
> release? Please speak up soon, otherwise I'll do the release tonight or
> tomorrow as planned.

We've sorted out the BioSQL on PostgreSQL problem now:
http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008215.html

I'm starting the release process now - just as NumPy 1.5 is released (their first release to support Python 2.7) so I should be able to do Windows installers for Biopython on Python 2.7 :)

Peter

From biopython at maubp.freeserve.co.uk Tue Aug 31 18:14:41 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Aug 2010 19:14:41 +0100
Subject: [Biopython-dev] Trunk freeze for Biopython 1.55
In-Reply-To:
References:
Message-ID:

On Tue, Aug 31, 2010 at 6:06 PM, Peter wrote:
>
> I'm starting the release process now - just as NumPy 1.5 is released (their
> first release to support Python 2.7) so I should be able to do Windows
> installers for Biopython on Python 2.7 :)
>

Binaries are up - Brad, could you do a basic sanity test then upload to PyPI please. I really should sort out an account on there for myself...

I'll write up the announcement in an hour or two's time (other things to attend to first), unless anyone else would like to do it?
Peter

From chapmanb at 50mail.com Tue Aug 31 18:33:34 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 31 Aug 2010 14:33:34 -0400
Subject: [Biopython-dev] Trunk freeze for Biopython 1.55
In-Reply-To:
References:
Message-ID: <20100831183334.GA31194@sobchak.mgh.harvard.edu>

Peter;

> > I'm starting the release process now - just as NumPy 1.5 is released (their
> > first release to support Python 2.7) so I should be able to do Windows
> > installers for Biopython on Python 2.7 :)

Awesome. Thanks as always for all the hard work getting this together. Great to see a new release, and nice timing with NumPy.

> Binaries are up - Brad, could you do a basic sanity test then upload to
> PyPI please. I really should sort out an account on there for myself...

Done. It's dead easy to do, and if you want to set up an account on pypi and send me your username I can add you as an owner so you can upload them in the future if you want.

Thanks again,
Brad

From p.j.a.cock at googlemail.com Tue Aug 31 23:00:37 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 1 Sep 2010 00:00:37 +0100
Subject: [Biopython-dev] Biopython 1.55 released
Message-ID:

Dear Biopythoneers,

After the beta earlier this month (thank you to everyone who helped test this), we've just released Biopython 1.55. For full details see:

http://news.open-bio.org/news/2010/08/biopython-1-55-released/

Note we are phasing out support for Python 2.4. We will continue to support it for at least one further release (i.e. Biopython 1.56). This could be delayed given feedback from our users (e.g. if this proves to be a problem in combination with other libraries or a popular Linux distribution).

(At least) 12 people have contributed to this release, including 6 new people -
thank you all:

* Andres Colubri (first contribution)
* Carlos Rios Vera (first contribution)
* Claude Paroz (first contribution)
* Cymon Cox
* Eric Talevich
* Frank Kauff
* Joao Rodrigues (first contribution)
* Konstantin Okonechnikov (first contribution)
* Michiel de Hoon
* Nathan Edwards (first contribution)
* Peter Cock
* Tiago Antao

Source distributions and Windows installers are available from the downloads page on the Biopython website: http://www.biopython.org/wiki/Download

As usual, feedback is most welcome on the mailing lists (or bugzilla).

Regards,

Peter

P.S. You can follow Biopython on Twitter, http://twitter.com/biopython