From mdehoon at c2b2.columbia.edu Sat Oct 1 20:50:03 2005 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Sat Oct 1 20:49:55 2005 Subject: [Biopython-dev] Blast Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD0D@cgcmail.cgc.cpmc.columbia.edu> Thanks, Jeff. Currently, qblast in Bio.Blast.NCBIWWW can already return text output via the format_type argument. Unfortunately, the standalone blast and www-blast return slightly different text output, so we'd have to fix the parser in Bio.Blast.NCBIStandalone for it to handle www-blast text output. I found out that both standalone blast and www-blast can also return XML output, which is identical (as far as I can tell) in both cases. I would think that a parser that can read this XML output is most stable. So I propose the following: 1) Let qblast return XML output by default; text and html output can be returned by setting the format_type argument to qblast appropriately. 2) Write an XML parser that can read blast output from standalone and www blast. 3) In a few versions, deprecate the text parser in NCBIStandalone and the html parser in NCBIWWW. (This will only affect users of the text parser in NCBIStandalone, since the html parser in NCBIWWW is already behind and cannot parse blast output as it is). Any objections, anybody? --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: Jeffrey Chang [mailto:jeffrey.chang@duke.edu] Sent: Thu 9/29/2005 10:16 PM To: Michiel De Hoon Cc: biopython-dev@biopython.org Subject: Re: [Biopython-dev] Blast On Sep 29, 2005, at 1:46 PM, Michiel De Hoon wrote: > To my surprise, the parser in Blast.NCBIWWW tries to parse HTML output > instead of text output. My guess is that the HTML output changes > more often > and is more difficult to parse than text output. 
So isn't it > possible to make > NCBIWWW.qblast return text output instead of HTML and parse that > instead? > So my question is, why was the choice made to parse HTML instead of > text? Is > it simply because blast-on-the-web couldn't return text output in > the past? You are right. It was done that way in the past when the only way to use NCBI's BLAST was to use the HTML output. (Actually, there was a version that you could access through a proprietary non-HTTP protocol, but the databases were not updated as frequently.) Now that we can get text, perhaps it is time to encourage users to use the text one. I believe the HTML parser is a few versions behind now, and unable to parse current BLAST output anymore. Jeff From mdehoon at c2b2.columbia.edu Wed Oct 5 13:37:22 2005 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Wed Oct 5 13:41:29 2005 Subject: [Biopython-dev] Blast Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD23@cgcmail.cgc.cpmc.columbia.edu> Hi everybody, Fixing the Blast problem turned out to be easier than I thought, as there was already a parser (written by Bertrand Frottier) in Biopython that parses Blast XML output. This Biopython project keeps amazing me. So I just made XML output the default for qblast, and updated the Tutorial/Cookbook chapter on Blast. Feel free to test it, and let me know if there are any problems. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032
From sbassi at gmail.com Wed Oct 5 13:48:19 2005 From: sbassi at gmail.com (Sebastian Bassi) Date: Wed Oct 5 15:27:30 2005 Subject: [Biopython-dev] Blast In-Reply-To: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD23@cgcmail.cgc.cpmc.columbia.edu> References: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD23@cgcmail.cgc.cpmc.columbia.edu> Message-ID: On 10/5/05, Michiel De Hoon wrote: > Fixing the Blast problem turned out to be easier than I thought, as there was > already a parser (written by Bertrand Frottier) in Biopython that parses > Blast XML output. This Biopython project keeps amazing me. > So I just made XML output the default for qblast, and updated the > Tutorial/Cookbook chapter on Blast. Feel free to test it, and let me know if > there are any problems. Is the updated document online? Best regards, SB. From mdehoon at c2b2.columbia.edu Wed Oct 5 15:51:46 2005 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Wed Oct 5 15:51:49 2005 Subject: [Biopython-dev] Blast Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD24@cgcmail.cgc.cpmc.columbia.edu> > Is the updated document online? Yes it is, see: http://www.biopython.org/docs/tutorial/Tutorial004.html#toc10 --Michiel.
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From bugzilla-daemon at portal.open-bio.org Fri Oct 7 13:28:31 2005 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Fri Oct 7 13:30:57 2005 Subject: [Biopython-dev] [Bug 1876] Bio.pairwise2 generates incorrect Needleman-Wunsch score_matrix Message-ID: <200510071728.j97HSVvj006722@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1876 ------- Comment #1 from bill@barnard-engineering.com 2005-10-07 13:28 ------- Created an attachment (id=239) --> (http://bugzilla.open-bio.org/attachment.cgi?id=239&action=view) Generates the Durbin Fig 2.5 matrix via simple method ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Oct 7 13:30:00 2005 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Fri Oct 7 13:30:58 2005 Subject: [Biopython-dev] [Bug 1876] Bio.pairwise2 generates incorrect Needleman-Wunsch score_matrix Message-ID: <200510071730.j97HU0ox006746@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1876 ------- Comment #2 from bill@barnard-engineering.com 2005-10-07 13:30 ------- Created an attachment (id=240) --> (http://bugzilla.open-bio.org/attachment.cgi?id=240&action=view) Generates the Durbin Fig 2.5 matrix using slightly modified pairwise2.py ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Oct 7 13:26:00 2005 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Fri Oct 7 13:30:59 2005 Subject: [Biopython-dev] [Bug 1876] New: Bio.pairwise2 generates incorrect Needleman-Wunsch score_matrix Message-ID: <200510071726.j97HQ0UE006661@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1876

Summary: Bio.pairwise2 generates incorrect Needleman-Wunsch score_matrix
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev@biopython.org
ReportedBy: bill@barnard-engineering.com

Investigation of Bio.pairwise2 to duplicate the alignment example from the text "Biological sequence analysis" by R. Durbin et al. reveals that although the alignments returned for the example x = 'HEAGAWGHEE', y = 'PAWHEAE' are correct, the underlying scoring matrix is not correct. The Biopython version I'm using is from CVS, up-to-date as of 7 Oct 2005.

My analysis shows that the scoring matrix entries are correct for each entry F(i,j) where one of the traceback vectors points to F(i-1,j-1). If the traceback vectors do not contain a pointer to the diagonally previous entry, then the F(i,j) entry is calculated incorrectly.

For this initial bug report I will show output of two programs that generate the scoring matrix for this example by two methods. I will attach some supporting files to this bug report following the initial commit. These files will make it easy to reproduce the bug.
The output from my simple program that duplicates the example in Durbin (there is one entry in the Durbin text that is in error) is:

Score matrix for Figure 2.5 example in Durbin text

x:         H    E    A    G    A    W    G    H    E    E
y:    0   -8  -16  -24  -32  -40  -48  -56  -64  -72  -80
P    -8   -2   -9  -17  -25  -33  -41  -49  -57  -65  -73
A   -16  -10   -3   -4  -12  -20  -28  -36  -44  -52  -60
W   -24  -18  -11   -6   -7  -15   -5  -13  -21  -29  -37
H   -32  -14  -18  -13   -8   -9  -13   -7   -3  -11  -19
E   -40  -22   -8  -16  -16   -9  -12  -15   -7    3   -5
A   -48  -30  -16   -3  -11  -11  -12  -12  -15   -5    2
E   -56  -38  -24  -11   -6  -12  -14  -15  -12   -9    1

The output from the (slightly) modified pairwise2.py code is:

Global alignment:
HEAGAWGHE-E
-P--AW-HEAE
score: 1
alignment: begin = 0, end = 11

pairwise2 Score matrix for Figure 2.5 example in Durbin text

x:         H    E    A    G    A    W    G    H    E    E
y:    x    x    x    x    x    x    x    x    x    x    x
P     x   -2   -9  -17  -26  -33  -44  -50  -58  -65  -73
A     x  -10   -3   -4  -17  -20  -36  -41  -51  -58  -66
W     x  -19  -13   -6   -7  -15   -5  -31  -39  -47  -55
H     x  -14  -18  -13   -8   -9  -18   -7   -3  -21  -29
E     x  -32   -8  -19  -16   -9  -12  -16   -7    3   -5
A     x  -42  -23   -3  -16  -11  -12  -12  -17   -8    2
E     x  -48  -24  -17   -6  -12  -14  -15  -12   -9    1

My supporting attachments will include the driver programs that generated these outputs, along with the patch to modify pairwise2.py so it returns the score_matrix.

------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
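For reference, the textbook-style recurrence that the first table in the report follows can be sketched in a few lines. This is a minimal illustration only, not the program attached to the bug and not the code in Bio.pairwise2: the function name needleman_wunsch_matrix and the score callable are assumptions introduced here, and the substitution scores used for Durbin's figure are not reproduced. Only the linear gap penalty d = 8 is read off the report itself (the first row and column of the first table step by -8).

```python
def needleman_wunsch_matrix(x, y, score, gap):
    """Fill the global-alignment matrix F using a linear gap penalty.

    Hypothetical helper: F[i][j] holds the best score for aligning
    y[:i] (rows) against x[:j] (columns). `score(a, b)` is any
    substitution score and `gap` is the positive per-residue penalty d.
    """
    rows, cols = len(y) + 1, len(x) + 1
    F = [[0] * cols for _ in range(rows)]

    # Boundary cells: a prefix aligned entirely against gaps.
    for j in range(1, cols):
        F[0][j] = -gap * j
    for i in range(1, rows):
        F[i][0] = -gap * i

    # Each interior cell looks only at its three neighbours.
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(
                F[i - 1][j - 1] + score(y[i - 1], x[j - 1]),  # match/mismatch
                F[i - 1][j] - gap,                            # gap in x
                F[i][j - 1] - gap,                            # gap in y
            )
    return F


if __name__ == "__main__":
    # Stand-in scoring (+1 match, -1 mismatch); not the textbook's matrix.
    unit = lambda a, b: 1 if a == b else -1
    for row in needleman_wunsch_matrix("HEAGAWGHEE", "PAWHEAE", unit, 8):
        print(" ".join("%4d" % v for v in row))
```

Rows are indexed by y and columns by x, matching the layout of the tables in the report. With the textbook's substitution scores supplied through score, this recurrence would be expected to give the first table rather than the pairwise2 one.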
From bugzilla-daemon at portal.open-bio.org Fri Oct 7 13:31:38 2005 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Fri Oct 7 14:31:15 2005 Subject: [Biopython-dev] [Bug 1876] Bio.pairwise2 generates incorrect Needleman-Wunsch score_matrix Message-ID: <200510071731.j97HVcsM006814@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1876 ------- Comment #3 from bill@barnard-engineering.com 2005-10-07 13:31 ------- Created an attachment (id=241) --> (http://bugzilla.open-bio.org/attachment.cgi?id=241&action=view) patch pairwise2.py so it returns the score_matrix Run against local copy of pairwise2.py as shown: % patch pairwise2.py pairwise2.patch ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Oct 7 13:34:55 2005 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Fri Oct 7 14:31:17 2005 Subject: [Biopython-dev] [Bug 1876] Bio.pairwise2 generates incorrect Needleman-Wunsch score_matrix Message-ID: <200510071734.j97HYtqt006889@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1876 ------- Comment #4 from bill@barnard-engineering.com 2005-10-07 13:34 ------- Created an attachment (id=242) --> (http://bugzilla.open-bio.org/attachment.cgi?id=242&action=view) Spreadsheet generated comparison of Durbin vs Biopython score_matrix This pdf shows graphically where the differences lie between the two methods. It demonstrates that the erroneous entries all lie in those cells whose traceback pointers point to either F(i-1,j) or F(i,j-1). All entries whose traceback pointers include a pointer to F(i-1,j-1) are correct. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Oct 7 15:58:06 2005 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Fri Oct 7 16:34:02 2005 Subject: [Biopython-dev] [Bug 1876] Bio.pairwise2 generates incorrect Needleman-Wunsch score_matrix Message-ID: <200510071958.j97Jw65J010495@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1876 jchang@biopython.org changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #5 from jchang@biopython.org 2005-10-07 15:58 ------- The two implementations use different algorithms, and the bookkeeping in the score matrix is done differently. For an illustration of the algorithm used in Biopython, see: http://www.maths.tcd.ie/~lily/pres2/sld006.htm The score matrices are different and not comparable. However, the final alignment, score, and traceback should be the same. Please let me know if they are not. I chose this algorithm because it was simpler for me to generate a correct traceback. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Oct 7 18:38:57 2005 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Fri Oct 7 19:31:14 2005 Subject: [Biopython-dev] [Bug 1876] Bio.pairwise2 generates incorrect Needleman-Wunsch score_matrix Message-ID: <200510072238.j97McvtB012857@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1876 ------- Comment #6 from bill@barnard-engineering.com 2005-10-07 18:38 ------- Since this (http://www.maths.tcd.ie/~lily/pres2/sld003.htm) algorithm is the first entry found by googling "needleman wunsch", I should have read it more carefully. It is now clear to me why the two score matrices are different. (In this case they are tantalizingly similar...) 
It might make a useful test for the module, to create alignment tuples with the Durbin algorithm and compare those produced with the "McLysaght" algorithm. I'll consider creating some tests like that, and contribute those. I have not yet convinced myself the two algorithms are equivalent. Probably a literature search in my local UC library would resolve that question. Do you have a reference or web pointer to appropriate papers? I would probably change this bug's status to "INVALID", which I presume is the one for "Not a bug". ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Oct 7 20:51:21 2005 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Fri Oct 7 21:31:17 2005 Subject: [Biopython-dev] [Bug 1876] Bio.pairwise2 generates incorrect Needleman-Wunsch score_matrix Message-ID: <200510080051.j980pLw1014842@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1876 jchang@biopython.org changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |INVALID ------- Comment #7 from jchang@biopython.org 2005-10-07 20:51 ------- The algorithms produce equivalent scores and alignments when the gap penalties are linear. However, the algorithm implemented in Biopython is more general and can handle more exotic non-linear models of gap penalties. It's been many years since I've looked at this, but IIRC the original Needleman-Wunsch paper described the algorithm implemented in Biopython, and the algorithm in Durbin is a refinement made later to increase its speed. The refinement is much faster [ O(NM) vs O(NNM) ]. 
In biopython, for the case of affine gap penalties, the alignment algorithm in _make_score_matrix_fast is a hybrid of the two approaches, that has O(NM) running time, while also being easier (for me) to understand and debug. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Oct 9 14:35:44 2005 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Sun Oct 9 15:31:56 2005 Subject: [Biopython-dev] [Bug 1735] Bio.Blast.NCBIStandalone.BlastParser crashs with unusual alignment fragments Message-ID: <200510091835.j99IZiwk007347@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1735 ------- Comment #2 from mdehoon@ims.u-tokyo.ac.jp 2005-10-09 14:35 ------- Does this error also appear with the XML-based parser in NCBIXML? If not, it's not worth fixing the text-based parser in NCBIStandalone. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 12 13:40:25 2005 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Wed Oct 12 14:31:41 2005 Subject: [Biopython-dev] [Bug 1876] Bio.pairwise2 generates incorrect Needleman-Wunsch score_matrix Message-ID: <200510121740.j9CHePW7015206@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1876 ------- Comment #8 from bill@barnard-engineering.com 2005-10-12 13:40 ------- FWIW your traceback algorithm works perfectly as is for the Gotoh/Durbin/NW algorithm. The internal trace_matrix is somewhat different since each pointer may only point to three possible cells rather than to the previous max score cell (IIRC...). Anyway I used your algorithm in my program. (I want to be sure I understand all the basics before I move to something more challenging.) 
I tried both extracting the portions I needed, and simply importing it directly; both work perfectly for my example alignment. I fooled around a bit with making test routines to compare the output of the Gotoh algorithm to the NW algorithm. I learned a bit, but I don't think that adding these tests to the test_pairwise2 module would really add anything useful. Thanks for making all this publicly available. I really like the way your program uses the __call__ ==> decode methods to enable flexible use of the alignment programs. It's opened my eyes to Python's capabilities. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed Oct 12 16:43:14 2005 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Wed Oct 12 17:31:43 2005 Subject: [Biopython-dev] [Bug 1876] Bio.pairwise2 generates incorrect Needleman-Wunsch score_matrix Message-ID: <200510122043.j9CKhEPl020539@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1876 ------- Comment #9 from jchang@biopython.org 2005-10-12 16:43 ------- That's good news. I'm glad the code is useful for you! Jeff ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From Mark.Hoebeke at jouy.inra.fr Mon Oct 17 10:07:13 2005 From: Mark.Hoebeke at jouy.inra.fr (Mark Hoebeke) Date: Mon Oct 17 10:36:19 2005 Subject: [Biopython-dev] NCBIStandalone Blast HSP parsing Message-ID: <4353B011.5030004@jouy.inra.fr>

Hi all,

I wanted a quick and easy way to determine the endpoints of HSPs extracted from Blast reports parsed with NCBIStandalone. Unfortunately the HSP class lacks the query_end and sbjct_end attributes. Googling around led me to a recipe describing how to compute the endpoint using the total length, gap length and other niceties. Not exactly intuitive to me.

Hence I dove into the NCBIStandalone and HSP modules and made some slight modifications. Basically I added the two attributes to HSP and the following snippets to NCBIStandalone (release 1.4b):

972c972
< _query_re = re.compile(r"Query: (\d+)\s*(.+) (\d+)")
---
> _query_re = re.compile(r"Query: (\d+)\s*(.+) \d")
977,978c977
< start, seq, end = m.groups()
< self._hsp.query_end=string.atoi(end);
---
> start, seq = m.groups()
997,998c996,997
< start, seq, end = _re_search(
< r"Sbjct: (\d+)\s*(.+) (\d+)", line,
---
> start, seq = _re_search(
> r"Sbjct: (\d+)\s*(.+) \d", line,
1014c1013
< self._hsp.sbjct_end=string.atoi(end)
---
>

Looks too easy to be true, I thought. Now sorry if I'm missing some important issues here (I'm quite new to BioPython), but is there a reason no one has made this patch yet?

Thanks for any comments (flames and others.)

Cheers,

Mark

--
Mark.Hoebeke@jouy.inra.fr
Unité Statistique & Génome
http://stat.genopole.cnrs.fr
Tél : +33 (0)1 60 87 38 03
Fax : +33 (0)1 60 87 38 09
Tour Evry 2, 523, pl. des Terrasses, F-91000 Evry
PGP : A2AD52E3

From mdehoon at c2b2.columbia.edu Mon Oct 17 11:27:28 2005 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Mon Oct 17 11:33:34 2005 Subject: [Biopython-dev] NCBIStandalone Blast HSP parsing Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD50@cgcmail.cgc.cpmc.columbia.edu> Just to make sure I understand what you're doing: Are the query_end and sbjct_end attributes found in the Blast output, or do you calculate them from the other attributes in the Blast output?
If they're in the Blast output, 1) Do they always appear in the Blast output, or does it depend on the query? In the latter case, does the modified Blast parser choke on Blast output that do not contain these attributes? 2) Does these attributes also appear in Blast XML output? The XML parser is easier to maintain than the text-based parser in BlastStandalone, may therefore become the main Blast parser in Biopython in the long run. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032
From mdehoon at c2b2.columbia.edu Mon Oct 17 13:51:21 2005 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Mon Oct 17 13:52:22 2005 Subject: [Biopython-dev] NCBIStandalone Blast HSP parsing Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD54@cgcmail.cgc.cpmc.columbia.edu> The current patch breaks the parser if the Blast output does not contain query_end and sbjct_end. The problem seems to be in the line: start, seq, end = m.groups() (traceback ends with File "/usr/local/lib/python2.4/site-packages/Bio/Blast/NCBIStandalone.py", line 995, in query start, seq, end = m.groups() ValueError: need more than 2 values to unpack). But this should be easy to fix. --Michiel.
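One way to make that unpacking robust can be sketched as follows, under stated assumptions. The pattern is modelled on the one quoted in this thread but makes the trailing end coordinate optional, so groups() always yields three items and the end is None when it is absent. The names _QUERY_LINE_RE and parse_query_line are hypothetical and are not part of NCBIStandalone, and the simple "Query: start sequence end" layout assumed here may not cover every report.

```python
import re

# Hypothetical pattern: start coordinate, sequence without spaces,
# and an optional end coordinate at the end of the line.
_QUERY_LINE_RE = re.compile(r"Query:\s*(\d+)\s*(\S+)(?:\s+(\d+))?\s*$")


def parse_query_line(line):
    """Return (start, sequence, end) for one alignment line.

    `end` is None when the line carries no trailing coordinate, so the
    caller never hits a tuple-unpacking error.
    """
    m = _QUERY_LINE_RE.search(line)
    if m is None:
        raise ValueError("not a Query alignment line: %r" % line)
    # An unmatched optional group is reported as None, so there are
    # always exactly three values here.
    start, seq, end = m.groups()
    return int(start), seq, (int(end) if end is not None else None)


if __name__ == "__main__":
    print(parse_query_line("Query: 1 MKV-LA 6"))
    print(parse_query_line("Query: 12 ACGT"))
```

The design choice is to keep a single pattern with a fixed number of groups rather than two patterns with different group counts, so the unpacking statement stays the same for both kinds of line.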
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032
From bugzilla-daemon at portal.open-bio.org Mon Oct 17 13:53:07 2005 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Mon Oct 17 14:31:57 2005 Subject: [Biopython-dev] [Bug 1715] Bio.Blast.NCBIStandalone does not support standalone NCBI RPS-Blast (rpsblast) output Message-ID: <200510171753.j9HHr78k000803@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1715 ------- Comment #10 from mdehoon@ims.u-tokyo.ac.jp 2005-10-17 13:53 ------- Is this modification of NCBIStandalone.py still relevant for Biopython 1.40b? If so, could you submit a patch (instead of an edited version of NCBIStandalone.py)? Also, the edited version contains many differences to the CVS version that do not seem relevent for rpsblast (differences in tabs versus spaces, for example) that make it difficult to assess this patch. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From Mark.Hoebeke at jouy.inra.fr Mon Oct 17 13:05:53 2005
From: Mark.Hoebeke at jouy.inra.fr (Mark Hoebeke)
Date: Mon Oct 17 14:52:26 2005
Subject: [Biopython-dev] NCBIStandalone Blast HSP parsing
In-Reply-To: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD50@cgcmail.cgc.cpmc.columbia.edu>
References: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD50@cgcmail.cgc.cpmc.columbia.edu>
Message-ID: <4353D9F1.4050607@jouy.inra.fr>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Michiel De Hoon wrote:
> Just to make sure I understand what you're doing:
>
> Are the query_end and sbjct_end attributes found in the Blast output, or do
> you calculate them from the other attributes in the Blast output?

I directly grab them from the Blast report.

> If they're in the Blast output,
> 1) Do they always appear in the Blast output, or does it depend on the query?
> In the latter case, does the modified Blast parser choke on Blast output that
> do not contain these attributes?

The patterns in the official release 1.4b module check for "a single digit"
following the string of sequence characters at the end of the alignment lines.
All I did was to extend the patterns to "one or more digits" and to capture
them in order to store their contents in the HSP attributes. So AFAIK, the
patch does not change the way reports are currently parsed.

> 2) Does these attributes also appear in Blast XML output? The XML parser is
> easier to maintain than the text-based parser in BlastStandalone, may
> therefore become the main Blast parser in Biopython in the long run.

With the sequence set I'm currently working on (and with NCBI Blast 2.2.12),
the XML output has indeed the following elements :
Hsp_query-to and Hsp_hit-to
which seem to have the intended meaning. I suppose I should be able to adapt
the XML parser while I'm on it, if it is officially accepted.

Mark

>
> --Michiel.
> > > > Michiel de Hoon > Center for Computational Biology and Bioinformatics > Columbia University > 1150 St Nicholas Avenue > New York, NY 10032 > > > > -----Original Message----- > From: biopython-dev-bounces@portal.open-bio.org on behalf of Mark Hoebeke > Sent: Mon 10/17/2005 10:07 AM > To: biopython-dev@biopython.org > Subject: [Biopython-dev] NCBIStandalone Blast HSP parsing > > Hi all, > > I wanted a quick and easy way to determine the endpoints of HSPs extraced > from > Blast reports parser with NCBIStandalone. Unfortunately the HSP class lacks > the > query_end and sbjct_end attributes. Googling around led me to a recipe > describing how to compute the endpoint using the total length, gap length and > other niceties. Not exactly intuitive to me. > > Hence I dove into the NCBIStandalone and HSP modules and made some slight > modifications. Basically I added the two attributes to HSP and the following > snippets to NCBIStandalone (release 1.4b): > > 972c972 > < _query_re = re.compile(r"Query: (\d+)\s*(.+) (\d+)") > --- > >>> _query_re = re.compile(r"Query: (\d+)\s*(.+) \d") > > 977,978c977 > < start, seq, end = m.groups() > < self._hsp.query_end=string.atoi(end); > --- > >>> start, seq = m.groups() > > 997,998c996,997 > < start, seq, end = _re_search( > < r"Sbjct: (\d+)\s*(.+) (\d+)", line, > --- > >>> start, seq = _re_search( >>> r"Sbjct: (\d+)\s*(.+) \d", line, > > 1014c1013 > < self._hsp.sbjct_end=string.atoi(end) > --- > > > Looks to easy to be true, I thought. Now sorry if I'm missing some important > issues here (I'm quite new to BioPython), but is there a reason no one has > made > this patch yet ? > > Thanks for any comments (flames and others.) > > Cheers, > > Mark > > > -- > - > ----------------------------Mark.Hoebeke@jouy.inra.fr----------------------- > Unit? Statistique & G?nome _/_/_/ _/_/_/ http://stat.genopole.cnrs.fr > T?l : +33 (0)1 60 87 38 03 _/ _/ Fax : +33 (0)1 60 87 38 09 > Tour Evry 2, _/_/ _/ _/_/ 523, pl. 
des Terrasses > F-91000, _/ _/ _/ Evry > PGP : A2AD52E3 _/_/_/ _/_/_/ > > > > _______________________________________________ Biopython-dev mailing list Biopython-dev@biopython.org http://biopython.org/mailman/listinfo/biopython-dev - -- - -------------------------Mark.Hoebeke@jouy.inra.fr--------------------- Unit? Statistique & G?nome Unit? MIG +33 (0)1 60 87 38 03 T?l. +33 (0)1 34 65 28 85 +33 (0)1 60 87 38 09 Fax. +33 (0)1 34 65 29 01 Tour Evry 2, 523 pl. des Terrasses INRA - Domaine de Vilvert F - 91000 Evry F - 78352 Jouy-en-Josas CEDEX -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFDU9nxa3nTV6KtUuMRApqXAJ9a9z7J0bvigZ1NiZZxmTUziMocIgCdE0O9 EvX5Bm6f7dMcAUFGfNIO8tk= =mWo3 -----END PGP SIGNATURE----- From y.benita at wanadoo.nl Mon Oct 17 19:45:47 2005 From: y.benita at wanadoo.nl (Yair Benita) Date: Mon Oct 17 20:15:13 2005 Subject: [Biopython-dev] NCBIStandalone Blast HSP parsing In-Reply-To: Message-ID: Hi Michael, This issue has already been fixed. In the last review of NCBIstandalone I made with Jeff Chang the query_end and sbjct_end were added. Just grab the latest NCBIstandalone version from CVS. Yair > From: Mark Hoebeke > Organization: INRA - MIA > Date: Mon, 17 Oct 2005 16:07:13 +0200 > To: > Subject: [Biopython-dev] NCBIStandalone Blast HSP parsing > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi all, > > I wanted a quick and easy way to determine the endpoints of HSPs extraced from > Blast reports parser with NCBIStandalone. Unfortunately the HSP class lacks > the > query_end and sbjct_end attributes. Googling around led me to a recipe > describing how to compute the endpoint using the total length, gap length and > other niceties. Not exactly intuitive to me. > > Hence I dove into the NCBIStandalone and HSP modules and made some slight > modifications. 
Basically I added the two attributes to HSP and the following > snippets to NCBIStandalone (release 1.4b): > > 972c972 > < _query_re = re.compile(r"Query: (\d+)\s*(.+) (\d+)") > - --- >> _query_re = re.compile(r"Query: (\d+)\s*(.+) \d") > 977,978c977 > < start, seq, end = m.groups() > < self._hsp.query_end=string.atoi(end); > - --- >> start, seq = m.groups() > 997,998c996,997 > < start, seq, end = _re_search( > < r"Sbjct: (\d+)\s*(.+) (\d+)", line, > - --- >> start, seq = _re_search( >> r"Sbjct: (\d+)\s*(.+) \d", line, > 1014c1013 > < self._hsp.sbjct_end=string.atoi(end) > - --- >> > > Looks to easy to be true, I thought. Now sorry if I'm missing some important > issues here (I'm quite new to BioPython), but is there a reason no one has > made > this patch yet ? > > Thanks for any comments (flames and others.) > > Cheers, > > Mark > > > - -- > - ----------------------------Mark.Hoebeke@jouy.inra.fr----------------------- > Unit? Statistique & G?nome _/_/_/ _/_/_/ http://stat.genopole.cnrs.fr > T?l : +33 (0)1 60 87 38 03 _/ _/ Fax : +33 (0)1 60 87 38 09 > Tour Evry 2, _/_/ _/ _/_/ 523, pl. 
des Terrasses > F-91000, _/ _/ _/ Evry > PGP : A2AD52E3 _/_/_/ _/_/_/ > > > > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2 (GNU/Linux) > Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org > > iD8DBQFDU7ARa3nTV6KtUuMRArBqAKC/m4i+VpVaU3clvOkMuYkfRrZQ+QCfbRKg > gBBW5wNKS3sb/Uqr31eumx8= > =vSWV > -----END PGP SIGNATURE----- > _______________________________________________ > Biopython-dev mailing list > Biopython-dev@biopython.org > http://biopython.org/mailman/listinfo/biopython-dev > From Mark.Hoebeke at jouy.inra.fr Tue Oct 18 00:47:12 2005 From: Mark.Hoebeke at jouy.inra.fr (Mark Hoebeke) Date: Tue Oct 18 00:47:32 2005 Subject: [Biopython-dev] NCBIStandalone Blast HSP parsing In-Reply-To: References: Message-ID: <43547E50.6010600@jouy.inra.fr> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Yair Benita wrote: > Hi Michael, > This issue has already been fixed. In the last review of NCBIstandalone I > made with Jeff Chang the query_end and sbjct_end were added. > Just grab the latest NCBIstandalone version from CVS. > > Yair > The patch is indeed cleaner than my submission. Sorry I didn't check beforehand. Many thanks. Mark - -- - -------------------------Mark.Hoebeke@jouy.inra.fr--------------------- Unit? Statistique & G?nome Unit? MIG +33 (0)1 60 87 38 03 T?l. +33 (0)1 34 65 28 85 +33 (0)1 60 87 38 09 Fax. +33 (0)1 34 65 29 01 Tour Evry 2, 523 pl. 
des Terrasses INRA - Domaine de Vilvert F - 91000 Evry F - 78352 Jouy-en-Josas CEDEX -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFDVH5Qa3nTV6KtUuMRAladAKCnPCTfc1ZVRTjSlcS04EvfYlRShACfQIF7 CFKgGBooaWKQnCWunjuespo= =Z1fd -----END PGP SIGNATURE----- From bugzilla-daemon at portal.open-bio.org Tue Oct 18 11:53:07 2005 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Tue Oct 18 12:32:39 2005 Subject: [Biopython-dev] [Bug 1715] Bio.Blast.NCBIStandalone does not support standalone NCBI RPS-Blast (rpsblast) output Message-ID: <200510181553.j9IFr7ud025687@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1715 ------- Comment #11 from biopython-bugzilla@maubp.freeserve.co.uk 2005-10-18 11:53 ------- I have just had another look at this code of mine... It turns out that the plain text output from RPS-BLAST 2.2.12 is slightly different to that from 2.2.10 which I was using. I'm wondering about supporting the RPS-BLAST XML output instead, as Michiel de Hoon has indicated a preference to move to this in the future... http://www.biopython.org/pipermail/biopython-dev/2005-October/002130.html ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mdehoon at c2b2.columbia.edu Tue Oct 18 20:26:00 2005 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue Oct 18 20:25:01 2005 Subject: [Biopython-dev] Bug 1741 Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD57@cgcmail.cgc.cpmc.columbia.edu> Hi everybody, Bug #1741 complains about the fact that fasta consumer example in the tutorial (and the corresponding example in Doc/examples) no longer works. 
The reason behind this is that the Fasta parser switched to Martel in revision 1.9 of Fasta/__init__.py, and therefore we no longer have a _Scanner class in Fasta/__init__.py, which causes the example in the tutorial to fail. So my question is, is Section 2.4.2 in the tutorial still relevant? If so, does anybody understand Fasta well enough to be able to fix it? If not, can we get rid of it? --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From betainverse at gmail.com Wed Oct 19 12:54:03 2005 From: betainverse at gmail.com (Katie Edmonds) Date: Wed Oct 19 14:41:28 2005 Subject: [Biopython-dev] KEGG questions Message-ID: <8e76d5310510190954y6d989857qcc367c8751e5a6c7@mail.gmail.com> I've been working on trying to make the KEGG Compound module useable. Before I spend more time on it, I'd like to make sure there isn't a more recent version than the one I see in cvs from 2001 (compound_format.py) and 2004 (__init__.py). I'm also curious how nice a new version should be before it's reasonable to submit it. At this point, I've added fields to compound_format.py that apparently didn't exist in the past, so that the parser will at least not crash, but I don't really understand the Martel well enough to get it to parse any of the multiline fields as one would like (the best I've done so far with ENZYME, for example, misses all the enzymes that don't have a role listed after the enzyme id). Similarly, all I've successfully done so far to __init__.py is to add support for compound mass. Would it be appropriate for me to submit my changes at this point? Or would it be best if I kept my changes to myself until I can make the compound_format work in a more general and appropriate way? 

Thanks,
Katie

From bugzilla-daemon at portal.open-bio.org Wed Oct 19 14:58:47 2005
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Wed Oct 19 15:32:01 2005
Subject: [Biopython-dev] [Bug 1745] Genbank parser and REGION fields
Message-ID: <200510191858.j9JIwlmD032684@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1745

mdehoon@ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
          Component|Martel/Mindy                |Main Distribution
         OS/Version|SunOS                       |All
           Platform|Sun                         |All
         Resolution|                            |FIXED

------- Comment #4 from mdehoon@ims.u-tokyo.ac.jp 2005-10-19 14:58 -------
Fixed in CVS, please try it out to make sure it works.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mdehoon at c2b2.columbia.edu Wed Oct 19 19:23:15 2005
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Wed Oct 19 19:22:16 2005
Subject: [Biopython-dev] Biopython new release coming up
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD5A@cgcmail.cgc.cpmc.columbia.edu>

Hi everybody,

It's been a while since our latest Biopython release. I'm planning to put
together a new release (version 1.41, code-named "Manhattan") and hoping to
release it by the end of next week. So if you have some code sitting around
that is stable enough for a Biopython distribution, this would be a good time
to commit it to CVS.

Currently, the following tests fail:
test_MEME ---> because we don't have the test_MEME output file for comparison
test_Nexus
test_PDB
test_Registry
test_SCOP_Astral
So actually, we're in pretty good shape. If one of these modules is yours,
please have a look at them and (if possible) try to fix them.

Also, in Bugzilla, 14 bugs are still open, please have a look to see if there
is something we can do about them.

Finally, if a module was significantly changed since the previous release (18
February 2005), or if a module has been added since then, it would be nice to
have a summary for the web page.

Good luck everybody, and may the gods of software development be on our side.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From biopython-dev at maubp.freeserve.co.uk Thu Oct 20 13:40:19 2005
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Thu Oct 20 14:22:32 2005
Subject: [Biopython-dev] Biopython new release coming up
In-Reply-To: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD5A@cgcmail.cgc.cpmc.columbia.edu>
References: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD5A@cgcmail.cgc.cpmc.columbia.edu>
Message-ID: <4357D683.4070508@maubp.freeserve.co.uk>

Michiel De Hoon wrote:
> Hi everybody,
>
> It's been a while since our latest Biopython release. I'm planning to put
> together a new release (version 1.41, code-named "Manhattan") and hoping to
> release it by the end of next week. So if you have some code sitting around
> that is stable enough for a Biopython distribution, this would be a good time
> to commit it to CVS.
> Currently, the following tests fail:

In BioPython 1.40b there were no tests of NCBIXML.py

Do any of you have a collection of XML blast output? Ideally the test set
should cover both the online and standalone versions of blast, the different
programs (blastn, blastp etc), and a range of recent releases (e.g. 2.2.10 to
2.2.12 say).

I have prepared a set of XML from the current blast webserver, attached to
this email:

'xbt001', # BLASTP 2.2.12, gi|49176427|ref|NP_418280.3|
'xbt002', # BLASTN 2.2.12, gi|1348916|gb|G26684.1|G26684
'xbt003', # BLASTX 2.2.12, gi|1347369|gb|G25137.1|G25137
'xbt004', # TBLASTN 2.2.12, gi|729325|sp|P39483|DHG2_BACME
'xbt005', # TBLASTX 2.2.12, gi|1348853|gb|G26621.1|G26621, BLOSUM80

[I assume the existing tests are bt### for blast test ###, so mine are xbt###
for XML blast test ###]

I have also attempted to create a test_NCBIXML.py based on Jeffrey Chang's
test_NCBIStandalone.py but most of it doesn't seem to apply (e.g. there is no
_scanner, and as far as I can tell, rec.multiple_alignment does not apply to
XML blast data).

Any comments?

Once this is pinned down, I want to try dealing with some RPS-BLAST XML output
with NCBIXML.py...

Peter

-------------- next part --------------
A non-text attachment was scrubbed...
Name: xml_blast_test_files.zip
Type: application/x-zip-compressed
Size: 97083 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython-dev/attachments/20051020/ecc4ff24/xml_blast_test_files-0001.bin

From mdehoon at c2b2.columbia.edu Thu Oct 20 16:08:57 2005
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Thu Oct 20 16:10:03 2005
Subject: [Biopython-dev] Biopython new release coming up
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD60@cgcmail.cgc.cpmc.columbia.edu>

Thanks, Peter!

I have added these files to CVS and written a simple test_NCBIXML.py for it
(also in CVS).

--Michiel.
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: Peter [mailto:biopython-dev@maubp.freeserve.co.uk] Sent: Thu 10/20/2005 1:40 PM To: Michiel De Hoon Cc: biopython-dev@biopython.org Subject: Re: [Biopython-dev] Biopython new release coming up Michiel De Hoon wrote: > Hi everybody, > > It's been a while since our latest Biopython release. I'm planning to put > together a new release (version 1.41, code-named "Manhattan") and hoping to > release it by the end of next week. So if you have some code sitting around > that is stable enough for a Biopython distribution, this would be a good time > to commit it to CVS. > Currently, the following tests fail: In BioPython 1.40b there where no tests of NCBIXML.py Do any of you have a collection of XML blast output? Ideally the test set should cover both the online and standalone versions of blast, the different programs (blastn, blastp etc), and a range of recent releases (e.g. 2.2.10 to 2.2.12 say). I have prepared a set of XML from the current blast webserver, attached to this email: 'xbt001', # BLASTP 2.2.12, gi|49176427|ref|NP_418280.3| 'xbt002', # BLASTN 2.2.12, gi|1348916|gb|G26684.1|G26684 'xbt003', # BLASTX 2.2.12, gi|1347369|gb|G25137.1|G25137 'xbt004', # TBLASTN 2.2.12, gi|729325|sp|P39483|DHG2_BACME 'xbt005', # TBLASTX 2.2.12, gi|1348853|gb|G26621.1|G26621, BLOSUM80 [I assume the existing tests are bt### for blast test ###, so mine are xbt### for XML blast test ###] I have also attempted to create a test_NCBIXML.py based on Jeffrey Chang's test_NCBIStandalone.py but most of it doesn't seem to apply (e.g. there is no _scanner, and as far as I can tell, rec.multiple_alignment does not apply to XML blast data). Any comments? Once this is pinned down, I want to try and try dealing with some RPS-BLAST XML output with NCBIXML.py... 
Peter From mdehoon at c2b2.columbia.edu Thu Oct 20 18:17:00 2005 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Thu Oct 20 18:16:03 2005 Subject: [Biopython-dev] KEGG questions Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD62@cgcmail.cgc.cpmc.columbia.edu> > I've been working on trying to make the KEGG Compound module useable. Great! Thanks! > Before I spend more time on it, I'd like to make sure there isn't a > more recent version than the one I see in cvs from 2001 > (compound_format.py) and 2004 (__init__.py). Are you familiar with the Kegg API: http://www.genome.jp/kegg/soap/doc/keggapi_manual.html This is what the Bioruby folks are using; it may be useful for Biopython also. > Would it be appropriate for me to submit my changes at this point? > Or would it be best if I kept my changes to myself until I can make > the compound_format work in a more general and appropriate way? If you don't have CVS access, please submit changes through the "Bugs" link on the Biopython website. I'll look at them after the upcoming 1.41 release. If you do have CVS access, please hold off committing to CVS for a week or so until the new release is out (to make sure only the stable code gets into the release). --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From sbassi at gmail.com Thu Oct 20 17:34:49 2005 From: sbassi at gmail.com (Sebastian Bassi) Date: Thu Oct 20 20:26:31 2005 Subject: [Biopython-dev] Biopython new release coming up In-Reply-To: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD5A@cgcmail.cgc.cpmc.columbia.edu> References: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD5A@cgcmail.cgc.cpmc.columbia.edu> Message-ID: On 10/19/05, Michiel De Hoon wrote: > It's been a while since our latest Biopython release. I'm planning to put > together a new release (version 1.41, code-named "Manhattan") and hoping to > release it by the end of next week. 
So if you have some code sitting around > that is stable enough for a Biopython distribution, this would be a good time > to commit it to CVS. just to let you note that Jeronome j.pansanel@pansanel.net is working on a VNTI file parser. Best regards, SB. -- La web sin popups ni spyware: Usa Firefox en lugar de Internet Explorer From bugzilla-daemon at portal.open-bio.org Fri Oct 21 13:08:52 2005 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Fri Oct 21 13:32:12 2005 Subject: [Biopython-dev] [Bug 1885] KEGG Compound db format changes Message-ID: <200510211708.j9LH8q6L003537@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1885 ------- Comment #1 from edmonds@fas.harvard.edu 2005-10-21 13:08 ------- Created an attachment (id=243) --> (http://bugzilla.open-bio.org/attachment.cgi?id=243&action=view) New test cases for the KEGG compound format ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Oct 21 13:11:40 2005 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Fri Oct 21 13:32:13 2005 Subject: [Biopython-dev] [Bug 1885] KEGG Compound db format changes Message-ID: <200510211711.j9LHBe1Q003650@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1885 ------- Comment #2 from edmonds@fas.harvard.edu 2005-10-21 13:11 ------- Created an attachment (id=244) --> (http://bugzilla.open-bio.org/attachment.cgi?id=244&action=view) patch to __init__.py and compound_format.py in Bio/KEGG/Compound ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 

From bugzilla-daemon at portal.open-bio.org Fri Oct 21 12:58:45 2005
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Oct 21 13:32:15 2005
Subject: [Biopython-dev] [Bug 1885] New: KEGG Compound db format changes
Message-ID: <200510211658.j9LGwi71003328@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1885

           Summary: KEGG Compound db format changes
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: edmonds@fas.harvard.edu

Several new fields have been added (apparently including mass, comment,
reference, remark, and glycan), structures are represented in a different
format, not all enzymes are listed with an enzyme role component, etc. I'll
post some new test cases and a starting point for a new compound_format.py
that at least won't choke on anything currently in the db. I've also added a
mass field to the record scanner.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From fkauff at duke.edu Sat Oct 22 15:56:53 2005
From: fkauff at duke.edu (Frank Kauff)
Date: Sat Oct 22 17:03:17 2005
Subject: [Biopython-dev] Biopython new release coming up
In-Reply-To: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD5A@cgcmail.cgc.cpmc.columbia.edu>
References: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD5A@cgcmail.cgc.cpmc.columbia.edu>
Message-ID: <1130011013.4376.6.camel@osiris.biology.duke.edu>

Hi all,

On Wed, 2005-10-19 at 19:23 -0400, Michiel De Hoon wrote:
> Hi everybody,
>
> It's been a while since our latest Biopython release. I'm planning to put
> together a new release (version 1.41, code-named "Manhattan") and hoping to
> release it by the end of next week. So if you have some code sitting around
> that is stable enough for a Biopython distribution, this would be a good time
> to commit it to CVS.
> Currently, the following tests fail: > test_MEME ---> because we don't have the test_MEME output file for > comparison > test_Nexus Fixed. Frank > test_PDB > test_Registry > test_SCOP_Astral > So actually, we're in pretty good shape. If one of these modules is yours, > please have a look at them and (if possible) try to fix them. > > Also, in Bugzilla, 14 bugs are still open, please have a look to see if there > is something we can do about them. > > Finally, if a module was significantly changed since the previous release (18 > February 2005), or if a module has been added since then, it would be nice to > have a summary for the web page. > > Good luck everybody, and may the gods of software development be on our side. > > --Michiel. > > > Michiel de Hoon > Center for Computational Biology and Bioinformatics > Columbia University > 1150 St Nicholas Avenue > New York, NY 10032 > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev@biopython.org > http://biopython.org/mailman/listinfo/biopython-dev -- Frank Kauff Dept. of Biology Duke University Box 90338 Durham, NC 27708 USA Phone 919-660-7382 Fax 919-660-7293 Web http://www.lutzonilab.net From biopython-dev at maubp.freeserve.co.uk Fri Oct 28 08:33:38 2005 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Fri Oct 28 09:30:58 2005 Subject: [Biopython-dev] RPS-BLAST XML output - Was: Biopython new release coming up In-Reply-To: <4357D683.4070508@maubp.freeserve.co.uk> References: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD5A@cgcmail.cgc.cpmc.columbia.edu> <4357D683.4070508@maubp.freeserve.co.uk> Message-ID: <43621AA2.4010300@maubp.freeserve.co.uk> Peter wrote: > In BioPython 1.40b there where no tests of NCBIXML.py > > ... Thanks due to Michiel for adding these to BioPython with a test script. > Once this is pinned down, I want to try and try dealing with some > RPS-BLAST XML output with NCBIXML.py... 

See bug 1715 for parsing the rpsblast txt output:

http://bugzilla.open-bio.org/show_bug.cgi?id=1715

(Summary - I had cobbled together a working text parser for RPS-BLAST 2.2.10,
but it would need changing for RPS-BLAST 2.2.12)

Attached to this email are two simple examples using standalone RPS-BLAST
2.2.10 and 2.2.12, with both txt and XML output.

'xbt006.xml', # Standalone RPS-BLAST 2.2.10, gi|49176427|ref|NP_418280.3| - PFAM database
'xbt007.xml', # Standalone RPS-BLAST 2.2.12, gi|49176427|ref|NP_418280.3| - PFAM database
'xbt008.xml', # Standalone RPS-BLAST 2.2.10, gi|729325|sp|P39483|DHG2_BACME - CDD database

The NCBIXML.py seems to work fine with the standalone RPSBLAST XML output.

Note that the online RPS-BLAST does not seem to offer XML output at the
moment...

One very odd thing I noticed is that the XML files from RPS-BLAST seem to
claim to have been produced by blastp:

blastp
blastp 2.2.10 [Oct-19-2004]

Or:

blastp
blastp 2.2.12 [Aug-07-2005]

The text output correctly states RPS-BLAST, so this looks like an RPS-BLAST
bug which I plan to report...

Peter

-------------- next part --------------
A non-text attachment was scrubbed...
Name: xml_rps_blast_test_files.zip
Type: application/x-zip-compressed
Size: 17889 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython-dev/attachments/20051028/ddc47ee3/xml_rps_blast_test_files-0001.bin

From mdehoon at c2b2.columbia.edu Fri Oct 28 13:55:38 2005
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri Oct 28 14:01:29 2005
Subject: [Biopython-dev] CVS freeze for Manhattan release
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD89@cgcmail.cgc.cpmc.columbia.edu>

Hi everybody,

With all biopython tests (*) now passing and the sun shining in Manhattan, the
time has come to put together the next Biopython release. To avoid any
confusion, I'd like to ask you all not to make any commits to CVS until this
release is done (which I will post to the biopython mailing lists).

Thanks!

--Michiel.

(*) Except for the SQL tests. I don't know how to run those. If anybody does,
let me know.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From mdehoon at c2b2.columbia.edu Fri Oct 28 20:14:54 2005
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri Oct 28 20:13:38 2005
Subject: [Biopython-dev] Biopython release 1.41
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD8B@cgcmail.cgc.cpmc.columbia.edu>

Dear biopythoneers,

We are pleased to announce the release of Biopython 1.41. Many improvements
were made in Biopython during the eight months since the previous release, and
the new release contains lots of bugfixes, improvements, new functionalities,
and better documentation. To pick a few, there's the new Bio.MEME module by
Jason Hackney, updates to the Blast parser using Bertrand Frottier's NCBIXML
code, a BLAT parser by Yair Benita, numerous updates in Bio.PDB, CompareACE
support in AlignAce, and improved user-friendliness in Bio.Seq.

Lots of people contributed to this release, in particular Frank Kauff
(Bio.Nexus), Jason Hackney (Bio.MEME), Thomas Hamelryck (Bio.PDB), Frédéric
Sohm (Bio.Restriction), James Casbon (Bio.SCOP) for bug fixes and updates,
Peter (Bio.Blast.NCBIXML test cases), and of course Jeff Chang, Brad Chapman,
Andrew Dalke, and Iddo Friedman for Biopython and the fool-proof instructions
on how to roll a release, which made this a lot easier than I anticipated. My
apologies if I forgot to thank somebody.
--Michiel Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From bugzilla-daemon at portal.open-bio.org Sat Oct 29 16:29:41 2005 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Tue Nov 1 15:57:42 2005 Subject: [Biopython-dev] [Bug 1885] KEGG Compound db format changes Message-ID: <200510292029.j9TKTfI7011222@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1885 ------- Comment #3 from mdehoon@ims.u-tokyo.ac.jp 2005-10-29 16:29 ------- How did you download the new test cases for KEGG compound? Are the existing test cases in Tests/KEGG no longer valid? The submitted patch causes test_KEGG.py to fail, but I'm not sure if that is due to a bug in the patch or whether the existing test cases don't satisfy the current KEGG standard. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.