From biopython at maubp.freeserve.co.uk Wed Sep 1 07:06:11 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 1 Sep 2010 12:06:11 +0100
Subject: [Biopython-dev] IMGT parser (modified EMBL format),
In-Reply-To:
References:
Message-ID:

On Tue, Aug 24, 2010 at 3:35 PM, Uri Laserson wrote:
> Hi all,
>
> I would obviously prefer it to go into the distribution as soon as it is
> possible, but I don't want to mess with the releases. The IMGT people said
> they'll put a news announcement on their site and a link to biopython once
> the code is in the official release.
>
> Uri

I've checked this into the master branch now, so this will be in
the next release (Biopython 1.56, probably in October/November 2010).

http://github.com/biopython/biopython/commit/6d1e144e1054231162ce57cee5ca8c37921ada41

Thank you!

Peter

From bugzilla-daemon at portal.open-bio.org Wed Sep 1 07:06:55 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 1 Sep 2010 07:06:55 -0400
Subject: [Biopython-dev] [Bug 3069] Support for EMBL-like files from IMGT
In-Reply-To:
Message-ID: <201009011106.o81B6tq0020803@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3069

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #24 from biopython-bugzilla at maubp.freeserve.co.uk 2010-09-01 07:06 EST -------
Code from branch checked in,
http://github.com/biopython/biopython/commit/6d1e144e1054231162ce57cee5ca8c37921ada41

Thanks!

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From biopython at maubp.freeserve.co.uk Wed Sep 1 07:15:17 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 1 Sep 2010 12:15:17 +0100 Subject: [Biopython-dev] Deprecated code to remove for Biopython 1.56 Message-ID: Hi all, The following were deprecated in Biopython 1.52 (released 22 September 2009), so I think they can be removed ready for Biopython 1.56 now: Bio.EZRetrieve, Bio.NetCatch, Bio.File.SGMLHandle, Bio.FilteredReader Bio.AlignAce and Bio.MEME We also still need to look at Bio.Translate, Bio.Transcribe and linked modules: http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008210.html Peter From biopython at maubp.freeserve.co.uk Wed Sep 1 11:38:31 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 1 Sep 2010 16:38:31 +0100 Subject: [Biopython-dev] Bio.utils, Bio.PropertyManager, Bio.Encodings.IUPACEncoding, etc Message-ID: On Wed, Sep 1, 2010 at 12:15 PM, Peter wrote: > Hi all, > > The following were deprecated in Biopython 1.52 (released 22 September 2009), > so I think they can be removed ready for Biopython 1.56 now: > > Bio.EZRetrieve, Bio.NetCatch, Bio.File.SGMLHandle, Bio.FilteredReader > Bio.AlignAce and Bio.MEME > > We also still need to look at Bio.Translate, Bio.Transcribe and linked modules: > http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008210.html Bio.Translate and Bio.Transcribe were deprecated in Biopython 1.51, over a year ago now. The deprecated modules Bio.Translate and Bio.Transcribe are imported by Bio.Encodings.IUPACEncoding, so trying to import that triggers deprecation warnings. Effectively this means we can claim Bio.Encodings.IUPACEncoding has already been deprecated (and this is the only code under Bio.Encodings). Now, as far as I can tell, the point of IUPACEncoding is to attach properties to the IUPAC alphabets using Bio.PropertyManager (which I assume predates the inclusion of properties into the Python language). 
The only place this was used (I think) was in Bio.utils, e.g. something
like this:

>>> from Bio.utils import total_weight
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet.IUPAC import IUPACAmbiguousDNA
>>> total_weight(Seq("ACGT", IUPACAmbiguousDNA()))
(deprecation warnings ignored)
1355.0

Since this triggers deprecation warnings from Bio.Translate and
Bio.Transcribe, we could argue that this function in Bio.utils (and
all similar ones using Bio.PropertyManager) have effectively been
labelled as deprecated for some time now.

Unfortunately this example would fail on Biopython 1.55 (which we
didn't spot since Bio.utils has no unit tests) due to this apparently
harmless change,
http://github.com/biopython/biopython/commit/8a08d553d367b9aa1c7f730f967bc11e1fca7a6e

It may have looked like a pointless import, but it had the (undocumented)
side effect of attaching properties like weight and translation tables to
the IUPAC alphabet objects. I have just reverted this:
http://github.com/biopython/biopython/commit/f9efb5e5ae5c58096addd398b7d50f1400d82ccc

For Biopython 1.55 we explicitly declared Bio.utils, Bio.PropertyManager,
and Bio.Encodings as obsolete.

So, what do we do now? I'd like to declare them deprecated in Biopython
1.56, and remove them (and Bio.Translate and Bio.Transcribe) in Biopython
1.57. This is a quicker removal than usual, but I'd argue anyone using
these modules would have already been getting deprecation warnings about
Bio.Translate and Bio.Transcribe anyway. [I suppose we could just remove
it all now - but some explicit warning seems safer!]

Comments?

I think the only useful bit of functionality (which wasn't documented, nor
had unit tests) was to calculate the molecular weight of sequences. That
could be added under Bio.SeqUtils I think.
Peter From biopython at maubp.freeserve.co.uk Wed Sep 1 11:52:48 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 1 Sep 2010 16:52:48 +0100 Subject: [Biopython-dev] Deprecated code to remove for Biopython 1.56 In-Reply-To: References: Message-ID: On Wed, Sep 1, 2010 at 12:15 PM, Peter wrote: > Hi all, > > The following were deprecated in Biopython 1.52 (released 22 September 2009), > so I think they can be removed ready for Biopython 1.56 now: > > Bio.EZRetrieve, Bio.NetCatch, Bio.File.SGMLHandle, Bio.FilteredReader > Done: http://github.com/biopython/biopython/commit/1097994fa2557cbee14a23c5354b643f312f0b07 > > Bio.AlignAce and Bio.MEME > Bartek - could you handle removing Bio.AlignAce and Bio.MEME please (assuming you agree that makes sense)? > We also still need to look at Bio.Translate, Bio.Transcribe and linked modules: > http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008210.html See this thread: http://lists.open-bio.org/pipermail/biopython-dev/2010-September/008231.html Peter From barwil at gmail.com Wed Sep 1 11:55:41 2010 From: barwil at gmail.com (Bartek Wilczynski) Date: Wed, 1 Sep 2010 17:55:41 +0200 Subject: [Biopython-dev] Deprecated code to remove for Biopython 1.56 In-Reply-To: References: Message-ID: On Wed, Sep 1, 2010 at 5:52 PM, Peter wrote: > > > > Bio.AlignAce and Bio.MEME > > > > Bartek - could you handle removing Bio.AlignAce and Bio.MEME please > (assuming > you agree that makes sense)? > > Absolutely. It makes perfect sense. I'll do it tomorrow. 
Bartek -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From bugzilla-daemon at portal.open-bio.org Fri Sep 3 09:49:03 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 3 Sep 2010 09:49:03 -0400 Subject: [Biopython-dev] [Bug 3135] New: Wrong instance length bug in MEME parser Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3135 Summary: Wrong instance length bug in MEME parser Product: Biopython Version: 1.55 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: johnbaronreid at netscape.net The MEME parser in biopython 1.55 seems to incorrectly set the length of the first instance of a motif to 0. Here is an example: #Sequence, start, length, site Motif: E-value: 0.000010 seq_3, 213, 0, AGGTGACAGAG seq_1, 146, 11, AGGTGACAGAG seq_0, 490, 11, AGGTGACAGAG seq_0, 83, 11, AGGTGACAGAG seq_0, 388, 11, AGGAAACAGAG seq_1, 422, 11, AGGGGACAGAG seq_1, 79, 11, TGGAGACAGAG seq_0, 281, 11, TGGGGACAGAG seq_0, 16, 11, TAGAGACAGAG seq_1, 228, 11, TTGTGACAGAG seq_4, 156, 11, AGGGGACAGGG seq_0, 348, 11, AGGAAAGAGAA seq_0, 374, 11, AGGAATGAGAG seq_5, 22, 11, GGGAAACTGAG seq_3, 486, 11, AAGGGAGTGAG Here's the code that generated the above: from Bio.Motif.Parsers.MEME import MEMEParser import cStringIO meme_output = cStringIO.StringIO(""" ******************************************************************************** MEME - Motif discovery tool ******************************************************************************** MEME version 4.3.0 (Release date: Sat Sep 26 01:51:56 PDT 2009) For further information on how to interpret these results or to get a copy of the MEME software please access http://meme.nbcr.net. 
This file may be used as input to the MAST algorithm for searching sequence databases for matches to groups of motifs. MAST is available for interactive use and downloading at http://meme.nbcr.net. ******************************************************************************** ******************************************************************************** REFERENCE ******************************************************************************** If you use this program in your research, please cite: Timothy L. Bailey and Charles Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers", Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994. ******************************************************************************** ******************************************************************************** TRAINING SET ******************************************************************************** DATAFILE= /home/john/Data/Tompa-data-set/Real/hm22r.fasta ALPHABET= ACGT Sequence name Weight Length Sequence name Weight Length ------------- ------ ------ ------------- ------ ------ seq_0 1.0000 500 seq_1 1.0000 500 seq_2 1.0000 500 seq_3 1.0000 500 seq_4 1.0000 500 seq_5 1.0000 500 ******************************************************************************** ******************************************************************************** COMMAND LINE SUMMARY ******************************************************************************** This information can also be useful in the event you wish to report a problem with the MEME software. 
command: meme /home/john/Data/Tompa-data-set/Real/hm22r.fasta -maxsize 1000000 -oc output/run_dataset/Tompa/hm22r/Real -dna -mod anr -revcomp -print_starts -maxiter 1000 -minw 8 -maxw 20 -minsites 2 -nmotifs 1 model: mod= anr nmotifs= 1 evt= inf object function= E-value of product of p-values width: minw= 8 maxw= 20 minic= 0.00 width: wg= 11 ws= 1 endgaps= yes nsites: minsites= 2 maxsites= 30 wnsites= 0.8 theta: prob= 1 spmap= uni spfuzz= 0.5 global: substring= yes branching= no wbranch= no em: prior= dirichlet b= 0.01 maxiter= 1000 distance= 1e-05 data: n= 3000 N= 6 strands: + - sample: seed= 0 seqfrac= 1 Letter frequencies in dataset: A 0.195 C 0.305 G 0.305 T 0.195 Background letter frequencies (from dataset with add-one prior applied): A 0.195 C 0.305 G 0.305 T 0.195 ******************************************************************************** ******************************************************************************** MOTIF 1 width = 11 sites = 15 llr = 159 E-value = 9.8e-006 ******************************************************************************** -------------------------------------------------------------------------------- Motif 1 Description -------------------------------------------------------------------------------- Simplified A 71:439:9:91 pos.-specific C ::::::8:::: probability G 18a37:2:a19 matrix T 31:3:1:1::: bits 2.4 2.1 * 1.9 * * * 1.6 * * *** Relative 1.4 * * **** Entropy 1.2 * * * **** (15.3 bits) 0.9 *** ******* 0.7 *********** 0.5 *********** 0.2 *********** 0.0 ----------- Multilevel AGGAGACAGAG consensus T TA G sequence G -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Strand Start P-value Site ------------- ------ ----- --------- ----------- seq_3 - 213 4.54e-07 GGCCTTTGGA 
AGGTGACAGAG GCGCGGCCAC seq_1 - 146 4.54e-07 CCCAACAGGA AGGTGACAGAG GTGGCTCTGG seq_0 + 490 4.54e-07 AAAACAGCAG AGGTGACAGAG seq_0 - 83 4.54e-07 CCCAGCAGGA AGGTGACAGAG GTGGCTCTGG seq_0 + 388 5.99e-07 ATGAGAGGAG AGGAAACAGAG CTTCCTGGAC seq_1 + 422 1.10e-06 ATGAGAGGGG AGGGGACAGAG GACACCTGAA seq_1 + 79 1.33e-06 TTGGTGGTAC TGGAGACAGAG GGCTGGTCCC seq_0 + 281 3.17e-06 CCTCCCCTGA TGGGGACAGAG GTCTCATCAG seq_0 + 16 5.72e-06 CTGGTGACAC TAGAGACAGAG GGCTGGTCCC seq_1 - 228 1.18e-05 TTATTTTCCT TTGTGACAGAG AAACCCAGCA seq_4 + 156 2.07e-05 TCAAGTCCCA AGGGGACAGGG AGCAGAAGGG seq_0 + 348 2.47e-05 GTAGACAGAA AGGAAAGAGAA AGTAAGGACA seq_0 + 374 3.14e-05 GGACAAAGGT AGGAATGAGAG GAGAGGAAAC seq_5 - 22 4.53e-05 CTCTTGTGTA GGGAAACTGAG CACGGGGAAC seq_3 + 486 5.02e-05 CGCCAATGGG AAGGGAGTGAG TGCC -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- seq_3 5e-05 212_[-1]_262_[+1]_4 seq_1 1.2e-05 78_[+1]_56_[-1]_71_[-1]_183_[+1]_68 seq_0 3.2e-06 15_[+1]_56_[-1]_187_[+1]_56_[+1]_ 15_[+1]_3_[+1]_91_[+1] seq_4 2.1e-05 155_[+1]_334 seq_5 4.5e-05 21_[-1]_468 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 1 width=11 seqs=15 seq_3 ( 213) AGGTGACAGAG 1 seq_1 ( 146) AGGTGACAGAG 1 seq_0 ( 490) AGGTGACAGAG 1 seq_0 ( 83) AGGTGACAGAG 1 seq_0 ( 388) AGGAAACAGAG 1 seq_1 ( 422) AGGGGACAGAG 1 seq_1 ( 79) TGGAGACAGAG 1 seq_0 ( 281) TGGGGACAGAG 1 seq_0 ( 16) TAGAGACAGAG 1 seq_1 ( 228) TTGTGACAGAG 1 seq_4 ( 156) AGGGGACAGGG 1 seq_0 ( 348) AGGAAAGAGAA 1 seq_0 ( 374) AGGAATGAGAG 1 seq_5 ( 22) 
GGGAAACTGAG 1 seq_3 ( 486) AAGGGAGTGAG 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 4 w= 11 n= 2940 bayes= 6.7534 E= 9.8e-006 177 -1055 -219 45 -55 -1055 139 -155 -1055 -1055 171 -1055 103 -1055 -19 77 45 -1055 127 -1055 226 -1055 -1055 -155 -1055 139 -61 -1055 215 -1055 -1055 -55 -1055 -1055 171 -1055 226 -1055 -219 -1055 -155 -1055 161 -1055 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 11 nsites= 15 E= 9.8e-006 0.666667 0.000000 0.066667 0.266667 0.133333 0.000000 0.800000 0.066667 0.000000 0.000000 1.000000 0.000000 0.400000 0.000000 0.266667 0.333333 0.266667 0.000000 0.733333 0.000000 0.933333 0.000000 0.000000 0.066667 0.000000 0.800000 0.200000 0.000000 0.866667 0.000000 0.000000 0.133333 0.000000 0.000000 1.000000 0.000000 0.933333 0.000000 0.066667 0.000000 0.066667 0.000000 0.933333 0.000000 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 regular expression -------------------------------------------------------------------------------- [AT]GG[ATG][GA]A[CG]AGAG -------------------------------------------------------------------------------- Time 3.78 secs. 
******************************************************************************** ******************************************************************************** SUMMARY OF MOTIFS ******************************************************************************** -------------------------------------------------------------------------------- Combined block diagrams: non-overlapping sites with p-value < 0.0001 -------------------------------------------------------------------------------- SEQUENCE NAME COMBINED P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- seq_0 4.45e-04 15_[+1(5.72e-06)]_56_[-1(4.54e-07)]_187_[+1(3.17e-06)]_56_[+1(2.47e-05)]_15_[+1(3.14e-05)]_3_[+1(5.99e-07)]_91_[+1(4.54e-07)] seq_1 4.45e-04 78_[+1(1.33e-06)]_56_[-1(4.54e-07)]_71_[-1(1.18e-05)]_183_[+1(1.10e-06)]_68 seq_2 2.03e-01 500 seq_3 4.45e-04 212_[-1(4.54e-07)]_262_[+1(5.02e-05)]_4 seq_4 2.01e-02 155_[+1(2.07e-05)]_334 seq_5 4.34e-02 21_[-1(4.53e-05)]_468 -------------------------------------------------------------------------------- ******************************************************************************** ******************************************************************************** Stopped because nmotifs = 1 reached. ******************************************************************************** CPU: john-dell ******************************************************************************** """) parser = MEMEParser() parsed = parser.parse(meme_output) print '#Sequence, start, length, site' for motif in parsed.motifs: print 'Motif: E-value: %f' % motif.evalue for instance in motif.instances: print "%10s, %5d, %5d, %s" % ( instance.sequence_name, instance.start, instance.length, str(instance), ) #assert instance.length == motif.length -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Sep 3 09:49:22 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 3 Sep 2010 09:49:22 -0400 Subject: [Biopython-dev] [Bug 3135] Wrong instance length bug in MEME parser In-Reply-To: Message-ID: <201009031349.o83DnMxp002567@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3135 johnbaronreid at netscape.net changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |johnbaronreid at netscape.net -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Sep 3 10:29:02 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 3 Sep 2010 10:29:02 -0400 Subject: [Biopython-dev] [Bug 3135] Wrong instance length bug in MEME parser In-Reply-To: Message-ID: <201009031429.o83ET2vO005042@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3135 barwil at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Sep 3 10:53:17 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 3 Sep 2010 10:53:17 -0400 Subject: [Biopython-dev] [Bug 3135] Wrong instance length bug in MEME parser In-Reply-To: Message-ID: <201009031453.o83ErHXg006375@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3135 barwil at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #1 from barwil at gmail.com 2010-09-03 10:53 EST ------- Small bug, fixed in the master branch. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Fri Sep 3 13:17:24 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 3 Sep 2010 10:17:24 -0700 (PDT) Subject: [Biopython-dev] DeprecationWarnings Message-ID: <712888.65830.qm@web62408.mail.re1.yahoo.com> Hi everybody, In Python 2.7, DeprecationWarnings are silenced by default; they are shown if Python is started with the -Wd option. Most of our users won't use the -Wd option and therefore won't see any of the DeprecationWarnings. I suggest that we replace all DeprecationWarnings in Biopython with a Biopython-specific warning, perhaps implemented in Bio/__init__.py, to make sure that users actually see these warnings when they occur. At the same time, we can add DeprecationWarnings to all functions marked as obsolete, since most users won't see these DeprecationWarnings anyway (assuming they are using Python 2.7 or later). This allows us to check if any software depends on obsolete code by using the -Wd option when starting Python. Any objections? --Michiel. 
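[Editorial sketch, not part of the original message.] The proposal above can be illustrated roughly as follows. This is a minimal Python 3 sketch under stated assumptions: the class name `BiopythonDeprecationWarning` is only one of the names floated in this thread, and `deprecated_builtin` and `deprecated_custom` are hypothetical stand-ins for Biopython functions. The filters imitate the Python 2.7 default of hiding `DeprecationWarning`; a warning class that does not inherit from it should still reach the user.

```python
import warnings


class BiopythonDeprecationWarning(Warning):
    """Hypothetical Biopython-specific deprecation warning.

    Deliberately NOT a subclass of DeprecationWarning, so a filter that
    ignores DeprecationWarning does not hide it.
    """


def deprecated_builtin():
    """Illustrative function that warns with the built-in category."""
    warnings.warn("deprecated_builtin is deprecated.",
                  DeprecationWarning, stacklevel=2)


def deprecated_custom():
    """Illustrative function that warns with the custom category."""
    warnings.warn("deprecated_custom is deprecated.",
                  BiopythonDeprecationWarning, stacklevel=2)


def visible_categories(func):
    """Call func under 2.7-like filters; return the warning categories shown."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.resetwarnings()
        warnings.simplefilter("default")  # show everything else
        # Later filters are inserted in front, so these take precedence.
        warnings.simplefilter("ignore", PendingDeprecationWarning)
        warnings.simplefilter("ignore", DeprecationWarning)
        func()
    return [w.category for w in caught]


print(visible_categories(deprecated_builtin))
print(visible_categories(deprecated_custom))
```

The design choice here is simply the base class: deriving from `Warning` (or `UserWarning`) keeps the category outside the default "ignore" filters, while users can still silence it explicitly with `warnings.simplefilter("ignore", BiopythonDeprecationWarning)`.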
From mjldehoon at yahoo.com Fri Sep 3 13:26:51 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 3 Sep 2010 10:26:51 -0700 (PDT) Subject: [Biopython-dev] Parsing efetch results from the Journals database through Bio.Entrez Message-ID: <488872.91962.qm@web62406.mail.re1.yahoo.com> Hi everybody, The parser in Bio.Entrez can parse any XML returned by the Entrez E-utilities as long as the corresponding DTD is available (which are included with each release of Biopython). One corner case is efetch results from the Journals database. Officially, efetch from the Journals database does not generate output in the XML format, but only plain text or HTML. However, when requesting XML explicitly from Entrez, in practice it does return an XML-like output. Our parser in Bio.Entrez is able to parse this XML, but it requires several hacks in the parser code. As probably few users are interested in efetch output from the Journals database, I suggest that we remove these hacks from Bio.Entrez altogether -- after all, this is for XML that is not supported by NCBI to begin with. If there are some users that really want to parse efetch output from the Journals database, we can always add a simple parser for plain-text efetch output. The advantage of removing these hacks is that it will allow us to validate all XML against the DTD, and to raise an error (if the user requests so) if any elements are found in the XML that don't validate against the DTD. Any objections? --Michiel. 
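[Editorial sketch, not part of the original message.] The validation idea described above, raising an error on request when an element is not declared in the DTD, might look roughly like this. It is a toy built on the standard-library expat bindings, not the Bio.Entrez implementation: `parse_tags`, `ValidationError`, and the `validate` flag are hypothetical names, and the set of known elements is passed in by hand here rather than read from a real DTD.

```python
from xml.parsers import expat


class ValidationError(ValueError):
    """Raised when an element is not declared in the (assumed) DTD."""


def parse_tags(xml_text, known_elements, validate=True):
    """Return the declared element names in document order.

    known_elements stands in for the names a DTD would declare.
    With validate=True an undeclared element raises ValidationError;
    with validate=False it is skipped silently.
    """
    seen = []

    def start_element(name, attrs):
        if name not in known_elements:
            if validate:
                raise ValidationError(
                    "Element %r is not declared in the DTD" % name)
            return
        seen.append(name)

    parser = expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)  # True marks the final chunk of input
    return seen


# Illustrative element names only.
known = {"eInfoResult", "DbList", "DbName"}
good = "<eInfoResult><DbList><DbName>pubmed</DbName></DbList></eInfoResult>"
print(parse_tags(good, known))
```

Without special-case hacks for undeclared XML, a single membership check like the one in `start_element` is enough to decide between raising and skipping.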
From biopython at maubp.freeserve.co.uk Fri Sep 3 13:28:32 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 3 Sep 2010 18:28:32 +0100
Subject: [Biopython-dev] DeprecationWarnings
In-Reply-To: <712888.65830.qm@web62408.mail.re1.yahoo.com>
References: <712888.65830.qm@web62408.mail.re1.yahoo.com>
Message-ID:

On Fri, Sep 3, 2010 at 6:17 PM, Michiel de Hoon wrote:
> Hi everybody,
>
> In Python 2.7, DeprecationWarnings are silenced by default; they are shown
> if Python is started with the -Wd option. Most of our users won't use the -Wd
> option and therefore won't see any of the DeprecationWarnings.

I remember you pointed this out last week, I agree this will be a problem.
http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008207.html

> I suggest that
> we replace all DeprecationWarnings in Biopython with a Biopython-specific
> warning, perhaps implemented in Bio/__init__.py, to make sure that users
> actually see these warnings when they occur.

Sounds good, maybe Bio.BioDeprecationWarning would do as a name?

> At the same time, we can add DeprecationWarnings to all functions
> marked as obsolete, since most users won't see these DeprecationWarnings
> anyway (assuming they are using Python 2.7 or later). This allows us to
> check if any software depends on obsolete code by using the -Wd option
> when starting Python.
>
> Any objections?

Yes. That would be a misuse of DeprecationWarning, and would be very
annoying for people running on Python 2.6 or older (which will probably
be most users for the time being).
Instead we can use the built in PendingDeprecationWarning (which is silent by default): http://lists.open-bio.org/pipermail/biopython-dev/2009-September/006762.html Regards, Peter From biopython at maubp.freeserve.co.uk Fri Sep 3 13:31:09 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 3 Sep 2010 18:31:09 +0100 Subject: [Biopython-dev] Parsing efetch results from the Journals database through Bio.Entrez In-Reply-To: <488872.91962.qm@web62406.mail.re1.yahoo.com> References: <488872.91962.qm@web62406.mail.re1.yahoo.com> Message-ID: On Fri, Sep 3, 2010 at 6:26 PM, Michiel de Hoon wrote: > Hi everybody, > > The parser in Bio.Entrez can parse any XML returned by the Entrez E-utilities > as long as the corresponding DTD is available (which are included with each > release of Biopython). One corner case is efetch results from the Journals > database. Officially, efetch from the Journals database does not generate > output in the XML format, but only plain text or HTML. However, when > requesting XML explicitly from Entrez, in practice it does return an XML-like > output. Our parser in Bio.Entrez is able to parse this XML, but it requires > several hacks in the parser code. Out of interest, have you asked the NCBI about this undocumented XML output? > As probably few users are interested in efetch output from the Journals > database, I suggest that we remove these hacks from Bio.Entrez altogether > -- after all, this is for XML that is not supported by NCBI to begin with. If > there are some users that really want to parse efetch output from the > Journals database, we can always add a simple parser for plain-text > efetch output. > > The advantage of removing these hacks is that it will allow us to validate > all XML against the DTD, and to raise an error (if the user requests so) > if any elements are found in the XML that don't validate against the DTD. > > Any objections? 
Is it feasible to just put deprecation warnings in for Biopython 1.56, and then remove the hacks later? Peter From mjldehoon at yahoo.com Sat Sep 4 01:18:34 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 3 Sep 2010 22:18:34 -0700 (PDT) Subject: [Biopython-dev] Parsing efetch results from the Journals database through Bio.Entrez In-Reply-To: Message-ID: <695617.66055.qm@web62406.mail.re1.yahoo.com> --- On Fri, 9/3/10, Peter wrote: > Out of interest, have you asked the NCBI about this > undocumented XML output? Yes, several times. > Is it feasible to just put deprecation warnings in for > Biopython 1.56, and then remove the hacks later? Yes that is possible, but I don't think that we should go through the usual obsolete / deprecation / removal procedure, because this will take more than a year, and it's not worthwhile for a piece of code that probably nobody uses. In addition, once the hacks are removed, the parser will be able to validate each XML document it parses, which is to the benefit of all users and use cases. I don't think that we should postpone that by a year. --Michiel. From mjldehoon at yahoo.com Sat Sep 4 01:21:07 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 3 Sep 2010 22:21:07 -0700 (PDT) Subject: [Biopython-dev] DeprecationWarnings In-Reply-To: Message-ID: <160151.26509.qm@web62401.mail.re1.yahoo.com> --- On Fri, 9/3/10, Peter wrote: > > I suggest that > > we replace all DeprecationWarnings in Biopython with a > Biopython-specific > > warning, perhaps implemented in Bio/__init__.py, to > make sure that users > > actually see these warnings when they occur. > > Sounds good, maybe Bio.BioDeprecationWarning would do as a > name? I would prefer Bio.DeprecationWarning instead of Bio.BioDeprecationWarning. [for obsolete code:] > Instead we can use the built in > PendingDeprecationWarning (which is silent by default): OK, let's use that then for obsolete code. 
If there are no other opinions, I'll make these changes next weekend.

--Michiel.

From biopython at maubp.freeserve.co.uk Sat Sep 4 08:03:28 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 4 Sep 2010 13:03:28 +0100
Subject: [Biopython-dev] DeprecationWarnings
In-Reply-To: <160151.26509.qm@web62401.mail.re1.yahoo.com>
References: <160151.26509.qm@web62401.mail.re1.yahoo.com>
Message-ID:

On Sat, Sep 4, 2010 at 6:21 AM, Michiel de Hoon wrote:
>
> On Fri, 9/3/10, Peter wrote:
>>
>> Sounds good, maybe Bio.BioDeprecationWarning would do as a
>> name?
>
> I would prefer Bio.DeprecationWarning instead of Bio.BioDeprecationWarning.

But then if we do "from Bio import DeprecationWarning" it would
mask the built in DeprecationWarning - which could cause confusion?

> [for obsolete code:]
>> Instead we can use the built in
>> PendingDeprecationWarning (which is silent by default):
>
> OK, let's use that then for obsolete code.

OK

Peter

From mjldehoon at yahoo.com Sat Sep 4 08:29:29 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 4 Sep 2010 05:29:29 -0700 (PDT)
Subject: [Biopython-dev] DeprecationWarnings
In-Reply-To:
Message-ID: <277645.62569.qm@web62403.mail.re1.yahoo.com>

--- On Sat, 9/4/10, Peter wrote:
> > I would prefer Bio.DeprecationWarning instead of
> Bio.BioDeprecationWarning.
>
> But then if we do "from Bio import DeprecationWarning" it
> would mask the built in DeprecationWarning - which could cause
> confusion?

OK, then how about Bio.BiopythonDeprecationWarning?

--Michiel.

From mjldehoon at yahoo.com Sat Sep 4 11:23:16 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 4 Sep 2010 08:23:16 -0700 (PDT)
Subject: [Biopython-dev] Fwd: [blast-announce] Correction: BLAST 2.2.24 release announcement
In-Reply-To:
Message-ID: <438480.92769.qm@web62402.mail.re1.yahoo.com>

Being able to convert Blast ASN.1 output into any of the other formats will make a big difference to us.
If we had a parser for ASN.1 Blast output, then strictly speaking there is no reason to have a parser for any of the other formats (in practice, we can be more flexible of course). I looked some more into the Blast parser issues we discussed earlier (starting here: http://lists.open-bio.org/pipermail/biopython-dev/2010-May/007762.html). Unfortunately things are not as easy as I had hoped. Except for the new ASN.1 output format, none of the other output formats (plain text, XML, tabular) contain all of the output generated by the Blast run. Some results are only found in the XML, some only in the plain text output, and tabular output can contain all kinds of stuff depending on the exact options that were used. As a consequence, it's hard to design a generic Blast record class; having a specialized Record class for plain text, XML, and tabular seems more appropriate, and these record classes may not be fully consistent with each other (some elements may exist in one class but not in the other). Also, we cannot read in the Blast output in one format and write out the Blast output in a different format (at least not reliably). With the format converter in Blast 2.2.24, luckily there is no longer such a need for such converters in Biopython. If we had an ASN.1 parser, we could run Blast, save its output in ASN.1, load the Blast output into Python, filter the Blast output or otherwise modify it, write out the modified output in ASN.1 format, and then use the Blast 2.2.24 format converter to convert the modified output to plain text or some other format. That would be really useful. Unfortunately, making a parser for ASN.1 will not be so easy. As far as I know there isn't anything like expat or DOM for ASN.1 like we have for XML. Maybe this is something for a google summer of code? --Michiel. 
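[Editorial sketch, not part of the original message.] The run-once, reformat-later workflow described above could be scripted roughly as follows. This is a minimal sketch that only builds the two command lines; it assumes the BLAST+ 2.2.24 options `-outfmt 11` (BLAST archive, ASN.1) and `blast_formatter -archive`, and the file names, database name, and helper function names are hypothetical placeholders.

```python
import shlex


def blast_archive_command(query, database, archive_path, program="blastn"):
    """Build a BLAST+ search command that saves the result as an ASN.1 archive."""
    return [program, "-query", query, "-db", database,
            "-outfmt", "11", "-out", archive_path]


def blast_formatter_command(archive_path, outfmt, output_path):
    """Build a blast_formatter command converting an archive to another format."""
    return ["blast_formatter", "-archive", archive_path,
            "-outfmt", str(outfmt), "-out", output_path]


# Placeholder file and database names, for illustration only.
search = blast_archive_command("query.fasta", "nt", "result.asn")
to_xml = blast_formatter_command("result.asn", 5, "result.xml")      # 5 = XML
to_tabular = blast_formatter_command("result.asn", 6, "result.tsv")  # 6 = tabular

for command in (search, to_xml, to_tabular):
    print(shlex.join(command))
```

Each list can be handed to `subprocess.run(command, check=True)` on a machine where BLAST+ is installed; the slow search runs once, and the conversions can be repeated as often as needed.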
--- On Tue, 8/24/10, Peter wrote: > From: Peter > Subject: [Biopython-dev] Fwd: [blast-announce] Correction: BLAST 2.2.24 release announcement > To: "Biopython-Dev Mailing List" > Date: Tuesday, August 24, 2010, 12:30 PM > Hi all, > > The NCBI have just released a new version of BLAST+ (see > below). > > I've just updated the existing BLAST+ application wrappers > for the minor > changes made in BLAST 2.2.24+. > > Something potentially quite useful in this release is the > blast_formatter > command for turning ASN.1 BLAST+ output (using -outfmt > 11) into > any of the other output formats. i.e. If you are not sure > what output > format will be most useful (e.g. plain text, XML, tabular) > and rerunning > the BLAST is slow, the NCBI now let you run the BLAST once > and save > it as ASN.1, then convert this to any other format on > demand using > blast_formatter (which should be fast). > > We should write a command line wrapper for this new > tool... > > Peter > > ---------- Forwarded message ---------- > From: mcginnis > Date: Tue, Aug 24, 2010 at 4:46 PM > Subject: [blast-announce] Correction: BLAST 2.2.24 release > announcement > To: NLM/NCBI List blast-announce > > > A new version of the stand-alone applications is > available.
> > Users are encouraged to use the BLAST+ applications > available at > ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ > This release includes a number of bug fixes as well as new > features > for the BLAST+ applications: > > * Introduce BLAST Archive format to permit reformatting > of stand-alone > BLAST searches with the blast_formatter (see BLAST+ user > manual) > * Added the blast_formatter application (see BLAST+ user > manual) > * Added support for translated subject soft masking in the > BLAST databases > * Added support for the BLAST Trace-back operations (btop) > output format > * Added command line options to blastdbcmd for listing > available BLAST databases > * Improved performance of formatting of remote BLAST > searches > * Use a consistent exit code for out of memory conditions > * Fixed bug in indexed megablast with multiple > space-separated BLAST databases > * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and > makeblastdb > * Fixed Windows installer for 64-bit installations > > BLAST+ applications, as well as the legacy C applications > (e.g. > blastall), may be downloaded from > http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From biopython at maubp.freeserve.co.uk Sat Sep 4 13:29:42 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 4 Sep 2010 18:29:42 +0100 Subject: [Biopython-dev] DeprecationWarnings In-Reply-To: <277645.62569.qm@web62403.mail.re1.yahoo.com> References: <277645.62569.qm@web62403.mail.re1.yahoo.com> Message-ID: On Sat, Sep 4, 2010 at 1:29 PM, Michiel de Hoon wrote: > --- On Sat, 9/4/10, Peter wrote: >> > I would prefer Bio.DeprecationWarning instead of >> Bio.BioDeprecationWarning.
>> >> But then if we do "from Bio import DeprecationWarning" is >> would mask the built in DeprecationWarning - which could cause >> confusion? > > OK, then how about Bio.BiopythonDeprecationWarning? I like that too - longer but clear. Peter From anaryin at gmail.com Mon Sep 6 11:05:56 2010 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Mon, 6 Sep 2010 16:05:56 +0100 Subject: [Biopython-dev] GSOC Bio.PDB Project - Final Summary In-Reply-To: <20100816132716.GG23299@sobchak.mgh.harvard.edu> References: <20100816132716.GG23299@sobchak.mgh.harvard.edu> Message-ID: Hello Brad! I think most of it is ready to be integrated with the main branch. There are a couple of features like Hydrogenation that need a thorough look at them. Regarding compatibility, I think there is absolutely no problem at all. These are extensions based on previous code. The few completely new additions don't break anything as far as I know :) I will clean it up a little bit more and maybe then we can merge it. Best! And thanks!
João From biopython at maubp.freeserve.co.uk Tue Sep 7 07:17:37 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 7 Sep 2010 12:17:37 +0100 Subject: [Biopython-dev] Fwd: [blast-announce] Correction: BLAST 2.2.24 release announcement In-Reply-To: <438480.92769.qm@web62402.mail.re1.yahoo.com> References: <438480.92769.qm@web62402.mail.re1.yahoo.com> Message-ID: On Sat, Sep 4, 2010 at 4:23 PM, Michiel de Hoon wrote: > Being able to convert Blast ASN.1 output into any of the other formats > will make a big difference to us. If we had a parser for ASN.1 Blast > output, then strictly speaking there is no reason to have a parser for > any of the other formats (in practice, we can be more flexible of course). Dave Messina made a good point on the BioPerl list that (depending on what data you are interested in) post-processing to generate the alignment views is a waste of CPU time: http://lists.open-bio.org/pipermail/bioperl-l/2010-August/033972.html Also, and we see this already with the XML output, the output file size is quite inflated - especially if all you need can be presented in one of the tabular forms which is smaller and quicker to parse. So yes, in principle a parser for ASN.1 Blast would be all we need, but in practice tabular/plaintext/XML BLAST parsers are still useful. > I looked some more into the Blast parser issues we discussed > earlier (starting here: > http://lists.open-bio.org/pipermail/biopython-dev/2010-May/007762.html > ). Unfortunately things are not as easy as I had hoped. Except > for the new ASN.1 output format, none of the other output > formats (plain text, XML, tabular) contain all of the output > generated by the Blast run. Some results are only found in > the XML, some only in the plain text output, and tabular > output can contain all kinds of stuff depending on the exact > options that were used.
As a consequence, it's hard to design > a generic Blast record class; having a specialized Record > class for plain text, XML, and tabular seems more appropriate, > and these record classes may not be fully consistent with each > other (some elements may exist in one class but not in the > other). I thought it would be hard :( > Also, we cannot read in the Blast output in one format and > write out the Blast output in a different format (at least not > reliably). In some cases this isn't surprising of course (e.g. tabular to XML isn't going to work). > With the format converter in Blast 2.2.24, luckily there is > no longer such a need for such converters in Biopython. > If we had an ASN.1 parser, we could run Blast, save its > output in ASN.1, load the Blast output into Python, filter > the Blast output or otherwise modify it, write out the > modified output in ASN.1 format, and then use the Blast > 2.2.24 format converter to convert the modified output > to plain text or some other format. That would be really > useful. > > Unfortunately, making a parser for ASN.1 will not be so > easy. As far as I know there isn't anything like expat or > DOM for ASN.1 like we have for XML. Maybe this is > something for a google summer of code? Maybe. There are some python libraries out there for ASN.1 (it is an ISO standard used beyond the NCBI). 
http://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One http://bitbucket.org/haypo/hachoir/wiki/hachoir-parser Peter From updates at feedmyinbox.com Wed Sep 8 03:10:53 2010 From: updates at feedmyinbox.com (Feed My Inbox) Date: Wed, 8 Sep 2010 03:10:53 -0400 Subject: [Biopython-dev] 9/8 biopython Questions - BioStar Message-ID: // Biopython NCBIStandalone blastall gives different result than calling blastall directly from cmd // September 7, 2010 at 7:36 PM http://biostar.stackexchange.com/questions/2391/biopython-ncbistandalone-blastall-gives-different-result-than-calling-blastall-di So, first I tested what results I should get from the blastall program using the command line, with e-value 0.001: C:\Niek\Test\blast-2.2.17\bin\blastall -p blastp -d C:\Niek\Test\arabidopsis-smallproteins.fasta -i C:\Niek\Test\arabidopsis-HD.fasta -e 0.001 -F F -m 8 -o C:\Niek\Test\arab-HD-smallproteins-notfiltered.out and C:\Niek\Test\blast-2.2.17\bin\blastall -p blastp -d C:\Niek\Test\arabidopsis-smallproteins.fasta -i C:\Niek\Test\arabidopsis-HD.fasta -e 0.001 -m 8 -o C:\Niek\Test\arab-HD-smallproteins-filtered.out After that I made a local blast program, which works fine but it only found 91 results with e-value equal or lower than 0.001, where the results from the blastall via cmd gave around 140~ something results. I first thought it missed some, but all the e-values are different. 
from Bio.Blast import NCBIStandalone from Bio.Blast import NCBIXML my_blast_db = r"C:\Niek\Test\arabidopsis-smallproteins.fasta" my_blast_file = r"C:\Niek\Test\arabidopsis-HD.fasta" my_blast_exe =r"C:\Niek\blast-2.2.17\bin\blastall.exe" result_handle, error_handle = NCBIStandalone.blastall(my_blast_exe, "blastp", my_blast_db, my_blast_file) blast_records = NCBIXML.parse(result_handle) E_VALUE_THRESH = 0.001 x = 0 for blast_record in blast_records: blast_record = blast_records.next() for alignment in blast_record.alignments: for hsp in alignment.hsps: if hsp.expect <= E_VALUE_THRESH: print "==========Alignment========" print "sequence:", alignment.title print "length:", alignment.length print "e value:", hsp.expect x += 1 I first thought that the local blast from biopython uses a different algorithm, but at 'my_blast_exe =r"C:\Niek\blast-2.2.17\bin\blastall.exe"' I specify the same program but it should be the same. Then I thought it had something to do with the filtering option but I checked both filtered and unfiltered and wasn't any of that. If you know why the local blast from biopython NCBIStandalone gives a different result than doing it directly in the cmd, please let me know. Thanks in advance, Niek edit: I checked, and it seems that the NCBIStandalone filters out 100% identities, which the blastall called by cmd does not. However this doesn't explain why the e-values are so different. // Problem with blastp of biopython: returned non-zero exit status 1 // September 7, 2010 at 4:10 PM http://biostar.stackexchange.com/questions/2388/problem-with-blastp-of-biopython-returned-non-zero-exit-status-1 I want to do a local BLAST using blastp from the Bio.Blast.Applications. However, when I run it I get a runtime error: returned non-zero exit status 1. According to the manual it is Exit Code Meaning: 1 Error in query sequence(s) or BLAST options. The query used is fasta format protein sequences. 
The command line I used, with the BLAST options, was: '>>> print blastp_cline C:\Program Files\NCBI\blast-2.2.24+\bin\blastp.exe -query "C:\Documents and Settings\newintern\Desktop\Microproteins_niek\arabidopsis-HD.fasta" -db C:\Documents and Settings\newintern\Desktop\Microproteins_niek\arabidopsis-smallproteins.fasta -out test.xml -evalue 0.001 -outfmt 5 Anyone know how to fix that error? Thanks Niek -- Website: http://biostar.stackexchange.com/questions/tagged/biopython Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/444424/f8ec200ea7b1a33442ee9d28a3d1365a23421b9a/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. 230 Franklin Road Suite 814 Franklin, TN 37064 From biopython at maubp.freeserve.co.uk Thu Sep 9 13:08:45 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 9 Sep 2010 18:08:45 +0100 Subject: [Biopython-dev] Python 3 status (ignoring our C code and most dependencies) Message-ID: Hi Bartek, Eric, et al, I've rerun the test suite on the trunk code, and we have the following issues, most of which I'd already noted in this thread: http://lists.open-bio.org/pipermail/biopython-dev/2010-July/008079.html Bartek - I was seeing a couple of issues with Bio.Motif which came down to relative import issues, this seems to have fixed things. Could you confirm this change looks OK to you? http://github.com/biopython/biopython/commit/4700d1be7afffe5e06b41df6ee8cc19e68a9a6c1 Eric - Can you reproduce this test_Phylo.py failure on your machine? And is there any chance you'll be able to look at the Bio.PDB issue with DisorderedResidue? Thanks, Peter ------------------------------------------------------------------------ test_LocationParser ... 
Syntax error at or near `467' token Something in the spark parser isn't handled by 2to3, not urgent as I want to deprecate Bio.GenBank.LocationParser which is the only thing using spark. http://lists.open-bio.org/pipermail/biopython/2010-September/006734.html ------------------------------------------------------------------------ test_PDB ... FAIL TypeError: 'DisorderedResidue' object is not subscriptable See: http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008159.html ------------------------------------------------------------------------ test_Phylo ... FAIL Traceback (most recent call last): File "test_Phylo.py", line 47, in test_convert Phylo.convert(self.mem_file, 'nexus', mem_file_2, 'phyloxml') File "/home/xxx/repositories/biopython/Bio/Phylo/_io.py", line 102, in convert return write(trees, out_file, out_format, **kwargs) File "/home/xxx/repositories/biopython/Bio/Phylo/_io.py", line 92, in write n = getattr(supported_formats[format], 'write')(trees, file, **kwargs) File "/home/xxx/repositories/biopython/Bio/Phylo/PhyloXMLIO.py", line 148, in write return Writer(obj).write(file, encoding=encoding, indent=indent) File "/home/xxx/repositories/biopython/Bio/Phylo/PhyloXMLIO.py", line 684, in write self._tree.write(file, encoding) File "/home/xxx/lib/python3.1/xml/etree/ElementTree.py", line 659, in write self._write(file, self._root, encoding, {}) File "/home/xxx/lib/python3.1/xml/etree/ElementTree.py", line 677, in _write file.write(_encode("<" + tag, encoding)) TypeError: string argument expected, got 'bytes' ------------------------------------------------------------------------ test_SeqIO_online ... FAIL May need to turn all online byte handles into unicode handles, http://lists.open-bio.org/pipermail/biopython-dev/2010-July/008076.html ------------------------------------------------------------------------ test_property_manager ... 
FAIL Looks like a formatting change of some kind, but I want to deprecate this: http://lists.open-bio.org/pipermail/biopython-dev/2010-September/008231.html ------------------------------------------------------------------------ Peter From biopython at maubp.freeserve.co.uk Thu Sep 9 13:27:16 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 9 Sep 2010 18:27:16 +0100 Subject: [Biopython-dev] Deprecated code to remove for Biopython 1.56 In-Reply-To: References: Message-ID: On Wed, Sep 1, 2010 at 4:55 PM, Bartek Wilczynski wrote: > On Wed, Sep 1, 2010 at 5:52 PM, Peter wrote: >> >> Bartek - could you handle removing Bio.AlignAce and Bio.MEME please >> (assuming you agree that makes sense)? >> > > Absolutely. It makes perfect sense. I'll do it tomorrow. > Hi Bartek, I updated the DEPRECATED about their removal, and NEWS to mention you've contributed to what will be Biopython 1.56. Could you check if there is anything in test_MEME.py worth keeping (i.e. moving into test_Motif.py), and then delete test_MEME.py? Thanks, Peter From bugzilla-daemon at portal.open-bio.org Thu Sep 9 13:42:29 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 9 Sep 2010 13:42:29 -0400 Subject: [Biopython-dev] [Bug 3096] PPBuilder build_peptides bugs In-Reply-To: Message-ID: <201009091742.o89HgTs8024309@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3096 ------- Comment #6 from skong at zymeworks.com 2010-09-09 13:42 EST ------- Hi Peter, I tested out the code (on the script directly, not using git) and it works fine. I only have minor concerns that the additional input variable "standard_aa_only" for _accept() method in class _PPBuilder might break other codes that assumes it still has two instead of three input variables. Also within the same script there are three different naming and default value for the same flag (standard amino acid): 1. named "standard" with default False in is_aa() method 2. 
named "aa_only" with default 1 in build_peptides() method of class _PPBuilder 3. named "standard_aa_only" with no default value in _accept() method of class _PPBuilder Which is again minor. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Sep 9 13:53:02 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 9 Sep 2010 18:53:02 +0100 Subject: [Biopython-dev] Bio.utils, Bio.PropertyManager, Bio.Encodings.IUPACEncoding, etc In-Reply-To: References: Message-ID: On Wed, Sep 1, 2010 at 4:38 PM, Peter wrote: > > For Biopython 1.55 we explicitly declared Bio.utils, Bio.PropertyManager, > and Bio.Encodings as obsolete. So, what do we do now? I'd like to declare > them deprecated in Biopython 1.56, and remove them (and Bio.Translate > and Bio.Transcribe) in Biopython 1.57. This is a quicker removal than usual, > but I'd argue anyone using these modules would have already been getting > deprecation warnings about Bio.Translate and Bio.Transcribe anyway. > I've marked them as deprecated, with an explicit DeprecationWarning in Bio.utils, but not for Bio.PropertyManager and Bio.Encodings which would be triggered by Bio.Alphabet.IUPAC. 
http://github.com/biopython/biopython/commit/28a7daeef6ff57979ec08de62777528219976df7 Peter From bugzilla-daemon at portal.open-bio.org Thu Sep 9 13:58:01 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 9 Sep 2010 13:58:01 -0400 Subject: [Biopython-dev] [Bug 3096] PPBuilder build_peptides bugs In-Reply-To: Message-ID: <201009091758.o89Hw1i8025811@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3096 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2010-09-09 13:58 EST ------- (In reply to comment #6) > Hi Peter, > > I tested out the code (on the script directly, not using git) and it works > fine. Excellent - thank you. > I only have minor concerns that the additional input variable > "standard_aa_only" for _accept() method in class _PPBuilder might break > other codes that assumes it still has two instead of three input variables. True, but I think that is a low risk and it is intended as a private API. It could be made an optional argument I suppose. > Also within the same script there are three different naming and default value > for the same flag (standard amino acid): > > 1. named "standard" with default False in is_aa() method > 2. named "aa_only" with default 1 in build_peptides() method of class > _PPBuilder > 3. named "standard_aa_only" with no default value in _accept() method of class > _PPBuilder > > Which is again minor. We can change the new argument ("standard_aa_only") added to _accept() without breaking backwards compatibility. I was trying to make it explicit - would you prefer "standard" instead? We both agreed that "aa_only" is very misleading. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
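The "optional argument" idea mentioned in comment #7 can be sketched as below. This is a toy stand-in, not the Bio.PDB code: the function accept(), the two residue-name sets and the default value of standard_aa_only are all assumptions chosen for illustration. The point is only that giving the new flag a default keeps existing calls without the flag working.

```python
# Illustrative, truncated residue-name sets (assumptions, not Bio.PDB data).
STANDARD_AA = {"ALA", "GLY", "SER", "MET"}
MODIFIED_AA = {"MSE"}  # e.g. selenomethionine


def accept(resname, standard_aa_only=False):
    """Return True if the residue name should be kept in a peptide.

    The keyword default means callers written before the flag existed
    (passing only the residue) continue to work unchanged.
    """
    if resname in STANDARD_AA:
        return True
    if standard_aa_only:
        return False
    return resname in MODIFIED_AA
```

For example, accept("MSE") keeps a modified residue, while accept("MSE", standard_aa_only=True) rejects it.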
From bartek at rezolwenta.eu.org Thu Sep 9 14:51:07 2010 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Thu, 9 Sep 2010 20:51:07 +0200 Subject: [Biopython-dev] Python 3 status (ignoring our C code and most dependencies) In-Reply-To: References: Message-ID: On Thu, Sep 9, 2010 at 7:08 PM, Peter wrote: > Hi Bartek, Eric, et al, > > Hi, > I've rerun the test suite on the trunk code, and we have the > following issues, most of which I'd already noted in this thread: > http://lists.open-bio.org/pipermail/biopython-dev/2010-July/008079.html > > Bartek - I was seeing a couple of issues with Bio.Motif which came > down to relative import issues, this seems to have fixed things. > Could you confirm this change looks OK to you? > > http://github.com/biopython/biopython/commit/4700d1be7afffe5e06b41df6ee8cc19e68a9a6c1 > > I have to confess that my knowledge about python 3 is almost non-existent, so I was not following the messages related to the migration too closely. As far as I can tell, the changes in the imports are practically purely syntactic. I don't see possibilities of serious code breakage (the names which have changed in the Bio.Motif namespaces were introduced recently and I think any code relying on having Parsers.AlignAce.read inside Bio.Motif namespace should be reviewed anyway). Thanks for making the changes (and finishing my removal of Bio.MEME and Bio.AlignAce - I keep forgetting about those pesky files in the main biopython dir...) 
cheers Bartek From bugzilla-daemon at portal.open-bio.org Fri Sep 10 05:40:40 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 10 Sep 2010 05:40:40 -0400 Subject: [Biopython-dev] [Bug 3096] PPBuilder build_peptides bugs In-Reply-To: Message-ID: <201009100940.o8A9eev4019621@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3096 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2010-09-10 05:40 EST ------- Fix cherry-picked from that branch and committed: http://github.com/biopython/biopython/commit/544e4855e219cfbce813a50fa183683a7b0e4b3e I've also added you as a contributor (let me know if you want your email address included in the CONTRIB file, or would prefer not to be named): http://github.com/biopython/biopython/commit/993d58eb8e49a32d6821471421050720b88bfeeb Marking bug as fixed. Thank you :) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From eric.talevich at gmail.com Fri Sep 10 12:30:56 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 10 Sep 2010 12:30:56 -0400 Subject: [Biopython-dev] Python 3 status (ignoring our C code and most dependencies) In-Reply-To: References: Message-ID: On Thu, Sep 9, 2010 at 1:08 PM, Peter wrote: > Eric - Can you reproduce this test_Phylo.py failure on your machine? > And is there any chance you'll be able to look at the Bio.PDB issue > with DisorderedResidue? I'll try to give these a shot this weekend: > ------------------------------------------------------------------------ > test_PDB ... 
FAIL > > TypeError: 'DisorderedResidue' object is not subscriptable > > See: > http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008159.html I took an initial look at it and was baffled. 2to3 doesn't seem to do anything that would affect it, and the relevant part of the code is an interesting tangle of if-else clauses related to the state of something non-local. So, this will take some careful stepping-through. Does anyone else have any hints on why this might be happening? > ------------------------------------------------------------------------ > test_Phylo ... FAIL > > Traceback (most recent call last): > File "test_Phylo.py", line 47, in test_convert > Phylo.convert(self.mem_file, 'nexus', mem_file_2, 'phyloxml') > File "/home/xxx/repositories/biopython/Bio/Phylo/_io.py", line 102, in convert > return write(trees, out_file, out_format, **kwargs) > File "/home/xxx/repositories/biopython/Bio/Phylo/_io.py", line 92, in write > n = getattr(supported_formats[format], 'write')(trees, file, **kwargs) > File "/home/xxx/repositories/biopython/Bio/Phylo/PhyloXMLIO.py", > line 148, in write > return Writer(obj).write(file, encoding=encoding, indent=indent) > File "/home/xxx/repositories/biopython/Bio/Phylo/PhyloXMLIO.py", > line 684, in write > self._tree.write(file, encoding) > File "/home/xxx/lib/python3.1/xml/etree/ElementTree.py", line 659, in write > self._write(file, self._root, encoding, {}) > File "/home/xxx/lib/python3.1/xml/etree/ElementTree.py", line 677, in _write > file.write(_encode("<" + tag, encoding)) > TypeError: string argument expected, got 'bytes' > Neat. I added this test shortly before the Biopython 1.55 release, and I guess it's doing its job. It might have something to do with the 'encoding' argument triggering some string/byte incompatibility in ElementTree; I'll check it out.
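A minimal sketch of the suspected string/bytes mismatch, using only the standard library and a made-up one-element tree (nothing here is taken from Bio.Phylo). It assumes Python 3.2 or later, where ElementTree accepts encoding="unicode"; on Python 3, writing with a byte encoding such as "utf-8" produces bytes, so the handle has to be binary, which a StringIO is not.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for the tree that the PhyloXML writer would serialise.
tree = ET.ElementTree(ET.Element("phyloxml"))

# A byte encoding yields bytes, so pair it with a binary handle.
binary_handle = io.BytesIO()
tree.write(binary_handle, encoding="utf-8")

# For a text handle such as StringIO, ask for str output instead.
text_handle = io.StringIO()
tree.write(text_handle, encoding="unicode")
```

If this is indeed the cause, the test could either switch its in-memory file to BytesIO or have the writer choose the encoding according to the kind of handle it is given.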
-Eric From biopython at maubp.freeserve.co.uk Mon Sep 13 06:29:32 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 13 Sep 2010 11:29:32 +0100 Subject: [Biopython-dev] SwissProt parser: Feature ID kept in description string Message-ID: Hi Michiel et al, I'm looking at the SwissProt plain text parser (for Bug 2235, making SeqFeature objects in SeqIO for "swiss" format), and noticed something that puzzles me in the new parser in Bio/SwissProt/__init__.py: The parser spots /FTId= entries and extracts the feature ID, which is good, but leaves this string in the description string, which I find odd. Essentially I'd like to change this bit:

    if line[29:35]==r"/FTId=":
        ft_id = line[35:70].rstrip()[:-1]
    else:
        ft_id =""

to:

    if line[29:35]==r"/FTId=":
        ft_id = line[35:70].rstrip()[:-1]
        description = ""
    else:
        ft_id =""

What do you think? Peter From mjldehoon at yahoo.com Mon Sep 13 09:01:44 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Mon, 13 Sep 2010 06:01:44 -0700 (PDT) Subject: [Biopython-dev] SwissProt parser: Feature ID kept in description string In-Reply-To: Message-ID: <185662.21952.qm@web62401.mail.re1.yahoo.com> I have no objections. --Michiel. --- On Mon, 9/13/10, Peter wrote: > From: Peter > Subject: SwissProt parser: Feature ID kept in description string > To: "Biopython-Dev Mailing List" , "Michiel de Hoon" > Date: Monday, September 13, 2010, 6:29 AM > Hi Michiel et al, > > I'm looking at the SwissProt plain text parser (for Bug > 2235, making > SeqFeature objects in SeqIO for "swiss" format), and > noticed something > that puzzles me in the new parser in > Bio/SwissProt/__init__.py: > > The parser spots /FTId= entries and extracts the feature > ID, which > is good, but leaves this string in the description string, > which I find > odd. Essentially I'd like to change this bit:
>
>     if line[29:35]==r"/FTId=":
>         ft_id = line[35:70].rstrip()[:-1]
>     else:
>         ft_id =""
>
> to:
>
>     if line[29:35]==r"/FTId=":
>         ft_id = line[35:70].rstrip()[:-1]
>         description = ""
>     else:
>         ft_id =""
>
> What do you think? > > Peter > From biopython at maubp.freeserve.co.uk Mon Sep 13 10:01:13 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 13 Sep 2010 15:01:13 +0100 Subject: [Biopython-dev] SwissProt parser: Feature ID kept in description string In-Reply-To: <185662.21952.qm@web62401.mail.re1.yahoo.com> References: <185662.21952.qm@web62401.mail.re1.yahoo.com> Message-ID: On Mon, Sep 13, 2010 at 2:01 PM, Michiel de Hoon wrote: > I have no objections. > --Michiel. Great - done here: http://github.com/biopython/biopython/tree/3f600a8197a96856c5b7977e2bc140a8c6a6f7c8 and unit test updated here: http://github.com/biopython/biopython/tree/5ab01c52cdd789133b35dccbd20896a7d342a2f5 Peter From biopython at maubp.freeserve.co.uk Mon Sep 13 13:47:23 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 13 Sep 2010 18:47:23 +0100 Subject: [Biopython-dev] New: Uniprot XML parser In-Reply-To: References: <3b674cf220d52226cf9b2e189598fe61.squirrel@lipid.biocomp.unibo.it> <8ec7479153894f66ea029abd059e06c5.squirrel@lipid.biocomp.unibo.it> Message-ID: Hi Andrea, I've done some work on the plain text swiss parser to handle features, and some basic testing to make sure it agrees with the uniprot-xml parser. This showed some problems with end locations out by one in the XML parser which I believe I was able to resolve. I have also commented out the use of the skip_parsing_errors option - it doesn't seem to be needed and silent errors are bad. I have (for the moment) introduced a couple of new position classes in Bio.SeqFeature for "?123" where we have a position but it is uncertain, and "?" where we don't have a position at all. The latter might be handled more elegantly by inferring a Before/AfterPosition instead...
Note that for testing purposes, I have disabled your code where it builds a SeqFeature for a dbReference - I'm not sure what the best plan here is yet. Could you have a look at my branch please? http://github.com/peterjc/biopython/commits/uniprot Thanks, Peter From updates at feedmyinbox.com Tue Sep 14 03:12:43 2010 From: updates at feedmyinbox.com (Feed My Inbox) Date: Tue, 14 Sep 2010 03:12:43 -0400 Subject: [Biopython-dev] 9/14 biopython Questions - BioStar Message-ID: <804dd35f521a1792fdf62a62806ac334@74.63.51.88> // How to show more BLAST results using biopython? // September 13, 2010 at 12:58 PM http://biostar.stackexchange.com/questions/2462/how-to-show-more-blast-results-using-biopython Hi, I'm using biopython to BLAST over the internet. However, it only saves 30 results (there are more than 30 results that are under the e-value I chose) in the xml. I've been looking all over but can't find how to make that number higher. So my question is, how can you show more results from BLAST using biopython. I'm using NCBIWWW.qblast from BIO.BLAST. from Bio.Blast import NCBIWWW File = "MIF" fasta_string = open(File+".fasta").read() result_handle = NCBIWWW.qblast("blastp", "nr", fasta_string) Thanks, Niek -- Website: http://biostar.stackexchange.com/questions/tagged/biopython Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/444424/f8ec200ea7b1a33442ee9d28a3d1365a23421b9a/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. 230 Franklin Road Suite 814 Franklin, TN 37064 From tiagoantao at gmail.com Tue Sep 14 08:25:45 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 14 Sep 2010 13:25:45 +0100 Subject: [Biopython-dev] ubuntu Message-ID: Hi, Just a comment following from an email in the user list from Bartek. 
Should we nag people at Ubuntu/Debian to upgrade from 1.53 to something newer? See if they need help of some kind or such?
I could volunteer to go and check what is happening and try to pull things a bit forward...

--
"If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa

From bartek at rezolwenta.eu.org Tue Sep 14 09:22:00 2010
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Tue, 14 Sep 2010 15:22:00 +0200
Subject: [Biopython-dev] ubuntu
In-Reply-To:
References:
Message-ID:

2010/9/14 Tiago Antão

> Hi,

Hi Tiago,

> Just a comment following from an email in the user list from Bartek.
> Should we nag people at Ubuntu/Debian to upgrade from 1.53 to
> something newer? See if they need help of some kind or such?
> I could volunteer to go and check what is happening and try to pull
> things a bit forward...

I'm not an expert on this, but as far as I know, this package is pulled more or less directly from Debian and maintained by the Debian-med team (see http://packages.debian.org/testing/python-biopython , especially the links to maintainers on the right).

The delay comes from the fact that every six months, after making a release, ubuntu takes the biopython version from debian testing and puts it into the line for the next release in six months. This gives you effectively at least 6 months delay between the ubuntu version and the current biopython trunk. Lately, biopython makes at least one release (sometimes two) every six months which means that the delay will be at least one release number (more likely two, or more if somebody is not upgrading their ubuntu every 6 months).

As far as I can tell, the guys at debian-med have the process of package release fairly automated, but there are two delays:

- the delay in picking up new releases from biopython into debian testing. Currently this is 1.54, they haven't picked up the 1.55 yet, which means about 1 month of delay
- the delay of ubuntu releasing policy (currently, the 1.54 is scheduled to be in 10.10, we can expect that 1.55 will make it into 11.04, by which time there will probably be biopython 1.56)

There is also the ubuntu-backports system, which includes newer packages back-ported to older releases, and it includes biopython, but this only includes the packages already released for newer ubuntu versions.

In summary, we might try to minimize the first delay by trying to synchronize a bit with the ubuntu release cycle (I don't think we should be totally dependent on their schedule, but it might be good to remember that if we don't release in March or September, we will miss more than one ubuntu release) and ask the debian-med team how we can make sure that the new release will make it into debian-testing as fast as possible.

cheers
B

> --
>

Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1, 69012 Heidelberg, Germany
tel: +49 6221 387 8433

From andrea at biocomp.unibo.it Tue Sep 14 12:22:06 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Tue, 14 Sep 2010 18:22:06 +0200 (CEST)
Subject: [Biopython-dev] New: Uniprot XML parser
In-Reply-To:
References: <3b674cf220d52226cf9b2e189598fe61.squirrel@lipid.biocomp.unibo.it> <8ec7479153894f66ea029abd059e06c5.squirrel@lipid.biocomp.unibo.it>
Message-ID:

Hi Peter,

I've commented on your commits directly on github, basically agreeing with them.

Parsing PDB structures as positional features was done to capture all the information in the uniprot file. I do not see any better place than a SeqFeature for positional information; the only other option here is to skip it.

I saw in your repository you are using the string "uniprot-xml" to call the parser, however the format name at the EBI REST and SOAP services is simply "uniprotxml".
take a look at: http://www.ebi.ac.uk/Tools/webservices/services/dbfetch_rest I think it is better to be conservative in this. I'm still working on the SeqIO.index to make a faster implementation. RE are really slow, and ElementTree should cope well with this task. Anyhow it works with the current implementation, so it's not a big deal. Andrea > Hi Andrea, > > I've done some work on the plain text swiss parser to handle features, > and some basic testing to make sure it agrees with the uniprot-xml > parser. This showed some problems with end locations out by one > in the XML parser which I believe I was able to resolve. I have also > commented out the use of the skip_parsing_errors option - it doesn't > seem to be needed and silent errors are bad. > > I have (for the moment) introduced a couple of new position classes > in Bio.SeqFeature for "?123" where we have a position but it is > uncertain, and "?" where we don't have a position at all. The later > might be handled more elegantly by inferring a Before/AfterPosition > instead... > > Note that for testing purposes, I have disabled your code where > it builds a SeqFeature for a dbReference - I'm not sure what the > best plan here is yet. > > Could you have a look at my branch please? > > http://github.com/peterjc/biopython/commits/uniprot > > Thanks, > > Peter > From biopython at maubp.freeserve.co.uk Tue Sep 14 13:58:32 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 14 Sep 2010 18:58:32 +0100 Subject: [Biopython-dev] New: Uniprot XML parser In-Reply-To: References: <3b674cf220d52226cf9b2e189598fe61.squirrel@lipid.biocomp.unibo.it> <8ec7479153894f66ea029abd059e06c5.squirrel@lipid.biocomp.unibo.it> Message-ID: On Tue, Sep 14, 2010 at 5:22 PM, Andrea Pierleoni wrote: > > Hi Peter, > I've commented your commits directly on github, basically agreeing with > them. Thanks. > Parsing PDB structures as positional features was done to capture all the > information in the uniprot file. 
I do not see any better place than a > SeqFeature for a positional information, the only option here is to skip it. We could put the DB cross reference into the dbxrefs list, but that only captures a tiny part of the data. We could also put it in the annotations, but that loses the benefits of the position information. Maybe using a SeqFeature is the best plan... > I saw in your repository you are using the string "uniprot-xml" to call > the parser, however the format name at the EBI REST and SOAP services > is simply "uniprotxml". take a look at: > > http://www.ebi.ac.uk/Tools/webservices/services/dbfetch_rest > > I think it is better to be conservative in this. On the other hand, "uniprot-xml" fits well with the idea of "format-variant". Whatever we go with will have downsides. > I'm still working on the SeqIO.index to make a faster implementation. RE > are really slow, and ElementTree should cope well with this task. > Anyhow it works with the current implementation, so it's not a big deal. I don't know enough about ElementTree to help right now, sorry. Peter From biopython at maubp.freeserve.co.uk Tue Sep 14 17:59:29 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 14 Sep 2010 22:59:29 +0100 Subject: [Biopython-dev] Fwd: [Utilities-announce] PubMed E-Utility 2011 DTD updates - please read! In-Reply-To: References: Message-ID: Hi all, It looks like there are two more DTD files (available now) to add to Biopython for the Bio.Entrez parser. Peter ---------- Forwarded message ---------- From: Date: Tue, Sep 14, 2010 at 9:24 PM Subject: [Utilities-announce] PubMed E-Utility 2011 DTD updates - please read! To: NLM/NCBI List utilities-announce Dear NCBI PubMed E-Utility Users, We anticipate updating the PubMed E-Utility DTDs for 2011 in mid-December, approximately on December 13, 2010. 
The forthcoming DTDs are available from:

http://eutils.ncbi.nlm.nih.gov/corehtml/query/DTD/pubmed_110101.dtd
http://eutils.ncbi.nlm.nih.gov/corehtml/query/DTD/nlmmedlinecitationset_110101.dtd

1. DTD AND XML CHANGES FOR 2011

1. Changes to NLMMedlineCitationSet DTD AND PubMed XML

The DTD changes for the 2011 production year are itemized in the Revision Notes section near the top of the DTD. The following describes the substantive changes to NLMMedlineCitationSet dtd and PubMed XML:

1. Accommodating Structured Abstracts

Two new attributes, Label and NlmCategory, are added to the AbstractText element which is used with both the Abstract and OtherAbstract elements. A valid label name found in published structured abstracts (e.g., Introduction, Goals, Study Design, Findings, Discussion) will be identified in the XML as an Abstract Text Label and each "parent" concept to which the published Label name is mapped at NLM will be identified as an Abstract Text NlmCategory. Five NLM-assigned mapped-to categories are possible: Background, Objective, Methods, Results, and Conclusions. In general, the lack of Label and NlmCategory attributes in AbstractText means the published abstract is unstructured. Note that the content of structured abstracts will be exported in separate segments that need to be joined for display of the complete abstract text.

DTD:

In the following example, the published label names are INTRODUCTION; AIMS; DESIGN, SETTING AND PARTICIPANTS; RESULTS; and DISCUSSION which correspondingly map to the five NLM-assigned categories.

Sample XML:

1. Implementing Protocol Class 2 Supplementary Concept Record (SCR) and Rare Disease Class 3 SCR terms

A new element, SupplMeshList, is added to the MedlineCitation element and another new element, SupplMeshName with its attribute Type, is added to SupplMeshList.

DTD:

Sample XML: disease term protocol term

1. Separating MeSH Geographic Descriptor Names from other MeSH Descriptors

The Type attribute is added to the DescriptorName element.

DTD:

Sample XML: New York

1. Accommodating Versioned Articles; Corresponding Change to PMID

There is a new model of publishing referred to as "versioning" whereby multiple versions of the same online article are released, sometimes in quick succession and sometimes almost as soon as the original article has been published. Beginning in the 2011 production year, NLM will create an individual citation for each article's version and link the versions via new attributes for the MedlineCitation and PMID elements.

The new attributes for the MedlineCitation element are VersionID and VersionDate.

DTD:

The new attribute for the PMID element is Version.

DTD:

Sample XML: 20029669

Search and display implementation in PubMed is under consideration at this time; all implementation decisions will be documented in a forthcoming NLM Technical Bulletin article. Details are:

- The PMID combined with its version attribute value (e.g., 1, 2, 3) becomes the citation's new unique identifier, represented as 12345678.
- The PMID Version attribute value "1" will be assigned to all existing records at the time the 2011 baseline files are produced and exported.
- The PubDate value on citations of versions of the same article will be identical. The MedlineCitation Version Date and VersionID attribute values supplied by the publisher will identify the specific version.
- If a citation is not for PMID Version 1, it must contain MedlineCitation Version Date and VersionID attribute values, and the original publication date for Version 1 as the PubDate.
- A PMID Version attribute value higher than 1 indicates that there is a citation for at least one prior version (although it might happen, rarely, that a prior version subsequently gets deleted). Although the MedlineCitation VersionDate value may be different from the PubDate, it might be the same as PubDate if the new version was released later the same day.
- In the future, when non-PubMed Central journals are included, the publisher-supplied VersionID value will be whatever the publisher decides it to be; e.g. the 2nd publisher-supplied VersionID may be "b" or "2b" and the PMID Version attribute value assigned by NLM will still be 2.

1. Eliminating Pre-defined Source Attribute Values for NameID Element

The 2010 DTD specifies the values that may be used for the Source attribute for the NameID element. Please note that the NameID element has not yet been used; it is expected to be implemented at some point during 2011.

DTD:

Sample XML:

1. Simplifying Author Element Structure

The NameID element has been repositioned in the Author element to simplify the DTD structure. The XML is not affected.

DTD:

1. Accommodating Identification of Machine-generated Keywords.

A new valid value, NLM-AUTO, is added to the Owner attribute of the element KeywordList.

DTD:

Sample XML:

1. ENHANCED CHARACTER SET

A subset of UTF-8 characters is currently supported for PubMed data. PubMed data now supports the full UTF-8 Character Set.

Exceptions: All instances that represent a Double Quote will be translated to the straight double quote (Unicode 0022). All instances that represent a Single Quote (including the prime and apostrophe) will be translated to the straight single quote (Unicode 0027). Em Dash, En Dash, Hyphen, or Minus will be translated to the single dash (Unicode 002D). Those three Unicode values are part of the current Character Set.
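
For client code that compares PubMed text against text from other sources, the exceptions listed above amount to a small character mapping. Below is a minimal Python sketch of that idea. The source code points in the table (curly quotes, prime, en and em dash, and so on) are assumptions chosen for illustration, since the announcement names only the three target characters, and normalize_pubmed_punctuation is a hypothetical helper, not part of Bio.Entrez or the E-utilities.

```python
# Illustrative only: the source code points listed here are assumptions.
# The announcement specifies just the three targets U+0022, U+0027, U+002D.
_PUNCTUATION_MAP = {}

# Curly double quotes -> straight double quote (U+0022)
for _ch in "\u201c\u201d":
    _PUNCTUATION_MAP[ord(_ch)] = "\u0022"

# Curly single quotes and prime -> straight single quote (U+0027)
for _ch in "\u2018\u2019\u2032":
    _PUNCTUATION_MAP[ord(_ch)] = "\u0027"

# Em dash, en dash, Unicode hyphen, minus sign -> hyphen-minus (U+002D)
for _ch in "\u2014\u2013\u2010\u2212":
    _PUNCTUATION_MAP[ord(_ch)] = "\u002d"


def normalize_pubmed_punctuation(text):
    """Return text with quote-like and dash-like characters made ASCII."""
    return text.translate(_PUNCTUATION_MAP)


if __name__ == "__main__":
    print(normalize_pubmed_punctuation("\u201cp53\u201d \u2013 5\u2032 end"))
```

Applying the same mapping to a local search string before comparing it with abstract text would avoid mismatches caused only by typographic punctuation.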
_______________________________________________
Utilities-announce mailing list
http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce

From andrea at biocomp.unibo.it Wed Sep 15 08:25:04 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Wed, 15 Sep 2010 14:25:04 +0200 (CEST)
Subject: [Biopython-dev] New: Uniprot XML parser
In-Reply-To:
References: <3b674cf220d52226cf9b2e189598fe61.squirrel@lipid.biocomp.unibo.it> <8ec7479153894f66ea029abd059e06c5.squirrel@lipid.biocomp.unibo.it>
Message-ID: <38e57130d02951c066b39ca420488d70.squirrel@lipid.biocomp.unibo.it>

> We could put the DB cross reference into the dbxrefs list, but that only
> captures a tiny part of the data. We could also put it in the annotations,
> but that loses the benefits of the position information. Maybe using a
> SeqFeature is the best plan...

It is to me.

> On the other hand, "uniprot-xml" fits well with the idea of
> "format-variant".
> Whatever we go with will have downsides.

Well, I suppose we have to choose one. uniprot-xml is fine for me.

>> I'm still working on the SeqIO.index to make a faster implementation. RE
>> are really slow, and ElementTree should cope well with this task.
>> Anyhow it works with the current implementation, so it's not a big deal.
>
> I don't know enough about ElementTree to help right now, sorry.

Well, I tried to reimplement the _index UniprotDict function using ElementTree, but it looks like this cannot be done that way. To iterate over an XML file, ElementTree uses the iterparse function, which is able to capture start and end events for every tag. Unfortunately this event "capture" is not aligned with the parsing process, meaning that when a start event is raised the parser could be up to 16K ahead in reading the file, and the actual position is variable. See http://mail.python.org/pipermail/xml-sig/2005-January/010838.html
Thus I cannot pick up the start position of the tag in the file.
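
The offset problem described above does not arise when the file is scanned line by line, because tell() can be read before each line is consumed. Below is a rough Python sketch of that idea. It assumes that every <entry start tag and every <accession> element sits on its own line, which should be checked against real UniProt XML files, and index_entry_offsets is a hypothetical helper, not the Bio.SeqIO implementation.

```python
import io


def index_entry_offsets(handle):
    """Map the first accession of each entry to the offset of its <entry line.

    The handle must be opened in binary mode so that tell() returns byte
    offsets. Assumes one tag per line (an assumption, see the lead-in).
    """
    offsets = {}
    entry_start = None
    while True:
        position = handle.tell()  # offset of the line about to be read
        line = handle.readline()
        if not line:
            break
        stripped = line.strip()
        if stripped.startswith(b"<entry"):
            entry_start = position
        elif entry_start is not None and stripped.startswith(b"<accession>"):
            accession = stripped[len(b"<accession>"):].split(b"<", 1)[0]
            offsets[accession.decode("ascii")] = entry_start
            entry_start = None  # only the first accession is used as the key
    return offsets


if __name__ == "__main__":
    sample = (
        b"<uniprot>\n"
        b'<entry dataset="Swiss-Prot">\n'
        b"<accession>P12345</accession>\n"
        b"<accession>Q99999</accession>\n"
        b"</entry>\n"
        b"<entry>\n"
        b"<accession>P67890</accession>\n"
        b"</entry>\n"
        b"</uniprot>\n"
    )
    print(index_entry_offsets(io.BytesIO(sample)))
```

A record can then be re-read by seeking to the stored offset and collecting lines up to the closing </entry> tag.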
The only way I found to make it work is going line by line, like you did in your implementation. We can use that one.

Andrea

From updates at feedmyinbox.com Thu Sep 16 03:12:25 2010
From: updates at feedmyinbox.com (Feed My Inbox)
Date: Thu, 16 Sep 2010 03:12:25 -0400
Subject: [Biopython-dev] 9/16 biopython Questions - BioStar
Message-ID: <4a5e91a12d7f379e8caea7bca2ce04ce@74.63.51.88>

// How can I automate BLAST of >1 sequence and output the top hits for each input?
// September 15, 2010 at 12:02 PM

http://biostar.stackexchange.com/questions/2488/how-can-i-automate-blast-of-1-sequence-and-output-the-top-hits-for-each-input

I'm looking for a way to run BLAST on more than one sequence at a time. The input is several FASTA files, or one file containing several sequences in FASTA format. For initial testing, I want to obtain the top 10 BLAST results for each input sequence and write them to a text file (one text file for each input sequence, or one big file containing all). But currently, I'm drawing a blank. Is there a way to do this with (preferably) biopython?

From biopython at maubp.freeserve.co.uk Sat Sep 18 08:25:11 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 18 Sep 2010 13:25:11 +0100
Subject: [Biopython-dev] Moving Bio.Motif documentation into Tutorial.tex
Message-ID:

Hi Bartek,

I think it would be good to try and move your Bio.Motif documentation from file Docs/cookbook/motif/motif.tex into the main Docs/Tutorial.tex as a new chapter.
Currently it isn't obvious that Biopython supports things like a Position Weight Matrix (PWM). What do you think? The text will need a slight update since we have now deprecated and removed Bio.AlignAce and Bio.MEME, but that should be easy. Thanks, Peter From barwil at gmail.com Sat Sep 18 09:04:37 2010 From: barwil at gmail.com (Bartek Wilczynski) Date: Sat, 18 Sep 2010 15:04:37 +0200 Subject: [Biopython-dev] Moving Bio.Motif documentation into Tutorial.tex In-Reply-To: References: Message-ID: Hi, On Sat, Sep 18, 2010 at 2:25 PM, Peter wrote: > Hi Bartek, > > I think it would be good to try and move your Bio.Motif > documentation from file Docs/cookbook/motif/motif.tex > into the main Docs/Tutorial.tex as a new chapter. > Currently it isn't obvious that Biopython supports > things like a Position Weight Matrix (PWM). > > What do you think? > > The text will need a slight update since we have now > deprecated and removed Bio.AlignAce and Bio.MEME, > but that should be easy. > In general, I'm all for it. It's just that right now is not necessarily the best time for me to put much work into it. I'm trying to meet a RECOMB deadline of Oct. 8th with a paper, so if it would not be a problem, I could update it to the current state of the API after that. On the other hand, if there's anybody who wants to do it before then, I can review the changes even earlier. thanks for remembering about it. 
Bartek From bugzilla-daemon at portal.open-bio.org Sat Sep 18 12:26:00 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 18 Sep 2010 12:26:00 -0400 Subject: [Biopython-dev] [Bug 3010] Bio.KDTree is leaking memory In-Reply-To: Message-ID: <201009181626.o8IGQ0vk025229@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3010 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-09-18 12:25 EST ------- Created an attachment (id=1542) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1542&action=view) C program to show the memory leak C example written by Bartosz Telenczuk at the "Python and Friends" workshop http://pythonfriends.blogspot.com The idea of this was to determine if the memory leak was via the Python binding, or in the KDTree C code - this showed it was in the C code. We could then run this in valgrind which reveals the nature of the memory leak which scales directly with the number of coordinates (and a very big clue on how to fix this - commit to follow soon). ==5438== 16,000 bytes in 1 blocks are definitely lost in loss record 1 of 1 ==5438== at 0x4025016: realloc (vg_replace_malloc.c:525) ==5438== by 0x8048F77: KDTree_add_point (in /home/peterjc/repositories/biopython/Bio/KDTree/test) ==5438== by 0x80495A3: KDTree_set_data (in /home/peterjc/repositories/biopython/Bio/KDTree/test) ==5438== by 0x8048618: main (in /home/peterjc/repositories/biopython/Bio/KDTree/test) ==5438== (this was with 2000 points, one iteration) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sat Sep 18 12:35:02 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 18 Sep 2010 12:35:02 -0400 Subject: [Biopython-dev] [Bug 3010] Bio.KDTree is leaking memory In-Reply-To: Message-ID: <201009181635.o8IGZ2b2025417@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3010 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2010-09-18 12:35 EST ------- Bug fix was to free tree->_data_point_list in KDTree_destroy, committed here: http://github.com/biopython/biopython/commit/285c085d9f0476eaefcc1049e36285f2d3b9b54f Unit tests look good - marking as fixed. P.S. To compile this C test on Linux, put it in the Bio/KDTree directory and: gcc test_kdtree.c KDTree.c -lm -o test -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From eric.talevich at gmail.com Sat Sep 18 15:48:56 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 18 Sep 2010 15:48:56 -0400 Subject: [Biopython-dev] Python 3 status (ignoring our C code and most dependencies) In-Reply-To: References: Message-ID: > On Thu, Sep 9, 2010 at 1:08 PM, Peter wrote: >> Eric - Can you reproduce this test_Phylo.py failure on your machine? [...] >> ------------------------------------------------------------------------ >> test_Phylo ... FAIL >> >> Traceback (most recent call last): >> ?File "test_Phylo.py", line 47, in test_convert >> ? ?Phylo.convert(self.mem_file, 'nexus', mem_file_2, 'phyloxml') [...] >> ?File "/home/xxx/lib/python3.1/xml/etree/ElementTree.py", line 677, in _write >> ? 
?file.write(_encode("<" + tag, encoding)) >> TypeError: string argument expected, got 'bytes' >> I fixed this one: http://github.com/biopython/biopython/commit/9b5faa2d1affdf439acfbbec911cde0862111005 No progress on the DisorderedResidue error yet. -Eric From biopython at maubp.freeserve.co.uk Mon Sep 20 06:09:54 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 20 Sep 2010 11:09:54 +0100 Subject: [Biopython-dev] Parsing efetch results from the Journals database through Bio.Entrez In-Reply-To: <488872.91962.qm@web62406.mail.re1.yahoo.com> References: <488872.91962.qm@web62406.mail.re1.yahoo.com> Message-ID: On Fri, Sep 3, 2010 at 6:26 PM, Michiel de Hoon wrote: > Hi everybody, > > The parser in Bio.Entrez can parse any XML returned by the Entrez E-utilities > as long as the corresponding DTD is available ... > The advantage of removing these hacks is that it will allow us to validate all > XML against the DTD, and to raise an error (if the user requests so) if any > elements are found in the XML that don't validate against the DTD. Hi Michiel, All the tests look fine but there is a deprecation warning from your new exception classes (I'm running it with Python 2.6 on the Mac): $ python test_Entrez.py Test error handling when presented with Fasta non-XML data ... /Library/Python/2.6/site-packages/Bio/Entrez/Parser.py:114: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6 self.message = message ok Test error handling when presented with GenBank non-XML data ... ok Test parsing XML returned by EFetch, Nucleotide database (first test) ... ok Test parsing XML returned by EFetch, Protein database ... ok Test parsing XML returned by EFetch, OMIM database ... ok Test parsing XML returned by EFetch, PubMed database (first test) ... ok Test parsing XML returned by EFetch, PubMed database (second test) ... ok Test parsing XML returned by EFetch, Taxonomy database ... ok Test parsing XML output returned by EGQuery (first test) ... 
ok
Test parsing XML output returned by EGQuery (second test) ... ok
Test if corrupted XML is handled correctly ... /Library/Python/2.6/site-packages/Bio/Entrez/Parser.py:121: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  self.message = message
ok

Peter

From mjldehoon at yahoo.com Mon Sep 20 06:41:33 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 20 Sep 2010 03:41:33 -0700 (PDT)
Subject: [Biopython-dev] Parsing efetch results from the Journals database through Bio.Entrez
In-Reply-To:
Message-ID: <954716.31147.qm@web62402.mail.re1.yahoo.com>

Thanks for letting me know.
Could you try with the latest version in the Python 2.6 series? This DeprecationWarning seems to be a bug in Python itself. I haven't seen this deprecation warning with either Python 2.6.5 or Python 2.7.

--Michiel.

From biopython at maubp.freeserve.co.uk Mon Sep 20 07:17:54 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 20 Sep 2010 12:17:54 +0100
Subject: [Biopython-dev] Parsing efetch results from the Journals database through Bio.Entrez
In-Reply-To: <954716.31147.qm@web62402.mail.re1.yahoo.com>
References: <954716.31147.qm@web62402.mail.re1.yahoo.com>
Message-ID:

On Mon, Sep 20, 2010 at 11:41 AM, Michiel de Hoon wrote:
> Thanks for letting me know.
> Could you try with the latest version in the Python 2.6 series? This
> DeprecationWarning seems to be a bug in Python itself. I haven't seen
> this deprecation warning with either Python 2.6.5 or Python 2.7.
>

Using the Apple provided Python 2.6 on Mac OS X 10.6.4 "Snow Leopard",

$ python2.6
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class X(ValueError):
...     def __init__(self, message):
...         self.message = message
...
>>> raise X("Test")
__main__:3: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
__main__.X
>>>

On one of our Linux machines,

$ python2.6 -Wall
Python 2.6.6 (r266:84292, Aug 31 2010, 16:21:14)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class X(ValueError):
...     def __init__(self, message):
...         self.message = message
...
>>> raise X("Test")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
__main__.X
>>>

I did a little Google searching, and according to PEP 352 the BaseException message attribute, which was introduced in Python 2.5, was deprecated in Python 2.6:
http://www.python.org/dev/peps/pep-0352/

However, the initial implementation of the deprecation warning triggered false positives (as in your code on Python 2.6.1), and this was fixed later in the Python 2.6.x series: http://bugs.python.org/issue6844

There is a simple workaround - avoid using message as the attribute name. For example we could use msg, or simply a private attribute like _message instead:

$ python2.6
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class X(ValueError):
...     def __init__(self, message):
...         self.msg = message
...
>>> raise X("Test")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
__main__.X

Peter

From biopython at maubp.freeserve.co.uk Mon Sep 20 07:20:17 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 20 Sep 2010 12:20:17 +0100
Subject: [Biopython-dev] Parsing efetch results from the Journals database through Bio.Entrez
In-Reply-To:
References: <954716.31147.qm@web62402.mail.re1.yahoo.com>
Message-ID:

On Mon, Sep 20, 2010 at 12:17 PM, Peter wrote:
> However, the initial implementation of the deprecation warning triggered
> false positives (as in your code on Python 2.6.1), and this was fixed later
> in the Python 2.6.x series: http://bugs.python.org/issue6844

The commit for http://bugs.python.org/issue6844 changed the Python NEWS file, so I can see now this was fixed in Python 2.6.3
http://svn.python.org/view?view=rev&revision=74848

Peter

From biopython at maubp.freeserve.co.uk Mon Sep 20 07:36:16 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 20 Sep 2010 12:36:16 +0100
Subject: [Biopython-dev] Python 3 status (ignoring our C code and most dependencies)
In-Reply-To:
References:
Message-ID:

On Sat, Sep 18, 2010 at 8:48 PM, Eric Talevich wrote:
>
> I fixed this one:
> http://github.com/biopython/biopython/commit/9b5faa2d1affdf439acfbbec911cde0862111005
>

Well, almost ;)
http://github.com/biopython/biopython/commit/5fca22f919eeb9c3161519818be9779f203d0a38

Works for me now :)

Peter

From mjldehoon at yahoo.com Mon Sep 20 09:05:56 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 20 Sep 2010 06:05:56 -0700 (PDT)
Subject: [Biopython-dev] Parsing efetch results from the Journals database through Bio.Entrez
In-Reply-To:
Message-ID: <964724.85299.qm@web62406.mail.re1.yahoo.com>

OK, msg it is.

--Michiel.
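
The workaround agreed on in this thread can be sketched as follows. The class name IllustrativeParserError is hypothetical and is not the actual exception class in Bio/Entrez/Parser.py; the only point is that the text is stored under msg rather than message, which sidesteps the false-positive DeprecationWarning seen on early Python 2.6.x releases.

```python
class IllustrativeParserError(ValueError):
    """Hypothetical exception that keeps its text in .msg, not .message."""

    def __init__(self, message):
        # Pass the text to the base class so that e.args is populated too.
        ValueError.__init__(self, message)
        # Using "msg" avoids the BaseException.message attribute that
        # PEP 352 deprecated in Python 2.6.
        self.msg = message

    def __str__(self):
        return self.msg


if __name__ == "__main__":
    try:
        raise IllustrativeParserError("Test")
    except IllustrativeParserError as err:
        print(err.msg)
```

Defining __str__ is optional here, since the base class already receives the text, but it makes the displayed message independent of how args is set.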
From tiagoantao at gmail.com Tue Sep 28 07:41:16 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 28 Sep 2010 12:41:16 +0100
Subject: [Biopython-dev] Continuous integration
Message-ID:

Hi,

I've been playing with buildbot a bit (for continuous integration stuff). I am creating a page on the wiki with some info on that front. This is just concept/exploratory stuff: if people don't like it, it is just a matter of deleting the page. Hopefully this will at least let us see whether continuous integration is worth the effort and whether buildbot is a good platform for Biopython.

Any comments most welcome. I expect to have a working prototype very soon. If people don't like it, I just trash it (no problems with that).

Tiago

--
"If you want to get laid, go to college. If you want an education, go to the library."
- Frank Zappa

From krother at rubor.de Tue Sep 28 10:04:06 2010
From: krother at rubor.de (Kristian Rother)
Date: Tue, 28 Sep 2010 16:04:06 +0200
Subject: [Biopython-dev] Report from the Python & Friends workshop
Message-ID: <1f14b30493e399d4252107455f03563e-EhVcX1xCQgFaRwICBxEAXR0wfgFLV15YQUBGAEFfUC9ZUFgWXVpyH1RXX0FaLkQAV19fSFpTXA4=-webmailer1@server08.webmailer.hosteurope.de>

Hi,

I'd like to share a report from a recent bioinformatics Python meeting (basically because we did Biopython work there). Sorry for crossposting.

At the Python & Friends Workshop 2010 in Karpacz, 20 Python bioinformaticians spent three busy days programming and learning about the newest technologies. The Polish & German ISCB Student Council members brought together an equal number of people from both countries. Some highlights of the workshop were:

* A keynote on Biopython and subsequent tutorial were given by Peter Cock. During the session, the brand new GSOC code branch for analyzing PDB structures was tested. We found the center of mass, coarse-grained models, and renumbering features working well, but the hydrogenation and SS-bond calculation require further testing.
* The K-means clustering algorithm implemented in numpy was explained in detail, and an intense numpy tutorial prepared by Bartosz Telenczuk.
* A live demonstration on how to search & cluster ligands with the Schroedinger software package was given by Ewa Bielska.
* The Galaxy tool integration workbench was a focus of interest for many participants, and Sebastian Schultheiss gave a hands-on tutorial. A closer interaction of the Galaxy platform with Biopython would probably be useful for many people.
* In a hackathon session, a memory leak in the KDTree code of Biopython was fixed.
* And of course, there was a hike into the mountains near the Czech border.

The workshop took place in an informal setting bringing together groups with a variety of backgrounds.
According to participants, it made sense to find out what tools are there, and to plan strategic issues like py3k migration.

The workshop has been organized by Teresa Szczepinska (ISCB Student Council, regional group Poland), Sebastian Schultheiss (ISCB Student Council, regional group Germany), Stefan Günther (Freiburg) and Kristian Rother (Poznan).

Best Regards,
Kristian

From krother at rubor.de Tue Sep 28 10:04:49 2010
From: krother at rubor.de (Kristian Rother)
Date: Tue, 28 Sep 2010 16:04:49 +0200
Subject: [Biopython-dev] RNA Alphabet with modified nucleotides
Message-ID: 

Hi,

finally got some more tests and cleanup done on Bio.RNA.RNASeq. The current branch http://github.com/krother/biopython/commits/rna_alphabet contains:

- doctests
- unit tests
- copyright statements
- added RNASeq object to test_Seq_objs.py
- compatibility to the Seq class (there is a caveat though: functions like complement/translate/reverse_complement don't make sense on sequences containing modified nucleotides, and result in an exception.)

Also, there is a branch with ongoing bugfix work on the PDB GSOC code, but I need to correspond with Joao Rodrigues on this.

Best Regards,
Kristian

On 2010-07-01 15:26, Peter wrote:
> On Thu, Jul 1, 2010 at 2:01 PM, Kristian Rother wrote:
>> Hi,
>>
>> I've committed code + tests for representing RNA sequences with modified
>> nucleotides to a branch on Github. See:
>>
>> http://github.com/krother/biopython/commits/rna_alphabet
>>
>> I'm done with my list of 'most wanted' features for this class, including
>> suggestions from Peter.
>> What could I do next to help integrating the new code with the rest of
>> Biopython?
> Hi Kristian,
>
> I haven't had a play with the code, just a very brief look at it.
>
> You'll need to add licence and copyright statements.
>
> A few embedded doctests in the docstrings would be very nice
> to help explain how the new classes are to be used.
>
> What happens if you add some of the new DNA seq objects
> to test_Seq_objs.py? Is it all fine?
>
> Are you planning to add a reverse complement method etc? Or
> does the current fall back on the Seq implementation work OK?
>
> Peter
>

From bugzilla-daemon at portal.open-bio.org Tue Sep 28 14:47:39 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 28 Sep 2010 14:47:39 -0400
Subject: [Biopython-dev] [Bug 3139] New: python setup.py test ends with error code 0 even on failure
Message-ID: 

http://bugzilla.open-bio.org/show_bug.cgi?id=3139

           Summary: python setup.py test ends with error code 0 even on failure
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: tiagoantao at gmail.com

As per the subject: python setup.py test ends with error code zero even if there are failed tests. This is problematic for tools evaluating the outcome of tests (e.g. integration testing)

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Sep 28 17:36:34 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 28 Sep 2010 17:36:34 -0400
Subject: [Biopython-dev] [Bug 3139] python setup.py test ends with error code 0 even on failure
In-Reply-To: 
Message-ID: <201009282136.o8SLaYA0013308@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3139

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-09-28 17:36 EST -------
Hi Tiago,

I removed a sys.exit(...) call from run_tests.py because it was annoying if running the tests from IDLE.
With hindsight that was short sighted of me and should be reverted pending a better solution - see: http://github.com/biopython/biopython/commit/7def689001f1f457d754703d0f233f82b33c9074 Furthermore the TestRunner class method run should return a value to be used as the return code. Since 0 is success, and 2 is already used for bad args, I suggest 1 for at least one test failure. If we have "python run_tests.py" returning useful error codes, then it is easier (but not essential) to do the same for "python setup.py test". Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Sep 29 15:36:10 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 29 Sep 2010 15:36:10 -0400 Subject: [Biopython-dev] [Bug 3139] python setup.py test ends with error code 0 even on failure In-Reply-To: Message-ID: <201009291936.o8TJaAg7028406@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3139 ------- Comment #2 from tiagoantao at gmail.com 2010-09-29 15:36 EST ------- > Furthermore the TestRunner class method run should return a value to be used > as the return code. Since 0 is success, and 2 is already used for bad args, > I suggest 1 for at least one test failure. > > If we have "python run_tests.py" returning useful error codes, then it is > easier (but not essential) to do the same for "python setup.py test". For now I will hack away my test scripts with buildbot. But if people like integration testing, we might have to revisit this to have a more clean system... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Wed Sep 1 11:06:11 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 1 Sep 2010 12:06:11 +0100 Subject: [Biopython-dev] IMGT parser (modified EMBL format), In-Reply-To: References: Message-ID: On Tue, Aug 24, 2010 at 3:35 PM, Uri Laserson wrote: > Hi all, > > I would obviously prefer it to go into the distribution as soon as it is > possible, but I don't want to mess with the releases. ?The IMGT people said > they'll put a news announcement on their site and a link to biopython once > the code is in the official release. > > Uri I've checked this into the master branch now, so this will be in the next release (Biopython 1.56, probably in October/November 2010). http://github.com/biopython/biopython/commit/6d1e144e1054231162ce57cee5ca8c37921ada41 Thank you! Peter From bugzilla-daemon at portal.open-bio.org Wed Sep 1 11:06:55 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 1 Sep 2010 07:06:55 -0400 Subject: [Biopython-dev] [Bug 3069] Support for EMBL-like files from IMGT In-Reply-To: Message-ID: <201009011106.o81B6tq0020803@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3069 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #24 from biopython-bugzilla at maubp.freeserve.co.uk 2010-09-01 07:06 EST ------- Code from branch checked in, http://github.com/biopython/biopython/commit/6d1e144e1054231162ce57cee5ca8c37921ada41 Thanks! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Wed Sep 1 11:15:17 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 1 Sep 2010 12:15:17 +0100 Subject: [Biopython-dev] Deprecated code to remove for Biopython 1.56 Message-ID: Hi all, The following were deprecated in Biopython 1.52 (released 22 September 2009), so I think they can be removed ready for Biopython 1.56 now: Bio.EZRetrieve, Bio.NetCatch, Bio.File.SGMLHandle, Bio.FilteredReader Bio.AlignAce and Bio.MEME We also still need to look at Bio.Translate, Bio.Transcribe and linked modules: http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008210.html Peter From biopython at maubp.freeserve.co.uk Wed Sep 1 15:38:31 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 1 Sep 2010 16:38:31 +0100 Subject: [Biopython-dev] Bio.utils, Bio.PropertyManager, Bio.Encodings.IUPACEncoding, etc Message-ID: On Wed, Sep 1, 2010 at 12:15 PM, Peter wrote: > Hi all, > > The following were deprecated in Biopython 1.52 (released 22 September 2009), > so I think they can be removed ready for Biopython 1.56 now: > > Bio.EZRetrieve, Bio.NetCatch, Bio.File.SGMLHandle, Bio.FilteredReader > Bio.AlignAce and Bio.MEME > > We also still need to look at Bio.Translate, Bio.Transcribe and linked modules: > http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008210.html Bio.Translate and Bio.Transcribe were deprecated in Biopython 1.51, over a year ago now. The deprecated modules Bio.Translate and Bio.Transcribe are imported by Bio.Encodings.IUPACEncoding, so trying to import that triggers deprecation warnings. Effectively this means we can claim Bio.Encodings.IUPACEncoding has already been deprecated (and this is the only code under Bio.Encodings). Now, as far as I can tell, the point of IUPACEncoding is to attach properties to the IUPAC alphabets using Bio.PropertyManager (which I assume predates the inclusion of properties into the Python language). 
The only place this was used (I think) was in Bio.utils, e.g. something like this:

>>> from Bio.utils import total_weight
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet.IUPAC import IUPACAmbiguousDNA
>>> total_weight(Seq("ACGT", IUPACAmbiguousDNA()))
(deprecation warnings ignored)
1355.0

Since this triggers deprecation warnings from Bio.Translate and Bio.Transcribe, we could argue that this function in Bio.utils (and all similar ones using Bio.PropertyManager) has effectively been labelled as deprecated for some time now.

Unfortunately this example would fail on Biopython 1.55 (which we didn't spot since Bio.utils has no unit tests) due to this apparently harmless change,
http://github.com/biopython/biopython/commit/8a08d553d367b9aa1c7f730f967bc11e1fca7a6e

It may have looked like a pointless import, but it had the (undocumented) side effect of attaching properties like weight and translation tables to the IUPAC alphabet objects. I have just reverted this:
http://github.com/biopython/biopython/commit/f9efb5e5ae5c58096addd398b7d50f1400d82ccc

For Biopython 1.55 we explicitly declared Bio.utils, Bio.PropertyManager, and Bio.Encodings as obsolete.

So, what do we do now? I'd like to declare them deprecated in Biopython 1.56, and remove them (and Bio.Translate and Bio.Transcribe) in Biopython 1.57. This is a quicker removal than usual, but I'd argue anyone using these modules would have already been getting deprecation warnings about Bio.Translate and Bio.Transcribe anyway. [I suppose we could just remove it all now - but some explicit warning seems safer!]

Comments?

I think the only useful bit of functionality (which wasn't documented, nor had unit tests) was to calculate the molecular weight of sequences. That could be added under Bio.SeqUtils I think.
Peter From biopython at maubp.freeserve.co.uk Wed Sep 1 15:52:48 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 1 Sep 2010 16:52:48 +0100 Subject: [Biopython-dev] Deprecated code to remove for Biopython 1.56 In-Reply-To: References: Message-ID: On Wed, Sep 1, 2010 at 12:15 PM, Peter wrote: > Hi all, > > The following were deprecated in Biopython 1.52 (released 22 September 2009), > so I think they can be removed ready for Biopython 1.56 now: > > Bio.EZRetrieve, Bio.NetCatch, Bio.File.SGMLHandle, Bio.FilteredReader > Done: http://github.com/biopython/biopython/commit/1097994fa2557cbee14a23c5354b643f312f0b07 > > Bio.AlignAce and Bio.MEME > Bartek - could you handle removing Bio.AlignAce and Bio.MEME please (assuming you agree that makes sense)? > We also still need to look at Bio.Translate, Bio.Transcribe and linked modules: > http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008210.html See this thread: http://lists.open-bio.org/pipermail/biopython-dev/2010-September/008231.html Peter From barwil at gmail.com Wed Sep 1 15:55:41 2010 From: barwil at gmail.com (Bartek Wilczynski) Date: Wed, 1 Sep 2010 17:55:41 +0200 Subject: [Biopython-dev] Deprecated code to remove for Biopython 1.56 In-Reply-To: References: Message-ID: On Wed, Sep 1, 2010 at 5:52 PM, Peter wrote: > > > > Bio.AlignAce and Bio.MEME > > > > Bartek - could you handle removing Bio.AlignAce and Bio.MEME please > (assuming > you agree that makes sense)? > > Absolutely. It makes perfect sense. I'll do it tomorrow. 
Bartek -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From bugzilla-daemon at portal.open-bio.org Fri Sep 3 13:49:03 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 3 Sep 2010 09:49:03 -0400 Subject: [Biopython-dev] [Bug 3135] New: Wrong instance length bug in MEME parser Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3135 Summary: Wrong instance length bug in MEME parser Product: Biopython Version: 1.55 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: johnbaronreid at netscape.net The MEME parser in biopython 1.55 seems to incorrectly set the length of the first instance of a motif to 0. Here is an example: #Sequence, start, length, site Motif: E-value: 0.000010 seq_3, 213, 0, AGGTGACAGAG seq_1, 146, 11, AGGTGACAGAG seq_0, 490, 11, AGGTGACAGAG seq_0, 83, 11, AGGTGACAGAG seq_0, 388, 11, AGGAAACAGAG seq_1, 422, 11, AGGGGACAGAG seq_1, 79, 11, TGGAGACAGAG seq_0, 281, 11, TGGGGACAGAG seq_0, 16, 11, TAGAGACAGAG seq_1, 228, 11, TTGTGACAGAG seq_4, 156, 11, AGGGGACAGGG seq_0, 348, 11, AGGAAAGAGAA seq_0, 374, 11, AGGAATGAGAG seq_5, 22, 11, GGGAAACTGAG seq_3, 486, 11, AAGGGAGTGAG Here's the code that generated the above: from Bio.Motif.Parsers.MEME import MEMEParser import cStringIO meme_output = cStringIO.StringIO(""" ******************************************************************************** MEME - Motif discovery tool ******************************************************************************** MEME version 4.3.0 (Release date: Sat Sep 26 01:51:56 PDT 2009) For further information on how to interpret these results or to get a copy of the MEME software please access http://meme.nbcr.net. 
This file may be used as input to the MAST algorithm for searching sequence databases for matches to groups of motifs. MAST is available for interactive use and downloading at http://meme.nbcr.net. ******************************************************************************** ******************************************************************************** REFERENCE ******************************************************************************** If you use this program in your research, please cite: Timothy L. Bailey and Charles Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers", Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994. ******************************************************************************** ******************************************************************************** TRAINING SET ******************************************************************************** DATAFILE= /home/john/Data/Tompa-data-set/Real/hm22r.fasta ALPHABET= ACGT Sequence name Weight Length Sequence name Weight Length ------------- ------ ------ ------------- ------ ------ seq_0 1.0000 500 seq_1 1.0000 500 seq_2 1.0000 500 seq_3 1.0000 500 seq_4 1.0000 500 seq_5 1.0000 500 ******************************************************************************** ******************************************************************************** COMMAND LINE SUMMARY ******************************************************************************** This information can also be useful in the event you wish to report a problem with the MEME software. 
command: meme /home/john/Data/Tompa-data-set/Real/hm22r.fasta -maxsize 1000000 -oc output/run_dataset/Tompa/hm22r/Real -dna -mod anr -revcomp -print_starts -maxiter 1000 -minw 8 -maxw 20 -minsites 2 -nmotifs 1 model: mod= anr nmotifs= 1 evt= inf object function= E-value of product of p-values width: minw= 8 maxw= 20 minic= 0.00 width: wg= 11 ws= 1 endgaps= yes nsites: minsites= 2 maxsites= 30 wnsites= 0.8 theta: prob= 1 spmap= uni spfuzz= 0.5 global: substring= yes branching= no wbranch= no em: prior= dirichlet b= 0.01 maxiter= 1000 distance= 1e-05 data: n= 3000 N= 6 strands: + - sample: seed= 0 seqfrac= 1 Letter frequencies in dataset: A 0.195 C 0.305 G 0.305 T 0.195 Background letter frequencies (from dataset with add-one prior applied): A 0.195 C 0.305 G 0.305 T 0.195 ******************************************************************************** ******************************************************************************** MOTIF 1 width = 11 sites = 15 llr = 159 E-value = 9.8e-006 ******************************************************************************** -------------------------------------------------------------------------------- Motif 1 Description -------------------------------------------------------------------------------- Simplified A 71:439:9:91 pos.-specific C ::::::8:::: probability G 18a37:2:a19 matrix T 31:3:1:1::: bits 2.4 2.1 * 1.9 * * * 1.6 * * *** Relative 1.4 * * **** Entropy 1.2 * * * **** (15.3 bits) 0.9 *** ******* 0.7 *********** 0.5 *********** 0.2 *********** 0.0 ----------- Multilevel AGGAGACAGAG consensus T TA G sequence G -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Strand Start P-value Site ------------- ------ ----- --------- ----------- seq_3 - 213 4.54e-07 GGCCTTTGGA 
AGGTGACAGAG GCGCGGCCAC seq_1 - 146 4.54e-07 CCCAACAGGA AGGTGACAGAG GTGGCTCTGG seq_0 + 490 4.54e-07 AAAACAGCAG AGGTGACAGAG seq_0 - 83 4.54e-07 CCCAGCAGGA AGGTGACAGAG GTGGCTCTGG seq_0 + 388 5.99e-07 ATGAGAGGAG AGGAAACAGAG CTTCCTGGAC seq_1 + 422 1.10e-06 ATGAGAGGGG AGGGGACAGAG GACACCTGAA seq_1 + 79 1.33e-06 TTGGTGGTAC TGGAGACAGAG GGCTGGTCCC seq_0 + 281 3.17e-06 CCTCCCCTGA TGGGGACAGAG GTCTCATCAG seq_0 + 16 5.72e-06 CTGGTGACAC TAGAGACAGAG GGCTGGTCCC seq_1 - 228 1.18e-05 TTATTTTCCT TTGTGACAGAG AAACCCAGCA seq_4 + 156 2.07e-05 TCAAGTCCCA AGGGGACAGGG AGCAGAAGGG seq_0 + 348 2.47e-05 GTAGACAGAA AGGAAAGAGAA AGTAAGGACA seq_0 + 374 3.14e-05 GGACAAAGGT AGGAATGAGAG GAGAGGAAAC seq_5 - 22 4.53e-05 CTCTTGTGTA GGGAAACTGAG CACGGGGAAC seq_3 + 486 5.02e-05 CGCCAATGGG AAGGGAGTGAG TGCC -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- seq_3 5e-05 212_[-1]_262_[+1]_4 seq_1 1.2e-05 78_[+1]_56_[-1]_71_[-1]_183_[+1]_68 seq_0 3.2e-06 15_[+1]_56_[-1]_187_[+1]_56_[+1]_ 15_[+1]_3_[+1]_91_[+1] seq_4 2.1e-05 155_[+1]_334 seq_5 4.5e-05 21_[-1]_468 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 1 width=11 seqs=15 seq_3 ( 213) AGGTGACAGAG 1 seq_1 ( 146) AGGTGACAGAG 1 seq_0 ( 490) AGGTGACAGAG 1 seq_0 ( 83) AGGTGACAGAG 1 seq_0 ( 388) AGGAAACAGAG 1 seq_1 ( 422) AGGGGACAGAG 1 seq_1 ( 79) TGGAGACAGAG 1 seq_0 ( 281) TGGGGACAGAG 1 seq_0 ( 16) TAGAGACAGAG 1 seq_1 ( 228) TTGTGACAGAG 1 seq_4 ( 156) AGGGGACAGGG 1 seq_0 ( 348) AGGAAAGAGAA 1 seq_0 ( 374) AGGAATGAGAG 1 seq_5 ( 22) 
GGGAAACTGAG 1 seq_3 ( 486) AAGGGAGTGAG 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 4 w= 11 n= 2940 bayes= 6.7534 E= 9.8e-006 177 -1055 -219 45 -55 -1055 139 -155 -1055 -1055 171 -1055 103 -1055 -19 77 45 -1055 127 -1055 226 -1055 -1055 -155 -1055 139 -61 -1055 215 -1055 -1055 -55 -1055 -1055 171 -1055 226 -1055 -219 -1055 -155 -1055 161 -1055 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 11 nsites= 15 E= 9.8e-006 0.666667 0.000000 0.066667 0.266667 0.133333 0.000000 0.800000 0.066667 0.000000 0.000000 1.000000 0.000000 0.400000 0.000000 0.266667 0.333333 0.266667 0.000000 0.733333 0.000000 0.933333 0.000000 0.000000 0.066667 0.000000 0.800000 0.200000 0.000000 0.866667 0.000000 0.000000 0.133333 0.000000 0.000000 1.000000 0.000000 0.933333 0.000000 0.066667 0.000000 0.066667 0.000000 0.933333 0.000000 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 regular expression -------------------------------------------------------------------------------- [AT]GG[ATG][GA]A[CG]AGAG -------------------------------------------------------------------------------- Time 3.78 secs. 
******************************************************************************** ******************************************************************************** SUMMARY OF MOTIFS ******************************************************************************** -------------------------------------------------------------------------------- Combined block diagrams: non-overlapping sites with p-value < 0.0001 -------------------------------------------------------------------------------- SEQUENCE NAME COMBINED P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- seq_0 4.45e-04 15_[+1(5.72e-06)]_56_[-1(4.54e-07)]_187_[+1(3.17e-06)]_56_[+1(2.47e-05)]_15_[+1(3.14e-05)]_3_[+1(5.99e-07)]_91_[+1(4.54e-07)] seq_1 4.45e-04 78_[+1(1.33e-06)]_56_[-1(4.54e-07)]_71_[-1(1.18e-05)]_183_[+1(1.10e-06)]_68 seq_2 2.03e-01 500 seq_3 4.45e-04 212_[-1(4.54e-07)]_262_[+1(5.02e-05)]_4 seq_4 2.01e-02 155_[+1(2.07e-05)]_334 seq_5 4.34e-02 21_[-1(4.53e-05)]_468 -------------------------------------------------------------------------------- ******************************************************************************** ******************************************************************************** Stopped because nmotifs = 1 reached. ******************************************************************************** CPU: john-dell ******************************************************************************** """) parser = MEMEParser() parsed = parser.parse(meme_output) print '#Sequence, start, length, site' for motif in parsed.motifs: print 'Motif: E-value: %f' % motif.evalue for instance in motif.instances: print "%10s, %5d, %5d, %s" % ( instance.sequence_name, instance.start, instance.length, str(instance), ) #assert instance.length == motif.length -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
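The commented-out assertion at the end of the report (instance.length == motif.length) suggests a simple regression check. A minimal sketch of the same idea, independent of Biopython, reads the site lines from the "BLOCKS format" section of the MEME output and compares each site's length with the motif width. The helper names parse_block_sites and mismatched_sites are hypothetical, and the regular expression is an assumption based only on the sample output quoted above.

```python
import re

# Matches BLOCKS-format site lines such as "seq_3 ( 213) AGGTGACAGAG 1".
# Assumption: name, start in parentheses, site letters, then a weight column.
BLOCK_LINE = re.compile(r"^(\S+)\s+\(\s*(\d+)\)\s+([A-Za-z]+)\s+\d+$")


def parse_block_sites(lines):
    """Return (sequence_name, start, length, site) for each BLOCKS site line."""
    sites = []
    for line in lines:
        match = BLOCK_LINE.match(line.strip())
        if match:
            name = match.group(1)
            start = int(match.group(2))
            site = match.group(3)
            sites.append((name, start, len(site), site))
    return sites


def mismatched_sites(sites, width):
    """Return the sites whose length differs from the declared motif width."""
    return [site for site in sites if site[2] != width]


# Lines copied from the report; the header and terminator are skipped
# because they do not match the site pattern.
example = [
    "BL MOTIF 1 width=11 seqs=15",
    "seq_3 ( 213) AGGTGACAGAG 1",
    "seq_1 ( 146) AGGTGACAGAG 1",
    "//",
]
example_sites = parse_block_sites(example)
```

With the width of 11 declared in the motif header, mismatched_sites(example_sites, 11) comes back empty, whereas the first instance in the buggy parser output above (length 0) would have been flagged by an equivalent check.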
From bugzilla-daemon at portal.open-bio.org Fri Sep 3 13:49:22 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 3 Sep 2010 09:49:22 -0400 Subject: [Biopython-dev] [Bug 3135] Wrong instance length bug in MEME parser In-Reply-To: Message-ID: <201009031349.o83DnMxp002567@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3135 johnbaronreid at netscape.net changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |johnbaronreid at netscape.net -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Sep 3 14:29:02 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 3 Sep 2010 10:29:02 -0400 Subject: [Biopython-dev] [Bug 3135] Wrong instance length bug in MEME parser In-Reply-To: Message-ID: <201009031429.o83ET2vO005042@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3135 barwil at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Sep 3 14:53:17 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 3 Sep 2010 10:53:17 -0400 Subject: [Biopython-dev] [Bug 3135] Wrong instance length bug in MEME parser In-Reply-To: Message-ID: <201009031453.o83ErHXg006375@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3135 barwil at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #1 from barwil at gmail.com 2010-09-03 10:53 EST ------- Small bug, fixed in the master branch. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Fri Sep 3 17:17:24 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 3 Sep 2010 10:17:24 -0700 (PDT) Subject: [Biopython-dev] DeprecationWarnings Message-ID: <712888.65830.qm@web62408.mail.re1.yahoo.com> Hi everybody, In Python 2.7, DeprecationWarnings are silenced by default; they are shown if Python is started with the -Wd option. Most of our users won't use the -Wd option and therefore won't see any of the DeprecationWarnings. I suggest that we replace all DeprecationWarnings in Biopython with a Biopython-specific warning, perhaps implemented in Bio/__init__.py, to make sure that users actually see these warnings when they occur. At the same time, we can add DeprecationWarnings to all functions marked as obsolete, since most users won't see these DeprecationWarnings anyway (assuming they are using Python 2.7 or later). This allows us to check if any software depends on obsolete code by using the -Wd option when starting Python. Any objections? --Michiel. 
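The proposal above can be sketched as follows. The idea is that a warning class deriving from Warning, rather than from DeprecationWarning, is not caught by the filter that hides DeprecationWarning. The class name BioDeprecationWarning is only a placeholder (no name is fixed in this message), and old_style, new_style and visible_categories are hypothetical helpers for the demonstration.

```python
import warnings


class BioDeprecationWarning(Warning):
    """Placeholder name for a Biopython-specific deprecation warning.

    It deliberately does not subclass DeprecationWarning, so a filter
    that ignores DeprecationWarning does not hide it.
    """


def old_style():
    """Hypothetical function warning with the built-in category."""
    warnings.warn("old_style is deprecated", DeprecationWarning, stacklevel=2)


def new_style():
    """Hypothetical function warning with the project-specific category."""
    warnings.warn("new_style is deprecated", BioDeprecationWarning, stacklevel=2)


def visible_categories(func):
    """Call func and return the warning categories that get through.

    The filters imitate the situation described above: DeprecationWarning
    is ignored, everything else is shown.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Inserted in front of the "always" rule, so it takes precedence.
        warnings.simplefilter("ignore", DeprecationWarning)
        func()
    return [w.category for w in caught]
```

Under these filters the call through old_style produces nothing visible, while new_style still reports its warning. Users who want silence can still add their own filter for the new class.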
From mjldehoon at yahoo.com Fri Sep 3 17:26:51 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 3 Sep 2010 10:26:51 -0700 (PDT) Subject: [Biopython-dev] Parsing efetch results from the Journals database through Bio.Entrez Message-ID: <488872.91962.qm@web62406.mail.re1.yahoo.com> Hi everybody, The parser in Bio.Entrez can parse any XML returned by the Entrez E-utilities as long as the corresponding DTD is available (which are included with each release of Biopython). One corner case is efetch results from the Journals database. Officially, efetch from the Journals database does not generate output in the XML format, but only plain text or HTML. However, when requesting XML explicitly from Entrez, in practice it does return an XML-like output. Our parser in Bio.Entrez is able to parse this XML, but it requires several hacks in the parser code. As probably few users are interested in efetch output from the Journals database, I suggest that we remove these hacks from Bio.Entrez altogether -- after all, this is for XML that is not supported by NCBI to begin with. If there are some users that really want to parse efetch output from the Journals database, we can always add a simple parser for plain-text efetch output. The advantage of removing these hacks is that it will allow us to validate all XML against the DTD, and to raise an error (if the user requests so) if any elements are found in the XML that don't validate against the DTD. Any objections? --Michiel. 
From biopython at maubp.freeserve.co.uk Fri Sep 3 17:28:32 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 3 Sep 2010 18:28:32 +0100
Subject: [Biopython-dev] DeprecationWarnings
In-Reply-To: <712888.65830.qm@web62408.mail.re1.yahoo.com>
References: <712888.65830.qm@web62408.mail.re1.yahoo.com>
Message-ID: 

On Fri, Sep 3, 2010 at 6:17 PM, Michiel de Hoon wrote:
> Hi everybody,
>
> In Python 2.7, DeprecationWarnings are silenced by default; they are shown
> if Python is started with the -Wd option. Most of our users won't use the -Wd
> option and therefore won't see any of the DeprecationWarnings.

I remember you pointed this out last week, I agree this will be a problem.
http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008207.html

> I suggest that
> we replace all DeprecationWarnings in Biopython with a Biopython-specific
> warning, perhaps implemented in Bio/__init__.py, to make sure that users
> actually see these warnings when they occur.

Sounds good, maybe Bio.BioDeprecationWarning would do as a name?

> At the same time, we can add DeprecationWarnings to all functions
> marked as obsolete, since most users won't see these DeprecationWarnings
> anyway (assuming they are using Python 2.7 or later).  This allows us to
> check if any software depends on obsolete code by using the -Wd option
> when starting Python.
>
> Any objections?

Yes. That would be a misuse of DeprecationWarning, and would be very annoying for people running on Python 2.6 or older (which will probably be most users for the time being).
Instead we can use the built in PendingDeprecationWarning (which is silent by default): http://lists.open-bio.org/pipermail/biopython-dev/2009-September/006762.html Regards, Peter From biopython at maubp.freeserve.co.uk Fri Sep 3 17:31:09 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 3 Sep 2010 18:31:09 +0100 Subject: [Biopython-dev] Parsing efetch results from the Journals database through Bio.Entrez In-Reply-To: <488872.91962.qm@web62406.mail.re1.yahoo.com> References: <488872.91962.qm@web62406.mail.re1.yahoo.com> Message-ID: On Fri, Sep 3, 2010 at 6:26 PM, Michiel de Hoon wrote: > Hi everybody, > > The parser in Bio.Entrez can parse any XML returned by the Entrez E-utilities > as long as the corresponding DTD is available (which are included with each > release of Biopython). One corner case is efetch results from the Journals > database. Officially, efetch from the Journals database does not generate > output in the XML format, but only plain text or HTML. However, when > requesting XML explicitly from Entrez, in practice it does return an XML-like > output. Our parser in Bio.Entrez is able to parse this XML, but it requires > several hacks in the parser code. Out of interest, have you asked the NCBI about this undocumented XML output? > As probably few users are interested in efetch output from the Journals > database, I suggest that we remove these hacks from Bio.Entrez altogether > -- after all, this is for XML that is not supported by NCBI to begin with. If > there are some users that really want to parse efetch output from the > Journals database, we can always add a simple parser for plain-text > efetch output. > > The advantage of removing these hacks is that it will allow us to validate > all XML against the DTD, and to raise an error (if the user requests so) > if any elements are found in the XML that don't validate against the DTD. > > Any objections? 
Is it feasible to just put deprecation warnings in for Biopython 1.56, and then remove the hacks later? Peter From mjldehoon at yahoo.com Sat Sep 4 05:18:34 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 3 Sep 2010 22:18:34 -0700 (PDT) Subject: [Biopython-dev] Parsing efetch results from the Journals database through Bio.Entrez In-Reply-To: Message-ID: <695617.66055.qm@web62406.mail.re1.yahoo.com> --- On Fri, 9/3/10, Peter wrote: > Out of interest, have you asked the NCBI about this > undocumented XML output? Yes, several times. > Is it feasible to just put deprecation warnings in for > Biopython 1.56, and then remove the hacks later? Yes that is possible, but I don't think that we should go through the usual obsolete / deprecation / removal procedure, because this will take more than a year, and it's not worthwhile for a piece of code that probably nobody uses. In addition, once the hacks are removed, the parser will be able to validate each XML document it parses, which is to the benefit of all users and use cases. I don't think that we should postpone that by a year. --Michiel. From mjldehoon at yahoo.com Sat Sep 4 05:21:07 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 3 Sep 2010 22:21:07 -0700 (PDT) Subject: [Biopython-dev] DeprecationWarnings In-Reply-To: Message-ID: <160151.26509.qm@web62401.mail.re1.yahoo.com> --- On Fri, 9/3/10, Peter wrote: > > I suggest that > > we replace all DeprecationWarnings in Biopython with a > Biopython-specific > > warning, perhaps implemented in Bio/__init__.py, to > make sure that users > > actually see these warnings when they occur. > > Sounds good, maybe Bio.BioDeprecationWarning would do as a > name? I would prefer Bio.DeprecationWarning instead of Bio.BioDeprecationWarning. [for obsolete code:] > Instead we can use the built in > PendingDeprecationWarning (which is silent by default): OK, let's use that then for obsolete code. 
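The distinction under discussion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name BioDeprecationWarning is one of the names floated in this thread, not an existing Biopython API, and the two functions are made up for the example. The idea is that a class deriving directly from Warning is not caught by a default filter that ignores DeprecationWarning, while the built-in PendingDeprecationWarning stays silent unless the user asks for it (for example with -Wd).

```python
import warnings


class BioDeprecationWarning(Warning):
    """Hypothetical library-specific warning (name borrowed from this thread).

    It derives from Warning, not DeprecationWarning, so a default filter
    that ignores DeprecationWarning does not apply to it.
    """


def deprecated_function():
    # Deprecated code: use the library-specific class so users see the warning.
    warnings.warn("deprecated_function is deprecated",
                  BioDeprecationWarning, stacklevel=2)
    return "result"


def obsolete_function():
    # Obsolete (not yet deprecated) code: built-in class, silent by default.
    warnings.warn("obsolete_function is obsolete",
                  PendingDeprecationWarning, stacklevel=2)
    return "result"


def collect_warning_categories(func):
    """Call func with every warning enabled and return the categories seen."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func()
    return [w.category for w in caught]


if __name__ == "__main__":
    print(collect_warning_categories(deprecated_function))
    print(collect_warning_categories(obsolete_function))
```

The helper forces every warning on so that both categories can be inspected; in normal use the interpreter's default filters decide which of the two a user actually sees.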
If there are no other opinions, I'll make these changes next weekend. --Michiel. From biopython at maubp.freeserve.co.uk Sat Sep 4 12:03:28 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 4 Sep 2010 13:03:28 +0100 Subject: [Biopython-dev] DeprecationWarnings In-Reply-To: <160151.26509.qm@web62401.mail.re1.yahoo.com> References: <160151.26509.qm@web62401.mail.re1.yahoo.com> Message-ID: On Sat, Sep 4, 2010 at 6:21 AM, Michiel de Hoon wrote: > > On Fri, 9/3/10, Peter wrote: >> >> Sounds good, maybe Bio.BioDeprecationWarning would do as a >> name? > > I would prefer Bio.DeprecationWarning instead of Bio.BioDeprecationWarning. But then if we do "from Bio import DeprecationWarning" it would mask the built in DeprecationWarning - which could cause confusion? > [for obsolete code:] >> Instead we can use the built in >> PendingDeprecationWarning (which is silent by default): > > OK, let's use that then for obsolete code. OK Peter From mjldehoon at yahoo.com Sat Sep 4 12:29:29 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 4 Sep 2010 05:29:29 -0700 (PDT) Subject: [Biopython-dev] DeprecationWarnings In-Reply-To: Message-ID: <277645.62569.qm@web62403.mail.re1.yahoo.com> --- On Sat, 9/4/10, Peter wrote: > > I would prefer Bio.DeprecationWarning instead of > Bio.BioDeprecationWarning. > > But then if we do "from Bio import DeprecationWarning" it > would mask the built in DeprecationWarning - which could cause > confusion? OK, then how about Bio.BiopythonDeprecationWarning? --Michiel. From mjldehoon at yahoo.com Sat Sep 4 15:23:16 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 4 Sep 2010 08:23:16 -0700 (PDT) Subject: [Biopython-dev] Fwd: [blast-announce] Correction: BLAST 2.2.24 release announcement In-Reply-To: Message-ID: <438480.92769.qm@web62402.mail.re1.yahoo.com> Being able to convert Blast ASN.1 output into any of the other formats will make a big difference to us. 
If we had a parser for ASN.1 Blast output, then strictly speaking there is no reason to have a parser for any of the other formats (in practice, we can be more flexible of course). I looked some more into the Blast parser issues we discussed earlier (starting here: http://lists.open-bio.org/pipermail/biopython-dev/2010-May/007762.html). Unfortunately things are not as easy as I had hoped. Except for the new ASN.1 output format, none of the other output formats (plain text, XML, tabular) contain all of the output generated by the Blast run. Some results are only found in the XML, some only in the plain text output, and tabular output can contain all kinds of stuff depending on the exact options that were used. As a consequence, it's hard to design a generic Blast record class; having a specialized Record class for plain text, XML, and tabular seems more appropriate, and these record classes may not be fully consistent with each other (some elements may exist in one class but not in the other). Also, we cannot read in the Blast output in one format and write out the Blast output in a different format (at least not reliably). With the format converter in Blast 2.2.24, luckily there is no longer such a need for such converters in Biopython. If we had an ASN.1 parser, we could run Blast, save its output in ASN.1, load the Blast output into Python, filter the Blast output or otherwise modify it, write out the modified output in ASN.1 format, and then use the Blast 2.2.24 format converter to convert the modified output to plain text or some other format. That would be really useful. Unfortunately, making a parser for ASN.1 will not be so easy. As far as I know there isn't anything like expat or DOM for ASN.1 like we have for XML. Maybe this is something for a google summer of code? --Michiel. 
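The "run once, convert on demand" part of this workflow can be sketched as two command lines built in Python. This is a hedged illustration, not a Biopython wrapper: the helper name and the file names are made up, and it only covers running BLAST+ with archive output (-outfmt 11) and converting with blast_formatter; the parse, filter, and re-write step in the middle would still need the ASN.1 parser discussed above.

```python
def blast_archive_commands(query, database, archive="result.asn",
                           report="result.xml", report_format=5):
    """Return (search, convert) argument lists for a two-step BLAST+ run.

    Hypothetical helper: the default file names are placeholders.
    """
    # Step 1: run the search once, saving the BLAST archive (ASN.1) format.
    search = ["blastp", "-query", query, "-db", database,
              "-outfmt", "11", "-out", archive]
    # Step 2: convert the saved archive to another format, e.g. 5 for XML.
    convert = ["blast_formatter", "-archive", archive,
               "-outfmt", str(report_format), "-out", report]
    return search, convert


if __name__ == "__main__":
    search, convert = blast_archive_commands("query.fasta", "proteins_db")
    print(" ".join(search))
    print(" ".join(convert))
```

With BLAST+ installed, each list could be passed to subprocess.check_call; keeping the arguments as a list rather than a single string means no shell quoting is needed.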
--- On Tue, 8/24/10, Peter wrote:
> From: Peter
> Subject: [Biopython-dev] Fwd: [blast-announce] Correction: BLAST 2.2.24 release announcement
> To: "Biopython-Dev Mailing List"
> Date: Tuesday, August 24, 2010, 12:30 PM
>
> Hi all,
>
> The NCBI have just released a new version of BLAST+ (see below).
>
> I've just updated the existing BLAST+ application wrappers for the minor
> changes made in BLAST 2.2.24+.
>
> Something potentially quite useful in this release is the blast_formatter
> command for turning ASN.1 BLAST+ output (using -outfmt 11) into
> any of the other output formats. i.e. If you are not sure what output
> format will be most useful (e.g. plain text, XML, tabular) and rerunning
> the BLAST is slow, the NCBI now let you run the BLAST once and save
> it as ASN.1, then convert this to any other format on demand using
> blast_formatter (which should be fast).
>
> We should write a command line wrapper for this new tool...
>
> Peter
>
> ---------- Forwarded message ----------
> From: mcginnis
> Date: Tue, Aug 24, 2010 at 4:46 PM
> Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement
> To: NLM/NCBI List blast-announce
>
> A new version of the stand-alone applications is available.
>
> Users are encouraged to use the BLAST+ applications available at
> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
> This release includes a number of bug fixes as well as new features
> for the BLAST+ applications:
>
> * Introduce BLAST Archive format to permit reformatting of stand-alone
>   BLAST searches with the blast_formatter (see BLAST+ user manual)
> * Added the blast_formatter application (see BLAST+ user manual)
> * Added support for translated subject soft masking in the BLAST databases
> * Added support for the BLAST Trace-back operations (btop) output format
> * Added command line options to blastdbcmd for listing available BLAST databases
> * Improved performance of formatting of remote BLAST searches
> * Use a consistent exit code for out of memory conditions
> * Fixed bug in indexed megablast with multiple space-separated BLAST databases
> * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb
> * Fixed Windows installer for 64-bit installations
>
> BLAST+ applications, as well as the legacy C applications (e.g.
> blastall), may be downloaded from
> http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
From biopython at maubp.freeserve.co.uk Sat Sep 4 17:29:42 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 4 Sep 2010 18:29:42 +0100 Subject: [Biopython-dev] DeprecationWarnings In-Reply-To: <277645.62569.qm@web62403.mail.re1.yahoo.com> References: <277645.62569.qm@web62403.mail.re1.yahoo.com> Message-ID: On Sat, Sep 4, 2010 at 1:29 PM, Michiel de Hoon wrote: > --- On Sat, 9/4/10, Peter wrote: >> > I would prefer Bio.DeprecationWarning instead of >> Bio.BioDeprecationWarning. 
>> >> But then if we do "from Bio import DeprecationWarning" is >> would mask the built in DeprecationWarning - which could cause >> confusion? > > OK, then how about Bio.BiopythonDeprecationWarning? I like that too - longer but clear. Peter From anaryin at gmail.com Mon Sep 6 15:05:56 2010 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Mon, 6 Sep 2010 16:05:56 +0100 Subject: [Biopython-dev] GSOC Bio.PDB Project - Final Summary In-Reply-To: <20100816132716.GG23299@sobchak.mgh.harvard.edu> References: <20100816132716.GG23299@sobchak.mgh.harvard.edu> Message-ID: Hello Brad! I think most of it is ready to be integrated with the main branch. There are a couple of features like Hydrogenation that need a thorough look at them. Regarding compatibility, I think there is absolutely no problem at all. These are extensions based on previous code. The few completely new additions don't break anything as far as I know :) I will clean it up a little bit more and maybe then we can merge it. Best! And thanks! 
Jo?o From biopython at maubp.freeserve.co.uk Tue Sep 7 11:17:37 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 7 Sep 2010 12:17:37 +0100 Subject: [Biopython-dev] Fwd: [blast-announce] Correction: BLAST 2.2.24 release announcement In-Reply-To: <438480.92769.qm@web62402.mail.re1.yahoo.com> References: <438480.92769.qm@web62402.mail.re1.yahoo.com> Message-ID: On Sat, Sep 4, 2010 at 4:23 PM, Michiel de Hoon wrote: > Being able to convert Blast ASN.1 output into any of the other formats > will make a big difference to us. If we had a parser for ASN.1 Blast > output, then strictly speaking there is no reason to have a parser for > any of the other formats (in practice, we can be more flexible of course). Dave Messina made a good point on the BioPerl list that (depending on what data you are interested in) post-processing to generate the alignment views is a waste of CPU time: http://lists.open-bio.org/pipermail/bioperl-l/2010-August/033972.html Also, and we see this already with the XML output, the output file size is quite inflated - especially if all you need can be presented in one of the tabular forms which is smaller and quicker to parse. So yes, in principle a parser for ASN.1 Blast would be all we need, but in practice tabular/plaintext/XML BLAST parsers are still useful. > I looked some more into the Blast parser issues we discussed > earlier (starting here: > http://lists.open-bio.org/pipermail/biopython-dev/2010-May/007762.html > ). Unfortunately things are not as easy as I had hoped. Except > for the new ASN.1 output format, none of the other output > formats (plain text, XML, tabular) contain all of the output > generated by the Blast run. Some results are only found in > the XML, some only in the plain text output, and tabular > output can contain all kinds of stuff depending on the exact > options that were used. 
As a consequence, it's hard to design > a generic Blast record class; having a specialized Record > class for plain text, XML, and tabular seems more appropriate, > and these record classes may not be fully consistent with each > other (some elements may exist in one class but not in the > other). I thought it would be hard :( > Also, we cannot read in the Blast output in one format and > write out the Blast output in a different format (at least not > reliably). In some cases this isn't surprising of course (e.g. tabular to XML isn't going to work). > With the format converter in Blast 2.2.24, luckily there is > no longer such a need for such converters in Biopython. > If we had an ASN.1 parser, we could run Blast, save its > output in ASN.1, load the Blast output into Python, filter > the Blast output or otherwise modify it, write out the > modified output in ASN.1 format, and then use the Blast > 2.2.24 format converter to convert the modified output > to plain text or some other format. That would be really > useful. > > Unfortunately, making a parser for ASN.1 will not be so > easy. As far as I know there isn't anything like expat or > DOM for ASN.1 like we have for XML. Maybe this is > something for a google summer of code? Maybe. There are some python libraries out there for ASN.1 (it is an ISO standard used beyond the NCBI). 
http://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One http://bitbucket.org/haypo/hachoir/wiki/hachoir-parser Peter From updates at feedmyinbox.com Wed Sep 8 07:10:53 2010 From: updates at feedmyinbox.com (Feed My Inbox) Date: Wed, 8 Sep 2010 03:10:53 -0400 Subject: [Biopython-dev] 9/8 biopython Questions - BioStar Message-ID: // Biopython NCBIStandalone blastall gives different result than calling blastall directly from cmd // September 7, 2010 at 7:36 PM http://biostar.stackexchange.com/questions/2391/biopython-ncbistandalone-blastall-gives-different-result-than-calling-blastall-di So, first I tested what results I should get from the blastall program using the command line, with e-value 0.001: C:\Niek\Test\blast-2.2.17\bin\blastall -p blastp -d C:\Niek\Test\arabidopsis-smallproteins.fasta -i C:\Niek\Test\arabidopsis-HD.fasta -e 0.001 -F F -m 8 -o C:\Niek\Test\arab-HD-smallproteins-notfiltered.out and C:\Niek\Test\blast-2.2.17\bin\blastall -p blastp -d C:\Niek\Test\arabidopsis-smallproteins.fasta -i C:\Niek\Test\arabidopsis-HD.fasta -e 0.001 -m 8 -o C:\Niek\Test\arab-HD-smallproteins-filtered.out After that I made a local blast program, which works fine but it only found 91 results with e-value equal or lower than 0.001, where the results from the blastall via cmd gave around 140~ something results. I first thought it missed some, but all the e-values are different. 
    from Bio.Blast import NCBIStandalone
    from Bio.Blast import NCBIXML

    my_blast_db = r"C:\Niek\Test\arabidopsis-smallproteins.fasta"
    my_blast_file = r"C:\Niek\Test\arabidopsis-HD.fasta"
    my_blast_exe =r"C:\Niek\blast-2.2.17\bin\blastall.exe"
    result_handle, error_handle = NCBIStandalone.blastall(my_blast_exe, "blastp", my_blast_db, my_blast_file)
    blast_records = NCBIXML.parse(result_handle)
    E_VALUE_THRESH = 0.001
    x = 0
    for blast_record in blast_records:
        blast_record = blast_records.next()
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect <= E_VALUE_THRESH:
                    print "==========Alignment========"
                    print "sequence:", alignment.title
                    print "length:", alignment.length
                    print "e value:", hsp.expect
                    x += 1

I first thought that the local blast from biopython uses a different algorithm, but at 'my_blast_exe =r"C:\Niek\blast-2.2.17\bin\blastall.exe"' I specify the same program but it should be the same. Then I thought it had something to do with the filtering option but I checked both filtered and unfiltered and wasn't any of that. If you know why the local blast from biopython NCBIStandalone gives a different result than doing it directly in the cmd, please let me know. Thanks in advance, Niek edit: I checked, and it seems that the NCBIStandalone filters out 100% identities, which the blastall called by cmd does not. However this doesn't explain why the e-values are so different. // Problem with blastp of biopython: returned non-zero exit status 1 // September 7, 2010 at 4:10 PM http://biostar.stackexchange.com/questions/2388/problem-with-blastp-of-biopython-returned-non-zero-exit-status-1 I want to do a local BLAST using blastp from the Bio.Blast.Applications. However, when I run it I get a runtime error: returned non-zero exit status 1. According to the manual it is Exit Code Meaning: 1 Error in query sequence(s) or BLAST options. The query used is fasta format protein sequences. 
The command line I used, with the BLAST options, was: '>>> print blastp_cline C:\Program Files\NCBI\blast-2.2.24+\bin\blastp.exe -query "C:\Documents and Settings\newintern\Desktop\Microproteins_niek\arabidopsis-HD.fasta" -db C:\Documents and Settings\newintern\Desktop\Microproteins_niek\arabidopsis-smallproteins.fasta -out test.xml -evalue 0.001 -outfmt 5 Anyone know how to fix that error? Thanks Niek -- Website: http://biostar.stackexchange.com/questions/tagged/biopython Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/444424/f8ec200ea7b1a33442ee9d28a3d1365a23421b9a/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. 230 Franklin Road Suite 814 Franklin, TN 37064 From biopython at maubp.freeserve.co.uk Thu Sep 9 17:08:45 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 9 Sep 2010 18:08:45 +0100 Subject: [Biopython-dev] Python 3 status (ignoring our C code and most dependencies) Message-ID: Hi Bartek, Eric, et al, I've rerun the test suite on the trunk code, and we have the following issues, most of which I'd already noted in this thread: http://lists.open-bio.org/pipermail/biopython-dev/2010-July/008079.html Bartek - I was seeing a couple of issues with Bio.Motif which came down to relative import issues, this seems to have fixed things. Could you confirm this change looks OK to you? http://github.com/biopython/biopython/commit/4700d1be7afffe5e06b41df6ee8cc19e68a9a6c1 Eric - Can you reproduce this test_Phylo.py failure on your machine? And is there any chance you'll be able to look at the Bio.PDB issue with DisorderedResidue? Thanks, Peter ------------------------------------------------------------------------ test_LocationParser ... 
Syntax error at or near `467' token

Something in the spark parser isn't handled by 2to3, not urgent as I want to deprecate Bio.GenBank.LocationParser which is the only thing using spark. http://lists.open-bio.org/pipermail/biopython/2010-September/006734.html
------------------------------------------------------------------------
test_PDB ... FAIL

TypeError: 'DisorderedResidue' object is not subscriptable

See: http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008159.html
------------------------------------------------------------------------
test_Phylo ... FAIL

Traceback (most recent call last):
  File "test_Phylo.py", line 47, in test_convert
    Phylo.convert(self.mem_file, 'nexus', mem_file_2, 'phyloxml')
  File "/home/xxx/repositories/biopython/Bio/Phylo/_io.py", line 102, in convert
    return write(trees, out_file, out_format, **kwargs)
  File "/home/xxx/repositories/biopython/Bio/Phylo/_io.py", line 92, in write
    n = getattr(supported_formats[format], 'write')(trees, file, **kwargs)
  File "/home/xxx/repositories/biopython/Bio/Phylo/PhyloXMLIO.py", line 148, in write
    return Writer(obj).write(file, encoding=encoding, indent=indent)
  File "/home/xxx/repositories/biopython/Bio/Phylo/PhyloXMLIO.py", line 684, in write
    self._tree.write(file, encoding)
  File "/home/xxx/lib/python3.1/xml/etree/ElementTree.py", line 659, in write
    self._write(file, self._root, encoding, {})
  File "/home/xxx/lib/python3.1/xml/etree/ElementTree.py", line 677, in _write
    file.write(_encode("<" + tag, encoding))
TypeError: string argument expected, got 'bytes'
------------------------------------------------------------------------
test_SeqIO_online ... FAIL

May need to turn all online byte handles into unicode handles, http://lists.open-bio.org/pipermail/biopython-dev/2010-July/008076.html
------------------------------------------------------------------------
test_property_manager ... 
FAIL Looks like a formatting change of some kind, but I want to deprecate this: http://lists.open-bio.org/pipermail/biopython-dev/2010-September/008231.html ------------------------------------------------------------------------ Peter From biopython at maubp.freeserve.co.uk Thu Sep 9 17:27:16 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 9 Sep 2010 18:27:16 +0100 Subject: [Biopython-dev] Deprecated code to remove for Biopython 1.56 In-Reply-To: References: Message-ID: On Wed, Sep 1, 2010 at 4:55 PM, Bartek Wilczynski wrote: > On Wed, Sep 1, 2010 at 5:52 PM, Peter wrote: >> >> Bartek - could you handle removing Bio.AlignAce and Bio.MEME please >> (assuming you agree that makes sense)? >> > > Absolutely. It makes perfect sense. I'll do it tomorrow. > Hi Bartek, I updated the DEPRECATED about their removal, and NEWS to mention you've contributed to what will be Biopython 1.56. Could you check if there is anything in test_MEME.py worth keeping (i.e. moving into test_Motif.py), and then delete test_MEME.py? Thanks, Peter From bugzilla-daemon at portal.open-bio.org Thu Sep 9 17:42:29 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 9 Sep 2010 13:42:29 -0400 Subject: [Biopython-dev] [Bug 3096] PPBuilder build_peptides bugs In-Reply-To: Message-ID: <201009091742.o89HgTs8024309@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3096 ------- Comment #6 from skong at zymeworks.com 2010-09-09 13:42 EST ------- Hi Peter, I tested out the code (on the script directly, not using git) and it works fine. I only have minor concerns that the additional input variable "standard_aa_only" for _accept() method in class _PPBuilder might break other codes that assumes it still has two instead of three input variables. Also within the same script there are three different naming and default value for the same flag (standard amino acid): 1. named "standard" with default False in is_aa() method 2. 
named "aa_only" with default 1 in build_peptides() method of class _PPBuilder 3. named "standard_aa_only" with no default value in _accept() method of class _PPBuilder Which is again minor. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Sep 9 17:53:02 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 9 Sep 2010 18:53:02 +0100 Subject: [Biopython-dev] Bio.utils, Bio.PropertyManager, Bio.Encodings.IUPACEncoding, etc In-Reply-To: References: Message-ID: On Wed, Sep 1, 2010 at 4:38 PM, Peter wrote: > > For Biopython 1.55 we explicitly declared Bio.utils, Bio.PropertyManager, > and Bio.Encodings as obsolete. So, what do we do now? I'd like to declare > them deprecated in Biopython 1.56, and remove them (and Bio.Translate > and Bio.Transcribe) in Biopython 1.57. This is a quicker removal than usual, > but I'd argue anyone using these modules would have already been getting > deprecation warnings about Bio.Translate and Bio.Transcribe anyway. > I've marked them as deprecated, with an explicit DeprecationWarning in Bio.utils, but not for Bio.PropertyManager and Bio.Encodings which would be triggered by Bio.Alphabet.IUPAC. 
http://github.com/biopython/biopython/commit/28a7daeef6ff57979ec08de62777528219976df7 Peter From bugzilla-daemon at portal.open-bio.org Thu Sep 9 17:58:01 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 9 Sep 2010 13:58:01 -0400 Subject: [Biopython-dev] [Bug 3096] PPBuilder build_peptides bugs In-Reply-To: Message-ID: <201009091758.o89Hw1i8025811@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3096 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2010-09-09 13:58 EST ------- (In reply to comment #6) > Hi Peter, > > I tested out the code (on the script directly, not using git) and it works > fine. Excellent - thank you. > I only have minor concerns that the additional input variable > "standard_aa_only" for _accept() method in class _PPBuilder might break > other codes that assumes it still has two instead of three input variables. True, but I think that is a low risk and it is intended as a private API. It could be made an optional argument I suppose. > Also within the same script there are three different naming and default value > for the same flag (standard amino acid): > > 1. named "standard" with default False in is_aa() method > 2. named "aa_only" with default 1 in build_peptides() method of class > _PPBuilder > 3. named "standard_aa_only" with no default value in _accept() method of class > _PPBuilder > > Which is again minor. We can change the new argument ("standard_aa_only") added to _accept() without breaking backwards compatibility. I was trying to make it explicit - would you prefer "standard" instead? We both agreed that "aa_only" is very misleading. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
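The backwards-compatibility point raised in comment #7, that the new flag could be made an optional argument, can be sketched as follows. Everything here is hypothetical: the class, the toy residue sets, and the default value are invented for illustration and are not the actual Bio.PDB code.

```python
# Toy residue sets, for illustration only (not Bio.PDB's real tables).
STANDARD_RESIDUES = {"ALA", "GLY", "SER"}
KNOWN_RESIDUES = STANDARD_RESIDUES | {"MSE"}  # MSE: selenomethionine


class BuilderSketch(object):
    """Hypothetical stand-in for a builder with a private _accept() check."""

    def _accept(self, resname, standard_aa_only=False):
        # The new flag has a default, so existing callers that pass only
        # the residue keep working unchanged.
        if standard_aa_only:
            return resname in STANDARD_RESIDUES
        return resname in KNOWN_RESIDUES


if __name__ == "__main__":
    builder = BuilderSketch()
    print(builder._accept("MSE"))                         # old-style call
    print(builder._accept("MSE", standard_aa_only=True))  # new flag in use
```

Which default preserves the old behaviour depends on what _accept() did before the patch; the False used here is an arbitrary choice for the sketch.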
From bartek at rezolwenta.eu.org Thu Sep 9 18:51:07 2010 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Thu, 9 Sep 2010 20:51:07 +0200 Subject: [Biopython-dev] Python 3 status (ignoring our C code and most dependencies) In-Reply-To: References: Message-ID: On Thu, Sep 9, 2010 at 7:08 PM, Peter wrote: > Hi Bartek, Eric, et al, > > Hi, > I've rerun the test suite on the trunk code, and we have the > following issues, most of which I'd already noted in this thread: > http://lists.open-bio.org/pipermail/biopython-dev/2010-July/008079.html > > Bartek - I was seeing a couple of issues with Bio.Motif which came > down to relative import issues, this seems to have fixed things. > Could you confirm this change looks OK to you? > > http://github.com/biopython/biopython/commit/4700d1be7afffe5e06b41df6ee8cc19e68a9a6c1 > > I have to confess that my knowledge about python 3 is almost non-existent, so I was not following the messages related to the migration too closely. As far as I can tell, the changes in the imports are practically purely syntactic. I don't see possibilities of serious code breakage (the names which have changed in the Bio.Motif namespaces were introduced recently and I think any code relying on having Parsers.AlignAce.read inside Bio.Motif namespace should be reviewed anyway). Thanks for making the changes (and finishing my removal of Bio.MEME and Bio.AlignAce - I keep forgetting about those pesky files in the main biopython dir...) 
cheers Bartek From bugzilla-daemon at portal.open-bio.org Fri Sep 10 09:40:40 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 10 Sep 2010 05:40:40 -0400 Subject: [Biopython-dev] [Bug 3096] PPBuilder build_peptides bugs In-Reply-To: Message-ID: <201009100940.o8A9eev4019621@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3096 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2010-09-10 05:40 EST ------- Fix cherry-picked from that branch and committed: http://github.com/biopython/biopython/commit/544e4855e219cfbce813a50fa183683a7b0e4b3e I've also added you as a contributor (let me know if you want your email address included in the CONTRIB file, or would prefer not to be named): http://github.com/biopython/biopython/commit/993d58eb8e49a32d6821471421050720b88bfeeb Marking bug as fixed. Thank you :) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From eric.talevich at gmail.com Fri Sep 10 16:30:56 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 10 Sep 2010 12:30:56 -0400 Subject: [Biopython-dev] Python 3 status (ignoring our C code and most dependencies) In-Reply-To: References: Message-ID: On Thu, Sep 9, 2010 at 1:08 PM, Peter wrote: > Eric - Can you reproduce this test_Phylo.py failure on your machine? > And is there any chance you'll be able to look at the Bio.PDB issue > with DisorderedResidue? I'll try to give these a shot this weekend: > ------------------------------------------------------------------------ > test_PDB ... 
FAIL
>
> TypeError: 'DisorderedResidue' object is not subscriptable
>
> See:
> http://lists.open-bio.org/pipermail/biopython-dev/2010-August/008159.html

I took an initial look at it and was baffled. 2to3 doesn't seem to do anything that would affect it, and the relevant part of the code is an interesting tangle of if-else clauses related to the state of something non-local. So, this will take some careful stepping-through. Does anyone else have any hints on why this might be happening?

> ------------------------------------------------------------------------
> test_Phylo ... FAIL
>
> Traceback (most recent call last):
>   File "test_Phylo.py", line 47, in test_convert
>     Phylo.convert(self.mem_file, 'nexus', mem_file_2, 'phyloxml')
>   File "/home/xxx/repositories/biopython/Bio/Phylo/_io.py", line 102, in convert
>     return write(trees, out_file, out_format, **kwargs)
>   File "/home/xxx/repositories/biopython/Bio/Phylo/_io.py", line 92, in write
>     n = getattr(supported_formats[format], 'write')(trees, file, **kwargs)
>   File "/home/xxx/repositories/biopython/Bio/Phylo/PhyloXMLIO.py", line 148, in write
>     return Writer(obj).write(file, encoding=encoding, indent=indent)
>   File "/home/xxx/repositories/biopython/Bio/Phylo/PhyloXMLIO.py", line 684, in write
>     self._tree.write(file, encoding)
>   File "/home/xxx/lib/python3.1/xml/etree/ElementTree.py", line 659, in write
>     self._write(file, self._root, encoding, {})
>   File "/home/xxx/lib/python3.1/xml/etree/ElementTree.py", line 677, in _write
>     file.write(_encode("<" + tag, encoding))
> TypeError: string argument expected, got 'bytes'
>

Neat. I added this test shortly before the Biopython 1.55 release, and I guess it's doing its job. It might have something to do with the 'encoding' argument triggering some string/byte incompatibility in ElementTree; I'll check it out. 
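The str/bytes distinction behind that TypeError can be illustrated with a small sketch. It assumes a recent Python 3 (newer than the 3.1 in the traceback) and uses made-up element names; it is not a diagnosis of the Bio.Phylo failure, only a demonstration that ElementTree produces bytes for a named encoding and text for encoding="unicode", so the buffer type has to match.

```python
import io
import xml.etree.ElementTree as ET


def serialize_both_ways(root):
    """Write one element tree to a binary buffer and to a text buffer."""
    tree = ET.ElementTree(root)

    # A named encoding such as "utf-8" yields bytes: use a binary buffer.
    binary_buffer = io.BytesIO()
    tree.write(binary_buffer, encoding="utf-8")

    # encoding="unicode" yields str: use a text buffer.
    text_buffer = io.StringIO()
    tree.write(text_buffer, encoding="unicode")

    return binary_buffer.getvalue(), text_buffer.getvalue()


if __name__ == "__main__":
    root = ET.Element("phyloxml")        # illustrative element names
    ET.SubElement(root, "phylogeny")
    as_bytes, as_text = serialize_both_ways(root)
    print(type(as_bytes).__name__, type(as_text).__name__)
```

A test that uses an in-memory file would therefore need a BytesIO when an encoding such as UTF-8 is requested, and a StringIO only when the writer is asked for text.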
-Eric

From biopython at maubp.freeserve.co.uk  Mon Sep 13 10:29:32 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 13 Sep 2010 11:29:32 +0100
Subject: [Biopython-dev] SwissProt parser: Feature ID kept in description string
Message-ID: 

Hi Michiel et al,

I'm looking at the SwissProt plain text parser (for Bug 2235, making
SeqFeature objects in SeqIO for "swiss" format), and noticed something
that puzzles me in the new parser in Bio/SwissProt/__init__.py:

The parser spots /FTId= entries and extracts the feature ID, which
is good, but leaves this string in the description string, which I find
odd. Essentially I'd like to change this bit:

    if line[29:35]==r"/FTId=":
        ft_id = line[35:70].rstrip()[:-1]
    else:
        ft_id =""

to:

    if line[29:35]==r"/FTId=":
        ft_id = line[35:70].rstrip()[:-1]
        description = ""
    else:
        ft_id =""

What do you think?

Peter

From mjldehoon at yahoo.com  Mon Sep 13 13:01:44 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 13 Sep 2010 06:01:44 -0700 (PDT)
Subject: [Biopython-dev] SwissProt parser: Feature ID kept in description string
In-Reply-To: 
Message-ID: <185662.21952.qm@web62401.mail.re1.yahoo.com>

I have no objections.

--Michiel.

--- On Mon, 9/13/10, Peter wrote:

> From: Peter
> Subject: SwissProt parser: Feature ID kept in description string
> To: "Biopython-Dev Mailing List" , "Michiel de Hoon"
> Date: Monday, September 13, 2010, 6:29 AM
> Hi Michiel et al,
>
> I'm looking at the SwissProt plain text parser (for Bug 2235, making
> SeqFeature objects in SeqIO for "swiss" format), and noticed something
> that puzzles me in the new parser in Bio/SwissProt/__init__.py:
>
> The parser spots /FTId= entries and extracts the feature ID, which
> is good, but leaves this string in the description string, which I find
> odd. Essentially I'd like to change this bit:
>
>     if line[29:35]==r"/FTId=":
>         ft_id = line[35:70].rstrip()[:-1]
>     else:
>         ft_id =""
>
> to:
>
>     if line[29:35]==r"/FTId=":
>         ft_id = line[35:70].rstrip()[:-1]
>         description = ""
>     else:
>         ft_id =""
>
> What do you think?
>
> Peter
>

From biopython at maubp.freeserve.co.uk  Mon Sep 13 14:01:13 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 13 Sep 2010 15:01:13 +0100
Subject: [Biopython-dev] SwissProt parser: Feature ID kept in description string
In-Reply-To: <185662.21952.qm@web62401.mail.re1.yahoo.com>
References: <185662.21952.qm@web62401.mail.re1.yahoo.com>
Message-ID: 

On Mon, Sep 13, 2010 at 2:01 PM, Michiel de Hoon wrote:
> I have no objections.
> --Michiel.

Great - done here:
http://github.com/biopython/biopython/tree/3f600a8197a96856c5b7977e2bc140a8c6a6f7c8

and unit test updated here:
http://github.com/biopython/biopython/tree/5ab01c52cdd789133b35dccbd20896a7d342a2f5

Peter

From biopython at maubp.freeserve.co.uk  Mon Sep 13 17:47:23 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 13 Sep 2010 18:47:23 +0100
Subject: [Biopython-dev] New: Uniprot XML parser
In-Reply-To: 
References: <3b674cf220d52226cf9b2e189598fe61.squirrel@lipid.biocomp.unibo.it> <8ec7479153894f66ea029abd059e06c5.squirrel@lipid.biocomp.unibo.it>
Message-ID: 

Hi Andrea,

I've done some work on the plain text swiss parser to handle features,
and some basic testing to make sure it agrees with the uniprot-xml
parser. This showed some problems with end locations out by one
in the XML parser which I believe I was able to resolve. I have also
commented out the use of the skip_parsing_errors option - it doesn't
seem to be needed and silent errors are bad.

I have (for the moment) introduced a couple of new position classes
in Bio.SeqFeature for "?123" where we have a position but it is
uncertain, and "?" where we don't have a position at all. The latter
might be handled more elegantly by inferring a Before/AfterPosition
instead...
Note that for testing purposes, I have disabled your code where it builds a SeqFeature for a dbReference - I'm not sure what the best plan here is yet. Could you have a look at my branch please? http://github.com/peterjc/biopython/commits/uniprot Thanks, Peter From updates at feedmyinbox.com Tue Sep 14 07:12:43 2010 From: updates at feedmyinbox.com (Feed My Inbox) Date: Tue, 14 Sep 2010 03:12:43 -0400 Subject: [Biopython-dev] 9/14 biopython Questions - BioStar Message-ID: <804dd35f521a1792fdf62a62806ac334@74.63.51.88> // How to show more BLAST results using biopython? // September 13, 2010 at 12:58 PM http://biostar.stackexchange.com/questions/2462/how-to-show-more-blast-results-using-biopython Hi, I'm using biopython to BLAST over the internet. However, it only saves 30 results (there are more than 30 results that are under the e-value I chose) in the xml. I've been looking all over but can't find how to make that number higher. So my question is, how can you show more results from BLAST using biopython. I'm using NCBIWWW.qblast from BIO.BLAST. from Bio.Blast import NCBIWWW File = "MIF" fasta_string = open(File+".fasta").read() result_handle = NCBIWWW.qblast("blastp", "nr", fasta_string) Thanks, Niek -- Website: http://biostar.stackexchange.com/questions/tagged/biopython Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/444424/f8ec200ea7b1a33442ee9d28a3d1365a23421b9a/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. 230 Franklin Road Suite 814 Franklin, TN 37064 From tiagoantao at gmail.com Tue Sep 14 12:25:45 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 14 Sep 2010 13:25:45 +0100 Subject: [Biopython-dev] ubuntu Message-ID: Hi, Just a comment following from an email in the user list from Bartek. 
Should we nag people at Ubuntu/Debian to upgrade from 1.53 to
something newer? See if they need help of some kind or such?
I could volunteer to go and check what is happening and try to pull
things a bit forward...

--
"If you want to get laid, go to college. If you want an education, go
to the library." - Frank Zappa

From bartek at rezolwenta.eu.org  Tue Sep 14 13:22:00 2010
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Tue, 14 Sep 2010 15:22:00 +0200
Subject: [Biopython-dev] ubuntu
In-Reply-To: 
References: 
Message-ID: 

2010/9/14 Tiago Antão

> Hi,
>

Hi Tiago,

> Just a comment following from an email in the user list from Bartek.
> Should we nag people at Ubuntu/Debian to upgrade from 1.53 to
> something newer? See if they need help of some kind or such?
> I could volunteer to go and check what is happening and try to pull
> things a bit forward...
>

I'm not an expert on this, but as far as I know, this package is pulled more
or less directly from Debian and maintained by the Debian-med team (see
http://packages.debian.org/testing/python-biopython , especially the links to
maintainers on the right).

The delay comes from the fact that every six months, after making a release,
ubuntu takes the biopython version from debian testing and puts it into the
line for the next release in six months. This gives you effectively at least
6 months delay between the ubuntu version and the current biopython trunk.
Lately, biopython makes at least one release (sometimes two) every six months
which means that the delay will be at least one release number (more likely
two, or more if somebody is not upgrading their ubuntu every 6 months).

As far as I can tell, the guys at debian-med have the process of package
release fairly automated, but there are two delays:

- the delay in picking up new releases from biopython into debian testing.
  currently this is 1.54, they haven't picked up the 1.55 yet, which means
  about 1 month of delay
- the delay of ubuntu releasing policy (currently, the 1.54 is scheduled to be
  in 10.10, we can expect that 1.55 will make it into 11.04, by which time
  there will probably be biopython 1.56)

There is also the ubuntu-backports system, which includes newer packages
back-ported to older releases, and it includes biopython, but this only
includes the packages already released for newer ubuntu versions.

In summary, we might try to minimize the first delay by trying to synchronize
a bit with ubuntu release cycle (I don't think we should be totally dependent
on their schedule, but it might be good to remember that if we don't release
in March or September, we will miss more than one ubuntu release) and ask the
debian-med team for how we can make sure that the new release will make it
into debian-testing as fast as possible.

cheers
B

> --
>

Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433

From andrea at biocomp.unibo.it  Tue Sep 14 16:22:06 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Tue, 14 Sep 2010 18:22:06 +0200 (CEST)
Subject: [Biopython-dev] New: Uniprot XML parser
In-Reply-To: 
References: <3b674cf220d52226cf9b2e189598fe61.squirrel@lipid.biocomp.unibo.it> <8ec7479153894f66ea029abd059e06c5.squirrel@lipid.biocomp.unibo.it>
Message-ID: 

Hi Peter,
I've commented your commits directly on github, basically agreeing with
them.
Parsing PDB structures as positional features was done to capture all the
information in the uniprot file. I do not see any better place than a
SeqFeature for a positional information, the only option here is to skip it.
I saw in your repository you are using the string "uniprot-xml" to call
the parser, however the format name at the EBI REST and SOAP services
is simply "uniprotxml".
take a look at: http://www.ebi.ac.uk/Tools/webservices/services/dbfetch_rest I think it is better to be conservative in this. I'm still working on the SeqIO.index to make a faster implementation. RE are really slow, and ElementTree should cope well with this task. Anyhow it works with the current implementation, so it's not a big deal. Andrea > Hi Andrea, > > I've done some work on the plain text swiss parser to handle features, > and some basic testing to make sure it agrees with the uniprot-xml > parser. This showed some problems with end locations out by one > in the XML parser which I believe I was able to resolve. I have also > commented out the use of the skip_parsing_errors option - it doesn't > seem to be needed and silent errors are bad. > > I have (for the moment) introduced a couple of new position classes > in Bio.SeqFeature for "?123" where we have a position but it is > uncertain, and "?" where we don't have a position at all. The later > might be handled more elegantly by inferring a Before/AfterPosition > instead... > > Note that for testing purposes, I have disabled your code where > it builds a SeqFeature for a dbReference - I'm not sure what the > best plan here is yet. > > Could you have a look at my branch please? > > http://github.com/peterjc/biopython/commits/uniprot > > Thanks, > > Peter > From biopython at maubp.freeserve.co.uk Tue Sep 14 17:58:32 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 14 Sep 2010 18:58:32 +0100 Subject: [Biopython-dev] New: Uniprot XML parser In-Reply-To: References: <3b674cf220d52226cf9b2e189598fe61.squirrel@lipid.biocomp.unibo.it> <8ec7479153894f66ea029abd059e06c5.squirrel@lipid.biocomp.unibo.it> Message-ID: On Tue, Sep 14, 2010 at 5:22 PM, Andrea Pierleoni wrote: > > Hi Peter, > I've commented your commits directly on github, basically agreeing with > them. Thanks. > Parsing PDB structures as positional features was done to capture all the > information in the uniprot file. 
I do not see any better place than a > SeqFeature for a positional information, the only option here is to skip it. We could put the DB cross reference into the dbxrefs list, but that only captures a tiny part of the data. We could also put it in the annotations, but that loses the benefits of the position information. Maybe using a SeqFeature is the best plan... > I saw in your repository you are using the string "uniprot-xml" to call > the parser, however the format name at the EBI REST and SOAP services > is simply "uniprotxml". take a look at: > > http://www.ebi.ac.uk/Tools/webservices/services/dbfetch_rest > > I think it is better to be conservative in this. On the other hand, "uniprot-xml" fits well with the idea of "format-variant". Whatever we go with will have downsides. > I'm still working on the SeqIO.index to make a faster implementation. RE > are really slow, and ElementTree should cope well with this task. > Anyhow it works with the current implementation, so it's not a big deal. I don't know enough about ElementTree to help right now, sorry. Peter From biopython at maubp.freeserve.co.uk Tue Sep 14 21:59:29 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 14 Sep 2010 22:59:29 +0100 Subject: [Biopython-dev] Fwd: [Utilities-announce] PubMed E-Utility 2011 DTD updates - please read! In-Reply-To: References: Message-ID: Hi all, It looks like there are two more DTD files (available now) to add to Biopython for the Bio.Entrez parser. Peter ---------- Forwarded message ---------- From: Date: Tue, Sep 14, 2010 at 9:24 PM Subject: [Utilities-announce] PubMed E-Utility 2011 DTD updates - please read! To: NLM/NCBI List utilities-announce Dear NCBI PubMed E-Utility Users, We anticipate updating the PubMed E-Utility DTDs for 2011 in mid-December, approximately on December 13, 2010. 
The forthcoming DTDs are available from:

http://eutils.ncbi.nlm.nih.gov/corehtml/query/DTD/pubmed_110101.dtd
http://eutils.ncbi.nlm.nih.gov/corehtml/query/DTD/nlmmedlinecitationset_110101.dtd

1. DTD AND XML CHANGES FOR 2011

1. Changes to NLMMedlineCitationSet DTD AND PubMed XML

The DTD changes for the 2011 production year are itemized in the Revision
Notes section near the top of the DTD. The following describes the
substantive changes to NLMMedlineCitationSet DTD and PubMed XML:

1. Accommodating Structured Abstracts

Two new attributes, Label and NlmCategory, are added to the AbstractText
element which is used with both the Abstract and OtherAbstract elements. A
valid label name found in published structured abstracts (e.g., Introduction,
Goals, Study Design, Findings, Discussion) will be identified in the XML as
an Abstract Text Label and each "parent" concept to which the published Label
name is mapped at NLM will be identified as an Abstract Text NlmCategory.
Five NLM-assigned mapped-to categories are possible: Background, Objective,
Methods, Results, and Conclusions. In general, the lack of Label and
NlmCategory attributes in AbstractText means the published abstract is
unstructured. Note that the content of structured abstracts will be exported
in separate segments that need to be joined for display of the complete
abstract text.

DTD:

In the following example, the published label names are INTRODUCTION; AIMS;
DESIGN, SETTING AND PARTICIPANTS; RESULTS; and DISCUSSION which
correspondingly map to the five NLM-assigned categories.

Sample XML:

1. Implementing Protocol Class 2 Supplementary Concept Record (SCR) and Rare
Disease Class 3 SCR terms

A new element, SupplMeshList, is added to the MedlineCitation element and
another new element, SupplMeshName with its attribute Type, is added to
SupplMeshList.

DTD:

Sample XML:

disease term
protocol term

1. Separating MeSH Geographic Descriptor Names from other MeSH Descriptors

The Type attribute is added to the DescriptorName element.

DTD:

Sample XML:

New York

1. Accommodating Versioned Articles; Corresponding Change to PMID

There is a new model of publishing referred to as "versioning" whereby
multiple versions of the same online article are released, sometimes in quick
succession and sometimes almost as soon as the original article has been
published. Beginning in the 2011 production year, NLM will create an
individual citation for each article's version and link the versions via new
attributes for the MedlineCitation and PMID elements.

The new attributes for the MedlineCitation element are VersionID and
VersionDate.

DTD:

The new attribute for the PMID element is Version.

DTD:

Sample XML:

20029669

Search and display implementation in PubMed is under consideration at this
time; all implementation decisions will be documented in a forthcoming NLM
Technical Bulletin article.

Details are:

- The PMID combined with its version attribute value (e.g., 1, 2, 3) becomes
  the citation's new unique identifier, represented as 12345678.
- The PMID Version attribute value "1" will be assigned to all existing
  records at the time the 2011 baseline files are produced and exported.
- The PubDate value on citations of versions of the same article will be
  identical. The MedlineCitation Version Date and VersionID attribute values
  supplied by the publisher will identify the specific version.
- If a citation is not for PMID Version 1, it must contain MedlineCitation
  Version Date and VersionID attribute values, and the original publication
  date for Version 1 as the PubDate.
- A PMID Version attribute value higher than 1 indicates that there is a
  citation for at least one prior version (although it might happen, rarely,
  that a prior version subsequently gets deleted).
  Although the MedlineCitation VersionDate value may be different from the
  PubDate, it might be the same as PubDate if the new version was released
  later the same day.
- In the future, when non-PubMed Central journals are included, the
  publisher-supplied VersionID value will be whatever the publisher decides
  it to be; e.g. the 2nd publisher-supplied VersionID may be "b" or "2b" and
  the PMID Version attribute value assigned by NLM will still be 2.

1. Eliminating Pre-defined Source Attribute Values for NameID Element

The 2010 DTD specifies the values that may be used for the Source attribute
for the NameID element. Please note that the NameID element has not yet been
used; it is expected to be implemented at some point during 2011.

DTD:

Sample XML:

1. Simplifying Author Element Structure

The NameID element has been repositioned in the Author element to simplify
the DTD structure. The XML is not affected.

DTD:

1. Accommodating Identification of Machine-generated Keywords.

A new valid value, NLM-AUTO, is added to the Owner attribute of the element
KeywordList.

DTD:

Sample XML:

1. ENHANCED CHARACTER SET

A subset of UTF-8 characters is currently supported for PubMed data. PubMed
data now supports the full UTF-8 Character Set.

Exceptions:

All instances that represent a Double Quote will be translated to the
straight double quote (Unicode 0022).

All instances that represent a Single Quote (including the prime and
apostrophe) will be translated to the straight single quote (Unicode 0027).

Em Dash, En Dash, Hyphen, or Minus will be translated to the single dash
(Unicode 002D).

Those three Unicode values are part of the current Character Set.
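The three character-set exceptions above amount to a small translation table.
A rough sketch of what that mapping could look like in Python follows. The
announcement does not enumerate which code points count as "instances that
represent" a quote or dash, so the character sets below are assumptions chosen
for illustration, and normalize_punctuation is a hypothetical helper name, not
anything provided by NCBI or Biopython.

```python
# Assumed, non-exhaustive sets of look-alike characters; the announcement
# does not list the exact code points that NLM translates.
DOUBLE_QUOTES = "\u201c\u201d"        # left/right double quotation marks
SINGLE_QUOTES = "\u2018\u2019\u2032"  # left/right single quotes, prime
DASHES = "\u2014\u2013\u2010\u2212"   # em dash, en dash, hyphen, minus sign

TABLE = {}
for ch in DOUBLE_QUOTES:
    TABLE[ord(ch)] = '"'   # U+0022
for ch in SINGLE_QUOTES:
    TABLE[ord(ch)] = "'"   # U+0027
for ch in DASHES:
    TABLE[ord(ch)] = "-"   # U+002D


def normalize_punctuation(text):
    """Map typographic quotes and dashes to their plain ASCII forms."""
    return text.translate(TABLE)


sample = "\u201cA\u2013B\u201d isn\u2019t caf\u00e9"
normalized = normalize_punctuation(sample)
```

Other non-ASCII characters (the accented letter in the sample, for instance)
pass through untouched, which matches the statement that the full UTF-8
character set is otherwise supported.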
_______________________________________________ Utilities-announce mailing list http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce From andrea at biocomp.unibo.it Wed Sep 15 12:25:04 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Wed, 15 Sep 2010 14:25:04 +0200 (CEST) Subject: [Biopython-dev] New: Uniprot XML parser In-Reply-To: References: <3b674cf220d52226cf9b2e189598fe61.squirrel@lipid.biocomp.unibo.it> <8ec7479153894f66ea029abd059e06c5.squirrel@lipid.biocomp.unibo.it> Message-ID: <38e57130d02951c066b39ca420488d70.squirrel@lipid.biocomp.unibo.it> > > We could put the DB cross reference into the dbxrefs list, but that only > captures a tiny part of the data. We could also put it in the annotations, > but that loses the benefits of the position information. Maybe using a > SeqFeature is the best plan... > it is to me > > On the other hand, "uniprot-xml" fits well with the idea of > "format-variant". > Whatever we go with will have downsides. well I suppose we have to choose one. uniprot-xml is fine for me. > >> I'm still working on the SeqIO.index to make a faster implementation. RE >> are really slow, and ElementTree should cope well with this task. >> Anyhow it works with the current implementation, so it's not a big deal. > > I don't know enough about ElementTree to help right now, sorry. > Well I've reimplemented the _index UniprotDict function using ElementTree, but it looks like this cannot be done using ElementTree. To iterate over an XML file ElementTree uses a the iterparse function, that is able to capture start and end events for every tag. unfortunately this event "capture" is not aligned with the parsing process, meaning that when a start event is raised the parser could be up to 16K ahead in reading the file, and the actual position is variable. See http://mail.python.org/pipermail/xml-sig/2005-January/010838.html Thus I cannot pick up the start position of the tag in the file. 
The only way I found to make it work is going line by line, like you did in your implementation. We can use that one. Andrea implementation From updates at feedmyinbox.com Thu Sep 16 07:12:25 2010 From: updates at feedmyinbox.com (Feed My Inbox) Date: Thu, 16 Sep 2010 03:12:25 -0400 Subject: [Biopython-dev] 9/16 biopython Questions - BioStar Message-ID: <4a5e91a12d7f379e8caea7bca2ce04ce@74.63.51.88> // How can I automate BLAST of >1 sequence and output the top hits for each input? // September 15, 2010 at 12:02 PM http://biostar.stackexchange.com/questions/2488/how-can-i-automate-blast-of-1-sequence-and-output-the-top-hits-for-each-input I'm looking for a way to do BLAST simultaneously with >1 sequence. The input is several fasta files or one file containing several sequence formatted in fasta. For an initial testing, I want to try obtaining the top 10 blast results for each input sequence and output them to a text file (one text file for each input sequence, or one big file containing all). But currently, I'm drawing a blank. Is there a way to do this with (preferably) biopython? -- Website: http://biostar.stackexchange.com/questions/tagged/biopython Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/444424/f8ec200ea7b1a33442ee9d28a3d1365a23421b9a/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. 230 Franklin Road Suite 814 Franklin, TN 37064 From biopython at maubp.freeserve.co.uk Sat Sep 18 12:25:11 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 18 Sep 2010 13:25:11 +0100 Subject: [Biopython-dev] Moving Bio.Motif documentation into Tutorial.tex Message-ID: Hi Bartek, I think it would be good to try and move your Bio.Motif documentation from file Docs/cookbook/motif/motif.tex into the main Docs/Tutorial.tex as a new chapter. 
Currently it isn't obvious that Biopython supports things like a Position Weight Matrix (PWM). What do you think? The text will need a slight update since we have now deprecated and removed Bio.AlignAce and Bio.MEME, but that should be easy. Thanks, Peter From barwil at gmail.com Sat Sep 18 13:04:37 2010 From: barwil at gmail.com (Bartek Wilczynski) Date: Sat, 18 Sep 2010 15:04:37 +0200 Subject: [Biopython-dev] Moving Bio.Motif documentation into Tutorial.tex In-Reply-To: References: Message-ID: Hi, On Sat, Sep 18, 2010 at 2:25 PM, Peter wrote: > Hi Bartek, > > I think it would be good to try and move your Bio.Motif > documentation from file Docs/cookbook/motif/motif.tex > into the main Docs/Tutorial.tex as a new chapter. > Currently it isn't obvious that Biopython supports > things like a Position Weight Matrix (PWM). > > What do you think? > > The text will need a slight update since we have now > deprecated and removed Bio.AlignAce and Bio.MEME, > but that should be easy. > In general, I'm all for it. It's just that right now is not necessarily the best time for me to put much work into it. I'm trying to meet a RECOMB deadline of Oct. 8th with a paper, so if it would not be a problem, I could update it to the current state of the API after that. On the other hand, if there's anybody who wants to do it before then, I can review the changes even earlier. thanks for remembering about it. 
Bartek From bugzilla-daemon at portal.open-bio.org Sat Sep 18 16:26:00 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 18 Sep 2010 12:26:00 -0400 Subject: [Biopython-dev] [Bug 3010] Bio.KDTree is leaking memory In-Reply-To: Message-ID: <201009181626.o8IGQ0vk025229@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3010 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-09-18 12:25 EST ------- Created an attachment (id=1542) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1542&action=view) C program to show the memory leak C example written by Bartosz Telenczuk at the "Python and Friends" workshop http://pythonfriends.blogspot.com The idea of this was to determine if the memory leak was via the Python binding, or in the KDTree C code - this showed it was in the C code. We could then run this in valgrind which reveals the nature of the memory leak which scales directly with the number of coordinates (and a very big clue on how to fix this - commit to follow soon). ==5438== 16,000 bytes in 1 blocks are definitely lost in loss record 1 of 1 ==5438== at 0x4025016: realloc (vg_replace_malloc.c:525) ==5438== by 0x8048F77: KDTree_add_point (in /home/peterjc/repositories/biopython/Bio/KDTree/test) ==5438== by 0x80495A3: KDTree_set_data (in /home/peterjc/repositories/biopython/Bio/KDTree/test) ==5438== by 0x8048618: main (in /home/peterjc/repositories/biopython/Bio/KDTree/test) ==5438== (this was with 2000 points, one iteration) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sat Sep 18 16:35:02 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 18 Sep 2010 12:35:02 -0400 Subject: [Biopython-dev] [Bug 3010] Bio.KDTree is leaking memory In-Reply-To: Message-ID: <201009181635.o8IGZ2b2025417@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3010 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2010-09-18 12:35 EST ------- Bug fix was to free tree->_data_point_list in KDTree_destroy, committed here: http://github.com/biopython/biopython/commit/285c085d9f0476eaefcc1049e36285f2d3b9b54f Unit tests look good - marking as fixed. P.S. To compile this C test on Linux, put it in the Bio/KDTree directory and: gcc test_kdtree.c KDTree.c -lm -o test -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From eric.talevich at gmail.com Sat Sep 18 19:48:56 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 18 Sep 2010 15:48:56 -0400 Subject: [Biopython-dev] Python 3 status (ignoring our C code and most dependencies) In-Reply-To: References: Message-ID: > On Thu, Sep 9, 2010 at 1:08 PM, Peter wrote: >> Eric - Can you reproduce this test_Phylo.py failure on your machine? [...] >> ------------------------------------------------------------------------ >> test_Phylo ... FAIL >> >> Traceback (most recent call last): >> ?File "test_Phylo.py", line 47, in test_convert >> ? ?Phylo.convert(self.mem_file, 'nexus', mem_file_2, 'phyloxml') [...] >> ?File "/home/xxx/lib/python3.1/xml/etree/ElementTree.py", line 677, in _write >> ? 
?file.write(_encode("<" + tag, encoding)) >> TypeError: string argument expected, got 'bytes' >> I fixed this one: http://github.com/biopython/biopython/commit/9b5faa2d1affdf439acfbbec911cde0862111005 No progress on the DisorderedResidue error yet. -Eric From biopython at maubp.freeserve.co.uk Mon Sep 20 10:09:54 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 20 Sep 2010 11:09:54 +0100 Subject: [Biopython-dev] Parsing efetch results from the Journals database through Bio.Entrez In-Reply-To: <488872.91962.qm@web62406.mail.re1.yahoo.com> References: <488872.91962.qm@web62406.mail.re1.yahoo.com> Message-ID: On Fri, Sep 3, 2010 at 6:26 PM, Michiel de Hoon wrote: > Hi everybody, > > The parser in Bio.Entrez can parse any XML returned by the Entrez E-utilities > as long as the corresponding DTD is available ... > The advantage of removing these hacks is that it will allow us to validate all > XML against the DTD, and to raise an error (if the user requests so) if any > elements are found in the XML that don't validate against the DTD. Hi Michiel, All the tests look fine but there is a deprecation warning from your new exception classes (I'm running it with Python 2.6 on the Mac): $ python test_Entrez.py Test error handling when presented with Fasta non-XML data ... /Library/Python/2.6/site-packages/Bio/Entrez/Parser.py:114: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6 self.message = message ok Test error handling when presented with GenBank non-XML data ... ok Test parsing XML returned by EFetch, Nucleotide database (first test) ... ok Test parsing XML returned by EFetch, Protein database ... ok Test parsing XML returned by EFetch, OMIM database ... ok Test parsing XML returned by EFetch, PubMed database (first test) ... ok Test parsing XML returned by EFetch, PubMed database (second test) ... ok Test parsing XML returned by EFetch, Taxonomy database ... ok Test parsing XML output returned by EGQuery (first test) ... 
ok
Test parsing XML output returned by EGQuery (second test) ... ok
Test if corrupted XML is handled correctly ...
/Library/Python/2.6/site-packages/Bio/Entrez/Parser.py:121: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  self.message = message
ok

Peter

From mjldehoon at yahoo.com Mon Sep 20 10:41:33 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 20 Sep 2010 03:41:33 -0700 (PDT)
Subject: [Biopython-dev] Parsing efetch results from the Journals database through Bio.Entrez
In-Reply-To:
Message-ID: <954716.31147.qm@web62402.mail.re1.yahoo.com>

Thanks for letting me know.
Could you try with the latest version in the Python 2.6 series? This DeprecationWarning seems to be a bug in Python itself. I haven't seen this deprecation warning with either Python 2.6.5 or Python 2.7.

--Michiel.

--- On Mon, 9/20/10, Peter wrote:

> From: Peter
> Subject: Re: [Biopython-dev] Parsing efetch results from the Journals database through Bio.Entrez
> To: "Michiel de Hoon"
> Cc: biopython-dev at biopython.org
> Date: Monday, September 20, 2010, 6:09 AM
> On Fri, Sep 3, 2010 at 6:26 PM, Michiel de Hoon wrote:
> > Hi everybody,
> >
> > The parser in Bio.Entrez can parse any XML returned by the Entrez E-utilities
> > as long as the corresponding DTD is available ...
> > The advantage of removing these hacks is that it will allow us to validate all
> > XML against the DTD, and to raise an error (if the user requests so) if any
> > elements are found in the XML that don't validate against the DTD.
>
> Hi Michiel,
>
> All the tests look fine but there is a deprecation warning from your new
> exception classes (I'm running it with Python 2.6 on the Mac):
>
> $ python test_Entrez.py
> Test error handling when presented with Fasta non-XML data ...
> /Library/Python/2.6/site-packages/Bio/Entrez/Parser.py:114: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
>   self.message = message
> ok
> Test error handling when presented with GenBank non-XML data ... ok
> Test parsing XML returned by EFetch, Nucleotide database (first test) ... ok
> Test parsing XML returned by EFetch, Protein database ... ok
> Test parsing XML returned by EFetch, OMIM database ... ok
> Test parsing XML returned by EFetch, PubMed database (first test) ... ok
> Test parsing XML returned by EFetch, PubMed database (second test) ... ok
> Test parsing XML returned by EFetch, Taxonomy database ... ok
> Test parsing XML output returned by EGQuery (first test) ... ok
> Test parsing XML output returned by EGQuery (second test) ... ok
> Test if corrupted XML is handled correctly ...
> /Library/Python/2.6/site-packages/Bio/Entrez/Parser.py:121: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
>   self.message = message
> ok
>
>
> Peter
>

From biopython at maubp.freeserve.co.uk Mon Sep 20 11:17:54 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 20 Sep 2010 12:17:54 +0100
Subject: [Biopython-dev] Parsing efetch results from the Journals database through Bio.Entrez
In-Reply-To: <954716.31147.qm@web62402.mail.re1.yahoo.com>
References: <954716.31147.qm@web62402.mail.re1.yahoo.com>
Message-ID:

On Mon, Sep 20, 2010 at 11:41 AM, Michiel de Hoon wrote:
> Thanks for letting me know.
> Could you try with the latest version in the Python 2.6 series? This
> DeprecationWarning seems to be a bug in Python itself. I haven't seen
> this deprecation warning with either Python 2.6.5 or Python 2.7.
>

Using the Apple provided Python 2.6 on Mac OS X 10.6.4 "Snow Leopard",

$ python2.6
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class X(ValueError):
...     def __init__(self, message):
...         self.message = message
...
>>> raise X("Test")
__main__:3: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
__main__.X
>>>

On one of our Linux machines,

$ python2.6 -Wall
Python 2.6.6 (r266:84292, Aug 31 2010, 16:21:14)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class X(ValueError):
...     def __init__(self, message):
...         self.message = message
...
>>> raise X("Test")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
__main__.X
>>>

I did a little Google searching, and according to PEP 352 the BaseException
message attribute which was introduced in Python 2.5 was deprecated in
Python 2.6, http://www.python.org/dev/peps/pep-0352/

However, the initial implementation of the deprecation warning triggered
false positives (as in your code on Python 2.6.1), and this was fixed later
in the Python 2.6.x series: http://bugs.python.org/issue6844

There is a simple work around - avoid using message as the attribute
name. For example we could use msg, or simply a private attribute like
_message instead:

$ python2.6
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class X(ValueError):
...     def __init__(self, message):
...         self.msg = message
...
>>> raise X("Test")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
__main__.X

Peter

From biopython at maubp.freeserve.co.uk Mon Sep 20 11:20:17 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 20 Sep 2010 12:20:17 +0100
Subject: [Biopython-dev] Parsing efetch results from the Journals database through Bio.Entrez
In-Reply-To:
References: <954716.31147.qm@web62402.mail.re1.yahoo.com>
Message-ID:

On Mon, Sep 20, 2010 at 12:17 PM, Peter wrote:
> However, the initial implementation of the deprecation warning triggered
> false positives (as in your code on Python 2.6.1), and this was fixed later
> in the Python 2.6.x series: http://bugs.python.org/issue6844

The commit for http://bugs.python.org/issue6844 changed the Python NEWS
file, so I can see now this was fixed in Python 2.6.3
http://svn.python.org/view?view=rev&revision=74848

Peter

From biopython at maubp.freeserve.co.uk Mon Sep 20 11:36:16 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 20 Sep 2010 12:36:16 +0100
Subject: [Biopython-dev] Python 3 status (ignoring our C code and most dependencies)
In-Reply-To:
References:
Message-ID:

On Sat, Sep 18, 2010 at 8:48 PM, Eric Talevich wrote:
>
> I fixed this one:
> http://github.com/biopython/biopython/commit/9b5faa2d1affdf439acfbbec911cde0862111005
>

Well, almost ;)
http://github.com/biopython/biopython/commit/5fca22f919eeb9c3161519818be9779f203d0a38

Works for me now :)

Peter

From mjldehoon at yahoo.com Mon Sep 20 13:05:56 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 20 Sep 2010 06:05:56 -0700 (PDT)
Subject: [Biopython-dev] Parsing efetch results from the Journals database through Bio.Entrez
In-Reply-To:
Message-ID: <964724.85299.qm@web62406.mail.re1.yahoo.com>

OK, msg it is.

--Michiel.
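The workaround agreed in this thread (store the text under msg rather than message) can be sketched roughly as follows. This is only an illustration: the class name ParserError and the helper raise_and_collect are hypothetical and are not taken from Bio/Entrez/Parser.py, and calling the base-class constructor plus defining __str__ are additions that go beyond the interpreter session quoted above.

```python
import warnings


class ParserError(ValueError):
    """Hypothetical exception class; the name is illustrative only."""

    def __init__(self, message):
        # Hand the text to the base class so that args is populated.
        ValueError.__init__(self, message)
        # Store the text as "msg" rather than "message": assigning to
        # "message" is what triggered the spurious DeprecationWarning
        # reported on Python 2.6.1 in this thread.
        self.msg = message

    def __str__(self):
        return self.msg


def raise_and_collect():
    """Raise ParserError with all warnings enabled.

    Returns the exception text and any DeprecationWarning records caught.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            raise ParserError("Test")
        except ParserError as err:
            text = str(err)
    deprecations = [w for w in caught
                    if issubclass(w.category, DeprecationWarning)]
    return text, deprecations
```

Calling raise_and_collect() gives a quick way to check on a given interpreter whether the renamed attribute avoids the warning.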
From tiagoantao at gmail.com Tue Sep 28 11:41:16 2010
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 28 Sep 2010 12:41:16 +0100
Subject: [Biopython-dev] Continuous integration
Message-ID:

Hi,

I've been playing with buildbot a bit (for continuous integration stuff). I am creating a page on the wiki with some info on that front.
This is just concept/exploratory stuff: if people don't like it, it is just a question to delete the page.
Hopefully this will at least permit to see if continuous integration is worthwhile the effort and if buildbot is a good platform for Biopython.
Any comments most welcome.

I expect to have a working prototype very soon. If people don't like it, I just trash it (no problems with that).

Tiago

--
"If you want to get laid, go to college. If you want an education, go to the library."
- Frank Zappa

From krother at rubor.de Tue Sep 28 14:04:06 2010
From: krother at rubor.de (Kristian Rother)
Date: Tue, 28 Sep 2010 16:04:06 +0200
Subject: [Biopython-dev] Report from the Python & Friends workshop
Message-ID: <1f14b30493e399d4252107455f03563e-EhVcX1xCQgFaRwICBxEAXR0wfgFLV15YQUBGAEFfUC9ZUFgWXVpyH1RXX0FaLkQAV19fSFpTXA4=-webmailer1@server08.webmailer.hosteurope.de>

Hi,

I'd like to share a report from a recent bioinformatics Python meeting (basically because we did Biopython work there). Sorry for crossposting.

At the Python & Friends Workshop 2010 in Karpacz, 20 Python bioinformaticians spent three busy days programming and learning about the newest technologies. The Polish & German ISCB Student Council members brought together an equal number of people from both countries.

Some highlights of the workshop were:

* A keynote on Biopython and subsequent tutorial were given by Peter Cock. During the session, the brand new GSOC code branch for analyzing PDB structures was tested. We found the center of mass, coarse-grained models, and renumbering features working well, but the hydrogenation and SS-bond calculation require further testing.
* The K-means clustering algorithm implemented in numpy was explained in detail, and an intense numpy tutorial was prepared by Bartosz Telenczuk.
* A live demonstration on how to search & cluster ligands with the Schroedinger software package was given by Ewa Bielska.
* The Galaxy tool integration workbench was a focus of interest for many participants, and Sebastian Schultheiss gave a hands-on tutorial. A closer interaction of the Galaxy platform with Biopython would probably be useful for many people.
* In a hackathon session, a memory leak in the KDTree code of Biopython was closed.
* And of course, there was a hike into the mountains near the Czech border.

The workshop took place in an informal setting bringing together groups with a variety of backgrounds.
According to participants, it made sense to find out what tools are there, and to plan strategic issues like py3k migration.

The workshop was organized by Teresa Szczepinska (ISCB Student Council, regional group Poland), Sebastian Schultheiss (ISCB Student Council, regional group Germany), Stefan Günther (Freiburg) and Kristian Rother (Poznan).

Best Regards,
Kristian

From krother at rubor.de Tue Sep 28 14:04:49 2010
From: krother at rubor.de (Kristian Rother)
Date: Tue, 28 Sep 2010 16:04:49 +0200
Subject: [Biopython-dev] RNA Alphabet with modified nucleotides
Message-ID:

Hi,

finally got some more tests and cleanup done on Bio.RNA.RNASeq. The current branch
http://github.com/krother/biopython/commits/rna_alphabet
contains:

- doctests
- unit tests
- copyright statements
- added RNASeq object to test_Seq_objs.py
- compatibility with the Seq class (there is a caveat though: functions like complement/translate/reverse_complement don't make sense on sequences containing modified nucleotides, and result in an exception.)

Also, there is a branch with ongoing bugfix work on the PDB GSOC code, but I need to correspond with Joao Rodrigues on this.

Best Regards,
Kristian

On 2010-07-01 15:26, Peter wrote:
> On Thu, Jul 1, 2010 at 2:01 PM, Kristian Rother wrote:
>> Hi,
>>
>> I've committed code + tests for representing RNA sequences with modified
>> nucleotides to a branch on Github. See:
>>
>> http://github.com/krother/biopython/commits/rna_alphabet
>>
>> I'm done with my list of 'most wanted' features for this class, including
>> suggestions from Peter.
>> What could I do next to help integrating the new code with the rest of
>> Biopython?
> Hi Kristian,
>
> I haven't had a play with the code, just a very brief look at it.
>
> You'll need to add licence and copyright statements.
>
> A few embedded doctests in the docstrings would be very nice
> to help explain how the new classes are to be used.
>
> What happens if you add some of the new DNA seq objects
> to test_Seq_objs.py? Is it all fine?
>
> Are you planning to add a reverse complement method etc? Or
> does the current fall back on the Seq implementation work OK?
>
> Peter
>

From bugzilla-daemon at portal.open-bio.org Tue Sep 28 18:47:39 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 28 Sep 2010 14:47:39 -0400
Subject: [Biopython-dev] [Bug 3139] New: python setup.py test ends with error code 0 even on failure
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=3139

           Summary: python setup.py test ends with error code 0 even on failure
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: tiagoantao at gmail.com

As per the subject: python setup.py test ends with error code zero even if there are failed tests. This is problematic for tools evaluating the outcome of tests (e.g. integration testing).

From bugzilla-daemon at portal.open-bio.org Tue Sep 28 21:36:34 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 28 Sep 2010 17:36:34 -0400
Subject: [Biopython-dev] [Bug 3139] python setup.py test ends with error code 0 even on failure
In-Reply-To:
Message-ID: <201009282136.o8SLaYA0013308@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3139

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-09-28 17:36 EST -------
Hi Tiago,

I removed a sys.exit(...) call from run_tests.py because it was annoying if running the tests from IDLE.
With hindsight that was short-sighted of me and should be reverted pending a better solution - see:
http://github.com/biopython/biopython/commit/7def689001f1f457d754703d0f233f82b33c9074

Furthermore the TestRunner class method run should return a value to be used as the return code. Since 0 is success, and 2 is already used for bad args, I suggest 1 for at least one test failure.

If we have "python run_tests.py" returning useful error codes, then it is easier (but not essential) to do the same for "python setup.py test".

Peter

From bugzilla-daemon at portal.open-bio.org Wed Sep 29 19:36:10 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 29 Sep 2010 15:36:10 -0400
Subject: [Biopython-dev] [Bug 3139] python setup.py test ends with error code 0 even on failure
In-Reply-To:
Message-ID: <201009291936.o8TJaAg7028406@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3139

------- Comment #2 from tiagoantao at gmail.com 2010-09-29 15:36 EST -------
> Furthermore the TestRunner class method run should return a value to be used
> as the return code. Since 0 is success, and 2 is already used for bad args,
> I suggest 1 for at least one test failure.
>
> If we have "python run_tests.py" returning useful error codes, then it is
> easier (but not essential) to do the same for "python setup.py test".

For now I will hack away at my test scripts with buildbot. But if people like integration testing, we might have to revisit this to have a cleaner system...