From bugzilla-daemon at portal.open-bio.org Mon Feb 1 06:17:58 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Feb 2010 06:17:58 -0500
Subject: [Biopython-dev] [Bug 3004] PSL alignment format parsing
In-Reply-To:
Message-ID: <201002011117.o11BHwib023118@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3004

biopython-bugzilla at maubp.freeserve.co.uk changed:

   What    |Removed                                    |Added
----------------------------------------------------------------------------
         CC|                                           |biopython-bugzilla at maubp.freeserve.co.uk
    Summary|PSL alignment format parsing in Bio.AlignIO|PSL alignment format parsing

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-01 06:17 EST -------
(In reply to comment #2)
> Now on github:
>
> http://github.com/vforget/PyBLATPSL
>
> Vince
>

Thanks for the link.

I don't see how this connects to sequence alignments for Bio.AlignIO as suggested in your original comment (bug title edited accordingly).

I see you are parsing tabular output into an object, with additional methods for scores etc. This looks fairly useful, but is not appropriate for the Bio.AlignIO module. Maybe it can go under a new namespace instead, maybe Bio.BLAT?

Peter

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Mon Feb 1 06:27:00 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Feb 2010 06:27:00 -0500
Subject: [Biopython-dev] [Bug 3000] Could SeqIO.parse() store the whole, unparsed multiline entry?
In-Reply-To: Message-ID: <201002011127.o11BR0lp023326@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3000 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-01 06:26 EST ------- (In reply to comment #1) > (In reply to comment #0) > > Still, I suspect this will > > reformat the entry (currently I see trailing dot removed from KEYWORDS, no > > REFERENCE, AUTHORS, TITLE, JOURNAL, PUBMED; and FEATURES.source being > > re-ordered). > > Yes, using Bio.SeqIO to read/write a GenBank record will give you (slightly) > different output. We do not guarantee a 100% round trip (even on simpler > formats like FASTA). Even little things like line wrapping would make this > very difficult. > > Regarding GenBank KEYWORDS, please file a bug. Don't worry about reporting a bug for this, I've just fixed the missing period for KEYWORDS: http://github.com/biopython/biopython/commit/5a87b070fc1f4fb911d4cf8a2e53c330cd6bd83d Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Feb 1 08:35:11 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 1 Feb 2010 08:35:11 -0500 Subject: [Biopython-dev] [Bug 2294] Writing GenBank files with Bio.SeqIO In-Reply-To: Message-ID: <201002011335.o11DZBcJ029190@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2294 ------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-01 08:35 EST ------- (In reply to comment #16) > > > * Writing references > > Not done yet, but for my personal needs this is low priority. 
Reference output in GenBank format from SeqIO just committed on github, http://github.com/biopython/biopython/commit/42707bda738d0239a9ff85a39c39c89c8024549d > > * Extending to cover writing EBML files > > Not done yet, but should be comparatively straight forward. Let's track this > possible enhancement on a separate bug. EMBL output in SeqIO was done a while ago and was included in Biopython 1.52 (although we don't yet write references in EMBL output). Things still to do on GenBank output include better handling of the LOCUS line, such as the data division. See also Bug 2578 for the molecule type. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Feb 1 09:43:41 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 1 Feb 2010 09:43:41 -0500 Subject: [Biopython-dev] [Bug 2294] Writing GenBank files with Bio.SeqIO In-Reply-To: Message-ID: <201002011443.o11EhfAT031724@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2294 ------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-01 09:43 EST ------- (In reply to comment #17) > > EMBL output in SeqIO was done a while ago and was included in Biopython 1.52 > (although we don't yet write references in EMBL output). References in EMBL output implemented now: http://github.com/biopython/biopython/commit/370e02053a45aec6209bd826aebab7bfc29d7e84 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Feb 2 13:37:25 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 2 Feb 2010 18:37:25 +0000 Subject: [Biopython-dev] Getting raw unparsed records with SeqIO? 
Message-ID: <320fb6e01002021037p75d24f43w84837c426b057093@mail.gmail.com> Hi all, Over on enhancement Bug 3000, Martin was asking about getting raw unparsed strings for each record in a sequence file: http://bugzilla.open-bio.org/show_bug.cgi?id=3000 This makes sense for sequential files like FASTA and GenBank, but not for interlaced files like PHYLIP, and has less obvious uses when there is any kind of header or footer (e.g. XML or SFF files). The particular example Martin gave was selecting a subset of records in a large GenBank file (I've done this myself in the past). While this can be done via Bio.SeqIO, the process of parsing the data into a SeqRecord and saving it again is lossy. While there is room for improvement. For this particular example, I suggested Martin use the "old" iterator class in Bio.GenBank. In general things like white space and wrapping mean that a SeqIO parse/write cannot guarantee a 100% unaltered round trip, and will also be slower than using the raw record as a string. Martin suggested adding an optional argument to the parse function. I'm not sure this is a good API choice, as it would dramatically alter the return values. Perhaps we could have a new iterator function in Bio.SeqIO for suitable sequential files only which returns a series of strings, one for each record, unmodified? Either way I don't see how this would be used - surely the user would need to do some basic analysis of each raw record to decide how to process it? In this example, they would need to extract the ID/accession to see if they want to output the record or not. While parsing the record into a SeqRecord may not be needed, in most cases the record identifier would be very useful - and this has some big overlaps with the Bio.SeqIO.index() code which already breaks up files into records and extracts their identifiers. i.e. A top level Bio.SeqIO function to iterate over a file returning tuples of the record identifier and the raw record as strings *could* be useful. 
Implementing this nicely would mean re-factoring Bio.SeqIO.index() extensively. Another solution to this task (extracting the raw GenBank records from a large file) would seem to be to extend the Bio.SeqIO.index functionality. The patch I'm about to attach to Bug 3000 adds a new "get_raw" method to the dictionary like object we return. Unlike the __getitem__ and get methods which return a SeqRecord this just gives the raw string. Note that I haven't implemented this for all the index support file formats yet, and this has had only very basic testing. Writing this email took longer than writing the code. However, I hope it illustrates the idea enough for a discussion. As an example how the index function could be used with this patch: >>> from Bio import SeqIO >>> data = SeqIO.index("cor6_6.gb", "gb") >>> data.keys() ['L31939.1', 'AJ237582.1', 'X62281.1', 'AF297471.1', 'X55053.1', 'M81224.1'] >>> print data.get_raw("X62281.1") LOCUS ATKIN2 880 bp DNA PLN 23-JUL-1992 DEFINITION A.thaliana kin2 gene. ACCESSION X62281 ... // What are people's thoughts on this? Peter From bugzilla-daemon at portal.open-bio.org Tue Feb 2 13:40:07 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 2 Feb 2010 13:40:07 -0500 Subject: [Biopython-dev] [Bug 3000] Could SeqIO.parse() store the whole, unparsed multiline entry? In-Reply-To: Message-ID: <201002021840.o12Ie7pO015898@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3000 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-02 13:40 EST ------- Created an attachment (id=1436) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1436&action=view) Adds a get_raw method to the dictionaries returned by Bio.SeqIO.index() Outline implementation of an alternative proposal, allowing access to the raw text for each record via the Bio.SeqIO.index() dictionary like objects. 
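As a rough illustration of the idea discussed in this thread (pairing each record identifier with its unmodified text), here is a minimal sketch in plain Python. It is not the attached patch and not Biopython code: the function names `iter_raw_records` and `select_raw` are hypothetical, and it assumes GenBank-style records that end with a line containing only `//`, taking the identifier from the VERSION line and falling back to ACCESSION. The toy records are made up for the example.

```python
import io


def _identifier(lines):
    """Return the VERSION identifier, or the ACCESSION if there is no VERSION."""
    accession = None
    for line in lines:
        fields = line.split()
        if line.startswith("VERSION") and len(fields) > 1:
            return fields[1]
        if line.startswith("ACCESSION") and accession is None and len(fields) > 1:
            accession = fields[1]
    return accession


def iter_raw_records(handle):
    """Yield (identifier, raw_text) tuples, one per record, text unmodified.

    Assumption: every record is terminated by a line that is exactly "//".
    Anything after the last terminator is silently ignored.
    """
    lines = []
    for line in handle:
        lines.append(line)
        if line.rstrip() == "//":
            yield _identifier(lines), "".join(lines)
            lines = []


def select_raw(handle, wanted):
    """Concatenate the raw text of the records whose identifier is wanted."""
    wanted = set(wanted)
    return "".join(raw for ident, raw in iter_raw_records(handle) if ident in wanted)


# Toy input (hypothetical records, heavily abbreviated).
toy = io.StringIO(
    "LOCUS       TOY1\nACCESSION   TOY1\nVERSION     TOY1.1\nORIGIN\n        1 atgc\n//\n"
    "LOCUS       TOY2\nACCESSION   TOY2\nORIGIN\n        1 ggcc\n//\n"
)
subset = select_raw(toy, ["TOY1.1"])
```

Because the text is never parsed into a SeqRecord, whatever is written out is byte-for-byte what was read, which is the point of the raw access being proposed.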
See discussion here: http://lists.open-bio.org/pipermail/biopython-dev/2010-February/007301.html -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From krother at rubor.de Wed Feb 3 05:29:04 2010 From: krother at rubor.de (Kristian Rother) Date: Wed, 3 Feb 2010 11:29:04 +0100 Subject: [Biopython-dev] report: what happens on 'from Bio import PDB'? In-Reply-To: <201002021840.o12Ie7pO015898@portal.open-bio.org> References: <201002021840.o12Ie7pO015898@portal.open-bio.org> Message-ID: <18fbb8f40f6ec6efe3d5dffff68aaa57-EhVcX1xCQgFaRwICBxEAXR0wfgFLV15YQUBGAEFfUC9ZUFgWXVpyH1RXX0FdQU1tXlhfRVlcWgFbWw==-webmailer2@server03.webmailer.hosteurope.de> Hi, I'm currently checking what my application is using its memory for (because it uses way too much for non-Biopython related things). However, as soon as the simple command from Bio import PDB is executed, these are the objects that Python has in memory after running the gc: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 from module:Bio.GenBank.utils 1 from module:Bio.PDB.PDBIO 1 from module:Bio.PropertyManager 1 from module:os 2 2 2 2 2 2 2 2 2 2 from module:Bio.GenBank.LocationParser 2 from module:xml.sax.handler 3 3 3 3 4 5 6 6 6 from module:numpy.ma.extras 7 7 from module:Bio.Alphabet.IUPAC 7 from module:__future__ 8 9 10 13 14 15 16 16 19 27 35 35 36 38 49 56 56 from module:Bio.Alphabet 68 76 91 95 from module:numpy.ma.core 201 203 225 350 351 from module:Bio.Data.CodonTable 360 385 393 407 579 837 1365 2073 3191 3289 4099 11989 19718 total 50912 Hope this is useful ;-) Best Regards, Kristian From lplp90 at gmail.com Wed Feb 3 06:35:49 2010 From: lplp90 at gmail.com (Laura Padioleu) Date: Wed, 3 Feb 2010 12:35:49 +0100 Subject: [Biopython-dev] Multiple alignment - Clustalw etc... 
Message-ID: <771202501002030335r52f62fccta1b976f9e586d435@mail.gmail.com> On Mon, Mar 30, 2009 at 12:42 PM, Cymon Cox > wrote: >* *>* Hi Folks, *>* *>* this is a demo that i use to create then align my fasta sequences using clustalw. Hope it helps. here's the code * >def clustal(list_struc): > > > hash_table={} > for i in range (len(list_struc)): > for j in range (i+1,len(list_struc)): > pair=(list_struc[i],list_struc[j]) > hash_table >[pair]=0 > > > for pair in hash_table >.keys(): > fasta_fic=open("fasta.fasta",'w') > for ID in pair: > fasta_fic.write(">"+ID.get_id()+'\n') > > # recuperation des sequences des acides amines > for chain in ID.get_chains(): > ppb = PPBuilder() > > pp = ppb.build_peptides(chain) > # l'ajout des sequences aux fichiers fasta > fasta_fic.write(pp[0].get_sequence().tostring()) > fasta_fic.write('\n') > fasta_fic.close() > cline = ClustalwCommandline(cmd="clustalw", infile="file.fasta") > return_code = subprocess.call(str(cline), shell=(sys.platform!="win32")) > > alignment = AlignIO.read(open("file"+str(nb)+".aln"),"clustal") > > > j=0 > i=0 > for record in alignment: > for amino_acid in record.seq: > if amino_acid == '-': > pass > else: > if amino_acid == alignment[0].seq[j]: > i += 1 > j += 1 > j = 0 > >seq = str(record.seq) > gap_strip = seq.replace('-', '') > percent = 100.0*i/len(seq) > > i=0 > hash_table[pair]=str(percent)+"\t"+str(percent2) > > > return hash_table > >def csv_writer(list_struc): > hash_table=clustal(list_struc) > csv_fic=open("file.csv",'a') > for couple in hash_table.keys(): > csv_fic.write(pari[0].get_id()+"\t"+str(hash_table[pair])+'\n') > csv_fic.close()* Hello, im using python version 2.5 but i can't compile this code correctly what version of python and biopython you are using ? Thanks From chapmanb at 50mail.com Wed Feb 3 07:46:48 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 3 Feb 2010 07:46:48 -0500 Subject: [Biopython-dev] Multiple alignment - Clustalw etc... 
In-Reply-To: <771202501002030335r52f62fccta1b976f9e586d435@mail.gmail.com> References: <771202501002030335r52f62fccta1b976f9e586d435@mail.gmail.com> Message-ID: <20100203124648.GC40046@sobchak.mgh.harvard.edu> Hi Laura; [clustalw example from Cymon] > im using python version 2.5 but i can't compile this code correctly > what version of python and biopython you are using ? We could help more with some additional information. Could you copy and paste the error message you are seeing? Brad From cy at cymon.org Wed Feb 3 07:48:49 2010 From: cy at cymon.org (Cymon Cox) Date: Wed, 3 Feb 2010 12:48:49 +0000 Subject: [Biopython-dev] Multiple alignment - Clustalw etc... In-Reply-To: <7265d4f1002030412l1258237jf50ff37845e7c5a5@mail.gmail.com> References: <771202501002030335r52f62fccta1b976f9e586d435@mail.gmail.com> <7265d4f1002030412l1258237jf50ff37845e7c5a5@mail.gmail.com> Message-ID: <7265d4f1002030448n28065ea1ifc411cf0c7b462e8@mail.gmail.com> ---------- Forwarded message ---------- From: Cymon Cox Date: 3 February 2010 12:12 Subject: Re: [Biopython-dev] Multiple alignment - Clustalw etc... To: Laura Padioleu Hi Laura, On 3 February 2010 11:35, Laura Padioleu wrote: > On Mon, Mar 30, 2009 at 12:42 PM, Cymon Cox > wrote: > >* > *>* Hi Folks, > Yes, I did write that... Hello, > > im using python version 2.5 but i can't compile this code correctly > what version of python and biopython you are using ? > How exactly are you using this code? What error do you get? Can you cut and paste a session from the terminal? Cheers, C. -- From chapmanb at 50mail.com Wed Feb 3 07:55:52 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 3 Feb 2010 07:55:52 -0500 Subject: [Biopython-dev] Getting raw unparsed records with SeqIO? 
In-Reply-To: <320fb6e01002021037p75d24f43w84837c426b057093@mail.gmail.com>
References: <320fb6e01002021037p75d24f43w84837c426b057093@mail.gmail.com>
Message-ID: <20100203125552.GD40046@sobchak.mgh.harvard.edu>

Hi Peter;

> Another solution to this task (extracting the raw GenBank
> records from a large file) would seem to be to extend the
> Bio.SeqIO.index functionality. The patch I'm about to
> attach to Bug 3000 adds a new "get_raw" method to the
> dictionary like object we return. Unlike the __getitem__
> and get methods which return a SeqRecord this just gives
> the raw string.
[...]
> >>> from Bio import SeqIO
> >>> data = SeqIO.index("cor6_6.gb", "gb")
> >>> data.keys()
> ['L31939.1', 'AJ237582.1', 'X62281.1', 'AF297471.1', 'X55053.1', 'M81224.1']
> >>> print data.get_raw("X62281.1")
> LOCUS ATKIN2 880 bp DNA PLN 23-JUL-1992
> DEFINITION A.thaliana kin2 gene.
> ACCESSION X62281
> ...
> //
>
> What are people's thoughts on this?

Not much to add, but a +1 from me. This sounds like a solid solution and makes sense for the use case I can think of, which is picking out records of interest from a large file and re-writing them in a smaller file.

Brad

From bugzilla-daemon at portal.open-bio.org Wed Feb 3 16:44:14 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 3 Feb 2010 16:44:14 -0500
Subject: [Biopython-dev] [Bug 1999] new frame translation method
In-Reply-To:
Message-ID: <201002032144.o13LiERA027299@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1999

------- Comment #3 from eric.talevich at gmail.com 2010-02-03 16:44 EST -------
Can we split this into two functions? I tried this function today, hoping it would help me get a list of ORFs from a big contig -- but both frameTranslations and six_frame_translation do two things without stopping in between:

1. Translate the DNA or RNA sequence to amino acids in all six frames
2. Pretty-print the six-frame translation

So, how about factoring out just this piece (or similar):

def translate_six_frames(seq, genetic_code=1):
    """Dictionary of 6-frame translations."""
    anti = seq.reverse_complement()
    frames = {}
    for i in range(0,3):
        frames[i+1] = seq[i:].translate(genetic_code)
        frames[-i-1] = SeqUtils.reverse(anti[i:].translate(genetic_code))
    return frames

Then either pretty-printer can call this internally, and the user also has access to the individual translated sequences.
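For readers who want to experiment with the "translate only, no pretty-printing" split without Biopython installed, the following is a self-contained sketch in plain Python. It is not the code from the bug report: it hard-codes the standard genetic code, works on plain strings rather than Seq objects, and makes two labelled assumptions, namely that frame -1, -2, -3 mean offsets 0, 1, 2 into the reverse complement, and that minus-frame proteins are returned in their own reading direction (they are not reversed for display).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an unambiguous upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate whole codons only; unknown codons become 'X'."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def translate_six_frames(dna):
    """Return {frame: protein} for frames +1, +2, +3, -1, -2, -3.

    Assumption: minus frames are offsets into the reverse complement.
    """
    dna = dna.upper()
    anti = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(anti[offset:])
    return frames


frames = translate_six_frames("ATGGCCTAA")
# frames[1] is "MA*" and frames[-1] is "LGH"
```

A pretty-printer could then be layered on top of the returned dictionary, leaving the individual translations available for tasks such as ORF finding.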
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Wed Feb 3 18:13:10 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 3 Feb 2010 23:13:10 +0000 Subject: [Biopython-dev] [Bug 3000] Could SeqIO.parse() store the whole, unparsed multiline entry? In-Reply-To: <4B6995D0.3030405@fold.natur.cuni.cz> References: <201002021840.o12Ie88i015906@portal.open-bio.org> <4B6995D0.3030405@fold.natur.cuni.cz> Message-ID: <320fb6e01002031513r1faac5faicf027daf5da77d80@mail.gmail.com> On Wed, Feb 3, 2010 at 3:27 PM, Martin MOKREJ? wrote: > > Hi Peter, > ?thank you very much for all your efforts. I will try to get to testing the cvs > code in few days. Definitely will keep you updated. ;) > Martin > > bugzilla-daemon at portal.open-bio.org wrote: >> http://bugzilla.open-bio.org/show_bug.cgi?id=3000 >> ... The patch hasn't been checked in, but should apply to either the master branch in github or (I expect) Biopython 1.53 I'm looking forward to feedback. Peter From bugzilla-daemon at portal.open-bio.org Thu Feb 4 10:20:51 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 4 Feb 2010 10:20:51 -0500 Subject: [Biopython-dev] [Bug 1999] new frame translation method In-Reply-To: Message-ID: <201002041520.o14FKp9j000360@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1999 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-04 10:20 EST ------- (In reply to comment #3) > Can we split this into two functions? I tried this function today, hoping it > would help me get a list of ORFs from a big contig -- but both > frameTranslations and six_frame_translation do two things without stopping in > between: > > 1. 
Translate the DNA or RNA sequence to amino acids in all six frames

I'd wondered about this - possibly as a generator/iterator which always gives back exactly six sequences - but don't really see much point. There is also going to be some debate about how frames are labelled (especially the minus frames).

> 2. Pretty-print the six-frame translation

Personally I don't see this as being very useful, but someone must like it. I lean to just deprecating and removing this code.

> So, how about factoring out just this piece (or similar):
>
> def translate_six_frames(seq, genetic_code=1):
>     """Dictionary of 6-frame translations."""
>     anti = seq.reverse_complement()
>     frames = {}
>     for i in range(0,3):
>         frames[i+1] = seq[i:].translate(genetic_code)
>         frames[-i-1] = SeqUtils.reverse(anti[i:].translate(genetic_code))
>     return frames

You should be taking the reverse complement, not just the reverse. This would just be seq[i:].reverse_complement() or seq.reverse_complement()[i:] depending on how you label the reverse frames.

Peter

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk Thu Feb 4 10:30:47 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 4 Feb 2010 15:30:47 +0000
Subject: [Biopython-dev] Getting raw unparsed records with SeqIO?
In-Reply-To: <20100203125552.GD40046@sobchak.mgh.harvard.edu>
References: <320fb6e01002021037p75d24f43w84837c426b057093@mail.gmail.com> <20100203125552.GD40046@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e01002040730q7835e1a1uc784dfaae5faaef2@mail.gmail.com>

On Wed, Feb 3, 2010 at 12:55 PM, Brad Chapman wrote:
>
> Not much to add, but a +1 from me. This sounds like a solid solution
> and makes sense for the use case I can think of, which is picking
> out records of interest from a large file and re-writing them in a
> smaller file.
>

Let's give Martin a chance to test with the patch, and see how he gets on. I'm curious if anyone can come up with other examples of how this could be applied, which would help justify adding it to Bio.SeqIO.

Peter

From bugzilla-daemon at portal.open-bio.org Mon Feb 8 12:08:33 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 8 Feb 2010 12:08:33 -0500
Subject: [Biopython-dev] [Bug 3006] New: esearch medline fails with xml format
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=3006

Summary: esearch medline fails with xml format
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: georg.lipps at fhnw.ch

I used to retrieve Pubmed records with Python 2.5.1; however, lately the efetch with xml produces an error.
The problem has arisen at the year change, maybe related to the DTD definition file.

Here is a short piece of code that produces the error:

from Bio import Entrez
from Bio import Medline

def retrieve_medline(doi):
    # Uses the doi to obtain the medline id and then retrieves the medline entry
    # Returns the medline entry as text and python object or an empty string
    print "...queing medline with DOI", doi
    handle = Entrez.esearch(db="pubmed", term=doi, retmode="XML")
    record=Entrez.read(handle)
    if record["Count"]<>"1": return None, None
    handle=Entrez.efetch(db="pubmed", id=record["IdList"], retmode="text", rettype="medline")
    xml=Entrez.efetch(db="pubmed", id=record["IdList"], retmode="XML", rettype="medline")
    return handle.read(), Entrez.read(xml)

doi='10.1038/nature07389'
article, xml=retrieve_medline(doi)
print article

OUTPUT:

Traceback (most recent call last):
  File "U:/Literatur/pdf to RM converter/test.py", line 24, in <module>
    article, xml=retrieve_medline(doi)
  File "U:/Literatur/pdf to RM converter/test.py", line 15, in retrieve_medline
    return handle.read(), Entrez.read(xml)
  File "C:\Program Files\python25\lib\site-packages\Bio\Entrez\__init__.py", line 283, in read
    record = handler.run(handle)
  File "C:\Program Files\python25\lib\site-packages\Bio\Entrez\Parser.py", line 95, in run
    self.parser.ParseFile(handle)
  File "C:\Program Files\python25\lib\site-packages\Bio\Entrez\Parser.py", line 131, in startElement
    return
UnboundLocalError: local variable 'object' referenced before assignment

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Feb 8 18:26:38 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 8 Feb 2010 18:26:38 -0500 Subject: [Biopython-dev] [Bug 3006] esearch medline fails with xml format In-Reply-To: Message-ID: <201002082326.o18NQcwP006902@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3006 ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2010-02-08 18:26 EST ------- I was not able to replicate this bug. Your example code ran correctly with Python 2.6, Biopython 1.53. Are you using the latest version of Biopython? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From sandford at ufl.edu Mon Feb 8 16:49:20 2010 From: sandford at ufl.edu (Michael Sandford) Date: Mon, 08 Feb 2010 16:49:20 -0500 Subject: [Biopython-dev] Where should feature intersection code go? Message-ID: <4B7086E0.1090501@ufl.edu> I'm working on a project that's looking for alternative splicing using solexa data instead of microarray data. Basically we've got a GFF file containing all the genes, introns and exons and 35M reads that have been placed into one of the various chromosomes via the excellent bowtie application out of Maryland. Bowtie output is documented here: http://bowtie-bio.sourceforge.net/manual.shtml#default-bowtie-output In summary it's roughly a cross between fastq and GFF. It's got the read name, strand, sequence the read aligned to, position, sequence, quality, and a few others. It seems like it could rather easily be coerced into a SeqRecord (http://biopython.org/DIST/docs/api/Bio.SeqRecord.SeqRecord-class.html). It might not get filled up completely, but it'd be better than handling things in a one-off way. The FeatureLocation class provides for approximate and exact locations (both start and stop positions). 
It seems like the correct location to put code that determines if two FeatureLocations overlap, or if one contains another, or is contained by another. Overall I'm talking about writing a bowtie .map parser and the comparison code for FeatureLocation. Would these be welcome features? Thanks, Mike From chapmanb at 50mail.com Mon Feb 8 20:04:25 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 8 Feb 2010 20:04:25 -0500 Subject: [Biopython-dev] Where should feature intersection code go? In-Reply-To: <4B7086E0.1090501@ufl.edu> References: <4B7086E0.1090501@ufl.edu> Message-ID: <20100209010425.GD2193@kunkel> Mike; > I'm working on a project that's looking for alternative splicing > using solexa data instead of microarray data. Basically we've got a > GFF file containing all the genes, introns and exons and 35M reads > that have been placed into one of the various chromosomes via the > excellent bowtie application out of Maryland. [...] > Overall I'm talking about writing a bowtie .map parser and the > comparison code for FeatureLocation. Would these be welcome > features? A .map parser would definitely be useful. Another suggestion is to get Bowtie to produce SAM format and use Pysam for parsing: http://code.google.com/p/pysam/ The advantage of SAM is that it's an emerging standard and a lot of downstream applications can use it. This way you can switch aligners in your workflow without much disruption. For doing feature overlaps, IntervalTree in bx-python is excellent: http://bitbucket.org/james_taylor/bx-python/wiki/Home http://bitbucket.org/james_taylor/bx-python/src/tip/lib/bx/intervals/intersection.pyx See the doc string of the IntervalTree class for how to use it. My normal workflow is to build an IntervalTree with the GFF features of your genome, and then loop through the alignment file finding features that each alignment intersects. For alternative splicing, are you using the raw genome or a built transcriptome for all possible combinations of exons? 
One practical thing to consider is that a read will not be aligned to the genome if it splits an exon/exon junction.

Hope this helps,
Brad

From bugzilla-daemon at portal.open-bio.org Tue Feb 9 20:42:28 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Feb 2010 20:42:28 -0500
Subject: [Biopython-dev] [Bug 2294] Writing GenBank files with Bio.SeqIO
In-Reply-To:
Message-ID: <201002100142.o1A1gSJJ022517@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2294

biopython-bugzilla at maubp.freeserve.co.uk changed:

   What    |Removed |Added
----------------------------------------------------------------------------
     Status|NEW     |RESOLVED
 Resolution|        |FIXED

------- Comment #19 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-09 20:42 EST -------
(In reply to comment #17)
>
> Things still to do on GenBank output include better handling of the LOCUS
> line, such as the data division. See also Bug 2578 for the molecule type.
>

I've added mappings for some EMBL divisions to suitable GenBank divisions.

I'm closing this bug now, as GenBank output does basically work now.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From rjalves at igc.gulbenkian.pt Wed Feb 10 13:30:05 2010
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Wed, 10 Feb 2010 18:30:05 +0000
Subject: [Biopython-dev] KEGG support
Message-ID: <4B72FB2D.4070808@igc.gulbenkian.pt>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi everyone,

KEGG support in Biopython has been mostly untouched for the past 8 years with only a few changes and test additions. There is code in the tree to work with the Enzyme and Compound databases but not for others such as GENES, ORTHOLOGY, DRUG, ...
Considering the fact that I will need to write some code to work with other formats I was planning to contribute and integrate it with the SeqIO interface. This will require some additional homework on my part. KEGG also has a SOAP based API [1]. Its functionality is in some respects comparable to NCBI eutils. Using the python SOAP library suds [2] I had no problem interacting with it. So just in case someone was already working on this secretly :) I would like to know to make my life easier. If not I would also like to know if you would be interested in the addition and finally what your thoughts are about the SOAP interface and the suds (optional) dependency. Just a word on suds. Even though the project has been around for a few years now, it's still not available in most Linux distros. In my personal experience it's probably the simplest and easiest to use SOAP library for python out there. Cheers, Renato [1] - http://www.genome.jp/kegg/soap/doc/keggapi_manual.html [2] - https://fedorahosted.org/suds/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkty+ygACgkQYh11EUYTX9Sb3wCgiQrS/HWOr96CEwHErx+RKBVQ 1VMAn1NOlNr/HZ/rmFuqKTlyOM/pZwqi =zBxB -----END PGP SIGNATURE----- From kellrott at gmail.com Wed Feb 10 15:12:10 2010 From: kellrott at gmail.com (Kyle) Date: Wed, 10 Feb 2010 12:12:10 -0800 Subject: [Biopython-dev] KEGG support In-Reply-To: <4B72FB2D.4070808@igc.gulbenkian.pt> References: <4B72FB2D.4070808@igc.gulbenkian.pt> Message-ID: I think external library dependencies should be avoided unless necessary. Would a tool like wsdl2py produce code that isn't dependent on an installed library? Alternatively, suds is LGPL based, could we just cannibalize the source code for the important classes?
Kyle On Wed, Feb 10, 2010 at 10:30 AM, Renato Alves wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi everyone, > > KEGG support in Biopython has been mostly untouched for the past 8 years > with only a few changes and test additions. There is code in the tree to > work with the Enzyme and Compound databases but not for others such as > GENES, ORTHOLOGY, DRUG, ... > > Considering the fact that I will need to write some code to work with > other formats I was planning to contribute and integrate it with the > SeqIO interface. This will require some additional homework on my part. > > KEGG also has a SOAP based API [1]. It's functionality could be in some > aspects compared to NCBI eutils. Using the python SOAP library suds [2] > I had no problem interacting with it. > > So just in case someone was already working on this secretly :) I would > like to know to make my life easier. If not I would also like to know if > you would be interested in the addition and finally what's your thought > about the SOAP interface and the suds (optional) dependency. > > Just a word on suds. Even though the project has been around for a few > years now, it's still not available in most Linux distros. On my > personal experience with it it's probably the simplest and easy to use > SOAP library for python out there. 
> > Cheers, > Renato > > [1] - http://www.genome.jp/kegg/soap/doc/keggapi_manual.html > [2] - https://fedorahosted.org/suds/ > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ > > iEYEARECAAYFAkty+ygACgkQYh11EUYTX9Sb3wCgiQrS/HWOr96CEwHErx+RKBVQ > 1VMAn1NOlNr/HZ/rmFuqKTlyOM/pZwqi > =zBxB > -----END PGP SIGNATURE----- > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From dalloliogm at gmail.com Wed Feb 10 17:13:04 2010 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 10 Feb 2010 23:13:04 +0100 Subject: [Biopython-dev] KEGG support In-Reply-To: <4B72FB2D.4070808@igc.gulbenkian.pt> References: <4B72FB2D.4070808@igc.gulbenkian.pt> Message-ID: <5aa3b3571002101413o55d04432vc76c230aa9c43252@mail.gmail.com> On Wed, Feb 10, 2010 at 7:30 PM, Renato Alves wrote: > KEGG support in Biopython has been mostly untouched for the past 8 years > with only a few changes and test additions. There is code in the tree to > work with the Enzyme and Compound databases but not for others such as > GENES, ORTHOLOGY, DRUG, ... > Hi, I had a terrible experience with parsing Kegg pathway's files: in the end I discovered that the files that are stored in their ftp don't correspond exactly to the diagrams that you can find in the web interface, as for example biochemical interactions don't have directionality while if you look at them on kegg/pathway you will see arrows. Some time ago I proposed to implement something similar to what you have said for kegg/pathway, but in the end I abandoned the effort, because I had problem both with suds and SOAPpy, and I wasn't satisfied by the annotations in KEGG. > > Considering the fact that I will need to write some code to work with > other formats I was planning to contribute and integrate it with the > SeqIO interface. 
This will require some additional homework on my part. > > If you are serious about that I may help you, but I can only work on the weekends and you should tell me exactly what I have to do :-) > KEGG also has a SOAP based API [1]. It's functionality could be in some > aspects compared to NCBI eutils. Using the python SOAP library suds [2] > I had no problem interacting with it. > Are you sure? I tried it on KEGG a year ago and I was having problems executing slightly more complex queries. If you look at suds's bug tracker, you will find some reports by me, like this one: - https://fedorahosted.org/suds/ticket/213 I remember that I was looping between the KEGG support centre and the suds bug tracker; both were very responsive to feedback and very keen to answer me, but in the end they didn't speak to each other and the bug reports that I have filed are still unfixed. Which library can you use for the soap queries? I had the feeling that SOAPpy (which I think is included in the standard lib) worked well with KEGG, however its development stopped many years ago ( http://sourceforge.net/projects/pywebsvcs/files/SOAP.py/), it is a mess if you want to use it behind an http_proxy (I should have a patch somewhere if you are interested) and I am sure it won't be kept compatible with the future versions of python. Another alternative may be beautiful soup, but I have never tried it. This question on stackoverflow may provide you some ideas: - http://stackoverflow.com/questions/206154/whats-the-best-soap-client-library-for-python-and-where-is-the-documentation-fo I am not sure about which is the standard soap library for python, and which one is included in the standard lib. If you are going to use SOAPpy, it is a bad bet for compatibility and maintenance in future releases. Suds is the best option but it is not in the standard lib, and they still have to fix the bugs I reported a year ago. I have the feeling that there is no good alternative for python.
Moreover, the WSDL functions that I have seen for KEGG are not especially useful. They seems to allow for the basic queries, but for most of the tasks it is better to download the ftp locally and work there. > So just in case someone was already working on this secretly :) I would > like to know to make my life easier. If not I would also like to know if > you would be interested in the addition and finally what's your thought > about the SOAP interface and the suds (optional) dependency. > > Just a word on suds. Even though the project has been around for a few > years now, it's still not available in most Linux distros. On my > personal experience with it it's probably the simplest and easy to use > SOAP library for python out there. > > Cheers, > Renato > > [1] - http://www.genome.jp/kegg/soap/doc/keggapi_manual.html > [2] - https://fedorahosted.org/suds/ > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ > > iEYEARECAAYFAkty+ygACgkQYh11EUYTX9Sb3wCgiQrS/HWOr96CEwHErx+RKBVQ > 1VMAn1NOlNr/HZ/rmFuqKTlyOM/pZwqi > =zBxB > -----END PGP SIGNATURE----- > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- Giovanni Dall'Olio, phd student Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain) My blog on bioinformatics: http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Wed Feb 10 17:16:14 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 10 Feb 2010 17:16:14 -0500 Subject: [Biopython-dev] [Bug 3009] New: Check the FASTA m10 alignment parser works with FASTA36 Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3009 Summary: Check the FASTA m10 alignment parser works with FASTA36 Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement 
Priority: P2 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Bill Pearson has just announced the release of FASTA36: http://faculty.virginia.edu/wrpearson/fasta/fasta36/ From his email, > This version is a major update from FASTA version 35. > It's main new feature is the ability to report all > statistically significant alignments between a query > and library sequence (equivalent to BLAST's multiple > HSPs). All previous versions of the FASTA program > reported only the best alignment between the query > and library sequence, a serious shortcoming when > comparing a query protein to a multi-exon gene or > multi-domain protein. We need to check the FASTA36 -m 10 output, add this to our unit tests, and update our parser as required. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dalloliogm at gmail.com Wed Feb 10 17:26:08 2010 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 10 Feb 2010 23:26:08 +0100 Subject: [Biopython-dev] KEGG support In-Reply-To: References: <4B72FB2D.4070808@igc.gulbenkian.pt> Message-ID: <5aa3b3571002101426p7b57f50aga270f0ea7eb8554f@mail.gmail.com> On Wed, Feb 10, 2010 at 9:12 PM, Kyle wrote: > I think external library dependancies should be avoided unless necessary. > Would a tool like wsdl2py produce code that isn't dependent on an > installed > library? Alternatively, suds is LGPL based, could we just cannibalize the > source code for the important classes? > Honestly I think that the best solution would be to make an external module to extend the basic biopython and to link it on the biopython's web page.
The core biopython should provide objects and infrastructures for biological data, but then the additional functionalities should go on separate modules linked on the biopython's web page, taking inspiration from BioConductor and installed with easy_install or a derivative. If we keep on maintaining a constraint that all biopython modules should have the same dependencies, then it is impossible to make anything more complex than the basic stuff, and then biopython will never be as useful as it could be. You can't make a good library for using WSDL services with SOAPpy, or plot nice graphics without matplotlib, or store data in HDF5 format, and there are many other examples. Bioinformatics is a very general word, people working on it have a big variety of needs, and it is difficult to satisfy them all with few dependencies. -- Giovanni Dall'Olio, phd student Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain) My blog on bioinformatics: http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Wed Feb 10 17:27:07 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Feb 2010 22:27:07 +0000 Subject: [Biopython-dev] KEGG support In-Reply-To: <4B72FB2D.4070808@igc.gulbenkian.pt> References: <4B72FB2D.4070808@igc.gulbenkian.pt> Message-ID: <320fb6e01002101427o5efc003es3402308e44b2533a@mail.gmail.com> On Wed, Feb 10, 2010 at 6:30 PM, Renato Alves wrote: > > Hi everyone, > > KEGG support in Biopython has been mostly untouched for the past 8 years > with only a few changes and test additions. There is code in the tree to > work with the Enzyme and Compound databases but not for others such as > GENES, ORTHOLOGY, DRUG, ... > > Considering the fact that I will need to write some code to work with > other formats I was planning to contribute and integrate it with the > SeqIO interface. This will require some additional homework on my part. Excellent news.
Have you looked at the existing KEGG parsers in Biopython, and do you think the current style is suitable? (I haven't looked at the code recently myself, but will do). Regarding the SeqIO interface (for KEGG GENES only?), I would be happy to advise. Initially I suggest you work on adding a parser much like the other KEGG parsers, returning gene records. Then we can add a Bio/SeqIO/KeggGeneIO.py wrapper to turn these into SeqRecord objects. > KEGG also has a SOAP based API [1]. It's functionality could be in some > aspects compared to NCBI eutils. Using the python SOAP library suds [2] > I had no problem interacting with it. I have not used SOAP, and have a personal preference for REST style APIs. However, if that is what KEGG offers, this is worth considering. I think Brad has some experience with (other) SOAP services in Python. Note the KEGG documentation suggests using SOAPpy for Python. Interestingly, KEGG are however looking into providing RDF (and perhaps one day SPARQL endpoints). I will try and find out what sort of time scale they have in mind while I am at the BioHackathon 2010 this week - http://hackathon3.dbcls.jp/ For now, I would prioritise the KEGG flat file parsers. Peter From biopython at maubp.freeserve.co.uk Wed Feb 10 17:37:03 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Feb 2010 22:37:03 +0000 Subject: [Biopython-dev] KEGG support In-Reply-To: References: <4B72FB2D.4070808@igc.gulbenkian.pt> Message-ID: <320fb6e01002101437p5f382ce0tb4cd864cec59a6c5@mail.gmail.com> On Wed, Feb 10, 2010 at 8:12 PM, Kyle wrote: > I think external library dependancies should be avoided unless necessary. > Would a tool like wsdl2py produce code that isn't dependent on an installed > library? Alternatively, suds is LGPL based, could we just cannibalize the > source code for the important classes? Working with SOAP is so complicated that using an external library would be the sensible option.
It would be an optional dependency (and would not be an install time dependency like NumPy), much like how we have an optional dependency on ReportLab just for Bio.Graphics, and now also the option to use NetworkX with the new Bio.Phylo code. Package management (e.g. under Linux distros) can mark these external modules as suggestions or soft requirements, making this quite straightforward. Regarding some of Giovanni's points, modularising the distribution of Biopython (which can already be considered to be a core plus assorted domain-specific modules like Bio.PDB, Bio.Cluster, Bio.Graphics and so on) seems premature to me given the current state of python distribution. Peter P.S. We can't take any GPL or LGPL code and incorporate it into Biopython, due to the nature of those licences. From anaryin at gmail.com Wed Feb 10 17:52:53 2010 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Wed, 10 Feb 2010 14:52:53 -0800 Subject: [Biopython-dev] KEGG support Message-ID: Hello all, For what it's worth: I worked with KEGG about a year and a half ago, to do some very basic things. I remember I tried using SOAPpy and ZSI. The first is a pain to install in Windows (at least then it was), so I opted for the second. However it has been quite outdated and I had some problems dealing with complex data types. Regarding modularising/non-modularising the code, I guess that some features will have to have dependencies that cannot be included in the core distribution, and thus the user should be warned that it needs library X or Y to have them work. In short, keeping the current structure seems the wisest IMO. I don't see the need to create outer modules. Lastly, good luck with KEGG's services' speed.
That API is slower than a turtle :x From rjalves at igc.gulbenkian.pt Wed Feb 10 19:44:59 2010 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Thu, 11 Feb 2010 00:44:59 +0000 Subject: [Biopython-dev] KEGG support In-Reply-To: <320fb6e01002101427o5efc003es3402308e44b2533a@mail.gmail.com> References: <4B72FB2D.4070808@igc.gulbenkian.pt> <320fb6e01002101427o5efc003es3402308e44b2533a@mail.gmail.com> Message-ID: <4B73530B.7090203@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 - From Peter on 02/10/2010 10:27 PM: > Excellent news. Have you looked at the existing KEGG parsers in > Biopython, and do you think the current style is suitable? (I haven't > looked at the code recently myself, but will do). The style seems good enough but I was thinking of having a more functional approach, at least for the parser, to try to get away from the massive if/elif/else cascades. The writer would come as second priority and would be similar although I would also try to keep code duplication at lower levels than what we can see in the Enzyme/__init__.py file. I would also consider using Genes.py instead of Genes/__init__.py ... I don't see the need for packages here. > Regarding the SeqIO interface (for KEGG GENES only?), I would be > happy to advise. Initially I suggest you work on adding a parser much > like the other KEGG parsers, returning gene records. Then we can > add a Bio/SeqIO/KeggGeneIO.py wrapper to turn these into SeqRecord > objects. Yes for now my main goal would be GENES. The other formats can probably grow from there. Your suggestion on the SeqIO seems reasonable. I'll try to have a prototype in the next days/weekend and we can discuss from there. > I have not used SOAP, and have a personal preference for REST style > APIs. However, if that is what KEGG offers, this is worth considering. > I think Brad has some experience with (other) SOAP services in Python. > Note the KEGG documentation suggests using SOAPpy for Python.
According to the http://www.genome.jp/kegg/docs/weblink.html page they do mention a REST like URL for generic entries, pathways and brite. But it seems more useful for external linking than as an API. I couldn't even figure out how to return the information in plaintext instead of the default HTML. About SOAPpy, I've nothing against it besides the fact that when I first tried I had few problems. Anyway it was a long time ago... I've only played with suds since. > Interestingly, KEGG are however looking into providing RDF (and > perhaps one day SPARQL endpoints). I will try and find out what sort > of time scale they have in mind while I am at the BioHackathon 2010 > this week - http://hackathon3.dbcls.jp/ We'll be waiting on your feedback on this :) > For now, I would prioritise the KEGG flat file parsers. Agreed. > Peter -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iEYEARECAAYFAktzUwgACgkQYh11EUYTX9SPcwCfSrNkIovs1vnPinuAtMFZQJYn pmAAnjHAAro2Ls/c1Nq4DCuliReaPm64 =Dohn -----END PGP SIGNATURE----- From rjalves at igc.gulbenkian.pt Wed Feb 10 19:53:03 2010 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Thu, 11 Feb 2010 00:53:03 +0000 Subject: [Biopython-dev] KEGG support In-Reply-To: <320fb6e01002101437p5f382ce0tb4cd864cec59a6c5@mail.gmail.com> References: <4B72FB2D.4070808@igc.gulbenkian.pt> <320fb6e01002101437p5f382ce0tb4cd864cec59a6c5@mail.gmail.com> Message-ID: <4B7354EF.8020703@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 - From Peter on 02/10/2010 10:37 PM: > On Wed, Feb 10, 2010 at 8:12 PM, Kyle wrote: >> I think external library dependancies should be avoided unless necessary. >> Would a tool like wsdl2py produce code that isn't dependent on an installed >> library? Alternatively, suds is LGPL based, could we just cannibalize the >> source code for the important classes? > > Working with SOAP is so complicated that using an external library > would be the sensible option. 
It would be an optional dependency > (and would not be an install time dependency like NumPy), much > like how we have a optional dependency on ReportLab just for > Bio.Graphics, and now also the option to use NetworkX with the > new Bio.phylo code. Yes that would be my idea on the SOAP interface. If doable we could even evaluate the possibility of having some abstraction layer that could enable the use of SOAPpy or suds if either is already available on the system. > Package management (e.g. under Linux distros) can mark these > external modules as suggestions or soft requirements, making > this quite straight forward. The 'or' case for soap libraries would also fit in this scheme since most package managers already support this kind of feature. > Regarding some of Giovanni's points, modularising the distribution > of Biopython (which can already be considered to be a core plus > assorted domain-specific modules like Bio.PDB, Bio.Cluster, > Bio.Graphics and so on) seems premature to me give the current > state of python distribution. Could you elaborate a little on what you mean by 'current state of python...'. Are you referring to the python3 transition? Renato -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iEYEARECAAYFAktzVO0ACgkQYh11EUYTX9S1ngCfYFiW7VeNu6atl0J1eViqquSo PCIAn3KO2p//fRYpZVC0QSp2gITP/n2I =uTTc -----END PGP SIGNATURE----- From chapmanb at 50mail.com Wed Feb 10 19:56:00 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 10 Feb 2010 19:56:00 -0500 Subject: [Biopython-dev] KEGG support In-Reply-To: <4B73530B.7090203@igc.gulbenkian.pt> References: <4B72FB2D.4070808@igc.gulbenkian.pt> <320fb6e01002101427o5efc003es3402308e44b2533a@mail.gmail.com> <4B73530B.7090203@igc.gulbenkian.pt> Message-ID: <20100211005600.GB1923@kunkel> Renato; Great idea to work with the KEGG parsers. Very happy to have someone tackling this. 
> According to the http://www.genome.jp/kegg/docs/weblink.html page they > do mention a REST like URL for generic entries, pathways and brite. But > it seems more useful for external linking than as an API. I couldn't > even figure out how to return the information in plaintext instead of > the default HTML. About SOAPpy, I've nothing against it besides the fact > that when I first tried I had few problems. Anyway it was a long time > ago... I've only played with suds since. My suggestion would be to use the TogoWS REST interface http://togows.dbcls.jp/site/en/rest.html It makes getting records crazy easy. There are tons of examples, but for GENES, here's how to get the plain text record: http://togows.dbcls.jp/entry/gene/eco:b0002 If you really want to use SOAP, my experience has been best with suds. However, the complexities of SOAP are really not worth it if you can get REST approaches to do what you need. Hope this helps, Brad From rjalves at igc.gulbenkian.pt Wed Feb 10 20:14:52 2010 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Thu, 11 Feb 2010 01:14:52 +0000 Subject: [Biopython-dev] KEGG support In-Reply-To: <5aa3b3571002101413o55d04432vc76c230aa9c43252@mail.gmail.com> References: <4B72FB2D.4070808@igc.gulbenkian.pt> <5aa3b3571002101413o55d04432vc76c230aa9c43252@mail.gmail.com> Message-ID: <4B735A0C.8070902@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 - From Giovanni Marco Dall'Olio on 02/10/2010 10:13 PM: > Hi, > I had a terrible experience with parsing Kegg pathway's files: in the > end I discovered that the files that are stored in their ftp don't > correspond exactly to the diagrams that you can find in the web > interface, as for example biochemical interactions don't have > directionality while if you look at them on kegg/pathway you will see > arrows. I haven't used pathway files yet so I'll be careful when I reach them :) Have you mentioned this aspect to the KEGG maintainers? 
> Some time ago I proposed to implement something similar to what you have > said for kegg/pathway, but in the end I abandoned the effort, because I > had problem both with suds and SOAPpy, and I wasn't satisfied by the > annotations in KEGG. > > If you are serious about that I may help you, but I can only work on the > weekends and you should tell me exactly what I have to do :-) Hehe, I can only tell you once I get my hands dirty. I'll keep my code on github to maximize interaction. I'll get back at you when I get the first working draft for GENES. Thanks for the hand ;) > Are you sure? I tried it on KEGG an year ago and I was having problems > to execute slightly more complex queries. If you look at suds's bug > tracker, you will find some reports by me, like this one: > - https://fedorahosted.org/suds/ticket/213 As of suds revision 658 I can no longer reproduce the error in the ticket. > I remember that I was looping between the KEGG support centre and the > suds bug tracker; both were very responsive to feedback and very keen to > answer me, but in the end they didn't speak to each other and the bug > reports that I have filed are still unfixed. > > Which library can you use for the soap queries? I had the feeling that > SOAPpy (which I think it is included in the standard lib) worked well > with KEGG, however it development has stopped many years ago > (http://sourceforge.net/projects/pywebsvcs/files/SOAP.py/), it is a mess > if you want to use it behind an http_proxy (I should have a patch > somewhere if you are interested) and I am sure it won't be kept > compatible with the future versions of python. SOAPpy doesn't seem to be in the standard lib, at least I don't have it out of the box here. Only as external package in the repository. > Another alternative may be beautiful soup, but I have never tried it. I've only used beautiful soup as HTML cleaner/formatter, like HTML tidy. I wasn't aware that it could be used for SOAP stuff. Are you sure about this? 
> This question on stackoverflow may provide you some ideas: > http://stackoverflow.com/questions/206154/whats-the-best-soap-client-library-for-python-and-where-is-the-documentation-fo > > I am not sure about which is the standard soap library for python, and > which one is included in the standard lib. If you are going to use > SOAPpy, it is a bad bet toward compatibility and maintenance for the > future releases. Suds is the best option but it is not in the standard > lib, and they still have to fix the bugs I have reported an year ago. I > have the feeling that there is no good alternative for python. I'll wait for your opinions. I don't want to sound religious about suds. :P > Moreover, the WSDL functions that I have seen for KEGG are not > especially useful. They seems to allow for the basic queries, but for > most of the tasks it is better to download the ftp locally and work there. Well if you just want a quick check on something the API still gives better/quicker results than downloading the stuff via FTP. Given the size, probably the load of the server and the fact that I'm on the other side of the globe, I got an ETA of close to 20 hours when downloading the genes.tar.gz file which is only a few GB in size. 
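For the "quick check on a single record" case, the TogoWS REST URL Brad gave earlier in the thread can be fetched with nothing beyond the standard library. This is only a sketch: the helper names togows_url and fetch_entry are hypothetical, the URL layout is inferred from the single example URL in this thread (http://togows.dbcls.jp/entry/gene/eco:b0002), and the service address may have changed since.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base address taken from the example URL quoted in this thread (assumption:
# other databases and entries follow the same /entry/<database>/<id> layout).
TOGOWS_ENTRY = "http://togows.dbcls.jp/entry"


def togows_url(database, entry_id):
    """Build the REST URL for one entry, keeping the ':' in KEGG identifiers."""
    return "/".join([TOGOWS_ENTRY, quote(database), quote(entry_id, safe=":")])


def fetch_entry(database, entry_id, timeout=30):
    """Download one flat-file record as text (needs network access)."""
    with urlopen(togows_url(database, entry_id), timeout=timeout) as handle:
        return handle.read().decode("utf-8", errors="replace")


url = togows_url("gene", "eco:b0002")
```

Calling fetch_entry("gene", "eco:b0002") would then return the plain text record, which a flat file parser could consume directly.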
Renato -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iEYEARECAAYFAktzWgoACgkQYh11EUYTX9Rp6QCfaHf6Ic3uT/npDw2o8l9F+8Kk RtgAnjNXGxcrfvh48dcdFf6G4wK9+PNI =vpUY -----END PGP SIGNATURE----- From biopython at maubp.freeserve.co.uk Wed Feb 10 20:15:21 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 11 Feb 2010 01:15:21 +0000 Subject: [Biopython-dev] KEGG support In-Reply-To: <4B7354EF.8020703@igc.gulbenkian.pt> References: <4B72FB2D.4070808@igc.gulbenkian.pt> <320fb6e01002101437p5f382ce0tb4cd864cec59a6c5@mail.gmail.com> <4B7354EF.8020703@igc.gulbenkian.pt> Message-ID: <320fb6e01002101715n3ccb8894r155631a2c6e34cb6@mail.gmail.com> Renato Alves wrote: >> Regarding some of Giovanni's points, modularising the distribution >> of Biopython (which can already be considered to be a core plus >> assorted domain-specific modules like Bio.PDB, Bio.Cluster, >> Bio.Graphics and so on) seems premature to me give the current >> state of python distribution. > > Could you elaborate a little on what you mean by 'current state of > python...'. Are you referring to the python3 transition? I didn't mean anything about Python 3 here. Just the current state of python package management, with distutils vs setuptools, easy_install, Distribute, etc. I'm looking forward to an official Python successor to distutils one day which will properly handle dependencies (and hopefully uninstallation) nicely. However, for now, a single monolithic Biopython released several times a year works fine and I see no reason to change that.
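Within a single monolithic release, the optional-dependency idea raised earlier in the thread (use suds or SOAPpy if either happens to be installed) could look roughly like the sketch below. The helper name first_available is hypothetical; only the standard importlib module is used, and the library names are passed in as plain strings rather than imported at install time.

```python
import importlib


def first_available(module_names):
    """Return the first module in module_names that can be imported.

    Raises ImportError naming all candidates if none is installed, so the
    user learns which optional library to add.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(
        "This feature needs one of these optional modules: "
        + ", ".join(module_names)
    )


# Demonstration with a module that certainly exists; a SOAP wrapper would
# instead call first_available(["suds", "SOAPpy"]) when first used.
chosen = first_available(["no_such_module_xyz", "json"])
```

Deferring the lookup to first use means importing the rest of the package never fails just because an optional library is missing.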
Peter From rjalves at igc.gulbenkian.pt Wed Feb 10 20:46:59 2010 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Thu, 11 Feb 2010 01:46:59 +0000 Subject: [Biopython-dev] KEGG support In-Reply-To: <20100211005600.GB1923@kunkel> References: <4B72FB2D.4070808@igc.gulbenkian.pt> <320fb6e01002101427o5efc003es3402308e44b2533a@mail.gmail.com> <4B73530B.7090203@igc.gulbenkian.pt> <20100211005600.GB1923@kunkel> Message-ID: <4B736193.9020801@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > Renato; > Great idea to work with the KEGG parsers. Very happy to have someone > tackling this. Well as we say here, when the need comes we grab the bull by the horns. :) (Small illustration even though I'm not a fan of the 'sport' http://www.youtube.com/watch?v=OBORPnrm89I) > My suggestion would be to use the TogoWS REST interface > > http://togows.dbcls.jp/site/en/rest.html > > It makes getting records crazy easy. There are tons of examples, > but for GENES, here's how to get the plain text record: > > http://togows.dbcls.jp/entry/gene/eco:b0002 > > If you really want to use SOAP, my experience has been best with > suds. However, the complexities of SOAP are really not worth it if > you can get REST approaches to do what you need. Indeed this exactly the same without the need of additional libraries. If all the functionality available on the SOAP API is also here I agree with you, the complexity of SOAP is unnecessary. Renato -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iEYEARECAAYFAktzYZEACgkQYh11EUYTX9RMWQCeLOXZH5vBjxB7rgPjhS53Fx7Z EuMAoItWzjJ1LEtV6T8NcDDqnoDyIyBS =dPVp -----END PGP SIGNATURE----- From biopython at maubp.freeserve.co.uk Thu Feb 11 00:29:15 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 11 Feb 2010 05:29:15 +0000 Subject: [Biopython-dev] Merging Bio.SeqIO SFF support? 
In-Reply-To: <320fb6e01001110911g2961a680qe95c01b14e8d23b3@mail.gmail.com> References: <320fb6e00911230643l611bb8f5i253630f3acabf438@mail.gmail.com> <320fb6e01001110911g2961a680qe95c01b14e8d23b3@mail.gmail.com> Message-ID: <320fb6e01002102129r24e210e1qc070b40f7652fac8@mail.gmail.com> On Mon, Jan 11, 2010 at 5:11 PM, Peter wrote: > > Hi all, > > I didn't want to rush the SFF support into Biopython 1.53, but its been > waiting "ready" for a while now. Any objections or comments about > me merging this now? > > Thanks, > > Peter There were no objections, and I ran this by Brad and Michiel and have just merged this into the master branch. Time for some more testing! Peter From krother at rubor.de Thu Feb 11 07:31:58 2010 From: krother at rubor.de (Kristian Rother) Date: Thu, 11 Feb 2010 13:31:58 +0100 Subject: [Biopython-dev] Bio.PDB.KDTree test for memory leak Message-ID: <112c17235319b66a00eebc499294fb2b-EhVcX1xCQgFaRwICBxEAXR0wfgFLV15YQUBGAEFfUC9ZUFgWXVpyH1RXX0FdQU1tXlhfRVBcWQ1eVw==-webmailer2@server03.webmailer.hosteurope.de> Hi, I've encountered a problem with running KDTree: it leaks memory. The code below fills 1GB of memory within a minute. Running the GC doesn't help (it slows the process down, but only because the GC is much slower than KDTree). I think the problem might be in the C code. I'd like to get this bug sorted out, but I'm not very good at C. Is there anyone around who I could check ideas with?
Best Regards,
Kristian

----
from Bio.KDTree.KDTree import *
from numpy.random import random

nr_points=1000
dim=3
bucket_size=10

coords=(200*random((nr_points, dim)))

while 1:
    kdtree=KDTree(dim, bucket_size)
    kdtree.set_coords(coords)

From biopython at maubp.freeserve.co.uk Fri Feb 12 01:10:13 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 12 Feb 2010 06:10:13 +0000
Subject: [Biopython-dev] Bio.PDB.KDTree test for memory leak
In-Reply-To: <112c17235319b66a00eebc499294fb2b-EhVcX1xCQgFaRwICBxEAXR0wfgFLV15YQUBGAEFfUC9ZUFgWXVpyH1RXX0FdQU1tXlhfRVBcWQ1eVw==-webmailer2@server03.webmailer.hosteurope.de>
References: <112c17235319b66a00eebc499294fb2b-EhVcX1xCQgFaRwICBxEAXR0wfgFLV15YQUBGAEFfUC9ZUFgWXVpyH1RXX0FdQU1tXlhfRVBcWQ1eVw==-webmailer2@server03.webmailer.hosteurope.de>
Message-ID: <320fb6e01002112210y10ad4670p7ac3e003b5976685@mail.gmail.com>

On Thu, Feb 11, 2010 at 12:31 PM, Kristian Rother wrote:
>
> Hi,
>
> I've encountered a problem with running KDTree: it leaks memory.
> The code below fills 1GB memory within a minute.
>
> Running the GC doesn't help (it slows the process down, but only because
> the GC is much slower than KDTree).

You mean something like this?

import gc
from Bio.KDTree.KDTree import *
from numpy.random import random

nr_points=1000
dim=3
bucket_size=10

coords=(200*random((nr_points, dim)))

while True:
    kdtree=KDTree(dim, bucket_size)
    kdtree.set_coords(coords)
    del kdtree #explicitly tell Python it can GC this object
    gc.collect() #force Python to run GC

I agree, this does seem to gradually consume more and more RAM.
Could you open a bug on bugzilla to track this please?

> I think the problem might be in the C code. I'd like to get this bug
> sorted out, but I'm not very good in C. Is there anyone around who
> I could check ideas with?

Have you ever used valgrind on a C tool? I'm not sure if it is easy
to use via Python, but it is my tool of choice for checking memory
leaks in C.
Peter

From bugzilla-daemon at portal.open-bio.org Fri Feb 12 03:30:12 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Feb 2010 03:30:12 -0500
Subject: [Biopython-dev] [Bug 3010] New: Bio.KDTree is leaking memory
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=3010

           Summary: Bio.KDTree is leaking memory
           Product: Biopython
           Version: 1.53
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: krother at rubor.de

When I run KDTree on several of our PCs (Ubuntu, one with Biopython 1.53, one
with 1.51), it consumes memory that is never freed until the process
terminates. The code below fills 1GB of memory within about a minute.

----
#!/usr/bin/env python
from Bio.KDTree.KDTree import *
from numpy.random import random

nr_points=1000
dim=3
bucket_size=10

coords=(200*random((nr_points, dim)))

while True:
    kdtree=KDTree(dim, bucket_size)
    kdtree.set_coords(coords)
----

Running the GC (via del kdtree; gc.collect() in the while loop) does not help.

I think the problem might be in the C code or in the Python/C interaction. I
checked the sources of KDTree superficially (to see whether there is a free()
for each malloc()), but did not see anything unusual (I am not a C programmer,
though).

Peter proposed using valgrind to check for memory leaks in C. It may be
applicable to this problem.
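
A minimal sketch, not part of the original report, of how the growth described above could be put into numbers rather than watched in a process monitor. It assumes a Unix system (the standard-library resource module) and only observes the peak resident set size, so it shows that memory grows, not where the leak is. The names peak_rss_growth and build_tree are hypothetical, and build_tree is a pure-Python stand-in for the KDTree calls in the report.

```python
import gc
import resource


def peak_rss_growth(func, iterations=1000):
    """Call func() repeatedly and return the growth in peak RSS.

    The unit is whatever ru_maxrss uses on the platform
    (kilobytes on Linux, bytes on macOS).
    """
    gc.collect()
    before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    for _ in range(iterations):
        func()
    gc.collect()
    after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return after - before


def build_tree():
    # Hypothetical stand-in for the KDTree(dim, bucket_size) and
    # set_coords(coords) calls from the report; swap in the real
    # calls to exercise Bio.KDTree itself.
    return [[float(i)] * 3 for i in range(1000)]


if __name__ == "__main__":
    print(peak_rss_growth(build_tree))
```

For code that releases its memory properly the reported growth should level off as the iteration count rises; a figure that keeps climbing with the iteration count would be consistent with the leak described in this bug.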
From bugzilla-daemon at portal.open-bio.org Fri Feb 12 07:31:13 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Feb 2010 07:31:13 -0500
Subject: [Biopython-dev] [Bug 3006] esearch medline fails with xml format
In-Reply-To:
Message-ID: <201002121231.o1CCVDlN010496@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3006

georg.lipps at fhnw.ch changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #2 from georg.lipps at fhnw.ch 2010-02-12 07:31 EST -------
I updated to Python 2.6.4 and Biopython 1.53 and can confirm that the problem
does not persist. Thanks for checking.

From bugzilla-daemon at portal.open-bio.org Fri Feb 12 11:23:17 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Feb 2010 11:23:17 -0500
Subject: [Biopython-dev] [Bug 3010] Bio.KDTree is leaking memory
In-Reply-To:
Message-ID: <201002121623.o1CGNHHd017669@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3010

------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2010-02-12 11:23 EST -------
Does the memory leak occur also without the line kdtree.set_coords(coords)?
From bugzilla-daemon at portal.open-bio.org Sun Feb 14 05:45:48 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 14 Feb 2010 05:45:48 -0500
Subject: [Biopython-dev] [Bug 3010] Bio.KDTree is leaking memory
In-Reply-To:
Message-ID: <201002141045.o1EAjmV1029393@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3010

------- Comment #2 from krother at rubor.de 2010-02-14 05:45 EST -------
(In reply to comment #1)
> Does the memory leak occur also without the line kdtree.set_coords(coords)?
>

No, I tried, and it doesn't.

From MatatTHC at gmx.de Tue Feb 16 04:48:25 2010
From: MatatTHC at gmx.de (Matthias Bernt)
Date: Tue, 16 Feb 2010 10:48:25 +0100
Subject: [Biopython-dev] derive from Seq
Message-ID: <20100216094825.25190@gmx.net>

Hi,

I've implemented a class derived from Seq. Many of the Seq functions
return Seq. Thus, I can not use those functions because I need
instances of the derived class.

This can easily be fixed by returning:

self.__class__( .. )

Regards,
Matthias

From chapmanb at 50mail.com Tue Feb 16 08:09:45 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 16 Feb 2010 08:09:45 -0500
Subject: [Biopython-dev] derive from Seq
In-Reply-To: <20100216094825.25190@gmx.net>
References: <20100216094825.25190@gmx.net>
Message-ID: <20100216130945.GH64068@sobchak.mgh.harvard.edu>

Hi Matthias;

> I've implemented a class derived from Seq. Many of the Seq functions
> return Seq. Thus, I can not use those functions because I need
> instances of the derived class.
>
> This can easily be fixed by returning:
>
> self.__class__( .. )

Good catch.
Would you be able to submit a patch for this to the bug tracker?

More generally, it is interesting that you are subclassing Seq. Can
you describe your application for this? I was debating with Peter
and Michiel this week and arguing that the Seq class should be
switched to a standard string, with biological functions like
reverse_complement and the like moving to stand alone functions and
SeqRecord objects. I'd be interested in hearing the opposite case;
that additional functionality is needed on a Seq object.

Brad

From bugzilla-daemon at portal.open-bio.org Tue Feb 16 12:53:29 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Feb 2010 12:53:29 -0500
Subject: [Biopython-dev] [Bug 3013] New: import warnings missing in Bio/PDB/MMCIF2Dict.py
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=3013

           Summary: import warnings missing in Bio/PDB/MMCIF2Dict.py
           Product: Biopython
           Version: 1.53
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: macrozhu+biopy at gmail.com

The Python library >>warnings<< is not imported in Bio/PDB/MMCIF2Dict.py.
Please import it at the beginning of the source file.
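
The kind of defect reported in Bug 3013 (a module that is used but never imported) can be found without running the code. The following is a rough sketch only, using nothing but the standard-library ast module; the function name missing_module_imports is hypothetical, and the check is far cruder than a real linter since it only looks for name.attribute uses of the listed modules.

```python
import ast


def missing_module_imports(source, modules=("warnings",)):
    """Return the listed module names that are used as `name.attr`
    somewhere in `source` but are never imported in it."""
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the name "a"; "import a.b as c" binds "c".
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            used.add(node.value.id)
    return sorted(m for m in modules if m in used and m not in imported)


if __name__ == "__main__":
    broken = "def f():\n    warnings.warn('bad record')\n"
    print(missing_module_imports(broken))
```

Applied to a file's contents, an empty list means none of the listed modules is used without an import; it says nothing about other undefined names.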
From bugzilla-daemon at portal.open-bio.org Tue Feb 16 20:24:39 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Feb 2010 20:24:39 -0500
Subject: [Biopython-dev] [Bug 3013] import warnings missing in Bio/PDB/MMCIF2Dict.py
In-Reply-To:
Message-ID: <201002170124.o1H1OdhE003209@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3013

mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2010-02-16 20:24 EST -------
Fixed in the repository, thanks.

From p.j.a.cock at googlemail.com Tue Feb 16 21:48:01 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Feb 2010 02:48:01 +0000
Subject: [Biopython-dev] derive from Seq
In-Reply-To: <20100216130945.GH64068@sobchak.mgh.harvard.edu>
References: <20100216094825.25190@gmx.net> <20100216130945.GH64068@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e01002161848s28e543a9jdac436976de3f279@mail.gmail.com>

On Tue, Feb 16, 2010 at 1:09 PM, Brad Chapman wrote:
> Hi Matthias;
>
>> I've implemented a class derived from Seq. Many of the Seq functions
>> return Seq. Thus, I can not use those functions because I need
>> instances of the derived class.
>>
>> This can easily be fixed by returning:
>>
>> self.__class__( .. )

We debated this on the mailing list a while ago (I'd have to search
a little harder to find the thread). While switching to this form makes
subclassing easier in some cases, it doesn't in all.

> More generally, it is interesting that you are subclassing Seq. Can
> you describe your application for this? ... I'd be interested in
> hearing ...
additional functionality is needed on a Seq object.
>
> Brad

Last time this (subclassing the Seq object) was mentioned, the
specific use was to change the equality operations to be string
like. This is a change we're considering making in Biopython itself
(and again was something Brad, Michiel and I chatted about
last week - I will be sending out an email about that next week,
I'm on holiday right now and haven't had internet access till
today).

But to echo Brad, use cases for subclassing the Seq are
of great interest.

Regards,

Peter

From MatatTHC at gmx.de Wed Feb 17 03:33:11 2010
From: MatatTHC at gmx.de (Matthias Bernt)
Date: Wed, 17 Feb 2010 09:33:11 +0100
Subject: [Biopython-dev] derive from Seq
In-Reply-To: <320fb6e01002161848s28e543a9jdac436976de3f279@mail.gmail.com>
References: <20100216094825.25190@gmx.net> <20100216130945.GH64068@sobchak.mgh.harvard.edu> <320fb6e01002161848s28e543a9jdac436976de3f279@mail.gmail.com>
Message-ID: <20100217083311.287840@gmx.net>

Hi,

I'm dealing with circular sequences. Thus, I need some specialised
functions (e.g. getting a subsequence). Furthermore, for me it seems to
be the natural way to extend the functionality of Seq to my own needs.
But, maybe this is not the best way.

Matthias

> > Hi Matthias;
> >
> >> I've implemented a class derived from Seq. Many of the Seq functions
> >> return Seq. Thus, I can not use those functions because I need
> >> instances of the derived class.
> >>
> >> This can easily be fixed by returning:
> >>
> >> self.__class__( .. )
>
> We debated this on the mailing list a while ago (I'd have to search
> a little harder to find the thread). While switching to this form makes
> subclassing easier in some cases, it doesn't in all.
>
> > More generally, it is interesting that you are subclassing Seq. Can
> > you describe your application for this? ... I'd be interested in
> > hearing ... additional functionality is needed on a Seq object.
> >
> > Brad
>
> Last time this (subclassing the Seq object) was mentioned, the
> specific use was to change the equality operations to be string
> like. This is a change we're considering making in Biopython itself
> (and again was something Brad, Michiel and I chatted about
> last week - I will be sending out an email about that next week,
> I'm on holiday right now and haven't had internet access till
> today).
>
> But to echo Brad, use cases for subclassing the Seq are
> of great interest.

From bugzilla-daemon at portal.open-bio.org Thu Feb 18 11:09:52 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Feb 2010 11:09:52 -0500
Subject: [Biopython-dev] [Bug 3013] import warnings missing in Bio/PDB/MMCIF2Dict.py
In-Reply-To:
Message-ID: <201002181609.o1IG9qth028156@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3013

------- Comment #2 from macrozhu+biopy at gmail.com 2010-02-18 11:09 EST -------
Can pychecker be of any use for detecting such minor bugs? It might be too
much, I guess.

From bugzilla-daemon at portal.open-bio.org Sat Feb 20 13:40:59 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 20 Feb 2010 13:40:59 -0500
Subject: [Biopython-dev] [Bug 3013] import warnings missing in Bio/PDB/MMCIF2Dict.py
In-Reply-To:
Message-ID: <201002201840.o1KIexYS017773@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3013

------- Comment #3 from eric.talevich at gmail.com 2010-02-20 13:40 EST -------
(In reply to comment #2)
> Can pychecker be of any use for detecting such minor bugs? It might be too
> much, I guess.
>

I don't know about PyChecker, but PyLint will catch import errors and
uninitialized variables like this. For example, I just tried "pylint -e
Bio/PDB/*.py" on a branch that didn't have this fix in it yet, and it flagged
this bug:

E: 79:MMCIF2Dict._make_mmcif_dict: Undefined variable 'warnings'
E: 91:MMCIF2Dict._make_mmcif_dict: Undefined variable 'warnings'
E:107:MMCIF2Dict._make_mmcif_dict: Undefined variable 'warnings'

While I'm at it, here are the other errors in Bio.PDB that pylint caught in a
freshly updated master branch:

************* Module Chain
E: 79:Chain.__delitem__: Class 'Entity' has no '__delitem__' member
************* Module DSSP
E:101:make_dssp_dict: function already defined line 8
E:139:DSSP: class already defined line 8
************* Module Entity
E: 56:Entity.get_level: Instance of 'Entity' has no 'level' member
************* Module FragmentMapper
E:137:Fragment.add_residue: Undefined variable 'PDBException'
E:191:_make_fragment_list: Undefined variable 'PDBException'
E:193:_make_fragment_list: Undefined variable 'PDBException'
E:226:FragmentMapper: class already defined line 10
E:250:FragmentMapper.__init__: Undefined variable 'PDBException'
************* Module HSExposure
E: 67:_AbstractHSExposure.__init__: Instance of '_AbstractHSExposure' has no '_get_cb' member
E:131:HSExposureCA: class already defined line 9
E:222:HSExposureCB: class already defined line 9
E:257:ExposureCN: class already defined line 9
************* Module MMCIF2Dict
E: 8: No name 'MMCIFlex' in module 'Bio.PDB.mmCIF'
E: 31:MMCIF2Dict.__init__: Module 'Bio.PDB.mmCIF' has no 'MMCIFlex' member
E: 33:MMCIF2Dict.__init__: Module 'Bio.PDB.mmCIF' has no 'MMCIFlex' member
E: 44:MMCIF2Dict._make_mmcif_dict: Module 'Bio.PDB.mmCIF' has no 'MMCIFlex' member
************* Module NACCESS
E:183: Instance of 'NACCESS' has no 'get_iterator' member
************* Module PDBParser
E:159:PDBParser._parse_coordinates: Undefined variable 'PDBContructionError'
************* Module Polypeptide
E:276:_PPBuilder.build_peptides: Instance of '_PPBuilder' has no '_is_connected' member
************* Module ResidueDepth
E: 65:get_surface: function already defined line 11
E:123:ResidueDepth: class already defined line 11
************* Module StructureAlignment
E: 14:StructureAlignment: class already defined line 6

From eric.talevich at gmail.com Sat Feb 20 14:01:42 2010
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sat, 20 Feb 2010 14:01:42 -0500
Subject: [Biopython-dev] derive from Seq
In-Reply-To: <20100216130945.GH64068@sobchak.mgh.harvard.edu>
References: <20100216094825.25190@gmx.net> <20100216130945.GH64068@sobchak.mgh.harvard.edu>
Message-ID: <3f6baf361002201101m78a9550csc6d371df251e4319@mail.gmail.com>

On Tue, Feb 16, 2010 at 8:09 AM, Brad Chapman wrote:

>
> More generally, it is interesting that you are subclassing Seq. Can
> you describe your application for this? I was debating with Peter
> and Michiel this week and arguing that the Seq class should be
> switched to a standard string, with biological functions like
> reverse_complement and the like moving to stand alone functions and
> SeqRecord objects. I'd be interested in hearing the opposite case;
> that additional functionality is needed on a Seq object.
>
>

I've seen a technique like this used to good effect:

# File: Seq.py

# Standalone functions all take a string-like first argument
def reverse_complement(seq):
    ...

def translate(seq, table=1):
    ...

class Seq(basestring): # or str
    def __init__(self, data, alphabet):
        ...

    # Then attach the above functions as methods here
    reverse_complement = reverse_complement
    translate = translate
    ...

The same functionality is then available in a functional or OO style, with
minimal code duplication.
And for interactive sessions, where converting strings to Seqs is a bit
more of an inconvenience, "from Bio.Seq import *" becomes quick and handy.

-Eric

From biopython at maubp.freeserve.co.uk Sun Feb 21 07:03:21 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 21 Feb 2010 12:03:21 +0000
Subject: [Biopython-dev] derive from Seq
In-Reply-To: <3f6baf361002201101m78a9550csc6d371df251e4319@mail.gmail.com>
References: <20100216094825.25190@gmx.net> <20100216130945.GH64068@sobchak.mgh.harvard.edu> <3f6baf361002201101m78a9550csc6d371df251e4319@mail.gmail.com>
Message-ID: <320fb6e01002210403i721f8f26l2e7d37aae0b13c35@mail.gmail.com>

On Sat, Feb 20, 2010 at 7:01 PM, Eric Talevich wrote:
> I've seen a technique like this used to good effect:
>
> # File: Seq.py
>
> ...
>
> The same functionality is then available in a functional or OO style, with
> minimal code duplication. And for interactive sessions, where converting
> strings to Seqs is a bit more of an inconvenience, "from Bio.Seq import *"
> becomes quick and handy.

Doesn't that describe the Bio.Seq module as it is pretty well?
In addition to the Seq object methods, there are several functions
which can be used on strings or Seq (like) objects.

Peter

From eric.talevich at gmail.com Sun Feb 21 11:36:13 2010
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sun, 21 Feb 2010 11:36:13 -0500
Subject: [Biopython-dev] derive from Seq
In-Reply-To: <320fb6e01002210403i721f8f26l2e7d37aae0b13c35@mail.gmail.com>
References: <20100216094825.25190@gmx.net> <20100216130945.GH64068@sobchak.mgh.harvard.edu> <3f6baf361002201101m78a9550csc6d371df251e4319@mail.gmail.com> <320fb6e01002210403i721f8f26l2e7d37aae0b13c35@mail.gmail.com>
Message-ID: <3f6baf361002210836of243016s8206035c1b89de24@mail.gmail.com>

On Sun, Feb 21, 2010 at 7:03 AM, Peter wrote:

> On Sat, Feb 20, 2010 at 7:01 PM, Eric Talevich wrote:
> > I've seen a technique like this used to good effect:
> >
> > ...
> >
> > The same functionality is then available in a functional or OO style, with
> > minimal code duplication. And for interactive sessions, where converting
> > strings to Seqs is a bit more of an inconvenience, "from Bio.Seq import *"
> > becomes quick and handy.
>
> Doesn't that describe the Bio.Seq module as it is pretty well?
> In addition to the Seq object methods, there are several functions
> which can be used on strings or Seq (like) objects.
>
> Peter
>

I'm not fully up to speed on the debate or the use cases that triggered it,
but I'm guessing the goal is better code flexibility without sacrificing
performance. Here's some code to consider:

def transcribe(dna, alphabet=None):
    """Transcribe a DNA sequence into RNA. Returns a string."""
    if isinstance(dna, Seq) or isinstance(dna, MutableSeq):
        # At first, maybe issue a warning here
        alphabet = dna.alphabet
        dna = str(dna)
    if alphabet is not None:
        # Validate
        base = Alphabet._get_base_alphabet(alphabet)
        if isinstance(base, Alphabet.ProteinAlphabet):
            raise ValueError("Proteins cannot be transcribed!")
        if isinstance(base, Alphabet.RNAAlphabet):
            raise ValueError("RNA cannot be transcribed!")
    return dna.replace('T','U').replace('t','u')

class Seq:
    # ...
    def transcribe(self):
        transcript = transcribe(self._data)
        # Rebuild the Seq object
        if self.alphabet==IUPAC.unambiguous_dna:
            alphabet = IUPAC.unambiguous_rna
        elif self.alphabet==IUPAC.ambiguous_dna:
            alphabet = IUPAC.ambiguous_rna
        else:
            alphabet = Alphabet.generic_rna
        return Seq(transcript, alphabet)

Notes:

- The standalone takes an optional 'alphabet' argument, and performs
  validation if requested.
- Since the standalone function now has the same functionality as the Seq
  method, Seq can dispatch to the function -- rather than the other way
  around, as it is currently -- and then just rebuild a Seq object.
- The standalone function now always returns the same type (str).
  Since this might break some existing code, a little shim and deprecation
  dance may be needed in real life. But I think returning a plain string is
  the Right Thing: there's "one obvious way" to work with Seq objects or
  plain strings.
- If the grand proposal is to eventually move the alphabet attribute to
  SeqRecord, this provides an intermediate step and a more convenient
  foundation for testing the idea.

Best,
Eric

From biopython at maubp.freeserve.co.uk Mon Feb 22 09:48:14 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 22 Feb 2010 14:48:14 +0000
Subject: [Biopython-dev] Changing Seq equality
In-Reply-To: <320fb6e00911270339s3354051cub0cc193466575f16@mail.gmail.com>
References: <200911250945.20870.jblanca@btc.upv.es> <320fb6e00911250226w4e86ea5cr4cdea4a424d32b7@mail.gmail.com> <200911251220.53881.jblanca@btc.upv.es> <320fb6e00911250348m249533d1g5e30b6c593769dd1@mail.gmail.com> <3f6baf360911252314u72ab5c19rbcb899e736117a4f@mail.gmail.com> <320fb6e00911260241j22fbee47ufaad13412c0ff580@mail.gmail.com> <3f6baf360911261213g2047607aw212215cce2b4fe82@mail.gmail.com> <320fb6e00911270339s3354051cub0cc193466575f16@mail.gmail.com>
Message-ID: <320fb6e01002220648n5d47f015r65f17a37f782fcde@mail.gmail.com>

Hi all,

I've just got back from Japan - Brad and I were fortunate to be able to
attend the DBCLS BioHackathon 2010 held in Tokyo,
http://hackathon3.dbcls.jp/

As Brad already mentioned in passing, we also managed to have dinner
one evening with Michiel, and had an informal chat about Biopython
plans. Expect a few more emails on other topics to follow.

One of the short term aims we agreed on was to press ahead with the
Seq equality changes outlined on this thread late last year. Mailing
list archive link:
http://lists.open-bio.org/pipermail/biopython-dev/2009-November/007021.html

To recap, the agreed best behaviour was to make Seq equality act
like string equality, but to raise a Python warning when incompatible
alphabets are compared (e.g. DNA to Protein).
This also applies to all the other comparison operators: not equal,
less than, greater than, less than or equal, and greater than or equal.

This is my outline plan for the change:

For Biopython up to 1.53, Seq class uses object equality,
seq1==seq2 acts as id(seq1)==id(seq2)

For Biopython 1.54 (and perhaps a few more releases), the Seq
classes will still use object equality but will trigger a warning
suggesting explicit use of id(seq1)==id(seq2) or str(seq1)==str(seq2)
as appropriate.

For Biopython 1.xx (maybe 1.55 or 1.56?) the Seq classes will switch
to using string equality (with an alphabet aware warning for comparing
DNA to RNA etc), but will also trigger a warning that this is a change
from previous releases, and suggest in the short term the continued
explicit use of either id(seq1)==id(seq2) for object identity or
str(seq1)==str(seq2) for string identity.

For Biopython 1.yy (maybe 1.57?) the Seq classes will use string
equality (with an alphabet aware warning for comparing DNA to RNA
etc), without any warning about this being a change from historic
behaviour.

These warning messages could also point at a wiki page, and we'd
need a FAQ entry in the tutorial as well.

The aim of this slightly drawn out switch is to try and make sure all
users are aware of the change, even if they only update their copy
of Biopython every few releases.

Does that all sound sensible? If so, we should probably have an
announcement on the main mailing list, in case there are any other
views.

Other more complex options include a flag for switching between
the modes - but that complexity doesn't seem such a good idea to me.

All my own code and most of the unit tests use str(seq1)==str(seq2)
explicitly anyway. The only exception is some of the genetic algorithm
unit tests which do seem to want explicit object identity.
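
As an illustration only, the end state described above (string-like comparison plus a warning on incompatible alphabets) might look roughly like the sketch below. SketchSeq and its kind attribute are hypothetical stand-ins, not the Biopython Seq class or its alphabet objects, and only two of the six comparison operators are written out; the others would follow the same pattern.

```python
import warnings


class SketchSeq:
    """Hypothetical stand-in for a Seq-like class; not Bio.Seq.Seq."""

    def __init__(self, data, kind=None):
        self._data = data
        self.kind = kind  # e.g. "DNA", "RNA", "protein"; None means generic

    def __str__(self):
        return self._data

    def _check(self, other):
        # Warn only when both sides declare a kind and the kinds differ.
        other_kind = getattr(other, "kind", None)
        if self.kind and other_kind and self.kind != other_kind:
            warnings.warn(
                "Comparing incompatible alphabets: %s vs %s"
                % (self.kind, other_kind),
                stacklevel=3,
            )

    def __eq__(self, other):
        self._check(other)
        return str(self) == str(other)

    def __lt__(self, other):
        self._check(other)
        return str(self) < str(other)

    def __hash__(self):
        # Keep hashing consistent with string-like equality.
        return hash(str(self))


if __name__ == "__main__":
    dna = SketchSeq("ACGT", "DNA")
    print(dna == "ACGT")
    print(dna == SketchSeq("ACGT", "protein"))
```

In this sketch a comparison against a plain string never warns, since a string carries no alphabet information; whether the real change should behave that way is not settled by the plan above.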
Regards,

Peter

From biopython at maubp.freeserve.co.uk Tue Feb 23 06:31:35 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Feb 2010 11:31:35 +0000
Subject: [Biopython-dev] Handles and/or filenames in Bio.SeqIO etc?
In-Reply-To: <20090728221726.GK68751@sobchak.mgh.harvard.edu>
References: <320fb6e00907280934i54f326a6r38325c05a314cdbc@mail.gmail.com> <20090728221726.GK68751@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e01002230331j5f5f87c5lf328d3bacc4a557b@mail.gmail.com>

Hi all,

As mentioned in another thread, Brad, Michiel and I had an informal meeting
earlier this month in Tokyo and discussed some plans for Biopython.
One of the short term changes we agreed on was to push ahead with the
Seq object equality changes, see:
http://lists.open-bio.org/pipermail/biopython-dev/2010-February/007351.html

Another short term change we agreed was worthwhile was to follow other
Python libraries and allow handles OR filenames in our parsers (starting
with SeqIO and AlignIO). This follows the discussion for the "TreeIO" module
(since renamed) and the Bio.SeqIO.convert functions here on the mailing
list last year, see:
http://lists.open-bio.org/pipermail/biopython-dev/2009-July/006503.html

I will tackle this shortly for Bio.SeqIO and Bio.AlignIO.

Peter

From bugzilla-daemon at portal.open-bio.org Tue Feb 23 12:43:01 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 23 Feb 2010 12:43:01 -0500
Subject: [Biopython-dev] [Bug 3013] import warnings missing in Bio/PDB/MMCIF2Dict.py
In-Reply-To:
Message-ID: <201002231743.o1NHh17v001826@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3013

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-23 12:43 EST -------
Hi Eric,

I have fixed most (all?) of those problems reported by pylint - see mailing
list post.

Thanks!
Peter

From biopython at maubp.freeserve.co.uk Tue Feb 23 12:43:31 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Feb 2010 17:43:31 +0000
Subject: [Biopython-dev] Running pylint over Biopython
Message-ID: <320fb6e01002230943g474dc673x43f7df120e679d91@mail.gmail.com>

Hi all,

Those following @Biopython on twitter or subscribed to the github RSS
feed for our repository will know this already, but I've been using
pylint today to spot some errors in Biopython.
http://www.logilab.org/project/pylint

This was prompted by Eric trying this on Bio.PDB for Bug 3013 and
finding some issues - thanks Eric, this was a valuable suggestion.

With its default settings pylint is very very noisy, and in particular
doesn't like our naming conventions. However, with the following
command line you can focus in on the important stuff:

pylint --disable-msg-cat=CRW --include-ids=y --disable-msg=E1101,E1103,E0102 -r n Bio BioSQL

Note that instead of module names, you can give filenames (e.g. *.py).
What that does is disable several categories of message (conventions,
possible refactorings, warnings) leaving just errors and fatal
messages. I turned on the message identifiers so that I have something
useful to stick into Google if need be, or to add to the ignore list
(currently three cases which looked like false positives). Then I turn
off the detailed report.

[Tip - don't run this from the Biopython source directory as then
importing our C code modules will fail]

As you will be able to tell from the recent flurry of git commits,
this highlighted some simple errors like missing imports or typos in
variable names.
Tiago, could you have a look at these possible problems in Bio.PopGen:

************* Module Bio.PopGen.Async
E0602: 78:Async.get_result: Undefined variable 'done'
E0602: 79:Async.get_result: Undefined variable 'done'
************* Module Bio.PopGen.GenePop
E0602:160:Record.split_in_pops: Undefined variable 'GenePop'
E0602:177:Record.split_in_loci: Undefined variable 'GenePop'
************* Module Bio.PopGen.GenePop.Controller
E0602: 41:_read_allele_freq_table: Undefined variable 'self'
E0602:133:_hw_func: Undefined variable 'self'
E0602:393:GenePopController.test_pop_hw_prob: Undefined variable 'ext'
E0602:458:GenePopController.test_ld.ld_pop_func: Undefined variable 'currrent_pop'
************* Module Bio.PopGen.SimCoal.Cache
E0602: 79:SimCoalCache.getSimulation: Undefined variable 'Config'
E0602: 88: Undefined variable 'Cache'
************* Module Bio.PopGen.SimCoal.Controller
E0602: 47:SimCoalController.run_simcoal: Undefined variable 'Config'

Eric, I don't have all the dependencies installed, but pylint does
appear to dislike a few things in Bio.Phylo on the trunk:

************* Module Bio.Phylo.BaseTree
E0203:521:TreeMixin.prune: Access to member 'root' before its definition line 531
E0203:527:TreeMixin.prune: Access to member 'root' before its definition line 531
E0202:672:Subtree.root: An attribute inherited from TreeMixin hide this method
************* Module Bio.Phylo.PhyloXML
E1120:182:Phylogeny.get_alignment: No value passed for parameter 'follow_attrs' in function call

One thing this exercise has shown is that we still need to do some
work on the unit test coverage.
Regards

Peter

From tiagoantao at gmail.com Tue Feb 23 12:56:22 2010
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 23 Feb 2010 17:56:22 +0000
Subject: [Biopython-dev] Running pylint over Biopython
In-Reply-To: <320fb6e01002230943g474dc673x43f7df120e679d91@mail.gmail.com>
References: <320fb6e01002230943g474dc673x43f7df120e679d91@mail.gmail.com>
Message-ID: <6d941f121002230956h582acc11r59cb18bca8d2f727@mail.gmail.com>

This comes at a good time; I've actually been making changes to the
code (as the genepop parser is not able to handle big files and I've
had quite a few complaints about that). It seems to be 2.6 related or
so, because I've detected the Config problem myself.
I will correct this next week (this week is _impossible_), along with
an update to the genepop parser to support big files.

2010/2/23 Peter :
> Hi all,
>
> Those following @Biopython on twitter or subscribed to the github RSS
> feed for our repository will know this already, but I've been using
> pylint today to spot some errors in Biopython.
> http://www.logilab.org/project/pylint
>
> This was prompted by Eric trying this on Bio.PDB for Bug 3013 and
> finding some issues - thanks Eric, this was a valuable suggestion.
>
> With its default settings pylint is very very noisy, and in particular
> doesn't like our naming conventions. However, with the following
> command line you can focus in on the important stuff:
>
> pylint --disable-msg-cat=CRW --include-ids=y --disable-msg=E1101,E1103,E0102 -r n Bio BioSQL
>
> Note that instead of module names, you can give filenames (e.g. *.py).
> What that does is disable several categories of message (conventions,
> possible refactorings, warnings) leaving just errors and fatal
> messages. I turned on the message identifiers so that I have something
> useful to stick into Google if need be, or to add to the ignore list
> (currently three cases which looked like false positives). Then I turn
> off the detailed report.
> > [Tip - don't run this from the Biopython source directory as then > importing our C code modules will fail] > > As you will be able to tell from the recent flurry of git commits, > this highlighted some simple errors like missing imports or typos in > variable names. > > > Tiago, could you have a look at these possible problems in Bio.PopGen: > > ************* Module Bio.PopGen.Async > E0602: 78:Async.get_result: Undefined variable 'done' > E0602: 79:Async.get_result: Undefined variable 'done' > ************* Module Bio.PopGen.GenePop > E0602:160:Record.split_in_pops: Undefined variable 'GenePop' > E0602:177:Record.split_in_loci: Undefined variable 'GenePop' > ************* Module Bio.PopGen.GenePop.Controller > E0602: 41:_read_allele_freq_table: Undefined variable 'self' > E0602:133:_hw_func: Undefined variable 'self' > E0602:393:GenePopController.test_pop_hw_prob: Undefined variable 'ext' > E0602:458:GenePopController.test_ld.ld_pop_func: Undefined variable > 'currrent_pop' > ************* Module Bio.PopGen.SimCoal.Cache > E0602: 79:SimCoalCache.getSimulation: Undefined variable 'Config' > E0602: 88: Undefined variable 'Cache' > ************* Module Bio.PopGen.SimCoal.Controller > E0602: 47:SimCoalController.run_simcoal: Undefined variable 'Config' > > > Eric, I don't have all the dependencies installed by pylint does > appear to dislike a few things in Bio.Phylo on the trunk: > > ************* Module Bio.Phylo.BaseTree > E0203:521:TreeMixin.prune: Access to member 'root' before its > definition line 531 > E0203:527:TreeMixin.prune: Access to member 'root' before its > definition line 531 > E0202:672:Subtree.root: An attribute inherited from TreeMixin hide this method > ************* Module Bio.Phylo.PhyloXML > E1120:182:Phylogeny.get_alignment: No value passed for parameter > 'follow_attrs' in function call > > > One thing this exercise has shown is that we still need to do some > work on the unit test coverage. 
> > Regards > > Peter > -- "Pessimism of the Intellect; Optimism of the Will" -Antonio Gramsci From bugzilla-daemon at portal.open-bio.org Tue Feb 23 13:03:53 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 23 Feb 2010 13:03:53 -0500 Subject: [Biopython-dev] [Bug 3013] import warnings missing in Bio/PDB/MMCIF2Dict.py In-Reply-To: Message-ID: <201002231803.o1NI3r10002509@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3013 macrozhu+biopy at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |macrozhu+biopy at gmail.com ------- Comment #5 from macrozhu+biopy at gmail.com 2010-02-23 13:03 EST ------- wow, the developers really respond very quickly. How about running >>pylint<< or >>pychecker<< on all BioPython code to detect potential problems? cheers, -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Feb 23 13:59:49 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 23 Feb 2010 13:59:49 -0500 Subject: [Biopython-dev] [Bug 3013] import warnings missing in Bio/PDB/MMCIF2Dict.py In-Reply-To: Message-ID: <201002231859.o1NIxnJH004142@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3013 ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-23 13:59 EST ------- (In reply to comment #5) > wow, the developers really respond very quickly. > > How about running >>pylint<< or >>pychecker<< on all BioPython code to detect > potential problems? 
> > cheers, > Already tried with pylint earlier today ;) http://lists.open-bio.org/pipermail/biopython-dev/2010-February/007354.html -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From eric.talevich at gmail.com Tue Feb 23 22:11:25 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 23 Feb 2010 22:11:25 -0500 Subject: [Biopython-dev] Running pylint over Biopython In-Reply-To: <320fb6e01002230943g474dc673x43f7df120e679d91@mail.gmail.com> References: <320fb6e01002230943g474dc673x43f7df120e679d91@mail.gmail.com> Message-ID: <3f6baf361002231911g65b47cbfw3625f9aacf863abc@mail.gmail.com> 2010/2/23 Peter > Hi all, > > Those following @Biopython on twitter or subscribed to the github RSS > feed for our repository will know this already, but I've been using > pylint today to spot some errors in Biopython. > http://www.logilab.org/project/pylint > > This was prompted by Eric trying this on Bio.PDB for Bug 3013 and > finding some issues - thank Eric, this was a valuable suggestion. > > Glad I could help. :) > Eric, I don't have all the dependencies installed by pylint does > appear to dislike a few things in Bio.Phylo on the trunk: > Pylint hates the way I wrote Bio.Phylo, in particular the way TreeMixin assumes it will be mixed with a class that has 'root' and 'is_terminal' attributes, and the __dict__ hack in the PhyloXML class __init__ methods -- it can't figure out where the attributes are coming from. The last error was real, and I've pushed a fix to the trunk. Thanks for catching it. One thing this exercise has shown is that we still need to do some > work on the unit test coverage. > Agreed. I also added a unit test for get_alignment (finally), and should get to TreeMixin.prune and .split soon. Then Bio.Phylo will have essentially 100% unit test coverage. 
Cheers -Eric From biopython at maubp.freeserve.co.uk Wed Feb 24 02:41:18 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Feb 2010 07:41:18 +0000 Subject: [Biopython-dev] Running pylint over Biopython In-Reply-To: <3f6baf361002231911g65b47cbfw3625f9aacf863abc@mail.gmail.com> References: <320fb6e01002230943g474dc673x43f7df120e679d91@mail.gmail.com> <3f6baf361002231911g65b47cbfw3625f9aacf863abc@mail.gmail.com> Message-ID: <320fb6e01002232341n3ee397basddde348df86d4871@mail.gmail.com> On Wed, Feb 24, 2010 at 3:11 AM, Eric Talevich wrote: > 2010/2/23 Peter > >> Hi all, >> >> Those following @Biopython on twitter or subscribed to the github RSS >> feed for our repository will know this already, but I've been using >> pylint today to spot some errors in Biopython. >> http://www.logilab.org/project/pylint >> >> This was prompted by Eric trying this on Bio.PDB for Bug 3013 and >> finding some issues - thank Eric, this was a valuable suggestion. >> >> Glad I could help. :) Re-reading Bug 3013, we might also want to try PyChecker as suggested by Hongbo Zhu - I've not used that before. >> Eric, I don't have all the dependencies installed by pylint does >> appear to dislike a few things in Bio.Phylo on the trunk: > > Pylint hates the way I wrote Bio.Phylo, in particular the way TreeMixin > assumes it will be mixed with a class that has 'root' and 'is_terminal' > attributes, and the __dict__ hack in the PhyloXML class __init__ methods -- > it can't figure out where the attributes are coming from. Some of the "apparent false positives" I was ignoring related to the iterator classes in Bio.SeqIO; again this seems to be valid code which pylint can't cope with. We may want to follow up on this (it could be a bug in pylint?). That said, if you can think of a cleaner way to code your bits, that might be advantageous for long term maintenance. 
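One possible shape for a "cleaner way", sketched here under assumptions: the mixin declares the members it expects ('root' and 'is_terminal', as described in the quoted text) as stubs that raise NotImplementedError, so a reader or a static checker can see where they are meant to come from. The class names TreeMixinSketch and Node, the 'children' attribute and the count_terminals method are hypothetical illustrations, not the real Bio.Phylo code.

```python
class TreeMixinSketch(object):
    """Hypothetical mixin, NOT the real Bio.Phylo.BaseTree.TreeMixin.

    The members the mixin relies on are declared here as stubs, so the
    requirement on the host class is visible in one place.
    """

    @property
    def root(self):
        raise NotImplementedError("host class must provide 'root'")

    def is_terminal(self):
        raise NotImplementedError("host class must provide 'is_terminal()'")

    def count_terminals(self):
        """Count leaf nodes using only the declared members plus 'children'."""
        stack = [self.root]
        count = 0
        while stack:
            node = stack.pop()
            if node.is_terminal():
                count += 1
            else:
                stack.extend(node.children)
        return count


class Node(TreeMixinSketch):
    """Hypothetical host class that fulfils the mixin's requirements."""

    def __init__(self, children=()):
        self.children = list(children)

    @property
    def root(self):
        # In this toy example every node acts as the root of its own subtree.
        return self

    def is_terminal(self):
        return not self.children
```

This keeps the mixin usable on older Python versions; whether it actually silences the specific pylint messages listed earlier is untested here.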
Maybe just add a TODO comment to consider using Abstract Base Classes once we require Python 2.6+ for Biopython (if that looks suitable)? > The last error was real, and I've pushed a fix to the trunk. > Thanks for catching it. Cool. >> One thing this exercise has shown is that we still need >> to do some work on the unit test coverage. > > Agreed. I also added a unit test for get_alignment (finally), > and should get to TreeMixin.prune and .split soon. Then > Bio.Phylo will have essentially 100% unit test coverage. I didn't mean to single out just Bio.Phylo - I meant the whole of Biopython would benefit from more unit tests. In particular, a lot of the "minor" errors pylint helped me fix were in error messages (e.g. wrong variable name used). This means if a user hit the error, rather than the exception we wanted to raise they'd get an error about our message. So, not critical, but it suggests we need more tests to cover the exceptions (as well as the more important tests to cover typical usage). Peter From p.j.a.cock at googlemail.com Wed Feb 24 02:43:48 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 24 Feb 2010 07:43:48 +0000 Subject: [Biopython-dev] Medium/long term plans Message-ID: <320fb6e01002232343s2df80990s96774b44f942e851@mail.gmail.com> Hi all, As mentioned in other recent threads, Brad and I were in Tokyo earlier this month for the DBCLS BioHackathon 2010 (see http://hackathon3.dbcls.jp/ for details). While there, we met up with Michiel for an informal dinner meeting, and discussed some possible plans for Biopython. === Short term action points === Seq object equality, see: http://lists.open-bio.org/pipermail/biopython-dev/2010-February/007351.html Filenames or handles in SeqIO, AlignIO, etc, see: http://lists.open-bio.org/pipermail/biopython-dev/2010-February/007352.html === Medium term action points === Python 3 support. 
With NumPy starting to make serious plans for supporting Python 3 this year, we should be able to look at doing this too. Initially we will continue to focus on Python 2.x, but make more effort to ensure that we can run without issues in the "Python 3 warning mode" available in Python 2.6 (or 2.7 once that is out). Then start to put Biopython through 2to3, and see how we get on. Name space reorganisation for sequences. It would be nice to have the Seq objects, SeqFeature, SeqRecord and probably SeqUtils and SeqIO all under one module name. We may be able to handle this in the short term with two import routes with the old module names discouraged and eventually deprecated. See also the "Code review request for phyloxml branch" thread which covered some of this: http://lists.open-bio.org/pipermail/biopython-dev/2010-January/007215.html === Long term action points === There are things in Biopython that with hindsight we feel have not worked out so well (module naming, alphabets objects) where change may require a break, i.e. a Biopython version two. Should we start a wiki to record points of debate, and get people to list their niggles/faults for consideration? 
Regarding Python 3.x support and a possible Biopython 2.x see also Guido's blog post (there is probably an email version on one of the python mailing lists too): http://www.artima.com/weblogs/viewpost.jsp?thread=227041 Peter From biopython at maubp.freeserve.co.uk Wed Feb 24 06:52:55 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Feb 2010 11:52:55 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <320fb6e00912220808w53485af8s801e5a24666d9627@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com> <320fb6e00912220808w53485af8s801e5a24666d9627@mail.gmail.com> Message-ID: <320fb6e01002240352p622f842bo140b1210d76fde4c@mail.gmail.com> On Tue, Dec 22, 2009 at 4:08 PM, Peter wrote: > > The gzip mode issue is interesting... running on the Mac, > Leopard 10.5, using the Apple provided Python 2.5.2, > looking at a gzipped QUAL file everything is fine: > > Python 2.5.2 (r252:60911, Feb 22 2008, 07:57:53) > [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin > Type "help", "copyright", "credits" or "license" for more information. >>>> import gzip >>>> gzip.open("Quality/example.qual.gz", "r").read() > ... > > Looking at a gzipped FASTA file everything is fine: > ... 
>
> But, there is a problem with my gzipped FASTQ file:
>
> >>> gzip.open("Quality/example.fastq.gz", "r").read()
> '@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n'
> >>> gzip.open("Quality/example.fastq.gz", "rb").read()
> '@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n'
> >>> gzip.open("Quality/example.fastq.gz", "rU").read()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/gzip.py", line 220, in read
>     self._read(readsize)
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/gzip.py", line 292, in _read
>     self._read_eof()
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/gzip.py", line 311, in _read_eof
>     raise IOError, "CRC check failed"
> IOError: CRC check failed
>
> I may have stumbled on a bug in the Python gzip library :(
>

Prompted by a thread on the BioPerl mailing list, I revisited this issue:
http://lists.open-bio.org/pipermail/bioperl-l/2010-February/032359.html

From some cross-platform testing, I always seem to get the CRC error when trying to open this gzipped FASTQ file in universal newline read mode. The FASTA and QUAL files seem fine.
According to the gzip python module's documentation, it uses the zlib module, and you can find the underlying version number like this: >>> import zlib >>> zlib.ZLIB_VERSION '1.2.3' Results from some testing the simple examples above (using Python and the gzip module only): [1] Mac OS X 10.5, Python 2.5.2, GCC 4.0.1, zlib 1.2.3 - fails [2] Linux, Python 2.4.3, GCC 3.4.5, zlib 1.2.1.2 - fails [3] Linux, Python 2.3.4, GCC 3.4.6, zlib 1.2.1.2 - fails [3] Linux, Python 2.6.1, GCC 3.4.6, zlib 1.2.1.2 - fails [4] Linux, Python 2.4.3, GCC 4.1.2, zlib 1.2.3 - fails [4] Linux, Python 2.6.1, GCC 3.4.6, zlib 1.2.3 - fails [5] Windows XP 32bit, Python 2.7a1, MSC v.1500, zlib 1.2.3 - fails [5] Windows XP 32bit, Python 2.6, MSC v.1500, zlib 1.2.3 - fails [5] Windows XP 32bit, Python 2.5.2, MSC v.1310, zlib 1.2.3 - fails [5] Windows XP 32bit, Python 2.4.4, MSC v.1310, zlib 1.2.3 - fails [5] Windows XP 32bit, Python 2.3.5, MSC v.1200, zlib 1.1.4 - fails [1] My mac, [2] Local server, [3] Cluster head, [4] Cluster node, [5] My windows box This tells me that the failure isn't OS specific, and isn't specific to a particular version of Python or zlib. Note that on the Mac and Linux machines where I get the CRC failure in python, the command line tool gunzip can decompress the files fine. If anyone else wants to test this (to confirm I'm not missing anything obvious), you can download the gzipped files from github here: wget http://github.com/peterjc/biopython/raw/index-zip/Tests/Quality/example.qual.gz wget http://github.com/peterjc/biopython/raw/index-zip/Tests/Quality/example.fasta.gz wget http://github.com/peterjc/biopython/raw/index-zip/Tests/Quality/example.fastq.gz Maybe this mode isn't fully supported in gzip? I think that provided we assume that any gzipped text file will use Unix new lines, we don't need to worry about this. 
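If the Unix-newline assumption turned out not to hold, one fallback would be to avoid universal newline mode in gzip altogether: read the decompressed bytes in binary mode and normalise the line endings by hand. The following is a minimal sketch of that idea, not code from the index-zip branch; the helper names normalise_newlines and read_gzipped_text are made up for illustration, and it reads the whole file into memory, so it would not suit very large files.

```python
import gzip


def normalise_newlines(data):
    """Convert Windows (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    # Replace CRLF first so that a CRLF pair does not become two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def read_gzipped_text(path):
    """Decompress in plain binary mode, then fix the newlines afterwards."""
    handle = gzip.open(path, "rb")
    try:
        return normalise_newlines(handle.read())
    finally:
        handle.close()
```

Because the translation happens after decompression, the compressed stream itself is never touched.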
Peter From biopython at maubp.freeserve.co.uk Wed Feb 24 07:00:18 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Feb 2010 12:00:18 +0000 Subject: [Biopython-dev] Running pylint over Biopython In-Reply-To: <6d941f121002230956h582acc11r59cb18bca8d2f727@mail.gmail.com> References: <320fb6e01002230943g474dc673x43f7df120e679d91@mail.gmail.com> <6d941f121002230956h582acc11r59cb18bca8d2f727@mail.gmail.com> Message-ID: <320fb6e01002240400k11764b2al2438d5381ed335c4@mail.gmail.com> 2010/2/23 Tiago Antão : > This comes at a good time; I've actually been making changes to the > code (the genepop parser is not able to handle big files and I've > had quite a few complaints about that). It seems to be 2.6 related or > so, because I've detected the Config problem myself. I will correct > this next week (this week is _impossible_), along with an update to > the genepop parser to support big files. Sounds good :) Peter From biopython at maubp.freeserve.co.uk Wed Feb 24 07:37:20 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Feb 2010 12:37:20 +0000 Subject: [Biopython-dev] test_PhyloXML.py failing on Windows Message-ID: <320fb6e01002240437s4be4aef1q82bee9141acfb501@mail.gmail.com> Hi Eric, Do you have access to a Windows machine for testing? There seem to be two issues in the PhyloXML tests (tested on Python 2.5, 2.6 and 2.7a1 on Windows XP): Count and confirm the number of tags in each example XML file. ... FAIL Round-trip parsing and serialization of apaf.xml. ... ERROR Round-trip parsing and serialization of bcl_2.xml. ... ERROR Round-trip parsing and serialization of o_tol_332_d_dollo.xml. ... ERROR Round-trip parsing and serialization of made_up.xml. ... ERROR Round-trip parsing and serialization of phyloxml_examples.xml. ... ERROR The tag count error I don't immediately understand: ====================================================================== FAIL: Count and confirm the number of tags in each example XML file. 
---------------------------------------------------------------------- Traceback (most recent call last): File "C:\repositories\biopython_official\Tests\test_PhyloXML.py", line 56, in test_dump_tags self.assertEquals(len(output.readlines()), count) AssertionError: 301 != 289 ---------------------------------------------------------------------- The rest all fail in _stash_rewrite_and_call where something about your file renaming is failing. It looks like you deliberately move some of your example XML files to a temp filename during the test and then move them back. This seems risky (e.g. if the test suite is stopped mid way). Can you rework this to write the output to a temp file or perhaps better yet a StringIO handle? The errors look like this: ====================================================================== ERROR: Round-trip parsing and serialization of apaf.xml. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_PhyloXML.py", line 561, in test_apaf (TreeTests, ['test_DomainArchitecture']), File "test_PhyloXML.py", line 546, in _stash_rewrite_and_call os.rename(fname, fname + '~') WindowsError: [Error 183] Cannot create a file when that file already exists ---------------------------------------------------------------------- Thanks, Peter From bugzilla-daemon at portal.open-bio.org Wed Feb 24 10:38:13 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 24 Feb 2010 10:38:13 -0500 Subject: [Biopython-dev] [Bug 2998] Document need XCode with 10.4 SDK for Mac OS In-Reply-To: Message-ID: <201002241538.o1OFcDJ4005667@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2998 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-24 10:38 EST ------- Just to add a note, on Snow Leopard Apple provides python 2.5 (default, 32bit only) and python 2.6 (supports 64 bit). 
I suspect if you install Biopython under python 2.6 you won't need the 10.4 SDK... something to check? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From rjalves at igc.gulbenkian.pt Wed Feb 24 11:07:01 2010 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Wed, 24 Feb 2010 16:07:01 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <320fb6e01002240352p622f842bo140b1210d76fde4c@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com> <320fb6e00912220808w53485af8s801e5a24666d9627@mail.gmail.com> <320fb6e01002240352p622f842bo140b1210d76fde4c@mail.gmail.com> Message-ID: <4B854EA5.7050100@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Quoting Peter on 02/24/2010 11:52 AM: > Maybe this mode isn't fully supported in gzip? I think that provided we > assume that any gzipped text file will use Unix new lines, we don't need > to worry about this. Your example puzzled me. I did a few more tests with the files you pointed out. Turns out that the fastq file is 'badly' read even on normal open 'Universal' mode. 
This doesn't happen on the other files: Python 2.6.4 [GCC 4.4.1] Linux >>> open('example.fastq.gz', 'rb').read() == open('example.fastq.gz', 'rU').read() False >>> open('example.fasta.gz', 'rb').read() == open('example.fasta.gz', 'rU').read() True >>> open('example.qual.gz', 'rb').read() == open('example.qual.gz', 'rU').read() True In particular the character in fault seems to be: >>> (open('example.fastq.gz', 'rb').read()[145], open('example.fastq.gz', 'rU').read()[145]) ('\r', '\n') This is the only thing that changed. After going a little over the content of the file, I found this workaround: $ gunzip example.fastq.gz && echo >> example.fastq && gzip example.fastq Which simply adds a new empty line to the end of the file. >>> open('example.fastq.gz', 'rb').read() == open('example.fastq.gz', 'rU').read() True After this I also looked into python3 (3.1.1) just in case they fixed it already and apparently they did. See for yourself: This was tested in Python-3.1.1 from within blender2.5, (apologies for that, it was the only python3 version I had around). >>> open('example.fastq.gz','rb').read() == open('example.fastq.gz','rU').read() Traceback (most recent call last): (...) UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: unexpected code byte Seems like I need to force binary mode... >>> open('example.fastq.gz','rb').read() == open('example.fastq.gz','rbU').read() True Success! 
>>> import gzip >>> gzip.open('example.fastq.gz','rb').read() b'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n at EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n at EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n' >>> gzip.open('example.fastq.gz','rU').read() b'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n at EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n at EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n' >>> gzip.open('example.fastq.gz','rbU').read() b'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n at EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n at EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n' And everything works as expected. So unless the blender devs changed python to fix this bug, this has been fixed in python3. Should this go upstream? 
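The single changed byte reported above (a '\r' in the compressed file read back as '\n') seems consistent with newline translation being applied to the compressed bytes rather than to the decompressed text. Here is a small sketch of why that would break decompression, assuming Python 3.8 or later (for the mtime argument of gzip.compress); the record text and the function names are made up for illustration.

```python
import gzip
import zlib


def find_payload_with_cr():
    """Search for a small payload whose gzip bytes happen to contain 0x0D."""
    for i in range(10000):
        payload = ("@read%d\nACGT\n+\n;;;;\n" % i).encode("ascii")
        compressed = gzip.compress(payload, mtime=0)
        if b"\r" in compressed:
            return payload, compressed
    raise RuntimeError("no suitable payload found")


def survives_newline_translation(compressed):
    """Mimic universal newline translation on the raw compressed bytes."""
    damaged = compressed.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    try:
        gzip.decompress(damaged)
    except (OSError, EOFError, zlib.error):
        # A CRC or length mismatch, a truncated stream or an invalid
        # deflate stream all mean the data did not survive.
        return False
    return True
```

A file whose compressed form contains no 0x0D byte at all would pass through such a translation unchanged, which may be why only one of the three example files was affected.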
- -- Renato -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkuFTqAACgkQYh11EUYTX9TXbgCgmBDKrrjL6Eue8qRfgs2ydAUQ 11kAnR0beVQDLP4ldBcd2RFfJ5Q+Opo6 =MLu3 -----END PGP SIGNATURE----- From biopython at maubp.freeserve.co.uk Wed Feb 24 11:48:58 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Feb 2010 16:48:58 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <4B854EA5.7050100@igc.gulbenkian.pt> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com> <320fb6e00912220808w53485af8s801e5a24666d9627@mail.gmail.com> <320fb6e01002240352p622f842bo140b1210d76fde4c@mail.gmail.com> <4B854EA5.7050100@igc.gulbenkian.pt> Message-ID: <320fb6e01002240848t37b0bea4jc8488440ab266ac6@mail.gmail.com> On Wed, Feb 24, 2010 at 4:07 PM, Renato Alves wrote: > > After this I also looked into python3 (3.1.1) just in case they fixed it > already and apparently they did. See for yourself: You seem to be right, I tried this on Windows using Python 3.0.1 and 3.1.1, C:\repositories\biopython_pjc\Tests>c:\python30\python Python 3.0.1 (r301:69561, Feb 13 2009, 20:04:18) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. 
>>> import gzip >>> gzip.open("Quality\example.fastq.gz", "r").read() b'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;; 88\n at EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;; 3;83\n at EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7; 393333\n' >>> gzip.open("Quality\example.fastq.gz", "rb").read() b'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;; 88\n at EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;; 3;83\n at EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7; 393333\n' >>> gzip.open("Quality\example.fastq.gz", "rU").read() b'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;; 88\n at EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;; 3;83\n at EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7; 393333\n' C:\repositories\biopython_pjc\Tests>c:\python31\python Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. 
>>> import gzip >>> gzip.open("Quality\example.fastq.gz", "r").read() b'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;; 88\n at EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;; 3;83\n at EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7; 393333\n' >>> gzip.open("Quality\example.fastq.gz", "rb").read() b'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;; 88\n at EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;; 3;83\n at EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7; 393333\n' >>> gzip.open("Quality\example.fastq.gz", "rU").read() b'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;; 88\n at EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;; 3;83\n at EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7; 393333\n' So this does look like a Python 2.x bug which has been fixed in Python 3.x, and we should probably report this (after searching to see if it is a known issue). However, even if it is fixed in Python 2.6.x and 2.7.x, it won't get fixed in older versions like Python 2.4 or 2.5. 
Peter From biopython at maubp.freeserve.co.uk Wed Feb 24 12:03:09 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Feb 2010 17:03:09 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <320fb6e01002240848t37b0bea4jc8488440ab266ac6@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com> <320fb6e00912220808w53485af8s801e5a24666d9627@mail.gmail.com> <320fb6e01002240352p622f842bo140b1210d76fde4c@mail.gmail.com> <4B854EA5.7050100@igc.gulbenkian.pt> <320fb6e01002240848t37b0bea4jc8488440ab266ac6@mail.gmail.com> Message-ID: <320fb6e01002240903m52629576vf85f428f68d32d15@mail.gmail.com> Hi all, I've updated my branch to cope with gzipped FASTQ files, tested on Windows XP, Mac OS X Snow Leopard, and Linux: http://github.com/peterjc/biopython/tree/index-zip This works by just opening gzipped files in default mode - which seems to be fine with the examples (FASTA, QUAL and FASTQ) where the text file in the archive uses Unix new line entries. While this may be a good solution, we should test on gzipped files containing Windows new lines too. Plus of course, try non-gzipped compression. And very large files. etc. 
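A possible way to build such test files, sketched for Python 3 only, where gzip.open accepts a text mode ("rt") that applies newline translation after decompression. The record content, the file names and the function names are made up for illustration and are not taken from the Biopython test suite.

```python
import gzip
import os
import tempfile

# A single made-up FASTQ record, one list item per line.
RECORD_LINES = ["@read1", "ACGT", "+", ";;;;"]


def write_gzipped(path, newline):
    """Write the record to a gzipped file using the given line ending."""
    payload = newline.join(RECORD_LINES) + newline
    with gzip.open(path, "wb") as handle:
        handle.write(payload.encode("ascii"))


def read_lines(path):
    """Read the file back in text mode, stripping the translated newline."""
    with gzip.open(path, "rt", encoding="ascii") as handle:
        return [line.rstrip("\n") for line in handle]


def check_all_styles():
    """Round-trip the record through Unix, Windows and old Mac newlines."""
    results = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        for name, newline in [("unix", "\n"), ("windows", "\r\n"), ("oldmac", "\r")]:
            path = os.path.join(tmpdir, name + ".fastq.gz")
            write_gzipped(path, newline)
            results[name] = read_lines(path)
    return results
```

On Python 2 the same fixtures could still be written this way, but the reading side would need a different approach since "rt" mode is not available there.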
Peter From eric.talevich at gmail.com Wed Feb 24 12:03:31 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 24 Feb 2010 12:03:31 -0500 Subject: [Biopython-dev] test_PhyloXML.py failing on Windows In-Reply-To: <320fb6e01002240437s4be4aef1q82bee9141acfb501@mail.gmail.com> References: <320fb6e01002240437s4be4aef1q82bee9141acfb501@mail.gmail.com> Message-ID: <3f6baf361002240903q2b395fd1qa5426a130b5b3d61@mail.gmail.com> On Wed, Feb 24, 2010 at 7:37 AM, Peter wrote: > Hi Eric, > > Do you have access to a Windows machine for testing? There > seem to be two issues in the PhyloXML tests (tested on > Python 2.5, 2.6 and 2.7a1 on Windows XP): > I'll have access to Windows XP this weekend, but I think I can probably fix these tests before then. ====================================================================== > FAIL: Count and confirm the number of tags in each example XML file. > ---------------------------------------------------------------------- > This was an early sanity check for parsing XML with ElementTree, and while I don't see a good reason for the number of lines to be different between OSes (line endings?), the test isn't Biopython-specific anyway. I'll just delete it. ====================================================================== > ERROR: Round-trip parsing and serialization of apaf.xml. > ---------------------------------------------------------------------- > Apparently Windows doesn't like renaming a file to replace another existing file. To fix this error asap I'll call os.remove before the rename, but you're right that these tests should be rewritten to use named temp files or StringIO. 
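As a rough illustration of the in-memory alternative, the round trip can be done without renaming anything on disk. This sketch uses plain xml.etree.ElementTree rather than the Bio.Phylo parser and writer, and the function names and the sample XML are made up; the real tests would substitute their own parse and write calls.

```python
import io
import xml.etree.ElementTree as ET

# A tiny made-up document standing in for one of the example XML files.
SAMPLE = (
    "<phylogeny><clade>"
    "<clade><name>A</name></clade>"
    "<clade><name>B</name></clade>"
    "</clade></phylogeny>"
)


def roundtrip(tree):
    """Serialise a parsed tree to an in-memory buffer and parse it again."""
    buffer = io.BytesIO()
    tree.write(buffer, encoding="utf-8")
    buffer.seek(0)
    return ET.parse(buffer)


def tag_counts(tree):
    """Count how many times each tag occurs in the tree."""
    counts = {}
    for elem in tree.iter():
        counts[elem.tag] = counts.get(elem.tag, 0) + 1
    return counts
```

Since nothing is moved or overwritten on disk, an interrupted test run cannot leave the example files in a renamed state.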
(I needed to trick unittest into re-running the parser tests on re-written files and this sufficed last summer.) -Eric From rjalves at igc.gulbenkian.pt Wed Feb 24 12:13:41 2010 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Wed, 24 Feb 2010 17:13:41 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <320fb6e01002240848t37b0bea4jc8488440ab266ac6@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com> <320fb6e00912220808w53485af8s801e5a24666d9627@mail.gmail.com> <320fb6e01002240352p622f842bo140b1210d76fde4c@mail.gmail.com> <4B854EA5.7050100@igc.gulbenkian.pt> <320fb6e01002240848t37b0bea4jc8488440ab266ac6@mail.gmail.com> Message-ID: <4B855E45.9080708@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 >So this does look like a Python 2.x bug which has been fixed in Python >3.x, and we should probably report this (after searching to see if it >is a known issue). The closest I could find is: http://bugs.python.org/issue5148 But it's also on gzip.open(), not plain open(). >However, even if it is fixed in Python 2.6.x and 2.7.x, it won't get >fixed in older versions like Python 2.4 or 2.5. Do you think raising a warning if the 'U' mode is explicitly passed would be a reasonable solution for older Python versions? 
Renato -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkuFXkMACgkQYh11EUYTX9TKNACfXIj2p5OTRetf9cWU/ppV8oWb CPcAoIJkkNfHj6AeLAxl2/FtSH3+7UR5 =W7wg -----END PGP SIGNATURE----- From biopython at maubp.freeserve.co.uk Wed Feb 24 12:28:58 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Feb 2010 17:28:58 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <4B855E45.9080708@igc.gulbenkian.pt> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com> <320fb6e00912220808w53485af8s801e5a24666d9627@mail.gmail.com> <320fb6e01002240352p622f842bo140b1210d76fde4c@mail.gmail.com> <4B854EA5.7050100@igc.gulbenkian.pt> <320fb6e01002240848t37b0bea4jc8488440ab266ac6@mail.gmail.com> <4B855E45.9080708@igc.gulbenkian.pt> Message-ID: <320fb6e01002240928h54519628pfb91dd1bf8d9c1f7@mail.gmail.com> On Wed, Feb 24, 2010 at 5:13 PM, Renato Alves wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > >>So this does look like a Python 2.x bug which has been fixed in Python >>3.x, and we should probably report this (after searching to see if it >>is a known issue). > > The closest I could find is: http://bugs.python.org/issue5148 > > But it's also on gzip.open(), not plain open(). It is gzip.open() that we have a problem with, open() is fine. It does look like http://bugs.python.org/issue6759 and/or the linked bug http://bugs.python.org/issue6759 cover this issue. Thanks for finding them. >>However, even if it is fixed in Python 2.6.x and 2.7.x, it won't get >>fixed in older versions like Python 2.4 or 2.5. 
> > Do you raising a warning if the 'U' mode is explicitly passed > would be a reasonable solution for older python versions? Are you asking about what I would like Python to do? I would like gzip.open() to support universal newline mode. For Biopython's index function we currently don't allow the user to specify the mode at all - the code decides this based on the file format (SFF files must be binary, for text files I use universal newline mode). Peter From rjalves at igc.gulbenkian.pt Wed Feb 24 13:25:04 2010 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Wed, 24 Feb 2010 18:25:04 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <320fb6e01002240928h54519628pfb91dd1bf8d9c1f7@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com> <320fb6e00912220808w53485af8s801e5a24666d9627@mail.gmail.com> <320fb6e01002240352p622f842bo140b1210d76fde4c@mail.gmail.com> <4B854EA5.7050100@igc.gulbenkian.pt> <320fb6e01002240848t37b0bea4jc8488440ab266ac6@mail.gmail.com> <4B855E45.9080708@igc.gulbenkian.pt> <320fb6e01002240928h54519628pfb91dd1bf8d9c1f7@mail.gmail.com> Message-ID: <4B856F00.7030201@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > For Biopython's index function we currently don't allow the > user to specify the mode at all - the code decides this based > on the file format (SFF files must be binary, for text files I use > universal newline mode). For some reason I thought the user could set the mode. Anyway, thanks for the clarification. 
Renato -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkuFbvwACgkQYh11EUYTX9QM6gCeK4aMVBoZWZmI+SNccwSd9qle xv8AnA8gZLQn1m8bXMT9Dl5YIRM4akC2 =jQ9l -----END PGP SIGNATURE----- From bugzilla-daemon at portal.open-bio.org Thu Feb 25 08:35:04 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 25 Feb 2010 08:35:04 -0500 Subject: [Biopython-dev] [Bug 2837] Reading Roche 454 SFF sequence read files in Bio.SeqIO In-Reply-To: Message-ID: <201002251335.o1PDZ4qn013099@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2837 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-25 08:35 EST ------- Marking as fixed since I recently merged this code into the trunk. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Feb 25 09:29:19 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 25 Feb 2010 14:29:19 +0000 Subject: [Biopython-dev] test_PhyloXML.py failing on Windows In-Reply-To: <3f6baf361002240903q2b395fd1qa5426a130b5b3d61@mail.gmail.com> References: <320fb6e01002240437s4be4aef1q82bee9141acfb501@mail.gmail.com> <3f6baf361002240903q2b395fd1qa5426a130b5b3d61@mail.gmail.com> Message-ID: <320fb6e01002250629te597954v46308838faca607e@mail.gmail.com> On Wed, Feb 24, 2010 at 5:03 PM, Eric Talevich wrote: > > Apparently Windows doesn't like renaming a file to replace another existing > file. To fix this error asap I'll call os.remove before the rename, ... I had to add other similar check before it would run on my machine. 
> but you're right that these tests should be rewritten to use named temp > files or StringIO. (I needed to trick unittest into re-running the parser > tests on re-written files and this sufficed last summer) OK, something for the TODO list. Should we file a bug to remind us? Peter From biopython at maubp.freeserve.co.uk Fri Feb 26 08:09:46 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Feb 2010 13:09:46 +0000 Subject: [Biopython-dev] ImportWarning is new on Python 2.5 Message-ID: <320fb6e01002260509k65e9f9acm494c80a76af8d25e@mail.gmail.com> Hi Eric, I've just been running the test suite on Python 2.4 (on CentOS 5.4) and noticed you use ImportWarning (which was added in Python 2.5) in Bio/Phylo/PhyloXMLIO.py Although we are going to phase out support for Python 2.4, we still need to keep things compatible for now. Are you happy to switch this to a different warning for now, and add a TODO comment to put it back to an ImportWarning once we drop Python 2.4 support? Thanks Peter From eric.talevich at gmail.com Fri Feb 26 09:56:53 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 26 Feb 2010 09:56:53 -0500 Subject: [Biopython-dev] ImportWarning is new on Python 2.5 In-Reply-To: <320fb6e01002260509k65e9f9acm494c80a76af8d25e@mail.gmail.com> References: <320fb6e01002260509k65e9f9acm494c80a76af8d25e@mail.gmail.com> Message-ID: <3f6baf361002260656n581a526dtc4a5374640f546ed@mail.gmail.com> On Fri, Feb 26, 2010 at 8:09 AM, Peter wrote: > Hi Eric, > > I've just been running the test suite on Python 2.4 (on CentOS 5.4) > and noticed you use ImportWarning (which was added in Python 2.5) in > Bio/Phylo/PhyloXMLIO.py > > Although we are going to phase out support for Python 2.4, we still > need to keep things compatible for now. > > Are you happy to switch this to a different warning for now, and add a > TODO comment to put it back to an ImportWarning once we drop Python > 2.4 support? 
> Sure, I'll switch it to a generic Warning for now and leave a comment. I doubt the type of the warning is very important for most uses. -Eric From bugzilla-daemon at portal.open-bio.org Fri Feb 26 11:26:10 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 26 Feb 2010 11:26:10 -0500 Subject: [Biopython-dev] [Bug 2553] Adding SeqRecord objects to an alignment (append or extend) In-Reply-To: Message-ID: <201002261626.o1QGQA1g028222@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2553 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-26 11:26 EST ------- I've started a possible implementation of an improved multiple sequence alignment object on a github branch: http://github.com/peterjc/biopython/commits/alignment-obj This already handles: Bug 2553 - Adding SeqRecord objects to an alignment (append or extend) Bug 2554 - Creating an Alignment from a list of SeqRecord objects I also plan to cover: Bug 2551 - Adding advanced __getitem__ e.g. align[1:2,5:-5] Bug 2552 - Adding alignments -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
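For the ImportWarning exchange above (Peter's request and Eric's reply), a minimal sketch of a compatibility fallback might look like the following. The names ImportWarningCompat and warn_missing_parser and the warning text are made up for illustration and are not taken from Bio/Phylo/PhyloXMLIO.py; what Eric actually proposed was simply a generic Warning plus a TODO comment.

```python
import warnings

# ImportWarning only exists from Python 2.5 onwards, so fall back to the
# generic Warning class on older interpreters (hypothetical helper name).
try:
    ImportWarningCompat = ImportWarning
except NameError:
    # TODO: use ImportWarning directly once Python 2.4 support is dropped.
    ImportWarningCompat = Warning


def warn_missing_parser(module_name):
    """Warn that an optional module could not be imported (illustrative only)."""
    warnings.warn("Could not import %s; using a fallback parser instead"
                  % module_name, ImportWarningCompat)
```

Note that ImportWarning is ignored by Python's default warning filters, so switching to a plain Warning may also change whether users see the message.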
From bugzilla-daemon at portal.open-bio.org Fri Feb 26 11:26:43 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 26 Feb 2010 11:26:43 -0500 Subject: [Biopython-dev] [Bug 2554] Creating an Alignment from a list of SeqRecord objects In-Reply-To: Message-ID: <201002261626.o1QGQhNF028283@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2554 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-26 11:26 EST ------- I've started a possible implementation of an improved multiple sequence alignment object on a github branch: http://github.com/peterjc/biopython/commits/alignment-obj This already handles: Bug 2553 - Adding SeqRecord objects to an alignment (append or extend) Bug 2554 - Creating an Alignment from a list of SeqRecord objects I also plan to cover: Bug 2551 - Adding advanced __getitem__ e.g. align[1:2,5:-5] Bug 2552 - Adding alignments -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Feb 26 12:28:31 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 26 Feb 2010 12:28:31 -0500 Subject: [Biopython-dev] [Bug 2552] Adding alignments In-Reply-To: Message-ID: <201002261728.o1QHSVob029960@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2552 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-26 12:28 EST ------- I've started a possible implementation of an improved multiple sequence alignment object on a github branch: http://github.com/peterjc/biopython/commits/alignment-obj This now handles: Bug 2552 - Adding alignments (this bug) Bug 2553 - Adding SeqRecord objects to an alignment (append or extend) Bug 2554 - Creating an Alignment from a list of SeqRecord objects I also plan to cover: Bug 2551 - Adding advanced __getitem__ e.g. align[1:2,5:-5] -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sat Feb 27 13:24:03 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 27 Feb 2010 13:24:03 -0500 Subject: [Biopython-dev] [Bug 3016] New: Change WriterTests in test_PhyloXML.py to use StringIO or temp files Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3016 Summary: Change WriterTests in test_PhyloXML.py to use StringIO or temp files Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: minor Priority: P2 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: eric.talevich at gmail.com The method _stash_rewrite_and_call currently parses each of the example phyloXML files, renames the parsed file to [filename]~, writes out another copy (from the parsed data structure) using the original filename, re-runs the suite of parser tests on the rewritten files, and finally renames the stashed copies back to the original filenames. This is protected by a try-finally clause, but could still fail to restore the original test files if the Python interpreter is interrupted/killed. Moreover, the design is a little pathological, and could be hard to maintain or extend later. Redesign the writer tests to rewrite and test a copy of each originals at some location other than the original filename. Ideally, use StringIO to store the copy; a named temporary file (see tempfile module) is also acceptable. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
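As a rough illustration of the redesign requested in Bug 3016, the sketch below round-trips a small XML document through an in-memory buffer instead of renaming files on disk. It uses only xml.etree.ElementTree and io.BytesIO from the standard library as stand-ins; the function names and the sample document are hypothetical, and the real tests would presumably call the Bio.Phylo PhyloXML parser and writer instead.

```python
import io
import xml.etree.ElementTree as ET

# Tiny made-up document standing in for one of the example phyloXML files.
SAMPLE = b"<phyloxml><phylogeny><clade><name>A</name></clade></phylogeny></phyloxml>"


def count_tags(tree):
    """Count the elements in a parsed tree (stand-in for the parser tests)."""
    return sum(1 for _ in tree.iter())


def round_trip(xml_bytes):
    """Parse, write to an in-memory buffer, then parse the rewritten copy."""
    original = ET.parse(io.BytesIO(xml_bytes))
    buffer = io.BytesIO()
    original.write(buffer)  # the source file on disk is never renamed or replaced
    buffer.seek(0)
    rewritten = ET.parse(buffer)
    return original, rewritten
```

Because the rewritten copy only ever lives in the buffer, an interrupted test run cannot leave the example files in a half-restored state, which is the failure mode the bug report describes.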
From bugzilla-daemon at portal.open-bio.org Mon Feb 1 00:21:51 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 31 Jan 2010 19:21:51 -0500 Subject: [Biopython-dev] [Bug 3004] PSL alignment format parsing in Bio.AlignIO In-Reply-To: Message-ID: <201002010021.o110Lp9e009311@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3004 ------- Comment #2 from forgetta at gmail.com 2010-01-31 19:21 EST ------- Now on github: http://github.com/vforget/PyBLATPSL Vince -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Feb 1 13:35:11 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 1 Feb 2010 08:35:11 -0500 Subject: [Biopython-dev] [Bug 2294] Writing GenBank files with Bio.SeqIO In-Reply-To: Message-ID: <201002011335.o11DZBcJ029190@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2294 ------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-01 08:35 EST ------- (In reply to comment #16) > > > * Writing references > > Not done yet, but for my personal needs this is low priority. Reference output in GenBank format from SeqIO just committed on github, http://github.com/biopython/biopython/commit/42707bda738d0239a9ff85a39c39c89c8024549d > > * Extending to cover writing EBML files > > Not done yet, but should be comparatively straight forward. Let's track this > possible enhancement on a separate bug. EMBL output in SeqIO was done a while ago and was included in Biopython 1.52 (although we don't yet write references in EMBL output). Things still to do on GenBank output include better handling of the LOCUS line, such as the data division. See also Bug 2578 for the molecule type. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Feb 1 14:43:41 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 1 Feb 2010 09:43:41 -0500 Subject: [Biopython-dev] [Bug 2294] Writing GenBank files with Bio.SeqIO In-Reply-To: Message-ID: <201002011443.o11EhfAT031724@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2294 ------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-01 09:43 EST ------- (In reply to comment #17) > > EMBL output in SeqIO was done a while ago and was included in Biopython 1.52 > (although we don't yet write references in EMBL output). References in EMBL output implemented now: http://github.com/biopython/biopython/commit/370e02053a45aec6209bd826aebab7bfc29d7e84 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Feb 2 18:37:25 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 2 Feb 2010 18:37:25 +0000 Subject: [Biopython-dev] Getting raw unparsed records with SeqIO? Message-ID: <320fb6e01002021037p75d24f43w84837c426b057093@mail.gmail.com> Hi all, Over on enhancement Bug 3000, Martin was asking about getting raw unparsed strings for each record in a sequence file: http://bugzilla.open-bio.org/show_bug.cgi?id=3000 This makes sense for sequential files like FASTA and GenBank, but not for interlaced files like PHYLIP, and has less obvious uses when there is any kind of header or footer (e.g. XML or SFF files). The particular example Martin gave was selecting a subset of records in a large GenBank file (I've done this myself in the past). While this can be done via Bio.SeqIO, the process of parsing the data into a SeqRecord and saving it again is lossy. While there is room for improvement. 
For this particular example, I suggested Martin use the "old" iterator class in Bio.GenBank. In general things like white space and wrapping mean that a SeqIO parse/write cannot guarantee a 100% unaltered round trip, and will also be slower than using the raw record as a string. Martin suggested adding an optional argument to the parse function. I'm not sure this is a good API choice, as it would dramatically alter the return values. Perhaps we could have a new iterator function in Bio.SeqIO for suitable sequential files only which returns a series of strings, one for each record, unmodified? Either way I don't see how this would be used - surely the user would need to do some basic analysis of each raw record to decide how to process it? In this example, they would need to extract the ID/accession to see if they want to output the record or not. While parsing the record into a SeqRecord may not be needed, in most cases the record identifier would be very useful - and this has some big overlaps with the Bio.SeqIO.index() code which already breaks up files into records and extracts their identifiers. i.e. A top level Bio.SeqIO function to iterate over a file returning tuples of the record identifier and the raw record as strings *could* be useful. Implementing this nicely would mean re-factoring Bio.SeqIO.index() extensively. Another solution to this task (extracting the raw GenBank records from a large file) would seem to be to extend the Bio.SeqIO.index functionality. The patch I'm about to attach to Bug 3000 adds a new "get_raw" method to the dictionary like object we return. Unlike the __getitem__ and get methods which return a SeqRecord this just gives the raw string. Note that I haven't implemented this for all the index support file formats yet, and this has had only very basic testing. Writing this email took longer than writing the code. However, I hope it illustrates the idea enough for a discussion. 
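The idea of handing back raw record text from an index can be sketched without Biopython at all. The following is only an illustration of the general offset-index technique, not the patch attached to Bug 3000: the function names are hypothetical, it assumes GenBank-style records that start with a LOCUS line and end with a "//" line, and it keys records on the LOCUS name rather than on the accession and version used by Bio.SeqIO.index().

```python
import io


def index_raw_records(handle):
    """Map each record name to (offset, length) in a binary GenBank-like handle.

    Assumes every record starts with a LOCUS line and ends with a '//' line.
    """
    offsets = {}
    start = None
    record_id = None
    while True:
        position = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b"LOCUS"):
            start = position
            record_id = line.split()[1].decode("ascii")
        elif line.startswith(b"//") and record_id is not None:
            offsets[record_id] = (start, handle.tell() - start)
            record_id = None
    return offsets


def get_raw(handle, offsets, record_id):
    """Return the unmodified bytes of one record using the stored offsets."""
    start, length = offsets[record_id]
    handle.seek(start)
    return handle.read(length)
```

Working on a binary handle keeps the offsets valid as byte positions and returns the record exactly as stored, including its original line endings and wrapping.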
As an example how the index function could be used with this patch: >>> from Bio import SeqIO >>> data = SeqIO.index("cor6_6.gb", "gb") >>> data.keys() ['L31939.1', 'AJ237582.1', 'X62281.1', 'AF297471.1', 'X55053.1', 'M81224.1'] >>> print data.get_raw("X62281.1") LOCUS ATKIN2 880 bp DNA PLN 23-JUL-1992 DEFINITION A.thaliana kin2 gene. ACCESSION X62281 ... // What are people's thoughts on this? Peter From bugzilla-daemon at portal.open-bio.org Tue Feb 2 18:40:07 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 2 Feb 2010 13:40:07 -0500 Subject: [Biopython-dev] [Bug 3000] Could SeqIO.parse() store the whole, unparsed multiline entry? In-Reply-To: Message-ID: <201002021840.o12Ie7pO015898@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3000 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-02 13:40 EST ------- Created an attachment (id=1436) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1436&action=view) Adds a get_raw method to the dictionaries returned by Bio.SeqIO.index() Outline implementation of an alternative proposal, allowing access to the raw text for each record via the Bio.SeqIO.index() dictionary like objects. See discussion here: http://lists.open-bio.org/pipermail/biopython-dev/2010-February/007301.html -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From krother at rubor.de Wed Feb 3 10:29:04 2010 From: krother at rubor.de (Kristian Rother) Date: Wed, 3 Feb 2010 11:29:04 +0100 Subject: [Biopython-dev] report: what happens on 'from Bio import PDB'? 
In-Reply-To: <201002021840.o12Ie7pO015898@portal.open-bio.org> References: <201002021840.o12Ie7pO015898@portal.open-bio.org> Message-ID: <18fbb8f40f6ec6efe3d5dffff68aaa57-EhVcX1xCQgFaRwICBxEAXR0wfgFLV15YQUBGAEFfUC9ZUFgWXVpyH1RXX0FdQU1tXlhfRVlcWgFbWw==-webmailer2@server03.webmailer.hosteurope.de> Hi, I'm currently checking what my application is using its memory for (because it uses way too much for non-Biopython related things). However, as soon as the simple command from Bio import PDB is executed, these are the objects that Python has in memory after running the gc: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 from module:Bio.GenBank.utils 1 from module:Bio.PDB.PDBIO 1 from module:Bio.PropertyManager 1 from module:os 2 2 2 2 2 2 2 2 2 2 from module:Bio.GenBank.LocationParser 2 from module:xml.sax.handler 3 3 3 3 4 5 6 6 6 from module:numpy.ma.extras 7 7 from module:Bio.Alphabet.IUPAC 7 from module:__future__ 8 9 10 13 14 15 16 16 19 27 35 35 36 38 49 56 56 from module:Bio.Alphabet 68 76 91 95 from module:numpy.ma.core 201 203 225 350 351 from module:Bio.Data.CodonTable 360 385 393 407 579 837 1365 2073 3191 3289 4099 11989 19718 total 50912 Hope this is useful ;-) Best Regards, Kristian From lplp90 at gmail.com Wed Feb 3 11:35:49 2010 From: lplp90 at gmail.com (Laura Padioleu) Date: Wed, 3 Feb 2010 12:35:49 +0100 Subject: [Biopython-dev] Multiple alignment - Clustalw etc... Message-ID: <771202501002030335r52f62fccta1b976f9e586d435@mail.gmail.com> On Mon, Mar 30, 2009 at 12:42 PM, Cymon Cox > wrote: >* *>* Hi Folks, *>* *>* this is a demo that i use to create then align my fasta sequences using clustalw. Hope it helps. 
here's the code * >def clustal(list_struc): > > > hash_table={} > for i in range (len(list_struc)): > for j in range (i+1,len(list_struc)): > pair=(list_struc[i],list_struc[j]) > hash_table >[pair]=0 > > > for pair in hash_table >.keys(): > fasta_fic=open("fasta.fasta",'w') > for ID in pair: > fasta_fic.write(">"+ID.get_id()+'\n') > > # recuperation des sequences des acides amines > for chain in ID.get_chains(): > ppb = PPBuilder() > > pp = ppb.build_peptides(chain) > # l'ajout des sequences aux fichiers fasta > fasta_fic.write(pp[0].get_sequence().tostring()) > fasta_fic.write('\n') > fasta_fic.close() > cline = ClustalwCommandline(cmd="clustalw", infile="file.fasta") > return_code = subprocess.call(str(cline), shell=(sys.platform!="win32")) > > alignment = AlignIO.read(open("file"+str(nb)+".aln"),"clustal") > > > j=0 > i=0 > for record in alignment: > for amino_acid in record.seq: > if amino_acid == '-': > pass > else: > if amino_acid == alignment[0].seq[j]: > i += 1 > j += 1 > j = 0 > >seq = str(record.seq) > gap_strip = seq.replace('-', '') > percent = 100.0*i/len(seq) > > i=0 > hash_table[pair]=str(percent)+"\t"+str(percent2) > > > return hash_table > >def csv_writer(list_struc): > hash_table=clustal(list_struc) > csv_fic=open("file.csv",'a') > for couple in hash_table.keys(): > csv_fic.write(pari[0].get_id()+"\t"+str(hash_table[pair])+'\n') > csv_fic.close()* Hello, im using python version 2.5 but i can't compile this code correctly what version of python and biopython you are using ? Thanks From chapmanb at 50mail.com Wed Feb 3 12:46:48 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 3 Feb 2010 07:46:48 -0500 Subject: [Biopython-dev] Multiple alignment - Clustalw etc... 
In-Reply-To: <771202501002030335r52f62fccta1b976f9e586d435@mail.gmail.com> References: <771202501002030335r52f62fccta1b976f9e586d435@mail.gmail.com> Message-ID: <20100203124648.GC40046@sobchak.mgh.harvard.edu> Hi Laura; [clustalw example from Cymon] > im using python version 2.5 but i can't compile this code correctly > what version of python and biopython you are using ? We could help more with some additional information. Could you copy and paste the error message you are seeing? Brad From cy at cymon.org Wed Feb 3 12:48:49 2010 From: cy at cymon.org (Cymon Cox) Date: Wed, 3 Feb 2010 12:48:49 +0000 Subject: [Biopython-dev] Multiple alignment - Clustalw etc... In-Reply-To: <7265d4f1002030412l1258237jf50ff37845e7c5a5@mail.gmail.com> References: <771202501002030335r52f62fccta1b976f9e586d435@mail.gmail.com> <7265d4f1002030412l1258237jf50ff37845e7c5a5@mail.gmail.com> Message-ID: <7265d4f1002030448n28065ea1ifc411cf0c7b462e8@mail.gmail.com> ---------- Forwarded message ---------- From: Cymon Cox Date: 3 February 2010 12:12 Subject: Re: [Biopython-dev] Multiple alignment - Clustalw etc... To: Laura Padioleu Hi Laura, On 3 February 2010 11:35, Laura Padioleu wrote: > On Mon, Mar 30, 2009 at 12:42 PM, Cymon Cox > wrote: > >* > *>* Hi Folks, > Yes, I did write that... Hello, > > im using python version 2.5 but i can't compile this code correctly > what version of python and biopython you are using ? > How exactly are you using this code? What error do you get? Can you cut and paste a session from the terminal? Cheers, C. -- From chapmanb at 50mail.com Wed Feb 3 12:55:52 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 3 Feb 2010 07:55:52 -0500 Subject: [Biopython-dev] Getting raw unparsed records with SeqIO? 
In-Reply-To: <320fb6e01002021037p75d24f43w84837c426b057093@mail.gmail.com> References: <320fb6e01002021037p75d24f43w84837c426b057093@mail.gmail.com> Message-ID: <20100203125552.GD40046@sobchak.mgh.harvard.edu> Hi Peter; > Another solution to this task (extracting the raw GenBank > records from a large file) would seem to be to extend the > Bio.SeqIO.index functionality. The patch I'm about to > attach to Bug 3000 adds a new "get_raw" method to the > dictionary like object we return. Unlike the __getitem__ > and get methods which return a SeqRecord this just gives > the raw string. [...]
> >>> from Bio import SeqIO > >>> data = SeqIO.index("cor6_6.gb", "gb") > >>> data.keys() > ['L31939.1', 'AJ237582.1', 'X62281.1', 'AF297471.1', 'X55053.1', 'M81224.1'] > >>> print data.get_raw("X62281.1") > LOCUS ATKIN2 880 bp DNA PLN 23-JUL-1992 > DEFINITION A.thaliana kin2 gene. > ACCESSION X62281 > ... > // > > What are people's thoughts on this? Not much to add, but a +1 from me. This sounds like a solid solution and makes sense for the use case I can think of, which is picking out records of interest from a large file and re-writing them in a smaller file. Brad From bugzilla-daemon at portal.open-bio.org Wed Feb 3 21:44:14 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 3 Feb 2010 16:44:14 -0500 Subject: [Biopython-dev] [Bug 1999] new frame translation method In-Reply-To: Message-ID: <201002032144.o13LiERA027299@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1999 ------- Comment #3 from eric.talevich at gmail.com 2010-02-03 16:44 EST ------- Can we split this into two functions? I tried this function today, hoping it would help me get a list of ORFs from a big contig -- but both frameTranslations and six_frame_translation do two things without stopping in between: 1. Translate the DNA or RNA sequence to amino acids in all six frames 2. Pretty-print the six-frame translation So, how about factoring out just this piece (or similar): def translate_six_frames(seq, genetic_code=1): """Dictionary of 6-frame translations.""" anti = seq.reverse_complement() frames = {} for i in range(0,3): frames[i+1] = seq[i:].translate(genetic_code) frames[-i-1] = SeqUtils.reverse(anti[i:].translate(genetic_code)) return frames Then either pretty-printer can call this internally, and the user also has access to the individual translated sequences. 
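To make the proposed split concrete, here is a small pure-Python sketch of a function that returns the six translations as a dictionary and leaves pretty-printing to the caller. It is a stand-in rather than Biopython code: it works on plain strings instead of Seq objects, hard-codes the standard genetic code, and labels the minus frames so that -1 starts at the first base of the reverse complement, which is only one of the possible conventions.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement each base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def translate_six_frames(dna):
    """Return {frame: protein} for frames +1..+3 and -1..-3."""
    dna = dna.upper()
    anti = reverse_complement(dna)
    frames = {}
    for i in range(3):
        frames[i + 1] = translate(dna[i:])
        frames[-(i + 1)] = translate(anti[i:])
    return frames
```

For example, translate_six_frames("ATGGCCTAA")[1] gives "MA*", and frame -1 gives "LGH" from the reverse complement TTAGGCCAT.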
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Wed Feb 3 23:13:10 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 3 Feb 2010 23:13:10 +0000 Subject: [Biopython-dev] [Bug 3000] Could SeqIO.parse() store the whole, unparsed multiline entry? In-Reply-To: <4B6995D0.3030405@fold.natur.cuni.cz> References: <201002021840.o12Ie88i015906@portal.open-bio.org> <4B6995D0.3030405@fold.natur.cuni.cz> Message-ID: <320fb6e01002031513r1faac5faicf027daf5da77d80@mail.gmail.com> On Wed, Feb 3, 2010 at 3:27 PM, Martin MOKREJŠ wrote: > > Hi Peter, > thank you very much for all your efforts. I will try to get to testing the cvs > code in few days. Definitely will keep you updated. ;) > Martin > > bugzilla-daemon at portal.open-bio.org wrote: >> http://bugzilla.open-bio.org/show_bug.cgi?id=3000 >> ... The patch hasn't been checked in, but should apply to either the master branch in github or (I expect) Biopython 1.53. I'm looking forward to feedback. Peter From bugzilla-daemon at portal.open-bio.org Thu Feb 4 15:20:51 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 4 Feb 2010 10:20:51 -0500 Subject: [Biopython-dev] [Bug 1999] new frame translation method In-Reply-To: Message-ID: <201002041520.o14FKp9j000360@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1999 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-04 10:20 EST ------- (In reply to comment #3) > Can we split this into two functions? I tried this function today, hoping it > would help me get a list of ORFs from a big contig -- but both > frameTranslations and six_frame_translation do two things without stopping in > between: > > 1.
Translate the DNA or RNA sequence to amino acids in all six frames I'd wondered about this - possibly as a generator/iterator which always gives back exactly six sequences - but don't really see much point. There is also going to be some debate about how frames are labelled (especially the minus frames). > 2. Pretty-print the six-frame translation Personally I don't see this as being very useful, but someone must like it. I lean to just deprecating and removing this code. > So, how about factoring out just this piece (or similar): > > def translate_six_frames(seq, genetic_code=1): > """Dictionary of 6-frame translations.""" > anti = seq.reverse_complement() > frames = {} > for i in range(0,3): > frames[i+1] = seq[i:].translate(genetic_code) > frames[-i-1] = SeqUtils.reverse(anti[i:].translate(genetic_code)) > return frames You should be taking the reverse complement, not just the reverse. This would just be seq[i:].reverse_complement() or seq.reverse_complement()[i:] depending on how you label the reverse frames. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Feb 4 15:30:47 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 4 Feb 2010 15:30:47 +0000 Subject: [Biopython-dev] Getting raw unparsed records with SeqIO? In-Reply-To: <20100203125552.GD40046@sobchak.mgh.harvard.edu> References: <320fb6e01002021037p75d24f43w84837c426b057093@mail.gmail.com> <20100203125552.GD40046@sobchak.mgh.harvard.edu> Message-ID: <320fb6e01002040730q7835e1a1uc784dfaae5faaef2@mail.gmail.com> On Wed, Feb 3, 2010 at 12:55 PM, Brad Chapman wrote: > > Not much to add, but a +1 from me. This sounds like a solid solution > and makes sense for the use case I can think of, which is picking > out records of interest from a large file and re-writing them in a > smaller file.
> Let's give Martin a chance to test with the patch, and see how he gets on. I'm curious if anyone can come up with other examples of how this could be applied, which would help justify adding it to Bio.SeqIO. Peter From bugzilla-daemon at portal.open-bio.org Mon Feb 8 17:08:33 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 8 Feb 2010 12:08:33 -0500 Subject: [Biopython-dev] [Bug 3006] New: esearch medline fails with xml format Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3006 Summary: esearch medline fails with xml format Product: Biopython Version: Not Applicable Platform: PC OS/Version: Windows XP Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: georg.lipps at fhnw.ch I used to retrieve Pubmed records with python 2.5.1 however lately the efetch with xml produces an error.
The problem arose around the turn of the year and may be related to the DTD definition file. Here is a short piece of code that reproduces the error:

from Bio import Entrez
from Bio import Medline

def retrieve_medline(doi):
    # Uses the doi to obtain the medline id and then retrieves the medline entry
    # Returns the medline entry as text and python object or an empty string
    print "...queing medline with DOI", doi
    handle = Entrez.esearch(db="pubmed", term=doi, retmode="XML")
    record=Entrez.read(handle)
    if record["Count"]<>"1":
        return None, None
    handle=Entrez.efetch(db="pubmed", id=record["IdList"], retmode="text", rettype="medline")
    xml=Entrez.efetch(db="pubmed", id=record["IdList"], retmode="XML", rettype="medline")
    return handle.read(), Entrez.read(xml)

doi='10.1038/nature07389'
article, xml=retrieve_medline(doi)
print article

OUTPUT:
Traceback (most recent call last):
  File "U:/Literatur/pdf to RM converter/test.py", line 24, in
    article, xml=retrieve_medline(doi)
  File "U:/Literatur/pdf to RM converter/test.py", line 15, in retrieve_medline
    return handle.read(), Entrez.read(xml)
  File "C:\Program Files\python25\lib\site-packages\Bio\Entrez\__init__.py", line 283, in read
    record = handler.run(handle)
  File "C:\Program Files\python25\lib\site-packages\Bio\Entrez\Parser.py", line 95, in run
    self.parser.ParseFile(handle)
  File "C:\Program Files\python25\lib\site-packages\Bio\Entrez\Parser.py", line 131, in startElement
    return
UnboundLocalError: local variable 'object' referenced before assignment

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
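The traceback in this report ends in an UnboundLocalError, which Python raises when a function reads a local name that was assigned on only some code paths. The sketch below reproduces that class of failure in isolation. It is not the Bio.Entrez parser code: the tag table and both function names are hypothetical, and whether an unrecognised XML element is what triggered the reported error is only a guess in line with the reporter's DTD suspicion.

```python
# Hypothetical table of element names the handler knows how to build.
KNOWN_TAGS = {"IdList": list, "Item": dict}


def start_element(name):
    """Return a fresh container for a known tag (buggy for unknown tags)."""
    if name in KNOWN_TAGS:
        obj = KNOWN_TAGS[name]()
    # No else branch: for an unknown tag, obj was never assigned,
    # so the next line raises UnboundLocalError.
    return obj


def start_element_safe(name):
    """Same idea, but every code path assigns obj before it is read."""
    obj = None
    if name in KNOWN_TAGS:
        obj = KNOWN_TAGS[name]()
    return obj


try:
    start_element("SomeNewTag")
except UnboundLocalError as err:
    print("failed as expected:", type(err).__name__)

print(start_element_safe("SomeNewTag"))  # None
```

If the real cause is similar, upgrading to a release whose parser knows the newer element names would be the first thing to try, which matches the question asked in the follow-up comment.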
From bugzilla-daemon at portal.open-bio.org Mon Feb 8 23:26:38 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 8 Feb 2010 18:26:38 -0500 Subject: [Biopython-dev] [Bug 3006] esearch medline fails with xml format In-Reply-To: Message-ID: <201002082326.o18NQcwP006902@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3006 ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2010-02-08 18:26 EST ------- I was not able to replicate this bug. Your example code ran correctly with Python 2.6, Biopython 1.53. Are you using the latest version of Biopython? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From sandford at ufl.edu Mon Feb 8 21:49:20 2010 From: sandford at ufl.edu (Michael Sandford) Date: Mon, 08 Feb 2010 16:49:20 -0500 Subject: [Biopython-dev] Where should feature intersection code go? Message-ID: <4B7086E0.1090501@ufl.edu> I'm working on a project that's looking for alternative splicing using solexa data instead of microarray data. Basically we've got a GFF file containing all the genes, introns and exons and 35M reads that have been placed into one of the various chromosomes via the excellent bowtie application out of Maryland. Bowtie output is documented here: http://bowtie-bio.sourceforge.net/manual.shtml#default-bowtie-output In summary it's roughly a cross between fastq and GFF. It's got the read name, strand, sequence the read aligned to, position, sequence, quality, and a few others. It seems like it could rather easily be coerced into a SeqRecord (http://biopython.org/DIST/docs/api/Bio.SeqRecord.SeqRecord-class.html). It might not get filled up completely, but it'd be better than handling things in a one-off way. The FeatureLocation class provides for approximate and exact locations (both start and stop positions). 
It seems like the correct location to put code that determines if two FeatureLocations overlap, or if one contains another, or is contained by another. Overall I'm talking about writing a bowtie .map parser and the comparison code for FeatureLocation. Would these be welcome features? Thanks, Mike From chapmanb at 50mail.com Tue Feb 9 01:04:25 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 8 Feb 2010 20:04:25 -0500 Subject: [Biopython-dev] Where should feature intersection code go? In-Reply-To: <4B7086E0.1090501@ufl.edu> References: <4B7086E0.1090501@ufl.edu> Message-ID: <20100209010425.GD2193@kunkel> Mike; > I'm working on a project that's looking for alternative splicing > using solexa data instead of microarray data. Basically we've got a > GFF file containing all the genes, introns and exons and 35M reads > that have been placed into one of the various chromosomes via the > excellent bowtie application out of Maryland. [...] > Overall I'm talking about writing a bowtie .map parser and the > comparison code for FeatureLocation. Would these be welcome > features? A .map parser would definitely be useful. Another suggestion is to get Bowtie to produce SAM format and use Pysam for parsing: http://code.google.com/p/pysam/ The advantage of SAM is that it's an emerging standard and a lot of downstream applications can use it. This way you can switch aligners in your workflow without much disruption. For doing feature overlaps, IntervalTree in bx-python is excellent: http://bitbucket.org/james_taylor/bx-python/wiki/Home http://bitbucket.org/james_taylor/bx-python/src/tip/lib/bx/intervals/intersection.pyx See the doc string of the IntervalTree class for how to use it. My normal workflow is to build an IntervalTree with the GFF features of your genome, and then loop through the alignment file finding features that each alignment intersects. For alternative splicing, are you using the raw genome or a built transcriptome for all possible combinations of exons? 
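The overlap and containment tests discussed in this thread can be sketched with plain half-open integer intervals. This is only an illustration under stated assumptions: the Feature tuple and the function names are hypothetical, not Bio.SeqFeature or bx-python API, fuzzy (approximate) positions are ignored, and the linear scan stands in for the IntervalTree lookup that a data set of 35M reads would really need.

```python
from collections import namedtuple

# Hypothetical feature record; coordinates are half-open: [start, end).
Feature = namedtuple("Feature", "name start end")


def overlaps(a_start, a_end, b_start, b_end):
    """True if the two half-open intervals share at least one position."""
    return a_start < b_end and b_start < a_end


def contains(outer_start, outer_end, inner_start, inner_end):
    """True if the inner interval lies entirely within the outer one."""
    return outer_start <= inner_start and inner_end <= outer_end


def features_hit(features, read_start, read_end):
    """Names of all features that a read overlaps (simple linear scan)."""
    return [
        f.name
        for f in features
        if overlaps(f.start, f.end, read_start, read_end)
    ]


gene_model = [
    Feature("exon1", 100, 200),
    Feature("intron1", 200, 500),
    Feature("exon2", 500, 650),
]

# A 35 bp read starting at position 180 spans the exon1/intron1 boundary.
print(features_hit(gene_model, 180, 215))  # ['exon1', 'intron1']
# A read starting exactly at the boundary touches only the intron.
print(features_hit(gene_model, 200, 235))  # ['intron1']
print(contains(100, 200, 120, 155))        # True
```

Half-open coordinates make adjacent features non-overlapping by construction, which avoids double counting reads that end exactly on an exon boundary.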
One practical thing to consider is that a read will not be aligned to the genome if it splits an exon/exon junction. Hope this helps, Brad From bugzilla-daemon at portal.open-bio.org Wed Feb 10 01:42:28 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Feb 2010 20:42:28 -0500 Subject: [Biopython-dev] [Bug 2294] Writing GenBank files with Bio.SeqIO In-Reply-To: Message-ID: <201002100142.o1A1gSJJ022517@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2294 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #19 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-09 20:42 EST ------- (In reply to comment #17) > > Things still to do on GenBank output include better handling of the LOCUS > line, such as the data division. See also Bug 2578 for the molecule type. > I've added mappings for some EMBL divisions to suitable GenBank divisions. I'm closing this bug now, as GenBank output does basically work now. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From rjalves at igc.gulbenkian.pt Wed Feb 10 18:30:05 2010 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Wed, 10 Feb 2010 18:30:05 +0000 Subject: [Biopython-dev] KEGG support Message-ID: <4B72FB2D.4070808@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi everyone, KEGG support in Biopython has been mostly untouched for the past 8 years with only a few changes and test additions. There is code in the tree to work with the Enzyme and Compound databases but not for others such as GENES, ORTHOLOGY, DRUG, ...
Considering the fact that I will need to write some code to work with other formats I was planning to contribute and integrate it with the SeqIO interface. This will require some additional homework on my part. KEGG also has a SOAP based API [1]. It's functionality could be in some aspects compared to NCBI eutils. Using the python SOAP library suds [2] I had no problem interacting with it. So just in case someone was already working on this secretly :) I would like to know to make my life easier. If not I would also like to know if you would be interested in the addition and finally what's your thought about the SOAP interface and the suds (optional) dependency. Just a word on suds. Even though the project has been around for a few years now, it's still not available in most Linux distros. On my personal experience with it it's probably the simplest and easy to use SOAP library for python out there. Cheers, Renato [1] - http://www.genome.jp/kegg/soap/doc/keggapi_manual.html [2] - https://fedorahosted.org/suds/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkty+ygACgkQYh11EUYTX9Sb3wCgiQrS/HWOr96CEwHErx+RKBVQ 1VMAn1NOlNr/HZ/rmFuqKTlyOM/pZwqi =zBxB -----END PGP SIGNATURE----- From kellrott at gmail.com Wed Feb 10 20:12:10 2010 From: kellrott at gmail.com (Kyle) Date: Wed, 10 Feb 2010 12:12:10 -0800 Subject: [Biopython-dev] KEGG support In-Reply-To: <4B72FB2D.4070808@igc.gulbenkian.pt> References: <4B72FB2D.4070808@igc.gulbenkian.pt> Message-ID: I think external library dependancies should be avoided unless necessary. Would a tool like wsdl2py produce code that isn't dependent on an installed library? Alternatively, suds is LGPL based, could we just cannibalize the source code for the important classes? 
Kyle On Wed, Feb 10, 2010 at 10:30 AM, Renato Alves wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi everyone, > > KEGG support in Biopython has been mostly untouched for the past 8 years > with only a few changes and test additions. There is code in the tree to > work with the Enzyme and Compound databases but not for others such as > GENES, ORTHOLOGY, DRUG, ... > > Considering the fact that I will need to write some code to work with > other formats I was planning to contribute and integrate it with the > SeqIO interface. This will require some additional homework on my part. > > KEGG also has a SOAP based API [1]. It's functionality could be in some > aspects compared to NCBI eutils. Using the python SOAP library suds [2] > I had no problem interacting with it. > > So just in case someone was already working on this secretly :) I would > like to know to make my life easier. If not I would also like to know if > you would be interested in the addition and finally what's your thought > about the SOAP interface and the suds (optional) dependency. > > Just a word on suds. Even though the project has been around for a few > years now, it's still not available in most Linux distros. On my > personal experience with it it's probably the simplest and easy to use > SOAP library for python out there. 
> > Cheers, > Renato > > [1] - http://www.genome.jp/kegg/soap/doc/keggapi_manual.html > [2] - https://fedorahosted.org/suds/ > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ > > iEYEARECAAYFAkty+ygACgkQYh11EUYTX9Sb3wCgiQrS/HWOr96CEwHErx+RKBVQ > 1VMAn1NOlNr/HZ/rmFuqKTlyOM/pZwqi > =zBxB > -----END PGP SIGNATURE----- > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From dalloliogm at gmail.com Wed Feb 10 22:13:04 2010 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 10 Feb 2010 23:13:04 +0100 Subject: [Biopython-dev] KEGG support In-Reply-To: <4B72FB2D.4070808@igc.gulbenkian.pt> References: <4B72FB2D.4070808@igc.gulbenkian.pt> Message-ID: <5aa3b3571002101413o55d04432vc76c230aa9c43252@mail.gmail.com> On Wed, Feb 10, 2010 at 7:30 PM, Renato Alves wrote: > KEGG support in Biopython has been mostly untouched for the past 8 years > with only a few changes and test additions. There is code in the tree to > work with the Enzyme and Compound databases but not for others such as > GENES, ORTHOLOGY, DRUG, ... > Hi, I had a terrible experience with parsing Kegg pathway's files: in the end I discovered that the files that are stored in their ftp don't correspond exactly to the diagrams that you can find in the web interface, as for example biochemical interactions don't have directionality while if you look at them on kegg/pathway you will see arrows. Some time ago I proposed to implement something similar to what you have said for kegg/pathway, but in the end I abandoned the effort, because I had problem both with suds and SOAPpy, and I wasn't satisfied by the annotations in KEGG. > > Considering the fact that I will need to write some code to work with > other formats I was planning to contribute and integrate it with the > SeqIO interface. 
This will require some additional homework on my part. > > If you are serious about that I may help you, but I can only work on the weekends and you should tell me exactly what I have to do :-) KEGG also has a SOAP based API [1]. It's functionality could be in some > aspects compared to NCBI eutils. Using the python SOAP library suds [2] > I had no problem interacting with it. > Are you sure? I tried it on KEGG an year ago and I was having problems to execute slightly more complex queries. If you look at suds's bug tracker, you will find some reports by me, like this one: - https://fedorahosted.org/suds/ticket/213 I remember that I was looping between the KEGG support centre and the suds bug tracker; both were very responsive to feedback and very keen to answer me, but in the end they didn't speak to each other and the bug reports that I have filed are still unfixed. Which library can you use for the soap queries? I had the feeling that SOAPpy (which I think it is included in the standard lib) worked well with KEGG, however it development has stopped many years ago ( http://sourceforge.net/projects/pywebsvcs/files/SOAP.py/), it is a mess if you want to use it behind an http_proxy (I should have a patch somewhere if you are interested) and I am sure it won't be kept compatible with the future versions of python. Another alternative may be beautiful soup, but I have never tried it. This question on stackoverflow may provide you some ideas: - http://stackoverflow.com/questions/206154/whats-the-best-soap-client-library-for-python-and-where-is-the-documentation-fo I am not sure about which is the standard soap library for python, and which one is included in the standard lib. If you are going to use SOAPpy, it is a bad bet toward compatibility and maintenance for the future releases. Suds is the best option but it is not in the standard lib, and they still have to fix the bugs I have reported an year ago. I have the feeling that there is no good alternative for python. 
Moreover, the WSDL functions that I have seen for KEGG are not especially useful. They seems to allow for the basic queries, but for most of the tasks it is better to download the ftp locally and work there. > So just in case someone was already working on this secretly :) I would > like to know to make my life easier. If not I would also like to know if > you would be interested in the addition and finally what's your thought > about the SOAP interface and the suds (optional) dependency. > > Just a word on suds. Even though the project has been around for a few > years now, it's still not available in most Linux distros. On my > personal experience with it it's probably the simplest and easy to use > SOAP library for python out there. > > Cheers, > Renato > > [1] - http://www.genome.jp/kegg/soap/doc/keggapi_manual.html > [2] - https://fedorahosted.org/suds/ > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ > > iEYEARECAAYFAkty+ygACgkQYh11EUYTX9Sb3wCgiQrS/HWOr96CEwHErx+RKBVQ > 1VMAn1NOlNr/HZ/rmFuqKTlyOM/pZwqi > =zBxB > -----END PGP SIGNATURE----- > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- Giovanni Dall'Olio, phd student Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain) My blog on bioinformatics: http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Wed Feb 10 22:16:14 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 10 Feb 2010 17:16:14 -0500 Subject: [Biopython-dev] [Bug 3009] New: Check the FASTA m10 alignment parser works with FASTA36 Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3009 Summary: Check the FASTA m10 alignment parser works with FASTA36 Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement 
Priority: P2 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Bill Pearson has just announced the release of FASTA36: http://faculty.virginia.edu/wrpearson/fasta/fasta36/ >From his email, > This version is a major update from FASTA version 35. > It's main new feature is the ability to report all > statistically significant alignments between a query > and library sequence (equivalent to BLAST's multiple > HSPs). All previous versions of the FASTA program > reported only the best alignment between the query > and library sequence, a serious shortcoming when > comparing a query protein to a multi-exon gene or > multi-domain protein. We need to check the FASTA36 -m 10 output, add this to our unit tests, and update our parser as required. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dalloliogm at gmail.com Wed Feb 10 22:26:08 2010 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 10 Feb 2010 23:26:08 +0100 Subject: [Biopython-dev] KEGG support In-Reply-To: References: <4B72FB2D.4070808@igc.gulbenkian.pt> Message-ID: <5aa3b3571002101426p7b57f50aga270f0ea7eb8554f@mail.gmail.com> On Wed, Feb 10, 2010 at 9:12 PM, Kyle wrote: > I think external library dependancies should be avoided unless necessary. > Would a tool like wsdl2py produce code that isn't dependent on an > installed > library? Alternatively, suds is LGPL based, could we just cannibalize the > source code for the important classes? > Honestly I think that the best solution would be to make an external module to extend the basic biopython and to link it on the biopython's web page. 
The core biopython should provide objects and infrastructure for biological data, but the additional functionality should go in separate modules linked on the biopython web page, taking inspiration from BioConductor and installed with easy_install or a derivative. If we keep maintaining the constraint that all biopython modules should have the same dependencies, then it is impossible to make anything more complex than the basic stuff, and biopython will never be as useful as it could be. You can't make a good library for using WSDL services with SOAPpy, or plot nice graphics without matplotlib, or store data in HDF5 format, and there are many other examples. Bioinformatics is a very general term; people working in it have a wide variety of needs, and it is difficult to meet them all with few dependencies. -- Giovanni Dall'Olio, phd student Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain) My blog on bioinformatics: http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Wed Feb 10 22:27:07 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Feb 2010 22:27:07 +0000 Subject: [Biopython-dev] KEGG support In-Reply-To: <4B72FB2D.4070808@igc.gulbenkian.pt> References: <4B72FB2D.4070808@igc.gulbenkian.pt> Message-ID: <320fb6e01002101427o5efc003es3402308e44b2533a@mail.gmail.com> On Wed, Feb 10, 2010 at 6:30 PM, Renato Alves wrote: > > Hi everyone, > > KEGG support in Biopython has been mostly untouched for the past 8 years > with only a few changes and test additions. There is code in the tree to > work with the Enzyme and Compound databases but not for others such as > GENES, ORTHOLOGY, DRUG, ... > > Considering the fact that I will need to write some code to work with > other formats I was planning to contribute and integrate it with the > SeqIO interface. This will require some additional homework on my part. Excellent news.
Have you looked at the existing KEGG parsers in Biopython, and do you think the current style is suitable? (I haven't looked at the code recently myself, but will do). Regarding the SeqIO interface (for KEGG GENES only?), I would be happy to advise. Initially I suggest you work on adding a parser much like the other KEGG parsers, returning gene records. Then we can add a Bio/SeqIO/KeggGeneIO.py wrapper to turn these into SeqRecord objects. > KEGG also has a SOAP based API [1]. It's functionality could be in some > aspects compared to NCBI eutils. Using the python SOAP library suds [2] > I had no problem interacting with it. I have not used SOAP, and have a personal preference for REST style APIs. However, if that is what KEGG offers, this is worth considering. I think Brad has some experience with (other) SOAP services in Python. Note the KEGG documentation suggests using SOAPpy for Python. Interestingly, KEGG are however looking into providing RDF (and perhaps one day SPARQL endpoints). I will try and find out what sort of time scale they have in mind while I am at the BioHackathon 2010 this week - http://hackathon3.dbcls.jp/ For now, I would prioritise the KEGG flat file parsers. Peter From biopython at maubp.freeserve.co.uk Wed Feb 10 22:37:03 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Feb 2010 22:37:03 +0000 Subject: [Biopython-dev] KEGG support In-Reply-To: References: <4B72FB2D.4070808@igc.gulbenkian.pt> Message-ID: <320fb6e01002101437p5f382ce0tb4cd864cec59a6c5@mail.gmail.com> On Wed, Feb 10, 2010 at 8:12 PM, Kyle wrote: > I think external library dependancies should be avoided unless necessary. > Would a tool like wsdl2py produce code that isn't dependent on an installed > library? Alternatively, suds is LGPL based, could we just cannibalize the > source code for the important classes? Working with SOAP is so complicated that using an external library would be the sensible option.
It would be an optional dependency (and would not be an install time dependency like NumPy), much like how we have an optional dependency on ReportLab just for Bio.Graphics, and now also the option to use NetworkX with the new Bio.Phylo code. Package management (e.g. under Linux distros) can mark these external modules as suggestions or soft requirements, making this quite straightforward. Regarding some of Giovanni's points, modularising the distribution of Biopython (which can already be considered to be a core plus assorted domain-specific modules like Bio.PDB, Bio.Cluster, Bio.Graphics and so on) seems premature to me given the current state of python distribution. Peter P.S. We can't take any GPL or LGPL code and incorporate it into Biopython, due to the nature of those licences. From anaryin at gmail.com Wed Feb 10 22:52:53 2010 From: anaryin at gmail.com (João Rodrigues) Date: Wed, 10 Feb 2010 14:52:53 -0800 Subject: [Biopython-dev] KEGG support Message-ID: Hello all, For what it's worth: I worked with KEGG about a year and a half ago, to do some very basic things. I remember I tried using SOAPpy and ZSI. The first is a pain to install in Windows (at least then it was), so I opted for the second. However it has been quite outdated and I had some problems dealing with complex data types. Regarding modularising/non-modularising the code, I guess that some features will have to have dependencies that cannot be included in the core distribution, and thus the user should be warned that they need library X or Y to have them work. In short, keeping the current structure seems the wisest IMO. I don't see such a need of creating outer-modules. Lastly, good luck with KEGG's services' speed.
That API is slower than a turtle :x From rjalves at igc.gulbenkian.pt Thu Feb 11 00:44:59 2010 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Thu, 11 Feb 2010 00:44:59 +0000 Subject: [Biopython-dev] KEGG support In-Reply-To: <320fb6e01002101427o5efc003es3402308e44b2533a@mail.gmail.com> References: <4B72FB2D.4070808@igc.gulbenkian.pt> <320fb6e01002101427o5efc003es3402308e44b2533a@mail.gmail.com> Message-ID: <4B73530B.7090203@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 - From Peter on 02/10/2010 10:27 PM: > Excellent news. Have you looked at the existing KEGG parsers in > Biopython, and do you think the current style is suitable? (I haven't > looked at the code recently myself, but will do). The style seems good enough but I was thinking of having a more functional approach, at least for the parser to try to get away from the massive if/elif/else cascades. The writer would come as second priority and would be similar although I would also try to keep code duplication at lower levels than what we can see in the Enzyme/__init__.py file. I would also consider using Genes.py instead of Genes/__init__.py ... I don't see the need of packages here. > Regarding the SeqIO interface (for KEGG GENES only?), I would be > happy to advise. Initially I suggest you work on adding a parser much > like the other KEGG parsers, returning gene records. Then we can > add a Bio/SeqIO/KeggGeneIO.py wrapper to turn these into SeqRecord > objects. Yes for now my main goal would be GENES. The other formats can probably grow from there. Your suggestion on the SeqIO seems reasonable. I'll try to have a prototype in the next days/weekend and we can discuss from there. > I have not used SOAP, and have a personal preference for REST style > APIs. However, if that is what KEGG offers, this is worth considering. > I think Brad has some experience with (other) SOAP services in Python. > Note the KEGG documentation suggests using SOAPpy for Python.
According to the http://www.genome.jp/kegg/docs/weblink.html page they do mention a REST like URL for generic entries, pathways and brite. But it seems more useful for external linking than as an API. I couldn't even figure out how to return the information in plaintext instead of the default HTML. About SOAPpy, I've nothing against it besides the fact that when I first tried I had few problems. Anyway it was a long time ago... I've only played with suds since. > Interestingly, KEGG are however looking into providing RDF (and > perhaps one day SPARQL endpoints). I will try and find out what sort > of time scale they have in mind while I am at the BioHackathon 2010 > this week - http://hackathon3.dbcls.jp/ We'll be waiting on your feedback on this :) > For now, I would prioritise the KEGG flat file parsers. Agreed. > Peter -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iEYEARECAAYFAktzUwgACgkQYh11EUYTX9SPcwCfSrNkIovs1vnPinuAtMFZQJYn pmAAnjHAAro2Ls/c1Nq4DCuliReaPm64 =Dohn -----END PGP SIGNATURE----- From rjalves at igc.gulbenkian.pt Thu Feb 11 00:53:03 2010 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Thu, 11 Feb 2010 00:53:03 +0000 Subject: [Biopython-dev] KEGG support In-Reply-To: <320fb6e01002101437p5f382ce0tb4cd864cec59a6c5@mail.gmail.com> References: <4B72FB2D.4070808@igc.gulbenkian.pt> <320fb6e01002101437p5f382ce0tb4cd864cec59a6c5@mail.gmail.com> Message-ID: <4B7354EF.8020703@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 - From Peter on 02/10/2010 10:37 PM: > On Wed, Feb 10, 2010 at 8:12 PM, Kyle wrote: >> I think external library dependancies should be avoided unless necessary. >> Would a tool like wsdl2py produce code that isn't dependent on an installed >> library? Alternatively, suds is LGPL based, could we just cannibalize the >> source code for the important classes? > > Working with SOAP is so complicated that using an external library > would be the sensible option. 
It would be an optional dependency > (and would not be an install time dependency like NumPy), much > like how we have a optional dependency on ReportLab just for > Bio.Graphics, and now also the option to use NetworkX with the > new Bio.phylo code. Yes that would be my idea on the SOAP interface. If doable we could even evaluate the possibility of having some abstraction layer that could enable the use of SOAPpy or suds if either is already available on the system. > Package management (e.g. under Linux distros) can mark these > external modules as suggestions or soft requirements, making > this quite straight forward. The 'or' case for soap libraries would also fit in this scheme since most package managers already support this kind of feature. > Regarding some of Giovanni's points, modularising the distribution > of Biopython (which can already be considered to be a core plus > assorted domain-specific modules like Bio.PDB, Bio.Cluster, > Bio.Graphics and so on) seems premature to me give the current > state of python distribution. Could you elaborate a little on what you mean by 'current state of python...'. Are you referring to the python3 transition? Renato -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iEYEARECAAYFAktzVO0ACgkQYh11EUYTX9S1ngCfYFiW7VeNu6atl0J1eViqquSo PCIAn3KO2p//fRYpZVC0QSp2gITP/n2I =uTTc -----END PGP SIGNATURE----- From chapmanb at 50mail.com Thu Feb 11 00:56:00 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 10 Feb 2010 19:56:00 -0500 Subject: [Biopython-dev] KEGG support In-Reply-To: <4B73530B.7090203@igc.gulbenkian.pt> References: <4B72FB2D.4070808@igc.gulbenkian.pt> <320fb6e01002101427o5efc003es3402308e44b2533a@mail.gmail.com> <4B73530B.7090203@igc.gulbenkian.pt> Message-ID: <20100211005600.GB1923@kunkel> Renato; Great idea to work with the KEGG parsers. Very happy to have someone tackling this. 
> According to the http://www.genome.jp/kegg/docs/weblink.html page they > do mention a REST like URL for generic entries, pathways and brite. But > it seems more useful for external linking than as an API. I couldn't > even figure out how to return the information in plaintext instead of > the default HTML. About SOAPpy, I've nothing against it besides the fact > that when I first tried I had few problems. Anyway it was a long time > ago... I've only played with suds since. My suggestion would be to use the TogoWS REST interface http://togows.dbcls.jp/site/en/rest.html It makes getting records crazy easy. There are tons of examples, but for GENES, here's how to get the plain text record: http://togows.dbcls.jp/entry/gene/eco:b0002 If you really want to use SOAP, my experience has been best with suds. However, the complexities of SOAP are really not worth it if you can get REST approaches to do what you need. Hope this helps, Brad From rjalves at igc.gulbenkian.pt Thu Feb 11 01:14:52 2010 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Thu, 11 Feb 2010 01:14:52 +0000 Subject: [Biopython-dev] KEGG support In-Reply-To: <5aa3b3571002101413o55d04432vc76c230aa9c43252@mail.gmail.com> References: <4B72FB2D.4070808@igc.gulbenkian.pt> <5aa3b3571002101413o55d04432vc76c230aa9c43252@mail.gmail.com> Message-ID: <4B735A0C.8070902@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 - From Giovanni Marco Dall'Olio on 02/10/2010 10:13 PM: > Hi, > I had a terrible experience with parsing Kegg pathway's files: in the > end I discovered that the files that are stored in their ftp don't > correspond exactly to the diagrams that you can find in the web > interface, as for example biochemical interactions don't have > directionality while if you look at them on kegg/pathway you will see > arrows. I haven't used pathway files yet so I'll be careful when I reach them :) Have you mentioned this aspect to the KEGG maintainers? 
> Some time ago I proposed to implement something similar to what you have > said for kegg/pathway, but in the end I abandoned the effort, because I > had problem both with suds and SOAPpy, and I wasn't satisfied by the > annotations in KEGG. > > If you are serious about that I may help you, but I can only work on the > weekends and you should tell me exactly what I have to do :-) Hehe, I can only tell you once I get my hands dirty. I'll keep my code on github to maximize interaction. I'll get back at you when I get the first working draft for GENES. Thanks for the hand ;) > Are you sure? I tried it on KEGG an year ago and I was having problems > to execute slightly more complex queries. If you look at suds's bug > tracker, you will find some reports by me, like this one: > - https://fedorahosted.org/suds/ticket/213 As of suds revision 658 I can no longer reproduce the error in the ticket. > I remember that I was looping between the KEGG support centre and the > suds bug tracker; both were very responsive to feedback and very keen to > answer me, but in the end they didn't speak to each other and the bug > reports that I have filed are still unfixed. > > Which library can you use for the soap queries? I had the feeling that > SOAPpy (which I think it is included in the standard lib) worked well > with KEGG, however it development has stopped many years ago > (http://sourceforge.net/projects/pywebsvcs/files/SOAP.py/), it is a mess > if you want to use it behind an http_proxy (I should have a patch > somewhere if you are interested) and I am sure it won't be kept > compatible with the future versions of python. SOAPpy doesn't seem to be in the standard lib, at least I don't have it out of the box here. Only as external package in the repository. > Another alternative may be beautiful soup, but I have never tried it. I've only used beautiful soup as HTML cleaner/formatter, like HTML tidy. I wasn't aware that it could be used for SOAP stuff. Are you sure about this? 
> This question on stackoverflow may provide you some ideas: > http://stackoverflow.com/questions/206154/whats-the-best-soap-client-library-for-python-and-where-is-the-documentation-fo > > I am not sure about which is the standard soap library for python, and > which one is included in the standard lib. If you are going to use > SOAPpy, it is a bad bet toward compatibility and maintenance for the > future releases. Suds is the best option but it is not in the standard > lib, and they still have to fix the bugs I have reported an year ago. I > have the feeling that there is no good alternative for python. I'll wait for your opinions. I don't want to sound religious about suds. :P > Moreover, the WSDL functions that I have seen for KEGG are not > especially useful. They seems to allow for the basic queries, but for > most of the tasks it is better to download the ftp locally and work there. Well if you just want a quick check on something the API still gives better/quicker results than downloading the stuff via FTP. Given the size, probably the load of the server and the fact that I'm on the other side of the globe, I got an ETA of close to 20 hours when downloading the genes.tar.gz file which is only a few GB in size. 
Renato -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iEYEARECAAYFAktzWgoACgkQYh11EUYTX9Rp6QCfaHf6Ic3uT/npDw2o8l9F+8Kk RtgAnjNXGxcrfvh48dcdFf6G4wK9+PNI =vpUY -----END PGP SIGNATURE----- From biopython at maubp.freeserve.co.uk Thu Feb 11 01:15:21 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 11 Feb 2010 01:15:21 +0000 Subject: [Biopython-dev] KEGG support In-Reply-To: <4B7354EF.8020703@igc.gulbenkian.pt> References: <4B72FB2D.4070808@igc.gulbenkian.pt> <320fb6e01002101437p5f382ce0tb4cd864cec59a6c5@mail.gmail.com> <4B7354EF.8020703@igc.gulbenkian.pt> Message-ID: <320fb6e01002101715n3ccb8894r155631a2c6e34cb6@mail.gmail.com> Renato Alves wrote: >> Regarding some of Giovanni's points, modularising the distribution >> of Biopython (which can already be considered to be a core plus >> assorted domain-specific modules like Bio.PDB, Bio.Cluster, >> Bio.Graphics and so on) seems premature to me give the current >> state of python distribution. > > Could you elaborate a little on what you mean by 'current state of > python...'. Are you referring to the python3 transition? I didn't mean anything about Python 3 here. Just the current state of python package management, with distutils vs setuptools, easy_install, Distribute, etc. I'm am looking forward to an official Python successor to distutils one day which will properly handle dependencies (and hopefully uninstallation) nicely. However, for now, a single monolithic Biopython released several times a year works fine and I see no reason to change that. 
Peter From rjalves at igc.gulbenkian.pt Thu Feb 11 01:46:59 2010 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Thu, 11 Feb 2010 01:46:59 +0000 Subject: [Biopython-dev] KEGG support In-Reply-To: <20100211005600.GB1923@kunkel> References: <4B72FB2D.4070808@igc.gulbenkian.pt> <320fb6e01002101427o5efc003es3402308e44b2533a@mail.gmail.com> <4B73530B.7090203@igc.gulbenkian.pt> <20100211005600.GB1923@kunkel> Message-ID: <4B736193.9020801@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > Renato; > Great idea to work with the KEGG parsers. Very happy to have someone > tackling this. Well as we say here, when the need comes we grab the bull by the horns. :) (Small illustration even though I'm not a fan of the 'sport' http://www.youtube.com/watch?v=OBORPnrm89I) > My suggestion would be to use the TogoWS REST interface > > http://togows.dbcls.jp/site/en/rest.html > > It makes getting records crazy easy. There are tons of examples, > but for GENES, here's how to get the plain text record: > > http://togows.dbcls.jp/entry/gene/eco:b0002 > > If you really want to use SOAP, my experience has been best with > suds. However, the complexities of SOAP are really not worth it if > you can get REST approaches to do what you need. Indeed this exactly the same without the need of additional libraries. If all the functionality available on the SOAP API is also here I agree with you, the complexity of SOAP is unnecessary. Renato -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iEYEARECAAYFAktzYZEACgkQYh11EUYTX9RMWQCeLOXZH5vBjxB7rgPjhS53Fx7Z EuMAoItWzjJ1LEtV6T8NcDDqnoDyIyBS =dPVp -----END PGP SIGNATURE----- From biopython at maubp.freeserve.co.uk Thu Feb 11 05:29:15 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 11 Feb 2010 05:29:15 +0000 Subject: [Biopython-dev] Merging Bio.SeqIO SFF support? 
In-Reply-To: <320fb6e01001110911g2961a680qe95c01b14e8d23b3@mail.gmail.com> References: <320fb6e00911230643l611bb8f5i253630f3acabf438@mail.gmail.com> <320fb6e01001110911g2961a680qe95c01b14e8d23b3@mail.gmail.com> Message-ID: <320fb6e01002102129r24e210e1qc070b40f7652fac8@mail.gmail.com> On Mon, Jan 11, 2010 at 5:11 PM, Peter wrote: > > Hi all, > > I didn't want to rush the SFF support into Biopython 1.53, but its been > waiting "ready" for a while now. Any objections or comments about > me merging this now? > > Thanks, > > Peter There were no objections, and I ran this by Brad and Michiel and have just merged this into the master branch. Time for some more testing! Peter From krother at rubor.de Thu Feb 11 12:31:58 2010 From: krother at rubor.de (Kristian Rother) Date: Thu, 11 Feb 2010 13:31:58 +0100 Subject: [Biopython-dev] Bio.PDB.KDTree test for memory leak Message-ID: <112c17235319b66a00eebc499294fb2b-EhVcX1xCQgFaRwICBxEAXR0wfgFLV15YQUBGAEFfUC9ZUFgWXVpyH1RXX0FdQU1tXlhfRVBcWQ1eVw==-webmailer2@server03.webmailer.hosteurope.de> Hi, I've encountered a problem with running KDTree: it leaks memory. The code below fills 1GB memory within a minute. Running the GC doesn't help (it slows the process down, but only because the GC is much slower than KDTree. I think the problem might be in the C code. I'd like to get this bug sorted out, but I'm not very good in C. Is there anyone around who I could check ideas with? 
Best Regards,
Kristian

----
from Bio.KDTree.KDTree import *
from numpy.random import random

nr_points=1000
dim=3
bucket_size=10
coords=(200*random((nr_points, dim)))

while 1:
    kdtree=KDTree(dim, bucket_size)
    kdtree.set_coords(coords)

From biopython at maubp.freeserve.co.uk Fri Feb 12 06:10:13 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 12 Feb 2010 06:10:13 +0000
Subject: [Biopython-dev] Bio.PDB.KDTree test for memory leak
In-Reply-To: <112c17235319b66a00eebc499294fb2b-EhVcX1xCQgFaRwICBxEAXR0wfgFLV15YQUBGAEFfUC9ZUFgWXVpyH1RXX0FdQU1tXlhfRVBcWQ1eVw==-webmailer2@server03.webmailer.hosteurope.de>
References: <112c17235319b66a00eebc499294fb2b-EhVcX1xCQgFaRwICBxEAXR0wfgFLV15YQUBGAEFfUC9ZUFgWXVpyH1RXX0FdQU1tXlhfRVBcWQ1eVw==-webmailer2@server03.webmailer.hosteurope.de>
Message-ID: <320fb6e01002112210y10ad4670p7ac3e003b5976685@mail.gmail.com>

On Thu, Feb 11, 2010 at 12:31 PM, Kristian Rother wrote:
>
> Hi,
>
> I've encountered a problem with running KDTree: it leaks memory.
> The code below fills 1GB memory within a minute.
>
> Running the GC doesn't help (it slows the process down, but only because
> the GC is much slower than KDTree.

You mean something like this?

import gc
from Bio.KDTree.KDTree import *
from numpy.random import random

nr_points=1000
dim=3
bucket_size=10
coords=(200*random((nr_points, dim)))

while True:
    kdtree=KDTree(dim, bucket_size)
    kdtree.set_coords(coords)
    del kdtree #explicitly tell Python it can GC this object
    gc.collect() #force Python to run GC

I agree, this does seem to gradually consume more and more RAM.
Could you open a bug on bugzilla to track this please?

> I think the problem might be in the C code. I'd like to get this bug
> sorted out, but I'm not very good in C. Is there anyone around who
> I could check ideas with?

Have you ever used valgrind on a C tool? I'm not sure if it is easy to
use via Python, but it is my tool of choice for checking memory leaks in C.
Peter

From bugzilla-daemon at portal.open-bio.org Fri Feb 12 08:30:12 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Feb 2010 03:30:12 -0500
Subject: [Biopython-dev] [Bug 3010] New: Bio.KDTree is leaking memory
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=3010

Summary: Bio.KDTree is leaking memory
Product: Biopython
Version: 1.53
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: krother at rubor.de

When I run KDTree on several of our PCs (Ubuntu, one with BioPython 1.53, one
with 1.51), it consumes memory that is never freed unless the process
terminates. The code below fills 1GB memory within about a minute.

----
#!/usr/bin/env python
from Bio.KDTree.KDTree import *
from numpy.random import random

nr_points=1000
dim=3
bucket_size=10
coords=(200*random((nr_points, dim)))

while True:
    kdtree=KDTree(dim, bucket_size)
    kdtree.set_coords(coords)
----

Running the GC (via del kdtree; gc.collect() in the while loop) does not help.

I think the problem might be the C code or the Python/C interaction. I checked
the sources of KDTree superficially (to see whether there is a free() for each
malloc()), but did not see anything unusual (I am not a C programmer, though).

Peter proposed using valgrind to check memory leaks in C. Possibly it is
applicable to the problem.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
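[Editorial sketch, not part of the original report.] One way to put a number on the growth described in Bug 3010 is to compare the process's peak resident set size before and after many iterations. The helper names below (peak_rss, rss_growth) are made up for illustration, the workload shown is a harmless stand-in rather than the KDTree code, and the standard-library resource module is only available on Unix-like systems:

```python
import gc
import resource  # standard library, Unix-like systems only


def peak_rss():
    """Peak resident set size of this process (kilobytes on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def rss_growth(workload, iterations=1000):
    """Call workload() repeatedly and return the increase in peak RSS."""
    gc.collect()
    before = peak_rss()
    for _ in range(iterations):
        workload()
        gc.collect()  # rule out Python garbage that is merely uncollected
    return peak_rss() - before


if __name__ == "__main__":
    # Stand-in workload (hypothetical): builds and drops a list,
    # so it should show little or no growth.
    def build_and_drop():
        data = [float(i) for i in range(1000)]
        del data

    print("peak RSS growth:", rss_growth(build_and_drop, iterations=200))
```

To apply this to the report, the stand-in would presumably be swapped for a function that builds a KDTree and calls set_coords(), and the result compared against a variant that skips set_coords(). A growth figure that keeps rising with the iteration count would point at memory that Python's garbage collector cannot reclaim.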
From bugzilla-daemon at portal.open-bio.org Fri Feb 12 12:31:13 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 12 Feb 2010 07:31:13 -0500 Subject: [Biopython-dev] [Bug 3006] esearch medline fails with xml format In-Reply-To: Message-ID: <201002121231.o1CCVDlN010496@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3006 georg.lipps at fhnw.ch changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from georg.lipps at fhnw.ch 2010-02-12 07:31 EST ------- I updated to python 2.6.4 and Biopython 1.5.3 and can confirm that the problem does not persist. Thanks for checking. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 12 16:23:17 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 12 Feb 2010 11:23:17 -0500 Subject: [Biopython-dev] [Bug 3010] Bio.KDTree is leaking memory In-Reply-To: Message-ID: <201002121623.o1CGNHHd017669@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3010 ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2010-02-12 11:23 EST ------- Does the memory leak occur also without the line kdtree.set_coords(coords)? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Feb 14 10:45:48 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 14 Feb 2010 05:45:48 -0500 Subject: [Biopython-dev] [Bug 3010] Bio.KDTree is leaking memory In-Reply-To: Message-ID: <201002141045.o1EAjmV1029393@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3010 ------- Comment #2 from krother at rubor.de 2010-02-14 05:45 EST ------- (In reply to comment #1) > Does the memory leak occur also without the line kdtree.set_coords(coords)? > No, I tried, and it doesnt. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From MatatTHC at gmx.de Tue Feb 16 09:48:25 2010 From: MatatTHC at gmx.de (Matthias Bernt) Date: Tue, 16 Feb 2010 10:48:25 +0100 Subject: [Biopython-dev] derive from Seq Message-ID: <20100216094825.25190@gmx.net> Hi, I've implemented a class derived from Seq. Many of the Seq functions return Seq. Thus, I can not use those functions because I need instances of the derived class. This can easily be fixed by returning: self.__class__( .. ) Regards, Matthias From chapmanb at 50mail.com Tue Feb 16 13:09:45 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 16 Feb 2010 08:09:45 -0500 Subject: [Biopython-dev] derive from Seq In-Reply-To: <20100216094825.25190@gmx.net> References: <20100216094825.25190@gmx.net> Message-ID: <20100216130945.GH64068@sobchak.mgh.harvard.edu> Hi Matthias; > I've implemented a class derived from Seq. Many of the Seq functions > return Seq. Thus, I can not use those functions because I need > instances of the derived class. > > This can easily be fixed by returning: > > self.__class__( .. ) Good catch.
Would you be able to submit a patch for this to the bug tracker? More generally, it is interesting that you are subclassing Seq. Can you describe your application for this? I was debating with Peter and Michiel this week and arguing that the Seq class should be switched to a standard string, with biological functions like reverse_complement and the like moving to stand alone functions and SeqRecord objects. I'd be interested in hearing the opposite case; that additional functionality is needed on a Seq object. Brad From bugzilla-daemon at portal.open-bio.org Tue Feb 16 17:53:29 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Feb 2010 12:53:29 -0500 Subject: [Biopython-dev] [Bug 3013] New: import warnings missing in Bio/PDB/MMCIF2Dict.py Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3013 Summary: import warnings missing in Bio/PDB/MMCIF2Dict.py Product: Biopython Version: 1.53 Platform: PC OS/Version: Linux Status: NEW Severity: minor Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: macrozhu+biopy at gmail.com python library >>warnings<< is not imported in Bio/PDB/MMCIF2Dict.py Please import the library in the beginning of the source code. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
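[Editorial sketch, not part of the original thread.] Going back to the "derive from Seq" messages above: the self.__class__( .. ) suggestion can be illustrated with a toy class. ToySeq and CircularSeq below are made-up names and deliberately do not reproduce the real Bio.Seq.Seq API; the point is only that methods which build their result through self.__class__ hand back the subclass when called on a subclass instance:

```python
class ToySeq:
    """Hypothetical stand-in for a sequence class (not Bio.Seq.Seq)."""

    def __init__(self, data):
        self._data = str(data)

    def __str__(self):
        return self._data

    def __len__(self):
        return len(self._data)

    def upper(self):
        # self.__class__ instead of a hard-coded ToySeq, so that
        # subclasses get an instance of their own type back.
        return self.__class__(self._data.upper())


class CircularSeq(ToySeq):
    """Hypothetical subclass for circular sequences."""

    def window(self, start, length):
        """Return `length` letters from `start`, wrapping past the end."""
        n = len(self)
        if n == 0:
            return self.__class__("")
        letters = (self._data[(start + i) % n] for i in range(length))
        return self.__class__("".join(letters))


if __name__ == "__main__":
    circ = CircularSeq("acgt").upper()
    print(type(circ).__name__, circ.window(3, 3))
```

One limitation worth keeping in mind: this only works while the subclass constructor accepts the same arguments as the base class. A subclass whose __init__ takes extra required arguments would break inside the inherited methods.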
From bugzilla-daemon at portal.open-bio.org Wed Feb 17 01:24:39 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Feb 2010 20:24:39 -0500 Subject: [Biopython-dev] [Bug 3013] import warnings missing in Bio/PDB/MMCIF2Dict.py In-Reply-To: Message-ID: <201002170124.o1H1OdhE003209@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3013 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2010-02-16 20:24 EST ------- Fixed in the repository, thanks. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From p.j.a.cock at googlemail.com Wed Feb 17 02:48:01 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 17 Feb 2010 02:48:01 +0000 Subject: [Biopython-dev] derive from Seq In-Reply-To: <20100216130945.GH64068@sobchak.mgh.harvard.edu> References: <20100216094825.25190@gmx.net> <20100216130945.GH64068@sobchak.mgh.harvard.edu> Message-ID: <320fb6e01002161848s28e543a9jdac436976de3f279@mail.gmail.com> On Tue, Feb 16, 2010 at 1:09 PM, Brad Chapman wrote: > Hi Matthias; > >> I've implemented a class derived from Seq. Many of the Seq functions >> return Seq. Thus, I can not use those functions because I need >> instances of the derived class. >> >> This can easily be fixed by returning: >> >> self.__class__( .. ) We debated this on the mailing list a while ago (I'd hack to search a little harder to find the thread). While switching to this form makes subclassing easier in some cases, it doesn't in all. > More generally, it is interesting that you are subclassing Seq. Can > you describe your application for this? ... I'd be interested in > hearing ... 
additional functionality is needed on a Seq object. > > Brad Last time this (subclassing the Seq object) was mentioned, the specific use was to change the equality operations to be string like. This is a change we're considering making in Biopython itself (and again was something Brad, Michiel and I chatted about last week - I will be sending out an email about that next week, I'm on holiday right now and haven't had internet access till today). But to echo Brad, use cases for subclassing the Seq are of great interest. Regards, Peter From MatatTHC at gmx.de Wed Feb 17 08:33:11 2010 From: MatatTHC at gmx.de (Matthias Bernt) Date: Wed, 17 Feb 2010 09:33:11 +0100 Subject: [Biopython-dev] derive from Seq In-Reply-To: <320fb6e01002161848s28e543a9jdac436976de3f279@mail.gmail.com> References: <20100216094825.25190@gmx.net> <20100216130945.GH64068@sobchak.mgh.harvard.edu> <320fb6e01002161848s28e543a9jdac436976de3f279@mail.gmail.com> Message-ID: <20100217083311.287840@gmx.net> Hi, I'm dealing with circular sequences. Thus, I need some specialised functions (e.g. getting a subsequence). Furthermore, for me it seems to be the natural way to extend the functionality of Seq to my own needs. But, maybe this is not the best way. Matthias > > Hi Matthias; > > > >> I've implemented a class derived from Seq. Many of the Seq functions > >> return Seq. Thus, I can not use those functions because I need > >> instances of the derived class. > >> > >> This can easily be fixed by returning: > >> > >> self.__class__( .. ) > > We debated this on the mailing list a while ago (I'd hack to search > a little harder to find the thread). While switching to this form makes > subclassing easier in some cases, it doesn't in all. > > > More generally, it is interesting that you are subclassing Seq. Can > > you describe your application for this? ... I'd be interested in > > hearing ... additional functionality is needed on a Seq object. 
> > > > Brad > > Last time this (subclassing the Seq object) was mentioned, the > specific use was to change the equality operations to be string > like. This is a change we're considering making in Biopython itself > (and again was something Brad, Michiel and I chatted about > last week - I will be sending out an email about that next week, > I'm on holiday right now and haven't had internet access till > today). > > But to echo Brad, use cases for subclassing the Seq are > of great interest. From bugzilla-daemon at portal.open-bio.org Thu Feb 18 16:09:52 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 18 Feb 2010 11:09:52 -0500 Subject: [Biopython-dev] [Bug 3013] import warnings missing in Bio/PDB/MMCIF2Dict.py In-Reply-To: Message-ID: <201002181609.o1IG9qth028156@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3013 ------- Comment #2 from macrozhu+biopy at gmail.com 2010-02-18 11:09 EST ------- Can pychecker be of any use for detecting such minor bugs? It might be too much, I guess. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Feb 20 18:40:59 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 20 Feb 2010 13:40:59 -0500 Subject: [Biopython-dev] [Bug 3013] import warnings missing in Bio/PDB/MMCIF2Dict.py In-Reply-To: Message-ID: <201002201840.o1KIexYS017773@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3013 ------- Comment #3 from eric.talevich at gmail.com 2010-02-20 13:40 EST ------- (In reply to comment #2) > Can pychecker be of any use for detecting such minor bugs? It might be too > much, I guess.
>

I don't know about PyChecker, but PyLint will catch import errors and
uninitialized variables like this. For example, I just ran
"pylint -e Bio/PDB/*.py" on a branch that didn't have this fix in it yet,
and it flagged this bug:

E: 79:MMCIF2Dict._make_mmcif_dict: Undefined variable 'warnings'
E: 91:MMCIF2Dict._make_mmcif_dict: Undefined variable 'warnings'
E:107:MMCIF2Dict._make_mmcif_dict: Undefined variable 'warnings'

While I'm at it, here are the other errors in Bio.PDB that pylint caught in a
freshly updated master branch:

************* Module Chain
E: 79:Chain.__delitem__: Class 'Entity' has no '__delitem__' member
************* Module DSSP
E:101:make_dssp_dict: function already defined line 8
E:139:DSSP: class already defined line 8
************* Module Entity
E: 56:Entity.get_level: Instance of 'Entity' has no 'level' member
************* Module FragmentMapper
E:137:Fragment.add_residue: Undefined variable 'PDBException'
E:191:_make_fragment_list: Undefined variable 'PDBException'
E:193:_make_fragment_list: Undefined variable 'PDBException'
E:226:FragmentMapper: class already defined line 10
E:250:FragmentMapper.__init__: Undefined variable 'PDBException'
************* Module HSExposure
E: 67:_AbstractHSExposure.__init__: Instance of '_AbstractHSExposure' has no '_get_cb' member
E:131:HSExposureCA: class already defined line 9
E:222:HSExposureCB: class already defined line 9
E:257:ExposureCN: class already defined line 9
************* Module MMCIF2Dict
E: 8: No name 'MMCIFlex' in module 'Bio.PDB.mmCIF'
E: 31:MMCIF2Dict.__init__: Module 'Bio.PDB.mmCIF' has no 'MMCIFlex' member
E: 33:MMCIF2Dict.__init__: Module 'Bio.PDB.mmCIF' has no 'MMCIFlex' member
E: 44:MMCIF2Dict._make_mmcif_dict: Module 'Bio.PDB.mmCIF' has no 'MMCIFlex' member
************* Module NACCESS
E:183: Instance of 'NACCESS' has no 'get_iterator' member
************* Module PDBParser
E:159:PDBParser._parse_coordinates: Undefined variable 'PDBContructionError'
************* Module Polypeptide
E:276:_PPBuilder.build_peptides: Instance of '_PPBuilder' has no '_is_connected' member
************* Module ResidueDepth
E: 65:get_surface: function already defined line 11
E:123:ResidueDepth: class already defined line 11
************* Module StructureAlignment
E: 14:StructureAlignment: class already defined line 6

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From eric.talevich at gmail.com Sat Feb 20 19:01:42 2010
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sat, 20 Feb 2010 14:01:42 -0500
Subject: [Biopython-dev] derive from Seq
In-Reply-To: <20100216130945.GH64068@sobchak.mgh.harvard.edu>
References: <20100216094825.25190@gmx.net> <20100216130945.GH64068@sobchak.mgh.harvard.edu>
Message-ID: <3f6baf361002201101m78a9550csc6d371df251e4319@mail.gmail.com>

On Tue, Feb 16, 2010 at 8:09 AM, Brad Chapman wrote:
>
> More generally, it is interesting that you are subclassing Seq. Can
> you describe your application for this? I was debating with Peter
> and Michiel this week and arguing that the Seq class should be
> switched to a standard string, with biological functions like
> reverse_complement and the like moving to stand alone functions and
> SeqRecord objects. I'd be interested in hearing the opposite case;
> that additional functionality is needed on a Seq object.
>
>

I've seen a technique like this used to good effect:

# File: Seq.py

# Standalone functions all take a string-like first argument

def reverse_complement(seq):
    ...

def translate(seq, table=1):
    ...

class Seq(basestring):  # or str
    def __init__(self, data, alphabet):
        ...

    # Then attach the above functions as methods here
    reverse_complement = reverse_complement
    translate = translate
    ...

The same functionality is then available in a functional or OO style, with
minimal code duplication.
And for interactive sessions, where converting strings to Seqs is a bit more of an inconvenience, "from Bio.Seq import *" becomes quick and handy. -Eric From biopython at maubp.freeserve.co.uk Sun Feb 21 12:03:21 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 21 Feb 2010 12:03:21 +0000 Subject: [Biopython-dev] derive from Seq In-Reply-To: <3f6baf361002201101m78a9550csc6d371df251e4319@mail.gmail.com> References: <20100216094825.25190@gmx.net> <20100216130945.GH64068@sobchak.mgh.harvard.edu> <3f6baf361002201101m78a9550csc6d371df251e4319@mail.gmail.com> Message-ID: <320fb6e01002210403i721f8f26l2e7d37aae0b13c35@mail.gmail.com> On Sat, Feb 20, 2010 at 7:01 PM, Eric Talevich wrote: > I've seen a technique like this used to good effect: > > # File: Seq.py > > ... > > The same functionality is then available in a functional or OO style, with > minimal code duplication. And for interactive sessions, where converting > strings to Seqs is a bit more of an inconvenience, "from Bio.Seq import *" > becomes quick and handy. Doesn't that describe the Bio.Seq module as it is pretty well? In addition to the Seq object methods, there are several functions which can be used on strings or Seq (like) objects. Peter From eric.talevich at gmail.com Sun Feb 21 16:36:13 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 21 Feb 2010 11:36:13 -0500 Subject: [Biopython-dev] derive from Seq In-Reply-To: <320fb6e01002210403i721f8f26l2e7d37aae0b13c35@mail.gmail.com> References: <20100216094825.25190@gmx.net> <20100216130945.GH64068@sobchak.mgh.harvard.edu> <3f6baf361002201101m78a9550csc6d371df251e4319@mail.gmail.com> <320fb6e01002210403i721f8f26l2e7d37aae0b13c35@mail.gmail.com> Message-ID: <3f6baf361002210836of243016s8206035c1b89de24@mail.gmail.com> On Sun, Feb 21, 2010 at 7:03 AM, Peter wrote: > On Sat, Feb 20, 2010 at 7:01 PM, Eric Talevich > wrote: > > I've seen a technique like this used to good effect: > > > > ... 
> > > > The same functionality is then available in a functional or OO style, > with > > minimal code duplication. And for interactive sessions, where converting > > strings to Seqs is a bit more of an inconvenience, "from Bio.Seq import > *" > > becomes quick and handy. > > Doesn't that describe the Bio.Seq module as it is pretty well? > In addition to the Seq object methods, there are several functions > which can be used on strings or Seq (like) objects. > > Peter > I'm not fully up to speed on the debate or the use cases that triggered it, but I'm guessing the goal is better code flexibility without sacrificing performance. Here's some code to consider: def transcribe(dna, alphabet=None): """Transcribe a DNA sequence into RNA. Returns a string.""" if isinstance(dna, Seq) or isinstance(dna, MutableSeq): # At first, maybe issue a warning here alphabet = dna.alphabet dna = str(dna) if alphabet is not None: # Validate base = Alphabet._get_base_alphabet(alphabet) if isinstance(base, Alphabet.ProteinAlphabet): raise ValueError("Proteins cannot be transcribed!") if isinstance(base, Alphabet.RNAAlphabet): raise ValueError("RNA cannot be transcribed!") return dna.replace('T','U').replace('t','u') class Seq: # ... def transcribe(self): transcript = transcribe(self._data) # Rebuild the Seq object if self.alphabet==IUPAC.unambiguous_dna: alphabet = IUPAC.unambiguous_rna elif self.alphabet==IUPAC.ambiguous_dna: alphabet = IUPAC.ambiguous_rna else: alphabet = Alphabet.generic_rna return Seq(transcript, alphabet) Notes: - The standalone takes an optional 'alphabet' argument, and performs validation if requested. - Since the standalone function now has the same functionality as the Seq method, Seq can dispatch to the function -- rather than the other way around, as it is currently -- and then just rebuild a Seq object. - The standalone function now always returns the same type (str). 
Since this might break some existing code, a little shim and deprecation dance may be needed in real life. But I think returning a plain string is the Right Thing: there's "one obvious way" to work with Seq objects or plain strings. - If the grand proposal is to eventually move the alphabet attribute to SeqRecord, this provides an intermediate step and a more convenient foundation for testing the idea. Best, Eric From biopython at maubp.freeserve.co.uk Mon Feb 22 14:48:14 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 22 Feb 2010 14:48:14 +0000 Subject: [Biopython-dev] Changing Seq equality In-Reply-To: <320fb6e00911270339s3354051cub0cc193466575f16@mail.gmail.com> References: <200911250945.20870.jblanca@btc.upv.es> <320fb6e00911250226w4e86ea5cr4cdea4a424d32b7@mail.gmail.com> <200911251220.53881.jblanca@btc.upv.es> <320fb6e00911250348m249533d1g5e30b6c593769dd1@mail.gmail.com> <3f6baf360911252314u72ab5c19rbcb899e736117a4f@mail.gmail.com> <320fb6e00911260241j22fbee47ufaad13412c0ff580@mail.gmail.com> <3f6baf360911261213g2047607aw212215cce2b4fe82@mail.gmail.com> <320fb6e00911270339s3354051cub0cc193466575f16@mail.gmail.com> Message-ID: <320fb6e01002220648n5d47f015r65f17a37f782fcde@mail.gmail.com> Hi all, I've just got back from Japan - Brad and I were fortunate to be able to attend the DBCLS BioHackathon 2010 held in Tokyo, http://hackathon3.dbcls.jp/ As Brad already mentioned in passing, we also managed to have dinner one evening with Michiel, and had an informal chat about Biopython plans. Expect a few more emails on other topics to follow. One of the short term aims we agreed on was to press ahead with the Seq equality changes outlined on this thread late last year. Mailing list archive link: http://lists.open-bio.org/pipermail/biopython-dev/2009-November/007021.html To recap, the agreed best behaviour was to make Seq equality act like string equality, but to raise a Python warning when incompatible alphabets are compared (e.g. DNA to Protein). 
This also applies to all the other comparison operators: not equal, less than, greater than, less than or equal, and greater than or equal. This is my outline plan for the change: For Biopython up to 1.53, Seq class uses object equality, seq1==seq2 acts as id(seq1)==id(seq2) For Biopython 1.54 (and perhaps a few more releases), the Seq classes will still use object equality but will trigger a warning suggesting explicit use of id(seq1)==id(seq2) or str(seq1)==str(seq2) as appropriate. For Biopython 1.xx (maybe 1.55 or 1.56?) the Seq classes will switch to using string equality (with an alphabet aware warning for comparing DNA to RNA etc), but will also trigger a warning that this is a change from previous releases, and suggest in the short term the continued explicit use of either id(seq1)==id(seq2) for object identity or str(seq1)==str(seq2) for string identity. For Biopython 1.yy (maybe 1.57?) the Seq classes will use string equality (with an alphabet aware warning for comparing DNA to RNA etc), without any warning about this being a change from historic behaviour. These warning messages could also point at a wiki page, and we'd need a FAQ entry in the tutorial as well. The aim of this slightly drawn out switch is to try and make sure all users are aware of the change, even if they only update their copy of Biopython every few releases. Does that all sound sensible? If so, we should probably have an announcement on the main mailing list, in case there are any other views. Other more complex options include a flag for switching between the modes - but that complexity doesn't seem such a good idea to me. All my own code and most of the unit tests use str(seq1)==str(seq2) explicitly anyway. The only exception is some of the genetic algorithm unit tests which do seem to want explicit object identity. 
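
As a rough illustration of the end state described above (string-style comparison plus a warning when the alphabets clash), here is a minimal sketch. It is not the Biopython implementation: the class name SketchSeq is hypothetical, and the alphabet is assumed to be a plain label such as "DNA" rather than a real Bio.Alphabet object.

```python
import warnings


class SketchSeq(object):
    """Hypothetical stand-in for Seq; 'alphabet' is just a label here."""

    def __init__(self, data, alphabet=None):
        self._data = data
        self.alphabet = alphabet

    def __str__(self):
        return self._data

    def __hash__(self):
        # Keep hashing consistent with string-style equality.
        return hash(self._data)

    def _warn_if_incompatible(self, other):
        # Assumption: two different labels count as incompatible.
        other_alphabet = getattr(other, "alphabet", None)
        if (self.alphabet is not None and other_alphabet is not None
                and self.alphabet != other_alphabet):
            warnings.warn("Comparing %s sequence to %s sequence"
                          % (self.alphabet, other_alphabet), stacklevel=3)

    def __eq__(self, other):
        self._warn_if_incompatible(other)
        return str(self) == str(other)

    def __ne__(self, other):
        self._warn_if_incompatible(other)
        return str(self) != str(other)

    def __lt__(self, other):
        self._warn_if_incompatible(other)
        return str(self) < str(other)
```

The remaining operators (less than or equal, greater than, greater than or equal) would follow the same pattern. Plain strings carry no alphabet, so comparing a SketchSeq against a str never warns in this sketch.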
Regards, Peter From biopython at maubp.freeserve.co.uk Tue Feb 23 11:31:35 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 23 Feb 2010 11:31:35 +0000 Subject: [Biopython-dev] Handles and/or filenames in Bio.SeqIO etc? In-Reply-To: <20090728221726.GK68751@sobchak.mgh.harvard.edu> References: <320fb6e00907280934i54f326a6r38325c05a314cdbc@mail.gmail.com> <20090728221726.GK68751@sobchak.mgh.harvard.edu> Message-ID: <320fb6e01002230331j5f5f87c5lf328d3bacc4a557b@mail.gmail.com> Hi all, As mentioned in another thread, Brad, Michiel and I had an informal meeting earlier this month in Tokyo and discussed some plans for Biopython. One of the short term changes we agreed on was to push ahead with the Seq object equality changes, see: http://lists.open-bio.org/pipermail/biopython-dev/2010-February/007351.html Another short term change we agreed was worthwhile was to follow other Python libraries and allow handles OR filenames in our parsers (starting with SeqIO and AlignIO). This follows the discussion for the "TreeIO" module (since renamed) and the Bio.SeqIO.convert functions here on the mailing list last year, see: http://lists.open-bio.org/pipermail/biopython-dev/2009-July/006503.html I will tackle this shortly for Bio.SeqIO and Bio.AlignIO. Peter From bugzilla-daemon at portal.open-bio.org Tue Feb 23 17:43:01 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 23 Feb 2010 12:43:01 -0500 Subject: [Biopython-dev] [Bug 3013] import warnings missing in Bio/PDB/MMCIF2Dict.py In-Reply-To: Message-ID: <201002231743.o1NHh17v001826@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3013 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-23 12:43 EST ------- Hi Eric, I have fixed most (all?) of those problems reported by pylint - see mailing list post. Thanks! 
Peter

From biopython at maubp.freeserve.co.uk Tue Feb 23 17:43:31 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Feb 2010 17:43:31 +0000
Subject: [Biopython-dev] Running pylint over Biopython
Message-ID: <320fb6e01002230943g474dc673x43f7df120e679d91@mail.gmail.com>

Hi all,

Those following @Biopython on twitter or subscribed to the github RSS
feed for our repository will know this already, but I've been using
pylint today to spot some errors in Biopython.
http://www.logilab.org/project/pylint

This was prompted by Eric trying this on Bio.PDB for Bug 3013 and
finding some issues - thanks Eric, this was a valuable suggestion.

With its default settings pylint is very noisy, and in particular
doesn't like our naming conventions. However, with the following
command line you can focus in on the important stuff:

pylint --disable-msg-cat=CRW --include-ids=y --disable-msg=E1101,E1103,E0102 -r n Bio BioSQL

Note that instead of module names, you can give filenames (e.g. *.py).
That command disables several categories of message (conventions,
possible refactorings, warnings), leaving just errors and fatal
messages. I turned on the message identifiers so that I have something
useful to stick into Google if need be, or to add to the ignore list
(currently three cases which looked like false positives). Then I turn
off the detailed report.

[Tip - don't run this from the Biopython source directory as then
importing our C code modules will fail]

As you will be able to tell from the recent flurry of git commits,
this highlighted some simple errors like missing imports or typos in
variable names.

Tiago, could you have a look at these possible problems in Bio.PopGen:

************* Module Bio.PopGen.Async
E0602: 78:Async.get_result: Undefined variable 'done'
E0602: 79:Async.get_result: Undefined variable 'done'
************* Module Bio.PopGen.GenePop
E0602:160:Record.split_in_pops: Undefined variable 'GenePop'
E0602:177:Record.split_in_loci: Undefined variable 'GenePop'
************* Module Bio.PopGen.GenePop.Controller
E0602: 41:_read_allele_freq_table: Undefined variable 'self'
E0602:133:_hw_func: Undefined variable 'self'
E0602:393:GenePopController.test_pop_hw_prob: Undefined variable 'ext'
E0602:458:GenePopController.test_ld.ld_pop_func: Undefined variable 'currrent_pop'
************* Module Bio.PopGen.SimCoal.Cache
E0602: 79:SimCoalCache.getSimulation: Undefined variable 'Config'
E0602: 88: Undefined variable 'Cache'
************* Module Bio.PopGen.SimCoal.Controller
E0602: 47:SimCoalController.run_simcoal: Undefined variable 'Config'

Eric, I don't have all the dependencies installed, but pylint does
appear to dislike a few things in Bio.Phylo on the trunk:

************* Module Bio.Phylo.BaseTree
E0203:521:TreeMixin.prune: Access to member 'root' before its definition line 531
E0203:527:TreeMixin.prune: Access to member 'root' before its definition line 531
E0202:672:Subtree.root: An attribute inherited from TreeMixin hide this method
************* Module Bio.Phylo.PhyloXML
E1120:182:Phylogeny.get_alignment: No value passed for parameter 'follow_attrs' in function call

One thing this exercise has shown is that we still need to do some
work on the unit test coverage.
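
To illustrate why test coverage matters for this class of error: an undefined name inside a rarely used branch only fails when that branch runs, so a test that drives the error path would catch it. The following is a sketch only; parse_count and its test case are hypothetical and not taken from Biopython.

```python
import unittest


def parse_count(text):
    """Hypothetical parser returning a non-negative integer count."""
    value = int(text)
    if value < 0:
        # A misspelt name here (say 'valeu') would go unnoticed until
        # this branch actually runs, then surface as a NameError.
        raise ValueError("Negative count: %r" % value)
    return value


class ParseCountTests(unittest.TestCase):
    def test_typical_usage(self):
        self.assertEqual(parse_count("3"), 3)

    def test_negative_raises_value_error(self):
        # With a typo in the error branch this would fail with
        # NameError instead of the intended ValueError.
        self.assertRaises(ValueError, parse_count, "-1")
```

Running this with the standard unittest runner exercises both the normal path and the exception path.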
Regards Peter From tiagoantao at gmail.com Tue Feb 23 17:56:22 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 23 Feb 2010 17:56:22 +0000 Subject: [Biopython-dev] Running pylint over Biopython In-Reply-To: <320fb6e01002230943g474dc673x43f7df120e679d91@mail.gmail.com> References: <320fb6e01002230943g474dc673x43f7df120e679d91@mail.gmail.com> Message-ID: <6d941f121002230956h582acc11r59cb18bca8d2f727@mail.gmail.com> This comes in a good time, I've actually been making changes to the code (as the genepop parser is not able to handle big files and I've had quite a few complains about that). it seems to be 2.6 related or so because I've detected the Config problem myself. I will correct this next week (this week is _impossible_), along with an update to the genepop parser to support big files. 2010/2/23 Peter : > Hi all, > > Those following @Biopython on twitter or subscribed to the github RSS > feed for our repository will know this already, but I've been using > pylint today to spot some errors in Biopython. > http://www.logilab.org/project/pylint > > This was prompted by Eric trying this on Bio.PDB for Bug 3013 and > finding some issues - thank Eric, this was a valuable suggestion. > > With its default settings pylint is very very noisy, and in particular > doesn't like our naming conventions. However, with the following > command line you can focus in on the important stuff: > > pylint --disable-msg-cat=CRW --include-ids=y > --disable-msg=E1101,E1103,E0102 -r n Bio BioSQL > > Note that instead of module names, you can give filenames (e.g. *.py). > What that does is disable several categories of message (conventions, > possible refactorings, warnings) leaving just errors and fatal > messages. I turned on the message identifiers so that I have something > useful to stick into Google if need be, or to add to the ignore list > (currently three cases which looked like false positives). Then I turn > off the detailed report. 
> > [Tip - don't run this from the Biopython source directory as then > importing our C code modules will fail] > > As you will be able to tell from the recent flurry of git commits, > this highlighted some simple errors like missing imports or typos in > variable names. > > > Tiago, could you have a look at these possible problems in Bio.PopGen: > > ************* Module Bio.PopGen.Async > E0602: 78:Async.get_result: Undefined variable 'done' > E0602: 79:Async.get_result: Undefined variable 'done' > ************* Module Bio.PopGen.GenePop > E0602:160:Record.split_in_pops: Undefined variable 'GenePop' > E0602:177:Record.split_in_loci: Undefined variable 'GenePop' > ************* Module Bio.PopGen.GenePop.Controller > E0602: 41:_read_allele_freq_table: Undefined variable 'self' > E0602:133:_hw_func: Undefined variable 'self' > E0602:393:GenePopController.test_pop_hw_prob: Undefined variable 'ext' > E0602:458:GenePopController.test_ld.ld_pop_func: Undefined variable > 'currrent_pop' > ************* Module Bio.PopGen.SimCoal.Cache > E0602: 79:SimCoalCache.getSimulation: Undefined variable 'Config' > E0602: 88: Undefined variable 'Cache' > ************* Module Bio.PopGen.SimCoal.Controller > E0602: 47:SimCoalController.run_simcoal: Undefined variable 'Config' > > > Eric, I don't have all the dependencies installed by pylint does > appear to dislike a few things in Bio.Phylo on the trunk: > > ************* Module Bio.Phylo.BaseTree > E0203:521:TreeMixin.prune: Access to member 'root' before its > definition line 531 > E0203:527:TreeMixin.prune: Access to member 'root' before its > definition line 531 > E0202:672:Subtree.root: An attribute inherited from TreeMixin hide this method > ************* Module Bio.Phylo.PhyloXML > E1120:182:Phylogeny.get_alignment: No value passed for parameter > 'follow_attrs' in function call > > > One thing this exercise has shown is that we still need to do some > work on the unit test coverage. 
>
> Regards
>
> Peter
>

--
"Pessimism of the Intellect; Optimism of the Will" -Antonio Gramsci

From bugzilla-daemon at portal.open-bio.org Tue Feb 23 18:03:53 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 23 Feb 2010 13:03:53 -0500
Subject: [Biopython-dev] [Bug 3013] import warnings missing in Bio/PDB/MMCIF2Dict.py
In-Reply-To:
Message-ID: <201002231803.o1NI3r10002509@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3013

macrozhu+biopy at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |macrozhu+biopy at gmail.com

------- Comment #5 from macrozhu+biopy at gmail.com 2010-02-23 13:03 EST -------
wow, the developers really respond very quickly.

How about running >>pylint<< or >>pychecker<< on all BioPython code to detect
potential problems?

cheers,

From bugzilla-daemon at portal.open-bio.org Tue Feb 23 18:59:49 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 23 Feb 2010 13:59:49 -0500
Subject: [Biopython-dev] [Bug 3013] import warnings missing in Bio/PDB/MMCIF2Dict.py
In-Reply-To:
Message-ID: <201002231859.o1NIxnJH004142@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3013

------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-23 13:59 EST -------
(In reply to comment #5)
> wow, the developers really respond very quickly.
>
> How about running >>pylint<< or >>pychecker<< on all BioPython code to detect
> potential problems?
> > cheers, > Already tried with pylint earlier today ;) http://lists.open-bio.org/pipermail/biopython-dev/2010-February/007354.html -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From eric.talevich at gmail.com Wed Feb 24 03:11:25 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 23 Feb 2010 22:11:25 -0500 Subject: [Biopython-dev] Running pylint over Biopython In-Reply-To: <320fb6e01002230943g474dc673x43f7df120e679d91@mail.gmail.com> References: <320fb6e01002230943g474dc673x43f7df120e679d91@mail.gmail.com> Message-ID: <3f6baf361002231911g65b47cbfw3625f9aacf863abc@mail.gmail.com> 2010/2/23 Peter > Hi all, > > Those following @Biopython on twitter or subscribed to the github RSS > feed for our repository will know this already, but I've been using > pylint today to spot some errors in Biopython. > http://www.logilab.org/project/pylint > > This was prompted by Eric trying this on Bio.PDB for Bug 3013 and > finding some issues - thank Eric, this was a valuable suggestion. > > Glad I could help. :) > Eric, I don't have all the dependencies installed by pylint does > appear to dislike a few things in Bio.Phylo on the trunk: > Pylint hates the way I wrote Bio.Phylo, in particular the way TreeMixin assumes it will be mixed with a class that has 'root' and 'is_terminal' attributes, and the __dict__ hack in the PhyloXML class __init__ methods -- it can't figure out where the attributes are coming from. The last error was real, and I've pushed a fix to the trunk. Thanks for catching it. One thing this exercise has shown is that we still need to do some > work on the unit test coverage. > Agreed. I also added a unit test for get_alignment (finally), and should get to TreeMixin.prune and .split soon. Then Bio.Phylo will have essentially 100% unit test coverage. 
Cheers -Eric From biopython at maubp.freeserve.co.uk Wed Feb 24 07:41:18 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Feb 2010 07:41:18 +0000 Subject: [Biopython-dev] Running pylint over Biopython In-Reply-To: <3f6baf361002231911g65b47cbfw3625f9aacf863abc@mail.gmail.com> References: <320fb6e01002230943g474dc673x43f7df120e679d91@mail.gmail.com> <3f6baf361002231911g65b47cbfw3625f9aacf863abc@mail.gmail.com> Message-ID: <320fb6e01002232341n3ee397basddde348df86d4871@mail.gmail.com> On Wed, Feb 24, 2010 at 3:11 AM, Eric Talevich wrote: > 2010/2/23 Peter > >> Hi all, >> >> Those following @Biopython on twitter or subscribed to the github RSS >> feed for our repository will know this already, but I've been using >> pylint today to spot some errors in Biopython. >> http://www.logilab.org/project/pylint >> >> This was prompted by Eric trying this on Bio.PDB for Bug 3013 and >> finding some issues - thank Eric, this was a valuable suggestion. >> >> Glad I could help. :) Re-reading Bug 3013, we might also want to try PyChecker as suggested by Hongbo Zhu - I've not used that before. >> Eric, I don't have all the dependencies installed by pylint does >> appear to dislike a few things in Bio.Phylo on the trunk: > > Pylint hates the way I wrote Bio.Phylo, in particular the way TreeMixin > assumes it will be mixed with a class that has 'root' and 'is_terminal' > attributes, and the __dict__ hack in the PhyloXML class __init__ methods -- > it can't figure out where the attributes are coming from. Some of the "apparent false positives" I was ignoring related to the iterator classes in Bio.SeqIO, again this seems to be valid code which pylint can't cope with. We may want to follow up on this (it could be a bug in pylint?). That said, if you can think of a cleaner way to code your bits that might be advantageous for long term maintainance. 
Maybe just add a TODO comment to consider using Abstract Base Classes once we require Python 2.6+ for Biopython (if that looks suitable)? > The last error was real, and I've pushed a fix to the trunk. > Thanks for catching it. Cool. >> One thing this exercise has shown is that we still need >> to do some work on the unit test coverage. > > Agreed. I also added a unit test for get_alignment (finally), > and should get to TreeMixin.prune and .split soon. Then > Bio.Phylo will have essentially 100% unit test coverage. I didn't mean to single out just Bio.Phylo - I meant the whole of Biopython would benefit from more unit tests. In particular, a lot of the "minor" errors pylint helped me fix were in error messages (e.g. wrong variable name used). This means if a user hit the error, rather than the exception we wanted to raise they'd get an error about our message. So, not critical, but it suggests we need more tests to cover the exceptions (as well as the more important tests to cover typical usage). Peter From p.j.a.cock at googlemail.com Wed Feb 24 07:43:48 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 24 Feb 2010 07:43:48 +0000 Subject: [Biopython-dev] Medium/long term plans Message-ID: <320fb6e01002232343s2df80990s96774b44f942e851@mail.gmail.com> Hi all, As mentioned in other recent threads, Brad and I were in Tokyo earlier this month for the DBCLS BioHackathon 2010 (see http://hackathon3.dbcls.jp/ for details). While there, we met up with Michiel for an informal dinner meeting, and discussed some possible plans for Biopython. === Short term action points === Seq object equality, see: http://lists.open-bio.org/pipermail/biopython-dev/2010-February/007351.html Filenames or handles in SeqIO, AlignIO, etc, see: http://lists.open-bio.org/pipermail/biopython-dev/2010-February/007352.html === Medium term action points === Python 3 support. 
With NumPy starting to make serious plans for supporting Python 3 this year, we should be able to look at doing this too. Initially we will continue to focus on Python 2.x, but make more effort to ensure that we can run without issues in the "Python 3 warning mode" available in Python 2.6 (or 2.7 once that is out). Then start to put Biopython through 2to3, and see how we get on. Name space reorganisation for sequences. It would be nice to have the Seq objects, SeqFeature, SeqRecord and probably SeqUtils and SeqIO all under one module name. We may be able to handle this in the short term with two import routes with the old module names discouraged and eventually deprecated. See also the "Code review request for phyloxml branch" thread which covered some of this: http://lists.open-bio.org/pipermail/biopython-dev/2010-January/007215.html === Long term action points === There are things in Biopython that with hindsight we feel have not worked out so well (module naming, alphabets objects) where change may require a break, i.e. a Biopython version two. Should we start a wiki to record points of debate, and get people to list their niggles/faults for consideration? 
Regarding Python 3.x support and a possible Biopython 2.x see also Guido's blog post (there is probably an email version on one of the python mailing lists too): http://www.artima.com/weblogs/viewpost.jsp?thread=227041 Peter From biopython at maubp.freeserve.co.uk Wed Feb 24 11:52:55 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Feb 2010 11:52:55 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <320fb6e00912220808w53485af8s801e5a24666d9627@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com> <320fb6e00912220808w53485af8s801e5a24666d9627@mail.gmail.com> Message-ID: <320fb6e01002240352p622f842bo140b1210d76fde4c@mail.gmail.com> On Tue, Dec 22, 2009 at 4:08 PM, Peter wrote: > > The gzip mode issue is interesting... running on the Mac, > Leopard 10.5, using the Apple provided Python 2.5.2, > looking at a gzipped QUAL file everything is fine: > > Python 2.5.2 (r252:60911, Feb 22 2008, 07:57:53) > [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin > Type "help", "copyright", "credits" or "license" for more information. >>>> import gzip >>>> gzip.open("Quality/example.qual.gz", "r").read() > ... > > Looking at a gzipped FASTA file everything is fine: > ... 
>
> But, there is a problem with my gzipped FASTQ file:
>
>>>> gzip.open("Quality/example.fastq.gz", "r").read()
> '@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n'
>>>> gzip.open("Quality/example.fastq.gz", "rb").read()
> '@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n'
>>>> gzip.open("Quality/example.fastq.gz", "rU").read()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/gzip.py",
> line 220, in read
>     self._read(readsize)
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/gzip.py",
> line 292, in _read
>     self._read_eof()
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/gzip.py",
> line 311, in _read_eof
>     raise IOError, "CRC check failed"
> IOError: CRC check failed
>
> I may have stumbled on a bug in the Python gzip library :(
>

Prompted by a thread on the BioPerl mailing list, I revisited this issue:
http://lists.open-bio.org/pipermail/bioperl-l/2010-February/032359.html

From some cross platform testing, I always seem to get the CRC error
when trying to open this gzipped FASTQ file in universal newlines
mode. The FASTA and QUAL files seem fine.
According to the gzip python module's documentation, it uses the zlib module, and you can find the underlying version number like this: >>> import zlib >>> zlib.ZLIB_VERSION '1.2.3' Results from some testing the simple examples above (using Python and the gzip module only): [1] Mac OS X 10.5, Python 2.5.2, GCC 4.0.1, zlib 1.2.3 - fails [2] Linux, Python 2.4.3, GCC 3.4.5, zlib 1.2.1.2 - fails [3] Linux, Python 2.3.4, GCC 3.4.6, zlib 1.2.1.2 - fails [3] Linux, Python 2.6.1, GCC 3.4.6, zlib 1.2.1.2 - fails [4] Linux, Python 2.4.3, GCC 4.1.2, zlib 1.2.3 - fails [4] Linux, Python 2.6.1, GCC 3.4.6, zlib 1.2.3 - fails [5] Windows XP 32bit, Python 2.7a1, MSC v.1500, zlib 1.2.3 - fails [5] Windows XP 32bit, Python 2.6, MSC v.1500, zlib 1.2.3 - fails [5] Windows XP 32bit, Python 2.5.2, MSC v.1310, zlib 1.2.3 - fails [5] Windows XP 32bit, Python 2.4.4, MSC v.1310, zlib 1.2.3 - fails [5] Windows XP 32bit, Python 2.3.5, MSC v.1200, zlib 1.1.4 - fails [1] My mac, [2] Local server, [3] Cluster head, [4] Cluster node, [5] My windows box This tells me that the failure isn't OS specific, and isn't specific to a particular version of Python or zlib. Note that on the Mac and Linux machines where I get the CRC failure in python, the command line tool gunzip can decompress the files fine. If anyone else wants to test this (to confirm I'm not missing anything obvious), you can download the gzipped files from github here: wget http://github.com/peterjc/biopython/raw/index-zip/Tests/Quality/example.qual.gz wget http://github.com/peterjc/biopython/raw/index-zip/Tests/Quality/example.fasta.gz wget http://github.com/peterjc/biopython/raw/index-zip/Tests/Quality/example.fastq.gz Maybe this mode isn't fully supported in gzip? I think that provided we assume that any gzipped text file will use Unix new lines, we don't need to worry about this. 
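
One way to sidestep the universal newlines mode altogether is to decompress in binary mode and normalise line endings afterwards. The sketch below assumes ASCII text and is written for current Python; the helper name read_gzip_text is hypothetical and not part of Biopython or the gzip module.

```python
import gzip


def read_gzip_text(path, encoding="ascii"):
    """Hypothetical helper: decompress in binary mode, then fix newlines.

    Newline translation happens only after decompression, so the
    compressed byte stream is never altered before the CRC check.
    """
    with gzip.open(path, "rb") as handle:
        raw = handle.read()
    text = raw.decode(encoding)
    # Map Windows (\r\n) and old Mac (\r) line endings to Unix (\n).
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

If every gzipped file really does use Unix new lines, as assumed above, the final replace calls are no-ops and plain binary mode is enough.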

Peter

From biopython at maubp.freeserve.co.uk Wed Feb 24 12:00:18 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Feb 2010 12:00:18 +0000
Subject: [Biopython-dev] Running pylint over Biopython
In-Reply-To: <6d941f121002230956h582acc11r59cb18bca8d2f727@mail.gmail.com>
References: <320fb6e01002230943g474dc673x43f7df120e679d91@mail.gmail.com>
	<6d941f121002230956h582acc11r59cb18bca8d2f727@mail.gmail.com>
Message-ID: <320fb6e01002240400k11764b2al2438d5381ed335c4@mail.gmail.com>

2010/2/23 Tiago Antão :
> This comes in a good time, I've actually been making changes to the
> code (as the genepop parser is not able to handle big files and I've
> had quite a few complains about that). it seems to be 2.6 related or
> so because I've detected the Config problem myself. I will correct
> this next week (this week is _impossible_), along with an update to
> the genepop parser to support big files.

Sounds good :)

Peter

From biopython at maubp.freeserve.co.uk Wed Feb 24 12:37:20 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Feb 2010 12:37:20 +0000
Subject: [Biopython-dev] test_PhyloXML.py failing on Windows
Message-ID: <320fb6e01002240437s4be4aef1q82bee9141acfb501@mail.gmail.com>

Hi Eric,

Do you have access to a Windows machine for testing? There seem
to be two issues in the PhyloXML tests (tested on Python 2.5, 2.6
and 2.7a1 on Windows XP):

Count and confirm the number of tags in each example XML file. ... FAIL
Round-trip parsing and serialization of apaf.xml. ... ERROR
Round-trip parsing and serialization of bcl_2.xml. ... ERROR
Round-trip parsing and serialization of o_tol_332_d_dollo.xml. ... ERROR
Round-trip parsing and serialization of made_up.xml. ... ERROR
Round-trip parsing and serialization of phyloxml_examples.xml. ... ERROR

The tag count error I don't immediately understand:

======================================================================
FAIL: Count and confirm the number of tags in each example XML file.
---------------------------------------------------------------------- Traceback (most recent call last): File "C:\repositories\biopython_official\Tests\test_PhyloXML.py", line 56, in test_dump_tags self.assertEquals(len(output.readlines()), count) AssertionError: 301 != 289 ---------------------------------------------------------------------- The rest all fail in _stash_rewrite_and_call where something about your file renaming is failing. It looks like you deliberately move some of your example XML files to a temp filename during the test and then move them back. This seems risky (e.g. if the test suite is stopped mid way). Can you rework this to write the output to a temp file or perhaps better yet a StringIO handle? The errors look like this: ====================================================================== ERROR: Round-trip parsing and serialization of apaf.xml. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_PhyloXML.py", line 561, in test_apaf (TreeTests, ['test_DomainArchitecture']), File "test_PhyloXML.py", line 546, in _stash_rewrite_and_call os.rename(fname, fname + '~') WindowsError: [Error 183] Cannot create a file when that file already exists ---------------------------------------------------------------------- Thanks, Peter From bugzilla-daemon at portal.open-bio.org Wed Feb 24 15:38:13 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 24 Feb 2010 10:38:13 -0500 Subject: [Biopython-dev] [Bug 2998] Document need XCode with 10.4 SDK for Mac OS In-Reply-To: Message-ID: <201002241538.o1OFcDJ4005667@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2998 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-24 10:38 EST ------- Just to add a note, on Snow Leopard Apple provides python 2.5 (default, 32bit only) and python 2.6 (supports 64 bit). 
I suspect if you install Biopython under python 2.6 you won't need the 10.4 SDK... something to check? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From rjalves at igc.gulbenkian.pt Wed Feb 24 16:07:01 2010 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Wed, 24 Feb 2010 16:07:01 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <320fb6e01002240352p622f842bo140b1210d76fde4c@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com> <320fb6e00912220808w53485af8s801e5a24666d9627@mail.gmail.com> <320fb6e01002240352p622f842bo140b1210d76fde4c@mail.gmail.com> Message-ID: <4B854EA5.7050100@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Quoting Peter on 02/24/2010 11:52 AM: > Maybe this mode isn't fully supported in gzip? I think that provided we > assume that any gzipped text file will use Unix new lines, we don't need > to worry about this. Your example puzzled me. I did a few more tests with the files you pointed out. Turns out that the fastq file is 'badly' read even on normal open 'Universal' mode. 
This doesn't happen on the other files:

Python 2.6.4 [GCC 4.4.1] Linux
>>> open('example.fastq.gz', 'rb').read() == open('example.fastq.gz', 'rU').read()
False
>>> open('example.fasta.gz', 'rb').read() == open('example.fasta.gz', 'rU').read()
True
>>> open('example.qual.gz', 'rb').read() == open('example.qual.gz', 'rU').read()
True

In particular the character at fault seems to be:

>>> (open('example.fastq.gz', 'rb').read()[145], open('example.fastq.gz', 'rU').read()[145])
('\r', '\n')

This is the only thing that changed. After looking over the content of the file, I found this workaround:

$ gunzip example.fastq.gz && echo >> example.fastq && gzip example.fastq

Which simply adds a new empty line to the end of the file.

>>> open('example.fastq.gz', 'rb').read() == open('example.fastq.gz', 'rU').read()
True

After this I also looked into python3 (3.1.1) just in case they fixed it already and apparently they did. See for yourself: This was tested in Python-3.1.1 from within blender2.5 (apologies for that, it was the only python3 version I had around).

>>> open('example.fastq.gz','rb').read() == open('example.fastq.gz','rU').read()
Traceback (most recent call last):
(...)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: unexpected code byte

Seems like I need to force binary mode...

>>> open('example.fastq.gz','rb').read() == open('example.fastq.gz','rbU').read()
True

Success!
>>> import gzip
>>> gzip.open('example.fastq.gz','rb').read()
b'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n'
>>> gzip.open('example.fastq.gz','rU').read()
b'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n'
>>> gzip.open('example.fastq.gz','rbU').read()
b'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n'

And everything works as expected. So unless the blender devs changed python to fix this bug, this has been fixed in python3. Should this go upstream?
- -- Renato -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkuFTqAACgkQYh11EUYTX9TXbgCgmBDKrrjL6Eue8qRfgs2ydAUQ 11kAnR0beVQDLP4ldBcd2RFfJ5Q+Opo6 =MLu3 -----END PGP SIGNATURE----- From biopython at maubp.freeserve.co.uk Wed Feb 24 16:48:58 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Feb 2010 16:48:58 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <4B854EA5.7050100@igc.gulbenkian.pt> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com> <320fb6e00912220808w53485af8s801e5a24666d9627@mail.gmail.com> <320fb6e01002240352p622f842bo140b1210d76fde4c@mail.gmail.com> <4B854EA5.7050100@igc.gulbenkian.pt> Message-ID: <320fb6e01002240848t37b0bea4jc8488440ab266ac6@mail.gmail.com> On Wed, Feb 24, 2010 at 4:07 PM, Renato Alves wrote: > > After this I also looked into python3 (3.1.1) just in case they fixed it > already and apparently they did. See for yourself: You seem to be right, I tried this on Windows using Python 3.0.1 and 3.1.1, C:\repositories\biopython_pjc\Tests>c:\python30\python Python 3.0.1 (r301:69561, Feb 13 2009, 20:04:18) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. 
>>> import gzip
>>> gzip.open("Quality\example.fastq.gz", "r").read()
b'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n'
>>> gzip.open("Quality\example.fastq.gz", "rb").read()
b'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n'
>>> gzip.open("Quality\example.fastq.gz", "rU").read()
b'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n'

C:\repositories\biopython_pjc\Tests>c:\python31\python
Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> gzip.open("Quality\example.fastq.gz", "r").read()
b'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n'
>>> gzip.open("Quality\example.fastq.gz", "rb").read()
b'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n'
>>> gzip.open("Quality\example.fastq.gz", "rU").read()
b'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n'

So this does look like a Python 2.x bug which has been fixed in Python 3.x, and we should probably report this (after searching to see if it is a known issue). However, even if it is fixed in Python 2.6.x and 2.7.x, it won't get fixed in older versions like Python 2.4 or 2.5.
Peter From biopython at maubp.freeserve.co.uk Wed Feb 24 17:03:09 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Feb 2010 17:03:09 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <320fb6e01002240848t37b0bea4jc8488440ab266ac6@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com> <320fb6e00912220808w53485af8s801e5a24666d9627@mail.gmail.com> <320fb6e01002240352p622f842bo140b1210d76fde4c@mail.gmail.com> <4B854EA5.7050100@igc.gulbenkian.pt> <320fb6e01002240848t37b0bea4jc8488440ab266ac6@mail.gmail.com> Message-ID: <320fb6e01002240903m52629576vf85f428f68d32d15@mail.gmail.com> Hi all, I've updated my branch to cope with gzipped FASTQ files, tested on Windows XP, Mac OS X Snow Leopard, and Linux: http://github.com/peterjc/biopython/tree/index-zip This works by just opening gzipped files in default mode - which seems to be fine with the examples (FASTA, QUAL and FASTQ) where the text file in the archive uses Unix new line entries. While this may be a good solution, we should test on gzipped files containing Windows new lines too. Plus of course, try non-gzipped compression. And very large files. etc. 
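The Windows new line check suggested above could be sketched roughly as follows. This is only an illustration, not the code on the index-zip branch: it assumes Python 3.3 or later, where gzip.open() accepts a text mode ("rt") and a newline argument, and the helper name read_gzipped_text and the one-record FASTQ payload are made up for the example.

```python
import gzip
import os
import tempfile


def read_gzipped_text(path, encoding="ascii"):
    """Hypothetical helper: return decompressed text with new lines normalised."""
    # "rt" wraps the gzip stream in a text layer; newline=None turns on
    # universal new line translation, so "\r\n" and "\r" are read as "\n".
    with gzip.open(path, "rt", encoding=encoding, newline=None) as handle:
        return handle.read()


# Made-up single FASTQ record, once with Unix and once with Windows new lines.
FASTQ_UNIX = b"@read1\nACGT\n+\n;;;;\n"
FASTQ_WINDOWS = FASTQ_UNIX.replace(b"\n", b"\r\n")

texts = {}
with tempfile.TemporaryDirectory() as tmp_dir:
    for label, payload in (("unix", FASTQ_UNIX), ("windows", FASTQ_WINDOWS)):
        path = os.path.join(tmp_dir, label + ".fastq.gz")
        # Write the raw bytes so the new line style inside the archive is preserved.
        with gzip.open(path, "wb") as handle:
            handle.write(payload)
        texts[label] = read_gzipped_text(path)

# Both variants should decode to the same text if translation works.
print(texts["unix"] == texts["windows"])
```

On the Python 2.x versions discussed in this thread the "U" flag did not have this effect with gzip.open(), so a check like this would presumably need a different reading strategy there.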
Peter From eric.talevich at gmail.com Wed Feb 24 17:03:31 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 24 Feb 2010 12:03:31 -0500 Subject: [Biopython-dev] test_PhyloXML.py failing on Windows In-Reply-To: <320fb6e01002240437s4be4aef1q82bee9141acfb501@mail.gmail.com> References: <320fb6e01002240437s4be4aef1q82bee9141acfb501@mail.gmail.com> Message-ID: <3f6baf361002240903q2b395fd1qa5426a130b5b3d61@mail.gmail.com> On Wed, Feb 24, 2010 at 7:37 AM, Peter wrote: > Hi Eric, > > Do you have access to a Windows machine for testing? There > seem to be two issues in the PhyloXML tests (tested on > Python 2.5, 2.6 and 2.7a1 on Windows XP): > I'll have access to Windows XP this weekend, but I think I can probably fix these tests before then. ====================================================================== > FAIL: Count and confirm the number of tags in each example XML file. > ---------------------------------------------------------------------- > This was an early sanity check for parsing XML with ElementTree, and while I don't see a good reason for the number of lines to be different between OSes (line endings?), the test isn't Biopython-specific anyway. I'll just delete it. ====================================================================== > ERROR: Round-trip parsing and serialization of apaf.xml. > ---------------------------------------------------------------------- > Apparently Windows doesn't like renaming a file to replace another existing file. To fix this error asap I'll call os.remove before the rename, but you're right that these tests should be rewritten to use named temp files or StringIO. 
(I needed to trick unittest into re-running the parser tests on re-written files and this sufficed last summer) -Eric From rjalves at igc.gulbenkian.pt Wed Feb 24 17:13:41 2010 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Wed, 24 Feb 2010 17:13:41 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <320fb6e01002240848t37b0bea4jc8488440ab266ac6@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com> <320fb6e00912220808w53485af8s801e5a24666d9627@mail.gmail.com> <320fb6e01002240352p622f842bo140b1210d76fde4c@mail.gmail.com> <4B854EA5.7050100@igc.gulbenkian.pt> <320fb6e01002240848t37b0bea4jc8488440ab266ac6@mail.gmail.com> Message-ID: <4B855E45.9080708@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 >So this does look like a Python 2.x bug which has been fixed in Python >3.x, and we should probably report this (after searching to see if it >is a known issue). The closest I could find is: http://bugs.python.org/issue5148 But it's also on gzip.open(), not plain open(). >However, even if it is fixed in Python 2.6.x and 2.7.x, it won't get >fixed in older versions like Python 2.4 or 2.5. Do you think raising a warning if the 'U' mode is explicitly passed would be a reasonable solution for older python versions?
Renato -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkuFXkMACgkQYh11EUYTX9TKNACfXIj2p5OTRetf9cWU/ppV8oWb CPcAoIJkkNfHj6AeLAxl2/FtSH3+7UR5 =W7wg -----END PGP SIGNATURE----- From biopython at maubp.freeserve.co.uk Wed Feb 24 17:28:58 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Feb 2010 17:28:58 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <4B855E45.9080708@igc.gulbenkian.pt> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com> <320fb6e00912220808w53485af8s801e5a24666d9627@mail.gmail.com> <320fb6e01002240352p622f842bo140b1210d76fde4c@mail.gmail.com> <4B854EA5.7050100@igc.gulbenkian.pt> <320fb6e01002240848t37b0bea4jc8488440ab266ac6@mail.gmail.com> <4B855E45.9080708@igc.gulbenkian.pt> Message-ID: <320fb6e01002240928h54519628pfb91dd1bf8d9c1f7@mail.gmail.com> On Wed, Feb 24, 2010 at 5:13 PM, Renato Alves wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > >>So this does look like a Python 2.x bug which has been fixed in Python >>3.x, and we should probably report this (after searching to see if it >>is a known issue). > > The closest I could find is: http://bugs.python.org/issue5148 > > But it's also on gzip.open(), not plain open(). It is gzip.open() that we have a problem with, open() is fine. It does look like http://bugs.python.org/issue6759 and/or the linked bug http://bugs.python.org/issue6759 cover this issue. Thanks for finding them. >>However, even if it is fixed in Python 2.6.x and 2.7.x, it won't get >>fixed in older versions like Python 2.4 or 2.5. 
> > Do you think raising a warning if the 'U' mode is explicitly passed > would be a reasonable solution for older python versions? Are you asking about what I would like Python to do? I would like gzip.open() to support universal newline mode. For Biopython's index function we currently don't allow the user to specify the mode at all - the code decides this based on the file format (SFF files must be binary, for text files I use universal newline mode). Peter From rjalves at igc.gulbenkian.pt Wed Feb 24 18:25:04 2010 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Wed, 24 Feb 2010 18:25:04 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <320fb6e01002240928h54519628pfb91dd1bf8d9c1f7@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com> <320fb6e00912220808w53485af8s801e5a24666d9627@mail.gmail.com> <320fb6e01002240352p622f842bo140b1210d76fde4c@mail.gmail.com> <4B854EA5.7050100@igc.gulbenkian.pt> <320fb6e01002240848t37b0bea4jc8488440ab266ac6@mail.gmail.com> <4B855E45.9080708@igc.gulbenkian.pt> <320fb6e01002240928h54519628pfb91dd1bf8d9c1f7@mail.gmail.com> Message-ID: <4B856F00.7030201@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > For Biopython's index function we currently don't allow the > user to specify the mode at all - the code decides this based > on the file format (SFF files must be binary, for text files I use > universal newline mode). For some reason I thought the user could set the mode. Anyway, thanks for the clarification.
Renato -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkuFbvwACgkQYh11EUYTX9QM6gCeK4aMVBoZWZmI+SNccwSd9qle xv8AnA8gZLQn1m8bXMT9Dl5YIRM4akC2 =jQ9l -----END PGP SIGNATURE----- From bugzilla-daemon at portal.open-bio.org Thu Feb 25 13:35:04 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 25 Feb 2010 08:35:04 -0500 Subject: [Biopython-dev] [Bug 2837] Reading Roche 454 SFF sequence read files in Bio.SeqIO In-Reply-To: Message-ID: <201002251335.o1PDZ4qn013099@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2837 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-25 08:35 EST ------- Marking as fixed since I recently merged this code into the trunk. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Feb 25 14:29:19 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 25 Feb 2010 14:29:19 +0000 Subject: [Biopython-dev] test_PhyloXML.py failing on Windows In-Reply-To: <3f6baf361002240903q2b395fd1qa5426a130b5b3d61@mail.gmail.com> References: <320fb6e01002240437s4be4aef1q82bee9141acfb501@mail.gmail.com> <3f6baf361002240903q2b395fd1qa5426a130b5b3d61@mail.gmail.com> Message-ID: <320fb6e01002250629te597954v46308838faca607e@mail.gmail.com> On Wed, Feb 24, 2010 at 5:03 PM, Eric Talevich wrote: > > Apparently Windows doesn't like renaming a file to replace another existing > file. To fix this error asap I'll call os.remove before the rename, ... I had to add another similar check before it would run on my machine.
> but you're right that these tests should be rewritten to use named temp > files or StringIO. (I needed to trick unittest into re-running the parser > tests on re-written files and this sufficed last summer) OK, something for the TODO list. Should we file a bug to remind us? Peter From biopython at maubp.freeserve.co.uk Fri Feb 26 13:09:46 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Feb 2010 13:09:46 +0000 Subject: [Biopython-dev] ImportWarning is new on Python 2.5 Message-ID: <320fb6e01002260509k65e9f9acm494c80a76af8d25e@mail.gmail.com> Hi Eric, I've just been running the test suite on Python 2.4 (on CentOS 5.4) and noticed you use ImportWarning (which was added in Python 2.5) in Bio/Phylo/PhyloXMLIO.py Although we are going to phase out support for Python 2.4, we still need to keep things compatible for now. Are you happy to switch this to a different warning for now, and add a TODO comment to put it back to an ImportWarning once we drop Python 2.4 support? Thanks Peter From eric.talevich at gmail.com Fri Feb 26 14:56:53 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 26 Feb 2010 09:56:53 -0500 Subject: [Biopython-dev] ImportWarning is new on Python 2.5 In-Reply-To: <320fb6e01002260509k65e9f9acm494c80a76af8d25e@mail.gmail.com> References: <320fb6e01002260509k65e9f9acm494c80a76af8d25e@mail.gmail.com> Message-ID: <3f6baf361002260656n581a526dtc4a5374640f546ed@mail.gmail.com> On Fri, Feb 26, 2010 at 8:09 AM, Peter wrote: > Hi Eric, > > I've just been running the test suite on Python 2.4 (on CentOS 5.4) > and noticed you use ImportWarning (which was added in Python 2.5) in > Bio/Phylo/PhyloXMLIO.py > > Although we are going to phase out support for Python 2.4, we still > need to keep things compatible for now. > > Are you happy to switch this to a different warning for now, and add a > TODO comment to put it back to an ImportWarning once we drop Python > 2.4 support? 
> Sure, I'll switch it to a generic Warning for now and leave a comment. I doubt the type of the warning is very important for most uses. -Eric From bugzilla-daemon at portal.open-bio.org Fri Feb 26 16:26:10 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 26 Feb 2010 11:26:10 -0500 Subject: [Biopython-dev] [Bug 2553] Adding SeqRecord objects to an alignment (append or extend) In-Reply-To: Message-ID: <201002261626.o1QGQA1g028222@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2553 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-26 11:26 EST ------- I've started a possible implementation of an improved multiple sequence alignment object on a github branch: http://github.com/peterjc/biopython/commits/alignment-obj This already handles: Bug 2553 - Adding SeqRecord objects to an alignment (append or extend) Bug 2554 - Creating an Alignment from a list of SeqRecord objects I also plan to cover: Bug 2551 - Adding advanced __getitem__ e.g. align[1:2,5:-5] Bug 2552 - Adding alignments -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Feb 26 16:26:43 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 26 Feb 2010 11:26:43 -0500 Subject: [Biopython-dev] [Bug 2554] Creating an Alignment from a list of SeqRecord objects In-Reply-To: Message-ID: <201002261626.o1QGQhNF028283@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2554 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-26 11:26 EST ------- I've started a possible implementation of an improved multiple sequence alignment object on a github branch: http://github.com/peterjc/biopython/commits/alignment-obj This already handles: Bug 2553 - Adding SeqRecord objects to an alignment (append or extend) Bug 2554 - Creating an Alignment from a list of SeqRecord objects I also plan to cover: Bug 2551 - Adding advanced __getitem__ e.g. align[1:2,5:-5] Bug 2552 - Adding alignments -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Feb 26 17:28:31 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 26 Feb 2010 12:28:31 -0500 Subject: [Biopython-dev] [Bug 2552] Adding alignments In-Reply-To: Message-ID: <201002261728.o1QHSVob029960@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2552 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-26 12:28 EST ------- I've started a possible implementation of an improved multiple sequence alignment object on a github branch: http://github.com/peterjc/biopython/commits/alignment-obj This now handles: Bug 2552 - Adding alignments (this bug) Bug 2553 - Adding SeqRecord objects to an alignment (append or extend) Bug 2554 - Creating an Alignment from a list of SeqRecord objects I also plan to cover: Bug 2551 - Adding advanced __getitem__ e.g. align[1:2,5:-5] -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sat Feb 27 18:24:03 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 27 Feb 2010 13:24:03 -0500 Subject: [Biopython-dev] [Bug 3016] New: Change WriterTests in test_PhyloXML.py to use StringIO or temp files Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3016

Summary: Change WriterTests in test_PhyloXML.py to use StringIO or temp files
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: minor
Priority: P2
Component: Unit Tests
AssignedTo: biopython-dev at biopython.org
ReportedBy: eric.talevich at gmail.com

The method _stash_rewrite_and_call currently parses each of the example phyloXML files, renames the parsed file to [filename]~, writes out another copy (from the parsed data structure) using the original filename, re-runs the suite of parser tests on the rewritten files, and finally renames the stashed copies back to the original filenames. This is protected by a try-finally clause, but could still fail to restore the original test files if the Python interpreter is interrupted/killed. Moreover, the design is a little pathological, and could be hard to maintain or extend later.

Redesign the writer tests to rewrite and test a copy of each original at some location other than the original filename. Ideally, use StringIO to store the copy; a named temporary file (see tempfile module) is also acceptable.

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
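A rough sketch of the in-memory round trip the bug asks for might look like the following. It is an assumption-laden illustration rather than the eventual test code: the standard library's xml.etree.ElementTree stands in for the PhyloXML parser and writer, io.BytesIO plays the role StringIO would on Python 2, and EXAMPLE_XML, tag_list and round_trip are hypothetical names invented here.

```python
import io
import xml.etree.ElementTree as ET

# Made-up, minimal phyloXML-like document used only for this illustration.
EXAMPLE_XML = (
    b"<phyloxml><phylogeny rooted='true'>"
    b"<clade><name>A</name></clade>"
    b"</phylogeny></phyloxml>"
)


def tag_list(handle):
    """Parse XML from a file-like handle and list its tags in document order."""
    return [element.tag for element in ET.parse(handle).iter()]


def round_trip(source_bytes):
    """Parse the source, serialise it to an in-memory buffer, and return the buffer."""
    tree = ET.parse(io.BytesIO(source_bytes))
    buffer = io.BytesIO()
    # Nothing is written to disk, so the original example file is never renamed.
    tree.write(buffer, encoding="utf-8")
    buffer.seek(0)  # Rewind so the buffer can be read like a freshly opened file.
    return buffer


original_tags = tag_list(io.BytesIO(EXAMPLE_XML))
rewritten_tags = tag_list(round_trip(EXAMPLE_XML))
print(original_tags == rewritten_tags)
```

Because the rewritten copy only ever lives in memory, an interrupted test run cannot leave the example files renamed, which is the failure mode described in the report.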